Add EMNISTDataModule #676

sugatoray · 2021-06-26T19:00:49Z

What does this PR do?

A summary of changes and modifications ⭐ 🔥

File Added:
- pl_bolts/datasets/emnist_dataset.py 🟢
- Contents:
  - EMNIST_METADATA
  - EMNIST dataset
  - BinaryEMNIST dataset (needs a new PR or should be added to #672) ⚠️

File Added:
- pl_bolts/datamodules/emnist_datamodule.py 🟢
- Contents:
  - EMNISTDataModule
  - BinaryEMNISTDataModule (needs a new PR or should be added to #672) ⚠️

Files Modified:
- Package: pl_bolts
  - pl_bolts/datasets/__init__.py 🟢
  - pl_bolts/datamodules/__init__.py 🟢
- Tests:
  - For datamodules:
    - tests/datamodules/test_imports.py 🟢
    - tests/datamodules/test_datamodules.py (WIP) 🟠

Adding BinaryEMNIST and BinaryEMNISTDataModule follows logically from how MNIST and BinaryMNIST (datasets and datamodules) are implemented.

About the dataset

source: https://arxiv.org/pdf/1702.05373.pdf [Table-I]

source: https://arxiv.org/pdf/1702.05373.pdf [Table-II]

Before submitting

- Was this discussed/approved via a GitHub issue? (no need for typos and docs improvements) Make EMNISTDataModule #672
- Did you read the contributor guideline, Pull Request section? Y 🟢
- Did you make sure your PR does only one thing, instead of bundling different changes together? Y 🟢
- Did you make sure to update the documentation with your changes? Y 🟢
- Did you write any new necessary tests? [not needed for typos/docs] Y 🟢
- Did you verify new and existing tests pass locally with your changes? Y 🟢
- If you made a notable change (that affects users), did you update the CHANGELOG? Y 🟢

PR review

Is this pull request ready for review? (if not, please submit in draft mode) READY 🟢

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

pep8speaks · 2021-06-26T19:00:52Z

Hello @sugatoray! Thanks for updating this PR.

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-08-13 13:04:38 UTC

codecov · 2021-06-26T19:02:48Z

Codecov Report

Merging #676 (ef13456) into master (bd28835) will increase coverage by 47.35%.
The diff coverage is 80.00%.

❗ Current head ef13456 differs from pull request most recent head 5833e60. Consider uploading reports for the commit 5833e60 to get more accurate results

@@             Coverage Diff             @@
##           master     #676       +/-   ##
===========================================
+ Coverage   24.32%   71.67%   +47.35%     
===========================================
  Files         120      120               
  Lines        7486     7397       -89     
===========================================
+ Hits         1821     5302     +3481     
+ Misses       5665     2095     -3570

| Flag | Coverage Δ | |
|---|---|---|
| cpu | `71.67% <80.00%> (+47.35%)` | ⬆️ |
| pytest | `71.67% <80.00%> (+47.35%)` | ⬆️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ | |
|---|---|---|
| pl_bolts/transforms/dataset_normalizations.py | `72.00% <20.00%> (+37.00%)` | ⬆️ |
| pl_bolts/datasets/emnist_dataset.py | `90.90% <90.90%> (ø)` | |
| pl_bolts/datamodules/__init__.py | `100.00% <100.00%> (ø)` | |
| pl_bolts/datasets/__init__.py | `100.00% <100.00%> (ø)` | |
| pl_bolts/models/rl/advantage_actor_critic_model.py | | |
| pl_bolts/losses/self_supervised_learning.py | `71.33% <0.00%> (+0.18%)` | ⬆️ |
| pl_bolts/models/self_supervised/cpc/cpc_module.py | `20.58% <0.00%> (+0.58%)` | ⬆️ |
| ...l_bolts/models/self_supervised/byol/byol_module.py | `22.34% <0.00%> (+0.83%)` | ⬆️ |
| pl_bolts/models/vision/image_gpt/igpt_module.py | `20.65% <0.00%> (+0.87%)` | ⬆️ |
| ... and 70 more | | |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Powered by Codecov. Last update bd28835...5833e60.

akihironitta

LGTM!

Description of the change: Introduced a new arg `strict_val_split: bool` to use the balanced validation set defined in the EMNIST paper.

(always open for any discussions/comments on the API (strict_val_split) :])
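To illustrate the idea behind `strict_val_split`, here is a minimal sketch of the difference between a predefined validation subset and a random hold-out. The helper name `split_train_val`, its parameters, and in particular the assumption that the predefined validation samples sit at the tail of the training set are hypothetical and are not taken from this PR's code.

```python
import random
from typing import List, Tuple


def split_train_val(
    num_train: int,
    val_size: int,
    strict_val_split: bool,
    seed: int = 0,
) -> Tuple[List[int], List[int]]:
    """Return (train_indices, val_indices) for a training set of `num_train` samples."""
    if not 0 <= val_size <= num_train:
        raise ValueError("val_size must be between 0 and num_train")

    indices = list(range(num_train))
    cut = num_train - val_size

    if strict_val_split:
        # Assumption for this sketch: the predefined validation samples are
        # the last `val_size` items of the training set.
        return indices[:cut], indices[cut:]

    # Otherwise fall back to a seeded random hold-out of the same size.
    rng = random.Random(seed)
    rng.shuffle(indices)
    return sorted(indices[:cut]), sorted(indices[cut:])


train_idx, val_idx = split_train_val(10, 3, strict_val_split=True)
print(train_idx)  # [0, 1, 2, 3, 4, 5, 6]
print(val_idx)    # [7, 8, 9]
```

With `strict_val_split=False` the same call returns a random but reproducible partition of the same sizes, which is the behavior one would expect from a generic train/validation split.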

Note: The failing tests are irrelevant to the changes in this PR.

sugatoray

@akihironitta Looks good to me. Thank you for working on it. I am not sure whether you were waiting for any feedback from me. It seems that this has not been reviewed by other reviewers since your approval. Please let me know if there is anything I need to do.

akihironitta · 2021-08-10T07:19:03Z

@sugatoray Hi, thanks for having a look again!

@PyTorchLightning/core-bolts Would you mind having a look at the changes?

akihironitta · 2021-08-12T06:41:36Z

The failing tests are irrelevant to the change in this PR.

pytest summary at c6adbae

=========================== short test summary info ============================
FAILED tests/callbacks/verification/test_base.py::test_verification_base_get_input_array[device0]
FAILED tests/callbacks/verification/test_batch_gradient.py::test_batch_gradient_verification_pl_module[device0-True]
FAILED tests/callbacks/verification/test_batch_gradient.py::test_batch_gradient_verification_pl_module[device0-False]
FAILED tests/callbacks/verification/test_batch_gradient.py::test_batch_gradient_verification_callback[0]
FAILED tests/models/rl/test_scripts.py::test_cli_run_rl_dqn[ --env PongNoFrameskip-v4 --max_steps 10 --fast_dev_run 1 --warm_start_size 10 --n_steps 2 --batch_size 10]
FAILED tests/models/rl/test_scripts.py::test_cli_run_rl_reinforce[ --env CartPole-v0 --max_steps 10 --fast_dev_run 1 --batch_size 10]
FAILED tests/models/rl/test_scripts.py::test_cli_run_rl_vanilla_policy_gradient[ --env CartPole-v0 --max_steps 10 --fast_dev_run 1 --batch_size 10]
= 7 failed, 288 passed, 23 skipped, 1 xfailed, 532 warnings in 1053.39s (0:17:33) =

akihironitta · 2021-08-12T06:43:11Z

@PyTorchLightning/core-bolts Could you have another look?

sugatoray added 8 commits June 26, 2021 12:18

added EMNIST dataset

09295d2

Contains `class` BinaryEMNIST as well.

updated datasets/__init__.py for EMNIST and BinaryEMNIST

cd141fb

added emnist_datamodule.py to datamodules

3874962

This adds **EMNISTDataModule** `class`.

added EMNISTDataModule to datamodules/__init__.py

d479d4a

fixed a typo in datamodules/emnist_datamodule.py

c386bea

added BinaryEMNISTDataModule to datamodules

4561c77

file added: datamodules/binary_emnist_datamodule.py

added BinaryEMNISTDataModule to datamodules/__init__.py

1541e52

corrected a typo in datasets/emnist_dataset.py

c8dec9a

github-actions bot added the datamodule label Jun 26, 2021

[pre-commit.ci] auto fixes from pre-commit.com hooks

e11fb46

for more information, see https://pre-commit.ci

sugatoray added 13 commits June 26, 2021 20:52

added EMNISTDataModule and BinaryEMNISTDataModule to test_imports.py

4219ca9

made changes to BinaryEMNISTDataModule

e38c0b4

Changes made in: `datamodules/binary_emnist_datamodule.py`
- class `BinaryEMNISTDataModule`
  - [x] modified `prepare_data()`
  - [x] added `setup()`

made changes to EMNISTDataModule

772957d

Changes made in: `datamodules/emnist_datamodule.py`
- class `EMNISTDataModule`
  - [x] modified `prepare_data()`
  - [x] added `setup()`

added tests for EMNISTDataModule and BinaryEMNISTDataModule

523217c

added emnist metadata to emnist_dataset.py

a90e288

updated binary_emnist_datamodule.py

2eef30b

- updated docs
- updated property: `num_classes`

> Now the code shows correct `num_classes` based on the `split` provided by the user.

updated emnist_datamodule.py

e1d8b8e

- updated docs
- updated property: `num_classes`

> Now the code shows correct `num_classes` based on the `split` provided by the user.
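A split-dependent `num_classes` property could look roughly like the sketch below. The `SplitInfo` class and the `NUM_CLASSES_BY_SPLIT` lookup are hypothetical names used only for illustration; they are not the `EMNIST_METADATA` structure added in this PR, and the lookup is deliberately partial.

```python
# Hypothetical, partial lookup used only for this sketch.
# byclass: 10 digits + 26 uppercase + 26 lowercase letters.
NUM_CLASSES_BY_SPLIT = {
    "byclass": 62,
    "bymerge": 47,
    "balanced": 47,
    "digits": 10,
    "mnist": 10,
}


class SplitInfo:
    """Tiny stand-in for a datamodule that knows which split it serves."""

    def __init__(self, split: str = "mnist") -> None:
        if split not in NUM_CLASSES_BY_SPLIT:
            raise ValueError(f"unknown split: {split!r}")
        self.split = split

    @property
    def num_classes(self) -> int:
        # Look the value up per split instead of returning a fixed constant.
        return NUM_CLASSES_BY_SPLIT[self.split]


print(SplitInfo("balanced").num_classes)  # 47
print(SplitInfo("digits").num_classes)    # 10
```

Deriving the value from the selected split avoids the mismatch that a hard-coded class count would cause when a user picks a split other than the default.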

fixed linting errors in emnist_dataset.py

a8c5db9

fixed linting errors in test_datamodules.py

26119b0

fixed some linting errors in emnist_datamodule.py

e84bc44

fixed some linting errors in emnist_datamodule.py

6e374f6

fixed linting errors in emnist_datamodule.py

69b0bb9

fixed linting errors in binary_emnist_datamodule.py

5d78844

sugatoray marked this pull request as ready for review June 29, 2021 00:08

sugatoray requested review from akihironitta, ananyahjha93, awaelchli and Borda as code owners June 29, 2021 00:08

akihironitta added 7 commits July 31, 2021 11:22

Follow up of strict_val_step

2622ea4

Fix num_classes doc

12258cb

Remove TODO

0d7a607

Change func name in tests

159be6c

Remove EMNIST from emnist_dataset.py

7e624da

Fix tests

6ff4028

Update CHANGELOG

e1abfbb

akihironitta marked this pull request as ready for review July 31, 2021 21:08

akihironitta added 2 commits August 1, 2021 06:15

Revert "Temporarily disable GPU testing"

ec1ed51

This reverts commit 41a1252.

Revert "Temporarily disable GPU testing"

b08fb46

This reverts commit 42df382.

akihironitta approved these changes Jul 31, 2021

sugatoray commented Aug 9, 2021

Merge branch 'master' into feature/672_EMNISTDataModule

5243f08

tchaton reviewed Aug 10, 2021

pl_bolts/datamodules/emnist_datamodule.py (outdated, resolved)

pl_bolts/datamodules/emnist_datamodule.py (outdated, resolved)

pl_bolts/datamodules/emnist_datamodule.py (resolved)

akihironitta mentioned this pull request Aug 11, 2021

Use pin_memory=True, shuffle=True and num_workers=0 by default #701 (merged)

akihironitta added 2 commits August 11, 2021 18:28

Simplify default_transforms()

4eebe3f

Change datamodules' default values

c6adbae

akihironitta added the ready label Aug 12, 2021

mergify bot added the has conflicts label Aug 13, 2021

Borda requested a review from tchaton August 13, 2021 12:41

Merge branch 'master' into feature/672_EMNISTDataModule

ef13456

mergify bot removed the has conflicts label Aug 13, 2021

Borda approved these changes Aug 13, 2021

mergify bot added the has conflicts label Aug 13, 2021

Merge branch 'master' into feature/672_EMNISTDataModule

5833e60

mergify bot removed the has conflicts label Aug 13, 2021

Borda merged commit 4ece8db into Lightning-Universe:master Aug 13, 2021
