This repository has been archived by the owner on Oct 9, 2023. It is now read-only.

Refactor preprocess_cls to preprocess, add Serializer, add DataPipelineState #229

Merged: ethanwharris merged 55 commits into master from feature/multiple_return_types on Apr 22, 2021


ethanwharris (Collaborator) commented Apr 20, 2021

What does this PR do?

Bit of a big one. Makes the following changes:

  • Switch from {pre/post}process_cls to passing around references to instantiated objects. This includes:
    • removing classmethods such as instantiate_preprocess (instantiation is now handled in the __init__ of the class)
    • moving instantiation dependencies to the preprocess classes (e.g. default image transforms are now on the ImageClassification preprocess rather than the datamodule)
  • Adds a Serializer class, so that a datamodule is now made up of a Preprocess, a Postprocess, and a Serializer. The Serializer controls the conversion from model output to the desired prediction format (e.g. labels, classes etc.). It is separate from the Postprocess so that the user can change the Serializer without touching the more fundamental logic in the Postprocess (without which training would usually fail). This fixes #191 (Support more return types in predict).
  • Adds a DataPipelineState. Children of the data pipeline need to be able to communicate, so this expands the PreprocessState into a state shared across the preprocess, postprocess, and serializer. This enables, for example, the Labels serializer to get the labels from the ClassificationState registered in the load_data of the ImageClassificationPreprocess.
  • Adds an example showing multi-label image classification. It uses a subset of the movie poster genre data from https://www.cs.ccu.edu.tw/~wtchu/projects/MoviePoster/
  • Adds an option to pass the data_fetcher as an argument to from_load_data_inputs
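To make the first change concrete, here is a minimal sketch of a datamodule that receives an instantiated preprocess instead of a preprocess class. It assumes toy classes that only borrow the PR's names (Preprocess, ImageClassificationPreprocess, DataModule); the constructor arguments, the default_transforms hook, and the pixel-scaling transform are hypothetical and are not Lightning Flash's actual signatures.

```python
from typing import Callable, Dict, Optional

# Hypothetical sketch: toy classes that borrow the PR's names;
# these are NOT Lightning Flash's real constructors or hooks.


class Preprocess:
    def __init__(self, transforms: Optional[Dict[str, Callable]] = None) -> None:
        # Defaults are resolved here in __init__, instead of in a separate
        # instantiate_preprocess-style classmethod.
        self.transforms = transforms if transforms is not None else self.default_transforms()

    def default_transforms(self) -> Dict[str, Callable]:
        return {}


class ImageClassificationPreprocess(Preprocess):
    def default_transforms(self) -> Dict[str, Callable]:
        # The image defaults live on the preprocess, not on the datamodule.
        # Hypothetical transform: scale 0-255 pixel values to [0, 1].
        return {"to_tensor_transform": lambda pixels: [p / 255 for p in pixels]}


class DataModule:
    def __init__(self, preprocess: Optional[Preprocess] = None) -> None:
        # The datamodule holds a reference to an already-built object
        # rather than a preprocess_cls it must instantiate itself.
        self.preprocess = preprocess if preprocess is not None else Preprocess()


# Usage: the caller builds (and may customize) the preprocess up front.
dm = DataModule(preprocess=ImageClassificationPreprocess())
print(dm.preprocess.transforms["to_tensor_transform"]([0, 255]))  # [0.0, 1.0]
```

Because the object arrives already configured, overriding the defaults is just a matter of passing different constructor arguments; the datamodule never needs to know how the preprocess was built.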
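The split between Postprocess and Serializer can be illustrated with a small sketch, assuming a toy Postprocess that applies a softmax and a few toy serializers. Only the Postprocess, Serializer and Labels names come from the PR; Classes, predict, per_sample_transform and serialize are hypothetical names chosen for illustration, not Flash's API.

```python
import math
from typing import Any, List, Sequence

# Hypothetical sketch: toy classes named after the PR, not Flash's API.


class Postprocess:
    """Fundamental per-sample processing (here: a numerically stable softmax)."""

    def per_sample_transform(self, logits: Sequence[float]) -> List[float]:
        m = max(logits)
        exps = [math.exp(x - m) for x in logits]
        total = sum(exps)
        return [e / total for e in exps]


class Serializer:
    """Turns a processed prediction into the format the user asked for."""

    def serialize(self, sample: Any) -> Any:
        return sample  # identity by default: return the probabilities


class Classes(Serializer):
    def serialize(self, probs: Sequence[float]) -> int:
        # Index of the most probable class.
        return max(range(len(probs)), key=lambda i: probs[i])


class Labels(Classes):
    def __init__(self, labels: List[str]) -> None:
        self.labels = labels

    def serialize(self, probs: Sequence[float]) -> str:
        return self.labels[super().serialize(probs)]


def predict(logits: Sequence[float], postprocess: Postprocess, serializer: Serializer) -> Any:
    # The serializer runs last, so swapping it never touches the postprocess.
    return serializer.serialize(postprocess.per_sample_transform(logits))


post = Postprocess()
print(predict([0.1, 2.0, -1.0], post, Classes()))                      # 1
print(predict([0.1, 2.0, -1.0], post, Labels(["cat", "dog", "bird"])))  # dog
```

Changing the output format only means passing a different serializer; the softmax in the Postprocess, which training depends on, stays untouched.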
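One way to picture a DataPipelineState is as a registry keyed by state type that every stage of the pipeline can write to and read from. The sketch assumes that design; the set_state/get_state methods, the ProcessState base class, and deriving labels from sorted folder names are hypothetical, and only the class names echo the PR.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Type, TypeVar

# Hypothetical sketch: names echo the PR, but this is not Flash's real API.

T = TypeVar("T", bound="ProcessState")


class ProcessState:
    """Base class for pieces of state shared across the pipeline."""


@dataclass
class ClassificationState(ProcessState):
    labels: List[str]


class DataPipelineState:
    """A single registry shared by the preprocess, postprocess and serializer."""

    def __init__(self) -> None:
        self._states: Dict[type, ProcessState] = {}

    def set_state(self, state: ProcessState) -> None:
        self._states[type(state)] = state

    def get_state(self, state_type: Type[T]) -> Optional[T]:
        return self._states.get(state_type)


class ImageClassificationPreprocess:
    def __init__(self, state: DataPipelineState) -> None:
        self.state = state

    def load_data(self, folder_names: List[str]) -> List[str]:
        # Register the labels discovered while loading the data.
        self.state.set_state(ClassificationState(labels=sorted(folder_names)))
        return folder_names


class Labels:
    def __init__(self, state: DataPipelineState) -> None:
        self.state = state

    def serialize(self, class_index: int) -> str:
        cls_state = self.state.get_state(ClassificationState)
        if cls_state is None:
            return str(class_index)  # no labels registered: fall back to the index
        return cls_state.labels[class_index]


shared = DataPipelineState()
ImageClassificationPreprocess(shared).load_data(["dog", "cat"])
print(Labels(shared).serialize(0))  # cat (labels are sorted to ["cat", "dog"])
```

Keying the registry by type means a consumer such as Labels only needs to know which state class it wants, not which stage of the pipeline produced it.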

Before submitting

  • Was this discussed/approved via a GitHub issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests? [not needed for typos/docs]
  • Did you verify new and existing tests pass locally with your changes?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

PR review

  • Is this pull request ready for review? (if not, please submit in draft mode)

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@review-notebook-app
Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

codecov bot commented Apr 20, 2021

Codecov Report

Merging #229 (b1310e5) into master (b7436c4) will increase coverage by 0.24%.
The diff coverage is 92.13%.


@@            Coverage Diff             @@
##           master     #229      +/-   ##
==========================================
+ Coverage   86.57%   86.81%   +0.24%     
==========================================
  Files          58       58              
  Lines        2912     2981      +69     
==========================================
+ Hits         2521     2588      +67     
- Misses        391      393       +2     
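The headline numbers in the coverage diff follow from coverage = hits / lines. The sketch assumes Codecov's default of two decimal places with rounding down (this repository's codecov.yml was not checked); working in integer basis points avoids floating-point rounding surprises.

```python
# Assumption: Codecov defaults of two decimal places, rounded down.


def coverage_bp(hits: int, lines: int) -> int:
    """Coverage in basis points (hundredths of a percent), rounded down."""
    return hits * 10000 // lines


def fmt(bp: int) -> str:
    """Format a non-negative basis-point value as a percentage string."""
    return f"{bp // 100}.{bp % 100:02d}%"


before = coverage_bp(2521, 2912)  # master: hits / lines
after = coverage_bp(2588, 2981)   # #229: hits / lines
print(fmt(before), fmt(after), fmt(after - before))  # 86.57% 86.81% 0.24%
print(2912 - 2521, 2981 - 2588)  # misses: 391 393
```

With round-to-nearest instead, the PR's figure would show as 86.82% (2588 / 2981 ≈ 0.868165), which is why the rounding mode matters here.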
Flag        Coverage Δ
unittests   86.81% <92.13%> (+0.24%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files                              Coverage Δ
flash/data/callback.py                      98.70% <ø> (ø)
flash/data/data_module.py                   77.92% <83.33%> (-0.16%) ⬇️
flash/vision/classification/data.py         88.20% <87.80%> (+0.34%) ⬆️
flash/tabular/classification/data/data.py   88.28% <88.46%> (-0.35%) ⬇️
flash/text/classification/data.py           82.47% <88.88%> (+0.65%) ⬆️
flash/core/classification.py                92.06% <90.90%> (-7.94%) ⬇️
flash/data/data_pipeline.py                 87.87% <90.90%> (+0.59%) ⬆️
flash/core/model.py                         91.42% <93.93%> (-0.06%) ⬇️
flash/data/process.py                       88.73% <97.95%> (+1.99%) ⬆️
flash/data/batch.py                         79.83% <100.00%> (+0.16%) ⬆️
... and 7 more

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b7436c4...b1310e5.

kaushikb11 (Contributor)

@ethanwharris Could you please add a description to the PR?

ethanwharris (Collaborator, Author)

@kaushikb11 will do, only a draft at the moment 😃

pep8speaks commented Apr 20, 2021

Hello @ethanwharris! Thanks for updating this PR.

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-04-22 10:02:35 UTC

flash/core/model.py (outdated, resolved)
docs/source/reference/multi_label_classification.rst (outdated, resolved)
docs/source/general/data.rst (outdated, resolved)
flash/vision/detection/data.py (outdated, resolved)
edgarriba (Contributor) left a comment

neat tests :)

docs/source/reference/multi_label_classification.rst (outdated, resolved)
docs/source/reference/multi_label_classification.rst (outdated, resolved)
docs/source/reference/multi_label_classification.rst (outdated, resolved)
flash/core/classification.py (resolved)
flash/core/classification.py (resolved)
flash/core/model.py (resolved)
flash/data/process.py (outdated, resolved)
ethanwharris and others added 3 commits April 21, 2021 14:21
Co-authored-by: Edgar Riba <edgar.riba@gmail.com>
ethanwharris mentioned this pull request Apr 21, 2021
8 tasks
tchaton (Contributor) left a comment

Amazing PR :)

ethanwharris merged commit 1ab7346 into master Apr 22, 2021
ethanwharris deleted the feature/multiple_return_types branch April 22, 2021 11:09
Successfully merging this pull request may close these issues:
  • Support more return types in predict (#191)

5 participants