Adds support for huggingface and torchvision datasets #145

Merged on Apr 11, 2023 (99 commits)

@sanjanag (Member) commented Apr 5, 2023

Adds support for Hugging Face and torchvision datasets.

  • Supported APIs: Imagelab(hf_dataset=hf_dataset, image_key="img") and Imagelab(torchvision_dataset=tv_dataset).
  • Created a new class, Dataset, for handling the different input formats. It is an iterable class that stores an index and the data, and the rest of the code accesses images through it.
  • dataset.index is a list of integers from 0 to n-1. For local folder datasets, path information is stored as metadata and appended to self.issues as a separate column named image_path.
  • Added tests for CleanVision runs on Hugging Face and torchvision datasets.
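
For readers skimming the design, here is a minimal, dependency-free sketch of the kind of wrapper the description outlines: an iterable object with an integer index that hides the differences between input formats. Every class and method name below (SketchDataset, MappingRowsDataset, TupleRowsDataset, PathListDataset, metadata) is hypothetical and chosen for illustration only; this is not the code added in this PR. Strings stand in for images, and the tuple layout assumes the common torchvision convention of (image, target) samples.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterator, List, Mapping, Sequence, Tuple


class SketchDataset(ABC):
    """Hypothetical base class: iterable over integer indices 0..n-1."""

    def __init__(self, n: int) -> None:
        # The index is a plain list of integers, as the PR description states.
        self.index: List[int] = list(range(n))

    def __len__(self) -> int:
        return len(self.index)

    def __iter__(self) -> Iterator[int]:
        # Iterating yields indices; callers fetch the image with dataset[i].
        return iter(self.index)

    @abstractmethod
    def __getitem__(self, i: int) -> Any:
        """Return the image stored at integer index i."""

    def metadata(self) -> Dict[str, List[Any]]:
        # Extra per-image columns for an issues table (none by default).
        return {}


class MappingRowsDataset(SketchDataset):
    """Rows are mappings, with the image stored under image_key."""

    def __init__(self, rows: Sequence[Mapping[str, Any]], image_key: str) -> None:
        super().__init__(len(rows))
        self._rows = rows
        self._image_key = image_key

    def __getitem__(self, i: int) -> Any:
        return self._rows[i][self._image_key]


class TupleRowsDataset(SketchDataset):
    """Rows are (image, target) tuples; only the image is returned."""

    def __init__(self, rows: Sequence[Tuple[Any, Any]]) -> None:
        super().__init__(len(rows))
        self._rows = rows

    def __getitem__(self, i: int) -> Any:
        return self._rows[i][0]


class PathListDataset(SketchDataset):
    """Local files: paths are kept as metadata, not used as the index."""

    def __init__(self, paths: Sequence[str]) -> None:
        super().__init__(len(paths))
        self._paths = sorted(paths)

    def __getitem__(self, i: int) -> Any:
        # A real implementation would open the image; the path is a stand-in.
        return self._paths[i]

    def metadata(self) -> Dict[str, List[Any]]:
        return {"image_path": list(self._paths)}
```

With a wrapper like this, downstream code can loop with `for i in dataset: image = dataset[i]` regardless of the source format, and only the folder-backed variant contributes an image_path column.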

Docs preview:
https://forked-cleanvision.readthedocs.io/en/hf-dataset/index.html

codecov bot commented Apr 5, 2023

Codecov Report

Merging #145 (6e53262) into main (1efcbb5) will increase coverage by 1.80%.
The diff coverage is 95.51%.

@@            Coverage Diff             @@
##             main     #145      +/-   ##
==========================================
+ Coverage   91.87%   93.68%   +1.80%     
==========================================
  Files           9       14       +5     
  Lines         763      839      +76     
  Branches      147      156       +9     
==========================================
+ Hits          701      786      +85     
+ Misses         41       32       -9     
  Partials       21       21              

Impacted Files                                         Coverage Δ
src/cleanvision/imagelab.py                            88.82% <82.60%> (+3.82%) ⬆️
...anvision/issue_managers/duplicate_issue_manager.py  96.55% <92.30%> (+3.27%) ⬆️
src/cleanvision/dataset/hf_dataset.py                  94.73% <94.73%> (ø)
src/cleanvision/dataset/torch_dataset.py               95.00% <95.00%> (ø)
src/cleanvision/dataset/base_dataset.py                100.00% <100.00%> (ø)
src/cleanvision/dataset/folder_dataset.py              100.00% <100.00%> (ø)
src/cleanvision/dataset/utils.py                       100.00% <100.00%> (ø)
...ion/issue_managers/image_property_issue_manager.py  95.20% <100.00%> (-0.15%) ⬇️
src/cleanvision/utils/base_issue_manager.py            93.54% <100.00%> (+0.21%) ⬆️
src/cleanvision/utils/utils.py                         89.58% <100.00%> (ø)
... and 1 more


@sanjanag force-pushed the hf-dataset branch 5 times, most recently from 2c89285 to 05ceb44, on April 6, 2023 18:32
@sanjanag marked this pull request as ready for review on April 6, 2023 19:23
@sanjanag changed the title from "Making cleanvision compatible with huggingface and torchvision datasets" to "Adds support for huggingface and torchvision datasets" on Apr 6, 2023
Resolved review threads (outdated): src/cleanvision/dataset/utils.py (1), README.md (3)
sanjanag and others added 2 commits April 10, 2023 17:42
Co-authored-by: Jonas Mueller <1390638+jwmueller@users.noreply.github.com>
Resolved review thread (outdated): examples/run.py

@jwmueller (Member) left a comment

LGTM

@sanjanag merged commit ec23600 into cleanlab:main on Apr 11, 2023
21 checks passed
@sanjanag deleted the hf-dataset branch on April 11, 2023 01:27
Successfully merging this pull request may close these issues:

  • Resolve type check issue in sorted()
  • Ensure library is compatible with deep learning packages
4 participants