Modifications to Image Loader #3695

Merged: 17 commits merged on Dec 17, 2023

Contributor

@aaronrockmenezes aaronrockmenezes commented Dec 2, 2023

Description

Image Loader can now be used to load images as labels.
[Issue: sorting with sorted() causes problems because it sorts lexicographically. This is not the order a normal OS uses, which leads to a mismatch between labels and data.]
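
To illustrate the sorting issue described above, here is a minimal sketch comparing plain `sorted()` with a natural-order sort key. The helper `natural_sort_key` and the filenames are hypothetical and are not taken from this PR; the PR's actual fix may differ.

```python
import re


def natural_sort_key(name):
    """Hypothetical helper: split a filename into text and integer chunks.

    Digit runs are compared as numbers, so 'img2' sorts before 'img10'.
    """
    return [
        int(chunk) if chunk.isdecimal() else chunk.lower()
        for chunk in re.split(r"(\d+)", name)
    ]


# Made-up filenames, for illustration only.
files = ["img10.png", "img2.png", "img1.png"]

# Plain sorted() compares character by character, so "10" lands before "2".
lexicographic = sorted(files)

# The natural key compares the numeric parts as integers.
natural = sorted(files, key=natural_sort_key)

print(lexicographic)
print(natural)
```

If data files and label files are each sorted with a different rule, the two lists can fall out of step, which is the mismatch the description refers to.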

Type of change

Please check the option that is related to your PR.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
    • In this case, we recommend discussing your modification on GitHub issues before creating the PR
  • Documentation (modification to documents)

Checklist

  • My code follows the style guidelines of this project
    • Run yapf -i <modified file> and check no errors (yapf version must be 0.32.0)
    • Run mypy -p deepchem and check no errors
    • Run flake8 <modified file> --count and check no errors
    • Run python -m doctest <modified file> and check no errors
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • New unit tests pass locally with my changes
  • I have checked my code and corrected any misspellings

@aaronrockmenezes
Contributor Author

@rbharath Please Review

@aaronrockmenezes aaronrockmenezes marked this pull request as ready for review December 12, 2023 16:52
@shreyasvinaya
Member

@aaronrockmenezes please use yapf version 0.32.0 for Python linting

@shreyasvinaya
Member

This seems to be causing the following failure in the Python 3.8 unit tests:

FAILED deepchem/data/tests/test_image_loader.py::TestImageLoader::test_multitype_zip_load - assert (2,) == (2, 768, 1024, 3)
  Right contains 3 more items, first extra item: 768
  Full diff:
  - (2, 768, 1024, 3)
  + (2,)

@aaronrockmenezes
Contributor Author

@shreyasvinaya It seems that a_image.tif has different dimensions, (330, 44), which don't match the .png image used in the test. Could that be the issue?
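
One plausible reading of the `(2,)` shape, offered as an assumption rather than a confirmed diagnosis: images with different dimensions cannot be stacked into one dense array, so they end up in a 1-D object array with one entry per image. The sketch below uses NumPy with made-up, scaled-down shapes; it does not reproduce the loader's internal code.

```python
import numpy as np

# Stand-ins for two decoded images (shapes are illustrative only).
png_like = np.zeros((4, 6, 3))  # height x width x channels
tif_like = np.zeros((3, 2))     # different size, no channel axis

# Images with identical shapes stack into one dense 4-D array.
uniform = np.stack([png_like, png_like])

# Images with different shapes can only be held as separate objects,
# which yields a 1-D array whose length is the number of images.
ragged = np.empty(2, dtype=object)
ragged[0] = png_like
ragged[1] = tif_like

print(uniform.shape)
print(ragged.shape)
```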

Member

@shreyasvinaya shreyasvinaya left a comment

This looks good to me. The image loader test that was failing in Python 3.10 is also fixed now.

@aaronrockmenezes
Contributor Author

@rbharath Please do a final review

for subfile in os.listdir(label_file)
]
remainder += dirfiles
elif extension == ".zip":
Member

@aaronrockmenezes Do zip files have a set ordering upon extraction? The example above uses a label zip and a data directory zip. Do we know ordering is preserved or is it implementation dependent and can change?

Contributor Author

All files within a zipped folder are extracted in the order they appear in the directory that is to be extracted. So yes, as far as I know, the order is maintained.
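
One way to check the ordering question with Python's standard `zipfile` module is sketched below. It shows that `namelist()` reports members in the order they were written into the archive, which depends on the tool that created the zip and is not necessarily sorted. The filenames are made up, and whether the loader relies on this order is not established by this sketch.

```python
import io
import zipfile

# Write entries into an in-memory archive in a deliberately unsorted order.
written_order = ["img2.png", "img10.png", "img1.png"]
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    for name in written_order:
        archive.writestr(name, b"placeholder bytes")

# Reopen the archive and read back the member listing.
with zipfile.ZipFile(buffer) as archive:
    stored_order = archive.namelist()

print(stored_order)
print(sorted(stored_order))
```

Since the stored order mirrors how the archive was built, sorting the extracted names explicitly is a more defensive choice than assuming a particular order.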

Member

@rbharath rbharath left a comment

@aaronrockmenezes This looks mostly good, but I have a question about the zip file handling below.

@aaronrockmenezes
Contributor Author

@rbharath Yes, the contents of the zip file will be extracted in the same order as they appear in the directory that is to be extracted.

@rbharath
Member

@aaronrockmenezes Can you add a unit test that verifies this? Create a sample directory, zip it, then use the loader and verify that it works correctly. Once that's added, we can merge it in.
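
A rough, standard-library-only sketch of the test shape requested above: build a sample directory, zip it, then check that members come back in the expected order with the expected contents. The helper `zip_directory` and the file names are hypothetical. A real test would pass the zip path to the loader (for example through `create_dataset`) rather than reading the archive directly.

```python
import os
import tempfile
import zipfile


def zip_directory(src_dir, zip_path):
    """Hypothetical helper: zip every file in src_dir in sorted name order."""
    with zipfile.ZipFile(zip_path, "w") as archive:
        for name in sorted(os.listdir(src_dir)):
            archive.write(os.path.join(src_dir, name), arcname=name)


def test_zip_preserves_order():
    with tempfile.TemporaryDirectory() as tmp:
        # Sample directory whose files each hold one distinguishing byte.
        src_dir = os.path.join(tmp, "images")
        os.mkdir(src_dir)
        names = ["a.png", "b.png", "c.png"]
        for index, name in enumerate(names):
            with open(os.path.join(src_dir, name), "wb") as handle:
                handle.write(bytes([index]))

        zip_path = os.path.join(tmp, "images.zip")
        zip_directory(src_dir, zip_path)

        # Read members back in archive order, together with their contents.
        with zipfile.ZipFile(zip_path) as archive:
            members = archive.namelist()
            contents = [archive.read(member) for member in members]

    assert members == names
    assert contents == [b"\x00", b"\x01", b"\x02"]
    return members


test_zip_preserves_order()
```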

@aaronrockmenezes
Contributor Author

@rbharath Added a test to verify that the order of contents is maintained. Please review.

        # These are the known dimensions of face.png
        assert dataset.X.shape == (1, 768, 1024, 3)
        assert dataset.y.shape == (1, 768, 1024, 3)

    def test_tif_simple_load(self):
        loader = dc.data.ImageLoader()
        dataset = loader.create_dataset(self.tif_image_path)
        # TODO(rbharath): Where are the color channels?
Member

@rbharath Should we remove this comment?

Member

Yes, we should, but it's fine to do that in a follow-up PR.

Member

@rbharath rbharath left a comment

LGTM

@rbharath rbharath merged commit 5493df6 into deepchem:master Dec 17, 2023
24 of 33 checks passed