Fix full dataset download after partial #4681

Merged

@brownj85 brownj85 commented Oct 16, 2023

Context:
Link to shortcut story

If you call load and download only part of a dataset (e.g. load("qchem", molname="H2", attributes=["hf_state"])) and then try to load the full dataset (by calling load() again without the attributes kwarg), the rest of the dataset is not downloaded. This is because when _download_dataset sees that a file already exists for a dataset, it assumes that file already contains the data it needs. You can pass force=True to ensure it is re-downloaded, but you shouldn't need to.
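The failure mode can be illustrated with a small stand-alone sketch. It does not reproduce the real `_download_dataset` body, which is not shown here: `old_should_download` is a hypothetical stand-in that assumes the pre-fix check looked only at whether the destination file exists.

```python
import tempfile
from pathlib import Path


def old_should_download(dest: Path, force: bool = False) -> bool:
    """Hypothetical pre-fix check: only the file's existence is considered."""
    return force or not dest.exists()


with tempfile.TemporaryDirectory() as tmp:
    dest = Path(tmp) / "H2.h5"
    # Pretend an earlier partial download left a file holding only "hf_state".
    dest.write_bytes(b"")

    # A later request for the full dataset is skipped because the file exists.
    skipped = not old_should_download(dest)
    # Passing force=True is the only way to trigger a new download.
    forced = old_should_download(dest, force=True)
```

Under that assumption, the contents of the local file never enter the decision, which is why the missing attributes were never fetched.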

Description of the Change:
If all attributes are requested in qml.data.load() (i.e. attributes is None) and the dataset already exists locally, _download_partial() is used to download the missing attributes. If force is specified, the existing attributes in the local dataset are overwritten.

_download_full() will only be called if the dataset does not exist at all locally.
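The dispatch described above can be sketched as follows. This is a minimal illustration rather than the code from this PR: `plan_download` and all of its parameters are hypothetical names, and it only returns a plan instead of performing any download.

```python
from typing import Iterable, List, Optional, Tuple


def plan_download(
    exists_locally: bool,
    local_attributes: Iterable[str],
    all_attributes: Iterable[str],
    requested: Optional[Iterable[str]] = None,
    force: bool = False,
) -> Tuple[str, List[str]]:
    """Return (strategy, attributes_to_fetch) for a load request.

    Hypothetical helper: "full" stands in for _download_full() and
    "partial" for _download_partial().
    """
    all_attrs = list(all_attributes)
    local = set(local_attributes)

    # A full download is planned only when nothing exists locally
    # and no specific attributes were requested.
    if not exists_locally and requested is None:
        return "full", all_attrs

    # requested=None means "all attributes".
    wanted = all_attrs if requested is None else list(requested)

    # With force, attributes already present locally are fetched again.
    if force:
        return "partial", wanted

    # Otherwise only the attributes missing from the local file are fetched.
    return "partial", [attr for attr in wanted if attr not in local]
```

For example, with a local file that holds only `hf_state`, `plan_download(True, ["hf_state"], ["hf_state", "molecule"])` plans a partial download of `molecule` alone, while adding `force=True` plans both attributes again.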

Benefits:
Mixed partial and full calls to qml.data.load() work as expected.

Possible Drawbacks:

Related GitHub Issues:

@brownj85 brownj85 changed the title Fix partial dataset download Fix full dataset download after partial Oct 16, 2023
@github-actions
Contributor

Hello. You may have forgotten to update the changelog!
Please edit doc/releases/changelog-dev.md with:

  • A one-to-two sentence description of the change. You may include a small working example for new features.
  • A link back to this PR.
  • Your name (or GitHub username) in the contributors section.

@brownj85
Contributor Author

Heads up, this will be slow on large datasets until #4674 is merged

@brownj85 brownj85 marked this pull request as ready for review October 16, 2023 19:35
@codecov
codecov bot commented Oct 16, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (89c2759) 99.64% compared to head (49e8d41) 99.63%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4681      +/-   ##
==========================================
- Coverage   99.64%   99.63%   -0.01%     
==========================================
  Files         377      377              
  Lines       34002    33754     -248     
==========================================
- Hits        33881    33632     -249     
- Misses        121      122       +1     
Files Coverage Δ
pennylane/data/data_manager/__init__.py 98.65% <100.00%> (+0.12%) ⬆️

... and 42 files with indirect coverage changes


Contributor

@timmysilv timmysilv left a comment

looks good! just a few things.

tests/data/data_manager/test_dataset_access.py (outdated, resolved)
pennylane/data/data_manager/__init__.py (outdated, resolved)
Contributor

@timmysilv timmysilv left a comment

looks great! you should add yourself to the contributor list too 😄

EDIT: I see you added yourself in the other PR, so nvm :p

Contributor

@DSGuala DSGuala left a comment

Looks good to me 👍

doc/releases/changelog-dev.md (outdated, resolved)
Co-authored-by: Utkarsh <utkarshazad98@gmail.com>
@timmysilv timmysilv added the merge-ready ✔️ All tests pass and the PR is ready to be merged. label Oct 19, 2023
@brownj85 brownj85 enabled auto-merge (squash) October 19, 2023 21:19
@brownj85 brownj85 merged commit 8166719 into master Oct 20, 2023
33 checks passed
@brownj85 brownj85 deleted the sc-42605-download-dataset-works-after-download-partial branch October 20, 2023 15:39
4 participants