Fix bug for dataset.get_shape() after dataset.set_shard() #2802

Merged
merged 3 commits into from
Jan 8, 2022

@DingQK DingQK commented Jan 1, 2022

Description

Fix #2772 @arunppsg

Set self.legacy_metadata to True after dataset.set_shard(), signalling that the shape metadata has changed. dataset.get_shape() then falls back to loading the data from disk to determine the shape.
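
The idea can be illustrated with a small, self-contained sketch. This is not DeepChem's DiskDataset code: ToyShardedDataset, its in-memory shards, and the full_scans counter are hypothetical names made up for this example. Only the legacy_metadata flag and the set_shard()/get_shape() method names come from the PR.

```python
class ToyShardedDataset:
    """Hypothetical stand-in for a sharded dataset; not DeepChem's DiskDataset."""

    def __init__(self, shards):
        # Each shard is a list of rows, and each row is a list of feature values.
        self._shards = [list(shard) for shard in shards]
        # Shape computed once up front, playing the role of stored shape metadata.
        self._cached_shape = self._scan_shards()
        self.legacy_metadata = False
        self.full_scans = 0  # how many times get_shape() had to rescan the shards

    def _scan_shards(self):
        """Compute (n_rows, n_features) by walking every shard."""
        n_rows = sum(len(shard) for shard in self._shards)
        # Assumes all rows share one width; use the first non-empty shard.
        n_features = next((len(shard[0]) for shard in self._shards if shard), 0)
        return (n_rows, n_features)

    def set_shard(self, index, rows):
        """Overwrite one shard and mark the cached shape as stale."""
        self._shards[index] = list(rows)
        self.legacy_metadata = True

    def get_shape(self):
        """Use the cached shape while it is valid, otherwise rescan the shards."""
        if not self.legacy_metadata:
            return self._cached_shape
        self.full_scans += 1
        return self._scan_shards()


dataset = ToyShardedDataset([[[0.0] * 3] * 5, [[0.0] * 3] * 5])
shape_before = dataset.get_shape()      # served from the cached shape
dataset.set_shard(0, [[0.0] * 3] * 2)   # shard 0 shrinks from 5 rows to 2
shape_after = dataset.get_shape()       # recomputed from the shards themselves
```

Without the flag flip in set_shard(), the second get_shape() call in this toy would keep returning the stale cached shape, which mirrors the behaviour reported in the linked issue.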

Type of change

Please check the option that is related to your PR.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
    • In this case, we recommend discussing your modification on GitHub issues before creating the PR
  • Documentation (modification of documents)

Checklist

  • My code follows the style guidelines of this project
    • Run yapf -i <modified file> and check no errors (yapf version must be 0.22.0)
    • Run mypy -p deepchem and check no errors
    • Run flake8 <modified file> --count and check no errors
    • Run python -m doctest <modified file> and check no errors
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • New unit tests pass locally with my changes
  • I have checked my code and corrected any misspellings

arunppsg commented Jan 4, 2022

Hey, a very nice first contribution. Thanks for it!

I presume that legacy_metadata holds whether the current state of metadata is valid or not. If it is True, then it means that metadata has become legacy and hence invalid. Am I right here?



def test_setshard_with_X_y():
  """Test setharding on a simple example"""

@arunppsg arunppsg Jan 4, 2022

Typo: this should read "set sharding".

  X = np.random.rand(10, 3)
  y = np.random.rand(10,)
  dataset = dc.data.DiskDataset.from_numpy(X, y)
  assert dataset.get_shape()[0][0] == 10

@arunppsg arunppsg Jan 4, 2022

Maybe,

X_shape, y_shape, _, _ = dataset.get_shape()
assert X_shape[0] == 10
assert y_shape[0] == 10

would be more meaningful. One can quickly understand what the values returned by dataset.get_shape() represent.

The same suggestion applies to lines 11, 18, and 19.
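
As a rough illustration of why named unpacking reads better than nested indexing, here is a sketch that runs without DeepChem. It assumes get_shape() returns four shapes in the order (X, y, w, ids); get_shape_stub is a hypothetical stand-in, not a DeepChem function.

```python
def get_shape_stub():
    """Hypothetical stand-in: (X, y, w, ids) shapes for 10 samples with 3 features."""
    return (10, 3), (10,), (10,), (10,)


# Indexing form: the reader has to know that [0] is X and [0][0] is its row count.
assert get_shape_stub()[0][0] == 10

# Unpacking form: each shape gets a name, so the assertions explain themselves.
X_shape, y_shape, w_shape, ids_shape = get_shape_stub()
assert X_shape[0] == 10
assert y_shape[0] == 10
```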

DingQK commented Jan 4, 2022

> Hey, a very nice first contribution. Thanks for it!
>
> I presume that legacy_metadata holds whether the current state of metadata is valid or not. If it is True, then it means that metadata has become legacy and hence invalid. Am I right here?

Thanks! I think that's what legacy_metadata means. I believe contributing to deepchem is very meaningful and I am very happy to do so.

@DingQK DingQK requested a review from arunppsg January 4, 2022 13:00

@arunppsg arunppsg left a comment

LGTM!

arunppsg commented Jan 5, 2022

The CI failures were fixed in #2799. @rbharath should we rebase, re-run the CI, and then merge, or can we go ahead and merge directly?

rbharath commented Jan 5, 2022

Let's do a rebase here to be safe. @DingQK let us know if you haven't run a rebase before and we can point you to resources

DingQK commented Jan 5, 2022

> Let's do a rebase here to be safe. @DingQK let us know if you haven't run a rebase before and we can point you to resources

Hi @rbharath, I haven't run a rebase before and I am happy to try. I think I need resources for this action. Thank you so much!

arunppsg commented Jan 6, 2022

I usually do it like this:

  1. Check out the branch you want to rebase (in your case, the develop branch).
  2. Fetch the changes in deepchem:master using: git fetch upstream
  3. Rebase your develop branch using: git rebase upstream/master
  4. Push the changes to the remote: git push -f origin develop

DingQK commented Jan 6, 2022

Hi @arunppsg,

Thank you so much for your advice! I am not sure whether I did it correctly. Can you please check again? Thank you so much for your patience.

arunppsg commented Jan 6, 2022

Something is amiss: there are redundant commits. The first 3 are the original commits, and the last 3 are duplicates of them. Make sure that you are on the develop branch. Then, do the following:

git reset --hard 987afc6
git remote set-url upstream https://github.com/deepchem/deepchem.git
git fetch upstream
git rebase upstream/master
git push -f origin develop

Briefly: reset the branch to commit 987afc6, set the upstream remote URL, fetch the new changes from upstream, rebase your local branch onto them, and force-push the result. The total number of commits in the PR should be the same (3) before and after. This ref might help you understand more about rebase.

DingQK commented Jan 8, 2022

Hi @arunppsg ,

Thank you so much for your patient and detailed instructions!

@arunppsg arunppsg merged commit 7d14983 into deepchem:master Jan 8, 2022

arunppsg commented Jan 8, 2022

Thanks for a wonderful contribution on your first PR! Looking forward to more!

Development

Successfully merging this pull request may close these issues.

Bug in dataset.get_shape() when used after dataset.set_shard()
3 participants