Fix bug for dataset.get_shape() after dataset.set_shard() #2802
Hey, a very nice first contribution. Thanks for it! I presume that |
deepchem/data/tests/test_setshard.py
Outdated
def test_setshard_with_X_y():
  """Test setharding on a simple example"""
Typo: "setharding" should be "set sharding".
deepchem/data/tests/test_setshard.py
Outdated
  X = np.random.rand(10, 3)
  y = np.random.rand(10,)
  dataset = dc.data.DiskDataset.from_numpy(X, y)
  assert dataset.get_shape()[0][0] == 10
Maybe,
X_shape, y_shape, _, _ = dataset.get_shape()
assert X_shape[0] == 10
assert y_shape[0] == 10
will be more meaningful. One can quickly understand what the values returned by dataset.get_shape() represent.
Same suggestion for lines 11, 18, and 19.
Thanks! I think that's what legacy_metadata means. I believe contributing to deepchem is very meaningful and I am very happy to do so.
LGTM!
Let's do a rebase here to be safe. @DingQK let us know if you haven't run a rebase before and we can point you to resources.
I usually do it like this
|
Hi @arunppsg, thank you so much for your advice! I am not sure whether I did it correctly. Can you please check again? Thank you so much for your patience.
Something is amiss: there are redundant commits. The first 3 are the original commits, and the last 3 are duplicates of them. Make sure that you are in
Briefly, we are resetting it to commit |
Hi @arunppsg, thank you so much for your patient and detailed instructions!
Thanks for a wonderful contribution on your first PR! Looking forward to more!
Description
Fix #2772 @arunppsg
Change self.legacy_metadata to True after dataset.set_shard(), meaning that the shape metadata has changed. dataset.get_shape() should then fall back to loading data from disk to acquire the shape.
Type of change
Please check the option that is related to your PR.
Checklist
Run yapf -i <modified file> and check no errors (yapf version must be 0.22.0)
Run mypy -p deepchem and check no errors
Run flake8 <modified file> --count and check no errors
Run python -m doctest <modified file> and check no errors
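
For readers unfamiliar with the pattern in the Description, here is a minimal pure-Python sketch of the idea: mark cached shape metadata as untrusted when a shard is overwritten, so that the shape query recomputes it from the data. ToyShardedDataset, _scan_shape and _cached_shape are hypothetical names made up for this illustration. This is not DeepChem's DiskDataset implementation, only an assumed simplification that borrows the legacy_metadata, set_shard and get_shape names from the PR.

```python
from typing import List, Optional, Tuple

Shape = Tuple[int, int]
Row = List[float]


class ToyShardedDataset:
    """Hypothetical stand-in for a sharded dataset (NOT DeepChem's class)."""

    def __init__(self, shards: List[List[Row]]) -> None:
        # Each shard is a list of rows; each row is a list of features.
        self._shards = shards
        # Cached shape, standing in for shape metadata stored with the shards.
        self._cached_shape: Optional[Shape] = self._scan_shape()
        # False means the cached shape can be trusted.
        self.legacy_metadata = False

    def _scan_shape(self) -> Shape:
        # Slow path: walk every shard (stands in for loading data from disk).
        rows = [row for shard in self._shards for row in shard]
        n_cols = len(rows[0]) if rows else 0
        return (len(rows), n_cols)

    def set_shard(self, shard_num: int, rows: List[Row]) -> None:
        self._shards[shard_num] = rows
        # The cached shape may now be stale, so flag it as untrusted.
        self.legacy_metadata = True

    def get_shape(self) -> Shape:
        if self.legacy_metadata or self._cached_shape is None:
            return self._scan_shape()
        return self._cached_shape


dataset = ToyShardedDataset([[[0.0, 0.0, 0.0] for _ in range(10)]])
assert dataset.get_shape() == (10, 3)

# Overwrite the only shard with fewer rows; the shape must reflect the change.
dataset.set_shard(0, [[1.0, 1.0, 1.0] for _ in range(5)])
n_rows, n_cols = dataset.get_shape()
assert (n_rows, n_cols) == (5, 3)
```

Without the flag set in set_shard, get_shape in this sketch would keep returning the stale (10, 3), which mirrors the kind of bug this PR addresses.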