Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow non-int sample ids in DatasetDefault #259

Merged
merged 7 commits into from Jan 24, 2023

Conversation

alex-golts
Copy link
Collaborator

No description provided.

Copy link
Collaborator

@mosheraboh mosheraboh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Alex!
See inline.
We can discuss it if it's clear or if you don't agree.

@@ -187,8 +187,9 @@ def getitem(
# get sample id
if not self._explicit_sample_ids_mode:
sample_id = item
if sample_id >= self._final_sample_ids:
raise IndexError
if isinstance(sample_id, (int, np.integer)): # allow using non int sample_ids
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it means that it works?
If yes, for the official fix - I want to support it more explicitly.
In the constructor,
We will support setting sample_ids to None.
If it will set to None - it means that Dataset has no control over the used sample ids, and therefore iter() and len() will raise an error.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes it worked.
I made some changes, let me know if it's better

mosheraboh
mosheraboh previously approved these changes Jan 22, 2023
Copy link
Collaborator

@mosheraboh mosheraboh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!
Minors inline

Optionally, you can provide an integer that describes only the size of the dataset. This is useful in massive datasets
(for example 100M samples). In such case, multiple functionalities will not be supported, mainly -
cacher, allow_uncached_sample_morphing and get_all_sample_ids
:param sample_ids: list of sample_ids included in dataset. Or:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

fuse/data/datasets/dataset_default.py Outdated Show resolved Hide resolved
fuse/data/datasets/dataset_default.py Outdated Show resolved Hide resolved
fuse/data/datasets/dataset_default.py Outdated Show resolved Hide resolved
@alex-golts alex-golts merged commit 2553c41 into master Jan 24, 2023
@alex-golts alex-golts deleted the allow_non_int_sample_ids branch January 24, 2023 11:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants