allow non-int sample ids in DatasetDefault #259

alex-golts · 2023-01-22T10:00:55Z

No description provided.

mosheraboh

Thanks, Alex!
See my inline comments.
We can discuss it if anything is unclear or if you disagree.

mosheraboh · 2023-01-22T12:20:36Z

fuse/data/datasets/dataset_default.py

@@ -187,8 +187,9 @@ def getitem(
        # get sample id
        if not self._explicit_sample_ids_mode:
            sample_id = item
-            if sample_id >= self._final_sample_ids:
-                raise IndexError
+            if isinstance(sample_id, (int, np.integer)):  # allow using non int sample_ids


So does this mean it works?
If so, for the official fix I'd like to support it more explicitly.
The constructor will accept sample_ids=None.
When sample_ids is None, the Dataset has no control over which sample ids are used, so iter() and len() will raise an error.

Yes, it worked.
I made some changes; let me know if it's better.
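
For readers following the thread, here is a minimal sketch of the behaviour discussed above. It assumes nothing about the real implementation: `SketchDataset`, its `_known_ids` helper, and the choice of `RuntimeError` are hypothetical and are not taken from `DatasetDefault`. It only illustrates the two ideas in the review: skip the bounds check for non-int ids, and refuse `len()` and `iter()` when `sample_ids` is None.

```python
from typing import Any, Dict, Hashable, Iterator, Sequence, Union


class SketchDataset:
    """Hypothetical stand-in for illustration only; not the real DatasetDefault."""

    def __init__(self, sample_ids: Union[int, Sequence[Hashable], None]) -> None:
        # int      -> only the dataset size is known
        # sequence -> explicit list of sample ids
        # None     -> the caller picks ids freely; the dataset cannot enumerate them
        self._sample_ids = sample_ids

    def _known_ids(self) -> Sequence[Hashable]:
        if self._sample_ids is None:
            # The error type is an assumption; the thread only says "raise an error".
            raise RuntimeError("sample ids are unknown when sample_ids is None")
        if isinstance(self._sample_ids, int):
            return range(self._sample_ids)
        return self._sample_ids

    def __len__(self) -> int:
        return len(self._known_ids())

    def __iter__(self) -> Iterator[Dict[str, Any]]:
        ids = self._known_ids()  # raises immediately, before any iteration starts
        return (self[sample_id] for sample_id in ids)

    def __getitem__(self, sample_id: Hashable) -> Dict[str, Any]:
        # Bounds-check only int ids against a known size; other id types pass through.
        if isinstance(self._sample_ids, int) and isinstance(sample_id, int):
            if sample_id >= self._sample_ids:
                raise IndexError(sample_id)
        return {"sample_id": sample_id}
```

With `SketchDataset(None)`, indexing with an arbitrary hashable id such as `("patient", 7)` works, while `len()` and `iter()` fail fast instead of returning misleading values.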

mosheraboh

Looks great!
Minor comments inline.

mosheraboh · 2023-01-22T18:00:34Z

fuse/data/datasets/dataset_default.py

-            Optionally, you can provide an integer that describes only the size of the dataset. This is useful in massive datasets
-             (for example 100M samples). In such case, multiple functionalities will not be supported, mainly -
-              cacher, allow_uncached_sample_morphing and get_all_sample_ids
+        :param sample_ids: list of sample_ids included in dataset. Or:


allow non-int sample ids in DatasetDefault

0ef8273

alex-golts requested a review from mosheraboh January 22, 2023 10:01

black format

a4fc3af

mosheraboh requested changes Jan 22, 2023

alex-golts added 2 commits January 22, 2023 08:15

explicitly support no sample ids

806f4bf

black format

719e42e

alex-golts requested a review from mosheraboh January 22, 2023 13:26

mosheraboh previously approved these changes Jan 22, 2023

review comments

fcbd054

alex-golts dismissed mosheraboh’s stale review via fcbd054 January 23, 2023 08:19

alex-golts added 2 commits January 23, 2023 03:20

Merge branch 'master' into allow_non_int_sample_ids

88d972c

Merge branch 'master' into allow_non_int_sample_ids

0b86d14

mosheraboh approved these changes Jan 24, 2023

alex-golts merged commit 2553c41 into master Jan 24, 2023

alex-golts deleted the allow_non_int_sample_ids branch January 24, 2023 11:21
