Adding multisample feature along with testcases #740
@tchaton @deependujha @bhimrazy Could you verify the approach? Once it is confirmed, I can update the README.
     index_path: Optional[str] = None,
     force_override_state_dict: bool = False,
     transform: Optional[Union[Callable, list[Callable]]] = None,
+    is_multisample: bool = False,
How will you know what sample_count the user wants?
-    is_multisample: bool = False,
+    sample_count: int = 1,
This is better. I'll add this.
 def __len__(self) -> int:
-    return self.get_len(self.num_workers, self.batch_size if self.batch_size else 1)
+    original_len = self.get_len(self.num_workers, self.batch_size if self.batch_size else 1)
+    return original_len if not self.is_multisample else original_len * len(self.transform)
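To make the effect of this length change concrete, here is a minimal, self-contained sketch. It is not the library's dataset class: `ToyDataset`, its fields, and the index mapping in `__getitem__` are hypothetical stand-ins chosen for illustration. It only shows how multiplying the length by the number of transforms pairs naturally with mapping a flat index back to an (item, transform) pair.

```python
from typing import Any, Callable, Sequence


class ToyDataset:
    """Hypothetical stand-in used for illustration; not the real dataset class."""

    def __init__(
        self,
        items: Sequence[Any],
        transforms: list[Callable[[Any], Any]],
        is_multisample: bool = False,
    ) -> None:
        self.items = items
        self.transforms = transforms
        self.is_multisample = is_multisample

    def __len__(self) -> int:
        original_len = len(self.items)
        # Mirrors the diff: in multisample mode the length is scaled by the
        # number of transform functions.
        if not self.is_multisample:
            return original_len
        return original_len * len(self.transforms)

    def __getitem__(self, index: int) -> Any:
        if not self.is_multisample:
            return self.items[index]
        # Assumption: consecutive flat indices walk through all transforms of
        # one item before moving on to the next item.
        item_idx, transform_idx = divmod(index, len(self.transforms))
        return self.transforms[transform_idx](self.items[item_idx])


ds = ToyDataset([1, 2, 3], [lambda x: x, lambda x: x * 10], is_multisample=True)
flat = [ds[i] for i in range(len(ds))]
print(len(ds), flat)
```

With three items and two transforms, the sketch reports a length of six and yields each item once per transform.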
This is an interesting approach. Using the number of transforms to determine the dataset length implies that the user has defined as many transform functions as the number of desired samples.
It makes sense conceptually, but I’m a bit unsure about the practicality. In most cases, the transforms won’t differ drastically; a user could easily handle minor variations with simple conditionals on a sample_idx parameter inside a single transform.
So I’m wondering if this design might add unnecessary complexity for limited flexibility.
Curious to hear your thoughts on this, @bhimrazy.
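The single-transform alternative raised in this comment could look roughly like the sketch below. The `sample_idx` keyword, the `expand` helper, and the dictionary layout are assumptions made for illustration only; nothing here is a confirmed API of the library.

```python
from typing import Any


def transform(item: dict[str, Any], sample_idx: int) -> dict[str, Any]:
    """One transform whose behaviour varies slightly with sample_idx (hypothetical)."""
    values = list(item["values"])
    if sample_idx % 2 == 1:
        # Odd sub-samples get a reversed view of the same data.
        values.reverse()
    return {"values": values, "sample_idx": sample_idx}


def expand(item: dict[str, Any], sample_count: int) -> list[dict[str, Any]]:
    """Produce sample_count sub-samples from one item with a single transform."""
    return [transform(item, sample_idx=i) for i in range(sample_count)]


samples = expand({"values": [1, 2, 3]}, sample_count=3)
for sample in samples:
    print(sample)
```

Here the number of sub-samples comes from an explicit count rather than from the number of transform functions, which is the trade-off the comment is weighing.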
I think that approach makes sense. Adding sample_count as a parameter instead of deriving it from the number of transform functions is more logical. This works well when self.transform is a single function. However, when it’s a list of transform functions, we need to decide how to handle it: we can either let the list override the multi-sample behavior, or return sample_count samples for each transform function. I believe the first option is better, since the user has already defined the multi-sample outputs through the list of transforms.
Thoughts? @deependujha
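One possible reading of the two options in this comment, sketched as a small helper. The function name `samples_per_item` and the `multiply` flag are hypothetical and exist only to put the two behaviours side by side; this is an interpretation of the discussion, not code from the PR.

```python
from typing import Callable, Optional, Union

Transform = Callable[[object], object]


def samples_per_item(
    transform: Optional[Union[Transform, list[Transform]]],
    sample_count: int = 1,
    multiply: bool = False,
) -> int:
    """Return how many sub-samples one dataset item would expand into."""
    if sample_count < 1:
        raise ValueError("sample_count must be at least 1")
    if isinstance(transform, list):
        if multiply:
            # Second option: sample_count samples for each transform function.
            return len(transform) * sample_count
        # First option: the list of transforms overrides sample_count.
        return len(transform)
    # Single transform (or none): sample_count alone decides.
    return sample_count


def identity(x: object) -> object:
    return x


print(samples_per_item(identity, sample_count=4))
print(samples_per_item([identity, identity], sample_count=3))
print(samples_per_item([identity, identity], sample_count=3, multiply=True))
```

Under the first option a list of two transforms always yields two sub-samples per item regardless of sample_count, while the second option would yield six for a sample_count of three.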
@VijayVignesh1 Let's wait for @bhimrazy's thoughts, in case he would like to suggest something.
Codecov Report
❌ Patch coverage is …
Additional details and impacted files
@@         Coverage Diff         @@
##           main   #740   +/-   ##
===================================
  Coverage    80%    80%           
===================================
  Files        52     52           
  Lines      7330   7350   +20     
===================================
+ Hits       5869   5887   +18     
- Misses     1461   1463    +2     
Before submitting
What does this PR do?
Fixes #317
PR review
Adds support for multisample items.
A new boolean parameter makes the dataset create a batch of sub-samples for each sample, given a list of transform functions.
Sample code:
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues there's a high chance it will not be merged.
Did you have fun?
Make sure you had fun coding 🙃