sample behaviour #23
We can add
I don't think the 'None' version makes sense. Also, sampling is random, so I don't think a distinction between 'cycle' and 'repeat' makes sense.
Maybe we can have
My question is: does it really make sense to sample beyond index len-1?
Scenarios with oversampling will probably be due to an error in user logic, and I think it’s better to get an error earlier rather than later. Should a creative user find the need for it, he could always do:
    ds_sampled = ds.sample(42)
    ds_oversampled = ds.concat(ds_sampled)
Though not that bad, it does break the call chain. And seeing as the implementation is trivial, why not add the
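A minimal sketch of the concat workaround quoted above. The Dataset class here is a hypothetical stand-in, not the project's real API; only the names sample and concat come from the thread, and their semantics (sampling without replacement, concatenation in order) are assumptions.

```python
import random
from typing import List, Optional, Sequence


class Dataset:
    """Hypothetical stand-in for the project's dataset type; not its real API."""

    def __init__(self, items: Sequence, rng: Optional[random.Random] = None):
        self.items: List = list(items)
        self._rng = rng if rng is not None else random.Random()

    def __len__(self) -> int:
        return len(self.items)

    def sample(self, num: int) -> "Dataset":
        # Draw `num` distinct items (without replacement);
        # random.Random.sample raises ValueError if num > len(self)
        return Dataset(self._rng.sample(self.items, num), rng=self._rng)

    def concat(self, other: "Dataset") -> "Dataset":
        # Items of `self` followed by the items of `other`
        return Dataset(self.items + other.items, rng=self._rng)


ds = Dataset(range(100), rng=random.Random(0))
ds_sampled = ds.sample(42)
ds_oversampled = ds.concat(ds_sampled)
print(len(ds_oversampled))  # 142
```

The result keeps every original item once and adds 42 extra draws, which is why it needs two statements instead of one chained call.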
Should this also extend to the take function? If one is allowed, I would expect the other to be as well. However, if the user wants to draw more samples than are present, they could use the repeat function instead. In my opinion, this would be more transparent and would not add any complexity.
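A rough sketch of the repeat-then-sample idea, under the assumption that repeat(times) concatenates whole copies of the dataset and that sample draws without replacement. The Dataset class and both method signatures are hypothetical illustrations, not the project's API.

```python
import random
from collections import Counter
from typing import List, Optional, Sequence


class Dataset:
    """Hypothetical minimal dataset; `repeat` and `sample` only illustrate the idea."""

    def __init__(self, items: Sequence, rng: Optional[random.Random] = None):
        self.items: List = list(items)
        self._rng = rng if rng is not None else random.Random()

    def __len__(self) -> int:
        return len(self.items)

    def repeat(self, times: int) -> "Dataset":
        # Concatenate `times` copies of the whole dataset
        return Dataset(self.items * times, rng=self._rng)

    def sample(self, num: int) -> "Dataset":
        # random.Random.sample draws without replacement and
        # raises ValueError if num > len(self)
        return Dataset(self._rng.sample(self.items, num), rng=self._rng)


ds = Dataset(range(100), rng=random.Random(0))
# Oversampling is explicit and stays in a single call chain
oversampled = ds.repeat(2).sample(150)
counts = Counter(oversampled.items)
print(len(oversampled), max(counts.values()))  # 150 2
```

Because the duplication happens in an explicit repeat step, an accidental sample(150) on the 100-item dataset would still fail, while intentional oversampling remains a one-liner with a known upper bound on duplicates.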
Let's remove the oversampling and throw an error instead.
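To contrast the two behaviors discussed here, a small sketch with two hypothetical helper functions. sample_lenient is only an assumed model of the current behavior (falling back to drawing with replacement via random.Random.choices), since the thread does not show the actual implementation; sample_strict follows the decision to raise an error.

```python
import random
from typing import List, Sequence


def sample_lenient(items: Sequence, num: int, rng: random.Random) -> List:
    """Hypothetical model of the current behavior: oversampling silently repeats items."""
    if num <= len(items):
        return rng.sample(list(items), num)
    # Fall back to drawing with replacement, so some items appear more than once
    return rng.choices(list(items), k=num)


def sample_strict(items: Sequence, num: int, rng: random.Random) -> List:
    """Agreed behavior in this thread: fail early when oversampling."""
    if num > len(items):
        raise ValueError(
            f"Cannot sample {num} items from a dataset of length {len(items)}"
        )
    return rng.sample(list(items), num)


rng = random.Random(0)
data = list(range(5))
print(len(sample_lenient(data, 8, rng)))  # 8
try:
    sample_strict(data, 8, rng)
except ValueError as err:
    print(err)  # Cannot sample 8 items from a dataset of length 5
```

The strict variant surfaces a likely user-logic error at the call site instead of letting duplicated samples flow silently into later steps.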
Currently, if more samples are requested from .sample than are available in the dataset, some samples will be drawn multiple times. Should we raise an error instead?