[WIP] Prefetching feature #17
base: main
fb3d302 to a343741
src/indexable/dataloader/mod.rs (outdated)
pub fn iter(&self) -> SingleProcessDataLoaderIter<'_, D, S, C> {
/// Return owning iterator over the dataloader.
/// TODO: Find a way to not consume the Dataloader
pub fn iter(self) -> SingleProcessDataLoaderIter<D, S, C> {
`DataLoaderBuilder` could return an `Arc<DataLoader>` instead of a `DataLoader`; this way `SingleProcessDataLoaderIter` would not consume the `DataLoader` (which is moved into a thread).
See the next commit, where the dataset is stored as an `Arc` inside the `Dataloader`.
Yeah, I also came up with an `Arc` for iteration while designing the API. In the end, I chose to mimic what the standard iterators do:
- `Dataloader::into_iter` takes ownership
- `Dataloader::iter` and `Dataloader::iter_mut` don't

I thought it would be less surprising for the end user and would also have lower overhead.
Is the `Arc` necessary for your change, or can you go with `iter` if you don't want to consume the dataset?
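For reference, the standard-library convention described above can be sketched with a hypothetical, simplified `DataLoader` that just wraps a `Vec<i32>` (no sampler, collate function or threads, unlike the real generic type). `into_iter` on the owned value consumes it, while `iter`/`iter_mut` and the `IntoIterator` impls for `&DataLoader` and `&mut DataLoader` only borrow it, exactly like `Vec`.

```rust
/// Hypothetical, simplified loader used only to illustrate the iterator convention.
pub struct DataLoader {
    dataset: Vec<i32>,
}

impl DataLoader {
    pub fn new(dataset: Vec<i32>) -> Self {
        Self { dataset }
    }

    /// Shared borrow: the loader can be iterated again afterwards.
    pub fn iter(&self) -> std::slice::Iter<'_, i32> {
        self.dataset.iter()
    }

    /// Exclusive borrow: items can be modified in place.
    pub fn iter_mut(&mut self) -> std::slice::IterMut<'_, i32> {
        self.dataset.iter_mut()
    }
}

/// `for x in loader` consumes the loader.
impl IntoIterator for DataLoader {
    type Item = i32;
    type IntoIter = std::vec::IntoIter<i32>;

    fn into_iter(self) -> Self::IntoIter {
        self.dataset.into_iter()
    }
}

/// `for x in &loader` only borrows it.
impl<'a> IntoIterator for &'a DataLoader {
    type Item = &'a i32;
    type IntoIter = std::slice::Iter<'a, i32>;

    fn into_iter(self) -> Self::IntoIter {
        self.iter()
    }
}

/// `for x in &mut loader` borrows it mutably.
impl<'a> IntoIterator for &'a mut DataLoader {
    type Item = &'a mut i32;
    type IntoIter = std::slice::IterMut<'a, i32>;

    fn into_iter(self) -> Self::IntoIter {
        self.iter_mut()
    }
}
```

With this layout, `for x in &loader` leaves `loader` usable afterwards, and only `for x in loader` moves it.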
Unfortunately, to avoid consuming the `Dataloader` like here:
`let mut iter = dataloader.iter();`
and still be able to iterate over it again, I didn't find a solution without `Arc`, because `thread::spawn` moves the captured variables (dataloader, sampler, ...) into the thread.
But IMO the overhead should not impact the iteration itself, since the `Arc` is only cloned when the iterator is created. BTW, burn-rs uses `Arc` a lot for multithreaded data loading.
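A minimal sketch of the `Arc` variant discussed above, assuming a hypothetical, simplified `DataLoader` whose dataset is a `Vec<u64>` and whose worker thread prefetches through a bounded `std::sync::mpsc::sync_channel` (the PR itself may use different channel types). `thread::spawn` requires a `'static` closure, so the worker cannot hold `&self.dataset`; moving an `Arc` clone into it only bumps a reference count, which lets `iter` take `&self`.

```rust
use std::sync::mpsc::{sync_channel, IntoIter};
use std::sync::Arc;
use std::thread;

/// Hypothetical simplified loader: the dataset is shared through an `Arc`.
pub struct DataLoader {
    dataset: Arc<Vec<u64>>,
    /// Number of samples the worker may buffer ahead of the consumer.
    prefetch: usize,
}

impl DataLoader {
    pub fn new(dataset: Vec<u64>, prefetch: usize) -> Self {
        Self { dataset: Arc::new(dataset), prefetch }
    }

    /// Borrows `self`: the spawned closure must be `'static`, so it receives
    /// a clone of the `Arc` (a reference-count bump, not a copy of the data).
    pub fn iter(&self) -> IntoIter<u64> {
        let dataset = Arc::clone(&self.dataset);
        // Bounded channel: the worker blocks once `prefetch` samples are queued.
        let (tx, rx) = sync_channel(self.prefetch);
        thread::spawn(move || {
            for &sample in dataset.iter() {
                if tx.send(sample).is_err() {
                    break; // the consumer dropped the iterator
                }
            }
        });
        rx.into_iter()
    }
}
```

Calling `loader.iter()` twice works because each call spawns its own worker holding its own `Arc` clone.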
Another solution could be to remove `fn iter` from the `Dataloader` (or make it private) and keep only `into_iter`. Then we can move the `Dataset` and `Sampler` into the spawned thread without using an `Arc`, but it seems too restrictive. WDYT?
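The `into_iter`-only alternative could look roughly like this. `ReverseSampler` and the `String` dataset are hypothetical stand-ins for the real `Sampler` and `Dataset` traits. Because `into_iter` takes `self` by value, both can be moved into the worker thread with no `Arc`, at the cost of the loader being gone after a single pass.

```rust
use std::sync::mpsc::{sync_channel, IntoIter};
use std::thread;

/// Hypothetical sampler: yields dataset indices in reverse order.
pub struct ReverseSampler {
    len: usize,
}

impl ReverseSampler {
    fn indices(&self) -> impl Iterator<Item = usize> {
        (0..self.len).rev()
    }
}

/// Hypothetical simplified loader owning its dataset and sampler directly.
pub struct DataLoader {
    dataset: Vec<String>,
    sampler: ReverseSampler,
}

impl IntoIterator for DataLoader {
    type Item = String;
    type IntoIter = IntoIter<String>;

    /// Consumes the loader: dataset and sampler are moved into the worker
    /// thread as-is, so no shared ownership (`Arc`) is required.
    fn into_iter(self) -> Self::IntoIter {
        let DataLoader { dataset, sampler } = self;
        // Small bounded buffer used as the prefetch queue.
        let (tx, rx) = sync_channel(2);
        thread::spawn(move || {
            for index in sampler.indices() {
                if tx.send(dataset[index].clone()).is_err() {
                    break; // the consumer stopped iterating
                }
            }
        });
        rx.into_iter()
    }
}
```

The restriction mentioned above shows up directly in the types: after `for item in loader`, `loader` has been moved and cannot be iterated again.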
FYI @Tudyx I have two commits in this branch now:
- One with `Arc`
- One without, but we must consume the `Dataloader` in order to create the iterator

Also, on another topic: @Tudyx Also impl
547e353 to 3b390da
That's another possible design. I think having a
Totally agree! I think this could be a good subject for a separate PR.
My bad, I think it's better to keep it that way, with a simple API :)
EDIT: @Tudyx
EDIT 2:
16565b1 to e68e8ac
Note:
f74b62f to 8db9d2b
8db9d2b to 73d8df6
73d8df6 to ae1c3de
This is a first working implementation, but with some drawbacks:
EDIT: Deadlock should be avoided thanks to the crossbeam `select!` macro.
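The PR relies on crossbeam's `select!` for this. As a point of comparison only, here is a std-only sketch of one common deadlock when a prefetching iterator is dropped early, and one way around it; this is a different technique from `select!`, not the PR's code. `PrefetchIter` is hypothetical: if `Drop` joined the worker while still holding the receiver, a worker blocked in `send` on a full buffer would never wake up. Dropping the receiver first makes `send` return `Err`, so the worker exits and `join` returns.

```rust
use std::sync::mpsc::{sync_channel, Receiver};
use std::thread::{self, JoinHandle};

/// Hypothetical prefetching iterator that joins its worker thread on drop.
pub struct PrefetchIter {
    // `Option` so that `Drop` can release the receiver *before* joining.
    rx: Option<Receiver<u32>>,
    worker: Option<JoinHandle<()>>,
}

impl PrefetchIter {
    pub fn new(items: Vec<u32>, prefetch: usize) -> Self {
        let (tx, rx) = sync_channel(prefetch);
        let worker = thread::spawn(move || {
            for item in items {
                // `send` blocks while the buffer is full and returns `Err`
                // once the receiver has been dropped.
                if tx.send(item).is_err() {
                    break;
                }
            }
        });
        Self { rx: Some(rx), worker: Some(worker) }
    }
}

impl Iterator for PrefetchIter {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        // `recv` fails once the worker is done and its sender is dropped.
        self.rx.as_ref()?.recv().ok()
    }
}

impl Drop for PrefetchIter {
    fn drop(&mut self) {
        // Joining while still holding `rx` could deadlock: the worker may be
        // blocked in `send` on a full buffer. Dropping `rx` first unblocks it.
        drop(self.rx.take());
        if let Some(worker) = self.worker.take() {
            let _ = worker.join();
        }
    }
}
```

Swapping the two statements in `drop` (join first, release `rx` second) reproduces the hang when the iterator is dropped mid-epoch.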