
[WIP] Prefetching feature #17

Draft: wants to merge 3 commits into base: main
Conversation

AzHicham (Contributor) commented Sep 1, 2023

This is a first working implementation, but with some drawbacks:

  • The Dataloader is consumed, so we cannot iterate multiple times over it

EDIT:
Deadlock should be avoided thanks to crossbeam's select! macro

Review comment on the signature change from:

    pub fn iter(&self) -> SingleProcessDataLoaderIter<'_, D, S, C> {

to:

    /// Return owning iterator over the dataloader.
    /// TODO: Find a way to not consume the Dataloader
    pub fn iter(self) -> SingleProcessDataLoaderIter<D, S, C> {
AzHicham (Contributor, Author):

DataLoaderBuilder can return an Arc<DataLoader> instead of DataLoader; this way SingleProcessDataLoaderIter will not consume the Dataloader (which is moved into a thread).

AzHicham (Contributor, Author):

See the next commit, where the dataset is stored as an Arc inside the Dataloader.

Tudyx (Owner):

Yeah, I also arrived at an Arc when iterating while designing the API. In the end, I chose to mimic what standard iterators do:

  • Dataloader::into_iter takes ownership
  • Dataloader::iter and Dataloader::iter_mut don't

I thought it would be less surprising for the end user and would also have lower overhead.
Is the Arc necessary for your change, or can you go with iter if you don't want to consume the dataset?

AzHicham (Contributor, Author) commented Sep 5, 2023:

Unfortunately, in order to not consume the Dataloader, as in
let mut iter = dataloader.iter();
and be able to iterate over it again, I didn't find a solution without using Arc, because thread::spawn moves the variables it captures (dataloader, sampler, ...).

But IMO we should not be impacted by the overhead during iteration; by the way, burn-rs uses Arc a lot for multithreaded dataloading.

AzHicham (Contributor, Author) commented Sep 5, 2023:

Another solution could be to remove fn iter (or make it private) on the Dataloader and keep only into_iter; then we can move the Dataset & Sampler into the spawned thread without using an Arc. But that seems too restrictive, WDYT?

AzHicham (Contributor, Author):

FYI @Tudyx, I have two commits in this branch now:

  • One with Arc
  • One without -> but then we must consume the Dataloader in order to create the iterator

AzHicham (Contributor, Author) commented Sep 5, 2023

Also, on another topic, @Tudyx:
Maybe we could have both a BatchSampler and a BatchSamplerExact, and then use generics in the Dataloader & Builder instead of the boolean value drop_last?

Also, impl ExactSizeIterator when possible.

@AzHicham AzHicham force-pushed the haz/prefetching branch 8 times, most recently from 547e353 to 3b390da Compare September 5, 2023 20:12
Tudyx (Owner) commented Sep 5, 2023

> Maybe we could have both a BatchSampler and a BatchSamplerExact, and then use generics in the Dataloader & Builder instead of the boolean value drop_last?

That's another possible design. I think having a drop_last function is a pretty good API though: it's discoverable through auto-completion, the end user doesn't need to be aware of the underlying BatchSampler, and it's consistent with the PyTorch API.

> Also, impl ExactSizeIterator when possible.

Totally agree! I think this could be a good subject for a separate PR.

AzHicham (Contributor, Author) commented Sep 5, 2023

> That's another possible design. I think having a drop_last function is a pretty good API though: it's discoverable through auto-completion, the end user doesn't need to be aware of the underlying BatchSampler, and it's consistent with the PyTorch API.

My bad, I think it's better to keep it that way :) with a simple API.

EDIT: @Tudyx
Another solution could be to do what you did for pub fn shuffle(self) -> Builder<D, RandomSampler, C>:
two functions, which implies having a trait for the BatchSampler:
pub fn drop_last(self) -> Builder<D, BatchSamplerExact, S, C>
pub fn keep_last(self) -> Builder<D, BatchSampler, S, C>

EDIT 2:
I tried an impl with BatchSamplerTrait<S: Sampler>, but it becomes more complicated :/

@AzHicham AzHicham force-pushed the haz/prefetching branch 4 times, most recently from 16565b1 to e68e8ac Compare September 5, 2023 20:56
AzHicham (Contributor, Author) commented Sep 5, 2023

Note:
Find a way to cancel the background (prefetching) thread if the dataloader is dropped, without deadlocking!

@AzHicham AzHicham force-pushed the haz/prefetching branch 5 times, most recently from f74b62f to 8db9d2b Compare September 7, 2023 12:30
AzHicham (Contributor, Author):

@Tudyx
I found a way to avoid the deadlock by using crossbeam's amazing select! macro.
WDYT?

@Tudyx Tudyx mentioned this pull request Apr 1, 2024