I added two info!() calls, one in the get() method of random.rs and one in the get() method of dataset.rs. The former logs twice as often as the latter.
2024-03-06T18:29:35.381089Z INFO burn_dataset::transform::random: type of self.dataset: "alloc::sync::Arc<dyn burn_dataset::dataset::base::Dataset<regression::dataset::DiabetesItem>>"
2024-03-06T18:29:35.381105Z INFO burn_dataset::transform::random: origial index: 0 new index: 120
2024-03-06T18:29:35.381109Z INFO regression::dataset: get 120
2024-03-06T18:29:35.381111Z INFO burn_dataset::transform::random: type of self.dataset: "burn_dataset::dataset::sqlite::SqliteDataset<regression::dataset::DiabetesItem>"
2024-03-06T18:29:35.381114Z INFO burn_dataset::transform::random: origial index: 120 new index: 411
2024-03-06T18:29:35.381138Z INFO burn_core::data::dataloader::batch: current index: 1
2024-03-06T18:29:35.381142Z INFO burn_dataset::transform::random: type of self.dataset: "alloc::sync::Arc<dyn burn_dataset::dataset::base::Dataset<regression::dataset::DiabetesItem>>"
2024-03-06T18:29:35.381145Z INFO burn_dataset::transform::random: origial index: 1 new index: 12
2024-03-06T18:29:35.381147Z INFO regression::dataset: get 12
2024-03-06T18:29:35.381149Z INFO burn_dataset::transform::random: type of self.dataset: "burn_dataset::dataset::sqlite::SqliteDataset<regression::dataset::DiabetesItem>"
2024-03-06T18:29:35.381152Z INFO burn_dataset::transform::random: origial index: 12 new index: 262
2024-03-06T18:29:35.381167Z INFO burn_core::data::dataloader::batch: current index: 2
2024-03-06T18:29:35.381170Z INFO burn_dataset::transform::random: type of self.dataset: "alloc::sync::Arc<dyn burn_dataset::dataset::base::Dataset<regression::dataset::DiabetesItem>>"
2024-03-06T18:29:35.381173Z INFO burn_dataset::transform::random: origial index: 2 new index: 81
2024-03-06T18:29:35.381176Z INFO regression::dataset: get 81
2024-03-06T18:29:35.381178Z INFO burn_dataset::transform::random: type of self.dataset: "burn_dataset::dataset::sqlite::SqliteDataset<regression::dataset::DiabetesItem>"
2024-03-06T18:29:35.381180Z INFO burn_dataset::transform::random: origial index: 81 new index: 72
2024-03-06T18:29:35.381196Z INFO burn_core::data::dataloader::batch: current index: 3
2024-03-06T18:29:35.381205Z INFO burn_dataset::transform::random: type of self.dataset: "alloc::sync::Arc<dyn burn_dataset::dataset::base::Dataset<regression::dataset::DiabetesItem>>"
2024-03-06T18:29:35.381208Z INFO burn_dataset::transform::random: origial index: 3 new index: 50
2024-03-06T18:29:35.381211Z INFO regression::dataset: get 50
2024-03-06T18:29:35.381213Z INFO burn_dataset::transform::random: type of self.dataset: "burn_dataset::dataset::sqlite::SqliteDataset<regression::dataset::DiabetesItem>"
2024-03-06T18:29:35.381216Z INFO burn_dataset::transform::random: origial index: 50 new index: 169
To Reproduce
Expected behavior
It should only be called once.
Additional context
My guess is that get() is bound to the ShuffledDataset itself, so the self.dataset.get(*index) call invokes the same method again.
antimora changed the title from "get() for ShuffledDataset is called twice for each self.dataset.get(self.current_index) in BatchDataloaderIterator" to "ShuffledDataset's get() called twice per iteration in BatchDataloaderIterator" on Mar 29, 2024.
The symptom you're observing is not get() calling itself. The first ShuffledDataset::get(index) call comes from the dataloader, which calls the regression dataset's get(index), which in turn calls self.dataset.get(index) on its own inner dataset (which is also a ShuffledDataset). Your logs show this: the two calls report different types for self.dataset.
The number of items actually retrieved by the dataset should still correspond to its length; it's just the debug info you added that seems to be leading you down the wrong path 🙂
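To make the call chain concrete, here is a minimal sketch of the nesting described above. It does not use Burn: the Dataset trait, Storage, Shuffled, UserDataset and run_epoch are hypothetical stand-ins, and the fixed index permutations replace real random shuffling. It only illustrates why a counter inside the wrapper's get() fires twice per item while each item is still fetched once.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Simplified stand-in for a dataset trait (hypothetical, not Burn's definition).
trait Dataset {
    fn get(&self, index: usize) -> Option<u32>;
    fn len(&self) -> usize;
}

/// Innermost storage, playing the role of the SqliteDataset.
struct Storage {
    items: Vec<u32>,
}

impl Dataset for Storage {
    fn get(&self, index: usize) -> Option<u32> {
        self.items.get(index).copied()
    }
    fn len(&self) -> usize {
        self.items.len()
    }
}

/// Index-remapping wrapper, playing the role of ShuffledDataset.
/// `calls` is shared by every instance, like one info!() in random.rs.
struct Shuffled<D: Dataset> {
    inner: D,
    indices: Vec<usize>, // fixed permutation (assumption: no real RNG here)
    calls: Rc<Cell<usize>>,
}

impl<D: Dataset> Dataset for Shuffled<D> {
    fn get(&self, index: usize) -> Option<u32> {
        self.calls.set(self.calls.get() + 1);
        let mapped = *self.indices.get(index)?;
        self.inner.get(mapped) // delegates to the wrapped dataset, not to itself
    }
    fn len(&self) -> usize {
        self.indices.len()
    }
}

/// User-defined dataset that already shuffles internally,
/// playing the role of the regression example's dataset.
struct UserDataset {
    dataset: Shuffled<Storage>,
    calls: Rc<Cell<usize>>, // like the info!() in dataset.rs
}

impl Dataset for UserDataset {
    fn get(&self, index: usize) -> Option<u32> {
        self.calls.set(self.calls.get() + 1);
        self.dataset.get(index)
    }
    fn len(&self) -> usize {
        self.dataset.len()
    }
}

/// Iterates once over the doubly wrapped dataset and returns
/// (wrapper get() calls, user dataset get() calls, items fetched).
fn run_epoch() -> (usize, usize, usize) {
    let shuffled_calls = Rc::new(Cell::new(0));
    let user_calls = Rc::new(Cell::new(0));

    let storage = Storage { items: vec![10, 20, 30, 40] };
    let inner = Shuffled {
        inner: storage,
        indices: vec![2, 0, 3, 1],
        calls: Rc::clone(&shuffled_calls),
    };
    let user = UserDataset { dataset: inner, calls: Rc::clone(&user_calls) };
    // Second wrapper, as a dataloader with shuffling enabled would add.
    let outer = Shuffled {
        inner: user,
        indices: vec![1, 3, 0, 2],
        calls: Rc::clone(&shuffled_calls),
    };

    let mut fetched = 0;
    for i in 0..outer.len() {
        if outer.get(i).is_some() {
            fetched += 1;
        }
    }
    (shuffled_calls.get(), user_calls.get(), fetched)
}
```

In this sketch run_epoch() returns (8, 4, 4): eight wrapper calls, four user dataset calls and four fetched items. That is the same 2:1 ratio as in the logs, and it comes from two distinct wrapper instances rather than from recursion.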
If you really think there is a bug somewhere, please let us know. Otherwise I will close this issue.
Describe the bug
get() for ShuffledDataset is called twice for each self.dataset.get(self.current_index) in BatchDataloaderIterator
This method (burn/crates/burn-dataset/src/transform/random.rs, lines 44 to 50 in b429cc3) is called twice when this method is called (burn/crates/burn-core/src/data/dataloader/batch.rs, line 161 in b429cc3).