-
Hi! I'm working on a class-incremental learning problem with a replay buffer. The scenario is SplitCIFAR10 with 5 experiences, and the model has an IncrementalClassifier as its final layer. I use ClassBalancedBuffer with adaptive size in order to keep the buffer balanced among the classes in it. I'm trying to build a data loader that yields class-balanced batches, such that each batch contains about the same number of exemplars from each class (whether they come from the buffer or from the current experience). I thought GroupBalancedDataLoader might fit here, but I can't make it work correctly, and the documentation is not very clear. What should I pass to "datasets" in order to make it class-balanced? I see it should be a sequence of datasets, but which datasets?
EDIT: I think I've managed to make GroupBalancedDataLoader class-balanced: I passed as "datasets" a list with a separate ClassificationDataset for each class.
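For reference, the EDIT solution looks roughly like this. This is only a minimal sketch: "combined" is a placeholder for the current experience's data concatenated with the buffer, and classification_subset plus the loader keyword arguments are based on recent Avalanche versions, so check them against your install.

```python
import torch
from avalanche.benchmarks.utils import classification_subset
from avalanche.benchmarks.utils.data_loader import GroupBalancedDataLoader

def split_by_class(dataset):
    # Build one ClassificationDataset per class present in `dataset`.
    targets = torch.as_tensor(list(dataset.targets))
    return [
        classification_subset(
            dataset, indices=torch.nonzero(targets == c).flatten().tolist()
        )
        for c in torch.unique(targets)
    ]

per_class_sets = split_by_class(combined)  # `combined`: placeholder dataset
loader = GroupBalancedDataLoader(
    per_class_sets,
    batch_size=len(per_class_sets) * 16,  # total batch size, split evenly across groups
    shuffle=True,
)
```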
Since I passed shuffle=True, I expected the classes to also be shuffled within each batch. How can I make that happen? Thank you
-
Hey Shahariel,
Essentially, in cross-validation there's a stratified version where we use a smaller subset of a dataset to train on and then test on the rest (this eliminates, to an extent, a bias in choosing validation data). While it's not perfectly balanced, each fold mimics the class distribution of the original dataset. How to do this in Avalanche isn't immediately clear, but as you mentioned, split_mnist.train_stream[0].dataset[image_index] contains a tuple with the image tensor, the target, and the task label it belongs to. My method is essentially the following:
Code-wise, my solution looks like this; experiment with it until it fits your needs:
Save the images locally [note: here I'm using a 2-experience SplitMNIST], creating 20 balanced training subsets of the first train stream (for fine-tuning purposes); you can save the second train set and the two test sets by adjusting the "enumerate" loop.
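A rough reconstruction of that step; this is a sketch only, assuming a 2-experience SplitMNIST and torchvision's save_image, with illustrative directory names:

```python
import os
import random

from avalanche.benchmarks.classic import SplitMNIST
from torchvision.utils import save_image

benchmark = SplitMNIST(n_experiences=2)
train_data = benchmark.train_stream[0].dataset

# Group sample indices by class; each item is (image, target, task label).
by_class = {}
for idx, (_, target, _) in enumerate(train_data):
    by_class.setdefault(int(target), []).append(idx)

n_subsets = 20
per_class = min(len(v) for v in by_class.values()) // n_subsets

# Shuffle within each class so the 20 subsets are random but disjoint.
for indices in by_class.values():
    random.shuffle(indices)

for subset_id in range(n_subsets):
    for cls, indices in by_class.items():
        out_dir = f"mnist_subsets/subset_{subset_id:02d}/{cls}"
        os.makedirs(out_dir, exist_ok=True)
        for i in indices[subset_id * per_class:(subset_id + 1) * per_class]:
            img, _, _ = train_data[i]
            save_image(img, os.path.join(out_dir, f"{i}.png"))
```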
Then, to create the custom dataset following PyTorch's API, I made sure that the two test streams stay separate; but in your case, if you want to test on everything in one go, you should save them in the same folder and use a single common ImageFolder object.
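Something along these lines; the paths follow the sketch above and are assumptions, and the Grayscale transform is there because ImageFolder loads the saved PNGs back as RGB:

```python
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

tfm = transforms.Compose([transforms.Grayscale(), transforms.ToTensor()])

train_set = datasets.ImageFolder("mnist_subsets/subset_00", transform=tfm)
test_set_1 = datasets.ImageFolder("mnist_test/experience_0", transform=tfm)  # kept separate
test_set_2 = datasets.ImageFolder("mnist_test/experience_1", transform=tfm)

train_loader = DataLoader(train_set, batch_size=64, shuffle=True)
```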
From there you can essentially use a classic train/validate loop, like:
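For example, with plain PyTorch (model, optimizer, and criterion are whatever you already use):

```python
import torch

def train_one_epoch(model, loader, optimizer, criterion, device="cpu"):
    model.train()
    for images, targets in loader:
        images, targets = images.to(device), targets.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), targets)
        loss.backward()
        optimizer.step()

@torch.no_grad()
def evaluate(model, loader, device="cpu"):
    model.eval()
    correct = total = 0
    for images, targets in loader:
        images, targets = images.to(device), targets.to(device)
        preds = model(images).argmax(dim=1)
        correct += (preds == targets).sum().item()
        total += targets.numel()
    return correct / total
```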
This is probably not the best way to do things, but until we hear back from the maintainers, this is what I cooked up.
Thank you for your answer!
I actually think my solution (under "EDIT" in my question) is OK. It's not a problem that the batches are organized by class.