Data augmentation causes difference in input batches #6

Closed
milliema opened this issue Dec 29, 2021 · 2 comments

Comments

@milliema

Thanks for the great work!
I'm currently trying to adopt MixMo for my own projects, and I found a modification that differs from Google's work MIMO.
For input batch generation, suppose we use input repetition = 1.0; technically, the indexes and images for the 2 inputs of the 2 experts should be exactly the same.
In Google's implementation, they read the images first, and then construct the 2 inputs for the experts based on the input repetition value (shuffling part of the indexes and keeping the rest unchanged).
In your implementation, you compute the indexes first (based on the input repetition), and then read the images accordingly.
The problem is, if we use the default data augmentation (e.g. random cropping or flipping, which is also used in your code), identical indexes do not imply identical images, because the augmentation is applied to these images randomly! In Google's implementation, since the images are read in first, this issue does not exist.
I hope I've made my point clear.
I'm wondering how this would affect the performance, and which implementation is correct or more reasonable? I'd like to hear your opinions. Appreciate your prompt reply!
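
To illustrate what I mean, here is a minimal, hypothetical sketch (not code from either repository; the function names and the toy flip-only augmentation are just placeholders) of the two batch-construction orders with input repetition = 1.0 and 2 subnetworks:

```python
import torch

def augment(img: torch.Tensor) -> torch.Tensor:
    """Toy stand-in for random cropping/flipping: horizontal flip with probability 0.5."""
    return torch.flip(img, dims=[-1]) if torch.rand(()) < 0.5 else img

def batch_index_first(images, indices):
    """Index-first order: fix the indexes, then load and augment each duplicate.
    The random augmentation is re-drawn per duplicate, so the two views can differ."""
    x0 = torch.stack([augment(images[i]) for i in indices])
    x1 = torch.stack([augment(images[i]) for i in indices])
    return x0, x1

def batch_image_first(images, indices):
    """Image-first order: load and augment each image once, then duplicate the batch.
    Both experts receive pixel-identical inputs."""
    x = torch.stack([augment(images[i]) for i in indices])
    return x, x.clone()

images = [torch.randn(3, 32, 32) for _ in range(8)]
indices = list(range(8))
a0, a1 = batch_index_first(images, indices)   # typically a0 != a1 pixel-wise
b0, b1 = batch_image_first(images, indices)   # always b0 == b1
```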

@alexrame
Owner

Thank you for your interest.
First, note that input repetition is never used in MixMo: it generally decreases diversity across subnetworks.
Yet, this implementation difference may also have some impact on batch repetition.
Intuitively, I speculate that applying different pixel augmentations to the different duplicates of the same image may increase diversity across subnetworks. This could partially explain why our MIMO baseline is stronger than the Google implementation.
Yet, this gain may only remain as long as the augmentation process is not too strong/destructive; if it is, we would lose the benefit from batch repetition/input repetition. Please let me know if you give it a try.
Finally, this discussion is quite related to the variance reduction discussion in this inspiring paper: https://arxiv.org/pdf/1901.09335.pdf

@milliema
Author

milliema commented Jan 4, 2022

Thanks for your reply; it's really helpful.
Indeed, differing data augmentation (DA) across duplicates has a similar effect to Batch Augmentation (thanks for sharing), and is beneficial for performance.
You reminded me that input repetition is not used in MixMo, but it is used in MIMO. Since the main idea of MixMo is to enhance model diversity (through the mixing block), it is reasonable to disable input repetition.
According to my understanding, both a lower input repetition and differing data augmentation (DA) can encourage model diversity. I have actually run some experiments on my own dataset. If differing DA is used, the best setting of input repetition in MIMO is 0.9, which is much higher than suggested in the paper, and the result achieved is better than a single model. I haven't tried MixMo yet. I'll update you with any interesting results. Thanks!
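
For clarity, here is a rough, hypothetical sketch (not the actual MIMO or MixMo code) of what a partial input repetition such as 0.9 means at the index level, i.e. keeping most positions identical across the two inputs and shuffling the rest:

```python
import torch

def repeated_indices(batch_size: int, input_repetition: float):
    """Index lists for two subnetworks that share roughly a fraction
    `input_repetition` of their samples; the rest are shuffled for the second."""
    first = torch.randperm(batch_size)
    second = first.clone()
    n_shuffle = int(round((1.0 - input_repetition) * batch_size))
    if n_shuffle > 1:
        positions = torch.randperm(batch_size)[:n_shuffle]
        second[positions] = second[positions][torch.randperm(n_shuffle)]
    return first.tolist(), second.tolist()

# input_repetition = 1.0 gives identical lists; 0.9 permutes ~10% of positions.
i0, i1 = repeated_indices(128, 0.9)
```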
