Data augmentation causes difference in input batches #6

Closed
milliema opened this issue Dec 29, 2021 · 2 comments

Comments

@milliema

Thanks for the great work!
I'm currently trying to adopt MixMo for my own projects, and I found a modification that differs from Google's work MIMO.
For input batch generation, suppose we use input repetition = 1.0; technically, the indexes and images for the 2 inputs of the 2 experts should be exactly the same.
In Google's implementation, they read the images first, and then construct the 2 inputs for the experts based on the input repetition value (shuffling part of the indexes and keeping the rest unchanged).
In your implementation, you compute the indexes first (based on the input repetition), and then read the images accordingly.
The problem is, if we use the default data augmentation (e.g. random cropping or flipping, which is also used in your code), identical indexes do not imply identical images, because the augmentation is applied to these images randomly! In Google's implementation, since the images are read in first, this issue does not exist.
I hope I've made my point clear.
I'm wondering how this would affect the performance, and which implementation is correct or more reasonable? I'd like to hear your opinions. Appreciate your prompt reply!
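
To illustrate what I mean, here is a minimal, hypothetical sketch (not code from either repository; the function names and the toy flip-only augmentation are just placeholders) of the two batch-construction orders with input repetition = 1.0 and 2 subnetworks:

```python
import torch

def augment(img: torch.Tensor) -> torch.Tensor:
    """Toy stand-in for random cropping/flipping: horizontal flip with probability 0.5."""
    return torch.flip(img, dims=[-1]) if torch.rand(()) < 0.5 else img

def batch_index_first(images, indices):
    """Index-first order: fix the indexes, then load and augment each duplicate.
    The random augmentation is re-drawn per duplicate, so the two views can differ."""
    x0 = torch.stack([augment(images[i]) for i in indices])
    x1 = torch.stack([augment(images[i]) for i in indices])
    return x0, x1

def batch_image_first(images, indices):
    """Image-first order: load and augment each image once, then duplicate the batch.
    Both experts receive pixel-identical inputs."""
    x = torch.stack([augment(images[i]) for i in indices])
    return x, x.clone()

images = [torch.randn(3, 32, 32) for _ in range(8)]
indices = list(range(8))
a0, a1 = batch_index_first(images, indices)   # typically a0 != a1 pixel-wise
b0, b1 = batch_image_first(images, indices)   # always b0 == b1
```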

@alexrame
Owner

Thank you for your interest.
First, note that input repetition is never used in MixMo: it generally decreases diversity across subnetworks.
Yet, this implementation difference may also have some impact on batch repetition.
Intuitively, I speculate that applying different pixel augmentations to the different duplicates of the same image may increase diversity across subnetworks. This could partially explain why our MIMO baseline is stronger than the Google implementation.
Yet, this gain may only remain as long as the augmentation process is not too strong/destructive; if it is, we would lose the benefit from batch repetition/input repetition. Please let me know if you give it a try.
Finally, this discussion is quite related to the variance reduction discussion in this inspiring paper: https://arxiv.org/pdf/1901.09335.pdf

@milliema
Author

milliema commented Jan 4, 2022

Thanks for your reply; it's really helpful.
Indeed, differing data augmentation (DA) across duplicates has a similar effect to Batch Augmentation (thanks for sharing), and is beneficial for performance.
You reminded me that input repetition is not used in MixMo, but it is used in MIMO. Since the main idea of MixMo is to enhance model diversity (through the mixing block), it is reasonable to disable input repetition.
According to my understanding, both a lower input repetition and differing data augmentation (DA) can encourage model diversity. I have actually run some experiments on my own dataset. If differing DA is used, the best setting of input repetition in MIMO is 0.9, which is much higher than suggested in the paper, and the result achieved is better than a single model. I haven't tried MixMo yet. I'll update you with any interesting results. Thanks!
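
For clarity, here is a rough, hypothetical sketch (not the actual MIMO or MixMo code) of what a partial input repetition such as 0.9 means at the index level, i.e. keeping most positions identical across the two inputs and shuffling the rest:

```python
import torch

def repeated_indices(batch_size: int, input_repetition: float):
    """Index lists for two subnetworks that share roughly a fraction
    `input_repetition` of their samples; the rest are shuffled for the second."""
    first = torch.randperm(batch_size)
    second = first.clone()
    n_shuffle = int(round((1.0 - input_repetition) * batch_size))
    if n_shuffle > 1:
        positions = torch.randperm(batch_size)[:n_shuffle]
        second[positions] = second[positions][torch.randperm(n_shuffle)]
    return first.tolist(), second.tolist()

# input_repetition = 1.0 gives identical lists; 0.9 permutes ~10% of positions.
i0, i1 = repeated_indices(128, 0.9)
```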
