
Is there any data augmentation strategy used during training? #19

Closed
qiusuor opened this issue Dec 22, 2020 · 8 comments
qiusuor commented Dec 22, 2020

I tried to train the ScaleHyperprior model using the code provided in example/train.py with the ImageNet-2012/DIV2K datasets. The learning rates for the main optimizer and the aux_optimizer were both set to 1e-4. The learning rate of the main optimizer is then divided by 2 when the evaluation loss reaches a plateau, as described at https://interdigitalinc.github.io/CompressAI/zoo.html. However, after nearly a week of training, the rate-distortion performance on Kodak-24 still shows a gap compared to the published results.

I am also training a model on vimeo_test_clean from Vimeo90K; after 2 days it does not look like it will converge to the published results.
Have I missed something? Is there any data augmentation strategy used during training?
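The plateau schedule described above (main learning rate halved when the evaluation loss stops improving) can be sketched with PyTorch's ReduceLROnPlateau. The tiny Linear module below is a hypothetical stand-in for ScaleHyperprior, and patience=10 is an assumed value, not a confirmed setting:

```python
import torch

# Hypothetical stand-in for a ScaleHyperprior model.
model = torch.nn.Linear(8, 8)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
aux_optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Halve the main lr when the evaluation loss plateaus.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=10
)

# In the training loop, step the scheduler with the validation loss:
val_loss = 1.0  # placeholder value
scheduler.step(val_loss)
```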

@jbegaint
Contributor

Hi, no, we don't do any augmentation besides randomly cropping the data to 256x256 patches. We used the Vimeo dataset from here.

How large is the performance gap?

Another learning-rate strategy is to set the lr to 1e-5 after ~100-150 epochs (depending on your dataset/batch size); we have observed similar performance.
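The 256x256 random cropping mentioned above can be sketched as a small helper. random_crop here is a hypothetical illustration of the idea, not the transform actually used in the training script:

```python
import torch

def random_crop(img: torch.Tensor, size: int = 256) -> torch.Tensor:
    # Pick a random top-left corner and cut a size x size patch
    # out of a CxHxW image tensor.
    _, h, w = img.shape
    top = int(torch.randint(0, h - size + 1, (1,)))
    left = int(torch.randint(0, w - size + 1, (1,)))
    return img[:, top:top + size, left:left + size]

patch = random_crop(torch.rand(3, 512, 768))
```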

@qiusuor
Author

qiusuor commented Dec 23, 2020

Thanks for the quick reply.
The gap is large when training with lambda = 0.0130.
image

I will try training on the dataset provided above. Is all of the data used for training/validation, or just a subset?

@qiusuor
Author

qiusuor commented Dec 23, 2020

I checked example/train.py and found that gradients are clipped by the hyperparameter --clip_max_norm (default: 0.1). Does this influence the training result? Was gradient clipping used in your training process, and if so, what value of --clip_max_norm did you use?

After declaring the network, I explicitly call the update function like below; is that ok?

net = ScaleHyperprior(M, N)
net.update(force=True)

@jbegaint
Contributor

Yes, ok, this gap should not happen.
We use all the training/test images provided by the dataset (with random crops for training and a center crop for testing).

We use gradient-norm clipping, but the value is usually set to 1.0, not 0.1. I'll fix the example, thanks for reporting!

You only need to call .update() after training your network; it's only required for the entropy coding (compress/decompress).
@jbegaint jbegaint self-assigned this Dec 23, 2020
@qiusuor
Author

qiusuor commented Dec 27, 2020

Thanks. I finally got similar results.

@qiusuor qiusuor closed this as completed Dec 27, 2020
@jbegaint
Contributor

Great, thanks for the update!

@achel-x
Copy link

achel-x commented Dec 26, 2022

> Yes ok, this gap should not happen. We do use all the training/test images provided by the dataset (with random crop for training, and center crop for testing).
>
> We use gradient norm clipping, but usually the value is set 1. not 0.1, I'll fix the example. Thanks for reporting!
>
> You only need to call .update() after the training of your network, it's only required for the entropy coding (compress/decompress).

Hi,
I ran into the same problem. The training dataset is DIV2K. I down-sampled all 800 images to half their size, so all 1600 images are included. I trained just one point; the performance is shown in the figure below.
image

There is a gap compared with what the homepage shows.
The training settings follow the defaults:
epochs: 100
lr: 1e-4
lambda: 1e-2
Here is part of the training log.
image

1. I wonder where exactly the update call should go.
2. I noticed that the training set mentioned on the homepage is from Vimeo90K. I would also like to know exactly how many images were used in training.

@achel-x

achel-x commented Dec 26, 2022

@jbegaint Sorry, I found the update command. I tried to update the trained model with python -m compressai.utils.update_model ./checkpoint.pth.tar -a bmshj2018-factorized and then evaluated its performance. The results are the same.
image
The name of the model file in the command does not cause this problem.
