Questions about reproducing results on COCO #36

Closed
GhostWnd opened this issue Jan 31, 2021 · 24 comments

Comments

@GhostWnd

GhostWnd commented Jan 31, 2021

Hello, I tried to reproduce the results on COCO. I implemented my own framework; most of my files are the same as yours, and I only wrote a new train.py.
As described in your paper, I have implemented EMA with decay 0.999, the 1-cycle policy with a max learning rate of 2e-4, the Adam optimizer with weight_decay 1e-4, img_size = 448*448, and batch size 16.

But when I train my model, the loss decreases from 120 to around 90 and then just stops decreasing, and the performance on the validation data is very bad: the mAP is around 10. At first I guessed it was because I hadn't spent much time training (I only trained for an hour), but when I train it longer, the loss still doesn't decrease. Could you please tell me what I have done wrong?

My code is available at https://github.com/GhostWnd/reproducingASL. Thank you for your help.

@mrT23
Contributor

mrT23 commented Jan 31, 2021

Three observations to start with:

For testing convergence, use a smaller resolution (224) and a larger batch size (128).

I don't see any augmentations in your training code.
Use RandAugment or AutoAugment at least, plus cutout.
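For illustration, a minimal torchvision pipeline along those lines could look like this (a sketch only; the op count and magnitude are assumptions, not the authors' settings, and args.image_size is a placeholder):

import torchvision.transforms as T

# assumed pipeline: RandAugment plus a cutout-style RandomErasing;
# the magnitudes here are illustrative, not tuned values
train_transform = T.Compose([
    T.Resize((args.image_size, args.image_size)),
    T.RandAugment(num_ops=2, magnitude=9),   # or T.AutoAugment()
    T.ToTensor(),
    T.RandomErasing(p=0.5),                  # cutout-style occlusion
])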

Also, something is weird with your scheduler:
scheduler = lr_scheduler.OneCycleLR(optimizer, max_lr = 0.0002, total_steps = total_step, epochs = 25)
It's hardcoded to 25 epochs, yet you loop over only 5 epochs.
Add epochs as a hyperparameter to the arg list, and use it everywhere instead of hard-coded numbers. Search for other hyper-parameters that should belong in the arg list as well.
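A sketch of what that could look like (argument names and defaults are assumptions; optimizer and train_loader are assumed to already exist):

import argparse
from torch.optim import lr_scheduler

parser = argparse.ArgumentParser()
parser.add_argument('--epochs', type=int, default=25)
parser.add_argument('--lr', type=float, default=2e-4)
args = parser.parse_args()

# one schedule length, derived from the same args the training loop uses
scheduler = lr_scheduler.OneCycleLR(optimizer, max_lr=args.lr,
                                    steps_per_epoch=len(train_loader),
                                    epochs=args.epochs)

for epoch in range(args.epochs):   # the same epoch count everywhere
    ...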

@mrT23
Contributor

mrT23 commented Jan 31, 2021

p.s. 1
Also, for testing and prototyping, use tresnet_m.

p.s. 2
You also need to implement "true weight-decay" (not applying weight decay to bias and batch-norm parameters).

p.s. 3
I will probably notice other problems in the future, but we need to start somewhere :-)

@GhostWnd
Author

GhostWnd commented Jan 31, 2021

Thanks for your comment. I have tried to follow your instructions; the new train.py is available at https://github.com/GhostWnd/reproducingASL, and the newest one is train_ver3.py. I will try to run it and report back later.

I have run train_ver3.py for around 600 iterations, and it appears that training is much slower than at the beginning: at first it took 3 seconds per iteration, but after 600 iterations it takes 9 seconds per iteration. This puzzles me a lot; I doubt whether I have implemented the code correctly.

@mrT23
Contributor

mrT23 commented Feb 1, 2021

I will take a look at the code and try to run it when I have the time.

Good work so far. I think with joint forces we are on our way to finally having a modern multi-label codebase for the community to use; the vast majority of repos out there are way outdated.

Several more corrections and suggestions:

args.do_bottleneck_head = False (not True)

One more correction: you are using the 2017 split. While this is not a "mistake" (and your results will be a little higher), in articles people use the 2014 split.

What about mixed precision? With modern PyTorch it is a few lines of code ("with autocast():"...).
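A minimal sketch of that pattern with torch.cuda.amp (the surrounding train step is assumed):

from torch.cuda.amp import autocast, GradScaler

scaler = GradScaler()
for inputs, target in train_loader:
    optimizer.zero_grad()
    with autocast():                       # forward pass in mixed precision
        output = model(inputs.cuda())
        loss = criterion(output, target.cuda())
    scaler.scale(loss).backward()          # scale the loss to avoid fp16 underflow
    scaler.step(optimizer)
    scaler.update()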

To improve speed, you don't have to update EMA every iteration. You can update it every ~5 iterations with a slightly higher decay rate and still get similar results.
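For example, a sparse EMA update might look like this (a hypothetical sketch; ema_model is assumed to be a separate copy of the model, and the adjusted factor 0.999 ** N only roughly matches a per-step decay of 0.999):

import torch

ema_every = 5
decay = 0.999 ** ema_every                 # ~0.995 per update
if step % ema_every == 0:
    with torch.no_grad():
        for ema_p, p in zip(ema_model.parameters(), model.parameters()):
            ema_p.mul_(decay).add_(p, alpha=1 - decay)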

Load a pretrained model, run only inference, and make sure you reproduce the article results (after switching to the 2014 split).

Make sure, especially in validation, that you are not building enormous vectors over the course of training that clog RAM. Sometimes it's better to pre-allocate memory if you need to store large vectors.

You have not implemented true WD correctly; this is not AdamW.
See an example of true WD in:
https://github.com/rwightman/pytorch-image-models/blob/198f6ea0f3dae13f041f3ea5880dd79089b60d61/timm/optim/optim_factory.py
(def add_weight_decay...)
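The pattern in that function is roughly the following (a condensed sketch, not the exact timm code; model and args are assumed):

import torch

def add_weight_decay(model, weight_decay=1e-4, skip_list=()):
    # split the parameters: no decay for biases and 1-D params (batch norm)
    decay, no_decay = [], []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        if len(param.shape) == 1 or name.endswith('.bias') or name in skip_list:
            no_decay.append(param)
        else:
            decay.append(param)
    return [{'params': no_decay, 'weight_decay': 0.0},
            {'params': decay, 'weight_decay': weight_decay}]

# weight decay then comes only from the param groups, not an optimizer default
optimizer = torch.optim.Adam(add_weight_decay(model, 1e-4), lr=args.lr)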

@GhostWnd
Author

GhostWnd commented Feb 1, 2021

Thank you for your comment, I will try to correct it.
And if it's possible, could you please release the loss record from running my code? Raw data would be best.
Thank you very much.

I have tried to correct the true WD; it's now train_ver4.py, available at https://github.com/GhostWnd/reproducingASL

@mrT23
Contributor

mrT23 commented Feb 1, 2021

Hi GhostWnd

I took a deeper look at the code. There are several major problems there.
Make sure you understand what the problem is in each and every one of them, and apply the proper corrections. Don't skip a single one.
Most of these problems are "deal-breakers".
After correcting all of them, repeat your runs, and we can compare results.
I hope I will have some results to compare by then (if I don't find more bugs).

Don't get discouraged, we are making progress, and sometimes the journey is more educational than the destination.

Problems:

  • currently not using RandAugment (commented out in train_loader)

  • using an uninitialized model (for training and comparison to the article, you should initialize the model from the relevant ImageNet model: https://github.com/Alibaba-MIIL/TResNet/blob/master/MODEL_ZOO.md)

  • using the 2017 COCO split is wrong (use the 2014 COCO split instead; only the json files differ)

  • Cutout(n_holes = 1, length = 16) -> Cutout(n_holes = 1, length = args.image_size/2)

  • validation should be done once per epoch, no more and no less

  • preds.append(output.cpu())
    targets.append(target.cpu())
    ->
    preds.append(output.cpu().detach())
    targets.append(target.cpu().detach())

  • mAP_score = validate_multi(val_loader, model, args, ema)
    ->
    model.eval()
    mAP_score = validate_multi(val_loader, model, args, ema)
    model.train()

  • calculate only the mAP metric; remove the other metrics, they are only confusing during training
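Combining the last three items, the validation step might look roughly like this (a sketch only; the mAP helper name follows the repo's helper_functions, everything else here is assumed):

import torch

model.eval()                          # eval mode, once per epoch
preds, targs = [], []
with torch.no_grad():                 # no graph kept around to clog memory
    for inputs, target in val_loader:
        output = torch.sigmoid(model(inputs.cuda()))
        preds.append(output.cpu().detach())
        targs.append(target.cpu().detach())
mAP_score = mAP(torch.cat(targs).numpy(), torch.cat(preds).numpy())
model.train()                         # back to train mode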

@mrT23 mrT23 closed this as completed Feb 1, 2021
@mrT23 mrT23 reopened this Feb 1, 2021
@mrT23
Contributor

mrT23 commented Feb 2, 2021

Just to give you some motivation: I got a good score last night when running a corrected version of the code...

@GhostWnd
Author

GhostWnd commented Feb 2, 2021

Thank you for your comment and effort, I will try to correct the code and run it.
Thank you very much.

I have tried to fix the problems you mentioned; my code is train_ver5.py, available at https://github.com/GhostWnd/reproducingASL

Other than train_ver5.py, I also edited helper_functions.py to allow me to use the 2014 json to train on the 2017 data.
Here is the change:

path = coco.loadImgs(img_id)[0]['file_name']
img = Image.open(os.path.join(self.root, path)).convert('RGB')
->
path = coco.loadImgs(img_id)[0]['file_name']
path = path.split('_')[-1]  # strip the 'COCO_val2014_'-style prefix so names match the 2017 files
img = Image.open(os.path.join(self.root, path)).convert('RGB')

When I try to use the 2014 json to train on the 2017 data, it seems that during validation there are some images in the 2014 validation set that are not in the 2017 validation set. I would like to know: does the difference between 2014 and 2017 affect the results much?
Thank you very much.

@GhostWnd
Author

GhostWnd commented Feb 3, 2021

Sorry to bother you again. I know that due to commercial issues you can't release your training code, but could you release the code you corrected based on my train.py?

If that is not possible, could you please release the loss record of your corrected code based on my training code, so that I can compare the results myself?

Thank you very much.

@mrT23
Contributor

mrT23 commented Feb 3, 2021

Hi GhostWnd

There were other problems in the code.
The two major ones:

  • sigmoid was done twice (!) - once in the direct prediction, and a second time inside the loss (see the sketch below)
  • EMA was not performed correctly (it's a separate model with a separate validation)
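Schematically, the double-sigmoid bug and its fix (a sketch; criterion stands for the repo's AsymmetricLoss, which applies sigmoid internally, and the surrounding train step is assumed):

# buggy: sigmoid applied twice -- once here, once inside the loss
output = torch.sigmoid(model(inputs))
loss = criterion(output, target)      # ASL already applies torch.sigmoid internally

# fixed: feed raw logits to the loss; apply sigmoid only for metrics
logits = model(inputs)
loss = criterion(logits, target)
probs = torch.sigmoid(logits)         # used for mAP only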

Anyway, this code fully reproduces the article results (I think it even surpasses them):
train_asl_reproduce.zip

I will attach logs for the 224 and 448 trainings later.

You are welcome to test it yourself and give me feedback.

Thanks for the collaboration; together we will release the first publicly available modern multi-label codebase
:-)

@GhostWnd
Author

GhostWnd commented Feb 3, 2021

Thank you so much! :-)
I will upload the train file to make it publicly available, test it myself, and give you feedback as soon as possible.

@mrT23
Contributor

mrT23 commented Feb 3, 2021

This is an example log file (note: resolution 224, mtresnet):
mtresnet_224.txt

@mrT23
Contributor

mrT23 commented Feb 3, 2021

Thank you so much! :-)
I will upload the train file to make it publicly available, test it myself, and give you feedback as soon as possible.

Do you have any objection to me adding the code to
https://github.com/Alibaba-MIIL/ASL ?
I think it will help it gain more traction. There are very few (zero) modern multi-label code-bases like this with top results.

I will of course share credit with you. I made a lot of changes and enhancements to the code, but you provided the base implementation.

@GhostWnd
Author

GhostWnd commented Feb 3, 2021

No objection, it's my pleasure, thank you very much.

@GhostWnd
Author

GhostWnd commented Feb 3, 2021

And I wonder whether you could add the model based on tresnet_m with input size 224 to the pretrained models in https://github.com/Alibaba-MIIL/ASL/blob/main/MODEL_ZOO.md?

I would like to adjust some hyperparameters to test their influence,
and apply it to other datasets as a pretrained model.
Thank you very much.

@mrT23
Contributor

mrT23 commented Feb 3, 2021

I am not sure I fully understand your question.

The models in
https://github.com/Alibaba-MIIL/ASL/blob/main/MODEL_ZOO.md
are standard ImageNet models for downstream tasks. Those are the models you should use to initialize training on COCO.

@GhostWnd
Author

GhostWnd commented Feb 3, 2021

Well, if I am not mistaken,
the models in ASL/blob/main/MODEL_ZOO.md are models trained on MS-COCO (the link is https://github.com/Alibaba-MIIL/ASL/blob/main/MODEL_ZOO.md),
while the models in TResNet/blob/master/MODEL_ZOO.md are standard ImageNet models (the link is https://github.com/Alibaba-MIIL/TResNet/blob/master/MODEL_ZOO.md), right?

I just wonder whether you could upload the model you trained with tresnet_m and input size 224 to https://github.com/Alibaba-MIIL/ASL/blob/main/MODEL_ZOO.md, the ASL/blob/main/MODEL_ZOO.md one.

@GhostWnd
Author

GhostWnd commented Feb 3, 2021

Or could you please share the model you trained with tresnet_m and input size 224 with me?

I would like to adjust some hyperparameters to test their influence,
and apply it to other datasets as a pretrained model.
Thank you very much.

@mrT23
Contributor

mrT23 commented Feb 3, 2021

Just to be clear:
a tresnet_m 224 model trained on MS-COCO?

@GhostWnd
Author

GhostWnd commented Feb 3, 2021

Yes, the one that produces the log file mtresnet_224.txt.

@mrT23
Contributor

mrT23 commented Feb 4, 2021

@GhostWnd
Author

GhostWnd commented Feb 4, 2021

Thank you very much.

@LOOKCC

LOOKCC commented Apr 8, 2021

This is an example log file (note: resolution 224, mtresnet):
mtresnet_224.txt

Can you attach logs for 448 resolution with tresnet_l using this training code? I found it hard to reproduce the 86.8 mAP result from the paper.

@mrT23
Contributor

mrT23 commented Apr 8, 2021
