
Unable to replicate results after retraining #5

Closed
aemrey opened this issue Feb 12, 2020 · 5 comments

aemrey commented Feb 12, 2020

Hello and thank you for this fantastic repo!

I am trying to retrain your model on COCO features that I extracted myself with the bottom-up attention repo, as you suggested in #2. I am currently on epoch 15 and the highest CIDEr score on the test set has been 1.13, which is much lower than the 1.31 I get with your pretrained model. Other than the new features, I am using your default values for all hyperparameters.

Could you give me some guidance in order to better replicate your results?

@baraldilorenzo (Member)

Dear @aemrey,

thanks for your interest in our code. From what you mention, it seems that the RL training stage has not even started yet, so it is fairly normal that you are getting lower results. Please keep the training running until it stops by itself.

Lorenzo.
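For readers unfamiliar with the schedule Lorenzo refers to, here is a minimal sketch of the usual two-stage recipe: cross-entropy pre-training first, then self-critical RL fine-tuning once validation CIDEr stops improving. The helper functions, the patience value of 5, and the dummy CIDEr numbers below are placeholders for illustration, not this repository's actual code.

```python
import random

# Placeholder stubs: a real run would train the captioning model and
# evaluate CIDEr on the validation split here.
def train_one_epoch(model, use_rl):
    pass  # XE loss when use_rl is False, SCST with a CIDEr reward otherwise

def validate_cider(model):
    return random.random()  # dummy validation score

model = object()  # stand-in for the captioning model
use_rl, best_cider, patience = False, 0.0, 0

for epoch in range(100):
    train_one_epoch(model, use_rl)
    cider = validate_cider(model)
    if cider > best_cider:
        best_cider, patience = cider, 0
    else:
        patience += 1
    if patience == 5:                       # validation CIDEr stopped improving
        if not use_rl:
            use_rl, patience, best_cider = True, 0, 0.0  # switch XE -> RL
        else:
            break                           # training "stops by itself"

print('finished after', epoch + 1, 'epochs')
```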

alesolano commented Mar 10, 2020

Hi @aemrey, I'm following your steps and right now I'm extracting the features of my set of images. I'm using this model loaded with these weights.

Now, how do you pack those features into an Nx2048 tensor? I understood "features" to mean the output of the cls_prob blob here, but that returns an Nx1601 tensor. I'm sure I'm missing something, maybe reading the wrong blob or using the wrong model.

Thanks!

P.S.: I don't know if I should open a new issue for this.

EDIT: Well, I now understand that I should take the res5c blob, maybe? Its output is Nx2048x14x14, though, so I don't really know what to do with the 14x14 spatial grid. And I still guess I need cls_prob to sort the array.
I'll keep updating, but if you see something odd in what I'm doing and have a quick hint, please let me know.

EDIT2: Moved to #7
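Although the discussion continued in #7, here is a minimal numpy sketch of one plausible way to handle the shapes alesolano mentions, assuming the 14x14 spatial grid of res5c should be average-pooled into a single 2048-d vector per box and that cls_prob is only used to rank the boxes. The blob choice, the background-class convention, and the random data are assumptions, not the official extraction procedure.

```python
import numpy as np

# Fake stand-ins for the Caffe blobs: N candidate boxes with res5c
# activations (N x 2048 x 14 x 14) and class probabilities (N x 1601).
n_boxes = 36
res5c = np.random.rand(n_boxes, 2048, 14, 14).astype(np.float32)
cls_prob = np.random.rand(n_boxes, 1601).astype(np.float32)

# Average-pool the 14x14 spatial grid: one 2048-d feature per box.
pooled = res5c.mean(axis=(2, 3))              # shape (N, 2048)

# Rank boxes by their highest non-background class confidence
# (column 0 is assumed to be the background class).
scores = cls_prob[:, 1:].max(axis=1)
order = np.argsort(scores)[::-1]
features = pooled[order]                      # (N, 2048), most confident first

print(features.shape)                         # (36, 2048)
```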

@ruotianluo

I ran your code a couple of times.

[screenshot of test CIDEr results]

The test CIDEr is always below 1.30. Any clues?

marcellacornia (Member) commented Mar 19, 2020

Hi @ruotianluo,
thanks for your interest in our work.

We noticed that the results and the training behavior can differ slightly depending on the underlying architecture. For this reason, we also provided the weights of our final model.

In our experiments, we used an NVIDIA 2080 Ti GPU. The other settings are the ones we reported in our repository.

@wanboyang

> I ran your code a couple of times.
>
> [screenshot of test CIDEr results]
>
> The test CIDEr is always below 1.30. Any clues?

I think it is caused by the difference between the bottom-up features provided in [1] and those provided in this project. In the test results shown in the image, the curve named m2_transformer_wan used the features provided in [1], while the curve named m2_transformer_wan n used the features provided in this project.
I compared the features from [1] with those from this project and found that the number of boxes differs for some image_ids.
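A quick way to check this yourself is to compare the number of boxes per image in the two detection files. The sketch below assumes the '<image_id>_features' key layout of the hdf5 file distributed with this project and uses placeholder file names; adjust both if your files are laid out differently.

```python
import h5py

# Placeholder paths: features extracted with [1] vs. the ones from this project.
own_path = 'my_detections.hdf5'
repo_path = 'coco_detections.hdf5'

mismatched = []
with h5py.File(own_path, 'r') as own, h5py.File(repo_path, 'r') as repo:
    for key in own.keys():
        # Assumed key layout: '<image_id>_features' holding an (N, 2048) array.
        if not key.endswith('_features') or key not in repo:
            continue
        n_own, n_repo = own[key].shape[0], repo[key].shape[0]
        if n_own != n_repo:
            mismatched.append((key, n_own, n_repo))

print(f'{len(mismatched)} images have a different number of boxes')
for key, n_own, n_repo in mismatched[:10]:
    print(key, n_own, n_repo)
```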
