
Bad performance on NLVR2 #1

Closed
sanjayss34 opened this issue Aug 22, 2019 · 3 comments
@sanjayss34

Hi, thanks for releasing your code! I'm not able to reproduce your fine-tuning result on NLVR2. I followed your instructions by downloading the pre-trained model, downloading the image features, pre-processing the nlvr2 JSON files, and running the nlvr2_finetune.bash script as is. However, I get the following results, which are much lower than the result you reported. Do you know why this might be happening?

Epoch 0: Train 52.32
Epoch 0: Valid 50.86
Epoch 0: Best 50.86

Epoch 1: Train 50.50
Epoch 1: Valid 49.14
Epoch 1: Best 50.86

Epoch 2: Train 50.56
Epoch 2: Valid 49.31
Epoch 2: Best 50.86

Epoch 3: Train 54.83
Epoch 3: Valid 51.65
Epoch 3: Best 51.65

@airsplay
Owner

Many thanks for running the experiment and pointing this issue out!

I am now running a verification experiment and will let you know the result tomorrow morning.

PyTorch Version

My initial guess is the PyTorch version. Could you try torch==1.0.1? Installation command:

pip install --force-reinstall torch==1.0.1

I found that I had used an old virtualenv with an old PyTorch version. I had assumed that PyTorch would be backward-compatible in computing gradients, but that seems not to be the case.
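Since the mismatch here was silent, one way to catch it early is to assert the torch version at the top of the fine-tuning script. A minimal sketch (the `check_torch_version` helper is hypothetical, not part of this repo):

```shell
# Hypothetical helper: succeed only if the given torch version string is 1.0.1.
check_torch_version() {
  # Strip a local build suffix such as +cu90 before comparing.
  [ "${1%%+*}" = "1.0.1" ]
}

# Possible usage at the top of the fine-tuning script:
# check_torch_version "$(python -c 'import torch; print(torch.__version__)')" \
#   || echo "Warning: torch != 1.0.1; NLVR2 results were verified with 1.0.1" >&2
```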

By the way, here is the full package list from my virtualenv. I believe the only relevant difference is the torch version.

Package         Version  
--------------- ---------
boto3           1.9.205  
botocore        1.12.205 
certifi         2019.6.16
chardet         3.0.4    
docutils        0.14     
idna            2.8      
jmespath        0.9.4    
numpy           1.17.0   
pip             19.2.1   
python-dateutil 2.8.0    
requests        2.22.0   
s3transfer      0.2.1    
setuptools      41.0.1   
six             1.12.0   
torch           1.0.1    
tqdm            4.33.0   
urllib3         1.25.3   
wheel           0.33.4 

If so, it's really strange, but I will update requirement.txt first.

Raw Feature

Could you also try using the raw features from our server, in place of the features extracted from the zip files, with these commands:

wget nlp.cs.unc.edu/data/lxmert_data/nlvr2_imgfeat/train_obj36.tsv -P data/nlvr2_imgfeat
wget nlp.cs.unc.edu/data/lxmert_data/nlvr2_imgfeat/valid_obj36.tsv -P data/nlvr2_imgfeat

This is in case some of the zip files are broken.
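A truncated download often ends in a partial last row, so one quick sanity check is to verify that every row of each TSV has the same number of tab-separated fields. A hedged sketch (the `check_tsv` helper is hypothetical, not something shipped with the repo):

```shell
# Hypothetical helper: exit 0 only if every row has the same tab-separated
# field count as the first row.
check_tsv() {
  awk -F'\t' 'NR==1 {n=NF} NF!=n {bad=1} END {exit bad}' "$1"
}

# Possible usage after the wget commands:
# check_tsv data/nlvr2_imgfeat/train_obj36.tsv && echo "train_obj36.tsv looks intact"
```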

@airsplay
Owner

Hi, last night I got an accuracy of 74.39% (within the 74.0% to 74.5% range in the README) using the same command from the README.

Here is a snapshot of the results:
[training log screenshot]

I would recommend still trying torch==1.0.1 if possible.

@sanjayss34
Author

Thanks for the update! Yes, it does look like the torch version was the issue. (So far I have re-trained for 1 epoch using PyTorch 1.0.1 and got a validation accuracy of 67.86.) Previously, I was using version 1.1.0.
