
Release of pretrained snapshots #11

Closed
ayshrv opened this issue Oct 21, 2019 · 11 comments

@ayshrv

ayshrv commented Oct 21, 2019

Hi,

Could you please release the snapshot (model weights) of the best model for single run and beam search that led to the numbers reported in the paper?
Thanks!

@airsplay
Owner

Hi,

The single-run results can be reproduced with the code in this GitHub repo.

However, the beam-search part is still broken for now... It seems that I mis-edited some code, so the speaker score is not accurate. This only affects beam-search inference and does not change the results of agent/speaker training. I am still looking into the issue but haven't located the bug yet.

So I am sharing my pre-cleanup code here, which reproduces the results I reported in the paper: https://drive.google.com/file/d/1ML98KBE5MkGt987vc0KUhoOPbB6vGNPY/view?usp=sharing.
The model weights are also included in the zip file.

I hope it helps! Thanks.

@ayshrv
Author

ayshrv commented Oct 23, 2019

Thank you for your response.

> The single-run results can be reproduced with the code in this GitHub repo.

Yes, I have been able to reproduce the results of the single-run model using the codebase.

Thanks a lot for sharing the code and the snapshots. Could you also mention the values of the hyperparameters used in beam search, mainly beam size and alpha?

Thanks!

@airsplay
Owner

airsplay commented Oct 23, 2019

If I remember correctly, the beam size is set to 20. Following the Speaker-Follower paper, alpha (the ratio between the speaker score and the agent score) is tuned on the validation set: I select the value with the highest accuracy from [0.00, 0.01, 0.02, ..., 0.99, 1.00].
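
Roughly, that grid search looks like the following (a minimal sketch with hypothetical per-candidate score arrays; the actual code rescores full beam trajectories per instruction):

import numpy as np

# Sketch of the alpha grid search described above. `speaker_scores` and
# `agent_scores` are hypothetical (num_episodes, beam_size) log-prob arrays;
# `correct` marks whether each beam candidate ends at the goal.
def tune_alpha(speaker_scores, agent_scores, correct):
    best_alpha, best_acc = 0.0, -1.0
    for alpha in np.arange(0.0, 1.01, 0.01):          # [0.00, 0.01, ..., 1.00]
        mixed = alpha * speaker_scores + (1 - alpha) * agent_scores
        picked = mixed.argmax(axis=1)                 # best candidate per episode
        acc = correct[np.arange(len(picked)), picked].mean()
        if acc > best_acc:
            best_alpha, best_acc = alpha, acc
    return best_alpha, best_acc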

@ayshrv
Author

ayshrv commented Oct 23, 2019

Thank you.

Could you also please provide access to the candidate features img_features/ResNet-152-candidate.tsv? It would also be great if you could specify what they are.
Are they required to achieve the results mentioned in the paper? I ask because the codebase available on GitHub does not have them: feature_state in the GitHub version is (feature, state), but in this code it is (feature, candidate, state).

@airsplay
Owner

The file is available here: https://drive.google.com/file/d/1I3o_lp13zmM1ohIbPGqljSRN2GLpBjbt/view?usp=sharing.

Since the candidate locations are not always centered in the camera view when only 36 discretized views are used, I rotate the camera to each candidate's specific angle and capture a new image; the features in this file are the ImageNet features of these images.

It's a historical design, and I removed it from both this GitHub version and the code I shared. The shared code still loads this tsv file but does not use it when creating the features.
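
For reference, a sketch of reading such a feature tsv. The field names below follow the standard R2R ResNet-152 tsv layout; assuming the candidate file uses the same schema, which may not be exact:

import base64
import csv
import sys

import numpy as np

# Standard R2R image-feature tsv fields; that the candidate file shares this
# schema is an assumption.
TSV_FIELDS = ['scanId', 'viewpointId', 'image_w', 'image_h', 'vfov', 'features']

def load_features(tsv_path, feature_dim=2048):
    csv.field_size_limit(sys.maxsize)  # the base64 feature blobs are long
    features = {}
    with open(tsv_path) as f:
        reader = csv.DictReader(f, delimiter='\t', fieldnames=TSV_FIELDS)
        for row in reader:
            key = row['scanId'] + '_' + row['viewpointId']
            blob = base64.b64decode(row['features'])
            features[key] = np.frombuffer(blob, dtype=np.float32).reshape(-1, feature_dim)
    return features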

@ayshrv
Author

ayshrv commented Oct 24, 2019

Thanks a lot.

I have tried the new codebase that you provided in the Google Drive link.

I have been able to reproduce similar numbers as reported in the paper for the single-run model using the snapshots you provided.
The command that I ran was:

name=hao_agent_bt_val_torch0.4.1
flag="--attn soft --train validlistener
      --submit
      --load snap/hao_agent_bt/best_val_unseen
      --angleFeatSize 128
      --featdropout 0.4
      --subout max --maxAction 35"
mkdir -p snap/$name

CUDA_VISIBLE_DEVICES=0 unbuffer python3 r2r_src_correct/advance_train.py $flag --name $name | tee snap/$name/log

My results were (logfile):

Env name: val_unseen, nav_error: 5.5096, oracle_error: 3.5808, steps: 24.3499, lengths: 9.5445, success_rate: 0.5028, oracle_rate: 0.5756, spl: 0.4625
Env name: val_seen, nav_error: 4.1535, oracle_error: 2.6730, steps: 26.1205, lengths: 10.2490, success_rate: 0.6014, oracle_rate: 0.6876, spl: 0.5725

But for the beam-search part, I was not able to get close to the 69% success rate for Val Unseen and 75% for Val Seen reported in the paper. I have tried using both PyTorch 0.4.1 and 1.1.
At most, I am getting 60% for Val Unseen and 67% for Val Seen.
The command that I ran for beam search is:

name=hao_agent_bt_val_beamsearch_beam20_alpha0.64_torch0.4.1
flag="--attn soft --train validlistener
      --beam
      --candidates 20
      --alpha 0.63
      --speaker snap/hao_speaker/best_val_unseen_bleu
      --load snap/hao_agent_bt/best_val_unseen
      --angleFeatSize 128
      --featdropout 0.4
      --subout max --maxAction 35"
mkdir -p snap/$name
CUDA_VISIBLE_DEVICES=0 unbuffer python3 r2r_src_correct/advance_train.py $flag --name $name | tee snap/$name/log

And here are the logfiles for the param search (using --paramSearch) and the results, for PyTorch 0.4.1 and PyTorch 1.1:
PyTorch 0.4.1: Param Search Log (which gave alpha=0.63) and Results Log
PyTorch 1.1: Param Search Log (which gave alpha=0.67) and Results Log

I am not sure what the issue is. Did I miss passing any arguments? Please let me know if you have any ideas.

Also, for beam size 20, the trajectory length is around 400-450, while the results reported in the paper have trajectory lengths close to 660-700. So I tried running beam search with beam size 30 and got trajectory lengths of 700-750. Maybe you set the beam size to a number higher than 20?
But even after increasing the beam size, I do not see an increase in success rate.
Results for beam size 30:

Avg speaker True, Avg listener True, For the speaker weight 0.6400, the result is 0.5854
Env Name: val_unseen,nav_error: 4.2674 ,oracle_error: 0.0857 ,steps: 345.7403 ,lengths: 705.3034 ,success_rate: 0.5854 ,oracle_rate: 0.9936 ,spl: 0.0085

Avg speaker True, Avg listener True, For the speaker weight 0.6400, the result is 0.6601
Env Name: val_seen,nav_error: 3.4159 ,oracle_error: 0.0355 ,steps: 363.7542 ,lengths: 754.3993 ,success_rate: 0.6601 ,oracle_rate: 0.9980 ,spl: 0.0099
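
(For context on the tiny SPL values above: SPL weights success by the ratio of shortest-path length to actual path length (Anderson et al., 2018), so ~700m beam trajectories against ~10m shortest paths push it toward zero regardless of success rate. A sketch with illustrative numbers:)

# SPL = mean over episodes of S_i * l_i / max(l_i, p_i), where S_i is
# success, l_i the shortest-path length, p_i the taken-path length.
def spl(successes, shortest_lengths, taken_lengths):
    terms = [s * l / max(l, p)
             for s, l, p in zip(successes, shortest_lengths, taken_lengths)]
    return sum(terms) / len(terms)

# Illustrative: a successful ~10 m episode traversed with a ~705 m beam path.
print(spl([1], [10.0], [705.0]))  # ~0.014: long beam paths crush SPL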

Please let me know if you have any ideas. Thanks a lot!

@airsplay
Owner

Thanks a lot for verifying it!!!

I tested my scripts under PyTorch 1.2.0 and PyTorch 0.4.0 on my server and the logs are the same:

Avg speaker True, Avg listener True, For the speaker weight 0.8900, the result is 0.6790
Env Name: val_unseen,nav_error: 3.3182 ,oracle_error: 0.0696 ,steps: 322.1856 ,lengths: 663.1498 ,success_rate: 0.6790 ,oracle_rate: 0.9945 ,spl: 0.0105 

Avg speaker True, Avg listener True, For the speaker weight 0.8900, the result is 0.7424
Env Name: val_seen,nav_error: 2.7213 ,oracle_error: 0.0315 ,steps: 337.2997 ,lengths: 702.4499 ,success_rate: 0.7424 ,oracle_rate: 0.9961 ,spl: 0.0117

I do not remember whether the code/snapshots are the same. In case I did anything differently before, I have re-uploaded everything to Google Drive (including the source files, running scripts, and snapshots). Could you download and retest it? Please let me know if there is any problem.
https://drive.google.com/file/d/1R0MDiu7JaQJBXsVWCq-h5IC-q0U5UI9t/view?usp=sharing

Here are some other specifications of the running environment, which might be related to the results. However, I do not think they are the cause of this problem...

python==3.6.6
numpy==1.15.4
CUDA==9.0
cudnn==7.0.5

@airsplay
Owner

By the way, since loading the features costs a lot of time, adding --fast to the arguments will load only the top 5000 image features. This makes it easier to validate the correctness of the running scripts and code.
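
(Illustratively, --fast amounts to truncated loading along these lines; this reader is a sketch, not the repo's actual loader:)

import itertools

# Read only the first n rows of the feature tsv, as --fast does with n=5000.
def load_first_n(tsv_path, n=5000):
    with open(tsv_path) as f:
        return [line.rstrip('\n').split('\t') for line in itertools.islice(f, n)]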

@ayshrv
Author

ayshrv commented Oct 25, 2019

Thanks a lot for providing the new checkpoints. I have been able to verify the results for beam search using the new weights. My results are exactly the same as yours.

Thanks, closing the issue now.

@ayshrv ayshrv closed this as completed Oct 25, 2019
@airsplay
Owner

Thanks for verifying it! I will look into the differences between these codebases and update this GitHub repo to support beam search.

@buxpeng

buxpeng commented Aug 29, 2020

Hi,
I used the new code you provided, and my result was about 5% lower. The change I made was removing allennlp; my torch version is 1.5. Is the gap caused by the torch version or by allennlp?
[screenshot of results]
