
Release of pretrained snapshots #11

Closed
ayshrv opened this issue Oct 21, 2019 · 11 comments

@ayshrv

ayshrv commented Oct 21, 2019

Hi,

Could you please release the snapshot (model weights) of the best model for single run and beam search that led to the numbers reported in the paper?
Thanks!

@airsplay
Owner

Hi,

The single-run results can be reproduced with the code in this GitHub repo.

However, the beam-search part is still broken for now... It seems that I mis-edited some code, so the speaker score is not accurate. This only affects beam-search inference and does not change the results of agent/speaker training. I am still looking into the issue but haven't located the bug yet.

So I am sharing my pre-cleanup code here, which reproduces the results I reported in the paper: https://drive.google.com/file/d/1ML98KBE5MkGt987vc0KUhoOPbB6vGNPY/view?usp=sharing.
The model weights are also included in the zip file.

I hope it helps! Thanks.

@ayshrv
Author

ayshrv commented Oct 23, 2019

Thank you for your response.

> The single-run results can be reproduced with the code in this GitHub repo.

Yes, I have been able to reproduce the results of the single-run model using the codebase.

Thanks a lot for sharing the code and the snapshots. Could you also mention the values of the hyperparameters used in beam search, mainly beam size and alpha?

Thanks!

@airsplay
Owner

airsplay commented Oct 23, 2019

If I remember correctly, the beam size is set to 20. Following the Speaker-Follower paper, alpha (the ratio between the speaker score and the agent score) is tuned on the validation set: I select the value with the highest accuracy from [0.00, 0.01, 0.02, ..., 0.99, 1.00].
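
Roughly, that grid search looks like the following (a minimal sketch with hypothetical per-candidate score arrays; the actual code rescores full beam trajectories per instruction):

import numpy as np

# Sketch of the alpha grid search described above. `speaker_scores` and
# `agent_scores` are hypothetical (num_episodes, beam_size) log-prob arrays;
# `correct` marks whether each beam candidate ends at the goal.
def tune_alpha(speaker_scores, agent_scores, correct):
    best_alpha, best_acc = 0.0, -1.0
    for alpha in np.arange(0.0, 1.01, 0.01):          # [0.00, 0.01, ..., 1.00]
        mixed = alpha * speaker_scores + (1 - alpha) * agent_scores
        picked = mixed.argmax(axis=1)                 # best candidate per episode
        acc = correct[np.arange(len(picked)), picked].mean()
        if acc > best_acc:
            best_alpha, best_acc = alpha, acc
    return best_alpha, best_acc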

@ayshrv
Author

ayshrv commented Oct 23, 2019

Thank you.

Could you also please provide access to the candidate features img_features/ResNet-152-candidate.tsv? It would also be great if you could specify what they are.
Are they required to achieve the results mentioned in the paper? I ask because the codebase available on GitHub does not have them: feature_state in the GitHub version is (feature, state), but in this code it is (feature, candidate, state).

@airsplay
Owner

The file is available here: https://drive.google.com/file/d/1I3o_lp13zmM1ohIbPGqljSRN2GLpBjbt/view?usp=sharing.

Since the candidate locations are not always centered in the camera view when only 36 discretized views are used, I rotate the camera to each candidate's specific angle and capture a new image; the features in this file are the ImageNet features of these images.

It's a historical design, and I removed it from both this GitHub version and the code I shared. The shared code still loads this tsv file but does not use it when creating the features.
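
For reference, a sketch of reading such a feature tsv. The field names below follow the standard R2R ResNet-152 tsv layout; assuming the candidate file uses the same schema, which may not be exact:

import base64
import csv
import sys

import numpy as np

# Standard R2R image-feature tsv fields; that the candidate file shares this
# schema is an assumption.
TSV_FIELDS = ['scanId', 'viewpointId', 'image_w', 'image_h', 'vfov', 'features']

def load_features(tsv_path, feature_dim=2048):
    csv.field_size_limit(sys.maxsize)  # the base64 feature blobs are long
    features = {}
    with open(tsv_path) as f:
        reader = csv.DictReader(f, delimiter='\t', fieldnames=TSV_FIELDS)
        for row in reader:
            key = row['scanId'] + '_' + row['viewpointId']
            blob = base64.b64decode(row['features'])
            features[key] = np.frombuffer(blob, dtype=np.float32).reshape(-1, feature_dim)
    return features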

@ayshrv
Author

ayshrv commented Oct 24, 2019

Thanks a lot.

I have tried the new codebase that you provided in the Google Drive link.

I have been able to reproduce similar numbers as reported in the paper for the single-run model using the snapshots you provided.
The command that I ran was:

name=hao_agent_bt_val_torch0.4.1
flag="--attn soft --train validlistener
      --submit
      --load snap/hao_agent_bt/best_val_unseen
      --angleFeatSize 128
      --featdropout 0.4
      --subout max --maxAction 35"
mkdir -p snap/$name

CUDA_VISIBLE_DEVICES=0 unbuffer python3 r2r_src_correct/advance_train.py $flag --name $name | tee snap/$name/log

My results were (logfile):

Env name: val_unseen, nav_error: 5.5096, oracle_error: 3.5808, steps: 24.3499, lengths: 9.5445, success_rate: 0.5028, oracle_rate: 0.5756, spl: 0.4625
Env name: val_seen, nav_error: 4.1535, oracle_error: 2.6730, steps: 26.1205, lengths: 10.2490, success_rate: 0.6014, oracle_rate: 0.6876, spl: 0.5725

But for the beam-search part, I was not able to get close to the 69% success rate for Val Unseen and 75% for Val Seen reported in the paper. I have tried using both PyTorch 0.4.1 and 1.1.
At most, I am getting 60% for Val Unseen and 67% for Val Seen.
The command that I ran for beam search is:

name=hao_agent_bt_val_beamsearch_beam20_alpha0.64_torch0.4.1
flag="--attn soft --train validlistener
      --beam
      --candidates 20
      --alpha 0.63
      --speaker snap/hao_speaker/best_val_unseen_bleu
      --load snap/hao_agent_bt/best_val_unseen
      --angleFeatSize 128
      --featdropout 0.4
      --subout max --maxAction 35"
mkdir -p snap/$name
CUDA_VISIBLE_DEVICES=0 unbuffer python3 r2r_src_correct/advance_train.py $flag --name $name | tee snap/$name/log

And here are the logfiles for the param search (using --paramSearch) and the results, for PyTorch 0.4.1 and PyTorch 1.1:
PyTorch 0.4.1: Param Search Log (which gave alpha=0.63) and Results Log
PyTorch 1.1: Param Search Log (which gave alpha=0.67) and Results Log

I am not sure what the issue is. Did I miss passing any arguments? Please let me know if you have any ideas.

Also, for beam size 20, the trajectory length is around 400-450, while the results reported in the paper have trajectory lengths close to 660-700. So I tried running beam search with beam size 30 and got trajectory lengths of 700-750. Maybe you set the beam size to a number higher than 20?
But even after increasing the beam size, I do not see an increase in success rate.
Results for beam size 30:

Avg speaker True, Avg listener True, For the speaker weight 0.6400, the result is 0.5854
Env Name: val_unseen,nav_error: 4.2674 ,oracle_error: 0.0857 ,steps: 345.7403 ,lengths: 705.3034 ,success_rate: 0.5854 ,oracle_rate: 0.9936 ,spl: 0.0085

Avg speaker True, Avg listener True, For the speaker weight 0.6400, the result is 0.6601
Env Name: val_seen,nav_error: 3.4159 ,oracle_error: 0.0355 ,steps: 363.7542 ,lengths: 754.3993 ,success_rate: 0.6601 ,oracle_rate: 0.9980 ,spl: 0.0099
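
(For context on the tiny SPL values above: SPL weights success by the ratio of shortest-path length to actual path length (Anderson et al., 2018), so ~700m beam trajectories against ~10m shortest paths push it toward zero regardless of success rate. A sketch with illustrative numbers:)

# SPL = mean over episodes of S_i * l_i / max(l_i, p_i), where S_i is
# success, l_i the shortest-path length, p_i the taken-path length.
def spl(successes, shortest_lengths, taken_lengths):
    terms = [s * l / max(l, p)
             for s, l, p in zip(successes, shortest_lengths, taken_lengths)]
    return sum(terms) / len(terms)

# Illustrative: a successful ~10 m episode traversed with a ~705 m beam path.
print(spl([1], [10.0], [705.0]))  # ~0.014: long beam paths crush SPL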

Please let me know if you have any ideas. Thanks a lot!

@airsplay
Owner

Thanks a lot for verifying it!!!

I tested my scripts under PyTorch 1.2.0 and PyTorch 0.4.0 on my server and the logs are the same:

Avg speaker True, Avg listener True, For the speaker weight 0.8900, the result is 0.6790
Env Name: val_unseen,nav_error: 3.3182 ,oracle_error: 0.0696 ,steps: 322.1856 ,lengths: 663.1498 ,success_rate: 0.6790 ,oracle_rate: 0.9945 ,spl: 0.0105 

Avg speaker True, Avg listener True, For the speaker weight 0.8900, the result is 0.7424
Env Name: val_seen,nav_error: 2.7213 ,oracle_error: 0.0315 ,steps: 337.2997 ,lengths: 702.4499 ,success_rate: 0.7424 ,oracle_rate: 0.9961 ,spl: 0.0117

I do not remember whether the code/snapshots are the same. In case I did anything differently before, I have re-uploaded everything to Google Drive (including the source files, running scripts, and snapshots). Could you download and retest it? Please let me know if there is any problem.
https://drive.google.com/file/d/1R0MDiu7JaQJBXsVWCq-h5IC-q0U5UI9t/view?usp=sharing

Here are some other specifications of the running environment, which might be related to the results. However, I do not think they are the cause of this problem...

python==3.6.6
numpy==1.15.4
CUDA==9.0
cudnn==7.0.5

@airsplay
Owner

By the way, since loading the features costs a lot of time, adding --fast to the arguments will load only the top 5000 image features. This makes it easier to validate the correctness of the running scripts and code.
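
(Illustratively, --fast amounts to truncated loading along these lines; this reader is a sketch, not the repo's actual loader:)

import itertools

# Read only the first n rows of the feature tsv, as --fast does with n=5000.
def load_first_n(tsv_path, n=5000):
    with open(tsv_path) as f:
        return [line.rstrip('\n').split('\t') for line in itertools.islice(f, n)]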

@ayshrv
Author

ayshrv commented Oct 25, 2019

Thanks a lot for providing the new checkpoints. I have been able to verify the results for beam search using the new weights. My results are exactly the same as yours.

Thanks, closing the issue now.

@ayshrv ayshrv closed this as completed Oct 25, 2019
@airsplay
Owner

Thanks for verifying it! I will look into the differences between these codebases and update this GitHub repo to support beam search.

@buxpeng

buxpeng commented Aug 29, 2020

Hi,
I used the new code you provided, and my result was about 5% lower. The change I made was removing allennlp; my torch version is 1.5. Is the gap caused by the torch version or by allennlp?
[screenshot of results]
