
Error while running evaluation for open domain VQA #221

Closed

RishabhMaheshwary opened this issue Sep 1, 2022 · 6 comments

@RishabhMaheshwary
Hi,
I am evaluating the pretrained ofa_large.pt for open-domain VQA. The evaluation runs fine on some input samples but then fails with the following error:

2022-09-01 04:52:07 | INFO | tasks.ofa_task | source dictionary: 59457 types
2022-09-01 04:52:07 | INFO | tasks.ofa_task | target dictionary: 59457 types
local datafile ../../dataset/vqa_data/textvqa_val1.tsv slice_id 0 begin to initialize row_count and line_idx-to-offset mapping
local datafile ../../dataset/vqa_data/textvqa_val1.tsv slice_id 0 finished initializing row_count and line_idx-to-offset mapping
file ../../dataset/vqa_data/textvqa_val1.tsv slice_id 0 row count 4999 total row count 4999
/private/home/rbh/anaconda3/envs/ofa/lib/python3.7/site-packages/torchvision/transforms/transforms.py:258: UserWarning: Argument interpolation should be of type InterpolationMode instead of int. Please, use InterpolationMode enum.
  "Argument interpolation should be of type InterpolationMode instead of int. "
/private/home/rbh/OFA/data/mm_data/vqa_gen_dataset.py:64: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  decoder_prompts = np.array([s['decoder_prompt'].tolist() for s in samples])
2022-09-01 04:53:16 | INFO | fairseq.logging.progress_bar | :     11 / 625 sentences=8
2022-09-01 04:53:21 | INFO | fairseq.logging.progress_bar | :     21 / 625 sentences=8
Traceback (most recent call last):
  File "../../evaluate.py", line 156, in <module>
    cli_main()
  File "../../evaluate.py", line 151, in cli_main
    cfg, main, ema_eval=args.ema_eval, beam_search_vqa_eval=args.beam_search_vqa_eval, zero_shot=args.zero_shot
  File "/private/home/rbh/OFA/fairseq/fairseq/distributed/utils.py", line 374, in call_main
    distributed_main(cfg.distributed_training.device_id, main, cfg, kwargs)
  File "/private/home/rbh/OFA/fairseq/fairseq/distributed/utils.py", line 348, in distributed_main
    main(cfg, **kwargs)
  File "../../evaluate.py", line 134, in main
    result, scores = eval_step(task, generator, models, sample, **kwargs)
  File "/private/home/rbh/OFA/utils/eval_utils.py", line 306, in eval_step
    return eval_vqa_gen(task, generator, models, sample, **kwargs)
  File "/private/home/rbh/OFA/utils/eval_utils.py", line 47, in eval_vqa_gen
    hypos = task.inference_step(generator, models, sample, prefix_tokens=sample['prefix_tokens'])
  File "/private/home/rbh/OFA/fairseq/fairseq/tasks/fairseq_task.py", line 518, in inference_step
    models, sample, prefix_tokens=prefix_tokens, constraints=constraints
  File "/private/home/rbh/anaconda3/envs/ofa/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/private/home/rbh/OFA/models/sequence_generator.py", line 207, in generate
    return self._generate(models, sample, **kwargs)
  File "/private/home/rbh/OFA/models/sequence_generator.py", line 379, in _generate
    step, lprobs, scores, tokens, prefix_tokens, beam_size
  File "/private/home/rbh/OFA/models/sequence_generator.py", line 624, in _prefix_tokens
    assert (first_beam == target_prefix).all()
AssertionError
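
As background on where this trips: the assertion is raised in _prefix_tokens in models/sequence_generator.py. Below is a simplified sketch of the invariant it enforces; the function name, tensor shapes, and the toy call are illustrative assumptions, not the repository's actual code.

import torch

# Simplified illustration (NOT the repository's exact code) of the invariant
# behind "assert (first_beam == target_prefix).all()" in
# models/sequence_generator.py::_prefix_tokens: when prefix tokens are forced
# onto the beams, each sentence's first beam must reproduce that prefix.
def check_forced_prefix(tokens, prefix_tokens, beam_size, step):
    # tokens:        (bsz * beam_size, 1 + seq_len) hypotheses so far, column 0 = bos
    # prefix_tokens: (bsz, prefix_len) tokens the output is forced to start with
    bsz = prefix_tokens.size(0)
    first_beam = tokens.view(bsz, beam_size, -1)[:, 0, 1 : step + 1]
    target_prefix = prefix_tokens[:, :step]
    # If constrained decoding (e.g. an answer-candidate trie) makes a forced
    # prefix token unreachable, the beams diverge from the prefix and this trips.
    assert (first_beam == target_prefix).all()

# Toy example: the first beam matches the forced prefix, so the check passes.
check_forced_prefix(
    tokens=torch.tensor([[0, 7, 8, 1], [0, 7, 9, 1]]),  # bsz=1, beam_size=2
    prefix_tokens=torch.tensor([[7, 8]]),
    beam_size=2,
    step=2,
)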

I am using the following evaluation command

#!/usr/bin/env bash

# The port for communication. Note that if you want to run multiple tasks on the same machine,
# you need to specify different port numbers.
export MASTER_PORT=8082

user_dir=../../ofa_module
bpe_dir=../../utils/BPE

# val or test
split=$1

data=../../dataset/vqa_data/textvqa_val1.tsv
ans2label_file=../../dataset/vqa_data/trainval_ans2label.pkl
path=../../pretrained_checkpoints/ofa_large.pt
result_path=../../results/vqa_${split}_beam
selected_cols=0,5,2,3,4

val_inference_type=beamsearch
python3 -m torch.distributed.launch ../../evaluate.py \
    ${data} \
    --path=${path} \
    --bpe-dir=${bpe_dir} \
    --prompt-type=src \
    --selected-cols=${selected_cols} \
    --user-dir=${user_dir} \
    --task=vqa_gen \
    --batch-size=8 \
    --log-format=simple --log-interval=10 \
    --seed=7 \
    --gen-subset=${split} \
    --results-path=${result_path} \
    --fp16 \
    --beam-search-vqa-eval \
    --zero-shot \
    --unconstrained-training \
    --beam=5 \
    --unnormalized \
    --temperature=1.0 \
    --val-inference-type=${val_inference_type} \
    --num-workers=0 \
    --model-overrides="{\"data\":\"${data}\",\"bpe_dir\":\"${bpe_dir}\",\"selected_cols\":\"${selected_cols}\",\"ans2label_file\":\"${ans2label_file}\"}"
@yangapku (Member) commented Sep 2, 2022

Hi, the script you used can only evaluate a finetuned checkpoint. We have updated the code and added a script, run_scripts/vqa/evaluate_vqa_zeroshot.sh, which performs zero-shot open-domain VQA inference with a pretrained OFA checkpoint. Please check out the latest code and give it a try!

@RishabhMaheshwary (Author)

Hi @yangapku, thanks for the reply. I used run_scripts/vqa/evaluate_vqa_zeroshot.sh to get predictions on the VQAv2 validation set, and all the answers predicted by the model are either "yes" or "no". The accuracy is 0.3405. I also ran the evaluation on a custom dataset; there too, all predictions are "yes" or "no". Have you observed similar results with this script?

@RishabhMaheshwary (Author)

Hi @yangapku, I was able to get an accuracy of 0.6552 on the VQAv2 validation set using ofa_large_384.pt by adding the snippet below here.

if kwargs["zero_shot"]:
    generator.constraint_trie = None

Also, is #124 complete? Maybe the change above can also fix the issue with validating the fine-tuned open-domain VQA model.
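
For readers hitting the same AssertionError, here is a rough sketch of where such a guard could sit in utils/eval_utils.py's eval_vqa_gen. Only the inference_step call comes from the traceback above; the rest of the body is elided, and the exact placement of the guard is an assumption.

# Rough sketch of utils/eval_utils.py::eval_vqa_gen with the suggested guard.
# Only the inference_step call is taken from the traceback; everything else is
# paraphrased or elided, and the placement of the guard is an assumption.
def eval_vqa_gen(task, generator, models, sample, **kwargs):
    if kwargs.get("zero_shot", False):
        # Drop the answer-candidate trie so zero-shot (open-domain) decoding is
        # not restricted to the finetuning answer set.
        generator.constraint_trie = None
    hypos = task.inference_step(
        generator, models, sample, prefix_tokens=sample["prefix_tokens"]
    )
    ...  # scoring of the hypotheses against the reference answers omitted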

@yangapku (Member) commented Sep 8, 2022

> Hi @yangapku, thanks for the reply. I used run_scripts/vqa/evaluate_vqa_zeroshot.sh to get predictions on the VQAv2 validation set, and all the answers predicted by the model are either "yes" or "no". The accuracy is 0.3405. I also ran the evaluation on a custom dataset; there too, all predictions are "yes" or "no". Have you observed similar results with this script?

No. I have tested this script using the released pretrained ofa_large.pt (pretrained at 480×480 resolution), which gets a 72.98 accuracy score on our VQAv2 val split and generates correct answers in most cases, as expected. Please make sure you are using the latest code and running the script correctly.

@yangapku (Member) commented Sep 8, 2022

> Hi @yangapku, I was able to get an accuracy of 0.6552 on the VQAv2 validation set using ofa_large_384.pt by adding the snippet below here.
>
>     if kwargs["zero_shot"]:
>         generator.constraint_trie = None
>
> Also, is #124 complete? Maybe the change above can also fix the issue with validating the fine-tuned open-domain VQA model.

In fact, the code related to zero-shot inference with a pretrained OFA checkpoint (like ofa_large_384.pt) is in zero_shot_utils.py rather than eval_utils.py, which handles finetuned OFA checkpoints. So I'm somewhat confused why your edit makes a difference in the zero-shot evaluation setting. PR #124 is still being fixed. Thanks for your comment; I will give it a try.
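
A purely hypothetical illustration of the split described above: the module names come from this comment and the traceback, but the dispatch function and the zero_shot_step name are placeholders, not the repository's confirmed API.

# Hypothetical illustration only: which module handles which checkpoint type.
# eval_utils.eval_step appears in the traceback; zero_shot_step is a placeholder
# name and may differ from the actual function in utils/zero_shot_utils.py.
def run_eval(task, generator, models, sample, zero_shot=False, **kwargs):
    if zero_shot:
        # Pretrained checkpoints (e.g. ofa_large_384.pt): open-domain decoding.
        from utils import zero_shot_utils
        return zero_shot_utils.zero_shot_step(task, generator, models, sample)
    # Finetuned checkpoints: decoding constrained to the answer candidates.
    from utils import eval_utils
    return eval_utils.eval_step(task, generator, models, sample, **kwargs)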

@RishabhMaheshwary (Author) commented Sep 8, 2022

Thanks for the reply! Yes, with the latest pull I am able to get 73.05 VQAv2 val accuracy with a beam size of 20, without adding the above line.
