
Error while running evaluation for open domain VQA #221

Closed

RishabhMaheshwary opened this issue Sep 1, 2022 · 6 comments

@RishabhMaheshwary
Hi,
I am evaluating the pretrained ofa_large.pt for open-domain VQA. The evaluation runs fine on some input samples but then fails with the following error:

2022-09-01 04:52:07 | INFO | tasks.ofa_task | source dictionary: 59457 types
2022-09-01 04:52:07 | INFO | tasks.ofa_task | target dictionary: 59457 types
local datafile ../../dataset/vqa_data/textvqa_val1.tsv slice_id 0 begin to initialize row_count and line_idx-to-offset mapping
local datafile ../../dataset/vqa_data/textvqa_val1.tsv slice_id 0 finished initializing row_count and line_idx-to-offset mapping
file ../../dataset/vqa_data/textvqa_val1.tsv slice_id 0 row count 4999 total row count 4999
/private/home/rbh/anaconda3/envs/ofa/lib/python3.7/site-packages/torchvision/transforms/transforms.py:258: UserWarning: Argument interpolation should be of type InterpolationMode instead of int. Please, use InterpolationMode enum.
  "Argument interpolation should be of type InterpolationMode instead of int. "
/private/home/rbh/OFA/data/mm_data/vqa_gen_dataset.py:64: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  decoder_prompts = np.array([s['decoder_prompt'].tolist() for s in samples])
2022-09-01 04:53:16 | INFO | fairseq.logging.progress_bar | :     11 / 625 sentences=8
2022-09-01 04:53:21 | INFO | fairseq.logging.progress_bar | :     21 / 625 sentences=8
Traceback (most recent call last):
  File "../../evaluate.py", line 156, in <module>
    cli_main()
  File "../../evaluate.py", line 151, in cli_main
    cfg, main, ema_eval=args.ema_eval, beam_search_vqa_eval=args.beam_search_vqa_eval, zero_shot=args.zero_shot
  File "/private/home/rbh/OFA/fairseq/fairseq/distributed/utils.py", line 374, in call_main
    distributed_main(cfg.distributed_training.device_id, main, cfg, kwargs)
  File "/private/home/rbh/OFA/fairseq/fairseq/distributed/utils.py", line 348, in distributed_main
    main(cfg, **kwargs)
  File "../../evaluate.py", line 134, in main
    result, scores = eval_step(task, generator, models, sample, **kwargs)
  File "/private/home/rbh/OFA/utils/eval_utils.py", line 306, in eval_step
    return eval_vqa_gen(task, generator, models, sample, **kwargs)
  File "/private/home/rbh/OFA/utils/eval_utils.py", line 47, in eval_vqa_gen
    hypos = task.inference_step(generator, models, sample, prefix_tokens=sample['prefix_tokens'])
  File "/private/home/rbh/OFA/fairseq/fairseq/tasks/fairseq_task.py", line 518, in inference_step
    models, sample, prefix_tokens=prefix_tokens, constraints=constraints
  File "/private/home/rbh/anaconda3/envs/ofa/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/private/home/rbh/OFA/models/sequence_generator.py", line 207, in generate
    return self._generate(models, sample, **kwargs)
  File "/private/home/rbh/OFA/models/sequence_generator.py", line 379, in _generate
    step, lprobs, scores, tokens, prefix_tokens, beam_size
  File "/private/home/rbh/OFA/models/sequence_generator.py", line 624, in _prefix_tokens
    assert (first_beam == target_prefix).all()
AssertionError
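
As background on where this trips: the assertion is raised in _prefix_tokens in models/sequence_generator.py. Below is a simplified sketch of the invariant it enforces; the function name, tensor shapes, and the toy call are illustrative assumptions, not the repository's actual code.

import torch

# Simplified illustration (NOT the repository's exact code) of the invariant
# behind "assert (first_beam == target_prefix).all()" in
# models/sequence_generator.py::_prefix_tokens: when prefix tokens are forced
# onto the beams, each sentence's first beam must reproduce that prefix.
def check_forced_prefix(tokens, prefix_tokens, beam_size, step):
    # tokens:        (bsz * beam_size, 1 + seq_len) hypotheses so far, column 0 = bos
    # prefix_tokens: (bsz, prefix_len) tokens the output is forced to start with
    bsz = prefix_tokens.size(0)
    first_beam = tokens.view(bsz, beam_size, -1)[:, 0, 1 : step + 1]
    target_prefix = prefix_tokens[:, :step]
    # If constrained decoding (e.g. an answer-candidate trie) makes a forced
    # prefix token unreachable, the beams diverge from the prefix and this trips.
    assert (first_beam == target_prefix).all()

# Toy example: the first beam matches the forced prefix, so the check passes.
check_forced_prefix(
    tokens=torch.tensor([[0, 7, 8, 1], [0, 7, 9, 1]]),  # bsz=1, beam_size=2
    prefix_tokens=torch.tensor([[7, 8]]),
    beam_size=2,
    step=2,
)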

I am using the following evaluation command

#!/usr/bin/env bash

# The port for communication. Note that if you want to run multiple tasks on the same machine,
# you need to specify different port numbers.
export MASTER_PORT=8082

user_dir=../../ofa_module
bpe_dir=../../utils/BPE

# val or test
split=$1

data=../../dataset/vqa_data/textvqa_val1.tsv
ans2label_file=../../dataset/vqa_data/trainval_ans2label.pkl
path=../../pretrained_checkpoints/ofa_large.pt
result_path=../../results/vqa_${split}_beam
selected_cols=0,5,2,3,4

val_inference_type=beamsearch
python3 -m torch.distributed.launch ../../evaluate.py \
    ${data} \
    --path=${path} \
    --bpe-dir=${bpe_dir} \
    --prompt-type=src \
    --selected-cols=${selected_cols} \
    --user-dir=${user_dir} \
    --task=vqa_gen \
    --batch-size=8 \
    --log-format=simple --log-interval=10 \
    --seed=7 \
    --gen-subset=${split} \
    --results-path=${result_path} \
    --fp16 \
    --beam-search-vqa-eval \
    --zero-shot \
    --unconstrained-training \
    --beam=5 \
    --unnormalized \
    --temperature=1.0 \
    --val-inference-type=${val_inference_type} \
    --num-workers=0 \
    --model-overrides="{\"data\":\"${data}\",\"bpe_dir\":\"${bpe_dir}\",\"selected_cols\":\"${selected_cols}\",\"ans2label_file\":\"${ans2label_file}\"}"
@yangapku (Member) commented Sep 2, 2022

Hi, the script you used can only evaluate a finetuned checkpoint. We have updated the code and added a script, run_scripts/vqa/evaluate_vqa_zeroshot.sh, which performs zero-shot open-domain VQA inference with a pretrained OFA checkpoint. Please check out the latest code and give it a try!

@RishabhMaheshwary (Author)

Hi @yangapku, thanks for the reply. I used run_scripts/vqa/evaluate_vqa_zeroshot.sh to get predictions on the VQAv2 validation set, and all the answers predicted by the model are either "yes" or "no". The accuracy is 0.3405. I also ran the evaluation on a custom dataset; there too, all predictions are "yes" or "no". Have you observed similar results with this script?

@RishabhMaheshwary (Author)

Hi @yangapku, I was able to get an accuracy of 0.6552 on the VQAv2 validation set using ofa_large_384.pt by adding the snippet below here.

if kwargs["zero_shot"]:
    generator.constraint_trie = None

Also, is #124 complete? Maybe the change above can also fix the issue with validating the fine-tuned open-domain VQA model.
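
For readers hitting the same AssertionError, here is a rough sketch of where such a guard could sit in utils/eval_utils.py's eval_vqa_gen. Only the inference_step call comes from the traceback above; the rest of the body is elided, and the exact placement of the guard is an assumption.

# Rough sketch of utils/eval_utils.py::eval_vqa_gen with the suggested guard.
# Only the inference_step call is taken from the traceback; everything else is
# paraphrased or elided, and the placement of the guard is an assumption.
def eval_vqa_gen(task, generator, models, sample, **kwargs):
    if kwargs.get("zero_shot", False):
        # Drop the answer-candidate trie so zero-shot (open-domain) decoding is
        # not restricted to the finetuning answer set.
        generator.constraint_trie = None
    hypos = task.inference_step(
        generator, models, sample, prefix_tokens=sample["prefix_tokens"]
    )
    ...  # scoring of the hypotheses against the reference answers omitted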

@yangapku (Member) commented Sep 8, 2022

> Hi @yangapku, thanks for the reply. I used run_scripts/vqa/evaluate_vqa_zeroshot.sh to get predictions on the VQAv2 validation set, and all the answers predicted by the model are either "yes" or "no". The accuracy is 0.3405. I also ran the evaluation on a custom dataset; there too, all predictions are "yes" or "no". Have you observed similar results with this script?

No. I have tested this script using the released pretrained ofa_large.pt (pretrained at 480×480 resolution), which gets a 72.98 accuracy score on our VQAv2 val split and generates correct answers in most cases, as expected. Please make sure you are using the latest code and running the script correctly.

@yangapku (Member) commented Sep 8, 2022

> Hi @yangapku, I was able to get an accuracy of 0.6552 on the VQAv2 validation set using ofa_large_384.pt by adding the snippet below here.
>
>     if kwargs["zero_shot"]:
>         generator.constraint_trie = None
>
> Also, is #124 complete? Maybe the change above can also fix the issue with validating the fine-tuned open-domain VQA model.

In fact, the code related to zero-shot inference with a pretrained OFA checkpoint (like ofa_large_384.pt) is in zero_shot_utils.py rather than eval_utils.py, which handles finetuned OFA checkpoints. So I'm somewhat confused why your edit makes a difference in the zero-shot evaluation setting. PR #124 is still being fixed. Thanks for your comment; I will give it a try.
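
A purely hypothetical illustration of the split described above: the module names come from this comment and the traceback, but the dispatch function and the zero_shot_step name are placeholders, not the repository's confirmed API.

# Hypothetical illustration only: which module handles which checkpoint type.
# eval_utils.eval_step appears in the traceback; zero_shot_step is a placeholder
# name and may differ from the actual function in utils/zero_shot_utils.py.
def run_eval(task, generator, models, sample, zero_shot=False, **kwargs):
    if zero_shot:
        # Pretrained checkpoints (e.g. ofa_large_384.pt): open-domain decoding.
        from utils import zero_shot_utils
        return zero_shot_utils.zero_shot_step(task, generator, models, sample)
    # Finetuned checkpoints: decoding constrained to the answer candidates.
    from utils import eval_utils
    return eval_utils.eval_step(task, generator, models, sample, **kwargs)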

@RishabhMaheshwary (Author) commented Sep 8, 2022

Thanks for the reply! Yes, with the latest pull I am able to get 73.05 VQAv2 val accuracy with a beam size of 20, without adding the above line.
