[Evaluation] Evaluation with Otter pre-trained model on COCO dataset returns 0 CIDEr score #287

Thedatababbler · 2023-11-01T03:32:31Z

Hi,
I used the evaluate.py file in the pipeline to evaluate my models. To make sure the evaluation can run on my single-GPU node, I made some minor changes to the code that initializes the multi-GPU environment and kept everything else the same. The modified shell script is as follows:
```
#!/bin/bash

export CUDA_VISIBLE_DEVICES="0"
export MASTER_ADDR="localhost"
export MASTER_PORT="29501"
export WORLD_SIZE=4
export RANK=0

cd /path/to/Otter
realpath .
python -m pipeline.eval.evaluate \
    --model=otter \
    --results_file="OTTER_mpt1b_origin.json" \
    --model_path="luodian/OTTER-MPT1B-RPJama-Init" \
    --precision="bf16" \
    --batch_size=1 \
    --eval_coco \
    --device="cuda" \
    --coco_train_image_dir_path "/path/to/images/train2014" \
    --coco_val_image_dir_path "/path/to/coco/images/val2014" \
    --coco_karpathy_json_path "/path/to/dataset_coco.json" \
    --coco_annotations_json_path "/path/to/captions_val2014.json"
```
The shell script above runs the evaluation on the COCO dataset with the pre-trained Otter 1B model. However, the evaluation returns a CIDEr score of 0 for every few-shot setting.

Strangely, after I add the argument below to the run script, the evaluation returns normal scores for all tests:
`--checkpoint_path="path/to/checkpoint/OTTER-MPT1B-RPJama-Init/final_weights.pt"`
Here the .pt file is a model I fine-tuned myself.
It seems the pre-trained weights were not loaded properly? That would explain why the evaluation produces results once my own checkpoint file is loaded.

Could you help locate the problem? Which part of the code could be responsible for this bug? Thank you!
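
One detail in the script above that may be worth ruling out: it exports `WORLD_SIZE=4` while `CUDA_VISIBLE_DEVICES` lists a single GPU. Whether this matters depends on how the modified initialization code uses these variables, which is not shown here. The following is a minimal sketch of a pre-flight check, not part of the Otter pipeline; the helper names `count_visible_gpus` and `check_world_size` are hypothetical, and it assumes `CUDA_VISIBLE_DEVICES` is set explicitly as a comma-separated list.

```bash
# Hypothetical helper: count the ids in a CUDA_VISIBLE_DEVICES-style string.
# The body runs in a subshell, so the IFS and glob changes do not leak out.
# Assumption: the variable is set explicitly; an unset variable (which CUDA
# treats as "all GPUs visible") is not handled and counts as zero here.
count_visible_gpus() (
    devices="$1"
    if [ -z "$devices" ]; then
        echo 0
        exit 0
    fi
    IFS=','
    set -f              # disable globbing while splitting
    set -- $devices     # split on commas into positional parameters
    echo "$#"
)

# Hypothetical helper: succeed only if the world size equals the number
# of GPU ids listed. Arguments: <device list> <world size>.
check_world_size() (
    n="$(count_visible_gpus "$1")"
    if [ "$n" -ne "$2" ]; then
        echo "WORLD_SIZE=$2 but $n GPU id(s) listed in CUDA_VISIBLE_DEVICES" >&2
        exit 1
    fi
)
```

Calling `check_world_size "$CUDA_VISIBLE_DEVICES" "$WORLD_SIZE" || exit 1` just before the `python -m pipeline.eval.evaluate` line would stop the script early on a mismatch. With the values exported above (`"0"` and `4`) the check fails, so either the world size or the device list would need adjusting for it to pass.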

Luodian · 2023-11-01T04:36:52Z

Hi, the MPT1B-init weights are only meant for initializing training of the Otter-MPT-1B model. They have not been trained; they were migrated directly from OpenFlamingo-1B, with added special tokens. So evaluating these weights may not be suitable. Could you try the MPT7B version? Best regards, Bo

Thedatababbler · 2023-11-02T03:27:01Z

I replaced "luodian/OTTER-MPT1B-RPJama-Init" with "luodian/OTTER-Image-MPT7B" for the --model_path argument and ran the evaluation again. The CIDEr score is still 0.0 for all shots. It's really weird. What did I do wrong?

Luodian · 2023-11-02T03:28:55Z

@pufanyi Fanyi, could you take a look at this issue? I thought we ran the OTTER-MPT7B evaluation and reported good numbers on COCO.
