
About test code #3

Closed · aspenstarss opened this issue Jan 11, 2021 · 11 comments

@aspenstarss commented Jan 11, 2021

Hello Zhihong,
Thanks for open-sourcing your code. It's very nice work.
I'd like to ask you a few questions about reproducing the paper's results.

I evaluate the results by saving the generated sentences to a JSON file.
When I resume the model from the checkpoint you provided, using the following command:

CUDA_VISIBLE_DEVICES=4 python main.py \
--image_dir data/iu/images/ \
--ann_path data/iu/annotation.json \
--dataset_name iu_xray \
--max_seq_length 60 \
--threshold 3 \
--batch_size 16 \
--epochs 100 \
--save_dir results/reproduce_iu_xray \
--step_size 50 \
--gamma 0.1 \
--seed 9223 \
--resume data/model_iu_xray.pth

I see "Checkpoint loaded. Resume training from epoch 15." and the model generates the output JSON files.
I then use pycocoevalcap to evaluate the results. The results are as follows:

Bleu_1 Bleu_2 Bleu_3 Bleu_4 CIDEr ROUGE_L METEOR
0.4334 0.2863 0.2069 0.1554 0.5432 0.3245 0.1945
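
For reference, this is roughly how I compute these numbers over the saved JSON (a minimal sketch of my own evaluation script, not code from this repo; the file path and the 'filename'/'prediction'/'ground_truth' fields follow my saving function, which is quoted later in this thread):

    # Minimal evaluation sketch: score saved generations with pycocoevalcap.
    import json

    from pycocoevalcap.bleu.bleu import Bleu
    from pycocoevalcap.cider.cider import Cider
    from pycocoevalcap.meteor.meteor import Meteor
    from pycocoevalcap.rouge.rouge import Rouge

    with open('results/reproduce_iu_xray/Enc2Dec-15_test_generated.json') as f:
        records = json.load(f)

    # pycocoevalcap scorers expect dicts mapping an id to a list of sentences.
    gts = {r['filename']: [r['ground_truth']] for r in records}
    res = {r['filename']: [r['prediction']] for r in records}

    scorers = [
        (Bleu(4), ['Bleu_1', 'Bleu_2', 'Bleu_3', 'Bleu_4']),
        (Meteor(), 'METEOR'),
        (Rouge(), 'ROUGE_L'),
        (Cider(), 'CIDEr'),
    ]
    for scorer, names in scorers:
        score, _ = scorer.compute_score(gts, res)
        if isinstance(names, list):
            for name, value in zip(names, score):
                print(name, round(value, 4))
        else:
            print(names, round(score, 4))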

The scores seem to differ from the paper somewhere.
Could you share your test code or your generated-results JSON file?

@aspenstarss (Author) commented Jan 11, 2021

My generated sentences can be found in this gist.
Thanks for your attention.

@aspenstarss changed the title from "About reproducing paper results" to "About test code" on Jan 11, 2021
@ksz-creat commented, quoting @aspenstarss:

Hello Zhihong,
Thanks for open-sourcing your code. It's very nice work.
I'd like to ask you a few questions about reproducing the paper's results.

I evaluate the results by saving the generated sentences to a JSON file. To do this, I comment out the training code (Trainer.py, lines 190-202) and add the following function at Trainer.py line 177:

    def _output_generation(self, predictions, gts, idxs, epoch, subset):
        # Save each generated report next to its reference, tagged with a
        # per-sentence BLEU-4 score so the best/worst cases are easy to inspect.
        import json
        import os

        from nltk.translate.bleu_score import sentence_bleu

        output = list()
        for idx, pre, gt in zip(idxs, predictions, gts):
            # sentence_bleu expects a list of tokenized references.
            score = sentence_bleu([gt.split()], pre.split())
            output.append({'filename': idx, 'prediction': pre, 'ground_truth': gt, 'bleu4': score})

        # Sort by BLEU-4, best first, then write one JSON file per epoch/subset.
        output = sorted(output, key=lambda x: x['bleu4'], reverse=True)
        output_filename = os.path.join(self.checkpoint_dir, 'Enc2Dec-' + str(epoch) + '_' + subset + '_generated.json')
        with open(output_filename, 'w') as f:
            json.dump(output, f, ensure_ascii=False)

and call it at Trainer.py line 232:

self._output_generation(test_res, test_gts, test_idxs, epoch, 'test')

When I resume the model from the checkpoint you provided, using the following command:

CUDA_VISIBLE_DEVICES=4 python main.py \
--image_dir data/iu_2image/images/ \
--ann_path data/iu_2image/annotation.json \
--dataset_name iu_xray \
--max_seq_length 60 \
--threshold 3 \
--batch_size 16 \
--epochs 100 \
--save_dir results/reproduce_iu_xray \
--step_size 50 \
--gamma 0.1 \
--seed 9223 \
--resume data/model_iu_xray.pth

I see "Checkpoint loaded. Resume training from epoch 15." and the model generates the output JSON files.
I use pycocoevalcap to evaluate the results:

Bleu_1 Bleu_2 Bleu_3 Bleu_4 CIDEr ROUGE_L METEOR
0.4334 0.2863 0.2069 0.1554 0.5432 0.3245 0.1945

However, when I evaluate the same outputs with nltk.translate.bleu_score.sentence_bleu, I get the following BLEU scores:

Bleu_1 Bleu_2 Bleu_3 Bleu_4
0.4879 0.3194 0.2324 0.1772
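
For reference, this is roughly how I average the per-sentence NLTK scores (a sketch of my own script; note that averaging sentence-level BLEU with NLTK's default unsmoothed n-gram precisions is not the same computation as pycocoevalcap's corpus-level BLEU, which may explain part of the gap):

    # Sketch: average NLTK sentence-level BLEU over the saved generations.
    import json

    from nltk.translate.bleu_score import sentence_bleu

    with open('results/reproduce_iu_xray/Enc2Dec-15_test_generated.json') as f:
        records = json.load(f)

    for n in range(1, 5):
        # Uniform weights over 1..n-grams, i.e. BLEU-n.
        weights = (1.0 / n,) * n
        scores = [
            sentence_bleu([r['ground_truth'].split()], r['prediction'].split(), weights=weights)
            for r in records
        ]
        print('Bleu_%d' % n, round(sum(scores) / len(scores), 4))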

It seems I made a mistake somewhere.
Could you share your test code or your generated-results JSON file?

Best,
Shuxin

Hello, could you tell me where the mistake is? I would appreciate it.
Thanks

@nooralahzadeh commented Jan 27, 2021

Hi, thanks for sharing the code; it helps to reproduce the results.
When I run the code on the IU X-Ray dataset, I get the results below. It seems that you report the model's performance based on the best result on the test set rather than on the validation set,
and even that is lower than the number reported for this dataset in the paper. I wonder if your hyperparameter setting differs from the one in this GitHub repository.

Best results (w.r.t. BLEU_4) on the validation set: val_BLEU_4 : 0.13399466273794292

    epoch          : 22
    train_loss     : 0.5585661835395372
    val_BLEU_1     : 0.3783063834532518
    val_BLEU_2     : 0.24550515630336806
    val_BLEU_3     : 0.17719979948687553
    val_ROUGE_L    : 0.3390687946347631
    test_BLEU_1    : 0.38823129074169
    test_BLEU_2    : 0.24574379214373973
    test_BLEU_3    : 0.17582539135315156
    test_BLEU_4    : 0.13336767408445888
    test_ROUGE_L   : 0.3419288795162052

Best results (w.r.t. BLEU_4) on the test set: test_BLEU_4 : 0.15495773913794939

    epoch          : 15
    train_loss     : 0.9282983885361598
    val_BLEU_1     : 0.4048809446059833
    val_BLEU_2     : 0.24409295208500478
    val_BLEU_3     : 0.16905404129725854
    val_BLEU_4     : 0.1268627057419589
    val_ROUGE_L    : 0.3322505123128848
    test_BLEU_1    : 0.446254324796507
    test_BLEU_2    : 0.27826410927242545
    test_BLEU_3    : 0.20113763688850164
    test_ROUGE_L   : 0.35131416076389516
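
To make the selection issue concrete, this is the kind of post-hoc check I mean (a hypothetical sketch; 'log' is an assumed list of per-epoch metric dicts, filled in with rounded numbers from the two epochs above):

    # Hypothetical sketch: choose the epoch to report by validation BLEU-4,
    # then read off that same epoch's test metrics. Values are rounded from
    # the logs above; 'log' is an assumed structure, not the repo's code.
    log = [
        {'epoch': 15, 'val_BLEU_4': 0.1269, 'test_BLEU_4': 0.1550},
        {'epoch': 22, 'val_BLEU_4': 0.1340, 'test_BLEU_4': 0.1334},
    ]

    best_by_val = max(log, key=lambda e: e['val_BLEU_4'])
    best_by_test = max(log, key=lambda e: e['test_BLEU_4'])

    # Reporting best_by_test overstates performance; selecting on the
    # validation set is the standard protocol.
    print('selected by val:  epoch %(epoch)d, test_BLEU_4=%(test_BLEU_4).4f' % best_by_val)
    print('selected by test: epoch %(epoch)d, test_BLEU_4=%(test_BLEU_4).4f' % best_by_test)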

@ksz-creat commented (quoting @aspenstarss's first comment above):

Hello, I tried your code to produce the generated sentences, but it only generates 14 sentences. Could you tell me where the problem is? Thanks very much.

@zhjohnchan (Contributor) commented

Thanks for your attention to our paper!

I will add the features you mentioned in the future, after I finish some higher-priority work.

@mlii0117 commented Mar 4, 2021

Hi guys,
Firstly, thanks for sharing your code; it is really nice work.
When I used your code for training, I found the best results at epoch 1,
even though the training loss is at its highest there.
Do you have any idea about this?
Does it mean the current metric is not suitable for this task?
[Screenshot from 2021-03-04 11-57-38]
Did you find the same problem, @nooralahzadeh?
Thanks

@nooralahzadeh commented

@mlii0117 Can you give more info on how you ran the code and what parameter values you used? If you look at the generated reports, the model probably produces the same sentence for all cases, which is how it gets its best result at epoch 1.
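
A toy illustration of that failure mode (the strings here are invented, not from the dataset): one generic sentence emitted for every study can still collect substantial word overlap with short, formulaic references:

    # Toy demo with made-up strings: a degenerate model that repeats one
    # generic sentence can still score non-trivially on BLEU-1, because
    # common words overlap with short, formulaic reference reports.
    from nltk.translate.bleu_score import sentence_bleu

    generic = 'the heart is normal in size . the lungs are clear .'
    references = [
        'the heart is normal in size . the lungs are clear .',
        'heart size is normal . no focal consolidation .',
        'the lungs are clear . no pleural effusion .',
    ]

    scores = [
        sentence_bleu([ref.split()], generic.split(), weights=(1.0,))  # BLEU-1
        for ref in references
    ]
    print('mean BLEU-1 of the repeated sentence:', sum(scores) / len(scores))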

@mlii0117 commented Mar 4, 2021

@nooralahzadeh thanks for your reply. I have found the reason.

@luantunez commented

Thank you for your explanations. I was wondering about the content of annotation.json. What does it contain?
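
My current guess, judging only from how the data loader indexes the file by split (this structure is an assumption on my part, not confirmed by the authors):

    # Assumed structure of annotation.json (my guess, not confirmed):
    # one list of examples per split, each pairing image paths with the
    # reference report. The ids and paths below are hypothetical.
    assumed_annotation = {
        'train': [
            {
                'id': 'CXR1000_IM-0003',
                'image_path': ['CXR1000_IM-0003/0.png', 'CXR1000_IM-0003/1.png'],
                'report': 'the heart is normal in size . the lungs are clear .',
                'split': 'train',
            },
        ],
        'val': [],   # same record format
        'test': [],  # same record format
    }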

@Markin-Wang commented

(Quoting @mlii0117: "@nooralahzadeh thanks for your reply. I have found the reason.")

Hi, I also have this problem. Could you share what causes it and how to solve it?
Thanks.

@wlufy commented Apr 11, 2022

(Quoting @nooralahzadeh's comment above in full.)

Sorry to bother you.
I get a similar result, and like @mlii0117 I find that these metrics are very high in the first few epochs. Can you give some advice on this problem and how to solve it?
Thanks!
