Infinite loss when fine-tuning #50

Open
Spongeorge opened this issue Apr 29, 2023 · 1 comment

Comments

@Spongeorge

I'm trying to fine-tune the AMR3.0 Structured-BART large checkpoint on another dataset, but during training I get the following warnings and an infinite training loss:

2023-04-29 00:02:05 | WARNING | tensorboardX.x2num | NaN or Inf found in input tensor.
2023-04-29 00:02:05 | WARNING | tensorboardX.x2num | NaN or Inf found in input tensor.
2023-04-29 00:02:05 | WARNING | tensorboardX.x2num | NaN or Inf found in input tensor.
2023-04-29 00:02:05 | WARNING | tensorboardX.x2num | NaN or Inf found in input tensor.
2023-04-29 00:02:05 | INFO | train | {"epoch": 1, "train_loss": "inf", "train_nll_loss": "inf", "train_loss_seq": "inf", "train_nll_loss_seq": "inf", "train_loss_pos": "0.710562", "train_nll_loss_pos": "0.710562", "train_wps": "687.9", "train_ups": "0.51", "train_wpb": "1354.7", "train_bsz": "55.2", "train_num_updates": "71", "train_lr": "1.87323e-06", "train_gnorm": "17.868", "train_loss_scale": "8", "train_train_wall": "45", "train_wall": "158"}
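
(The tensorboardX warnings just mean a non-finite scalar was passed to the logger; there are four of them, which would line up with the four inf entries in the train line above. To keep the report short I only copied the relevant lines; something like the command below pulls them out of a full training log, where the log path is just a placeholder for wherever the run writes it:)

# Show only the non-finite warnings and the loss / loss-scale line from the run
# (the log path is a placeholder, not a path from the repo).
grep -E "NaN or Inf|loss_scale" /path/to/train.log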

In my config I set the fairseq-preprocess arguments as:

FAIRSEQ_PREPROCESS_FINETUNE_ARGS="--srcdict /content/DATA/AMR3.0/models/amr3.0-structured-bart-large-neur-al/seed42/dict.en.txt --tgtdict /content/DATA/AMR3.0/models/amr3.0-structured-bart-large-neur-al/seed42/dict.actions_nopos.txt"

and train args as:

FAIRSEQ_TRAIN_FINETUNE_ARGS="--finetune-from-model /content/DATA/AMR3.0/models/amr3.0-structured-bart-large-neur-al/seed42/checkpoint_wiki.smatch_top5-avg.pt --memory-efficient-fp16 --batch-size 16 --max-tokens 512 --patience 10"
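
One thing I'm starting to suspect is the fp16 setting: if the default initial loss scale is in use (128, I believe), the train_loss_scale of 8 in the log means fairseq's dynamic loss scaler has already been halved several times, which usually points at overflows under fp16. Assuming these variables are passed straight through to fairseq-train, I'm planning to compare against an fp32 run, i.e. the same arguments with the fp16 flag dropped:

# Same fine-tuning arguments, but without --memory-efficient-fp16, to check
# whether the loss stays finite in fp32.
FAIRSEQ_TRAIN_FINETUNE_ARGS="--finetune-from-model /content/DATA/AMR3.0/models/amr3.0-structured-bart-large-neur-al/seed42/checkpoint_wiki.smatch_top5-avg.pt --batch-size 16 --max-tokens 512 --patience 10"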

Any ideas as to what I'm doing wrong?
Thanks in advance.

Spongeorge commented May 2, 2023

Output from tests/correctly_installed.sh:

pytorch 1.10.1+cu102
cuda 10.2
Apex not installed
smatch installed
pytorch-scatter installed
fairseq works
[OK] correctly installed

I also tried with the wiki25 dataset downloaded by tests/minimal_test.sh and got the same issue, infinite loss in both training and validation, so I don't think it's an issue with my input data. When tests/minimal_test.sh itself runs, though, the loss isn't infinite.
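
To narrow down where it blows up, I'm also going to rerun the wiki25 fine-tuning with per-update logging, so I can see whether the loss is already non-finite at the very first update or only diverges after a few steps (assuming --log-interval is passed through to fairseq-train like the other flags):

# Same arguments, but log every update so the first non-finite loss is visible.
FAIRSEQ_TRAIN_FINETUNE_ARGS="--finetune-from-model /content/DATA/AMR3.0/models/amr3.0-structured-bart-large-neur-al/seed42/checkpoint_wiki.smatch_top5-avg.pt --memory-efficient-fp16 --batch-size 16 --max-tokens 512 --patience 10 --log-interval 1"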
