Fix convergence issues and switch to LLaMA in the SST-2 example #343
This PR switches the SST-2 example notebook to LLaMA and fixes the NaNs that appeared during training by using proper mixed precision with loss scaling. I've removed deep p-tuning for now for the sake of simplicity and will restore it once I fix the PersonaChat notebook as well.
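For context, the loss-scaling fix follows PyTorch's standard `autocast` + `GradScaler` pattern, which skips optimizer steps whenever scaled gradients overflow to inf/NaN. A minimal sketch with a stand-in model (not the notebook's actual code):

```python
import torch
from torch.cuda.amp import GradScaler, autocast

model = torch.nn.Linear(16, 2).cuda()          # stand-in for the classifier head
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scaler = GradScaler()                          # scales the loss to avoid fp16 underflow

for step in range(100):
    inputs = torch.randn(8, 16, device="cuda")
    labels = torch.randint(0, 2, (8,), device="cuda")
    optimizer.zero_grad()
    with autocast():                           # forward pass in mixed precision
        logits = model(inputs)
        loss = torch.nn.functional.cross_entropy(logits, labels)
    scaler.scale(loss).backward()              # backprop on the scaled loss
    scaler.step(optimizer)                     # unscales grads; skips the step on inf/NaN
    scaler.update()                            # adjusts the scale factor dynamically
```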
The example run is here: https://wandb.ai/mryab/bloom-sst-2/runs/3e0suayd?workspace=user-mryab. As we can see, the loss is still a bit unstable, but at least there is no outright divergence.
When merged, this should fix #327, #283 and #241.