Added global finetuning & validation loss early stopping & gemma support #50
Conversation
convert_to_hf.py
Outdated
try:
    import safetensors
except:
Let's narrow down the exception case with except ModuleNotFoundError:
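A minimal sketch of the suggested pattern (the `HAS_SAFETENSORS` flag is illustrative, not part of the PR): catching only `ModuleNotFoundError` means a genuinely missing package is handled gracefully, while unrelated errors raised during import are not silently swallowed by a bare `except`.

```python
# Catch only the missing-package case; any other ImportError or
# exception raised inside the module still propagates.
try:
    import safetensors
    HAS_SAFETENSORS = True
except ModuleNotFoundError:
    HAS_SAFETENSORS = False
```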
.view(outs_batch.shape[0], -1)
.mean(dim=1)
.sqrt()
(outs_batch - outs_tensor[j].to(device)).float().square().view(batch_size, -1).mean(dim=-1)
Minor: the new formula looks similar to the previous one, but subtly different: it divides by squared norm instead of variance/std (without subtracting mean).
This looks like a minor change, but let's double-check: is it intentional?
I think this makes more sense: it is a signal-to-noise ratio.
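An illustrative NumPy sketch (not the repo's exact code) contrasting the two per-sample normalizations discussed above: the "old" style divides the mean squared error by the variance of the reference activations, while the "new" style divides by their mean squared value, i.e. an inverse signal-to-noise ratio with no mean subtraction. The function name and shapes are assumptions for demonstration.

```python
import numpy as np

def relative_error_variants(outs, ref):
    """Per-sample relative error under the two normalizations."""
    err = ((outs - ref) ** 2).reshape(outs.shape[0], -1).mean(axis=1)
    ref_flat = ref.reshape(ref.shape[0], -1)
    by_variance = err / ref_flat.var(axis=1)            # "old"-style: variance-normalized
    by_sq_norm = err / (ref_flat ** 2).mean(axis=1)     # "new"-style: SNR-normalized
    return by_variance, by_sq_norm
```

For zero-mean activations the two coincide; they diverge when the reference has a large mean component, which is exactly the subtle difference flagged above.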
@@ -724,6 +766,12 @@ def update_outs_parallel(
    default=None,
    help="(finetuning only) Per-device and per-forward-pass batch size used to accumulate global --batch_size",
)
parser.add_argument(
Do we have a pair of apples-to-apples evaluations where validation early stopping improves the final model, as opposed to the naive (previous) early stopping? If yes, please attach links to wandb experiments.
There is no apples-to-apples comparison.
I observed that training is generally more robust to the choice of learning rate with this option.
The improvement is typically on the order of ~0.02 to 0.05 compared to our best runs.
In my opinion, this option is better than relative_mse_tolerance.
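A minimal sketch of validation-loss early stopping with patience, assuming hypothetical `step_fn`/`evaluate` callables and a `patience` parameter (none of these names are the PR's actual API): training stops once the held-out loss fails to improve for several consecutive evaluations, rather than when a relative-MSE threshold is hit.

```python
def train_with_early_stopping(step_fn, evaluate, max_steps, patience=3):
    """Run up to max_steps, stopping when validation loss stops improving."""
    best_loss, bad_evals = float("inf"), 0
    for step in range(max_steps):
        step_fn(step)              # one optimization step
        val_loss = evaluate()      # loss on held-out calibration data
        if val_loss < best_loss:
            best_loss, bad_evals = val_loss, 0
        else:
            bad_evals += 1
            if bad_evals >= patience:
                break              # no improvement for `patience` evals
    return best_loss
```

Compared to a fixed tolerance, this keeps training as long as it helps and cuts it off once overfitting to the calibration set begins.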
This run
https://wandb.ai/rock-and-roll/PPL_LLAMA_2/runs/anhk6jv6?nw=nwuserspiridon_sun_rotator
wikitext2 ppl = 6.22
is better than what we had in the paper
https://wandb.ai/rock-and-roll/PPL_LLAMA_2/runs/whmdskj8?nw=nwuserspiridon_sun_rotator
wikitext2 ppl = 6.31
src/modelutils.py
Outdated
print("Loading quantized model ...")
model = load_quantized_model(model, load_quantized)
# TODO works only for Llama
If this is still true, please make it into an assert statement (e.g. assert config["model_type"] == "llama").
In this pull request, the following features are added: