
add feature cosine similarity loss #3218

Merged
copybara-service[bot] merged 1 commit into main from cos_loss on Feb 26, 2026

Conversation


@entrpn (Collaborator) commented on Feb 23, 2026

Description

Adds an optional cosine similarity loss between the attention outputs of various teacher and student layers.
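
For illustration, a minimal JAX sketch of a feature cosine similarity loss between teacher and student activations; the function and argument names are assumptions for this example, not the PR's actual API:

```python
import jax.numpy as jnp

def cosine_similarity_loss(student_acts, teacher_acts, eps=1e-8):
  """Mean (1 - cosine similarity) over the feature axis.

  Both inputs are assumed to have shape [batch, seq_len, hidden_dim].
  """
  s = student_acts / (jnp.linalg.norm(student_acts, axis=-1, keepdims=True) + eps)
  t = teacher_acts / (jnp.linalg.norm(teacher_acts, axis=-1, keepdims=True) + eps)
  cos_sim = jnp.sum(s * t, axis=-1)   # [batch, seq_len]
  return jnp.mean(1.0 - cos_sim)      # scalar; 0 when the features are aligned
```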

Tests

  • Updated train_distill_test unit test.
  • Ran train_distill.py on Llama3.1-8B with the cosine loss both enabled and disabled.
  • Updated maxtext_utils tests.

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have added necessary comments to my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

codecov bot commented Feb 23, 2026

@entrpn force-pushed the cos_loss branch 3 times, most recently from fcb8fe8 to 07ff5de, on February 25, 2026 01:08
@entrpn force-pushed the cos_loss branch 2 times, most recently from 5e27282 to 1f3cd42, on February 25, 2026 22:09
@vlad-karp (Collaborator) left a comment

A couple of comments

@vlad-karp (Collaborator) left a comment

LGTM overall

```python
match nested_key:
  case "out_projection_activations":
    if nested_key in model.decoder.layers["self_attention"]:
      intermediate_value = model.decoder.layers["self_attention"][nested_key].get_value()[-1]
```
Collaborator

Why is it only returning the last element, i.e. [-1]?
What is the shape of model.decoder.layers["self_attention"][nested_key]? Wondering what is getting dropped.

Collaborator Author

This is because sow appends values to a tuple, so [-1] is just a way to retrieve the sown value.
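
For context, a minimal Flax Linen sketch of how sow accumulates values into a tuple; the module, variable names, and access pattern here are assumptions for illustration and differ from MaxText's actual accessor (e.g. get_value()):

```python
import jax
import jax.numpy as jnp
import flax.linen as nn

class TinyAttn(nn.Module):
  @nn.compact
  def __call__(self, x):
    out = nn.Dense(4)(x)  # stand-in for the attention output projection
    # sow appends each call's value to a tuple in the "intermediates" collection
    self.sow("intermediates", "out_projection_activations", out)
    return out

model = TinyAttn()
x = jnp.ones((1, 4))
params = model.init(jax.random.PRNGKey(0), x)["params"]
_, state = model.apply({"params": params}, x, mutable=["intermediates"])
# The sown entry is a tuple; [-1] retrieves the most recently appended value.
latest = state["intermediates"]["out_projection_activations"][-1]
```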

Collaborator

In a follow-up PR, could you add a comment describing what's inside the tuple and what is being retrieved here?

"""Computes Eval Loss and returns empty aux dict (required for consistency)."""
# Parent logic for task loss
# We re-implement simple CE here to ensure float32 casting
s_logits = student_output.astype(jnp.float32)
Collaborator

Should this be s_logits = student_output[0].astype(jnp.float32), similar to compute_loss, now that model_forward_fn returns a tuple? Also, we could add a TODO to add other metrics in eval.

It looks like our tests are not testing this function?
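
For reference, a minimal sketch of the suggested indexing fix, assuming model_forward_fn returns a (logits, activations) tuple; the function and argument names here are illustrative only:

```python
import jax
import jax.numpy as jnp

def compute_eval_loss(student_output, one_hot_targets):
  # student_output is assumed to be (logits, sown_activations),
  # so [0] selects the logits; cast to float32 for a numerically stable CE.
  s_logits = student_output[0].astype(jnp.float32)
  log_probs = jax.nn.log_softmax(s_logits, axis=-1)
  return -jnp.mean(jnp.sum(one_hot_targets * log_probs, axis=-1))
```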

Collaborator Author

Yes, this will cause eval to fail. I'll fix it and add unit tests.

@gagika (Collaborator) left a comment

thanks

copybara-service bot merged commit 5a4a9c3 into main on Feb 26, 2026
163 checks passed
copybara-service bot deleted the cos_loss branch on February 26, 2026 07:16