@RissyRan (Collaborator) commented Aug 26, 2025

Description

Onboard GPT OSS model:

  • Add two GPT OSS model configs
  • Add necessary configs in base.yml file (e.g., mlp_activations_limit, mlp_bias, attention_bias, attention_sink)
  • Add attention sink support for dot_product attention and enable it for flash_attention
  • Add decoder layers for scan and unscan features
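
For readers unfamiliar with the sink feature, here is a minimal sketch of one common formulation, assuming the sink is a single extra logit that joins the softmax normalization and whose probability is then discarded. It is plain Python for illustration only and is not the code in this PR; the names `softmax_with_sink`, `attend`, and `sink_logit` are hypothetical.

```python
import math


def softmax_with_sink(scores, sink_logit):
    """Softmax over `scores` plus one extra sink logit.

    The sink takes part in the normalization, but its own weight is dropped,
    so the returned weights can sum to less than 1.
    """
    m = max(max(scores), sink_logit)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    denom = sum(exps) + math.exp(sink_logit - m)
    return [e / denom for e in exps]


def attend(query, keys, values, sink_logit):
    """Single-query scaled dot-product attention with a sink logit (assumed form)."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax_with_sink(scores, sink_logit)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


# Two equal scores and an equal sink logit: each real key gets 1/3 of the mass.
weights = softmax_with_sink([0.0, 0.0], sink_logit=0.0)
```

With a very negative `sink_logit` the sink contributes nothing and the function reduces to an ordinary softmax, which is a convenient sanity check when comparing against an implementation without sinks.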

Next step: work on checkpoint (ckpt) conversion and verify logits.

Tests

  • Unit test check_gpt_vs_reference.py compares the attention and MLP blocks against a reference implementation
  • End-to-end functional tests for scan and unscan on gpt-oss-20b model
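
The contents of check_gpt_vs_reference.py are not shown in this conversation. As a rough illustration of what a comparison against a reference implementation usually involves, the sketch below checks two flattened outputs element by element within a tolerance. The helper names and the default tolerances are assumptions, not values taken from the PR.

```python
import math


def max_abs_diff(actual, expected):
    """Largest element-wise absolute difference between two equal-length sequences."""
    if len(actual) != len(expected):
        raise ValueError("outputs have different lengths")
    return max((abs(a - e) for a, e in zip(actual, expected)), default=0.0)


def allclose(actual, expected, rtol=1e-5, atol=1e-5):
    """True if every pair of elements agrees within the relative or absolute tolerance."""
    if len(actual) != len(expected):
        return False
    return all(
        math.isclose(a, e, rel_tol=rtol, abs_tol=atol)
        for a, e in zip(actual, expected)
    )
```

Reporting `max_abs_diff` alongside the pass/fail result tends to make a failing comparison easier to diagnose than a bare boolean.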

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed.

@RissyRan RissyRan force-pushed the gpt_oss_onboard branch 3 times, most recently from 8900a7b to 34457b6 Compare August 26, 2025 05:54
@RissyRan RissyRan marked this pull request as ready for review August 26, 2025 05:56
@RissyRan RissyRan marked this pull request as draft August 26, 2025 06:02
@RissyRan RissyRan marked this pull request as ready for review August 26, 2025 06:20
@RissyRan RissyRan force-pushed the gpt_oss_onboard branch 2 times, most recently from f271f48 to a5ef467 Compare August 26, 2025 19:56

@shuningjin (Collaborator) left a comment

Thank you for the great work!

@gagika (Collaborator) left a comment

A few more comments; if you agree, feel free to address them in a separate PR.

@RissyRan RissyRan force-pushed the gpt_oss_onboard branch 2 times, most recently from 78a574a to c710f50 Compare August 28, 2025 18:50

@aireenmei (Collaborator) left a comment

Thanks Ran!

@gagika (Collaborator) left a comment

Thanks

@copybara-service copybara-service bot merged commit 87edd11 into main Aug 29, 2025
21 of 22 checks passed
@copybara-service copybara-service bot deleted the gpt_oss_onboard branch August 29, 2025 20:51

6 participants