
support neo++ fp8 #993

Merged
llmc-reviewer merged 1 commit into main from neo on Apr 9, 2026
Conversation

@helloyongyang
Contributor

No description provided.

@llmc-reviewer llmc-reviewer merged commit 73147d4 into main Apr 9, 2026
2 checks passed
@llmc-reviewer llmc-reviewer deleted the neo branch April 9, 2026 07:04

@gemini-code-assist (Bot) left a comment


Code Review

This pull request adds FP8 quantization support for the NeoPP model, including new configuration files and example scripts for 1k and 2k resolutions. It also updates weight registration in the transformer architecture to explicitly handle null biases and to use default weight types for specific heads. The review flags several inconsistencies in the new 2k resolution example scripts where the index_offset_cond parameter does not match the offset encoded in the KV cache filenames, which would likely cause incorrect RoPE indexing during inference.

"/data/nvme1/yongyang/FL/neo_9b_new/vlm_tensor_44000_ema_2k/to_x2v_uncond_kv_1_12.pt",
)
pipe.runner.set_inference_params(
index_offset_cond=366,

Severity: high

The index_offset_cond value (366) does not match the offset indicated in the filename of the KV cache being loaded on line 48 (..._1_360.pt). This inconsistency will likely lead to incorrect RoPE indexing during inference. It appears this value was copy-pasted from the 1k example without adjustment.

Suggested change:
- index_offset_cond=366,
+ index_offset_cond=360,
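Since the reviewer suspects this value was copy-pasted from the 1k example without adjustment, one way to keep the parameter and the data in sync is to derive the offset from the KV cache path itself instead of hard-coding it. Below is a minimal sketch assuming the naming convention the review describes, where the trailing integer before `.pt` encodes the offset; `offset_from_kv_filename` is a hypothetical helper, not part of the repository:

```python
import re


def offset_from_kv_filename(path: str) -> int:
    """Extract the RoPE index offset from a KV cache filename.

    Assumes (per the review comments) that cache files follow the
    pattern ``..._kv_<step>_<offset>.pt``, so the last integer before
    the ``.pt`` suffix is the offset to pass as index_offset_cond.
    """
    match = re.search(r"_kv_\d+_(\d+)\.pt$", path)
    if match is None:
        raise ValueError(f"cannot parse offset from {path!r}")
    return int(match.group(1))


kv_path = "to_x2v_uncond_kv_1_360.pt"
# Instead of a hard-coded literal, the script could then call:
# pipe.runner.set_inference_params(index_offset_cond=offset_from_kv_filename(kv_path))
```

This removes the copy-paste hazard entirely: if the cache file changes, the offset follows automatically, and a malformed filename fails loudly rather than silently mis-indexing RoPE.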

"/data/nvme1/yongyang/FL/neo_9b_new/vlm_tensor_44000_ema_2k/to_x2v_uncond_kv_2_15.pt",
)
pipe.runner.set_inference_params(
index_offset_cond=441,

Severity: high

The index_offset_cond value (441) does not match the offset in the filename of the KV cache being loaded on line 71 (..._2_439.pt). Please ensure the parameter matches the actual data being loaded.

Suggested change:
- index_offset_cond=441,
+ index_offset_cond=439,

"/data/nvme1/yongyang/FL/neo_9b_new/vlm_tensor_44000_ema_2k/to_x2v_uncond_kv_1_12.pt",
)
pipe.runner.set_inference_params(
index_offset_cond=366,

Severity: high

The index_offset_cond value (366) does not match the offset in the filename of the KV cache being loaded on line 48 (..._1_360.pt).

Suggested change:
- index_offset_cond=366,
+ index_offset_cond=360,

"/data/nvme1/yongyang/FL/neo_9b_new/vlm_tensor_44000_ema_2k/to_x2v_uncond_kv_2_15.pt",
)
pipe.runner.set_inference_params(
index_offset_cond=441,

Severity: high

The index_offset_cond value (441) does not match the offset in the filename of the KV cache being loaded on line 71 (..._2_439.pt).

Suggested change:
- index_offset_cond=441,
+ index_offset_cond=439,
