Conversation
Code Review
This pull request introduces FP8 quantization support for the NeoPP model, including new configuration files and example scripts for 1k and 2k resolutions. It also updates weight registration in the transformer architecture to explicitly handle null biases and use default weight types for specific heads. Feedback is provided regarding several inconsistencies in the new 2k resolution example scripts where the index_offset_cond parameter does not match the provided KV cache filenames, which would likely cause incorrect RoPE indexing during inference.
| "/data/nvme1/yongyang/FL/neo_9b_new/vlm_tensor_44000_ema_2k/to_x2v_uncond_kv_1_12.pt", | ||
| ) | ||
| pipe.runner.set_inference_params( | ||
| index_offset_cond=366, |
The index_offset_cond value (366) does not match the offset indicated in the filename of the KV cache being loaded on line 48 (..._1_360.pt). This inconsistency will likely lead to incorrect RoPE indexing during inference. It appears this value was copy-pasted from the 1k example without adjustment.
Suggested change:

```diff
-    index_offset_cond=366,
+    index_offset_cond=360,
```
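The mismatch the reviewer flags could be caught automatically. Assuming the trailing number in the KV cache filename encodes the intended RoPE offset (a convention inferred from the `..._1_360.pt` reference above, not confirmed by the PR), a minimal sanity check might look like this; both helper names are hypothetical, not part of the codebase:

```python
import re


def offset_from_kv_filename(path: str) -> int:
    """Parse the trailing number from a KV cache filename,
    e.g. 'to_x2v_uncond_kv_1_360.pt' -> 360.

    Assumes the naming convention '<name>_<idx>_<offset>.pt'; adjust
    the pattern if the actual convention differs."""
    m = re.search(r"_(\d+)_(\d+)\.pt$", path)
    if m is None:
        raise ValueError(f"unrecognized KV cache filename: {path}")
    return int(m.group(2))


def check_index_offset(kv_path: str, index_offset_cond: int) -> None:
    """Raise if index_offset_cond disagrees with the offset encoded
    in the KV cache filename, instead of silently mis-indexing RoPE."""
    expected = offset_from_kv_filename(kv_path)
    if index_offset_cond != expected:
        raise ValueError(
            f"index_offset_cond={index_offset_cond} does not match "
            f"offset {expected} encoded in {kv_path}"
        )
```

Calling `check_index_offset(...)` right before `pipe.runner.set_inference_params(...)` in each example script would have surfaced the copy-paste error at startup rather than as silently wrong inference output.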
| "/data/nvme1/yongyang/FL/neo_9b_new/vlm_tensor_44000_ema_2k/to_x2v_uncond_kv_2_15.pt", | ||
| ) | ||
| pipe.runner.set_inference_params( | ||
| index_offset_cond=441, |