Labels
question (Further information is requested)
Description
Your Question
I followed the example in the https://thudm.github.io/slime/zh/advanced/low-precision.html documentation to run FP8 training. TransformerEngine ultimately saved the torch_dist weights in FP16. After converting them to HF format, is the subsequent FP16-to-FP8 conversion lossless? If there is still a training/inference mismatch, how can I align the converted weights with the FP8 representation TransformerEngine used during training?
Currently I use the tool provided in slime for the FP16-to-FP8 conversion: tools/convert_hf_to_fp8.py
What I've Tried
I have tried FP8 training.
Environment (if relevant)
- slime version:
- Python version:
- PyTorch version:
- CUDA/ROCm version:
- GPU type and count:
- OS:
Additional Context
No response
Pre-submission Checklist
- I have read the CONTRIBUTING.md and understand the collaboration scope.
- I have read the documentation and FAQ and my question is not answered there.
- I have searched for existing issues and my question has not been asked before.