
[Question] Question about FP8 training #1618

@ifififa

Description


Your Question

I ran FP8 training following the example in the docs at https://thudm.github.io/slime/zh/advanced/low-precision.html. TransformerEngine ultimately saved the torch_dist weights in FP16. After converting them to HF format, is the subsequent FP16-to-FP8 conversion lossless? If a training/inference mismatch remains, how can I align the converted weights with the FP8 used during TransformerEngine training?

Currently the FP16-to-FP8 conversion uses the tool provided in slime: tools/convert_hf_to_fp8.py
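For context on why an FP16-to-FP8 cast is generally not lossless: FP8 E4M3 (the format TransformerEngine typically uses for weights) has only 3 mantissa bits, so most FP16 values must be rounded to the nearest representable FP8 value. A minimal pure-Python sketch of E4M3 rounding, assuming round-to-nearest, saturation at the E4M3 max of 448, and ignoring per-tensor scaling factors (which shift, but do not eliminate, the rounding error); this is an illustration, not the actual logic of tools/convert_hf_to_fp8.py:

```python
import math

def quantize_e4m3(x: float) -> float:
    """Round a float to the nearest FP8 E4M3 value.

    E4M3: 1 sign bit, 4 exponent bits (bias 7), 3 mantissa bits.
    Max normal value is 448; values keep only 3 fractional bits
    after the leading 1 (or a fixed 2**-9 step in the subnormal
    range below 2**-6).
    """
    if x == 0.0:
        return 0.0
    sign = math.copysign(1.0, x)
    a = min(abs(x), 448.0)          # saturate to the E4M3 max
    _, e = math.frexp(a)            # a = m * 2**e with m in [0.5, 1)
    exp = e - 1                     # exponent with mantissa in [1, 2)
    if exp < -6:
        step = 2.0 ** -9            # subnormal range: fixed step
    else:
        step = 2.0 ** (exp - 3)     # 3 mantissa bits -> step of 2**(exp-3)
    return sign * round(a / step) * step

# Most values move: 0.3 is not representable in E4M3.
print(quantize_e4m3(0.3))    # -> 0.3125
# Values that already fit in 3 mantissa bits survive unchanged.
print(quantize_e4m3(0.5))    # -> 0.5
```

If the FP16 checkpoint values were themselves produced by casting back from FP8 during training, they already sit on the FP8 grid and the round trip can be exact; otherwise, whether the result matches training depends on the FP8 format and the scaling-factor scheme used by tools/convert_hf_to_fp8.py agreeing with TransformerEngine's.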

What I've Tried

I have tried FP8 training.

Environment (if relevant)

  • slime version:
  • Python version:
  • PyTorch version:
  • CUDA/ROCm version:
  • GPU type and count:
  • OS:

Additional Context

No response


Labels: question (further information is requested)
