Is the format reward always 0 during 1.5B training? #73

@zhaoyang123638

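During full-parameter fine-tuning of the 1.5B model, the format reward stays at 0 throughout. The training curves published by the author appear to show the same result. Should the format reward condition be relaxed?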
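
To make the question concrete, here is a minimal sketch of what "relaxing" the condition could mean. The `<think>`/`<answer>` tag layout and both function names are assumptions for illustration only; they are not taken from this repository's reward code.

```python
import re

# Hypothetical tag layout; the repository's actual format is not shown in this issue.
STRICT = re.compile(r"<think>.*?</think>\s*<answer>.*?</answer>", re.DOTALL)
RELAXED = re.compile(r"<think>.*?</think>.*?<answer>.*?</answer>", re.DOTALL)


def strict_format_reward(text: str) -> float:
    """Return 1.0 only if the entire completion is a think block followed by an answer block."""
    return 1.0 if STRICT.fullmatch(text) else 0.0


def relaxed_format_reward(text: str) -> float:
    """Return 1.0 if both blocks appear in order, tolerating extra text around them."""
    return 1.0 if RELAXED.search(text) else 0.0


if __name__ == "__main__":
    noisy = "Sure! <think>2+2</think>\nSo: <answer>4</answer> Done."
    print(strict_format_reward(noisy), relaxed_format_reward(noisy))
```

A small model tends to emit stray text before or after the tags, so a whole-string match like the strict variant can stay at 0 even when both blocks are present. Whether that is what happens here would need to be checked against sampled completions.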
