OpenDocCN / huggingface-doc-zh Public

generated from OpenDocCN/doc-template


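TRL 0.7 Chinese Documentation
Get Started
TRL - Transformer Reinforcement Learning
Quickstart
Installation
Training FAQ
Using the Model After Training
Training Customization
Logging
API
Models
Trainers
Reward Modeling
Supervised Fine-tuning Trainer
PPO Trainer
Best of N Sampling: an alternative way to get better model output without RL-based fine-tuning
DPO Trainer
Denoising Diffusion Policy Optimization
Iterative Trainer
Text Environments
Examples
Examples
Sentiment Tuning Examples
Examples of using peft with trl to fine-tune 8-bit models with Low-Rank Adaptation (LoRA)
Detoxifying a Language Model Using PPO
Using LLaMA Models with TRL
Learning Tools (Experimental 🧪)
Multi Adapter RL (MARL) - a single base model for everything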
