
FIX: TRL trainer preprocessing step was running in one process #1583

Conversation

ali-mosavian
Contributor

Description

We weren't passing dataset_num_proc to the TRL training config, so the initial data preprocessing steps in the TRL trainer were running in a single process only.

Motivation and Context

Significantly speeds up training start time, depending on how many logical cores are available.

How has this been tested?

Tested it with the ORPO trainer in axolotl.

Social Handles (Optional)

https://www.linkedin.com/in/ali-mosavian-7a27457/
https://github.com/ali-mosavian

Collaborator

@winglian winglian left a comment


thanks!

…ainer args and directly to the trainer when DPO
@ali-mosavian
Contributor Author

I made a mistake: I sent it as a trainer arg in all cases, but it only applies to the CPO, KTO and ORPO trainer args. For DPO it needs to be passed directly to the trainer. My latest commit in the PR fixes that.

@winglian winglian merged commit b9bb169 into OpenAccess-AI-Collective:main May 3, 2024
7 checks passed