
Optimize the log and enable printing the number of tokens per second. #7853

Merged
merged 6 commits into PaddlePaddle:develop from log_ips on Mar 6, 2024

Conversation

Xreki
Contributor

@Xreki Xreki commented Jan 17, 2024

PR types

Others

PR changes

Others

Description

Optimize the logging so that training prints tokens/s/device at every step. The pre-training log looks like the following (updated: the original interval_samples_per_second entry is kept; pre-training just adds a new interval_tokens_per_second_per_device entry):
[screenshot of the pre-training log output]
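
For context, the new number is derived from the existing samples/s figure. Below is a minimal sketch (illustrative only, not the PR's actual helper; the function name and arguments are assumptions) of how an interval tokens/s/device value relates to interval_samples_per_second, following the formula used in this PR (samples/s * seq_length / world size):

import paddle.distributed as dist

def interval_speed_metrics(num_samples, elapsed_seconds, seq_length=None):
    # Sketch only: per-interval throughput numbers for the training log.
    metrics = {}
    samples_per_second = num_samples / elapsed_seconds
    metrics["interval_samples_per_second"] = round(samples_per_second, 4)
    if seq_length is not None:
        # tokens/s/device = (samples/s) * (tokens per sample) / (number of devices)
        tokens_per_device = samples_per_second * seq_length / dist.get_world_size()
        metrics["interval_tokens_per_second_per_device"] = round(tokens_per_device, 4)
    return metrics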


paddle-bot bot commented Jan 17, 2024

Thanks for your contribution!


codecov bot commented Feb 28, 2024

Codecov Report

Attention: Patch coverage is 77.77778%, with 2 lines in your changes missing coverage. Please review.

Project coverage is 56.42%. Comparing base (37e85e6) to head (e5b85ff).
Report is 3 commits behind head on develop.

Files                               Patch %   Lines
paddlenlp/trainer/trainer_utils.py  66.66%    2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #7853      +/-   ##
===========================================
- Coverage    56.56%   56.42%   -0.15%     
===========================================
  Files          589      589              
  Lines        89964    90258     +294     
===========================================
+ Hits         50889    50924      +35     
- Misses       39075    39334     +259     

☔ View full report in Codecov by Sentry.

@@ -1230,12 +1230,14 @@ def _maybe_log_save_evaluate(self, tr_loss, model, epoch, ignore_keys_for_eval,
self.args.train_batch_size * self.args.gradient_accumulation_steps * self.args.dataset_world_size
)
num_steps = self.state.global_step - self._globalstep_last_logged
seq_length = getattr(self.model.config, "seq_length", None) if hasattr(self.model, "config") else None
logs.update(
Contributor

Is it convenient to get this seq_length from the config? Also, are you sure this field is always reset according to the training configuration during training?

Contributor

If the seq_len for SFT is uncertain, then there is indeed a problem.

Contributor Author

Is it convenient to get this seq_length from the config? Also, are you sure this field is always reset according to the training configuration during training?

In pre-training, the user-specified max_seq_length is currently used to set config.seq_length; the code is as follows:

config.seq_length = data_args.max_seq_length

If the seq_len for SFT is uncertain, then there is indeed a problem.

Pre-training currently goes through the PretrainingTrainer class uniformly, so I added an is_pretraining flag to that class; it does not affect the fine-tuning logs. What do you think?
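
As a rough sketch of that idea (assumed shape only, not the PR's actual diff), the flag could be set on the pre-training trainer subclass and checked before the token-rate entry is added, so SFT logs stay unchanged:

from paddlenlp.trainer import Trainer

class PretrainingTrainer(Trainer):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Marks this trainer as pre-training, where config.seq_length is known to be
        # set from max_seq_length, so token-rate logging can be enabled safely.
        self.is_pretraining = True

# In the logging path (illustrative, shown as a comment because it lives inside the
# base Trainer): only look up seq_length when the trainer is a pre-training one.
# seq_length = getattr(self.model.config, "seq_length", None) if getattr(self, "is_pretraining", False) else None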

Comment on lines 330 to 331
tokens_per_second_per_device = samples_per_second * seq_length / paddle.distributed.get_world_size()
result[f"{split}_tokens(tokens/sec/device)"] = round(tokens_per_second_per_device, 4)
Contributor

Here, most of the metrics the trainer prints are global metrics, so this one is slightly inconsistent with the others.

Contributor Author

Pre-training now mostly reports tokens/s/card. We can check whether a global tokens/s metric is still needed, but that would make the log rather long.
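
For reference, a global figure would just scale the per-device value by the device count. A hypothetical sketch (not part of this PR; the key name is an assumption):

import paddle.distributed as dist

def add_global_token_rate(result, split, tokens_per_second_per_device):
    # Hypothetical extra entry: aggregate tokens/s across all devices.
    tokens_per_second = tokens_per_second_per_device * dist.get_world_size()
    result[f"{split}_tokens_per_second"] = round(tokens_per_second, 4)
    return result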

Comment on lines 330 to 331
tokens_per_second_per_device = samples_per_second * seq_length / paddle.distributed.get_world_size()
result[f"{split}_tokens(tokens/sec/device)"] = round(tokens_per_second_per_device, 4)
Contributor

Suggested change
  tokens_per_second_per_device = samples_per_second * seq_length / paddle.distributed.get_world_size()
- result[f"{split}_tokens(tokens/sec/device)"] = round(tokens_per_second_per_device, 4)
+ result[f"{split}_tokens_per_sec_per_device"] = round(tokens_per_second_per_device, 4)

Contributor Author

done

ZHUI previously approved these changes Feb 29, 2024
@wawltor wawltor merged commit 092c845 into PaddlePaddle:develop Mar 6, 2024
7 of 10 checks passed
@Xreki Xreki deleted the log_ips branch March 6, 2024 02:39
Labels: none yet
Projects: none yet
Linked issues: none yet
3 participants