[ModelZoo] Refactor ERNIE-M usage in Model Zoo #4324
Conversation
Thanks for your contribution!
Codecov Report
@@ Coverage Diff @@
## develop #4324 +/- ##
===========================================
- Coverage 44.64% 44.63% -0.01%
===========================================
Files 446 446
Lines 64361 64361
===========================================
- Hits 28731 28729 -2
- Misses 35630 35632 +2
model_zoo/ernie-m/run_classifier.py
Outdated
  warmup = training_args.warmup_steps if training_args.warmup_steps > 0 else training_args.warmup_ratio
  if training_args.do_train:
      num_training_steps = (
          training_args.max_steps
          if training_args.max_steps > 0
          else len(train_ds) // training_args.train_batch_size * training_args.num_train_epochs
      )
  else:
- num_training_steps = len(train_data_loader) * args.num_train_epochs
- num_train_epochs = args.num_train_epochs
- warmup = args.warmup_steps if args.warmup_steps > 0 else args.warmup_proportion
- lr_scheduler = LinearDecayWithWarmup(args.learning_rate, num_training_steps, warmup)
+     num_training_steps = 10
+ lr_scheduler = LinearDecayWithWarmup(training_args.learning_rate, num_training_steps, warmup)
Please delete this placeholder.
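For context, a hedged sketch of what the block could look like once the placeholder is removed; the final shape depends on the rest of the script, and LinearDecayWithWarmup is imported from paddlenlp.transformers:

from paddlenlp.transformers import LinearDecayWithWarmup

# Sketch under the assumption that the lr scheduler is only needed when
# training, so the hard-coded num_training_steps = 10 can simply go away.
warmup = training_args.warmup_steps if training_args.warmup_steps > 0 else training_args.warmup_ratio
if training_args.do_train:
    num_training_steps = (
        training_args.max_steps
        if training_args.max_steps > 0
        else len(train_ds) // training_args.train_batch_size * training_args.num_train_epochs
    )
    lr_scheduler = LinearDecayWithWarmup(training_args.learning_rate, num_training_steps, warmup)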
@@ -250,94 +231,103 @@ def do_train(args):
  for n, p in model.named_parameters():
      name_dict[p.name] = n

- simple_lr_setting = partial(layerwise_lr_decay, args.layerwise_decay, name_dict, n_layers)
+ simple_lr_setting = partial(layerwise_lr_decay, model_args.layerwise_decay, name_dict, n_layers)
We don't recommend doing it this way here. See the AdamW code example:
https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/optimizer/AdamW_cn.html#daimashili
Set the learning_rate on the inner params: build a params list and then set it, i.e.
trainer.set_optimizer_grouped_parameters(params_to_train)
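For illustration, a minimal sketch of that suggestion, reusing the grouping logic from the existing layerwise_lr_decay; build_layerwise_groups is a hypothetical helper name, and Paddle multiplies a group's learning_rate with the global lr, per the AdamW docs linked above:

def build_layerwise_groups(model, layerwise_decay, n_layers):
    # Hypothetical helper: attach a per-group learning_rate ratio to every
    # parameter; deeper layers get larger ratios, embeddings the smallest.
    groups = []
    for name, param in model.named_parameters():
        ratio = 1.0
        if "encoder.layers." in name:
            layer = int(name.split("encoder.layers.")[1].split(".")[0])
            ratio = layerwise_decay ** (n_layers - layer)
        elif "embedding" in name:
            ratio = layerwise_decay ** (n_layers + 1)
        groups.append({"params": [param], "learning_rate": ratio})
    return groups

params_to_train = build_layerwise_groups(model, model_args.layerwise_decay, n_layers)
trainer.set_optimizer_grouped_parameters(params_to_train)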
model_zoo/ernie-m/run_classifier.py
Outdated
trainer = Trainer(
    model=model,
    criterion=criterion,
Drop the criterion; the model supports label inputs and computes the loss internally.
    eval_dataset=eval_ds if training_args.do_eval else None,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
    optimizers=[optimizer, lr_scheduler],
optimizers=[optimizer, lr_scheduler],
Delete this line.
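For illustration, a hedged sketch of the Trainer call with both comments above applied; args and train_dataset are assumed from the surrounding script, which this hunk does not show:

trainer = Trainer(
    model=model,  # no criterion: the model takes labels and computes its own loss
    args=training_args,
    train_dataset=train_ds if training_args.do_train else None,
    eval_dataset=eval_ds if training_args.do_eval else None,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
    # optimizers omitted: the Trainer builds the optimizer and lr scheduler
    # from training_args itself
)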
if training_args.do_eval:
    combined = {}
    for language in all_languages:
        eval_ds = load_xnli_dataset(model_args, "xnli", language, split="validation")
Could we add support for passing eval_ds as a dict here?
I looked at the HF logic: if we support trainer.evaluate(eval_dataset: Union[Dict[str, Dataset], Dataset, None]), the behavior will not align with HF's. If we plan to support this feature, I'll implement it in a separate PR.

In HF, trainer.evaluate() expects the eval_dataset passed in to be either a datasets.Dataset or a torch.utils.data.IterableDataset (see evaluate and get_eval_dataloader). If a dict is passed as trainer.eval_dataset when the trainer is instantiated, then during training, before trainer.evaluate is called, a type check is performed first and the datasets inside the dict are unpacked and fed into trainer.evaluate one by one in a loop (see _maybe_log_save_evaluate).
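For reference, a hedged sketch of that HF dispatch pattern, following the names used in transformers' _maybe_log_save_evaluate; this is an illustration, not the exact upstream code:

def evaluate_maybe_dict(trainer):
    # A dict of eval datasets is unpacked and each split is evaluated
    # separately with its own metric_key_prefix; a single dataset goes
    # straight through to trainer.evaluate().
    eval_dataset = trainer.eval_dataset
    if isinstance(eval_dataset, dict):
        metrics = {}
        for name, dataset in eval_dataset.items():
            metrics.update(trainer.evaluate(eval_dataset=dataset, metric_key_prefix=f"eval_{name}"))
    else:
        metrics = trainer.evaluate(eval_dataset=eval_dataset)
    return metrics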
Right, this block indeed isn't that scenario; it's about supporting multiple dataset inputs when evaluating during training.

"If we plan to support this feature, I'll implement it in a separate PR." Yes, there is a plan. Could you help add it? Zhang Bin's team has this use case.
model_zoo/ernie-m/run_classifier.py
Outdated
n_layers = model.config.num_hidden_layers
for static_name, param in model.named_parameters():
    if any(nd in static_name for nd in ["bias", "norm"]):
        params_list.append({"params": param})
        continue

    if "encoder.layers" in static_name:
        idx = static_name.find("encoder.layers.")
        layer = int(static_name[idx:].split(".")[2])
        ratio = layerwise_decay ** (n_layers - layer)
    elif "embedding" in static_name:
        ratio = layerwise_decay ** (n_layers + 1)

    params_list.append({"params": param, "learning_rate": param.optimize_attr["learning_rate"] * ratio})
return params_list
weight_decay and learning_rate are configured independently, so the bias/norm params can't be skipped with continue. Also, param.optimize_attr["learning_rate"] shouldn't be needed here, right?
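One way to keep the two concerns separate, as a sketch: give every parameter its learning_rate ratio in the grouped params (no continue-skip), and decide weight decay independently through Paddle's apply_decay_param_fun. Names like model, params_list, lr_scheduler, and training_args are assumed from the surrounding script:

import paddle

# Parameters that should receive weight decay (everything except bias/norm).
decay_param_names = [
    p.name
    for n, p in model.named_parameters()
    if not any(nd in n for nd in ["bias", "norm"])
]
optimizer = paddle.optimizer.AdamW(
    learning_rate=lr_scheduler,
    parameters=params_list,  # every group carries its learning_rate ratio, bias/norm included
    weight_decay=training_args.weight_decay,
    # decay is decided per parameter name, independently of the lr ratio
    apply_decay_param_fun=lambda name: name in decay_param_names,
)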
LGTM
PR types
Function optimization
PR changes
APIs
Description
Refactor ERNIE-M usage in Model Zoo and add unit tests.