
[ModelZoo] Refactor ERNIE-M usage in Model Zoo #4324

Merged
merged 29 commits into PaddlePaddle:develop
Feb 20, 2023

Conversation

Yam0214
Contributor

@Yam0214 Yam0214 commented Jan 3, 2023

PR types

Function optimization

PR changes

APIs

Description

Refactor ERNIE-M usage in Model Zoo and add unittest.

@paddle-bot

paddle-bot bot commented Jan 3, 2023

Thanks for your contribution!

model_zoo/ernie-m/run_classifier.py (review thread resolved, outdated)
model_zoo/ernie-m/run_classifier.py (review thread resolved)
@codecov

codecov bot commented Jan 31, 2023

Codecov Report

Merging #4324 (03626da) into develop (9f78fa5) will decrease coverage by 0.01%.
The diff coverage is n/a.

@@             Coverage Diff             @@
##           develop    #4324      +/-   ##
===========================================
- Coverage    44.64%   44.63%   -0.01%     
===========================================
  Files          446      446              
  Lines        64361    64361              
===========================================
- Hits         28731    28729       -2     
- Misses       35630    35632       +2     
Impacted Files Coverage Δ
paddlenlp/utils/downloader.py 65.04% <0.00%> (-0.89%) ⬇️


scripts/regression/ci_case.sh (review thread resolved, outdated)
model_zoo/ernie-m/run_classifier.py (review thread resolved, outdated)
@ZHUI ZHUI self-requested a review February 9, 2023 08:12
model_zoo/ernie-m/README.md (review thread resolved)
model_zoo/ernie-m/run_classifier.py (review thread resolved)
model_zoo/ernie-m/run_classifier.py (review thread resolved, outdated)
Comment on lines 215 to 224
warmup = training_args.warmup_steps if training_args.warmup_steps > 0 else training_args.warmup_ratio
if training_args.do_train:
    num_training_steps = (
        training_args.max_steps
        if training_args.max_steps > 0
        else len(train_ds) // training_args.train_batch_size * training_args.num_train_epochs
    )
else:
    num_training_steps = len(train_data_loader) * args.num_train_epochs
num_train_epochs = args.num_train_epochs

warmup = args.warmup_steps if args.warmup_steps > 0 else args.warmup_proportion

lr_scheduler = LinearDecayWithWarmup(args.learning_rate, num_training_steps, warmup)
num_training_steps = 10
lr_scheduler = LinearDecayWithWarmup(training_args.learning_rate, num_training_steps, warmup)
Collaborator

Please delete this.

@@ -250,94 +231,103 @@ def do_train(args):
for n, p in model.named_parameters():
    name_dict[p.name] = n

simple_lr_setting = partial(layerwise_lr_decay, args.layerwise_decay, name_dict, n_layers)
simple_lr_setting = partial(layerwise_lr_decay, model_args.layerwise_decay, name_dict, n_layers)
Collaborator

This usage is not recommended here.

https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/optimizer/AdamW_cn.html#daimashili

Set the learning_rate on the individual params instead: just build a params list and then set it on the trainer.

trainer.set_optimizer_grouped_parameters(params_to_train)
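For reference, a minimal sketch of the parameter-group mechanism the linked AdamW docs describe; this is an illustration rather than code from the PR, and the two Linear layers are placeholders.

import paddle

# Each entry of the parameters list can carry its own learning_rate and
# weight_decay, overriding the optimizer-level defaults for those params.
linear_1 = paddle.nn.Linear(10, 10)
linear_2 = paddle.nn.Linear(10, 10)
opt = paddle.optimizer.AdamW(
    learning_rate=0.1,
    weight_decay=0.01,
    parameters=[
        {"params": linear_1.parameters()},
        {"params": linear_2.parameters(), "weight_decay": 0.001, "learning_rate": 0.1},
    ],
)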


trainer = Trainer(
    model=model,
    criterion=criterion,
Collaborator

Don't use criterion anymore; the model supports label input internally.
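A minimal sketch of what this suggestion could look like, assuming the model returns the loss when labels are present in the batch; the args/train_dataset keywords are assumed here, while the other names follow the excerpt above.

trainer = Trainer(
    model=model,  # loss is computed inside the model from the labels
    args=training_args,
    train_dataset=train_ds if training_args.do_train else None,
    eval_dataset=eval_ds if training_args.do_eval else None,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)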

    eval_dataset=eval_ds if training_args.do_eval else None,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
    optimizers=[optimizer, lr_scheduler],
Collaborator

Suggested change
optimizers=[optimizer, lr_scheduler],

Delete it.

if training_args.do_eval:
    combined = {}
    for language in all_languages:
        eval_ds = load_xnli_dataset(model_args, "xnli", language, split="validation")
Collaborator

This could also support passing eval_ds as a dict.

Contributor Author

@Yam0214 Yam0214 Feb 13, 2023

I looked at the HF logic: if we support trainer.evaluate(eval_dataset: Union[Dict[str, Dataset], Dataset, None]), our behavior would diverge from HF's. If supporting this feature is planned, I can implement it in a separate PR.

In HF, trainer.evaluate() expects the eval_dataset argument to be either a datasets.Dataset or a torch.utils.data.IterableDataset (see evaluate and get_eval_dataloader). If a dict is passed as eval_dataset when the Trainer is instantiated, the type is checked before trainer.evaluate is called during training, and each dataset in the dict is looped through trainer.evaluate individually (see _maybe_log_save_evaluate).
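Roughly, the HF behavior described above amounts to the following (a paraphrased sketch, not the PaddleNLP implementation):

# Inside _maybe_log_save_evaluate: a dict-valued eval_dataset is split up
# and each entry is evaluated separately with its own metric prefix.
if isinstance(self.eval_dataset, dict):
    metrics = {}
    for name, dataset in self.eval_dataset.items():
        metrics.update(
            self.evaluate(eval_dataset=dataset, metric_key_prefix=f"eval_{name}")
        )
else:
    metrics = self.evaluate()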

Collaborator

Right, this part really isn't that scenario. It should be about supporting multiple eval datasets when evaluating during training.

"If supporting this feature is planned, I can implement it in a separate PR."

It is planned. Could you help add it? Zhang Bin's team has this use case.

Comment on lines 238 to 256
n_layers = model.config.num_hidden_layers
for static_name, param in model.named_parameters():
    if any(nd in static_name for nd in ["bias", "norm"]):
        params_list.append({"params": param})
        continue

    if "encoder.layers" in static_name:
        idx = static_name.find("encoder.layers.")
        layer = int(static_name[idx:].split(".")[2])
        ratio = layerwise_decay ** (n_layers - layer)
    elif "embedding" in static_name:
        ratio = layerwise_decay ** (n_layers + 1)

    params_list.append({"params": param, "learning_rate": param.optimize_attr["learning_rate"] * ratio})
return params_list
Collaborator

weight_decay and learning_rate are set independently, so you can't skip these params with continue.

param.optimize_attr["learning_rate"] shouldn't be needed here, right?
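A possible revision along the lines of both comments (a sketch only, with the signature guessed from the partial(...) call shown earlier): bias/norm parameters keep their layer-wise learning_rate, param.optimize_attr is not consulted, and the resulting list could then be handed to trainer.set_optimizer_grouped_parameters.

def layerwise_lr_decay(layerwise_decay, name_dict, n_layers, model):
    # name_dict is kept only to match the partial(...) call above.
    params_list = []
    for static_name, param in model.named_parameters():
        ratio = 1.0
        if "encoder.layers" in static_name:
            idx = static_name.find("encoder.layers.")
            layer = int(static_name[idx:].split(".")[2])
            ratio = layerwise_decay ** (n_layers - layer)
        elif "embedding" in static_name:
            ratio = layerwise_decay ** (n_layers + 1)
        # bias/norm params are no longer skipped; they get the same ratio.
        params_list.append({"params": [param], "learning_rate": ratio})
    return params_list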

Collaborator

@ZHUI ZHUI left a comment

LGTM

@joey12300 joey12300 merged commit dd376ce into PaddlePaddle:develop Feb 20, 2023