Fea/nn graph/warmup amp config #5969
Conversation
strint commented on Aug 19, 2021
- WarmUpLR
- eager
- graph
- amp & nn.Graph.config
- flow.config
class WarmUpLR(WarmUpLrScheduler):
    def __init__(
        self,
        lrsch_or_optimizer,
Same interface as in the torch development branch, but this is an enhanced version of torch's: it can wrap an ordinary LrScheduler, and it supports both eager and graph mode.
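The warmup behavior can be illustrated with a minimal, framework-free sketch. The names `warmup_factor` and `warmup_iters` follow the usage example later in this thread; the linear scaling formula mirrors the common torch-style linear warmup and is an assumption for illustration, not this PR's exact implementation:

```python
def linear_warmup_scale(step, warmup_iters, warmup_factor):
    """Multiplier applied to the base lr during warmup (assumed semantics).

    Ramps linearly from warmup_factor up to 1.0 over warmup_iters steps;
    after warmup the wrapped scheduler takes over at the full base lr.
    """
    if step >= warmup_iters:
        return 1.0
    alpha = step / warmup_iters
    return warmup_factor * (1 - alpha) + alpha

# With base lr 0.05, warmup_factor=0.5, warmup_iters=4 (the values used
# in the example below), the lr per epoch works out to:
base_lr = 0.05
lrs = [base_lr * linear_warmup_scale(s, 4, 0.5) for s in range(6)]
# -> [0.025, 0.03125, 0.0375, 0.04375, 0.05, 0.05]
```

These values match the epoch-by-epoch comments in the usage example discussed later in the review.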
flow.backends.cudnn.set_reserved_mem_mbytes(1000)
flow.utils.load_library("")
An example of calling the config interface.
# amp
self.config.enable_amp(True)
grad_scaler = flow.nn.graph.amp.GradScaler(3000, 2.0, 0.5, 1000)
self.set_grad_scaler(grad_scaler)
An example of enabling amp.
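The four positional GradScaler arguments map to a torch-style dynamic loss-scaling policy (`init_scale`, `growth_factor`, `backoff_factor`, `growth_interval`). As a framework-free sketch of that policy's update rule, which assumes torch-like semantics rather than reflecting OneFlow's actual implementation:

```python
class ToyGradScaler:
    """Minimal dynamic loss-scaling policy (torch-style semantics, assumed).

    ToyGradScaler(3000, 2.0, 0.5, 1000) matches the positional arguments
    used in this PR's amp example.
    """

    def __init__(self, init_scale, growth_factor, backoff_factor, growth_interval):
        self.scale = float(init_scale)
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._good_steps = 0

    def update(self, found_inf):
        if found_inf:
            # Overflow detected: shrink the scale and restart the counter.
            self.scale *= self.backoff_factor
            self._good_steps = 0
        else:
            self._good_steps += 1
            if self._good_steps == self.growth_interval:
                # Stable for growth_interval steps: grow the scale again.
                self.scale *= self.growth_factor
                self._good_steps = 0

scaler = ToyGradScaler(3000, 2.0, 0.5, 1000)
scaler.update(found_inf=True)        # scale backs off: 3000 * 0.5 = 1500
for _ in range(1000):
    scaler.update(found_inf=False)   # 1000 clean steps: 1500 * 2.0 = 3000
```

Spelling the arguments out this way also shows why the reviewer below asks for keyword arguments: the bare numbers 3000, 2.0, 0.5, 1000 are opaque without their parameter names.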
return self.config.proto

@property
def _optimization_conf_proto(self):
This is just for debug printing, right?
Yes, it's for internal debugging. Methods prefixed with an underscore are private, internal-use methods and are not guaranteed to be stable.
@leaves-zwx (Wenxiao), please review the LR part. Once it's approved, this can be merged~
# amp
self.config.enable_amp(True)
grad_scaler = flow.amp.GradScaler(3000, 2.0, 0.5, 1000)
Keyword arguments should be provided here; otherwise, readers won't know what 3000, 2.0, 0.5, and 1000 mean.
This is aligned with the torch interface; users can pass arguments either as arg=val or positionally.
Right, but I think our example code should be explicit. For reference:

# Assuming optimizer uses lr = 0.05 for all groups
# lr = 0.025 if epoch == 0
# lr = 0.03125 if epoch == 1
# lr = 0.0375 if epoch == 2
# lr = 0.04375 if epoch == 3
# lr = 0.05 if epoch >= 4
scheduler = WarmUpLR(self.opt, warmup_factor=0.5, warmup_iters=4, warmup_method="linear")
for epoch in range(100):
    train(...)
    validate(...)
    scheduler.step()
Users will imitate our test scripts, too, so it's clearer to include the keyword arguments.
Good point, added.
)

from oneflow.framework.config_util import (
    api_nccl_use_compute_stream as enable_use_compute_stream,
Doesn't this already exist? 😂 @strint @leaves-zwx
flow.boxing.nccl.enable_use_compute_stream()