
LazyInterpret for FeedVariableOpExpr #5490

Merged: 6 commits merged into master from dev_cc_feed_var on Jul 15, 2021

Conversation

chengtbf (Contributor) commented:

LazyInterpret: support constructing a VariableOp from a passed-in EagerTensor.

// NOTE(chengcheng): Record variable op output LazyTensor
TensorNameScope::Global()->Record(outputs->at(0), op_name + "/" + obn);
// NOTE(chengcheng): Record EagerTensor as variable tensor name
TensorNameScope::Global()->Record(input_tensor, op_name + "/" + obn);
chengtbf (Contributor, Author):

Here I also record the incoming EagerTensor, so that a later UserOp's LazyInterpret can still resolve the correct lbn when its input is an EagerTensor.
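For reference, a minimal conceptual sketch (in Python, not OneFlow's actual C++ TensorNameScope) of the bookkeeping this enables: both the lazy output tensor and the incoming eager tensor map to the same lbn string, so either one can be looked up later.

class TensorNameScope:
    def __init__(self):
        self._tensor_id2lbn = {}

    def record(self, tensor, lbn):
        # Key by object identity; both lazy and eager tensors are just objects here.
        self._tensor_id2lbn[id(tensor)] = lbn

    def lookup(self, tensor):
        return self._tensor_id2lbn.get(id(tensor), "")

scope = TensorNameScope()
lbn = "model.weight" + "/out"        # hypothetical "op_name/obn"
lazy_out, eager_var = object(), object()
scope.record(lazy_out, lbn)          # record the variable op's lazy output
scope.record(eager_var, lbn)         # also record the eager tensor under the same lbn
assert scope.lookup(eager_var) == lbn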

Contributor:

Where will the memory sharing between the variable's lazy tensor and its eager tensor be implemented?

chengtbf (Contributor, Author):

In NNGraph; LazyInterpret does not care about this. The Python-side nn.Graph passes each Variable's EagerTensor and the corresponding names into NNGraph, NNGraph records that information, and the Regsts are bound when the Runtime starts.
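A minimal sketch of that Python-side handoff, assuming a simple Linear model; the commented register_variable_op_names_and_tensors call is a hypothetical name for the NNGraph binding, not a confirmed API.

import oneflow as flow

class LinearGraph(flow.nn.Graph):
    def __init__(self, model):
        super().__init__()
        self.model = model

    def build(self, x):
        return self.model(x)

model = flow.nn.Linear(4, 4)
graph = LinearGraph(model)

# nn.Graph gathers each Variable's eager tensor together with its op name and
# hands both to the C++ NNGraph, which binds them to Regsts at runtime start.
var_names = [name for name, _ in model.named_parameters()]   # e.g. "weight", "bias"
var_tensors = [p for _, p in model.named_parameters()]
# graph._c_nn_graph.register_variable_op_names_and_tensors(var_names, var_tensors)  # hypothetical call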

// TODO(chengcheng): GenerateParallelDistributionString by tensor.
}
if (!input_tensor->requires_grad()) { var_conf->set_trainable(false); }
// TODO(chengcheng, xuxiaoyu): Set L1/L2 RegularizerConf by nn.Graph Optimizer
chengtbf (Contributor, Author):

I left a TODO here: from the EagerTensor alone we cannot tell the Variable's L1 and L2 regularization parameters. PyTorch presumably keeps these in the Optimizer? @strint please look into how nn.Graph can support configuring L1 and L2 for each Variable.

Contributor:

Do you mean configuring a different learning rate for different parameters?

Different parameters can be bound to different optimizers, one lr per optimizer:
https://pytorch.org/docs/stable/optim.html

Contributor:

There is also this form:

optim.SGD([
    {'params': model.base.parameters()},
    {'params': model.classifier.parameters(), 'lr': 1e-3}
], lr=1e-2, momentum=0.9)

The optimizer splits its parameters into multiple groups, each group with its own lr, plus a default lr.

chengtbf (Contributor, Author):

No, LR is the learning rate; what I am talking about are the L1/L2 regularization parameters.

Contributor:

torch does not have one unified mechanism; you can always write it by hand when defining the loss. There is also a standard way of writing it:
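A sketch of those two routes in PyTorch (an illustration, not the snippet that originally followed this comment): L2 via the optimizer's weight_decay, L1 added to the loss by hand.

import torch

model = torch.nn.Linear(4, 4)
opt = torch.optim.SGD(model.parameters(), lr=1e-2, weight_decay=1e-4)  # L2, the standard knob

x, y = torch.randn(8, 4), torch.randn(8, 4)
loss = torch.nn.functional.mse_loss(model(x), y)
l1 = sum(p.abs().sum() for p in model.parameters())                    # L1, written manually
loss = loss + 1e-5 * l1
loss.backward()
opt.step()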

Contributor:

Let's see whether we need to add an l1_norm/l2_norm option to the optimizer's per-group parameter configuration. @wyg1997

Contributor:

Any parameter accepted by an Optimizer's constructor should also be specifiable per group; if l1/l2 end up in the Optimizer's parameter list, every ParamGroup should support them too.
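A sketch of what that could look like if such options were added; the "l1"/"l2" keys below are hypothetical and do not exist in oneflow.optim today, so this illustrates the proposal rather than a working configuration.

import oneflow as flow

class Net(flow.nn.Module):
    def __init__(self):
        super().__init__()
        self.base = flow.nn.Linear(4, 4)
        self.classifier = flow.nn.Linear(4, 2)

model = Net()
optimizer = flow.optim.SGD(
    [
        {"params": model.base.parameters()},                                # uses the defaults below
        {"params": model.classifier.parameters(), "l1": 1e-5, "l2": 1e-4},  # hypothetical per-group keys
    ],
    lr=1e-2,
    momentum=0.9,
)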

out_tensor = var_op.apply([x_tensor_in_c], attrs)[0]
test_case.assertEqual(out_tensor.shape, (1, 1, 10, 10))
test_case.assertTrue(out_tensor.is_lazy)
test_case.assertTrue(out_tensor.is_consistent)
Contributor:

So by default all outputs are consistent now?

chengtbf (Contributor, Author):

Lazy really only has the Consistent concept; Mirror is also expanded into Consistent. Even if you pass in a local tensor, it gets translated into a ConsistentTensor whose placement is the current rank.
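A small illustration of that translation, written with the newer OneFlow names (to_global/is_global, which correspond to the "consistent" terminology in this thread); the single-rank CPU placement is an assumption for the example.

import oneflow as flow

x_local = flow.ones(2, 3)                                      # a local (mirrored) tensor on this rank
placement = flow.placement("cpu", ranks=[flow.env.get_rank()])
x_global = x_local.to_global(placement=placement, sbp=flow.sbp.broadcast)
print(x_global.is_global, x_global.placement)                  # placement covers only the current rank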


OperatorConf op_conf;
op_conf.set_name(op_expr.op_name()); // construct by python nn.Graph
op_conf.set_scope_symbol_id(scope_symbol_id); // TODO(chengcheng): NewScope by cur scope.
strint (Contributor) commented on Jul 15, 2021:

An external scope has already been created for this variable; since it is a single tensor, it seems that scope could be reused instead of calling NewScope?

chengtbf (Contributor, Author):

It is needed. The externally created scope belongs to the Block and carries no real ParallelDesc information. Scopes with ParallelDesc must be created on the spot inside LazyInterpret, whether for Input, Variable, or an ordinary UserOp, because that information lives on the input tensors.

Contributor:

The main reason is that we now infer the ParallelDesc from the tensor, so the ParallelDesc in the Scope created at the Block level is usually of no use, right?

chengtbf (Contributor, Author):

Yes.

oneflow-ci-bot requested review from oneflow-ci-bot and removed the request on Jul 15, 2021, 11:41
oneflow-ci-bot self-requested a review on Jul 15, 2021, 13:30
oneflow-ci-bot requested review from oneflow-ci-bot and removed the request on Jul 15, 2021, 15:48
oneflow-ci-bot requested review from oneflow-ci-bot and removed the request on Jul 15, 2021, 17:21
oneflow-ci-bot merged commit b0c3d7e into master on Jul 15, 2021
oneflow-ci-bot deleted the dev_cc_feed_var branch on Jul 15, 2021, 19:38