turn on view slice #8302

@@ -119,9 +119,10 @@ class ScalarMathBaseFunctor {
    if (inplace) {
      JUST(CheckInplaceCastValid(x, casted_vec[0]));
      JUST(CheckInplaceValid(x));
+      auto input = JUST(functional::InplaceToContiguous(x));


修改的主要差别在这一行auto input = JUST(functional::InplaceToContiguous(x)); @strint

需要看一下，按说 graph 内部所有的 source op 都被处理为 contigous了。

这时中间计算还能产生非 contigous 的tensor，按说是不合理的。

这里就是对一个 op 做了输入的 contigous，那就证明在它之前产生了非 contigous tensor，需要看是哪里产生的。

可以在 interpreter 那里加个 debug，对每个 op 的输出，如果是非 contigous，就打印一个日志。

这块可以revert 掉修复，加上日志看看，看看这个调用在哪些地方发生的

Flowingsun007 · 2022-06-10T05:28:40Z

这里看起来是 inplace 计算时，先把输入转成 contigous，然后再执行。

是的，修改前会有问题，因为在interpreter那里会统一对input->contiguous()，但这个操作是基于copy的，返回一个new tensor，对于inplace op那肯定会有问题；这里就是换成了用inplace版的contiguous处理了，保证还是同一个tensor

github-actions · 2022-06-13T03:15:52Z

Speed stats:

GPU Name: NVIDIA GeForce GTX 1080 

❌ OneFlow resnet50 time: 129.7ms (= 12966.3ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 142.7ms (= 14266.8ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.10 (= 142.7ms / 129.7ms)

OneFlow resnet50 time: 76.2ms (= 7624.2ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 87.7ms (= 8770.6ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.15 (= 87.7ms / 76.2ms)

OneFlow resnet50 time: 48.8ms (= 9761.7ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 57.7ms (= 11541.7ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.18 (= 57.7ms / 48.8ms)

OneFlow resnet50 time: 40.5ms (= 8108.7ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 47.6ms (= 9520.5ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.17 (= 47.6ms / 40.5ms)

OneFlow resnet50 time: 35.4ms (= 7071.6ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 39.2ms (= 7841.9ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.11 (= 39.2ms / 35.4ms)

OneFlow swin dataloader time: 0.242s (= 48.482s / 200, num_workers=1)
PyTorch swin dataloader time: 0.149s (= 29.734s / 200, num_workers=1)
Relative speed: 0.613 (= 0.149s / 0.242s)

OneFlow swin dataloader time: 0.066s (= 13.189s / 200, num_workers=4)
PyTorch swin dataloader time: 0.041s (= 8.232s / 200, num_workers=4)
Relative speed: 0.624 (= 0.041s / 0.066s)

OneFlow swin dataloader time: 0.057s (= 11.331s / 200, num_workers=8)
PyTorch swin dataloader time: 0.022s (= 4.413s / 200, num_workers=8)
Relative speed: 0.389 (= 0.022s / 0.057s)

❌ OneFlow resnet50 time: 146.3ms (= 14631.1ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 170.1ms (= 17007.4ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.16 (= 170.1ms / 146.3ms)

OneFlow resnet50 time: 96.4ms (= 9640.9ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 112.7ms (= 11265.8ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.17 (= 112.7ms / 96.4ms)

OneFlow resnet50 time: 72.3ms (= 14465.8ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 88.6ms (= 17710.3ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.22 (= 88.6ms / 72.3ms)

OneFlow resnet50 time: 57.7ms (= 11546.6ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 74.0ms (= 14792.3ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.28 (= 74.0ms / 57.7ms)

OneFlow resnet50 time: 54.0ms (= 10794.6ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 79.8ms (= 15960.6ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.48 (= 79.8ms / 54.0ms)

github-actions · 2022-06-14T03:22:49Z

Speed stats:

GPU Name: NVIDIA GeForce GTX 1080 

❌ OneFlow resnet50 time: 129.7ms (= 12971.9ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 142.0ms (= 14199.9ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.09 (= 142.0ms / 129.7ms)

OneFlow resnet50 time: 75.9ms (= 7590.1ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 86.3ms (= 8627.5ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.14 (= 86.3ms / 75.9ms)

OneFlow resnet50 time: 50.7ms (= 10136.7ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 61.3ms (= 12263.2ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.21 (= 61.3ms / 50.7ms)

OneFlow resnet50 time: 40.2ms (= 8044.2ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 42.5ms (= 8498.7ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.06 (= 42.5ms / 40.2ms)

OneFlow resnet50 time: 36.8ms (= 7363.9ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 40.0ms (= 8001.4ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.09 (= 40.0ms / 36.8ms)

OneFlow swin dataloader time: 0.251s (= 50.280s / 200, num_workers=1)
PyTorch swin dataloader time: 0.149s (= 29.824s / 200, num_workers=1)
Relative speed: 0.593 (= 0.149s / 0.251s)

OneFlow swin dataloader time: 0.100s (= 19.948s / 200, num_workers=4)
PyTorch swin dataloader time: 0.041s (= 8.218s / 200, num_workers=4)
Relative speed: 0.412 (= 0.041s / 0.100s)

OneFlow swin dataloader time: 0.055s (= 10.941s / 200, num_workers=8)
PyTorch swin dataloader time: 0.023s (= 4.551s / 200, num_workers=8)
Relative speed: 0.416 (= 0.023s / 0.055s)

❌ OneFlow resnet50 time: 146.4ms (= 14637.7ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 172.7ms (= 17267.7ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.18 (= 172.7ms / 146.4ms)

OneFlow resnet50 time: 94.2ms (= 9416.2ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 112.1ms (= 11212.4ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.19 (= 112.1ms / 94.2ms)

OneFlow resnet50 time: 71.4ms (= 14272.4ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 99.0ms (= 19803.9ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.39 (= 99.0ms / 71.4ms)

OneFlow resnet50 time: 60.6ms (= 12118.6ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 74.8ms (= 14957.7ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.23 (= 74.8ms / 60.6ms)

OneFlow resnet50 time: 52.7ms (= 10549.5ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 67.7ms (= 13537.7ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.28 (= 67.7ms / 52.7ms)

github-actions · 2022-06-15T12:33:26Z

Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.

github-actions · 2022-06-15T13:14:52Z

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/8302/

github-actions · 2022-06-15T13:15:02Z

CI failed when running job: cuda-module. PR label automerge has been removed

github-actions · 2022-06-15T13:15:42Z

Speed stats:

github-actions · 2022-06-15T14:01:34Z

Speed stats:

GPU Name: GeForce GTX 1080 

✔️ OneFlow resnet50 time: 128.9ms (= 12888.3ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 141.8ms (= 14175.7ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.10 (= 141.8ms / 128.9ms)

OneFlow resnet50 time: 75.3ms (= 7529.8ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 84.9ms (= 8485.2ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.13 (= 84.9ms / 75.3ms)

OneFlow resnet50 time: 48.6ms (= 9718.1ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 59.8ms (= 11968.1ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.23 (= 59.8ms / 48.6ms)

OneFlow resnet50 time: 41.5ms (= 8295.2ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 42.1ms (= 8429.8ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.02 (= 42.1ms / 41.5ms)

OneFlow resnet50 time: 37.1ms (= 7411.0ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 38.5ms (= 7700.5ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.04 (= 38.5ms / 37.1ms)

OneFlow swin dataloader time: 0.239s (= 47.797s / 200, num_workers=1)
PyTorch swin dataloader time: 0.149s (= 29.745s / 200, num_workers=1)
Relative speed: 0.622 (= 0.149s / 0.239s)

OneFlow swin dataloader time: 0.064s (= 12.848s / 200, num_workers=4)
PyTorch swin dataloader time: 0.040s (= 8.003s / 200, num_workers=4)
Relative speed: 0.623 (= 0.040s / 0.064s)

OneFlow swin dataloader time: 0.054s (= 10.841s / 200, num_workers=8)
PyTorch swin dataloader time: 0.022s (= 4.459s / 200, num_workers=8)
Relative speed: 0.411 (= 0.022s / 0.054s)

❌ OneFlow resnet50 time: 139.5ms (= 13949.0ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 160.6ms (= 16059.2ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.15 (= 160.6ms / 139.5ms)

OneFlow resnet50 time: 86.1ms (= 8609.5ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 102.3ms (= 10227.2ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.19 (= 102.3ms / 86.1ms)

OneFlow resnet50 time: 59.8ms (= 11956.9ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 79.3ms (= 15862.4ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.33 (= 79.3ms / 59.8ms)

OneFlow resnet50 time: 48.7ms (= 9749.0ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 66.8ms (= 13355.2ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.37 (= 66.8ms / 48.7ms)

OneFlow resnet50 time: 45.3ms (= 9062.0ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 67.2ms (= 13439.7ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.48 (= 67.2ms / 45.3ms)

github-actions · 2022-06-16T00:37:24Z

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/8302/

github-actions · 2022-06-16T00:40:19Z

Speed stats:

github-actions · 2022-06-16T01:11:38Z

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/8302/

* Add distributed optional run (#8372) * Add * change deps * add install * add skip * autoprof supports bandwidth (#8367) * autoprof supports bandwidth Signed-off-by: daquexian <daquexian566@gmail.com> * print bandwidth Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * remove tmp buffer of cumprod cpu backward kernel (#8369) * remove tmp buffer of cumprod cpu backward kernel * refine * refine Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Move tensor api to cpython part3 (#8342) * add tensor_functions * concat py methods * add hash, restore tensor.py * check replacement * refine code, remove commented tensor.py * refine code * move some api * add cpu and cuda api * add triu tril norm and etc. * remove tensor_functions.h * move more api * move more api, refine size * fix typo * format code, remove useless include * refine code * refine code, fix typo * align .cuda to python * refine code * split some api to part3 for review * remove positional only arguments of argmax and argmin * remove arguments parse * modify arguments name in matmul and floor_divide * rename BINARY_FUNC to DIRECT_PASS_FUNC, modify some functions * refine code, format code * add inplace /=, add comments * remove name in macros * remove python api * remove redundant include * remove cout * format code * refactor tensor.size by directly call shape.at, refactor tensor.sub_ by calling nb_sub_ * remove redundant code * auto format by CI * fix typo, fix wrong call * modify idx datatype from int32 to int64 in tensor.size * add some DIRECT_PASS_FUNC * add cpu cuda var pow and etc. * add masked_fill any all * make REDUCE_FUNC macro, add reduce_* functions * add 0dim check in ReduceSumWhole, refine yaml * fix bug * restore add add_ sub sub_ * add unittest for tensor.half tensor.add tensor.add_ * refine code * refine code * fix typo * fix bug of tensor.std() * refactor var std and cuda, using c++ functional api * add beta and threshold in softplus * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Add nn_functor Check (#7910) * add bias_add_check * add bias_add error test * fix conv2d nhwc bias_add error * add nhwc conv test * add bias_add_error test * Add bias add error check * Rename * add batch matmul error check * add matmul check error msg * remove annotation * add fused mlp error msg check * Add pixel shuffle check test * add more test until normalization add relu functor * refine error message * finish all nnfunctor check msg * handle type error * remove useless symbol * modify back to TypeError * fix all comment * Remove redundant code * Remove pad ndim check * fix bias add space * fix check logic cause ci gpu not always gpu:0 Co-authored-by: hjchen2 <chenhoujiangcug@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Add FusedMatmulBiasAddReluDropout [OneEmbedding] (#8222) * previous version for fused_matmul_bias_add_relu_dropout * add op infer * fix detail * finish forward * support dropout rate list * add forward test * fix bug for output buffer * Configurable alpha params * try to add bit mask logic * Add bitmask first version! * Add row col bitmask logic * support not align4 reludropout * simplify relu dropout ld logic * Add naive relu dropout grad kernel * add simple relu dropout grad kernel * Rename * support relu_dropout bitmask backward * add vectorized optimization * fix tmp buffer * add to amp list * add lazy backward logic * Refine kernel * add indextype dispatch * simplify functor logic * fix cublas fused mlp aux_ld shape bug * Add more relu dropout kernel * add full unittest * fix bug in skip final activation * refine * Remove dump func * fix format * Remove cmake * remove redundant divide * add padded version * fix dropout * oneflow curand * refine * remove redundant kernel * add unroll logic * add unroll and ballot sync * refine format * Remove fast curand * Refine python interface * Add if branch for memset * fix python logic * just for debug * not use matmul bias add grad * add launch 1 block limit * fix unittest * Refine * fix graph backward bug * limit to 11060 * change to use int32_t dtype for cublas aux * Fix jc comment * fix comment * fix convert * fix static_analysis * fix at * fix userops td * fix userops td * fix const ref * fix compile error for bfloat16 * limit to 11060 * fix bug Co-authored-by: Juncheng <liujuncheng1022@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix gather 0-dim tensor bug (#8376) * fix 0-dim tensor bug * refine * support input 0-dim tensor for gather * refine * refine * refine dim_scatter_kernel check * refine * refine check * fix clang_tidy error Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * add api to apply external job pass (#8370) * Add condition to find-test-cache-distributed (#8387) * add condition to find-test-cache-distributed * fix * warp dim util (#8382) * warp dim util * format * use more maybe_wrap_dim * refine array functor * add more * refine math_functor * fix_bug_in_broadcast_min_max_grad_and_broadcast_like (#8379) * fix_bug_in_broadcast_min_max_grad_and_broadcast_like * refine * fix static check error * fix bug about index (#8388) * fix bug about index * add test case Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * LogicalSliceAssign support full slice sbp (#8344) * feat(SliceOp): slice ops support 2d sbp * fix(SliceOp): fix [B, P] 2d sbp bug * refine error message * fix bug in parallel_num == 1 * add comment * add warning and format * add NOLINT for boxing check * feat(LogicalSliceOps): support all nd_sbp * feat(LogicalSlice): support nd_sbp * add error message * fix(AutoTest): fix auto_test bug in module.parameter pass * auto format by CI * fix(LogicalSliceAssign): skip test when 1n1d * fix SliceParams memset error * remove memset * add CHECK_JUST * fix(*): make sure split_axis >= 0 or equal to SPLIT_AXIS_FOR_NON_SPLIT * remove memset * fix spilit_info.axis bug * feat(LogicalSliceOps): support grad * add logical_slice gradient_funcs * feat(LogicalSliceAssign): LogicalSliceAssign support full slice sbp * auto format by CI * test(LogicalSlice): fix logical_slice dims Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * fix_tensor_from_numpy_mem_leak_bug (#8391) * fix_tensor_from_numpy_mem_leak_bug * add note * refine note * refine Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Make of_pyext_obj static only to make sure only a python ext so has python symbols (#8393) * make of_pyext_obj static only * refine note Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Adjust tolerance setting in embedding_renorm unit test (#8394) * support front end compile for job to iree (#8249) * support frontend dev version * polish name * add tosa-to-elf.mlir * tosa to elf by llvm * conv2d partial * an enhanced frontend runner * support numpy as input * enable multiple using nn graph with different input(jobname make it it cd /home/yuhao/frontend/oneflow ; /usr/bin/env /usr/bin/python3 /home/yuhao/.vscode-server/extensions/ms-python.python-2022.6.2/pythonFiles/lib/python/debugpy/launcher 40873 -- /home/yuhao/frontend/oneflow/oneflow/ir/test/Frontend/runner.py ) * enable multiple input * enable cpu and cuda * change full_name to _full_name * support exchange cuda with cpu seamlessly * remove pip * lit config * polish * trim * auto format by CI * modify * auto format by CI * last line polish * use unittest * auto format by CI * use allclose * auto format by CI * pulish * optimize convert oneflow to tosa * conv2d * conv2d enhanced && conv2d examples add * add road map * add add_n2Op and boardcast_addOp conversion * add matmulOp conversion * support converting normailzation op to tosa(partically) * update roadmap * support i64 tensor to dense elem attr * support 100% resnet op conversion * add test mlir * add test iree resnet python script * auto format by CI * done * enhance iree resnet test script * auto format by CI * rebuild code * auto format by CI * rebuild test script * update * auto format by CI * pub * trim test scripts * move * move * input and output add block arg judgement * emit error in variable conversion * error handle for ci * modify err info * auto format by CI * merge * auto format by CI * output not block * flow ones * rm const * trim maybe * trim maybe with header file * const auto * solve clangd error Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Feat/zero mix with mp (#8036) * add zero limit * add debug * add mix zero test * refactor zero api * zero test with mp * add 2d test * add zero nd * add nd zero * add sbp cast * test passed soft limit consumer * refine size api * zero use stage 2 * add limit consumer api * add new api * refine zero s select * fix index out of range * rm zero limit on device type * zero test with activation checkpointing * add indentity when dp sequence len is 1 * move to base with master * fix * fix * fix * add test * debug bad case * refine test for eager and graph boxing * test case ready * simplify * refine test * fix buff size * fix conflict * refine zero nd * refine * add full test * revert change * refine split check * fix typo * rm log * spit long func * restore test * Update optimizer_placement_optimization_pass.cpp * auto format by CI * auto format by CI * fix static check * add tips for zero api change * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Revert embedding normal path and fix amp list (#8374) * revert embedding normal path, fix amp list * fix amp * fix memset bug in gather cpu kernel Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * replace fixed_vector with small_vector and make Shape inherit from it (#8365) * Replace fixed_vector with llvm::SmallVector Signed-off-by: daquexian <daquexian566@gmail.com> * Shape inherited from llvm::SmallVector Signed-off-by: daquexian <daquexian566@gmail.com> * refine cmake Signed-off-by: daquexian <daquexian566@gmail.com> * rename fixed_vector to small_vector Signed-off-by: daquexian <daquexian566@gmail.com> * fix reviews Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * update Shape constructor Signed-off-by: daquexian <daquexian566@gmail.com> * add 'PUBLIC' keyword to all target_link_libraries Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * update cmake Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * update cmake Signed-off-by: daquexian <daquexian566@gmail.com> * update cmake Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * set is_initialized_ default to true Signed-off-by: daquexian <daquexian566@gmail.com> * override some methods to set is_initialized_ Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * Light plan for debug (#8396) * Light plan for debug * fix note * disable terminfo to fix missing terminfo symbols (#8400) * disable terminfo to fix missing terminfo symbols Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix bug of ZeRO MP in complex case (#8404) * Remove redundant output_lbns in ir (#8409) * mv case * remove redundant info * Dev FusedCrossInteraction[OneEmbedding] (#8335) * add simple fused cross interaction forward * add packed fused * Add cross interaction grad * simplify code * fix bug * support crossnet v2 * support cross interaction v2 * add lazy backward * Rename and add test * fix jc comment * fix comment * fix bug * fix userops td elem_cnt for FUSED Group * fix header file * fix clang static analysis * fix unittest Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * add exe graph physical shape check msg (#8002) * fix index select op in graph * add exe graph physical shape check msg * improve the debug information for the python stack trace 1. add a parameter 'max_stack_depth' to specify the max depth for the stack trace 2. refactor other debug related classes. * remove parens * update * resolve PR comments * update * update graph debug test file. * restore self._debug in class Graph and class ModuleBlock * Do not shorten the stack frame string if it is in debug mode * delete TODOs * disable conv3d test (#7969) Signed-off-by: daquexian <daquexian566@gmail.com> * skip layernorm random_data_warp test (#7941) * skip layernorm random_data_warp test * warp/block/uncached case only test gpu Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Lock click version (#7967) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * add global avgpool unittest (#7585) * fix (#7978) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Support negative dim in scatter op (#7934) * support negative dim in scatter op * refine scatter test * refine scatter test again Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * run barrier callback in BarrierPhyInstrOperand::~BarrierPhyInstrOperand (#7702) * run barrier callback in BarrierPhyInstrOperand::~BarrierPhyInstrOperand * lock gil in vm Callback thread * more comments for VirtualMachineEngine::Callback() * the Env is never destroyed. * export Env into python * more unittests * wait shared_ptr.use_count() == 0 * export unittest.TestCase in framework/unittest.py * SwitchToShuttingDownPhase * optional is_normal_exit * VirtualMachine::CloseVMThreads * Delete env_api.h env_api.h is deleted by master * reshape_only_one_dim_infered * address pr comments * fix a ref-cnt bug in TryRunBarrierInstruction. * rollback flow.env.all_device_placement * no distributed running test_shutting_down.py * auto format by CI * expand lifetime of module oneflow in test_shutting_down.py * refine del depend on of * capture oneflow._oneflow_internal.eager when calling sync in __del__ * add try in flaky test Co-authored-by: Luyang <flowingsun007@163.com> Co-authored-by: chengtbf <472491134@qq.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: Xiaoyu Xu <xiaoyulink@gmail.com> * Fix one hot scalar tensor bug (#7975) * fix reduce_sum scalar check bug * fix one_hot scalar tensor bug * fix clang tidy error Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * support ctor np array from of tensor (#7970) * support ctor np array from of tensor * add test case constructing np array from tensor * refine Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * add_manual_seed_all_api (#7957) * add_manual_seed_all_api * Update conf.py * refine * add test case * auto format by CI * Update random_generator.cpp * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * one_embedding add doc string (#7902) * add doc string * add example * add * fix doc * refine * address review * mb to MB * add make_table_option * option to options * refine * add forward Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Support numpy scalar parameters (#7935) * feat(functional): support numpy scalar parameters * rename inferface * feat(*): TensorIndex support numpy scalar * feat(TensorIndex): support advance indexing * add unittest and int32 support for branch feat-param_support_np_scalar (#7939) * add unittest * refactor unittest * add todo for int16 advanced indexing * add int32 supporting for advance indexing * auto format by CI Co-authored-by: Wang Yi <53533850+marigoold@users.noreply.github.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * fix tensor_scatter_nd_update (#7953) * fix tensor_scatter_nd_update * auto backward Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix one_embedding adam (#7974) * fix one_embedding adam * fix tidy * fix normal Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * speed test with score (#7990) Signed-off-by: daquexian <daquexian566@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Feat/graph del by ref (#7857) * remove IsMultiClient() and single client logic Signed-off-by: daquexian <daquexian566@gmail.com> * rename eager.multi_client to eager Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * add py ref * refine new session * clean code * make scope api inner use * use session with ref cnt * run barrier callback in BarrierPhyInstrOperand::~BarrierPhyInstrOperand * test pass * lock gil in vm Callback thread * more comments for VirtualMachineEngine::Callback() * merge * merge rm single client * rm initenv * merge and fix master * refactor env c api * add debug code * fix and serving test pass * test passed * rm useless * rm useless code * format * rm useless include * rm sync in py * the Env is never destroyed. * export Env into python * more unittests * fix and pass tests * revert virtual_machine.cpp * revert core/vm * remove outdated python class oneflow.unittest.TestCase * graph test passed * wait shared_ptr.use_count() == 0 * export unittest.TestCase in framework/unittest.py * SwitchToShuttingDownPhase * optional is_normal_exit * VirtualMachine::CloseVMThreads * Delete env_api.h env_api.h is deleted by master * address pr comments * rm is env init * Clear empty thread when graph destroy (#7633) * Revert "Clear empty thread when graph destroy (#7633)" (#7860) This reverts commit 3e8585e. * fix a ref-cnt bug in TryRunBarrierInstruction. * rm env_api * fix clang-tidy error * fix clang-tidy in env_imp * refine env api * format * refine graph del and sync at shuttingdown * fix typo * add comment * rm useless * rm useless Co-authored-by: daquexian <daquexian566@gmail.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: lixinqi <lixinqi0703106@163.com> Co-authored-by: Li Xinqi <lixinqi2010@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: Luyang <flowingsun007@163.com> Co-authored-by: cheng cheng <472491134@qq.com> * [PersistentTable] Fix num blocks (#7986) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Add auto benchmark for flowvision (#7806) * update yml * update workflow * add resnet50 * [PersistentTable] Async write (#7946) * [PersistentTable] Async write * fix Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * save log in separate dir by default (#7825) Signed-off-by: daquexian <daquexian566@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix index select op in graph * add exe graph physical shape check msg * improve the debug information for the python stack trace 1. add a parameter 'max_stack_depth' to specify the max depth for the stack trace 2. refactor other debug related classes. * remove parens * update * resolve PR comments * update * update graph debug test file. * restore self._debug in class Graph and class ModuleBlock * Do not shorten the stack frame string if it is in debug mode * delete TODOs * Revert "Merge branch 'master' into fea/graph_check_msg" This reverts commit 28833b7, reversing changes made to baadf60. * Revert "Revert "Merge branch 'master' into fea/graph_check_msg"" This reverts commit 1d5e196. * update * resolve conflicts * resolve conflicts Co-authored-by: Cijie Xia <cijie.xia@mail.utoronto.ca> Co-authored-by: daquexian <daquexian566@gmail.com> Co-authored-by: guo ran <360112263@qq.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com> Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com> Co-authored-by: Peihong Liu <mosout@qq.com> Co-authored-by: Li Xinqi <lixinqi2010@gmail.com> Co-authored-by: Luyang <flowingsun007@163.com> Co-authored-by: chengtbf <472491134@qq.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com> Co-authored-by: liufengwei0103 <2472937968@qq.com> Co-authored-by: binbinHan <han_binbin@163.com> Co-authored-by: Yinggang Wang <wyg19970408@gmail.com> Co-authored-by: Wang Yi <53533850+marigoold@users.noreply.github.com> Co-authored-by: Shijie <821898965@qq.com> Co-authored-by: lixinqi <lixinqi0703106@163.com> Co-authored-by: Juncheng <liujuncheng1022@gmail.com> * add batch_matmul sbp (#8385) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * suppress gcc11 false positive warning (#8401) Signed-off-by: daquexian <daquexian566@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix variable op conversion to tosa error in ninja c1 (#8412) * pub * move test iree resnet python script to oneflow_iree repo * add bracket * rename const_val to const_val_ and restore resnet.py test script Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com> * Fix eval error in FusedMLP (#8413) Fix eval error * Init NCCL communicator in graph mode unifiedly (#8263) * centralized comm init * address review * revert * rename * ref nccl logical send recv * fix cpu only Co-authored-by: cheng cheng <472491134@qq.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix dim_scatter 0-dim tensor bug (#8418) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * target based external libraries (#8421) Signed-off-by: daquexian <daquexian566@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Refine hardcoded attr setting/getting in ir (#8420) * use names in trait static func * more changes on op name attr * use wrapped func * Replace cu115 with cu116 in nightly (#8423) update workflows Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix repeat interleave 0-size tensor bug (#8414) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Autotest support print input in ci (#8383) * support print tensor value in autotest to provide more details in ci * revert * refine * auto format by CI * control precision to 1e-5 when record * fix bug * auto format by CI * relax tensor_size_mb * fix bug * fix bug * refine * releax * refinew * refine * fix bug * relax * refine * restruct * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Modify sbp.split()'s karg: axis to dim (#8411) * Modify sbp.split()'s axis karg to dim * Refine * Refine * Refine * Refine * Feat/graph logical op debug repr (#8131) * add zero limit * add debug * add mix zero test * refactor zero api * zero test with mp * add 2d test * add zero nd * add nd zero * add sbp cast * test passed soft limit consumer * refine size api * add module config * save nn.Module info in job.proto for better debugging * add new line * add ModuleBlock.ops_proto() API * zero use stage 2 * print operators' info when print ModuleBlock * handle VariableOpConf * update * update * fix * move operators repr method to graph util * add limit consumer api * add new api * refine zero s select * add module block * fix * refact for rm op in module conf * fix * add sbp debug * add sbp repr * add shape * refine * add sys op in repr * add full op debug * fix index out of range * rm zero limit on device type * add no scope op to graph * zero test with activation checkpointing * fix order * add indentity when dp sequence len is 1 * add debug repr * refine repr of op * refine and fix * rm useless log * move to base with master * fix * fix * fix * fix proto * refine test * fix type * add test * debug bad case * refine test for eager and graph boxing * test case ready * simplify * refine test * fix buff size * fix conflict * refine zero nd * refine * add full test * revert change * refine split check * fix typo * rm log * spit long func * refine * restore test * refine pass and mem debug * merge master * repr dtype * add placement * Update optimizer_placement_optimization_pass.cpp * auto format by CI * auto format by CI * fix static check * add tips for zero api change * auto format by CI * fix merge * auto format by CI * auto format by CI * refine get job api * refine graph util import order * auto format by CI * fix static check * auto format by CI * fix special case * refine level print and add full dtype repr * rm useless Co-authored-by: Cijie Xia <cijie.xia@mail.utoronto.ca> Co-authored-by: Cijie Xia <xiacijie1998@163.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * rm some test case in test_fused_dot_feature_interaction_pooling_sum (#8425) rm some case in test * Remove unused linkages (#8426) remove unused linkages * refactor stride (#8402) * Stride inherits DimVector Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * fix argument type of OFStrideToNumpyStride Signed-off-by: daquexian <daquexian566@gmail.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * Move Tensor.__setitem__ and global related api to Python/C api (#8375) * add local_to_global, global_to_global, to_global. global_to_global still have bugs * fix bug of global_to_global * remove python api * add setitem * remove local_to_global sbp pack, format code * format code * remove redundant code * add error msg, refine check of to_global * fix bug of check * add error msg * fix clang static check error * remove useless api in tensor.py, remove redundant code, remove useless CHECK * add to_local * fix wrong exception type in unittest for to_local exception message * cuda add default error msg (#8427) default error Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com> * Refactor ShapeView (#8422) * update Signed-off-by: daquexian <daquexian566@gmail.com> * update and add docs Signed-off-by: daquexian <daquexian566@gmail.com> * turn on view slice (#8302) * turn_on_view_slice * inplace scalar math hnandle non-contiguous input * fix clang check * add docs * refactor * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * Add flow env init rdma api (#8415) * add_flow_env_init_rdma_api * adjust persistent_workers logic for RDMA support * adjust persistent_workers logic for RDMA support * add rmda_inited api * minro fix * add docs * Update python/oneflow/utils/data/dataloader.py Co-authored-by: daquexian <daquexian566@gmail.com> * fix typo * refine * fix RDMAIsInitialized * minor fix * refine * rename InitRdma to InitRDMA * refine Co-authored-by: Flowingsun007 <flowingsun007@163.com> Co-authored-by: daquexian <daquexian566@gmail.com> * add 1d send recv in nccl logical (#8355) * add 1d send recv in nccl logical * Update insert_nccl_logical_op_pass.cpp * auto format by CI Co-authored-by: cheng cheng <472491134@qq.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Support iree ci (#8419) * create mlir cpu and modify build gcc 7 shell script * fix the bug of test_iree_resnet.py cuda test in cpu version error * fix constant folding tests * suport oneflow_test_cpu_only * pub * build script add flag * modify test yml * add python3 into \PATH * don't use pretrain model * install flowvision Co-authored-by: mosout <mosout@qq.com> Co-authored-by: jackalcooper <jackalcooper@gmail.com> * Feat straighten task nodes (#8347) * Add a fast topological traversal * Add an initial implementation of straighen nodes * Add the straighen nodes algorithm * Change algorithm structure * Remove some debug information * Finalize the straighten algorithm after deciding the parameters by experiments * Notify the usage of straighten algorithm * Of format * Update oneflow/core/graph/straighten_nodes.cpp Of format Co-authored-by: daquexian <daquexian566@gmail.com> * Of format * Stop using visual string before we find a better key * Remove magic numbers and Of format * Remove starts * Of format * Fix a bug of using GetMaxVal<int32_t>() as an initial number for comparing * Refactor add straighten algo interface (#8435) * feat(*): export straighten nodes algorithm inferface * export documentation * Update python/oneflow/nn/graph/graph_config.py Co-authored-by: Yipeng Li <jamesonli1313@gmail.com> Co-authored-by: Yipeng Li <jamesonli1313@gmail.com> * Use TopoForEachNodeFast as default. (#8436) * Use TopoForEachNodeFast as default. Rename the original one as TopoForEachNodeDynamic * Speed up TopoForEachNodeFast when traversing a subgraph * Rename the switch and code clean up * Hide the class TopoStruct * Hide all the other functions * Grammar * Of format Co-authored-by: daquexian <daquexian566@gmail.com> Co-authored-by: Yinggang Wang <wyg19970408@gmail.com> * Refactor NLLLoss to support split class dim (#8380) * refactor * RuntimeError * avoid atomic add * test * fixes * update test * update test * update test * fix kernel * improve backward * update test * out_weight to be required * address static analysis errer * fix static analysis error * fix static analysis error Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Strict ordering in memory reuse algorithm (#8441) * Support broadcast in fused_softmax kernel (#8321) * support broadcast * refine * Remove shape check * fix sbp when broadcast * rollback softmax grad threshold * increase threshold of test conv bn folding * tol to 1e-2 * check error msg of fuse softmax ops * add more dispatch * remove double datatype test and add broadcast test Co-authored-by: cheng cheng <472491134@qq.com> * Merge slice and logical slice (#8416) * remove Slice, SliceUpdate, SliceGrad op * rename logical_slice to slice and logical_slice_assign to slice_update * move gradient_func logical_slice.cpp to slice.cpp * fix some bug and refine local test * feat(SliceUpdate): support 0size tensor * test(Slice): refine consistent slice test * test(SliceUpdate): refine consistent slice_update test * not export slice_update's inplace parameter * auto format by CI * recovery slice_grad_op * fix slice_view bug * add error message and attr judgement * modified old test * auto format by CI * update test README * update tensor_string code * fix test bug * auto format by CI * fix(hsplit): hsplit functor bug * fix vsplit doc test bug * refine * fix test * fix pin_memory bug Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Graph block.config.set_stage() for recommended Pipeline api. (#8442) * Graph block.config.set_stage() for recommended Pipeline api. * revert diff * refine api doc Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Update PolynomialLR's doc and paramater (#8430) * update PolynomialLR doc, current_batch = min(decay_batch, current_batch) * * update PolynomialLR doc, current_batch = min(decay_batch, current_batch) * rename the steps to decay_batch in parameters * update PolynomialLR test case Co-authored-by: Yinggang Wang <wyg19970408@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Add mv op (#8445) * add mv op with bug that Int is incompatible * add test * update test_mv.py * fix based on comments * fix based on comments Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * enable oneflow_iree(python package) and corresponding test works in ci (#8431) * update test.yml * add pytest for oneflow_iree examples * add oneflow frontend test * Dev tensor is pinned api (#8447) * support tensor.is_pinned * add test case * add docs * auto format by CI * refine * auto format by CI * refine * auto format by CI * refine * refine * refine Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * Nd sbp tensor str (#8458) * nd sbp tensor str * add nd sbp tensor str test * bigger input size * refine Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Patch sbp cost (#8378) * Add a slight cost for B->S and B->P in 2d sbp * Add penalty for P in consumer * Add the slight penalty for eager * Consider B -> (B, B) for a scalar * Do not consider parallel description in priority ratio * Of format * Fix a bug in the old version group boxing with 2D SBP (#8448) * Update group boxing to deal with hierarchy [1, 2] * Use a uniform sbp while grouping consumers * Steal "ParallelDimReduce" from "hierarchical_sub_task_graph_builder_impl" to "sbp_infer_util" * Fix bugs of patch-sbp_cost (#8456) * Update group boxing to deal with hierarchy [1, 2] * Use a uniform sbp while grouping consumers * Steal "ParallelDimReduce" from "hierarchical_sub_task_graph_builder_impl" to "sbp_infer_util" * Reduce to uniform B for 1 device. Use the actual parallel description for each tensor * Fix a bug of fix-group_boxing-bug * Group boxing reduce [2, 2]: (S0, S0) to [4]: S0, then we might infer a 1D SBP from a 2D SBP hint Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: cheng cheng <472491134@qq.com> * Decouple stream and instruction (#7607) * remove deprecated python api * backup code * backup code * fix compiler complaints * fix typo in refactoring * kMockDevice * add unit test test_mock.py * revert mock kernels * vert DEVICE_TYPE_SEQ * mock placement * address pr comments * register device kCriticalSectionDevice and kLazyJobLauncher * kControlDevice * Stream::vm_stream_ * fix compiler complaints * backup code * rename StreamIsTransport to IsCommNetStream * decouple vm::StreamType and vm::InstructionType * fix compiler complaints * remove 'gpu' related code * address static analyzer complaints * address static analyzer complaints * remove unused module in test_mock.py * the Env is never destroyed. * export Env into python * more unittests * export unittest.TestCase in framework/unittest.py * SwitchToShuttingDownPhase * optional is_normal_exit * VirtualMachine::CloseVMThreads * Delete env_api.h env_api.h is deleted by master * reshape_only_one_dim_infered * address pr comments * rollback flow.env.all_device_placement * no distributed running test_shutting_down.py * auto format by CI * expand lifetime of module oneflow in test_shutting_down.py * refine del depend on of * fix oneflow.placement.__str__ * revert GlobalSync * init_producer_stream in oneflow.from_numpy * debug code for vm * init disable_vm_threads_ in VirtualMachine::VirtualMachine * Update oneflow/core/vm/virtual_machine.h Co-authored-by: daquexian <daquexian566@gmail.com> * create stream in forked subprocesses. * refactor StreamRoleSwitch to StreamRoleVisistor * ThreadLocalGuard * auto format by CI * fix compiler complaints * fix static analyzer complaints * VirtualMachine::GetVmStream * fix static analyzer complaints * reimplement AddAndReadVector by std::deque * reimplement AddAndReadVector * merge master * increase atol for test_consistent_rnn_cell.py * StreamRole::AsyncLaunchedCommNet is bound to EventRecordedCudaStreamType * auto format by CI * remove StreamRoleVisitor<T>::VisitInvalid * no copy in AddAndReadVector * fix bug of AddAndReadVector::size_ * disable terminfo to fix missing terminfo symbols Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * fix AddAndReadVector::GetGranularity * remove bad unittest * auto format by CI * rename CallInstructionType to OpCallInstructionType * static variable GlobalSingletonPtr is a unique_ptr * replace ++atomic_cnt with atomic_cnt.fetch_add(1, std::memory_order_relaxed) * AddAndReadVector::operator[] * change comments 'lock free' to 'thread safe' * rename StatefulLocalOpKernel to StatefulOpKernel * rename VirtualMachine::vm_ to VirtualMachine::engine_ * mark VirtualMachine::NoMoreErasedInstructions private * mark VirtualMachine::FindOrCreateScheduleLocalDepObject private * remove unused version of VirtualMachineEngine::Receive * rename argname for VirtualMachineEngine::Receive * rename unused PendingInstructionList * rename AddAndReadVector to SteadyVector * optimize SteadyVector::operator[] by __builtin_clzll * refactor SteadyVector::granularity2vector_ to SteadyVector::granularity2data_ * reduce usage of steady_vector::size_ * rename unused anounymous namespace * greater atol for test_consistent_tensordot.py * fix BarrierInstructionType::ComputeInFuseMode * revert container_util.h * run AccessBlobByCallback in default stream of tensor->device * reslove static check * reslove static check * SteadyVector::MutableOrAdd Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: chengtbf <472491134@qq.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: Xiaoyu Xu <xiaoyulink@gmail.com> Co-authored-by: daquexian <daquexian566@gmail.com> Co-authored-by: binbinHan <han_binbin@163.com> * fix_tensor_numpy_to_avoid_gpu_mem_increase (#8449) * fix_tensor_numpy_to_avoid_gpu_mem_increase * Update tensor.py * auto format by CI Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * Rename user op tensor shape to shape view (#8433) * ThreadLocalGuard * rename user_op::Tensor::shape to user_op::Tensor::shape_view * auto format by CI * fix static analyzer complaints * more verbose code for HobDataType * larger timeout * larger timeout Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: jackalcooper <jackalcooper@gmail.com> Co-authored-by: binbinHan <han_binbin@163.com> * speedup global test (#8468) * speedup global test * Test refine slice ops test (#8471) * refine consistent_slice test from 112s -> 30s in 4 device * test(SliceUpdate): refine test from 119s -> 28s in 4 device * delete useless code * auto format by CI Co-authored-by: Yinggang Wang <wyg19970408@gmail.com> Co-authored-by: wyg1997 <wangyinggang@foxmail.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * Set the minimum mtu value for IB communication connection (#8451) * Set the minimum mtu value for IB communication connection * refine * refine Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Merge branch 'master' into feat-general_basic_communication Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com> Co-authored-by: daquexian <daquexian566@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: liufengwei0103 <2472937968@qq.com> Co-authored-by: Wang Yi <53533850+marigoold@users.noreply.github.com> Co-authored-by: ZZK <359521840@qq.com> Co-authored-by: hjchen2 <chenhoujiangcug@gmail.com> Co-authored-by: Juncheng <liujuncheng1022@gmail.com> Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com> Co-authored-by: Luyang <flowingsun007@163.com> Co-authored-by: binbinHan <han_binbin@163.com> Co-authored-by: Yinggang Wang <wyg19970408@gmail.com> Co-authored-by: Yao Zihang <1162526220@qq.com> Co-authored-by: yuhao <72971170+howin98@users.noreply.github.com> Co-authored-by: Xiaoyu Xu <xiaoyulink@gmail.com> Co-authored-by: cheng cheng <472491134@qq.com> Co-authored-by: Cijie Xia <cijie.xia@mail.utoronto.ca> Co-authored-by: guo ran <360112263@qq.com> Co-authored-by: Peihong Liu <mosout@qq.com> Co-authored-by: Li Xinqi <lixinqi2010@gmail.com> Co-authored-by: Shijie <821898965@qq.com> Co-authored-by: lixinqi <lixinqi0703106@163.com> Co-authored-by: leaves-zwx <kunta0932@gmail.com> Co-authored-by: Li Xiang <54010254+lixiang007666@users.noreply.github.com> Co-authored-by: Cijie Xia <xiacijie1998@163.com> Co-authored-by: Jia <basicv8vc@gmail.com> Co-authored-by: Shanshan Zhong <62104945+zhongshsh@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Co-authored-by: wyg1997 <wangyinggang@foxmail.com> Co-authored-by: Yu OuYang <xuanjiuye@gmail.com>

* Add a slight cost for B->S and B->P in 2d sbp * Add penalty for P in consumer * Fix a slight bug * Add at most 1 middle node for general basic communication * Add the cost for general basic communication * Add the slight penalty for eager * Skip initialization of boxing collector if not needed * Fix a bug * Dev nd nccl send recv boxing (#8467) * nd nccl_send_recv_boxing * rm print * support num_axes > 2 * Add distributed optional run (#8372) * Add * change deps * add install * add skip * autoprof supports bandwidth (#8367) * autoprof supports bandwidth Signed-off-by: daquexian <daquexian566@gmail.com> * print bandwidth Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * remove tmp buffer of cumprod cpu backward kernel (#8369) * remove tmp buffer of cumprod cpu backward kernel * refine * refine Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Move tensor api to cpython part3 (#8342) * add tensor_functions * concat py methods * add hash, restore tensor.py * check replacement * refine code, remove commented tensor.py * refine code * move some api * add cpu and cuda api * add triu tril norm and etc. * remove tensor_functions.h * move more api * move more api, refine size * fix typo * format code, remove useless include * refine code * refine code, fix typo * align .cuda to python * refine code * split some api to part3 for review * remove positional only arguments of argmax and argmin * remove arguments parse * modify arguments name in matmul and floor_divide * rename BINARY_FUNC to DIRECT_PASS_FUNC, modify some functions * refine code, format code * add inplace /=, add comments * remove name in macros * remove python api * remove redundant include * remove cout * format code * refactor tensor.size by directly call shape.at, refactor tensor.sub_ by calling nb_sub_ * remove redundant code * auto format by CI * fix typo, fix wrong call * modify idx datatype from int32 to int64 in tensor.size * add some DIRECT_PASS_FUNC * add cpu cuda var pow and etc. * add masked_fill any all * make REDUCE_FUNC macro, add reduce_* functions * add 0dim check in ReduceSumWhole, refine yaml * fix bug * restore add add_ sub sub_ * add unittest for tensor.half tensor.add tensor.add_ * refine code * refine code * fix typo * fix bug of tensor.std() * refactor var std and cuda, using c++ functional api * add beta and threshold in softplus * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Add nn_functor Check (#7910) * add bias_add_check * add bias_add error test * fix conv2d nhwc bias_add error * add nhwc conv test * add bias_add_error test * Add bias add error check * Rename * add batch matmul error check * add matmul check error msg * remove annotation * add fused mlp error msg check * Add pixel shuffle check test * add more test until normalization add relu functor * refine error message * finish all nnfunctor check msg * handle type error * remove useless symbol * modify back to TypeError * fix all comment * Remove redundant code * Remove pad ndim check * fix bias add space * fix check logic cause ci gpu not always gpu:0 Co-authored-by: hjchen2 <chenhoujiangcug@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Add FusedMatmulBiasAddReluDropout [OneEmbedding] (#8222) * previous version for fused_matmul_bias_add_relu_dropout * add op infer * fix detail * finish forward * support dropout rate list * add forward test * fix bug for output buffer * Configurable alpha params * try to add bit mask logic * Add bitmask first version! * Add row col bitmask logic * support not align4 reludropout * simplify relu dropout ld logic * Add naive relu dropout grad kernel * add simple relu dropout grad kernel * Rename * support relu_dropout bitmask backward * add vectorized optimization * fix tmp buffer * add to amp list * add lazy backward logic * Refine kernel * add indextype dispatch * simplify functor logic * fix cublas fused mlp aux_ld shape bug * Add more relu dropout kernel * add full unittest * fix bug in skip final activation * refine * Remove dump func * fix format * Remove cmake * remove redundant divide * add padded version * fix dropout * oneflow curand * refine * remove redundant kernel * add unroll logic * add unroll and ballot sync * refine format * Remove fast curand * Refine python interface * Add if branch for memset * fix python logic * just for debug * not use matmul bias add grad * add launch 1 block limit * fix unittest * Refine * fix graph backward bug * limit to 11060 * change to use int32_t dtype for cublas aux * Fix jc comment * fix comment * fix convert * fix static_analysis * fix at * fix userops td * fix userops td * fix const ref * fix compile error for bfloat16 * limit to 11060 * fix bug Co-authored-by: Juncheng <liujuncheng1022@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix gather 0-dim tensor bug (#8376) * fix 0-dim tensor bug * refine * support input 0-dim tensor for gather * refine * refine * refine dim_scatter_kernel check * refine * refine check * fix clang_tidy error Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * add api to apply external job pass (#8370) * Add condition to find-test-cache-distributed (#8387) * add condition to find-test-cache-distributed * fix * warp dim util (#8382) * warp dim util * format * use more maybe_wrap_dim * refine array functor * add more * refine math_functor * fix_bug_in_broadcast_min_max_grad_and_broadcast_like (#8379) * fix_bug_in_broadcast_min_max_grad_and_broadcast_like * refine * fix static check error * fix bug about index (#8388) * fix bug about index * add test case Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * LogicalSliceAssign support full slice sbp (#8344) * feat(SliceOp): slice ops support 2d sbp * fix(SliceOp): fix [B, P] 2d sbp bug * refine error message * fix bug in parallel_num == 1 * add comment * add warning and format * add NOLINT for boxing check * feat(LogicalSliceOps): support all nd_sbp * feat(LogicalSlice): support nd_sbp * add error message * fix(AutoTest): fix auto_test bug in module.parameter pass * auto format by CI * fix(LogicalSliceAssign): skip test when 1n1d * fix SliceParams memset error * remove memset * add CHECK_JUST * fix(*): make sure split_axis >= 0 or equal to SPLIT_AXIS_FOR_NON_SPLIT * remove memset * fix spilit_info.axis bug * feat(LogicalSliceOps): support grad * add logical_slice gradient_funcs * feat(LogicalSliceAssign): LogicalSliceAssign support full slice sbp * auto format by CI * test(LogicalSlice): fix logical_slice dims Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * fix_tensor_from_numpy_mem_leak_bug (#8391) * fix_tensor_from_numpy_mem_leak_bug * add note * refine note * refine Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Make of_pyext_obj static only to make sure only a python ext so has python symbols (#8393) * make of_pyext_obj static only * refine note Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Adjust tolerance setting in embedding_renorm unit test (#8394) * support front end compile for job to iree (#8249) * support frontend dev version * polish name * add tosa-to-elf.mlir * tosa to elf by llvm * conv2d partial * an enhanced frontend runner * support numpy as input * enable multiple using nn graph with different input(jobname make it it cd /home/yuhao/frontend/oneflow ; /usr/bin/env /usr/bin/python3 /home/yuhao/.vscode-server/extensions/ms-python.python-2022.6.2/pythonFiles/lib/python/debugpy/launcher 40873 -- /home/yuhao/frontend/oneflow/oneflow/ir/test/Frontend/runner.py ) * enable multiple input * enable cpu and cuda * change full_name to _full_name * support exchange cuda with cpu seamlessly * remove pip * lit config * polish * trim * auto format by CI * modify * auto format by CI * last line polish * use unittest * auto format by CI * use allclose * auto format by CI * pulish * optimize convert oneflow to tosa * conv2d * conv2d enhanced && conv2d examples add * add road map * add add_n2Op and boardcast_addOp conversion * add matmulOp conversion * support converting normailzation op to tosa(partically) * update roadmap * support i64 tensor to dense elem attr * support 100% resnet op conversion * add test mlir * add test iree resnet python script * auto format by CI * done * enhance iree resnet test script * auto format by CI * rebuild code * auto format by CI * rebuild test script * update * auto format by CI * pub * trim test scripts * move * move * input and output add block arg judgement * emit error in variable conversion * error handle for ci * modify err info * auto format by CI * merge * auto format by CI * output not block * flow ones * rm const * trim maybe * trim maybe with header file * const auto * solve clangd error Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Feat/zero mix with mp (#8036) * add zero limit * add debug * add mix zero test * refactor zero api * zero test with mp * add 2d test * add zero nd * add nd zero * add sbp cast * test passed soft limit consumer * refine size api * zero use stage 2 * add limit consumer api * add new api * refine zero s select * fix index out of range * rm zero limit on device type * zero test with activation checkpointing * add indentity when dp sequence len is 1 * move to base with master * fix * fix * fix * add test * debug bad case * refine test for eager and graph boxing * test case ready * simplify * refine test * fix buff size * fix conflict * refine zero nd * refine * add full test * revert change * refine split check * fix typo * rm log * spit long func * restore test * Update optimizer_placement_optimization_pass.cpp * auto format by CI * auto format by CI * fix static check * add tips for zero api change * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Revert embedding normal path and fix amp list (#8374) * revert embedding normal path, fix amp list * fix amp * fix memset bug in gather cpu kernel Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * replace fixed_vector with small_vector and make Shape inherit from it (#8365) * Replace fixed_vector with llvm::SmallVector Signed-off-by: daquexian <daquexian566@gmail.com> * Shape inherited from llvm::SmallVector Signed-off-by: daquexian <daquexian566@gmail.com> * refine cmake Signed-off-by: daquexian <daquexian566@gmail.com> * rename fixed_vector to small_vector Signed-off-by: daquexian <daquexian566@gmail.com> * fix reviews Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * update Shape constructor Signed-off-by: daquexian <daquexian566@gmail.com> * add 'PUBLIC' keyword to all target_link_libraries Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * update cmake Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * update cmake Signed-off-by: daquexian <daquexian566@gmail.com> * update cmake Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * set is_initialized_ default to true Signed-off-by: daquexian <daquexian566@gmail.com> * override some methods to set is_initialized_ Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * Light plan for debug (#8396) * Light plan for debug * fix note * disable terminfo to fix missing terminfo symbols (#8400) * disable terminfo to fix missing terminfo symbols Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix bug of ZeRO MP in complex case (#8404) * Remove redundant output_lbns in ir (#8409) * mv case * remove redundant info * Dev FusedCrossInteraction[OneEmbedding] (#8335) * add simple fused cross interaction forward * add packed fused * Add cross interaction grad * simplify code * fix bug * support crossnet v2 * support cross interaction v2 * add lazy backward * Rename and add test * fix jc comment * fix comment * fix bug * fix userops td elem_cnt for FUSED Group * fix header file * fix clang static analysis * fix unittest Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * add exe graph physical shape check msg (#8002) * fix index select op in graph * add exe graph physical shape check msg * improve the debug information for the python stack trace 1. add a parameter 'max_stack_depth' to specify the max depth for the stack trace 2. refactor other debug related classes. * remove parens * update * resolve PR comments * update * update graph debug test file. * restore self._debug in class Graph and class ModuleBlock * Do not shorten the stack frame string if it is in debug mode * delete TODOs * disable conv3d test (#7969) Signed-off-by: daquexian <daquexian566@gmail.com> * skip layernorm random_data_warp test (#7941) * skip layernorm random_data_warp test * warp/block/uncached case only test gpu Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Lock click version (#7967) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * add global avgpool unittest (#7585) * fix (#7978) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Support negative dim in scatter op (#7934) * support negative dim in scatter op * refine scatter test * refine scatter test again Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * run barrier callback in BarrierPhyInstrOperand::~BarrierPhyInstrOperand (#7702) * run barrier callback in BarrierPhyInstrOperand::~BarrierPhyInstrOperand * lock gil in vm Callback thread * more comments for VirtualMachineEngine::Callback() * the Env is never destroyed. * export Env into python * more unittests * wait shared_ptr.use_count() == 0 * export unittest.TestCase in framework/unittest.py * SwitchToShuttingDownPhase * optional is_normal_exit * VirtualMachine::CloseVMThreads * Delete env_api.h env_api.h is deleted by master * reshape_only_one_dim_infered * address pr comments * fix a ref-cnt bug in TryRunBarrierInstruction. * rollback flow.env.all_device_placement * no distributed running test_shutting_down.py * auto format by CI * expand lifetime of module oneflow in test_shutting_down.py * refine del depend on of * capture oneflow._oneflow_internal.eager when calling sync in __del__ * add try in flaky test Co-authored-by: Luyang <flowingsun007@163.com> Co-authored-by: chengtbf <472491134@qq.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: Xiaoyu Xu <xiaoyulink@gmail.com> * Fix one hot scalar tensor bug (#7975) * fix reduce_sum scalar check bug * fix one_hot scalar tensor bug * fix clang tidy error Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * support ctor np array from of tensor (#7970) * support ctor np array from of tensor * add test case constructing np array from tensor * refine Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * add_manual_seed_all_api (#7957) * add_manual_seed_all_api * Update conf.py * refine * add test case * auto format by CI * Update random_generator.cpp * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * one_embedding add doc string (#7902) * add doc string * add example * add * fix doc * refine * address review * mb to MB * add make_table_option * option to options * refine * add forward Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Support numpy scalar parameters (#7935) * feat(functional): support numpy scalar parameters * rename inferface * feat(*): TensorIndex support numpy scalar * feat(TensorIndex): support advance indexing * add unittest and int32 support for branch feat-param_support_np_scalar (#7939) * add unittest * refactor unittest * add todo for int16 advanced indexing * add int32 supporting for advance indexing * auto format by CI Co-authored-by: Wang Yi <53533850+marigoold@users.noreply.github.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * fix tensor_scatter_nd_update (#7953) * fix tensor_scatter_nd_update * auto backward Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix one_embedding adam (#7974) * fix one_embedding adam * fix tidy * fix normal Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * speed test with score (#7990) Signed-off-by: daquexian <daquexian566@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Feat/graph del by ref (#7857) * remove IsMultiClient() and single client logic Signed-off-by: daquexian <daquexian566@gmail.com> * rename eager.multi_client to eager Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * add py ref * refine new session * clean code * make scope api inner use * use session with ref cnt * run barrier callback in BarrierPhyInstrOperand::~BarrierPhyInstrOperand * test pass * lock gil in vm Callback thread * more comments for VirtualMachineEngine::Callback() * merge * merge rm single client * rm initenv * merge and fix master * refactor env c api * add debug code * fix and serving test pass * test passed * rm useless * rm useless code * format * rm useless include * rm sync in py * the Env is never destroyed. * export Env into python * more unittests * fix and pass tests * revert virtual_machine.cpp * revert core/vm * remove outdated python class oneflow.unittest.TestCase * graph test passed * wait shared_ptr.use_count() == 0 * export unittest.TestCase in framework/unittest.py * SwitchToShuttingDownPhase * optional is_normal_exit * VirtualMachine::CloseVMThreads * Delete env_api.h env_api.h is deleted by master * address pr comments * rm is env init * Clear empty thread when graph destroy (#7633) * Revert "Clear empty thread when graph destroy (#7633)" (#7860) This reverts commit 3e8585e5fa20b97229d6b0be46a7ff814dc8cd83. * fix a ref-cnt bug in TryRunBarrierInstruction. * rm env_api * fix clang-tidy error * fix clang-tidy in env_imp * refine env api * format * refine graph del and sync at shuttingdown * fix typo * add comment * rm useless * rm useless Co-authored-by: daquexian <daquexian566@gmail.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: lixinqi <lixinqi0703106@163.com> Co-authored-by: Li Xinqi <lixinqi2010@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: Luyang <flowingsun007@163.com> Co-authored-by: cheng cheng <472491134@qq.com> * [PersistentTable] Fix num blocks (#7986) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Add auto benchmark for flowvision (#7806) * update yml * update workflow * add resnet50 * [PersistentTable] Async write (#7946) * [PersistentTable] Async write * fix Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * save log in separate dir by default (#7825) Signed-off-by: daquexian <daquexian566@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix index select op in graph * add exe graph physical shape check msg * improve the debug information for the python stack trace 1. add a parameter 'max_stack_depth' to specify the max depth for the stack trace 2. refactor other debug related classes. * remove parens * update * resolve PR comments * update * update graph debug test file. * restore self._debug in class Graph and class ModuleBlock * Do not shorten the stack frame string if it is in debug mode * delete TODOs * Revert "Merge branch 'master' into fea/graph_check_msg" This reverts commit 28833b73a8041463e5e3d130784be386ee248bd8, reversing changes made to baadf6045f2fce69c090e442a755229c1c949773. * Revert "Revert "Merge branch 'master' into fea/graph_check_msg"" This reverts commit 1d5e196d8530ffd2b9bf781abcf168b94ff9ca41. * update * resolve conflicts * resolve conflicts Co-authored-by: Cijie Xia <cijie.xia@mail.utoronto.ca> Co-authored-by: daquexian <daquexian566@gmail.com> Co-authored-by: guo ran <360112263@qq.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com> Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com> Co-authored-by: Peihong Liu <mosout@qq.com> Co-authored-by: Li Xinqi <lixinqi2010@gmail.com> Co-authored-by: Luyang <flowingsun007@163.com> Co-authored-by: chengtbf <472491134@qq.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com> Co-authored-by: liufengwei0103 <2472937968@qq.com> Co-authored-by: binbinHan <han_binbin@163.com> Co-authored-by: Yinggang Wang <wyg19970408@gmail.com> Co-authored-by: Wang Yi <53533850+marigoold@users.noreply.github.com> Co-authored-by: Shijie <821898965@qq.com> Co-authored-by: lixinqi <lixinqi0703106@163.com> Co-authored-by: Juncheng <liujuncheng1022@gmail.com> * add batch_matmul sbp (#8385) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * suppress gcc11 false positive warning (#8401) Signed-off-by: daquexian <daquexian566@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix variable op conversion to tosa error in ninja c1 (#8412) * pub * move test iree resnet python script to oneflow_iree repo * add bracket * rename const_val to const_val_ and restore resnet.py test script Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com> * nccl send/recv support different placement * refine * auto format by CI * rm out ctrl * auto format by CI Co-authored-by: guo-ran <360112263@qq.com> Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com> Co-authored-by: daquexian <daquexian566@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: liufengwei0103 <2472937968@qq.com> Co-authored-by: Wang Yi <53533850+marigoold@users.noreply.github.com> Co-authored-by: ZZK <359521840@qq.com> Co-authored-by: hjchen2 <chenhoujiangcug@gmail.com> Co-authored-by: Juncheng <liujuncheng1022@gmail.com> Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com> Co-authored-by: Luyang <flowingsun007@163.com> Co-authored-by: binbinHan <han_binbin@163.com> Co-authored-by: Yinggang Wang <wyg19970408@gmail.com> Co-authored-by: Yao Zihang <1162526220@qq.com> Co-authored-by: yuhao <72971170+howin98@users.noreply.github.com> Co-authored-by: Xiaoyu Xu <xiaoyulink@gmail.com> Co-authored-by: cheng cheng <472491134@qq.com> Co-authored-by: Cijie Xia <cijie.xia@mail.utoronto.ca> Co-authored-by: Peihong Liu <mosout@qq.com> Co-authored-by: Li Xinqi <lixinqi2010@gmail.com> Co-authored-by: Shijie <821898965@qq.com> Co-authored-by: lixinqi <lixinqi0703106@163.com> * Support different hierarchy * Merge branch 'master' into feat-general_basic_communication (#8477) * Add distributed optional run (#8372) * Add * change deps * add install * add skip * autoprof supports bandwidth (#8367) * autoprof supports bandwidth Signed-off-by: daquexian <daquexian566@gmail.com> * print bandwidth Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * remove tmp buffer of cumprod cpu backward kernel (#8369) * remove tmp buffer of cumprod cpu backward kernel * refine * refine Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Move tensor api to cpython part3 (#8342) * add tensor_functions * concat py methods * add hash, restore tensor.py * check replacement * refine code, remove commented tensor.py * refine code * move some api * add cpu and cuda api * add triu tril norm and etc. * remove tensor_functions.h * move more api * move more api, refine size * fix typo * format code, remove useless include * refine code * refine code, fix typo * align .cuda to python * refine code * split some api to part3 for review * remove positional only arguments of argmax and argmin * remove arguments parse * modify arguments name in matmul and floor_divide * rename BINARY_FUNC to DIRECT_PASS_FUNC, modify some functions * refine code, format code * add inplace /=, add comments * remove name in macros * remove python api * remove redundant include * remove cout * format code * refactor tensor.size by directly call shape.at, refactor tensor.sub_ by calling nb_sub_ * remove redundant code * auto format by CI * fix typo, fix wrong call * modify idx datatype from int32 to int64 in tensor.size * add some DIRECT_PASS_FUNC * add cpu cuda var pow and etc. * add masked_fill any all * make REDUCE_FUNC macro, add reduce_* functions * add 0dim check in ReduceSumWhole, refine yaml * fix bug * restore add add_ sub sub_ * add unittest for tensor.half tensor.add tensor.add_ * refine code * refine code * fix typo * fix bug of tensor.std() * refactor var std and cuda, using c++ functional api * add beta and threshold in softplus * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Add nn_functor Check (#7910) * add bias_add_check * add bias_add error test * fix conv2d nhwc bias_add error * add nhwc conv test * add bias_add_error test * Add bias add error check * Rename * add batch matmul error check * add matmul check error msg * remove annotation * add fused mlp error msg check * Add pixel shuffle check test * add more test until normalization add relu functor * refine error message * finish all nnfunctor check msg * handle type error * remove useless symbol * modify back to TypeError * fix all comment * Remove redundant code * Remove pad ndim check * fix bias add space * fix check logic cause ci gpu not always gpu:0 Co-authored-by: hjchen2 <chenhoujiangcug@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Add FusedMatmulBiasAddReluDropout [OneEmbedding] (#8222) * previous version for fused_matmul_bias_add_relu_dropout * add op infer * fix detail * finish forward * support dropout rate list * add forward test * fix bug for output buffer * Configurable alpha params * try to add bit mask logic * Add bitmask first version! * Add row col bitmask logic * support not align4 reludropout * simplify relu dropout ld logic * Add naive relu dropout grad kernel * add simple relu dropout grad kernel * Rename * support relu_dropout bitmask backward * add vectorized optimization * fix tmp buffer * add to amp list * add lazy backward logic * Refine kernel * add indextype dispatch * simplify functor logic * fix cublas fused mlp aux_ld shape bug * Add more relu dropout kernel * add full unittest * fix bug in skip final activation * refine * Remove dump func * fix format * Remove cmake * remove redundant divide * add padded version * fix dropout * oneflow curand * refine * remove redundant kernel * add unroll logic * add unroll and ballot sync * refine format * Remove fast curand * Refine python interface * Add if branch for memset * fix python logic * just for debug * not use matmul bias add grad * add launch 1 block limit * fix unittest * Refine * fix graph backward bug * limit to 11060 * change to use int32_t dtype for cublas aux * Fix jc comment * fix comment * fix convert * fix static_analysis * fix at * fix userops td * fix userops td * fix const ref * fix compile error for bfloat16 * limit to 11060 * fix bug Co-authored-by: Juncheng <liujuncheng1022@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix gather 0-dim tensor bug (#8376) * fix 0-dim tensor bug * refine * support input 0-dim tensor for gather * refine * refine * refine dim_scatter_kernel check * refine * refine check * fix clang_tidy error Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * add api to apply external job pass (#8370) * Add condition to find-test-cache-distributed (#8387) * add condition to find-test-cache-distributed * fix * warp dim util (#8382) * warp dim util * format * use more maybe_wrap_dim * refine array functor * add more * refine math_functor * fix_bug_in_broadcast_min_max_grad_and_broadcast_like (#8379) * fix_bug_in_broadcast_min_max_grad_and_broadcast_like * refine * fix static check error * fix bug about index (#8388) * fix bug about index * add test case Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * LogicalSliceAssign support full slice sbp (#8344) * feat(SliceOp): slice ops support 2d sbp * fix(SliceOp): fix [B, P] 2d sbp bug * refine error message * fix bug in parallel_num == 1 * add comment * add warning and format * add NOLINT for boxing check * feat(LogicalSliceOps): support all nd_sbp * feat(LogicalSlice): support nd_sbp * add error message * fix(AutoTest): fix auto_test bug in module.parameter pass * auto format by CI * fix(LogicalSliceAssign): skip test when 1n1d * fix SliceParams memset error * remove memset * add CHECK_JUST * fix(*): make sure split_axis >= 0 or equal to SPLIT_AXIS_FOR_NON_SPLIT * remove memset * fix spilit_info.axis bug * feat(LogicalSliceOps): support grad * add logical_slice gradient_funcs * feat(LogicalSliceAssign): LogicalSliceAssign support full slice sbp * auto format by CI * test(LogicalSlice): fix logical_slice dims Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * fix_tensor_from_numpy_mem_leak_bug (#8391) * fix_tensor_from_numpy_mem_leak_bug * add note * refine note * refine Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Make of_pyext_obj static only to make sure only a python ext so has python symbols (#8393) * make of_pyext_obj static only * refine note Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Adjust tolerance setting in embedding_renorm unit test (#8394) * support front end compile for job to iree (#8249) * support frontend dev version * polish name * add tosa-to-elf.mlir * tosa to elf by llvm * conv2d partial * an enhanced frontend runner * support numpy as input * enable multiple using nn graph with different input(jobname make it it cd /home/yuhao/frontend/oneflow ; /usr/bin/env /usr/bin/python3 /home/yuhao/.vscode-server/extensions/ms-python.python-2022.6.2/pythonFiles/lib/python/debugpy/launcher 40873 -- /home/yuhao/frontend/oneflow/oneflow/ir/test/Frontend/runner.py ) * enable multiple input * enable cpu and cuda * change full_name to _full_name * support exchange cuda with cpu seamlessly * remove pip * lit config * polish * trim * auto format by CI * modify * auto format by CI * last line polish * use unittest * auto format by CI * use allclose * auto format by CI * pulish * optimize convert oneflow to tosa * conv2d * conv2d enhanced && conv2d examples add * add road map * add add_n2Op and boardcast_addOp conversion * add matmulOp conversion * support converting normailzation op to tosa(partically) * update roadmap * support i64 tensor to dense elem attr * support 100% resnet op conversion * add test mlir * add test iree resnet python script * auto format by CI * done * enhance iree resnet test script * auto format by CI * rebuild code * auto format by CI * rebuild test script * update * auto format by CI * pub * trim test scripts * move * move * input and output add block arg judgement * emit error in variable conversion * error handle for ci * modify err info * auto format by CI * merge * auto format by CI * output not block * flow ones * rm const * trim maybe * trim maybe with header file * const auto * solve clangd error Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Feat/zero mix with mp (#8036) * add zero limit * add debug * add mix zero test * refactor zero api * zero test with mp * add 2d test * add zero nd * add nd zero * add sbp cast * test passed soft limit consumer * refine size api * zero use stage 2 * add limit consumer api * add new api * refine zero s select * fix index out of range * rm zero limit on device type * zero test with activation checkpointing * add indentity when dp sequence len is 1 * move to base with master * fix * fix * fix * add test * debug bad case * refine test for eager and graph boxing * test case ready * simplify * refine test * fix buff size * fix conflict * refine zero nd * refine * add full test * revert change * refine split check * fix typo * rm log * spit long func * restore test * Update optimizer_placement_optimization_pass.cpp * auto format by CI * auto format by CI * fix static check * add tips for zero api change * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Revert embedding normal path and fix amp list (#8374) * revert embedding normal path, fix amp list * fix amp * fix memset bug in gather cpu kernel Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * replace fixed_vector with small_vector and make Shape inherit from it (#8365) * Replace fixed_vector with llvm::SmallVector Signed-off-by: daquexian <daquexian566@gmail.com> * Shape inherited from llvm::SmallVector Signed-off-by: daquexian <daquexian566@gmail.com> * refine cmake Signed-off-by: daquexian <daquexian566@gmail.com> * rename fixed_vector to small_vector Signed-off-by: daquexian <daquexian566@gmail.com> * fix reviews Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * update Shape constructor Signed-off-by: daquexian <daquexian566@gmail.com> * add 'PUBLIC' keyword to all target_link_libraries Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * update cmake Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * update cmake Signed-off-by: daquexian <daquexian566@gmail.com> * update cmake Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * set is_initialized_ default to true Signed-off-by: daquexian <daquexian566@gmail.com> * override some methods to set is_initialized_ Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * Light plan for debug (#8396) * Light plan for debug * fix note * disable terminfo to fix missing terminfo symbols (#8400) * disable terminfo to fix missing terminfo symbols Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix bug of ZeRO MP in complex case (#8404) * Remove redundant output_lbns in ir (#8409) * mv case * remove redundant info * Dev FusedCrossInteraction[OneEmbedding] (#8335) * add simple fused cross interaction forward * add packed fused * Add cross interaction grad * simplify code * fix bug * support crossnet v2 * support cross interaction v2 * add lazy backward * Rename and add test * fix jc comment * fix comment * fix bug * fix userops td elem_cnt for FUSED Group * fix header file * fix clang static analysis * fix unittest Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * add exe graph physical shape check msg (#8002) * fix index select op in graph * add exe graph physical shape check msg * improve the debug information for the python stack trace 1. add a parameter 'max_stack_depth' to specify the max depth for the stack trace 2. refactor other debug related classes. * remove parens * update * resolve PR comments * update * update graph debug test file. * restore self._debug in class Graph and class ModuleBlock * Do not shorten the stack frame string if it is in debug mode * delete TODOs * disable conv3d test (#7969) Signed-off-by: daquexian <daquexian566@gmail.com> * skip layernorm random_data_warp test (#7941) * skip layernorm random_data_warp test * warp/block/uncached case only test gpu Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Lock click version (#7967) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * add global avgpool unittest (#7585) * fix (#7978) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Support negative dim in scatter op (#7934) * support negative dim in scatter op * refine scatter test * refine scatter test again Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * run barrier callback in BarrierPhyInstrOperand::~BarrierPhyInstrOperand (#7702) * run barrier callback in BarrierPhyInstrOperand::~BarrierPhyInstrOperand * lock gil in vm Callback thread * more comments for VirtualMachineEngine::Callback() * the Env is never destroyed. * export Env into python * more unittests * wait shared_ptr.use_count() == 0 * export unittest.TestCase in framework/unittest.py * SwitchToShuttingDownPhase * optional is_normal_exit * VirtualMachine::CloseVMThreads * Delete env_api.h env_api.h is deleted by master * reshape_only_one_dim_infered * address pr comments * fix a ref-cnt bug in TryRunBarrierInstruction. * rollback flow.env.all_device_placement * no distributed running test_shutting_down.py * auto format by CI * expand lifetime of module oneflow in test_shutting_down.py * refine del depend on of * capture oneflow._oneflow_internal.eager when calling sync in __del__ * add try in flaky test Co-authored-by: Luyang <flowingsun007@163.com> Co-authored-by: chengtbf <472491134@qq.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: Xiaoyu Xu <xiaoyulink@gmail.com> * Fix one hot scalar tensor bug (#7975) * fix reduce_sum scalar check bug * fix one_hot scalar tensor bug * fix clang tidy error Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * support ctor np array from of tensor (#7970) * support ctor np array from of tensor * add test case constructing np array from tensor * refine Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * add_manual_seed_all_api (#7957) * add_manual_seed_all_api * Update conf.py * refine * add test case * auto format by CI * Update random_generator.cpp * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * one_embedding add doc string (#7902) * add doc string * add example * add * fix doc * refine * address review * mb to MB * add make_table_option * option to options * refine * add forward Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Support numpy scalar parameters (#7935) * feat(functional): support numpy scalar parameters * rename inferface * feat(*): TensorIndex support numpy scalar * feat(TensorIndex): support advance indexing * add unittest and int32 support for branch feat-param_support_np_scalar (#7939) * add unittest * refactor unittest * add todo for int16 advanced indexing * add int32 supporting for advance indexing * auto format by CI Co-authored-by: Wang Yi <53533850+marigoold@users.noreply.github.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * fix tensor_scatter_nd_update (#7953) * fix tensor_scatter_nd_update * auto backward Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix one_embedding adam (#7974) * fix one_embedding adam * fix tidy * fix normal Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * speed test with score (#7990) Signed-off-by: daquexian <daquexian566@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Feat/graph del by ref (#7857) * remove IsMultiClient() and single client logic Signed-off-by: daquexian <daquexian566@gmail.com> * rename eager.multi_client to eager Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * add py ref * refine new session * clean code * make scope api inner use * use session with ref cnt * run barrier callback in BarrierPhyInstrOperand::~BarrierPhyInstrOperand * test pass * lock gil in vm Callback thread * more comments for VirtualMachineEngine::Callback() * merge * merge rm single client * rm initenv * merge and fix master * refactor env c api * add debug code * fix and serving test pass * test passed * rm useless * rm useless code * format * rm useless include * rm sync in py * the Env is never destroyed. * export Env into python * more unittests * fix and pass tests * revert virtual_machine.cpp * revert core/vm * remove outdated python class oneflow.unittest.TestCase * graph test passed * wait shared_ptr.use_count() == 0 * export unittest.TestCase in framework/unittest.py * SwitchToShuttingDownPhase * optional is_normal_exit * VirtualMachine::CloseVMThreads * Delete env_api.h env_api.h is deleted by master * address pr comments * rm is env init * Clear empty thread when graph destroy (#7633) * Revert "Clear empty thread when graph destroy (#7633)" (#7860) This reverts commit 3e8585e5fa20b97229d6b0be46a7ff814dc8cd83. * fix a ref-cnt bug in TryRunBarrierInstruction. * rm env_api * fix clang-tidy error * fix clang-tidy in env_imp * refine env api * format * refine graph del and sync at shuttingdown * fix typo * add comment * rm useless * rm useless Co-authored-by: daquexian <daquexian566@gmail.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: lixinqi <lixinqi0703106@163.com> Co-authored-by: Li Xinqi <lixinqi2010@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: Luyang <flowingsun007@163.com> Co-authored-by: cheng cheng <472491134@qq.com> * [PersistentTable] Fix num blocks (#7986) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Add auto benchmark for flowvision (#7806) * update yml * update workflow * add resnet50 * [PersistentTable] Async write (#7946) * [PersistentTable] Async write * fix Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * save log in separate dir by default (#7825) Signed-off-by: daquexian <daquexian566@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix index select op in graph * add exe graph physical shape check msg * improve the debug information for the python stack trace 1. add a parameter 'max_stack_depth' to specify the max depth for the stack trace 2. refactor other debug related classes. * remove parens * update * resolve PR comments * update * update graph debug test file. * restore self._debug in class Graph and class ModuleBlock * Do not shorten the stack frame string if it is in debug mode * delete TODOs * Revert "Merge branch 'master' into fea/graph_check_msg" This reverts commit 28833b73a8041463e5e3d130784be386ee248bd8, reversing changes made to baadf6045f2fce69c090e442a755229c1c949773. * Revert "Revert "Merge branch 'master' into fea/graph_check_msg"" This reverts commit 1d5e196d8530ffd2b9bf781abcf168b94ff9ca41. * update * resolve conflicts * resolve conflicts Co-authored-by: Cijie Xia <cijie.xia@mail.utoronto.ca> Co-authored-by: daquexian <daquexian566@gmail.com> Co-authored-by: guo ran <360112263@qq.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com> Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com> Co-authored-by: Peihong Liu <mosout@qq.com> Co-authored-by: Li Xinqi <lixinqi2010@gmail.com> Co-authored-by: Luyang <flowingsun007@163.com> Co-authored-by: chengtbf <472491134@qq.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com> Co-authored-by: liufengwei0103 <2472937968@qq.com> Co-authored-by: binbinHan <han_binbin@163.com> Co-authored-by: Yinggang Wang <wyg19970408@gmail.com> Co-authored-by: Wang Yi <53533850+marigoold@users.noreply.github.com> Co-authored-by: Shijie <821898965@qq.com> Co-authored-by: lixinqi <lixinqi0703106@163.com> Co-authored-by: Juncheng <liujuncheng1022@gmail.com> * add batch_matmul sbp (#8385) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * suppress gcc11 false positive warning (#8401) Signed-off-by: daquexian <daquexian566@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix variable op conversion to tosa error in ninja c1 (#8412) * pub * move test iree resnet python script to oneflow_iree repo * add bracket * rename const_val to const_val_ and restore resnet.py test script Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com> * Fix eval error in FusedMLP (#8413) Fix eval error * Init NCCL communicator in graph mode unifiedly (#8263) * centralized comm init * address review * revert * rename * ref nccl logical send recv * fix cpu only Co-authored-by: cheng cheng <472491134@qq.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix dim_scatter 0-dim tensor bug (#8418) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * target based external libraries (#8421) Signed-off-by: daquexian <daquexian566@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Refine hardcoded attr setting/getting in ir (#8420) * use names in trait static func * more changes on op name attr * use wrapped func * Replace cu115 with cu116 in nightly (#8423) update workflows Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix repeat interleave 0-size tensor bug (#8414) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Autotest support print input in ci (#8383) * support print tensor value in autotest to provide more details in ci * revert * refine * auto format by CI * control precision to 1e-5 when record * fix bug * auto format by CI * relax tensor_size_mb * fix bug * fix bug * refine * releax * refinew * refine * fix bug * relax * refine * restruct * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Modify sbp.split()'s karg: axis to dim (#8411) * Modify sbp.split()'s axis karg to dim * Refine * Refine * Refine * Refine * Feat/graph logical op debug repr (#8131) * add zero limit * add debug * add mix zero test * refactor zero api * zero test with mp * add 2d test * add zero nd * add nd zero * add sbp cast * test passed soft limit consumer * refine size api * add module config * save nn.Module info in job.proto for better debugging * add new line * add ModuleBlock.ops_proto() API * zero use stage 2 * print operators' info when print ModuleBlock * handle VariableOpConf * update * update * fix * move operators repr method to graph util * add limit consumer api * add new api * refine zero s select * add module block * fix * refact for rm op in module conf * fix * add sbp debug * add sbp repr * add shape * refine * add sys op in repr * add full op debug * fix index out of range * rm zero limit on device type * add no scope op to graph * zero test with activation checkpointing * fix order * add indentity when dp sequence len is 1 * add debug repr * refine repr of op * refine and fix * rm useless log * move to base with master * fix * fix * fix * fix proto * refine test * fix type * add test * debug bad case * refine test for eager and graph boxing * test case ready * simplify * refine test * fix buff size * fix conflict * refine zero nd * refine * add full test * revert change * refine split check * fix typo * rm log * spit long func * refine * restore test * refine pass and mem debug * merge master * repr dtype * add placement * Update optimizer_placement_optimization_pass.cpp * auto format by CI * auto format by CI * fix static check * add tips for zero api change * auto format by CI * fix merge * auto format by CI * auto format by CI * refine get job api * refine graph util import order * auto format by CI * fix static check * auto format by CI * fix special case * refine level print and add full dtype repr * rm useless Co-authored-by: Cijie Xia <cijie.xia@mail.utoronto.ca> Co-authored-by: Cijie Xia <xiacijie1998@163.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * rm some test case in test_fused_dot_feature_interaction_pooling_sum (#8425) rm some case in test * Remove unused linkages (#8426) remove unused linkages * refactor stride (#8402) * Stride inherits DimVector Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * fix argument type of OFStrideToNumpyStride Signed-off-by: daquexian <daquexian566@gmail.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * Move Tensor.__setitem__ and global related api to Python/C api (#8375) * add local_to_global, global_to_global, to_global. global_to_global still have bugs * fix bug of global_to_global * remove python api * add setitem * remove local_to_global sbp pack, format code * format code * remove redundant code * add error msg, refine check of to_global * fix bug of check * add error msg * fix clang static check error * remove useless api in tensor.py, remove redundant code, remove useless CHECK * add to_local * fix wrong exception type in unittest for to_local exception message * cuda add default error msg (#8427) default error Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com> * Refactor ShapeView (#8422) * update Signed-off-by: daquexian <daquexian566@gmail.com> * update and add docs Signed-off-by: daquexian <daquexian566@gmail.com> * turn on view slice (#8302) * turn_on_view_slice * inplace scalar math hnandle non-contiguous input * fix clang check * add docs * refactor * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * Add flow env init rdma api (#8415) * add_flow_env_init_rdma_api * adjust persistent_workers logic for RDMA support * adjust persistent_workers logic for RDMA support * add rmda_inited api * minro fix * add docs * Update python/oneflow/utils/data/dataloader.py Co-authored-by: daquexian <daquexian566@gmail.com> * fix typo * refine * fix RDMAIsInitialized * minor fix * refine * rename InitRdma to InitRDMA * refine Co-authored-by: Flowingsun007 <flowingsun007@163.com> Co-authored-by: daquexian <daquexian566@gmail.com> * add 1d send recv in nccl logical (#8355) * add 1d send recv in nccl logical * Update insert_nccl_logical_op_pass.cpp * auto format by CI Co-authored-by: cheng cheng <472491134@qq.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Support iree ci (#8419) * create mlir cpu and modify build gcc 7 shell script * fix the bug of test_iree_resnet.py cuda test in cpu version error * fix constant folding tests * suport oneflow_test_cpu_only * pub * build script add flag * modify test yml * add python3 into \PATH * don't use pretrain model * install flowvision Co-authored-by: mosout <mosout@qq.com> Co-authored-by: jackalcooper <jackalcooper@gmail.com> * Feat straighten task nodes (#8347) * Add a fast topological traversal * Add an initial implementation of straighen nodes * Add the straighen nodes algorithm * Change algorithm structure * Remove some debug information * Finalize the straighten algorithm after deciding the parameters by experiments * Notify the usage of straighten algorithm * Of format * Update oneflow/core/graph/straighten_nodes.cpp Of format Co-authored-by: daquexian <daquexian566@gmail.com> * Of format * Stop using visual string before we find a better key * Remove magic numbers and Of format * Remove starts * Of format * Fix a bug of using GetMaxVal<int32_t>() as an initial number for comparing * Refactor add straighten algo interface (#8435) * feat(*): export straighten nodes algorithm inferface * export documentation * Update python/oneflow/nn/graph/graph_config.py Co-authored-by: Yipeng Li <jamesonli1313@gmail.com> Co-authored-by: Yipeng Li <jamesonli1313@gmail.com> * Use TopoForEachNodeFast as default. (#8436) * Use TopoForEachNodeFast as default. Rename the original one as TopoForEachNodeDynamic * Speed up TopoForEachNodeFast when traversing a subgraph * Rename the switch and code clean up * Hide the class TopoStruct * Hide all the other functions * Grammar * Of format Co-authored-by: daquexian <daquexian566@gmail.com> Co-authored-by: Yinggang Wang <wyg19970408@gmail.com> * Refactor NLLLoss to support split class dim (#8380) * refactor * RuntimeError * avoid atomic add * test * fixes * update test * update test * update test * fix kernel * improve backward * update test * out_weight to be required * address static analysis errer * fix static analysis error * fix static analysis error Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Strict ordering in memory reuse algorithm (#8441) * Support broadcast in fused_softmax kernel (#8321) * support broadcast * refine * Remove shape check * fix sbp when broadcast * rollback softmax grad threshold * increase threshold of test conv bn folding * tol to 1e-2 * check error msg of fuse softmax ops * add more dispatch * remove double datatype test and add broadcast test Co-authored-by: cheng cheng <472491134@qq.com> * Merge slice and logical slice (#8416) * remove Slice, SliceUpdate, SliceGrad op * rename logical_slice to slice and logical_slice_assign to slice_update * move gradient_func logical_slice.cpp to slice.cpp * fix some bug and refine local test * feat(SliceUpdate): support 0size tensor * test(Slice): refine consistent slice test * test(SliceUpdate): refine consistent slice_update test * not export slice_update's inplace parameter * auto format by CI * recovery slice_grad_op * fix slice_view bug * add error message and attr judgement * modified old test * auto format by CI * update test README * update tensor_string code * fix test bug * auto format by CI * fix(hsplit): hsplit functor bug * fix vsplit doc test bug * refine * fix test * fix pin_memory bug Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Graph block.config.set_stage() for recommended Pipeline api. (#8442) * Graph block.config.set_stage() for recommended Pipeline api. * revert diff * refine api doc Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Update PolynomialLR's doc and paramater (#8430) * update PolynomialLR doc, current_batch = min(decay_batch, current_batch) * * update PolynomialLR doc, current_batch = min(decay_batch, current_batch) * rename the steps to decay_batch in parameters * update PolynomialLR test case Co-authored-by: Yinggang Wang <wyg19970408@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Add mv op (#8445) * add mv op with bug that Int is incompatible * add test * update test_mv.py * fix based on comments * fix based on comments Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * enable oneflow_iree(python package) and corresponding test works in ci (#8431) * update test.yml * add pytest for oneflow_iree examples * add oneflow frontend test * Dev tensor is pinned api (#8447) * support tensor.is_pinned * add test case * add docs * auto format by CI * refine * auto format by CI * refine * auto format by CI * refine * refine * refine Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * Nd sbp tensor str (#8458) * nd sbp tensor str * add nd sbp tensor str test * bigger input size * refine Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Patch sbp cost (#8378) * Add a slight cost for B->S and B->P in 2d sbp * Add penalty for P in consumer * Add the slight penalty for eager * Consider B -> (B, B) for a scalar * Do not consider parallel description in priority ratio * Of format * Fix a bug in the old version group boxing with 2D SBP (#8448) * Update group boxing to deal with hierarchy [1, 2] * Use a uniform sbp while grouping consumers * Steal "ParallelDimReduce" from "hierarchical_sub_task_graph_builder_impl" to "sbp_infer_util" * Fix bugs of patch-sbp_cost (#8456) * Update group boxing to deal with hierarchy [1, 2] * Use a uniform sbp while grouping consumers * Steal "ParallelDimReduce" from "hierarchical_sub_task_graph_builder_impl" to "sbp_infer_util" * Reduce to uniform B for 1 device. Use the actual parallel description for each tensor * Fix a bug of fix-group_boxing-bug * Group boxing reduce [2, 2]: (S0, S0) to [4]: S0, then we might infer a 1D SBP from a 2D SBP hint Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: cheng cheng <472491134@qq.com> * Decouple stream and instruction (#7607) * remove deprecated python api * backup code * backup code * fix compiler complaints * fix typo in refactoring * kMockDevice * add unit test test_mock.py * revert mock kernels * vert DEVICE_TYPE_SEQ * mock placement * address pr comments * register device kCriticalSectionDevice and kLazyJobLauncher * kControlDevice * Stream::vm_stream_ * fix compiler complaints * backup code * rename StreamIsTransport to IsCommNetStream * decouple vm::StreamType and vm::InstructionType * fix compiler complaints * remove 'gpu' related code * address static analyzer complaints * address static analyzer complaints * remove unused module in test_mock.py * the Env is never destroyed. * export Env into python * more unittests * export unittest.TestCase in framework/unittest.py * SwitchToShuttingDownPhase * optional is_normal_exit * VirtualMachine::CloseVMThreads * Delete env_api.h env_api.h is deleted by master * reshape_only_one_dim_infered * address pr comments * rollback flow.env.all_device_placement * no distributed running test_shutting_down.py * auto format by CI * expand lifetime of module oneflow in test_shutting_down.py * refine del depend on of * fix oneflow.placement.__str__ * revert GlobalSync * init_producer_stream in oneflow.from_numpy * debug code for vm * init disable_vm_threads_ in VirtualMachine::VirtualMachine * Update oneflow/core/vm/virtual_machine.h Co-authored-by: daquexian <daquexian566@gmail.com> * create stream in forked subprocesses. * refactor StreamRoleSwitch to StreamRoleVisistor * ThreadLocalGuard * auto format by CI * fix compiler complaints * fix static analyzer complaints * VirtualMachine::GetVmStream * fix static analyzer complaints * reimplement AddAndReadVector by std::deque * reimplement AddAndReadVector * merge master * increase atol for test_consistent_rnn_cell.py * StreamRole::AsyncLaunchedCommNet is bound to EventRecordedCudaStreamType * auto format by CI * remove StreamRoleVisitor<T>::VisitInvalid * no copy in AddAndReadVector * fix bug of AddAndReadVector::size_ * disable terminfo to fix missing terminfo symbols Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * fix AddAndReadVector::GetGranularity * remove bad unittest * auto format by CI * rename CallInstructionType to OpCallInstructionType * sta…

hjchen2 · 2022-07-29T06:54:38Z

oneflow/core/functional/impl/math_functor.cpp

+      // therefore, output tensor and input will not inplaced. When scalar_math op/kernel
+      // support strided tensor as input, the problem above will be solved!
+      JUST(OpInterpUtil::Dispatch(*op_, {x}, outputs.get(),
+                                  OpExprInterpContext(attrs, /*inplace=*/true)));


这个OpExprInterpContext的inplace只有这一个地方设置了，其他functor里都没设置，不会有问题吗

嗯，应该是没问题的，不会影响其他已有的inplace逻辑

turn_on_view_slice

eb9e0b2

Flowingsun007 marked this pull request as ready for review May 25, 2022 08:09

Flowingsun007 requested review from hjchen2, chengtbf and strint as code owners May 25, 2022 08:09

Flowingsun007 added enhancement eager op labels May 25, 2022

Flowingsun007 requested a review from oneflow-ci-bot May 25, 2022 08:43

Flowingsun007 enabled auto-merge (squash) May 25, 2022 08:43

Merge branch 'master' into turn_on_view_slice

8e14a4b

Flowingsun007 changed the title ~~turn_on_view_slice~~ turn on view slice May 26, 2022

Flowingsun007 added 2 commits May 26, 2022 11:14

Merge branch 'master' into turn_on_view_slice

422dd47

Merge branch 'master' into turn_on_view_slice

9ee6f0d

Flowingsun007 requested review from oneflow-ci-bot and removed request for oneflow-ci-bot May 26, 2022 04:44

Merge branch 'master' into turn_on_view_slice

9a5cea6

Flowingsun007 added 3 commits June 10, 2022 04:35

inplace scalar math hnandle non-contiguous input

bfc0419

Merge branch 'master' into turn_on_view_slice

6baa2e0

Merge branch 'master' into turn_on_view_slice

9415033

Flowingsun007 commented Jun 10, 2022

View reviewed changes

Merge branch 'master' into turn_on_view_slice

b7eba6c

Flowingsun007 and others added 2 commits June 15, 2022 12:32

refactor

33789f1

auto format by CI

d953a9b

Flowingsun007 requested review from oneflow-ci-bot and removed request for oneflow-ci-bot June 15, 2022 12:34

Merge branch 'master' into turn_on_view_slice

4bcae33

github-actions bot removed the automerge label Jun 15, 2022

Flowingsun007 requested review from oneflow-ci-bot and removed request for oneflow-ci-bot June 15, 2022 13:39

Merge branch 'master' into turn_on_view_slice

ce93ba5

Flowingsun007 requested review from oneflow-ci-bot and removed request for oneflow-ci-bot June 16, 2022 00:41

Flowingsun007 disabled auto-merge June 16, 2022 00:54

strint approved these changes Jun 16, 2022

View reviewed changes

jackalcooper approved these changes Jun 16, 2022

View reviewed changes

jackalcooper merged commit 362f19c into master Jun 16, 2022

jackalcooper deleted the turn_on_view_slice branch June 16, 2022 05:35

hjchen2 reviewed Jul 29, 2022

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

turn on view slice #8302

turn on view slice #8302

Flowingsun007 commented May 25, 2022 •

edited

Loading

github-actions bot commented May 25, 2022

github-actions bot commented May 25, 2022

github-actions bot commented May 26, 2022

github-actions bot commented May 26, 2022

github-actions bot commented May 26, 2022

github-actions bot commented May 26, 2022

github-actions bot commented May 26, 2022

github-actions bot commented Jun 6, 2022

Flowingsun007 Jun 10, 2022

strint Jun 10, 2022 •

edited

Loading

strint Jun 15, 2022

Flowingsun007 commented Jun 10, 2022 •

edited

Loading

github-actions bot commented Jun 13, 2022

github-actions bot commented Jun 14, 2022

github-actions bot commented Jun 15, 2022

github-actions bot commented Jun 15, 2022

github-actions bot commented Jun 15, 2022

github-actions bot commented Jun 15, 2022

github-actions bot commented Jun 15, 2022

github-actions bot commented Jun 16, 2022

github-actions bot commented Jun 16, 2022

github-actions bot commented Jun 16, 2022

hjchen2 Jul 29, 2022

Flowingsun007 Jul 29, 2022

turn on view slice #8302

turn on view slice #8302

Conversation

Flowingsun007 commented May 25, 2022 • edited Loading

github-actions bot commented May 25, 2022

github-actions bot commented May 25, 2022

github-actions bot commented May 26, 2022

github-actions bot commented May 26, 2022

github-actions bot commented May 26, 2022

github-actions bot commented May 26, 2022

github-actions bot commented May 26, 2022

github-actions bot commented Jun 6, 2022

Flowingsun007 Jun 10, 2022

Choose a reason for hiding this comment

strint Jun 10, 2022 • edited Loading

Choose a reason for hiding this comment

strint Jun 15, 2022

Choose a reason for hiding this comment

Flowingsun007 commented Jun 10, 2022 • edited Loading

github-actions bot commented Jun 13, 2022

github-actions bot commented Jun 14, 2022

github-actions bot commented Jun 15, 2022

github-actions bot commented Jun 15, 2022

github-actions bot commented Jun 15, 2022

github-actions bot commented Jun 15, 2022

github-actions bot commented Jun 15, 2022

github-actions bot commented Jun 16, 2022

github-actions bot commented Jun 16, 2022

github-actions bot commented Jun 16, 2022

hjchen2 Jul 29, 2022

Choose a reason for hiding this comment

Flowingsun007 Jul 29, 2022

Choose a reason for hiding this comment

Flowingsun007 commented May 25, 2022 •

edited

Loading

strint Jun 10, 2022 •

edited

Loading

Flowingsun007 commented Jun 10, 2022 •

edited

Loading