Return infinity for different sbps while is_mutable #8783

Yipeng1994 · 2022-07-28T09:24:42Z

A patch for general basic communication (GBC).
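The diff itself is not shown in this thread, so the following is only a minimal sketch of what the title describes, read literally: when a blob is mutable and the producer and consumer SBP (split / broadcast / partial-sum) signatures differ, the transfer cost is reported as infinity so that the pairing is never selected. The function name `copy_cost`, the string encoding of SBPs, and the `base_cost` parameter are all hypothetical and are not taken from the OneFlow source.

```python
import math


def copy_cost(producer_sbp: str, consumer_sbp: str, is_mutable: bool,
              base_cost: float = 1.0) -> float:
    """Hypothetical cost of moving a blob between two SBP layouts.

    Assumption: SBPs are compared as plain strings such as "S(0)", "B", "P".
    """
    if producer_sbp == consumer_sbp:
        # Same layout on both sides: nothing has to be transferred.
        return 0.0
    if is_mutable:
        # A mutable blob with mismatched layouts is treated as infeasible.
        return math.inf
    # Otherwise some finite (here: placeholder) transfer cost applies.
    return base_cost


print(copy_cost("S(0)", "B", is_mutable=True))   # mismatched and mutable
print(copy_cost("S(0)", "B", is_mutable=False))  # mismatched, not mutable
print(copy_cost("B", "B", is_mutable=True))      # matching layouts
```

Returning infinity rather than raising an error lets a cost-minimizing search simply skip the mismatched candidate; whether the actual patch works this way is an assumption.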

github-actions · 2022-07-29T05:56:53Z

Speed stats:

github-actions · 2022-07-29T08:54:20Z

Speed stats:

github-actions · 2022-07-29T18:23:11Z

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/8783/

github-actions · 2022-07-29T18:25:40Z

Speed stats:

GPU Name: NVIDIA GeForce GTX 1080 

❌ OneFlow resnet50 time: 129.7ms (= 12969.3ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 143.2ms (= 14319.6ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.10 (= 143.2ms / 129.7ms)

OneFlow resnet50 time: 76.1ms (= 7610.0ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 83.6ms (= 8359.4ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.10 (= 83.6ms / 76.1ms)

OneFlow resnet50 time: 49.0ms (= 9803.5ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 59.0ms (= 11790.1ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.20 (= 59.0ms / 49.0ms)

OneFlow resnet50 time: 37.2ms (= 7437.8ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 41.9ms (= 8371.5ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.13 (= 41.9ms / 37.2ms)

OneFlow resnet50 time: 31.9ms (= 6373.3ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 39.1ms (= 7829.4ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.23 (= 39.1ms / 31.9ms)

OneFlow swin dataloader time: 0.274s (= 54.812s / 200, num_workers=1)
PyTorch swin dataloader time: 0.148s (= 29.508s / 200, num_workers=1)
Relative speed: 0.538 (= 0.148s / 0.274s)

OneFlow swin dataloader time: 0.068s (= 13.687s / 200, num_workers=4)
PyTorch swin dataloader time: 0.043s (= 8.526s / 200, num_workers=4)
Relative speed: 0.623 (= 0.043s / 0.068s)

OneFlow swin dataloader time: 0.042s (= 8.384s / 200, num_workers=8)
PyTorch swin dataloader time: 0.023s (= 4.560s / 200, num_workers=8)
Relative speed: 0.544 (= 0.023s / 0.042s)

❌ OneFlow resnet50 time: 145.6ms (= 14561.9ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 177.1ms (= 17705.3ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.22 (= 177.1ms / 145.6ms)

OneFlow resnet50 time: 95.1ms (= 9513.0ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 112.9ms (= 11289.0ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.19 (= 112.9ms / 95.1ms)

OneFlow resnet50 time: 66.8ms (= 13362.9ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 87.4ms (= 17476.1ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.31 (= 87.4ms / 66.8ms)

OneFlow resnet50 time: 55.6ms (= 11116.0ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 80.9ms (= 16173.7ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.45 (= 80.9ms / 55.6ms)

OneFlow resnet50 time: 48.3ms (= 9669.2ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 68.6ms (= 13722.4ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.42 (= 68.6ms / 48.3ms)
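The figures above follow a simple pattern that can be checked by hand: each per-iteration time is the total time divided by the iteration count, and the relative speed is the PyTorch per-iteration time divided by the OneFlow one (so values above 1 mean OneFlow was faster). A small sketch reproducing two of the reported rows; the helper names are illustrative and are not part of the CI script.

```python
def per_iteration(total: float, iterations: int) -> float:
    """Average time per iteration, in the same unit as `total`."""
    return total / iterations


def relative_speed(pytorch_time: float, oneflow_time: float) -> float:
    """PyTorch time over OneFlow time; > 1 means OneFlow is faster."""
    return pytorch_time / oneflow_time


# resnet50, input_shape=[16, 3, 224, 224], totals in ms over 100 iterations
oneflow_ms = per_iteration(12969.3, 100)
pytorch_ms = per_iteration(14319.6, 100)
print(f"{relative_speed(pytorch_ms, oneflow_ms):.2f}")

# swin dataloader, num_workers=1, totals in s over 200 iterations
oneflow_s = per_iteration(54.812, 200)
pytorch_s = per_iteration(29.508, 200)
print(f"{relative_speed(pytorch_s, oneflow_s):.3f}")
```

Note that the dataloader ratio matches the report only when computed from the unrounded totals; dividing the rounded per-iteration values (0.148 / 0.274) gives a slightly different number.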

* Add RMSLayerNorm Module (#8725) * add T5LayerNorm for libai * add docs and test for t5 layernorm * add docs and refine Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * refactor lazy job instruction policy (#8735) * refactor lazy job instruction policy * refine * refine * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * refine qat conv module tests (#8748) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: Yu OuYang <xuanjiuye@gmail.com> * refine oneflow readme introduction (#8779) * refine * refine * refine Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * remove unused graph resource config API (#8727) * remove unused api * delete api gpu device num * refactor PadFunctor (#8747) * refactor padfunctor * refine * refine * refine * refactor touch tensors instruction type (#8774) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * add SparseSoftmaxCrossEntropyMsGrad op (#8758) * fix gradient shuffle bug and typo (#8759) fix bug Co-authored-by: Juncheng <liujuncheng1022@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * United allocators (#8591) * ThreadLocalGuard * implementation hint * raw impl * refactor * rm useless code * refine * refactor * refine * refactor * refine * catch out_of_memory_error * debug * debug * refactor * VirtualMachineEngine::ForEachStreamWithinDevice * refactor signature of VirtualMachineEngine::DispatchInstruction * Dispatch ReleaseTensor instructions as mush as possiable. 
* dispatch ReleaseTensor * revert VirtualMachine::Dispatchable * rm useless code * raw impl of release tensor policy * refine * refine * refactor ReleaseTensorInstructionPolicy * refactor * rename * refactor * refine Co-authored-by: luyang <flowingsun007@163.com> * fix t5 layernorm test bug (#8793) * skip t5_layernorm test * revert * fix bug * refine * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * MLIR sbp dialect attribute for parallel signature (#8492) * add dev docs * add todo * add docs * add 2d example * use abbreviation * add more docs * update docs * refine docs * add naive tests * basic parsing * fix order * rename ods * add docs * fix typo * add assemblyFormat * sbp dialect * add sbpdialect.cpp.inc * remove undefined td item * add attribute printer parser * remove sbp attr in oneflow dialect * precommit * append sbp dialect to oneflowops.h * variable enable new sbp attr * evoid null value and single source of truth * add basic parse of 1nd * 2nd support * 2d sbp signature * _2d to 2d * 2d to nd * dim to sbp * without mlir parser * use mlir parse * round trip is ok * wrap parse done * enable parse * modify readme.md * filecheck basic_parse (use tempfile package) * enable unittest 2nd and use tempfile to do filecheck * enable test script * rename * more details in error * lit check error * add parse input * rename as PrintSbpAttrToString * define get_mlir_from_serialized_job return string * trim include * remove commit * cuda to cpu * add ConvertJobToIR in pybind11 * refine * auto format by CI * serial pb in convertjobtoir * pub * auto format by CI * serialized savejobtoir convertjobtotosair * push * ninja c1 done * auto format by CI * sbp to SBP * rename parallel_signature to psig * auto format by CI * sbp.[s|b|p] to sbp.[S|B|P] * Update oneflow/ir/lib/OneFlow/Passes.cpp Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com> * rename psig to parallel * fix * Update oneflow/ir/include/OneFlow/OneFlowOps.td Co-authored-by: 
Shenghang Tsai <jackalcooper@gmail.com> * auto format by CI * fix * fix * doc update * Update oneflow/ir/oneflow-translate/lib/OneFlow/MLIROneFlowTranslation.cpp Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com> * fix * Update oneflow/ir/oneflow-translate/lib/OneFlow/Importer.cpp Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com> * fix * fix * auto format by CI * fix * add exit * Update oneflow/ir/lib/OneFlow/Passes.cpp Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com> * Update oneflow/ir/lib/OneFlow/Passes.cpp Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com> * if dyn_cast * extract function * sbp importer for linker * auto format by CI * not fix * fix link * auto format by CI * fix * fix * update oneflow iree version in test * add sbp::Any * fix * minor refactor * fix segfault * add * add * sort logged job * add loc * rm log * larger tol * copy Co-authored-by: yuhao <1171760467@qq.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: yuhao <72971170+howin98@users.noreply.github.com> Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com> * resolve the bug of using ONEFLOW_PYTHON_BASE_DIR in CMake (#8792) * resolve bug * remove cmake definition * fix amp pass when lbi2ibns size greater than 1 (#8746) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Return infinity for different sbps while is_mutable (#8783) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Refactor ep stream types (#8790) * refactor_ep_stream_types * remove EpDeviceCtx * refine ~EpStreamPolicyBase() * reslove comments * minor fix * fix CreateEpBackendAllocator error * refine Co-authored-by: Li Xinqi <lixinqi2010@gmail.com> Co-authored-by: Yu OuYang <xuanjiuye@gmail.com> * RawReader (#8721) * RawReader * direct * refine 
test * format * error message * Update oneflow/ir/include/OneFlow/OneFlowUserOps.td Co-authored-by: guo ran <360112263@qq.com> * Mut * refine * NOLINT * Mut * refine Co-authored-by: guo ran <360112263@qq.com> * Fix kineto and cupti not found (#8786) * fix kineto and cupti not found * fix compiling kineto * revert kineto version * fix dynamic_loss_scale_schedule ods and adjust the round trip pass order (#8799) fix dynamic_loss_scale_schedule ods and adjust the order of ir round trip and dynamic loss scale passes * refactor auto contiguous and check view inplace operation (#8791) * Fix pip install failure in release workflow (#8801) fix * Dev refactor critical section instruction policy (#8761) * refactor critical section instruction policy * refine * refine * change unique_ptr to shared_ptr * naive_instruction_policy * code format * add error output info * code format Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * add isfinite (#8023) * add isfinite * fix * refine docstr * refine using new template * fix * fix * fix * fix format error in docstr * fix static check * fix Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Refactor ccl allreduce (#8760) * refactor_ccl_allreduce * reslove comment * move collective_communication/ to oneflow/user/kernels/ * fix static check error * fix static check error * refine * refine * refine * use collective_communication namespace * add UserOpRegistryMgr::IsOpKernelRegistered * rename CommunicationContext and ccl * remove CollectiveCommunicationFactory * refine * reslove comment and fix static check * minor fix * fix static check error Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * _shutdown_workers does nothing if _utils is freed (#8804) _shutdown_workers does nothing if _utils is free Signed-off-by: daquexian <daquexian566@gmail.com> Co-authored-by: mergify[bot] 
<37929162+mergify[bot]@users.noreply.github.com> * refactor_critical_section_and_lazy_job_stream_type (#8805) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * mv id_shuffle testcase to expensive dir (#8806) mv id_shuffle testcase to expensive Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Fix bug of init_tmp_buffer_ptr in CallContext (#8811) fix_init_tmp_buffer_ptr_bug_in_call_ctx * Fix global tensor clone (#8813) * Modify global tensor clone * Fix tensor to test * Fix Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: binbinHan <han_binbin@163.com> * relax cuda.set_device requirement (#8794) * relax set_cuda_device requirement Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * Remove OfBlob, ForeignXXX kernels and other old code (#8785) * remove old serving code Signed-off-by: daquexian <daquexian566@gmail.com> * remove AddInputOutputOpsPass Signed-off-by: daquexian <daquexian566@gmail.com> * remove some old code Signed-off-by: daquexian <daquexian566@gmail.com> * remove OfBlob, foreign* kernels and other legacy code Signed-off-by: daquexian <daquexian566@gmail.com> * restore GetSerializedCurrentJob Signed-off-by: daquexian <daquexian566@gmail.com> * remove Blob in EagerBlobObject Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * completely remove ForeignXXX Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * remove some JobBuildAndInferCtx_* method, rt_mode and hob Signed-off-by: daquexian <daquexian566@gmail.com> * remove unused code after merging master Signed-off-by: daquexian <daquexian566@gmail.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: Li Xinqi <lixinqi2010@gmail.com> Co-authored-by: mergify[bot] 
<37929162+mergify[bot]@users.noreply.github.com> * Broadcast tensors (#8745) * ThreadLocalGuard * broadcast_tensors * address pr comments * fix static analyzer complaints Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Remove PhyInstrOperand and InstructionType (#8815) * Remove PhyInstrOperand and InstructionType * auto format by CI Co-authored-by: binbinHan <han_binbin@163.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * Tmp compute (#8570) * ThreadLocalGuard * StreamRole::kTmpCompute * SoftSyncStream in InstructionsBuilder::TouchTensors * fix conflicts * ONEFLOW_AD_PUT_LOSS_ON_TMP_COMPUTE_STREAM * merge master * AsyncedDevice2Host Co-authored-by: binbinHan <han_binbin@163.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * add double grad for slice op (#8784) * add double grad for scale op * optimize code path * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * scalar math kernel use primitive (#8612) * scalar math use primitive * fix * rm useless code * add div and fix bug * broadcast floormod and fmod * add test * address review Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Rename StreamRole to StreamType (#8816) * Rename StreamRole to StreamType * rm stream_role.h * refine define * refine Co-authored-by: binbinHan <han_binbin@163.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Tensor from numpy support stride (#8808) * from_numpy support stride * add test case * refine * rm printf * fix comments * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * Dev AdaDelta Optimizer (#8636) * add adadelta optimizer * fix bug and add eager unittest * support Graph Mode * support fuse update_ops_pass * Add adadelta docs * revert * fix 
docs Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Sequentialize add n (#8507) * ThreadLocalGuard * sequentialize backward add_n * sequentialize backward add_n Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: Yu OuYang <xuanjiuye@gmail.com> * Sync vm mode guard (#8212) * ThreadLocalGuard * SyncVmModeGuard * identity_eval * auto format by CI * fix static analyzer complaints * remove identity_eval * SyncVmMode Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: Yu OuYang <xuanjiuye@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Fix copy not support broadcast (#8773) * revert * revert * fix comment * refine test * auto format by CI * refine Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix get default cpu device (#8752) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * separate lazy and eager tensor names (#8826) * Add Cross Feature Interaction in AMP List[OneEmbedding] (#8807) * Fix eval error * add cross feature interaction in amp list * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Env var compute on worker thread (#8687) * ThreadLocalGuard * refactor ONEFLOW_VM_WORKERLOAD_ON_SCHEDULER_THREAD to ONEFLOW_VM_COMPUTE_ON_WORKER_THREAD Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Schedule yield (#8796) * ThreadLocalGuard * std::this_thread::yield when nothing to do in vm. 
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * add conv higher order derivative (#8688) * add conv higher order derivative * refine * refine * add testcase and refine * fix bug * update testcase * refine * refine testcase * refine * refine * optimize code path * auto format by CI * refine code comment * fix static analysis initialize error Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * refine graph lr scheduler test (#8829) fix graph lr scheduler test * Fix nn init eye bug (#8825) * add nn init eye op * refine * fix op bug * refine * fix docs * auto format by CI * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Fix binary cross entropy with logits op bug (#8819) * skip t5_layernorm test * revert * fix bug * refine * fix binary cross entropy with logits op bug * revert * refine * refine * refine * refine * refine test * refine Co-authored-by: mosout <mosout@qq.com> * Fix build failure when accessing https://docs.python.org/3/objects.inv (#8839) rm unused * Primitives check n_dims gt 0 (#8827) * Default copy eager boxing expr (#8830) * default_copy_eager_boxing_expr * minor fix * Update oneflow/api/python/framework/tensor_functions.cpp Co-authored-by: Yinggang Wang <wyg19970408@gmail.com> * Update oneflow/api/python/framework/tensor_functions.cpp Co-authored-by: Yinggang Wang <wyg19970408@gmail.com> * auto format by CI * fix eager broadcast op def bug Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: Yinggang Wang <wyg19970408@gmail.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * Support OneEmbedding in cpp api[OneEmbedding] (#8681) * Add save interface to save snapshot info * Add one embedding oneflow api * fix namespace * change to use handler * add kv store option info * fix compile * fix * 
delete useless test * fix * refine one embedding in cpp api * clean codes * refine * use state dict to save * fix save logic * fix key error * Enable load multi one embedding tables * Remove redundant header file * add linux limit * Remove redundant headerfile Co-authored-by: mosout <mosout@qq.com> * Stream wait (#8571) * ThreadLocalGuard * stream_wait * Instruction::Prescheduleable * env var ONEFLOW_VM_ENABLE_STREAM_WAIT * fix static check error * fix conflicts * enable StreamWait * do not use an object after std::move * refactor Instruction::Done * Fix typo in oneflow/core/framework/instructions_builder.cpp * support stream_wait in AccesBlobByCallback * put flow._C.stream_touch(buffers) into post_forward_hook * no event query for StreamWait * Update oneflow/core/framework/instructions_builder.cpp Co-authored-by: binbinHan <han_binbin@163.com> * auto format by CI * merge master * include cuda_runtime_api.h * replace cuda_stream_api.h with cuda_stream.h * using default flags for cudaStreamWaitEvent * passing zero to 3rd argument of cudaStreamWaitEvent * fix complier complaints * fix bug in StreamWaitInstructionPolicy::InitInstructionStatus Co-authored-by: binbinHan <han_binbin@163.com> Co-authored-by: daquexian <daquexian566@gmail.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: Yu OuYang <xuanjiuye@gmail.com> * Refactor ccl all gather and reduce scatter (#8814) * rename REGISTER_COLLECTIVE_COMMUNICATION_FACTORY to REGISTER_COLLECTIVE_COMMUNICATION * refactor_ccl_allgather_and_reduce_scatter * reslove comment * reslove comments * fix macro lock error * fix an idiot error Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Bump nccl up to 2.13.4 (#8738) bump nccl up to 2.13 Co-authored-by: Juncheng <liujuncheng1022@gmail.com> * modify reduce_like_ops.cpp and broadcast_like_op.cpp (#8762) * modify reduce_like_ops.cpp and 
broadcast_like_op.cpp * test(BroadcaseLike): add global test * auto format by CI Co-authored-by: wyg1997 <wangyinggang@foxmail.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * Refactor 1n1d sbp (#8755) * refactor sbp in 1n1d * init * refine * refine * refine * Fix SinkTick op GetSbp and revert some check (#8764) fix(*): fix SinkTick op GetSbp and revert some check * refine * fix static analysis error * Update oneflow/core/operator/operator.cpp Co-authored-by: Yipeng Li <jamesonli1313@gmail.com> * Update oneflow/core/operator/operator.cpp Co-authored-by: Yipeng Li <jamesonli1313@gmail.com> * auto format by CI * refine * refine * remove duplicate code * fix reduce_sum_like infer sbp error Co-authored-by: Yinggang Wang <wyg19970408@gmail.com> Co-authored-by: Yipeng Li <jamesonli1313@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * Prevent benchmark failure (#8860) rm from entry * Feat support more tensor setitem (#8741) * add code by hjchen2 * fix(SetItem): run contiguous slice setiem ok * add debug code(revert this commit later) * Support TensorScatterNdUpdate non-contiguous kernel (#8732) * feat(TensorScatterNdUpdate): support non-contiguous kernel * refine IsContiguous function for ShapeView input * refine IsContiguous for shape input * add TensorScatterNdUpdate test and insure contiguous index * Remove useless code * feat(MaskSetItem): support update scalar tensor * test(MaskSetItem): add test * Revert "add debug code(revert this commit later)" This reverts commit 8355bf2. 
* remove useless code * Update oneflow/core/functional/tensor_index.cpp Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com> * test(SetItem): add combined indexing setitem test * fix conflict in tensor_meta.h * Add check before transpose input tensor in setitem op * fix(SetItem): fix scalar tensor expand dim and setitem Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com> * libai support bfloat16 (#8818) * bert support bfloat16 * enable_amp add param dtype * refine * fuse_cast_scale support bfloat16 * fix build * fix tidy * fix build * fix build * fix build Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * resnet50 support amp data_type bfloat16 (#8812) * resnet50 support bfloat16 * enable_amp add param dtype * fix bug * address review * fix 0-size tensor Co-authored-by: Juncheng <liujuncheng1022@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix wrong paths to keep for op repr locations (#8851) fix wrong paths to keep * Refactor ccl reduce and broadcast (#8823) * rename REGISTER_COLLECTIVE_COMMUNICATION_FACTORY to REGISTER_COLLECTIVE_COMMUNICATION * refactor_ccl_allgather_and_reduce_scatter * refactor ccl::Reduce * remove useless code * refactor ccl::Broadcast * fix static check error * reslove comment * monir fix * reslove comments * fix macro lock error * refine * fix an idiot error * fix reduce functor bug Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix build for cuda_bf16 (#8862) fix build Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * remove old serving code (#8781) * remove old serving code Signed-off-by: daquexian <daquexian566@gmail.com> * remove AddInputOutputOpsPass Signed-off-by: daquexian <daquexian566@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * add module.requires_grad_ api (#8836) * add module.requires_grad_ api * refine * register l2_normalize double 
dtype (#8863) Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: Yu OuYang <xuanjiuye@gmail.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: Liang Depeng <liangdepeng@gmail.com> Co-authored-by: Cijie Xia <cijie.xia@mail.utoronto.ca> Co-authored-by: Luyang <flowingsun007@163.com> Co-authored-by: Wang Yi <53533850+marigoold@users.noreply.github.com> Co-authored-by: guo ran <360112263@qq.com> Co-authored-by: Juncheng <liujuncheng1022@gmail.com> Co-authored-by: Li Xinqi <lixinqi2010@gmail.com> Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com> Co-authored-by: yuhao <1171760467@qq.com> Co-authored-by: yuhao <72971170+howin98@users.noreply.github.com> Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com> Co-authored-by: binbinHan <han_binbin@163.com> Co-authored-by: Peihong Liu <mosout@qq.com> Co-authored-by: liufengwei0103 <2472937968@qq.com> Co-authored-by: daquexian <daquexian566@gmail.com> Co-authored-by: Li Xiang <54010254+lixiang007666@users.noreply.github.com> Co-authored-by: Ping Zhu <58718936+reygu@users.noreply.github.com> Co-authored-by: ZZK <359521840@qq.com> Co-authored-by: Shiyuan Shangguan <shiyuan@oneflow.org> Co-authored-by: Yinggang Wang <wyg19970408@gmail.com> Co-authored-by: Zhimin Yang <76760002+small1945@users.noreply.github.com> Co-authored-by: wyg1997 <wangyinggang@foxmail.com>

Return infinity for different sbps while is_mutable

7e020b7

Yipeng1994 added the bug, graph, and graph mode labels Jul 28, 2022

Yipeng1994 requested a review from wyg1997 July 28, 2022 09:24

Yipeng1994 requested review from chengtbf and strint as code owners July 28, 2022 09:24

wyg1997 approved these changes Jul 28, 2022


strint approved these changes Jul 29, 2022


Yipeng1994 requested a review from oneflow-ci-bot July 29, 2022 02:37

Yipeng1994 added the automerge label Jul 29, 2022

Yipeng1994 and others added 2 commits July 29, 2022 10:38

Merge branch 'master' into fix-gbc-bug

1bc9863

Merge branch 'master' into fix-gbc-bug

b78ee5c

Merge branch 'master' into fix-gbc-bug

30e2db8

Yipeng1994 added 2 commits July 29, 2022 17:15

Merge branch 'master' into fix-gbc-bug

41c2d95

Merge branch 'master' into fix-gbc-bug

e1c232b

Yipeng1994 requested review from oneflow-ci-bot and removed request for oneflow-ci-bot July 29, 2022 10:25

mergify bot added 2 commits July 29, 2022 12:52

Merge branch 'master' into fix-gbc-bug

b715ca5

Merge branch 'master' into fix-gbc-bug

793d9f4

mergify bot merged commit a2e5ba5 into master Jul 29, 2022

mergify bot deleted the fix-gbc-bug branch July 29, 2022 19:13

Yipeng1994 mentioned this pull request Aug 5, 2022

Fix bug after merging master into auto_parallel #8780

Closed
