Return infinity for different sbps while is_mutable #8783

Merged 8 commits on Jul 29, 2022

Conversation

Yipeng1994
Contributor

A patch to general basic communication
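
For readers unfamiliar with the idea in the title, here is a minimal sketch of what "return infinity for different sbps while is_mutable" could look like in a cost-based SBP selection. This is not OneFlow's code: `copy_cost`, its parameters, the `base_cost` value, and the string form of the SBP labels are all hypothetical names chosen for illustration. The only behaviour taken from this PR is that a mutable input whose SBP differs from the producer's gets an infinite cost.

```python
import math


def copy_cost(producer_sbp: str, consumer_sbp: str, is_mutable: bool,
              base_cost: float = 1.0) -> float:
    """Hypothetical cost of feeding a producer's tensor into a consumer input."""
    if producer_sbp == consumer_sbp:
        return 0.0  # same SBP on both sides: nothing to transfer
    if is_mutable:
        # Different SBPs on a mutable input: ruled out with an infinite cost
        return math.inf
    return base_cost  # placeholder cost for an ordinary SBP conversion


# Pick the cheapest consumer SBP for a mutable input fed by a broadcast tensor.
candidates = ["S(0)", "B"]
best = min(candidates, key=lambda sbp: copy_cost("B", sbp, is_mutable=True))
print(best)  # B
```

Because any finite cost compares lower than `math.inf`, a selection that minimises cost will avoid the mismatched signature whenever a matching one is available.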

@Yipeng1994 Yipeng1994 added bug graph graph mode labels Jul 28, 2022
@Yipeng1994 Yipeng1994 requested a review from wyg1997 July 28, 2022 09:24
@Yipeng1994 Yipeng1994 requested review from oneflow-ci-bot and removed request for oneflow-ci-bot July 29, 2022 10:25
@github-actions
Contributor

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/8783/

@github-actions
Contributor

Speed stats:
GPU Name: NVIDIA GeForce GTX 1080 

❌ OneFlow resnet50 time: 129.7ms (= 12969.3ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 143.2ms (= 14319.6ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.10 (= 143.2ms / 129.7ms)

OneFlow resnet50 time: 76.1ms (= 7610.0ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 83.6ms (= 8359.4ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.10 (= 83.6ms / 76.1ms)

OneFlow resnet50 time: 49.0ms (= 9803.5ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 59.0ms (= 11790.1ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.20 (= 59.0ms / 49.0ms)

OneFlow resnet50 time: 37.2ms (= 7437.8ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 41.9ms (= 8371.5ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.13 (= 41.9ms / 37.2ms)

OneFlow resnet50 time: 31.9ms (= 6373.3ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 39.1ms (= 7829.4ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.23 (= 39.1ms / 31.9ms)

OneFlow swin dataloader time: 0.274s (= 54.812s / 200, num_workers=1)
PyTorch swin dataloader time: 0.148s (= 29.508s / 200, num_workers=1)
Relative speed: 0.538 (= 0.148s / 0.274s)

OneFlow swin dataloader time: 0.068s (= 13.687s / 200, num_workers=4)
PyTorch swin dataloader time: 0.043s (= 8.526s / 200, num_workers=4)
Relative speed: 0.623 (= 0.043s / 0.068s)

OneFlow swin dataloader time: 0.042s (= 8.384s / 200, num_workers=8)
PyTorch swin dataloader time: 0.023s (= 4.560s / 200, num_workers=8)
Relative speed: 0.544 (= 0.023s / 0.042s)

❌ OneFlow resnet50 time: 145.6ms (= 14561.9ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 177.1ms (= 17705.3ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.22 (= 177.1ms / 145.6ms)

OneFlow resnet50 time: 95.1ms (= 9513.0ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 112.9ms (= 11289.0ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.19 (= 112.9ms / 95.1ms)

OneFlow resnet50 time: 66.8ms (= 13362.9ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 87.4ms (= 17476.1ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.31 (= 87.4ms / 66.8ms)

OneFlow resnet50 time: 55.6ms (= 11116.0ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 80.9ms (= 16173.7ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.45 (= 80.9ms / 55.6ms)

OneFlow resnet50 time: 48.3ms (= 9669.2ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 68.6ms (= 13722.4ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.42 (= 68.6ms / 48.3ms)
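
The figures above can be reproduced with a short calculation. The sketch below is not the CI bot's script; `per_iteration` and `relative_speed` are hypothetical helper names. It assumes the ratio is taken from the unrounded per-iteration times, which matches the reported 0.538 for the swin dataloader (dividing the rounded values 0.148 by 0.274 would give about 0.540).

```python
def per_iteration(total: float, iterations: int) -> float:
    """Mean time of one iteration, in the same unit as `total`."""
    return total / iterations


def relative_speed(pytorch_total: float, oneflow_total: float,
                   iterations: int) -> float:
    """PyTorch mean time divided by OneFlow mean time; above 1 means OneFlow is faster."""
    return (per_iteration(pytorch_total, iterations)
            / per_iteration(oneflow_total, iterations))


# resnet50, input_shape=[16, 3, 224, 224], 100 iterations (totals in ms)
resnet_ratio = relative_speed(14319.6, 12969.3, 100)
# swin dataloader, num_workers=1, 200 iterations (totals in s)
swin_ratio = relative_speed(29.508, 54.812, 200)

print(f"{resnet_ratio:.2f}")  # 1.10
print(f"{swin_ratio:.3f}")    # 0.538
```

Read this way, the resnet50 rows show OneFlow ahead (ratios above 1), while the swin dataloader rows show it behind (ratios below 1).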

@mergify mergify bot merged commit a2e5ba5 into master Jul 29, 2022
@mergify mergify bot deleted the fix-gbc-bug branch July 29, 2022 19:13
Yipeng1994 added a commit that referenced this pull request Aug 8, 2022
* Add RMSLayerNorm Module (#8725)

* add T5LayerNorm for libai

* add docs and test for t5 layernorm

* add docs and refine

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* refactor lazy job instruction policy (#8735)

* refactor lazy job instruction policy

* refine

* refine

* auto format by CI

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* refine qat conv module tests (#8748)

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Yu OuYang <xuanjiuye@gmail.com>

* refine oneflow readme introduction (#8779)

* refine

* refine

* refine

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* remove unused graph resource config API (#8727)

* remove unused api

* delete api gpu device num

* refactor PadFunctor (#8747)

* refactor padfunctor

* refine

* refine

* refine

* refactor touch tensors instruction type (#8774)

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* add SparseSoftmaxCrossEntropyMsGrad op (#8758)

* fix gradient shuffle bug and typo (#8759)

fix bug

Co-authored-by: Juncheng <liujuncheng1022@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* United allocators (#8591)

* ThreadLocalGuard

* implementation hint

* raw impl

* refactor

* rm useless code

* refine

* refactor

* refine

* refactor

* refine

* catch out_of_memory_error

* debug

* debug

* refactor

* VirtualMachineEngine::ForEachStreamWithinDevice

* refactor signature of VirtualMachineEngine::DispatchInstruction

* Dispatch ReleaseTensor instructions as mush as possiable.

* dispatch ReleaseTensor

* revert VirtualMachine::Dispatchable

* rm useless code

* raw impl of release tensor policy

* refine

* refine

* refactor ReleaseTensorInstructionPolicy

* refactor

* rename

* refactor

* refine

Co-authored-by: luyang <flowingsun007@163.com>

* fix t5 layernorm test bug (#8793)

* skip t5_layernorm test

* revert

* fix bug

* refine

* auto format by CI

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

* MLIR sbp dialect attribute for parallel signature (#8492)

* add dev docs

* add todo

* add docs

* add 2d example

* use abbreviation

* add more docs

* update docs

* refine docs

* add naive tests

* basic parsing

* fix order

* rename ods

* add docs

* fix typo

* add assemblyFormat

* sbp dialect

* add sbpdialect.cpp.inc

* remove undefined td item

* add attribute printer parser

* remove sbp attr in oneflow dialect

* precommit

* append sbp dialect to oneflowops.h

* variable enable new sbp attr

* evoid null value and single source of truth

* add basic parse of 1nd

* 2nd support

* 2d sbp signature

* _2d to 2d

* 2d to nd

* dim to sbp

* without mlir parser

* use mlir parse

* round trip is ok

* wrap parse done

* enable parse

* modify readme.md

* filecheck basic_parse (use tempfile package)

* enable unittest 2nd and use tempfile to do filecheck

* enable test script

* rename

* more details in error

* lit check error

* add parse input

* rename as PrintSbpAttrToString

* define get_mlir_from_serialized_job return string

* trim include

* remove commit

* cuda to cpu

* add ConvertJobToIR in pybind11

* refine

* auto format by CI

* serial pb in convertjobtoir

* pub

* auto format by CI

* serialized savejobtoir convertjobtotosair

* push

* ninja c1 done

* auto format by CI

* sbp to SBP

* rename parallel_signature to psig

* auto format by CI

* sbp.[s|b|p] to sbp.[S|B|P]

* Update oneflow/ir/lib/OneFlow/Passes.cpp

Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com>

* rename psig to parallel

* fix

* Update oneflow/ir/include/OneFlow/OneFlowOps.td

Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com>

* auto format by CI

* fix

* fix

* doc update

* Update oneflow/ir/oneflow-translate/lib/OneFlow/MLIROneFlowTranslation.cpp

Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>

* fix

* Update oneflow/ir/oneflow-translate/lib/OneFlow/Importer.cpp

Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>

* fix

* fix

* auto format by CI

* fix

* add exit

* Update oneflow/ir/lib/OneFlow/Passes.cpp

Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com>

* Update oneflow/ir/lib/OneFlow/Passes.cpp

Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com>

* if dyn_cast

* extract function

* sbp importer for linker

* auto format by CI

* not fix

* fix link

* auto format by CI

* fix

* fix

* update oneflow iree version in test

* add sbp::Any

* fix

* minor refactor

* fix segfault

* add

* add

* sort logged job

* add loc

* rm log

* larger tol

* copy

Co-authored-by: yuhao <1171760467@qq.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: yuhao <72971170+howin98@users.noreply.github.com>
Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>

* resolve the bug of using ONEFLOW_PYTHON_BASE_DIR in CMake  (#8792)

* resolve bug

* remove cmake definition

* fix amp pass when lbi2ibns size greater than 1 (#8746)

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* Return infinity for different sbps while is_mutable (#8783)

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* Refactor ep stream types (#8790)

* refactor_ep_stream_types

* remove EpDeviceCtx

* refine ~EpStreamPolicyBase()

* reslove comments

* minor fix

* fix CreateEpBackendAllocator error

* refine

Co-authored-by: Li Xinqi <lixinqi2010@gmail.com>
Co-authored-by: Yu OuYang <xuanjiuye@gmail.com>

* RawReader (#8721)

* RawReader

* direct

* refine test

* format

* error message

* Update oneflow/ir/include/OneFlow/OneFlowUserOps.td

Co-authored-by: guo ran <360112263@qq.com>

* Mut

* refine

* NOLINT

* Mut

* refine

Co-authored-by: guo ran <360112263@qq.com>

* Fix kineto and cupti not found (#8786)

* fix kineto and cupti not found

* fix compiling kineto

* revert kineto version

* fix dynamic_loss_scale_schedule ods and adjust the round trip pass order (#8799)

fix dynamic_loss_scale_schedule ods and adjust the order of ir round trip and dynamic loss scale passes

* refactor auto contiguous and check view inplace operation (#8791)

* Fix pip install failure in release workflow (#8801)

fix

* Dev refactor critical section instruction policy (#8761)

* refactor critical section instruction policy

* refine

* refine

* change unique_ptr to shared_ptr

* naive_instruction_policy

* code format

* add error output info

* code format

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* add isfinite (#8023)

* add isfinite

* fix

* refine docstr

* refine using new template

* fix

* fix

* fix

* fix format error in docstr

* fix static check

* fix

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* Refactor ccl allreduce (#8760)

* refactor_ccl_allreduce

* reslove comment

* move collective_communication/ to oneflow/user/kernels/

* fix static check error

* fix static check error

* refine

* refine

* refine

* use collective_communication namespace

* add UserOpRegistryMgr::IsOpKernelRegistered

* rename CommunicationContext and ccl

* remove CollectiveCommunicationFactory

* refine

* reslove comment and fix static check

* minor fix

* fix static check error

Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* _shutdown_workers does nothing if _utils is freed (#8804)

_shutdown_workers does nothing if _utils is free

Signed-off-by: daquexian <daquexian566@gmail.com>

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* refactor_critical_section_and_lazy_job_stream_type (#8805)

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* mv id_shuffle testcase to expensive dir (#8806)

mv id_shuffle testcase to expensive

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* Fix bug of init_tmp_buffer_ptr in CallContext (#8811)

fix_init_tmp_buffer_ptr_bug_in_call_ctx

* Fix global tensor clone (#8813)

* Modify global tensor clone

* Fix tensor to test

* Fix

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: binbinHan <han_binbin@163.com>

* relax cuda.set_device requirement (#8794)

* relax set_cuda_device requirement

Signed-off-by: daquexian <daquexian566@gmail.com>

* auto format by CI

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

* Remove OfBlob, ForeignXXX kernels and other old code (#8785)

* remove old serving code

Signed-off-by: daquexian <daquexian566@gmail.com>

* remove AddInputOutputOpsPass

Signed-off-by: daquexian <daquexian566@gmail.com>

* remove some old code

Signed-off-by: daquexian <daquexian566@gmail.com>

* remove OfBlob, foreign* kernels and other legacy code

Signed-off-by: daquexian <daquexian566@gmail.com>

* restore GetSerializedCurrentJob

Signed-off-by: daquexian <daquexian566@gmail.com>

* remove Blob in EagerBlobObject

Signed-off-by: daquexian <daquexian566@gmail.com>

* auto format by CI

* completely remove ForeignXXX

Signed-off-by: daquexian <daquexian566@gmail.com>

* auto format by CI

* remove some JobBuildAndInferCtx_* method, rt_mode and hob

Signed-off-by: daquexian <daquexian566@gmail.com>

* remove unused code after merging master

Signed-off-by: daquexian <daquexian566@gmail.com>

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: Li Xinqi <lixinqi2010@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* Broadcast tensors (#8745)

* ThreadLocalGuard

* broadcast_tensors

* address pr comments

* fix static analyzer complaints

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* Remove PhyInstrOperand and InstructionType (#8815)

* Remove PhyInstrOperand and InstructionType

* auto format by CI

Co-authored-by: binbinHan <han_binbin@163.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

* Tmp compute (#8570)

* ThreadLocalGuard

* StreamRole::kTmpCompute

* SoftSyncStream in InstructionsBuilder::TouchTensors

* fix conflicts

* ONEFLOW_AD_PUT_LOSS_ON_TMP_COMPUTE_STREAM

* merge master

* AsyncedDevice2Host

Co-authored-by: binbinHan <han_binbin@163.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* add double grad for slice op (#8784)

* add double grad for scale op

* optimize code path

* auto format by CI

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* scalar math kernel use primitive (#8612)

* scalar math use primitive

* fix

* rm useless code

* add div and fix bug

* broadcast floormod and fmod

* add test

* address review

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* Rename StreamRole to StreamType (#8816)

* Rename StreamRole to StreamType

* rm stream_role.h

* refine define

* refine

Co-authored-by: binbinHan <han_binbin@163.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* Tensor from numpy support stride (#8808)

* from_numpy support stride

* add test case

* refine

* rm printf

* fix comments

* auto format by CI

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

* Dev AdaDelta Optimizer (#8636)

* add adadelta optimizer

* fix bug and add eager unittest

* support Graph Mode

* support fuse update_ops_pass

* Add adadelta docs

* revert

* fix docs

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* Sequentialize add n (#8507)

* ThreadLocalGuard

* sequentialize backward add_n

* sequentialize backward add_n

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Yu OuYang <xuanjiuye@gmail.com>

* Sync vm mode guard (#8212)

* ThreadLocalGuard

* SyncVmModeGuard

* identity_eval

* auto format by CI

* fix static analyzer complaints

* remove identity_eval

* SyncVmMode

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: Yu OuYang <xuanjiuye@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* Fix copy not support broadcast (#8773)

* revert

* revert

* fix comment

* refine test

* auto format by CI

* refine

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* fix get default cpu device (#8752)

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* separate lazy and eager tensor names (#8826)

* Add Cross Feature Interaction in AMP List[OneEmbedding] (#8807)

* Fix eval error

* add cross feature interaction in amp list

* auto format by CI

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* Env var compute on worker thread (#8687)

* ThreadLocalGuard

* refactor ONEFLOW_VM_WORKERLOAD_ON_SCHEDULER_THREAD to ONEFLOW_VM_COMPUTE_ON_WORKER_THREAD

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* Schedule yield (#8796)

* ThreadLocalGuard

* std::this_thread::yield when nothing to do in vm.

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* add conv higher order derivative (#8688)

* add conv higher order derivative

* refine

* refine

* add testcase and refine

* fix bug

* update testcase

* refine

* refine testcase

* refine

* refine

* optimize code path

* auto format by CI

* refine code comment

* fix static analysis initialize error

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

* refine graph lr scheduler test (#8829)

fix graph lr scheduler test

* Fix nn init eye bug (#8825)

* add nn init eye op

* refine

* fix op bug

* refine

* fix docs

* auto format by CI

* auto format by CI

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* Fix binary cross entropy with logits op bug (#8819)

* skip t5_layernorm test

* revert

* fix bug

* refine

* fix binary cross entropy with logits op bug

* revert

* refine

* refine

* refine

* refine

* refine test

* refine

Co-authored-by: mosout <mosout@qq.com>

* Fix build failure when accessing https://docs.python.org/3/objects.inv (#8839)

rm unused

* Primitives check n_dims gt 0 (#8827)

* Default copy eager boxing expr (#8830)

* default_copy_eager_boxing_expr

* minor fix

* Update oneflow/api/python/framework/tensor_functions.cpp

Co-authored-by: Yinggang Wang <wyg19970408@gmail.com>

* Update oneflow/api/python/framework/tensor_functions.cpp

Co-authored-by: Yinggang Wang <wyg19970408@gmail.com>

* auto format by CI

* fix eager broadcast op def bug

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Yinggang Wang <wyg19970408@gmail.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

* Support OneEmbedding in cpp api[OneEmbedding] (#8681)

* Add save interface to save snapshot info

* Add one embedding oneflow api

* fix namespace

* change to use handler

* add kv store option info

* fix compile

* fix

* delete useless test

* fix

* refine one embedding in cpp api

* clean codes

* refine

* use state dict to save

* fix save logic

* fix key error

* Enable load multi one embedding tables

* Remove redundant header file

* add linux limit

* Remove redundant headerfile

Co-authored-by: mosout <mosout@qq.com>

* Stream wait (#8571)

* ThreadLocalGuard

* stream_wait

* Instruction::Prescheduleable

* env var ONEFLOW_VM_ENABLE_STREAM_WAIT

* fix static check error

* fix conflicts

* enable StreamWait

* do not use an object after std::move

* refactor Instruction::Done

* Fix typo in oneflow/core/framework/instructions_builder.cpp

* support stream_wait in AccesBlobByCallback

* put flow._C.stream_touch(buffers) into post_forward_hook

* no event query for StreamWait

* Update oneflow/core/framework/instructions_builder.cpp

Co-authored-by: binbinHan <han_binbin@163.com>

* auto format by CI

* merge master

* include cuda_runtime_api.h

* replace cuda_stream_api.h with cuda_stream.h

* using default flags for cudaStreamWaitEvent

* passing zero to 3rd argument of cudaStreamWaitEvent

* fix complier complaints

* fix bug in StreamWaitInstructionPolicy::InitInstructionStatus

Co-authored-by: binbinHan <han_binbin@163.com>
Co-authored-by: daquexian <daquexian566@gmail.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Yu OuYang <xuanjiuye@gmail.com>

* Refactor ccl all gather and reduce scatter (#8814)

* rename REGISTER_COLLECTIVE_COMMUNICATION_FACTORY to REGISTER_COLLECTIVE_COMMUNICATION

* refactor_ccl_allgather_and_reduce_scatter

* reslove comment

* reslove comments

* fix macro lock error

* fix an idiot error

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* Bump nccl up to 2.13.4 (#8738)

bump nccl up to 2.13

Co-authored-by: Juncheng <liujuncheng1022@gmail.com>

* modify reduce_like_ops.cpp and broadcast_like_op.cpp (#8762)

* modify reduce_like_ops.cpp and broadcast_like_op.cpp

* test(BroadcaseLike): add global test

* auto format by CI

Co-authored-by: wyg1997 <wangyinggang@foxmail.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

* Refactor 1n1d sbp (#8755)

* refactor sbp in 1n1d

* init

* refine

* refine

* refine

* Fix SinkTick op GetSbp and revert some check (#8764)

fix(*): fix SinkTick op GetSbp and revert some check

* refine

* fix static analysis error

* Update oneflow/core/operator/operator.cpp

Co-authored-by: Yipeng Li <jamesonli1313@gmail.com>

* Update oneflow/core/operator/operator.cpp

Co-authored-by: Yipeng Li <jamesonli1313@gmail.com>

* auto format by CI

* refine

* refine

* remove duplicate code

* fix reduce_sum_like infer sbp error

Co-authored-by: Yinggang Wang <wyg19970408@gmail.com>
Co-authored-by: Yipeng Li <jamesonli1313@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

* Prevent benchmark failure (#8860)

rm from entry

* Feat support more tensor setitem (#8741)

* add code by hjchen2

* fix(SetItem): run contiguous slice setiem ok

* add debug code(revert this commit later)

* Support TensorScatterNdUpdate non-contiguous kernel (#8732)

* feat(TensorScatterNdUpdate): support non-contiguous kernel

* refine IsContiguous function for ShapeView input

* refine IsContiguous for shape input

* add TensorScatterNdUpdate test and insure contiguous index

* Remove useless code

* feat(MaskSetItem): support update scalar tensor

* test(MaskSetItem): add test

* Revert "add debug code(revert this commit later)"

This reverts commit 8355bf2.

* remove useless code

* Update oneflow/core/functional/tensor_index.cpp

Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com>

* test(SetItem): add combined indexing setitem test

* fix conflict in tensor_meta.h

* Add check before transpose input tensor in setitem op

* fix(SetItem): fix scalar tensor expand dim and setitem

Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com>

* libai support bfloat16 (#8818)

* bert support bfloat16

* enable_amp add param dtype

* refine

* fuse_cast_scale support bfloat16

* fix build

* fix tidy

* fix build

* fix build

* fix build

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* resnet50 support amp data_type bfloat16 (#8812)

* resnet50 support bfloat16

* enable_amp add param dtype

* fix bug

* address review

* fix 0-size tensor

Co-authored-by: Juncheng <liujuncheng1022@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* fix wrong paths to keep for op repr locations (#8851)

fix wrong paths to keep

* Refactor ccl reduce and broadcast (#8823)

* rename REGISTER_COLLECTIVE_COMMUNICATION_FACTORY to REGISTER_COLLECTIVE_COMMUNICATION

* refactor_ccl_allgather_and_reduce_scatter

* refactor ccl::Reduce

* remove useless code

* refactor ccl::Broadcast

* fix static check error

* reslove comment

* monir fix

* reslove comments

* fix macro lock error

* refine

* fix an idiot error

* fix reduce functor bug

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* fix build for cuda_bf16 (#8862)

fix build

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* remove old serving code (#8781)

* remove old serving code

Signed-off-by: daquexian <daquexian566@gmail.com>

* remove AddInputOutputOpsPass

Signed-off-by: daquexian <daquexian566@gmail.com>

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* add module.requires_grad_ api (#8836)

* add module.requires_grad_ api

* refine

* register l2_normalize double dtype (#8863)

Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Yu OuYang <xuanjiuye@gmail.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: Liang Depeng <liangdepeng@gmail.com>
Co-authored-by: Cijie Xia <cijie.xia@mail.utoronto.ca>
Co-authored-by: Luyang <flowingsun007@163.com>
Co-authored-by: Wang Yi <53533850+marigoold@users.noreply.github.com>
Co-authored-by: guo ran <360112263@qq.com>
Co-authored-by: Juncheng <liujuncheng1022@gmail.com>
Co-authored-by: Li Xinqi <lixinqi2010@gmail.com>
Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com>
Co-authored-by: yuhao <1171760467@qq.com>
Co-authored-by: yuhao <72971170+howin98@users.noreply.github.com>
Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com>
Co-authored-by: binbinHan <han_binbin@163.com>
Co-authored-by: Peihong Liu <mosout@qq.com>
Co-authored-by: liufengwei0103 <2472937968@qq.com>
Co-authored-by: daquexian <daquexian566@gmail.com>
Co-authored-by: Li Xiang <54010254+lixiang007666@users.noreply.github.com>
Co-authored-by: Ping Zhu <58718936+reygu@users.noreply.github.com>
Co-authored-by: ZZK <359521840@qq.com>
Co-authored-by: Shiyuan Shangguan <shiyuan@oneflow.org>
Co-authored-by: Yinggang Wang <wyg19970408@gmail.com>
Co-authored-by: Zhimin Yang <76760002+small1945@users.noreply.github.com>
Co-authored-by: wyg1997 <wangyinggang@foxmail.com>
3 participants