fix error on exiting #5053

Merged
oneflow-ci-bot merged 7 commits into master from fix_atexit_error on May 31, 2021

daquexian
Contributor

Fix the error raised when Python exits, which was introduced by #5042.

Signed-off-by: daquexian <daquexian566@gmail.com>
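
For readers unfamiliar with exit-time ordering: the branch name (`fix_atexit_error`) suggests the problem involves Python's `atexit` hooks. The sketch below is only a standard-library illustration of how `atexit` orders its handlers; the handler names are hypothetical and this is not the code changed in this PR.

```python
import atexit

calls = []

def release_session():
    # Hypothetical teardown step, named for illustration only.
    calls.append("release_session")

def release_env():
    # Hypothetical teardown step that should outlive the session.
    calls.append("release_env")

# atexit runs handlers in reverse registration order (last in, first out),
# so the resource that must be torn down last is registered first.
atexit.register(release_env)
atexit.register(release_session)
```

Registered this way, `release_session` runs before `release_env` when the interpreter exits; registering them in the opposite order would tear down the environment while the session still depends on it.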
oneflow-ci-bot removed their request for review May 31, 2021 12:55
oneflow-ci-bot self-requested a review May 31, 2021 14:15
oneflow-ci-bot requested review from oneflow-ci-bot and removed request for oneflow-ci-bot May 31, 2021 15:33
oneflow-ci-bot requested review from oneflow-ci-bot and removed request for oneflow-ci-bot May 31, 2021 16:39
oneflow-ci-bot merged commit 5133512 into master May 31, 2021
oneflow-ci-bot deleted the fix_atexit_error branch May 31, 2021 17:23
oneflow-ci-bot added a commit that referenced this pull request Jun 3, 2021
* Add scalar support of greater less module (#4841)

* add scalar input support

* format

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 9b11938c30c90ab8ea17a34a161f913503c254ee

* Refine optimizer (#4840)

* refactor(Optim): refine optimizer codes

* docs(SGD): add document for SGD

* docs(SGD): fix code

* test(Adam): fix test_optim_adam bug

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: cd6ffac6215df1231894d269e1c26f1eeb23b841

* add docstring (#4846)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: da82bb8cb0e7f9da7082d3c646f28679cf9fac3c

* fix eager with unknown symbol id (#4752)

* fix eager with unknown symbol id

* minor fix

* fix conflict

* remove unnecessary function

* remove unnecessary header

* remove unnecessary methods

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: aa9f6f76b3a375b7b34fa54d63ab2f138f580dd4

* Dev fix linear module (#4836)

* add broadcast matmul support

* refine

* add batch matmul support

* remove redundant test case

* linear module support high dimension input

* format

* fix linear

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: f1ccf2a324a0e73b74c20bb8c61585a8c2d3087c

* Add rmsprop optimizer (#4834)

* add rmsprop optimizer

* fix rmsprop optimizer bug

* fix rmsprop optimizer bug

* add rmsprop optimizer docs

* add rmsprop docs

* fix comment

* fix comment

* fix comment

* fix comment

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: f476d48d9934efaf2450a8070305a9dd5e906af8

* add adamw optimizer (#4824)

* init adamw optimizer

* fix adamw optimizer bug

* fix comment

* fix comment

* code format

* fix comment

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: f15a8aea8b57f726405755204369c7937a6cd412

* add experimental apis (#4817)

* add experimental apis

* merge master fix conflict

* revert flow._oneflow_internal.dtype to flow.dtype

* refine

* fix test optimizer

* update module docs

* fix unit tests

* fix matmul module test

* fix adamw and rmsprop tests

* fix crossentropy loss grad

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: c50ff3cc54c05dd1ad6f924388f8f06057ade663

* remove MakeParallelDescByDevice, fix the missing setting of parallel_desc in InstructionMsg copy constructor (#4850)

Signed-off-by: daquexian <daquexian566@gmail.com>
Former-commit-id: d5d7ef56ac50d38502116a3e38d9f07b5aae2900

* add experimental (#4856)



Former-commit-id: 505d4865f714e71b4d8530430f2ef1334128fd1c

* A more efficient implementation of NLL Loss (#4854)

* A more efficient implementation of NLL Loss

* A more efficient implementation of NLL Loss

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: ad5b01290db298b282bf4d8e7cc4162687bf5e19

* Add where module (#4845)

* add broadcast_like module

* add where module, still has bug

* fix where module bug

* fix where module bug

* fix bug and add where module

* fix where module comment

* code format

* fix where module

* code format

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: edf9a1900fc3b0552fdea0101bbbe229937d9b5a

* dev_compare_cfg_file (#4860)

* dev_compare_cfg_file

* add def of org_content

* minor fix

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 5e399d38c0fbb7a07cdc20d11ad60ac3e72318bc

* stateful local opkernel: return a temp parallel ctx (#4857)

* add temp parallel ctx for single card

Signed-off-by: daquexian <daquexian566@gmail.com>

* add TODO comment

Signed-off-by: daquexian <daquexian566@gmail.com>

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: f450ff89fff152471356de3c60285be6bcb7407e

* Optimize memory occupancy for interface 1.0 (#4844)

* Do not save inputs in function nodes even if requires_grad is true.

* Allocate raw memory with actual size.

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: e5dadaf569f9685e401dba9df11ed45e9d80f30d

* use less event records (#4861)

* use less event records

* more comments

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 218f61e3c6ebc84166359771406cd806a4476b5a

* Refine cast grad func. (#4853)



Former-commit-id: dfe7759e1d64ac5394c1da3699c4355cce13a46a

* copy eager blob object to/from numpy in c++, use busy loop to wait (#4839)

* numpy: create np arr in python and copy in c++, use busy loop to wait

Signed-off-by: daquexian <daquexian566@gmail.com>

* reformat

Signed-off-by: daquexian <daquexian566@gmail.com>

* add CopyBetweenMirroredTensorAndNumpy

Signed-off-by: daquexian <daquexian566@gmail.com>

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 87c0c3d8a0a98505133a91f8a8d78f498aa28932

* align squeeze module with torch (#4855)

* align squeeze module with torch

* fix comment

* fix argmax bug

* fix bug

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: a57b764926fd0bcd7be560db1614c1736f7049d8

* Fix adam weight decay param (#4835)

* fix adam weight decay

* fix adam weight decay

* fix comment

* fix comment

* fix comment

* fix comment

* fix comment

* fix bug

* fix(Adam): fix Adam test bug

* revert adam test threshold to 1e-3

* fix(Adam): fix adam test bug and adjust param to increase error

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: wyg1997 <wyg19970408@gmail.com>
Former-commit-id: 1d5f743d67744af6bad3fe5257147e15cdeee502

* Fix groupnorm (#4848)

* fix GroupNorm and modify test case

* add grad op for reshape_like op

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 9bf14b8b2fc785362677292ed3f800458ea0ad38

* use composed attr map in contexts (#4838)

* use composed attr map

Signed-off-by: daquexian <daquexian566@gmail.com>

* move implementation to .cpp

Signed-off-by: daquexian <daquexian566@gmail.com>

* OpExpr::New returns Maybe

Signed-off-by: daquexian <daquexian566@gmail.com>

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 9d78fdbae59fc25e3d0b8fbaa5b40e27e911f6d5

* flow.Size support negative index and add test (#4870)

* feat(PySize): support negative index and add test

* style(*): refine code

* format code

Former-commit-id: 3519d2e7de127abb6d5e74654fb91e9806eea849

* fix export experimental docs bug (#4867)

* fix export experimental docs bug

* fix export experimental docs bug

* fix export experimental docs bug

Co-authored-by: Yao Chi <later@usopp.net>
Former-commit-id: d78c2bc6340aeaa9f1afac432544f68ac15661e2

* reorder VirtualMachine fields (#4873)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 1ce9f262c40dc315357f1cc8d03ee568ae56f2fc

* Align module params with torch (#4865)

* align mean module

* allow negative dim param

* support tuple of negative dim param

* refine

* format

Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 373cefce1c9761ecef79242cab323e5c67836628

* Generate cfg header and source files in parallel and prevent rebuild from scratch when Python version changes (#4876)

* refine

* Update cfg.cmake

* Update cfg.cmake

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 72204213754c32e48e704854eef802380da659ae

* fix interpreter determine output leaf and grad (#4872)

* fix interpreter determine output leaf and grad

* fix GradMode get

* simplify

* add test for no_grad

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: de05e7a561e83fe3dbf0f7fca314787fd4561d0b

* support crossentropy loss 3dim (#4875)

* support crossentropy loss 3dim

* merge conflict

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 4e689d258b193b836a5408648e599df6bf28e4dc

* support nllloss 3dim (#4874)

* support nllloss 3dim

* support nllloss 3dim

* merge conflict

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 30a75cee97eada21ea281fdaf00b467ea0080b68

* Device compute dep object (#4862)

* Device::compute_dep_object_

* sequentialize instructions in the same stream.

* adjust atexit sort

Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: clackhan <han_binbin@163.com>
Former-commit-id: 55a223cfd0c54ac0fda2f0ac647795e90a14625d

* remove cambricon quantization test (#4879)

Signed-off-by: daquexian <daquexian566@gmail.com>

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 9cddc629be2053dd5a51eb7f56672f733d6b390e

* Use dlopen to call ibverbs APIs (#4852)

* check in naive struct

* refine

* refine

* refine

* refine

* add functions

* refine

* refine

* refine

* fmt

* refine

* refine

* refine

* refine

* refine

* refine

* add note

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* rm include

* revert cmakelist changes

* refine

* address review

* rename

* address review

* address review

* remove glog dependency

* fix

* refine

* refine

* print lib path in stdout

* address review

* address review

* fix

* support ONEFLOW_LIBIBVERBS_PATH

* add case

* update init_cluster_env.py for ONEFLOW_LIBIBVERBS_PATH

* fix comment

* address review

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 75f11b8257112c7afd0c777abf7cddc01b6b495c

* Add copy user op (#4842)

* copy user op

* add to module and tensor.to interface

* remove unnecessary code

* backward for tensor.to

* remove capture of input

* support cpu only tensor

* module to (#4858)

* remove backward kernel and op

* handle the case where tensor.grad is None gracefully

* minor fix

* minor fix

* revert

* support 1m1d only

* skip test normalization

* skip test normalization

* skip conv

* support construct device using string

* minor fix

* minor fix

* use maybe

* fix device id type for device infer ctx

* skip batchnorm

* skip some tensor test case

Co-authored-by: Xiaoyu Xu <xiaoyulink@gmail.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 2d5fae50b72583cea8d297b73ef7397b2153f356

* Fix reduce sum grad func. (#4882)

* Fix reduce sum grad func.

* Fix zeros op

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: a40222857621e0161a23a1d6fbcda9db60e477a6

* align dim size function (#4880)

* align dim size function

* fix dim usage

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 4592d467bf570afb606eca2f6ec175a5d283db89

* support expand and repeat op int datatype (#4883)

* support expand and repeat op int datatype

* code format

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 2939e8cbce6b0f0e6573ea524b9c6923b92a3e80

* return LocalTensor (TensorTuple) directly from op expr __call__ (#4864)

* expose local tensor

Signed-off-by: daquexian <daquexian566@gmail.com>

* mt19937 -> minstd_rand

Signed-off-by: daquexian <daquexian566@gmail.com>

* revert unnecessary diff

Signed-off-by: daquexian <daquexian566@gmail.com>

* fix comments

Signed-off-by: daquexian <daquexian566@gmail.com>

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 72b76a2991061fba0b382399f5933031efb7b54d

* Support custom parameters for optimizer (#4881)

* feat(Optim): support custom parameters for optimizer

* feat(Adam): adam support custom parameters

* feat(Adamw): adamw support custom parameters

* feat(RMSprop): rmsprop support custom parameters

* style(Optim): refine adam and adamw

Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
Former-commit-id: a26f7080b866c4be0a606a6c82dffa7975233f32

* align transpose module with pytorch (#4877)

* align transpose module with pytorch

* fix comment

* align transpose module

* support expand and repeat op int datatype

* fix bug

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 2a5bc3594e047f1e6f46d12a5d87934bf14692b7

* Add ones like op (#4889)

* Add ones like op.

Conflicts:
	oneflow/api/foreign_lock_helper.h
	oneflow/api/python/autograd/autograd.cpp
	oneflow/api/python/framework/tensor.cpp
	oneflow/core/framework/op_interpreter/op_interpreter.cpp
	oneflow/core/framework/op_interpreter/op_interpreter_util.cpp
	oneflow/core/framework/tensor.cpp
	oneflow/core/framework/tensor_impl.cpp
	oneflow/core/framework/tensor_impl.h

* Add ones_like unittest.

* Use SwithCase

* Fix typo

* undef

* Bugfix

* Fix merge conflicts

Co-authored-by: hjchen2 <hjchen2>
Co-authored-by: Yinggang Wang <wyg19970408@gmail.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 7b4c8c5016797f070a562144328c877a7f5c36b7

* Cpu support conv module (#4894)

* support expand and repeat op int datatype

* support conv cpu module

* support conv cpu module

* support conv cpu module

* support conv cpu module

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 59fcd0272395e36c1227cd733493cb638fb910b6

* Async cuda stream type (#4895)

* add class AsyncCudaStreamType

* fix bug

* remove useless header file

Co-authored-by: lixinqi <lixinqi0703106@163.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 8784b21dd445f098672d02db9c6b69c72e61cfc3

* skip infer instr when physical operand != nullptr, remove unused code (#4868)

* Disable infer instruction if instruction type has physical operand

Signed-off-by: daquexian <daquexian566@gmail.com>

* remove more infer instructions

Signed-off-by: daquexian <daquexian566@gmail.com>

* raise UNIMPLEMENTED() in infer

Signed-off-by: daquexian <daquexian566@gmail.com>

* fix hanging on exit

Signed-off-by: daquexian <daquexian566@gmail.com>

* reformat

Signed-off-by: daquexian <daquexian566@gmail.com>

* fix typo

Signed-off-by: daquexian <daquexian566@gmail.com>

* wrap results by Tensor() in .to()

Signed-off-by: daquexian <daquexian566@gmail.com>

* set need_check_mem_case to false for copy op

Signed-off-by: daquexian <daquexian566@gmail.com>

Co-authored-by: Li Xinqi <lixinqi2010@gmail.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 3137f51c726e551527a3d2a13e6bac3828dda13a

* fix and check to with module on forward and backward (#4897)

* fix and check to on forward and backward

* add todo

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 585afc097a1cf603b7b75292eb8fff73bff0cedb

* add JUST (#4891)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 98b88e209d6eccdaf08ee89580ba4e1c67081704

* Fix docs bug (#4892)

* support expand and repeat op int datatype

* fix modules docs bug

* fix docs bug

* fix docstring bug

* fix docs bug

* fix docs bug

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: fb164b58ebb86a5cfe388fb692a3626d66589099

* Add warning when no param update (#4896)

* style(Optim): add warning when no param update

* style(Optim): add TODO

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 7e9902aa05ce841da92d4dda3422319124b1b4a7

* Add gather embedding module (#4826)

* add gather module

* add gather module

* add test case

* add embedding module

* fix comments

* update embedding module and test case

* refine

* fix comment

* fix comment

* fix comment

Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
Co-authored-by: BBuf <1182563586@qq.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 986becca17c8285b12028adcf9cecd3eef6d656f

* Support create cpu only tensor (#4863)

* support create cpu tensor

* add empty op

* remove skip tensor test case:

* remove skip tensor test case

* remove TODO

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: dab6ba23172462d0eb3978c48828098104acae51

* Add permute module (#4901)

* support expand and repeat op int datatype

* fix modules docs bug

* fix docs bug

* fix docstring bug

* fix docs bug

* fix docs bug

* add permute module

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 698e8cd516a71fa93003bb8981b7e91a24fb7b27

* remove useless code in expand module test (#4903)



Former-commit-id: ac06d3c82a37681596dbd0620e6ddefe19f31c94

* Add conv docs (#4904)

* remove useless code in expand module test

* add conv2d docs

Former-commit-id: 422ced2780efc30ce6ba9c015c2e32780248564e

* Fix eager test bug (#4678)

* skip test_gpt_data_loader in eager mode

* 1_node_fix_egaer_test_bug

* remove useless header file

* skip tensor and module

* skip 2-D sbp in eager mode

* fix error

* fix bug and remove some skip under eager

* fix error

* del oneflow_api

* rm test_tensor.py

* skip test_summary in eager mode

* skip test_stateful_local_kernel under cpu only mode

* add class AsyncCudaStreamType

* fix bug

* import os

* remove BlobObject::is_python_shutting_down_

* fix error

* skip 2d sbp

* minor fix

* refine comment

* make of_format

Co-authored-by: lixinqi <lixinqi0703106@163.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 425bd439b360088fd742b437e4301ae3841a2b3c

* cache cudnn handle in bn infer (#4906)

Signed-off-by: daquexian <daquexian566@gmail.com>

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 516b3e5e76c0b8b8299997b5c82928a2104215f7

* Fix ones zeros like (#4907)

* feat(xxxLikeOp): ones_like and zeros_like use user op

* fix(Optim): fix learning rate device error bug

* style(*): format codes

* style(*): use int instead of np.int

* test(Optim): add optimizer gpu test (#4908)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 6d7c6d8bcddd846690aa3dc2e70d4c250517ad2e

* Bump nccl from 2.8.3 to v2.9.8 (#4899)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 3294cf2ad5f7e9420125fc5620b614bf086235f3

* Add prelu module (#4902)

* support expand and repeat op int datatype

* add prelu module

* add prelu module

* add prelu module

* fix comments

* fix comment

* fix comment

* add backward test

* fix comment

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 138af6167ea131537d734ddf73bc3ffd568e819c

* add InitEagerSession for eager mode (#4589)

* try to merge eager ofrecord to master branch

* refine

* temp fix

* try to add seed but fails

* try to add seed but fails

* use global function to init mirror/consistent flag

* fix test

* add modules

* fix record modules

* fix destruction order

* fix mirror gen seed

* skip record unit test

* remove TODO

Co-authored-by: daquexian <daquexian566@gmail.com>
Co-authored-by: mosout <mosout@qq.com>
Co-authored-by: Ldpe2G <liangdepeng@gmail.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: a139f49eaff8a00512367abd5f9af65fc6ee786f

* add hardtanh module (#4914)



Former-commit-id: d3c91a97081f9b34756b191992b1006a43270ca5

* fix_matmul_module_test_ci_bug (#4905)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 5c4a0d748eb0bf69ac3f9f3af3d308714b565eee

* Cpu support of batchnorm layernorm module (#4890)

* add rsqrt module

* add batchnorm module cpu support

* refine

* update

* fix param ini

* add reduce series modules

* add batchnorm,layernorm modules and test cases

* refine

* update .rst

* refine according to comments

* refine

* update

* fix layernorm bug

* refine

* remove additional license

Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 9262e1ae6db9f179ef806be53076d15f886cc791

* Add flow.tensor (#4829)

* add flow.tensor

* small fix

* fix dtype

* Update oneflow/python/framework/tensor.py

Co-authored-by: daquexian <daquexian566@gmail.com>

* deal with multi-dimension list or tuple

* remove list

* Update oneflow/python/framework/tensor.py

Co-authored-by: daquexian <daquexian566@gmail.com>

* add unit test case

* format

Co-authored-by: daquexian <daquexian566@gmail.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: c43caa88395fa3dbf587793ab93429cf5a8a26c5

* Feat lr scheduler (#4921)

* feat(LrScheduler): add ConsineScheduler

* feat(LrScheduler): update cosine_scheduler and add test

* feat(LrScheduler): refine codes

* style(*): format codes

* docs(LrScheduler): add document

* docs(LrScheduler): refine documents

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: f7881ce7c500b0f768da559510c649051b784d8b

* Fix CommNetIf::RegisterMemory/UnRegisterMemory lock scope (#4918)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 1a7c26b1252090ce47443025c2108e88c50ff9f9

* add leakyrelu module (#4912)

* add leakyrelu module

* code format

* update docs

* update docs

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 2b74432f3eab93f60bf1f8edfb648d8adf16348f

* Support cast in tensor.to (#4917)

* support cast in to

* refactor to interface

* refine doc

* refine doc

* refine kwargs

* add test case support tensor

* minor fix and test case

* format

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 9fac6f36cbc7767fb705bbb4f0fe11c7c706390d

* add hard swish module (#4915)

* add hard swish module

* fix bug

* fix bug

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 98f937f1ff1b09412e8787e598802a864c557882

* Replace oneflow_worker with worker agent (#4900)

* naive impl

* refine

* refine

* add log

* refine

* refine

* refine

* refine

* add todo

* refine

* refine

* sync dynamic libs

* refine

* fix docker cmd

* fix rank

* refine

* refine

* add callbacks simple rpc

* refine

* refine

* fix

* refine

* refine

* refine

* fix conn

* support traditional mode

* refine

* refine

* refine

* rm

* refine

* refine

* refine

* refine todo

* refine

* refine

* rm unused

* rm todo

* revert

* refine

* add log

* refine

* refine

* fix order

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* rm

* rename

* add comment

* refine

* rm

* refine

* refine

* refine

* refine

* add todo

* add info

* refine

* refine

* refine

* add back some legacy code

* refine

* refine

* refine

* refine

* refine

* rm oneflow_worker exe

* rm log

* fix bug

* support --cmd

* add check

* refine

* fix

* fmt

Former-commit-id: 37c63928dab947b61f5844c68cdf44da9248889c

* add hardsigmoid module (#4919)

* add hardsigmoid module

* refine docs

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 937ccaf3a8cdbbcee71fdd086019444c7aa6591d

* Refactor tensor (#4916)

* refactor Tensor

* Export ConsistentTensor::is_cuda

* minor fix

* minor fix

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 087c9753ba0a2627bab235b0e81685758e580ae2

* add relu6 module (#4925)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: c013fac1f4aad5348e6a5514a7ece4c4ed294b29

* add cuda test for add method(module) (#4888)

* copy user op

* add to module and tensor.to interface

* remove unnecessary code

* backward for tensor.to

* remove capture of input

* support cpu only tensor

* module to (#4858)

* remove backward kernel and op

* handle the case where tensor.grad is None gracefully

* minor fix

* minor fix

* revert

* support 1m1d only

* skip test normalization

* skip test normalization

* skip conv

* support construct device using string

* minor fix

* minor fix

* use maybe

* fix device id type for device infer ctx

* skip batchnorm

* skip some tensor test case

* startup of add backward

* startup of add gpu test

* refine

* add cuda test for Linear module

* refine after sum fixed

* gpu backward

* gpu backward crashed

* retain grad

* refine according to comments of WangYinggang

* refine: construct specified device tensor

* refine testcase

* refine: specify device when constructing in test case

* refine testcase for linear

* refine

* refine to_device

* refine import statement

* refine import path

* remove useless _to_device fun

Co-authored-by: poohRui <yuruil@qq.com>
Co-authored-by: Yurui Li <32978179+poohRui@users.noreply.github.com>
Co-authored-by: Xiaoyu Xu <xiaoyulink@gmail.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 2e268dfc5a82c78d6f4dba35356d657519eff7b5

* add upsample module (#4923)

* add upsample module

* add upsample2d unittest

* add upsample2d unittest

* add docs

* add UpsamplingNearest2d and UpsamplingBilinear2d module

* code format and add docs

* add more unit_test

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 8125b43f683af24868dd83806b6bb3c0d2529ca5

* add elu module (#4924)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 4dfc375a26affbb6ac92ed69b1b56b879cc28ce2

* improve ofrecord unit test (#4920)

* improve ofrecord unit test

* remove no_grad

* fix codes according to review

* fix format

Former-commit-id: 984b1f084b4590770dd8cc404c84f551ab082db9

* align inplace param (#4933)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 8b0bdcdc30125d184894aeaf7cc6ca3ac7871c07

* refactor SGDUpdate and MomentumUpdate UserOp (#4930)

* refactor(ModelUpdate): SGDUpdate and MomentumUpdate use optional input for learning rate

* fix(*): fix bugs

* style(*): refine code

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 49d53fecb10e7ff8d2a97a1f7fd52542d26839c3

* Dev support tensor slice (#4898)

* add scalar input support

* format

* update tensor slice function

* add slice slice_update module

* add slice function in tensor

* refine

* fix tensor slice

* fix bug

* add tensor slice test case

* add logical_slice_assign module

* fix LogicalSliceAssign kernel to support eager local

Signed-off-by: daquexian <daquexian566@gmail.com>

* fix tests

Signed-off-by: daquexian <daquexian566@gmail.com>

* update export strategy

* refine

* fix docs

* add more test case

* fix comments

* refine according to comments

* add TODO item

* format

Co-authored-by: daquexian <daquexian566@gmail.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 4c958482cd62163cab174ca6551119a5a2b8b451

* add tensor.zeros_() and SoftSyncStream instr (#4927)

* add tensor.zeros_() and soft sync stream instr

Signed-off-by: daquexian <daquexian566@gmail.com>

* separate cpu and gpu version of SoftSyncStream

* Remove SyncAutoMemset

* fix compile error

Signed-off-by: daquexian <daquexian566@gmail.com>

* fix wrong parallel_desc()

Signed-off-by: daquexian <daquexian566@gmail.com>

* remove unused code

Signed-off-by: daquexian <daquexian566@gmail.com>

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 0e7f1c6abf1b241f9811a551b5d513c9c8af5fa4

* Only upload log if distributed test fails (#4934)

* only upload log if distributed test fails

* refine

* refine

* reduce timeout

* refine

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 08d62afa85b0b05dea51351d1182b0af9758dc55

* Add logsigmoid softplus module (#4929)

* add logsigmoid and softplus module

* code format

* fix docs

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 0a636c6f72d220f6ab28698342abf2a5bbf53963

* Add module backward test (#4926)

* add scalar input support

* format

* update tensor slice function

* add slice slice_update module

* add slice function in tensor

* refine

* fix tensor slice

* fix bug

* add tensor slice test case

* update softmax testcase

* refine softmax backward

* add logsoftmax backward test

* add maskedfill backward test case

* add sigmoid backward test

* rewrite transpose backward op

* add transpose backward test

* format

* refine

* rm useless code

* Fix transpose unittest.

* refine according to comments

* update

* refine

* refine

* format

* fix backward testcase

* fix perm param

* fix bug

* numpy method to cal sigmoid grad

* format

* refine

* refine comments

Co-authored-by: hjchen2 <chenhoujiangcug@gmail.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 7911d713f2903d8ca73296be7ed964c19b56ed54

* fix flow.save (#4941)

Signed-off-by: daquexian <daquexian566@gmail.com>
Former-commit-id: dbd9d76e34d1acf7e0fb8b35abbbbd6df554b51b

* add math.abs (module)


Former-commit-id: 9670c24a707e01d5445196dc01c145bda792995d

* add math.abs (module)

* fix zero point in fake quantization pass (#4586)

* fix zero point

Signed-off-by: daquexian <daquexian566@gmail.com>

* round zero_point in fake quant kernel to align with onnx

Signed-off-by: daquexian <daquexian566@gmail.com>

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: bc9ec6c4c67e91287b9ddd78d3b9b322fd1c3b75

* also allow ONEFLOW_DEBUG (#4950)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: f65019f2891468b14f8ee6e910441204df77a6cc

* Add Device Descriptor (#4939)

* Add Device Descriptor

* format

* refine

* refine

* check cuda version

* check cuda version

* fix

* fix

* fix WorkSize

* handle more error

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 1cc67da54e801269605be9b0e662c91e0d606a5e

* Refactor Optimizers for eager (#4938)

* refactor(Adam): refactor Adam to use dynamic learning rate

* refactor(Adamw): refactor Adamw Optimizer

* refactor(Rmsprop): refactor Rmsprop to use dynamic learning rate

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 49060956c1a3ab07267996f80be8f380025c9fce

* Add instructions on making sys env permanent (#4949)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 64d3a1d9e47ce5c3a7a31a4ebc4fef48a3608735

* add gradient functions for dim_gather op (#4913)

* start up

* bugs left

* add backward test case

* fix bugs

* refine testcase

* refine

* replace optrait with composedAttrs

* refine

Former-commit-id: e400bc0b527246931c95232509b693870f243ed6

* add gradient funs for unary and binary math op (#4961)

* add gradient funs for unary and binary math op

* add test exp and pow example

* refine pow test case

* refine

* rename register macro

Former-commit-id: a559f632cc59bfe742bb3ca843e2cdb006ee5bfb

* Refactor consistent tensor (#4937)

* refactor Tensor

* Export ConsistentTensor::is_cuda

* remove ConsistentTensor::blob_object

* minor fix

* minor fix

* fix compiler complains

* remove unused code

* skip test_creating_consistent_tensor

* del useless function

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: clackhan <han_binbin@163.com>
Former-commit-id: 662cda36312270b311debbea7a4a614bc4e2cd48

* add backward test case


Former-commit-id: 28353e85d5d1aac17aefbcab9f15d78e9aa3bb0e

* add backward test case

* support tensorrt7 qat (#4958)

* support tensorrt7

* Ignore handling bias

* support label quantization

Former-commit-id: 53881e3918991227812dd2a4808b1b546318a7bf

* modify math.abs test case


Former-commit-id: 1aa043e7221e71f3d049bcc89b8bd673f132b4bc

* modify math.abs test case

* fix softmax testcase (#4948)

* add scalar input support

* format

* fix softmax testcase

* fix logsoftmax test

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 10ca2bc085e0511ac529390f15cd539f8aa1cd3a

* add module backward (#4935)

* add reshape module backward

* add expand module backward

* add expand backward

* add expand module backward

* add expand module test

* add squeeze module backward

* code format

* add repeat module backward

* code format

* fix bug

* fix comment

* align expand module with torch

* align repeat module with torch

* fix comment

* fix comment

* fix comment

* fix code format

* fix conflict

* fix bug

* fix comment

* add module backward

* fix pow bug

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 5362a0d80ad9ecbea094c77879c307dfafe6c906

* rewrite unsqueeze backward (#4966)

* rewrite unsqueeze backward

* fix comments

* fix comment

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: ec1c80ff0246f2e27cb0cfd9860563a78f1105b2

* add module backward (#4942)

* add exp module backward

* add greater test

* add less module test

* add negative module backward

* code format

* add matmul backward

* add broadcast_matmul_backward

* code format

* add batch_matmul backward

* add argmax module test

* delete useless code

* fix comment

* fix comment

* fix comment

* fix comment

* fix comment

* code format

* fix bug

* fix pow bug

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: d20a995378791b102a53680f0661b35c13cae980

* fix activation ci bug (#4980)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 4e974d7beddeea2bf331c441aca0c0ae0beb9b2c

* Hashable attr map (#4951)

* Device::compute_dep_object_

* sequentialize instructions in the same stream.

* refactor AttrMap

* remove redundant header file includes

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: ded8a78095e5e62957454d8fd99982e573c83839

* Fix Global<CommNet>::Delete() (#4981)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 8aade7e74177a12cc49753e4c286e6f2dc521e37

* delete unused headfile


Former-commit-id: 2b3c27a907fbc75c537cabbbaaf5818efb2e2a29

* delete unused headfile

* update test case format


Former-commit-id: 9cda51fb978fe91245d8c40a4d3fc6a10f2dd4dc

* update test case format

* add of_softmax_use_fast_math (#4979)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: ba6c64e7648c8ae69a48240da2d262440521da93

* NetIB device enumeration (#4974)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 3f1038d6467e678fd887a0ba9441332412effaa1

* Fix local tensor requires grad (#4992)

* fix(Tensor): add requires_grad setter for ExportTensor

* test(Tensor): refine tensor autograd test

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 095521826ea9d3d3e0f34318668d4f80bb90e571

* Support localtensor slice (#4985)

* add scalar input support

* format

* register local tensor slice methods

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 4f4ce1f2f0d3d11574a1ea62bdfaedfc181cb57a

* Symbol::shared_from_symbol (#4969)

* Symbol::shared_from_symbol

* fix bug in Symbol::shared_from_symbol

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: a30eb43f4de2694bc65697dbe5304953b06dedbf

* CI formats code automatically (#4983)

* check in

* qq mail

* wrong fmt

* auto format by CI

* use youarefly@qq.com

* wrong fmt

* auto format by CI

* use ci-bot@oneflow.org

* wrong fmt

* auto format by CI

Co-authored-by: oneflow-ci-bot <373331853@qq.com>
Co-authored-by: oneflow-ci-bot <youarefly@qq.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 9cfee9b13b68f72fd35c5c6d1879f7b9fefe7a8a

* Eager consistent tensor (#4984)

* Device::compute_dep_object_

* sequentialize instructions in the same stream.

* refactor AttrMap

* refactor Tensor

* Export ConsistentTensor::is_cuda

* remove ConsistentTensor::blob_object

* refactor TensorImpl

* minor fix

* fix compiler complains

* Implements EagerConsistentTensorImpl::New

* minor fix

* fix compiler complains

* remove unused code

* skip test_creating_consistent_tensor

* backup code

* Symbol::shared_from_symbol

* remove redundant header file includes

* fix bug in Symbol::shared_from_symbol

* symbolize ParallelDesc and ParallelDistribution

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: clackhan <han_binbin@163.com>
Former-commit-id: 3356bcad86103c357e12ce81ff86d757584b819e

* add resnet50 model test (#4957)

* add resnet50 model test

* update script

* update script

* relax tolerant

* run resnet50 in 1n1d

* fix format

* test resnet50 fun parameters

* add resnet50 with and without bn test

* fix resnet50 without bn train overflow

* change assertEqual to assertTrue

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 01278a66fcdaea8130b60bb85a59996d8f487341

* Add arg where module (#4998)

* add argmax test

* add argwhere module

* add argwhere module

* code format

* update unit_test

* fix comment

* update docs

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 816983aae65374300354e86b1fdb0703429003db

* fix hierarchical_sub_task_graph_builder condition (#4990)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 0cc7410469a80fb976a83907c6c05b008ec95f66

* Add __str__ and tolist for tensor (#4928)

* add __str__ and tolist

* initial tensor printer

* reorganized tensor str

* minor fix

* user nparray2string

* Add FunctionNode op_name (#4970)

* feat(FunctionNode): add op_type_name

* style(FunctionNode): rename op_name to op_type_name

* add test case for numel

* style(OpExpr): rename type_name to op_type_name (#4976)

* add test for tensor str

* minor fix

* minor fix

* support for local tensor

* support for local tensor

* format

* fix typo

Co-authored-by: Yinggang Wang <wyg19970408@gmail.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: cf2ec98e9738336bb01254c361ba7b459cc0445a

* Add squeeze module backward (#5007)

* add argmax test

* add squeeze module backward

* fix conflict

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 10c40f2800e9fe3accf948276173eab45fc54657

* RPC backend local supports barrier of barrier_num > 1 (#4968)

* check in changes

* refine

* fix

* erase when barrier exits

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: c337148645740fed43e2da293a4cc054aa184659

* add activation module backward test  (#4967)

* add tanh module backward test

* add gelu module backward

* fix activation ci bug

* add reshape module backward

* add tensor reshape module and code format

* add permute module backward

* fix conflict

* add argmax test

* fix permute module bug

* add prelu module cpu backward

* fix prelu gpu backward bug

* code format

* restruct hardtanh module test

* add hardtanh backward

* add hardswish backward

* add hardsigmoid module backward

* add relu module backward

* add relu6 module backward

* add elu module backward

* fix comments

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: c1873f7472de25b5e87189a9b8d2a83ec74e9741

* Add sys_ptrace for build docker container (#5005)

* add sys_ptrace build docker container

* refine

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 659f59e1207e71d45ee0c5cc419bcf5736600f12

* Refine pythonpath in cmake (#5002)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 7b80f84fa4030cba769aca3ed15f3ad3a1ba61c5

* Refactor scope parallel desc (#4996)

* Symbol::shared_from_symbol

* fix bug in Symbol::shared_from_symbol

* symbolize Scope::GetParallelDesc()

* IsScalarType

* fix compiler complains

* fix bug

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: clackhan <han_binbin@163.com>
Former-commit-id: 0be87a942e7a674a8ead4bc83c54be8c6b76cf5a

* add arange module backward (#4978)

* add arange module backward

* update

* refine

* fix comments

* refine

* fix docs

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 3878aea7b3f9dfaf433e2a4d1df20005452c2236

* CI skips resnet50 to prevent segfault (#5017)

* CI skip resnet50

* fix

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 13b461253345e0d1d39fae0db0fb7a65e204eee8

* NetSocket device enumeration (#4997)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: c38ccebb5d946f34bbfc846b6014548b55878520

* Add device unmatch info (#4989)

* add argmax test

* add device unmatch error

* delete unuse code

* delete unuse code

* refine code

* fix var name error bug

* add prelu exception get

* add prelu exception get

* add prelu exception get

* add prelu exception get

* add exception

* code format

* fix comment

* add more error information

* refine error info

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 8b0905b80d6dfcf4b63c9ee78a4e59f53f76905c

* BindFwBwObaPairs skip parallel_cast (#4986)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 3a8bee332adf4688097c7203eff755ea5cf29c8b

* rewrite matmul op backward (#4988)

* rewrite matmul op backward

* refine

* update

* fix comments

* refine

* refine

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: f06bf705fefaf5b16926adac6ef742a36164bf21

* Add concat module backward (#5013)

* add argmax test

* add concat module backward

* add concat backward impl

* add concat module backward

* fix concat module backward bug

* fix concat module backward bug

* fix concat module backward bug

* add concat module backward

* delete unuse code

* fix comments

* fix comments

* fix comments

* fix comments

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 21f1bddb2eb796167b7ada04a72f51554bdc9bd9

* rewrite dropout backward (#5014)

* rewrite dropout backward

* refine

* fix comments

* auto format by CI

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: bd778bf3821db03c25c1ed87cd884079d5109bbe

* fix has_grad template (#4962)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 1559abd4effcdca85b11d2802b36146f5283e211

* Upload core files optionally (#5020)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 7b5eb78c910afd685fab7579bf342b85e8e4d7b2

* Fix docs bug (#5019)

* add argmax test

* fix oneflow docstring bug

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 1e9c6ff448764db3983561d1e10816019184921b

* Doesn't allow CI to run PRs in parallel (#5016)

* update commit

* fix sha

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: d770a56dd45c86aa312a9961cc0d4e64f4e5a43a

* try fix (#5029)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 5b7c76494687350e8d5974902bc0324dc58867b8

* add cosh module (#4943)

* add cosh module

* fix the calculation of cosh backward

* add testcase of cosh

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 7142ed99b8c03506c6765178c27b9c4aae34e849

* Fix log level (#5009)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 539d6093e1c5682fd56784549a6dcabfb1a1c6fe

* oop oneflow.Model  with training, validation and checkpoint (#4972)

* trainer structure

* add test

* add nnmodel api

* nn Model draft

* try run global_func in Model

* fit to be refined

* model run global_func train & eval

* nn Model for function style execution draft test pass

* refactor nn model

* nn model with necessary component

* format

* rm nn prefix of Model

* flow.Model multi-task numpy-input

* (flow.Model)op_dataload support multi job

* (flow.Model) auto job_func signature for numpy input

* (flow.Model)support auto numpy input job

* (flow.Model) numpy input multi job train test pass

* (flow.Model)fix classmethod

* fix test

* (oneflow.Model)training_step multi output, refine according to pep8

* (oneflow.Model)pep8 check pass by flake8

* pytorch-style module

Signed-off-by: daquexian <daquexian566@gmail.com>

* fix typo, update parameter

Signed-off-by: daquexian <daquexian566@gmail.com>

* Model refine

* Model fix typo

* oneflow.Model optimizer variable lazy get, numpy job signature to DataModule

* oneflow.Model merge and format

* oneflow.Model: comment empty func to be overridden

* Optimizer: lazy get var add check and tips

* oneflow.Model: refactor

* oneflow.Model: refactor 2

* add TODO, remove unused import, set consistent to True in parameter

Signed-off-by: daquexian <daquexian566@gmail.com>

* oneflow.Model: ModelStage -> SubStep, TrainStage -> TrainStep

* fix format

* oneflow.Model: SubStep to SubModel

* oneflow.Model: infer_oneflow_data_placeholder and _infer_job_signature

* set placement of parameter

Signed-off-by: daquexian <daquexian566@gmail.com>

* reformat

Signed-off-by: daquexian <daquexian566@gmail.com>

* add __init__.py in oneflow.python.nn.modules

Signed-off-by: daquexian <daquexian566@gmail.com>

* add todo for GetCurrentJobName()

* fix typo

* oneflow.Model: refine error message

* fix format

* OOPModel: import new Module

* oneflow.Model: rm FunctionConfig in Model

* oneflow.Model config_exe to config_execution

* OOPModel: add and test naive validate

* oneflow.Model: merge module

* Optimizer: user mode to confirm that Optimizer.Variable() is called inside a job

* merge master

* oop model : predict demo

* model inherit new module

* refine

* refine oneflow.mode

* add test oop model

* add test

* no_grad on ones_like

* fix has_grad template

* format

* fix data input

* fix

* check oop model

* add model checkpoint

* rm useless code

Co-authored-by: daquexian <daquexian566@gmail.com>
Co-authored-by: Li Xinqi <lixinqi2010@gmail.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 121b1423ae3f059ca6cb260f067d8047ddf4fa06

* Add upsample module backward (#5025)

* add argmax test

* add upsample module backward

* update upsample unittest

* fix unittest bug

* refine upsample backward

* code format

* fix comment

* fix comment

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 86d4a6723a9a2689a9d357455cd6b9b10376b017

* Prevent CMake from using highest version of python3 (#5034)

* Use conda python if available

* refine

* refine

* refine

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 01e1a7144f66040b9de96ce946dcce31048dd87c

* rewrite slice backward (#5018)

* rewrite slice backward

* remove unuse .h

Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: c1880c011dd453719a28d880abe15e2dab8d0da1

* fix qat (#5038)

* fix qat

* format

* refine

* add comment

* Update test.yml

* Update test_quantize_op.py

* Update test_quantize_op.py

Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com>
Former-commit-id: 2dfb5b566f906b92b93b078b17b8328c19d5eea1

* Support convert tensorbuffer to list of numpy (#4940)

* support convert tensorbuffer to list of numpy

* improve speed

* remove useless codes

* get tensor_buffer shapes and dtypes by single function

* add __eq__ and __hash__ to DType

* add dynamic_out to tensor_buffer_to_list_of_tensors_v2

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 577568af0fa0d98fd2b946831daa99231348e77a

* Add JUST (#5041)



Former-commit-id: 994b0df0e435c9b0a57399b1ed1cfb3a0048bf9d

* Add broadcast like module backward (#5037)

* add argmax test

* add broadcastlike module backward, bug need fixed

* fix broadcast_like backward bug

* auto format by CI

* refine code

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: e7bb21bb4f4b15a986d5b5270e81855514ec765e

* Fix segfault caused by zlib in conda when share lib is enabled (#5045)

* check in changes

* address review

* address review

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 9bec9735683e3adc3d9b0d336f9a88649b0f96bf

* Fix norm grad func to support dynamic attrs. (#5043)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: cc1c554c74346d5b0c2b90bdaea49546168af02d

* Fix segfault in new interface (#5042)

* fix data race about composed attr map

Signed-off-by: daquexian <daquexian566@gmail.com>

* move ResetPrior before ChooseOpKernel

Signed-off-by: daquexian <daquexian566@gmail.com>

* delete vm before others

Signed-off-by: daquexian <daquexian566@gmail.com>

* revert deletion order change, sync by atexit

Signed-off-by: daquexian <daquexian566@gmail.com>

* add comments

Signed-off-by: daquexian <daquexian566@gmail.com>

* rename

Signed-off-by: daquexian <daquexian566@gmail.com>

* fix multi machine bug

Signed-off-by: daquexian <daquexian566@gmail.com>

* auto format by CI

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Former-commit-id: 68a5c10bc3f838b5c72a6e29c4cfa24b861cb1d3

* Add where module backward (#5035)

* add argmax test

* add where module backward

* fix where module unit_test bug

* add zerolike and where op function

* add backward code

* add broadcast like backward

* refine

* fix where module backward bug

* rebuild test

* fix comment

* fix comment

* fix comment

* fix comment

* fix comments

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: dec89a96af3eaddbf78f362e693df5030bd5420f

* Query system status if CI failed (#5052)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 26bbcfcdf64946d5a3e92d860fdcbb806c44a575

* fix(vm): add virtual machine backpressure (#5050)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: cd21df2f36bd3aec79ea4893762e373ae12aed98

* change in_edages/out_edages from SKIPLIST to LIST (#5047)

* change in_edages/out_edages from SKIPLIST to LIST

* minor fix

* refine

Co-authored-by: Li Xinqi <lixinqi2010@gmail.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: b95682624a6b93f276830e28c070a5f6a7c77f4c

* Stop heartbeat and add barrier before  Global<CtrlServer>::Delete() (#5010)

refine

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: e861dd3b26d5b94b478528d5de77f665b8bf2476

* fix error on exiting (#5053)

* fix error on exiting

Signed-off-by: daquexian <daquexian566@gmail.com>

* lazily get rank

Signed-off-by: daquexian <daquexian566@gmail.com>

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 51335128e02356011b1943f0c71ca1ec9fe3963b

* Restore eager r50 case (#5055)



Former-commit-id: 64665e75e702527859ef5b82eafa1caa05d8d229

* Add argmax softplus logsigmoid module backward (#5049)

* add argmax test

* add argmax module backward, bug need fixed

* add leakyrelu module backward

* delete argmax backward test

* delete argmax backward test

* add softplus module backward

* code format

* add softplus module backward

* add logsigmoid module backward

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 06b549b3123df4d889a822bcdee8cf8d2daecf48

* add doctest (#5046)

* add doctest

* refine

* update modules doctest

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 7394ec5ff76ab2ad62d0a257d3f720cd5f3b035c

* refactor DType (#5024)

* refactor DType

* fix compiler complains

* DType is only allowed to be used in python code

* dtype api bugfix

* fix error on exiting

Signed-off-by: daquexian <daquexian566@gmail.com>

* lazily get rank

Signed-off-by: daquexian <daquexian566@gmail.com>

* Export const DType* into python

Co-authored-by: binbinHan <han_binbin@163.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: daquexian <daquexian566@gmail.com>
Former-commit-id: c2b2eb25269679775e650e4ea9bedf96f1be5efc

* remove try_init_session in new interface (#5061)

Signed-off-by: daquexian <daquexian566@gmail.com>

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 912bc97da3d13f9d68cb7b9f3a3f41e766aef89e

* Rewrite batch broadcast matmul backward (#5012)

* rewrite matmul op backward

* refine

* update

* fix comments

* refine

* refine

* rewrite batch broadcast matmul backward

* refine

* refine

* refine

* refine

* Add JUST

* restructure matmul series module backward

* refine

* fix comments

* fix comments

* refine

Co-authored-by: hjchen2 <chenhoujiangcug@gmail.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: c4f277a158fd3435f235dc013fa5d9b015401708

* modify format


Former-commit-id: 5bdb45adbb0881263ef5fe0afe13dd606a8e6ff6

* modify format

* run make of_format


Former-commit-id: e976b9b0f88bc2e91fddb3053dab5bd5001e8318

* run make of_format

* lock cmake version in manylinux cmake (#5057)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: fe662377fc70f23d9ff2f495fc48774770c802fe

* Doctest support in CI (#4973)

* check in changes

* refine

* fix

* add test on obj

* add relu example

* run doctest in ci

* dont delete python

* address review

* address review

* address review

* address review

* address review

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 56f6348d04bb9f24a1eb8e1c4f8d82d8c4f97f89

* Update README for 0.4.0 (#4965)

* refine

* remove content

* refine

* require py36

* address review

* refine

* address review

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: cc3162062b30228a4d335e9be66b08e701042609

* Add step lr and lambda lr (#5063)

* feat(StepLR): add StepLR

* feat(LambdaLR): add LambdaLR

* docs(LambdaLR): fix document

* style(LambdaLR): add comment

Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 4c005b7c673892693b9b5541281d6d3f254d8a8a

* Release cuda 112 (#5060)

* Nightly for cu112

* add arg

* nightly

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 72cfa9916e040433426b5e6cef1752e394c12a3d

* add flow.asin, flow.Tensor.asin, flow.arcsin, flow.Tensor.arcsin, flow.asinh, flow.Tensor.asinh,  flow.arcsinh, flow.Tensor.arcsinh (#4955)

* add flow.asin and  torch.arcsin

* add torch.asin and torch.arcsin

* add torch.sin and torch.arcsin

* add torch.asin and torch.arcsin

* add torch.asin and torch.arcsin

* add torch.asin and torch.arcsin

* add torch.asin and torch.arcsin

* add torch.asin and torch.arcsin

* update test_asin.py including forward and backward

* Update test_math_ops.py

remove asin testcase

* update testcase including forward and backward

* add torch.asinh and torch.Tensor.asinh

* update testcase of asin and asinh

* update testcase of asin and asinh

* update testcase of asin and asinh

* update testcase

* update testcase

* make format

* update license

* update testcase

* check in

* qq mail

* wrong fmt

* auto format by CI

* use youarefly@qq.com

* wrong fmt

* auto format by CI

* use ci-bot@oneflow.org

* wrong fmt

* auto format by CI

* mv op testcase  to test_tensor.py

* auto format by CI

* update docstring

* update docstring

* update doctest

* update doctest

* auto format by CI

* update arcsinh

* auto format by CI

Co-authored-by: 陈岱渊 <chendy@zhejianglab.com>
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
Co-authored-by: jackalcooper <jackalcooper@gmail.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <373331853@qq.com>
Co-authored-by: oneflow-ci-bot <youarefly@qq.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Former-commit-id: 95748870b6cfa617a7da31ad738098a999cc68f6

* Add modules doctest bbuf (#5058)

* add argmax test

* add nllloss doctest

* add doctest

* add crossentropyloss doctest

* add expand module doctest

* add squeeze module doctest

* add repeat module doctest

* add exp module doctest

* add argmax module doctest

* add matmul module doctest

* add greater module doctest

* add less module doctest

* add negative  module doctest

* add linear  module doctest

* add tanh module doctest

* add gelu module doctest

* add reshape module doctest

* add transpose module doctest

* add where  module doctest

* add permute  module doctest

* add prelu  module doctest

* add hardtanh  module doctest

* add activation  module doctest

* add activation  module doctest

* add upsample  module doctest

* auto format by CI

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 58183f8713dd50d332d82ff481c3f0dd3bd8f8b5

* remove origin testCase in test_math_ops


Former-commit-id: 3a4bae704e2698225b15734fb440a6f8ddef5120

* remove origin testCase in test_math_ops

* Add tensor detach python api (#5068)

* add argmax test

* add tensor detach python api

* delete unuse code

Former-commit-id: 9b30b7c92d5f866f7d8fa6863c654528e7b34e95

* Delete preprocessor_internal.h.REMOVED.git-id

* Delete nn_ops.py.REMOVED.git-id

Co-authored-by: Lyon <flowingsun007@163.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: Yinggang Wang <wyg19970408@gmail.com>
Co-authored-by: Yurui Li <32978179+poohRui@users.noreply.github.com>
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
Co-authored-by: Liang Depeng <liangdepeng@gmail.com>
Co-authored-by: daquexian <daquexian566@gmail.com>
Co-authored-by: binbinHan <han_binbin@163.com>
Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com>
Co-authored-by: Li Xinqi <lixinqi2010@gmail.com>
Co-authored-by: Shijie <821898965@qq.com>
Co-authored-by: Yao Chi <later@usopp.net>
Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com>
Co-authored-by: Xiaoyu Xu <xiaoyulink@gmail.com>
Co-authored-by: lixinqi <lixinqi0703106@163.com>
Co-authored-by: BBuf <1182563586@qq.com>
Co-authored-by: Juncheng <liujuncheng1022@gmail.com>
Co-authored-by: mosout <mosout@qq.com>
Co-authored-by: poohRui <yuruil@qq.com>
Co-authored-by: guo ran <360112263@qq.com>
Co-authored-by: oneflow-ci-bot <373331853@qq.com>
Co-authored-by: oneflow-ci-bot <youarefly@qq.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: YongtaoShi <73167956+YongtaoShi@users.noreply.github.com>
Co-authored-by: yayeoCddy <Dy_Chen95@163.com>
Co-authored-by: 陈岱渊 <chendy@zhejianglab.com>
liujuncheng pushed a commit that referenced this pull request Jun 3, 2021
* fix error on exiting

Signed-off-by: daquexian <daquexian566@gmail.com>

* lazily get rank

Signed-off-by: daquexian <daquexian566@gmail.com>

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 5133512