fix error on exiting #5053

daquexian · 2021-05-31T11:38:01Z

Fix the error raised when Python exits, introduced by #5042.

Signed-off-by: daquexian <daquexian566@gmail.com>

* Add scalar support of greater less module (#4841) * add scalar input support * format Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 9b11938c30c90ab8ea17a34a161f913503c254ee * Refine optimizer (#4840) * refactor(Optim): refine optimizer codes * docs(SGD): add document for SGD * docs(SGD): fix code * test(Adam): fix test_optim_adam bug Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: cd6ffac6215df1231894d269e1c26f1eeb23b841 * add docstring (#4846) Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: da82bb8cb0e7f9da7082d3c646f28679cf9fac3c * fix eager with unknow symbol id (#4752) * fix eager with unknow symbol id * minor fix * fix conflict * remove unnnecessary function * remove unnecessary header * remove unnecessary methods Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: aa9f6f76b3a375b7b34fa54d63ab2f138f580dd4 * Dev fix linear module (#4836) * add broadcast matmul support * refine * add batch matmul support * remove redundant test case * linear module support high dimension input * format * fix linear Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: f1ccf2a324a0e73b74c20bb8c61585a8c2d3087c * Add rmsprop optimizer (#4834) * add rmsprop optimizer * fix rmsprop optimizer bug * fix rmsprop optimizer bug * add rmsprop optimizer docs * add rmsprop docs * fix comment * fix comment * fix comment * fix comment Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: f476d48d9934efaf2450a8070305a9dd5e906af8 * add adamw optimizer (#4824) * init adamw optimizer * fix adamw optimizer bug * fix comment * fix comment * code format * fix comment Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: f15a8aea8b57f726405755204369c7937a6cd412 
* add experimental apis (#4817) * add experimental apis * merge master fix conflict * revert flow._oneflow_internal.dtype to flow.dtype * refine * fix test optimizer * update module docs * fix unit tests * fix matmul module test * fix adamw and rmsprop tests * fix crossentropy loss grad Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: c50ff3cc54c05dd1ad6f924388f8f06057ade663 * remove MakeParallelDescByDevice, fix the missing setting of parallel_desc in InstructionMsg copy constructor (#4850) Signed-off-by: daquexian <daquexian566@gmail.com> Former-commit-id: d5d7ef56ac50d38502116a3e38d9f07b5aae2900 * add experimental (#4856) Former-commit-id: 505d4865f714e71b4d8530430f2ef1334128fd1c * A more efficient implementation of NLL Loss (#4854) * A more efficient implementation of NLL Loss * A more efficient implementation of NLL Loss Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: ad5b01290db298b282bf4d8e7cc4162687bf5e19 * Add where module (#4845) * add broadcast_like module * add where module, still has bug * fix where module bug * fix where module bug * fix bug and add where module * fix where module commnet * code format * fix where module * code format Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: edf9a1900fc3b0552fdea0101bbbe229937d9b5a * dev_compare_cfg_file (#4860) * dev_compare_cfg_file * add def of org_content * minor fix Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 5e399d38c0fbb7a07cdc20d11ad60ac3e72318bc * stateful local opkernel: return a temp parallel ctx (#4857) * add temp parallel ctx for single card Signed-off-by: daquexian <daquexian566@gmail.com> * add TODO comment Signed-off-by: daquexian <daquexian566@gmail.com> Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 
f450ff89fff152471356de3c60285be6bcb7407e * Optimize memory occupancy for interface 1.0 (#4844) * Do not save inputs in function nodes even if requires_grad is true. * Allocate raw memory with actual size. Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: e5dadaf569f9685e401dba9df11ed45e9d80f30d * use less event records (#4861) * use less event records * more comments Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 218f61e3c6ebc84166359771406cd806a4476b5a * Refine cast grad func. (#4853) Former-commit-id: dfe7759e1d64ac5394c1da3699c4355cce13a46a * copy eager blob object to/from numpy in c++, use busy loop to wait (#4839) * numpy: create np arr in python and copy in c++, use busy loop to wait Signed-off-by: daquexian <daquexian566@gmail.com> * reformat Signed-off-by: daquexian <daquexian566@gmail.com> * add CopyBetweenMirroredTensorAndNumpy Signed-off-by: daquexian <daquexian566@gmail.com> Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 87c0c3d8a0a98505133a91f8a8d78f498aa28932 * align squeeze module with torch (#4855) * align squeeze module with torch * fix comment * fix argmax bug * fix bug Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: a57b764926fd0bcd7be560db1614c1736f7049d8 * Fix adam weight decay param (#4835) * fix adam weight decay * fix adam weight decay * fix comment * fix comment * fix commnet * fix commnet * fix commnet * fix bug * fix(Adam): fix Adam test bug * revert adam test threshold to 1e-3 * fix(Adam): fix adam test bug and adjust param to increase error Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Co-authored-by: wyg1997 <wyg19970408@gmail.com> Former-commit-id: 1d5f743d67744af6bad3fe5257147e15cdeee502 * Fix groupnorm (#4848) * fix GroupNorm and modify test case * add grad op for reshape_like op 
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 9bf14b8b2fc785362677292ed3f800458ea0ad38 * use composed attr map in contexts (#4838) * use composed attr map Signed-off-by: daquexian <daquexian566@gmail.com> * move implementation to .cpp Signed-off-by: daquexian <daquexian566@gmail.com> * OpExpr::New returns Maybe Signed-off-by: daquexian <daquexian566@gmail.com> Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 9d78fdbae59fc25e3d0b8fbaa5b40e27e911f6d5 * flow.Size support negative index and add test (#4870) * feat(PySize): support negative index and add test * style(*): refine code * format code Former-commit-id: 3519d2e7de127abb6d5e74654fb91e9806eea849 * fix export experimental docs bug (#4867) * fix export experimental docs bug * fix export experimental docs bug * fix export experimental docs bug Co-authored-by: Yao Chi <later@usopp.net> Former-commit-id: d78c2bc6340aeaa9f1afac432544f68ac15661e2 * reorder VirtualMachine fields (#4873) Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 1ce9f262c40dc315357f1cc8d03ee568ae56f2fc * Align module params with torch (#4865) * align mean module * allow negative dim param * support tuple of negative dim param * refine * format Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 373cefce1c9761ecef79242cab323e5c67836628 * Generate cfg header and source files in parallel and prevent rebuild from scratch when Python version changes (#4876) * refine * Update cfg.cmake * Update cfg.cmake Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 72204213754c32e48e704854eef802380da659ae * fix interpreter determin output leaf and grad (#4872) * fix interpreter determin output leaf and grad * fix GradMode get * simplify * add 
test for no_grad Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: de05e7a561e83fe3dbf0f7fca314787fd4561d0b * support crossentropy loss 3dim (#4875) * support crossentropy loss 3dim * merge conflict Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 4e689d258b193b836a5408648e599df6bf28e4dc * supoort nllloss 3dim (#4874) * supoort nllloss 3dim * supoort nllloss 3dim * merge conflict Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 30a75cee97eada21ea281fdaf00b467ea0080b68 * Device compute dep object (#4862) * Device::compute_dep_object_ * sequantialize instructions in the same stream. * adjust atexit sort Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com> Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Co-authored-by: clackhan <han_binbin@163.com> Former-commit-id: 55a223cfd0c54ac0fda2f0ac647795e90a14625d * remove cambricon quantization test (#4879) Signed-off-by: daquexian <daquexian566@gmail.com> Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 9cddc629be2053dd5a51eb7f56672f733d6b390e * Use dlopen to call ibverbs APIs (#4852) * check in naive struct * refine * refine * refine * refine * add functions * refine * refine * refine * fmt * refine * refine * refine * refine * refine * refine * add note * refine * refine * refine * refine * refine * refine * refine * refine * rm include * revert cmakelist changes * refine * address review * rename * address review * address review * remove glog dependency * fix * refine * refine * print lib path in stdout * address review * address review * fix * support ONEFLOW_LIBIBVERBS_PATH * add case * update init_cluster_env.py for ONEFLOW_LIBIBVERBS_PATH * fix comment * address review Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 
75f11b8257112c7afd0c777abf7cddc01b6b495c * Add copy user op (#4842) * copy user op * add to module and tensor.to interface * remove unnecessary code * backward for tensor.to * remove capture of input * support cpu only tensor * module to (#4858) * remove backward kernel and op * friendly deal with when tensor.grad is None * minor fix * minor fix * revert * suport 1m1d only * skip test normalization * skip test normalization * skip conv * support construct device using string * minor fix * minor fix * use maybe * fix device id type for device infer ctx * skip batchnorm * skip some tensor test case Co-authored-by: Xiaoyu Xu <xiaoyulink@gmail.com> Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 2d5fae50b72583cea8d297b73ef7397b2153f356 * Fix reduce sum grad func. (#4882) * Fix reduce sum grad func. * Fix zeros op Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: a40222857621e0161a23a1d6fbcda9db60e477a6 * align dim size funtion (#4880) * align dim size funtion * fix dim usage Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 4592d467bf570afb606eca2f6ec175a5d283db89 * support expand and repeat op int datatype (#4883) * support expand and repeat op int datatype * code format Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 2939e8cbce6b0f0e6573ea524b9c6923b92a3e80 * return LocalTensor (TensorTuple) directly from op expr __call__ (#4864) * expose local tensor Signed-off-by: daquexian <daquexian566@gmail.com> * mt19937 -> minstd_rand Signed-off-by: daquexian <daquexian566@gmail.com> * revert unnecessary diff Signed-off-by: daquexian <daquexian566@gmail.com> * fix comments Signed-off-by: daquexian <daquexian566@gmail.com> Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 72b76a2991061fba0b382399f5933031efb7b54d * Support custom 
parameters for optimizer (#4881) * feat(Optim): support custom parameters for optimizer * feat(Adam): adam support custom parameters * feat(Adamw): adamw support custom parameters * feat(RMSprop): rmsprop support custom parameters * style(Optim): refine adam and adamw Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com> Former-commit-id: a26f7080b866c4be0a606a6c82dffa7975233f32 * align transpose module with pytorch (#4877) * align transpose module with pytorch * fix comment * align tranpose module * support expand and repeat op int datatype * fix bug Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 2a5bc3594e047f1e6f46d12a5d87934bf14692b7 * Add ones like op (#4889) * Add ones like op. Conflicts: oneflow/api/foreign_lock_helper.h oneflow/api/python/autograd/autograd.cpp oneflow/api/python/framework/tensor.cpp oneflow/core/framework/op_interpreter/op_interpreter.cpp oneflow/core/framework/op_interpreter/op_interpreter_util.cpp oneflow/core/framework/tensor.cpp oneflow/core/framework/tensor_impl.cpp oneflow/core/framework/tensor_impl.h * Add ones_like unittest. 
* Use SwithCase * Fix typo * undef * Bugfix * Fix merge conflicits Co-authored-by: hjchen2 <hjchen2> Co-authored-by: Yinggang Wang <wyg19970408@gmail.com> Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 7b4c8c5016797f070a562144328c877a7f5c36b7 * Cpu support conv module (#4894) * support expand and repeat op int datatype * support conv cpu module * support conv cpu module * support conv cpu module * support conv cpu module Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 59fcd0272395e36c1227cd733493cb638fb910b6 * Async cuda stream type (#4895) * add class AsyncCudaStreamType * fix bug * remove useless headfile Co-authored-by: lixinqi <lixinqi0703106@163.com> Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 8784b21dd445f098672d02db9c6b69c72e61cfc3 * skip infer instr when physical operand != nullptr, remove unused code (#4868) * Disable infer instruction if instruction type has physical operand Signed-off-by: daquexian <daquexian566@gmail.com> * remove more infer instructions Signed-off-by: daquexian <daquexian566@gmail.com> * raise UNIMPLEMENTED() in infer Signed-off-by: daquexian <daquexian566@gmail.com> * fix hanging on exit Signed-off-by: daquexian <daquexian566@gmail.com> * reformat Signed-off-by: daquexian <daquexian566@gmail.com> * fix typo Signed-off-by: daquexian <daquexian566@gmail.com> * wrap results by Tensor() in .to() Signed-off-by: daquexian <daquexian566@gmail.com> * set need_check_mem_case to false for copy op Signed-off-by: daquexian <daquexian566@gmail.com> Co-authored-by: Li Xinqi <lixinqi2010@gmail.com> Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 3137f51c726e551527a3d2a13e6bac3828dda13a * fix and check to with module on forwad and backward (#4897) * fix and check to on forwad and backward * add todo Co-authored-by: oneflow-ci-bot 
<69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 585afc097a1cf603b7b75292eb8fff73bff0cedb * add JUST (#4891) Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 98b88e209d6eccdaf08ee89580ba4e1c67081704 * Fix docs bug (#4892) * support expand and repeat op int datatype * fix modules docs bug * fix docs bug * fix docstring bug * fix docs bug * fix docs bug Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: fb164b58ebb86a5cfe388fb692a3626d66589099 * Add warning when no param update (#4896) * style(Optim): add warning when no param update * style(Optim): add TODO Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 7e9902aa05ce841da92d4dda3422319124b1b4a7 * Add gather embedding module (#4826) * add gather module * add gather module * add test case * add embedding module * fix comments * update embedding module and test case * refine * fix comment * fix comment * fix comment Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com> Co-authored-by: BBuf <1182563586@qq.com> Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 986becca17c8285b12028adcf9cecd3eef6d656f * Support create cpu only tensor (#4863) * support create cpu tensor * add empty op * remove skip tensor test case: * remove skip tensor test case * remove TODO Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: dab6ba23172462d0eb3978c48828098104acae51 * Add permute module (#4901) * support expand and repeat op int datatype * fix modules docs bug * fix docs bug * fix docstring bug * fix docs bug * fix docs bug * add permute module Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 698e8cd516a71fa93003bb8981b7e91a24fb7b27 * remove useless code in expand module test (#4903) Former-commit-id: 
ac06d3c82a37681596dbd0620e6ddefe19f31c94 * Add conv docs (#4904) * remove useless code in expand module test * add conv2d docs Former-commit-id: 422ced2780efc30ce6ba9c015c2e32780248564e * Fix eager test bug (#4678) * skip test_gpt_data_loader in eager mode * 1_node_fix_egaer_test_bug * remove useless head file * skip tensor and module * skip 2-D sbp in eager mode * fix error * fix bug and remove some skip under eager * fix error * del oneflow_api * rm test_tensor.py * skip test_summary in eager mode * skip test_stateful_local_kernel under cpu only mode * add class AsyncCudaStreamType * fix bug * import os * remove BlobObject::is_python_shutting_down_ * fix error * sikp 2d sbp * minor fix * refine comment * make of_format Co-authored-by: lixinqi <lixinqi0703106@163.com> Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 425bd439b360088fd742b437e4301ae3841a2b3c * cache cudnn handle in bn infer (#4906) Signed-off-by: daquexian <daquexian566@gmail.com> Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 516b3e5e76c0b8b8299997b5c82928a2104215f7 * Fix ones zeros like (#4907) * feat(xxxLikeOp): ones_like and zeros_like use user op * fix(Optim): fix learning rate device error bug * style(*): format codes * style(*): use int instead of np.int * test(Optim): add optimizer gpu test (#4908) Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 6d7c6d8bcddd846690aa3dc2e70d4c250517ad2e * Bump nccl from 2.8.3 to v2.9.8 (#4899) Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 3294cf2ad5f7e9420125fc5620b614bf086235f3 * Add prelu module (#4902) * support expand and repeat op int datatype * add prelu module * add prelu module * add prelu module * fix comments * fix comment * fix comment * add backward test * fix comment Co-authored-by: oneflow-ci-bot 
<69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 138af6167ea131537d734ddf73bc3ffd568e819c * add InitEagerSession for eager mode (#4589) * try to merge eager ofrecord to master branch * refine * temp fix * try to add seed but fails * try to add seed but failsclear * use global function to init mirror/conssitent flag * fix test * add modules * fix record modules * fix destruction order * fix mirror gen seed * skip record unit test * remove TODO Co-authored-by: daquexian <daquexian566@gmail.com> Co-authored-by: mosout <mosout@qq.com> Co-authored-by: Ldpe2G <liangdepeng@gmail.com> Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: a139f49eaff8a00512367abd5f9af65fc6ee786f * add hardtanh module (#4914) Former-commit-id: d3c91a97081f9b34756b191992b1006a43270ca5 * fix_matmul_module_test_ci_bug (#4905) Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 5c4a0d748eb0bf69ac3f9f3af3d308714b565eee * Cpu support of batchnorm layernorm module (#4890) * add rsqrt moduel * add batchnorm module cpu support * refine * update * fix param ini * add reduce series modules * add batchnorm,layernorm modules and test cases * refine * update .rst * refine according to comments * refine * update * fix layernorm bug * refine * remove additional license Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 9262e1ae6db9f179ef806be53076d15f886cc791 * Add flow.tensor (#4829) * add flow.tensor * small fix * fix dtype * Update oneflow/python/framework/tensor.py Co-authored-by: daquexian <daquexian566@gmail.com> * deal with multi-dimension list or tuple * remove list * Update oneflow/python/framework/tensor.py Co-authored-by: daquexian <daquexian566@gmail.com> * add unit test case * format Co-authored-by: daquexian <daquexian566@gmail.com> Co-authored-by: oneflow-ci-bot 
<69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: c43caa88395fa3dbf587793ab93429cf5a8a26c5 * Feat lr scheduler (#4921) * feat(LrScheduler): add ConsineScheduler * feat(LrScheduler): update cosine_scheduler and add test * feat(LrScheduler): refine codes * style(*): format codes * docs(LrScheduler): add document * docs(LrScheduler): refine documents Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: f7881ce7c500b0f768da559510c649051b784d8b * Fix CommNetIf::RegisterMemory/UnRegisterMemory lock scope (#4918) Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 1a7c26b1252090ce47443025c2108e88c50ff9f9 * add leakyrelu module (#4912) * add leakyrelu module * code format * update docs * update docs Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 2b74432f3eab93f60bf1f8edfb648d8adf16348f * Support cast in tensor.to (#4917) * support cast in to * refactor to interface * refine doc * refine doc * refine kwargs * add test case support tensor * minor fix and test case * format Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 9fac6f36cbc7767fb705bbb4f0fe11c7c706390d * add hard swish module (#4915) * add hard swish module * fix bug * fix bug Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 98f937f1ff1b09412e8787e598802a864c557882 * Replace oneflow_worker with worker agent (#4900) * naive impl * refine * refine * add log * refine * refine * refine * refine * add todo * refine * refine * sync dynamic libs * refine * fix docker cmd * fix rank * refine * refine * add callbacks simple rpc * refine * refine * fix * refine * refine * refine * fix conn * support tradional mode * refine * refine * refine * rm * refine * refine * refine * refine todo * refine * refine * rm unused * rm todo * revert * refine * add log * refine 
* refine * fix order * refine * refine * refine * refine * refine * refine * refine * rm * rename * add comment * refine * rm * refine * refine * refine * refine * add todo * add info * refine * refine * refine * add back some legacy code * refine * refine * refine * refine * refine * rm oneflow_worker exe * rm log * fix bug * support --cmd * add check * refine * fix * fmt Former-commit-id: 37c63928dab947b61f5844c68cdf44da9248889c * add hardsigmoid module (#4919) * add hardsigmoid module * refine docs Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 937ccaf3a8cdbbcee71fdd086019444c7aa6591d * Refactor tensor (#4916) * refactor Tensor * Export ConsistentTensor::is_cuda * minor fix * minor fix Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 087c9753ba0a2627bab235b0e81685758e580ae2 * add relu6 module (#4925) Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: c013fac1f4aad5348e6a5514a7ece4c4ed294b29 * add cuda test for add method(module) (#4888) * copy user op * add to module and tensor.to interface * remove unnecessary code * backward for tensor.to * remove capture of input * support cpu only tensor * module to (#4858) * remove backward kernel and op * friendly deal with when tensor.grad is None * minor fix * minor fix * revert * suport 1m1d only * skip test normalization * skip test normalization * skip conv * support construct device using string * minor fix * minor fix * use maybe * fix device id type for device infer ctx * skip batchnorm * skip some tensor test case * startup of add backward * startup of add gpu test * refine * add cuda test for Linear module * refine after sum fixed * gpu backward * gpu backward crashed * retain grad * refine according to comments of WangYinggang * refine: construct specified device tensor * refine testcase * refine: specifiy device when construct in test case * refien testcase for 
linear * refine * refien to_device * refine import statement * refine import path * remove useless _to_device fun Co-authored-by: poohRui <yuruil@qq.com> Co-authored-by: Yurui Li <32978179+poohRui@users.noreply.github.com> Co-authored-by: Xiaoyu Xu <xiaoyulink@gmail.com> Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 2e268dfc5a82c78d6f4dba35356d657519eff7b5 * add upsample module (#4923) * add upsample module * add upsample2d unittest * add upsample2d unittest * add docs * add UpsamplingNearest2d and UpsamplingBilinear2d module * code format and add docs * add more unit_test Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 8125b43f683af24868dd83806b6bb3c0d2529ca5 * add elu module (#4924) Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 4dfc375a26affbb6ac92ed69b1b56b879cc28ce2 * improve ofrecord unit test (#4920) * improve ofrecord unit test * remove no_grad * fix codes according to review * fix format Former-commit-id: 984b1f084b4590770dd8cc404c84f551ab082db9 * align inplace param (#4933) Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 8b0bdcdc30125d184894aeaf7cc6ca3ac7871c07 * refactor SGDUpdate and MomentumUpdate UserOp (#4930) * refactor(ModelUpdate): SGDUpdate and MomentumUpdate use optional input for learning rate * fix(*): fix bugs * style(*): refine code Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 49d53fecb10e7ff8d2a97a1f7fd52542d26839c3 * Dev support tensor slice (#4898) * add scalar input support * format * update tensor slice function * add slice slice_update module * add slice funtion in tensor * refine * fix tensor slice * fix bug * add tenser slice test case * add logical_slice_assign module * fix LogicalSliceAssign kernel to support eager local Signed-off-by: daquexian <daquexian566@gmail.com> 
* fix tests Signed-off-by: daquexian <daquexian566@gmail.com> * update export strategy * refine * fix docs * add more test case * fix comments * refine according to comments * add TODO item * format Co-authored-by: daquexian <daquexian566@gmail.com> Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 4c958482cd62163cab174ca6551119a5a2b8b451 * add tensor.zeros_() and SoftSyncStream instr (#4927) * add tensor.zeros_() and soft sync stream instr Signed-off-by: daquexian <daquexian566@gmail.com> * separate cpu and gpu version of SoftSyncStream * Remove SyncAutoMemset * fix compile error Signed-off-by: daquexian <daquexian566@gmail.com> * fix wrong parallel_desc() Signed-off-by: daquexian <daquexian566@gmail.com> * remove unused code Signed-off-by: daquexian <daquexian566@gmail.com> Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 0e7f1c6abf1b241f9811a551b5d513c9c8af5fa4 * Only upload log if distributed test fails (#4934) * only upload log if distributed test fails * refine * refine * reduce timeout * refine Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 08d62afa85b0b05dea51351d1182b0af9758dc55 * Add logsigmoid softplus module (#4929) * add logsigmodi and softplus module * code format * fix docs Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 0a636c6f72d220f6ab28698342abf2a5bbf53963 * Add module backward test (#4926) * add scalar input support * format * update tensor slice function * add slice slice_update module * add slice funtion in tensor * refine * fix tensor slice * fix bug * add tenser slice test case * update softmax testcase * refine softmax backward * add logsoftmax backward test * add maskedfill backward test case * add sigmoid backward test * rewrite transpose bacckward op * add transpose backward test * format * refine * rm useless code * Fix transose 
unittest. * refine according to comments * update * refine * refine * format * fix backward testcase * fix perm param * fix bug * numpy method to cal sigmoid grad * format * refine * refine comments Co-authored-by: hjchen2 <chenhoujiangcug@gmail.com> Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 7911d713f2903d8ca73296be7ed964c19b56ed54 * fix flow.save (#4941) Signed-off-by: daquexian <daquexian566@gmail.com> Former-commit-id: dbd9d76e34d1acf7e0fb8b35abbbbd6df554b51b * add math.abs (module) Former-commit-id: 9670c24a707e01d5445196dc01c145bda792995d * add math.abs (module) * fix zero point in fake quantization pass (#4586) * fix zero point Signed-off-by: daquexian <daquexian566@gmail.com> * round zero_point in fake quant kernel to align with onnx Signed-off-by: daquexian <daquexian566@gmail.com> Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: bc9ec6c4c67e91287b9ddd78d3b9b322fd1c3b75 * also allow ONEFLOW_DEBUG (#4950) Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: f65019f2891468b14f8ee6e910441204df77a6cc * Add Device Descriptor (#4939) * Add Device Descriptor * format * refine * refine * check cuda version * check cuda version * fix * fix * fix WorkSize * handle more error Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 1cc67da54e801269605be9b0e662c91e0d606a5e * Refactor Optimizers for eager (#4938) * refactor(Adam): refactor Adam to use dynamic learning rate * refactor(Adamw): refactor Adamw Optimizer * refactor(Rmsprop): refactor Rmsprop to use dynamic learning rate Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 49060956c1a3ab07267996f80be8f380025c9fce * Add instructions on making sys env permanent (#4949) Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 
64d3a1d9e47ce5c3a7a31a4ebc4fef48a3608735 * add gradient functions for dim_gather op (#4913) * start up * bugs left * add backward test case * fix bugs * refine testcase * refine * replace optrait with composedAttrs * refine Former-commit-id: e400bc0b527246931c95232509b693870f243ed6 * add gradient funs for unary and binary math op (#4961) * add gradient funs for unary and binary math op * add test exp and pow example * refine pow test case * refine * rename register macro Former-commit-id: a559f632cc59bfe742bb3ca843e2cdb006ee5bfb * Refactor consistent tensor (#4937) * refactor Tensor * Export ConsistentTensor::is_cuda * remove ConsistentTensor::blob_object * minor fix * minor fix * fix compiler complains * remove unused code * skip test_creating_consistent_tensor * del useless function Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Co-authored-by: clackhan <han_binbin@163.com> Former-commit-id: 662cda36312270b311debbea7a4a614bc4e2cd48 * add backward test case Former-commit-id: 28353e85d5d1aac17aefbcab9f15d78e9aa3bb0e * add backward test case * support tensorrt7 qat (#4958) * support tensorrt7 * Ignore handling bias * support label quantization Former-commit-id: 53881e3918991227812dd2a4808b1b546318a7bf * modify math.abs test case Former-commit-id: 1aa043e7221e71f3d049bcc89b8bd673f132b4bc * modify math.abs test case * fix softmax testcase (#4948) * add scalar input support * format * fix softmax testcase * fix logsoftmax test Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 10ca2bc085e0511ac529390f15cd539f8aa1cd3a * add module backward (#4935) * add reshape module backward * add expand module backward * add expand backward * add expand module backward * add expand module test * add squeeze module backward * code format * add repeat module backward * code format * fix bug * fix comment * align expand module with torch * align repeat module with torch * fix comment * fix comment * fix 
comment * fix code format * fix confilict * fix bug * fix comment * add module backward * fix pow bug Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 5362a0d80ad9ecbea094c77879c307dfafe6c906 * rewrite unsqueeze backward (#4966) * rewrite unsqueeze backward * fix comments * fix comment Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: ec1c80ff0246f2e27cb0cfd9860563a78f1105b2 * add module backward (#4942) * add exp module backward * add greater test * add less module test * add negative module backward * code format * add matmul backward * add broadcast_matmul_backward * code format * add batch_matmul backward * add argmax module test * delete unuseless code * fix comment * fix comment * fix comment * fix comment * fix commet * code format * fix bug * fix pow bug Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: d20a995378791b102a53680f0661b35c13cae980 * fix activation ci bug (#4980) Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 4e974d7beddeea2bf331c441aca0c0ae0beb9b2c * Hashable attr map (#4951) * Device::compute_dep_object_ * sequantialize instructions in the same stream. 
* refactor AttrMap * remove redundant header file includes Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: ded8a78095e5e62957454d8fd99982e573c83839 * Fix Global<CommNet>::Delete() (#4981) Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 8aade7e74177a12cc49753e4c286e6f2dc521e37 * delete unused headfile Former-commit-id: 2b3c27a907fbc75c537cabbbaaf5818efb2e2a29 * delete unused headfile * update test case format Former-commit-id: 9cda51fb978fe91245d8c40a4d3fc6a10f2dd4dc * update test case format * add of_softmax_use_fast_math (#4979) Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: ba6c64e7648c8ae69a48240da2d262440521da93 * NetIB device enumeration (#4974) Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 3f1038d6467e678fd887a0ba9441332412effaa1 * Fix local tensor requires grad (#4992) * fix(Tensor): add requires_grad setter for ExportTensor * test(Tensor): refine tensor autograd test Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 095521826ea9d3d3e0f34318668d4f80bb90e571 * Support localtensor slice (#4985) * add scalar input support * format * register local tensor slice methods Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 4f4ce1f2f0d3d11574a1ea62bdfaedfc181cb57a * Symbol::shared_from_symbol (#4969) * Symbol::shared_from_symbol * fix bug in Symbol::shared_from_symbol Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: a30eb43f4de2694bc65697dbe5304953b06dedbf * CI formats code automatically (#4983) * check in * qq mail * wrong fmt * auto format by CI * use youarefly@qq.com * wrong fmt * auto format by CI * use ci-bot@oneflow.org * wrong fmt * auto format by CI Co-authored-by: oneflow-ci-bot <373331853@qq.com> 
Co-authored-by: oneflow-ci-bot <youarefly@qq.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 9cfee9b13b68f72fd35c5c6d1879f7b9fefe7a8a * Eager consistent tensor (#4984) * Device::compute_dep_object_ * sequantialize instructions in the same stream. * refactor AttrMap * refactor Tensor * Export ConsistentTensor::is_cuda * remove ConsistentTensor::blob_object * refactor TensorImpl * minor fix * fix compiler' complains * Implements EagerConsistentTensorImpl::New * minor fix * fix compiler complains * remove unused code * skip test_creating_consistent_tensor * backup code * Symbol::shared_from_symbol * remove redundant header file includes * fix bug in Symbol::shared_from_symbol * symbolize ParallelDesc and ParallelDistribution Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Co-authored-by: clackhan <han_binbin@163.com> Former-commit-id: 3356bcad86103c357e12ce81ff86d757584b819e * add resnet50 model test (#4957) * add resnet50 model test * udpate script * udpate script * relax tolerant * run resnet50 in 1n1d * fix format * test resnet50 fun parameters * add resnet50 with and without bn test * fix resnet50 without bn train overflow * change assertEqual to assertTrue Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 01278a66fcdaea8130b60bb85a59996d8f487341 * Add arg where module (#4998) * add argmax test * add argwhere module * add argwhere module * code format * update unit_test * fix commet * update docs Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 816983aae65374300354e86b1fdb0703429003db * fix hierarchical_sub_task_graph_builder condition (#4990) Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 0cc7410469a80fb976a83907c6c05b008ec95f66 * Add __str__ and tolist for tensor (#4928) 
* add __str__ and tolist * initial tensor printer * reorginized tensor str * minor fix * user nparray2string * Add FunctionNode op_name (#4970) * feat(FunctionNode): add op_type_name * style(FunctionNode): rename op_name to op_type_name * add test case for numel * style(OpExpr): rename type_name to op_type_name (#4976) * add test for tensor str * minor fix * minor fix * support for local tensor * support for local tensor * format * fix typo Co-authored-by: Yinggang Wang <wyg19970408@gmail.com> Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: cf2ec98e9738336bb01254c361ba7b459cc0445a * Add squeeze module backward (#5007) * add argmax test * add squeeze module backward * fix conflict Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 10c40f2800e9fe3accf948276173eab45fc54657 * RPC backend local supports barrier of barrier_num > 1 (#4968) * check in changes * refine * fix * erase when barrier exits Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: c337148645740fed43e2da293a4cc054aa184659 * add activation module backward test (#4967) * add tanh module backward test * add gelu module backward * fix activation ci bug * add reshape module backward * add tensor reshape module and code format * add permute module backward * fix conflict * add argmax test * fix permute module bug * add prelu module cpu backward * fix prelu gpu backward bug * code format * restruct hardtanh module test * add hardtanh backward * add hardswish backward * add hardsigmoid module backward * add relu module backward * add relu6 module backward * add elu module backward * fix comments Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: c1873f7472de25b5e87189a9b8d2a83ec74e9741 * Add sys_ptrace for build docker container (#5005) * add sys_ptrace build docker container * refine Co-authored-by: oneflow-ci-bot 
<69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 659f59e1207e71d45ee0c5cc419bcf5736600f12 * Refine pythonpath in cmake (#5002) Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 7b80f84fa4030cba769aca3ed15f3ad3a1ba61c5 * Refactor scope parallel desc (#4996) * Symbol::shared_from_symbol * fix bug in Symbol::shared_from_symbol * symbolize Scope::GetParallelDesc() * IsScalarType * fix compiler complains * fix bug Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Co-authored-by: clackhan <han_binbin@163.com> Former-commit-id: 0be87a942e7a674a8ead4bc83c54be8c6b76cf5a * add arange module backward (#4978) * add arange module backward * update * refine * fix comments * refine * fix docs Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 3878aea7b3f9dfaf433e2a4d1df20005452c2236 * CI skips resnet50 to prevent segfault (#5017) * CI skip resnet50 * fix Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 13b461253345e0d1d39fae0db0fb7a65e204eee8 * NetSocket device enumeration (#4997) Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: c38ccebb5d946f34bbfc846b6014548b55878520 * Add device unmatch info (#4989) * add argmax test * add device unmatch error * delete unuse code * delete unuse code * refine code * fix var name error bug * add prelu exception get * add prelu exception get * add prelu exception get * add prelu exception get * add exception * code format * fix commet * add more error information * refine error info Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 8b0905b80d6dfcf4b63c9ee78a4e59f53f76905c * BindFwBwObaPairs skip parallel_cast (#4986) Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 
3a8bee332adf4688097c7203eff755ea5cf29c8b * rewrite matmul op backward (#4988) * rewrite matmul op backward * refine * update * fix comments * refine * refine Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: f06bf705fefaf5b16926adac6ef742a36164bf21 * Add concat module backward (#5013) * add argmax test * add concat module backward * add concat backward impl * add concat module backward * fix concat module backward bug * fix concat module backward bug * fix concat module backward bug * add concat module backward * delete unuse code * fix comments * fix comments * fix comments * fix comments Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 21f1bddb2eb796167b7ada04a72f51554bdc9bd9 * rewrite dropout backward (#5014) * rewrite dropout backward * refine * fix comments * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: bd778bf3821db03c25c1ed87cd884079d5109bbe * fix has_grad template (#4962) Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 1559abd4effcdca85b11d2802b36146f5283e211 * Upload core files optionally (#5020) Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 7b5eb78c910afd685fab7579bf342b85e8e4d7b2 * Fix docs bug (#5019) * add argmax test * fix oneflow docstring bug Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 1e9c6ff448764db3983561d1e10816019184921b * Doesn't allow CI to run PRs in parallel (#5016) * update commit * fix sha Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: d770a56dd45c86aa312a9961cc0d4e64f4e5a43a * try fix (#5029) Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 
5b7c76494687350e8d5974902bc0324dc58867b8 * add cosh module (#4943) * add cosh module * fix the calculation of cosh backward * add testcase of cosh Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 7142ed99b8c03506c6765178c27b9c4aae34e849 * Fix log level (#5009) Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 539d6093e1c5682fd56784549a6dcabfb1a1c6fe * oop oneflow.Model with training, validation and checkpoint (#4972) * trainer structure * add test * add nnmodel api * nn Model draft * try run global_func in Model * fit to be refined * model run global_func train & eval * nn Model for function style execution draf test pass * refactor nn model * nn model with nessary component * format * rm nn prefix of Model * flow.Model multi-task numpy-input * (flow.Model)op_dataload support multi job * (flow.Model) auto job_func signature for numpy input * (flow.Model)support auto numpy input job * (flow.Model) nump input multi job train test pass * (flow.Model)fix classmethod * fix test * (oneflow.Model)training_step multi output, refine according to pep8 * (oneflow.Model)pep8 check pass by flake8 * pytorch-style module Signed-off-by: daquexian <daquexian566@gmail.com> * fix typo, update parameter Signed-off-by: daquexian <daquexian566@gmail.com> * Model refine * Model fix typo * oneflow.Model optimizer variable lazy get, numpy job signature to DataModule * oneflow.Model merge and format * oneflow.Model: comment empty func to be overried * Optimizer: lazy get var add check and tips * oneflow.Model: refactor * oneflow.Model: refactor 2 * add TODO, remove unused import, set consistent to True in parameter Signed-off-by: daquexian <daquexian566@gmail.com> * oneflow.Model: ModelStage -> SubStep, TrainStage -> TrainStep * fix format * oneflow.Model: SubStep to SubModel * oneflow.Model: infer_oneflow_data_placeholder and _infer_job_signature * set placement of parameter Signed-off-by: 
daquexian <daquexian566@gmail.com> * reformat Signed-off-by: daquexian <daquexian566@gmail.com> * add __init__.py in oneflow.python.nn.modules Signed-off-by: daquexian <daquexian566@gmail.com> * add todo for GetCurrentJobName() * fix typo * oneflow.Model: refine error message * fix format * OOPModel: import new Module * oneflow.Model: rm FunctionConfig in Model * oneflow.Model config_exe to config_execution * OOPModel: add and test naive validate * oneflow.Model: merge module * Optimizer: user mode to confirm that Optimizer.Variable() is called inside a job * merge master * oop model : predict demo * model inherit new module * refine * refine oneflow.mode * add test oop model * add test * no_grad on ones_like * fix has_grad template * format * fix data input * fix * check oop model * add model checkpoint * rm useless code Co-authored-by: daquexian <daquexian566@gmail.com> Co-authored-by: Li Xinqi <lixinqi2010@gmail.com> Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 121b1423ae3f059ca6cb260f067d8047ddf4fa06 * Add upsample module backward (#5025) * add argmax test * add upsample module backward * update upsample unittest * fix unittest bug * refine upsample backward * code format * fix comment * fix comment Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 86d4a6723a9a2689a9d357455cd6b9b10376b017 * Prevent CMake from using highest version of python3 (#5034) * Use conda python if available * refine * refine * refine Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 01e1a7144f66040b9de96ce946dcce31048dd87c * rewrite slice backward (#5018) * rewrite slice backward * remove unuse .h Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: c1880c011dd453719a28d880abe15e2dab8d0da1 * fix qat (#5038) * fix qat * 
format * refine * add comment * Update test.yml * Update test_quantize_op.py * Update test_quantize_op.py Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com> Former-commit-id: 2dfb5b566f906b92b93b078b17b8328c19d5eea1 * Support convert tensorbuffer to list of numpy (#4940) * support convert tensorbuffer to list of numpy * improve speed * remove useless codes * get tensor_buffer shapes and dtypes by single function * add __eq__ and __hash__ to DType * add dynamic_out to tensor_buffer_to_list_of_tensors_v2 Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 577568af0fa0d98fd2b946831daa99231348e77a * Add JUST (#5041) Former-commit-id: 994b0df0e435c9b0a57399b1ed1cfb3a0048bf9d * Add broadcast like module backward (#5037) * add argmax test * add broadcastlike module backward, bug need fixed * fix broadcast_like backward bug * auto format by CI * refine code Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: e7bb21bb4f4b15a986d5b5270e81855514ec765e * Fix segfault caused by zlib in conda when share lib is enabled (#5045) * check in changes * address review * address review Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 9bec9735683e3adc3d9b0d336f9a88649b0f96bf * Fix norm grad func to support dynamic attrs. 
(#5043) Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: cc1c554c74346d5b0c2b90bdaea49546168af02d * Fix segfault in new interface (#5042) * fix data race about composed attr map Signed-off-by: daquexian <daquexian566@gmail.com> * move ResetPrior before ChooseOpKernel Signed-off-by: daquexian <daquexian566@gmail.com> * delete vm before others Signed-off-by: daquexian <daquexian566@gmail.com> * revert deletion order change, sync by atexit Signed-off-by: daquexian <daquexian566@gmail.com> * add comments Signed-off-by: daquexian <daquexian566@gmail.com> * rename Signed-off-by: daquexian <daquexian566@gmail.com> * fix multi machine bug Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Former-commit-id: 68a5c10bc3f838b5c72a6e29c4cfa24b861cb1d3 * Add where module backward (#5035) * add argmax test * add where module backward * fix where module unit_test bug * add zerolike and where op function * add backward code * add broadcast like backward * refine * fix where module backward bug * rebuild test * fix comment * fix comment * fix comment * fix comment * fix comments Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: dec89a96af3eaddbf78f362e693df5030bd5420f * Query system status if CI failed (#5052) Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 26bbcfcdf64946d5a3e92d860fdcbb806c44a575 * fix(vm): add virtual mechine backpressure (#5050) Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: cd21df2f36bd3aec79ea4893762e373ae12aed98 * change in_edages/out_edages to from SKIPLIST to LIST (#5047) * change in_edages/out_edages to from SKIPLIST to LIST * minor fix * refine Co-authored-by: Li Xinqi <lixinqi2010@gmail.com> 
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: b95682624a6b93f276830e28c070a5f6a7c77f4c * Stop heartbeat and add barrier before Global<CtrlServer>::Delete() (#5010) refine Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: e861dd3b26d5b94b478528d5de77f665b8bf2476 * fix error on exiting (#5053) * fix error on exiting Signed-off-by: daquexian <daquexian566@gmail.com> * lazily get rank Signed-off-by: daquexian <daquexian566@gmail.com> Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 51335128e02356011b1943f0c71ca1ec9fe3963b * Restore eager r50 case (#5055) Former-commit-id: 64665e75e702527859ef5b82eafa1caa05d8d229 * Add argmax softplus logsigmoid module backward (#5049) * add argmax test * add argmax module backward, bug need fixed * add leakyrelu module backward * delete argmax backward test * delete argmax backward test * add softplus module backward * code format * add softplus module backward * add logsigmoid module backward Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 06b549b3123df4d889a822bcdee8cf8d2daecf48 * add doctest (#5046) * add doctest * refine * update modules doctest Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 7394ec5ff76ab2ad62d0a257d3f720cd5f3b035c * refactor DType (#5024) * refactor DType * fix compiler complains * DType is only allowed to be used in python code * dtype api bugfix * fix error on exiting Signed-off-by: daquexian <daquexian566@gmail.com> * lazily get rank Signed-off-by: daquexian <daquexian566@gmail.com> * Export const DType* into python Co-authored-by: binbinHan <han_binbin@163.com> Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Co-authored-by: daquexian <daquexian566@gmail.com> Former-commit-id: c2b2eb25269679775e650e4ea9bedf96f1be5efc * 
remove try_init_session in new interface (#5061) Signed-off-by: daquexian <daquexian566@gmail.com> Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 912bc97da3d13f9d68cb7b9f3a3f41e766aef89e * Rewrite batch broadcast matmul backward (#5012) * rewrite matmul op backward * refine * update * fix comments * refine * refine * rewrite batch broadcast matmul backward * refine * refine * refine * refine * Add JUST * restructure matmul series module backward * refine * fix comments * fix comments * refine Co-authored-by: hjchen2 <chenhoujiangcug@gmail.com> Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: c4f277a158fd3435f235dc013fa5d9b015401708 * modify format Former-commit-id: 5bdb45adbb0881263ef5fe0afe13dd606a8e6ff6 * modify format * run make of_format Former-commit-id: e976b9b0f88bc2e91fddb3053dab5bd5001e8318 * run make of_format * lock cmake version in manylinux cmake (#5057) Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: fe662377fc70f23d9ff2f495fc48774770c802fe * Doctest support in CI (#4973) * check in changes * refine * fix * add test on obj * add relu example * run doctest in ci * dont delete python * address review * address review * address review * address review * address review Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 56f6348d04bb9f24a1eb8e1c4f8d82d8c4f97f89 * Update README for 0.4.0 (#4965) * refine * remove content * refine * require py36 * address review * refine * address review Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: cc3162062b30228a4d335e9be66b08e701042609 * Add step lr and lambda lr (#5063) * feat(StepLR): add StepLR * feat(LambdaLR): add LambdaLR * docs(LambdaLR): fix document * style(LambdaLR): add comment Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com> 
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 4c005b7c673892693b9b5541281d6d3f254d8a8a * Release cuda 112 (#5060) * Nightly for cu112 * add arg * nightly Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 72cfa9916e040433426b5e6cef1752e394c12a3d * add flow.asin, flow.Tensor.asin, flow.arcsin, flow.Tensor.arcsin, flow.asinh, flow.Tensor.asinh, flow.arcsinh, flow.Tensor.arcsinh (#4955) * add flow.asin and torch.arcsin * add torch.asin and torch.arcsin * add torch.sin and torch.arcsin * add torch.asin and torch.arcsin * add torch.asin and torch.arcsin * add torch.asin and torch.arcsin * add torch.asin and torch.arcsin * add torch.asin and torch.arcsin * update test_asin.py including forward and backward * Update test_math_ops.py remove asin testcase * update testcase including forward and backward * add torch.asinh and torch.Tensor.asinh * update testcase of asin and asinh * update testcase of asin and asinh * update testcase of asin and asinh * update testcase * update testcase * make format * update license * update testcase * check in * qq mail * wrong fmt * auto format by CI * use youarefly@qq.com * wrong fmt * auto format by CI * use ci-bot@oneflow.org * wrong fmt * auto format by CI * mv op testcase to test_tensor.py * auto format by CI * update docstring * update docstring * update doctest * update doctest * auto format by CI * update arcsinh * auto format by CI Co-authored-by: 陈岱渊 <chendy@zhejianglab.com> Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com> Co-authored-by: jackalcooper <jackalcooper@gmail.com> Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <373331853@qq.com> Co-authored-by: oneflow-ci-bot <youarefly@qq.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Former-commit-id: 95748870b6cfa617a7da31ad738098a999cc68f6 * Add modules doctest bbuf (#5058) * add 
argmax test * add nllloss doctest * add doctest * add crossentropyloss doctest * add expand module doctest * add squeeze module doctest * add repeat module doctest * add exp module doctest * add argmax module doctest * add matmul module doctest * add greater module doctest * add less module doctest * add negative module doctest * add linear module doctest * add tanh module doctest * add gelu module doctest * add reshape module doctest * add transpose module doctest * add where module doctest * add permute module doctest * add prelu module doctest * add hardtanh module doctest * add activation module doctest * add activation module doctest * add upsample module doctest * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Former-commit-id: 58183f8713dd50d332d82ff481c3f0dd3bd8f8b5 * remove origin testCase in test_math_ops Former-commit-id: 3a4bae704e2698225b15734fb440a6f8ddef5120 * remove origin testCase in test_math_ops * Add tensor detach python api (#5068) * add argmax test * add tensor detach python api * delete unuse code Former-commit-id: 9b30b7c92d5f866f7d8fa6863c654528e7b34e95 * Delete preprocessor_internal.h.REMOVED.git-id * Delete nn_ops.py.REMOVED.git-id Co-authored-by: Lyon <flowingsun007@163.com> Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Co-authored-by: Yinggang Wang <wyg19970408@gmail.com> Co-authored-by: Yurui Li <32978179+poohRui@users.noreply.github.com> Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com> Co-authored-by: Liang Depeng <liangdepeng@gmail.com> Co-authored-by: daquexian <daquexian566@gmail.com> Co-authored-by: binbinHan <han_binbin@163.com> Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com> Co-authored-by: Li Xinqi <lixinqi2010@gmail.com> Co-authored-by: Shijie <821898965@qq.com> Co-authored-by: Yao Chi <later@usopp.net> Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com> 
Co-authored-by: Xiaoyu Xu <xiaoyulink@gmail.com> Co-authored-by: lixinqi <lixinqi0703106@163.com> Co-authored-by: BBuf <1182563586@qq.com> Co-authored-by: Juncheng <liujuncheng1022@gmail.com> Co-authored-by: mosout <mosout@qq.com> Co-authored-by: poohRui <yuruil@qq.com> Co-authored-by: guo ran <360112263@qq.com> Co-authored-by: oneflow-ci-bot <373331853@qq.com> Co-authored-by: oneflow-ci-bot <youarefly@qq.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: YongtaoShi <73167956+YongtaoShi@users.noreply.github.com> Co-authored-by: yayeoCddy <Dy_Chen95@163.com> Co-authored-by: 陈岱渊 <chendy@zhejianglab.com>

* fix error on exiting
  Signed-off-by: daquexian <daquexian566@gmail.com>
* lazily get rank
  Signed-off-by: daquexian <daquexian566@gmail.com>

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Former-commit-id: 5133512

fix error on exiting

01345ed

Signed-off-by: daquexian <daquexian566@gmail.com>

daquexian added automerge bug python labels May 31, 2021

Merge branch 'master' into fix_atexit_error

848c832

daquexian requested a review from oneflow-ci-bot May 31, 2021 11:45

Ldpe2G approved these changes May 31, 2021

wyg1997 approved these changes May 31, 2021

oneflow-ci-bot removed their request for review May 31, 2021 12:55

daquexian and others added 3 commits May 31, 2021 21:13

lazily get rank

27e3fbb

Signed-off-by: daquexian <daquexian566@gmail.com>

Merge branch 'master' into fix_atexit_error

68f4acd

Merge branch 'master' into fix_atexit_error

97d9872

oneflow-ci-bot self-requested a review May 31, 2021 14:15

leaves-zwx approved these changes May 31, 2021

Merge branch 'master' into fix_atexit_error

8938f8f

oneflow-ci-bot requested review from oneflow-ci-bot and removed request for oneflow-ci-bot May 31, 2021 15:33

Merge branch 'master' into fix_atexit_error

b6cd1da

oneflow-ci-bot requested review from oneflow-ci-bot and removed request for oneflow-ci-bot May 31, 2021 16:39

oneflow-ci-bot merged commit 5133512 into master May 31, 2021

oneflow-ci-bot deleted the fix_atexit_error branch May 31, 2021 17:23
