Dev randint refine (#5981)

* disable backward pass consistent tensor meta check. (#5871)
* disable backward pass consistent tensor meta check.
* auto format by CI
Co-authored-by: binbinHan <han_binbin@163.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
* ddp broadcast params and buffers (#5913)
* ddp broadcast params and buffers
Signed-off-by: daquexian <daquexian566@gmail.com>
* auto format by CI
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
* add clang tidy target (#5957)
* add clang tidy target
* fix a bug
* refine
* refine
* reformat
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
* cfg: add move assignment operator for performance (#5962)
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
* add zhangshen op-test (#5600)
* add some op-test
* fix dims_error in my branch
* Fix the bad backward kernel function by using 'cuda::atomic::Add' (#5614)
* Test `nn.AdaptiveAvgPoolXd` (#5615)
* Fix the bad backward kernel function by using 'cuda::atomic::Add'
* Support the 'NoneType' annotation
* Support objects of 'collections.abc.Iterable' as 'output_size'
* Test with all cases of 'output_size'
* Update adaptive_pool_gpu_kernel.cu
* Skip testing `nn.AdaptiveAvgPool3d` for the current PyTorch
* remove some useless test
* Format TODO
* Add the assertion messages for 'output_size'
* Reformat codes
* Remove raw tests for `flow.negative`
* Remove unnecessary codes and add the assertion messages
* Merge updates for 'generators.py' from master
* Remove unnecessary 'random()'
* Delete the separate test for `AvgPool2d`
* Fix import paths
* Fix import problems
* Remove the PyTorch import
* Denote the annotations for `tile` and `repeat` ops
* Add the test for `nn.AvgPool1d`
* Choose better generators for `nn.MaxPoolXd`
* Randomly choose `dilation` and default values
* auto format by CI
* Test more kwargs for `nn.AvgPoolXd`
* Add tests for `return_indices`
* auto format by CI
Co-authored-by: Tianyu Zhao <guikarist@gmail.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
* fix wrong names (#5951)
* fix wrong names
* auto format by CI
* refine
* auto format by CI
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
* Enable more checkers for clang-tidy in CI (#5738)
* CI: enable more checkers for clang-tidy
* .clang-tidy: remove cppcoreguidelines-pro-type-vararg
* CI: remove duplicate checkers
* CI: remove clang-analyzer-alpha.deadcode.*
* .clang-tidy: add performance-*
* oneflow/core/eager: remove unnecessary malloc & free
* .clang-tidy: add clang-analyzer-cplusplus.* to werror
* user_kernel: remove useless move
* quantization_aware_training: fix move return
* .clang-tidy: add google-*
* CI: fix clang tidy command
* CI: fix test
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
* Feat grad mode classes (#5956)
* feat(no_grad): support no_grad decorator
* feat(AutogradMode): export flow.autograd_mode
* feat(GradMode): export some grad_mode class
* docs(GradMode): export documents
* refine
* docs(GradMode): export document for is_grad_enabled
* auto format by CI
* fix(GradMode): fix single client bug
* fix bug
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
* extract_consistent_to_consistent_op_expr (#5870)
* abstract_consistent_to_consistent_op_expr
* fix compiler complaint
* refactor consistent-to-consistent eager consistent op interpreter
* fix compiler complaint
* refactor ConsistentToConsistentOpExpr
* lazy interpreter (#5903)
* fix bugs about consistent_id
* refactor functional::ToConsistent
* refactor GetNdSbp
* Update eager_consistent_op_interpreter.cpp
* Update eager_mirrored_op_interpreter.cpp
* fix error
* fix error
* auto format by CI
* Update nd_sbp.h
* refine identity boxing
* fix sync checkmeta error
* avoid consistent id check in lazy
Co-authored-by: Xinqi Li <lixinqi0703106@163.com>
Co-authored-by: leaves-zwx <kunta0932@gmail.com>
Co-authored-by: Li Xinqi <lixinqi2010@gmail.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
* add CMAKE_INTERPROCEDURAL_OPTIMIZATION in fast cmake cache (#5970)
* add CMAKE_INTERPROCEDURAL_OPTIMIZATION in fast cmake cache
* skip test targets of re2
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
* check: fix clang-tidy-diff commands (#5972)
* check: fix clang-tidy-diff commands
* CI: fix step names
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
* Cpu mpi (#5865)
* cuda base cpu mpi boxing
* cpu_mpi
* fix conflicts
* add cpu mpi unittests
* more checks and unittests
* abstract_consistent_to_consistent_op_expr
* fix compiler complaint
* refactor consistent-to-consistent eager consistent op interpreter
* fix compiler complaint
* refactor ConsistentToConsistentOpExpr
* lazy interpreter (#5903)
* fix bugs about consistent_id
* more test_consistent_cast unittests
* refactor functional::ToConsistent
* refactor GetNdSbp
* fix compiler complaints
* refactor GetDevice4CurrentProcessCtx
* fix error
Co-authored-by: clackhan <han_binbin@163.com>
Co-authored-by: leaves-zwx <kunta0932@gmail.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
* fix_bug_test_tensor_str (#5958)
* fix bug in test_tensor_str
* format
* fix comment
* fix bug to(cuda) is unavailable in cpu env
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
* common/error: fix build error in mac (#5971)
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
* Prevent running oneflow in forked subprocess (#5976)
* prevent_running_oneflow_in_forked_subprocess
* add line change
* IsFork => IsForkedSubProcess
* auto format by CI
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
* refine randint

Co-authored-by: Li Xinqi <lixinqi2010@gmail.com>
Co-authored-by: binbinHan <han_binbin@163.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: daquexian <daquexian566@gmail.com>
Co-authored-by: Peihong Liu <mosout@qq.com>
Co-authored-by: Twice <i@twice.moe>
Co-authored-by: ZhangShen <55383772+zhangshen12356@users.noreply.github.com>
Co-authored-by: Tianyu Zhao <guikarist@gmail.com>
Co-authored-by: Luyang <flowingsun007@163.com>
Co-authored-by: Yinggang Wang <wyg19970408@gmail.com>
Co-authored-by: Xinqi Li <lixinqi0703106@163.com>
Co-authored-by: leaves-zwx <kunta0932@gmail.com>
Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com>
Co-authored-by: liufengwei0103 <2472937968@qq.com>
Oneflow-Inc · Aug 20, 2021 · 671f4f5
1 parent 038320c
Showing 92 changed files with 1,943 additions and 2,377 deletions.
diff --git a/.clang-tidy b/.clang-tidy
@@ -1,4 +1,15 @@
-# maybe-* checks are only available on OneFlow custom clang-tidy and clangd
-Checks: '-*, maybe-*'
+# `maybe-*` checks are only available on OneFlow custom clang-tidy and clangd
+# `-allow-enabling-analyzer-alpha-checkers` should be passed to clang-tidy for CSA checkers named `clang-analyzer-alpha.*` (or `-allow-enabling-alpha-checkers` for run-clang-tidy.py)
+# `aggressive-binary-operation-simplification` should be enabled (via `-Xclang -analyzer-config -Xclang aggressive-binary-operation-simplification=true` in clang)
+# there is some problem in `clang-analyzer-alpha.clone.*`, so do not enable it
+# `clang-analyzer-alpha.deadcode.*` is just too verbose to enable
+Checks: '-*, maybe-*, clang-analyzer-core.*, clang-analyzer-cplusplus.*, clang-analyzer-nullability.*, clang-analyzer-deadcode.*, clang-analyzer-security.*, clang-analyzer-optin.cplusplus.*, clang-analyzer-optin.performance.*, clang-analyzer-alpha.core.*, clang-analyzer-alpha.cplusplus.*, clang-analyzer-alpha.security.*, cppcoreguidelines-avoid-goto, cppcoreguidelines-init-variables, cppcoreguidelines-interfaces-global-init, cppcoreguidelines-no-malloc, cppcoreguidelines-prefer-member-initializer, cppcoreguidelines-pro-type-member-init, cppcoreguidelines-pro-type-static-cast-downcast, cppcoreguidelines-slicing, cppcoreguidelines-special-member-functions, performance-unnecessary-value-param, performance-unnecessary-copy-initialization, performance-noexcept-move-constructor, performance-no-automatic-move, performance-move-const-arg, performance-implicit-conversion-in-loop, performance-for-range-copy, google-default-arguments, google-global-names-in-headers, google-explicit-constructor'
 # TODO: treat all maybe warnings as errors when existing warnings are all fixed
-WarningsAsErrors: 'maybe-unused'
+WarningsAsErrors: 'maybe-unused, clang-analyzer-nullability.*, clang-analyzer-cplusplus.*, performance-implicit-conversion-in-loop, performance-move-const-arg, performance-no-automatic-move, performance-noexcept-move-constructor, google-default-arguments, google-global-names-in-headers'
+
+CheckOptions:
+  # `cppcoreguidelines-special-member-functions` is enabled, refer to https://en.cppreference.com/w/cpp/language/rule_of_three
+  - key:             cppcoreguidelines-special-member-functions.AllowSoleDefaultDtor
+    value:           True
+  - key:             performance-move-const-arg.CheckTriviallyCopyableMove
+    value:           False
diff --git a/.github/workflows/simple.yml b/.github/workflows/simple.yml
@@ -50,7 +50,7 @@ jobs:
             -DCMAKE_BUILD_TYPE=Release \
             -DBUILD_TESTING=ON
           cmake --build . -j$(nproc) --target of_git_version oneflow_deps generate_functional of_cfgobj generate_py_cfg
-      - name: Run Maybe-related checks by clang-tidy
+      - name: Run clang-tidy for all translation units
         # use clang as compiler for correct compiler flags
         run: |
           cd build
@@ -62,7 +62,7 @@ jobs:
             -DBUILD_TESTING=ON \
             -DCMAKE_EXPORT_COMPILE_COMMANDS=ON
           cd ..
-          ./run-clang-tidy.py -clang-tidy-binary ./clang-tidy-489012f-x86_64.AppImage -p build -quiet
+          ./run-clang-tidy.py -clang-tidy-binary ./clang-tidy-489012f-x86_64.AppImage -p build -quiet -allow-enabling-alpha-checkers -extra-arg="-Xclang" -extra-arg="-analyzer-config" -extra-arg="-Xclang" -extra-arg="aggressive-binary-operation-simplification=true" '^((?!third_party_install).)+(?<!cfg.cpp)(?<!pb.cc)$'
 
   hosted:
     name: CPU-only
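
The last argument passed to `run-clang-tidy.py` in the step above is a file filter written as a regular expression. As a rough illustration, assuming the script applies it with Python's `re` module, the sketch below shows which files the pattern keeps. The sample paths and the helper name `is_checked` are hypothetical and not taken from the repository.

```python
import re

# Pattern copied verbatim from the workflow step above.
FILE_FILTER = re.compile(r"^((?!third_party_install).)+(?<!cfg.cpp)(?<!pb.cc)$")


def is_checked(path: str) -> bool:
    """Return True if `path` passes the filter (hypothetical helper)."""
    return FILE_FILTER.search(path) is not None


# Hypothetical paths, for illustration only.
samples = [
    "oneflow/core/common/maybe.cpp",               # ordinary source file
    "build/third_party_install/re2/re2.cc",        # third-party install tree
    "build/oneflow/core/job/placement.cfg.cpp",    # generated cfg source
    "build/oneflow/core/job/job.pb.cc",            # generated protobuf source
]
for sample in samples:
    print(sample, "->", "checked" if is_checked(sample) else "skipped")
```

Note that the dots in `cfg.cpp` and `pb.cc` are unescaped, so each one matches any single character, not only a literal `.`.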

diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml
@@ -749,7 +749,7 @@ jobs:
         run: |
           git remote add upstream https://github.com/Oneflow-Inc/oneflow
           git fetch upstream
-      - name: Run Maybe-related checks by clang-tidy
+      - name: Run clang-tidy for modified files
         # use clang as compiler for correct compiler flags
         run: |
           cd build
@@ -761,4 +761,4 @@ jobs:
             -DBUILD_TESTING=ON \
             -DCMAKE_EXPORT_COMPILE_COMMANDS=ON
           cd ..
-          git diff -U0 ${{ github.event.pull_request.base.sha }} | ./clang-tidy-diff.py -clang-tidy-binary ./clang-tidy-489012f-x86_64.AppImage -path build -quiet -j $(nproc) -p1
+          git diff -U0 ${{ github.event.pull_request.base.sha }} | ./clang-tidy-diff.py -clang-tidy-binary ./clang-tidy-489012f-x86_64.AppImage -path build -allow-enabling-alpha-checkers -j $(nproc) -p1 -extra-arg="-Xclang" -extra-arg="-analyzer-config" -extra-arg="-Xclang" -extra-arg="aggressive-binary-operation-simplification=true"
diff --git a/CMakeLists.txt b/CMakeLists.txt
@@ -23,6 +23,7 @@ if (NOT THIRD_PARTY AND NOT ONEFLOW)
 endif()
 
 option(USE_CLANG_FORMAT "" OFF)
+option(USE_CLANG_TIDY "" OFF)
 option(BUILD_RDMA "" OFF)
 option(BUILD_CUDA "" ON)
 option(BUILD_TESTING "" OFF)

diff --git a/ci/check/run_clang_tidy.py b/ci/check/run_clang_tidy.py
@@ -0,0 +1,97 @@
+#!/usr/bin/env python3
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import asyncio
+import argparse
+import subprocess
+import os
+
+
+def split_and_print(prefix, text):
+    lines = text.decode().splitlines(keepends=True)
+    prefixed = ""
+    for line in lines:
+        prefixed += f"{prefix} {line.strip()}"
+    if lines and lines[-1].strip():
+        print(prefixed, flush=True)
+
+
+async def handle_stream(stream, cb):
+    while True:
+        line = await stream.readline()
+        if line:
+            cb(line)
+        else:
+            break
+
+
+async def run_command(cmd=None, dry=False, name=None):
+    if dry:
+        print(f"[dry] {cmd}")
+        return 0
+    process = await asyncio.create_subprocess_shell(
+        cmd, stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.PIPE,
+    )
+    l = lambda x: split_and_print(f"[{name}]" if name else "", x)
+    await asyncio.gather(
+        handle_stream(process.stdout, l), handle_stream(process.stderr, l),
+    )
+    await process.wait()
+    return process.returncode
+
+
+def download(build_dir, dry=False):
+    urls = [
+        "https://github.com/Oneflow-Inc/llvm-project/releases/download/latest/clang-tidy-489012f-x86_64.AppImage"
+        if os.getenv("CI")
+        else "https://oneflow-static.oss-cn-beijing.aliyuncs.com/bin/clang-tidy/linux-x86_64/clang-tidy.AppImage",
+        "https://raw.githubusercontent.com/oneflow-inc/llvm-project/maybe/clang-tools-extra/clang-tidy/tool/clang-tidy-diff.py",
+    ]
+    dst_dir = f"{build_dir}/cache/bin"
+    dst = [f"{dst_dir}/clang-tidy", f"{dst_dir}/clang-tidy-diff.py"]
+    if dry:
+        if os.path.isfile(dst[0]) and os.path.isfile(dst[1]):
+            return dst
+        else:
+            return None
+    else:
+        assert subprocess.call(f"mkdir -p {dst_dir}", shell=True) == 0
+        for i, _dst in enumerate(dst):
+            assert subprocess.call(f"curl -L {urls[i]} -o {_dst}", shell=True) == 0
+            assert subprocess.call(f"chmod +x {_dst}", shell=True) == 0
+        return dst
+
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(
+        description="Runs clang-tidy on the lines changed relative to master."
+    )
+    parser.add_argument(
+        "--build_dir", required=True,
+    )
+    args = parser.parse_args()
+    loop = asyncio.get_event_loop()
+    downloaded = download(args.build_dir, dry=True)
+    if downloaded is None:
+        downloaded = download(args.build_dir)
+    promises = [
+        run_command(
+            f"cd .. && git diff -U0 master | {downloaded[1]} -clang-tidy-binary {downloaded[0]} -path {args.build_dir} -j $(nproc) -p1 -allow-enabling-alpha-checkers -extra-arg=-Xclang -extra-arg=-analyzer-config -extra-arg=-Xclang -extra-arg=aggressive-binary-operation-simplification=true"
+        )
+    ]
+    loop.run_until_complete(asyncio.gather(*promises))
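
`run_clang_tidy.py` gathers the value returned by `run_command` but does not use it, so the exit status of the check is not passed on to the caller. Below is a minimal sketch of one way a wrapper could surface that status, assuming a failing check should fail the caller (the source does not state this). The names `run_and_report` and the `[tidy]` prefix are hypothetical, and the child process is a stand-in for the real `clang-tidy-diff.py` pipeline.

```python
import asyncio
import sys


async def _run(argv):
    # Start the child process and capture both output streams.
    process = await asyncio.create_subprocess_exec(
        *argv, stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.PIPE
    )
    stdout, stderr = await process.communicate()
    return process.returncode, stdout.decode(), stderr.decode()


def run_and_report(argv, name="tidy"):
    """Run argv, echo its output with a prefix, and return the exit status."""
    returncode, stdout, stderr = asyncio.run(_run(argv))
    for line in (stdout + stderr).splitlines():
        if line.strip():
            print(f"[{name}] {line.strip()}", flush=True)
    return returncode


# Stand-in child process that prints one line and exits with status 3.
status = run_and_report(
    [sys.executable, "-c", "print('warning: demo'); raise SystemExit(3)"]
)
```

A script entry point could then end with `sys.exit(status)` so that a build target depending on it fails when the check fails.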
diff --git a/cmake/caches/cn/fast/cpu.cmake b/cmake/caches/cn/fast/cpu.cmake
@@ -7,3 +7,4 @@ set(CMAKE_BUILD_TYPE RelWithDebInfo CACHE STRING "")
 set(CMAKE_GENERATOR Ninja CACHE STRING "")
 set(CMAKE_C_COMPILER_LAUNCHER sccache CACHE STRING "")
 set(CMAKE_CXX_COMPILER_LAUNCHER sccache CACHE STRING "")
+set(CMAKE_INTERPROCEDURAL_OPTIMIZATION OFF CACHE BOOL "")
diff --git a/cmake/caches/cn/fast/cuda-61.cmake b/cmake/caches/cn/fast/cuda-61.cmake
@@ -8,3 +8,4 @@ set(CUDA_NVCC_GENCODES "arch=compute_61,code=sm_61" CACHE STRING "")
 set(CMAKE_C_COMPILER_LAUNCHER sccache CACHE STRING "")
 set(CMAKE_CXX_COMPILER_LAUNCHER sccache CACHE STRING "")
 set(CMAKE_CUDA_COMPILER_LAUNCHER sccache CACHE STRING "")
+set(CMAKE_INTERPROCEDURAL_OPTIMIZATION OFF CACHE BOOL "")
diff --git a/cmake/caches/cn/fast/cuda-75.cmake b/cmake/caches/cn/fast/cuda-75.cmake
@@ -8,3 +8,4 @@ set(CUDA_NVCC_GENCODES "arch=compute_75,code=sm_75" CACHE STRING "")
 set(CMAKE_C_COMPILER_LAUNCHER sccache CACHE STRING "")
 set(CMAKE_CXX_COMPILER_LAUNCHER sccache CACHE STRING "")
 set(CMAKE_CUDA_COMPILER_LAUNCHER sccache CACHE STRING "")
+set(CMAKE_INTERPROCEDURAL_OPTIMIZATION OFF CACHE BOOL "")
diff --git a/cmake/oneflow.cmake b/cmake/oneflow.cmake
@@ -209,7 +209,11 @@ add_custom_target(of_format
   COMMAND ${Python_EXECUTABLE} ${CMAKE_CURRENT_SOURCE_DIR}/ci/check/run_clang_format.py --source_dir ${CMAKE_CURRENT_SOURCE_DIR}/oneflow --fix --quiet
   COMMAND ${Python_EXECUTABLE} ${CMAKE_CURRENT_SOURCE_DIR}/ci/check/run_py_format.py --source_dir ${CMAKE_CURRENT_SOURCE_DIR} --fix
   )
-
+# clang tidy
+add_custom_target(of_tidy
+  COMMAND ${Python_EXECUTABLE} ${CMAKE_SOURCE_DIR}/ci/check/run_clang_tidy.py --build_dir ${CMAKE_BINARY_DIR}
+  DEPENDS of_git_version oneflow_deps generate_functional of_cfgobj generate_py_cfg
+  )
 # generate version
 set(OF_GIT_VERSION_DIR ${CMAKE_CURRENT_BINARY_DIR}/of_git_version)
 set(OF_GIT_VERSION_FILE ${OF_GIT_VERSION_DIR}/version.cpp)
@@ -292,6 +296,9 @@ add_dependencies(of_ccobj of_git_version)
 if (USE_CLANG_FORMAT)
   add_dependencies(of_ccobj of_format)
 endif()
+if (USE_CLANG_TIDY)
+  add_dependencies(of_ccobj of_tidy)
+endif()
 
 target_link_libraries(of_ccobj of_protoobj of_cfgobj ${ONEFLOW_CUDA_LIBS} glog_imported)
 

diff --git a/cmake/third_party/re2.cmake b/cmake/third_party/re2.cmake
@@ -29,5 +29,6 @@ if (THIRD_PARTY)
           -DCMAKE_INSTALL_PREFIX:PATH=${RE2_INSTALL_DIR}
           -DCMAKE_INSTALL_LIBDIR:PATH=${RE2_LIBRARY_DIR}
           -DCMAKE_POSITION_INDEPENDENT_CODE:BOOL=ON
+          -DRE2_BUILD_TESTING:BOOL=OFF
           -DCMAKE_BUILD_TYPE:STRING=${CMAKE_BUILD_TYPE})
 endif (THIRD_PARTY)
diff --git a/docs/source/oneflow.rst b/docs/source/oneflow.rst
@@ -123,5 +123,9 @@ oneflow
             zeros, 
             zeros_like,
             is_nonzero,
+            no_grad,
+            grad_enable,
+            inference_mode,
+            is_grad_enabled,
 
 .. autofunction:: oneflow.data.load_mnist(train_batch_size=100, test_batch_size=100, data_format='NCHW')
diff --git a/oneflow/api/python/autograd/autograd.cpp b/oneflow/api/python/autograd/autograd.cpp
@@ -71,7 +71,7 @@ Maybe<one::TensorTuple> Backward(const one::TensorTuple& outputs, const one::Ten
                                  bool retain_graph, bool create_graph) {
   if (create_graph) { retain_graph = true; }
   std::shared_ptr<one::TensorTuple> gradients = JUST(CheckAndInitOutGrads(outputs, out_grads));
-  JUST(one::GetThreadLocalAutogradEngine()->RunBackwardAndSaveGrads4LeafTensor(
+  JUST(one::GetThreadLocalAutogradEngine()->RunBackwardAndSaveGrads4LeafTensorIf(
       outputs, *gradients, retain_graph, create_graph));
   return std::make_shared<one::TensorTuple>(0);
 }
@@ -86,7 +86,7 @@ Maybe<one::TensorTuple> Grad(const one::TensorTuple& outputs, const one::TensorT
       [](const std::shared_ptr<one::Tensor>& tensor) { return tensor->requires_grad(); }))
       << "All input tensors `.requires_grad` should be true";
   std::shared_ptr<one::TensorTuple> gradients = JUST(CheckAndInitOutGrads(outputs, out_grads));
-  return one::GetThreadLocalAutogradEngine()->RunBackwardAndReturnInputsTensorGrad(
+  return one::GetThreadLocalAutogradEngine()->RunBackwardAndReturnInputsTensorGradIf(
       outputs, inputs, *gradients, retain_graph, create_graph);
 }
 

diff --git a/...low/api/python/autograd/no_grad_guard.cpp → ...low/api/python/autograd/autograd_mode.cpp b/...low/api/python/autograd/no_grad_guard.cpp → ...low/api/python/autograd/autograd_mode.cpp
@@ -26,11 +26,12 @@ namespace oneflow {
 namespace autograd {
 
 ONEFLOW_API_PYBIND11_MODULE("autograd", m) {
-  py::class_<NoGradGuard, std::shared_ptr<NoGradGuard>>(m, "no_grad")
-      .def(py::init([]() { return std::make_shared<NoGradGuard>(); }))
-      .def("__enter__", [](const NoGradGuard& no_grad_obj) {})
-      .def("__exit__", [](const NoGradGuard& no_grad_obj, const py::object& type,
+  py::class_<AutoGradMode, std::shared_ptr<AutoGradMode>>(m, "AutoGradMode")
+      .def(py::init([](bool mode) { return std::make_shared<AutoGradMode>(mode); }))
+      .def("__enter__", [](const AutoGradMode& no_grad_obj) {})
+      .def("__exit__", [](const AutoGradMode& no_grad_obj, const py::object& type,
                           const py::object& value, const py::object& traceback) {});
+  m.def("is_grad_enabled", &GradMode::is_enabled);
 }
 
 }  // namespace autograd
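
The binding above exposes a guard class whose constructor takes the desired mode, with empty `__enter__` and `__exit__` methods, plus an `is_grad_enabled` query. The commit message also mentions a `no_grad` decorator. The sketch below is a pure-Python stand-in showing how one object can serve as both a context manager and a decorator. It is not OneFlow's implementation: the thread-local flag, the `grad_mode` base class, and the explicit save and restore in `__enter__` and `__exit__` are all assumptions made for illustration.

```python
import functools
import threading

# Hypothetical stand-in for the per-thread grad mode kept on the C++ side.
_state = threading.local()


def is_grad_enabled() -> bool:
    return getattr(_state, "enabled", True)


class grad_mode:
    """Set the grad flag on entry and restore the previous value on exit."""

    def __init__(self, mode: bool):
        self.mode = mode
        self.prev = True

    def __enter__(self):
        self.prev = is_grad_enabled()
        _state.enabled = self.mode
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        _state.enabled = self.prev
        return False  # do not swallow exceptions

    def __call__(self, func):
        # Decorator form: wrap each call in a fresh guard.
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            with grad_mode(self.mode):
                return func(*args, **kwargs)

        return wrapper


class no_grad(grad_mode):
    def __init__(self):
        super().__init__(False)


@no_grad()
def evaluate():
    return is_grad_enabled()


with no_grad():
    inside = is_grad_enabled()
after = is_grad_enabled()
```

Here `evaluate()` and `inside` both observe the flag as disabled, while `after` sees it restored.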

diff --git a/oneflow/api/python/framework/tensor.cpp b/oneflow/api/python/framework/tensor.cpp
@@ -30,6 +30,7 @@ limitations under the License.
 #include "oneflow/core/framework/tensor_method.h"
 #include "oneflow/core/framework/device.h"
 #include "oneflow/core/framework/stride.h"
+#include "oneflow/core/framework/nd_sbp.h"
 #include "oneflow/core/framework/py_distribute.h"
 #include "oneflow/core/functional/value_types.h"
 #include "oneflow/core/job/placement.cfg.h"
@@ -299,9 +300,8 @@ Maybe<Tensor> NewTensor(py::args args, py::kwargs kwargs, Symbol<DType> desired_
       if (other_tensor->is_local()) {
         if (placement) {
           // LocalTensor -> ConsistentTensor
-          tensor = JUST(functional::ToConsistent(other_tensor, placement, sbp_tuple,
-                                                 /* identity_grad */ false,
-                                                 /* grad_sbp_parallels */ {}));
+          tensor =
+              JUST(functional::ToConsistent(other_tensor, placement, sbp_tuple, GetNoneSbpList()));
         } else {
           // LocalTensor -> LocalTensor
           if (!device) { device = JUST(Device::New("cpu")); }
@@ -310,9 +310,8 @@ Maybe<Tensor> NewTensor(py::args args, py::kwargs kwargs, Symbol<DType> desired_
       } else {
         if (placement) {
           // ConsistentTensor -> ConsistentTensor
-          tensor = JUST(functional::ToConsistent(other_tensor, placement, sbp_tuple,
-                                                 /* identity_grad */ false,
-                                                 /* grad_sbp_parallels */ {}));
+          tensor =
+              JUST(functional::ToConsistent(other_tensor, placement, sbp_tuple, GetNoneSbpList()));
         } else {
           // ConsistentTensor -> LocalTensor
           tensor = JUST(functional::ConsistentToLocal(other_tensor));

diff --git a/oneflow/api/python/symbol/placement_symbol.cpp b/oneflow/api/python/symbol/placement_symbol.cpp
@@ -40,53 +40,6 @@ Maybe<Shape> MakeShape(const py::tuple& py_shape) {
   return std::make_shared<Shape>(shape_dims);
 }
 
-std::string SerializePlacementSymbol2String(Symbol<ParallelDesc> placement) {
-  std::string device_type = placement->device_tag() == "gpu" ? "\"cuda\"" : "\"cpu\"";
-  std::vector<int64_t> sorted_node_ids;
-  HashMap<int64_t, std::vector<int64_t>> node_id2sorted_dev_phy_ids;
-  for (int64_t machine_id : placement->sorted_machine_ids()) {
-    int64_t node_id = GlobalProcessCtx::NodeId(machine_id);
-    if (!std::count(sorted_node_ids.begin(), sorted_node_ids.end(), node_id)) {
-      sorted_node_ids.push_back(node_id);
-    }
-    for (int64_t device_id : placement->sorted_dev_phy_ids(machine_id)) {
-      node_id2sorted_dev_phy_ids[node_id].push_back(device_id);
-    }
-  }
-  std::string machine_device_ids = "{";
-  int64_t node_idx = 0;
-  for (int64_t node_id : sorted_node_ids) {
-    std::string device_name = std::to_string(node_id) + " : [";
-    int64_t device_idx = 0;
-    for (int64_t device_id : node_id2sorted_dev_phy_ids.at(node_id)) {
-      device_name += std::to_string(device_id);
-      if (++device_idx != node_id2sorted_dev_phy_ids.at(node_id).size()) { device_name += ", "; }
-    }
-    device_name += "]";
-    if (++node_idx != sorted_node_ids.size()) { device_name += ", "; }
-    machine_device_ids += device_name;
-  }
-  machine_device_ids += "}";
-  std::string hierarchy = "(";
-  int32_t hierarchy_dim_idx = 0;
-  for (int64_t dim : placement->hierarchy()->dim_vec()) {
-    hierarchy += std::to_string(dim);
-    if (++hierarchy_dim_idx != placement->hierarchy()->dim_vec().size()) {
-      hierarchy += ", ";
-    } else if (placement->hierarchy()->dim_vec().size() == 1) {
-      hierarchy += ",";
-    }
-  }
-  hierarchy += ")";
-  std::string placement_str = "oneflow.placement(device_type=" + device_type
-                              + ", machine_device_ids=" + machine_device_ids
-                              + ", hierarchy=" + hierarchy + ")";
-  return placement_str;
-}
-
-auto* CachedSerializePlacementSymbol2String =
-    DECORATE(&SerializePlacementSymbol2String, ThreadLocal);
-
 struct PlacementSymbolExportUtil {
   static std::shared_ptr<ParallelDesc> ApiCreatePlacementSymbol(
       int64_t symbol_id, const std::shared_ptr<cfg::ParallelConf>& symbol_conf) {
@@ -207,7 +160,7 @@ struct PlacementSymbolExportUtil {
   }
 
   static std::string PlacementSymbol2String(Symbol<ParallelDesc> placement) {
-    return CachedSerializePlacementSymbol2String(placement);
+    return *PlacementToString(placement).GetPtrOrThrow();
   }
 
   static Maybe<Symbol<ParallelDesc>> ReplacePlacementDeviceTag(Symbol<ParallelDesc> parallel_desc,
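
For reference, the removed `SerializePlacementSymbol2String` built a string of the form `oneflow.placement(device_type=..., machine_device_ids={...}, hierarchy=(...))`. The Python sketch below mirrors that removed logic with plain arguments. The function name and parameters are hypothetical, and whether `PlacementToString` produces identical output is not shown in this diff.

```python
def placement_to_string(device_tag, node_to_device_ids, hierarchy):
    """Format a placement the way the removed C++ helper did (sketch only)."""
    # The removed code mapped the "gpu" tag to "cuda" and everything else to "cpu".
    device_type = '"cuda"' if device_tag == "gpu" else '"cpu"'
    nodes = []
    for node_id, device_ids in node_to_device_ids.items():
        ids = ", ".join(str(device_id) for device_id in device_ids)
        nodes.append(f"{node_id} : [{ids}]")
    machine_device_ids = "{" + ", ".join(nodes) + "}"
    dims = ", ".join(str(dim) for dim in hierarchy)
    if len(hierarchy) == 1:
        dims += ","  # single-element hierarchy is printed like a Python 1-tuple
    return (
        f"oneflow.placement(device_type={device_type}, "
        f"machine_device_ids={machine_device_ids}, hierarchy=({dims}))"
    )


print(placement_to_string("gpu", {0: [0, 1]}, (2,)))
print(placement_to_string("cpu", {0: [0, 1], 1: [0, 1]}, (2, 2)))
```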

diff --git a/oneflow/api/python/symbol/sbp_symbol.cpp b/oneflow/api/python/symbol/sbp_symbol.cpp
@@ -20,6 +20,7 @@ limitations under the License.
 #include "oneflow/core/common/constant.h"
 #include "oneflow/core/common/maybe.h"
 #include "oneflow/core/common/symbol.h"
+#include "oneflow/core/framework/nd_sbp.h"
 #include "oneflow/core/job/sbp_parallel.cfg.h"
 #include "oneflow/core/job/sbp_parallel.h"
 
@@ -30,17 +31,7 @@ namespace oneflow {
 namespace {
 
 std::string SbpParallelSymbolToString(const Symbol<cfg::SbpParallel>& sbp_sym) {
-  std::string sbp_str = "oneflow.sbp.";
-  if (sbp_sym->has_broadcast_parallel()) {
-    sbp_str += "broadcast";
-  } else if (sbp_sym->has_partial_sum_parallel()) {
-    sbp_str += "partial_sum";
-  } else if (sbp_sym->has_split_parallel()) {
-    sbp_str += "split(axis=" + std::to_string(sbp_sym->split_parallel().axis()) + ")";
-  } else {
-    UNIMPLEMENTED();
-  }
-  return sbp_str;
+  return *SbpToString(sbp_sym).GetPtrOrThrow();
 }
 
 Maybe<std::vector<Symbol<cfg::SbpParallel>>> MakeSplitSbpParallelList(int max_split_axis) {
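
The inline formatting removed from `SbpParallelSymbolToString` produced one of three strings depending on the SBP kind. A small Python sketch of that mapping follows. The function name and the string-valued `kind` argument are hypothetical, and the output of the `SbpToString` helper that replaces the inline code is not shown in this diff.

```python
def sbp_to_string(kind, axis=None):
    """Mirror the removed C++ branches for the three SBP kinds (sketch only)."""
    if kind == "broadcast":
        return "oneflow.sbp.broadcast"
    if kind == "partial_sum":
        return "oneflow.sbp.partial_sum"
    if kind == "split":
        return f"oneflow.sbp.split(axis={axis})"
    # The removed code called UNIMPLEMENTED() for any other case.
    raise NotImplementedError(kind)


print(sbp_to_string("split", axis=0))
```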