This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[2.0] Add cpp-package #20131

Merged
merged 48 commits into from
May 24, 2021

Conversation

barry-jin
Contributor

@barry-jin barry-jin commented Apr 6, 2021

Description

Migrate cpp-package to MXNet 2.0. Still a work in progress.

Checklist

Essentials

  • PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented

Changes

  • Adopt CachedOp API on graph executor
  • Adopt autograd API on backward pass
  • Port SoftmaxOutput operator to 2.0
  • Test in CI
  • The Xavier initializer results in vanishing gradients; switched to Uniform for now. Needs more investigation.
  • Add Documentation
  • Refactor OpWrapperGenerator.py to solve cross-compilation problems
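
For reference, the two initializers mentioned in the list above differ only in how the sampling bound is chosen. The following is a minimal sketch in plain Python, assuming the textbook Glorot (Xavier) uniform bound and a hypothetical fixed scale for Uniform. The layer shape and the scale are made-up values, the functions are illustrative rather than the cpp-package implementation, and the sketch does not explain the vanishing gradients reported here.

```python
import math
import random


def xavier_uniform_limit(fan_in: int, fan_out: int) -> float:
    """Textbook Glorot/Xavier uniform bound: sqrt(6 / (fan_in + fan_out))."""
    return math.sqrt(6.0 / (fan_in + fan_out))


def sample_uniform(count: int, limit: float, rng: random.Random) -> list:
    """Draw `count` weights uniformly from [-limit, limit]."""
    return [rng.uniform(-limit, limit) for _ in range(count)]


rng = random.Random(0)
fan_in, fan_out = 784, 128   # hypothetical layer shape (assumption)
fixed_limit = 0.07           # hypothetical fixed scale for Uniform (assumption)

# Xavier adapts the bound to the layer size; Uniform uses the same bound everywhere.
xavier_w = sample_uniform(1000, xavier_uniform_limit(fan_in, fan_out), rng)
uniform_w = sample_uniform(1000, fixed_limit, rng)

print("Xavier bound:", xavier_uniform_limit(fan_in, fan_out))
print("Uniform bound:", fixed_limit)
```

Comparing the two bounds for each layer of a model is one quick way to see how far the initial weight ranges diverge before digging into the training run itself.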

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

@mxnet-bot

Hey @barry-jin, thanks for submitting the PR.
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

  • To trigger all jobs: @mxnet-bot run ci [all]
  • To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [unix-gpu, edge, miscellaneous, centos-cpu, clang, website, sanity, windows-gpu, windows-cpu, centos-gpu, unix-cpu]


Note:
Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.

@lanking520 added the pr-work-in-progress label (PR is still work in progress) on Apr 6, 2021
@barry-jin
Contributor Author

@mxnet-bot run ci [sanity, unix-gpu]

@mxnet-bot

Jenkins CI successfully triggered : [sanity, unix-gpu]

@barry-jin
Contributor Author

@mxnet-bot run ci [sanity, unix-gpu]

@mxnet-bot

Jenkins CI successfully triggered : [sanity, unix-gpu]

@barry-jin
Contributor Author

@mxnet-bot run ci [unix-gpu]

@mxnet-bot

Jenkins CI successfully triggered : [unix-gpu]

@barry-jin barry-jin closed this Apr 16, 2021
@barry-jin barry-jin reopened this Apr 16, 2021
@mseth10 added the pr-work-in-progress label (PR is still work in progress) on May 11, 2021
@barry-jin
Contributor Author

@mxnet-bot run ci [centos-cpu]

@mxnet-bot

Jenkins CI successfully triggered : [centos-cpu]

@mseth10 added the pr-awaiting-testing and pr-awaiting-review labels and removed the pr-work-in-progress and pr-awaiting-testing labels on May 11, 2021
Contributor

@leezu leezu left a comment

Thank you @barry-jin! You're adding back a subset of the operators removed in #18531 ("These operators are often used with mx.module APIs. Removing them for mxnet 2.0. You can find equivalent loss functions in mx.gluon.loss namespace.") Why not add back all the Operators that are commonly used therein? This may also be worth an update in the MXNet 2 RFC #16167 as you propose to support the Symbol APIs in the CPP package. For example, this change in CPP package also means that there is no need to drop the other language bindings that were also removed in #18531

Comment on lines 63 to 67
# skippping temporarily, tracked by https://github.com/apache/incubator-mxnet/issues/20011
#cp ../../build/cpp-package/example/test_regress_label .
#./test_regress_label

# sh unittests/unit_test_mlp_csv.sh
Contributor

Have you tried running?

cpp-package/tests/travis/run_test.sh (outdated, resolved)
src/operator/softmax_output-inl.h (resolved)
@@ -0,0 +1,2 @@
# Rebuildable file(s)
op.h
Contributor

@leezu leezu May 11, 2021

Why is op.h placed inside the build dir? Usually we only support out of source build in cmake. This .gitignore may no longer be needed as Makefile in-source build is no longer supported.

Contributor Author

op.h is generated by OpWrapperGenerator.py from user side, so it should not be tracked by git. Should I put this .gitignore in the cpp-package directory, or merge it into incubator-mxnet/.gitignore?

Contributor

op.h is generated by OpWrapperGenerator.py from user side

OpWrapperGenerator.py is invoked by cmake inside the build folder ("out of source build"). Usually out-of-source build should not modify the code / files outside of the build folder.
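
The out-of-source rule described here can be sketched as a generator that only ever writes into a directory supplied by the build system. This is an illustrative Python sketch under stated assumptions: `write_generated_header`, `main`, and the `--out-dir` flag are hypothetical names, not the actual interface of OpWrapperGenerator.py.

```python
import argparse
from pathlib import Path


def write_generated_header(out_dir: Path, body: str, name: str = "op.h") -> Path:
    """Write a generated header into out_dir (expected to be inside the build tree)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / name
    target.write_text(body)
    return target


def main(argv=None) -> Path:
    # --out-dir is a hypothetical flag; the build system would pass a path
    # under its binary directory so the source tree is never modified.
    parser = argparse.ArgumentParser(description="Sketch of an out-of-source generator")
    parser.add_argument("--out-dir", required=True, type=Path)
    args = parser.parse_args(argv)
    return write_generated_header(args.out_dir, "// generated file, do not edit\n")
```

With this layout the generated op.h never appears in the source tree, so no .gitignore entry is needed for it; a call such as `main(["--out-dir", "build/cpp-package/include"])` (path hypothetical) keeps the output next to the other build artifacts.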

## Building C++ Package

The cpp-package directory contains the implementation of C++ API. As mentioned above, users are required to build this directory or package before using it.
**The cpp-package is built while building the MXNet shared library, *libmxnet.so*.**
Contributor

There is no reason for that. cpp-package depends only on libmxnet.so C APIs. It would be better to keep a separate CMakeLists.txt for the cpp-package with the only requirement to find libmxnet.so. This implies removing the USE_CPP_PACKAGE in the main CMakeLists.txt

Contributor Author

Thanks for pointing it out. Let me update the documentation.
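
The reviewer's point in this thread is that a standalone cpp-package build needs only one input: the location of libmxnet.so. In CMake this is typically the job of `find_library`; the Python sketch below shows the same lookup logic for illustration. The function name `find_libmxnet` and the candidate directories are assumptions, not part of the PR.

```python
from pathlib import Path
from typing import Iterable, Optional


def find_libmxnet(search_dirs: Iterable[str], lib_name: str = "libmxnet.so") -> Optional[Path]:
    """Return the first existing lib_name found in search_dirs, or None."""
    for directory in search_dirs:
        candidate = Path(directory) / lib_name
        if candidate.is_file():
            return candidate
    return None


# Hypothetical candidate locations; a real build would take these from the user.
candidates = ["build", "/usr/local/lib", "/usr/lib"]
location = find_libmxnet(candidates)
print(location if location is not None else "libmxnet.so not found")
```

If the lookup fails, a separate build can stop with a clear error instead of depending on a flag in the main MXNet build.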

ci/docker/runtime_functions.sh (resolved)
@mseth10 added the pr-awaiting-testing and pr-work-in-progress labels and removed the pr-awaiting-review and pr-awaiting-testing labels on May 13, 2021
@mseth10 added the pr-awaiting-testing and pr-work-in-progress labels and removed the pr-work-in-progress and pr-awaiting-testing labels on May 14, 2021
@barry-jin
Contributor Author

@mxnet-bot run ci [centos-gpu, unix-gpu]

@mxnet-bot

Jenkins CI successfully triggered : [centos-gpu, unix-gpu]

@mseth10 added the pr-awaiting-testing and pr-awaiting-review labels and removed the pr-work-in-progress and pr-awaiting-testing labels on May 17, 2021
Contributor

@leezu leezu left a comment

Thank you for adding the missing operators. I think the PR is good to merge now, and the other points can be addressed later.

Labels
pr-awaiting-review PR is waiting for code review
6 participants