This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Enable Large Tensor Support: Stage 1 #18625

Merged (8 commits) on Nov 17, 2020

access2rohit
Contributor

@access2rohit access2rohit commented Jun 27, 2020

Description

This PR enables Large Tensor Support (LTS) by default on master for all platforms, in the CI stages and in the build scripts for both dynamic and static builds (make as well as ninja). The exceptions are the miscellaneous Clang build and the UNIX MKL BLAS builds with LAPACK.
DO NOT MERGE BEFORE THIS PR: #17882

Progress for Large Tensor Support tracked here: #17331
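For context, a minimal sketch of why large tensors need a wider index type, assuming that LTS amounts to using 64-bit rather than 32-bit integers for element counts and indices. The helper names `element_count` and `fits_in_int32` are hypothetical and are not taken from the MXNet code base.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical helper: total number of elements in a tensor of the given
// shape, accumulated in 64 bits so the product cannot wrap at 2^31.
inline int64_t element_count(const std::vector<int64_t>& shape) {
  int64_t n = 1;
  for (int64_t d : shape) {
    assert(d >= 0);  // dimensions are assumed to be non-negative
    n *= d;
  }
  return n;
}

// Hypothetical helper: true if every flat index of a tensor with n elements
// can be represented by a 32-bit signed integer.
inline bool fits_in_int32(int64_t n) {
  return n <= std::numeric_limits<int32_t>::max();
}
```

Under these assumptions, a 65536 x 65536 tensor has 2^32 elements, so `fits_in_int32` returns false for it and a 32-bit index could not address every element.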

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
      • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
      • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
      • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

@leezu @ChaiBapchya @josephevans

@mxnet-bot

Hey @access2rohit, thanks for submitting the PR.
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

  • To trigger all jobs: @mxnet-bot run ci [all]
  • To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [sanity, miscellaneous, edge, centos-cpu, windows-gpu, windows-cpu, clang, unix-cpu, website, centos-gpu, unix-gpu]


Note:
Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.

@access2rohit
Contributor Author

@leezu @ChaiBapchya @josephevans Please review

@access2rohit
Contributor Author

@mxnet-label-bot add [pr-awaiting-review]

ci/build_windows.py (outdated, resolved)
ci/docker/runtime_functions.sh (outdated, resolved)
Contributor

@ChaiBapchya ChaiBapchya left a comment

If it's ready to merge, let's rename the PR title.
Looks good except for the MKL_IF_AVAILABLE switch.

ci/docker/runtime_functions.sh (outdated, resolved)
config/distribution/linux_cpu.cmake (outdated, resolved)
config/distribution/linux_native.cmake (outdated, resolved)
@@ -53,7 +53,8 @@ struct polyval_backward_p {
 DType igrad_p = 0;
 index_t j = x_size - 1;
 while (j >= 0) {
-igrad_p += pow(x_dptr[j], p_size - i - 1) * ograd_dptr[j];
+igrad_p += pow(x_dptr[j], static_cast<DType>(p_size) -
Contributor

Just to confirm, is there unit test coverage for this change?

Contributor Author

This prevents a Windows CI build failure.
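For readers following this thread, here is a minimal sketch of the loop under discussion. It assumes that `index_t` is a 64-bit integer when LTS is enabled, and that the change is meant to avoid calling `pow` with a floating-point base and a 64-bit integer exponent; that is one plausible reading, not something the thread confirms. The function name and signature are hypothetical, and the exponent expression is an illustration rather than the exact line from the PR, which is truncated in the hunk above.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

using index_t = int64_t;  // assumption: 64-bit index type with LTS enabled

// Hypothetical stand-alone version of the accumulation: the gradient with
// respect to coefficient i of a polynomial with p_size coefficients,
// evaluated at the x_size points in x_dptr.
template <typename DType>
DType polyval_grad_coeff_sketch(const DType* x_dptr, const DType* ograd_dptr,
                                index_t x_size, index_t p_size, index_t i) {
  assert(i >= 0 && i < p_size);
  DType igrad_p = 0;
  // Cast the integer exponent to DType once, so both arguments of std::pow
  // have the same floating-point type.
  const DType exponent = static_cast<DType>(p_size - i - 1);
  for (index_t j = x_size - 1; j >= 0; --j) {
    igrad_p += std::pow(x_dptr[j], exponent) * ograd_dptr[j];
  }
  return igrad_p;
}
```

With the base and the exponent sharing one type, overload resolution for `pow` no longer depends on how a given compiler treats a mixed floating-point and 64-bit integer call.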

@access2rohit access2rohit force-pushed the lts_enable_stage1 branch 3 times, most recently from bf5a241 to 1b4e4b3, on July 2, 2020 12:28
@access2rohit access2rohit changed the title from "[DO NOT MERGE] Enable Large Tensor Support : Stage1" to "[WIP] Enable Large Tensor Support" on Jul 2, 2020
@access2rohit access2rohit changed the title from "[WIP] Enable Large Tensor Support" to "[WIP] Enable Large Tensor Support: Stage 1" on Jul 3, 2020
@access2rohit access2rohit changed the title from "[WIP] Enable Large Tensor Support: Stage 1" to "Enable Large Tensor Support: Stage 1" on Jul 3, 2020
@lanking520 lanking520 added the labels pr-awaiting-testing (PR is reviewed and waiting CI build and test) and pr-work-in-progress (PR is still work in progress), and removed the labels pr-work-in-progress and pr-awaiting-testing, on Nov 13, 2020
@lanking520 lanking520 added the label pr-awaiting-testing and removed the label pr-work-in-progress on Nov 16, 2020
@lanking520 lanking520 added the labels pr-work-in-progress and pr-awaiting-testing, and removed the labels pr-awaiting-testing and pr-work-in-progress, on Nov 16, 2020
@lanking520 lanking520 added the labels pr-awaiting-testing and pr-awaiting-review (PR is waiting for code review), and removed the labels pr-work-in-progress and pr-awaiting-testing, on Nov 17, 2020
@Zha0q1
Contributor

Zha0q1 commented Nov 17, 2020

@leezu, would you review?

@leezu leezu merged commit fcfef81 into apache:master Nov 17, 2020
Labels
pr-awaiting-review PR is waiting for code review
6 participants