Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[1.x] Backporting #18779 to v1.x #18894

Merged
merged 11 commits into from Aug 18, 2020
Merged

Conversation

samskalicky
Copy link
Contributor

Description

Backporting #18779 to v1.x

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain the what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
  • Check the API doc at https://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Feature1, tests, (and when applicable, API doc)
  • Feature2, tests, (and when applicable, API doc)

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

@mxnet-bot
Copy link

Hey @samskalicky , Thanks for submitting the PR
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

  • To trigger all jobs: @mxnet-bot run ci [all]
  • To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [unix-gpu, website, centos-cpu, clang, miscellaneous, unix-cpu, windows-cpu, edge, centos-gpu, windows-gpu, sanity]


Note:
Only following 3 categories can trigger CI :PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.

@HahTK
Copy link

HahTK commented Aug 10, 2020

Large scale cloud end users would need this feature in 1.8.
I propose we add tests for this feature for all 1.x APIs as part of this backport

@HahTK
Copy link

HahTK commented Aug 10, 2020

What is the timeline for this fix and MXNet 1.8 ?

Support additional inputs to custom subgraph ops that are not direct dependencies to ops in the subgraph. This will enable various use cases: custom control flow ops, custom ops that maintain a state that should be saved/loaded, etc.

Highlights:

* Added test that uses a graph pass (addInputPass) to add a new custom input to the subgraph op

* Added new optional argument (clear) to hybridize & optimize_for APIs in Gluon Block to enable multiple optimizations

* refactored lib_api.h JSON utilities

* added new Graph data structure utilities to simplify custom graph passes

* refactored custom op registration

* enhanced custom subgraph op to support additional inputs to subgraph op that is not an input to ops in the subgraph

* updated subgraph & graph pass READMEs

* Added error messaging from external library
@samskalicky
Copy link
Contributor Author

Large scale cloud end users would need this feature in 1.8.
I propose we add tests for this feature for all 1.x APIs as part of this backport

Hi @HahTK the PR on master was merged, and I cherry picked the commit into this branch. Notice that I left all the previous 1.x API tests (that were removed in the PR on master).

What is the timeline for this fix and MXNet 1.8 ?

merging this PR should be quick, get the CI to pass, one review from someone to check for missing things in 1.x and it should be good to go.

@samskalicky
Copy link
Contributor Author

I keep getting this weird Clojure failure:

[2020-08-16T17:41:29.383Z] lein test :only bert.bert-sentence-classification-test/train-test
[2020-08-16T17:41:29.383Z] 
[2020-08-16T17:41:29.383Z] FAIL in (train-test) (bert_sentence_classification_test.clj:90)
[2020-08-16T17:41:29.383Z] accuracy
[2020-08-16T17:41:29.383Z] expected: (< 0.5 (last (m/score model {:eval-data train-data, :eval-metric (eval-metric/accuracy)})))
[2020-08-16T17:41:29.383Z]   actual: (not (< 0.5 ##NaN))
[2020-08-16T17:41:29.383Z] WARN  org.apache.mxnet.DataDesc: Found Undefined Layout, will use default index 0 for batch axis

I did a rebase, but not sure why this is failing. None of my changes affect internal operators...

@szha
Copy link
Member

szha commented Aug 16, 2020

You need the link fix from 7a84fe1

@samskalicky
Copy link
Contributor Author

You need the link fix from 7a84fe1

Thanks! I created #18945 for the v1.x branch

@samskalicky
Copy link
Contributor Author

@HahTK @Kh4L Can you take a quick peek to make sure this PR looks good on 1.x branch? I had to make some tweaks

Copy link
Contributor

@Kh4L Kh4L left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! with some nit

@@ -159,7 +161,7 @@ MXReturnValue inferShape(const std::unordered_map<std::string, std::string>& att
unsigned kk = inshapes->at(1)[0];
unsigned m = inshapes->at(1)[1];
if (k != kk) {
std::cout << "Exected first input axis 1 equals to second input axis 0" << std::endl;
MX_ERROR_MSG << "Exected first input axis 1 equals to second input axis 0";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
MX_ERROR_MSG << "Exected first input axis 1 equals to second input axis 0";
MX_ERROR_MSG << "Expected first input axis 1 equals to second input axis 0";

<< " Found output data type:" << outputs->at(0).dtype << std::endl;
MX_ERROR_MSG << "Error! Expected all inputs and outputs to be the same type."
<< "Found input storage type:" << inputs->at(0).stype
<< " Found output storage type:" << outputs->at(0).stype
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unwanted whitespaces?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! with some nit

Thanks! will add these todos to my next PR

@samskalicky samskalicky merged commit d1ac7c8 into apache:v1.x Aug 18, 2020
@szha szha added this to the v1.8.0 milestone Aug 23, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants