This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Add AMP patching of npi ops in _api_internal module #19488

Merged: 3 commits merged into apache:master on Nov 19, 2020


mk-61
Contributor

mk-61 commented Nov 6, 2020

Description

Apparently, some NumPy ops are registered in the mxnet.ndarray.numpy._api_internal module in addition to *._internal. This PR implements AMP patching of the ops registered there as well.
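A minimal sketch of why the second module matters, assuming AMP patching amounts to replacing each listed op in a module's namespace with a wrapper that casts its inputs. The modules below are hypothetical stand-ins for mxnet.ndarray.numpy._internal and mxnet.ndarray.numpy._api_internal, and `tag_fp16` is a placeholder "cast", not MXNet's actual casting code. An op registered only in the second module keeps running unpatched until that module is added to the list of patched modules.

```python
import functools
import types


def make_cast_wrapper(fn, cast):
    """Wrap fn so every positional argument is passed through `cast` first."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        return fn(*(cast(a) for a in args), **kwargs)
    wrapper._amp_patched = True  # hypothetical marker so re-patching is a no-op
    return wrapper


def patch_ops(modules, op_names, cast):
    """Replace each listed op in every module that defines it; return what was patched."""
    patched = []
    for mod in modules:
        for name in op_names:
            fn = getattr(mod, name, None)
            if callable(fn) and not getattr(fn, "_amp_patched", False):
                setattr(mod, name, make_cast_wrapper(fn, cast))
                patched.append(f"{mod.__name__}.{name}")
    return patched


def tag_fp16(x):
    """Placeholder cast: tags floats instead of really converting them to float16."""
    return ("float16", x) if isinstance(x, float) else x


# Hypothetical stand-ins for the two registration modules.
internal = types.ModuleType("fake_internal")
api_internal = types.ModuleType("fake_api_internal")
internal.add = lambda a, b: (a, b)
api_internal.concatenate = lambda *xs: list(xs)
```

Patching only `internal` leaves `api_internal.concatenate` untouched; passing both modules to `patch_ops` covers it, and the marker keeps already-patched ops from being wrapped twice.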

Fixes #19463.

Checklist

Essentials

  • Changes are complete (i.e. I finished coding on this PR)

@ptrendx

@mk-61 mk-61 requested a review from szha as a code owner November 6, 2020 22:53
@mxnet-bot

Hey @mk-61, thanks for submitting the PR.
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

  • To trigger all jobs: @mxnet-bot run ci [all]
  • To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [unix-gpu, centos-cpu, windows-cpu, windows-gpu, centos-gpu, miscellaneous, website, clang, edge, unix-cpu, sanity]


Note:
Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.

@lanking520 lanking520 added pr-awaiting-testing PR is reviewed and waiting CI build and test pr-work-in-progress PR is still work in progress and removed pr-awaiting-testing PR is reviewed and waiting CI build and test labels Nov 6, 2020
@leezu
Contributor

leezu commented Nov 7, 2020

@mxnet-bot run ci [centos-cpu]

@mxnet-bot

Jenkins CI successfully triggered : [centos-cpu]

@lanking520 lanking520 added pr-awaiting-testing PR is reviewed and waiting CI build and test pr-awaiting-review PR is waiting for code review and removed pr-work-in-progress PR is still work in progress pr-awaiting-testing PR is reviewed and waiting CI build and test labels Nov 7, 2020
@sxjscience
Member

We may need to add the example in #19463 as a test case.

@lanking520 lanking520 added pr-awaiting-testing PR is reviewed and waiting CI build and test and removed pr-awaiting-review PR is waiting for code review labels Nov 9, 2020
@lanking520 lanking520 added pr-work-in-progress PR is still work in progress and removed pr-awaiting-testing PR is reviewed and waiting CI build and test labels Nov 10, 2020
@mk-61
Contributor Author

mk-61 commented Nov 10, 2020

@mxnet-bot run ci [centos-cpu, centos-gpu, unix-cpu, unix-gpu]

@mxnet-bot

Jenkins CI successfully triggered : [unix-cpu, unix-gpu, centos-cpu, centos-gpu]

@lanking520 lanking520 added pr-awaiting-testing PR is reviewed and waiting CI build and test pr-work-in-progress PR is still work in progress and removed pr-work-in-progress PR is still work in progress pr-awaiting-testing PR is reviewed and waiting CI build and test labels Nov 10, 2020
@mk-61
Contributor Author

mk-61 commented Nov 10, 2020

@mxnet-bot run ci [centos-gpu, unix-gpu]

@mxnet-bot

Jenkins CI successfully triggered : [unix-gpu, centos-gpu]

@lanking520 lanking520 added pr-awaiting-testing PR is reviewed and waiting CI build and test pr-work-in-progress PR is still work in progress and removed pr-work-in-progress PR is still work in progress pr-awaiting-testing PR is reviewed and waiting CI build and test labels Nov 10, 2020
@lanking520 lanking520 added pr-awaiting-testing PR is reviewed and waiting CI build and test pr-work-in-progress PR is still work in progress and removed pr-work-in-progress PR is still work in progress pr-awaiting-testing PR is reviewed and waiting CI build and test labels Nov 11, 2020
@lanking520 lanking520 added pr-awaiting-testing PR is reviewed and waiting CI build and test pr-work-in-progress PR is still work in progress and removed pr-work-in-progress PR is still work in progress pr-awaiting-testing PR is reviewed and waiting CI build and test labels Nov 12, 2020
@lanking520 lanking520 added pr-awaiting-testing PR is reviewed and waiting CI build and test pr-work-in-progress PR is still work in progress and removed pr-work-in-progress PR is still work in progress pr-awaiting-testing PR is reviewed and waiting CI build and test labels Nov 18, 2020
@TristonC
Contributor

@sandeep-krishnamurthy Could someone help with the CI test failure? The failure does not seem to be related to this PR.

@leezu
Contributor

leezu commented Nov 18, 2020

@TristonC @mk-61 I haven't seen this error before; I think it may have been introduced by this PR. Please try rebasing on master first. If the issue persists, we can be sure that it is due to this PR.

@mk-61
Contributor Author

mk-61 commented Nov 19, 2020

@leezu, yes, the failures do appear to be related to this PR, though in a way I don't fully understand. I tried running these tests locally, and it appears that running test_amp_init.py changes some global state, which causes some other tests to fail.

This is understandable in a way, since the test calls amp.init(), which performs Python monkey patching (that's how AMP works). That's why I put this test into a separate module. Apparently that is not enough, since a separate module is not fully isolated.

I can suggest the following fixes, from most to least desirable. For the first two, I'll need help from someone more familiar with pytest and how it's integrated into our CI:

  1. Run the tests in test_amp_init.py in a completely separate, isolated process.
  2. If we cannot run them fully isolated, run test_amp_init.py last.
  3. I can temporarily mark the new test as skipped (presumably it can still be run explicitly, and it does pass).

The last option is not ideal, since we expect to add more tests that require calling amp.init().
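Option 1 can be sketched with the standard library alone, assuming the isolated tests are started as a child interpreter. `CHILD_CODE` is a hypothetical stand-in for a test that monkey-patches a module the way amp.init() does; in CI the child would presumably be a separate `python -m pytest` invocation on the AMP test file instead. The patch is visible inside the child and disappears when the child exits.

```python
import math
import subprocess
import sys
import textwrap

# Hypothetical child "test": patches a stdlib function the way amp.init() patches ops.
CHILD_CODE = textwrap.dedent("""
    import math
    math.sqrt = lambda x: -1.0          # global patch, but only in this process
    assert math.sqrt(4) == -1.0
""")


def run_isolated(code):
    """Run `code` in a fresh interpreter and return its exit status."""
    result = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True
    )
    return result.returncode


if __name__ == "__main__":
    print("child exit status:", run_isolated(CHILD_CODE))
    print("parent still sees the real sqrt:", math.sqrt(4) == 2.0)
```

The trade-off is the start-up cost of a new interpreter per isolated group, in exchange for not needing any way to undo the patching.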

mx.npx.set_np(*flags)


@pytest.fixture(scope='module')
Member

Can we try to narrow the scope to function?

Contributor Author

It doesn't matter. The only reason I used scope='module' here is that amp.init() is called once per module, and even that doesn't matter much, since all subsequent calls are no-ops.
Regardless of the fixture scope declared in pytest, amp.init() cannot be undone, at least with the current API.

Contributor

Should we add a utility to disable AMP? If that is too much work to add quickly, you could add a separate pytest call for the AMP test files in runtime_functions.sh.

Member

In fact, AMP in PyTorch is implemented as a context manager (autocast).
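To make the suggestion concrete, a rough sketch of how a context-manager-style API could undo its own patching, unlike a one-way init call. `amp_scope` and the `ops` module are hypothetical names, not an existing MXNet API, and `int()` is only a placeholder for a lossy cast. The point is that the original functions are saved on entry and restored in a `finally` block on exit.

```python
import contextlib
import types


def _cast_inputs(fn, cast):
    """Return a wrapper that passes every positional argument through `cast`."""
    def wrapper(*args, **kwargs):
        return fn(*(cast(a) for a in args), **kwargs)
    return wrapper


@contextlib.contextmanager
def amp_scope(module, op_names, cast):
    """Hypothetical AMP scope: patch ops on entry, always restore them on exit."""
    originals = {name: getattr(module, name) for name in op_names}
    try:
        for name, fn in originals.items():
            setattr(module, name, _cast_inputs(fn, cast))
        yield module
    finally:
        for name, fn in originals.items():
            setattr(module, name, fn)


# Hypothetical op module; int() stands in for a lossy precision cast.
ops = types.ModuleType("ops")
ops.mul = lambda a, b: a * b
```

Because restoration happens in `finally`, the originals come back even if the body raises, which is what would let tests such as test_amp_init.py clean up after themselves.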

Contributor Author

I agree that a context manager is a nifty API for enabling/disabling AMP, but it would definitely require a separate, non-trivial change.
For now, to unblock this PR, I've moved the amp_init tests into a separate process.

Contributor

@mk-61 would you like to help create a tracking issue for that change? It sounds like something we should do before the stable 2.0 release.

Contributor Author

@leezu, for now I've just added an item to #18896, which I keep as a sort of TODO list for AMP. I can still create a separate issue if you prefer to track it that way.
About adding it before the stable 2.0 release: to be clear, it's not just a matter of exposing existing functionality via a new API. Also, I'd like to clarify our plans for #18697 first, since it affects pretty much everything AMP-related.
When do we expect the stable 2.0 release? We still have some other, higher-priority items to fix.
Finally, my hope was that if the new API is a strict superset of the existing one (i.e., we keep the amp.init() call but also add a way to turn it off), maybe it's acceptable to add it later? To put it differently, maybe adding new APIs is less of an issue as long as we are not removing anything?

Contributor

@leezu, for now I've just added an item to #18896, which I keep as a sort of TODO list for AMP. I can still create a separate issue if you prefer to track it that way.

That's fine, thank you.

About adding it before the stable 2.0 release: to be clear, it's not just a matter of exposing existing functionality via a new API. Also, I'd like to clarify our plans for #18697 first, since it affects pretty much everything AMP-related.

Using a graph pass may address the current global-state issue. A potential downside is that a graph pass will only work with hybrid models.
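For contrast, a rough sketch of the graph-pass idea, using a toy graph representation (a list of `Node` tuples) that is purely hypothetical and unrelated to MXNet's actual graph or pass APIs. The pass returns a new graph with cast nodes inserted in front of selected ops and never touches module-level state, which is why it sidesteps the global-state problem; it also only applies where a graph exists, mirroring the hybrid-model limitation.

```python
from typing import FrozenSet, List, NamedTuple, Tuple


class Node(NamedTuple):
    name: str
    op: str
    inputs: Tuple[str, ...]


def amp_cast_pass(graph: List[Node], fp16_ops: FrozenSet[str],
                  target: str = "float16") -> List[Node]:
    """Return a new graph with a cast node inserted before each input of fp16_ops."""
    out: List[Node] = []
    for node in graph:
        if node.op not in fp16_ops:
            out.append(node)  # untouched ops are reused as-is
            continue
        cast_inputs = []
        for inp in node.inputs:
            cast_name = f"{inp}_cast_for_{node.name}"
            out.append(Node(cast_name, f"cast[{target}]", (inp,)))
            cast_inputs.append(cast_name)
        # NamedTuple._replace builds a new node; the input graph is not mutated.
        out.append(node._replace(inputs=tuple(cast_inputs)))
    return out
```

The input graph is left unchanged, so applying or not applying the pass is a per-model decision rather than a process-wide one.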

When do we expect the stable 2.0 release? We still have some other, higher-priority items to fix.

We'd like to create an Alpha release soon. There is no date for a stable release yet.

Finally, my hope was that if the new API is a strict superset of the existing one (i.e., we keep the amp.init() call but also add a way to turn it off), maybe it's acceptable to add it later? To put it differently, maybe adding new APIs is less of an issue as long as we are not removing anything?

It's fine to change the API in 2.0, if we feel it should be improved.

@lanking520 lanking520 added pr-awaiting-testing PR is reviewed and waiting CI build and test and removed pr-work-in-progress PR is still work in progress labels Nov 19, 2020
@TristonC
Contributor

Thank you for your help, @leezu and @sxjscience.

Member

@sxjscience sxjscience left a comment

LGTM

@lanking520 lanking520 added pr-awaiting-merge Review and CI is complete. Ready to Merge and removed pr-awaiting-testing PR is reviewed and waiting CI build and test labels Nov 19, 2020
@leezu leezu merged commit 6648866 into apache:master Nov 19, 2020
Labels
pr-awaiting-merge Review and CI is complete. Ready to Merge
Development

Successfully merging this pull request may close these issues.

[Bug][AMP][2.0] AMP issue of the concatenate operator
6 participants