fix custom operation in fork #14451

arcadiaphy · 2019-03-16T19:36:10Z

Description

Fix #14396. Update pthread_atfork function to properly fork on custom operator.

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
Changes are complete (i.e. I finished coding on this PR)
All changes have test coverage:
Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
Code is well-documented:
For user-facing API changes, API doc string has been updated.
For new C++ functions in header files, their functionalities and arguments are documented.
For new examples, README.md is added to explain the what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
To the my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

Feature1, tests, (and when applicable, API doc)
Feature2, tests, (and when applicable, API doc)

Comments

If this change is a backward incompatible change, why must this change be made.
Interesting edge cases to note here

wkcn

Thank you for the quick fix! LGTM
I have two questions:
Does each process have independent threads?
Does the bug exist on Windows?

anirudh2290

Thanks for the fix. Great work @arcadiaphy ! Would it be possible to add a test similar to the reproducible example you provided in the issue: #14396 ?

arcadiaphy · 2019-03-17T02:57:19Z

@wkcn For the two questions:

Yes, each process has its independent threads. Fork only duplicates the caller thread, so we need to make sure all locking primitives are in valid states and re-create the threads in child process. The easiest way is to restart CustomOperator when fork happens just like Engine does.
There is no fork on windows, so python use spawn method to create new process. I have no windows machine so I can only test on linux and mac with:import multiprocessing as mp; mp.set_start_method('spawn') It seems like python re-import mxnet in child process when spawning, so the bug doesn't exist at all.

arcadiaphy · 2019-03-17T13:00:00Z

I find another bug, the custom operator hangs when having unfinished async operations.

import mxnet as mx

class AdditionOP(mx.operator.CustomOp):
    def __init__(self):
        super(AdditionOP, self).__init__()
    def forward(self, is_train, req, in_data, out_data, aux):
        out_data[0][:] = in_data[0] + in_data[1]
    def backward(self, req, out_grad, in_data, out_data, in_grad, aux):
        in_grad[0][:] = out_grad[0]
        in_grad[1][:] = out_grad[0]

@mx.operator.register("AdditionOP")
class AdditionOPProp(mx.operator.CustomOpProp):
    def __init__(self):
        super(AdditionOPProp, self).__init__()
    def list_arguments(self):
        return ['a', 'b']
    def list_outputs(self):
        return ['output']
    def infer_shape(self, in_shape):
        return in_shape, [in_shape[0]]
    def create_operator(self, ctx, shapes, dtypes):
        return AdditionOP()

def foo():
    a = mx.nd.array([1, 2, 3])
    b = mx.nd.array([4, 5, 6])
    c = mx.nd.Custom(a, b, op_type='AdditionOP')

def main():
    # program hangs in exit
    foo()
    # add following code, program hangs in forking
    # from multiprocessing import Process
    # p = Process(target=foo)
    # p.start()
    # p.join()

if __name__ == '__main__':
    main()

I test the script on ubuntu, it hangs when calling python CFUNCTYPE function in c++. If you add mx.nd.waitall() in foo to wait all async operations, the bug disappears.

It seems that calling python function from c++ when destructing static variable or forking may lead to problems.

anirudh2290 · 2019-03-17T14:37:49Z

@arcadiaphy waitall silently hides exceptions, it doesnt rethrow the exception yet. This should be fixed by #14397 . what happenswhen you add c.wait_to_read() in foo . does it still hang ?

arcadiaphy · 2019-03-17T14:49:59Z

@anirudh2290 No hangs after adding c.wait_to_read(). It only happens in calling CFUNCTYPE when destructing CustomOperator or running pre-fork Stop function.

karan6181 · 2019-03-17T15:17:34Z

@mxnet-label-bot add [Bug, Operator]

anirudh2290 · 2019-03-17T15:19:57Z

src/initialize.cc

    pthread_atfork(
      []() {
+        CustomOperator::Get()->Stop();
        Engine::Get()->Stop();


can we call Engine stop before customop stop. This will ensure that all ops pushed to engine have executed before stopping engine.

I take that back. Maybe that won't work since Stopping engine before custom op means there may be ops that havent been pushed to engine by custom op but it has stopped. Another option is to add waitall after the worker joined in CustomOp stop to make sure all pushed ops have completed.

The second option doesn't work either. When it goes into Stop function, the custom worker thread has already frozen on python function, leading to the blocking of worker join. I cannot think of a simple and graceful way to complete or ignore the unfinished custom operation other than manually add waiting command in python code.

do you know why it is frozen on worker join ? The worker should just push the operator to engine and return.

It's very easy to understand the first situation. The destruction of static variable on engine exit happens when python interpreter has been shutdown, so something bad happens when you call CFUNCTYPE function at that time.
I push a new commit to move custom Stop function to MXNotifyShutdown, forcing custom function called before python shutdown, so the first situation is all right.

The second situation of forking is so mysterious, I don't have a clue now. The custom Stop function is registered in prepare fork handler, everything should be OK at that time, but the worker thread just freezes.

arcadiaphy · 2019-03-18T08:57:01Z

I write a simple example to show the program freeze, which comes from calling CFUNCTYPE in thread when preparing fork.
fork example code

#include <iostream>
#include <thread>
#include <pthread.h>
#include "./lib.h"

callback *fun_python;

void set_python_callback(callback *func) {
  fun_python = func;
}

class Foo {
public:
  Foo() {
    pthread_atfork([]() {
      // call fun_python successfully
      std::cout << fun_python(5) << std::endl;
      // call fun_python failed in thread
      std::thread t([]() { std::cout << fun_python(10) << std::endl;});
      t.join();
    }, NULL, NULL);
  }
};

static Foo f;

YutingZhang · 2019-03-18T22:43:35Z

@arcadiaphy Does this also related to https://discuss.mxnet.io/t/custom-operator-cause-stuck-in-multicard-train/1452
Basically, a CustomOP could randomly get stuck in a multi-GPU setting. I also observed the same problem in my current experiment.

anirudh2290 · 2019-03-18T22:53:00Z

I think this may be related to https://docs.python.org/3/c-api/init.html#non-python-created-threads .
Quoting from the doc:

If you need to call Python code from these threads (often this will be part of a callback API provided by the aforementioned third-party library), you must first register these threads with the interpreter by creating a thread state data structure, then acquiring the GIL, and finally storing their thread state pointer, before you can start using the Python/C API. When you are done, you should reset the thread state pointer, release the GIL, and finally free the thread state data structure.

We don't have any such handling for callbacks in custom ops.

arcadiaphy · 2019-03-19T03:13:50Z

@anirudh2290 The program blocks because of GIL. According to the following docs in your link, if fork locks GIL, then acquiring GIL in another thread and join them will cause deadlock.

Another important thing to note about threads is their behaviour in the face of the C fork() call. On most systems with fork(), after a process forks only the thread that issued the fork will exist. That also means any locks held by other threads will never be released. Python solves this for os.fork() by acquiring the locks it uses internally before the fork, and releasing them afterwards. In addition, it resets any Lock Objects in the child. When extending or embedding Python, there is no way to inform Python of additional (non-Python) locks that need to be acquired before or reset after a fork. OS facilities such as pthread_atfork() would need to be used to accomplish the same thing. Additionally, when extending or embedding Python, calling fork() directly rather than through os.fork() (and returning to or calling into Python) may result in a deadlock by one of Python’s internal locks being held by a thread that is defunct after the fork. PyOS_AfterFork_Child() tries to reset the necessary locks, but is not always able to.

Additional handling should be added to correctly run callbacks:

#include <iostream>
#include <thread>
#include <pthread.h>
#include <Python.h>
#include "./lib.h"

callback *fun_python;

void set_python_callback(callback *func) {
  fun_python = func;
}

class Foo {
public:
  Foo() {
    pthread_atfork([]() {
      // call fun_python successfully
      std::cout << fun_python(5) << std::endl;
      // call fun_python failed in thread
      Py_BEGIN_ALLOW_THREADS
      std::thread t([]() { std::cout << fun_python(10) << std::endl;});
      t.join();
      Py_END_ALLOW_THREADS
    }, NULL, NULL);
  }
};

static Foo f;

But adding this will make mxnet link against libpython, do we really want to do this?

arcadiaphy · 2019-03-19T03:54:57Z

@YutingZhang The problem you mentioned may be related to GIL, correctly handling GIL in c++ extension is a tricky problem, easy to cause deadlock. Now we have no such handling at all.

anirudh2290 · 2019-03-19T06:52:09Z

@arcadiaphy yes I don't see other option apart from depending on libpython if we have to fix these issues w.r.t custom op. What do others think ?

arcadiaphy · 2019-03-19T07:09:09Z

@anirudh2290 If we don't want to depend on libpython, then this partial fix will work fine if you don't try to fork when having unfinished custom operation.

anirudh2290 · 2019-03-19T21:14:23Z

@arcadiaphy I think we can move forward with this PR as this is an improvement from what we have today. Can you document this as a limitation of using custom op with forking here: http://mxnet.incubator.apache.org/versions/master/tutorials/gluon/customop.html ,
http://mxnet.incubator.apache.org/versions/master/faq/new_op.html. I think custom op may cause issues for multi-gpu scenario because of not handling the callbacks correctly, but I haven't found a reproducible step yet. Once found I will open an issue.

arcadiaphy · 2019-03-20T04:26:11Z

@anirudh2290 Add some docs on custom op.

arcadiaphy · 2019-03-20T08:42:01Z

@anirudh2290 Docs updated, everything OK now.

* fix custom operation in fork * add test * fix custom stop * swap order * add docs * update doc

fix custom operation in fork

bbc5414

wkcn approved these changes Mar 17, 2019

View reviewed changes

anirudh2290 reviewed Mar 17, 2019

View reviewed changes

arcadiaphy force-pushed the pr_custom branch from b973f8d to bbc5414 Compare March 17, 2019 06:39

arcadiaphy force-pushed the pr_custom branch 2 times, most recently from c7871d6 to 5bd800c Compare March 17, 2019 13:57

add test

5bd800c

marcoabreu added Bug Operator labels Mar 17, 2019

anirudh2290 reviewed Mar 17, 2019

View reviewed changes

arcadiaphy added 2 commits March 18, 2019 01:36

fix custom stop

8539e90

swap order

69aebdf

arcadiaphy requested a review from szha as a code owner March 20, 2019 04:21

add docs

8951444

arcadiaphy force-pushed the pr_custom branch from 4142999 to 8951444 Compare March 20, 2019 04:29

update doc

3edd5ad

anirudh2290 approved these changes Mar 20, 2019

View reviewed changes

anirudh2290 merged commit 4b1811c into apache:master Mar 20, 2019

arcadiaphy deleted the pr_custom branch March 20, 2019 16:25

YutingZhang mentioned this pull request Mar 26, 2019

mx.nd.Custom conflicts with memory management #14522

Open

vdantu pushed a commit to vdantu/incubator-mxnet that referenced this pull request Mar 31, 2019

fix custom operation in fork (apache#14451)

8b81479

* fix custom operation in fork * add test * fix custom stop * swap order * add docs * update doc

ZhennanQin pushed a commit to ZhennanQin/incubator-mxnet that referenced this pull request Apr 3, 2019

fix custom operation in fork (apache#14451)

5f3a0ae

* fix custom operation in fork * add test * fix custom stop * swap order * add docs * update doc

nswamy pushed a commit that referenced this pull request Apr 5, 2019

fix custom operation in fork (#14451)

4b256aa

* fix custom operation in fork * add test * fix custom stop * swap order * add docs * update doc

arcadiaphy mentioned this pull request Apr 21, 2019

fix custom op fork test #14753

Merged

7 tasks

haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this pull request Jun 23, 2019

fix custom operation in fork (apache#14451)

0d543eb

* fix custom operation in fork * add test * fix custom stop * swap order * add docs * update doc

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix custom operation in fork #14451

fix custom operation in fork #14451

arcadiaphy commented Mar 16, 2019 •

edited

wkcn left a comment

anirudh2290 left a comment

arcadiaphy commented Mar 17, 2019 •

edited

arcadiaphy commented Mar 17, 2019 •

edited

anirudh2290 commented Mar 17, 2019

arcadiaphy commented Mar 17, 2019

karan6181 commented Mar 17, 2019

anirudh2290 Mar 17, 2019

anirudh2290 Mar 17, 2019

arcadiaphy Mar 17, 2019 •

edited

anirudh2290 Mar 17, 2019

arcadiaphy Mar 17, 2019

arcadiaphy Mar 17, 2019

arcadiaphy commented Mar 18, 2019 •

edited

YutingZhang commented Mar 18, 2019

anirudh2290 commented Mar 18, 2019

arcadiaphy commented Mar 19, 2019 •

edited

arcadiaphy commented Mar 19, 2019 •

edited

anirudh2290 commented Mar 19, 2019

arcadiaphy commented Mar 19, 2019

anirudh2290 commented Mar 19, 2019

arcadiaphy commented Mar 20, 2019

arcadiaphy commented Mar 20, 2019 •

edited

fix custom operation in fork #14451

fix custom operation in fork #14451

Conversation

arcadiaphy commented Mar 16, 2019 • edited

Description

Checklist

Essentials

Changes

Comments

wkcn left a comment

Choose a reason for hiding this comment

anirudh2290 left a comment

Choose a reason for hiding this comment

arcadiaphy commented Mar 17, 2019 • edited

arcadiaphy commented Mar 17, 2019 • edited

anirudh2290 commented Mar 17, 2019

arcadiaphy commented Mar 17, 2019

karan6181 commented Mar 17, 2019

anirudh2290 Mar 17, 2019

Choose a reason for hiding this comment

anirudh2290 Mar 17, 2019

Choose a reason for hiding this comment

arcadiaphy Mar 17, 2019 • edited

Choose a reason for hiding this comment

anirudh2290 Mar 17, 2019

Choose a reason for hiding this comment

arcadiaphy Mar 17, 2019

Choose a reason for hiding this comment

arcadiaphy Mar 17, 2019

Choose a reason for hiding this comment

arcadiaphy commented Mar 18, 2019 • edited

YutingZhang commented Mar 18, 2019

anirudh2290 commented Mar 18, 2019

arcadiaphy commented Mar 19, 2019 • edited

arcadiaphy commented Mar 19, 2019 • edited

anirudh2290 commented Mar 19, 2019

arcadiaphy commented Mar 19, 2019

anirudh2290 commented Mar 19, 2019

arcadiaphy commented Mar 20, 2019

arcadiaphy commented Mar 20, 2019 • edited

arcadiaphy commented Mar 16, 2019 •

edited

arcadiaphy commented Mar 17, 2019 •

edited

arcadiaphy commented Mar 17, 2019 •

edited

arcadiaphy Mar 17, 2019 •

edited

arcadiaphy commented Mar 18, 2019 •

edited

arcadiaphy commented Mar 19, 2019 •

edited

arcadiaphy commented Mar 19, 2019 •

edited

arcadiaphy commented Mar 20, 2019 •

edited