
Added PyForwardFromTo and PyBackwardFromTo to Net, for releasing GIL … #4360

Closed

Conversation

alessandroferrari

…in pycaffe. Added ScopedGILRelease for easy GIL release. Modified _caffe.cpp in pycaffe accordingly.

@seanbell

seanbell commented Jun 26, 2016

The tests failed due to style issues -- please run make lint.

@alessandroferrari
Author

Multithreading in Python is hamstrung by the GIL. Many users (myself included) run net predictions through pycaffe, often from multithreaded applications. The pycaffe interface does not release the GIL, which means that while Caffe is computing a prediction, every other Python thread is stuck. Since predictions are time consuming, this can slow the application down badly. With this pull request, the GIL is released for the forward and backward passes of the network; within a pass it is reacquired only temporarily, for Python layers, which must hold the GIL.
Thanks to this, the Python interface can run predictions of different networks in parallel on multiple threads within the same process, while maintaining complete thread safety. Before, everything was serialized by the GIL (false sharing).
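
To illustrate the intended usage (an editorial sketch, not code from this PR; the model files are hypothetical), two nets predicting on separate Python threads only actually overlap once forward() releases the GIL:

import threading
import caffe

caffe.set_mode_cpu()

# Hypothetical model files, for illustration only.
nets = [caffe.Net('net_a.prototxt', 'net_a.caffemodel', caffe.TEST),
        caffe.Net('net_b.prototxt', 'net_b.caffemodel', caffe.TEST)]

def predict(net):
    # With the GIL released inside ForwardFromTo, this call no longer
    # blocks the other Python threads of the process.
    net.forward()

threads = [threading.Thread(target=predict, args=(n,)) for n in nets]
for t in threads:
    t.start()
for t in threads:
    t.join()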

Dtype PyForwardFromTo(int start, int end) {
  // Release GIL
  m_thread_state = PyEval_SaveThread();
@ajtulloch
Contributor

ajtulloch commented Jun 29, 2016

This should be an RAII wrapper around Save/Restore thread for exception safety.

This definitely isn't the right place to put this logic though, FWIW.

@alessandroferrari
Author

RAII wrapper: thank you for the suggestion. I will take a look and fix it.

Not the right place: I agree. However, where would you suggest placing it? (Consider that you have to reacquire the GIL whenever you forward/backward a Python layer within the net.) Even though this solution inserts "Pythonish" code into the C++ logic, not releasing the GIL in pycaffe is a painful bug for the Python interface. If there is a cleaner alternative I would love to discuss it.

@ajtulloch
Contributor

Why not just simply:

a) Release the GIL in PyCaffe
b) Reacquire the GIL in PythonLayer before running the function in that layer?

@alessandroferrari
Author

It would be cleaner, and it was my first thought. However, two things stopped me:

  • PyEval_SaveThread returns the thread state of the current thread; you would somehow have to pass this state from PyCaffe to the PythonLayer without relying on Net (otherwise it would be the same);
  • I suspect that in the Python layer, Boost.Python does some magic that requires the GIL for passing the Python object arguments to the C++ code, since I get a segfault if I acquire the GIL in PythonLayer right before calling the inner forward. The GIL needs to be acquired before calling the forward method of Python layers.

Thus, even if releasing the GIL within the Net is not clean, since Pythonish code is inserted into the C++ logic, it looked like the only viable solution. But if you have any ideas on how to circumvent these problems, I will be happy to give them a try.

@crowsonkb
Contributor

I use the multiprocessing package instead of threading, both to work around the GIL and to parallelize PyCaffe over multiple GPUs. My main Python interpreter spawns several other interpreters, each of which has one instance of PyCaffe.
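
For reference, a minimal sketch of that multiprocessing pattern (editorial, not from the thread; the model files and GPU count are hypothetical):

import multiprocessing as mp

def worker(gpu_id):
    # Import caffe inside the child so each process gets its own interpreter,
    # its own GIL, and its own PyCaffe instance.
    import caffe
    caffe.set_device(gpu_id)
    caffe.set_mode_gpu()
    # Hypothetical model files, for illustration only.
    net = caffe.Net('deploy.prototxt', 'weights.caffemodel', caffe.TEST)
    net.forward()

if __name__ == '__main__':
    procs = [mp.Process(target=worker, args=(gpu,)) for gpu in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()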

@alessandroferrari
Author

Of course multiprocessing does not have GIL synchronization problems.
Still, multiprocessing comes with an inter-process communication cost. There are many applications in production environments where multithreading is beneficial, such as model ensembling and performance-critical apps. NumPy, OpenCV, and the scikits all release the GIL.

@@ -593,19 +597,19 @@ $(TEST_ALL_BIN): $(TEST_MAIN_SRC) $(TEST_OBJS) $(GTEST_OBJ) \
 	| $(DYNAMIC_NAME) $(TEST_BIN_DIR)
 	@ echo CXX/LD -o $@ $<
 	$(Q)$(CXX) $(TEST_MAIN_SRC) $(TEST_OBJS) $(GTEST_OBJ) \
-		-o $@ $(LINKFLAGS) $(LDFLAGS) -l$(LIBRARY_NAME) -Wl,-rpath,$(ORIGIN)/../lib
+		-o $@ $(LINKFLAGS) -l$(LIBRARY_NAME) $(LDFLAGS) -Wl,-rpath,$(ORIGIN)/../lib


Why is this change necessary?

@seanbell

seanbell commented Jul 30, 2016

Thanks @alessandroferrari for the great PR, and I agree that the GIL should be released during forward/backward. This enables pre-fetching with multi-threading (not multi-processing).

Can you please squash this into a single commit?

@shelhamer @longjon I think this is an important feature when using Python data layers. Do you have any thoughts?
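
As a sketch of the prefetching pattern this enables (editorial, not from the thread; shapes and file names are hypothetical), a pure-Python producer thread can keep preparing batches while forward() runs in C++ with the GIL released:

import queue
import threading
import numpy as np
import caffe

batches = queue.Queue(maxsize=4)

def prefetch():
    # Decoding/augmentation in Python can now overlap with the C++ forward pass.
    while True:
        batches.put(np.random.rand(1, 3, 224, 224).astype(np.float32))

threading.Thread(target=prefetch, daemon=True).start()

# Hypothetical model files, for illustration only.
net = caffe.Net('deploy.prototxt', 'weights.caffemodel', caffe.TEST)
for _ in range(10):
    net.blobs['data'].data[...] = batches.get()
    net.forward()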

@@ -308,6 +333,9 @@ class Net {
   /// The root net that actually holds the shared layers in data parallelism
   const Net* const root_net_;
   DISABLE_COPY_AND_ASSIGN(Net);
+
+  // For releasing/reacquiring GIL with pycaffe
+  shared_ptr<ScopedGILRelease> scoped_gil_release;

This should be scoped_gil_release_ to be consistent with other naming.

@cypof
Member

cypof commented Jan 17, 2017

Each net can run in its own fork now; there is an example in /python/train.py. #4563

@cypof closed this Jan 17, 2017
@willyd
Contributor

willyd commented Mar 30, 2017

> Thanks @alessandroferrari for the great PR, and I agree that the GIL should be released during forward/backward. This enables pre-fetching with multi-threading (not multi-processing).

I think that even with the new multiprocessing parallel training, releasing the GIL is still relevant for the above use case. I get better data throughput by using the threading module and releasing the GIL in Net::ForwardFromTo and Net::BackwardFromTo than by using the multiprocessing module, for single-GPU training on Windows. Of course, we have to reacquire the GIL before calling any Python layer functions such as setup, reshape, etc.

@willyd
Contributor

willyd commented Mar 30, 2017

> PyEval_SaveThread returns the thread state of the current thread; you would somehow have to pass this state from PyCaffe to the PythonLayer without relying on Net (otherwise it would be the same);
> I suspect that in the Python layer, Boost.Python does some magic that requires the GIL for passing the Python object arguments to the C++ code, since I get a segfault if I acquire the GIL in PythonLayer right before calling the inner forward. The GIL needs to be acquired before calling the forward method of Python layers.
>
> Thus, even if releasing the GIL within the Net is not clean, since Pythonish code is inserted into the C++ logic, it looked like the only viable solution. But if you have any ideas on how to circumvent these problems, I will be happy to give them a try.

@alessandroferrari I am not sure I completely understand what you are saying here. Have you tried something like:

Using:

// RAII guard: releases the GIL on construction and restores it on destruction,
// so the GIL is reacquired even if an exception propagates out of the pass.
class ReleasePyGIL {
 public:
  ReleasePyGIL() { state = PyEval_SaveThread(); }
  ~ReleasePyGIL() { PyEval_RestoreThread(state); }
 private:
  PyThreadState* state;
};

inside ForwardFromTo and BackwardFromTo and

// RAII guard: reacquires the GIL while a Python layer callback runs.
class AcquirePyGIL {
 public:
  AcquirePyGIL() { state = PyGILState_Ensure(); }
  ~AcquirePyGIL() { PyGILState_Release(state); }
 private:
  PyGILState_STATE state;
};

inside PythonLayer::LayerSetUp, PythonLayer::Reshape, etc.

This works flawlessly for me on Windows.
