caffe.NCCL.new_uid() - UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa9 in position 17: invalid start byte #5347
Comments
Thanks for reporting. I haven't tried the training script yet; I will get back to you on this ASAP. What Python version are you using?
Python 3.5.3 (v3.5.3:1880cb95a742, Jan 16 2017, 16:02:32) [MSC v.1900 64 bit (AMD64)] on win32. Let me know if there is a workaround and/or if you are able to reproduce. Thanks.
I can reproduce the issue. I think the problem is that Boost.Python maps std::string to a standard string (i.e. not unicode) in Python 2 and to a unicode string in Python 3. Can you check whether the issue also exists with Python 2.7? I think it would not be too hard to write wrappers around the constructor of NCCL and the new_uid method that return and accept bp::objects and manually convert between bytes objects and string objects. See boostorg/python#85 (comment) for some ideas on how to implement the wrappers. @cypof Any comments? Has any @BVLC member tried to use multi-GPU training with Python 3?
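To illustrate the diagnosis above, here is a minimal sketch in plain Python, with no Caffe involved. It assumes the id returned by new_uid is an opaque binary blob; `fake_uid` is a made-up stand-in, and its 32-byte length is chosen only for the illustration. Forcing such bytes through UTF-8 fails as soon as a byte like 0xa9 appears where a character would have to start, while a bytes object (or a byte-preserving codec such as latin-1) keeps the data intact. The latin-1 round trip is shown only to make that point; it is not the fix proposed in this thread.

```python
# Stand-in for the raw id: arbitrary bytes, NOT a real NCCL unique id.
# The 32-byte length is an assumption made only for this illustration.
fake_uid = bytes(range(17)) + b"\xa9" + bytes(14)


def decode_utf8(raw):
    """Return (text, error); exactly one of the two is None."""
    try:
        return raw.decode("utf-8"), None
    except UnicodeDecodeError as exc:
        return None, exc


text, error = decode_utf8(fake_uid)
print(error)  # same kind of failure as in the report

# latin-1 maps every byte value 0..255 to exactly one code point,
# so decoding and re-encoding gives back the original bytes.
round_tripped = fake_uid.decode("latin-1").encode("latin-1")
print(round_tripped == fake_uid)
```

This is why the wrappers would have to hand the id to Python as bytes rather than let the std::string be converted to a Python 3 str.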
How would the wrapper work? Can we write the wrapper in Python?
No. The wrapper needs to be C++. Try replacing your CAFFE_ROOT/python/caffe/_caffe.cpp with https://gist.github.com/willyd/0dbd1fabb06eeedc3289e656be03a022 and let me know if that works for you.
@willyd Let me know if you are able to reproduce and/or create a different fix.
You need to enable NCCL by setting USE_NCCL=ON. It should download a Windows-compatible (though maybe crippled) NCCL and build it automatically. Unless you really want to train with multiple GPUs on Windows, you can use caffe.exe or a caffe.Solver from your own Python script.
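The single-GPU route mentioned above could look roughly like the sketch below. It assumes pycaffe is built and importable, and that a solver definition exists at a path you supply; the function name `train_single_gpu`, the default GPU id, and the iteration count are all hypothetical. Since it never touches caffe.NCCL, the new_uid error does not come into play.

```python
def train_single_gpu(solver_path, gpu_id=0, iterations=100):
    """Run a solver on one GPU without using caffe.NCCL at all."""
    # Imported lazily so this file can be loaded without pycaffe installed.
    import caffe

    caffe.set_device(gpu_id)   # pick the GPU before creating the solver
    caffe.set_mode_gpu()
    solver = caffe.SGDSolver(solver_path)
    solver.step(iterations)    # run the requested number of iterations
    return solver


# Example call (hypothetical path):
# train_single_gpu("solver.prototxt", gpu_id=0, iterations=1000)
```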
@willyd
@willyd Severity Code Description Project File Line Suppression State
@willyd Now there are 3 linker errors. Any idea? One of the errors says NCCL::new_uid is unresolved. Severity Code Description Project File Line Suppression State
USE_NCCL is an option of the CMake build: https://github.com/BVLC/caffe/blob/windows/scripts/build_win.cmd#L79
@willyd
Just set the option to USE_NCCL=1 and CMake will take care of the rest. Have a look at: https://github.com/BVLC/caffe/blob/windows/cmake/External/nccl.cmake
Like this? Getting strange errors. Not sure where they are coming from now... D:\PROGRAMMING\caffe>scripts\build_win.cmd
Turns out I had to start a fresh terminal.
There is an error about NCCL in the output of build_win.cmd now. Is there something else I need to specify in build_win.cmd? I simply dumped the NCCL zip file from Nvidia in
This is only a warning. It did not find NCCL, so it should build it. The same thing happens on AppVeyor. AFAIK Nvidia does not provide binaries for NCCL; they don't even support Windows officially.
Great. It works now. The NCCL error is gone. Now there is a GPU memory error. Is it loading all data into the GPU at the same time? I'm running another app right now as well; maybe that has an impact. This machine has 8 GB of GPU VRAM. I0303 17:33:40.142139 40544 layer_factory.cpp:58] Creating layer data
Should be fixed by #5400.
Closing as presumed fixed until we hear otherwise.
@shelhamer I will keep the issue open as a reminder until I merge #5400.
How can we work around this error? If we set uid to a fixed number it doesn't work and with caffe.NCCL.new_uid() we get the following error:
uid = caffe.NCCL.new_uid()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa9 in position 17: invalid start byte
Any help highly appreciated!
Steps to reproduce
Run train.py
uid = caffe.NCCL.new_uid()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa9 in position 17: invalid start byte
Your system configuration
Operating system: Windows
Compiler: MS Visual Studio 2015
CUDA version (if applicable): 8
CUDNN version (if applicable): 5.1
BLAS: ?
Python