This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Using mxnet on RTX3090 #19520

Open
DestinyMy opened this issue Nov 12, 2020 · 23 comments


@DestinyMy

DestinyMy commented Nov 12, 2020

Hello,

I have some problems using MXNet on an RTX 3090. The 30-series GPUs only support CUDA 11, but I can't find an MXNet release that corresponds to CUDA 11. I've browsed a lot of blogs and documents, but I still haven't found a solution.

Thanks for your advice.

@github-actions

Welcome to Apache MXNet (incubating)! We are on a mission to democratize AI, and we are glad that you are contributing to it by opening this issue.
Please make sure to include all the relevant context, and one of the @apache/mxnet-committers will be here shortly.
If you are interested in contributing to our project, let us know! Also, be sure to check out our guide on contributing to MXNet and our development guides wiki.

@Wallart

Wallart commented Nov 12, 2020

Hello,
In order to use MXNet on my RTX 3090, I had to build MXNet 1.8.0.rc1 with CUDA 11 support.
If you are familiar with Docker, I've created an image (docker pull wallart/dl_mxnet:1.8.0.rc1), so you can use MXNet without rebuilding everything.
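
For reference, a rough sketch of that kind of source build, assuming the 1.8.0.rc1 tag of apache/incubator-mxnet and a CMake-based toolchain (the tag name and the arch variable are assumptions; check your release):

# sketch only: tag name and MXNET_CUDA_ARCH are assumed, verify against the release you build
git clone --recursive --branch 1.8.0.rc1 https://github.com/apache/incubator-mxnet.git mxnet
cd mxnet && mkdir build && cd build
# sm_86 covers the RTX 3000 series
cmake -DUSE_CUDA=ON -DUSE_CUDNN=ON -DMXNET_CUDA_ARCH="8.6" -DCMAKE_BUILD_TYPE=Release ..
make -j$(nproc)
# install the freshly built Python package
cd ../python && pip install -e .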

@ptrendx
Member

ptrendx commented Nov 17, 2020

Alternatively, you can use the NGC container: https://ngc.nvidia.com/catalog/containers/nvidia:mxnet. Version 20.10 supports sm_86 (so the RTX 3000 series).
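
One possible way to try it (the -py3 tag suffix is an assumption based on the usual NGC naming; check the catalog page for the exact tag):

# pull the 20.10 image and run a quick check that the GPU is visible inside the container
docker pull nvcr.io/nvidia/mxnet:20.10-py3
docker run --gpus all --rm nvcr.io/nvidia/mxnet:20.10-py3 \
    python -c "import mxnet as mx; print(mx.context.num_gpus())"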

@DestinyMy
Author

Thank you very much. I'll try it right away.

@dai-ichiro

dai-ichiro commented Nov 17, 2020

Check this site.
https://dist.mxnet.io/python/cu110

If your OS is Linux, you can install the nightly version of mxnet.

pip install mxnet-cu110==1.9.0b20201116 -f https://dist.mxnet.io/python/cu110

Hope this helps.
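
A quick smoke test after installing, just to confirm the wheel actually sees the 3090 (a minimal sketch using the standard 1.x API):

# print the installed version and the number of visible GPUs
python -c "import mxnet as mx; print(mx.__version__, mx.context.num_gpus())"
# allocate a small array on the GPU to confirm the CUDA build works end to end
python -c "import mxnet as mx; print(mx.nd.ones((2, 2), ctx=mx.gpu(0)))"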

@shilei-nj
Contributor

@Wallart
I did the same thing; it worked, but training speed on the 3090 is slower than on a 2080 Ti. Do you have the same problem?

@Wallart

Wallart commented Nov 23, 2020

@shilei-nj With an equivalent batch size, training speed on the 3090 is about the same as on my old 1080 Ti.
What type of model are you training?
Can you give some context about your MXNet version / build options?

I can try to reproduce; since I'm mostly using the extra VRAM, I might have missed performance issues.

@shilei-nj
Contributor

@Wallart
I have solved the problem with MXNet 1.8.0.rc2 and CUDA 11.1.
You should update cuDNN from 8.0.4 to 8.0.5; this is very important.
Also modify KNOWN_CUDA_ARCHS in the MXNet Makefile to add 86.
Now it's really fast.
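
A sketch of that Makefile tweak for the make-based 1.x build (the exact contents of the list vary by release, so check before editing):

# locate the arch list and append 86 (and 80, if you also target A100-class GPUs)
grep -n "KNOWN_CUDA_ARCHS" Makefile
# e.g. change
#   KNOWN_CUDA_ARCHS := 30 35 50 52 60 61 70 75
# to
#   KNOWN_CUDA_ARCHS := 30 35 50 52 60 61 70 75 80 86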

@Light--

Light-- commented Nov 24, 2020

Version 20.10 supports sm_86 (so the RTX 3000 series).

It did NOT work. I tested all the containers.

@szha
Member

szha commented Nov 24, 2020

@shilei-nj thanks for pointing it out. Would you help add this change to the v1.x branch?

@Light-- could you file a bug report for the issue you are facing? We will need more details, as requested in the issue template, to identify the problem. Thanks!

@shilei-nj
Contributor

@szha no thanks, please just fix it in your next update.

@chinakook
Contributor

MXNet 2.0 built by myself works fine with the RTX 3090.

@Light--

Light-- commented Nov 26, 2020

If you are familiar with Docker, I've created an image (docker pull wallart/dl_mxnet:1.8.0.rc1),

@Wallart
Buddy, your docker image runs like this:

$ sudo docker run --gpus all -ti a86ad560010f /bin/bash
Starting as 9001:deeplearning
deeplearning home directory ready
deeplearning home directory populated
Server listening on 0.0.0.0 port 22.
Server listening on :: port 22.

What's this? How do I use it? My other docker images work fine.

@Wallart

Wallart commented Nov 26, 2020


I'm providing an SSH daemon for remote debugging purposes. You need to run the image in the background with the -itd options.
Then you can execute docker exec -it -u USER mxnet-1.8.0.rc1 bash or connect the container to your preferred IDE.
You can also populate the container with a specific user/uid, in order to mount volumes, with -e HOST_USER=myUser -e HOST_UID=$(id -u).
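
Put together, a possible invocation (the container name and user below are placeholders, not fixed values):

# start the image in the background, mapping a host user into the container, then attach a shell
docker run --gpus all -itd --name mxnet-1.8.0.rc1 \
    -e HOST_USER=myUser -e HOST_UID=$(id -u) \
    wallart/dl_mxnet:1.8.0.rc1
docker exec -it -u myUser mxnet-1.8.0.rc1 bash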

@Wallart

Wallart commented Nov 26, 2020

@Wallart
I have solved the problem with MXNet 1.8.0.rc2 and CUDA 11.1.
You should update cuDNN from 8.0.4 to 8.0.5; this is very important.
Also modify KNOWN_CUDA_ARCHS in the MXNet Makefile to add 86.
Now it's really fast.

How fast is it compared to your old build options?
I'll give it a try.

@shilei-nj
Contributor

@Wallart
With the old options it was a little slower than a 2080 Ti; now it's 50% faster than the 2080 Ti.

@Light--

Light-- commented Nov 27, 2020

Then you can execute docker exec -it -u USER mxnet-1.8.0.rc1 bash or connect the container to your preferred IDE.

Hey @Wallart, what do you say about this?

I followed your steps, but:

$ sudo docker exec -it -u root 37239180ff52 bash
root@37239180ff52:/tmp# python
Python 3.7.7 (default, Jun 26 2020, 05:10:03)
[GCC 7.3.0] :: Intel(R) Corporation on linux
Type "help", "copyright", "credits" or "license" for more information.
Intel(R) Distribution for Python is brought to you by Intel Corporation.
Please check out: https://software.intel.com/en-us/python-distribution
>>> import mxnet
Illegal instruction (core dumped)

@Light--

Light-- commented Nov 27, 2020

pip install mxnet-cu110==1.9.0b20201116 -f https://dist.mxnet.io/python/cu110

Hey @dai-ichiro, what do you say about this:

$ pip install mxnet_cu110-1.9.0b20201116-py2.py3-none-manylinux2014_x86_64.whl
WARNING: pip is being invoked by an old script wrapper. This will fail in a future version of pip.
Please see https://github.com/pypa/pip/issues/5599 for advice on fixing the underlying issue.
To avoid this problem you can invoke Python with '-m pip' instead of running pip directly.
Looking in indexes: http://mirrors.aliyun.com/pypi/simple/
Requirement already satisfied: mxnet-cu110==1.9.0b20201116 from file:///home/user1/mxcuda11/mxnet_cu110-1.9.0b20201116-py2.py3-none-manylinux2014_x86_64.whl in /home/user1/.local/lib/python3.6/site-packages (1.9.0b20201116)
Requirement already satisfied: requests<3,>=2.20.0 in /home/user1/anaconda3/envs/mxgpu/lib/python3.6/site-packages (from mxnet-cu110==1.9.0b20201116) (2.25.0)
Requirement already satisfied: graphviz<0.9.0,>=0.8.1 in /home/user1/anaconda3/envs/mxgpu/lib/python3.6/site-packages (from mxnet-cu110==1.9.0b20201116) (0.8.4)
Requirement already satisfied: numpy<2.0.0,>1.16.0 in /home/user1/anaconda3/envs/mxgpu/lib/python3.6/site-packages (from mxnet-cu110==1.9.0b20201116) (1.19.4)
Requirement already satisfied: certifi>=2017.4.17 in /home/user1/anaconda3/envs/mxgpu/lib/python3.6/site-packages (from requests<3,>=2.20.0->mxnet-cu110==1.9.0b20201116) (2020.6.20)
Requirement already satisfied: chardet<4,>=3.0.2 in /home/user1/anaconda3/envs/mxgpu/lib/python3.6/site-packages (from requests<3,>=2.20.0->mxnet-cu110==1.9.0b20201116) (3.0.4)
Requirement already satisfied: idna<3,>=2.5 in /home/user1/anaconda3/envs/mxgpu/lib/python3.6/site-packages (from requests<3,>=2.20.0->mxnet-cu110==1.9.0b20201116) (2.6)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/user1/anaconda3/envs/mxgpu/lib/python3.6/site-packages (from requests<3,>=2.20.0->mxnet-cu110==1.9.0b20201116) (1.21.1)
(mxgpu) user1@pc228:~/mxcuda11$ python
Python 3.6.2 |Continuum Analytics, Inc.| (default, Jul 20 2017, 13:51:32)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import mxnet

Illegal instruction (core dumped)

my environment:

Ubuntu 20.04.1 LTS
Linux pc 5.4.0-53-generic #59-Ubuntu SMP Wed Oct 21 09:38:44 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux


+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.38       Driver Version: 455.38       CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 3090    On   | 00000000:02:00.0 Off |                  N/A |
| 30%   30C    P8    28W / 350W |      1MiB / 24265MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Wed_Jul_22_19:09:09_PDT_2020
Cuda compilation tools, release 11.0, V11.0.221
Build cuda_11.0_bu.TC445_37.28845127_0


$ pip list
WARNING: pip is being invoked by an old script wrapper. This will fail in a future version of pip.
Please see https://github.com/pypa/pip/issues/5599 for advice on fixing the underlying issue.
To avoid this problem you can invoke Python with '-m pip' instead of running pip directly.
Package           Version
----------------- --------------
asn1crypto        0.22.0
certifi           2020.6.20
cffi              1.10.0
chardet           3.0.4
cryptography      1.8.1
dataclasses       0.7
future            0.18.2
graphviz          0.8.4
idna              2.6
mxnet-cu110       1.9.0b20201116
numpy             1.19.4
packaging         16.8
Pillow            8.0.1
pip               20.2.2
pycparser         2.18
pyOpenSSL         17.0.0
pyparsing         2.2.0
PySocks           1.6.6
requests          2.25.0
setuptools        36.4.0
six               1.10.0
torch             1.7.0+cu110
torchvision       0.8.1+cu110
typing-extensions 3.7.4.3
urllib3           1.21.1
wheel             0.29.0

@chinakook
Contributor

I think it's time to get CUDA 11.1 and sm_86 into the official MXNet support list, as the RTX 3090 series is very popular.

@Wallart

Wallart commented Nov 27, 2020

@Light-- What type of CPU are you using?
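
An "Illegal instruction" at import usually means the binary was built with CPU instructions the host doesn't support. One way to list what your CPU offers (a diagnostic sketch, not a confirmed root cause):

# older CPUs without AVX2/F16C can crash binaries compiled with those flags
grep -o 'avx[0-9a-z_]*\|f16c\|sse4_[12]' /proc/cpuinfo | sort -u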

@Light--

Light-- commented Dec 3, 2020

What type of CPU are you using?

@Wallart
Intel(R) Core(TM) i7-3960X CPU @ 3.30GHz

@TristonC
Contributor

CUDA 11.1 and sm_86 are supported in MXNet 1.8+. @DestinyMy Has your problem been solved?
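
A minimal install check on a Linux host, using the mxnet-cu112 wheel from PyPI:

# install a CUDA 11.2 build of MXNet 1.8+ and confirm the GPU is visible
pip install mxnet-cu112
python -c "import mxnet as mx; print(mx.__version__, mx.context.num_gpus())"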

@TNTran92

@TristonC, do you have a Windows version of MXNet 1.8+? I only saw Linux wheels on PyPI:
https://pypi.org/project/mxnet-cu112/
