Performance with multi-thread inference is slow #13075

idealboy · 2018-11-01T06:33:25Z

Description

When I run inference from multiple threads (each thread creates its own predictor handle from the same libmxnet.so), it is very slow.

I call MXPredReshape in some places to adapt to different input shapes.
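
To make the slowdown measurable, a small timing harness along these lines may help. It is only a sketch under stated assumptions: it targets Python 3, `fake_forward` is a hypothetical stand-in for one forward pass, and `time_threads` is an illustrative helper, not part of MXNet. In a real reproduction each worker would create its own predictor handle and call it in place of `fake_forward`.

```python
import threading
import time


def fake_forward(work_items=20000):
    """Hypothetical stand-in for one predictor forward pass (pure CPU work)."""
    total = 0
    for i in range(work_items):
        total += i * i
    return total


def time_threads(num_threads, calls_per_thread, forward=fake_forward):
    """Call `forward` from `num_threads` threads; return mean latency per call for each thread."""
    latencies = [0.0] * num_threads

    def worker(idx):
        # In a real reproduction, create this thread's own predictor handle here.
        start = time.perf_counter()
        for _ in range(calls_per_thread):
            forward()
        latencies[idx] = (time.perf_counter() - start) / calls_per_thread

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return latencies


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16):
        lat = time_threads(n, calls_per_thread=10)
        print("threads=%d mean latency per call: %.6f s" % (n, sum(lat) / len(lat)))
```

Comparing the per-call latency at 1 thread against the latency at higher thread counts gives a concrete number to report.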

Environment info (Required)

----------Python Info----------
('Version :', '2.7.5')
('Compiler :', 'GCC 4.8.2 20140120 (Red Hat 4.8.2-16)')
('Build :', ('default', 'Jun 17 2014 18:11:42'))
('Arch :', ('64bit', 'ELF'))
------------Pip Info-----------
('Version :', '9.0.1')
('Directory :', '/usr/lib/python2.7/site-packages/pip')
----------MXNet Info-----------
No MXNet installed.
----------System Info----------
('Platform :', 'Linux-4.1.5-1.el7.centos.x86_64-x86_64-with-centos-7.0.1406-Core')
('system :', 'Linux')
('node :', 'face00')
('release :', '4.1.5-1.el7.centos.x86_64')
('version :', '#1 SMP Tue Aug 11 13:53:50 EDT 2015')
----------Hardware Info----------
('machine :', 'x86_64')
('processor :', 'x86_64')
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 16
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 22
Model name:
Stepping: 3
CPU MHz: 2494.224
BogoMIPS: 4988.44
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 30720K
NUMA node0 CPU(s): 0-15

Package used (Python/R/Scala/Julia):
(I'm using Python)

Build info (Required if built from source)

Compiler (gcc/clang/mingw/visual studio):
gcc 4.8.5

MXNet commit hash:
0a286a0

MXNet 1.3.x

Build config:

export CC = gcc
export CXX = g++
export NVCC = nvcc

# whether compile with options for MXNet developer
DEV = 0

# whether compile with debug
DEBUG = 0

# whether to turn on segfault signal handler to log the stack trace
USE_SIGNAL_HANDLER =

# the additional link flags you want to add
ADD_LDFLAGS = -L/usr/local/lib

# the additional compile flags you want to add
ADD_CFLAGS = -I/usr/local/include

USE_CUDA = 0

# add the path to CUDA library to link and compile flag
# if you have already add them to environment variable, leave it as NONE
USE_CUDA_PATH = /usr/local/cuda-9.1
# USE_CUDA_PATH = /usr/local/cuda

# whether to enable CUDA runtime compilation
ENABLE_CUDA_RTC = 1

USE_OPENMP = 1

# whether use MKL-DNN library
USE_MKLDNN = 1

# whether use NNPACK library
USE_NNPACK = 0

# choose the version of blas you want to use
# can be: mkl, blas, atlas, openblas
# in default use atlas for linux while apple for osx
UNAME_S := $(shell uname -s)
ifeq ($(UNAME_S), Darwin)
USE_BLAS = apple
else
USE_BLAS = openblas
endif

# whether use lapack during compilation
# only effective when compiled with blas versions openblas/apple/atlas/mkl
USE_LAPACK = 0

# path to lapack library in case of a non-standard installation
USE_LAPACK_PATH =

# add path to intel library, you may need it for MKL, if you did not add the path
# to environment variable
USE_INTEL_PATH = NONE

# If use MKL only for BLAS, choose static link automatically to allow python wrapper
ifeq ($(USE_BLAS), mkl)
USE_STATIC_MKL = 1
else
USE_STATIC_MKL = NONE

USE_HDFS = 0

# path to libjvm.so. required if USE_HDFS=1
LIBJVM=$(JAVA_HOME)/jre/lib/amd64/server

# whether or not allow to read and write AWS S3 directly. If yes, then
# libcurl4-openssl-dev is required, it can be installed on Ubuntu by
# sudo apt-get install -y libcurl4-openssl-dev
USE_S3 = 0

#----------------------------
# performance settings
#----------------------------
# Use operator tuning
USE_OPERATOR_TUNING = 1

# Use gperftools if found
USE_GPERFTOOLS = 0

# Use JEMalloc if found, and not using gperftools
USE_JEMALLOC = 1

#----------------------------
# additional operators
#----------------------------

# path to folders containing projects specific operators that you don't want to put in src/operators
EXTRA_OPERATORS =

#----------------------------
# other features
#----------------------------

# Create C++ interface package
USE_CPP_PACKAGE = 1
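
Since this build enables OpenMP (`USE_OPENMP = 1`) and the machine reports 16 CPUs, one experiment that might be worth running is to cap the OpenMP thread count so that application threads multiplied by OpenMP threads does not exceed the CPU count. This is an assumption about a possible contributing factor, not a confirmed diagnosis. `OMP_NUM_THREADS` is the standard OpenMP environment variable; the helper names below are hypothetical and only illustrate the arithmetic.

```python
import os


def omp_threads_per_worker(total_cpus, app_threads):
    """Split CPUs evenly across application threads, never going below 1."""
    return max(1, total_cpus // app_threads)


def cap_openmp_threads(total_cpus, app_threads, env=os.environ):
    """Set OMP_NUM_THREADS so app_threads * OpenMP threads fits in total_cpus where possible.

    This must run before the OpenMP-enabled library is loaded to have any effect.
    """
    per_worker = omp_threads_per_worker(total_cpus, app_threads)
    env["OMP_NUM_THREADS"] = str(per_worker)
    return per_worker


if __name__ == "__main__":
    # 16 CPUs comes from the hardware info above; 8 inference threads is an assumed example.
    print(cap_openmp_threads(total_cpus=16, app_threads=8))
```

If latency per call improves with the cap in place, that would point toward thread oversubscription; if it does not, the cause lies elsewhere.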

pengzhao-intel · 2018-11-01T06:50:16Z

Thanks for reporting the issue.
Do you have a reproducible case?

Try: USE_MKLDNN = 1

idealboy · 2018-11-01T07:44:07Z

> Thanks for reporting the issue.
> Do you have a reproducible case?
>
> Try: USE_MKLDNN = 1

Sorry, I did in fact compile with USE_MKLDNN = 1; I mislabeled this option when I submitted the issue.

frankfliu · 2018-11-01T14:23:33Z

@mxnet-label-bot [Performance]

idealboy · 2018-11-08T05:47:36Z

Will there be a solution? Thank you!

pengzhao-intel · 2018-11-08T05:59:25Z

@idealboy It would help if you could provide a reproducible case.
For an example of such a case, see:
awslabs/sockeye#361

sandeep-krishnamurthy · 2019-04-01T23:25:22Z

@idealboy - Do you still face this issue?
The comment from @leleamol here may be useful: #14408 (comment)

marcoabreu added the Performance label Nov 1, 2018
