
gRPC multi-threading with TensorRT #1819

@zelabean

Description

Environment

TensorRT Version: 8.2.3.0
NVIDIA GPU: GTX 1080 Ti
NVIDIA Driver Version: 470.103.01
CUDA Version: 11.4
CUDNN Version: 8.2
Operating System: Linux 18.06
Python Version (if applicable): 3.8.0
Tensorflow Version (if applicable):
PyTorch Version (if applicable): 1.10
Baremetal or Container (if so, version):

grpc server code

from concurrent import futures

import grpc

server = grpc.server(
    futures.ThreadPoolExecutor(),
    options=[
        ("grpc.max_send_message_length", -1),
        ("grpc.max_receive_message_length", -1),
        ("grpc.so_reuseport", 1),
        ("grpc.use_local_subchannel_pool", 1),
    ],
)
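
For clarity, the worker count referred to below is the max_workers argument of the executor; a minimal sketch of how it is set (the value 4 is only illustrative):

# Illustrative only: with max_workers=1 every RPC is handled serially;
# with max_workers > 1, handlers run concurrently on multiple threads.
server = grpc.server(futures.ThreadPoolExecutor(max_workers=4))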

grpc stub init

grpcObject(encoder=trt_model, decoder=decoder)

trt_model init code

import pycuda.driver as cuda

def __init__(self):
    cuda_ctx = cuda.Device(0).make_context()
    self.cuda_ctx = cuda_ctx
    if self.cuda_ctx:
        self.cuda_ctx.push()
    ...
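
One thing worth noting: pycuda's Device.make_context() already makes the new context current on the calling thread, so the extra push() above leaves the same context on the stack twice. A sketch of an init without the double push (everything else unchanged):

import pycuda.driver as cuda

def __init__(self):
    cuda.init()  # initializes the CUDA driver API; safe to call more than once
    # make_context() both creates the context AND pushes it onto this
    # thread's context stack, so no additional push() is needed here.
    self.cuda_ctx = cuda.Device(0).make_context()
    ...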

Hello.
I'm serving TensorRT inference over gRPC.
After setting max_workers on the server's ThreadPoolExecutor to more than 1, the error below occurs as soon as requests come in from multiple clients at once.
With max_workers=1 the error never occurs. Can you help?

infer method

def infer(self, wav_path):
    input_signal = preprocess_stt(wav_path)

    # Make this model's CUDA context current on the calling thread.
    if self.cuda_ctx:
        self.cuda_ctx.push()

    # Resolve the dynamic input shape, then allocate page-locked host
    # memory for the output.
    self.context.set_binding_shape(0, input_signal.shape)
    assert self.context.all_binding_shapes_specified
    h_output = cuda.pagelocked_empty(
        tuple(self.context.get_binding_shape(1)), dtype=np.float32
    )

    # Pin the input on the host, copy host-to-device, run inference,
    # copy device-to-host, then wait for the stream to finish.
    h_input_signal = cuda.register_host_memory(
        np.ascontiguousarray(to_numpy(input_signal))
    )
    cuda.memcpy_htod_async(self.d_input, h_input_signal, self.stream)
    self.context.execute_async(
        bindings=[int(self.d_input), int(self.d_output)],
        stream_handle=self.stream.handle,
    )
    cuda.memcpy_dtoh_async(h_output, self.d_output, self.stream)
    self.stream.synchronize()

    if self.cuda_ctx:
        self.cuda_ctx.pop()
    return h_output
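
For reference, a common workaround is to serialize all inference through a single lock, since one execution context, stream, and set of device buffers cannot safely be shared by several threads at once. A minimal sketch, assuming self.lock = threading.Lock() is created once in __init__ (an assumption, not part of the original code):

import threading

# Assumption: created once in __init__, not in the original code:
# self.lock = threading.Lock()

def infer(self, wav_path):
    input_signal = preprocess_stt(wav_path)
    # Only one thread at a time may touch the shared execution context,
    # stream, and device buffers.
    with self.lock:
        if self.cuda_ctx:
            self.cuda_ctx.push()
        try:
            ...  # same set_binding_shape / copy / execute / copy sequence as above
        finally:
            # Pop even if inference raises, so the context stack stays balanced.
            if self.cuda_ctx:
                self.cuda_ctx.pop()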

error

pycuda._driver.LogicError: cuMemHostAlloc failed: an illegal memory access was encountered
E0228 17:02:30.063214 140249774667520 _server.py:471] Exception iterating responses: cuMemHostAlloc failed: an illegal memory access was encountered
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/grpc/_server.py", line 461, in _take_response_from_response_iterator
    return next(response_iterator), True
  File "/data/grpc/stt_grpc/grpc_class/dummy_grpc_core.py", line 116, in getStream
    stt_result = trt_inference(self.trt_model, 'aaa.wav', self.decoder)
  File "/data/grpc/stt_grpc/stt_package/stt_func.py", line 525, in trt_inference
    model_output = actor.infer('aaa.wav')
  File "/data/grpc/stt_grpc/grpc_class/tensorrt_stt.py", line 153, in infer
    h_output = cuda.pagelocked_empty(tuple(self.context.get_binding_shape(1)), dtype=np.float32)
pycuda._driver.LogicError: cuMemHostAlloc failed: an illegal memory access was encountered
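
For context on the traceback: once a kernel hits an illegal memory access, the CUDA context is left in a sticky error state, so the failure typically surfaces at the next API call, here cuMemHostAlloc inside pagelocked_empty, even though the allocation itself is likely not the root cause.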
