
gRPC multi-threading with TensorRT #1819

@zelabean

Description

Environment

TensorRT Version: 8.2.3.0
NVIDIA GPU: GTX 1080 Ti
NVIDIA Driver Version: 470.103.01
CUDA Version: 11.4
CUDNN Version: 8.2
Operating System: Linux 18.06
Python Version (if applicable): 3.8.0
Tensorflow Version (if applicable):
PyTorch Version (if applicable): 1.10
Baremetal or Container (if so, version):

grpc server code

from concurrent import futures

import grpc

server = grpc.server(
    futures.ThreadPoolExecutor(),
    options=[
        ("grpc.max_send_message_length", -1),
        ("grpc.max_receive_message_length", -1),
        ("grpc.so_reuseport", 1),
        ("grpc.use_local_subchannel_pool", 1),
    ],
)
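
For clarity, the worker count referred to below is the max_workers argument of the executor; a minimal sketch of how it is set (the value 4 is only illustrative):

# Illustrative only: with max_workers=1 every RPC is handled serially;
# with max_workers > 1, handlers run concurrently on multiple threads.
server = grpc.server(futures.ThreadPoolExecutor(max_workers=4))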

grpc stub init

grpcObject(encoder=trt_model, decoder=decoder)

trt_model init code

import pycuda.driver as cuda

def __init__(self):
    cuda_ctx = cuda.Device(0).make_context()
    self.cuda_ctx = cuda_ctx
    if self.cuda_ctx:
        self.cuda_ctx.push()
    ...
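
One thing worth noting: pycuda's Device.make_context() already makes the new context current on the calling thread, so the extra push() above leaves the same context on the stack twice. A sketch of an init without the double push (everything else unchanged):

import pycuda.driver as cuda

def __init__(self):
    cuda.init()  # initializes the CUDA driver API; safe to call more than once
    # make_context() both creates the context AND pushes it onto this
    # thread's context stack, so no additional push() is needed here.
    self.cuda_ctx = cuda.Device(0).make_context()
    ...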

Hello.
I'm serving TensorRT inference over gRPC.
After setting max_workers on the server's ThreadPoolExecutor to more than 1, the error below occurs as soon as requests come in from multiple clients at once.
With max_workers=1 the error never occurs. Can you help?

infer method

def infer(self, wav_path):
    input_signal = preprocess_stt(wav_path)

    # Make this model's CUDA context current on the calling thread.
    if self.cuda_ctx:
        self.cuda_ctx.push()

    # Resolve the dynamic input shape, then allocate page-locked host
    # memory for the output.
    self.context.set_binding_shape(0, input_signal.shape)
    assert self.context.all_binding_shapes_specified
    h_output = cuda.pagelocked_empty(
        tuple(self.context.get_binding_shape(1)), dtype=np.float32
    )

    # Pin the input on the host, copy host-to-device, run inference,
    # copy device-to-host, then wait for the stream to finish.
    h_input_signal = cuda.register_host_memory(
        np.ascontiguousarray(to_numpy(input_signal))
    )
    cuda.memcpy_htod_async(self.d_input, h_input_signal, self.stream)
    self.context.execute_async(
        bindings=[int(self.d_input), int(self.d_output)],
        stream_handle=self.stream.handle,
    )
    cuda.memcpy_dtoh_async(h_output, self.d_output, self.stream)
    self.stream.synchronize()

    if self.cuda_ctx:
        self.cuda_ctx.pop()
    return h_output
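
For reference, a common workaround is to serialize all inference through a single lock, since one execution context, stream, and set of device buffers cannot safely be shared by several threads at once. A minimal sketch, assuming self.lock = threading.Lock() is created once in __init__ (an assumption, not part of the original code):

import threading

# Assumption: created once in __init__, not in the original code:
# self.lock = threading.Lock()

def infer(self, wav_path):
    input_signal = preprocess_stt(wav_path)
    # Only one thread at a time may touch the shared execution context,
    # stream, and device buffers.
    with self.lock:
        if self.cuda_ctx:
            self.cuda_ctx.push()
        try:
            ...  # same set_binding_shape / copy / execute / copy sequence as above
        finally:
            # Pop even if inference raises, so the context stack stays balanced.
            if self.cuda_ctx:
                self.cuda_ctx.pop()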

error

pycuda._driver.LogicError: cuMemHostAlloc failed: an illegal memory access was encountered
E0228 17:02:30.063214 140249774667520 _server.py:471] Exception iterating responses: cuMemHostAlloc failed: an illegal memory access was encountered
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/grpc/_server.py", line 461, in _take_response_from_response_iterator
    return next(response_iterator), True
  File "/data/grpc/stt_grpc/grpc_class/dummy_grpc_core.py", line 116, in getStream
    stt_result = trt_inference(self.trt_model, 'aaa.wav', self.decoder)
  File "/data/grpc/stt_grpc/stt_package/stt_func.py", line 525, in trt_inference
    model_output = actor.infer('aaa.wav')
  File "/data/grpc/stt_grpc/grpc_class/tensorrt_stt.py", line 153, in infer
    h_output = cuda.pagelocked_empty(tuple(self.context.get_binding_shape(1)), dtype=np.float32)
pycuda._driver.LogicError: cuMemHostAlloc failed: an illegal memory access was encountered
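
For context on the traceback: once a kernel hits an illegal memory access, the CUDA context is left in a sticky error state, so the failure typically surfaces at the next API call, here cuMemHostAlloc inside pagelocked_empty, even though the allocation itself is likely not the root cause.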
