
[defaultAllocator.cpp::deallocate::35] Error Code 1: Cuda Runtime (invalid argument) #2052

Closed
CoinCheung opened this issue Jun 13, 2022 · 10 comments
Labels: API: Python, triaged (Issue has been triaged by maintainers)

@CoinCheung

Description

The error message looks like this:

[06/13/2022-14:32:23] [TRT] [E] 1: [defaultAllocator.cpp::deallocate::35] Error Code 1: Cuda Runtime (invalid argument)
[06/13/2022-14:32:23] [TRT] [E] 1: [cudaDriverHelpers.cpp::operator()::29] Error Code 1: Cuda Driver (context is destroyed)
[06/13/2022-14:32:23] [TRT] [E] 1: [cudaDriverHelpers.cpp::operator()::29] Error Code 1: Cuda Driver (context is destroyed)

Sometimes there is a core dump, and sometimes there isn't.

Environment

TensorRT Version: 8.2.5.1
NVIDIA GPU: V100
NVIDIA Driver Version: 450.80.02
CUDA Version: 11.3
CUDNN Version: 8.2.0
Operating System: ubuntu18.04(docker image)
Python Version (if applicable): python3.8
Tensorflow Version (if applicable):
PyTorch Version (if applicable): 1.9
Baremetal or Container (if so, version): 11.3.1-cudnn8-devel-ubuntu18.04

Relevant Files

My code looks like this:

import os
import logging
import argparse

import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit


parser = argparse.ArgumentParser()
parser.add_argument('--onnx')
parser.add_argument('--fp16', action='store_true')
parser.add_argument('--savepth', default='./model.trt')
args = parser.parse_args()


ctx = pycuda.autoinit.context
trt.init_libnvinfer_plugins(None, "")
TRT_LOGGER = trt.Logger()



def build_engine_from_onnx(onnx_file_path):
    engine = None
    EXPLICIT_BATCH = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    with trt.Builder(TRT_LOGGER) as builder, builder.create_network(EXPLICIT_BATCH) as network, builder.create_builder_config() as config, trt.OnnxParser(network, TRT_LOGGER) as parser, trt.Runtime(TRT_LOGGER) as runtime:
        config.max_workspace_size = 1 << 30 # 1G
        if args.fp16:
            config.set_flag(trt.BuilderFlag.FP16)
        builder.max_batch_size = 128
        # Parse model file
        assert os.path.exists(onnx_file_path), f'cannot find {onnx_file_path}'

        print(f'Loading ONNX file from path {onnx_file_path}...')
        with open(onnx_file_path, 'rb') as fr:
            if not parser.parse(fr.read()):
                print('ERROR: Failed to parse the ONNX file.')
                for error in range(parser.num_errors):
                    print(parser.get_error(error))
                assert False

        print("Start to build Engine")
        plan = builder.build_serialized_network(network, config)
        engine = runtime.deserialize_cuda_engine(plan)
    return engine


def serialize_engine_to_file(engine, savepth):
    plan = engine.serialize()
    with open(savepth, "wb") as fw:
        fw.write(plan)



if __name__ == '__main__':
    engine = build_engine_from_onnx(args.onnx)
    serialize_engine_to_file(engine, args.savepth)

The ONNX file can be downloaded here: https://github.com/CoinCheung/eewee/releases/download/0.0.0/model.onnx

Steps To Reproduce

python run.py --onnx ./model.onnx --savepth ./model.trt

@zerollzeng
Collaborator

It seems the error might be due to the engine's lifetime relative to the attached CUDA context. Could you try this script?

import os
import logging
import argparse

import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit


parser = argparse.ArgumentParser()
parser.add_argument('--onnx')
parser.add_argument('--fp16', action='store_true')
parser.add_argument('--savepth', default='./model.trt')
args = parser.parse_args()


ctx = pycuda.autoinit.context
trt.init_libnvinfer_plugins(None, "")
TRT_LOGGER = trt.Logger()



def build_engine_from_onnx(onnx_file_path):
    engine = None
    EXPLICIT_BATCH = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    with trt.Builder(TRT_LOGGER) as builder, builder.create_network(EXPLICIT_BATCH) as network, builder.create_builder_config() as config, trt.OnnxParser(network, TRT_LOGGER) as parser, trt.Runtime(TRT_LOGGER) as runtime:
        config.max_workspace_size = 1 << 30 # 1G
        if args.fp16:
            config.set_flag(trt.BuilderFlag.FP16)
        builder.max_batch_size = 128
        # Parse model file
        assert os.path.exists(onnx_file_path), f'cannot find {onnx_file_path}'

        print(f'Loading ONNX file from path {onnx_file_path}...')
        with open(onnx_file_path, 'rb') as fr:
            if not parser.parse(fr.read()):
                print('ERROR: Failed to parse the ONNX file.')
                for error in range(parser.num_errors):
                    print(parser.get_error(error))
                assert False

        print("Start to build Engine")
        plan = builder.build_serialized_network(network, config)
        engine = runtime.deserialize_cuda_engine(plan)
        plan = engine.serialize()
        savepth = './model.trt'
        with open(savepth, "wb") as fw:
            fw.write(plan)

if __name__ == '__main__':
    engine = build_engine_from_onnx(args.onnx)

@zerollzeng zerollzeng self-assigned this Jun 13, 2022
@zerollzeng zerollzeng added API: Python triaged Issue has been triaged by maintainers labels Jun 13, 2022
@CoinCheung
Author

The problem still exists:
[screenshot of the error output]

And my code now looks like this:

import os
import logging
import argparse

import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit


parser = argparse.ArgumentParser()
parser.add_argument('--onnx')
parser.add_argument('--fp16', action='store_true')
parser.add_argument('--savepth', default='./model.trt')
args = parser.parse_args()


ctx = pycuda.autoinit.context
trt.init_libnvinfer_plugins(None, "")
TRT_LOGGER = trt.Logger()



def build_engine_from_onnx(onnx_file_path):
    engine = None
    EXPLICIT_BATCH = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    with trt.Builder(TRT_LOGGER) as builder, builder.create_network(EXPLICIT_BATCH) as network, builder.create_builder_config() as config, trt.OnnxParser(network, TRT_LOGGER) as parser, trt.Runtime(TRT_LOGGER) as runtime:
        config.max_workspace_size = 1 << 30 # 1G
        if args.fp16:
            config.set_flag(trt.BuilderFlag.FP16)
        builder.max_batch_size = 128
        # Parse model file
        assert os.path.exists(onnx_file_path), f'cannot find {onnx_file_path}'

        print(f'Loading ONNX file from path {onnx_file_path}...')
        with open(onnx_file_path, 'rb') as fr:
            if not parser.parse(fr.read()):
                print('ERROR: Failed to parse the ONNX file.')
                for error in range(parser.num_errors):
                    print(parser.get_error(error))
                assert False

        print("Start to build Engine")
        plan = builder.build_serialized_network(network, config)
        engine = runtime.deserialize_cuda_engine(plan)
        with open('./model.trt', "wb") as fw:
            fw.write(plan)
    return engine


#  def serialize_engine_to_file(engine, savepth):
#      plan = engine.serialize()



if __name__ == '__main__':
    engine = build_engine_from_onnx(args.onnx)
    #  serialize_engine_to_file(engine, args.savepth)

Sometimes there is a core dump with only one error line, and sometimes there are three error lines without a core dump.

I noticed that in the C++ examples a stream is defined during the build phase, but I did not see the associated code in the Python examples. Do I need to synchronize a stream somehow with the Python API?

@zerollzeng
Collaborator

Could you try deleting this line:

    return engine

@zerollzeng
Collaborator

Also, does my script not work in your environment? I can run it without any error.

@CoinCheung
Author

Removing the return line solved the problem. What is the correct way for a function to return an engine? It seems we can do this in C++ code, but not in Python.

@nvpohanh
Collaborator

@pranavm-nvidia Any insight?

@pranavm-nvidia
Collaborator

The problem is that the engine outlives the CUDA context created by PyCUDA. One solution would be to scope the engine's lifetime. @CoinCheung you can modify your original script like so:

def main():
    engine = build_engine_from_onnx(args.onnx)
    serialize_engine_to_file(engine, args.savepth)

if __name__ == '__main__':
    main()
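Why scoping fixes this can be illustrated without TensorRT at all. In the failing script, the engine was bound to a module-level name, so it was finalized during interpreter shutdown, after PyCUDA's atexit handler had already destroyed the CUDA context. A minimal sketch of the ordering (all names here are hypothetical stand-ins; explicit `close()` calls play the role of the destructors to make the ordering visible and deterministic):

```python
# Minimal illustration (no TensorRT/PyCUDA needed) of resource teardown
# order. "Resource" is a hypothetical stand-in: "engine" for a TRT engine,
# "context" for the CUDA context that pycuda.autoinit destroys at exit.

destroyed = []

class Resource:
    def __init__(self, name):
        self.name = name

    def close(self):
        destroyed.append(self.name)

def main():
    engine = Resource("engine")
    try:
        pass  # ... use the engine here ...
    finally:
        engine.close()  # released before main() returns

ctx = Resource("context")  # created first, destroyed last (like autoinit)
main()                     # engine lives and dies inside this scope
ctx.close()                # context torn down only after the engine is gone

print(destroyed)  # -> ['engine', 'context']
```

When the engine instead sits at module scope, its finalization can land after the context teardown, which is exactly when TensorRT logs "context is destroyed".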

@CoinCheung
Author

Thanks for telling me this. It works now.

@codewithAshray

I ran the script as @zerollzeng suggested, but I'm getting the following error:

[TRT] [E] 4: [network.cpp::validate::3039] Error Code 4: Internal Error (Network has dynamic or shape inputs, but no optimization profile has been defined.)
[builder.cpp::buildSerializedNetwork::751] Error Code 2: Internal Error (Assertion engine != nullptr failed. )
Traceback (most recent call last):
  File "convert_onnx2trt.py", line 49, in <module>
    engine = build_engine_from_onnx(args.onnx)
  File "convert_onnx2trt.py", line 42, in build_engine_from_onnx
    engine = runtime.deserialize_cuda_engine(plan)
TypeError: deserialize_cuda_engine(): incompatible function arguments. The following argument types are supported:
    1. (self: tensorrt.tensorrt.Runtime, serialized_engine: buffer) -> tensorrt.tensorrt.ICudaEngine

Invoked with: <tensorrt.tensorrt.Runtime object at 0x7fc72b9583b0>, None

Do you have any suggestions on how to resolve this?
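The first error line explains the failure: the network has dynamic input shapes but no optimization profile, so `build_serialized_network` returns `None`, which then trips the `TypeError` in `deserialize_cuda_engine`. A hedged sketch of what may need to be added inside `build_engine_from_onnx` before building (the input tensor name and shapes below are assumptions, not taken from the model; adapt them to your network — this fragment needs a GPU and a TensorRT install to run):

```python
# Hedged sketch: define an optimization profile for a network with dynamic
# input shapes. The shapes below are placeholders; inspect
# network.get_input(0).shape to pick values that fit your model.
profile = builder.create_optimization_profile()
inp = network.get_input(0)
profile.set_shape(inp.name,
                  (1, 3, 224, 224),    # min shape
                  (8, 3, 224, 224),    # opt shape (optimized for)
                  (32, 3, 224, 224))   # max shape
config.add_optimization_profile(profile)
# ... then build as before:
# plan = builder.build_serialized_network(network, config)
```

It is also worth checking that `plan` is not `None` before calling `deserialize_cuda_engine`, so a build failure surfaces as a clear error instead of the `TypeError` above.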

@duyanfang123

The problem is that the engine outlives the CUDA context created by PyCUDA. One solution would be to scope the engine's lifetime. @CoinCheung you can modify your original script like so:

def main():
    engine = build_engine_from_onnx(args.onnx)
    serialize_engine_to_file(engine, args.savepth)

if __name__ == '__main__':
    main()

This is a good solution.
