
[TensorRT] ERROR: Network must have at least one output #319

Closed
santhoshnumberone opened this issue Jan 7, 2020 · 12 comments
Labels
ONNX, Samples, triaged (Issue has been triaged by maintainers)

Comments

@santhoshnumberone

Description

Trying to convert YOLOv3 to TensorRT using this yolov3 sample.

I am able to run yolov3_to_onnx.py and I get the output described here without any errors or warnings.

When I try onnx_to_tensorrt.py, I get this error:

Loading ONNX file from path yolov3-416.onnx...
Beginning ONNX file parsing
Completed parsing of ONNX file
Building an engine from file yolov3-416.onnx; this may take a while...
[TensorRT] ERROR: Network must have at least one output
[TensorRT] ERROR: Network validation failed.
Completed creating Engine
Traceback (most recent call last):
File "onnx_to_tensorrt.py", line 102, in
main()
File "onnx_to_tensorrt.py", line 98, in main
_ = build_engine(onnx_file_path, engine_file_path)
File "onnx_to_tensorrt.py", line 84, in build_engine
f.write(engine.serialize())
AttributeError: 'NoneType' object has no attribute 'serialize'

According to this post, [TensorRT] ERROR: Network must have at least one output, the error is related to the TensorRT version.

TensorRT is unable to build the engine.

Is there a workaround?
What seems to be the issue here?

Environment

Linux distro: Ubuntu 18.04.3 LTS bionic
GPU type: GTX 1050Ti
Nvidia driver version: 440.33.01
CUDA version: 10.2 (according to nvidia-smi) and 9.1.85 (according to nvcc --version)
CUDNN version: 7.6.5 (according to CUDNN_H_PATH=$(whereis cudnn.h) and cat ${CUDNN_H_PATH} | grep CUDNN_MAJOR -A 2)
Python version: Python 3.6.9
TensorRT version: 7.0.0 (according to dpkg -l | grep nvinfer)

@gcp

gcp commented Jan 7, 2020

Increase the verbosity of the logging. If parsing fails for any reason and you don't properly check for errors, then you will get this error (which is IMHO very misleading and could be considered a bug). There will be only an empty network, and that doesn't have outputs :)

If you check for errors you will likely find the real reason.

@santhoshnumberone
Author

santhoshnumberone commented Jan 7, 2020

@gcp Could you please be more specific?

@gcp

gcp commented Jan 7, 2020

  1. Change this https://github.com/jkjung-avt/tensorrt_demos/blob/master/yolov3_onnx/onnx_to_tensorrt.py#L59 to get VERBOSE logging.
  2. parser.parse(model.read()) does not check the return value (it returns a boolean as per docs), so if parsing fails you won't get an error. It will fail silently until it tries to find outputs in a non-existent network.
  3. https://docs.nvidia.com/deeplearning/sdk/tensorrt-api/python_api/parsers/Onnx/pyOnnx.html#tensorrt.OnnxParser.get_error
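
A minimal sketch of the error checking described in the list above, assuming the TensorRT 7.x Python API and a placeholder ONNX path:

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE)  # item 1: verbose logging
flags = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)

with trt.Builder(TRT_LOGGER) as builder, \
     builder.create_network(flags) as network, \
     trt.OnnxParser(network, TRT_LOGGER) as parser:
    with open('yolov3-416.onnx', 'rb') as model:  # placeholder path
        # item 2: parse() returns False on failure instead of raising
        if not parser.parse(model.read()):
            # item 3: query the parser for the real error messages
            for i in range(parser.num_errors):
                print(parser.get_error(i))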

@gcp

gcp commented Jan 7, 2020

If parsing actually succeeded and there's no error, check with https://lutzroeder.github.io/netron/ if your network has the outputs properly marked as outputs.
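
Not suggested in this thread, but if Netron shows that no tensor is marked as an output, a possible workaround (a sketch only, assuming the TensorRT 7.x Python API and the network object from the parsing code above) is to mark the desired outputs explicitly after parsing:

# Sketch only: mark the last layer's output if the parser left none marked.
# For YOLOv3 you would mark the three actual output convolution layers instead.
if network.num_outputs == 0:
    last_layer = network.get_layer(network.num_layers - 1)
    network.mark_output(last_layer.get_output(0))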

@santhoshnumberone
Author

santhoshnumberone commented Jan 8, 2020

1. Change this https://github.com/jkjung-avt/tensorrt_demos/blob/master/yolov3_onnx/onnx_to_tensorrt.py#L59 to get VERBOSE logging.

2. parser.parse(model.read()) does not check the return value (it returns a boolean as per docs), so if parsing fails you won't get an error. It will fail silently until it tries to find outputs in a non-existent network.

3. https://docs.nvidia.com/deeplearning/sdk/tensorrt-api/python_api/parsers/Onnx/pyOnnx.html#tensorrt.OnnxParser.get_error

I found this: Tensorrt7.0 OnnxParser Error, and updated my code accordingly:

import os

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE)
# TRT_LOGGER = trt.Logger()
network_flags = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)


def build_engine(onnx_file_path, engine_file_path=''):
    """Takes an ONNX file and creates a TensorRT engine."""
    with trt.Builder(TRT_LOGGER) as builder, builder.create_network(network_flags) as network, trt.OnnxParser(network, TRT_LOGGER) as parser:

        builder.max_workspace_size = 1 << 28
        builder.max_batch_size = 1
        builder.fp16_mode = True
        #builder.strict_type_constraints = True

        # Parse model file
        if not os.path.exists(onnx_file_path):
            print('ONNX file {} not found, please run yolov3_to_onnx.py first to generate it.'.format(onnx_file_path))
            exit(0)
        print('Loading ONNX file from path {}...'.format(onnx_file_path))
        print('Beginning ONNX file parsing')
        with open(onnx_file_path, 'rb') as model:
            # parse() returns False on failure; print the parser errors instead of failing silently
            if not parser.parse(model.read()):
                for error in range(parser.num_errors):
                    print(parser.get_error(error))
        print('Completed parsing of ONNX file')
        print('Building an engine from file {}; this may take a while...'.format(onnx_file_path))
        # build_cuda_engine() returns None if building fails
        engine = builder.build_cuda_engine(network)
        print('Completed creating Engine')
        with open(engine_file_path, 'wb') as f:
            f.write(engine.serialize())
        return engine

I get this error:

[TensorRT] VERBOSE: Formats and tactics selection completed in 260.534 seconds.
[TensorRT] VERBOSE: After reformat layers: 178 layers
[TensorRT] VERBOSE: Block size 1417674752
[TensorRT] VERBOSE: Block size 1417674752
[TensorRT] VERBOSE: Block size 708837376
[TensorRT] VERBOSE: Block size 354418688
[TensorRT] VERBOSE: Block size 268435456
[TensorRT] VERBOSE: Block size 44302336
[TensorRT] VERBOSE: Block size 44302336
[TensorRT] VERBOSE: Total Activation Memory: 4255645696
[TensorRT] INFO: Detected 1 inputs and 3 output network tensors.
[TensorRT] ERROR: ../rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
[TensorRT] ERROR: ../rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (GPU memory allocation failed during allocation of workspace. Try decreasing batch size.)
Completed creating Engine
Traceback (most recent call last):
File "onnx_to_tensorrt.py", line 114, in
main()
File "onnx_to_tensorrt.py", line 110, in main
_ = build_engine(onnx_file_path, engine_file_path)
File "onnx_to_tensorrt.py", line 96, in build_engine
f.write(engine.serialize())
AttributeError: 'NoneType' object has no attribute 'serialize'

The log above says Detected 1 inputs and 3 output network tensors, but I don't seem to find even a single output node here.

(Screenshot: the yolov3-416.onnx network viewed in https://lutzroeder.github.io/netron/)

My batch size is already 1, as seen from the above code.

How do I proceed?

@gcp

gcp commented Jan 8, 2020

Total Activation Memory: 4255645696

That's over 4G just to hold intermediate outputs. It looks like this network is simply too large for your GPU.

but I don't seem to find even a single output node here.

If you click on 106_convolutional it will likely be marked as an output. In any case, that wasn't the problem, as you found out - it does see the output when parsing the ONNX.

@santhoshnumberone
Author

santhoshnumberone commented Jan 8, 2020

Total Activation Memory: 4255645696

That's over 4G just to hold intermediate outputs. It looks like this network is simply too large for your GPU.

Just to be clear on what I understand from your comment: the GTX 1050Ti is not suitable even for converting this YOLOv3 model to TensorRT?

I was able to convert YOLOv3 to TensorRT on a Jetson Nano. There must be a workaround.

but I don't seem to find even a single output node here.

If you click on 106_convolutional it will likely be marked as an output. In any case, that wasn't the problem, as you found out - it does see the output when parsing the ONNX.

Yeah, I got three outputs.
(Screenshot: the three output nodes shown in Netron)

This seems to be a persistent issue of the GTX 1050Ti with YOLOv3. Here's a similar post for the same YOLOv3 ONNX model: [TensorRT] ERROR: Network must have at least one output

@rmccorm4
Collaborator

Hi @santhoshnumberone,

FWIW, I just ran this sample on a V100 and saw peak memory usage of roughly 1.5GB, which is less than the 1050Ti's 4GB cap. However, I believe it depends on which kernels are chosen during engine building, which in turn depends on the GPU / compute capability.

The workspace size was the same as in the script you linked (1 << 28 ~= 256MB). I don't have a 1050Ti to test on, but I'll see if I can look into the root cause a bit more.

You might try lowering the workspace size and see if that helps.
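
For reference, a sketch of what lowering the workspace size could look like inside build_engine() from the script above (the 1 << 26, roughly 64MB, value is an assumption to experiment with, not a recommendation):

# assuming builder and network from build_engine() above
builder.max_workspace_size = 1 << 26  # ~64MB instead of 1 << 28 (~256MB)
builder.max_batch_size = 1
engine = builder.build_cuda_engine(network)
if engine is None:
    print('Engine build failed; check the VERBOSE log for the actual error.')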

@cloudrivers

@santhoshnumberone Hi, did you solve this problem? I am facing the same issue.

@goldwater668

@santhoshnumberone
I am running yolov3_to_onnx.py. My environment is as follows:

TensorRT Version: 6.0.1.5

GPU Type: 2080Ti

CUDA Version: 10.0

CUDNN Version: 7.6.5

Operating System + Version: ubuntu 18.04

python3.6

pytorch1.4.0

onnx1.5.0

The sample says the rest of it can be run with either version of Python, but when I mask (comment out) the check if sys.version_info[0] > 2:, the error is as follows:

TypeError: Unicode-objects must be encoded before hashing

Could I refer to your conversion code? Thank you!
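
For what it's worth, that TypeError is standard Python 3 behaviour: hashlib accepts bytes, not str, so text read in text mode must be encoded before hashing. A minimal illustration (the variable names here are hypothetical, not taken from the sample):

import hashlib

text = 'contents read in text mode'  # a str on Python 3
# hashlib.md5(text) raises: TypeError: Unicode-objects must be encoded before hashing
digest = hashlib.md5(text.encode('utf-8')).hexdigest()  # encode to bytes first
print(digest)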

@rajeevsrao
Collaborator

@santhoshnumberone can you please indicate whether you still need help with this and whether you were able to experiment with a lower workspace size?

@rajeevsrao rajeevsrao added Step: Builder triaged Issue has been triaged by maintainers labels Oct 27, 2020
@ttyio
Collaborator

ttyio commented Jan 15, 2021

Closing since there has been no response for more than 3 weeks. Please reopen if you still have questions, thanks!

@ttyio ttyio closed this as completed Jan 15, 2021