Inquiry about Layer Performance of FP16 #3876

Open
minhhotboy9x opened this issue May 17, 2024 · 4 comments
@minhhotboy9x

Description

Hi, I'm new to TensorRT and I'm trying to understand layer performance. I read the Optimizing for Tensor Cores section of the docs and saw that, for FP16 precision, tensor dimensions should be multiples of 8 or 16.
So I converted an ONNX model to a TensorRT engine and printed the layer information. Here is a part of it:

...
{
  "Name": "/model.2/cv1/conv/Conv + /model.2/cv1/act/Relu",
  "LayerType": "CaskConvolution",
  "Inputs": [
  {
    "Name": "/model.1/act/Relu_output_0",
    "Location": "Device",
    "Dimensions": [1,50,160,160],
    "Format/Datatype": "Channel major FP16 format where channel % 8 == 0"
  }],
  "Outputs": [
  {
    "Name": "Reformatted Output Tensor 0 to /model.2/cv1/conv/Conv + /model.2/cv1/act/Relu",
    "Location": "Device",
    "Dimensions": [1,25,160,160],
    "Format/Datatype": "Channel major FP16 format where channel % 8 == 0"
  }],
  "ParameterType": "Convolution",
  "Kernel": [1,1],
  "PaddingMode": "kEXPLICIT_ROUND_DOWN",
  "PrePadding": [0,0],
  "PostPadding": [0,0],
  "Stride": [1,1],
  "Dilation": [1,1],
  "OutMaps": 25,
  "Groups": 1,
  "Weights": {"Type": "Half", "Count": 1250},
  "Bias": {"Type": "Half", "Count": 25},
  "HasSparseWeights": 0,
  "HasDynamicFilter": 0,
  "HasDynamicBias": 0,
  "HasResidual": 0,
  "ConvXAsActInputIdx": -1,
  "BiasAsActInputIdx": -1,
  "ResAsActInputIdx": -1,
  "Activation": "RELU",
  "HasBias": 1,
  "HasReLU": 1,
  "TacticName": "sm80_xmma_fprop_implicit_gemm_f16f16_f16f16_f16_nhwckrsc_nhwc_tilesize128x32x32_stage4_warpsize4x1x1_g1_tensor16x8x16_t1r1s1",
  "TacticValue": "0xb4ed47991b2d81ae",
  "StreamId": 0,
  "Metadata": "[ONNX Layer: /model.2/cv1/conv/Conv]\u001e[ONNX Layer: /model.2/cv1/act/Relu]"
},{
  "Name": "Reformatting CopyNode for Output Tensor 0 to /model.2/cv1/conv/Conv + /model.2/cv1/act/Relu",
  "LayerType": "Reformat",
  "Inputs": [
  {
    "Name": "Reformatted Output Tensor 0 to /model.2/cv1/conv/Conv + /model.2/cv1/act/Relu",
    "Location": "Device",
    "Dimensions": [1,25,160,160],
    "Format/Datatype": "Channel major FP16 format where channel % 8 == 0"
  }],
  "Outputs": [
  {
    "Name": "/model.2/cv1/act/Relu_output_0",
    "Location": "Device",
    "Dimensions": [1,25,160,160],
    "Format/Datatype": "Channel major FP16 format where channel % 2 == 0"
  }],
  "ParameterType": "Reformat",
  "Origin": "REFORMAT",
  "TacticValue": "0x00000000000003ea",
  "StreamId": 0,
  "Metadata": ""
},{
  "Name": "/model.2/m.0/cv1/conv/Conv + /model.2/m.0/cv1/act/Relu",
  "LayerType": "CaskConvolution",
  "Inputs": [
  {
    "Name": "/model.2/cv1/act/Relu_output_0",
    "Location": "Device",
    "Dimensions": [1,25,160,160],
    "Format/Datatype": "Channel major FP16 format where channel % 2 == 0"
  }],
  "Outputs": [
  {
    "Name": "/model.2/m.0/cv1/act/Relu_output_0",
    "Location": "Device",
    "Dimensions": [1,25,160,160],
    "Format/Datatype": "Channel major FP16 format where channel % 2 == 0"
  }],
  "ParameterType": "Convolution",
  "Kernel": [3,3],
  "PaddingMode": "kEXPLICIT_ROUND_DOWN",
  "PrePadding": [1,1],
  "PostPadding": [1,1],
  "Stride": [1,1],
  "Dilation": [1,1],
  "OutMaps": 25,
  "Groups": 1,
  "Weights": {"Type": "Half", "Count": 5625},
  "Bias": {"Type": "Half", "Count": 25},
  "HasSparseWeights": 0,
  "HasDynamicFilter": 0,
  "HasDynamicBias": 0,
  "HasResidual": 0,
  "ConvXAsActInputIdx": -1,
  "BiasAsActInputIdx": -1,
  "ResAsActInputIdx": -1,
  "Activation": "RELU",
  "HasBias": 1,
  "HasReLU": 1,
  "TacticName": "sm80_xmma_fprop_implicit_gemm_indexed_wo_smem_f16f16_f16f16_f16_nhwckrsc_nhwc_tilesize128x32x64_stage1_warpsize4x1x1_g1_tensor16x8x16_aligna4_alignc4",
  "TacticValue": "0xa1c540a5038e4190",
  "StreamId": 0,
  "Metadata": "[ONNX Layer: /model.2/m.0/cv1/conv/Conv]\u001e[ONNX Layer: /model.2/m.0/cv1/act/Relu]"
}
...

I see the descriptions "Format/Datatype": "Channel major FP16 format where channel % 8 == 0" and "Format/Datatype": "Channel major FP16 format where channel % 2 == 0". I don't understand what these mean, because my channel count (25, from "Dimensions": [1,25,160,160]) is not divisible by 8. Is my model optimized?

Sorry for my bad English.
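
For reference, per-layer JSON like the block above can be dumped from a built engine with TensorRT's engine inspector, provided the engine was built with detailed profiling verbosity (as the script later in this thread does). A minimal sketch, assuming TensorRT >= 8.2 and a plain serialized engine at a hypothetical path:

import tensorrt as trt

logger = trt.Logger(trt.Logger.ERROR)
runtime = trt.Runtime(logger)

# Deserialize an engine built with ProfilingVerbosity.DETAILED;
# otherwise per-layer details are not recorded.
with open('model.engine', 'rb') as f:  # hypothetical path
    engine = runtime.deserialize_cuda_engine(f.read())

# Print every layer as JSON, similar to the snippet above
inspector = engine.create_engine_inspector()
print(inspector.get_engine_information(trt.LayerInformationFormat.JSON))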

Environment

TensorRT Version:

NVIDIA GPU:

NVIDIA Driver Version:

CUDA Version:

CUDNN Version:

Operating System:

Python Version (if applicable):

Tensorflow Version (if applicable):

PyTorch Version (if applicable):

Baremetal or Container (if so, version):

Relevant Files

Model link:

Steps To Reproduce

Commands or scripts:

Have you tried the latest release?:

Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt):

@lix19937

Could you upload the ONNX model?

@zerollzeng
Collaborator

> I see the descriptions "Format/Datatype": "Channel major FP16 format where channel % 8 == 0" and "Format/Datatype": "Channel major FP16 format where channel % 2 == 0". I don't understand what these mean, because my channel count (25, from "Dimensions": [1,25,160,160]) is not divisible by 8. Is my model optimized?

It's a vectorized format, and TensorRT will pad the tensor to the target format. You can refer to https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#data-format-desc
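
To make the padding concrete: the channel count is rounded up to the next multiple of the format's vector width, so the 25-channel tensors above occupy 32 channels in memory in a "channel % 8 == 0" format and 26 channels in a "channel % 2 == 0" format. A small sketch of the arithmetic (plain Python, nothing TensorRT-specific):

def padded_channels(c: int, vector_width: int) -> int:
    # Round a channel count up to the next multiple of the vector width
    return ((c + vector_width - 1) // vector_width) * vector_width

print(padded_channels(25, 8))  # 32 -> stored size for 'channel % 8 == 0'
print(padded_channels(25, 2))  # 26 -> stored size for 'channel % 2 == 0'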

zerollzeng self-assigned this May 26, 2024
zerollzeng added the triaged (Issue has been triaged by maintainers) label May 26, 2024
@minhhotboy9x
Author

@zerollzeng Oh, I see. However, the data format of each layer is auto-chosen for the best performance, right? When I convert on my Jetson Nano, the layers use the datatype "Two wide channel vectorized row major FP16 format" (CHW2).
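
The builder does pick formats per layer by timing the available tactics on the target GPU, which is why a Jetson Nano (whose Maxwell GPU has no Tensor Cores) can end up with CHW2 kernels. For network inputs and outputs the choice can be constrained; a minimal sketch, assuming `network` is an already-populated INetworkDefinition and FP16 is enabled in the builder config:

import tensorrt as trt

# Restrict the network input to FP16 CHW2; formats of internal
# layers are still chosen by the builder based on tactic timing.
inp = network.get_input(0)
inp.allowed_formats = 1 << int(trt.TensorFormat.CHW2)
inp.dtype = trt.float16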

@minhhotboy9x
Author

@lix19937 Here is my ONNX model: v8s_pruned. It was exported from Ultralytics, so it contains metadata. I use the Python script below to convert it:

import argparse
import os
import json
import tensorrt as trt
from datetime import datetime
import onnx
import calibration

TRT_LOGGER = trt.Logger()

def parse_args():
    parser = argparse.ArgumentParser(description='Convert ONNX models to TensorRT')
        
    # Sample image
    parser.add_argument('--batch_size', type=int, help='data batch size',
        default=1)
    parser.add_argument('--img_size', help='input size',
        default=[3, 640, 640])

    # Model path
    parser.add_argument('--onnx_model_path',  help='onnx model path',
        default='./onnx_model.onnx')
    parser.add_argument('--tensorrt_engine_path',  help='tensorrt engine path',
        default='./yolov5s_640_384_pfg_dynamic_max_batchsize_8_FP16.engine')

    # TensorRT engine params
    parser.add_argument('--dynamic_axes', help='dynamic batch input or output',
        default='True')
    parser.add_argument('--engine_precision', help='precision of TensorRT engine', choices=['FP32', 'FP16', 'INT8'], 
        default='FP16')
    parser.add_argument('--min_engine_batch_size', type=int, help='set the min input data size of model for inference', 
        default=1)
    parser.add_argument('--opt_engine_batch_size', type=int, help='set the most used input data size of model for inference', 
        default=1)
    parser.add_argument('--max_engine_batch_size', type=int, help='set the max input data size of model for inference', 
        default=1)
    parser.add_argument('--engine_workspace', type=int, help='workspace of engine', 
        default=4)
    # Optional argument for INT8 precision
    parser.add_argument('--data_calib', type=str, help='img data directory for int8 calibration', default='datasets/VOC/images/val2007')

    args = string_to_bool(parser.parse_args())

    if args.engine_precision == 'INT8' and args.data_calib is None:
        parser.error("--data_calib is required when --engine_precision is set to INT8")

    return args

def extract_metadata(onnx_model_path):
    # Load ONNX model
    model_onnx = onnx.load(onnx_model_path)

    # Extract metadata
    metadata = {}
    for prop in model_onnx.metadata_props:
        metadata[prop.key] = prop.value
    return metadata


def string_to_bool(args):
    # Interpret the --dynamic_axes string as a boolean
    args.dynamic_axes = args.dynamic_axes.lower() == 'true'
    return args


def build_engine(onnx_model_path, tensorrt_engine_path, engine_precision, dynamic_axes, \
    img_size, batch_size, min_engine_batch_size, opt_engine_batch_size, max_engine_batch_size,\
        engine_workspace, data_calib):
    metadata = extract_metadata(onnx_model_path)
    print(metadata)
    # Builder
    logger = trt.Logger(trt.Logger.ERROR)
    builder = trt.Builder(logger)
    network_flags = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)

    if engine_precision == "INT8":
        print('PTQ enabled!')
        network_flags = network_flags | (1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_PRECISION))

    network = builder.create_network(network_flags)
    
    profile = builder.create_optimization_profile()
    
    config = builder.create_builder_config()

    config.profiling_verbosity = trt.ProfilingVerbosity.DETAILED

    # Set FP16 
    if engine_precision == 'FP16':
        config.set_flag(trt.BuilderFlag.FP16)
    elif engine_precision == 'INT8':
        config.flags |= 1 << int(trt.BuilderFlag.INT8)
        config.flags |= 1 << int(trt.BuilderFlag.PREFER_PRECISION_CONSTRAINTS)
        calib_loader = calibration.DataLoader(batch_size, 128, data_calib, 640, 640)
        config.int8_calibrator = calibration.Calibrator(calib_loader, data_calib + '.cache')

    # Onnx parser
    parser = trt.OnnxParser(network, logger)

    if not os.path.exists(onnx_model_path):
        print("Failed finding ONNX file!")
        exit()
    print("Succeeded finding ONNX file!")
    with open(onnx_model_path, "rb") as model:
        if not parser.parse(model.read()):
            print("Failed parsing .onnx file!")
            for error in range(parser.num_errors):
                print(parser.get_error(error))
            exit()
        print("Succeeded parsing .onnx file!")
        
    # Input
    inputTensor = network.get_input(0) 
    # Dynamic batch (min, opt, max)
    print('inputTensor.name:', inputTensor.name)
    if dynamic_axes:
        profile.set_shape(inputTensor.name, (min_engine_batch_size, img_size[0], img_size[1], img_size[2]), \
            (opt_engine_batch_size, img_size[0], img_size[1], img_size[2]), \
            (max_engine_batch_size, img_size[0], img_size[1], img_size[2]))
        print('Set dynamic')
    else:
        profile.set_shape(inputTensor.name, (batch_size, img_size[0], img_size[1], img_size[2]), \
            (batch_size, img_size[0], img_size[1], img_size[2]), \
            (batch_size, img_size[0], img_size[1], img_size[2]))
    config.add_optimization_profile(profile)
    #network.unmark_output(network.get_output(0))
    
    # Write engine
    engineString = builder.build_serialized_network(network, config)
    if engineString is None:
        print("Failed building engine!")
        exit()
    print("Succeeded building engine!")

    # Convert the metadata dictionary to JSON and encode it
    metaString = json.dumps(metadata).encode('utf-8')
    
    # Save the engine together with the metadata to a file
    with open(tensorrt_engine_path, "wb") as f:
        # Write the metadata length
        f.write(len(metaString).to_bytes(4, byteorder='little'))
        # Write the metadata
        f.write(metaString)
        # Write the engine
        f.write(engineString)
        


def main():
    args = parse_args()    
    # Build TensorRT engine
    build_engine(args.onnx_model_path, args.tensorrt_engine_path, args.engine_precision, args.dynamic_axes, \
        args.img_size, args.batch_size, args.min_engine_batch_size, args.opt_engine_batch_size, \
        args.max_engine_batch_size, args.engine_workspace, args.data_calib)
    
   
    
if __name__ == '__main__': 
    main()
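
Since the script prepends a 4-byte little-endian length and the JSON metadata to the serialized engine, loading it back has to strip that prefix first. A minimal sketch matching the save format above (the file name is the script's default):

import json
import tensorrt as trt

logger = trt.Logger(trt.Logger.ERROR)

with open('yolov5s_640_384_pfg_dynamic_max_batchsize_8_FP16.engine', 'rb') as f:
    meta_len = int.from_bytes(f.read(4), byteorder='little')  # metadata length
    metadata = json.loads(f.read(meta_len).decode('utf-8'))   # Ultralytics metadata
    engine_bytes = f.read()                                   # serialized engine

runtime = trt.Runtime(logger)
engine = runtime.deserialize_cuda_engine(engine_bytes)
print(metadata)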
