Inquiry about Layer Performance of FP16 #3876

Open
minhhotboy9x opened this issue May 17, 2024 · 4 comments
@minhhotboy9x

Description

Hi, I'm new to TensorRT and I'm trying to understand layer performance. I read the Optimizing for Tensor Cores section of the docs and saw that, for FP16 precision, tensor dimensions should be multiples of 8 or 16.
So I converted an ONNX model to a TensorRT engine and printed the layer information. Here is a part of it:

...
{
  "Name": "/model.2/cv1/conv/Conv + /model.2/cv1/act/Relu",
  "LayerType": "CaskConvolution",
  "Inputs": [
  {
    "Name": "/model.1/act/Relu_output_0",
    "Location": "Device",
    "Dimensions": [1,50,160,160],
    "Format/Datatype": "Channel major FP16 format where channel % 8 == 0"
  }],
  "Outputs": [
  {
    "Name": "Reformatted Output Tensor 0 to /model.2/cv1/conv/Conv + /model.2/cv1/act/Relu",
    "Location": "Device",
    "Dimensions": [1,25,160,160],
    "Format/Datatype": "Channel major FP16 format where channel % 8 == 0"
  }],
  "ParameterType": "Convolution",
  "Kernel": [1,1],
  "PaddingMode": "kEXPLICIT_ROUND_DOWN",
  "PrePadding": [0,0],
  "PostPadding": [0,0],
  "Stride": [1,1],
  "Dilation": [1,1],
  "OutMaps": 25,
  "Groups": 1,
  "Weights": {"Type": "Half", "Count": 1250},
  "Bias": {"Type": "Half", "Count": 25},
  "HasSparseWeights": 0,
  "HasDynamicFilter": 0,
  "HasDynamicBias": 0,
  "HasResidual": 0,
  "ConvXAsActInputIdx": -1,
  "BiasAsActInputIdx": -1,
  "ResAsActInputIdx": -1,
  "Activation": "RELU",
  "HasBias": 1,
  "HasReLU": 1,
  "TacticName": "sm80_xmma_fprop_implicit_gemm_f16f16_f16f16_f16_nhwckrsc_nhwc_tilesize128x32x32_stage4_warpsize4x1x1_g1_tensor16x8x16_t1r1s1",
  "TacticValue": "0xb4ed47991b2d81ae",
  "StreamId": 0,
  "Metadata": "[ONNX Layer: /model.2/cv1/conv/Conv]\u001e[ONNX Layer: /model.2/cv1/act/Relu]"
},{
  "Name": "Reformatting CopyNode for Output Tensor 0 to /model.2/cv1/conv/Conv + /model.2/cv1/act/Relu",
  "LayerType": "Reformat",
  "Inputs": [
  {
    "Name": "Reformatted Output Tensor 0 to /model.2/cv1/conv/Conv + /model.2/cv1/act/Relu",
    "Location": "Device",
    "Dimensions": [1,25,160,160],
    "Format/Datatype": "Channel major FP16 format where channel % 8 == 0"
  }],
  "Outputs": [
  {
    "Name": "/model.2/cv1/act/Relu_output_0",
    "Location": "Device",
    "Dimensions": [1,25,160,160],
    "Format/Datatype": "Channel major FP16 format where channel % 2 == 0"
  }],
  "ParameterType": "Reformat",
  "Origin": "REFORMAT",
  "TacticValue": "0x00000000000003ea",
  "StreamId": 0,
  "Metadata": ""
},{
  "Name": "/model.2/m.0/cv1/conv/Conv + /model.2/m.0/cv1/act/Relu",
  "LayerType": "CaskConvolution",
  "Inputs": [
  {
    "Name": "/model.2/cv1/act/Relu_output_0",
    "Location": "Device",
    "Dimensions": [1,25,160,160],
    "Format/Datatype": "Channel major FP16 format where channel % 2 == 0"
  }],
  "Outputs": [
  {
    "Name": "/model.2/m.0/cv1/act/Relu_output_0",
    "Location": "Device",
    "Dimensions": [1,25,160,160],
    "Format/Datatype": "Channel major FP16 format where channel % 2 == 0"
  }],
  "ParameterType": "Convolution",
  "Kernel": [3,3],
  "PaddingMode": "kEXPLICIT_ROUND_DOWN",
  "PrePadding": [1,1],
  "PostPadding": [1,1],
  "Stride": [1,1],
  "Dilation": [1,1],
  "OutMaps": 25,
  "Groups": 1,
  "Weights": {"Type": "Half", "Count": 5625},
  "Bias": {"Type": "Half", "Count": 25},
  "HasSparseWeights": 0,
  "HasDynamicFilter": 0,
  "HasDynamicBias": 0,
  "HasResidual": 0,
  "ConvXAsActInputIdx": -1,
  "BiasAsActInputIdx": -1,
  "ResAsActInputIdx": -1,
  "Activation": "RELU",
  "HasBias": 1,
  "HasReLU": 1,
  "TacticName": "sm80_xmma_fprop_implicit_gemm_indexed_wo_smem_f16f16_f16f16_f16_nhwckrsc_nhwc_tilesize128x32x64_stage1_warpsize4x1x1_g1_tensor16x8x16_aligna4_alignc4",
  "TacticValue": "0xa1c540a5038e4190",
  "StreamId": 0,
  "Metadata": "[ONNX Layer: /model.2/m.0/cv1/conv/Conv]\u001e[ONNX Layer: /model.2/m.0/cv1/act/Relu]"
}
...

I see the descriptions "Format/Datatype": "Channel major FP16 format where channel % 8 == 0" and "Format/Datatype": "Channel major FP16 format where channel % 2 == 0". I don't understand what these mean, because my channel count (25, from "Dimensions": [1,25,160,160]) is not divisible by 8. Is my model optimized?

Sorry for my bad English.
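
For reference, per-layer JSON like the block above can be dumped from a built engine with TensorRT's engine inspector, provided the engine was built with detailed profiling verbosity (as the script later in this thread does). A minimal sketch, assuming TensorRT >= 8.2 and a plain serialized engine at a hypothetical path:

import tensorrt as trt

logger = trt.Logger(trt.Logger.ERROR)
runtime = trt.Runtime(logger)

# Deserialize an engine built with ProfilingVerbosity.DETAILED;
# otherwise per-layer details are not recorded.
with open('model.engine', 'rb') as f:  # hypothetical path
    engine = runtime.deserialize_cuda_engine(f.read())

# Print every layer as JSON, similar to the snippet above
inspector = engine.create_engine_inspector()
print(inspector.get_engine_information(trt.LayerInformationFormat.JSON))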

Environment

TensorRT Version:

NVIDIA GPU:

NVIDIA Driver Version:

CUDA Version:

CUDNN Version:

Operating System:

Python Version (if applicable):

Tensorflow Version (if applicable):

PyTorch Version (if applicable):

Baremetal or Container (if so, version):

Relevant Files

Model link:

Steps To Reproduce

Commands or scripts:

Have you tried the latest release?:

Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt):

@lix19937

Could you upload the ONNX model?

@zerollzeng
Collaborator

> I see the descriptions "Format/Datatype": "Channel major FP16 format where channel % 8 == 0" and "Format/Datatype": "Channel major FP16 format where channel % 2 == 0". I don't understand what these mean, because my channel count (25, from "Dimensions": [1,25,160,160]) is not divisible by 8. Is my model optimized?

It's a vectorized format, and TensorRT will pad the tensor to the target format. You can refer to https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#data-format-desc
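
To make the padding concrete: the channel count is rounded up to the next multiple of the format's vector width, so the 25-channel tensors above occupy 32 channels in memory in a "channel % 8 == 0" format and 26 channels in a "channel % 2 == 0" format. A small sketch of the arithmetic (plain Python, nothing TensorRT-specific):

def padded_channels(c: int, vector_width: int) -> int:
    # Round a channel count up to the next multiple of the vector width
    return ((c + vector_width - 1) // vector_width) * vector_width

print(padded_channels(25, 8))  # 32 -> stored size for 'channel % 8 == 0'
print(padded_channels(25, 2))  # 26 -> stored size for 'channel % 2 == 0'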

zerollzeng self-assigned this May 26, 2024
zerollzeng added the triaged (Issue has been triaged by maintainers) label May 26, 2024
@minhhotboy9x
Author

@zerollzeng Oh, I see. However, the data format of each layer is auto-chosen for the best performance, right? When I convert on my Jetson Nano, the layers use the datatype "Two wide channel vectorized row major FP16 format" (CHW2).
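
The builder does pick formats per layer by timing the available tactics on the target GPU, which is why a Jetson Nano (whose Maxwell GPU has no Tensor Cores) can end up with CHW2 kernels. For network inputs and outputs the choice can be constrained; a minimal sketch, assuming `network` is an already-populated INetworkDefinition and FP16 is enabled in the builder config:

import tensorrt as trt

# Restrict the network input to FP16 CHW2; formats of internal
# layers are still chosen by the builder based on tactic timing.
inp = network.get_input(0)
inp.allowed_formats = 1 << int(trt.TensorFormat.CHW2)
inp.dtype = trt.float16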

@minhhotboy9x
Author

@lix19937 Here is my ONNX model: v8s_pruned. It was exported from Ultralytics, so it contains metadata. I use the Python script below to convert it:

import argparse
import os
import json
import tensorrt as trt
from datetime import datetime
import onnx
import calibration

TRT_LOGGER = trt.Logger()

def parse_args():
    parser = argparse.ArgumentParser(description='Convert ONNX models to TensorRT')
        
    # Sample image
    parser.add_argument('--batch_size', type=int, help='data batch size',
        default=1)
    parser.add_argument('--img_size', help='input size',
        default=[3, 640, 640])

    # Model path
    parser.add_argument('--onnx_model_path',  help='onnx model path',
        default='./onnx_model.onnx')
    parser.add_argument('--tensorrt_engine_path',  help='tensorrt engine path',
        default='./yolov5s_640_384_pfg_dynamic_max_batchsize_8_FP16.engine')

    # TensorRT engine params
    parser.add_argument('--dynamic_axes', help='dynamic batch input or output',
        default='True')
    parser.add_argument('--engine_precision', help='precision of TensorRT engine', choices=['FP32', 'FP16', 'INT8'], 
        default='FP16')
    parser.add_argument('--min_engine_batch_size', type=int, help='set the min input data size of model for inference', 
        default=1)
    parser.add_argument('--opt_engine_batch_size', type=int, help='set the most used input data size of model for inference', 
        default=1)
    parser.add_argument('--max_engine_batch_size', type=int, help='set the max input data size of model for inference', 
        default=1)
    parser.add_argument('--engine_workspace', type=int, help='workspace of engine', 
        default=4)
    # Optional argument for INT8 precision
    parser.add_argument('--data_calib', type=str, help='img data directory for int8 calibration', default='datasets/VOC/images/val2007')

    args = string_to_bool(parser.parse_args())

    if args.engine_precision == 'INT8' and args.data_calib is None:
        parser.error("--data_calib is required when --engine_precision is set to INT8")

    return args

def extract_metadata(onnx_model_path):
    # Load ONNX model
    model_onnx = onnx.load(onnx_model_path)

    # Extract metadata
    metadata = {}
    for prop in model_onnx.metadata_props:
        metadata[prop.key] = prop.value
    return metadata


def string_to_bool(args):
    # Interpret the --dynamic_axes string as a boolean
    args.dynamic_axes = args.dynamic_axes.lower() == 'true'
    return args


def build_engine(onnx_model_path, tensorrt_engine_path, engine_precision, dynamic_axes, \
    img_size, batch_size, min_engine_batch_size, opt_engine_batch_size, max_engine_batch_size,\
        engine_workspace, data_calib):
    metadata = extract_metadata(onnx_model_path)
    print(metadata)
    # Builder
    logger = trt.Logger(trt.Logger.ERROR)
    builder = trt.Builder(logger)
    network_flags = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)

    if engine_precision == "INT8":
        print('PTQ enabled!')
        network_flags = network_flags | (1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_PRECISION))

    network = builder.create_network(network_flags)
    
    profile = builder.create_optimization_profile()
    
    config = builder.create_builder_config()

    config.profiling_verbosity = trt.ProfilingVerbosity.DETAILED

    # Set FP16 
    if engine_precision == 'FP16':
        config.set_flag(trt.BuilderFlag.FP16)
    elif engine_precision == 'INT8':
        config.flags |= 1 << int(trt.BuilderFlag.INT8)
        config.flags |= 1 << int(trt.BuilderFlag.PREFER_PRECISION_CONSTRAINTS)
        calib_loader = calibration.DataLoader(batch_size, 128, data_calib, 640, 640)
        config.int8_calibrator = calibration.Calibrator(calib_loader, data_calib + '.cache')

    # Onnx parser
    parser = trt.OnnxParser(network, logger)

    if not os.path.exists(onnx_model_path):
        print("Failed finding ONNX file!")
        exit()
    print("Succeeded finding ONNX file!")
    with open(onnx_model_path, "rb") as model:
        if not parser.parse(model.read()):
            print("Failed parsing .onnx file!")
            for error in range(parser.num_errors):
                print(parser.get_error(error))
            exit()
        print("Succeeded parsing .onnx file!")
        
    # Input
    inputTensor = network.get_input(0) 
    # Dynamic batch (min, opt, max)
    print('inputTensor.name:', inputTensor.name)
    if dynamic_axes:
        profile.set_shape(inputTensor.name, (min_engine_batch_size, img_size[0], img_size[1], img_size[2]), \
            (opt_engine_batch_size, img_size[0], img_size[1], img_size[2]), \
            (max_engine_batch_size, img_size[0], img_size[1], img_size[2]))
        print('Set dynamic')
    else:
        profile.set_shape(inputTensor.name, (batch_size, img_size[0], img_size[1], img_size[2]), \
            (batch_size, img_size[0], img_size[1], img_size[2]), \
            (batch_size, img_size[0], img_size[1], img_size[2]))
    config.add_optimization_profile(profile)
    #network.unmark_output(network.get_output(0))
    
    # Write engine
    engineString = builder.build_serialized_network(network, config)
    if engineString is None:
        print("Failed building engine!")
        exit()
    print("Succeeded building engine!")

    # Convert the metadata dictionary to JSON and encode it
    metaString = json.dumps(metadata).encode('utf-8')
    
    # Save the engine together with the metadata to a file
    with open(tensorrt_engine_path, "wb") as f:
        # Write the metadata length
        f.write(len(metaString).to_bytes(4, byteorder='little'))
        # Write the metadata
        f.write(metaString)
        # Write the engine
        f.write(engineString)
        


def main():
    args = parse_args()    
    # Build TensorRT engine
    build_engine(args.onnx_model_path, args.tensorrt_engine_path, args.engine_precision, args.dynamic_axes, \
        args.img_size, args.batch_size, args.min_engine_batch_size, args.opt_engine_batch_size, \
        args.max_engine_batch_size, args.engine_workspace, args.data_calib)
    
   
    
if __name__ == '__main__': 
    main()
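
Since the script prepends a 4-byte little-endian length and the JSON metadata to the serialized engine, loading it back has to strip that prefix first. A minimal sketch matching the save format above (the file name is the script's default):

import json
import tensorrt as trt

logger = trt.Logger(trt.Logger.ERROR)

with open('yolov5s_640_384_pfg_dynamic_max_batchsize_8_FP16.engine', 'rb') as f:
    meta_len = int.from_bytes(f.read(4), byteorder='little')  # metadata length
    metadata = json.loads(f.read(meta_len).decode('utf-8'))   # Ultralytics metadata
    engine_bytes = f.read()                                   # serialized engine

runtime = trt.Runtime(logger)
engine = runtime.deserialize_cuda_engine(engine_bytes)
print(metadata)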
