
In TRT 10.0.1, the setPrecision and setOutputType APIs do not work #3941

Open
2730gf opened this issue Jun 13, 2024 · 6 comments

2730gf commented Jun 13, 2024

Description

We have a model that overflows when running in fp16, so we use per-layer precision constraints to force some layers to run in fp32. This worked in version 8.6, where inference produced correct results, but after upgrading to 10.0.1 we found that the model output overflows. Using Polygraphy, we found that NaN already appears at the first overflow location. (Are setPrecision and setOutputType being ignored?)
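
For reference, a Polygraphy invocation along these lines (the model path is a placeholder, and the exact flags we used may have differed) compares every layer's output against ONNX Runtime and stops at the first mismatch, which is how the NaN location can be pinned down:

polygraphy run model.onnx --trt --onnxrt --fp16 \
    --trt-outputs mark all --onnx-outputs mark all \
    --fail-fast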

Environment

TensorRT Version: 10.0.1
NVIDIA GPU: 3090 & 3080
NVIDIA Driver Version: 550
CUDA Version: 12.2

Steps To Reproduce

My code looks like this:

for (int32_t layerIdx = 0; layerIdx < network.getNbLayers(); ++layerIdx) {
    auto *layer = network.getLayer(layerIdx);
    auto const layerName = layer->getName();
    nvinfer1::DataType dataType;
    // Decide whether this layer's precision should be constrained
    if (matchLayerPrecision(layerPrecisions, layerName, &dataType)) {
        // Constrain both the compute precision and every output's type
        layer->setPrecision(dataType);
        int32_t const layerOutNb = layer->getNbOutputs();
        for (int32_t outputIdx = 0; outputIdx < layerOutNb; ++outputIdx) {
            layer->setOutputType(outputIdx, dataType);
        }
    }
}

By the way, I have already set kOBEY_PRECISION_CONSTRAINTS:
env.config_->setFlag(nvinfer1::BuilderFlag::kOBEY_PRECISION_CONSTRAINTS);

lix19937 commented Jun 15, 2024

I suggest using trtexec:

trtexec --layerOutputTypes=spec --layerPrecisions=spec --precisionConstraints=spec --fp16 --verbose --onnx=spec


2730gf commented Jun 19, 2024

I have done this, and it works on 8.6, but fails on 10.0.1:

export layer_precision="p2o.Pow.0:fp32,p2o.Pow.2:fp32..."
trtexec  --fp16 --onnx=sample.onnx --precisionConstraints="obey" --layerPrecisions=${layer_precision} --layerOutputTypes=${layer_precision}  --saveEngine=sample.trt
trtexec --loadEngine=sample.trt  --dumpOutput --loadInputs=... 

@lix19937

On TRT 10.0.1, try:

trtexec --fp16 --onnx=sample.onnx --precisionConstraints="obey" --layerPrecisions=${layer_precision} --layerOutputTypes=${layer_precision} --saveEngine=sample.trt --builderOptimizationLevel=5


2730gf commented Jun 24, 2024

I have added --builderOptimizationLevel=5, but it still overflows.

@lix19937

You can compare the tactics selected by the two versions.
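
For example (assuming a trtexec from each release is at hand; the output file names are placeholders), export the layer information from both builds with detailed profiling verbosity and diff the selected tactics:

# Build once with the TRT 8.6 trtexec...
trtexec --onnx=sample.onnx --fp16 --precisionConstraints=obey \
    --layerPrecisions=${layer_precision} --layerOutputTypes=${layer_precision} \
    --profilingVerbosity=detailed --exportLayerInfo=layers_trt86.json
# ...then repeat with the 10.0.1 trtexec and compare the tactic fields
diff layers_trt86.json layers_trt10.json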


2730gf commented Jun 26, 2024

Thank you very much for your reply. After setting builderOptimizationLevel to 5, the cache cannot be generated in TRT 8.6, but it can be in TRT 10.
In TRT 10, I can see the tactic name is sm80_xmma_gemm_f32f32_f32f32_f32_nn_n_tilesize32x32x8_stage3_warpsize1x2x1_ffma_aligna4_alignc4; judging from the name, is this already an fp32 kernel?
Is there any other way to continue narrowing down the problem?
