
In TRT 10.0.1, the setPrecision and setOutputType APIs do not work #3941

Open
2730gf opened this issue Jun 13, 2024 · 6 comments

2730gf commented Jun 13, 2024

Description

We have a model that overflows when running in fp16, so we use per-layer precision constraints to force some layers to run in fp32. This worked in version 8.6, where inference produced correct results, but after upgrading to 10.0.1 we found that the model output overflows. Using Polygraphy, we found that NaN already appears at the first overflow location. (Are setPrecision and setOutputType being ignored?)
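
For reference, a Polygraphy invocation along these lines (the model path is a placeholder, and the exact flags we used may have differed) compares every layer's output against ONNX Runtime and stops at the first mismatch, which is how the NaN location can be pinned down:

polygraphy run model.onnx --trt --onnxrt --fp16 \
    --trt-outputs mark all --onnx-outputs mark all \
    --fail-fast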

Environment

TensorRT Version: 10.0.1
NVIDIA GPU: 3090 & 3080
NVIDIA Driver Version: 550
CUDA Version: 12.2

Steps To Reproduce

My code looks like this:

for (int32_t layerIdx = 0; layerIdx < network.getNbLayers(); ++layerIdx) {
    auto *layer = network.getLayer(layerIdx);
    auto const layerName = layer->getName();
    nvinfer1::DataType dataType;
    // Decide whether this layer's precision should be constrained
    if (matchLayerPrecision(layerPrecisions, layerName, &dataType)) {
        // Constrain both the compute precision and every output's type
        layer->setPrecision(dataType);
        int32_t const layerOutNb = layer->getNbOutputs();
        for (int32_t outputIdx = 0; outputIdx < layerOutNb; ++outputIdx) {
            layer->setOutputType(outputIdx, dataType);
        }
    }
}

By the way, I have already set kOBEY_PRECISION_CONSTRAINTS:
env.config_->setFlag(nvinfer1::BuilderFlag::kOBEY_PRECISION_CONSTRAINTS);

lix19937 commented Jun 15, 2024

I suggest using trtexec:

trtexec --layerOutputTypes=spec --layerPrecisions=spec --precisionConstraints=spec --fp16 --verbose --onnx=spec


2730gf commented Jun 19, 2024

I have done this, and it works on 8.6, but fails on 10.0.1:

export layer_precision="p2o.Pow.0:fp32,p2o.Pow.2:fp32..."
trtexec  --fp16 --onnx=sample.onnx --precisionConstraints="obey" --layerPrecisions=${layer_precision} --layerOutputTypes=${layer_precision}  --saveEngine=sample.trt
trtexec --loadEngine=sample.trt  --dumpOutput --loadInputs=... 

@lix19937

On TRT 10.0.1, try:

trtexec --fp16 --onnx=sample.onnx --precisionConstraints="obey" --layerPrecisions=${layer_precision} --layerOutputTypes=${layer_precision} --saveEngine=sample.trt --builderOptimizationLevel=5


2730gf commented Jun 24, 2024

I have added --builderOptimizationLevel=5, but it still overflows.

@lix19937

You can compare the tactics selected by the two versions.
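
For example (assuming a trtexec from each release is at hand; the output file names are placeholders), export the layer information from both builds with detailed profiling verbosity and diff the selected tactics:

# Build once with the TRT 8.6 trtexec...
trtexec --onnx=sample.onnx --fp16 --precisionConstraints=obey \
    --layerPrecisions=${layer_precision} --layerOutputTypes=${layer_precision} \
    --profilingVerbosity=detailed --exportLayerInfo=layers_trt86.json
# ...then repeat with the 10.0.1 trtexec and compare the tactic fields
diff layers_trt86.json layers_trt10.json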


2730gf commented Jun 26, 2024

Thank you very much for your reply. After setting builderOptimizationLevel to 5, the cache cannot be generated in TRT 8.6, but it can be in TRT 10.
In TRT 10, I can see the tactic name is sm80_xmma_gemm_f32f32_f32f32_f32_nn_n_tilesize32x32x8_stage3_warpsize1x2x1_ffma_aligna4_alignc4; judging from the name, is this already an fp32 kernel?
Is there any other way to continue narrowing down the problem?
