8.6.0 diffusion demo txt2img not working #2784

Vozf · 2023-03-17T16:17:56Z

Description

I am following the instructions, and at the step
python3 demo_txt2img.py "a beautiful photograph of Mt. Fuji during cherry blossom" --hf-token=$HF_TOKEN -v
I get this error:

[E] ModelImporter.cpp:726: While parsing node number 7 [LayerNormalization -> "/text_model/encoder/layers.0/layer_norm1/LayerNormalization_output_0"]:                    
[E] ModelImporter.cpp:727: --- Begin node ---                                                                                                                             
[E] ModelImporter.cpp:728: input: "/text_model/embeddings/Add_output_0"                                                                                                   
    input: "text_model.encoder.layers.0.layer_norm1.weight"                                                                                                               
    input: "text_model.encoder.layers.0.layer_norm1.bias"                                                                                                                 
    output: "/text_model/encoder/layers.0/layer_norm1/LayerNormalization_output_0"                                                                                        
    name: "/text_model/encoder/layers.0/layer_norm1/LayerNormalization"                                                                                                   
    op_type: "LayerNormalization"                                                                                                                                         
    attribute {                                                                                                                                                           
      name: "axis"                                                                                                                                                        
      i: -1                                                                                                                                                               
      type: INT                                                                                                                                                           
    }                                                                                                                                                                     
    attribute {                                                                                                                                                           
      name: "epsilon"                                                                                                                                                     
      f: 1e-05                                                                                                                                                            
      type: FLOAT                                                                                                                                                         
    }                                                                                                                                                                     
[E] ModelImporter.cpp:729: --- End node ---                                                                                                                               
[E] ModelImporter.cpp:732: ERROR: builtin_op_importers.cpp:5428 In function importFallbackPluginImporter:                                                                 
    [8] Assertion failed: creator && "Plugin not found, are the plugin name, version, and namespace correct?"                                                             
[E] In node 7 (importFallbackPluginImporter): UNSUPPORTED_NODE: Assertion failed: creator && "Plugin not found, are the plugin name, version, and namespace correct?"     
[!] Could not parse ONNX correctly

Environment

TensorRT Version:
NVIDIA GPU:
NVIDIA Driver Version:
CUDA Version:
CUDNN Version:
Operating System:
Python Version (if applicable):
Tensorflow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if so, version):

Relevant Files

Steps To Reproduce


rajeevsrao · 2023-03-18T01:06:14Z

@Vozf did you also upgrade to TensorRT 8.6.0?
python3 -c 'import tensorrt as trt;print(trt.__version__)' should give you 8.6.0.
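
The same check can be scripted so that it fails loudly before any engine is built. This is a minimal sketch, assuming the demo needs at least 8.6.0 as suggested above; `parse_version` and `meets_minimum` are hypothetical helper names, not part of TensorRT or the demo.

```python
import re


def parse_version(text):
    """Turn a version string such as '8.6.0' or '8.5.3.1' into a tuple of ints."""
    return tuple(int(part) for part in re.findall(r"\d+", text))


def meets_minimum(installed, required="8.6.0"):
    """Return True if the installed version is at least the required one."""
    return parse_version(installed) >= parse_version(required)


if __name__ == "__main__":
    try:
        import tensorrt as trt  # only needed for the live check
    except ImportError:
        print("tensorrt is not importable in this environment")
    else:
        status = "OK" if meets_minimum(trt.__version__) else "too old for this demo"
        print(f"TensorRT {trt.__version__}: {status}")
```

The required version is a parameter because the threshold here is taken from this thread, not from the demo's source.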

Vozf · 2023-03-18T08:40:04Z

Yeah, you were right, I had 8.5.3. It seems the "Optional" step in the instructions isn't so optional.
Unfortunately, after upgrading I now get the following error:

[I] Saving engine to engine/clip.plan
Building TensorRT engine for onnx/unet.opt.onnx: engine/unet.plan
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:604] Reading dangerously large protocol message.  If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons.  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:81] The total number of bytes read was 1733934759
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:604] Reading dangerously large protocol message.  If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons.  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:81] The total number of bytes read was 1733934759
[W] onnx2trt_utils.cpp:400: One or more weights outside the range of INT32 was clamped
[I]     Configuring with profiles: [Profile().add('sample', min=(2, 4, 64, 64), opt=(2, 4, 64, 64), max=(32, 4, 64, 64)).add('encoder_hidden_states', min=(2, 77, 1024), opt=(2, 77, 1024), max=(32, 77, 1024)).add('timestep', min=[1], opt=[1], max=[1])]
[I] Building engine with configuration:
    Flags                  | [FP16]
    Engine Capability      | EngineCapability.DEFAULT
    Memory Pools           | [WORKSPACE: 6934.31 MiB, TACTIC_DRAM: 11170.44 MiB]
    Tactic Sources         | []
    Profiling Verbosity    | ProfilingVerbosity.DETAILED
    Preview Features       | [DISABLE_EXTERNAL_TACTIC_SOURCES_FOR_CORE_0805]
[E] 10: Could not find any implementation for node {ForeignNode[/down_blocks.0/attentions.0/norm/Constant_1_output_0 + (Unnamed Layer* 1216) [Shuffle].../down_blocks.0/resnets.1/conv1/Cast]}.
[E] 10: [optimizer.cpp::computeCosts::3873] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[/down_blocks.0/attentions.0/norm/Constant_1_output_0 + (Unnamed Layer* 1216) [Shuffle].../down_blocks.0/resnets.1/conv1/Cast]}.)
[!] Invalid Engine. Please ensure the engine was built correctly
Traceback (most recent call last):                                                                                                                                        
  File "demo_txt2img.py", line 76, in <module>                                                                                                                            
    demo.loadEngines(args.engine_dir, args.onnx_dir, args.onnx_opset,                                                                                                     
  File "/workspace/projects/pixomatic/TensorRT/demo/Diffusion/stable_diffusion_pipeline.py", line 290, in loadEngines                                                     
    engine.build(onnx_opt_path,                                                                                                                                           
  File "/workspace/projects/pixomatic/TensorRT/demo/Diffusion/utilities.py", line 206, in build                                                                           
    engine = engine_from_network(                                                                                                                                         
  File "<string>", line 3, in engine_from_network                                                                                                                         
  File "/usr/local/lib/python3.8/dist-packages/polygraphy/backend/base/loader.py", line 42, in __call__                                                                   
    return self.call_impl(*args, **kwargs)                                                                                                                                
  File "/usr/local/lib/python3.8/dist-packages/polygraphy/backend/trt/loader.py", line 530, in call_impl                                                                  
    return engine_from_bytes(super().call_impl)                                                                                                                           
  File "<string>", line 3, in engine_from_bytes                                                                                                                           
  File "/usr/local/lib/python3.8/dist-packages/polygraphy/backend/base/loader.py", line 42, in __call__                                                                   
    return self.call_impl(*args, **kwargs)                                                                                                                                
  File "/usr/local/lib/python3.8/dist-packages/polygraphy/backend/trt/loader.py", line 554, in call_impl                                                                  
    buffer, owns_buffer = util.invoke_if_callable(self._serialized_engine)                                                                                                
  File "/usr/local/lib/python3.8/dist-packages/polygraphy/util/util.py", line 661, in invoke_if_callable                                                                  
    ret = func(*args, **kwargs)                                                                                                                                           
  File "/usr/local/lib/python3.8/dist-packages/polygraphy/backend/trt/loader.py", line 488, in call_impl                                                                  
    G_LOGGER.critical("Invalid Engine. Please ensure the engine was built correctly")                                                                                     
  File "/usr/local/lib/python3.8/dist-packages/polygraphy/logger/logger.py", line 597, in critical                                                                        
    raise PolygraphyException(message) from None                                                                                                                          
polygraphy.exception.exception.PolygraphyException: Invalid Engine. Please ensure the engine was built correctly
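
When comparing reports like this one, it helps to pull just the failing node out of a long build log. The following is a minimal sketch, assuming the log uses the "Could not find any implementation for node {...}" wording shown above; `failing_nodes` is a hypothetical helper, not part of TensorRT or Polygraphy.

```python
import re

# Captures the text between the braces that follow the "no implementation" message.
_NO_IMPL = re.compile(r"Could not find any implementation for node \{(?P<node>[^{}]+)\}")


def failing_nodes(log_text):
    """Return the distinct node descriptions named in 'no implementation' errors."""
    seen = []
    for match in _NO_IMPL.finditer(log_text):
        node = match.group("node")
        if node not in seen:  # the same node is usually reported twice
            seen.append(node)
    return seen


if __name__ == "__main__":
    sample = (
        "[E] 10: Could not find any implementation for node {ForeignNode[a + b]}.\n"
        "[E] 10: [optimizer.cpp::computeCosts::3873] Error Code 10: Internal Error "
        "(Could not find any implementation for node {ForeignNode[a + b]}.)\n"
    )
    print(failing_nodes(sample))
```

The node text in the sample is a placeholder; real logs contain the fused layer names as TensorRT prints them, including its own "..." elision.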

aredden · 2023-03-22T00:24:05Z

I had a similar error-

[I]     Configuring with profiles: [Profile().add('sample', min=(2, 4, 32, 32), opt=(2, 4, 80, 64), max=(2, 4, 192, 192)).add('encoder_hidden_states', min=(2, 77, 768), opt=(2, 77, 768), max=(2, 77, 768)).add('timestep', min=[1], opt=[1], max=[1])]
[I] Building engine with configuration:
    Flags                  | [FP16, REFIT]
    Engine Capability      | EngineCapability.DEFAULT
    Memory Pools           | [WORKSPACE: 19704.19 MiB, TACTIC_DRAM: 24217.31 MiB]
    Tactic Sources         | [CUBLAS, CUBLAS_LT, CUDNN, EDGE_MASK_CONVOLUTIONS, JIT_CONVOLUTIONS]
    Profiling Verbosity    | ProfilingVerbosity.DETAILED
    Preview Features       | [FASTER_DYNAMIC_SHAPES_0805, DISABLE_EXTERNAL_TACTIC_SOURCES_FOR_CORE_0805]
[E] 10: Could not find any implementation for node {ForeignNode[onnx::LayerNormalization_9032 + (Unnamed Layer* 1211) [Shuffle].../down_blocks.0/attentions.0/Reshape_1 + /down_blocks.0/attentions.0/Transpose_1]}.
[E] 10: [optimizer.cpp::computeCosts::3873] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[onnx::LayerNormalization_9032 + (Unnamed Layer* 1211) [Shuffle].../down_blocks.0/attentions.0/Reshape_1 + /down_blocks.0/attentions.0/Transpose_1]}.)
[!] Invalid Engine. Please ensure the engine was built correctly

I also noticed that the flash attention plugins seem to have been removed? Also, with this version, since it seems no extra plugins are being added, instead of the UNet getting down to "UNet: final .. 1082 nodes, 2037 tensors, 3 inputs, 1 outputs" it only gets to "4016 nodes, 6732 tensors, 3 inputs, 1 outputs". Is this a result of trying to get it functional for the other versions of Stable Diffusion? Sacrificing performance for flexibility?
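
For readers decoding the 'sample' profile in the log above: Stable Diffusion's UNet works on latents at 1/8 of the image resolution, so the logged shapes can be mapped back to image sizes. This is a minimal sketch under that assumption; `image_size_from_latent` is a hypothetical helper and is not taken from the demo code.

```python
VAE_DOWNSCALE = 8  # Stable Diffusion's VAE shrinks each spatial dimension by 8x


def image_size_from_latent(sample_shape):
    """Map a UNet 'sample' shape (batch, channels, h, w) to an image (height, width) in pixels."""
    _, _, latent_h, latent_w = sample_shape
    return (latent_h * VAE_DOWNSCALE, latent_w * VAE_DOWNSCALE)


# Shapes copied from the 'sample' entry of the logged profile.
profile = {"min": (2, 4, 32, 32), "opt": (2, 4, 80, 64), "max": (2, 4, 192, 192)}

for name, shape in profile.items():
    height, width = image_size_from_latent(shape)
    print(f"{name}: latent {shape[2]}x{shape[3]} -> image {height}x{width} px")
```

Under this assumption the engine was being built for images from 256x256 up to 1536x1536, with 640x512 as the optimization target.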

rajeevsrao · 2023-03-22T00:32:20Z

@aredden @Vozf please share the Python commands you used.

@aredden It looks like REFIT is enabled in your case?

> Is this a result of trying to get it functional for the other versions of Stable Diffusion? Sacrificing performance for flexibility?

The increase in nodes is expected if we don't use plugins; however, they will be fused back into fMHA ops by the TensorRT optimizer. Plugins, as you note, are also not very flexible and support fewer SD versions and GPU targets than the TensorRT out-of-box solution.
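
To put the two graph summaries quoted earlier in proportion, the counts can be compared directly. This is a small illustrative sketch; `parse_graph_summary` and `growth` are hypothetical helpers, and the numbers describe the ONNX graph before TensorRT applies its own fusions, so they say nothing about the size of the built engine.

```python
import re


def parse_graph_summary(summary):
    """Parse a summary such as '1082 nodes, 2037 tensors, 3 inputs, 1 outputs'."""
    return {name: int(count) for count, name in re.findall(r"(\d+) (\w+)", summary)}


def growth(before, after):
    """Ratio after/before for every field present in both summaries."""
    return {key: after[key] / before[key] for key in before if key in after}


with_plugins = parse_graph_summary("1082 nodes, 2037 tensors, 3 inputs, 1 outputs")
without_plugins = parse_graph_summary("4016 nodes, 6732 tensors, 3 inputs, 1 outputs")

for key, ratio in growth(with_plugins, without_plugins).items():
    print(f"{key}: {with_plugins[key]} -> {without_plugins[key]} ({ratio:.2f}x)")
```

The plugin-free export has roughly 3.7 times as many nodes and 3.3 times as many tensors, with the same inputs and outputs.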

aredden · 2023-03-22T01:28:51Z

I was trying refit to see what it would be like. Maybe that was incorrect usage? Also, my environment was exactly the one described in the stable-diffusion demo README.md: the docker container, with requirements installed, to a T. @rajeevsrao One thing I noticed is that the TensorRT version inside the container is 8.5.3, whereas the optional TensorRT version I installed is 8.6.0. Maybe that caused some issue? The GPU is a 4090, with CUDA 12.1 outside the container.
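
One way to look for this kind of mix-up is to list every TensorRT-related Python package that pip knows about and see whether the versions agree. This is a minimal sketch using only the standard library; the helper names are hypothetical, and it only sees Python distributions, so system libraries installed by other means (for example inside a container image) would not show up.

```python
from importlib import metadata


def installed_tensorrt_packages():
    """Map every installed distribution whose name starts with 'tensorrt' to its version."""
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"] or ""
        if name.lower().startswith("tensorrt"):
            found[name] = dist.version
    return found


def has_version_conflict(packages):
    """True if the given name -> version mapping contains more than one distinct version."""
    return len(set(packages.values())) > 1


if __name__ == "__main__":
    packages = installed_tensorrt_packages()
    for name, version in sorted(packages.items()):
        print(f"{name}=={version}")
    if has_version_conflict(packages):
        print("Warning: TensorRT packages report different versions")
```

An empty result does not prove TensorRT is absent, only that no matching Python distribution is registered in the current environment.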

aredden · 2023-03-22T01:37:18Z

As for the command, I was using this script. I added some arguments to modify the max latent dimensions for the UNet and to have the PyTorch model pulled from a local custom diffusers checkpoint path, something I had been doing with the previous version of TensorRT.

#!/bin/sh
CUDA_VISIBLE_DEVICES=0 CUDA_MODULE_LOADING=LAZY python3 demo_txt2img.py \
    --negative-prompt "((Horribly blurred)), very ugly, (jpeg artifacts, blurry, gross), messy, warped, split, bad anatomy, malformed body, malformed, warped, fake, 3d, drawn, hideous, disgusting" \
    --denoising-steps 50 \
    --scheduler DPM \
    --width 512 \
    --height 640 \
    --engine-dir engine \
    --onnx-dir onnx \
    --force-onnx-export \
    --force-engine-build \
    --force-onnx-optimize \
    --build-preview-features \
    --build-enable-refit \
    --build-all-tactics \
    --build-static-batch \
    --build-dynamic-shape \
    --max-size 1536 \
    --model-path ./oranjipiratejaydos \
    -v \
    "(Stunningly beautiful detailed) lush futuristic (eutopian paradise cyberpunk cityscape) landscape, intricate, elegant, mountains and very high waterfall background, volumetric lighting"

edit: Interesting, the error seems to have gone away after I built a container from source with TensorRT 8.6 and CUDA 12 using the basic demo script. It might be that I had some code errors, or that larger dynamic shapes don't work very well. The error could also be related to having two different TensorRT binary versions in the previous container, which had 8.5.3 and was then updated to 8.6 via pip. Not sure.

edit 2: It compiles, but the output images are all black.

edit 3: The black images were the result of a faulty TensorRT-compiled CLIP model for some reason 🤔. I didn't change any code whatsoever, so I'm not sure why that would happen. The speed of inference, though, is about 1/3 of what it was with 8.5.3.

edit 4: Alright, I built from source and that shaved off quite a bit of time, from ~1600 ms / 50 UNet passes to about 1100 ms / 50 UNet passes. That is still considerably slower than 8.5.3, which gets closer to ~580 ms / 50.
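
For comparison, the approximate totals above can be turned into per-pass latencies and slowdown factors. This is only arithmetic on the rounded figures reported in this comment; the labels and the `per_pass_ms` helper are illustrative.

```python
def per_pass_ms(total_ms, passes=50):
    """Average latency of a single UNet pass, given the total for `passes` passes."""
    return total_ms / passes


# Approximate totals (ms per 50 UNet passes) as reported above.
timings = {
    "8.6.0, before building from source": 1600,
    "8.6.0, after building from source": 1100,
    "8.5.3": 580,
}
baseline = timings["8.5.3"]

for label, total in timings.items():
    print(f"{label}: {per_pass_ms(total):.1f} ms/pass, {total / baseline:.2f}x the 8.5.3 time")
```

On these numbers the 8.6.0 build is still about 1.9 times slower than 8.5.3 after building from source, down from about 2.8 times.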

Vozf · 2023-03-22T08:06:32Z

I'm going step by step through the diffusion README.
The error occurs at this step:
python3 demo_txt2img.py "a beautiful photograph of Mt. Fuji during cherry blossom" --hf-token=$HF_TOKEN -v

chavinlo · 2023-03-28T04:39:45Z

> I had a similar error-
>
> [I]     Configuring with profiles: [Profile().add('sample', min=(2, 4, 32, 32), opt=(2, 4, 80, 64), max=(2, 4, 192, 192)).add('encoder_hidden_states', min=(2, 77, 768), opt=(2, 77, 768), max=(2, 77, 768)).add('timestep', min=[1], opt=[1], max=[1])]
> [I] Building engine with configuration:
>     Flags                  | [FP16, REFIT]
>     Engine Capability      | EngineCapability.DEFAULT
>     Memory Pools           | [WORKSPACE: 19704.19 MiB, TACTIC_DRAM: 24217.31 MiB]
>     Tactic Sources         | [CUBLAS, CUBLAS_LT, CUDNN, EDGE_MASK_CONVOLUTIONS, JIT_CONVOLUTIONS]
>     Profiling Verbosity    | ProfilingVerbosity.DETAILED
>     Preview Features       | [FASTER_DYNAMIC_SHAPES_0805, DISABLE_EXTERNAL_TACTIC_SOURCES_FOR_CORE_0805]
> [E] 10: Could not find any implementation for node {ForeignNode[onnx::LayerNormalization_9032 + (Unnamed Layer* 1211) [Shuffle].../down_blocks.0/attentions.0/Reshape_1 + /down_blocks.0/attentions.0/Transpose_1]}.
> [E] 10: [optimizer.cpp::computeCosts::3873] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[onnx::LayerNormalization_9032 + (Unnamed Layer* 1211) [Shuffle].../down_blocks.0/attentions.0/Reshape_1 + /down_blocks.0/attentions.0/Transpose_1]}.)
> [!] Invalid Engine. Please ensure the engine was built correctly
>
> I also noticed that the flash attention plugins seem to have been removed? Also, with this version, since it seems no extra plugins are being added, instead of the UNet getting down to "UNet: final .. 1082 nodes, 2037 tensors, 3 inputs, 1 outputs" it only gets to "4016 nodes, 6732 tensors, 3 inputs, 1 outputs". Is this a result of trying to get it functional for the other versions of Stable Diffusion? Sacrificing performance for flexibility?

Had the same issue, Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[onnx::LayerNormalization

Tried everything, verifying torch, reinstalling dependencies, compiling the plugins, nothing worked except adding the --build-preview-features flag.

There was some warning that mentioned that enabling this would prevent issues... so I guess that's it....

chavinlo · 2023-03-28T04:53:46Z

> Had the same issue, Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[onnx::LayerNormalization
>
> Tried everything, verifying torch, reinstalling dependencies, compiling the plugins, nothing worked except adding the --build-preview-features flag.
>
> There was some warning that mentioned that enabling this would prevent issues... so I guess that's it....

Can confirm it works, although yes, this is wayyyy slower than before

chavinlo · 2023-03-28T05:09:58Z

> The increase in nodes is expected if we don't use plugins; however, they will be fused back into fMHA ops by the TensorRT optimizer. Plugins, as you note, are also not very flexible and support fewer SD versions and GPU targets than the TensorRT out-of-box solution.

@rajeevsrao is there a way to accelerate it in the current state? By "GPU target", do you mean compiling the plugins for specific architectures? Would that help?

Vozf · 2023-03-29T14:03:33Z

I've managed to run the demo until it was somehow killed at the UNet stage, although I had to manually install torch 1.13, because torch 2.0 was installed by default since torch isn't in the requirements file. torch==1.13 must be added to the requirements. torch 2.0 results in an error from the start.
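
Until the requirements file pins torch, a guard like the one below can catch the wrong version before the demo starts. This is a minimal sketch based only on the report above that 1.13 works and 2.0 does not; both helper names are hypothetical.

```python
import re
from importlib import metadata


def is_torch_1_13(version):
    """True for 1.13.x version strings, including local tags such as '1.13.1+cu117'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    return bool(match) and (int(match.group(1)), int(match.group(2))) == (1, 13)


def installed_torch_version():
    """Return the installed torch version string, or None if torch is not installed."""
    try:
        return metadata.version("torch")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_torch_version()
    if version is None:
        print("torch is not installed")
    elif is_torch_1_13(version):
        print(f"torch {version}: matches the version reported to work")
    else:
        print(f"torch {version}: differs from 1.13, which was reported to work")
```

Reading the version from package metadata avoids importing torch itself, which keeps the check fast.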

chavinlo · 2023-03-30T01:49:13Z

man can't even replicate what I did yesterday
damn yall really broke it this time

skirsten · 2023-04-14T18:58:48Z

I am also getting the

Could not find any implementation for node {ForeignNode[down_blocks.0.attentions.0.transformer_blocks.0.norm1.weight + (Unnamed Layer* 1363) [Shuffle].../down_blocks.0/attentions.0/Reshape_1 + /down_blocks.0/attentions.0/Transpose_1]} [profile 1].

with this config on the normal text2img:

[I] Building engine with configuration:
    Flags                  | [FP16, REFIT]
    Engine Capability      | EngineCapability.DEFAULT
    Memory Pools           | [WORKSPACE: 20480.00 MiB, TACTIC_DRAM: 24259.69 MiB]
    Tactic Sources         | []
    Profiling Verbosity    | ProfilingVerbosity.DETAILED
    Optimization Profiles  | 2 profile(s)
    Preview Features       | [FASTER_DYNAMIC_SHAPES_0805, DISABLE_EXTERNAL_TACTIC_SOURCES_FOR_CORE_0805]

I was using the version_compatible build before without refit and everything was fine 😞.
It's a shame that I cannot build an engine with refit AND version_compatible to use with the lean runtime.
I also tested the normal build and that also works.

So it has to do with refit, which does not make any sense...
Now I am stuck having to ship gigabytes of unused dependencies, and the build is magically failing without any reason...

rajeevsrao added the triaged and Demo: Diffusion labels Mar 18, 2023

Vozf mentioned this issue Mar 20, 2023

release/8.5 demo diffusion not working #2789

Closed

Vozf mentioned this issue Mar 29, 2023

error occurs when running stable diffusion demo on V100 16G #2826

Open
