Reuse engine for multiple consecutive runs #3868

Open
KyriaAnnwyn opened this issue May 15, 2024 · 7 comments
Labels: triaged (Issue has been triaged by maintainers)

KyriaAnnwyn commented May 15, 2024

Description

I build engines for SDXL, then init the pipeline and do several runs. The first run gives a good picture, but the second run gives an all-grey image.

I've added ControlNet and an IP-Adapter to the original code.
The init function in my code:

def init():
    pipe = StableDiffusionPipeline(
        pipeline_type=PIPELINE_TYPE.XL_IP,
        ip_adapter_scale=[0.8],
        ip_adapter_path=[IP_ADAPTER_CKPT],
        custom_controlnet_path=CONTROLNET_CKPT,
        **kwargs_init_pipeline)

    pipe.loadEngines(
        ENGINE_DIR,
        PYTORCH_MODEL,
        ONNX_DIR,
        **kwargs_load_engine)

    _, shared_device_memory = cudart.cudaMalloc(pipe.calculateMaxDeviceMemory())
    pipe.activateEngines(shared_device_memory)

Then I have a generate function, which uses the loaded engines to generate images:

def generate(out_fpath):
    ...
    controlnet_scale = [0.8]
    controlnet_scale = torch.FloatTensor(controlnet_scale)
    demo_kwargs = {'input_image': input_images, 'controlnet_scales': controlnet_scale, 'image_embeds': img_emb}
    args_run_demo = (out_fpath, [prompt], [negative_prompt], SIZE[1], SIZE[0], 1, 1, 0, False)
    pipe.loadResourcesAllocBuf(SIZE[1], SIZE[0], 1, None)
    pipe.run(*args_run_demo, **demo_kwargs)

In main:

    generate('pic1.png')
    generate('pic2.png')

pic1.png is good, but pic2.png is all grey

It seems I have to reinitialize something to get a correct result, but I currently don't do that anywhere.

In the original code, pipe.teardown() is called after generation, but this deletes the engines; to make another call we would need to load them again, so fast inference would not be possible.
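
For illustration, this is roughly how I would like to use it (sketch only; it assumes pipe is kept global or returned from init), tearing the engines down only once at the end:

def main():
    init()                  # build/load engines once (pipe assumed accessible here)
    generate('pic1.png')    # first run: good image
    generate('pic2.png')    # second run: currently comes out all grey
    pipe.teardown()         # release the engines only after all runs are done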

Please help me solve this problem.

Environment

TensorRT Version: 10.0.0b6

NVIDIA GPU: A100

NVIDIA Driver Version: 550.54.15

CUDA Version: 11.8

CUDNN Version:

Operating System: Ubuntu 20.04

Python Version (if applicable): 3.10

TensorFlow Version (if applicable): -

PyTorch Version (if applicable): 2.2.1

Baremetal or Container (if so, version): -

zerollzeng (Collaborator) commented

Are you referring to our Stable-Diffusion sample?

zerollzeng self-assigned this May 17, 2024
zerollzeng added the triaged label May 17, 2024
KyriaAnnwyn (Author) commented

Yes, I'm using the Stable Diffusion pipeline from the samples in the demo folder.

KyriaAnnwyn (Author) commented

The parameters used to init the pipeline and build the engines were the following; a small sanity check for the dynamic-shape setting is sketched after the configs:

SIZE = (1280, 1280)

max_batch_size = 1
kwargs_init_pipeline = {
    'version': "xl-1.0",
    'max_batch_size': max_batch_size,
    'denoising_steps': NUM_STEPS,
    'scheduler': None,
    'guidance_scale': 7.0,
    'output_dir': None,
    'hf_token': None,
    'verbose': True,
    'nvtx_profile': False,
    'use_cuda_graph': True,
    'lora_scale': None,
    'lora_path': None,
    'framework_model_dir': PYTORCH_MODEL,
    'torch_inference': False,
}

kwargs_load_engine = {
    'onnx_opset': 19,
    'opt_batch_size': 1,
    'opt_image_height': SIZE[1],
    'opt_image_width': SIZE[0],
    'static_batch': True,
    'static_shape': False,
    'enable_all_tactics': False,
    'enable_refit': False,
    'timing_cache': None,
    'int8': False,
    'quantization_level': 3.0,
}
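
Since static_shape is False here, the engines should be built with dynamic height/width. A small sanity-check sketch (assuming access to one of the deserialized TensorRT engines, e.g. pipe.engine[model_name].engine) to confirm the spatial dims really are dynamic (-1):

import tensorrt as trt

def print_dynamic_inputs(engine: trt.ICudaEngine):
    # List input tensors whose build-time shape contains -1 (dynamic dims)
    for i in range(engine.num_io_tensors):
        name = engine.get_tensor_name(i)
        if engine.get_tensor_mode(name) == trt.TensorIOMode.INPUT:
            shape = tuple(engine.get_tensor_shape(name))
            if -1 in shape:
                print(f"{name}: dynamic shape {shape}")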

KyriaAnnwyn (Author) commented

It seems that something is wrong with loadResources.

To be sure that my enhancements don't add any mess, I switched to simple txt2img:

Init like this:

SIZE = (1024, 1024)

def _init() -> None:
    global _base

    _base = StableDiffusionPipeline(
        pipeline_type=PIPELINE_TYPE.XL_BASE,
        vae_scaling_factor=0.13025,
        return_latents=False,
        **kwargs_init_pipeline)

    _base.loadEngines(
        ENGINE_DIR,
        PYTORCH_MODEL,
        ONNX_DIR,
        **kwargs_load_engine)

    _, shared_device_memory = cudart.cudaMalloc(_base.calculateMaxDeviceMemory())
    _base.activateEngines(shared_device_memory)
    _base.loadResources(SIZE[1], SIZE[0], 1, None)

Then generate with this function:

SIZE2 = (768, 768)
async def gen_t2p(prompt):
    global _base

    height = SIZE2[1]
    width = SIZE2[0]
    _base.loadResources(height, width, 1, None)

    negative_prompt = GENERAL_NEGATIVE_PROMPT_POSTFIX

    images, time_base = _base.infer([prompt], [negative_prompt], height, width, warmup=False)

Here I use SIZE2 because I want to test that I can change the input size (I set static_shape to False when the engine was built). And I get incorrect images on the second call to gen_t2p.

If I comment out the line
_base.loadResources(height, width, 1, None)
and generate images at the original size SIZE, I get good images on both the first and second calls to this function.
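
For clarity, this is the variant that works (sketch; gen_t2p_fixed is just an illustrative name), generating at the original SIZE and skipping the extra loadResources call:

async def gen_t2p_fixed(prompt):
    global _base

    # Reuse the buffers allocated in _init; no second loadResources() call here
    height, width = SIZE[1], SIZE[0]
    negative_prompt = GENERAL_NEGATIVE_PROMPT_POSTFIX
    images, time_base = _base.infer([prompt], [negative_prompt], height, width, warmup=False)
    return images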

I've also tried making a new function that only allocates memory for the new sizes, without the operations on the events and stream, but the result was the same:

def reloadResources(self, image_height, image_width, batch_size, seed):
    # Allocate TensorRT I/O buffers
    if not self.torch_inference:
        for model_name, obj in self.models.items():
            self.engine[model_name].allocate_buffers(shape_dict=obj.get_shape_dict(batch_size, image_height, image_width), device=self.device)
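
For reference, this is roughly how I call it (sketch; it assumes reloadResources was added as a method on the pipeline), so buffers are only reallocated when the requested size actually changes:

_last_size = None

def ensure_buffers(height, width, batch_size=1):
    # Reallocate TensorRT I/O buffers only when the requested resolution changes
    global _last_size
    if _last_size != (height, width, batch_size):
        _base.reloadResources(height, width, batch_size, None)
        _last_size = (height, width, batch_size)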

KyriaAnnwyn (Author) commented

I figured out that these lines in the code lead to the error:

tensor = torch.empty(tuple(shape), dtype=numpy_to_torch_dtype_dict[dtype]).to(device=device)
self.tensors[name] = tensor

In the function:

def allocate_buffers(self, shape_dict=None, device='cuda'):
    for binding in range(self.engine.num_io_tensors):
        name = self.engine.get_tensor_name(binding)
        print(f"tensor name = {name}")
        if shape_dict and name in shape_dict:
            shape = shape_dict[name]
        else:
            shape = self.engine.get_tensor_shape(name)
        dtype = trt.nptype(self.engine.get_tensor_dtype(name))
        if self.engine.get_tensor_mode(name) == trt.TensorIOMode.INPUT:
            self.context.set_input_shape(name, shape)
        tensor = torch.empty(tuple(shape), dtype=numpy_to_torch_dtype_dict[dtype]).to(device=device)
        self.tensors[name] = tensor

When I run it after the first inference, I get NaNs and Infs in some places in the engines' output vectors.
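
I check the outputs after a run with a helper along these lines (sketch; it relies on the self.tensors dict that allocate_buffers fills, as shown above):

import torch

def report_bad_values(engine_wrapper, label=''):
    # Scan a wrapped engine's I/O tensors for NaN/Inf after inference
    for name, tensor in engine_wrapper.tensors.items():
        if tensor.is_floating_point() and (torch.isnan(tensor).any() or torch.isinf(tensor).any()):
            print(f"{label} {name}: contains NaN/Inf values")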

Maybe the problem is also in how the tensor addresses are set at inference:

    def infer(self, feed_dict, stream, use_cuda_graph=False):

        for name, buf in feed_dict.items():
            self.tensors[name].copy_(buf)

        for name, tensor in self.tensors.items():
            self.context.set_tensor_address(name, tensor.data_ptr())
...
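
To rule out a shape/address mismatch, a check along these lines (sketch, using the same wrapper attributes as above) could compare what the execution context expects with the buffers that get bound:

def check_bindings(engine_wrapper):
    # Compare the shapes the execution context reports with the allocated buffers
    eng, ctx = engine_wrapper.engine, engine_wrapper.context
    for i in range(eng.num_io_tensors):
        name = eng.get_tensor_name(i)
        expected = tuple(ctx.get_tensor_shape(name))
        actual = tuple(engine_wrapper.tensors[name].shape)
        if expected != actual:
            print(f"{name}: context expects {expected}, buffer has {actual}")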

KyriaAnnwyn (Author) commented

@zerollzeng Hello, if you need code to reproduce this, I can share it.

Sundragon1993 commented

@KyriaAnnwyn Hello, are you using StableDiffusionPipeline from the diffusers library or from TensorRT's implementation? I'm wondering if you could share the modified code that integrates the IP-Adapter?
