Reuse engine for multiple consecutive runs #3868

Open
KyriaAnnwyn opened this issue May 15, 2024 · 7 comments
Labels: triaged (Issue has been triaged by maintainers)

KyriaAnnwyn commented May 15, 2024

Description

I build engines for SDXL, then init the pipeline and do several runs. The first run gives a good picture, but the second run gives an all-grey image.

I've added ControlNet and an IP-Adapter to the original code.
The init function in my code:

def init():
    pipe = StableDiffusionPipeline(
        pipeline_type=PIPELINE_TYPE.XL_IP,
        ip_adapter_scale=[0.8],
        ip_adapter_path=[IP_ADAPTER_CKPT],
        custom_controlnet_path=CONTROLNET_CKPT,
        **kwargs_init_pipeline)

    pipe.loadEngines(
        ENGINE_DIR,
        PYTORCH_MODEL,
        ONNX_DIR,
        **kwargs_load_engine)

    _, shared_device_memory = cudart.cudaMalloc(pipe.calculateMaxDeviceMemory())
    pipe.activateEngines(shared_device_memory)

Then I have a generate function, which uses the loaded engines to generate images:

def generate(out_fpath):
    ...
    controlnet_scale = [0.8]
    controlnet_scale = torch.FloatTensor(controlnet_scale)
    demo_kwargs = {'input_image': input_images, 'controlnet_scales': controlnet_scale, 'image_embeds': img_emb}
    args_run_demo = (out_fpath, [prompt], [negative_prompt], SIZE[1], SIZE[0], 1, 1, 0, False)
    pipe.loadResourcesAllocBuf(SIZE[1], SIZE[0], 1, None)
    pipe.run(*args_run_demo, **demo_kwargs)

In main:

    generate('pic1.png')
    generate('pic2.png')

pic1.png is good, but pic2.png is all grey

It seems I have to reinitialize something to get a correct result, but I currently don't do that anywhere.

In the original code, pipe.teardown() is called after generation, but this deletes the engines; to make another call we would need to load them again, so fast inference would not be possible.
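
For illustration, this is roughly how I would like to use it (sketch only; it assumes pipe is kept global or returned from init), tearing the engines down only once at the end:

def main():
    init()                  # build/load engines once (pipe assumed accessible here)
    generate('pic1.png')    # first run: good image
    generate('pic2.png')    # second run: currently comes out all grey
    pipe.teardown()         # release the engines only after all runs are done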

Please help me solve this problem.

Environment

TensorRT Version: 10.0.0b6

NVIDIA GPU: A100

NVIDIA Driver Version: 550.54.15

CUDA Version: 11.8

CUDNN Version:

Operating System: Ubuntu 20.04

Python Version (if applicable): 3.10

TensorFlow Version (if applicable): -

PyTorch Version (if applicable): 2.2.1

Baremetal or Container (if so, version): -

zerollzeng (Collaborator) commented

Are you referring to our Stable-Diffusion sample?

zerollzeng self-assigned this May 17, 2024
zerollzeng added the triaged label May 17, 2024
KyriaAnnwyn (Author) commented

Yes, I'm using the Stable Diffusion pipeline from the samples in the demo folder.

KyriaAnnwyn (Author) commented

The parameters used to init the pipeline and build the engines were the following; a small sanity check for the dynamic-shape setting is sketched after the configs:

SIZE = (1280, 1280)

max_batch_size = 1
kwargs_init_pipeline = {
    'version': "xl-1.0",
    'max_batch_size': max_batch_size,
    'denoising_steps': NUM_STEPS,
    'scheduler': None,
    'guidance_scale': 7.0,
    'output_dir': None,
    'hf_token': None,
    'verbose': True,
    'nvtx_profile': False,
    'use_cuda_graph': True,
    'lora_scale': None,
    'lora_path': None,
    'framework_model_dir': PYTORCH_MODEL,
    'torch_inference': False,
}

kwargs_load_engine = {
    'onnx_opset': 19,
    'opt_batch_size': 1,
    'opt_image_height': SIZE[1],
    'opt_image_width': SIZE[0],
    'static_batch': True,
    'static_shape': False,
    'enable_all_tactics': False,
    'enable_refit': False,
    'timing_cache': None,
    'int8': False,
    'quantization_level': 3.0,
}
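
Since static_shape is False here, the engines should be built with dynamic height/width. A small sanity-check sketch (assuming access to one of the deserialized TensorRT engines, e.g. pipe.engine[model_name].engine) to confirm the spatial dims really are dynamic (-1):

import tensorrt as trt

def print_dynamic_inputs(engine: trt.ICudaEngine):
    # List input tensors whose build-time shape contains -1 (dynamic dims)
    for i in range(engine.num_io_tensors):
        name = engine.get_tensor_name(i)
        if engine.get_tensor_mode(name) == trt.TensorIOMode.INPUT:
            shape = tuple(engine.get_tensor_shape(name))
            if -1 in shape:
                print(f"{name}: dynamic shape {shape}")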

KyriaAnnwyn (Author) commented

It seems that something is wrong with loadResources.

To be sure that my enhancements don't add any mess, I switched to simple txt2img:

Init like this:

SIZE = (1024, 1024)

def _init() -> None:
    global _base

    _base = StableDiffusionPipeline(
        pipeline_type=PIPELINE_TYPE.XL_BASE,
        vae_scaling_factor=0.13025,
        return_latents=False,
        **kwargs_init_pipeline)

    _base.loadEngines(
        ENGINE_DIR,
        PYTORCH_MODEL,
        ONNX_DIR,
        **kwargs_load_engine)

    _, shared_device_memory = cudart.cudaMalloc(_base.calculateMaxDeviceMemory())
    _base.activateEngines(shared_device_memory)
    _base.loadResources(SIZE[1], SIZE[0], 1, None)

Then generate with this function:

SIZE2 = (768, 768)
async def gen_t2p(prompt):
    global _base

    height = SIZE2[1]
    width = SIZE2[0]
    _base.loadResources(height, width, 1, None)

    negative_prompt = GENERAL_NEGATIVE_PROMPT_POSTFIX

    images, time_base = _base.infer([prompt], [negative_prompt], height, width, warmup=False)

Here I use SIZE2 because I want to test that I can change the input size (I set static_shape to False when the engine was built). And I get incorrect images on the second call to gen_t2p.

If I comment out the line
_base.loadResources(height, width, 1, None)
and generate images at the original size SIZE, I get good images on both the first and second calls to this function.
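
For clarity, this is the variant that works (sketch; gen_t2p_fixed is just an illustrative name), generating at the original SIZE and skipping the extra loadResources call:

async def gen_t2p_fixed(prompt):
    global _base

    # Reuse the buffers allocated in _init; no second loadResources() call here
    height, width = SIZE[1], SIZE[0]
    negative_prompt = GENERAL_NEGATIVE_PROMPT_POSTFIX
    images, time_base = _base.infer([prompt], [negative_prompt], height, width, warmup=False)
    return images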

I've also tried making a new function that only allocates memory for the new sizes, without the operations on the events and stream, but the result was the same:

def reloadResources(self, image_height, image_width, batch_size, seed):
    # Allocate TensorRT I/O buffers
    if not self.torch_inference:
        for model_name, obj in self.models.items():
            self.engine[model_name].allocate_buffers(shape_dict=obj.get_shape_dict(batch_size, image_height, image_width), device=self.device)
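
For reference, this is roughly how I call it (sketch; it assumes reloadResources was added as a method on the pipeline), so buffers are only reallocated when the requested size actually changes:

_last_size = None

def ensure_buffers(height, width, batch_size=1):
    # Reallocate TensorRT I/O buffers only when the requested resolution changes
    global _last_size
    if _last_size != (height, width, batch_size):
        _base.reloadResources(height, width, batch_size, None)
        _last_size = (height, width, batch_size)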

KyriaAnnwyn (Author) commented

I figured out that these lines in the code lead to the error:

tensor = torch.empty(tuple(shape), dtype=numpy_to_torch_dtype_dict[dtype]).to(device=device)
self.tensors[name] = tensor

In the function:

def allocate_buffers(self, shape_dict=None, device='cuda'):
    for binding in range(self.engine.num_io_tensors):
        name = self.engine.get_tensor_name(binding)
        print(f"tensor name = {name}")
        if shape_dict and name in shape_dict:
            shape = shape_dict[name]
        else:
            shape = self.engine.get_tensor_shape(name)
        dtype = trt.nptype(self.engine.get_tensor_dtype(name))
        if self.engine.get_tensor_mode(name) == trt.TensorIOMode.INPUT:
            self.context.set_input_shape(name, shape)
        tensor = torch.empty(tuple(shape), dtype=numpy_to_torch_dtype_dict[dtype]).to(device=device)
        self.tensors[name] = tensor

When I run it after the first inference, I get NaNs and Infs in some places in the engines' output vectors.
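
I check the outputs after a run with a helper along these lines (sketch; it relies on the self.tensors dict that allocate_buffers fills, as shown above):

import torch

def report_bad_values(engine_wrapper, label=''):
    # Scan a wrapped engine's I/O tensors for NaN/Inf after inference
    for name, tensor in engine_wrapper.tensors.items():
        if tensor.is_floating_point() and (torch.isnan(tensor).any() or torch.isinf(tensor).any()):
            print(f"{label} {name}: contains NaN/Inf values")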

Maybe the problem is also in how the tensor addresses are set at inference:

    def infer(self, feed_dict, stream, use_cuda_graph=False):

        for name, buf in feed_dict.items():
            self.tensors[name].copy_(buf)

        for name, tensor in self.tensors.items():
            self.context.set_tensor_address(name, tensor.data_ptr())
...
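
To rule out a shape/address mismatch, a check along these lines (sketch, using the same wrapper attributes as above) could compare what the execution context expects with the buffers that get bound:

def check_bindings(engine_wrapper):
    # Compare the shapes the execution context reports with the allocated buffers
    eng, ctx = engine_wrapper.engine, engine_wrapper.context
    for i in range(eng.num_io_tensors):
        name = eng.get_tensor_name(i)
        expected = tuple(ctx.get_tensor_shape(name))
        actual = tuple(engine_wrapper.tensors[name].shape)
        if expected != actual:
            print(f"{name}: context expects {expected}, buffer has {actual}")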

KyriaAnnwyn (Author) commented

@zerollzeng Hello, if you need code to reproduce this, I can share it.

Sundragon1993 commented

@KyriaAnnwyn Hello, are you using StableDiffusionPipeline from the diffusers library or from TensorRT's implementation? I'm wondering if you could share the modified code that integrates the IP-Adapter?
