[BUG] DeepSpeedDiffusersTransformerBlock doesn't support int8 forward #2681

@tchaton

Describe the bug

After digging into the codebase, I found that DeepSpeedDiffusersTransformerBlock appears to support int8 inference: https://github.com/microsoft/DeepSpeed/blob/master/deepspeed/module_inject/replace_module.py#L241

I replaced those lines with:

    def replace_attn_block(child, policy):
        config = Diffusers2DTransformerConfig(int8_quantization=True)
        return DeepSpeedDiffusersTransformerBlock(child, config)

which results in the following error:

nn.functional.linear(out_norm_3, self.ff1_w)
*** RuntimeError: expected scalar type Half but found Char

Is this meant to work? I couldn't find any tests related to it.
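For context on the dtype error, here is a minimal pure-Python sketch of why int8 weights usually cannot be fed straight into a floating-point linear layer: they are integer codes that only become meaningful once multiplied by a scale. The helpers `quantize_int8`, `dequantize`, and `linear` are hypothetical names for illustration, and the symmetric per-tensor scheme is an assumption, not necessarily what DeepSpeed does internally.

```python
# Illustrative only: hypothetical helpers, not DeepSpeed's quantization code.

def quantize_int8(weights):
    """Symmetric per-tensor quantization: floats -> (int8 codes, scale)."""
    max_abs = max(abs(w) for row in weights for w in row)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [[max(-127, min(127, round(w / scale))) for w in row] for row in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map int8 codes back to approximate float weights."""
    return [[c * scale for c in row] for row in codes]


def linear(x, weight):
    """y = x @ weight.T for a single input vector (no bias)."""
    return [sum(xi * wi for xi, wi in zip(x, row)) for row in weight]


weights = [[0.5, -1.0], [0.25, 0.75]]
x = [2.0, 4.0]

codes, scale = quantize_int8(weights)
y_ref = linear(x, weights)                    # full-precision reference
y_deq = linear(x, dequantize(codes, scale))   # dequantize first, then multiply
```

In this sketch `y_deq` stays close to `y_ref` only because the codes are rescaled before the matrix product. A forward pass that hands the raw int8 tensor to `nn.functional.linear` alongside fp16 activations skips that step, which would be consistent with the `Half` vs `Char` mismatch reported above.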

Here is my benchmarking script.

import os
import diffusers
import torch
import deepspeed
import argparse
from pytorch_lightning import seed_everything

def benchmark_fn(iters: int, warm_up_iters: int, function, *args, **kwargs) -> tuple:
    """
    Function for benchmarking a pytorch function.

    Parameters
    ----------
    iters: int
        Number of timed iterations.
    warm_up_iters: int
        Number of untimed warm-up iterations run first.
    function: callable
        Function to benchmark.
    args, kwargs: Any type
        Positional and keyword arguments passed to function.

    Returns
    -------
    tuple
        (runtime per iteration in ms, list of outputs from the timed iterations).
    """

    results = []

    # Warm up
    for _ in range(warm_up_iters):
        function(*args, **kwargs)

    # Start benchmark.
    torch.cuda.synchronize()
    start_event = torch.cuda.Event(enable_timing=True)
    end_event = torch.cuda.Event(enable_timing=True)
    start_event.record()
    for _ in range(iters):
        results.append(function(*args, **kwargs))
    end_event.record()
    torch.cuda.synchronize()
    # in ms
    return (start_event.elapsed_time(end_event)) / iters, results


hf_auth_key = os.getenv("HF_AUTH_KEY")
if not hf_auth_key:
    raise ValueError("HF_AUTH_KEY is not set")

pipe = diffusers.StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    use_auth_token=hf_auth_key,
    torch_dtype=torch.float16,
    revision="fp16")

pipe = deepspeed.init_inference(pipe.to("cuda"), dtype=torch.float16)

parser = argparse.ArgumentParser()

parser.add_argument(
    "--prompt",
    type=str,
    nargs="?",
    default="astronaut riding a horse, digital art, epic lighting, highly-detailed masterpiece trending HQ",
    help="the prompt to render"
)

parser.add_argument(
    "--init-img",
    type=str,
    nargs="?",
    help="path to the input image"
)
parser.add_argument(
    "--seed",
    type=int,
    default=42,
    help="the seed (for reproducible sampling)",
)
parser.add_argument(
    "--outdir",
    type=str,
    nargs="?",
    help="dir to write results to",
    default="./outputs",
)
opt = parser.parse_args()
os.makedirs(opt.outdir, exist_ok=True)
seed_everything(opt.seed)

t, results = benchmark_fn(10, 5, pipe, prompt=[opt.prompt])
print(t)

# Continue numbering after any files already in the output directory.
grid_count = len(os.listdir(opt.outdir))

for result in results:
    for image in result.images:
        image.save(os.path.join(opt.outdir, f'grid-hf-{grid_count:04}.png'))
        grid_count += 1
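To check the warm-up and timing logic of the script above without a GPU, a CPU-only analogue can be sketched with the standard library. `benchmark_cpu` and `fake_pipe` are hypothetical stand-ins for illustration; wall-clock timing with `time.perf_counter` is an assumption here and is not a substitute for CUDA events when measuring GPU work.

```python
import time


def benchmark_cpu(iters, warm_up_iters, function, *args, **kwargs):
    """Return (ms per timed iteration, list of outputs), mirroring benchmark_fn."""
    # Untimed warm-up calls; their outputs are discarded.
    for _ in range(warm_up_iters):
        function(*args, **kwargs)

    results = []
    start = time.perf_counter()
    for _ in range(iters):
        results.append(function(*args, **kwargs))
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return elapsed_ms / iters, results


calls = []


def fake_pipe(prompt):
    """Hypothetical stand-in for the pipeline: records the call, returns a cheap value."""
    calls.append(prompt)
    return len(prompt)


ms_per_iter, results = benchmark_cpu(10, 5, fake_pipe, prompt=["a", "b"])
```

The stand-in is called for both warm-up and timed iterations, but only the timed outputs are collected, which matches how the script saves images from the 10 timed runs only.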
