[BUG] DeepSpeedDiffusersTransformerBlock doesn't support int8 forward #2681

**Describe the bug**
After digging through the codebase, I found that `DeepSpeedDiffusersTransformerBlock` appears to support int8 inference: `https://github.com/microsoft/DeepSpeed/blob/master/deepspeed/module_inject/replace_module.py#L241`

I replaced those lines with:

```python
    def replace_attn_block(child, policy):
        config = Diffusers2DTransformerConfig(int8_quantization=True)
        return DeepSpeedDiffusersTransformerBlock(child, config)
```

This results in the following error:

```python
nn.functional.linear(out_norm_3, self.ff1_w)
*** RuntimeError: expected scalar type Half but found Char
```
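The message suggests that `self.ff1_w` is stored as int8 (`Char`) while `out_norm_3` is fp16 (`Half`), so the weight would presumably need to be dequantized with its scale before (or inside) the matmul. As a rough illustration of that step, here is a framework-free sketch of symmetric per-tensor int8 quantization. The helper names `quantize_int8`, `dequantize` and `linear` are hypothetical, and this is an assumption about the general technique, not DeepSpeed's actual quantization scheme.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # One scale for the whole tensor; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from int8 values and their scale."""
    return [v * scale for v in q]


def linear(x, w_row):
    """Dot product of an input vector with one weight row (no bias)."""
    return sum(a * b for a, b in zip(x, w_row))


# Hypothetical toy data standing in for one row of a feed-forward weight.
weights = [0.5, -1.0, 0.25, 0.75]
x = [1.0, 2.0, 3.0, 4.0]

q, scale = quantize_int8(weights)
w_hat = dequantize(q, scale)  # floats again, so they can be mixed with x

reference = linear(x, weights)
approx = linear(x, w_hat)
print(f"reference={reference:.4f} approx={approx:.4f}")
```

Passing `q` to the matmul directly, without the scale, is the analogue of handing an int8 tensor to `nn.functional.linear` together with fp16 activations.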

Is this meant to work? I couldn't find any tests related to it.

Here is my benchmarking script.

```python
import os
import diffusers
import torch
import deepspeed
import argparse
from pytorch_lightning import seed_everything

def benchmark_fn(iters: int, warm_up_iters: int, function, *args, **kwargs) -> tuple:
    """
    Function for benchmarking a pytorch function.

    Parameters
    ----------
    iters: int
        Number of timed iterations.
    warm_up_iters: int
        Number of untimed warm-up iterations run before the benchmark.
    function: callable
        Function to benchmark.
    args, kwargs: Any type
        Positional and keyword arguments passed to function.
    Returns
    -------
    tuple
        (runtime per iteration in ms, list of the outputs of each timed call).
    """

    results = []

    # Warm up
    for _ in range(warm_up_iters):
        function(*args, **kwargs)

    # Start benchmark.
    torch.cuda.synchronize()
    start_event = torch.cuda.Event(enable_timing=True)
    end_event = torch.cuda.Event(enable_timing=True)
    start_event.record()
    for _ in range(iters):
        results.append(function(*args, **kwargs))
    end_event.record()
    torch.cuda.synchronize()
    # in ms
    return (start_event.elapsed_time(end_event)) / iters, results


hf_auth_key = os.getenv("HF_AUTH_KEY")
if not hf_auth_key:
    raise ValueError("HF_AUTH_KEY is not set")

pipe = diffusers.StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    use_auth_token=hf_auth_key,
    torch_dtype=torch.float16,
    revision="fp16")

pipe = deepspeed.init_inference(pipe.to("cuda"), dtype=torch.float16)

parser = argparse.ArgumentParser()

parser.add_argument(
    "--prompt",
    type=str,
    nargs="?",
    default="astronaut riding a horse, digital art, epic lighting, highly-detailed masterpiece trending HQ",
    help="the prompt to render"
)

parser.add_argument(
    "--init-img",
    type=str,
    nargs="?",
    help="path to the input image"
)
parser.add_argument(
    "--seed",
    type=int,
    default=42,
    help="the seed (for reproducible sampling)",
)
parser.add_argument(
    "--outdir",
    type=str,
    nargs="?",
    help="dir to write results to",
    default="./outputs",
)
opt = parser.parse_args()
os.makedirs(opt.outdir, exist_ok=True)
seed_everything(opt.seed)

t, results = benchmark_fn(10, 5, pipe, prompt=[opt.prompt])
print(t)

# Continue numbering after any images already in the output directory.
grid_count = len(os.listdir(opt.outdir))

for result in results:
    for image in result.images:
        image.save(os.path.join(opt.outdir, f'grid-hf-{grid_count:04}.png'))
        grid_count += 1
```

