Description
Describe the bug
A clear and concise description of what the bug is.
After digging into the codebase, I found that DeepSpeedDiffusersTransformerBlock appears to support int8 inference: https://github.com/microsoft/DeepSpeed/blob/master/deepspeed/module_inject/replace_module.py#L241
I replaced those lines with:
def replace_attn_block(child, policy):
    config = Diffusers2DTransformerConfig(int8_quantization=False)
    return DeepSpeedDiffusersTransformerBlock(child, config)

This results in the following error:
nn.functional.linear(out_norm_3, self.ff1_w)
*** RuntimeError: expected scalar type Half but found Char

Is this meant to work? I couldn't find any tests related to it.
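For reference, the same class of error can be reproduced outside DeepSpeed. The sketch below is plain PyTorch and makes several assumptions that are not taken from the DeepSpeed code: the helpers `quantize_int8` and `linear_dequant` are hypothetical, the quantization is a simple symmetric per-tensor scheme, and the activations are float32 so that it runs on CPU (in the report above they are float16). It only illustrates that `nn.functional.linear` rejects an int8 weight next to floating-point activations unless the weight is dequantized first; it is not how DeepSpeed implements int8 inference.

```python
import torch
import torch.nn.functional as F


def quantize_int8(weight):
    """Symmetric per-tensor int8 quantization (illustrative only, not DeepSpeed's scheme)."""
    scale = weight.abs().max().clamp(min=1e-8) / 127.0
    q_weight = torch.clamp(torch.round(weight / scale), -127, 127).to(torch.int8)
    return q_weight, scale


def linear_dequant(x, q_weight, scale):
    """Cast the int8 weight back to the activation dtype before the matmul."""
    weight = q_weight.to(x.dtype) * scale.to(x.dtype)
    return F.linear(x, weight)


torch.manual_seed(0)
x = torch.randn(2, 8)        # activations (float32 here, float16 in the report)
weight = torch.randn(4, 8)   # original floating-point weight
q_weight, scale = quantize_int8(weight)

# Passing the raw int8 weight mixes dtypes and is rejected by the matmul.
try:
    F.linear(x, q_weight)
    mismatch_raised = False
except RuntimeError as err:
    mismatch_raised = True
    print(f"dtype mismatch: {err}")

# Dequantizing first gives a result close to the floating-point reference.
out = linear_dequant(x, q_weight, scale)
ref = F.linear(x, weight)
max_err = (out - ref).abs().max().item()
print(f"max abs error after dequantization: {max_err:.4f}")
```

If the int8 path is intended to be supported, the failing call in the traceback would presumably need either a dequantization step like this or a dedicated int8 kernel in place of `nn.functional.linear`.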
Here is my benchmarking script.
import os
import diffusers
import torch
import deepspeed
import argparse
from pytorch_lightning import seed_everything
def benchmark_fn(iters: int, warm_up_iters: int, function, *args, **kwargs) -> tuple:
    """
    Function for benchmarking a pytorch function.

    Parameters
    ----------
    iters: int
        Number of timed iterations.
    warm_up_iters: int
        Number of untimed warm-up iterations.
    function: lambda function
        function to benchmark.
    args: Any type
        Args to function.

    Returns
    -------
    tuple of (float, list)
        Runtime per iteration in ms, and the outputs of the timed calls.
    """
    results = []

    # Warm up
    for _ in range(warm_up_iters):
        function(*args, **kwargs)

    # Start benchmark.
    torch.cuda.synchronize()
    start_event = torch.cuda.Event(enable_timing=True)
    end_event = torch.cuda.Event(enable_timing=True)
    start_event.record()
    for _ in range(iters):
        results.append(function(*args, **kwargs))
    end_event.record()
    torch.cuda.synchronize()

    # in ms
    return (start_event.elapsed_time(end_event)) / iters, results

hf_auth_key = os.getenv("HF_AUTH_KEY")
if not hf_auth_key:
    raise ValueError("HF_AUTH_KEY is not set")

pipe = diffusers.StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    use_auth_token=hf_auth_key,
    torch_dtype=torch.float16,
    revision="fp16")
pipe = deepspeed.init_inference(pipe.to("cuda"), dtype=torch.float16)
parser = argparse.ArgumentParser()
parser.add_argument(
    "--prompt",
    type=str,
    nargs="?",
    default="astronaut riding a horse, digital art, epic lighting, highly-detailed masterpiece trending HQ",
    help="the prompt to render"
)
parser.add_argument(
    "--init-img",
    type=str,
    nargs="?",
    help="path to the input image"
)
parser.add_argument(
    "--seed",
    type=int,
    default=42,
    help="the seed (for reproducible sampling)",
)
parser.add_argument(
    "--outdir",
    type=str,
    nargs="?",
    help="dir to write results to",
    default="./outputs",
)
opt = parser.parse_args()
os.makedirs(opt.outdir, exist_ok=True)
seed_everything(opt.seed)
t, results = benchmark_fn(10, 5, pipe, prompt=[opt.prompt])
print(t)
grid_count = len(os.listdir(opt.outdir)) - 1
for result in results:
    for image in result.images:
        image.save(os.path.join(opt.outdir, f'grid-hf-{grid_count:04}.png'))
        grid_count += 1

To Reproduce
Steps to reproduce the behavior:
- Simple inference script to reproduce
- What packages are required and their versions
- How to run the script
- ...
Expected behavior
A clear and concise description of what you expected to happen.
ds_report output
Please run ds_report to give us details about your setup.
Screenshots
If applicable, add screenshots to help explain your problem.
System info (please complete the following information):
- OS: [e.g. Ubuntu 18.04]
- GPU count and types [e.g. two machines with x8 A100s each]
- (if applicable) what DeepSpeed-MII version are you using
- (if applicable) Hugging Face Transformers/Accelerate/etc. versions
- Python version
- Any other relevant info about your setup
Docker context
Are you using a specific docker image that you can share?
Additional context
Add any other context about the problem here.