
Cannot reproduce the results of speedup #57

Open

649459021 opened this issue Mar 13, 2024 · 3 comments

@649459021
I wrote a script to benchmark the speedup myself and ran it on an RTX 3090 GPU, but it does not achieve the speedup reported in Table 4 of the paper.

import os, random, time

import numpy as np
import torch
import tomesd
from diffusers import StableDiffusionPipeline, DDIMScheduler, PNDMScheduler

def seed_everything(seed):
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    random.seed(seed)
    np.random.seed(seed)

seed = 2024
seed_everything(seed)
batch_size = 1
num_inference_steps = 50
os.makedirs("results", exist_ok=True)

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")
# pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
pipe.scheduler = PNDMScheduler.from_config(pipe.scheduler.config)

prompt = "a photo of an astronaut riding a horse on mars"
generator = torch.Generator().manual_seed(seed)

# Warm up the GPU so one-time kernel launch/compilation costs are not timed
print("Warming up the GPU")
for i in range(2):
    image = pipe([prompt] * batch_size, num_inference_steps=num_inference_steps)
#-------------------


# Original pipeline
torch.cuda.synchronize()
start_time = time.time()
image = pipe([prompt] * batch_size, num_inference_steps=num_inference_steps, generator=generator).images[0]
torch.cuda.synchronize()
end_time = time.time()
print("Original pipeline: {:.3f} seconds".format(end_time - start_time))
image.save("results/origin.png")

# Apply ToMe with a 50% merging ratio
generator = torch.Generator().manual_seed(seed)
tomesd.apply_patch(pipe, ratio=0.5)  # Can also use pipe.unet in place of pipe here

torch.cuda.synchronize()
start_time = time.time()
image = pipe([prompt] * batch_size, num_inference_steps=num_inference_steps, generator=generator).images[0]
torch.cuda.synchronize()
end_time = time.time()
print("ToMe: {:.3f} seconds".format(end_time - start_time))
image.save("results/origin_ToMe.png")

The output from this script is:

Original pipeline: 2.728 seconds
ToMe: 2.581 seconds

In Table 4, the speedup is nearly 2x when r=50%, but my test shows almost none. Why is this the case?
Could you give me some advice?

@dbolya (Owner) commented Mar 13, 2024

  1. You might not be giving your GPU enough work. Try increasing the batch size and image size and see what happens (see the sketch after this list).
  2. The results in the paper were obtained with the original stable diffusion repo, not diffusers. Diffusers has implemented a number of optimizations of its own that eat into ToMe's speed-up, but you should still see gains at larger image sizes.
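
For reference, a minimal benchmarking sketch along these lines. The batch size and resolution (4 images at 768x768) are arbitrary choices to give the GPU more work, not values from the paper; ToMe's savings grow with the token count, which scales with image area.

import time

import torch
import tomesd
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
prompt = "a photo of an astronaut riding a horse on mars"

def bench(label, batch_size=4, size=768, steps=50):
    # One untimed pass so kernel compilation/caching is excluded from the measurement
    pipe([prompt] * batch_size, height=size, width=size, num_inference_steps=steps)
    torch.cuda.synchronize()
    start = time.time()
    pipe([prompt] * batch_size, height=size, width=size, num_inference_steps=steps)
    torch.cuda.synchronize()
    print("{}: {:.3f} seconds".format(label, time.time() - start))

bench("baseline")
tomesd.apply_patch(pipe, ratio=0.5)  # 50% merging ratio, as in the script above
bench("ToMe r=50%")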

@DiosMuerto

Does this still work in 2024, with the SDXL or Lightning models?

@Ting-Justin-Jiang

Tested it on SDXL, and accidentally found that merging the middle blocks (1024-dimensional) gives much better results (although no big speedups).
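
For anyone who wants to try this, a sketch of what reaching the deeper blocks might look like using tomesd's public options. The max_downsample argument (default 1, i.e. only the highest-resolution blocks) lets ToMe patch layers with more downsampling; whether this exactly reproduces the middle-block experiment described above, and how cleanly the patch applies to the SDXL UNet, is an assumption here, not something verified upstream.

import torch
import tomesd
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# max_downsample=1 (the default) patches only the highest-resolution blocks;
# larger values (2, 4, 8) also reach the lower-resolution, deeper blocks.
tomesd.apply_patch(pipe, ratio=0.5, max_downsample=4)

image = pipe("a photo of an astronaut riding a horse on mars").images[0]
image.save("sdxl_tome.png")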
