No MPS support right? #34

rmasiso · 2023-12-23T16:35:44Z

Just to be clear, this repo is for CUDA-enabled devices only, correct? In my initial testing, MPS doesn't seem to work.

teftef6220 · 2023-12-23T16:44:52Z

Yes, that is correct.

MPS is not supported. However, if we can further speed up the process using MPS, we will try it.

If you know anything about it, we would appreciate your advice.

leezenn · 2023-12-26T01:34:55Z

In case someone is wondering where to start, or wants to try the project out on a Mac.

To run image-to-image or text-to-image from the README example without acceleration, remove or comment out this line:

pipe.enable_xformers_memory_efficient_attention()  # <-- NADA, remove/comment this

and move the pipeline to "mps":

pipe = StableDiffusionPipeline.from_pretrained("KBlueLeaf/kohaku-v2.1").to(
    device=torch.device("mps"),
    dtype=torch.float16,
)

I'm not sure about xformers (I'm not an expert), but check the issue, as it might not be needed.
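The device choice above could also be made automatic instead of hard-coding "mps". The following is a minimal sketch, not part of StreamDiffusion: pick_device_name and detect_device_name are hypothetical helper names, and the only PyTorch calls used are torch.cuda.is_available() and torch.backends.mps.is_available().

```python
def pick_device_name(cuda_available: bool, mps_available: bool) -> str:
    """Prefer CUDA, then MPS, then CPU (hypothetical helper)."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device_name() -> str:
    """Ask PyTorch which backends are usable; fall back to CPU if torch is missing."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # torch.backends.mps is absent on old PyTorch builds, so look it up defensively.
    mps_backend = getattr(torch.backends, "mps", None)
    mps_ok = mps_backend is not None and mps_backend.is_available()
    return pick_device_name(torch.cuda.is_available(), mps_ok)


if __name__ == "__main__":
    print(detect_device_name())
```

The returned name could then be passed as torch.device(detect_device_name()) in the .to(...) call above, with the xformers line kept only when the name is "cuda".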

I had to modify the __call__ method of the StreamDiffusion class in the pipeline so that the CUDA event timing only runs on CUDA devices.

The file is somewhere in .../StreamDiffusion/venv/lib/python3.xx/site-packages/streamdiffusion/pipeline.py if you installed into a venv via pip install . from the repo root:

   # Hack: only wrap the call in CUDA timing events on CUDA devices
   # (no inference-time profiling on other devices, RIP).
   @torch.no_grad()
   def __call__(
       self, x: Union[torch.Tensor, PIL.Image.Image, np.ndarray] = None
   ) -> torch.Tensor:
       # torch.device(...) accepts both a string and a torch.device instance,
       # so this check works whichever type self.device holds.
       use_cuda_timing = torch.device(self.device).type == "cuda"
       if use_cuda_timing:
           start = torch.cuda.Event(enable_timing=True)
           end = torch.cuda.Event(enable_timing=True)
           start.record()
       if x is not None:
           x = self.image_processor.preprocess(x, self.height, self.width).to(
               device=self.device, dtype=self.dtype
           )
           if self.similar_image_filter:
               x = self.similar_filter(x)
               if x is None:
                   time.sleep(self.inference_time_ema)
                   return self.prev_image_result
           x_t_latent = self.encode_image(x)
       else:
           # TODO: check the dimension of x_t_latent
           x_t_latent = torch.randn((1, 4, self.latent_height, self.latent_width)).to(
               device=self.device, dtype=self.dtype
           )
       x_0_pred_out = self.predict_x0_batch(x_t_latent)
       x_output = self.decode_image(x_0_pred_out).detach().clone()

       self.prev_image_result = x_output
       if use_cuda_timing:
           end.record()
           torch.cuda.synchronize()
           # elapsed_time() returns milliseconds; convert to seconds
           inference_time = start.elapsed_time(end) / 1000
           self.inference_time_ema = 0.9 * self.inference_time_ema + 0.1 * inference_time
       return x_output
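With the patch above, inference_time_ema is never updated on non-CUDA devices. One possible way to keep a timing estimate is a plain wall-clock timer. This is only a sketch under assumptions: timed_call and update_ema are hypothetical names that do not exist in StreamDiffusion, and the optional sync argument stands for whatever device synchronization call applies (on recent PyTorch builds torch.mps.synchronize would be a candidate, untested here).

```python
import time
from typing import Callable, Optional, Tuple, TypeVar

T = TypeVar("T")


def timed_call(
    fn: Callable[[], T], sync: Optional[Callable[[], None]] = None
) -> Tuple[T, float]:
    """Run fn and return (result, elapsed seconds) measured with a wall clock.

    If sync is given, it is called before the clock stops so that queued
    asynchronous device work is included in the measurement.
    """
    start = time.perf_counter()
    result = fn()
    if sync is not None:
        sync()
    return result, time.perf_counter() - start


def update_ema(previous: float, sample: float, weight: float = 0.1) -> float:
    """Same smoothing as in the patched method: 0.9 * previous + 0.1 * sample."""
    return (1.0 - weight) * previous + weight * sample


if __name__ == "__main__":
    result, seconds = timed_call(lambda: sum(range(1_000_000)))
    print(f"elapsed: {seconds:.4f}s, ema: {update_ema(0.0, seconds):.4f}s")
```

Wall-clock timing is coarser than CUDA events, but it does not depend on any particular backend.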

ifsheldon · 2023-12-29T10:41:30Z

@leezenn Thanks for the suggestion. How did you install the streamdiffusion library? I guess in installation Step 3 we need to remove [tensorrt], right? Are any extra steps needed?

leezenn · 2023-12-29T15:30:38Z

@ifsheldon I installed it via pip install . from the repo root.
git+https://github.com/cumulo-autumn/StreamDiffusion.git@main#egg=streamdiffusion[tensorrt] <- didn't work, AFAIR.

Here are all the steps I performed at the project root (you can copy and run them as a shell script). They are outdated, so read to the very end first:

python -m venv venv
source venv/bin/activate

pip install --upgrade pip

pip install --pre torch torchvision --extra-index-url https://download.pytorch.org/whl/nightly/cpu
pip install wheel
pip install xformers
pip install accelerate

pip install .

deactivate

> Step 3, we need to remove [tensorrt] right?

I'm not sure though. Do you?
(UPD) Yeah, it has to go, as it doesn't work without an Nvidia GPU. RIP.

As for the demo server, I set the device to mps and changed the acceleration to sfast in the config at .../StreamDiffusion/demo/realtime-txt2img/server/config.py:

    # ...
    device: torch.device = torch.device("mps")
    # ...
    acceleration: Literal["none", "xformers", "sfast", "tensorrt"] = "sfast"
    # ...

and run:

source venv/bin/activate
cd demo/realtime-txt2img/

pip install -r requirements.txt 
cd view && npm install && npm run build && cd ..
cd server && python main.py

deactivate

cd ../../../
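The device and acceleration settings in the config change above interact: going by this thread, tensorrt needs an Nvidia GPU and xformers had to be dropped on the Mac. A small guard could fall back to "none" in those cases. This is a sketch under that assumption; resolve_acceleration is a hypothetical helper, not something the demo server provides, and it deliberately leaves "sfast" untouched because the thread does not establish how it behaves on MPS.

```python
# Assumption drawn from this thread: these modes only work on CUDA devices.
CUDA_ONLY_ACCELERATION = {"xformers", "tensorrt"}


def resolve_acceleration(device_type: str, requested: str) -> str:
    """Return the requested acceleration, or "none" if it needs CUDA and the device is not CUDA."""
    if requested in CUDA_ONLY_ACCELERATION and device_type != "cuda":
        return "none"
    return requested


if __name__ == "__main__":
    for device_type in ("cuda", "mps"):
        for requested in ("none", "xformers", "tensorrt", "sfast"):
            print(device_type, requested, "->", resolve_acceleration(device_type, requested))
```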

Note:

I had to install wheel before xformers (the xformers installation fails without it; see the installation steps above) in order for it to work.
Or it was accelerate that needed it, I don't remember at this point. 😮‍💨

Never mind: I just reinstalled everything without xformers and (optional) accelerate:

python -m venv venv
source venv/bin/activate

pip install --upgrade pip

pip install --pre torch torchvision --extra-index-url https://download.pytorch.org/whl/nightly/cpu
# pip install wheel
# pip install xformers
# pip install accelerate

pip install .

deactivate

Then I made the conditional switch from my comment above.
That's it.

For Torch, I followed the Apple guide.

ifsheldon · 2024-01-02T11:51:58Z

@leezenn Thanks a lot! I've successfully run it. But I wonder if you can run it with sfast? I don't know what it is. I can't find it anywhere, in the code or on PyPI.

from sfast.compilers.stable_diffusion_pipeline_compiler import CompilationConfig, compile <- this seems to import something from nowhere.

leezenn · 2024-01-09T03:25:38Z

@ifsheldon Sorry for the delay.

> from sfast.compilers.stable_diffusion_pipeline_compiler import CompilationConfig, compile this seems to import something from nowhere.

I know, right? :)

I'm glad you're as confused as I am.

I haven't dug into this much, but yeah, the repo is missing the compiler part. A quick search on GitHub gives me this from this repo.
I didn't spend time investigating what it does. I just used the tip from the docstring that mentions it to try it out. It just silently runs (it may log at another level, I don't know), and then I noticed inconsistencies with the docstrings. So... this is not well cooked (yet?). I just left it be. I wasn't particularly patient with it, sorry.
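To tell whether an optional dependency such as the one in the import above is actually present in the active venv, the standard library can be asked directly. This is a minimal sketch: is_installed is a hypothetical helper name, and it assumes the package's top-level module is called sfast, as the import line suggests.

```python
import importlib.util


def is_installed(top_level_module: str) -> bool:
    """Return True if a top-level module can be found in the current environment."""
    return importlib.util.find_spec(top_level_module) is not None


if __name__ == "__main__":
    for name in ("sfast", "xformers", "accelerate"):
        status = "found" if is_installed(name) else "missing"
        print(f"{name}: {status}")
```

A missing module here would explain an import that "comes from nowhere": the code references a package that pip install . did not pull in.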

The project seems promising, though. ❤️

odonald · 2024-01-16T17:17:39Z

@leezenn @ifsheldon I'm not sure if I set everything up correctly, but I at least got it working after following your conversation.

I was wondering what kind of speed you are getting from this?
Running the txt2img demo, it takes around 5-10 seconds for me until it starts producing images, then it shoots out an image every 1-2 seconds, and then it again takes approx. 10 seconds after new input.

I'm on an M3 Pro with 36 GB. I guess real-time generation will stay a faraway dream?

leezenn · 2024-01-16T21:46:23Z

@odonald

> for me takes around 5-10 seconds till it starts producing images

Most likely due to the so-called warmup runs.
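When comparing speeds between machines, it helps to exclude such warmup runs from the measurement. A rough sketch of how that could be done follows; average_seconds is a hypothetical helper, and fn stands for whatever single generation step is being measured.

```python
import time
from typing import Callable


def average_seconds(fn: Callable[[], object], warmup: int = 3, runs: int = 10) -> float:
    """Call fn `warmup` times untimed, then return the mean wall-clock time of `runs` calls."""
    for _ in range(warmup):
        fn()  # warmup calls are not measured
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs


if __name__ == "__main__":
    seconds = average_seconds(lambda: sum(range(100_000)))
    print(f"{seconds:.6f} s per call")
```

Dividing 1 by the returned value gives an images-per-second figure when fn produces one image per call.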

> then it shoots out images every 1-2 seconds and then again takes approx 10 seconds after new input

Similar effect on my M1 Pro.
As far as I remember, it was running on the GPU but had some problems with unified RAM, probably even a memory leak. So it started to hit SSD swap and crawl instead of running... I haven't investigated any further after a couple of runs, so don't take my word for it; these are surface-level assumptions.

I don't think your dream is far away, though. I saw some projects that were redesigned for Apple Silicon machines, somewhere along the lines of ggerganov et al. with their special tools. Something like this project...

So you can try to adapt the current project using it, or wait until someone else does.
If the maintainers keep improving this project, I believe somebody will eventually add proper MPS support, unless a better alternative comes along.

ethrx · 2024-04-18T17:42:56Z

@leezenn did you PR the hack? Tbh it works perfectly on my M1 Pro, with no performance decrease or the aforementioned lag/delay issues.

fbarretto · 2024-04-24T21:09:23Z

I've created this gist to guide setting up and running the demos.

leezenn · 2024-04-25T11:12:50Z

> @leezenn did you PR the hack? Tbh works perfectly on my M1 Pro, no performance decrease or aforementioned issues of lag/delay.

I did not. Maybe someone else did.

odonald mentioned this issue Jan 16, 2024

No matches found: streamdiffusion[tensorrt] #106

Closed
