No MPS support right? #34

Open
rmasiso opened this issue Dec 23, 2023 · 11 comments

Comments

@rmasiso

rmasiso commented Dec 23, 2023

Just to be clear, this repo is for CUDA-enabled devices only, correct? On initial testing, MPS doesn't seem to work.

@teftef6220
Collaborator

Yes, that is correct.

MPS is not supported. However, if we can further speed up the process using MPS, we will try it.

If you know anything about it, we would appreciate your advice.

@leezenn

leezenn commented Dec 26, 2023

In case someone is wondering where to start or needs to try the project out on a Mac.

To run image to image or text to image from the readme example without acceleration:

pipe.enable_xformers_memory_efficient_attention()  # <-- NADA, remove/comment this

and move the pipeline to "mps":

pipe = StableDiffusionPipeline.from_pretrained("KBlueLeaf/kohaku-v2.1").to(
    device=torch.device("mps"),
    dtype=torch.float16,
)
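
The hard-coded "mps" above can be replaced by a small fallback chain. This is a minimal sketch, not part of StreamDiffusion: the helper names `pick_device` and `pick_acceleration` are hypothetical, and the rule that every acceleration mode other than "none" needs CUDA is an assumption drawn from this thread.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Return the preferred torch device name given what the machine offers."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def pick_acceleration(device: str, requested: str) -> str:
    """Assumption: every mode other than "none" needs CUDA, so drop it elsewhere."""
    return requested if device == "cuda" else "none"
```

With PyTorch installed, the two flags would come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`, and the result can be passed to `torch.device(...)`.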

I'm not sure about xformers; I'm not an expert, but check the issue, as it might not be needed.

I had to modify the __call__ method of the StreamDiffusion class in the pipeline so that the CUDA event timing only runs on CUDA devices...

It lives somewhere in .../StreamDiffusion/venv/lib/python3.xx/site-packages/streamdiffusion/pipeline.py if you installed into a venv via pip install . from the repo root...

   @torch.no_grad()
   # condition hack event sync/track for non-cuda devices, RIP profiling etc
   def __call__(
       self, x: Union[torch.Tensor, PIL.Image.Image, np.ndarray] = None
   ) -> torch.Tensor:
       if torch.device(self.device).type == "cuda":  # a torch.device never equals the plain string "cuda"
           start = torch.cuda.Event(enable_timing=True)
           end = torch.cuda.Event(enable_timing=True)
           start.record()
       if x is not None:
           x = self.image_processor.preprocess(x, self.height, self.width).to(
               device=self.device, dtype=self.dtype
           )
           if self.similar_image_filter:
               x = self.similar_filter(x)
               if x is None:
                   time.sleep(self.inference_time_ema)
                   return self.prev_image_result
           x_t_latent = self.encode_image(x)
       else:
           # TODO: check the dimension of x_t_latent
           x_t_latent = torch.randn((1, 4, self.latent_height, self.latent_width)).to(
               device=self.device, dtype=self.dtype
           )
       x_0_pred_out = self.predict_x0_batch(x_t_latent)
       x_output = self.decode_image(x_0_pred_out).detach().clone()

       self.prev_image_result = x_output
       if torch.device(self.device).type == "cuda":  # a torch.device never equals the plain string "cuda"
           end.record()
           torch.cuda.synchronize()
           inference_time = start.elapsed_time(end) / 1000
           self.inference_time_ema = 0.9 * self.inference_time_ema + 0.1 * inference_time
       return x_output
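
Skipping the CUDA events means `inference_time_ema` is never updated on MPS. One device-agnostic alternative is a wall-clock timer with an optional synchronization hook. This is a sketch under assumptions, not the library's implementation: the class `EmaTimer` and its `sync` parameter are hypothetical, and the 0.9 decay simply mirrors the constants in the method above.

```python
import time
from contextlib import contextmanager
from typing import Callable, Iterator, Optional


class EmaTimer:
    """Wall-clock timer that keeps an exponential moving average in seconds."""

    def __init__(self, decay: float = 0.9) -> None:
        self.decay = decay
        self.ema = 0.0

    @contextmanager
    def measure(self, sync: Optional[Callable[[], None]] = None) -> Iterator[None]:
        # `sync` should block until queued device work has finished, so the
        # wall-clock interval covers the actual computation.
        if sync is not None:
            sync()
        start = time.perf_counter()
        yield
        if sync is not None:
            sync()
        elapsed = time.perf_counter() - start
        # Same update rule as the original: 0.9 * old + 0.1 * new.
        self.ema = self.decay * self.ema + (1.0 - self.decay) * elapsed
```

On MPS the hook could be `torch.mps.synchronize` (available in recent PyTorch releases) and on CUDA `torch.cuda.synchronize`; the inference body would then run inside `with timer.measure(sync):`.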

@ifsheldon

@leezenn Thanks for the suggestion. How did you install the streamdiffusion library? I guess in installation Step 3, we need to remove [tensorrt], right? Do we need any extra steps?

@leezenn

leezenn commented Dec 29, 2023

@ifsheldon I installed it via pip install .
git+https://github.com/cumulo-autumn/StreamDiffusion.git@main#egg=streamdiffusion[tensorrt] <- didn't work AFAIR.

Here are all the steps I performed at the project root (you can copy and execute this as a shell script). They are outdated, so read to the very end first:

python -m venv venv
source venv/bin/activate

pip install --upgrade pip

pip install --pre torch torchvision --extra-index-url https://download.pytorch.org/whl/nightly/cpu
pip install wheel
pip install xformers
pip install accelerate

pip install .

deactivate

Step 3, we need to remove [tensorrt] right?

I'm not sure though. Do you?
(UPD) Yeah, since it doesn't work. No NVIDIA, RIP.

As for the demo server, I changed the acceleration to sfast in the config at .../StreamDiffusion/demo/realtime-txt2img/server/config.py:

    # ...
    device: torch.device = torch.device("mps")
    # ...
    acceleration: Literal["none", "xformers", "sfast", "tensorrt"] = "sfast"
    # ...

and run:

source venv/bin/activate
cd demo/realtime-txt2img/

pip install -r requirements.txt 
cd view && npm install && npm run build && cd ..
cd server && python main.py

deactivate

cd ../../../

Note:

I had to install xformers with wheel preinstalled (the installation fails without it; see the installation steps above) for it to work.
Or maybe it was accelerate, I don't remember at this point. 😮‍💨


Never mind, I just (re)installed everything without xformers and the optional accelerate:

python -m venv venv
source venv/bin/activate

pip install --upgrade pip

pip install --pre torch torchvision --extra-index-url https://download.pytorch.org/whl/nightly/cpu
# pip install wheel
# pip install xformers
# pip install accelerate

pip install .

deactivate

For PyTorch, I used the Apple guide.

@ifsheldon

@leezenn Thanks a lot! I've successfully run it. But I wonder if you can run it with sfast? I don't know what it is. I cannot find it anywhere, in the code or on PyPI.

from sfast.compilers.stable_diffusion_pipeline_compiler import CompilationConfig, compile: this seems to import something from nowhere.
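
Whether an optional package such as sfast is importable can be checked before choosing an acceleration mode, instead of failing at import time. This is a minimal sketch using only the standard library; the helper names `installed_packages` and `resolve_acceleration` are hypothetical, as is the pairing of a mode with the package that backs it.

```python
from importlib.util import find_spec
from typing import Dict, Iterable


def installed_packages(names: Iterable[str]) -> Dict[str, bool]:
    """Map each top-level package name to whether it can be imported."""
    return {name: find_spec(name) is not None for name in names}


def resolve_acceleration(requested: str, required_package: str) -> str:
    """Fall back to "none" when the package behind a mode is missing."""
    return requested if find_spec(required_package) is not None else "none"
```

For example, `installed_packages(["sfast", "xformers", "tensorrt"])` reports which optional backends are present in the active environment. Only top-level names should be passed, since `find_spec` imports parent packages when given a dotted name.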

@leezenn

leezenn commented Jan 9, 2024

@ifsheldon Sorry for the delay.

from sfast.compilers.stable_diffusion_pipeline_compiler import CompilationConfig, compile: this seems to import something from nowhere.

I know right? :)

I'm glad you're as confused as I am.

I haven't dug into this much, but yeah, the repo is missing the compiler part. A quick search on GitHub gives me this from this repo.
I didn't spend time investigating what it does. I just used the tip from the docstring that contains it to try it out. It just runs silently (it may log at another level, I don't know), and then I noticed an inconsistency with the docstrings. So... this is not well cooked (yet?). I just left it be. I wasn't particularly patient with it, I'm sorry.

The project seems promising tho. ❤️

@odonald

odonald commented Jan 16, 2024

@leezenn @ifsheldon I'm not sure if I set everything up correctly, but I at least got it working after following your conversation.

I was wondering what kind of speed you are getting from this?
Running the txt2img demo, for me takes around 5-10 seconds till it starts producing images, then it shoots out images every 1-2 seconds and then again takes approx 10 seconds after new input.

I'm on an M3 Pro with 36 GB. I guess real-time generation will just stay a faraway dream?

@leezenn

leezenn commented Jan 16, 2024

@odonald

for me takes around 5-10 seconds till it starts producing images

Most likely due to the so-called warmup runs.

then it shoots out images every 1-2 seconds and then again takes approx 10 seconds after new input

I see a similar effect on my M1 Pro.
As far as I remember, it was running on the GPU but had some problems with uRAM, probably even a memory leak. So it started to hit SSD swap and crawl instead of run... I haven't investigated any further after a couple of runs, so don't take my words too seriously; these are surface-level assumptions.

I don't think your dream is far away, though. I saw some projects that were redesigned for the Apple Silicon series of machines, somewhere along the lines of ggerganov et al. with their special tools. Something like this project...

So you can try to adapt the current project using it, or wait until someone else does.
If the maintainers keep improving this project, I believe somebody will eventually add proper MPS support, unless a better alternative appears.
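
To tell the warm-up delay mentioned above apart from steady-state speed, the first few calls can be excluded from the measurement. This is a minimal sketch with a hypothetical helper `steady_state_seconds`; the `step` callable stands in for one generation call, and the default counts are arbitrary.

```python
import time
from typing import Callable


def steady_state_seconds(
    step: Callable[[], object], warmup: int = 10, runs: int = 20
) -> float:
    """Average seconds per call, ignoring the first `warmup` calls."""
    for _ in range(warmup):
        step()  # results discarded: these calls only warm up the pipeline
    start = time.perf_counter()
    for _ in range(runs):
        step()
    return (time.perf_counter() - start) / runs
```

On a GPU backend, `step` should end with something that waits for the device (for example, converting the output to a CPU image), otherwise queued work may not be included in the measured time.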

@ethrx

ethrx commented Apr 18, 2024

@leezenn did you PR the hack? Tbh works perfectly on my M1 Pro, no performance decrease or aforementioned issues of lag/delay.

@fbarretto

fbarretto commented Apr 24, 2024

I've created this gist to help guide the setup and running the demos.

@leezenn

leezenn commented Apr 25, 2024

@leezenn did you PR the hack? Tbh works perfectly on my M1 Pro, no performance decrease or aforementioned issues of lag/delay.

I did not. Maybe someone else did.


7 participants