No MPS support right? #34
Yes, that is correct. MPS is not supported. However, if MPS can speed up the process further, we will try it. If you know anything about it, we would appreciate your advice. |
In case someone is wondering where to start or wants to try the project out on a Mac: to run image-to-image or text-to-image from the README example without acceleration, remove or comment out this line:
pipe.enable_xformers_memory_efficient_attention()  # <-- NADA, remove/comment this
and pipe the model to "mps":
pipe = StableDiffusionPipeline.from_pretrained("KBlueLeaf/kohaku-v2.1").to(
    device=torch.device("mps"),
    dtype=torch.float16,
)
I'm not sure about xformers, I'm not an expert, but check the issue as it might not be needed. Had to modify the Somewhere in
@torch.no_grad()
# condition hack event sync/track for non-cuda devices, RIP profiling etc
def __call__(
    self, x: Union[torch.Tensor, PIL.Image.Image, np.ndarray] = None
) -> torch.Tensor:
    if self.device == "cuda":
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record()
    if x is not None:
        x = self.image_processor.preprocess(x, self.height, self.width).to(
            device=self.device, dtype=self.dtype
        )
        if self.similar_image_filter:
            x = self.similar_filter(x)
            if x is None:
                time.sleep(self.inference_time_ema)
                return self.prev_image_result
        x_t_latent = self.encode_image(x)
    else:
        # TODO: check the dimension of x_t_latent
        x_t_latent = torch.randn((1, 4, self.latent_height, self.latent_width)).to(
            device=self.device, dtype=self.dtype
        )
    x_0_pred_out = self.predict_x0_batch(x_t_latent)
    x_output = self.decode_image(x_0_pred_out).detach().clone()
    self.prev_image_result = x_output
    if self.device == "cuda":
        end.record()
        torch.cuda.synchronize()
        inference_time = start.elapsed_time(end) / 1000
        self.inference_time_ema = 0.9 * self.inference_time_ema + 0.1 * inference_time
    return x_output
|
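With the guard above, `inference_time_ema` is only updated on CUDA, so on MPS the `time.sleep(self.inference_time_ema)` fallback keeps whatever value it started with. A minimal sketch of a device-agnostic alternative, assuming wall-clock timing is good enough: `EmaTimer` is a hypothetical helper (not part of StreamDiffusion), and `sync` is whatever blocking call the backend offers, for example `torch.cuda.synchronize` or, on recent PyTorch builds, `torch.mps.synchronize`.

```python
import time
from typing import Callable, Optional


class EmaTimer:
    """Hypothetical helper: wall-clock timer with an exponential moving average."""

    def __init__(self, sync: Optional[Callable[[], None]] = None, decay: float = 0.9):
        self.sync = sync    # assumed: a backend call that blocks until queued work finishes
        self.decay = decay  # same 0.9 / 0.1 weighting as the snippet above
        self.ema = 0.0
        self._start = 0.0

    def __enter__(self) -> "EmaTimer":
        if self.sync is not None:
            self.sync()  # drain pending work so it is not counted
        self._start = time.perf_counter()
        return self

    def __exit__(self, *exc) -> bool:
        if self.sync is not None:
            self.sync()  # wait for the timed work to actually finish
        elapsed = time.perf_counter() - self._start
        self.ema = self.decay * self.ema + (1.0 - self.decay) * elapsed
        return False  # never swallow exceptions


# Usage: wrap the body of the call in `with timer:` and read timer.ema afterwards.
timer = EmaTimer()
with timer:
    time.sleep(0.01)  # stand-in for encode / predict / decode
```

Unlike `torch.cuda.Event`, this measures host-side time including the synchronization itself, so the numbers are coarser but work on any device.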
@leezenn Thanks for the suggestion. How did you install the streamdiffusion library? I guess in installation Step 3, we need to remove |
@ifsheldon I've installed it via Here are all the steps I performed at the project root (you can copy and execute this shell script) (outdated, read to the very end first):
python -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip install --pre torch torchvision --extra-index-url https://download.pytorch.org/whl/nightly/cpu
pip install wheel
pip install xformers
pip install accelerate
pip install .
deactivate
I'm not sure though. Do you? As for the demo server, I have changed it to
# ...
device: torch.device = torch.device("mps")
# ...
acceleration: Literal["none", "xformers", "tensorrt"] = "sfast"
# ...
and run:
source venv/bin/activate
cd demo/realtime-txt2img/
pip install -r requirements.txt
cd view && npm install && npm run build && cd ..
cd server && python main.py
deactivate
cd ../../../
Note:
Nevermind, just
python -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip install --pre torch torchvision --extra-index-url https://download.pytorch.org/whl/nightly/cpu
# pip install wheel
# pip install xformers
# pip install accelerate
pip install .
deactivate
For Torch I used the Apple guide. |
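The demo config above hardcodes `torch.device("mps")`. A small sketch of picking the device at runtime instead, so the same script runs on CUDA, Apple Silicon, or CPU. `pick_device` is a hypothetical helper, not something the repo provides; it only relies on `torch.cuda.is_available()` and `torch.backends.mps.is_available()`, and falls back to "cpu" when PyTorch is missing or too old to have an MPS backend.

```python
def pick_device() -> str:
    """Hypothetical helper: return "cuda", "mps", or "cpu" depending on what is usable."""
    try:
        import torch
    except ImportError:
        return "cpu"  # no PyTorch installed at all

    if torch.cuda.is_available():
        return "cuda"

    # Older PyTorch builds have no torch.backends.mps, hence the getattr guard.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"

    return "cpu"


device_name = pick_device()
print(f"using device: {device_name}")
```

The returned string can be passed to `torch.device(...)` wherever the config expects a device.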
@leezenn Thanks a lot! I've successfully run it. But I wonder if you can run it with
|
@ifsheldon Sorry for the delay.
I know, right? :) I'm glad you're as confused as I am. I haven't dug into this much, but yeah, the repo is missing the compiler part. A quick search on GitHub gives me this from this repo. The project seems promising tho. ❤️ |
@leezenn @ifsheldon I'm not sure if I set everything up correctly, but I at least got it working after following your conversation. I was wondering what kind of speed you are getting from this? I'm on an M3 Pro 36GB. I guess real-time generation will just stay a faraway dream? |
Most likely due to so called warmup runs.
Similar effect on my M1 Pro. I don't think your dream is far away, though. I saw some projects that were redesigned for the Apple Silicon series of machines, somewhere along the lines of ggerganov et al. with their special tools. Something like this project... So, you can try to adapt the current project using it, or wait till someone else does. |
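To check whether slowness really comes from the warmup runs, one option is to time steady-state calls separately. A rough sketch, not taken from the repo: `benchmark` is a hypothetical helper, `step` stands in for one pipeline call, and the warmup and run counts are arbitrary. On a GPU backend `step` would also need to synchronize before returning for the numbers to mean anything.

```python
import time
from statistics import mean
from typing import Callable, List


def benchmark(step: Callable[[], object], warmup: int = 3, runs: int = 10) -> float:
    """Return mean seconds per call, ignoring the first `warmup` calls."""
    for _ in range(warmup):
        step()  # untimed: lets caches, compilation, and allocators settle

    timings: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        step()
        timings.append(time.perf_counter() - start)
    return mean(timings)


# Usage with a dummy workload standing in for a pipeline call.
seconds_per_call = benchmark(lambda: sum(range(10_000)))
print(f"{seconds_per_call * 1000:.3f} ms per call")
```

If the first few calls are much slower than the mean reported here, the lag is most likely warmup rather than steady-state throughput.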
@leezenn did you PR the hack? Tbh works perfectly on my M1 Pro, no performance decrease or aforementioned issues of lag/delay. |
I've created this gist to help with setting up and running the demos. |
I did not. Maybe someone else did. |
Just to be clear, this repo is for CUDA enabled devices only, correct? On initially testing, mps doesn't seem to work.