
Fixes to run on CPU and MPS #36

Open

wants to merge 3 commits into main

Conversation

WojtekKowaluk

Some changes are required to run it on CPU and other devices.

@@ -30,6 +30,8 @@ def __init__(
    ):
        self.config = config
        self.device = device
+       if not torch.has_cuda:


However, this does not work when attempting to run on CPU with a CUDA-enabled build of PyTorch installed (an example use case: I have a CUDA device, but I want to generate a higher-resolution image and don't have enough VRAM to do that on the GPU). Maybe we should check the device directly (at least this workaround works for me)?

Suggested change
-        if not torch.has_cuda:
+        if device == "cpu":

@WojtekKowaluk
Author

Good point, but maybe device != "cuda" then, because the same is needed for the "mps" device (Apple Silicon); I don't know about others.
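
For context, here are the variants discussed so far, side by side (a sketch; the body of each branch is elided in the hunk above):

    import torch

    device = "mps"  # hypothetical configured device

    # original check: depends on how PyTorch was built, not on the requested device
    if not torch.has_cuda:
        ...

    # suggested change: check the configured device directly
    if device == "cpu":
        ...

    # refinement from this comment: treat every non-CUDA device (cpu, mps, ...) the same way
    if device != "cuda":
        ...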

@clarklight

clarklight commented Apr 17, 2023

I tried the above, but I am getting this error: RuntimeError: Expected one of cpu, cuda, ipu, xpu, mkldnn, opengl, opencl, ideep, hip, ve, fpga, ort, xla, lazy, vulkan, mps, meta, hpu, mtia, privateuseone device type at start of device string: CUDA

@trolley813

I tried the above, but I am getting this error: RuntimeError: Expected one of cpu, cuda, ipu, xpu, mkldnn, opengl, opencl, ideep, hip, ve, fpga, ort, xla, lazy, vulkan, mps, meta, hpu, mtia, privateuseone device type at start of device string: CUDA

As far as I understand (from the error message), you wrote CUDA in uppercase in your code, while PyTorch expects lowercase device names.

@clarklight

I tried the above, but I am getting this error: RuntimeError: Expected one of cpu, cuda, ipu, xpu, mkldnn, opengl, opencl, ideep, hip, ve, fpga, ort, xla, lazy, vulkan, mps, meta, hpu, mtia, privateuseone device type at start of device string: CUDA

As far as I understand (from the error message), you wrote CUDA in uppercase in your code, while PyTorch expects lowercase device names.

Thank you, I managed to get it to work. I was trying to push it to run on CPU, but in the end the time it takes on the M1 Mac is too crazy, around 40 minutes to process. I read in the other thread that it errors out due to low GPU RAM on a 1070 Ti; I ran it on my other Windows laptop and it also errored out due to low GPU RAM. Just leaving these notes for anyone else who reads this thread.

@CoruNethron

@WojtekKowaluk, thank you for this fix.

@clarklight, I've tested on an M1 SoC with 16 GB as well, and it achieves 8-10 seconds per iteration in my case, but you can try using the mps device to enable GPU acceleration on that SoC. That got me down to about 3 seconds per iteration, roughly three times faster with mps.

@clarklight

@CoruNethron

To run this on the Mac, I have to use the CPU, right, because there is no CUDA on the GPU? I just tested it again running on the CPU, and it is still at 120 seconds per iteration.
Here is the test code; I changed it to run on the CPU. Am I doing anything incorrectly?

from kandinsky2 import get_kandinsky2
model = get_kandinsky2('cpu', task_type='text2img', cache_dir='/tmp/kandinsky2', model_version='2.1', use_flash_attention=False)
images = model.generate_text2img(
    "red cat, 4k photo",
    num_steps=25,
    batch_size=1,
    guidance_scale=4,
    h=768, w=768,
    sampler='p_sampler',
    prior_cf_scale=4,
    prior_steps="5"
)

@CoruNethron

@clarklight there is no CUDA support in the GPU, that's correct. But there is support for another kind of acceleration on the GPU, mps, which can utilize the Apple silicon GPU with torch. So just change cpu to mps, the same way you previously changed cuda to cpu, and it should do the trick. I got about 3 times faster rendering. Also FYI, it takes about 1.25 seconds per iteration on my machine when the resolution is set to 512 by 512. Even faster than Stable Diffusion.
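
Following that advice, the test snippet above needs only its device string changed (everything else stays the same):

    from kandinsky2 import get_kandinsky2

    # same call as the cpu snippet above, with only the device string swapped
    model = get_kandinsky2('mps', task_type='text2img', cache_dir='/tmp/kandinsky2', model_version='2.1', use_flash_attention=False)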

@clarklight

clarklight commented Apr 22, 2023

@CoruNethron Sweet, thank you! I got it to work! Yes, it's around 1.3 seconds/it, but the output images are not real images, haha. I will try to figure out why.

@CoruNethron

CoruNethron commented Apr 22, 2023

@clarklight I took some ideas about image export with unique file names from here:
https://gist.github.com/FurkanGozukara/10bdc0435b708b26bd87a59b6c3d1bc7

@clarklight

clarklight commented Apr 22, 2023

@CoruNethron Most of my images are broken for some reason... but if I run it on the web version, it runs fine...
[attached image: boat]

trolley813 mentioned this pull request May 4, 2023
@maxnowack

I'm getting the following error if I try to use img2img with mps:

Traceback (most recent call last):
  File "/opt/homebrew/Caskroom/miniforge/base/lib/python3.10/site-packages/gradio/routes.py", line 412, in run_predict
    output = await app.get_blocks().process_api(
  File "/opt/homebrew/Caskroom/miniforge/base/lib/python3.10/site-packages/gradio/blocks.py", line 1299, in process_api
    result = await self.call_function(
  File "/opt/homebrew/Caskroom/miniforge/base/lib/python3.10/site-packages/gradio/blocks.py", line 1021, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/opt/homebrew/Caskroom/miniforge/base/lib/python3.10/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/opt/homebrew/Caskroom/miniforge/base/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/opt/homebrew/Caskroom/miniforge/base/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/Users/maxnowack/code/kubin/src/ui_blocks/i2i.py", line 65, in generate
    return generate_fn(params)
  File "/Users/maxnowack/code/kubin/src/webui.py", line 28, in <lambda>
    i2i_ui(generate_fn=lambda params: kubin.model.i2i(params), shared=ui_shared, tabs=ui_tabs)
  File "/Users/maxnowack/code/kubin/src/models/model_kd2.py", line 125, in i2i
    current_batch = self.kandinsky.generate_img2img(
  File "/opt/homebrew/Caskroom/miniforge/base/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/opt/homebrew/Caskroom/miniforge/base/lib/python3.10/site-packages/kandinsky2/kandinsky2_1_model.py", line 466, in generate_img2img
    image = q_sample(
  File "/opt/homebrew/Caskroom/miniforge/base/lib/python3.10/site-packages/kandinsky2/utils.py", line 52, in q_sample
    _extract_into_tensor(sqrt_alphas_cumprod, t, x_start.shape) * x_start
  File "/opt/homebrew/Caskroom/miniforge/base/lib/python3.10/site-packages/kandinsky2/model/utils.py", line 18, in _extract_into_tensor
    res = torch.from_numpy(arr).to(device=timesteps.device)[timesteps].float()
TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
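
For reference, the failing line is the torch.from_numpy() call in _extract_into_tensor (kandinsky2/model/utils.py), which receives a float64 NumPy array. A minimal sketch of a possible fix, assuming the rest of the function follows the usual guided-diffusion shape, is to cast to float32 before the tensor is moved to the device:

    import numpy as np
    import torch

    def _extract_into_tensor(arr, timesteps, broadcast_shape):
        # mps cannot hold float64 tensors, so cast the numpy array to float32
        # before the result is moved onto timesteps.device
        res = torch.from_numpy(arr.astype(np.float32)).to(device=timesteps.device)[timesteps].float()
        while len(res.shape) < len(broadcast_shape):
            res = res[..., None]
        return res.expand(broadcast_shape)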

@WojtekKowaluk
Author

I have fixed that one, but I'm still getting other errors with img2img:

Traceback (most recent call last):
  File "/Users/wojtek/Documents/kubin/venv/lib/python3.10/site-packages/gradio/routes.py", line 412, in run_predict
    output = await app.get_blocks().process_api(
  File "/Users/wojtek/Documents/kubin/venv/lib/python3.10/site-packages/gradio/blocks.py", line 1299, in process_api
    result = await self.call_function(
  File "/Users/wojtek/Documents/kubin/venv/lib/python3.10/site-packages/gradio/blocks.py", line 1021, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/Users/wojtek/Documents/kubin/venv/lib/python3.10/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/Users/wojtek/Documents/kubin/venv/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/Users/wojtek/Documents/kubin/venv/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/Users/wojtek/Documents/kubin/src/ui_blocks/i2i.py", line 65, in generate
    return generate_fn(params)
  File "/Users/wojtek/Documents/kubin/src/webui.py", line 28, in <lambda>
    i2i_ui(generate_fn=lambda params: kubin.model.i2i(params), shared=ui_shared, tabs=ui_tabs)
  File "/Users/wojtek/Documents/kubin/src/models/model_kd2.py", line 127, in i2i
    current_batch = self.kandinsky.generate_img2img(
  File "/Users/wojtek/Documents/kubin/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/Users/wojtek/Documents/kubin/venv/lib/python3.10/site-packages/kandinsky2/kandinsky2_1_model.py", line 474, in generate_img2img
    return self.generate_img(
  File "/Users/wojtek/Documents/kubin/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/Users/wojtek/Documents/kubin/venv/lib/python3.10/site-packages/kandinsky2/kandinsky2_1_model.py", line 277, in generate_img
    samples, _ = sampler.sample(
  File "/Users/wojtek/Documents/kubin/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/Users/wojtek/Documents/kubin/venv/lib/python3.10/site-packages/kandinsky2/model/samplers.py", line 178, in sample
    self.make_schedule(
  File "/Users/wojtek/Documents/kubin/venv/lib/python3.10/site-packages/kandinsky2/model/samplers.py", line 104, in make_schedule
    "betas", to_torch(torch.from_numpy(self.old_diffusion.betas))
  File "/Users/wojtek/Documents/kubin/venv/lib/python3.10/site-packages/kandinsky2/model/samplers.py", line 101, in <lambda>
    to_torch = lambda x: x.clone().detach().to(torch.float32).to("cuda")
  File "/Users/wojtek/Documents/kubin/venv/lib/python3.10/site-packages/torch/cuda/__init__.py", line 239, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

After I change the sampler to p_sampler, I get another one:

Traceback (most recent call last):
  File "/Users/wojtek/Documents/kubin/venv/lib/python3.10/site-packages/gradio/routes.py", line 412, in run_predict
    output = await app.get_blocks().process_api(
  File "/Users/wojtek/Documents/kubin/venv/lib/python3.10/site-packages/gradio/blocks.py", line 1299, in process_api
    result = await self.call_function(
  File "/Users/wojtek/Documents/kubin/venv/lib/python3.10/site-packages/gradio/blocks.py", line 1021, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/Users/wojtek/Documents/kubin/venv/lib/python3.10/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/Users/wojtek/Documents/kubin/venv/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/Users/wojtek/Documents/kubin/venv/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/Users/wojtek/Documents/kubin/src/ui_blocks/i2i.py", line 65, in generate
    return generate_fn(params)
  File "/Users/wojtek/Documents/kubin/src/webui.py", line 28, in <lambda>
    i2i_ui(generate_fn=lambda params: kubin.model.i2i(params), shared=ui_shared, tabs=ui_tabs)
  File "/Users/wojtek/Documents/kubin/src/models/model_kd2.py", line 141, in i2i
    saved_batch = save_output(self.output_dir, 'img2img', current_batch, params)
  File "/Users/wojtek/Documents/kubin/src/utils/file_system.py", line 38, in save_output
    params_as_json = None if params is None else json.dumps(params, skipkeys=True)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/json/__init__.py", line 238, in dumps
    **kw).encode(obj)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type Image is not JSON serializable
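
That last error is not device-related: params contains a PIL.Image.Image, and json.dumps(..., skipkeys=True) only skips non-basic dict keys, not unserializable values. A sketch of one possible workaround (not necessarily the fix that landed in seruva19/kubin#80):

    import json

    params = {"prompt": "red cat, 4k photo", "init_image": object()}  # object() stands in for a PIL image

    # supply a fallback encoder for values json can't handle, instead of
    # relying on skipkeys=True, which only affects dict keys
    params_as_json = json.dumps(params, default=lambda o: f"<non-serializable: {type(o).__name__}>")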

@maxnowack

There are still some hardcoded references to cuda in the samplers. I think a solution might be to pass the configured device to the samplers and use that instead of cuda (see the sketch below). I'm quite inexperienced with PyTorch, so I'm not sure what the implications of this might be.
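
A sketch of that idea against the samplers.py line from the traceback above (hypothetical shape; the real make_schedule takes more arguments):

    import numpy as np
    import torch

    device = "cpu"  # the configured device, threaded in from the model instead of hardcoded

    # previously (samplers.py line 101): to_torch = lambda x: x.clone().detach().to(torch.float32).to("cuda")
    to_torch = lambda x: x.clone().detach().to(torch.float32).to(device)

    betas = np.linspace(1e-4, 2e-2, 1000)  # placeholder schedule; the real betas come from old_diffusion
    betas_t = to_torch(torch.from_numpy(betas))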

@WojtekKowaluk
Author

WojtekKowaluk commented May 20, 2023

I have fixed the samplers; as for the JSON error, I have fixed it here: seruva19/kubin#80

@WojtekKowaluk
Author

@CoruNethron Most of my images are broken for some reason... but if I run it on the web version, it runs fine... [attached image: boat]

Is this the plms_sampler? I think that one is broken; ddim_sampler and p_sampler should work fine :)

@ahmad88me

For Mac, MPS can be used. I've also created a pull request to handle mps (69759df):

    if torch.cuda.is_available():
        device = "cuda"
    elif torch.backends.mps.is_available():
        device = "mps"
    else:
        device = "cpu"
