DirectML Execution #507

Merged
carson-katri merged 17 commits into main from torch-directml
Jan 30, 2023

Conversation

NullSenseStudio
Collaborator

@NullSenseStudio NullSenseStudio commented Jan 7, 2023

Allows using non-NVIDIA GPUs on Windows. Currently only implemented for prompt_to_image. Half precision isn't supported, so most users will likely need to make use of attention slicing and/or sequential CPU offload.

This is made as an alternative to ONNX, which has fewer models available, doesn't support memory optimizations like attention slicing or sequential CPU offload, and runs 3-4 times slower (on my machine at least).

I have encountered one odd bug where generation starts producing only fully white images. Releasing the generator makes it go away for a time.

@NullSenseStudio NullSenseStudio added the enhancement New feature or request label Jan 7, 2023
@NullSenseStudio
Collaborator Author

Now working for other actions. Depth_to_image doesn't follow the depth map at all, but can still generate rather abstract images. Seamless axes are broken and will cause the final image to be mostly a single color.

@carson-katri
Owner

carson-katri commented Jan 8, 2023

I added a workflow_dispatch trigger so you can manually create builds for AMD users to test.

https://github.com/carson-katri/dream-textures/actions/runs/3867749838

Co-authored-by: Carson Katri <Carson.katri@gmail.com>
@carson-katri
Owner

@NullSenseStudio
Collaborator Author

I'm going to attempt to make half precision mostly possible. There isn't as much compatibility as there is with float32, but there might be some meaningful memory savings if I cast between float16 and float32 where necessary. Afterwards, I'd like to see if anyone in the Discord wants to test it.
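As a rough illustration of the memory trade-off mentioned above (this is not the code from this PR), the sketch below uses only Python's standard `struct` module to store values at half precision and widen them again before computing with them. The helper names `pack_half`, `pack_single`, and `unpack_half` are hypothetical and exist only for this example.

```python
import struct

def pack_half(values):
    """Store floats as IEEE 754 half precision (2 bytes per value)."""
    return struct.pack(f"<{len(values)}e", *values)

def pack_single(values):
    """Store floats as IEEE 754 single precision (4 bytes per value)."""
    return struct.pack(f"<{len(values)}f", *values)

def unpack_half(buffer):
    """Widen half-precision storage back to Python floats for computation."""
    count = len(buffer) // struct.calcsize("<e")
    return list(struct.unpack(f"<{count}e", buffer))

# Illustrative values only; 0.1 is not exactly representable in half precision.
weights = [0.1, 0.25, -1.5, 3.0]
half_buffer = pack_half(weights)
single_buffer = pack_single(weights)
restored = unpack_half(half_buffer)

print(len(half_buffer), len(single_buffer))  # storage size in bytes
print(restored)
```

Half-precision storage takes half the bytes of single precision, at the cost of rounding values that float16 cannot represent exactly.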

@faxcorp

faxcorp commented Jan 10, 2023

Hello, can you please give some sort of info on how to test this on my AMD 5700xt? Do I need to build this branch or something? Thank you very much

@NullSenseStudio
Collaborator Author

You can open the link that Carson provided, then scroll down to Artifacts and download dream_textures-windows-directml. I believe the zip file you get contains another zip file, so you'll have to extract the outer one and then install the inner zip into Blender.
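If you'd rather script the extraction step, here is a minimal sketch using Python's standard `zipfile` module. The function name `extract_nested_zips` and the file names in the usage comment are assumptions for illustration; the actual name of the downloaded archive may differ.

```python
import zipfile
from pathlib import Path

def extract_nested_zips(outer_zip, dest_dir):
    """Extract outer_zip into dest_dir and return paths of any .zip files it contained."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(outer_zip) as archive:
        archive.extractall(dest)
        inner_names = [
            name for name in archive.namelist() if name.lower().endswith(".zip")
        ]
    return [dest / name for name in inner_names]

# Hypothetical usage; adjust the file name to match the artifact you downloaded:
# inner = extract_nested_zips("dream_textures-windows-directml.zip", "extracted")
# print(inner)  # the inner zip is the one to install in Blender
```

The inner zip is left unextracted on purpose, since Blender's add-on installer expects a zip file.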

@NullSenseStudio
Collaborator Author

Half precision was a great success! Unlike CUDA, which can give the same image for a given seed and prompt whether half precision is enabled or not, it doesn't reproduce the full precision image, but I'm not going to worry about that.

I also believe I've fixed the white image bug, as long as there isn't anything else that causes it.

New run: https://github.com/carson-katri/dream-textures/actions/runs/3888885287

@NullSenseStudio
Collaborator Author

New run with fixed model download: https://github.com/carson-katri/dream-textures/actions/runs/3894477733

inpainting, upscaling, depth with color
@carson-katri carson-katri added this to the v0.0.10 milestone Jan 15, 2023
@NullSenseStudio NullSenseStudio marked this pull request as ready for review January 22, 2023 20:42
@NullSenseStudio
Collaborator Author

Very pleased with the new 0.1.13.1.dev230119 version. It eliminates most of the patches and gives further performance improvements. The denoising loop (25 steps) used to need around 36 seconds and is now down to around 22 seconds. That's nearly 39% time savings on a GTX 1070; hopefully there'll be similar gains on AMD cards.
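For reference, the savings figure follows directly from the two timings quoted above. A small sketch of the arithmetic (the function name is hypothetical, and the per-step numbers are derived here rather than measured):

```python
def time_savings_percent(before_seconds, after_seconds):
    """Percentage of time saved going from before_seconds to after_seconds."""
    return (before_seconds - after_seconds) / before_seconds * 100

steps = 25
savings = time_savings_percent(36, 22)
per_step_before = 36 / steps  # seconds per denoising step before the update
per_step_after = 22 / steps   # seconds per denoising step after the update

print(f"{savings:.1f}% saved")
print(f"{per_step_before:.2f} s/step -> {per_step_after:.2f} s/step")
```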

I've modified model handling so that the frontend model id only uses the model's name rather than the full path; this should prevent issues caused by having special characters in account names. I haven't been able to replicate the bug, so I'm not entirely sure it'll work.
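The idea can be sketched as follows. This is only an illustration, not the add-on's actual code: the helper `frontend_model_id` and the example path (including the account name) are made up, and it assumes the model's name is the final path component.

```python
from pathlib import PureWindowsPath

def frontend_model_id(model_path):
    """Return only the final path component as the model id (hypothetical helper)."""
    return PureWindowsPath(model_path).name

# Hypothetical path with non-ASCII characters and a space in the account name.
path = r"C:\Users\Jörg Müller\.cache\huggingface\stable-diffusion-2-1"
model_id = frontend_model_id(path)
print(model_id)
```

Because the id no longer includes the user directory, characters in the account name never reach the frontend.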

I've also modified model revision selection to better prefer the main and fp16 revisions when the preferred one isn't found. I was having issues with it selecting the onnx revision, which I use for comparison.
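A minimal sketch of that kind of fallback logic is shown below. The function name, the order of the fallbacks (main before fp16), and the last-resort behavior are all assumptions for illustration; the PR's actual implementation may differ.

```python
def select_revision(available, preferred, fallbacks=("main", "fp16")):
    """Pick a model revision, preferring `preferred`, then the fallbacks in order."""
    if preferred in available:
        return preferred
    for candidate in fallbacks:
        if candidate in available:
            return candidate
    # Assumed last resort: take whatever revision exists, or None if there is none.
    return next(iter(available), None)

# With fp16 missing, main is chosen over the onnx revision.
print(select_revision(["onnx", "main"], preferred="fp16"))
```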

@NullSenseStudio
Collaborator Author

Let me know if you have any issues with the changes or if they somehow cause bugs on macOS. I don't think more user testing is needed before release, but maybe it should be tested due to the version change.

@NullSenseStudio NullSenseStudio linked an issue Jan 26, 2023 that may be closed by this pull request
@carson-katri
Owner

Is this ready for merge?

@NullSenseStudio
Collaborator Author

Yes, unless you want to do more user testing before release.

@carson-katri carson-katri merged commit 7aa8499 into main Jan 30, 2023
@carson-katri carson-katri deleted the torch-directml branch January 30, 2023 03:57
@carson-katri
Owner

carson-katri commented Jan 30, 2023

I think more testing can wait for the next set of pre-release builds.

@Mhowser

Mhowser commented Jan 30, 2023

Thanks so much for doing this! Will full precision eventually be possible for AMD GPUs?

Labels
enhancement New feature or request
4 participants