add support for AMD / ROCm / HIP #707

Open
ehartford opened this issue Dec 6, 2023 · 8 comments

Comments

@ehartford

I want to request AMD support again, since it is now much more popular and usable than it has been.

@wsippel

wsippel commented Dec 10, 2023

AMD is working on it: https://github.com/ROCmSoftwarePlatform/flash-attention

I've not tested it yet, but it seems a new branch with WMMA optimizations for Radeon 7000-series GPUs was added just yesterday.

@nktice

nktice commented Dec 19, 2023

I have composed this guide for my AMD AI configuration:
https://github.com/nktice/AMD-AI
The ROCm fork of flash-attention appears to work with ROCm 5.7.
[ https://github.com/nktice/AMD-AI/blob/main/ROCm-5.7.md - I haven't tested it much, but the exllamav2 warnings that appear when flash-attention is missing disappear once it is installed in this case (see the sketch at the end of this comment). ]

Alas, it does not work with ROCm 6 at the time of writing.
[ https://github.com/nktice/AMD-AI/blob/main/ROCm6.0.md - in this case exllamav2 crashes if flash-attention ( same as above ) is installed. ]

An issue with this is that the AMD fork is always behind the main repository and is hard to keep in sync with the upstream code and developers.

What would be helpful is for AMD's changes to be merged back into the upstream source, so that they do not have to start from scratch again every time the main flash-attention code is updated.
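
For context on the exllamav2 behavior above: loaders typically probe for flash-attn at import time and fall back to PyTorch's built-in attention when it is missing, which is roughly why the warnings go away once the ROCm build is installed. A minimal sketch of that general pattern (an illustration, not exllamav2's actual code):

```python
import torch
import torch.nn.functional as F

try:
    # Provided by flash-attention (or the ROCm fork of it)
    from flash_attn import flash_attn_func
    HAS_FLASH_ATTN = True
except ImportError:
    HAS_FLASH_ATTN = False
    print("flash-attn not found, falling back to torch SDPA")  # the kind of warning a loader emits

def attention(q, k, v, causal=True):
    """q, k, v: (batch, seqlen, nheads, headdim), fp16/bf16 tensors on the GPU."""
    if HAS_FLASH_ATTN:
        return flash_attn_func(q, k, v, causal=causal)
    # Fallback path: scaled_dot_product_attention expects (batch, nheads, seqlen, headdim)
    q, k, v = (t.transpose(1, 2) for t in (q, k, v))
    out = F.scaled_dot_product_attention(q, k, v, is_causal=causal)
    return out.transpose(1, 2)
```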

@ehartford
Author

@tridao is it possible to merge this to support ROCm?

https://github.com/ROCmSoftwarePlatform/flash-attention

@tridao
Contributor

tridao commented Jan 18, 2024

> https://github.com/ROCmSoftwarePlatform/flash-attention

I think that's a fork maintained by AMD folks and it's not meant to be merged.

@ehartford
Author

I doubt they would disapprove of merging; it seems like just a gap in communication. I will reach out.

@nktice

nktice commented Apr 3, 2024

> https://github.com/ROCmSoftwarePlatform/flash-attention
>
> I think that's a fork maintained by AMD folks and it's not meant to be merged.

As it's been a while and they haven't updated or integrated it, I'd like to mention that AMD rarely updates or maintains such things; it's common for them to abandon such projects with little notice. For example, their bitsandbytes port is well out of date:
https://github.com/ROCm/bitsandbytes
This leaves others improvising for themselves to get things working.
[ Here's the most recent bitsandbytes I've found that works with ROCm... it's also out of date, but not quite as abandoned as AMD's own. ]
https://github.com/arlo-phoenix/bitsandbytes-rocm-5.6
There's been no quarrel about people's forked versions, and there are a few, but without AMD's help it is something of a mess of mixed offerings.
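
For anyone trying to tell whether a given ROCm bitsandbytes build actually works, a quick smoke test along these lines can help (just a sketch, not from any of the guides above; the layer sizes are arbitrary):

```python
import torch
import bitsandbytes as bnb

# A ROCm-enabled PyTorch exposes the GPU through the CUDA API surface.
assert torch.cuda.is_available(), "no ROCm/CUDA device visible to PyTorch"

# Build a small 8-bit linear layer and run one forward pass on the GPU.
layer = bnb.nn.Linear8bitLt(64, 64, has_fp16_weights=False).cuda()
x = torch.randn(1, 64, dtype=torch.float16, device="cuda")
print(layer(x).shape)  # torch.Size([1, 64]) if the build is functional
```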

It is more likely they offered an example of what could be done and how to do it, so that the 'community' could take it from there. [ If that were not the case, they would clearly say so, or keep the fork private. ]

I contacted the exllamav2 developers about the version issue; here is what they said - AMD's offered version isn't of much use:
turboderp/exllamav2#397 (comment)

@RichardFevrier

Maybe @howiejayz could be part of this conversation =)

@howiejayz

> Maybe @howiejayz could be part of this conversation =)

Unfortunately I am no longer working on this project :( But as far as I know the other team is still working on it, and it will have long-term support.
