
Issues with ViT on the GPU #165

Closed
2 of 3 tasks
theabhirath opened this issue Jun 10, 2022 · 8 comments

Comments

@theabhirath
Member

theabhirath commented Jun 10, 2022

With some experimentation, issues pop up when I use ViT on the GPU. Documenting these so that they can be tracked down and solved:

theabhirath added a commit to theabhirath/Metalhead.jl that referenced this issue Jun 10, 2022
@darsnack
Member

For the reshape issue, have you looked at what the output of chunk is? I suspect it isn't a CuArray but a view instead; permutedims just turns it back into a CuArray. So the issue will be to make chunk GPU-friendly.

@theabhirath
Member Author

Oh, you're right! That is indeed the issue. I'll see how I can make chunk work on the GPU.

@darsnack
Member

darsnack commented Jun 10, 2022

Typically, CUDA.jl will return CuArrays instead of SubArrays for views, so long as the values are contiguous. The use of selectdim in chunk will try to take a view. If the "chunk" you are selecting corresponds to multiple indices (e.g. 2:4) in a middle dimension, then this will be a non-contiguous access, and CUDA.jl will fall back to returning a SubArray. At that point almost any downstream operation on the GPU will complain. Basically, this is to say that the CPU and GPU code paths have to diverge, because you cannot make views work in this case.
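As a CPU-only sketch of the behavior described above (the CuArray fallback itself needs a GPU to reproduce): selectdim with a range of indices in a middle dimension hands back a lazy SubArray rather than a fresh array.

```julia
# CPU-only sketch: selectdim with a range of indices returns a lazy
# SubArray (a view), not a new Array.
x = rand(Float32, 4, 4, 4)

v = selectdim(x, 2, 2:3)   # multiple indices in a middle dimension
v isa SubArray             # true: a view, and non-contiguous in memory

# On a CuArray, the analogous non-contiguous view cannot be re-wrapped
# as a CuArray, so downstream GPU kernels hit the SubArray fallback path.
```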

Not sure exactly how we want to fix this. I feel that any change will be slower for the CPU path, and MLUtils.jl currently doesn't depend on CUDA.jl (we want to keep it this way).

cc @CarloLucibello and @ToucheSir for some more eyes

@ToucheSir
Member

If I'm not mistaken,

    query, key, value = chunk(qkv_reshaped, 3; dims = 4)

accesses the last dimension and not a middle one?

@darsnack
Member

Oops, so selectdim just always returns a SubArray, then? Taking a contiguous view directly should return a CuArray, if I'm not mistaken.

@ToucheSir
Member

ToucheSir commented Jun 11, 2022

Looks like it does for CuArrays. My understanding from testing with Cthulhu is that https://github.com/JuliaLang/julia/blob/v1.7.3/base/abstractarraymath.jl#L136 transforms what should be a : or a UnitRange{Int64} for the non-selected dimensions into a Slice{OneTo{Int64}}. view on CuArrays can recognize that view(x, :) is contiguous, but not view(x, Base.Slice(axes(x, 1))).
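A small sketch of what selectdim builds under the hood: the non-selected dimensions end up indexed with Base.Slice rather than Colon, which is the form the CuArray view machinery does not recognize as contiguous.

```julia
# selectdim wraps the non-selected dimensions in Base.Slice indices,
# not Colon, which is what trips up the CuArray contiguity check.
x = rand(4, 4)
v = selectdim(x, 2, 1:2)

parentindices(v)                         # (Base.Slice(Base.OneTo(4)), 1:2)
first(parentindices(v)) isa Base.Slice   # true
```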

For our purposes, I think replacing the selectdim on https://github.com/JuliaML/MLUtils.jl/blob/348dbdd04065c650cf4a8fb2979bdaa4aad0b84a/src/utils.jl#L165 with an equivalent function that fills in non-chunked dims with Colon would suffice. It might also be nice to support dims as a Val for type stability, but as long as we can avoid triggering https://github.com/JuliaGPU/CUDA.jl/blob/0a46a13467f8522cc84ca490bc4948e25bf95fd3/src/array.jl#L631 we should be fine.
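A minimal sketch of the fix proposed above, using a hypothetical helper name (`_selectdim`; not the actual MLUtils.jl implementation): fill every non-chunked dimension with Colon() so the resulting view is in a form CUDA.jl can recognize.

```julia
# Hypothetical helper sketching the proposed fix: index the non-chunked
# dimensions with Colon() instead of Base.Slice, so the resulting view
# stays in a form CUDA.jl can treat as contiguous.
_selectdim(x::AbstractArray{T,N}, dim::Int, i) where {T,N} =
    view(x, ntuple(d -> d == dim ? i : Colon(), N)...)

x = reshape(collect(1:24), 2, 3, 4)
v = _selectdim(x, 3, 2:3)   # equivalent to view(x, :, :, 2:3)
v == x[:, :, 2:3]           # true
```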

@CarloLucibello
Member

Can this be closed?

@ToucheSir
Member

I think so; we can re-open if anything was missed.
