Support loading/saving simdgroup matrix from threadgroup memory #71
bors try
try: Build succeeded
src/device/intrinsics/memory.jl
Outdated
@@ -78,6 +78,15 @@ Base.@propagate_inbounds Base.getindex(A::MtlLargerDeviceArray{T}, i1::Integer)
Base.@propagate_inbounds Base.setindex!(A::MtlLargerDeviceArray{T}, x, i1::Integer) where {T} =
    arrayset(A, convert(T,x)::T, i1)

Base.to_index(::MtlLargerDeviceArray{T}, i::Integer) where {T} = i
Is this needed? I'm assuming you're copying the CUDA.jl perf optimization for Int32 indices here. That doesn't really belong in this PR, and if it stays it should definitely have some comments.
You're right, it's not needed; I've removed it. I copied it over from src/device/array.jl when I was trying to get tg_a[x,y] to work for MtlLargerDeviceArray (which is needed by the test).
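For readers unfamiliar with the Base.to_index hook discussed in this thread, here is a minimal CPU-only sketch of what such a method changes. KeepIndexVector is a hypothetical stand-in type, not part of Metal.jl or CUDA.jl; the sketch relies only on Julia's default behaviour, where Base.to_index converts integer indices to Int.

```julia
# Hypothetical wrapper type for illustration only (not a Metal.jl type).
struct KeepIndexVector{T} <: AbstractVector{T}
    data::Vector{T}
end
Base.size(A::KeepIndexVector) = size(A.data)
Base.getindex(A::KeepIndexVector, i::Integer) = A.data[i]

# Same shape as the method in the diff: return the index unchanged
# instead of letting Base convert it to Int.
Base.to_index(::KeepIndexVector, i::Integer) = i

plain = [10, 20, 30]
kept  = KeepIndexVector(plain)

default_idx = Base.to_index(plain, Int32(2))  # Base default: converted to Int
kept_idx    = Base.to_index(kept, Int32(2))   # override: stays Int32
```

On a GPU, keeping 32-bit indices can avoid widening to 64-bit integer arithmetic, which is presumably the optimization the reviewer has in mind; whether it matters for this PR is not established in the thread.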
bors try
try: Build succeeded
No description provided.