
Add texture support from CuTextures.jl #206

Merged Jun 9, 2020

Conversation

@maleadt (Member) commented Jun 5, 2020

Fixes #46

@cdsousa: I'm still getting to grips with your design and the full extent of the relevant CUDA APIs, so I expect to do some work here before merging. Feel free to comment though! I've already changed some things to be closer to the CUDA C defaults.

@maleadt added the enhancement (New feature or request) and cuda kernels (Stuff about writing CUDA kernels) labels Jun 5, 2020
@maleadt maleadt marked this pull request as draft June 5, 2020 14:06
codecov bot commented Jun 5, 2020

Codecov Report

Merging #206 into master will decrease coverage by 0.33%.
The diff coverage is 93.33%.

@@            Coverage Diff             @@
##           master     #206      +/-   ##
==========================================
- Coverage   82.30%   81.97%   -0.34%     
==========================================
  Files         143      145       +2     
  Lines        9315     9529     +214     
==========================================
+ Hits         7667     7811     +144     
- Misses       1648     1718      +70     
Impacted Files               Coverage Δ
src/CUDA.jl                  100.00% <ø> (ø)
test/texture.jl               90.74% <90.74%> (ø)
src/texture.jl                94.59% <94.59%> (ø)
examples/wmma/high-level.jl   11.11% <0.00%> (-38.89%) ⬇️
examples/wmma/low-level.jl    14.28% <0.00%> (-35.72%) ⬇️
lib/cuda/occupancy.jl         76.00% <0.00%> (-20.00%) ⬇️
test/device/wmma.jl            0.00% <0.00%> (-7.41%) ⬇️
deps/bindeps.jl               80.00% <0.00%> (-2.00%) ⬇️
test/execution.jl             38.68% <0.00%> (-1.47%) ⬇️
test/device/cuda.jl            9.72% <0.00%> (-0.73%) ⬇️
... and 2 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update c6666d9...77be342

@cdsousa (Contributor) commented Jun 5, 2020

Well, the part I was least satisfied with was the _type_to_cuarrayformat_dict, reconstruct and cast "hacks". It deserves a review to check whether the design can be improved.

Also, the way that getindex works may be a little non-obvious: real-valued indices use normalized coordinates and integer indices use non-normalized coordinates.
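
For context, the idea boils down to something like this (a self-contained toy, not the actual CuTextures.jl code; ToyTexture just stands in for the device-side texture type):

```julia
# Toy illustration of dispatching getindex on the index type.
struct ToyTexture{T,N} end

# Real-valued indices → normalized (0..1) coordinates; integer indices → element
# coordinates. Integer <: Real, so the Integer method is the more specific match.
Base.getindex(t::ToyTexture, x::Real, y::Real)       = (:normalized,     Float32(x), Float32(y))
Base.getindex(t::ToyTexture, i::Integer, j::Integer) = (:non_normalized, Int32(i),   Int32(j))

ToyTexture{Float32,2}()[0.25, 0.75]   # → (:normalized, 0.25f0, 0.75f0)
ToyTexture{Float32,2}()[3, 4]         # → (:non_normalized, 3, 4)
```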

@maleadt (Member, Author) commented Jun 5, 2020

Also, the way that getindex works may be a little non-obvious: real-valued indices use normalized coordinates and integer indices use non-normalized coordinates.

Didn't you use the call overload (texture(i,j)) for normalized coordinates vs. square brackets for non-normalized ones? Both that and the Real/Int approach seem less than ideal to me; I'd rather use the hardware for that (i.e. configure the texture object as normalized or not, and use the same indexing function). The only difference then is whether to use 1-based indexing or not.
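
Something along these lines, i.e. a single indexing method where the normalization mode is a type parameter set when the texture is created (sketch only; tex_fetch is a placeholder, not an actual intrinsic):

```julia
# Toy sketch: one getindex for both modes; the hardware, configured at texture creation,
# interprets the coordinates. The only software-side difference is the 1-based offset.
struct ToyDeviceTexture{T,N,Normalized} end

tex_fetch(t, coords...) = coords   # placeholder for the real texture-fetch intrinsic

function Base.getindex(t::ToyDeviceTexture{T,N,Normalized},
                       coords::Vararg{Number,N}) where {T,N,Normalized}
    # normalized mode: pass 0..1 coordinates through untouched;
    # non-normalized mode: shift Julia's 1-based indices to 0-based hardware coordinates
    adjusted = Normalized ? coords : map(c -> c - one(c), coords)
    return tex_fetch(t, adjusted...)
end

ToyDeviceTexture{Float32,2,true}()[0.5f0, 0.5f0]   # → (0.5f0, 0.5f0)
ToyDeviceTexture{Float32,2,false}()[3, 4]          # → (2, 3)
```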

@cdsousa (Contributor) commented Jun 5, 2020

Didn't you use the call overload (texture(i,j))

Yeah, I used to have that design; it's perfectly fine.

I'd rather use the hardware for that (i.e. configure the texture object as normalized or not, and use the same indexing function). The only difference then is whether to use 1-based indexing or not.

That's possible too; we then just need to track that configuration on the host- and device-side objects, either by storing a flag or by splitting the types into two versions.

@maleadt (Member, Author) commented Jun 5, 2020

Yeah, I used to have that design; it's perfectly fine.

Did I start with the wrong code then? Your master branch still has that design: https://github.com/cdsousa/CuTextures.jl/blob/6838fff61ffe32a5bb8344ca1c6e6a3dca3fd16b/src/native.jl#L62-L70

That's possible too; we then just need to track that configuration on the host- and device-side objects, either by storing a flag or by splitting the types into two versions.

Yeah, that's what I do now: a field CPU-side and a typevar in the device type. But I need to play with the code some more to get a feel for that API :-)
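
Roughly this shape, with illustrative names (the actual field and type names in the PR may differ):

```julia
# Host-side wrapper: normalization is a plain runtime field, only used when creating
# the CUDA texture object.
struct HostTexture{T,N}
    handle::UInt64              # the CUtexObject handle
    dims::Dims{N}
    normalized_coordinates::Bool
end

# Device-side counterpart: the same setting lifted into a type parameter, so kernels
# specialize on it instead of branching on a runtime flag.
struct DeviceTexture{T,N,NC}
    dims::Dims{N}
    handle::UInt64
end

# Conversion at launch time, cf. the Adapt.adapt_storage method in this PR.
to_device(t::HostTexture{T,N}) where {T,N} =
    DeviceTexture{T,N,t.normalized_coordinates}(t.dims, t.handle)
```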

@cdsousa (Contributor) commented Jun 5, 2020

Did I start with the wrong code then? Your master branch still has that design:

Ahh, sorry for the confusion. I had only glanced over the getindex code you added and thought it was the design I used to have, with a single getindex for both normalized and non-normalized fetches that dispatched on the index type.
I didn't realize at first that you had already started working on the code!

@maleadt (Member, Author) commented Jun 8, 2020

About the automatic cast/reconstruct/_type_to_cuarrayformat_dict: I had actually been moving away from doing such things automatically (we used to do so for shuffle, ldg, etc.). The compiler does not support it, the hacks are fragile and hard to infer, and I don't trust that the layout/semantics are always going to be correct. It seems easier to me to admit that CUDA only supports texture memory of a limited set of types, and ask the user to reinterpret e.g. an RGBA{N0f8} array to NTuple{4,UInt8}. Thoughts?
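
I.e., something like this on the user's side (assuming ColorTypes and FixedPointNumbers for RGBA{N0f8}; the texture construction itself is elided since that API is exactly what this PR is still settling):

```julia
using CUDA, ColorTypes, FixedPointNumbers

# RGBA{N0f8} and NTuple{4,UInt8} are both isbits and 4 bytes, so the user can
# reinterpret explicitly instead of relying on an automatic cast/reconstruct.
img = CuArray(fill(RGBA{N0f8}(0.1, 0.2, 0.3, 1.0), 256, 256))
raw = reinterpret(NTuple{4,UInt8}, img)   # element type that CUDA textures support directly

# ... construct the texture from `raw`, and reinterpret back to RGBA{N0f8} when
# copying results off the GPU.
```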

@cdsousa (Contributor) commented Jun 8, 2020

I'm OK with that; we can always do that explicitly (however, I don't think we can reinterpret a CuDeviceArray inside a kernel, can we?).

On the other hand, we will still have an automatic mapping between NTuple{4, Float32} and float4, right? In that case, there must be a rationale for choosing NTuple{4, Float32} rather than something else (e.g., SIMD.Vec{4,Float32}).
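
(To be clear, the layout correspondence itself is easy to check on the host; I'm only wondering about the API choice:)

```julia
# NTuple{4,Float32} is a plain isbits type whose four lanes are stored contiguously,
# matching the 16-byte size of CUDA's float4 (alignment requirements aside).
@assert isbitstype(NTuple{4,Float32})
@assert sizeof(NTuple{4,Float32}) == 4 * sizeof(Float32) == 16
```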

@maleadt (Member, Author) commented Jun 9, 2020

I don't think we can reinterpret a CuDeviceArray inside a kernel, can we?

No; I could add that, but it seems simpler to reinterpret back when moving data off the GPU?

On the other hand, we will still have an automatic mapping between NTuple{4, Float32} and float4, right? In that case, there must be a rationale for choosing NTuple{4, Float32} rather than something else (e.g., SIMD.Vec{4,Float32}).

I'm not entirely sure what you mean; currently, using sources of type NTuple{4, T} will construct a 4-channel texture, and accessing that device-side just gives you that NTuple. (Come to think of it, maybe it would be convenient to allow passing the channel to getindex to get a plain value?)
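
In device code the current behavior would look roughly like this (sketch only; the per-channel access mentioned in the last comment is just the idea floated above, not something that exists):

```julia
using CUDA

# Sketch of kernel-side access to a 4-channel texture backed by NTuple{4,Float32} data.
function kernel(dst, tex)
    i = threadIdx().x
    texel = tex[i, 1]     # a whole texel, i.e. an NTuple{4,Float32}
    dst[i] = texel[1]     # pick out a single channel by hand
    # a per-channel getindex could return such a scalar directly instead
    return
end
```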

@cdsousa (Contributor) left a comment

On the other hand, we will still have an automatic mapping between NTuple{4, Float32} and float4, right? In that case, there must be a rationale for choosing NTuple{4, Float32} rather than something else (e.g., SIMD.Vec{4,Float32}).

I'm not entirely sure what you mean; currently, using sources of type NTuple{4, T} will construct a 4-channel texture, and accessing that device-side just gives you that NTuple. (Come to think of it, maybe it would be convenient to allow passing the channel to getindex to get a plain value?)

Not a big deal, I was just not sure why an NTuple{4, T} is the Julia-side version of a CUDA float4 (despite the fact that I actually used that equivalence in my original code). I guess that is the usual and right choice.

src/texture.jl Outdated
Base.size(tm::CuTexture) = size(tm.mem)

Adapt.adapt_storage(::Adaptor, t::CuTexture{T,N}) where {T,N} =
CuDeviceTexture{T,N,t.normalized_coordinates}(size(t.mem), t.handle)
@cdsousa (Contributor) commented:

Can this introduce any kind of type instability?
I didn't use a solution like this in the first place for fear of type instability, but now I guess that the kernel launch works as a function barrier, is that right?
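
To make the question concrete, here is a small CPU-only analogy of the pattern (no GPU code involved):

```julia
# Constructing a type whose parameter comes from a runtime Bool is type-unstable at the
# call site, but the callee still specializes on the concrete type it receives: the call
# acts as a function barrier, which is what the kernel launch would hopefully provide here.
struct Host; normalized::Bool; end
struct Device{NC} end

to_device(h::Host) = Device{h.normalized}()   # return type depends on a runtime value
inside(::Device{NC}) where {NC} = NC ? :normalized : :non_normalized  # inferred per NC

inside(to_device(Host(true)))   # one dynamic dispatch here, none inside `inside`
```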
