Change LowerTriangularArrays to be subtype of AbstractArray{T,N-1} #543
Conversation
maximilian-gelbrecht commented on May 18, 2024 (edited)
- Basic tests for LTA passing
- show LowerTriangularMatrix as Matrix but higher dims as Vector
- Check other occurrences of LTA in the rest of the code
- Clean up and benchmark arithmetics
- Update plot!
- Update Doc
- Update Broadcast test
This is awesome, thanks Maximilian. I'll let you continue working on this, but feel free to let me know if I can contribute.
Just tested to benchmark the broadcast, and I am a bit frustrated and surprised that it still isn't fast.

using SpeedyWeather, BenchmarkTools
NF = Float32
mmax = 32
idims = (5,)
lmax = mmax
L = randn(LowerTriangularArray{NF}, 50, 50, idims...)
L2 = randn(LowerTriangularArray{NF}, 50, 50, idims...)
@benchmark L.data .+ L2.data
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
Range (min … max): 2.147 μs … 799.501 μs ┊ GC (min … max): 0.00% … 98.70%
Time (median): 4.103 μs ┊ GC (median): 0.00%
Time (mean ± σ): 6.806 μs ± 37.810 μs ┊ GC (mean ± σ): 38.56% ± 6.94%
▁ ▁▅█▇▁
▅█▆▅▅▃▅▇▇█████▆▄▄▄▄▄▄▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂ ▃
2.15 μs Histogram: frequency by time 12.5 μs <
Memory estimate: 25.11 KiB, allocs estimate: 6.
@benchmark L .+ L2
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 13.328 μs … 4.845 ms ┊ GC (min … max): 0.00% … 98.54%
Time (median): 14.787 μs ┊ GC (median): 0.00%
Time (mean ± σ): 18.113 μs ± 97.273 μs ┊ GC (mean ± σ): 11.87% ± 2.21%
▇▆█▇▃▃▃▂▂▃▁▁▄ ▂
██████████████▇▇▅▆▆▅▅▅▄▄▄▄▂▄▅▃▅▅▇▇▆▅▆▇▇▆▅▄▄▃▄▄▄▄▂▂▂▄▂▅▅▃▄▄▄ █
13.3 μs Histogram: log(frequency) by time 44.1 μs <
Memory estimate: 25.14 KiB, allocs estimate: 5.

I don't really understand why. The `undef` constructor itself is not much slower than for a plain `Array`, so the allocation alone doesn't explain it:

@benchmark LowerTriangularArray{T, N, ArrayType{T,N}}(undef, SpeedyWeather.LowerTriangularMatrices.matrix_size(L))
BenchmarkTools.Trial: 8898 samples with 188 evaluations.
Range (min … max): 529.633 ns … 43.168 μs ┊ GC (min … max): 0.00% … 97.67%
Time (median): 773.019 ns ┊ GC (median): 0.00%
Time (mean ± σ): 2.972 μs ± 6.479 μs ┊ GC (mean ± σ): 73.31% ± 29.36%
█▃ ▁
██▇▆▆▅▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▁▃▃▁▄▄▅▅▆▆▇▆▇▇▇▇██████████████▇▇ █
530 ns Histogram: log(frequency) by time 25.5 μs <
Memory estimate: 25.05 KiB, allocs estimate: 4.
julia> @benchmark Array{T,N}(undef, size(L))
BenchmarkTools.Trial: 8490 samples with 200 evaluations.
Range (min … max): 375.935 ns … 41.204 μs ┊ GC (min … max): 0.00% … 97.10%
Time (median): 633.835 ns ┊ GC (median): 0.00%
Time (mean ± σ): 2.926 μs ± 6.537 μs ┊ GC (mean ± σ): 77.56% ± 30.35%
█▄ ▁ ▁
██▇▆▆▅▁▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▃▁▁▃▄▄▃▆▆▄▅▆▆▆▆▇▇▇███████████▇█▇█▇▇ █
376 ns Histogram: log(frequency) by time 25.2 μs <
Memory estimate: 25.02 KiB, allocs estimate: 3.
I'll look at it again later this week and see if I can find any possible solutions.
Maybe some inlining is missing? I can have a check too.
Had a look again, and I solved it. The problem was slow indexing with `CartesianIndex`. Before, we only had

```julia
Base.@propagate_inbounds Base.getindex(L::LowerTriangularArray{T,N}, I::CartesianIndex{M}) where {T,N,M} = getindex(L, Tuple(I)...)
```

Now, we also have:

```julia
Base.@propagate_inbounds Base.getindex(L::LowerTriangularArray{T,N}, I::CartesianIndex{N}) where {T,N} = getindex(L.data, I)
```

This apparently made all the difference, because internally broadcasting uses Cartesian indexing, and the old method first converts each `CartesianIndex` to integer indices via `Tuple`.
@benchmark L .+ L2
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
Range (min … max): 2.318 μs … 1.983 ms ┊ GC (min … max): 0.00% … 98.92%
Time (median): 4.224 μs ┊ GC (median): 0.00%
Time (mean ± σ): 8.019 μs ± 47.512 μs ┊ GC (mean ± σ): 36.22% ± 6.99%
▅▆▅▅▄▅▇█▇▅▃▂▂▂▂▁▁▁ ▁▂▂▂▂▂▁ ▂
██████████████████████▇█▇▆▇▆▇▆▆▆▅▆▇████████████▇▅▆▇▆▆▅▅▅▅▅ █
2.32 μs Histogram: log(frequency) by time 17.5 μs <
Memory estimate: 25.14 KiB, allocs estimate: 5.
julia> @benchmark L.data .+ L2.data
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
Range (min … max): 2.236 μs … 1.869 ms ┊ GC (min … max): 0.00% … 99.13%
Time (median): 4.154 μs ┊ GC (median): 0.00%
Time (mean ± σ): 7.459 μs ± 47.129 μs ┊ GC (mean ± σ): 38.76% ± 6.92%
▅▆▆▆▅▆▇██▆▄▃▃▂▂▂▂▁▁▁▁ ▂
████████████████████████▇▇█▇▆▇▇▆▆▅▆▆▇▇█▇▇█▇▇▇▆▇▆▆▆▆▄▄▄▅▃▃▅ █
2.24 μs Histogram: log(frequency) by time 17.4 μs <
Memory estimate: 25.11 KiB, allocs estimate: 6.
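The effect can be reproduced outside of SpeedyWeather with a toy wrapper type (all names here are hypothetical, not the actual `LowerTriangularArray` implementation): forwarding a `CartesianIndex{N}` directly to the parent array skips the per-element splatting into integer indices that broadcasting otherwise triggers.

```julia
# Toy flat wrapper (hypothetical FlatArray): broadcasting indexes with
# CartesianIndex internally, so a direct forwarding method avoids converting
# the index to a tuple of Ints on every element access.
struct FlatArray{T,N} <: AbstractArray{T,N}
    data::Array{T,N}
end
Base.size(A::FlatArray) = size(A.data)
Base.getindex(A::FlatArray, i::Int...) = getindex(A.data, i...)

# generic (slower) path: splat the CartesianIndex into integers
slow_getindex(A::FlatArray, I::CartesianIndex) = getindex(A, Tuple(I)...)

# fast path analogous to the fix above: forward the CartesianIndex as-is
Base.getindex(A::FlatArray{T,N}, I::CartesianIndex{N}) where {T,N} = getindex(A.data, I)

A = FlatArray(reshape(collect(1.0:6.0), 2, 3))
A[CartesianIndex(2, 3)]                  # 6.0, via the direct forwarding method
slow_getindex(A, CartesianIndex(2, 3))   # 6.0, via splatting
```

Both paths return the same value; the difference is purely in how much dispatch and conversion work happens per element inside a broadcast loop.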
What's left to do, basically, is to adjust the usage of AssociatedLegendrePolynomials.
I'm still getting this:

julia> L = rand(LowerTriangularMatrix,5,5)
15-element LowerTriangularMatrix{Float64}:
Error showing value of type LowerTriangularMatrix{Float64}:
ERROR: BoundsError: attempt to access 15-element LowerTriangularMatrix{Float64} at index [6, 1]
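For reference, here's a minimal sketch (hypothetical `LowerTri` type, not the actual implementation) of a `show` method that prints flat, column-major triangular storage as a full matrix: returning zero above the diagonal avoids exactly this kind of out-of-triangle `BoundsError`.

```julia
# Hypothetical sketch: lower-triangular matrix stored as a flat vector
# (column-major over the triangle), displayed as a matrix with explicit zeros.
struct LowerTri{T}
    m::Int
    n::Int
    v::Vector{T}   # length n*(n+1)/2 for a square m == n triangle
end

# flat index of entry (i, j) in the lower triangle, column-major
flatindex(L::LowerTri, i, j) = i + (j-1)*L.m - ((j-1)*j) ÷ 2

# zero above the diagonal instead of erroring
Base.getindex(L::LowerTri{T}, i::Int, j::Int) where T =
    i >= j ? L.v[flatindex(L, i, j)] : zero(T)

function Base.show(io::IO, L::LowerTri{T}) where T
    println(io, "$(length(L.v))-element, $(L.m)x$(L.n) LowerTri{$T}")
    for i in 1:L.m
        println(io, join((string(L[i, j]) for j in 1:L.n), "  "))
    end
end

L = LowerTri(3, 3, collect(1.0:6.0))
# L[2, 1] == 2.0 (in the triangle), L[1, 2] == 0.0 (above the diagonal)
```

The `flatindex` formula here assumes column-major ordering over the triangle; the real package may order entries differently.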
Last commit shows a `LowerTriangularMatrix` like this now:

julia> L = rand(LowerTriangularMatrix,5,5)
15-element, 5x5 LowerTriangularMatrix{Float64}
0.787422 0.0 0.0 0.0 0.0
0.260061 0.819204 0.0 0.0 0.0
0.973352 0.620726 0.161257 0.0 0.0
0.0461552 0.685936 0.0117785 0.382289 0.0
0.13645 0.501547 0.419401 0.309778 0.688196
julia> Float32.(L)
15-element, 5x5 LowerTriangularMatrix{Float32}
0.787422 0.0 0.0 0.0 0.0
0.260061 0.819204 0.0 0.0 0.0
0.973352 0.620726 0.161257 0.0 0.0
0.0461552 0.685936 0.0117785 0.382289 0.0
0.13645 0.501547 0.419401 0.309778 0.688196
julia> L = rand(LowerTriangularMatrix,15,15)
120-element, 15x15 LowerTriangularMatrix{Float64}
0.261788 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0
0.495482 0.117453 0.0 0.0 0.0 0.0 0.0 0.0
0.0925019 0.796409 0.937247 0.0 0.0 0.0 0.0 0.0
0.816407 0.409503 0.779616 0.182904 0.0 0.0 0.0 0.0
0.915145 0.823384 0.270083 0.565954 0.0 0.0 0.0 0.0
0.737812 0.795098 0.194549 0.698349 … 0.0 0.0 0.0 0.0
0.275376 0.939357 0.535364 0.515381 0.0 0.0 0.0 0.0
0.910015 0.22325 0.972193 0.0490044 0.0 0.0 0.0 0.0
0.823201 0.128216 0.0735159 0.122388 0.0 0.0 0.0 0.0
0.169003 0.862217 0.621427 0.695971 0.0 0.0 0.0 0.0
0.659415 0.0279775 0.245747 0.247737 … 0.0 0.0 0.0 0.0
0.596095 0.988664 0.441109 0.990378 0.177168 0.0 0.0 0.0
0.28178 0.189885 0.609562 0.415566 0.155296 0.989642 0.0 0.0
0.710654 0.138662 0.849563 0.444248 0.467292 0.0712397 0.722201 0.0
0.104208 0.775013 0.634025 0.199434 0.955224 0.219269 0.102539 0.0512979
But then for higher dimensions the matrices are just flattened out into columns:

julia> L = rand(LowerTriangularArray,5,5,5)
15×5 LowerTriangularArray{Float64, 2, Matrix{Float64}}:
0.90549 0.179909 0.722785 0.656362 0.215722
0.0663613 0.209663 0.393662 0.29946 0.142641
0.31763 0.121395 0.941059 0.14699 0.738631
0.132192 0.160128 0.157869 0.751886 0.0505202
0.61786 0.7052 0.841462 0.915375 0.0175474
0.802477 0.113889 0.344423 0.261357 0.484409
0.0662565 0.933641 0.271906 0.0445899 0.085901
0.0866564 0.0510311 0.115502 0.477471 0.195907
0.995015 0.367898 0.14462 0.5566 0.626406
0.613344 0.327654 0.292307 0.598366 0.416606
0.863694 0.273074 0.0687791 0.901928 0.422603
0.77978 0.471441 0.133331 0.0685466 0.529635
0.617265 0.944824 0.989218 0.985348 0.371635
0.40759 0.570559 0.0131059 0.229718 0.532602
0.998965 0.150414 0.0708596 0.853031 0.308048
In #525 I started implementing the N-dimensional spectral transform just looping over the additional dimensions, because I thought it would be good to check whether we have any constraints from that regarding implementations here. Good news: the 3D case works perfectly fine with only little tweaks:

```julia
for k in eachgrid(grids)   # produces a CartesianIndex looping over dimension 3 to N
    ...
    specs[lm, k]
    ...
    view(grids.data, 1:20, k)
    ...
end
```

So just adding the trailing index works there, but indexing a LowerTriangularMatrix with a trailing CartesianIndex currently fails:

julia> L = rand(LowerTriangularMatrix, 5, 5)
15-element, 5x5 LowerTriangularMatrix{Float64}
0.301503 0.0 0.0 0.0 0.0
0.985783 0.0208044 0.0 0.0 0.0
0.628694 0.0216734 0.462313 0.0 0.0
0.221242 0.817672 0.778306 0.301415 0.0
0.241364 0.18049 0.188116 0.456697 0.596747
julia> L[1, CartesianIndex()]
ERROR: MethodError: no method matching isless(::Int64, ::CartesianIndex{0})

I'm not quite sure which method causes this. For grids it works:

julia> G = rand(OctaHEALPixGrid,2)
16-element, 3-ring OctaHEALPixGrid{Float64}:
0.6672885408610827
0.3029214314408146
0.25557101111147806
0.023467728044619385
0.13131270800462713
0.6096673403313817
0.19343526034598502
0.6247839303906794
0.8554319237365166
0.12169715210643173
0.9504931175461784
0.9193855210964392
0.49717053897764873
0.8925290658067014
0.232981331232753
0.03177260647949132
julia> G[1, CartesianIndex()]
0.6672885408610827
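A minimal sketch (hypothetical `FlatVec` type, not the SpeedyWeather code) of how a custom flat type can opt into the leading-integer-plus-trailing-`CartesianIndex` signature, including the zero-dimensional `CartesianIndex()` case:

```julia
# Sketch: support a flat leading index followed by a trailing CartesianIndex,
# so the same indexing code works for N-dimensional fields (CartesianIndex
# over dims 3:N) and for the 2D case (CartesianIndex()).
struct FlatVec{T} <: AbstractArray{T,1}
    data::Vector{T}
end
Base.size(v::FlatVec) = size(v.data)
Base.getindex(v::FlatVec, i::Int) = v.data[i]

# leading integer + trailing (possibly zero-dimensional) CartesianIndex;
# Tuple(CartesianIndex()) is the empty tuple, so this reduces to v.data[i]
Base.getindex(v::FlatVec, i::Int, I::CartesianIndex) = v.data[i, Tuple(I)...]

v = FlatVec([0.5, 0.25, 0.125])
v[2, CartesianIndex()]   # 0.25, no MethodError
```

Plain `Array`s accept trailing `CartesianIndex()` out of the box, which is presumably why the grid type above works without extra methods.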
I'll continue experimenting with the dynamical core, just because I feel that gives us some more insights for the decisions we're making here.
I'll add this in a bit hacky way by explicitly defining a separate method.
Yes, agree. A leading integer argument then a trailing CartesianIndex is all we need! Thanks Max!!
There's currently an error in the transforms here that is a bit hard for me to trace. The "Transform: Orography (exact grids)" test fails, but only for the Clenshaw grids and only in the last row of the spectral coefficients.
Made a brief check again. I don't really understand why this error is happening. @milankl do you have any idea? I didn't really change anything in SpectralTransforms that should be able to cause this, and that it is only appearing for Clenshaw grids and not Gaussian grids is a bit weird.

```julia
using SpeedyWeather

trunc = 31
NF = Float32
Grid = FullClenshawGrid
dealiasing = Grid in (FullGaussianGrid, OctahedralGaussianGrid) ? 2 : 3

SG = SpectralGrid(NF; trunc, Grid, dealiasing)
S = SpectralTransform(SG, recompute_legendre=false)
O = EarthOrography(SG, smoothing=true)
E = Earth(SG)
initialize!(O, E, S)

oro_grid = O.orography
oro_spec = spectral(oro_grid, S)

oro_grid1 = gridded(oro_spec, S)
oro_spec1 = spectral(oro_grid1, S)
oro_grid2 = gridded(oro_spec1, S)
oro_spec2 = spectral(oro_grid2, S)
```

Then oro_spec1 - oro_spec2 gives:
560-element, 33x32 LowerTriangularMatrix{ComplexF32}
-0.00012207+0.0im 0.0+0.0im 0.0+0.0im … 0.0+0.0im 0.0+0.0im
-3.05176f-5+0.0im -1.52588f-5-1.52588f-5im 0.0+0.0im 0.0+0.0im 0.0+0.0im
6.10352f-5+0.0im -2.38419f-7+1.52588f-5im -3.05176f-5+0.0im 0.0+0.0im 0.0+0.0im
-3.05176f-5+0.0im 7.62939f-6+3.05176f-5im -1.52588f-5-7.62939f-6im 0.0+0.0im 0.0+0.0im
-3.05176f-5+0.0im 0.0-1.52588f-5im 0.0-4.76837f-6im 0.0+0.0im 0.0+0.0im
0.0+0.0im -1.90735f-6+2.28882f-5im -1.14441f-5+0.0im … 0.0+0.0im 0.0+0.0im
-3.05176f-5+0.0im 1.90735f-6-3.05176f-5im 7.62939f-6+3.8147f-6im 0.0+0.0im 0.0+0.0im
⋮ ⋱ ⋮
3.05176f-5+0.0im -2.86102f-6-1.33514f-5im 1.90735f-6+3.8147f-6im 0.0+0.0im 0.0+0.0im
1.52588f-5+0.0im -3.21865f-6+0.0im 9.53674f-6+2.38419f-6im 0.0+0.0im 0.0+0.0im
-9.53674f-6+0.0im 3.33786f-6-1.57356f-5im 3.8147f-6-1.90735f-6im 0.0+0.0im 0.0+0.0im
8.58307f-6+0.0im -1.90735f-6-1.52588f-5im -1.43051f-5-1.90735f-6im 0.0+0.0im 0.0+0.0im
3.05176f-5+0.0im -1.90735f-6-3.8147f-6im 1.33514f-5-4.76837f-7im … -6.25849f-7+5.96046f-7im 0.0+0.0im
-2.86102f-6+0.0im -6.91414f-6-1.33514f-5im -1.71661f-5-3.8147f-6im -2.38419f-6-9.53674f-7im -9.53674f-7+9.53674f-7im
0.238285+0.0im -0.194557-0.108487im 0.0348457+0.129144im 0.0+1.90735f-6im 9.53674f-7+0.0im
Back in Oxford, I can have a look today. For now: I find it suspicious that …
```diff
@@ -331,7 +330,7 @@ function gridded!( map::AbstractGrid{NF}, # gridded output

     # Recalculate or use precomputed Legendre polynomials Λ
     recompute_legendre && Legendre.unsafe_legendre!(Λw, Λ, lmax, mmax, Float64(cos_colat[j_north]))
-    Λj = recompute_legendre ? Λ : Λs[j_north]
+    Λj = recompute_legendre ? LowerTriangularMatrix(Λ) : Λs[j_north]
```
I believe this allocates for a T31 transform 24 T31 matrices -- not ideal!
Yeah, absolutely, not ideal in case recompute_legendre == true.

I didn't go very deep into AssociatedLegendrePolynomials.jl, but it does repeatedly use ndims of the array it's given, even in the actual compute routines. So we cannot easily use the LowerTriangularMatrix there.

I didn't test what happens if you redefine ndims, but I expect this to cause some trouble somewhere else.
The first quite "hacky" solution that comes to my mind would be to define another type that is just used for the transforms:

```julia
# note: type parameters can't be computed in a struct definition
# (N-1 is not allowed there), so carry the data dimension M == N-1 explicitly
struct LegendePolArray{T,N,M,ArrayType} <: AbstractArray{T,N}
    data::LowerTriangularArray{T,M,ArrayType}
end

Base.getindex(L::LegendePolArray, I...) = getindex(L.data, I...)
Base.setindex!(L::LegendePolArray, x, I...) = setindex!(L.data, x, I...)
```

Broadcast and some other operations wouldn't work for it, but as far as I can see everything in SpectralTransforms would.

But yeah, that is a bit hacky as a solution to that problem.
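As a self-contained illustration of that wrapper idea (hypothetical `ForwardingArray`, standing in for the sketch above; it only demonstrates the forwarding pattern, not the N vs N-1 dimension bookkeeping):

```julia
# Toy forwarding wrapper: indexing and size are delegated to the inner array,
# so generic code relying on ndims, size, and indexing sees a plain AbstractArray.
struct ForwardingArray{T,N,A<:AbstractArray{T,N}} <: AbstractArray{T,N}
    data::A
end

Base.size(F::ForwardingArray) = size(F.data)
Base.getindex(F::ForwardingArray, I...) = getindex(F.data, I...)
Base.setindex!(F::ForwardingArray, x, I...) = setindex!(F.data, x, I...)

F = ForwardingArray(zeros(3, 3))
F[2, 2] = 1.0   # setindex! forwards to the inner matrix
```

As noted above, broadcasting would need extra work (a `BroadcastStyle`), but plain indexed loops like those in the transform code run unchanged.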
I implemented it like this now.
```diff
@@ -452,7 +448,7 @@ function spectral!( alms::LowerTriangularMatrix{Complex{NF}}, # output: spectral
     # LEGENDRE TRANSFORM in meridional direction
     # Recalculate or use precomputed Legendre polynomials Λ
     recompute_legendre && Legendre.unsafe_legendre!(Λw, Λ, lmax, mmax, Float64(cos_colat[j_north]))
-    Λj = recompute_legendre ? Λ : Λs[j_north]
+    Λj = recompute_legendre ? LowerTriangularMatrix(Λ) : Λs[j_north]
```
Same here.
```diff
@@ -597,50 +604,42 @@ end
 L2 = deepcopy(L1)

 L2 ./= 5
-@test L1/5 == L2
+@test (L1.data ./ 5) == L2.data
```
This shouldn't be necessary?
```diff
 L1 = adapt(ArrayType, randn(LowerTriangularArray{NF}, 10, 10, idims...))
 L2 = deepcopy(L1)

 L2 .^= 2
-@test L1.^2 == L2
+@test L1.data.^2 == L2.data
```
Same here and below
@maximilian-gelbrecht I'm making progress on the rewrite of the dynamical core in #525 too, to inform some of the decisions here. I think there's really only two things I've been adding/tweaking:

SpeedyWeather.jl/src/LowerTriangularMatrices/lower_triangular_matrix.jl, lines 282 to 285 in b8737f7

Something like … I think we should rename …

The spectral gradients and other parts of the dynamical core are somewhat inconsistent on whether 1-based or 0-based indexing is used (and there are places where it makes sense to mix). To make this clearer I'm suggesting these abstract types, so that one can neatly write

SpeedyWeather.jl/src/SpeedyTransforms/spectral_gradients.jl, lines 64 to 65 in b8737f7
Instead, I think I'd define

```julia
Base.size(L::LowerTriangularArray; as::Type{Matrix}) = (L.m, L.n, size(L)[2:end]...)
Base.size(L::LowerTriangularArray; as::Type{Vector}) = size(L.data)
Base.size(L::LowerTriangularArray) = size(L, as=Vector)
```

to let dispatch handle the cases? I'll need to extend this with …
I did try to do it like that initially, but this doesn't work with the dispatch rules apparently. You'll immediately get warnings that the function definitions are being overwritten.
@milankl I tried to look into the Clenshaw grid error again, but couldn't fix it. I can't see where I changed something that would even affect it, to be honest. Could you do a full code review and see if you can find a code change that could affect it?
Right, because multiple dispatch doesn't apply to keyword arguments. What about

```julia
julia> struct S end

julia> Base.size(s::S, as::Type{<:Matrix}) = "matrix"

julia> Base.size(s::S, as::Type{<:Vector}) = "vector"

julia> Base.size(s::S; as=Vector) = size(s, as)

julia> size(S())
"vector"

julia> size(S(), as=Matrix)
"matrix"

julia> size(S(), as=Vector)
"vector"
```

then? Make …
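Applied to a `LowerTriangularArray`-like type, the same pattern could look like this (a sketch; the `m`, `n`, `data` field names are assumed, and the toy type is not an `AbstractArray`):

```julia
# Positional methods dispatch on the Type argument; the keyword version
# only forwards to them, since keywords don't participate in dispatch.
struct LTA{T}
    m::Int
    n::Int
    data::Vector{T}   # flat storage of the lower triangle
end

Base.size(L::LTA, as::Type{<:Matrix}) = (L.m, L.n)
Base.size(L::LTA, as::Type{<:Vector}) = size(L.data)
Base.size(L::LTA; as=Vector) = size(L, as)

L = LTA(5, 5, zeros(15))   # 15 = 5*(5+1)/2 entries in the triangle
size(L)              # (15,) — flat size by default
size(L, as=Matrix)   # (5, 5) — the matrix view of the size
```

The default `as=Vector` keeps `size(L)` consistent with the flat `AbstractArray{T,N-1}` interpretation this PR introduces, while `as=Matrix` recovers the harmonic degree/order shape.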