
Improve performance of apply! by rewriting the part that zeros out rows #489

Merged
1 commit merged into master on Oct 26, 2022

Conversation

KristofferC
Collaborator

@KristofferC KristofferC commented Sep 27, 2022

As a consequence, the `strategy` kwarg is no longer useful.

Benchmark code:

grid = generate_grid(Hexahedron, (100, 100, 100));
dim = 3
ip = Lagrange{dim, RefCube, 1}()
qr = QuadratureRule{dim, RefCube}(2)
cellvalues = CellScalarValues(qr, ip);
dh = DofHandler(grid)
push!(dh, :u, 1)
close!(dh);
K = create_sparsity_pattern(dh)
ch = ConstraintHandler(dh);
∂Ω = union(
    getfaceset(grid, "left"),
    getfaceset(grid, "right"),
    getfaceset(grid, "top"),
    getfaceset(grid, "bottom"),
);
dbc = Dirichlet(:u, ∂Ω, (x, t) -> 0)
add!(ch, dbc);
close!(ch)
update!(ch, 0.0);
f = zeros(size(K, 1))

using BenchmarkTools
@btime apply_zero!(K, f, ch)
julia> @btime apply_zero!(K, f, ch) # master
  692.526 ms (16 allocations: 423.98 MiB)

julia> @btime apply_zero!(K, f, ch)
  687.897 ms (4 allocations: 544 bytes)

The time is approximately the same, but we no longer need to keep two copies of the stiffness matrix in memory at the same time.
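The core of the rewrite is that a CSC matrix stores its nonzeros column by column, so zeroing columns is cheap but zeroing rows requires scanning the stored entries and testing each row index against the constrained set (which is where the hashing discussed below comes in). A minimal sketch in plain Python over illustrative CSC arrays (`colptr`/`rowval`/`nzval` mirror Julia's SparseMatrixCSC fields; this is not the actual Ferrite implementation):

```python
def zero_rows_csc(colptr, rowval, nzval, constrained):
    """Zero out the given rows of a CSC matrix in place.

    colptr/rowval/nzval follow the standard compressed-sparse-column
    layout (0-based); `constrained` is the set of row indices to zero.
    """
    cset = set(constrained)  # O(1) membership test per stored entry
    for k in range(len(nzval)):
        if rowval[k] in cset:
            nzval[k] = 0.0
    return nzval

# 3x3 matrix [[1,2,0],[3,0,4],[0,5,6]] in CSC storage:
colptr = [0, 2, 4, 6]
rowval = [0, 1, 0, 2, 1, 2]
nzval  = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0]
zero_rows_csc(colptr, rowval, nzval, {1})
# both stored entries of row 1 (3.0 and 4.0) are now zero
```

The key property is that only the already-stored entries are touched, so the sparsity pattern never changes and no copy of the matrix is needed.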

@KristofferC
Collaborator Author

KristofferC commented Sep 27, 2022

Could someone verify the performance difference?

@koehlerson
Member

julia> @btime apply_zero!(K, f, ch) #master
  340.741 ms (16 allocations: 423.98 MiB)

julia> @btime apply_zero!(K, f, ch) #kc/row_zero
  408.249 ms (4 allocations: 544 bytes)

@KnutAM
Member

KnutAM commented Sep 28, 2022

  498.754 ms (16 allocations: 423.98 MiB) # master
   665.802 ms (4 allocations: 544 bytes) # kc/row_zero
145.603 ms (4 allocations: 544 bytes) # kam/both_zero

Perhaps you already discussed today why not to store the indices of nzval directly, but at least it seems faster :)

The downside is that creating the constraint handler is slower, even when giving the matrix to close! for added performance (it needs K internally to find the indices; doing it directly from dh is also possible, but was too much work :))

138.396 ms (120174 allocations: 77.98 MiB) # master
143.173 ms (120163 allocations: 80.44 MiB) # kc/row_zero
713.404 ms (160598 allocations: 107.49 MiB) # kam/both_zero (with `close!(ch, K)`)
1.824 s (160622 allocations: 2.96 GiB) # kam/both_zero (with `close!(ch)`)

(For close!(ch), about 1 GB of allocations and 0.3 s can be saved by filling with a singleton type instead of Float64.)
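The idea being benchmarked here is to pay the row-index lookup once, at constraint-handler setup, and cache the positions in nzval that lie on constrained rows; at solve time, zeroing is then a plain loop over cached indices with no hashing. A hedged Python sketch of that split (illustrative names, same toy CSC matrix as before, not Ferrite API):

```python
def cache_constrained_nz_indices(rowval, constrained):
    """Precompute which positions of nzval lie on constrained rows."""
    cset = set(constrained)
    return [k for k, r in enumerate(rowval) if r in cset]

def zero_cached(nzval, cached):
    """Zero the cached positions; no row lookups at solve time."""
    for k in cached:
        nzval[k] = 0.0

# Same 3x3 matrix [[1,2,0],[3,0,4],[0,5,6]] in CSC storage:
rowval = [0, 1, 0, 2, 1, 2]
nzval  = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0]
cached = cache_constrained_nz_indices(rowval, {1})  # positions 1 and 4
zero_cached(nzval, cached)
```

The trade-off matches the benchmarks above: apply_zero! gets much faster, but the cache must be built when the handler is closed, which is why close! wants to see K (or enough information from dh to reconstruct the pattern).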

@KristofferC
Collaborator Author

Perhaps you already discussed today why not to store the indices of nzval directly, but at least it seems faster :)

Then you are storing approximately the memory equivalent of half a stiffness matrix, which may be too much.

@KristofferC
Collaborator Author

KristofferC commented Sep 28, 2022

It is interesting that we get quite different benchmarks. On my desktop with a pretty beefy CPU (i9-12900K) I get:

julia> @btime apply_zero!(K, f, ch) # master
  407.463 ms (16 allocations: 423.98 MiB)

julia> @btime apply_zero!(K, f, ch) # PR
  369.013 ms (4 allocations: 544 bytes)

If I slap a Threads.@threads on it and run with 8 cores I get

julia> @btime apply_zero!(K, f, ch)
  168.278 ms (53 allocations: 4.89 KiB)
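The reason a threaded loop works here without locks is that in CSC storage each column owns a disjoint slice of nzval, so partitioning the scan by column means no two threads ever write the same entry. A rough Python sketch of that partitioning (illustrative only; Python threads will not show the Julia speedup, the point is the disjoint-slice structure):

```python
from concurrent.futures import ThreadPoolExecutor

def zero_rows_threaded(colptr, rowval, nzval, constrained, nthreads=2):
    """Zero constrained rows of a CSC matrix, scanning by column.

    Column c owns nzval[colptr[c]:colptr[c+1]], so threads working on
    disjoint column sets never touch the same entry: no locking needed.
    """
    cset = set(constrained)
    ncols = len(colptr) - 1

    def work(cols):
        for c in cols:
            for k in range(colptr[c], colptr[c + 1]):
                if rowval[k] in cset:
                    nzval[k] = 0.0

    chunks = [range(t, ncols, nthreads) for t in range(nthreads)]
    with ThreadPoolExecutor(max_workers=nthreads) as ex:
        list(ex.map(work, chunks))  # force completion of all chunks

# Same 3x3 matrix [[1,2,0],[3,0,4],[0,5,6]] in CSC storage:
colptr = [0, 2, 4, 6]
rowval = [0, 1, 0, 2, 1, 2]
nzval  = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0]
zero_rows_threaded(colptr, rowval, nzval, {1})
```

In the Julia version this corresponds to putting Threads.@threads on the outer column loop, as in the benchmark above.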

@KnutAM
Member

KnutAM commented Sep 28, 2022

Perhaps you already discussed today why not to store the indices of nzval directly, but at least it seems faster :)

Then you are storing approximately the memory equivalent of half a stiffness matrix, which may be too much.

Perhaps I misunderstand here, but wouldn't it be (on average) 2*num_prescribed_dofs*bandwidth (and only Ints)?
(But it was more for fun, I suppose this will always be dominated by linear solve and your solution is less invasive)
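A quick back-of-envelope check of the 2*num_prescribed_dofs*bandwidth estimate, with rough numbers loosely modeled on the 100^3 hexahedron benchmark above (the dof count and bandwidth here are illustrative approximations, not measured values):

```python
# Hypothetical counts for the cached-index approach:
n_prescribed = 4 * 101**2   # scalar dofs on the four constrained faces
bandwidth = 27              # ~27 nonzeros per row for trilinear hexahedra
n_cached = 2 * n_prescribed * bandwidth
mib = n_cached * 8 / 2**20  # assuming Int64 indices
print(f"{n_cached} cached indices, about {mib:.1f} MiB")
```

Under these assumptions the cache is on the order of tens of MiB, far smaller than the full matrix copy the old strategy needed, which supports the point that the cost is Ints proportional to the prescribed dofs rather than half a stiffness matrix.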

@KristofferC
Collaborator Author

Yeah, I'm wrong; you only need the mapping for the constrained dofs (of course). Caching this mapping in the constraint handler makes sense to me; it would just move part of the logic from the new function here to there. I guess the only drawback is that we typically do not pass the stiffness matrix to the constraint handler when it is created...

@koehlerson
Member

It is interesting we get quite different benchmarks. On my desktop with a pretty beefy CPU (i9-12900K) I get:

julia> @btime apply_zero!(K, f, ch) # master
  407.463 ms (16 allocations: 423.98 MiB)

julia> @btime apply_zero!(K, f, ch) # PR
  369.013 ms (4 allocations: 544 bytes)

If I slap a Threads.@threads on it and run with 8 cores I get

julia> @btime apply_zero!(K, f, ch)
  168.278 ms (53 allocations: 4.89 KiB)

I used my laptop with a 12th Gen i5-1240P, but I can retry with threads if you want.

@termi-official
Member

Probably a stupid question, but if the performance of applying (affine) constraints is of concern, why don't we apply the constraints on the element level as in deal.II? (see e.g. https://www.dealii.org/current/doxygen/deal.II/classAffineConstraints.html#a373fbdacd8c486e675b8d2bff8943192 and https://www.dealii.org/current/doxygen/deal.II/step_27.html#Creatingthesparsitypattern)
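For readers unfamiliar with the deal.II approach being referenced: instead of fixing up the assembled global matrix, the constraints are condensed into each element matrix before assembly. For homogeneous Dirichlet constraints (the apply_zero! case) that amounts to zeroing the constrained local rows and columns and placing 1 on the diagonal. A minimal Python sketch under those assumptions (illustrative names only, neither Ferrite nor deal.II API):

```python
def condense_local(Ke, fe, constrained_local):
    """Apply homogeneous Dirichlet constraints to one element system.

    Zeroing local rows AND columns keeps Ke symmetric; the unit
    diagonal makes the constrained equation read u_i = 0 directly,
    so the assembled global matrix never needs post-processing.
    """
    n = len(fe)
    for i in constrained_local:
        for j in range(n):
            Ke[i][j] = 0.0
            Ke[j][i] = 0.0
        Ke[i][i] = 1.0
        fe[i] = 0.0
    return Ke, fe

# Toy 2-dof element with local dof 0 constrained:
Ke = [[4.0, 1.0], [1.0, 4.0]]
fe = [1.0, 2.0]
condense_local(Ke, fe, [0])
```

Inhomogeneous and general affine constraints additionally need the eliminated columns folded into the right-hand side, which is the part AffineConstraints::distribute_local_to_global handles in deal.II.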

@lijas
Collaborator

lijas commented Sep 28, 2022

Probably a stupid question, but if the performance of applying (affine) constraints is of concern, why don't we apply the constraints on the element level as in deal.II?

Why not both :D

I think this PR can be merged, even though it seems to be a bit slower in @koehlerson's benchmarks.

@fredrikekre
Member

master:

@btime apply_zero!($K, $f, $ch)
794.046 ms (16 allocations: 423.98 MiB)

kc/row_zero:

@btime apply_zero!($K, $f, $ch)
643.115 ms (4 allocations: 544 bytes)

@KristofferC
Collaborator Author

I should probably get rid of the identity hashing thing; it is a bit unclear whether that is over-optimizing for this specific benchmark.

This patch improves the performance of apply! and apply_zero! by
rewriting the part that zeros out rows of the matrix. As a result, the
`strategy` keyword argument is obsolete and thus ignored.

Co-authored-by: Kristoffer Carlsson <kcarlsson89@gmail.com>
Co-authored-by: Fredrik Ekre <ekrefredrik@gmail.com>
@codecov-commenter

codecov-commenter commented Oct 26, 2022

Codecov Report

Base: 92.20% // Head: 92.28% // Increases project coverage by +0.07% 🎉

Coverage data is based on head (2c261c0) compared to base (f3057f3).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #489      +/-   ##
==========================================
+ Coverage   92.20%   92.28%   +0.07%     
==========================================
  Files          22       22              
  Lines        3783     3783              
==========================================
+ Hits         3488     3491       +3     
+ Misses        295      292       -3     
Impacted Files Coverage Δ
src/Dofs/ConstraintHandler.jl 95.08% <100.00%> (+0.35%) ⬆️


@fredrikekre fredrikekre merged commit 17f993a into master Oct 26, 2022
@fredrikekre fredrikekre deleted the kc/row_zero branch October 26, 2022 13:06
KnutAM added a commit that referenced this pull request Jul 31, 2024
ApplyStrategy was deprecated in #489, but kept for backwards compatibility (having no effect).
This PR removes it completely.