
Conversation

pjaap (Member) commented on Mar 4, 2025

This speeds up the periodic coupling by precomputing suitable search areas on the target grid boundary region: for each source face, compute its bounding box, map it to the opposite side via the coordinate transformation, and check it for intersections with the bounding boxes of the faces there. Only faces with intersecting boxes are considered as coupling candidates.
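
For illustration, here is a minimal sketch of the bounding-box pre-filtering idea. The helper names (face_bbox, bbox_intersect, candidate_faces) and the data layout are assumptions for this sketch, not the actual PR code:

# sketch only: helper names and data layout are hypothetical
face_bbox(coords) = (vec(minimum(coords, dims = 2)), vec(maximum(coords, dims = 2)))  # box of one face from its dim × nvertices coordinates

# do two axis-aligned boxes intersect (up to a small tolerance)?
function bbox_intersect((min1, max1), (min2, max2); tol = 1.0e-12)
    return all(min1 .<= max2 .+ tol) && all(min2 .<= max1 .+ tol)
end

# precompute the boxes of all target faces once; then, for each source face,
# map its box through give_opposite! and keep only the target faces whose boxes intersect it
function candidate_faces(source_coords, target_bboxes, give_opposite!)
    bmin, bmax = face_bbox(source_coords)
    ymin, ymax = similar(bmin), similar(bmax)
    give_opposite!(ymin, bmin)
    give_opposite!(ymax, bmax)
    mapped = (min.(ymin, ymax), max.(ymin, ymax))   # the mapping may flip coordinates
    return [j for (j, box) in enumerate(target_bboxes) if bbox_intersect(mapped, box)]
end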

A not-so-trivial benchmark:

using ExtendableGrids, ExtendableFEMBase  # for simplexgrid, FESpace, H1P2

xgrid = simplexgrid(0:0.05:1.0, 0:0.05:1.0, 0:0.05:1.0)
FES = FESpace{H1P2{2, 3}}(xgrid)

# maps a point x on the source boundary region to its periodic counterpart y on the opposite side
function give_opposite!(y, x)
    y .= x
    y[1] = 1 - x[1]
    return nothing
end

With the new heuristic search,
@time A = get_periodic_coupling_matrix(FES, xgrid, 4, 2, give_opposite!, sparsity_tol = 1.0e-8, heuristic_search = true)
takes about 31 seconds,

while the current method
@time A = get_periodic_coupling_matrix(FES, xgrid, 4, 2, give_opposite!, sparsity_tol = 1.0e-8, heuristic_search = false)
takes about 5 minutes!

The heuristic_search switch should probably be removed once this improvement is accepted.

Remark: the allocations are crazy (independent of the switch): 31.423613 seconds (36.58 M allocations: 31.542 GiB, 5.39% gc time). We should look into this.

pjaap (Member, Author) commented on Mar 4, 2025

@chmerdon, what do you think? It seems to cause no problems so far, so I would remove the heuristic_search entry and merge.

chmerdon (Member) commented on Mar 5, 2025

I moved the wrapping of the point evaluator outside the loop, which improved the situation further:
14.768451 seconds (32.95 M allocations: 21.125 GiB, 6.16% gc time, 5.17% compilation time)
There are still a lot of allocations in that main loop that we should try to bring down...
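
As a generic, self-contained illustration of that hoisting pattern (the Evaluator type and functions below are hypothetical stand-ins for the point evaluator, not the actual code):

# hypothetical stand-in for the point evaluator: it carries a work buffer
struct Evaluator
    buffer::Vector{Float64}
end
Evaluator(n::Int) = Evaluator(zeros(n))

evaluate!(result, E::Evaluator, x) = (E.buffer .= 2 .* x; result .= E.buffer; result)

# before: the evaluator is rebuilt (and allocates) in every loop iteration
function eval_naive(points)
    result = zeros(length(points[1]))
    for x in points
        E = Evaluator(length(x))
        evaluate!(result, E, x)
    end
    return result
end

# after: construct the evaluator once outside the loop and reuse it
function eval_hoisted(points)
    result = zeros(length(points[1]))
    E = Evaluator(length(points[1]))
    for x in points
        evaluate!(result, E, x)
    end
    return result
end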

pjaap (Member, Author) commented on Mar 5, 2025

Nice. We should still tackle the allocation problem.
I did a deep dive into this, and most allocations happen in
edgemoments::Array{T, 2} = zeros(T, nmoments, nitems)
(https://github.com/WIAS-PDELib/ExtendableFEMBase.jl/blob/master/src/interpolations.jl#L249)

chmerdon (Member) commented on Mar 5, 2025

Yes; so far, interpolate was expected to be applied to all elements at once, whereas here it is called in a loop to interpolate on each single boundary face. Maybe we can reduce the number of interpolate calls by treating several boundary faces at once that have non-overlapping supports on the other side? Or we need to find a way to rewrite the interpolate code such that variables like edgemoments are reused...
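
As a sketch of the second option, with a caller-provided work matrix that is reused instead of reallocated (function and variable names here are hypothetical, not the actual interpolate code):

# hypothetical sketch: the caller provides the work matrix instead of a fresh zeros(T, nmoments, nitems) per call
function compute_edgemoments!(edgemoments::Matrix{T}, face) where {T}
    fill!(edgemoments, zero(T))
    # ... fill edgemoments for this boundary face ...
    return edgemoments
end

# allocate once, reuse for every boundary face
nmoments, nitems = 3, 4
edgemoments = zeros(Float64, nmoments, nitems)
for face in 1:1000
    compute_edgemoments!(edgemoments, face)
end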

chmerdon (Member) commented on Mar 6, 2025

Got rid of some more minor allocations, which also brought a little speedup... I am now at
11.737367 seconds (16.93 M allocations: 20.625 GiB, 7.81% gc time)

pjaap (Member, Author) commented on Mar 6, 2025

Great. I removed the heuristic_search switch, since we agree that anything other than true is not suitable.

pjaap force-pushed the feature/speed-up-periodic-coupling branch from 7190faa to 5cab4f3 on March 6, 2025 08:08
pjaap force-pushed the feature/speed-up-periodic-coupling branch from 5cab4f3 to d357811 on March 6, 2025 08:09
chmerdon (Member) commented on Mar 6, 2025

I accidentally ran another formatter before I remembered to run the pre-commit hook; maybe that caused it. At least I don't see a need for these commas and white spaces myself ^^

pjaap (Member, Author) commented on Mar 6, 2025

There shall be no formatter other than Runic 😆

pjaap force-pushed the feature/speed-up-periodic-coupling branch from 4de4064 to 01d8d29 on March 11, 2025 09:46
pjaap merged commit 601a7ee into master on March 11, 2025 (8 checks passed)
pjaap deleted the feature/speed-up-periodic-coupling branch on March 11, 2025 10:11