-
Notifications
You must be signed in to change notification settings - Fork 92
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Minor performance improvements for close!(::MixedDofHandler)
#637
Conversation
This patch tweaks the dof distribution for `MixedDofHandler` slightly: - `push!` cell dofs directly into `dh.cell_dofs` instead of first pushing to an element celldofs vector and then pushing to the global one. This removes one `append!` per cell, but does not affect the benchmark that much (dict lookup is still dominating). - Use the computed dict token for extraction in `get_or_create_dofs!` - Change keyword arguments in `get_or_create_dofs!` to positional. This does not matter for performance, but gives nicer profile stackframes. I used this benchmark code: ```julia using Ferrite using BenchmarkTools function dofhandler(::Type{DHT}, grid) where DHT dh = DHT(grid) add!(dh, :v, 2, Lagrange{2,RefCube,2}()) add!(dh, :s, 1, Lagrange{2,RefCube,1}()) close!(dh) return dh end grid = generate_grid(Quadrilateral, (1000, 1000)) @Btime dofhandler(MixedDofHandler, $grid) ``` with the following (disappointing) results: ``` 1.605 s (319 allocations: 640.42 MiB) # master 1.573 s (316 allocations: 639.34 MiB) # patch ``` For reference, `DofHandler` gives 1.3s, 405MiB for this benchmark.
1c5afd5
to
b4ef9f6
Compare
Codecov ReportPatch coverage:
📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more Additional details and impacted files@@ Coverage Diff @@
## master #637 +/- ##
==========================================
- Coverage 92.41% 92.41% -0.01%
==========================================
Files 29 29
Lines 4432 4429 -3
==========================================
- Hits 4096 4093 -3
Misses 336 336
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report in Codecov by Sentry. |
This patch removes the tracking of (internal) cell dofs when distributing dofs for MixedDofHandler. There is no need to track these since they are defined as dofs which are never shared with other elements. This almost closes the performance gap between the DofHandlers (benchmark code from #637): ``` 1.573 s (316 allocations: 639.34 MiB) # MixedDofHandler master 1.397 s (257 allocations: 543.50 MiB) # MixedDofHandler patch 1.267 s (234 allocations: 405.12 MiB) # DofHandler master/patch ```
#639) This patch removes the tracking of (internal) cell dofs when distributing dofs for MixedDofHandler. There is no need to track these since they are defined as dofs which are never shared with other elements. This almost closes the performance gap between the DofHandlers (benchmark code from #637): ``` 1.573 s (316 allocations: 639.34 MiB) # MixedDofHandler master 1.397 s (257 allocations: 543.50 MiB) # MixedDofHandler patch 1.267 s (234 allocations: 405.12 MiB) # DofHandler master/patch ```
This reduces memory used in dof distribution for MixedDofHandler by only storing the first dof that are added for every entity, instead of the range of dofs. This simply copies the logic from DofHandler, which also means that MixedDofHandler now also support multiple dofs per face in 2D, but not in 3D, just like DofHandler. This closes the performance gap between the DofHandlers (benchmark code from #637): ``` 1.397 s (257 allocations: 543.50 MiB) # MixedDofHandler master 1.220 s (249 allocations: 456.05 MiB) # MixedDofHandler patch 1.267 s (234 allocations: 405.12 MiB) # DofHandler master/patch ```
This reduces memory used in dof distribution for MixedDofHandler by only storing the first dof that are added for every entity, instead of the range of dofs. This simply copies the logic from DofHandler, which also means that MixedDofHandler now also support multiple dofs per face in 2D, but not in 3D, just like DofHandler. This closes the performance gap between the DofHandlers (benchmark code from #637): ``` 1.397 s (257 allocations: 543.50 MiB) # MixedDofHandler master 1.220 s (249 allocations: 456.05 MiB) # MixedDofHandler patch 1.267 s (234 allocations: 405.12 MiB) # DofHandler master/patch ```
This reduces memory used in dof distribution for MixedDofHandler by only storing the first dof that are added for every entity, instead of the range of dofs. This simply copies the logic from DofHandler, which also means that MixedDofHandler now also support multiple dofs per face in 2D, but not in 3D, just like DofHandler. This closes the performance gap between the DofHandlers (benchmark code from #637): ``` 1.397 s (257 allocations: 543.50 MiB) # MixedDofHandler master 1.220 s (249 allocations: 456.05 MiB) # MixedDofHandler patch 1.267 s (234 allocations: 405.12 MiB) # DofHandler master/patch ```
This reduces memory used in dof distribution for MixedDofHandler by only storing the first dof that are added for every entity, instead of the range of dofs. This simply copies the logic from DofHandler, which also means that MixedDofHandler now also support multiple dofs per face in 2D, but not in 3D, just like DofHandler. This closes the performance gap between the DofHandlers (benchmark code from #637): ``` 1.397 s (257 allocations: 543.50 MiB) # MixedDofHandler master 1.220 s (249 allocations: 456.05 MiB) # MixedDofHandler patch 1.267 s (234 allocations: 405.12 MiB) # DofHandler master/patch ```
This patch changes the storage for vertex dof tracking from dictionaries to vectors. This is possible since we know already from the start the number of vertices and their global numbers. Since dof distribution is almost 100% dict hashing this give a pretty big performance boost. Running the benchmark code from #637: ``` 1.267 s (234 allocations: 405.12 MiB) # DofHandler master 812.388 ms (130 allocations: 290.07 MiB) # DofHandler patch 1.220 s (249 allocations: 456.05 MiB) # MixedDofHandler master 838.601 ms (145 allocations: 341.00 MiB) # MixedDofHandler patch ``` Note that the same optimization can be done for edges/faces too if those were globally enumerated. Fortunately vertex dofs are the most common case so implementing this is not a high priority.
This patch changes the storage for vertex dof tracking from dictionaries to vectors. This is possible since we know already from the start the number of vertices and their global numbers. Since dof distribution is almost 100% dict hashing this give a pretty big performance boost. Running the benchmark code from #637: ``` 1.267 s (234 allocations: 405.12 MiB) # DofHandler master 812.388 ms (130 allocations: 290.07 MiB) # DofHandler patch 1.220 s (249 allocations: 456.05 MiB) # MixedDofHandler master 838.601 ms (145 allocations: 341.00 MiB) # MixedDofHandler patch ``` Note that the same optimization can be done for edges/faces too if those were globally enumerated. Fortunately vertex dofs are the most common case so implementing this is not a high priority.
This patch tweaks the dof distribution for
MixedDofHandler
slightly:push!
cell dofs directly intodh.cell_dofs
instead of first pushing to an element celldofs vector and then pushing to the global one. This removes oneappend!
per cell, but does not affect the benchmark that much (dict lookup is still dominating).get_or_create_dofs!
get_or_create_dofs!
to positional. This does not matter for performance, but gives nicer profile stackframes.I used this benchmark code:
with the following (disappointing) results:
For reference,
DofHandler
gives 1.3s, 405MiB for this benchmark.