
Fast integration changes for Vectorial Envelope and Optimizations #36

Merged 23 commits into master on Feb 21, 2022

Conversation

jbadger95
Contributor

@jbadger95 jbadger95 commented Oct 17, 2021

Design

Fast integration routines need to be updated to include additional vectorial envelope terms. While implementing these changes the following optimizations were also identified:

  • Integration of boundary terms is typically more expensive than integration of interior terms (when using fast integration); identifying the sparsity structure of the 3D shape functions before integration dramatically reduces the cost of boundary assembly.
  • Storing the Gram matrix in packed format precludes the use of more optimized BLAS3 routines for Cholesky factorization and back substitution. Converting the Gram matrix from packed to RFP format before factorization (and using the corresponding LAPACK and BLAS routines) significantly reduces the cost of inversion.
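To illustrate the packed-vs-full-storage point, here is a minimal Python sketch. It is only a conceptual toy: the real code uses LAPACK's RFP routines (such as ZTPTTF for the packed-to-RFP conversion and ZPFTRF for the RFP Cholesky factorization), whereas this version just contrasts packed triangular storage, which forces unblocked BLAS2-style kernels, with a full-format matrix that a blocked BLAS3 Cholesky can operate on:

```python
import numpy as np

def pack_lower(A):
    """Store the lower triangle of symmetric A in LAPACK 'L' packed
    (column-major) order: column j contributes A[j:, j]."""
    n = A.shape[0]
    return np.concatenate([A[j:, j] for j in range(n)])

def unpack_lower(ap, n):
    """Recover the full symmetric matrix from packed storage."""
    A = np.zeros((n, n))
    k = 0
    for j in range(n):
        m = n - j
        A[j:, j] = ap[k:k + m]
        k += m
    return A + np.tril(A, -1).T

n = 4
rng = np.random.default_rng(0)
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)        # SPD Gram-like matrix
ap = pack_lower(A)                 # packed: n*(n+1)/2 entries, BLAS2-only kernels
assert ap.size == n * (n + 1) // 2
Afull = unpack_lower(ap, n)        # full (or RFP) layout enables blocked BLAS3 Cholesky
L = np.linalg.cholesky(Afull)
assert np.allclose(L @ L.T, A)
```

RFP storage gets the best of both: it keeps the n*(n+1)/2 memory footprint of packed storage while still permitting blocked BLAS3 factorization and triangular solves.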

Finally, computation of enriched shape functions on prism and quad elements should be updated to use tensor product ordering. This simplifies the DoF maps in the prism routine (the prism map is trivial; however, when using the prism routine to assemble hexas, it's necessary to convert from the 2D-1D tensor ordering to the 1Dx1Dx1D tensor ordering).
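As a concrete illustration of the ordering conversion, here is a hypothetical Python sketch. The conventions are assumptions for illustration only (say, p+1 modes per direction and an i-fastest ordering on the quad face); the actual shape-function conventions in the code may differ:

```python
def hexa_index(i, j, k, p):
    """1Dx1Dx1D tensor ordering: i fastest, then j, then k."""
    return i + (p + 1) * (j + (p + 1) * k)

def prism_to_hexa_map(p):
    """DoF map from the 2D-1D ordering (quad-face index f fastest, line
    index k slowest) to the 1Dx1Dx1D hexa ordering. Hypothetical
    conventions: (p+1) modes per direction, face ordered i-fastest."""
    n1 = p + 1
    dof_map = []
    for k in range(n1):            # line (vertical) direction
        for f in range(n1 * n1):   # quad-face index
            i, j = f % n1, f // n1
            dof_map.append(hexa_index(i, j, k, p))
    return dof_map

# With tensor-product ordering on the face, the map reduces to the
# identity, which is exactly the simplification the PR is after:
assert prism_to_hexa_map(2) == list(range(27))
```

If the face used a non-tensor (e.g. hierarchical, family-grouped) ordering instead, the same loop would produce a nontrivial permutation, and that permutation is what the hexa assembly via the prism routine currently has to carry around.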

Implementation

The following routines need to be updated to include vectorial envelope terms:

  • elem_maxwell_fi_prism
  • elem_maxwell_gram_prism
  • elem_maxwell_fi_hexa
  • elem_maxwell_gram_hexa

The following routines need to be updated to include boundary and linear algebra optimizations:

  • elem_maxwell_fi_prism
  • elem_maxwell_fi_hexa
  • elem_residual_maxwell

The following enriched shape functions routines shoud be updated to use tensor product ordering (the hexa already uses tensor product ordering):

  • quad
    • shape3DHBrokenQuad
    • shape3DEBrokenQuad
    • shape3DVBrokenQuad
  • prism
    • shape2DHBrokenQuad
    • shape2DEBrokenQuad
    • shape2DVBrokenQuad

@jbadger95 jbadger95 added the enhancement New feature or request label Oct 17, 2021
@stefanhenneking
Contributor

@jbadger95 When you have the PR all cleaned up and ready for review, we'll also have to ask Brandon and Federico to take a look at the changes you are proposing within the shape functions before we merge.

@jbadger95
Contributor Author

@federicofuentes @brendankeith Hi, I made some changes to the enriched shape functions on Quads and Prisms, essentially mirroring what's in Hexa; I was hoping you guys could take a look. The change is for convenient ordering of the shape functions in sum-factorization routines; it's not really about performance (the expense of computing the shape functions is insignificant even in these sum-factorization elem routines).

Contributor

@stefanhenneking stefanhenneking left a comment


@jbadger95 great work! I only have minor suggestions, but here are a few general thoughts I had when looking over the changes:

  1. Fast integration implementations have become increasingly complex; there are so many different terms to track that entirely avoiding the introduction of new bugs appears almost impossible. In particular:
  • Adding additional terms to an existing formulation is quite challenging (as evidenced by the work you had to do for "simply" adding envelope terms to the bilinear form and Gram matrix);
  • A thorough code review by looking only at the changes in the fast integration routines is difficult; one basically needs to review the entire implementation (thousands of lines);
  • If you have ideas for general simplifications of implementing fast integration in the future, I am open to them (maybe modularize somehow?).
  • Did you check for possible memory leaks? Either use a debugging tool, or, as an easy hack, solve the same linear system over and over while keeping track of memory usage (e.g. via htop) -- if memory usage increases during the repeated solves, there's a leak.
  2. I haven't done verification runs yet, but will comment again once I have.
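The repeated-solve leak check can also be scripted rather than eyeballed in htop. A minimal sketch, assuming Linux (it reads /proc) and with a no-op placeholder standing in for the actual solve call:

```python
import os

def rss_kb(pid=None):
    """Resident set size in kB, read from /proc (Linux only)."""
    pid = os.getpid() if pid is None else pid
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    return 0

def check_for_leak(solve_once, n_iter=5):
    """Run the same solve repeatedly and sample memory after each pass.
    'solve_once' is a placeholder for one pass of the linear solve; a
    steadily increasing sequence of samples indicates a leak."""
    samples = []
    for _ in range(n_iter):
        solve_once()
        samples.append(rss_kb())
    return samples

samples = check_for_leak(lambda: None)   # no-op stand-in for the real solve
assert all(s > 0 for s in samples)
```

Comparing the first and last samples after a warm-up solve (allocator pools often grow on the first pass) gives a cheap regression check that could run in CI.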

@jbadger95
Contributor Author

I agree these routines are nasty. Comparing against the standard routines is a helpful check, but it can take a while to track down bugs in these routines (as evidenced by my glacial pace). I have some ideas for modularization; we might have to explore a bit to find a solution that has minimal impact on performance, though. I'll work on some simple examples when I get a chance.

@stefanhenneking
Contributor

Maybe some performance hit (a constant factor) is alright as long as the asymptotic speedup is still there.

@stefanhenneking
Contributor

> Maybe some performance hit (a constant factor) is alright as long as the asymptotic speedup is still there.

If we want to look at this, we should probably do it with a simpler implementation where testing is easier and faster. Maybe a linear UW Maxwell model problem or so.

@jbadger95
Contributor Author

The simplest way to modularize these would be to separate the computation of the Gram and stiffness matrices. We already have a routine for the Gram matrix; we could make one exclusively for the stiffness matrices (maybe even separate ones for the element interior and boundary) and then have a different subroutine that does the linear algebra. That would at least make the routines more manageable. I can outline what I was thinking for further modularization, but I doubt its performance will be great.
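In toy form, the split proposed above might look something like this. This is a hypothetical Python sketch with made-up matrices; the routine names and interfaces are illustrative, not the actual Fortran ones:

```python
import numpy as np

def compute_gram(n):
    """Gram matrix of the enriched test space (already a separate routine)."""
    return np.eye(n)                  # toy stand-in

def compute_stiffness_interior(n, m):
    """Interior (volume) contributions to the stiffness matrix."""
    return np.ones((n, m))            # toy stand-in

def compute_stiffness_boundary(n, m):
    """Boundary contributions, exploiting shape-function sparsity."""
    return 0.5 * np.ones((n, m))      # toy stand-in

def condense(G, B):
    """All linear algebra in one place: factor the Gram matrix and form
    the condensed DPG system B^T G^{-1} B via triangular solves."""
    L = np.linalg.cholesky(G)
    Y = np.linalg.solve(L, B)         # Y = L^{-1} B
    return Y.T @ Y                    # B^T G^{-1} B

n, m = 4, 3
B = compute_stiffness_interior(n, m) + compute_stiffness_boundary(n, m)
K = condense(compute_gram(n), B)
assert K.shape == (m, m)
```

Each stage would then be testable and reviewable on its own, which is most of the manageability gain, even before any deeper restructuring.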

@stefanhenneking
Contributor

This would help with modularization, but it wouldn't really simplify implementation and debugging in any significant way. I was trying to think further than that: somehow coming up with a more adaptable implementation where things like handling auxiliary matrices become easier.

@jbadger95
Contributor Author

> If we want to look at this, we should probably do it with a simpler implementation where testing is easier and faster. Maybe a linear UW Maxwell model problem or so.

Or the heat problem?

@stefanhenneking
Contributor

> If we want to look at this, we should probably do it with a simpler implementation where testing is easier and faster. Maybe a linear UW Maxwell model problem or so.
>
> Or the heat problem?

Possible, yes. But using the existing primal DPG Poisson model problem as a testing ground makes things a lot easier than having to run the LASER application for testing.

@stefanhenneking
Contributor

I think we can merge this. @jbadger95 do you agree?
There are only a few notable cases that we haven't tested:

  1. anisotropic material (birefringent fiber)
  2. envelope for the pump field (in principle, it should work the same as for the signal field)

In addition, it would be nice to have the PML parameters be automatically tuned depending on the user input for the envelope wavenumber. This could go into a separate PR.

@stefanhenneking
Contributor

@jbadger95 you mentioned recently fixing some minor issue in the FI routines for the envelope formulation? Did you make the change on a different branch? If so, would you like to add it here, or should we merge this branch into master and then make that change in a follow-up PR?

@jbadger95
Contributor Author

jbadger95 commented Feb 21, 2022

I made the changes on my bending branch; I'll quickly fix the errors on this one, though. I have some more extensive changes to generalize the PML and such, but I'll make those in a separate PR.

@stefanhenneking
Contributor

Ok, sounds good. Otherwise, I believe that once you have added the fix, this branch should be ready for merge?

@@ -103,7 +103,7 @@ subroutine elem_residual_maxwell(Mdle,Fld_flag, &
!
!..Gram matrix in packed format
!VTYPE, dimension(NrTest*(NrTest+1)/2) :: gramTest
VTYPE, allocatable :: gramP(:)
VTYPE, allocatable :: gramP(:), Grfp(:)
Contributor Author


@stefanhenneking I'll remove this RFP format stuff from the residual routine as well. This is one of those cases where it's a MatVec operation, so it's still only using BLAS2, and it's actually somewhat slower using RFP format.

@jbadger95
Contributor Author

Yeah, give me a bit to look over my changes on my other branch and make sure nothing else is crucial; then we should be good.

@jbadger95 jbadger95 merged commit 3454958 into Oden-EAG:master Feb 21, 2022