
Fast integration changes for Vectorial Envelope and Optimizations #36

Merged 23 commits into master on Feb 21, 2022

Conversation

jbadger95
Contributor

@jbadger95 jbadger95 commented Oct 17, 2021

Design

Fast integration routines need to be updated to include additional vectorial envelope terms. While implementing these changes the following optimizations were also identified:

  • Integration of boundary terms is typically more expensive than integration of interior terms (when using fast integration); identifying the sparsity structure of the 3D shape functions before integration dramatically reduces the cost of boundary assembly.
  • Storing the Gram matrix in packed format precludes the use of more optimized BLAS3 routines for Cholesky factorization and back substitution. Converting the Gram matrix from packed to RFP format before factorization (and using the corresponding LAPACK and BLAS routines) significantly reduces the cost of inversion.
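To illustrate the packed-vs-full-storage point, here is a minimal Python sketch. It is only a conceptual toy: the real code uses LAPACK's RFP routines (such as ZTPTTF for the packed-to-RFP conversion and ZPFTRF for the RFP Cholesky factorization), whereas this version just contrasts packed triangular storage, which forces unblocked BLAS2-style kernels, with a full-format matrix that a blocked BLAS3 Cholesky can operate on:

```python
import numpy as np

def pack_lower(A):
    """Store the lower triangle of symmetric A in LAPACK 'L' packed
    (column-major) order: column j contributes A[j:, j]."""
    n = A.shape[0]
    return np.concatenate([A[j:, j] for j in range(n)])

def unpack_lower(ap, n):
    """Recover the full symmetric matrix from packed storage."""
    A = np.zeros((n, n))
    k = 0
    for j in range(n):
        m = n - j
        A[j:, j] = ap[k:k + m]
        k += m
    return A + np.tril(A, -1).T

n = 4
rng = np.random.default_rng(0)
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)        # SPD Gram-like matrix
ap = pack_lower(A)                 # packed: n*(n+1)/2 entries, BLAS2-only kernels
assert ap.size == n * (n + 1) // 2
Afull = unpack_lower(ap, n)        # full (or RFP) layout enables blocked BLAS3 Cholesky
L = np.linalg.cholesky(Afull)
assert np.allclose(L @ L.T, A)
```

RFP storage gets the best of both: it keeps the n*(n+1)/2 memory footprint of packed storage while still permitting blocked BLAS3 factorization and triangular solves.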

Finally, computation of enriched shape functions on prism and quad elements should be updated to use tensor product ordering. This simplifies the DoF maps in the prism routine (the prism map is trivial; however, when using the prism routine to assemble hexas, it's necessary to convert from the 2D-1D tensor ordering to the 1Dx1Dx1D tensor ordering).
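As a concrete illustration of the ordering conversion, here is a hypothetical Python sketch. The conventions are assumptions for illustration only (say, p+1 modes per direction and an i-fastest ordering on the quad face); the actual shape-function conventions in the code may differ:

```python
def hexa_index(i, j, k, p):
    """1Dx1Dx1D tensor ordering: i fastest, then j, then k."""
    return i + (p + 1) * (j + (p + 1) * k)

def prism_to_hexa_map(p):
    """DoF map from the 2D-1D ordering (quad-face index f fastest, line
    index k slowest) to the 1Dx1Dx1D hexa ordering. Hypothetical
    conventions: (p+1) modes per direction, face ordered i-fastest."""
    n1 = p + 1
    dof_map = []
    for k in range(n1):            # line (vertical) direction
        for f in range(n1 * n1):   # quad-face index
            i, j = f % n1, f // n1
            dof_map.append(hexa_index(i, j, k, p))
    return dof_map

# With tensor-product ordering on the face, the map reduces to the
# identity, which is exactly the simplification the PR is after:
assert prism_to_hexa_map(2) == list(range(27))
```

If the face used a non-tensor (e.g. hierarchical, family-grouped) ordering instead, the same loop would produce a nontrivial permutation, and that permutation is what the hexa assembly via the prism routine currently has to carry around.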

Implementation

The following routines need to be updated to include vectorial envelope terms:

  • elem_maxwell_fi_prism
  • elem_maxwell_gram_prism
  • elem_maxwell_fi_hexa
  • elem_maxwell_gram_hexa

The following routines need to be updated to include boundary and linear algebra optimizations:

  • elem_maxwell_fi_prism
  • elem_maxwell_fi_hexa
  • elem_residual_maxwell

The following enriched shape functions routines shoud be updated to use tensor product ordering (the hexa already uses tensor product ordering):

  • quad
    • shape3DHBrokenQuad
    • shape3DEBrokenQuad
    • shape3DVBrokenQuad
  • prism
    • shape2DHBrokenQuad
    • shape2DEBrokenQuad
    • shape2DVBrokenQuad

@jbadger95 jbadger95 added the enhancement New feature or request label Oct 17, 2021
@stefanhenneking
Contributor

@jbadger95 When you have the PR all cleaned up and ready for review, we'll also have to ask Brandon and Federico to take a look at the changes you are proposing within the shape functions before we merge.

@jbadger95
Contributor Author

@federicofuentes @brendankeith Hi, I made some changes to the enriched shape functions on Quads and Prisms, essentially mirroring what's in Hexa; I was hoping you guys could take a look. The change is for convenient ordering of the shape functions in sum-factorization routines; it's not really about performance (the expense of computing the shape functions is insignificant even in these sum-factorization elem routines).

Contributor

@stefanhenneking stefanhenneking left a comment


@jbadger95 great work! I only have minor suggestions, but here are a few general thoughts I had when looking over the changes:

  1. Fast integration implementations have become increasingly complex; there are so many different terms to track that entirely avoiding the introduction of new bugs appears almost impossible. In particular:
  • Adding additional terms to an existing formulation is quite challenging (as evidenced by the work you had to do for "simply" adding envelope terms to the bilinear form and Gram matrix);
  • A thorough code review by looking only at the changes in the fast integration routines is difficult; one basically needs to review the entire implementation (thousands of lines);
  • If you have ideas for general simplifications of implementing fast integration in the future, I am open to them (maybe modularize somehow?).
  • Did you check for possible memory leaks? Either use a debugging tool, or, as an easy hack, solve the same linear system over and over while keeping track of memory usage (e.g. via htop) -- if memory usage increases during the repeated solves, there's a leak.
  2. I haven't done verification runs yet, but will comment again once I have.
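The repeated-solve leak check can also be scripted rather than eyeballed in htop. A minimal sketch, assuming Linux (it reads /proc) and with a no-op placeholder standing in for the actual solve call:

```python
import os

def rss_kb(pid=None):
    """Resident set size in kB, read from /proc (Linux only)."""
    pid = os.getpid() if pid is None else pid
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    return 0

def check_for_leak(solve_once, n_iter=5):
    """Run the same solve repeatedly and sample memory after each pass.
    'solve_once' is a placeholder for one pass of the linear solve; a
    steadily increasing sequence of samples indicates a leak."""
    samples = []
    for _ in range(n_iter):
        solve_once()
        samples.append(rss_kb())
    return samples

samples = check_for_leak(lambda: None)   # no-op stand-in for the real solve
assert all(s > 0 for s in samples)
```

Comparing the first and last samples after a warm-up solve (allocator pools often grow on the first pass) gives a cheap regression check that could run in CI.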

@jbadger95
Contributor Author

I agree these routines are nasty. Comparing against the standard routines is a helpful check, but it can take a while to track down bugs in these routines (as evidenced by my glacial pace). I have some ideas for modularization; we might have to explore a bit to find a solution that has minimal impact on performance, though. I'll work on some simple examples when I get a chance.

@stefanhenneking
Contributor

Maybe some performance hit (a constant factor) is alright as long as the asymptotic speedup is still there.

@stefanhenneking
Contributor

> Maybe some performance hit (a constant factor) is alright as long as the asymptotic speedup is still there.

If we want to look at this, we should probably do it with a simpler implementation where testing is easier and faster. Maybe a linear UW Maxwell model problem or so.

@jbadger95
Contributor Author

The simplest way to modularize these would be to separate the computation of the Gram and stiffness matrices. We already have a routine for the Gram matrix; we could make one exclusively for the stiffness matrices (maybe even separate ones for the element interior and boundary) and then have a different subroutine that does the linear algebra. That would at least make the routines more manageable. I can outline what I was thinking for further modularization, but I doubt its performance will be great.
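In toy form, the split proposed above might look something like this. This is a hypothetical Python sketch with made-up matrices; the routine names and interfaces are illustrative, not the actual Fortran ones:

```python
import numpy as np

def compute_gram(n):
    """Gram matrix of the enriched test space (already a separate routine)."""
    return np.eye(n)                  # toy stand-in

def compute_stiffness_interior(n, m):
    """Interior (volume) contributions to the stiffness matrix."""
    return np.ones((n, m))            # toy stand-in

def compute_stiffness_boundary(n, m):
    """Boundary contributions, exploiting shape-function sparsity."""
    return 0.5 * np.ones((n, m))      # toy stand-in

def condense(G, B):
    """All linear algebra in one place: factor the Gram matrix and form
    the condensed DPG system B^T G^{-1} B via triangular solves."""
    L = np.linalg.cholesky(G)
    Y = np.linalg.solve(L, B)         # Y = L^{-1} B
    return Y.T @ Y                    # B^T G^{-1} B

n, m = 4, 3
B = compute_stiffness_interior(n, m) + compute_stiffness_boundary(n, m)
K = condense(compute_gram(n), B)
assert K.shape == (m, m)
```

Each stage would then be testable and reviewable on its own, which is most of the manageability gain, even before any deeper restructuring.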

@stefanhenneking
Contributor

This would help with modularization, but it wouldn't really simplify implementation and debugging in any significant way. I was trying to think further than that: somehow coming up with a more adaptable implementation where things like handling auxiliary matrices become easier.

@jbadger95
Contributor Author

> If we want to look at this, we should probably do it with a simpler implementation where testing is easier and faster. Maybe a linear UW Maxwell model problem or so.

Or the heat problem?

@stefanhenneking
Contributor

> If we want to look at this, we should probably do it with a simpler implementation where testing is easier and faster. Maybe a linear UW Maxwell model problem or so.
>
> Or the heat problem?

Possible, yes. But using the existing primal DPG Poisson model problem as a testing ground makes things a lot easier than having to run the LASER application for testing.

@stefanhenneking
Contributor

I think we can merge this. @jbadger95 do you agree?
There are only a few notable cases that we haven't tested:

  1. anisotropic material (birefringent fiber)
  2. envelope for the pump field (in principle, it should work the same as for the signal field)

In addition, it would be nice to have the PML parameters be automatically tuned depending on the user input for the envelope wavenumber. This could go into a separate PR.

@stefanhenneking
Contributor

@jbadger95 you mentioned recently fixing some minor issue in the FI routines for the envelope formulation? Did you make the change on a different branch? If so, would you like to add it here, or should we merge this branch into master and then make that change in a follow-up PR?

@jbadger95
Contributor Author

jbadger95 commented Feb 21, 2022

I made the changes on my bending branch; I'll quickly fix the errors on this one, though. I have some more extensive changes to generalize the PML and such, but I'll make those in a separate PR.

@stefanhenneking
Contributor

Ok, sounds good. Otherwise, I believe that once you have added the fix, this branch should be ready for merge?

@@ -103,7 +103,7 @@ subroutine elem_residual_maxwell(Mdle,Fld_flag, &
!
!..Gram matrix in packed format
!VTYPE, dimension(NrTest*(NrTest+1)/2) :: gramTest
VTYPE, allocatable :: gramP(:)
VTYPE, allocatable :: gramP(:), Grfp(:)
Contributor Author


@stefanhenneking I'll remove this RFP format stuff from the residual routine as well. This is one of those cases where it's a MatVec operation, so it's still only using BLAS2, and it's actually somewhat slower using RFP format.

@jbadger95
Contributor Author

Yeah, give me a bit to look over my changes on my other branch and make sure nothing else is crucial; then we should be good.

@jbadger95 jbadger95 merged commit 3454958 into Oden-EAG:master Feb 21, 2022