FEM Mass matrix element assembly kernel #330

Merged
merged 14 commits into from
Jul 21, 2023

Conversation

artv3
Member

@artv3 artv3 commented Jun 1, 2023

Kernel for element local mass matrix assembly
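
For readers unfamiliar with the term, an element-local mass matrix has entries M_ij = sum over quadrature points q of w_q * detJ_q * B_i(x_q) * B_j(x_q). The block below is only a minimal 1D sketch of that formula and is not the MASS3DEA kernel from this PR: the function name `element_mass_1d`, the linear basis, and the 2-point Gauss rule are all assumptions chosen to keep the example short.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Mat2 = std::array<std::array<double, 2>, 2>;

// Hypothetical helper (not from the PR): local mass matrix of one
// 1D linear element of length h, mapped from the reference interval [-1, 1].
Mat2 element_mass_1d(double h) {
  assert(h > 0.0);
  const double xi = 1.0 / std::sqrt(3.0);  // 2-point Gauss-Legendre abscissa
  const double pts[2] = {-xi, xi};
  const double wts[2] = {1.0, 1.0};
  const double detJ = 0.5 * h;             // Jacobian of the affine map

  Mat2 M{};                                // zero-initialized
  for (int q = 0; q < 2; ++q) {
    // Linear basis functions evaluated at quadrature point q.
    const double B[2] = {0.5 * (1.0 - pts[q]), 0.5 * (1.0 + pts[q])};
    for (int i = 0; i < 2; ++i) {
      for (int j = 0; j < 2; ++j) {
        M[i][j] += wts[q] * detJ * B[i] * B[j];
      }
    }
  }
  return M;
}
```

Under these assumptions the result equals the textbook matrix (h/6) * [[2, 1], [1, 2]], since the 2-point rule integrates the quadratic integrand exactly.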

@artv3
Member Author

artv3 commented Jul 5, 2023

@rhornung67 @MrBurmark could I get a review on this PR?

src/apps/MASS3DEA.hpp (outdated, resolved)
src/apps/MASS3DEA.hpp (outdated, resolved)
src/apps/MASS3DEA-OMP.cpp (outdated, resolved)
src/apps/MASS3DEA-Seq.cpp (outdated, resolved)
src/apps/MASS3DEA.hpp (outdated, resolved)
@artv3
Member Author

artv3 commented Jul 6, 2023

@rhornung67 I added some notes on the z-loops with an iteration space of [0,1) that the kernels have. Let me know if it makes sense; I can expand more.

@rhornung67 rhornung67 mentioned this pull request Jul 10, 2023
24 tasks
@MrBurmark
Member

MrBurmark commented Jul 11, 2023

My opinion on the "extra" length-1 loops is that if they are not necessary in some of the Base/Lambda implementations, it's fine to remove them. In this case, that seems to be how you would naturally code up the native variants. On the other hand, it's important for the RAJA implementation to be single-source and the same for all RAJA variants, because that is how we write RAJA code.
To expound a little further afield (this is not the case here): sometimes the sequential optimization is large enough that the sequential and parallel implementations are substantially different. In that case, it's probably best to have two kernels instead of significant differences between the Base and RAJA variants. An example is the basic INDEXLIST and INDEXLIST_3LOOP kernels, which are two ways to create the same list of indices. Splitting them into two kernels allows variant-to-variant comparisons that are apples to apples, and kernel-to-kernel comparisons that show the performance implications of each algorithm/implementation.
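
The point about length-1 loops can be illustrated with a small host-only sketch. This is not code from the PR: the names `fill_with_unit_z_loop` and `fill_without_z_loop`, the tile sizes `NX` and `NY`, and the loop body are hypothetical. The first function keeps a z loop over [0, 1), as a 3D thread layout would; the second drops it, as a natively written host variant might.

```cpp
#include <vector>

constexpr int NX = 4;  // hypothetical tile extent in x
constexpr int NY = 3;  // hypothetical tile extent in y

// Mirrors a 3D (x, y, z) layout where the z range is [0, 1).
void fill_with_unit_z_loop(std::vector<double>& out) {
  out.assign(NX * NY, 0.0);
  for (int z = 0; z < 1; ++z) {  // executes exactly once
    for (int y = 0; y < NY; ++y) {
      for (int x = 0; x < NX; ++x) {
        out[(z * NY + y) * NX + x] = 10.0 * y + x;
      }
    }
  }
}

// Same work with the length-1 z loop removed.
void fill_without_z_loop(std::vector<double>& out) {
  out.assign(NX * NY, 0.0);
  for (int y = 0; y < NY; ++y) {
    for (int x = 0; x < NX; ++x) {
      out[y * NX + x] = 10.0 * y + x;
    }
  }
}
```

Both functions write identical values, so removing the loop from a host variant changes only how the code reads, not what it computes.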
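
As a rough illustration of "two ways to create the same list of indices", the sketch below contrasts a one-pass sequential build with a three-loop (flag, exclusive scan, scatter) build. It is not the suite's INDEXLIST or INDEXLIST_3LOOP source: the function names and the selection condition `x[i] < 0.0` are assumptions made for the example.

```cpp
#include <cstddef>
#include <vector>

// One sequential pass: append each matching index as it is found.
std::vector<int> index_list_one_pass(const std::vector<double>& x) {
  std::vector<int> list;
  for (std::size_t i = 0; i < x.size(); ++i) {
    if (x[i] < 0.0) list.push_back(static_cast<int>(i));
  }
  return list;
}

// Three loops: flag, exclusive scan, scatter. Loops 1 and 3 have
// independent iterations, which makes this form easier to parallelize.
std::vector<int> index_list_three_loop(const std::vector<double>& x) {
  const std::size_t n = x.size();
  std::vector<int> offsets(n + 1, 0);

  // Loop 1: flag matching entries (offsets[n] stays 0).
  for (std::size_t i = 0; i < n; ++i) {
    offsets[i] = (x[i] < 0.0) ? 1 : 0;
  }

  // Loop 2: in-place exclusive scan; offsets[n] becomes the total count.
  int running = 0;
  for (std::size_t i = 0; i <= n; ++i) {
    const int flag = offsets[i];
    offsets[i] = running;
    running += flag;
  }

  // Loop 3: scatter each matching index to its scanned position.
  std::vector<int> list(static_cast<std::size_t>(offsets[n]));
  for (std::size_t i = 0; i < n; ++i) {
    if (offsets[i] != offsets[i + 1]) {
      list[static_cast<std::size_t>(offsets[i])] = static_cast<int>(i);
    }
  }
  return list;
}
```

The two functions return the same list, but their structure differs enough that timing them as separate kernels keeps each comparison like for like.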

@artv3
Member Author

artv3 commented Jul 18, 2023

TODO: Remove extra loop in host base variants with iteration range [0, 1)

@MrBurmark MrBurmark enabled auto-merge July 20, 2023 23:15
@MrBurmark MrBurmark merged commit ab03e6e into develop Jul 21, 2023
16 of 17 checks passed
@MrBurmark MrBurmark deleted the artv3/mass-ea branch July 21, 2023 03:07

3 participants