Matrixfree evaluation with `FE_RaviartThomasNodal` #13591

NiklasWik · 2022-04-05T14:04:51Z

This PR adds support for matrixfree evaluation of FE_RaviartThomasNodal (#9545).

However, it is currently missing the Piola transform, which you can see traces of in the 2 tests I created. The Piola transform is a planned follow up for the coming weeks.

In shape_info.templates.h there's a function which returns the lexicographic numbering of the nodes that don't really fit in. But with #9544 in mind, I felt that this was an ok temporary solution without making any major changes.

I'm guessing you'll have some opinions regarding the implementation, especially FEFaceEvaluationImplRaviartThomas and interpolate_generic_raviart_thomas(...) as I feel that they are a bit clumsy ..

marcfehling · 2022-04-05T19:07:36Z

Some simplex tests failed. Could you have a look?

The following tests FAILED:
	 16 - simplex/data_out_write_vtk_01.debug (Failed)
Errors while running CTest
	 93 - simplex/step-67.mpirun=1.debug (Failed)
	 94 - simplex/step-67.mpirun=2.debug (Failed)

include/deal.II/matrix_free/matrix_free.h

peterrum

First sets of comment. I'll take a look at include/deal.II/matrix_free/evaluation_kernels.h tomorrow. Very nice till now!

peterrum · 2022-04-06T20:24:25Z

include/deal.II/matrix_free/fe_evaluation.h

-                 this->data->data.front().fe_degree &&
+                 this->data->data.back().fe_degree &&


Does this make a difference?

I know this is a bit of an ugly fix... I wrote it so that data.front().fe_degree = degree + 1 and back() = degree. One could rewrite it so that it is the same, but this also means a change of +-1 in a few places in include/deal.II/matrix_free/evaluation_kernels.h. Do you think I should make this change? I guess I could also change the assert depending on if it is a RT element or not.

I would prefer if we could avoid make a change here. When you say this would mean to change some +-1 in some places, is this only in your new code or elsewhere, too? I think the fewer changes we make to existing code the better it is from a design perspective.

Only in the new code, so I could make these changes. A sort of related thing which me and @nfehn discussed a bit before is that FE_RaviartThomasNodal(k) creates an element of degree k in tangent direction and k+1 in normal direction, but fe.degree returns k+1. Why is this beneficial? This has meant that the code I've written so far in Exadg, I create RT elements with FE_RaviartThomasNodal(k - 1) in order to be able to use the same quadrature.

I think the motivation for FiniteElement::degree is to give the highest degree in any coordinate direction, so that a quadrature formula with FE::degree+1 ensures exact integration. I don't recall the precise convention, but I remember having discussed this with Guido Kanschat at some point (>10 years ago).

Thanks for the explanation. I guess we will a couple of problems with back/front in the future.

include/deal.II/matrix_free/shape_info.h

peterrum · 2022-04-06T20:29:23Z

tests/matrix_free/matrix_vector_rt_face_common.h

+    deallog.pop();
+    deallog.push("3d");
+    test<3, 2>();
+    // test<3, 3>(); Takes way too long


How much cells/dofs are we talking about?

8 cells and 756 dofs, so not that much. But the debug version takes 400 seconds which I'm guessing is because it is so slow to assemble the matrix. Haven't looked into it though.

Would a small mesh with 2 cells or so help?

Probably! But how do I create a mesh with only 2 cells..? I know you can refine different cells but not how I would refine a cell into 2 instead of 8 for dim=3

https://www.dealii.org/developer/doxygen/deal.II/namespaceGridGenerator.html#ac76417d7404b75cf53c732f456e6e971 with [2,1,1] subdivisions.

It still takes 200+ seconds for debug+release with 2 cells, so I'll leave this out for now, but I can mention this in the issue with todos. Maybe one could do something different than comparing with FEValues?

That is strange. Let's keep the 3D test case for now commented. But maybe you could investigate later what takes so long in the case of FEFaceValues?

Maybe one could do something different than comparing with FEValues?

Alternatively, one can solve a PDE and check the convergence rate.

tests/matrix_free/matrix_vector_rt_face_common.h

include/deal.II/matrix_free/tensor_product_kernels.h

peterrum · 2022-04-06T21:14:54Z

include/deal.II/matrix_free/tensor_product_kernels.h

+    constexpr int n_blocks2 =
+      (dim - direction - 1 == 0) ?
+        1 :
+        ((direction == normal_dir) ?
+           Utilities::pow((n_rows - 1),
+                          (direction >= dim) ? 0 : dim - direction - 1) :
+           (((direction < normal_dir) ? (n_rows + 1) : n_rows) *


I remember discussing this with @jwitte08 two years ago :)

I am wondering if we need two implementations of this function. Couldn't we extract the common parts?

Yes, this is also mentioned by @kronbichler in #9545. I didn't want to do any major changes and potentially brake anything, but this should be done at some point.

OK. Let's postpone. But we should address this soon!

include/deal.II/matrix_free/tensor_product_kernels.h

peterrum · 2022-04-06T21:20:40Z

include/deal.II/matrix_free/tensor_product_kernels.h

+                      if (normal_direction == 0)
+                        {
+                          if (contract_onto_face)
+                            out += n_rows - 1;
+                          else
+                            in += n_rows - 1;
+                        }
+                      if (normal_direction == 1)
+                        {
+                          if (contract_onto_face)
+                            out += n_rows - 2;
+                          else
+                            in += n_rows - 2;
+                        }
+                      if (normal_direction == 2)
+                        {
+                          if (contract_onto_face)
+                            out += n_rows;
+                          else
+                            in += n_rows;
+                        }


Same here. Most of the code is identical and only these shifts differ.

kronbichler

This is great code, thank you @NiklasWik ! I have a few comments and suggestions as listed below. One thing that strikes me after reading the code is that I believe most of the code could also be used for Nedelec elements (once we implement an FE_NedelecNodal as suggested in #9545) with only minor adjustments, so this is a really nice structure.

include/deal.II/matrix_free/evaluation_kernels.h

include/deal.II/matrix_free/shape_info.templates.h

include/deal.II/matrix_free/tensor_product_kernels.h

peterrum · 2022-04-08T06:21:16Z

include/deal.II/matrix_free/fe_evaluation.h

-                 this->data->data.front().fe_degree &&
+                 this->data->data.back().fe_degree &&


Thanks for the explanation. I guess we will a couple of problems with back/front in the future.

peterrum · 2022-04-08T06:23:22Z

tests/matrix_free/matrix_vector_rt_face_common.h

+    deallog.pop();
+    deallog.push("3d");
+    test<3, 2>();
+    // test<3, 3>(); Takes way too long


Would a small mesh with 2 cells or so help?

tests/matrix_free/matrix_vector_rt_face_common.h

peterrum · 2022-04-08T06:24:59Z

tests/matrix_free/matrix_vector_rt_face_common.h

+                                EvaluationFlags::values |
+                                  EvaluationFlags::gradients);


Let's do it as you proposed. Maybe you could create an issue with todos?

peterrum · 2022-04-08T06:29:43Z

include/deal.II/matrix_free/shape_info.templates.h

+          lex_normal.push_back(0);
+          for (unsigned int i = dofs_per_face_normal * 2 * dim;
+               i < dofs_per_face_normal * 2 * dim + fe.degree - 1;
+               i++)
+            lex_normal.push_back(i);
+          lex_normal.push_back(dofs_per_face_normal);


We create temporal FEs at multiple place, e.g.,

dealii/include/deal.II/matrix_free/shape_info.templates.h

Lines 833 to 834 in efbc6da

const auto fe_1d = create_fe<1>(fe);

const auto fe_2d = create_fe<2>(fe);

Is the setup of RT too expensive?

Or do you mean for a more easily read code?

Exactly. We normally break down in MF the element into 1D representation; here we have a 2D representation.

peterrum · 2022-04-08T06:32:23Z

include/deal.II/matrix_free/tensor_product_kernels.h

+    constexpr int n_blocks2 =
+      (dim - direction - 1 == 0) ?
+        1 :
+        ((direction == normal_dir) ?
+           Utilities::pow((n_rows - 1),
+                          (direction >= dim) ? 0 : dim - direction - 1) :
+           (((direction < normal_dir) ? (n_rows + 1) : n_rows) *


OK. Let's postpone. But we should address this soon!

peterrum · 2022-04-08T06:42:41Z

include/deal.II/matrix_free/evaluation_kernels.h

+      const T0                               eval0,
+      const T1                               eval1,
+      const T2                               eval2,


Pass by reference? What are these? Flags or evaluation kernels?

Evaluation kernels (EvaluatorTensorProduct). I could rename them to something different. The variable names are the same as in previous code, so I kept them for consistency.

include/deal.II/matrix_free/evaluation_kernels.h

peterrum

Looks overall very good!

Reviewing of evaluation_kernels.h is a bit hard, since a lot of code seems to be copy and paste. Is there a possibility to get rid of evaluate_tensor_product_per_component() and similar functions and introduce helper functions that work on a "list" of EvaluatorTensorProduct? The calling function would pass in the right objects depending of it is istropic or anisotropic? That would reduce the number of lines added significantly! Not sure if that is possible!?

If it is possible, my preference would be to create the helper function first, merge that into master and use that here in this PR.

include/deal.II/matrix_free/evaluation_kernels.h

peterrum · 2022-04-08T08:00:34Z

include/deal.II/matrix_free/evaluation_kernels.h

+    Eval_normal eval_normal(univariate_shape_data[0]->shape_values,
+                            univariate_shape_data[0]->shape_gradients,
+                            univariate_shape_data[0]->shape_hessians,
+                            univariate_shape_data[0]->fe_degree + 1,
+                            univariate_shape_data[0]->n_q_points_1d);


One can create a helper function like create_evaluator_tensor_product()

Yes, I've done that for the face evaluation, where you have to pick out the right shape functions. (Though I don't think hanging nodes work at this point). But I think doing it here will result in more lines than to just creating these two.

peterrum · 2022-04-08T08:01:38Z

include/deal.II/matrix_free/evaluation_kernels.h

-                 shape_info.data.front().fe_degree ||
-               fe_degree == -1,
-             ExcInternalError());
+      if (symmetric_evaluate) // TODO


Could you go through the TODO and add a comment what the problem there is (and maybe also note these an issue).

I've already remove all todos as Martin told me to, since one would have to go through everything again anyway to check what works and what doesn't.

peterrum · 2022-04-08T08:10:17Z

include/deal.II/matrix_free/evaluation_kernels.h

+              eval0.template gradients<0, true, false, normal_dir>(values_dofs,
+                                                                   temp1);
+              eval1.template values<1, true, false, normal_dir>(temp1,
+                                                                gradients_quad);


Conceptually, I think normal_dir belong to EvaluatorTensorProduct. It is a parameter like degree, specifying the data layout. In this anisotropic case, the normal_dir specified which degree is different; e.g., for normal_dir=0: degree+1 x degree x degree. I would create EvaluatorTensorProduct objects with the right template parameters locally.

peterrum · 2022-04-08T08:12:27Z

include/deal.II/matrix_free/evaluation_kernels.h

+              eval0.template gradients<0, true, false, normal_dir>(values_dofs,
+                                                                   temp1);
+              eval1.template values<1, true, false, normal_dir>(temp1,
+                                                                gradients_quad);


A general question: how does the data layout look like in the case of Nedelec? I am wondering if normal_dir is the right template parameter of the new EvaluatorTensorProduct class and we shouldn't explicitly write the degrees dim time?

NiklasWik · 2022-04-11T09:06:53Z

I have some failing tests when I run them locally, but I don't understand what I've done that made these fail..

I managed to remove about 1000 lines. Hopefully I didn't miss anything from above.. Thanks for your feedback and help @peterrum and @kronbichler! :)

peterrum · 2022-04-11T09:11:42Z

I managed to remove about 1000 lines.

That sounds great!

I have some failing tests when I run them locally, but I don't understand what I've done that made these fail..

Can you pinpoint which commit breaks the tests?

NiklasWik · 2022-04-11T09:21:49Z

Can you pinpoint which commit breaks the tests?

Haven't figured that out yet. I'll continue trying to find the problem

NiklasWik · 2022-04-11T09:39:51Z

After two failing checks of "tidy"... How can I also run this locally before pushing to the PR? Or do I just push to another branch and let the test run on my fork?

peterrum · 2022-04-11T09:42:47Z

After two failing checks of "tidy"... How can I also run this locally before pushing to the PR? Or do I just push to another branch and let the test run on my fork?

Either you install clang-tidy on your machine or you just keep on pushing commits to the branch (as most people do).

NiklasWik · 2022-04-12T05:58:21Z

Very confused about these fails. I had other tests failing for me locally.

peterrum · 2022-04-12T07:21:55Z

Very confused about these fails. I had other tests failing for me locally.

I have all matrix-free tests locally and they pass.

Related the error here:

[ 86%] Building CXX object source/matrix_free/CMakeFiles/obj_matrix_free_debug.dir/evaluation_template_factory_inst4.cc.o
icpc: error #10106: Fatal error in /opt/intel/oneapi/compiler/2022.0.2/linux/bin/intel64/../../bin/intel64/mcpcom, terminated by kill signal
compilation aborted for /home/runner/work/dealii/dealii/source/matrix_free/evaluation_template_factory_inst3.cc (code 1)
make[2]: *** [source/matrix_free/CMakeFiles/obj_matrix_free_debug.dir/evaluation_template_factory_inst3.cc.o] Error 1
source/matrix_free/CMakeFiles/obj_matrix_free_debug.dir/build.make:117: recipe for target 'source/matrix_free/CMakeFiles/obj_matrix_free_debug.dir/evaluation_template_factory_inst3.cc.o' failed
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [source/matrix_free/CMakeFiles/obj_matrix_free_debug.dir/all] Error 2
CMakeFiles/Makefile2:2937: recipe for target 'source/matrix_free/CMakeFiles/obj_matrix_free_debug.dir/all' failed
make: *** [all] Error 2
Makefile:135: recipe for target 'all' failed

This looks like a memory issue: trilinos/Trilinos#3277 (comment).

Maybe you could increase the number 6 to 10 here:

dealii/source/matrix_free/evaluation_template_factory.cc

Lines 21 to 24 in e677991

    
           #define SPLIT_INSTANTIATIONS_COUNT 6 
        
           #ifndef SPLIT_INSTANTIATIONS_INDEX 
        
           #  define SPLIT_INSTANTIATIONS_INDEX 0 
        
           #endif

and create new files with the name evaluation_template_factory_instX.cc. Hopefully, this decreases the size of the compilation units. If not, we need to manually move some content of evaluation_template_factory.templates.h to a new file, just as we did last year in the case of the hanging-node kernels.

peterrum · 2022-04-12T09:07:03Z

You also need to extend:

dealii/source/matrix_free/CMakeLists.txt

Lines 20 to 25 in 70ac873

    
           evaluation_template_factory.cc 
        
           evaluation_template_factory_inst2.cc 
        
           evaluation_template_factory_inst3.cc 
        
           evaluation_template_factory_inst4.cc 
        
           evaluation_template_factory_inst5.cc 
        
           evaluation_template_factory_inst6.cc

peterrum · 2022-04-13T09:25:44Z

I am wondering which modification lead to the increased memory consumption? Is it 02d3c52 and why? That you moved the template argument normal_dir from the function to the class, cannot be an issue can it?

kronbichler · 2022-04-13T10:10:43Z

I am wondering which modification lead to the increased memory consumption?

Do we know how much resources the VMs get and were they exhaused by "chance"? In general, it is hard to predict what precisely contributes to the memory consumption during compilation (unless you write the respective compiler), but of course the number of instantiations plays a role, but also the length of the function argument list that the compiler needs to juggle.

If not, we need to manually move some content of evaluation_template_factory.templates.h to a new file, just as we did last year in the case of the hanging-node kernels.

The most interesting step would be to separate the face evaluation kernel instantiations from the cell evaluation kernels, as they are pretty much separate and do different things anyway. This would also bring down the size of evaluation_kernels.h in terms of the number of lines of code into somewhat more manageable levels.

peterrum · 2022-04-13T12:59:24Z

The most interesting step would be to separate the face evaluation kernel instantiations from the cell evaluation kernels, as they are pretty much separate and do different things anyway.

That would be worth a try. I was thinking about the same but I am not sure if that will be enough , since the windows test is able to build the library but fails during the quick tests.

@NiklasWik Could you split up the file https://github.com/dealii/dealii/blob/master/include/deal.II/matrix_free/evaluation_template_factory.templates.h and create corresponding files for the instantiation. If that is not working, we need to revisit the PR again and think about alternatives.

NiklasWik · 2022-04-13T14:59:39Z

@NiklasWik Could you split up the file https://github.com/dealii/dealii/blob/master/include/deal.II/matrix_free/evaluation_template_factory.templates.h and create corresponding files for the instantiation.

Yes, I'll get on it! Hopefully I'll make time for it tomorrow.

NiklasWik · 2022-04-14T10:11:57Z

Did I miss something @peterrum?

peterrum · 2022-04-14T12:22:07Z

One failing test less. The windows test says:

check for insufficient disk space, invalid path, or insufficient privilege

According to https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners#supported-runners-and-hardware-resources, the Windows runners should have about 7GB RAM. I have checked locally (on Linux): the library size has increase from 1.9 to 2.0GB in debug mode. This means an increase by 100MB, which is a lot, but shouldn't break the Window tests (but I guess it was already at the limit). I'll take another look at the PR how we could reduce memory consumption.

NiklasWik · 2022-04-19T12:35:12Z

I have pending commits for the Piola transform for affine cells, together with updated tests. Should I wait with these or just push them?

peterrum

If I see it correctly, the largest modifications of the last few commit is the introduction of these helper functions

dealii/include/deal.II/matrix_free/evaluation_kernels.h

Lines 990 to 998 in 317cb12

    
           template <int dim, int fe_degree, int n_q_points_1d, typename Number> 
        
           template <bool integrate> 
        
           inline void 
        
           FEEvaluationImpl<MatrixFreeFunctions::tensor_raviart_thomas, 
        
                            dim, 
        
                            fe_degree, 
        
                            n_q_points_1d, 
        
                            Number>:: 
        
             evaluate_or_integrate(

dealii/include/deal.II/matrix_free/evaluation_kernels.h

Lines 3164 to 3170 in 317cb12

    
           template <int  normal_dir, 
        
                     int  component, 
        
                     bool integrate, 
        
                     typename Eval0, 
        
                     typename Eval1> 
        
           static inline void 
        
           evaluate_in_face_apply(

which have the template arguments bool integrate. In the code you are using if instead of constexpr if, which would be the correct one (latter is not possible, since we are still using C++14). My guess is that this leads to some code duplication, which an optimizing compiler should not do. Maybe I could ask you to split up those functions and instead of if-statements use function specialization based on std::integral_constant like here:

dealii/include/deal.II/matrix_free/vector_access_internal.h

Lines 305 to 343 in 91bff6e

    
               template <typename VectorType> 
        
               void 
        
               process_dofs_vectorized(const unsigned int   dofs_per_cell, 
        
                                       const unsigned int   dof_index, 
        
                                       VectorType &         vec, 
        
                                       VectorizedArrayType *dof_values, 
        
                                       std::integral_constant<bool, true>) const 
        
               { 
        
           #ifdef DEBUG 
        
                 // in debug mode, run non-vectorized version because this path 
        
                 // has additional checks (e.g., regarding ghosting) 
        
                 process_dofs_vectorized(dofs_per_cell, 
        
                                         dof_index, 
        
                                         vec, 
        
                                         dof_values, 
        
                                         std::integral_constant<bool, false>()); 
        
           #else 
        
                 const Number *vec_ptr = vec.begin() + dof_index; 
        
                 for (unsigned int i = 0; i < dofs_per_cell; 
        
                      ++i, vec_ptr += VectorizedArrayType::size()) 
        
                   dof_values[i].load(vec_ptr); 
        
           #endif 
        
               } 
        
               template <typename VectorType> 
        
               void 
        
               process_dofs_vectorized(const unsigned int   dofs_per_cell, 
        
                                       const unsigned int   dof_index, 
        
                                       const VectorType &   vec, 
        
                                       VectorizedArrayType *dof_values, 
        
                                       std::integral_constant<bool, false>) const 
        
               { 
        
                 for (unsigned int i = 0; i < dofs_per_cell; ++i) 
        
                   for (unsigned int v = 0; v < VectorizedArrayType::size(); ++v) 
        
                     dof_values[i][v] = 
        
                       vector_access(vec, dof_index + v + i * VectorizedArrayType::size()); 
        
               }

peterrum · 2022-04-19T08:08:21Z

include/deal.II/fe/fe_raviart_thomas.h

+{
+  template <int dim>
+  std::vector<unsigned int>
+  get_lexicographic_numbering_rt_nodal(const unsigned int degree)


Cannot we move the implementation to a source file?

peterrum · 2022-04-19T14:14:37Z

I have pending commits for the Piola transform for affine cells, together with updated tests. Should I wait with these or just push them?

Let's keep them separate. Feel free to open a new PR!

NiklasWik · 2022-04-22T14:11:05Z

@peterrum I hope this was what you meant, but this seems to have made it worse. I'm not sure what else I can do. Do you have any other ideas?

peterrum · 2022-04-22T14:16:30Z

I hope this was what you meant, but this seems to have made it worse. I'm not sure what else I can do. Do you have any other ideas?

Oh, it did. To be honest, I don't have a clue either. My feeling is that the widows machines are already at the limit (even without this PR - but I don't have hard data for that). A dirty option would to switch to release mode on the Windows machines: on my machine the release builds consume significantly less memory. I don't like this solution; but don't have any idea. Maybe @kronbichler has? I guess it does not harm to try out release mode; just to get a feeling if that would help...

peterrum · 2022-04-22T14:17:07Z

The relevant line would be:

dealii/.github/workflows/windows.yml

Line 32 in 2633c93

    
                   cmake -DCMAKE_BUILD_TYPE=Debug -DCMAKE_INSTALL_PREFIX=c:/project -DDEAL_II_WITH_ZLIB=off -DDEAL_II_CXX_FLAGS="-WX" -T host=x64 -A x64 ..

peterrum · 2022-04-22T14:29:02Z

Fingers crossed!

peterrum · 2022-04-25T09:18:38Z

.github/workflows/windows.yml

-        cmake -DCMAKE_BUILD_TYPE=Debug -DCMAKE_INSTALL_PREFIX=c:/project -DDEAL_II_WITH_ZLIB=off -DDEAL_II_CXX_FLAGS="-WX" -T host=x64 -A x64 ..
+        cmake -DCMAKE_BUILD_TYPE=Debug -DCMAKE_INSTALL_PREFIX=c:/project -DDEAL_II_WITH_ZLIB=off -DDEAL_II_CXX_FLAGS="-WX /D FE_EVAL_FACTORY_DEGREE_MAX=2" -T host=x64 -A x64 ..


Is it so simple to define a C macro? It it wouldn't work just override manually the value of

dealii/include/deal.II/matrix_free/evaluation_template_factory_internal.h

Line 24 in 0a36820

# define FE_EVAL_FACTORY_DEGREE_MAX 6

... and we find a solution after we have seen what the benefits are!

OK. It worked (now I see that you are directly modify the CXX flags)! I'll take another look at the code later today!

It was Martin, just to be clear! I shouldn't take credit here :)

I'll prepare the PR of the Piola for the affine case right away.

Of course, it's still a temporary solution since there will be a point where we will hit this with something else, but for all those build tests building a rather minimal degree should be more than enough, no need to waste disk space and compile time on it. We should probably choose something similar for the MacOS and Linux builds that are done through github's infrastructure.

@kronbichler Why aren't we do it like this in ExaDG? We also have somewhere the possibility if fast evaluation is enabled or not...

One could do that as well - the reason why I opted against it up to now was that deal.II compiles considerably more code than ExaDG does, as it repeats the instantiations also for narrower SIMD (ExaDG picks only the widest version), and that makes deal.II libraries considerably fatter. The other question is whether ExaDG really needs all instantiations up to 15 with the fast path, or whether 7 or 8 would suffice and then use the slower path afterwards.

peterrum

Looks good to me (apart from a few very minor points)!

peterrum · 2022-04-25T20:30:54Z

.github/workflows/windows.yml

-        cmake -DCMAKE_BUILD_TYPE=Debug -DCMAKE_INSTALL_PREFIX=c:/project -DDEAL_II_WITH_ZLIB=off -DDEAL_II_CXX_FLAGS="-WX" -T host=x64 -A x64 ..
+        cmake -DCMAKE_BUILD_TYPE=Debug -DCMAKE_INSTALL_PREFIX=c:/project -DDEAL_II_WITH_ZLIB=off -DDEAL_II_CXX_FLAGS="-WX /D FE_EVAL_FACTORY_DEGREE_MAX=2" -T host=x64 -A x64 ..


@kronbichler Why aren't we do it like this in ExaDG? We also have somewhere the possibility if fast evaluation is enabled or not...

include/deal.II/matrix_free/shape_info.templates.h

peterrum · 2022-04-25T20:36:36Z

source/matrix_free/evaluation_template_face_factory.cc

+
+DEAL_II_NAMESPACE_OPEN
+
+#define SPLIT_INSTANTIATIONS_COUNT 6


@kronbichler Should we keep the new instantiation files?

I think it makes sense to introduce them, yes. We might want to cut down the number of files a bit with some statistics on the size, but I see benefit in having them split up already.

peterrum · 2022-04-25T20:47:34Z

include/deal.II/matrix_free/evaluation_kernels.h

+              eval0.template gradients<0, true, false, normal_dir>(values_dofs,
+                                                                   temp1);
+              eval1.template values<1, true, false, normal_dir>(temp1,
+                                                                gradients_quad);


I think we should unify the code paths for tensor product, RT, and Nedelec if possible (in the future). At the moment these evaluation calls look like copy-and-paste. But maybe the Piola transforms (as indicated by the todos below) will add here some additional complexity.

peterrum · 2022-04-25T20:48:41Z

include/deal.II/matrix_free/evaluation_kernels.h



  template <int dim, int fe_degree, typename Number, bool lex_faces = false>
  struct FEFaceNormalEvaluationImpl
  {
+    using EvalNormalF =


What is F standing for?

But maybe the Piola transforms (as indicated by the todos below) will add here some additional complexity.

Yes, the Piola transform for general cells will require some changes here.

What is F standing for?

Just face :) I can write it out explicitly.

peterrum · 2022-04-25T21:04:02Z

And one thing: could you squash the commits!? Thx!

NiklasWik · 2022-04-25T22:04:18Z

Woops. That did not go as planned. Give me a sec

Test added for matrix-free RT Split of evaluation_template_factory.templates.h Decrease FE_EVAL_FACTORY_DEGREE_MAX to 2 for windows vm

NiklasWik · 2022-04-25T23:12:08Z

Woops. That did not go as planned. Give me a sec

Long second... I guess this was great motivation to get better at git..

peterrum

I let @kronbichler have the final decision!

peterrum self-requested a review April 5, 2022 14:11

kronbichler added Matrix-free ready to test labels Apr 5, 2022

peterrum reviewed Apr 5, 2022

View reviewed changes

include/deal.II/matrix_free/matrix_free.h Outdated Show resolved Hide resolved

peterrum reviewed Apr 6, 2022

View reviewed changes

kronbichler reviewed Apr 7, 2022

View reviewed changes

peterrum reviewed Apr 8, 2022

View reviewed changes

NiklasWik force-pushed the RT_matrix_free branch from efbc6da to 02d3c52 Compare April 11, 2022 08:58

peterrum reviewed Apr 19, 2022

View reviewed changes

peterrum added this to the Release 9.4 milestone Apr 23, 2022

NiklasWik force-pushed the RT_matrix_free branch from e88ed57 to e33fb85 Compare April 25, 2022 09:03

peterrum reviewed Apr 25, 2022

View reviewed changes

NiklasWik mentioned this pull request Apr 25, 2022

Matrix-free Piola transformation for affine cells #13642

Merged

peterrum approved these changes Apr 25, 2022

View reviewed changes

NiklasWik force-pushed the RT_matrix_free branch from da5678f to 95d4f40 Compare April 25, 2022 22:03

NiklasWik force-pushed the RT_matrix_free branch from 95d4f40 to 8672439 Compare April 25, 2022 23:05

Matrix-free evaluation of FE_RaviartThomasNodal

6918f72

Test added for matrix-free RT Split of evaluation_template_factory.templates.h Decrease FE_EVAL_FACTORY_DEGREE_MAX to 2 for windows vm

NiklasWik force-pushed the RT_matrix_free branch from 8672439 to 6918f72 Compare April 25, 2022 23:06

peterrum approved these changes Apr 26, 2022

View reviewed changes

kronbichler approved these changes Apr 26, 2022

View reviewed changes

kronbichler merged commit e99aafd into dealii:master Apr 26, 2022

kronbichler mentioned this pull request Apr 29, 2022

Tutorial: Time-harmonic Maxwell solver with perfectly matched layer #12807

Merged

peterrum mentioned this pull request Jul 6, 2022

Export CMake Module exadg/exadg#236

Merged

kronbichler mentioned this pull request Jul 7, 2022

Robustness of FEEvaluation's pre-compiled evaluation routines #14106

Closed

kronbichler mentioned this pull request Aug 31, 2023

Matrix-free evaluation of Raviart-Thomas: major restructuring #15951

Merged

		this->data->data.front().fe_degree &&
		this->data->data.back().fe_degree &&

	const auto fe_1d = create_fe<1>(fe);
	const auto fe_2d = create_fe<2>(fe);

	template <int dim, int fe_degree, int n_q_points_1d, typename Number>
	template <bool integrate>
	inline void
	FEEvaluationImpl<MatrixFreeFunctions::tensor_raviart_thomas,
	dim,
	fe_degree,
	n_q_points_1d,
	Number>::
	evaluate_or_integrate(

	template <int normal_dir,
	int component,
	bool integrate,
	typename Eval0,
	typename Eval1>
	static inline void
	evaluate_in_face_apply(

	template <typename VectorType>
	void
	process_dofs_vectorized(const unsigned int dofs_per_cell,
	const unsigned int dof_index,
	VectorType & vec,
	VectorizedArrayType *dof_values,
	std::integral_constant<bool, true>) const
	{
	#ifdef DEBUG
	// in debug mode, run non-vectorized version because this path
	// has additional checks (e.g., regarding ghosting)
	process_dofs_vectorized(dofs_per_cell,
	dof_index,
	vec,
	dof_values,
	std::integral_constant<bool, false>());
	#else
	const Number *vec_ptr = vec.begin() + dof_index;
	for (unsigned int i = 0; i < dofs_per_cell;
	++i, vec_ptr += VectorizedArrayType::size())
	dof_values[i].load(vec_ptr);
	#endif
	}



	template <typename VectorType>
	void
	process_dofs_vectorized(const unsigned int dofs_per_cell,
	const unsigned int dof_index,
	const VectorType & vec,
	VectorizedArrayType *dof_values,
	std::integral_constant<bool, false>) const
	{
	for (unsigned int i = 0; i < dofs_per_cell; ++i)
	for (unsigned int v = 0; v < VectorizedArrayType::size(); ++v)
	dof_values[i][v] =
	vector_access(vec, dof_index + v + i * VectorizedArrayType::size());
	}

		cmake -DCMAKE_BUILD_TYPE=Debug -DCMAKE_INSTALL_PREFIX=c:/project -DDEAL_II_WITH_ZLIB=off -DDEAL_II_CXX_FLAGS="-WX" -T host=x64 -A x64 ..
		cmake -DCMAKE_BUILD_TYPE=Debug -DCMAKE_INSTALL_PREFIX=c:/project -DDEAL_II_WITH_ZLIB=off -DDEAL_II_CXX_FLAGS="-WX /D FE_EVAL_FACTORY_DEGREE_MAX=2" -T host=x64 -A x64 ..

Matrixfree evaluation with FE_RaviartThomasNodal #13591

Matrixfree evaluation with FE_RaviartThomasNodal #13591

Conversation

NiklasWik commented Apr 5, 2022

marcfehling commented Apr 5, 2022

peterrum left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kronbichler left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

peterrum left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

NiklasWik commented Apr 11, 2022 • edited

peterrum commented Apr 11, 2022

NiklasWik commented Apr 11, 2022

NiklasWik commented Apr 11, 2022

peterrum commented Apr 11, 2022

NiklasWik commented Apr 12, 2022

peterrum commented Apr 12, 2022

peterrum commented Apr 12, 2022

peterrum commented Apr 13, 2022

kronbichler commented Apr 13, 2022 • edited

peterrum commented Apr 13, 2022

NiklasWik commented Apr 13, 2022

NiklasWik commented Apr 14, 2022

peterrum commented Apr 14, 2022

NiklasWik commented Apr 19, 2022

peterrum left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

peterrum commented Apr 19, 2022

NiklasWik commented Apr 22, 2022

peterrum commented Apr 22, 2022

peterrum commented Apr 22, 2022

peterrum commented Apr 22, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

peterrum left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

peterrum commented Apr 25, 2022

Matrixfree evaluation with `FE_RaviartThomasNodal` #13591

Matrixfree evaluation with `FE_RaviartThomasNodal` #13591

NiklasWik commented Apr 11, 2022 •

edited

kronbichler commented Apr 13, 2022 •

edited