New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Matrixfree evaluation with FE_RaviartThomasNodal
#13591
Conversation
Some simplex tests failed. Could you have a look?
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First sets of comment. I'll take a look at include/deal.II/matrix_free/evaluation_kernels.h
tomorrow. Very nice till now!
this->data->data.front().fe_degree && | ||
this->data->data.back().fe_degree && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this make a difference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is a bit of an ugly fix... I wrote it so that data.front().fe_degree = degree + 1
and back() = degree
. One could rewrite it so that it is the same, but this also means a change of +-1
in a few places in include/deal.II/matrix_free/evaluation_kernels.h
. Do you think I should make this change? I guess I could also change the assert depending on if it is a RT element or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer if we could avoid make a change here. When you say this would mean to change some +-1 in some places, is this only in your new code or elsewhere, too? I think the fewer changes we make to existing code the better it is from a design perspective.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only in the new code, so I could make these changes. A sort of related thing which me and @nfehn discussed a bit before is that FE_RaviartThomasNodal(k)
creates an element of degree k
in tangent direction and k+1
in normal direction, but fe.degree
returns k+1
. Why is this beneficial? This has meant that the code I've written so far in Exadg, I create RT elements with FE_RaviartThomasNodal(k - 1)
in order to be able to use the same quadrature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the motivation for FiniteElement::degree
is to give the highest degree in any coordinate direction, so that a quadrature formula with FE::degree+1
ensures exact integration. I don't recall the precise convention, but I remember having discussed this with Guido Kanschat at some point (>10 years ago).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation. I guess we will a couple of problems with back/front in the future.
deallog.pop(); | ||
deallog.push("3d"); | ||
test<3, 2>(); | ||
// test<3, 3>(); Takes way too long |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How much cells/dofs are we talking about?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
8 cells and 756 dofs, so not that much. But the debug version takes 400 seconds which I'm guessing is because it is so slow to assemble the matrix. Haven't looked into it though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would a small mesh with 2 cells or so help?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably! But how do I create a mesh with only 2 cells..? I know you can refine different cells but not how I would refine a cell into 2 instead of 8 for dim=3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It still takes 200+ seconds for debug+release with 2 cells, so I'll leave this out for now, but I can mention this in the issue with todos. Maybe one could do something different than comparing with FEValues
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is strange. Let's keep the 3D test case for now commented. But maybe you could investigate later what takes so long in the case of FEFaceValues
?
Maybe one could do something different than comparing with FEValues?
Alternatively, one can solve a PDE and check the convergence rate.
constexpr int n_blocks2 = | ||
(dim - direction - 1 == 0) ? | ||
1 : | ||
((direction == normal_dir) ? | ||
Utilities::pow((n_rows - 1), | ||
(direction >= dim) ? 0 : dim - direction - 1) : | ||
(((direction < normal_dir) ? (n_rows + 1) : n_rows) * |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember discussing this with @jwitte08 two years ago :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am wondering if we need two implementations of this function. Couldn't we extract the common parts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is also mentioned by @kronbichler in #9545. I didn't want to do any major changes and potentially brake anything, but this should be done at some point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Let's postpone. But we should address this soon!
if (normal_direction == 0) | ||
{ | ||
if (contract_onto_face) | ||
out += n_rows - 1; | ||
else | ||
in += n_rows - 1; | ||
} | ||
if (normal_direction == 1) | ||
{ | ||
if (contract_onto_face) | ||
out += n_rows - 2; | ||
else | ||
in += n_rows - 2; | ||
} | ||
if (normal_direction == 2) | ||
{ | ||
if (contract_onto_face) | ||
out += n_rows; | ||
else | ||
in += n_rows; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here. Most of the code is identical and only these shifts differ.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great code, thank you @NiklasWik ! I have a few comments and suggestions as listed below. One thing that strikes me after reading the code is that I believe most of the code could also be used for Nedelec elements (once we implement an FE_NedelecNodal
as suggested in #9545) with only minor adjustments, so this is a really nice structure.
this->data->data.front().fe_degree && | ||
this->data->data.back().fe_degree && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation. I guess we will a couple of problems with back/front in the future.
deallog.pop(); | ||
deallog.push("3d"); | ||
test<3, 2>(); | ||
// test<3, 3>(); Takes way too long |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would a small mesh with 2 cells or so help?
EvaluationFlags::values | | ||
EvaluationFlags::gradients); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's do it as you proposed. Maybe you could create an issue with todos?
lex_normal.push_back(0); | ||
for (unsigned int i = dofs_per_face_normal * 2 * dim; | ||
i < dofs_per_face_normal * 2 * dim + fe.degree - 1; | ||
i++) | ||
lex_normal.push_back(i); | ||
lex_normal.push_back(dofs_per_face_normal); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We create temporal FEs at multiple place, e.g.,
dealii/include/deal.II/matrix_free/shape_info.templates.h
Lines 833 to 834 in efbc6da
const auto fe_1d = create_fe<1>(fe); | |
const auto fe_2d = create_fe<2>(fe); |
Is the setup of RT too expensive?
Or do you mean for a more easily read code?
Exactly. We normally break down in MF the element into 1D representation; here we have a 2D representation.
constexpr int n_blocks2 = | ||
(dim - direction - 1 == 0) ? | ||
1 : | ||
((direction == normal_dir) ? | ||
Utilities::pow((n_rows - 1), | ||
(direction >= dim) ? 0 : dim - direction - 1) : | ||
(((direction < normal_dir) ? (n_rows + 1) : n_rows) * |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Let's postpone. But we should address this soon!
const T0 eval0, | ||
const T1 eval1, | ||
const T2 eval2, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pass by reference? What are these? Flags or evaluation kernels?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Evaluation kernels (EvaluatorTensorProduct
). I could rename them to something different. The variable names are the same as in previous code, so I kept them for consistency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks overall very good!
Reviewing of evaluation_kernels.h
is a bit hard, since a lot of code seems to be copy and paste. Is there a possibility to get rid of evaluate_tensor_product_per_component()
and similar functions and introduce helper functions that work on a "list" of EvaluatorTensorProduct
? The calling function would pass in the right objects depending of it is istropic or anisotropic? That would reduce the number of lines added significantly! Not sure if that is possible!?
If it is possible, my preference would be to create the helper function first, merge that into master and use that here in this PR.
Eval_normal eval_normal(univariate_shape_data[0]->shape_values, | ||
univariate_shape_data[0]->shape_gradients, | ||
univariate_shape_data[0]->shape_hessians, | ||
univariate_shape_data[0]->fe_degree + 1, | ||
univariate_shape_data[0]->n_q_points_1d); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One can create a helper function like create_evaluator_tensor_product()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I've done that for the face evaluation, where you have to pick out the right shape functions. (Though I don't think hanging nodes work at this point). But I think doing it here will result in more lines than to just creating these two.
shape_info.data.front().fe_degree || | ||
fe_degree == -1, | ||
ExcInternalError()); | ||
if (symmetric_evaluate) // TODO |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you go through the TODO and add a comment what the problem there is (and maybe also note these an issue).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've already remove all todos as Martin told me to, since one would have to go through everything again anyway to check what works and what doesn't.
eval0.template gradients<0, true, false, normal_dir>(values_dofs, | ||
temp1); | ||
eval1.template values<1, true, false, normal_dir>(temp1, | ||
gradients_quad); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Conceptually, I think normal_dir
belong to EvaluatorTensorProduct
. It is a parameter like degree
, specifying the data layout. In this anisotropic case, the normal_dir
specified which degree is different; e.g., for normal_dir=0
: degree+1 x degree x degree
. I would create EvaluatorTensorProduct
objects with the right template parameters locally.
eval0.template gradients<0, true, false, normal_dir>(values_dofs, | ||
temp1); | ||
eval1.template values<1, true, false, normal_dir>(temp1, | ||
gradients_quad); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A general question: how does the data layout look like in the case of Nedelec
? I am wondering if normal_dir
is the right template parameter of the new EvaluatorTensorProduct
class and we shouldn't explicitly write the degrees dim time?
efbc6da
to
02d3c52
Compare
I have some failing tests when I run them locally, but I don't understand what I've done that made these fail.. I managed to remove about 1000 lines. Hopefully I didn't miss anything from above.. Thanks for your feedback and help @peterrum and @kronbichler! :) |
That sounds great!
Can you pinpoint which commit breaks the tests? |
Haven't figured that out yet. I'll continue trying to find the problem |
After two failing checks of "tidy"... How can I also run this locally before pushing to the PR? Or do I just push to another branch and let the test run on my fork? |
Either you install clang-tidy on your machine or you just keep on pushing commits to the branch (as most people do). |
Very confused about these fails. I had other tests failing for me locally. |
I have all matrix-free tests locally and they pass. Related the error here:
This looks like a memory issue: trilinos/Trilinos#3277 (comment). Maybe you could increase the number 6 to 10 here: dealii/source/matrix_free/evaluation_template_factory.cc Lines 21 to 24 in e677991
and create new files with the name |
You also need to extend: dealii/source/matrix_free/CMakeLists.txt Lines 20 to 25 in 70ac873
|
I am wondering which modification lead to the increased memory consumption? Is it 02d3c52 and why? That you moved the template argument |
Do we know how much resources the VMs get and were they exhaused by "chance"? In general, it is hard to predict what precisely contributes to the memory consumption during compilation (unless you write the respective compiler), but of course the number of instantiations plays a role, but also the length of the function argument list that the compiler needs to juggle.
The most interesting step would be to separate the face evaluation kernel instantiations from the cell evaluation kernels, as they are pretty much separate and do different things anyway. This would also bring down the size of |
That would be worth a try. I was thinking about the same but I am not sure if that will be enough , since the windows test is able to build the library but fails during the quick tests. @NiklasWik Could you split up the file https://github.com/dealii/dealii/blob/master/include/deal.II/matrix_free/evaluation_template_factory.templates.h and create corresponding files for the instantiation. If that is not working, we need to revisit the PR again and think about alternatives. |
Yes, I'll get on it! Hopefully I'll make time for it tomorrow. |
Did I miss something @peterrum? |
One failing test less. The windows test says:
According to https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners#supported-runners-and-hardware-resources, the Windows runners should have about 7GB RAM. I have checked locally (on Linux): the library size has increase from 1.9 to 2.0GB in debug mode. This means an increase by 100MB, which is a lot, but shouldn't break the Window tests (but I guess it was already at the limit). I'll take another look at the PR how we could reduce memory consumption. |
I have pending commits for the Piola transform for affine cells, together with updated tests. Should I wait with these or just push them? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I see it correctly, the largest modifications of the last few commit is the introduction of these helper functions
dealii/include/deal.II/matrix_free/evaluation_kernels.h
Lines 990 to 998 in 317cb12
template <int dim, int fe_degree, int n_q_points_1d, typename Number> | |
template <bool integrate> | |
inline void | |
FEEvaluationImpl<MatrixFreeFunctions::tensor_raviart_thomas, | |
dim, | |
fe_degree, | |
n_q_points_1d, | |
Number>:: | |
evaluate_or_integrate( |
dealii/include/deal.II/matrix_free/evaluation_kernels.h
Lines 3164 to 3170 in 317cb12
template <int normal_dir, | |
int component, | |
bool integrate, | |
typename Eval0, | |
typename Eval1> | |
static inline void | |
evaluate_in_face_apply( |
which have the template arguments bool integrate
. In the code you are using if
instead of constexpr if
, which would be the correct one (latter is not possible, since we are still using C++14). My guess is that this leads to some code duplication, which an optimizing compiler should not do. Maybe I could ask you to split up those functions and instead of if-statements use function specialization based on std::integral_constant
like here:
dealii/include/deal.II/matrix_free/vector_access_internal.h
Lines 305 to 343 in 91bff6e
template <typename VectorType> | |
void | |
process_dofs_vectorized(const unsigned int dofs_per_cell, | |
const unsigned int dof_index, | |
VectorType & vec, | |
VectorizedArrayType *dof_values, | |
std::integral_constant<bool, true>) const | |
{ | |
#ifdef DEBUG | |
// in debug mode, run non-vectorized version because this path | |
// has additional checks (e.g., regarding ghosting) | |
process_dofs_vectorized(dofs_per_cell, | |
dof_index, | |
vec, | |
dof_values, | |
std::integral_constant<bool, false>()); | |
#else | |
const Number *vec_ptr = vec.begin() + dof_index; | |
for (unsigned int i = 0; i < dofs_per_cell; | |
++i, vec_ptr += VectorizedArrayType::size()) | |
dof_values[i].load(vec_ptr); | |
#endif | |
} | |
template <typename VectorType> | |
void | |
process_dofs_vectorized(const unsigned int dofs_per_cell, | |
const unsigned int dof_index, | |
const VectorType & vec, | |
VectorizedArrayType *dof_values, | |
std::integral_constant<bool, false>) const | |
{ | |
for (unsigned int i = 0; i < dofs_per_cell; ++i) | |
for (unsigned int v = 0; v < VectorizedArrayType::size(); ++v) | |
dof_values[i][v] = | |
vector_access(vec, dof_index + v + i * VectorizedArrayType::size()); | |
} |
{ | ||
template <int dim> | ||
std::vector<unsigned int> | ||
get_lexicographic_numbering_rt_nodal(const unsigned int degree) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cannot we move the implementation to a source file?
Let's keep them separate. Feel free to open a new PR! |
@peterrum I hope this was what you meant, but this seems to have made it worse. I'm not sure what else I can do. Do you have any other ideas? |
Oh, it did. To be honest, I don't have a clue either. My feeling is that the widows machines are already at the limit (even without this PR - but I don't have hard data for that). A dirty option would to switch to release mode on the Windows machines: on my machine the release builds consume significantly less memory. I don't like this solution; but don't have any idea. Maybe @kronbichler has? I guess it does not harm to try out release mode; just to get a feeling if that would help... |
The relevant line would be: dealii/.github/workflows/windows.yml Line 32 in 2633c93
|
Fingers crossed! |
e88ed57
to
e33fb85
Compare
cmake -DCMAKE_BUILD_TYPE=Debug -DCMAKE_INSTALL_PREFIX=c:/project -DDEAL_II_WITH_ZLIB=off -DDEAL_II_CXX_FLAGS="-WX" -T host=x64 -A x64 .. | ||
cmake -DCMAKE_BUILD_TYPE=Debug -DCMAKE_INSTALL_PREFIX=c:/project -DDEAL_II_WITH_ZLIB=off -DDEAL_II_CXX_FLAGS="-WX /D FE_EVAL_FACTORY_DEGREE_MAX=2" -T host=x64 -A x64 .. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it so simple to define a C macro? It it wouldn't work just override manually the value of
# define FE_EVAL_FACTORY_DEGREE_MAX 6 |
... and we find a solution after we have seen what the benefits are!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. It worked (now I see that you are directly modify the CXX flags)! I'll take another look at the code later today!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was Martin, just to be clear! I shouldn't take credit here :)
I'll prepare the PR of the Piola for the affine case right away.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course, it's still a temporary solution since there will be a point where we will hit this with something else, but for all those build tests building a rather minimal degree should be more than enough, no need to waste disk space and compile time on it. We should probably choose something similar for the MacOS and Linux builds that are done through github's infrastructure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kronbichler Why aren't we do it like this in ExaDG? We also have somewhere the possibility if fast evaluation is enabled or not...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One could do that as well - the reason why I opted against it up to now was that deal.II compiles considerably more code than ExaDG does, as it repeats the instantiations also for narrower SIMD (ExaDG picks only the widest version), and that makes deal.II libraries considerably fatter. The other question is whether ExaDG really needs all instantiations up to 15 with the fast path, or whether 7 or 8 would suffice and then use the slower path afterwards.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me (apart from a few very minor points)!
cmake -DCMAKE_BUILD_TYPE=Debug -DCMAKE_INSTALL_PREFIX=c:/project -DDEAL_II_WITH_ZLIB=off -DDEAL_II_CXX_FLAGS="-WX" -T host=x64 -A x64 .. | ||
cmake -DCMAKE_BUILD_TYPE=Debug -DCMAKE_INSTALL_PREFIX=c:/project -DDEAL_II_WITH_ZLIB=off -DDEAL_II_CXX_FLAGS="-WX /D FE_EVAL_FACTORY_DEGREE_MAX=2" -T host=x64 -A x64 .. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kronbichler Why aren't we do it like this in ExaDG? We also have somewhere the possibility if fast evaluation is enabled or not...
|
||
DEAL_II_NAMESPACE_OPEN | ||
|
||
#define SPLIT_INSTANTIATIONS_COUNT 6 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kronbichler Should we keep the new instantiation files?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes sense to introduce them, yes. We might want to cut down the number of files a bit with some statistics on the size, but I see benefit in having them split up already.
eval0.template gradients<0, true, false, normal_dir>(values_dofs, | ||
temp1); | ||
eval1.template values<1, true, false, normal_dir>(temp1, | ||
gradients_quad); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should unify the code paths for tensor product, RT, and Nedelec if possible (in the future). At the moment these evaluation calls look like copy-and-paste. But maybe the Piola transforms (as indicated by the todos below) will add here some additional complexity.
|
||
|
||
template <int dim, int fe_degree, typename Number, bool lex_faces = false> | ||
struct FEFaceNormalEvaluationImpl | ||
{ | ||
using EvalNormalF = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is F
standing for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But maybe the Piola transforms (as indicated by the todos below) will add here some additional complexity.
Yes, the Piola transform for general cells will require some changes here.
What is F standing for?
Just face :) I can write it out explicitly.
And one thing: could you squash the commits!? Thx! |
da5678f
to
95d4f40
Compare
Woops. That did not go as planned. Give me a sec |
95d4f40
to
8672439
Compare
Test added for matrix-free RT Split of evaluation_template_factory.templates.h Decrease FE_EVAL_FACTORY_DEGREE_MAX to 2 for windows vm
8672439
to
6918f72
Compare
Long second... I guess this was great motivation to get better at git.. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I let @kronbichler have the final decision!
This PR adds support for matrixfree evaluation of
FE_RaviartThomasNodal
(#9545).However, it is currently missing the Piola transform, which you can see traces of in the 2 tests I created. The Piola transform is a planned follow up for the coming weeks.
In shape_info.templates.h there's a function which returns the lexicographic numbering of the nodes that don't really fit in. But with #9544 in mind, I felt that this was an ok temporary solution without making any major changes.
I'm guessing you'll have some opinions regarding the implementation, especially
FEFaceEvaluationImplRaviartThomas
andinterpolate_generic_raviart_thomas(...)
as I feel that they are a bit clumsy ..