Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Matrixfree evaluation with FE_RaviartThomasNodal #13591

Merged
merged 1 commit into from Apr 26, 2022

Conversation

NiklasWik
Copy link
Contributor

This PR adds support for matrixfree evaluation of FE_RaviartThomasNodal (#9545).

However, it is currently missing the Piola transform, which you can see traces of in the 2 tests I created. The Piola transform is a planned follow up for the coming weeks.

In shape_info.templates.h there's a function which returns the lexicographic numbering of the nodes that don't really fit in. But with #9544 in mind, I felt that this was an ok temporary solution without making any major changes.

I'm guessing you'll have some opinions regarding the implementation, especially FEFaceEvaluationImplRaviartThomas and interpolate_generic_raviart_thomas(...) as I feel that they are a bit clumsy ..

@marcfehling
Copy link
Member

Some simplex tests failed. Could you have a look?

The following tests FAILED:
	 16 - simplex/data_out_write_vtk_01.debug (Failed)
Errors while running CTest
	 93 - simplex/step-67.mpirun=1.debug (Failed)
	 94 - simplex/step-67.mpirun=2.debug (Failed)

Copy link
Member

@peterrum peterrum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First sets of comment. I'll take a look at include/deal.II/matrix_free/evaluation_kernels.h tomorrow. Very nice till now!

this->data->data.front().fe_degree &&
this->data->data.back().fe_degree &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this make a difference?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is a bit of an ugly fix... I wrote it so that data.front().fe_degree = degree + 1 and back() = degree. One could rewrite it so that it is the same, but this also means a change of +-1 in a few places in include/deal.II/matrix_free/evaluation_kernels.h. Do you think I should make this change? I guess I could also change the assert depending on if it is a RT element or not.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer if we could avoid make a change here. When you say this would mean to change some +-1 in some places, is this only in your new code or elsewhere, too? I think the fewer changes we make to existing code the better it is from a design perspective.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only in the new code, so I could make these changes. A sort of related thing which me and @nfehn discussed a bit before is that FE_RaviartThomasNodal(k) creates an element of degree k in tangent direction and k+1 in normal direction, but fe.degree returns k+1. Why is this beneficial? This has meant that the code I've written so far in Exadg, I create RT elements with FE_RaviartThomasNodal(k - 1) in order to be able to use the same quadrature.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the motivation for FiniteElement::degree is to give the highest degree in any coordinate direction, so that a quadrature formula with FE::degree+1 ensures exact integration. I don't recall the precise convention, but I remember having discussed this with Guido Kanschat at some point (>10 years ago).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation. I guess we will a couple of problems with back/front in the future.

include/deal.II/matrix_free/shape_info.h Outdated Show resolved Hide resolved
deallog.pop();
deallog.push("3d");
test<3, 2>();
// test<3, 3>(); Takes way too long
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How much cells/dofs are we talking about?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

8 cells and 756 dofs, so not that much. But the debug version takes 400 seconds which I'm guessing is because it is so slow to assemble the matrix. Haven't looked into it though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would a small mesh with 2 cells or so help?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably! But how do I create a mesh with only 2 cells..? I know you can refine different cells but not how I would refine a cell into 2 instead of 8 for dim=3

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It still takes 200+ seconds for debug+release with 2 cells, so I'll leave this out for now, but I can mention this in the issue with todos. Maybe one could do something different than comparing with FEValues?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is strange. Let's keep the 3D test case for now commented. But maybe you could investigate later what takes so long in the case of FEFaceValues?

Maybe one could do something different than comparing with FEValues?

Alternatively, one can solve a PDE and check the convergence rate.

tests/matrix_free/matrix_vector_rt_face_common.h Outdated Show resolved Hide resolved
tests/matrix_free/matrix_vector_rt_face_common.h Outdated Show resolved Hide resolved
include/deal.II/matrix_free/tensor_product_kernels.h Outdated Show resolved Hide resolved
Comment on lines +2526 to +2544
constexpr int n_blocks2 =
(dim - direction - 1 == 0) ?
1 :
((direction == normal_dir) ?
Utilities::pow((n_rows - 1),
(direction >= dim) ? 0 : dim - direction - 1) :
(((direction < normal_dir) ? (n_rows + 1) : n_rows) *
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember discussing this with @jwitte08 two years ago :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering if we need two implementations of this function. Couldn't we extract the common parts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is also mentioned by @kronbichler in #9545. I didn't want to do any major changes and potentially brake anything, but this should be done at some point.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Let's postpone. But we should address this soon!

include/deal.II/matrix_free/tensor_product_kernels.h Outdated Show resolved Hide resolved
include/deal.II/matrix_free/tensor_product_kernels.h Outdated Show resolved Hide resolved
Comment on lines +2730 to +2762
if (normal_direction == 0)
{
if (contract_onto_face)
out += n_rows - 1;
else
in += n_rows - 1;
}
if (normal_direction == 1)
{
if (contract_onto_face)
out += n_rows - 2;
else
in += n_rows - 2;
}
if (normal_direction == 2)
{
if (contract_onto_face)
out += n_rows;
else
in += n_rows;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. Most of the code is identical and only these shifts differ.

Copy link
Member

@kronbichler kronbichler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great code, thank you @NiklasWik ! I have a few comments and suggestions as listed below. One thing that strikes me after reading the code is that I believe most of the code could also be used for Nedelec elements (once we implement an FE_NedelecNodal as suggested in #9545) with only minor adjustments, so this is a really nice structure.

include/deal.II/matrix_free/evaluation_kernels.h Outdated Show resolved Hide resolved
include/deal.II/matrix_free/evaluation_kernels.h Outdated Show resolved Hide resolved
include/deal.II/matrix_free/evaluation_kernels.h Outdated Show resolved Hide resolved
include/deal.II/matrix_free/evaluation_kernels.h Outdated Show resolved Hide resolved
include/deal.II/matrix_free/evaluation_kernels.h Outdated Show resolved Hide resolved
include/deal.II/matrix_free/shape_info.templates.h Outdated Show resolved Hide resolved
include/deal.II/matrix_free/shape_info.templates.h Outdated Show resolved Hide resolved
include/deal.II/matrix_free/shape_info.templates.h Outdated Show resolved Hide resolved
include/deal.II/matrix_free/tensor_product_kernels.h Outdated Show resolved Hide resolved
include/deal.II/matrix_free/tensor_product_kernels.h Outdated Show resolved Hide resolved
this->data->data.front().fe_degree &&
this->data->data.back().fe_degree &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation. I guess we will a couple of problems with back/front in the future.

deallog.pop();
deallog.push("3d");
test<3, 2>();
// test<3, 3>(); Takes way too long
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would a small mesh with 2 cells or so help?

Comment on lines +146 to +157
EvaluationFlags::values |
EvaluationFlags::gradients);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do it as you proposed. Maybe you could create an issue with todos?

Comment on lines 346 to 282
lex_normal.push_back(0);
for (unsigned int i = dofs_per_face_normal * 2 * dim;
i < dofs_per_face_normal * 2 * dim + fe.degree - 1;
i++)
lex_normal.push_back(i);
lex_normal.push_back(dofs_per_face_normal);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We create temporal FEs at multiple place, e.g.,

const auto fe_1d = create_fe<1>(fe);
const auto fe_2d = create_fe<2>(fe);

Is the setup of RT too expensive?

Or do you mean for a more easily read code?

Exactly. We normally break down in MF the element into 1D representation; here we have a 2D representation.

Comment on lines +2526 to +2544
constexpr int n_blocks2 =
(dim - direction - 1 == 0) ?
1 :
((direction == normal_dir) ?
Utilities::pow((n_rows - 1),
(direction >= dim) ? 0 : dim - direction - 1) :
(((direction < normal_dir) ? (n_rows + 1) : n_rows) *
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Let's postpone. But we should address this soon!

Comment on lines 231 to 233
const T0 eval0,
const T1 eval1,
const T2 eval2,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pass by reference? What are these? Flags or evaluation kernels?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Evaluation kernels (EvaluatorTensorProduct). I could rename them to something different. The variable names are the same as in previous code, so I kept them for consistency.

include/deal.II/matrix_free/evaluation_kernels.h Outdated Show resolved Hide resolved
include/deal.II/matrix_free/evaluation_kernels.h Outdated Show resolved Hide resolved
Copy link
Member

@peterrum peterrum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks overall very good!

Reviewing of evaluation_kernels.h is a bit hard, since a lot of code seems to be copy and paste. Is there a possibility to get rid of evaluate_tensor_product_per_component() and similar functions and introduce helper functions that work on a "list" of EvaluatorTensorProduct? The calling function would pass in the right objects depending of it is istropic or anisotropic? That would reduce the number of lines added significantly! Not sure if that is possible!?

If it is possible, my preference would be to create the helper function first, merge that into master and use that here in this PR.

include/deal.II/matrix_free/evaluation_kernels.h Outdated Show resolved Hide resolved
include/deal.II/matrix_free/evaluation_kernels.h Outdated Show resolved Hide resolved
Comment on lines 1022 to 1026
Eval_normal eval_normal(univariate_shape_data[0]->shape_values,
univariate_shape_data[0]->shape_gradients,
univariate_shape_data[0]->shape_hessians,
univariate_shape_data[0]->fe_degree + 1,
univariate_shape_data[0]->n_q_points_1d);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One can create a helper function like create_evaluator_tensor_product()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I've done that for the face evaluation, where you have to pick out the right shape functions. (Though I don't think hanging nodes work at this point). But I think doing it here will result in more lines than to just creating these two.

shape_info.data.front().fe_degree ||
fe_degree == -1,
ExcInternalError());
if (symmetric_evaluate) // TODO
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you go through the TODO and add a comment what the problem there is (and maybe also note these an issue).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've already remove all todos as Martin told me to, since one would have to go through everything again anyway to check what works and what doesn't.

Comment on lines +1144 to +1106
eval0.template gradients<0, true, false, normal_dir>(values_dofs,
temp1);
eval1.template values<1, true, false, normal_dir>(temp1,
gradients_quad);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conceptually, I think normal_dir belong to EvaluatorTensorProduct. It is a parameter like degree, specifying the data layout. In this anisotropic case, the normal_dir specified which degree is different; e.g., for normal_dir=0: degree+1 x degree x degree. I would create EvaluatorTensorProduct objects with the right template parameters locally.

Comment on lines +1144 to +1106
eval0.template gradients<0, true, false, normal_dir>(values_dofs,
temp1);
eval1.template values<1, true, false, normal_dir>(temp1,
gradients_quad);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A general question: how does the data layout look like in the case of Nedelec? I am wondering if normal_dir is the right template parameter of the new EvaluatorTensorProduct class and we shouldn't explicitly write the degrees dim time?

@NiklasWik
Copy link
Contributor Author

NiklasWik commented Apr 11, 2022

I have some failing tests when I run them locally, but I don't understand what I've done that made these fail..

I managed to remove about 1000 lines. Hopefully I didn't miss anything from above.. Thanks for your feedback and help @peterrum and @kronbichler! :)

@peterrum
Copy link
Member

I managed to remove about 1000 lines.

That sounds great!

I have some failing tests when I run them locally, but I don't understand what I've done that made these fail..

Can you pinpoint which commit breaks the tests?

@NiklasWik
Copy link
Contributor Author

Can you pinpoint which commit breaks the tests?

Haven't figured that out yet. I'll continue trying to find the problem

@NiklasWik
Copy link
Contributor Author

After two failing checks of "tidy"... How can I also run this locally before pushing to the PR? Or do I just push to another branch and let the test run on my fork?

@peterrum
Copy link
Member

After two failing checks of "tidy"... How can I also run this locally before pushing to the PR? Or do I just push to another branch and let the test run on my fork?

Either you install clang-tidy on your machine or you just keep on pushing commits to the branch (as most people do).

@NiklasWik
Copy link
Contributor Author

Very confused about these fails. I had other tests failing for me locally.

@peterrum
Copy link
Member

Very confused about these fails. I had other tests failing for me locally.

I have all matrix-free tests locally and they pass.

Related the error here:

[ 86%] Building CXX object source/matrix_free/CMakeFiles/obj_matrix_free_debug.dir/evaluation_template_factory_inst4.cc.o
icpc: error #10106: Fatal error in /opt/intel/oneapi/compiler/2022.0.2/linux/bin/intel64/../../bin/intel64/mcpcom, terminated by kill signal
compilation aborted for /home/runner/work/dealii/dealii/source/matrix_free/evaluation_template_factory_inst3.cc (code 1)
make[2]: *** [source/matrix_free/CMakeFiles/obj_matrix_free_debug.dir/evaluation_template_factory_inst3.cc.o] Error 1
source/matrix_free/CMakeFiles/obj_matrix_free_debug.dir/build.make:117: recipe for target 'source/matrix_free/CMakeFiles/obj_matrix_free_debug.dir/evaluation_template_factory_inst3.cc.o' failed
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [source/matrix_free/CMakeFiles/obj_matrix_free_debug.dir/all] Error 2
CMakeFiles/Makefile2:2937: recipe for target 'source/matrix_free/CMakeFiles/obj_matrix_free_debug.dir/all' failed
make: *** [all] Error 2
Makefile:135: recipe for target 'all' failed

This looks like a memory issue: trilinos/Trilinos#3277 (comment).

Maybe you could increase the number 6 to 10 here:

#define SPLIT_INSTANTIATIONS_COUNT 6
#ifndef SPLIT_INSTANTIATIONS_INDEX
# define SPLIT_INSTANTIATIONS_INDEX 0
#endif

and create new files with the name evaluation_template_factory_instX.cc. Hopefully, this decreases the size of the compilation units. If not, we need to manually move some content of evaluation_template_factory.templates.h to a new file, just as we did last year in the case of the hanging-node kernels.

@peterrum
Copy link
Member

You also need to extend:

evaluation_template_factory.cc
evaluation_template_factory_inst2.cc
evaluation_template_factory_inst3.cc
evaluation_template_factory_inst4.cc
evaluation_template_factory_inst5.cc
evaluation_template_factory_inst6.cc

@peterrum
Copy link
Member

I am wondering which modification lead to the increased memory consumption? Is it 02d3c52 and why? That you moved the template argument normal_dir from the function to the class, cannot be an issue can it?

@kronbichler
Copy link
Member

kronbichler commented Apr 13, 2022

I am wondering which modification lead to the increased memory consumption?

Do we know how much resources the VMs get and were they exhaused by "chance"? In general, it is hard to predict what precisely contributes to the memory consumption during compilation (unless you write the respective compiler), but of course the number of instantiations plays a role, but also the length of the function argument list that the compiler needs to juggle.

If not, we need to manually move some content of evaluation_template_factory.templates.h to a new file, just as we did last year in the case of the hanging-node kernels.

The most interesting step would be to separate the face evaluation kernel instantiations from the cell evaluation kernels, as they are pretty much separate and do different things anyway. This would also bring down the size of evaluation_kernels.h in terms of the number of lines of code into somewhat more manageable levels.

@peterrum
Copy link
Member

The most interesting step would be to separate the face evaluation kernel instantiations from the cell evaluation kernels, as they are pretty much separate and do different things anyway.

That would be worth a try. I was thinking about the same but I am not sure if that will be enough , since the windows test is able to build the library but fails during the quick tests.

@NiklasWik Could you split up the file https://github.com/dealii/dealii/blob/master/include/deal.II/matrix_free/evaluation_template_factory.templates.h and create corresponding files for the instantiation. If that is not working, we need to revisit the PR again and think about alternatives.

@NiklasWik
Copy link
Contributor Author

@NiklasWik Could you split up the file https://github.com/dealii/dealii/blob/master/include/deal.II/matrix_free/evaluation_template_factory.templates.h and create corresponding files for the instantiation.

Yes, I'll get on it! Hopefully I'll make time for it tomorrow.

@NiklasWik
Copy link
Contributor Author

Did I miss something @peterrum?

@peterrum
Copy link
Member

One failing test less. The windows test says:

check for insufficient disk space, invalid path, or insufficient privilege

According to https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners#supported-runners-and-hardware-resources, the Windows runners should have about 7GB RAM. I have checked locally (on Linux): the library size has increase from 1.9 to 2.0GB in debug mode. This means an increase by 100MB, which is a lot, but shouldn't break the Window tests (but I guess it was already at the limit). I'll take another look at the PR how we could reduce memory consumption.

@NiklasWik
Copy link
Contributor Author

I have pending commits for the Piola transform for affine cells, together with updated tests. Should I wait with these or just push them?

Copy link
Member

@peterrum peterrum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I see it correctly, the largest modifications of the last few commit is the introduction of these helper functions

template <int dim, int fe_degree, int n_q_points_1d, typename Number>
template <bool integrate>
inline void
FEEvaluationImpl<MatrixFreeFunctions::tensor_raviart_thomas,
dim,
fe_degree,
n_q_points_1d,
Number>::
evaluate_or_integrate(

template <int normal_dir,
int component,
bool integrate,
typename Eval0,
typename Eval1>
static inline void
evaluate_in_face_apply(

which have the template arguments bool integrate. In the code you are using if instead of constexpr if, which would be the correct one (latter is not possible, since we are still using C++14). My guess is that this leads to some code duplication, which an optimizing compiler should not do. Maybe I could ask you to split up those functions and instead of if-statements use function specialization based on std::integral_constant like here:

template <typename VectorType>
void
process_dofs_vectorized(const unsigned int dofs_per_cell,
const unsigned int dof_index,
VectorType & vec,
VectorizedArrayType *dof_values,
std::integral_constant<bool, true>) const
{
#ifdef DEBUG
// in debug mode, run non-vectorized version because this path
// has additional checks (e.g., regarding ghosting)
process_dofs_vectorized(dofs_per_cell,
dof_index,
vec,
dof_values,
std::integral_constant<bool, false>());
#else
const Number *vec_ptr = vec.begin() + dof_index;
for (unsigned int i = 0; i < dofs_per_cell;
++i, vec_ptr += VectorizedArrayType::size())
dof_values[i].load(vec_ptr);
#endif
}
template <typename VectorType>
void
process_dofs_vectorized(const unsigned int dofs_per_cell,
const unsigned int dof_index,
const VectorType & vec,
VectorizedArrayType *dof_values,
std::integral_constant<bool, false>) const
{
for (unsigned int i = 0; i < dofs_per_cell; ++i)
for (unsigned int v = 0; v < VectorizedArrayType::size(); ++v)
dof_values[i][v] =
vector_access(vec, dof_index + v + i * VectorizedArrayType::size());
}

{
template <int dim>
std::vector<unsigned int>
get_lexicographic_numbering_rt_nodal(const unsigned int degree)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cannot we move the implementation to a source file?

@peterrum
Copy link
Member

I have pending commits for the Piola transform for affine cells, together with updated tests. Should I wait with these or just push them?

Let's keep them separate. Feel free to open a new PR!

@NiklasWik
Copy link
Contributor Author

@peterrum I hope this was what you meant, but this seems to have made it worse. I'm not sure what else I can do. Do you have any other ideas?

@peterrum
Copy link
Member

I hope this was what you meant, but this seems to have made it worse. I'm not sure what else I can do. Do you have any other ideas?

Oh, it did. To be honest, I don't have a clue either. My feeling is that the widows machines are already at the limit (even without this PR - but I don't have hard data for that). A dirty option would to switch to release mode on the Windows machines: on my machine the release builds consume significantly less memory. I don't like this solution; but don't have any idea. Maybe @kronbichler has? I guess it does not harm to try out release mode; just to get a feeling if that would help...

@peterrum
Copy link
Member

The relevant line would be:

cmake -DCMAKE_BUILD_TYPE=Debug -DCMAKE_INSTALL_PREFIX=c:/project -DDEAL_II_WITH_ZLIB=off -DDEAL_II_CXX_FLAGS="-WX" -T host=x64 -A x64 ..

@peterrum
Copy link
Member

Fingers crossed!

cmake -DCMAKE_BUILD_TYPE=Debug -DCMAKE_INSTALL_PREFIX=c:/project -DDEAL_II_WITH_ZLIB=off -DDEAL_II_CXX_FLAGS="-WX" -T host=x64 -A x64 ..
cmake -DCMAKE_BUILD_TYPE=Debug -DCMAKE_INSTALL_PREFIX=c:/project -DDEAL_II_WITH_ZLIB=off -DDEAL_II_CXX_FLAGS="-WX /D FE_EVAL_FACTORY_DEGREE_MAX=2" -T host=x64 -A x64 ..
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it so simple to define a C macro? It it wouldn't work just override manually the value of

... and we find a solution after we have seen what the benefits are!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. It worked (now I see that you are directly modify the CXX flags)! I'll take another look at the code later today!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was Martin, just to be clear! I shouldn't take credit here :)

I'll prepare the PR of the Piola for the affine case right away.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course, it's still a temporary solution since there will be a point where we will hit this with something else, but for all those build tests building a rather minimal degree should be more than enough, no need to waste disk space and compile time on it. We should probably choose something similar for the MacOS and Linux builds that are done through github's infrastructure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kronbichler Why aren't we do it like this in ExaDG? We also have somewhere the possibility if fast evaluation is enabled or not...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One could do that as well - the reason why I opted against it up to now was that deal.II compiles considerably more code than ExaDG does, as it repeats the instantiations also for narrower SIMD (ExaDG picks only the widest version), and that makes deal.II libraries considerably fatter. The other question is whether ExaDG really needs all instantiations up to 15 with the fast path, or whether 7 or 8 would suffice and then use the slower path afterwards.

Copy link
Member

@peterrum peterrum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me (apart from a few very minor points)!

cmake -DCMAKE_BUILD_TYPE=Debug -DCMAKE_INSTALL_PREFIX=c:/project -DDEAL_II_WITH_ZLIB=off -DDEAL_II_CXX_FLAGS="-WX" -T host=x64 -A x64 ..
cmake -DCMAKE_BUILD_TYPE=Debug -DCMAKE_INSTALL_PREFIX=c:/project -DDEAL_II_WITH_ZLIB=off -DDEAL_II_CXX_FLAGS="-WX /D FE_EVAL_FACTORY_DEGREE_MAX=2" -T host=x64 -A x64 ..
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kronbichler Why aren't we do it like this in ExaDG? We also have somewhere the possibility if fast evaluation is enabled or not...

include/deal.II/matrix_free/shape_info.templates.h Outdated Show resolved Hide resolved
include/deal.II/matrix_free/shape_info.templates.h Outdated Show resolved Hide resolved

DEAL_II_NAMESPACE_OPEN

#define SPLIT_INSTANTIATIONS_COUNT 6
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kronbichler Should we keep the new instantiation files?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to introduce them, yes. We might want to cut down the number of files a bit with some statistics on the size, but I see benefit in having them split up already.

Comment on lines +1144 to +1106
eval0.template gradients<0, true, false, normal_dir>(values_dofs,
temp1);
eval1.template values<1, true, false, normal_dir>(temp1,
gradients_quad);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should unify the code paths for tensor product, RT, and Nedelec if possible (in the future). At the moment these evaluation calls look like copy-and-paste. But maybe the Piola transforms (as indicated by the todos below) will add here some additional complexity.



template <int dim, int fe_degree, typename Number, bool lex_faces = false>
struct FEFaceNormalEvaluationImpl
{
using EvalNormalF =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is F standing for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But maybe the Piola transforms (as indicated by the todos below) will add here some additional complexity.

Yes, the Piola transform for general cells will require some changes here.

What is F standing for?

Just face :) I can write it out explicitly.

@peterrum
Copy link
Member

And one thing: could you squash the commits!? Thx!

@NiklasWik
Copy link
Contributor Author

Woops. That did not go as planned. Give me a sec

Test added for matrix-free RT
Split of evaluation_template_factory.templates.h
Decrease FE_EVAL_FACTORY_DEGREE_MAX to 2 for windows vm
@NiklasWik
Copy link
Contributor Author

Woops. That did not go as planned. Give me a sec

Long second... I guess this was great motivation to get better at git..

Copy link
Member

@peterrum peterrum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I let @kronbichler have the final decision!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants