Add utility functions to evaluate values of arbitrary (remote) points #11804

Merged (1 commit) on Mar 18, 2021

Conversation

peterrum (Member)

The big brother of FEPointEvaluation.

The classes/functions have enabled us to

  • interpolate between distributed non-matching grids (ALE, NS+LS, and between hex and tet meshes)
  • evaluate values along lines
  • create slices directly in deal.II (instead of outputting 3D results and creating the slices in Paraview)

I will add some tests tomorrow so that the use cases become clearer.

@peterrum peterrum added this to the Release 9.3 milestone Feb 24, 2021
@peterrum peterrum force-pushed the remote_fe_point_evaluation branch 2 times, most recently from b522038 to 479c5d3 Compare February 26, 2021 16:35
@peterrum peterrum changed the title [WIP] Add utility functions to evaluate values of arbitrary (remote) points Add utility functions to evaluate values of arbitrary (remote) points Feb 27, 2021
@luca-heltai (Member)

luca-heltai commented Mar 1, 2021

This basically re-implements GridTools::distributed_compute_point_locations() and some of the GridTools::Cache objects from scratch, but using the same algorithm (bounding boxes describing the domain, check who may possibly own the points, use a consensus algorithm to decide who owns what, then send the data around).

With the existing functionality, this could have been implemented in the following way:

LA::MPI::Vec global_vec;
...
FiniteElement<dim, spacedim> &fe = ...;
...
std::vector<Point<spacedim>> points(n_points);
std::vector<double>          values(n_points);

auto global_boxes = cache.get_covering_rtree(tree_level);
auto [cells, local_points, local_indices, global_points, from_cpu] =
  distributed_compute_point_locations(points,
                                      cache,
                                      {global_boxes.begin(), global_boxes.end()});

for (unsigned int i = 0; i < cells.size(); ++i)
  {
    const auto &cell    = cells[i];
    const auto &qpoints = local_points[i];
    const auto &indices = local_indices[i];
    for (unsigned int j = 0; j < indices.size(); ++j)
      values[indices[j]] = fe.value(qpoints[j]); // sketch: evaluate the field at the reference point
  }
// now send around the `values` vector
...

I think your implementation is likely to be more efficient, but can we at least consolidate the functions that do the same thing into just one implementation (i.e., the one that works best) and throw away the others?

@kronbichler (Member)

@luca-heltai you are right, we should try to unify the implementations and have a good infrastructure that fits a wide application range. I will look into this more closely tomorrow.

@kronbichler (Member) left a comment

I have various comments that I would like to discuss.

Comment on lines 34 to 43
namespace EvaluationFlags
{
  enum EvaluationFlags
  {
    avg    = 0,
    max    = 1,
    min    = 2,
    insert = 3
  };
}
Member

Can we select a better name here, e.g., EvaluateReductionMethod, and document the intent of this enum as the combination method for the case that several function values are present at a given point (e.g., with discontinuous Galerkin elements)?
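
For illustration, a self-contained sketch (the names EvaluateReductionMethod and reduce() are hypothetical and not part of the proposed interface) of what such a combination step could look like when several cells report a value for the same point:

#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

enum class EvaluateReductionMethod
{
  avg,
  max,
  min,
  insert
};

// Combine the (possibly several) values reported for a single evaluation
// point, e.g., when the point lies on a face between two cells in a DG
// discretization.
double
reduce(const EvaluateReductionMethod method, const std::vector<double> &values)
{
  assert(!values.empty());
  switch (method)
    {
      case EvaluateReductionMethod::avg:
        return std::accumulate(values.begin(), values.end(), 0.0) /
               values.size();
      case EvaluateReductionMethod::max:
        return *std::max_element(values.begin(), values.end());
      case EvaluateReductionMethod::min:
        return *std::min_element(values.begin(), values.end());
      case EvaluateReductionMethod::insert: // take one owner's value as-is
      default:
        return values.front();
    }
}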

Member Author

Before making this change, let's agree on how to deal with gradients and Hessians! Is it an option to have a separate function for each, as stated a couple of minutes ago? Or do we want to encode this in these "flags"? An issue I see in the latter is that we don't have a way to determine the return type of the function based on the flag, so the user might need to be explicit about the template argument.
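
To make the return-type issue concrete, here is a minimal, self-contained illustration (hypothetical types, not the deal.II interface): a run-time flag cannot change a function's return type, whereas a compile-time template parameter (or separate functions) can:

#include <array>
#include <vector>

struct Value
{
  double v;
};

struct Gradient
{
  std::array<double, 3> g;
};

enum class Quantity
{
  values,
  gradients
};

// The requested quantity is a template parameter, so the compiler can
// determine the return type; with a run-time flag this would not be possible.
template <Quantity q>
auto
evaluate()
{
  if constexpr (q == Quantity::values)
    return std::vector<Value>();    // deduced return type: std::vector<Value>
  else
    return std::vector<Gradient>(); // deduced: std::vector<Gradient>
}

int
main()
{
  const auto values    = evaluate<Quantity::values>();
  const auto gradients = evaluate<Quantity::gradients>();
  (void)values;
  (void)gradients;
}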

Comment on lines +50 to +73
std::vector<typename FEPointEvaluation<n_components, dim>::value_type>
evaluate_at_points(
Member

I think we should also think about the case where a user requests the gradient of a solution field (or the Hessian). How would we structure the case of various return kinds? Set up some additional data structure that is filled upon demand? In terms of technicalities, I think we can return any information FEPointEvaluation can compute, as long as we can pack the data appropriately.

Member Author

In terms of technicalities, I think we can return any information FEPointEvaluation can compute, as long as we can pack the data appropriately.

Yes, that is right. I would suggest that once other quantities are needed, we move the content of this function to an internal (templated) function, which is then called by evaluate_values_at_points(), evaluate_gradients_at_points(), ...
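
A minimal structural sketch of that suggestion (all names here, such as evaluate_at_points_internal(), are hypothetical): the common bookkeeping lives in one internal template, and the public variants only differ in the per-point kernel and the returned type:

#include <array>
#include <cstddef>
#include <vector>

namespace internal
{
  // The communication and bookkeeping logic would exist only once, here.
  template <typename T, typename Kernel>
  std::vector<T>
  evaluate_at_points_internal(const Kernel &kernel, const std::size_t n_points)
  {
    std::vector<T> result(n_points);
    for (std::size_t p = 0; p < n_points; ++p)
      result[p] = kernel(p); // per-point evaluation kernel
    return result;
  }
} // namespace internal

std::vector<double>
evaluate_values_at_points(const std::size_t n_points)
{
  return internal::evaluate_at_points_internal<double>(
    [](const std::size_t) { return 0.0; }, n_points);
}

std::vector<std::array<double, 3>>
evaluate_gradients_at_points(const std::size_t n_points)
{
  return internal::evaluate_at_points_internal<std::array<double, 3>>(
    [](const std::size_t) { return std::array<double, 3>{}; }, n_points);
}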

I think we should also think about the case where a user requests the gradient of a solution field (or the Hessian). How would we structure the case of various return kinds? Set up some additional data structure that is filled upon demand?

The bigger question is how we can deal with derived quantities (e.g., in the case of SIM I would like to communicate the force already multiplied by JxW so that it only has to be tested after communication) and with composites (e.g., you might want to pack multiple quantities together (one might be a scalar, the other one vectorial) so that they are sent in one go; e.g., multiple BDF-2 time steps after an AMR step).

The best solution I have been able to come up with (in adaflo) so far is to use RemotePointEvaluation directly. This class can work with arbitrary types, so I can also pass in vectors of tuples. What is your opinion?
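
As a self-contained illustration of the composite case (not the deal.II API): packing a scalar and a vectorial quantity into one per-point type so that both travel in a single exchange:

#include <array>
#include <vector>

// One entry per evaluation point; both quantities are communicated together
// and unpacked on the requesting side after the exchange.
struct PointData
{
  double                scalar;    // e.g., a curvature value
  std::array<double, 3> vectorial; // e.g., a normal or a force contribution
};

int
main()
{
  std::vector<PointData> buffer(42); // filled on the evaluating side
  (void)buffer;
}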

template <int n_components, int dim, int spacedim, typename VectorType>
std::vector<typename FEPointEvaluation<n_components, dim>::value_type>
evaluate_at_points(
const DoFHandler<dim, spacedim> & dof_handler,
Member

At first sight, it seemed unexpected to me to need to hand in a DoFHandler object here, as we crucially rely on the precomputed cache in terms of point locations and all other geometry (that is provided by DoFHandler::get_triangulation() in the implementation below). I think the motivation is to get the interpretation of the vector, because we need to get cell iterators and iterator->get_dof_indices() to access the solution here. So in conclusion I do not disagree here, but I think we need to work hard to document this well.

Member Author

I don't think it is that "surprising". All VectorTools functions get a DoFHandler object. The DoFHandler object is connected "directly" to the vector, while the communication pattern is only connected to the grid.

So in conclusion I do not disagree here, but I think we need to work hard to document this well.

But yes we should!

Comment on lines 162 to 163
const auto evaluation_point_results = [&]() {
std::vector<std::unique_ptr<FEPointEvaluation<n_components, dim>>>
Member

I would remove this lambda (used once to populate a vector evaluation_point_results) and write the code straight-out.

Member Author

I know you don't like this kind of scope. For me, however, it helps to structure the code.

source/base/mpi_remote_point_evaluation.cc (outdated review comments, resolved)
Comment on lines 408 to 158
template <int dim, int spacedim>
const std::vector<unsigned int> &
RemotePointEvaluation<dim, spacedim>::get_point_ptrs() const
{
  return quadrature_points_ptr;
}
Member

I think this function (and the small ones below) can go to the header file.

Member Author

Yes, I can do that. What is the criterion for when to leave a function in the header file? I would have thought that there are only two reasons: 1) the function is templated in such a way that explicit instantiation is hard, and 2) the call is performance-critical (as in many places of FEEvaluation). But the functions here are definitely not performance-critical, so they do not need to be inlined...

@peterrum (Member Author)

peterrum commented Mar 3, 2021

This basically re-implements GridTools::distributed_compute_point_locations() and some of the GridTools::Cache objects from scratch,

This statement is not exactly precise. The main use case of GridTools::distributed_compute_point_locations() is ParticleHandler::insert_global_particles(): points are generated on one mesh and then sent to another (potentially non-matching) mesh. I.e., data (points) are only sent in one direction, and on the receiver side some quantities are derived (in particular cells and reference points).

In contrast, the goal of this PR is to enable evaluation/integration at arbitrary points (local and remote; potentially on non-matching meshes). One function added is VectorTools::evaluate_at_points(), which allows evaluating a global solution vector at arbitrary points. This requires that data is also sent back after it has been computed at the points.

To be able to perform the communication, additional data structures have to be set up. In particular, the data has to be sorted differently in each stage: during evaluation, the data needs to be sorted according to the owning cell; during communication, the data has to be sorted according to the ranks; and at the end, the data has to be sorted in the order of the provided points. Furthermore, this needs additional communication steps to finalize the data structures via a handshake, which can be performed via ConsensusAlgorithms.
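
A short self-contained sketch of the CRS-style layout ("pointers into flat data") that this sorting refers to; the concrete numbers are illustrative only:

#include <cstddef>
#include <vector>

int
main()
{
  // ptrs[i]..ptrs[i+1] delimits the data belonging to entity i (a cell during
  // evaluation, a rank during communication, a point at the very end).
  const std::vector<std::size_t> ptrs   = {0, 2, 5};
  const std::vector<double>      values = {1., 2., 3., 4., 5.};

  for (std::size_t i = 0; i + 1 < ptrs.size(); ++i)
    for (std::size_t j = ptrs[i]; j < ptrs[i + 1]; ++j)
      (void)values[j]; // process the j-th value, which belongs to entity i
}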

Since the result of GridTools::distributed_compute_point_locations() is only (in one way or another) a side result of what we need and can be obtained if the handshake is not performed (early return), I have created an internal function GridTools::internal::distributed_compute_point_locations() that can be configured via a bool. GridTools::internal::distributed_compute_point_locations(..., false) is now used in GridTools::distributed_compute_point_locations(), and GridTools::internal::distributed_compute_point_locations(..., true) is used to set up the communication pattern within Utilities::MPI::RemotePointEvaluation.

Utilities::MPI::RemotePointEvaluation is essentially a class that caches the communication pattern (similarly to Utilities::MPI::Partitioner) and is responsible for the permutation of the data on the sender and receiver sides.

This class can be passed to VectorTools::evaluate_at_points() (as cache) so that the communication pattern does not have to be updated in each function call. However, the use cases are much more diverse...
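
A rough usage sketch of that reuse pattern. The reinit() and evaluate_at_points() calls below use the names from this PR, but their exact signatures (argument order, template arguments) are assumptions based on this discussion rather than the verbatim merged interface:

#include <deal.II/base/function_lib.h>
#include <deal.II/base/mpi.h>
#include <deal.II/base/mpi_remote_point_evaluation.h>
#include <deal.II/distributed/tria.h>
#include <deal.II/dofs/dof_handler.h>
#include <deal.II/dofs/dof_tools.h>
#include <deal.II/fe/fe_q.h>
#include <deal.II/fe/mapping_q1.h>
#include <deal.II/grid/grid_generator.h>
#include <deal.II/lac/la_parallel_vector.h>
#include <deal.II/numerics/vector_tools.h>

using namespace dealii;

int
main(int argc, char **argv)
{
  Utilities::MPI::MPI_InitFinalize mpi(argc, argv, 1);
  constexpr unsigned int dim = 2;

  // Background mesh and a scalar field living on it.
  parallel::distributed::Triangulation<dim> tria(MPI_COMM_WORLD);
  GridGenerator::hyper_cube(tria);
  tria.refine_global(4);

  const MappingQ1<dim> mapping;
  const FE_Q<dim>      fe(1);
  DoFHandler<dim>      dof_handler(tria);
  dof_handler.distribute_dofs(fe);

  IndexSet locally_relevant_dofs;
  DoFTools::extract_locally_relevant_dofs(dof_handler, locally_relevant_dofs);
  LinearAlgebra::distributed::Vector<double> solution(
    dof_handler.locally_owned_dofs(), locally_relevant_dofs, MPI_COMM_WORLD);
  VectorTools::interpolate(mapping,
                           dof_handler,
                           Functions::SquareFunction<dim>(),
                           solution);
  solution.update_ghost_values();

  // Arbitrary (possibly remote) evaluation points, here along a line.
  std::vector<Point<dim>> points;
  for (unsigned int i = 0; i <= 20; ++i)
    points.emplace_back(0.05 * i, 0.5);

  // Set up the communication pattern once (assumed reinit() signature) ...
  Utilities::MPI::RemotePointEvaluation<dim> cache;
  cache.reinit(points, tria, mapping);

  // ... and reuse it for every evaluation (assumed argument order).
  const std::vector<double> values =
    VectorTools::evaluate_at_points<1>(dof_handler, solution, cache);

  // values[i] is the value of `solution` at points[i] on the requesting rank.
  (void)values;
}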

Note: I will move some minor changes to separate PRs.

source/base/mpi.cc (outdated review comment, resolved)
@peterrum (Member Author)

peterrum commented Mar 6, 2021

/rebuild

@peterrum (Member Author)

peterrum commented Mar 6, 2021

The PR is ready from my side. I have renamed some of the fields and have added some documentation. The discussions above (particularly regarding usability and extensibility, i.e., the computation of other quantities) are still open. I will resolve these once we have agreed on a good solution.

@peterrum (Member Author)

peterrum commented Mar 7, 2021

Regarding the discussion above: to be able to work with RemotePointEvaluation, one needs to define a lambda function like the following:

const auto fu = [&](auto &values, const auto &cell_data) {
  std::vector<typename VectorType::value_type> solution_values;
  std::vector<std::unique_ptr<FEPointEvaluation<n_components, dim>>>
    evaluators(dof_handler.get_fe_collection().size());

  // Create the evaluator for a given active FE index only when it is needed.
  const auto get_evaluator = [&](const unsigned int active_fe_index)
    -> FEPointEvaluation<n_components, dim> & {
    if (evaluators[active_fe_index] == nullptr)
      evaluators[active_fe_index] =
        std::make_unique<FEPointEvaluation<n_components, dim>>(
          cache.get_mapping(), dof_handler.get_fe(active_fe_index));
    return *evaluators[active_fe_index];
  };

  // Loop over all cells for which this process has to evaluate points.
  for (unsigned int i = 0; i < cell_data.cells.size(); ++i)
    {
      typename DoFHandler<dim>::active_cell_iterator cell = {
        &cache.get_triangulation(),
        cell_data.cells[i].first,
        cell_data.cells[i].second,
        &dof_handler};

      const ArrayView<const Point<dim>> unit_points(
        cell_data.reference_point_values.data() +
          cell_data.reference_point_ptrs[i],
        cell_data.reference_point_ptrs[i + 1] -
          cell_data.reference_point_ptrs[i]);

      solution_values.resize(
        dof_handler.get_fe(cell->active_fe_index()).n_dofs_per_cell());
      cell->get_dof_values(vector,
                           solution_values.begin(),
                           solution_values.end());

      auto &evaluator = get_evaluator(cell->active_fe_index());
      evaluator.evaluate(cell,
                         unit_points,
                         solution_values,
                         dealii::EvaluationFlags::values);

      for (unsigned int q = 0; q < unit_points.size(); ++q)
        values[q + cell_data.reference_point_ptrs[i]] = evaluator.get_value(q);
    }
};

The question is what prevents users from writing their own lambda functions. Apart from the fact that I am dealing here with the general case (hp), I see two issues: 1) a lot of boilerplate code and 2) the need to work with the CRS data structures directly.

In the ideal case, the above code could be rewritten in the following way:

const auto fu = [&](const auto &data, auto &values, const auto cell_range) {
  FEPointEvaluation<n_components, dim> evaluator(data,
                                                 cache.get_mapping(),
                                                 dof_handler);

  for (unsigned int cell = cell_range.first; cell < cell_range.second; ++cell)
    {
      evaluator.reinit(cell);
      evaluator.read_dof_values(vector);
      evaluator.evaluate(dealii::EvaluationFlags::values);
      evaluator.set_point_values(values);
    }
};

Adding additional evaluators in this context would be easier. But the question is whether it is really worth it to move the complexity into FEPointEvaluation.

@kronbichler (Member) left a comment

I will write more later, but let me push what I already have.

include/deal.II/base/mpi_remote_point_evaluation.h (outdated review comments, resolved)
(void)fu;
#else
output.resize(point_ptrs.back());
buffer.resize((send_permutation.size()) * 2);
Member

Nit-picking, but do we need the second pair of braces?

Suggested change
buffer.resize((send_permutation.size()) * 2);
buffer.resize(send_permutation.size() * 2);

Comment on lines 292 to 294
temp_map[send_ranks[i]] = Utilities::pack(
std::vector<T>(buffer_2.begin() + send_ptrs[i],
buffer_2.begin() + send_ptrs[i + 1]));
Member

Should we disable compression to keep reasonable performance?

Suggested change
temp_map[send_ranks[i]] = Utilities::pack(
  std::vector<T>(buffer_2.begin() + send_ptrs[i],
                 buffer_2.begin() + send_ptrs[i + 1]));

temp_map[send_ranks[i]] = Utilities::pack(
  std::vector<T>(buffer_2.begin() + send_ptrs[i],
                 buffer_2.begin() + send_ptrs[i + 1]),
  false);


auto &buffer = temp_map[send_ranks[i]];

requests.resize(requests.size() + 1);
Member

In my opinion,

Suggested change
requests.resize(requests.size() + 1);
requests.push_back(MPI_Request());

would be simpler to read, but I do not feel strongly here.

MPI_STATUS_IGNORE);

temp_recv_map[status.MPI_SOURCE] =
Utilities::unpack<std::vector<T>>(buffer);
Member

If we disable compression above, we of course also need to do that here.

(void)buffer;
(void)evaluation_function;
#else
// expand
Member

Can you slightly expand the comment here?

if (recv_rank == my_rank)
continue;

temp_map[recv_rank] = Utilities::pack(temp_recv_map[recv_rank]);
Member

same here

evaluate_at_points(
const DoFHandler<dim, spacedim> & dof_handler,
const VectorType & vector,
const Utilities::MPI::RemotePointEvaluation<dim, spacedim> &cache,
Member

I have been thinking about this function again and I think we should have the cache variable as the first argument, to avoid confusion with the previous function and the implicit mapping argument other functions use in deal.II.

: tolerance(tolerance)
, ready_flag(false)
{}
template <int dim, int spacedim>
Member

Can you add some empty lines here and below, as usual in deal.II?

@peterrum (Member Author)

@kronbichler I have incorporated your suggestions!

Comment on lines 75 to 92
if (n_refinements >= 2)
  {
    for (auto cell : tria.active_cell_iterators())
      if (cell->is_locally_owned())
        if (0.3 <= cell->center().norm() && cell->center().norm() <= 0.43)
          cell->set_refine_flag();
    tria.execute_coarsening_and_refinement();
  }

if (n_refinements >= 3)
  {
    for (auto cell : tria.active_cell_iterators())
      if (cell->is_locally_owned())
        if (0.335 <= cell->center().norm() &&
            cell->center().norm() <= 0.39)
          cell->set_refine_flag();
    tria.execute_coarsening_and_refinement();
  }
Member

I suggest removing these two additional refinements. I think it's enough if you simply add some refinement step that creates sufficiently different parallel partitions in terms of the owned cells not overlapping. This keeps the test shorter (and compactness is really the key of this class).

@kronbichler (Member) left a comment

I think this has reached a good state. Again, I have been involved here, so we need someone else looking into it as well.

include/deal.II/base/mpi_remote_point_evaluation.h (outdated review comments, resolved)
source/base/mpi_remote_point_evaluation.cc (outdated review comments, resolved)
@@ -0,0 +1,4 @@
CMAKE_MINIMUM_REQUIRED(VERSION 3.1.0)
Member

Why do you want another directory for the tests here that doesn't match the location of the corresponding source files?

Member Author

I would like to keep it this way to be able to collect the tests that use RemotePointEvaluation in one place.

Comment on lines 16 to 17
// Like remote_point_evaluation_01.cc but normal and curvature vector living on
// background mesh.
Member

How is this different from tests/remote_point_evaluation/remote_point_evaluation_02.cc?

Member Author

I have updated the comment. This was just a copy&paste error.

}

vector_1.print(deallog.get_file_stream());
vector_2.print(deallog.get_file_stream());
Member

How do you check for correctness here?

Member Author

I took a look at the result in Paraview. If you want, I can replace the printing of the vectors by computing the global error for the following tests: vector_tools_evaluate_at_points_01, vector_tools_evaluate_at_points_03, vector_tools_evaluate_at_points_04, vector_tools_evaluate_at_points_05. For the other tests, it is, as far as I know, not possible to come up with derived quantities.

Member

Yes, I would prefer to check for the global error in the tests you mentioned.

Member

After that, it's good from my side.

Member Author

OK. Done! Thanks again for the review!

MPI_COMM_WORLD);
}

force_vector_sharp_interface.print(deallog.get_file_stream());
Member

How do you check for correctness?

Member Author

I took a look at the result in Paraview. Furthermore, I use the vector as the right-hand-side vector in a Navier-Stokes solver, and the resulting pressure jump (the derived quantity) is correct there.

@peterrum (Member Author)

@masterleinad Thanks for the review. I have addressed most of your comments. I have one question in #11804 (comment).
