FEValuesViews::RenumberedView #15819

luca-heltai · 2023-08-02T00:25:34Z

Part of #15773.

luca-heltai · 2023-08-02T00:27:36Z

/rebuild

luca-heltai · 2023-08-05T18:03:10Z

This is now ready for real life use. As soon as there is consensus with this implementation, I'll adapt the FECouplingViews to use this reordered view objects. @drwells , @tjhei .

bangerth

Just a few comments, but I have to go now.

bangerth · 2023-08-06T13:59:42Z

include/deal.II/fe/fe_values_coupling.h

+   * according to the given renumbering vectors.
+   */
+  template <typename ViewType>
+  class ReorderedView : public Subscriptor


All of the other views are not derived from Subscriptor, but are very lightweight classes. Any reason to go a different route here?

no. Sorry. Just a habit. I'll remove the subscriptor derivation.

include/deal.II/fe/fe_values_coupling.h

bangerth · 2023-08-06T14:09:56Z

include/deal.II/fe/fe_values_coupling.h

+   * object, where the degrees of freedom and quadrature points are renumbered
+   * according to the given renumbering vectors.
+   */
+  template <typename ViewType>


There is only a single tutorial program (step-81) that ever names these FEValuesViews objects. (And I'm going to write a patch eventually to change that.) Rather, we only ever advertise using the FEValuesExtractors explicitly, and you happen to get a view when you use fe_values[...].

Have you considered whether one could use the extractor type as a template argument instead, to stick with things that typically appear in user programs? You could then value a

using VewType = decltype(declval<FEValues<...>>()[declval<ExtractorType>()]);

to obtain the corresponding view type.

I would prefer something like this, because the extractor types are so much more widely used. You'd have to make dim,spacedim template arguments as well, though.

That's exactly the plan. The FEValuesCoupling object will take extractors and return a ReorderedView, pretty much like FEValues does.

Ah, I see. This isn't a class one is supposed to create objects of by hand. Then yes, this makes sense.

include/deal.II/fe/fe_values_coupling.h

bangerth · 2023-08-06T14:24:01Z

include/deal.II/fe/fe_values_coupling.h

+    /**
+     * Helper function that constructs a unique name for a container, based on a
+     * string prefix, on its size, and on the type stored in the container.
+     */
+    template <typename Number>
+    std::string
+    get_unique_container_name(const std::string &prefix,
+                              const unsigned int size,
+                              const Number &     exemplar_number) const;


The presence of this function surprises me, reading just through the class declaration (not having gotten to the implementation). Can you add to the documentation what the function is used for?

I've actually moved this in the RenumberingData struct, which is now external, and shared by all Views with the same renumbering indices.

bangerth · 2023-08-06T14:26:10Z

include/deal.II/fe/fe_values_coupling.h

+    /**
+     * Store a reference to the underlying view.
+     */
+    const ViewType &view;


Just make a copy. We don't generally create views as objects with storage, but only as temporaries.

Actually, on second thought, we generally take them from a cache. So perhaps this design is right. Maybe say that FEValues keeps a cache of these objects and so we can safely keep a reference.

copying them is also forbidden.

include/deal.II/fe/fe_values_coupling.h

luca-heltai · 2023-08-06T17:46:06Z

But these should definitely not be references.

This class may be used in the following way:

for (const auto &q : fe_coupling.get_quadrature_indices())
for (const auto &i : fe_coupling.get_dof_indices()) {
   viq = fe_coupling[extractor].value(i,q);
}

the call to fe_coupling[extractor] currently generates on the fly one of these objects. I did not want to make two copies of the numbering vectors inside the double loop. ~~If you want I can create~~ I created a RenumberingData class that I pass by reference, and store outside, to make it clear that this class does not do any allocation, and it only handles the renumbering. The RenumberingData is supposed to live longer than this object, and we store a reference to it. Does this make sense to you?

luca-heltai · 2023-08-06T18:49:24Z

@bangerth I'll update the other PR in a minute, so that you can have an idea of how this is used, but basically this is how it will look like:

FEValues<> fev1(...);
FEValues<> fev2(...);

FECouplingValues<> fecv(fev1, fev2);

auto left = fecv.left(Extractors::Scalar(0));
auto right = fecv.right(Extractors::Scalar(0));

...

fecv[left].value(i,q);
fecv[right].value(i,q);

In the example above, fecv[left] and fecv[right] return objects of type FEValuesViews::ReorderedView.

luca-heltai · 2023-08-06T18:50:56Z

Any opinions on naming? ReorderedView or RenumberedView?

bangerth · 2023-08-07T11:07:40Z

include/deal.II/fe/fe_coupling_values.h

+     * the corresponding shape function.
+     *
+     * An example of the renumbering ordering is the following. Assume that you
+     * have an FE_Q(1) finite element spac in dimension two (i.e., four degrees


...in space dimension two...?

finite element space in dimension two.

Suggested change

* have an FE_Q(1) finite element spac in dimension two (i.e., four degrees

* have an FE_Q(1) finite element space in dimension two (i.e., four degrees

bangerth · 2023-08-07T11:08:20Z

include/deal.II/fe/fe_coupling_values.h

+     * object should look like this:
+     * @code
+     * ...
+     * RenumberingData data({{0, 1, numbers::invalid_unsigned_int, 2, 3}});


But could it also be {0,1,2,3}?

Yes. In this case it would simply be the identity renumbering.

bangerth · 2023-08-07T11:09:00Z

RenumberedView sounds better to me than ReorderedView.

More later -- off to my plane now. Then a week of vacation in Greece ;-)

luca-heltai · 2023-08-07T15:23:08Z

RenumberedView sounds better to me than ReorderedView.

More later -- off to my plane now. Then a week of vacation in Greece ;-)

Enjoy!

luca-heltai · 2023-09-06T10:18:45Z

Anybody willing to revive the discussion here?

peterrum

Looks good to me. I don't understand yet in which context this should be used but I guess I will see that once reviewing #15773.

peterrum · 2023-09-24T09:43:15Z

Any other opinions? Let's wait at most another 2 days so that @luca-heltai can continue with the other patches. We can still make modifications on master 👍

luca-heltai · 2023-09-24T10:57:46Z

The other patches are in the second PR, where I show how to use this for coupled bulk surface problems, BEM, and fractional Laplacian.

bangerth · 2024-04-02T23:06:30Z

Would you mind rebasing this to see whether the tests still all succeed?

bangerth

I looked through this one more time and did not find much with the exception that the class is not thread-safe. That's a shame because we like to do assembly in parallel. I don't think it should be too difficult to address.

include/deal.II/fe/fe_coupling_values.h

bangerth · 2024-04-02T23:27:02Z

include/deal.II/fe/fe_coupling_values.h

+    RenumberingData(
+      const unsigned int               n_inner_dofs,
+      const unsigned int               n_inner_quadrature_points,
+      const std::vector<unsigned int> &dof_renumbering        = {},
+      const std::vector<unsigned int> &quadrature_renumbering = {})
+      : n_inner_dofs(n_inner_dofs)
+      , n_dofs(dof_renumbering.empty() ? n_inner_dofs : dof_renumbering.size())
+      , n_inner_quadrature_points(n_inner_quadrature_points)
+      , n_quadrature_points(quadrature_renumbering.empty() ?
+                              n_inner_quadrature_points :
+                              quadrature_renumbering.size())
+      , dof_renumbering(dof_renumbering)
+      , quadrature_renumbering(quadrature_renumbering)


This looks to me like the usual approach would be to either provide the first two arguments, or to provide the two vectors and make sure that the first two arguments are used consistently. Have you considered splitting this constructor into two constructors instead?

Actually, these two are somwhat independent: I could have a renumbering vector with smaller size w.r.t. inner dofs, i.e., extract only vertex dofs of a Q2, or I could integrate on a subface using an iterated quadrature as inner quadrature. I have no way to infer the first two arguments from the second two arguments.

bangerth · 2024-04-02T23:29:19Z

include/deal.II/fe/fe_coupling_values.h

+    {
+// Check that the renumbering vectors are valid.
+#ifdef DEBUG
+      // While for dofs we admit invalid values, this is not the case for
+      // quadrature points.
+      for (const auto i : dof_renumbering)
+        Assert(i < n_inner_dofs || i == numbers::invalid_unsigned_int,
+               ExcIndexRange(i, 0, n_inner_dofs));
+
+      for (const auto q : quadrature_renumbering)
+        AssertIndexRange(q, n_inner_quadrature_points)
+#endif
+    }


You're implementing all of the other functions out-of-line. Can you do the same with this function?

bangerth · 2024-04-03T03:12:11Z

include/deal.II/fe/fe_coupling_values.h

+  template <typename Number>
+  inline std::string
+  RenumberingData::get_unique_container_name(
+    const std::string &prefix,
+    const unsigned int size,
+    const Number      &exemplar_number) const
+  {
+    return prefix + "_" + Utilities::int_to_string(size) + "_" +
+           Utilities::type_to_string(exemplar_number);
+  }


This function is part of the public interface of the class. Looking through where it is used, it seems like it's only actually used in RenumberedView. What do you think about making the function private and making RenumberedView a friend? This seems to be an implementation detail that does not need to be part of the public interface of the structure.

bangerth · 2024-04-03T03:17:45Z

include/deal.II/fe/fe_coupling_values.h

+    const auto inner_shape_function = data.dof_renumbering.empty() ?
+                                        shape_function :
+                                        data.dof_renumbering[shape_function];


It's a shame we're still stuck with C++17. This would be so much nicer to write by storing the vector as a std::optional that either holds the vector or nothing, and then using the monadic operations...

bangerth · 2024-04-03T03:27:05Z

include/deal.II/fe/fe_coupling_values.h

+        const auto name = data.get_unique_container_name(
+          "RenumberedView::outer_to_inner_values",
+          data.n_inner_quadrature_points,
+          outer_values[0]);
+        auto &inner_values =
+          data.data_storage.get()
+            .template get_or_add_object_with_name<std::vector<ValueType>>(
+              name, data.n_inner_quadrature_points);
+        return inner_values;


This pertains to the comment above regarding the mutex: We generally consider views themselves as stateless; the state is in the object being viewed. As a consequence using views is thread-safe is the underlying container is accessed in a thread-safe manner (for example because you're only ever reading).

However, here you have a state that you store in the data object. You need to guard all accesses to the data.data_storage access with a mutex and because data is a reference to another object, that mutex must be part of the data object, not the view object.

Separately, I don't yet understand what you use this function for, but don't you have to make sure that inner_values actually has some content before you return it? What get_or_add_object_with_name() returns here is a vector with data.n_inner_quadrature_points elements, but no content. Is that what you want/need?

The content is updated later. This method is used to get a reference to a vector of values (either the same as the input vector, if they have the same dimension and no renumbering must occur), which is then filled, i.e.,

auto &inner_values = outer_to_inner_values(values); view.get_function_values(fe_function, inner_values); inner_to_outer_values(inner_values, values);

inner_values is filled by get_function_values, and then reordered by inner_to_outer_values.

I think I understood that later: You're just using this as a temp vector, but don't want to allocate/deallocate memory every time, right?

bangerth · 2024-04-03T03:32:18Z

include/deal.II/fe/fe_coupling_values.h

+        auto &inner_dofs = data.data_storage.get()
+                             .template get_or_add_object_with_name<VectorType>(
+                               name, data.n_inner_dofs);
+        for (unsigned int i = 0; i < data.n_dofs; ++i)
+          {
+            const auto inner_i = data.dof_renumbering[i];
+            if (inner_i != numbers::invalid_unsigned_int)
+              {
+                AssertIndexRange(inner_i, data.n_inner_dofs);
+                inner_dofs[inner_i] = outer_dofs[i];
+              }
+          }


Here, if you get to this function a second time, don't you want to skip the initialization of the inner_dofs array?

No. Calling this function is done to translate outer degrees of freedom into inner degrees of freedom before calling get_function_values_from_dof_values(..., inner_dofs), so it needs the outer dofs passed as argument, i.e., this how we use this:

const auto &inner_dof_values = outer_to_inner_dofs(dof_values); auto &inner_gradients = outer_to_inner_values(gradients); view.get_function_gradients_from_local_dof_values(inner_dof_values, inner_gradients); inner_to_outer_values(inner_gradients, gradients);

it uses the renumbering to provide the get_function_*_from_local_dof_values with sensible input data.

In other words, outer_to_inner_dofs fills the output vector, while outer_to_inner_values only provides a reference to a vector of the right size (which is filled later by get_function_*_from_local_dof_values).

Oh, I had missed that you're already doing the copying from outer_dofs here. These will be different every time, so the code block doesn't just initialize the object, but actually do the work on it.

bangerth · 2024-04-03T03:34:15Z

include/deal.II/fe/fe_coupling_values.h

+    auto &inner_values = outer_to_inner_values(values);
+    view.get_function_values(fe_function, inner_values);
+    inner_to_outer_values(inner_values, values);


This too is not thread-safe. You're using a temporary vector. That vector needs to be guarded (perhaps by a different mutex than the data.data_storage object, or perhaps not, but it needs to be guarded).

Same issue in the functions below.

The whole point of using a Threads::ThreadLocalStorage<GeneralDataStorage> was to make this thread safe. The outer_to_inner_values and inner_to_outer_values are, in fact, both thread safe (each thread only access its own ThreadLocalStorage object). The vector is cached (it is created if it does not exists, and stored in the (local to thread) GeneralDataStorage object for future use). Why should this be not thread safe?

luca-heltai · 2024-04-03T12:11:02Z

I looked through this one more time and did not find much with the exception that the class is not thread-safe. That's a shame because we like to do assembly in parallel. I don't think it should be too difficult to address.

I'm not sure I understand why this should not be thread safe. The only object that could be non-thread safe is the RenumberingData, which is made of read only vectors, and a ThreadLocal<GeneralDataStorage>. This is the same construction used in ScratchData (https://www.dealii.org/current/doxygen/deal.II/classMeshWorker_1_1ScratchData.html) which, to my knowledge, is thread safe. I have seen no issues so far in using this inside a mesh_loop.

Can you elaborate on why this could be non-thread safe?

bangerth · 2024-04-03T17:07:19Z

On my way to work today I was thinking about this patch again and realized that I had seen the use of ThreadLocalData but just not processed that part at all. I see now what you're doing.

But this doesn't work in general. It assumes that the entire workflow using the thread local object happens on a single thread (clearly the case) but also that it is not interrupted by another task, using the same object, running on the same thread. This is why we have a list of these objects in WorkStream, see Section 5.2 of https://www.math.colostate.edu/~bangerth/publications/2013-pattern.pdf, and in particular the last paragraph of that section. It took @tjhei and me about a week to realize this issue.

I don't know whether that can happen in your case. You need to convince yourself that between getting the object out of the thread-local cache and using it, there can not be a place where you call into the task scheduler. The case that made us find the problem was a call to v=0 (v being a vector), which is executed in parallel if the vector is long enough.

luca-heltai · 2024-04-05T16:13:33Z

I don't know whether that can happen in your case. You need to convince yourself that between getting the object out of the thread-local cache and using it, there can not be a place where you call into the task scheduler. The case that made us find the problem was a call to v=0 (v being a vector), which is executed in parallel if the vector is long enough.

Well, I don't think it can. We could be on the safe side, by making the struct a class, put the data in the private section, and allow only the FECouplingValues to access it. It will create an std::vector once, never set it to zero explicitly (nobody else except the current thread should ever access it externally).

bangerth · 2024-04-05T17:36:04Z

Let me describe the problem differently: It is not enough to make these objects thread-local. They need to be task-local. That's because you can have multiple tasks running on the same thread at the same time. The typical case looks like this:

  void my_task () {
    auto x = get_object_out_of_pool();
    ... do something with x...
    Threads::Task<> t = new_task ([]() {...});
    t.join();
    ... keep doing something with x...
}

Here, the call to t.join() makes the scheduler wait for the other task, and it may decide to run a different task on the same thread in the meantime. That different task may also get the thread-local object and work on it; the original task will then be surprised to see that from before calling t.join() and after that call, the value of x has changed. Of course,

creating and joining the other task may be something that happens in an entirely different function you happen to call in the middle and for which you had no idea that it uses tasks (or that did not use tasks when you wrote the code, but someone later parallelized it)
it could also be a parallel-for, or some other way of creating and waiting tasks.

The approach used in WorkStream is that we have a list of thread-local objects, for which we keep track which are used and which are not; a task requesting such an object then takes the first unused one.

bangerth · 2024-04-06T21:49:12Z

Options, in order to get this merged in the not so far future:

I think that what you use the class in question for is as a cache for vectors so you don't have to repeatedly allocate/deallocate memory, right? How about you get rid of the cache for now and let each function that currently uses the cache allocate its temporary memory from scratch?
We merge as-is with the goal of improving this in the future? I'll say that again, we might run into hard-to-debug bugs this way, but perhaps we can come up with a TaskLocal class that abstracts what we're already doing in WorkStream and that may make it easier to do what you want to do.

luca-heltai · 2024-04-07T08:40:21Z

The way the class is used now is the following.

Inside each task a new (and independent from other tasks)FECouplingValues is constructed at the beginning of the loop over points and indices, i.e.,:

CouplingFEValues<dim> cfv(fe_values1, fe_values2, quad_coupling_type, dof_coupling_type);

This constructor is the one that builds the reordering vectors, and figures out how to match dofs and q points. This is the place where the cache is created and stored. Unfortunately we don't know here what type of number we'll need in the cache, and I don't really know how to get rid of it, since the temporary vectors are needed inside functions that are templated on the number type.

I think that the major point is that the CouplingFEValues objects are built anew inside each task.

The rationale of the class is that inside each loop, I do not want to initialize new vectors, i.e., once I have the extractors

// Extractor on left cell 
const auto v1 = cfv.left(FEValuesExtractor::Scalar(0)); 
const auto p1 = cfv.left(FEValuesExtractor::Scalar(1));
// Extractor on right cell 
const auto u2 = cfv.right(FEValuesExtractor::Scalar(0)); 
const auto q2 = cfv.right(FEValuesExtractor::Scalar(1));

I want to be able to use the same syntax we are used to, i.e., I want to get a new view that is cheap to construct on the fly. We cannot do what we currently do with FEValuesViews (which are created up front and then the same one is returned whenever you call fe[extractor]).
The renumbered view is built from scratch here (and it is different for each task), and I don't want to do expensive things inside the loops, but upfront right before them, or at least only once, the first time that a temporary vector is required.

When I use the views, I want the temporary vectors to be built only the first time, and reused all other times:

for (const unsigned int q : cfv.quadrature_point_indices()) { 
  const auto &[x_q,y_q] = cfv.quadrature_point(q);
  for (const unsigned int i : cfv.coupling_dof_indices())
    {
      const auto &v_i = cfv[v1].value(i, q); // <- only on the first call this will be initialized. It is local to each task
      const auto &p_i = cfv[p1].value(i, q); // <- and this is a different cache as well
      for (const unsigned int j : cfv.coupling_dof_indices())
        {
          const auto &u_j = cfv[u2].value(j, q);
          const auto &q_j = cfv[q2].value(j, q);
          local_matrix(i, j) += (K1(x_q, y_q) * v_i * u_j +
                                 K2(x_q, y_q) * p_i * q_j) *
                                cfv[0].JxW(q) *
                                cfv[1].JxW(q);
        }
    }
}

If you think it would be safer, I can create a mutex for each vector size and number type that has been created, and make sure access is only made through a mutex. I did not want to do this, since these maybe accessed inside triple loops (as above). If you have suggestions on how to achieve the design above, I'm open to change what is required.

My argument for the thread safety of these classes is that FECouplingValues objects should not be created outside a task (and would need a cache that is task local), but within the task itself.

bangerth · 2024-04-07T19:04:45Z

OK, thanks for taking the time to explain this. Let's merge this -- we can think about this some more later on.

It is true that if you create the CouplingFEValues inside a task, then you need none of all this, and perhaps that's at least for the moment how we're going to use it. (In fact, in that case, you probably wouldn't even need the variable to be ThreadLocal because a task by definition always run from start to finish on the same thread.)

luca-heltai added ready for review ready to test labels Aug 2, 2023

luca-heltai mentioned this pull request Aug 2, 2023

FECouplingValues. #15773

Merged

drwells mentioned this pull request Aug 2, 2023

Split fe_values.h and fe_values.cc into three files each. #15829

Merged

luca-heltai force-pushed the reordered-view branch from 5364cc6 to 2ce44e5 Compare August 5, 2023 11:01

bangerth reviewed Aug 6, 2023

View reviewed changes

luca-heltai force-pushed the reordered-view branch from 6494df2 to 16d6a4a Compare August 6, 2023 17:54

luca-heltai force-pushed the reordered-view branch 3 times, most recently from c4104c2 to 7a86faf Compare August 7, 2023 09:22

bangerth reviewed Aug 7, 2023

View reviewed changes

luca-heltai changed the title ~~FEValuesViews::ReorderedView~~ FEValuesViews::RenumberedView Aug 8, 2023

luca-heltai force-pushed the reordered-view branch from 1949dce to 70cb5f8 Compare September 6, 2023 12:29

luca-heltai force-pushed the reordered-view branch from 70cb5f8 to eb4bd21 Compare September 20, 2023 13:07

peterrum self-assigned this Sep 23, 2023

peterrum approved these changes Sep 24, 2023

View reviewed changes

bangerth reviewed Apr 3, 2024

View reviewed changes

luca-heltai force-pushed the reordered-view branch from eb4bd21 to dba6907 Compare April 3, 2024 08:53

luca-heltai force-pushed the reordered-view branch from ebaf11e to 8c2a897 Compare April 3, 2024 12:02

FEValuesViews::ReorderedView

d557770

luca-heltai force-pushed the reordered-view branch from 8c2a897 to d557770 Compare April 3, 2024 15:15

bangerth merged commit 3f9f0fa into dealii:master Apr 7, 2024
16 checks passed

	* have an FE_Q(1) finite element spac in dimension two (i.e., four degrees
	* have an FE_Q(1) finite element space in dimension two (i.e., four degrees

FEValuesViews::RenumberedView #15819

FEValuesViews::RenumberedView #15819

Conversation

luca-heltai commented Aug 2, 2023

luca-heltai commented Aug 2, 2023

luca-heltai commented Aug 5, 2023

bangerth left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

luca-heltai commented Aug 6, 2023 • edited

luca-heltai commented Aug 6, 2023 • edited

luca-heltai commented Aug 6, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bangerth commented Aug 7, 2023

luca-heltai commented Aug 7, 2023

luca-heltai commented Sep 6, 2023

peterrum left a comment

Choose a reason for hiding this comment

peterrum commented Sep 24, 2023

luca-heltai commented Sep 24, 2023

bangerth commented Apr 2, 2024

bangerth left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

luca-heltai Apr 3, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

luca-heltai Apr 3, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

luca-heltai commented Apr 3, 2024

bangerth commented Apr 3, 2024

luca-heltai commented Apr 5, 2024

bangerth commented Apr 5, 2024

bangerth commented Apr 6, 2024

luca-heltai commented Apr 7, 2024 • edited

bangerth commented Apr 7, 2024

luca-heltai commented Aug 6, 2023 •

edited

luca-heltai commented Aug 6, 2023 •

edited

luca-heltai Apr 3, 2024 •

edited

luca-heltai Apr 3, 2024 •

edited

luca-heltai commented Apr 7, 2024 •

edited