Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FEValuesViews::RenumberedView #15819

Merged
merged 1 commit into from Apr 7, 2024
Merged

Conversation

luca-heltai
Copy link
Member

Part of #15773.

@luca-heltai
Copy link
Member Author

/rebuild

@luca-heltai
Copy link
Member Author

This is now ready for real life use. As soon as there is consensus with this implementation, I'll adapt the FECouplingViews to use this reordered view objects. @drwells , @tjhei .

Copy link
Member

@bangerth bangerth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few comments, but I have to go now.

* according to the given renumbering vectors.
*/
template <typename ViewType>
class ReorderedView : public Subscriptor
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of the other views are not derived from Subscriptor, but are very lightweight classes. Any reason to go a different route here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no. Sorry. Just a habit. I'll remove the subscriptor derivation.

include/deal.II/fe/fe_values_coupling.h Outdated Show resolved Hide resolved
* object, where the degrees of freedom and quadrature points are renumbered
* according to the given renumbering vectors.
*/
template <typename ViewType>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is only a single tutorial program (step-81) that ever names these FEValuesViews objects. (And I'm going to write a patch eventually to change that.) Rather, we only ever advertise using the FEValuesExtractors explicitly, and you happen to get a view when you use fe_values[...].

Have you considered whether one could use the extractor type as a template argument instead, to stick with things that typically appear in user programs? You could then value a

  using VewType = decltype(declval<FEValues<...>>()[declval<ExtractorType>()]);

to obtain the corresponding view type.

I would prefer something like this, because the extractor types are so much more widely used. You'd have to make dim,spacedim template arguments as well, though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's exactly the plan. The FEValuesCoupling object will take extractors and return a ReorderedView, pretty much like FEValues does.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. This isn't a class one is supposed to create objects of by hand. Then yes, this makes sense.

include/deal.II/fe/fe_values_coupling.h Outdated Show resolved Hide resolved
include/deal.II/fe/fe_values_coupling.h Outdated Show resolved Hide resolved
include/deal.II/fe/fe_values_coupling.h Outdated Show resolved Hide resolved
Comment on lines 235 to 243
/**
* Helper function that constructs a unique name for a container, based on a
* string prefix, on its size, and on the type stored in the container.
*/
template <typename Number>
std::string
get_unique_container_name(const std::string &prefix,
const unsigned int size,
const Number & exemplar_number) const;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The presence of this function surprises me, reading just through the class declaration (not having gotten to the implementation). Can you add to the documentation what the function is used for?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've actually moved this in the RenumberingData struct, which is now external, and shared by all Views with the same renumbering indices.

/**
* Store a reference to the underlying view.
*/
const ViewType &view;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just make a copy. We don't generally create views as objects with storage, but only as temporaries.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, on second thought, we generally take them from a cache. So perhaps this design is right. Maybe say that FEValues keeps a cache of these objects and so we can safely keep a reference.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

copying them is also forbidden.

include/deal.II/fe/fe_values_coupling.h Outdated Show resolved Hide resolved
include/deal.II/fe/fe_values_coupling.h Outdated Show resolved Hide resolved
@luca-heltai
Copy link
Member Author

luca-heltai commented Aug 6, 2023

But these should definitely not be references.

This class may be used in the following way:

for (const auto &q : fe_coupling.get_quadrature_indices())
for (const auto &i : fe_coupling.get_dof_indices()) {
   viq = fe_coupling[extractor].value(i,q);
}

the call to fe_coupling[extractor] currently generates on the fly one of these objects. I did not want to make two copies of the numbering vectors inside the double loop. If you want I can create I created a RenumberingData class that I pass by reference, and store outside, to make it clear that this class does not do any allocation, and it only handles the renumbering. The RenumberingData is supposed to live longer than this object, and we store a reference to it. Does this make sense to you?

@luca-heltai
Copy link
Member Author

luca-heltai commented Aug 6, 2023

@bangerth I'll update the other PR in a minute, so that you can have an idea of how this is used, but basically this is how it will look like:

FEValues<> fev1(...);
FEValues<> fev2(...);

FECouplingValues<> fecv(fev1, fev2);

auto left = fecv.left(Extractors::Scalar(0));
auto right = fecv.right(Extractors::Scalar(0));

...

fecv[left].value(i,q);
fecv[right].value(i,q);

In the example above, fecv[left] and fecv[right] return objects of type FEValuesViews::ReorderedView.

@luca-heltai
Copy link
Member Author

Any opinions on naming? ReorderedView or RenumberedView?

@luca-heltai luca-heltai force-pushed the reordered-view branch 3 times, most recently from c4104c2 to 7a86faf Compare August 7, 2023 09:22
* the corresponding shape function.
*
* An example of the renumbering ordering is the following. Assume that you
* have an FE_Q(1) finite element spac in dimension two (i.e., four degrees
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...in space dimension two...?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

finite element space in dimension two.

Suggested change
* have an FE_Q(1) finite element spac in dimension two (i.e., four degrees
* have an FE_Q(1) finite element space in dimension two (i.e., four degrees

* object should look like this:
* @code
* ...
* RenumberingData data({{0, 1, numbers::invalid_unsigned_int, 2, 3}});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But could it also be {0,1,2,3}?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. In this case it would simply be the identity renumbering.

@bangerth
Copy link
Member

bangerth commented Aug 7, 2023

RenumberedView sounds better to me than ReorderedView.

More later -- off to my plane now. Then a week of vacation in Greece ;-)

@luca-heltai
Copy link
Member Author

RenumberedView sounds better to me than ReorderedView.

More later -- off to my plane now. Then a week of vacation in Greece ;-)

Enjoy!

@luca-heltai luca-heltai changed the title FEValuesViews::ReorderedView FEValuesViews::RenumberedView Aug 8, 2023
@luca-heltai
Copy link
Member Author

Anybody willing to revive the discussion here?

Copy link
Member

@peterrum peterrum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. I don't understand yet in which context this should be used but I guess I will see that once reviewing #15773.

@peterrum
Copy link
Member

Any other opinions? Let's wait at most another 2 days so that @luca-heltai can continue with the other patches. We can still make modifications on master 👍

@luca-heltai
Copy link
Member Author

The other patches are in the second PR, where I show how to use this for coupled bulk surface problems, BEM, and fractional Laplacian.

@bangerth
Copy link
Member

bangerth commented Apr 2, 2024

Would you mind rebasing this to see whether the tests still all succeed?

Copy link
Member

@bangerth bangerth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked through this one more time and did not find much with the exception that the class is not thread-safe. That's a shame because we like to do assembly in parallel. I don't think it should be too difficult to address.

include/deal.II/fe/fe_coupling_values.h Outdated Show resolved Hide resolved
include/deal.II/fe/fe_coupling_values.h Outdated Show resolved Hide resolved
include/deal.II/fe/fe_coupling_values.h Outdated Show resolved Hide resolved
Comment on lines 87 to 99
RenumberingData(
const unsigned int n_inner_dofs,
const unsigned int n_inner_quadrature_points,
const std::vector<unsigned int> &dof_renumbering = {},
const std::vector<unsigned int> &quadrature_renumbering = {})
: n_inner_dofs(n_inner_dofs)
, n_dofs(dof_renumbering.empty() ? n_inner_dofs : dof_renumbering.size())
, n_inner_quadrature_points(n_inner_quadrature_points)
, n_quadrature_points(quadrature_renumbering.empty() ?
n_inner_quadrature_points :
quadrature_renumbering.size())
, dof_renumbering(dof_renumbering)
, quadrature_renumbering(quadrature_renumbering)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks to me like the usual approach would be to either provide the first two arguments, or to provide the two vectors and make sure that the first two arguments are used consistently. Have you considered splitting this constructor into two constructors instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, these two are somwhat independent: I could have a renumbering vector with smaller size w.r.t. inner dofs, i.e., extract only vertex dofs of a Q2, or I could integrate on a subface using an iterated quadrature as inner quadrature. I have no way to infer the first two arguments from the second two arguments.

Comment on lines 100 to 112
{
// Check that the renumbering vectors are valid.
#ifdef DEBUG
// While for dofs we admit invalid values, this is not the case for
// quadrature points.
for (const auto i : dof_renumbering)
Assert(i < n_inner_dofs || i == numbers::invalid_unsigned_int,
ExcIndexRange(i, 0, n_inner_dofs));

for (const auto q : quadrature_renumbering)
AssertIndexRange(q, n_inner_quadrature_points)
#endif
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're implementing all of the other functions out-of-line. Can you do the same with this function?

Comment on lines 442 to 464
template <typename Number>
inline std::string
RenumberingData::get_unique_container_name(
const std::string &prefix,
const unsigned int size,
const Number &exemplar_number) const
{
return prefix + "_" + Utilities::int_to_string(size) + "_" +
Utilities::type_to_string(exemplar_number);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is part of the public interface of the class. Looking through where it is used, it seems like it's only actually used in RenumberedView. What do you think about making the function private and making RenumberedView a friend? This seems to be an implementation detail that does not need to be part of the public interface of the structure.

Comment on lines +463 to +478
const auto inner_shape_function = data.dof_renumbering.empty() ?
shape_function :
data.dof_renumbering[shape_function];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a shame we're still stuck with C++17. This would be so much nicer to write by storing the vector as a std::optional that either holds the vector or nothing, and then using the monadic operations...

Comment on lines 516 to 537
const auto name = data.get_unique_container_name(
"RenumberedView::outer_to_inner_values",
data.n_inner_quadrature_points,
outer_values[0]);
auto &inner_values =
data.data_storage.get()
.template get_or_add_object_with_name<std::vector<ValueType>>(
name, data.n_inner_quadrature_points);
return inner_values;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pertains to the comment above regarding the mutex: We generally consider views themselves as stateless; the state is in the object being viewed. As a consequence using views is thread-safe is the underlying container is accessed in a thread-safe manner (for example because you're only ever reading).

However, here you have a state that you store in the data object. You need to guard all accesses to the data.data_storage access with a mutex and because data is a reference to another object, that mutex must be part of the data object, not the view object.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Separately, I don't yet understand what you use this function for, but don't you have to make sure that inner_values actually has some content before you return it? What get_or_add_object_with_name() returns here is a vector with data.n_inner_quadrature_points elements, but no content. Is that what you want/need?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The content is updated later. This method is used to get a reference to a vector of values (either the same as the input vector, if they have the same dimension and no renumbering must occur), which is then filled, i.e.,

auto &inner_values = outer_to_inner_values(values);
view.get_function_values(fe_function, inner_values);
inner_to_outer_values(inner_values, values);

inner_values is filled by get_function_values, and then reordered by inner_to_outer_values.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I understood that later: You're just using this as a temp vector, but don't want to allocate/deallocate memory every time, right?

Comment on lines +547 to +572
auto &inner_dofs = data.data_storage.get()
.template get_or_add_object_with_name<VectorType>(
name, data.n_inner_dofs);
for (unsigned int i = 0; i < data.n_dofs; ++i)
{
const auto inner_i = data.dof_renumbering[i];
if (inner_i != numbers::invalid_unsigned_int)
{
AssertIndexRange(inner_i, data.n_inner_dofs);
inner_dofs[inner_i] = outer_dofs[i];
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, if you get to this function a second time, don't you want to skip the initialization of the inner_dofs array?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. Calling this function is done to translate outer degrees of freedom into inner degrees of freedom before calling get_function_values_from_dof_values(..., inner_dofs), so it needs the outer dofs passed as argument, i.e., this how we use this:

    const auto &inner_dof_values = outer_to_inner_dofs(dof_values);
    auto       &inner_gradients  = outer_to_inner_values(gradients);

    view.get_function_gradients_from_local_dof_values(inner_dof_values,
                                                      inner_gradients);
    inner_to_outer_values(inner_gradients, gradients);

it uses the renumbering to provide the get_function_*_from_local_dof_values with sensible input data.

In other words, outer_to_inner_dofs fills the output vector, while outer_to_inner_values only provides a reference to a vector of the right size (which is filled later by get_function_*_from_local_dof_values).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I had missed that you're already doing the copying from outer_dofs here. These will be different every time, so the code block doesn't just initialize the object, but actually do the work on it.

Comment on lines +594 to +610
auto &inner_values = outer_to_inner_values(values);
view.get_function_values(fe_function, inner_values);
inner_to_outer_values(inner_values, values);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This too is not thread-safe. You're using a temporary vector. That vector needs to be guarded (perhaps by a different mutex than the data.data_storage object, or perhaps not, but it needs to be guarded).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue in the functions below.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The whole point of using a Threads::ThreadLocalStorage<GeneralDataStorage> was to make this thread safe. The outer_to_inner_values and inner_to_outer_values are, in fact, both thread safe (each thread only access its own ThreadLocalStorage object). The vector is cached (it is created if it does not exists, and stored in the (local to thread) GeneralDataStorage object for future use). Why should this be not thread safe?

@luca-heltai
Copy link
Member Author

I looked through this one more time and did not find much with the exception that the class is not thread-safe. That's a shame because we like to do assembly in parallel. I don't think it should be too difficult to address.

I'm not sure I understand why this should not be thread safe. The only object that could be non-thread safe is the RenumberingData, which is made of read only vectors, and a ThreadLocal<GeneralDataStorage>. This is the same construction used in ScratchData (https://www.dealii.org/current/doxygen/deal.II/classMeshWorker_1_1ScratchData.html) which, to my knowledge, is thread safe. I have seen no issues so far in using this inside a mesh_loop.

Can you elaborate on why this could be non-thread safe?

@bangerth
Copy link
Member

bangerth commented Apr 3, 2024

On my way to work today I was thinking about this patch again and realized that I had seen the use of ThreadLocalData but just not processed that part at all. I see now what you're doing.

But this doesn't work in general. It assumes that the entire workflow using the thread local object happens on a single thread (clearly the case) but also that it is not interrupted by another task, using the same object, running on the same thread. This is why we have a list of these objects in WorkStream, see Section 5.2 of https://www.math.colostate.edu/~bangerth/publications/2013-pattern.pdf, and in particular the last paragraph of that section. It took @tjhei and me about a week to realize this issue.

I don't know whether that can happen in your case. You need to convince yourself that between getting the object out of the thread-local cache and using it, there can not be a place where you call into the task scheduler. The case that made us find the problem was a call to v=0 (v being a vector), which is executed in parallel if the vector is long enough.

@luca-heltai
Copy link
Member Author

I don't know whether that can happen in your case. You need to convince yourself that between getting the object out of the thread-local cache and using it, there can not be a place where you call into the task scheduler. The case that made us find the problem was a call to v=0 (v being a vector), which is executed in parallel if the vector is long enough.

Well, I don't think it can. We could be on the safe side, by making the struct a class, put the data in the private section, and allow only the FECouplingValues to access it. It will create an std::vector once, never set it to zero explicitly (nobody else except the current thread should ever access it externally).

@bangerth
Copy link
Member

bangerth commented Apr 5, 2024

Let me describe the problem differently: It is not enough to make these objects thread-local. They need to be task-local. That's because you can have multiple tasks running on the same thread at the same time. The typical case looks like this:

  void my_task () {
    auto x = get_object_out_of_pool();
    ... do something with x...
    Threads::Task<> t = new_task ([]() {...});
    t.join();
    ... keep doing something with x...
}

Here, the call to t.join() makes the scheduler wait for the other task, and it may decide to run a different task on the same thread in the meantime. That different task may also get the thread-local object and work on it; the original task will then be surprised to see that from before calling t.join() and after that call, the value of x has changed. Of course,

  • creating and joining the other task may be something that happens in an entirely different function you happen to call in the middle and for which you had no idea that it uses tasks (or that did not use tasks when you wrote the code, but someone later parallelized it)
  • it could also be a parallel-for, or some other way of creating and waiting tasks.

The approach used in WorkStream is that we have a list of thread-local objects, for which we keep track which are used and which are not; a task requesting such an object then takes the first unused one.

@bangerth
Copy link
Member

bangerth commented Apr 6, 2024

Options, in order to get this merged in the not so far future:

  • I think that what you use the class in question for is as a cache for vectors so you don't have to repeatedly allocate/deallocate memory, right? How about you get rid of the cache for now and let each function that currently uses the cache allocate its temporary memory from scratch?
  • We merge as-is with the goal of improving this in the future? I'll say that again, we might run into hard-to-debug bugs this way, but perhaps we can come up with a TaskLocal class that abstracts what we're already doing in WorkStream and that may make it easier to do what you want to do.

@luca-heltai
Copy link
Member Author

luca-heltai commented Apr 7, 2024

The way the class is used now is the following.

Inside each task a new (and independent from other tasks)FECouplingValues is constructed at the beginning of the loop over points and indices, i.e.,:

CouplingFEValues<dim> cfv(fe_values1, fe_values2, quad_coupling_type, dof_coupling_type);

This constructor is the one that builds the reordering vectors, and figures out how to match dofs and q points. This is the place where the cache is created and stored. Unfortunately we don't know here what type of number we'll need in the cache, and I don't really know how to get rid of it, since the temporary vectors are needed inside functions that are templated on the number type.

I think that the major point is that the CouplingFEValues objects are built anew inside each task.

The rationale of the class is that inside each loop, I do not want to initialize new vectors, i.e., once I have the extractors

// Extractor on left cell 
const auto v1 = cfv.left(FEValuesExtractor::Scalar(0)); 
const auto p1 = cfv.left(FEValuesExtractor::Scalar(1));
// Extractor on right cell 
const auto u2 = cfv.right(FEValuesExtractor::Scalar(0)); 
const auto q2 = cfv.right(FEValuesExtractor::Scalar(1));

I want to be able to use the same syntax we are used to, i.e., I want to get a new view that is cheap to construct on the fly. We cannot do what we currently do with FEValuesViews (which are created up front and then the same one is returned whenever you call fe[extractor]).
The renumbered view is built from scratch here (and it is different for each task), and I don't want to do expensive things inside the loops, but upfront right before them, or at least only once, the first time that a temporary vector is required.

When I use the views, I want the temporary vectors to be built only the first time, and reused all other times:

for (const unsigned int q : cfv.quadrature_point_indices()) { 
  const auto &[x_q,y_q] = cfv.quadrature_point(q);
  for (const unsigned int i : cfv.coupling_dof_indices())
    {
      const auto &v_i = cfv[v1].value(i, q); // <- only on the first call this will be initialized. It is local to each task
      const auto &p_i = cfv[p1].value(i, q); // <- and this is a different cache as well
      for (const unsigned int j : cfv.coupling_dof_indices())
        {
          const auto &u_j = cfv[u2].value(j, q);
          const auto &q_j = cfv[q2].value(j, q);
          local_matrix(i, j) += (K1(x_q, y_q) * v_i * u_j +
                                 K2(x_q, y_q) * p_i * q_j) *
                                cfv[0].JxW(q) *
                                cfv[1].JxW(q);
        }
    }
}

If you think it would be safer, I can create a mutex for each vector size and number type that has been created, and make sure access is only made through a mutex. I did not want to do this, since these maybe accessed inside triple loops (as above). If you have suggestions on how to achieve the design above, I'm open to change what is required.

My argument for the thread safety of these classes is that FECouplingValues objects should not be created outside a task (and would need a cache that is task local), but within the task itself.

@bangerth
Copy link
Member

bangerth commented Apr 7, 2024

OK, thanks for taking the time to explain this. Let's merge this -- we can think about this some more later on.

It is true that if you create the CouplingFEValues inside a task, then you need none of all this, and perhaps that's at least for the moment how we're going to use it. (In fact, in that case, you probably wouldn't even need the variable to be ThreadLocal because a task by definition always run from start to finish on the same thread.)

@bangerth bangerth merged commit 3f9f0fa into dealii:master Apr 7, 2024
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants