Optimize memory management in Trilinos sparsity pattern accessors. #16406

Open · wants to merge 1 commit into base: master
11 changes: 9 additions & 2 deletions include/deal.II/lac/trilinos_sparsity_pattern.h
@@ -146,9 +146,16 @@ namespace TrilinosWrappers
*
* In order to make copying of iterators/accessor of acceptable
* performance, we keep a shared pointer to these entries so that more
* than one accessor can access this data if necessary.
* than one accessor can access this data if necessary. Each accessor
* never modifies the pointed-to vector when it moves on to a different
* row of the matrix, but instead resets the shared pointer to a newly
* allocated vector, thereby ensuring that other accessors that may still
* hold a pointer to the original vector continue to see the same values.
* The exception is that if there is only a single accessor object
* pointing to the array (the common case), then we can modify the
* underlying vector in place.
*/
std::shared_ptr<const std::vector<size_type>> colnum_cache;
std::shared_ptr<std::vector<size_type>> colnum_cache;

/**
* Discard the old row caches (they may still be used by other
17 changes: 14 additions & 3 deletions source/lac/trilinos_sparsity_pattern.cc
@@ -49,8 +49,19 @@ namespace TrilinosWrappers
      if (!sparsity_pattern->is_compressed())
        sparsity_pattern->compress();

      colnum_cache = std::make_shared<std::vector<size_type>>(
        sparsity_pattern->row_length(this->a_row));
      // Then replace the shared pointer by one to a newly created
      // vector (rather than resizing the existing vector and
      // overwriting its content). This ensures that we simply break
      // one link to a shared vector, and other accessors still
      // pointing to this row's cache will continue to see the same
      // values.
      //
      // We can optimize this, however, if we know that there is only a
      // single shared pointer pointing to the array.
      if (colnum_cache.use_count() > 1)
        colnum_cache = std::make_shared<std::vector<size_type>>(
          sparsity_pattern->row_length(this->a_row));
      else
        colnum_cache->resize(sparsity_pattern->row_length(this->a_row));
Comment on lines +60 to +64
Member:
Is this thread-safe? The use count could be increased on a different thread after the check but before resize, right?

Member (Author):
Hm, good question. My primary motivation was that when you do

  SparsityPattern::iterator p = sp.begin();
  ++p;

and if ++p moves you to the next row of the matrix, then the current design deallocates the std::vector only to allocate another one. That's wasteful. Note that here, only one iterator is ever around, so the use count of these shared pointers is always one.

Your question pertains to the situation where another thread makes a copy of an iterator owned by the first thread (bumping the use count to two) at the very inopportune moment that the first thread has just passed the use_count() check. So perhaps something like this:

  SparsityPattern::iterator p = sp.begin();
  auto t = std::thread([&p]() { auto x = p; ++x; });
  ++p;
  t.join();

Note that here, the lambda function must capture p by reference -- if it had captured it by value, the capture would be a second copy, resulting in a use count of two, so the optimization would not have applied.

So I think you're right that the optimization is not thread-safe. Good catch!

The question is what we want to do about the situation. I would really like to optimize the use case here because typically one will only have one iterator object sitting around, no copies being made, and it seems silly to release and re-allocate the memory all the time. I could guard access to that variable with a mutex. What would you suggest?

Member:

> The question is what we want to do about the situation. I would really like to optimize the use case here because typically one will only have one iterator object sitting around, no copies being made, and it seems silly to release and re-allocate the memory all the time. I could guard access to that variable with a mutex. What would you suggest?

Yeah, I would just use a static std::mutex and lock it for any access to colnum_cache.


if (colnum_cache->size() > 0)
{
@@ -61,7 +72,7 @@ namespace TrilinosWrappers
            colnum_cache->size(),
            ncols,
            reinterpret_cast<TrilinosWrappers::types::int_type *>(
              const_cast<size_type *>(colnum_cache->data())));
              colnum_cache->data()));
          AssertThrow(ierr == 0, ExcTrilinosError(ierr));
          AssertThrow(static_cast<std::vector<size_type>::size_type>(ncols) ==
                        colnum_cache->size(),