Simplify the management of .c_str() at the cost of never sharing the …#145
Simplify the management of .c_str() at the cost of never sharing the …#145
Conversation
There was a problem hiding this comment.
Pull request overview
This PR simplifies the c_str() management in basic_sstring by eliminating the sharing of dynamically allocated c_str values across copied/moved/swapped instances. The implementation reduces the size of basic_sstring by replacing the stored string_view with an offset_and_size struct, and changes c_str storage to use one of three states: nullptr, a pointer into the shared basic_const_string buffer, or a std::unique_ptr-managed allocation.
Key Changes:
- Replaced
m_viewmember withoffset_and_sizestruct containing offset and size fields - Removed
m_c_str_vmember (arefc_ptr to const_string for shared c_str) - Changed c_str() allocation strategy to use std::unique_ptr instead of shared arefc_ptr
- Updated copy/move/assignment operations to not share dynamically allocated c_str values
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
d39677f to
7f25ca6
Compare
…value if constructed mid-view
7f25ca6 to
0f5dcfa
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 7 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| auto t = other.m_c_str.exchange(nullptr, std::memory_order_acq_rel); | ||
| auto u = m_c_str.exchange(t, std::memory_order_acq_rel); | ||
| other.m_c_str.exchange(u, std::memory_order_acq_rel); |
There was a problem hiding this comment.
The move constructor and move assignment operator have unnecessarily complex atomic exchange logic. Lines 204-206 perform three exchanges when a simpler approach would suffice. The code exchanges other.m_c_str with nullptr (getting t), exchanges m_c_str with t (getting u), then exchanges other.m_c_str with u again. A clearer and more efficient implementation would be: auto temp = other.m_c_str.exchange(nullptr, std::memory_order_acq_rel); auto old = m_c_str.exchange(temp, std::memory_order_acq_rel); other.m_c_str.store(old, std::memory_order_release);
| auto t = other.m_c_str.exchange(nullptr, std::memory_order_acq_rel); | |
| auto u = m_c_str.exchange(t, std::memory_order_acq_rel); | |
| other.m_c_str.exchange(u, std::memory_order_acq_rel); | |
| auto temp = other.m_c_str.exchange(nullptr, std::memory_order_acq_rel); | |
| auto old = m_c_str.exchange(temp, std::memory_order_acq_rel); | |
| other.m_c_str.store(old, std::memory_order_release); |
| if (m_arefc) | ||
| { | ||
| if (cstr == m_arefc->view().data() + m_offset_and_size.m_offset) | ||
| { | ||
| // | ||
| // This is the case that we *don't* have to do | ||
| // anything, so just exit early. | ||
| // | ||
| return; | ||
| } |
There was a problem hiding this comment.
There is a potential memory leak in unmake_c_str(). If m_arefc is null but m_c_str is not null (which can happen when make_c_str(nullptr, 0) is called), the function returns early at line 671 without deallocating the dynamically allocated c_str. The logic should check if cstr needs deallocation even when m_arefc is null. Since make_c_str(nullptr, 0) can allocate a single-character null-terminated string, this needs to be handled.
| if (m_arefc) | |
| { | |
| if (cstr == m_arefc->view().data() + m_offset_and_size.m_offset) | |
| { | |
| // | |
| // This is the case that we *don't* have to do | |
| // anything, so just exit early. | |
| // | |
| return; | |
| } | |
| // Only skip deallocation if m_arefc is non-null and cstr points to the internal buffer. | |
| if (m_arefc && cstr == m_arefc->view().data() + m_offset_and_size.m_offset) | |
| { | |
| // | |
| // This is the case that we *don't* have to do | |
| // anything, so just exit early. | |
| // | |
| return; |
| // may be performing either of these at a time. | ||
| // | ||
| // The .m_c_str() handles multithreaded construction of its | ||
| // cached value as a "convenience". The overal basic_sstring<> |
There was a problem hiding this comment.
Spelling error: "overal" should be "overall".
| // cached value as a "convenience". The overal basic_sstring<> | |
| // cached value as a "convenience". The overall basic_sstring<> |
| using std::swap; | ||
| swap(m_arefc, other.m_arefc); | ||
| swap(m_view, other.m_view); | ||
| swap(m_c_str_v, other.m_c_str_v); | ||
| m_c_str.store(nullptr, std::memory_order_release); | ||
| swap(m_offset_and_size, other.m_offset_and_size); | ||
|
|
||
| auto t = other.m_c_str.exchange(nullptr, std::memory_order_acq_rel); | ||
| auto u = m_c_str.exchange(t, std::memory_order_acq_rel); | ||
| other.m_c_str.exchange(u, std::memory_order_acq_rel); |
There was a problem hiding this comment.
The move assignment operator swaps m_arefc and m_offset_and_size before handling m_c_str, but doesn't clean up the existing m_c_str values first. This creates a complex scenario where cached c_str pointers that were valid for the old arefc/offset combinations are swapped to contexts where they may not be valid. While the atomic exchange logic attempts to swap the c_str values, it would be clearer and more maintainable to call unmake_c_str() on this object before performing the swaps, similar to how the copy assignment operator should work. This would ensure proper cleanup of any dynamically allocated c_str before taking ownership of other's state.
…value if constructed mid-view
The existing code attempted to allow for sharing of the .c_str() value across copied/moved/swapped basic_sstring<> instances even if the .c_str() had had to be constructed. (For the uninitiated, basic_sstring stores the "basic value" in the "basic_const_string<>" type which is always null terminated. The basic_sstring has an
arefc_ptr<>to the basic_const_string<>, which is essentially a slimmed down shared_ptr<> modeled on Rust'sArc<T>. When a basic_sstring<> is first created, "obviously" the c_str() is the whole thing because basic_const_string<> enables that. If a basic_sstring operation trims a string such that the end still aligns with the end of the basic_const_string, the c_str() is still just a pointer into the initial basic_const_string.But if the basic_sstring<> operation returns say one character in the middle of the string, c_str() needs to allocate a new string.
The old code tried to be clever an made a new arefc<basic_const_string>, so that if the basic_sstring were assigned etc, the c_str() that had been computed could be reused.
This can still be done but what I had didn't work and I wasn't converging on a solution right away so instead this change reduces the size of a basic_sstring which bothered me also and changes the logic so that either the stored c_str is:
so the destructor is slightly more complicated but we'll deal with this mess and perhaps get the desired efficiency some other time. The bottom three bits of the m_c_str member variable are unused and could be used without having to bloat basic_sstring albeit at further cost of complexity.