Deserialization performance #2087
Conversation
libraries/lib-utility/MemoryStream.h
Outdated
/*!
 * @brief A low overhead memory stream with O(1) append, low heap fragmentation and a linear memory view.
 *
 * wxMemoryBuffer always appends 1Kb to the end of the buffer, causing severe performance issues
 * and significant heap fragmentation. There is no possibility to control the increment value.
 *
 * std::vector doubles it's memory size which can be problematic for large projects as well.
its, not it's
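For context, a minimal sketch of the chunked-append idea the comment above describes: bytes are appended into fixed-size chunks kept in a list, so already-written data is never reallocated or moved. The class and member names are illustrative only, not the actual MemoryStream API.

#include <algorithm>
#include <cstddef>
#include <list>
#include <vector>

// Illustrative sketch: O(1) amortized append into fixed-size chunks.
class ChunkedStream
{
public:
   void Append(const void* data, size_t size)
   {
      const char* bytes = static_cast<const char*>(data);

      while (size > 0)
      {
         if (mChunks.empty() || mChunks.back().size() == ChunkSize)
            mChunks.emplace_back().reserve(ChunkSize); // start a new chunk

         auto& chunk = mChunks.back();
         const size_t toCopy = std::min(size, ChunkSize - chunk.size());

         chunk.insert(chunk.end(), bytes, bytes + toCopy);

         bytes += toCopy;
         size -= toCopy;
         mTotalSize += toCopy;
      }
   }

   size_t GetSize() const noexcept { return mTotalSize; }

private:
   static constexpr size_t ChunkSize = 1024 * 1024;

   std::list<std::vector<char>> mChunks;
   size_t mTotalSize {};
};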
libraries/lib-utility/MemoryStream.h
Outdated
using StreamChunk = std::pair<const void*, size_t>;

private:
   static constexpr size_t ChunkSize = 1024 * 1024;
I wonder if there would be less waste of operating system pages if you reduced this constant by 2 * sizeof(void*)? That is, sufficient space for the overhead of a std::list node in the probable implementation. I'm guessing the operating system is instead allocating a few big pages only to use 16 bytes for the end of the chunk.
Yeah, makes total sense, thank you!
Actually it should be more than two pointers, for operator new overhead. Just how much would of course vary with platform and debug configuration.
libraries/lib-xml/XMLWriter.cpp
Outdated
@@ -30,7 +30,10 @@ the general functionality for creating XML in UTF8 encoding. | |||
#include <wx/ffile.h> | |||
#include <wx/intl.h> | |||
|
|||
#include <string.h> | |||
#include <cstring> | |||
#include <charconv> |
As you now know, <charconv> is incomplete for macOS and not even known in the Linux build. Sorry, you must find an alternative to to_chars.
I'm painfully aware of it now. It is really sad that such a simple part of C++17 is still not fully implemented in 2021.
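For reference, this is roughly what the abandoned <charconv> path looks like. The integer overloads of std::to_chars were generally usable; it is the floating-point overloads that were missing on the macOS and Linux toolchains at the time, which is what forced the switch to a bundled Grisu2. Illustrative sketch, not code from this PR:

#include <charconv>
#include <string>

// Integer formatting via std::to_chars; the double overload with the same
// shape is the part that was not yet implemented by libc++/libstdc++.
std::string IntToString(long long value)
{
   char buffer[24]; // enough for any 64-bit signed integer plus sign
   const auto result = std::to_chars(buffer, buffer + sizeof(buffer), value);
   // On success, result.ec == std::errc{} and result.ptr points past the last digit.
   return std::string(buffer, result.ptr);
}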
Force-pushed from 41e23c9 to 9d2a064.
@@ -394,45 +395,45 @@ wxString ProjectSerializer::Decode(const wxMemoryBuffer &buffer)
mIds.clear();

struct Error{}; // exception type for short-range try/catch
auto Lookup = [&mIds]( UShort id ) -> const wxString &
auto Lookup = [&mIds]( UShort id ) -> std::string_view
Look below at the FT_Push and FT_Pop cases. I think there is an opportunity to use move assignment or swaps for a bit more performance, instead of copying mIds.
FT_Push and FT_Pop never happen, in fact!
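(For completeness, a sketch of what the suggested change would look like if those cases did occur: restore the saved id table by move assignment instead of by copy. The names below are hypothetical, not the actual ProjectSerializer members.)

#include <string>
#include <utility>
#include <vector>

// On a pop, the saved table is about to be discarded anyway, so moving it
// back avoids copying every string it holds.
void PopIds(std::vector<std::string>& ids,
            std::vector<std::vector<std::string>>& idStack)
{
   ids = std::move(idStack.back()); // move assignment instead of a deep copy
   idStack.pop_back();
}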
libraries/lib-utility/MemoryStream.h
Outdated
static constexpr size_t ChunkSize =
   1024 * 1024 -       // 1Mb
   2 * sizeof(void*) - // account for the list node pointers
   sizeof(size_t);     // account for the bytes used member
Maybe subtract a little more for operator new overhead. Make a guess what that is.
I saw no measurable difference here, to be honest. If the CRT decides to allocate a page-aligned amount of memory, we will have an overhead of 4K (less than 1% in this case) and will likely be able to reuse it. If the CRT falls back to some mmap kind of API, there is no overhead we can control here. Controlling the overhead from std::allocator is more feasible, but I'm not sure we really need such control. (And the simplest way is to simply use a handwritten allocator.)
Anyway, the current bottleneck is very far away from this place, both for serialization and deserialization. A better fix would be to perform incremental IO directly from the database, but the SQLite3 interface doesn't really have an incremental write API. We can avoid memory "linearization" though, because sqlite3_bind_zeroblob is a very fast API (it does not actually commit any pages, it just sets a proper header).
Shrug. Whatever easy little help we can get. Yes, measure first.
Force-pushed from 9d2a064 to cb670da.
I have reviewed everything but the XMLUtf8BufferWriter class and have no objections to those parts. I assume the ToChars implementation is just a great cut and paste of trusted code. Or are there any differences from your source I should know about?
The only major difference is that I have added buffer size checks to Grisu2, which always blindly assumed that "the buffer is big enough".
You don't yet use sqlite3_bind_zeroblob. Do you mean to? This would allow skipping MemoryStream::GetData(), which is one huge allocation and memory movement. Why is the blob handle interface "not really" the incremental write we need? Are there other opportunities for performance improvement you have seen but not yet implemented in this PR?
libraries/lib-xml/XMLWriter.cpp
Outdated
{
   constexpr size_t bufferSize = std::numeric_limits<float>::max_digits10 +
      5 + // No constexpr log2 yet! example - e-308
      3;  // Dot, sing an 0 separator
Sign
Why is the blob handle interface "not really" the incremental write we need?
SQLite needs to know the blob size in advance, so there is no way to stream data into the blob as you can into a file, for example. However, it is possible to reuse the "iterators" interface and drop the GetData call.
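A rough sketch of that idea, assuming a hypothetical project table with a doc blob column (table and column names invented for illustration): bind a zeroblob of the known total size, then stream each chunk into it with SQLite's incremental blob I/O, so the chunks never have to be linearized into one contiguous buffer.

#include <sqlite3.h>
#include <cstddef>
#include <utility>
#include <vector>

using StreamChunk = std::pair<const void*, size_t>;

bool WriteChunks(sqlite3* db, const std::vector<StreamChunk>& chunks)
{
   size_t totalSize = 0;
   for (const auto& chunk : chunks)
      totalSize += chunk.second;

   sqlite3_stmt* stmt = nullptr;
   if (sqlite3_prepare_v2(db, "INSERT INTO project (doc) VALUES (?1)", -1, &stmt, nullptr) != SQLITE_OK)
      return false;

   // Fast: only the blob header is written, no content pages are committed yet.
   sqlite3_bind_zeroblob(stmt, 1, static_cast<int>(totalSize));
   const bool inserted = sqlite3_step(stmt) == SQLITE_DONE;
   sqlite3_finalize(stmt);

   if (!inserted)
      return false;

   sqlite3_blob* blob = nullptr;
   if (sqlite3_blob_open(db, "main", "project", "doc", sqlite3_last_insert_rowid(db),
                         1 /* read-write */, &blob) != SQLITE_OK)
      return false;

   // Stream every chunk into the blob at its running offset.
   int offset = 0;
   bool ok = true;
   for (const auto& [data, size] : chunks)
   {
      ok = sqlite3_blob_write(blob, data, static_cast<int>(size), offset) == SQLITE_OK;
      if (!ok)
         break;
      offset += static_cast<int>(size);
   }

   sqlite3_blob_close(blob);
   return ok;
}

(Note that sqlite3_bind_zeroblob takes an int, so blobs larger than 2 GB would need sqlite3_bind_zeroblob64 or a different layout.)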
Comment typos
libraries/lib-xml/XMLWriter.cpp
Outdated
{
   constexpr size_t bufferSize = std::numeric_limits<double>::max_digits10 +
      5 + // No constexpr log2 yet!
      3;  // Dot, sing an 0 separator
Sign
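For readers puzzling over the "5 + 3" budget above: the intent is a worst-case character count for a value printed in scientific notation. A back-of-the-envelope version, illustrative only and not the exact constant used in the patch:

#include <cstddef>
#include <limits>

// Worst case for a double such as "-1.7976931348623157e+308".
constexpr std::size_t doubleBufferSize =
   1 +                                         // leading sign
   std::numeric_limits<double>::max_digits10 + // 17 significand digits
   1 +                                         // decimal dot
   1 +                                         // 'e'
   1 +                                         // exponent sign
   3 +                                         // up to three exponent digits (|exp| <= 308)
   1;                                          // terminating zero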
   WriteEscaped(value);
}

void XMLUtf8BufferWriter::WriteSubTree(const std::string_view& value)
I think this isn’t used
I've just copied the "Writer" interface, but I agree that this can be dropped.
Approved, let's move
Modified Grisu2 is used for floating-point numbers, jeaiii for integers. Grisu2 is simple, widely used for serialization, and suits our needs completely.
Force-pushed from a81eac4 to c710ea0.
But we can know the blob size before we write it? We still capture data in chunks and move it into the blob, but we could eliminate the intermediate move into one big contiguous array. Correct? That would be a win. You were saying we might eliminate more moves if we could resize the blob as we write it? Yeah, no luck there, no additional win.
That is what I plan to do in the next PR.
A first attempt to improve the loading times for large projects. This improves both performance and memory usage by around 2x.
A 46 MB binary produces a 75 MB XML stream, which now requires around 90 MB of RAM and 5 seconds to load. Previously, the RAM overhead was ~380 MB and the loading time was 11 seconds.
The peak memory usage is now 221 MB, down from ~580 MB. Audacity uses 134 MB right after the project is loaded.
XML processing is now responsible for 78% of that time:
Half of the time is spent in "StartElement"
26% of the time is spent on constructing wxString objects.