Drop some unnecessary allocations#230
Conversation
Martinsos
left a comment
There was a problem hiding this comment.
Thanks @bobsayshilol , this is looking quite good!
I have a couple of small questions -> let's take a look at those and then we can merge.
| // This query and target are used in all the calculations later. | ||
| *queryTransformed = static_cast<unsigned char *>(malloc(sizeof(unsigned char) * queryLength)); | ||
| *targetTransformed = static_cast<unsigned char *>(malloc(sizeof(unsigned char) * targetLength)); | ||
| unsigned char *queryTransformed = static_cast<unsigned char *>(malloc(sizeof(unsigned char) * queryLength)); |
There was a problem hiding this comment.
What is the benefit of introducing these "intermediary" variables, and then setting queryTransformed_ and targetTransformed_ only at the very end?
There was a problem hiding this comment.
As noted in 1e4d6ec, this stops the compiler from assuming that the output pointers might alias with other pointers. In practical terms it's a micro-micro-optimisation which can be seen in this relatively noisy diff https://godbolt.org/z/YEPE7d9PK where there's one less mov (ie no double lookup) on the lines queryTransformed[i] = letterIdx[c]; and targetTransformed[i] = letterIdx[c]; (which would be (*queryTransformed_)[i] = letterIdx[c]; and (*targetTransformed_)[i] = letterIdx[c]; respectively).
There was a problem hiding this comment.
Got it! Pretty cool. I haven't seen the commit messages, my bad.
| const unsigned char idx = static_cast<unsigned char>(alphabetSize++); | ||
| letterIdx[c] = idx; | ||
| alphabet[idx] = queryOriginal[i]; |
There was a problem hiding this comment.
I would if we could kick out idx and just use alphabetSize (and then ++ it at the very end).
Maybe if we made it unsigned char?
There was a problem hiding this comment.
If alphabetSize where an unsigned char then it would overflow to 0 if the alphabet needed MAX_UCHAR elements, in which case the final std::string would be passed a length of 0 which wouldn't be correct. Rather than using a specifically sized type I went with an int since it's guaranteed to be at least 16 bits and it's C/C++'s natural size for things like type promotion.
There was a problem hiding this comment.
Makes sense, thanks for explaining!
Martinsos
left a comment
There was a problem hiding this comment.
All good @bobsayshilol , LGTM!
Could you please rebase this on the latest master, and then I will merge? Thanks!
Putting containers on the heap doesn't do anything since they allocate internally and don't grow on the stack, so it's just additional work having to call into the allocator.
There's only 64 int's which is 256 bytes on most platforms. This improves performance by a small but measureable amount in the tests.
`alphabet += ...` isn't a trivial operation since it has to check if it needs to allocate more memory each time it's called. Since the alphabet can't include any duplicates we can instead create a fixed size buffer and build it on the stack, then perform a single allocation at the end. Also move the storing of the transformed pointers to the end so that the compiler can infer that they don't alias with anything. This gives another small boost in performance.
|
Awesome, thanks a lot @bobsayshilol ! |
These caught my eye so I removed the obvious ones, and then measured the performance to check that it didn't make it worse. I haven't done a thorough check since the scripts in
test_datapoint to files I can't find, so I used the timing tests inrunTeststo check the performance before and after these changes. These results show a decent improvement in some cases.Tests were done by running
runTests 4000 | grep faster3 times and recording the results. Compilers used weregcc 14.2.1andclang 19.1.7. I haven't checked with MSVC but I assume performance won't have worsened.See individual commits for more details.