Drop some unnecessary allocations #230

Merged
Martinsos merged 3 commits into Martinsos:master from bobsayshilol:drop-allocs
May 13, 2025
@bobsayshilol
Contributor

These caught my eye so I removed the obvious ones, and then measured the performance to check that it didn't make it worse. I haven't done a thorough check since the scripts in `test_data` point to files I can't find, so I used the timing tests in `runTests` to check the performance before and after these changes. These results show a decent improvement in some cases.

Tests were done by running `runTests 4000 | grep faster` three times and recording the results. Compilers used were gcc 14.2.1 and clang 19.1.7. I haven't checked with MSVC but I assume performance won't have worsened.

| Compiler/method | Before (run 1) | Before (run 2) | Before (run 3) | After (run 1) | After (run 2) | After (run 3) |
| --- | --- | --- | --- | --- | --- | --- |
| gcc/HWA | 5.51 | 5.51 | 5.53 | 5.91 | 5.89 | 5.90 |
| gcc/HW | 6.91 | 6.91 | 6.92 | 7.47 | 7.46 | 7.46 |
| gcc/NWA | 3.49 | 3.51 | 3.50 | 3.51 | 3.49 | 3.52 |
| gcc/NW | 11.53 | 11.54 | 11.53 | 12.40 | 12.38 | 12.36 |
| gcc/SHWA | 32.32 | 32.31 | 32.29 | 42.44 | 42.48 | 42.36 |
| gcc/SHW | 49.49 | 49.58 | 49.55 | 77.01 | 76.91 | 76.98 |
| clang/HWA | 5.70 | 5.71 | 5.72 | 5.91 | 5.93 | 5.94 |
| clang/HW | 7.15 | 7.16 | 7.15 | 7.52 | 7.48 | 7.47 |
| clang/NWA | 3.77 | 3.77 | 3.75 | 3.76 | 3.77 | 3.72 |
| clang/NW | 11.72 | 11.70 | 11.79 | 12.37 | 12.42 | 12.31 |
| clang/SHWA | 37.55 | 37.47 | 37.67 | 41.80 | 41.47 | 41.25 |
| clang/SHW | 60.33 | 60.45 | 60.57 | 69.76 | 70.14 | 69.68 |

See individual commits for more details.

Owner

@Martinsos Martinsos left a comment

Thanks @bobsayshilol, this is looking quite good!

I have a couple of small questions -> let's take a look at those and then we can merge.

// This query and target are used in all the calculations later.
*queryTransformed = static_cast<unsigned char *>(malloc(sizeof(unsigned char) * queryLength));
*targetTransformed = static_cast<unsigned char *>(malloc(sizeof(unsigned char) * targetLength));
unsigned char *queryTransformed = static_cast<unsigned char *>(malloc(sizeof(unsigned char) * queryLength));
Owner

What is the benefit of introducing these "intermediary" variables, and then setting queryTransformed_ and targetTransformed_ only at the very end?

Contributor Author

As noted in 1e4d6ec, this stops the compiler from assuming that the output pointers might alias with other pointers. In practical terms it's a micro-micro-optimisation, which can be seen in this relatively noisy diff https://godbolt.org/z/YEPE7d9PK where there's one fewer `mov` (i.e. no double lookup) on the lines `queryTransformed[i] = letterIdx[c];` and `targetTransformed[i] = letterIdx[c];` (which would otherwise be `(*queryTransformed_)[i] = letterIdx[c];` and `(*targetTransformed_)[i] = letterIdx[c];` respectively).
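To make the aliasing point concrete, here is a minimal sketch of the two shapes being compared. The function names and signatures are hypothetical and are not taken from the library; only the idea of writing into a local pointer and publishing it at the end comes from the comment above. Because a store through `unsigned char*` is allowed to alias any object, a compiler may have to reload `*out` on every iteration of the first version, whereas the local `buf` in the second version can stay in a register.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical "before" shape: every store goes through the out-parameter.
// The store (*out)[i] = ... could, as far as the compiler knows, modify *out
// itself, so the pointer may be reloaded on each iteration.
void transformViaOutParam(const char* src, std::size_t len,
                          const unsigned char* letterIdx, unsigned char** out) {
    assert(out != nullptr);
    *out = static_cast<unsigned char*>(std::malloc(len));
    if (*out == nullptr) return;
    for (std::size_t i = 0; i < len; i++) {
        (*out)[i] = letterIdx[static_cast<unsigned char>(src[i])];
    }
}

// Hypothetical "after" shape: fill a local pointer whose address is never
// taken, then store it into the out-parameter once at the very end.
void transformViaLocal(const char* src, std::size_t len,
                       const unsigned char* letterIdx, unsigned char** out) {
    assert(out != nullptr);
    unsigned char* buf = static_cast<unsigned char*>(std::malloc(len));
    if (buf != nullptr) {
        for (std::size_t i = 0; i < len; i++) {
            buf[i] = letterIdx[static_cast<unsigned char>(src[i])];
        }
    }
    *out = buf;  // single store to the output pointer
}
```

Both functions produce the same buffer; any difference is only in the generated code, and whether it is measurable depends on the compiler and optimisation level.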

Owner

Got it! Pretty cool. I haven't seen the commit messages, my bad.

const unsigned char idx = static_cast<unsigned char>(alphabetSize++);
letterIdx[c] = idx;
alphabet[idx] = queryOriginal[i];
Owner

I wonder if we could kick out `idx` and just use `alphabetSize` (and then `++` it at the very end).

Maybe if we made it unsigned char?

Contributor Author

If `alphabetSize` were an `unsigned char` then it would overflow to 0 if the alphabet needed MAX_UCHAR elements, in which case the final `std::string` would be passed a length of 0, which wouldn't be correct. Rather than using a specifically sized type I went with an `int` since it's guaranteed to be at least 16 bits and it's C/C++'s natural size for things like type promotion.
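The wrap-around described above can be reproduced with a small sketch. `countDistinct` and `allByteValues` are hypothetical helpers written for illustration, not functions from the library. The counter type is a template parameter so the same loop can be run with `unsigned char` and with `int`.

```cpp
#include <climits>
#include <cstddef>
#include <string>

// Counts the distinct byte values in `text` using a counter of type Counter.
// With Counter = unsigned char the count wraps modulo UCHAR_MAX + 1.
template <typename Counter>
std::size_t countDistinct(const std::string& text) {
    bool seen[UCHAR_MAX + 1] = {false};
    Counter count = 0;
    for (char ch : text) {
        const unsigned char c = static_cast<unsigned char>(ch);
        if (!seen[c]) {
            seen[c] = true;
            count++;  // promoted to int, then converted back to Counter
        }
    }
    return static_cast<std::size_t>(count);
}

// Builds a string that contains every possible byte value exactly once.
std::string allByteValues() {
    std::string s;
    for (int v = 0; v <= UCHAR_MAX; v++) {
        s.push_back(static_cast<char>(v));
    }
    return s;
}
```

On an input that uses every byte value, `countDistinct<unsigned char>` reports 0 while `countDistinct<int>` reports `UCHAR_MAX + 1`, which is why a length derived from the narrower counter would produce an empty string.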

Owner

Makes sense, thanks for explaining!

Owner

@Martinsos Martinsos left a comment

All good @bobsayshilol, LGTM!

Could you please rebase this on the latest master, and then I will merge? Thanks!

Putting containers on the heap doesn't do anything since they allocate
internally and don't grow on the stack, so it's just additional work
having to call into the allocator.
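
As a rough illustration of the point about heap-allocated containers, the sketch below contrasts the two shapes. The function names are hypothetical and the code is not taken from the library. A `std::vector` already stores its elements in memory it allocates itself, so creating the vector object with `new` adds a second allocation without changing where the elements live.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Container object itself on the heap: one allocation for the vector object
// plus one for its elements, and a manual delete.
long long sumWithHeapContainer(std::size_t n) {
    std::vector<int>* values = new std::vector<int>(n, 1);
    const long long sum = std::accumulate(values->begin(), values->end(), 0LL);
    delete values;
    return sum;
}

// Container object as a local variable: only the element storage is
// allocated, and it is released automatically at scope exit.
long long sumWithLocalContainer(std::size_t n) {
    std::vector<int> values(n, 1);
    return std::accumulate(values.begin(), values.end(), 0LL);
}
```

Both functions return the same result; the local version simply makes one fewer call into the allocator.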

There are only 64 ints, which is 256 bytes on most platforms.

This improves performance by a small but measurable amount in the
tests.

`alphabet += ...` isn't a trivial operation since it has to check if it
needs to allocate more memory each time it's called. Since the alphabet
can't include any duplicates we can instead create a fixed size buffer
and build it on the stack, then perform a single allocation at the end.

Also move the storing of the transformed pointers to the end so that
the compiler can infer that they don't alias with anything.

This gives another small boost in performance.
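
The alphabet change described in this commit message can be sketched as follows. Both functions are hypothetical illustrations rather than the library's code: the first appends to a `std::string` as letters are discovered, the second collects them in a fixed-size stack buffer and constructs the string once at the end. The buffer size `UCHAR_MAX + 1` relies on the observation above that the alphabet cannot contain duplicates.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <string>

// Baseline: every append has to check capacity and may reallocate.
std::string alphabetByAppending(const std::string& text) {
    bool seen[UCHAR_MAX + 1] = {false};
    std::string alphabet;
    for (char ch : text) {
        const unsigned char c = static_cast<unsigned char>(ch);
        if (!seen[c]) {
            seen[c] = true;
            alphabet += ch;
        }
    }
    return alphabet;
}

// Variant: write into a stack buffer that is large enough for every possible
// byte value, then build the std::string in a single step.
std::string alphabetViaStackBuffer(const std::string& text) {
    bool seen[UCHAR_MAX + 1] = {false};
    char buffer[UCHAR_MAX + 1];
    int alphabetSize = 0;  // int, so it cannot wrap at UCHAR_MAX + 1
    for (char ch : text) {
        const unsigned char c = static_cast<unsigned char>(ch);
        if (!seen[c]) {
            seen[c] = true;
            buffer[alphabetSize++] = ch;
        }
    }
    assert(alphabetSize <= UCHAR_MAX + 1);
    return std::string(buffer, static_cast<std::size_t>(alphabetSize));
}
```

The two functions return identical strings. For very short alphabets a standard library with a small-string optimisation may not allocate in either version, so any gain depends on the input and the implementation.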
@Martinsos Martinsos merged commit 0ddc23e into Martinsos:master May 13, 2025
10 checks passed
@Martinsos
Owner

Awesome, thanks a lot @bobsayshilol!
