Add templated memcpy/memcmp #2655

Merged
merged 12 commits into from
Nov 24, 2021

@lnkuiper
Contributor

lnkuiper commented Nov 23, 2021

This PR adds templated versions of memcpy and memcmp, wrapped in methods called FastMemcpy and FastMemcmp, which select the specific templated version with a large switch statement.

This allows the compiler to specialize these functions for the given number of bytes and emit more optimized machine code. This yields a speedup specifically when these functions are used in a loop with a constant size, e.g.:

const size_t size = 8;
for (idx_t i = 0; i < n; i++) {
    // ...
    FastMemcpy(dest, source, size);
    // or
    FastMemcmp(str1, str2, size);
    // ...
}

The speedup can be up to 5-6x, depending on size.
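
For readers who have not opened the diff, the dispatch pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: only a handful of sizes are listed, and the fallback to a plain `std::memcpy` in the `default` branch is an assumption made to keep the sketch short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Copies exactly SIZE bytes. SIZE is a compile-time constant, so the
// compiler can usually lower this call to a few load/store instructions.
template <std::size_t SIZE>
static inline void MemcpyFixed(void *dest, const void *src) {
    std::memcpy(dest, src, SIZE);
}

// Illustrative dispatcher: maps a runtime size onto a fixed-size instantiation.
// The list of cases here is abbreviated on purpose.
static inline void FastMemcpy(void *dest, const void *src, const std::size_t size) {
    assert(dest != nullptr && src != nullptr);
    switch (size) {
    case 8:
        return MemcpyFixed<8>(dest, src);
    case 16:
        return MemcpyFixed<16>(dest, src);
    case 24:
        return MemcpyFixed<24>(dest, src);
    case 32:
        return MemcpyFixed<32>(dest, src);
    default:
        // Assumption for this sketch: sizes without a case use plain memcpy.
        std::memcpy(dest, src, size);
    }
}
```

When `size` is loop-invariant, the branch taken by the switch is the same on every iteration, so the dispatch overhead tends to be small compared to the gain from the specialized copy.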

Of course, these functions are heavily used in the sort code, and adding these functions yields a significant performance boost.
I sorted 100M random integers:

.timer on
.mode trash
create table test as select cast(random() * 99999999 as int) i from range(100000000);
-- pragma threads=1; OR pragma threads=4;
select * from test order by i;

And here are the results:

| Threads | Old | New |
|---|---|---|
| 1 | 4.285s | 3.966s |
| 4 | 2.968s | 2.277s |

As we can see, the improvement is most noticeable with more threads, because it affects merge sort more than it affects radix sort.

The fun part is that single-threaded performance is now as fast as 4-threaded performance was on this specific benchmark when the sorting blog came out.
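
To make the merge sort point concrete, here is a rough sketch of a merge step over fixed-width binary keys, which is the kind of constant-size loop where a templated comparison helps. The `MergeRuns` helper and the key layout are hypothetical and chosen only for illustration; the `FastMemcmp` shown here is an abbreviated stand-in for the function added in this PR.

```cpp
#include <cstddef>
#include <cstring>

// Compares exactly SIZE bytes; SIZE is known at compile time.
template <std::size_t SIZE>
static inline int MemcmpFixed(const void *str1, const void *str2) {
    return std::memcmp(str1, str2, SIZE);
}

// Abbreviated dispatcher (assumption: unlisted sizes fall back to memcmp).
static inline int FastMemcmp(const void *str1, const void *str2, const std::size_t size) {
    switch (size) {
    case 4:
        return MemcmpFixed<4>(str1, str2);
    case 8:
        return MemcmpFixed<8>(str1, str2);
    case 16:
        return MemcmpFixed<16>(str1, str2);
    default:
        return std::memcmp(str1, str2, size);
    }
}

// Hypothetical helper: merges two sorted runs of fixed-width keys into out.
// Keys are compared bytewise, so they must be encoded so that memcmp order
// matches the desired sort order (e.g. big-endian unsigned integers).
static void MergeRuns(const unsigned char *left, std::size_t left_count,
                      const unsigned char *right, std::size_t right_count,
                      unsigned char *out, const std::size_t key_size) {
    std::size_t l = 0, r = 0, o = 0;
    while (l < left_count && r < right_count) {
        const unsigned char *lp = left + l * key_size;
        const unsigned char *rp = right + r * key_size;
        // key_size never changes inside this loop
        if (FastMemcmp(lp, rp, key_size) <= 0) {
            std::memcpy(out + o * key_size, lp, key_size);
            l++;
        } else {
            std::memcpy(out + o * key_size, rp, key_size);
            r++;
        }
        o++;
    }
    // Copy whichever run still has keys left.
    if (l < left_count) {
        std::memcpy(out + o * key_size, left + l * key_size, (left_count - l) * key_size);
        o += left_count - l;
    }
    if (r < right_count) {
        std::memcpy(out + o * key_size, right + r * key_size, (right_count - r) * key_size);
    }
}
```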

@Alex-Monahan
Contributor

Wow, incredible that you've made that kind of improvement even after the initial algorithm change! Maybe it's worth re-posting the blog with a short update/new benchmark at the top?

Collaborator

@Mytherin Mytherin left a comment

Thanks for the PR! Looks good. Two questions:

//! but only when you are calling memcpy with a const size in a loop.
//! For instance `while (<cond>) { memcpy(<dest>, <src>, const_size); ... }`
static inline void FastMemcpy(void *dest, const void *src, const size_t size) {
D_ASSERT(size % 8 == 0);

Collaborator

If size%8 has to be 0, do we need to add the cases for size=1, size=2, etc?

Contributor Author

The assertion can be removed; I forgot it was there, my bad. We may want to use this code elsewhere, where the assertion does not hold.

case 256:
return MemcpyFixed<256>(dest, src);
default:
MemcpyFixed<256>(dest, src);

Collaborator

Can't we just call memcpy with a variable size at this point?

Contributor Author

I found a slight performance improvement by calling the fixed memcpy like this beyond 256 bytes, that's why I kept it there. It's not much though. Would you prefer I change it?
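
One way to picture the approach being discussed is a loop that copies fixed 256-byte blocks and then finishes the tail. This is only a sketch under assumptions: the `CopyInBlocks` name is hypothetical, and the remainder handling (a variable-size `std::memcpy`) is a guess, since the quoted diff does not show what follows the `MemcpyFixed<256>` call in the `default` branch.

```cpp
#include <cstddef>
#include <cstring>

// Copies exactly SIZE bytes, with SIZE known at compile time.
template <std::size_t SIZE>
static inline void MemcpyFixed(void *dest, const void *src) {
    std::memcpy(dest, src, SIZE);
}

// Hypothetical helper: copy as many fixed 256-byte blocks as fit,
// then copy the remaining tail with a variable-size memcpy.
static inline void CopyInBlocks(void *dest, const void *src, std::size_t size) {
    auto *d = static_cast<unsigned char *>(dest);
    const auto *s = static_cast<const unsigned char *>(src);
    while (size >= 256) {
        MemcpyFixed<256>(d, s);
        d += 256;
        s += 256;
        size -= 256;
    }
    if (size > 0) {
        // Assumption: the tail is handled by a regular memcpy in this sketch.
        std::memcpy(d, s, size);
    }
}
```

Whether this beats a single variable-size `memcpy` depends on the compiler and the standard library, which fits the "slight" improvement reported above.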

@lnkuiper
Contributor Author

@Alex-Monahan That is a good idea. Maybe I should hold off on that until @Mytherin finishes the local storage rework that will allow for parallel creation of ordered tables?

These two things together will give a great performance boost, especially for strings.

@Alex-Monahan
Contributor

Even more goodies coming than I knew about!! :-)

@Mytherin Mytherin merged commit a73d33c into duckdb:master Nov 24, 2021
@Mytherin
Collaborator

Thanks!
