three-way comparison support #143

Closed
timblechmann opened this issue Aug 2, 2023 · 4 comments

Comments

@timblechmann

It would be great if operator<=> were supported.

@pdimov
Member

pdimov commented Apr 23, 2024

@Lastique As far as I can see, the implementation of operator<=> for SSE2 would be line by line identical to operator<, with the final line replaced with

    return cmp <=> rcmp;

but I'm leaving the actual change to you. (I suppose you'd want to refactor the common part out.)
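
For illustration, the refactoring hinted at above might look roughly like the following sketch. It is not the library's actual code: `uuid_like`, `cmp_masks`, `make_cmp_masks`, `uuid_less` and `uuid_three_way` are hypothetical names, the target is assumed to be x86 or x86-64 with SSE2, and the unsigned byte comparison is done by flipping the top bit of every byte, which may differ from what the real implementation does. The `<=>` overload is only compiled when the compiler reports C++20 three-way comparison support.

```cpp
#include <cstdint>
#include <emmintrin.h> // SSE2 intrinsics (x86 / x86-64 only)

#if defined(__cpp_impl_three_way_comparison)
#include <compare>
#endif

// Hypothetical stand-in for boost::uuids::uuid: 16 bytes, ordered lexicographically.
struct uuid_like
{
    std::uint8_t data[16];
};

// Hypothetical helper type holding the two masks called cmp and rcmp in the thread.
struct cmp_masks
{
    std::uint32_t cmp;
    std::uint32_t rcmp;
};

// The part shared by operator< and operator<=>.
inline cmp_masks make_cmp_masks(uuid_like const& lhs, uuid_like const& rhs) noexcept
{
    __m128i l = _mm_loadu_si128(reinterpret_cast<__m128i const*>(lhs.data));
    __m128i r = _mm_loadu_si128(reinterpret_cast<__m128i const*>(rhs.data));

    // SSE2 only compares signed bytes; flipping the top bit of every byte
    // makes the signed comparison order the bytes as unsigned values.
    __m128i const bias = _mm_set1_epi8(static_cast<char>(-128));
    l = _mm_xor_si128(l, bias);
    r = _mm_xor_si128(r, bias);

    // Bit i of cmp is set where lhs.data[i] < rhs.data[i]; rcmp is the reverse.
    std::uint32_t cmp = static_cast<std::uint32_t>(_mm_movemask_epi8(_mm_cmpgt_epi8(r, l)));
    std::uint32_t rcmp = static_cast<std::uint32_t>(_mm_movemask_epi8(_mm_cmpgt_epi8(l, r)));

    // Keep the bits up to and including the lowest set bit (all ones if none is set),
    // so the mask whose first difference occurs earlier is numerically smaller.
    cmp = (cmp - 1u) ^ cmp;
    rcmp = (rcmp - 1u) ^ rcmp;

    return { cmp, rcmp };
}

inline bool uuid_less(uuid_like const& lhs, uuid_like const& rhs) noexcept
{
    cmp_masks const m = make_cmp_masks(lhs, rhs);
    return m.cmp < m.rcmp;
}

#if defined(__cpp_impl_three_way_comparison)
inline std::strong_ordering uuid_three_way(uuid_like const& lhs, uuid_like const& rhs) noexcept
{
    cmp_masks const m = make_cmp_masks(lhs, rhs);
    return m.cmp <=> m.rcmp; // the only line that differs from uuid_less
}
#endif
```

When the two values are equal, both masks end up as all ones, so `uuid_less` returns false and the three-way version yields `std::strong_ordering::equal`.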

I haven't yet decided to definitively drop msvc-12.0 so let's not remove the unaligned load workarounds yet (although the < 1600 part can certainly be dropped.)

@Lastique
Member

@pdimov Did you intentionally disable the SSE2 implementation here when uint128_t is available? The change was made in b3e3a59.

I would prefer SSE2 to be used over uint128_t, unless the latter is actually faster.

@pdimov
Member

pdimov commented Apr 23, 2024

It was intentional, yes. I think we should err on the side of trusting the compilers to produce whatever codegen they believe is optimal for __uint128_t, as this would automatically take advantage of improvements they implement, and better reflect the target instruction set.

I've looked at the generated code, and everything seems OK when using __uint128_t; even operator< looks to my naked eye not that bad compared to your SSE2 implementation.

I think that the benefit we gain from just delegating the job of producing optimal codegen to the compilers and not worrying about it anymore justifies leaving small gains on the table (provided they exist at all - I haven't really measured.)

Some CE links:

Clang operator==: https://godbolt.org/z/7cP98x53q (https://godbolt.org/z/dWMe4MWqh with -march=native)
GCC operator==: https://godbolt.org/z/rY5GvhvcK (https://godbolt.org/z/r33PEd5ja with -march=native)

Clang operator<: https://godbolt.org/z/435c6dzvb (https://godbolt.org/z/Pf455dxKq with -march=native)
GCC operator<: https://godbolt.org/z/ncrcbYshT (https://godbolt.org/z/3P1n4hcWr with -march=native)

We still use the hand optimized SSE2 on MSVC and under 32 bit, which is as it should be.
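
As a rough sketch of the `__uint128_t` approach discussed above (not the library's actual code), the comparison can be written as two big-endian 128-bit loads followed by a plain integer comparison, leaving instruction selection to the compiler. `uuid_like`, `load_big_u64`, `load_big_u128`, `uuid_equal` and `uuid_less` are hypothetical names, and the code assumes GCC or Clang on a 64-bit target, where `__uint128_t`, `__builtin_bswap64` and the `__BYTE_ORDER__` macros are available.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for boost::uuids::uuid: 16 bytes, ordered lexicographically.
struct uuid_like
{
    std::uint8_t data[16];
};

// Read 8 bytes as a big-endian integer, so that integer order matches byte order.
inline std::uint64_t load_big_u64(std::uint8_t const* p) noexcept
{
    std::uint64_t v;
    std::memcpy(&v, p, sizeof(v)); // alignment-safe load; compilers turn this into a single mov
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    v = __builtin_bswap64(v);
#endif
    return v;
}

// Combine the two halves into one 128-bit value (GCC/Clang extension type).
inline __uint128_t load_big_u128(std::uint8_t const* p) noexcept
{
    return (static_cast<__uint128_t>(load_big_u64(p)) << 64) | load_big_u64(p + 8);
}

inline bool uuid_equal(uuid_like const& lhs, uuid_like const& rhs) noexcept
{
    return load_big_u128(lhs.data) == load_big_u128(rhs.data);
}

inline bool uuid_less(uuid_like const& lhs, uuid_like const& rhs) noexcept
{
    return load_big_u128(lhs.data) < load_big_u128(rhs.data);
}
```

Equality does not strictly need the byte swap; it is kept here only so that both functions share one load helper.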

@Lastique
Member

Lastique commented Apr 23, 2024

Here are links for gcc 10.2 (the version used in Debian 11, which is one of the target platforms I'm interested in):

operator==: https://godbolt.org/z/bfTvncdo9
operator<: https://godbolt.org/z/ojqGTvx3e

And here is gcc 12.2 (Debian 12, another target system of my interest):

operator==: https://godbolt.org/z/PMzPr14nr
operator<: https://godbolt.org/z/rMK1h57oa

(Spoiler: 10.2 and 12.2 are pretty much the same.)

I think I would still prefer SSE2 in these cases, although I haven't benchmarked it yet. I definitely don't like the doubled number of loads.

@pdimov closed this as completed Apr 24, 2024