Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Issue 15561 #3927

Merged
merged 1 commit into from Jan 14, 2016

Conversation

Projects
None yet
3 participants
@John-Colvin
Copy link
Contributor

commented Jan 13, 2016

Is it a bit of a hack? Yes, but it works without language changes (see http://forum.dlang.org/post/vfgawgvzzfgjhshavmlq@forum.dlang.org).

Thanks to @tgehr (Timon Gehr) for the idea, although it seems others have also been using the same technique elsewhere.

@dlang-bot

This comment has been minimized.

Copy link
Contributor

commented Jan 13, 2016

Fix Bugzilla Description
15561 std.typecons.Proxy gets NaN comparisons wrong
@@ -5189,6 +5189,8 @@ mixin template Proxy(alias a)
return a.opCmp(b);
else static if (is(typeof(b.opCmp(a))))
return -b.opCmp(a);
else static if (isFloatingPoint!ValueType || isFloatingPoint!B)
return a < b ? -1 : a > b ? +1 : a == b ? 0 : float.nan;
else
return a < b ? -1 : a > b ? +1 : 0;

This comment has been minimized.

Copy link
@andralex

andralex Jan 13, 2016

Member

This may seem minor, but while at it could you please rewrite this as return a < b ? -1 : (a > b). The fact that the second branch computes a Boolean without any additional processing is key to generating good code for this. Consider http://goo.gl/VY2XLo, which show that the seemingly equivalent expressions return a > b ? 1 : a < b ? -1 : 0; and return a == b ? 0 : a > b ? +1 : -1; both generate branches.

If you have the time, I also suggest you benchmark return (a > b) - (a < b);, which generates one extra instruction but seemingly smaller code.

This comment has been minimized.

Copy link
@John-Colvin

John-Colvin Jan 13, 2016

Author Contributor

Done.

I'm gonna try and have some discussions with compiler writers about opCmp and perhaps do some more optimisation work once I've got a clearer picture of what is/isn't working and why.

Typedef ought to be as near as 100% of the speed of the original as possible, otherwise it's a pretty awful substitute for a builtin feature.

@@ -5189,6 +5189,8 @@ mixin template Proxy(alias a)
return a.opCmp(b);
else static if (is(typeof(b.opCmp(a))))
return -b.opCmp(a);
else static if (isFloatingPoint!ValueType || isFloatingPoint!B)
return a < b ? -1 : a > b ? +1 : a == b ? 0 : float.nan;

This comment has been minimized.

Copy link
@andralex

This comment has been minimized.

Copy link
@andralex

andralex Jan 13, 2016

Member

If you rewrite that as in http://goo.gl/LLO2k1, you get 13 instructions instead of 14 and one branch instead of four.

This comment has been minimized.

Copy link
@andralex

andralex Jan 13, 2016

Member

The third version at http://goo.gl/0TZdS2 has only 12 instructions including one branch.

This comment has been minimized.

Copy link
@John-Colvin

John-Colvin Jan 13, 2016

Author Contributor

What matters is what happens when it's inlined in an actual comparison, because if this opCmp isn't being inlined then a few extra instructions will be the least of your worries.

This comment has been minimized.

Copy link
@andralex

andralex Jan 13, 2016

Member

The point is smaller code and fewer branches are good proxies for better performance.

This comment has been minimized.

Copy link
@andralex

andralex Jan 13, 2016

Member

I'm not sure I understand where you're getting at.

This comment has been minimized.

Copy link
@John-Colvin

John-Colvin Jan 13, 2016

Author Contributor

Given Typedef!float x,y;, it's the code that's generated when you do this:
bool a = x [<>]=? y;
that matters, not the code that is generated when you do this:
int b = x.opCmp(y);

That is what I am testing in the link I gave.

This comment has been minimized.

Copy link
@John-Colvin

John-Colvin Jan 13, 2016

Author Contributor

See http://goo.gl/fC79Cl and you can compare the generated code between the opCmp design and doing the comparison directly.

This comment has been minimized.

Copy link
@andralex

andralex Jan 13, 2016

Member

Oh, I see. Post inlining, some of the work done by the simpler implementation gets eliminated. I guess we're in good shape. Could you please operate the stylistic change for the integral comparison, and let's finish. Thanks!

This comment has been minimized.

Copy link
@John-Colvin

John-Colvin Jan 13, 2016

Author Contributor

To be honest, they all generate disappointing code, it's a "best of the worst" situation. opCmp doesn't optimise as well as one would expect.

@andralex andralex added the needs work label Jan 13, 2016

@John-Colvin John-Colvin force-pushed the John-Colvin:patch-12 branch from ad5a9c4 to 563879a Jan 13, 2016

@John-Colvin

This comment has been minimized.

Copy link
Contributor Author

commented Jan 13, 2016

@andralex ok to remove the "needs work" label now?

@andralex

This comment has been minimized.

Copy link
Member

commented Jan 13, 2016

yah, we're done here, I'll take care of the rest; thx!

@andralex

This comment has been minimized.

Copy link
Member

commented Jan 13, 2016

Auto-merge toggled on

andralex added a commit that referenced this pull request Jan 14, 2016

@andralex andralex merged commit 22615a8 into dlang:master Jan 14, 2016

2 checks passed

CyberShadow/DAutoTest Documentation OK (no changes)
Details
auto-tester Pass: 10
Details

@John-Colvin John-Colvin deleted the John-Colvin:patch-12 branch Jan 14, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.