[StdLib] Strictify 'fix' #5939
Conversation
Force-pushed from 9edb594 to 494649b
Looks good to me. It's oddly specific, but it's for testing a specific pass anyway, so it's perfectly fine. It would be great to try to abstract some of that stuff out, so that we can reuse the code in other tests, but I don't really have any suggestions on that.
30 lines of imports is ridiculous, we should clean our APIs up.
plutus-core/plutus-ir/test/PlutusIR/Transform/StrictLetRec/Tests/Lib.hs — six outdated review threads, all resolved.
@@ -0,0 +1,329 @@
(let
Can we print out UPLC, instead of TPLC? The TPLC is 100x harder to read, and for no benefit.
It's not just harder to read, it's also noisy: it has this huge `fixBy` definition that is generated "just in case" and is normally removed by dead-code elimination, but not here, because we turn optimizations off.
(Note that some work in this PR is done by me and some by Yura, so it's not immediately obvious who reviewers are talking to)
> Can we print out UPLC, instead of TPLC? The TPLC is 100x harder to read, and for no benefit.

I wrote PIR in this case because I was following this advice from @michaelpj:

> This test isolates just one pass, PIR -> PIR, so that we can reason about how PIR compilation treats `Let(rec)`s without bringing into the equation possible effects added by compilation to UPLC.
Right, this is PIR, not TPLC. But still, this is too much to look at. At the very least you should print readable PIR, but it will still be too big. I think the UPLC for this program should be quite small, probably 10 lines or so.
(\s x ->
   (\and ->
      force
        (case
           x
           [ (delay (constr 0 []))
           , (\x xs ->
                delay
                  (force
                     (case
                        x
                        [(delay (and xs)), (delay (constr 1 []))]))) ]))
     (s s))
  (\s ds ->
     force
       (case
          ds
          [ (delay (constr 0 []))
          , (\x xs ->
               delay
                 (force
                    (case x [(delay (s s xs)), (delay (constr 1 []))]))) ]))
Interesting! So this is what gets more efficient: we literally just inline a single application, `and = s s` gets inlined. Why does the optimizer not do that? Perhaps 'cause it's not semantics-preserving: for all the optimizer knows, evaluating `s s` (I just realized I should've named it differently...) may be impure.

I now wonder if we can make it even more efficient by somehow caching `s s` once and for all.
> Why does the optimizer not do that? Perhaps 'cause it's not semantics-preserving: for all the optimizer knows, evaluating `s s` (I just realized I should've named it differently...) may be impure.

That's exactly why.

What I'm wondering is how the cost improvement can be so significant. I'd expect a very tiny improvement, and if `and` is used more than once, I'd expect a regression 😕
> I'd expect a very tiny improvement

It may simply enable us to do more optimizations; I'll take a closer look. But in any case, not pointlessly saving something into a name and then needing to do a lookup certainly should help.

> if `and` is used more than once, I'd expect a regression 😕

No, `and` is computed at every recursive step: the `\s -> s s` thing, i.e. the driver of the recursion, is outside of it.
> It may simply enable us to do more optimizations

But this is the final UPLC, after all optimizations. I struggle to see how the cost difference can be so big; I must be missing something.

> No, `and` is computed at every recursive step: the `\s -> s s` thing, i.e. the driver of the recursion, is outside of it.

What I'm saying is that if `and` occurs twice, then inlining it causes `s s` to be evaluated twice at each recursive step, which should be worse. I'm surprised that there isn't a single test case that became worse.
> I now wonder if we can make it even more efficient by somehow caching `s s` once and for all.

I've looked at it as hard as I could and failed to find any opportunity there; it probably doesn't exist.
/benchmark nofib
Force-pushed from f4e4826 to 0121dfa
I ran some of the benchmarks in this PR. This seems to speed things up a bit, apart from some suspicious results for the list benchmarks. It definitely doesn't make things worse.
-(\f -> (\s -> s s) (\s x -> f (s s) x)))
+(\f -> (\s -> s s) (\s -> f (\x -> s s x))))
For anybody wondering, this is the whole change.
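To see what that one-line change does operationally, here is a hedged Python sketch (Python, like UPLC, is strict; the names `fix_lazy`, `fix_strict`, `fact_body` and the `body_runs` counter are mine, purely for illustration). `fix_lazy` models the old combinator `\f -> (\s -> s s) (\s x -> f (s s) x)`, where `f`'s body is forced only once an argument arrives; `fix_strict` models the new one `\f -> (\s -> s s) (\s -> f (\x -> s s x))`, where `f`'s body is forced as soon as the fixed point is built.

```python
# Python stand-ins for the two UPLC fixed-point combinators.
# Old: \f -> (\s -> s s) (\s x -> f (s s) x)     -- body forced only per argument
# New: \f -> (\s -> s s) (\s -> f (\x -> s s x)) -- body forced when fix is built
fix_lazy   = lambda f: (lambda s: s(s))(lambda s: lambda x: f(s(s))(x))
fix_strict = lambda f: (lambda s: s(s))(lambda s: f(lambda x: s(s)(x)))

body_runs = {"n": 0}

def fact_body(rec):
    body_runs["n"] += 1  # count how often the wrapped body is forced
    return lambda n: 1 if n == 0 else n * rec(n - 1)

body_runs["n"] = 0
fact_old = fix_lazy(fact_body)
runs_before_call_old = body_runs["n"]   # 0: body not forced yet

body_runs["n"] = 0
fact_new = fix_strict(fact_body)
runs_before_call_new = body_runs["n"]   # 1: body already forced once

print(runs_before_call_old, runs_before_call_new)  # 0 1
print(fact_old(5), fact_new(5))                    # 120 120
```

Both variants compute the same results; the difference is only when the wrapped body is first evaluated, which is what "strictify" refers to in the PR title.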
(force
   (force ifThenElse
      (lessThanEqualsInteger 10 x)
      (delay ((\x -> s s x) (addInteger 1 i) xs))
Optimization idea: we should eta-contract applied functions. That would change effect ordering though, I think.
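The effect-ordering caveat can be seen even in Python, whose strict left-to-right evaluation is a rough stand-in for the evaluator here (a sketch; `eff_fun` and `eff_arg` are made-up instrumented computations). Eta-contracting `(\x -> F x) A` to `F A` moves the evaluation of the function expression `F` before that of the argument `A`:

```python
# Effects (appends to `order`) happen in a different order after
# eta-contraction, even though the final result is the same.
order = []

def eff_fun():
    order.append("fun")   # effect of computing the function
    return lambda x: x + 1

def eff_arg():
    order.append("arg")   # effect of computing the argument
    return 41

# Eta-expanded: (\x -> eff_fun() x) (eff_arg())
# The argument runs first; the function expression only inside the body.
r_expanded = (lambda x: eff_fun()(x))(eff_arg())
expanded_order = order[:]

# Eta-contracted: eff_fun() (eff_arg())
# Now the function expression runs before the argument.
order.clear()
r_contracted = eff_fun()(eff_arg())
contracted_order = order[:]

print(r_expanded, r_contracted)           # 42 42
print(expanded_order, contracted_order)   # ['arg', 'fun'] ['fun', 'arg']
```

So the optimization would be result-preserving for pure terms but could reorder visible effects (traces, errors), which is presumably why it would need care.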
({cpu: 13825400 | mem: 38840})
Barely any difference for `findIndex` and a big difference for `elem`:

- ({cpu: 10254990
- | mem: 32120})
+ ({cpu: 9426990
+ | mem: 28520})

and `all`:

- ({cpu: 10219630
- | mem: 32120})
+ ({cpu: 9391630
+ | mem: 28520})

The more I look at it, the more confused I become.

Maybe it's indeed just an extra lambda and an application? We have different extra ones for `findIndex` as well; that could explain all the difference.
({cpu: 9426990 | mem: 28520})
No extra lam-apply => big difference.

@zliu41 I think by far the most likely explanation is simply the removal of a single lam-apply pair, because when we get a different one back, the numbers go roughly to where they were before. The simplest way to investigate it would be to eta-contract
> @zliu41 I think by far the most likely explanation is simply the removal of a single lam-apply pair.

I think you are likely correct. For `any` or `elem`, the recursion is simply called on `xs` (i.e., `go xs`). The UPLC inliner can inline `xs`, yielding `s s xs`, which is efficient.

For `findIndex`, on the other hand, we have `go (Builtins.addInteger i 1) xs`, and the inliner is unable to inline `Builtins.addInteger i 1` (because it's impure), leading to `(\x -> s s x) (addInteger 1 i)`.

There is a big cost difference for `findIndexEmptyList`, and I think that's because `s s` is evaluated once before, but not evaluated at all after. In general, `s s` is evaluated one more time before vs. after.
EvaluationFailure ->
  fail $ "Evaluation failed, available traces: " <> show traces
EvaluationSuccess term -> do
  term @?= constant () (someValue (1 :: Integer))
@Unisay just FYI, you can write `fromValue (1 :: Integer)` instead of `constant () (someValue (1 :: Integer))`. It's a bit confusing, because we have a `HasConstant` class that contains

    -- | Wrap a Haskell value as a @term@.
    fromConstant :: Some (ValueOf (UniOf term)) -> term

which is a lot like `TermLike` containing

    constant :: ann -> Some (ValueOf uni) -> term ann

but doesn't take an annotation. `fromValue` is defined in terms of the former, and we use it during evaluation, unlike anything from `TermLike`. And in general, evaluation stuff is kept separate (because it needs to be ultra-optimized), so it doesn't have great discoverability.
Thanks, will use `fromValue` next time!
@Unisay has addressed the comments, and I've written a Note explaining what we compile (mutually) recursive functions to, so let's merge this bad boi. There's still work to do here; I'll create some issues or something.
I still think
@zliu41 I think nobody is going to read it regardless. Without optimizations we produce this unreadable CPSed code, and it's not much better in UPLC compared to TPLC/PIR; the types even help reading it a bit. But if you have a strong opinion about it, I'll make it UPLC. Note that this PR reduces the size of the file by 2x (old version vs new version).
We did read
This makes `fix` stricter, so that the body of the function that it takes always gets forced, regardless of whether the necessary argument is provided or not.

Resolves #5961.