Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PLT-834 Experiment: refactor CEK machine for adding a stepper #4909

Closed
wants to merge 3 commits into from

Conversation

thealmarty
Copy link
Contributor

This branch refactors the CEK machine such that we can add a stepper to it easier. It's for benchmarking purposes. Do not merge.

Pre-submit checklist:

  • Branch
    • Tests are provided (if possible)
    • Commit sequence broadly makes sense
    • Key commits have useful messages
    • Relevant tickets are mentioned in commit messages
    • Formatting, PNG optimization, etc. are updated
  • PR
    • (For external contributions) Corresponding issue exists and is linked in the description
    • Targeting master unless this is a cherry-pick backport
    • Self-reviewed the diff
    • Useful pull request description
    • Reviewer requested

@thealmarty thealmarty added Do not merge EXPERIMENT Experiments that we probably don't want to merge labels Oct 13, 2022
@thealmarty
Copy link
Contributor Author

/benchmark plutus-benchmark:validation

@thealmarty thealmarty force-pushed the thealmarty/plt-834-experiment-refactor-cek branch from c21409b to 619d1ee Compare October 13, 2022 17:36
@thealmarty
Copy link
Contributor Author

/benchmark plutus-benchmark:validation

@kwxm
Copy link
Contributor

kwxm commented Oct 14, 2022

The automatic benchmarking is still broken. I'll run the benchmarks manually on the benchmarking machine and paste the results in here.

@kwxm
Copy link
Contributor

kwxm commented Oct 14, 2022

Oh.

Script                              Old            New         Change
----------------------------------------------------------------------------------
auction_1-1                       182.9 μs	 1.847 ms    +909.8%
auction_1-2                       776.7 μs	 9.276 ms   +1094.3%
auction_1-3                       762.9 μs	 9.215 ms   +1107.9%
auction_1-4                       230.8 μs	 2.487 ms    +977.6%
auction_2-1                       183.4 μs	 1.866 ms    +917.4%
auction_2-2                       774.4 μs	 9.235 ms   +1092.5%
auction_2-3                       984.0 μs	 12.50 ms   +1170.3%
auction_2-4                       761.8 μs	 9.226 ms   +1111.1%
auction_2-5                       231.5 μs	 2.489 ms    +975.2%
crowdfunding-success-1            215.2 μs	 2.251 ms    +946.0%
crowdfunding-success-2            215.2 μs	 2.244 ms    +942.8%
crowdfunding-success-3            214.6 μs	 2.242 ms    +944.7%
currency-1                        279.4 μs	 3.237 ms   +1058.6%
escrow-redeem_1-1                 397.7 μs	 4.445 ms   +1017.7%
escrow-redeem_1-2                 398.8 μs	 4.449 ms   +1015.6%
escrow-redeem_2-1                 469.5 μs	 5.354 ms   +1040.4%
escrow-redeem_2-2                 470.1 μs	 5.361 ms   +1040.4%
escrow-redeem_2-3                 470.2 μs	 5.339 ms   +1035.5%
escrow-refund-1                   162.6 μs	 1.595 ms    +880.9%
future-increase-margin-1          279.9 μs	 3.226 ms   +1052.6%
future-increase-margin-2          640.6 μs	 7.620 ms   +1089.5%
future-increase-margin-3          640.3 μs	 7.583 ms   +1084.3%
future-increase-margin-4          598.4 μs	 6.768 ms   +1031.0%
future-increase-margin-5          966.6 μs	 11.08 ms   +1046.3%
future-pay-out-1                  279.1 μs	 3.225 ms   +1055.5%
future-pay-out-2                  638.9 μs	 7.671 ms   +1100.7%
future-pay-out-3                  638.3 μs	 7.620 ms   +1093.8%
future-pay-out-4                  963.2 μs	 11.05 ms   +1047.2%
future-settle-early-1             279.0 μs	 3.234 ms   +1059.1%
future-settle-early-2             638.3 μs	 7.548 ms   +1082.5%
future-settle-early-3             637.8 μs	 7.546 ms   +1083.1%
future-settle-early-4             737.6 μs	 7.651 ms    +937.3%
game-sm-success_1-1               451.7 μs	 4.874 ms    +979.0%
game-sm-success_1-2               200.8 μs	 2.107 ms    +949.3%
game-sm-success_1-3               763.8 μs	 9.248 ms   +1110.8%
game-sm-success_1-4               231.9 μs	 2.499 ms    +977.6%
game-sm-success_2-1               450.5 μs	 4.903 ms    +988.3%
game-sm-success_2-2               200.2 μs	 2.109 ms    +953.4%
game-sm-success_2-3               766.5 μs	 9.034 ms   +1078.6%
game-sm-success_2-4               232.9 μs	 2.489 ms    +968.7%
game-sm-success_2-5               766.4 μs	 9.092 ms   +1086.3%
game-sm-success_2-6               231.8 μs	 2.487 ms    +972.9%
multisig-sm-1                     470.9 μs	 5.157 ms    +995.1%
multisig-sm-2                     463.4 μs	 4.920 ms    +961.7%
multisig-sm-3                     463.0 μs	 5.008 ms    +981.6%
multisig-sm-4                     468.7 μs	 5.082 ms    +984.3%
multisig-sm-5                     679.1 μs	 7.714 ms   +1035.9%
multisig-sm-6                     470.4 μs	 5.183 ms   +1001.8%
multisig-sm-7                     457.6 μs	 4.944 ms    +980.4%
multisig-sm-8                     464.0 μs	 5.022 ms    +982.3%
multisig-sm-9                     468.0 μs	 5.100 ms    +989.7%
multisig-sm-10                    679.9 μs	 7.711 ms   +1034.1%
ping-pong-1                       382.0 μs	 4.174 ms    +992.7%
ping-pong-2                       382.9 μs	 4.188 ms    +993.8%
ping-pong_2-1                     221.5 μs	 2.363 ms    +966.8%
prism-1                           165.3 μs	 1.686 ms    +920.0%
prism-2                           487.6 μs	 5.191 ms    +964.6%
prism-3                           409.2 μs	 4.712 ms   +1051.5%
pubkey-1                          141.7 μs	 1.366 ms    +864.0%
stablecoin_1-1                    1.057 ms	 10.17 ms    +862.2%
stablecoin_1-2                    194.4 μs	 2.031 ms    +944.8%
stablecoin_1-3                    1.193 ms	 11.92 ms    +899.2%
stablecoin_1-4                    208.3 μs	 2.195 ms    +953.8%
stablecoin_1-5                    1.470 ms	 16.13 ms    +997.3%
stablecoin_1-6                    257.6 μs	 2.777 ms    +978.0%
stablecoin_2-1                    1.055 ms	 10.17 ms    +864.0%
stablecoin_2-2                    194.8 μs	 2.029 ms    +941.6%
stablecoin_2-3                    1.197 ms	 12.05 ms    +906.7%
stablecoin_2-4                    207.7 μs	 2.193 ms    +955.8%
token-account-1                   206.8 μs	 2.261 ms    +993.3%
token-account-2                   371.9 μs	 4.413 ms   +1086.6%
uniswap-1                         479.7 μs	 6.070 ms   +1165.4%
uniswap-2                         241.7 μs	 2.632 ms    +989.0%
uniswap-3                         1.877 ms	 24.12 ms   +1185.0%
uniswap-4                         340.7 μs	 3.746 ms    +999.5%
uniswap-5                         1.288 ms	 15.16 ms   +1077.0%
uniswap-6                         330.0 μs	 3.620 ms    +997.0%
vesting-1                         405.6 μs	 4.555 ms   +1023.0%

@kwxm
Copy link
Contributor

kwxm commented Oct 14, 2022

I tried again and got similar results.

@michaelpj
Copy link
Contributor

That's more or less what I'd expect!

@michaelpj
Copy link
Contributor

Shower thoughts.

We really need to get rid of the creation of the state objects. I don't think there's any way GHC is going to be clever enough to eliminate those, the place where they're destructed is just way too far from where they're constructed. I think that means that instead we probably need to do something with continuations:

{-# INLINE computeStep #-}
computeStep :: (... -> r) -> .... -> r
computeStep k =
  go ...
    -- instead of Computing $ ...
    k ...

So computeStep gets inlined, at which point we hopefully know k, so GHC can optimize it properly. Except it's going to be more complicated than that because:

  1. We need to pass continuations for all the different cases, so three of them.
  2. The continuations are recursive with the main function! I'm not sure how to make this nice, might just have to try it.

@kwxm
Copy link
Contributor

kwxm commented Oct 14, 2022

continuations

Oh no.

@effectfully
Copy link
Contributor

@michaelpj

We really need to get rid of the creation of the state objects.

No, we just need to recurse in the worker of the production machine and not recurse in the worker of the debugging machine. Something along these lines (pseudocode, doesn't type check, omitting all frames etc business for clarity):

enterComputeCek debug term = computeCekStep term where
    computeCek = if debug then computeCekDebug else computeCekStep
    {-# NOINLINE computeCek #-}  -- Making sure the `if` is only evaluated once.

    computeCekDebug term = pure $ Computing term

    computeCekStep (Force _ body) = computeCek body
    <other_clauses>

Here you'll still return a CekState on the production path, but it'll always be Terminating, because it's the good old recursive CEK machine, just wraps the result with Terminating for the purposes of unifying the interface for the two kinds of the machine. Given that the machine is tail-recursive, this additional wrapping will affect performance in a negligible way.

It's probably still going to be a bit slower (not by 10x for sure, though), because this alignment likely breaks compilation to join points, but then you can do the class with a type-level Bool trick and replace the NOINLINE pragma with the INLINE one and let GHC do its magic. You could also mark enterComputeCek as INLINE, apply it to False in a standalone definition and make sure everything inlines properly, but that is much less reliable than matching on a Bool at the type level.

@michaelpj
Copy link
Contributor

Sneaky, yes indeed that would probably work. I was trying to abstract out the recursion, which would probably work worse anyway.

@thealmarty
Copy link
Contributor Author

So I'm not sure we can say that we're sharing the code in this way. We still need to abstract out the "steps" for the debugging machine. It's basically a case match with a way of returning the same type at the end (CekState). I'm not sure what that's buying us?

@effectfully
Copy link
Contributor

effectfully commented Oct 24, 2022

We still need to abstract out the "steps" for the debugging machine.

We still need Computing, Returning and Terminating and whatever logic they require, but we don't need to duplicate computeCek, returnCek, applyEvaluate etc (just like in your current PR, except with better performance).

I'm not sure what that's buying us?

Performance.

@michaelpj
Copy link
Contributor

So I'm not sure we can say that we're sharing the code in this way. We still need to abstract out the "steps" for the debugging machine.

My understanding of what Roman is proposing is that it would not duplicate the logic for the steps. We would be sharing most of the code.

@michaelpj
Copy link
Contributor

I think we went the other way for now, we can resurrect this if we need it.

@michaelpj michaelpj closed this Apr 3, 2023
@kwxm kwxm deleted the thealmarty/plt-834-experiment-refactor-cek branch November 10, 2023 10:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Do not merge EXPERIMENT Experiments that we probably don't want to merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants