
Reduce allocations for event dispatch #243

Draft · wants to merge 10 commits into `master`
Conversation

ocharles (Collaborator)

This commit continues my work on low-level optimizations for event dispatch. My plan of attack is to compile all modules with `-ddump-stg`, then methodically work from the outside in, finding all unnecessary let bindings (let bindings that I know we will always force) and trying to eliminate them. The commits here show the reduction in heap allocations for `cabal run benchmarks -- -p Boring --stdev Infinity +RTS -s`.
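As a toy illustration of the kind of rewrite this involves (not code from this PR): a lazy let binding whose result is always forced shows up in the STG output as a thunk allocation, which a bang pattern removes. With `-O`, GHC's strictness analysis often catches these itself; reading the STG is how you find the cases where it doesn't.

```haskell
{-# LANGUAGE BangPatterns #-}

-- Hypothetical example: `total` is unconditionally forced by `print`,
-- so allocating it lazily only builds a thunk that is entered
-- immediately. The bang pattern evaluates it eagerly instead, and no
-- `let`-bound thunk appears at the STG level.
reportSum :: Int -> Int -> IO Int
reportSum x y = do
    let !total = x + y
    print total
    pure total
```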

This is still WIP, but I thought I'd throw this up for now so others can see what I'm doing!

```haskell
{-----------------------------------------------------------------------------
    Type and class instances
------------------------------------------------------------------------------}
newtype ReaderWriterIOT r w m a = ReaderWriterIOT { run :: r -> IORef w -> m a }
```
ocharles (Author)

This actually just incurs more boxing than using a CPS'ed `WriterT`, because whenever we call `run` we need to box up an `IORef`.
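For contrast, a minimal sketch of the CPS'ed shape referred to above (this type is an illustration, not the library's actual definition): the writer output is threaded through a continuation, so running the computation never has to pass a boxed `IORef`.

```haskell
{-# LANGUAGE RankNTypes #-}

-- Illustrative CPS'ed reader/writer: the accumulated `w` flows through
-- the continuation rather than living behind an `IORef` argument.
newtype ReaderWriterCPST r w m a = ReaderWriterCPST
    { runCPS :: forall b. r -> w -> (a -> w -> m b) -> m b }

instance Functor (ReaderWriterCPST r w m) where
    fmap f (ReaderWriterCPST g) =
        ReaderWriterCPST $ \r w k -> g r w (\a w' -> k (f a) w')
```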

```diff
@@ -107,14 +108,14 @@ insertNodes :: RWS.Tuple BuildR (EvalPW, BuildW) Lazy.Vault -> [SomeNode] -> Que
 insertNodes (RWS.Tuple (time,_) _ _) = go
     where
     go :: [SomeNode] -> Queue SomeNode -> IO (Queue SomeNode)
-    go [] q = return q
+    go [] !q = return q
     go (node@(P p):xs) q = do
```
ocharles (Author)

This is a big reduction in allocations, perhaps the biggest in this branch. I think we could replace `insertNodes` with `foldl'` or `foldr`, which might make this a bit more obvious.
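A hedged sketch of that suggestion (the types and the `insertOne` action are stand-ins, not the PR's code): since the loop runs in `IO`, `foldl'` becomes `foldM`, and the bang pattern on the accumulator plays the same role as the `!q` added in the diff above.

```haskell
{-# LANGUAGE BangPatterns #-}

import Control.Monad (foldM)

-- Illustrative rewrite of a strict queue-building loop as a monadic
-- left fold; `insertOne` is a hypothetical per-element insertion.
insertAllVia :: (q -> node -> IO q) -> [node] -> q -> IO q
insertAllVia insertOne nodes q0 = foldM step q0 nodes
  where
    step !q n = insertOne q n
```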

```diff
@@ -39,7 +39,7 @@ newInput = mdo
         }
     -- Also add the alwaysP pulse to the inputs.
     let run :: a -> Step
-        run a = step ([P pulse, P always], Lazy.insert key (Just a) Lazy.empty)
+        run a n = step ([P pulse, P always], Lazy.insert key (Just a) Lazy.empty) n
```
ocharles (Author)

This eta expansion allows `run` to be a single two-parameter closure, rather than two nested closures.
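An illustrative reduction of the same shape (all names here are stand-ins for the real ones): without eta expansion, applying `runNested` to one argument yields the partial application `step inputs`, a freshly allocated closure; the eta-expanded version has arity 2, so GHC calls it with both arguments at once and skips that intermediate allocation.

```haskell
type Step = Int -> IO Int

step :: [String] -> Step
step inputs n = do
    mapM_ putStrLn inputs
    pure (n + 1)

runNested :: String -> Step
runNested a = step [a, "always"]     -- arity 1: applying it allocates a closure

runEta :: String -> Step
runEta a n = step [a, "always"] n    -- arity 2: both arguments consumed directly
```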

Comment on lines +139 to +147
```haskell
-- Recursively execute the buildLater calls.
unfold :: BuildR -> BuildW -> BuildIO a -> IO (a, BuildW)
unfold !i w m = do
    (a, BuildW (w1, w2, w3, later)) <- RW.runReaderWriterIOT m i
    let !w' = w <> BuildW (w1, w2, w3, mempty)
    w'' <- case later of
        Just m  -> snd <$> unfold i w' m
        Nothing -> return w'
    return (a, w'')
```
ocharles (Author)

I don't necessarily need to make `unfold` a top-level binding; the important thing is that it doesn't mention any variables bound by `runBuildIO`. This allows us to inline a bit of `runBuildIO`, and to float the recursion loop out to the top level. Otherwise, each call to `runBuildIO` allocates a new recursion closure.
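A contrived example of the principle (nothing here is from the PR): a local loop that mentions an enclosing variable must be allocated as a fresh closure on every outer call, whereas threading that variable as a parameter lets the loop live at the top level, shared by all callers.

```haskell
-- `goLocal` closes over `k`, so every call of `sumToLocal` allocates a
-- new recursive closure for it.
sumToLocal :: Int -> Int -> Int
sumToLocal k n = goLocal n
  where
    goLocal 0 = 0
    goLocal i = k + goLocal (i - 1)

-- `goTop` takes `k` as an argument instead, so it is one static
-- top-level function with no per-call allocation.
goTop :: Int -> Int -> Int
goTop _ 0 = 0
goTop k i = k + goTop k (i - 1)

sumToTop :: Int -> Int -> Int
sumToTop = goTop
```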

```diff
@@ -23,7 +23,7 @@ import Reactive.Banana.Prim.Util
 -- | A 'Network' represents the state of a pulse/latch network,
 data Network = Network
     { nTime :: !Time -- Current time.
-    , nOutputs :: !(OrderedBag Output) -- Remember outputs to prevent garbage collection.
+    , nOutputs :: {-# unpack #-} !(OrderedBag Output) -- Remember outputs to prevent garbage collection.
```
ocharles (Author)

Avoids actually allocating an `OrderedBag` when creating an updated `Network` after stepping; instead we can just store the underlying pointers that we have access to from some worker/wrapper transformations.
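A sketch of what the pragma buys here, with simplified stand-in types (the real `OrderedBag` representation may differ): `UNPACK` on a strict single-constructor field stores that constructor's fields inline in `Network`, so rebuilding a `Network` after a step copies the underlying pointers rather than allocating a fresh `OrderedBag` box.

```haskell
newtype Time = Time Int

-- Hypothetical representation, for illustration only.
data OrderedBag a = OrderedBag ![a] !Int

data Network = Network
    { nTime    :: !Time
    , nOutputs :: {-# UNPACK #-} !(OrderedBag String)
      -- ^ with -O, the list pointer and the Int are stored directly
      -- in the Network constructor; no separate OrderedBag heap object.
    }
```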

HeinrichApfelmus added a commit that referenced this pull request on Jan 2, 2023:
…c-evaluation

Perform evaluation steps for `Network` using `GraphGC`

### Overview

This pull request completely changes the way that dependencies between `Pulse` are tracked and used in order to perform an `Evaluation.step` for a `Network`.

We use the machinery provided by `GraphGC` to

* track dependencies between `Pulse`, using `insertEdge` and `clearPredecessors`.
* traverse the `Pulse` in dependency order with early exit, using `walkSuccessors_` (a simplified sketch follows this list).
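A generic sketch of the traversal pattern named above, under loud assumptions: the signature is invented for illustration, and where the real `GraphGC` visits vertices in dependency (topological) order, this simplified version walks successors breadth-first, visiting each vertex once; returning `False` from the visit action stops propagation past that vertex (e.g. a `Pulse` that did not fire).

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

walkSuccessorsSketch ::
    Ord v =>
    Map.Map v [v]       -- successor edges
    -> (v -> IO Bool)   -- visit action; False = do not propagate further
    -> [v]              -- roots
    -> IO ()
walkSuccessorsSketch succs visit = go Set.empty
  where
    go _ [] = pure ()
    go seen (v : vs)
        | v `Set.member` seen = go seen vs
        | otherwise = do
            continue <- visit v
            let next = if continue then Map.findWithDefault [] v succs else []
            go (Set.insert v seen) (vs ++ next)
```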

### Comments

* This should fix many remaining issues with garbage collection for `Pulse`, specifically #261 
* I think that in order to fix *all* remaining issues for `Pulse`, we may have to look at garbage collection and `Vault`.
* This pull request doesn't do anything for `Latch`.

Still, 🥳!

### Obsoletes

* #182
* sadly, #243
@mitchellwrosen (Collaborator)

No pressure of course, but it'd be great to get these memory optimizations in somehow :)

As mentioned in #268, this PR was made stale. Maybe smaller chunks would be appropriate? Or, well -- I doubt it -- it was probably just bad timing. Big rewrites of internals don't happen often.

@mitchellwrosen (Collaborator)

Huh, if the benchmarks haven't changed, things seem to have gotten substantially worse. I'm seeing 11,163,946,400 total bytes allocated by `cabal run reactive-banana:benchmark -- -p Boring --stdev Infinity +RTS -s`, compared to where Ollie started at 4,941,300,968. Is that right?

@mitchellwrosen (Collaborator)

I ran a quick `eventlog2html` and I see that the majority of allocations are attributed to STACK (the thread's stack), `[]`, and ARR_WORDS (`ByteArray#` / `MutableByteArray#`).

[Screenshot: eventlog2html heap profile by closure type, taken 2023-02-24]

@HeinrichApfelmus (Owner) commented on Feb 25, 2023

> I'm seeing 11,163,946,400 total bytes allocated by

Fixing the space leaks was worth it, I would say. 😅

Also note that this is "the total number of bytes allocated by the program over the whole run", which is not quite the same as peak memory use ("the maximum space actually used by your program is the 'bytes maximum residency' figure"). The total number of bytes allocated is highly correlated with CPU time, not so much with memory use.

@mitchellwrosen, if you want to play around with such lower-level optimizations, switching `Data.HashMap` to `Data.IntMap` or something like that may be worth investigating. I'd like to keep `Graph` polymorphic in `v`, but replacing the `Hashable v` constraint with

```haskell
class IsInteger v where
    -- | `toInteger x = toInteger y` implies `x = y`.
    toInteger :: v -> Integer
```

or something like that would be fine with me.

Hm, or maybe

```haskell
class HasUnique v where
    toUnique :: v -> Unique
```

@mitchellwrosen (Collaborator)

> @mitchellwrosen, if you want to play around with such lower-level optimizations, switching `Data.HashMap` to `Data.IntMap` or something like that may be worth investigating. I'd like to keep `Graph` polymorphic in `v`, but replacing the `Hashable v` constraint with

I think to get down to `IntMap` performance we'd need a monotonically increasing `Int` supply of unique numbers, not `Integer`. That wouldn't really work on 32-bit systems, but on 64-bit systems I think treating `Int` as effectively infinitely-sized makes good sense.
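A sketch of that idea (all names are hypothetical): a shared counter hands out monotonically increasing `Int` keys, which can then index an `IntMap` of vertices directly instead of going through `Hashable`.

```haskell
import Data.IORef (atomicModifyIORef', newIORef)

-- Create an action that returns a fresh Int on each call. On a 64-bit
-- system the counter will not realistically overflow.
newUniqueSupply :: IO (IO Int)
newUniqueSupply = do
    ref <- newIORef 0
    pure (atomicModifyIORef' ref (\n -> (n + 1, n)))
```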

@dpwiz commented on Nov 23, 2023

Can we put this under CPP or a flag and use `IntMap` on x64?
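A sketch of the CPP dispatch being suggested (the `VertexMap` alias is illustrative): the word size can be tested via `MachDeps.h`. Note the two map APIs differ in their key constraint (`Int` vs `Hashable v`), so in practice the alternatives would sit behind a small wrapper module rather than a bare import swap.

```haskell
{-# LANGUAGE CPP #-}
#include "MachDeps.h"

#if WORD_SIZE_IN_BITS >= 64
import qualified Data.IntMap.Strict as VertexMap   -- Int keys on 64-bit
#else
import qualified Data.HashMap.Strict as VertexMap  -- Hashable keys elsewhere
#endif
```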
