Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Perform evaluation steps Network using GraphGC #268

Merged
merged 19 commits into from
Jan 2, 2023

Conversation

HeinrichApfelmus
Copy link
Owner

Overview

This pull request completely changes the way that dependencies between Pulse are tracked and used in order to perform an Evaluation.step for a Network.

We use the machinery provided by GraphGC to

  • track dependencies between Pulse, using insertEdge and clearPredecessors.
  • traverse the Pulse in dependency order with early exit, using walkSuccessors_.

Comments

  • This should fix many remaining issues with garbage collection for Pulse, specifically Memory leak with dynamic behavior switching #261
  • I think that in order to fix all remaining issues for Pulse, we may have to look at garbage collection and Vault.
  • This pull request doesn't do anything for Latch.

Still, 🥳!

Obsoletes

@HeinrichApfelmus
Copy link
Owner Author

Alas, it appears that the event log for Ollie's example of #261 (comment) still shows the presence of a space leak. 🤔 Needs more work.

Eventually, the right thing to do is to include a few tests in the test suite that can detect certain space leaks. 🤓 In other words — instead of checking for space leaks manually, we should automate checking for them.

That said, the new tests do show that garbage collection on GraphGC works in principle, so I'm not sure what's going on with Ollie's example. 🤔 Trying to automate the -evenlog GHC flag could be tricky, so I'll probably start with pure measure of the GraphGC size.

@HeinrichApfelmus
Copy link
Owner Author

Alas, it appears that the event log for Ollie's example of #261 (comment) still shows the presence of a space leak.

Actually, setting profiling: True in cabal.project and running

$ cabal run leak -- +RTS -hT -RTS
$ hp2pretty leak.hp

results in the following heap profile:

leak

The program completes very quickly and runs in constant size. 🤔

However! Compiling the program with profiling: False will increase the running time considerably — there seems to be a space leak.

@HeinrichApfelmus
Copy link
Owner Author

Eventually, the right thing to do is to include a few tests in the test suite that can detect certain space leaks. 🤓 In other words — instead of checking for space leaks manually, we should automate checking for them.

I have added a function getSize that measures the size of the event network and used it to turn the offending program #261 into a test case — and voilà, it appears that the leak is caused by more and more pulses being added to the network.

Twist: The test will pass when I set profiling: True — but another test, observeE_stepper will fail! It looks like there is something wrong in the (new) interplay between Pulse and Latch.

@HeinrichApfelmus
Copy link
Owner Author

HeinrichApfelmus commented Dec 31, 2022

Getting closer.

I have added a test case that works on the Mid level as opposed to the High level; the test checks whether the combination of executeP and accumL adds more and more pulses to the graph — and this is currently the case. This is fortunate, as this result rules out observable sharing as root cause for the leak.

Observation: If I remove performSufficientGC from the runNetworkSizes function, then the network will grow faster. This implies that some garbage collection is actually taking place, it's just not fast enough at removing old pulses, or it retains some parts for longer than necessary.

Observation 2: The Mid level test fails regardless of whether profiling: True or profiling: False. Good!

@HeinrichApfelmus
Copy link
Owner Author

I have added a debugging utility printNetwork that prints the pulse graph of the network in graphviz format.

For the failing Mid-level test, the result looks like this: 😳

debug

Apparently, dereferencing the weak pointers will yield Nothing, even though they have not yet been removed from the graph. Weird.

`tryReadTQueue` returns a single element, wrapped in `Maybe`.
Due to the (new) polymorphism of `mapM_`, I did not notice that the result is a single element and not a list.
@HeinrichApfelmus
Copy link
Owner Author

Finally, fixed it! 🥳

Turns out that I was accidentally relying on tryReadTQueue to read all vertices scheduled for deletion from the queue — but that will only read a single item. The function that I actually wanted is flushTQueue.

@HeinrichApfelmus HeinrichApfelmus merged commit 440e9df into master Jan 2, 2023
@HeinrichApfelmus
Copy link
Owner Author

Hm, the space profile for #261 is still not constant, but the leak is not in the network graph, at least. 🤔

leak

@HeinrichApfelmus
Copy link
Owner Author

I found the culprit for the remaining space leak: In the function deleteVertex, I forgot to delete the vertex from the levels map in Graph:

deleteVertex :: (Eq v, Hashable v) => v -> Graph v e -> Graph v e
deleteVertex x = clearPredecessors x . clearSuccessors x

Here is the space profile after fixing this:

leak

🥳

@HeinrichApfelmus HeinrichApfelmus deleted the HeinrichApfelmus/graphgc-evaluation branch January 14, 2023 11:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant