prevent duplication of nodeRef on deserialization #355
As we saw from joernio/flatgraph#11, we have a problem when deserializing nodes: there are too many NodeRefs, and they are too heavy (ShiftLeftSecurity/overflowdb-codegen#205).
This addresses the "too many" part. The issue is the following: when we deserialize a node, we ask NodeFactory to please give us a new NodeDb object with the data. But the API helpfully also allocates a NodeRef object that is kept alive by the NodeDb!
So the pointer structure when deserializing looks like this:
This is of course nonsense. We fix this by passing pre-existing NodeRef pointers into the `createNode` function and only allocating a new `NodeRef` when that is `null`.
This should save one NodeRef per node, which weighs 32 bytes (with ShiftLeftSecurity/overflowdb-codegen#205; before it was 40 bytes).
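To make the idea concrete, here is a rough sketch of the pattern, not the actual overflowdb code. The classes `NodeRef`, `NodeDb` and `NodeFactorySketch`, their fields, and the `refsAllocated` counter are simplified, hypothetical stand-ins; only the name `createNode` and the "reuse the ref unless it is null" rule come from the description above.

```java
// Hypothetical, simplified stand-ins; these are NOT the real overflowdb classes.
final class NodeRef {
    final long id;
    NodeDb node; // null while the node's data is not loaded

    NodeRef(long id) {
        this.id = id;
    }
}

final class NodeDb {
    final NodeRef ref; // back-pointer to the single handle for this node

    NodeDb(NodeRef ref) {
        this.ref = ref;
    }
}

final class NodeFactorySketch {
    // Counter added only for illustration, to show how many refs get allocated.
    static int refsAllocated = 0;

    // Reuse the caller's ref when there is one; allocate only when it is null.
    static NodeDb createNode(long id, NodeRef existingRef) {
        NodeRef ref = existingRef;
        if (ref == null) {
            ref = new NodeRef(id);
            refsAllocated++;
        }
        NodeDb node = new NodeDb(ref);
        ref.node = node; // the ref now points at the freshly built data object
        return node;
    }
}
```

In this sketch the deserialization path would call `createNode(id, existingRef)` with the ref it already holds, while the path that creates brand-new nodes would pass `null`, so each node ends up with exactly one ref.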
PS. Can you add some more eyes to check that I got the referenceManager queue / registerRef right? We never want to add a nodeRef twice, nor do we want to forget one.