Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

prevent duplication of nodeRef on deserialization #355

Merged
merged 3 commits into from Feb 20, 2023
Merged

Conversation

bbrehm
Copy link
Contributor

@bbrehm bbrehm commented Feb 20, 2023

As we saw from joernio/flatgraph#11 , we have a problem when deserializing nodes: There are too many and too heavy (ShiftLeftSecurity/overflowdb-codegen#205) NodeRefs.

This addresses the "too many" part. The issue is the following: When we deserialize a node, then we ask NodeFactory to please give us a new NodeDb object with the data. But the API helpfully also allocates a NodeRef object that is kept alive by the NodeDb!

So the pointer structure when deserializing looks like this:

NodeRef1 -(`private Node node`)-> NodeDb1 -(`private NodeRef ref`)-> NodeRef2 -(`private Node node`)-> NodeDb1

This is of course nonsense. We fix this by passing pre-existing NodeRef pointers into the createNode function and only allocate a new NodeRef when that is null.

This should save one NodeRef per node, which weighs 32 bytes (with ShiftLeftSecurity/overflowdb-codegen#205; before it was 40 bytes) .

PS. can you add some more eyes that I got the referenceManager queue / registerRef right? We never want to add a nodeRef twice, nor do we want to forget it.

core/src/main/java/overflowdb/NodeFactory.java Outdated Show resolved Hide resolved
if (node == null) throw new IllegalStateException("unable to read node from disk; id=" + id);
this.node = node;
assert(this.node == node);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Java's asserts are not being checked at runtime by default 😢
I tend to throw new AssertionError in this case, but I guess there's other options.

Also: the error text in this case wouldn't be very helpful.

@mpollmeier
Copy link
Contributor

mpollmeier commented Feb 20, 2023

Hmm, this doesn't seem to fix the underlying issue yet. I tested with ShiftLeftSecurity/overflowdb-codegen#205 and still got twice as many Refs compared to the Db nodes...

I take that back, this works, I didn't upgrade the joern version in the odbv2 branch.
🎉

Co-authored-by: Michael Pollmeier <michael@michaelpollmeier.com>
Copy link
Contributor

@mpollmeier mpollmeier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please fix the assert issue before merging

@bbrehm bbrehm merged commit 379a2d3 into master Feb 20, 2023
@delete-merged-branch delete-merged-branch bot deleted the bbrehm/noDupRef branch February 20, 2023 15:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants