Definition/Reference: Reduce space used #6249
SortedSet is costly in memory, when serialized, in JSON, and computationally. The VI model changes, but question answers stay the same.
Reviewed 14 of 16 files at r1.
Reviewable status: 14 of 16 files reviewed, 1 unresolved discussion (waiting on @anothermattbrown and @dhalperi)
projects/batfish-common-protocol/src/main/java/org/batfish/datamodel/DefinedStructureInfo.java, line 33 at r1 (raw file):
addDefinitionLines
Since this is called a lot of times, it seems pretty expensive to go back and forth between IntegerSpace.Builder and IntegerSpace. It seems much more efficient to collect into a mutable TreeRangeSet, and then convert to IntegerSpace in getDefinitionLines. Is there a reason not to do that?
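A minimal sketch of the suggested alternative, assuming only the JDK: a TreeMap from range start to inclusive range end stands in for Guava's TreeRangeSet, and the class MutableLineRanges with its methods add, rangeCount, and describe is hypothetical, not Batfish code. Each added line merges with its neighbors in O(log n), and nothing is converted until the ranges are read.

```java
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

/** Hypothetical stand-in for a mutable TreeRangeSet<Integer> of line numbers. */
final class MutableLineRanges {
  // Disjoint, non-adjacent closed ranges: start -> end (inclusive).
  private final TreeMap<Integer, Integer> ranges = new TreeMap<>();

  /** Adds one line in O(log n), merging with a range ending at line - 1 or starting at line + 1. */
  void add(int line) {
    if (line < 1) {
      throw new IllegalArgumentException("line numbers start at 1: " + line);
    }
    int start = line;
    int end = line;
    Map.Entry<Integer, Integer> below = ranges.floorEntry(line);
    if (below != null && below.getValue() >= line - 1) {
      if (below.getValue() >= line) {
        return; // already covered; nothing to do
      }
      start = below.getKey(); // extend the range that ends at line - 1
    }
    Map.Entry<Integer, Integer> above = ranges.ceilingEntry(line + 1);
    if (above != null && above.getKey() == line + 1) {
      end = above.getValue(); // absorb the range that starts at line + 1
      ranges.remove(above.getKey());
    }
    ranges.put(start, end); // overwrites the extended entry when start moved down
  }

  int rangeCount() {
    return ranges.size();
  }

  /** Converts only on read; a string stands in for the final IntegerSpace here. */
  String describe() {
    StringJoiner joiner = new StringJoiner(", ", "[", "]");
    for (Map.Entry<Integer, Integer> e : ranges.entrySet()) {
      joiner.add(e.getKey() + ".." + e.getValue());
    }
    return joiner.toString();
  }
}
```

Adding 3, 1, 2, 7, and 2 again leaves two ranges, [1..3, 7..7]; no intermediate immutable object is built per call.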
Reviewed 2 of 16 files at r1.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @dhalperi)
projects/batfish-common-protocol/src/main/java/org/batfish/datamodel/DefinedStructureInfo.java, line 33 at r1 (raw file): Previously, anothermattbrown (Matt Brown) wrote…
I do not think this is much of a problem in practice: in hierarchical and Cisco-style configs, this is all O(1). [It is relevant for flattened configs, like Juniper.]
What we really need is a clean builder -> built separation; what you're describing is a fairly ugly (IMO) workaround for a type that has gotten us in trouble in the past. Would you make the TreeRangeSet transient so that it serializes and deserializes immutably?
Where this code sits in the pipeline, it seems hard to untangle into a proper builder -> built pattern. This PR moves this code far away from being the bottleneck in either CPU or memory.
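One way to read the transient question, as a minimal sketch assuming only the JDK: the mutable range map is marked transient, and custom writeObject/readObject methods write and read a flattened array of [start, end] pairs instead. SerializableLineRanges, addRange, describe, and roundTrip are hypothetical names; none of this is taken from Batfish.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

/** Hypothetical holder whose mutable range map is transient; only a flat int[] is serialized. */
final class SerializableLineRanges implements Serializable {
  private static final long serialVersionUID = 1L;

  // Mutable working state (start -> inclusive end); skipped by default serialization.
  private transient TreeMap<Integer, Integer> ranges = new TreeMap<>();

  /** Adds the closed range [start, end], merging any overlapping or adjacent ranges. */
  void addRange(int start, int end) {
    // end + 1 is computed below, so reject Integer.MAX_VALUE to avoid overflow.
    if (start < 1 || end < start || end == Integer.MAX_VALUE) {
      throw new IllegalArgumentException("bad range " + start + ".." + end);
    }
    Map.Entry<Integer, Integer> below = ranges.floorEntry(start);
    if (below != null && below.getValue() >= start - 1) {
      start = below.getKey();
      end = Math.max(end, below.getValue());
    }
    // Absorb every range that starts inside [start, end + 1].
    Map.Entry<Integer, Integer> next;
    while ((next = ranges.ceilingEntry(start)) != null && next.getKey() <= end + 1) {
      end = Math.max(end, next.getValue());
      ranges.remove(next.getKey());
    }
    ranges.put(start, end);
  }

  private void writeObject(ObjectOutputStream out) throws IOException {
    out.defaultWriteObject();
    int[] flat = new int[ranges.size() * 2]; // [s0, e0, s1, e1, ...]
    int i = 0;
    for (Map.Entry<Integer, Integer> e : ranges.entrySet()) {
      flat[i++] = e.getKey();
      flat[i++] = e.getValue();
    }
    out.writeObject(flat);
  }

  private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
    in.defaultReadObject();
    int[] flat = (int[]) in.readObject();
    ranges = new TreeMap<>(); // transient fields are not restored automatically
    for (int i = 0; i + 1 < flat.length; i += 2) {
      ranges.put(flat[i], flat[i + 1]);
    }
  }

  String describe() {
    StringJoiner joiner = new StringJoiner(", ", "[", "]");
    for (Map.Entry<Integer, Integer> e : ranges.entrySet()) {
      joiner.add(e.getKey() + ".." + e.getValue());
    }
    return joiner.toString();
  }

  /** Serializes and deserializes in memory, for demonstration only. */
  static SerializableLineRanges roundTrip(SerializableLineRanges original) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(original);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
      return (SerializableLineRanges) in.readObject();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

The design choice here is that the serialized form never contains a TreeMap or boxed Integers, only one int array; whether that matches "serializes and deserializes immutably" as intended in the thread is an assumption.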
Codecov Report
@@             Coverage Diff              @@
##             master    #6249      +/-   ##
============================================
- Coverage     72.92%   72.91%   -0.02%
+ Complexity    35018    35014       -4
============================================
  Files          2829     2829
  Lines        142336   142352      +16
  Branches      17087    17087
============================================
- Hits         103792   103789       -3
- Misses        30315    30327      +12
- Partials       8229     8236       +7
Reviewed 10 of 10 files at r2.
Reviewable status: complete! all files reviewed, all discussions resolved
projects/batfish-common-protocol/src/main/java/org/batfish/datamodel/DefinedStructureInfo.java, line 33 at r1 (raw file):
Previously, dhalperi (Dan Halperin) wrote…
Yeah, I agree about the builder pattern, but since we don't use that in this part of the code (and use lots of mutable data structures), I was picturing just keeping it mutable. getDefinitionLines could return an immutable view. If that's called a ton of times, we could think about caching that there (and yes, using transient).
Anyway, not a blocker.
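A minimal sketch of the two read paths mentioned above, assuming only the JDK: an unmodifiable view costs O(1) but reflects later mutations, while a cached read-only copy is rebuilt only after a mutation, and its cache field is transient so it is never serialized. The class CachedDefinitionLines and its methods addLine and definitionLinesView are hypothetical; getDefinitionLines merely echoes the name from the discussion.

```java
import java.io.Serializable;
import java.util.Collections;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical holder that stays mutable internally and only hands out read-only maps. */
final class CachedDefinitionLines implements Serializable {
  private static final long serialVersionUID = 1L;

  // Disjoint, non-adjacent closed ranges: start -> end (inclusive).
  private final TreeMap<Integer, Integer> ranges = new TreeMap<>();

  // Lazily built read-only copy; transient so it is never written when serializing.
  private transient NavigableMap<Integer, Integer> cachedSnapshot;

  void addLine(int line) {
    if (line < 1) {
      throw new IllegalArgumentException("line numbers start at 1: " + line);
    }
    int start = line;
    int end = line;
    Map.Entry<Integer, Integer> below = ranges.floorEntry(line);
    if (below != null && below.getValue() >= line) {
      return; // already covered; the cache stays valid
    }
    if (below != null && below.getValue() == line - 1) {
      start = below.getKey(); // extend the range that ends at line - 1
    }
    Map.Entry<Integer, Integer> above = ranges.ceilingEntry(line + 1);
    if (above != null && above.getKey() == line + 1) {
      end = above.getValue(); // absorb the range that starts at line + 1
      ranges.remove(above.getKey());
    }
    ranges.put(start, end);
    cachedSnapshot = null; // invalidate only after a real mutation
  }

  /** O(1) unmodifiable view: callers cannot mutate it, but it reflects later addLine calls. */
  NavigableMap<Integer, Integer> definitionLinesView() {
    return Collections.unmodifiableNavigableMap(ranges);
  }

  /** Unmodifiable copy, rebuilt only when the ranges changed since the last call. */
  NavigableMap<Integer, Integer> getDefinitionLines() {
    if (cachedSnapshot == null) {
      cachedSnapshot = Collections.unmodifiableNavigableMap(new TreeMap<>(ranges));
    }
    return cachedSnapshot;
  }
}
```

The view is the cheaper default; the cached copy only pays off if reads vastly outnumber writes, which is the "called a ton of times" case.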
It's expensive in rare cases, and also significantly more expensive to turn into IntegerSpace dynamically (deep in a nested map).
Changing from SortedSet<Integer> to IntegerSpace is a dramatic reduction in storage, especially for definition lines. In one large network, halfway through parsing there were about 60M objects in memory, of which 34M were in DefinedStructureInfo. By using IntegerSpace instead, we reduce sets to ranges and dramatically cut down on the number of objects maintained. Only internal storage changes (see the ViModel diffs). Question answers stay the same.
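A minimal sketch of the space argument, assuming only the JDK: runs of consecutive line numbers collapse into one closed range each, so a definition spanning 1,000 lines goes from 1,000 boxed Integer entries in a SortedSet to a single range. RangeCompression.toRanges is a hypothetical helper, and strings stand in for real range objects; IntegerSpace itself is not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;

/** Hypothetical helper showing how sorted line numbers collapse into ranges. */
final class RangeCompression {
  private RangeCompression() {}

  /** Returns one "start..end" entry per run of consecutive values, in ascending order. */
  static List<String> toRanges(SortedSet<Integer> lines) {
    List<String> out = new ArrayList<>();
    Integer start = null; // start of the current run, or null before the first element
    int prev = 0;
    for (int line : lines) {
      if (start == null) {
        start = line;
      } else if (line != prev + 1) {
        out.add(start + ".." + prev); // close the run that just ended
        start = line;
      }
      prev = line;
    }
    if (start != null) {
      out.add(start + ".." + prev); // close the final run
    }
    return out;
  }
}
```

For example, lines 1 through 1000 plus lines 1500 and 1501 (1,002 set entries) become the two ranges 1..1000 and 1500..1501.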