Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Defintion/Reference: Reduce space used #6249

Merged
merged 5 commits into from Sep 30, 2020

Conversation

dhalperi
Copy link
Member

Changing from SortedSet<Integer> to IntegerSpace is a dramatic reduction in storage, especially for definition lines. In one large network, halfway through parsing there were about 60M objects in memory, of which 34M were in DefinedStructureInfo.

By using IntegerSpace instead, we reduce sets to ranges and dramatically cut down on the number of objects maintained. Only internal storage changes - see ViModel diffs. Question answers stay the same.

SortedSet is complex in memory, serialized, JSON, computationally.

VI Model changes, but question answers stay the same.
@batfish-bot
Copy link

This change is Reviewable

Copy link
Contributor

@anothermattbrown anothermattbrown left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 14 of 16 files at r1.
Reviewable status: 14 of 16 files reviewed, 1 unresolved discussion (waiting on @anothermattbrown and @dhalperi)


projects/batfish-common-protocol/src/main/java/org/batfish/datamodel/DefinedStructureInfo.java, line 33 at r1 (raw file):

addDefinitionLines

Since this is called a lot of times, it seems pretty expensive to go back and forth between IntegerSpace.Builder and IntegerSpace.

It seems much more efficient to collect into a mutable TreeRangeSet, and then convert to IntegerSpace in getDefinitionLines. Is there a reason not to do that?

Copy link
Contributor

@anothermattbrown anothermattbrown left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 16 files at r1.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @dhalperi)

@dhalperi
Copy link
Member Author


projects/batfish-common-protocol/src/main/java/org/batfish/datamodel/DefinedStructureInfo.java, line 33 at r1 (raw file):

Previously, anothermattbrown (Matt Brown) wrote…
addDefinitionLines

Since this is called a lot of times, it seems pretty expensive to go back and forth between IntegerSpace.Builder and IntegerSpace.

It seems much more efficient to collect into a mutable TreeRangeSet, and then convert to IntegerSpace in getDefinitionLines. Is there a reason not to do that?

I do not think this is much of a problem in practice - in hierarchical and Cisco-style configs, this is all O(1). [It is relevant for flattened configs, like juniper].

What we really need is a clean builder -> built separation; what you're describing is a fairly ugly (IMO) workaround of a type that has gotten us in trouble in the past. Would make the TreeRangeSet transient so that it serializes and deserializes immutably?

Where this code is in the pipeline it seems hard to untangle into a proper builder -> built pattern. This PR moves this code, by far, out from being the bottleneck in either CPU or memory.

@codecov
Copy link

codecov bot commented Sep 29, 2020

Codecov Report

Merging #6249 into master will decrease coverage by 0.01%.
The diff coverage is 87.50%.

@@             Coverage Diff              @@
##             master    #6249      +/-   ##
============================================
- Coverage     72.92%   72.91%   -0.02%     
+ Complexity    35018    35014       -4     
============================================
  Files          2829     2829              
  Lines        142336   142352      +16     
  Branches      17087    17087              
============================================
- Hits         103792   103789       -3     
- Misses        30315    30327      +12     
- Partials       8229     8236       +7     
Impacted Files Coverage Δ Complexity Δ
.../main/java/org/batfish/datamodel/IntegerSpace.java 89.65% <ø> (ø) 21.00 <0.00> (ø)
...rencedstructures/ReferencedStructuresAnswerer.java 0.00% <0.00%> (ø) 0.00 <0.00> (ø)
...va/org/batfish/datamodel/DefinedStructureInfo.java 85.71% <84.61%> (-3.18%) 8.00 <4.00> (+4.00) ⬇️
...n/java/org/batfish/vendor/VendorConfiguration.java 97.24% <89.47%> (-0.89%) 51.00 <10.00> (+2.00) ⬇️
...del/answers/ConvertConfigurationAnswerElement.java 95.34% <100.00%> (ø) 21.00 <1.00> (ø)
...sh/question/UndefinedReferencesQuestionPlugin.java 100.00% <100.00%> (ø) 3.00 <0.00> (ø)
...tfish/question/UnusedStructuresQuestionPlugin.java 97.14% <100.00%> (+0.04%) 3.00 <0.00> (ø)
...n/definedstructures/DefinedStructuresAnswerer.java 88.52% <100.00%> (ø) 14.00 <0.00> (ø)
...ain/java/org/batfish/symbolic/IngressLocation.java 65.78% <0.00%> (-2.64%) 15.00% <0.00%> (-1.00%)
...col/src/main/java/org/batfish/role/InferRoles.java 89.54% <0.00%> (-1.37%) 50.00% <0.00%> (-1.00%)
... and 7 more

Copy link
Contributor

@anothermattbrown anothermattbrown left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 10 of 10 files at r2.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved


projects/batfish-common-protocol/src/main/java/org/batfish/datamodel/DefinedStructureInfo.java, line 33 at r1 (raw file):

Previously, dhalperi (Dan Halperin) wrote…

I do not think this is much of a problem in practice - in hierarchical and Cisco-style configs, this is all O(1). [It is relevant for flattened configs, like juniper].

What we really need is a clean builder -> built separation; what you're describing is a fairly ugly (IMO) workaround of a type that has gotten us in trouble in the past. Would make the TreeRangeSet transient so that it serializes and deserializes immutably?

Where this code is in the pipeline it seems hard to untangle into a proper builder -> built pattern. This PR moves this code, by far, out from being the bottleneck in either CPU or memory.

Yeah I agree about the builder pattern, but since we don't use that in this part of the code (and use lots of mutable data structures) I was picturing just keeping it mutable. getDefinitionLines could return an immutable view. If that's called a ton of times we could think about caching that (and yes using transient) there.

Anyway, not a blocker

@dhalperi dhalperi merged commit 073d509 into batfish:master Sep 30, 2020
@dhalperi dhalperi deleted the definition-lines-simpler branch September 30, 2020 05:23
dhalperi added a commit to dhalperi/batfish that referenced this pull request Oct 1, 2020
It's expensive in rare cases, and also significantly more expensive to turn into
IntegerSpace dynamically (deep in a nested map).
dhalperi added a commit that referenced this pull request Oct 1, 2020
It's expensive in rare cases, and also significantly more expensive to turn into
IntegerSpace dynamically (deep in a nested map).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants