Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exponential memory improvement by re-using NameState across multiple patterns #88

Merged
merged 18 commits into from
May 23, 2023

Conversation

jonessha
Copy link
Contributor

Description of changes:

This is a continuation of #75. I have closed out that pull request and opened this open.

Previously, Ruler would create a new NameState for every pattern. Now, NameStates are re-used across patterns for a given field.

The NameState is accessed via the pattern's ByteMatch at the end of the ByteMachine and provides access to the ByteMachine for the next field. So picture a rule with many fields, and for each of these fields, an array of many elements. Each array element is a pattern for the field. So this turns into a rapidly/exponentially expanding tree of NameStates and ByteMachines. But it's all unnecessary. Since the next field and its patterns will be the same no matter what value we use to match the current field, there is no reason not to re-use NameStates (and thus ByteMachines). This collapses an exponentially-expanding tree into a line. Or to word it another way, this changes Ruler's memory complexity from O(product of all array sizes) to O(sum of all array sizes).

This introduced the concept of "sub-rules". These already existed in Ruler as different parts of an "or" disjunction (whether explicitly with $or or implicitly by specifying the same rule name multiple times). But these sub-rules were often described as "rules" in the Ruler code. It became important to capture the distinction in this CR so we could ensure a machine traversal is satisfying a single sub-rule and not different parts of many sub-rules.

Benchmark / Performance (for source code changes):

Testing against one event with several large arrays, there was a 99.5% reduction in the number of ByteMatches created.

This is already running in a production environment and is achieving the same match results as the previous version of Ruler. This has led to a 70% reduction in GC activity, 45% reduction in heap usage, and 28% reduction in CPU. Latency to create a Ruler machine is down 35-90% at high percentiles, depending on the percentile.


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@longzhang-lz
Copy link
Collaborator

This looks great, Aha, I see where your inspiration was from, it seems similar like spin bytestate - name state spin.
just start reading the change, I will have more time to complete review in the weekend.

But these sub-rules were often described as "rules" in the Ruler code. It became important to capture the distinction in this CR so we could ensure a machine traversal is satisfying a single sub-rule and not different parts of many sub-rules.

For this one, existing sub rule wont' consume duplicated memory, each time, only delta portion of sub rule will actually be added into Ruler and it won't result additional run-time matching time.
Could you please clarify why it matters to distinguish the sub rule, is there specific request to know exactly which sub rule resulted the rule matching? given it is or relationship, I am wondering what benefit to client if tell client the detail of which "or" condition get matched

@jonessha
Copy link
Contributor Author

Thanks @longzhang-lz for having a look. Always great to get your feedback!

Distinguishing sub-rules from rules wasn't to save additional memory or to know exactly which sub-rule resulted in a match. Distinguishing sub-rules was needed just for correctness. The unit test ACMachineTest.testDifferentValuesFromOrRuleBothGoThroughSharedNameState (along with testDifferentValuesFromExplicitOrRuleBothGoThroughSharedNameState) illustrates this best.

Walking through that unit test, adding rule1 to the machine creates a NameState N1 accessed by either bar=x or bar=y, from which you can access a second NameState N2 on foo=a. Now, both rule2(a) via bar=x and rule2b via bar=y will be able to re-use NameState N1 when we add them to the machine as well. When matching rules on event (bar=x, foo=a), we of course expect just rule1 to match. But what if we didn't distinguish sub-rules from rules? What would happen is we would access NameState N1 via bar=x, and would see at that NameState that both rule1 and rule2 are candidate rules (since they both led to this NameState using bar=x). Then we would travel to the next NameState N2 via foo=a, and again see that both rule1 and rule2 are candidates. Since this is the terminus, we would conclude both rule1 and rule2 match. But this is wrong. It was rule2(a) that matched bar=x and it was rule2b that matched foo=a. So we're mixing and matching different parts of the OR. We need to track candidates on a sub-rule level so that we can see that neither rule2(a) nor rule2b were fully satisfied. Make sense?

@longzhang-lz
Copy link
Collaborator

Gotcha, thanks Shawn, it makes sense, I see given name states are aggerated now, the sub rule cannot be represented by matching path any more, so, maintain the path info in consolidated name state becomes mandatory, otherwise, it will yield the wrong match.
As summarized in the description of the change, the memory and GC stuff get great improvement, just in theory, how about the run-time event to rule matching performance? it seems not mentioned in the description, in theory, it shall be a little downgrade as it subjects to number of sub rules using the same pattern, but it might not be explicit given there won't have many sub rules in reality.

The overall proposal looks good to me, very glad to see this PR, and thanks to make it happen. I have tried to reuse the namespace for the same path, and had similar porotype but had not gotten chance to complete it (also my idea was a little difference, refer to below diagram). The general goal was to reuse the name state as much as possible, achieve the "no-write" effect for any duplicated rules being added regardless how many times client add the rule with same patterns list.

Below was my draft idea, as I said it is a little bit different with yours, it will reuse the name states be shared globally, and the max number of name states would be the size of fields of the longest rule, just share for your reference ( Hi Tim, you remember this diagram, right?)
image
Just share for your reference, I did not figure our the efficient way to store the path, now I have some idea while see your code, refer to detail comments inline.

Please see my comments in lines.

/**
* Store an ID with a rule to represent a sub-rule, which differentiates between rules of the same name.
*/
static class SubRule {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sounds good to assign a unique Id for each sub rule, just the sub rule can have big portion of overlap, e.g.

rule: |-sub_r1:"a":"a1"                                                       
      |-sub_r2:"a":"a1"->"b":"b1"
      |-sub_r3:"a":"a1"->"b":"b1"->"c":"c1

ideally, it would be great to save more if subrule object can be reused for the overlapped field/values along the path, so we can limit the size of Set<SubRule> to only name state where the new branch starts.
assume each field have a unique id,
in above example, the id in N1, N2, N3 name state would be:
Id(r1) = id1
Id(r2) = Id(r1)_id2 = id1_id2
Id(r3) = Id(r2)_id3 = Id1_id2_id3
the Set<SubRule> for given pattern will only be 1,instead of 3 for each sub rule in each its name states.

I see it is hard to model it with high perf, the alternative is to not track the rule, but try List<String, List<Patterns>> namevals and keyIndex ( field index), which patterns of value at keyIndex field doesn't matter actually, so the same subrule set even can be reused cross rules)

just my 2 cents for your reference.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that "Object rule" inside SubRule is just the name, not the full rule body. I'm not sure I follow your suggestion entirely. Even making the SubRule recursive in some way would still result in the same total number of SubRules. But I'm pretty sure the suggestion assumes that "Object rule" is the full rule body.

Copy link
Collaborator

@longzhang-lz longzhang-lz May 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I think we actually care more about the rule body than rule name as the "path" - I like call it "footprint" is relevant the rule body.
I way trying to explain a way to reduce the size of Set by reusing the subrule id if sub rule is an extension of exiting rule.
current way:
N1
sub rule id 1
sub rule id 2
sub rule id 3
N2
sub rule id 2
sub rule id 3
N3
sub rule id 3

My mental model with the proposal is:
N1
sub rule id 1
N2
sub rule id 1-2
N3
sub rule id 1-2-3
The first portion "1" reflects rule name, sub rules with starting of "1" means they are for the same rule, the sub rule 2 is an extension from sub-rule 1, so its id could also be extension from sub rule id 1 - "1-2" ... similar for next one ...

just share for your reference, I see it seems overoptimizing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thanks, it does seem like a pretty minor optimization. I think the latest change I pushed, that removes the SubRule and just uses ID for all non-terminal cases, captures a similar (even if slightly different) benefit.

Set<Double> subRuleIds = new HashSet<>();
Set<SubRule> subRules = (isTerminal ? patternToTerminalSubRules : patternToNonTerminalSubRules).get(pattern);
if (subRules != null) {
for (SubRule subRule : subRules) {
Copy link
Collaborator

@longzhang-lz longzhang-lz May 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the matching perf will bound to O(n) n is number of sub rules, we might control the sub rule size ...
and control if the same rule is added multiple times, it should be treated as 1 sub rule with the same id.
I remember one team have the use cases that they cannot efficiently detect the duplicated rule, so just blindly add the rule into Ruler and ask Ruler to dedupe them.

One option is to before create new sub rule, look for if it is already existing and reuse it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this method is only called when deleting a rule. It's the only one in NameState that is O(n). The other NameState methods, including the ones called on the matching code paths, are all O(1).

You raise a good point regarding duplicated rules. We could only re-use existing sub-rule and ID if the entire rule was identical to a previous one. Just knowing that a rule is the same as a previous one up to a current key index is insufficient. So I'm not sure how to tackle this other than having some kind of rule tracker/comparer that we consult before we begin adding a rule. That would have some overhead. Not sure that it's worth it. Also not sure if I'm overlooking a better approach.

I did make a change though to store only the sub-rule ID instead of the whole sub-rule object for non-terminal cases (which is most). This should help a little, but will still lead to the accumulation of IDs if the same rule keeps getting repeatedly added. Let me think about this a bit more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember one team have the use cases that they cannot efficiently detect the duplicated rule

If you remember which team this was, would you mind emailing me and telling me? Thanks.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was Q/A when that team onboarded Ruler, however I cannot access my previous email any more, I guess even it can happen it should be rare in reality, it's better to have a protection cap in case attacker know that and have a loop to repeat add the same rule to downgrade ruler which may affect other customer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've uploaded a new revision that solves the duplicate rules problem. Note that it involves iterating through all of a NameState's sub-rules for all patterns of a rule's terminal key when adding the rule. In practice, this will rarely be a performance issue, unless many duplicate or near-duplicate rules are added to a machine under different names.

} else {
assert byteMachine != null;
nextState = byteMachine.addPattern(pattern);
lastNextState = byteMachine.addPattern(pattern, lastNextState);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe it is helpful to highlight the name state reusable principle, it not only applies to sub rules, but also for different rules, and does matter on the sequence of rules being added, e.g.:

        machine.addRule("rule1", "{\"bar\":[\"x\", \"y\"]}");
        machine.addRule("rule2", "{\"bar\":[\"x\"]}");
        machine.addRule("rule3", "{\"bar\":[\"y\"]}");

yield one new name state, while below will yield 2 new name states

        machine.addRule("rule2", "{\"bar\":[\"x\"]}");
        machine.addRule("rule3", "{\"bar\":[\"y\"]}");
        machine.addRule("rule1", "{\"bar\":[\"x\", \"y\"]}");

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good observation that the order the rules are added matters. I didn't solve this part. To be clear - are you pointing this out as an FYI or do you have a suggestion? I'm thinking a bit and could see solving this with some sort of a NameStateRegistry per GenericMachine. This registry would vend a NameState given a list of keys (up to a certain index). Feels like it could be a big win. Let me think about this a bit more too.

Copy link
Collaborator

@longzhang-lz longzhang-lz May 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is FYI as I don't have simple solution as well unless introducing the name state merge logic which seems a chain effect and looks very complex.
I think with this change, further change of memory optimization become low priority.

@jonessha
Copy link
Contributor Author

jonessha commented May 1, 2023

Thanks @longzhang-lz for the very thorough feedback. I responded to your specific comments, but a couple things from your top-level write-up:

how about the run-time event to rule matching performance?

I did not see a noticeable change in matching performance according to the benchmark tests. However, as I mentioned in the overview, latency to create a Ruler machine is down 35-90% at high percentiles, depending on the percentile. So if you are creating a machine as well as matching, then this does improve latency quite a bit at high percentiles.

in theory, it shall be a little downgrade as it subjects to number of sub rules using the same pattern

On the matching path, even though multiple sub-rule sets might be encountered, all the operations on them are amortized constant time. So while there could be a tiny hit in performance for the simple/happy case, it seems almost too small to measure, and is greatly outweighed by the less happy cases (at least for any application with a good diversity of rules and machines).

Thanks for sharing your draft idea. To get it to work, I might need to change the data structures inside NameState to use key+pattern as keys, not just pattern. (Since in your diagram, different keys can lead to the same NameState). There is an intermediate win though where each NameState has only a single key leading into it, but there is only a single NameState for that key, regardless of what patterns are used and what order the rules are added in. This is what I'm thinking about now.

if (candidateSubRuleIds.isEmpty()) {
matchingSubRules.addAll(terminalSubRules);
} else {
for (NameState.SubRule subRule : terminalSubRules) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is on matching path, perf bound to number of candidate rules and number of terminate sub rules, but I see your point, there might not have large quantity of sub rules so unless generating dedicated benchmark with large sub rules, there may not have visible perf diff.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, it would seem to be a very specifically crafted set of rules that would lead to a large number of terminal sub-rules on a single path for us to iterate through here.

if (candidateSubRuleIds.isEmpty()) {
matchingSubRules.addAll(terminalSubRules);
} else {
for (NameState.SubRule subRule : terminalSubRules) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one matching path, the perf would bound to number of subrules and candidate sub rules, but I see your point, there shall have small number of sub rules in reality, shall be no visible perf diff unless we create dedicated benchmark with large quantity of sub rules.
btw, change candidate as double set is a good enhancement

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Yeah, this code is basically the same code as from Task. At some point, we need to reduce the duplicated code from the AC vs non-AC code paths.

@@ -943,7 +943,7 @@ private NameState findMatchPattern(final InputCharacter[] characters, final Patt

SingleByteTransition nextByteState = eachTrans.getNextByteState();
if (nextByteState == null) {
return null;
continue;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated bug fix

@jonessha
Copy link
Contributor Author

jonessha commented May 3, 2023

@longzhang-lz I implemented the intermediate version of your draft idea. Not one NameState per position, but one NameState for each (sub)sequence of keys, which is an improvement over my original implementation, which was one NameState for each (sub)sequence of keys for not-before-encountered first patterns.

Although there will be a minor memory hit by storing an extra map in each NameState, there will almost certainly be an overall performance improvement for most use cases. You can see some drops in the approximate object count tests.

@longzhang-lz
Copy link
Collaborator

Thanks @jonessha, sounds great, I am a little bit busy but I will review it again in this weekend,

Copy link
Collaborator

@longzhang-lz longzhang-lz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This enhancement looks great tradeoff landing between previous one and ideal one, very glad to see it, thanks.
It's a smart idea to record the keyToNextNameState and enforce the name state reuse as long as their sub key sequence is hit.
It overall looks good to me, just leave a few minor comments for your reference.

*/
NameState addPattern(final Patterns pattern, final NameState nameState) {
void addPattern(final Patterns pattern, final NameState nameState) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] I see all pattern addition is now forced to reuse input nameState, but it seems a benefit to keep this return value to assert the returned one equals to input one for testing and future regression protection.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually thought about this differently. I wanted to firm up and simplify the contract of ByteMachine.addPattern. Since it will always use the provided NameState, I didn't think it made sense for the interface to return a NameState. I wanted consumers of the class to understand that their provided NameState will always be used. If you're concerned about bugs, we could change access so only GenericMachine can create NameStates (without a workaround), but I feel this might be overkill.

final NameState nameStateForSupplier = lastNextState == null ? new NameState() : lastNextState;
lastNextState = nameMatcher.addPattern(pattern, () -> nameStateForSupplier);
final NameState nameStateForSupplier = nextNameState;
nameMatcher.addPattern(pattern, () -> nameStateForSupplier);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] NameMatcher can also be simplified as the name state become pointed one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -513,7 +514,7 @@ private boolean addStep(final NameState state,
final Map<String, List<Patterns>> patterns,
final T ruleName,
List<String> addedKeys,
final List<Set<NameState>> nameStatesForEachKey) {
final List<NameState> nameStateForEachKey) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[observation] refer to previous comment, it looks like this list is not necessary as it is naturally maintained by a state linked chain, we can get it from start name state to terminate one alone with keys index.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, done

Comment on lines 487 to 492
SubRuleContext context = subRuleContextGenerator.generate();
for (int i = 0; i < keys.size(); i++) {
boolean isTerminal = i + 1 == keys.size();
for (Patterns pattern : patterns.get(keys.get(i))) {
for (NameState nameState : nameStates.get(i)) {
nameState.addSubRule(ruleName, context.getId(), pattern, isTerminal);
}
nameStates.get(i).addSubRule(ruleName, context.getId(), pattern, isTerminal);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[observation] List<NameState> nameStates looks like this list is not necessary as it is naturally maintained by a state linked chain, we can get it from start name state to terminate one alone with keys index.
something like below:

            NameState state = startState;
            SubRuleContext context = subRuleContextGenerator.generate();
            for (int i = 0; i < keys.size(); i++) {
                boolean isTerminal = i + 1 == keys.size();
                String key = keys.get(i);
                for (Patterns pattern : patterns.get(key)) {
                    state.addSubRule(ruleName, context.getId(), pattern, isTerminal);
                }
                state = state.getNextNameState(key);
            }

isRuleNew = addStep(nextNameState, keys, nextKeyIndex, patterns, ruleName, addedKeys, nameStateForEachKey);
} else {
for (Patterns pattern : patterns.get(key)) {
if (!nextNameState.containsTerminalSubRule(ruleName, pattern)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[observation]Just put the comment here, as I cannot add comment on class SubRule

I am thinking the way to avoid loop for sub rules in collectRules and in containsTerminalSubRule,
Given

  • HashSet contains uses the equals method to determine if the object is contained - and duplicates are not kept within the HashSet.
  • the id in SubRule is globally unique, so itself alone can represent the unique subrule as key.

So, I think we may
either rewrite the equals and hashcode function in hashCode to only compare the id, by that we can leverage
or
change the Set<SubRule> to Map<Double, Object>

which could keep the O(1) regardless number of possible subrules.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized separately that my use of containsTerminalSubRule was buggy. I need to check non-terminal rules as well. This would turn the containment check from O(pn) into O(kpn), where k is number of keys, and p is average number of patterns, and n is average number of sub-rules per pattern. I decided it was more efficient to add another map into NameState that tracks rule names for patterns. This will give a slight memory hit, but reduce the runtime of the containment check to O(kp), which seems more tenable. Note that k and p come from the current input rule, so existing state does not affect runtime. But n comes from previously input rules, and can be unbounded in theory, so it's good to eliminate it.

As for collectRules, I'm not sure how to eliminate the loop. We need to take the intersection between the candidates and the terminal sub-rules. How do you take an intersection in O(1) time?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, agree with your analyze,

We need to take the intersection between the candidates and the terminal sub-rules. How do you take an intersection in O(1) time?

It looks like O(1) is impossible and it has to be O(n) as trade-off, n is min(terminalSubRules.size and candidateSubRuleIds.size), candidateSubRuleIds is from possible sub rules and terminalSubRules.size o its size is bound to number of matches, with different situation its size might be different:
e.g.

        String rule1 = "{ \"a\" : [ 1 ], \"b\": [%d] }";
        for (int i = 0; i<50; i++) {
            machine.addRule("rule1", String.format(rule1,i));
        }

vs.

        String rule1 = "{ \"a\" : [ %d ], \"b\":  [ 1 ] }";
        for (int i = 0; i<50; i++) {
            machine.addRule("rule1", String.format(rule1,i));
        }

so, I am thinking the possible improvement to judge size and loop set with smaller size as improvement.

for(set with smaller size) {  // O(n_smaller)
   set with larger size.contains(x); // O(1)
} => O(n_smaller)

just my 2 cents.

maybe it is good to highlight user to try not create to many rules with similar pattern serial to avoid perf penalty for latency sensitive scenarios.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Long, your insight never fails to amaze me. Last week, I heard from a highly-latency-sensitive team with a performance test that exercised this scenario: a large machine with many similar rules. This performance test showed a latency regression. I went through and made a handful of performance improvements, including the smaller set / larger set improvement you described above. I also reverted the improvement I made to further re-use NameState (as long as keys are the same). Although that improvement further reduced memory use, it increased runtime due to more sub-rules contained in each NameState. At this point, the runtime is only slightly increased (in the case of many similar repeated rules), but memory use is largely decreased. I think it's a good trade-off and I'd like to settle here for now. I'll push that revert and the other performance improvements to this PR in the next couple days.

I could see a future improvement by allowing a config to indicate how much to optimize for memory and how much to optimize for latency. It would certainly add more complexity though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pushed the revert and other performance improvements. Hopefully these are the last changes before merging into main.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update Shawn, really happy to see you have found that team, experienced and improved the performance. Agree it is a good trade-off and big win to use a slight run-time increase to gain large memory saving, it's an awesome delivery.

@@ -2347,7 +2347,7 @@ public void testApproximateSizeForRulesManySouceAndEventNameArrayElements() thro
" \"eventName\": [\"Name6\",\"Name7\",\"Name8\",\"Name9\",\"Name10\"]\n" +
" }\n" +
"}");
assertEquals(90, machine.approximateObjectCount());
assertEquals(79, machine.approximateObjectCount());
}

}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest to add a test to ensure the name state reuse among different rules and it doesn't matter the sequence any more, the same name states should be always reused.

For example:

        machine.addRule("rule1", "{\"bar\":[\"x\", \"y\"]}");
        machine.addRule("rule2", "{\"bar\":[\"x\"]}");
        machine.addRule("rule3", "{\"bar\":[\"y\"]}");

yields one new name state, and below will totally reuse the name state created above without any new name state being created ...

        machine.addRule("rule2", "{\"bar\":[\"x\"]}");
        machine.addRule("rule3", "{\"bar\":[\"y\"]}");
        machine.addRule("rule1", "{\"bar\":[\"x\", \"y\"]}");

assert only one name state being created overall.
also verify the deletion does also care about the delete sequence, the new namestate will only be removed when the last key have been removed from state.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Collaborator

@longzhang-lz longzhang-lz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, only a few minor comments.

Map<Patterns, ?> patternToSubRules = isTerminal ? patternToTerminalSubRuleIds : patternToNonTerminalSubRuleIds;
boolean deleted = deleteFromPatternToSetMap(patternToSubRules, pattern, subRuleId);
if (deleted) {
Integer count = subRuleIdToCount.get(subRuleId);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] these lines can be simplified to one, fyi (I used to learn this while dealing with fieldStepsUsedRefCount :-)

subRuleIdToCount.compute(subRuleId, (k, v) -> (v == 1) ? null : v - 1);

Comment on lines +178 to +179
Integer count = subRuleIdToCount.get(subRuleId);
subRuleIdToCount.put(subRuleId, count == null ? 1 : count + 1);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similarly:

subRuleIdToRule.compute(subRuleId, (k, v) -> (v == null) ? 1 : v + 1);

Copy link
Collaborator

@longzhang-lz longzhang-lz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you tried if there is clear performance regression if we only add back below and maintain the "field" transition chain with it?

    // Maps a key to the next NameState accessible via either valueTransitions or mustNotExistMatchers.
    private final Map<String, NameState> keyToNextNameState = new ConcurrentHashMap<>();

@jonessha
Copy link
Contributor Author

Have you tried if there is clear performance regression if we only add back below and maintain the "field" transition chain with it?

I have not. What would we use the map for in this case?

@baldawar baldawar merged commit 1928acc into main May 23, 2023
3 checks passed
@jonessha jonessha deleted the memory_optimization_2 branch May 24, 2023 22:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants