Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding optional Configuration to Machine that can be used to enable a… #125

Merged
merged 4 commits into from
Nov 21, 2023
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
42 changes: 42 additions & 0 deletions src/main/software/amazon/event/ruler/Configuration.java
Original file line number Diff line number Diff line change
@@ -0,0 +1,42 @@
package software.amazon.event.ruler;

/**
* Configuration for a Machine.
*/
public class Configuration {

/**
* Normally, NameStates are re-used for a given key subsequence and pattern if this key subsequence and pattern have
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to note that this would need quite a bit of work before it went in the README. I found it really hard to understand. I suggest that the easiest way to make it comprehensible to mere mortals is to include a few examples.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure it's "quite a bit of work", but I added to the readme with an example.

* been previously added, or if the current rule has already added a pattern for the given key subsequence. Hence,
* by default, NameState re-use is opportunistic. But by setting this flag to true, NameState re-use will be forced
* for a key subsequence. This means that the first pattern being added for a key subsequence for a rule will re-use
* a NameState if that key subsequence has been added before. Meaning each key subsequence has a single NameState.
* This improves memory utilization exponentially in some cases but does lead to more sub-rules being stored in
* individual NameStates, which Ruler sometimes iterates over, which can cause a modest runtime performance
* regression.
*/
private final boolean additionalNameStateReuse;

private Configuration(boolean additionalNameStateReuse) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you see this being used for configuration beyond Machine? If not, we can be explicit by calling this class MachingConfiguration

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible that some configuration will not be related directly to Machine, but interface-wise, it will always be plumbed in via the Machine. So the name change is reasonable. Although to be fully correct, it should be GenericMachineConfiguration. I will perform this rename, although personally, I don't think the length and clunkiness is worth the explicitness.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, it hadn't cross my mind that we'd implement anything other than MachineConfiguration or Configuration

this.additionalNameStateReuse = additionalNameStateReuse;
}

public boolean isAdditionalNameStateReuse() {
return additionalNameStateReuse;
}

public static class Builder {

private boolean additionalNameStateReuse = false;

public Builder withAdditionalNameStateReuse(boolean additionalNameStateReuse) {
this.additionalNameStateReuse = additionalNameStateReuse;
return this;
}

public Configuration build() {
return new Configuration(additionalNameStateReuse);
}
}
}

25 changes: 23 additions & 2 deletions src/main/software/amazon/event/ruler/GenericMachine.java
Original file line number Diff line number Diff line change
Expand Up @@ -37,6 +37,11 @@ public class GenericMachine<T> {
*/
private static final int MAXIMUM_RULE_SIZE = 256;

/**
* Configuration for the Machine.
*/
private final Configuration configuration;

/**
* The start state of matching and adding rules.
*/
Expand All @@ -56,7 +61,13 @@ public class GenericMachine<T> {
*/
private final SubRuleContext.Generator subRuleContextGenerator = new SubRuleContext.Generator();

public GenericMachine() {}
public GenericMachine() {
this(new Configuration.Builder().build());
}

public GenericMachine(Configuration configuration) {
this.configuration = configuration;
}

/**
* Return any rules that match the fields in the event in a way that is Array-Consistent (thus trailing "AC" on
Expand Down Expand Up @@ -322,6 +333,7 @@ private Set<Double> deleteStep(final NameState state,
if (!doesNameStateContainPattern(nextNameState, pattern) &&
deletePattern(state, key, pattern)) {
deletedKeys.add(key);
state.removeNextNameState(key, configuration);
}
}
}
Expand All @@ -340,6 +352,7 @@ private Set<Double> deleteStep(final NameState state,
// does not transition to the next NameState.
if (!doesNameStateContainPattern(nextNameState, pattern) && deletePattern(state, key, pattern)) {
deletedKeys.add(key);
state.removeNextNameState(key, configuration);
}
}
}
Expand Down Expand Up @@ -545,6 +558,15 @@ private boolean addStep(final NameState state,
// for each pattern, we'll provisionally add it to the BMC, which may already have it. Pass the states
// list in in case the BMC doesn't already have a next-step for this pattern and needs to make a new one
NameState lastNextState = null;

if (configuration.isAdditionalNameStateReuse()) {
lastNextState = state.getNextNameState(key);
if (lastNextState == null) {
lastNextState = new NameState();
state.addNextNameState(key, lastNextState, configuration);
}
}

Set<NameState> nameStates = new HashSet<>();
if (nameStatesForEachKey[keyIndex] == null) {
nameStatesForEachKey[keyIndex] = new HashSet<>();
Expand All @@ -553,7 +575,6 @@ private boolean addStep(final NameState state,
if (isNamePattern(pattern)) {
lastNextState = nameMatcher.addPattern(pattern, lastNextState == null ? new NameState() : lastNextState);
} else {
assert byteMachine != null;
lastNextState = byteMachine.addPattern(pattern, lastNextState);
}
nameStates.add(lastNextState);
Expand Down
5 changes: 5 additions & 0 deletions src/main/software/amazon/event/ruler/Machine.java
Original file line number Diff line number Diff line change
Expand Up @@ -14,5 +14,10 @@
public class Machine extends GenericMachine<String> {

public Machine() {
super();
}

public Machine(Configuration configuration) {
super(configuration);
}
}
21 changes: 21 additions & 0 deletions src/main/software/amazon/event/ruler/NameState.java
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,10 @@ class NameState {
// while add/delete Rule is active in another thread, without any locks.
private final Map<String, NameMatcher<NameState>> mustNotExistMatchers = new ConcurrentHashMap<>(1);

// Maps a key to the next NameState accessible via either valueTransitions or mustNotExistMatchers.
// Only used when Configuration is set for additionalNameStateReuse.
private final Map<String, NameState> keyToNextNameState = new ConcurrentHashMap<>();

// All rules, both terminal and non-terminal, keyed by pattern, that led to this NameState.
private final Map<Patterns, Set<Object>> patternToRules = new ConcurrentHashMap<>();

Expand Down Expand Up @@ -153,6 +157,12 @@ void removeKeyTransition(String name) {
mustNotExistMatchers.remove(name);
}

void removeNextNameState(String key, Configuration configuration) {
if (configuration.isAdditionalNameStateReuse()) {
keyToNextNameState.remove(key);
}
}

boolean isEmpty() {
return valueTransitions.isEmpty() &&
mustNotExistMatchers.isEmpty() &&
Expand Down Expand Up @@ -215,6 +225,12 @@ void addKeyTransition(final String key, final NameMatcher<NameState> to) {
mustNotExistMatchers.put(key, to);
}

void addNextNameState(final String key, final NameState nextNameState, final Configuration configuration) {
if (configuration.isAdditionalNameStateReuse()) {
keyToNextNameState.put(key, nextNameState);
}
}

NameMatcher<NameState> getKeyTransitionOn(final String token) {
return mustNotExistMatchers.get(token);
}
Expand Down Expand Up @@ -284,6 +300,10 @@ Set<NameState> getNameTransitions(final Event event, final ArrayMembership membe
return nextNameStates;
}

public NameState getNextNameState(String key) {
return keyToNextNameState.get(key);
}

public int evaluateComplexity(MachineComplexityEvaluator evaluator) {
int maxComplexity = evaluator.getMaxComplexity();
int complexity = 0;
Expand Down Expand Up @@ -321,6 +341,7 @@ public String toString() {
return "NameState{" +
"valueTransitions=" + valueTransitions +
", mustNotExistMatchers=" + mustNotExistMatchers +
", keyToNextNameState=" + keyToNextNameState +
", patternToRules=" + patternToRules +
", patternToTerminalSubRuleIds=" + patternToTerminalSubRuleIds +
", patternToNonTerminalSubRuleIds=" + patternToNonTerminalSubRuleIds +
Expand Down
19 changes: 19 additions & 0 deletions src/test/software/amazon/event/ruler/ConfigurationTest.java
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
package software.amazon.event.ruler;

import org.junit.Test;

import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

public class ConfigurationTest {

@Test
public void testAdditionalNameStateReuseTrue() {
assertTrue(new Configuration.Builder().withAdditionalNameStateReuse(true).build().isAdditionalNameStateReuse());
}

@Test
public void testAdditionalNameStateReuseFalse() {
assertFalse(new Configuration.Builder().withAdditionalNameStateReuse(false).build().isAdditionalNameStateReuse());
}
}
42 changes: 42 additions & 0 deletions src/test/software/amazon/event/ruler/MachineTest.java
Original file line number Diff line number Diff line change
Expand Up @@ -2621,4 +2621,46 @@ public void testLargeArrayRulesVsOR() throws Exception {
"}");
assertEquals(608, machine.approximateObjectCount(10000));
}

@Test
public void testApproximateObjectCountEachKeyHasThreePatternsAddedOneAtATime() throws Exception {
Machine machine = new Machine();
testApproximateObjectCountEachKeyHasThreePatternsAddedOneAtATime(machine);
assertEquals(72216, machine.approximateObjectCount(500000));
}

@Test
public void testApproximateObjectCountEachKeyHasThreePatternsAddedOneAtATimeWithAdditionalNameStateReuse() throws Exception {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok added. Got these results over three runs:

High NameState Reuse Memory Benchmark
Before: 66.6 (1)
After: 274.4 (223380)
Per rule: 270502 (290)
Low NameState Reuse Memory Benchmark
Before: 66.6 (1)
After: 608.6 (2625460)
Per rule: 705709 (3418)

High NameState Reuse Memory Benchmark
Before: 45.7 (1)
After: 209.3 (223380)
Per rule: 213056 (290)
Low NameState Reuse Memory Benchmark
Before: 66.6 (1)
After: 1438.9 (2625460)
Per rule: 1786864 (3418)

High NameState Reuse Memory Benchmark
Before: 58.2 (1)
After: 221.9 (223380)
Per rule: 213154 (290)
Low NameState Reuse Memory Benchmark
Before: 58.2 (1)
After: 1718.1 (2625460)
Per rule: 2161282 (3418)

Machine machine = new Machine(new Configuration.Builder().withAdditionalNameStateReuse(true).build());
testApproximateObjectCountEachKeyHasThreePatternsAddedOneAtATime(machine);
assertEquals(136, machine.approximateObjectCount(500000));
}

private void testApproximateObjectCountEachKeyHasThreePatternsAddedOneAtATime(Machine machine) throws Exception {
machine.addRule("0", "{\"key1\": [\"a\"]}");
machine.addRule("1", "{\"key1\": [\"b\"]}");
machine.addRule("2", "{\"key1\": [\"c\"]}");
machine.addRule("3", "{\"key1\": [\"a\", \"b\", \"c\"], \"key2\": [\"d\"]}");
machine.addRule("4", "{\"key1\": [\"a\", \"b\", \"c\"], \"key2\": [\"e\"]}");
machine.addRule("5", "{\"key1\": [\"a\", \"b\", \"c\"], \"key2\": [\"f\"]}");
machine.addRule("6", "{\"key1\": [\"a\", \"b\", \"c\"], \"key2\": [\"d\", \"e\", \"f\"], \"key3\": [\"g\"]}");
machine.addRule("7", "{\"key1\": [\"a\", \"b\", \"c\"], \"key2\": [\"d\", \"e\", \"f\"], \"key3\": [\"h\"]}");
machine.addRule("8", "{\"key1\": [\"a\", \"b\", \"c\"], \"key2\": [\"d\", \"e\", \"f\"], \"key3\": [\"i\"]}");
machine.addRule("9", "{\"key1\": [\"a\", \"b\", \"c\"], \"key2\": [\"d\", \"e\", \"f\"], \"key3\": [\"g\", \"h\", \"i\"], \"key4\": [\"j\"]}");
machine.addRule("10", "{\"key1\": [\"a\", \"b\", \"c\"], \"key2\": [\"d\", \"e\", \"f\"], \"key3\": [\"g\", \"h\", \"i\"], \"key4\": [\"k\"]}");
machine.addRule("11", "{\"key1\": [\"a\", \"b\", \"c\"], \"key2\": [\"d\", \"e\", \"f\"], \"key3\": [\"g\", \"h\", \"i\"], \"key4\": [\"l\"]}");
machine.addRule("12", "{\"key1\": [\"a\", \"b\", \"c\"], \"key2\": [\"d\", \"e\", \"f\"], \"key3\": [\"g\", \"h\", \"i\"], \"key4\": [\"j\", \"k\", \"l\"], \"key5\": [\"m\"]}");
machine.addRule("13", "{\"key1\": [\"a\", \"b\", \"c\"], \"key2\": [\"d\", \"e\", \"f\"], \"key3\": [\"g\", \"h\", \"i\"], \"key4\": [\"j\", \"k\", \"l\"], \"key5\": [\"n\"]}");
machine.addRule("14", "{\"key1\": [\"a\", \"b\", \"c\"], \"key2\": [\"d\", \"e\", \"f\"], \"key3\": [\"g\", \"h\", \"i\"], \"key4\": [\"j\", \"k\", \"l\"], \"key5\": [\"o\"]}");
machine.addRule("15", "{\"key1\": [\"a\", \"b\", \"c\"], \"key2\": [\"d\", \"e\", \"f\"], \"key3\": [\"g\", \"h\", \"i\"], \"key4\": [\"j\", \"k\", \"l\"], \"key5\": [\"m\", \"n\", \"o\"], \"key6\": [\"p\"]}");
machine.addRule("16", "{\"key1\": [\"a\", \"b\", \"c\"], \"key2\": [\"d\", \"e\", \"f\"], \"key3\": [\"g\", \"h\", \"i\"], \"key4\": [\"j\", \"k\", \"l\"], \"key5\": [\"m\", \"n\", \"o\"], \"key6\": [\"q\"]}");
machine.addRule("17", "{\"key1\": [\"a\", \"b\", \"c\"], \"key2\": [\"d\", \"e\", \"f\"], \"key3\": [\"g\", \"h\", \"i\"], \"key4\": [\"j\", \"k\", \"l\"], \"key5\": [\"m\", \"n\", \"o\"], \"key6\": [\"r\"]}");
machine.addRule("18", "{\"key1\": [\"a\", \"b\", \"c\"], \"key2\": [\"d\", \"e\", \"f\"], \"key3\": [\"g\", \"h\", \"i\"], \"key4\": [\"j\", \"k\", \"l\"], \"key5\": [\"m\", \"n\", \"o\"], \"key6\": [\"p\", \"q\", \"r\"], \"key7\": [\"s\"]}");
machine.addRule("19", "{\"key1\": [\"a\", \"b\", \"c\"], \"key2\": [\"d\", \"e\", \"f\"], \"key3\": [\"g\", \"h\", \"i\"], \"key4\": [\"j\", \"k\", \"l\"], \"key5\": [\"m\", \"n\", \"o\"], \"key6\": [\"p\", \"q\", \"r\"], \"key7\": [\"t\"]}");
machine.addRule("20", "{\"key1\": [\"a\", \"b\", \"c\"], \"key2\": [\"d\", \"e\", \"f\"], \"key3\": [\"g\", \"h\", \"i\"], \"key4\": [\"j\", \"k\", \"l\"], \"key5\": [\"m\", \"n\", \"o\"], \"key6\": [\"p\", \"q\", \"r\"], \"key7\": [\"u\"]}");
machine.addRule("21", "{\"key1\": [\"a\", \"b\", \"c\"], \"key2\": [\"d\", \"e\", \"f\"], \"key3\": [\"g\", \"h\", \"i\"], \"key4\": [\"j\", \"k\", \"l\"], \"key5\": [\"m\", \"n\", \"o\"], \"key6\": [\"p\", \"q\", \"r\"], \"key7\": [\"s\", \"t\", \"u\"], \"key8\": [\"v\"]}");
machine.addRule("22", "{\"key1\": [\"a\", \"b\", \"c\"], \"key2\": [\"d\", \"e\", \"f\"], \"key3\": [\"g\", \"h\", \"i\"], \"key4\": [\"j\", \"k\", \"l\"], \"key5\": [\"m\", \"n\", \"o\"], \"key6\": [\"p\", \"q\", \"r\"], \"key7\": [\"s\", \"t\", \"u\"], \"key8\": [\"w\"]}");
machine.addRule("23", "{\"key1\": [\"a\", \"b\", \"c\"], \"key2\": [\"d\", \"e\", \"f\"], \"key3\": [\"g\", \"h\", \"i\"], \"key4\": [\"j\", \"k\", \"l\"], \"key5\": [\"m\", \"n\", \"o\"], \"key6\": [\"p\", \"q\", \"r\"], \"key7\": [\"s\", \"t\", \"u\"], \"key8\": [\"x\"]}");
machine.addRule("24", "{\"key1\": [\"a\", \"b\", \"c\"], \"key2\": [\"d\", \"e\", \"f\"], \"key3\": [\"g\", \"h\", \"i\"], \"key4\": [\"j\", \"k\", \"l\"], \"key5\": [\"m\", \"n\", \"o\"], \"key6\": [\"p\", \"q\", \"r\"], \"key7\": [\"s\", \"t\", \"u\"], \"key8\": [\"v\", \"w\", \"x\"], \"key9\": [\"y\"]}");
}
}
22 changes: 22 additions & 0 deletions src/test/software/amazon/event/ruler/NameStateTest.java
Original file line number Diff line number Diff line change
Expand Up @@ -115,4 +115,26 @@ public void testContainsRule() {
assertFalse(nameState.containsRule("rule3", Patterns.exactMatch("a")));
assertFalse(nameState.containsRule("rule1", Patterns.exactMatch("c")));
}

@Test
public void testNextNameStateWithoutAdditionalNameStateReuse() {
NameState nameState = new NameState();
NameState nextNameState = new NameState();
Configuration withoutAdditionalNameStateReuse =
new Configuration.Builder().withAdditionalNameStateReuse(false).build();
nameState.addNextNameState("key", nextNameState, withoutAdditionalNameStateReuse);
assertNull(nameState.getNextNameState("key"));
}

@Test
public void testNextNameStateWithAdditionalNameStateReuse() {
NameState nameState = new NameState();
NameState nextNameState = new NameState();
Configuration withAdditionalNameStateReuse =
new Configuration.Builder().withAdditionalNameStateReuse(true).build();
nameState.addNextNameState("key", nextNameState, withAdditionalNameStateReuse);
assertEquals(nextNameState, nameState.getNextNameState("key"));
nameState.removeNextNameState("key", withAdditionalNameStateReuse);
assertNull(nameState.getNextNameState("key"));
}
}