Adding optional Configuration to Machine that can be used to enable a… #125

Merged: 4 commits merged on Nov 21, 2023

38 changes: 38 additions & 0 deletions README.md
@@ -480,6 +480,44 @@ the strings it stored and returned were thought of as rule names.
For safety, the type used to "name" rules should be immutable. If you change the content of an object while
it's being used as a rule name, this may break the operation of Ruler.

### Configuration

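GenericMachine and Machine can optionally be configured through a builder, GenericMachine.builder() or Machine.builder(),
which builds a GenericMachineConfiguration with the following configuration options. The no-argument constructors are
deprecated and use the default configuration.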
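
In this PR the options are set through a builder; the benchmarks, for example, call
Machine.builder().withAdditionalNameStateReuse(true).build(). The self-contained sketch below mirrors that shape with
hypothetical classes (ToyMachine, ToyConfiguration) so it runs without Ruler on the classpath: the builder holds mutable
settings, and build() copies them once into an immutable configuration object.

```java
// Hypothetical classes for illustration only; they mirror the builder shape in this PR, not Ruler's code.
public class ConfigurationSketch {

    /** Immutable settings, analogous to GenericMachineConfiguration. */
    static final class ToyConfiguration {
        private final boolean additionalNameStateReuse;

        ToyConfiguration(boolean additionalNameStateReuse) {
            this.additionalNameStateReuse = additionalNameStateReuse;
        }

        boolean isAdditionalNameStateReuse() {
            return additionalNameStateReuse;
        }
    }

    /** Stand-in for a machine that only remembers the configuration it was built with. */
    static final class ToyMachine {
        private final ToyConfiguration configuration;

        private ToyMachine(ToyConfiguration configuration) {
            this.configuration = configuration;
        }

        ToyConfiguration configuration() {
            return configuration;
        }

        static Builder builder() {
            return new Builder();
        }

        static final class Builder {
            // Same default as the documented option: opportunistic reuse only.
            private boolean additionalNameStateReuse = false;

            Builder withAdditionalNameStateReuse(boolean additionalNameStateReuse) {
                this.additionalNameStateReuse = additionalNameStateReuse;
                return this;
            }

            ToyMachine build() {
                // Settings are copied into an immutable object once, at build time.
                return new ToyMachine(new ToyConfiguration(additionalNameStateReuse));
            }
        }
    }

    public static void main(String[] args) {
        ToyMachine defaults = ToyMachine.builder().build();
        ToyMachine reuse = ToyMachine.builder().withAdditionalNameStateReuse(true).build();
        System.out.println(defaults.configuration().isAdditionalNameStateReuse()); // false
        System.out.println(reuse.configuration().isAdditionalNameStateReuse());    // true
    }
}
```

Keeping the configuration immutable means a machine's behavior cannot change after rules have been added under one
setting.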

#### additionalNameStateReuse

Default: false

Normally, NameStates are re-used for a given key subsequence and pattern if this key subsequence and pattern have been
previously added, or if a pattern has already been added for the given key subsequence. Hence, by default, NameState
re-use is opportunistic. Setting this flag to true forces NameState re-use for a key subsequence: the first pattern added
for a key subsequence re-uses the existing NameState if that key subsequence has been added before, so each key
subsequence has a single NameState. In some cases this reduces memory usage exponentially, but it also leads to more
sub-rules being stored in individual NameStates, which Ruler sometimes iterates over, and that can cause a modest runtime
performance regression. This defaults to false for backwards compatibility, but all but the most latency-sensitive
applications are likely to benefit from setting it to true.

Comment on lines +496 to +497

Collaborator: Is it worth setting this to true by default, since that's the applicable state for most users?

Contributor Author: Since I know of at least one case where a team will be impacted by setting this to true, I erred on the side of making it false by default to make version upgrades safer and easier.

Collaborator: 👍. In that case, it makes sense to be vocal about the feature in the release notes and any upgrade docs we write; otherwise there's a chance that folks may not use this behaviour as often as we'd like them to.


Here's a simple example. Consider:

```java
machine.addRule("0", "{\"key1\": [\"a\", \"b\", \"c\"]}");
```

The pattern "a" creates a NameState, and then, even with additionalNameStateReuse=false, the second pattern ("b") and
third pattern ("c") re-use that same NameState. But consider the following instead:

```java
machine.addRule("0", "{\"key1\": [\"a\"]}");
machine.addRule("1", "{\"key1\": [\"b\"]}");
machine.addRule("2", "{\"key1\": [\"c\"]}");
```

Now, with additionalNameStateReuse=false, we end up with three NameStates, because the first pattern encountered for a
key subsequence on each rule addition will create a new NameState. So, "a", "b", and "c" all get their own NameStates.
However, with additionalNameStateReuse=true, "a" will create a new NameState, and then "b" and "c" will reuse that same
NameState. This works because the Machine records that it already has a NameState for the key subsequence "key1".

Note that it doesn't matter if each addRule uses a different rule name or the same rule name.
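
The two cases above can be reproduced with a toy model. This is a simplified sketch, not Ruler's implementation:
NameStateReuseSketch and ToyNameState are hypothetical, each pattern is an exact string, and only NameStates created
after the start state are counted. It follows the same rule as addStep in this PR: a pattern that already has a
transition reuses its NameState, the first new pattern in an addRule call creates one (unless forced reuse already
supplied one for the key), and later new patterns in the same call share it.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the README example; ToyNameState and this class are hypothetical, not Ruler types.
public class NameStateReuseSketch {

    /** A NameState reduced to what matters here: value transitions and the per-key reuse map. */
    static final class ToyNameState {
        final Map<String, Map<String, ToyNameState>> valueTransitions = new HashMap<>();
        final Map<String, ToyNameState> keyToNextNameState = new HashMap<>();
    }

    private final boolean additionalNameStateReuse;
    private final ToyNameState start = new ToyNameState();
    private int created = 0;

    NameStateReuseSketch(boolean additionalNameStateReuse) {
        this.additionalNameStateReuse = additionalNameStateReuse;
    }

    /** Adds a single-key rule such as {"key1": ["a", "b", "c"]} at the start state. */
    void addRule(String key, String... values) {
        ToyNameState lastNextState = null;
        if (additionalNameStateReuse) {
            // Forced reuse: one NameState per key, created on first use and recorded.
            lastNextState = start.keyToNextNameState.get(key);
            if (lastNextState == null) {
                lastNextState = newNameState();
                start.keyToNextNameState.put(key, lastNextState);
            }
        }
        Map<String, ToyNameState> transitions =
                start.valueTransitions.computeIfAbsent(key, k -> new HashMap<>());
        for (String value : values) {
            ToyNameState existing = transitions.get(value);
            if (existing != null) {
                lastNextState = existing; // pattern already added: reuse its NameState
            } else {
                if (lastNextState == null) {
                    lastNextState = newNameState(); // first new pattern in this addRule call
                }
                transitions.put(value, lastNextState); // later patterns share it (opportunistic)
            }
        }
    }

    private ToyNameState newNameState() {
        created++;
        return new ToyNameState();
    }

    int createdNameStates() {
        return created;
    }

    public static void main(String[] args) {
        for (boolean reuse : new boolean[] {false, true}) {
            NameStateReuseSketch oneRule = new NameStateReuseSketch(reuse);
            oneRule.addRule("key1", "a", "b", "c");

            NameStateReuseSketch threeRules = new NameStateReuseSketch(reuse);
            threeRules.addRule("key1", "a");
            threeRules.addRule("key1", "b");
            threeRules.addRule("key1", "c");

            System.out.println("additionalNameStateReuse=" + reuse
                    + ": one rule -> " + oneRule.createdNameStates()
                    + ", three rules -> " + threeRules.createdNameStates());
        }
        // additionalNameStateReuse=false: one rule -> 1, three rules -> 3
        // additionalNameStateReuse=true: one rule -> 1, three rules -> 1
    }
}
```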

### addRule()

All forms of this method have the same first argument, a String which provides
2 changes: 1 addition & 1 deletion pom.xml
@@ -20,7 +20,7 @@
<groupId>software.amazon.event.ruler</groupId>
<artifactId>event-ruler</artifactId>
<name>Event Ruler</name>
-<version>1.5.0</version>
<version>1.6.0</version>
<description>Event Ruler is a Java library that allows matching Rules to Events. An event is a list of fields,
which may be given as name/value pairs or as a JSON object. A rule associates event field names with lists of
possible values. There are two reasons to use Ruler: 1/ It's fast; the time it takes to match Events doesn't
57 changes: 55 additions & 2 deletions src/main/software/amazon/event/ruler/GenericMachine.java
@@ -37,6 +37,11 @@ public class GenericMachine<T> {
*/
private static final int MAXIMUM_RULE_SIZE = 256;

/**
* Configuration for the Machine.
*/
private final GenericMachineConfiguration configuration;

/**
* The start state of matching and adding rules.
*/
@@ -56,7 +61,14 @@ public class GenericMachine<T> {
*/
private final SubRuleContext.Generator subRuleContextGenerator = new SubRuleContext.Generator();

-public GenericMachine() {}
@Deprecated
public GenericMachine() {
this(builder().buildConfig());
}

protected GenericMachine(GenericMachineConfiguration configuration) {
this.configuration = configuration;
}

/**
* Return any rules that match the fields in the event in a way that is Array-Consistent (thus trailing "AC" on
@@ -322,6 +334,7 @@ private Set<Double> deleteStep(final NameState state,
if (!doesNameStateContainPattern(nextNameState, pattern) &&
deletePattern(state, key, pattern)) {
deletedKeys.add(key);
state.removeNextNameState(key, configuration);
}
}
}
@@ -340,6 +353,7 @@ private Set<Double> deleteStep(final NameState state,
// does not transition to the next NameState.
if (!doesNameStateContainPattern(nextNameState, pattern) && deletePattern(state, key, pattern)) {
deletedKeys.add(key);
state.removeNextNameState(key, configuration);
}
}
}
@@ -545,6 +559,15 @@ private boolean addStep(final NameState state,
// for each pattern, we'll provisionally add it to the BMC, which may already have it. Pass the states
// list in in case the BMC doesn't already have a next-step for this pattern and needs to make a new one
NameState lastNextState = null;

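// With additionalNameStateReuse, each key subsequence maps to exactly one NameState: reuse the one recorded for
// this key, or create and record it so that later rules with this key subsequence reuse it as well.
if (configuration.isAdditionalNameStateReuse()) {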
lastNextState = state.getNextNameState(key);
if (lastNextState == null) {
lastNextState = new NameState();
state.addNextNameState(key, lastNextState, configuration);
}
}

Set<NameState> nameStates = new HashSet<>();
if (nameStatesForEachKey[keyIndex] == null) {
nameStatesForEachKey[keyIndex] = new HashSet<>();
@@ -553,7 +576,6 @@ private boolean addStep(final NameState state,
if (isNamePattern(pattern)) {
lastNextState = nameMatcher.addPattern(pattern, lastNextState == null ? new NameState() : lastNextState);
} else {
-assert byteMachine != null;
lastNextState = byteMachine.addPattern(pattern, lastNextState);
}
nameStates.add(lastNextState);
@@ -678,5 +700,36 @@ public String toString() {
", fieldStepsUsedRefCount=" + fieldStepsUsedRefCount +
'}';
}

public static Builder builder() {
return new Builder();
}

public static class Builder<T extends GenericMachine> {

/**
* Normally, NameStates are re-used for a given key subsequence and pattern if this key subsequence and pattern have
* been previously added, or if a pattern has already been added for the given key subsequence. Hence, by default,
* NameState re-use is opportunistic. Setting this flag to true forces NameState re-use for a key subsequence: the
* first pattern added for a key subsequence re-uses the existing NameState if that key subsequence has been added
* before, so each key subsequence has a single NameState. In some cases this reduces memory usage exponentially,
* but it also leads to more sub-rules being stored in individual NameStates, which Ruler sometimes iterates over,
* and that can cause a modest runtime performance regression.
*/
private boolean additionalNameStateReuse = false;

public Builder<T> withAdditionalNameStateReuse(boolean additionalNameStateReuse) {
this.additionalNameStateReuse = additionalNameStateReuse;
return this;
}

public T build() {
return (T) new GenericMachine(buildConfig());
}

protected GenericMachineConfiguration buildConfig() {
return new GenericMachineConfiguration(additionalNameStateReuse);
}
}
}
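
The addStep and deleteStep changes above amount to a per-key lookup-or-create map with cleanup on delete. The sketch
below isolates that bookkeeping in a hypothetical NextNameStateRegistrySketch class (not part of Ruler). It uses
Map.computeIfAbsent where the PR does an explicit getNextNameState followed by addNextNameState; for single-threaded
use the two behave the same. It also assumes, as the deletedKeys handling in deleteStep suggests, that the delete hook
runs once a key's transition has been deleted, so a later rule for that key starts from a fresh NameState.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the keyToNextNameState bookkeeping; not Ruler's NameState class.
public class NextNameStateRegistrySketch {

    static final class ToyNameState { }

    private final boolean additionalNameStateReuse;
    // Mirrors NameState.keyToNextNameState: only populated when the flag is on.
    private final Map<String, ToyNameState> keyToNextNameState = new HashMap<>();

    NextNameStateRegistrySketch(boolean additionalNameStateReuse) {
        this.additionalNameStateReuse = additionalNameStateReuse;
    }

    /** Add path: the NameState recorded for this key, created on first use; null when the flag is off. */
    ToyNameState nextNameStateForAdd(String key) {
        if (!additionalNameStateReuse) {
            return null; // caller falls back to opportunistic reuse
        }
        return keyToNextNameState.computeIfAbsent(key, k -> new ToyNameState());
    }

    /** Delete path: forget the mapping once the key's transition has been deleted. */
    void onKeyDeleted(String key) {
        if (additionalNameStateReuse) {
            keyToNextNameState.remove(key);
        }
    }

    public static void main(String[] args) {
        NextNameStateRegistrySketch registry = new NextNameStateRegistrySketch(true);
        ToyNameState first = registry.nextNameStateForAdd("key1");
        ToyNameState again = registry.nextNameStateForAdd("key1");
        registry.onKeyDeleted("key1");
        ToyNameState fresh = registry.nextNameStateForAdd("key1");
        System.out.println(first == again); // true: same key, same NameState
        System.out.println(first == fresh); // false: the mapping was dropped on delete
    }
}
```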

@@ -0,0 +1,18 @@
package software.amazon.event.ruler;

/**
* Configuration for a GenericMachine. For descriptions of the options, see GenericMachine.Builder.
*/
class GenericMachineConfiguration {

private final boolean additionalNameStateReuse;

GenericMachineConfiguration(boolean additionalNameStateReuse) {
this.additionalNameStateReuse = additionalNameStateReuse;
}

boolean isAdditionalNameStateReuse() {
return additionalNameStateReuse;
}
}

17 changes: 17 additions & 0 deletions src/main/software/amazon/event/ruler/Machine.java
@@ -13,6 +13,23 @@
*/
public class Machine extends GenericMachine<String> {

@Deprecated
public Machine() {
super();
}

private Machine(GenericMachineConfiguration configuration) {
super(configuration);
}

public static Builder builder() {
return new Builder();
}

public static class Builder extends GenericMachine.Builder<Machine> {
@Override
public Machine build() {
return new Machine(buildConfig());
}
}
}
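
Machine.builder() returns Machine.Builder, whose inherited withAdditionalNameStateReuse(...) is declared to return
GenericMachine.Builder<Machine>. The chained build() call still yields a Machine because dynamic dispatch selects
Machine.Builder's override. The self-contained sketch below reproduces the pattern with hypothetical names
(BaseMachine, StringMachine, BaseBuilder, StringMachineBuilder) standing in for the Ruler classes.

```java
// Hypothetical names; the pattern mirrors GenericMachine.Builder<T> and Machine.Builder in this PR.
public class BuilderDispatchSketch {

    static class BaseMachine {
        final boolean reuse;

        BaseMachine(boolean reuse) {
            this.reuse = reuse;
        }
    }

    static final class StringMachine extends BaseMachine {
        StringMachine(boolean reuse) {
            super(reuse);
        }
    }

    static class BaseBuilder<T extends BaseMachine> {
        protected boolean reuse = false;

        // Declared to return BaseBuilder<T>, so chaining keeps the type parameter.
        public BaseBuilder<T> withReuse(boolean reuse) {
            this.reuse = reuse;
            return this;
        }

        // The unchecked cast is only correct when T is BaseMachine itself.
        @SuppressWarnings("unchecked")
        public T build() {
            return (T) new BaseMachine(reuse);
        }
    }

    static final class StringMachineBuilder extends BaseBuilder<StringMachine> {
        @Override
        public StringMachine build() {
            return new StringMachine(reuse);
        }
    }

    public static void main(String[] args) {
        // withReuse() is typed as BaseBuilder<StringMachine>, but the object is still a
        // StringMachineBuilder, so build() dispatches to the override.
        StringMachine machine = new StringMachineBuilder().withReuse(true).build();
        System.out.println(machine.getClass().getSimpleName() + " reuse=" + machine.reuse);
        // prints: StringMachine reuse=true
    }
}
```

Because the base build() relies on an unchecked cast, a subclass builder that did not override it would compile but
throw a ClassCastException where the result is assigned; each machine subclass therefore needs its own build()
override, as Machine.Builder provides.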
21 changes: 21 additions & 0 deletions src/main/software/amazon/event/ruler/NameState.java
@@ -28,6 +28,10 @@ class NameState {
// while add/delete Rule is active in another thread, without any locks.
private final Map<String, NameMatcher<NameState>> mustNotExistMatchers = new ConcurrentHashMap<>(1);

// Maps a key to the next NameState accessible via either valueTransitions or mustNotExistMatchers.
// Only used when Configuration is set for additionalNameStateReuse.
private final Map<String, NameState> keyToNextNameState = new ConcurrentHashMap<>();

// All rules, both terminal and non-terminal, keyed by pattern, that led to this NameState.
private final Map<Patterns, Set<Object>> patternToRules = new ConcurrentHashMap<>();

@@ -153,6 +157,12 @@ void removeKeyTransition(String name) {
mustNotExistMatchers.remove(name);
}

void removeNextNameState(String key, GenericMachineConfiguration configuration) {
if (configuration.isAdditionalNameStateReuse()) {
keyToNextNameState.remove(key);
}
}

boolean isEmpty() {
return valueTransitions.isEmpty() &&
mustNotExistMatchers.isEmpty() &&
@@ -215,6 +225,12 @@ void addKeyTransition(final String key, final NameMatcher<NameState> to) {
mustNotExistMatchers.put(key, to);
}

void addNextNameState(final String key, final NameState nextNameState, final GenericMachineConfiguration configuration) {
if (configuration.isAdditionalNameStateReuse()) {
keyToNextNameState.put(key, nextNameState);
}
}

NameMatcher<NameState> getKeyTransitionOn(final String token) {
return mustNotExistMatchers.get(token);
}
@@ -284,6 +300,10 @@ Set<NameState> getNameTransitions(final Event event, final ArrayMembership membe
return nextNameStates;
}

public NameState getNextNameState(String key) {
return keyToNextNameState.get(key);
}

public int evaluateComplexity(MachineComplexityEvaluator evaluator) {
int maxComplexity = evaluator.getMaxComplexity();
int complexity = 0;
@@ -321,6 +341,7 @@ public String toString() {
return "NameState{" +
"valueTransitions=" + valueTransitions +
", mustNotExistMatchers=" + mustNotExistMatchers +
", keyToNextNameState=" + keyToNextNameState +
", patternToRules=" + patternToRules +
", patternToTerminalSubRuleIds=" + patternToTerminalSubRuleIds +
", patternToNonTerminalSubRuleIds=" + patternToNonTerminalSubRuleIds +
46 changes: 46 additions & 0 deletions src/test/software/amazon/event/ruler/Benchmarks.java
@@ -546,6 +546,52 @@ public void exactRuleMemoryBenchmark() throws Exception {
rules.clear();
}

@Test
public void lowNameStateReuseMemoryBenchmark() throws Exception {
Machine machine = new Machine();
System.out.println("Low NameState Reuse Memory Benchmark");
nameStateReuseMemoryBenchmark(machine);
}

@Test
public void highNameStateReuseMemoryBenchmark() throws Exception {
Machine machine = Machine.builder().withAdditionalNameStateReuse(true).build();
System.out.println("High NameState Reuse Memory Benchmark");
nameStateReuseMemoryBenchmark(machine);
}

private void nameStateReuseMemoryBenchmark(Machine machine) throws Exception {
int maxKeys = 256;
System.gc();
long memBefore = Runtime.getRuntime().freeMemory();
int sizeBefore = machine.approximateObjectCount();
System.out.printf("Before: %.1f (%d)\n", 1.0 * memBefore / 1000000, sizeBefore);

// For a readable version with a similar setup to the rules being added here, see
// MachineTest.testApproximateObjectCountEachKeyHasThreePatternsAddedOneAtATime. By adding one pattern at a time
// for each key, we create three different branches in the low NameState reuse test, but a single branch in the
// high NameState reuse test. So with low NameState reuse, Machine size grows exponentially with number of keys.
for (int i = 0; i < maxKeys; i++) {
StringBuilder prefix = new StringBuilder();
for (int j = 0; j < i; j++) {
int k = 3 * j;
prefix.append("\"key" + k + "\": [\"" + k + "\", \"" + (k + 1) + "\", \"" + (k + 2) + "\"], ");
}
int k = 3 * i;
machine.addRule(String.valueOf(k), "{" + prefix + "\"key" + i + "\": [\"" + k + "\"]}");
machine.addRule(String.valueOf(k + 1), "{" + prefix + "\"key" + i + "\": [\"" + (k + 1) + "\"]}");
machine.addRule(String.valueOf(k + 2), "{" + prefix + "\"key" + i + "\": [\"" + (k + 2) + "\"]}");
}

System.gc();
long memAfter = Runtime.getRuntime().freeMemory();
int sizeAfter = machine.approximateObjectCount();
System.out.printf("After: %.1f (%d)\n", 1.0 * memAfter / 1000000, sizeAfter);
int perRuleMem = (int) ((1.0 * (memAfter - memBefore)) / (maxKeys * 3));
int perRuleSize = (int) ((1.0 * (sizeAfter - sizeBefore)) / (maxKeys * 3));
System.out.println("Per rule: " + perRuleMem + " (" + perRuleSize + ")");
}

@Test
public void AnythingButPerformanceBenchmark() throws Exception {
readCityLots2();
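
nameStateReuseMemoryBenchmark reads Runtime.freeMemory() before and after adding rules. Free memory is measured against
the current heap size, so it can rise when the JVM grows the heap; in the runs reported later in this PR, the "After"
figure is larger than "Before". A common alternative, shown here as a sketch and not as what the benchmark does, is to
difference the heap in use, totalMemory() - freeMemory(). HeapUsageSketch is hypothetical, and System.gc() is only a
request either way, so both readings stay approximate.

```java
// Illustrative alternative measurement; HeapUsageSketch is hypothetical and not part of the benchmark.
public class HeapUsageSketch {

    /** Bytes of heap currently in use: the current heap size minus its free portion. */
    static long usedHeapBytes() {
        Runtime runtime = Runtime.getRuntime();
        return runtime.totalMemory() - runtime.freeMemory();
    }

    public static void main(String[] args) {
        System.gc(); // only a hint; the JVM may ignore it
        long before = usedHeapBytes();

        // Stand-in for "add rules": allocate data that stays reachable until the second reading.
        int[][] retained = new int[1_000][];
        for (int i = 0; i < retained.length; i++) {
            retained[i] = new int[1_000];
        }

        System.gc();
        long after = usedHeapBytes();
        System.out.printf("Used before: %.1f MB, after: %.1f MB (%d arrays retained)%n",
                before / 1e6, after / 1e6, retained.length);
    }
}
```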
@@ -0,0 +1,19 @@
package software.amazon.event.ruler;

import org.junit.Test;

import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

public class GenericMachineConfigurationTest {

@Test
public void testAdditionalNameStateReuseTrue() {
assertTrue(new GenericMachineConfiguration(true).isAdditionalNameStateReuse());
}

@Test
public void testAdditionalNameStateReuseFalse() {
assertFalse(new GenericMachineConfiguration(false).isAdditionalNameStateReuse());
}
}
42 changes: 42 additions & 0 deletions src/test/software/amazon/event/ruler/MachineTest.java
@@ -2621,4 +2621,46 @@ public void testLargeArrayRulesVsOR() throws Exception {
"}");
assertEquals(608, machine.approximateObjectCount(10000));
}

@Test
public void testApproximateObjectCountEachKeyHasThreePatternsAddedOneAtATime() throws Exception {
Machine machine = new Machine();
testApproximateObjectCountEachKeyHasThreePatternsAddedOneAtATime(machine);
assertEquals(72216, machine.approximateObjectCount(500000));
}

@Test
public void testApproximateObjectCountEachKeyHasThreePatternsAddedOneAtATimeWithAdditionalNameStateReuse() throws Exception {
Machine machine = Machine.builder().withAdditionalNameStateReuse(true).build();
testApproximateObjectCountEachKeyHasThreePatternsAddedOneAtATime(machine);
assertEquals(136, machine.approximateObjectCount(500000));
}

Collaborator

Contributor Author: Ok, added. Got these results over three runs:

High NameState Reuse Memory Benchmark
Before: 66.6 (1)
After: 274.4 (223380)
Per rule: 270502 (290)
Low NameState Reuse Memory Benchmark
Before: 66.6 (1)
After: 608.6 (2625460)
Per rule: 705709 (3418)

High NameState Reuse Memory Benchmark
Before: 45.7 (1)
After: 209.3 (223380)
Per rule: 213056 (290)
Low NameState Reuse Memory Benchmark
Before: 66.6 (1)
After: 1438.9 (2625460)
Per rule: 1786864 (3418)

High NameState Reuse Memory Benchmark
Before: 58.2 (1)
After: 221.9 (223380)
Per rule: 213154 (290)
Low NameState Reuse Memory Benchmark
Before: 58.2 (1)
After: 1718.1 (2625460)
Per rule: 2161282 (3418)

private void testApproximateObjectCountEachKeyHasThreePatternsAddedOneAtATime(Machine machine) throws Exception {
machine.addRule("0", "{\"key1\": [\"a\"]}");
machine.addRule("1", "{\"key1\": [\"b\"]}");
machine.addRule("2", "{\"key1\": [\"c\"]}");
machine.addRule("3", "{\"key1\": [\"a\", \"b\", \"c\"], \"key2\": [\"d\"]}");
machine.addRule("4", "{\"key1\": [\"a\", \"b\", \"c\"], \"key2\": [\"e\"]}");
machine.addRule("5", "{\"key1\": [\"a\", \"b\", \"c\"], \"key2\": [\"f\"]}");
machine.addRule("6", "{\"key1\": [\"a\", \"b\", \"c\"], \"key2\": [\"d\", \"e\", \"f\"], \"key3\": [\"g\"]}");
machine.addRule("7", "{\"key1\": [\"a\", \"b\", \"c\"], \"key2\": [\"d\", \"e\", \"f\"], \"key3\": [\"h\"]}");
machine.addRule("8", "{\"key1\": [\"a\", \"b\", \"c\"], \"key2\": [\"d\", \"e\", \"f\"], \"key3\": [\"i\"]}");
machine.addRule("9", "{\"key1\": [\"a\", \"b\", \"c\"], \"key2\": [\"d\", \"e\", \"f\"], \"key3\": [\"g\", \"h\", \"i\"], \"key4\": [\"j\"]}");
machine.addRule("10", "{\"key1\": [\"a\", \"b\", \"c\"], \"key2\": [\"d\", \"e\", \"f\"], \"key3\": [\"g\", \"h\", \"i\"], \"key4\": [\"k\"]}");
machine.addRule("11", "{\"key1\": [\"a\", \"b\", \"c\"], \"key2\": [\"d\", \"e\", \"f\"], \"key3\": [\"g\", \"h\", \"i\"], \"key4\": [\"l\"]}");
machine.addRule("12", "{\"key1\": [\"a\", \"b\", \"c\"], \"key2\": [\"d\", \"e\", \"f\"], \"key3\": [\"g\", \"h\", \"i\"], \"key4\": [\"j\", \"k\", \"l\"], \"key5\": [\"m\"]}");
machine.addRule("13", "{\"key1\": [\"a\", \"b\", \"c\"], \"key2\": [\"d\", \"e\", \"f\"], \"key3\": [\"g\", \"h\", \"i\"], \"key4\": [\"j\", \"k\", \"l\"], \"key5\": [\"n\"]}");
machine.addRule("14", "{\"key1\": [\"a\", \"b\", \"c\"], \"key2\": [\"d\", \"e\", \"f\"], \"key3\": [\"g\", \"h\", \"i\"], \"key4\": [\"j\", \"k\", \"l\"], \"key5\": [\"o\"]}");
machine.addRule("15", "{\"key1\": [\"a\", \"b\", \"c\"], \"key2\": [\"d\", \"e\", \"f\"], \"key3\": [\"g\", \"h\", \"i\"], \"key4\": [\"j\", \"k\", \"l\"], \"key5\": [\"m\", \"n\", \"o\"], \"key6\": [\"p\"]}");
machine.addRule("16", "{\"key1\": [\"a\", \"b\", \"c\"], \"key2\": [\"d\", \"e\", \"f\"], \"key3\": [\"g\", \"h\", \"i\"], \"key4\": [\"j\", \"k\", \"l\"], \"key5\": [\"m\", \"n\", \"o\"], \"key6\": [\"q\"]}");
machine.addRule("17", "{\"key1\": [\"a\", \"b\", \"c\"], \"key2\": [\"d\", \"e\", \"f\"], \"key3\": [\"g\", \"h\", \"i\"], \"key4\": [\"j\", \"k\", \"l\"], \"key5\": [\"m\", \"n\", \"o\"], \"key6\": [\"r\"]}");
machine.addRule("18", "{\"key1\": [\"a\", \"b\", \"c\"], \"key2\": [\"d\", \"e\", \"f\"], \"key3\": [\"g\", \"h\", \"i\"], \"key4\": [\"j\", \"k\", \"l\"], \"key5\": [\"m\", \"n\", \"o\"], \"key6\": [\"p\", \"q\", \"r\"], \"key7\": [\"s\"]}");
machine.addRule("19", "{\"key1\": [\"a\", \"b\", \"c\"], \"key2\": [\"d\", \"e\", \"f\"], \"key3\": [\"g\", \"h\", \"i\"], \"key4\": [\"j\", \"k\", \"l\"], \"key5\": [\"m\", \"n\", \"o\"], \"key6\": [\"p\", \"q\", \"r\"], \"key7\": [\"t\"]}");
machine.addRule("20", "{\"key1\": [\"a\", \"b\", \"c\"], \"key2\": [\"d\", \"e\", \"f\"], \"key3\": [\"g\", \"h\", \"i\"], \"key4\": [\"j\", \"k\", \"l\"], \"key5\": [\"m\", \"n\", \"o\"], \"key6\": [\"p\", \"q\", \"r\"], \"key7\": [\"u\"]}");
machine.addRule("21", "{\"key1\": [\"a\", \"b\", \"c\"], \"key2\": [\"d\", \"e\", \"f\"], \"key3\": [\"g\", \"h\", \"i\"], \"key4\": [\"j\", \"k\", \"l\"], \"key5\": [\"m\", \"n\", \"o\"], \"key6\": [\"p\", \"q\", \"r\"], \"key7\": [\"s\", \"t\", \"u\"], \"key8\": [\"v\"]}");
machine.addRule("22", "{\"key1\": [\"a\", \"b\", \"c\"], \"key2\": [\"d\", \"e\", \"f\"], \"key3\": [\"g\", \"h\", \"i\"], \"key4\": [\"j\", \"k\", \"l\"], \"key5\": [\"m\", \"n\", \"o\"], \"key6\": [\"p\", \"q\", \"r\"], \"key7\": [\"s\", \"t\", \"u\"], \"key8\": [\"w\"]}");
machine.addRule("23", "{\"key1\": [\"a\", \"b\", \"c\"], \"key2\": [\"d\", \"e\", \"f\"], \"key3\": [\"g\", \"h\", \"i\"], \"key4\": [\"j\", \"k\", \"l\"], \"key5\": [\"m\", \"n\", \"o\"], \"key6\": [\"p\", \"q\", \"r\"], \"key7\": [\"s\", \"t\", \"u\"], \"key8\": [\"x\"]}");
machine.addRule("24", "{\"key1\": [\"a\", \"b\", \"c\"], \"key2\": [\"d\", \"e\", \"f\"], \"key3\": [\"g\", \"h\", \"i\"], \"key4\": [\"j\", \"k\", \"l\"], \"key5\": [\"m\", \"n\", \"o\"], \"key6\": [\"p\", \"q\", \"r\"], \"key7\": [\"s\", \"t\", \"u\"], \"key8\": [\"v\", \"w\", \"x\"], \"key9\": [\"y\"]}");
}
}
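
The MachineTest rule set above (each key's three values added in separate rules, then used together as the prefix for
the next key) is what makes the two object counts diverge. The toy model below is hypothetical, not Ruler code: it
follows the addStep logic shown in this PR in simplified form and counts only NameStates, so its numbers will not match
the asserted 72216 and 136, which also include other objects. Counting keys from 0, without forced reuse each of the
3^i NameStates reached before key i gains a new successor in each of that key's three rules, so key i adds 3^(i+1)
NameStates, (3^(n+1) - 3)/2 in total for n keys; with forced reuse there is one NameState per key, so n in total.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model, not Ruler: it counts only NameStates, not byte-machine states.
public class ReuseGrowthSketch {

    static final class ToyNameState {
        final Map<String, Map<String, ToyNameState>> valueTransitions = new HashMap<>();
        final Map<String, ToyNameState> keyToNextNameState = new HashMap<>();
    }

    private final boolean additionalNameStateReuse;
    private final ToyNameState start = new ToyNameState();
    private int created = 0;

    ReuseGrowthSketch(boolean additionalNameStateReuse) {
        this.additionalNameStateReuse = additionalNameStateReuse;
    }

    /** Walks keys in order; values.get(i) are the patterns for keys.get(i). */
    void addRule(List<String> keys, List<List<String>> values) {
        Set<ToyNameState> current = new LinkedHashSet<>();
        current.add(start);
        for (int i = 0; i < keys.size(); i++) {
            Set<ToyNameState> next = new LinkedHashSet<>();
            for (ToyNameState state : current) {
                next.addAll(addStep(state, keys.get(i), values.get(i)));
            }
            current = next; // every reached NameState is extended by the next key
        }
    }

    private Set<ToyNameState> addStep(ToyNameState state, String key, List<String> patterns) {
        ToyNameState lastNextState = null;
        if (additionalNameStateReuse) {
            lastNextState = state.keyToNextNameState.get(key);
            if (lastNextState == null) {
                lastNextState = newNameState();
                state.keyToNextNameState.put(key, lastNextState);
            }
        }
        Map<String, ToyNameState> transitions =
                state.valueTransitions.computeIfAbsent(key, k -> new HashMap<>());
        Set<ToyNameState> reached = new LinkedHashSet<>();
        for (String pattern : patterns) {
            ToyNameState existing = transitions.get(pattern);
            if (existing != null) {
                lastNextState = existing; // pattern seen before: follow its NameState
            } else {
                if (lastNextState == null) {
                    lastNextState = newNameState();
                }
                transitions.put(pattern, lastNextState);
            }
            reached.add(lastNextState);
        }
        return reached;
    }

    private ToyNameState newNameState() {
        created++;
        return new ToyNameState();
    }

    /** Same shape as the MachineTest rules: each key's three values arrive one rule at a time. */
    static int createdNameStates(boolean reuse, int numKeys) {
        ReuseGrowthSketch machine = new ReuseGrowthSketch(reuse);
        for (int i = 0; i < numKeys; i++) {
            for (int v = 0; v < 3; v++) {
                List<String> keys = new ArrayList<>();
                List<List<String>> values = new ArrayList<>();
                for (int j = 0; j < i; j++) {
                    keys.add("key" + j);
                    values.add(List.of("v" + 3 * j, "v" + (3 * j + 1), "v" + (3 * j + 2)));
                }
                keys.add("key" + i);
                values.add(List.of("v" + (3 * i + v)));
                machine.addRule(keys, values);
            }
        }
        return machine.created;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 6; n++) {
            System.out.println(n + " keys: " + createdNameStates(false, n)
                    + " NameStates without forced reuse, " + createdNameStates(true, n) + " with it");
        }
    }
}
```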