Adding optional Configuration to Machine that can be used to enable a… (#125)

* Adding optional Configuration to Machine that can be used to enable additional NameState re-use where each key subsequence has a single NameState

* Updating readme, performing rename, adding benchmark test

* Moving Configuration builder into GenericMachine/Machine

* Bumping version
jonessha authored and baldawar committed Nov 23, 2023
1 parent 6175850 commit 995f74d
Showing 10 changed files with 277 additions and 3 deletions.
38 changes: 38 additions & 0 deletions README.md
@@ -550,6 +550,44 @@ the strings it stored and returned were thought of as rule names.
For safety, the type used to "name" rules should be immutable. If you change the content of an object while
it's being used as a rule name, this may break the operation of Ruler.

### Configuration

The GenericMachine and Machine constructors optionally accept a GenericMachineConfiguration object, which exposes the
following configuration options.

#### additionalNameStateReuse
Default: false

By default, NameState re-use is opportunistic: a NameState is re-used for a given key subsequence and pattern only if
that key subsequence and pattern have been previously added, or if an earlier pattern in the same rule addition has
already been added for that key subsequence. Setting this flag to true forces NameState re-use for a key subsequence:
the first pattern added for a key subsequence re-uses the existing NameState if that key subsequence has been added
before, so each key subsequence has a single NameState. In some cases this reduces memory use exponentially, but it
also leads to more sub-rules being stored in individual NameStates, which Ruler sometimes iterates over, and this can
cause a modest runtime performance regression. The flag defaults to false for backwards compatibility, but all but the
most latency-sensitive applications would likely benefit from setting it to true.

Here's a simple example. Consider:

```java
machine.addRule("0", "{\"key1\": [\"a\", \"b\", \"c\"]}");
```

The pattern "a" creates a NameState, and then, even with additionalNameStateReuse=false, the second pattern ("b") and
third pattern ("c") re-use that same NameState. But consider the following instead:

```java
machine.addRule("0", "{\"key1\": [\"a\"]}");
machine.addRule("1", "{\"key1\": [\"b\"]}");
machine.addRule("2", "{\"key1\": [\"c\"]}");
```

Now, with additionalNameStateReuse=false, we end up with three NameStates, because the first pattern encountered for a
key subsequence on each rule addition will create a new NameState. So, "a", "b", and "c" all get their own NameStates.
However, with additionalNameStateReuse=true, "a" will create a new NameState, then "b" and "c" will reuse this same
NameState. This is accomplished by storing that we already have a NameState for the key subsequence "key1".

Note that it doesn't matter if each addRule uses a different rule name or the same rule name.
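
The re-use policy described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: `ToyNameState` and `ToyStep` are hypothetical stand-ins invented for this example, not Ruler classes, and the model tracks a single key level only. It counts how many states are created for the two examples above.

```java
import java.util.HashMap;
import java.util.Map;

public class NameStateReuseSketch {

    /** Hypothetical stand-in for a NameState; only object identity matters here. */
    static final class ToyNameState {}

    /** Hypothetical, heavily simplified model of one step of the machine. */
    static final class ToyStep {
        private final Map<String, ToyNameState> byKeyAndPattern = new HashMap<>();
        private final Map<String, ToyNameState> byKey = new HashMap<>();
        private final boolean additionalNameStateReuse;
        private int created = 0;

        ToyStep(boolean additionalNameStateReuse) {
            this.additionalNameStateReuse = additionalNameStateReuse;
        }

        /** Adds the patterns of one rule for a single key. */
        void addRule(String key, String... patterns) {
            // With the flag on, start from the state already recorded for this key, if any.
            ToyNameState last = additionalNameStateReuse ? byKey.get(key) : null;
            for (String pattern : patterns) {
                String id = key + "=" + pattern;
                ToyNameState existing = byKeyAndPattern.get(id);
                if (existing != null) {
                    last = existing; // same key and pattern seen before: re-use
                    continue;
                }
                if (last == null) {
                    last = new ToyNameState(); // first pattern of this rule addition, nothing to re-use
                    created++;
                }
                byKeyAndPattern.put(id, last); // later patterns of the same rule share the state
            }
            if (additionalNameStateReuse && last != null) {
                byKey.putIfAbsent(key, last); // remember the single state for this key
            }
        }

        int nameStatesCreated() {
            return created;
        }
    }

    /** Three rules, one pattern each, all on "key1". */
    public static int separateRules(boolean additionalNameStateReuse) {
        ToyStep step = new ToyStep(additionalNameStateReuse);
        step.addRule("key1", "a");
        step.addRule("key1", "b");
        step.addRule("key1", "c");
        return step.nameStatesCreated();
    }

    /** One rule with three patterns on "key1". */
    public static int combinedRule(boolean additionalNameStateReuse) {
        ToyStep step = new ToyStep(additionalNameStateReuse);
        step.addRule("key1", "a", "b", "c");
        return step.nameStatesCreated();
    }

    public static void main(String[] args) {
        System.out.println("combined rule, flag off:  " + combinedRule(false));
        System.out.println("separate rules, flag off: " + separateRules(false));
        System.out.println("separate rules, flag on:  " + separateRules(true));
    }
}
```

In this toy model the combined rule creates one state regardless of the flag, while the three separate rules create three states with the flag off and one with it on, which mirrors the behaviour described in the text.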

### addRule()

All forms of this method have the same first argument, a String which provides
2 changes: 1 addition & 1 deletion pom.xml
@@ -20,7 +20,7 @@
<groupId>software.amazon.event.ruler</groupId>
<artifactId>event-ruler</artifactId>
<name>Event Ruler</name>
<version>1.5.0</version>
<version>1.6.0</version>
<description>Event Ruler is a Java library that allows matching Rules to Events. An event is a list of fields,
which may be given as name/value pairs or as a JSON object. A rule associates event field names with lists of
possible values. There are two reasons to use Ruler: 1/ It's fast; the time it takes to match Events doesn't
57 changes: 55 additions & 2 deletions src/main/software/amazon/event/ruler/GenericMachine.java
@@ -37,6 +37,11 @@ public class GenericMachine<T> {
*/
private static final int MAXIMUM_RULE_SIZE = 256;

/**
* Configuration for the Machine.
*/
private final GenericMachineConfiguration configuration;

/**
* The start state of matching and adding rules.
*/
@@ -56,7 +61,14 @@ public class GenericMachine<T> {
*/
private final SubRuleContext.Generator subRuleContextGenerator = new SubRuleContext.Generator();

public GenericMachine() {}
@Deprecated
public GenericMachine() {
this(builder().buildConfig());
}

protected GenericMachine(GenericMachineConfiguration configuration) {
this.configuration = configuration;
}

/**
* Return any rules that match the fields in the event in a way that is Array-Consistent (thus trailing "AC" on
@@ -322,6 +334,7 @@ private Set<Double> deleteStep(final NameState state,
if (!doesNameStateContainPattern(nextNameState, pattern) &&
deletePattern(state, key, pattern)) {
deletedKeys.add(key);
state.removeNextNameState(key, configuration);
}
}
}
@@ -340,6 +353,7 @@ private Set<Double> deleteStep(final NameState state,
// does not transition to the next NameState.
if (!doesNameStateContainPattern(nextNameState, pattern) && deletePattern(state, key, pattern)) {
deletedKeys.add(key);
state.removeNextNameState(key, configuration);
}
}
}
@@ -545,6 +559,15 @@ private boolean addStep(final NameState state,
// for each pattern, we'll provisionally add it to the BMC, which may already have it. Pass the states
// list in in case the BMC doesn't already have a next-step for this pattern and needs to make a new one
NameState lastNextState = null;

if (configuration.isAdditionalNameStateReuse()) {
lastNextState = state.getNextNameState(key);
if (lastNextState == null) {
lastNextState = new NameState();
state.addNextNameState(key, lastNextState, configuration);
}
}

Set<NameState> nameStates = new HashSet<>();
if (nameStatesForEachKey[keyIndex] == null) {
nameStatesForEachKey[keyIndex] = new HashSet<>();
@@ -553,7 +576,6 @@ private boolean addStep(final NameState state,
if (isNamePattern(pattern)) {
lastNextState = nameMatcher.addPattern(pattern, lastNextState == null ? new NameState() : lastNextState);
} else {
assert byteMachine != null;
lastNextState = byteMachine.addPattern(pattern, lastNextState);
}
nameStates.add(lastNextState);
@@ -678,5 +700,36 @@ public String toString() {
", fieldStepsUsedRefCount=" + fieldStepsUsedRefCount +
'}';
}

public static Builder builder() {
return new Builder();
}

protected static class Builder<T extends GenericMachine> {

/**
* By default, NameState re-use is opportunistic: a NameState is re-used for a given key subsequence and pattern only
* if that key subsequence and pattern have been previously added, or if an earlier pattern in the same rule addition
* has already been added for that key subsequence. Setting this flag to true forces NameState re-use for a key
* subsequence: the first pattern added for a key subsequence re-uses the existing NameState if that key subsequence
* has been added before, so each key subsequence has a single NameState. In some cases this reduces memory use
* exponentially, but it also leads to more sub-rules being stored in individual NameStates, which Ruler sometimes
* iterates over, and this can cause a modest runtime performance regression.
*/
private boolean additionalNameStateReuse = false;

public Builder<T> withAdditionalNameStateReuse(boolean additionalNameStateReuse) {
this.additionalNameStateReuse = additionalNameStateReuse;
return this;
}

public T build() {
return (T) new GenericMachine(buildConfig());
}

protected GenericMachineConfiguration buildConfig() {
return new GenericMachineConfiguration(additionalNameStateReuse);
}
}
}
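
The builder arrangement in this diff (a shared `buildConfig()` in the generic builder, with the subclass builder overriding `build()`) can be sketched in isolation as follows. This is an illustrative sketch, not the library's code: `ToyConfiguration`, `ToyGenericMachine` and `ToyMachine` are hypothetical names, and unlike the diff the base builder here is abstract so that no unchecked cast is needed.

```java
public class BuilderSketch {

    /** Hypothetical immutable configuration holder, analogous in spirit to GenericMachineConfiguration. */
    static final class ToyConfiguration {
        private final boolean additionalNameStateReuse;

        ToyConfiguration(boolean additionalNameStateReuse) {
            this.additionalNameStateReuse = additionalNameStateReuse;
        }

        boolean isAdditionalNameStateReuse() {
            return additionalNameStateReuse;
        }
    }

    /** Hypothetical generic machine that only stores its configuration. */
    static class ToyGenericMachine<T> {
        private final ToyConfiguration configuration;

        protected ToyGenericMachine(ToyConfiguration configuration) {
            this.configuration = configuration;
        }

        ToyConfiguration configuration() {
            return configuration;
        }

        /** Base builder: owns the options and knows how to turn them into a configuration. */
        abstract static class Builder<M extends ToyGenericMachine<?>> {
            private boolean additionalNameStateReuse = false; // same default as in the diff

            Builder<M> withAdditionalNameStateReuse(boolean additionalNameStateReuse) {
                this.additionalNameStateReuse = additionalNameStateReuse;
                return this;
            }

            protected ToyConfiguration buildConfig() {
                return new ToyConfiguration(additionalNameStateReuse);
            }

            abstract M build();
        }
    }

    /** Hypothetical String-typed machine whose builder only supplies the concrete constructor call. */
    static final class ToyMachine extends ToyGenericMachine<String> {
        private ToyMachine(ToyConfiguration configuration) {
            super(configuration);
        }

        static Builder builder() {
            return new Builder();
        }

        static final class Builder extends ToyGenericMachine.Builder<ToyMachine> {
            @Override
            ToyMachine build() {
                return new ToyMachine(buildConfig());
            }
        }
    }

    /** Flag value of a machine built without setting any option. */
    public static boolean defaultFlag() {
        return ToyMachine.builder().build().configuration().isAdditionalNameStateReuse();
    }

    /** Flag value of a machine built with the option switched on. */
    public static boolean enabledFlag() {
        ToyMachine machine = ToyMachine.builder().withAdditionalNameStateReuse(true).build();
        return machine.configuration().isAdditionalNameStateReuse();
    }

    public static void main(String[] args) {
        System.out.println("default: " + defaultFlag());
        System.out.println("enabled: " + enabledFlag());
    }
}
```

Keeping the options and `buildConfig()` in the base builder means a new option would need to be added in one place only, while each machine subclass contributes just its own `build()`.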

@@ -0,0 +1,18 @@
package software.amazon.event.ruler;

/**
* Configuration for a GenericMachine. For descriptions of the options, see GenericMachine.Builder.
*/
class GenericMachineConfiguration {

private final boolean additionalNameStateReuse;

GenericMachineConfiguration(boolean additionalNameStateReuse) {
this.additionalNameStateReuse = additionalNameStateReuse;
}

boolean isAdditionalNameStateReuse() {
return additionalNameStateReuse;
}
}

17 changes: 17 additions & 0 deletions src/main/software/amazon/event/ruler/Machine.java
@@ -13,6 +13,23 @@
*/
public class Machine extends GenericMachine<String> {

@Deprecated
public Machine() {
super();
}

private Machine(GenericMachineConfiguration configuration) {
super(configuration);
}

public static Builder builder() {
return new Builder();
}

protected static class Builder extends GenericMachine.Builder<Machine> {
@Override
public Machine build() {
return new Machine(buildConfig());
}
}
}
21 changes: 21 additions & 0 deletions src/main/software/amazon/event/ruler/NameState.java
@@ -28,6 +28,10 @@ class NameState {
// while add/delete Rule is active in another thread, without any locks.
private final Map<String, NameMatcher<NameState>> mustNotExistMatchers = new ConcurrentHashMap<>(1);

// Maps a key to the next NameState accessible via either valueTransitions or mustNotExistMatchers.
// Only used when Configuration is set for additionalNameStateReuse.
private final Map<String, NameState> keyToNextNameState = new ConcurrentHashMap<>();

// All rules, both terminal and non-terminal, keyed by pattern, that led to this NameState.
private final Map<Patterns, Set<Object>> patternToRules = new ConcurrentHashMap<>();

@@ -153,6 +157,12 @@ void removeKeyTransition(String name) {
mustNotExistMatchers.remove(name);
}

void removeNextNameState(String key, GenericMachineConfiguration configuration) {
if (configuration.isAdditionalNameStateReuse()) {
keyToNextNameState.remove(key);
}
}

boolean isEmpty() {
return valueTransitions.isEmpty() &&
mustNotExistMatchers.isEmpty() &&
@@ -215,6 +225,12 @@ void addKeyTransition(final String key, final NameMatcher<NameState> to) {
mustNotExistMatchers.put(key, to);
}

void addNextNameState(final String key, final NameState nextNameState, final GenericMachineConfiguration configuration) {
if (configuration.isAdditionalNameStateReuse()) {
keyToNextNameState.put(key, nextNameState);
}
}

NameMatcher<NameState> getKeyTransitionOn(final String token) {
return mustNotExistMatchers.get(token);
}
@@ -284,6 +300,10 @@ Set<NameState> getNameTransitions(final Event event, final ArrayMembership membe
return nextNameStates;
}

public NameState getNextNameState(String key) {
return keyToNextNameState.get(key);
}

public int evaluateComplexity(MachineComplexityEvaluator evaluator) {
int maxComplexity = evaluator.getMaxComplexity();
int complexity = 0;
@@ -321,6 +341,7 @@ public String toString() {
return "NameState{" +
"valueTransitions=" + valueTransitions +
", mustNotExistMatchers=" + mustNotExistMatchers +
", keyToNextNameState=" + keyToNextNameState +
", patternToRules=" + patternToRules +
", patternToTerminalSubRuleIds=" + patternToTerminalSubRuleIds +
", patternToNonTerminalSubRuleIds=" + patternToNonTerminalSubRuleIds +
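
The `keyToNextNameState` bookkeeping added in this file, together with the lookup in `addStep` and the removal in `deleteStep`, amounts to a per-key "get or create, then forget on delete" map. A minimal sketch of that idea follows, assuming a hypothetical `ToyState` class. Note that it uses `ConcurrentHashMap.computeIfAbsent` for brevity, whereas the commit performs a separate lookup followed by an insert.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class KeyToNextStateSketch {

    /** Hypothetical stand-in for a NameState that remembers one next state per key. */
    static final class ToyState {
        private final Map<String, ToyState> keyToNextState = new ConcurrentHashMap<>();

        /** Returns the next state for the key, creating and recording it on first use. */
        ToyState getOrCreateNext(String key) {
            return keyToNextState.computeIfAbsent(key, k -> new ToyState());
        }

        /** Returns the recorded next state for the key, or null if none is recorded. */
        ToyState getNext(String key) {
            return keyToNextState.get(key);
        }

        /** Forgets the next state for the key, for example once its last pattern is deleted. */
        void removeNext(String key) {
            keyToNextState.remove(key);
        }
    }

    /** True if two lookups for the same key yield the same state object. */
    public static boolean sameKeySharesState() {
        ToyState start = new ToyState();
        return start.getOrCreateNext("key1") == start.getOrCreateNext("key1");
    }

    /** True if different keys get different state objects. */
    public static boolean differentKeysGetDifferentStates() {
        ToyState start = new ToyState();
        return start.getOrCreateNext("key1") != start.getOrCreateNext("key2");
    }

    /** True if, after removal, the key has no recorded state and a later lookup creates a fresh one. */
    public static boolean removalForgetsState() {
        ToyState start = new ToyState();
        ToyState first = start.getOrCreateNext("key1");
        start.removeNext("key1");
        boolean forgotten = start.getNext("key1") == null;
        return forgotten && start.getOrCreateNext("key1") != first;
    }

    public static void main(String[] args) {
        System.out.println(sameKeySharesState());
        System.out.println(differentKeysGetDifferentStates());
        System.out.println(removalForgetsState());
    }
}
```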
46 changes: 46 additions & 0 deletions src/test/software/amazon/event/ruler/Benchmarks.java
@@ -546,6 +546,52 @@ public void exactRuleMemoryBenchmark() throws Exception {
rules.clear();
}

@Test
public void lowNameStateReuseMemoryBenchmark() throws Exception {
Machine machine = new Machine();
System.out.println("Low NameState Reuse Memory Benchmark");
nameStateReuseMemoryBenchmark(machine);
}

@Test
public void highNameStateReuseMemoryBenchmark() throws Exception {
Machine machine = Machine.builder().withAdditionalNameStateReuse(true).build();
System.out.println("High NameState Reuse Memory Benchmark");
nameStateReuseMemoryBenchmark(machine);
}

private void nameStateReuseMemoryBenchmark(Machine machine) throws Exception {
int maxKeys = 256;
System.gc();
long memBefore = Runtime.getRuntime().freeMemory();
int sizeBefore = machine.approximateObjectCount();
System.out.printf("Before: %.1f (%d)\n", 1.0 * memBefore / 1000000, sizeBefore);

// For a readable version with a similar setup to the rules being added here, see
// MachineTest.testApproximateObjectCountEachKeyHasThreePatternsAddedOneAtATime. By adding one pattern at a time
// for each key, we create three different branches in the low NameState reuse test, but a single branch in the
// high NameState reuse test. So with low NameState reuse, Machine size grows exponentially with number of keys.
for (int i = 0; i < maxKeys; i++) {
StringBuilder prefix = new StringBuilder();
for (int j = 0; j < i; j++) {
int k = 3 * j;
prefix.append("\"key" + k + "\": [\"" + k + "\", \"" + (k + 1) + "\", \"" + (k + 2) + "\"], ");
}
int k = 3 * i;
machine.addRule("" + k, "{" + prefix + "\"key" + i + "\": [\"" + k + "\"]}");
machine.addRule("" + k + 1, "{" + prefix + "\"key" + i + "\": [\"" + (k + 1) + "\"]}");
machine.addRule("" + k + 2, "{" + prefix + "\"key" + i + "\": [\"" + (k + 2) + "\"]}");
}

System.gc();
long memAfter = Runtime.getRuntime().freeMemory();
int sizeAfter = machine.approximateObjectCount();
System.out.printf("After: %.1f (%d)\n", 1.0 * memAfter / 1000000, sizeAfter);
int perRuleMem = (int) ((1.0 * (memAfter - memBefore)) / (maxKeys * 3));
int perRuleSize = (int) ((1.0 * (sizeAfter - sizeBefore)) / (maxKeys * 3));
System.out.println("Per rule: " + perRuleMem + " (" + perRuleSize + ")");
}

@Test
public void AnythingButPerformanceBenchmark() throws Exception {
readCityLots2();
@@ -0,0 +1,19 @@
package software.amazon.event.ruler;

import org.junit.Test;

import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

public class GenericMachineConfigurationTest {

@Test
public void testAdditionalNameStateReuseTrue() {
assertTrue(new GenericMachineConfiguration(true).isAdditionalNameStateReuse());
}

@Test
public void testAdditionalNameStateReuseFalse() {
assertFalse(new GenericMachineConfiguration(false).isAdditionalNameStateReuse());
}
}
42 changes: 42 additions & 0 deletions src/test/software/amazon/event/ruler/MachineTest.java
@@ -2621,4 +2621,46 @@ public void testLargeArrayRulesVsOR() throws Exception {
"}");
assertEquals(608, machine.approximateObjectCount(10000));
}

@Test
public void testApproximateObjectCountEachKeyHasThreePatternsAddedOneAtATime() throws Exception {
Machine machine = new Machine();
testApproximateObjectCountEachKeyHasThreePatternsAddedOneAtATime(machine);
assertEquals(72216, machine.approximateObjectCount(500000));
}

@Test
public void testApproximateObjectCountEachKeyHasThreePatternsAddedOneAtATimeWithAdditionalNameStateReuse() throws Exception {
Machine machine = Machine.builder().withAdditionalNameStateReuse(true).build();
testApproximateObjectCountEachKeyHasThreePatternsAddedOneAtATime(machine);
assertEquals(136, machine.approximateObjectCount(500000));
}

private void testApproximateObjectCountEachKeyHasThreePatternsAddedOneAtATime(Machine machine) throws Exception {
machine.addRule("0", "{\"key1\": [\"a\"]}");
machine.addRule("1", "{\"key1\": [\"b\"]}");
machine.addRule("2", "{\"key1\": [\"c\"]}");
machine.addRule("3", "{\"key1\": [\"a\", \"b\", \"c\"], \"key2\": [\"d\"]}");
machine.addRule("4", "{\"key1\": [\"a\", \"b\", \"c\"], \"key2\": [\"e\"]}");
machine.addRule("5", "{\"key1\": [\"a\", \"b\", \"c\"], \"key2\": [\"f\"]}");
machine.addRule("6", "{\"key1\": [\"a\", \"b\", \"c\"], \"key2\": [\"d\", \"e\", \"f\"], \"key3\": [\"g\"]}");
machine.addRule("7", "{\"key1\": [\"a\", \"b\", \"c\"], \"key2\": [\"d\", \"e\", \"f\"], \"key3\": [\"h\"]}");
machine.addRule("8", "{\"key1\": [\"a\", \"b\", \"c\"], \"key2\": [\"d\", \"e\", \"f\"], \"key3\": [\"i\"]}");
machine.addRule("9", "{\"key1\": [\"a\", \"b\", \"c\"], \"key2\": [\"d\", \"e\", \"f\"], \"key3\": [\"g\", \"h\", \"i\"], \"key4\": [\"j\"]}");
machine.addRule("10", "{\"key1\": [\"a\", \"b\", \"c\"], \"key2\": [\"d\", \"e\", \"f\"], \"key3\": [\"g\", \"h\", \"i\"], \"key4\": [\"k\"]}");
machine.addRule("11", "{\"key1\": [\"a\", \"b\", \"c\"], \"key2\": [\"d\", \"e\", \"f\"], \"key3\": [\"g\", \"h\", \"i\"], \"key4\": [\"l\"]}");
machine.addRule("12", "{\"key1\": [\"a\", \"b\", \"c\"], \"key2\": [\"d\", \"e\", \"f\"], \"key3\": [\"g\", \"h\", \"i\"], \"key4\": [\"j\", \"k\", \"l\"], \"key5\": [\"m\"]}");
machine.addRule("13", "{\"key1\": [\"a\", \"b\", \"c\"], \"key2\": [\"d\", \"e\", \"f\"], \"key3\": [\"g\", \"h\", \"i\"], \"key4\": [\"j\", \"k\", \"l\"], \"key5\": [\"n\"]}");
machine.addRule("14", "{\"key1\": [\"a\", \"b\", \"c\"], \"key2\": [\"d\", \"e\", \"f\"], \"key3\": [\"g\", \"h\", \"i\"], \"key4\": [\"j\", \"k\", \"l\"], \"key5\": [\"o\"]}");
machine.addRule("15", "{\"key1\": [\"a\", \"b\", \"c\"], \"key2\": [\"d\", \"e\", \"f\"], \"key3\": [\"g\", \"h\", \"i\"], \"key4\": [\"j\", \"k\", \"l\"], \"key5\": [\"m\", \"n\", \"o\"], \"key6\": [\"p\"]}");
machine.addRule("16", "{\"key1\": [\"a\", \"b\", \"c\"], \"key2\": [\"d\", \"e\", \"f\"], \"key3\": [\"g\", \"h\", \"i\"], \"key4\": [\"j\", \"k\", \"l\"], \"key5\": [\"m\", \"n\", \"o\"], \"key6\": [\"q\"]}");
machine.addRule("17", "{\"key1\": [\"a\", \"b\", \"c\"], \"key2\": [\"d\", \"e\", \"f\"], \"key3\": [\"g\", \"h\", \"i\"], \"key4\": [\"j\", \"k\", \"l\"], \"key5\": [\"m\", \"n\", \"o\"], \"key6\": [\"r\"]}");
machine.addRule("18", "{\"key1\": [\"a\", \"b\", \"c\"], \"key2\": [\"d\", \"e\", \"f\"], \"key3\": [\"g\", \"h\", \"i\"], \"key4\": [\"j\", \"k\", \"l\"], \"key5\": [\"m\", \"n\", \"o\"], \"key6\": [\"p\", \"q\", \"r\"], \"key7\": [\"s\"]}");
machine.addRule("19", "{\"key1\": [\"a\", \"b\", \"c\"], \"key2\": [\"d\", \"e\", \"f\"], \"key3\": [\"g\", \"h\", \"i\"], \"key4\": [\"j\", \"k\", \"l\"], \"key5\": [\"m\", \"n\", \"o\"], \"key6\": [\"p\", \"q\", \"r\"], \"key7\": [\"t\"]}");
machine.addRule("20", "{\"key1\": [\"a\", \"b\", \"c\"], \"key2\": [\"d\", \"e\", \"f\"], \"key3\": [\"g\", \"h\", \"i\"], \"key4\": [\"j\", \"k\", \"l\"], \"key5\": [\"m\", \"n\", \"o\"], \"key6\": [\"p\", \"q\", \"r\"], \"key7\": [\"u\"]}");
machine.addRule("21", "{\"key1\": [\"a\", \"b\", \"c\"], \"key2\": [\"d\", \"e\", \"f\"], \"key3\": [\"g\", \"h\", \"i\"], \"key4\": [\"j\", \"k\", \"l\"], \"key5\": [\"m\", \"n\", \"o\"], \"key6\": [\"p\", \"q\", \"r\"], \"key7\": [\"s\", \"t\", \"u\"], \"key8\": [\"v\"]}");
machine.addRule("22", "{\"key1\": [\"a\", \"b\", \"c\"], \"key2\": [\"d\", \"e\", \"f\"], \"key3\": [\"g\", \"h\", \"i\"], \"key4\": [\"j\", \"k\", \"l\"], \"key5\": [\"m\", \"n\", \"o\"], \"key6\": [\"p\", \"q\", \"r\"], \"key7\": [\"s\", \"t\", \"u\"], \"key8\": [\"w\"]}");
machine.addRule("23", "{\"key1\": [\"a\", \"b\", \"c\"], \"key2\": [\"d\", \"e\", \"f\"], \"key3\": [\"g\", \"h\", \"i\"], \"key4\": [\"j\", \"k\", \"l\"], \"key5\": [\"m\", \"n\", \"o\"], \"key6\": [\"p\", \"q\", \"r\"], \"key7\": [\"s\", \"t\", \"u\"], \"key8\": [\"x\"]}");
machine.addRule("24", "{\"key1\": [\"a\", \"b\", \"c\"], \"key2\": [\"d\", \"e\", \"f\"], \"key3\": [\"g\", \"h\", \"i\"], \"key4\": [\"j\", \"k\", \"l\"], \"key5\": [\"m\", \"n\", \"o\"], \"key6\": [\"p\", \"q\", \"r\"], \"key7\": [\"s\", \"t\", \"u\"], \"key8\": [\"v\", \"w\", \"x\"], \"key9\": [\"y\"]}");
}
}