Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding optional Configuration to Machine that can be used to enable a… #125

Merged
merged 4 commits into from
Nov 21, 2023

Conversation

jonessha
Copy link
Contributor

@jonessha jonessha commented Nov 2, 2023

…dditional NameState re-use where each key subsequence has a single NameState

Description of changes:

Introducing Configuration for Machines, which can be extended in the future with additional functionality. Right now Configuration offers a single flag. Copying Javadoc for flag as it summarizes the change:

Normally, NameStates are re-used for a given key subsequence and pattern if this key subsequence and pattern have been previously added, or if the current rule has already added a pattern for the given key subsequence. Hence, by default, NameState re-use is opportunistic. But by setting this flag to true, NameState re-use will be forced for a key subsequence. This means that the first pattern being added for a key subsequence for a rule will re-use a NameState if that key subsequence has been added before. Meaning each key subsequence has a single NameState. This improves memory utilization exponentially in some cases but does lead to more sub-rules being stored in individual NameStates, which Ruler sometimes iterates over, which can cause a modest runtime performance regression.

Long previously pushed me in this direction when making memory improvements (#88 (comment)). I made a similar change to this one before (#88 (comment)) but quickly reverted after discovering that some consumers are highly sensitive to the modest runtime increase (#88 (comment)). This is why I am now guarding this improvement behind Configuration. Although I expect the average consumer to benefit from enabling this flag, it will start off as disabled, so as not to break any current consumers who are highly latency sensitive.


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

…dditional NameState re-use where each key subsequence has a single NameState
public class Configuration {

/**
* Normally, NameStates are re-used for a given key subsequence and pattern if this key subsequence and pattern have
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to note that this would need quite a bit of work before it went in the README. I found it really hard to understand. I suggest that the easiest way to make it comprehensible to mere mortals is to include a few examples.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure it's "quite a bit of work", but I added to the readme with an example.

*/
private final boolean additionalNameStateReuse;

private Configuration(boolean additionalNameStateReuse) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you see this being used for configuration beyond Machine? If not, we can be explicit by calling this class MachingConfiguration

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible that some configuration will not be related directly to Machine, but interface-wise, it will always be plumbed in via the Machine. So the name change is reasonable. Although to be fully correct, it should be GenericMachineConfiguration. I will perform this rename, although personally, I don't think the length and clunkiness is worth the explicitness.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, it hadn't cross my mind that we'd implement anything other than MachineConfiguration or Configuration

}

@Test
public void testApproximateObjectCountEachKeyHasThreePatternsAddedOneAtATimeWithAdditionalNameStateReuse() throws Exception {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok added. Got these results over three runs:

High NameState Reuse Memory Benchmark
Before: 66.6 (1)
After: 274.4 (223380)
Per rule: 270502 (290)
Low NameState Reuse Memory Benchmark
Before: 66.6 (1)
After: 608.6 (2625460)
Per rule: 705709 (3418)

High NameState Reuse Memory Benchmark
Before: 45.7 (1)
After: 209.3 (223380)
Per rule: 213056 (290)
Low NameState Reuse Memory Benchmark
Before: 66.6 (1)
After: 1438.9 (2625460)
Per rule: 1786864 (3418)

High NameState Reuse Memory Benchmark
Before: 58.2 (1)
After: 221.9 (223380)
Per rule: 213154 (290)
Low NameState Reuse Memory Benchmark
Before: 58.2 (1)
After: 1718.1 (2625460)
Per rule: 2161282 (3418)

Comment on lines +496 to +497
which can cause a modest runtime performance regression. This defaults to false for backwards compatibility, but likely,
all but the most latency sensitive of applications would benefit from setting this to true.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it worth setting this true by default since that's the applicable state for most users?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since I know of at least one case where a team will be impacted by setting this to true, I erred on making it false by default to make version upgrades safer/easier.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 . In that case, it makes sense to be vocal bout the feature in the release notes & any upgrade docs we write; otherwise there's a chance that folks may not use this behaviour as often as we'd like them to.

@@ -56,7 +61,13 @@ public class GenericMachine<T> {
*/
private final SubRuleContext.Generator subRuleContextGenerator = new SubRuleContext.Generator();

public GenericMachine() {}
public GenericMachine() {
this(new GenericMachineConfiguration.Builder().build());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you see value in accepting the pattern similar to AWS SDK where rather than exposing GenericMachineConfiguation, we make the publicly avilable builder() method within this class and let folks use it to build their object. It reduces the need for folks to learn about one more class and let us change our minds for class name between GenericMachineConfiguration, Configuration, or something else.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that suggestion in that it hides a class. That said, I'm not too worried about reserving the ability to change the name later. One thing that seems weird to me about the suggested approach is that we would need to still expose the default constructor for backwards compatibility and then also have the builder. Feels cleaner to me to have two constructors - one default and one that accepts configuration. So I'd err on leaving it as-is, but I'm happy to change to your approach if you think it's worthwhile.

Copy link
Collaborator

@baldawar baldawar Nov 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can mark the GenericMachine() as deprecated to ensure folks are gradually moving to the golden-path. I don't have a good data around it, but this line from the benchmarks tests made me wonder if we'd regret the class name in future and want to change it.

Machine machine = new Machine(
    new GenericMachineConfiguration.Builder()
        .withAdditionalNameStateReuse(true).build());

If that feels overkill, then let's keep things as is. Otherwise, we can use a builder() method for time-being.

@jonessha jonessha merged commit 41572f6 into main Nov 21, 2023
3 checks passed
@jonessha jonessha deleted the additional-namestate-reuse branch November 21, 2023 00:04
baldawar pushed a commit that referenced this pull request Nov 23, 2023
#125)

* Adding optional Configuration to Machine that can be used to enable additional NameState re-use where each key subsequence has a single NameState

* Updating readme, performing rename, adding benchmark test

* Moving Configuration builder into GenericMachine/Machine

* Bumping version
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants