-
Notifications
You must be signed in to change notification settings - Fork 57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
some additional performance improvements #39
Conversation
Will look at this on Monday Pacific time. Just dropped in to make some minor callouts.
|
Before looking at the code:
|
1 similar comment
Before looking at the code:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will need to check out your branch to really understand the new ACFinder implementation. In the back of my mind I'm wondering if, after removing the stepQueue, we still need the ACTask and ACStep classes. Will pull branch and have a look.
@@ -193,7 +191,9 @@ ByteTransition getTransitionForAllBytes() { | |||
* @return All transitions contained in this map. | |||
*/ | |||
Set<ByteTransition> getTransitions() { | |||
return map.values().stream().filter(Objects::nonNull).collect(Collectors.toSet()); | |||
Set<ByteTransition> result = new HashSet<>(map.values()); | |||
result.remove(null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh? Don't know this idiom, why remove(null)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
basically stream().filter().collect() is creating a set from the values removing all nulls in the process. but there is some overhead going that route (stream is slow in hot paths), so I basically tried to reproduce the same that why I remove null after creating the set, as far as i remember the set can hold one copy of null, I will double check to make sure this is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you end up keeping this, would recommend a short comment or test on why. I can see someone missing this in a refactor fairly easily.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went for having a consistent style on those two, I believe it is easy to understand now.
for (int i = 0; i < wanted.length; i++) { | ||
assertEquals(wanted[i], (byte) l.get(i)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since I work mostly in Go these days, I really like absolutely unified formatting. What tool are you using, and maybe we should put in a GitHub action or something to iron out the formatting. In the IntelliJ Go IDE it now runs gofmt on every save :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
intellij idea with default format, but yep I have notice very small differences vs the format, some unintentional reformats still slipping tho jaja, agree that would be nice to have a format enforced by the project so we dont have to care about them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added an issue for this #43
I did check on the deep level and it stays at 4 on the benchmarks, 8 in ACMachineTest, and 9 across the whole test suite, We should probably add one specific test to stress deeper that that. |
Here's a dummy JSON I've used in the past deep_json.txt. Probably can building some programmatically and add a benchmark test around it. |
for (String value : pattern.getValues()) { | ||
NameState matchPattern = findMatchPattern(getParser().parse(pattern.type(), value), pattern); | ||
if (matchPattern != null) { | ||
nextNameStates.add(matchPattern); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
super-duper nitpicking, but I think you can have a consistent style here and in ByteMap.getTransitions()
if you for x.forEach( elem -> addWhenNotNull (...) )
. I'm not sure on the performance implications yet but maybe slightly readable. Maybe...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did update this one to the new style that I believe is easier to understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gentle reminder to update the readme and pom.xml in the next iteration.
@@ -193,7 +191,9 @@ ByteTransition getTransitionForAllBytes() { | |||
* @return All transitions contained in this map. | |||
*/ | |||
Set<ByteTransition> getTransitions() { | |||
return map.values().stream().filter(Objects::nonNull).collect(Collectors.toSet()); | |||
Set<ByteTransition> result = new HashSet<>(map.values()); | |||
result.remove(null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you end up keeping this, would recommend a short comment or test on why. I can see someone missing this in a refactor fairly easily.
for (int i = 0; i < wanted.length; i++) { | ||
assertEquals(wanted[i], (byte) l.get(i)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added an issue for this #43
I have increased the minor version on the pom. not sure what to do about readme performance section, my machine is a desktop Ryzen9 5900x so my numbers look very different than the reference macbook, maybe you Rishi can take care of updating the numbers after your benchmark split lands on master ? I also added a new test to make sure deep nested events and rules dont break anything Thanks!. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aside from one fairly trivial complaint, I'm good on this PR.
Set<NameState> nextNameStates = new HashSet<>(pattern.getValues().size()); | ||
for (String value : pattern.getValues()) { | ||
NameState matchPattern = findMatchPattern(getParser().parse(pattern.type(), value), pattern); | ||
if (Objects.nonNull(matchPattern)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Um, the javadocs say that Objects.nonNull()
is designed to be used while filtering. Obviously it does no harm, but when I saw this I had to go read a bunch of stuff to figure out whether this was better for some cosmic reason than just matchPattern != null
and it's not. So maybe let's not make other people get the same puzzled look on their face and and up at StackOverflow…
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
jajaja, I got biased by the streams (very common in that context) , I did try to make it closer to the other side, but I think you are right, != null is crystal clear
@@ -193,7 +191,13 @@ ByteTransition getTransitionForAllBytes() { | |||
* @return All transitions contained in this map. | |||
*/ | |||
Set<ByteTransition> getTransitions() { | |||
return map.values().stream().filter(Objects::nonNull).collect(Collectors.toSet()); | |||
Set<ByteTransition> result = new HashSet<>(map.values().size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a note that the Set may end up containing less elements than map.values().size(), because 1) nulls are removed, and 2) the values in the map are not unique. I suppose it's probably more efficient to oversize the Set initially though than to undersize it by going with the default constructor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes good observation, I was aware of this, I just took the size of the values as the upper bound of the set size, but perf wise its cheaper to allocate above, than allowing the set to grow
return value - 48; | ||
} | ||
// ['A'-'F'] maps to [10-15] indexes | ||
return (value - 65) + 10; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason not to just go with -55 here? If you're trying to lay out both steps here for understandability/readability, perhaps it would be better to go further and turn these numbers into private static final class variables?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mostly to try to keep it easy to understand, if we trust the compiler this is being optimized away.
I think the variable may be the better option then
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for the changes!
@@ -108,11 +109,11 @@ static byte[] digitSequence(byte first, byte last, boolean includeFirst, boolean | |||
|
|||
private static int getHexByteIndex(byte value) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jonessha what do you think? I feel like using the chars helps with the clarity , and then only one obscure constant is needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good! Thank you!
Thanks again @rudygt |
Issue #25 :
Description of changes:
Following up on the clues from the numeric benchmark slow speed I tested Tim comment about the recursion on ACTask instead of the stepQueue approach (witch is a big issue when we have millions of steps, and the ArrayDeque needs to grow), and indeed we got a nice speed bump by getting rid of the queue and going down using recursion instead.
there are 3 individual items addresed here
togheter these changes made the numeric benchmark performance go up by around 60%
Benchmark / Performance:
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.