Use stream ids for blacklisting in stream router engine. #6170

Merged
merged 1 commit into from Jul 26, 2019

@dennisoelkers (Member) commented Jul 19, 2019

Description

Motivation and Context

This change is a performance optimization for the stream router engine.
When matching a message against the configured stream rules, the engine maintains a blacklist of streams that need no further evaluation because they a) can no longer match (an `AND`-matched stream that already has a mismatch) or b) have already matched for good (an `OR`-matched stream that already has one match).

This blacklist contains complete `Stream` objects, so checking whether the stream of the current rule is already blacklisted (by calling `blacklist.contains(obj)`) generates hash codes for comparison by iterating over all fields of both streams.

In our case, comparing the two stream IDs is enough: they can be considered unique, so a mismatch between the two means that the streams are not equal. This PR therefore changes the blacklist to a set of strings, which allows a more lightweight comparison.
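The idea can be illustrated with a minimal sketch. It is not the engine's actual code: `BlacklistSketch`, its `Stream` and `Rule` stand-ins, and the precomputed `matches` flag are all hypothetical names chosen for illustration. The point is only that the blacklist holds stream IDs, so each `contains` call hashes one string rather than a whole object.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; none of these names are taken from the Graylog code base.
final class BlacklistSketch {
    enum MatchingType { AND, OR }

    /** Minimal stand-in for a stream: only the id and the matching type matter here. */
    static final class Stream {
        final String id;
        final MatchingType matchingType;

        Stream(String id, MatchingType matchingType) {
            this.id = id;
            this.matchingType = matchingType;
        }
    }

    /** A rule with a precomputed outcome, so the sketch needs no real matchers. */
    static final class Rule {
        final Stream stream;
        final boolean matches;

        Rule(Stream stream, boolean matches) {
            this.stream = stream;
            this.matches = matches;
        }
    }

    /** Returns the ids of all matching streams, skipping rules of blacklisted streams. */
    static Set<String> route(List<Rule> rules) {
        Set<String> blacklist = new HashSet<>(); // stream ids, not Stream objects
        Set<String> matched = new HashSet<>();
        for (Rule rule : rules) {
            String id = rule.stream.id;
            if (blacklist.contains(id)) {
                continue; // the outcome for this stream is already final
            }
            if (rule.stream.matchingType == MatchingType.AND) {
                if (rule.matches) {
                    matched.add(id); // still a candidate
                } else {
                    matched.remove(id); // one mismatch rules out an AND stream
                    blacklist.add(id);
                }
            } else if (rule.matches) {
                matched.add(id); // one match settles an OR stream
                blacklist.add(id);
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        Stream and = new Stream("stream-and", MatchingType.AND);
        Stream or = new Stream("stream-or", MatchingType.OR);
        List<Rule> rules = Arrays.asList(
                new Rule(and, true), new Rule(and, false), new Rule(and, true),
                new Rule(or, false), new Rule(or, true), new Rule(or, false));
        System.out.println(route(rules));
    }
}
```

In this example the third `AND` rule and the third `OR` rule are never evaluated, because their streams were blacklisted by the preceding rule; only `stream-or` ends up in the result.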

Before and after were benchmarked by generating 1000 streams with 10 stream rules each, none of which match, leading to a 90% skip rate of streams. Additionally, the stream rules are `GREATER` rules, which require the configured field to be present; since the field itself is missing, every rule mismatches automatically. Each work unit in the benchmark consists of matching 1000 messages against the configured streams.

After a warmup period of 20 work units, 30 work units were measured for the code before and after this change. The results are the following:

| (durations in ms) | before  | after   |
|-------------------|---------|---------|
| mean              | 610.9   | 251.2   |
| stddev            | +-13.54 | +-24.99 |

This means that in some configurations, the stream router engine overhead (matching, without executing actual matchers) can be reduced by roughly 59% (mean of 610.9 ms before versus 251.2 ms after).
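The measurement procedure described above (unmeasured warmup units followed by timed units, summarized as mean and standard deviation) could be sketched as follows. This is a hypothetical harness, not the benchmark code used for this PR: the class name `WorkUnitBenchmark`, the placeholder work unit, and the choice of the sample standard deviation are all assumptions.

```java
// Hypothetical harness; the PR does not show the benchmark code that was actually used.
final class WorkUnitBenchmark {
    /** Runs the warmup units unmeasured, then returns each measured unit's duration in ms. */
    static double[] run(Runnable workUnit, int warmupUnits, int measuredUnits) {
        for (int i = 0; i < warmupUnits; i++) {
            workUnit.run();
        }
        double[] durationsMs = new double[measuredUnits];
        for (int i = 0; i < measuredUnits; i++) {
            long start = System.nanoTime();
            workUnit.run();
            durationsMs[i] = (System.nanoTime() - start) / 1_000_000.0;
        }
        return durationsMs;
    }

    static double mean(double[] xs) {
        double sum = 0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    /** Sample standard deviation (n - 1 denominator); which estimator the PR used is not stated. */
    static double stddev(double[] xs) {
        if (xs.length < 2) {
            return 0;
        }
        double m = mean(xs);
        double squares = 0;
        for (double x : xs) {
            squares += (x - m) * (x - m);
        }
        return Math.sqrt(squares / (xs.length - 1));
    }

    public static void main(String[] args) {
        // Placeholder work unit; in the PR, one unit matches 1000 messages against the streams.
        Runnable workUnit = () -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i;
            }
            if (sum < 0) {
                throw new IllegalStateException("unreachable, keeps the loop from being optimized away");
            }
        };
        double[] durations = run(workUnit, 20, 30);
        System.out.printf("mean %.1f ms, stddev +-%.2f ms%n", mean(durations), stddev(durations));
    }
}
```

A dedicated harness such as JMH would give more trustworthy numbers on the JVM; the hand-rolled loop here only mirrors the warmup and work-unit counts given in the description.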

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@dennisoelkers dennisoelkers added this to the 3.1.0 milestone Jul 19, 2019

@dennisoelkers dennisoelkers requested review from bernd, mpfz0r, kmerz and kroepke Jul 19, 2019

@bernd bernd modified the milestones: 3.1.0-legacy, 3.1.0 Jul 25, 2019

@mpfz0r (Member) approved these changes Jul 26, 2019

LGTM 👍

@mpfz0r mpfz0r merged commit 9aad6e1 into master Jul 26, 2019

4 checks passed

ci-web-linter: Jenkins build graylog-pr-linter-check 3938 has succeeded
continuous-integration/travis-ci/pr: The Travis CI build passed
graylog-project/pr: Jenkins build graylog-project-pr-snapshot 4799 has succeeded
license/cla: Contributor License Agreement is signed.

@mpfz0r mpfz0r deleted the use-stream-ids-for-blacklisting-in-stream-router branch Jul 26, 2019
