Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New pipeline functions remove_single_field and remove_multiple_fields #19268

Merged
merged 7 commits into from
May 8, 2024

Conversation

patrickmann
Copy link
Contributor

@patrickmann patrickmann commented May 7, 2024

Resolves #19098

Address performance and functional regression in pipeline function remove_field by introducing new, unambiguous functions.

Description

GL 5.1 added regex-matching to the pipeline function remove_field. This breaks existing pipeline rules that call
remove_field with a field name containing a regex reserved character, notably .. Performance of existing rules
may also be degraded.
Both issues are addressed by introducing more specific functions:
remove_single_field removes just a single field specified by name. It is simple and fast.
remove_multiple_fields removes fields matching a regex pattern and/or list of names. Depending on the
complexity of the matching it is slower.

Motivation and Context

See discussion in linked issue and #19259.

How Has This Been Tested?

  • unit test
  • perf testing using metric org.graylog.plugins.pipelineprocessor.processors.PipelineInterpreter.executionTime

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Refactoring (non-breaking change)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.

@patrickmann patrickmann marked this pull request as ready for review May 7, 2024 12:19
@patrickmann patrickmann requested a review from a team May 7, 2024 12:20
Copy link
Member

@bernd bernd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't do a full review and test, but I have some comments:

@@ -69,7 +69,7 @@ public FunctionDescriptor<Void> descriptor() {
.params(ImmutableList.of(fieldParam, messageParam, invertParam))
.description("Removes the named field from message, unless the field is reserved. If no specific message is provided, it removes the field from the currently processed message.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a note to the description about why the function is deprecated and the alternatives.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

@moesterheld moesterheld left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tested, works as expected and lgtm. I only have two cosmetical suggestions

@@ -67,9 +70,11 @@ public FunctionDescriptor<Void> descriptor() {
.name(NAME)
.returnType(Void.class)
.params(ImmutableList.of(fieldParam, messageParam, invertParam))
.description("Removes the named field from message, unless the field is reserved. If no specific message is provided, it removes the field from the currently processed message.")
.description("Removes the named field from message, unless the field is reserved. " +
"If no specific message is provided, it uses the currently processed message." +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"If no specific message is provided, it uses the currently processed message." +
"If no specific message is provided, it uses the currently processed message. " +

Missing space

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

.description("Removes the specified field(s) from message, unless the field name is reserved. If no specific message is provided, it uses the currently processed message.")
.ruleBuilderEnabled()
.ruleBuilderName("Remove field - multiple")
.ruleBuilderTitle("Remove multiple fields by regex<#if pattern??> '${pattern}'</#if> or name list<#if names??> '${names}'</#if>")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.ruleBuilderTitle("Remove multiple fields by regex<#if pattern??> '${pattern}'</#if> or name list<#if names??> '${names}'</#if>")
.ruleBuilderTitle("Remove multiple fields by<#if pattern??> regex '${pattern}'</#if><#if pattern?? && names??> or</#if><#if names??> name list '${names}'</#if>")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done. Could still be improved to better handle empty strings, but this is good enough!

Copy link
Contributor

@moesterheld moesterheld left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@patrickmann patrickmann merged commit 4a1cbb4 into master May 8, 2024
5 checks passed
@patrickmann patrickmann deleted the remove_fields branch May 8, 2024 14:06
patrickmann added a commit that referenced this pull request May 8, 2024
…lds` (#19268)

* new pipeline functions remove_single_field and remove_multiple_fields

* CL

* improve rulebuilder descriptions

* compile regex and other feedback changes

* pre-compile pattern

* UI string improvements

(cherry picked from commit 4a1cbb4)
dennisoelkers pushed a commit that referenced this pull request May 8, 2024
…lds` (#19268)

* new pipeline functions remove_single_field and remove_multiple_fields

* CL

* improve rulebuilder descriptions

* compile regex and other feedback changes

* pre-compile pattern

* UI string improvements
patrickmann added a commit that referenced this pull request May 8, 2024
…lds` (#19268)

* new pipeline functions remove_single_field and remove_multiple_fields

* CL

* improve rulebuilder descriptions

* compile regex and other feedback changes

* pre-compile pattern

* UI string improvements

(cherry picked from commit 4a1cbb4)
patrickmann added a commit that referenced this pull request May 8, 2024
…lds` (#19268)

* new pipeline functions remove_single_field and remove_multiple_fields

* CL

* improve rulebuilder descriptions

* compile regex and other feedback changes

* pre-compile pattern

* UI string improvements

(cherry picked from commit 4a1cbb4)
kodjo-anipah pushed a commit that referenced this pull request May 10, 2024
…lds` (#19268) (#19297)

* new pipeline functions remove_single_field and remove_multiple_fields

* CL

* improve rulebuilder descriptions

* compile regex and other feedback changes

* pre-compile pattern

* UI string improvements

(cherry picked from commit 4a1cbb4)
patrickmann added a commit that referenced this pull request Jun 5, 2024
…ultiple_fields (#19301)

* New pipeline functions `remove_single_field` and `remove_multiple_fields` (#19268)

* new pipeline functions remove_single_field and remove_multiple_fields

* CL

* improve rulebuilder descriptions

* compile regex and other feedback changes

* pre-compile pattern

* UI string improvements

(cherry picked from commit 4a1cbb4)

* modify cherry-pick for things missing in 5.1

* Update changelog/unreleased/issue-19098.toml

Co-authored-by: Maxwell <98284293+kodjo-anipah@users.noreply.github.com>

---------

Co-authored-by: Maxwell <98284293+kodjo-anipah@users.noreply.github.com>
patrickmann added a commit that referenced this pull request Jun 5, 2024
…ultiple_fields (#19300)

* New pipeline functions `remove_single_field` and `remove_multiple_fields` (#19268)

* new pipeline functions remove_single_field and remove_multiple_fields

* CL

* improve rulebuilder descriptions

* compile regex and other feedback changes

* pre-compile pattern

* UI string improvements

(cherry picked from commit 4a1cbb4)

* adjust unit test for 5.2

* Update issue-19098.toml
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

remove_field pipeline function performance regression
3 participants