Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Keep only non-matched series in streaming aggregation #4243

Closed
Amper opened this issue May 2, 2023 · 7 comments
Closed

Keep only non-matched series in streaming aggregation #4243

Amper opened this issue May 2, 2023 · 7 comments
Assignees
Labels
enhancement New feature or request streamaggr TBD To Be Done vmagent

Comments

@Amper
Copy link
Contributor

Amper commented May 2, 2023

Is your feature request related to a problem? Please describe

At the moment regardless of matching in config streaming aggregation:

  • keeps all input series altogether if keepInput=true
  • drops all input series altogether if keepInput=false

But in typical case user need to keep non-matched input series and drop all matched input series.

Now this effect can only be achieved with two coordinated remoteWrite (see additional information of this issue).

Describe the solution you'd like

There are two options for implementation (without backward compatibility break):

  • new flag streamAggr.dropMatched, that drops all matched series if flag streamAggr.keepInput is set
  • new flag streamAggr.keepMismatched, that keep all matched series if flag streamAggr.keepInput isn't set

Update:
As described in this comment: based on user feedback it makes more sense to implement it with backward incompatible change and implement the following flow:

  • if series did not match any rules - pass it to remote write
  • if series did match some rule - pass aggregated results
    Setting keepInput=true in this case allows keeping full source data (both aggregated and original).

Describe alternatives you've considered

No response

Additional information

Discussion: https://victoriametrics.slack.com/archives/CGZF1H6L9/p1682204682016349

@Amper Amper added enhancement New feature or request vmagent labels May 2, 2023
@oleg-tolmashov
Copy link

Also if using stream aggregation with some match filter without keepInput=true, it drops internal vmagent vm_ metrics, which is not a desired behaviour I suppose.

@Amper are there plans when it's going to be implemented?

@Amper
Copy link
Contributor Author

Amper commented Jun 13, 2023

Hi @oleg-tolmashov, i hope to do it in the next quarter. For now, you can use workaround with separate remoteWrite and dropping metrics with relabeling.

@zekker6
Copy link
Contributor

zekker6 commented Jun 13, 2023

Hi @oleg-tolmashov @Amper
We've discussed this issue last week and decided that it will make more sense to change existing behavior.
Based on user feedback, it makes more sense to keep metrics which did not match any streamAggr rules by default.

So the idea is to change default behavior to:

  • if series did not match any rules - pass it to remote write
  • if series did match some rule - pass aggregated results
    Setting keepInput=true in this case allows keeping full source data (both aggregated and original).

Also, I was going to implement this one next.

@Amper
Copy link
Contributor Author

Amper commented Jun 13, 2023

@zekker6 I got it, great!

@iamhritikkumar
Copy link

@zekker6 Do you have any updates on when it will be available in vmagent?

@hagen1778 hagen1778 added the TBD To Be Done label Jul 4, 2023
zekker6 added a commit that referenced this issue Jul 4, 2023
Changes default behaviour of keepInput flag to write series which did not match any aggregators to the remote write.
See: #4243

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
valyala added a commit that referenced this issue Jul 24, 2023
…lag (#4575)

* {lib/streamaggr,vmagent/remotewrite}: breaking change for keepInput flag

Changes default behaviour of keepInput flag to write series which did not match any aggregators to the remote write.
See: #4243

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* Update app/vmagent/remotewrite/remotewrite.go

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
valyala added a commit that referenced this issue Jul 25, 2023
- Use a byte slice instead of a map for tracking indexes for matching series.
  This improves performance, since access by slice index is faster than access by map key.
- Re-use the byte slice for tracking indexes for matching series.
  This removes unnecessary memory allocations and improves stream aggregation performance a bit.
- Add an ability to return to the previous behvaiour by specifying -remoteWrite.streamAggr.dropInput command-line flag.
  In this case all the input samples are dropped when stream aggregation is enabled.
- Backport the new stream aggregation behaviour from vmagent to single-node VictoriaMetrics when -streamAggr.config
  option is set.
- Improve docs regarding this change at docs/CHANGELOG.md
- Document the new behavior at docs/stream-aggregation.md

Updates #4243
Updates #4575
valyala added a commit that referenced this issue Jul 25, 2023
…lag (#4575)

* {lib/streamaggr,vmagent/remotewrite}: breaking change for keepInput flag

Changes default behaviour of keepInput flag to write series which did not match any aggregators to the remote write.
See: #4243

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* Update app/vmagent/remotewrite/remotewrite.go

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
valyala added a commit that referenced this issue Jul 25, 2023
- Use a byte slice instead of a map for tracking indexes for matching series.
  This improves performance, since access by slice index is faster than access by map key.
- Re-use the byte slice for tracking indexes for matching series.
  This removes unnecessary memory allocations and improves stream aggregation performance a bit.
- Add an ability to return to the previous behvaiour by specifying -remoteWrite.streamAggr.dropInput command-line flag.
  In this case all the input samples are dropped when stream aggregation is enabled.
- Backport the new stream aggregation behaviour from vmagent to single-node VictoriaMetrics when -streamAggr.config
  option is set.
- Improve docs regarding this change at docs/CHANGELOG.md
- Document the new behavior at docs/stream-aggregation.md

Updates #4243
Updates #4575
@valyala
Copy link
Collaborator

valyala commented Jul 25, 2023

FYI, the commit 6265157 changes the default behavior for stream aggregation, so now non-matched input series are preserved in the output. The old behavior can be restored by passing -remoteWrite.streamAggr.dropInput command-line option to vmagent or by passing -streamAggr.dropInput command-line option to single-node VictoriaMetrics. In this case all the input series are dropped when stream aggregation is enabled.

This commit will be included in the next release. In the mean time it is possible to test the new stream aggregation behavior by building vmagent from this commit according to these docs and verifying that vmagent keeps input samples, which do not match any match filter in stream aggregation configs.

@hagen1778
Copy link
Collaborator

vmagent changes the default behavior for stream aggregation for matched series starting from v1.92.0. Closing the feature request as done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request streamaggr TBD To Be Done vmagent
Projects
None yet
Development

No branches or pull requests

6 participants