Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

applying extra_filters might hit maxUniqueTimeSeries limit for label_values, labels, series API requests #5055

Closed
f41gh7 opened this issue Sep 25, 2023 · 10 comments
Assignees
Labels
enhancement New feature or request

Comments

@f41gh7
Copy link
Contributor

f41gh7 commented Sep 25, 2023

Is your feature request related to a problem? Please describe

Requests for /api/v1/label/name/values and other APIs are processed by different code path if extra_filters is set.

Without extra_filters, VM simply selects first N results from index search. maxUniqueSeries limit isn't involved.

With extra_filters VM performs metricIDs search first and uses maxUniqueSeries limit during the search.

The problem is that user can have /api/v1/label/name/values API call executed without any issues, but once they add extra_filters the request can breach the complexity limit. This is controversial, as it is expected that specifying additional filters always reduces the complexity of the query.

Describe the solution you'd like

It'd be great to implement different approach.

  1. At first execute original request with given limit.
  2. Construct new filter that include found label value plus extra_filters
  3. make a search for the first metricID with given filter.
  4. If previous filter returns at least 1 metricID keep original result. Otherwise drop it and go to the next.

It makes sense only if limit LTE 1000.

Describe alternatives you've considered

No response

Additional information

No response

@f41gh7 f41gh7 added the enhancement New feature or request label Sep 25, 2023
@hagen1778 hagen1778 changed the title extra_filters reaches maxUniqueTimeSeries limit for label_values, labels, series API requests applying extra_filters might hit maxUniqueTimeSeries limit for label_values, labels, series API requests Sep 26, 2023
@hagen1778 hagen1778 added the TBD To Be Done label Jan 10, 2024
f41gh7 added a commit that referenced this issue Jan 16, 2024
for extra_filters and extra_label for /labels and /label/{}/values it must prevent errors with search.Max exceed error
it's possible with scanning (date,tag) -> metricIDs index and joining `metricName` for each found `metricID`. Those `metricName` can filtered with tagFilters.
#5055
valyala added a commit that referenced this issue Jan 29, 2024
…hen performing /api/v1/labels and /api/v1/label/.../values requests

This limit has little sense for these APIs, since:

- Thses APIs frequently result in scanning of all the time series on the given time range.
  For example, if extra_filters={datacenter="some_dc"} .

- Users expect these APIs shouldn't hit the -search.maxUniqueTimeseries limit,
  which is intended for limiting resource usage at /api/v1/query and /api/v1/query_range requests.

Also limit the concurrency for /api/v1/labels, /api/v1/label/.../values
and /api/v1/series requests in order to limit the maximum memory usage and CPU usage for these API.
This limit shouldn't affect typical use cases for these APIs:

- Grafana dashboard load when dashboard labels should be loaded
- Auto-suggestion list load when editing the query in Grafana or vmui

Updates #5055
valyala added a commit that referenced this issue Jan 29, 2024
…hen performing /api/v1/labels and /api/v1/label/.../values requests

This limit has little sense for these APIs, since:

- Thses APIs frequently result in scanning of all the time series on the given time range.
  For example, if extra_filters={datacenter="some_dc"} .

- Users expect these APIs shouldn't hit the -search.maxUniqueTimeseries limit,
  which is intended for limiting resource usage at /api/v1/query and /api/v1/query_range requests.

Also limit the concurrency for /api/v1/labels, /api/v1/label/.../values
and /api/v1/series requests in order to limit the maximum memory usage and CPU usage for these API.
This limit shouldn't affect typical use cases for these APIs:

- Grafana dashboard load when dashboard labels should be loaded
- Auto-suggestion list load when editing the query in Grafana or vmui

Updates #5055
@valyala valyala removed the TBD To Be Done label Jan 30, 2024
@valyala
Copy link
Collaborator

valyala commented Jan 30, 2024

FYI, the commit 5d66ee8 removes checking for the -search.maxUniqueTimeseries limit at /api/v1/labels and /api/v1/label/.../values handlers. This should prevent from returning the number of matching timeseries exceeds ... errors in default configuration for VictoriaMetrics. This commit will be included in the next release.

@valyala
Copy link
Collaborator

valyala commented Jan 30, 2024

VictoriaMetrics doesn't use -search.maxUniqueTimeseries command-line flag value when processing requests to /api/v1/labels and /api/v1/label/.../values, starting from v1.97.0. Closing the issue as resolved.

@f41gh7
Copy link
Contributor Author

f41gh7 commented Feb 17, 2024

I have doubts about current design for extra_label and extra_filters support for /api/v1/series, /api/v1/labels and /api/v1/label/values apis.

By default, those API uses an index for searching results. It allows to perform fast and efficient search operations.

Current index has two versions:

  1. single node TAG:VALUE:metric_ids
  2. cluster node TENANT_ID:TAG:VALUE:metric_ids.

extra_filters cannot be used with those indexes. Since it has to perform a join operation.

It's possible to use two different join strategies:

  1. join metricName for matched metrics_id.
  2. join metrics_ids for matched extra_filters.

First approach has inconsistent performance. It may work really fast for some cases and incredibly slow for others. Good case for it, when you have a small churn rate at match patterns for search requests. E.g. search for /api/v1/label/low_churn_metric/values with extra_filters={env="prod"}`.

Second approach has consistently slow performance and it depends on the number of filters and matched metric_ids. Other problem with it, that it's not scalable. With more than 100M active series request will be constantly timeout.

But what's reason behind using this api enchantment? Mostly, it's a replacement for current tenant implementation. And instead of tenants, it's possible to use labels as a tenant id.

It looks, that the efficient way to implement that - provide a custom index creation feature for tags. It allows to perform fast search operation with low resource usage.

E.g. -customTagIndexLabels="env,team".

It must create following indexes for tags:
env=value:team:value:tag:value:metric_ids
env=value:tag:value:metric_ids
team:value:tag:value:metric_ids

It makes request with extra_labels - {env="prod", team="dev"} or {env="dev"} fast and reliable. But it increases disk usage, storage complexity and cpu usage.

It doesn't allow to make "cross-tenant" requests, aka using regexp at extra_labels or extra_filters.

valyala added a commit that referenced this issue Feb 23, 2024
…PISeries options for fine-tuning CPU and RAM usage for /api/v1/series , /api/v1/labels and /api/v1/label/.../values

This commit returns back limits for these endpoints, which have been removed at 5d66ee8 ,
since it has been appeared that missing limits result in high CPU usage, while the introduced concurrency limiter
results in failed lightweight requests to these endpoints because of timeout when heavyweight requests are executed.

Updates #5055
valyala added a commit that referenced this issue Feb 23, 2024
…PISeries options for fine-tuning CPU and RAM usage for /api/v1/series , /api/v1/labels and /api/v1/label/.../values

This commit returns back limits for these endpoints, which have been removed at 5d66ee8 ,
since it has been appeared that missing limits result in high CPU usage, while the introduced concurrency limiter
results in failed lightweight requests to these endpoints because of timeout when heavyweight requests are executed.

Updates #5055
@valyala
Copy link
Collaborator

valyala commented Mar 1, 2024

FY, the v1.97.3 LTS release takes into account -search.maxUniqueTimeseries when requesting /api/v1/labels and /api/v1/label/.../values as it was in v1.96.0.

@valyala
Copy link
Collaborator

valyala commented Mar 1, 2024

FYI, commits fab02fa and 0b7a23a add -search.ignoreExtraFiltersAtLabelsAPI command-line flag, which can be set in order to ignore match[], extra_filters[] and extra_label query args at /api/v1/labels and /api/v1/label/.../values. This allows reducing CPU usage and disk read IO usage when the match[], extra_filters[] or extra_label match too many time series. The downside is that additional labels and label values can be returned from the the APIs, which do not match the ignored filters. See these docs for more details.

These commits will be included in the next release.

@valyala
Copy link
Collaborator

valyala commented Mar 6, 2024

There was a bug in -search.ignoreExtraFiltersAtLabelsAPI implementation, which broke requests to /api/v1/series. This bug has been fixed in the commit 2711732 .

@valyala
Copy link
Collaborator

valyala commented Mar 6, 2024

FYI, VictoriaMetrics doesn't use -search.maxUniqueTimeseries and -search.maxQueryDuration command-line flag values for limiting the number of processed series and the duration of queries to /api/v1/labels and /api/v1/label/.../values starting from v1.99.0. Instead, it uses a dedicated command-line flags for the limits at these APIs:

  • -search.maxLabelsAPIDuration - limits the maximum duration of the request to labels APIs
  • -search.maxLabelsAPISeries - limits the maximum number of unique time series, which can be scanned while searching for the requested label names or label values.

See these docs for more details.

Closing the issue as addressed.

@valyala valyala closed this as completed Mar 6, 2024
@valyala
Copy link
Collaborator

valyala commented Mar 6, 2024

Additional information, which could be useful for users, who encounter slow queries to /api/v1/labels and /api/v1/label_values when loading Grafana dashboards with the defined Label values template variables: always specify the Metric field when editing the template variable. This field must contain a name of metric, which contains all the label values, which should be shown by the given template variable. If this field is left empty, then Grafana sends too broad request to /api/v1/label/.../values at VictoriaMetrics, which needs to scan labels for all the metrics instead of scanning labels only for the requested metric.

valyala added a commit that referenced this issue Mar 10, 2024
…rching for matching time series at /api/v1/labels, /api/v1/label/.../values and /api/v1/status/tsdb

This should improve query performance when match[], extra_filters[] or extra_label args are passed to these APIs

Updates #5055
valyala added a commit that referenced this issue Mar 11, 2024
…ers into indexSearch.searchMetricIDsInternal

This makes the code less fragile - it is harder to skip the convertToCompositeTagFilterss() call now.
While at it, call indexSearch.containsTimeRange() inside indexSearch.searchMetricIDsInternal()
in order to quickly terminate search of time series in the old indexdb for new time ranges.

Updates #5055

This is a follow-up for 2d31fd7
valyala added a commit that referenced this issue Mar 12, 2024
valyala added a commit that referenced this issue Mar 12, 2024
…rching for matching time series at /api/v1/labels, /api/v1/label/.../values and /api/v1/status/tsdb

This should improve query performance when match[], extra_filters[] or extra_label args are passed to these APIs

Updates #5055
valyala added a commit that referenced this issue Mar 12, 2024
…ers into indexSearch.searchMetricIDsInternal

This makes the code less fragile - it is harder to skip the convertToCompositeTagFilterss() call now.
While at it, call indexSearch.containsTimeRange() inside indexSearch.searchMetricIDsInternal()
in order to quickly terminate search of time series in the old indexdb for new time ranges.

Updates #5055

This is a follow-up for 2d31fd7
valyala added a commit that referenced this issue Mar 12, 2024
@valyala
Copy link
Collaborator

valyala commented Mar 12, 2024

FYI, the commits 2d31fd7 and d1d2771 improve performance for /api/v1/labels and /api/v1/label/.../values calls when match[] filter contains metric name. For example, /api/v1/label/instance/values?match[]=up should be executed must faster now. These commits will be included in the next release of VictoriaMetrics.

hardproblems pushed a commit to hardproblems/VictoriaMetrics that referenced this issue Mar 13, 2024
…rching for matching time series at /api/v1/labels, /api/v1/label/.../values and /api/v1/status/tsdb

This should improve query performance when match[], extra_filters[] or extra_label args are passed to these APIs

Updates VictoriaMetrics#5055
hardproblems pushed a commit to hardproblems/VictoriaMetrics that referenced this issue Mar 13, 2024
…ers into indexSearch.searchMetricIDsInternal

This makes the code less fragile - it is harder to skip the convertToCompositeTagFilterss() call now.
While at it, call indexSearch.containsTimeRange() inside indexSearch.searchMetricIDsInternal()
in order to quickly terminate search of time series in the old indexdb for new time ranges.

Updates VictoriaMetrics#5055

This is a follow-up for 2d31fd7
possibull pushed a commit to possibull/VictoriaMetrics that referenced this issue Mar 27, 2024
…rching for matching time series at /api/v1/labels, /api/v1/label/.../values and /api/v1/status/tsdb

This should improve query performance when match[], extra_filters[] or extra_label args are passed to these APIs

Updates VictoriaMetrics#5055
possibull pushed a commit to possibull/VictoriaMetrics that referenced this issue Mar 27, 2024
…ers into indexSearch.searchMetricIDsInternal

This makes the code less fragile - it is harder to skip the convertToCompositeTagFilterss() call now.
While at it, call indexSearch.containsTimeRange() inside indexSearch.searchMetricIDsInternal()
in order to quickly terminate search of time series in the old indexdb for new time ranges.

Updates VictoriaMetrics#5055

This is a follow-up for 2d31fd7
possibull pushed a commit to possibull/VictoriaMetrics that referenced this issue Mar 27, 2024
possibull pushed a commit to possibull/VictoriaMetrics that referenced this issue Mar 27, 2024
…rching for matching time series at /api/v1/labels, /api/v1/label/.../values and /api/v1/status/tsdb

This should improve query performance when match[], extra_filters[] or extra_label args are passed to these APIs

Updates VictoriaMetrics#5055
possibull pushed a commit to possibull/VictoriaMetrics that referenced this issue Mar 27, 2024
…ers into indexSearch.searchMetricIDsInternal

This makes the code less fragile - it is harder to skip the convertToCompositeTagFilterss() call now.
While at it, call indexSearch.containsTimeRange() inside indexSearch.searchMetricIDsInternal()
in order to quickly terminate search of time series in the old indexdb for new time ranges.

Updates VictoriaMetrics#5055

This is a follow-up for 2d31fd7
possibull pushed a commit to possibull/VictoriaMetrics that referenced this issue Mar 27, 2024
@valyala
Copy link
Collaborator

valyala commented Apr 4, 2024

FYI, the performance of /api/v1/labels and /api/v1/label/.../values APIs has been improved significantly starting from v1.100.0 release when the match[] filter contains metric name. For example, /api/v1/label/instance/values?match[]=up now works much faster when VictoriaMetrics contains big number of time series. Such queries are usually used by Grafana during auto-suggestion and template variable population at dashboards as described in this comment, so VictoriaMetrics should process /api/v1/labels and /api/v1/label/.../values requests much faster starting from v1.100.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

5 participants