TSID index: switch to per-day index instead of global one #4563

Closed
hagen1778 opened this issue Jul 3, 2023 · 3 comments
Labels
enhancement New feature or request performance Performance-related issue

Comments

@hagen1778
Collaborator

Is your feature request related to a problem? Please describe

The TSID index is the part of the index most vulnerable to a high churn rate. A high churn rate not only makes the index bigger but also makes the index/* caches less efficient, since they rely on the size of data blocks on disk.

Describe the solution you'd like

It would be nice to switch to per-day partitioning for TSID indexes. This may result in higher disk usage in the long run, but it would significantly reduce memory usage and increase the cache hit rate in environments with a high churn rate.

Describe alternatives you've considered

No response

Additional information

No response

@hagen1778 hagen1778 added enhancement New feature or request performance Performance-related issue labels Jul 3, 2023
@hagen1778 hagen1778 added the TBD To Be Done label Jul 7, 2023
@hagen1778
Collaborator Author

cc @valyala

f41gh7 added a commit that referenced this issue Jul 13, 2023
indexDB rotation

Previously, the dateMetricID cache was reset during indexDB rotation, which caused
a lot of new records to be created. This could saturate memory usage, since
lookups for existing entries were made.
With the new logic, daily index records are pre-created during the 1 hour
before indexDB rotation.
There is no need to reset the dateMetricID cache, since it belongs to
indexDB. This greatly improves performance.
It should help to implement next feature #4563
f41gh7 added a commit that referenced this issue Jul 17, 2023
During the hour before indexDB rotation, start creating records in the next indexDB.
This should improve performance during the switch to the next indexDB and remove ingestion issues,
since there is no need to create new index records for time series already ingested into the current indexDB.
#4563
f41gh7 added a commit that referenced this issue Jul 18, 2023
valyala added a commit that referenced this issue Jul 22, 2023
valyala added a commit that referenced this issue Jul 22, 2023
* lib/storage: pre-create timeseries before indexDB rotation
During the hour before indexDB rotation, start creating records in the next indexDB.
This should improve performance during the switch to the next indexDB and remove ingestion issues,
since there is no need to create new index records for time series already ingested into the current indexDB.
#4563

* lib/storage: further work on indexdb rotation optimization

- Document the change at docs/CHANGELOG.md
- Move back various caches from indexDB to Storage. This makes the change less intrusive.
  The dateMetricIDCache now takes into account indexDB generation, so it stores (date, metricID)
  entries for both the current and the next indexDB.
- Consolidate the code responsible for idbNext pre-filling into prefillNextIndexDB() function.
  This improves code readability and maintainability a bit.
- Rewrite and simplify the code responsible for calculating the next retention timestamp.
  Add various tests for corner cases of this code.
- Remove indexdb pre-filling from RegisterMetricNames() function, since this function is rarely called.
  It is OK to add indexdb entries on demand in this function. This simplifies the code.

Updates #1401

* docs/CHANGELOG.md: refer to #4563

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
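The commit message above says that the dateMetricIDCache takes the indexDB generation into account, so that it can hold (date, metricID) entries for both the current and the next indexDB. A rough sketch of such a cache is shown below. The type `genDateMetricIDCache` and its methods are hypothetical and are not the actual API from lib/storage; nested maps are used purely for clarity.

```go
package main

import "fmt"

// dateMetricID is a (date, metricID) pair, as named in the commit message.
type dateMetricID struct {
	date     uint64
	metricID uint64
}

// genDateMetricIDCache is a hypothetical cache keyed by indexDB generation.
// Entries for the current and the next indexDB live side by side, so the
// cache does not have to be reset on rotation.
type genDateMetricIDCache struct {
	byGen map[uint64]map[dateMetricID]struct{}
}

func newGenDateMetricIDCache() *genDateMetricIDCache {
	return &genDateMetricIDCache{byGen: make(map[uint64]map[dateMetricID]struct{})}
}

// Has reports whether (date, metricID) is cached for the given generation.
func (c *genDateMetricIDCache) Has(generation, date, metricID uint64) bool {
	// Indexing a missing generation yields a nil map, which is safe to read.
	_, ok := c.byGen[generation][dateMetricID{date: date, metricID: metricID}]
	return ok
}

// Set records (date, metricID) for the given generation.
func (c *genDateMetricIDCache) Set(generation, date, metricID uint64) {
	m := c.byGen[generation]
	if m == nil {
		m = make(map[dateMetricID]struct{})
		c.byGen[generation] = m
	}
	m[dateMetricID{date: date, metricID: metricID}] = struct{}{}
}

func main() {
	c := newGenDateMetricIDCache()
	const currGen, nextGen = 1, 2

	// The series is registered in the current indexDB only.
	c.Set(currGen, 19541, 42)
	fmt.Println("cached for current generation:", c.Has(currGen, 19541, 42))
	fmt.Println("cached for next generation:", c.Has(nextGen, 19541, 42))

	// Pre-filling the next indexDB adds a separate entry for its generation.
	c.Set(nextGen, 19541, 42)
	fmt.Println("cached for next generation after pre-fill:", c.Has(nextGen, 19541, 42))
}
```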
@valyala
Collaborator

valyala commented Jul 22, 2023

The commit 7094fa3 adds per-day index for MetricName -> TSID. This commit will be included in the next release.

@valyala
Collaborator

valyala commented Jul 28, 2023

VictoriaMetrics uses per-day index for MetricName -> TSID mapping starting from v1.92.0.

Closing this feature request as done.

@valyala valyala closed this as completed Jul 28, 2023
@valyala valyala removed the TBD To Be Done label Jul 28, 2023
valyala added a commit that referenced this issue Jul 29, 2023
valyala added a commit that referenced this issue Jul 29, 2023