[Feat]: Support setting retention time instead of space usage limits in dbengine configuration. #13424

Ferroin · 2022-07-22T13:37:23Z

Problem

Currently, the dbengine only allows setting disk usage limits to control data storage. This is useful in some scenarios, but runs into a couple of usability issues in others:

Users who don’t care about disk usage but need a specific retention limit have to either do some non-trivial math themselves or rely on our online space requirements calculator.
Retention targets are dependent on not only the disk usage limits, but also the total number of metrics. However, the total number of metrics cannot be determined reliably ahead of time, meaning users have to install Netdata, configure all their plugins, and then figure out how much space to give Netdata based on their data retention requirements.
A number of deployment scenarios involve a non-constant number of metrics, and only being able to configure disk usage limits instead of retention time limits results in potential unpredictability on such deployments in terms of actual retention times.
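To illustrate why a fixed disk limit gives unpredictable retention, here is a rough sketch of the math a user currently has to do. The function name and the `bytes_per_point` figure are assumptions made for illustration only; the real on-disk cost per sample depends on the dbengine's storage format and compression and is not stated in this issue.

```python
def estimated_retention_seconds(disk_limit_bytes, metric_count,
                                bytes_per_point, update_every_s=1):
    """Estimate how long data is kept before a disk limit is reached.

    bytes_per_point is a placeholder value, NOT a documented Netdata figure.
    """
    if metric_count <= 0 or bytes_per_point <= 0 or update_every_s <= 0:
        raise ValueError("metric_count, bytes_per_point and update_every_s must be positive")
    # Bytes written per second across all metrics.
    bytes_per_second = metric_count * bytes_per_point / update_every_s
    return disk_limit_bytes / bytes_per_second


# Same 1 GiB budget, before and after one extra login adds 44 metrics.
before = estimated_retention_seconds(1024**3, 2000, bytes_per_point=1.0)
after = estimated_retention_seconds(1024**3, 2044, bytes_per_point=1.0)
print(f"retention before: {before / 86400:.2f} days, after: {after / 86400:.2f} days")
```

With the disk limit held constant, every additional metric shortens the effective retention, which is the unpredictability described above.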

As an example of a scenario where these usability issues are relevant: On my home server, I have a huge amount of excess storage space that Netdata could utilize without a significant impact on the rest of the system, so disk usage limits are not particularly relevant to me except as insurance against the agent misbehaving. However, I also have two things set up on the system which result in the total number of metrics varying unpredictably over time:

User sessions each end up in their own cgroup, resulting in each new SSH or console login creating (currently) 44 new metrics via the cgroups plugin. Terminal servers using either systemd-logind or elogind would suffer from the same issue.
The total number of running VMs on the system is non-constant, and each cold restart of a VM results in a new set of metrics because of how libvirt handles the associated cgroups. Build farms, systems used for cloud hosting, and similar cases of systems running large numbers of potentially ephemeral VMs would suffer from the same issue.

Description

To support deployment scenarios where the usability issues mentioned above matter, I recommend adding a new dbengine configuration option to control how many data points get stored for each metric in the dbengine. I envision this functioning alongside the existing disk space usage limits and being applied only when the space usage of the dbengine does not exceed the disk space usage limits. A normal deployment scenario for such a setup would thus involve setting the desired retention limit, and then using the disk usage limit as a hard limit for worst case scenarios so that Netdata does not fill the whole disk if things go wrong.
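A minimal sketch of how the two limits could interact, under my reading of the proposal: the disk usage limit stays a hard cap, and the retention limit only decides eviction while usage is under that cap. The names `Limits` and `eviction_reason` are hypothetical and do not correspond to anything in the dbengine code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Limits:
    max_points_per_metric: int  # proposed retention limit (hypothetical option)
    max_disk_bytes: int         # existing disk usage limit, acting as a hard cap


def eviction_reason(points_stored: int, used_bytes: int, limits: Limits) -> Optional[str]:
    """Return which limit (if any) requires dropping the oldest data."""
    # The disk limit always wins: it protects the disk if things go wrong.
    if used_bytes > limits.max_disk_bytes:
        return "disk"
    # Otherwise only trim metrics holding more points than requested.
    if points_stored > limits.max_points_per_metric:
        return "retention"
    return None


limits = Limits(max_points_per_metric=86_400, max_disk_bytes=10 * 1024**3)
print(eviction_reason(90_000, 1 * 1024**3, limits))   # under disk cap, too many points
print(eviction_reason(10_000, 11 * 1024**3, limits))  # over disk cap
```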

I also recommend that when this option is specified, we add a log message during dbengine startup indicating worst-case disk usage and expected disk usage per dimension, to better allow users to fine-tune this new setting without needing to use an online calculator.
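As a sketch of what such a startup message could contain: the function name, the message wording, and the `bytes_per_point` value below are all assumptions, and "worst-case disk usage" is taken here to mean the configured disk usage limit, since that limit acts as the hard cap.

```python
def retention_startup_message(max_points: int, bytes_per_point: int,
                              disk_limit_bytes: int) -> str:
    """Build a hypothetical log line summarizing the retention settings.

    bytes_per_point is a placeholder; the real per-sample cost is not
    given in this issue.
    """
    # Expected usage for one dimension if it fills its whole retention window.
    per_dimension_bytes = max_points * bytes_per_point
    return (
        f"dbengine: retention limit {max_points} points per metric; "
        f"expected disk usage per dimension {per_dimension_bytes} bytes; "
        f"worst-case total disk usage {disk_limit_bytes} bytes (disk usage limit)"
    )


message = retention_startup_message(86_400, 4, 10 * 1024**3)
print(message)
```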

Importance

nice to have

Value proposition

Users who are not practically affected by disk space can set exact retention periods without needing to look online or understand how the dbengine works.
Users who have deployments with variable numbers of metrics can set specific retention periods without having to determine the worst-case scenario ahead of time and plan based on that.
Users migrating from other monitoring solutions that only provide time-based retention limits instead of disk usage limits will be able to more easily learn to configure Netdata.

Proposed implementation

Suggested name for the new configuration option: dbengine max points per metric


aldem · 2022-12-23T16:40:39Z

I second that, though I would prefer to specify the retention period and not the number of points (as it depends on the interval).

Something like dbengine max retention period = 30d (and equivalents for tiered storage) would be really nice - as long as space limit allows.
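To show how a period-based setting relates to a point count, here is a small sketch that parses a value such as `30d` and converts it to points for a given collection interval. The set of unit suffixes and both function names are assumptions; only the `30d` form appears in this thread.

```python
import re

# Assumed unit suffixes; only "d" is shown in the thread.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "y": 365 * 86400}


def parse_retention_period(text: str) -> int:
    """Convert a string like '30d' into a number of seconds."""
    match = re.fullmatch(r"\s*(\d+)\s*([smhdy])\s*", text)
    if match is None:
        raise ValueError(f"unrecognized retention period: {text!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def points_for_period(period_seconds: int, update_every_seconds: int) -> int:
    """Number of points one metric needs to cover the period."""
    if update_every_seconds <= 0:
        raise ValueError("update_every_seconds must be positive")
    return period_seconds // update_every_seconds


period = parse_retention_period("30d")
print(points_for_period(period, 1))   # one point per second
print(points_for_period(period, 60))  # one point per minute
```

The same 30 days maps to very different point counts depending on the interval, which is why a period is easier to reason about than a number of points.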

MikeJakubik · 2023-03-20T19:29:27Z

Hello,

This would also be a useful way to configure retention for me, as I'm primarily concerned with how much data we can store in terms of time. For example, I want to keep up to 3 years of data, but disk space is not much of an issue for me. Given a value of x days or years per tier, Netdata would then ideally auto-tune the engine for the best settings.

Thanks.

ilyam8 · 2024-02-29T12:58:15Z

Closing in favour of #17082

Ferroin added the feature request and needs triage labels Jul 22, 2022

ilyam8 added the area/database label Dec 26, 2023

ilyam8 assigned ktsaou Dec 26, 2023

ilyam8 removed the needs triage label Dec 26, 2023

ilyam8 closed this as not planned Feb 29, 2024
