[Feat]: Support setting retention time instead of space usage limits in dbengine configuration. #13424

Closed
Ferroin opened this issue Jul 22, 2022 · 3 comments

@Ferroin
Member

Ferroin commented Jul 22, 2022

Problem

Currently, the dbengine only allows setting disk usage limits to control data storage. This is useful in some scenarios, but runs into a couple of usability issues in others:

  • Users who don’t care about disk usage but need a specific retention limit have to either do some non-trivial math themselves or rely on our online space requirements calculator.
  • Retention targets are dependent on not only the disk usage limits, but also the total number of metrics. However, the total number of metrics cannot be determined reliably ahead of time, meaning users have to install Netdata, configure all their plugins, and then figure out how much space to give Netdata based on their data retention requirements.
  • A number of deployment scenarios involve a non-constant number of metrics, and only being able to configure disk usage limits instead of retention time limits results in potential unpredictability on such deployments in terms of actual retention times.
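
As a rough illustration of the math involved, the sketch below estimates how long data is retained under a fixed disk usage limit. The function name, the flat per-point storage cost, and the example numbers are assumptions made for illustration only; they do not describe how the dbengine actually accounts for space.

```python
def retention_seconds(disk_limit_bytes: int,
                      metric_count: int,
                      update_every_s: int,
                      bytes_per_point: float) -> int:
    """Estimate retention time under a fixed disk budget.

    Assumes every metric is collected at the same interval and that each
    stored point costs a flat `bytes_per_point` (a hypothetical figure).
    """
    if metric_count <= 0 or update_every_s <= 0 or bytes_per_point <= 0:
        raise ValueError("metric_count, update_every_s and bytes_per_point must be positive")
    # Points each metric can keep once the budget is shared evenly.
    points_per_metric = int(disk_limit_bytes / (metric_count * bytes_per_point))
    return points_per_metric * update_every_s


# Same disk limit, different metric counts: retention halves when metrics double.
few = retention_seconds(1_000_000_000, 1000, 1, 4.0)
many = retention_seconds(1_000_000_000, 2000, 1, 4.0)
print(few, many)
```

With the disk limit held constant, doubling the number of metrics halves the retention time, which is why a variable metric count makes retention hard to predict.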

As an example of a scenario where these usability issues are relevant: On my home server, I have a huge amount of excess storage space that Netdata could utilize without a significant impact on the rest of the system, so disk usage limits are not particularly relevant to me except as insurance against the agent misbehaving. However, I also have two things set up on the system which result in the total number of metrics varying unpredictably over time:

  • User sessions each end up in their own cgroup, resulting in each new SSH or console login creating (currently) 44 new metrics via the cgroups plugin. Terminal servers using either systemd-logind or elogind would suffer from the same issue.
  • The total number of running VMs on the system is non-constant, and each cold restart of a VM results in a new set of metrics because of how libvirt handles the associated cgroups. Build farms, systems used for cloud hosting, and similar cases of systems running large numbers of potentially ephemeral VMs would suffer from the same issue.

Description

To support deployment scenarios where the usability issues mentioned above matter, I recommend adding a new dbengine configuration option to control how many data points get stored for each metric in the dbengine. I envision this functioning alongside the existing disk space usage limits and being applied only when the space usage of the dbengine does not exceed the disk space usage limits. A normal deployment scenario for such a setup would thus involve setting the desired retention limit, and then using the disk usage limit as a hard limit for worst case scenarios so that Netdata does not fill the whole disk if things go wrong.

I also recommend that when this option is specified, we add a log message during dbengine startup indicating worst-case disk usage and expected disk usage per dimension, to better allow users to fine-tune this new setting without needing to use an online calculator.
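
A minimal sketch of how the two limits could interact, and of the numbers such a startup log message might report. Everything here is hypothetical: the function and field names, and the flat `bytes_per_point` cost, are assumptions for illustration rather than actual dbengine behavior.

```python
from dataclasses import dataclass


@dataclass
class RetentionEstimate:
    bytes_per_metric: int   # expected disk usage of one metric at the point cap
    total_bytes: int        # disk usage if every metric reaches the point cap
    effective_points: int   # points per metric that can actually be kept
    space_limited: bool     # True when the disk limit, not the point cap, decides


def estimate_retention(max_points_per_metric: int,
                       metric_count: int,
                       disk_limit_bytes: int,
                       bytes_per_point: float) -> RetentionEstimate:
    """Apply the point cap, falling back to the disk limit as a hard ceiling."""
    bytes_per_metric = int(max_points_per_metric * bytes_per_point)
    total_bytes = bytes_per_metric * metric_count
    if total_bytes <= disk_limit_bytes:
        # The retention setting fits inside the disk budget, so it applies as-is.
        return RetentionEstimate(bytes_per_metric, total_bytes,
                                 max_points_per_metric, False)
    # The disk limit wins: share the budget evenly across all metrics.
    effective = int(disk_limit_bytes / (metric_count * bytes_per_point))
    return RetentionEstimate(bytes_per_metric, total_bytes, effective, True)


est = estimate_retention(86400, 5000, 1_000_000_000, 4.0)
print(f"per metric: {est.bytes_per_metric} B, worst case: {est.total_bytes} B, "
      f"space limited: {est.space_limited}")
```

In this sketch the retention setting is honored whenever the resulting usage fits under the disk limit, and the disk limit only takes over in the worst case.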

Importance

nice to have

Value proposition

  1. Users who are not practically affected by disk space can set exact retention periods without needing to look online or understand how the dbengine works.
  2. Users who have deployments with variable numbers of metrics can set specific retention periods without having to determine the worst-case scenario ahead of time and plan based on that.
  3. Users migrating from other monitoring solutions that only provide time-based retention limits instead of disk usage limits will be able to more easily learn to configure Netdata.

Proposed implementation

Suggested name for the new configuration option: dbengine max points per metric

@Ferroin Ferroin added the feature request and needs triage labels Jul 22, 2022
@aldem

aldem commented Dec 23, 2022

I second that, though I would prefer to specify the retention period and not the number of points (as it depends on the interval).

Something like dbengine max retention period = 30d (and equivalents for tiered storage) would be really nice - as long as space limit allows.
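
To illustrate why the number of points depends on the interval, here is a small sketch that converts a retention period into a point count. The period syntax (`30d`, `12h`, and so on) and the function names are assumptions for illustration, not an existing Netdata format.

```python
import re

# Hypothetical unit suffixes for a retention period string.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_period(text: str) -> int:
    """Parse a period such as '30d' into seconds."""
    match = re.fullmatch(r"\s*(\d+)\s*([smhd])\s*", text)
    if match is None:
        raise ValueError(f"unrecognized retention period: {text!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def points_for_period(period: str, update_every_s: int) -> int:
    """Number of points one metric needs to cover `period` at a given interval."""
    if update_every_s <= 0:
        raise ValueError("update_every_s must be positive")
    return parse_period(period) // update_every_s


# The same 30 day period needs very different point counts per interval.
print(points_for_period("30d", 1), points_for_period("30d", 5))
```

The same period maps to five times fewer points at a 5 second interval than at a 1 second interval, so a setting expressed in time stays valid when the interval changes.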

@MikeJakubik

MikeJakubik commented Mar 20, 2023

Hello,

This would also be a useful way to configure retention for me, as I'm primarily concerned with how much data, time-wise, we can store; i.e., I want to keep up to 3 years, but disk space is not that much of an issue for me. Taking a value of x days or years per tier, Netdata would then ideally auto-tune for the best engine settings.

Thanks.

@ilyam8 ilyam8 removed the needs triage label Dec 26, 2023
@ilyam8
Member

ilyam8 commented Feb 29, 2024

Closing in favour of #17082

@ilyam8 ilyam8 closed this as not planned Feb 29, 2024