[Feat]: Stream data from all tiers to a parent node (not just tier 0) #16849

hugovalente-pm · 2024-01-25T18:18:41Z

Problem

If a user sets up a standalone agent using the 3 tiers and later wants to stream all of the data from that agent to a parent agent, only tier 0 data is streamed.
This is a problem when the setup of Netdata Agents isn't carefully thought out from the start, and it is easy for this situation to occur.

The discussion started in this Discord thread.

Description

When data is streamed from a child agent to a parent agent, the other tiers should be streamed as well, not only tier 0.

Importance

really want

Value proposition

In cases where the setup/hierarchy of agents changes, it will be possible to ensure that all of the data from a child node is available on the parent node.

Proposed implementation

No response

luisj1983 · 2024-01-25T18:33:14Z

+1
In the future it will be a very common scenario that a Netdata user starts off with the default standalone agent and then, for various reasons including increasing data retention, moves to a streaming parent-child setup. We should allow all data to be preserved.

ilyam8 · 2024-01-26T10:39:30Z

But the new parent node will create tier 1 and 2 from tier 0. What is the problem?

luisj1983 · 2024-01-26T12:47:43Z

> But the new parent node will create tier 1 and 2 from tier 0. What is the problem?

I was describing the original issue with a little more context to help in assessing the priority of the feature.
How you guys achieve it is up to you :)

hugovalente-pm · 2024-01-26T15:41:44Z

@ilyam8, was having a chat with @ktsaou about this and, due to the nature of how streaming works, this is indeed not a straightforward thing. In short, we build tier 1 and tier 2 as we collect data on tier 0.
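To illustrate the point about tiers being built during collection, here is a minimal sketch, not Netdata's actual code. The class names, the aggregate layout (count, sum, min, max) and the aggregation factor are all assumptions made for the example. It shows higher-tier points falling out as a by-product of storing tier 0 samples, which is why a parent that only receives the live tier 0 stream can build its own tier 1 and tier 2 only from the moment streaming starts.

```python
from dataclasses import dataclass, field


@dataclass
class Aggregate:
    """Summary of one or more samples (hypothetical layout, not Netdata's)."""
    count: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def add(self, other: "Aggregate") -> None:
        """Merge another summary into this one."""
        self.count += other.count
        self.total += other.total
        self.minimum = min(self.minimum, other.minimum)
        self.maximum = max(self.maximum, other.maximum)


@dataclass
class Tier:
    """One storage tier; every `points_per_upper_point` stored points
    produce a single aggregated point for the next tier up."""
    points_per_upper_point: int  # assumed factor, chosen by the caller
    points: list = field(default_factory=list)
    pending: Aggregate = field(default_factory=Aggregate)
    pending_points: int = 0

    def store(self, point: Aggregate):
        """Store a point; return a finished aggregate for the next tier, or None."""
        self.points.append(point)
        self.pending.add(point)
        self.pending_points += 1
        if self.pending_points < self.points_per_upper_point:
            return None
        done, self.pending, self.pending_points = self.pending, Aggregate(), 0
        return done


def collect(tiers, value: float) -> None:
    """Store one collected sample in tier 0 and cascade finished aggregates upward."""
    point = Aggregate(1, value, value, value)
    for tier in tiers:
        point = tier.store(point)
        if point is None:
            break
```

With three tiers that each use an assumed factor of 60, collecting 3600 samples through `collect` leaves 3600 points in the first tier, 60 in the second and 1 in the third. Nothing in this sketch can recreate the upper-tier points for samples that were collected before the tiers existed on the receiving side.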

A suggestion that is probably much more efficient, both in cost of operations and in time to complete, is to move the files from the child directly to the parent.
Netdata could provide documentation and an auxiliary script that lets the user specify where they would like to copy the files to and transfers them to the destination over ssh.

@luisj1983 @stelfrag what do you think?

luisj1983 · 2024-01-28T00:20:10Z

@hugovalente-pm

Thanks very much for picking this up!
I suppose it depends on whether the proposal is just something quick and dirty until something more substantial is delivered or not.

You're probably going to need to do some sort of 'offline export option' at some point anyway for the scenario where you have a parent with massive quantities of multi-tiered historical data and need to now replicate all those tiers off to another parent.

Now, as a quick and dirty workaround for me your option isn't a terrible one but from an IT Operations perspective it has a lot of challenges...

In the first place I can't just ssh from a child node to a parent node (or vice versa) without reconfiguring a bunch of security; which would raise red flags from a SecOps team. For some context, I'm currently spending time removing the need to allow any inbound ssh on the local network by using cloudflare tunnels; this would have me doing the opposite.

The second part is that remotely running a script like that on nodes at scale just isn't a nice thing to do. Sure, running stuff isn't hard but handling any failures etc takes work.

Another issue arises if, as an example, I want to use parents for data resilience, that is to say, have an exact replica of data and retention on the parent so that if a child (or the parent) gets nuked then I can reconstitute that data easily (and don't worry, I'm well aware of the separate need for backups). In that scenario I've got to do all of the above, and if something breaks later on then I have to do it all over again. If an organisation has any sort of change control process then that's just a nightmare. There are plenty of orgs where the time from change request to green-light is measured in weeks, and in that time data could easily be lost by falling out of the retention range.

luisj1983 · 2024-04-21T18:13:41Z

Is there any plan around this? I still have a parent node sitting around useless because it is receiving streamed metrics but hasn't got the backfilled ones from the children.

hugovalente-pm · 2024-04-22T08:28:16Z

I think this is not one of the priorities for the @netdata/agent team at the moment. Doesn't the guide that you contributed help with getting that parent into the same "state" as the children? https://learn.netdata.cloud/docs/netdata-agent/backup-and-restore-an-agent

luisj1983 · 2024-04-22T18:08:18Z

@hugovalente-pm
I'm not sure. I think the team was going to get back to me on which files to copy over.
Is it OK to just copy them into:

/var/cache/netdata/

How will the parent handle things like house-keeping for these files? Is that a concern?
Will the parent be able to handle the fact that it already has data for the same node for the same time period but in different files?

hugovalente-pm · 2024-05-03T13:22:23Z

From my understanding it should work; that's why we initially pointed you in that direction.
It's probably better to get someone from @netdata/agent to chime in on this.

hugovalente-pm added the labels "feature request" (New features) and "needs triage" (Issues which need to be manually labelled) on Jan 25, 2024
