
Plugin becomes CPU hog on larger nodes #86

Open
svewa opened this issue Mar 10, 2022 · 9 comments · Fixed by #88
Labels
metric_one Referring to the metric_one of the ln-metrics-rfc · performance ⚡ Performance issue

Comments


svewa commented Mar 10, 2022

I'm running a node with ~180 channels. Normally my clightning process uses about 1% CPU. If I start running the metrics collector, that changes to ~150% (split roughly equally between the plugin and lightningd). Also (obviously?) interacting with lightningd becomes horribly slow: listpeers etc. takes ~7s with the plugin running and 0.03s without.

Sadly that makes it unusable for me.

@vincenzopalazzo added the metric_one and performance ⚡ labels Mar 10, 2022
@vincenzopalazzo vincenzopalazzo self-assigned this Mar 10, 2022

svewa commented Mar 10, 2022

metrics.log is at https://nopaste.net/mO6ckGOgZp


svewa commented Apr 3, 2022

Tried the new version. While it got much better, it's still consuming ~65% CPU constantly for the go-lnmetrics process alone.

vincenzopalazzo (Member) commented:

> Tried the new version. While it got much better, it's still consuming ~65% CPU constantly for the go-lnmetrics process alone.

I'm reopening this because I think the bottleneck is now the list of forwarded payments. Do you have a lot of those?


svewa commented Apr 4, 2022

While testing yesterday, over 24h: 40 forwards, 430 local fails, 2700 non-local fails; on busier days 4x as much.

If you poll the whole listforwards, there are ~400k elements in the forwards array.

vincenzopalazzo (Member) commented:

> if you poll the whole listforwards, it is ~400k elements in the forwards-array.

Here we go! We found the bottleneck, so this is something we need to fix on the Core Lightning side!

I will look into it!


svewa commented Apr 4, 2022

Could the plugin not just hook into any forwards happening while it's running? Of course it would only see the forwards that happen while it is running, but that seems reasonable.
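The suggestion above amounts to subscribing to Core Lightning's forward_event notification and tallying events as they arrive, instead of re-reading the whole listforwards array on every poll. A minimal sketch of the tallying half (the forwardEvent struct and tally function are illustrative names, not part of go-lnmetrics; only the status field of the real notification payload is assumed here):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// forwardEvent mirrors the one field of a forward_event notification
// payload that this tally needs.
type forwardEvent struct {
	Status string `json:"status"` // e.g. "settled", "failed", "local_failed"
}

// tally counts forwards by status as notifications arrive, so the
// plugin never has to walk the full ~400k-element forwards array.
func tally(counts map[string]int, raw []byte) error {
	var ev forwardEvent
	if err := json.Unmarshal(raw, &ev); err != nil {
		return err
	}
	counts[ev.Status]++
	return nil
}

func main() {
	counts := map[string]int{}
	for _, raw := range [][]byte{
		[]byte(`{"status":"settled"}`),
		[]byte(`{"status":"local_failed"}`),
		[]byte(`{"status":"settled"}`),
	} {
		if err := tally(counts, raw); err != nil {
			panic(err)
		}
	}
	fmt.Println(counts["settled"], counts["local_failed"]) // 2 1
}
```

As the next comment points out, the drawback is that events arriving while the plugin is down are simply lost.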


vincenzopalazzo commented Apr 4, 2022

> could the plugin not just hook into any forwards happening while it's running? of course, it would only get those forwards that happen while it's running, but this seems to be reasonable.

And if you are not running the plugin? You will break the metrics.

This introduces a lot of work to keep everything in sync, and I'm not going to start looking into it!

c-lightning needs to provide the forward payments since a given timestamp; all the other solutions are only hacks, very bad hacks!


svewa commented Apr 4, 2022

If you don't run the metrics plugin properly, you won't get proper metrics. Sounds reasonable to me. There is important-plugin for that, too. But well, your choice obviously.

Another thing: if the plugin works on historical data anyway, and the forwards are timestamped, why does it constantly poll them? Why not sleep 100x as long as the last poll took, so CPU consumption stays under 1%? It would still be undesirable if the node became unresponsive while the data is being fetched and processed, of course. No idea whether this can be done in parallel.

Of course, proper filtering and pagination would also solve this, and a few other problems.


vincenzopalazzo commented Apr 4, 2022

> Of course proper filtering and pagination would also solve this - and a few other problems.

Filtering and pagination work by timestamps if you want to iterate over something.

> if you don't run the metrics plugin properly you won't get proper metrics. Sounds reasonable to me. There is important-plugin for that, too. But well, your choice obviously.

How can you get the metrics for the last 30 days if the plugin was not started? What does your "run the metrics plugin properly" look like? If there is a bug in the software, you invalidate the whole metrics collection, and with this metrics architecture I cannot accept that, because this is a metrics collection service, not a simple plugin that iterates in a fancy way over listforwards.
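The timestamp-based filtering both sides are asking Core Lightning for would look roughly like this on the consumer side: keep the timestamp of the last forward already processed and take only newer entries. A client-side sketch (the forward struct and since function are illustrative; listforwards entries do carry a received_time timestamp):

```go
package main

import "fmt"

// forward holds the one listforwards field this sketch filters on.
type forward struct {
	ReceivedTime float64 // received_time of the forward, seconds since epoch
}

// since returns only the forwards received at or after ts, which is
// the filter the plugin would ideally get server-side instead of
// re-downloading all ~400k entries every poll.
func since(forwards []forward, ts float64) []forward {
	var out []forward
	for _, f := range forwards {
		if f.ReceivedTime >= ts {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	fwds := []forward{{100}, {200}, {300}}
	fmt.Println(len(since(fwds, 150))) // 2
}
```

Doing this client-side still pays the cost of transferring the whole array over the RPC socket, which is why the fix belongs on the Core Lightning side.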
