Sidecar - Node Crash Monitoring, Logging #4

mambisi · 2022-04-05T22:13:37Z

Sidecar for monitoring the defid process and analyzing its logs

Overview

The sidecar's primary function is to act as a debugging tool: it makes logs and node events easier to read, and it monitors the resource usage of the node process. In the future it can be extended to profile memory and CPU usage in real time with BPF.

Implementation Details

This approach uses the syslog protocol, which wraps each log message in a UDP packet and sends it to a UDP server. A log server running on the sidecar receives each packet, parses it into a struct, and indexes it both in RocksDB storage and in a full-text search engine. The sidecar also supports OpenTelemetry export of logs and events, which is a supported format for AWS CloudTrail for distributed logging.
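As a rough illustration of the "packet into a struct" step, here is a minimal Python sketch. It only decodes the `<PRI>` header that every syslog message starts with (facility * 8 + severity) and keeps the rest as the message body. The `LogRecord` type, the `serve` helper, the port number, and the sample message are assumptions made for illustration, not part of the proposal.

```python
import re
import socket
from dataclasses import dataclass

# Severity names in numeric order, as defined by the syslog specification.
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]

# A syslog datagram starts with "<PRI>", where PRI is a 1-3 digit number.
_PRI = re.compile(rb"^<(\d{1,3})>(.*)$", re.DOTALL)


@dataclass
class LogRecord:
    """Hypothetical struct the log server could index."""
    facility: int
    severity: str
    message: str


def parse_syslog(packet: bytes) -> LogRecord:
    """Split a raw syslog datagram into priority fields and message body."""
    match = _PRI.match(packet)
    if match is None:
        raise ValueError("missing <PRI> header")
    pri = int(match.group(1))
    if pri > 191:  # highest valid value: facility 23, severity 7
        raise ValueError("PRI out of range")
    facility, severity = divmod(pri, 8)
    body = match.group(2).decode("utf-8", errors="replace").strip()
    return LogRecord(facility, SEVERITIES[severity], body)


def serve(host: str = "127.0.0.1", port: int = 5514) -> None:
    """Receive datagrams forever and print the parsed records (port is an assumption)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        while True:
            packet, _addr = sock.recvfrom(65535)
            print(parse_syslog(packet))
```

For example, `parse_syslog(b"<134>defid: example message")` yields facility 16 (`local0`) and severity `info`, since 134 = 16 * 8 + 6. Storing and indexing the record is left out here.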

Storage

RocksDB is used to keep the sidecar as lightweight and customizable as possible, and tantivy is used as the full-text search database.

API

Logs can be looked up and referenced through an API endpoint provided by the log server.

Endpoints

`/logs`

Query Parameters

Name	Type	Description
`offset`	integer	Offset from the last log message if `direction` is `backward`, or from the first log message if `direction` is `forward`
`limit`	integer	Maximum number of log messages to return
`log_category`	enum	Log level
`timestamp`	integer	Timestamp
`direction`	enum	`forward` or `backward`
`query`	string	Text search query; when set, all other parameters except `limit` are disabled
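To make the `offset`, `limit`, and `direction` semantics concrete, the following Python sketch builds a `/logs` request URL and applies the same paging rules to an in-memory list. The base URL, the function names, and the choice to return `backward` pages in chronological order are assumptions; the proposal does not specify them.

```python
from typing import List
from urllib.parse import urlencode


def logs_url(base: str, **params) -> str:
    """Build a /logs request URL from query parameters (base URL is hypothetical)."""
    return f"{base}/logs?{urlencode(params)}"


def page(logs: List[str], offset: int = 0, limit: int = 10,
         direction: str = "backward") -> List[str]:
    """Select one page of log messages.

    forward:  offset counts from the first message.
    backward: offset counts from the last message.
    """
    if direction == "forward":
        return logs[offset:offset + limit]
    if direction == "backward":
        end = len(logs) - offset
        if end <= 0:
            return []
        return logs[max(end - limit, 0):end]
    raise ValueError(f"unknown direction: {direction}")
```

With five messages `a` to `e`, `page(logs, offset=1, limit=2, direction="backward")` skips `e` and returns `c` and `d`.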


prasannavl · 2022-04-06T14:30:56Z

Just leaving some early feedback. We will have to discuss more before this direction is clear. But I do not think using syslog to ship logs to the sidecar is a good idea: we should make it work with zero config, and this involves configuring syslog to push to the sidecar process. Additionally, I don't think a full-fledged search on the sidecar is needed. We can build a separate service for that later.

For now, I think it would be better to watch the log file at its default location, or let the location be configured. That way it can run on servers as well as on regular users' and dev systems.
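Watching the log file needs no configuration on the node side. A minimal Python sketch of the idea, assuming a simple polling loop: the `follow` name and the poll interval are illustrative, and log rotation or truncation is not handled.

```python
import os
import time
from typing import Iterator


def follow(path: str, poll_interval: float = 0.5,
           from_start: bool = False) -> Iterator[str]:
    """Yield complete lines as they are appended to the file at `path`."""
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        if not from_start:
            f.seek(0, os.SEEK_END)  # skip what is already in the file
        buffer = ""
        while True:
            chunk = f.readline()
            if not chunk:
                time.sleep(poll_interval)  # nothing new yet, poll again
                continue
            buffer += chunk
            if buffer.endswith("\n"):  # only emit fully written lines
                yield buffer.rstrip("\n")
                buffer = ""
```

The caller passes either the node's default log path or a user-supplied one, and iterates over `follow(path)` to feed each line into event detection.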

prasannavl · 2022-04-06T14:36:59Z

I like the idea of being able to remotely query just what we need. But I do not think we can do this without the sidecar getting heavier and indexing the logs. If we have an implementation that queries live data in memory (even if it does a page-by-page full scan of the logs at query time; page by page is important, so that the whole set is never held in memory), then we can do querying. Otherwise, I'd rather ship the logs and query them separately.
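A page-by-page scan of this kind could look roughly like the Python sketch below. It reads a fixed number of lines at a time, filters them, and yields the matches before reading the next page, so at most one page is in memory. The function name, the substring match, and the default page size are assumptions.

```python
from itertools import islice
from typing import Iterator, List


def scan_pages(path: str, needle: str, page_size: int = 1000) -> Iterator[List[str]]:
    """Scan the log file one page at a time and yield the matching lines per page."""
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        while True:
            # Read at most page_size lines; the rest of the file stays on disk.
            lines = list(islice(f, page_size))
            if not lines:
                break
            matches = [line.rstrip("\n") for line in lines if needle in line]
            if matches:
                yield matches
```

Because this is a generator, a caller can stop after the first few pages of results without scanning the rest of the file.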

prasannavl · 2022-04-06T14:58:59Z

Adding some more thoughts on the sidecar, summarizing its goals:

Phase 1

Tail logs, trigger actionable events.
Trigger events:
- Start
- Stop
- Restart
- Crashed
- Rollback, chain change.
- Mining started
- Mining expected, but unable to confirm miner active
- Initial block download done
- Synced to the tip
- Stale tip
- Also, a mechanism to trigger each of these events only after X occurrences or within an X time frame window.
Actions:
- Run perf.
- Stop, start, restart
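The "after X times or within X time frame windows" mechanism can be sketched as a sliding-window counter. This Python example is one possible reading of that requirement: the class name and the rule that an action fires once `threshold` occurrences fall inside the window are assumptions.

```python
from collections import deque
from typing import Deque


class WindowTrigger:
    """Fire when an event occurs `threshold` times within `window_seconds`."""

    def __init__(self, threshold: int, window_seconds: float) -> None:
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._seen: Deque[float] = deque()

    def record(self, timestamp: float) -> bool:
        """Record one occurrence; return True if the action should run."""
        self._seen.append(timestamp)
        # Drop occurrences that have fallen out of the window.
        while self._seen and timestamp - self._seen[0] > self.window_seconds:
            self._seen.popleft()
        return len(self._seen) >= self.threshold
```

For instance, `WindowTrigger(threshold=3, window_seconds=60)` attached to the Crashed event would run an action such as Restart or Run perf only on the third crash within a minute.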

Phase 2

Trigger events
- CPU
- RAM
- Disk size
- IO metrics
Actions:
- Clear data.
- Download snapshot
- Auto update node.
- Deploy a node from the CI automatically on a git tag / commit.
- Full node deployment + snapshot and auto-configuration of the node.
Log query capability?
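A resource-based trigger from this phase, using disk size as the example, might be sketched in Python as follows. The function names and the idea of expressing the limit as a fraction of total capacity are assumptions; CPU, RAM, and IO metrics would need platform-specific sources that are not shown.

```python
import shutil


def exceeds(used: int, total: int, max_used_fraction: float) -> bool:
    """Return True if used/total is above the allowed fraction."""
    if total <= 0:
        return False
    return used / total > max_used_fraction


def disk_trigger(path: str, max_used_fraction: float = 0.9) -> bool:
    """Check the filesystem holding `path` (for example, the node's data directory)."""
    usage = shutil.disk_usage(path)
    return exceeds(usage.used, usage.total, max_used_fraction)
```

A `True` result could then feed an action such as Clear data.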

After phase 2, we should just be able to deploy a VM with the sidecar, and just let the sidecar auto configure the node/container into a full fledged defid machine.

Phase 3

Expand sidecar to bitcoin and any other similar CLIs
More log query capability?

mambisi · 2022-04-06T15:20:29Z

Okay, I see the idea now, so it is more about managing the node and automating performance tests than about monitoring.

prasannavl · 2022-04-06T15:30:24Z

We'll add more actions over time, but the focus is on the trigger events. The monitoring can be done by actions that are triggered, with more fine-tuned perf-based monitoring (and in advanced cases, as you mentioned, eBPF, though we likely won't need it), but I do not think we need a lot.

For the rest, we can always just ship the logs and do what we need once we have the triggers, and we can rely on existing cloud monitoring for other regular aspects once we have the trigger points.
