Sidecar - Node Crash Monitoring, Logging #4

Open
mambisi opened this issue Apr 5, 2022 · 5 comments

Comments

@mambisi
Contributor

mambisi commented Apr 5, 2022

Sidecar for monitoring the defid process and for log analytics

Overview

The sidecar's primary function is to serve as a debugging tool, i.e. to make logs and node events easier to read and to monitor the node process's resource usage. It can be extended in the future to profile memory and CPU usage in real time with BPF.

Implementation Details

This approach uses the syslog protocol, which wraps each log message in a UDP packet and sends it to a UDP server. A log server running on the sidecar takes the packets, parses them into a struct, and indexes them in RocksDB storage and in a full-text search engine. The sidecar supports OpenTelemetry export of logs/events, which is a supported format for AWS CloudTrail, for distributed logging.
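To make the "packet into a struct" step concrete, here is a minimal sketch in Python, not the proposed implementation. It only decodes the standard syslog `<PRI>` prefix (facility = PRI // 8, severity = PRI % 8) and leaves the rest of the message unparsed. The names `LogRecord`, `parse_syslog_datagram`, and `serve`, and the port 5514, are assumptions made for illustration.

```python
import re
import socket
from dataclasses import dataclass

# Matches the "<PRI>" prefix that starts every syslog message.
_PRI_RE = re.compile(r"^<(\d{1,3})>(.*)$", re.DOTALL)


@dataclass
class LogRecord:
    facility: int  # syslog facility code (PRI // 8)
    severity: int  # syslog severity code (PRI % 8); 0 = emergency, 7 = debug
    message: str   # everything after the PRI prefix, left unparsed here


def parse_syslog_datagram(data: bytes) -> LogRecord:
    """Parse one UDP payload into a LogRecord; raise ValueError if malformed."""
    text = data.decode("utf-8", errors="replace").rstrip("\n")
    match = _PRI_RE.match(text)
    if match is None:
        raise ValueError("missing syslog PRI prefix")
    pri = int(match.group(1))
    if pri > 191:  # facility 23 * 8 + severity 7 is the largest valid PRI
        raise ValueError("PRI out of range")
    return LogRecord(facility=pri // 8, severity=pri % 8, message=match.group(2))


def serve(host: str = "127.0.0.1", port: int = 5514, handle=print) -> None:
    """Receive datagrams forever and pass each parsed record to `handle`.

    The host, port, and handler are hypothetical defaults for this sketch.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        while True:
            data, _addr = sock.recvfrom(65535)
            try:
                handle(parse_syslog_datagram(data))
            except ValueError:
                continue  # drop malformed packets
```

In a real log server, `handle` would be where the record is written to storage and to the search index.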

Storage

RocksDB is used to keep the sidecar as lightweight and customizable as possible, and tantivy is used as a full-text search database.

API

Logs can be looked up and referenced through an API endpoint provided by the log server.

Endpoints

/logs

Query Parameters

| Name | Type | Description |
|---|---|---|
| offset | integer | Offset from the last log message if direction is backward, or from the first log message if direction is forward |
| limit | integer | Maximum number of logs to return |
| log_category | enum | Log level |
| timestamp | integer | Timestamp |
| direction | enum | forward or backward |
| query | string | Text search query; all other parameters are disabled except limit |
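How these parameters might interact can be sketched as below. This is an in-memory illustration in Python, not the proposed server code. `LogEntry` and `select_logs` are hypothetical names. The issue does not say how `timestamp` filters, so the sketch assumes an exact match, and it assumes the text search is a case-insensitive substring match.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LogEntry:
    timestamp: int
    log_category: str
    message: str


def select_logs(entries: List[LogEntry], offset: int = 0, limit: int = 100,
                log_category: Optional[str] = None,
                timestamp: Optional[int] = None,
                direction: str = "backward",
                query: Optional[str] = None) -> List[LogEntry]:
    """Select logs from `entries`, which must be sorted oldest first."""
    if query is not None:
        # Text search: every other parameter except `limit` is ignored.
        hits = [e for e in entries if query.lower() in e.message.lower()]
        return hits[:limit]
    if direction not in ("forward", "backward"):
        raise ValueError("direction must be 'forward' or 'backward'")
    rows = [e for e in entries
            if (log_category is None or e.log_category == log_category)
            and (timestamp is None or e.timestamp == timestamp)]  # assumed exact match
    if direction == "backward":
        rows = rows[::-1]  # newest first, so offset counts from the last message
    return rows[offset:offset + limit]
```

For example, `select_logs(entries, limit=2)` returns the two newest entries, while `direction="forward"` with `offset=1` skips the oldest one.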
@prasannavl
Member

Just leaving some early feedback; we will have to discuss more before this direction is clear. I do not think using syslog to ship logs to the sidecar is a good idea: we should make it work with zero config, and this involves configuring syslog to push logs to the sidecar process. Additionally, I don't think full-fledged search on the sidecar is needed. We can build a separate service for that later.

For now, I think it would be better to watch the log file at the default location, or let the location be configured, so it can run on servers as well as on regular users' and dev systems.
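A rough sketch of what watching the log file could look like, using plain polling in Python. The function names are made up for illustration, the log path is left as a parameter because the default location is not stated here, and rotation is handled only in the simplest way (restart from the beginning when the file shrinks).

```python
import os
import time


def read_new_lines(path, position):
    """Return (lines, new_position) for complete lines appended after `position`."""
    size = os.path.getsize(path)
    if size < position:
        position = 0  # file was truncated or rotated; start over
    with open(path, "rb") as f:
        f.seek(position)
        chunk = f.read()
    # Keep only complete lines; a partial trailing line is re-read next time.
    end = chunk.rfind(b"\n") + 1
    lines = chunk[:end].decode("utf-8", errors="replace").splitlines()
    return lines, position + end


def follow(path, poll_seconds=1.0):
    """Yield lines as they are appended to `path`, similar to `tail -f`."""
    position = os.path.getsize(path)  # start at the end of the file
    while True:
        lines, position = read_new_lines(path, position)
        for line in lines:
            yield line
        if not lines:
            time.sleep(poll_seconds)
```

Polling keeps this zero-config and portable; a file-notification API could replace the sleep later without changing `read_new_lines`.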

@prasannavl
Member

I like the idea of being able to remotely query just what we need. But I do not think we can do this without the sidecar getting heavier and indexing the logs. If we have an implementation that queries live data in memory, then we can do querying, even if it does a full scan of the logs at the time of the query. The scan must go page by page, though, so the whole set is never held in memory. Otherwise, I'd rather ship the logs and query them separately.
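To illustrate the page-by-page idea, a minimal Python sketch under my own assumptions: `scan_pages` is a hypothetical name, the query is a plain substring match, and a "page" is simply a fixed number of lines. At most one page of lines is held in memory at a time.

```python
from itertools import islice


def scan_pages(path, needle, page_size=1000):
    """Yield lists of matching lines, reading at most `page_size` lines at a time."""
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        while True:
            page = list(islice(f, page_size))  # next page, or [] at end of file
            if not page:
                return
            matches = [line.rstrip("\n") for line in page if needle in line]
            if matches:
                yield matches
```

Because it is a generator, a caller can stop after the first few pages of hits without scanning the rest of the file.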

@prasannavl
Member

prasannavl commented Apr 6, 2022

Adding some more thoughts on the sidecar, summarizing its goals:

Phase 1

  • Tail logs, trigger actionable events.
  • Trigger events:
    • Start
    • Stop
    • Restart
    • Crashed
    • Rollback, chain change.
    • Mining started
    • Mining expected, but unable to confirm miner active
    • Initial block download done
    • Synced to the tip
    • Stale tip
    • And also a mechanism to trigger each of these events only after X occurrences, or X occurrences within a given time window.
  • Actions:
    • Run perf.
    • Stop, start, restart
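The "after X times or within a time window" mechanism from the list above could be sketched as a sliding-window counter. This is only an illustration in Python; `WindowedTrigger` is a hypothetical name, and resetting the counter after firing is my own assumption about the desired behavior.

```python
from collections import deque


class WindowedTrigger:
    """Fire when an event occurs `threshold` times within `window_seconds`."""

    def __init__(self, threshold, window_seconds):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._times = deque()  # timestamps of recent occurrences, oldest first

    def record(self, now):
        """Record one occurrence at time `now` (seconds); return True if it fires."""
        self._times.append(now)
        # Drop occurrences that have fallen out of the window.
        while self._times and now - self._times[0] > self.window_seconds:
            self._times.popleft()
        if len(self._times) >= self.threshold:
            self._times.clear()  # assumed: start counting afresh after firing
            return True
        return False
```

For instance, `WindowedTrigger(3, 60)` attached to the "Crashed" event would fire an action such as "Run perf" only on the third crash within a minute.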

Phase 2

  • Trigger events
    • CPU
    • RAM
    • Disk size
    • IO metrics
  • Actions:
    • Clear data.
    • Download snapshot
    • Auto update node.
    • Deploy a node from the CI automatically on a git tag / commit.
    • Full node deployment + snapshot and auto-configuration of the node.
  • Log query capability?
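The Phase 2 resource triggers above amount to comparing sampled metrics against limits. A small Python sketch, with hypothetical names (`disk_used_fraction`, `check_thresholds`) and made-up limit values; only disk usage is sampled here because the standard library provides it directly via `shutil.disk_usage`.

```python
import shutil


def disk_used_fraction(path):
    """Return the used fraction (0.0 to 1.0) of the filesystem containing `path`."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def check_thresholds(metrics, limits):
    """Return the sorted names of metrics whose value exceeds its limit."""
    return sorted(name for name, value in metrics.items()
                  if name in limits and value > limits[name])
```

A caller might run `check_thresholds({"disk": disk_used_fraction("/")}, {"disk": 0.9})` on a timer and map each returned name to an action such as "Clear data".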

After phase 2, we should be able to deploy a VM with the sidecar and let the sidecar auto-configure the node/container into a full-fledged defid machine.

Phase 3

  • Expand sidecar to bitcoin and any other similar CLIs
  • More log query capability?

@mambisi
Contributor Author

mambisi commented Apr 6, 2022

Okay, I see the idea now, so it's more about managing the node and automating performance tests than about monitoring.

@prasannavl
Member

prasannavl commented Apr 6, 2022

We'll add more actions over time, but built on the trigger events. Monitoring can itself be an action that gets triggered, with more fine-tuned perf-based monitoring (and in advanced cases, as you mentioned, eBPF, though we likely won't need it), but I do not think we need a lot.

For the rest, we can always just ship the logs and do what we need once we have the trigger points, and we can rely on existing cloud monitoring for other regular aspects.


2 participants