Classify & format panic logs correctly #521

Closed
CharlieC3 opened this issue Mar 10, 2024 · 2 comments · Fixed by #524

@CharlieC3
Member

It seems panic logs are not being printed in correct JSON syntax, which may cause issues when our log collectors try to parse them. Additionally, panic logs may not be getting classified correctly as fatal level (a rank above error level). See here for various supported log levels we can use.

{thread 'Chainhook event observer' panicked at 'Unable to register new chainhook spec: Network unknown', "components/chainhook-sdk/src/observer/mod.rs:msg1337"::25"

Ideally all panics would be printed in valid JSON, and be classified as fatal (or something similar to the critical log level referenced in the link above).

Referenced panic, but a fix may be needed for all panics.
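One possible way to get every panic emitted as a single valid JSON line at a fatal level is a custom panic hook. The sketch below uses only the Rust standard library. The field names (`level`, `thread`, `msg`, `location`), the `crit` level string, and the helper names `escape_json`, `panic_to_json`, and `install_json_panic_hook` are assumptions for illustration, not Chainhook's actual logging setup.

```rust
use std::panic;

/// Escape a string so it can be embedded in a JSON string literal.
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters must be \u-escaped in JSON.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build one JSON log line for a panic.
/// The schema here is hypothetical, chosen only for this example.
fn panic_to_json(thread: &str, msg: &str, location: &str) -> String {
    format!(
        "{{\"level\":\"crit\",\"thread\":\"{}\",\"msg\":\"{}\",\"location\":\"{}\"}}",
        escape_json(thread),
        escape_json(msg),
        escape_json(location)
    )
}

/// Install a process-wide hook so every panic is printed as one JSON line.
fn install_json_panic_hook() {
    panic::set_hook(Box::new(|info| {
        let thread = std::thread::current();
        let name = thread.name().unwrap_or("<unnamed>").to_string();
        // Panic payloads are usually &str or String; fall back otherwise.
        let payload = info.payload();
        let msg = if let Some(s) = payload.downcast_ref::<&str>() {
            (*s).to_string()
        } else if let Some(s) = payload.downcast_ref::<String>() {
            s.clone()
        } else {
            "non-string panic payload".to_string()
        };
        let location = info
            .location()
            .map(|l| format!("{}:{}:{}", l.file(), l.line(), l.column()))
            .unwrap_or_else(|| "unknown".to_string());
        eprintln!("{}", panic_to_json(&name, &msg, &location));
    }));
}
```

Calling `install_json_panic_hook()` once at startup, before any threads are spawned, would route panics from every thread through the hook, since the hook is global to the process.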

@MicaiahReid MicaiahReid self-assigned this Mar 10, 2024
@aravindgee
Contributor

Fixing this issue will also unblock Chainhook alerting for thread panics.

@smcclellan smcclellan added this to the Production Reliability milestone Mar 12, 2024
MicaiahReid added a commit that referenced this issue Mar 27, 2024
This PR introduces a few fixes in an effort to improve reliability and
ease debugging when running Chainhook as a service:
- Revisits log levels throughout the tool (fixes #498, fixes #521). The
  general approach for the logs was:
  - `crit` - fatal errors that will crash a mission-critical component of
    Chainhook. In these cases, Chainhook should automatically kill all main
    threads (not individual scanning threads, which is tracked by #404) to
    crash the service.
  - `erro` - something went wrong that could lead to a critical error, or
    that could impact all users
  - `warn` - something went wrong that could impact an end user (usually
    due to user error)
  - `info` - control flow logging and updates on the state of _all_
    registered predicates
  - `debug` - updates on the state of _a_ predicate
- Crashes the service if a mission-critical thread fails (see
  #517 (comment)
  for a list of these threads). Previously, if one of these threads
  failed, the remaining services would keep running. For example, if the
  event observer handler crashed, the event observer API would keep
  running. This means that the Stacks node is successfully emitting blocks
  that Chainhook is acknowledging but not ingesting, which causes gaps in
  our database. Fixes #517
- Removes an infinite loop with bitcoin ingestion, crashing the service
  instead. Fixes #506
- Fixes how we delete predicates from our db when one is deregistered.
  This should reduce the number of logs we have on startup. Fixes #510
- Warns on all reorgs. Fixes #519
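The "crash the service if a mission-critical thread fails" behavior can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the PR: `ExitNotifier`, `spawn_critical`, and `wait_for_first_exit` are hypothetical names, and the approach relies on the default unwinding panic strategy (it would not work with `panic = "abort"`). Each critical thread holds a guard whose `Drop` reports the exit over a channel, so a supervisor learns about both panics and normal returns.

```rust
use std::sync::mpsc::{Receiver, Sender};
use std::thread::{self, JoinHandle};

/// Reports the thread's exit when dropped. Drop also runs while a panic
/// unwinds the stack, so the supervisor hears about panics too.
struct ExitNotifier {
    name: &'static str,
    tx: Sender<(&'static str, bool)>,
}

impl Drop for ExitNotifier {
    fn drop(&mut self) {
        // `thread::panicking()` is true when this drop runs during unwinding.
        let _ = self.tx.send((self.name, thread::panicking()));
    }
}

/// Spawn a named thread that notifies `tx` when it exits for any reason.
fn spawn_critical<F>(
    name: &'static str,
    tx: Sender<(&'static str, bool)>,
    work: F,
) -> JoinHandle<()>
where
    F: FnOnce() + Send + 'static,
{
    thread::Builder::new()
        .name(name.to_string())
        .spawn(move || {
            let _notifier = ExitNotifier { name, tx };
            work();
        })
        .expect("failed to spawn thread")
}

/// Block until any critical thread exits. Returns its name and whether it
/// panicked, or `None` if every sender is gone without a report.
fn wait_for_first_exit(rx: &Receiver<(&'static str, bool)>) -> Option<(&'static str, bool)> {
    rx.recv().ok()
}
```

A supervisor would create the channel with `std::sync::mpsc::channel()`, pass a clone of the sender to each `spawn_critical` call, and, once `wait_for_first_exit` returns, log at the fatal level and call `std::process::exit` with a non-zero code so the remaining threads do not keep running in a half-working state.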
github-actions bot pushed a commit that referenced this issue Mar 27, 2024
## [1.4.0](v1.3.1...v1.4.0) (2024-03-27)

### Features

* detect http / rpc errors as early as possible ([ad78669](ad78669))
* use stacks.rocksdb for predicate scan ([#514](#514)) ([a4f1663](a4f1663)), closes [#513](#513) [#485](#485)

### Bug Fixes

* enable debug logs in release mode ([#537](#537)) ([fb49e28](fb49e28))
* improve error handling, and more! ([#524](#524)) ([86b5c78](86b5c78)), closes [#498](#498) [#521](#521) [#404](#404) [#517 (comment)](https://github.com/hirosystems/chainhook/issues/517#issuecomment-1992135101) [#517](#517) [#506](#506) [#510](#510) [#519](#519)
* log errors on block download failure; implement max retries ([#503](#503)) ([0fc38cb](0fc38cb))
* **metrics:** update latest ingested block on reorg ([#515](#515)) ([8f728f7](8f728f7))
* order and filter blocks used to seed forking block pool ([#534](#534)) ([a11bc1c](a11bc1c))
* seed forking handler with unconfirmed blocks to improve startup stability ([#505](#505)) ([485394e](485394e)), closes [#487](#487)
* skip db consolidation if no new dataset was downloaded ([#513](#513)) ([983a165](983a165))
* update scan status for non-triggering predicates ([#511](#511)) ([9073f42](9073f42)), closes [#498](#498)

🎉 This issue has been resolved in version 1.4.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀
