Research: Service Stats on Crashes #364

Closed
1 task
nh758 opened this issue Feb 20, 2024 · 1 comment

nh758 commented Feb 20, 2024

“When the site crashes, I want to see a report of the current/avg CPU and memory usage of the various services.”

Requirements

  1. Research and suggest 1-2 possible approaches

Tasks

  • Write Documentation

Reference

@zachhh3 zachhh3 self-assigned this Feb 26, 2024

zachhh3 commented Feb 27, 2024

Suggestion: Prometheus, an open-source monitoring and alerting toolkit.

  • Real-Time Monitoring: Prometheus provides real-time visibility into the performance metrics of our site, including CPU and memory usage of various services.
  • Alerting and Notification: With Prometheus, we can define alerting rules that trigger notifications when performance metrics exceed predefined thresholds or a service stops responding (e.g. when the site crashes).
  • Historical Analysis: Prometheus stores historical metrics data, allowing us to analyze trends, identify patterns, and perform root cause analysis of performance issues.

Setup:

Instrument the Applications:

  • Instrument web applications with Prometheus client libraries to collect custom metrics. For a Node.js application, you would use the prom-client library.
  • Add instrumentation code to the application to expose relevant metrics.
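
As a rough illustration of what this instrumentation gathers, here is a dependency-free Node.js sketch that samples CPU and memory for the current process using only built-in APIs. The helper names `sampleProcessStats` and `cpuUtilisation` are hypothetical and not part of any library. In practice, prom-client's default metrics would likely cover the same ground, so treat this as a sketch of the idea rather than the recommended implementation.

```javascript
const os = require('node:os');

// Hypothetical helper: take one CPU/memory sample for the current Node.js process.
function sampleProcessStats() {
  const cpu = process.cpuUsage();    // user/system CPU time in microseconds
  const mem = process.memoryUsage(); // sizes in bytes
  return {
    cpuSecondsTotal: (cpu.user + cpu.system) / 1e6,
    residentMemoryBytes: mem.rss,
    heapUsedBytes: mem.heapUsed,
    loadAverage1m: os.loadavg()[0],  // host-wide load; always 0 on Windows
    uptimeSeconds: process.uptime(),
  };
}

// Hypothetical helper: average CPU utilisation (as a fraction of one core)
// between two samples, i.e. CPU seconds consumed per wall-clock second.
function cpuUtilisation(prev, curr) {
  const elapsed = curr.uptimeSeconds - prev.uptimeSeconds;
  if (elapsed <= 0) return 0;
  return (curr.cpuSecondsTotal - prev.cpuSecondsTotal) / elapsed;
}
```

Taking a sample on a timer and comparing consecutive samples gives the "current" figure. Averaging over a longer window gives the "avg" figure the issue asks for.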

Expose Metrics Endpoints:

  • Expose an HTTP endpoint, conventionally /metrics, to serve Prometheus metrics data. This endpoint should return metrics data in a format that Prometheus can scrape (normally the plain-text exposition format; Protocol Buffers support depends on the Prometheus version).
  • Configure the web server or application framework to handle requests to this metrics endpoint.
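
A minimal sketch of such an endpoint, assuming a plain Node.js `http` server and no client library. `renderMetrics`, `collectMetrics`, and `createMetricsServer` are hypothetical names chosen for illustration. The metric names follow the convention commonly used by Prometheus client libraries, but they are an assumption here, not something the services already expose.

```javascript
const http = require('node:http');

// Hypothetical helper: render metrics in the Prometheus text exposition format.
function renderMetrics(metrics) {
  return metrics
    .map(({ name, help, type, value }) =>
      `# HELP ${name} ${help}\n# TYPE ${name} ${type}\n${name} ${value}\n`)
    .join('');
}

// Hypothetical helper: read current CPU and memory figures for this process.
function collectMetrics() {
  const cpu = process.cpuUsage(); // microseconds
  return [
    {
      name: 'process_cpu_seconds_total',
      help: 'Total user and system CPU time in seconds.',
      type: 'counter',
      value: (cpu.user + cpu.system) / 1e6,
    },
    {
      name: 'process_resident_memory_bytes',
      help: 'Resident memory size in bytes.',
      type: 'gauge',
      value: process.memoryUsage().rss,
    },
  ];
}

// Build (but do not start) a server that answers GET /metrics.
function createMetricsServer() {
  return http.createServer((req, res) => {
    if (req.method === 'GET' && req.url === '/metrics') {
      res.writeHead(200, { 'Content-Type': 'text/plain; version=0.0.4; charset=utf-8' });
      res.end(renderMetrics(collectMetrics()));
    } else {
      res.writeHead(404);
      res.end();
    }
  });
}
```

Calling `createMetricsServer().listen(3001)` (the port is arbitrary) would make the data available at `/metrics` on that port for Prometheus to scrape. With prom-client, the library's registry would take the place of the hand-written `renderMetrics`.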

Visualization (Optional):

  • Grafana, a visualization tool that integrates well with Prometheus, could be used to build dashboards from the metrics Prometheus collects.

Alerting:

  • Configure alerting rules in Prometheus to trigger alerts based on performance thresholds or anomalies detected in the collected metrics, for example alerts for high CPU usage, high memory usage, increased error rates, or site crashes. Notifications for firing alerts are routed through Prometheus Alertmanager.
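
A sketch of what such rules might look like in a Prometheus rule file. The group name, alert names, thresholds, and durations are all illustrative assumptions and would need tuning for our services. `up` is the per-target metric Prometheus records on every scrape. The `process_*` metrics assume each service exposes the usual process metrics from its client library.

```yaml
groups:
  - name: service-health            # hypothetical group name
    rules:
      # A target that cannot be scraped is treated as crashed or unreachable.
      - alert: ServiceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "{{ $labels.job }} on {{ $labels.instance }} is not responding to scrapes"

      # Sustained CPU above 90% of one core (threshold is an assumption).
      - alert: HighCpuUsage
        expr: rate(process_cpu_seconds_total[5m]) > 0.9
        for: 5m
        labels:
          severity: warning

      # Resident memory above roughly 1 GB (threshold is an assumption).
      - alert: HighMemoryUsage
        expr: process_resident_memory_bytes > 1e9
        for: 5m
        labels:
          severity: warning
```

Because Prometheus keeps the scraped history, the CPU and memory series leading up to a `ServiceDown` alert remain queryable afterwards, which is what the original request for a crash-time report needs.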

https://prometheus.io/
https://github.com/prometheus

@zachhh3 zachhh3 closed this as completed Mar 25, 2024
@nh758 nh758 mentioned this issue Mar 26, 2024
4 tasks