This project provides a simple, script-based setup for a monitoring stack using Prometheus, Grafana, Node Exporter, Alertmanager, and PagerDuty. Each component is managed by a dedicated shell script, making it easy to deploy and run on a Unix-like system. The stack enables not only monitoring and visualization, but also alerting and incident management.
Purpose:
Alertmanager is a core component of the Prometheus ecosystem, responsible for handling alerts sent by the Prometheus server. It deduplicates, groups, inhibits, and silences alerts, and routes notifications to receivers such as email, Slack, or PagerDuty.
How it fits in:
- Prometheus is configured to send alerts to Alertmanager based on alerting rules.
- Alertmanager processes these alerts and sends notifications to the configured receivers.
Typical Workflow:
- Prometheus detects an issue (e.g., high CPU usage) based on alerting rules.
- Prometheus sends the alert to Alertmanager.
- Alertmanager groups, deduplicates, and routes the alert to the appropriate notification channel.
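To make the first step of this workflow concrete, here is a minimal sketch of a Prometheus alerting rule for high CPU usage, written to a file from the shell. The file name, group name, alert name, 80% threshold, and 5 minute window are illustrative assumptions, not values taken from this project.

```shell
#!/bin/sh
# Sketch: write an example Prometheus alerting rule file.
# Names and thresholds below are assumptions chosen for illustration.

write_cpu_alert_rule() {
  rules_file="$1"
  # The quoted delimiter ('EOF') stops the shell from expanding $labels.
  cat > "$rules_file" <<'EOF'
groups:
  - name: node_alerts
    rules:
      - alert: HighCpuUsage
        # Non-idle CPU percentage per instance, averaged over 5 minutes.
        expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
EOF
}
```

Calling `write_cpu_alert_rule alert_rules.yml` produces a file that can be listed under `rule_files:` in prometheus.yml. If `promtool` is available, `promtool check rules alert_rules.yml` should validate it before Prometheus is restarted.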
Configuration:
- Alertmanager is usually configured via a YAML file (alertmanager.yml) specifying receivers and routing logic.
- Receivers can include email, chat, or incident management platforms like PagerDuty.
Purpose:
PagerDuty is a popular incident management platform that provides real-time alerting, on-call scheduling, and escalation policies. Integrating PagerDuty with Alertmanager allows critical alerts to trigger incidents and notify the right people immediately.
How it fits in:
- Alertmanager is configured with a PagerDuty receiver using a service integration key.
- When a critical alert is fired, Alertmanager sends a notification to PagerDuty.
- PagerDuty creates an incident and notifies the on-call engineer via SMS, phone, email, or mobile push.
Configuration Steps:
- Create a service in PagerDuty and obtain the integration key (Events API v2).
- Add a PagerDuty receiver in alertmanager.yml:
    receivers:
      - name: 'pagerduty'
        pagerduty_configs:
          - routing_key: <YOUR_PAGERDUTY_INTEGRATION_KEY>
- Set up routing in Alertmanager to send critical alerts to the PagerDuty receiver.
- Ensure Prometheus alerting rules are defined for the conditions you want to be paged for.
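The receiver and routing steps can be combined into one small config. The sketch below writes an alertmanager.yml that sends alerts labelled `severity="critical"` to PagerDuty and everything else to a receiver with no notifiers. The `default` receiver name, the `group_by` labels, and passing the key as a function argument are assumptions. The `matchers` syntax needs Alertmanager 0.22 or newer.

```shell
#!/bin/sh
# Sketch: generate alertmanager.yml with a PagerDuty route for critical alerts.
# Receiver names and grouping labels are illustrative assumptions.

write_alertmanager_config() {
  config_file="$1"
  routing_key="$2"   # PagerDuty Events API v2 integration key
  cat > "$config_file" <<EOF
route:
  receiver: default
  group_by: ['alertname', 'instance']
  routes:
    - matchers:
        - severity="critical"
      receiver: pagerduty

receivers:
  - name: default
  - name: pagerduty
    pagerduty_configs:
      - routing_key: ${routing_key}
EOF
  # The file now contains a secret, so restrict who can read it.
  chmod 600 "$config_file"
}
```

For example, `write_alertmanager_config alertmanager.yml "$PAGERDUTY_ROUTING_KEY"` reads the key from an environment variable (a hypothetical name) so it stays out of shell history. If `amtool` is installed, `amtool check-config alertmanager.yml` can validate the result.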
Benefits:
- Automated, reliable alert delivery to the right people.
- Escalation and on-call management.
- Incident tracking and resolution workflows.
Purpose:
Node Exporter is an open-source tool that exposes a wide variety of hardware- and kernel-related metrics (CPU, memory, disk, network, etc.) from your system. These metrics are made available via an HTTP endpoint, which Prometheus can scrape for monitoring and alerting.
How the script works:
- Download: The script fetches the latest Node Exporter binary from the official source.
- Extract: It unpacks the downloaded archive.
- Run: Node Exporter is started, typically listening on port 9100.
- Metrics Endpoint: Once running, metrics are available at http://localhost:9100/metrics.
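The download, extract, and run steps might look roughly like the sketch below. It is not the contents of node.sh. It pins an example version rather than resolving the latest release, and the function names are hypothetical. The URL pattern follows the GitHub release layout of the node_exporter project.

```shell
#!/bin/sh
# Sketch of the download / extract / run steps (not the actual node.sh).
# The default version is an assumed example; override it via the environment.
NODE_EXPORTER_VERSION="${NODE_EXPORTER_VERSION:-1.8.2}"

# Build the release archive URL for a given version and CPU architecture.
node_exporter_url() {
  version="$1"
  arch="$2"
  echo "https://github.com/prometheus/node_exporter/releases/download/v${version}/node_exporter-${version}.linux-${arch}.tar.gz"
}

# Download the archive, unpack it, and start Node Exporter in the background.
install_and_run_node_exporter() {
  version="$1"
  arch="$2"
  archive="node_exporter-${version}.linux-${arch}.tar.gz"
  wget -q "$(node_exporter_url "$version" "$arch")" -O "$archive" || return 1
  tar -xzf "$archive" || return 1
  "./node_exporter-${version}.linux-${arch}/node_exporter" --web.listen-address=":9100" &
}
```

Running `install_and_run_node_exporter "$NODE_EXPORTER_VERSION" amd64` and then `curl -s http://localhost:9100/metrics | head` is a quick way to confirm the endpoint responds.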
Usage:
./node.sh
What you get:
- Real-time system metrics accessible to Prometheus.
- No need for manual installation or configuration—everything is handled by the script.
Prometheus
- Purpose: Prometheus is a powerful open-source monitoring and alerting toolkit. It scrapes metrics from Node Exporter and stores them in a time-series database, allowing for querying and alerting.
- How the script works:
  - Downloads the Prometheus binary.
  - Extracts and runs Prometheus, usually on port 9090.
  - The script configures Prometheus to scrape metrics from Node Exporter (on port 9100).
- Access: Prometheus UI is available at http://localhost:9090.
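As an illustration of the scrape configuration described above, this sketch writes a minimal prometheus.yml that scrapes Node Exporter on port 9100 and forwards alerts to Alertmanager on its default port 9093. The job name, the 15 second interval, and the function name are assumptions. The file the project script actually generates may differ.

```shell
#!/bin/sh
# Sketch: generate a minimal prometheus.yml.
# Job name and scrape interval are illustrative assumptions.

write_prometheus_config() {
  config_file="$1"
  cat > "$config_file" <<'EOF'
global:
  scrape_interval: 15s

# Where Prometheus sends firing alerts.
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']

# What Prometheus scrapes.
scrape_configs:
  - job_name: node
    static_configs:
      - targets: ['localhost:9100']
EOF
}
```

After `write_prometheus_config prometheus.yml`, Prometheus can be started with `./prometheus --config.file=prometheus.yml`. The Node Exporter target should then appear as "UP" on the Targets page of the UI.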
Grafana
- Purpose: Grafana is an open-source analytics and visualization platform. It connects to Prometheus and provides dashboards for visualizing metrics.
- How the script works:
  - Downloads the Grafana binary.
  - Extracts and runs Grafana, usually on port 3000.
  - Grafana can be accessed at http://localhost:3000 (default login: admin/admin).
- Setup:
  - After starting Grafana, add Prometheus as a data source.
  - Import or create dashboards to visualize your system metrics.
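Adding the data source can also be done without the UI, through Grafana's file-based provisioning. The sketch below writes a data source definition pointing at the local Prometheus. The function name and output file name are hypothetical. For a tarball install, the provisioning directory is typically conf/provisioning/datasources under the Grafana home directory, but that path is an assumption about this setup.

```shell
#!/bin/sh
# Sketch: provision Prometheus as Grafana's default data source.
# The target directory depends on how Grafana was installed (assumption).

write_grafana_datasource() {
  provisioning_dir="$1"
  mkdir -p "$provisioning_dir" || return 1
  cat > "${provisioning_dir}/prometheus.yml" <<'EOF'
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
    isDefault: true
EOF
}
```

Grafana reads provisioning files at startup, so run something like `write_grafana_datasource conf/provisioning/datasources` from the Grafana directory before starting (or restarting) the server.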
Usage:
./grafna-promethous.sh
What you get:
- A running Prometheus server scraping metrics from Node Exporter.
- A running Grafana server ready for dashboard creation and visualization.
- Make scripts executable (if needed):
  chmod +x node.sh grafna-promethous.sh
- Start Node Exporter:
  ./node.sh
- Start Prometheus and Grafana:
  ./grafna-promethous.sh
- Access the services:
  - Node Exporter: http://localhost:9100/metrics
  - Prometheus: http://localhost:9090
  - Grafana: http://localhost:3000
  - Alertmanager: http://localhost:9093 (default port)
- Configure Alertmanager and PagerDuty:
  - Edit alertmanager.yml to add your PagerDuty integration key and desired routing.
  - Ensure Prometheus is configured to send alerts to Alertmanager (see prometheus.yml).
  - Test alerting by triggering a sample alert and verifying PagerDuty receives the incident.
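One way to trigger a sample alert without waiting for a real condition is to post one directly to Alertmanager's v2 HTTP API. The sketch below assumes Alertmanager is listening on its default port and that your routing sends `severity="critical"` alerts to PagerDuty. The function names and the alert name are hypothetical.

```shell
#!/bin/sh
# Sketch: push a synthetic alert straight into Alertmanager.
# ALERTMANAGER_URL defaults to the standard local port (assumption).
ALERTMANAGER_URL="${ALERTMANAGER_URL:-http://localhost:9093}"

# Build the JSON body: a list containing one alert with two labels.
test_alert_payload() {
  printf '[{"labels":{"alertname":"%s","severity":"critical"}}]' "$1"
}

# POST the payload to the v2 alerts endpoint.
send_test_alert() {
  test_alert_payload "$1" | curl -sS -X POST \
    -H 'Content-Type: application/json' \
    --data-binary @- \
    "${ALERTMANAGER_URL}/api/v2/alerts"
}
```

Running `send_test_alert PagerDutyTest` should make the alert visible in the Alertmanager UI and, after the route's group wait elapses, open an incident in PagerDuty. Resolve the incident manually afterwards.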
- Unix-like OS (Linux, macOS, or WSL on Windows)
- wget and tar installed
- Sufficient permissions to run scripts and install binaries
- If ports 9100, 9090, or 3000 are in use, stop the conflicting services or change the ports in the scripts.
- For persistent monitoring, consider running the scripts in the background or as system services.
- Grafana default login is admin/admin. Change the password after first login.
- You can import community dashboards in Grafana for quick visualization.
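For the "run as a system service" note, a systemd unit is one common option on Linux. The sketch below generates a unit file for Node Exporter. The unit contents, function name, and binary path are assumptions about a typical setup, not something this project ships.

```shell
#!/bin/sh
# Sketch: generate a systemd unit for Node Exporter.
# The binary path passed in is an assumption about where it was extracted.

write_node_exporter_unit() {
  unit_file="$1"
  binary_path="$2"
  cat > "$unit_file" <<EOF
[Unit]
Description=Prometheus Node Exporter
After=network-online.target
Wants=network-online.target

[Service]
ExecStart=${binary_path} --web.listen-address=:9100
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}
```

For example, write the unit with `write_node_exporter_unit node_exporter.service /opt/node_exporter/node_exporter`, copy it to /etc/systemd/system/ with sudo, then run `sudo systemctl daemon-reload` and `sudo systemctl enable --now node_exporter`. The same pattern would apply to Prometheus, Grafana, and Alertmanager.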
sudo apt-get install stress-ng -y   # Ubuntu/Debian
stress-ng --cpu 2 --timeout 300
- For CloudWatch:
SELECT AVG(CPUUtilization) FROM "AWS/EC2"
- For Prometheus:
node_cpu_seconds_total{cpu="1"}
MIT License (or specify your license here)