updated readme
Shivam9268 committed Mar 14, 2022
1 parent 602f487 commit e6c2fd1
Showing 12 changed files with 488 additions and 15 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -1,2 +1,3 @@
.idea/
.vscode/

148 changes: 135 additions & 13 deletions README.md
@@ -1,28 +1,122 @@
## Kubernetes Pod Monitor

Kubernetes Pod Monitor gives teams visibility into current and historical pod crashes. This provides immediate alerting that reduces the mean time to detect (MTTD). It also captures the error logs and streams them to Elasticsearch. It is integrated with Slack to notify of failures and send messages with details like the last container state, reason for pod failure along with a direct link to the crash logs stored in Elasticsearch.
Kubernetes Pod Monitor actively tracks your Kubernetes pods and alerts on container restarts, along with their crash logs, thereby decreasing the mean time to detect (MTTD). Features include:

![Sample Elasticsearch Dashboard](getting-started/dashboard.png)
- Alerting through Slack integration
- Capturing critical crash logs and storing them in Elasticsearch
- Historical record of pod crashes
- Storing the container state, which gives transparency on pod lifetime and status before termination
- Kibana visualization for filtering through crashes
- Ability to configure the Slack channel based on namespace
- Ability to ignore certain namespaces


![Elasticsearch Dashboard](getting-started/dashboard.jpeg)

## Requirements

- Kubernetes version 1.13 or higher
- MySQL version 5.7 or higher
- Elasticsearch version 6.5 or higher
- [Slack access tokens](https://api.slack.com/authentication/token-types) (optional)
<a name="requirements"></a>
The following table lists the minimum requirements for running Kubernetes Pod Monitor.

Tool | Minimum version | Minimum configuration
--------- | ----------- | -------
Kubernetes | 1.13 | 100 MB RAM
MySQL | 5.7 | `-`
Elasticsearch | 6.5 | 4 GB RAM

To send alerts via Slack integration, access tokens can be generated here: https://api.slack.com/authentication/token-types

## Getting Started

You can deploy Kubernetes Pod Monitor on any Kubernetes 1.13+ cluster in a matter of minutes, if not seconds.
- [Apply MySQL migrations](getting-started/sql.md)
- [Install using the Helm chart](helm-chart/kubernetes-pod-monitor/README.md)
### Using Helm chart (recommended)
- [Apply MySQL migrations](#mysql-migrations)
- [Install using the Helm chart](helm-chart/kubernetes-pod-monitor/README.md)
- Import [Kibana dashboard](getting-started/es_saved_objects.json) into Elasticsearch by following https://www.elastic.co/guide/en/kibana/current/managing-saved-objects.html

### Using docker compose
- Add your Kubernetes configuration file to the `config` directory and update the `CLUSTER_NAME` environment variable in `docker-compose.yml`
- Start docker compose using:

```sh
docker-compose up
```


## MySQL Migrations

You can run the following queries to create the required database and tables:

```sql
CREATE DATABASE kubernetes_pod_monitor;
```

## Usage
```sql
CREATE TABLE `k8s_crash_monitor` (
`clustername` char(64) NOT NULL,
`namespace` char(64) NOT NULL,
`podname` char(255) NOT NULL,
`containername` char(255) NOT NULL,
`restartcount` int(11) DEFAULT NULL,
`retries` int(11) DEFAULT NULL,
`edited_at` int(11) DEFAULT NULL,
PRIMARY KEY (`clustername`,`namespace`,`podname`,`containername`)
);
```

- To send Slack notifications to a non-default Slack channel based on namespace, add a row in the `k8s_pod_crash_notify` table with `clustername`, `namespace` and `slack_channel`.
- To ignore Slack notifications for a specific namespace in a cluster, add a row in the `k8s_crash_ignore_notify` table with `clustername`, `namespace` and `containername`.
- Use the `k8s_pod_crash` table to create dashboards.
- An indexed document in Elasticsearch consists of the following fields:
```sql
CREATE TABLE `k8s_pod_crash` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`clustername` varchar(120) NOT NULL,
`namespace` varchar(120) NOT NULL,
`containername` varchar(120) NOT NULL,
`restartcount` int(11) NOT NULL DEFAULT '0',
`date` datetime(6) DEFAULT NULL,
PRIMARY KEY (`id`)
);
```

```sql
CREATE TABLE `k8s_pod_crash_notify` (
`clustername` varchar(255) NOT NULL,
`namespace` varchar(255) NOT NULL,
`slack_channel` varchar(255) NOT NULL,
PRIMARY KEY (`clustername`,`namespace`)
);
```

```sql
CREATE TABLE `k8s_crash_ignore_notify` (
`clustername` varchar(255) NOT NULL,
`namespace` varchar(255) NOT NULL,
`containername` varchar(255) NOT NULL,
PRIMARY KEY (`clustername`,`namespace`,`containername`)
);
```
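
As a rough illustration of how rows in the `k8s_pod_crash_notify` and `k8s_crash_ignore_notify` tables might be added by hand, here is a minimal Python sketch. It uses an in-memory SQLite database purely as a stand-in so that it runs without a MySQL server; the cluster, namespace, channel and container values are hypothetical. With PyMySQL against the real database, the `?` placeholders would be written as `%s`.

```python
import sqlite3

# SQLite in-memory database used only as a stand-in for MySQL (assumption);
# table and column names follow the CREATE TABLE statements in the README.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE k8s_pod_crash_notify (
        clustername   varchar(255) NOT NULL,
        namespace     varchar(255) NOT NULL,
        slack_channel varchar(255) NOT NULL,
        PRIMARY KEY (clustername, namespace)
    );
    CREATE TABLE k8s_crash_ignore_notify (
        clustername   varchar(255) NOT NULL,
        namespace     varchar(255) NOT NULL,
        containername varchar(255) NOT NULL,
        PRIMARY KEY (clustername, namespace, containername)
    );
    """
)

# Hypothetical values: send crashes from the "payments" namespace
# to a dedicated channel instead of the default one.
conn.execute(
    "INSERT INTO k8s_pod_crash_notify (clustername, namespace, slack_channel) "
    "VALUES (?, ?, ?)",
    ("local-cluster", "payments", "payments-alerts"),
)

# Hypothetical values: add an ignore entry for the "sandbox" namespace.
conn.execute(
    "INSERT INTO k8s_crash_ignore_notify (clustername, namespace, containername) "
    "VALUES (?, ?, ?)",
    ("local-cluster", "sandbox", "debug-sidecar"),
)
conn.commit()

# Read the rows back to confirm what was stored.
notify_rows = conn.execute(
    "SELECT clustername, namespace, slack_channel FROM k8s_pod_crash_notify"
).fetchall()
ignore_rows = conn.execute(
    "SELECT clustername, namespace, containername FROM k8s_crash_ignore_notify"
).fetchall()
print(notify_rows)
print(ignore_rows)
```

Parameterized queries are used here instead of string formatting so that namespace or channel names containing quotes cannot break the statement.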

## Configuring notifications

You can configure Slack notifications using the [notification management utility](scripts/notification_management_utility.py).

The following lists the minimum requirements for running this utility:
- Python v3.6 or higher
- PyMySQL package to manage MySQL tables: https://pypi.org/project/PyMySQL/
```sh
pip3 install PyMySQL
```
- Tabulate package to render tables: https://pypi.org/project/tabulate/
```sh
pip3 install tabulate
```

Run the utility and follow the onscreen steps:

```sh
python3 scripts/notification_management_utility.py
```

## Sample Elasticsearch document
An indexed document in Elasticsearch consists of the following fields:
- `namespace`: Namespace of the crashed pod
- `pod_name`: Name of the pod that crashed
- `container_name`: Name of the container that restarted; helpful when a pod has multiple containers
@@ -32,6 +126,34 @@ You can deploy Kubernetes Pod Monitor on any Kubernetes 1.13+ cluster in a matte
- `restart_count`: Number of times the pod restarted
- `termination_state`: State of the container with reason, message, started at timestamp and finished at timestamp

```json
{
"_index": "k8s-crash-monitor-2022.03.11",
"_type": "_doc",
"_id": "Zn3DeH8BpsFVE9gY0heI",
"_version": 1,
"_score": null,
"_source": {
"namespace": "prometheus",
"pod_name": "prometheus-server-68bf5b8675-bxpq6",
"container_name": "prometheus-server",
"created_at": 1646998573563,
"cluster_name": "dev-001",
"logs": "level=error ts=2022-03-11T11:35:53.889Z caller=main.go:723 err=\"opening storage failed: zero-pad torn page: write /data/wal/00000269: no space left on device\"\n",
"restart_count": 183,
"termination_state": "&ContainerStateTerminated{ExitCode:1,Signal:0,Reason:Error,Message:,StartedAt:2022-03-11 11:35:53 +0000 UTC,FinishedAt:2022-03-11 11:35:53 +0000 UTC,ContainerID:docker://3cc68f0bdff60e4ac3ab494235225af22bfa3efa97ab5ea55464fcb510dbb0f6,}"
},
"fields": {
"created_at": [
"2022-03-11T11:36:13.563Z"
]
},
"sort": [
1646998573563
]
}
```
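
The `termination_state` field in the sample document is stored as a Go-style struct string, and `created_at` as epoch milliseconds. A consumer of the index might want both in a friendlier form. The following is a minimal sketch of one way to do that; the helper names `parse_termination_state` and `millis_to_utc` are hypothetical and not part of this project, and the parser assumes that no field value contains a comma, which holds for the sample but may not hold for arbitrary `Message` values.

```python
from datetime import datetime, timedelta, timezone


def parse_termination_state(raw: str) -> dict:
    """Split the struct-like string into a dict of string fields.

    Assumes no field value contains a comma (true for the sample document).
    """
    body = raw[raw.index("{") + 1 : raw.rindex("}")]
    fields = {}
    for part in body.split(","):
        if not part:
            continue  # skip the empty piece left by the trailing comma
        # Split on the first colon only; timestamps and URLs contain more.
        key, _, value = part.partition(":")
        fields[key] = value
    return fields


def millis_to_utc(ms: int) -> datetime:
    """Convert epoch milliseconds (as in `created_at`) to an aware UTC datetime."""
    seconds, millis = divmod(ms, 1000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(milliseconds=millis)


# Values copied from the sample document above.
sample_state = (
    "&ContainerStateTerminated{ExitCode:1,Signal:0,Reason:Error,Message:,"
    "StartedAt:2022-03-11 11:35:53 +0000 UTC,"
    "FinishedAt:2022-03-11 11:35:53 +0000 UTC,"
    "ContainerID:docker://3cc68f0bdff60e4ac3ab494235225af22bfa3efa97ab5ea55464fcb510dbb0f6,}"
)

state = parse_termination_state(sample_state)
created = millis_to_utc(1646998573563)

print(state["ExitCode"], state["Reason"])
print(created.isoformat(timespec="milliseconds"))
```

The converted timestamp agrees with the `fields.created_at` value shown in the sample (`2022-03-11T11:36:13.563Z`), written by Python with a `+00:00` offset instead of `Z`.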


## Software stack

2 changes: 1 addition & 1 deletion config/application.yml
@@ -3,7 +3,7 @@ DEPLOY_ENV: local
CLUSTER_NAME: local-cluster

server:
  port: 80
  port: 8080

log:
  level: INFO
100 changes: 100 additions & 0 deletions docker-compose.yml
@@ -0,0 +1,100 @@
version: "3.9"
services:
  kubernetes-pod-monitor:
    build: .
    restart: always
    ports:
      - "8080:8080"
    depends_on:
      mysql:
        condition: service_healthy
      elasticsearch:
        condition: service_healthy
      kibana:
        condition: service_healthy
    links:
      - mysql
      - elasticsearch
      - kibana
    environment:
      - ELASTICSEARCH_URL=http://elasticsearch
      - ELASTICSEARCH_SCHEME=http
      - ELASTICSEARCH_PORT=9200
      - ELASTICSEARCH_V7=true
      - MAX_CRASHLOG_LENGTH=1000
      - SQL_HOST=mysql
      - ELASTICSEARCH_DASHBOARD=http://127.0.0.1:5601/app/dashboards#/view/31fd2fd0-f36e-11ea-bce5-ab00d82ef8ed
      - CLUSTER_NAME=local-cluster
      - SLACK_NOTIFY=false
      - SLACK_CHANNEL=""
      - SLACK_TOKEN=""
      # - AWS_ACCESS_KEY_ID=
      # - AWS_SECRET_ACCESS_KEY=
  mysql:
    image: mysql:8.0.28-oracle
    restart: always
    ports:
      - "3306:3306"
    environment:
      MYSQL_USER: admin
      MYSQL_PASSWORD: admin
      MYSQL_ROOT_PASSWORD: root
      MYSQL_DATABASE: kubernetes_pod_monitor
    volumes:
      - "./scripts/schema.sql:/docker-entrypoint-initdb.d/1.sql"
    healthcheck:
      test: mysqladmin ping -h 127.0.0.1 -u $$MYSQL_USER --password=$$MYSQL_PASSWORD
      interval: 10s
      timeout: 10s
      retries: 30
  elasticsearch:
    image: elasticsearch:7.17.0
    restart: always
    ports:
      - "9200:9200"
      - "9300:9300"
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false
    healthcheck:
      test:
        [
          "CMD-SHELL",
          "curl -v http://127.0.0.1:9200",
        ]
      interval: 10s
      timeout: 10s
      retries: 30
  kibana:
    image: kibana:7.17.0
    restart: always
    depends_on:
      elasticsearch:
        condition: service_healthy
    ports:
      - "5601:5601"
    links:
      - elasticsearch
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
    healthcheck:
      test:
        [
          "CMD-SHELL",
          "curl -v http://127.0.0.1:5601",
        ]
      interval: 10s
      timeout: 10s
      retries: 120
  dashboard_create_utility:
    image: curlimages/curl:7.81.0
    restart: on-failure
    depends_on:
      kibana:
        condition: service_healthy
    links:
      - kibana
    command: sh /create_dashboard.sh
    volumes:
      - ./scripts/create_dashboard.sh:/create_dashboard.sh
      - ./scripts/es_dashboard.ndjson:/es_dashboard.ndjson
Binary file added getting-started/dashboard.jpeg
Binary file removed getting-started/dashboard.png
Binary file not shown.
