updated readme

Unacademy · Mar 14, 2022 · e6c2fd1
1 parent 602f487
commit e6c2fd1
Showing 12 changed files with 488 additions and 15 deletions.
diff --git a/.gitignore b/.gitignore
@@ -1,2 +1,3 @@
 .idea/
 .vscode/
+
diff --git a/README.md b/README.md
@@ -1,28 +1,122 @@
 ## Kubernetes Pod Monitor
 
-Kubernetes Pod Monitor gives teams visibility into current and historical pod crashes. This provides immediate alerting that reduces the mean time to detect (MTTD). It also captures the error logs and streams them to Elasticsearch. It is integrated with Slack to notify of failures and send messages with details like the last container state, reason for pod failure along with a direct link to the crash logs stored in Elasticsearch.
+Kubernetes Pod Monitor actively tracks your Kubernetes pods and alerts on container restarts, along with their crash logs, thereby reducing the mean time to detect (MTTD). Features include:
 
-![Sample Elasticsearch Dashboard](getting-started/dashboard.png)
+- Alerting through Slack integration
+- Capturing critical crash logs and storing them in Elasticsearch
+- A history of pod crashes
+- Storing the container state, which gives transparency on pod lifetime and status before termination
+- Kibana visualization for filtering through crashes
+- Ability to configure the Slack channel based on namespace
+- Ability to ignore certain namespaces
+
+
+![Elasticsearch Dashboard](getting-started/dashboard.jpeg)
 
 ## Requirements
 
-- Kubernetes version 1.13 or higher
-- MySQL version 5.7 or higher
-- Elasticsearch version 6.5 or higher
-- [Slack access tokens](https://api.slack.com/authentication/token-types) (optional)
+<a name="requirements"></a>
+The following table lists the minimum requirements for running Kubernetes Pod Monitor.
+
+Tool | Minimum version | Minimum configuration
+--------- | ----------- | -------
+Kubernetes | 1.13 | 100 MB RAM
+MySQL | 5.7 | `-`
+Elasticsearch | 6.5 | 4 GB RAM
+
+To send alerts through the Slack integration, generate access tokens as described at https://api.slack.com/authentication/token-types
 
 ## Getting Started
 
 You can deploy Kubernetes Pod Monitor on any Kubernetes 1.13+ cluster in a matter of minutes, if not seconds.
-- [Apply MySQL migrations](getting-started/sql.md)
-- [Install using the Helm chart](helm-chart/kubernetes-pod-monitor/README.md)
+### Using Helm chart (recommended)
+  - [Apply MySQL migrations](#mysql-migrations)
+  - [Install using the Helm chart](helm-chart/kubernetes-pod-monitor/README.md)
+  - Import the [Kibana dashboard](getting-started/es_saved_objects.json) into Elasticsearch by following https://www.elastic.co/guide/en/kibana/current/managing-saved-objects.html
+
+### Using Docker Compose
+  - Add your Kubernetes configuration file to the `config` directory and update the `CLUSTER_NAME` environment variable in `docker-compose.yml`
+  - Start Docker Compose using:
+
+    ```sh
+    docker-compose up
+    ```
+
+
+## MySQL Migrations
+
+You can run the following queries to create the required database and tables:
+
+```sql
+CREATE DATABASE kubernetes_pod_monitor;
+```
 
-## Usage
+```sql
+CREATE TABLE `k8s_crash_monitor` (
+`clustername` char(64) NOT NULL,
+`namespace` char(64) NOT NULL,
+`podname` char(255) NOT NULL,
+`containername` char(255) NOT NULL,
+`restartcount` int(11) DEFAULT NULL,
+`retries` int(11) DEFAULT NULL,
+`edited_at` int(11) DEFAULT NULL,
+PRIMARY KEY (`clustername`,`namespace`,`podname`,`containername`)
+);
+```
 
-- To send slack notifications to a non-default slack channel based on namespace, add a row in the `k8s_pod_crash_notify` table with `clustername`, `namespace` and `slack_channel`.
-- To ignore slack notifications for a specific namespace in a cluster, add a row in the `k8s_crash_ignore_notify` table with `clustername`, `namespace` and `containername`.
-- Use `k8s_pod_crash` table to create dashboards.
-- An indexed document in Elasticsearch consists of following fields:
+```sql
+CREATE TABLE `k8s_pod_crash` (
+`id` int(11) NOT NULL AUTO_INCREMENT,
+`clustername` varchar(120) NOT NULL,
+`namespace` varchar(120) NOT NULL,
+`containername` varchar(120) NOT NULL,
+`restartcount` int(11) NOT NULL DEFAULT '0',
+`date` datetime(6) DEFAULT NULL,
+PRIMARY KEY (`id`)
+);
+```
+
+```sql
+CREATE TABLE `k8s_pod_crash_notify` (
+`clustername` varchar(255) NOT NULL,
+`namespace` varchar(255) NOT NULL,
+`slack_channel` varchar(255) NOT NULL,
+PRIMARY KEY (`clustername`,`namespace`)
+);
+```
+
+```sql
+CREATE TABLE `k8s_crash_ignore_notify` (
+`clustername` varchar(255) NOT NULL,
+`namespace` varchar(255) NOT NULL,
+`containername` varchar(255) NOT NULL,
+PRIMARY KEY (`clustername`,`namespace`,`containername`)
+);
+```
+
+## Configuring notifications
+
+You can configure Slack notifications using the [notification management utility](scripts/notification_management_utility.py).
+
+The utility has the following minimum requirements:
+- Python v3.6 or higher
+- PyMySQL package to manage MySQL tables: https://pypi.org/project/PyMySQL/
+  ```sh
+  pip3 install PyMySQL
+  ```
+- Tabulate package to render tables: https://pypi.org/project/tabulate/
+  ```sh
+  pip3 install tabulate
+  ```
+
+Run the utility and follow the on-screen steps:
+
+```sh
+python3 scripts/notification_management_utility.py
+```
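The README text removed in this commit described the manual route: add a row to `k8s_pod_crash_notify` (with `clustername`, `namespace` and `slack_channel`) to send a namespace's alerts to a non-default channel, and a row to `k8s_crash_ignore_notify` (with `clustername`, `namespace` and `containername`) to suppress alerts. A minimal sketch of those row operations follows. It is not the utility's actual code: `sqlite3` stands in for MySQL so the example runs without a server, and the helper names (`route_namespace`, `ignore_container`, `channel_for`), the sample values, and the lookup logic are assumptions made for illustration.

```python
import sqlite3

# sqlite3 is only a stand-in so the sketch runs without a MySQL server.
# With PyMySQL, the same statements would use %s placeholders instead of ?.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE k8s_pod_crash_notify (
      clustername   varchar(255) NOT NULL,
      namespace     varchar(255) NOT NULL,
      slack_channel varchar(255) NOT NULL,
      PRIMARY KEY (clustername, namespace)
    );
    CREATE TABLE k8s_crash_ignore_notify (
      clustername   varchar(255) NOT NULL,
      namespace     varchar(255) NOT NULL,
      containername varchar(255) NOT NULL,
      PRIMARY KEY (clustername, namespace, containername)
    );
    """
)


def route_namespace(conn, cluster, namespace, channel):
    """Hypothetical helper: map one namespace to a non-default Slack channel."""
    conn.execute(
        "REPLACE INTO k8s_pod_crash_notify (clustername, namespace, slack_channel) "
        "VALUES (?, ?, ?)",
        (cluster, namespace, channel),
    )


def ignore_container(conn, cluster, namespace, container):
    """Hypothetical helper: add a row to the ignore table."""
    conn.execute(
        "REPLACE INTO k8s_crash_ignore_notify (clustername, namespace, containername) "
        "VALUES (?, ?, ?)",
        (cluster, namespace, container),
    )


def channel_for(conn, cluster, namespace, default_channel):
    """Assumed lookup: use the configured channel, else fall back to the default."""
    row = conn.execute(
        "SELECT slack_channel FROM k8s_pod_crash_notify "
        "WHERE clustername = ? AND namespace = ?",
        (cluster, namespace),
    ).fetchone()
    return row[0] if row else default_channel


# Sample values are made up for the example.
route_namespace(conn, "dev-001", "prometheus", "#prometheus-alerts")
ignore_container(conn, "dev-001", "kube-system", "coredns")
```

`REPLACE INTO` is used because both tables have composite primary keys, so re-running a helper updates the existing row instead of failing on a duplicate key.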
+
+## Sample Elasticsearch document
+An indexed document in Elasticsearch consists of the following fields:
   - `namespace`: Namespace of the crashed pod
   - `pod_name`: Name of the pod that crashed
   - `container_name`: Name of the container that restarted. Helpful in case of multiple containers in a pod
@@ -32,6 +126,34 @@ You can deploy Kubernetes Pod Monitor on any Kubernetes 1.13+ cluster in a matte
   - `restart_count`: Number of times the pod restarted
   - `termination_state`: State of the container with reason, message, started at timestamp and finished at timestamp
 
+```json
+{
+  "_index": "k8s-crash-monitor-2022.03.11",
+  "_type": "_doc",
+  "_id": "Zn3DeH8BpsFVE9gY0heI",
+  "_version": 1,
+  "_score": null,
+  "_source": {
+    "namespace": "prometheus",
+    "pod_name": "prometheus-server-68bf5b8675-bxpq6",
+    "container_name": "prometheus-server",
+    "created_at": 1646998573563,
+    "cluster_name": "dev-001",
+    "logs": "level=error ts=2022-03-11T11:35:53.889Z caller=main.go:723 err=\"opening storage failed: zero-pad torn page: write /data/wal/00000269: no space left on device\"\n",
+    "restart_count": 183,
+    "termination_state": "&ContainerStateTerminated{ExitCode:1,Signal:0,Reason:Error,Message:,StartedAt:2022-03-11 11:35:53 +0000 UTC,FinishedAt:2022-03-11 11:35:53 +0000 UTC,ContainerID:docker://3cc68f0bdff60e4ac3ab494235225af22bfa3efa97ab5ea55464fcb510dbb0f6,}"
+  },
+  "fields": {
+    "created_at": [
+      "2022-03-11T11:36:13.563Z"
+    ]
+  },
+  "sort": [
+    1646998573563
+  ]
+}
+```
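Two fields in the sample document need a little decoding before they are useful in scripts: `created_at` is an epoch timestamp in milliseconds, and `termination_state` is a flat string rather than a nested object. The sketch below shows one way a consumer might handle both. The function names and the regular expressions are assumptions for illustration, and the `termination_state` value is a shortened copy of the sample (the `ContainerID` part is left out).

```python
import re
from datetime import datetime, timedelta, timezone

# Values copied from the sample document; termination_state is shortened.
source = {
    "created_at": 1646998573563,
    "termination_state": (
        "&ContainerStateTerminated{ExitCode:1,Signal:0,Reason:Error,Message:,"
        "StartedAt:2022-03-11 11:35:53 +0000 UTC,"
        "FinishedAt:2022-03-11 11:35:53 +0000 UTC,}"
    ),
}


def created_at_utc(epoch_ms):
    """Convert epoch milliseconds to an aware UTC datetime using integer math."""
    return datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(milliseconds=epoch_ms)


def parse_termination_state(state):
    """Pull the exit code and reason out of the flat state string."""
    exit_code = re.search(r"ExitCode:(-?\d+)", state)
    reason = re.search(r"Reason:([^,]*)", state)
    return {
        "exit_code": int(exit_code.group(1)) if exit_code else None,
        "reason": reason.group(1) if reason else None,
    }


when = created_at_utc(source["created_at"])
state = parse_termination_state(source["termination_state"])
print(when.isoformat(), state)
```

The converted timestamp agrees with the `fields.created_at` value in the sample, `2022-03-11T11:36:13.563Z`.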
+
 
 ## Software stack
 

diff --git a/config/application.yml b/config/application.yml
@@ -3,7 +3,7 @@ DEPLOY_ENV: local
 CLUSTER_NAME: local-cluster
 
 server:
-  port: 80
+  port: 8080
 
 log:
   level: INFO

diff --git a/docker-compose.yml b/docker-compose.yml
@@ -0,0 +1,100 @@
+version: "3.9"
+services:
+  kubernetes-pod-monitor:
+    build: .
+    restart: always
+    ports:
+      - "8080:8080"
+    depends_on:
+      mysql:
+        condition: service_healthy
+      elasticsearch:
+        condition: service_healthy
+      kibana:
+        condition: service_healthy
+    links:
+      - mysql
+      - elasticsearch
+      - kibana
+    environment:
+      - ELASTICSEARCH_URL=http://elasticsearch
+      - ELASTICSEARCH_SCHEME=http
+      - ELASTICSEARCH_PORT=9200
+      - ELASTICSEARCH_V7=true
+      - MAX_CRASHLOG_LENGTH=1000
+      - SQL_HOST=mysql
+      - ELASTICSEARCH_DASHBOARD=http://127.0.0.1:5601/app/dashboards#/view/31fd2fd0-f36e-11ea-bce5-ab00d82ef8ed
+      - CLUSTER_NAME=local-cluster
+      - SLACK_NOTIFY=false
+      # Left empty without quotes: in list form, quotes would become part of the value
+      - SLACK_CHANNEL=
+      - SLACK_TOKEN=
+      # - AWS_ACCESS_KEY_ID=
+      # - AWS_SECRET_ACCESS_KEY=
+  mysql:
+    image: mysql:8.0.28-oracle
+    restart: always
+    ports:
+      - "3306:3306"
+    environment:
+      MYSQL_USER: admin
+      MYSQL_PASSWORD: admin
+      MYSQL_ROOT_PASSWORD: root
+      MYSQL_DATABASE: kubernetes_pod_monitor
+    volumes:
+      - "./scripts/schema.sql:/docker-entrypoint-initdb.d/1.sql"
+    healthcheck:
+      test: mysqladmin ping -h 127.0.0.1 -u $$MYSQL_USER --password=$$MYSQL_PASSWORD
+      interval: 10s
+      timeout: 10s
+      retries: 30
+  elasticsearch:
+    image: elasticsearch:7.17.0
+    restart: always
+    ports:
+      - "9200:9200"
+      - "9300:9300"
+    environment:
+      - discovery.type=single-node
+      - xpack.security.enabled=false
+    healthcheck:
+      test:
+        [
+          "CMD-SHELL",
+          "curl -v http://127.0.0.1:9200",
+        ]
+      interval: 10s
+      timeout: 10s
+      retries: 30
+  kibana:
+    image: kibana:7.17.0
+    restart: always
+    depends_on:
+      elasticsearch:
+        condition: service_healthy
+    ports:
+      - "5601:5601"
+    links:
+      - elasticsearch
+    environment:
+      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
+    healthcheck:
+      test:
+        [
+          "CMD-SHELL",
+          "curl -v http://127.0.0.1:5601",
+        ]
+      interval: 10s
+      timeout: 10s
+      retries: 120
+  dashboard_create_utility:
+    image: curlimages/curl:7.81.0
+    restart: on-failure
+    depends_on:
+      kibana:
+        condition: service_healthy
+    links:
+      - kibana
+    command: sh /create_dashboard.sh
+    volumes:
+      - ./scripts/create_dashboard.sh:/create_dashboard.sh
+      - ./scripts/es_dashboard.ndjson:/es_dashboard.ndjson
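The `healthcheck` and `depends_on: condition: service_healthy` settings in this file make each service wait for its dependencies to respond before it starts. As a rough approximation of that polling behaviour (not Docker's actual implementation), the sketch below retries a check at a fixed interval and gives up after a number of failures. The function names `wait_until_healthy` and `http_reachable` are hypothetical; the defaults mirror the `interval: 10s` and `retries: 30` values used above.

```python
import time
import urllib.error
import urllib.request


def http_reachable(url, timeout=10.0):
    """Return True if the server sends any HTTP response.

    This mirrors `curl -v <url>` without --fail, which also exits 0
    when the server answers with an HTTP error status.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # the server answered, even if with an error status
    except OSError:
        return False  # connection refused, timeout, DNS failure, ...


def wait_until_healthy(check, interval=10.0, retries=30, sleep=time.sleep):
    """Call check() up to `retries` times, sleeping `interval` seconds between failures."""
    for _ in range(retries):
        if check():
            return True
        sleep(interval)
    return False


# Example (needs the compose stack running on localhost):
# wait_until_healthy(lambda: http_reachable("http://127.0.0.1:9200"))
```

Passing `sleep` as a parameter keeps the loop easy to exercise in tests without real delays.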
diff --git a/getting-started/dashboard.jpeg b/getting-started/dashboard.jpeg
diff --git a/getting-started/dashboard.png b/getting-started/dashboard.png