# 3 start monitoring & logging on Azure AKS

Set `${PJ_ROOT}` to the root directory of your local checkout of this repository.

In [None]:
export PJ_ROOT="${HOME}/core"
cd ${PJ_ROOT};pwd

example)
```
/Users/user/roboticbase-core
```

## load environment variables

In [None]:
source ${PJ_ROOT}/docs/azure_aks/env

## start fiware cygnus for elasticsearch

In [None]:
kubectl apply -f cygnus/cygnus-elasticsearch.yaml

In [None]:
kubectl get pods -l app=cygnus-elasticsearch

example)
```
NAME                                    READY   STATUS    RESTARTS   AGE
cygnus-elasticsearch-689b7f5fd8-dtptx   1/1     Running   0          36s
cygnus-elasticsearch-689b7f5fd8-wj5vm   1/1     Running   0          36s
cygnus-elasticsearch-689b7f5fd8-xnhhj   1/1     Running   0          36s
```
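Rather than re-running `kubectl get pods` until every pod shows `Running`, the check can be scripted. The following is a minimal sketch, assuming a kubectl version that provides `kubectl wait`. The function name `wait_for_pods`, the `default` namespace fallback, and the `120s` timeout are illustrative assumptions, not part of this repository.

```shell
# Hypothetical helper (not part of this repository): block until every pod
# matching a label selector reports the Ready condition.
# Usage: wait_for_pods <label-selector> [namespace] [timeout]
wait_for_pods() {
  selector="$1"
  namespace="${2:-default}"   # assumed fallback; pass the namespace explicitly if unsure
  timeout="${3:-120s}"        # assumed default; tune for your cluster
  kubectl wait --for=condition=Ready pod \
    -l "${selector}" --namespace "${namespace}" --timeout="${timeout}"
}
```

For the pods above this would be called as `wait_for_pods app=cygnus-elasticsearch`.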

In [None]:
kubectl get services -l app=cygnus-elasticsearch

example)
```
NAME                   TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)             AGE
cygnus-elasticsearch   ClusterIP   10.0.93.83   <none>        5050/TCP,8081/TCP   1m
```

## start prometheus & grafana

### install coreos/prometheus-operator

In [None]:
helm repo add coreos https://s3-eu-west-1.amazonaws.com/coreos-charts/stable/
helm install coreos/prometheus-operator --name po --namespace monitoring

In [None]:
kubectl --namespace monitoring get pods -l "app=prometheus-operator,release=po"

example)
```
NAME                                      READY     STATUS    RESTARTS   AGE
po-prometheus-operator-7f75b4645b-xznff   1/1       Running   0          3m
```

### install coreos/kube-prometheus

In [None]:
helm install coreos/kube-prometheus --name kp --namespace monitoring -f monitoring/kube-prometheus-azure.yaml

In [None]:
kubectl get daemonsets --namespace monitoring

example)
```
NAME               DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
kp-exporter-node   4         4         4       4            4           <none>          11s
```

In [None]:
kubectl get deployments --namespace monitoring

example)
```
NAME                     DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
kp-exporter-kube-state   1         1         1            1           54s
kp-grafana               1         1         1            1           54s
po-prometheus-operator   1         1         1            1           3m
```

In [None]:
kubectl get statefulsets --namespace monitoring

example)
```
NAME                       DESIRED   CURRENT   AGE
alertmanager-kp            1         1         1m
prometheus-kp-prometheus   1         1         1m
```

In [None]:
kubectl get pods --namespace monitoring

example)
```
NAME                                     READY     STATUS    RESTARTS   AGE
alertmanager-kp-0                        2/2       Running   0          2m
kp-exporter-kube-state-89bc454b9-m75pz   2/2       Running   0          1m
kp-exporter-node-cvsxc                   1/1       Running   0          2m
kp-exporter-node-m888f                   1/1       Running   0          2m
kp-exporter-node-rldvr                   1/1       Running   0          2m
kp-grafana-74dff5b954-b8kvr              2/2       Running   0          2m
po-prometheus-operator-78c74bd9f-4tdht   1/1       Running   0          4m
prometheus-kp-prometheus-0               3/3       Running   1          2m
```
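With this many pods, scanning the `STATUS` column by eye is error-prone. The sketch below filters the listing down to pods that are not `Running`. The helper name `pods_not_running` is hypothetical, and it assumes the default column layout shown above (name in column 1, status in column 3).

```shell
# Hypothetical helper: read `kubectl get pods` output on stdin and print the
# name of every pod whose STATUS column is not "Running".
pods_not_running() {
  # NR > 1 skips the header line; $3 is the STATUS column.
  awk 'NR > 1 && $3 != "Running" { print $1 }'
}

# Usage against a live cluster (requires cluster access):
#   kubectl get pods --namespace monitoring | pods_not_running
```

Empty output means every listed pod is `Running`. Pods that finished normally (status `Completed`) would also be printed, so treat the result as a hint rather than a failure report.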

In [None]:
kubectl get services --namespace monitoring

example)
```
NAME                     TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)             AGE
alertmanager-operated    ClusterIP   None           <none>        9093/TCP,6783/TCP   2m
kp-alertmanager          ClusterIP   10.0.220.192   <none>        9093/TCP            2m
kp-exporter-kube-state   ClusterIP   10.0.116.154   <none>        80/TCP              2m
kp-exporter-node         ClusterIP   10.0.173.208   <none>        9100/TCP            2m
kp-grafana               ClusterIP   10.0.7.29      <none>        80/TCP              2m
kp-prometheus            ClusterIP   10.0.137.247   <none>        9090/TCP            2m
prometheus-operated      ClusterIP   None           <none>        9090/TCP            2m
```

In [None]:
kubectl get persistentvolumeclaims --namespace monitoring

example)
```
NAME                                                     STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
alertmanager-kp-db-alertmanager-kp-0                     Bound     pvc-95d5a26c-b010-11e8-b618-066567bdfa8c   30Gi       RWO            managed-premium   3m
prometheus-kp-prometheus-db-prometheus-kp-prometheus-0   Bound     pvc-95f599bb-b010-11e8-b618-066567bdfa8c   30Gi       RWO            managed-premium   3m
```

### patch kube-dns-v20
* Azure AKS does not export dns metrics
    * https://github.com/Azure/AKS/issues/345

In [None]:
kubectl patch deployment --namespace kube-system kube-dns-v20 --patch "$(cat monitoring/kube-dns-azure-patch.yaml)"

### patch kube-prometheus-exporter-kubelets
* the ServiceMonitor of kubelets on Azure AKS does not accept https
    * https://github.com/coreos/prometheus-operator/issues/926

In [None]:
kubectl get servicemonitor --namespace monitoring kp-exporter-kubelets -o yaml | sed 's/https/http/' | kubectl replace -f -
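To see what the `sed` expression in the command above does before applying it to the cluster, it can be run against a small sample. The YAML excerpt below is a hypothetical fragment written for illustration, not output captured from a real ServiceMonitor.

```shell
# Hypothetical excerpt of a ServiceMonitor endpoint (illustrative only, not
# captured from a real cluster).
sample='  endpoints:
  - interval: 15s
    port: https-metrics
    scheme: https'

# Same substitution as in the cell above: the first "https" on each line
# becomes "http".
patched=$(printf '%s\n' "${sample}" | sed 's/https/http/')
printf '%s\n' "${patched}"
```

Note that the substitution is not limited to the `scheme` field: any line containing `https` is rewritten, including port names. Reviewing the output of `kubectl get servicemonitor ... -o yaml | sed 's/https/http/'` before piping it to `kubectl replace` is a cheap safeguard.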

### delete ServiceMonitor of apiserver
* the apiserver on Azure AKS cannot be connected to directly, so its ServiceMonitor cannot scrape it
    * https://github.com/coreos/prometheus-operator/issues/1522

In [None]:
kubectl delete servicemonitor --namespace monitoring kp-exporter-kubernetes

### edit some prometheus rules

In [None]:
echo 'kubectl edit prometheusrules --namespace monitoring kp-kube-prometheus'

```diff
       for: 10m
       labels:
         severity: warning
-    - alert: DeadMansSwitch
-      annotations:
-        description: This is a DeadMansSwitch meant to ensure that the entire Alerting
-          pipeline is functional.
-        summary: Alerting DeadMansSwitch
-      expr: vector(1)
-      labels:
-        severity: none
     - expr: process_open_fds / process_max_fds
       record: fd_utilization
     - alert: FdExhaustionClose
```

In [None]:
echo 'kubectl edit prometheusrules --namespace monitoring kp-exporter-kube-controller-manager'

```diff
 spec:
   groups:
   - name: kube-controller-manager.rules
-    rules:
-    - alert: K8SControllerManagerDown
-      annotations:
-        description: There is no running K8S controller manager. Deployments and replication
-          controllers are not making progress.
-        runbook: https://coreos.com/tectonic/docs/latest/troubleshooting/controller-recovery.html#recovering-a-controller-manager
-        summary: Controller manager is down
-      expr: absent(up{job="kube-controller-manager"} == 1)
-      for: 5m
-      labels:
-        severity: critical
+    rules: []
```

In [None]:
echo 'kubectl edit prometheusrules --namespace monitoring kp-exporter-kube-scheduler'

```diff
       labels:
         quantile: "0.5"
       record: cluster:scheduler_binding_latency_seconds:quantile
-    - alert: K8SSchedulerDown
-      annotations:
-        description: There is no running K8S scheduler. New pods are not being assigned
-          to nodes.
-        runbook: https://coreos.com/tectonic/docs/latest/troubleshooting/controller-recovery.html#recovering-a-scheduler
-        summary: Scheduler is down
-      expr: absent(up{job="kube-scheduler"} == 1)
-      for: 5m
-      labels:
-        severity: critical
```

In [None]:
echo 'kubectl edit prometheusrules --namespace monitoring kp-exporter-kubernetes'

```diff
       for: 10m
       labels:
         severity: critical
-    - alert: K8SApiserverDown
-      annotations:
-        description: No API servers are reachable or all have disappeared from service
-          discovery
-        summary: No API servers are reachable
-      expr: absent(up{job="apiserver"} == 1)
-      for: 20m
-      labels:
-        severity: critical
     - alert: K8sCertificateExpirationNotice
       annotations:
         description: Kubernetes API Certificate is expiring soon (less than 7 days)
```

### confirm prometheus

In [None]:
echo 'kubectl --namespace monitoring port-forward $(kubectl get pod --namespace monitoring -l prometheus=kube-prometheus -l app=prometheus -o template --template "{{(index .items 0).metadata.name}}") 9090:9090'
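The cell above only echoes the command, presumably so that it can be pasted into a separate terminal, since `port-forward` blocks until interrupted. The same "look up the first matching pod, then forward a port to it" pattern recurs for Grafana and Kibana, so it can be wrapped in a small function. This is a sketch only: the names `first_pod_name` and `forward_first_pod` are hypothetical, and `-o jsonpath` is used here as an alternative to the Go template in the original command.

```shell
# Hypothetical helper: print the name of the first pod matching a label
# selector in the given namespace.
first_pod_name() {
  namespace="$1"
  selector="$2"
  kubectl get pod --namespace "${namespace}" -l "${selector}" \
    -o jsonpath='{.items[0].metadata.name}'
}

# Hypothetical helper: forward a local port to the first matching pod.
# Usage: forward_first_pod <namespace> <label-selector> <local:remote>
forward_first_pod() {
  pod=$(first_pod_name "$1" "$2") || return 1
  kubectl --namespace "$1" port-forward "${pod}" "$3"
}
```

For example, `forward_first_pod monitoring app=kp-grafana 3000:3000` would correspond to the Grafana port-forward used later in this notebook.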

In [None]:
xdg-open http://localhost:9090

1. confirm Prometheus
    * no `Target` is down.
    * no `Alert` is firing.
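The target check can also be done from the command line through the Prometheus HTTP API while the port-forward is active. The sketch below counts targets whose health is `down`. The helper name `count_down_targets` is hypothetical, and it assumes the API returns compact JSON in which each target carries a `"health":"..."` field.

```shell
# Hypothetical helper: read the JSON returned by Prometheus' /api/v1/targets
# endpoint on stdin and print how many targets report health "down".
count_down_targets() {
  # grep -o emits one line per match, so wc -l counts the matches.
  grep -o '"health":"down"' | wc -l | tr -d ' '
}

# Usage while the port-forward to localhost:9090 is active:
#   curl -s http://localhost:9090/api/v1/targets | count_down_targets
```

A result of `0` corresponds to the "no `Target` is down" condition above. The string match is deliberately crude; a JSON-aware tool would be more robust if one is available.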

### setup grafana

In [None]:
echo 'kubectl --namespace monitoring port-forward $(kubectl get pod --namespace monitoring -l app=kp-grafana -o template --template "{{(index .items 0).metadata.name}}") 3000:3000'

In [None]:
xdg-open http://localhost:3000

1. log in to grafana
    * Initially, an admin user (`admin`/`admin`) is available.
2. open `Configuration -> Data Sources -> prometheus`
3. change `URL` from `http://kp:9090` to **`http://kp-prometheus:9090`**
4. click `Save & Test`

### add `persistent volume` dashboard to grafana

1. import `monitoring/dashboard_persistent_volumes.json`

## start Elasticsearch, fluentd and Kibana

### start Elasticsearch

In [None]:
kubectl apply -f logging/elasticsearch-azure.yaml

In [None]:
kubectl get statefulsets --namespace monitoring -l k8s-app=elasticsearch-logging

example)
```
NAME                    DESIRED   CURRENT   AGE
elasticsearch-logging   2         2         3m
```

In [None]:
kubectl get pods --namespace monitoring -l k8s-app=elasticsearch-logging

example)
```
NAME                      READY     STATUS    RESTARTS   AGE
elasticsearch-logging-0   1/1       Running   0          4m
elasticsearch-logging-1   1/1       Running   0          2m
```

In [None]:
kubectl get services --namespace monitoring -l k8s-app=elasticsearch-logging

example)
```
NAME                    TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE
elasticsearch-logging   ClusterIP   10.0.80.88   <none>        9200/TCP   4m
```

In [None]:
kubectl get persistentvolumeclaims -n monitoring -l k8s-app=elasticsearch-logging

example)
```
NAME                                            STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
elasticsearch-logging-elasticsearch-logging-0   Bound     pvc-238139db-b014-11e8-b618-066567bdfa8c   64Gi       RWO            managed-premium   4m
elasticsearch-logging-elasticsearch-logging-1   Bound     pvc-70ca5ec3-b014-11e8-b618-066567bdfa8c   64Gi       RWO            managed-premium   2m
```

In [None]:
kubectl exec -it elasticsearch-logging-0 --namespace monitoring -- curl -H "Content-Type: application/json" -X PUT http://elasticsearch-logging:9200/_cluster/settings -d '{"transient": {"cluster.routing.allocation.enable":"all"}}'
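After enabling shard allocation with the command above, it can be useful to confirm that the cluster settles into a healthy state. The sketch below extracts the `status` field from the response of Elasticsearch's `_cluster/health` endpoint. The helper name `es_cluster_status` is hypothetical, and the text-based extraction assumes the field appears as `"status":"<value>"` in the response.

```shell
# Hypothetical helper: read the JSON from Elasticsearch's _cluster/health
# endpoint on stdin and print the value of its "status" field.
es_cluster_status() {
  grep -o '"status" *: *"[a-z]*"' | sed 's/.*"\([a-z]*\)"$/\1/'
}

# Usage against the cluster (same exec pattern as the cell above):
#   kubectl exec elasticsearch-logging-0 --namespace monitoring -- \
#     curl -s http://elasticsearch-logging:9200/_cluster/health | es_cluster_status
```

Elasticsearch reports `green`, `yellow`, or `red`; `green` means all primary and replica shards are allocated.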

### start fluentd

In [None]:
kubectl apply -f logging/fluentd-es-configmap.yaml

In [None]:
kubectl apply -f logging/fluentd-es-ds.yaml

In [None]:
kubectl get daemonsets --namespace monitoring -l k8s-app=fluentd-es

example)
```
NAME                DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
fluentd-es-v2.2.0   4         4         4       4            4           <none>          53s
```

In [None]:
kubectl get pods --namespace monitoring -l k8s-app=fluentd-es

example)
```
NAME                      READY   STATUS    RESTARTS   AGE
fluentd-es-v2.2.0-8sv45   1/1     Running   0          1m
fluentd-es-v2.2.0-96ghs   1/1     Running   0          1m
fluentd-es-v2.2.0-cjhtc   1/1     Running   0          1m
fluentd-es-v2.2.0-djzff   1/1     Running   0          1m
```

### start Kibana

In [None]:
kubectl apply -f logging/kibana.yaml

In [None]:
kubectl get pods --namespace monitoring -l k8s-app=kibana-logging

example)
```
NAME                              READY     STATUS    RESTARTS   AGE
kibana-logging-7444956bf8-stnfm   1/1       Running   0          1m
```

### start curator

In [None]:
kubectl apply -f logging/curator-configmap.yaml

In [None]:
kubectl apply -f logging/curator-cronjob.yaml

In [None]:
kubectl get cronjobs --namespace monitoring -l k8s-app=elasticsearch-curator

example)
```
NAME                    SCHEDULE     SUSPEND   ACTIVE    LAST SCHEDULE   AGE
elasticsearch-curator   0 18 * * *   False     0         <none>          7s
```

### setup Kibana

In [None]:
echo 'kubectl --namespace monitoring port-forward $(kubectl get pod -l k8s-app=kibana-logging --namespace monitoring -o template --template "{{(index .items 0).metadata.name}}") 5601:5601'

In [None]:
xdg-open http://localhost:5601/

1. open `Management -> Index Patterns`
2. set `logstash-*` as Index Pattern, and click `Next step`
3. set `@timestamp` as Time Filter field name, and click `Create index pattern`

In [None]:
echo 'kubectl --namespace monitoring port-forward $(kubectl get pod --namespace monitoring -l app=kp-grafana -o template --template "{{(index .items 0).metadata.name}}") 3000:3000'

In [None]:
xdg-open http://localhost:3000

### add `elasticsearch` dashboard to grafana

1. add a new Data Source
    * Name: `elasticsearch`
    * Type: `Elasticsearch`
    * URL: `http://elasticsearch-logging:9200`
    * Access: `Server(Default)`
    * Index name: `logstash-*`
    * Time field name: `@timestamp`
    * Version: `5.6+`
2. import `monitoring/dashboard_elasticsearch.json`