# Exposing health check results as Prometheus metrics in Insights Operator

## User story

As a developer, I need to know how to expose arbitrary data in Prometheus format from Insights Operator, so that I can expose health check results later.

## About the task

This task is about building a PoC that shows how to expose health check data via Prometheus metrics. Once it works, we will be able to decide what the metrics should look like.


## Acceptance criteria:

- PoC with mocked health check results exposed as a Prometheus metric
- Demo presented at the CCX Demo meeting

## JIRA issue:
https://projects.engineering.redhat.com/browse/CCXDEV-2179

## Insights integration with Web Console
(one possible solution; the other is based on a CRD)

![Web console integration](./web_console_anim.gif)

## Why expose metrics consumed by the Web Console?

Insights messages and status can be displayed on the Overview page

![title](./web_console_01.png)

It will also be possible to display more detailed information about the cluster status

![title](./web_console_02.png)

![title](./web_console_03.png)

### So IO sends data to the external data pipeline, and we need to fetch the results back into IO?

![endless-loop](./endless_loop.jpg)

## The whole workflow

![title](./io-pulling-prometheus-anim.gif)

## IO already has some Prometheus metrics!

-> an easier solution, almost a one-liner

## Implementing a new metric produced by IO

![title](./pr.png)
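
Insights Operator itself is written in Go, so the change in the PR above is Go code. The following Python sketch is only an illustration of what the new metric has to look like on the wire: a single gauge with one sample per value of the `metric` label, rendered in the Prometheus text exposition format. The function name `render_gauge` and the constant `MOCKED_HEALTH_STATUSES` are hypothetical, and the values are the same mocked numbers that appear in the `/metrics` output shown later in this document.

```python
def render_gauge(name, help_text, label, values):
    """Render a labelled gauge in the Prometheus text exposition format."""
    lines = [
        f"# HELP {name} {help_text}",
        f"# TYPE {name} gauge",
    ]
    # One sample per label value, sorted to keep the output stable.
    for label_value in sorted(values):
        lines.append(f'{name}{{{label}="{label_value}"}} {values[label_value]}')
    return "\n".join(lines) + "\n"


# Mocked health check results (hypothetical PoC values, not real cluster data).
MOCKED_HEALTH_STATUSES = {
    "connected": 1,
    "critical": 1,
    "important": 5,
    "moderate": 3,
    "low": 3,
    "total": 12,
}

if __name__ == "__main__":
    print(
        render_gauge(
            "health_statuses_insights",
            "[ALPHA] Foobar.",
            "metric",
            MOCKED_HEALTH_STATUSES,
        ),
        end="",
    )
```

Using one gauge with a label per status keeps all health check numbers under a single metric name, which is presumably what makes the change so small.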

## Let's patch IO and run it against a cluster

```
W0714 16:44:33.132520   10855 cmd.go:195] Using insecure, self-signed certificates
I0714 16:44:33.585740   10855 observer_polling.go:155] Starting file observer
W0714 16:44:33.586285   10855 builder.go:170] unable to identify the current namespace for events: open /var/run/secrets/kubernetes.io/serviceaccount/namespace: no such file or directory
W0714 16:44:34.137275   10855 builder.go:174] unable to get owner reference (falling back to namespace): unable to setup event recorder as "POD_NAME" env variable is not set and there are no pods
W0714 16:44:34.378811   10855 configmap_cafile_content.go:102] unable to load initial CA bundle for: "client-ca::kube-system::extension-apiserver-authentication::client-ca-file" due to: configmap "extension-apiserver-authentication" not found
W0714 16:44:34.378906   10855 configmap_cafile_content.go:102] unable to load initial CA bundle for: "client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file" due to: configmap "extension-apiserver-authentication" not found
I0714 16:44:34.737810   10855 operator.go:57] Starting insights-operator v0.0.0-master+$Format:%h$
I0714 16:44:34.738716   10855 configmap_cafile_content.go:205] Starting client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0714 16:44:34.738733   10855 configmap_cafile_content.go:205] Starting client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0714 16:44:34.738749   10855 shared_informer.go:197] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0714 16:44:34.738762   10855 shared_informer.go:197] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0714 16:44:34.739307   10855 dynamic_serving_content.go:129] Starting serving-cert::/tmp/ramdisk/serving-cert-899292214/tls.crt::/tmp/ramdisk/serving-cert-899292214/tls.key
I0714 16:44:34.739961   10855 secure_serving.go:178] Serving securely on [::]:8443
I0714 16:44:34.740007   10855 tlsconfig.go:219] Starting DynamicServingCertificateController
I0714 16:44:34.938924   10855 shared_informer.go:204] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0714 16:44:35.077870   10855 periodic.go:113] Gathering cluster info every 30s
I0714 16:44:35.138981   10855 shared_informer.go:204] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
E0714 16:44:35.238659   10855 clusterconfig.go:139] Unable to retrieve most recent metrics: Get https://prometheus-k8s.openshift-monitoring.svc:9091/federate?match%5B%5D=ALERTS&match%5B%5D=etcd_object_counts&match%5B%5D=cluster_installer&match%5B%5D=namespace%3Acontainer_cpu_usage_seconds_total%3Asum_rate&match%5B%5D=namespace%3Acontainer_memory_usage_bytes%3Asum: dial tcp: lookup prometheus-k8s.openshift-monitoring.svc on 127.0.0.1:53: no such host
I0714 16:44:35.246845   10855 status.go:388] The initial operator extension status is healthy
I0714 16:44:35.246863   10855 status.go:413] It is safe to use fast upload
```

## And now let's retrieve some metrics
```
curl --cert k8s.crt --key k8s.key -vs -k https://localhost:8443/metrics | grep insights 
```

Seems like it works correctly:

```
# HELP health_statuses_insights [ALPHA] Foobar.
# TYPE health_statuses_insights gauge
health_statuses_insights{metric="connected"} 1
health_statuses_insights{metric="critical"} 1
health_statuses_insights{metric="important"} 5
health_statuses_insights{metric="low"} 3
health_statuses_insights{metric="moderate"} 3
health_statuses_insights{metric="total"} 12
```
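
A consumer of this endpoint would have to turn those lines back into numbers. The snippet below is a minimal sketch of that step, assuming the exact shape shown above (one gauge with a single `metric` label); it is not a general parser for the Prometheus exposition format. The names `parse_health_statuses` and `SAMPLE_OUTPUT` are hypothetical.

```python
import re

# Matches lines such as: health_statuses_insights{metric="total"} 12
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'\{metric="(?P<label>[^"]*)"\}\s+(?P<value>\S+)$'
)


def parse_health_statuses(text, metric_name="health_statuses_insights"):
    """Return a dict mapping each `metric` label value to its sample value."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the HELP/TYPE comment lines.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match and match.group("name") == metric_name:
            result[match.group("label")] = float(match.group("value"))
    return result


# The output of the curl command above, copied verbatim.
SAMPLE_OUTPUT = """\
# HELP health_statuses_insights [ALPHA] Foobar.
# TYPE health_statuses_insights gauge
health_statuses_insights{metric="connected"} 1
health_statuses_insights{metric="critical"} 1
health_statuses_insights{metric="important"} 5
health_statuses_insights{metric="low"} 3
health_statuses_insights{metric="moderate"} 3
health_statuses_insights{metric="total"} 12
"""

if __name__ == "__main__":
    print(parse_health_statuses(SAMPLE_OUTPUT))
```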

(done)

## By-product

```python
#!/usr/bin/env python3

"""Script to generate certificate and user key from provided Kubernetes configuration file.

The generated files k8s.crt and k8s.key can be used to access Insights Operator
REST API endpoints as well as its Prometheus metrics.
"""

# Copyright © 2020 Red Hat
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import base64
import sys

import yaml


def get_data_for_user(payload, user_name):
    """
    Try to retrieve data for given user.

    KeyError will be raised in case of improper payload format or when the
    user is not present in the payload.
    """
    users = payload["users"]
    for user_data in users:
        if "name" in user_data and user_data["name"] == user_name:
            return user_data
    # Fail explicitly instead of silently returning None.
    raise KeyError(user_name)


def get_value_assigned_to_user(user_data, key):
    """
    Try to retrieve (attribute) value assigned to a user.

    In practice it will be a certificate or a key. KeyError will be raised in
    case of improper payload format or when the attribute for given key does
    not exist.
    """
    d = user_data["user"]
    return d[key]


def decode(b64):
    """
    Decode given attribute encoded by using Base64 encoding.

    The result is returned as a regular Python string. Note that
    binascii.Error (a subclass of ValueError) is raised when the input data
    are not padded correctly.
    """
    barray = base64.b64decode(b64)
    return barray.decode('ascii')


def generate_cert_and_key_files(input_file):
    """Generate file with certificate and user key from k8s configuration file."""
    with open(input_file) as f:
        # safe_load avoids constructing arbitrary Python objects from the YAML
        # input and works with current PyYAML versions, where yaml.load()
        # requires an explicit Loader argument.
        payload = yaml.safe_load(f)
        if payload is not None:
            user_data = get_data_for_user(payload, "admin")
            encoded_certificate = get_value_assigned_to_user(user_data, "client-certificate-data")
            encoded_key = get_value_assigned_to_user(user_data, "client-key-data")
            decoded_certificate = decode(encoded_certificate)
            decoded_key = decode(encoded_key)
            with open("k8s.crt", "w") as cert:
                cert.write(decoded_certificate)
            with open("k8s.key", "w") as key:
                key.write(decoded_key)


def main():
    """Entry point to this script."""
    if len(sys.argv) <= 1:
        print("Usage: gen_cert_file.py kubeconfig.yaml")
        sys.exit(1)
    generate_cert_and_key_files(sys.argv[1])


# Run main() only when the file is executed as a script, not when imported
if __name__ == "__main__":
    main()
```