SoLOMON is a tool for SLO-based monitoring.
Overview of the architecture:

To get SoLOMON running locally:

- Install the dependencies for frontend and backend by running `npm install` in both `/solomon-backend` and `/solomon-frontend`.
- Start the backend with `npm run start:dev` (in `/solomon-backend`).
- Start the frontend with `npm run start` (in `/solomon-frontend`).
- For full functionality you also need to run an instance of the Gropius backend and set the URL where it is running in the SoLOMON env file (`/config/envs/.env.dev.local`).
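The exact contents of `/config/envs/.env.dev.local` are not reproduced here; the following is only a sketch of what it might look like. The variable name `GROPIUS_URL` and the port are assumptions for illustration, while `HTTPS_ENABLED` is the switch described in the hint below.

```dotenv
# /config/envs/.env.dev.local -- sketch only; check the backend's config module for the real keys
GROPIUS_URL=http://localhost:8080   # assumed name: URL of the locally running Gropius backend
HTTPS_ENABLED=false                 # run the SoLOMON backend as plain HTTP (see the hint below)
```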
Hint: The backend can be started either as an HTTP or as an HTTPS server. To start it as an HTTP server, just make sure that `HTTPS_ENABLED=false` is set in the env file (`/config/envs/.env.dev.local`). If you want to run it as an HTTPS server, set `HTTPS_ENABLED=true` and follow the instructions below.
- Create a self-signed SSL certificate using the following commands (in Git Bash):

  ```sh
  openssl genrsa -out key.pem
  openssl req -new -key key.pem -out csr.pem
  openssl x509 -req -days 9999 -in csr.pem -signkey key.pem -out cert.pem
  rm csr.pem
  ```

- During the second command (`openssl req`) you will be prompted to enter various pieces of information for the certificate. You can put in arbitrary information or even leave most of the fields empty. Don't set a challenge password!
- When done, you should find two files in the location where you executed the commands: `cert.pem` and `key.pem`.
- Copy these files into the following location of the SLO backend folder: `/config/certificate/`

Hint: Git comes with OpenSSL, which is why this works in Git Bash but not in PowerShell or CMD. Alternatively, you can generate the certificate on a Linux system and copy the files afterwards, or install OpenSSL on Windows.
- Create a new JSON file called `basic-user.json`. It should have the following format, but you can pick your own username and password:

  ```json
  { "users": { "yourusername": "yourpassword" } }
  ```

- Put this JSON file into the following folder of the SLO backend: `/config/credentials/`

When accessing the backend now, you have to use the HTTPS protocol in the URL and also add the basic auth user. A call to the (locally deployed) backend from the browser might thus look like this:

`https://yourusername:yourpassword@localhost/solomon/rules/aws`

When using tools like Postman, there is an "Authorization" tab where the Basic Auth user can be specified once. Postman then takes care of adding the username and password to each request, so you don't have to add them to the URL manually.
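For a quick check from the command line, a request like the following should work (a sketch assuming the backend is reachable at `localhost` as in the URL above; `-k` skips verification of the self-signed certificate):

```sh
# Sketch: call the locally deployed backend over HTTPS with the basic auth user.
# -k (--insecure) skips certificate verification because the certificate is self-signed,
# -u passes the credentials from basic-user.json.
curl -k -u yourusername:yourpassword https://localhost/solomon/rules/aws
```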
- Prometheus
  - Read https://prometheus.io/docs/prometheus/latest/getting_started/
  - Add targets to be scraped
  - Define alert rules
  - Read about the Alertmanager, exporters, and the Blackbox Exporter
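To make "add targets" and "define alert rules" concrete, here is a minimal generic sketch (not taken from this repository) of a `prometheus.yml` scrape target together with a rules file it loads:

```yaml
# prometheus.yml -- generic sketch, not part of this repo
scrape_configs:
  - job_name: "demo-app"
    static_configs:
      - targets: ["localhost:8080"]   # an endpoint exposing /metrics

rule_files:
  - "demo-rules.yaml"
```

```yaml
# demo-rules.yaml -- generic sketch: alert when the target has been down for 5 minutes
groups:
  - name: demo
    rules:
      - alert: DemoAppDown
        expr: up{job="demo-app"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "demo-app is unreachable"
```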
- Docker basics: What are containers? What is a `Dockerfile`?
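If Dockerfiles are new to you, a minimal one for a Node.js service looks roughly like this (illustrative only, not this project's actual Dockerfile):

```dockerfile
# Minimal Dockerfile for a Node.js service -- illustrative sketch, not this project's file
FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
EXPOSE 3000
CMD ["npm", "run", "start"]
```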
- Kubernetes basics
  - Learn about Pods, Deployments, Services, and Custom Resource Definitions (CRDs)
  - Learn about Helm charts
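For orientation, a minimal Deployment plus Service looks roughly like this (a generic example, not one of the manifests in the `kubernetes-environment` folder):

```yaml
# Generic sketch of a Deployment and a Service -- not taken from this repo
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: demo-app
  template:
    metadata:
      labels:
        app: demo-app
    spec:
      containers:
        - name: demo-app
          image: nginx:1.25
          ports:
            - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: demo-app
spec:
  selector:
    app: demo-app
  ports:
    - port: 80
      targetPort: 80
```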
- The frameworks React.js and Nest.js are being used, but knowing the basics of JavaScript is probably sufficient to understand the current code
- Learn about the current system by reading and executing the following instructions
Install Minikube to run a local Kubernetes cluster. The following steps use Minikube and Helm.

- Create a new Minikube profile:

  ```sh
  minikube start -p sla-management --cpus=2 --memory=2048
  ```

- Install the kube-prometheus-stack Helm chart with the Prometheus values. This sets up a basic Prometheus environment:

  ```sh
  helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
  helm install prometheus prometheus-community/kube-prometheus-stack -f ./prometheus-operator-values.yaml
  ```

- Apply the Kubernetes resources in the `kubernetes-environment` folder. This creates the sla-manager and nodejs-demo-client Deployments & Services and sets up the required roles:

  ```sh
  kubectl apply -f .
  ```

- Install the Blackbox Exporter Helm chart with its values file:

  ```sh
  helm install blackbox-exporter prometheus-community/prometheus-blackbox-exporter -f ./blackbox-exporter-values.yaml
  ```
Explanations: You have set up a Kubernetes cluster and configured the Prometheus Operator, which is used to monitor the pods of the cluster. You created Deployments and Services for:

- `nodejs-demo-client`, a dummy server to be monitored by Prometheus. This server exports metrics that can be scraped by Prometheus. This is the common way to monitor applications with Prometheus, but it requires configuration on the application itself, which is not optimal for our system. To solve this, we use the Blackbox Exporter to monitor applications. It is a "prober" that sends HTTP requests to the configured applications to check availability, response times, and more.
- `sla-manager`, which, as the name says, manages the configured SLAs. It converts existing SLAs to Prometheus rules using the `PrometheusRule` CRD, which is applied via the Kubernetes API (the programmatic way of defining a YAML rules file; see the example `sla-rules.yaml` and the sketch after this list). See `sla-rules.service.ts` in the `sla-manager/src` directory. It also handles the Gropius issue creation: when a Prometheus alert rule is triggered, the Alertmanager is configured to send the fired alerts to the sla-manager service via webhooks. See `issue-manager.service.ts`.
- `sla-manager-frontend`, which is where you can configure SLAs.
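To illustrate what the sla-manager generates, here is a sketch of a `PrometheusRule` manifest. This is not the content of `sla-rules.yaml`; the names, labels, and the expression are assumptions for illustration only:

```yaml
# Sketch of a PrometheusRule as the sla-manager might create it via the Kubernetes API.
# Names, labels and the expression are illustrative assumptions, not taken from sla-rules.yaml.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: sla-availability-rule
  labels:
    release: prometheus   # the kube-prometheus-stack typically selects rules by this label
spec:
  groups:
    - name: sla-rules
      rules:
        - alert: NodejsDemoClientUnavailable
          expr: probe_success{instance="http://nodejs-demo-client:80"} == 0   # Blackbox Exporter metric; target assumed
          for: 1m
          annotations:
            summary: "nodejs-demo-client failed its availability SLO"
```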
Add the Helm repo:

```sh
helm repo add myrepo https://tobiasrodestock.github.io/helm-chart
```

Install with Helm:

```sh
helm install solomon myrepo/solomon
```

Use port-forwarding to access the sla-manager and the sla-manager-frontend:

```sh
kubectl port-forward [sla-manager pod] 6400
kubectl port-forward [sla-manager-frontend pod] 3000
```
Get familiar with Gropius. Start its backend and frontend; instructions can be found in the corresponding repositories.

Now you can configure SLA rules (by default, one rule is already configured).
Access Prometheus to see the configured SLA rules in action (the pod names might differ for you):

```sh
kubectl port-forward prometheus-prometheus-kube-prometheus-prometheus-0 9090
```

Also check out the Alertmanager and the Blackbox Exporter:

```sh
kubectl port-forward alertmanager-prometheus-kube-prometheus-alertmanager-0 9093
kubectl port-forward blackbox-exporter-prometheus-blackbox-exporter-d94fc99c5-j8gn5 9115
```

You can simulate an unavailable service by scaling the deployment down to 0 pods:

```sh
kubectl scale --replicas=0 deployment/nodejs-client-deploy
```
Now you will see the alert rules go to state `Pending` (yellow) and then `Firing` (red), which means the alert was triggered and sent to the sla-manager via the Alertmanager. Check the logs of the sla-manager to see whether it received the triggered alerts:

```sh
kubectl logs [sla-manager pod]
```
- Errors are not yet shown in the frontend (e.g. "ExpiredToken: The security token included in the request is expired" when connecting to AWS, or "ValidationError: Period must be 10, 30 or a multiple of 60" when setting an invalid period for an AWS SLO)
- When an SLO is edited and its name is changed, a new SLO gets created and the old one is not deleted
- Network Load Balancer targets for AWS currently cannot be loaded, which makes it impossible to create SLOs for AWS Network Load Balancers
- SlaRule serialization: when the sla-manager is shut down, SLAs are not persisted and thus not available after restarting the service
- The Blackbox Exporter needs its probe targets configured in `blackbox-exporter-values.yaml`; the intended solution is the `Probe` CRD, but that setup is not working yet