CICD Setup

Poll Buddy CICD Setup Overview

The Poll Buddy CICD system automatically runs on (nearly) every commit and pull request submitted to the main repository, and generates a test / development instance on our deployment server. Each instance is fully independent of every other instance and can be used to test out features and functionality outside your local development environment. It also enables certain services, like RPI's CAS login system, to work properly; unfortunately, these cannot be enabled during local testing.

This system is built upon Docker images and Kubernetes configuration files, which can be found here and which run on a Kubernetes cluster. Legacy Docker-Compose based setups remain at the root of the repository for other purposes, such as a simple "mock" production setup, along with the more useful development, testing, and linting configurations. Functionality from the PollBuddy.app repository is incorporated to support the full system.

CICD actions are initiated through GitHub Workflows / Actions, which can be found in the workflows folder of the main repository. Unfortunately, limitations of the GitHub Actions platform mean that outside contributions, in the form of pull requests from forked repositories, will not be able to create development instances. This is because the workflows rely on repository secrets, which for security reasons are not shared with forks.
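As a rough sketch of what such a workflow can look like (this is not one of the actual Poll Buddy workflow files; the paths, registry, API endpoint, and secret names are placeholders), a job might build an image and then ask the deployment server to spin up an instance:

```yaml
# Hypothetical workflow sketch; paths, registry, and secret names are illustrative.
name: Development instance
on:
  push:
    branches: [main]
  pull_request:

jobs:
  deploy-dev-instance:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Build and push the frontend image   # (registry login omitted)
        run: |
          docker build -t registry.example.com/pollbuddy/frontend:${{ github.sha }} ./frontend
          docker push registry.example.com/pollbuddy/frontend:${{ github.sha }}

      - name: Request a development instance
        # Repository secrets are empty in workflow runs triggered from forks,
        # which is why forked pull requests cannot create instances.
        run: |
          curl -fsS -X POST "${{ secrets.DEV_MANAGER_URL }}/api/instances" \
            -H "Authorization: Bearer ${{ secrets.DEV_MANAGER_TOKEN }}" \
            -d "commit=${{ github.sha }}"
```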

Kubernetes Overview

Kubernetes is a very useful tool for deploying and managing application services. It's based around the concept of containerization, where you install and set up your application to run in small isolated environments to improve security, scalability, and manageability. Management is done through a series of YAML configuration files, allowing for declarative configuration and easy automation. It's designed to be extremely portable, easily extensible, and fully open source.

Containers are very similar in concept to virtual machines, but they aim to solve some of the problems with traditional virtual machines, mainly their resource usage. Containers share the host operating system kernel and only bundle resources into the container image if they are necessary to support the running application. This keeps image sizes down and avoids running background services or other processes that don't support the application. Container images are how containers know what to run. They are built ahead of time from source code, and possibly from existing images, into a fully self-contained environment for the software to run in. Since everything an image needs is built into it, images are effectively host-agnostic: an image built on one system will happily run on any other compatible system in exactly the same manner.

In general, containers are used as the smallest unit of abstraction on top of the application. From there, containers are grouped into Kubernetes Pods, which share some resources to help manage the application's environmental needs. Pods run on Kubernetes Nodes, and are managed through Deployments and exposed through Services to take advantage of additional functionality built into Kubernetes.
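As a point of reference, a minimal Deployment and its matching Service (with placeholder names, image, and ports unrelated to Poll Buddy) look like this:

```yaml
# Minimal illustrative Deployment and Service; all names and ports are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
        - name: example-app
          image: example/app:latest
          ports:
            - containerPort: 3000
---
# The Service load-balances traffic across all pods matching the selector.
apiVersion: v1
kind: Service
metadata:
  name: example-app
spec:
  selector:
    app: example-app
  ports:
    - port: 80
      targetPort: 3000
```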

Poll Buddy Kubernetes Configuration Overview

For each part of the application, there is at least one YAML file defining the Deployment and Service that part requires. Depending on how reusable or sensitive the configuration is, there are also additional YAML files holding other configuration items and secrets.

MongoDB Database

This setup relies on a "sidecar" container that runs alongside each Mongo container, one of each per pod. The sidecar automatically notifies Mongo of changes in the replica set topology, making the entire process automatic, which is ideal. We wanted Mongo to be able to scale automatically should the need arise, so we went with the sidecar solution and 3 database replicas to make sure the database would never be a bottleneck. Since Mongo stores data, we needed to use a StatefulSet to configure the pods, with several volume mounts for storing and providing data.

We needed several ConfigMaps: one to store the initialization script that creates the database users, a second for a script that configures various file system permissions, a third for a key used to secure inter-node communication between the Mongo instances, and a fourth for the usernames of the various users as well as the database name. Additionally, a Secret is configured for storing the passwords of the Mongo users. Lastly, a persistent volume claim template was created to store each replica's local copy of the database, one volume per replica. The ConfigMaps are separate rather than combined into one because several of them are mounted as files within the container; since only one mount point can be specified for each volume, and each file needs to go in a different location, multiple ConfigMaps were needed.
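To make that structure concrete, here is an abridged sketch of the pattern; the images (including the community mongo-k8s-sidecar), mount paths, and resource names are illustrative rather than Poll Buddy's exact manifests:

```yaml
# Abridged sketch only; images, names, and paths are placeholders.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mongo
spec:
  serviceName: mongo
  replicas: 3
  selector:
    matchLabels:
      app: mongo
  template:
    metadata:
      labels:
        app: mongo
    spec:
      serviceAccountName: mongo-sidecar            # cluster access for the sidecar
      containers:
        - name: mongo
          image: mongo:5
          env:
            - name: MONGO_INITDB_ROOT_PASSWORD     # pulled from the Secret
              valueFrom:
                secretKeyRef:
                  name: mongo-secrets
                  key: root-password
          volumeMounts:
            - name: mongo-data
              mountPath: /data/db                      # per-replica database copy
            - name: mongo-init
              mountPath: /docker-entrypoint-initdb.d   # user-creation script
            - name: mongo-keyfile
              mountPath: /etc/mongo-keyfile            # inter-node auth key
        - name: mongo-sidecar
          image: cvallance/mongo-k8s-sidecar           # watches pods, maintains the replica set
          env:
            - name: MONGO_SIDECAR_POD_LABELS
              value: "app=mongo"
      volumes:
        - name: mongo-init
          configMap:
            name: mongo-init-script
        - name: mongo-keyfile
          configMap:
            name: mongo-keyfile
  volumeClaimTemplates:                                # one persistent volume per replica
    - metadata:
        name: mongo-data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi
```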

Since the database needs to be accessed by the backend, a Service was created to direct traffic to any of the database pods. Additionally, the sidecar container needs access to the cluster to detect the other instances and join them together, so a service account, cluster role, and cluster role binding were required to allow this access.
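A minimal version of that access configuration, with placeholder names, might look like the following; the sidecar only needs to read pod information:

```yaml
# Illustrative RBAC objects; names are placeholders.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: mongo-sidecar
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: mongo-sidecar
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]     # enough to discover the other replicas
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: mongo-sidecar
subjects:
  - kind: ServiceAccount
    name: mongo-sidecar
    namespace: default                  # placeholder namespace
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: mongo-sidecar
```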

InfluxDB Reporting Database

The InfluxDB database does not feature replication like the MongoDB database does, so it uses a standard deployment with a single persistent volume claim. Additionally, it has a ConfigMap to store an initialization script that creates the database, a user to access the database, and a data retention policy. There is also another ConfigMap for storing the database name and user account names. Lastly, there is a Secret to store the passwords for the two user accounts, and a service was created so the backend and reporting frontend can reach the database.
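The overall shape, with placeholder names, image tag, and storage size (and the init-script ConfigMap mount omitted for brevity), is a standalone PersistentVolumeClaim referenced by a single-replica Deployment:

```yaml
# Illustrative single-replica database with its own PersistentVolumeClaim.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: influxdb-data
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 5Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: influxdb
spec:
  replicas: 1
  selector:
    matchLabels:
      app: influxdb
  template:
    metadata:
      labels:
        app: influxdb
    spec:
      containers:
        - name: influxdb
          image: influxdb:1.8
          envFrom:
            - configMapRef:
                name: influxdb-config       # database and user names
            - secretRef:
                name: influxdb-secrets      # user passwords
          volumeMounts:
            - name: influxdb-data
              mountPath: /var/lib/influxdb
      volumes:
        - name: influxdb-data
          persistentVolumeClaim:
            claimName: influxdb-data
```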

Grafana Reporting Dashboard

The Grafana reporting dashboard only has a deployment and a service, since no data needed to be stored: all information was built directly into the container image or passed through environment variables. An environment variable specified the service name of the reporting database, and the ConfigMap and Secret used by the reporting database were reused to provide the credentials required to connect.

Backend

A deployment and service were created to run and access the backend. The backend requires access to the MongoDB and InfluxDB databases, and therefore needs to know where to find them. Environment variables were used to configure the database locations, the frontend's URL for authentication callbacks, and other database credentials and application secrets. Some of these were sourced from the existing database ConfigMaps and Secrets, but the frontend URL was sourced from a backend-specific ConfigMap, and the HTTP session secret used by the Express-Session Node module was stored in a backend-specific Secret.
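As a sketch of that pattern (the variable names, keys, and port are placeholders, not the backend's real configuration), individual values can be pulled from ConfigMaps and Secrets with valueFrom:

```yaml
# Container spec excerpt; names and keys are illustrative only.
containers:
  - name: backend
    image: example/backend:latest
    env:
      - name: FRONTEND_URL                 # used for authentication callbacks
        valueFrom:
          configMapKeyRef:
            name: backend-config
            key: frontend-url
      - name: MONGO_PASSWORD
        valueFrom:
          secretKeyRef:
            name: mongo-secrets
            key: backend-password
      - name: SESSION_SECRET               # consumed by Express-Session
        valueFrom:
          secretKeyRef:
            name: backend-secrets
            key: session-secret
      - name: MONGODB_URL
        value: "mongodb://mongo:27017"     # Service name of the database
```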

React Frontend and NGINX Web Proxy

In production mode, the frontend's React code is compiled to standard HTML, CSS, and JavaScript, which can then be served by a single NGINX instance that also handles sending requests to the backend and to the reporting frontend. Therefore, only one deployment and service were required. The frontend needs only two environment variables for its operation: one to tell it where to find the backend, and one to tell it where to find the reporting frontend.

The NGINX configuration did need to be modified from the Docker-based setup because of how NGINX handles DNS resolution and environment variables. NGINX uses its own DNS resolver settings instead of the system ones and must be told which server to use to resolve names, so this had to be set to the Kubernetes DNS service. An additional module was also required to enable environment variable support for the backend and reporting service paths.
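To illustrate the resolver change (Poll Buddy bakes its configuration into the image rather than using a ConfigMap, and the service names and ports here are placeholders), pointing the resolver at the cluster DNS service and using a variable in proxy_pass makes NGINX resolve service names at request time:

```yaml
# Illustrative ConfigMap wrapping an NGINX snippet; names and ports are placeholders.
apiVersion: v1
kind: ConfigMap
metadata:
  name: frontend-nginx-config
data:
  default.conf: |
    server {
      listen 80;
      # Use the Kubernetes DNS service instead of the system resolver settings.
      resolver kube-dns.kube-system.svc.cluster.local valid=10s;

      # A variable forces name resolution per request rather than once at startup.
      set $backend_upstream "http://backend:3001";
      location /api/ {
        proxy_pass $backend_upstream;
      }

      location / {
        root /usr/share/nginx/html;   # compiled React build output
        try_files $uri /index.html;
      }
    }
```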

Monitoring and Managing

Health Probes

Health probes are how Kubernetes monitors the internal state of the applications running within pods. Probes are configured on containers; a container that fails its liveness probe is restarted, and a pod whose containers are not ready is taken out of service until they are. There are three main types of probes: startup, readiness, and liveness. The startup probe monitors for when the application in the container has finished starting and doing any initialization tasks and is ready to accept connections or perform work. The readiness probe then takes over to ensure the application continues to be ready to accept connections and perform tasks. Lastly, the liveness probe is used to make sure the application is still responsive and functioning properly. The main difference between the readiness and liveness probes is that an application may not be ready to process requests but still be alive, functioning, and moving towards being ready to process requests.

Each container in Poll Buddy has all three probes configured on it, with varying endpoints to test depending on what the application is. In general, Poll Buddy does not make much distinction between any of the three, so they are often identical.
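A container carrying all three probes pointed at one hypothetical health endpoint might look like this; the path, port, and timings are illustrative:

```yaml
# Probe excerpt for a single container; endpoint and timings are placeholders.
containers:
  - name: backend
    image: example/backend:latest
    startupProbe:               # allows time for initialization before other probes run
      httpGet:
        path: /api/healthcheck
        port: 3000
      failureThreshold: 30
      periodSeconds: 5
    readinessProbe:             # gates whether the pod receives Service traffic
      httpGet:
        path: /api/healthcheck
        port: 3000
      periodSeconds: 10
    livenessProbe:              # restarts the container if it stops responding
      httpGet:
        path: /api/healthcheck
        port: 3000
      periodSeconds: 10
      failureThreshold: 3
```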

Development Site and Management Server

The development site was previously just a frontend to proxy between the master instance and the development instances, but it has been transformed into a more capable controller that manages the instances within Kubernetes.

This portion has two containers within its pod: an NGINX frontend that routes between the master and development instances, and a Node container that handles managing the instances. The Node container serves the development site homepage and exposes API endpoints for creating, starting, stopping, and deleting instances. GitHub Actions can use these API endpoints to automatically create and delete instances as part of the CICD system. The development server also allows users to authenticate with GitHub, after which it confirms their membership in the Poll Buddy organization before enabling full control over the development instances.

The development instance management server has a service account like the Mongo setup does, but this account has more permissions so that the server can deploy and manage entire new instances. A benefit of this server being in the same pod as the NGINX frontend is that containers within the same pod can share certain resources, such as volumes and, when enabled, a process namespace. Through a shared volume between the two containers, NGINX configuration files generated by the development server are placed in a folder for NGINX to read; the development server then signals the NGINX process to reload its configuration files, activating new instances without any downtime.
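A stripped-down sketch of that pod layout (placeholder images and paths; not the actual manifest) shows the shared volume and shared process namespace:

```yaml
# Illustrative pod spec; images, paths, and names are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: dev-site
spec:
  shareProcessNamespace: true               # lets the manager signal the NGINX master process
  serviceAccountName: instance-manager      # elevated permissions to deploy instances
  containers:
    - name: nginx
      image: nginx:stable
      volumeMounts:
        - name: generated-config
          mountPath: /etc/nginx/conf.d      # reads the configs written by the manager
    - name: instance-manager
      image: example/dev-manager:latest
      volumeMounts:
        - name: generated-config
          mountPath: /output                # writes per-instance NGINX configs here
  volumes:
    - name: generated-config
      emptyDir: {}
```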

Deployment Monitoring

Kubernetes uses probes to ensure that containers and pods are functioning as expected. They are also used to determine whether deployments and rollouts have succeeded. Kubernetes by itself does not support automatic rollbacks at this time; instead, you must use the "kubectl rollout status" command to monitor a deployment's progress, or check the condition statuses shown by the "kubectl describe deployment" command. Specifically, the condition of type "Progressing" will have a status of "False" and a reason of "ProgressDeadlineExceeded" if the deployment exceeds its default time limit of 600 seconds (set by the "progressDeadlineSeconds" spec).
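For reference, the deadline is part of the Deployment spec itself; this excerpt uses placeholder names and simply restates the default value:

```yaml
# Illustrative excerpt showing where the rollout deadline is set.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend
spec:
  progressDeadlineSeconds: 600    # default; a stalled rollout marks Progressing=False
  replicas: 2
  selector:
    matchLabels:
      app: backend
  template:
    metadata:
      labels:
        app: backend
    spec:
      containers:
        - name: backend
          image: example/backend:latest
```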

Kubernetes will not act on this status in any way other than continually attempting the deployment. The Kubernetes documentation indicates that automatic rollback will be implemented in the future, but since it is currently not implemented, it must be handled externally to Kubernetes. In Poll Buddy's configuration, the development instance management server runs a shell script to verify that the deployment succeeded. First, the script waits 600 seconds for the deployment to exceed the timeout, plus an extra 30 seconds to account for any delays that could cause the script to resume before the rollout actually exceeds 600 seconds. Then, it uses the rollout status command to confirm the rollout succeeded, or rolls it back if it has not.

Other Notes

Each component of the system has a number of labels associated with it to describe what the component is, how it fits into the larger system, and what type of instance it is (development or production, plus some extra information if it is a development instance). Some of these labels are simply for clarity to human operators, while others are used by the Kubernetes system and other associated systems to manage the instances.
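As a hypothetical example (the actual key names used in the Poll Buddy manifests may differ), a component's label set could look like:

```yaml
# Illustrative labels on a component; keys and values are placeholders.
metadata:
  labels:
    app.kubernetes.io/name: backend
    app.kubernetes.io/part-of: pollbuddy
    app.kubernetes.io/component: api
    environment: development        # or production
    instance: example-feature       # extra identifier for a development instance
```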

Further Reading and Information

For more information about the inner workings of Kubernetes and how this setup was created, please refer to this white paper on the topic. If you have any questions, please feel free to open an issue to let us know where we need to clarify the documentation.