
Project 3


Updates

Below are the updates to the application from Milestone 2:

  1. Added a metadata extractor service to extract metadata from an image and store it in a database.
  2. Integrated Kafka between the Google Drive service and the metadata service to establish an asynchronous, producer-consumer communication medium between the two services (a minimal producer-consumer sketch follows this list).
  3. Implemented a custom sign-in and sign-up feature for authentication due to the lack of a domain name; Google OAuth only accepts top-level domain names.
  4. Revamped the session service to also store the user's history and keep track of their actions on the website. These actions can be seen on the UI.
  5. Added the ability for the user to see their shared albums on the UI.
  6. Switched to MongoAtlas and Redis Cloud for persistent storage.
  7. Changed and improved the Git workflow. Each service now has its own dev and main branch. The CI/CD pipelines are linked to these branches to deploy to the production or staging environments.
  8. Updated the overall architecture of the system to incorporate Kubernetes, Jenkins, and Jetstream.
  9. Incorporated production and staging versions of the application.
  10. Configured an Ingress Controller to route external traffic and add a domain name to our server.
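
As an illustration of the producer-consumer pattern between the two services, the same flow can be exercised with Kafka's console tools; the topic name and broker address below are assumptions, not the project's actual configuration.

# Google Drive service side: publish an "image uploaded" event to an assumed topic.
$ kafka-console-producer.sh --bootstrap-server localhost:9092 --topic image-uploads
> {"imageId": "123", "driveFileId": "abc"}

# Metadata service side: consume the event, then extract and store the metadata.
$ kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic image-uploads --from-beginning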

Updated Workflow

We updated our GitHub workflow to have separate main and dev branches for each service. This allowed us to create production and staging versions of our application, hosted on separate VMs and separate Kubernetes clusters. A push to any of these branches triggers a build on our Jenkins, where the Docker image is built after running the unit tests and then pushed to Docker Hub. Jenkins also applies the Kubernetes configuration on the master node of the respective Kubernetes cluster. Kubernetes pulls the updated image from Docker Hub to create its pods, hence deploying the application.
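
The flow roughly corresponds to the commands below; the image and file names are illustrative assumptions, not the exact values used in our Jenkinsfiles.

# Run the unit tests, then build and push the service image (names assumed).
$ npm test
$ docker build -t <dockerhub-user>/gateway:latest .
$ docker push <dockerhub-user>/gateway:latest

# Apply the Kubernetes config on the master node so the new image is rolled out.
$ kubectl apply -f gateway-deployment.yaml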

[Figure: Updated Workflow]

Updated Architecture

The main changes to the architecture include the addition of Kubernetes and deployment on a cloud (Jetstream). To route external traffic to our internal services, we have made use of an Ingress controller, which also gives us the ability to attach a domain name to our cloud deployment. We have also used MongoAtlas and Redis Cloud for persistent storage.
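
As a minimal sketch of the idea, assuming an NGINX-style Ingress controller and a hypothetical host name (the actual rules live in our Kubernetes configs):

# Hypothetical Ingress rule routing external traffic to the UI service (names assumed).
$ cat <<EOF | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: scrapbook-ingress
spec:
  rules:
  - host: scrapbook.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: ui
            port:
              number: 80
EOF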

[Figure: Updated Architecture]

Containerizing Micro-services

All the micro-services have been containerized using Docker, and the images are then uploaded to Docker Hub. We maintain both production and staging versions of the images. Below are the links to the services' production Docker Hub images:

Below are the links to the services' staging Docker Hub images:

We first used docker-compose locally to connect all the services and make sure they worked with their Dockerfile configurations.
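
For local verification, the usual docker-compose cycle was enough; a sketch of the commands:

# Build all service images from their Dockerfiles and start them together.
$ docker-compose build
$ docker-compose up -d

# Check that every container is up, then tear the stack down.
$ docker-compose ps
$ docker-compose down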

Establishing CI/CD

We have used Jenkins for our CI/CD; it is deployed on a VM on Jetstream. We have also set up a GitHub webhook to automatically trigger builds when a push is made to specific branches. Each micro-service has its own Jenkinsfile. Below are the steps in our Jenkins pipeline (a shell sketch of the equivalent commands follows the list):

  1. Clone repository - get the repository from GitHub
  2. Prepare repository - because of our folder structure (each service's files live in a sub-directory), process the folders in order to access the files
  3. Build repository - install the dependencies using npm, mvn or pip
  4. Test repository - run the unit tests for the service
  5. Build image - build the Docker image using the Dockerfile
  6. Push image - push the built image to Docker Hub
  7. Delete local image - clean up the local image so that dangling images do not consume space
  8. Deploy on Kubernetes - SSH into the Kubernetes master node and apply the deployment/service config files to create the pod/service
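
Roughly, these stages map to commands like the following; the paths, image names and SSH details are simplified assumptions, and the real steps live in each service's Jenkinsfile.

# 1-2. Clone the repository and move into the service's sub-directory (path assumed).
$ git clone https://github.com/airavata-courses/scrapbook.git && cd scrapbook/<service>

# 3-4. Install dependencies and run the unit tests (npm shown; other services use mvn or pip).
$ npm install && npm test

# 5-7. Build the image, push it to Docker Hub, and delete the local copy.
$ docker build -t <dockerhub-user>/<service>:latest .
$ docker push <dockerhub-user>/<service>:latest
$ docker rmi <dockerhub-user>/<service>:latest

# 8. Apply the Kubernetes config on the master node over SSH.
$ ssh <user>@<k8s-master-ip> "kubectl apply -f <service>-deployment.yaml"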

Jenkins is configured for two environments, production and staging, each of which deploys the application to its respective VM. Snippets of each are shown below:

Production

[Figure: Production Jenkins pipeline]

Staging

[Figure: Staging Jenkins pipeline]

Here is how our Jenkins instance can be accessed:

Continuous Deployment

For CD we tried various Jenkins plugins for Kubernetes, but none of them were compatible with our version of Jenkins. Instead, we used a plugin called SSH Agent to first establish a connection between Jenkins and the Kubernetes cluster, then copy the config YAML over via scp, and finally apply the config so that the pods/services are deployed on the cloud. Therefore, in order to set up CD for a new VM, one would need to manually configure the two VMs. See this link to add your Jenkins VM to connect to the Kubernetes cluster.
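
In shell terms, the deploy step amounts to something like the following; the host, user and file names are placeholders.

# Copy the Kubernetes config to the master node, then apply it there over SSH.
$ scp <service>-deployment.yaml <user>@<k8s-master-ip>:~/
$ ssh <user>@<k8s-master-ip> "kubectl apply -f ~/<service>-deployment.yaml"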

Unit Tests

Each service has a few unit tests that test whether the service is up and running. We have tried to keep the unit tests simple, yet effective enough to reflect an actual deployment scenario. These unit tests can be found in each of the services. The unit tests need to pass in order for the service to be deployed.

Kubernetes

To create the Kubernetes deployment and service scripts, we used Minikube to test them out locally. Minikube is a tool that lets us run a single-node Kubernetes cluster locally. This way, we had the freedom to play around with the configurations and also debug them.
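
A typical local iteration looked roughly like this; the config and service names are assumptions.

# Start a local single-node cluster and try a config against it.
$ minikube start
$ kubectl apply -f gateway-deployment.yaml
$ kubectl get pods

# Open the service in the browser to confirm it responds.
$ minikube service gateway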

Each service has its own deployment/service config files that are applied on the Kubernetes master node. The gateway and the UI have external IP (LoadBalancer type service) ports as they need to be accessible from outside. Our rationale for making the gateway externally available is so that applications hosted on other cloud services can still use the Scrapbook APIs. All other communication is internal, and thus happens via ClusterIP (internal) ports and is not available to applications outside the Kubernetes cluster. An example of a Kubernetes config YAML is given here.
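
A minimal sketch of the two service types, with assumed names and ports (the real config is in the linked YAML):

# Externally reachable gateway (LoadBalancer) vs. internal-only service (ClusterIP); values assumed.
$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: gateway
spec:
  type: LoadBalancer
  selector:
    app: gateway
  ports:
  - port: 80
    targetPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: metadata-service
spec:
  type: ClusterIP
  selector:
    app: metadata-service
  ports:
  - port: 8081
EOF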

Deployment

The steps to create VMs and configure Kubernetes are given here.

VM Creation

Using this blog, we have constructed Terraform scripts to automate the creation of VMs on Jetstream. Needless to say, we had to install and authenticate the OpenStack API for this to work on our local system. The tf file that we used can be found here; in it we had to specify the cluster name, floating IP, number of nodes, flavor of the VMs and the name of the network. The script automatically creates security groups too. Different security groups are created because the Kubernetes master node requires different ports from the worker nodes. By default, TCP port 22 is open to allow SSH access. To use this script, one has to change the variables mentioned above.
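
Usage follows the standard Terraform cycle; the variable names below are illustrative, not necessarily the ones defined in our tf file.

# Authenticate against OpenStack first (e.g. by sourcing the downloaded RC file), then:
$ terraform init
$ terraform plan -var="cluster_name=scrapbook" -var="node_count=2"
$ terraform apply -var="cluster_name=scrapbook" -var="node_count=2"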

Kubernetes Cluster Creation

Using this blog, we have created Ansible scripts to set up the Kubernetes cluster. The scripts use the nodes created in the previous section and install Kubernetes on them, then assign them roles using the various security groups. The master node has port 6443 open for the Control Plane to be set up, while the worker nodes have ports 30000-32767 open; these are the ports described in the Kubernetes documentation. Given the floating IP of the master node, the Ansible scripts configure the master and the worker nodes.
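
Running the playbooks looks roughly like this; the inventory and playbook names are assumptions based on the typical layout of such scripts.

# Install Kubernetes on all nodes and initialise the control plane on the master (names assumed).
$ ansible-playbook -i hosts kube-dependencies.yml
$ ansible-playbook -i hosts master.yml

# Join the worker nodes to the cluster.
$ ansible-playbook -i hosts workers.yml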

Application Deployment

With the VMs created, Kubernetes set up, and the master and worker nodes configured, the last step is to deploy the application on the cloud. For this we have created this Bash script to automatically apply the Kubernetes configurations on the server. This way, the full process of VM creation, Kubernetes configuration and application deployment is automated. Below are the steps to deploy the application after the VMs have been created and the Kubernetes cluster configured.

# SSH into the master node of the Kubernetes cluster
$ git clone https://github.com/airavata-courses/scrapbook.git
$ cd scrapbook
$ sudo su
$ chmod +x deploy.sh
$ ./deploy.sh
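
Internally, deploy.sh simply applies each service's Kubernetes config in turn; a simplified sketch of that idea (the directory layout is assumed, not the script's exact contents):

#!/bin/bash
# Apply every service's Kubernetes deployment/service config (layout assumed).
for config in */kubernetes/*.yaml; do
  kubectl apply -f "$config"
done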

Trigger a Production build

Make a push to the main branch and Jenkins will automatically deploy the whole application.

Trigger a Staging build

Make a push to the develop branch and Jenkins will automatically deploy the whole application.

Reflections

  • Before dockerizing, all the URLs in the services (such as the gateway URL) were hard-coded. We realized that this wouldn't work and that it would be better to set them dynamically at build/deploy time. Therefore we changed all these values to environment variables so that they can be passed in via the Dockerfile, docker-compose or the Kubernetes deployment script (a small sketch follows this list).

  • In order to set up GitHub webhooks that would trigger builds on commits, we had to create a separate VM on Jetstream and deploy an instance of Jenkins on it. An added benefit of doing this is that we don't have to give any third-party CI/CD cloud service (e.g. CircleCI, Travis CI) our OpenStack credentials and private/public keys. Since the VM is on Jetstream, these credentials are stored on the VM itself.

  • Connecting Jenkins to the Kubernetes master node VM was very difficult. Most of the Kubernetes plugins for Jenkins are outdated and not compatible with our version. We had to figure out a workaround for continuous deployment (mentioned above).

  • While writing the Kubernetes deployment scripts, learning how container ports work and how they're mapped with their services really helped in configuring our whole system.
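
As an example of the environment-variable approach mentioned in the first reflection (the variable and image names are assumptions):

# Pass the gateway URL at run time instead of hard-coding it.
$ docker run -e GATEWAY_URL="http://gateway:8080" <dockerhub-user>/ui:latest

# The same variable can be set under environment: in docker-compose, or under env:
# in the container spec of the Kubernetes deployment.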
