Ingest Service Deployment

Deployment setup for the Ingestion Service on Kubernetes clusters.

Set up local environment

We have migrated to Helm 3; make sure to install the correct package version (>= 3) on your system.

Mac

  1. git clone <this-repo-url>
  2. Ensure your pip is running on Python 3.
  3. Install the needed software:
brew install warrensbox/tap/tfswitch
tfswitch 0.13.5
pip install awscli
brew install aws-iam-authenticator
brew install kubernetes-cli
brew install kubectx
brew install kubernetes-helm
mkdir ~/.kube
brew install jq

Ubuntu

  1. git clone <this-repo-url>
  2. Install Terraform following the Terraform instructions.
  • If you install it with sudo snap install terraform, you may run into the error Error configuring the backend "s3": NoCredentialProviders: no valid providers in chain. Deprecated.
  3. Install the AWS CLI: pip install awscli
  4. Install aws-iam-authenticator
  5. Install kubectl 1.22, e.g. sudo snap install kubectl --classic --channel=1.22/stable
  6. Install kubectx and kubens.
  7. Install helm: sudo snap install helm --classic
  8. mkdir ~/.kube
  9. Install jq, if required: sudo apt-get install jq

Configuring AWS connection

  1. aws configure --profile embl-ebi
  2. Edit your ~/.aws/config to look like this:
[profile embl-ebi]
role_arn = arn:aws:iam::871979166454:role/ingest-devops
source_profile = ebi
region = us-east-1
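
To confirm the profile assumes the ingest-devops role correctly, one quick check (assuming the AWS CLI is configured as above) is:

aws sts get-caller-identity --profile embl-ebi

The returned Arn should reference the assumed ingest-devops role.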

Access/Create/Modify/Destroy EKS Clusters

Access existing ingest eks cluster (aws)

These steps assume you have the correct AWS credentials and local environment tools set up. They only have to be run once.

  1. source config/environment_ENVNAME where ENVNAME is the name of the environment you are trying to access
  2. cd infra
  3. make retrieve-kubeconfig-ENVNAME where ENVNAME is the name of the environment you are trying to access
  4. kubectl, kubens, kubectx, and helm will now be tied to the cluster you sourced in the step above (a quick verification is sketched below).
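
As a sanity check that the kubeconfig was retrieved and the right context is active (the context name follows the ingest-eks-ENVNAME convention used elsewhere in this README):

kubectl config current-context
kubectl get nodes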

How to access dashboard

These steps assume you have the correct AWS credentials and local environment tools set up, and that you have followed the instructions to access the existing cluster.

  1. kubectx ingest-eks-ENVNAME where ENVNAME is the name of the cluster environment you are trying to access
  2. Generate a token (it will be copied to the clipboard; the command below uses the macOS pbcopy tool, a Linux variant is sketched after this list): kubectl -n kube-system describe secrets/$(kubectl -n kube-system get secret | grep eks-admin | awk '{print $1}') | grep token: | cut -d: -f2 | xargs | pbcopy
  3. Start the proxy: kubectl proxy
  4. Browse to the dashboard endpoint
  5. Choose Token and paste the token from step 2 above
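
On Linux there is no pbcopy; an equivalent (assuming xclip is installed) is to end the pipeline from step 2 with xclip instead:

kubectl -n kube-system describe secrets/$(kubectl -n kube-system get secret | grep eks-admin | awk '{print $1}') | grep token: | cut -d: -f2 | xargs | xclip -selection clipboard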

Create new ingest eks cluster (aws)

These steps assume you have the correct AWS credentials and local environment tools set up. They will set up a new ingest environment from scratch via Terraform and will also apply all Kubernetes monitoring and dashboard configs, the RBAC role, and the aws-auth setup.

  1. cp config/environment_template config/environment_ENVNAME where ENVNAME reflects the name of the environment you are trying to create.
  2. Replace all values marked as 'PROVIDE...' with the appropriate value.
  3. Ensure the AWS profile name in this config is mapped to the name of the AWS profile in your ~/.aws/config or ~/.aws/credentials file that has admin access to the relevant AWS account.
  4. Ensure the VPC IP in this config file is a valid and unique VPC IP value.
  5. source config/environment_ENVNAME where ENVNAME reflects the name of the environment in the config file you created above
  6. cd infra
  7. make create-cluster-ENVNAME where ENVNAME is the name of the environment you are trying to create. This step will also deploy the backend services (mongo, redis, rabbit). A worked sequence for a hypothetical environment is sketched after this list.
  8. Follow the steps above to access the Kubernetes dashboard. Note that there is no server-side component (Tiller) in Helm 3.
  9. Follow the instructions below to deploy applications.
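
A minimal sketch of the sequence for a hypothetical environment named sandbox (the name is illustrative only, not an existing environment):

cp config/environment_template config/environment_sandbox
# edit config/environment_sandbox and replace the PROVIDE... placeholders
source config/environment_sandbox
cd infra
make create-cluster-sandbox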

Modify and deploy updated EKS and AWS infrastructure

These steps assume you have the correct AWS credentials and local environment tools set up, and that you have followed the instructions to access the existing cluster.

  1. source config/environment_ENVNAME where ENVNAME reflects the name of the environment you are trying to modify.
  2. Update infra/eks.tf as desired.
  3. cd infra
  4. make modify-cluster-ENVNAME where ENVNAME reflects the name of the environment you are trying to modify.

Destroy ingest eks cluster (aws)

These steps assume you have the correct AWS credentials and local environment tools set up, and that you have followed the instructions to access the existing cluster. They will bring down the entire infrastructure and all the resources for your ingest Kubernetes cluster and environment, all the way up to the VPC that was created for this environment's cluster.

  1. Follow steps 2-5 in 'Create new ingest eks cluster (aws)' if config/environment_ENVNAME does not exist, where ENVNAME is the environment you are trying to destroy
  2. source config/environment_ENVNAME where ENVNAME reflects the name of the environment in the config file you created above
  3. cd infra
  4. make destroy-cluster-ENVNAME where ENVNAME is the name of the environment you are trying to destroy. Note: the system could encounter an error (most likely a timeout) during the destroy operation. Failing to remove some resources, such as the VPC or network interfaces, can be tolerated if the ultimate goal is to rebuild the cluster. In the case of the VPC, for example, the EKS cluster will simply reuse it once recreated.

Reusing Undeleted Network Components

The Terraform manifest declares some network components, like the VPC and subnets, that for some reason refuse to be deleted during the destroy operation. This needs some work to improve, but in the meantime a workaround is suggested below.

Force the subnet declarations to use the existing ones by overriding the cidr_block attribute of the aws_subnet.ingest_eks resource. To see the undeleted subnet components, use terraform show (a way of narrowing the output is sketched below).

For example, in dev, the CIDR is set to 10.40.0.0/16. The Terraform manifest at the time of writing derives subnets from this at build time, which often results in two subnets, 10.40.0.0/24 and 10.40.1.0/24, that could refuse deletion. To make the Terraform build reuse these components, the cidr_block attribute under the aws_subnet resource can be set to the following:

cidr_block        = "10.40.${count.index}.0/24"
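
To inspect which subnet resources were left behind before overriding cidr_block, one approach (assuming the environment has been sourced and the Terraform state is initialised in infra/) is:

cd infra
terraform state list | grep aws_subnet
terraform show | grep -A 5 aws_subnet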

Install and Upgrade Core Ingest Backend Services (mongo, redis, rabbit)

Install backend services (mongo, redis, rabbit)

  1. cd infra
  2. make deploy-backend-services-ENVNAME where ENVNAME is the name of the environment you are trying to create.
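
To confirm the backend services came up, one quick check (the pod name patterns are an assumption based on the chart names; adjust if they differ):

kubectl get pods | grep -E 'mongo|redis|rabbit'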

Upgrade backend services (mongo, redis, rabbit)

Coming soon

Install Ingest Monitoring Dashboard (Grafana, Prometheus)

  1. source config/environment_ENVNAME
  2. cd infra
  3. make install-infra-helm-chart-ingest-monitoring
  4. kubectl get secret aws-keys -o jsonpath="{.data.grafana-admin-password}" | base64 --decode
  • Copy the result to your clipboard
  5. Navigate to https://monitoring.ingest.ENVNAME.archive.data.humancellatlas.org
  • Log in with admin and the result from step 4

Upgrading Ingest Monitoring Dashboard

If you would like to change the dashboard for Ingest Monitoring, you must save the JSON file in this repo and deploy it.

  1. Make the changes to the dashboard in any environment
  2. Copy the dashboard's JSON model to the clipboard
  • dashboard settings (cog at top) -> JSON model
  3. Replace the contents of infra/helm-charts/ingest-monitoring/dashboards/ingest-monitoring.json with the contents of your clipboard
  4. source config/environment_ENVNAME
  5. cd infra && make upgrade-infra-helm-chart-ingest-monitoring
  • The script will replace any references to e.g. prod-environment with the environment you are deploying to.

Vertical autoscaling

Vertical autoscaling can be deployed to give recommendations on CPU and memory constraints for containers. See infra/vertical-autoscaling/README.md.

Deploy CRON jobs

CRON jobs are located in cron-jobs/. Further details for deploying and updating CRON jobs are located in cron-jobs/README.md, and details on individual CRON jobs are found in each Helm chart's README.

They can be deployed all at once by running:

  1. source config/environment_ENVNAME
  2. cd cron-jobs
  3. ./deploy-all.sh
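
To confirm the jobs were created (assuming they are deployed into the currently selected namespace):

kubectl get cronjobs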

Deploy and Upgrade Ingest Applications

Deployments are automatically handled by GitLab, but you can still deploy manually if required (see below). However, ontology is not deployed by GitLab; there is a special command for deploying it (see below).

Manually deploy one dockerized Kubernetes application to an environment (aws)

  1. Make sure you have followed the instructions above to create or access an existing eks cluster
  2. source config/environment_ENVNAME
  3. cd apps
  4. make deploy-app-APPNAME image=IMAGE_NAME where APPNAME is the name of the ingest application and IMAGE_NAME is the image you want to deploy. For example: make deploy-app-ingest-core image=quay.io/ebi-ait/ingest-core:1c1f6ab9

Deploy ontology

See docs

Notes on Fresh Installation

Before running the script to redeploy all ingest components, make sure that secrets have been properly defined in AWS Secrets Manager. At the time of writing, the following secrets must be defined in dcp/ingest/<deployment_env>/secrets to ensure that deployments go smoothly:

  • emails
  • staging_api_key (retrieve this from dcp/upload/staging/secrets)
  • exporter_auth_info
  • ingest-monitoring

  1. Make sure you have followed the instructions above to create or access an existing eks cluster
  2. Change the branch or tag in config/environment_ENVNAME if needed where ENVNAME is the environment you are deploying to.
  3. cd apps
  4. make deploy-all-apps (a quick way to check the result is sketched after this list)
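
To check what was deployed, assuming the applications are installed as Helm releases into the currently selected namespace:

helm list
kubectl get pods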

After Deployment

All requests to the Ingest cluster go through the Ingress controller. Any outward-facing service needs to be mapped to the Ingress service for it to work correctly. This is set through the AWS Route 53 mapping.

  1. The first step is to retrieve the external IP of the Ingress service load balancer. This can be done using kubectl get services -l app=nginx-ingress (see the sketch after this list).
  2. Copy the external IP and go to the Route 53 service dashboard on the AWS Web console.
  3. From the Hosted Zones, search for <deployment_env>.data.humancellatlas.org and look for Ingest-related records.
  4. Update each record to use the Ingress load balancer external IP as an alias.
  5. To ensure that each record is set correctly, run a quick test using the Test Record Set facility on the AWS Web console.
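
A minimal sketch for extracting just the load balancer address (the app=nginx-ingress label comes from step 1; on AWS the address is often exposed as a hostname rather than an IP, so both fields are printed):

kubectl get services -l app=nginx-ingress -o jsonpath='{.items[0].status.loadBalancer.ingress[0].hostname}{"\n"}{.items[0].status.loadBalancer.ingress[0].ip}{"\n"}'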

Deploy cloudwatch log exporter

  1. Make sure you have followed the instructions above to create or access an existing eks cluster
  2. Source the configuration file for the environment
  3. Make sure the secrets api-keys and aws-keys are deployed, substituted with valid base64-encoded values
  4. cd infra and make install-infra-helm-chart-fluentd-cloudwatch

CI/CD Setup

Promote one application environment's configuration to another (i.e. dev => integration)

Coming soon

Local Setup

Local deployment with Minikube

Coming soon

Accessing RabbitMQ Management UI

tldr: Use this command: kubectl port-forward rabbit-0 15672:15672

The general form is kubectl port-forward <pod> <localhost-port>:15672; see https://kubernetes.io/docs/tasks/access-application-cluster/port-forward-access-application-cluster/ for details.

  1. Get the rabbit service local port: kubectl get service rabbit-service
  2. Get the rabbit service pod: kubectl get pods | grep rabbit
  3. Access the RabbitMQ Management UI: kubectl port-forward rabbit-0 15672:15672, then browse to http://localhost:15672

Accessing Mongo DB container

SSH into the container

kubectl exec -it mongo-0 -- sh

Using a MongoDB Client

  1. Set up port forwarding:
kubectl port-forward mongo-0 27017:27017
  2. Connect to mongodb://localhost:27017/admin (an example client connection is sketched below)
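
For example, with the mongo shell installed locally (the shell is an assumption; any MongoDB client pointed at the forwarded port works):

mongo mongodb://localhost:27017/admin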

Restoring Mongo DB backup

  1. Download the latest compressed backup directory from the s3 bucket ingest-db-backup
  2. Copy the backup to the mongo pod.
$ kubectl cp 2020-05-21T00_00.tar.gz staging-environment/mongo-0:/2020-05-21T00_00.tar.gz
  3. SSH into the mongo pod and verify the file was copied there.
$ kubectl exec -it mongo-0 -- sh
  4. Extract the dump files.
$ tar -xzvf 2020-05-21T00_00.tar.gz

This will create a directory structure, data/db/dump/2020-05-21T00_00, which contains the output of mongodump.

  5. Go to the backup dir and restore.
$ cd data/db/dump/
$ mongorestore 2020-05-21T00_00 --drop

For more info on restoring data, please refer to https://github.com/ebi-ait/ingest-kube-deployment/tree/master/infra/helm-charts/mongo/charts/ingestbackup#restoring-data

  6. Remove the dump files
$ rm -rf 2020-05-21T00_00

Accessing our PostgreSQL DB instance on AWS

Installing a PostgreSQL client to your local computer

You can use the open-source tool pgAdmin to connect to your RDS for PostgreSQL DB instance. You can download and install pgAdmin from http://www.pgadmin.org/.

Get the connection details for our PostgreSQL DB instance from AWS Secrets Manager

  1. Sign in to the AWS Management Console and open the AWS Secrets Manager
  2. In the filter, type database and select the secret that corresponds to the environment you would like to connect to. For example, if you would like to connect to the PostgreSQL DB that is running in the dev cluster, select dcp/upload/dev/database
  3. Go to the Secret value box and click on the Retrieve secret value button
  4. We are interested in the value in the pgbouncer_uri field. You will see something similar to: postgresql://username:password@hostname/database_name
  5. Note down the above 4 values (username, password, hostname, database_name) from this field (a command-line alternative is sketched after this list)
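
A command-line sketch for pulling the same value, assuming the embl-ebi profile (or another profile with access to the secret) and jq are available, and using the dev secret name from the example above:

aws secretsmanager get-secret-value --profile embl-ebi --secret-id dcp/upload/dev/database --query SecretString --output text | jq -r '.pgbouncer_uri'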

Using pgAdmin to connect to a RDS for PostgreSQL DB instance

  1. Launch the pgAdmin application on your computer.

  2. On the Dashboard tab, choose Add New Server.

  3. In the Create - Server dialog box, type a name on the General tab to identify the server in pgAdmin.

  4. On the Connection tab, type the following information from your DB instance using the 4 values you noted down above:

    1. Add the hostname to the Hostname/address field
    2. The port is the default value for PostgreSQL DB: 5432
    3. Add the database_name to the Maintenance database field
    4. Add the username to the Username field
    5. Add the password to the Password field
    6. Choose Save.
  5. To access a database in the pgAdmin browser, expand Servers, the DB instance, and Databases. Choose the DB instance's database name.

  6. To open a panel where you can enter SQL commands, choose Tools, Query Tool.

There is also an article about how to connect to PostgreSQL DB instances running on AWS here: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_ConnectToPostgreSQLInstance.html
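
If you prefer a terminal client instead of pgAdmin, a minimal alternative (assuming psql is installed locally) is to connect with the same four values:

psql "postgresql://username:password@hostname:5432/database_name"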