Depot


Depot is an open-source collaborative data platform that facilitates cross-organizational creation, distribution, and consumption of dynamic digital assets.

Architecture

Depot Architecture

Installation

Depot's artifacts are packaged as Docker images. We use sbt to manage our builds and package JVM applications. Additionally, we use yarn to bundle and package static frontend assets.

Requirements

Docker Images

  1. Clone this repository
git clone https://github.com/MAYHEM-Lab/Depot
  2. Navigate to the repository
cd Depot
  3. Build images via sbt
sbt docker

The resulting images will be made available in the local Docker image repository with the prefix racelab/depot-.
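
A quick, optional way to confirm the build produced the expected images is to list them from the local Docker image repository:

docker images | grep 'racelab/depot-'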

Deployment

Depot integrates tightly with a Eucalyptus cloud environment to manage storage and access-control policies, and as such requires administrative access to a Eucalyptus deployment. It does not yet support running on other cloud providers or on bare metal out of the box.

Depot uses GitHub OAuth for identity management and requires credentials to an existing GitHub OAuth application.

Setup

These instructions outline running a non-highly-available, barebones Depot deployment in Eucalyptus, orchestrated by Kubernetes. They assume Ubuntu 20.04 hosts running Linux kernel 5.4 or later.

Requirements

Eucalyptus Setup

  1. Open ports required by Depot and Kubernetes
euca-authorize default -p 80
euca-authorize default -p 443
euca-authorize default -p 6443
euca-authorize default -p 30080
  2. Create load balancer
eulb-create-lb -l "lb-port=80, protocol=HTTP, instance-port=30080, instance-protocol=HTTP" -z <ZONE> depot-frontend
  3. Provision Ubuntu 20.04 VMs. This example provisions 6 instances.
euca-run-instances emi-0222560f -t m1.large -n 6 -k <key-pair> -z <ZONE>
  4. Wait until all instances are reported as running via euca-describe-instances
  5. Select one instance for MySQL, one instance for RabbitMQ, and the rest to form the Kubernetes cluster. These instances can be named with a user-friendly name for convenience (see the sketch after this list for the remaining instances):
euca-create-tags <instance1> --tag Name=depot-mysql
  6. Register Kubernetes instances with the frontend load balancer
eulb-register-instances-with-lb --instances <k8s-instance1>,<k8s-instance2>,... depot-frontend
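
The remaining instances can be tagged the same way; the names below (depot-rabbitmq, depot-k8s-N) are only illustrative and not required by Depot:

euca-create-tags <instance2> --tag Name=depot-rabbitmq
euca-create-tags <instance3> --tag Name=depot-k8s-1
euca-create-tags <instance4> --tag Name=depot-k8s-2
euca-create-tags <instance5> --tag Name=depot-k8s-3
euca-create-tags <instance6> --tag Name=depot-k8s-4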

MySQL

  1. Connect to the instance selected as the MySQL host
ssh ubuntu@<instance-public-ip>
  2. Clone this repository
git clone https://github.com/MAYHEM-Lab/Depot
  3. Run the MySQL installation script as root.
cd Depot
chmod 750 ./deploy/platform/mysql/install.sh
sudo ./deploy/platform/mysql/install.sh
  4. Keep track of the MySQL credentials output at the end.
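
An optional sanity check from the MySQL host, using the credentials recorded above (this assumes the mysql client was installed alongside the server):

mysql -h 127.0.0.1 -u <mysql-username> -p -e 'SELECT VERSION();'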

RabbitMQ

  1. Connect to the instance selected as the RabbitMQ host
ssh ubuntu@<instance-public-ip>
  2. Clone this repository
git clone https://github.com/MAYHEM-Lab/Depot
  3. Run the RabbitMQ installation script as root.
cd Depot
sudo ./deploy/platform/rabbitmq/install.sh
  4. Keep track of the RabbitMQ credentials output at the end.
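
An optional sanity check that the broker is running (this assumes the install script leaves the standard rabbitmqctl tooling in place):

sudo rabbitmqctl status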

Kubernetes

Common
  1. SSH into each instance selected as a Kubernetes host
ssh ubuntu@<instance-public-ip>
  2. Clone this repository
git clone https://github.com/MAYHEM-Lab/Depot
  3. Run the Kubernetes common installation script
cd Depot
sudo ./deploy/platform/k8s/common.sh
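
Once the common setup finishes on a host, the Kubernetes tooling can be checked (this assumes common.sh installs kubeadm and the kubelet, which is an inference from the master and worker steps below):

kubeadm version
systemctl status kubelet --no-pager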

From the instances selected as Kubernetes hosts, select one to serve as the cluster's master node.

Master
  1. SSH into the instance selected as the Kubernetes master
ssh ubuntu@<master-instance-public-ip>
  2. Clone this repository
git clone https://github.com/MAYHEM-Lab/Depot
  3. Run the Kubernetes master installation script
cd Depot
sudo ./deploy/platform/k8s/master.sh
  4. Save the join command printed at the end of the installation script; it will be run on each worker.
  5. Change the owner of the Kubernetes credential file to ubuntu:
sudo chown ubuntu:ubuntu /etc/kubernetes/admin.conf
  6. On your local machine (the SSH client), copy the Kubernetes credentials over to authorize kubectl:
mkdir -p ~/.kube
scp ubuntu@<master-instance-public-ip>:/etc/kubernetes/admin.conf ~/.kube/config
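
At this point kubectl on the local machine should be able to reach the cluster; only the master node will be listed until workers join in the next section:

kubectl get nodes
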
Worker
  1. SSH into each instance selected as a Kubernetes worker
ssh ubuntu@<worker-instance-public-ip>
  2. Run, as root, the join command generated during installation of the Kubernetes master
echo "...." >> /etc/hosts && kubeadm join ...

Services

In addition to the MySQL and RabbitMQ credentials from the steps above, you will need your GitHub OAuth App credentials and Eucalyptus access keys to configure and deploy the services.

  1. Generate the Depot access key. Store this securely somewhere; it is used to interact with the Depot REST API in an administrative manner.
access_key=$(tr -dc A-Za-z0-9 </dev/urandom | head -c 13 ; echo '')
  2. Generate the Depot JWT key. This will be used to sign user tokens. Do not save or distribute this.
jwt_secret=$(tr -dc A-Za-z0-9 </dev/urandom | head -c 13 ; echo '')
  3. Create Kubernetes secrets (the shell variables referenced below are summarized in the sketch after this list)
kubectl create secret generic mysql-credentials \
  --from-literal=host="$mysql_host" \
  --from-literal=username="$mysql_username" \
  --from-literal=password="$mysql_password"

kubectl create secret generic rabbitmq-credentials \
  --from-literal=host="$rabbitmq_host" \
  --from-literal=username="$rabbitmq_username" \
  --from-literal=password="$rabbitmq_password"

kubectl create secret generic github-credentials \
  --from-literal=client_id="$github_client_id" \
  --from-literal=client_secret="$github_client_secret"

kubectl create secret generic auth-keys \
  --from-literal=jwt_secret="$jwt_secret" \
  --from-literal=access_key="$access_key"

kubectl create secret generic cloud-credentials \
  --from-literal=access_key="$cloud_access_key" \
  --from-literal=secret_key="$cloud_secret_key"
  4. Create Kubernetes deployments and services
kubectl apply -f deploy/system/manager.yaml
kubectl apply -f deploy/system/notebook-router.yaml
kubectl apply -f deploy/system/frontend.yaml
  5. After the Eucalyptus load balancer registers and discovers the frontend service, the Depot platform should be available at the load balancer's DNS name. This process may take a few minutes.
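
The secret-creation commands in step 3 reference shell variables that these instructions do not set; the sketch below shows one way to populate them from the values recorded in the earlier MySQL, RabbitMQ, GitHub, and Eucalyptus steps (every value is a placeholder to be filled in):

mysql_host=<depot-mysql-instance-ip>
mysql_username=<mysql-username-from-install-script>
mysql_password=<mysql-password-from-install-script>
rabbitmq_host=<depot-rabbitmq-instance-ip>
rabbitmq_username=<rabbitmq-username-from-install-script>
rabbitmq_password=<rabbitmq-password-from-install-script>
github_client_id=<github-oauth-app-client-id>
github_client_secret=<github-oauth-app-client-secret>
cloud_access_key=<eucalyptus-access-key>
cloud_secret_key=<eucalyptus-secret-key>

While waiting for the load balancer, the rollout can be watched with kubectl:

kubectl get deployments,pods,services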

User Clusters

Clusters used by users and organizations to dispatch transformations and execute notebooks are deployed as self-contained Kubernetes namespaces.

Creation

User clusters are managed with Depot's clusterctl tool.

  1. Identify Depot endpoint and admin access key. If the steps above were followed, the endpoint is the DNS name of the Eucalyptus load balancer and the access key should have been securely stored.
  2. Identify entity name for which the cluster should be created (username or organization name)
  3. Create cluster
./deploy/cluster/clusterctl -k <key> -s <endpoint> create <entity> <cluster-name>
  4. The created cluster's Spark capacity can be managed by modifying the spark-worker deployment in the cluster's namespace, as sketched below.
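
For example, assuming the cluster's namespace is named after the cluster (an assumption; check kubectl get namespaces for the actual name), the number of Spark workers could be adjusted with:

kubectl -n <cluster-namespace> scale deployment spark-worker --replicas=4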