# Pre-requisites

Before getting started, make sure to install all the required tools. We provide two lists below: one for setting up the testbed, and one for developing code to use with the testbed. Feel free to skip the second list for now and return to it later.

Make sure to install a recent version of each of the dependencies.

 * GCloud SDK
    - Follow the installation instructions [here](https://cloud.google.com/sdk/docs/install)
    - Initialize the SDK with `gcloud init`
    - ⚠️ Run the command `gcloud auth application-default login`
        - ℹ️ This command makes your login credentials available programmatically to Terraform. We use them to impersonate a service account during the creation and setup of the Kubernetes cluster.
    - ⚠️ Run the command `gcloud components install beta`
        - ℹ️ We need to run this command to list the billing account IDs and enable billing. Currently, these features fall under beta access.
 * Kubectl
 * Helm
 * Terraform
 * Python3.9
   * Jupyter
        ```bash
        pip3 install jupyter
        ```
   * bash_kernel
        ```bash
        pip3 install bash_kernel
        python3 -m bash_kernel.install
        ```
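
Before continuing, it can help to confirm that the tools from the first list are actually on your `PATH`. The following is a minimal sketch, not part of the testbed itself; the helper name `missing_tools` is made up for this example, and the check only looks for the executables, not for their versions.

```bash
# Print the name of every tool in the argument list that is not on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
  return 0
}

# Tools named in the list above; an empty result means all were found.
missing="$(missing_tools gcloud kubectl helm terraform python3 jupyter)"
if [ -n "$missing" ]; then
  echo "Not found on PATH:" $missing
else
  echo "All required tools found."
fi
```

Any tool that is reported as missing should be installed before moving on to the deployment steps.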

For development, the following tools are needed/recommended:

 * Docker (>= 18.09).
    - If you don't have experience with using Docker, we recommend following [this](https://docs.docker.com/get-started/) tutorial.
 * Python3.9
 * pip3
 * JetBrains PyCharm


# Preparation

To make sure we can request resources on Google Cloud Platform (GCP), perform the following steps:

1. Create a GCP account on [https://cloud.google.com](https://cloud.google.com), using a Google account
2. Redeem your academic coupon on GCP (see Brightspace for information on obtaining the \$50 academic coupon), or use the free \$300 credits that Google provides to new users.

# Deployment

## Getting started

First we will set a few variables. If you change any of these, make sure to change the corresponding variables as well in:

* [`terraform-gke/variables.tf`](terraform-gke/variables.tf)
* [`terraform-dependencies/variables.tf`](terraform-dependencies/variables.tf)



In [None]:
ACCOUNT_ID="terraform-iam-service-account"
PROJECT_ID="qpecs-fltk-2022"
PRIVILEGED_ACCOUNT_ID="${ACCOUNT_ID}@${PROJECT_ID}.iam.gserviceaccount.com"
CLUSTER_NAME="fltk-testbed-cluster"
REGION="us-central1"
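
GCP project IDs are globally unique, so you will most likely need to pick your own value for `PROJECT_ID`. The sketch below checks that a candidate ID follows the documented format (6 to 30 characters, lowercase letters, digits and hyphens, starting with a letter and not ending with a hyphen) and shows how the service-account e-mail is derived from it. The helper names `valid_project_id` and `service_account_email` are illustrative only.

```bash
# Succeeds if $1 looks like a valid GCP project ID:
# 6-30 characters, lowercase letters, digits and hyphens,
# starting with a letter and not ending with a hyphen.
valid_project_id() {
  printf '%s\n' "$1" | grep -Eq '^[a-z][a-z0-9-]{4,28}[a-z0-9]$'
}

# Build the service-account e-mail the same way the cell above does.
service_account_email() {
  printf '%s@%s.iam.gserviceaccount.com\n' "$1" "$2"
}

# Fall back to the values from the cell above when the variables are unset.
ACCOUNT_ID="${ACCOUNT_ID:-terraform-iam-service-account}"
PROJECT_ID="${PROJECT_ID:-qpecs-fltk-2022}"

if valid_project_id "$PROJECT_ID"; then
  echo "Service account: $(service_account_email "$ACCOUNT_ID" "$PROJECT_ID")"
else
  echo "PROJECT_ID '$PROJECT_ID' is not a valid project ID" >&2
fi
```

A format check like this only catches typos; whether the ID is still available is decided when the project is created.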

## Project creation

Next, we create a project using the `PROJECT_ID` variable, and get all the billing account information.

In [None]:
gcloud projects create $PROJECT_ID --set-as-default
gcloud beta billing accounts list # Copy the Account ID of the account

Copy the billing account identifier, e.g. `015594-41687F-092941`, and assign it to the variable in the cell below.

In [None]:
BILLING_ACCOUNT="015594-41687F-092941"
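
Instead of copying the identifier by hand, it can also be read from the `gcloud` output. The sketch below assumes that the `name` field of a billing account has the form `billingAccounts/<ID>` and that IDs consist of three groups of six hexadecimal characters, as in the example above; the function names are hypothetical helpers for this notebook.

```bash
# Strip the "billingAccounts/" resource prefix, leaving the bare ID.
billing_id_from_name() {
  printf '%s\n' "${1#billingAccounts/}"
}

# Succeeds if $1 has the XXXXXX-XXXXXX-XXXXXX shape of the example ID
# (assumed here to be three groups of six hexadecimal characters).
looks_like_billing_id() {
  printf '%s\n' "$1" | grep -Eq '^[0-9A-F]{6}-[0-9A-F]{6}-[0-9A-F]{6}$'
}

# Print the ID of the first open billing account (requires gcloud).
first_open_billing_account() {
  billing_id_from_name "$(gcloud beta billing accounts list \
    --filter='open=true' --format='value(name)' --limit=1)"
}
```

With these helpers, `BILLING_ACCOUNT="$(first_open_billing_account)"` would replace the manual copy step. If you have more than one open billing account, picking the first one may not be what you want, so check the value before linking it.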

Set up billing and enable the required services. This allows us to create a GKE cluster (a Google-managed Kubernetes cluster) and to push and pull containers to and from our private container repository.

In [24]:
# Link the billing account to the project
gcloud beta billing projects link $PROJECT_ID --billing-account $BILLING_ACCOUNT
# Enable services now that billing is enabled
gcloud services enable compute container --project $PROJECT_ID

billingAccountName: billingAccounts/015594-41687F-092941
billingEnabled: true
name: projects/qpecs-fltk-2022/billingInfo
projectId: qpecs-fltk-2022
Operation "operations/acat.p2-507430712695-6f66cea7-2ce4-4190-b9eb-4a0e403e4744" finished successfully.
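
To double-check that both services are really enabled, the list of enabled services can be compared against the ones we need. This is a sketch under the assumption that the full service names are `compute.googleapis.com` and `container.googleapis.com`; `missing_services` and `enabled_services` are helper names invented for this example.

```bash
# Read enabled service names on stdin (one per line) and print every
# required service, given as arguments, that is missing from that list.
missing_services() {
  local enabled svc
  enabled="$(cat)"
  for svc in "$@"; do
    printf '%s\n' "$enabled" | grep -Fxq "$svc" || echo "$svc"
  done
  return 0
}

# List the services enabled in project $1 (requires gcloud).
enabled_services() {
  gcloud services list --enabled --project "$1" --format='value(config.name)'
}
```

Running `enabled_services "$PROJECT_ID" | missing_services compute.googleapis.com container.googleapis.com` should then print nothing when both services are enabled.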


## Creating a service-account

Create a service account that has the minimum set of permissions for creating and managing a cluster. This service account will be used to create the cluster and to deploy the dependencies that we use.

During the deployment we will make use of impersonation to let *your* account utilize the service account. For more information about this practice, see [this](https://cloud.google.com/blog/topics/developers-practitioners/using-google-cloud-service-account-impersonation-your-terraform-code) blog post by Google.

In [5]:
function enable_gcp_role () {
  gcloud projects add-iam-policy-binding \
    $PROJECT_ID \
    --member="serviceAccount:$PRIVILEGED_ACCOUNT_ID" \
    --role="roles/$1"
}

# Create service-account
gcloud iam service-accounts create $ACCOUNT_ID --display-name="Terraform service account" --project ${PROJECT_ID}

# Allow the service account to use the set of roles below.
enable_gcp_role "compute.viewer"
enable_gcp_role "compute.securityAdmin"
enable_gcp_role "container.clusterViewer"
enable_gcp_role "container.clusterAdmin"
enable_gcp_role "container.developer"
enable_gcp_role "iam.serviceAccountAdmin"
enable_gcp_role "iam.serviceAccountUser"
enable_gcp_role "compute.networkAdmin"

ERROR: (gcloud.iam.service-accounts.create) Resource in projects [qpecs-fltk-2022] is the subject of a conflict: Service account terraform-iam-service-account already exists within project projects/qpecs-fltk-2022.
- '@type': type.googleapis.com/google.rpc.ResourceInfo
  resourceName: projects/qpecs-fltk-2022/serviceAccounts/terraform-iam-service-account@qpecs-fltk-2022.iam.gserviceaccount.com
Updated IAM policy for project [qpecs-fltk-2022].
bindings:
- members:
  - serviceAccount:service-507430712695@compute-system.iam.gserviceaccount.com
  role: roles/compute.serviceAgent
- members:
  - serviceAccount:terraform-iam-service-account@qpecs-fltk-2022.iam.gserviceaccount.com
  role: roles/compute.viewer
- members:
  - serviceAccount:service-507430712695@container-engine-robot.iam.gserviceaccount.com
  role: roles/container.serviceAgent
- members:
  - serviceAccount:service-507430712695@containerregistry.iam.gserviceaccount.com
  role: roles/containerregistry.ServiceAgent
- members:
  - s
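
The `ERROR` at the top of the output above appears because the cell was run again after the service account had already been created; the role bindings that follow are still applied. If you expect to re-run the cell, a guarded version of the creation step avoids the error. The following is a sketch of that idea, with `ensure_service_account` as a made-up helper name.

```bash
# Create the service account only if it does not exist yet.
# $1 = account ID, $2 = project ID.
ensure_service_account() {
  local email="$1@$2.iam.gserviceaccount.com"
  if gcloud iam service-accounts describe "$email" --project "$2" >/dev/null 2>&1; then
    echo "Service account $email already exists, skipping creation."
  else
    gcloud iam service-accounts create "$1" \
      --display-name="Terraform service account" --project "$2"
  fi
}
```

Calling `ensure_service_account "$ACCOUNT_ID" "$PROJECT_ID"` would then take the place of the plain `create` command in the cell above.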

## Enable impersonation
With the service account created, we must enable impersonation, to allow the main account of the project to make use of the service account. For more information see also the [`add-iam-policy-binding`](https://cloud.google.com/sdk/gcloud/reference/iam/service-accounts/add-iam-policy-binding) reference.

Assign the e-mail address of your Google account to the `OWNER_MAIL` variable, and run the cell below.

In [6]:
OWNER_MAIL="jargsnork@gmail.com"
gcloud iam service-accounts add-iam-policy-binding $PRIVILEGED_ACCOUNT_ID \
 --member="user:$OWNER_MAIL" \
 --role=roles/iam.serviceAccountTokenCreator \
 --project $PROJECT_ID

Updated IAM policy for serviceAccount [terraform-iam-service-account@qpecs-fltk-2022.iam.gserviceaccount.com].
bindings:
- members:
  - user:jargsnork@gmail.com
  role: roles/iam.serviceAccountTokenCreator
etag: BwXm1QDSgg0=
version: 1
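
Before handing things over to Terraform, you may want to confirm that impersonation works from your own account. One way to do this, sketched below, is to request a short-lived access token for the service account through the global `--impersonate-service-account` flag of `gcloud`. The helper name `can_impersonate` is hypothetical, and IAM changes may take a few minutes to propagate, so a failure right after adding the binding is not necessarily a misconfiguration.

```bash
# Succeeds if the active gcloud user can mint an access token for the
# service account given as $1 (requires gcloud and the binding above).
can_impersonate() {
  gcloud auth print-access-token \
    --impersonate-service-account="$1" >/dev/null 2>&1
}

# Print a short human-readable verdict for service account $1.
impersonation_status() {
  if can_impersonate "$1"; then
    echo "Impersonation of $1 works."
  else
    echo "Cannot impersonate $1 (yet)."
  fi
}
```

For this notebook the call would be `impersonation_status "$PRIVILEGED_ACCOUNT_ID"`.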


## Creating a Google managed cluster (GKE)
To create the cluster, first change the active directory to the `terraform-gke` directory.

In [None]:
cd terraform-gke

Initialize the Terraform module in this directory.

In [None]:
terraform init -reconfigure

Next, we can check whether we can create a cluster. No warnings or errors should occur during this process. It may take a while to complete.

In [12]:
terraform plan

data.google_service_account_access_token.default: Reading...
data.google_service_account_access_token.default: Read complete after 0s [id=projects/-/serviceAccounts/terraform-iam-service-account@qpecs-fltk-2022.iam.gserviceaccount.com]
data.google_client_config.default: Reading...
data.google_client_config.default: Read complete after 0s [id=projects/qpecs-fltk-2022/regions//zones/]
module.gke.data.google_container_engine_versions.region: Reading...
module.gke.data.google_compute_zones.available: Reading...
module.gke.data.google_container_engine_versions.region: Read complete after 0s [id=2022-08-22 14:12:19.966472 +0000 UTC]
module.gke.data.google_compute_zones.available: Read complete after 0s [id=projects/qpecs-fltk-2022/regions/us-central1]
module.gke.data.google_container_engine_versions.zone: Reading...
module.gke.data.google_container_engine_ve

When the previous command completes successfully, we can start the deployment. Depending on any changes you may have made, this might take a while.

By default, this will create a private zonal cluster consisting of two node-pools.

⚠️ If you change the deployment to a regional cluster (even when all node pools span only a single region), an additional fee of 0.10 USD/hour will be billed in one-minute increments.

In [None]:
terraform apply -auto-approve
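
`terraform apply -auto-approve` applies whatever a fresh plan produces without asking for confirmation. A more cautious variant, sketched below, saves the plan to a file and inspects the exit code of `terraform plan -detailed-exitcode` (0 means no changes, 1 means an error, 2 means changes are pending). The file name `tfplan` and the function name `plan_summary` are arbitrary choices for this example.

```bash
# Run `terraform plan`, save the plan to ./tfplan, and translate the
# -detailed-exitcode result: 0 = no changes, 1 = error, 2 = changes pending.
plan_summary() {
  local rc=0
  # Send the plan itself to stderr so stdout carries only the summary line.
  terraform plan -detailed-exitcode -out=tfplan >&2 || rc=$?
  case "$rc" in
    0) echo "no changes" ;;
    2) echo "changes pending; apply with: terraform apply tfplan" ;;
    *) echo "plan failed (exit code $rc)"; return 1 ;;
  esac
}
```

Applying the saved file with `terraform apply tfplan` guarantees that exactly the reviewed plan is executed.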

Next, we add cluster credentials (so you can interact with the cluster through `kubectl` and `helm`).

In [None]:
# Add credentials for interacting with cluster via kubectl
gcloud container clusters get-credentials $CLUSTER_NAME --region $REGION --project $PROJECT_ID
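
After fetching the credentials, `kubectl` should point at the new cluster. `get-credentials` names the kubeconfig context `gke_<project>_<location>_<cluster>`, so a quick check could look like the sketch below. The helper names are illustrative, and the location is whatever region or zone was passed to `get-credentials`.

```bash
# Name of the kubeconfig context that `get-credentials` creates:
# gke_<project>_<location>_<cluster>.
expected_context() {
  printf 'gke_%s_%s_%s\n' "$1" "$2" "$3"
}

# Succeeds if kubectl currently points at the expected cluster.
# Arguments: project ID, location, cluster name.
on_expected_cluster() {
  [ "$(kubectl config current-context)" = "$(expected_context "$@")" ]
}
```

For example, `on_expected_cluster "$PROJECT_ID" "$REGION" "$CLUSTER_NAME" && kubectl get nodes` lists the nodes only when the active context matches.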

## Installing dependencies
Lastly, we need to install the dependencies on our cluster. First change the directories, and then run the `init`, `plan` and `apply` commands as we did for creating the GKE cluster.

In [None]:
cd ../terraform-kubeflow

Initialize the Terraform module in this directory.

In [None]:
terraform init -reconfigure

Check to see if we can plan the deployment. This will set up the following:

* Kubeflow training operator (used to deploy and manage PyTorchTrainJobs programmatically)
* NFS-provisioner (used to enable logging on a persistent `ReadWriteMany` PVC in the cluster)


In [17]:
terraform plan

module.kubeflow.data.template_file.config_yaml: Reading...
module.kubeflow.data.template_file.config_yaml: Read complete after 0s [id=e70434e7cbdc7b7b42ddd875f3c0aa739f8612543152222cc4e6bfae9394b994]
module.kubeflow.data.kustomization_build.kserve-web-app: Reading...
module.kubeflow.data.kustomization_overlay.knative-eventing: Reading...
module.kubeflow.data.kustomization_overlay.istio-ingress: Reading...
module.kubeflow.data.kustomization_build.istio-resources: Reading...
module.kubeflow.data.kustomization_build.letsencrypt-cluster-resources: Reading...
module.kubeflow.data.kustomization_build.volumes: Reading...
module.kubeflow.data.kustomization_build.profiles: Reading...
module.kubeflow.data.kustomization_build.katib: Reading...
module.kubeflow.data.kustomization_overlay.user-namespace: Reading...
module.

In [None]:
terraform apply -auto-approve
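
Once the apply has finished, the training operator should have registered its custom resources with the cluster. The sketch below polls for the PyTorchJob custom resource definition, assuming it is registered under the name `pytorchjobs.kubeflow.org`; the function name and the default retry values are arbitrary.

```bash
# Poll until the PyTorchJob CRD is registered or the attempts run out.
# $1 = number of attempts (default 30), $2 = seconds between attempts (default 10).
wait_for_pytorchjob_crd() {
  local attempts="${1:-30}" delay="${2:-10}" i=0
  while [ "$i" -lt "$attempts" ]; do
    if kubectl get crd pytorchjobs.kubeflow.org >/dev/null 2>&1; then
      echo "PyTorchJob CRD is available."
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "PyTorchJob CRD not found after $attempts attempts." >&2
  return 1
}
```

Running `wait_for_pytorchjob_crd` with no arguments waits for up to roughly five minutes before giving up.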

## Testing the deployment

To make sure that the deployment went OK, we can run the following command to test whether we can use the PyTorch training operator.

This will create a simple deployment using a Kubeflow PyTorch example job.

In [None]:
kubectl create -f https://raw.githubusercontent.com/kubeflow/training-operator/master/examples/pytorch/simple.yaml
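
Creating the job only submits it; to see whether it actually completes, you can wait on its status. The sketch below assumes that the example manifest creates a PyTorchJob named `pytorch-simple` in the `kubeflow` namespace and that the job reports a `Succeeded` condition when done. Check the manifest and adjust `JOB_NAME` and `JOB_NAMESPACE` if that is not the case.

```bash
# Name and namespace of the example job.
# Assumption: the manifest uses "pytorch-simple" in namespace "kubeflow".
JOB_NAME="${JOB_NAME:-pytorch-simple}"
JOB_NAMESPACE="${JOB_NAMESPACE:-kubeflow}"

# Wait for the example job to succeed, then print its status.
# $1 = timeout passed to kubectl wait (default 600s).
wait_for_example_job() {
  kubectl wait --for=condition=Succeeded "pytorchjob/$JOB_NAME" \
    --namespace "$JOB_NAMESPACE" --timeout="${1:-600s}" &&
    kubectl get pytorchjob "$JOB_NAME" --namespace "$JOB_NAMESPACE"
}

# Remove the example job again once the test is done.
cleanup_example_job() {
  kubectl delete pytorchjob "$JOB_NAME" --namespace "$JOB_NAMESPACE"
}
```

Deleting the job afterwards with `cleanup_example_job` frees the resources it holds on the cluster.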