# README (Ignore if you are running on Mac/Linux)

If you are running on Windows, make sure you have started the Jupyter Notebook in a Bash shell.
Moreover, all the requirements below must be installed in this Bash (compatible) shell.

This can be achieved as follows:

1. Enable and install WSL(2) for Windows 10/11 (see the [official documentation](https://docs.microsoft.com/en-us/windows/wsl/install)).
    * On newer builds of Windows 10/11, you can install WSL by running the following command in an *administrator* PowerShell terminal. By default, this installs an Ubuntu instance of WSL.
    ```bash
    wsl --install
    ```
2. Start the Ubuntu Bash shell by searching for `Bash` under Start, or by running `bash` in a (normal) PowerShell terminal.

Using the Bash terminal started in step 2 above, you can install the requirements described below as if you were running Linux (Ubuntu/Debian).

## Requirements
These requirements may also be installed on Windows; however, development has only been tested on Linux/macOS.

Before we get started, make sure to install all the required tools. We provide two lists below: one for setting up the testbed, and one for developing code to use with the testbed. Feel free to skip the second list and return to it later.


### Deployment

 > ⚠️ All dependencies must be installed in a Bash-compatible shell. Windows users should also see the instructions [above](#read-me).

Make sure to install a recent version of each of the dependencies.


 * (Windows only) Install every dependency in a Windows Subsystem for Linux (WSL) Bash shell (see also the README above).
 * MiniKube
 * Kubectl (>= 1.22.0)
 * Helm (>= 3.9.4)
 * Terraform (>= 1.2.8)
 * Python 3.9/3.10
   * jupyter, ipython, bash_kernel
```bash
pip3 install -r requirements-jupyter.txt
python3 -m bash_kernel.install
```
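
Before continuing, it can help to confirm that the tools listed above are available in the Bash shell you are using. The following is a minimal sketch, not part of the project: the helper names `version_ge` and `missing_tools` are hypothetical, and `version_ge` assumes GNU `sort` with version-sort support (`-V`).

```bash
# Hypothetical helper: succeeds if version $1 is at least version $2.
# Assumes GNU `sort -V` (version sort) is available.
version_ge() {
    [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: prints each given command that is not on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Report the deployment tools that still need to be installed.
# No output means every tool was found.
missing_tools minikube kubectl helm terraform python3 pip3
```

A version reported by a tool can then be compared against the minimum from the list, for example `version_ge "1.25.3" "1.22.0" && echo "recent enough"`. How to extract the version string differs per tool and is left out here.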

### Development
For development, the following tools are needed/recommended:

 * Docker (>= 18.09).
    - If you don't have experience with using Docker, we recommend following [this](https://docs.docker.com/get-started/) tutorial.
    - ⚠️ Make sure that you have BuildX installed; follow the instructions on the official GitHub [page](https://github.com/docker/buildx#installing).
 * Python 3.9
 * pip3
 * (Recommended) JetBrains PyCharm

# Preparation

To make sure we can request resources on Google Cloud Platform (GCP), perform the following:

1. Make sure to use the `Bash` kernel, not a Python or other kernel. On Windows machines, launch the `jupyter notebook` server from a Bash-compatible command line; we recommend Windows Subsystem for Linux.

⚠️ Make sure to run this Notebook within a cloned repository, not standalone/downloaded from GitHub. This allows you to see changes you made more easily!


# Deployment

⚠️ This notebook assumes that commands are executed in order. Executing the provided commands multiple times should not result in issues. However, re-running cells with `cd` commands, or altering cells (other than variables as instructed) may result in unexpected behaviour.

## Getting started

First, we will set a few variables used **throughout** the project. We set them in this notebook for convenience, but they are also set to example default values in the project's configuration files. If you change any of these, make sure to change the corresponding variables as well in:

* [`../terraform/terraform-gke/variables.tf`](../terraform/terraform-gke/variables.tf)
* [`../terraform/terraform-dependencies/variables.tf`](../terraform/terraform-dependencies/variables.tf)


> ⚠️ If you change the `PROJECT_ID` parameter to a unique project name, also change the `project_id` variable in the files listed above. This allows you to run `terraform apply` without having to override the default value for the project.

> ℹ️ Any variable changed here can also be provided to `terraform` using the `-var` flag, e.g. `-var terraform_variable=$BASH_VARIABLE`. An example for setting the `project_id` variable is also provided later.

In [None]:
# VARIABLES THAT NEED TO BE SET

TERRAFORM_DEPENDENCIES_DIR="../terraform/terraform-dependencies-local"

# In case you want to match the deployment between your real cluster and MiniKube,
# change the PROJECT_ID variable below.

DOMAIN="gcr.io"
PROJECT_ID="test-bed-fltk"
IMAGE='fltk'

IMAGE_NAME="${DOMAIN}/${PROJECT_ID}/${IMAGE}:latest"
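
As a sketch of the `-var` mechanism mentioned above, the snippet below only prints the Terraform command that would override `project_id`, so that it can be inspected before it is run. The helper name `plan_command` is hypothetical, the values are repeated here only to keep the example self-contained, and it is an assumption that the Terraform module in this directory declares a `project_id` variable (Terraform rejects `-var` for undeclared variables).

```bash
# Example values; in the notebook these are set in the cell above.
TERRAFORM_DEPENDENCIES_DIR="../terraform/terraform-dependencies-local"
PROJECT_ID="test-bed-fltk"

# Hypothetical helper: prints the plan command with project_id overridden.
# Assumes the module declares a `project_id` variable.
plan_command() {
    echo "terraform -chdir=${TERRAFORM_DEPENDENCIES_DIR} plan -var project_id=${PROJECT_ID}"
}

plan_command
```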

## Starting MiniKube
First, let us make sure that the MiniKube cluster is started properly. Additionally, we will activate the local cluster's Docker environment, allowing us to build containers directly for use within the cluster.

In [None]:
# Start the cluster
minikube start

# This will activate the Docker environment for the current shell.
# When deploying locally, make sure to build the (tagged) Docker container with this
# environment active; otherwise the cluster may use an old image or not find the image at all.
eval $(minikube docker-env)
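
To check whether the current shell is actually pointing at MiniKube's Docker daemon, one option is to look at the environment variables that `minikube docker-env` exports. The sketch below assumes that `MINIKUBE_ACTIVE_DOCKERD` is among them, which holds for recent MiniKube versions; the helper name `docker_env_active` is hypothetical.

```bash
# Hypothetical helper: succeeds when the current shell uses MiniKube's Docker daemon.
# Assumes `minikube docker-env` exported MINIKUBE_ACTIVE_DOCKERD.
docker_env_active() {
    [ -n "${MINIKUBE_ACTIVE_DOCKERD:-}" ]
}

if docker_env_active; then
    echo "MiniKube Docker environment is active (profile: ${MINIKUBE_ACTIVE_DOCKERD})"
else
    echo "MiniKube Docker environment is NOT active; run: eval \$(minikube docker-env)"
fi
```

Because the environment is set per shell, a new terminal or a restarted kernel needs the `eval` command to be run again.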



When the previous command completes successfully, we can start the deployment. Depending on any changes you may have made, this might take a while.

## Installing dependencies
Next, we need to install the dependencies on our cluster. The commands below use `-chdir` to point Terraform at the dependencies directory; run the `init`, `plan` and `apply` commands as we did for creating the GKE cluster.

Initialize the Terraform module in this directory.

In [None]:
# Only run this command once during the setup
terraform -chdir=$TERRAFORM_DEPENDENCIES_DIR init -reconfigure

Check to see if we can plan the deployment. This will set up the following:

* Kubeflow training operator (used to deploy and manage PyTorchJobs programmatically)
* NFS-provisioner (used to enable logging on a persistent `ReadWriteMany` PVC in the cluster)


In [None]:
# Perform a dry-run
terraform -chdir=$TERRAFORM_DEPENDENCIES_DIR plan

When the previous command completes successfully, we can start the deployment. This will install the NFS provisioner and Kubeflow training operator dependencies.


In [None]:
# Install all dependencies
terraform -chdir=$TERRAFORM_DEPENDENCIES_DIR apply -auto-approve
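
After `apply` finishes, a quick sanity check is to confirm that the resources used later in this notebook exist. The sketch below checks for the `pytorchjobs.kubeflow.org` CRD and the `test` namespace, both of which are referenced in later cells; the helper name `dependencies_installed` is hypothetical, and the check does not verify that the operator pods are healthy.

```bash
# Hypothetical helper: succeeds if the PyTorchJob CRD and the `test` namespace exist.
# It only checks for existence, not for healthy operator pods.
dependencies_installed() {
    kubectl get crd pytorchjobs.kubeflow.org >/dev/null 2>&1 &&
        kubectl get namespace test >/dev/null 2>&1
}

# Run it against the cluster with:
#   dependencies_installed && echo "dependencies found" || echo "dependencies missing"
```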

## Deploying extractor

Lastly, we deploy the extractor pod, which also provides PVCs that can be used for artifact retrieval.

Retrieval can be done by running:

```bash
EXTRACTOR_POD_NAME=$(kubectl get pods -n test -l "app.kubernetes.io/name=fltk.extractor" -o jsonpath="{.items[0].metadata.name}")
kubectl cp -n test $EXTRACTOR_POD_NAME:/opt/federation-lab/logging ./logging
```

This copies from the extractor path `/opt/federation-lab/logging` to a local directory named `logging`.

First build the docker container, following the instructions of the [readme](https://github.com/JMGaljaard/fltk-testbed#creating-and-uploading-docker-container).


N.B. Make sure to have set up a working authentication provider for Docker, such that you can push to your repository.

Run this in a terminal in the content-root directory (so [`fltk-testbed`](../) if the project name was not altered).
```bash
python3 -m venv venv
source venv/bin/activate
pip3 install -r requirements-cpu.txt
python3 -m fltk extractor configs/example_cloud_experiment.json
```

In [None]:
# Make sure that you have MiniKube's docker environment activated
# eval $(minikube docker-env)

# Build the Docker container with BuildKit. Make sure you have Docker Desktop running on Windows/macOS.
DOCKER_BUILDKIT=1 docker build --platform linux/amd64 ../ --tag gcr.io/test-bed-fltk/fltk
# This will automatically be available in your cluster.

# In case you have issues with the command above, run the following in a separate terminal:
# DOCKER_BUILDKIT=1 docker build --platform linux/amd64 <fltk-directory> --tag gcr.io/test-bed-fltk/fltk
# minikube image load gcr.io/test-bed-fltk/fltk
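
To confirm that the image is visible inside the cluster after building or loading it, the image list of the MiniKube node can be searched for the tag. This is a minimal sketch; the helper name `image_in_minikube` is hypothetical, and it relies on `minikube image ls` printing one image reference per line.

```bash
# Hypothetical helper: succeeds if MiniKube's image list contains the given reference.
# Uses a fixed-string match, so the argument is not treated as a regular expression.
image_in_minikube() {
    minikube image ls | grep -q -F "$1"
}

# Run it against the cluster with:
#   image_in_minikube "gcr.io/test-bed-fltk/fltk" && echo "image found" || echo "image missing"
```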

In [None]:
# Install the extractor, and set the projectName to $PROJECT_ID.
# In case you get a warning regarding the namespace test, this means that the dependencies have not been properly installed.
# Make sure to check whether you have enough resources available, and re-run the installation of dependencies. (see above).

# Deploy extractor, in test namespace with updated image reference (--set overwrites values from `fltk-values.yaml`).
# NOTE THAT WE SET fltk.pullPolicy to Never, as we won't have access to the external registry
# NOTE THAT WE SET provider.projectName to $PROJECT_ID.
helm install extractor ../charts/extractor -f ../charts/fltk-values.yaml --namespace test --set provider.projectName="${PROJECT_ID}",fltk.pullPolicy=Never

# To uninstall the extractor.
# helm uninstall -n test extractor
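
Once the chart is installed, it may take a moment before the extractor pod is ready. The sketch below blocks until the pod reports `Ready` or a timeout expires, using the same namespace and label selector as the retrieval command shown earlier. The helper name `extractor_ready` and the default timeout of 120 seconds are assumptions chosen for illustration.

```bash
# Hypothetical helper: waits until the extractor pod is Ready.
# $1 is an optional timeout (default 120s, an arbitrary choice).
extractor_ready() {
    kubectl wait --namespace test --for=condition=Ready pod \
        --selector "app.kubernetes.io/name=fltk.extractor" --timeout="${1:-120s}"
}

# Run it against the cluster with:
#   extractor_ready 300s
```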

## Testing the deployment

To make sure that the deployment went OK, we can run the following command to test whether we can use the PyTorch training operator.

This will create a small (1 master, 1 client) training job on MNIST on your cluster, using a Kubeflow PyTorch example job. You can follow the deployment by navigating to your cluster on [cloud.google.com](https://cloud.google.com).

In [None]:
# This cell is optional, but the next cell should show that a PyTorch training job is created.
# This will create a Kubeflow example deployment.
kubectl create -f https://raw.githubusercontent.com/kubeflow/training-operator/master/examples/pytorch/simple.yaml

In [None]:
# List all PyTorchJob resources (a Kubeflow CRD) in every namespace.
kubectl get pytorchjobs.kubeflow.org --all-namespaces

# Alternatively, we can remove all jobs. This will remove all information and logs as well.
# kubectl delete pytorchjobs.kubeflow.org --all-namespaces --all
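
Instead of polling the list manually, `kubectl wait` can block until a job reports the `Succeeded` condition. The following is a sketch under assumptions: the helper name `wait_for_job` is hypothetical, the job name and namespace must be taken from the output of the listing command above, and the default timeout of 600 seconds is an arbitrary choice.

```bash
# Hypothetical helper: waits until a PyTorchJob reports the Succeeded condition.
# $1 = job name, $2 = namespace, $3 = optional timeout (default 600s).
wait_for_job() {
    kubectl wait --namespace "$2" --for=condition=Succeeded \
        "pytorchjobs.kubeflow.org/$1" --timeout="${3:-600s}"
}

# Run it against the cluster with (placeholders to be filled in):
#   wait_for_job <job-name> <namespace>
```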

In [None]:
# To run an example deployment with Freddie/FLTK
# NOTE THAT WE SET fltk.pullPolicy to Never, as we won't have access to the external registry
# NOTE THAT WE SET provider.projectName to $PROJECT_ID.
helm install flearner ../charts/orchestrator --namespace test -f ../charts/fltk-values.yaml --set-file orchestrator.experiment=../configs/distributed_tasks/example_arrival_config.json,orchestrator.configuration=../configs/example_cloud_experiment.json --set fltk.pullPolicy=Never,provider.projectName="${PROJECT_ID}"

# After 180 seconds the first experiment (with the default config) will start with a simulated Orchestrator.

# Cleaning up

## Scaling down the cluster

This is the preferred way to scale down: it stops the MiniKube instance from running. To start the cluster again, simply run `minikube start`.

In [None]:
# This stops the MiniKube cluster.
minikube stop

# To start the cluster again, simply run the following.
# All dependencies will still be installed as you deployed them.
# minikube start