# Kubeflow Setup

## Introduction
> In this lesson, we learn how to install Kubeflow. We walk through installing the required command-line tools, creating a cluster on EKS, and installing Kubeflow on that cluster.

## Installation

Kubeflow runs on Kubernetes; therefore, applications can be developed locally and then deployed to a Kubernetes cluster.

For example, you can run everything on minikube and subsequently deploy the same application to EKS. However, not all computers have the capacity to run a large cluster locally; in that case, you may need to deploy directly to EKS.

Note that kubectl must be installed to interact with a Kubernetes cluster, whether it runs locally or remotely.

## Creating a Cluster on EKS

> __Note:__ We will create an EKS cluster using EC2 instances that are not covered by the free tier. Therefore, we advise you to use the provided AWS credentials (to avoid paying) and follow the instructions for the Facebook Marketplace scenario provided [here](https://aicore-files.s3.amazonaws.com/MLOps/Facebook_Setup.md). Additionally, run the included commands **on the EC2 instance**, not on your local machine.

### Elastic Kubernetes Service
__Elastic Kubernetes Service (EKS)__ is the managed Kubernetes service of AWS: it runs the Kubernetes control plane for you and orchestrates containers on AWS infrastructure.

Since Kubeflow runs on Kubernetes, EKS is a natural choice for running Kubeflow on AWS.

<p align=center><img src=images/EKS_1.png width=700></p>

Although you can create a cluster using the AWS Management Console, we encourage you to use the CLI, since you will eventually need it to install Kubeflow on the created cluster.

### Requirements 
Before creating the EKS cluster, you need to install the following tools for use in the CLI:
- __eksctl__: facilitates the creation and management of Kubernetes clusters on EKS.
- __kubectl__: for interacting with the API server of your cluster.
- __AWS CLI__: for interacting with AWS services.
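
Once the tools are installed (the following sections show how), a quick check such as the sketch below can confirm that all three are on your PATH. The helper name `missing_tools` is our own and is not part of any of these tools; it relies only on the shell built-in `command -v`.

```shell
# Print the name of every listed tool that cannot be found on PATH.
# `missing_tools` is a hypothetical helper, not part of eksctl, kubectl, or the AWS CLI.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# No output means that all three tools were found.
missing_tools eksctl kubectl aws
```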

Additionally, an IAM user with the following permissions is required:
- EKS roles
- CloudFormation (since the EKS cluster is created from a stack)
- VPC-related resources

If you alone will use this account, you can instead create an IAM user with full administrator permissions, which is what we do later in this lesson.

To learn more about EKS permissions, see the [AWS service authorization reference](https://docs.aws.amazon.com/service-authorization/latest/reference/list_amazonelastickubernetesservice.html).

### Installing _eksctl_

<details>
  <summary>For Ubuntu Users</summary>

  1. Download and extract eksctl using the following commands:
  ```
      curl --silent --location "https://github.com/weaveworks/eksctl/releases/latest/download/eksctl_$(uname -s)_amd64.tar.gz" | tar xz -C /tmp

      sudo mv /tmp/eksctl /usr/local/bin
  ```

  2. Verify that eksctl was installed correctly:
  ```
      eksctl version
  ```
</details>

<details>
  <summary>For Mac Users</summary>

  1. Install Homebrew following the instructions [here](https://brew.sh/).
  2. Install the Weaveworks Homebrew tap by running the following command:
  ```
    brew tap weaveworks/tap
  ```
  3. Install eksctl with the following command:
  ```
    brew install weaveworks/tap/eksctl
  ```
  4. Verify that eksctl was installed correctly:
  ```
    eksctl version
  ```

</details>

<details>
  <summary>For Windows Users</summary>

  1. Install Chocolatey following the instructions [here](https://chocolatey.org/install).
  2. Install the binaries with the following command:
  ```
    choco install -y eksctl
  ```
  3. Verify that eksctl was installed correctly:
  ```
    eksctl version
  ```
</details>

Once completed, your output should be similar to that shown below.

<p align=center><img src=images/EKSCTL_1.png width=400></p>

### Installing _kubectl_

<details>
  <summary>For Ubuntu Users</summary>

  1. Download the kubectl binary from AWS. At the time of writing, we use version 1.21, which was the stable release. A newer version may be available by now, so check the list of available versions [here](https://docs.aws.amazon.com/eks/latest/userguide/install-kubectl.html).
  ```
      curl -o kubectl https://s3.us-west-2.amazonaws.com/amazon-eks/1.21.2/2021-07-05/bin/linux/amd64/kubectl
  ```

  2. Make the binary executable:
  ```
      chmod +x ./kubectl
  ```
  3. Copy the binary to a folder in your PATH:
  ```
      mkdir -p $HOME/bin && cp ./kubectl $HOME/bin/kubectl && export PATH=$PATH:$HOME/bin
  ```
  4. Add `$HOME/bin` to your PATH in your shell initialisation file so that the change persists across sessions:
  ```
      echo 'export PATH=$PATH:$HOME/bin' >> ~/.bashrc
  ```
  5. Verify that kubectl was installed correctly:
  ```
      kubectl version --client
  ```
</details>

<details>
  <summary>For Mac Users</summary>

  1. Download the kubectl binary from AWS. At the time of writing, we use version 1.21, which was the stable release. A newer version may be available by now, so check the list of available versions [here](https://docs.aws.amazon.com/eks/latest/userguide/install-kubectl.html).
  ```
      curl -o kubectl https://s3.us-west-2.amazonaws.com/amazon-eks/1.21.2/2021-07-05/bin/darwin/amd64/kubectl
  ```

  2. Make the binary executable:
  ```
      chmod +x ./kubectl
  ```
  3. Copy the binary to a folder in your PATH:
  ```
      mkdir -p $HOME/bin && cp ./kubectl $HOME/bin/kubectl && export PATH=$HOME/bin:$PATH
  ```
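  4. Add `$HOME/bin` to your PATH in your shell initialisation file so that the change persists across sessions (if you use zsh, the default shell on recent macOS versions, append to `~/.zshrc` instead):
  ```
      echo 'export PATH=$HOME/bin:$PATH' >> ~/.bash_profile
  ```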
  5. Verify that kubectl was installed correctly:
  ```
      kubectl version --client
  ```

</details>

<details>
  <summary>For Windows Users</summary>

  1. Download the kubectl binary from AWS. At the time of writing, we use version 1.21, which was the stable release. A newer version may be available by now, so check the list of available versions [here](https://docs.aws.amazon.com/eks/latest/userguide/install-kubectl.html).
  ```
      curl -o kubectl.exe https://s3.us-west-2.amazonaws.com/amazon-eks/1.21.2/2021-07-05/bin/windows/amd64/kubectl.exe
  ```
  2. Copy the binary to a folder in your PATH:
     1. Create a new directory for your command-line binaries, such as `C:\bin`.
     2. Copy the binary to the new directory.
     3. Edit your user or system PATH environment variable to add the new directory to your PATH.
     4. Restart the terminal.
  
  3. Verify that kubectl was installed correctly:
  ```
      kubectl version --client
  ```

</details>

Once completed, your output should be similar to that shown below.

<p align=center><img src=images/KUBECTL_1.png width=500></p>

### Installing the _AWS CLI_

If pip is installed on your computer, you can install the AWS CLI using the following command. Afterwards, `aws --version` should print the installed version.

```
    pip install awscli
```


### Creating the IAM user

You need an IAM user whose credentials the AWS CLI and eksctl will use to create and manage the EKS cluster and its related AWS resources. If you know how to create the IAM user and authenticate with the AWS CLI, you can skip this section.

First, in your AWS console, go to the IAM console and create a new user.

<p align=center>
  <img src=images/IAM_1.png width=150>
</p>

Click on 'Add users'.

<p align=center>
    <img src=images/IAM_2.png width=500>
</p>

Assign a name to your user, and select 'Access Key - Programmatic access' for 'Select AWS credential type'; thereafter, click 'Next'.

<p align=center>
    <img src=images/IAM_3.png width=500>
</p>

Afterwards, click on 'Attach existing policies directly', and select 'AdministratorAccess'. Click 'Next' until the user is created.

<p align=center>
    <img src=images/IAM_4.png width=500>
</p>

Download the CSV file with your Access key ID and Secret access key. The secret access key is shown only once: if you close this window without downloading the file, you can no longer retrieve it and will have to create a new access key.

#### Authenticating with the AWS CLI

In your terminal, type `aws configure`. This will ask you for your Access key ID, Secret access key, default region, and default output format (you can leave the output format empty). Remember the region you enter here, since you should use the same one when creating the cluster.

<p align=center>
    <img src=images/IAM_5.png width=500>
</p>
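
To check that the credentials were stored correctly, you can ask AWS which identity the CLI is using. The sketch below wraps two standard AWS CLI commands, `aws sts get-caller-identity` and `aws configure get region`, in a helper whose name (`show_aws_identity`) is our own; it assumes that you configured the default profile.

```shell
# Show which IAM identity and default region the AWS CLI is currently using.
# `show_aws_identity` is a hypothetical helper name, not an AWS CLI command.
show_aws_identity() {
    if ! command -v aws >/dev/null 2>&1; then
        echo "aws CLI not found on PATH" >&2
        return 1
    fi
    # Prints the account ID and the ARN of the authenticated user
    aws sts get-caller-identity
    # Prints the default region stored by `aws configure`
    aws configure get region
}

show_aws_identity || echo "Identity check did not complete" >&2
```

The ARN in the output should end with the name of the IAM user you created above.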

### Creating the EKS cluster

Now that everything is installed, you can create the cluster that will be used in this notebook.

The syntax for creating a cluster is as follows:

`eksctl create cluster [flags]`

The main flags are:

- `--name`: the name of the cluster.
- `--region`: the region where the cluster will be created (use the same region you entered in `aws configure`).
- `--nodegroup-name`: the name of the node group.
- `--node-type`: the type of node that will be created (we will use `t2.xlarge` for Kubeflow installation).
- `--nodes`: the number of nodes that will be created (we will use 2 for Kubeflow installation).
- `--timeout`: the maximum waiting time for any long-running operation to complete (40 minutes (`40m`) in our case).
- `--version`: the Kubernetes version employed (we will use version `1.19` for the Kubeflow installation because it is stable).

You can always view the available information on these flags by running `eksctl create cluster --help`.

The full command is:

```
eksctl create cluster \
    --name kubeflow-cluster \
    --region us-east-2 \
    --nodegroup-name kubeflow-node-group \
    --node-type t2.xlarge \
    --nodes 2 \
    --timeout 40m \
    --version 1.19
```
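
The same settings can also be kept in an eksctl config file, which is easier to review and reuse than a long list of flags. The sketch below writes such a file and is only an approximation of the flag-based command: the file name `cluster.yaml` is our own choice, the timeout is not part of the file, and the `nodeGroups` field creates a self-managed node group, whereas the flags may create a managed one depending on your eksctl version (the `managedNodeGroups` field is the managed counterpart).

```shell
# Write an eksctl ClusterConfig that mirrors the flags used above.
# The file name cluster.yaml is an arbitrary choice for this sketch.
cat > cluster.yaml <<'EOF'
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: kubeflow-cluster
  region: us-east-2
  version: "1.19"

nodeGroups:
  - name: kubeflow-node-group
    instanceType: t2.xlarge
    desiredCapacity: 2
EOF

# To create the cluster from the file, run this INSTEAD of the flag-based command:
# eksctl create cluster -f cluster.yaml
```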

The cluster-creation process takes some time (approximately 20 to 30 minutes). Therefore, as you wait, you can install other tools, such as kustomize (see the next section). During the creation process, your CLI output should appear similar to that shown below:

<p align=center><img src=images/EKS_2.png width=600></p>

__Note:__ When a cluster is created, eksctl adds it to your kubeconfig file and sets it as the current context. As a result, subsequent `kubectl` commands on your machine, including `kubectl apply`, target the cluster you are now creating.

You can target a different cluster by passing the `--kubeconfig` flag to `kubectl` or by modifying the `~/.kube/config` file.
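
Once the cluster has been created, you can check which cluster `kubectl` is pointing at before applying anything. The following is a minimal sketch built from standard `kubectl` commands; the helper name `check_cluster` is our own.

```shell
# Show the active kubeconfig context and the worker nodes of that cluster.
# `check_cluster` is a hypothetical helper name, not a kubectl command.
check_cluster() {
    if ! command -v kubectl >/dev/null 2>&1; then
        echo "kubectl not found on PATH" >&2
        return 1
    fi
    # The context that kubectl currently targets
    kubectl config current-context || return 1
    # The worker nodes of the cluster and their status
    kubectl get nodes
}

check_cluster || echo "Cluster check did not complete" >&2

# If the context is missing, the AWS CLI can write it again (cluster name and
# region as used in the eksctl command above):
# aws eks update-kubeconfig --region us-east-2 --name kubeflow-cluster
```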

## Installing Kubeflow in the Cluster

Before installing Kubeflow, you need to install `kustomize`, a command-line tool for customising Kubernetes manifest files. Depending on your local machine, you may also need to install `yq` and `jq`, which are command-line tools for processing YAML and JSON files, respectively.


### Installing _yq_



<details>
  <summary>For Ubuntu Users</summary>

  1. Download the latest executable file of yq from GitHub:
  ```
      sudo wget -qO /usr/local/bin/yq https://github.com/mikefarah/yq/releases/latest/download/yq_linux_amd64
  ```

  2. Make the binary executable (`sudo` is needed because the file was downloaded as root):
  ```
      sudo chmod a+x /usr/local/bin/yq
  ```
  3. Verify that yq was installed correctly:
  ```
      yq --version
  ```
</details>

<details>
  <summary>For Mac Users</summary>

  1. You can install it using Homebrew:
  ```
      brew install yq
  ```
  2. Verify that yq was installed correctly:
  ```
      yq --version
  ```
</details>

<details>
  <summary>For Windows Users</summary>

  1. You can install it using Chocolatey:
  ```
      choco install yq
  ```
  2. Verify that yq was installed correctly:
  ```
      yq --version
  ```
</details>

### Installing _jq_


<details>
  <summary>For Ubuntu Users</summary>

  1. Update the package lists, and install jq:
  ```
      sudo apt update
      sudo apt install -y jq
  ```
  2. Verify that jq was installed correctly:
  ```
      jq --version
  ```
</details>

<details>
  <summary>For Mac Users</summary>

  1. You can install it using Homebrew:
  ```
      brew install jq
  ```
  2. Verify that jq was installed correctly:
  ```
      jq --version
  ```
</details>

<details>
  <summary>For Windows Users</summary>

  1. You can install it using Chocolatey:
  ```
      choco install jq
  ```
  2. Verify that jq was installed correctly:
  ```
      jq --version
  ```
</details>

### Installing _kustomize_

> __Note:__ The Kubeflow manifests used in this lesson do not work with the latest version of kustomize. Thus, you will need to use kustomize 3.2.0.

<details>
  <summary>For Ubuntu Users</summary>

  1. Download the corresponding version and binary:
  ```
      wget -O kustomize https://github.com/kubernetes-sigs/kustomize/releases/download/v3.2.0/kustomize_3.2.0_linux_amd64
  ```
  2. Make the binary executable:
  ```
      chmod +x kustomize
  ```
  3. Move the binary to a folder in your PATH:
  ```
      sudo mv kustomize /usr/local/bin/
  ```
  4. Verify that kustomize was installed correctly:
  ```
      kustomize version
  ```
</details>

<details>
  <summary>For Mac Users</summary>

  1. Download the corresponding version and binary:
  ```
      curl -Lo kustomize https://github.com/kubernetes-sigs/kustomize/releases/download/v3.2.0/kustomize_3.2.0_darwin_amd64
  ```
  2. Make the binary executable:
  ```
      chmod +x kustomize
  ```
  3. Move the binary to a folder in your PATH:
  ```
      sudo mv kustomize /usr/local/bin/
  ```
  4. Verify that kustomize was installed correctly:
  ```
      kustomize version
  ```
</details>

<details>
  <summary>For Windows Users</summary>

  Windows users can use Chocolatey for the installation; however, Chocolatey only installs the latest version of kustomize.

  To install kustomize 3.2.0, follow this [StackOverflow thread](https://stackoverflow.com/questions/70838480/build-kustomize-3-2-0-on-windows).
</details>
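
Because the exact version matters here, it can be worth checking it explicitly rather than reading it off the terminal. The sketch below searches the output of `kustomize version` for `3.2.0`; the helper name `is_kustomize_3_2_0` is our own, and the check assumes only that the version number appears somewhere in that output, whose exact layout differs between kustomize releases.

```shell
# Succeed if the given `kustomize version` output mentions release 3.2.0.
# `is_kustomize_3_2_0` is a hypothetical helper, not part of kustomize.
is_kustomize_3_2_0() {
    printf '%s\n' "$1" | grep -Eq '(^|[^0-9.])v?3\.2\.0([^0-9]|$)'
}

if command -v kustomize >/dev/null 2>&1; then
    if is_kustomize_3_2_0 "$(kustomize version)"; then
        echo "kustomize 3.2.0 found"
    else
        echo "kustomize is installed, but it is not version 3.2.0" >&2
    fi
else
    echo "kustomize not found on PATH" >&2
fi
```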

### Installing Kubeflow

Kubeflow is installed by applying a set of Kubernetes manifests to the cluster.

At the time of writing, the latest version of the manifests is `v1.5`; a newer version may be available by now. You can explore and download the manifests from the [Kubeflow GitHub repository](https://github.com/kubeflow/manifests).
```
git clone https://github.com/kubeflow/manifests
cd manifests
git checkout v1.5-branch
```

Next, we use `kustomize` to build the manifests for all the Kubeflow components and services, and pipe the result to `kubectl apply`:

```
while ! kustomize build example | kubectl apply -f -; do echo "Retrying to apply resources"; sleep 10; done
```

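The `while` loop re-runs the command every 10 seconds until `kubectl apply` succeeds. Failures on the first attempts are expected: some resources depend on others (for example, custom resources depend on their custom resource definitions) that take a moment to become available in the cluster.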
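
The loop above retries forever, which is convenient but can hide a genuine error such as a wrong cluster or a missing tool. As a sketch of an alternative, the helper below stops after a fixed number of attempts. The names `retry` and `apply_kubeflow`, as well as the limit of 30 attempts, are our own choices and are not part of the Kubeflow instructions.

```shell
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Run COMMAND until it succeeds, giving up after MAX_ATTEMPTS tries.
# `retry` is a hypothetical helper written for this sketch.
retry() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    until "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "Giving up after $attempt attempts" >&2
            return 1
        fi
        echo "Attempt $attempt failed; retrying in ${delay}s" >&2
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Hypothetical wrapper around the build-and-apply pipeline shown above.
apply_kubeflow() {
    kustomize build example | kubectl apply -f -
}

# Usage (run from inside the manifests directory):
# retry 30 10 apply_kubeflow
```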

This process will also take approximately 20 to 30 minutes (ignore all the warnings and retries).

<p align=center><img src=images/KUBECTL_2.png width=600></p>

The loop exits once every resource has been applied successfully.
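
A successful `kubectl apply` only means that the resources were accepted; the pods may still need a few minutes to start. A minimal sketch for checking them is shown below. The helper name `check_kubeflow_pods` is our own, and the namespace list is what we understand the `v1.5` example manifests to create, so adjust it if your version differs.

```shell
# List the pods in each namespace used by the Kubeflow installation.
# `check_kubeflow_pods` is a hypothetical helper; the namespace list is an
# assumption based on the v1.5 example manifests.
check_kubeflow_pods() {
    if ! command -v kubectl >/dev/null 2>&1; then
        echo "kubectl not found on PATH" >&2
        return 1
    fi
    for ns in cert-manager istio-system auth knative-eventing \
              knative-serving kubeflow kubeflow-user-example-com; do
        echo "== $ns =="
        kubectl get pods -n "$ns"
    done
}

check_kubeflow_pods || echo "Pod check did not complete" >&2
```

Kubeflow is ready to use once the pods are reported as `Running`.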

## Connecting to Kubeflow

Once the long process is complete, you should be able to connect to Kubeflow. Run the following command to expose the Kubeflow dashboard:

```
kubectl port-forward svc/istio-ingressgateway -n istio-system 8080:80
```

The CLI will display the port for accessing the dashboard:

<p align=center><img src=images/KUBECTL_3.png width=600></p>

Accordingly, go to `http://localhost:8080` to access the dashboard.

Log in with the default email and password defined in the manifests you downloaded:
- user: user@example.com
- password: 12341234

If you intend to deploy your application into production, consider changing the password to something more secure (see [here](https://github.com/kubeflow/manifests/tree/v1.5-branch#change-default-user-password) for detailed information).

The window shown below confirms the successful installation of Kubeflow.

<p align=center><img src=images/Kubeflow.png width=600></p>

However, note that a minor configuration change is required to use notebooks in Kubeflow. By default, the notebook web app uses secure cookies, which browsers only send over HTTPS, whereas the dashboard is currently served over HTTP. This change is unnecessary if you serve Kubeflow over HTTPS. Click [here](https://github.com/awslabs/kubeflow-manifests/issues/67#issuecomment-1059566247) for information on the required configuration.

Below, we configure the notebooks to run over HTTP.

### Running notebooks on HTTP
First, press `Ctrl + C` to stop the port forwarding.

Next, edit the `jupyter-web-app-deployment` deployment to expose the notebook on HTTP:

```
kubectl edit deployments -n kubeflow jupyter-web-app-deployment
```

This opens the deployment manifest in your default editor (usually Vim). __Please exercise caution here!__ An unintended change can break the Kubeflow configuration. Locate the `spec.template.spec.containers.env` section. It should appear similar to that shown below.

<p align=center><img src=images/KUBECTL_4.png width=400></p>

Therein, add the following lines (in Vim, press 'i' to enter insert mode). Since YAML does not allow tab characters for indentation, use the spacebar to align the new lines with the existing entries:

```
  - name: APP_SECURE_COOKIES
    value: "false"
```

Your output should be similar to that shown below.

<p align=center><img src=images/KUBECTL_5.png width=400></p>

Save and exit (press 'Esc', and type ':wq').

Restart the deployment for the changes to take effect.

```
kubectl -n kubeflow rollout restart deployment/jupyter-web-app-deployment
```
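
As a non-interactive alternative to the manual edit and restart, the same environment variable can be set with `kubectl set env`, which avoids the risk of breaking the YAML by hand. Because this changes the pod template, Kubernetes rolls out new pods on its own, and `kubectl rollout status` waits for that rollout to finish. The helper name `disable_secure_cookies` is our own; treat this as a sketch rather than the procedure recommended by Kubeflow.

```shell
# Set APP_SECURE_COOKIES=false on the Jupyter web app and wait for the rollout.
# `disable_secure_cookies` is a hypothetical helper name.
disable_secure_cookies() {
    if ! command -v kubectl >/dev/null 2>&1; then
        echo "kubectl not found on PATH" >&2
        return 1
    fi
    kubectl -n kubeflow set env deployment/jupyter-web-app-deployment \
        APP_SECURE_COOKIES=false || return 1
    kubectl -n kubeflow rollout status deployment/jupyter-web-app-deployment
}

# Usage:
# disable_secure_cookies
```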

Subsequently, expose the Kubeflow dashboard once again:

```
kubectl port-forward svc/istio-ingressgateway -n istio-system 8080:80
```

Afterwards, go to `http://localhost:8080` to access the dashboard again.

Next, click on 'Notebooks' and then 'New Notebook' to create a new notebook.

Assign a name to it, and scroll down to the bottom of the page; thereafter, click on 'Launch' to start the notebook. This is what you should see:

<p align=center><img src=images/Kubeflow_2.png width=700></p>

Once the notebook is running, connect to it; if the notebook opens, your Kubeflow setup is complete.

## Conclusion
At this point, you should have a good understanding of:

- how to set up Kubeflow.
- how to create a cluster on EKS.
- how to install Kubeflow in the cluster.
- how to connect to Kubeflow.