# End to End ML with Metaflow and Tempo

We will train two models and deploy them with Tempo from within a Metaflow pipeline. To understand the core example, see the [multi-model example](https://tempo.readthedocs.io/en/latest/examples/multi-model/README.html).

![architecture](architecture.png)

## Metaflow Prerequisites


### Install Metaflow locally

```
pip install metaflow
```

### Set Up Conda-Forge Support

The flow uses conda-forge, so you need to add that channel to your conda configuration.

```
conda config --add channels conda-forge
```



## Iris Flow Summary

In [1]:
!python src/irisflow.py --environment=conda show

Metaflow 2.3.2 executing IrisFlow for user:clive

A Flow to train two Iris dataset models and combine them for inference with Tempo

The flow performs the following steps:

1) Load Iris Data
2) Train SKLearn LR Model
3) Train XGBoost LR Model
4) Create and deploy Tempo artifacts

Step start
    Download Iris classification datatset
    => train_sklearn, train_xgboost

Step train_sklearn
    Train a SKLearn Logistic Regression Classifier on dataset and save model as artifact
    => join

Step train_xgboost
    Train an XGBoost cl

## Run Flow locally to deploy to Docker

To run the workflow with a local Docker deployment, use the flag:

```
--tempo-on-docker true
```
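As a rough sketch of where the flag goes: Metaflow reads flow-level parameters after the `run` subcommand, so the local Docker invocation would likely look like the command built below. The helper `build_run_command` is hypothetical, and treating `--tempo-on-docker` as a flow parameter is an assumption based on the usage shown above, not on the flow source.

```python
import sys


def build_run_command(tempo_on_docker: bool) -> list:
    """Build the argument list for running the Iris flow locally (hypothetical helper)."""
    cmd = [sys.executable, "src/irisflow.py", "--environment=conda", "run"]
    if tempo_on_docker:
        # Assumption: the flag is a flow parameter, so it follows `run`.
        cmd += ["--tempo-on-docker", "true"]
    return cmd


if __name__ == "__main__":
    # Print the command rather than executing it.
    print(" ".join(build_run_command(tempo_on_docker=True)))
```

Passing the returned list to `subprocess.run` would launch the flow; printing it first is a cheap way to check the argument order.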


In [2]:
!python src/irisflow.py --environment=conda run

Metaflow 2.3.2 executing IrisFlow for user:clive
Validating your flow...
    The graph looks good!
Running pylint...
    Pylint not found, so extra checks are disabled.
Bootstrapping conda environment...(this could take a few minutes)
Including file src/conda.yaml of size 115B
File persisted at s3://metaflow1-metaflows3bucket-1ou61547mcwbq/metaflow/data/IrisFlow/a94f1cff7702ed70807d16917bb282f51a28511e
Including file src/gsa-key.json of size 2KB
File persisted at s3://metaflow1-metaflows3bucket-1ou61547mcwbq/metaflow/data/IrisFlow/c39c6e4ed18f183a2a3ae16b491f2848b57f6824
Including file src/kubeconfig.yaml of size 2KB
File pe

2021-09-04 08:17:28.850 [2/tempo/13 (pid 535666)] Using cached requests_oauthlib-1.3.0-py2.py3-none-any.whl (23 kB)
2021-09-04 08:17:28.850 [2/tempo/13 (pid 535666)] Collecting cachetools<5.0,>=2.0.0
2021-09-04 08:17:28.851 [2/tempo/13 (pid 535666)] Using cached cachetools-4.2.2-py3-none-any.whl (11 kB)
2021-09-04 08:17:28.851 [2/tempo/13 (pid 535666)] Collecting pyasn1-modules>=0.2.1
2021-09-04 08:17:28.851 [2/tempo/13 (pid 535666)] Using cached pyasn1_modules-0.2.8-py2.py3-none-any.whl (155 kB)
2021-09-04 08:17:28.853 [2/tempo/13 (pid 535666)] Collecting rsa<5,>=3.1.4
2021-09-04 08:17:28.853 [2/tempo/13 (pid 535666)] Using cached rsa-4.7.2-py3-none-any.whl (34 kB)
2021-09-04 08:17:28.853 [2/tempo/13 (pid 535666)] Collecting pyasn1<0.5.0,>=0.4.6
2021-09-04 08:17:28.85

2021-09-04 08:17:46.002 [2/tempo/13 (pid 535666)] Preparing transaction: ...working... done
2021-09-04 08:17:46.151 [2/tempo/13 (pid 535666)] Verifying transaction: ...working... done
2021-09-04 08:17:46.326 [2/tempo/13 (pid 535666)] Executing transaction: ...working... done
2021-09-04 08:18:24.545 [2/tempo/13 (pid 535666)] Remove all packages in environment /home/clive/anaconda3/envs/tempo-bfe7ae29-ca47-4b67-a749-8c99f97d780e:
2021-09-04 08:18:24.545 [2/tempo/13 (pid 535666)]
2021-09-04 08:18:24.546 [2/tempo/13 (pid 535666)] Insights Manager not initialised as empty URL provided.
2021-09-04 08:18:34.767 [2/tempo/13 (pid 535666)] {'output0': array([[0.00847207, 0.03168793, 0.95984   ]], dtype=float32), 'output1': 'xgboost prediction'}
2021-09-04 09:18:37.183 [2/tempo/13 (pid 535666)]

## Make Predictions with Metaflow Tempo Artifact

In [3]:
from metaflow import Flow
import numpy as np
run = Flow('IrisFlow').latest_run
client = run.data.client_model
client.predict(np.array([[1, 2, 3, 4]]))

{'output0': array([[0.00847207, 0.03168793, 0.95984   ]], dtype=float32),
 'output1': 'xgboost prediction'}
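The `output0` entry holds one probability per class. The sketch below turns such a row into a class name. The helper `to_label` is hypothetical, and the class order is an assumption (it follows scikit-learn's Iris dataset), so check it against how the models were trained.

```python
# Class names in the order used by scikit-learn's Iris dataset
# (assumed to match the order of the model outputs).
IRIS_CLASSES = ["setosa", "versicolor", "virginica"]


def to_label(probabilities):
    """Return the class name with the highest probability for a single row."""
    best_index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return IRIS_CLASSES[best_index]


# Probabilities copied from the prediction output above.
row = [0.00847207, 0.03168793, 0.95984]
print(to_label(row))
```

With a live client, the same helper could be applied to the first row of the result, for example `to_label(client.predict(...)["output0"][0])`.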

## Run Flow on AWS and Deploy to Remote Kubernetes

We will now run our flow on AWS Batch and launch the Tempo artifacts on a remote Kubernetes cluster.

### Set Up AWS Metaflow Support

Note that at present this is required even for a local run, because artifacts are stored on S3.

[Install Metaflow with remote AWS support](https://docs.metaflow.org/metaflow-on-aws/metaflow-on-aws).

### Seldon Requirements

To deploy to a remote Kubernetes cluster with Seldon Core installed, complete the following steps:

#### Install Seldon Core on your Kubernetes Cluster

Create a Kubernetes cluster, for example on GKE, and install Seldon Core on it using the [Ansible collection for installing Seldon Core on a Kubernetes cluster](https://github.com/SeldonIO/ansible-k8s-collection).


### K8S Auth from Metaflow

To deploy services to our Kubernetes cluster with Seldon Core installed, Metaflow steps that run on AWS Batch and use Tempo need access to the Kubernetes API. How you set this up depends on whether your cluster runs on GKE or AWS EKS.

#### Option 1. K8S cluster runs on GKE

We need to create two files in the flow's `src` folder:

```bash
kubeconfig.yaml
gsa-key.json
```

Follow the steps outlined in [GKE server authentication](https://cloud.google.com/kubernetes-engine/docs/how-to/api-server-authentication#environments-without-gcloud).
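Once both files exist, code that talks to the cluster needs to be pointed at them. The sketch below is one way to do that, assuming the standard `KUBECONFIG` and `GOOGLE_APPLICATION_CREDENTIALS` environment variables are what the client libraries read. The helper `gke_auth_env` is hypothetical and is not taken from the flow source.

```python
from pathlib import Path


def gke_auth_env(src_dir: str) -> dict:
    """Return environment variables pointing at the two GKE credential files.

    Raises FileNotFoundError if either file is missing from src_dir.
    """
    src = Path(src_dir)
    kubeconfig = src / "kubeconfig.yaml"
    gsa_key = src / "gsa-key.json"
    for path in (kubeconfig, gsa_key):
        if not path.is_file():
            raise FileNotFoundError(f"Expected credential file is missing: {path}")
    return {
        # Read by kubectl and Kubernetes client libraries.
        "KUBECONFIG": str(kubeconfig),
        # Read by Google Application Default Credentials.
        "GOOGLE_APPLICATION_CREDENTIALS": str(gsa_key),
    }
```

Calling `os.environ.update(gke_auth_env("src"))` before creating a Kubernetes client would apply the settings to the current process, and it fails early if a file was not created.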




#### Option 2. K8S cluster runs on AWS EKS

Make a note of two AWS IAM role names; you can find them in the IAM console, for example. The names depend on how you deployed Metaflow and EKS in the first place:

1. The role used by Metaflow tasks executed on AWS Batch. If you used the default CloudFormation template to deploy Metaflow, it is the role that has `*BatchS3TaskRole*` in its name.

2. The role used by EKS nodes. If you used `eksctl` to create your EKS cluster, it is the role whose name starts with `eksctl-<your-cluster-name>-NodeInstanceRole-*`.

Now we need to make sure that the AWS Batch task role has permission to access the K8S cluster. To do this, add a policy to the AWS Batch task role (1) that grants `eks:*` permissions on your EKS cluster (TODO: narrow this down).
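A policy of that shape can be sketched as a standard IAM policy document. The helper `eks_access_policy` is hypothetical, and the ARN in the example is a placeholder to be replaced with your own cluster's ARN.

```python
import json


def eks_access_policy(cluster_arn: str) -> dict:
    """Build an IAM policy document granting all EKS actions on one cluster."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Broad on purpose, matching the text; narrow it where possible.
                "Action": "eks:*",
                "Resource": cluster_arn,
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder ARN: substitute your region, account ID and cluster name.
    arn = "arn:aws:eks:<region>:<account-id>:cluster/<cluster-name>"
    print(json.dumps(eks_access_policy(arn), indent=2))
```

The printed JSON can be pasted into the IAM console as an inline policy on the Batch task role.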

You'll also need to add a mapping for that role to the `aws-auth` ConfigMap in the `kube-system` namespace. For more details, see the [AWS docs](https://docs.aws.amazon.com/eks/latest/userguide/add-user-role.html) (under "To add an IAM user or role to an Amazon EKS cluster"). In short, add this to the `mapRoles` section of the `aws-auth` ConfigMap:
```
     - rolearn: <batch task role ARN>
       username: cluster-admin
       groups:
         - system:masters
```

We also need to make sure that the code running in K8S can access S3. To do this, add a policy to the EKS node role (2) that allows it to read from and write to the Metaflow S3 buckets.
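For the node role, a read/write policy for one bucket might look like the sketch below. The helper `metaflow_s3_policy` is hypothetical, and the exact set of actions your deployment needs is an assumption; the three used here cover listing, reading and writing objects.

```python
import json


def metaflow_s3_policy(bucket: str) -> dict:
    """Build an IAM policy allowing read/write access to one S3 bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Object reads and writes apply to keys inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder bucket name: use the bucket created by your Metaflow deployment.
    print(json.dumps(metaflow_s3_policy("<your-metaflow-bucket>"), indent=2))
```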

### S3 Authentication

Services deployed to Seldon need access to the Metaflow S3 bucket to download trained models. The exact configuration depends on whether your cluster runs on GKE or AWS EKS.

Create your `k8s/s3_secret.yaml` from the base template below.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: s3-secret
type: Opaque
stringData:
  RCLONE_CONFIG_S3_TYPE: s3
  RCLONE_CONFIG_S3_PROVIDER: aws
  RCLONE_CONFIG_S3_BUCKET_REGION: <region>
  <...cloud-dependent s3 auth settings (see below)>
```

For GKE, add the following variables to use key/secret authentication when accessing S3:
```yaml
  RCLONE_CONFIG_S3_ENV_AUTH: "false"
  RCLONE_CONFIG_S3_ACCESS_KEY_ID: <key>
  RCLONE_CONFIG_S3_SECRET_ACCESS_KEY: <secret>
```

For AWS EKS, we use the instance role assigned to the node, so only one environment variable needs to be set:
```yaml
  RCLONE_CONFIG_S3_ENV_AUTH: "true"
```

We provide two templates in the `k8s` folder:

```
s3_secret.yaml.tmpl.aws
s3_secret.yaml.tmpl.gke
```

Use one of them to create the file `s3_secret.yaml` in the same folder.
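To make the difference between the two variants concrete, here is a minimal sketch that renders the Secret manifest from the settings listed above. The function `render_s3_secret` is hypothetical and is not part of the repository. It assumes values contain no double quotes or backslashes, and the region in the example is only a placeholder.

```python
BASE_SETTINGS = {
    "RCLONE_CONFIG_S3_TYPE": "s3",
    "RCLONE_CONFIG_S3_PROVIDER": "aws",
}


def render_s3_secret(region, access_key=None, secret_key=None):
    """Render the s3-secret manifest as YAML text.

    Without a key pair, the EKS variant (instance-role auth) is produced;
    with a key pair, the GKE variant (key/secret auth) is produced.
    """
    settings = dict(BASE_SETTINGS, RCLONE_CONFIG_S3_BUCKET_REGION=region)
    if access_key and secret_key:
        settings["RCLONE_CONFIG_S3_ENV_AUTH"] = "false"
        settings["RCLONE_CONFIG_S3_ACCESS_KEY_ID"] = access_key
        settings["RCLONE_CONFIG_S3_SECRET_ACCESS_KEY"] = secret_key
    else:
        settings["RCLONE_CONFIG_S3_ENV_AUTH"] = "true"
    lines = [
        "apiVersion: v1",
        "kind: Secret",
        "metadata:",
        "  name: s3-secret",
        "type: Opaque",
        "stringData:",
    ]
    # Quote every value so YAML keeps "true"/"false" as strings.
    lines += [f'  {key}: "{value}"' for key, value in settings.items()]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example region only; prints the EKS variant.
    print(render_s3_secret("us-east-1"))
```

Writing the returned text to `k8s/s3_secret.yaml` gives a file equivalent to a filled-in template; avoid committing the GKE variant, since it contains credentials.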


## Set Up RBAC and Secret on Kubernetes Cluster

These steps assume that `kubectl` is configured and authenticated against your cluster.

In [4]:
!kubectl create ns production

Error from server (AlreadyExists): namespaces "production" already exists


In [5]:
!kubectl create -f k8s/tempo-pipeline-rbac.yaml -n production

Error from server (AlreadyExists): error when creating "k8s/tempo-pipeline-rbac.yaml": serviceaccounts "tempo-pipeline" already exists
Error from server (AlreadyExists): error when creating "k8s/tempo-pipeline-rbac.yaml": roles.rbac.authorization.k8s.io "tempo-pipeline" already exists
Error from server (AlreadyExists): error when creating "k8s/tempo-pipeline-rbac.yaml": rolebindings.rbac.authorization.k8s.io "tempo-pipeline-rolebinding" already exists


Create a Secret by copying the appropriate template (`k8s/s3_secret.yaml.tmpl.aws` or `k8s/s3_secret.yaml.tmpl.gke`), filling in your settings (for GKE, an AWS key that can read from S3), and saving the result as `k8s/s3_secret.yaml`.

In [6]:
!kubectl create -f k8s/s3_secret.yaml -n production

Error from server (AlreadyExists): error when creating "k8s/s3_secret.yaml": secrets "s3-secret" already exists


## Run Metaflow on AWS Batch

In [8]:
!python src/irisflow.py \
    --environment=conda \
    --with batch:image=seldonio/seldon-core-s2i-python37-ubi8:1.10.0-dev \
    run

Metaflow 2.3.2 executing IrisFlow for user:clive
Validating your flow...
    The graph looks good!
Running pylint...
    Pylint not found, so extra checks are disabled.
Bootstrapping conda environment...(this could take a few minutes)
Including file src/conda.yaml of size 115B
File persisted at s3://metaflow1-metaflows3bucket-1ou61547mcwbq/metaflow/data/IrisFlow/a94f1cff7702ed70807d16917bb282f51a28511e
Including file src/gsa-key.json of size 2KB
File persisted at s3://metaflow1-metaflows3bucket-1ou61547mcwbq/metaflow/data/IrisFlow/c39c6e4ed18f183a2a3ae16b491f2848b57f6824
Including file src/kubeconfig.yaml of size 2KB
File pe

2021-09-04 09:22:47.906 [4/join/25 (pid 558394)] [1b45a3e8-6eec-428e-bf48-9e0ce995b1dc] Bootstrapping environment...
2021-09-04 09:23:04.216 [4/join/25 (pid 558394)] [1b45a3e8-6eec-428e-bf48-9e0ce995b1dc] Environment bootstrapped.
2021-09-04 09:23:12.402 [4/join/25 (pid 558394)] [1b45a3e8-6eec-428e-bf48-9e0ce995b1dc] Task finished with exit code 0.
2021-09-04 10:23:13.349 [4/join/25 (pid 558394)] Task finished successfully.
2021-09-04 10:23:15.268 [4/tempo/26 (pid 558822)] Task is starting.
2021-09-04 09:23:16.485 [4/tempo/26 (pid 558822)] [31103662-b002-45a2-bd37-eaf23263f12d] Task is starting (status SUBMITTED)...
2021-09-04 09:23:19.778 [4/tempo/26 (pid 558822)] [31103662-b002-45a2-bd37-eaf23263f12d] Task is starting (status RUNNABLE)...
2021-09-04 09:23:20.892 [4/tempo/26 (pid 558822)

2021-09-04 09:24:40.470 [4/tempo/26 (pid 558822)] [31103662-b002-45a2-bd37-eaf23263f12d] Collecting charset-normalizer~=2.0.0
2021-09-04 09:24:40.470 [4/tempo/26 (pid 558822)] [31103662-b002-45a2-bd37-eaf23263f12d]   Downloading charset_normalizer-2.0.4-py3-none-any.whl (36 kB)
2021-09-04 09:24:40.470 [4/tempo/26 (pid 558822)] [31103662-b002-45a2-bd37-eaf23263f12d] Collecting urllib3<1.27,>=1.21.1
2021-09-04 09:24:40.470 [4/tempo/26 (pid 558822)] [31103662-b002-45a2-bd37-eaf23263f12d]   Downloading urllib3-1.26.6-py2.py3-none-any.whl (138 kB)
2021-09-04 09:24:40.470 [4/tempo/26 (pid 558822)] [31103662-b002-45a2-bd37-eaf23263f12d] Collecting starlette==0.14.2
2021-09-04 09:24:40.470 [4/tempo/26 (pid 558822)] [31103662-b002-45a2-bd37-eaf23263f12d]   Using cached starlette-0.14.2-py3-none-any.whl (60 kB)
2021-09-04 09:24:40.

2021-09-04 09:24:52.835 [4/tempo/26 (pid 558822)] [31103662-b002-45a2-bd37-eaf23263f12d] The following packages will be REMOVED:
2021-09-04 09:24:52.835 [4/tempo/26 (pid 558822)] [31103662-b002-45a2-bd37-eaf23263f12d]
2021-09-04 09:24:52.835 [4/tempo/26 (pid 558822)] [31103662-b002-45a2-bd37-eaf23263f12d]   _libgcc_mutex-0.1-main
2021-09-04 09:24:52.835 [4/tempo/26 (pid 558822)] [31103662-b002-45a2-bd37-eaf23263f12d]   _openmp_mutex-4.5-1_gnu
2021-09-04 09:24:52.835 [4/tempo/26 (pid 558822)] [31103662-b002-45a2-bd37-eaf23263f12d]   ca-certificates-2021.7.5-h06a4308_1
2021-09-04 09:24:52.835 [4/tempo/26 (pid 558822)] [31103662-b002-45a2-bd37-eaf23263f12d]   certifi-2021.5.30-py37h06a4308_0
2021-09-04 09:24:52.835 [4/tempo/26 (pid 558822)] [31103662-b002-45a2-bd37-eaf23263f12d]   ld_impl_linux-64-2.35.1-h7

## Make Predictions with Metaflow Tempo Artifact

In [9]:
from metaflow import Flow
import numpy as np
run = Flow('IrisFlow').latest_run
client = run.data.client_model
client.predict(np.array([[1, 2, 3, 4]]))

{'output0': array([[0.00847207, 0.03168793, 0.95984   ]], dtype=float32),
 'output1': 'xgboost prediction'}