# mlflow@AIMat

[MLflow](https://mlflow.org) is a platform for managing ML models, tracking metrics and parameters, and creating workflows. The AIMat lab runs its own mlflow server, which is reachable from the KIT LAN (or over VPN) at the IP address [141.3.29.7](http://141.3.29.7/). Open it in your browser to have a look at the UI.

The server is connected to a so-called artifact store: mass storage with a fast connection for uploading and downloading large files such as trained ML models, entire code projects, or separate data like numpy arrays. For us this is the Large Scale Data Facility (LSDF) at SCC, which is connected via sftp. The main purpose of the artifact store is to let clients store large amounts of data quickly, without sending it through the slow web API of the mlflow server itself (see image below). That also means that the client needs direct access to the artifact store *via the same url as the mlflow server*. More on that in the configuration section.

Additionally, the server has a so-called backend store, which is a managed MySQL database at the SCC. It stores the metrics and parameters of models and holds the model registry, which makes pretrained models available to other users. As you can see from the diagram, it is connected only to the server and needs no configuration on the client side.
![Framework-automol-mlflow.jpg](attachment:Framework-automol-mlflow.jpg)

## Configuring your clients
Your client can be your laptop, but also your user account on the bwunicluster.
We need three things:
- A connection to the LSDF (ask Matthias for an entitlement)
- An environment file for mlflow that defines the server address and your credentials
- A module that loads the environment file for you

### 1. Configuring the LSDF connection
The main problem to solve is that the url on your client needs to be the same as on the server. The server url is the following: `sftp://lsdf/kit/iti/projects/aimat-mlflow/artifacts`  
To get the same url on the client, we need to configure the ssh connection. If you do not already have an ssh key pair for the LSDF, create a new one with `ssh-keygen`. As the path I recommend `~/.ssh/lsdf`. Add it to your LSDF account with:
```bash
ssh-copy-id -i ~/.ssh/lsdf user@os-login.lsdf.kit.edu
```
Now open or create your ssh config with `nano ~/.ssh/config`, add the following lines, and replace `username` with your user name:
```
Host lsdf
    HostName os-login.lsdf.kit.edu
    User username
    IdentityFile ~/.ssh/lsdf
```
Log in to the LSDF with `ssh lsdf` and make sure that you can access the storage project for mlflow, e.g. with:
```bash
ls /lsdf/kit/iti/projects/aimat-mlflow
```
You should see the folder `artifacts`. If this is not the case, contact your ITB (Matthias).
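
To see why the ssh alias has to be called `lsdf`, it can help to split the artifact url into its parts. The sketch below uses Python's standard `urllib.parse` for that and then looks up the host part in an ssh config text with a small, simplified parser. `ssh_host_options` and `example_config` are hypothetical names made up for this illustration, and it is an assumption (not verified here) that the mlflow client resolves the host part through your ssh config in the same way.

```python
from urllib.parse import urlparse

ARTIFACT_URL = "sftp://lsdf/kit/iti/projects/aimat-mlflow/artifacts"


def ssh_host_options(config_text, alias):
    """Return the options of the first `Host` block that lists `alias`.

    Simplified parser: ignores wildcards, `Match` blocks and `Include`.
    """
    options = {}
    in_block = False
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        key = parts[0]
        value = parts[1] if len(parts) > 1 else ""
        if key.lower() == "host":
            if in_block:
                break  # the matching block has ended
            in_block = alias in value.split()
        elif in_block:
            options[key.lower()] = value
    return options


# Same content as the ssh config entry shown above.
example_config = """
Host lsdf
    HostName os-login.lsdf.kit.edu
    User username
    IdentityFile ~/.ssh/lsdf
"""

alias = urlparse(ARTIFACT_URL).hostname  # host part of the url
options = ssh_host_options(example_config, alias)
print(alias, "->", options.get("hostname"))
```

To check your real configuration, pass the content of `~/.ssh/config` instead of `example_config`. An empty result means there is no `Host` entry for the alias used in the url.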

### 2. Create an environment file 
In your home directory (local laptop) create the file `.env` with `nano .env` and add the following lines:
```
MLFLOW_TRACKING_USERNAME=user
MLFLOW_TRACKING_PASSWORD=your_password
MLFLOW_TRACKING_URI=http://141.3.29.7
```
The password here is a custom-generated one, which you get from your ITB (Matthias).
Now make sure that this file is readable only by you with `chmod 600 .env`.
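
If you want to double-check the permissions from Python, the following minimal sketch tests whether group and others have no permission bits on the file. `is_private` is a hypothetical helper name, and the check assumes a POSIX file system such as on Linux or macOS.

```python
import os
import stat


def is_private(path):
    """True if only the owner has any permission bits set (e.g. mode 600)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0


env_path = os.path.join(os.path.expanduser("~"), ".env")
if os.path.exists(env_path):
    print(env_path, "is private:", is_private(env_path))
```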

### 3. Read and export the variables for mlflow
Now that your mlflow client configuration is in a file that only you can read, any Python code that you run can load it and use it to authenticate against the server. So you don't have to put your credentials into any code! yay...
To read and export the file you can either install mlflow_utils from the aimat package index with:
```
pip install mlflow_utils --extra-index-url https://aimat-lab.github.io/package-index/
```
(You need access to the aimat-lab organization on GitHub.)
Alternatively, you can use the following lines of code:

In [1]:
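from os.path import expanduser, join
import os


def load_env():
    """Parse ~/.env into a dict of KEY=value pairs."""
    env = dict()
    with open(join(expanduser("~"), '.env'), 'r') as f:
        for line in f:
            line = line.strip()
            # Skip blank lines and comments.
            if not line or line.startswith('#'):
                continue
            # Split on the first '=' only, so that values (e.g. a password)
            # may themselves contain '='.
            key, value = line.split('=', 1)
            env[key.strip()] = value.strip()
    return env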

def export_env():
    """Loads your .env file and exports the three variables important for mlflow.
    MLFLOW_TRACKING_USERNAME, MLFLOW_TRACKING_PASSWORD, MLFLOW_TRACKING_URI
    """
    env = load_env()
    for key in ["MLFLOW_TRACKING_USERNAME", "MLFLOW_TRACKING_PASSWORD", "MLFLOW_TRACKING_URI"]:
        os.environ[key] = env[key]

`export_env()` can then be called at the beginning of a script or of this notebook to export the necessary environment variables automatically.
Using this environment file gives you at least basic security: you don't have to put credentials into code, where they could end up in your git history when you publish a repo. And if there is a security breach on the bwunicluster, we can renew your mlflow password separately from your actual KIT account password.
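
Before talking to the server it can be handy to confirm that the export worked, without ever printing the password. A minimal sketch, where `missing_mlflow_vars` is a hypothetical helper and not part of mlflow or mlflow_utils:

```python
import os

REQUIRED_KEYS = (
    "MLFLOW_TRACKING_USERNAME",
    "MLFLOW_TRACKING_PASSWORD",
    "MLFLOW_TRACKING_URI",
)


def missing_mlflow_vars(environ=None):
    """Return the required variable names that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [key for key in REQUIRED_KEYS if not environ.get(key)]


missing = missing_mlflow_vars()
if missing:
    print("Not exported yet:", ", ".join(missing))
else:
    print("All mlflow variables are set.")
```

The function only reports variable names, so its output is safe to paste into a chat or an issue when asking for help.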

## Using mlflow in your projects

**Make sure that you are in the VPN or in the KIT Wifi!!**

We first need to install mlflow and, for this dummy test case, tensorflow and pandas into our current environment with `pip install mlflow tensorflow pandas`. Now let us set up mlflow and set the experiment to "demo":

In [2]:
import mlflow
from mlflow_utils.load_env import export_env

export_env()
mlflow.set_experiment("demo")

For the sake of simplicity, we demonstrate how mlflow works in conjunction with tensorflow using the Auto MPG car dataset:

In [3]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.layers.experimental import preprocessing
import pandas as pd
import numpy as np

url = 'http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data'
column_names = ['MPG', 'Cylinders', 'Displacement', 'Horsepower', 'Weight',
                'Acceleration', 'Model Year', 'Origin']

dataset = pd.read_csv(url, names=column_names,
                      na_values='?', comment='\t',
                      sep=' ', skipinitialspace=True)
dataset = dataset.dropna()
# One-hot encode origin:
dataset['Origin'] = dataset['Origin'].map({1: 'USA', 2: 'Europe', 3: 'Japan'})
dataset = pd.get_dummies(dataset, prefix='', prefix_sep='')
dataset.tail()


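|     | MPG  | Cylinders | Displacement | Horsepower | Weight | Acceleration | Model Year | Europe | Japan | USA |
|-----|------|-----------|--------------|------------|--------|--------------|------------|--------|-------|-----|
| 393 | 27.0 | 4         | 140.0        | 86.0       | 2790.0 | 15.6         | 82         | 0      | 0     | 1   |
| 394 | 44.0 | 4         | 97.0         | 52.0       | 2130.0 | 24.6         | 82         | 1      | 0     | 0   |
| 395 | 32.0 | 4         | 135.0        | 84.0       | 2295.0 | 11.6         | 82         | 0      | 0     | 1   |
| 396 | 28.0 | 4         | 120.0        | 79.0       | 2625.0 | 18.6         | 82         | 0      | 0     | 1   |
| 397 | 31.0 | 4         | 119.0        | 82.0       | 2720.0 | 19.4         | 82         | 0      | 0     | 1   |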


Let's split it into train and test datasets:

In [4]:
train_dataset = dataset.sample(frac=0.8, random_state=0)
test_dataset = dataset.drop(train_dataset.index)
train_features = train_dataset.copy()
test_features = test_dataset.copy()

train_labels = train_features.pop('MPG')
test_labels = test_features.pop('MPG')

train_dataset.describe().transpose()[['mean', 'std']]


|              | mean        | std        |
|--------------|-------------|------------|
| MPG          | 23.31051    | 7.728652   |
| Cylinders    | 5.477707    | 1.699788   |
| Displacement | 195.318471  | 104.331589 |
| Horsepower   | 104.869427  | 38.096214  |
| Weight       | 2990.251592 | 843.898596 |
| Acceleration | 15.559236   | 2.78923    |
| Model Year   | 75.898089   | 3.675642   |
| Europe       | 0.178344    | 0.383413   |
| Japan        | 0.197452    | 0.398712   |
| USA          | 0.624204    | 0.485101   |


...and fit a normalizer on the train data:

In [5]:
normalizer = preprocessing.Normalization()
normalizer.adapt(np.array(train_features))

horsepower = np.array(train_features['Horsepower'])
horsepower_normalizer = preprocessing.Normalization(input_shape=[1,])
horsepower_normalizer.adapt(horsepower)
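
Conceptually, `adapt` learns the mean and variance of each feature, and the layer then maps every input `x` to `(x - mean) / sqrt(variance)`. The plain-Python sketch below illustrates that idea for a single feature. It ignores the numerical safeguards of the real layer, `fit_normalizer` is a hypothetical helper, and the sample values are the five `Horsepower` entries from the `dataset.tail()` output above.

```python
from statistics import fmean, pvariance


def fit_normalizer(values):
    """Return a function mapping x to (x - mean) / std for the given data."""
    mean = fmean(values)
    std = pvariance(values, mean) ** 0.5  # population standard deviation
    return lambda x: (x - mean) / std


# Horsepower values of the last five rows of the dataset.
horsepower_sample = [86.0, 52.0, 84.0, 79.0, 82.0]
normalize = fit_normalizer(horsepower_sample)
scaled = [normalize(x) for x in horsepower_sample]

# After normalization the data has mean close to 0 and variance close to 1.
print(fmean(scaled), pvariance(scaled))
```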


In [6]:
def build_and_compile_model(norm):
    model = keras.Sequential([
        norm,
        layers.Dense(64, activation='relu'),
        layers.Dense(64, activation='relu'),
        layers.Dense(1)
    ])
    model.compile(loss='mean_absolute_error',
                  optimizer=tf.keras.optimizers.Adam(0.001))
    return model

In [17]:
class MlFlowCallback(tf.keras.callbacks.Callback):
    """ This Callback logs train and validation metrics to mlflow on every epoch end.
    """
    def on_epoch_end(self, epoch, logs=None):
        mlflow.log_metrics(metrics=logs, step=epoch)
        mlflow.log_metric(key="quality", value=2*epoch, step=epoch)

callbacks = [MlFlowCallback()]
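
Keras declares `logs` with a default of `None`, and a diverging training run can produce `nan` losses. If you want the callback to be robust against both, one option is to filter the dict before passing it to `mlflow.log_metrics`. The helper below is a minimal sketch under that assumption; `clean_logs` is a hypothetical name, and dropping non-finite values (instead of logging them) is a design choice you may not want.

```python
import math
from numbers import Real


def clean_logs(logs):
    """Keep only finite numeric entries of a Keras `logs` dict, as floats."""
    cleaned = {}
    for key, value in (logs or {}).items():
        is_number = isinstance(value, Real) and not isinstance(value, bool)
        if is_number and math.isfinite(value):
            cleaned[key] = float(value)
    return cleaned


print(clean_logs({"loss": 2.5, "val_loss": float("nan"), "note": "skip me"}))
print(clean_logs(None))
```

Inside `on_epoch_end` you would then call `mlflow.log_metrics(metrics=clean_logs(logs), step=epoch)`.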

In [18]:
mlflow.tensorflow.autolog()
model = build_and_compile_model(normalizer)

with mlflow.start_run():
    model.fit(train_features, train_labels,
              epochs=10,
              validation_data=(test_features, test_labels),
              callbacks=callbacks)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


Now we can head to [http://141.3.29.7](http://141.3.29.7), log in with our user data and 