# Tutorial

In [1]:
!pwd

/workspace/Experiments


## Local File store logging
By default, MLflow runs in local file store mode and writes its logging data under the current working directory. The directory listings printed below show which files each MLflow logging call creates.
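
The listings in this section come from the notebook-only `!tree` shell escape. Outside a notebook, a small standard-library helper can produce a similar view of the store. This is only a sketch: the name `render_tree` is made up for this illustration, and `mlruns` is the store directory that appears in the listings.

```python
from pathlib import Path


def render_tree(root, prefix=""):
    """Return the lines of a `tree`-style listing for the directory `root`."""
    entries = sorted(Path(root).iterdir(), key=lambda p: p.name)
    lines = []
    for index, entry in enumerate(entries):
        last = index == len(entries) - 1
        lines.append(f"{prefix}{'└── ' if last else '├── '}{entry.name}")
        if entry.is_dir():
            # Indent children; keep the vertical bar while siblings remain.
            lines.extend(render_tree(entry, prefix + ("    " if last else "│   ")))
    return lines


if __name__ == "__main__":
    store = Path("mlruns")  # local file store directory seen in the listings
    if store.exists():
        print(store)
        print("\n".join(render_tree(store)))
```

Calling it before and after a logging call shows which files that call added.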

In [2]:
import os
from random import random, randint
from mlflow import log_metric, log_param, log_artifacts

if __name__ == "__main__":
    for i in range(2):
        print(f'[{i}]-------------------')
        !tree /workspace/Experiments

        # Log a parameter (key-value pair)
        log_param(f"param{i}", randint(0, 100))
        # Log a metric; metrics can be updated throughout the run
        log_metric("foo", random())
        log_metric("foo", random() + 1)
        log_metric("foo", random() + 2)
        # Log an artifact (output file)
        if not os.path.exists("outputs"):
            os.makedirs("outputs")
        with open("outputs/test.txt", "w") as f:
            f.write("hello world!")
        log_artifacts("outputs")
        !tree /workspace/Experiments
        

[0]-------------------
/workspace/Experiments
├── install-venv.sh
├── mlflow-rnd.ipynb
├── README.md
└── run-venv.sh

0 directories, 4 files
/workspace/Experiments
├── install-venv.sh
├── mlflow-rnd.ipynb
├── mlruns
│   └── 0
│       ├── cc1bcf9a32594cef906e21def7407134
│       │   ├── artifacts
│       │   │   └── test.txt
│       │   ├── meta.yaml
│       │   ├── metrics
│       │   │   └── foo
│       │   ├── params
│       │   │   └── param0
│       │   └── tags
│       │       ├── mlflow.source.name
│       │       ├── mlflow.source.type
│       │       └── mlflow.user
│       └── meta.yaml
├── outputs
│   └── test.txt
├── README.md
└── run-venv.sh

8 directories, 13 files
[1]-------------------
/workspace/Experiments
├── install-venv.sh
├── mlflow-rnd.ipynb
├── mlruns
│   └── 0
│       ├─

## Remote Tracking Server



MLflow supports the following seven types of storage for artifacts. I use [SFTP](https://mlflow.org/docs/latest/tracking.html#sftp-server) in this experiment for convenience.
* Amazon S3 and S3-compatible storage
* Azure Blob Storage
* Google Cloud Storage
* FTP server
* `SFTP Server`
* NFS
* HDFS


The following steps set up public-key authentication for SSH and SFTP connections to the remote server.

```bash
# Generate an RSA key pair
ssh-keygen -t rsa
# Rewrite the private key in PEM format
ssh-keygen -p -m PEM -f ~/.ssh/id_rsa
# Then append the contents of ~/.ssh/id_rsa.pub to ~/.ssh/authorized_keys on the remote server
```

```text
# Append the following configuration to ~/.ssh/config
Host kingo.iptime.org
    Hostname kingo.iptime.org
    Port 7022
    User damianos
    ServerAliveCountMax 10
    ServerAliveInterval 60
    IdentityFile ~/.ssh/id_rsa
```

Because of a bug in pysftp, the MLflow source file '/opt/conda/envs/mlflow/lib/python3.7/site-packages/mlflow/store/artifact/sftp_artifact_repo.py' needs a temporary modification. Replace line 70 with the following four lines of code, which set `hostkeys` to `None` in the pysftp connection options (this disables host key verification):
```python
70             options = pysftp.CnOpts()
71             options.hostkeys=None
72             self.config['cnopts']=options
73             self.sftp = pysftp.Connection(**self.config)
```


The MLflow server runs on my host OS. This section follows the steps in the MLflow [Quickstart page](https://mlflow.org/docs/latest/quickstart.html#logging-to-a-remote-tracking-server) to test whether the MLflow client can write logs to the MLflow server running on the remote machine. To do so, install the `mlflow` and `pysftp` packages and start the server with the options shown in the code block below.

```bash
pip install pysftp
pip install "mlflow[extras]"
mkdir -p ~/mlflow/store ~/mlflow/artifact
mlflow server --port 9000 --host 0.0.0.0 \
    --backend-store-uri /home/damianos/mlflow/store \
    --default-artifact-root sftp://damianos@kingo.iptime.org:7022/home/damianos/mlflow/artifact
```
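
The `--default-artifact-root` value packs the SFTP user, host, port, and remote directory into a single URI. As a minimal standard-library sketch (the helper name `describe_artifact_root` is made up for this illustration and is not part of MLflow), the URI can be split into its parts to check that they match the `~/.ssh/config` entry:

```python
from urllib.parse import urlparse


def describe_artifact_root(uri):
    """Split an artifact-root URI into the pieces an SFTP client needs."""
    parsed = urlparse(uri)
    return {
        "scheme": parsed.scheme,
        "user": parsed.username,
        "host": parsed.hostname,
        "port": parsed.port,  # int, or None when the URI has no port
        "path": parsed.path,
    }


if __name__ == "__main__":
    root = "sftp://damianos@kingo.iptime.org:7022/home/damianos/mlflow/artifact"
    for key, value in describe_artifact_root(root).items():
        print(f"{key}: {value}")
```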


In [8]:
import pysftp

options = pysftp.CnOpts()
options.hostkeys = None  # skip host key verification, as in the patch above
with pysftp.Connection(host='kingo.iptime.org', port=7022, username='damianos', cnopts=options) as sftp:
    with sftp.cd('/home/damianos'):             # temporarily chdir to the home directory
        print(sftp.listdir())

['.bash_history', '.bash_it', '.bash_logout', '.bashrc', '.bashrc.backup', '.bashrc.bak', '.boto', '.cache', '.config', '.docker', '.exa', '.gitconfig', '.gnome', '.gnupg', '.ipynb_checkpoints', '.ipython', '.iterm2', '.iterm2_shell_integration.bash', '.iterm2_shell_integration.zsh', '.jupyter', '.keras', '.lesshst', '.local', '.mozilla', '.mypy_cache', '.npm', '.nv', '.nvm', '.oh-my-zsh', '.p10k.zsh', '.pam_environment', '.pki', '.profile', '.python_history', '.shell.pre-oh-my-zsh', '.ssh', '.sudo_as_admin_successful', '.thunderbird', '.vim', '.viminfo', '.vimrc', '.virtual_documents', '.vnc', '.vscode', '.wget-hsts', '.workspace', '.xinputrc', '.yarn', '.zcompcache', '.zcompdump', '.zcompdump-damianos-5.8', '.zplug', '.zsh_history', '.zshrc', '.zshrc.bak', 'Desktop', 'Documents', 'Downloads', 'Music', 'Pictures', 'Public', 'Templates', 'Untitled.ipynb', 'Untitled1.ipynb', 'Untitled10.ipynb', 'Untitled2.ipynb', 'Untitled3.ipynb', 'Untitled4.ipynb', 'Untitled5.ipynb', 'Untitled6.ipynb'

In [10]:
import os
from random import random, randint
from mlflow import log_metric, log_param, log_artifacts
import mlflow

# mlflow.set_tracking_uri("http://YOUR-SERVER:4040")
mlflow.set_tracking_uri("http://kingo.iptime.org:9000")
mlflow.set_experiment("my-experiment-1")

if __name__ == "__main__":
    !tree /workspace/Experiments

    with mlflow.start_run(nested=True):
        for i in range(15,20):
            print(f'[{i}]-------------------')

            # Log a parameter (key-value pair)
            log_param(f"param{i}", randint(0, 100))
            # Log a metric; metrics can be updated throughout the run
            log_metric("foo", random())
            log_metric("foo", random() + 1)
            log_metric("foo", random() + 2)
            # Log an artifact (output file)
            if not os.path.exists("outputs"):
                os.makedirs("outputs")
            with open("outputs/test.txt", "w") as f:
                f.write("hello world!")
            log_artifacts("outputs")

    !tree /workspace/Experiments


/workspace/Experiments
├── install-venv.sh
├── mlflow-rnd.ipynb
├── mlruns
│   └── 0
│       ├── cc1bcf9a32594cef906e21def7407134
│       │   ├── artifacts
│       │   │   └── test.txt
│       │   ├── meta.yaml
│       │   ├── metrics
│       │   │   └── foo
│       │   ├── params
│       │   │   ├── param0
│       │   │   └── param1
│       │   └── tags
│       │       ├── mlflow.source.name
│       │       ├── mlflow.source.type
│       │       └── mlflow.user
│       └── meta.yaml
├── outputs
│   └── test.txt
├── README.md
└── run-venv.sh

8 directories, 14 files
[15]-------------------
[16]-------------------
[17]-------------------
[18]-------------------
[19]-------------------
/workspace/Experiments
├── install-venv.sh
├── mlflow-rnd.ipynb
├── mlruns
│   └── 0
│       ├── cc1bcf9a32594cef906e21def7407134

The local directory tree is unchanged because, as expected, runs are now logged to the remote tracking server. The results appear in the remote MLflow server UI, shown in the following screenshots.

![](./mlflow1.png)

![](./mlflow2.png)

![](./mlflow3.png)

![](./mlflow4.png)

![](./mlflow5.png)
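
The same information can also be read back programmatically instead of through the UI. The following is a minimal sketch using `MlflowClient`; the helper name `summarize_runs` is made up for this illustration, and it assumes the tracking server is reachable at the URI passed in.

```python
def summarize_runs(tracking_uri, experiment_name):
    """Return (run_id, params, metrics) for every run in the named experiment."""
    # Imported lazily so the helper can be defined without MLflow installed.
    from mlflow.tracking import MlflowClient

    client = MlflowClient(tracking_uri=tracking_uri)
    experiment = client.get_experiment_by_name(experiment_name)
    if experiment is None:
        # Unknown experiment name: nothing to report.
        return []
    runs = client.search_runs(experiment_ids=[experiment.experiment_id])
    return [
        (run.info.run_id, dict(run.data.params), dict(run.data.metrics))
        for run in runs
    ]
```

For the run logged above, `summarize_runs("http://kingo.iptime.org:9000", "my-experiment-1")` should list `param15` through `param19`. The metrics dictionary holds only the latest logged value of `foo`, not its full history.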

In [5]:
!ls -lrt /tmp
!date

total 44
drwxr-xr-x 3 root    root    4096 Sep  3 20:45 v8-compile-cache-0
drwx------ 2 root    root    4096 Sep  6 12:18 ssh-MyaNkLSMmYVF
drwx------ 2 root    root    4096 Sep  6 12:18 ssh-uNVU9NZInVqe
-rw------- 1 root    root     338 Sep  6 12:18 ICEauthority
drwx------ 2 root    root    4096 Sep  6 12:18 gvfs
drwx------ 2 root    root    4096 Sep  6 12:18 dconf
drwx------ 2 root    root    4096 Sep  6 12:18 tracker-extract-files.0
srwxrwx--- 1 netdata netdata    0 Sep  6 16:15 netdata-ipc
-rw------- 1 root    root    1819 Sep  6 22:21 jupyterlab-debug-gcqo9gzt.log
drwx------ 2 root    root    4096 Sep  6 22:46 tmpuilak3q5_kernels
drwx------ 2 root    root    4096 Sep  6 22:49 tmpy3qfe135_kernels
drwx------ 2 root    root    4096 Sep  6 22:51 tmplyshyh4v_kernels
Mon 06 Sep 2021 11:14:11 PM KST


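## MLflow on remote with Tracking Server and SQLite
This example registers the model `ElasticnetWineModel`, and the Model Registry does not work with the file store, so a SQLite database is used as the backend store. Restart the remote server with the following command.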
```bash
mlflow server --port 9000 --host 0.0.0.0 \
    --backend-store-uri sqlite:///mydb.sqlite \
    --default-artifact-root sftp://damianos@kingo.iptime.org:7022/home/damianos/mlflow/artifact
```
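
In `sqlite:///mydb.sqlite`, the three slashes denote a path relative to the directory where `mlflow server` is started. To confirm that the server created its database there, the tables can be listed with the standard `sqlite3` module. This is only a sketch: the helper name `list_tables` is made up for this illustration, and it makes no assumption about which tables MLflow creates.

```python
import os
import sqlite3


def list_tables(db_path):
    """Return the names of all tables in a SQLite database file, sorted."""
    connection = sqlite3.connect(db_path)
    try:
        rows = connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        connection.close()
    return [name for (name,) in rows]


if __name__ == "__main__":
    # sqlite3.connect would create a missing file, so check for it first.
    if os.path.exists("mydb.sqlite"):
        print(list_tables("mydb.sqlite"))
```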

In [6]:
#parameters 
sysargv=[0,]

In [8]:
#  The data set used in this example is from http://archive.ics.uci.edu/ml/datasets/Wine+Quality
# P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis.
# Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009.

import os
import warnings
import sys

import pandas as pd
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.linear_model import ElasticNet
from urllib.parse import urlparse
import mlflow
import mlflow.sklearn

import logging

logging.basicConfig(level=logging.WARN)
logger = logging.getLogger(__name__)
mlflow.set_tracking_uri("http://kingo.iptime.org:9000")
mlflow.set_experiment("my-experiment-2")

def eval_metrics(actual, pred):
    rmse = np.sqrt(mean_squared_error(actual, pred))
    mae = mean_absolute_error(actual, pred)
    r2 = r2_score(actual, pred)
    return rmse, mae, r2


if __name__ == "__main__":
    warnings.filterwarnings("ignore")
    np.random.seed(40)

    # Read the wine-quality csv file from the URL
    csv_url = (
        "http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv"
    )
    try:
        data = pd.read_csv(csv_url, sep=";")
    except Exception as e:
        logger.exception(
            "Unable to download training & test CSV, check your internet connection. Error: %s", e
        )
        raise  # without the data the rest of the script cannot run

    # Split the data into training and test sets. (0.75, 0.25) split.
    train, test = train_test_split(data)

    # The predicted column is "quality" which is a scalar from [3, 9]
    train_x = train.drop(["quality"], axis=1)
    test_x = test.drop(["quality"], axis=1)
    train_y = train[["quality"]]
    test_y = test[["quality"]]

    print('sysargv',sysargv)
    alpha = float(sysargv[1]) if len(sysargv) > 1 else 0.5
    l1_ratio = float(sysargv[2]) if len(sysargv) > 2 else 0.5

    with mlflow.start_run():
        lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
        lr.fit(train_x, train_y)

        predicted_qualities = lr.predict(test_x)

        (rmse, mae, r2) = eval_metrics(test_y, predicted_qualities)

        print("Elasticnet model (alpha=%f, l1_ratio=%f):" % (alpha, l1_ratio))
        print("  RMSE: %s" % rmse)
        print("  MAE: %s" % mae)
        print("  R2: %s" % r2)

        mlflow.log_param("alpha", alpha)
        mlflow.log_param("l1_ratio", l1_ratio)
        mlflow.log_metric("rmse", rmse)
        mlflow.log_metric("r2", r2)
        mlflow.log_metric("mae", mae)

        tracking_url_type_store = urlparse(mlflow.get_tracking_uri()).scheme

        # Model registry does not work with file store
        if tracking_url_type_store != "file":

            # Register the model
            # There are other ways to use the Model Registry, which depends on the use case,
            # please refer to the doc for more information:
            # https://mlflow.org/docs/latest/model-registry.html#api-workflow
            mlflow.sklearn.log_model(lr, "model", registered_model_name="ElasticnetWineModel")
        else:
            mlflow.sklearn.log_model(lr, "model")

INFO: 'my-experiment-2' does not exist. Creating a new experiment
sysargv [0]
Elasticnet model (alpha=0.500000, l1_ratio=0.500000):
  RMSE: 0.7931640229276851
  MAE: 0.6271946374319586
  R2: 0.10862644997792614


Successfully registered model 'ElasticnetWineModel'.
2021/09/07 03:30:22 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation.                     Model name: ElasticnetWineModel, version 1
Created version '1' of model 'ElasticnetWineModel'.
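
Once a version is registered, it can be addressed with a `models:/<name>/<version>` URI and loaded back through `mlflow.pyfunc.load_model`. The sketch below assumes the tracking server and the SFTP artifact store from the earlier sections are reachable; the helper names `model_uri` and `load_registered_model` are made up for this illustration.

```python
def model_uri(name, version):
    """Build the registry URI for a specific version of a registered model."""
    return f"models:/{name}/{version}"


def load_registered_model(name, version, tracking_uri):
    """Load a registered model version as a generic pyfunc model."""
    # Imported lazily so the URI helper works without MLflow installed.
    import mlflow
    import mlflow.pyfunc

    mlflow.set_tracking_uri(tracking_uri)
    return mlflow.pyfunc.load_model(model_uri(name, version))


if __name__ == "__main__":
    print(model_uri("ElasticnetWineModel", 1))
```

The object returned by `load_registered_model("ElasticnetWineModel", 1, "http://kingo.iptime.org:9000")` exposes a `predict` method that accepts a pandas DataFrame with the same feature columns as `test_x`.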


In [9]:
# Wine Quality Sample
def train(in_alpha, in_l1_ratio):
    import os
    import warnings
    import sys

    import pandas as pd
    import numpy as np
    from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import ElasticNet

    import mlflow
    import mlflow.sklearn
    
    import logging
    logging.basicConfig(level=logging.WARN)
    logger = logging.getLogger(__name__)

    def eval_metrics(actual, pred):
        rmse = np.sqrt(mean_squared_error(actual, pred))
        mae = mean_absolute_error(actual, pred)
        r2 = r2_score(actual, pred)
        return rmse, mae, r2


    warnings.filterwarnings("ignore")
    np.random.seed(40)

    # Read the wine-quality csv file from the URL
    csv_url =\
        'http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv'
    try:
        data = pd.read_csv(csv_url, sep=';')
    except Exception as e:
        logger.exception(
            "Unable to download training & test CSV, check your internet connection. Error: %s", e)
        raise  # without the data the rest of the function cannot run

    # Split the data into training and test sets. (0.75, 0.25) split.
    train, test = train_test_split(data)

    # The predicted column is "quality" which is a scalar from [3, 9]
    train_x = train.drop(["quality"], axis=1)
    test_x = test.drop(["quality"], axis=1)
    train_y = train[["quality"]]
    test_y = test[["quality"]]

    # Set default values if no alpha is provided
    if in_alpha is None:
        alpha = 0.5
    else:
        alpha = float(in_alpha)

    # Set default values if no l1_ratio is provided
    if in_l1_ratio is None:
        l1_ratio = 0.5
    else:
        l1_ratio = float(in_l1_ratio)

    # Useful for multiple runs (only doing one run in this sample notebook)    
    with mlflow.start_run():
        # Execute ElasticNet
        lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
        lr.fit(train_x, train_y)

        # Evaluate Metrics
        predicted_qualities = lr.predict(test_x)
        (rmse, mae, r2) = eval_metrics(test_y, predicted_qualities)

        # Print out metrics
        print("Elasticnet model (alpha=%f, l1_ratio=%f):" % (alpha, l1_ratio))
        print("  RMSE: %s" % rmse)
        print("  MAE: %s" % mae)
        print("  R2: %s" % r2)

        # Log parameter, metrics, and model to MLflow
        mlflow.log_param("alpha", alpha)
        mlflow.log_param("l1_ratio", l1_ratio)
        mlflow.log_metric("rmse", rmse)
        mlflow.log_metric("r2", r2)
        mlflow.log_metric("mae", mae)

        mlflow.sklearn.log_model(lr, "model")

In [10]:
train(0.5, 0.5)


Elasticnet model (alpha=0.500000, l1_ratio=0.500000):
  RMSE: 0.7931640229276851
  MAE: 0.6271946374319586
  R2: 0.10862644997792614


In [11]:
train(0.2, 0.2)


Elasticnet model (alpha=0.200000, l1_ratio=0.200000):
  RMSE: 0.7336400911821402
  MAE: 0.5643841279275428
  R2: 0.23739466063584158


In [12]:
train(0.1, 0.1)


Elasticnet model (alpha=0.100000, l1_ratio=0.100000):
  RMSE: 0.7128829045893679
  MAE: 0.5462202174984664
  R2: 0.2799376066653344
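
The three calls above vary `alpha` and `l1_ratio` together by hand. A small driver can run a whole set of combinations, each of which becomes its own MLflow run when the training function opens one. This is a sketch only: `run_sweep` is a made-up helper, and it takes the training function as an argument so that it does not depend on the notebook's `train`.

```python
from itertools import product


def run_sweep(train_fn, alphas, l1_ratios):
    """Call train_fn(alpha, l1_ratio) for every combination; return the combinations."""
    combos = list(product(alphas, l1_ratios))
    for alpha, l1_ratio in combos:
        train_fn(alpha, l1_ratio)
    return combos


if __name__ == "__main__":
    # Stand-in training function that only prints its arguments.
    run_sweep(lambda a, l: print(f"alpha={a}, l1_ratio={l}"), [0.1, 0.2, 0.5], [0.1, 0.5])
```

In the notebook, `run_sweep(train, [0.1, 0.2, 0.5], [0.1, 0.2, 0.5])` would log nine runs. Unlike the three calls above, this covers the full grid rather than only the pairs with equal values.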
