In [1]:
!python -c "import sys; print(sys.executable)"

/home/maviaalamkhan/Downloads/maviaxloop/bin/python


# MLflow lab

In [2]:
import pandas as pd

In [3]:
pd.__version__

'2.0.0'

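### Setting up the MLflow tracking server

We start the server with an explicit backend store URI (a SQLite database for run metadata) and a default artifact root (`./mlruns`, where files such as models are written). This makes it possible to store models.

After running this command, the tracking server will be accessible at `localhost:5000`.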

In [4]:
%%bash --bg

mlflow server --host 0.0.0.0 \
    --port 5000 \
    --backend-store-uri sqlite:///mlflow.db \
    --default-artifact-root ./mlruns

### MLproject file

This file configures the steps of an MLflow project.

Using `MLproject` we can define our project's pipeline steps, called *entry points*.

Each entry point in this file corresponds to a shell command.

Entry points can be run using

```
mlflow run <PROJECT_URI> -e <ENTRY_POINT>
```

where `<PROJECT_URI>` is the project directory (for example `.`). By default, `mlflow run` runs the `main` entry point.

In [5]:
%cat MLproject

name: basic_mlflow

# this file is used to configure Python package dependencies.
# it uses Anaconda, but it can be also alternatively configured to use pip.
conda_env: conda.yaml

# entry points can be run using `mlflow run <project_name> -e <entry_point_name>`
entry_points:
  # download_data:
    # you can run any command using MLFlow
    # command: "bash download_data.sh"
  # main is the default entry point: it runs when the -e option is omitted.
  main:
    # parameters is a key-value collection.
    parameters:
      file_name:
        type: str
        default: "data.csv"
      max_n:
        type: int
        default: 100
    command: "python train.py {file_name} {max_n}"
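To make the parameter mechanism concrete, here is a minimal sketch of how the `main` command template could be filled in from its defaults and from overrides. It only mimics the substitution with Python's `str.format`; it is not MLflow's actual implementation, and `render_command` is a hypothetical helper introduced for illustration.

```python
# Illustrative only: mimics how an entry-point command template is filled in.
# `render_command` is a hypothetical helper, not part of MLflow.

# Defaults copied from the `main` entry point in the MLproject file above.
DEFAULTS = {"file_name": "data.csv", "max_n": 100}
COMMAND_TEMPLATE = "python train.py {file_name} {max_n}"


def render_command(template, defaults, overrides=None):
    """Merge overrides into the defaults and substitute them into the template."""
    params = {**defaults, **(overrides or {})}
    return template.format(**params)


# With no overrides the defaults are used.
print(render_command(COMMAND_TEMPLATE, DEFAULTS))
# Overriding max_n changes only that placeholder.
print(render_command(COMMAND_TEMPLATE, DEFAULTS, {"max_n": 10}))
```

On the command line, an override is passed with `-P`, for example `mlflow run . -P max_n=10`.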



First we need to download the data. We will use the weather data from the previous machine learning tutorial.

In [6]:
# %%bash
# source mlflow_env_vars.sh
# mlflow run .  -e download_data

Traceback (most recent call last):
  File "/home/maviaalamkhan/Documents/mlop/data_engineering_bootcamp_2303/tasks/3_machine_learning_essentials/day_4_mlops/mlops-student/bin/mlflow", line 10, in <module>
    sys.exit(cli())
  File "/home/maviaalamkhan/Documents/mlop/data_engineering_bootcamp_2303/tasks/3_machine_learning_essentials/day_4_mlops/mlops-student/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/maviaalamkhan/Documents/mlop/data_engineering_bootcamp_2303/tasks/3_machine_learning_essentials/day_4_mlops/mlops-student/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home/maviaalamkhan/Documents/mlop/data_engineering_bootcamp_2303/tasks/3_machine_learning_essentials/day_4_mlops/mlops-student/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/maviaalamkhan/Documents/mlop/da

CalledProcessError: Command 'b'source mlflow_env_vars.sh\nmlflow run .  -e download_data\n'' returned non-zero exit status 1.

## Training

Now we can train models. See `train.py`.
It contains the code from the supervised machine learning tutorial, extended to track metrics and log the model.

We will train kNN models for $k \in \{1, 2, \ldots, 10\}$ using the *temperature* and *casual* features.

After running this command, you can go to `localhost:5000` and see the trained models.
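As a rough illustration of what varying $k$ does, the following toy sketch implements k-nearest-neighbours voting in plain Python. It is not the code from `train.py`: the data points are made up, `knn_predict` is a hypothetical helper, and the binary winter label as the target is an assumption based on the prediction section later in this notebook.

```python
# Toy illustration of k-nearest-neighbours voting. The data points are invented
# and `knn_predict` is a hypothetical helper, not code from train.py.
from collections import Counter
import math

# ((temp, casual), is_winter) pairs; made-up values for illustration only.
TRAIN = [
    ((0.20, 100), True),
    ((0.25, 150), True),
    ((0.30, 300), True),
    ((0.60, 900), False),
    ((0.70, 1100), False),
    ((0.65, 1000), False),
]


def knn_predict(train, point, k):
    """Return the majority label among the k training points closest to `point`."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], point))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Try a few values of k on one query point.
for k in range(1, 4):
    print(k, knn_predict(TRAIN, (0.28, 200), k))
```

Because *casual* is in the hundreds while *temperature* lies in $[0, 1]$, the unscaled Euclidean distance here is dominated by *casual*; a real pipeline may scale the features first.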

In [None]:
import sklearn

In [None]:
sklearn.__version__

In [None]:
%%bash
source mlflow_env_vars.sh
mlflow run . 

## Inspecting stored models

The trained models are stored in `mlruns/0`.

Each run directory there contains the artifacts and configuration needed to serve its model.

In [None]:
%%bash
last_model_path=$(ls -tr mlruns/0/ | tail -1)
cat mlruns/0/$last_model_path/artifacts/knn/MLmodel
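The same lookup can be sketched in Python with `pathlib`. The directory layout (`<run_id>/artifacts/knn/MLmodel`) is taken from the shell cell above; `latest_mlmodel_path` is a hypothetical helper, and picking the newest run by modification time is an assumption that mirrors `ls -tr | tail -1`.

```python
# Sketch: locate the newest run directory under mlruns/0 and its MLmodel file.
# `latest_mlmodel_path` is a hypothetical helper; the layout follows the cell above.
from pathlib import Path


def latest_mlmodel_path(experiment_dir="mlruns/0", artifact_name="knn"):
    """Return the MLmodel path inside the most recently modified run directory."""
    run_dirs = [p for p in Path(experiment_dir).iterdir() if p.is_dir()]
    if not run_dirs:
        raise FileNotFoundError(f"no run directories in {experiment_dir}")
    # Newest by modification time, like `ls -tr | tail -1`.
    newest = max(run_dirs, key=lambda p: p.stat().st_mtime)
    return newest / "artifacts" / artifact_name / "MLmodel"
```

After a training run, `print(latest_mlmodel_path().read_text())` should show the same file as the shell cell.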

In [None]:
import mlflow

In [None]:
mlflow.__version__

## Serving model

Now that we have trained our models, we can go to the *Models* page in the MLflow UI (http://localhost:5000/#/models).

Click *sklearn_knn* on this page, choose a model version, and move it to the *Production* stage.

The following cell will serve the model on localhost at port 5001.

In [None]:
%%bash --bg
source mlflow_env_vars.sh
mlflow --version
mlflow models serve -m models:/sklearn_knn/Production -p 5001 --env-manager=conda 


## Prediction

We'll load the data that we can feed into the prediction server.

In [None]:
df = pd.read_csv("day.csv")[["temp", "casual", "season"]]
df["is_winter"] = df["season"] == 1

df[~df["is_winter"]].head()

Let's predict for the first winter day and the first non-winter day (the first rows of the winter and non-winter subsets of `df`).

**Warning: this might fail at first because the prediction server has not finished starting up; in that case, wait a minute and retry.**
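Instead of retrying by hand, one option is to poll the port until something accepts connections. The sketch below uses only the standard library; `wait_for_port` is a hypothetical helper, and the host and port are the ones used by the serving cell above.

```python
# Sketch: poll until something accepts TCP connections on the serving port.
# `wait_for_port` is a hypothetical helper; host and port match the serving cell.
import socket
import time


def wait_for_port(host="127.0.0.1", port=5001, timeout=120.0, interval=2.0):
    """Return True once a TCP connection succeeds, or False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet: give up after the deadline, otherwise retry.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

An open port only shows that the server process is listening, so the first request may still be slow while the model loads.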

In [None]:
%%bash
data='[[0.344,331], [0.43, 401]]'
echo $data

curl -d "{\"inputs\": $data}" -H 'Content-Type: application/json' 127.0.0.1:5001/invocations

In [None]:
%%bash
data='[[0.344,331], [0.43, 401]]'
echo $data

curl -d "{\"instances\": $data}" -H 'Content-Type: application/json' 127.0.0.1:5001/invocations

In [None]:
%%bash
data='[[0.344,331], [0.43, 401]]'
columns='["temp","casual"]'
echo $data

curl -d "{\"dataframe_split\": {\"columns\": $columns, \"data\": $data}}" -H 'Content-Type: application/json' 127.0.0.1:5001/invocations

Voilà! The model server returns a prediction for each input row.
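The three request formats used in the `curl` cells can also be built from Python. The following is a minimal sketch using only the standard library; `build_payloads` and `post_json` are hypothetical helpers, and the URL is the one the serving cell exposes.

```python
# Sketch: build the three request bodies used in the curl cells.
# `build_payloads` and `post_json` are hypothetical helpers; the URL matches
# the serving cell above.
import json
import urllib.request

ROWS = [[0.344, 331], [0.43, 401]]
COLUMNS = ["temp", "casual"]


def build_payloads(rows, columns):
    """Return the request bodies keyed by the format name used in the curl cells."""
    return {
        "inputs": {"inputs": rows},
        "instances": {"instances": rows},
        "dataframe_split": {"dataframe_split": {"columns": columns, "data": rows}},
    }


def post_json(payload, url="http://127.0.0.1:5001/invocations"):
    """POST a JSON body and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Print each body; nothing is sent until post_json is called.
for name, body in build_payloads(ROWS, COLUMNS).items():
    print(name, json.dumps(body))
```

With the prediction server running, `post_json(build_payloads(ROWS, COLUMNS)["dataframe_split"])` sends the same request as the last `curl` cell.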