# Set up MLFlow Tracking

<img src="https://full-stack-assets.s3.eu-west-3.amazonaws.com/images/mlflow-tracking.png" width="600" />

## What you will learn in this course 🧐🧐

Let's start with MLFlow Tracking. This component of MLFlow lets you collaborate on building Machine Learning models by tracking every important metric of your models while you train them.

* What is MLFlow Tracking 
* How to install MLFlow
* Set up MLFlow tracking on a remote server using Docker & Heroku

## What is MLFlow Tracking?

MLFlow Tracking helps you:

* Monitor your ML training runs,
* Log parameters for hyper-parameter tuning,
* Log metrics for assessing model performance.

When you work in a team, an MLFlow tracking server is set up and all data scientists log their runs to it while building their models. This is what we will build in this course.

> 👋 From now on, we will be using a little bit of vocabulary that you need to be familiar with to understand the rest of the course: 
> * **Experiment**: We qualify anything related to building an ML model as an experiment. 
> * **Persisting**: Saving a model as a set of files so that it can be used in a production environment. 
> * **Serve a model**: Relates to using a model in a production environment.

## Set up an MLFlow remote server

At the heart of MLFlow is the idea of **collaboration**. Using it only locally would leave most of its capabilities unused. Therefore, we'll build a **remote tracking server** to use MLFlow tracking to its fullest. Here is the architecture we will need to build: 

![crack](https://full-stack-assets.s3.eu-west-3.amazonaws.com/Deployment/MLFlow-tracking-server.png)

Let's explain this diagram a little bit: 

* **MLFlow (tracking) Server**: This is the first step. We will need to build a tracking server using [Heroku](https://www.heroku.com/) & [Docker](https://docker.com) so that anybody in our Data Science team will be able to monitor their experiments. 

* **Backend Store**: All the information related to an experiment needs to be stored in a SQL database, which MLFlow accesses through [SQLAlchemy](https://app.jedha.co/course/etl-processes-ft/sqlalchemy-ft). For this, we will be using a [Heroku Postgres](https://data.heroku.com/) datastore. It's free and works exactly like a PostgreSQL DB.

* **Artifact Store**: Finally, each time anybody in your team trains a model, we want to persist it somewhere so that anybody can use it and serve it. We will use an S3 bucket for this.


Although it seems complicated at first glance, it's actually not. We'll be using technologies that we already know about: 

![crack](https://full-stack-assets.s3.eu-west-3.amazonaws.com/Deployment/MLFlow-tracking-server-Technologies.drawio.png)

### Example App 

Before we start, you can check out the end result on this app:

* [Check out the MLFlow remote tracking server](https://sample-mlflow-app.herokuapp.com/)

> 👋 It might take up to 2 minutes to open since the app is hosted on a Heroku free dyno. 

### Build your own remote server 🏗️

Alright, let's build our own remote server. You will need to follow these steps:

#### **Step-1a**: Build a Docker container 

Your first step is to build a Docker container by writing a `Dockerfile`. Here is an example you can follow: 

```Dockerfile
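FROM continuumio/miniconda3

WORKDIR /home/app

# Install system tools (-y answers the confirmation prompt so the build does not stop)
RUN apt-get update && apt-get install -y nano unzip curl

# Install the Deta CLI (not needed by MLFlow itself)
RUN curl -fsSL https://get.deta.dev/cli.sh | sh

# Install the AWS CLI v2
RUN curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
RUN unzip awscliv2.zip
RUN ./aws/install

# Install Python dependencies
COPY requirements.txt /dependencies/requirements.txt
RUN pip install -r /dependencies/requirements.txt

# Declared here for reference; the real values are passed at runtime
# (docker run -e ... locally, config vars on Heroku)
ENV AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID
ENV AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY
ENV BACKEND_STORE_URI=$BACKEND_STORE_URI
ENV ARTIFACT_STORE_URI=$ARTIFACT_STORE_URI

# Start the tracking server on the port given by $PORT
CMD mlflow server -p $PORT \
    --host 0.0.0.0 \
    --backend-store-uri $BACKEND_STORE_URI \
    --default-artifact-root $ARTIFACT_STORE_URI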
```

This `Dockerfile` starts from the [`miniconda`](https://hub.docker.com/r/continuumio/miniconda3) image, to which we add:

* `aws` cli 
* `nano` - to be able to edit files directly from console
* `curl` - to be able to download files
* `unzip` - to be able to unzip files 
* `requirements.txt` - the Python dependencies, which contains:
    ```
    boto3
    pandas
    gunicorn
    streamlit
    scikit-learn
    matplotlib
    seaborn
    plotly
    mlflow
    psycopg2-binary
    ```
* Several *environment variables*, which we will set on Heroku later on 
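
To make the last instruction of the `Dockerfile` more concrete, here is a minimal Python sketch of how the environment variables end up in the `mlflow server` command. The function name `mlflow_server_command` and the example values are hypothetical; only the flags come from the `Dockerfile` above.

```python
import shlex


def mlflow_server_command(env):
    """Build the argument list that the Dockerfile's CMD yields once variables are expanded."""
    return [
        "mlflow", "server",
        "-p", env["PORT"],
        "--host", "0.0.0.0",
        "--backend-store-uri", env["BACKEND_STORE_URI"],
        "--default-artifact-root", env["ARTIFACT_STORE_URI"],
    ]


# Hypothetical values, for illustration only
example_env = {
    "PORT": "4000",
    "BACKEND_STORE_URI": "postgresql://user:secret@host.example.com:5432/dbname",
    "ARTIFACT_STORE_URI": "s3://my-example-bucket/mlflow-artifacts/",
}

# Print the command as a single shell-style line
print(shlex.join(mlflow_server_command(example_env)))
```

If one of the variables is missing, the lookup raises a `KeyError`, which mirrors the fact that the server cannot be configured properly without it.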


Once your `Dockerfile` is ready, you can build your image:

* `docker build . -t sample-mlflow-server`

Then you can run your container:

```
docker run -it \
  -p 4000:4000 \
  -v "$(pwd):/home/app" \
  -e PORT=4000 \
  -e AWS_ACCESS_KEY_ID=YOUR_AWS_ACCESS_KEY_ID \
  -e AWS_SECRET_ACCESS_KEY=YOUR_AWS_SECRET_ACCESS_KEY \
  -e BACKEND_STORE_URI=YOUR_BACKEND_STORE_URI \
  -e ARTIFACT_STORE_URI=YOUR_ARTIFACT_STORE_URI \
  sample-mlflow-server
```

You should see output similar to the following:

```
[2022-01-02 17:01:40 +0000] [15] [INFO] Starting gunicorn 20.1.0
[2022-01-02 17:01:40 +0000] [15] [INFO] Listening at: http://0.0.0.0:4000 (15)
[2022-01-02 17:01:40 +0000] [15] [INFO] Using worker: sync
[2022-01-02 17:01:40 +0000] [16] [INFO] Booting worker with pid: 16
[2022-01-02 17:01:40 +0000] [17] [INFO] Booting worker with pid: 17
[2022-01-02 17:01:40 +0000] [18] [INFO] Booting worker with pid: 18
[2022-01-02 17:01:40 +0000] [19] [INFO] Booting worker with pid: 19
```

Now open your browser and go to `http://localhost:4000` (or `http://0.0.0.0:4000`). You should see the MLFlow UI appear on your screen (if it does not work, do not worry and keep following the demo):

![crack](https://full-stack-assets.s3.eu-west-3.amazonaws.com/Deployment/MLFlow_local_server.png)


If you see it, you are done with half of the first step 🎉
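
If you prefer checking from a script rather than a browser, the sketch below sends a plain HTTP request to the server and reports whether it answered. It assumes the container publishes port 4000 as in the command above; `server_is_up` is a hypothetical helper name, not part of MLFlow.

```python
import http.client
import urllib.error
import urllib.request


def server_is_up(url, timeout=5):
    """Return True if an HTTP GET on `url` succeeds, False if it fails or times out."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, HTTP error status, malformed response...
        return False


# Assumes the container from the previous step is running locally
print(server_is_up("http://localhost:4000", timeout=2))
```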

> 👋 If you move around your app, you will see some bugs appearing. This is completely normal: we haven't created our Backend & Artifact Stores yet, nor set them in our environment variables. We will do that later on. 

#### **Step-1b**: Ship your container to Heroku 

Now that your container works locally, you can directly ship it to Heroku: 

* `heroku create APP_NAME` - Create your app
* `heroku container:login` - Make sure Heroku and Docker can talk to each other
* `heroku container:push web -a APP_NAME` - Push your container to Heroku 
* `heroku container:release web -a APP_NAME` - Release your container 
* `heroku open -a APP_NAME` - Open your app in your web browser.
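
If you redeploy often, the same sequence can be scripted. This is only a sketch: `heroku_deploy_commands` and `deploy` are hypothetical helpers, `my-mlflow-server` is a placeholder app name, and the script assumes the Heroku CLI is installed and logged in. By default it only prints the commands (dry run).

```python
import subprocess


def heroku_deploy_commands(app_name):
    """Return the Heroku CLI commands listed above, in order, as argument lists."""
    return [
        ["heroku", "create", app_name],
        ["heroku", "container:login"],
        ["heroku", "container:push", "web", "-a", app_name],
        ["heroku", "container:release", "web", "-a", app_name],
        ["heroku", "open", "-a", app_name],
    ]


def deploy(app_name, dry_run=True):
    """Print each command; execute it too when dry_run is False."""
    for command in heroku_deploy_commands(app_name):
        print(" ".join(command))
        if not dry_run:
            # check=True stops the sequence at the first failing step
            subprocess.run(command, check=True)


# Dry run: nothing is executed, the commands are only printed
deploy("my-mlflow-server")
```

Call `deploy("my-mlflow-server", dry_run=False)` from the folder containing your `Dockerfile` to actually run the steps.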

Now you should see the exact same application with a real URL this time. 

> 👋 Again, you will see bugs in your app if you navigate it, for the exact same reason as in `step-1a`.

#### **Step-2**: Create your Backend store 

Now, let's create our Backend store. To do so: 

1. [Install Heroku Postgres](https://elements.heroku.com/addons/heroku-postgresql) on your Heroku account
2. On the submission form, select a **Hobby Dev - Free** plan and choose the application you need to attach your DB to. 

![crack](https://full-stack-assets.s3.eu-west-3.amazonaws.com/Deployment/Heroku_postgres_installation.png)

3. You should then land on this screen:

![crack](https://full-stack-assets.s3.eu-west-3.amazonaws.com/Deployment/Heroku_postgres_dashboard.png)

4. Now click on `Heroku postgres` > `Settings` > `View Credentials` to see the backend store URI, which you will need to copy/paste into your environment variable:

![crack](https://full-stack-assets.s3.eu-west-3.amazonaws.com/Deployment/Heroku_postgres_URI.png)

> 👋 **IMPORTANT**: your URI looks something like this: `postgres://...`. **YOU NEED TO REPLACE IT WITH** `postgresql://...` in your `$BACKEND_STORE_URI` environment variable, because SQLAlchemy (since version 1.4) no longer accepts the `postgres://` scheme.
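
The scheme replacement described above can be sketched in a few lines of Python. The helper name `to_sqlalchemy_uri` and the example credentials are hypothetical; the only rule taken from this course is `postgres://` becoming `postgresql://`.

```python
def to_sqlalchemy_uri(uri: str) -> str:
    """Rewrite a leading 'postgres://' to 'postgresql://'; leave any other URI untouched."""
    old_scheme, new_scheme = "postgres://", "postgresql://"
    if uri.startswith(old_scheme):
        return new_scheme + uri[len(old_scheme):]
    return uri


# Made-up credentials, for illustration only
heroku_uri = "postgres://user:secret@host.example.com:5432/dbname"
print(to_sqlalchemy_uri(heroku_uri))
```

Only the prefix is rewritten, so the user, password, host, port and database name are passed through unchanged.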

<Video video='https://vimeo.com/661767872' />


#### **Step-3**: Create your Artifact Store

To create your artifact store, simply [create an S3 Bucket](https://app.jedha.co/course/data-storage-ft/simple-storage-service-ft). Then look for your bucket's URI: 

![snap](https://full-stack-assets.s3.eu-west-3.amazonaws.com/Deployment/S3_URI.png)
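
Before pasting the URI anywhere, it can help to check that it has the expected shape. The sketch below assumes the usual `s3://bucket-name/optional/prefix` form; `split_s3_uri` is a hypothetical helper and `my-example-bucket` is a made-up bucket name.

```python
from urllib.parse import urlparse


def split_s3_uri(uri: str):
    """Split 's3://bucket/some/prefix' into (bucket, prefix); raise ValueError otherwise."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {uri!r}")
    # The path starts with '/', which is not part of the object prefix
    return parsed.netloc, parsed.path.lstrip("/")


# Made-up bucket name and prefix
print(split_s3_uri("s3://my-example-bucket/mlflow-artifacts/"))
```

A typical mistake this catches is copying the bucket's `https://...` object URL instead of its `s3://...` URI.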

#### **Step-4**: Paste your environment variables at the right place

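Alright, we have all the components we need. The final thing to do is to set up **environment variables**. On Heroku, set them as config vars of your app (`Settings` > `Reveal Config Vars`, or `heroku config:set NAME=value -a APP_NAME`). You need to gather the following: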

* Your [AWS credentials](https://app.jedha.co/course/data-storage-ft/iam-ft). Especially:
    * `AWS_ACCESS_KEY_ID`
    * `AWS_SECRET_ACCESS_KEY`
* Your `BACKEND_STORE_URI` (on Heroku)
* Your `ARTIFACT_STORE_URI` (Your S3 Bucket)
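
As a quick sanity check before starting the server, the sketch below lists which of these variables are still missing from the current environment. `REQUIRED_VARS` and `missing_variables` are hypothetical names; the variable names themselves are the ones listed above.

```python
import os

REQUIRED_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "BACKEND_STORE_URI",
    "ARTIFACT_STORE_URI",
)


def missing_variables(env, required=REQUIRED_VARS):
    """Return the names in `required` that are absent or empty in the mapping `env`."""
    return [name for name in required if not env.get(name)]


missing = missing_variables(os.environ)
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All environment variables are set.")
```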

> 👋 Also, if you want to run your app locally using Docker, you can simply run the following command:
> ```
> docker run -it \
>   -p 4000:4000 \
>   -v "$(pwd):/home/app" \
>   -e PORT=4000 \
>   -e AWS_ACCESS_KEY_ID=YOUR_AWS_ACCESS_KEY_ID \
>   -e AWS_SECRET_ACCESS_KEY=YOUR_AWS_SECRET_ACCESS_KEY \
>   -e BACKEND_STORE_URI=YOUR_BACKEND_STORE_URI \
>   -e ARTIFACT_STORE_URI=YOUR_ARTIFACT_STORE_URI \
>   sample-mlflow-server
> ```
>
> 👋 👋 **IMPORTANT**: Again, for your `BACKEND_STORE_URI`, you need to replace `postgres://...` with `postgresql://...`

## Try MLFlow tracking

Now, if everything works correctly, we should be able to run an experiment and our server should be able to track it. Create a `train.py` file and copy/paste the following code:

```python
import os
import mlflow
from mlflow import log_metric, log_param, log_artifacts
from random import random, randint

# Set tracking URI to your Heroku application
mlflow.set_tracking_uri(os.environ["APP_URI"])

if __name__ == "__main__":
    # Log a parameter (key-value pair)
    log_param("param1", randint(0, 100))

    # Log a metric; metrics can be updated throughout the run
    log_metric("foo", random())
    log_metric("foo", random() + 1)
    log_metric("foo", random() + 2)

    # Log an artifact (output file)
    os.makedirs("outputs", exist_ok=True)
    with open("outputs/test.txt", "w") as f:
        f.write("hello world!")
    log_artifacts("outputs")
```

Run your code using Docker and the image you built from your `Dockerfile` at the beginning of this tutorial. Here is the command: 

```
docker run -it \
  -p 4000:4000 \
  -v "$(pwd):/home/app" \
  -e APP_URI="YOUR_APP_URI" \
  -e AWS_ACCESS_KEY_ID="YOUR_AWS_ACCESS_KEY_ID" \
  -e AWS_SECRET_ACCESS_KEY="YOUR_AWS_SECRET_ACCESS_KEY" \
  sample-mlflow-server python train.py
```

Go back to your Heroku app and refresh the page: you should see that a new run has appeared 😉

> 👋 If, for some reason, your image is not working, use Jedha's image `jedha/sample-mlflow-server`

## Resources 📚📚

* [SQLAlchemy Core Engine](https://docs.sqlalchemy.org/en/14/core/engines.html#database-urls)
* [Heroku Postgres](https://devcenter.heroku.com/articles/heroku-postgresql)
* [Connecting to Heroku Postgres Databases from Outside of Heroku](https://devcenter.heroku.com/articles/connecting-to-heroku-postgres-databases-from-outside-of-heroku)
* [Set Up MLflow on AWS EC2 Using Docker, S3, and RDS](https://aws.plainenglish.io/set-up-mlflow-on-aws-ec2-using-docker-s3-and-rds-90d96798e555)