### MLflow Project Example

<table>
  <tr><td>
    <img src="https://raw.githubusercontent.com/dmatrix/mlflow-workshop-part-2/master/images/project.png"
         alt="MLflow Project" width="400">
  </td></tr>
</table>

An MLflow Project is a format for packaging data science code in a reusable and reproducible way, based primarily on conventions. 

In addition, the Projects component includes an API and command-line tools for running projects, making it possible to chain together projects into workflows.
You can run projects in several ways:

* From the command line: ```mlflow run git://<my_project> -P <arg>=<value> ... -P <arg>=<value>```
* From a local clone of a GitHub repo: ```cd <github_project_directory>; mlflow run . -e <entry_point> -P <arg>=<value> ... -P <arg>=<value>```
* Programmatically, with the fluent API: ```mlflow.run("git://<my_project>", parameters={'arg': value, 'arg': value})```
* Programmatically, with the Projects API: ```mlflow.projects.run("git://<my_project>", parameters={'arg': value, 'arg': value})```
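
The command-line forms above can also be assembled from a Python dictionary, which may help when the same parameters feed both the CLI and the API. The following is a minimal sketch, not part of MLflow: `build_run_command` is a hypothetical helper that relies only on the `-e` and `-P` options shown in the list.

```python
from typing import Dict, List, Optional


def build_run_command(uri: str,
                      parameters: Dict[str, object],
                      entry_point: Optional[str] = None) -> List[str]:
    """Build the argument list for an `mlflow run` call (hypothetical helper)."""
    command = ["mlflow", "run", uri]
    if entry_point is not None:
        # -e selects a named entry point of the project
        command += ["-e", entry_point]
    for name, value in parameters.items():
        # each parameter becomes its own -P name=value pair
        command += ["-P", f"{name}={value}"]
    return command


cmd = build_run_command("git://<my_project>", {"alpha": 0.3})
print(" ".join(cmd))  # mlflow run git://<my_project> -P alpha=0.3
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` once the placeholder URI is replaced with a real project location.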

### What Does an MLflow Project Look Like?


[MLflow Project Example](https://github.com/mlflow/mlflow-example)
 * MLproject
 * conda.yaml
 * code ...
 * data
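
Before running a local checkout, it can be useful to confirm that the conventional files are present. The sketch below assumes the descriptor file is named `MLproject`, as in the linked example repository; `missing_project_files` is a hypothetical helper, not an MLflow function.

```python
from pathlib import Path
from typing import List

# Conventional file names, taken from the layout listed above.
EXPECTED_FILES = ("MLproject", "conda.yaml")


def missing_project_files(project_dir: str) -> List[str]:
    """Return the conventional project files that are absent from project_dir."""
    root = Path(project_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


# An empty list means both files were found in the current directory.
print(missing_project_files("."))
```

This only checks for file names; it does not validate the contents of either file.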

In [2]:
import mlflow
import warnings
from mlflow import projects
print(mlflow.__version__)

1.10.0


#### Configure the Databricks CLI

Credentials need to be configured here only when running in a Databricks notebook. On localhost this step is unnecessary.

Define arguments for alpha

In [3]:
parameters = [{'alpha': 0.3}]
ml_project_uri = "git://github.com/mlflow/mlflow-example.git"

Use MLflow Fluent API
 * [mlflow.run(...)](https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.run)
 * Returns [SubmittedRun](https://mlflow.org/docs/latest/python_api/mlflow.projects.html#mlflow.projects.SubmittedRun) object

In [4]:
warnings.filterwarnings("ignore", category=DeprecationWarning)
# Run the project once for each parameter set in the list
for param in parameters:
  print(f"Running with param = {param}")
  res_sub = mlflow.run(ml_project_uri, parameters=param)
  print(f"status={res_sub.get_status()}")
  print(f"run_id={res_sub.run_id}")

2020/08/18 14:37:59 INFO mlflow.projects.utils: === Fetching project from git://github.com/mlflow/mlflow-example.git into /var/folders/jz/qg062ynx5v39wwmfxmph5nn40000gp/T/tmpxglrl4tq ===


Running with param = {'alpha': 0.3}


2020/08/18 14:38:02 INFO mlflow.projects: === Creating conda environment mlflow-1abc00771765dd9dd15731cbda4938c765fbb90b ===
2020/08/18 14:38:54 INFO mlflow.projects: === Created directory /var/folders/jz/qg062ynx5v39wwmfxmph5nn40000gp/T/tmpt5aklxoc for downloading remote URIs passed to arguments of type 'path' ===
2020/08/18 14:38:54 INFO mlflow.projects: === Running command 'source /Applications/anaconda3/bin/../etc/profile.d/conda.sh && conda activate mlflow-1abc00771765dd9dd15731cbda4938c765fbb90b 1>&2 && python train.py 0.3 0.1' in run with ID '636d5a64af3a4fef81fbc21f903a631b' === 
2020/08/18 14:38:56 INFO mlflow.projects: === Run (ID '636d5a64af3a4fef81fbc21f903a631b') succeeded ===


status=FINISHED
run_id=636d5a64af3a4fef81fbc21f903a631b


Use MLflow Project API
 * [mlflow.projects.run(...)](https://mlflow.org/docs/latest/python_api/mlflow.projects.html#mlflow.projects.run)
 * Returns [SubmittedRun](https://mlflow.org/docs/latest/python_api/mlflow.projects.html#mlflow.projects.SubmittedRun) object

In [5]:
warnings.filterwarnings("ignore", category=DeprecationWarning)
# Run the project once for each parameter set in the list
for param in parameters:
  print(f"Running with param = {param}")
  res_sub = projects.run(ml_project_uri, parameters=param)
  print(f"status={res_sub.get_status()}")
  print(f"run_id={res_sub.run_id}")

2020/08/18 14:39:08 INFO mlflow.projects.utils: === Fetching project from git://github.com/mlflow/mlflow-example.git into /var/folders/jz/qg062ynx5v39wwmfxmph5nn40000gp/T/tmpogyg1jlb ===


Running with param = {'alpha': 0.3}


2020/08/18 14:39:11 INFO mlflow.projects: === Created directory /var/folders/jz/qg062ynx5v39wwmfxmph5nn40000gp/T/tmp_p9vjazd for downloading remote URIs passed to arguments of type 'path' ===
2020/08/18 14:39:11 INFO mlflow.projects: === Running command 'source /Applications/anaconda3/bin/../etc/profile.d/conda.sh && conda activate mlflow-1abc00771765dd9dd15731cbda4938c765fbb90b 1>&2 && python train.py 0.3 0.1' in run with ID '2fbaef8c4bfa422d8bef45f33b9f8e24' === 
2020/08/18 14:39:12 INFO mlflow.projects: === Run (ID '2fbaef8c4bfa422d8bef45f33b9f8e24') succeeded ===


status=FINISHED
run_id=2fbaef8c4bfa422d8bef45f33b9f8e24


### Check the MLflow UI
 * Add Notes & Tags
 * Compare runs: pick the two best runs
 * Annotate with descriptions and tags
 * Evaluate the best run
 * Check for conda.yaml
 * Check for Metrics

In [None]:
!mlflow ui

[2020-08-18 14:39:53 -0700] [2814] [INFO] Starting gunicorn 20.0.4
[2020-08-18 14:39:53 -0700] [2814] [INFO] Listening at: http://127.0.0.1:5000 (2814)
[2020-08-18 14:39:53 -0700] [2814] [INFO] Using worker: sync
[2020-08-18 14:39:53 -0700] [2821] [INFO] Booting worker with pid: 2821


### Exercise Assignment. Try different runs:
* Change the `alpha` parameter value or add more `alpha` values
* Vary the range and increments of the project's other parameter, `l1_ratio`, across runs
* Check in the MLflow UI whether the metrics are affected
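
One way to approach the exercise is to generate the list of parameter sets instead of writing each dictionary by hand. The sketch below is an illustration, not part of MLflow: `parameter_grid` is a hypothetical helper, the values are arbitrary, and `l1_ratio` is assumed to be a parameter the example project accepts.

```python
from itertools import product
from typing import Any, Dict, Iterable, List


def parameter_grid(options: Dict[str, Iterable[Any]]) -> List[Dict[str, Any]]:
    """Return one parameter dict per combination of the given values."""
    names = list(options)
    combos = product(*(options[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


# Illustrative values only; `l1_ratio` is an assumed parameter name.
parameters = parameter_grid({"alpha": [0.3, 0.5, 0.7], "l1_ratio": [0.1, 0.2]})
print(len(parameters))  # 6
print(parameters[0])    # {'alpha': 0.3, 'l1_ratio': 0.1}
```

The resulting `parameters` list has the same shape as the one defined earlier in the notebook, so the existing loops can iterate over it unchanged.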