# PyData PDX Lightning Talk: Experiment Tracking with MLflow


## What is MLflow?

<img src='images/mlflow.png' width=500>

MLflow is an OSS (Linux Foundation) ML platform. "Platform" here means it *supports* standardization and best practices across ML-related tasks:
* Experiment tracking
* Artifact repository (e.g., trained models)
* Multistep workflow support
* Model registry (lineage, versioning, etc.)
* Model serving

__Where it fits into the industry__
* Created in 2018
* Work-in-progress: it doesn't do "everything" and continues to evolve
* Alternatives: many proprietary ML platforms; YMMV
* Many OSS pieces, but only a few OSS platforms: cf. Kubeflow, Seldon Core, Pachyderm, others
    * Also "works in progress"

__Why MLflow today?__
* Start lightweight (most alternatives assume and require Kubernetes)
* Scale if needed

## Today: First Steps with Experiment Tracking

__What is experiment tracking?__

Collecting information about ML trials so that we can compare, review, and generally maintain order.

Specifically, we may want to log:
* Model configuration / hyperparams
* Training progress
* Train/validation/test scores
* Arbitrary properties with business value, such as "dollar cost to train" or "cost per 1000 predictions"
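
To make the list above concrete before bringing in any tool, here is a minimal, tool-agnostic sketch of what a tracked trial might hold. The `Trial` class, its field names, and all values are hypothetical and made up for illustration; this is not MLflow's API or data model.

```python
from dataclasses import dataclass, field


@dataclass
class Trial:
    """One ML trial. Hypothetical record type, not part of any library."""
    name: str
    params: dict = field(default_factory=dict)   # model configuration / hyperparams
    history: dict = field(default_factory=dict)  # training progress: metric name -> list of values
    scores: dict = field(default_factory=dict)   # final train/validation/test scores
    extras: dict = field(default_factory=dict)   # properties with business value, e.g. cost

    def log_progress(self, key, value):
        # Append to the metric's history so the whole training curve is kept.
        self.history.setdefault(key, []).append(value)


# Two made-up trials that differ in a single hyperparameter.
a = Trial("trial-a", params={"learning_rate": 0.1})
b = Trial("trial-b", params={"learning_rate": 0.01})

for loss in (0.9, 0.6, 0.5):
    a.log_progress("train_loss", loss)
for loss in (0.9, 0.7, 0.4):
    b.log_progress("train_loss", loss)

a.scores["validation_accuracy"] = 0.81
b.scores["validation_accuracy"] = 0.86
a.extras["dollar_cost_to_train"] = 1.20
b.extras["dollar_cost_to_train"] = 3.40

# "Compare and review": pick the trial with the best validation score.
best = max([a, b], key=lambda t: t.scores["validation_accuracy"])
print(best.name, best.params, best.extras)
```

A tracking tool does essentially this bookkeeping for us, and adds persistent storage and a UI for comparing trials.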

## Hello Tracking World with MLflow

1. Open a terminal from the JupyterLab launcher and run `mlflow ui` (it serves on port 5000 by default)
2. Open a new browser tab at the current URL, *but* replace `lab` in the address with `proxy/5000/`
3. Let's record something:

In [3]:
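from mlflow import log_metric, log_param, end_run

# With no active run, the first log_* call implicitly starts a new one.
log_param("foo_setting", 42)
log_param("bar_setting", 43)

# Logging the same metric key repeatedly records a history of values for the run.
for i in range(10):
    log_metric("score", i)

# Close the run so it is marked as finished in the UI.
end_run()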

4. Check the UI (you may have to refresh the UI tab)

This "minimally invasive on-ramp" is a great feature of MLflow. We can do more things, and more complicated things, but we don't *have* to on day one.