# Lab: Grid Search with MLflow

## ![Spark Logo Tiny](https://files.training.databricks.com/images/105/logo_spark_tiny.png) In this lab you:<br>
 - Import the housing data
 - Perform grid search using scikit-learn
 - Log the best model on MLflow
 - Load the saved model
 
## Prerequisites
- Web browser: Chrome
- A cluster configured with **8 cores** and **DBR 7.3 ML**

## ![Spark Logo Tiny](https://files.training.databricks.com/images/105/logo_spark_tiny.png) Classroom-Setup

For each lesson to execute correctly, please make sure to run the **`Classroom-Setup`** cell at the<br/>
start of each lesson (see the next cell) and the **`Classroom-Cleanup`** cell at the end of each lesson.

In [0]:
%run "../Includes/Classroom-Setup"

## Data Import

Load the same Airbnb data and create a train/test split.

In [0]:
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("/dbfs/mnt/training/airbnb/sf-listings/airbnb-cleaned-mlflow.csv")
X_train, X_test, y_train, y_test = train_test_split(df.drop(["price"], axis=1), df[["price"]].values.ravel(), random_state=42)

## Perform Grid Search using scikit-learn

We want to know which combination of hyperparameter values is the most effective. Fill in the code below to perform <a href="https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html#sklearn.model_selection.GridSearchCV" target="_blank"> grid search using `sklearn`</a> over the 2 hyperparameters we looked at in the 02 notebook, `n_estimators` and `max_depth`.
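
To make the cost of a grid concrete, here is a minimal sketch of how a parameter dictionary expands into candidate combinations. The candidate values are hypothetical and chosen only for illustration; they are not the lab's solution. The sketch uses only the standard library rather than `GridSearchCV` itself.

```python
from itertools import product

# Hypothetical candidate values, chosen only for illustration.
param_grid = {"n_estimators": [50, 100], "max_depth": [5, 10, 20]}
cv_folds = 3  # same fold count as cv=3 in the exercise cell

# A grid search tries every combination: the Cartesian product of the value lists.
names = list(param_grid)
combinations = [
    dict(zip(names, values))
    for values in product(*(param_grid[name] for name in names))
]

for combo in combinations:
    print(combo)

# Each combination is fitted once per cross-validation fold.
total_fits = len(combinations) * cv_folds
print("combinations:", len(combinations), "cross-validation fits:", total_fits)
```

With the default `refit=True`, `GridSearchCV` also fits the best combination once more on the full training set, so every extra value in a list multiplies the training time.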

In [0]:
# TODO
import mlflow.sklearn
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# dictionary containing hyperparameter names and list of values we want to try
parameters = {'n_estimators': #FILL_IN , 
              'max_depth': #FILL_IN }

rf = RandomForestRegressor()
grid_rf_model = GridSearchCV(rf, parameters, cv=3)
grid_rf_model.fit(X_train, y_train)

best_rf = grid_rf_model.best_estimator_
for p in parameters:
  print("Best '{}': {}".format(p, best_rf.get_params()[p]))

## Log Best Model on MLflow

Log the best model as `grid-random-forest-model`, its parameters, and its MSE metric under a run with name `RF-Grid-Search` in our new MLflow experiment.
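
As a reminder of what the MSE metric measures, the sketch below computes it by hand on a few made-up prices. The values and the helper name `mean_squared_error_manual` are hypothetical; in the lab, `sklearn.metrics.mean_squared_error` should give the same number for the same inputs.

```python
def mean_squared_error_manual(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sum(squared_errors) / len(squared_errors)

# Made-up prices for illustration only.
y_true = [100.0, 150.0, 200.0]
y_pred = [110.0, 140.0, 230.0]

mse = mean_squared_error_manual(y_true, y_pred)
print("MSE:", mse)
```

The result is a single float, which is the form `mlflow.log_metric` expects for a metric value.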

In [0]:
# TODO
from sklearn.metrics import mean_squared_error

with mlflow.start_run(run_name= FILL_IN) as run:
  # Create predictions of X_test using best model
  # FILL_IN
  
  # Log model with name
  # FILL_IN
  
  # Log params
  # FILL_IN
  
  # Create and log MSE metrics using predictions of X_test and its actual value y_test
  # FILL_IN
  
  runID = run.info.run_id  # run_uuid is a deprecated alias of run_id
  artifactURI = mlflow.get_artifact_uri()
  print("Inside MLflow Run with id {}".format(runID))

Check in the MLflow UI that the run `RF-Grid-Search` was logged and that it has the best parameter values found by grid search.

## Load the Saved Model

Load the trained and tuned model we just saved. Check that the hyperparameters of this model match those of the best model we found earlier.

<img alt="Hint" title="Hint" style="vertical-align: text-bottom; position: relative; height:1.75em; top:0.3em" src="https://files.training.databricks.com/static/images/icon-light-bulb.svg"/>&nbsp;**Hint:** Use the `artifactURI` variable declared above.

In [0]:
# TODO
model = < FILL_IN >
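
A logged model is addressed by a URI. The sketch below shows two ways such a URI can be composed, assuming the model was logged under the name `grid-random-forest-model`. The artifact URI and run ID are hypothetical placeholders; in the lab, use the `artifactURI` and `runID` values captured inside the run.

```python
import posixpath

# Hypothetical placeholders; the real values come from the MLflow run above.
artifact_uri = "dbfs:/databricks/mlflow/0/abc123/artifacts"
run_id = "abc123"
model_name = "grid-random-forest-model"

# Option 1: append the model name to the run's artifact location.
model_uri_from_artifacts = posixpath.join(artifact_uri, model_name)

# Option 2: the runs:/<run_id>/<artifact_path> scheme understood by MLflow.
model_uri_from_run = "runs:/{}/{}".format(run_id, model_name)

print(model_uri_from_artifacts)
print(model_uri_from_run)

# Either string can be passed to mlflow.sklearn.load_model(model_uri)
# on a cluster where MLflow is installed and the run exists.
```

After loading, calling `get_params()` on the model lets you compare `n_estimators` and `max_depth` with the values printed by the grid search cell.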

Time permitting, extend the grid search to a wider range of hyperparameter values and automatically log the best-performing parameters to MLflow.

In [0]:
# TODO

Time permitting, use the `MlflowClient` to interact programmatically with your run.

In [0]:
# TODO
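
One possible starting point is a small helper that reads back what a run recorded. This is a sketch, not the lab's solution: `summarize_run` is a hypothetical helper name, and the stand-in client below exists only so the example runs without a tracking server. It relies on `MlflowClient.get_run(run_id)` returning a run whose `data.params` and `data.metrics` are dictionaries.

```python
from types import SimpleNamespace

def summarize_run(client, run_id):
    """Return the ID, params, and metrics recorded for one run."""
    run = client.get_run(run_id)
    return {
        "run_id": run.info.run_id,
        "params": dict(run.data.params),
        "metrics": dict(run.data.metrics),
    }

class StandInClient:
    """Stand-in for MlflowClient, for a dry run only; values are made up."""
    def get_run(self, run_id):
        return SimpleNamespace(
            info=SimpleNamespace(run_id=run_id),
            data=SimpleNamespace(
                params={"n_estimators": "100", "max_depth": "10"},
                metrics={"mse": 1234.5},
            ),
        )

summary = summarize_run(StandInClient(), "abc123")
print(summary)

# On a cluster with MLflow installed, the same helper would be called as:
#   from mlflow.tracking import MlflowClient
#   summary = summarize_run(MlflowClient(), runID)
```

Note that MLflow stores parameter values as strings, so compare them accordingly when checking them against the grid search output.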

<img alt="Side Note" title="Side Note" style="vertical-align: text-bottom; position: relative; height:1.75em; top:0.05em; transform:rotate(15deg)" src="https://files.training.databricks.com/static/images/icon-note.webp"/> See the solutions folder for an example solution to this lab.

## ![Spark Logo Tiny](https://files.training.databricks.com/images/105/logo_spark_tiny.png) Classroom-Cleanup<br>

Run the **`Classroom-Cleanup`** cell below to remove any artifacts created by this lesson.

In [0]:
%run "../Includes/Classroom-Cleanup"

## ![Spark Logo Tiny](https://files.training.databricks.com/images/105/logo_spark_tiny.png) Next Steps

Start the next lesson, Packaging ML Projects.

&copy; 2020 Databricks, Inc. All rights reserved.<br/>
Apache, Apache Spark, Spark and the Spark logo are trademarks of the <a href="http://www.apache.org/">Apache Software Foundation</a>.<br/>
<br/>
<a href="https://databricks.com/privacy-policy">Privacy Policy</a> | <a href="https://databricks.com/terms-of-use">Terms of Use</a> | <a href="http://help.databricks.com/">Support</a>