# Machine learning application: Forecasting wind power

<table>
  <tr><td>
    <img src="https://github.com/dmatrix/mlflow-workshop-part-3/raw/master/images/wind_farm.jpg"
         alt="Wind farm"  width="800">
  </td></tr>
</table>

In this notebook, you will use the MLflow Model Registry to build a machine learning application that forecasts the daily power output of a [wind farm](https://en.wikipedia.org/wiki/Wind_farm). Wind farm power output depends on weather conditions: generally, more energy is produced at higher wind speeds. Accordingly, the machine learning models used in the notebook predict power output based on weather forecasts with three features: `wind direction`, `wind speed`, and `air temperature`.

*This notebook uses altered data from the [National WIND Toolkit dataset](https://www.nrel.gov/grid/wind-toolkit.html) provided by NREL, which is publicly available and cited as follows:*

*Draxl, C., B.M. Hodge, A. Clifton, and J. McCaa. 2015. Overview and Meteorological Validation of the Wind Integration National Dataset Toolkit (Technical Report, NREL/TP-5000-61740). Golden, CO: National Renewable Energy Laboratory.*

*Draxl, C., B.M. Hodge, A. Clifton, and J. McCaa. 2015. "The Wind Integration National Dataset (WIND) Toolkit." Applied Energy 151: 355–366.*

*Lieberman-Cribbin, W., C. Draxl, and A. Clifton. 2014. Guide to Using the WIND Toolkit Validation Code (Technical Report, NREL/TP-5000-62595). Golden, CO: National Renewable Energy Laboratory.*

*King, J., A. Clifton, and B.M. Hodge. 2014. Validation of Power Output for the WIND Toolkit (Technical Report, NREL/TP-5D00-61714). Golden, CO: National Renewable Energy Laboratory.*

*(This notebook is from the [Spark + AI Summit keynote demo](https://databricks.com/session_eu19/1-simplifying-model-management-with-mlflow), originally written by Corey Zumar and later modified, modularized, and shortened for an MLflow tutorial by Jules S. Damji)*

## Load the dataset

The following cells load a dataset containing weather data and power output information for a wind farm in the United States. The dataset contains `wind direction`, `wind speed`, and `air temperature` features sampled every eight hours (at `00:00`, `08:00`, and `16:00`), as well as daily aggregate power output (`power`), over several years.

Run the `classes_init` notebook to bring the model and utility classes it defines into this notebook's scope.

In [0]:
%run ./classes_init

In [0]:
csv_path = "https://raw.githubusercontent.com/dmatrix/mlflow-workshop-part-3/master/src/data/windfarm_data.csv"
wind_farm_data = Utils.load_data(csv_path, index_col=0)

Display a sample of the data for reference.

In [0]:
wind_farm_data.head(5)

| | temperature_00 | wind_direction_00 | wind_speed_00 | temperature_08 | wind_direction_08 | wind_speed_08 | temperature_16 | wind_direction_16 | wind_speed_16 | power |
|---|---|---|---|---|---|---|---|---|---|---|
| 2014-01-01 | 4.702022 | 106.74259 | 4.743292 | 7.189482 | 100.41638 | 6.593833 | 8.172301 | 99.288 | 5.967206 | 1959.3535 |
| 2014-01-02 | 7.695733 | 98.036705 | 6.142716 | 9.977118 | 94.03181 | 4.383676 | 9.690135 | 204.25444 | 1.696528 | 1266.6239 |
| 2014-01-03 | 9.608235 | 274.0612 | 10.514304 | 10.840864 | 242.87563 | 16.869741 | 8.991079 | 250.2683 | 12.038399 | 7545.6797 |
| 2014-01-04 | 6.955563 | 257.91022 | 7.18917 | 5.317223 | 254.2617 | 9.069233 | 3.021174 | 284.06537 | 4.590843 | 3791.0408 |
| 2014-01-05 | 0.830547 | 265.3944 | 4.263086 | 2.480239 | 104.79496 | 3.042063 | 4.227131 | 263.4169 | 3.899182 | 880.6115 |
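
The column names in the sample follow a `<feature>_<hour>` pattern: three features sampled at three times of day give nine feature columns, plus the `power` target. A minimal sketch that rebuilds the same column list (the helper name `feature_columns` is hypothetical and is not part of `classes_init`):

```python
# Feature and hour names are read off the column header of the data sample.
FEATURES = ["temperature", "wind_direction", "wind_speed"]
HOURS = ["00", "08", "16"]  # samples taken at 00:00, 08:00, and 16:00
TARGET = "power"            # daily aggregate power output


def feature_columns():
    """Return the nine weather feature columns, grouped by sampling hour."""
    return [f"{feature}_{hour}" for hour in HOURS for feature in FEATURES]


columns = feature_columns() + [TARGET]
print(len(columns))
print(columns[:3])
```

A list like this can be handy for selecting the model inputs explicitly instead of relying on column order.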


In [0]:
wind_farm_data.describe()

| | temperature_00 | wind_direction_00 | wind_speed_00 | temperature_08 | wind_direction_08 | wind_speed_08 | temperature_16 | wind_direction_16 | wind_speed_16 | power |
|---|---|---|---|---|---|---|---|---|---|---|
| count | 2555.0 | 2555.0 | 2555.0 | 2555.0 | 2555.0 | 2555.0 | 2555.0 | 2555.0 | 2555.0 | 2555.0 |
| mean | 9.613588 | 198.381353 | 5.001413 | 12.827786 | 191.538394 | 5.136628 | 13.991459 | 202.051406 | 6.072578 | 2424.97888 |
| std | 5.187017 | 81.767411 | 2.940271 | 6.536972 | 83.572022 | 3.016934 | 6.862011 | 88.747683 | 2.585946 | 2035.838491 |
| min | -6.172769 | 14.527682 | 0.232208 | -3.989666 | 14.087781 | 0.51922 | -4.087519 | 5.659133 | 0.561177 | 207.06557 |
| 25% | 6.228526 | 111.309685 | 2.863503 | 7.932104 | 108.053078 | 2.897562 | 8.576865 | 111.450595 | 4.299781 | 947.10043 |
| 50% | 8.688278 | 214.46178 | 4.265017 | 11.178736 | 174.02873 | 4.356376 | 12.392289 | 245.21094 | 5.785738 | 1674.4309 |
| 75% | 13.045183 | 276.89433 | 6.263791 | 17.898135 | 281.36138 | 6.745475 | 19.687266 | 275.93715 | 7.550025 | 3274.85645 |
| max | 27.039269 | 352.76413 | 19.162184 | 30.118828 | 351.4283 | 23.394577 | 32.521378 | 349.5891 | 18.859266 | 10774.853 |


### Get Training and Validation Data

In [0]:
X_train, y_train = Utils.get_training_data(wind_farm_data)
val_x, val_y = Utils.get_validation_data(wind_farm_data)
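
`Utils.get_training_data()` and `Utils.get_validation_data()` are defined in `classes_init` and are not shown here. For time-indexed data, one common approach is a chronological split, so that validation days always follow training days. The sketch below assumes that approach; the function name, the synthetic rows, and the cutoff date are all illustrative and may differ from what `Utils` actually does.

```python
from datetime import date, timedelta


def chronological_split(rows, cutoff):
    """Split (day, features, power) rows into training and validation sets.

    Rows dated before `cutoff` go to training; the rest go to validation,
    so the model is never validated on days that precede its training data.
    """
    train = [row for row in rows if row[0] < cutoff]
    validation = [row for row in rows if row[0] >= cutoff]
    return train, validation


# Synthetic stand-in for the wind farm data: ten consecutive days.
start = date(2014, 1, 1)
rows = [(start + timedelta(days=i), [float(i)] * 9, 1000.0 + i) for i in range(10)]

# Hypothetical cutoff chosen only for this example.
train, validation = chronological_split(rows, cutoff=date(2014, 1, 8))
print(len(train), len(validation))
```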

# Train a power forecasting model and track it with MLflow

The following cells train a neural network in Keras to predict power output based on the weather features in the dataset. MLflow is used to track the model's hyperparameters, performance metrics, source code, and artifacts.

Define a power forecasting model in Keras, then create three models with different configurations and tuning parameters.

<table>
  <tr><td>
    <img src="https://github.com/dmatrix/mlflow-workshop-part-3/raw/master/images/nn_linear_regression.png"
         alt="Keras NN model as linear regression"  width="800">
  </td></tr>
</table>

Iterate over three different sets of tuning parameters and track all of the results:
 * input_units
 * epochs
 * batch_size

In [0]:
!ls

conf  derby.log  eventlogs  ganglia  logs  metastore_db  preload_class.lst


In [0]:
# Train one Keras model per (input_units, epochs, batch_size) configuration; each call is tracked as an MLflow run
for input_units, epochs, batch_size in [(64, 100, 64), (128, 200, 128), (256, 300, 128)]:
  keras_obj = KerasModel(X_train, input_units=input_units, loss="mse", optimizer="adam", metrics=["mse"])
  run_id = keras_obj.mlflow_run(X_train, y_train, epochs=epochs, batch_size=batch_size, validation_split=0.2)
  print(f"Finished running Keras training with run_id={run_id}")

Epoch 1/100
Epoch 2/100
...
Epoch 73/100
 1/19 [>.............................] - ETA: 0s - loss: 43

Using mlflow version 1.21.0


# Register the best Keras model with the MLflow Model Registry

Now that a forecasting model has been trained and tracked with MLflow, the next step is to register it with the MLflow Model Registry. You can register and manage models using the MLflow UI.

## Part 1: Use a Neural Network model using Keras

### Create a new Registered Model

1. First, navigate to the MLflow Runs Sidebar by clicking the `Runs` icon in the Databricks Notebook UI.


2. Next, locate the MLflow Run corresponding to the Keras model training session, and open it in the MLflow Run UI by clicking the `View Run Detail` icon.

3. In the MLflow UI, scroll down to the `Artifacts` section and click on the directory named `model`. Click on the `Register Model` button that appears.

4. Then, select `Create New Model` from the drop-down menu, and input the following model name: `PowerForecastingModel`. Finally, click `Register`. This registers a new model called `PowerForecastingModel` and creates a new model version: `Version 1`.

After a few moments, the MLflow UI displays a link to the new registered model. Follow this link to open the new model version in the MLflow Model Registry UI.

### Explore the Model Registry UI Workflow

1. The Model Version page in the MLflow Model Registry UI provides information about `Version 1` of the registered forecasting model, including its author, creation time, and its current stage.

2. The Model Version page also provides a `Source Run` link, which opens the MLflow Run that was used to create the model in the MLflow Run UI. From the MLflow Run UI, you can access the `Source Notebook Link` to view a snapshot of the Databricks Notebook that was used to train the model.

To navigate back to the MLflow Model Registry, click the `Models` icon in the Databricks Workspace Sidebar. The resulting MLflow Model Registry home page displays a list of all the registered models in your Databricks Workspace, including their versions and stages.

Select the `PowerForecastingModel` link to open the Registered Model page, which displays all of the versions of the forecasting model.

### Add model descriptions

You can add descriptions to Registered Models as well as Model Versions: 
* Model Version descriptions are useful for detailing the unique attributes of a particular Model Version (e.g., the methodology and algorithm used to develop the model). 
* Registered Model descriptions are useful for recording information that applies to multiple model versions (e.g., a general overview of the modeling problem and dataset).

Add a high-level description to the registered power forecasting model by clicking the `Edit Description` icon, entering the following description, and clicking `Save`:

```
This model forecasts the power output of a wind farm based on weather data. The weather data consists of three features: wind speed, wind direction, and air temperature.
```

Next, click the `Version 1` link from the Registered Model page to navigate back to the Model Version page. Then, add a model version description with information about the model architecture and machine learning framework; click the `Edit Description` icon, enter the following description, and click `Save`:
```
This model version was built using Keras. It is a feed-forward neural network with one hidden layer.
```

### Perform a model stage transition

The MLflow Model Registry defines several model stages: `None`, `Staging`, `Production`, and `Archived`. Each stage has a unique meaning. For example, `Staging` is meant for model testing, while `Production` is for models that have completed the testing or review processes and have been deployed to applications. 

Users with appropriate permissions can transition models between stages. In private preview, any user can transition a model to any stage. In the near future, administrators in your organization will be able to control these permissions on a per-user and per-model basis.

If you have permission to transition a model to a particular stage, you can make the transition directly. If you do not have permission, you can request a stage transition from another user.

Click the `Stage` button to display the list of available model stages and your available stage transition options. Select `Transition to -> Production` and press `OK` in the stage transition confirmation window to transition the model to `Production`.

#### Now that the model has been registered and transitioned to `Production`, navigate to the "Integrate the model with the forecasting application" section of the quickstart.

In [0]:
model_name = "PowerForecastingModel"

# Integrate the model with the forecasting application

Now that you have trained and registered a power forecasting model with the MLflow Model Registry, the next step is to integrate it with an application. This application fetches a weather forecast for the wind farm over the next five days and uses the model to produce power forecasts. For example purposes, the application consists of a simple `forecast_power()` function (defined below) that is executed within this notebook. In practice, you may want to execute this function as a recurring batch inference job using the Databricks Jobs service.

The following **"Load versions of the registered model"** section demonstrates how to load model versions from the MLflow Model Registry for use in applications. Then, the **"Forecast power output with the production model"** section uses the `Production` model to forecast power output for the next five days.

## Load versions of the registered model

The MLflow Models component defines functions for loading models from several machine learning frameworks. For example, `mlflow.keras.load_model()` is used to load Keras models that were saved in MLflow format, and `mlflow.sklearn.load_model()` is used to load scikit-learn models that were saved in MLflow format.

These functions can load models from the MLflow Model Registry.

You can load a model by specifying its name (e.g., `PowerForecastingModel`) and version number (e.g., `1`). The following cell uses the `mlflow.pyfunc.load_model()` API to load `Version 1` of the registered power forecasting model as a generic Python function.

In [0]:
import mlflow.pyfunc

model_version_uri = "models:/{model_name}/1".format(model_name=model_name)

print("Loading registered model version from URI: '{model_uri}'".format(model_uri=model_version_uri))
model_version_1 = mlflow.pyfunc.load_model(model_version_uri)


Loading registered model version from URI: 'models:/PowerForecastingModel/1'


---------------------------------------------------------------------------
MissingSchema                             Traceback (most recent call last)
/databricks/python/lib/python3.8/site-packages/mlflow/utils/rest_utils.py in cloud_storage_http_request(method, url, max_retries, backoff_factor, retry_codes, timeout, **kwargs)
    251     try:
--> 252         with _get_http_response_with_retries(
    253             method, url, max_retries, backoff_factor, retry_codes, timeout=timeout, **kwargs

/databricks/python/lib/python3.8/site-packages/mlflow/utils/rest_utils.py in _get_http_response_with_retries(method, url, max_retries, backoff_factor, retr

You can also load a specific model stage. The following cell loads the `Production` stage of the power forecasting model.

In [0]:
model_production_uri = "models:/{model_name}/production".format(model_name=model_name)

print("Loading registered model version from URI: '{model_uri}'".format(model_uri=model_production_uri))
model_production = mlflow.pyfunc.load_model(model_production_uri)
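
Both cells above build a URI of the form `models:/<model name>/<version or stage>`. As a small sketch, the same pattern can be wrapped in a helper (the name `registry_model_uri` is hypothetical, not an MLflow function):

```python
def registry_model_uri(model_name, version_or_stage):
    """Build a Model Registry URI of the form models:/<name>/<version-or-stage>."""
    return f"models:/{model_name}/{version_or_stage}"


# A version number pins one specific model version.
print(registry_model_uri("PowerForecastingModel", 1))

# A stage name refers to whichever version currently holds that stage.
print(registry_model_uri("PowerForecastingModel", "production"))
```

The resulting strings can be passed to `mlflow.pyfunc.load_model()` exactly as in the cells above.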

## Forecast power output with the production model

In this section, the production model is used to evaluate weather forecast data for the wind farm. The `PlotUtils.forecast_power()` method loads a version of the forecasting model from the specified URI and uses it to forecast power production over the next five days.

In [0]:
PlotUtils.forecast_power(model_production_uri, wind_farm_data)

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<command-2387317862618037> in <module>
----> 1 PlotUtils.forecast_power(model_production_uri, wind_farm_data)

NameError: name 'model_production_uri' is not defined

## Part 2: Use a scikit-learn Random Forest Regressor to create the second model

In a real development environment, this model could be developed by another developer and registered in the Model Registry.

Use MLflow to compare the Keras and random forest models, and then select the best one for production.
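
Once each candidate reports the same validation metric, choosing between them reduces to picking the run with the lowest error. A minimal sketch, assuming each run is summarized as a dictionary; the run IDs and metric values below are placeholders for illustration, not results from this notebook:

```python
# Placeholder run summaries; these numbers are NOT from the notebook's runs.
candidates = [
    {"framework": "keras", "run_id": "run-a", "val_rmse": 300.0},
    {"framework": "sklearn", "run_id": "run-b", "val_rmse": 200.0},
    {"framework": "sklearn", "run_id": "run-c", "val_rmse": 250.0},
]


def select_best(runs, metric="val_rmse"):
    """Return the run with the lowest value of `metric` (lower error is better)."""
    return min(runs, key=lambda run: run[metric])


best = select_best(candidates)
print(best["framework"], best["run_id"])
```

In practice the same comparison can be done visually in the MLflow UI by sorting runs on the metric column.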

# Create and deploy a new model version

The MLflow Model Registry enables you to create multiple model versions corresponding to a single registered model. By performing stage transitions, you can seamlessly integrate new model versions into your staging or production environments. Model versions can be trained in different machine learning frameworks (e.g., `scikit-learn` and `Keras`); MLflow's `python_function` provides a consistent inference API across machine learning frameworks, ensuring that the same application code continues to work when a new model version is introduced.

The following sections create a new version of the power forecasting model using scikit-learn, perform model testing in `Staging`, and update the production application by transitioning the new model version to `Production`.

## Create a new model version

Classical machine learning techniques are also effective for power forecasting. The following cell trains a random forest model using scikit-learn and registers it with the MLflow Model Registry via the `mlflow.sklearn.log_model()` function.

As above, we will try a few different parameters and choose the best model.

In [0]:
params_list = [
        {"n_estimators": 100},
        {"n_estimators": 200},
        {"n_estimators": 300}]
# Iterate over a few different tuning parameters
for params in params_list:
  rfr = RFRModel.new_instance(params)
  print("Using parameters={}".format(params))
  runID = rfr.mlflow_run(X_train, y_train, val_x, val_y, model_name)
  print("MLflow run_id={} completed with MSE={} and RMSE={}".format(runID, rfr.mse, rfr.rsme))

Using parameters={'n_estimators': 100}
Validation MSE: 45241
Validation RMSE: 212


---------------------------------------------------------------------------
RestException                             Traceback (most recent call last)
<command-2387317862618041> in <module>
      7   rfr = RFRModel.new_instance(params)
      8   print("Using parameters={}".format(params))
----> 9   runID = rfr.mlflow_run(X_train, y_train, val_x, val_y, model_name)
     10   print("MLflow run_id={} completed with MSE={} and RMSE={}".format(

### Explore the Model Registry API Workflow

When a new model version is created with the MLflow Model Registry's Python APIs, the name and version information is printed for future reference. You can also navigate to the MLflow Model Registry UI to view the new model version. 

Define the new model version as a variable in the cell below.

In [0]:
model_version = 2 # If necessary, replace this with the version corresponding to the new scikit-learn model

Wait for the new model version to become ready.

In [0]:
Utils.wait_until_ready(model_name, model_version)
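
`Utils.wait_until_ready()` is defined in `classes_init` and is not shown here. A minimal sketch of the general idea, assuming it polls the model version status until it reads `READY`; the function below is a hypothetical stand-in that takes the status lookup as a callable so it can run without a registry:

```python
import time


def wait_until_ready(get_status, timeout_s=60.0, poll_interval_s=1.0):
    """Poll `get_status()` until it returns "READY" or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status == "READY":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Model version not ready; last status: {status}")
        time.sleep(poll_interval_s)


# Stand-in status source for demonstration. With MLflow, one could pass
# lambda: client.get_model_version(model_name, model_version).status
statuses = iter(["PENDING_REGISTRATION", "PENDING_REGISTRATION", "READY"])
print(wait_until_ready(lambda: next(statuses), poll_interval_s=0.01))
```

Bounding the wait with a timeout avoids blocking the notebook indefinitely if registration fails.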

## Add a description to the new model version

In [0]:
from mlflow.tracking.client import MlflowClient

client = MlflowClient()
client.update_model_version(
  name=model_name,
  version=model_version,
  description="This model version is a random forest that was trained in scikit-learn."
)

## Test the new model version in `Staging`

Before deploying a model to a production application, it is often best practice to test it in a staging environment. The following cells transition the new model version to `Staging` and evaluate its performance.

In [0]:
client.transition_model_version_stage(
  name=model_name,
  version=model_version,
  stage="Staging",
)

Evaluate the new model's forecasting performance in `Staging`.

In [0]:
model_staging_uri = "models:/{model_name}/staging".format(model_name=model_name)
PlotUtils.forecast_power(model_staging_uri,wind_farm_data)

## Deploy the new model version to `Production`

After verifying that the new model version performs well in staging, the following cells transition the model to `Production` and use the exact same application code from the **"Forecast power output with the production model"** section to produce a power forecast. 

**The MLflow Model Registry automatically uses the latest production version of the specified model, allowing you to update your production models without changing any application code**.

In [0]:
client.transition_model_version_stage(
  name=model_name,
  version=model_version,
  stage="Production"
)

In [0]:
model_production_uri = "models:/{model_name}/production".format(model_name=model_name)
PlotUtils.forecast_power(model_production_uri,wind_farm_data)
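
The application code stays the same because a stage-based URI names a stage rather than a fixed version. The toy resolver below illustrates the idea with an in-memory dictionary; it is an illustration only, not MLflow's implementation, and `resolve_stage` is a hypothetical name.

```python
def resolve_stage(versions, stage):
    """Return the highest version number currently assigned to `stage`.

    `versions` maps version number -> stage name.
    """
    matching = [v for v, s in versions.items() if s.lower() == stage.lower()]
    if not matching:
        raise LookupError(f"No model version in stage '{stage}'")
    return max(matching)


# Before promotion: version 1 is in Production, version 2 in Staging.
registry = {1: "Production", 2: "Staging"}
print(resolve_stage(registry, "production"))

# Promote version 2; the application still asks for "production".
registry[2] = "Production"
print(resolve_stage(registry, "production"))
```

The caller's request never changes; only the registry's answer does.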

# Delete models

When a model version is no longer being used, you can archive it or delete it. You can also delete an entire registered model; this removes all of its associated model versions.

### Workflow 1: Delete `Version 1` in the MLflow UI

To delete `Version 1` of the power forecasting model, open its corresponding Model Version page in the MLflow Model Registry UI. Then, select the drop-down arrow next to the version identifier and click `Delete`.

### Workflow 2: Delete `Version 1` using the MLflow API

The following cell permanently deletes `Version 1` of the power forecasting model. If you want to delete this model version, uncomment and execute the cell.

In [0]:
# client.delete_model_version(name=model_name, version=1)

## Delete the power forecasting model

If you want to delete an entire registered model, including all of its model versions, you can use `MlflowClient.delete_registered_model()`. This action cannot be undone.

**WARNING: The following cell permanently deletes the power forecasting model, including all of its versions.** If you want to delete the model, uncomment and execute the cell.

In [0]:
# client.delete_registered_model(name=model_name)