# Overview
The MLflow Model Registry component is a centralized model store, set of APIs, and UI for collaboratively managing the full lifecycle of MLflow Models. It provides model lineage (which MLflow Experiment and Run produced the model), model versioning, stage transitions, annotations, and deployment management.

In this notebook, you use each of the MLflow Model Registry's components to develop and manage a production machine learning application. This notebook covers the following topics:

- Track and log models with MLflow
- Register models with the Model Registry
- Describe models and make model version stage transitions
- Integrate registered models with production applications
- Search and discover models in the Model Registry
- Archive and delete models

## Requirements
- A cluster running Databricks Runtime 6.4 ML or above. Note that if your cluster is running Databricks Runtime 6.4 ML, you must upgrade the installed version of MLflow to 1.7.0. You can install this version from PyPI. See ([AWS](https://docs.databricks.com/libraries/cluster-libraries.html#cluster-installed-library)|[Azure](https://docs.microsoft.com/azure/databricks/libraries/cluster-libraries#cluster-installed-library)) for instructions. 

# Machine learning application: Forecasting wind power

In this notebook, you use the MLflow Model Registry to build a machine learning application that forecasts the daily power output of a [wind farm](https://en.wikipedia.org/wiki/Wind_farm). Wind farm power output depends on weather conditions: generally, more energy is produced at higher wind speeds. Accordingly, the machine learning models used in the notebook predict power output based on weather forecasts with three features: `wind direction`, `wind speed`, and `air temperature`.

*This notebook uses altered data from the [National WIND Toolkit dataset](https://www.nrel.gov/grid/wind-toolkit.html) provided by NREL, which is publicly available and cited as follows:*

*Draxl, C., B.M. Hodge, A. Clifton, and J. McCaa. 2015. Overview and Meteorological Validation of the Wind Integration National Dataset Toolkit (Technical Report, NREL/TP-5000-61740). Golden, CO: National Renewable Energy Laboratory.*

*Draxl, C., B.M. Hodge, A. Clifton, and J. McCaa. 2015. "The Wind Integration National Dataset (WIND) Toolkit." Applied Energy 151: 355–366.*

*Lieberman-Cribbin, W., C. Draxl, and A. Clifton. 2014. Guide to Using the WIND Toolkit Validation Code (Technical Report, NREL/TP-5000-62595). Golden, CO: National Renewable Energy Laboratory.*

*King, J., A. Clifton, and B.M. Hodge. 2014. Validation of Power Output for the WIND Toolkit (Technical Report, NREL/TP-5D00-61714). Golden, CO: National Renewable Energy Laboratory.*

## Load the dataset

The following cells load a dataset containing weather data and power output information for a wind farm in the United States. The dataset contains `wind direction`, `wind speed`, and `air temperature` features sampled every eight hours (once at `00:00`, once at `08:00`, and once at `16:00`), as well as daily aggregate power output (`power`), over several years.

*Note: The training code in this notebook has been modified from the original version.*

In [None]:
import pandas as pd
wind_farm_data = pd.read_csv("https://github.com/dbczumar/model-registry-demo-notebook/raw/master/dataset/windfarm_data.csv", index_col=0)

In [None]:
def get_training_data():
  training_data = pd.DataFrame(wind_farm_data["2014-01-01":"2018-01-01"])
  X = training_data.drop(columns="power")
  y = training_data["power"]
  return X, y

In [None]:
def get_validation_data():
  validation_data = pd.DataFrame(wind_farm_data["2018-01-01":"2019-01-01"])
  X = validation_data.drop(columns="power")
  y = validation_data["power"]
  return X, y
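Because pandas label slicing includes both endpoints, the rows dated `2018-01-01` (if present in the index) end up in both the training and the validation set. The sketch below demonstrates this on a hypothetical five-row toy frame with a `DatetimeIndex` (the values are made up) and shows one way to make the split disjoint. It assumes the real data is indexed by ascending dates, as the slicing in the cells above implies.

```python
import pandas as pd

# Hypothetical toy data: one row per day, indexed by date like wind_farm_data
# (the values are made up for illustration).
toy = pd.DataFrame(
  {"wind_speed": [5.0, 6.0, 7.0, 8.0, 9.0],
   "power": [100.0, 120.0, 140.0, 160.0, 180.0]},
  index=pd.date_range("2017-12-30", "2018-01-03", freq="D"),
)

# Label-based slicing with date strings includes BOTH endpoints,
# so the boundary day appears in both splits.
train = toy.loc["2017-12-30":"2018-01-01"]
valid = toy.loc["2018-01-01":"2018-01-03"]
overlap = train.index.intersection(valid.index)
print(list(overlap.strftime("%Y-%m-%d")))  # ['2018-01-01']

# One way to make the split disjoint: end the training slice a day earlier.
train_disjoint = toy.loc[:"2017-12-31"]
print(train_disjoint.index.intersection(valid.index).empty)  # True
```

Whether a one-day overlap matters depends on how strictly you want to separate training and validation data; for a time-series split, a disjoint boundary is usually the safer choice.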

In [None]:
def get_weather_and_forecast():
  format_date = lambda pd_date : pd_date.date().strftime("%Y-%m-%d")
  today = pd.Timestamp('today').normalize()
  # Note: despite the names, the window spans 5 days before and 5 days after today
  week_ago = today - pd.Timedelta(days=5)
  week_later = today + pd.Timedelta(days=5)
  
  # Observed power output up to today, and weather features for the whole window
  past_power_output = pd.DataFrame(wind_farm_data)[format_date(week_ago):format_date(today)]
  weather_and_forecast = pd.DataFrame(wind_farm_data)[format_date(week_ago):format_date(week_later)]
  # Fallback when the dataset does not cover the current dates: use the last 10 rows,
  # with the first 5 of them serving as the "past" power output
  if len(weather_and_forecast) < 10:
    past_power_output = pd.DataFrame(wind_farm_data).iloc[-10:-5]
    weather_and_forecast = pd.DataFrame(wind_farm_data).iloc[-10:]

  return weather_and_forecast.drop(columns="power"), past_power_output["power"]
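Because `get_weather_and_forecast()` depends on today's date and on the global `wind_farm_data`, its windowing logic is hard to check in isolation. The sketch below is a hypothetical, parameterized variant, `get_window()`, that takes the DataFrame and the reference date as arguments; it keeps the same 5-day window and the same 10-row fallback, and it runs on made-up January 2019 data.

```python
import pandas as pd

def get_window(data, today, days=5, min_rows=10):
  """Hypothetical, parameterized variant of get_weather_and_forecast()."""
  format_date = lambda ts: ts.strftime("%Y-%m-%d")
  start = today - pd.Timedelta(days=days)
  end = today + pd.Timedelta(days=days)

  past_power_output = data.loc[format_date(start):format_date(today)]
  weather_and_forecast = data.loc[format_date(start):format_date(end)]
  # Fall back to the most recent rows when the data does not cover `today`
  if len(weather_and_forecast) < min_rows:
    past_power_output = data.iloc[-min_rows:-(min_rows // 2)]
    weather_and_forecast = data.iloc[-min_rows:]

  return weather_and_forecast.drop(columns="power"), past_power_output["power"]


# Hypothetical synthetic data covering January 2019
jan = pd.DataFrame(
  {"wind_speed": [float(i) for i in range(31)],
   "power": [float(10 * i) for i in range(31)]},
  index=pd.date_range("2019-01-01", "2019-01-31", freq="D"),
)
X, past = get_window(jan, pd.Timestamp("2019-01-15"))
print(len(X), len(past))        # 11 6  (Jan 10-20 and Jan 10-15, endpoints included)
X_fb, past_fb = get_window(jan, pd.Timestamp("2030-01-01"))
print(len(X_fb), len(past_fb))  # 10 5  (fallback: last 10 rows, first 5 as "past")
```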

Display a sample of the data for reference.

In [None]:
wind_farm_data["2019-01-01":"2019-01-14"]

# Train a power forecasting model and track it with MLflow

The following cells train a neural network to predict power output based on the weather features in the dataset. MLflow is used to track the model's hyperparameters, performance metrics, source code, and artifacts.

Define a power forecasting model using TensorFlow Keras.

In [None]:
import tensorflow as tf
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

In [None]:
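def train_keras_model(X, y):
  # One hidden layer of 100 ReLU units followed by a single linear output unit
  model = Sequential()
  model.add(Dense(100, input_shape=(X.shape[-1],), activation="relu", name="hidden_layer"))
  model.add(Dense(1))
  # Mean squared error loss for regression, optimized with Adam
  model.compile(loss="mse", optimizer="adam")

  # Keras holds out the last 20% of the rows in X and y for validation
  model.fit(X, y, epochs=100, batch_size=64, validation_split=.2)
  return model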

### Q. Why do you need MLflow in Databricks?


You need **MLflow** in **Databricks** to **track, manage, and reproduce machine learning experiments** efficiently. Here's why it's useful, especially in a collaborative, production-ready environment:

---

#### 🔍 1. Experiment Tracking
MLflow logs (automatically, when autologging is enabled):
- Parameters (e.g., learning rate, number of layers)
- Metrics (e.g., accuracy, RMSE)
- Artifacts (e.g., models, plots, datasets)
- Code versions (linked to Git commits)

> 💡 This helps you compare models easily and reproduce results later.

---

#### 🏗️ 2. Model Management
MLflow provides a **Model Registry**, where you can:
- Register models
- Assign stages (e.g., Staging, Production)
- Track model versions
- Roll back if something breaks

> 💡 Think of it like version control for your ML models.

---

#### 🚀 3. Deployment Made Easy
MLflow supports multiple deployment targets out of the box:
- Databricks serving endpoints
- AWS SageMaker
- Azure ML
- Docker containers
- REST APIs

> 💡 You can deploy models without writing extra boilerplate code.

---

#### 🔁 4. Reproducibility
Because MLflow records the code version, parameters, metrics, and the model's environment (plus any data references you log), you can rerun an old experiment and reproduce its results much more reliably.

> 💡 Crucial for debugging and auditing models.

---

#### 🤝 5. Team Collaboration
In Databricks, experiments and models are shared within a workspace. Your team can:
- View and compare each other's runs
- Register and review models together
- Comment on and collaborate across the model lifecycle

> 💡 Great for cross-functional ML teams.

---

#### ✅ In short:
> **MLflow in Databricks = streamlined ML workflow from training to production, with full traceability and collaboration.**



The following commands connect MLflow in the current Databricks workspace to a remote Model Registry workspace. Run them in Bash inside a Databricks terminal, and make sure the Databricks CLI is installed. Repeat this setup in each of your dev, staging, and prod workspaces.


```bash
# Configure the CLI; it prompts for the two values below
databricks configure --token
#   Host:  the workspace URL, including the workspace ID (the part of the URL starting with ?o=)
#   Token: a personal access token for the model dev workspace

# Store the registry workspace's connection details as secrets
databricks secrets create-scope --scope modelregistery
databricks secrets put --scope modelregistery --key modelregistery-token --string-value <registry-workspace-access-token>
databricks secrets put --scope modelregistery --key modelregistery-workspace-id --string-value 8074051404611178
databricks secrets put --scope modelregistery --key modelregistery-host --string-value https://adb-8074051404611178.18.azuredatabricks.net/
```

In [None]:
import mlflow

In [None]:
# Remote registry URI in the form databricks://<secret-scope>:<secret-key-prefix>
registry_uri = f'databricks://modelregistery:modelregistery'
mlflow.set_registry_uri(registry_uri)

`registry_uri = f'databricks://modelregistery:modelregistery'`:

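- Builds a URI that points MLflow at the Model Registry of a remote Databricks workspace, the central place for storing and managing models.
- The URI has the form `databricks://<scope>:<prefix>`. MLflow reads the registry workspace's host, token, and workspace ID from the secrets `<prefix>-host`, `<prefix>-token`, and `<prefix>-workspace-id` in the secret scope `<scope>`, which are the secrets created with the CLI commands above.

<br>

`mlflow.set_registry_uri(registry_uri)`:

- Sets the registry URI for the current session, so subsequent Model Registry calls (such as `mlflow.register_model()` and the registry methods of `MlflowClient`) go to the remote registry workspace. Experiment tracking still uses the current workspace.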
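To make the `databricks://<scope>:<prefix>` format concrete, the sketch below uses a hypothetical helper, `registry_secret_keys()` (not part of MLflow), that splits such a URI into the secret scope and the secret key names MLflow looks up. It only manipulates strings; it does not read any secrets.

```python
def registry_secret_keys(registry_uri):
  """Hypothetical helper (not an MLflow API): map a databricks://<scope>:<prefix>
  registry URI to the secret scope and secret key names it refers to."""
  scheme = "databricks://"
  if not registry_uri.startswith(scheme):
    raise ValueError(f"not a remote Databricks registry URI: {registry_uri!r}")
  scope, sep, prefix = registry_uri[len(scheme):].partition(":")
  if not (sep and scope and prefix):
    raise ValueError("expected the form databricks://<scope>:<prefix>")
  keys = {suffix: f"{prefix}-{suffix}" for suffix in ("host", "token", "workspace-id")}
  return scope, keys


scope, keys = registry_secret_keys("databricks://modelregistery:modelregistery")
print(scope)                 # modelregistery
print(keys["workspace-id"])  # modelregistery-workspace-id
```

This is why the scope and the key prefix in the CLI commands must match the two names in `registry_uri` exactly.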

### Q. What is the MLflow Model Registry?

The MLflow Model Registry is a centralized store for managing the lifecycle of ML models. Think of it as a version-controlled hub where you can register, organize, track, and manage your machine learning models in one place.

In [None]:
import mlflow
import mlflow.keras
import mlflow.tensorflow

In [None]:
X_train, y_train = get_training_data()

In [None]:
with mlflow.start_run():
  # Automatically capture the model's parameters, metrics, artifacts,
  # and source code with the `autolog()` function
  mlflow.tensorflow.autolog()
  
  train_keras_model(X_train, y_train)
  run_id = mlflow.active_run().info.run_id

`with mlflow.start_run():`

- Starts a new MLflow tracking run, which is a session for tracking everything (parameters, metrics, artifacts, etc.) during model training.
- **Think of it as**: Opening a new experiment log book.
- The `with` block ensures that the run is properly ended after training finishes, even if an exception is raised.



<br>

`mlflow.tensorflow.autolog()`:
- Enables automatic logging for TensorFlow/Keras models.
- **Think of it as**: Putting your logging on autopilot.
- It automatically logs:
  - Parameters (e.g., optimizer, learning rate)
  - Metrics (e.g., loss, accuracy over epochs)
  - Artifacts (e.g., model files, training graphs)
  - Source code and environment info

> 🔁 This removes the need to manually call `mlflow.log_param()`, `mlflow.log_metric()`, etc.


<br>

`run_id = mlflow.active_run().info.run_id`
- Retrieves the unique ID of the current MLflow run.
- This ID can be used to:
  - Reference the run later
  - Register the trained model
  - Access run artifacts programmatically

In [None]:
run_id

# Register the model with the MLflow Model Registry API

Now that a forecasting model has been trained and tracked with MLflow, the next step is to register it with the MLflow Model Registry. You can register and manage models using the MLflow UI or the MLflow API.

The following cells use the API to register your forecasting model, add rich model descriptions, and perform stage transitions. See the documentation for the UI workflow.

In [None]:
model_name = "power-forecasting-model" # Replace this with the name of your registered model, if necessary.

### Create a new registered model using the API

The following cells use the `mlflow.register_model()` function to create a new registered model named `power-forecasting-model` (the value of `model_name`). This also creates a new model version (for example, `Version 1` of `power-forecasting-model`). If a registered model with that name already exists, `register_model()` adds a new version to it instead.

In [None]:
import mlflow

In [None]:
# The default path where the MLflow autologging function stores the model
artifact_path = "model"
model_uri = "runs:/{run_id}/{artifact_path}".format(run_id=run_id, artifact_path=artifact_path)
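The `runs:/` URI built above has the form `runs:/<run_id>/<artifact_path>`, where the artifact path is relative to the run's artifact root (`model` is where autologging stores the model, per the comment above). The hypothetical helpers below (not part of MLflow) build and split such a URI; the run ID is a made-up placeholder.

```python
def make_runs_uri(run_id, artifact_path="model"):
  """Hypothetical helper: build runs:/<run_id>/<artifact_path>."""
  return f"runs:/{run_id}/{artifact_path.strip('/')}"


def split_runs_uri(uri):
  """Hypothetical helper: split a runs:/ URI into (run_id, artifact_path)."""
  prefix = "runs:/"
  if not uri.startswith(prefix):
    raise ValueError(f"not a runs:/ URI: {uri!r}")
  run_id, _, artifact_path = uri[len(prefix):].partition("/")
  return run_id, artifact_path


uri = make_runs_uri("0123456789abcdef")  # made-up run ID
print(uri)                  # runs:/0123456789abcdef/model
print(split_runs_uri(uri))  # ('0123456789abcdef', 'model')
```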

In [None]:
model_details = mlflow.register_model(model_uri=model_uri, name=model_name)

## **NOTE**
You can also register your model in Unity Catalog instead of the workspace Model Registry.
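A minimal sketch of registering the same run's model in Unity Catalog, assuming a workspace with Unity Catalog enabled: the registry URI is set to `databricks-uc`, and the model name must use the three-level `<catalog>.<schema>.<model>` form. The helper names and the `main`/`default` catalog and schema in the usage comment are placeholders, not part of this notebook. Keep in mind that this overrides the remote registry URI set earlier, that Databricks requires models in Unity Catalog to be logged with a signature, and that Unity Catalog uses model aliases rather than stage transitions.

```python
def uc_model_name(catalog, schema, model):
  """Hypothetical helper: build the three-level Unity Catalog model name."""
  parts = (catalog, schema, model)
  if any(not part or "." in part for part in parts):
    raise ValueError("catalog, schema, and model must be non-empty and contain no '.'")
  return ".".join(parts)


def register_in_unity_catalog(model_uri, catalog, schema, model):
  """Hypothetical helper: register model_uri as a new version of a UC model."""
  import mlflow  # imported here so uc_model_name() works without MLflow installed

  # Route Model Registry calls to Unity Catalog instead of the workspace registry
  mlflow.set_registry_uri("databricks-uc")
  return mlflow.register_model(model_uri=model_uri, name=uc_model_name(catalog, schema, model))


# Example (requires a Databricks workspace with Unity Catalog; names are placeholders):
# uc_details = register_in_unity_catalog(model_uri, "main", "default", "power_forecasting_model")
```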

After creating a model version, it may take a short period of time to become ready. Certain operations, such as model stage transitions, require the model to be in the `READY` state. Other operations, such as adding a description or fetching model details, can be performed before the model version is ready (for example, while it is in the `PENDING_REGISTRATION` state).

The following cell uses the `MlflowClient.get_model_version()` function to wait until the model is ready.

In [None]:
import time
from mlflow.tracking.client import MlflowClient
from mlflow.entities.model_registry.model_version_status import ModelVersionStatus

In [None]:
def wait_until_ready(model_name, model_version):
  client = MlflowClient()
  # Poll the registry up to 10 times, one second apart
  for _ in range(10):
    model_version_details = client.get_model_version(
      name=model_name,
      version=model_version,
    )
    status = ModelVersionStatus.from_string(model_version_details.status)
    print("Model status: %s" % ModelVersionStatus.to_string(status))
    if status == ModelVersionStatus.READY:
      break
    time.sleep(1)
  # If the version is still not READY after 10 attempts, the function returns without raising

In [None]:
wait_until_ready(model_details.name, model_details.version)
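`wait_until_ready()` returns silently if the model version is still not `READY` after roughly 10 seconds. If you would rather fail loudly, one possible alternative is the hypothetical generic helper `wait_for()` sketched below (not an MLflow API): it polls a condition with exponential backoff and raises `TimeoutError` once a deadline passes. The usage comment assumes, as in the cell above, that the `status` returned by `MlflowClient.get_model_version()` is the string `"READY"` once registration finishes.

```python
import time


def wait_for(predicate, timeout_s=30.0, interval_s=1.0, max_interval_s=8.0):
  """Hypothetical helper: poll predicate() until it returns True.

  The sleep between attempts doubles up to max_interval_s; TimeoutError is
  raised if the condition still fails once timeout_s seconds have elapsed.
  """
  deadline = time.monotonic() + timeout_s
  delay = interval_s
  while True:
    if predicate():
      return
    remaining = deadline - time.monotonic()
    if remaining <= 0:
      raise TimeoutError(f"condition not met within {timeout_s} seconds")
    # Never sleep past the deadline
    time.sleep(min(delay, remaining))
    delay = min(delay * 2, max_interval_s)


# Possible usage with the Model Registry (requires MLflow and a registered version):
# client = MlflowClient()
# wait_for(
#   lambda: client.get_model_version(name=model_name, version=model_details.version).status == "READY",
#   timeout_s=120,
# )
```

Using `time.monotonic()` rather than `time.time()` keeps the deadline correct even if the system clock is adjusted while waiting.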