
# Diabetes Progression Prediction Project

## Overview
This project focuses on predicting the progression of diabetes using the `Diabetes` dataset provided by `sklearn.datasets`. We aim to leverage machine learning techniques to understand how various features influence diabetes progression and deploy the model in a production environment using Azure ML. Additionally, we will conduct a responsible ML analysis to ensure our model is fair, interpretable, and compliant with ethical AI standards.

## Objectives
- **Data Exploration**: Investigate the dataset to understand the distribution and relationships of the data.
- **Data Preprocessing**: Prepare the data for modeling by cleaning and normalizing.
- **Model Development**: Construct and train machine learning models to predict the progression of diabetes.
- **Model Evaluation**: Assess the accuracy and efficacy of the models using appropriate metrics.
- **Model Deployment**: Deploy the trained model using Azure ML to make it available for real-time predictions in a production environment.

## Tools and Libraries
The project utilizes Python along with several libraries to achieve these goals:
- `numpy` and `pandas` for data manipulation.
- `matplotlib` and `seaborn` for data visualization.
- `scikit-learn` for building and evaluating machine learning models.
- `azureml-core` for model deployment on Azure ML.
- `raiwidgets` and `fairlearn` to conduct fairness analysis and enhance interpretability.

By the end of this notebook, you should understand how to take a model from data preparation through to deployment, and how to integrate responsible AI principles into your workflow. We start by setting up the Azure ML workspace, then load the dataset and explore its initial properties.


# Setting Up Azure ML Environment

Before training anything, we set up an Azure ML workspace and the resources it depends on, such as storage, compute instances, and related configuration. The workspace is the foundation for managing the lifecycle of your machine learning models, from training to deployment. The setup involves the following steps:


- Create and configure an Azure ML Workspace.
- Set up storage accounts to manage datasets and model artifacts.
- Create compute resources to train and deploy models.


## Azure ML Workspace Creation and Configuration

An Azure ML Workspace is an integrated environment that allows you to manage, develop, train, and deploy machine learning models. It includes support for various tools and frameworks and provides a centralized place to manage all your ML assets.
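The cell that follows calls `Workspace.from_config()`, which reads the workspace coordinates from a `config.json` file (by default looked up in the working directory or a `.azureml` subfolder). As a rough sketch of what that file contains, the snippet below writes one with placeholder values into a temporary directory. The values are placeholders and the temporary location is an assumption made to keep the sketch free of side effects.

```python
import json
import tempfile
from pathlib import Path

# Placeholder values: substitute your own subscription, resource group, and workspace.
config = {
    "subscription_id": "your-subscription-id",
    "resource_group": "your-resource-group",
    "workspace_name": "YourWorkspaceName",
}

# Written to a temporary directory here so the sketch changes nothing in your project;
# in practice the file would sit next to the notebook or inside a .azureml/ folder.
config_dir = Path(tempfile.mkdtemp()) / ".azureml"
config_dir.mkdir(parents=True)
config_path = config_dir / "config.json"
config_path.write_text(json.dumps(config, indent=2))

# Read the file back to confirm it holds the three expected keys.
loaded = json.loads(config_path.read_text())
print(sorted(loaded))
```

Calling `ws.write_config()` on an existing workspace object produces a file of this shape, so writing it by hand is rarely necessary.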


In [2]:
from azureml.core import Workspace

# Attempt to load the workspace from the config file if it exists
# Otherwise, create one using the details provided (you need to replace placeholders with your actual details)
try:
    ws = Workspace.from_config()
except Exception:  # e.g. no config.json found; avoid a bare except that also traps KeyboardInterrupt
    ws = Workspace.create(name='YourWorkspaceName',
                          subscription_id='your-subscription-id',
                          resource_group='your-resource-group',
                          create_resource_group=True,
                          location='your-location'  # e.g., 'eastus'
                          )
    ws.write_config()
print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep='\n')


mlops_demo1
mlops_1
uksouth
fcd3ed4f-3ce9-448e-8084-9f119bc03559


In [1]:
from azureml.core import Workspace

ws = Workspace.from_config()
print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep='\n')

mlops_demo1
mlops_1
uksouth
fcd3ed4f-3ce9-448e-8084-9f119bc03559


In [7]:
import joblib
import sklearn

from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge

## Loading and Preparing the Diabetes Dataset

The dataset used in this project ships with the `scikit-learn` library, which bundles a number of small datasets for testing machine learning algorithms. We use the `Diabetes` dataset, which contains ten baseline variables (age, sex, body mass index, average blood pressure, and six blood serum measurements) for 442 diabetes patients. The target is a quantitative measure of disease progression one year after baseline.
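The feature values returned by `load_diabetes()` are small numbers such as `-0.044642` rather than raw ages or blood pressures. This is because scikit-learn ships each feature column mean-centered and scaled so that its sum of squares equals 1. The sketch below applies the same transformation to a synthetic column (made-up data, not the diabetes features) to show the effect.

```python
import numpy as np

rng = np.random.default_rng(42)
# Synthetic "age"-like column with 442 rows; the values are invented for illustration.
raw = rng.normal(loc=50.0, scale=12.0, size=(442, 1))

# Mean-center, then divide by (standard deviation * sqrt(n_samples)).
centered = raw - raw.mean(axis=0)
scaled = centered / (raw.std(axis=0) * np.sqrt(raw.shape[0]))

# The column mean is numerically zero and the sum of squares is numerically one.
print(float(scaled.mean()), float((scaled ** 2).sum()))
```

Because of this scaling, the coefficients of a linear model fitted on these features are directly comparable across columns.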


In [4]:
from sklearn.datasets import load_diabetes
import pandas as pd

# Load the diabetes dataset
diabetes_data = load_diabetes()
X = diabetes_data.data  # the features
y = diabetes_data.target  # the target variable, quantitative measure of disease progression

# Create a DataFrame to store the features and target variable
df = pd.DataFrame(X, columns=diabetes_data.feature_names)
df['Progression'] = y


In [5]:
# Display five randomly sampled rows of the DataFrame to check everything is loaded correctly
df.sample(5)


|     | age | sex | bmi | bp | s1 | s2 | s3 | s4 | s5 | s6 | Progression |
|-----|-----|-----|-----|----|----|----|----|----|----|----|-------------|
| 53  | -0.009147 | -0.044642 | -0.015906 | 0.070073 | 0.012191 | 0.022172 | 0.015505 | -0.002592 | -0.033249 | 0.048628 | 104.0 |
| 1   | -0.001882 | -0.044642 | -0.051474 | -0.026328 | -0.008449 | -0.019163 | 0.074412 | -0.039493 | -0.06833 | -0.092204 | 75.0 |
| 260 | 0.041708 | -0.044642 | -0.008362 | -0.057314 | 0.008063 | -0.031376 | 0.151726 | -0.076395 | -0.080237 | -0.017646 | 39.0 |
| 336 | -0.020045 | -0.044642 | 0.085408 | -0.036656 | 0.091996 | 0.089499 | -0.061809 | 0.145012 | 0.080948 | 0.05277 | 306.0 |
| 70  | -0.001882 | -0.044642 | -0.069797 | -0.012556 | -0.000193 | -0.009143 | 0.07073 | -0.039493 | -0.062913 | 0.040343 | 48.0 |


In [6]:
y

array([151.,  75., 141., 206., 135.,  97., 138.,  63., 110., 310., 101.,
        69., 179., 185., 118., 171., 166., 144.,  97., 168.,  68.,  49.,
        68., 245., 184., 202., 137.,  85., 131., 283., 129.,  59., 341.,
        87.,  65., 102., 265., 276., 252.,  90., 100.,  55.,  61.,  92.,
       259.,  53., 190., 142.,  75., 142., 155., 225.,  59., 104., 182.,
       128.,  52.,  37., 170., 170.,  61., 144.,  52., 128.,  71., 163.,
       150.,  97., 160., 178.,  48., 270., 202., 111.,  85.,  42., 170.,
       200., 252., 113., 143.,  51.,  52., 210.,  65., 141.,  55., 134.,
        42., 111.,  98., 164.,  48.,  96.,  90., 162., 150., 279.,  92.,
        83., 128., 102., 302., 198.,  95.,  53., 134., 144., 232.,  81.,
       104.,  59., 246., 297., 258., 229., 275., 281., 179., 200., 200.,
       173., 180.,  84., 121., 161.,  99., 109., 115., 268., 274., 158.,
       107.,  83., 103., 272.,  85., 280., 336., 281., 118., 317., 235.,
        60., 174., 259., 178., 128.,  96., 126., 28

## Training the First Model: Ridge Regression

Ridge regression is a type of linear regression that includes a regularization term. The regularization term (L2 penalty) discourages large coefficients and helps to prevent overfitting, which is particularly useful in scenarios where the dataset is not very large or the number of features is close to or exceeds the number of observations.
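To make the effect of the L2 penalty concrete, here is a minimal sketch of the closed-form ridge solution, w = (XᵀX + αI)⁻¹Xᵀy, on synthetic data. It is an illustration under simplifying assumptions: the data are randomly generated, the helper `ridge_weights` is hypothetical, and no intercept is fitted (scikit-learn's `Ridge` fits an unpenalized intercept by default and uses `alpha=1.0`).

```python
import numpy as np

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 5))              # synthetic features, not the diabetes data
true_w = np.array([3.0, -2.0, 0.0, 1.0, 0.5])  # made-up coefficients
y_demo = X_demo @ true_w + rng.normal(scale=0.5, size=20)

def ridge_weights(X, y, alpha):
    """Closed-form ridge solution without an intercept: (X^T X + alpha*I)^-1 X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

# A larger alpha penalizes large coefficients more strongly, so the norm shrinks.
for alpha in (0.0, 1.0, 10.0, 100.0):
    w = ridge_weights(X_demo, y_demo, alpha)
    print(f"alpha={alpha:>5}: ||w|| = {np.linalg.norm(w):.3f}")
```

With `alpha=0.0` the solution is ordinary least squares. As `alpha` grows, the coefficient norm decreases, which is the shrinkage the paragraph above describes.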


In [8]:
# X and y come from the dataset-loading cell above; Ridge() uses its default regularization strength (alpha=1.0)
first_model = Ridge().fit(X, y)

# Save the model to a file for later use
model_filename = "model_ridge.pkl"
joblib.dump(first_model, model_filename)


['model_ridge.pkl']

## Registering the Model in Azure ML Workspace

Registering a model in the Azure Machine Learning workspace is the next step towards managing, versioning, and deploying machine learning models effectively. By registering the model, you can track its versions, metadata, and dependencies, and deploy it across various Azure services seamlessly.


In [9]:
from azureml.core.model import Model

# Assuming 'ws' is already defined as your Azure ML Workspace
# Register the model stored in 'model_ridge.pkl'
model = Model.register(model_path="model_ridge.pkl",
                       model_name="model_ridge",
                       workspace=ws)

# Output the model id to verify it has been registered successfully
print("Model registered successfully:")
print("Model ID:", model.id)
print("Model Name:", model.name)


Registering model model_ridge
Model registered successfully:
Model ID: model_ridge:1
Model Name: model_ridge


## Creating the Scoring Script

To deploy our Ridge regression model on Azure ML, we need to create a scoring script. This script will be used by the Azure ML service to load our model and to predict outcomes based on input data sent to the deployed model.

### Script Contents

The script `score.py` contains two primary functions:

1. **`init()`**:
   - This function is called when the deployment service starts. It is used for initialization tasks, such as loading the model from the Azure ML workspace into a global object. This ensures the model is loaded once when the service starts, rather than on each prediction request, improving performance.

2. **`run(raw_data)`**:
   - This function is called for every new prediction request. It receives `raw_data` as a JSON string, which it processes to make predictions using the pre-loaded model. The function then returns the predictions.

### Error Handling

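The `run` function wraps its body in a `try`/`except` block. If parsing the input or generating predictions fails, it returns the exception message as a string instead of a prediction dictionary, so callers should check the type of the response.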
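The `init()`/`run()` contract can be exercised locally before anything is deployed. The sketch below mirrors the structure of the scoring script but swaps the registered model for a hypothetical `StubModel` that sums each input row, so it needs neither Azure ML nor a trained model. The stub and the sample inputs are assumptions for illustration only.

```python
import json
import numpy as np

class StubModel:
    """Hypothetical stand-in for the registered model: predicts the sum of each row."""
    def predict(self, data):
        return data.sum(axis=1)

model = None

def init():
    # In the real script this step loads the registered model from disk.
    global model
    model = StubModel()

def run(raw_data):
    try:
        data = np.array(json.loads(raw_data)["data"])
        return {"prediction1": model.predict(data).tolist()}
    except Exception as e:
        # Same behaviour as the scoring script: the error comes back as a plain string.
        return str(e)

init()
ok = run(json.dumps({"data": [[1.0, 2.0], [3.0, 4.0]]}))
bad = run(json.dumps({"rows": [[1.0, 2.0]]}))  # wrong key triggers the error path
print(ok)
print(bad)
```

The second call shows why clients should check the response type: a missing `data` key yields a string rather than a dictionary with a `prediction1` entry.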




In [11]:
%%writefile score.py
import joblib
import json
import numpy as np
from azureml.core.model import Model

def init():
    # Load the model from the registered Azure ML model
    global model
    model_path = Model.get_model_path(model_name='model_ridge')
    model = joblib.load(model_path)

def run(raw_data):
    try:
        # Parse the JSON data input
        data = json.loads(raw_data)['data']
        # Convert data to numpy array for prediction
        data = np.array(data)
        # Generate predictions using the loaded model
        result_1 = model.predict(data)
        # Convert numpy array back to list for JSON serialization
        return {"prediction1": result_1.tolist()}
    except Exception as e:
        # Return the error message in case of an exception
        result = str(e)
        return result

Overwriting score.py


## Setting Up the Deployment Environment

To deploy our model to Azure ML, we need a consistent runtime environment that closely replicates the one used during model development and testing. This ensures that the model behaves as expected once deployed.

### Creating and Configuring the Environment

We use the Azure ML `Environment` class to create a managed environment, where we can specify the exact Python and library versions required for our model to run.
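One way to derive exact version pins from the development machine is to query the installed package metadata. The sketch below uses the standard library's `importlib.metadata`. The helper name `pinned` is hypothetical, and the fallback to an unpinned name is a design assumption rather than anything Azure ML requires.

```python
from importlib import metadata

def pinned(package):
    """Return 'package==X.Y.Z' for the installed version, or the bare name if not installed."""
    try:
        return f"{package}=={metadata.version(package)}"
    except metadata.PackageNotFoundError:
        return package

# Build pip requirement strings for the libraries the scoring script imports.
requirements = [pinned(p) for p in ("joblib", "numpy", "scikit-learn")]
print(requirements)
```

Each resulting string could then be passed to `add_pip_package`, in the same way the environment cell passes `"scikit-learn=={}".format(sklearn.__version__)`.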


In [13]:
from azureml.core import Environment

# Create an Azure ML environment
env = Environment("deploytocloudenv")

# Add necessary libraries using pip
env.python.conda_dependencies.add_pip_package("joblib")  # Used for loading the model
env.python.conda_dependencies.add_pip_package("numpy==1.23")  # Specify exact numpy version
env.python.conda_dependencies.add_pip_package("scikit-learn=={}".format(sklearn.__version__))  # Use the same scikit-learn version used in development

# Register the environment (if not already registered)
#env.register(workspace=ws)



### Creating the Inference Configuration

`InferenceConfig` is a component that specifies the scoring script and the environment under which the model will operate. This setup ensures that the model executes correctly in the cloud.


In [14]:
from azureml.core.model import InferenceConfig

# Set up the inference configuration with the scoring script and environment
inference_config = InferenceConfig(entry_script="score.py", environment=env)


## Deploying the Model to Azure Container Instances (ACI)

After setting up the `InferenceConfig`, the next step is to deploy the model as a web service using Azure Container Instances (ACI). ACI provides a lightweight, isolated environment for running containerized applications, making it suitable for smaller workloads and testing environments.

### Deployment Configuration

Deployment on ACI requires specifying the amount of CPU and memory that the service needs, which is defined using `AciWebservice.deploy_configuration`.


In [16]:
from azureml.core.webservice import AciWebservice

aci_service_name = "aciservice-modelridge"

deployment_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)

service = Model.deploy(ws, aci_service_name, [model], inference_config, deployment_config, overwrite=True)
service.wait_for_deployment(True)

print(service.state)

To leverage new model deployment capabilities, AzureML recommends using CLI/SDK v2 to deploy models as online endpoint, 
please refer to respective documentations 
https://docs.microsoft.com/azure/machine-learning/how-to-deploy-managed-online-endpoints /
https://docs.microsoft.com/azure/machine-learning/how-to-attach-kubernetes-anywhere 
For more information on migration, see https://aka.ms/acimoemigration 
  service = Model.deploy(ws, aci_service_name, [model], inference_config, deployment_config, overwrite=True)


Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running
2024-05-10 11:13:36+00:00 Creating Container Registry if not exists.
2024-05-10 11:13:37+00:00 Registering the environment.
2024-05-10 11:13:38+00:00 Building image..
2024-05-10 11:24:21+00:00 Generating deployment configuration.
2024-05-10 11:24:22+00:00 Submitting deployment to compute..
2024-05-10 11:24:28+00:00 Checking the status of deployment aciservice-modelridge..
2024-05-10 11:25:42+00:00 Checking the status of inference endpoint aciservice-modelridge.
Succeeded
ACI service creation operation finished, operation "Succeeded"
Healthy


## Testing the Deployed Model

Once your model is deployed as a web service in Azure Container Instances (ACI), the next step is to test it to ensure it is functioning as expected. We will create a sample request, send it to the deployed model, and receive predictions.

### Creating a Test Sample

First, we need to prepare a test sample that mimics the data format the model expects. This typically includes converting the data to a list, serializing it into JSON, and formatting it according to the API's expected schema.
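Since the scoring script reads the rows from a `data` key and the model was trained on ten features, it can help to validate a request before sending it. The following is a minimal sketch using only the standard library. The helper `build_payload` and the sample values are hypothetical.

```python
import json

N_FEATURES = 10  # the diabetes dataset has ten feature columns

def build_payload(rows):
    """Check row lengths, then wrap the rows under the 'data' key that score.py reads."""
    for i, row in enumerate(rows):
        if len(row) != N_FEATURES:
            raise ValueError(f"row {i} has {len(row)} values, expected {N_FEATURES}")
    return json.dumps({"data": [[float(v) for v in row] for row in rows]})

sample_rows = [[0.0] * N_FEATURES, [0.01] * N_FEATURES]  # made-up feature values
payload = build_payload(sample_rows)

# Decode again to confirm the structure the service will receive.
decoded = json.loads(payload)
print(len(decoded["data"]), len(decoded["data"][0]))
```

Catching a malformed row on the client side gives a clearer error than the string returned from the scoring script's exception handler.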


In [20]:
import json
test_sample = json.dumps({'data': X[0:2].tolist()})
predictions = service.run(test_sample)
predictions

{'prediction1': [182.67357342863968, 90.99902728640282]}