## Preliminary Setup for Azure Machine Learning

Before working with Azure ML, set up the foundational resources in your Azure subscription. The steps below create them in order:

### 1. **Create a Subscription**
   - Ensure you have an active Azure subscription. If you don’t have one, you can sign up for a [free account](https://azure.microsoft.com/en-us/free/).

### 2. **Create a Resource Group**
   - A resource group is a container that holds related resources for an Azure solution. Create one in the Azure portal and pick a region close to you to reduce latency; this walkthrough uses "West US 2".

### 3. **Create a Storage Account and Container**
   - **Storage Account**: Needed for storing files and data. In the Azure portal, create a storage account within your resource group. Keep it in the same region as your Machine Learning workspace to minimize data movement and latency.
   - **Blob Container**: Within the storage account, create a blob container where you'll store your files. This could include datasets, training scripts, or any other relevant files.

### 4. **Create an Azure Machine Learning Workspace**
   - Navigate to Azure Machine Learning in the portal and create a new workspace. Choose the resource group you created earlier and specify "West US 2" as the region. 

### 5. **Create a Container Registry**
   - If you plan to use custom Docker containers for your experiments, create an Azure Container Registry (ACR). This registry will store your Docker images.

### 6. **Create Compute Resources**
   - **Compute Instance**: For developing and running experiments directly in the cloud, create a compute instance.
   - **Compute Cluster**: For training models at scale, create a compute cluster. You can choose a CPU-based cluster for general tasks or a GPU-based cluster for intensive computations.

### 7. **Create a Datastore**
   - Connect your Azure ML workspace to your storage account by creating a datastore. This step will involve accessing the storage account keys:
     - Navigate to your storage account in the Azure portal.
     - Find 'Access keys' under 'Security + networking'.
     - Use one of the listed keys to link your storage account as a datastore in your Azure ML workspace.
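
The linking step can also be scripted with the v1 SDK. The following is a minimal sketch, assuming the access key has been exported in an environment variable named `AZURE_STORAGE_KEY` (an assumed name). `Datastore.register_azure_blob_container` is the SDK call; the two helper functions and the example names in the usage note are hypothetical.

```python
import os


def build_datastore_kwargs(datastore_name, container_name, account_name,
                           key_env_var="AZURE_STORAGE_KEY"):
    """Collect the arguments for registering a blob container as a datastore.

    The account key is read from an environment variable (the variable name
    is an assumption) so it never has to be pasted into the notebook.
    """
    account_key = os.environ.get(key_env_var)
    if not account_key:
        raise RuntimeError(f"Set {key_env_var} to a storage account access key")
    return {
        "datastore_name": datastore_name,
        "container_name": container_name,
        "account_name": account_name,
        "account_key": account_key,
    }


def register_blob_datastore(ws, **kwargs):
    """Register the container with the workspace `ws` (requires azureml-core)."""
    from azureml.core import Datastore  # imported lazily; only needed here
    return Datastore.register_azure_blob_container(workspace=ws, **kwargs)
```

A call might look like `register_blob_datastore(ws, **build_datastore_kwargs("my_blob_datastore", "mycontainer", "mystorageaccount"))`, where all three names are placeholders for your own resources.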

### Additional Notes
   - **Workspace Configuration**: Download the workspace configuration file (`config.json`) from the Azure ML portal by navigating to your workspace and selecting 'Download config file'. `Workspace.from_config()` reads this file, so connection settings do not have to be hard-coded in the notebook.

   - **Uploading to AML Notebook**: Any uploads to an AML notebook get saved to the default blob storage. To view these, navigate to the datastore linked to your Azure ML workspace.
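
For reference, the workspace configuration file is a small JSON document with three fields that `Workspace.from_config()` looks up. The sketch below writes one by hand; the helper name is hypothetical and the values in the usage note are placeholders for your own IDs.

```python
import json
from pathlib import Path


def write_workspace_config(directory, subscription_id, resource_group, workspace_name):
    """Write a config.json containing the three fields the SDK reads."""
    config = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    path = Path(directory) / "config.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2))
    return path
```

For example, `write_workspace_config(".azureml", "<subscription-id>", "<resource-group>", "<workspace-name>")` places the file in a `.azureml` folder next to the notebook, one of the locations the SDK searches.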

---


In [2]:
import pandas as pd
import seaborn as sns
import numpy as np
import os
from datetime import datetime, date, timedelta
import argparse
import azureml.core
from azureml.core import Workspace, Dataset, Datastore
from azureml.core.compute import ComputeTarget
from azureml.core.runconfig import RunConfiguration 
from azureml.core.conda_dependencies import CondaDependencies 
from azureml.core import Environment 
from azureml.pipeline.steps import PythonScriptStep
from azureml.pipeline.core import Pipeline
from azureml.core import Experiment
from azureml.core.authentication import InteractiveLoginAuthentication

In [4]:
ws = Workspace.from_config() 

In [3]:
compute_name = "mlops1"
vm_size = "STANDARD_DS11_V2"
compute_target = ws.compute_targets[compute_name]
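
Indexing `ws.compute_targets` raises a `KeyError` when the named target does not exist yet, and `vm_size` is otherwise unused in this notebook. One possible use for it is sketched here, under the assumption that a missing target should be provisioned as an `AmlCompute` cluster. The helper name and the `max_nodes=2` default are illustrative choices, not part of the original notebook.

```python
def get_or_create_compute(ws, name, vm_size, max_nodes=2):
    """Return the named compute target, provisioning an AmlCompute cluster if absent."""
    targets = ws.compute_targets  # mapping of name -> compute target
    if name in targets:
        return targets[name]

    # Imported lazily: only the provisioning path needs these classes.
    from azureml.core.compute import AmlCompute, ComputeTarget

    config = AmlCompute.provisioning_configuration(vm_size=vm_size, max_nodes=max_nodes)
    target = ComputeTarget.create(ws, name, config)
    target.wait_for_completion(show_output=True)
    return target
```

With this helper, the lookup above could be written as `compute_target = get_or_create_compute(ws, compute_name, vm_size)`.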

### Environment Configuration for Azure ML

This section sets up the environment for Azure ML runs:

1. **Initialization**: `RunConfiguration` is used to define settings for the run, including the `compute_target`.

2. **Compute Target**: Specifies where the training or inference will run, using predefined compute resources.

3. **Environment Selection**:
   - **Curated Environment**: If `USE_CURATEDUENV` is `True`, a pre-configured environment (`AzureML-sklearn-0.24-ubuntu18.04-py37-cpu`) is used, optimized with necessary libraries.
   - **Custom Environment**: If `False`, a custom environment is defined with specific packages managed by Conda and pip, enabling detailed control over the libraries and their versions.



In [4]:
# Declare the run configuration and attach the compute target
aml_config = RunConfiguration()
aml_config.target = compute_target

USE_CURATEDUENV = True
if USE_CURATEDUENV:
    # Use a curated, pre-built environment
    curated_env = Environment.get(workspace=ws, name="AzureML-sklearn-0.24-ubuntu18.04-py37-cpu")
    aml_config.environment = curated_env
else:
    # Let Azure ML build the environment from the dependencies listed below
    aml_config.environment.python.user_managed_dependencies = False
    aml_config.environment.python.conda_dependencies = CondaDependencies.create(
        # Packages required by the pipeline scripts
        conda_packages=['pandas', 'scikit-learn'],
        pip_packages=['azureml-sdk', 'azureml-dataset-runtime[fuse, pandas]', 'seaborn'],
        pin_sdk_version=False)

### Overview of Machine Learning Pipeline Configuration

This cell defines a three-stage pipeline in Azure Machine Learning (Azure ML), from data preparation to model training. Each stage is a separate step backed by its own script:

- **Data Wrangling**: Initial data loading and cleaning.
- **Data Preprocessing**: Applying necessary data transformations and normalizations.
- **Model Training**: Executing the model training process on the preprocessed data.

**Pipeline Steps**:
- Each script step (`data_wrangling_V1.py`, `preprocessing_V1.py`, `modeling_V1.py`) runs on the specified compute target within Azure ML, using the shared run configuration.
- `PythonScriptStep` objects define these steps, with parameters for the script name, compute target, and command-line arguments. The steps hand data to one another by file name: the preprocessing step is given `wrangled.csv` as input and `preprocessed.csv` as output, and the modeling step is given `preprocessed.csv` as input.

**Execution and Management**:
- The pipeline is orchestrated to run within an Azure ML workspace, utilizing Azure's cloud compute resources for scalability and efficiency.
- Every step sets `allow_reuse=False`, so it is rerun on each submission instead of reusing cached results from an earlier run. This is useful during development or when inputs might change.

This setup gives a structured, reproducible workflow and relies on Azure ML to manage and scale the compute behind it.
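
Each step script receives the values in its `arguments` list on the command line. The step scripts themselves are not shown in this notebook, so the following is only a sketch of how a script such as `preprocessing_V1.py` might parse them. The function name is hypothetical; the flag names match the `arguments` lists used by this pipeline's steps.

```python
import argparse


def parse_step_args(argv=None):
    """Parse the file-name arguments passed to a pipeline step script."""
    parser = argparse.ArgumentParser(description="Pipeline step arguments")
    parser.add_argument("--input-file", required=True, help="CSV file to read")
    parser.add_argument("--output-file", default=None, help="CSV file to write, if any")
    # argparse exposes --input-file as the attribute input_file.
    return parser.parse_args(argv)
```

Calling `parse_step_args(["--input-file", "wrangled.csv", "--output-file", "preprocessed.csv"])` yields an object whose `input_file` is `"wrangled.csv"` and whose `output_file` is `"preprocessed.csv"`; with no argument list, it reads `sys.argv` as a script normally would.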


In [5]:
# Pipeline scripts, one per stage
read_data = 'data_wrangling_V1.py'
prep = 'preprocessing_V1.py'
model = 'modeling_V1.py'

# Step 1: load and clean the raw data
py_script_run_read = PythonScriptStep(
    script_name=read_data,
    compute_target=compute_target,
    arguments=['--input-data', 'diabetes.csv'],
    runconfig=aml_config,
    allow_reuse=False
)

# Step 2: transform the wrangled data
py_script_run_prep = PythonScriptStep(
    script_name=prep,
    compute_target=compute_target,
    arguments=['--input-file', 'wrangled.csv', '--output-file', 'preprocessed.csv'],
    runconfig=aml_config,
    allow_reuse=False
)

# Step 3: train the model on the preprocessed data
py_script_run_model = PythonScriptStep(
    script_name=model,
    compute_target=compute_target,
    arguments=['--input-file', 'preprocessed.csv'],
    runconfig=aml_config,
    allow_reuse=False
)

# Pass the flat list of steps to the pipeline
pipeline_step = [py_script_run_read, py_script_run_prep, py_script_run_model]
pipeline_1 = Pipeline(workspace=ws, steps=pipeline_step)
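
A list of steps does not by itself fix the execution order: Azure ML orders steps by their data dependencies, and steps without any may be scheduled in parallel. Because these three steps exchange only file names, an explicit order can be declared with the `run_after` method that pipeline steps provide. A minimal sketch, in which the helper `chain_steps` is hypothetical:

```python
def chain_steps(steps):
    """Make each step run after the one before it and return the same list."""
    for previous, current in zip(steps, steps[1:]):
        current.run_after(previous)
    return steps
```

The pipeline could then be built as `Pipeline(workspace=ws, steps=chain_steps(pipeline_step))`. `StepSequence` from `azureml.pipeline.core` expresses the same ordering declaratively.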

### Experiment

In [6]:
pipeline_run = Experiment(ws, "First_run").submit(pipeline_1)
pipeline_run.wait_for_completion(show_output=True)

Created step data_wrangling_V1.py [14209fb3][4c58b2ae-a170-4967-9ad1-bc754119fc3d], (This step will run and generate new outputs)
Created step preprocessing_V1.py [62f42892][7d44d5ca-d5d1-4dbd-9eab-de1bf55fa8e5], (This step will run and generate new outputs)
Created step modeling_V1.py [1cb02d2f][31360df6-63f4-4290-a662-826cd8f2e3a6], (This step will run and generate new outputs)
Submitted PipelineRun d1ea8eee-cb86-43c9-96ee-24f20a14aca4
Link to Azure Machine Learning Portal: https://ml.azure.com/runs/d1ea8eee-cb86-43c9-96ee-24f20a14aca4?wsid=/subscriptions/fcd3ed4f-3ce9-448e-8084-9f119bc03559/resourcegroups/mlops_1/workspaces/mlops_demo1&tid=442b6079-5151-400f-ad5f-3707cf65adbf
PipelineRunId: d1ea8eee-cb86-43c9-96ee-24f20a14aca4
Link to Azure Machine Learning Portal: https://ml.azure.com/runs/d1ea8eee-cb86-43c9-96ee-24f20a14aca4?wsid=/subscriptions/fcd3ed4f-3ce9-448e-8084-9f119bc03559/resourcegroups/mlops_1/workspaces/mlops_demo1&tid=442b6079-5151-400f-ad5f-3707cf65adbf
PipelineRun St

'Finished'

In [7]:
pipeline_run

| Experiment | Id | Type | Status | Details Page | Docs Page |
|---|---|---|---|---|---|
| First_run | d1ea8eee-cb86-43c9-96ee-24f20a14aca4 | azureml.PipelineRun | Completed | Link to Azure Machine Learning studio | Link to Documentation |


In [5]:
datastore = Datastore.get(ws, 'workspaceblobstore')
datastore.download(target_path="azureml", prefix="model_estimator", overwrite=False, show_progress=True)


Path already exists. Skipping download for azureml/model_estimator_100.pkl
Path already exists. Skipping download for azureml/model_estimator_200.pkl
Path already exists. Skipping download for azureml/model_estimator_500.pkl


0

In [6]:
# Register the downloaded model file with the workspace
from azureml.core.model import Model

finetuning_model = Model.register(
    workspace=ws,
    model_name='model_estimator_500',
    model_path="./azureml/model_estimator_500.pkl",
    tags={},
    description="Diabetes Model"
)

Registering model model_estimator_500


In [7]:
# Load the model
from azureml.core.model import Model
model = Model(ws, name='model_estimator_500')
print("Loaded model version:", model.version)

Loaded model version: 2


### Deployment

Building the image requires:
 - a scoring script (`score.py`; here `score3.py`)
 - the model file
 - the environment file `myenv.yml`

In [8]:
from azureml.core import Environment
import sklearn
env = Environment("deploytocloudenv_2comp")
env.python.conda_dependencies.add_pip_package("joblib")
env.python.conda_dependencies.add_pip_package("numpy==1.23")
env.python.conda_dependencies.add_pip_package("scikit-learn=={}".format(sklearn.__version__))

In [9]:
%%writefile score3.py
import joblib
import json
import numpy as np

from azureml.core.model import Model

def init():
    global model_3
    model_3_path = Model.get_model_path(model_name='model_estimator_500')
    model_3 = joblib.load(model_3_path)

def run(raw_data):
    try:
        data = json.loads(raw_data)['data']
        data = np.array(data)
        result_1 = model_3.predict(data)
        
        return {"prediction1": result_1.tolist()}
    except Exception as e:
        result = str(e)
        return result

Overwriting score3.py
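
Before building an image, the request/response contract of `run` can be exercised locally. The sketch below mirrors the body of `run` but substitutes a stand-in model (`StubModel`, hypothetical) for the registered estimator and plain lists for NumPy arrays, so it checks only the JSON handling, not the real predictions.

```python
import json


class StubModel:
    """Stand-in for the trained estimator: predicts 1.0 when the first feature exceeds 5."""

    def predict(self, rows):
        return [1.0 if row[0] > 5 else 0.0 for row in rows]


def run_locally(raw_data, model):
    """Same flow as run() in score3.py: parse JSON, predict, wrap the result."""
    try:
        data = json.loads(raw_data)["data"]
        return {"prediction1": list(model.predict(data))}
    except Exception as e:
        # Like the original script, errors come back as a plain string.
        return str(e)
```

For instance, `run_locally(json.dumps({"data": [[6, 148], [1, 85]]}), StubModel())` returns `{'prediction1': [1.0, 0.0]}`, while a payload without a `"data"` key returns the error text instead of raising.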


### Inference configuration 

In [10]:
from azureml.core.model import InferenceConfig

inference_config = InferenceConfig(entry_script="score3.py", environment=env)

In [11]:
from azureml.core.webservice import AciWebservice

aci_service_name = "aciservice-modeldiabetes"

deployment_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)

service = Model.deploy(ws, aci_service_name, [model], inference_config, deployment_config, overwrite=True)
service.wait_for_deployment(True)

print(service.state)

To leverage new model deployment capabilities, AzureML recommends using CLI/SDK v2 to deploy models as online endpoint, 
please refer to respective documentations 
https://docs.microsoft.com/azure/machine-learning/how-to-deploy-managed-online-endpoints /
https://docs.microsoft.com/azure/machine-learning/how-to-attach-kubernetes-anywhere 
For more information on migration, see https://aka.ms/acimoemigration 
  service = Model.deploy(ws, aci_service_name, [model], inference_config, deployment_config, overwrite=True)


Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running
2024-05-11 01:11:01+00:00 Creating Container Registry if not exists..
2024-05-11 01:11:07+00:00 Use the existing image.
2024-05-11 01:11:08+00:00 Submitting deployment to compute.
2024-05-11 01:11:11+00:00 Checking the status of deployment aciservice-modeldiabetes..
2024-05-11 01:13:54+00:00 Checking the status of inference endpoint aciservice-modeldiabetes.
Succeeded
ACI service creation operation finished, operation "Succeeded"
Healthy


In [12]:
datastore = Datastore.get(ws, 'workspaceblobstore')
df = Dataset.Tabular.from_delimited_files(path=[(datastore, "preprocessed.csv")]).to_pandas_dataframe()
df.head()

df = df[['Pregnancies', 'Glucose', 'SkinThickness', 'BMI', 'Age',
       'Outcome']]

df_new = np.array(df)

In [13]:
import json
test_sample = json.dumps({'data': df_new[:4].tolist()})
predictions = service.run(test_sample)
predictions

{'prediction1': [1.0, 0.0, 0.0, 0.0]}
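
Because `run` returns a dictionary on success and the exception text as a string on failure, calling code has to tell the two apart. A small sketch of such a check follows; the helper name is hypothetical.

```python
def extract_predictions(response, key="prediction1"):
    """Return the prediction list, or raise if the service reported an error."""
    if isinstance(response, dict) and key in response:
        return response[key]
    raise RuntimeError(f"Scoring failed: {response!r}")
```

It can wrap the call directly, as in `extract_predictions(service.run(test_sample))`.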