# In this notebook, you'll use the modeling module to train and register models


You'll need the latest version of the **azure-ai-ml** package to run the code in this notebook, so make sure it is installed and up to date before continuing.
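One way to check the installed version from Python is `importlib.metadata` from the standard library. The `installed_version` helper below is an illustrative name, not part of the project:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name: str):
    """Return the installed version string of a distribution, or None if it is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# The distribution name of the Azure ML Python SDK v2 is "azure-ai-ml".
azure_ml_version = installed_version("azure-ai-ml")
if azure_ml_version is None:
    print("azure-ai-ml is not installed; run: pip install --upgrade azure-ai-ml")
else:
    print(f"azure-ai-ml {azure_ml_version} is installed")
```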

## Connect to your workspace

With the required SDK packages installed, you're now ready to connect to your workspace.

To connect to a workspace, you need three identifiers: a subscription ID, a resource group name, and a workspace name. Because you're working on a compute instance managed by Azure Machine Learning, you can use the default values: `MLClient.from_config` reads them from a `config.json` file available on the compute instance.
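`MLClient.from_config` looks for a `config.json` file containing the keys `subscription_id`, `resource_group`, and `workspace_name`. If you run the notebook somewhere without that file, you can read the identifiers yourself and pass them to the `MLClient` constructor. The sketch below uses a hypothetical `read_workspace_ids` helper, and the SDK calls are left as comments so the snippet runs without Azure access:

```python
import json
from pathlib import Path


def read_workspace_ids(config_path):
    """Read the subscription ID, resource group, and workspace name from an Azure ML config.json."""
    config = json.loads(Path(config_path).read_text())
    return config["subscription_id"], config["resource_group"], config["workspace_name"]


# Usage (requires azure-ai-ml, azure-identity, and a valid credential):
# from azure.ai.ml import MLClient
# from azure.identity import DefaultAzureCredential
# subscription_id, resource_group, workspace_name = read_workspace_ids("config.json")
# ml_client = MLClient(
#     DefaultAzureCredential(),
#     subscription_id=subscription_id,
#     resource_group_name=resource_group,
#     workspace_name=workspace_name,
# )
```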

In [None]:
import warnings

warnings.filterwarnings("ignore")

In [None]:
from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential
from azure.ai.ml import MLClient

try:
    credential = DefaultAzureCredential()
    # Check if given credential can get token successfully.
    credential.get_token("https://management.azure.com/.default")
except Exception:
    # Fall back to InteractiveBrowserCredential in case DefaultAzureCredential does not work
    credential = InteractiveBrowserCredential()

In [None]:
# Get a handle to workspace
ml_client = MLClient.from_config(credential=credential)

In [None]:
# Verify that the handle works correctly.
# If you get an error here, check that config.json contains the correct subscription ID,
# resource group, and workspace name.
ws = ml_client.workspaces.get(ml_client.workspace_name)
print(ws.location, ":", ws.resource_group)

## Testing the Data Modeling Pipeline

In [None]:
import os
import sys

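# Add the project root (the part of the path before "/notebooks") to sys.path
# so that the `core` package can be imported.
project_root_directory = os.getcwd().split("/notebooks")[0]
sys.path.insert(0, project_root_directory)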
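Splitting on `"/notebooks"` assumes a POSIX path and a folder with exactly that name. A more portable sketch walks up from the current directory until it finds a folder containing the `core` package; the `find_project_root` helper and its `marker` parameter are hypothetical:

```python
from pathlib import Path


def find_project_root(start=None, marker="core"):
    """Walk up from `start` (default: the current directory) until a folder containing `marker` is found."""
    current = Path(start or Path.cwd()).resolve()
    for candidate in (current, *current.parents):
        if (candidate / marker).is_dir():
            return candidate
    raise FileNotFoundError(f"No parent of {current} contains a '{marker}' directory")


# Usage, as an alternative to splitting on "/notebooks":
# sys.path.insert(0, str(find_project_root()))
```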

In [None]:
from core.modeling.config import ModelingConfig, MethodConfig
from core.modeling.pipeline import ModelingPipelineBuilder
import pandas as pd

In [None]:
# Create the ModelingConfig object: L2 normalization followed by a Ridge regressor
config = ModelingConfig(
    model_name="Linear Regression",
    data_preprocessing_steps=[
        MethodConfig(name="sklearn.preprocessing.Normalizer", params=dict(norm="l2")),
    ],
    model_estimator=MethodConfig(
        name="sklearn.linear_model.Ridge", params=dict(alpha=0.9)
    ),
    track_experiment=False,
)
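`MethodConfig` identifies each step by a fully qualified class name plus keyword arguments. We haven't checked how `core.modeling` turns these into objects, but a common approach is `importlib`. The hypothetical `build_from_name` helper below sketches the idea with a standard-library class so it runs without scikit-learn:

```python
import importlib


def build_from_name(name: str, params=None):
    """Instantiate a class given its dotted path, e.g. "sklearn.linear_model.Ridge"."""
    module_path, _, class_name = name.rpartition(".")
    if not module_path:
        raise ValueError(f"Expected a dotted path like 'package.module.Class', got {name!r}")
    cls = getattr(importlib.import_module(module_path), class_name)
    return cls(**(params or {}))


# Standard-library example; with scikit-learn installed,
# build_from_name("sklearn.linear_model.Ridge", dict(alpha=0.9)) follows the same path.
half = build_from_name("fractions.Fraction", dict(numerator=1, denominator=2))
print(half)  # 1/2
```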

In [None]:
config

### Simple Usage with Diabetes Data

In [None]:
from sklearn.datasets import load_diabetes

# With as_frame=True, the features are returned as a DataFrame and the target as a Series
data = load_diabetes(as_frame=True)
X = data.data
y = data.target

In [None]:
pipeline_builder = ModelingPipelineBuilder(config)

In [None]:
pipeline_builder.pipeline

In [None]:
X_train, X_test, y_train, y_test = pipeline_builder.split_data(X, y)

In [None]:
pipeline_builder.X_train

In [None]:
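pipeline_builder.fit()

# Assuming the built pipeline is a scikit-learn Pipeline, `score` returns the
# coefficient of determination (R²) for a regressor such as Ridge.
test_score = pipeline_builder.pipeline.score(X_test, y_test)
train_score = pipeline_builder.pipeline.score(X_train, y_train)
print(f"Test Score: {test_score}")
print(f"Train Score: {train_score}")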
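Assuming the pipeline's `score` is the usual R² for regressors, it equals 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot the sum of squared deviations from the mean of the true values. A plain-Python sketch of that formula (the `r2_score_manual` name is illustrative):

```python
def r2_score_manual(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    y_true = list(y_true)
    y_pred = list(y_pred)
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


print(r2_score_manual([3.0, 5.0, 7.0], [3.0, 5.0, 7.0]))  # 1.0 for a perfect fit
print(r2_score_manual([3.0, 5.0, 7.0], [5.0, 5.0, 5.0]))  # 0.0 when always predicting the mean
```

With the notebook's objects, `r2_score_manual(y_test, pipeline_builder.pipeline.predict(X_test))` should agree with `test_score` up to floating-point rounding, if the assumption above holds.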




## Checking for existing environments

Let's explore the environments within the workspace.


> **Note**:
> If the **azure-ai-ml** package is not installed, run `pip install azure-ai-ml` to install it.

In [None]:
envs = ml_client.environments.list()
for env in envs:
    print(env.name)

If a job uses a custom environment that hasn't been built yet, submitting the job triggers the environment build. The first time you use a newly created environment, the build can take 10-15 minutes, so the job also takes longer to complete.
You can also trigger the build manually before you submit a job. Either way, the environment only needs to be built the first time you use it.

## Creating a job to use a data asset

After experimenting in a notebook, you can use scripts to train machine learning models. A script can be run as a job, and for each job you can specify inputs and outputs.

You can use either **data assets** or **datastore paths** as the inputs or outputs of a job. Alternatively, the job's script can read the data directly.
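In the Azure ML SDK v2, a registered data asset is typically referenced as `azureml:<name>:<version>`, and a file or folder in a datastore as `azureml://datastores/<datastore>/paths/<path>`. The helpers below (hypothetical names) only build these strings; the asset name `mobile-price` and the path `data/prices.csv` are made-up examples, while `workspaceblobstore` is the workspace's default blob datastore. The resulting string would be passed as the `path` of an `azure.ai.ml.Input`:

```python
def data_asset_uri(name: str, version: str) -> str:
    """Reference to a registered data asset, e.g. azureml:mobile-price:1 (hypothetical asset)."""
    return f"azureml:{name}:{version}"


def datastore_uri(datastore: str, path: str) -> str:
    """Reference to a path inside a registered datastore."""
    return f"azureml://datastores/{datastore}/paths/{path.lstrip('/')}"


# Usage with the SDK (not executed here):
# from azure.ai.ml import Input
# from azure.ai.ml.constants import AssetTypes
# job_input = Input(type=AssetTypes.URI_FILE, path=datastore_uri("workspaceblobstore", "data/prices.csv"))
print(data_asset_uri("mobile-price", "1"))
print(datastore_uri("workspaceblobstore", "/data/prices.csv"))
```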

The cell below writes the **main.py** script to the project root (`../main.py`).

In [None]:
%%writefile ../main.py


import os
import sys

import mltable
import pandas as pd

# Make sure this script's directory (the project root) is on sys.path
# so that the `core` package can be imported.
current_path = os.path.dirname(os.path.abspath(__file__))
sys.path.insert(0, current_path)

from core.modeling.config import ModelingConfig, MethodConfig
from core.modeling.pipeline import ModelingPipelineBuilder

# Create the ModelingConfig object
config = ModelingConfig(
    model_name="Linear Regression",
    data_preprocessing_steps=[
        MethodConfig(
            name="sklearn.preprocessing.Normalizer", params=dict(norm="l2")
        ),
    ],
    model_estimator=MethodConfig(
        name="sklearn.linear_model.Ridge", params=dict(alpha=0.9)
    ),
    track_experiment=True,
)

# Load the MLTable data and separate the "Price" target column from the features
tbl = mltable.load("azureml://...path...")
df = tbl.to_pandas_dataframe()
X = df.drop(["Price"], axis=1)
y = pd.DataFrame(df["Price"].copy())

pipeline_builder = ModelingPipelineBuilder(config)
pipeline_builder.split_data(X, y)

pipeline_builder.fit()
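Because the data path is hard-coded in **main.py**, one option is to accept it as a command-line argument so that the job can pass a data asset or datastore path through its `inputs`, for example with `command="python main.py --data ${{inputs.data}}"`. A minimal sketch of the argument parsing, with a hypothetical `--data` flag:

```python
import argparse


def parse_args(argv=None):
    """Parse command-line options for the training script."""
    parser = argparse.ArgumentParser(description="Train the mobile price model")
    parser.add_argument("--data", required=True, help="Path or URI of the MLTable to load")
    return parser.parse_args(argv)


# Inside main.py this would replace the hard-coded path:
# args = parse_args()
# tbl = mltable.load(args.data)
args = parse_args(["--data", "azureml://...path..."])
print(args.data)
```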


To submit a job that runs the **main.py** script, run the cell below. 

The job uploads the project root (`code="../"`) and runs **main.py** on the `sandbox-ci` compute, using the `docker-context-repo-based-v1:1` environment. It doesn't declare any job inputs or outputs: **main.py** loads the data itself with `mltable.load`.

In [None]:
from azure.ai.ml import command

# Configure the command job: upload the project root and run main.py
job = command(
    code="../",
    command="python main.py",
    environment="docker-context-repo-based-v1:1",
    compute="sandbox-ci",
    display_name="training-mobile-data",  # if omitted, the job's generated name is used
    experiment_name="mobile-price-exp",
)

# submit job
returned_job = ml_client.create_or_update(job)
aml_url = returned_job.studio_url
print("Monitor the job at", aml_url)

## Creating a component to execute a pipeline (using a YAML definition)

In [None]:
%%writefile ../config.yml
# <component>
$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
type: command
name: train_credit_defaults_model
display_name: training-mobile-data
description: Training job for mobile price data
# version: 1 # Not specifying a version will automatically update the version
code: .
environment: azureml:docker-context-repo-based-v1:1
command: >-
  python main.py
# </component>

Optionally, register the component in the workspace for future reuse.

In [None]:
# Import the component loader
from azure.ai.ml import load_component

# Load the component definition from the YAML file
train_component = load_component(source=os.path.join("../", "config.yml"))

In [None]:
# Create (register) the component in your workspace
train_component = ml_client.components.create_or_update(train_component)

print(
    f"Component {train_component.name} ({train_component.command}) with version {train_component.version} is registered"
)

In [None]:
# The dsl.pipeline decorator tells the SDK that we are defining an Azure ML pipeline
from azure.ai.ml import dsl


@dsl.pipeline(
    compute="sandbox-demo-ci",
    description="training pipeline",
)
def mobile_defaults_pipeline():

    # Call the component like a Python function to add it as a pipeline step
    train_job = train_component()

    # This pipeline declares no outputs, so it returns nothing.
    # To expose outputs, return a dict whose keys become the pipeline output names.

In [None]:
# Instantiate the pipeline (it takes no parameters)
pipeline = mobile_defaults_pipeline()

In [None]:
# Submit the pipeline job
pipeline_job = ml_client.jobs.create_or_update(
    pipeline,
    experiment_name="mobile-price-exp",
)
# Print the link to the pipeline run in Azure Machine Learning studio
# (webbrowser.open would open a browser on the compute instance, not on your machine)
print("Monitor the pipeline at", pipeline_job.studio_url)

## Registering models by run (after job execution)

In [None]:
from core.modeling.deployment import DeploymentPipeline
from core.modeling.config import DeploymentConfig

In [None]:
config = DeploymentConfig(model_name="mobile-price")

In [None]:
obj = DeploymentPipeline(config)

In [None]:
_, exp_names = obj.get_all_experiments()

In [None]:
exp_names

In [None]:
# Choose one of the experiment names listed above (index 1 is only an example)
experiment_id = exp_names[1]

In [None]:
obj.register_model(experiment_id=experiment_id, metric="test_mse", metric_mode="min")
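`register_model` is given a metric name and a mode (`"min"` for `test_mse`, where lower is better). We don't know its internals, but the selection it implies can be sketched in plain Python. The hypothetical `select_best_run` helper assumes each run is a dict with `run_id` and `metrics` keys (not MLflow's run object), and the metric values are toy numbers:

```python
def select_best_run(runs, metric, mode="min"):
    """Return the run with the lowest (mode="min") or highest (mode="max") value of `metric`.

    Each run is assumed to be a dict like {"run_id": ..., "metrics": {...}}.
    Runs that did not log the metric are ignored.
    """
    if mode not in ("min", "max"):
        raise ValueError("mode must be 'min' or 'max'")
    candidates = [run for run in runs if metric in run["metrics"]]
    if not candidates:
        raise ValueError(f"No run logged the metric {metric!r}")
    pick = min if mode == "min" else max
    return pick(candidates, key=lambda run: run["metrics"][metric])


# Toy values for illustration only
runs = [
    {"run_id": "a", "metrics": {"test_mse": 3.0}},
    {"run_id": "b", "metrics": {"test_mse": 2.5}},
    {"run_id": "c", "metrics": {}},
]
print(select_best_run(runs, "test_mse", "min")["run_id"])  # b
```

If you query MLflow directly instead, `mlflow.search_runs` accepts an `order_by` argument (for example `["metrics.test_mse ASC"]`) that sorts the runs for you.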

In [None]:
import mlflow
from mlflow.entities import ViewType


def get_all_runs_by_experiment(experiment_name):
    client = mlflow.tracking.MlflowClient()
    experiment = client.get_experiment_by_name(experiment_name)

    if experiment is None:
        raise ValueError(f"Experiment with name {experiment_name} does not exist.")

    runs = client.search_runs(
        experiment_ids=[experiment.experiment_id], run_view_type=ViewType.ALL
    )

    return runs

In [None]:
get_all_runs_by_experiment("")  # pass the name of your experiment

In [None]:
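# mlflow.delete_experiment expects an experiment ID, so look the experiment up by name first
experiment = mlflow.get_experiment_by_name("")  # pass the name of the experiment to delete
if experiment is not None:
    mlflow.delete_experiment(experiment.experiment_id)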