# Create production ML pipelines with Python SDK v2

In this tutorial, you will learn how to:



- connect to your Azure ML workspace
- create Azure ML data assets
- create reusable Azure ML components
- create, validate and run Azure ML pipelines
- deploy the newly-trained model as an endpoint
- call the Azure ML endpoint for inferencing

**Requirements** - In order to benefit from this tutorial, you need to have:
- a basic understanding of the machine learning project workflow
- an Azure subscription. If you don't have an Azure subscription, [create a free account](https://aka.ms/AMLFree) before you begin.
- a working Azure ML workspace. A workspace can be created via Azure Portal, Azure CLI, or Python SDK. [Read more](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-manage-workspace?tabs=python).
- a Python environment
- the [Azure Machine Learning Python SDK v2](https://github.com/Azure/azureml-examples/blob/sdk-preview/sdk/setup.sh) installed

```
prueba-loteria-blacksmith 
    src
        data_prep
            data_prep.py
            data_prep.yml  
        train
            train.py
            train.yml   
        conda_env
            conda.yaml
        deploy
            sample-request.json
        local
            eda.ipynb
            test_classes
        predict
        e2e-ml-workflow.ipynb
```


If `DefaultAzureCredential` does not work for you, see the [azure-identity reference doc](https://docs.microsoft.com/en-us/python/api/azure-identity/azure.identity?view=azure-python) for other available credentials.

In [12]:
# Handle to the workspace
from azure.ai.ml import MLClient

# Authentication package
from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential

try:
    credential = DefaultAzureCredential()
    # Check that the credential can obtain a token
    credential.get_token("https://management.azure.com/.default")
except Exception:
    # Fall back to InteractiveBrowserCredential if DefaultAzureCredential does not work
    credential = InteractiveBrowserCredential()

In the next cell, enter your Subscription ID, Resource Group name and Workspace name. To find your Subscription ID:
1. In the upper right Azure Machine Learning Studio toolbar, select your workspace name.
1. At the bottom, select **View all properties in Azure Portal**
1. Copy the value from Azure Portal into the code.

In [13]:
# Get a handle to the workspace
ml_client = MLClient(
    credential=credential,
    subscription_id='dc213d49-9972-4e9e-a634-246cefdc8655',
    resource_group_name='rg-test-loteriablacksmith',
    workspace_name='aml-test-loteriablacksmith',
)
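
Hard-coding the subscription ID, resource group and workspace name ties the notebook to a single workspace. A minimal sketch of reading them from environment variables instead is shown below. The variable names (`AZURE_SUBSCRIPTION_ID`, `AZURE_RESOURCE_GROUP`, `AZURE_ML_WORKSPACE`) and the helper `workspace_settings` are assumptions made for illustration, not names defined by the SDK:

```python
import os

# Hypothetical variable names; pick whatever fits your setup.
REQUIRED_VARS = {
    "subscription_id": "AZURE_SUBSCRIPTION_ID",
    "resource_group_name": "AZURE_RESOURCE_GROUP",
    "workspace_name": "AZURE_ML_WORKSPACE",
}


def workspace_settings(env=None):
    """Return the workspace keyword arguments read from `env`."""
    env = os.environ if env is None else env
    missing = [var for var in REQUIRED_VARS.values() if not env.get(var)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {kwarg: env[var] for kwarg, var in REQUIRED_VARS.items()}


# Demonstration with an in-memory mapping instead of the real environment.
example_env = {
    "AZURE_SUBSCRIPTION_ID": "<subscription-id>",
    "AZURE_RESOURCE_GROUP": "<resource-group>",
    "AZURE_ML_WORKSPACE": "<workspace-name>",
}
settings = workspace_settings(example_env)
print(settings)
```

With the variables set, the result could be passed on as `MLClient(credential=credential, **workspace_settings())`.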

## Register data from local machine

Azure ML uses a [`Data`](https://docs.microsoft.com/azure/machine-learning/how-to-create-register-data-assets?tabs=Python-SDK) object to register a reusable definition of data, and consume data within a pipeline.

In [14]:
import os
os.getcwd()

'c:\\repos\\prueba-loteria-blacksmith\\src'

In [34]:
from azure.ai.ml.entities import Data
from azure.ai.ml.constants import AssetTypes

local_path = "data/Txs_LoteriaBlacksmith.xlsx"

historical_data = Data(
    name="ds_loteria_blacksmith",
    path=local_path,
    type=AssetTypes.URI_FILE,
    description="Historical lottery sales dataset for Blacksmith",
    tags={"source_type": "local", "source": "unknown"},
    version="1.0.1",
)

In [35]:
data = ml_client.data.create_or_update(historical_data)
print(
    f"Dataset with name {data.name} was registered to workspace, the dataset version is {data.version}"
)

Dataset with name ds_loteria_blacksmith was registered to workspace, the dataset version is 1.0.1
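
Later cells refer to this registered asset with a string of the form `azureml:<name>:<version>`. As a small illustration, the helper below (`asset_reference`, a hypothetical name that is not part of the SDK) builds that string from the name and version printed above:

```python
def asset_reference(name: str, version: str) -> str:
    """Build an 'azureml:<name>:<version>' reference to a registered asset."""
    if not name or not version:
        raise ValueError("both name and version are required")
    return f"azureml:{name}:{version}"


reference = asset_reference("ds_loteria_blacksmith", "1.0.1")
print(reference)  # azureml:ds_loteria_blacksmith:1.0.1
```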


## Create a job environment for pipeline steps

Create a conda environment for your jobs, using a conda YAML file.

The specification lists the packages that you'll use in your pipeline (for example numpy and pip).
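
The conda file itself is not reproduced in this notebook. As a rough sketch of what such a specification might contain, the snippet below renders one as text. The pip package list is taken from the tags of the environment registered in the next cell; the Python version, the channel, the environment name and the split between conda and pip packages are all assumptions:

```python
def render_conda_spec(name, conda_deps, pip_deps):
    """Render a conda environment specification as YAML text."""
    lines = [f"name: {name}", "channels:", "  - conda-forge", "dependencies:"]
    lines += [f"  - {dep}" for dep in conda_deps]
    if pip_deps:
        # pip-installed packages are nested under a "pip" entry
        lines.append("  - pip:")
        lines += [f"      - {dep}" for dep in pip_deps]
    return "\n".join(lines) + "\n"


spec = render_conda_spec(
    name="blacksmith-forecasting",              # hypothetical environment name
    conda_deps=["python=3.8", "numpy", "pip"],  # the Python version is an assumption
    pip_deps=["scikit-learn", "pmdarima", "skforecast", "xgboost"],
)
print(spec)
```

Writing `spec` to the conda file referenced by `conda_file` would make it available to the `Environment` definition.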


Use the *yaml* file to create and register this custom environment in your workspace:

In [6]:
from azure.ai.ml.entities import Environment

custom_env_name = "aml-test-blacksmith"

pipeline_job_env = Environment(
    name=custom_env_name,
    description="Custom environment for lottery blacksmith forecasting pipeline",
    tags={"libraries": "scikit-learn,pmdarima,skforecast,xgboost"},
    conda_file=os.path.join('conda_env', "conda.yml"),
    image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest",
    version=None,
)
pipeline_job_env = ml_client.environments.create_or_update(pipeline_job_env)

print(
    f"Environment with name {pipeline_job_env.name} is registered to workspace, the environment version is {pipeline_job_env.version}"
)

Environment with name aml-test-blacksmith is registered to workspace, the environment version is 0.1.0


## Build the training pipeline

## Create component 1: data prep (using yaml definition)
Start with the first component, which handles the preprocessing of the data. The preprocessing task is performed in the *data_prep.py* Python file.
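
The script itself is not shown here. A component script usually receives its inputs as command-line arguments, so a minimal sketch of the argument handling in *data_prep.py* could look as follows. The flag names mirror the pipeline inputs set later in this notebook; the exact flags defined in *data_prep.yml*, the `--input_data` and `--output_data` flags, and the helper `parse_args` are assumptions:

```python
import argparse


def parse_args(argv=None):
    """Parse the inputs of the (hypothetical) data prep step."""
    parser = argparse.ArgumentParser(description="Data prep step (sketch)")
    parser.add_argument("--input_data", required=True, help="path to the raw file")
    parser.add_argument("--index_column", default="FechaTx")
    parser.add_argument("--target_column", default="Cantidad")
    parser.add_argument("--filter_column", default="CodSDV")
    parser.add_argument("--filter_value", type=int, default=109216)
    parser.add_argument("--del_columns", default="",
                        help="comma-separated column names to drop")
    parser.add_argument("--output_data", default="outputs")
    args = parser.parse_args(argv)
    # Turn "a,b,c" into ["a", "b", "c"], ignoring empty entries
    args.del_columns = [c.strip() for c in args.del_columns.split(",") if c.strip()]
    return args


args = parse_args([
    "--input_data", "data/Txs_LoteriaBlacksmith.xlsx",
    "--del_columns", "IdCliente,NomProducto,CodProducto",
])
print(args.filter_column, args.filter_value, args.del_columns)
```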

In [36]:
# Import the building blocks for components and jobs
from azure.ai.ml import command
from azure.ai.ml import Input, Output
from azure.ai.ml import load_component, load_job

In [82]:
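# Folder that holds the data prep component (named to avoid shadowing the built-in `dir`)
component_dir = 'c:/repos/prueba-loteria-blacksmith/src/data_preprocess/'

# Load the component from the yml file
data_prep_component = load_component(source=os.path.join(component_dir, "data_prep.yml"))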


In [83]:
# Register the component in the workspace
data_prep_component = ml_client.create_or_update(data_prep_component)

# Report the registered name and version
print(
    f"Component {data_prep_component.name} with Version {data_prep_component.version} is registered"
)

Uploading data_preprocess (0.01 MBs): 100%|##########| 8564/8564 [00:00<00:00, 9329.66it/s]

Component data_prep_lotery_blacksmith with Version 2023-07-19-02-34-06-6997362 is registered


## Create component 2: training (using yaml definition)

The second component that you'll create will consume the training and test data, train and select a model and return the output model and the predictions.
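
How *train.py* splits the data and searches for a model is not shown in this notebook. One common approach for forecasting, sketched here under that assumption, is to hold out the last `test_steps` observations as the test set and to read the candidate lags from a comma-separated string such as the `lags_grid` pipeline input. The helper names and the sample series are hypothetical:

```python
def split_last_steps(series, test_steps):
    """Split a time-ordered sequence, keeping the last `test_steps` items for testing."""
    if not 0 < test_steps < len(series):
        raise ValueError("test_steps must be between 1 and len(series) - 1")
    return series[:-test_steps], series[-test_steps:]


def parse_lags_grid(text):
    """Turn a string like '7,21,60' into a list of integer lags."""
    return [int(item) for item in text.split(",") if item.strip()]


daily_sales = list(range(1, 31))  # 30 made-up daily values
train, test = split_last_steps(daily_sales, test_steps=7)
lags = parse_lags_grid("7,21,60")
print(len(train), len(test), lags)  # 23 7 [7, 21, 60]
```

Splitting by position rather than at random keeps the test set strictly later in time than the training set.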

In [109]:
# Loading the component from the yml file
train_component = load_component(source=os.path.join("train", "train.yml"))

Now create and register the component:

In [137]:
# Register the component in the workspace
train_component = ml_client.create_or_update(train_component)

# Report the registered name and version
print(
    f"Component {train_component.name} with Version {train_component.version} is registered"
)

Component train_and_select_blacksmith_model with Version 2023-07-19-04-02-18-4290380 is registered


## Create component 3: predict (using yaml definition)

In [138]:
# Load the component from the yml file
predict_component = load_component(source=os.path.join("predict", "predict.yml"))

# Register the component in the workspace
predict_component = ml_client.create_or_update(predict_component)

# Report the registered name and version
print(
    f"Component {predict_component.name} with Version {predict_component.version} is registered"
)

Component forecasting_lotery_blacksmith with Version 2023-07-19-05-38-39-7224824 is registered


## Create the pipeline from components

In [134]:
def run_job(
    data_path,
    test_steps=7,
    index_column='FechaTx',
    target_column='Cantidad',
    filter_column='CodSDV',
    filter_value=109216,
    del_columns='IdCliente,NomProducto,CodProducto',
    lags_grid='7,21,60',
    sel_exog='Mes,Dia,media_movil',
):
    """Configure the pipeline for one `filter_value`, submit it and return the job."""
    # One experiment and one registered model per filter value
    experiment_name = f'lotery-blacksmith-best-model-{filter_value}'
    registered_model_name = f'lotery-blacksmith-{filter_value}'

    # Load the pipeline definition and choose the compute target
    pipeline_job_yml = load_job(source="./pipeline.yml")
    pipeline_job_yml.experiment_name = experiment_name
    pipeline_job_yml.compute = "dedicated-cluster"  # or "serverless"

    # Fill in the pipeline inputs with the parameters of our choice
    pipeline_job_yml.inputs.pipeline_job_input_data = Input(type="uri_file", path=data_path)
    pipeline_job_yml.inputs.pipeline_job_test_steps = test_steps
    pipeline_job_yml.inputs.pipeline_job_index_column = index_column
    pipeline_job_yml.inputs.pipeline_job_target_column = target_column
    pipeline_job_yml.inputs.pipeline_job_filter_column = filter_column
    pipeline_job_yml.inputs.pipeline_job_filter_value = filter_value
    pipeline_job_yml.inputs.pipeline_job_del_columns = del_columns
    pipeline_job_yml.inputs.pipeline_job_lags_grid = lags_grid
    pipeline_job_yml.inputs.pipeline_job_sel_exog = sel_exog
    pipeline_job_yml.inputs.pipeline_job_registered_model_name = registered_model_name

    # Submit the pipeline job under the experiment name
    pipeline_job = ml_client.jobs.create_or_update(
        pipeline_job_yml,
        experiment_name=experiment_name,
    )

    return pipeline_job

In [135]:
data_path = "azureml:ds_loteria_blacksmith:1.0.1"  # 1.0.0


In [136]:
run_job(data_path)

| Experiment | Name | Type | Status | Details Page |
| --- | --- | --- | --- | --- |
| lotery-blacksmith-best-model-109216 | modest_spring_3d4wg9dz7w | pipeline | Preparing | Link to Azure Machine Learning studio |
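
The job is still `Preparing` when the cell returns. A minimal sketch of waiting for a terminal state is shown below. It polls through a `get_status` callable so that it can be demonstrated without a workspace; with a real client this would typically be something like `lambda: ml_client.jobs.get(job_name).status`. The helper `wait_for_job` is hypothetical, and the set of terminal status names is an assumption that may not be exhaustive:

```python
import time

TERMINAL_STATES = {"Completed", "Failed", "Canceled"}  # assumed terminal names


def wait_for_job(get_status, poll_seconds=30.0, max_polls=120, sleep=time.sleep):
    """Poll `get_status()` until it returns a terminal state or the polls run out."""
    status = None
    for _ in range(max_polls):
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        sleep(poll_seconds)
    raise TimeoutError(f"job still in state {status!r} after {max_polls} polls")


# Demonstration with a scripted sequence of statuses and no real waiting.
fake_statuses = iter(["Preparing", "Running", "Running", "Completed"])
final = wait_for_job(lambda: next(fake_statuses), poll_seconds=0, sleep=lambda s: None)
print(final)  # Completed
```

The SDK also provides `ml_client.jobs.stream(job_name)`, which blocks and prints the logs until the job ends.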


In [131]:
top_cod_sdv = [
    '109216', '123340', '106061', '3036101', '128117', '130756', '106296',
    '106730', '106738', '106599', '107500', '108390', '108113', '143979',
    '127500', '107506', '123389', '106365', '106496', '107268', '129550',
    '113407', '106292', '144393', '133872',
]

In [None]:
for cod_sdv in top_cod_sdv:
    run_job(data_path, filter_value=int(cod_sdv))
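
A plain loop stops at the first submission that raises an exception. A minimal sketch of a more forgiving variant, which records failures and keeps going, is shown below. `submit_all` is a hypothetical helper, and `submit` stands in for a call such as `lambda code: run_job(data_path, filter_value=code)`; the fake submitter and its error are made up for the demonstration:

```python
def submit_all(codes, submit):
    """Call `submit(int(code))` for every code, collecting results and failures."""
    submitted, failed = {}, {}
    for code in codes:
        try:
            submitted[code] = submit(int(code))
        except Exception as exc:  # keep going and report the failure afterwards
            failed[code] = repr(exc)
    return submitted, failed


def fake_submit(code):
    """Stand-in for run_job that fails for one made-up code."""
    if code == 123340:
        raise RuntimeError("quota exceeded")
    return f"job-for-{code}"


submitted, failed = submit_all(["109216", "123340", "106061"], fake_submit)
print(submitted)
print(failed)
```

Failed codes can then be retried on their own instead of rerunning the whole list.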