# Titanic Challenge - Pipeline Job

## 1 Connect to Workspace

In [20]:
from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential
from azure.ai.ml import MLClient

try:
    credential = DefaultAzureCredential()
    # Check whether the credential can obtain a token.
    credential.get_token("https://management.azure.com/.default")
except Exception:
    # Fall back to InteractiveBrowserCredential if DefaultAzureCredential does not work
    credential = InteractiveBrowserCredential()

In [21]:
# Get a handle to workspace
ml_client = MLClient.from_config(credential=credential)

Found the config file in: /config.json
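
`MLClient.from_config` reads the workspace details from a `config.json` file, which the output above reports at `/config.json`. As a minimal sketch, assuming the usual three keys (`subscription_id`, `resource_group`, `workspace_name`), a file of that shape could be generated as follows. The helper name `write_workspace_config`, the file name `config.example.json`, and all values are placeholders for illustration, not the ones used in this notebook.

```python
import json
from pathlib import Path


def write_workspace_config(path, subscription_id, resource_group, workspace_name):
    """Write a workspace config file in the shape MLClient.from_config reads."""
    config = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    Path(path).write_text(json.dumps(config, indent=2))
    return config


# Placeholder values (assumptions): replace with your own workspace details.
example_config = write_workspace_config(
    "config.example.json",
    subscription_id="<subscription-id>",
    resource_group="<resource-group>",
    workspace_name="<workspace-name>",
)
print(json.dumps(example_config, indent=2))
```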


## 2 Load Components

In [22]:
from azure.ai.ml import load_component
parent_dir = ""

prep_data = load_component(source=parent_dir + "./prep-data.yml")
train_random_forest = load_component(source=parent_dir + "./train-model.yml")
make_predictions = load_component(source=parent_dir + "./make-predictions.yml")
print("Components have been loaded...")

Components have been loaded...
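
Because `load_component` depends on the three YAML files being present, it can help to check for them first. The following is a small sketch using only the standard library; the helper `find_missing_components` is hypothetical, and the file names are the ones used in the cell above.

```python
from pathlib import Path

# File names taken from the load_component calls above.
COMPONENT_FILES = ["prep-data.yml", "train-model.yml", "make-predictions.yml"]


def find_missing_components(base_dir="."):
    """Return the component YAML files that are absent from base_dir."""
    base = Path(base_dir)
    return [name for name in COMPONENT_FILES if not (base / name).is_file()]


missing = find_missing_components(".")
if missing:
    print("Missing component files:", ", ".join(missing))
else:
    print("All component files found.")
```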


## 3.A Build Pipeline

In [23]:
from azure.ai.ml import Input
from azure.ai.ml.constants import AssetTypes
from azure.ai.ml.dsl import pipeline

@pipeline()
def titanic_classification(titanic_data, sample_data):
    # 1 clean the training and sample data
    clean_data = prep_data(
        unclean_titanic=titanic_data,
        unclean_sample=sample_data)
    # 2 train model
    train_model = train_random_forest(
        training_data=clean_data.outputs.clean_titanic)
    # 3 make predictions with the cleaned sample data
    get_predictions = make_predictions(
        sample_data=clean_data.outputs.clean_sample,
        trained_model=train_model.outputs.trained_model)

    return {
        "pipeline_job_transformed_data": clean_data.outputs.clean_titanic,
        "pipeline_job_trained_model": train_model.outputs.trained_model,
        "pipeline_job_prediction_data": get_predictions.outputs.predictions_data,
    }

pipeline_job = titanic_classification(
    titanic_data=Input(type=AssetTypes.URI_FILE, path="azureml:titanic-local:1"),
    sample_data=Input(type=AssetTypes.URI_FILE, path="azureml:titanic-sample-local:1"),
)
print("Pipeline has been defined...")

Pipeline has been defined...
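
The order in which the three steps run follows from how outputs are wired to inputs in `titanic_classification`. As an illustration only (this is not how the Azure ML SDK schedules jobs internally), the same dependencies can be written as a plain graph and sorted with the standard library's `graphlib`. The `dependencies` dictionary is an assumption built by hand from the pipeline definition above.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps whose outputs it consumes,
# mirroring the wiring in titanic_classification above.
dependencies = {
    "clean_data": set(),
    "train_model": {"clean_data"},
    "get_predictions": {"clean_data", "train_model"},
}

execution_order = list(TopologicalSorter(dependencies).static_order())
print(execution_order)
# → ['clean_data', 'train_model', 'get_predictions']
```

Since `get_predictions` needs both the cleaned sample data and the trained model, it can only run last.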


## 3.B Change Pipeline Parameters

In [24]:
# set pipeline level compute
pipeline_job.settings.default_compute = "aml-cluster"
# set pipeline level datastore
pipeline_job.settings.default_datastore = "workspaceblobstore"

# print the pipeline job to review the changes
print(pipeline_job)

display_name: titanic_classification
type: pipeline
inputs:
  titanic_data:
    type: uri_file
    path: azureml:titanic-local:1
  sample_data:
    type: uri_file
    path: azureml:titanic-sample-local:1
outputs:
  pipeline_job_transformed_data:
    type: uri_folder
  pipeline_job_trained_model:
    type: mlflow_model
  pipeline_job_prediction_data:
    type: uri_folder
jobs:
  clean_data:
    type: command
    inputs:
      unclean_titanic:
        path: ${{parent.inputs.titanic_data}}
      unclean_sample:
        path: ${{parent.inputs.sample_data}}
    outputs:
      clean_titanic: ${{parent.outputs.pipeline_job_transformed_data}}
    component:
      $schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
      name: prep_data
      version: '1'
      display_name: Prepare data for training and predictions
      type: command
      inputs:
        unclean_titanic:
          type: uri_file
        unclean_sample:
          type: uri_file
      outputs:
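
The printed job uses binding expressions such as `${{parent.inputs.titanic_data}}` to connect step inputs and outputs to pipeline-level ones. To make the lookup concrete, here is a toy resolver written for illustration. It is not the SDK's own implementation, `resolve_binding` is a hypothetical helper, and the dictionaries are copied by hand from the output above.

```python
import re

# Matches expressions such as ${{parent.inputs.titanic_data}}
BINDING_PATTERN = re.compile(r"^\$\{\{parent\.(inputs|outputs)\.(\w+)\}\}$")


def resolve_binding(expression, pipeline_inputs, pipeline_outputs):
    """Look up the pipeline-level value a binding expression points to."""
    match = BINDING_PATTERN.match(expression.strip())
    if match is None:
        raise ValueError(f"Not a parent binding expression: {expression!r}")
    scope, name = match.groups()
    source = pipeline_inputs if scope == "inputs" else pipeline_outputs
    return source[name]


# Values copied from the printed pipeline job above.
pipeline_inputs = {
    "titanic_data": "azureml:titanic-local:1",
    "sample_data": "azureml:titanic-sample-local:1",
}
pipeline_outputs = {"pipeline_job_transformed_data": "uri_folder"}

resolved = resolve_binding(
    "${{parent.inputs.titanic_data}}", pipeline_inputs, pipeline_outputs
)
print(resolved)
# → azureml:titanic-local:1
```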
   

## 4 Submit Pipeline Job

In [25]:
# submit job to workspace
pipeline_job = ml_client.jobs.create_or_update(
    pipeline_job, experiment_name="pipeline_titanic"
)
pipeline_job

Uploading src (0.02 MBs): 100%|██████████| 21185/21185 [00:00<00:00, 169158.97it/s]

pathOnCompute is not a known attribute of class <class 'azure.ai.ml._restclient.v2023_04_01_preview.models._models_py3.UriFolderJobOutput'> and will be ignored
pathOnCompute is not a known attribute of class <class 'azure.ai.ml._restclient.v2023_04_01_preview.models._models_py3.MLFlowModelJobOutput'> and will be ignored
pathOnCompute is not a known attribute of class <class 'azure.ai.ml._restclient.v2023_04_01_preview.models._models_py3.UriFolderJobOutput'> and will be ignored


| Experiment | Name | Type | Status | Details Page |
|---|---|---|---|---|
| pipeline_titanic | strong_fly_vn9xn46ftx | pipeline | NotStarted | Link to Azure Machine Learning studio |
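
The job is reported as `NotStarted` immediately after submission, so its status has to be checked again later. One option is a small polling loop like the sketch below. It assumes only that the client exposes `jobs.get(name)` returning an object with a `status` attribute, as `MLClient` does; `wait_for_job`, the polling interval, and the set of terminal statuses are assumptions chosen for illustration.

```python
import time

# Statuses after which a job is assumed not to change any further.
TERMINAL_STATUSES = {"Completed", "Failed", "Canceled"}


def wait_for_job(client, job_name, poll_seconds=30, max_polls=120):
    """Poll client.jobs.get(job_name).status until it is terminal or polls run out."""
    status = None
    for _ in range(max_polls):
        status = client.jobs.get(job_name).status
        print(f"{job_name}: {status}")
        if status in TERMINAL_STATUSES:
            break
        time.sleep(poll_seconds)
    return status


# Usage in this notebook (requires the live workspace from section 1):
# final_status = wait_for_job(ml_client, pipeline_job.name)
```

If live log output is preferred over polling, `ml_client.jobs.stream(pipeline_job.name)` blocks until the job finishes.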


---
# End of Notebook