# Enabling Step Sequencing in AzureML SDK v2

AzureML SDK v2 does not include a direct equivalent of the `StepSequence` class from SDK v1, so pipelines can no longer declare an explicit step order through a `StepSequence` object. Instead, SDK v2 infers execution order from data dependencies: a step runs only after the steps whose outputs it consumes have finished, as described in the [SDK v1 to v2 pipeline migration guide](https://learn.microsoft.com/en-us/azure/machine-learning/migrate-to-v2-execution-pipeline?view=azureml-api-2#mapping-of-key-functionality-in-sdk-v1-and-sdk-v2).

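**Recommended workaround: dummy data dependencies.** Give each step a placeholder ("dummy") output and wire it into an input of every step that must run after it. The dummy data carries no information. It exists only to create the dependency that orders the steps.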

**SDK v1 ↔ SDK v2: Feature Mapping**

The migration documentation highlights how core pipeline concepts translate between versions:

| **SDK v1 Functionality**        | **SDK v2 Equivalent**                     |
|---------------------------------|-------------------------------------------|
| `azureml.pipeline.core.Pipeline` | `azure.ai.ml.dsl.pipeline`                |
| `OutputDatasetConfig`           | `Output`                                  |
| `Dataset.as_mount()`            | `Input`                                   |
| `StepSequence`                  | Data dependency via dummy inputs/outputs  |

This reinforces that, in SDK v2, data dependencies are the default mechanism to enforce step order.
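The last row of the table is the key idea of this notebook. The sketch below is library-independent: it makes no AzureML calls, and `sequence_to_dependencies` is a hypothetical helper. It rewrites a v1-style linear ordering as the upstream edges that dummy outputs and inputs would create in v2:

```python
from typing import Dict, List, Set


def sequence_to_dependencies(steps: List[str]) -> Dict[str, Set[str]]:
    """Turn a linear step order into {step: set of upstream steps}."""
    deps: Dict[str, Set[str]] = {}
    for i, step in enumerate(steps):
        # The first step has no predecessor; every later step waits on the one before it.
        deps[step] = {steps[i - 1]} if i > 0 else set()
    return deps


print(sequence_to_dependencies(["prepare_data", "train_model", "evaluate_model"]))
```

Each upstream entry corresponds to one dummy output wired into a dummy input. The pipeline in section 3 applies the same idea to a branching graph.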

### Scenario: Pipeline Step Execution Flow

The diagram below represents the scenario we are creating with AzureML pipelines.  

>- **Step 1** runs first.  
>- It then branches into **Step 2, Step 3, and Step 4** (which run in parallel).  
>- The outputs of **Step 2, Step 3, and Step 4** converge into **Step 5**.  
>- Finally, **Step 6** executes after Step 5 completes.


![Scenario Pipeline Flow](images/Scenario_Pipeline_Flow.png)
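Written as data dependencies, the flow above is a small directed acyclic graph. The sketch below uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+) to group the steps into waves that could run concurrently. It is an idealized model of the ordering, not the AzureML scheduler; real concurrency also depends on the compute available. The names `SCENARIO` and `execution_waves` are illustrative.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Each step maps to the steps whose (dummy) outputs it consumes, i.e. its predecessors.
SCENARIO: Dict[str, Set[str]] = {
    "step1": set(),
    "step2": {"step1"},
    "step3": {"step1"},
    "step4": {"step1"},
    "step5": {"step2", "step3", "step4"},
    "step6": {"step5"},
}


def execution_waves(graph: Dict[str, Set[str]]) -> List[List[str]]:
    """Group steps into waves; every step in a wave has all its predecessors done."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


for i, wave in enumerate(execution_waves(SCENARIO), start=1):
    print(f"wave {i}: {', '.join(wave)}")
```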

# 1. Connect to Azure Machine Learning Workspace

The [workspace](https://docs.microsoft.com/en-us/azure/machine-learning/concept-workspace) is the top-level resource for Azure Machine Learning, providing a centralized place to work with all the artifacts you create when you use Azure Machine Learning. In this section we will connect to the workspace in which the job will be run.

## 1.1 Import the required libraries

In [None]:
# Import required libraries
from azure.ai.ml import Input, MLClient, load_component
from azure.ai.ml.dsl import pipeline
from azure.ai.ml.entities import CommandComponent
from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential

## 1.2 Configure credential

We use `DefaultAzureCredential` to access the workspace. It handles most Azure SDK authentication scenarios.

If it does not work for you, see the [configure credential example](https://github.com/Azure/azureml-examples/blob/902929725e8d713447c99f80e4530c83075ecd9b/sdk/python/jobs/configuration.ipynb) and the [azure-identity reference doc](https://docs.microsoft.com/en-us/python/api/azure-identity/azure.identity?view=azure-python) for other available credentials.

In [None]:
try:
    credential = DefaultAzureCredential()
    # Check that the credential can obtain a token.
    credential.get_token("https://management.azure.com/.default")
except Exception:
    # Fall back to InteractiveBrowserCredential if DefaultAzureCredential does not work.
    credential = InteractiveBrowserCredential()

## 1.3 Get a handle to the workspace

We use a config file (`config.json`) to connect to the workspace. The workspace must already have a compute cluster; this notebook uses one named `cpu-cluster`. See [this notebook on configuring a workspace](https://github.com/Azure/azureml-examples/blob/902929725e8d713447c99f80e4530c83075ecd9b/sdk/python/jobs/configuration.ipynb) for setup details.
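By default, `MLClient.from_config` looks for a `config.json` file, starting from the current directory. The sketch below shows the file's shape by writing one with placeholder values. `write_workspace_config` is a hypothetical helper. The three keys match the config file that can be downloaded for a workspace from the Azure portal.

```python
import json
import tempfile
from pathlib import Path


def write_workspace_config(
    directory: str, subscription_id: str, resource_group: str, workspace_name: str
) -> Path:
    """Write a config.json with the keys MLClient.from_config reads."""
    config = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    path = Path(directory) / "config.json"
    path.write_text(json.dumps(config, indent=2))
    return path


# Demo in a temporary directory with placeholder values, so no real config is overwritten.
with tempfile.TemporaryDirectory() as tmp:
    cfg_path = write_workspace_config(
        tmp, "<subscription-id>", "<resource-group>", "<workspace-name>"
    )
    print(cfg_path.read_text())
```

In practice, place the real file next to the notebook, then call `MLClient.from_config(credential=credential)`.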

In [None]:
# Get a handle to workspace
ml_client = MLClient.from_config(credential=credential)

# Retrieve an already attached Azure Machine Learning Compute.
cluster_name = "cpu-cluster"
print(ml_client.compute.get(cluster_name))

# 2. Define and create components into workspace

## 2.1 Load components from YAML

In [None]:
import os

parent_dir = "."


def test_function():
    # Load the echo component definition from its YAML file on each call.
    return load_component(source=os.path.join(parent_dir, "echo_component.yml"))

## 2.2 Inspect loaded component

In [None]:
# Print the component as yaml
print(test_function())

In [None]:
# Inspect more information
print(type(test_function()))
help(test_function()._func)

## 2.3 Define a component inline

In [None]:
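# Define a component inline in Python using the SDK.
# This "dummy" component carries no real data. Its output exists only so that a
# downstream step can consume it, which creates the data dependency that orders the
# steps. The optional inputs let a dummy node wait on up to three upstream steps.
from azure.ai.ml import Input, Output
from azure.ai.ml.entities import CommandComponent

component_dummy = CommandComponent(
    name="dummy_component",
    display_name="Dummy Component",
    description="A dummy component for pipeline steps",
    command="echo hello",
    environment="azureml://registries/azureml/environments/sklearn-1.5/labels/latest",
    inputs={
        "input_1": Input(type="uri_folder", optional=True),
        "input_2": Input(type="uri_folder", optional=True),
        "input_3": Input(type="uri_folder", optional=True),
    },
    outputs={"output_data": Output(type="uri_folder")},
)

In [None]:
# Note: inside each sub-pipeline, the echo step binds no inputs, so nothing in the
# data graph makes it wait for the previous step; only the dummy nodes are chained.
# Ordering the echo step as well would require echo_component.yml to declare an
# input that step_input can be bound to.


# Step 1
@pipeline(name="step1_pipeline", description="Step 1")
def pipeline_step1():
    step1 = test_function()()
    dummy_step1 = component_dummy()
    return {"step1_output": dummy_step1.outputs.output_data}


# Step 2
@pipeline(name="step2_pipeline", description="Step 2")
def pipeline_step2(step_input: Input(type="uri_folder")):
    step2 = test_function()()
    # Consuming step_input makes this dummy node wait for Step 1.
    dummy_step2 = component_dummy(input_1=step_input)
    return {"step2_output": dummy_step2.outputs.output_data}


# Step 3
@pipeline(name="step3_pipeline", description="Step 3")
def pipeline_step3(step_input: Input(type="uri_folder")):
    step3 = test_function()()
    dummy_step3 = component_dummy(input_1=step_input)
    return {"step3_output": dummy_step3.outputs.output_data}


# Step 4
@pipeline(name="step4_pipeline", description="Step 4")
def pipeline_step4(step_input: Input(type="uri_folder")):
    step4 = test_function()()
    dummy_step4 = component_dummy(input_1=step_input)
    return {"step4_output": dummy_step4.outputs.output_data}


# Step 5 (converge)
@pipeline(name="step5_pipeline", description="Step 5")
def pipeline_step5(
    step2_in: Input(type="uri_folder"),
    step3_in: Input(type="uri_folder"),
    step4_in: Input(type="uri_folder"),
):
    step5 = test_function()()
    # Consuming all three inputs makes this dummy node wait for Steps 2, 3, and 4.
    dummy_step5 = component_dummy(input_1=step2_in, input_2=step3_in, input_3=step4_in)
    return {"step5_output": dummy_step5.outputs.output_data}


# Step 6 (final)
@pipeline(name="step6_pipeline", description="Step 6")
def pipeline_step6(step_in: Input(type="uri_folder")):
    step6 = test_function()()
    dummy_step6 = component_dummy(input_1=step_in)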
    return {"final_output": dummy_step6.outputs.output_data}
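The ordering rule these sub-pipelines rely on can be modeled in a few lines of plain Python. A node depends on another node only if one of its inputs is bound to that node's output. This is a simplified model for illustration, not the SDK's internal graph builder. The function `infer_dependencies` and the `"node.output"` reference strings are hypothetical. The model also shows why a node that binds no inputs, like the echo step inside each sub-pipeline, has no upstream dependency.

```python
from typing import Dict, Set


def infer_dependencies(bindings: Dict[str, Dict[str, str]]) -> Dict[str, Set[str]]:
    """Derive each node's upstream nodes from its input bindings.

    bindings maps a node name to {input_name: "upstream_node.output_name"}.
    """
    return {
        node: {ref.split(".", 1)[0] for ref in inputs.values()}
        for node, inputs in bindings.items()
    }


# Hypothetical bindings mirroring Steps 1 and 2 of the scenario.
bindings = {
    "dummy_step1": {},
    "dummy_step2": {"input_1": "dummy_step1.output_data"},
    "echo_step2": {},  # binds no inputs, so nothing orders it after Step 1
}

for node, upstream in infer_dependencies(bindings).items():
    print(node, "<-", sorted(upstream))
```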

# 3. Sample pipeline job

## 3.1 Build pipeline

In [None]:
# define a pipeline
@pipeline(
    name="step_sequence_pipeline", description="Step sequence pipeline with branching"
)
def pipeline_with_step_sequence():
    s1 = pipeline_step1()
    s2 = pipeline_step2(s1.outputs.step1_output)
    s3 = pipeline_step3(s1.outputs.step1_output)
    s4 = pipeline_step4(s1.outputs.step1_output)

    s5 = pipeline_step5(
        s2.outputs.step2_output, s3.outputs.step3_output, s4.outputs.step4_output
    )
    s6 = pipeline_step6(s5.outputs.step5_output)

    return {"pipeline_output": s6.outputs.final_output}


pipeline_job = pipeline_with_step_sequence()

# set pipeline level compute
pipeline_job.settings.default_compute = "cpu-cluster"

In [None]:
print(pipeline_job)

## 3.2 Submit pipeline job

In [None]:
# submit job to workspace
pipeline_job = ml_client.jobs.create_or_update(
    pipeline_job, experiment_name="pipeline_samples_branching"
)
pipeline_job

In [None]:
# Wait until the job completes; stream() raises an exception if the job fails.
try:
    ml_client.jobs.stream(pipeline_job.name)
except Exception as e:
    print(f"Error while streaming job logs: {e}")


def report_failed_jobs(parent_name: str, depth: int = 0) -> None:
    """Recursively print failed child jobs under a pipeline job.

    Each sub-pipeline step (pipeline_step1 ... pipeline_step6) is itself a
    pipeline job, so the failing command job can sit one level deeper.
    """
    indent = "  " * depth
    for child in ml_client.jobs.list(parent_job_name=parent_name):
        if child.status != "Failed":
            continue
        print(f"\n{indent}--- Failed child job: {child.display_name} ({child.name}) ---")
        error = getattr(child, "error", None)
        if error is not None:
            print(indent + "Error code:", getattr(error, "code", None))
            print(indent + "Error message:", getattr(error, "message", None))
            for detail in getattr(error, "details", None) or []:
                print(indent + " -", detail)
        else:
            print(indent + "No structured error details; check the logs in AML Studio.")
        # Descend into nested pipeline jobs; a command job simply has no children.
        report_failed_jobs(child.name, depth + 1)


report_failed_jobs(pipeline_job.name)

In [None]:
# Inspect a pipeline job by name. Job names are generated at submission time, so use
# pipeline_job.name or copy the job name from AML Studio; "step_sequence_pipeline" is
# the name given in @pipeline, not the job name.
job_name = pipeline_job.name

parent_job = ml_client.jobs.get(job_name)

# Iterate over the direct child jobs and report the failed ones.
for child in ml_client.jobs.list(parent_job_name=parent_job.name):
    if child.status == "Failed":
        print(f"\n--- Failed child job: {child.display_name} ({child.name}) ---")

        # Refresh the child job details.
        child_job = ml_client.jobs.get(child.name)

        # Print error info if available.
        error = getattr(child_job, "error", None)
        if error is not None:
            print("Error code:", getattr(error, "code", None))
            print("Error message:", getattr(error, "message", None))
            for d in getattr(error, "details", None) or []:
                print(" -", d)
        else:
            print("No error details found; check the logs in AML Studio.")

# Next Steps

Further examples of running pipeline jobs are available in the [azureml-examples pipeline samples](https://github.com/Azure/azureml-examples/blob/902929725e8d713447c99f80e4530c83075ecd9b/sdk/python/jobs/pipelines/).