# AutoML TextNer in a pipeline

**Requirements** - In order to benefit from this tutorial, you will need:
- A basic understanding of Machine Learning
- An Azure account with an active subscription - [Create an account for free](https://azure.microsoft.com/free/?WT.mc_id=A261C142F)
- An Azure ML workspace with a compute cluster - [Configure workspace](../../configuration.ipynb)
- A Python environment
- The Azure Machine Learning Python SDK v2 installed - [install instructions](../../../README.md) (see the getting started section)

**Learning Objectives** - By the end of this tutorial, you should be able to:
- Create a pipeline with a TextNer AutoML task.

**Motivations** - This notebook explains how to use the TextNer AutoML task inside a pipeline.

# 1. Connect to Azure Machine Learning Workspace

The [workspace](https://docs.microsoft.com/en-us/azure/machine-learning/concept-workspace) is the top-level resource for Azure Machine Learning, providing a centralized place to work with all the artifacts you create when you use Azure Machine Learning. In this section we will connect to the workspace in which the job will be run.

## 1.1 Import the required libraries

```python
# import required libraries
from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential

from azure.ml import MLClient, dsl, Input, command, Output
from azure.ml.automl import text_ner
```

## 1.2 Configure credential

We use `DefaultAzureCredential` to get access to the workspace.
`DefaultAzureCredential` should be able to handle most Azure SDK authentication scenarios.

If it does not work for you, see these references for other available credentials: [configure credential example](../../configuration.ipynb), [azure-identity reference doc](https://docs.microsoft.com/en-us/python/api/azure-identity/azure.identity?view=azure-python).

```python
try:
    credential = DefaultAzureCredential()
    # Check if the given credential can get a token successfully.
    credential.get_token("https://management.azure.com/.default")
except Exception:
    # Fall back to InteractiveBrowserCredential in case DefaultAzureCredential does not work
    credential = InteractiveBrowserCredential()
```

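The cell above tries one credential and falls back to another. As a minimal sketch, the same idea generalizes to an ordered list of candidates. `pick_credential` is a hypothetical helper, not part of azure-identity; it only assumes that each candidate exposes a `get_token(scope)` method, as the azure-identity credential classes do. As in the cell above, the last candidate is returned without probing, so an interactive login is deferred until a token is actually requested.

```python
from typing import Callable, Sequence

# Scope the notebook uses to check access to Azure Resource Manager
ARM_SCOPE = "https://management.azure.com/.default"


def pick_credential(factories: Sequence[Callable[[], object]], scope: str = ARM_SCOPE):
    """Return the first credential whose get_token(scope) succeeds (hypothetical helper).

    Every candidate except the last is probed with get_token; the last one is
    returned unprobed, mirroring the notebook's fallback to an interactive credential.
    """
    if not factories:
        raise ValueError("at least one credential factory is required")
    for factory in factories[:-1]:
        try:
            credential = factory()
            credential.get_token(scope)  # raises if this credential cannot authenticate
            return credential
        except Exception:
            continue  # try the next candidate
    return factories[-1]()
```

With the classes imported earlier this would be called as `pick_credential([DefaultAzureCredential, InteractiveBrowserCredential])`. azure-identity also provides `ChainedTokenCredential`, which tries its credentials in order on every token request instead of choosing one up front.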

## 1.3 Get a handle to the workspace

We use a config file to connect to the workspace. The Azure ML workspace should be configured with a compute cluster. [Check this notebook to configure a workspace](../../configuration.ipynb)
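`MLClient.from_config` reads the workspace details from a `config.json` file, which can live in a `.azureml` folder. As a sketch, the snippet below writes such a file. The three keys (`subscription_id`, `resource_group`, `workspace_name`) follow the config file that can be downloaded from the Azure portal; the values and the helper name `write_workspace_config` are hypothetical placeholders to replace with your own.

```python
import json
from pathlib import Path

# Hypothetical placeholder values: replace them with your own workspace details.
WORKSPACE_CONFIG = {
    "subscription_id": "<your-subscription-id>",
    "resource_group": "<your-resource-group>",
    "workspace_name": "<your-workspace-name>",
}


def write_workspace_config(root: Path, config: dict = WORKSPACE_CONFIG) -> Path:
    """Write config.json under <root>/.azureml and return its path."""
    config_dir = root / ".azureml"
    config_dir.mkdir(parents=True, exist_ok=True)
    config_path = config_dir / "config.json"
    config_path.write_text(json.dumps(config, indent=2))
    return config_path
```

Calling `write_workspace_config(Path("."))` creates `./.azureml/config.json`, which `from_config` can then locate.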

```python
# Get a handle to workspace
ml_client = MLClient.from_config(credential=credential)

# Retrieve an already attached Azure Machine Learning Compute.
cluster_name = "gpu-cluster"
print(ml_client.compute.get(cluster_name))
```

```text
Found the config file in: c:\Users\ayushmishra\workspace\azureml-examples\.azureml\config.json

AmlCompute({'type': 'amlcompute', 'created_on': None, 'provisioning_state': 'Succeeded', 'provisioning_errors': None, 'name': 'gpu-cluster', 'description': None, 'tags': {}, 'properties': {}, 'id': '/subscriptions/381b38e9-9840-4719-a5a0-61d9585e1e91/resourceGroups/ayush_mishra_res01/providers/Microsoft.MachineLearningServices/workspaces/ayushmishra-central-useuap-ws/computes/gpu-cluster', 'base_path': './', 'creation_context': None, 'serialize': <msrest.serialization.Serializer object at 0x000001CE6BAAE2E8>, 'resource_id': None, 'location': 'centraluseuap', 'size': 'STANDARD_NC6', 'min_instances': 0, 'max_instances': 4, 'idle_time_before_scale_down': 120.0, 'identity': None, 'ssh_public_access_enabled': True, 'ssh_settings': None, 'network_settings': None, 'tier': 'dedicated'})
```


# 2. Basic pipeline job with a TextNer task

## 2.1 Build pipeline

```python
# Note: the Docker image used does not support every GPU compute size. If the experiment fails,
# use the following command to create a GPU compute cluster:
# !az ml compute create -n gpu-cluster --type amlcompute --min-instances 0 --max-instances 4 --size Standard_NC12
```
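Before submitting, it can help to sanity-check the local NER data that the pipeline below reads from `data_folder`. This sketch assumes the two-column CoNLL-style text layout that AutoML NER expects: one `token label` pair per line separated by a single space, with a blank line between sentences. The helper name `summarize_conll` and the sample lines in the test are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def summarize_conll(lines: Iterable[str]) -> Tuple[int, Counter]:
    """Count sentences and label frequencies in CoNLL-style 'token label' lines.

    Raises ValueError on any non-blank line that does not have exactly two
    space-separated columns.
    """
    sentences = 0
    labels: Counter = Counter()
    in_sentence = False
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            # A blank line closes the current sentence (repeated blanks are ignored).
            if in_sentence:
                sentences += 1
                in_sentence = False
            continue
        parts = line.split(" ")
        if len(parts) != 2:
            raise ValueError(f"line {lineno}: expected 'token label', got {line!r}")
        labels[parts[1]] += 1
        in_sentence = True
    if in_sentence:  # the file may not end with a blank line
        sentences += 1
    return sentences, labels
```

To use it, pass an open file handle for one of the text files inside the training or validation MLTable folder; an unexpected label name or a `ValueError` points to a formatting problem before any compute is spent.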

```python
# Define pipeline
@dsl.pipeline(
    description="AutoML TextNer Pipeline",
    default_compute="gpu-cluster",
)
def automl_text_ner(
    text_ner_train_data,
    text_ner_validation_data
):
    # define the AutoML text_ner task with the automl function
    text_ner_node = text_ner(
        training_data=text_ner_train_data,
        validation_data=text_ner_validation_data,
        primary_metric="accuracy",
        target_column_name="xxxxx",  # placeholder: this argument should be optional (BUG:1721836)
        # currently the "mlflow_model" output must be declared explicitly so that following nodes can reference it
        outputs={"best_model": Output(type="mlflow_model")},
    )
    text_ner_node.set_limits(timeout_minutes=120)

    # a command step that lists the files of the best model produced by AutoML
    command_func = command(
        inputs=dict(
            automl_output=Input(type="mlflow_model")
        ),
        command="ls ${{inputs.automl_output}}",
        environment="AzureML-sklearn-0.24-ubuntu18.04-py37-cpu:1"
    )
    show_output = command_func(automl_output=text_ner_node.outputs.best_model)


data_folder = "../../automl-standalone-jobs/automl-nlp-text-named-entity-recognition-task"
pipeline = automl_text_ner(
    text_ner_train_data=Input(path=f"{data_folder}/training-mltable-folder/", type="mltable"),
    text_ner_validation_data=Input(path=f"{data_folder}/validation-mltable-folder/", type="mltable"),
)
```
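In the command step, Azure Machine Learning replaces `${{inputs.automl_output}}` at run time with the location where the `best_model` output is made available on the compute, so `ls` lists the files of the MLflow model. To make that substitution concrete, here is a purely illustrative local sketch: `resolve_placeholders` is a hypothetical helper, and the paths in the test are made up rather than what the service actually uses.

```python
import re
from typing import Mapping, Optional

# Matches placeholders such as ${{inputs.automl_output}} or ${{outputs.model}}
PLACEHOLDER = re.compile(r"\$\{\{\s*(inputs|outputs)\.([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def resolve_placeholders(
    command: str,
    inputs: Mapping[str, str],
    outputs: Optional[Mapping[str, str]] = None,
) -> str:
    """Replace ${{inputs.<name>}} / ${{outputs.<name>}} with the given local paths."""
    sources = {"inputs": inputs, "outputs": outputs or {}}

    def substitute(match):
        kind, name = match.group(1), match.group(2)
        if name not in sources[kind]:
            raise KeyError(f"no value bound for {kind}.{name}")
        return sources[kind][name]

    return PLACEHOLDER.sub(substitute, command)
```

Failing loudly on an unbound name mirrors the fact that every placeholder in a command must correspond to a declared input or output.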

## 2.2 Submit pipeline job

```python
# submit the pipeline job
pipeline_job = ml_client.jobs.create_or_update(
    pipeline, experiment_name="pipeline_samples"
)
pipeline_job
```

```text
Uploading training-mltable-folder (0.09 MBs): 100%|##########| 87948/87948 [00:01<00:00, 47303.79it/s]
Uploading validation-mltable-folder (0.09 MBs): 100%|##########| 85726/85726 [00:01<00:00, 53394.85it/s]
```



| Experiment | Name | Type | Status | Details Page |
|---|---|---|---|---|
| pipeline_samples | joyful_drawer_d4lq9x0rsk | pipeline | Preparing | Link to Azure Machine Learning studio |


```python
# Wait until the job completes
ml_client.jobs.stream(pipeline_job.name)
```

```text
RunId: sleepy_lunch_5v50qd2z7x
Web View: https://ml.azure.com/runs/sleepy_lunch_5v50qd2z7x?wsid=/subscriptions/381b38e9-9840-4719-a5a0-61d9585e1e91/resourcegroups/ayush_mishra_res01/workspaces/ayushmishra-central-useuap-ws

Streaming logs/azureml/executionlogs.txt

[2022-05-10 06:58:32Z] Submitting 1 runs, first five are: 72e8280f:96751695-f0c0-47d9-9e02-56dc2cd45c60


JobException: The output streaming for the run interrupted.
But the run is still executing on the compute target.
Details for canceling the run can be found here: https://aka.ms/aml-docs-cancel-run
```
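As the output shows, the log stream can be interrupted while the run keeps executing on the compute target. A minimal alternative is to poll the job status until it reaches a terminal state. The sketch below takes a zero-argument `get_status` callable so that it can be exercised without Azure. Wiring it up as `wait_for_job(lambda: ml_client.jobs.get(pipeline_job.name).status)` is an assumption about this SDK version, as is the set of terminal status names; `wait_for_job` itself is a hypothetical helper.

```python
import time
from typing import Callable, Iterable, Optional

# Assumed terminal job states; adjust to the status values your SDK version reports.
TERMINAL_STATES = frozenset({"Completed", "Failed", "Canceled"})


def wait_for_job(
    get_status: Callable[[], str],
    poll_seconds: float = 30.0,
    timeout_seconds: Optional[float] = None,
    terminal_states: Iterable[str] = TERMINAL_STATES,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until it returns a terminal state, then return that state.

    Raises TimeoutError once the accumulated sleep time reaches timeout_seconds
    (None means wait indefinitely). Time spent inside get_status() is not counted.
    """
    terminal = set(terminal_states)
    waited = 0.0
    while True:
        status = get_status()
        if status in terminal:
            return status
        if timeout_seconds is not None and waited >= timeout_seconds:
            raise TimeoutError(f"job still {status!r} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds
```

The injectable `sleep` parameter exists only so the loop can be tested quickly; in a notebook the default `time.sleep` is what you want.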

# Next Steps
You can see further examples of running a pipeline job [here](../)