This notebook provides an example of how to define and run a job on AzureML using Spark and external Spark libraries such as `spark-nlp`. This notebook is the _control plane_, meaning it creates a connection to the AzureML workspace, defines the job, and submits the job.

**This Jupyter notebook should be run from within a compute instance on AzureML, in a Python kernel, specifically `Python 3.10 - SDK v2 (Python 3.10.11)`**. 

## Create a client connection to the AzureML workspace

The following cells create a client object called `ml_client`, which holds the connection to the AzureML workspace. Run only one of the two authentication cells below, depending on where the notebook is running.

In [1]:
from azure.ai.ml import MLClient, spark, Input, Output
from azure.identity import DefaultAzureCredential
from azure.ai.ml.entities import UserIdentityConfiguration

Use this authentication mechanism if you are running this notebook from your compute instance within Azure Machine Learning:

In [None]:
## Use this authentication when running the control plane from the AzureML Compute Instance
# from_config reads the workspace details from a config.json file.
# The client is named ml_client so that later cells work with either authentication path.
ml_client = MLClient.from_config(
    DefaultAzureCredential(),
)

However, you can also run this control plane notebook from your laptop. To do so, first install the Python libraries listed in the `requirements.txt` file.

In [2]:
## Use this when running the control plane from your laptop
ml_client = MLClient(
    credential=DefaultAzureCredential(),
    workspace_name="prof-azureml",
    subscription_id="21ff0fc0-dd2c-450d-93b7-96eeb3699b22",
    resource_group_name="prof-azureml"
)
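
Hard-coding the workspace details works for a single user, but they can also be read from the environment. The following is a minimal sketch, assuming three hypothetical environment variable names (`AZUREML_SUBSCRIPTION_ID`, `AZUREML_RESOURCE_GROUP`, `AZUREML_WORKSPACE_NAME`) and a hypothetical helper `workspace_settings_from_env`. Neither is part of the original notebook or the Azure SDK.

```python
import os

# Hypothetical variable names chosen for this sketch; not defined by the Azure SDK.
REQUIRED_VARS = {
    "subscription_id": "AZUREML_SUBSCRIPTION_ID",
    "resource_group_name": "AZUREML_RESOURCE_GROUP",
    "workspace_name": "AZUREML_WORKSPACE_NAME",
}


def workspace_settings_from_env(environ=None):
    """Return MLClient keyword arguments read from environment variables."""
    environ = os.environ if environ is None else environ
    # Collect every variable that is unset or empty so the error names them all.
    missing = [var for var in REQUIRED_VARS.values() if not environ.get(var)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {kwarg: environ[var] for kwarg, var in REQUIRED_VARS.items()}
```

The returned dictionary could then be unpacked into the constructor used above, for example `MLClient(credential=DefaultAzureCredential(), **workspace_settings_from_env())`.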

## Create a custom container environment with Python and spark-nlp for use in both interactive sessions and jobs
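
The environment defined in the next cell reads its Python dependencies from `sparknlp-environment.yml`, which is not shown in this notebook. As a rough sketch, the following writes a conda specification consistent with the dependencies echoed in that cell's output (`python=3.10.3` plus `spark-nlp` from pip). The `name` and `channels` entries and the helper `write_conda_file` are assumptions for illustration; the actual file may differ.

```python
from pathlib import Path

# The dependencies match those echoed in the environment output; the
# "name" and "channels" entries are assumptions made for this sketch.
CONDA_SPEC = """\
name: sparknlp-python-env
channels:
  - conda-forge
dependencies:
  - python=3.10.3
  - pip:
      - spark-nlp
"""


def write_conda_file(path="sparknlp-environment.yml"):
    """Write the conda specification to disk and return its path."""
    target = Path(path)
    target.write_text(CONDA_SPEC, encoding="utf-8")
    return target
```

Calling `write_conda_file()` from the notebook's working directory would place the file where the `conda_file` argument in the next cell expects to find it.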

In [4]:
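from azure.ai.ml.entities import Environment

# Start from a base image in the Microsoft container registry and layer the
# conda dependencies (Python and spark-nlp) on top of it.
environment_object = Environment(
    image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu22.04",
    conda_file="sparknlp-environment.yml",
    name="sparknlp-python-env"
)
# Register the environment in the workspace.
ml_client.environments.create_or_update(environment_object)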

Environment({'arm_type': 'environment_version', 'latest_version': None, 'image': 'mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu22.04', 'intellectual_property': None, 'is_anonymous': False, 'auto_increment_version': False, 'auto_delete_setting': None, 'name': 'sparknlp-python-env', 'description': None, 'tags': {}, 'properties': {'azureml.labels': 'latest'}, 'print_as_yaml': False, 'id': '/subscriptions/21ff0fc0-dd2c-450d-93b7-96eeb3699b22/resourceGroups/prof-azureml/providers/Microsoft.MachineLearningServices/workspaces/prof-azureml/environments/sparknlp-python-env/versions/1', 'Resource__source_path': '', 'base_path': '/Users/marck/class/dsan6000/working-repos/spark-on-azureml/create-environment-for-sparknlp', 'creation_context': <azure.ai.ml.entities._system_data.SystemData object at 0x11aafe9f0>, 'serialize': <msrest.serialization.Serializer object at 0x11ab44350>, 'version': '1', 'conda_file': {'dependencies': ['python=3.10.3', {'pip': ['spark-nlp']}]}, 'build': None, 'inference_con