# 01. Azure Machine Learning - Environment Setup
This notebook gets a reference to an Azure Machine Learning (AML) workspace using the AML SDK and performs a few operations in preparation for this demo:
 - Create a compute cluster to be used for model training.
 - Upload sample street images to a new blob datastore.
 - Create labeled image datasets (using existing labels) that can be used as input to an AutoML for Images training job.
 
Run all of the cells in this notebook to set up the necessary components for the ML experiment.

### Import required packages

In [None]:
from azureml.core import Workspace, Experiment, Datastore, Dataset
from azureml.data import DataType
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.core.runconfig import RunConfiguration, DEFAULT_GPU_IMAGE
from azureml.core.conda_dependencies import CondaDependencies
from azureml.pipeline.core import Pipeline, PipelineParameter, PipelineEndpoint
from azureml.pipeline.steps import PythonScriptStep
import os

### Connect to AML workspace
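
`Workspace.from_config()` loads the workspace details from a `config.json` file, which it looks for in the current directory, a `.azureml` subfolder, or a parent directory. The following is a minimal pre-flight sketch, assuming the file carries the `subscription_id`, `resource_group`, and `workspace_name` keys. The helper name `missing_workspace_keys` and the placeholder values are hypothetical and used only for illustration.

```python
import json
import tempfile
from pathlib import Path

# Keys that a workspace config.json is assumed to contain.
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def missing_workspace_keys(config_path):
    """Return the required keys that are absent or empty in the config file."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    return [key for key in REQUIRED_KEYS if not config.get(key)]


# Demo on a throwaway file with placeholder values (not real identifiers).
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "config.json"
    demo_path.write_text(
        json.dumps(
            {
                "subscription_id": "<subscription-id>",
                "resource_group": "<resource-group>",
            }
        ),
        encoding="utf-8",
    )
    demo_missing = missing_workspace_keys(demo_path)

print("Missing keys:", demo_missing)
```

Running the same check against your own `config.json` before connecting can make a misconfigured file easier to spot than the exception raised by the SDK.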

In [None]:
ws = Workspace.from_config()

### Create Azure ML Compute Cluster for model training
Compute clusters are scalable, on-demand compute resources that can be dynamically spun up and down to support different model training and inferencing jobs. Think of them as massively scalable, serverless ML compute. Here, we create a GPU cluster that can provision up to 3 nodes depending on the workload submitted; this upper limit (`max_nodes`) can be increased further. Once jobs complete and nodes sit idle for 120 seconds, they are automatically spun down and billing stops.

[Create an Azure Machine Learning Compute Cluster](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-attach-compute-cluster?tabs=python)

[What are Compute Targets in Azure Machine Learning](https://docs.microsoft.com/en-us/azure/machine-learning/concept-compute-target)
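
The scale limits described above can be pictured with a small toy model. This is only a simplified mental model, not the service's actual scheduler: it assumes the cluster provisions the number of nodes a workload asks for, clamped between `min_nodes` and `max_nodes`, and releases a node once it has been idle for `idle_seconds_before_scaledown`. The function names are hypothetical; the constants mirror the provisioning configuration used in this notebook.

```python
# Toy model of the cluster's scale limits (illustrative assumption only).
MIN_NODES = 0
MAX_NODES = 3
IDLE_SECONDS_BEFORE_SCALEDOWN = 120


def target_node_count(requested_nodes, min_nodes=MIN_NODES, max_nodes=MAX_NODES):
    """Clamp the number of nodes a workload asks for to the cluster limits."""
    return max(min_nodes, min(requested_nodes, max_nodes))


def is_released(idle_seconds, threshold=IDLE_SECONDS_BEFORE_SCALEDOWN):
    """Return True once a node has been idle for at least the threshold."""
    return idle_seconds >= threshold


for requested in (0, 2, 5):
    print(f"requested={requested} -> provisioned={target_node_count(requested)}")
print("released after 90 s idle:", is_released(90))
print("released after 120 s idle:", is_released(120))
```

With `min_nodes=0`, an idle cluster holds no nodes at all, which is why billing stops between jobs.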

In [None]:
cluster_name = 'automlimagescompute'
# compute_target = ws.compute_targets[cluster_name]
try:
    compute_target = ComputeTarget(workspace=ws, name=cluster_name)
    print("Found existing compute target.")
except ComputeTargetException:
    print("Creating a new compute target...")
    compute_config = AmlCompute.provisioning_configuration(
        vm_size='Standard_NC6',
        idle_seconds_before_scaledown=120,
        min_nodes=0,
        max_nodes=3,
    )
    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)
    # Can poll for a minimum number of nodes and for a specific timeout.
    # If no min_node_count is provided, it will use the scale settings for the cluster.
    compute_target.wait_for_completion(
        show_output=True, min_node_count=None, timeout_in_minutes=20
    )
    

### Get pointer to default Azure ML Blob Datastore
Datastores represent storage locations that are attached to an Azure Machine Learning workspace. Connection information for these storage services is stored in an AML-linked Key Vault, and the datastores can be accessed through either the AML UI or the SDK.

Azure Machine Learning workspaces are provisioned with an attached Azure Storage Account by default. The default blob datastore is generally used for storing artifacts from ML experiments, such as trained models, saved outputs, and run logs. Here we retrieve a reference to the default datastore.

[Connect to Storage Services on Azure Machine Learning with Datastores](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-access-data)

In [None]:
default_ds = ws.get_default_datastore()

### Create new datastore and upload sample images
Azure Blob Storage containers can be registered as datastores in an AML workspace. For the purposes of our demonstration, we create a separate blob container (in the attached default storage account) specifically for image storage and register it as a new datastore. <i>Note:</i> This is purely for demonstration purposes; images could be loaded from any target datastore for model training.

[Creating a New Azure Machine Learning Datastore](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.datastore.datastore?view=azure-ml-py)
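
Before uploading a local folder to the new datastore, it can help to check what the folder actually contains. The following is a minimal sketch using only the standard library; the helper name `count_files_by_extension` is hypothetical, and the demo runs against a temporary folder with placeholder files rather than the real `./sample_images` directory.

```python
import os
import tempfile
from collections import Counter


def count_files_by_extension(folder):
    """Walk a folder tree and count files per lowercase extension."""
    counts = Counter()
    for _root, _dirs, files in os.walk(folder):
        for name in files:
            extension = os.path.splitext(name)[1].lower() or "<none>"
            counts[extension] += 1
    return counts


# Demo on a throwaway folder holding placeholder files.
with tempfile.TemporaryDirectory() as tmp_dir:
    for name in ("a.jpg", "b.JPG", "notes.txt"):
        with open(os.path.join(tmp_dir, name), "w") as handle:
            handle.write("placeholder")
    demo_counts = count_files_by_extension(tmp_dir)

print(dict(demo_counts))
```

Calling `count_files_by_extension('./sample_images')` in this notebook would give a quick summary of the files about to be uploaded.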

In [None]:
new_container_name = 'streetimagestore'
default_ds.blob_service.create_container(new_container_name)

try:
    imagestore = Datastore.get(ws, new_container_name)
    print("Found Blob Datastore with name: %s" % new_container_name)
except Exception:
    imagestore = Datastore.register_azure_blob_container(
        workspace=ws,
        datastore_name=new_container_name,
        account_name=default_ds.account_name,
        container_name=new_container_name,
        account_key=default_ds.account_key,
    )
    print("Registered blob datastore with name: %s" % new_container_name)

# Upload images to new datastore
imagestore.upload('./sample_images', 'aml_images', overwrite=True, show_progress=True)

### Create labeled Dataset in the Azure Machine Learning workspace
To train a custom instance segmentation model, we need to provide a labeled dataset to the AutoML for Images job. For the purposes of our demo, we use a pre-labeled dataset of street images from the [CBCL StreetScenes Challenge Framework](http://cbcl.mit.edu/software-datasets/streetscenes/) (see `./aml_annotations/*.jsonl`). The code below uploads the annotation files and registers two labeled datasets in the AML workspace as `TRAIN_AML_Labeled_Street_Images` and `TEST_AML_Labeled_Street_Images`.

[Labeling Images and Text Documents in Azure ML](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-label-data)
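
The annotation files are JSON Lines files: one JSON object per line, each with an `image_url` field that the dataset definition converts to a stream. The following is a minimal sketch of a sanity check over such a file, using only the standard library. The helper name `read_jsonl_records` is hypothetical, and the sample records (paths and the empty `label` lists) are placeholders rather than real annotations.

```python
import json


def read_jsonl_records(text):
    """Parse JSON Lines text and require an 'image_url' field on every record."""
    records = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # Skip blank lines.
        record = json.loads(line)
        if "image_url" not in record:
            raise ValueError(f"line {line_number} has no 'image_url' field")
        records.append(record)
    return records


# Placeholder records for illustration only.
sample_text = "\n".join(
    [
        json.dumps({"image_url": "placeholder/example_0001.jpg", "label": []}),
        json.dumps({"image_url": "placeholder/example_0002.jpg", "label": []}),
    ]
)
demo_records = read_jsonl_records(sample_text)
print(len(demo_records), "records parsed")
```

Reading one of the files under `./aml_annotations` and passing its text to the helper would surface malformed lines before the datasets are registered.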

In [None]:
default_ds.upload('./aml_annotations', 'labeled_image_files', overwrite=True, show_progress=True)

train_image_dataset = Dataset.Tabular.from_json_lines_files(path=(default_ds, 'labeled_image_files/labeled_images_train.jsonl'), set_column_types={"image_url": DataType.to_stream(imagestore.workspace)})
train_image_dataset.register(ws, 'TRAIN_AML_Labeled_Street_Images', create_new_version=True)

test_image_dataset = Dataset.Tabular.from_json_lines_files(path=(default_ds, 'labeled_image_files/labeled_images_test.jsonl'), set_column_types={"image_url": DataType.to_stream(imagestore.workspace)})
test_image_dataset.register(ws, 'TEST_AML_Labeled_Street_Images', create_new_version=True)

## AML Environment setup complete!

Your Azure ML workspace has been populated with sample street images and registered training and test datasets, and a GPU compute cluster is available to support model training operations!