# Run Experiments

You can use the Azure Machine Learning SDK to run code experiments that log metrics and generate outputs. This is at the core of most machine learning operations in Azure Machine Learning.

## Connect to your workspace

All experiments and associated resources are managed within your Azure Machine Learning workspace. In most cases, you should store the workspace configuration in a JSON configuration file. This makes it easier to reconnect without needing to remember details like your Azure subscription ID. You can download the JSON configuration file from the blade for your workspace in the Azure portal, but if you're using a Compute Instance within your workspace, the configuration file has already been downloaded to the root folder.
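
If you need to create the configuration file yourself, the sketch below shows one way to do it with the standard library only. It assumes the usual layout that `Workspace.from_config()` looks for: a `config.json` file (here placed in a `.azureml` subfolder) containing `subscription_id`, `resource_group`, and `workspace_name`. The helper name `write_workspace_config` is hypothetical and not part of the SDK.

```python
import json
from pathlib import Path


def write_workspace_config(folder, subscription_id, resource_group, workspace_name):
    """Write <folder>/.azureml/config.json and return its path.

    Hypothetical helper for illustration; the three keys are the ones
    the workspace configuration file is expected to contain.
    """
    config_dir = Path(folder) / ".azureml"
    config_dir.mkdir(parents=True, exist_ok=True)

    config = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }

    config_path = config_dir / "config.json"
    config_path.write_text(json.dumps(config, indent=4))
    return config_path
```

Calling `write_workspace_config('.', '<subscription-id>', '<resource-group>', '<workspace-name>')` from the notebook folder should leave a file that `Workspace.from_config()` can pick up; the angle-bracket values are placeholders to replace with your own details.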

The code below uses the configuration file to connect to your workspace.

> **Note**: If you haven't already established an authenticated session with your Azure subscription, you'll be prompted to authenticate by clicking a link, entering an authentication code, and signing into Azure.

In [1]:
import azureml.core
from azureml.core import Dataset, Workspace, Experiment, ScriptRunConfig, Environment, Run
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.core.conda_dependencies import CondaDependencies

from azureml.widgets import RunDetails

In [2]:
# Workspace details used to connect explicitly (without a config file)
subscription_id = '74135f1e-35b1-427a-a471-67ce43c07b63'
resource_group = 'Machine_Learning'
workspace_name = 'DTU_MLOps'

ws = Workspace(subscription_id, resource_group, workspace_name)

# Optional: download a registered dataset into the local data folder
#dataset = Dataset.get_by_name(ws, name='Bert2Punc_data')
#dataset.download(target_path='./data/processed', overwrite=False)

# Load the workspace from the saved config file
# (this replaces the workspace object created explicitly above)
ws = Workspace.from_config()
print('Ready to use Azure ML {} to work with {}'.format(azureml.core.VERSION, ws.name))

Now we'll create a Python script containing the code for our experiment, and save it in the experiment folder.

> **Note**: running the following cell just *creates* the script file - it doesn't run it!
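
As a rough illustration of what a script-creating cell could look like, the sketch below writes a tiny experiment script to disk without running it. Everything here is an assumption made for illustration: the file name `example_experiment.py`, the helper `write_experiment_script`, and the placeholder metric are hypothetical, and this is not the `train_model_pl.py` script submitted later in this notebook. The script body uses `Run.get_context()`, `run.log()`, and `run.complete()` from the Azure Machine Learning SDK.

```python
from pathlib import Path

# Hypothetical script body: it only logs one placeholder metric.
SCRIPT_TEXT = '''\
from azureml.core import Run

# Get the run context of the experiment that launched this script
run = Run.get_context()

# Log a metric (placeholder value, for illustration only)
run.log("example_metric", 0.5)

# Mark the run as complete
run.complete()
'''


def write_experiment_script(folder, file_name="example_experiment.py"):
    """Create the experiment folder if needed and save the script into it.

    This only writes the file; nothing in the script is executed here.
    """
    folder_path = Path(folder)
    folder_path.mkdir(parents=True, exist_ok=True)

    script_path = folder_path / file_name
    script_path.write_text(SCRIPT_TEXT)
    return script_path
```

The script only runs once it is referenced from a ScriptRunConfig and submitted as part of an experiment.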

Now you're almost ready to run the experiment. To run the script, you must create a **ScriptRunConfig** that identifies the Python script file to be run in the experiment, and then run an experiment based on it.

> **Note**: The ScriptRunConfig also determines the compute target and Python environment. In this case, the Python environment is created from a Conda specification file (`conda_dependencies.yml`), and the compute target is the `gpu-cluster` compute cluster. If the compute target were omitted, the default local compute would be used.

The following cells create the compute cluster, then configure and submit the script-based experiment.

In [None]:
#ws = Workspace.from_config() # This automatically looks for a directory .azureml
'''
# Choose a name for your CPU cluster
cpu_cluster_name = "cpu-cluster"

# Verify that the cluster does not exist already
try:
    cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size='Standard_DS12_v2',
                                                           max_nodes=4)
    cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)

cpu_cluster.wait_for_completion(show_output=True)
'''

In [5]:
cluster_name = "gpu-cluster"

try:
    compute_target = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing compute target')
except ComputeTargetException:
    print('Creating a new compute target...')
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6',
                                                           max_nodes=4,
                                                           min_nodes=1)

    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)

    compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)

Creating a new compute target...
Creating........
AmlCompute is getting created. Consider calling wait_for_completion() first
Succeeded
Provisioning operation finished, operation "Succeeded"
Succeeded
AmlCompute wait for completion finished

There were errors reported from AmlCompute:
[{'error': {'code': 'ClusterCoreQuotaReached', 'message': 'Operation results in exceeding quota limits of Total Cluster Dedicated Regional vCPUs. Maximum allowed: 10, Current in use: 8, Additional requested: 6. Click here to view and request for quota: https://portal.azure.com/#resource/subscriptions/74135f1e-35b1-427a-a471-67ce43c07b63/resourceGroups/machine_learning/providers/Microsoft.MachineLearningServices/workspaces/dtu_mlops/quotaUsage'}}]
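
The quota error above comes down to simple arithmetic: 8 vCPUs are already in use and 6 more are requested, which exceeds the regional limit of 10. The sketch below reproduces that check with the numbers from the error message. The helper names `fits_quota` and `vcpus_available` are hypothetical. The 6 requested vCPUs presumably correspond to the single node that `min_nodes=1` keeps allocated, but the message itself does not say so.

```python
def vcpus_available(max_allowed, in_use):
    """Number of vCPUs still free under the quota."""
    return max_allowed - in_use


def fits_quota(max_allowed, in_use, requested):
    """True if the additional request stays within the quota."""
    return in_use + requested <= max_allowed


# Values copied from the ClusterCoreQuotaReached error message
max_allowed, in_use, requested = 10, 8, 6

print(fits_quota(max_allowed, in_use, requested))  # False: 8 + 6 = 14 > 10
print(vcpus_available(max_allowed, in_use))        # 2
```

With only 2 vCPUs free, the request can only succeed after a quota increase, after freeing vCPUs used elsewhere in the region, or with a smaller allocation.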


In [6]:
#ws = Workspace.from_config()

# Create a Python environment for the experiment from a Conda specification file
env = Environment.from_conda_specification(name='Bert2Punc', file_path='./conda_dependencies.yml')

# Create the experiment and point the run configuration at the training script,
# the GPU cluster, and the environment defined above
experiment = Experiment(workspace=ws, name='Bert2Punc')
config = ScriptRunConfig(source_directory='./src',
                         script='./models/train_model_pl.py',
                         compute_target="gpu-cluster",
                         environment=env)

# Submit the experiment, show the run widget, and block until the run finishes
run = experiment.submit(config)
RunDetails(run).show()
run.wait_for_completion()

KeyboardInterrupt: 

You can use the widget or the link to the experiment in [Azure Machine Learning studio](https://ml.azure.com) to view the outputs generated by the experiment. You can also write code to retrieve the metrics and files it generated:

In [6]:
# Get logged metrics
metrics = run.get_metrics()
for key, value in metrics.items():
    print(key, value)
print('\n')
# List the files generated by the run
for file in run.get_file_names():
    print(file)

Accuracy [0.3393999934196472, 0.7983999848365784, 0.8209999799728394, 0.8396000266075134, 0.8522999882698059, 0.8820000290870667, 0.8773000240325928, 0.8944000005722046, 0.899399995803833, 0.9068999886512756]
Loss [2.385221004486084, 0.7372096180915833, 0.7386306524276733, 0.5273603796958923, 0.5369148850440979, 0.36451077461242676, 0.4364103078842163, 0.3081313669681549, 0.42938071489334106, 0.3664677143096924]


azureml-logs/55_azureml-execution-tvmps_8ffa75f2bf9c813bc92d29f90f7c4b90001a000ea8d9520bc28e6fc5247e06af_d.txt
azureml-logs/65_job_prep-tvmps_8ffa75f2bf9c813bc92d29f90f7c4b90001a000ea8d9520bc28e6fc5247e06af_d.txt
azureml-logs/70_driver_log.txt
azureml-logs/75_job_post-tvmps_8ffa75f2bf9c813bc92d29f90f7c4b90001a000ea8d9520bc28e6fc5247e06af_d.txt
azureml-logs/process_info.json
azureml-logs/process_status.json
logs/azureml/93_azureml.log
logs/azureml/job_prep_azureml.log
logs/azureml/job_release_azureml.log
outputs/fashion_model.pkl
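
Because the retrieved metrics are plain Python values, they can be summarised without further SDK calls. The sketch below is one possible way to do that, using the `Accuracy` and `Loss` lists printed above as literal input. The helper `summarize_metrics` is hypothetical. It assumes that a metric logged only once may come back as a single number rather than a list, and handles both cases.

```python
def summarize_metrics(metrics):
    """Return final, minimum and maximum values (with step indices) per metric."""
    summary = {}
    for name, values in metrics.items():
        # A metric logged once may be a scalar; treat it as a one-element list
        if not isinstance(values, list):
            values = [values]
        summary[name] = {
            "final": values[-1],
            "max": max(values),
            "max_index": values.index(max(values)),
            "min": min(values),
            "min_index": values.index(min(values)),
        }
    return summary


# Values copied from the output of run.get_metrics() above
metrics = {
    "Accuracy": [0.3393999934196472, 0.7983999848365784, 0.8209999799728394,
                 0.8396000266075134, 0.8522999882698059, 0.8820000290870667,
                 0.8773000240325928, 0.8944000005722046, 0.899399995803833,
                 0.9068999886512756],
    "Loss": [2.385221004486084, 0.7372096180915833, 0.7386306524276733,
             0.5273603796958923, 0.5369148850440979, 0.36451077461242676,
             0.4364103078842163, 0.3081313669681549, 0.42938071489334106,
             0.3664677143096924],
}

summary = summarize_metrics(metrics)
for name, stats in summary.items():
    print(name, stats)
```

For this run, the highest accuracy is the last logged value (index 9), while the lowest loss occurs earlier, at index 7.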
