# Test Docker environments

After creating the curated Azure environment, you will run a simple test to verify that the Docker image can be started on an AML compute node. You will execute a simple Python script that returns basic diagnostic values.
In production, you should consider an automated build/test pipeline.
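
The diagnostic script itself is not shown in this notebook. Purely as an illustration, a script of this kind might look like the sketch below. The function name `collect_diagnostics` and the reported fields are hypothetical, and the sketch assumes that the distributed launcher exposes `RANK` and `WORLD_SIZE` environment variables to each process.

```python
import os
import platform
import shutil
import socket


def collect_diagnostics(env=None):
    """Return a dict of basic facts about the running container (hypothetical fields)."""
    env = os.environ if env is None else env
    return {
        "hostname": socket.gethostname(),
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        # Assumed to be set by the distributed launcher; "unset" otherwise.
        "rank": env.get("RANK", "unset"),
        "world_size": env.get("WORLD_SIZE", "unset"),
        # True when the NVIDIA CLI is on PATH, a coarse hint that GPU tooling exists.
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    for key, value in collect_diagnostics().items():
        print(f"{key}: {value}")
```

Printing one `key: value` pair per line keeps the output easy to compare across the log files of different nodes.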

## Goal

The goals of this notebook are:
1. Demonstrate one way to create a new compute cluster, using the SDK
2. Test the successful creation of your Docker environments
3. Demonstrate how to run a Python script in your newly created Docker environment



### 1 | Create a compute target and check install
These cells are a pre-flight check to ensure that you've got the necessary requirements accessible and that a compute cluster exists. Like the subsequent notebooks, this is intended to be run on an AzureML compute instance.
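
One possible way to make the requirements check explicit is to ask the import system whether the needed modules can be located before running the remaining cells. The helper name `missing_modules` is hypothetical; only `azureml.core` is listed because it is the only package this notebook imports.

```python
import importlib.util


def missing_modules(names):
    """Return the subset of `names` that the import system cannot locate."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # A missing parent package (e.g. "azureml" for "azureml.core")
            # raises instead of returning None.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("Missing modules:", missing_modules(["azureml.core"]) or "none")
```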

You will create a new cluster named `testcluster` composed of up to 2 nodes of `Standard_NC4as_T4_v3`. These settings will be stored in a dictionary called `config`.

In [None]:
import azureml.core
workspace = azureml.core.Workspace.from_config()

config = {}
config["compute_size"] = "STANDARD_NC4AS_T4_v3"
config["compute_target"] = "testcluster"
config["compute_node_count"] = 2
config["pytorch_configuration"] = {
    "node_count": 2, # num of computers in cluster
    "process_count": 2} # gpus-per-computer * node_count
config["training_command"] = "python diagnose_environment.py"
config["experiment"] = "Testing_Axolotl_images"
config["source_directory"] = "src"

In [None]:
try:
    cluster = azureml.core.compute.ComputeTarget(
        workspace=workspace, 
        name=config['compute_target']
    )
    print('Found existing compute cluster')
except azureml.core.compute_target.ComputeTargetException:
    compute_config = azureml.core.compute.AmlCompute.provisioning_configuration(
        vm_size=config['compute_size'],
        max_nodes=config['compute_node_count']
    )
    cluster = azureml.core.compute.ComputeTarget.create(
        workspace=workspace,
        name=config['compute_target'], 
        provisioning_configuration=compute_config
    )
    
cluster.wait_for_completion(show_output=True)

### 2 | Running the Diagnostics on both environments, in sequence

In this step, you will retrieve the environment you have created in the previous notebook. You will then submit a job on the cluster you created above.
You can observe the output of the job in the AzureML UI for ease of evaluation. You could also retrieve the output from the SDK or the CLI.
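
As a minimal sketch of the SDK route, the helper below waits for a run to finish and downloads its log files. It relies only on the `wait_for_completion`, `get_file_names`, and `download_file` methods of an `azureml.core.Run` object. The helper name `download_logs`, the `job_logs` directory, and the `*std_log*` file name pattern are assumptions; check the file names your own run reports and adjust the pattern accordingly.

```python
import fnmatch
import os


def download_logs(run, target_dir="job_logs", pattern="*std_log*"):
    """Download the files of a finished run whose names match `pattern`."""
    run.wait_for_completion(show_output=False)
    os.makedirs(target_dir, exist_ok=True)
    downloaded = []
    for name in run.get_file_names():
        if fnmatch.fnmatch(name, pattern):
            # Flatten the remote path so every log lands directly in target_dir.
            local_path = os.path.join(target_dir, name.replace("/", "_"))
            run.download_file(name=name, output_file_path=local_path)
            downloaded.append(local_path)
    return downloaded
```

Called as `download_logs(run)` after the job has been submitted, it returns the list of local paths it wrote.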

Connect to (or create) the experiment that will host the run you are about to launch. A single experiment can host many runs, each exploring a different set of parameters, architecture, or other approach to the same problem. Metrics from multiple runs within a single experiment can be plotted against each other in AzureML studio.

In [None]:
experiment = azureml.core.Experiment(workspace, config['experiment'])

Submit a job to AzureML by creating a `ScriptRunConfig` object that determines what should be executed (in our case, `src/diagnose_environment.py`) and submitting it to the experiment.

In [None]:
environment = "axolotl_acpt"

You will specify a distributed job configuration. When submitting a job to AzureML on a compute cluster, you specify the node count and the process count.
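
The relationship between the two values can be sketched as follows, assuming one worker process per GPU. The helper name `pytorch_configuration` is hypothetical, and the example call assumes one GPU per node, which is consistent with the values stored in `config` earlier.

```python
def pytorch_configuration(node_count, gpus_per_node):
    """Build keyword arguments for a one-process-per-GPU distributed job."""
    if node_count < 1 or gpus_per_node < 1:
        raise ValueError("node_count and gpus_per_node must be positive")
    return {
        "node_count": node_count,
        # Total number of worker processes across the whole cluster.
        "process_count": node_count * gpus_per_node,
    }


if __name__ == "__main__":
    # Two nodes with one GPU each (assumed), as in the cluster described here.
    print(pytorch_configuration(node_count=2, gpus_per_node=1))
```

Deriving `process_count` from the node count keeps the two values consistent if you later resize the cluster.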

In [None]:
distributed_job_config = azureml.core.runconfig.PyTorchConfiguration(**config['pytorch_configuration'])

In [None]:
aml_config = azureml.core.ScriptRunConfig(
    source_directory=config['source_directory'],
    command=config['training_command'],
    environment=azureml.core.Environment.get(workspace, name=environment),
    compute_target=config['compute_target'],
    distributed_job_config=distributed_job_config,
)
run = experiment.submit(aml_config)
run.set_tags({
    "environment": environment
})

print(f"View run details:\n{run.get_portal_url()}")

This concludes this notebook. By executing the steps above, you have submitted a job to a cluster composed of 2 nodes.

The image is based on ACPT, and therefore contains Nebula.

When you submitted the job, it ran on every node in the cluster, so the job logs contain two std_out outputs, one per node.

The image below shows the outcome of the image diagnostics job. The Axolotl_ACPT job differs by having Nebula loaded:
![Logs of one of the jobs](img/axolotl_acpt.png)