# Create a compute target and check install

__Goal__: Set up a training cluster 

This notebook is a pre-flight check: it confirms that the required packages are installed and that a compute cluster exists. Like the notebooks that follow, it is intended to be run on an [AzureML compute instance](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-attach-compute-studio#portal-create).

In [2]:
# Here to check that requirements have been installed
import onnx
import transformers

If the previous cell failed, check that you are using the `AzureML_Py3.8` conda environment and that you have run `conda activate azureml_py38 && pip install -r requirements.txt` on this compute instance.
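
When an import fails, it can help to see every missing package at once rather than stopping at the first error. The following is a minimal sketch using only the standard library; the helper name `find_missing` is hypothetical and not part of this repository. Note that it only checks whether a package can be located, so a broken installation could still pass.

```python
import importlib.util


def find_missing(packages):
    """Return the names in `packages` that cannot be located for import."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


# Package names taken from the import cell above.
missing = find_missing(["onnx", "transformers"])
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages can be located")
```

Any names reported as missing should be covered by the `pip install -r requirements.txt` step described above.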

Now we'll import the libraries we'll actually use and load the config file that specifies the cluster name and machine type. Throughout this notebook and those that follow we'll draw our configuration from `src/config.yml` where possible. This file contains most of the settings that need to be customized for a new job type. Reading through it gives a sense of how to configure an experiment.

In [None]:
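import yaml
import azureml.core
# Import the submodules referenced explicitly in the cells below
import azureml.core.compute
import azureml.core.compute_target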

with open('src/config.yml', 'r') as f:
    config = yaml.safe_load(f)
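
A missing key in the config would otherwise surface only later as a `KeyError`, so a quick check right after loading can save time. The sketch below is illustrative: the helper `missing_keys` is hypothetical, the key names are the ones this notebook reads, and the values in `example_config` are placeholders rather than the contents of `src/config.yml`.

```python
# Keys this notebook reads from the config.
REQUIRED_KEYS = ("compute_target", "compute_size", "compute_node_count")


def missing_keys(config, required=REQUIRED_KEYS):
    """Return the required keys that are absent from the loaded config mapping."""
    return [key for key in required if key not in config]


# Hypothetical values for illustration only; the real ones live in src/config.yml.
example_config = {
    "compute_target": "example-cluster",
    "compute_size": "STANDARD_D2_V2",
    "compute_node_count": 2,
}
print(missing_keys(example_config))  # → []
```

In the notebook itself, the same helper would be called as `missing_keys(config)` on the mapping loaded above.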

Now we'll connect to the AzureML [workspace]. The workspace can be thought of as the namespace that ties together all the models, runs, datasets, compute instances, cluster instances, and linked services we'll access. Each notebook will connect to this workspace before performing any operations with AzureML. 

We instantiate a connection to the workspace using a configuration file that is automatically provided on the AzureML compute instances but which [we must create] if we run this notebook on our own desktop or laptop machine. 

[workspace]: https://docs.microsoft.com/en-us/azure/machine-learning/concept-workspace
[we must create]: https://docs.microsoft.com/en-us/azure/machine-learning/how-to-configure-environment#workspace

In [None]:
workspace = azureml.core.Workspace.from_config()
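
When running outside a compute instance, the workspace configuration file has to be created by hand. As a rough sketch, the file is a small JSON document with three fields; the helper name `write_workspace_config` and the placeholder values are assumptions for illustration, and the example writes to a temporary directory so that no real file is touched.

```python
import json
import tempfile
from pathlib import Path


def write_workspace_config(path, subscription_id, resource_group, workspace_name):
    """Write a workspace config JSON file and return the settings written."""
    settings = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(settings, indent=4))
    return settings


# Placeholder values; substitute the details of your own workspace.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / ".azureml" / "config.json"
    write_workspace_config(
        target, "<subscription-id>", "<resource-group>", "<workspace-name>"
    )
    print(target.read_text())
```

On a local machine the file would typically be saved as `config.json` in the working directory or in a `.azureml` subdirectory; the linked documentation describes how to download a ready-made copy from the portal.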

Next we'll connect to the cluster or [create it if it does not exist](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-attach-compute-cluster?tabs=python). The cluster can also be created through the AzureML Studio interface, but creating it here ensures we'll have an appropriate target (in name, size, and kind) for the next steps.

In [None]:
# Does the cluster exist? If not, then create it
try:
    cluster = azureml.core.compute.ComputeTarget(
        workspace=workspace, 
        name=config['compute_target']
    )
    print('Found existing compute cluster')
except azureml.core.compute_target.ComputeTargetException:
    compute_config = azureml.core.compute.AmlCompute.provisioning_configuration(
        vm_size=config['compute_size'],
        max_nodes=config['compute_node_count']
    )
    cluster = azureml.core.compute.ComputeTarget.create(
        workspace=workspace,
        name=config['compute_target'], 
        provisioning_configuration=compute_config
    )
    
cluster.wait_for_completion(show_output=True)

We have now:

- verified that the requirements are installed in this conda environment
- connected to (or created) a compute cluster we can use as a target for training

Let's move on to the next notebook. 