# Train a model with PyTorch using GPUs

In this notebook, you'll train a PyTorch Convolutional Neural Network (CNN) on the MNIST dataset using a GPU compute cluster.

> This notebook is based on the Azure Machine Learning example for using PyTorch: https://github.com/Azure/azureml-examples/tree/main/python-sdk/workflows/train/pytorch/mnist

In [None]:
# imports

from azureml.core import Workspace
from azureml.core import ScriptRunConfig, Experiment, Environment

## Get the workspace and environment

Before you can submit the job, you need to retrieve the resources it depends on: the workspace and the environment. The compute target and the input arguments are set later, in the run configuration.
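
`Workspace.from_config()`, used in the next cell, reads the workspace details from a `config.json` file, which it looks for in the current directory, in a `.azureml` subfolder, or in a parent directory. On an Azure Machine Learning compute instance this file is typically already present. If you run the notebook elsewhere, the following is a minimal sketch of creating one. The helper name `write_workspace_config` is hypothetical, and the values are placeholders you must replace with your own.

```python
import json
import tempfile
from pathlib import Path

# Placeholder values: replace with your own subscription, resource group, and workspace.
config = {
    "subscription_id": "<your-subscription-id>",
    "resource_group": "<your-resource-group>",
    "workspace_name": "<your-workspace-name>",
}


def write_workspace_config(directory, config):
    """Write config.json into `<directory>/.azureml` and return its path (hypothetical helper)."""
    config_dir = Path(directory) / ".azureml"
    config_dir.mkdir(parents=True, exist_ok=True)
    config_path = config_dir / "config.json"
    config_path.write_text(json.dumps(config, indent=2))
    return config_path


# Written to a temporary directory here so the sketch has no side effects;
# in practice you would pass the folder that contains this notebook.
with tempfile.TemporaryDirectory() as tmp:
    path = write_workspace_config(tmp, config)
    print(path.name, sorted(json.loads(path.read_text())))
```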

In [None]:
from azureml.core import Workspace

ws = Workspace.from_config()

In [None]:
from azureml.core import Environment

pytorch_env = Environment.get(workspace=ws, name="AzureML-pytorch-1.10-ubuntu18.04-py38-cuda11-gpu")

## Define the configuration and submit the run

Now that the workspace and environment are available, you can define the script run configuration and submit the run.

**Warning!** Change the value of the `compute_target` parameter to the name of your compute cluster before running the code below!

In [None]:
from azureml.core import ScriptRunConfig

src = ScriptRunConfig(source_directory='script',
                      script='train.py',
                      arguments=['--epochs', 2],
                      compute_target='<your-compute-cluster>',
                      environment=pytorch_env)


To learn what is done during training, explore the script `train.py` in the `script` folder.
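
The `arguments` list in the run configuration is passed to the script on its command line, so `train.py` has to parse `--epochs` itself. The actual script is not reproduced here; the following is only a sketch of how such an argument could be read with `argparse`. The function name `parse_args` and the default value are assumptions.

```python
import argparse


def parse_args(argv=None):
    """Parse the command-line arguments of a hypothetical training script."""
    parser = argparse.ArgumentParser(description="Hypothetical training entry point")
    # --epochs mirrors the argument set in ScriptRunConfig; the default of 1 is an assumption.
    parser.add_argument("--epochs", type=int, default=1, help="number of training epochs")
    return parser.parse_args(argv)


# Values arrive as strings on the command line; type=int converts them.
args = parse_args(["--epochs", "2"])
print(args.epochs)
```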

The following cell submits the run. The compute cluster first has to scale up from 0 nodes; once a node is available, it executes the script. The script itself should run quickly, and you can check the execution time afterwards in the **Details** tab of the **Experiment** run.

In [None]:
from azureml.core import Experiment

run = Experiment(ws, 'train-model').submit(src)
run.wait_for_completion(show_output=True)

You should get a notification in the Studio that a new run has started.

You can also navigate to the **Experiments** tab, and find the experiment `train-model` there. 

Once the run has finished, open the **Metrics** tab to view the two evaluation metrics logged for the two epochs. In the **Details** tab, you can see how long the run took.
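
The same metrics can also be read in the notebook with `run.get_metrics()`, which returns a dictionary keyed by metric name. A metric logged once is stored as a single value, and a metric logged several times (for example once per epoch) is stored as a list. The sketch below reduces such a dictionary to the last logged value per metric. The helper `final_metric_values`, the metric names, and the numbers are placeholders for illustration, not output from this run.

```python
def final_metric_values(metrics):
    """Return the last logged value for each metric name (hypothetical helper)."""
    summary = {}
    for name, values in metrics.items():
        # A metric logged repeatedly is a list; a metric logged once is a scalar.
        if isinstance(values, list) and values:
            summary[name] = values[-1]
        else:
            summary[name] = values
    return summary


# Placeholder data only; with a real run, use: metrics = run.get_metrics()
example_metrics = {"metric_a": [0.5, 0.25], "metric_b": 3}
print(final_metric_values(example_metrics))
```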