# Face Generation

## Prerequisites
If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, you have to install the Azure Machine Learning Python SDK and create an Azure ML Workspace first.

In [None]:
# Check core SDK version number
import azureml.core

print("Azure SDK version:", azureml.core.VERSION)

## Initialize workspace

Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites step. Workspace.from_config() creates a workspace object from the details stored in config.json.

In [None]:
from azureml.core.workspace import Workspace

ws = Workspace.from_config()
print('Workspace name: ' + ws.name, 
      'Azure region: ' + ws.location, 
      'Subscription id: ' + ws.subscription_id, 
      'Resource group: ' + ws.resource_group, sep = '\n')
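
For reference, `Workspace.from_config()` expects a small JSON file holding the subscription ID, resource group, and workspace name. The sketch below shows one way to produce such a file with the standard library. The values are placeholders, and the helper name `write_workspace_config` and the file name `config.sample.json` are illustrative assumptions, chosen so that an existing `config.json` is not overwritten.

```python
import json
from pathlib import Path

# Placeholder values (assumptions): replace them with your own identifiers.
sample_config = {
    "subscription_id": "<your-subscription-id>",
    "resource_group": "<your-resource-group>",
    "workspace_name": "<your-workspace-name>",
}


def write_workspace_config(path, config):
    """Write the workspace details as JSON and return the parsed file content."""
    path = Path(path)
    path.write_text(json.dumps(config, indent=4))
    return json.loads(path.read_text())


# Written under a sample name so a real config.json is left untouched.
written = write_workspace_config("config.sample.json", sample_config)
print(sorted(written))
```

After filling in real values, the file would need to be saved as `config.json` where `Workspace.from_config()` can find it.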

## Create or Attach existing AmlCompute

**Creation of AmlCompute takes approximately 5 minutes.** If an AmlCompute cluster with that name already exists in your workspace, this code reuses it and skips creation.

As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota.

In [None]:
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

# choose a name for your cluster
cluster_name = "gpu-cluster"

try:
    compute_target = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing compute target')
except ComputeTargetException:
    print('Creating a new compute target...')
    compute_config = AmlCompute.provisioning_configuration(
        vm_size='Standard_NC6', 
        vm_priority="dedicated",
        min_nodes = 0,
        max_nodes = 12,
        idle_seconds_before_scaledown=300
    )

    # create the cluster
    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)

# can poll for a minimum number of nodes and for a specific timeout. 
# if no min node count is provided it uses the scale settings for the cluster
compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)

# use get_status() to get a detailed status for the current cluster. 
print(compute_target.get_status().serialize())
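
To relate the cluster settings above to the quota limits mentioned earlier, it can help to estimate how many cores the cluster consumes when fully scaled out. The sketch below is simple arithmetic, not an SDK call. The figure of 6 vCPUs per `Standard_NC6` node is an assumption to confirm against the Azure VM size documentation, and `cores_at_full_scale` is a hypothetical helper name.

```python
# Assumption: Standard_NC6 provides 6 vCPUs per node; confirm in the Azure VM size docs.
VCPUS_PER_NODE = {"Standard_NC6": 6}


def cores_at_full_scale(vm_size, max_nodes):
    """Return the vCPU count consumed when the cluster scales out to max_nodes."""
    return VCPUS_PER_NODE[vm_size] * max_nodes


# With max_nodes=12 as configured above, the cluster can use up to 72 cores.
print(cores_at_full_scale("Standard_NC6", 12))
```

If that number exceeds the quota for the VM family in your region, lowering `max_nodes` or requesting more quota would be the options to consider.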

## Train model on the remote compute

Set up the training environment with the following steps:
- Create a project directory & add training assets like scripts and data
- Create an Azure ML experiment
- Create an environment
- Configure & submit the training job

### Create a project directory & add training assets

Create a directory that will contain all the necessary code from your local machine that you will need access to on the remote resource. This includes the training script and any additional files your training script depends on.

In [None]:
import os
import shutil
import glob

project_folder = './train'
os.makedirs(project_folder, exist_ok=True)

for script in glob.glob('../*.py'):
    shutil.copy(script, project_folder)
    
for script in glob.glob('*.py'):
    shutil.copy(script, project_folder)
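
Because the copy step above relies on glob patterns, it silently copies nothing if the scripts are not where it expects them. A minimal sanity check, assuming the entry script is named `train.py` (the name used in the job configuration later in this notebook), might look like the following. `missing_assets` is a hypothetical helper, not part of the Azure ML SDK.

```python
import os


def missing_assets(folder, required):
    """Return the required file names that are not present in folder."""
    present = set(os.listdir(folder)) if os.path.isdir(folder) else set()
    return [name for name in required if name not in present]


# Extend the list with any other files your training script depends on.
missing = missing_assets('./train', ['train.py'])
if missing:
    print('Missing from project folder:', missing)
```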

### Create an experiment
Create an Experiment to track all the runs in your workspace.

In [None]:
from azureml.core import Experiment

experiment_name = 'face-generation'
experiment = Experiment(ws, name=experiment_name)

### Create an environment

Define a conda environment YAML file with your training script dependencies and create an Azure ML environment.

In [None]:
%%writefile conda_dependencies.yml

channels:
- conda-forge
dependencies:
- python=3.6.2
- pip:
  - azureml-defaults
  - torch==1.6.0
  - torchvision==0.7.0
  - future==0.17.1
  - matplotlib==3.3.4
  - torchsummary
  - torchsummaryX
  - pillow

In [None]:
from azureml.core import Environment

pytorch_env = Environment.from_conda_specification(name = 'pytorch-1.6-gpu', file_path = './conda_dependencies.yml')

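# Specify a GPU base image.
# Docker is enabled through DockerConfiguration when the job is configured,
# so the older flag below stays commented out.
#pytorch_env.docker.enabled = True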
pytorch_env.docker.base_image = 'mcr.microsoft.com/azureml/openmpi3.1.2-cuda10.1-cudnn7-ubuntu18.04'

### Configure the training job
Create a ScriptRunConfig object to specify the configuration details of your training job, including your training script, environment to use, and the compute target to run on.

In [None]:
from azureml.core import ScriptRunConfig
from azureml.core.runconfig import DockerConfiguration

args = [
    '--num_epochs', 20, 
    '--batch_size', 256,
    '--img_size', 32,
    '--d_conv_dim', 64,
    '--g_conv_dim', 64,
    '--z_size', 100,
    '--learning_rate', 0.0002,
    '--beta1', 0.5,
    '--beta2', 0.999,
    '--dataset_name', 'processed_celeba_small'
]

docker_config = DockerConfiguration(use_docker=True)
script_run_config = ScriptRunConfig(source_directory=project_folder,
                                    script='train.py',
                                    arguments=args,
                                    compute_target=compute_target,
                                    environment=pytorch_env,
                                    docker_runtime_config=docker_config)
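
The `arguments` list above reaches `train.py` as ordinary command-line arguments. The actual training script is not shown in this notebook, so the following is only a sketch of how such a script might parse them with `argparse`. The flag names match the list above, while the types and defaults are assumptions.

```python
import argparse


def parse_args(argv=None):
    """Parse the hyperparameters passed on the command line (hypothetical train.py parser)."""
    parser = argparse.ArgumentParser(description='Face generation training (sketch)')
    parser.add_argument('--num_epochs', type=int, default=20)
    parser.add_argument('--batch_size', type=int, default=256)
    parser.add_argument('--img_size', type=int, default=32)
    parser.add_argument('--d_conv_dim', type=int, default=64)
    parser.add_argument('--g_conv_dim', type=int, default=64)
    parser.add_argument('--z_size', type=int, default=100)
    parser.add_argument('--learning_rate', type=float, default=0.0002)
    parser.add_argument('--beta1', type=float, default=0.5)
    parser.add_argument('--beta2', type=float, default=0.999)
    parser.add_argument('--dataset_name', type=str, default='processed_celeba_small')
    return parser.parse_args(argv)


# Command-line values arrive as strings; argparse converts them to the declared types.
parsed = parse_args(['--num_epochs', '20', '--learning_rate', '0.0002'])
print(parsed.num_epochs, parsed.learning_rate, parsed.dataset_name)
```

Declaring a type for each flag keeps the conversion from strings in one place, so the rest of the script can treat the values as numbers.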

### Submit job
Run your experiment by submitting your ScriptRunConfig object. Note that this call is asynchronous.

In [None]:
test_run = experiment.submit(script_run_config)

### Monitor your run
You can monitor the progress of the run with a Jupyter widget. Like the run submission, the widget is asynchronous and provides live updates every 10-15 seconds until the job completes.

In [None]:
from azureml.widgets import RunDetails

RunDetails(test_run).show()

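Alternatively, you can block until the script has completed training before running more code by calling `test_run.wait_for_completion(show_output=True)`.

The next cells configure and submit a second run that reduces `d_conv_dim` and `g_conv_dim` from 64 to 32 and keeps the other hyperparameters unchanged.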

In [None]:
args = [
    '--num_epochs', 20, 
    '--batch_size', 256,
    '--img_size', 32,
    '--d_conv_dim', 32,
    '--g_conv_dim', 32,
    '--z_size', 100,
    '--learning_rate', 0.0002,
    '--beta1', 0.5,
    '--beta2', 0.999,
    '--dataset_name', 'processed_celeba_small'
]

docker_config = DockerConfiguration(use_docker=True)
script_run_config = ScriptRunConfig(source_directory=project_folder,
                                    script='train.py',
                                    arguments=args,
                                    compute_target=compute_target,
                                    environment=pytorch_env,
                                    docker_runtime_config=docker_config)
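
The two job configurations above differ only in `d_conv_dim` and `g_conv_dim`. One way to avoid repeating the full argument list, sketched here with a hypothetical helper `build_args` that is not part of the Azure ML SDK, is to keep the shared hyperparameters in a dictionary and override only what changes.

```python
# Shared hyperparameters, taken from the first run's argument list.
BASE_HPARAMS = {
    'num_epochs': 20,
    'batch_size': 256,
    'img_size': 32,
    'd_conv_dim': 64,
    'g_conv_dim': 64,
    'z_size': 100,
    'learning_rate': 0.0002,
    'beta1': 0.5,
    'beta2': 0.999,
    'dataset_name': 'processed_celeba_small',
}


def build_args(**overrides):
    """Flatten hyperparameters into ['--name', value, ...], applying any overrides."""
    unknown = set(overrides) - set(BASE_HPARAMS)
    if unknown:
        raise KeyError(f'Unknown hyperparameters: {sorted(unknown)}')
    hparams = {**BASE_HPARAMS, **overrides}
    flat = []
    for name, value in hparams.items():
        flat += [f'--{name}', value]
    return flat


# Argument list for the second run: only the two convolution sizes change.
small_args = build_args(d_conv_dim=32, g_conv_dim=32)
print(small_args)
```

The resulting list has the same shape as the hand-written `args` lists, so it could be passed to `ScriptRunConfig` through the `arguments` parameter in the same way.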

In [None]:
test_run2 = experiment.submit(script_run_config)

In [None]:
from azureml.widgets import RunDetails

# Show the widget for the second run
RunDetails(test_run2).show()