## DSVM Tutorial

This tutorial was created to showcase some of the features of the Ubuntu DSVM. It shows many steps of the data science process using the CIFAR-10 dataset with Keras+TensorFlow. CIFAR-10 is a popular dataset for image classification, collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. It contains 60,000 images of 10 different types of objects (truck, automobile, cat, etc.).
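For reference, Keras returns CIFAR-10 labels as integers from 0 to 9, which map to class names in the dataset's standard order. The 60,000 images are split into 50,000 training and 10,000 test images. This is a minimal sketch; the helper `label_name` is hypothetical and is not part of Keras.

```python
# Standard CIFAR-10 class order: label i corresponds to CIFAR10_CLASSES[i]
CIFAR10_CLASSES = [
    'airplane', 'automobile', 'bird', 'cat', 'deer',
    'dog', 'frog', 'horse', 'ship', 'truck',
]

# Split sizes of the dataset as distributed
NUM_TRAIN, NUM_TEST = 50_000, 10_000


def label_name(label):
    """Hypothetical helper: map an integer label (0-9) to its class name."""
    # Keras returns labels with shape (N, 1), so also accept a length-1 sequence
    if hasattr(label, '__len__'):
        label = label[0]
    return CIFAR10_CLASSES[int(label)]


print(label_name(9))          # truck
print(NUM_TRAIN + NUM_TEST)   # 60000
```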

This tutorial is divided into four parts:

0. Configure AzureML. This notebook sets up AzureML and saves the necessary configuration info to a file for other notebooks.
1. Load data. This notebook downloads the CIFAR-10 dataset and saves it to disk. It also prepares the dataset for remote runs on Azure by uploading it to blob storage.
2a. Train a model. This notebook trains a deep learning model locally on the DSVM to classify images into the CIFAR-10 categories (truck, cat, etc.).
3. Deploy a model. This notebook shows how to create a REST API for your model using AzureML.

There are also three demo notebooks:

2b. Train remotely. This notebook trains a model on a Batch AI cluster, using Batch AI to scale up to larger (for example, GPU) VMs and, optionally, to scale out to a distributed training job.
2c. Hyperparameter sweep. This notebook uses the HyperDrive service to search the hyperparameter space for settings that improve the model's performance.
4. AutoML. This notebook demonstrates automated machine learning with AzureML's AutoML.

This tutorial was originally created for Microsoft's internal machine learning and data science conference (MLADS), but you can also run it on an Ubuntu DSVM of your own outside of the conference.

### Part 1: Load data

This notebook shows how to prepare an image dataset for use with deep learning in Keras: it downloads CIFAR-10, saves it to disk, and uploads it to blob storage for remote runs.

In [None]:
from IPython.display import Image as ShowImage
ShowImage(url="https://cntk.ai/jup/201/cifar-10.png", width=500, height=500)

## Save the dataset to disk

Keras includes a convenient method, `cifar10.load_data()`, that loads the complete CIFAR-10 dataset into memory as NumPy arrays. Here we pickle each array to its own file on disk so we can then demonstrate uploading the files to blob storage for AzureML remote runs.
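Holding the whole dataset in memory is practical because it is small. Each image is 32 x 32 pixels with 3 color channels, stored as 8-bit unsigned integers. The back-of-the-envelope sketch below counts the raw image arrays only and ignores the label arrays and Python object overhead.

```python
# CIFAR-10 images: 32 x 32 pixels, 3 color channels, 1 byte (uint8) per value
HEIGHT, WIDTH, CHANNELS = 32, 32, 3
BYTES_PER_VALUE = 1


def image_bytes(num_images):
    """Raw size in bytes of num_images CIFAR-10 images stored as uint8."""
    return num_images * HEIGHT * WIDTH * CHANNELS * BYTES_PER_VALUE


train_bytes = image_bytes(50_000)  # 153,600,000 bytes
test_bytes = image_bytes(10_000)   # 30,720,000 bytes
total_mb = (train_bytes + test_bytes) / 1e6
print(f'{total_mb:.1f} MB')        # 184.3 MB
```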

In [None]:
import os
import pickle as pc

from keras.datasets import cifar10

# Local folder that will hold one pickle file per array
data_dir = './data/cifar10'
os.makedirs(data_dir, exist_ok=True)

# Download CIFAR-10 (Keras caches it after the first call) and load it as NumPy arrays
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# Serialize each array to a file named after it, e.g. ./data/cifar10/x_train
arrays = {'x_train': x_train, 'y_train': y_train,
          'x_test': x_test, 'y_test': y_test}
for name, array in arrays.items():
    with open(os.path.join(data_dir, name), 'wb') as f:
        pc.dump(array, f)
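A training script (for example, one run remotely on Azure) can read these files back with `pickle.load`. This is a minimal sketch, assuming the four file names used in the cell above; `load_cifar10_pickles` is a hypothetical helper, not part of Keras or AzureML. Only unpickle files you trust, since loading a pickle can execute arbitrary code.

```python
import os
import pickle


def load_cifar10_pickles(data_dir):
    """Hypothetical helper: load the arrays pickled by the cell above.

    Returns ((x_train, y_train), (x_test, y_test)), the same layout as
    keras.datasets.cifar10.load_data().
    """
    def load(name):
        # Each array was pickled to its own file named after the variable
        with open(os.path.join(data_dir, name), 'rb') as f:
            return pickle.load(f)

    return (load('x_train'), load('y_train')), (load('x_test'), load('y_test'))


# Usage, assuming ./data/cifar10 was written by the cell above:
# (x_train, y_train), (x_test, y_test) = load_cifar10_pickles('./data/cifar10')
```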

## Upload the data to the AzureML datastore

AzureML can run a Python script remotely on Azure, so you can take advantage of GPU VM instances or simplified distributed training on Batch AI. You can develop and test the script on your DSVM, then move to Azure when you need more compute. For these remote runs to access our dataset, we first upload it to blob storage.

In [None]:
from azureml.core.workspace import Workspace

# Load the workspace from the configuration file saved by notebook 0 (Configure AzureML)
ws = Workspace.from_config()
print('Workspace name: ' + ws.name,
      'Azure region: ' + ws.location,
      'Subscription id: ' + ws.subscription_id,
      'Resource group: ' + ws.resource_group, sep='\n')
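`Workspace.from_config()` reads the configuration file saved by notebook 0. In the AzureML Python SDK (v1) this is a small JSON file, conventionally named `config.json`, holding the subscription ID, resource group, and workspace name. The sketch below writes one with placeholder values to show its shape; the values are hypothetical, and it writes to a temporary folder rather than to wherever notebook 0 saved the real file.

```python
import json
import os
import tempfile

# Placeholder values (hypothetical); notebook 0 saves the real ones for your workspace
config = {
    'subscription_id': '<your-subscription-id>',
    'resource_group': '<your-resource-group>',
    'workspace_name': '<your-workspace-name>',
}

# Write to a throwaway directory so this sketch cannot overwrite a real config
config_dir = tempfile.mkdtemp()
config_path = os.path.join(config_dir, 'config.json')
with open(config_path, 'w') as f:
    json.dump(config, f, indent=4)

# With real values, Workspace.from_config(path=config_path) would load this workspace
with open(config_path) as f:
    print(sorted(json.load(f)))  # ['resource_group', 'subscription_id', 'workspace_name']
```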

In [None]:
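# The workspace's default datastore is backed by Azure blob storage
ds = ws.get_default_datastore()
# Copy the local ./data/cifar10 folder to the 'cifar10' path in that datastore
ds.upload(src_dir='./data/cifar10', target_path='cifar10', overwrite=True, show_progress=True)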
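Once uploaded, the `cifar10` folder lives in the workspace's default datastore, and a remote run needs to be told where it is available on the compute target. With the v1 SDK, one common pattern is to pass `ds.path('cifar10').as_mount()` as a script argument, which AzureML resolves to a local path when the run starts. The training-script side can then be sketched with `argparse`. The `--data-folder` flag and the `resolve_data_files` helper are assumptions for illustration, not something this tutorial defines.

```python
import argparse
import os

# File names written by the "Save the dataset to disk" cell
DATA_FILES = ('x_train', 'y_train', 'x_test', 'y_test')


def parse_args(argv=None):
    # Hypothetical flag: the run configuration would pass the mounted datastore path here
    parser = argparse.ArgumentParser(description='Train a CIFAR-10 model')
    parser.add_argument('--data-folder', type=str, default='./data/cifar10',
                        help='Directory containing the pickled CIFAR-10 arrays')
    return parser.parse_args(argv)


def resolve_data_files(data_folder):
    """Hypothetical helper: map each array name to its full path in data_folder."""
    return {name: os.path.join(data_folder, name) for name in DATA_FILES}


# Usage inside the training script (argparse exposes --data-folder as args.data_folder):
# args = parse_args()
# paths = resolve_data_files(args.data_folder)
```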