# Digit Recognition Task
If this is your first time running a notebook - welcome!! Notebooks are awesome because they let us play around and experiment
with code with near-instant feedback. Some pointers:
1. To execute a cell, click on it and hit SHIFT-Enter
2. Once something is executed, the variables are in memory - inspect them!

## Getting Started
This first cell imports the necessary libraries so we can get started:

In [None]:
import torch 
import torch.nn as nn
from torchvision import datasets, transforms
import numpy as np
import matplotlib.pyplot as plt
import torch.nn.functional as F
import torch.onnx as onnx
from PIL import Image, ImageOps
from sklearn.preprocessing import OneHotEncoder, LabelEncoder

#packages for cloud 
import json
import time
import azureml
from azureml.core.model import Model
from azureml.core import Workspace, Run, Experiment
from azureml.core.runconfig import RunConfiguration
from azureml.core.conda_dependencies import CondaDependencies
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.train.dnn import PyTorch
from azureml.core.image import ContainerImage, Image
from azureml.widgets import RunDetails

# 1. Load data

We will load the MNIST dataset from the torchvision datasets.

In [None]:
transform = transforms.Compose([
                            #resize the image
                            #convert to tensor.
])

training_data = #code for reading MNIST training data.
validation_data = # code for reading MNIST validation/testing data. 

training_loader = torch.utils.data.DataLoader()
validation_loader = torch.utils.data.DataLoader()

Let's have a look at the data.

In [None]:

def im_convert(img_tensor):
    img_tensor = img_tensor.cpu().clone().detach().numpy()
    img_tensor = img_tensor.transpose(1, 2, 0)
    img_tensor = img_tensor * np.array((1,1,1))
#     img_tensor = img_tensor.clip(0,1)
    return img_tensor

In [None]:
dataiter = iter(training_loader)
images, labels = next(dataiter)  # iterators no longer expose a .next() method
fig = plt.figure(figsize=(10,4))

for i in np.arange(20):
    ax = fig.add_subplot(2, 10, i+1)
    plt.imshow(im_convert(images[i]))
    ax.set_title(labels[i].item())

# 2. Models
Now that we have some data it's time to start picking models we think might work. This is where the science part of data-science comes in: we guess and then check if our assumptions were right. Imagine models like water pipes that have to distribute water to 10 different hoses depending on 784 knobs. These 784 knobs represent the individual pixels in the digit and the 10 hoses at the end represent the actual number (or at least the index of the one with the most water coming out of it). Our job now is to pick the plumbing in between.

The next cells cover the following constructions, in order of increasing complexity:

1. The first is a simple linear model,
2. the second is a 3-layer neural network,
3. the third is a full convolutional neural network,
4. and the last is a full convolutional neural network with padding.

While it is out of the scope of this tutorial to fully explain how they work, just imagine they are basically plumbing with internal knobs that have to be tuned to produce the right water pressure at the end to push the most water out of the right index. As you go down each cell the plumbing and corresponding internal knobs just get more complicated.

In [None]:
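# This is a simple linear model. Your assignment is to build more complex models based on this one.
class Linear(nn.Module):
    def __init__(self, n_in=28*28, n_out=10):
        # nn.Module's constructor takes no arguments
        super().__init__()
        self.linear = nn.Linear(n_in, n_out)
    
    def forward(self, x):
        # flatten each image into a vector of n_in pixels
        x = x.view(x.size(0), -1)
        x = F.softmax(self.linear(x), dim=1)
        return x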

Fill in the code to build the named neural network models. Use the Linear model above as a reference.
Tip: you can use `F.relu`.
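To show what the tip means in practice, here is a minimal sketch of a small fully connected network. The class name `ThreeLayerNet` and the hidden size of 128 are assumptions of mine, so treat it as one possible shape for your answer rather than the expected one. It returns raw scores instead of softmax probabilities because `nn.CrossEntropyLoss` applies log-softmax internally.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ThreeLayerNet(nn.Module):
    """Hypothetical 3-layer network: 784 pixels -> 128 -> 128 -> 10 scores."""
    def __init__(self, n_in=28 * 28, n_hidden=128, n_out=10):
        super().__init__()
        self.fc1 = nn.Linear(n_in, n_hidden)
        self.fc2 = nn.Linear(n_hidden, n_hidden)
        self.fc3 = nn.Linear(n_hidden, n_out)

    def forward(self, x):
        x = x.view(x.size(0), -1)   # flatten each image to a vector
        x = F.relu(self.fc1(x))     # ReLU between the linear layers
        x = F.relu(self.fc2(x))
        return self.fc3(x)          # raw scores (logits), one per digit

net = ThreeLayerNet()
batch = torch.rand(4, 1, 28, 28)    # four fake grayscale images
scores = net(batch)
print(scores.shape)
```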

In [None]:
class NeuralNetwork(nn.Module):
    # TODO: define the layers in __init__ and the computation in forward
    pass

In [None]:
class ConvNet(nn.Module):
    # TODO: define the convolutional layers in __init__ and the computation in forward
    pass

The `learning_rate` basically specifies how fast the algorithm will learn the model parameters. Right now you're probably thinking "let's set it to fifty million #amirite?" The best analogy for why this is a bad idea is golf. I'm a terrible golfist (is that right?) so I don't really know anything - but pretend you are trying to sink a shot (again sorry) but can only hit the ball the same distance every time. Easy right? Hit it the exact length from where you are to the hole! Done! Now pretend you don't know where the hole is but just know the general direction. Now the distance you choose actually matters. If it is too long a distance you'll miss the hole, and then when you hit it back you'll overshoot again. If the distance is too small then it will take forever to get there but for sure you'll eventually get it in. Basically you have to guess what the right distance per shot should be and then try it out. That is basically what the learning rate does for finding the "hole in one" for the right parameters (ok, I'm done with the golf stuff).
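If you want to see the golf problem in numbers, here is a tiny sketch in plain Python. It is not the notebook's training code: I'm assuming a toy "course" `f(x) = x**2` with the hole at zero, and unlike the fixed-distance golf shot, each step here is the learning rate times the slope. The function name `descend` and the three rates are made up for illustration.

```python
def descend(learning_rate, start=10.0, steps=20):
    """Minimise f(x) = x**2 with gradient steps and return every position visited."""
    x = start
    path = [x]
    for _ in range(steps):
        gradient = 2 * x                     # derivative of x**2
        x = x - learning_rate * gradient     # one "shot" toward the hole
        path.append(x)
    return path

too_small = descend(0.001)   # creeps toward the hole
reasonable = descend(0.1)    # gets close within 20 shots
too_large = descend(1.1)     # overshoots further on every shot

print(abs(too_small[-1]), abs(reasonable[-1]), abs(too_large[-1]))
```

After 20 shots the small rate has barely moved, the middle one is close to the hole, and the large one ends up further away than where it started.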

Below there are three things that make this all work:
1. **The Model** - this is the function we're making that takes in the digit vector and should return the right number
2. **The Cost Function** (sometimes called the loss function). I know I promised I was done with golf but I lied. Remember how I said in our screwy golf game you knew the general direction of the hole? The cost function tells us the distance to the hole - when it's zero we're there! In actual scientific terms, the cost function tells us how bad the model is at getting the right answer. As we take shots you should see the cost function decreasing. If this does not happen then something is wrong. At this point I would change the shot distance (or `learning_rate`) to something smaller and try again. If that doesn't work maybe change the model!
3. **The Optimizer** - this part is the bit that actually changes the model parameters. It has a sense for the direction we should be shooting and updates all of the internal numbers inside the model to find the best internal knobs to predict the right digits. In this case I am using the Binary Cross Entropy cost function because, well, I know it works. There are a ton of different cost functions you can choose from that fit a variety of different scenarios.
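Here is how the three pieces fit together, as a minimal sketch on made-up data. The data, the sizes (20 features, 3 classes) and the choice of `nn.CrossEntropyLoss` with plain SGD are assumptions for illustration, so swap in your own model, cost function and optimizer.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic stand-in data: 64 samples, 20 features, 3 classes.
inputs = torch.randn(64, 20)
labels = (inputs @ torch.randn(20, 3)).argmax(dim=1)

model = nn.Linear(20, 3)                                  # 1. the model
loss_fn = nn.CrossEntropyLoss()                           # 2. the cost function
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)   # 3. the optimizer

losses = []
for step in range(50):
    optimizer.zero_grad()            # clear gradients from the previous step
    output = model(inputs)           # predict
    loss = loss_fn(output, labels)   # measure how wrong the prediction is
    loss.backward()                  # compute gradients
    optimizer.step()                 # nudge the parameters
    losses.append(loss.item())       # store a plain float, not a tensor

print(losses[0], losses[-1])
```

The cost at the end should be lower than at the start. If it is not, that is the cue to try a smaller `lr`.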

In [None]:
# where to run
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print('Using {} device'.format(device))

#initialize the model you want to use. 

#declare the loss function. hint: you can use cross entropy loss.

#continue the code to initialize the optimizer. 
optimizer = torch.optim

In [None]:
epochs = 5  # number of passes over the training data (assumed value; adjust as needed)
running_loss_history = []
running_loss_correct = []
val_running_loss_history = []
val_running_correct_hostory = []


for epoch in range(epochs):
    running_loss = 0.0
    running_correct = 0.0
    
    val_running_loss = 0.0
    val_running_correct = 0.0
    
    for xxx,xxx in xxx:
        #zero the gradients
        #use the model to make prediction. 
        #calculate the loss/error
        #backward the loss
        #and update the weights.
        
        
        _, preds = torch.max(output, 1) #preds is the index of max value for that image
        running_correct += torch.sum(preds == labels.data)
        running_loss += loss.item()  # accumulate a plain float so the graph is not kept alive
        
    for xxx, xxx in xxx: #validation data
        
        #predict the validation data. 
        #calculate the loss/error
        
        _, val_preds = torch.max(val_output, 1)
        val_running_loss += val_loss.item()  # accumulate a plain float
        val_running_correct += torch.sum(val_preds==val_labels.data)
    
    #training scores
    epoch_loss = running_loss/len(training_loader)
    # divide by the number of samples (not batches) to get a fraction in [0, 1]
    acc_epoch = running_correct.float().item()/len(training_loader.dataset)
    running_loss_history.append(epoch_loss)
    running_loss_correct.append(acc_epoch)
    
    #validation scores
    val_epoch_loss = val_running_loss/len(validation_loader)
    # divide by the number of samples (not batches) to get a fraction in [0, 1]
    val_acc_epoch = val_running_correct.float().item()/len(validation_loader.dataset)
    val_running_loss_history.append(val_epoch_loss)
    val_running_correct_hostory.append(val_acc_epoch)
    
    
    print('training: loss {:0.4f} acc {:0.4f}, validation: loss {:0.4f} acc {:0.4f}'.format(epoch_loss, acc_epoch, val_epoch_loss, val_acc_epoch))

Check the plots below and think about what they tell you. Is the model underfitting or overfitting?
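One rough way to read the curves is to find the epoch with the lowest validation loss and check whether it has gone up since while the training loss kept falling. Here is a sketch of that check in plain Python. The function name `summarize_curves` and the loss values are invented for illustration, and it expects lists of plain floats.

```python
def summarize_curves(train_losses, val_losses):
    """Report where validation loss bottomed out and how far it sits above training loss."""
    best_epoch = min(range(len(val_losses)), key=val_losses.__getitem__)
    return {
        "best_epoch": best_epoch,
        "gap": val_losses[-1] - train_losses[-1],
        "val_rising": val_losses[-1] > val_losses[best_epoch],
    }

# Made-up histories: training keeps improving, validation turns around after epoch 2.
train = [0.90, 0.60, 0.40, 0.25, 0.15]
val = [0.95, 0.70, 0.55, 0.58, 0.66]
summary = summarize_curves(train, val)
print(summary)
```

A rising validation loss with a growing gap hints at overfitting. Two curves that are both still high and flat hint at underfitting.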

In [None]:
plt.plot(running_loss_history, label='training loss') 
plt.plot(val_running_loss_history, label='validation loss') 
plt.legend()

In [None]:
plt.plot(running_loss_correct, label='training accuracy') 
plt.plot(val_running_correct_hostory, label='validation accuracy') 
plt.legend()

# 3. Saving the Model
Every framework is different - in this case PyTorch lets us save the model (which you remember is just a big matrix `W` and a vector `b`) to an internal format as well as to the ONNX format. These can then be loaded up as an asset to a program that is executed every time you need to recognize a digit!
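To make "loaded up as an asset" a bit more concrete, here is a minimal sketch of a save and load round trip for the PyTorch format. I'm assuming a plain `nn.Linear` as a stand-in for the trained model and a temporary folder instead of the notebook's `model.pth`. It leaves out the ONNX side.

```python
import os
import tempfile
import torch
import torch.nn as nn

model = nn.Linear(28 * 28, 10)   # stand-in for the trained model

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "model.pth")
    torch.save(model.state_dict(), path)          # write W and b to disk

    restored = nn.Linear(28 * 28, 10)             # same architecture, fresh weights
    restored.load_state_dict(torch.load(path))    # copy the saved W and b back in
    restored.eval()                               # switch to inference mode

same_weights = torch.equal(model.weight, restored.weight)
print(same_weights)
```

Note that the state dictionary only holds the numbers, so the program that loads it needs the same model class definition.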

In [None]:
# create dummy variable to traverse graph
x = torch.randint(255, (1, 28*28), dtype=torch.float).to(device) / 255
onnx.export(model, x, 'model.onnx')
print('Saved onnx model to model.onnx')

# saving PyTorch Model Dictionary
torch.save(model.state_dict(), 'model.pth')
print('Saved PyTorch Model to model.pth')

# 4. To the cloud
Make sure you are running Python 3.6. If not, please follow the steps below.

Click on the "Project Settings"

![Project Settings](https://raw.githubusercontent.com/sethjuarez/pytorchintro/master/images/project_settings.png)

Next, select the "Environments" tab, choose "Python 3.6", and finally select the corresponding `requirements.txt` file.

![Settings](https://raw.githubusercontent.com/sethjuarez/pytorchintro/master/images/settings.png)

After those steps you should be good to go!

# 5. Setting up Azure Machine Learning service
The first thing you need to do is create an Azure Machine Learning workspace. There are [docs](https://docs.microsoft.com/en-us/azure/machine-learning/service/quickstart-get-started#create-a-workspace) on how to do that. If you're a command line type person, I have an [example](https://github.com/sethjuarez/workspacestarter) of how you can set it up using the Azure CLI. Once you've set the project up, fill in the appropriate settings for your workspace in the first part of the code below, which writes out the config file. Once the config file has been written out, you can load the workspace programmatically like I've done below.

In [None]:
subscription_id = '' 
resource_group = ''
workspace_name = ''

try:
    ws = # write down the code to load the workspace we already created in previous session.
    ws.write_config()
except:
    print('The workspace {} not found'.format(workspace_name))
    
# once you run the above code once, you can use the written config
ws = Workspace.from_config()

# 6. Cloud Compute
Next we need to define a compute target for your experiment. Since this is a brand new workspace, feel free to change the name of your cluster (I called mine `racer`). The code below tries to get a reference to my cluster but if it doesn't exist, it creates it for me. If you're creating a cluster this might take a bit of time. Also, please turn these off when you're done (in fact consider setting the `min_nodes` to 0 so the cluster turns off automatically if it's idle for too long) - I don't want you to get an unexpected bill.

Hint: `vm_size = 'STANDARD_NC6'`

In [None]:
compute_target_name = ''

try:
    compute = # declare compute target
    print('Found existing compute target "{}"'.format(compute_target_name))
except ComputeTargetException:
    # if it does not exist, we have to create one
    print('Creating new compute target "{}"...'.format(compute_target_name))
    compute_config = # the configuration for compute target
    compute = #create compute target
    compute.wait_for_completion(show_output=True)

# Time to Experiment
Once our compute target has been set up it's time to package up our tiny notebook from last time into a single script that a remote compute environment can run. I've taken the time to [do that for you](train.py). In fact, if you look at the file you will see all of the exact same concepts we learned from the previous notebook (it's almost exactly the same but I have added additional things to make it easier to pass things into the script).

In AzureML service there is a concept of an experiment. For every experiment you can have multiple runs. In this case I'm using an `Estimator` object that defines how the experiment should run.

### Don't read this if you don't care what we do in the background
In the background the estimator is basically a definition of sorts for a docker image that will house your experiment. The best part about all of this is that irrespective of what you use for your experiment (a crazy custom version of TensorFlow or something) it should always run - it's a container after all. It's pretty slick.

### Back to the regular stuff
Once we submit our estimator to be run on AzureML service, it copies the contents of the current directory and packages them up to run in our new container (well, it will upload everything with the exception of anything you describe in the [.amlignore](https://github.com/sethjuarez/pytorchintro/blob/master/.amlignore) file).

Notice also that since I'm using `argparse` I can specify external parameters to the training script as part of the estimator definition.
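In case `argparse` is new to you, here is a minimal sketch of how a training script could pick up those parameters. The flag names match the `script_params` used in this notebook, but the defaults and the `parse_args` helper are my assumptions, so the real `train.py` may look different.

```python
import argparse

def parse_args(argv=None):
    """Illustrative parser for the flags passed in through script_params."""
    parser = argparse.ArgumentParser(description="Training script arguments (illustrative)")
    parser.add_argument("--lr", type=float, default=0.01, help="learning rate")
    parser.add_argument("--batch", type=int, default=100, help="batch size")
    parser.add_argument("--epochs", type=int, default=5, help="number of epochs")
    parser.add_argument("--model", type=str, default="cnn", help="which model to train")
    return parser.parse_args(argv)

# Simulate a run that overrides two of the flags.
args = parse_args(["--lr", "0.001", "--epochs", "10"])
print(args.lr, args.batch, args.epochs, args.model)
```

Anything not passed on the command line falls back to its default, which is handy when you run the script locally.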

Let's run the next three lines to see what happens!

In [None]:
#declare the experiment. 
exp = Experiment(ws, '')

script_params = {
    '--lr':0.01,
    '--batch': 100,
    '--epochs':5,
    '--model':'cnn'
}

# declare the estimator, 
estimator = PyTorch()

#submit the experiment for execution.


In [None]:
RunDetails(run).show()

In [None]:
# print out a list of the files created by the run. We are interested in the model file (.pth)
run.get_file_names()

In [None]:
# 7. Register the model. use the name = 'PyTorchMNIST'


In [None]:
#register the model. Same implementation as we did at previous session


# 8. Conda Dependencies, Image Container, Web Service
Now that we have registered the model, we are ready to create the image and deploy our model as a web service.

In [None]:
#configure the conda dependencies and write down to file.

with open('pytorchmnist.yml','w') as f:
    print('Writing out {}'.format('pytorchmnist.yml'))
    f.write(myenv.serialize_to_string())
    print('Done!')

In [None]:
image_config = # image configuration.

image = # image deployment code 
image.wait_for_creation(show_output=True)

In [None]:
from azureml.core.webservice import AciWebservice, Webservice

service_name = ''

# check whether a service with this name already exists; if so, delete it
svcs = [svc for svc in Webservice.list(ws) if svc.name==service_name]
if len(svcs) == 1:
    print('Deleting prior {} deployment'.format(service_name))
    svcs[0].delete()
    


# create service
aciconfig = AciWebservice.deploy_configuration(
    # fill the parameters
)

service = #fill the code to deploy the image
service.wait_for_deployment(show_output=True)
print(service.scoring_uri)

You have the option of pushing the image to ACI or even a workspace Kubernetes cluster.

Sometimes things go wrong. If that happens to you, run the code below to see the actual [logs](deploy.log)!

In [None]:
with open('deploy.log','w') as f:
    f.write(service.get_logs())

# 9. Running the service

In [None]:
import torch
from PIL import Image
import matplotlib.pyplot as plt

X, Y = training_data[57435]  # a sample from the MNIST training set loaded in section 1
X = X * 255
plt.imshow(255 - X.reshape(28,28), cmap='gray')
print(Y)

In [None]:
# This is a string representation of the image we will POST to the endpoint
image_str = ','.join(map(str, X.flatten().int().tolist()))
print(image_str)

In [None]:
import json
import requests
service_url = service.scoring_uri
print(service_url)
r = requests.post(service_url, json={'image': image_str })
r.json()
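The scoring script behind the endpoint is not shown in this notebook, so here is only a sketch of what the round trip for the `image` string could look like. The helper names `encode_pixels` and `decode_pixels` and the division by 255 on the receiving side are assumptions on my part.

```python
import json

def encode_pixels(pixels):
    """Turn pixel values (0-255) into the comma-separated string we POST."""
    return ",".join(str(int(p)) for p in pixels)

def decode_pixels(image_str):
    """Hypothetical server side: parse the string back into floats in [0, 1]."""
    return [int(v) / 255 for v in image_str.split(",")]

pixels = [0, 128, 255, 64]                           # a tiny stand-in for 784 pixels
body = json.dumps({"image": encode_pixels(pixels)})  # what requests sends as JSON
received = json.loads(body)["image"]                 # what the service would read
restored = decode_pixels(received)
print(received, restored)
```

A real digit would be 784 values long, but the encoding works the same way.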