### Week 2: Neural network in AWS SageMaker

- Today we are going to use AWS SageMaker to build, train and deploy a VNN. 
- Our data and goal are the same as in Week 1: we will use the Fashion MNIST data, and our goal is to construct a simple VNN to classify the type of clothing.

Our tasks are:
1. Understand the structure of AWS SageMaker.
2. Create an instance on AWS SageMaker.
3. Request instances with larger capacity (contact the AWS support team).
4. Demonstrate how to build, train, deploy and evaluate a VNN on AWS.
5. Remember to **stop the instance and DELETE the endpoint** when you finish your tasks, otherwise AWS will keep charging you. It can be very expensive.


### Build, train and deploy a NN model in AWS SageMaker
Data
- The Fashion MNIST training set contains 60,000 small square 28 × 28 pixel grayscale images of 10 types of clothing, such as shoes, t-shirts, dresses, and more. A further 10,000 images form the test set.
- All the code here is run on the **conda_tensorflow_p36** kernel.

### A. Sign up for an AWS account 
If you don't already have an account, [sign up here](https://portal.aws.amazon.com/billing/signup#/start).

### B. Create an AWS SageMaker instance

We will create a notebook instance that is used to download and process the data. 

1. Sign in to the [AWS SageMaker console](https://aws.amazon.com/console/) as a Root user
![title](pics/sagemaker_console.png)

2. Navigate to Notebook instances in the left menu pane, and select Create notebook instance. 
![title](pics/create_instance.png)

3. Specify your Notebook instance settings
   - Give your new instance a suitable name *e.g.* MA5852-Lab2
   - Select the **instance type** as **ml.t2.medium**. Note: this type is eligible for the AWS Free Tier, subject to its usage limits. If you need instances with a different capacity, you can select them here.
   - Leave **elastic inference** as **default selection (none)**
   
4. In the Permissions and encryption section, **create a new IAM role**. Leave the selections as default and select **create role**. Leave the **root access enabled as default**. 
![title](pics/notebook2.png)

5. Choose **Create Notebook instance**.

6. The **Notebook instances** section will now open, and the new notebook instance is displayed. The status will be *Pending*, and this will change to *InService* when the notebook is ready.
![title](pics/new_instance.png)
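
The status change in step 6 can also be checked from code. The following is a minimal sketch, assuming a client that behaves like `boto3.client("sagemaker")` with its `describe_notebook_instance` call and `NotebookInstanceStatus` field. The helper name `wait_until_in_service` and the polling settings are illustrative and not part of this lab.

```python
import time


def wait_until_in_service(client, name, poll_seconds=15, max_polls=40):
    """Poll a notebook instance until it reports InService (or Failed).

    `client` is expected to behave like boto3.client("sagemaker").
    Returns the last status seen.
    """
    status = None
    for _ in range(max_polls):
        response = client.describe_notebook_instance(NotebookInstanceName=name)
        status = response["NotebookInstanceStatus"]
        if status in ("InService", "Failed"):
            break  # stop on success or on a terminal failure
        time.sleep(poll_seconds)
    return status


# Usage against a real account (requires AWS credentials):
# import boto3
# print(wait_until_in_service(boto3.client("sagemaker"), "MA5852-Lab2"))
```

Passing the client in as an argument keeps the helper easy to try out with a stand-in object before pointing it at a real account.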
  

### C. Preparing the data
We will now use our new notebook instance to load and prepare the Fashion MNIST data, then upload it to Amazon S3.

1. When your notebook instance status changes to **InService**, select **Open Jupyter**
![title](pics/inservice.png)

Note, you can select Open JupyterLab to get a heap of tutorials etc... 

2. In the notebook instance, either:
    - create a new notebook using **New** and select the **conda_tensorflow_p36** kernel. A new code cell will appear in your Jupyter notebook. Run the code below by copying and pasting it into your notebook; or
    - upload existing Jupyter notebooks and Python scripts using **Upload**. Select the **conda_tensorflow_p36** kernel when prompted.

Before you start: Check directory structure and modify permissions

In [None]:
%%sh
ls -l

Do you have a lost+found folder whose owner and group are root? If so, you will need to change the owner and group of lost+found to ec2-user to prevent future errors. Note, I am in contact with AWS Support for a better solution and will keep you all updated!

In [None]:
%%sh
sudo chown ec2-user lost+found

In [None]:
%%sh
ls -l

In [None]:
%%sh
sudo chgrp ec2-user lost+found

In [None]:
%%sh
ls -l 

We next need to set up our environment:

In [None]:
# We first need to import the necessary libraries and define some environment variables  
## Import sagemaker and retrieve IAM role, which determines your user identity and permissions

import sagemaker #import sagemaker
print(sagemaker.__version__) #print the sagemaker version
sess = sagemaker.Session() ### Manages interactions with the Amazon SageMaker APIs and 
                           ### any other AWS services needed e.g. S3
role = sagemaker.get_execution_role() ### Get and save the IAM role as environment variable

In [None]:
## Import os, keras, numpy, pyplot and the fashion MNIST data 

import os
import keras
import numpy as np
from keras.datasets import fashion_mnist
from matplotlib import pyplot
(x_train, y_train), (x_val, y_val) = fashion_mnist.load_data()

In [None]:
# Take a quick look at data 

#Each image is represented as a 28x28 pixel grayscale image
## View shape and type of data
xtr = x_train.shape, x_train.dtype
ytr = y_train.shape, y_train.dtype

print("x_train_shape & data type:", xtr)
print("y_train_shape & data type:", ytr)

# plot some raw pixel data
for i in range(9):
    pyplot.subplot(330 + 1 + i)  # 3x3 grid, position i+1
    pyplot.imshow(x_train[i], cmap=pyplot.get_cmap('gray'))
pyplot.show()

In [None]:
# Create local directory for the data and save the test and training data here

os.makedirs("./data", exist_ok=True)
np.savez('./data/training', image = x_train, label=y_train)
np.savez('./data/test', image=x_val, label=y_val)
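
Two details of `np.savez` matter for the later steps: it appends `.npz` to the file name, and the keyword names used when saving (`image`, `label`) become the keys when loading. The sketch below illustrates this with small synthetic arrays in a temporary directory, so it does not touch the real `./data` folder.

```python
import os
import tempfile

import numpy as np

# Synthetic stand-ins for the real arrays (same dtype and image size).
images = np.zeros((5, 28, 28), dtype=np.uint8)
labels = np.arange(5, dtype=np.uint8)

with tempfile.TemporaryDirectory() as tmp_dir:
    # np.savez appends ".npz" when the file name does not already end with it.
    np.savez(os.path.join(tmp_dir, "training"), image=images, label=labels)
    saved_files = os.listdir(tmp_dir)

    # The keyword names used when saving become the keys when loading.
    with np.load(os.path.join(tmp_dir, "training.npz")) as archive:
        loaded_images = archive["image"]
        loaded_labels = archive["label"]

print(saved_files)
print(loaded_images.shape, loaded_labels.shape)
```

This is why the file paths used later in the notebook end in `training.npz` and `test.npz`.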

In [None]:
%%sh 
ls -l data ## Check that the directories have been created

We are now going to train our model on the local instance. This is an optional step to check that our code will run on AWS. We use `TensorFlow()` to create a `tf_estimator` object and then train the model with it.

In [None]:
## We will use the Python script that we made in Lab 1 to train our VNN model.
## If you haven't already uploaded it into your notebook instance, do that now.

#Import tensorflow from sagemaker
from sagemaker.tensorflow import TensorFlow

## documentation https://sagemaker.readthedocs.io/en/stable/frameworks/tensorflow/using_tf.html

#Set environment variables - file paths to data and for output
local_training_input_path = 'file://data/training.npz'
local_test_input_path = 'file://data/test.npz'
output = 'file:///tmp'

tf_estimator = TensorFlow(entry_point='mnist_fashion_vnn_tf2.py', #path to local python source file to be executed
                          role = role, #the IAM ROLE ARN for the model - unique user ID
                          source_dir ='.', #path to the directory containing any dependencies other than the entry point
                          instance_count = 1, #the number of EC2 instances to use
                          instance_type ='local', # Type of EC2 instance to use local = this one! 
                          framework_version = '2.1.0', # Tensorflow version for executing your tf code
                          py_version ='py3', #version of python for executing your model training code
                          script_mode =True, #enables us to use our python script to train the model
                          hyperparameters={'epochs':1}, #hyperparameters used by our custom TensorFlow code during model training
                          output_path = output) #location for saving the results. Default = saved in the default S3 bucket.

#Note, Estimator is a high level interface for SageMaker training

In [None]:
#fit() trains the model defined by the estimator object. We pass in the file paths to the
#training and test data (in this example they are stored locally)

tf_estimator.fit({'training': local_training_input_path, 'validation': local_test_input_path})
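
The Lab 1 script `mnist_fashion_vnn_tf2.py` is not reproduced here. In script mode, SageMaker passes each hyperparameter to the script as a command-line flag and exposes each channel given to `fit()` through an environment variable named `SM_CHANNEL_<NAME>`. The sketch below shows one way an entry point could read these. The function name `parse_args` and the fallback defaults are assumptions for illustration, and the real script may be organised differently.

```python
import argparse
import os


def parse_args(argv=None):
    """Read hyperparameters and data locations the way a script-mode job supplies them."""
    parser = argparse.ArgumentParser()
    # Hyperparameters arrive as command-line flags, e.g. --epochs 1.
    parser.add_argument("--epochs", type=int, default=1)
    # Each channel passed to fit() is exposed as SM_CHANNEL_<NAME>.
    parser.add_argument("--training",
                        default=os.environ.get("SM_CHANNEL_TRAINING", "./data"))
    parser.add_argument("--validation",
                        default=os.environ.get("SM_CHANNEL_VALIDATION", "./data"))
    # Files saved under SM_MODEL_DIR are collected as the model artifact.
    parser.add_argument("--sm-model-dir",
                        default=os.environ.get("SM_MODEL_DIR", "./model"))
    # Ignore any extra flags the platform may add.
    args, _unknown = parser.parse_known_args(argv)
    return args


args = parse_args(["--epochs", "3"])
print(args.epochs, args.training, args.validation)
```

The local fallbacks (`./data`, `./model`) are only there so the sketch also runs outside a training container.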

### Train the model in AWS

Now we know that our code is working on SageMaker (note, we can only do this because we have a small dataset and a shallow neural network - this wouldn't work with large datasets or deep neural networks), we can train our model on a larger instance. 

1. Upload the dataset to S3. Amazon S3 (Simple Storage Service) is the AWS object storage service; SageMaker uses a default S3 bucket to store data and model output.
2. Select the [EC2 instance type](https://aws.amazon.com/ec2/instance-types/) for your model. For this subject, we mainly use *ml.m4.xlarge*. EC2 stands for Elastic Compute Cloud; it is a web service through which AWS subscribers can request and provision compute capacity in the AWS cloud. You'll be charged at an hourly rate that depends on the instance you choose. Don't forget to terminate the instance when you're done so that you are not charged for time you don't use.

In [None]:
#Upload data to S3 bucket
## Note - we get a certain capacity for free, after that you are charged

prefix = 'keras-mnist-fashion' #first define a prefix for the key (think of this like a directory or file path)

#upload a local file/directory to S3 using upload_data(). 
##inputs = path, bucket (if not specified, default_bucket is used), optional prefix for directory structure
training_input_path = sess.upload_data('data/training.npz', key_prefix = prefix+'/training')

test_input_path = sess.upload_data('data/test.npz', key_prefix = prefix+'/validation')

print(training_input_path)
print(test_input_path) ### note - you can look at your buckets in the S3 section of AWS. 
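
The printed paths are S3 URIs made up of the bucket, the key prefix, and the file name. The sketch below rebuilds that shape for a single-file upload. The helper `expected_s3_uri` and the bucket name are hypothetical; your default bucket will carry your own region and account ID.

```python
import posixpath


def expected_s3_uri(bucket, key_prefix, local_path):
    """Build the URI that a single-file upload is expected to return."""
    file_name = posixpath.basename(local_path)
    return "s3://{}/{}/{}".format(bucket, key_prefix, file_name)


# Hypothetical bucket name following the sagemaker-<region>-<account id> pattern.
bucket = "sagemaker-us-east-1-123456789012"
prefix = "keras-mnist-fashion"

print(expected_s3_uri(bucket, prefix + "/training", "data/training.npz"))
print(expected_s3_uri(bucket, prefix + "/validation", "data/test.npz"))
```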

### Train with managed instances

Use [managed spot instances](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-spot-instances.html) to save money.

In [None]:
tf_estimator = TensorFlow(entry_point='mnist_fashion_vnn_tf2.py',  #Python script
                          source_dir = '.',
                          role=role,
                          instance_count=1, 
                          instance_type='ml.m4.xlarge', # instance type
                          framework_version='2.1.0', # Tensorflow version
                          py_version='py3',
                          script_mode=True,
                          hyperparameters={'epochs': 3},
                          ## Everything after this line is optional and only applies to managed spot training
                          use_spot_instances=True,        # Use spot instances (cheaper)
                          max_run=3600,                   # Max training time in seconds
                          max_wait=7200,                  # Max training time + spot waiting time in seconds
                         ) # Spot settings save money; the only downside is that if max_run or max_wait is set too low, the job will be stopped early.


In [None]:
tf_estimator.fit({'training': training_input_path, 'validation': test_input_path})   
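
Two quick checks help when working with the spot settings: `max_wait` has to be at least as large as `max_run`, and the saving is the share of training time that was not billed. The sketch below is illustrative only. The helper names are made up, and the figures in the example are hypothetical rather than results from this lab.

```python
def check_spot_settings(max_run, max_wait):
    """Return True when the waiting budget covers the run budget."""
    return max_wait >= max_run


def spot_savings_percent(training_seconds, billable_seconds):
    """Percentage saved when only part of the training time is billed."""
    if training_seconds <= 0:
        raise ValueError("training_seconds must be positive")
    return 100.0 * (1.0 - billable_seconds / training_seconds)


print(check_spot_settings(max_run=3600, max_wait=7200))
# Hypothetical figures, not results from this lab.
print(round(spot_savings_percent(training_seconds=100, billable_seconds=30), 1))
```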

### Deploy the model

- Model deployment means to expose the model to real use.
- This means you can make inferences or predictions using your model.
- The model is deployed in an EC2 instance 
- Deployment is via Amazon SageMaker endpoints – an Amazon SageMaker endpoint is a fully managed service that allows you to make real-time inferences via a REST API. 
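
The REST exchange behind an endpoint can be sketched without calling AWS. This assumes the TensorFlow Serving row format, with an `instances` field in the request and a `predictions` field in the response. The helper names and the fake response are illustrative only.

```python
import json

import numpy as np


def build_request_body(images):
    """Serialize a batch of images into a TensorFlow Serving style JSON body."""
    return json.dumps({"instances": np.asarray(images).tolist()})


def parse_response_body(body):
    """Turn a JSON response with a 'predictions' field into a NumPy array."""
    return np.array(json.loads(body)["predictions"])


# Two blank 28x28x1 images as a stand-in batch.
batch = np.zeros((2, 28, 28, 1))
body = build_request_body(batch)

# Hypothetical response: one row of 10 class probabilities per image.
fake_response = json.dumps({"predictions": [[0.1] * 10, [0.1] * 10]})
print(parse_response_body(fake_response).shape)
```

In the notebook itself the SageMaker `Predictor` object handles this serialization, so these helpers are only meant to show what travels over the wire.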



In [None]:
import time

tf_endpoint_name = 'keras-tf-fmnist-'+time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime()) #give the endpoint a name.
#used the time and date from the time library

#deploy() Deploys the Model to an Endpoint and optionally return a Predictor.
tf_predictor = tf_estimator.deploy(initial_instance_count=1, # The initial number of instances to run in the Endpoint created from this Model.
                                   instance_type='ml.m4.xlarge', # The EC2 instance type to deploy this Model to.
                                   endpoint_name=tf_endpoint_name) # The name of the endpoint to create     

### Prediction exercise


In [None]:
%matplotlib inline
import random #random number generator for random sampling
import matplotlib.pyplot as plt #for plotting

#select 10 of the test samples (images) randomly
num_samples = 10
indices = random.sample(range(x_val.shape[0]), num_samples) #every test index is eligible
images = x_val[indices]/255
labels = y_val[indices]

for i in range(num_samples): #plot them with their labels 
    plt.subplot(1,num_samples,i+1)
    plt.imshow(images[i].reshape(28, 28), cmap='gray')
    plt.title(labels[i])
    plt.axis('off')
    
# Generate predictions for those random test images
# Call the predict() method of the Predictor object
# It returns inferences for the given input - in this case the images

prediction = tf_predictor.predict(images.reshape(num_samples, 28, 28, 1))['predictions']
prediction = np.array(prediction) #save the predictions as a np.array (softmax probabilities)
print(prediction)
predicted_labels = prediction.argmax(axis=1) #use argmax to turn the predictions into class labels
print('Predicted labels are: {}'.format(predicted_labels)) # print out the labels
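
The predicted labels are integers from 0 to 9. The sketch below maps them to the standard Fashion MNIST category names. The helper `to_class_names` is illustrative, and the example array is a made-up softmax output rather than real model output.

```python
import numpy as np

# Fashion MNIST label index -> clothing category.
CLASS_NAMES = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
]


def to_class_names(probabilities):
    """Pick the most probable class per row and return its name."""
    indices = np.asarray(probabilities).argmax(axis=1)
    return [CLASS_NAMES[i] for i in indices]


# Hypothetical softmax output for two images (not real model output).
example = np.zeros((2, 10))
example[0, 9] = 1.0  # all probability mass on class 9
example[1, 0] = 1.0  # all probability mass on class 0
print(to_class_names(example))
```

With the notebook's variables, `to_class_names(prediction)` would give the predicted names and `[CLASS_NAMES[i] for i in labels]` the true ones.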

### Delete the endpoint

Remember to delete the endpoint when you are not using it, to avoid unnecessary charges from AWS.

In [None]:
tf_predictor.delete_endpoint()
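
To double-check that nothing is left running, the account can be queried for remaining endpoints and active notebook instances. The following is a minimal sketch, assuming a client that behaves like `boto3.client("sagemaker")` with its `list_endpoints` and `list_notebook_instances` calls. The helper name is made up, and only the first page of results is read.

```python
def find_billable_resources(client):
    """List endpoints and notebook instances that are still running.

    `client` is expected to behave like boto3.client("sagemaker").
    Only the first page of each listing is inspected.
    """
    endpoints = [
        endpoint["EndpointName"]
        for endpoint in client.list_endpoints()["Endpoints"]
    ]
    notebooks = [
        notebook["NotebookInstanceName"]
        for notebook in client.list_notebook_instances()["NotebookInstances"]
        if notebook["NotebookInstanceStatus"] == "InService"
    ]
    return {"endpoints": endpoints, "notebooks": notebooks}


# Usage against a real account (requires AWS credentials):
# import boto3
# print(find_billable_resources(boto3.client("sagemaker")))
```

An empty result for both keys suggests the endpoint has been deleted and the notebook instance has been stopped.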