# PostScriptML: Modeling Notebook
### by Dolci Key 

### Import Libraries

In [1]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
%matplotlib inline 

import keras
import tensorflow as tf
from tensorflow import keras
from keras import layers
from keras.preprocessing import image
from tensorflow.keras.preprocessing import image
from tensorflow.keras import models, layers, optimizers
from tensorflow.keras.layers import BatchNormalization
from PIL import Image

import os
import gc

import pickle
from timeit import default_timer as timer

Using TensorFlow backend.





### Starting an AWS SageMaker Instance
The libraries below need to be imported inside an AWS SageMaker Notebook Instance; they will not work in a regular Jupyter Notebook environment. Once they are imported, start a session and connect your data.

In [2]:
# Requires an AWS SageMaker Notebook Instance

import sagemaker
from sagemaker.tensorflow import TensorFlow
from tensorflow.python.keras.preprocessing.image import load_img

No handlers could be found for logger "sagemaker"


In [3]:
sagemaker_session = sagemaker.Session()

In [4]:
role = sagemaker.get_execution_role()

### S3 Bucket Connection
Here I connect my S3 bucket. The bucket name MUST start with the prefix 'sagemaker-' for this to work. Note that a bucket cannot be renamed once it has been created; if you make this mistake, you can create a correctly named bucket and move the data into it.
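Because a bucket cannot be renamed later, it can help to check the name before creating it. The sketch below is one way to do that: it tests for the 'sagemaker-' prefix this notebook relies on, plus the general S3 naming rules (3 to 63 characters; lowercase letters, digits, dots and hyphens; starting and ending with a letter or digit). The helper name `check_bucket_name` is hypothetical and not part of any AWS library.

```python
import re

REQUIRED_PREFIX = "sagemaker-"  # prefix this notebook says the bucket name needs


def check_bucket_name(name):
    """Return a list of problems found with a bucket name (empty if none)."""
    problems = []
    if not name.startswith(REQUIRED_PREFIX):
        problems.append("missing '{}' prefix".format(REQUIRED_PREFIX))
    if not 3 <= len(name) <= 63:
        problems.append("length must be between 3 and 63 characters")
    # Lowercase letters, digits, dots and hyphens; letter or digit at both ends.
    if not re.fullmatch(r"[a-z0-9][a-z0-9.-]*[a-z0-9]", name):
        problems.append("invalid characters for an S3 bucket name")
    return problems


print(check_bucket_name("sagemaker-postscriptml"))  # bucket used in this notebook
print(check_bucket_name("postscriptml"))            # same name without the prefix
```

An empty list means the name passed every check in this sketch; it does not confirm that the name is still available in S3.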


In [5]:
bucket = 'sagemaker-postscriptml' # Name of the AWS S3 bucket that holds the dataset
train_instance_type = 'ml.m4.xlarge' # AWS EC2 Instance used for training
deploy_instance_type = 'ml.m4.xlarge' # AWS EC2 Instance used for deployment
hyperparameters = {'learning_rate': 0.001, 'decay': 0.0001}


In [6]:
train_input_path = 's3://{}/TRAIN'.format(bucket) # Path to training data 
test_input_path = 's3://{}/TEST'.format(bucket) # Path to test data
validation_input_path = 's3://{}/VALIDATION'.format(bucket) # Path to validation data 
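The three paths above follow one pattern, so they could also be built in a single step and keyed by the channel name that is later passed to `fit`. This is only a sketch: `channel_paths` is a hypothetical helper, the `'training'` and `'eval'` channel names are taken from the `fit` call further down, and `'test'` is an assumed name for the test channel.

```python
def channel_paths(bucket, folders):
    """Map each channel name to the S3 URI of its folder in the bucket."""
    return {
        channel: "s3://{}/{}".format(bucket, folder)
        for channel, folder in folders.items()
    }


paths = channel_paths(
    "sagemaker-postscriptml",
    {"training": "TRAIN", "eval": "VALIDATION", "test": "TEST"},  # 'test' is assumed
)
print(paths["training"])
```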


### Creating a Model 
Once I had my data set up, I created the model using TensorFlow. The setup differs between Python 2.7 and Python 3. With Python 2.7, I could pass the training and evaluation steps directly to the estimator. With Python 3, the Python version has to be specified and the training and evaluation steps have to be moved into my hyperparameter dictionary.
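As a rough illustration of the Python 3 variant, the step counts can be merged into a copy of the hyperparameter dictionary before it is handed to the estimator. The key names `training_steps` and `evaluation_steps` are assumptions here; the entry script would need to read whichever names are actually used. The helper `python3_hyperparameters` is hypothetical.

```python
def python3_hyperparameters(base, training_steps, evaluation_steps):
    """Return a copy of `base` with the step counts added as hyperparameters."""
    merged = dict(base)  # copy, so the original dictionary is left unchanged
    merged["training_steps"] = training_steps      # assumed key name
    merged["evaluation_steps"] = evaluation_steps  # assumed key name
    return merged


base_hyperparameters = {"learning_rate": 0.001, "decay": 0.0001}
py3_hyperparameters = python3_hyperparameters(
    base_hyperparameters, training_steps=100, evaluation_steps=30
)
print(py3_hyperparameters)
```

The resulting dictionary would then be passed as `hyperparameters=` in place of the separate step arguments.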

I read in the entry script from the SCRIPTS folder in my repo, where you can find every script that I tested. For each script, I logged the final accuracy and the highest step accuracy from the evaluation to keep track of my modeling runs.
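One lightweight way to keep such a log is a list of records, one per script. The sketch below is an assumption about how this could be organized, not the tracking method actually used; `log_run` and `best_run` are hypothetical helpers, and the highest step accuracy is left as `None` because it is not stated in this notebook.

```python
def log_run(log, script, accuracy, best_step_accuracy=None):
    """Append one record per tested script and return the log."""
    log.append({
        "script": script,
        "accuracy": accuracy,
        "best_step_accuracy": best_step_accuracy,
    })
    return log


def best_run(log):
    """Return the record with the highest accuracy, or None for an empty log."""
    return max(log, key=lambda run: run["accuracy"], default=None)


run_log = []
log_run(run_log, "start_script_i.py", 0.90625)  # accuracy reported in this notebook
print(best_run(run_log)["script"])
```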

My baseline model
My first run-through model reached an accuracy of 0.90625 (start_script_i.py).

Accuracy is a decent metric for my model. However, I plan to use recall and F1 after the MVP, since AWS requires specific functions to be set up before these metrics can be computed for the model.
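For reference, recall and F1 can be derived from prediction counts for one class treated as the positive class. The sketch below is plain Python and independent of SageMaker; the counts in the usage lines are made up for illustration and are not results from this model.

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Compute precision, recall and F1 from counts; undefined ratios become 0.0."""
    predicted_positive = true_positives + false_positives
    actual_positive = true_positives + false_negatives
    # float() keeps the division exact on both Python 2.7 and Python 3.
    precision = float(true_positives) / predicted_positive if predicted_positive else 0.0
    recall = float(true_positives) / actual_positive if actual_positive else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up counts, for illustration only.
precision, recall, f1 = precision_recall_f1(
    true_positives=8, false_positives=2, false_negatives=2
)
print(precision, recall, f1)
```

Recall measures how many of the actual positives were found, and F1 is the harmonic mean of precision and recall, so both are more informative than accuracy when the classes are imbalanced.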

In [15]:
estimator = TensorFlow(
  entry_point="SCRIPTS/baseline_model.py", # Your entry script, relative to the notebook's working directory
  role=role,
  framework_version='1.12.0', # TensorFlow version
  training_steps=100,
  evaluation_steps=30,
  hyperparameters=hyperparameters, # For Python 3, training and evaluation steps go in this dictionary instead
  train_instance_count=1, # Number of training instances to launch
  train_instance_type=train_instance_type,
)

In [None]:
print("Training ...")
estimator.fit({'training': train_input_path, 'eval': validation_input_path})

Training ...
2020-09-22 22:43:27 Starting - Starting the training job...
2020-09-22 22:43:28 Starting - Launching requested ML instances......
2020-09-22 22:44:31 Starting - Preparing the instances for training...
2020-09-22 22:45:20 Downloading - Downloading input data...
2020-09-22 22:45:47 Training - Downloading the training image..
2020-09-22 22:46:06,415 INFO - root - running container entrypoint
2020-09-22 22:46:06,415 INFO - root - starting train task
2020-09-22 22:46:06,431 INFO - container_support.training - Training starting
Downloading s3://sagemaker-us-east-2-997425579135/sagemaker-tensorflow-2020-09-22-22-43-26-532/source/sourcedir.tar.gz to /tmp/script.tar.gz
2020-09-22 22:46:08,945 INFO - tf_container - ----------------------TF_CONFIG--------------------------
2020-09-22 22:46:08,945 INFO - tf_container - {"environment": "cloud", "cluster": {"master": ["algo-1:2222"]}, "task": {"index": 0, "type": "master"}}
[34m2020-