# Google Colab

The following lines of code will configure your Google Colab environment for this assignment. For those interested in running a local Jupyter server, please consider using one of the following options to ensure dependency compatibility:

1. Precompiled Docker images: see https://github.com/peterchang77/install (**recommended**)
2. Conda environment files: see https://github.com/peterchang77/dl_utils/tree/master/envs

### Enable GPU runtime

Use the following instructions to switch the default Colab instance into a GPU-enabled runtime:

```
Runtime > Change runtime type > Hardware accelerator > GPU
```

### Mount Google Drive

The Google Colab environment is transient and will reset after any prolonged break in activity. To retain important and/or large files between sessions, use the following lines of code to mount your personal Google Drive to this Colab instance:

In [1]:
try:
    # --- Mount gdrive to /content/drive/My Drive/
    from google.colab import drive
    drive.mount('/content/drive')
    
except ImportError:
    # --- Not running in Google Colab; skip mounting
    pass

Mounted at /content/drive


Throughout this assignment we will use the following global `MOUNT_ROOT` variable to reference a location to store long-term data. If you are using a local Jupyter server and/or wish to store your data elsewhere, please update this variable now.

In [0]:
# --- Set data directory
MOUNT_ROOT = '/content/drive/My Drive'
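
For a local Jupyter server, one option is to fall back to a local folder whenever the Colab mount point is missing. The following is a minimal sketch only; the helper name `pick_mount_root` and the fallback folder `~/data` are hypothetical and not part of the assignment code.

```python
import os

def pick_mount_root(colab_root='/content/drive/My Drive', local_root='~/data'):
    """Return colab_root if that folder exists, else an expanded local fallback.

    Hypothetical helper for illustration; both default paths are assumptions.
    """
    if os.path.isdir(colab_root):
        return colab_root
    return os.path.expanduser(local_root)

# --- Set data directory
MOUNT_ROOT = pick_mount_root()
print(MOUNT_ROOT)
```

On Colab with Google Drive mounted this resolves to the same path as above; elsewhere it resolves to the expanded fallback folder.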

### Select TensorFlow library version

This assignment will use the (new) TensorFlow 2.0 library. Use the following line of code to select this updated version:

In [None]:
# --- Select Tensorflow 2.0 (only in Google Colab)
%tensorflow_version 2.x





# Environment

### Jarvis library

In this notebook we will use Jarvis, a custom Python package to facilitate data science and deep learning for healthcare. Among other things, this library will be used for low-level data management, stratification and visualization of high-dimensional medical data.

In [4]:
# --- Install jarvis (only in Google Colab or local runtime)
%pip install jarvis-md



### Imports

Use the following lines to import any additional needed libraries:

In [0]:
import numpy as np, pandas as pd
from tensorflow import losses, optimizers
from tensorflow.keras import Input, Model, models, layers
from jarvis.train import datasets


# Data

As in the tutorial, data for this assignment will consist of the MNIST handwritten digit dataset. The following lines of code will:

1. Download the dataset (if not already present) 
2. Prepare the necessary Python generators to iterate through the dataset
3. Prepare the corresponding TensorFlow `Input(...)` objects for model definition

In [6]:
# --- Download dataset
datasets.download(name='mnist')

# --- Prepare generators and model inputs

gen_train, _, client = datasets.prepare(name='mnist')
inputs = client.get_inputs(Input)




**Note**: There is no need to change the above code for this assignment.
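
To make the idea of a training generator concrete, here is a toy stand-in written in plain Python. It is a sketch under assumptions: the `(xs, ys)` dictionary layout with keys `'dat'` and `'digit'` is inferred from how those keys are used later in this notebook, and the batch size, feature count and random values are made up. The real `gen_train` returned by `datasets.prepare(...)` may differ.

```python
import random

def toy_generator(batch_size=4, n_features=784, n_classes=10, seed=0):
    """Yield (xs, ys) batches forever, in the style of a Keras training generator.

    Hypothetical stand-in for illustration; values are random, not MNIST data.
    """
    rng = random.Random(seed)
    while True:
        # One row of features per example
        xs = {'dat': [[rng.random() for _ in range(n_features)]
                      for _ in range(batch_size)]}
        # One integer class label per example, stored as a length-1 row
        ys = {'digit': [[rng.randrange(n_classes)] for _ in range(batch_size)]}
        yield xs, ys

gen = toy_generator()
xs, ys = next(gen)
print(len(xs['dat']), len(xs['dat'][0]), ys['digit'])
```

Because the generator never terminates, the training call has to say how many batches make up one epoch, which is the role of `steps_per_epoch`.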

# Training

In this assignment we will train a multilayer perceptron, i.e. a simple neural network with at least one hidden layer. Be creative; feel free to try various permutations of:

* number(s) of hidden layer(s)
* size of hidden layer(s)
* learning rate
* training iterations 
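
When comparing architectures, it can help to know how many trainable parameters each choice implies. The sketch below counts weights and biases for a stack of fully connected layers. It assumes each image is flattened to 28 × 28 = 784 features, which may not match the actual input shape in this notebook; the helper name `count_dense_params` is hypothetical.

```python
def count_dense_params(n_inputs, hidden_sizes, n_outputs):
    """Count weights and biases in a stack of fully connected layers."""
    sizes = [n_inputs] + list(hidden_sizes) + [n_outputs]
    # Each layer has (inputs x outputs) weights plus one bias per output
    return sum((n_in + 1) * n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))

# Assumption: 784 input features (flattened 28 x 28 image), 10 output classes
for hidden in [(20,), (25, 20, 15), (60, 50, 40, 30, 20)]:
    print(hidden, count_dense_params(784, hidden, 10))
```

Under this assumption the first hidden layer dominates the count, so widening it is far more expensive than adding a narrow layer deeper in the stack.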

### Define the model

In [0]:
# --- Define model
h1 = layers.Dense(60, activation='relu')(inputs['dat'])
h2 = layers.Dense(50, activation='relu')(h1)
h3 = layers.Dense(40, activation='relu')(h2)
h4 = layers.Dense(30, activation='relu')(h3)
h5 = layers.Dense(20, activation='relu')(h4)
# 94%: 25,20,15
logits = {}
logits['digit'] = layers.Dense(10, name='digit')(h5)
model = Model(inputs=inputs, outputs=logits)




### Compile the model

In [0]:
# --- Define loss and optimizer
loss = losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = optimizers.Adam(learning_rate=2e-4)

# --- Compile model
model.compile(
    optimizer=optimizer,
    loss={'digit': loss},
    metrics={'digit': 'sparse_categorical_accuracy'})
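
The `from_logits=True` argument matters because the final `Dense` layer above has no softmax activation, so the loss applies the softmax itself. For a single example, the quantity being computed can be sketched in plain Python as below. This is an illustration of the formula, not TensorFlow's implementation, and the function name `sparse_ce_from_logits` is hypothetical.

```python
import math

def sparse_ce_from_logits(logits, label):
    """Cross-entropy for one example: -log(softmax(logits)[label])."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

# Toy logits for a 3-class problem (made-up values)
logits = [2.0, 0.5, -1.0]
print(sparse_ce_from_logits(logits, 0))  # small loss: class 0 has the largest logit
print(sparse_ce_from_logits(logits, 2))  # larger loss: class 2 has the smallest logit
```

The label is passed as an integer class index rather than a one-hot vector, which is what "sparse" refers to.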

### Train the model

In [9]:
model.fit(
    x=gen_train,
    steps_per_epoch=500,
    epochs=50)

Epoch 1/50
...
Epoch 50/50


<tensorflow.python.keras.callbacks.History at 0x7f19acc07da0>

# Evaluation

Based on the tutorial discussion, use the following cells to check your algorithm performance. Consider loading a saved model and running prediction using `model.predict(...)` on the training data. 

In [12]:
# --- Load rows 0-59999 of the dataset
arrs = client.get(row=np.arange(60000))

# --- Predict
scores = model.predict(arrs['xs']['dat'])

# --- Argmax
pred = np.argmax(scores['digit'], axis=1)

# --- Collect true and predicted labels in a DataFrame
df = pd.DataFrame(index=client.db.fnames.index)
df['true'] = arrs['ys']['digit'][:, 0]
df['pred'] = pred
df['corr'] = df['true'] == df['pred']

# --- Extract the kernel weights of model.layers[4] (not used below)
weights = model.layers[4].get_weights()[0]

# --- Print cumulative model performance
print(df['corr'].mean())


0.9946166666666667
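
A single overall accuracy can hide digits that the model handles poorly. One way to break the score down by true label is sketched below in plain Python. The labels are toy values for illustration only, not results from this model, and the helper name `per_class_accuracy` is hypothetical.

```python
from collections import defaultdict

def per_class_accuracy(true, pred):
    """Return {label: fraction of examples with that true label predicted correctly}."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p in zip(true, pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return {label: hits[label] / totals[label] for label in sorted(totals)}

# Toy labels for illustration only (not results from this model)
true = [0, 0, 1, 1, 1, 2]
pred = [0, 1, 1, 1, 0, 2]
print(per_class_accuracy(true, pred))
```

With the `df` DataFrame from the cell above, `df.groupby('true')['corr'].mean()` should give the same kind of breakdown.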


**Note**: this cell is used only to check for model performance. It will not be graded. Once you are satisfied with your model, proceed to submission of your assignment below.

In [0]:
# --- Serialize a model
import os
fname = '{}/models/linear/final2.hdf5'.format(MOUNT_ROOT)
os.makedirs(os.path.dirname(fname), exist_ok=True)
model.save(fname)