# Lab: Fun with Neural Nets

---

Below is a procedure for building a neural network to recognize handwritten digits.  The data is from [Kaggle](https://www.kaggle.com/c/digit-recognizer/data), and you will submit your results to Kaggle to test how well you did!

1. Load the training data (`train.csv`) from [Kaggle](https://www.kaggle.com/c/digit-recognizer/data).
2. Set up `X` and `y` (feature matrix and target vector).
3. Split `X` and `y` into train and test subsets.
4. Preprocess your data:

   - When dealing with image data, you need to normalize your `X` by dividing each value by the max value of a pixel (255).
   - Since this is a multiclass classification problem, keras needs `y` to be a one-hot encoded matrix.
   
5. Create your network:
   - Remember that for multi-class classification you need a softmax activation function on the output layer.
   - You may want to consider using regularization or dropout to improve performance.
   
6. Train your network.
7. If you are unhappy with your model's performance, try to improve it by adding hidden layers, adding hidden layer units, changing the activation functions on the hidden layers, etc.
8. Load in [Kaggle's](https://www.kaggle.com/c/digit-recognizer/data) `test.csv`.
9. Create your predictions (these should be numbers in the range 0-9).
10. Save your predictions and submit them to Kaggle.
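
The two preprocessing bullets in step 4 can be sketched with plain NumPy. This is only an illustration on a tiny made-up array (the names `X_raw` and `y_raw` are stand-ins, and real rows have 784 pixels rather than 4); the lab itself is free to use `keras` or `sklearn` helpers instead.

```python
import numpy as np

# Stand-in data: 3 "images" of 4 pixels each (real rows have 784 pixels).
X_raw = np.array([[0, 51, 102, 255],
                  [255, 0, 0, 204],
                  [153, 153, 0, 0]], dtype=np.float32)
y_raw = np.array([2, 0, 9])

# Step 4a: scale pixel intensities from [0, 255] down to [0, 1].
X_scaled = X_raw / 255.0

# Step 4b: one-hot encode the digit labels (10 classes, 0 through 9).
# Row i of the 10x10 identity matrix is the one-hot vector for class i.
y_onehot = np.eye(10, dtype=np.float32)[y_raw]

print(X_scaled.min(), X_scaled.max())
print(y_onehot.shape)
print(y_onehot[0])
```

Each row of `y_onehot` contains a single 1 in the column of its digit, which is the layout a softmax output layer is compared against.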

---

For this lab, you should complete the above sequence of steps for **_at least_** two of the four **"configurations"**:

1. Using a `tensorflow` network
2. Using a `keras` convolutional network
3. Using a `keras` network with regularization
4. Using a `tensorflow` convolutional network (we did _not_ cover this in class!)
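
For the two convolutional configurations, the flat 784-pixel rows would first need to be turned back into images, since 784 = 28 x 28 and convolutional layers typically expect input shaped `(batch, height, width, channels)`. A minimal NumPy sketch of that reshape, using a random stand-in array (`X_flat` is a made-up name, not the lab's data):

```python
import numpy as np

# Stand-in for the normalized feature matrix: 5 rows of 784 pixel values.
rng = np.random.default_rng(42)
X_flat = rng.random((5, 784), dtype=np.float32)

# Each row unrolls one 28x28 grayscale image row by row.
# Grayscale images have a single channel, hence the trailing 1.
X_img = X_flat.reshape(-1, 28, 28, 1)

print(X_img.shape)  # (5, 28, 28, 1)

# Flattening again recovers the original rows exactly.
assert np.array_equal(X_img.reshape(-1, 784), X_flat)
```

The same reshape would have to be applied to the Kaggle test features before calling `predict` on a convolutional model.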

In [58]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Input
from tensorflow.keras.utils import to_categorical

np.random.seed(42)        # Seed NumPy's global RNG (used by train_test_split); Keras weight initialization is seeded separately

In [59]:
# 1. Load the training data (train.csv) from Kaggle: https://www.kaggle.com/c/digit-recognizer/data
#    Expected shapes: train.csv is (42000, 785), test.csv is (28000, 784)
train_data = pd.read_csv('data/train.csv')

In [60]:
# 2. Setup X and y (feature matrix and target vector).
X = train_data.drop(columns = ['label'])
y = train_data['label']

# 4. Preprocess your data. Both transforms below are fixed (nothing is learned
#    from the data), so it is simpler to apply them once, before the split.
# Normalize X
X = X / 255.0

# One-hot encode y
ec = OneHotEncoder(sparse_output = False)
y = ec.fit_transform(y.values.reshape(-1, 1))

In [61]:
# 3. Split X and y into train and test subsets.

X_train, X_dev, y_train, y_dev = train_test_split(X, y, test_size = 0.2, stratify = y)
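
`stratify = y` asks the split to keep the class proportions roughly the same in both subsets. The idea can be illustrated with a hand-rolled NumPy version; this is a simplified stand-in and not scikit-learn's actual implementation, and `stratified_split` is a hypothetical helper name.

```python
import numpy as np

def stratified_split(labels, test_size=0.2, seed=42):
    """Return (train_idx, test_idx) with every class split at the same ratio."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)   # positions of this class
        rng.shuffle(idx)                      # randomize within the class
        n_test = int(round(len(idx) * test_size))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return np.array(train_idx), np.array(test_idx)

# Imbalanced stand-in labels: 80 zeros and 20 ones.
labels = np.array([0] * 80 + [1] * 20)
train_idx, test_idx = stratified_split(labels)

print(np.bincount(labels[train_idx]))
print(np.bincount(labels[test_idx]))
```

With these stand-in labels the train part keeps 64 zeros and 16 ones and the test part keeps 16 zeros and 4 ones, so both preserve the original 4:1 ratio.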

In [62]:
# 5. Create your network:

model = Sequential([Input(shape = (784,))
                    , Dense(128, activation = 'relu', kernel_regularizer = 'l2')     # Hidden layer 1, with L2 weight penalty
                    , Dropout(0.3)                                                   # Dropout to reduce overfitting
                    , Dense(64, activation = 'relu')                                 # Hidden layer 2
                    , Dense(10, activation = 'softmax')                              # Output layer: softmax for multiclass
])

# Compile the model
model.compile(optimizer = 'adam'
              , loss='categorical_crossentropy'                                      # For multiclass
              , metrics = ['accuracy']
             )
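
To make the output layer and the loss less of a black box, here is a rough NumPy sketch of what softmax and categorical cross-entropy compute. It uses three classes instead of ten, the function names are my own, and Keras' internal implementation may differ in details such as clipping.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax; subtracting the row max keeps exp() from overflowing."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean over samples of -sum(y_true * log(y_prob))."""
    y_prob = np.clip(y_prob, eps, 1.0)        # avoid log(0)
    return float(-(y_true * np.log(y_prob)).sum(axis=1).mean())

# Two samples, three classes (the lab's network has ten).
logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])
y_true = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])

probs = softmax(logits)
print(probs.sum(axis=1))                       # each row sums to 1
print(categorical_crossentropy(y_true, probs))
```

Because `y_true` is one-hot, only the probability assigned to the correct class contributes to the loss, which is why the loss shrinks as that probability approaches 1.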

In [72]:
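# 6. Train your network.
# Note: this fits on the full X and y, which include the rows held out in
# X_dev / y_dev, so the val_* metrics below are optimistic. Pass X_train and
# y_train instead for an honest validation estimate.
# (42000 rows / batch size 32 gives the 1313 steps per epoch shown in the log.)
hist = model.fit(
    X                                   # Input data (features)
    , y                                 # One-hot encoded class labels (digits 0-9)
    , batch_size = 32                   # Number of samples per gradient update
    , epochs = 10                       # Number of complete passes through the dataset
    , validation_data = (X_dev, y_dev)  # Evaluated at the end of each epoch
)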

Epoch 1/10
1313/1313 ━━━━━━━━━━━━━━━━━━━━ 9s 4ms/step - accuracy: 0.8081 - loss: 1.1834 - val_accuracy: 0.9267 - val_loss: 0.4024
Epoch 2/10
1313/1313 ━━━━━━━━━━━━━━━━━━━━ 5s 4ms/step - accuracy: 0.9143 - loss: 0.4476 - val_accuracy: 0.9371 - val_loss: 0.3606
Epoch 3/10
1313/1313 ━━━━━━━━━━━━━━━━━━━━ 5s 4ms/step - accuracy: 0.9224 - loss: 0.4184 - val_accuracy: 0.9551 - val_loss: 0.3115
Epoch 4/10
1313/1313 ━━━━━━━━━━━━━━━━━━━━ 6s 4ms/step - accuracy: 0.9240 - loss: 0.4020 - val_accuracy: 0.9554 - val_loss: 0.3003
Epoch 5/10
1313/1313 ━━━━━━━━━━━━━━━━━━━━ 5s 4ms/step - accuracy: 0.9273 - loss: 0.3955 - val_accuracy: 0.9588 - val_loss: 0.2952
Epoch 6/10
1313/1313 ━━━━━━━━━━━━━━━━━━━━ 5s 4ms/step - accuracy: 0.9292 - loss: 0.3894 - val_accuracy: 0.9604 - val_loss: 0.2969
Epoch 7/10
...

In [78]:
# 8. Load Kaggle's test.csv: https://www.kaggle.com/c/digit-recognizer/data
# test.csv contains only the pixel features, so normalization can be applied directly.
test_data = pd.read_csv('data/test.csv').values / 255.0

In [84]:
predictions = model.predict(test_data)

875/875 ━━━━━━━━━━━━━━━━━━━━ 2s 2ms/step


In [86]:
predictions

array([[1.1065687e-06, 3.1684788e-08, 9.9978656e-01, ..., 4.9466449e-05,
        3.4187653e-06, 1.6961869e-07],
       [9.9988997e-01, 7.8008560e-09, 5.0151622e-05, ..., 7.4794144e-07,
        1.4477371e-07, 1.1240304e-06],
       [9.4017538e-05, 3.3323202e-04, 4.2328381e-04, ..., 9.3259672e-03,
        6.0876675e-02, 8.9839578e-01],
       ...,
       [8.7290974e-09, 7.8401001e-07, 2.1090504e-05, ..., 1.6935579e-07,
        1.1264300e-04, 1.0540335e-04],
       [4.8519458e-05, 9.3332233e-07, 3.2907927e-05, ..., 2.9163393e-03,
        3.0330857e-04, 9.7573030e-01],
       [6.0195080e-06, 2.2286278e-07, 9.9854183e-01, ..., 4.4653140e-04,
        6.6090215e-05, 1.9844394e-05]], dtype=float32)

In [88]:
predicted_labels = np.argmax(predictions, axis = 1)
predicted_labels

array([2, 0, 9, ..., 3, 9, 2], dtype=int64)
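
The `np.argmax(..., axis = 1)` step above picks, for each image, the column holding the largest probability, and that column index is the predicted digit. A small sketch with made-up probabilities (not the model's real output):

```python
import numpy as np

# Stand-in softmax output for 3 images: one column per digit, rows sum to 1.
probs = np.array([
    [0.01, 0.01, 0.91, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01],
    [0.82, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02],
    [0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.55],
])

labels = np.argmax(probs, axis=1)   # index of the largest probability per row
confidence = probs.max(axis=1)      # probability mass given to the winner

print(labels)      # [2 0 9]
print(confidence)
```

The `confidence` values are a quick way to spot uncertain predictions, such as the third row here, where the winning class received only 0.55.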

In [90]:
submission = pd.DataFrame({'ImageId': range(1, len(predicted_labels) + 1), 'Label': predicted_labels})
submission.to_csv('data/submission.csv', index = False)
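
Before uploading, it can help to sanity-check the file layout implied by the cell above: an `ImageId,Label` header, ids counting up from 1, and labels in the range 0-9. The checker below is a hypothetical helper of my own, not something Kaggle provides, and it only covers those three properties.

```python
import csv
import io

def check_submission(csv_text, expected_rows):
    """Return a list of problems found in a submission CSV (empty if none)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows or rows[0] != ["ImageId", "Label"]:
        return ["header should be ImageId,Label"]
    problems = []
    body = rows[1:]
    if len(body) != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {len(body)}")
    for i, row in enumerate(body, start=1):
        if len(row) != 2 or not row[0].isdigit() or not row[1].isdigit():
            problems.append(f"row {i} is malformed")
            continue
        if int(row[0]) != i:
            problems.append(f"row {i} has ImageId {row[0]}")
        if not 0 <= int(row[1]) <= 9:
            problems.append(f"row {i} has Label {row[1]} outside 0-9")
    return problems

good = "ImageId,Label\n1,2\n2,0\n3,9\n"
bad = "ImageId,Label\n1,2\n3,12\n"

print(check_submission(good, expected_rows=3))  # []
print(check_submission(bad, expected_rows=3))
```

For the real file, the text would come from reading `data/submission.csv`, and `expected_rows` would be 28000 to match the test set.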

In [None]:
# Kaggle score for this submission: 0.95300

In [117]:
model2 = Sequential([Input(shape = (784,))
                    , Dense(256, activation = 'relu', kernel_regularizer = 'l2')     # Hidden layer 1, with L2 weight penalty
                    , Dropout(0.3)                                                   # Dropout to reduce overfitting
                    , Dense(256, activation = 'relu')                                # Hidden layer 2
                    # , Dropout(0.3)                                                 # Optional second dropout (disabled)
                    , Dense(256, activation = 'relu')                                # Hidden layer 3
                    , Dense(10, activation = 'softmax')                              # Output layer: softmax for multiclass
])

# Compile the model
model2.compile(optimizer = 'adam'
              , loss='categorical_crossentropy'                                      # For multiclass
              , metrics = ['accuracy']
             )
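
The second network is both wider and deeper than the first. One rough way to compare their sizes is to count trainable parameters by hand: each `Dense` layer has one weight per input-output pair plus one bias per unit. `dense_param_count` is a hypothetical helper written for this comparison.

```python
def dense_param_count(layer_sizes):
    """Count weights and biases for a stack of fully connected layers.

    layer_sizes lists the input width followed by each Dense layer's units.
    Dropout layers are left out because they have no trainable parameters.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out   # weight matrix plus one bias per unit
    return total

first = dense_param_count([784, 128, 64, 10])
second = dense_param_count([784, 256, 256, 256, 10])

print(first)   # 109386
print(second)  # 335114
```

So the second model has roughly three times as many parameters to fit on the same 42,000 images.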

In [119]:
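# Note: as with the first model, this fits on the full X and y, which include
# the rows in X_dev / y_dev, so the val_* metrics below are optimistic.
# Pass X_train and y_train instead for an honest validation estimate.
hist2 = model2.fit(
    X                                   # Input data (features)
    , y                                 # One-hot encoded class labels (digits 0-9)
    , batch_size = 32                   # Number of samples per gradient update
    , epochs = 10                       # Number of complete passes through the dataset
    , validation_data = (X_dev, y_dev)  # Evaluated at the end of each epoch
)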

Epoch 1/10
1313/1313 ━━━━━━━━━━━━━━━━━━━━ 11s 6ms/step - accuracy: 0.8315 - loss: 1.2816 - val_accuracy: 0.9313 - val_loss: 0.4295
Epoch 2/10
1313/1313 ━━━━━━━━━━━━━━━━━━━━ 8s 6ms/step - accuracy: 0.9181 - loss: 0.4802 - val_accuracy: 0.9504 - val_loss: 0.3591
Epoch 3/10
1313/1313 ━━━━━━━━━━━━━━━━━━━━ 8s 6ms/step - accuracy: 0.9263 - loss: 0.4351 - val_accuracy: 0.9463 - val_loss: 0.3638
Epoch 4/10
1313/1313 ━━━━━━━━━━━━━━━━━━━━ 8s 6ms/step - accuracy: 0.9283 - loss: 0.4227 - val_accuracy: 0.9589 - val_loss: 0.3266
Epoch 5/10
1313/1313 ━━━━━━━━━━━━━━━━━━━━ 8s 6ms/step - accuracy: 0.9289 - loss: 0.4162 - val_accuracy: 0.9621 - val_loss: 0.3112
Epoch 6/10
1313/1313 ━━━━━━━━━━━━━━━━━━━━ 8s 6ms/step - accuracy: 0.9325 - loss: 0.3992 - val_accuracy: 0.9555 - val_loss: 0.3291
Epoch 7/10
...

In [121]:
predictions2 = model2.predict(test_data)
predicted_labels2 = np.argmax(predictions2, axis=1)
predicted_labels2

875/875 ━━━━━━━━━━━━━━━━━━━━ 2s 2ms/step


array([2, 0, 9, ..., 3, 4, 2], dtype=int64)

In [123]:
submission2 = pd.DataFrame({'ImageId': range(1, len(predicted_labels2) + 1), 'Label': predicted_labels2})
submission2.to_csv('data/submission2.csv', index = False)

In [None]:
# Kaggle score for this submission: 0.94932

![image](images/kaggle.png)