In [None]:
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import tensorflow as tf
import tensorflow.keras as keras
%matplotlib inline
tf.__version__

# Image Classification with CIFAR-10 Dataset
[CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html) is a widely used benchmark dataset for image classifiers. The dataset consists of 10 classes of color images of size $32\times 32$. Let's build a neural network with **convolutional layers** to classify the images.

### Download the dataset
- Use `requests` to download the tar file from [https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz](https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz)
- Use `tarfile` to extract files
- Use `pickle` to load the data

In [None]:
import os
import requests

filename = "cifar-10-python.tar.gz"
if not os.path.isfile(filename):
    print("Downloading CIFAR10 dataset...")
    url = "https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz"
    response = requests.get(url)
    response.raise_for_status()  # stop early if the download failed

    print("Writing to file", filename, "...")
    with open(filename, "wb") as f:
        f.write(response.content)

In [None]:
import tarfile
datapath = "cifar-10-batches-py/" 
if not os.path.isdir(datapath):
    print("Extracting files...")
    # the context manager closes the archive even if extraction fails
    with tarfile.open(filename) as tar:
        tar.extractall()

The dataset is broken into batches to **prevent** your machine from running **out of memory**. The CIFAR-10 training set consists of 5 batches of 10,000 images each, named `data_batch_1` through `data_batch_5`, plus a separate `test_batch`.

In [None]:
# load one batch
import pickle
with open(datapath + "data_batch_1", "rb") as f:
    batch = pickle.load(f, encoding="latin1")
    features = batch['data'].reshape([len(batch['data']), 3, 32, 32]).transpose(0, 2, 3, 1)
    labels = batch['labels']
print("feature size:", features.shape)
print("label size:", len(labels))

The label data is a list of 10000 integers in the range 0-9, each corresponding to one of the 10 classes in CIFAR-10:

* **airplane**
* **automobile**
* **bird**
* **cat**
* **deer**
* **dog**
* **frog**
* **horse**
* **ship**
* **truck**

In [None]:
# Show a sample image
sample_id = 243
plt.imshow(features[sample_id])
label_names = ['airplane', 'automobile', 'bird',
            'cat', 'deer', 'dog', 'frog',
            'horse', 'ship', 'truck']
plt.xlabel(label_names[labels[sample_id]])

### How to reshape into such a form?

Each image is stored as a row vector of 3072 values, which is exactly $32 \times 32 \times 3$ elements. Converting that row vector into an image takes two steps: the **first** uses the **reshape** function in numpy, and the **second** uses the **transpose** function in numpy.

According to the numpy documentation, **reshape** gives a new shape to an array without changing its data. The phrase **without changing its data** is the important part: the values keep their order and are only regrouped. Conceptually, the regrouping happens in two stages:

1. Divide the row vector (3072) into 3 pieces. Each piece corresponds to one color channel (red, green, blue).
  - this results in a (3 x 1024) tensor
2. Divide each 1024-element piece into 32 rows of 32 values. Each row is one row of pixels in the image.
  - this results in a (3 x 32 x 32) tensor

In numpy, both stages are done with a single **reshape** call with the target shape (10000, 3, 32, 32). Note that reshape does not split an axis any further by itself: every dimension, including the last 32, has to be given explicitly (numpy allows at most one dimension to be written as -1 and inferred).

This is not the end of the story. The image data is now in (num_channel, height, width) form. However, **this is not the shape tensorflow and matplotlib are expecting**. They expect (height, width, num_channel) instead. We need to reorder the axes, and that is where the **transpose** function comes in.

The **transpose** function takes a list of axes that gives the new order of the existing axes. For example, calling transpose with argument (1, 2, 0) on a numpy array of shape (num_channel, height, width) returns a new numpy array of shape (height, width, num_channel). The loading code above works on a whole batch, so the array has an extra leading image axis and the call becomes transpose(0, 2, 3, 1).

<img src="./reshape-transpose.png" alt="Drawing" style="width: 800px;"/>
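
The two steps can be tried on a small synthetic array before touching the real batch. The sketch below is only illustrative: `fake_batch` is a made-up stand-in for `batch['data']` with two rows of 3072 values, and the variable names are not part of the notebook.

```python
import numpy as np

# Hypothetical stand-in for batch['data']: 2 images, 3072 values each.
fake_batch = np.arange(2 * 3072).reshape(2, 3072)

# Step 1: reshape splits each row into 3 channels of 32 rows x 32 columns.
channels_first = fake_batch.reshape(len(fake_batch), 3, 32, 32)

# Step 2: transpose moves the channel axis to the end.
images = channels_first.transpose(0, 2, 3, 1)

print(channels_first.shape)  # (2, 3, 32, 32)
print(images.shape)          # (2, 32, 32, 3)

# The top-left pixel of image 0 gathers the first value of each
# 1024-element channel block of the original row.
print(images[0, 0, 0].tolist())  # [0, 1024, 2048]
```

Because the synthetic values are just their own positions in the row, the last print shows directly which elements of the row vector end up in one pixel.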

In [None]:
# Load all images from batch 1-5
train_features = np.empty([0, 32, 32, 3], dtype=np.uint8)
train_labels = np.empty([0])
for k in range(1, 6):
    with open(datapath + "data_batch_" + str(k), "rb") as f:
        batch = pickle.load(f, encoding="latin1")
        features = batch["data"].reshape([len(batch['data']), 3, 32, 32]).transpose(0, 2, 3, 1)
        labels=batch['labels']
        print("features shape:", features.shape)
        print("labels shape:", len(labels))
        train_features = np.append(train_features, features, axis=0)
        train_labels = np.append(train_labels, labels, axis=0)
print("train_features shape:", train_features.shape)
print("train_labels shape:", train_labels.shape)
        

## Explore the data
- Show image statistics
- Show sample images

In [None]:
# Display stats
print('# of Samples: {}\n'.format(len(train_features)))

df = pd.DataFrame(train_labels, columns=['label'])
df['class'] = df['label'].apply(lambda x: label_names[int(x)])
df['class'].value_counts()

In [None]:
# Show more images
plt.figure(figsize=(10, 10))

for i in range(25):
    plt.subplot(5, 5, i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(train_features[i])
    # train_labels holds class indices (stored as floats), not one-hot vectors
    plt.xlabel(label_names[int(train_labels[i])])

## Preprocess the data

#### Normalization
- this rescales all x values to the range between 0 and 1.
- $y = (x - x_{min}) / (x_{max} - x_{min})$
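
The formula above can be checked on a handful of values. This is a minimal sketch with made-up pixel values (`pixels` is hypothetical); the values are converted to floats first so the arithmetic does not depend on integer types.

```python
import numpy as np

# Hypothetical pixel values in the 0-255 range used by CIFAR-10 images.
pixels = np.array([0, 51, 102, 255], dtype=np.uint8)

# Work in floating point, then apply y = (x - min) / (max - min).
x = pixels.astype(np.float64)
scaled = (x - x.min()) / (x.max() - x.min())

print(scaled.tolist())  # [0.0, 0.2, 0.4, 1.0]
```

When an array contains both 0 and 255, as a large set of images normally does, this is the same as dividing by 255.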

#### One-hot encoding
- the model is designed to show the probabilities of an image for each class.
- the ground truth label should be transformed to a set of probabilities to match with the model output
- for label $k$, set the $k$-th probability to 1, and set all other probabilities to 0.
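
The rule in the last bullet can also be written without an explicit loop. The sketch below is one possible vectorized form, assuming the labels are integer class indices; `labels` is a made-up example.

```python
import numpy as np

num_classes = 10
labels = np.array([3, 0, 9])  # hypothetical class indices

# Row k of the identity matrix is the one-hot vector for class k,
# so indexing the identity matrix with the labels encodes them all at once.
one_hot = np.eye(num_classes)[labels]

print(one_hot.shape)                    # (3, 10)
print(one_hot[0].tolist())              # [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(one_hot.argmax(axis=1).tolist())  # [3, 0, 9]
```

Labels stored as floats would need `.astype(int)` before being used as an index. Keras also provides `keras.utils.to_categorical` for the same purpose.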

In [None]:
# normalize data
def normalize(x):
    """
        argument
            - x: input image data in numpy array, e.g. [N, 32, 32, 3]
        return
            - normalized x 
    """
    min_val = np.min(x)
    max_val = np.max(x)
    x = (x-min_val) / (max_val-min_val)
    return x

In [None]:
# one-hot encode data
def one_hot_encode(x):
    """
        argument
            - x: a list of labels
        return
            - one hot encoding matrix (number of labels, number of class)
    """
    encoded = np.zeros((len(x), 10))
    
    for idx, val in enumerate(x):
        encoded[idx][int(val)] = 1
    
    return encoded

In [None]:
train_features_scaled = normalize(train_features)
train_labels_encoded = one_hot_encode(train_labels)

## Save the preprocessed data with Pickle

**Note: test_batch should also be processed in the same manner.**

In [None]:
# preprocess data and save
with open("CIFAR10_preprocessed.p", "wb") as f:
    pickle.dump((train_features_scaled, train_labels_encoded), f)

In [None]:
# to retrieve the data
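# the file must be opened in binary mode ("rb") before it is passed to pickle.load
with open("CIFAR10_preprocessed.p", "rb") as f:
    train_features_scaled, train_labels_encoded = pickle.load(f)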

## Build CNN model
### Create Convolutional Model

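The entire model consists of 14 layers in total. The list below shows the layers and the techniques applied to each of them. Note that the code cell further down builds a smaller network than the one listed here: four convolutional layers followed by two fully connected layers.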

1. Convolution with 64 different filters in size of (3x3)
2. Max Pooling by 2
  - ReLU activation function 
  - Batch Normalization
3. Convolution with 128 different filters in size of (3x3)
4. Max Pooling by 2
  - ReLU activation function 
  - Batch Normalization
5. Convolution with 256 different filters in size of (3x3)
6. Max Pooling by 2
  - ReLU activation function 
  - Batch Normalization
7. Convolution with 512 different filters in size of (3x3)
8. Max Pooling by 2
  - ReLU activation function 
  - Batch Normalization
9. Flattening the 3-D output of the last convolutional operations.
10. Fully Connected Layer with 128 units
  - Dropout 
  - Batch Normalization
11. Fully Connected Layer with 256 units
  - Dropout 
  - Batch Normalization
12. Fully Connected Layer with 512 units
  - Dropout 
  - Batch Normalization
13. Fully Connected Layer with 1024 units
  - Dropout 
  - Batch Normalization
14. Fully Connected Layer with 10 units (number of image classes)

The image below describes how the conceptual convolution operation differs from the tensorflow implementation when you use the [Channel x Width x Height] tensor format.

<img src="https://adeshpande3.github.io/assets/Cover.png" alt="Drawing" style="width: 1000px;"/>
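
To see why four pooling stages fit a $32\times 32$ input, the spatial size can be traced by hand. The sketch below assumes "same" padding with stride 1 for every convolution and a 2x2 pooling window with stride 2. The list above does not state either, so treat both as assumptions.

```python
# Trace the feature-map shape through the four conv + pool stages listed above.
# Assumptions (not stated in the list): "same" padding with stride 1, so a
# convolution keeps height and width, and 2x2 max pooling halves them.
size = 32                                  # CIFAR-10 images are 32 x 32
filters_per_stage = [64, 128, 256, 512]

shapes = []
for filters in filters_per_stage:
    size = size // 2                       # conv keeps the size, pooling halves it
    shapes.append((size, size, filters))

# Number of values handed to the first fully connected layer after flattening.
flattened = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]

print(shapes)     # [(16, 16, 64), (8, 8, 128), (4, 4, 256), (2, 2, 512)]
print(flattened)  # 2048
```

Under these assumptions a fifth pooling stage would reduce the feature map to a single pixel, which is one reason to stop at four.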

In [None]:
batch_size = 32
num_classes = 10
epochs = 100
num_predictions = 20

import os
save_dir = os.path.join(os.getcwd(), "saved_models")
model_name = "Keras_CIFAR10.h5"

In [None]:
# Build CNN model
model = keras.Sequential()
model.add(keras.layers.Conv2D(32, (3, 3), padding="same",
                              input_shape=features[0].shape,
                              activation=tf.nn.relu))
model.add(keras.layers.Conv2D(32, (3, 3),
                              activation=tf.nn.relu))
model.add(keras.layers.MaxPool2D(pool_size=(2, 2)))
model.add(keras.layers.Dropout(0.25))

# input_shape is only needed on the first layer of the model
model.add(keras.layers.Conv2D(64, (3, 3), padding="same",
                              activation=tf.nn.relu))
model.add(keras.layers.Conv2D(64, (3, 3),
                              activation=tf.nn.relu))
model.add(keras.layers.MaxPool2D(pool_size=(2, 2)))
model.add(keras.layers.Dropout(0.25))

model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(512, activation=tf.nn.relu))
model.add(keras.layers.Dropout(0.5))
model.add(keras.layers.Dense(num_classes, activation=tf.nn.softmax))


In [None]:
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])

In [None]:
model.fit(train_features_scaled, train_labels_encoded, epochs=10)

## Dropout and Model Regularization
For a complicated model such as a deep neural network, a major performance concern is overfitting:
![underfitting and overfitting](https://cdn-images-1.medium.com/max/1200/1*cdvfzvpkJkUudDEryFtCnA.png)
In plain words, overfitting happens when the model is **memorizing** the training data and becomes poor at **generalizing** what it has learned to unseen data. Think about a student who memorized the entire machine learning textbook. He may appear quite knowledgeable in machine learning when asked things directly from the book, but there is no way he can complete a machine learning project on a dataset not mentioned in the book.

### How to identify model overfitting?
- Visualize the model (decision boundary, regression curves, etc.)
- Observe the trends in training loss and the testing loss
![](https://cdn-images-1.medium.com/max/1600/1*vuZxFMi5fODz2OEcpG-S1g.png)
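
The second check can be automated once per-epoch losses are available. In Keras, `model.fit` returns a `History` object whose `history` dict holds these values, including `val_loss` when validation data is supplied. The sketch below works on plain lists. The helper names and the loss values are made up for illustration, and the rule it applies is only a rough heuristic.

```python
def best_epoch(val_losses):
    """Return the 0-based epoch with the lowest validation loss."""
    return min(range(len(val_losses)), key=lambda i: val_losses[i])


def shows_overfitting(train_losses, val_losses):
    """Heuristic: after the best validation epoch, the training loss kept
    falling while the validation loss ended up higher."""
    best = best_epoch(val_losses)
    if best == len(val_losses) - 1:
        return False  # validation loss was still improving at the end
    return (train_losses[-1] < train_losses[best]
            and val_losses[-1] > val_losses[best])


# Hypothetical per-epoch losses, made up for illustration.
train = [1.60, 1.20, 0.90, 0.70, 0.50, 0.35]
valid = [1.65, 1.30, 1.10, 1.05, 1.15, 1.30]

print(best_epoch(valid))                # 3
print(shows_overfitting(train, valid))  # True
```

Here the validation loss bottoms out at epoch 3 and rises afterwards while the training loss keeps dropping, which is the pattern shown in the plot above.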

### How to prevent model overfitting?
1. Start with a simple model
![](https://image.slidesharecdn.com/lawsofwebdesign-091104020153-phpapp01/95/laws-of-web-development-11-728.jpg?cb=1257384621)
2. Add penalty to complicated models
    - L1 regularizer
    - L2 regularizer
    - Elastic Net

3. (For Neural Networks) Dropout layers: randomly set a fraction of a layer's outputs to zero during training
![](https://cdn-images-1.medium.com/max/1800/1*iWQzxhVlvadk6VAJjsgXgg.png)
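
The penalty terms and dropout can both be illustrated in a few lines of numpy. This is a sketch for intuition only, with made-up weights, penalty strength, and helper name. In the model above, the `keras.layers.Dropout` layer takes care of dropout internally.

```python
import numpy as np

rng = np.random.default_rng(0)

# Penalty terms: extra loss that grows with the size of the weights.
weights = np.array([0.5, -1.0, 2.0])         # hypothetical weights
lam = 0.01                                   # hypothetical penalty strength
l1_penalty = lam * np.abs(weights).sum()     # L1: sum of absolute values
l2_penalty = lam * np.square(weights).sum()  # L2: sum of squares
elastic_net = l1_penalty + l2_penalty        # Elastic Net combines both


def dropout(activations, rate, rng):
    """Zero each activation with probability `rate` and rescale the rest
    so the expected value of the output stays the same."""
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)


activations = np.ones((4, 5))
dropped = dropout(activations, rate=0.5, rng=rng)

print(l1_penalty, l2_penalty, elastic_net)
print(dropped)  # each entry is either 0.0 (dropped) or 2.0 (kept and rescaled)
```

Dropout is applied only during training. At prediction time all units are kept, which is why the surviving values are rescaled here.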


In [None]:
# Save model and weights
if not os.path.isdir(save_dir):
    os.makedirs(save_dir)
model_path = os.path.join(save_dir, model_name)
model.save(model_path)
print('Saved trained model at %s ' % model_path)

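# Score trained model on the test batch, preprocessed like the training data.
with open(datapath + "test_batch", "rb") as f:
    batch = pickle.load(f, encoding="latin1")
valid_features = normalize(
    batch["data"].reshape([len(batch["data"]), 3, 32, 32]).transpose(0, 2, 3, 1))
valid_labels = one_hot_encode(batch["labels"])
scores = model.evaluate(valid_features, valid_labels, verbose=1)
print('Test loss:', scores[0])
print('Test accuracy:', scores[1])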