In [None]:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import keras
from keras import backend as K

cfg = K.tf.ConfigProto()
cfg.gpu_options.allow_growth = True
cfg.gpu_options.per_process_gpu_memory_fraction=0.333
K.set_session(K.tf.Session(config=cfg))

# Introduction to Keras

#### @author Alec Chapman

This tutorial was adapted from [this Keras blog post](https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html).


The data comes from a Kaggle competition to classify images as cats or dogs. The data can be downloaded [here](https://www.kaggle.com/c/dogs-vs-cats/data) after signing in to Kaggle (either via a Kaggle account or Google, Facebook, or Yahoo!).

## What is Keras?

Keras is a high-level deep learning API written in Python. Keras uses [TensorFlow](https://www.tensorflow.org/) (Google), [CNTK](https://github.com/Microsoft/cntk) (Microsoft), or [Theano](http://deeplearning.net/software/theano/) (University of Montreal) as a backend.



### Import some modules needed for our tutorial

In [None]:
import glob, os
import random
import numpy as np
from PIL import Image
from keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img

import keras
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import Activation, Dropout, Flatten, Dense, BatchNormalization
from keras.optimizers import RMSprop, Adagrad, Adam
from keras.models import load_model
import matplotlib.pyplot as plt


#### Note that I'm using the TensorFlow Backend

The backend is selected by the `KERAS_BACKEND` environment variable or, when that is not set, by the `backend` field of the config file at
```bash
/home/user/.keras/keras.json
```
which for me has the following content:

```json
{
    "epsilon": 1e-07,
    "floatx": "float32",
    "image_data_format": "channels_last",
    "backend": "tensorflow"
}
```
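
As a rough sketch of how that selection could work, the snippet below parses a config like the one above and lets an environment variable override it. The helper name `resolve_backend` is hypothetical and this is not Keras's own loading code; it only assumes that `KERAS_BACKEND`, when set, takes precedence over the `backend` field of the file.

```python
import json

# Same content as the keras.json shown above.
CONFIG_TEXT = """
{
    "epsilon": 1e-07,
    "floatx": "float32",
    "image_data_format": "channels_last",
    "backend": "tensorflow"
}
"""


def resolve_backend(config_text, environ):
    """Hypothetical helper: pick the backend name from config text and env vars."""
    config = json.loads(config_text)
    # Assumption: the KERAS_BACKEND environment variable wins over the file.
    return environ.get("KERAS_BACKEND", config.get("backend", "tensorflow"))


backend_from_file = resolve_backend(CONFIG_TEXT, {})
backend_from_env = resolve_backend(CONFIG_TEXT, {"KERAS_BACKEND": "theano"})
print(backend_from_file, backend_from_env)
```

Passing the environment in as a plain dictionary keeps the sketch easy to try without touching your real shell variables.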

### Set the data directory paths

In [None]:
# DATADIR = '/home/jovyan/DATA/keras_cat_dog/data'
import getpass
if getpass.getuser() == 'alec':
    DATADIR = "./data_alec/cats_vs_dogs/"
    MODELDIR = './saved_models'
else:
    DATADIR = os.path.join(os.path.expanduser('~'), 'DATA/DeepLearning/data/cats_vs_dogs/')
    MODELDIR = os.path.join(os.path.expanduser('~'), 'DATA/DeepLearning/saved_models')
TRAINDIR = os.path.join(DATADIR, 'train')
VALDIR = os.path.join(DATADIR, 'val')
assert os.path.exists(DATADIR)
assert os.path.exists(MODELDIR)

batch_size = 16

## Overview
## CNNs for Computer Vision
In this tutorial we'll build a Convolutional Neural Network to solve the age-old problem: Is it a **dog**, or a **cat**? 

Here's what we'll do today:
- First, we'll look at what it actually means to deal with images in machine learning. 
- Then we'll start using Keras, a great library that provides a higher-level API on top of TensorFlow. 
- Finally, we'll train our model (for a bit) and then use a pre-trained model to classify a batch of images.

### Working with Images in Python

Convolutional neural networks (CNNs) are often associated with computer vision. They're great at detecting edges, shapes, and higher-level features in images and using those findings to make a decision, such as classifying between cats and dogs. But how do we actually get these images into the neural net?

Basically, images can be seen as arrays of pixels. If we flatten one, we get one long array of numbers, where each number is the value of one color channel at one pixel. CNNs allow us to keep a grid-like shape rather than dealing with flat, 1-dimensional arrays. Specifically, our images will look like this:

   ##### Height x Width x Channels 
where channels corresponds to the color channels (3 for RGB, 1 for grayscale). So a list of images being fed into a neural network will look like this:
   ##### # of images x Height x Width x Channels
   
This array of matrices is often called a *tensor*.
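
To make those shapes concrete, here is a minimal sketch with NumPy. The two tiny arrays are made-up stand-ins for real photos (4 pixels high, 6 wide, 3 channels), so only the shapes matter here, not the pixel values.

```python
import numpy as np

# Two stand-in "images": 4 pixels high, 6 wide, 3 color channels.
image_a = np.arange(4 * 6 * 3, dtype=np.uint8).reshape(4, 6, 3)
image_b = 255 - image_a

# Flattening throws away the grid layout: one long vector of 4 * 6 * 3 numbers.
flat = image_a.reshape(-1)

# Stacking keeps it: # of images x Height x Width x Channels.
batch = np.stack([image_a, image_b])

print(flat.shape)   # (72,)
print(batch.shape)  # (2, 4, 6, 3)
```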
   
Let's look at some examples.

A great library for working with images in Python is the [Python Imaging Library](https://pillow.readthedocs.io/en/4.2.x/), or **PIL** (actually now **Pillow**, a fork of PIL).


In [None]:
from PIL import Image
example = os.path.join(TRAINDIR, 'cat', 'cat.12497.jpg')
img = Image.open(example)
print("Width, height")
print(img.size)
img

That's great for a human. Now let's convert it to something the computer can understand:

In [None]:
arr = np.array(img)
print("Height, width, Channels")
print(arr.shape)
print(arr)

In [None]:
# And back...
example = Image.fromarray(arr)
example

**PIL** offers great utility for working with images. Now let's look at Keras.

### Keras 
[Keras](https://keras.io/) is an API that allows you to work with [TensorFlow](https://www.tensorflow.org/) or [Theano](http://deeplearning.net/software/theano/) in a much more user-friendly way. Per their description:

```
"""
Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano. It was developed with a focus on enabling fast experimentation. Being able to go from idea to result with the least possible delay is key to doing good research.

Use Keras if you need a deep learning library that:

- Allows for easy and fast prototyping (through user friendliness, modularity, and extensibility).
- Supports both convolutional networks and recurrent networks, as well as combinations of the two.
- Runs seamlessly on CPU and GPU.
"""
```

We'll get to neural nets in a minute. First let's look at some of the image functions Keras offers.

##### Image Generators
You don't always have a lot of data to train with. One solution is **data augmentation**, where you create altered copies of your existing data to give your classifier more examples. With images, that means we'll rotate, shift, shear, zoom, and flip the images so that we have a bunch of different versions of each one.
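
To give a feel for what one such alteration does to the underlying array, here is a minimal sketch of two of the simplest operations: rescaling pixel values to the range 0 to 1 and mirroring the image left to right. It uses plain NumPy on a made-up 2 x 3 image and is not how Keras implements its generators.

```python
import numpy as np

# A tiny stand-in image: 2 pixels high, 3 wide, 1 channel, values in 0..255.
image = np.array([[[0], [128], [255]],
                  [[10], [20], [30]]], dtype=np.uint8)

# Rescale pixel values to [0, 1] by dividing by 255.
scaled = image.astype("float32") / 255.0

# Mirror the image left to right by reversing the width axis (axis 1).
flipped = scaled[:, ::-1, :]

print(scaled[0, :, 0])   # first row before the flip
print(flipped[0, :, 0])  # same row, now reversed
```

The label of the picture does not change when we flip it, which is what makes this a cheap way to get extra training examples.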

Keras also offers a great utility called `flow_from_directory`: we put images in folders divided by class, and Keras automatically loads them, infers their labels from the folder names, and converts them into arrays to train/test with.
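
The folder-to-label idea can be sketched in a few lines of plain Python. The helper `infer_class_indices` below is hypothetical, and the folder names `cat` and `dog` are assumptions; it simply sorts the subfolder names and numbers them, which mirrors the alphanumeric ordering Keras documents for inferred classes. A real Keras generator exposes its own mapping through the `class_indices` attribute.

```python
import os
import tempfile


def infer_class_indices(directory):
    """Hypothetical helper: map each class subfolder to an integer label."""
    class_names = sorted(
        name for name in os.listdir(directory)
        if os.path.isdir(os.path.join(directory, name))
    )
    return {name: index for index, name in enumerate(class_names)}


# Build a throwaway directory tree so the example runs anywhere.
with tempfile.TemporaryDirectory() as root:
    for class_name in ("dog", "cat"):  # folder names are assumptions
        os.makedirs(os.path.join(root, class_name))
    class_indices = infer_class_indices(root)

print(class_indices)
```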

In [None]:
from keras.preprocessing.image import ImageDataGenerator
batch_size = 16

train_datagen = ImageDataGenerator(
        rotation_range=40,
        width_shift_range=0.2,
        height_shift_range=0.2,
        rescale=1./255,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True,
        fill_mode='nearest')

train_generator = train_datagen.flow_from_directory(
        TRAINDIR,  # this is the target directory
        target_size=(227, 227),  # all images will be resized to 227 x 227
        batch_size=batch_size,
        class_mode='binary')  # since we use binary_crossentropy loss, we need binary labels

# No distorting for testing!
# Instead, we'll rescale so that pixel values are between 0 and 1
# Which, trust me, is very necessary.
test_datagen = ImageDataGenerator(rescale=1./255) 
validation_generator = test_datagen.flow_from_directory(
        VALDIR,
        target_size=(227, 227),
        batch_size=batch_size,
        class_mode='binary')

In [None]:
batch_size

In [None]:
os.path.exists(VALDIR)

In [None]:
x_batch, y_batch = next(train_generator)

In [None]:
# This creates 16 images from the images found in TRAINDIR. Let's look at a few of them
print(x_batch.shape)
print(y_batch)

In [None]:
ncols = 3
nimg = x_batch.shape[0]

fig = plt.figure(figsize=(18,9))
for i in range(len(x_batch)):
    x = x_batch[i]
    ax = plt.subplot2grid((nimg//ncols+1, ncols), (i//ncols,i%ncols))
    ax.imshow(x)
    #img = array_to_img(x)
    #img.show()

As you can see, some of these get distorted. While this may look weird to us, it forces the classifier to look for other features that will allow it to recognize cats vs. dogs even with these strange distortions.

Now, let's finally get to our trainer!

### Convolutional Neural Networks
CNNs are special because of what's called a Convolutional Layer. See our presentation for the details.

Here is the architecture that we're going to use, based on [Alexnet](https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf). It is probably too deep for the problem, and we'll have to look out for overfitting:

## The model

### AlexNet
The [Alexnet](https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf) model was one of the first famous "deep" architectures for image processing. It won the 2012 ImageNet competition and while CNNs have grown deeper and more complex, this is a good example of a multi-layered CNN. Here's a diagram of the complete architecture:

The model above is based on AlexNet. Let's walk through each part of our model. See the [Keras documentation](https://keras.io/) for more details on the implementation.

### `Sequential` model
The `Sequential` model is a linear stack of layers. We'll build our model by adding layers to this object one after another.

In [None]:
model = Sequential()

### Convolutional Layers
A common design pattern in CNNs is a combination of these three layers:
- 2D Convolutional
- Activation function (ReLU)
- Max Pooling


#### **2D Convolutional Layer**
![Example of Convolution](./images/convolution.png)
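
Here is a minimal sketch of what a single filter does, written as a naive NumPy loop for one channel with `'valid'` padding (no padding at all). The function name `conv2d_valid` is ours, it assumes a square filter, and like most deep learning libraries it slides the filter without flipping it. Real layers do the same thing across many channels and filters, and much faster.

```python
import numpy as np


def conv2d_valid(image, kernel, stride=1):
    """Naive single-channel 2D convolution with 'valid' padding."""
    kernel_size = kernel.shape[0]  # assumes a square kernel
    out_h = (image.shape[0] - kernel_size) // stride + 1
    out_w = (image.shape[1] - kernel_size) // stride + 1
    output = np.zeros((out_h, out_w))
    for row in range(out_h):
        for col in range(out_w):
            top, left = row * stride, col * stride
            patch = image[top:top + kernel_size, left:left + kernel_size]
            # Element-wise product of the patch and the filter, summed to one number.
            output[row, col] = np.sum(patch * kernel)
    return output


image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.ones((3, 3)) / 9.0  # a 3 x 3 averaging filter

feature_map = conv2d_valid(image, kernel, stride=1)
print(feature_map.shape)  # (3, 3)
```

The output size along each axis is `(input - kernel) // stride + 1`, so a 227-pixel side with an 11-pixel filter and a stride of 4 gives 55.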

In [None]:
print("Layer 1")
model.add(Conv2D(
                filters=96,
                kernel_size=11,
                strides=4,
                padding='valid',
                input_shape=(227, 227, 3),
                data_format='channels_last')
          )

#### **Non-Linear Activation Function**
![ReLU](./images/relu.png)
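
As a quick sketch, ReLU just replaces every negative value with zero and leaves positive values untouched. The NumPy one-liner below is only an illustration of the function, not the Keras layer itself.

```python
import numpy as np


def relu(x):
    """Rectified linear unit: max(0, x), applied element-wise."""
    return np.maximum(x, 0.0)


activations = relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0]))
print(activations)
```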

In [None]:
model.add(Activation('relu'))

#### **Max Pooling**
![Max Pooling](./images/max_pooling.jpeg)
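
Max pooling slides a window over the feature map and keeps only the largest value in each window, which shrinks the map. The sketch below is a naive single-channel version in NumPy; the function name `max_pool2d` is ours, and the 3 x 3 window with a stride of 2 matches the values used in this notebook.

```python
import numpy as np


def max_pool2d(feature_map, pool_size=3, stride=2):
    """Naive single-channel max pooling without padding."""
    out_h = (feature_map.shape[0] - pool_size) // stride + 1
    out_w = (feature_map.shape[1] - pool_size) // stride + 1
    output = np.zeros((out_h, out_w))
    for row in range(out_h):
        for col in range(out_w):
            top, left = row * stride, col * stride
            window = feature_map[top:top + pool_size, left:left + pool_size]
            output[row, col] = window.max()
    return output


feature_map = np.arange(25, dtype=float).reshape(5, 5)
pooled = max_pool2d(feature_map, pool_size=3, stride=2)
print(pooled)
```

A 5 x 5 input becomes a 2 x 2 output here, because only two windows fit along each axis.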

In [None]:
model.add(MaxPooling2D(
                      pool_size=(3, 3),
                      strides=(2,2),
                      data_format='channels_last')
         )

Each time we have some combination of these layers, we'll call it a **Convolutional Layer**. So after our first Convolutional Layer, here's what our model looks like:

In [None]:
model.summary()

We'll add four more similar layers:

In [None]:
print("Layer 2")
model.add(Conv2D(256, 5, strides=1, padding='valid', data_format='channels_last'))
model.add(Activation('relu'))
model.add(MaxPooling2D(
          pool_size=(3, 3),
          strides=(2,2),
           data_format='channels_last')
         )

print("Layer 3")
model.add(Conv2D(
           384, 3,
           strides=1,
           padding='valid',
           data_format='channels_last')
         )

print("Layer 4")
model.add(Conv2D(
                 256, 3,
                 strides=1,
                 padding='valid',
                 data_format='channels_last')
          )


print("Layer 5")
model.add(Conv2D(256, 3, strides=1, padding='valid', data_format='channels_last'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(3,3), strides=(2,2)))

In [None]:
model.summary()

### Fully-Connected and Dropout Layers
At this point, we *flatten* our inputs so that we're now dealing with 1-dimensional vectors.
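
To see how long that flattened vector is, we can walk the image size through the convolution and pooling steps defined above. This is a small arithmetic sketch; `output_size` is our own helper and it assumes `'valid'` padding everywhere, as in the cells above.

```python
def output_size(size, window, stride):
    """Output length along one axis for 'valid' padding."""
    return (size - window) // stride + 1


# (name, window, stride) for each convolution and pooling step listed above.
steps = [
    ("conv1", 11, 4), ("pool1", 3, 2),
    ("conv2", 5, 1), ("pool2", 3, 2),
    ("conv3", 3, 1),
    ("conv4", 3, 1),
    ("conv5", 3, 1), ("pool5", 3, 2),
]

size = 227  # input height and width
sizes = []
for name, window, stride in steps:
    size = output_size(size, window, stride)
    sizes.append(size)
    print(name, size)

last_filters = 256  # number of filters in the last convolution
flattened_length = size * size * last_filters
print(flattened_length)
```

You can compare the printed sizes with the output of `model.summary()`.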

#### Fully-Connected (Dense)
![Fully-Connected Layer](./images/fully_connected.jpeg)

In [None]:
model.add(Flatten())
model.add(Dense(4096))
model.add(Activation('relu'))

#### Dropout Layers
![Dropout Layer](./images/dropout.png)
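
Dropout randomly switches off a fraction of the units during training so the network can't lean too heavily on any single one. Below is a minimal NumPy sketch of the common "inverted dropout" formulation, where the surviving units are scaled up so the expected activation stays the same. The function is our own illustration; at prediction time dropout is simply skipped.

```python
import numpy as np


def dropout(x, rate, rng):
    """Inverted dropout: zero a random subset of units, rescale the rest."""
    keep_prob = 1.0 - rate
    mask = rng.random_sample(x.shape) < keep_prob
    return x * mask / keep_prob


rng = np.random.RandomState(0)
activations = np.ones(10000)
dropped = dropout(activations, rate=0.5, rng=rng)

print((dropped == 0).mean())  # fraction of zeroed units, close to 0.5
print(dropped.mean())         # stays close to 1.0 thanks to the rescaling
```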

In [None]:
model.add(Dropout(0.5))
model.add(Dense(4096))
model.add(Dropout(0.5))
model.add(Dense(1000))

In [None]:
model.summary()

### Output Layer
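We then finally create an output layer. For a multi-class task this layer has **N** units, where **N** is the number of classes we are trying to predict. Our task is binary, so a single unit is enough (N=1): its output is the probability of one class, and the other class gets the remainder.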
![Output Layer](./images/output.png)

And we use a **sigmoid function** to squash our output value to be between 0 and 1, which we interpret as a probability.
![Output Layer](./images/sigmoid.png)
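
As a small sketch, the sigmoid is `1 / (1 + exp(-z))`. The snippet below applies it to a few raw output values and turns each probability into a 0/1 label; the 0.5 cutoff is an assumption we make for illustration.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


for z in (-4.0, 0.0, 4.0):
    probability = sigmoid(z)
    # Assumption: label 1 is predicted when the probability is at least 0.5.
    label = int(probability >= 0.5)
    print(z, round(probability, 3), label)
```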

In [None]:
model.add(Dense(1))
model.add(Activation('sigmoid'))

Our complete model looks like this:

In [None]:
model = Sequential()
print("Layer 1")
model.add(Conv2D(
                filters=96,
                kernel_size=11,
                strides=4,
                padding='valid',
                input_shape=(227, 227, 3)
                )
          )
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(
                      pool_size=(3, 3),
                      strides=(2,2))
         )

print("Layer 2")
model.add(Conv2D(256, 5, strides=1, padding='valid'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(
          pool_size=(3, 3),
          strides=(2,2))
         )

print("Layer 3")
model.add(Conv2D(
           384, 3,
           strides=1,
           padding='valid')
         )
model.add(BatchNormalization())

print("Layer 4")
model.add(Conv2D(
                 256, 3,
                 strides=1,
                 padding='valid')
          )
model.add(BatchNormalization())

print("Layer 5")
model.add(Conv2D(256, 3, strides=1, padding='valid'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(3,3), strides=(2,2)))

model.add(Flatten())
model.add(Dense(4096))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(4096))
model.add(Dropout(0.5))
model.add(BatchNormalization())
model.add(Dense(1000))
model.add(BatchNormalization())
model.add(Dense(1))
model.add(Activation('sigmoid'))

model.compile(loss='binary_crossentropy',
          optimizer=RMSprop(lr=0.001),
          metrics=['accuracy'])

Our last step before training is to **compile** the model, which configures the learning process. This defines:
- The optimization algorithm
- A loss function
- A list of metrics
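
To make the loss and the metric less abstract, here is a plain-Python sketch of binary cross-entropy and accuracy on a made-up batch of four predictions. The function names and the 0.5 threshold are ours; Keras computes the same quantities on tensors.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over a batch of predicted probabilities."""
    total = 0.0
    for target, prob in zip(y_true, y_pred):
        prob = min(max(prob, eps), 1.0 - eps)  # avoid log(0)
        total += -(target * math.log(prob) + (1 - target) * math.log(1.0 - prob))
    return total / len(y_true)


def accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of predictions on the correct side of the threshold."""
    hits = sum(int(prob >= threshold) == target for target, prob in zip(y_true, y_pred))
    return hits / len(y_true)


y_true = [1, 0, 1, 0]          # made-up labels
y_pred = [0.9, 0.2, 0.4, 0.1]  # made-up predicted probabilities

loss = binary_crossentropy(y_true, y_pred)
acc = accuracy(y_true, y_pred)
print(round(loss, 4), acc)
```

The third prediction (0.4 for a true label of 1) is the only miss, and it also contributes most of the loss. The optimizer's job is to adjust the weights so that this loss goes down.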

In [None]:
model.compile(loss='binary_crossentropy',
          optimizer='rmsprop',
          metrics=['accuracy'])

## Training
Now, we're finally ready to train! We provide our compiled model with our train and validation generators, which read the images in our data directory and apply the image transformations. The cell below trains for a single epoch (`epochs=1`) to keep the run short; raise `epochs` to train longer. We'll then look at the training and validation scores.

However, training a deep network on images takes a **long** time. So instead, we trained a model for you that you'll be able to use once we skip past training.
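
The `steps_per_epoch` and `validation_steps` arguments in the training call are just batch counts. Here is the arithmetic as a tiny sketch; the sample counts 2000 and 800 are the numbers that call divides by `batch_size`, not counts we measured from the data folders.

```python
batch_size = 16

# Sample counts taken from the fit_generator call in this notebook.
train_samples = 2000
val_samples = 800

steps_per_epoch = train_samples // batch_size  # batches drawn per epoch
validation_steps = val_samples // batch_size   # batches used for validation

print(steps_per_epoch, validation_steps)
```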

In [None]:
history = model.fit_generator(
    train_generator,
    steps_per_epoch=2000 // batch_size, # Number of batches
    epochs=1, 
    validation_data=validation_generator,
    validation_steps=800//batch_size)

In [None]:
#model.save('saved_models/cats_vs_dogs_not_trained.h5')

#import pickle
#with open('logs/history_cats_vs_dogs.pkl', 'wb') as f:
#    pickle.dump(history.history, f)

### Before you go - Design Choices
One of the main tasks in machine learning is testing out different design choices. This involves identifying various components of a model that can be changed/adjusted and testing which combinations work best for the task at hand.

With CNNs, here are some model components to consider:

- **Architecture**
   - Number of layers
   - Types of layers
   - Order of layers
   - Which activation functions to use
- **Convolutional Hyperparameters**
    - Number of filters
    - Kernel size (filter size)
    - Stride size
    - Padding
- **MaxPooling Hyperparameters**
    - Pool size
    - Stride size
    - Padding
- **Fully-Connected and Dropout Hyperparameters**
    - Number of neurons
    - Dropout value
- **Training**
    - Number of epochs
    - Learning rate
    - Optimization algorithm
    - Batch size
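
One simple way to organize such experiments is to write the candidate values down and enumerate every combination. The sketch below does that with `itertools.product`; the parameter names and values are hypothetical examples, not recommendations.

```python
from itertools import product

# Hypothetical candidate values; pick ones that suit your time and memory budget.
search_space = {
    "filters": [32, 96],
    "kernel_size": [3, 5],
    "dropout": [0.25, 0.5],
    "learning_rate": [1e-3, 1e-4],
}

names = list(search_space)
configs = [dict(zip(names, values)) for values in product(*search_space.values())]

print(len(configs))  # 2 * 2 * 2 * 2 = 16 combinations
print(configs[0])
```

The number of combinations grows quickly, so it usually pays to vary only one or two components at a time.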
    
#### Build your own CNN
Try building a few different models by testing out different parameters as above. Start small and just try out a few different values - look at the docstrings below or the [Keras documentation](https://keras.io/) to see the argument values that represent the hyperparameters. It can be a little tricky just to get all of the different layers to fit together, so that's a good first step. Then, if you have the time/RAM, try actually training a few different models and see if it makes a difference.

In [None]:
help(Conv2D)
# help(MaxPooling2D)
# help(Activation)
# help(Dense)
# help(Dropout)
# help(model.compile)
# help(keras.optimizers)

In [None]:
my_model = Sequential()

### Add layers here ###
# my_model.add(...)
#######################

### Compile your model ###
# my_model.compile(loss='binary_crossentropy',
#     optimizer=...,
#     metrics=['accuracy'])
#######################

# my_model.summary()  # summary() prints the table itself

In [None]:
### Train your model ###
# num_epochs = ...
# batch_size = ...
# my_history = my_model.fit_generator(
#     train_generator,
#     steps_per_epoch=2000 // batch_size, # Number of batches
#     epochs=num_epochs, 
#     validation_data=validation_generator,
#     validation_steps=800//batch_size)
#######################

In [None]:
### Evaluate your model ###
# loss, acc = my_model.evaluate_generator(validation_generator, steps=800//batch_size, verbose=1)

## Next Up
Deep neural nets like this can take a *long* time to train. With large datasets, models are sometimes trained for hours or days to get the best results.

Next, we'll skip ahead to after training and see how we can use a pretrained model for predictions.