## Imports

In [1]:
# adds parent directory to python path so we can access code located there
import os, sys
nb_dir = os.path.split(os.getcwd())[0]
if nb_dir not in sys.path: sys.path.append(nb_dir)
    
# core imports
from ohmeow_ml.keras_tf_util import *

# configure matplotlib
%matplotlib inline
    
# configure autoreload to re-load changed modules
%load_ext autoreload
%autoreload 2

Using TensorFlow backend.


## Define paths and global variables

In [2]:
current_dir = os.getcwd()
DATA_HOME_DIR = current_dir + '/data/'
DATA_CLASSES = os.listdir(DATA_HOME_DIR + 'train')  # class names = sub-directory names

# path = DATA_HOME_DIR
path = DATA_HOME_DIR + 'sample/'

train_path = path + 'train/'
val_path = path + 'valid/'
test_path = path + 'test/'

models_path = path + 'models/'                      # save weights here
results_path = path + 'results/'                    # save predictions here
processed_data_path = path + 'preprocesed_data/'    # save preprocessed data used for training here

if not os.path.exists(models_path): os.makedirs(models_path)
if not os.path.exists(results_path): os.makedirs(results_path)
if not os.path.exists(processed_data_path): os.makedirs(processed_data_path)

In [3]:
batch_size = 4 #64

## Preprocess the data

We can save time by pre-processing the images once (e.g., converting them to JPEGs, resizing to 224x224) and saving them as a numpy array on the file system. We can do the same for the class designations, filenames, and one-hot encoded labels of the train, validation, and test images.

In [4]:
# get classes, one-hot encoded labels, and filenames
train_classes, train_labels, train_filenames = get_batch_info(train_path)
val_classes, val_labels, val_filenames = get_batch_info(val_path)
test_filenames = get_batch_info(test_path)[2]

Found 1500 images belonging to 10 classes.
Found 750 images belonging to 10 classes.
Found 500 images belonging to 1 classes.


In [5]:
# get image data
if not os.path.exists(processed_data_path+'train_data.bc'):
    train_data = get_data(train_path)
    save_array(processed_data_path+'train_data.bc', train_data)
else:
    train_data = load_array(processed_data_path+'train_data.bc')
    print('training data loaded ...')

if not os.path.exists(processed_data_path+'val_data.bc'):
    val_data = get_data(val_path)
    save_array(processed_data_path+'val_data.bc', val_data)
else:
    val_data = load_array(processed_data_path+'val_data.bc')
    print('validation data loaded ...')

if not os.path.exists(processed_data_path+'test_data.bc'):
    test_data = get_data(test_path)
    save_array(processed_data_path+'test_data.bc', test_data)
else:
    test_data = load_array(processed_data_path+'test_data.bc')
    print('test data loaded ...')

training data loaded ...
validation data loaded ...
test data loaded ...
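
The three blocks above all follow the same "load it if it is cached, otherwise compute and cache it" pattern. A minimal sketch of that pattern, assuming a hypothetical helper `load_or_compute` and using `pickle` as a stand-in for the notebook's `save_array`/`load_array` (neither the helper nor the pickle format is part of `ohmeow_ml`):

```python
import os
import pickle
import tempfile


def load_or_compute(cache_path, compute_fn):
    """Return (data, was_cached): load from cache_path if present, else compute and store."""
    if os.path.exists(cache_path):
        with open(cache_path, 'rb') as f:
            return pickle.load(f), True
    data = compute_fn()
    with open(cache_path, 'wb') as f:
        pickle.dump(data, f)
    return data, False


# Demo in a throwaway directory; the lambdas stand in for get_data(train_path).
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, 'train_data.pkl')

first, first_cached = load_or_compute(demo_path, lambda: [[0.0, 1.0], [2.0, 3.0]])
second, second_cached = load_or_compute(demo_path, lambda: [])  # not called: cache exists
```

The second call returns the cached data without invoking its `compute_fn`, which is the behavior the `os.path.exists` checks above rely on.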


Create training/validation batches and define the "steps per epoch" for each, i.e., the number of batches that make up one epoch (see `model.fit_generator()`).

***ONLY RUN THIS CODE IF YOU NEED TO USE BATCHES INSTEAD OF PERSISTED IMAGE ARRAYS***

In [None]:
# OPTION 1: BUILD BATCHES FROM FILE SYSTEM
# train_batches = get_batches(train_path, batch_size=batch_size)
# val_batches = get_batches(val_path, batch_size=batch_size*2, shuffle=False)

# OPTION 2: BUILD BATCHES FROM IMAGE ARRAYS
# gen = image.ImageDataGenerator()
# train_batches = gen.flow(train_data, train_labels, batch_size=batch_size, shuffle=True)
# val_batches = gen.flow(val_data, val_labels, batch_size=batch_size*2, shuffle=False)

# DEFINE # OF STEPS TO TAKE IN FITTING BATCHES FOR BOTH TRAINING AND VALIDATION EXAMPLES
# epoch_steps = math.ceil(train_batches.n/train_batches.batch_size)
# val_steps = math.ceil(val_batches.n/val_batches.batch_size)
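
As a worked example of the step calculation, here is a small sketch using the sample sizes reported earlier (1500 training and 750 validation images) and the batch sizes from this notebook. The helper name `steps_per_epoch` is only illustrative:

```python
import math


def steps_per_epoch(n_samples, batch_size):
    """Number of batches needed to see every sample once (last batch may be partial)."""
    return math.ceil(n_samples / batch_size)


train_steps = steps_per_epoch(1500, 4)  # batch_size = 4
val_steps = steps_per_epoch(750, 8)     # batch_size * 2 for validation
print(train_steps, val_steps)
```

Rounding up rather than down matters for the validation set here: 750 / 8 is not a whole number, so the final, smaller batch would otherwise be skipped.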

## Basic Models
Train a linear classifier and a basic NN with a single hidden layer to provide a baseline and to validate that our sample datasets are large enough to be usable.

### Option 1: A simple linear classifier.

In [None]:
def lm_model():
    model = Sequential([
        BatchNormalization(axis=1, input_shape=(224,224,3)),
        Flatten(),
        Dense(10, activation='softmax')
    ])
    model.compile(optimizer=Adam(), loss='categorical_crossentropy', metrics=['accuracy'])
    return model

In [None]:
lm = lm_model()

In [None]:
# IF USING BATCHES ...
# lm.fit_generator(train_batches, steps_per_epoch=epoch_steps, epochs=2, 
#                  validation_data=val_batches, validation_steps=val_steps, verbose=2)

# IF USING IMAGE ARRAYS
lm.fit(train_data, train_labels, epochs=3, validation_data=(val_data, val_labels), shuffle=True, verbose=2)

In [None]:
# lm.summary()

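While we have plenty of parameters (1,506,186, of which 224\*224\*3\*10 = 1,505,280 are the weights of the dense layer), our accuracy is really poor (~0.13), barely above the 0.10 expected from random guessing over 10 classes.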
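
The gap between the two numbers can be accounted for with a short calculation. This is a sketch under the assumption that the total comes from `lm.summary()` and therefore includes the `BatchNormalization` layer, which keeps four vectors (gamma, beta, moving mean, moving variance) along the normalized axis; with `axis=1` on a `(224, 224, 3)` input that axis has size 224:

```python
height, width, channels, n_classes = 224, 224, 3, 10

dense_weights = height * width * channels * n_classes  # one weight per input value per class
dense_biases = n_classes                                # one bias per class
batchnorm_params = 4 * height                           # 4 vectors of length 224 (axis=1)

total = dense_weights + dense_biases + batchnorm_params
print(dense_weights, dense_biases, batchnorm_params, total)
```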

NOTE: **When a simple model with plenty of parameters and no regularization performs poorly, a likely cause is that the learning rate is too high.**

From the notebook: "Perhaps it is jumping to a solution where it predicts one or two classes with high confidence so that it can give a zero prediction to as many classes as possible - that's the best approach for a model that is no better than random, and there is likely to be where we would end up with a high learning rate"

In [None]:
# IF USING BATCHES ...
# np.round(lm.predict_generator(train_batches, epoch_steps)[:10], 2)

# IF USING IMAGE ARRAYS
np.round(lm.predict(train_data, batch_size=batch_size)[:10],2)

The above shows that, indeed, Adam's default learning rate of 0.001 is too high and causes the algorithm to predict one class with near-certainty most of the time. If you see this, **lower the learning rate**.
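
Rather than eyeballing the first ten rows, the same check can be summarized over all predictions. The following is a rough sketch in plain Python; `saturation_report`, its `threshold`, and the example predictions are hypothetical and not part of the notebook. For real output you could pass `lm.predict(...).tolist()`:

```python
from collections import Counter


def saturation_report(preds, threshold=0.99):
    """Count the predicted class per row and the fraction of near-certain predictions."""
    top_classes = Counter(max(range(len(row)), key=row.__getitem__) for row in preds)
    n_saturated = sum(1 for row in preds if max(row) >= threshold)
    return top_classes, n_saturated / len(preds)


# Hypothetical softmax outputs for 4 images over 3 classes.
example_preds = [
    [1.00, 0.00, 0.00],
    [0.99, 0.01, 0.00],
    [0.20, 0.50, 0.30],
    [1.00, 0.00, 0.00],
]
top_classes, saturated_fraction = saturation_report(example_preds)
print(top_classes, saturated_fraction)
```

A model that concentrates nearly all of its predictions on one or two classes, with most rows saturated, matches the symptom described above.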

In [None]:
lm.optimizer.lr = 1e-05
lm.fit(train_data, train_labels, epochs=5, validation_data=(val_data, val_labels), verbose=2)

BEST PRACTICE: **Start with a small learning rate, then raise it substantially, and then lower it gradually by a factor of 10 at a time.**

In [None]:
lm.optimizer.lr = 0.001
lm.fit(train_data, train_labels, epochs=5, validation_data=(val_data, val_labels), verbose=2)

In [None]:
lm.optimizer.lr = 0.0001
lm.fit(train_data, train_labels, epochs=5, validation_data=(val_data, val_labels), verbose=2)

### Option 2: A simple linear classifier with L2 regularization.

In [None]:
def build_lm_reg():
    model = Sequential([
        BatchNormalization(axis=1, input_shape=(224,224,3)),
        Flatten(),
        Dense(10, activation='softmax', kernel_regularizer=l2(0.001))
    ])
    
    model.compile(Adam(lr=10e-5), loss='categorical_crossentropy', metrics=['accuracy'])
    return model

In [None]:
lm = build_lm_reg()

In [None]:
lm.fit(train_data, train_labels, epochs=5, validation_data=(val_data, val_labels), verbose=2)

## Validate Size of Sample

Once we are getting fairly consistent accuracy on our validation dataset, we should verify that our sample size is sufficient for further experiments. If it isn't, adjust the sample and run the previous code again.

In [None]:
rnd_batches = get_batches(val_path, batch_size=batch_size*2, shuffle=True)
steps = math.ceil(rnd_batches.n / rnd_batches.batch_size)  # rnd_batches uses batch_size*2
val_results = [ lm.evaluate_generator(rnd_batches, steps) for i in range(10) ]

In [None]:
np.round(val_results, 2)
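
One way to turn the ten evaluation runs into a single judgment is to look at the mean and spread of the accuracy column. The sketch below uses made-up accuracies purely for illustration, and `summarize_runs` is a hypothetical helper; with the real results you would pass the accuracy values from `val_results`:

```python
import statistics


def summarize_runs(accuracies):
    """Return the mean and sample standard deviation of repeated accuracy measurements."""
    return statistics.mean(accuracies), statistics.stdev(accuracies)


# Hypothetical accuracies from 10 evaluation runs (not results from this notebook).
example_accs = [0.40, 0.42, 0.41, 0.39, 0.43, 0.40, 0.41, 0.42, 0.40, 0.42]
mean_acc, std_acc = summarize_runs(example_accs)
print(round(mean_acc, 3), round(std_acc, 3))
```

If the spread is large relative to the differences you hope to measure between models, the sample is probably too small for the experiments that follow.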

## NN Models for Sample

### Single Hidden Layer

In [None]:
def nn():
    model = Sequential([
        BatchNormalization(axis=1, input_shape=(224,224,3)),
        Flatten(),
        Dense(128, activation='relu'),
        BatchNormalization(),
        Dense(10, activation='softmax')
    ])
    
    model.compile(Adam(lr=1e-05), loss='categorical_crossentropy', metrics=['accuracy'])
    return model

In [None]:
model = nn()

In [None]:
model.fit(train_data, train_labels, batch_size=batch_size, epochs=5, 
          validation_data=(val_data, val_labels), verbose=2)

In [None]:
model.optimizer.lr = 0.01
model.fit(train_data, train_labels, batch_size=batch_size, epochs=5, 
          validation_data=(val_data, val_labels), verbose=2)

### Simple CNN

Two conv layers with max pooling, followed by a simple dense network, make a good CNN to start with.

In [6]:
def simple_cnn():
    model = Sequential([
        BatchNormalization(axis=1, input_shape=(224,224,3)),
        Conv2D(32, (3,3), activation='relu'),
        BatchNormalization(axis=1),
        MaxPooling2D((3,3)),
        Conv2D(64, (3,3), activation='relu'),
        BatchNormalization(axis=1),
        MaxPooling2D((3,3)),
        Flatten(),
        Dense(200, activation='relu'),
        BatchNormalization(),
        Dense(10, activation='softmax')
    ])

    model.compile(Adam(lr=1e-5), loss='categorical_crossentropy', metrics=['accuracy'])
    return model
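
To get a feel for how large the input to the first `Dense` layer is, the spatial size can be traced through the conv and pooling layers. This sketch assumes the Keras defaults that the cell above relies on: `Conv2D` with `'valid'` padding and stride 1, and `MaxPooling2D` with a stride equal to its pool size. The helper names are illustrative only:

```python
def conv_out(size, kernel=3):
    """Output size of a 'valid' convolution with stride 1."""
    return size - kernel + 1


def pool_out(size, pool=3):
    """Output size of 'valid' max pooling whose stride equals the pool size."""
    return (size - pool) // pool + 1


size = 224
size = pool_out(conv_out(size))  # 224 -> 222 -> 74
size = pool_out(conv_out(size))  # 74 -> 72 -> 24

flat_features = size * size * 64          # 64 filters in the second conv layer
dense_params = flat_features * 200 + 200  # weights + biases of Dense(200)
print(size, flat_features, dense_params)
```

Most of the model's parameters end up in that first dense layer, which is one reason it overfits the small sample so quickly.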

In [7]:
model = simple_cnn()

In [8]:
model.fit(train_data, train_labels, batch_size=batch_size, epochs=2, shuffle=True, 
          validation_data=(val_data, val_labels), verbose=2)

Train on 1500 samples, validate on 750 samples
Epoch 1/2
62s - loss: 1.9641 - acc: 0.3380 - val_loss: 2.2936 - val_acc: 0.2533
Epoch 2/2
59s - loss: 1.0999 - acc: 0.6873 - val_loss: 2.3516 - val_acc: 0.2693


<keras.callbacks.History at 0x1cf77e84e48>

In [None]:
model.optimizer.lr = 0.001
model.fit(train_data, train_labels, batch_size=batch_size, epochs=5, shuffle=True, 
          validation_data=(val_data, val_labels), verbose=2)

## Data Augmentation

In [9]:
def test_augmentation(rotation_range=0.0, width_shift_range=0.0, height_shift_range=0.0, 
                      shear_range=0.0, zoom_range=0.0, channel_shift_range=0.0,
                      horizontal_flip=False, vertical_flip=False):
    limit_mem()
    
    gen = image.ImageDataGenerator(
            rotation_range=rotation_range,           # degrees (0 to 180)
            width_shift_range=width_shift_range,     # fraction of total width
            height_shift_range=height_shift_range,   # fraction of total height
            shear_range=shear_range,                 # shear intensity (shear angle in radians; 2*pi radians = 360 degrees)
            zoom_range=zoom_range,                   # amount of zoom
            channel_shift_range=channel_shift_range, # shift range for each channel
            horizontal_flip=horizontal_flip, 
            vertical_flip=vertical_flip)
    
    da_batches = gen.flow(train_data, train_labels, batch_size=batch_size, shuffle=True)
    
    model = simple_cnn()
    
    epoch_steps = math.ceil(da_batches.n/da_batches.batch_size)
    model.fit_generator(da_batches, epoch_steps, epochs=2, validation_data=(val_data, val_labels), verbose=2)
    
    model.optimizer.lr = 0.001
    history = model.fit_generator(da_batches, epoch_steps, epochs=5, validation_data=(val_data, val_labels), verbose=2)
    
    return history

In [10]:
# define the types of data augmentations we want to test, and the values we want to test for each
aug_experiments = {
    'rotation_range' : [0, 1, 3, 5, 10],
    'width_shift_range': [0, 0.05, 1, 2, 4],
    'height_shift_range': [0, 0.05, 1, 2, 4],
    'shear_range': [0, 0.1, 0.15, 0.2, 0.3],
    'zoom_range': [0, 0.1, 0.15, 0.2, 0.3],
    'channel_shift_range': [0, 10, 20, 30, 50]
}

# used to store the results of data augmentation tests
df_augs = pd.DataFrame(columns=['aug', 'aug_val', 'train_loss', 'train_acc', 'val_loss', 'val_acc'])

# try each type of data augmentation one at a time
for k,v in aug_experiments.items():
    # for each type, try several different levels of augmentation
    for aug_val in v: 
        print('> {0} = {1}'.format(k, aug_val))
        h = test_augmentation(**{k:aug_val})

        # save the results of each tested value so that we can determine the best for
        # each data augmentation type
        df_augs = df_augs.append({
            'aug': k, 
            'aug_val': aug_val,
            'train_loss': np.mean(h.history['loss'][-3:]), 
            'train_acc': np.mean(h.history['acc'][-3:]), 
            'val_loss': np.mean(h.history['val_loss'][-3:]), 
            'val_acc': np.mean(h.history['val_acc'][-3:]) 
        }, ignore_index=True)

> height_shift_range = 0
Epoch 1/2
60s - loss: 1.9885 - acc: 0.3360 - val_loss: 2.2485 - val_acc: 0.2600
Epoch 2/2
58s - loss: 1.0839 - acc: 0.6833 - val_loss: 1.9539 - val_acc: 0.3587
Epoch 1/5
58s - loss: 0.7643 - acc: 0.8080 - val_loss: 2.0649 - val_acc: 0.3907
Epoch 2/5
58s - loss: 0.6019 - acc: 0.8593 - val_loss: 1.9913 - val_acc: 0.4093
Epoch 3/5
58s - loss: 0.4853 - acc: 0.8940 - val_loss: 1.9970 - val_acc: 0.4253
Epoch 4/5
59s - loss: 0.3824 - acc: 0.9307 - val_loss: 1.8117 - val_acc: 0.4280
Epoch 5/5
59s - loss: 0.3436 - acc: 0.9307 - val_loss: 1.7876 - val_acc: 0.4453
> height_shift_range = 0.05
Epoch 1/2
61s - loss: 2.1731 - acc: 0.2813 - val_loss: 2.3515 - val_acc: 0.1773
Epoch 2/2
58s - loss: 1.4590 - acc: 0.5220 - val_loss: 2.3047 - val_acc: 0.2400
Epoch 1/5
58s - loss: 1.2011 - acc: 0.6313 - val_loss: 2.3688 - val_acc: 0.2400
Epoch 2/5
58s - loss: 0.9873 - acc: 0.7093 - val_loss: 2.0782 - val_acc: 0.3280
Epoch 3/5
59s - loss: 0.8733 - acc: 0.7427 - val_loss: 2.1689 - val

In [11]:
df_augs.to_csv(path+'data_augmentation_results.csv', index=False)

In [12]:
df_augs.sort_values('val_acc', ascending=False).groupby('aug').first()

| aug | aug_val | train_loss | train_acc | val_loss | val_acc |
|---|---|---|---|---|---|
| channel_shift_range | 0.0 | 0.392873 | 0.926667 | 2.012688 | 0.413778 |
| height_shift_range | 0.0 | 0.403766 | 0.918444 | 1.865438 | 0.432889 |
| rotation_range | 0.0 | 0.379833 | 0.926 | 1.768067 | 0.437778 |
| shear_range | 0.1 | 0.540318 | 0.870222 | 1.913431 | 0.453333 |
| width_shift_range | 0.05 | 0.853956 | 0.752667 | 2.134572 | 0.371111 |
| zoom_range | 0.0 | 0.372285 | 0.927111 | 2.034861 | 0.42 |
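
Each row in this table is the mean of the last three epochs of the second training phase, as computed in the experiment loop. As a sanity check, the sketch below recomputes the `height_shift_range = 0` row from the printed log above. The log values are rounded to four decimals, so the match is only approximate:

```python
import statistics

# Second training phase (Epoch 1/5 .. 5/5) for height_shift_range = 0, copied from the log.
val_acc = [0.3907, 0.4093, 0.4253, 0.4280, 0.4453]
val_loss = [2.0649, 1.9913, 1.9970, 1.8117, 1.7876]

mean_val_acc = statistics.mean(val_acc[-3:])    # mirrors np.mean(h.history['val_acc'][-3:])
mean_val_loss = statistics.mean(val_loss[-3:])  # mirrors np.mean(h.history['val_loss'][-3:])
print(round(mean_val_acc, 4), round(mean_val_loss, 4))
```

Averaging over the last few epochs smooths out epoch-to-epoch noise, which is useful here because the differences between augmentation settings are small.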


In [17]:
gen_aug = image.ImageDataGenerator(channel_shift_range=0.0, height_shift_range=0.0, rotation_range=0.0, 
                                   shear_range=0.10, width_shift_range=0.05, zoom_range=0.0)

aug_batches = gen_aug.flow(train_data, train_labels, batch_size=batch_size)
epoch_steps = math.ceil(aug_batches.n/aug_batches.batch_size)

In [18]:
limit_mem()
model = simple_cnn()
model.fit_generator(aug_batches, epoch_steps, epochs=2, validation_data=(val_data, val_labels), verbose=2)

Epoch 1/2
87s - loss: 2.2124 - acc: 0.2540 - val_loss: 2.4333 - val_acc: 0.2080
Epoch 2/2
60s - loss: 1.6187 - acc: 0.4480 - val_loss: 2.3062 - val_acc: 0.2853


<keras.callbacks.History at 0x1d044ebcc50>

In [19]:
model.optimizer.lr = 0.001
model.fit_generator(aug_batches, epoch_steps, epochs=5, validation_data=(val_data, val_labels), verbose=2)

Epoch 1/5
60s - loss: 1.3837 - acc: 0.5533 - val_loss: 2.2757 - val_acc: 0.3427
Epoch 2/5
60s - loss: 1.2061 - acc: 0.6260 - val_loss: 2.2440 - val_acc: 0.3920
Epoch 3/5
60s - loss: 1.0874 - acc: 0.6653 - val_loss: 1.9509 - val_acc: 0.4160
Epoch 4/5
60s - loss: 1.0007 - acc: 0.6980 - val_loss: 2.0170 - val_acc: 0.4293
Epoch 5/5
60s - loss: 0.9142 - acc: 0.7347 - val_loss: 2.0209 - val_acc: 0.4560


<keras.callbacks.History at 0x1d045088e48>

In [20]:
model.optimizer.lr = 0.0001
model.fit_generator(aug_batches, epoch_steps, epochs=5, validation_data=(val_data, val_labels), verbose=2)

Epoch 1/5
60s - loss: 0.8551 - acc: 0.7520 - val_loss: 1.9440 - val_acc: 0.4453
Epoch 2/5
60s - loss: 0.7815 - acc: 0.7673 - val_loss: 2.0283 - val_acc: 0.4000
Epoch 3/5
60s - loss: 0.7384 - acc: 0.7933 - val_loss: 1.9325 - val_acc: 0.4787
Epoch 4/5
59s - loss: 0.6885 - acc: 0.8120 - val_loss: 1.9163 - val_acc: 0.4813
Epoch 5/5
59s - loss: 0.6299 - acc: 0.8380 - val_loss: 1.9638 - val_acc: 0.4680


<keras.callbacks.History at 0x1d045349fd0>

## NN Models for Full Data Set

### Complex CONV Architecture

We add regularization via Dropout so that this architecture works better on the full data set.

In [None]:
def complex_cnn(p):
    model = Sequential([
        BatchNormalization(axis=1, input_shape=(224,224,3)),
        Conv2D(32, (3,3), activation='relu'),
        BatchNormalization(axis=1),
        MaxPooling2D((3,3)),
        Conv2D(64, (3,3), activation='relu'),
        BatchNormalization(axis=1),
        MaxPooling2D((3,3)),
        Conv2D(128, (3,3), activation='relu'),
        BatchNormalization(axis=1),
        MaxPooling2D((3,3)),
        Flatten(),
        Dense(200, activation='relu'),
        BatchNormalization(),
        Dropout(p/2),
        Dense(200, activation='relu'),
        BatchNormalization(),
        Dropout(p),
        Dense(10, activation='softmax')
    ])

    model.compile(Adam(lr=1e-5), loss='categorical_crossentropy', metrics=['accuracy'])
    return model

In [None]:
limit_mem()
model = complex_cnn(0.1)

In [None]:
model.fit(train_data, train_labels, batch_size=batch_size, epochs=2, shuffle=True, 
          validation_data=(val_data, val_labels), verbose=2)

In [None]:
model.optimizer.lr = 0.01
model.fit(train_data, train_labels, batch_size=batch_size, epochs=5, shuffle=True, 
          validation_data=(val_data, val_labels), verbose=2)

## Experiments

### Pre-compute output from various layers to use as input in various experiments

#### 1. Pre-compute output from VGG's 2nd to last layer

In [None]:
limit_mem()
model = VGG19(weights='imagenet', include_top=True)

In [None]:
# pop the last layer and set model.outputs to the output of the new last layer
model.layers.pop()

# model.layers[-1].outbound_nodes = [] ... this is not needed
model.outputs = [model.layers[-1].output]

In [None]:
# model.summary()

In [None]:
if not os.path.exists(processed_data_path+'train_features_ft_2nd_to_ll.bc'):
    train_features_ft = model.predict(train_data, 4)
    val_features_ft = model.predict(val_data, 4)
    
    save_array(processed_data_path+'train_features_ft_2nd_to_ll.bc', train_features_ft)
    save_array(processed_data_path+'val_features_ft_2nd_to_ll.bc', val_features_ft)
else:
    train_features_ft = load_array(processed_data_path+'train_features_ft_2nd_to_ll.bc')
    val_features_ft = load_array(processed_data_path+'val_features_ft_2nd_to_ll.bc')
    
print(train_features_ft.shape)
print(val_features_ft.shape)

#### 2. Pre-compute output from convolutional layers

In [None]:
limit_mem()
model = VGG19(include_top=False, weights='imagenet')

In [None]:
# model.summary()

In [None]:
if not os.path.exists(processed_data_path+'train_features_ft_conv.bc'):
    train_features_ft = model.predict(train_data, 4)
    val_features_ft = model.predict(val_data, 4)
    
    save_array(processed_data_path+'train_features_ft_conv.bc', train_features_ft)
    save_array(processed_data_path+'val_features_ft_conv.bc', val_features_ft)
else:
    train_features_ft = load_array(processed_data_path+'train_features_ft_conv.bc')
    val_features_ft = load_array(processed_data_path+'val_features_ft_conv.bc')
    
print(train_features_ft.shape)
print(val_features_ft.shape)

### Option 1: Train a linear classifier using the pre-computed output from the 2nd to last layer

In [None]:
limit_mem()
model = VGG19(include_top=True, weights='imagenet')
model.layers.pop()
model.outputs = [model.layers[-1].output]

train_features_ft = load_array(processed_data_path+'train_features_ft_2nd_to_ll.bc')
val_features_ft = load_array(processed_data_path+'val_features_ft_2nd_to_ll.bc')

In [None]:
def build_lm_from_vgg_2ll():
    m = Sequential([
        Dense(10, activation='softmax', input_shape = model.layers[-1].output_shape[1:])
    ])
    
    m.compile(optimizer=Adam(lr=1e-05), loss='categorical_crossentropy', metrics=['accuracy'])
    return m

In [None]:
lm = build_lm_from_vgg_2ll()

In [None]:
lm.fit(train_features_ft, train_labels, batch_size=batch_size, epochs=12, shuffle=True,
       validation_data=(val_features_ft, val_labels), verbose=2)

In [None]:
lm.optimizer.lr = 0.01

### Option 2: Train model after replacing last layer with a Dense layer having 10 outputs

In [None]:
limit_mem()
model = VGG19(weights='imagenet', include_top=True)
# model.summary()

In [None]:
model = finetune(model, 10)
# model.summary()