# Using pre-trained NN

In [1]:
import numpy as np
import tensorflow as tf

import keras
from matplotlib import pyplot as plt
from imageio import imread
import pickle


# Model Zoo
* https://github.com/keras-team/keras/tree/master/keras/applications
* More models within the community
* Pick model, copy init, download weights
* Here we proceed with vgg16

#### Very Deep Convolutional Networks for Large-Scale Visual Recognition
VGG at Oxford: http://www.robots.ox.ac.uk/~vgg/research/very_deep/
### [layer configuration](https://gist.githubusercontent.com/ksimonyan/211839e770f7b538e2d8/raw/0067c9b32f60362c74f4c445a080beed06b07eb3/VGG_ILSVRC_16_layers_deploy.prototxt)
Build a model based on the **layer configuration** linked above.
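You can cross-check the configuration with a minimal plain-Python sketch (no Keras) that lists the weight shapes this network should have, in layer order, and adds up the parameter count. It assumes Keras' channels-last kernel layout `(kh, kw, c_in, c_out)` and a 224x224 input, so five 2x2 poolings leave a 7x7x512 map before `flatten`. The helper names `vgg16_weight_shapes` and `count_params` are hypothetical. After building the model, `model.count_params()` should report the same total.

```python
# hypothetical helpers: expected VGG16 weight shapes, in model (set_weights) order
def vgg16_weight_shapes(num_classes=1000):
    # (number of conv layers, output channels) for each of the five blocks
    blocks = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]
    shapes = []
    c_in = 3  # RGB/BGR input channels
    for n_convs, c_out in blocks:
        for _ in range(n_convs):
            shapes.append((3, 3, c_in, c_out))  # 3x3 kernel, channels-last layout
            shapes.append((c_out,))             # bias
            c_in = c_out
    # five 2x2 poolings shrink 224 -> 7, so flatten yields 7 * 7 * 512 = 25088 features
    fc_sizes = [7 * 7 * 512, 4096, 4096, num_classes]
    for n_in, n_out in zip(fc_sizes[:-1], fc_sizes[1:]):
        shapes.append((n_in, n_out))  # fc6, fc7, fc8 kernels
        shapes.append((n_out,))       # their biases
    return shapes


def count_params(shapes):
    total = 0
    for shape in shapes:
        size = 1
        for dim in shape:
            size *= dim
        total += size
    return total


shapes = vgg16_weight_shapes()
print(len(shapes), count_params(shapes))  # 32 138357544
```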

In [None]:
from keras.activations import relu

In [None]:
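# relu needs an input tensor; calling relu() with no arguments raises a TypeError
relu(tf.constant([-1.0, 0.0, 2.0]))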

In [4]:
import keras
from keras.layers import Input, Conv2D, MaxPool2D, Activation, Dense, Dropout, Flatten

img_input = Input(shape=(224, 224, 3), name="input")

# block 1 (MaxPool2D defaults: 2x2 window, stride 2)
x = Conv2D(64, 3, padding="same", name="conv1_1")(img_input)
x = Activation("relu", name="relu1_1")(x)
x = Conv2D(64, 3, padding="same", name="conv1_2")(x)
x = Activation("relu", name="relu1_2")(x)
x = MaxPool2D(name="pool1")(x)

# block 2
x = Conv2D(128, 3, padding="same", name="conv2_1")(x)
x = Activation("relu", name="relu2_1")(x)
x = Conv2D(128, 3, padding="same", name="conv2_2")(x)
x = Activation("relu", name="relu2_2")(x)
x = MaxPool2D(name="pool2")(x)

# block 3
x = Conv2D(256, 3, padding="same", name="conv3_1")(x)
x = Activation("relu", name="relu3_1")(x)
x = Conv2D(256, 3, padding="same", name="conv3_2")(x)
x = Activation("relu", name="relu3_2")(x)
x = Conv2D(256, 3, padding="same", name="conv3_3")(x)
x = Activation("relu", name="relu3_3")(x)
x = MaxPool2D(name="pool3")(x)

# block 4
x = Conv2D(512, 3, padding="same", name="conv4_1")(x)
x = Activation("relu", name="relu4_1")(x)
x = Conv2D(512, 3, padding="same", name="conv4_2")(x)
x = Activation("relu", name="relu4_2")(x)
x = Conv2D(512, 3, padding="same", name="conv4_3")(x)
x = Activation("relu", name="relu4_3")(x)
x = MaxPool2D(name="pool4")(x)

# block 5
x = Conv2D(512, 3, padding="same", name="conv5_1")(x)
x = Activation("relu", name="relu5_1")(x)
x = Conv2D(512, 3, padding="same", name="conv5_2")(x)
x = Activation("relu", name="relu5_2")(x)
x = Conv2D(512, 3, padding="same", name="conv5_3")(x)
x = Activation("relu", name="relu5_3")(x)
x = MaxPool2D(name="pool5")(x)

# classifier
x = Flatten(name="flatten")(x)

x = Dense(4096, name="fc6")(x)
x = Activation("relu", name="relu6")(x)
x = Dropout(0.5, name="drop6")(x)

x = Dense(4096, name="fc7")(x)
x = Activation("relu", name="relu7")(x)
x = Dropout(0.5, name="drop7")(x)

x = Dense(1000, name="fc8")(x)
x = Activation("softmax", name="prob")(x)

model = keras.Model(img_input, x)

You have to implement two functions in the cell below.

The preprocess function should take an image with shape (h, w, 3) and transform it into a tensor with shape (1, 224, 224, 3); without this transformation, VGG16 won't be able to digest the input image.
Additionally, your preprocessing function has to rearrange the channels from RGB to BGR and subtract the mean value from every channel.

In [5]:
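from skimage.transform import resize  # assumption: scikit-image is installed

MEAN_VALUES = np.array([104, 117, 123])  # per-channel means, in BGR order
IMAGE_W = 224

def preprocess(img):
    """(h, w, 3) RGB image -> (1, 224, 224, 3) float BGR tensor with the mean subtracted."""
    img = img.astype(np.float32)  # work on a float copy, never modify the caller's array
    if img.shape[:2] != (IMAGE_W, IMAGE_W):
        # resize to 224x224, keeping pixel values in the original 0..255 range
        img = resize(img, (IMAGE_W, IMAGE_W), preserve_range=True)
    img = img[:, :, ::-1] - MEAN_VALUES  # RGB -> BGR, then subtract the per-channel mean
    # convert from [224, 224, 3] to [1, 224, 224, 3]
    return img[None]

def deprocess(img):
    img = img.reshape(img.shape[1:]) + MEAN_VALUES  # drop the batch axis, add the mean back
    # BGR -> RGB, clip to the valid pixel range, back to uint8
    return np.clip(img[:, :, ::-1], 0, 255).astype(np.uint8)

img = (np.random.rand(IMAGE_W, IMAGE_W, 3) * 256).astype(np.uint8)

print(np.linalg.norm(deprocess(preprocess(img)).astype(float) - img))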

0.0


If your implementation is correct, the number above will be zero (or very close to it), because `deprocess` is the inverse of `preprocess`.

### Deploy the network

In [6]:
# load vgg16 weights
import h5py
with h5py.File("vgg16_weights_tf_dim_ordering_tf_kernels.h5", "r") as f:
    # Dataset.value was removed in h5py 3.0; ds[()] reads the whole array into memory
    vgg16_weights = {k1: {k2: v2[()] for k2, v2 in v1.items()}
                     for k1, v1 in f.items() if len(v1) > 0}


In [None]:
[[y.shape for y in x.values()] for x in vgg16_weights.values()]

Now we should put the weights into their places:

In [None]:
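weight_list = []
# h5py lists groups and datasets alphabetically; this relies on that order
# matching the layer order of `model` (and kernel coming before bias in each layer)
for layer_weights in vgg16_weights.values():
    weight_list.extend(layer_weights.values())
model.set_weights(weight_list)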

In [None]:
# classes' names are stored here
with open("classes.txt", "r") as f:
    classes = f.read().splitlines()
# for example, 10th class is ostrich:
print(classes[9])

### Sanity check
Let's check that our pre-trained network works. We have a sample image of an albatross; let's check that our network predicts it correctly.

In [None]:
img = imread('albatross.jpg')
plt.imshow(img)
plt.show()
print("note that image array type is", img.dtype)


p = model.predict(preprocess(img))

labels = p.ravel().argsort()[-1:-6:-1]  # indices of the 5 highest probabilities, best first
print('top-5 classes are:')
for l in labels:
    print('%.3f\t%s' % (p.ravel()[l], classes[l].split(',')[0]))
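`p.ravel().argsort()[-1:-6:-1]` sorts all 1000 scores and then reads the last five indices backwards. Here is a minimal NumPy sketch of the same top-k selection. It uses a hypothetical `top_k` helper that calls `np.argpartition`, which avoids a full sort when only k entries are needed.

```python
import numpy as np

def top_k(probs, k=5):
    """Return the indices of the k largest scores, highest first."""
    probs = np.ravel(probs)
    # argpartition moves the k largest entries into the last k slots (in no particular order)
    idx = np.argpartition(probs, -k)[-k:]
    # order those k indices by descending score
    return idx[np.argsort(probs[idx])[::-1]]

p_demo = np.array([[0.05, 0.6, 0.1, 0.2, 0.05]])
print(top_k(p_demo, k=3))  # [1 3 2]
```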

# Grand-quest: Dogs Vs Cats
* original competition
* https://www.kaggle.com/c/dogs-vs-cats
* 25k JPEG images of various sizes, 2 classes (guess what)

### Your main objective
* In this seminar your goal is to fine-tune a pre-trained model to distinguish between the two rivaling animals
* The first step is to just reuse some network layer as features

In [None]:
!wget "https://www.dropbox.com/s/d61lupw909hc785/dogs_vs_cats.train.zip?dl=1" -O data.zip
!unzip data.zip

# for starters
* Train an sklearn model and evaluate validation accuracy (should be >80%)


In [None]:
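from keras.models import Model

# sub-model that returns the relu6 activations of VGG16 instead of class probabilities
intermediate_layer_model = Model(inputs=model.input,
                                 outputs=model.get_layer('relu6').output)
intermediate_output = intermediate_layer_model.predict(preprocess(img))
intermediate_output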

In [None]:
#extract features from images
import os
from keras.models import Model
from tqdm import tqdm
from skimage.transform import resize  # scipy.misc.imresize was removed in SciPy 1.3

X = []
Y = []
layer_name = 'relu6'
intermediate_layer_model = Model(inputs=model.input,
                                 outputs=model.get_layer(layer_name).output)
#this may be a tedious process. If so, store the results in some pickle and re-use them.
for fname in tqdm(os.listdir('train/')):
    y = fname.startswith("cat")  # label: True for cats, False for dogs
    img = imread("train/" + fname)
    # resize to 224x224, keeping pixel values in the original 0..255 range
    img = resize(img, (IMAGE_W, IMAGE_W), preserve_range=True)
    features = intermediate_layer_model.predict(preprocess(img))
    Y.append(y)
    X.append(features)
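The comment in the loop suggests caching the features. Below is a sketch of a cache-or-compute helper. It assumes the features fit in memory as one numeric array, so `np.save`/`np.load` stand in for pickle. `cached_features` and the dummy extractor are hypothetical names used only for illustration.

```python
import os
import tempfile

import numpy as np

def cached_features(path, compute_fn):
    """Load features from `path` if it exists, otherwise compute them and save them there."""
    if os.path.exists(path):
        return np.load(path)
    features = np.asarray(compute_fn())
    np.save(path, features)  # path should end in .npy, otherwise np.save appends it
    return features

# demo with a throwaway directory and a dummy feature extractor
demo_path = os.path.join(tempfile.mkdtemp(), "features.npy")
calls = []

def dummy_extract():
    calls.append(1)
    return np.ones((3, 4096), dtype=np.float32)

first = cached_features(demo_path, dummy_extract)
second = cached_features(demo_path, dummy_extract)  # loaded from disk, no recomputation
print(len(calls), second.shape)  # 1 (3, 4096)
```

In the notebook, you would wrap the whole extraction loop in a function and pass it as `compute_fn`.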


In [None]:
np.save("x.npy",X)
np.save("y.npy",Y)

In [2]:
X = np.load("x.npy")
Y = np.load("y.npy")

In [3]:

X = np.concatenate(X) #stack all [1xfeature] matrices into one. 
assert X.ndim==2
#WARNING! the concatenate works for [1xN] matrices. If you have other format, stack them yourself.

#crop if we ended prematurely
Y = Y[:len(X)]

In [4]:
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in 0.20



In [28]:
X[0]

array([ 0.      , 12.103239,  0.      , ...,  0.      ,  0.      ,
        0.      ], dtype=float32)

In [5]:
X_,X_test,Y_,Y_test = train_test_split(X,Y,test_size = 0.2)
X_train,X_val,Y_train,Y_val = train_test_split(X_,Y_,test_size = 0.25)
print(X_train.shape,X_val.shape,X_test.shape)

(15000, 4096) (5000, 4096) (5000, 4096)
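The two calls above produce a 60/20/20 split. `test_size=0.2` holds out 5000 of the 25000 samples, and `test_size=0.25` on the remaining 20000 holds out another 5000 for validation. The sketch below repeats the same two-stage split on toy data and adds `stratify=`, a real `train_test_split` argument, so that every part keeps the cat/dog ratio. The toy arrays and the `random_state` value are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X_toy = np.arange(100, dtype=float).reshape(100, 1)
Y_toy = np.array([0, 1] * 50)  # balanced binary labels

# 80/20 first, then 75/25 of the remaining 80 -> 60/20/20 overall
X_rest, X_te, Y_rest, Y_te = train_test_split(
    X_toy, Y_toy, test_size=0.2, stratify=Y_toy, random_state=0)
X_tr, X_va, Y_tr, Y_va = train_test_split(
    X_rest, Y_rest, test_size=0.25, stratify=Y_rest, random_state=0)
print(len(X_tr), len(X_va), len(X_te))  # 60 20 20
```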


__load our dakka__
![img](https://s-media-cache-ak0.pinimg.com/564x/80/a1/81/80a1817a928744a934a7d32e7c03b242.jpg)

In [49]:
Y_train = Y_train.astype(float)

In [50]:
Y_train

array([0., 0., 1., ..., 1., 1., 1.])

In [None]:
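from keras.utils import to_categorical

# one-hot labels; only needed with loss="categorical_crossentropy", not with the sparse variant
Y_train_onehot = to_categorical(Y_train, num_classes=2)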

In [54]:
from keras.layers import Input, Activation, Dense, Dropout

# a small classifier head trained on the 4096-dimensional relu6 features
img_input = Input(shape=(4096,), name="input")

x = Dense(4096, name="fc6")(img_input)
x = Activation("relu", name="relu6")(x)
x = Dropout(0.5, name="drop6")(x)

x = Dense(4096, name="fc7")(x)
x = Activation("relu", name="relu7")(x)
x = Dropout(0.5, name="drop7")(x)

# two output units: a softmax over a single unit always returns 1.0, so the model could not learn
x = Dense(2, name="fc8")(x)
x = Activation("softmax", name="prob")(x)

model = keras.Model(img_input, x)

# sparse_categorical_crossentropy takes integer labels 0/1 directly (no one-hot needed);
# "adam" uses Keras' default settings (learning rate 0.001), matching the original arguments
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])


In [None]:
model.fit(X_train, Y_train, validation_data=(X_val, Y_val.astype(float)))

Epoch 1/1


In [6]:
from sklearn.ensemble import RandomForestClassifier,ExtraTreesClassifier,GradientBoostingClassifier,AdaBoostClassifier
from sklearn.linear_model import LogisticRegression, RidgeClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
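These classifiers are imported, but none of the cells shown fits a `clf`, even though later cells call `clf.score` and `clf.predict_proba`. Below is a minimal sketch of the "for starters" baseline. It fits a `LogisticRegression` on training features and scores it on validation features. Here `make_classification` generates stand-in data so that the snippet runs on its own, and `C=1.0` and `max_iter=1000` are untuned assumptions. On the real data, you would pass `X_train, Y_train` and `X_val, Y_val` instead.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# stand-in for the (n_samples, 4096) VGG features and binary labels
X_demo, Y_demo = make_classification(n_samples=1000, n_features=50,
                                     n_informative=10, random_state=0)
X_tr, X_va, Y_tr, Y_va = train_test_split(X_demo, Y_demo, test_size=0.25,
                                          random_state=0)

clf = LogisticRegression(C=1.0, max_iter=1000)
clf.fit(X_tr, Y_tr)
val_acc = clf.score(X_va, Y_va)  # mean accuracy on held-out data
proba = clf.predict_proba(X_va)  # shape (n_samples, 2); columns follow clf.classes_
print("validation accuracy: %.3f" % val_acc)
```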

In [7]:
X_train.shape

(15000, 4096)

In [8]:
# placeholders for the input features and the integer class labels
input_X = tf.placeholder(tf.float32, shape=[None, 4096], name="X")
target_y = tf.placeholder(tf.int32, shape=[None], name="target_Y_integer")

In [9]:
# tf.layers.dropout is only active when training=True (its default is False);
# feed {is_training: True} to the training steps to switch it on
is_training = tf.placeholder_with_default(False, shape=[], name="is_training")

dense1 = tf.layers.dense(input_X, units=4096, activation=tf.nn.sigmoid)
dropout3 = tf.layers.dropout(dense1, training=is_training)

dense3 = tf.layers.dense(dropout3, units=4096, activation=tf.nn.sigmoid)
dropout4 = tf.layers.dropout(dense3, training=is_training)

dense4 = tf.layers.dense(dropout4, units=2024, activation=tf.nn.sigmoid)
dropout5 = tf.layers.dropout(dense4, training=is_training)

dense5 = tf.layers.dense(dropout5, units=1024, activation=tf.nn.sigmoid)
dropout6 = tf.layers.dropout(dense5, training=is_training)

# two logits, one per class
dense2 = tf.layers.dense(dropout6, units=2, activation=None)

# We use softmax nonlinearity to make probabilities add up to 1
l_out = tf.nn.softmax(dense2)

y_predicted = tf.argmax(dense2, axis=-1)

In [10]:
weights = tf.trainable_variables()
weights

[<tf.Variable 'dense/kernel:0' shape=(4096, 4096) dtype=float32_ref>,
 <tf.Variable 'dense/bias:0' shape=(4096,) dtype=float32_ref>,
 <tf.Variable 'dense_1/kernel:0' shape=(4096, 4096) dtype=float32_ref>,
 <tf.Variable 'dense_1/bias:0' shape=(4096,) dtype=float32_ref>,
 <tf.Variable 'dense_2/kernel:0' shape=(4096, 2024) dtype=float32_ref>,
 <tf.Variable 'dense_2/bias:0' shape=(2024,) dtype=float32_ref>,
 <tf.Variable 'dense_3/kernel:0' shape=(2024, 1024) dtype=float32_ref>,
 <tf.Variable 'dense_3/bias:0' shape=(1024,) dtype=float32_ref>,
 <tf.Variable 'dense_4/kernel:0' shape=(1024, 2) dtype=float32_ref>,
 <tf.Variable 'dense_4/bias:0' shape=(2,) dtype=float32_ref>]

In [28]:
loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=target_y, logits=dense2, name="softmax_loss"))
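`tf.nn.sparse_softmax_cross_entropy_with_logits` takes raw logits and integer labels and returns, for each example, -log(softmax(logits)[label]). The NumPy sketch below performs that computation; it is not TensorFlow's implementation. It subtracts the row maximum first, which leaves the softmax unchanged but keeps `exp` from overflowing.

```python
import numpy as np

def sparse_softmax_cross_entropy(logits, labels):
    """Per-example -log softmax(logits)[label] for integer labels."""
    logits = np.asarray(logits, dtype=np.float64)
    # subtracting the row max does not change softmax but avoids overflow in exp
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels]

logits = np.array([[0.0, 0.0], [2.0, -1.0]])
labels = np.array([0, 1])
losses = sparse_softmax_cross_entropy(logits, labels)
print(losses, losses.mean())  # the mean is what tf.reduce_mean(...) would report
```

The first example is a coin flip, so its loss is log(2). The second puts most of the mass on the wrong class, so its loss is much larger.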

In [12]:
accuracy, update_accuracy = tf.metrics.accuracy(target_y, y_predicted)
tf.local_variables()

[<tf.Variable 'accuracy/total:0' shape=() dtype=float32_ref>,
 <tf.Variable 'accuracy/count:0' shape=() dtype=float32_ref>]

In [27]:
optimizer = tf.train.AdamOptimizer(learning_rate=0.001)
train_step = optimizer.minimize(loss)

In [14]:
# An auxiliary function that returns mini-batches for neural network training

# Parameters
# inputs - a feature matrix with shape (n_samples, 4096), e.g. X_train
# targets - a vector of labels for the corresponding rows, e.g. Y_train
# batchsize - a single number, the intended size of each batch
#             (the last incomplete batch is dropped)

def iterate_minibatches(inputs, targets, batchsize):
    assert len(inputs) == len(targets)
    indices = np.arange(len(inputs))
    np.random.shuffle(indices)
    for start_idx in range(0, len(inputs) - batchsize + 1, batchsize):
        excerpt = indices[start_idx:start_idx + batchsize]
        yield inputs[excerpt], targets[excerpt]

In [15]:
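import os

model_path = "./checkpoints/model.ckpt"
os.makedirs(os.path.dirname(model_path), exist_ok=True)  # make sure the checkpoint folder exists
saver = tf.train.Saver(max_to_keep=5)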

In [None]:
import time

num_epochs = 100 # amount of passes through the data

batch_size = 128 # number of samples processed at each function call

with tf.Session() as sess:
    # initialize global variables
    sess.run(tf.global_variables_initializer())
    # to resume from the newest checkpoint instead, uncomment:
    # ckpt_path = tf.train.latest_checkpoint("./checkpoints")
    # saver.restore(sess, ckpt_path)
    # print("Model restored from file: %s" % ckpt_path)

    sess.run(tf.local_variables_initializer())
    for epoch in range(num_epochs):
        # In each epoch, we do a full pass over the training data:
        train_err = 0
        train_batches = 0
        start_time = time.time()

        sess.run(tf.local_variables_initializer())
        for batch in iterate_minibatches(X_train, Y_train,batch_size):
            inputs, targets = batch

            _, train_err_batch, _ = sess.run(
                [train_step, loss, update_accuracy], 
                feed_dict={input_X: inputs, target_y:targets}
            )
            train_err += train_err_batch
            train_batches += 1
        train_acc = sess.run(accuracy)

        # And a full pass over the validation data:
        sess.run(tf.local_variables_initializer())
        for batch in iterate_minibatches(X_val, Y_val, batch_size):
            inputs, targets = batch
            sess.run(update_accuracy, feed_dict={input_X: inputs, 
                                                 target_y:targets})
        val_acc = sess.run(accuracy)


        # Then we print the results for this epoch:
        print("Epoch {} of {} took {:.3f}s".format(
            epoch + 1, num_epochs, time.time() - start_time))

        print("  training loss (in-iteration):\t\t{:.6f}".format(train_err / train_batches))
        print("  train accuracy:\t\t{:.2f} %".format(
            train_acc * 100))
        print("  validation accuracy:\t\t{:.2f} %".format(
            val_acc * 100))
        
        # save model
        save_path = saver.save(sess, model_path, global_step=epoch)
        print("  Model saved in file: %s" % save_path)
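Note that `max_to_keep=5` keeps only the five most recent checkpoints, so the epoch with the best validation accuracy may already be deleted when training ends. One option is to remember the best validation accuracy and stop after `patience` epochs without improvement. The plain-Python sketch below does this with a hypothetical `BestEpochTracker` class, and the patience value is an assumption. Inside the loop, you would call `tracker.update(epoch, val_acc)`, save a separate checkpoint when it returns True, and `break` when `tracker.should_stop()` is True.

```python
class BestEpochTracker:
    """Track the best validation score and signal early stopping."""

    def __init__(self, patience=5):
        self.patience = patience
        self.best_score = float("-inf")
        self.best_epoch = None
        self.epochs_without_improvement = 0

    def update(self, epoch, score):
        """Record one epoch; return True if it is the new best."""
        if score > self.best_score:
            self.best_score = score
            self.best_epoch = epoch
            self.epochs_without_improvement = 0
            return True
        self.epochs_without_improvement += 1
        return False

    def should_stop(self):
        return self.epochs_without_improvement >= self.patience


tracker = BestEpochTracker(patience=2)
for epoch, val_acc in enumerate([0.90, 0.95, 0.94, 0.93, 0.96]):
    tracker.update(epoch, val_acc)
    if tracker.should_stop():
        break
print(tracker.best_epoch, tracker.best_score, epoch)  # 1 0.95 3
```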

In [16]:
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    ckpt_path = "./checkpoints/model.ckpt-4"
    saver.restore(sess, ckpt_path)  # restore() returns None, so print the path instead
    print("Model restored from file: %s" % ckpt_path)

    sess.run(tf.local_variables_initializer())
    for batch in iterate_minibatches(X_test, Y_test, 500):
        inputs, targets = batch
        sess.run(update_accuracy, feed_dict={input_X: inputs, target_y: targets})
    test_acc = sess.run(accuracy)

    print("Final results:")
    print("  test accuracy:\t\t{:.2f} %".format(test_acc * 100))

    if test_acc * 100 > 99.5:
        print("Achievement unlocked: 80lvl Warlock!")
    else:
        print("We need more magic!")

INFO:tensorflow:Restoring parameters from ./checkpoints/model.ckpt-4
Model restored from file: ./checkpoints/model.ckpt-4
Final results:
  test accuracy:		98.88 %
We need more magic!


In [21]:
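with tf.Session() as sess:
    ckpt_path = "./checkpoints/model.ckpt-3"
    saver.restore(sess, ckpt_path)  # restore() sets every saved variable, no initializer needed
    print("Model restored from file: %s" % ckpt_path)

    # X_test_kaggle (12500 x 4096) is loaded and flattened in the cells below;
    # predict in chunks and keep every batch instead of overwriting `out` each time
    out = []
    for start in range(0, len(X_test_kaggle), 500):
        batch = X_test_kaggle[start:start + 500]
        out.append(sess.run(l_out, feed_dict={input_X: batch}))
    out = np.concatenate(out)  # (12500, 2) class probabilities

out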

INFO:tensorflow:Restoring parameters from ./checkpoints/model.ckpt-3



In [None]:
# `clf` is assumed to be an sklearn classifier already fitted on X_train, Y_train
clf.score(X_test, Y_test)

In [18]:
X_test_kaggle = np.load("x_test.npy")
X_test_kaggle.shape

(12500, 1, 4096)

In [19]:
X_test_kaggle = np.concatenate(X_test_kaggle) #stack all [1xfeature] matrices into one. 
assert X_test_kaggle.ndim==2

In [None]:
X_test.shape

In [None]:
# class probabilities for the 12500 Kaggle test images (not the internal X_test split)
y = clf.predict_proba(X_test_kaggle)

In [None]:
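# Y was built with fname.startswith("cat"), so column 0 is P(dog) and column 1 is P(cat);
# the Redux competition asks for the probability that the image is a dog
y = y[:, 0]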

In [None]:
y[3]

In [None]:
#extract image ids from the test file names
import os

# "123.jpg" -> 123; this assumes x_test.npy was built by iterating os.listdir('test/')
# in this same order, so that row i of the features matches a[i]
a = []
for fname in os.listdir('test/'):
    a.append(int(fname.split(".")[0]))
a[3]

In [None]:
kaggle = np.vstack((a,y)).T

In [None]:
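import pandas as pd

# Kaggle expects the header "id,label" and no index column
df = pd.DataFrame(kaggle, columns=["id", "label"])
df["id"] = df["id"].astype(int)
df.to_csv("kaggle.csv", index=False)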
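The Redux competition is scored with log loss, which heavily penalizes a single confident mistake (for example, predicting 0.0 for a dog). A common hack is to clip probabilities away from 0 and 1 before writing the file. The sketch below uses only `csv` and NumPy. `write_submission` is a hypothetical helper, and the clip bounds 0.005/0.995 are an untuned assumption.

```python
import csv
import os
import tempfile

import numpy as np

def write_submission(ids, dog_probs, path, eps=0.005):
    """Write an id,label CSV sorted by id, with probabilities clipped to [eps, 1 - eps]."""
    ids = np.asarray(ids, dtype=int)
    probs = np.clip(np.asarray(dog_probs, dtype=float), eps, 1 - eps)
    order = np.argsort(ids)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "label"])
        for i in order:
            writer.writerow([int(ids[i]), float(probs[i])])

# demo: three images with ids out of order
demo_path = os.path.join(tempfile.mkdtemp(), "submission.csv")
write_submission([3, 1, 2], [1.0, 0.0, 0.7], demo_path)
with open(demo_path) as f:
    rows = list(csv.reader(f))
print(rows)
```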


# Main quest

* Get the score improved!

No methods are illegal: ensembling, data augmentation, NN hacks. 
Just don't let test data slip into training.

The main requirement is that you implement the NN fine-tuning recipe:
### Split the raw image data
  * please do train/validation/test instead of just train/test
  * reasonable but not optimal split is 20k/2.5k/2.5k or 15k/5k/5k
### Choose which VGG layers you are going to use
  * Any layer except prob is okay
  * Do not forget that VGG16 uses dropout (drop6, drop7)
### Build a few layers on top of chosen "neck" layers.
  * a good idea is to just stack more layers inside the same network
  * alternative: build a new model on top of the chosen layer's `.output`
### Train the newly added layers for some iterations
  * you can selectively train some weights by only sending them to your optimizer
      * `mysupermegaoptimizer.minimize(loss, var_list=<only_those_weights_i_wanna_train>)`
  * it's crucial to monitor the network performance at this and the following steps
### Fine-tune the network body
  * probably a good idea to SAVE your new network weights now 'cuz it's easy to mess things up.
  * Moreover, saving weights periodically is a no-nonsense idea
  * even more crucial to monitor validation performance
  * main network body may need a separate, much lower learning rate
      * You can have two optimizers: one for the old network and one for the new network
      * `old_net_optimizer.minimize(loss, var_list=old_net_weights)`
      * `new_net_optimizer.minimize(loss, var_list=new_net_weights)`
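The two-optimizer idea comes down to taking a smaller step on the pretrained body than on the new head. Below is a framework-free NumPy sketch of one SGD step with a learning rate per parameter group. The group names and the 1e-4/1e-2 rates are illustrative assumptions. In TensorFlow, the same split is expressed with two optimizers whose `minimize` calls receive disjoint `var_list`s.

```python
import numpy as np

def sgd_step(params, grads, lrs):
    """One SGD update; params/grads map group -> list of arrays, lrs maps group -> rate."""
    for group, arrays in params.items():
        for p, g in zip(arrays, grads[group]):
            p -= lrs[group] * g  # in-place update of the parameter array
    return params

params = {"body": [np.ones(3)], "head": [np.ones(3)]}
grads = {"body": [np.ones(3)], "head": [np.ones(3)]}
lrs = {"body": 1e-4, "head": 1e-2}  # the pretrained body moves 100x slower than the head
sgd_step(params, grads, lrs)
print(params["body"][0], params["head"][0])
```

Setting a group's rate to 0 freezes it, which corresponds to the earlier step of training only the newly added layers.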
### PROFIT!!!
  * Evaluate the final score
  * Submit to kaggle
      * competition page https://www.kaggle.com/c/dogs-vs-cats-redux-kernels-edition
      * get test data https://www.kaggle.com/c/dogs-vs-cats-redux-kernels-edition/data
  
## Some ways to get bonus points
* explore other networks from the model zoo
* play with architecture
* 85%/90%/93%/95%/97% kaggle score (screenshot pls).
* data augmentation, prediction-time data augmentation
* use any more advanced fine-tuning technique you know/read anywhere
* ml hacks that benefit the final score
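For the data-augmentation bonus, here is a NumPy sketch of two standard train-time transforms, a random horizontal flip and a random crop. It also shows prediction-time augmentation, which averages a model's output over an image and its mirror. `predict_fn` stands for any function that maps a batch of images to class probabilities, such as a wrapped `model.predict`. The crop size, the toy model and the helper names are hypothetical.

```python
import numpy as np

def random_flip_and_crop(img, crop, rng):
    """Randomly mirror an (h, w, c) image left-right, then take a random crop x crop patch."""
    if rng.random() < 0.5:
        img = img[:, ::-1]
    h, w = img.shape[:2]
    top = rng.integers(0, h - crop + 1)
    left = rng.integers(0, w - crop + 1)
    return img[top:top + crop, left:left + crop]

def predict_with_flip_tta(predict_fn, img):
    """Average the predictions for an image and its horizontal mirror."""
    batch = np.stack([img, img[:, ::-1]])
    return predict_fn(batch).mean(axis=0)

rng = np.random.default_rng(0)
image = rng.random((256, 256, 3))
patch = random_flip_and_crop(image, 224, rng)
print(patch.shape)  # (224, 224, 3)

# toy "model": probability of class 1 = mean brightness of the left half of the image
def toy_predict(batch):
    p1 = batch[:, :, : batch.shape[2] // 2].mean(axis=(1, 2, 3))
    return np.stack([1 - p1, p1], axis=1)

tta_probs = predict_with_flip_tta(toy_predict, image)
print(tta_probs)
```

With flip averaging, the toy model sees both halves of the image, so its averaged class-1 score equals the overall mean brightness.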


In [None]:
print("I can do it!")
