Hello Kagglers. Working on Kaggle kernels is fun, but the purpose of this kernel is a little different. I am not going to do a typical EDA or typical data modelling here; instead, I would love to share some cool things. We are going to dive into the following topics:
* How to add pre-trained Keras models to your kernel, and **why you need to do that at all**
* Which generator should I use for my model: the built-in one or a custom one?
* How to use the Keras ImageDataGenerator effectively in kernels

## Adding Keras pre-trained models to your kernel

Transfer learning (I am assuming here that you know about it) **almost always** works. Before doing any serious modelling, people like me always start with transfer learning to get a baseline. For this, we need pre-trained models, and Keras provides a lot of SOTA pre-trained models. When you use a pre-trained architecture for the first time, Keras downloads the weights for the corresponding model, *but* kernels can't use a network connection to download pre-trained Keras model weights. So, the big question is: `If kernels can't use a network connection to download pre-trained weights, how can I use them at all?`

This is a great question, and for beginners or people just getting started with Kaggle kernels it can be very confusing. To make pre-trained Keras model weights usable, people have uploaded the weights to a kernel and published it. Now here is the catch: **you can add the output of any other kernel as an input data source for your kernel**. Follow these simple steps:
* On the top-left of your notebook, there is an `Input Files` cell. Expand it by clicking the `+` button.
* You will see a list of input data files on the left along with the description of the data on the right.
* Click the `Add Data Source` button. A window will appear.
* In the search bar, search for something like `VGG16 pretrained` or `Keras-pretrained`.
* Choose the kernel you want to add. That's it!!

Now if you expand your `Input Files` cell again, you will see the pre-trained model listed as input files along with your dataset.


You can see that my kernel has two kinds of input files:
* flowers-recognition dataset
* vgg16 pre-trained model kernel that I added to my kernel

Keras expects the pre-trained weights to be present in the `~/.keras/models` cache directory, so copy the weight files from the added kernel's output into that directory before building the model.
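
A minimal sketch of that copy step, assuming the added kernel exposes its weight files under `../input/vgg16/` (the path that shows up later in this notebook; adjust it to whatever your `Input Files` cell lists). The helper name `copy_weights_to_keras_cache` is my own and not part of Keras.

```python
import glob
import os
import shutil


def copy_weights_to_keras_cache(src_dir, cache_dir=None):
    """Copy every .h5 weight file from src_dir into the Keras model cache.

    Returns the list of destination paths that were written.
    """
    if cache_dir is None:
        # Keras looks for already-downloaded weights in ~/.keras/models.
        cache_dir = os.path.join(os.path.expanduser("~"), ".keras", "models")
    os.makedirs(cache_dir, exist_ok=True)

    copied = []
    for src in sorted(glob.glob(os.path.join(src_dir, "*.h5"))):
        dst = os.path.join(cache_dir, os.path.basename(src))
        shutil.copy(src, dst)
        copied.append(dst)
    return copied


# Assumed input location; the copy is skipped if the directory is missing.
SRC_DIR = "../input/vgg16/"
if os.path.isdir(SRC_DIR):
    print(copy_weights_to_keras_cache(SRC_DIR))
```

The file names must stay unchanged, because Keras looks the cached weights up by file name.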

That's it!! Now, you can use pre-trained models for transfer learning or fine-tuning.

## Which generator should I use for my model: a custom one or the default Keras ImageDataGenerator?

This is a very interesting question. I would say that it depends on how your dataset is arranged, or on how you are going to set up your data. These are the scenarios I can think of, along with the corresponding solutions. If you think of any more, do let me know in the comments section.

* **Data is arranged class-wise in separate directories with corresponding names**: This is the best way to arrange your data, if possible. It takes some time to arrange the data this way, but it is the way to go if you want to use the Keras ImageDataGenerator efficiently, because `flow_from_directory` requires the images to be separated class-wise into different folders. The layout should look like this:
```
data/
    train/
        category1/(contains all images related to category1)
        category2/(contains all images related to category2)
        ...
        ...

    validation/
        category1/(contains all images related to category1)
        category2/(contains all images related to category2)
        ...
        ...
```
Later in this kernel, I will show how to create this structure within the kernel so that ImageDataGenerator can be used.

* **All data is within one folder and you have meta info about the images**: This is a very common case. When we quickly crawl data, we generally store the meta info about the images in a CSV and all the images are stored in a single folder. There are two ways to deal with this situation, provided you don't want to segregate the images as in the first case.
  * Define your own simple Python generator which yields batches of images and labels while reading the CSV.
  * Use another high-level API such as the `Dataset` API and let it do the work for you.
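
The first option can be sketched roughly as follows. Everything specific here is an assumption for illustration: the CSV is taken to have a header row with `filename` and `label` columns, and `load_image` is a callable you supply that turns a file name into a fixed-size NumPy array.

```python
import csv
import random

import numpy as np


def csv_batch_generator(csv_path, load_image, class_names,
                        batch_size=32, shuffle=True):
    """Yield (images, one_hot_labels) batches forever, as Keras expects.

    csv_path    : CSV with a header row and "filename" / "label" columns
                  (assumed layout).
    load_image  : callable mapping a filename to a fixed-size NumPy array.
    class_names : list of all labels; its order defines the one-hot columns.
    """
    with open(csv_path, newline="") as f:
        rows = [(r["filename"], r["label"]) for r in csv.DictReader(f)]
    class_index = {name: i for i, name in enumerate(class_names)}

    while True:
        if shuffle:
            random.shuffle(rows)  # new sample order on every pass
        for start in range(0, len(rows), batch_size):
            batch = rows[start:start + batch_size]
            images = np.stack([load_image(name) for name, _ in batch])
            labels = np.zeros((len(batch), len(class_names)), dtype="float32")
            for i, (_, label) in enumerate(batch):
                labels[i, class_index[label]] = 1.0
            yield images, labels
```

In a kernel, `load_image` could wrap `load_img` and `img_to_array` from `keras.preprocessing.image`. Since the generator never stops, the number of steps per epoch has to be passed explicitly when fitting.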

Let's look at how to get the structure defined in the first case above. If you are not aware, Jupyter is pretty powerful and you can use bash directly within the notebook.
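
Shell commands work for this, but the same directory tree can also be created from Python. A small sketch, where `make_class_dirs` is a hypothetical helper name:

```python
import os


def make_class_dirs(base_dir, categories, splits=("train", "validation")):
    """Create base_dir/<split>/<category>/ for every split and category.

    Returns the list of directories, in creation order.
    """
    created = []
    for split in splits:
        for category in categories:
            path = os.path.join(base_dir, split, category)
            os.makedirs(path, exist_ok=True)  # no error if it already exists
            created.append(path)
    return created
```

Calling `make_class_dirs("data", ["category1", "category2"])` would produce the `data/train/...` and `data/validation/...` layout described earlier.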

For each category, copy samples to the train and validation directories which we defined in the above step. The number of samples you want in your training and validation sets is up to you.
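
One way to do the copy is sketched below. It assumes every category currently lives in a flat folder that contains only image files; `split_and_copy` and its parameters are illustrative names, not an existing API.

```python
import os
import random
import shutil


def split_and_copy(src_dir, base_dir, category, n_valid, seed=0):
    """Copy the files of one category into train/ and validation/ folders.

    n_valid randomly chosen files go to validation, the rest to train.
    Returns (number_of_train_files, number_of_validation_files).
    """
    files = sorted(os.listdir(src_dir))
    random.Random(seed).shuffle(files)  # fixed seed keeps the split reproducible
    valid_files, train_files = files[:n_valid], files[n_valid:]

    for split, names in (("train", train_files), ("validation", valid_files)):
        dst_dir = os.path.join(base_dir, split, category)
        os.makedirs(dst_dir, exist_ok=True)
        for name in names:
            shutil.copy(os.path.join(src_dir, name), os.path.join(dst_dir, name))
    return len(train_files), len(valid_files)
```

The files are copied rather than moved because the input directory of a kernel is read-only.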

That's all folks. I hope you enjoyed this. One last thing: Kaggle kernels don't provide a GPU, so the training time will depend on your architecture and the size of your dataset. Also, if you find this kernel helpful, please upvote!!

In [1]:
import numpy as np
import keras
from keras import backend as K
from keras.models import Sequential
from keras.layers import Activation,Dense,Flatten
from keras.layers.normalization import BatchNormalization
from keras.optimizers import Adam
from keras.metrics import categorical_crossentropy
from keras.preprocessing.image import ImageDataGenerator
from keras.layers.convolutional import *
from sklearn.metrics import confusion_matrix
import itertools

Using TensorFlow backend.


In [2]:
train_path = '../input/furniture/data/data/train/'
valid_path = '../input/furniture/data/data/valid/'
test_path = '../input/furnituretest/test/test/unknown/'

In [3]:
train_batches = ImageDataGenerator().flow_from_directory(train_path,target_size=(224,224),classes=['104','15','5','67','86'],batch_size=50)
valid_batches = ImageDataGenerator().flow_from_directory(valid_path,target_size=(224,224),classes=['104','15','5','67','86'],batch_size=50)

Found 972 images belonging to 5 classes.
Found 500 images belonging to 5 classes.


In [4]:
model = Sequential([
    Conv2D(32,(3,3),activation='relu',input_shape=(224,224,3)),
    Flatten(),
    Dense(5,activation='softmax'),
])

In [5]:
model.compile(Adam(lr=0.0001),loss='categorical_crossentropy',metrics=['accuracy'])

In [6]:
model.fit_generator(train_batches,validation_data=valid_batches,epochs=5,verbose=1)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7f68b254eac8>

In [7]:
preditions = model.predict_generator(valid_batches)

In [8]:
imgs,vlabels = next(valid_batches)
len(vlabels)

50

In [9]:
len(preditions)

500

In [10]:
model.evaluate_generator(valid_batches)

[12.894476509094238, 0.20000000074505805]

In [11]:
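# confusion_matrix expects 1-D arrays of class indices; vlabels is one-hot and
# preditions holds probability rows, which triggers the ValueError shown below.
# cm = confusion_matrix(vlabels,preditions[:50])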

ValueError: multilabel-indicator is not supported
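
The usual way around this error is to turn both arrays into class indices with `argmax` first. The sketch below uses small made-up arrays in place of `vlabels` and the model output, and builds the matrix with plain NumPy so it runs on its own; the same index arrays can be handed to scikit-learn's `confusion_matrix`.

```python
import numpy as np

# Toy stand-ins for the one-hot labels and the predicted probabilities.
one_hot_labels = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]])
probabilities = np.array([[0.8, 0.1, 0.1],
                          [0.2, 0.7, 0.1],
                          [0.3, 0.4, 0.3],
                          [0.1, 0.6, 0.3]])

y_true = one_hot_labels.argmax(axis=1)  # class index per sample
y_pred = probabilities.argmax(axis=1)   # most probable class per sample

# Confusion matrix by hand: rows are true classes, columns are predictions.
n_classes = one_hot_labels.shape[1]
cm = np.zeros((n_classes, n_classes), dtype=int)
np.add.at(cm, (y_true, y_pred), 1)
print(cm)
```

Note that labels and predictions only line up if they come from the same batches in the same order. With a shuffling generator, the first 50 predictions do not correspond to the batch returned by a later `next(valid_batches)`, so the generator would need `shuffle=False` for a meaningful matrix.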

In [12]:
# vggmodel = keras.applications.vgg16.VGG16()
# # vgg = '../input/vgg16/vgg16_weights_tf_dim_ordering_tf_kernels.h5'
# vggmodel.summary()

In [13]:
import os
import numpy as np
from keras.models import Sequential
from keras.layers import Activation, Dropout, Flatten, Dense
from keras.preprocessing.image import ImageDataGenerator
from keras.layers import Convolution2D, MaxPooling2D, ZeroPadding2D
from keras import optimizers

In [14]:
model = Sequential()
# The kernel size is passed as a tuple (Keras 2 API); the legacy form
# Convolution2D(32, 3, 3) only works through a deprecation shim.
model.add(Convolution2D(32, (3, 3), input_shape=(224, 224,3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Convolution2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Convolution2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Flatten())
model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(5))
# NOTE: sigmoid does not give a probability distribution over the 5 classes;
# the model is rebuilt with softmax further below.
model.add(Activation('sigmoid'))


In [15]:
model.compile(Adam(lr=0.0001),loss='categorical_crossentropy',metrics=['accuracy'])

In [16]:
model.fit_generator(train_batches,validation_data=valid_batches,epochs=5,verbose=1)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7f68b1ddfef0>

In [17]:
model.fit_generator(train_batches,validation_data=valid_batches,epochs=5,verbose=1)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7f68b1ddfe10>

In [18]:
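train_datagen_augmented = ImageDataGenerator(
        rescale=1./255,        # normalize pixel values to [0,1]
        shear_range=0.2,       # randomly applies shearing transformation
        zoom_range=0.2,        # randomly zooms in or out by up to 20%
        horizontal_flip=True)  # randomly flip the images horizontally

# NOTE: valid_batches was created without rescale, so validation images stay in
# [0,255] while training images are now in [0,1]; validation metrics are only
# comparable if the validation generator uses rescale=1./255 too.
# same flow_from_directory call as before
train_generator_augmented = train_datagen_augmented.flow_from_directory(train_path,target_size=(224,224),classes=['104','15','5','67','86'],batch_size=50)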

Found 972 images belonging to 5 classes.


In [19]:
model.compile(Adam(lr=0.00001,decay=0.0001),loss='categorical_crossentropy',metrics=['accuracy'])

In [20]:
model.fit_generator(train_generator_augmented,validation_data=valid_batches,epochs=5,verbose=1,shuffle=False)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7f68b04eeba8>

In [21]:
model = Sequential()
# Same architecture as before, now with a softmax output layer.
model.add(Convolution2D(32, (3, 3), input_shape=(224, 224,3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Convolution2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Convolution2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Flatten())
model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(5))
model.add(Activation('softmax'))


In [22]:
model.compile(Adam(lr=0.00001,decay=0.0001),loss='categorical_crossentropy',metrics=['accuracy'])

model.fit_generator(train_generator_augmented,validation_data=valid_batches,epochs=5,verbose=1,shuffle=False)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7f68b16502b0>

In [23]:
model.compile(Adam(lr=0.00001,decay=0.0001),loss='categorical_crossentropy',metrics=['accuracy'])

model.fit_generator(train_generator_augmented,validation_data=valid_batches,epochs=30,verbose=1,shuffle=False)

Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30


<keras.callbacks.History at 0x7f68b0f8b9e8>

In [27]:
test_path = '../input/furnituretest/test/test/'

In [28]:
# train_batches = ImageDataGenerator().flow_from_directory(train_path,target_size=(224,224),classes=['104','15','5','67','86'],batch_size=50)
# valid_batches = ImageDataGenerator().flow_from_directory(valid_path,target_size=(224,224),classes=['104','15','5','67','86'],batch_size=50)
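# shuffle=False keeps the predictions in the same order as test_batches.filenames.
# NOTE: the model above was trained on inputs rescaled by 1./255, so the test
# generator arguably needs ImageDataGenerator(rescale=1./255) as well.
test_batches = ImageDataGenerator().flow_from_directory(test_path,target_size=(224,224),classes=None,batch_size=5,shuffle=False)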

Found 32 images belonging to 1 classes.


In [33]:
preditions = model.predict_generator(test_batches)


In [38]:
print(preditions)

[[0.0000000e+00 0.0000000e+00 5.0214800e-28 0.0000000e+00 1.0000000e+00]
 [0.0000000e+00 0.0000000e+00 0.0000000e+00 1.0000000e+00 0.0000000e+00]
 [0.0000000e+00 1.0000000e+00 1.0940081e-29 0.0000000e+00 0.0000000e+00]
 [0.0000000e+00 0.0000000e+00 1.0000000e+00 0.0000000e+00 0.0000000e+00]
 [0.0000000e+00 1.0000000e+00 0.0000000e+00 7.5649786e-14 0.0000000e+00]
 [4.6142077e-08 1.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00]
 [0.0000000e+00 0.0000000e+00 1.2430840e-20 0.0000000e+00 1.0000000e+00]
 [0.0000000e+00 5.6755841e-01 4.3244159e-01 0.0000000e+00 0.0000000e+00]
 [0.0000000e+00 0.0000000e+00 1.0000000e+00 0.0000000e+00 0.0000000e+00]
 [9.9999845e-01 1.5055867e-06 0.0000000e+00 0.0000000e+00 0.0000000e+00]
 [0.0000000e+00 2.3495356e-37 0.0000000e+00 0.0000000e+00 1.0000000e+00]
 [0.0000000e+00 0.0000000e+00 0.0000000e+00 1.0000000e+00 0.0000000e+00]
 [0.0000000e+00 1.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00]
 [3.3409903e-32 1.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00]
 [5.6703093e-06 9.9999428e-01 0.0000000e+00 1.0264415e-08 0.0000000e+00]
 [0.0000000e+00 0.0000000e+00 1.0000000e+00 0.0000000e+00 0.0000000e+00]
 [1.4066793e-01 8.5933208e-01 0.0000000e+00 0.0000000e+00 0.0000000e+00]
 [0.0000000e+00 4.7314298e-04 0.0000000e+00 9.9952686e-01 0.0000000e+00]
 [7.7310508e-10 1.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00]
 [0.0000000e+00 3.9070974e-19 1.0000000e+00 0.0000000e+00 0.0000000e+00]
 [2.3400735e-05 9.9997663e-01 0.0000000e+00 0.0000000e+00 0.0000000e+00]
 [0.0000000e+00 1.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00]
 [9.9985969e-01 1.4029646e-04 0.0000000e+00 0.0000000e+00 0.0000000e+00]
 [6.7803699e-23 1.3149651e-25 0.0000000e+00 1.0000000e+00 0.0000000e+00]
 [2.6111671e-29 1.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00]
 [0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 1.0000000e+00]
 [7.1186054e-29 1.0000000e+00 0.0000000e+00 1.5067708e-31 0.0000000e+00]
 [2.8976409e-21 1.7384750e-09 0.0000000e+00 0.0000000e+00 1.0000000e+00]
 [1.1485862e-13 1.0000000e+00 0.0000000e+00 5.8687394e-19 2.8971532e-37]
 [0.0000000e+00 0.0000000e+00 0.0000000e+00 1.0000000e+00 0.0000000e+00]
 [0.0000000e+00 0.0000000e+00 7.1168786e-11 1.0000000e+00 0.0000000e+00]
 [1.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00]]

4
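
Raw probability rows are hard to read, so a natural next step is to map each row to a class name. A small sketch, assuming the columns follow the order of the `classes` list passed to `flow_from_directory` during training; `sample_predictions` holds only the first three printed rows as an example.

```python
import numpy as np

# Same order as the `classes` argument used for the training generator.
class_names = ['104', '15', '5', '67', '86']

# First three rows of the printed predictions, copied from the output above.
sample_predictions = np.array([
    [0.0, 0.0, 5.0214800e-28, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 1.0940081e-29, 0.0, 0.0],
])

predicted_indices = sample_predictions.argmax(axis=1)  # column with highest score
predicted_labels = [class_names[i] for i in predicted_indices]
print(predicted_labels)  # ['86', '67', '15']
```

Pairing these labels with `test_batches.filenames` is only meaningful when the test generator was created with `shuffle=False`; otherwise the row order does not match the file order.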