 # TP2.B Images 
 
 #### Université Sciences-U, 2019-2020

## Part 1. Keras with data from directories, small dataset

Until now, our data were provided as NumPy arrays that were already normalized (all the same size). We now move to a dataset of higher-resolution images (around 300 × 300 pixels) organized in folders.
- Download the bird dataset from: http://perso.ens-lyon.fr/tien-nam.le/data/ML/birds.zip

This is an excerpt of the CUB-200 dataset (http://www.vision.caltech.edu/visipedia/CUB-200.html), which contains 200 species of birds. Our subset contains 10 species, each with around 50 images for training and 10 images for testing.

<img src = "http://www.vision.caltech.edu/visipedia/collage.jpg">

We face 3 problems here:
1. How to label the data?
2. How to feed images and their labels to the neural net?
3. How to normalize the size of the images (to feed to the input of the neural net)?

All three problems can be solved with `ImageDataGenerator`. Keras walks through the whole 'birds/train' directory, loads the images, and labels each one from 0 to 9 according to the subfolder that contains it; the `target_size` argument resizes every image to a common shape. The subfolders of the train and test folders must therefore have the same names.
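The labelling rule can be sketched without Keras. The snippet below is a minimal illustration, assuming class names are the subfolder names sorted alphabetically and numbered from 0; the helper `label_from_directories` and the folder and file names are hypothetical, not part of the Keras API or of the bird dataset.

```python
import os
import tempfile

def label_from_directories(root):
    """Map each file under root/<class>/ to an integer label.

    Class names are the sorted subfolder names, numbered from 0.
    """
    classes = sorted(
        d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))
    )
    class_indices = {name: i for i, name in enumerate(classes)}
    samples = []
    for name in classes:
        folder = os.path.join(root, name)
        for fname in sorted(os.listdir(folder)):
            samples.append((os.path.join(name, fname), class_indices[name]))
    return class_indices, samples

# Tiny fake dataset (hypothetical names) to exercise the rule.
with tempfile.TemporaryDirectory() as root:
    fake = {"sparrow": ["a.jpg"], "albatross": ["b.jpg", "c.jpg"]}
    for name, files in fake.items():
        os.makedirs(os.path.join(root, name))
        for f in files:
            open(os.path.join(root, name, f), "w").close()
    class_indices, samples = label_from_directories(root)

print(class_indices)
print(samples)
```

If the train and test folders had different subfolder names, the same index could point to different birds in each split, which is why the two trees need to match.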

**Problem 1. Use `flow_from_directory` method to train a NN with the dataset**

In [1]:
# Import libs

In [2]:
from keras.preprocessing.image import ImageDataGenerator

# input image dimensions
img_rows, img_cols = 300, 300
# The bird images are RGB.
img_channels = 3
nb_classes = 10

train_datagen = ImageDataGenerator(
        rescale=1./255,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True)

test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
        './birds/train',
        target_size=(img_rows, img_cols),
        batch_size=32,
        class_mode='categorical')

validation_generator = test_datagen.flow_from_directory(
        './birds/test',
        target_size=(img_rows, img_cols),
        batch_size=32,
        class_mode='categorical')

# Draw one batch from each generator to inspect the shapes
x_train, y_train = next(train_generator)
x_test, y_test = next(validation_generator)

print('train data:\t', x_train.shape, y_train.shape)
print('test data:\t', x_test.shape, y_test.shape)

Using TensorFlow backend.


Found 479 images belonging to 10 classes.
Found 100 images belonging to 10 classes.
train data:	 (32, 300, 300, 3) (32, 10)
test data:	 (32, 300, 300, 3) (32, 10)


In [3]:
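# The generators already rescale pixels by 1/255, so dividing by 255 again
# would squash the values into [0, 1/255]. Only the dtype cast is kept.
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')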

print('x_train_reshape', x_train.shape)
print('x_test_reshape', x_test.shape)

x_train_reshape (32, 300, 300, 3)
x_test_reshape (32, 300, 300, 3)


In [4]:
# With class_mode='categorical' the generator already returns one-hot labels.
# Calling to_categorical on them again would add a spurious axis: (32, 10, 2).
y_train_categorical = y_train
y_test_categorical = y_test

print('y_train_categorical', y_train_categorical.shape)
print('y_test_categorical', y_test_categorical.shape)

y_train_categorical (32, 10)
y_test_categorical (32, 10)
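Labels produced by a generator with `class_mode='categorical'` are already one-hot, with shape `(batch, classes)`. The NumPy sketch below shows what a second one-hot encoding does to such labels; `one_hot` is a hand-written helper (an assumption for illustration, standing in for `to_categorical`) and the class ids are made up.

```python
import numpy as np

def one_hot(labels, num_classes=None):
    """Hand-written one-hot encoding of integer labels (any shape)."""
    labels = np.asarray(labels, dtype=int)
    if num_classes is None:
        num_classes = int(labels.max()) + 1
    # Indexing the identity matrix appends one axis of length num_classes.
    return np.eye(num_classes, dtype="float32")[labels]

# Made-up integer class ids for a batch of 32 images, 10 classes.
class_ids = np.arange(32) % 10
y = one_hot(class_ids, num_classes=10)
print(y.shape)        # (32, 10)

# Encoding the one-hot matrix again treats every 0/1 entry as a class id.
y_twice = one_hot(y)
print(y_twice.shape)  # (32, 10, 2)

# The integer ids are recovered from the one-hot rows with argmax.
recovered = y.argmax(axis=1)
```

A network ending in `Dense(10, activation='softmax')` expects the `(32, 10)` form, so the generator's labels can be used directly.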


In [7]:
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, BatchNormalization
from keras.layers import Dense, Dropout, Flatten, Activation

# Build the model
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=(img_rows, img_cols, img_channels)))

model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Flatten())

model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

print(model.summary())

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_5 (Conv2D)            (None, 298, 298, 32)      896       
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 296, 296, 64)      18496     
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 148, 148, 64)      0         
_________________________________________________________________
dropout_5 (Dropout)          (None, 148, 148, 64)      0         
_________________________________________________________________
conv2d_7 (Conv2D)            (None, 146, 146, 64)      36928     
_________________________________________________________________
max_pooling2d_5 (MaxPooling2 (None, 73, 73, 64)        0         
_________________________________________________________________
dropout_6 (Dropout)          (None, 73, 73, 64)       
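The shapes and parameter counts in the summary can be checked by hand. The sketch below recomputes them from the layer definitions in the cell above, assuming 3 × 3 kernels with stride 1 and no padding (the Keras defaults for `Conv2D`) and 2 × 2 non-overlapping pooling. Dropout layers are omitted because they change neither shape nor parameter count. The helper names are hypothetical.

```python
def conv_out(size, kernel=3):
    """Spatial size after a stride-1 convolution without padding."""
    return size - kernel + 1

def pool_out(size, pool=2):
    """Spatial size after non-overlapping max pooling."""
    return size // pool

def conv_params(in_ch, out_ch, kernel=3):
    """Kernel weights plus one bias per output channel."""
    return kernel * kernel * in_ch * out_ch + out_ch

# (layer kind, input channels, output channels), in model order
layers = [
    ("conv", 3, 32), ("conv", 32, 64), ("pool", 64, 64),
    ("conv", 64, 64), ("pool", 64, 64),
    ("conv", 64, 64), ("pool", 64, 64),
]

size = 300
sizes, params = [], []
for kind, in_ch, out_ch in layers:
    if kind == "conv":
        size = conv_out(size)
        params.append(conv_params(in_ch, out_ch))
    else:
        size = pool_out(size)
        params.append(0)
    sizes.append(size)

# Flatten, then the first Dense(64) layer
flat = size * size * 64
dense_params = flat * 64 + 64

print(sizes)
print(params)
print(flat, dense_params)
```

Under these assumptions the first dense layer holds roughly five million weights, far more than all the convolutions together, which is worth keeping in mind with only 479 training images.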

In [8]:
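# More epochs are needed to improve accuracy.
# One epoch is one pass over the data: len(generator) batches.
model.fit_generator(
        train_generator,
        steps_per_epoch=len(train_generator),
        epochs=5,
        validation_data=validation_generator,
        validation_steps=len(validation_generator))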

Epoch 1/5


KeyboardInterrupt: 
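One pass over a dataset takes `ceil(n / batch_size)` batches. The short calculation below uses the image counts reported by `flow_from_directory` (479 training and 100 test images) with a batch size of 32. For comparison, it also shows how many images a large setting such as 2000 steps per epoch would draw. The function name is hypothetical.

```python
import math

def steps_for_one_pass(n_samples, batch_size):
    """Number of batches needed to see every sample once."""
    return math.ceil(n_samples / batch_size)

train_steps = steps_for_one_pass(479, 32)
valid_steps = steps_for_one_pass(100, 32)
print(train_steps, valid_steps)  # 15 4

# 2000 steps of 32 images each, expressed in passes over the training set
images_drawn = 2000 * 32
passes = images_drawn / 479
print(images_drawn, round(passes, 1))  # 64000 133.6
```

With augmentation the generator can produce batches indefinitely, so a large step count is not an error, but each "epoch" then costs as much as more than a hundred passes over the data.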

**Problem 2. Use data augmentation to improve the results**

In [17]:
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, BatchNormalization
from keras.layers import Dense, Dropout, Flatten, Activation
from keras.preprocessing.image import ImageDataGenerator

# input image dimensions
img_rows, img_cols = 300, 300
# The bird images are RGB.
img_channels = 3
nb_classes = 10

train_datagen = ImageDataGenerator(
        rescale=1./255,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True)

valid_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    directory=r"./birds/train/",
    target_size=(img_rows, img_cols),
    color_mode="rgb",
    batch_size=32,
    class_mode="categorical",
    shuffle=True,
    seed=42
)

valid_generator = valid_datagen.flow_from_directory(
    directory=r"./birds/test/",
    target_size=(img_rows, img_cols),
    color_mode="rgb",
    batch_size=32,
    class_mode="categorical",
    shuffle=True,
    seed=42
)

Found 479 images belonging to 10 classes.
Found 100 images belonging to 10 classes.


In [19]:
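# One pass over the data would be (note: .n lives on the generators, not on the ImageDataGenerator):
# STEP_SIZE_TRAIN = train_generator.n // train_generator.batch_size
# STEP_SIZE_VALID = valid_generator.n // valid_generator.batch_size

STEP_SIZE_TRAIN = 100
STEP_SIZE_VALID = 100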

model.fit_generator(generator=train_generator,
                    steps_per_epoch=STEP_SIZE_TRAIN,
                    validation_data=valid_generator,
                    validation_steps=STEP_SIZE_VALID,
                    epochs=10
)

Epoch 1/10
 12/100 [==>...........................] - ETA: 18:33 - loss: 3.2170 - accuracy: 0.1018

KeyboardInterrupt: 

In [None]:
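import numpy as np

# Pass the generator (valid_generator), not the ImageDataGenerator, and compute pred before using it
model.evaluate_generator(generator=valid_generator, steps=STEP_SIZE_VALID)
pred = model.predict_generator(valid_generator, steps=len(valid_generator))
predicted_class_indices = np.argmax(pred, axis=1)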

In [None]:
labels = (train_generator.class_indices)
labels = dict((v,k) for k,v in labels.items())
predictions = [labels[k] for k in predicted_class_indices]
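The mapping from predicted probabilities back to class names can be tried on toy data. In the sketch below the `class_indices` dictionary and the probability matrix are made up for illustration; in the notebook they would come from `train_generator.class_indices` and from the model's predictions.

```python
import numpy as np

# Hypothetical name-to-index mapping and predicted probabilities (2 images, 3 classes)
class_indices = {"albatross": 0, "cardinal": 1, "sparrow": 2}
pred = np.array([[0.1, 0.7, 0.2],
                 [0.8, 0.1, 0.1]])

# Index of the most probable class for each image
predicted_class_indices = np.argmax(pred, axis=1)

# Invert the mapping (index -> name) and look up each prediction
labels = {v: k for k, v in class_indices.items()}
predictions = [labels[int(k)] for k in predicted_class_indices]
print(predictions)  # ['cardinal', 'albatross']
```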

In [None]:
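import pandas as pd

# Filenames line up with predictions only if valid_generator was created with shuffle=False
filenames = valid_generator.filenames
results = pd.DataFrame({"Filename": filenames,
                        "Predictions": predictions})
results.to_csv("results.csv", index=False)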

**Problem 3. Use other techniques to avoid overfitting**
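One technique that could be tried here, besides the dropout already in the model, is early stopping: halt training once the validation loss has stopped improving for a few epochs. The sketch below shows only the stopping rule in plain Python; the function name and the loss values are made up for illustration. Keras offers the same idea through the `EarlyStopping` callback in `keras.callbacks`.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the index of the epoch at which training would stop.

    Training stops after `patience` consecutive epochs without a new
    best validation loss; otherwise it runs to the last epoch.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_losses) - 1

# Made-up validation losses: improvement up to epoch 2, then degradation
history = [2.30, 1.90, 1.70, 1.75, 1.80, 1.85]
stop = early_stopping_epoch(history, patience=2)
best_epoch = history.index(min(history))
print(stop, best_epoch)  # 4 2
```

In this toy run the best weights are those of epoch 2, so in practice the rule is usually combined with restoring the best checkpoint.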

**Problem 4. Use pretrained models, objective 84%**

In [24]:
from keras.applications.densenet import DenseNet121
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model

In [25]:
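# Transfer learning: DenseNet121 pretrained on ImageNet as a frozen feature extractor,
# followed by a small classification head for the 10 bird classes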

base_model = DenseNet121(include_top=False, weights='imagenet')
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(100, activation='relu')(x)
predictions = Dense(10, activation='softmax')(x)

model = Model(inputs=base_model.input, outputs=predictions)

for layer in base_model.layers:
    layer.trainable = False
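To see what the new head does, the sketch below reproduces `GlobalAveragePooling2D` with NumPy and counts the trainable parameters of the two `Dense` layers. The 1024 channels match the DenseNet121 feature output. The 9 × 9 spatial size and the random feature values are assumptions for illustration, and the helper names are hypothetical.

```python
import numpy as np

def global_average_pooling_2d(features):
    """Average over the height and width axes: (N, H, W, C) -> (N, C)."""
    return features.mean(axis=(1, 2))

def dense_params(n_in, n_out):
    """Weights plus one bias per output unit."""
    return n_in * n_out + n_out

# Fake feature maps for 2 images (assumed 9 x 9 spatial grid, 1024 channels)
features = np.random.default_rng(0).random((2, 9, 9, 1024)).astype("float32")
pooled = global_average_pooling_2d(features)
print(pooled.shape)  # (2, 1024)

# Trainable parameters once the base model is frozen: Dense(100) + Dense(10)
head_params = dense_params(1024, 100) + dense_params(100, 10)
print(head_params)  # 103510
```

With the base frozen, only about a hundred thousand weights are fitted to the 479 training images, which is the main reason a pretrained model is a reasonable choice for a dataset this small.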
    


