# Computer Vision Project - Classification of Flowers


In this project, your objective is to create a model that classifies flowers. This zip file contains all relevant data.

1. The data contains two folders: *train* and *test*. The *train* folder contains 5486 images for training, while the *test* folder contains 1351 images you can use to test your model in a **train-test-split** validation style. We have held back another set of 1352 validation images, which we will use to benchmark your final models in the last lecture.


2. We have provided you with two label files: *train_labels.csv* and *test_labels.csv*. Each file contains the filename of the corresponding image and the class label. In total we have **102 different classes** of flowers.  You can import the label files using the `import_labels()` function provided to you in this notebook.


3. Because of the large number of images, there is a good chance that you cannot fit the entire training and testing data into RAM. We therefore give you an implementation of a `DataGenerator` class that can be used with Keras. This class reads the images for each batch from your hard drive during training or testing. The class comes with some useful features that could improve your training significantly, such as **image resizing**, **data augmentation** and **preprocessing**. Have a look at the code to find out how.

    Initialize data generators using labels and image source directory.

    `
    datagen_train = DataGenerator('train', y_train, batch_size, input_shape, ...)
    datagen_test = DataGenerator('test', y_test, batch_size, input_shape, ...)`

    Train your model using data generators.

    `model.fit(datagen_train, validation_data=datagen_test, ...)`
    
    
4. Select a suitable model for classification. It is up to you to decide all model parameters, such as the **number of layers**, the **number and size of filters** in each layer, whether to use **pooling**, the **image size**, **data augmentation**, the **learning rate**, ...


5. **Document** your progress and your intermediate results (your failures and improvements). Describe why you selected certain model and training parameters, what worked, what did not work. Store the training history (loss and accuracy) and create corresponding plots. This documentation will be part of your final presentation and will be **graded**.


6. Feel free to explore the internet for suitable CNN models and re-use these ideas. If you use features we have not covered in the lecture, such as Dropout, Residual Learning or Batch Normalization, prepare a slide in your final presentation that explains in your own (basic) terms what these things do, so we can all learn from your experience. **Notice:** Very large models might perform better but will be harder and slower to train. **Do not use a pre-trained model you find online!**


7. Prepare a notebook with your model such that we can use it in the final competition. That is, store your trained model using `model.save(...)`. Your saved models can be loaded via `tf.keras.models.load_model(...)`. We will then provide you with a new folder containing images (*validation*) and a file containing labels (*validation_labels.csv*), which have the same structure. Prepare a data generator for this validation data (test it using the test data) and supply it to the `evaluate_model(model, datagen)` function provided to you.
 
 Your prepared notebook could look like this:
 
    `... import stuff 
    ... code to load the stored model ...
    y_validation = import_labels('validation_labels.csv')
    datagen_validation = DataGenerator('validation', y_validation, batch_size, input_shape)
    evaluate_model(model, datagen_validation)`


8. Prepare a 15-minute presentation of your findings and your final model. A rough guideline of what could be interesting to your audience:
    * Explain your model's architecture (number of layers, total number of parameters, how long it took to train, ...)
    * Compare the training histories of your experiments visually
    * Explain your best model (why is it better?)
    * Why did you make certain decisions (parameters, image size, batch size, ...)?
    * What worked, what did not work (any ideas why?)
    * **What did you learn?**
    



In [1]:
import os
print(os.getcwd())
#os.chdir("Project")
#print(os.getcwd())

C:\Users\binde\Downloads\project(1)


In [2]:
import datetime
log_dir = "logs/fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")

In [3]:
# Read in label file and return a dictionary {'filename' : label}.
#
def import_labels(label_file):
    labels = dict()

    import csv
    with open(label_file) as fd:
        csvreader = csv.DictReader(fd)

        for row in csvreader:
            labels[row['filename']] = int(row['label'])
    return labels
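
Before training, it can help to sanity-check the label files. The sketch below repeats the `import_labels()` logic so that it runs on its own and adds a hypothetical helper, `summarize_labels()`, which is not part of the provided notebook. The tiny CSV it writes is made-up demo data, used only to exercise the code. Comparing the number of distinct labels with the largest label matters because `DataGenerator` derives its class count from the distinct labels it sees.

```python
import csv
import os
import tempfile


def import_labels(label_file):
    """Same logic as the notebook helper: returns {'filename': label}."""
    labels = {}
    with open(label_file, newline="") as fd:
        for row in csv.DictReader(fd):
            labels[row["filename"]] = int(row["label"])
    return labels


def summarize_labels(labels):
    """Hypothetical helper: basic sanity figures for a label dictionary."""
    values = list(labels.values())
    return {
        "n_images": len(values),
        "n_distinct": len(set(values)),
        "min_label": min(values),
        "max_label": max(values),
    }


# Tiny made-up label file, only to exercise the code path.
with tempfile.TemporaryDirectory() as tmp:
    demo_csv = os.path.join(tmp, "demo_labels.csv")
    with open(demo_csv, "w", newline="") as fd:
        writer = csv.writer(fd)
        writer.writerow(["filename", "label"])
        writer.writerows([["a.jpg", 1], ["b.jpg", 3], ["c.jpg", 3]])
    demo_summary = summarize_labels(import_labels(demo_csv))

print(demo_summary)
```

If `n_distinct` is smaller than `max_label` for one of the real label files, the one-hot vectors produced by the generator would be narrower than the 102 outputs of the model, so this is worth checking for both *train_labels.csv* and *test_labels.csv*.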

In [4]:
import tensorflow.keras as keras
from tensorflow.keras.preprocessing import image

import numpy as np
import tensorflow as tf 

#tf.enable_eager_execution(tf.ConfigProto(log_device_placement=True)) 
#tf.debugging.experimental.enable_dump_debug_info(log_dir, tensor_debug_mode="FULL_HEALTH", circular_buffer_size=-1)

class DataGenerator(keras.utils.Sequence):

    def __init__(self, img_root_dir, labels_dict, batch_size, target_dim, preprocess_func=None, use_augmentation=False):
        self._labels_dict = labels_dict
        self._img_root_dir = img_root_dir
        self._batch_size = batch_size
        self._target_dim = target_dim
        self._preprocess_func = preprocess_func
        self._n_classes = len(set(self._labels_dict.values()))
        self._fnames_all = list(self._labels_dict.keys())
        self._use_augmentation = use_augmentation

        if self._use_augmentation:
            self._augmentor = image.ImageDataGenerator(
                rotation_range=40,
                width_shift_range=0.2,
                height_shift_range=0.2,
                shear_range=0.2,
                zoom_range=0.2,
                horizontal_flip=True,
                fill_mode='nearest'
            )
        self.on_epoch_end()

    def __len__(self):
        # Number of full batches per epoch; an incomplete last batch is dropped.
        return len(self._fnames_all) // self._batch_size

    def on_epoch_end(self):
        self._indices = np.arange(len(self._fnames_all))
        np.random.shuffle(self._indices)

    def __getitem__(self, index):
        indices = self._indices[index * self._batch_size:(index+1)*self._batch_size]

        fnames = [self._fnames_all[k] for k in indices]
        X,Y = self.__load_files__(fnames)

        return X,Y

    def __load_files__(self, batch_filenames):
        X = np.empty((self._batch_size, *self._target_dim, 3))
        Y = np.empty((self._batch_size), dtype=int)

        for idx, fname in enumerate(batch_filenames):
            img_path = os.path.join(self._img_root_dir, fname)
            img = image.load_img(img_path, target_size=self._target_dim)
            x = image.img_to_array(img)
            if self._preprocess_func is not None:
                x = self._preprocess_func(x)

            X[idx,:] = x 
            Y[idx] = self._labels_dict[fname]-1

        if self._use_augmentation:
            it = self._augmentor.flow(X, batch_size=self._batch_size, shuffle=False)
            X = next(it)

        # Preprocessing was already applied to each image in the loop above;
        # applying it to the batch again would scale the data twice.

        return X, tf.keras.utils.to_categorical(Y, num_classes=self._n_classes)
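
Because `__len__` counts only full batches, a few images are left out of every epoch. A minimal sketch of that arithmetic, using the image counts from the project description and the batch size of 32 set in this notebook; `batches_per_epoch` is a hypothetical helper, not part of the provided class:

```python
def batches_per_epoch(n_images, batch_size):
    """Return (number of full batches, number of images left over)."""
    full_batches = n_images // batch_size
    dropped = n_images - full_batches * batch_size
    return full_batches, dropped


train_batches, train_dropped = batches_per_epoch(5486, 32)
test_batches, test_dropped = batches_per_epoch(1351, 32)

print("train:", train_batches, "batches,", train_dropped, "images skipped")
print("test: ", test_batches, "batches,", test_dropped, "images skipped")
```

Since the indices are reshuffled in `on_epoch_end`, the skipped training images change from epoch to epoch. For evaluation, however, any single pass over the generator leaves out a handful of images unless the batch size divides the image count evenly.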

In [5]:
from tensorflow.keras.utils import to_categorical
y_train = import_labels("train_labels.csv")
y_test = import_labels("test_labels.csv")
batch_size=32
image_size = (496, 496)

In [6]:
def preprocess_func(x):
    return x / 255.0

In [7]:
datagen_train = DataGenerator('train', y_train, batch_size, image_size, preprocess_func=preprocess_func, use_augmentation=True)
# The test data is left unaugmented so that validation scores reflect the original images.
datagen_test = DataGenerator('test', y_test, batch_size, image_size, preprocess_func=preprocess_func, use_augmentation=False)

In [8]:
from tensorflow.keras.optimizers import Adam
from tensorflow.keras import layers

# Design model
model = keras.Sequential()
model.add(layers.Input(image_size + (3,)))
model.add(layers.Conv2D(filters=64, kernel_size=(3,3), activation="relu"))
model.add(layers.MaxPool2D((3,3)))
model.add(layers.BatchNormalization())
#model.add(layers.Dropout(0.5))
model.add(layers.Conv2D(filters=32, kernel_size=(3,3), activation="relu"))
model.add(layers.MaxPool2D((5,5)))
model.add(layers.BatchNormalization())
model.add(layers.Conv2D(filters=16, kernel_size=(3,3), activation="relu"))
model.add(layers.MaxPool2D((4,4)))
model.add(layers.BatchNormalization())
model.add(layers.Conv2D(filters=8, kernel_size=(3,3), activation="relu"))
model.add(layers.MaxPool2D((3,3)))
model.add(layers.BatchNormalization())
#model.add(layers.Dropout(0.5))
#model.add(layers.Conv2D(filters=16, kernel_size=(3,3), activation="relu"))
#model.add(layers.MaxPool2D((2,2)))
#model.add(layers.BatchNormalization())
model.add(layers.Dropout(0.1))
model.add(layers.Flatten())
model.add(layers.Dense(16, activation="relu"))
#model.add(layers.BatchNormalization())
#model.add(layers.Dense(16, activation="relu"))
#model.add(layers.BatchNormalization())
model.add(layers.Dense(102, activation="softmax"))
model.compile(optimizer=Adam(learning_rate=0.02), loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d (Conv2D)             (None, 494, 494, 64)      1792      
                                                                 
 max_pooling2d (MaxPooling2D  (None, 164, 164, 64)     0         
 )                                                               
                                                                 
 batch_normalization (BatchN  (None, 164, 164, 64)     256       
 ormalization)                                                   
                                                                 
 conv2d_1 (Conv2D)           (None, 162, 162, 32)      18464     
                                                                 
 max_pooling2d_1 (MaxPooling  (None, 32, 32, 32)       0         
 2D)                                                             
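
The output shapes and parameter counts in the summary can be reproduced by hand. The sketch below assumes the Keras defaults used in the model cell: `Conv2D` with `padding="valid"` and stride 1, and `MaxPool2D` with strides equal to the pool size. Only the first five rows can be compared against the summary shown here; everything after that is computed from the layer definitions in the model cell, not read from the output.

```python
def conv_valid(size, kernel):
    """Spatial size after a stride-1 convolution without padding."""
    return size - kernel + 1


def max_pool(size, pool):
    """Spatial size after pooling with stride equal to the pool size."""
    return size // pool


def conv_params(kernel, c_in, c_out):
    """Weights plus one bias per output filter."""
    return (kernel * kernel * c_in + 1) * c_out


def batchnorm_params(channels):
    """Gamma, beta, moving mean and moving variance per channel."""
    return 4 * channels


size, channels = 496, 3
stack = [(64, 3), (32, 5), (16, 4), (8, 3)]  # (filters, pool size) per block

rows = []
for filters, pool in stack:
    size = conv_valid(size, 3)
    rows.append(("conv", size, filters, conv_params(3, channels, filters)))
    size = max_pool(size, pool)
    rows.append(("pool", size, filters, 0))
    rows.append(("batchnorm", size, filters, batchnorm_params(filters)))
    channels = filters

flattened = size * size * channels

for row in rows:
    print(row)
print("flattened features:", flattened)
```

Under these assumptions the last pooling layer reduces the feature map to 1 x 1 x 8, so only 8 values reach the dense layers. That is a very narrow bottleneck for separating 102 classes and may be worth revisiting when tuning the pooling sizes.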
                                                        

In [9]:
cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath="training.ckpt",
                                                 save_weights_only=True,
                                                 verbose=1)


tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)

In [11]:
# Log which device each operation runs on and list the devices TensorFlow can see.
tf.debugging.set_log_device_placement(True)
print(tf.config.list_physical_devices())

[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')]

In [None]:

# Train model on dataset
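# Keep the returned History object so that loss and accuracy can be stored and plotted.
history = model.fit(
    datagen_train,
    validation_data=datagen_test,
    epochs=5,
    callbacks=[cp_callback],  # add tensorboard_callback here to log to TensorBoard
)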
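
The project description asks for the training history to be stored and plotted. `model.fit` returns a `History` object whose `history` attribute is a dictionary of per-epoch lists. A minimal sketch of saving and reloading such a dictionary as JSON follows. The helper names and the file name are hypothetical, and the numbers are placeholders, NOT results from this notebook.

```python
import json
import os
import tempfile


def save_history(history_dict, path):
    """Write a Keras-style history dictionary to a JSON file."""
    # Cast to float so that NumPy scalars are serialized as well.
    serializable = {key: [float(v) for v in values]
                    for key, values in history_dict.items()}
    with open(path, "w") as fd:
        json.dump(serializable, fd, indent=2)


def load_history(path):
    """Read a history dictionary back from a JSON file."""
    with open(path) as fd:
        return json.load(fd)


def best_epoch(history_dict, metric="val_accuracy"):
    """Return (1-based epoch, value) where the metric is highest."""
    values = history_dict[metric]
    best_index = max(range(len(values)), key=values.__getitem__)
    return best_index + 1, values[best_index]


# Placeholder values for illustration only, not measured results.
placeholder_history = {
    "loss": [4.0, 3.0, 2.5],
    "accuracy": [0.1, 0.2, 0.3],
    "val_loss": [4.1, 3.2, 3.3],
    "val_accuracy": [0.1, 0.25, 0.2],
}

with tempfile.TemporaryDirectory() as tmp:
    history_path = os.path.join(tmp, "history_run1.json")
    save_history(placeholder_history, history_path)
    restored = load_history(history_path)

print(best_epoch(restored))
```

In the notebook, the same helper could be called as `save_history(history.history, ...)` after training, with one file per experiment, so the curves of different runs can be compared in a single plot later.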