# Applied Machine Learning 2
## Course project

Author: Diego Rodriguez

## Feature extraction

In this first part of the project, start by extracting a set of high-level features for each image in the data set. To achieve this, you can use, e.g., the Inception v3 or MobileNet v2 ConvNets, which extract 2048 and 1280 high-level features respectively.

These high-level features should then be used for all tasks in this project unless stated otherwise. In other words, the PCA exploration and all models (except the convolutional neural network) should use these high-level features. Where we ask you to visualize the images, we of course mean the raw images with their pixel values.

Suggestion: consider storing the extracted high-level features, e.g. in npz files, for quickly reloading them into each of the following notebooks.
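
The save-and-reload round trip suggested above could look roughly like the sketch below. The array contents, the key names `features` and `labels`, and the temporary file path are placeholders chosen for illustration; they are not the project's actual data.

```python
import os
import tempfile

import numpy as np

# Placeholder arrays standing in for extracted features and one-hot labels
# (random values, used only to illustrate the round trip).
rng = np.random.default_rng(0)
demo_features = rng.random((4, 1280)).astype(np.float32)
demo_labels = np.eye(6, dtype=np.float32)[[0, 2, 5, 2]]

# Save both arrays under explicit keys in a single .npz archive.
npz_path = os.path.join(tempfile.mkdtemp(), 'features_demo.npz')
np.savez(npz_path, features=demo_features, labels=demo_labels)

# Reload, e.g. in another notebook: np.load returns a dict-like NpzFile
# that reads each array when its key is accessed.
with np.load(npz_path) as archive:
    loaded_features = archive['features']
    loaded_labels = archive['labels']

print(loaded_features.shape, loaded_labels.shape)
```

Storing every array under a descriptive key keeps the later notebooks independent of the order in which the arrays were saved.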

Note: All your models should be trained on the training set, and the fine tuning of your hyperparameters should be validated on the validation set. The final test set should only be used for the final comparison to test the accuracies of your models on a new dataset. However, in the case where you use a cross-validation approach, you can of course merge the train and validation set into one bigger dataset and use this for model fitting.
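
For the cross-validation case, merging the train and validation sets and splitting the result into folds might be sketched as follows. This uses NumPy only and random placeholder arrays; the names `X_train`, `X_valid`, `n_folds` and the array sizes are assumptions for illustration. Library helpers such as scikit-learn's `KFold` can do the splitting instead.

```python
import numpy as np

# Placeholder arrays standing in for the train and validation sets.
rng = np.random.default_rng(0)
X_train, y_train = rng.random((8, 5)), rng.integers(0, 2, size=8)
X_valid, y_valid = rng.random((4, 5)), rng.integers(0, 2, size=4)

# Merge the two sets into one bigger dataset for cross-validation.
X_cv = np.concatenate([X_train, X_valid], axis=0)
y_cv = np.concatenate([y_train, y_valid], axis=0)

# Shuffle the sample indices once, then split them into k folds.
n_folds = 3
indices = rng.permutation(len(X_cv))
folds = np.array_split(indices, n_folds)

for k, valid_idx in enumerate(folds):
    # Each fold serves once as validation data; the rest is used for fitting.
    train_idx = np.concatenate([f for i, f in enumerate(folds) if i != k])
    print(f'fold {k}: {len(train_idx)} train / {len(valid_idx)} validation samples')
```

The test set stays outside this loop and is only touched for the final comparison.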

In [1]:
# Suppress warnings: deprecated TensorFlow modules produce a lot of verbose output
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
import tensorflow as tf
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)

import tensorflow_hub as hub
import numpy as np    
import os
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Create graph. Code from course Applied Machine Learning 2 (EPFL Extension School)
img_graph = tf.Graph()

with img_graph.as_default():
    
    # Download module MobileNet v2
    module_url = 'https://tfhub.dev/google/imagenet/mobilenet_v2_100_224/feature_vector/2'
    feature_extractor = hub.Module(module_url)

    # Create input placeholder
    input_imgs = tf.placeholder(dtype=tf.float32, shape=[None, 224, 224, 3])

    # A node with the features
    imgs_features = feature_extractor(input_imgs)

    # Collect initializers
    init_op = tf.group([
        tf.global_variables_initializer(), tf.tables_initializer()
    ])

img_graph.finalize()

In [2]:
# Create a session
sess = tf.Session(graph=img_graph)

# Initialize it
sess.run(init_op)

In [3]:
# Path to general directory with data
base_dir = '/Users/rodriguezmod/Downloads/swissroads/'

# Three paths for train, validation, and test data
train_dir = os.path.join(base_dir, 'train')
validation_dir = os.path.join(base_dir, 'valid')
test_dir = os.path.join(base_dir, 'test')

data_gen = ImageDataGenerator(rescale = 1./255)

# Extract features, labels, and raw pixels for every image in a directory
def extract_features(data_dir, img_height, img_width, batch_size):
    data_generator = data_gen.flow_from_directory(
                            data_dir,
                            target_size=(img_height, img_width),
                            batch_size=batch_size,
                            color_mode='rgb')
    batch_index = 0
    features = []
    labels = []
    pixels = []

    # The generator resets its own batch_index to 0 after the last batch,
    # so this loop stops after exactly one pass over the directory
    while batch_index <= data_generator.batch_index:
        # data[0] holds the rescaled pixels, data[1] the one-hot labels
        data = next(data_generator)
        features_batch = sess.run(imgs_features, feed_dict={input_imgs: data[0]})
        features.extend(features_batch)
        labels.extend(data[1])
        pixels.extend(data[0])
        batch_index += 1

    # Stack the per-image lists into NumPy arrays
    data_array_features = np.asarray(features)
    data_array_labels = np.asarray(labels)
    data_array_pixels = np.asarray(pixels)
    return data_array_features, data_array_labels, data_array_pixels

# Call the function to extract features for each data set
train_features, train_labels, train_pixels = extract_features(train_dir, 224, 224, 70)
validation_features, validation_labels, validation_pixels = extract_features(validation_dir, 224, 224, 70)
test_features, test_labels, test_pixels = extract_features(test_dir, 224, 224, 70)

# Save features, labels, and pixels into a .npz file
np.savez(os.path.join(base_dir, 'features.npz'),
         train_features=train_features, validation_features=validation_features, test_features=test_features,
         train_labels=train_labels, validation_labels=validation_labels, test_labels=test_labels,
         train_pixels=train_pixels, validation_pixels=validation_pixels, test_pixels=test_pixels)

Found 280 images belonging to 6 classes.
Found 139 images belonging to 6 classes.
Found 50 images belonging to 6 classes.
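
Because `flow_from_directory` uses `class_mode='categorical'` by default, the saved labels are one-hot encoded, one row per image and one column per class. Models that expect integer class labels need them converted first. The sketch below shows one way to do this on a small placeholder array; the values are illustrative and not taken from the data set.

```python
import numpy as np

# Placeholder one-hot labels for 4 images and 6 classes (illustrative values).
one_hot_labels = np.array([
    [0, 0, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1],
    [0, 0, 1, 0, 0, 0],
], dtype=np.float32)

# The position of the 1 in each row is the integer class index.
class_indices = np.argmax(one_hot_labels, axis=1)
print(class_indices)  # [2 0 5 2]
```

The mapping from class names to these indices is available on the generator as its `class_indices` attribute, which would have to be kept if the names are needed later.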
