<a href="https://colab.research.google.com/github/LouisVanLangendonck/UPC-AML-ArchitectureClassif/blob/main/feature_extraction_cnn.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Feature Extractor for Architecture Image Classification**


In [None]:
#@title Imports...

import itertools
import os
import matplotlib.pylab as plt
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub
import keras
import random
import pickle
print("TF version:", tf.__version__)
print("Hub version:", hub.__version__)
print("keras version:", keras.__version__)
print("GPU is", "available" if tf.config.list_physical_devices('GPU') else "NOT AVAILABLE")

TF version: 2.9.2
Hub version: 0.12.0
keras version: 2.9.0
GPU is available


Connect to your Google Drive. Make sure all data (scraped with the web scraper) has been uploaded there.

In [None]:
from google.colab import drive
drive.mount('/content/drive', force_remount = True)

Mounted at /content/drive


In [None]:
path_to_aml_file = '/content/drive/MyDrive/FIB-2022-2023/aml'

Specify where the train and test data are stored. The folder structure should be as follows (it is obtained automatically if data_scraper.ipynb and train_test_split.ipynb are used correctly):
- Train and test data in separate folders.
- Within each of these, each style has its own folder containing all of its images in .jpg format.
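
The layout described above can be checked before running the rest of the notebook. The sketch below is a minimal, hypothetical helper (`count_images_per_class` is not part of the original notebook) that counts the `.jpg` files in each style folder of one split directory.

```python
import os

def count_images_per_class(split_dir):
    """Return {class_name: number of .jpg files} for one split directory.

    Hypothetical helper, assuming the layout split_dir/<style>/<image>.jpg.
    """
    counts = {}
    for class_name in sorted(os.listdir(split_dir)):
        class_dir = os.path.join(split_dir, class_name)
        if not os.path.isdir(class_dir):
            continue  # ignore stray files next to the class folders
        counts[class_name] = sum(
            1 for name in os.listdir(class_dir) if name.lower().endswith('.jpg')
        )
    return counts
```

Calling it on the train and test folders should give the same eight style names for both splits, with no empty folders.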

In [None]:
train_data = os.path.join(path_to_aml_file,'data/unzipped/train')
test_data = os.path.join(path_to_aml_file,'data/unzipped/test')
print(os.listdir(train_data))

['gothic', 'baroque', 'modernism', 'contemporary', 'noucentisme', 'renaissance', 'romanesque', 'neoclassicism']


In [None]:
#@title Different pre-trained architectures to be used for feature extraction.

from tensorflow.keras.applications import VGG19
from tensorflow.keras.applications import Xception
from tensorflow.keras.applications import InceptionResNetV2
from tensorflow.keras.applications import EfficientNetB7
from keras import layers
from keras import models

vgg_model = VGG19(
    weights='imagenet',
    include_top=False,
    input_shape=(299, 299, 3), 
    pooling='avg'
)


Xception_model = Xception(
    weights='imagenet',
    include_top=False,
    pooling='avg',
    input_shape=(299, 299, 3)
)

InceptionResNet_model = InceptionResNetV2(
    weights='imagenet',
    include_top=False,
    pooling='avg',
    input_shape=(299, 299, 3)
)

EfficientNetB7_model = EfficientNetB7(
    weights='imagenet',
    include_top=False,
    pooling='avg',
    input_shape=(299, 299, 3)
)

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/vgg19/vgg19_weights_tf_dim_ordering_tf_kernels_notop.h5
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/xception/xception_weights_tf_dim_ordering_tf_kernels_notop.h5
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/inception_resnet_v2/inception_resnet_v2_weights_tf_dim_ordering_tf_kernels_notop.h5
Downloading data from https://storage.googleapis.com/keras-applications/efficientnetb7_notop.h5
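
With `include_top=False` and `pooling='avg'`, each network returns one vector per image: the final convolutional feature map is averaged over its spatial dimensions. The NumPy sketch below shows that operation on a stand-in feature map. The random data and the 10x10 spatial size are illustrative assumptions, not values read from the models.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a final convolutional feature map:
# (batch, height, width, channels)
feature_map = rng.normal(size=(4, 10, 10, 2048))

# Global average pooling: average over the two spatial axes,
# leaving one value per channel for each image.
pooled = feature_map.mean(axis=(1, 2))
print(pooled.shape)  # (4, 2048)
```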


**Specify here which of the pre-trained models above to use for feature extraction**, and whether to shuffle the data. Leaving shuffling off keeps the image order fixed, so features from different extractors can later be concatenated.

In [None]:
feature_extractor = Xception_model

shuffle_datagen = False  # Preferably False: a fixed order lets you concatenate feature vectors later
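
Keeping the order fixed matters because features from different extractors can then be joined row by row. A minimal sketch with stand-in arrays follows. The widths 2048 and 1536 mirror the Xception and InceptionResNetV2 feature sizes reported in the outputs of this notebook; the random data and variable names are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_images = 5  # stand-in for the number of images in one split

# Stand-ins for features produced by two extractors over the SAME,
# unshuffled image order (row k of both arrays describes image k).
xception_features = rng.normal(size=(n_images, 2048))
inception_resnet_features = rng.normal(size=(n_images, 1536))

# Concatenate along the feature axis: one longer vector per image.
combined = np.concatenate([xception_features, inception_resnet_features], axis=1)
print(combined.shape)  # (5, 3584)
```

If the generators had shuffled the images, row k of the two arrays would refer to different images and the concatenation would be meaningless.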

In [None]:
#@title Defining flow from directory...

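from keras.preprocessing.image import ImageDataGenerator
from keras.applications.inception_resnet_v2 import preprocess_input

# NOTE: this preprocess_input scales pixels to [-1, 1], which matches Xception and
# InceptionResNetV2. VGG19 and EfficientNetB7 have their own preprocess_input in
# keras.applications.vgg19 and keras.applications.efficientnet.
datagen = ImageDataGenerator(preprocessing_function=preprocess_input)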
batch_size = 150
image_height, image_width = feature_extractor.input_shape[1:3]

train_generator = datagen.flow_from_directory(
        train_data, 
        target_size = (image_height,image_width),
        batch_size=batch_size, 
        class_mode = 'categorical', 
        shuffle=shuffle_datagen)

test_generator = datagen.flow_from_directory(
        test_data, 
        target_size = (image_height,image_width),
        batch_size=batch_size, 
        class_mode = 'categorical', shuffle=shuffle_datagen)

Found 12291 images belonging to 8 classes.
Found 3135 images belonging to 8 classes.
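
The `preprocess_input` imported from `keras.applications.inception_resnet_v2` rescales 8-bit pixel values to the range [-1, 1]. The NumPy sketch below reproduces that scaling for illustration only. `scale_to_minus_one_one` is a hypothetical name, not a Keras function.

```python
import numpy as np

def scale_to_minus_one_one(pixels):
    """Map pixel values from [0, 255] to [-1, 1] (illustrative stand-in)."""
    return pixels.astype('float32') / 127.5 - 1.0

example = np.array([0.0, 127.5, 255.0])
scaled = scale_to_minus_one_one(example)  # black -> -1, mid-grey -> 0, white -> 1
```

Models trained with a different input convention may produce weaker features when fed inputs scaled this way.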


In [None]:
nr_train_images = train_generator.samples
nr_test_images = test_generator.samples
nr_of_target_classes = test_generator.num_classes

In [None]:
class_encoding = train_generator.class_indices  # class name -> index; saved for other notebooks (concatenation, decoding predictions)
encoding_path = os.path.join(path_to_aml_file, 'models/extracted_features/class_encoding.pkl')
os.makedirs(os.path.dirname(encoding_path), exist_ok=True)  # create the output folder if it is missing
with open(encoding_path, 'wb') as f:
    pickle.dump(class_encoding, f)
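
In another notebook, the saved encoding can be read back and inverted so that a predicted index maps to a style name. A minimal sketch, assuming the pickle holds the `{class_name: index}` dictionary written above (`load_index_to_class` is a hypothetical helper name):

```python
import pickle

def load_index_to_class(encoding_path):
    """Load the saved {class_name: index} dict and invert it to {index: class_name}."""
    with open(encoding_path, 'rb') as f:
        class_encoding = pickle.load(f)
    return {index: name for name, index in class_encoding.items()}
```

Given a one-hot label or a probability vector `p`, the style name is then `index_to_class[int(np.argmax(p))]`.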

In [None]:
#@title Feature extraction loop !

from PIL import ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True
#pip install tqdm
from tqdm import tqdm

def extract_features(generator, sample_count):
    """Run feature_extractor over one full pass of generator; return (features, labels)."""
    n_batches = int(np.ceil(sample_count / batch_size))
    print('Beginning feature extraction for {} samples in {} batches:'.format(sample_count, n_batches))
    features = np.zeros(shape=(sample_count,) + tuple(feature_extractor.output_shape[1:]))
    labels = np.zeros(shape=(sample_count, nr_of_target_classes))
    start = 0
    with tqdm(total=n_batches, position=0, leave=True) as pbar:
        for inputs_batch, labels_batch in generator:
            # The last batch may be smaller than batch_size, so slice by its actual length.
            end = start + len(inputs_batch)
            features[start:end] = feature_extractor.predict(inputs_batch, verbose=0)
            labels[start:end] = labels_batch
            start = end
            pbar.update(n=1)
            # Keras generators loop forever, so stop after one pass over the data.
            if start >= sample_count:
                break
    print('Features extracted!')
    print('Shape of feature vector:{}'.format(features.shape))
    print('Shape of labels vector:{}'.format(labels.shape))
    return features, labels
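
The bookkeeping for the last, smaller batch is the part of this loop that is easiest to get wrong. The NumPy-only sketch below fills a preallocated array from batches of unequal size. It uses a hypothetical `fill_from_batches` helper and a plain list of arrays in place of the Keras generator.

```python
import numpy as np

def fill_from_batches(batches, sample_count, feature_dim):
    """Copy batches of rows into one (sample_count, feature_dim) array."""
    out = np.zeros((sample_count, feature_dim))
    start = 0
    for batch in batches:
        end = start + len(batch)  # use the real batch length, not a fixed size
        out[start:end] = batch
        start = end
        if start >= sample_count:
            break  # one full pass is done
    return out

# Two full batches of 3 rows and a final batch of 1 row.
batches = [np.full((3, 2), 1.0), np.full((3, 2), 2.0), np.full((1, 2), 3.0)]
filled = fill_from_batches(batches, sample_count=7, feature_dim=2)
print(filled[:, 0])  # [1. 1. 1. 2. 2. 2. 3.]
```

Every row comes from the batch that actually contains it, and the progress through the data is tracked by row count rather than by batch index.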

In [None]:
print('Train Feature Extraction:')
train_features, train_labels = extract_features(train_generator, nr_train_images)
print('Test Feature Extraction:')
test_features, test_labels = extract_features(test_generator, nr_test_images)

In [None]:
#@title Save features in vectors in specified location

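# NOTE: the file name below is hard-coded for Xception; change it if feature_extractor is a different model.
all_features = np.asarray([(train_features, train_labels), (test_features, test_labels)], dtype=object)
np.save(os.path.join(path_to_aml_file,'models/Xception_model_avg_features_CONCAT.npy'), all_features)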
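
To read such a file back, note that it stores an object array, so `np.load` needs `allow_pickle=True`. A minimal sketch, assuming the file holds the 2x2 layout `[(train_features, train_labels), (test_features, test_labels)]` saved above (`load_features` is a hypothetical helper name):

```python
import numpy as np

def load_features(npy_path):
    """Return (train_x, train_y, test_x, test_y) from a saved feature file."""
    # allow_pickle=True is required for object arrays. Only load files you
    # created yourself: unpickling untrusted data is unsafe.
    saved = np.load(npy_path, allow_pickle=True)
    (train_x, train_y), (test_x, test_y) = saved
    return train_x, train_y, test_x, test_y
```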

In [None]:
#@title If you want to run all extractors at the same time!
from PIL import ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True
#pip install tqdm
from tqdm import tqdm
from keras.preprocessing.image import ImageDataGenerator
from keras.applications.inception_resnet_v2 import preprocess_input

# Not named `models`, to avoid shadowing the keras `models` module imported earlier.
extractor_models = [vgg_model, Xception_model, InceptionResNet_model, EfficientNetB7_model]

for extr_model in extractor_models:
  # extract_features() reads this global, so set it before anything depends on it.
  feature_extractor = extr_model
  # NOTE: the same InceptionResNetV2 preprocess_input is applied to every model here.
  datagen = ImageDataGenerator(preprocessing_function=preprocess_input)
  batch_size = 150
  image_height, image_width = feature_extractor.input_shape[1:3]

  train_generator = datagen.flow_from_directory(
          train_data, 
          target_size = (image_height,image_width),
          batch_size=batch_size, 
          class_mode = 'categorical', 
          shuffle=shuffle_datagen)

  test_generator = datagen.flow_from_directory(
          test_data, 
          target_size = (image_height,image_width),
          batch_size=batch_size, 
          class_mode = 'categorical', shuffle=shuffle_datagen)
  
  nr_train_images = train_generator.samples
  nr_test_images = test_generator.samples
  nr_of_target_classes = test_generator.num_classes
  print('Train Feature Extraction:')
  train_features, train_labels = extract_features(train_generator, nr_train_images)
  print('Test Feature Extraction:')
  test_features, test_labels = extract_features(test_generator, nr_test_images)
  all_features = np.asarray([(train_features, train_labels), (test_features, test_labels)], dtype=object)
  np.save(os.path.join(path_to_aml_file,'models/extracted_features/{}.npy'.format(extr_model.name)), all_features)


Found 12291 images belonging to 8 classes.
Found 3135 images belonging to 8 classes.
Train Feature Extraction:
Beginning feature extraction for 12291 samples in 82 batches:


 99%|█████████▉| 81/82 [1:08:34<00:46, 46.05s/it]

final batch


 99%|█████████▉| 81/82 [1:08:38<00:50, 50.85s/it]


Features extracted!
Shape of feature vector:(12291, 512)
Shape of labels vector:(12291, 8)
Test Feature Extraction:
Beginning feature extraction for 3135 samples in 21 batches:


 95%|█████████▌| 20/21 [16:26<00:46, 46.56s/it]

final batch


 95%|█████████▌| 20/21 [16:30<00:49, 49.50s/it]


Features extracted!
Shape of feature vector:(3135, 512)
Shape of labels vector:(3135, 8)
Found 12291 images belonging to 8 classes.
Found 3135 images belonging to 8 classes.
Train Feature Extraction:
Beginning feature extraction for 12291 samples in 82 batches:


 99%|█████████▉| 81/82 [10:31<00:07,  7.94s/it]

final batch


 99%|█████████▉| 81/82 [10:34<00:07,  7.83s/it]


Features extracted!
Shape of feature vector:(12291, 2048)
Shape of labels vector:(12291, 8)
Test Feature Extraction:
Beginning feature extraction for 3135 samples in 21 batches:


 95%|█████████▌| 20/21 [02:48<00:08,  8.72s/it]

final batch


 95%|█████████▌| 20/21 [02:51<00:08,  8.58s/it]


Features extracted!
Shape of feature vector:(3135, 2048)
Shape of labels vector:(3135, 8)
Found 12291 images belonging to 8 classes.
Found 3135 images belonging to 8 classes.
Train Feature Extraction:
Beginning feature extraction for 12291 samples in 82 batches:


 99%|█████████▉| 81/82 [10:09<00:07,  7.99s/it]

final batch


 99%|█████████▉| 81/82 [10:13<00:07,  7.57s/it]


Features extracted!
Shape of feature vector:(12291, 1536)
Shape of labels vector:(12291, 8)
Test Feature Extraction:
Beginning feature extraction for 3135 samples in 21 batches:


 95%|█████████▌| 20/21 [02:26<00:07,  7.28s/it]

final batch


 95%|█████████▌| 20/21 [02:29<00:07,  7.49s/it]


Features extracted!
Shape of feature vector:(3135, 1536)
Shape of labels vector:(3135, 8)
Found 12291 images belonging to 8 classes.
Found 3135 images belonging to 8 classes.
Train Feature Extraction:
Beginning feature extraction for 12291 samples in 82 batches:


 99%|█████████▉| 81/82 [12:27<00:09,  9.57s/it]

final batch


 99%|█████████▉| 81/82 [12:34<00:09,  9.31s/it]


Features extracted!
Shape of feature vector:(12291, 2560)
Shape of labels vector:(12291, 8)
Test Feature Extraction:
Beginning feature extraction for 3135 samples in 21 batches:


 95%|█████████▌| 20/21 [02:49<00:08,  8.64s/it]

final batch


 95%|█████████▌| 20/21 [02:55<00:08,  8.79s/it]


Features extracted!
Shape of feature vector:(3135, 2560)
Shape of labels vector:(3135, 8)
