# Feature Engineering Notebook

## Objectives

* Pre-process the data: load the images into a NumPy array and one-hot encode the breed labels.
* Use powerful pre-trained neural networks to extract complex features from the dataset of dog images for breed classification, specifically:
    - InceptionV3
    - Xception
    - InceptionResNetV2
    - NASNetLarge
* Concatenate the four sets of extracted features into a single feature array.

## Inputs

* labels.csv
* images/train/
* breed_dict.pkl
* breeds.pkl

## Outputs

* final_features.pkl
* y.pkl


---

# Importing all the packages / libraries we need

In [1]:
import os
import matplotlib.pyplot as plt
from tqdm.autonotebook import tqdm
import numpy as np
import pandas as pd
from tensorflow.keras.preprocessing.image import load_img
from keras.utils import to_categorical
from keras.layers import GlobalAveragePooling2D
from keras.models import Model
from keras.layers import Lambda
from keras.layers import Input
# Note: every import below rebinds the name preprocess_input, so after this
# cell it refers to the NASNet version (the last one imported).
from keras.applications.inception_v3 import InceptionV3, preprocess_input
from keras.applications.xception import Xception, preprocess_input
from keras.applications.inception_resnet_v2 import InceptionResNetV2, preprocess_input
from keras.applications.nasnet import NASNetLarge, preprocess_input
import pickle
import time
import gc

2024-05-08 15:25:45.695909: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-05-08 15:25:45.696091: I external/local_tsl/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2024-05-08 15:25:45.697947: I external/local_tsl/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2024-05-08 15:25:45.719661: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


# Change working directory

We need to change the working directory from the notebook's folder to its parent folder.
* We access the current directory with os.getcwd()

In [2]:
current_dir = os.getcwd()
current_dir

'/home/jaaz/Desktop/project-5/TailTeller/jupyter_notebooks'

We want to make the parent of the current directory the new current directory.
* os.path.dirname() gets the parent directory
* os.chdir() defines the new current directory

In [3]:
os.chdir(os.path.dirname(current_dir))
print("New current directory:", os.getcwd())

New current directory: /home/jaaz/Desktop/project-5/TailTeller
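
Moving up relative to the current directory is not idempotent: if `current_dir` is recomputed and the cell is run again, the working directory climbs one more level. A minimal sketch of a guard follows, assuming the notebooks live in a folder named `jupyter_notebooks` (as in the path printed above). The helper name `move_to_project_root` is hypothetical and not part of the original notebook.

```python
import os

def move_to_project_root(notebook_folder="jupyter_notebooks"):
    """Move to the parent directory only while we are inside the notebook folder."""
    cwd = os.getcwd()
    # Only step up when the current folder is the notebook folder,
    # so re-running the cell leaves the working directory unchanged.
    if os.path.basename(cwd) == notebook_folder:
        os.chdir(os.path.dirname(cwd))
    return os.getcwd()
```

Calling `move_to_project_root()` a second time returns the same directory, because the folder-name check no longer matches.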


---

# Pre-process and Encode

We will create a function that pre-processes the data (transforms it before it is fed to the model): the images are loaded into a NumPy array and the breed labels are converted into a one-hot encoded matrix.
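
To illustrate what one-hot encoding produces, here is a small sketch in plain NumPy. The mapping `toy_breed_dict` and the label list are made up for illustration and stand in for `breed_dict.pkl` and the `breed` column. Indexing an identity matrix is used here in place of `to_categorical`.

```python
import numpy as np

# Hypothetical mapping from breed name to class index (stand-in for breed_dict.pkl)
toy_breed_dict = {"beagle": 0, "boxer": 1, "pug": 2}
toy_labels = ["pug", "beagle", "pug", "boxer"]

# Translate each breed name into its integer class index
indices = np.array([toy_breed_dict[name] for name in toy_labels])

# Row i of the identity matrix is the one-hot vector for class i,
# so indexing with the class indices yields one row per label
one_hot = np.eye(len(toy_breed_dict), dtype=np.float32)[indices]

print(one_hot.shape)  # one row per label, one column per breed
print(one_hot)
```

Each row contains a single 1 in the column of its breed and 0 everywhere else.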

In [None]:
# Load the breed dictionary (maps each breed name to an integer class index)
with open('breed_dict.pkl', 'rb') as f:
    breed_dict = pickle.load(f)

input_shape = (299, 299, 3)

def images_to_array(directory, label_dataframe, target_size=input_shape):
    """
    Load every image listed in label_dataframe, resize it to target_size,
    store it in a uint8 array and return the images together with their
    one-hot encoded breed labels.
    """
    # .values gives positional access, so a non-default DataFrame index
    # cannot misalign images and labels
    image_labels = label_dataframe['breed'].values
    # Using uint8 saves RAM when handling large amounts of data
    images = np.zeros([len(label_dataframe), *target_size], dtype=np.uint8)
    y = np.zeros([len(label_dataframe), 1], dtype=np.uint8)

    for ix, image_name in enumerate(tqdm(label_dataframe['id'].values)):
        # Build the full path, load and resize the image, store it in the array
        img_path = os.path.join(directory, image_name + '.jpg')
        img = load_img(img_path, target_size=target_size[:2])
        images[ix] = img
        del img  # delete the loaded image to save RAM

        # Convert the breed name of the current image into its numerical
        # index and assign it to the label array "y"
        y[ix] = breed_dict[image_labels[ix]]

    # Convert the class vector into a binary matrix (one-hot encoding).
    # num_classes keeps the width fixed even if some breeds are missing
    # from label_dataframe (assumes breed_dict values are integer indices).
    y = to_categorical(y, num_classes=max(breed_dict.values()) + 1)

    return images, y
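
The `uint8` comment in `images_to_array` can be made concrete with a quick estimate. This is only a sketch: `n_images` is a hypothetical dataset size chosen for illustration, not a value read from `labels.csv`.

```python
import numpy as np

n_images = 10_000            # hypothetical dataset size, for illustration only
image_shape = (299, 299, 3)  # same shape as input_shape in the notebook

# Number of stored values per image: 299 * 299 * 3 = 268,203
values_per_image = int(np.prod(image_shape))

for dtype in (np.uint8, np.float32):
    # itemsize is the number of bytes per stored value (1 for uint8, 4 for float32)
    total_bytes = n_images * values_per_image * np.dtype(dtype).itemsize
    print(f"{np.dtype(dtype).name}: {total_bytes / 1e9:.2f} GB")
```

Under these assumptions the `uint8` array needs about 2.68 GB, while the same array in `float32` would need four times as much, about 10.73 GB.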