<a href="https://colab.research.google.com/github/azharkhairy/AutoClean/blob/main/VGG16_as_the_feature_extractor_in_AutoML.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
pip install tensorflow

In [2]:
pip install keras_tuner

Collecting keras_tuner
  Downloading keras_tuner-1.3.5-py3-none-any.whl (176 kB)
Collecting kt-legacy (from keras_tuner)
  Downloading kt_legacy-1.0.5-py3-none-any.whl (9.6 kB)
Installing collected packages: kt-legacy, keras_tuner
Successfully installed keras_tuner-1.3.5 kt-legacy-1.0.5


Import the necessary libraries: TensorFlow for deep learning, VGG16 from Keras Applications for feature extraction, the CIFAR-10 dataset from Keras, the Support Vector Classifier (SVC) from scikit-learn for training on the extracted features, RandomSearch from Keras Tuner for neural architecture search (NAS), and HyperParameters for defining hyperparameters.

In [3]:
import tensorflow as tf
from tensorflow.keras.applications import VGG16
from tensorflow.keras.datasets import cifar10
from sklearn.svm import SVC
from keras_tuner import RandomSearch
from keras_tuner import HyperParameters
import numpy as np


Load the CIFAR-10 dataset and normalize the pixel values to be between 0 and 1.

In [4]:
# Step 1: Load and preprocess CIFAR-10 dataset
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0

Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
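
As a quick sanity check of the scaling step, the sketch below applies the same division to a small synthetic batch. The random array is only a stand-in with CIFAR-10's image shape (an assumption for illustration), not the real dataset.

```python
import numpy as np

# Synthetic stand-in for a CIFAR-10 batch: 8 RGB images of 32x32 uint8 pixels (not real data)
rng = np.random.default_rng(0)
fake_batch = rng.integers(0, 256, size=(8, 32, 32, 3), dtype=np.uint8)

# Same scaling as in the cell above: cast to float32, then divide by 255
scaled = fake_batch.astype('float32') / 255.0

print(scaled.dtype, scaled.shape)
print(float(scaled.min()), float(scaled.max()))
```

Casting before dividing keeps the result in float32 rather than NumPy's default float64, which halves the memory needed for the image arrays.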


Subsample the dataset by randomly selecting num_samples images and their corresponding labels.

In [5]:
# Step 2: Subsample the training set
num_samples = 1000  # Increase for better results, at the cost of more compute and memory

# Draw 'num_samples' unique indices from the range [0, x_train.shape[0]), where
# 'x_train.shape[0]' is the total number of training images. The 'replace=False'
# argument ensures that no index is selected twice.
indices = np.random.choice(x_train.shape[0], num_samples, replace=False)

# Index the images and the labels with the same indices so they stay aligned.
x_train_subsampled = x_train[indices]
y_train_subsampled = y_train[indices]
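
The subsample above changes on every run because the random generator is unseeded. A minimal sketch of a reproducible variant follows; the `subsample` helper and its `seed` parameter are hypothetical names introduced here, and the toy arrays stand in for the real images and labels.

```python
import numpy as np

def subsample(x, y, num_samples, seed=0):
    """Return num_samples randomly chosen (image, label) pairs without repetition."""
    rng = np.random.default_rng(seed)  # seeding is an addition for reproducibility
    indices = rng.choice(x.shape[0], size=num_samples, replace=False)
    return x[indices], y[indices], indices

# Toy data: 50 "images" whose single value equals their label
x_demo = np.arange(50, dtype='float32').reshape(50, 1)
y_demo = np.arange(50).reshape(50, 1)

x_sub, y_sub, idx = subsample(x_demo, y_demo, num_samples=10, seed=42)
print(len(set(idx.tolist())))         # number of distinct indices drawn
print(bool(np.all(x_sub == y_sub)))   # images and labels are still aligned
```

Fixing the seed makes it possible to compare runs of the later steps on exactly the same subset.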

Resize each 32×32 CIFAR-10 image to 224×224 and apply VGG16's preprocess_input. Then create the VGG16 model with the specified parameters: include_top=False removes the fully connected layers, weights='imagenet' initializes the model with pre-trained weights, and input_shape specifies the shape of the input images. Finally, extract features from the subsampled training images and the test images using the VGG16 model.

In [None]:
from tensorflow.keras.applications.vgg16 import preprocess_input
from tensorflow.keras.preprocessing.image import img_to_array, array_to_img

# Step 3: Use VGG16 as feature extractor
input_shape = (224, 224)  # VGG16 input image size


def prepare_for_vgg16(images):
    """Resize each image to the VGG16 input size and apply VGG16 preprocessing."""
    prepared = []
    for img in images:
        img = array_to_img(img)  # Convert array back to image
        img = img.resize(input_shape)  # Resize image to VGG16 input size
        img = img_to_array(img)
        img = preprocess_input(img)  # Preprocess the image for VGG16
        prepared.append(img)
    return np.array(prepared)  # Convert list to numpy array


x_train_inputs = prepare_for_vgg16(x_train_subsampled)
# Note: this holds all 10,000 resized test images in memory at once
x_test_inputs = prepare_for_vgg16(x_test)

base_model = VGG16(include_top=False, weights='imagenet', input_shape=(224, 224, 3))
x_train_features = base_model.predict(x_train_inputs)
x_test_features = base_model.predict(x_test_inputs)
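
To make the shapes in this step concrete, the sketch below works out the output shape of VGG16's convolutional base and the memory taken by the resized test images. It relies on the standard VGG16 layout (five 2×2 max-pooling stages, 512 channels in the last block); the helper names are hypothetical and nothing here calls TensorFlow.

```python
def vgg16_feature_shape(height, width):
    """Output shape of VGG16's convolutional base (include_top=False) for one image."""
    # Each of the five 2x2 max-pooling stages halves the spatial size (rounding down);
    # the final convolutional block has 512 channels.
    for _ in range(5):
        height, width = height // 2, width // 2
    return (height, width, 512)


def float32_bytes(num_images, shape):
    """Bytes needed to hold num_images float32 arrays of the given shape."""
    count = num_images
    for dim in shape:
        count *= dim
    return count * 4


feature_shape = vgg16_feature_shape(224, 224)
flat_length = feature_shape[0] * feature_shape[1] * feature_shape[2]
print(feature_shape, flat_length)

test_input_gib = float32_bytes(10_000, (224, 224, 3)) / 2**30
print(f"Resized test images: {test_input_gib:.1f} GiB")
```

If that much memory is not available, one option is to resize and predict in smaller chunks of the test set and concatenate the resulting feature arrays.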


Create an SVC classifier with a linear kernel and fit it to the extracted features of the subsampled training data.

In [None]:
# Step 4: Train SVM classifier using the extracted features
svm_classifier = SVC(kernel='linear')
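# Flatten each feature map to a vector; ravel() turns the (N, 1) label array into the 1-D shape SVC expects
svm_classifier.fit(x_train_features.reshape(x_train_features.shape[0], -1), y_train_subsampled.ravel())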
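
The notebook fits the SVM but never scores it. The sketch below shows one way the scoring could look, run on synthetic feature maps so that it is self-contained; `make_fake_features` and its cluster data are assumptions for illustration, not the VGG16 features.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)


def make_fake_features(n_per_class, num_classes=3, shape=(2, 2, 4)):
    """Synthetic 4-D 'feature maps' with one well-separated cluster per class."""
    features, labels = [], []
    for c in range(num_classes):
        features.append(rng.normal(loc=10.0 * c, scale=1.0, size=(n_per_class, *shape)))
        labels.append(np.full((n_per_class, 1), c))  # (N, 1) labels, like CIFAR-10
    return np.concatenate(features), np.concatenate(labels)


train_feats, train_labels = make_fake_features(30)
test_feats, test_labels = make_fake_features(10)

clf = SVC(kernel='linear')
# Flatten each feature map to one row and pass the labels as a 1-D array
clf.fit(train_feats.reshape(len(train_feats), -1), train_labels.ravel())

predictions = clf.predict(test_feats.reshape(len(test_feats), -1))
svm_accuracy = accuracy_score(test_labels.ravel(), predictions)
print(f"SVM accuracy on held-out synthetic features: {svm_accuracy:.3f}")
```

With the notebook's own arrays, the same pattern would use x_test_features reshaped to two dimensions and y_test.ravel() as the reference labels.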

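Define a function build_model that constructs a sequential neural network. The function uses Keras Tuner's HyperParameters object (hp) to define the searched hyperparameters: the number of hidden layers (num_layers, 1 to 3) and the number of units in each hidden layer (units_i, 32 to 256 in steps of 32). The activation functions are fixed: ReLU for the hidden layers and softmax for the 10-class output layer. RandomSearch then trains up to 10 sampled configurations and ranks them by validation accuracy.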

In [None]:
# Step 5: Define NAS hypermodel for architecture search
def build_model(hp):
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Flatten(input_shape=x_train_features.shape[1:]))
    for i in range(hp.Int('num_layers', 1, 3)):
        units = hp.Int('units_' + str(i), 32, 256, step=32)
        model.add(tf.keras.layers.Dense(units, activation='relu'))
    model.add(tf.keras.layers.Dense(10, activation='softmax'))
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    return model

# Step 6: Perform NAS with Keras Tuner
tuner = RandomSearch(build_model, objective='val_accuracy', max_trials=10, directory='nas_results')
tuner.search(x_train_features, y_train_subsampled, epochs=10, validation_split=0.2)
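
To put max_trials=10 in perspective, the sketch below counts the distinct architectures that build_model can produce. The ranges are copied from the hp.Int calls above, where the maximum value is inclusive; the variable names are introduced here for illustration.

```python
# Choices mirrored from build_model:
#   hp.Int('num_layers', 1, 3) and hp.Int('units_i', 32, 256, step=32)
layer_counts = range(1, 4)                # 1, 2, or 3 hidden layers
unit_choices = list(range(32, 257, 32))   # 32, 64, ..., 256

# With n hidden layers, each layer picks its width independently
num_architectures = sum(len(unit_choices) ** n for n in layer_counts)

max_trials = 10
print(len(unit_choices), num_architectures)
print(f"Fraction covered by {max_trials} trials: {max_trials / num_architectures:.1%}")
```

Ten random trials cover only a small part of this space, so raising max_trials is a reasonable first adjustment if the search results vary a lot between runs.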


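Retrieve the best model found by the NAS search, then train it further on the extracted training features and evaluate it on the extracted test features. Note that get_best_models returns the model with the weights saved during the search, so the fit call below continues training rather than starting from scratch.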

In [None]:
# Get the best model architecture found by NAS
best_model = tuner.get_best_models(num_models=1)[0]

# Train and evaluate the best model
best_model.fit(x_train_features, y_train_subsampled, epochs=10, validation_split=0.2)
test_loss, test_accuracy = best_model.evaluate(x_test_features, y_test)
print(f"Test Loss: {test_loss}, Test Accuracy: {test_accuracy}")

In this notebook, the feature-extraction step uses the pretrained VGG16 model, and NAS then searches for the best classifier architecture on top of the extracted features. VGG16 thus serves as the feature extractor in the AutoML pipeline, while Keras Tuner selects the classifier architecture.