## Data Preparation

In this section, we preprocess the CIFAR-10 and Fashion MNIST datasets so they are suitable for training our models. We normalize the images (scaling pixel values to the range [0,1]), one-hot encode the labels, and save the results for the next steps.


## Import Dependencies and Load Data

[Back to Main](../Project.ipynb)

In [1]:
# Importing necessary libraries

import numpy as np
import tensorflow as tf
import h5py


# Loading CIFAR-10 data
(train_images_cifar, train_labels_cifar), (test_images_cifar, test_labels_cifar) = tf.keras.datasets.cifar10.load_data()

# Loading Fashion MNIST data
(train_images_fmnist, train_labels_fmnist), (test_images_fmnist, test_labels_fmnist) = tf.keras.datasets.fashion_mnist.load_data()
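
Before preprocessing, it can help to confirm the shapes and dtypes of the loaded arrays. The sketch below uses a hypothetical helper, `summarize`, and small zero-filled stand-in arrays with the same per-image shapes as the Keras datasets (CIFAR-10 images are 32×32×3 and Fashion MNIST images are 28×28, both `uint8`). In the notebook you would pass the real arrays instead.

```python
import numpy as np

def summarize(name, array):
    """Return a one-line summary of an array's shape, dtype, and value range."""
    return (f"{name}: shape={array.shape}, dtype={array.dtype}, "
            f"min={int(array.min())}, max={int(array.max())}")

# Stand-ins (assumption): 4 samples each instead of the full datasets.
cifar_like = np.zeros((4, 32, 32, 3), dtype=np.uint8)
fmnist_like = np.zeros((4, 28, 28), dtype=np.uint8)

cifar_summary = summarize("cifar", cifar_like)
fmnist_summary = summarize("fmnist", fmnist_like)
print(cifar_summary)
print(fmnist_summary)
```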

### CIFAR-10 Data Preparation

For CIFAR-10, the data consists of color images. Here's what we'll do:

1. **Normalization**: Gradient-based training tends to be more stable when the input features share a small, consistent range. We'll scale the pixel values from [0,255] to [0,1].
2. **One-hot Encoding Labels**: Instead of a single integer representing each class, we'll use one-hot encoding. This converts our integer labels into a binary matrix with one column per class, which is the format expected by losses such as categorical cross-entropy.
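
To make the two steps above concrete, here is a minimal NumPy sketch on toy data. The 2×2 "image" and the three labels are made up for illustration. Indexing `np.eye` is used here as a stand-in that produces the same 0/1 pattern as one-hot encoding; it is not the call the notebook itself uses.

```python
import numpy as np

# Toy stand-ins (assumption): one 2x2 "image" and three labels over 4 classes.
pixels = np.array([[0, 51], [102, 255]], dtype=np.uint8)
labels = np.array([0, 2, 3])

# Normalization: divide by the maximum 8-bit value so the result lies in [0, 1].
scaled = pixels / 255.0

# One-hot encoding: pick rows of the identity matrix, one row per label.
num_classes = 4
one_hot = np.eye(num_classes, dtype=np.float32)[labels]

print(scaled)
print(one_hot)
```

Each row of `one_hot` contains a single 1 in the column given by the corresponding label.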


In [2]:
# Normalizing the pixel values for CIFAR-10
train_images_cifar = train_images_cifar / 255.0
test_images_cifar = test_images_cifar / 255.0

# One-hot encoding the labels for CIFAR-10
train_labels_cifar = tf.keras.utils.to_categorical(train_labels_cifar)
test_labels_cifar = tf.keras.utils.to_categorical(test_labels_cifar)
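
One side effect worth knowing: dividing a `uint8` array by the Python float `255.0` yields a `float64` array in NumPy, which takes eight times the memory of the original. The arithmetic below estimates the footprint of the 50,000 CIFAR-10 training images and compares it with `float32`. Casting to `float32` is a common choice but is an assumption here, not something the cell above does.

```python
import numpy as np

# CIFAR-10 training set: 50,000 RGB images of 32x32 pixels.
num_values = 50_000 * 32 * 32 * 3

bytes_float64 = num_values * np.dtype(np.float64).itemsize
bytes_float32 = num_values * np.dtype(np.float32).itemsize
print(f"float64: {bytes_float64 / 1e9:.2f} GB, float32: {bytes_float32 / 1e9:.2f} GB")

# Small stand-in batch (assumption) to show the dtype each expression yields.
batch = np.full((2, 32, 32, 3), 255, dtype=np.uint8)
as_float64 = batch / 255.0                      # same expression as the cell above
as_float32 = batch.astype(np.float32) / 255.0   # optional lower-memory variant
print(as_float64.dtype, as_float32.dtype)
```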


Save the preprocessed CIFAR-10 arrays to a single `.npz` archive for the next steps.

In [3]:
np.savez('../saved_models/cifar_data.npz', train_images=train_images_cifar, test_images=test_images_cifar, 
         train_labels=train_labels_cifar, test_labels=test_labels_cifar)
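
A later notebook can read the archive back with `np.load`, using the same keyword names that were passed to `np.savez`. The sketch below is a self-contained round trip on tiny stand-in arrays written to a temporary directory (both are assumptions for illustration). It also creates the output directory first, since `np.savez` does not create missing directories.

```python
import os
import tempfile
import numpy as np

# Tiny stand-in arrays (assumption); the notebook saves the full datasets.
train_images = np.zeros((2, 32, 32, 3))
train_labels = np.eye(10)[[3, 7]]

with tempfile.TemporaryDirectory() as tmp_dir:
    out_dir = os.path.join(tmp_dir, "saved_models")
    os.makedirs(out_dir, exist_ok=True)  # np.savez does not create missing directories
    path = os.path.join(out_dir, "cifar_data.npz")
    np.savez(path, train_images=train_images, train_labels=train_labels)

    # np.load returns a lazy archive; index it by the names used when saving.
    with np.load(path) as data:
        keys = sorted(data.files)
        loaded_images = data["train_images"]
        loaded_labels = data["train_labels"]

print(keys, loaded_images.shape, loaded_labels.shape)
```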


### Fashion MNIST Data Preparation

The Fashion MNIST dataset contains 28×28 grayscale images. The preparation steps are the same as for CIFAR-10:

1. **Normalization**: We'll scale the pixel values to the range [0,1].
2. **One-hot Encoding Labels**: As with CIFAR-10, we'll convert the integer labels into a binary matrix.
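
One difference from CIFAR-10 is that Fashion MNIST images load as 28×28 arrays with no channel axis, whereas CIFAR-10 images carry a trailing axis of size 3. Convolutional layers in Keras typically expect a channel axis, so a later step may need to add one. This notebook does not do so; the sketch below, on a stand-in array, shows one way under that assumption.

```python
import numpy as np

# Stand-in for a few Fashion MNIST images (assumption): shape (N, 28, 28), uint8.
fmnist_like = np.full((5, 28, 28), 255, dtype=np.uint8)

# Append a channel axis of size 1, then scale to [0, 1].
with_channel = np.expand_dims(fmnist_like, axis=-1) / 255.0

print(fmnist_like.shape, "->", with_channel.shape)
```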


In [4]:
# Normalizing the pixel values for Fashion MNIST
train_images_fmnist = train_images_fmnist / 255.0
test_images_fmnist = test_images_fmnist / 255.0

# One-hot encoding the labels for Fashion MNIST
train_labels_fmnist = tf.keras.utils.to_categorical(train_labels_fmnist)
test_labels_fmnist = tf.keras.utils.to_categorical(test_labels_fmnist)
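
A quick way to sanity-check one-hot labels is to confirm that every row sums to 1 and that `argmax` recovers the original integer labels. The sketch below runs that check on three made-up labels over 10 classes, building the one-hot matrix with `np.eye` as a stand-in for the encoded labels.

```python
import numpy as np

# Made-up integer labels (assumption) over the 10 classes.
labels = np.array([9, 0, 4])
one_hot = np.eye(10)[labels]

row_sums = one_hot.sum(axis=1)       # each row should contain exactly one 1
recovered = one_hot.argmax(axis=1)   # position of the 1 gives back the label

print(row_sums, recovered)
```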


Save the preprocessed Fashion MNIST arrays to a single `.npz` archive for the next steps.

In [5]:
np.savez('../saved_models/fmnist_data.npz', train_images=train_images_fmnist, test_images=test_images_fmnist, 
         train_labels=train_labels_fmnist, test_labels=test_labels_fmnist)


With our data now prepared, we're ready to build and train our machine learning models. Inputs scaled to a consistent range generally help training converge more reliably, although preprocessing alone does not guarantee higher accuracy.

In the next section, we'll delve into model design, exploring architectures suitable for our image classification tasks.


**Note:** The cell below is optional and can be skipped (or commented out) during regular runs for performance reasons. It saves the preprocessed training images, not trained models, to HDF5 files for testing or future use. The saved files have already been prepared in advance for the testing scenarios presented in this notebook.

In [6]:
# Save train_images_cifar to an h5 file
with h5py.File('../saved_models/train_images_cifar_enhanced.h5', 'w') as hf:
    hf.create_dataset("cifar_data", data=train_images_cifar)

# Save train_images_fmnist to an h5 file
with h5py.File('../saved_models/train_images_fmnist_enhanced.h5', 'w') as hf:
    hf.create_dataset("fmnist_data", data=train_images_fmnist)


[Back to Main](../Project.ipynb)
