
In [1]:
import numpy as np
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import to_categorical
import tensorflow as tf

# Load the CIFAR-10 dataset
(train_images, train_labels), (test_images, test_labels) = cifar10.load_data()

# Upscale images from CIFAR-10's native 32x32 to 64x64 (tf.image.resize returns float32 tensors)
train_images_resized = tf.image.resize(train_images, (64, 64))
test_images_resized = tf.image.resize(test_images, (64, 64))

# Normalize images to the range [0, 1]
train_images_normalized = train_images_resized / 255.0
test_images_normalized = test_images_resized / 255.0

# Convert class vectors to binary class matrices (one-hot encoding)
train_labels_one_hot = to_categorical(train_labels, num_classes=10)
test_labels_one_hot = to_categorical(test_labels, num_classes=10)

# Save preprocessed data locally
np.save('train_images_normalized.npy', train_images_normalized)
np.save('test_images_normalized.npy', test_images_normalized)
np.save('train_labels_one_hot.npy', train_labels_one_hot)
np.save('test_labels_one_hot.npy', test_labels_one_hot)


Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
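The normalization and one-hot steps in the cell above do not depend on TensorFlow. As a minimal NumPy-only sketch (the helper names `normalize` and `one_hot` are hypothetical, not part of Keras), dividing by 255.0 maps uint8 pixels into [0, 1], and one-hot encoding amounts to picking rows of an identity matrix, which gives the same layout that `to_categorical` produces for integer labels:

```python
import numpy as np

def normalize(images):
    """Hypothetical helper: scale uint8 pixels from [0, 255] to float32 in [0, 1]."""
    return images.astype(np.float32) / 255.0

def one_hot(labels, num_classes=10):
    """Hypothetical helper: map integer labels of shape (N,) or (N, 1) to an (N, num_classes) matrix."""
    # CIFAR-10 labels are loaded with shape (N, 1), so flatten them before indexing.
    return np.eye(num_classes, dtype=np.float32)[np.asarray(labels).ravel()]

# Tiny synthetic batch standing in for CIFAR-10: two 32x32x3 images, labels shaped (N, 1).
images = np.array([np.zeros((32, 32, 3)), np.full((32, 32, 3), 255)], dtype=np.uint8)
labels = np.array([[3], [7]])

x = normalize(images)
y = one_hot(labels)
print(x.min(), x.max())           # 0.0 1.0
print(y.shape, y.argmax(axis=1))  # (2, 10) [3 7]
```

Indexing `np.eye` with the flattened labels keeps the encoding to a single vectorized operation, with no Python loop over samples.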


In [2]:
%pip install numpy opencv-python pillow




In [3]:
import numpy as np
import os
import cv2  # OpenCV for resizing and writing image files

# Function to resize images
def resize_images(images, new_size):
    # new_size is (width, height), the order cv2.resize expects
    resized_images = []
    for img in images:
        resized_img = cv2.resize(img, new_size, interpolation=cv2.INTER_AREA)
        resized_images.append(resized_img)
    return np.array(resized_images)

# Function to save images as files (optional)
def save_images(images, labels, directory, prefix):
    os.makedirs(directory, exist_ok=True)
    for i, img in enumerate(images):
        # The arrays hold RGB floats in [0, 1]; PNG files need uint8 values in [0, 255],
        # and cv2.imwrite expects BGR channel order.
        img_uint8 = np.clip(np.rint(img * 255.0), 0, 255).astype(np.uint8)
        img_bgr = cv2.cvtColor(img_uint8, cv2.COLOR_RGB2BGR)
        filepath = os.path.join(directory, f"{prefix}_{i}.png")
        cv2.imwrite(filepath, img_bgr)
        if labels is not None:
            # Labels are one-hot vectors; store the integer class index instead
            label_filepath = os.path.join(directory, f"{prefix}_label_{i}.txt")
            with open(label_filepath, 'w') as f:
                f.write(str(int(np.argmax(labels[i]))))

# Load the .npy files
train_images = np.load('train_images_normalized.npy')
test_images = np.load('test_images_normalized.npy')
train_labels = np.load('train_labels_one_hot.npy')
test_labels = np.load('test_labels_one_hot.npy')

# Define the new size (width, height)
new_size = (32, 32)

# Resize the images
train_images_resized = resize_images(train_images, new_size)
test_images_resized = resize_images(test_images, new_size)

# Save the resized images back to .npy files
np.save('train_images_resized.npy', train_images_resized)
np.save('test_images_resized.npy', test_images_resized)

# Optionally save the resized images to disk as image files
save_images(train_images_resized, train_labels, 'resized_train_images', 'train')
save_images(test_images_resized, test_labels, 'resized_test_images', 'test')

print("Resizing completed and files saved.")


Resizing completed and files saved.


Explanation:

1. Import libraries:
   numpy for array operations, os for creating output directories and building file paths, and cv2 (OpenCV) for resizing and writing images.
2. Define a function to resize images:
   resize_images takes an array of images and a new size (width, height) and returns the resized images as a single NumPy array.
3. Define a function to save images (optional):
   save_images writes each resized image as a PNG file in the given directory, along with a text file holding its label.
4. Load the .npy files:
   Load the preprocessed image and label arrays with np.load.
5. Define the new size:
   Set the target size as (width, height), the order cv2.resize expects. Here (32, 32) shrinks the 64x64 arrays produced by the first cell back to CIFAR-10's native 32x32.
6. Resize the images:
   Apply cv2.resize to each image with the new size.
7. Save the resized images:
   Write the resized arrays back to .npy files with np.save.
8. Optional: save images as files:
   To inspect the resized images visually or use them outside NumPy, call save_images.
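Before training on the saved arrays, a quick consistency check can catch problems such as an unexpected spatial size, pixel values outside [0, 1], or label rows that are not one-hot. The sketch below is illustrative only: `check_preprocessed` is a hypothetical helper, and small synthetic arrays stand in for the output of np.load('train_images_resized.npy') and np.load('train_labels_one_hot.npy') so that it runs on its own.

```python
import numpy as np

def check_preprocessed(images, labels_one_hot, expected_hw=(32, 32)):
    """Hypothetical helper: basic checks on an image/label pair; raises AssertionError on failure."""
    # Images: (N, height, width, channels) with values in [0, 1].
    assert images.ndim == 4, f"expected a 4-D image array, got shape {images.shape}"
    assert images.shape[1:3] == expected_hw, f"unexpected spatial size {images.shape[1:3]}"
    assert images.min() >= 0.0 and images.max() <= 1.0, "pixel values outside [0, 1]"
    # Labels: one row per image, each row a one-hot vector.
    assert labels_one_hot.shape[0] == images.shape[0], "image/label count mismatch"
    assert np.allclose(labels_one_hot.sum(axis=1), 1.0), "label rows are not one-hot"
    return True

# Synthetic stand-ins for the arrays saved by the notebook.
rng = np.random.default_rng(0)
images = rng.random((4, 32, 32, 3), dtype=np.float32)  # values in [0, 1)
labels = np.eye(10, dtype=np.float32)[[0, 1, 2, 3]]
print(check_preprocessed(images, labels))  # True
```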
Notes:
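cv2.resize accepts single-channel (H, W) and multi-channel (H, W, C) images, but its size argument is (width, height) while the returned array has shape (height, width, C). Grayscale images stored with a trailing channel axis of size 1 come back as 2-D arrays, so you may need to restore that axis (for example with np.expand_dims) before stacking them.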
cv2.INTER_AREA is generally the best choice for shrinking images, because it averages source pixels and so reduces aliasing. For enlarging, cv2.INTER_LINEAR (faster) or cv2.INTER_CUBIC (slower) usually look better; try them to see which suits your use case.
The optional step of saving images as files is useful if you need to visually inspect the resized images or use them in a different format.
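Because the first cell produced 64x64 images and this cell shrinks them to 32x32, the downscale factor here is exactly 2. For an exact integer factor, area interpolation amounts to averaging each factor x factor block of pixels. The NumPy sketch below illustrates that idea; `area_downsample` is a hypothetical helper, not an OpenCV function, and for exact integer factors its output should agree with cv2.resize(..., interpolation=cv2.INTER_AREA) only up to floating-point rounding.

```python
import numpy as np

def area_downsample(images, factor=2):
    """Hypothetical helper: shrink (N, H, W, C) images by an integer factor via block averaging.

    H and W must be divisible by `factor`.
    """
    n, h, w, c = images.shape
    if h % factor or w % factor:
        raise ValueError("height and width must be divisible by factor")
    # Split each spatial axis into (blocks, pixels-within-block), then average the within-block axes.
    blocks = images.reshape(n, h // factor, factor, w // factor, factor, c)
    return blocks.mean(axis=(2, 4))

# A single 4x4 one-channel image whose 2x2 blocks average to 0, 1, 2 and 3.
img = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [2, 2, 3, 3],
                [2, 2, 3, 3]], dtype=np.float32).reshape(1, 4, 4, 1)
small = area_downsample(img)
print(small.shape)                  # (1, 2, 2, 1)
print(small[0, :, :, 0].tolist())   # [[0.0, 1.0], [2.0, 3.0]]
```

Each output pixel is the mean of the input pixels it covers, so a 64x64 image in [0, 1] stays in [0, 1] after shrinking.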
By following these steps, you can resize image datasets stored in .npy files and prepare them for further processing or model training.






