
In [None]:
!pip install tensorflow matplotlib scikit-learn




# Task
Build an image classifier as a complete learning guide for an internship project.

## Data collection and preparation

### Subtask:
Gather a dataset of images suitable for the classification task. This may involve downloading a pre-existing dataset or collecting your own. Organize the data into categories and split it into training, validation, and test sets.


**Reasoning**:
The first step is to choose and download a suitable dataset. CIFAR-10, a standard image-classification benchmark of 60,000 32×32 colour images in 10 classes, ships with TensorFlow through `tf.keras.datasets`. I will download and load it from there.
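Keras returns the CIFAR-10 labels as integer ids in a column of shape `(N, 1)`. A minimal sketch of mapping those ids to class names and counting images per class is shown here. `CLASS_NAMES` follows the standard CIFAR-10 ordering, and the small `labels` array is a made-up stand-in so the snippet runs without the download.

```python
import numpy as np

# Standard CIFAR-10 class order (list index = integer label).
CLASS_NAMES = ["airplane", "automobile", "bird", "cat", "deer",
               "dog", "frog", "horse", "ship", "truck"]

# Hypothetical stand-in for the (N, 1) label array returned by load_data().
labels = np.array([[3], [8], [8], [0], [6], [3]])

flat = labels.ravel()                                   # (N, 1) -> (N,)
counts = np.bincount(flat, minlength=len(CLASS_NAMES))  # images per class

for class_id, n in enumerate(counts):
    if n:
        print(f"{class_id}: {CLASS_NAMES[class_id]:<10} {n} image(s)")
```

Running the same `np.bincount` call on `all_labels.ravel()` should show whether the ten classes are equally represented before splitting.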



In [None]:
import tensorflow as tf
import numpy as np

(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.cifar10.load_data()

# Combine train and test for easier splitting later
all_images = np.concatenate((train_images, test_images), axis=0)
all_labels = np.concatenate((train_labels, test_labels), axis=0)

print(f"Total number of images: {len(all_images)}")
print(f"Shape of images: {all_images.shape}")
print(f"Shape of labels: {all_labels.shape}")

Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
170498071/170498071 ━━━━━━━━━━━━━━━━━━━━ 4s 0us/step
Total number of images: 60000
Shape of images: (60000, 32, 32, 3)
Shape of labels: (60000, 1)


**Reasoning**:
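The next step is to split the combined dataset into training, validation, and test sets with a 70/15/15 split while keeping each class's share the same in every subset (stratification). I will use scikit-learn's `train_test_split` with its `stratify` argument for this, calling it twice: first to hold out 30% of the data, then to halve that hold-out into validation and test sets.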
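To make the idea of stratification concrete, here is a NumPy-only sketch of a two-stage stratified split on toy labels. It is not how scikit-learn implements `stratify`; the `stratified_split` helper is hypothetical and only illustrates that sampling the same fraction from every class preserves the class proportions.

```python
import numpy as np

def stratified_split(labels, holdout_fraction, rng):
    """Return (keep_idx, holdout_idx), sampling the same fraction from every class."""
    keep, holdout = [], []
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_holdout = int(round(len(idx) * holdout_fraction))
        holdout.extend(idx[:n_holdout])
        keep.extend(idx[n_holdout:])
    return np.array(keep), np.array(holdout)

rng = np.random.default_rng(42)
labels = np.repeat(np.arange(10), 100)   # toy data: 10 classes x 100 samples

# Stage 1: 70% train, 30% temporary hold-out.
train_idx, temp_idx = stratified_split(labels, 0.3, rng)
# Stage 2: halve the hold-out into validation and test (15% each overall).
val_rel, test_rel = stratified_split(labels[temp_idx], 0.5, rng)
val_idx, test_idx = temp_idx[val_rel], temp_idx[test_rel]

print(len(train_idx), len(val_idx), len(test_idx))   # 700 150 150
print(np.bincount(labels[train_idx]))                # 70 samples of every class
```

The second call uses `0.5` because half of the 30% hold-out is 15% of the full dataset.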



In [None]:
from sklearn.model_selection import train_test_split

# Split into training and temporary set (validation + test)
train_images, temp_images, train_labels, temp_labels = train_test_split(
    all_images, all_labels, test_size=0.3, random_state=42, stratify=all_labels
)

# Split the temporary set into validation and test sets
val_images, test_images, val_labels, test_labels = train_test_split(
    temp_images, temp_labels, test_size=0.5, random_state=42, stratify=temp_labels
)

print(f"Training set size: {len(train_images)}")
print(f"Validation set size: {len(val_images)}")
print(f"Test set size: {len(test_images)}")

Training set size: 42000
Validation set size: 9000
Test set size: 9000


## Data preprocessing

### Subtask:
Prepare the image data for use with a neural network. This typically includes resizing images to a consistent size, normalizing pixel values, and augmenting the data to increase the size and variability of the training set.
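CIFAR-10 images already share one size (32×32), so no resizing is needed here. Augmentation, by contrast, creates label-preserving variants of each training image. As a rough illustration of what a random horizontal flip does to a single image, consider this NumPy sketch. The `random_horizontal_flip` helper is hypothetical; the notebook itself relies on the Keras `RandomFlip` layer.

```python
import numpy as np

def random_horizontal_flip(image, rng, p=0.5):
    """Flip an (H, W, C) image left-to-right with probability p."""
    if rng.random() < p:
        return image[:, ::-1, :]   # reverse the width axis
    return image

rng = np.random.default_rng(0)
image = np.arange(2 * 3 * 1).reshape(2, 3, 1)   # tiny 2x3 single-channel "image"

always_flipped = random_horizontal_flip(image, rng, p=1.0)
never_flipped = random_horizontal_flip(image, rng, p=0.0)

print(image[..., 0])            # rows [0 1 2] and [3 4 5]
print(always_flipped[..., 0])   # rows [2 1 0] and [5 4 3]
```

A flipped truck is still a truck, which is why this transform is safe for CIFAR-10; it would not be safe for data where orientation carries meaning, such as digits or text.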


**Reasoning**:
Normalize the pixel values, convert labels to categorical format, define and apply data augmentation, create TensorFlow Datasets, and apply batching and prefetching.
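The two array transformations in this step can be sketched with NumPy alone. The snippet below scales `uint8` pixels to `[0, 1]` and builds one-hot labels by indexing an identity matrix, which yields the same kind of array `tf.keras.utils.to_categorical` returns. The `images` and `labels` arrays are made-up stand-ins.

```python
import numpy as np

num_classes = 10

# Hypothetical stand-ins: two 32x32 RGB uint8 images and their (N, 1) labels.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(2, 32, 32, 3), dtype=np.uint8)
labels = np.array([[3], [9]])

scaled = images.astype("float32") / 255.0   # pixel values now lie in [0, 1]
one_hot = np.eye(num_classes, dtype="float32")[labels.ravel()]   # row i has a 1 at column labels[i]

print(scaled.dtype, float(scaled.min()) >= 0.0, float(scaled.max()) <= 1.0)
print(one_hot)
```

One-hot targets are what the `categorical_crossentropy` loss expects; with integer labels the matching Keras loss would be `sparse_categorical_crossentropy`.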



In [None]:
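# Normalize pixel values to [0, 1]; casting to float32 first avoids the
# float64 arrays (twice the memory) that plain division would produce
train_images = train_images.astype("float32") / 255.0
val_images = val_images.astype("float32") / 255.0
test_images = test_images.astype("float32") / 255.0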

# Convert labels to categorical format
num_classes = 10
train_labels = tf.keras.utils.to_categorical(train_labels, num_classes)
val_labels = tf.keras.utils.to_categorical(val_labels, num_classes)
test_labels = tf.keras.utils.to_categorical(test_labels, num_classes)

# Define data augmentation pipeline
data_augmentation = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.1),
    tf.keras.layers.RandomZoom(0.1),
])

# Create TensorFlow Dataset objects
train_ds = tf.data.Dataset.from_tensor_slices((train_images, train_labels))
val_ds = tf.data.Dataset.from_tensor_slices((val_images, val_labels))
test_ds = tf.data.Dataset.from_tensor_slices((test_images, test_labels))

# Shuffle and augment the training dataset only; batch and prefetch all datasets
batch_size = 32
train_ds = (
    train_ds
    .shuffle(buffer_size=10_000)  # reshuffled every epoch; fit() does not shuffle a tf.data.Dataset
    .map(lambda x, y: (data_augmentation(x, training=True), y),
         num_parallel_calls=tf.data.AUTOTUNE)
    .batch(batch_size)
    .prefetch(buffer_size=tf.data.AUTOTUNE)
)
val_ds = val_ds.batch(batch_size).prefetch(buffer_size=tf.data.AUTOTUNE)
test_ds = test_ds.batch(batch_size).prefetch(buffer_size=tf.data.AUTOTUNE)

print("Data preparation complete.")

Data preparation complete.


## Model selection and building

### Subtask:
Choose a suitable neural network architecture for image classification. This could be a pre-trained model like VGG, ResNet, or MobileNet, or a custom-built convolutional neural network (CNN). Build the model using a deep learning framework like TensorFlow or PyTorch.


**Reasoning**:
Build a simple convolutional neural network model using Keras Sequential API based on the subtask instructions.
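Before building the network it can help to work out each layer's output shape and parameter count by hand. The sketch below does that arithmetic for the architecture defined in the next cell, assuming the Keras defaults that the code does not override ("valid" padding, stride 1 for convolutions, pooling stride equal to the pool size). The helper functions are hypothetical and exist only for this calculation.

```python
def conv2d(shape, filters, kernel=3):
    """'valid' convolution, stride 1: returns (output_shape, parameter_count)."""
    h, w, c = shape
    params = kernel * kernel * c * filters + filters   # weights + biases
    return (h - kernel + 1, w - kernel + 1, filters), params

def max_pool(shape, pool=2):
    """Non-overlapping max pooling: no trainable parameters."""
    h, w, c = shape
    return (h // pool, w // pool, c), 0

def dense(units_in, units_out):
    """Fully connected layer: returns (output_units, parameter_count)."""
    return units_out, units_in * units_out + units_out

shape, total = (32, 32, 3), 0
for layer, kwargs in [(conv2d, {"filters": 32}), (max_pool, {}),
                      (conv2d, {"filters": 64}), (max_pool, {}),
                      (conv2d, {"filters": 64})]:
    shape, params = layer(shape, **kwargs)
    total += params
    print(f"{layer.__name__:<9} -> {shape}, {params} params")

units = shape[0] * shape[1] * shape[2]   # Flatten: 4 * 4 * 64 = 1024
for units_out in (64, 10):
    units, params = dense(units, units_out)
    total += params
    print(f"dense     -> ({units},), {params} params")

print("total parameters:", total)   # 122570
```

Most of the parameters sit in the first dense layer (65,600 of 122,570), which is typical for small CNNs that flatten a feature map before classification.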



In [None]:
from tensorflow.keras.models import Sequential
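from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential([
    # Declare the input with an explicit Input layer; passing input_shape
    # to Conv2D raises a UserWarning in recent Keras versions
    Input(shape=(32, 32, 3)),
    Conv2D(32, (3, 3), activation='relu'),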
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    Flatten(),
    Dense(64, activation='relu'),
    Dense(num_classes, activation='softmax')
])

model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

model.summary()



## Model training

### Subtask:
Train the selected model on the prepared training data. This involves feeding the data through the network, calculating the loss, and updating the model's weights using an optimization algorithm. Monitor the training process using the validation set to prevent overfitting.
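The loss being minimized here is categorical cross-entropy. A small NumPy sketch of that formula, under the assumption of one-hot targets and predicted probabilities that sum to 1, gives a useful reference point: a model that guesses uniformly over 10 classes scores ln(10) ≈ 2.303, so any epoch loss below that is better than chance. The function below is an illustration, not the Keras implementation.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean cross-entropy between one-hot targets and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0)   # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

num_classes = 10
y_true = np.eye(num_classes)[[3, 7]]     # two one-hot labels: class 3 and class 7

# A "no idea" prediction: probability 0.1 for every class.
uniform = np.full((2, num_classes), 1.0 / num_classes)

# A confident, correct prediction: 0.91 on the true class, 0.01 elsewhere.
confident = np.full((2, num_classes), 0.01)
confident[[0, 1], [3, 7]] = 0.91

chance_loss = categorical_crossentropy(y_true, uniform)
confident_loss = categorical_crossentropy(y_true, confident)

print(round(chance_loss, 4))      # 2.3026, i.e. ln(10)
print(confident_loss < chance_loss)
```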


**Reasoning**:
Train the compiled model using the fit method, providing the training and validation datasets and setting the number of epochs.
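Two quick calculations help when reading the training log: how many batches make up one epoch, and which epoch has the lowest validation loss so far. The sketch below uses the training-set size and batch size from earlier cells and the validation losses of the six complete epochs in the log. The `best_epoch` and `should_stop` helpers are hypothetical; in Keras, the `tf.keras.callbacks.EarlyStopping` callback provides patience-based stopping during `fit`.

```python
import math

train_size, batch_size = 42_000, 32
steps_per_epoch = math.ceil(train_size / batch_size)   # the last batch is partial
print(steps_per_epoch)   # 1313, matching the "1313/1313" progress counter

# Validation loss for epochs 1-6, copied from the training log in this notebook.
val_loss = [1.4425, 1.2388, 1.1683, 1.2012, 1.1257, 1.0878]

def best_epoch(losses):
    """Return the 1-based epoch with the lowest validation loss."""
    return min(range(len(losses)), key=losses.__getitem__) + 1

def should_stop(losses, patience):
    """True if the last `patience` epochs produced no new minimum."""
    epochs_since_best = len(losses) - best_epoch(losses)
    return epochs_since_best >= patience

print(best_epoch(val_loss))               # 6
print(should_stop(val_loss, patience=3))  # False: the loss is still improving
```

The brief rise at epoch 4 (1.1683 to 1.2012) is why a patience greater than 1 is usually preferred: a single worse epoch does not necessarily mean the model has started to overfit.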



In [None]:
history = model.fit(
    train_ds,
    epochs=15,
    validation_data=val_ds
)

Epoch 1/15
1313/1313 ━━━━━━━━━━━━━━━━━━━━ 75s 53ms/step - accuracy: 0.3207 - loss: 1.8499 - val_accuracy: 0.4781 - val_loss: 1.4425
Epoch 2/15
1313/1313 ━━━━━━━━━━━━━━━━━━━━ 81s 52ms/step - accuracy: 0.4881 - loss: 1.4225 - val_accuracy: 0.5574 - val_loss: 1.2388
Epoch 3/15
1313/1313 ━━━━━━━━━━━━━━━━━━━━ 81s 51ms/step - accuracy: 0.5385 - loss: 1.2872 - val_accuracy: 0.5851 - val_loss: 1.1683
Epoch 4/15
1313/1313 ━━━━━━━━━━━━━━━━━━━━ 69s 53ms/step - accuracy: 0.5652 - loss: 1.2176 - val_accuracy: 0.5766 - val_loss: 1.2012
Epoch 5/15
1313/1313 ━━━━━━━━━━━━━━━━━━━━ 69s 52ms/step - accuracy: 0.5907 - loss: 1.1564 - val_accuracy: 0.6068 - val_loss: 1.1257
Epoch 6/15
1313/1313 ━━━━━━━━━━━━━━━━━━━━ 82s 52ms/step - accuracy: 0.6040 - loss: 1.1091 - val_accuracy: 0.6270 - val_loss: 1.0878
Epoc

In [8]:
# 1. Introduction
# 2. Dataset Overview
# 3. CNN Model Architecture
# 4. Training and Results
# 5. Evaluation Metrics
