# End-to-End Image Classification Model

This notebook builds an end-to-end dog breed classification model using TensorFlow and TensorFlow Hub.

## 1. Problem Statement

Identifying Dog Breed

The goal is to develop a model that can identify the breed of a dog from a photograph.

## 2. Data

The data is collected from Kaggle: [Dog Breed Identification](https://www.kaggle.com/c/dog-breed-identification/data).

## 3. Model Evaluation

The model is evaluated on a submission file that contains, for each test image, a predicted probability for every dog breed.
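
Because the evaluation works on per-breed probabilities rather than a single predicted label, a natural way to score them is multi-class log loss. The sketch below assumes that metric; the function name `multiclass_log_loss` and the toy probabilities are illustrative and not part of the original notebook.

```python
import numpy as np

def multiclass_log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-probability assigned to the true class.

    y_true: integer class indices, shape (n_samples,)
    y_prob: predicted probabilities, shape (n_samples, n_classes)
    """
    y_true = np.asarray(y_true)
    # Clip to avoid log(0) when a prediction is exactly 0 or 1
    y_prob = np.clip(np.asarray(y_prob, dtype=np.float64), eps, 1 - eps)
    # Renormalize so each row still sums to 1 after clipping
    y_prob = y_prob / y_prob.sum(axis=1, keepdims=True)
    # Pick the probability assigned to the correct class of each sample
    true_class_prob = y_prob[np.arange(len(y_true)), y_true]
    return float(-np.mean(np.log(true_class_prob)))

# Toy example: 2 images, 3 hypothetical breeds
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.1, 0.8]])
print(multiclass_log_loss([0, 2], probs))
```

Lower values are better. Under this metric a confident wrong prediction is penalized much more heavily than an uncertain one.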

## 4. Features

Some information about our dataset:
- We are dealing with unstructured data (images), so we will likely use deep learning/transfer learning techniques.
- The dataset contains over 10,000 images in the training set and over 10,000 images in the test set. The test images are unlabeled because their breeds are what we need to predict.

# Mounting Google Drive & Importing necessary libraries
1. Mount Google Drive
2. Import libraries
3. Check for a GPU
4. Initialize the GPU
5. Warm up the GPU and CPU

## Mount Drive

In [None]:
# Mount Google Drive to access data (if running on Google Colab)
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


## Import Libraries

In [107]:
# Import necessary libraries
import pandas as pd
import numpy as np
import tensorflow_hub as hub
import tensorflow as tf


# Check versions
print(f"TensorFlow Version: {tf.__version__}")
print(f"TensorFlow Hub Version: {hub.__version__}")

TensorFlow Version: 2.15.0
TensorFlow Hub Version: 0.16.1


## Checking GPU

In [None]:
print("GPU", "Available" if tf.config.list_physical_devices("GPU") else "Not Available")

## Init GPU

In [None]:
device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
  raise SystemError('GPU device not found')
print('Found GPU at: {}'.format(device_name))

## Device warmup

In [None]:
import timeit

device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
  print(
      '\n\nThis error most likely means that this notebook is not '
      'configured to use a GPU.  Change this in Notebook Settings via the '
      'command palette (cmd/ctrl-shift-P) or the Edit menu.\n\n')
  raise SystemError('GPU device not found')

def cpu():
  with tf.device('/cpu:0'):
    random_image_cpu = tf.random.normal((100, 100, 100, 3))
    net_cpu = tf.keras.layers.Conv2D(32, 7)(random_image_cpu)
    return tf.math.reduce_sum(net_cpu)

def gpu():
  with tf.device('/device:GPU:0'):
    random_image_gpu = tf.random.normal((100, 100, 100, 3))
    net_gpu = tf.keras.layers.Conv2D(32, 7)(random_image_gpu)
    return tf.math.reduce_sum(net_gpu)

# We run each op once to warm up; see: https://stackoverflow.com/a/45067900
cpu()
gpu()

# Run the op several times.
print('Time (s) to convolve 32x7x7x3 filter over random 100x100x100x3 images '
      '(batch x height x width x channel). Sum of ten runs.')
print('CPU (s):')
cpu_time = timeit.timeit('cpu()', number=10, setup="from __main__ import cpu")
print(cpu_time)
print('GPU (s):')
gpu_time = timeit.timeit('gpu()', number=10, setup="from __main__ import gpu")
print(gpu_time)
print('GPU speedup over CPU: {}x'.format(int(cpu_time/gpu_time)))

# Data Loading and Exploration
1. Loading Image Data
2. Displaying Sample Images
3. Exploring Image Metadata

## Loading & Exploring Image Data

In [None]:
# Load the dataset
df_labels = pd.read_csv('drive/MyDrive/Deep_Learning/labels.csv')

In [None]:
# Display dataset statistics
print(df_labels.describe())

In [None]:
# Display the first few rows of the dataset
print(df_labels.head())

In [None]:
# Display the number of images per breed
breed_counts = df_labels['breed'].value_counts()
print(breed_counts)

In [None]:
# Median number of images per breed
breed_counts.median()

In [None]:
# Plot the number of images per breed
breed_counts.plot.bar(figsize=(20,10))
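
The per-breed counts above can be condensed into a few numbers that show how balanced the classes are. This is a minimal sketch on a hypothetical miniature table (`toy_labels`, with made-up ids and breeds) standing in for `labels.csv`; the `summary` keys are illustrative names.

```python
import pandas as pd

# Hypothetical miniature stand-in for labels.csv (ids and breeds are made up)
toy_labels = pd.DataFrame({
    "id": ["a1", "b2", "c3", "d4", "e5", "f6"],
    "breed": ["beagle", "pug", "beagle", "boxer", "beagle", "pug"],
})

# Number of images per breed, most frequent first
counts = toy_labels["breed"].value_counts()

summary = {
    "num_breeds": int(counts.size),
    "min_images": int(counts.min()),
    "median_images": float(counts.median()),
    "max_images": int(counts.max()),
    # Ratio of the largest to the smallest class; 1.0 means perfectly balanced
    "imbalance_ratio": float(counts.max() / counts.min()),
}
print(summary)
```

A ratio close to 1 suggests plain accuracy is a reasonable metric. A large ratio may call for class weighting or a stratified split.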

## Displaying Sample Images

In [None]:
# Displaying a sample image
from IPython.display import Image, display

# Function to display a random image
def display_random_image(df, image_folder):
    random_index = np.random.randint(len(df))
    image_id = df["id"][random_index]
    breed = df["breed"][random_index]
    image_path = f"{image_folder}/{image_id}.jpg"
    display(Image(filename=image_path))
    print(f"Image ID: {image_id}, Breed: {breed}")

# Display a random image from the dataset
display_random_image(df_labels, "drive/MyDrive/Deep_Learning/train")

## Exploring Image Metadata

In [None]:
# Display specific image by index
index = 8000
image_id = df_labels["id"][index]
breed = df_labels["breed"][index]
image_path = f"drive/MyDrive/Deep_Learning/train/{image_id}.jpg"
display(Image(filename=image_path))
print(f"Image ID: {image_id}, Breed: {breed}")


# Data Preparation
1. Extracting Labels
2. Checking Data Integrity
3. Converting Labels to Boolean Values
4. Preprocessing Images
5. Creating Data Pipeline

## Extracting Labels

In [None]:
# Convert labels to numpy array
labels = df_labels["breed"].to_numpy()
print(labels)

## Checking Data Integrity

In [None]:
# Ensure the number of labels matches the number of filenames
if len(labels) == len(df_labels["id"]):
    print("Number of labels matches number of filenames")
else:
    print("Numbers do not match")

# Check the length of labels
print(f"Number of labels: {len(labels)}")
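
Comparing the label count with the id count confirms the table is internally consistent, but it does not confirm that every image file is actually on disk. A possible complementary check is sketched below. The helper name `find_missing_images` is hypothetical, and the demo runs against a temporary folder rather than the real training directory.

```python
import tempfile
from pathlib import Path

def find_missing_images(image_ids, image_folder, extension=".jpg"):
    """Return the ids whose image file is not present in image_folder."""
    folder = Path(image_folder)
    return [image_id for image_id in image_ids
            if not (folder / f"{image_id}{extension}").is_file()]

# Demo on a throwaway folder containing one of two expected files
with tempfile.TemporaryDirectory() as tmp_dir:
    (Path(tmp_dir) / "img_a.jpg").touch()
    missing = find_missing_images(["img_a", "img_b"], tmp_dir)

print(missing)  # ['img_b']
```

In the notebook this could be called as `find_missing_images(df_labels["id"], "drive/MyDrive/Deep_Learning/train")`, where an empty list means every labeled image was found.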

## Converting Labels to Boolean Values

In [None]:
# Get unique breed names
unique_breeds = np.unique(labels)
print(f"Unique breeds: {unique_breeds}")

# Convert labels to boolean values
bool_labels = [label == unique_breeds for label in labels]
print(f"Boolean labels: {bool_labels[:10]}")

# Verify the length of boolean labels
print(f"Number of boolean labels: {len(bool_labels)}")

### Checking an example boolean label and its occurrence

In [None]:
print(labels[0])  # original label
print(np.where(unique_breeds == labels[0]))  # index of the label in unique_breeds
print(bool_labels[0].argmax())  # index of the True entry in the boolean array
print(bool_labels[0].astype(int))  # 1 at the label's position, 0 elsewhere
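
The same boolean encoding can be built for all labels at once with NumPy broadcasting, and decoded back to breed names with `argmax`. The sketch below uses a small hypothetical label array (`labels_demo`) so the round trip is easy to follow.

```python
import numpy as np

labels_demo = np.array(["pug", "beagle", "pug", "boxer"])  # hypothetical labels
breeds_demo = np.unique(labels_demo)  # sorted unique values: beagle, boxer, pug

# Broadcasting compares every label against every breed in one step,
# giving a (num_labels, num_breeds) boolean matrix with one True per row
one_hot = labels_demo[:, None] == breeds_demo

# argmax recovers the class index, which maps back to the breed name
decoded = breeds_demo[one_hot.argmax(axis=1)]

print(one_hot.astype(int))
print(decoded)
```

This produces a single 2-D array instead of a Python list of 1-D arrays, which can be convenient when slicing labels for a train/validation split.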

## Preprocessing Images
1. Loading and Resizing Images
2. Normalizing Images

In [None]:
# Define function to preprocess images
def preprocess_image(image_path):
    # Read the image file
    image = tf.io.read_file(image_path)
    # Decode the image to a tensor
    image = tf.image.decode_jpeg(image, channels=3)
    # Resize the image to the required size
    image = tf.image.resize(image, [224, 224])
    # Normalize the image to the range [0, 1]
    image = image / 255.0
    return image

# Test the function with a sample image
sample_image_path = f"drive/MyDrive/Deep_Learning/train/{df_labels['id'][0]}.jpg"
sample_image = preprocess_image(sample_image_path)
print(sample_image.shape)

## Creating Data Pipeline
1. Creating TensorFlow Dataset
2. Batching and Prefetching Data

In [None]:
# Function to create a TensorFlow dataset from image paths and labels
def create_dataset(image_paths, labels, batch_size=32):
    # Create a dataset of image paths
    dataset = tf.data.Dataset.from_tensor_slices((image_paths, labels))
    
    # Function to load and preprocess images
    def load_and_preprocess_image(path, label):
        image = preprocess_image(path)
        return image, label
    
    # Shuffle the (path, label) pairs before decoding so the shuffle buffer
    # holds short path strings rather than fully decoded images
    dataset = dataset.shuffle(buffer_size=len(image_paths))
    # Map the load_and_preprocess_image function to the dataset
    dataset = dataset.map(load_and_preprocess_image, num_parallel_calls=tf.data.experimental.AUTOTUNE)
    # Batch and prefetch the dataset
    dataset = dataset.batch(batch_size).prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
    
    return dataset

# Creating image paths
image_paths = [f"drive/MyDrive/Deep_Learning/train/{image_id}.jpg" for image_id in df_labels['id']]

# Creating the dataset
train_dataset = create_dataset(image_paths, bool_labels)
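
Batching groups consecutive examples, and by default `batch` keeps a smaller final batch when the dataset size is not a multiple of the batch size. The arithmetic is sketched below in plain Python. The helper `batch_sizes` and the example count of 100 are illustrative only.

```python
import math

def batch_sizes(num_examples, batch_size=32):
    """Sizes of the batches produced when the final partial batch is kept."""
    full, remainder = divmod(num_examples, batch_size)
    return [batch_size] * full + ([remainder] if remainder else [])

sizes = batch_sizes(100, batch_size=32)
print(sizes)       # [32, 32, 32, 4]
print(len(sizes))  # 4, i.e. ceil(100 / 32)
```

The number of batches per epoch is therefore `math.ceil(num_examples / batch_size)`, which is the number of steps a training loop runs over the dataset.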

In [None]:
# Function to create batches of data
def create_data_batches(image_paths, labels, batch_size=32, valid_data=False):
    if valid_data:
        print("Creating validation data batches...")
    else:
        print("Creating training data batches...")
    
    # Create a dataset from the image paths and labels
    data = tf.data.Dataset.from_tensor_slices((image_paths, labels))
    
    # Map the preprocessing function to the dataset
    data = data.map(get_image_details, num_parallel_calls=tf.data.experimental.AUTOTUNE)
    
    # Batch and prefetch the dataset
    data = data.batch(batch_size).prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
    
    return data

In [None]:
# Function to return a tuple of (image, label)
def get_image_details(image_path, label):
    image = preprocess_image(image_path)
    return image, label

In [None]:
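# Build the full list of image paths
image_paths = [f"drive/MyDrive/Deep_Learning/train/{image_id}.jpg" for image_id in df_labels['id']]

# Hold out the last 20% of the images for validation so the two sets do not overlap
# (the 80/20 ratio is an assumption; adjust as needed)
split_index = int(0.8 * len(image_paths))

# Create training and validation data batches
train_data = create_data_batches(image_paths[:split_index], bool_labels[:split_index])
val_data = create_data_batches(image_paths[split_index:], bool_labels[split_index:], valid_data=True)

# Select one batch for visualization
one_batch_train_data = train_data.take(1)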

## Visualizing Data Batches

In [None]:
import matplotlib.pyplot as plt

# Function to visualize data batches
def visualize_data_batches(data_batch):
    # Iterate over the batch
    for images, labels in data_batch:
        plt.figure(figsize=(10, 10))
        # The 4x4 grid has 16 slots, so plot at most the first 16 images of the batch
        for i in range(min(len(images), 16)):
            plt.subplot(4, 4, i + 1)
            plt.imshow(images[i])
            plt.title(unique_breeds[np.argmax(labels[i])])
            plt.axis("off")
        plt.show()

# Visualize one batch of training data
visualize_data_batches(one_batch_train_data)
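
A subplot grid needs at least as many cells as the number of images drawn into it, because `plt.subplot(rows, cols, index)` rejects an index above `rows * cols`. One way to size the grid from the batch length is sketched below. The helper `grid_shape` is a hypothetical name, not part of the notebook.

```python
import math

def grid_shape(num_images, num_cols=4):
    """Return (rows, cols) for a grid that can hold num_images plots."""
    num_rows = math.ceil(num_images / num_cols)
    return num_rows, num_cols

print(grid_shape(32))  # (8, 4)
print(grid_shape(16))  # (4, 4)
print(grid_shape(5))   # (2, 4)
```

The result can be passed as `plt.subplot(rows, cols, i + 1)`, with the figure height scaled by the number of rows so the images stay legible.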