# Dog Breed Prediction

Build, train, and test a convolutional neural network that identifies the breed of a dog in a supplied image. (Supervised ML - multi-class classification problem)

This model can be used by NGOs working to save animals and for educational purposes.

## Steps 

1. Load the data from Kaggle
2. Load the labels CSV, which contains the image ID and breed
3. Check the breed counts
4. One-hot encode the breed (target) column of the labels
5. Load the images, convert them to arrays, and normalize them
6. Check the shape and size of the X and Y data
7. Build the network model architecture
8. Split the data, fit it to the model, and create an accuracy plot
9. Evaluate the model's accuracy score
10. Use the model for prediction

In [1]:
# Prerequisite: install the Kaggle client packages first with: pip install kaggle kaggle-api

In [2]:
import os
import opendatasets as od
from zipfile import ZipFile
import kaggle
from kaggle.api.kaggle_api_extended import KaggleApi
os.chmod(os.path.expanduser('C:\\Users\\anfes\\My Drive\\Projects\\Data_Projects\\Dog_Breed\\.kaggle\\kaggle.json'), 0o600)  # Note: 0o600 is the permission mode in octal (owner read/write only)

# Download the Dataset 

In [3]:
# Import the required libraries
from kaggle.api.kaggle_api_extended import KaggleApi

# Instantiate the Kaggle API
api = KaggleApi()

# Authenticate the API using your configuration file (kaggle.json)
api.authenticate()

# Download the dataset with "dataset_download_files"

dataset_name = "catherinehorng/dogbreedidfromcomp"
destination_path = "dog_dataset"  # Change this to the desired path

if not os.path.exists(destination_path):
    api.dataset_download_files(dataset_name, path=destination_path, unzip=False)
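Because the download call passes `unzip=False`, only a zip archive lands in `dog_dataset`, while the later cells read `dog_dataset/labels.csv` and `dog_dataset/train/`. A minimal sketch of the missing extraction step follows. The helper name `extract_archives` is hypothetical, and the sketch assumes the archive sits directly inside the destination folder; passing `unzip=True` to the download call is an alternative.

```python
import glob
import os
from zipfile import ZipFile


def extract_archives(folder):
    """Extract every .zip file found directly inside `folder` into `folder`."""
    extracted = []
    for archive_path in sorted(glob.glob(os.path.join(folder, "*.zip"))):
        with ZipFile(archive_path) as archive:
            archive.extractall(folder)
        extracted.append(archive_path)
    return extracted


# Only touch the folder if it exists, so the sketch is safe to run anywhere.
if os.path.isdir("dog_dataset"):
    print(extract_archives("dog_dataset"))
```

The function returns the list of archives it unpacked, which makes it easy to confirm that the download actually produced a file.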



# Importing libraries

In [4]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from tqdm import tqdm   # progress bar for loops and long-running operations
from keras.preprocessing import image   # utilities for loading and preprocessing images before feeding them to a model
from sklearn.preprocessing import label_binarize    # converts categorical labels into a binary (one-hot) matrix
from sklearn.model_selection import train_test_split    # splits a dataset into training and testing subsets
from keras.models import Sequential # builds a neural network as a linear stack of layers
from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPool2D # layers: fully connected, dropout regularization, flattening, 2D convolution, and max pooling
from keras.optimizers import Adam   # Adam optimizer, used to update the model parameters during training

## Reading the labels
Load the labels data into a DataFrame and view it.

In [5]:
labels_all = pd.read_csv("dog_dataset/labels.csv")
print(labels_all.shape)
labels_all.head()

(10222, 2)


Unnamed: 0,id,breed
0,000bec180eb18c7604dcecc8fe0dba07,boston_bull
1,001513dfcb2ffafc82cccf4d8bbaba97,dingo
2,001cdf01b096e06d78e9e5112d419397,pekinese
3,00214f311d5d2247d5dfe4fe24b2303d,bluetick
4,0021f9ceb3235effd7fcde7f7538ed62,golden_retriever


In [6]:
breeds_all = labels_all["breed"]
breed_count = breeds_all.value_counts()
breed_count.head()

scottish_deerhound      126
maltese_dog             117
afghan_hound            116
entlebucher             115
bernese_mountain_dog    114
Name: breed, dtype: int64

## Limiting the model to five breeds because of limited computational power

In [7]:
CLASS_NAMES = ['scottish_deerhound','maltese_dog','afghan_hound','entlebucher','bernese_mountain_dog']
labels = labels_all[(labels_all['breed'].isin(CLASS_NAMES))]

labels.head()

Unnamed: 0,id,breed
9,0042188c895a2f14ef64a918ed9c7b64,scottish_deerhound
12,00693b8bc2470375cc744a6391d397ec,maltese_dog
79,01e787576c003930f96c966f9c3e1d44,scottish_deerhound
80,01ee3c7ff9bcaba9874183135877670e,entlebucher
88,021b5a49189665c0442c19b5b33e8cf1,entlebucher
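The class list above is typed by hand from the `value_counts()` output. As a sketch of an alternative, the same list can be derived programmatically, which keeps it in sync if the number of classes changes. The toy DataFrame and the `TOP_N` name are assumptions for illustration; the real frame is `labels_all`.

```python
import pandas as pd

# Toy stand-in for labels_all; the real frame is read from labels.csv.
toy_labels = pd.DataFrame({
    "id": ["a", "b", "c", "d", "e", "f"],
    "breed": ["dingo", "pug", "dingo", "beagle", "dingo", "pug"],
})

TOP_N = 2  # hypothetical; the notebook keeps the five most frequent breeds

# value_counts() sorts by frequency in descending order,
# so head(TOP_N) returns the most common breeds.
top_breeds = toy_labels["breed"].value_counts().head(TOP_N).index.tolist()

# Keep only the rows whose breed is in the selected classes.
subset = toy_labels[toy_labels["breed"].isin(top_breeds)]

print(top_breeds)
print(len(subset))
```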


In [8]:
# Reset the index:
labels = labels.reset_index()
labels.head()

Unnamed: 0,index,id,breed
0,9,0042188c895a2f14ef64a918ed9c7b64,scottish_deerhound
1,12,00693b8bc2470375cc744a6391d397ec,maltese_dog
2,79,01e787576c003930f96c966f9c3e1d44,scottish_deerhound
3,80,01ee3c7ff9bcaba9874183135877670e,entlebucher
4,88,021b5a49189665c0442c19b5b33e8cf1,entlebucher
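Resetting the index matters because the image-loading loop later looks rows up as `labels['id'][i]` with `i` running from 0, which only works when the index is contiguous. By default `reset_index()` keeps the old index as an extra `index` column, as the output above shows. The sketch below, on a toy frame (an assumption, not the notebook's data), contrasts that with `drop=True`, which discards the old index.

```python
import pandas as pd

# Toy filtered frame with a non-contiguous index, like labels after isin().
filtered = pd.DataFrame(
    {"id": ["x1", "x2", "x3"], "breed": ["dingo", "pug", "dingo"]},
    index=[9, 12, 79],
)

kept = filtered.reset_index()              # old index preserved as an "index" column
dropped = filtered.reset_index(drop=True)  # old index discarded

print(list(kept.columns))
print(list(dropped.columns))
print(dropped["id"][1])  # label-based lookup now works with 0..n-1
```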


## One-hot encoding the target, reading the images, converting them to NumPy arrays, and normalizing them



In [13]:
# Create a NumPy array of zeros
X_data = np.zeros((len(labels),224,224,3), dtype='float32') # len(labels) images, each 224x224 pixels with 3 color channels (RGB)


# One-hot encoding
# label_binarize turns the 'breed' column into a binary matrix: one row per image, one column per class in CLASS_NAMES order.
Y_data = label_binarize(labels['breed'], classes = CLASS_NAMES)

print(Y_data[:5])


[[1 0 0 0 0]
 [0 1 0 0 0]
 [1 0 0 0 0]
 [0 0 0 1 0]
 [0 0 0 1 0]]
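To make the encoding concrete, here is a NumPy-only sketch of what the multi-class one-hot matrix contains and how to map a row back to a breed name. It is an illustration of the idea, not the scikit-learn implementation, and the three-class list is a shortened, hypothetical version of `CLASS_NAMES`.

```python
import numpy as np

# Shortened, illustrative class list; column order follows this list.
classes = ["scottish_deerhound", "maltese_dog", "afghan_hound"]
breeds = ["maltese_dog", "scottish_deerhound", "afghan_hound", "maltese_dog"]

# Compare every label against every class: entry (i, j) is 1 when
# breeds[i] == classes[j], which yields one row per image.
one_hot = (np.array(breeds)[:, None] == np.array(classes)[None, :]).astype(int)

# argmax recovers the column index of the single 1 in each row.
decoded = [classes[j] for j in one_hot.argmax(axis=1)]

print(one_hot)
print(decoded)
```

The same `argmax` step is what turns a row of predicted class probabilities back into a breed name.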


In [None]:

# Read each image, convert it to a NumPy array, and normalize it
for i in tqdm(range(len(labels))): # iterate over every row of the labels DataFrame
    img = image.load_img('dog_dataset/train/%s.jpg' % labels['id'][i],target_size=(224,224)) # load the image from dog_dataset/train, resized to 224x224
    img = image.img_to_array(img) # convert the loaded image to an array of shape (224, 224, 3)
    x = np.expand_dims(img.copy(), axis=0) # add a leading batch axis: (1, 224, 224, 3)
    X_data[i] = x/255 # scale pixel values to the range [0, 1] and store the result at position i


# Print the shape and size of the image array and of the one-hot encoded labels
print('\nTrain Images shape: ',X_data.shape,' size: {:,}'.format(X_data.size))
print('One-hot encoded output shape: ', Y_data.shape, ' size: {:,}'.format(Y_data.size))
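Restricting the data to five breeds keeps `X_data` small enough to hold in memory. The rough estimate below is a sketch that assumes the filtered frame contains exactly the five breed counts printed earlier; it also shows the effect of dividing by 255 on a few sample pixel values.

```python
import numpy as np

# Counts of the five selected breeds, taken from the value_counts() output.
breed_counts = [126, 117, 116, 115, 114]
n_images = sum(breed_counts)

# Same layout as X_data: (images, height, width, channels) in float32.
shape = (n_images, 224, 224, 3)
n_values = int(np.prod(shape))
n_bytes = n_values * np.dtype("float32").itemsize

print(n_images, "images")
print("{:,} values, about {:.0f} MiB".format(n_values, n_bytes / 2**20))

# Dividing 8-bit pixel values by 255 maps them onto [0, 1].
pixels = np.array([0, 128, 255], dtype="uint8")
scaled = pixels.astype("float32") / 255
print(scaled)
```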