<a href="https://colab.research.google.com/github/artms-18/ML-Projects/blob/main/Pets_Object_Localization.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Using a ResNet50 feature extractor to classify and locate objects within the Oxford-IIIT Pet Dataset

In this notebook, we will do this in the following steps:
1. Importing modules
2. Preprocessing the data
3. Defining helper functions to view the data later
4. Choosing a runtime strategy
5. Creating the model from a pretrained feature extractor, excluding the top so that we can add our own Dense layers
6. Compiling and fitting the model
7. Visualizing the results and looking at accuracy as well as loss in TensorBoard

## Importing Modules

In [46]:
import glob
import os
import csv
import cv2
import zipfile

import xml.etree.ElementTree as ET
import numpy as np
import tensorflow as tf

from PIL import Image
from PIL import ImageColor
from PIL import ImageDraw
from PIL import ImageFont
from PIL import ImageOps

## Getting the Data

In [2]:
!pip install -U -q kaggle
!mkdir -p ~/.kaggle

In [3]:
from google.colab import files
uploaded = files.upload() #assigning the result keeps the notebook from echoing the file contents (which include the API key)

Saving kaggle.json to kaggle.json

In [4]:
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json
!kaggle datasets download -d devdgohil/the-oxfordiiit-pet-dataset

Downloading the-oxfordiiit-pet-dataset.zip to /content
 99% 775M/780M [00:24<00:00, 53.6MB/s]
100% 780M/780M [00:24<00:00, 32.8MB/s]


In [9]:
with zipfile.ZipFile('/content/the-oxfordiiit-pet-dataset.zip', 'r') as zip_file:
  zip_file.extractall('/content/the-oxfordiiit-pet-dataset')

## Preprocessing the Data

Right now, the annotations are stored as XML files, one per image. To train the model, we first need to extract the image size, bounding box, and class from each file into a Python list, which we will later convert to tensors.
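For orientation, the sketch below parses a single annotation from an in-memory string. The XML is a hand-written sample that follows the Pascal VOC style layout implied by the tags queried in this notebook (`filename`, `size`, `object/bndbox`); the helper name `parse_annotation` is hypothetical and the values are for illustration only.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in a Pascal VOC style layout; values are illustrative only.
SAMPLE_XML = """
<annotation>
  <filename>yorkshire_terrier_177.jpg</filename>
  <size><width>375</width><height>500</height><depth>3</depth></size>
  <object>
    <name>dog</name>
    <bndbox><xmin>76</xmin><ymin>89</ymin><xmax>287</xmax><ymax>210</ymax></bndbox>
  </object>
</annotation>
"""

def parse_annotation(xml_text):
  """Return (filename, height, width, xmin, ymin, xmax, ymax) from one annotation."""
  root = ET.fromstring(xml_text)
  # Read the four box corners in a fixed order
  box = [int(root.findtext(f"./object/bndbox/{tag}")) for tag in ("xmin", "ymin", "xmax", "ymax")]
  return (root.findtext("./filename"),
          int(root.findtext("./size/height")),
          int(root.findtext("./size/width")),
          *box)

record = parse_annotation(SAMPLE_XML)
print(record)
```

The cell that follows applies the same lookups to every file on disk and also derives the class name from the filename.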

In [10]:
XML_PATH = '/content/the-oxfordiiit-pet-dataset/annotations/annotations/xmls'
SPLIT_RATIO = 0.8 #while splitting into training and validation

In [11]:
class_names = {} #storing the name pertaining to each class of images
k = 0 #classes will be label encoded
output = [] #will hold one tuple per image: (path, height, width, xmin, ymin, xmax, ymax, class_name, class_names[class_name])

xml_files = glob.glob("{}/*.xml".format(XML_PATH))
for xml_file in xml_files:
  tree = ET.parse(xml_file)
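  #the image filename is joined onto the XML directory here; this path is only used for its basename (class name, matching images later)
  path = os.path.join(XML_PATH, tree.findtext("./filename"))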

  height = int(tree.findtext("./size/height"))
  width = int(tree.findtext("./size/width"))
  xmin = int(tree.findtext("./object/bndbox/xmin"))
  ymin = int(tree.findtext("./object/bndbox/ymin"))
  xmax = int(tree.findtext("./object/bndbox/xmax"))
  ymax = int(tree.findtext("./object/bndbox/ymax"))

  basename = os.path.basename(path)
  basename = os.path.splitext(basename)[0]
  class_name = basename[:basename.rfind('_')].lower() #gets the lowercased name of pet (getting rid of jpg number)
  if class_name not in class_names:
    class_names[class_name] = k
    k+=1

  output.append((path, height, width, xmin, ymin, xmax, ymax, class_name, class_names[class_name]))

output.sort(key = lambda tup: tup[-1]) #sorting by class


In [12]:
#taking a look at the number of images in each class

lengths = []
i = 0
last = 0
for j, row in enumerate(output):
  if last == row[-1]: #'output' is sorted by class id, so rows of the same class are adjacent
    i += 1 #keep counting until the class id changes
  else:
    print(f"class {output[j-1][-2]}: {i} images") #j-1 is the last row of the class we just finished
    lengths.append(i)
    i = 1
    last = row[-1]

print(f"class {output[-1][-2]}: {i} images") #the final class is never reported inside the loop
lengths.append(i)


class yorkshire_terrier: 100 images
class sphynx: 100 images
class saint_bernard: 99 images
class great_pyrenees: 100 images
class japanese_chin: 100 images
class english_setter: 100 images
class chihuahua: 100 images
class staffordshire_bull_terrier: 100 images
class german_shorthaired: 100 images
class persian: 100 images
class abyssinian: 99 images
class boxer: 100 images
class newfoundland: 100 images
class american_pit_bull_terrier: 100 images
class basset_hound: 100 images
class keeshond: 100 images
class havanese: 100 images
class ragdoll: 99 images
class bengal: 98 images
class birman: 100 images
class english_cocker_spaniel: 100 images
class leonberger: 100 images
class shiba_inu: 100 images
class miniature_pinscher: 100 images
class wheaten_terrier: 100 images
class egyptian_mau: 92 images
class beagle: 100 images
class british_shorthair: 100 images
class bombay: 100 images
class american_bulldog: 100 images
class pomeranian: 100 images
class maine_coon: 100 images
class samo
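
The same per-class tally can also be computed without relying on the sort order. A minimal sketch, assuming rows shaped like the tuples in `output` (class name in the second-to-last position); the `rows` list here is a small made-up stand-in, not the real data:

```python
from collections import Counter

# Made-up stand-in rows; only the last two fields (class_name, class_id) matter here.
rows = [
  ("a.jpg", 0, 0, 0, 0, 0, 0, "sphynx", 1),
  ("b.jpg", 0, 0, 0, 0, 0, 0, "beagle", 0),
  ("c.jpg", 0, 0, 0, 0, 0, 0, "sphynx", 1),
]

# Count how many rows carry each class name
counts = Counter(row[-2] for row in rows)
for class_name, n in counts.items():
  print(f"class {class_name}: {n} images")
```

A `Counter` does not yield the `lengths` list in class-id order by itself, so the loop above is still what the split cell depends on.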

## Splitting into training and validation

In [13]:
training_data = []
validation_data = []
s = 0

for c in lengths:
  for i in range(c):
    row = output[s]
    s += 1 #advance first, so that a skipped row does not stall the index
    path, height, width, xmin, ymin, xmax, ymax, class_name, class_id = row
    if xmin >= xmax or ymin >= ymax or xmax > width or ymax > height or xmin < 0 or ymin < 0:
      print(f"Warning: {path} contains invalid box. Skipped...")
      continue

    if i <= c * SPLIT_RATIO: #note: '<=' puts 81 of every 100 images into training
      training_data.append(row)

    else:
      validation_data.append(row)

print(len(training_data))
print(len(validation_data))

2984
702


('/content/the-oxfordiiit-pet-dataset/annotations/annotations/xmls/yorkshire_terrier_177.jpg',
 500,
 375,
 76,
 89,
 287,
 210,
 'yorkshire_terrier',
 0)
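
As an alternative to tracking a running index, the split can be done by grouping rows by class first. This is only a sketch under the assumption that the class id is the last field of each row; `split_by_class` is a hypothetical helper and the rows are made up:

```python
from collections import defaultdict

def split_by_class(rows, ratio=0.8):
  """Split rows into (training, validation), applying the ratio within each class."""
  by_class = defaultdict(list)
  for row in rows:
    by_class[row[-1]].append(row)  # group on the integer class id

  training, validation = [], []
  for class_rows in by_class.values():
    cut = int(len(class_rows) * ratio)  # number of rows that go to training
    training.extend(class_rows[:cut])
    validation.extend(class_rows[cut:])
  return training, validation

# Made-up rows: 10 of class 0 and 5 of class 1.
rows = [(f"img_{i}.jpg", "a", 0) for i in range(10)] + [(f"img_{i}.jpg", "b", 1) for i in range(5)]
train, val = split_by_class(rows)
print(len(train), len(val))
```

This version uses a strict cut, so a class of 100 images would contribute exactly 80 to training, one fewer than the `<=` comparison in the cell above.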

## Getting Images

In [19]:
IMG_PATH = '/content/the-oxfordiiit-pet-dataset/images/images'
IMG_PATHS = []
img_files = glob.glob("{}/*jpg".format(IMG_PATH)) #getting all the jpg files

In [24]:
img_files[0].split('/')[-1].split('.')[-2]

'great_pyrenees_59'
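
The chained `split` calls above can be written more readably with `pathlib`. A small sketch; `stem_and_class` is a hypothetical helper that also derives the class name the same way the annotation cell does:

```python
from pathlib import Path

def stem_and_class(path):
  """Return (file stem, lowercased class name) for a path like '.../great_pyrenees_59.jpg'."""
  stem = Path(path).stem                        # file name without directory or extension
  class_name = stem[:stem.rfind('_')].lower()   # drop the trailing '_<number>'
  return stem, class_name

result = stem_and_class('/content/the-oxfordiiit-pet-dataset/images/images/great_pyrenees_59.jpg')
print(result)
```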

In [33]:
def get_image_and_label(img_files, labels):

  """

    Args: 2 lists - one containing the img file paths and the other containing the preprocessed labels
    Returns: a list of (img path, label) tuples, one for each image that has a matching label

  """

  def stem(path):
    return os.path.splitext(os.path.basename(path))[0] #file name without directory or extension

  labels_by_stem = {}
  for label in labels:
    labels_by_stem.setdefault(stem(label[0]), label) #keep the first label per name; a dict lookup avoids the nested loop

  images_and_labels = []

  for img_file in img_files:
    label = labels_by_stem.get(stem(img_file))
    if label is not None:
      images_and_labels.append((img_file, label))

  return images_and_labels


In [34]:
ims_and_labels_training = get_image_and_label(img_files, training_data)
img_and_labels_validation = get_image_and_label(img_files, validation_data)

In [53]:
ims_and_labels_training[0]

('/content/the-oxfordiiit-pet-dataset/images/images/Sphynx_117.jpg',
 ('/content/the-oxfordiiit-pet-dataset/annotations/annotations/xmls/Sphynx_117.jpg',
  500,
  334,
  143,
  154,
  237,
  250,
  'sphynx',
  1))

In [49]:
temp_image = img_files[0]
temp_image = cv2.imread(temp_image)
temp_image = cv2.cvtColor(temp_image, cv2.COLOR_BGR2RGB)
temp_image = cv2.resize(temp_image, (256,256))
image_tensor = tf.convert_to_tensor(temp_image, dtype = tf.float32)
image_tensor = tf.expand_dims(image_tensor, 0)


In [55]:
#preprocessing functions

def convert_to_tensor(img_path):

  """

    Args: filepath to an image
    Returns: a resized RGB image with a batch dimension in the form of a tensor of dtype tf.float32

  """

  img = cv2.imread(img_path)
  img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) #OpenCV loads images as BGR
  img = cv2.resize(img, (256,256))
  image_tensor = tf.convert_to_tensor(img, dtype = tf.float32)
  image_tensor = tf.expand_dims(image_tensor, 0)
  return image_tensor

def preprocess_label(label):

  """
    Args: A label tuple with values (path, height, width, xmin, ymin, xmax, ymax, class_name, class_names[class_name])
    Returns: A tuple with values (class_names[class_name], [xmin, ymin, xmax, ymax])

    **NOTE** the class_names[class_name] will be one-hot encoded

  """

  path, height, width, xmin, ymin, xmax, ymax, class_name, class_label = label

  label = tf.one_hot(class_label, 100) #the depth must match the model's classification layer; the dataset itself has 37 breeds
  b_box = [xmin, ymin, xmax, ymax]

  return (label, b_box)

def preprocessing(image_and_label):

  """

    Args: a tuple of (image path, label)
    Returns: a tuple containing the preprocessed image and label

    TODO: decide how the label should be formatted
  
  """

  image, label = image_and_label
  preprocessed_image = convert_to_tensor(image)
  preprocessed_label = preprocess_label(label)

  return (preprocessed_image, preprocessed_label)

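def create_datasets(images_and_labels):

  images = []
  class_labels = []
  b_boxes = []

  for image_and_label in images_and_labels:
    image, (class_label, b_box) = preprocessing(image_and_label)
    images.append(image)
    class_labels.append(class_label)
    b_boxes.append(b_box)

  images = tf.concat(images, axis = 0) #each image already has a batch dimension of 1, so this gives (N, 256, 256, 3)
  class_labels = tf.stack(class_labels)
  b_boxes = tf.constant(b_boxes, dtype = tf.float32)

  #the two outputs are kept as a tuple so that each can feed its own model head
  dataset = tf.data.Dataset.from_tensor_slices((images, (class_labels, b_boxes)))
  dataset = dataset.shuffle(100, reshuffle_each_iteration = True)
  dataset = dataset.batch(64).prefetch(tf.data.AUTOTUNE)

  return dataset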
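
One open point in the functions above is the label format: the images are resized to 256x256, while the bounding boxes stay in the pixel coordinates of the original image. A common option, shown here only as a sketch and not as this notebook's chosen format, is to divide each coordinate by the original width or height so the box lies in the [0, 1] range. `normalize_box` is a hypothetical helper:

```python
def normalize_box(xmin, ymin, xmax, ymax, width, height):
  """Scale pixel coordinates to the [0, 1] range so they are independent of image size."""
  return [xmin / width, ymin / height, xmax / width, ymax / height]

# Values taken from the Sphynx_117 label shown earlier: height=500, width=334.
box = normalize_box(143, 154, 237, 250, width=334, height=500)
print([round(v, 3) for v in box])
```

Multiplying the normalized values by 256 would give the box in the coordinates of the resized image, which is convenient when drawing predictions on it.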
