<a href="https://colab.research.google.com/github/beedoop1/image-classification/blob/main/final_analysis_of_image_classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Introduction


The purpose of this project is to develop an image classification algorithm that classifies images into one of two categories.

## Description

The dataset comes from Kaggle and contains images of both birds and drones. We will use these images to develop an image classification algorithm that determines whether an image shows a bird or a drone.

Link to dataset: https://www.kaggle.com/datasets/harshwalia/birds-vs-drone-dataset/data

## Importing libraries

In [142]:
import glob
from PIL import Image
from io import BytesIO
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from skimage.io import imread
import cv2
import random
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

In [143]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


Collecting the paths of all the `.jpeg` files in the Birds and Drones folders

In [144]:
bird = glob.glob("/content/drive/MyDrive/bird vs drone/BirdVsDrone/Birds/*.jpeg")
drone = glob.glob("/content/drive/MyDrive/bird vs drone/BirdVsDrone/Drones/*.jpeg")
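
The patterns above only match files ending in `.jpeg`. As a minimal sketch, assuming the folders might also contain `.jpg` or `.png` files (the notebook does not say whether they do), a small helper could match several extensions at once. The name `list_image_files` and the extension list are hypothetical, chosen only for illustration.

```python
import glob
import os


def list_image_files(folder, extensions=(".jpeg", ".jpg", ".png")):
    """Return sorted paths in `folder` whose extension is in `extensions` (case-insensitive)."""
    paths = glob.glob(os.path.join(folder, "*"))
    return sorted(p for p in paths if os.path.splitext(p)[1].lower() in extensions)
```

Calling `list_image_files("/content/drive/MyDrive/bird vs drone/BirdVsDrone/Birds")` would then stand in for the first `glob.glob` call, and sorting the paths keeps the file order stable between runs.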

Reading each file with OpenCV into a list for birds and a list for drones, skipping any file that fails to load

In [145]:
bird_images = []

for image in bird:
  bird_image = cv2.imread(image)
  if bird_image is not None:
    bird_images.append(bird_image)

In [146]:
drone_images = []

for image in drone:
  drone_image = cv2.imread(image)
  if drone_image is not None:
    drone_images.append(drone_image)

Checking the number of images loaded from both folders

In [147]:
print(len(bird_images))

396


In [148]:
print(len(drone_images))

114


Rotating the drone images by 90 and 180 degrees to help balance out the difference in the number of images

In [149]:
rotated_drone_images_90_degrees = [np.rot90(image, k=1) for image in drone_images]
rotated_drone_images_180_degrees = [np.rot90(image, k=2) for image in drone_images]

Merging the three drone lists into one dataset and then checking that all three lists were included

In [150]:
merged_drone_images = drone_images + rotated_drone_images_90_degrees + rotated_drone_images_180_degrees

In [151]:
print(len(merged_drone_images))

342
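
To make the rotation step concrete, here is a small sketch on a made-up array standing in for one image. It shows that `np.rot90` with `k=1` swaps height and width, while `k=2` keeps the shape, and that the rotated copy holds exactly the same pixels as the original.

```python
import numpy as np

# Stand-in for one image: height 2, width 3, 3 colour channels.
image = np.arange(2 * 3 * 3).reshape(2, 3, 3)

rot90 = np.rot90(image, k=1)   # 90 degrees counter-clockwise
rot180 = np.rot90(image, k=2)  # 180 degrees

print(image.shape, rot90.shape, rot180.shape)

# Rotating the 90-degree copy three more times recovers the original pixels.
print(np.array_equal(np.rot90(rot90, k=3), image))
```

Because each rotated copy is a rearrangement of an existing image, the merged list has three times as many entries but still only 114 distinct drone scenes.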


Randomly downsampling the bird list so it has the same number of images as the drone list

In [152]:
balanced_bird_images = random.sample(bird_images, len(merged_drone_images))
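
`random.sample` picks a different subset on every run, so the results can shift slightly between runs. A minimal sketch of a repeatable draw, assuming a fixed seed is acceptable (the seed value 42 and the integer stand-ins for images are illustrative only):

```python
import random

rng = random.Random(42)          # arbitrary seed, an assumption for this sketch
items = list(range(396))         # stand-ins for the 396 bird images
subset = rng.sample(items, 342)  # draw 342 without replacement

print(len(subset), len(set(subset)))
```

Using a dedicated `random.Random` instance keeps the seed local to this step instead of changing the global random state.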

Checking the shape of one image from each dataset

In [153]:
print(balanced_bird_images[0].shape)

(275, 183, 3)


In [154]:
print(merged_drone_images[0].shape)

(168, 299, 3)


Converting each image from a NumPy array to a PIL image, resizing it to 128x128, and then converting it back to a NumPy array

In [155]:
output_size = (128, 128)
bird_resized = []
drone_resized = []

for image_array in balanced_bird_images:
    image_pil = Image.fromarray(image_array)
    resized_image = image_pil.resize(output_size, Image.LANCZOS)
    bird_resized.append(np.array(resized_image))

for image_array in merged_drone_images:
    image_pil = Image.fromarray(image_array)
    resized_image = image_pil.resize(output_size, Image.LANCZOS)
    drone_resized.append(np.array(resized_image))
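
The two loops above do the same work, so they could be folded into one helper. The sketch below is one way to do that, tried on blank arrays with the two shapes printed earlier. The name `resize_images` is hypothetical.

```python
import numpy as np
from PIL import Image


def resize_images(images, size=(128, 128)):
    """Resize each H x W x 3 uint8 array to `size` and return a list of arrays."""
    resized = []
    for image_array in images:
        image_pil = Image.fromarray(image_array)
        resized.append(np.array(image_pil.resize(size, Image.LANCZOS)))
    return resized


# Blank stand-ins with the two shapes seen above.
samples = [np.zeros((275, 183, 3), dtype=np.uint8),
           np.zeros((168, 299, 3), dtype=np.uint8)]
print([img.shape for img in resize_images(samples)])
```

Note that resizing to a square stretches images with different aspect ratios. That is the same behaviour as the original loops.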

Making sure the sizes of the images changed

In [157]:
print(bird_resized[0].shape)

(128, 128, 3)


In [158]:
print(drone_resized[0].shape)

(128, 128, 3)


## Analysis

Combining the two lists and giving each image a label (1 for bird, 0 for drone) so the image classification algorithm knows which picture is a bird and which is a drone

In [159]:
all_images = np.concatenate((bird_resized, drone_resized), axis=0)
labels = np.array([1] * len(bird_resized) + [0] * len(drone_resized))
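
A quick sanity check on the labels can confirm that the two classes are balanced. This sketch rebuilds the label array from the counts reported above (342 of each) rather than from the real image lists.

```python
import numpy as np

n_birds, n_drones = 342, 342  # counts reported earlier in the notebook
labels = np.array([1] * n_birds + [0] * n_drones)

# np.bincount returns how many times each integer label occurs: index 0 = drone, 1 = bird.
counts = np.bincount(labels)
print(counts)
```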

Performing a train test split

In [160]:
X_train, X_test, y_train, y_test = train_test_split(all_images, labels, test_size=0.2, random_state=42)
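
One caveat with this ordering: the rotated drone copies were created before the split, so an original image can end up in the training set while its rotation lands in the test set. The sketch below shows an alternative order, splitting the original images first and rotating only the training portion. It uses small made-up arrays in place of the real images, and the seed and the manual 80/20 index split are assumptions for illustration, not the notebook's method.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed

# Stand-ins for the 114 original drone images (tiny arrays keep the sketch fast).
originals = [np.full((4, 6, 3), i, dtype=np.uint8) for i in range(114)]

# 1) Split the ORIGINAL images first.
order = rng.permutation(len(originals))
n_test = int(0.2 * len(originals))
test_idx, train_idx = order[:n_test], order[n_test:]

# 2) Rotate only the training portion.
train_images = []
for i in train_idx:
    img = originals[i]
    train_images += [img, np.rot90(img, k=1), np.rot90(img, k=2)]
test_images = [originals[i] for i in test_idx]

print(len(train_images), len(test_images))
```

With this order, no test image has a rotated twin in the training set, so the test accuracy reflects images the model has not seen in any orientation.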

#### Flattening the images into a 1D array

In [161]:
X_train.shape

(547, 128, 128, 3)

In [162]:
X_train = X_train.reshape(X_train.shape[0], -1)

In [163]:
X_train.shape

(547, 49152)

In [164]:
X_test = X_test.reshape(X_test.shape[0], -1)

In [165]:
X_test.shape

(137, 49152)
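
The 49152 columns come from 128 x 128 x 3 values per image. A short sketch on a blank batch of five images shows the same reshape and confirms that it can be undone:

```python
import numpy as np

batch = np.zeros((5, 128, 128, 3), dtype=np.uint8)  # five blank stand-in images

# Keep the first axis (one row per image) and flatten everything else.
flat = batch.reshape(batch.shape[0], -1)
print(flat.shape)

# The flattening loses no information: reshaping back restores the images.
restored = flat.reshape(-1, 128, 128, 3)
print(np.array_equal(restored, batch))
```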

#### Normalizing our train and test data

In [166]:
X_train_norm = X_train / 255.0
X_test_norm = X_test / 255.0
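
Dividing a `uint8` array by `255.0` produces `float64` values between 0 and 1. As an optional variation, casting to `float32` first halves the memory used by the normalized arrays. The sketch below uses a small random array in place of the real data, and the `float32` choice is a suggestion, not something the notebook does.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.integers(0, 256, size=(4, 49152), dtype=np.uint8)  # stand-in pixel data

X_norm64 = X / 255.0                     # what the cell above does: float64
X_norm32 = X.astype(np.float32) / 255.0  # same values, half the memory

print(X_norm64.dtype, X_norm32.dtype)
print(X_norm64.nbytes, X_norm32.nbytes)
```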

#### Training a classifier using logistic regression

In [167]:
model = LogisticRegression(
    fit_intercept=True,
    multi_class='auto',
    penalty='l2',  # L2 (ridge-style) regularization
    solver='saga',
    max_iter=100,
    C=50,
)

Fitting the logistic regression model on the training set data.

In [168]:
model.fit(X_train_norm, y_train)
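
With `solver='saga'` and `max_iter=100`, the solver may stop before it has converged, which scikit-learn reports as a `ConvergenceWarning`. The sketch below shows one way to check for that on a small synthetic dataset. The data here is made up, so it says nothing about whether the real fit converged.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
y = np.arange(60) % 2        # synthetic labels, alternating 0 and 1
X = rng.random((60, 200))    # synthetic features in [0, 1)
X[:, 0] += y                 # give the first feature some signal

model = LogisticRegression(solver="saga", max_iter=100, C=50)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    model.fit(X, y)

# True if the solver stopped because it ran out of iterations.
hit_limit = any(issubclass(w.category, ConvergenceWarning) for w in caught)
print("iterations used:", model.n_iter_[0], "| hit max_iter:", hit_limit)
```

If the warning appears on the real data, raising `max_iter` is the usual first thing to try.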



Making the prediction on the test set, calculating the accuracy, and showing the confusion matrix

In [169]:
y_pred = model.predict(X_test_norm)
accuracy = sum(y_pred == y_test) / len(y_test)
print(accuracy)
cm = confusion_matrix(y_test, y_pred)
print(cm)

0.708029197080292
[[31 28]
 [12 66]]
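
The confusion matrix can be broken down per class. In scikit-learn's layout, rows are the true labels and columns are the predictions, in label order 0 (drone) then 1 (bird). The sketch below recomputes the overall accuracy and the share of each class that was classified correctly from the matrix printed above.

```python
import numpy as np

# Rows: true label, columns: predicted label, in order 0 (drone), 1 (bird).
cm = np.array([[31, 28],
               [12, 66]])

accuracy = np.trace(cm) / cm.sum()     # (31 + 66) / 137
drone_recall = cm[0, 0] / cm[0].sum()  # 31 of 59 drones found
bird_recall = cm[1, 1] / cm[1].sum()   # 66 of 78 birds found

print(round(accuracy, 3), round(drone_recall, 3), round(bird_recall, 3))
```

This works out to roughly 52.5% of drone images and 84.6% of bird images classified correctly, so most of the errors come from drones predicted as birds.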


## Conclusion

The model correctly classified 97 of the 137 test images, an accuracy of about 70.8%. One likely reason the accuracy is this low is that the training set was small. To balance the sampling between the bird and drone images, I rotated the drone pictures by 90 degrees and 180 degrees to get a larger sample size for the drones. Doing this increased the accuracy from about 59% to about 70%. I believe that with a larger dataset of both birds and drones, the accuracy of the image classification algorithm would improve further.

Looking at the confusion matrix, 12 of the 78 bird images were falsely identified as drones, which is not bad. However, 28 of the 59 drone images were identified as birds. This suggests the drone dataset was not diverse enough, since much of it consisted of rotated copies of the same 114 images that I created to balance out the dataset.