# 1. Project Goal/Motivation

## Goal:
Develop an image classification model that can accurately identify different species of bears from images.

## Motivation:
- **Problem**: Different bear species exhibit different behaviors, and appropriate human responses to encounters with them vary. Accurate identification can help in promoting safety and effective wildlife management.
- **Relevance**: This project can aid hikers, wildlife enthusiasts, and conservationists in identifying bear species quickly and accurately, contributing to safer interactions and better conservation strategies.


# 2. Data

## Data Collection:
- The dataset for this project consists of images of various bear species sourced from a publicly available dataset.

## Categories:
- The dataset contains five image classes, four kinds of real bears plus teddy bears (toys, not a species):
  1. Polar bears
  2. Grizzly bears
  3. Black bears
  4. Panda bears
  5. Teddy bears

## Data Preparation:
- Image preprocessing steps were applied to ensure consistency and quality:
  - Image resizing: Images were resized to 224x224 pixels, the input size used for the MobileNetV2 model in this notebook.
  - Data augmentation: Techniques like rotation, zoom, and flipping were applied to increase dataset variability and improve model robustness.
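
The augmentation techniques named above can be sketched with plain NumPy. This is only an illustration under assumptions, not the notebook's own pipeline (the preprocessing cell further down only resizes and rescales the images). The helper names `random_flip`, `center_zoom`, and `rotate_90` are hypothetical, and rotation is limited to quarter turns to keep the sketch dependency-free.

```python
import numpy as np

def random_flip(image, rng):
    """Flip an H x W x C image left-to-right with probability 0.5."""
    if rng.random() < 0.5:
        return image[:, ::-1, :]
    return image

def center_zoom(image, zoom):
    """Zoom in by cropping the central 1/zoom of the image (zoom >= 1).

    The crop is not resized back here; a resize step would normally
    follow to restore the 224x224 input size.
    """
    height, width = image.shape[:2]
    crop_h, crop_w = int(height / zoom), int(width / zoom)
    top, left = (height - crop_h) // 2, (width - crop_w) // 2
    return image[top:top + crop_h, left:left + crop_w, :]

def rotate_90(image, k):
    """Rotate by k quarter turns; arbitrary angles need an image library."""
    return np.rot90(image, k=k, axes=(0, 1))

rng = np.random.default_rng(0)                    # seeded for reproducibility
image = np.zeros((224, 224, 3), dtype=np.uint8)   # placeholder image
augmented = rotate_90(center_zoom(random_flip(image, rng), zoom=1.25), k=1)
print(augmented.shape)  # (179, 179, 3)
```

Augmentation is normally applied to the training images only, so that the test set still measures performance on unmodified inputs.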



In [1]:
# Import Libraries
import os  # Provides functions to interact with the operating system
import sys  # Provides access to some variables used or maintained by the interpreter
import shutil  # Offers high-level operations on files and collections of files

from tempfile import NamedTemporaryFile  # Creates temporary files and directories
from urllib.request import urlopen  # Opens URLs
from urllib.parse import unquote, urlparse  # Provides utilities to handle URL parsing and quoting
from urllib.error import HTTPError  # Defines the exceptions raised by urllib

from zipfile import ZipFile  # Provides tools to create, read, write, append, and list a ZIP file
import tarfile  # Provides tools to read and write tar archive files

In [3]:
# Download the dataset

CHUNK_SIZE = 40960  # Define the size of each chunk to be read from the URL

# Mapping of dataset name to its encoded URL
DATA_SOURCE_MAPPING = 'bear-dataset:https%3A%2F%2Fstorage.googleapis.com%2Fkaggle-data-sets%2F2293433%2F3855739%2Fbundle%2Farchive.zip%3FX-Goog-Algorithm%3DGOOG4-RSA-SHA256%26X-Goog-Credential%3Dgcp-kaggle-com%2540kaggle-161607.iam.gserviceaccount.com%252F20240601%252Fauto%252Fstorage%252Fgoog4_request%26X-Goog-Date%3D20240601T163604Z%26X-Goog-Expires%3D259200%26X-Goog-SignedHeaders%3Dhost%26X-Goog-Signature%3D3d149c913c839d42dfc99190066b0d8dc0cc4495ee921249978a937b91ca68edde89dcc26addea3eec9cb14b621d51c7b3752465fed72bca66931c40fd6c18eea1b231b9ae99115da46ecc9e87804216aea188f333e62797a8487f367256275409ca24aafbf8a43f3e54bb5b8c26dc9f862378bf5ea390d07982ef89882d5639e2a6b74441c64877446562b964a7d22e043bbaa2a63fe0bab3ae4b6b9618b79f181592254144fa21e75e2a6d56a96d82c4a1814ce425394aec30006b7e8522f7f6c073b414f2c12be6f80ef432802558edf37a479f2baff4f8bbbc10f58ef02380f536f916c584edf96d43f26a28328fdc8ae3068c9a7b4e230449e1c6ad0516'

KAGGLE_INPUT_PATH = '/kaggle/input'  # Define the input path for Kaggle
KAGGLE_WORKING_PATH = '/kaggle/working'  # Define the working path for Kaggle
KAGGLE_SYMLINK = 'kaggle'  # Define the symlink name for Kaggle

# Unmount the /kaggle/input directory if it is mounted
!umount /kaggle/input/ 2> /dev/null
# Remove the /kaggle/input directory, ignoring errors if it does not exist
shutil.rmtree('/kaggle/input', ignore_errors=True)
# Create the /kaggle/input directory with full permissions
os.makedirs(KAGGLE_INPUT_PATH, 0o777, exist_ok=True)
# Create the /kaggle/working directory with full permissions
os.makedirs(KAGGLE_WORKING_PATH, 0o777, exist_ok=True)

# Create a symbolic link to the input directory, ignore if it already exists
try:
    os.symlink(KAGGLE_INPUT_PATH, os.path.join("..", 'input'), target_is_directory=True)
except FileExistsError:
    pass
# Create a symbolic link to the working directory, ignore if it already exists
try:
    os.symlink(KAGGLE_WORKING_PATH, os.path.join("..", 'working'), target_is_directory=True)
except FileExistsError:
    pass

# Loop through each data source mapping
for data_source_mapping in DATA_SOURCE_MAPPING.split(','):
    directory, download_url_encoded = data_source_mapping.split(':')  # Split into directory and encoded URL
    download_url = unquote(download_url_encoded)  # Decode the URL
    filename = urlparse(download_url).path  # Parse the filename from the URL
    destination_path = os.path.join(KAGGLE_INPUT_PATH, directory)  # Define the destination path

    try:
        # Open the URL and create a temporary file
        with urlopen(download_url) as fileres, NamedTemporaryFile() as tfile:
            total_length = fileres.headers['content-length']  # Get the total length of the file
            print(f'Downloading {directory}, {total_length} bytes compressed')
            dl = 0  # Initialize the downloaded length
            data = fileres.read(CHUNK_SIZE)  # Read the first chunk
            while len(data) > 0:  # Continue reading until no more data
                dl += len(data)  # Increment the downloaded length
                tfile.write(data)  # Write the data to the temporary file
                done = int(50 * dl / int(total_length))  # Calculate the download progress
                # Display the download progress
                sys.stdout.write(f"\r[{'=' * done}{' ' * (50-done)}] {dl} bytes downloaded")
                sys.stdout.flush()
                data = fileres.read(CHUNK_SIZE)  # Read the next chunk

            # Check if the file is a ZIP archive and extract it
            if filename.endswith('.zip'):
                with ZipFile(tfile) as zfile:
                    zfile.extractall(destination_path)
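            # Otherwise, treat it as a tar archive and extract it
            else:
                tfile.flush()  # Make sure all downloaded bytes are on disk before reopening by name
                # Use a distinct name so the tarfile module is not shadowed
                with tarfile.open(tfile.name) as tar_archive:
                    tar_archive.extractall(destination_path)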
            print(f'\nDownloaded and uncompressed: {directory}')  # Confirm download and extraction
    except HTTPError as e:
        # Handle HTTP errors (likely due to an expired URL)
        print(f'Failed to load (likely expired) {download_url} to path {destination_path}')
        continue
    except OSError as e:
        # Handle OS errors
        print(f'Failed to load {download_url} to path {destination_path}')
        continue

print('Data source import complete.')  # Indicate that the data import is complete


OSError: [Errno 30] Read-only file system: '/kaggle'
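
The download loop above relies on each entry of `DATA_SOURCE_MAPPING` having the form `name:percent-encoded-url`. A small sketch of that parsing step, using a made-up entry (the dataset name and URL below are hypothetical and point nowhere real):

```python
from urllib.parse import unquote, urlparse

# Hypothetical entry in the same 'name:percent-encoded-url' format.
entry = 'demo-dataset:https%3A%2F%2Fexample.com%2Fbundle%2Farchive.zip%3Ftoken%3Dabc'

# Splitting on ':' is safe because every ':' inside the URL is encoded as %3A.
directory, encoded_url = entry.split(':')
download_url = unquote(encoded_url)        # decode %3A, %2F, %3F, ...
filename = urlparse(download_url).path     # path part only, without the query string

print(directory)      # demo-dataset
print(download_url)   # https://example.com/bundle/archive.zip?token=abc
print(filename)       # /bundle/archive.zip
```

Because `filename` excludes the query string, the `.endswith('.zip')` check in the loop works even though the signed URL carries many parameters.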

In [4]:
# Importing necessary libraries for model

# Building deep learning models
import tensorflow as tf
from tensorflow import keras
# For accessing pre-trained models
import tensorflow_hub as hub
# For separating train and test sets
from sklearn.model_selection import train_test_split

# For visualizations
import matplotlib.pyplot as plt
import matplotlib.image as img
import PIL.Image as Image
import cv2

import os
import numpy as np
import pandas as pd
import pathlib

In [9]:
# Accessing the images link
data_dir = "./input/bear-dataset/data" # Datasets path
data_dir = pathlib.Path(data_dir)
data_dir

# Opening each folder in a variable
black = list(data_dir.glob('black/*'))
grizzly = list(data_dir.glob('grizzly/*'))
panda = list(data_dir.glob('panda/*'))
polar = list(data_dir.glob('polar/*'))
teddy = list(data_dir.glob('teddy/*'))

# Assigning dirs for images and their labels
# Contains the images path
df_images = {
    'black' : black,
    'grizzly' : grizzly,
    'panda' : panda,
    'polar' : polar,
    'teddy': teddy
}

# Contains numerical labels for the categories
df_labels = {
    'black' : 0,
    'grizzly' : 1,
    'panda' : 2,
    'polar' : 3,
    'teddy': 4
}

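# Resize every image to 224x224, the input size used for the MobileNetV2 model
X, y = [], [] # X = images, y = labels
for label, images in df_images.items():
    for image in images:
        bgr_img = cv2.imread(str(image))  # OpenCV loads images in BGR channel order
        if bgr_img is None:  # Skip files that OpenCV cannot decode
            continue
        rgb_img = cv2.cvtColor(bgr_img, cv2.COLOR_BGR2RGB)  # The pre-trained model expects RGB input
        resized_img = cv2.resize(rgb_img, (224, 224))
        X.append(resized_img)
        y.append(df_labels[label])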

# Normalizing: scale pixel values from [0, 255] to [0, 1]
X = np.array(X)
X = X/255
y = np.array(y)

# Training and Test Dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# 3. Model

### TensorFlow Hub - MobileNetV2
In this section of the notebook, we are utilizing a pre-trained MobileNetV2 model from TensorFlow Hub as the base for our neural network. MobileNetV2 is a lightweight, efficient convolutional neural network commonly used for feature extraction in image classification tasks. By importing the model without its final classification layer, we can leverage its pre-trained weights to extract high-level features from our input images.

We initialize the MobileNetV2 model as a non-trainable Keras layer to retain its pre-trained weights during our training process. This approach, known as transfer learning, allows us to build a more effective model with reduced computational cost and training time, especially beneficial when working with smaller datasets.

We then construct a new sequential model by adding a dense layer on top of the MobileNetV2 base. This dense layer, configured to match the number of target classes in our specific task, will be trained to perform the final classification.
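
As a rough check on how small the trainable part of this model is, the dense layer's parameter count can be computed by hand. The sketch below assumes the feature-vector module outputs 1280 features per image (the usual MobileNetV2 feature size at width multiplier 1.0); this is an assumption and should be verified against the output of `model.summary()`. The helper `dense_param_count` is hypothetical.

```python
def dense_param_count(num_inputs, num_units):
    """Weights (num_inputs x num_units) plus one bias per unit."""
    return num_inputs * num_units + num_units

feature_dim = 1280   # assumed MobileNetV2 feature-vector size
num_label = 5        # number of classes, as in the notebook
print(dense_param_count(feature_dim, num_label))  # 6405
```

Only these few thousand parameters are updated during training; the frozen base contributes features but no trainable weights.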

Calling `model.summary()` prints the resulting architecture so that it can be checked before training.

We compile the model with the Adam optimizer and the `SparseCategoricalCrossentropy` loss, using accuracy as the evaluation metric. The loss is created with `from_logits=True` because the dense layer has no softmax activation and therefore outputs raw scores. The model is then trained on the training set for 10 epochs. `model.fit` returns a history object that records the loss and accuracy for each epoch.
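
To make the loss concrete, here is a minimal pure-Python sketch of sparse categorical cross-entropy computed from logits for a single sample. It illustrates the formula only and is not Keras's implementation; the function name is reused here purely for readability.

```python
import math

def sparse_categorical_crossentropy(logits, label):
    """Cross-entropy for one sample: raw class scores plus an integer label."""
    # Subtract the max logit before exponentiating for numerical stability.
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    # -log(softmax(logits)[label]) equals log_sum_exp - logits[label]
    return log_sum_exp - logits[label]

# A 5-class model that scores all classes equally has loss ln(5).
uniform_loss = sparse_categorical_crossentropy([0.0, 0.0, 0.0, 0.0, 0.0], label=2)
print(round(uniform_loss, 4))  # 1.6094

# A confident, correct prediction gives a loss close to zero.
confident_loss = sparse_categorical_crossentropy([0.0, 0.0, 8.0, 0.0, 0.0], label=2)
print(confident_loss < 0.01)  # True
```

The value ln(5) ≈ 1.609 is a useful baseline: a loss near it means the model is doing no better than guessing uniformly among the five classes.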

In [10]:
mobile_net = 'https://tfhub.dev/google/tf2-preview/mobilenet_v2/feature_vector/4' # MobileNetV2 feature-vector module (version 4)
mobile_net = hub.KerasLayer(
        mobile_net, input_shape=(224,224, 3), trainable=False) # Freeze the pre-trained weights; the feature-vector module already omits the classification head

In [13]:
num_label = 5 # number of labels

# Wrap the mobile_net layer in a Lambda layer to make it compatible with Sequential
model = keras.Sequential([
    keras.layers.Lambda(lambda x: mobile_net(x)),
    keras.layers.Dense(num_label)
])

model.summary()


In [14]:
# Training the model
model.compile(
  optimizer="adam",
  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
  metrics=['acc'])


history = model.fit(X_train, y_train, epochs=10)

Epoch 1/10
[1m8/8[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 162ms/step - acc: 0.3662 - loss: 1.4493
Epoch 2/10
[1m8/8[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 158ms/step - acc: 0.8168 - loss: 0.6211
Epoch 3/10
[1m8/8[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 159ms/step - acc: 0.9543 - loss: 0.2832
Epoch 4/10
[1m8/8[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 163ms/step - acc: 0.9832 - loss: 0.1732
Epoch 5/10
[1m8/8[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 162ms/step - acc: 0.9980 - loss: 0.1328
Epoch 6/10
[1m8/8[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 159ms/step - acc: 0.9965 - loss: 0.0934
Epoch 7/10
[1m8/8[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 160ms/step - acc: 1.0000 - loss: 0.0799
Epoch 8/10
[1m8/8[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 161ms/step - acc: 1.0000 - loss: 0.0691
Epoch 9/10
[1m8/8[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 160ms/step - acc: 1.000

# 4. Interpretation and Validation

We evaluate the model's performance on the test dataset to obtain its accuracy and other metrics. Subsequently, we generate predictions for the test data and convert these predictions to class labels. Finally, we print a detailed classification report, including precision, recall, and F1-score for each class, using the true labels and predicted labels.

During training the loss falls steadily and the training accuracy reaches 100% by epoch 7. On the held-out test set, `model.evaluate` returns a loss of about 0.136 and an accuracy of about 95.2% (59 of 62 images), so the model generalizes well, with a modest gap between training and test accuracy.

The classification report breaks this result down by class, using the numeric labels assigned earlier (0 = black, 1 = grizzly, 2 = panda, 3 = polar, 4 = teddy). Classes 0 and 1 are classified perfectly, with precision and recall of 1.00. Class 3 has perfect recall and a precision of 0.96, meaning that one image from another class was predicted as polar. Class 4 has precision and recall of 0.89. Class 2 is the weakest, with a precision of 0.86 and a recall of 0.75: two of its eight test images were assigned to other classes.

Overall, the model achieves an accuracy of 95% on the test dataset, with strong performance on most classes. Class 2 appears to be the most challenging, possibly due to imbalanced data or inherent difficulty in distinguishing its images. Because each class has only 8 to 22 test images, these per-class figures are sensitive to individual errors; a single misclassified image changes the recall of class 2 by 12.5 percentage points. Further analysis, such as inspecting the misclassified images or exploring different model architectures, may help improve performance, particularly for class 2.
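
For readers unfamiliar with the metrics in the classification report, the sketch below computes precision, recall, and F1-score for one class directly from integer labels. The helper `per_class_metrics` is hypothetical and the labels are toy values, not the notebook's data.

```python
def per_class_metrics(y_true, y_pred, cls):
    """Return (precision, recall, f1) for one class from integer labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)  # correctly predicted as cls
    fp = sum(1 for t, p in pairs if t != cls and p == cls)  # wrongly predicted as cls
    fn = sum(1 for t, p in pairs if t == cls and p != cls)  # cls instances that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy labels: class 2 is missed once and predicted wrongly once.
y_true = [0, 0, 1, 2, 2, 2, 2, 3]
y_pred = [0, 0, 1, 2, 2, 2, 3, 2]
precision, recall, f1 = per_class_metrics(y_true, y_pred, cls=2)
print(precision, recall, f1)  # 0.75 0.75 0.75
```

Precision answers "of the images predicted as this class, how many were right?", while recall answers "of the images that truly belong to this class, how many were found?".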

In [15]:
model.evaluate(X_test, y_test)

2/2 ━━━━━━━━━━━━━━━━━━━━ 1s 163ms/step - acc: 0.9677 - loss: 0.1135


[0.135707825422287, 0.9516128897666931]

In [16]:
from sklearn.metrics import classification_report

y_pred = model.predict(X_test, batch_size=64, verbose=1)
y_pred_bool = np.argmax(y_pred, axis=1)

print(classification_report(y_test, y_pred_bool))

1/1 ━━━━━━━━━━━━━━━━━━━━ 1s 583ms/step
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        13
           1       1.00      1.00      1.00        10
           2       0.86      0.75      0.80         8
           3       0.96      1.00      0.98        22
           4       0.89      0.89      0.89         9

    accuracy                           0.95        62
   macro avg       0.94      0.93      0.93        62
weighted avg       0.95      0.95      0.95        62
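
The report above identifies classes by number only. A short sketch of how the integer labels could be mapped back to class names, reusing the notebook's `df_labels` mapping (the `predicted_indices` list is a hypothetical stand-in for the output of `np.argmax(y_pred, axis=1)`):

```python
# Same label mapping as the notebook's df_labels dictionary.
df_labels = {'black': 0, 'grizzly': 1, 'panda': 2, 'polar': 3, 'teddy': 4}

# Invert it so that integer predictions can be turned back into names.
index_to_name = {index: name for name, index in df_labels.items()}

predicted_indices = [3, 2, 0]  # hypothetical predictions
predicted_names = [index_to_name[i] for i in predicted_indices]
print(predicted_names)  # ['polar', 'panda', 'black']

# Names ordered by label index, e.g. for labelling report rows.
class_names = [index_to_name[i] for i in sorted(index_to_name)]
print(class_names)  # ['black', 'grizzly', 'panda', 'polar', 'teddy']
```

scikit-learn's `classification_report` accepts a `target_names` argument, so a list such as `class_names` can be passed to print names instead of numbers.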

