# TASK: IMAGE CLASSIFICATION OF DIFFERENT ANIMALS
In this task, we aim to develop a computer program that analyzes images and categorizes them by the animal shown in each picture. The program will be trained to recognize different animal classes such as **cattle**, **sheep**, **elephant**, etc. Classifying images by the animals present can support wildlife monitoring and contribute to conservation initiatives.
# IMAGE DATASET SOURCE
We downloaded an animal image dataset from Kaggle. The dataset is divided into five folders named **elefante_train**, **farfalla_train**, **mucca_train**, **pecora_train** and **scoiattolo_train**. The dataset contains 7,002 images in total, and the images have different sizes.

[Link to the dataset](https://www.kaggle.com/datasets/pratik2901/animal-dataset)

- Dataset creation date: 10/01/2022
- Contributors: stpete_ishii and prateek

# FEATURE ENGINEERING
We performed the following feature engineering steps:

1. **Cropping**

Cropping is the process of removing parts of an image to focus on a specific region of interest. We cropped some of our images with code and the others with a photo app.
When cropping with code, we give the boundaries of the region of interest in the picture (its top-left coordinates and its dimensions). In practice this is no faster than using the photo app, because the code below also crops one picture at a time.
- **Code to crop one picture**

In [1]:
import cv2

# Read the image
image = cv2.imread("ALL IMAGES/ea36b30f2df3023ed1584d05fb1d4e9fe777ead218ac104497f5c978a4efb4bb_640.jpg")

# Check if the image was successfully loaded
if image is None:
    print("Error: Unable to read the image.")
else:
    # Output the dimensions of the image
    print("Image dimensions:", image.shape)

    # Define the coordinates of the region of interest (ROI)
    # Adjusted coordinates within the image bounds
    x, y, width, height = 100, 100, 150, 100

    # Output the calculated coordinates
    print("ROI coordinates:", x, y, width, height)

    # Check if the ROI coordinates are within the bounds of the image
    if x < image.shape[1] and y < image.shape[0]:
        # Adjust the ROI dimensions if it exceeds the image boundaries
        width = min(width, image.shape[1] - x)
        height = min(height, image.shape[0] - y)

        # Crop the image
        cropped_image = image[y:y+height, x:x+width]

        # Display the cropped image
        cv2.imshow("Cropped Image", cropped_image)
        cv2.waitKey(0)
        cv2.destroyAllWindows()

        # Save the cropped image
        cv2.imwrite("cropped_image.jpg", cropped_image)
        print("Cropped image saved successfully.")
    else:
        print("Error: ROI coordinates are outside the bounds of the image.")


Image dimensions: (425, 640, 3)
ROI coordinates: 100 100 150 100
Cropped image saved successfully.
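
The bounds check and clamping from the cell above can be pulled into a small helper so the same logic is reusable for more than one picture. This is a minimal sketch, not the code we ran: `clamp_roi` is a hypothetical name, and it works on plain integers so it does not need OpenCV. Unlike the cell above, it also rejects negative coordinates.

```python
def clamp_roi(x, y, width, height, image_width, image_height):
    """Clamp a region of interest (ROI) so it stays inside the image.

    Returns (x, y, width, height), or None if the top-left corner
    lies outside the image. Hypothetical helper for illustration.
    """
    if not (0 <= x < image_width and 0 <= y < image_height):
        return None
    # Shrink the ROI if it extends past the right or bottom edge
    width = min(width, image_width - x)
    height = min(height, image_height - y)
    return x, y, width, height


# cv2 reports image.shape as (height, width, channels); the image above is (425, 640, 3)
image_height, image_width = 425, 640
roi = clamp_roi(100, 100, 150, 100, image_width, image_height)
print("Clamped ROI:", roi)
```

With the returned values, the crop itself is the same slice as before: `image[y:y+height, x:x+width]`.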


2. **Resizing**
- We resized all images in the dataset to 250×250 pixels. Resizing ensures that all images in the dataset have the same dimensions. This uniformity simplifies data preprocessing and model training, because the model does not have to handle images of different sizes.
- **Code to resize all images**

In [2]:
from PIL  import Image
import os

# Directory containing the original images
original_folder = "ALL IMAGES"

# Directory to store the resized images
resized_folder = "RESIZED IMAGES2"

# Target size for resizing
target_size = (250, 250)

# Create the resized folder if it doesn't exist
os.makedirs(resized_folder, exist_ok=True)

# Count only the images that are actually resized (start at 0, not 1)
total_images = 0

# Iterate over each image in the original folder
for filename in os.listdir(original_folder):
    if filename.lower().endswith((".jpg", ".png", ".jpeg", ".gif")):
        with Image.open(os.path.join(original_folder, filename)) as img:
            # Image.LANCZOS is the same filter as the deprecated Image.ANTIALIAS
            resized_img = img.resize(target_size, Image.LANCZOS)
            # Save the resized image to the resized folder
            resized_img.save(os.path.join(resized_folder, filename))
            total_images += 1
            print(f"Resized {filename} succesfully")

print(f"All {total_images} images resized; check the folder {resized_folder}")




Resized ea36b0072bfc063ed1584d05fb1d4e9fe777ead218ac104497f5c978a4efb4bb_640.jpg succesfully
Resized ea36b00828f6033ed1584d05fb1d4e9fe777ead218ac104497f5c97faee9bdba_640.jpg succesfully
Resized ea36b00829f0043ed1584d05fb1d4e9fe777ead218ac104497f5c97faee9bdba_640.jpg succesfully
Resized ea36b0082bf2063ed1584d05fb1d4e9fe777ead218ac104497f5c978a4efb4bb_640.jpg succesfully
Resized ea36b0082bf2083ed1584d05fb1d4e9fe777ead218ac104497f5c978a4efb4bb_640.jpg succesfully
Resized ea36b1062df4053ed1584d05fb1d4e9fe777ead218ac104497f5c97faee9bdba_640.jpg succesfully
Resized ea36b30f2df3023ed1584d05fb1d4e9fe777ead218ac104497f5c978a4efb4bb_640.jpg succesfully
Resized ea36b4072ffc023ed1584d05fb1d4e9fe777ead218ac104497f5c97faee9bdba_640.jpg succesfully
Resized ea36b4082ff2093ed1584d05fb1d4e9fe777ead218ac104497f5c97faee8b1b8_640.jpg succesfully
Resized ea36b4092df7073ed1584d05fb1d4e9fe777ead218ac104497f5c97faee9bdba_640.jpg succesfully
Resized ea36b50b2df21c22d2524518b7444f92e37fe5d404b0144390f8c07aa4e5b0
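
Resizing straight to 250×250 changes the proportions of non-square images, such as the 640×425 image used in the cropping step. If that distortion turns out to matter, one option is to scale each image so it fits inside the target while keeping its aspect ratio. The sketch below only computes the new size; `fit_within` is a hypothetical helper and was not part of our pipeline.

```python
def fit_within(original_size, target_size=(250, 250)):
    """Return the largest (width, height) that fits inside target_size
    while keeping the aspect ratio of original_size."""
    orig_w, orig_h = original_size
    target_w, target_h = target_size
    # Use the smaller scale factor so both sides fit inside the target
    scale = min(target_w / orig_w, target_h / orig_h)
    # Keep at least one pixel per side
    return max(1, round(orig_w * scale)), max(1, round(orig_h * scale))


# Size of the image from the cropping step, given as (width, height)
new_size = fit_within((640, 425))
print("Aspect-preserving size:", new_size)
```

An image resized this way would still need padding to reach exactly 250×250, so all images keep the same dimensions for training.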

# CREATING IMAGE CLASS LABEL (GROUPING IMAGES)
We looked at every image to see which animal it shows, then sorted the images into classes. We created the following five **classes**.
- **Elephant class**: this class contains all elephant images.
- **Butterfly class**: this class contains butterfly images.
- **Squirrel class**: this class contains squirrel images.
- **Sheep class**: this class contains sheep images.
- **Cattle class**: this class contains cattle images.

Each class contains a different number of images, ranging from 10 to 15.
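
Once the images are sorted into one folder per class, the class sizes can be checked with a short script. This is a sketch under assumptions: it expects a dataset directory with one subfolder per class, and `count_images_per_class` is a hypothetical helper name.

```python
import os

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def count_images_per_class(dataset_dir):
    """Map each class folder name to the number of image files in it."""
    counts = {}
    for class_name in sorted(os.listdir(dataset_dir)):
        class_path = os.path.join(dataset_dir, class_name)
        if not os.path.isdir(class_path):
            continue  # Skip stray files next to the class folders
        counts[class_name] = sum(
            1
            for name in os.listdir(class_path)
            if name.lower().endswith(IMAGE_EXTENSIONS)
        )
    return counts


if __name__ == "__main__":
    # Assumed location: the dataset folder used in the COCO conversion step
    dataset_dir = "RESIZED ANIMAL  DATASET"
    if os.path.isdir(dataset_dir):
        for class_name, count in count_images_per_class(dataset_dir).items():
            print(f"{class_name}: {count} images")
```

A class with far fewer images than the others is easy to spot this way before training.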

# DATA FORMATTING 
After grouping the images into classes, we converted the dataset into the **COCO JSON** data format. COCO provides a standardized format for representing image datasets and is commonly used for tasks such as object detection. A standardized format also makes the dataset easier to load with existing computer vision tools.
- In this **COCO JSON** format, we assigned a numerical label to each class, with **BUTTERFLY CLASS** labeled as **1**, **CATTLE CLASS** as **2**, **ELEPHANT CLASS** as **3**, **SHEEP CLASS** as **4** and **SQUIRREL CLASS** as **5**, matching the `category_id_map` in the code below. Numerical labels simplify the representation of classes, making it easier for machine learning algorithms to process the data efficiently.

**Code we used to convert the image dataset to the COCO JSON dataset format**

In [3]:
import json
import os

# Define the path to your dataset directory
dataset_dir = "RESIZED ANIMAL  DATASET"

# Initialize a dictionary to store the COCO-style dataset annotations
coco_dataset = {
    "info": {
        "description": "This dataset contains images of animals."
    },
    "categories": [{
        "id": 1,
        "name": "RESIZED BUTTERFLY IMAGES",
        "supercategory": "animal"
    }, {
        "id": 2,
        "name": "RESIZED CATTLE IMAGES",
        "supercategory": "animal"
    }, {
        "id": 3,
        "name": "RESIZED ELEPHANT IMAGES",
        "supercategory": "animal"
    },{
        "id": 4,
        "name": "RESIZED SHEEP IMAGES",
        "supercategory": "animal"
    },{
        "id": 5,
        "name": "RESIZED SQUILL IMAGES",
        "supercategory": "animal"
    }],
    "images": [],  # Fill this with the image data
}

# Mapping from category folder names to category IDs
category_id_map = {
    "RESIZED BUTTERFLY IMAGES": 1,
    "RESIZED CATTLE IMAGES": 2,
    "RESIZED ELEPHANT IMAGES": 3,
    "RESIZED SHEEP IMAGES": 4,
    "RESIZED SQUILL IMAGES": 5
}

# Iterate over each category folder in your dataset directory
for category_folder, category_id in category_id_map.items():
    category_path = os.path.join(dataset_dir, category_folder)
    if os.path.isdir(category_path):
        # Iterate over the image files in the category folder
        for filename in os.listdir(category_path):
            if filename.endswith(('.jpg', '.png', '.jpeg')):
                # Construct the path to the image file
                image_path = os.path.join(category_path, filename)
                
                # Add the image to the COCO-style dataset
                coco_dataset["images"].append({
                    "id": len(coco_dataset["images"]) + 1,
                    "file_name": filename,
                    "category_id": category_id
                })

# Save the COCO-style dataset annotations to a JSON file
annotations_path = "ANIMALS DATASET IN COCO JSON FORMAT2.json"
with open(annotations_path, "w") as f:
    json.dump(coco_dataset, f)
    print(f"converting to coco succesfully check for this file {annotations_path}")


converting to coco succesfully check for this file ANIMALS DATASET IN COCO JSON FORMAT2.json
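
After conversion, it can be worth checking that the JSON is internally consistent before training on it. The sketch below assumes the same layout as the cell above, where each entry in `"images"` carries its own `"category_id"`. `check_annotations` is a hypothetical helper, and the file names in `sample` are made up for illustration.

```python
def check_annotations(coco):
    """Return a list of problems found in a COCO-style dict (empty if none)."""
    problems = []
    category_ids = {category["id"] for category in coco.get("categories", [])}
    seen_image_ids = set()
    for image in coco.get("images", []):
        # Image ids must be unique across the whole dataset
        if image["id"] in seen_image_ids:
            problems.append(f"duplicate image id {image['id']}")
        seen_image_ids.add(image["id"])
        # Every image must point at a category that is actually defined
        if image.get("category_id") not in category_ids:
            problems.append(
                f"{image['file_name']}: unknown category_id {image.get('category_id')}"
            )
    return problems


# Tiny in-memory example; the file names are hypothetical
sample = {
    "categories": [
        {"id": 1, "name": "RESIZED BUTTERFLY IMAGES", "supercategory": "animal"},
        {"id": 2, "name": "RESIZED CATTLE IMAGES", "supercategory": "animal"},
    ],
    "images": [
        {"id": 1, "file_name": "butterfly_example.jpg", "category_id": 1},
        {"id": 2, "file_name": "cattle_example.jpg", "category_id": 2},
    ],
}
print("Problems found:", check_annotations(sample))
```

To check the real file, load it with `json.load` from the path stored in `annotations_path` and pass the resulting dictionary to the same function.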


# CHALLENGES WE FACED WHEN WE WERE DOING THIS TASK
- **Data collection**: We downloaded the dataset from Kaggle, but it was large, close to 1 GB in size. We struggled to download it because of a poor internet connection.

- **Image labeling**: Sorting each image into its animal class was quite a task. We had to look carefully at each image and decide where it belongs. This took a lot of time and effort because we had to do it manually.

- **Data formatting**: Getting the labeled images into the right format (COCO JSON) was another challenge. We had to rearrange the data to fit the format properly. It was an important step for training and testing our model, but it wasn't easy and required a lot of attention to detail.

- **Feature engineering**: Since it was our first time doing this kind of task, it was difficult to choose which tools were suitable, but in the end we found a way to complete the feature engineering step.