# Smart Agricultural Bird Pest Control: Utilizing Computer Vision for Pest Management

## Abstract
This project focuses on developing a smart bird detection and repellent system aimed at protecting agricultural fields from bird-related crop damage. By leveraging computer vision and sound-based repellent mechanisms, the system will detect birds and trigger distress calls or ultrasonic sounds to scare them away. The solution is designed to be cost-effective, environmentally friendly, and sustainable, reducing the reliance on harmful methods like chemicals or physical barriers. The project begins by collecting a large dataset of bird images using web scraping techniques, which will be used to train the detection model. The project will then evolve to involve the integration of IoT systems for real-time monitoring and automated responses using Raspberry Pi.

## Table of Contents
1. [Overview](#overview)
2. [Data Sources](#data-sources)
3. [Importing Libraries](#importing-libraries)
4. [Data Collection](#data-collection)
5. [Exploratory Data Analysis (EDA)](#exploratory-data-analysis-eda)
6. [Data Preprocessing](#data-preprocessing)
7. [Model Development](#model-development)
8. [Results and Evaluation](#results-and-evaluation)
9. [Conclusion and Next Steps](#conclusion-and-next-steps)
10. [References](#references)

## Overview

### Background Information

Birds are an integral part of ecosystems worldwide, contributing to pollination, seed dispersal, and pest control. However, certain species can become pests, especially in agricultural settings. In Kenya and other regions of Africa, the invasion of specific bird species, such as the red-billed quelea, poses a significant threat to crop yields. The 2023 invasion of quelea birds in Kenya illustrates the severity of this issue, with farmers facing potential losses of up to 60 tonnes of grain. This problem is worsened by ongoing drought conditions in the Horn of Africa, which have driven these birds to invade cultivated fields in search of food.

The Food and Agriculture Organization (FAO) estimates that such crop losses amount to $50 million annually across sub-Saharan Africa, affecting food security and livelihoods. In response to this crisis, governments have often resorted to using toxic pesticides like fenthion for eradication. While these chemicals aim to protect crops, they also pose serious environmental and health risks, particularly to non-target wildlife and endangered species. The rapid reproductive capacity of birds, along with their significant daily grain consumption, highlights the urgency of finding sustainable and effective solutions to this persistent agricultural challenge.

### Problem Statement
Bird-related crop damage significantly impacts agricultural productivity and food security in Kenya and beyond. The reliance on harmful pesticides to control avian pests raises environmental and health concerns, necessitating a more sustainable approach. This project seeks to develop a smart bird detection and repellent system that leverages computer vision and sound-based mechanisms to provide an eco-friendly alternative to pesticides.

### Proposed Solution
The proposed solution involves creating an intelligent bird detection system that will identify pest birds and trigger distress calls or ultrasonic sounds to deter them from agricultural fields. By integrating computer vision technology with sound-based repellent mechanisms, this system aims to protect crops effectively while minimizing ecological disruption. The initial phase will involve collecting a comprehensive dataset of bird images through web scraping to train the detection model. Future developments will incorporate IoT systems for real-time monitoring and automated responses, empowering farmers to safeguard their crops sustainably.

### Objectives
- Develop a comprehensive dataset of bird images to train the detection model.
- Create a computer vision-based bird detection system that accurately identifies avian pests in agricultural settings.
- Integrate sound-based deterrents to provide a non-invasive repellent mechanism.
- Implement IoT solutions for real-time monitoring and automated responses to bird invasions.

The following sections will detail the data sources, methods, and results of this project, highlighting the significance of technology in modern pest control and its potential impact on agricultural sustainability.

## Data Sources

The project utilised data from the sources listed below, collected using methods such as web scraping.

[African Bird Club](https://www.africanbirdclub.org/)

[HuggingFace](https://huggingface.co/datasets/yashikota/birds-525-species-image-classification)

## Importing Libraries

Below are the libraries used in this project. One exception is `tqdm`, which is imported later, in the image download step.

In [1]:
from bs4 import BeautifulSoup
import requests
import time

import pandas as pd

import shutil
import os

from datasets import load_dataset, concatenate_datasets

## Data Collection

### Web Scraping
#### 1. African Bird Club
The first data source was the [African Bird Club](https://www.africanbirdclub.org/).

I first defined the URLs that the scraper would request from the [website](https://www.africanbirdclub.org/). I kept the base address and the species page path separate to keep the code clean and flexible, since information is collected from several pages.
- `abc_base_url` holds the main address of the site and serves as the foundation for any additional links on the site.
- `abc_species_url` contains the path to the page with the Kenyan bird species information.

This approach makes it easier to update or reuse parts of the URL later on if needed.

In [2]:
# Set the links to the pages to retrieve information from
abc_base_url = 'https://www.africanbirdclub.org'
abc_species_url = f'{abc_base_url}/afbid/search/category/-/-/-'
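
The links scraped later in this notebook are site-relative paths (for example `/afbid/search/browse/species/290`), so they have to be joined onto the base address before they can be requested. The following is a minimal sketch of one way to do that with the standard library; `build_url` is a hypothetical helper name that does not appear in the original notebook.

```python
from urllib.parse import urljoin

# Base address taken from the cell above
abc_base_url = "https://www.africanbirdclub.org"


def build_url(path: str) -> str:
    """Join a site-relative path onto the base URL (hypothetical helper).

    An absolute URL passed as `path` is returned unchanged by urljoin.
    """
    return urljoin(abc_base_url, path)


species_page = build_url("/afbid/search/browse/species/290")
print(species_page)
```

Compared with plain string concatenation, `urljoin` also copes with absolute URLs, which may help if the site ever mixes both forms.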

When scraping data from websites, requests that do not appear to come from a web browser may be blocked, because some websites reject traffic that looks like it comes from a bot or scraper. To avoid this, we set headers that make the requests look like they come from a browser.

I set the headers to include a `User-Agent` (more information about `User-Agent` in [User-Agent Documentation](https://pypi.org/project/user-agents/)), which is a string that web browsers send to identify themselves when making requests to websites. Sending it makes it less likely that the website blocks our code. In this case, I used a User-Agent that mimics a standard Chrome browser on Windows.

To understand more about why headers are important, you can read [this article](https://www.zenrows.com/blog/python-requests-user-agent#what-is).

In [3]:
# Set headers with a User-Agent to allow access
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159 Safari/537.36'
}
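
Because the same headers are sent with every request in this notebook, one option is to attach them once to a `requests.Session`. The sketch below is not part of the original workflow; it prepares a request without sending it, so the headers that would go over the wire can be inspected offline.

```python
import requests

# Same browser-like User-Agent as in the cell above
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159 Safari/537.36'
}

# A session applies its headers to every request made through it
session = requests.Session()
session.headers.update(headers)

# Build (but do not send) a request to see which headers would be used
prepared = session.prepare_request(
    requests.Request("GET", "https://www.africanbirdclub.org/afbid/search/category/-/-/-")
)
print(prepared.headers["User-Agent"])
```

A session also reuses the underlying connection, which can reduce overhead when thousands of pages are fetched from the same host.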

After setting the headers, I sent a GET request with the `requests.get` method to `abc_species_url`, the endpoint for the bird species information. I also included the headers defined earlier so that the request is accepted. You can find more details about the `requests.get` method in the [Requests Documentation](https://pypi.org/project/requests/).

After sending the request, I checked the status code of the response. A status code of 200 indicates that the request succeeded and the page was retrieved. If a different status code is received, a message reporting the failure is printed, including the specific status code. This makes it easier to identify and troubleshoot any issues with the request.

If the request is successful, the HTML content of the response is parsed using BeautifulSoup. BeautifulSoup is a powerful library for parsing HTML and XML documents, as detailed in the [documentation](https://www.crummy.com/software/BeautifulSoup/bs4/doc/). By passing the content of the response and specifying the parser ('html.parser'), a soup object is created, which allows for easy navigation and extraction of data from the HTML structure of the page.

This step is crucial, as it lays the foundation for extracting the necessary information about the bird species from the website.

In [4]:
# Scrape the species list page
response_info = requests.get(abc_species_url, headers=headers)
if response_info.status_code != 200:
    print(f"Failed to retrieve species list page, status code: {response_info.status_code}")
else:
    soup_info = BeautifulSoup(response_info.content, 'html.parser')
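
The status check above handles HTTP error codes, but a request can also fail before any status code exists (timeouts, DNS errors, malformed URLs). As a hedged alternative, the sketch below wraps the request in a hypothetical `fetch_html` helper that returns `None` on any failure; the helper name and its return convention are assumptions, not part of the original notebook.

```python
import requests


def fetch_html(url, headers=None, timeout=15):
    """Return the page body as text, or None if the request fails (hypothetical helper)."""
    try:
        response = requests.get(url, headers=headers, timeout=timeout)
        response.raise_for_status()  # Raises HTTPError for 4xx/5xx responses
    except requests.RequestException as exc:
        # Covers connection errors, timeouts, invalid URLs and HTTP errors
        print(f"Failed to retrieve {url}: {exc}")
        return None
    return response.text
```

The returned text could then be passed to `BeautifulSoup(html, 'html.parser')` only when it is not `None`, which avoids later cells failing on an undefined `soup_info`.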

In the next step, the HTML structure of the page is assessed. The bird species are arranged in a list inside an element with the class `panel-inner`.

To find the number of species on the web page whose images we will be using:

In [5]:
# Find every <li> element on the page (this may also pick up list items outside the species list)
species_elements = soup_info.find_all("li")

# Count the number of species
species_count = len(species_elements)

print(f"Number of species: {species_count}")

Number of species: 2393
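
This count is taken over every `<li>` on the page, while the extraction step further down reports 2369 species. One likely explanation is that list items outside the species list, such as navigation entries, are included here. The sketch below illustrates the difference on a small, hypothetical HTML snippet that mirrors the structure described above; the markup is an assumption and not copied from the site.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup: one navigation list plus the species list
sample_html = """
<ul class="nav"><li>Home</li><li>About</li></ul>
<div class="panel-inner">
  <ul class="type">
    <li><a href="/afbid/search/browse/species/290"><h5>Accipiter henstii (4)</h5><span>Henst’s Goshawk</span></a></li>
    <li><a href="/afbid/search/browse/species/285"><h5>Accipiter madagascariensis (5)</h5><span>Madagascar Sparrowhawk</span></a></li>
  </ul>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# Counts every list item, including the navigation entries
all_items = soup.find_all("li")

# Counts only the items that sit directly inside the species list
species_only = soup.select("div.panel-inner ul.type > li")

print(len(all_items), len(species_only))
```

Scoping the selector to the species list gives a count that should match the number of records extracted afterwards.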


In [6]:
# Find the div and ul containing the species list
species_div = soup_info.find('div', class_='panel-inner')  # Find the div with class "panel-inner"
species_list = species_div.find('ul', class_='type')  # Find the ul with class "type"

In [7]:
# Find all species items (li elements)
species_items = species_list.find_all('li')

# Extract the page link, scientific name, and common name for each species
abc_birds = []  # List to store tuples of (img_link, scientific_name, common_name)

for species_item in species_items:
    # Get the species info
    img_link = species_item.find('a')['href']  # Get the link to the species page from the <a> tag
    scientific_name = species_item.find('h5').text.strip()  # Get the scientific name
    common_name = species_item.find('span').text.strip()  # Get the common name
    
    # Store the species and common names in the list
    abc_birds.append((img_link, scientific_name, common_name))

In [8]:
# Print the total number of species extracted
print(f"Total number of species extracted: {len(abc_birds)}")

# Print the first 5 species as a sample
print("Sample of extracted species (Image Link, Scientific Name, Common Name):")
for bird in abc_birds[:5]:  # Limit to the first 5 species
    print(bird)

Total number of species extracted: 2369
Sample of extracted species (Image Link, Scientific Name, Common Name):
('/afbid/search/browse/species/290', 'Accipiter henstii (4)', 'Henst’s Goshawk')
('/afbid/search/browse/species/285', 'Accipiter madagascariensis (5)', 'Madagascar Sparrowhawk')
('/afbid/search/browse/species/286', 'Accipiter nisus (6)', 'Eurasian Sparrowhawk')
('/afbid/search/browse/species/284', 'Accipiter ovampensis (8)', 'Ovambo Sparrowhawk')
('/afbid/search/browse/species/287', 'Accipiter rufiventris (4)', 'Rufous-breasted Sparrowhawk')


In [9]:
species_info_df = pd.DataFrame(abc_birds, columns=['Image_Link', 'Scientific_Name', 'Common_Name'])
species_info_df.head()

| | Image_Link | Scientific_Name | Common_Name |
|---|---|---|---|
| 0 | /afbid/search/browse/species/290 | Accipiter henstii (4) | Henst’s Goshawk |
| 1 | /afbid/search/browse/species/285 | Accipiter madagascariensis (5) | Madagascar Sparrowhawk |
| 2 | /afbid/search/browse/species/286 | Accipiter nisus (6) | Eurasian Sparrowhawk |
| 3 | /afbid/search/browse/species/284 | Accipiter ovampensis (8) | Ovambo Sparrowhawk |
| 4 | /afbid/search/browse/species/287 | Accipiter rufiventris (4) | Rufous-breasted Sparrowhawk |
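
The `Scientific_Name` values end with a number in parentheses, for example `Accipiter henstii (4)`. This appears to be the number of photos available for the species, although that reading is an assumption. A minimal sketch for separating the name from the number, using a hypothetical helper `split_name_and_count`:

```python
import re

# Matches a name followed by a parenthesised integer, e.g. "Accipiter henstii (4)"
NAME_WITH_COUNT = re.compile(r"^(?P<name>.*?)\s*\((?P<count>\d+)\)\s*$")


def split_name_and_count(raw):
    """Return (name, count); count is None when no trailing "(n)" is present."""
    text = raw.strip()
    match = NAME_WITH_COUNT.match(text)
    if match is None:
        return text, None
    return match.group("name"), int(match.group("count"))


print(split_name_and_count("Accipiter henstii (4)"))
```

Applied to the DataFrame column, this would yield a clean scientific name plus a separate numeric column that could be compared against the number of image links actually scraped.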


In [10]:
species_images = []

for _, row in species_info_df.iterrows():
    # Construct the full URL for the species page
    species_img_url = f"{abc_base_url}{row['Image_Link']}"
    common_name = row['Common_Name']
    
    # Fetch the species page
    response = requests.get(species_img_url, headers=headers, timeout=15)
    if response.status_code != 200:
        print(f"Failed to retrieve species page for {common_name}, status code: {response.status_code}")
        continue  # Skip to the next species
    
    # Parse the species page
    soup = BeautifulSoup(response.content, 'html.parser')
    
    # Find the image list
    image_list = soup.find('ul', class_='row image-list')
    if not image_list:
        continue  # Skip if no images are found
    
    # Extract image URLs
    images = image_list.find_all('img')
    for img in images:
        img_src = img.attrs['src']  # Extract the `src` attribute
        
        # Append the data to the results list
        species_images.append({
            "Common_Name": common_name,
            "Image": img_src
        })
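
The loop above issues one request per species, so a few thousand requests go to the same host and occasional transient failures are plausible. As a hedged sketch, a session could be configured to retry such failures with a backoff; the helper name `make_retrying_session`, the retry count, and the list of status codes are illustrative assumptions.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_retrying_session(headers=None, total_retries=3, backoff_factor=1.0):
    """Build a session that retries transient failures (hypothetical helper)."""
    retry = Retry(
        total=total_retries,
        backoff_factor=backoff_factor,  # Wait progressively longer between attempts
        status_forcelist=(429, 500, 502, 503, 504),  # Retry on these HTTP statuses
    )
    adapter = HTTPAdapter(max_retries=retry)

    session = requests.Session()
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    if headers:
        session.headers.update(headers)
    return session


session = make_retrying_session(headers={"User-Agent": "example-agent"})
```

Requests made with `session.get(url, timeout=15)` would then be retried automatically before the status code check is reached.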

In [11]:
# Convert the results to a DataFrame
abc_birds_df = pd.DataFrame(species_images)

# Display the first few rows
abc_birds_df.head()

| | Common_Name | Image |
|---|---|---|
| 0 | Henst’s Goshawk | /afbid/public/imgdata/photos/626/6261419077214... |
| 1 | Henst’s Goshawk | /afbid/public/imgdata/photos/451/4511314128952... |
| 2 | Henst’s Goshawk | /afbid/public/imgdata/photos/260/2601178351348... |
| 3 | Henst’s Goshawk | /afbid/public/imgdata/photos/23/231122943023.jpg |
| 4 | Madagascar Sparrowhawk | /afbid/public/imgdata/photos/1937/193716001472... |


In [12]:
abc_birds_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 26011 entries, 0 to 26010
Data columns (total 2 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   Common_Name  26011 non-null  object
 1   Image        26011 non-null  object
dtypes: object(2)
memory usage: 406.5+ KB
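
Before downloading, it can be useful to check how the image links are distributed across species and whether any link appears more than once. The sketch below runs on a small stand-in DataFrame with made-up image paths; on the real data the same calls would be applied to `abc_birds_df`.

```python
import pandas as pd

# Stand-in data with hypothetical image paths (not the real scraped values)
sample_df = pd.DataFrame({
    "Common_Name": ["Henst’s Goshawk"] * 4 + ["Madagascar Sparrowhawk"],
    "Image": [
        "/photos/a.jpg",
        "/photos/b.jpg",
        "/photos/c.jpg",
        "/photos/d.jpg",
        "/photos/e.jpg",
    ],
})

# Number of image links per species, most photographed first
images_per_species = sample_df["Common_Name"].value_counts()

# Number of rows whose image link repeats an earlier row
duplicate_links = sample_df.duplicated(subset="Image").sum()

print(images_per_species)
print(f"Duplicate image links: {duplicate_links}")
```

Species with very few images may need extra data from the second source, since heavily imbalanced classes tend to be harder for a detection model to learn.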


In [13]:
from tqdm import tqdm

# Function to clean species names (to be used as folder names)
def clean_species_name(name):
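    # Also strip the typographic apostrophe (’) that appears in names such as "Henst’s Goshawk"
    return name.replace(" ", "_").replace("'", "").replace("’", "").replace("-", "_")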

# Function to download images in batches
def download_images_in_batches(df, start_index=0, end_index=None, batch_size=20, output_dir="ABC_Bird_Images"):
    # Create base directory for storing images
    os.makedirs(output_dir, exist_ok=True)

    # Set the end_index to the last index of the dataframe
    if end_index is None:
        end_index = len(df)

    # Loop over the specified range of species in batches
    for i in tqdm(range(start_index, end_index, batch_size), desc="Downloading Batches", unit="batch"):
        # Process each batch
        batch_df = df.iloc[i:i+batch_size]
        
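        for idx, row in batch_df.iterrows():
            species_name = clean_species_name(row["Common_Name"])  # Clean the species name for folder
            image_url = row["Image"]  # URL or site-relative path to the image
            # The scraped `src` values are site-relative, so prepend the base URL
            if image_url.startswith("/"):
                image_url = f"{abc_base_url}{image_url}"

            # Create species folder if it does not exist
            species_folder = os.path.join(output_dir, species_name)
            os.makedirs(species_folder, exist_ok=True)

            # Download the image, sending the same browser-like headers as before
            try:
                response = requests.get(image_url, headers=headers, stream=True, timeout=10)
            except requests.RequestException as exc:
                print(f"Failed to download {image_url}: {exc}")
                continue
            if response.status_code == 200:
                image_filename = f"{species_name}_{idx}.jpg"  # Row index keeps filenames unique
                image_path = os.path.join(species_folder, image_filename)

                response.raw.decode_content = True  # Decompress gzip/deflate bodies, if any
                with open(image_path, "wb") as file:
                    shutil.copyfileobj(response.raw, file)
                print(f"Downloaded: {image_filename}")
            else:
                print(f"Failed to download {image_url}: {response.status_code}")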
            
        # Add a slight delay to avoid overwhelming the server
        time.sleep(1)

    print("Image download complete!")
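
Folder names are derived from common names, which can contain typographic apostrophes, hyphens, or accented letters. The sketch below shows a more general way to produce filesystem-safe names; `safe_folder_name` is a hypothetical alternative to `clean_species_name` and is not part of the original notebook.

```python
import re
import unicodedata


def safe_folder_name(name):
    """Turn a species common name into an ASCII, underscore-separated folder name."""
    # Drop straight and typographic apostrophes so "Henst’s" becomes "Hensts"
    without_apostrophes = name.replace("'", "").replace("’", "")
    # Decompose accented letters, then discard anything that is not ASCII
    ascii_name = (
        unicodedata.normalize("NFKD", without_apostrophes)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Collapse every run of non-alphanumeric characters into one underscore
    return re.sub(r"[^A-Za-z0-9]+", "_", ascii_name).strip("_")


print(safe_folder_name("Henst’s Goshawk"))
print(safe_folder_name("Rufous-breasted Sparrowhawk"))
```

Keeping folder names to plain ASCII avoids surprises when the dataset is later moved between operating systems or loaded by training tools that infer class labels from directory names.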