# **Project Name**    - Bird Species Observation Analysis



##### **Project Type**    - EDA (Python), Data Visualisation (Power BI)
##### **Contribution**    - Individual
##### **Member Name -**   - Ansh Singh



# **Project Summary -**

The Bird Species Observation Analysis project focuses on studying avian diversity and behavior across two distinct ecosystems — forests and grasslands — using real-world field data. The primary goal is to derive ecological insights into how various environmental, spatial, and temporal factors affect bird species distribution, observation frequency, and conservation status. This project integrates data science techniques such as data cleaning, exploratory data analysis (EDA), and visualization to uncover patterns that are critical for biodiversity conservation and sustainable land management.


---


The dataset comprises multi-sheet Excel files representing different administrative units within forest and grassland habitats. Each sheet contains observational records detailing bird species, location identifiers, time stamps, environmental conditions, and observer information. Key fields include Start_Time, End_Time, Scientific_Name, Sex, Distance, and conservation codes such as PIF_Watchlist_Status and NPSTaxonCode. The data preprocessing phase involved merging multiple sheets, standardizing formats, handling missing values using logical mappings (e.g., inferring Site_Name from Plot_Name), converting time fields into proper data types, and aligning taxonomy codes.


---


Through temporal analysis, the project examined bird activity trends by season, time of day, and year. Spatial analysis identified biodiversity hotspots by grouping observations by Location_Type and Plot_Name. The species analysis section focused on diversity metrics, identification methods (ID_Method), and sex ratio analysis, while environmental condition analysis explored correlations between bird activity and weather variables like temperature, humidity, wind, and disturbance levels. Behavior-based insights included flyover frequencies and observer biases.


---


Visualizations were implemented using Power BI and optionally Streamlit with Plotly, enabling interactive dashboards with filters by species, habitat type, and observation date. These visualizations made it easier to identify conservation-priority areas and threatened species listed on the PIF Watchlist.


---


From a business and ecological standpoint, the project serves use cases such as wildlife conservation, eco-tourism development, sustainable agriculture planning, and biodiversity policy support. The findings offer actionable insights to stakeholders ranging from environmental agencies to land-use planners and conservationists.

# **GitHub Link -**

https://github.com/Ansh3105/Bird-Species-Observation-Analysis


# **Problem Statement**


The project aims to analyze the distribution, diversity, and behavioral patterns of bird species across two distinct ecosystems: forests and grasslands. Bird populations are highly sensitive to environmental changes, and understanding their spatial and temporal presence can offer critical insights into ecosystem health and biodiversity trends.

The observational dataset includes multiple parameters such as species name, habitat type, geographic location, observation time, weather conditions, and identification methods. However, due to the dataset's complexity—spread across multiple sheets, containing missing values, inconsistent formats, and varied observation methods—comprehensive analysis requires significant data cleaning and preprocessing.

The core objective is to extract meaningful ecological and conservation-related insights by identifying:

Patterns in bird activity based on time of day, season, and location

The relationship between environmental conditions (e.g., temperature, humidity, sky) and bird presence

The effect of habitat type on bird diversity and frequency

Conservation priorities by examining at-risk species from watchlists

Observer trends and potential reporting biases

#### **Define Your Business Objective?**

The objective of this project is to analyze bird species observations across forest and grassland ecosystems to identify patterns in biodiversity, habitat preference, and seasonal activity. By examining environmental factors such as temperature, humidity, and sky conditions alongside spatial and temporal distribution data, the project aims to generate actionable insights for wildlife conservation, land management, and policy-making. The findings will help in locating biodiversity hotspots, monitoring at-risk species, and supporting sustainable eco-tourism and agricultural practices, ultimately contributing to informed ecological conservation planning.

# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px
import seaborn as sns

### Dataset Loading

In [None]:
import pandas as pd

# Load both Excel files
forest_file = "Bird_Monitoring_Data_FOREST.XLSX"
grassland_file = "Bird_Monitoring_Data_GRASSLAND.XLSX"

# Read all sheets from each file
forest_data = pd.read_excel(forest_file, sheet_name=None)
grassland_data = pd.read_excel(grassland_file, sheet_name=None)

# Combine forest sheets
forest_df = pd.concat(
    [df.assign(Habitat="Forest", Sheet=sheet)
     for sheet, df in forest_data.items() if not df.dropna(how='all').empty],
    ignore_index=True
)

# Combine grassland sheets
grassland_df = pd.concat(
    [df.assign(Habitat="Grassland", Sheet=sheet)
     for sheet, df in grassland_data.items() if not df.dropna(how='all').empty],
    ignore_index=True
)


# Combine forest and grassland into a single DataFrame
df = pd.concat([forest_df, grassland_df], ignore_index=True)

# Keep an untouched copy of the combined data (used later for the missing-value plot)
df_sample = df.copy()



### Dataset First View

In [None]:
# Dataset First Look
df

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
rows, columns = df.shape
print(f"Total Rows: {rows}")
print(f"Total Columns: {columns}")



### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
print("Total Duplicates:", df.duplicated().sum())


#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
df.isnull().sum()


### Visualisation Of Null Values

In [None]:
# Visualizing the missing values
df_sample.isnull().sum().sort_values(ascending=False).plot(kind='bar',figsize=(12,5))

plt.title("Number of Missing Values per Column")
plt.xlabel("Column Names")
plt.ylabel("Missing Count")
plt.tight_layout()
plt.show()


### What did you know about your dataset?

Dataset Overview
Theme → Bird species observation data collected in forest and grassland ecosystems.

Purpose → To study biodiversity patterns, species behavior, environmental influences, and conservation priorities.

Structure → Multiple sheets in Excel, each sheet corresponding to an Administrative Unit (e.g., ANTI, CATO, CHOH, etc.).

Combined into one dataframe (df) in your Colab environment for analysis.



---



Key Columns
Location & Habitat Information

Admin_Unit_Code — Code for the administrative area (e.g., "ANTI").

Sub_Unit_Code — Further classification inside the admin unit.

Site_Name — Name of the specific observation site.

Plot_Name — Unique plot identifier.

Location_Type — Habitat type (Forest / Grassland).

Time & Visit Information

Year, Date — Observation year & date.

Start_Time, End_Time — Observation session times.

Visit — Visit number for that plot.

Bird Identification & Observation

Scientific_Name — Scientific name of the species.

Common_Name — Common bird name.

Sex — Male, Female, Undetermined.

ID_Method — Method used (Singing, Calling, Visualization, etc.).

Interval_Length — Observation time interval (e.g., 0–2.5 min).

Flyover_Observed — Whether bird was seen flying overhead.

Environmental Conditions

Temperature, Humidity — Weather conditions during observation.

Sky, Wind — Sky and wind conditions.

Disturbance — Any disturbance affecting observation.

Conservation Indicators

PIF_Watchlist_Status — National conservation priority.

Regional_Stewardship_Status — Regional conservation priority.

AOU_Code — Standardized bird species code.

Observation Counts

Initial_Three_Min_Cnt — Birds observed in the first 3 minutes.



---


Possible Analyses
Biodiversity Hotspots → Which plots/habitats have the most unique species.

Species Behavior → Activity patterns, sex ratios, flyover trends.

Environmental Influence → How weather, disturbance, and location affect sightings.

Conservation Focus → Which species are at risk and where they are found most.

Observer Bias → Whether some observers dominate data collection.
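As a rough illustration of the observer-bias check listed above, the sketch below computes each observer's share of all records. The observer labels and counts are made up for the example; only the `Observer` column name comes from the dataset.

```python
import pandas as pd

# Hypothetical observer labels; the real dataset stores them in the Observer column.
observers = pd.Series(
    ["Observer 1"] * 5 + ["Observer 2"] * 3 + ["Observer 3"] * 2,
    name="Observer",
)

# Share of all records contributed by each observer, largest first
observer_share = observers.value_counts(normalize=True)
top_observer = observer_share.index[0]
top_share = observer_share.iloc[0]

print(observer_share)
print(f"{top_observer} contributed {top_share:.0%} of the records")
```

A large share for one observer would suggest that differences between plots or seasons may partly reflect who did the survey.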




## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns

In [None]:
# Dataset Describe
# Generating Descriptive Statistics For Numerical Data In Dataset
df.describe()

### Variables Description

Admin_Unit_Code: The code for the administrative unit (e.g., "ANTI") where the observation was conducted.

Sub_Unit_Code: The sub-unit within the administrative unit for further classification.

Site_Name: The name of the specific observation site within the unit.

Plot_Name: A unique identifier for the specific plot where observations were recorded.

Location_Type: The habitat type of the observation area (e.g., "Forest").

Year: The year in which the observation took place.

Date: The exact date of the observation.

Start_Time: The start time of the observation session.

End_Time: The end time of the observation session.

Observer: The individual who conducted the observation.

Visit: The count of visits made to the same observation site or plot.

Interval_Length: The duration of the observation interval (e.g., "0-2.5 min").

ID_Method: The method used to identify the species (e.g., "Singing," "Calling," "Visualization").

Distance: The distance of the observed species from the observer (e.g., "<= 50 Meters").

Flyover_Observed: Indicates whether the bird was observed flying overhead (TRUE/FALSE).

Sex: The sex of the observed bird (e.g., Male, Female, Undetermined).

Common_Name: The common name of the observed bird species (e.g., "Eastern Towhee").

Scientific_Name: The scientific name of the observed bird species (e.g., Pipilo erythrophthalmus).

AcceptedTSN: The Taxonomic Serial Number for the observed species.

NPSTaxonCode: A unique code assigned to the taxon of the species.

AOU_Code: The American Ornithologists' Union code for the species.

PIF_Watchlist_Status: Indicates whether the species is on the Partners in Flight Watchlist (e.g., "TRUE" for at-risk species).

Regional_Stewardship_Status: Denotes the conservation priority within the region (TRUE/FALSE).

Temperature: The temperature recorded at the time of observation (in degrees).

Humidity: The humidity percentage recorded at the time of observation.

Sky: The sky condition during the observation (e.g., "Cloudy/Overcast").

Wind: The wind condition (e.g., "Calm (< 1 mph) smoke rises vertically").

Disturbance: Notes any disturbances that could affect the observation (e.g., "No effect on count").

Initial_Three_Min_Cnt: The count of the species observed in the first three minutes of the session.



### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
df.nunique()


## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Replace missing Sub_Unit_Code values with "UNKNOWN"
df['Sub_Unit_Code'] = df['Sub_Unit_Code'].fillna("UNKNOWN")

# Fill missing ID_Method, Sex and Distance values with each column's mode
for col in ['ID_Method', 'Sex', 'Distance']:
    df[col] = df[col].fillna(df[col].mode()[0])

# Fill missing NPSTaxonCode and AcceptedTSN values from other rows with the same
# Scientific_Name. This runs before the string conversion so that missing values
# are still detectable, and only rows with a missing code are changed.
for col in ['NPSTaxonCode', 'AcceptedTSN']:
    known = df.dropna(subset=[col, 'Scientific_Name']).drop_duplicates(subset=['Scientific_Name'])
    code_map = dict(zip(known['Scientific_Name'], known[col]))
    df[col] = df[col].fillna(df['Scientific_Name'].map(code_map))

    # Convert the codes from float to string (object), keeping missing values as NaN
    codes = df[col].astype(object)
    df[col] = codes.where(codes.isna(), codes.astype(str))

# Fill missing Site_Name values, first from rows with the same Scientific_Name,
# then from rows with the same Plot_Name; existing values are left unchanged
for key in ['Scientific_Name', 'Plot_Name']:
    known = df.dropna(subset=['Site_Name', key]).drop_duplicates(subset=[key])
    site_map = dict(zip(known[key], known['Site_Name']))
    df['Site_Name'] = df['Site_Name'].fillna(df[key].map(site_map))

### REMOVING DUPLICATES

In [None]:
df = df.drop_duplicates()

### Handling Outliers for temperature and humidity column


In [None]:
cols = ['Temperature', 'Humidity']
for col in cols:
    plt.figure(figsize=(6, 3))
    sns.boxplot(x=df[col])
    plt.title(f'Boxplot for {col}')
    plt.show()

# Calculate Q1, Q3, and IQR
Q1 = df[cols].quantile(0.25)
Q3 = df[cols].quantile(0.75)
IQR = Q3 - Q1

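# Keep only rows where neither column falls outside the 1.5 * IQR fences;
# .copy() avoids SettingWithCopyWarning when new columns are added later
df = df[~((df[cols] < (Q1 - 1.5 * IQR)) | (df[cols] > (Q3 + 1.5 * IQR))).any(axis=1)].copy()

print(f"After removing outliers: {df.shape}")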

### Dropping Columns


In [None]:
df = df.drop(columns='Sheet')

### What all manipulations have you done and insights you found?

Missing values in the Sub_Unit_Code column were replaced with the string
"UNKNOWN" to ensure consistent categorical representation for downstream analysis.


---


The ID_Method column's missing values were filled using the mode of the column, ensuring that the most common identification method was used for imputation.


---


The Sex column's missing values were also filled using the mode, helping maintain logical consistency in gender-based analysis of observed species.


---


Missing values in the Distance column were filled using the most frequent distance range (mode), standardizing the observation distance metric.


---


The NPSTaxonCode and AcceptedTSN columns, which originally contained float-type values, were converted to string (object) data types to preserve their identity as categorical codes and avoid unintended numerical interpretations.
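One point to watch with this conversion: in many pandas versions a plain `astype(str)` turns missing values into the literal text "nan", which later null checks no longer detect. The sketch below shows one possible way to convert only the non-missing entries. The code values are hypothetical and the approach is an illustration, not necessarily what the notebook ran.

```python
import numpy as np
import pandas as pd

# Hypothetical taxon codes stored as floats, with one missing entry
codes = pd.Series([179021.0, np.nan, 178620.0], name="AcceptedTSN")

# Convert to object first, then replace only the non-missing entries
# with their string form; missing entries stay as NaN.
as_object = codes.astype(object)
safe = as_object.where(as_object.isna(), as_object.astype(str))

print(safe.tolist())
print("Missing values still detectable:", safe.isna().sum())
```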


---


Missing values in the NPSTaxonCode column were filled by creating a mapping between Scientific_Name and NPSTaxonCode, based on rows where the NPSTaxonCode was already present. This mapping was then used to impute missing codes wherever a matching scientific name existed.


---


Similarly, missing values in the AcceptedTSN column were filled using a mapping from Scientific_Name to AcceptedTSN, ensuring taxonomic consistency based on species identification.


---


The Site_Name column was first filled using a mapping from Scientific_Name, leveraging known relationships between species and their recorded sites to infer missing values.


---


A second mapping for the Site_Name column was created using Plot_Name, where available Site_Name values were mapped to their corresponding plots. This mapping was applied to fill any remaining missing entries in Site_Name.


---

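Outliers in the Temperature and Humidity columns were visualized with box plots and then removed using the 1.5 × IQR rule: rows with a value below Q1 − 1.5·IQR or above Q3 + 1.5·IQR in either column were dropped.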
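Before dropping rows, it can help to see how many values each column would flag. The following is a minimal sketch of that check on made-up readings, using the same quartile-based fences; the numbers are purely illustrative.

```python
import pandas as pd

# Hypothetical readings; 60.0 is an implausible temperature for this toy sample
toy = pd.DataFrame({
    "Temperature": [18.0, 19.5, 20.0, 21.0, 22.5, 60.0],
    "Humidity": [70.0, 72.0, 68.0, 75.0, 71.0, 73.0],
})
cols = ["Temperature", "Humidity"]

q1 = toy[cols].quantile(0.25)
q3 = toy[cols].quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Boolean frame marking values outside the fences
outside = (toy[cols] < lower) | (toy[cols] > upper)
flagged_per_column = outside.sum()

# Drop a row if any of its columns is flagged
cleaned = toy[~outside.any(axis=1)].copy()

print(flagged_per_column)
print(f"Rows before: {len(toy)}, after: {len(cleaned)}")
```

If one column flags far more rows than expected, it may be worth inspecting those records before removing them.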


---
Finally, the Sheet column, which was added during the merging of multiple Excel sheets, was dropped from the DataFrame as it was no longer needed after consolidation.



## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

### TEMPORAL ANALYSIS



#### Chart - 1 SEASONAL TRENDS

In [None]:
# Convert Date column to datetime
df['Date'] = pd.to_datetime(df['Date'], errors='coerce')

# Extract Year and Month
df['Year'] = df['Date'].dt.year
df['Month'] = df['Date'].dt.month

# Define seasons (Northern Hemisphere)
def get_season(month):
    if pd.isna(month):
        return None  # dates that failed to parse (NaT) must not be counted as Autumn
    if month in [12, 1, 2]:
        return 'Winter'
    elif month in [3, 4, 5]:
        return 'Spring'
    elif month in [6, 7, 8]:
        return 'Summer'
    else:
        return 'Autumn'

df['Season'] = df['Month'].apply(get_season)

# Count sightings by season
seasonal_counts = df.groupby(['Year', 'Season']).size().reset_index(name='Sightings')

plt.figure(figsize=(8,5))
sns.countplot(x='Season', data=df, order=['Winter', 'Spring', 'Summer', 'Autumn'])
plt.title('Bird Sightings by Season')
plt.show()

##### 1. Why did you pick the specific chart?

Shows how bird sightings vary across Winter, Spring, Summer, Autumn.

Helps identify peak activity seasons and low-activity periods.

##### 2. What is/are the insight(s) found from the chart?

If Spring and Summer have more sightings → could mean migratory species are arriving.

If Winter has lower sightings → might indicate birds move to warmer locations.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Clear pattern recognition for non-technical audiences.

Supports resource planning for bird monitoring teams.

#### Chart - 2 Sightings by Day of the Week

In [None]:
df['DayOfWeek'] = df['Date'].dt.day_name()


plt.figure(figsize=(8,5))
sns.countplot(x='DayOfWeek', data=df, order=['Monday','Tuesday','Wednesday','Thursday','Friday','Saturday','Sunday'], color='skyblue')
plt.title('Bird Sightings by Day of Week')
plt.show()


##### 1. Why did you pick the specific chart?

It helps identify patterns or trends in bird observations based on the day of the week.



##### 2. What is/are the insight(s) found from the chart?



Useful to see if certain days consistently record higher or lower sighting counts, which could be due to weather, human activity, or survey scheduling.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Clear visual: easy to interpret even for non-technical audiences.

Identifies survey bias — shows if sampling is uneven across the week.

Actionable — if certain days are low in data, future monitoring can be planned to balance it.
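To separate survey scheduling from bird activity, one option is to divide the sightings on each weekday by the number of distinct survey dates that fell on that weekday. The sketch below does this on a few made-up dates; only the `Date` and `DayOfWeek` column names follow the notebook.

```python
import pandas as pd

# Hypothetical survey dates, one row per sighting
toy = pd.DataFrame({
    "Date": pd.to_datetime([
        "2018-05-21", "2018-05-21", "2018-05-21",  # a Monday
        "2018-05-28",                              # the following Monday
        "2018-05-22", "2018-05-22",                # a Tuesday
    ]),
})
toy["DayOfWeek"] = toy["Date"].dt.day_name()

# Sightings and number of distinct survey dates per weekday
summary = toy.groupby("DayOfWeek").agg(
    Sightings=("Date", "count"),
    Survey_Days=("Date", "nunique"),
)
summary["Sightings_per_Survey_Day"] = summary["Sightings"] / summary["Survey_Days"]

print(summary)
```

In this toy case Monday has twice as many sightings as Tuesday but also twice as many survey days, so the per-day rate is the same.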

#### Chart - 3 Total Bird Count by Location type

In [None]:
# Group data by Location_Type and count observations
location_insights = df.groupby("Location_Type").size().reset_index(name="Observation_Count")

# Sort by highest observations (to find biodiversity hotspots)
location_insights = location_insights.sort_values(by="Observation_Count", ascending=False)


plt.figure(figsize=(8,5))
plt.bar(location_insights["Location_Type"], location_insights["Observation_Count"], color='green')
plt.xticks(rotation=45)
plt.xlabel("Location Type")
plt.ylabel("Number of Observations")
plt.title("Biodiversity Hotspots by Location Type")
plt.show()



##### 1. Why did you pick the specific chart?

Clear habitat comparison: grouping by Location_Type shows which habitat (Forest or Grassland) accounts for more observations. Note that this chart counts observation records, not distinct species.

##### 2. What is/are the insight(s) found from the chart?

Observation hotspots: Identify the habitat with the highest number of recorded observations.

Habitat specialization: As a follow-up, spot species that appear in only one habitat (e.g., certain birds recorded only in grasslands).
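Habitat specialization can be checked directly by comparing the sets of species recorded in each habitat. The sketch below uses a handful of illustrative records; the column names match the notebook, while the rows themselves are hypothetical.

```python
import pandas as pd

# Hypothetical records; species names are used only for illustration
toy = pd.DataFrame({
    "Location_Type": ["Forest", "Forest", "Grassland", "Grassland", "Forest"],
    "Scientific_Name": [
        "Pipilo erythrophthalmus",
        "Hylocichla mustelina",
        "Pipilo erythrophthalmus",
        "Sturnella magna",
        "Hylocichla mustelina",
    ],
})

# One set of species per habitat
species_by_habitat = toy.groupby("Location_Type")["Scientific_Name"].apply(set)
forest = species_by_habitat["Forest"]
grassland = species_by_habitat["Grassland"]

shared = forest & grassland
forest_only = forest - grassland
grassland_only = grassland - forest

print("Shared:", shared)
print("Forest only:", forest_only)
print("Grassland only:", grassland_only)
```

Because shared species are counted once in each habitat, per-habitat unique-species counts add up to more than the overall number of species.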

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Simplifies large datasets — turns hundreds/thousands of observations into a clear picture.

Action-oriented — you can prioritize habitats for conservation funding or field visits.

Flexible — works with both species count and individual bird sightings.

### SPATIAL ANALYSIS

#### Chart - 4 % of unique bird count by location type

In [None]:
# Group data by Location_Type and count unique species
location_species = df.groupby("Location_Type")["Scientific_Name"].nunique().reset_index(name="Unique_Species_Count")

# Sort by highest unique species
location_species = location_species.sort_values(by="Unique_Species_Count", ascending=False)

# Plot as pie chart
plt.figure(figsize=(7, 7))
plt.pie(
    location_species["Unique_Species_Count"],
    labels=location_species["Location_Type"],
    autopct='%1.1f%%',
    startangle=90
)
plt.title("Biodiversity Hotspots by Location Type (Unique Species)")
plt.axis('equal')  # Equal aspect ratio ensures the pie is a circle
plt.show()


##### 1. Why did you pick the specific chart?

A pie chart is effective here because we are comparing proportions of unique species among different location types. It visually communicates how biodiversity is distributed without requiring the viewer to read exact numbers.

##### 2. What is/are the insight(s) found from the chart?

The dominance of one or two location types in the chart can indicate priority conservation zones.

##### 3. Will the gained insights help creating a positive business impact?

Easy to interpret at a glance, even for non-technical audiences.

Highlights relative importance of each location type for biodiversity.

#### Chart - 5 Top 10 plot names by unique species count

In [None]:
# Group and get unique species count per plot
plot_species_count = (
    df.groupby('Plot_Name')['Scientific_Name']
      .nunique()
      .reset_index(name='Unique_Species_Count')
)

# Get top 10 plots
top_n = 10
plot_species_count = plot_species_count.sort_values(
    by='Unique_Species_Count', ascending=False
).head(top_n)

# Plot as horizontal bar chart
plt.figure(figsize=(10, 6))
sns.barplot(
    data=plot_species_count,
    y='Plot_Name',
    x='Unique_Species_Count',
    palette='viridis'
)
plt.title(f'Top {top_n} Plots by Unique Bird Species', fontsize=16)
plt.xlabel('Unique Species Count')
plt.ylabel('Plot Name')
plt.tight_layout()
plt.show()



##### 1. Why did you pick the specific chart?

Simplifies large data → Instead of showing all plots (which creates clutter), it focuses only on the top 10 with the most species.

Horizontal orientation → Makes long plot names readable without overlapping text.

##### 2. What is/are the insight(s) found from the chart?

Identify biodiversity hotspots → You can instantly see which plots have the highest bird diversity.

Guide conservation priorities → Plots at the top may be priority zones for habitat protection.

Ecotourism & research focus → Areas with higher diversity could be targeted for bird-watching activities or further ecological study.


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Focus on what matters → Only shows the most significant plots, avoiding noise.

Readability → Names are clear, counts are easy to interpret.

Comparability → Side-by-side bar lengths make differences obvious at a glance.

### SPECIES ANALYSIS

#### Chart - 6 Activity Pattern

In [None]:
# Count observations by Interval_Length and ID_Method
activity_patterns = (
    df.groupby(['Interval_Length', 'ID_Method'])
      .size()
      .reset_index(name='Observation_Count')
)

# Sort for plotting
activity_patterns = activity_patterns.sort_values('Observation_Count', ascending=False)

# Plot heatmap for visual clarity
activity_pivot = activity_patterns.pivot(index='ID_Method', columns='Interval_Length', values='Observation_Count')

plt.figure(figsize=(10, 6))
sns.heatmap(activity_pivot, annot=True, fmt='.0f', cmap='YlGnBu')
plt.title('Bird Activity Patterns by Interval Length and ID Method', fontsize=16)
plt.xlabel('Interval Length')
plt.ylabel('ID Method')
plt.show()


##### 1. Why did you pick the specific chart?

Makes it easy to see both the ID method and observation interval in one view.

Color intensity quickly tells you which combinations are most common.



##### 2. What is/are the insight(s) found from the chart?

Most common ID method — e.g., if “Singing” has the darkest color, birds are most often identified by sound.

Best observation interval — e.g., if 0–2.5 minutes yields more sightings, shorter intervals may be more effective.
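Raw counts in the heatmap are dominated by the most common ID method, so it can be useful to look at row shares as well: for each method, what fraction of its observations falls in each interval. A minimal sketch with made-up records follows; interval labels other than "0-2.5 min" are assumptions.

```python
import pandas as pd

# Hypothetical observations
toy = pd.DataFrame({
    "ID_Method": ["Singing", "Singing", "Singing", "Calling", "Calling", "Visualization"],
    "Interval_Length": ["0-2.5 min", "0-2.5 min", "2.5 - 5 min",
                        "0-2.5 min", "5 - 7.5 min", "0-2.5 min"],
})

# Row-normalised cross-tabulation: each row sums to 1
interval_shares = pd.crosstab(
    toy["ID_Method"], toy["Interval_Length"], normalize="index"
)

print(interval_shares.round(2))
```

The resulting table can be passed to `sns.heatmap` in the same way as the count pivot.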

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Dual-Dimension Insight

Unlike a plain bar chart, this shows two factors together (time interval and identification method), which helps spot patterns that wouldn’t be visible in a one-dimensional view.

Quick Visual Impact

The heatmap color intensity lets you see the “hot spots” instantly without reading numbers — ideal for presentations or quick decision-making.

#### Chart - 7 SEX Ratio

In [None]:
# Filter only Male and Female
sex_df = df[df['Sex'].isin(['Male', 'Female'])]

# Count observations
sex_counts = (
    sex_df.groupby(['Scientific_Name', 'Sex'])
          .size()
          .reset_index(name='Count')
)

# Get top 10 species by total count
top_species = (
    sex_counts.groupby('Scientific_Name')['Count']
              .sum()
              .nlargest(10)
              .index
)
sex_counts_top = sex_counts[sex_counts['Scientific_Name'].isin(top_species)]

# Plot grouped (side-by-side) bar chart
plt.figure(figsize=(10, 6))
sns.barplot(
    data=sex_counts_top,
    x='Scientific_Name',
    y='Count',
    hue='Sex',
    palette='coolwarm'
)
plt.xticks(rotation=45, ha='right')
plt.title('Male vs Female Counts (Top 10 Species)', fontsize=16)
plt.xlabel('Species (Scientific Name)')
plt.ylabel('Count')
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

Clear comparison → Shows male vs. female counts side-by-side for each species.

No math needed → Viewers instantly see which sex dominates for each species without calculating ratios.

Reduced clutter → Only the top 10 most observed species are shown, making it easier to interpret.

##### 2. What is/are the insight(s) found from the chart?

Sex dominance patterns

If males dominate in most species, it may be due to behavioral factors (e.g., males sing more).

Species-specific trends

Some species might have balanced counts, others heavily skewed.
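To put a number on how skewed a species is, the male share of sexed observations can be computed per species. The sketch below is illustrative only: the species labels and counts are hypothetical.

```python
import pandas as pd

# Hypothetical sexed observations
toy = pd.DataFrame({
    "Scientific_Name": ["Species A"] * 3 + ["Species B"] * 4,
    "Sex": ["Male", "Male", "Female",
            "Male", "Female", "Female", "Undetermined"],
})

# Keep only records with a determined sex, as in the chart above
sexed = toy[toy["Sex"].isin(["Male", "Female"])]

# Species x Sex count table, then the male share per species
sex_table = sexed.groupby(["Scientific_Name", "Sex"]).size().unstack(fill_value=0)
sex_table["Male_Share"] = sex_table["Male"] / (sex_table["Male"] + sex_table["Female"])

print(sex_table)
```

A share near 0.5 indicates balanced counts, while values near 0 or 1 indicate a strong skew.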

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Easy to read — Works for all audiences, even non-technical ones.

Compact — Focuses on the most relevant species, avoiding visual overload.

Visually intuitive — Colors clearly separate male vs. female observations.

### ENVIRONMENTAL ANALYSIS

#### Chart - 8  Temperature vs. Number of Birds

In [None]:
weather_df = df[['Temperature', 'Humidity', 'Sky', 'Wind', 'Scientific_Name', 'Distance']].copy()

# Count observations per weather condition
obs_weather = (
    df.groupby(['Sky', 'Wind'])
      .size()
      .reset_index(name='Observation_Count')
)

# 1. Temperature vs. Number of Birds
temp_counts = df.groupby('Temperature')['Scientific_Name'].count().reset_index(name='Observation_Count')

plt.figure(figsize=(10, 5))
sns.scatterplot(data=temp_counts, x='Temperature', y='Observation_Count')
plt.title('Temperature vs Bird Observations')
plt.xlabel('Temperature (°C)')
plt.ylabel('Number of Observations')
plt.tight_layout()
plt.show()

#### Chart - 9 Humidity vs. Number of Birds

In [None]:

# 2. Humidity vs. Number of Birds
humidity_counts = df.groupby('Humidity')['Scientific_Name'].count().reset_index(name='Observation_Count')

plt.figure(figsize=(10, 5))
sns.scatterplot(data=humidity_counts, x='Humidity', y='Observation_Count')
plt.title('Humidity vs Bird Observations')
plt.xlabel('Humidity (%)')
plt.ylabel('Number of Observations')
plt.tight_layout()
plt.show()


#### Chart - 10 Sky Condition

In [None]:
# 3. Sky Condition
plt.figure(figsize=(8, 5))
sns.barplot(data=obs_weather.groupby('Sky')['Observation_Count'].sum().reset_index(),
            x='Sky', y='Observation_Count', palette='viridis')
plt.title('Bird Observations by Sky Condition')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

#### Chart - 11 Wind Condition

In [None]:

# 4. Wind Condition
plt.figure(figsize=(8, 5))
sns.barplot(data=obs_weather.groupby('Wind')['Observation_Count'].sum().reset_index(),
            x='Wind', y='Observation_Count', palette='magma')
plt.title('Bird Observations by Wind Condition')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

Splits analysis into four separate plots, one for each weather factor → makes interpretation easier.

Uses scatterplots for numeric variables (Temperature, Humidity) to show possible trends.

Uses bar charts for categorical variables (Sky, Wind) to compare observation counts.



##### 2. What is/are the insight(s) found from the chart?

Temperature effect → See if birds are more active in mild, hot, or cold conditions.

Humidity effect → Identify if humidity influences bird activity (e.g., more sightings in dry/wet conditions).

Sky condition trends → Certain birds may be more visible in sunny vs. cloudy weather.

Wind condition trends → Strong winds might reduce sightings or affect flight behavior.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Clear separation → Each weather variable is visualized individually for simplicity.

Supports action → Findings can guide best survey times for maximum bird sightings.

Easy to extend → Can later add trend lines, correlations, or split by habitat (Forest vs. Grassland).

Quick detection of patterns → Helps spot environmental preferences of birds.
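The extension mentioned above (trend lines and correlations) could look roughly like this. The counts are invented for the example; with the real data, `temp_counts` would come from the groupby used for the temperature chart.

```python
import numpy as np
import pandas as pd

# Hypothetical observation counts per temperature value
temp_counts = pd.DataFrame({
    "Temperature": [10, 15, 20, 25, 30],
    "Observation_Count": [12, 20, 31, 24, 9],
})

# Pearson correlation between temperature and observation count
r = temp_counts["Temperature"].corr(temp_counts["Observation_Count"])

# Least-squares straight line: count is approximately slope * temperature + intercept
slope, intercept = np.polyfit(
    temp_counts["Temperature"], temp_counts["Observation_Count"], deg=1
)

print(f"Pearson r = {r:.3f}")
print(f"Trend line: count = {slope:.2f} * temperature + {intercept:.2f}")
```

In this toy sample the counts peak at mid-range temperatures, so the linear correlation is close to zero even though a clear pattern exists; a scatterplot should always accompany the coefficient. Counts per temperature also reflect how often surveys happened at that temperature, not only bird activity.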



#### Chart - 12 Observation by disturbance level

In [None]:
# Count number of observations per disturbance category
disturbance_counts = (
    df.groupby('Disturbance')['Scientific_Name']
      .count()
      .reset_index(name='Observation_Count')
      .sort_values(by='Observation_Count', ascending=False)
)

# Plot
plt.figure(figsize=(8, 5))
sns.barplot(
    data=disturbance_counts,
    x='Disturbance',
    y='Observation_Count',
    palette='viridis'
)
plt.title('Bird Observations by Disturbance Level', fontsize=16)
plt.xlabel('Disturbance Type')
plt.ylabel('Number of Observations')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

Bar chart works well for categorical variables like Disturbance.

Shows which disturbance levels (e.g., No effect, Slight effect, Major effect) have the highest or lowest sightings.

##### 2. What is/are the insight(s) found from the chart?

Tolerance to disturbance — If sightings remain high under slight disturbance, birds may be more adaptable.

Sensitive conditions — Low counts in high disturbance zones could indicate species sensitivity.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Very readable — Even for non-technical stakeholders.

Direct link to conservation action — Can guide where disturbance control is most needed.

Quick comparison — Instantly see how observation counts change with disturbance levels.
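
Raw counts per disturbance level also reflect how often surveys took place under each condition, so a share within each habitat can be easier to compare. The sketch below is one possible way to do that, assuming the `Location_Type` and `Disturbance` columns; the toy records are hypothetical.

```python
import pandas as pd

def disturbance_share_by_habitat(df):
    """Share of observations per disturbance level within each habitat."""
    # normalize='index' turns each habitat row into proportions that sum to 1
    return pd.crosstab(df['Location_Type'], df['Disturbance'], normalize='index')

# Hypothetical toy records, only to show the shape of the result
toy = pd.DataFrame({
    'Location_Type': ['Forest', 'Forest', 'Forest', 'Grassland', 'Grassland'],
    'Disturbance': ['No effect', 'No effect', 'Slight effect',
                    'No effect', 'Major effect'],
})
shares = disturbance_share_by_habitat(toy)
print(shares.round(2))
```

A habitat whose share drops sharply under "Major effect" would be a candidate for closer study, although differences in survey effort can also produce that pattern.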

### Distance and Behaviour Analysis

#### Chart - 13 Flyover observations

In [None]:
# Count how many times flyovers were observed vs not observed
flyover_counts = (
    df['Flyover_Observed']
    .value_counts()
    .reset_index()
)
flyover_counts.columns = ['Flyover_Observed', 'Count']

# Plot
plt.figure(figsize=(6, 4))
sns.barplot(
    data=flyover_counts,
    x='Flyover_Observed',
    y='Count',
    palette='coolwarm'
)
plt.title('Flyover Observations', fontsize=16)
plt.xlabel('Flyover Observed')
plt.ylabel('Number of Observations')
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

Binary variable → a simple bar chart clearly shows how many observations were recorded with and without flyovers.

Helps quickly assess how common flyovers are in the dataset.

##### 2. What is/are the insight(s) found from the chart?

Flyover frequency — See what percentage of total sightings involved birds flying overhead.

Possible species behavior — High flyover rates could suggest migratory movements or birds feeding in flight.



##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Minimal complexity — Very easy for anyone to understand.

Quick behavioral indicator — Shows prevalence of in-flight observations.

Foundation for deeper analysis — Can later be broken down by habitat, time of day, or species.
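
The flyover percentage and its breakdown by habitat can be computed directly rather than read off the bars. This is a minimal sketch that assumes `Flyover_Observed` is already a boolean column (text values such as "TRUE" would need converting first); the toy records are hypothetical.

```python
import pandas as pd

def flyover_rate(df, by=None):
    """Percentage of observations flagged as flyovers, overall or per group."""
    # Assumption: Flyover_Observed holds True/False values with no missing entries
    flags = df['Flyover_Observed']
    if by is None:
        return flags.mean() * 100
    # The mean of a boolean column within each group is the flyover fraction
    return flags.groupby(df[by]).mean() * 100

# Hypothetical toy records, only to show the shape of the result
toy_flyover = pd.DataFrame({
    'Location_Type': ['Forest', 'Forest', 'Forest', 'Grassland', 'Grassland'],
    'Flyover_Observed': [True, False, False, True, True],
})
overall_pct = flyover_rate(toy_flyover)
per_habitat_pct = flyover_rate(toy_flyover, by='Location_Type')
print(overall_pct)
print(per_habitat_pct.round(1))
```

The same function can be called with `by='Scientific_Name'` to look for species that are recorded mostly in flight.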

### Observer Trends Analysis

#### Chart - 14 top 3 observers

In [None]:
# Count observations per observer
observer_counts = (
    df.groupby('Observer')['Scientific_Name']
      .count()
      .reset_index(name='Observation_Count')
      .sort_values(by='Observation_Count', ascending=False)
)

# Keep only the top 3 observers for clarity
top_n = 3
observer_counts_top = observer_counts.head(top_n)

# Plot
plt.figure(figsize=(10, 6))
sns.barplot(
    data=observer_counts_top,
    x='Observation_Count',
    y='Observer',
    palette='viridis'
)
plt.title(f'Top {top_n} Observers by Number of Observations', fontsize=16)
plt.xlabel('Number of Observations')
plt.ylabel('Observer')
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

Shows which observers contribute the most sightings in the dataset.

Focuses on top 3 to avoid overcrowding.

##### 2. What is/are the insight(s) found from the chart?

Uneven contribution — If a few observers record most sightings, results may be skewed toward their habits, skill levels, or favorite locations.

Possible skill differences — More experienced observers might spot/identify more birds.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Very clear & readable — Good for both technical and non-technical audiences.

Identifies data collection imbalance — Helps plan future surveys to distribute observation effort more evenly.

Can combine with species counts — You can extend this to show which species each observer records most often.
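
The extension described above can be sketched as a per-observer summary: each observer's share of all observations and the species they record most often. It assumes the `Observer` and `Scientific_Name` columns; the observer labels and records below are hypothetical.

```python
import pandas as pd

def observer_summary(df):
    """Per observer: share of all observations (%) and most recorded species."""
    counts = df['Observer'].value_counts()
    share = (counts / counts.sum() * 100).rename('Share_Pct')
    # mode() returns sorted values, so ties resolve to the first name alphabetically
    top_species = (
        df.groupby('Observer')['Scientific_Name']
          .agg(lambda s: s.mode().iloc[0])
          .rename('Top_Species')
    )
    return (
        pd.concat([share, top_species], axis=1)
          .sort_values('Share_Pct', ascending=False)
    )

# Hypothetical toy records, only to show the shape of the result
toy_obs = pd.DataFrame({
    'Observer': ['Observer A', 'Observer A', 'Observer A', 'Observer B'],
    'Scientific_Name': ['Turdus migratorius', 'Turdus migratorius',
                        'Cardinalis cardinalis', 'Cardinalis cardinalis'],
})
summary = observer_summary(toy_obs)
print(summary)
```

A large `Share_Pct` for one observer is a cue to check whether the overall species ranking mostly mirrors that observer's records.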

### Conservation Insights Analysis

#### Chart - 15 Conservation analysis

In [None]:
# Count how many observations fall into each watchlist & stewardship category
watchlist_counts = (
    df.groupby(['PIF_Watchlist_Status', 'Regional_Stewardship_Status'])['Scientific_Name']
      .count()
      .reset_index(name='Observation_Count')
)

# Plot as grouped bar chart
plt.figure(figsize=(8, 5))
sns.barplot(
    data=watchlist_counts,
    x='PIF_Watchlist_Status',
    y='Observation_Count',
    hue='Regional_Stewardship_Status',
    palette='viridis'
)
plt.title('Watchlist & Stewardship Status Trends', fontsize=16)
plt.xlabel('PIF Watchlist Status')
plt.ylabel('Number of Observations')
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

Grouped bar chart makes it easy to compare observation counts for each watchlist status, split by regional stewardship status.

Shows conservation priority at both national (watchlist) and regional levels in one visual.

##### 2. What is/are the insight(s) found from the chart?

Overlap of priorities — See how many watchlist species are also regionally important.

Focus areas — If many species have TRUE for both statuses, these should be conservation priorities.

Monitoring needs — Low counts might indicate rare or elusive species that need special tracking.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Dual insight — Combines two conservation indicators in one chart.

Clear visual comparison — Bar lengths make it easy to spot where most observations fall.

Direct conservation link — Immediately useful for wildlife managers and policymakers.
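
The chart counts observations, so a frequently seen species weighs more than a rare one. To count how many species fall into each combination, each species can be counted once. The sketch below assumes both status columns are boolean and constant for a given species; the species labels are placeholders.

```python
import pandas as pd

def species_priority_overlap(df):
    """Cross-tabulate watchlist vs stewardship status, counting each species once."""
    # Assumption: a species carries the same status values on every record
    species = df.drop_duplicates(subset='Scientific_Name')
    return pd.crosstab(
        species['PIF_Watchlist_Status'],
        species['Regional_Stewardship_Status'],
    )

# Hypothetical toy records: Species A appears three times but is counted once
toy_status = pd.DataFrame({
    'Scientific_Name': ['Species A', 'Species A', 'Species A',
                        'Species B', 'Species C', 'Species C'],
    'PIF_Watchlist_Status': [True, True, True, False, False, False],
    'Regional_Stewardship_Status': [True, True, True, True, False, False],
})
overlap = species_priority_overlap(toy_status)
print(overlap)
```

The cell where both statuses are True gives the number of species that are priorities under both indicators.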

#### Chart - 16 top 15 AOU codes


In [None]:
# Count observations per AOU_Code
aou_counts = (
    df.groupby('AOU_Code')['Scientific_Name']
      .count()
      .reset_index(name='Observation_Count')
      .sort_values(by='Observation_Count', ascending=False)
)

# Show only top 15 codes for clarity
top_n = 15
aou_counts_top = aou_counts.head(top_n)

# Plot
plt.figure(figsize=(10, 6))
sns.barplot(
    data=aou_counts_top,
    x='AOU_Code',
    y='Observation_Count',
    palette='viridis'
)
plt.title(f'Top {top_n} AOU Codes by Number of Observations', fontsize=16)
plt.xlabel('AOU Code')
plt.ylabel('Number of Observations')
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

Bar chart makes it easy to see which AOU codes (species) have the most sightings.

Top-15 filtering avoids clutter and keeps focus on the most important codes.

AOU codes are standardized, so they can be linked with national/regional conservation lists.

##### 2. What is/are the insight(s) found from the chart?

Most common species by AOU code — Shows which species dominate the observations.

Priority monitoring species — Can cross-check top codes with official conservation priority lists.

Possible survey bias — If a few AOU codes dominate, observers might be focusing on specific bird types.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Clear & focused — Avoids showing hundreds of codes at once.

Standardized reference — AOU codes can be matched with official species data for deeper analysis.

Actionable for conservation — Easy to see which codes/species should be compared with watchlist status.
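
The cross-check against watchlist status can be sketched by aggregating both values per code. This is an illustration under the assumption that `PIF_Watchlist_Status` is boolean; the four-letter codes and records below are made up.

```python
import pandas as pd

def top_codes_with_watchlist(df, top_n=15):
    """Top AOU codes by observation count, flagged if any record is on the watchlist."""
    return (
        df.groupby('AOU_Code')
          .agg(
              Observation_Count=('Scientific_Name', 'count'),
              On_Watchlist=('PIF_Watchlist_Status', 'any'),
          )
          .sort_values('Observation_Count', ascending=False)
          .head(top_n)
    )

# Hypothetical toy records with made-up codes
toy_codes = pd.DataFrame({
    'AOU_Code': ['AAAA', 'AAAA', 'AAAA', 'BBBB', 'BBBB', 'CCCC'],
    'Scientific_Name': ['Species A'] * 3 + ['Species B'] * 2 + ['Species C'],
    'PIF_Watchlist_Status': [False, False, False, True, True, False],
})
top_codes = top_codes_with_watchlist(toy_codes, top_n=2)
print(top_codes)
```

Rows where `On_Watchlist` is True identify common codes that are also conservation priorities.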

## **5. Solution to Business Objective**

1. Wildlife Conservation

EDA Output:

Biodiversity hotspot maps from Plot-Level and Location-Level analysis.

Watchlist & stewardship charts showing species at risk.

Solution:

Direct conservation efforts toward high-diversity plots where watchlist species are recorded.

Prioritize habitat protection in areas with rare or vulnerable species.


---



2. Land Management

EDA Output:

Habitat split (Forest vs Grassland) biodiversity metrics.

Disturbance impact analysis.

Solution:

Manage human activity to reduce disturbance in sensitive plots.

Focus restoration projects in low-diversity or high-disturbance areas.


---


3. Eco-Tourism

EDA Output:

Plots with the highest diversity and unique species counts.

Seasonal patterns in bird sightings.

Solution:

Develop bird-watching trails in high-diversity zones.

Promote peak observation periods for tourism events.


---


4. Sustainable Agriculture

EDA Output:

Species distribution across grasslands.

Environmental factor correlation (Temperature, Humidity, Sky, Wind).

Solution:

Guide farmers on wildlife-friendly practices during critical breeding/migration seasons.

Preserve hedgerows and water sources in agricultural grasslands.


---


5. Policy Support

EDA Output:

Watchlist and AOU code-based priority species reports.

Observer bias detection for data reliability.

Solution:

Provide policymakers with validated lists of at-risk species.

Recommend protected area expansions or stricter disturbance regulations.


---


6. Biodiversity Monitoring

EDA Output:

Temporal trend analysis (yearly/seasonal changes).

Flyover frequency as a migratory indicator.

Solution:

Create ongoing monitoring programs focused on identified indicator species.

Use trends to measure the effectiveness of conservation policies.
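
Several of the outputs listed above rest on ranking plots by species richness. A minimal sketch of that ranking, assuming the `Plot_Name` and `Scientific_Name` columns, is shown below; the plot labels and records are hypothetical.

```python
import pandas as pd

def species_richness_by_plot(df, top_n=5):
    """Rank plots by the number of distinct species recorded in them."""
    return (
        df.groupby('Plot_Name')['Scientific_Name']
          .nunique()
          .sort_values(ascending=False)
          .head(top_n)
          .rename('Unique_Species')
    )

# Hypothetical toy records: repeated sightings do not add to richness
toy_plots = pd.DataFrame({
    'Plot_Name': ['Plot-1', 'Plot-1', 'Plot-1', 'Plot-1', 'Plot-2', 'Plot-2'],
    'Scientific_Name': ['Species A', 'Species B', 'Species C',
                        'Species A', 'Species A', 'Species A'],
})
richness = species_richness_by_plot(toy_plots)
print(richness)
```

Richness counts depend on survey effort, so plots visited more often may rank higher for that reason alone.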



# **Conclusion**

The analysis of bird species observations across forest and grassland ecosystems reveals clear patterns in biodiversity, species behavior, and environmental influences. Plot-level and location-type diversity metrics identified specific biodiversity hotspots that can serve as priority areas for conservation. Environmental factor analysis highlighted how temperature, humidity, sky, and wind conditions relate to bird activity, while the disturbance analysis compared observation counts across disturbance levels, where low counts under high disturbance may indicate species sensitivity to human impact.


---


Behavioral insights from ID methods, interval lengths, sex ratios, and flyover frequencies provided a deeper understanding of species’ habits and detectability. Conservation-focused analyses using PIF Watchlist Status, Regional Stewardship Status, and AOU Code patterns pinpointed species of national and regional concern, aligning ecological priorities with policy needs. Observer bias checks helped assess data reliability by identifying uneven contributions among surveyors.


---


These findings support multiple business objectives — from guiding wildlife conservation strategies and optimizing land management to enhancing eco-tourism opportunities and informing biodiversity policies. The dataset, when combined with targeted visual analytics, offers actionable, evidence-based insights that can directly contribute to habitat protection, species preservation, and sustainable ecosystem management.