# **Project Name** - Bird Species Data Analysis



##### **Project Type** - EDA
##### **Contribution** - Individual
##### **Team Member 1** - Siddesh Keshav Vaishnav


# **Project Summary -**

This project focuses on analyzing and comparing bird species observations between two distinct habitats — Forest and Grassland — using structured field observation data. The primary aim was to identify differences in biodiversity, environmental conditions, and species distribution, and to visualize patterns that could support ecological insights and conservation decisions.

##### **Data Preparation:**
The dataset combined observation records from both habitats, each including variables such as date, species identification, count of individuals, temperature, humidity, and location type. The first step was to pair matching observation dates between the two habitats so that comparisons are made under comparable temporal conditions. This reduces seasonal bias, making observed differences more likely to reflect habitat variation than time-of-year effects.
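
The pairing step can be sketched on toy data as follows. This is a minimal illustration, not the notebook's exact code: the helper name `keep_paired_dates` and the toy rows are assumptions, and it pairs on date only, whereas the wrangling cell later in the notebook pairs on date and species code together.

```python
import pandas as pd

def keep_paired_dates(df, date_col="Date", habitat_col="Habitat"):
    """Keep only rows whose date was surveyed in every habitat present in df."""
    n_habitats = df[habitat_col].nunique()
    habitats_per_date = df.groupby(date_col)[habitat_col].transform("nunique")
    return df[habitats_per_date == n_habitats].reset_index(drop=True)

# Toy data: 2018-05-01 has both habitats, 2018-05-02 has only Forest.
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2018-05-01", "2018-05-01", "2018-05-02"]),
    "Habitat": ["Forest", "Grassland", "Forest"],
    "Species": ["Wood Thrush", "Field Sparrow", "Wood Thrush"],
})
paired = keep_paired_dates(toy)
print(paired)
```

Only the two rows from the shared date survive, so any later comparison between habitats uses the same calendar days.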

##### **Exploratory Data Analysis:**
The EDA phase examined both environmental parameters and species-related metrics. Distributions of temperature and humidity were visualized using box and violin plots to capture differences in central tendency and variability. Species counts were aggregated to identify the most frequently observed species in each habitat. Monthly observation patterns were initially represented using line charts and later as an area plot to improve clarity and visual appeal.

Statistical diversity was measured using the Shannon Diversity Index, which accounts for both species richness (number of different species) and evenness (distribution of individuals among species). This index was compared between habitats to assess ecological complexity.
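
To make the roles of richness and evenness concrete, here is a small worked sketch of the Shannon index, H = -sum(p_i * ln p_i). The function name `shannon_index` and the toy counts are illustrative assumptions; the natural logarithm matches the `np.log` used in the notebook's own cells.

```python
import numpy as np

def shannon_index(counts):
    """Shannon index H = -sum(p_i * ln p_i) over species with non-zero counts."""
    counts = np.asarray(counts, dtype=float)
    counts = counts[counts > 0]          # zero counts contribute nothing
    p = counts / counts.sum()            # proportions per species
    return float(-np.sum(p * np.log(p)))

even = shannon_index([25, 25, 25, 25])   # four equally common species
skewed = shannon_index([97, 1, 1, 1])    # same richness, one dominant species
print(round(even, 3), round(skewed, 3))
```

Both toy communities have four species, but the evenly distributed one reaches the maximum possible value for that richness, ln(4), while the dominated one scores far lower.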

##### **Key Findings:**
* **Environmental Conditions:** Forest sites showed generally higher humidity and more stable temperatures, while grasslands exhibited greater temperature variability.
* **Species Richness and Abundance:** Certain species were dominant in one habitat but rare in the other, indicating clear habitat preferences.
* **Biodiversity Metrics:** Shannon Diversity scores suggested that forests tended to support more even species distributions, whereas grasslands often had a few highly dominant species.
* **Seasonal/Monthly Trends:** Observation counts varied across months, with distinct seasonal peaks for certain habitats, likely tied to migration or breeding cycles.

##### **Visualization Choices:**
Multiple plot types were used to keep data patterns clear and engaging:

* Box and violin plots for the temperature and humidity distributions
* Bar charts for top species counts by habitat
* Area and line plots for monthly and daily observation trends
* Boxplots for the Shannon Diversity comparison
* A scatter plot, a heatmap, and a polar plot for the temperature-humidity relationship, species-by-habitat counts, and time-of-day activity

Care was taken to select color palettes and layouts that both enhance readability and maintain consistency across visualizations.

##### **Conclusion:**
This analysis highlights measurable ecological differences between Forest and Grassland habitats in terms of environmental conditions, species composition, and biodiversity indices. Such insights are valuable for conservation planning, as they reveal which habitats are most important for maintaining ecological balance and species diversity. Future work could incorporate more years of data, additional environmental variables (e.g., precipitation, vegetation cover), and advanced modeling to predict species responses to habitat changes.

By structuring the workflow into an EDA phase for exploration and an Insights phase for presentation, the project maintains analytical rigor while ensuring that the final deliverables are both informative and accessible to decision-makers.



# **GitHub Link -**

https://github.com/SIDDUPAAJI/Bird-Species-Data-Anaylsis

# **Problem Statement**


**Bird populations are key biodiversity indicators. Different habitats support different species and are impacted by climate, human presence, and habitat characteristics.**

#### **Define Your Business Objective?**

Which habitat supports more species diversity? How do climate variables such as temperature and humidity relate to bird presence? What temporal patterns emerge in bird activity?

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure each chart is followed by answers in the following format.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Data Sources & Description***

### Source Files:

Bird_Monitoring_Data_FOREST.XLSX

Bird_Monitoring_Data_GRASSLAND.XLSX

### Key Fields:

->Date, Start_Time, End_Time

->TaxonCode, Species (bird identity)

->Temperature, Humidity

->Conservation flags: PIF_Watchlist_Status, Regional_Stewardship_Status

->Location_Type, Admin_Unit_Code, Observer
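
Before any cleaning, it can help to confirm that these fields are actually present. The check below is a hedged sketch: the helper `missing_columns` is hypothetical, the column names are copied from the list above, and it assumes `Species` is a literal column name. In this notebook `Location_Type` and `Admin_Unit_Code` are added by the loading code rather than read from the workbooks.

```python
import pandas as pd

# Column names taken from the field list above.
EXPECTED_COLUMNS = [
    "Date", "Start_Time", "End_Time", "TaxonCode", "Species",
    "Temperature", "Humidity", "PIF_Watchlist_Status",
    "Regional_Stewardship_Status", "Location_Type", "Admin_Unit_Code", "Observer",
]

def missing_columns(df, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent from df, in order."""
    return [col for col in expected if col not in df.columns]

# Toy frame with only three of the expected fields.
toy = pd.DataFrame(columns=["Date", "Species", "Temperature"])
print(missing_columns(toy))
```

An empty list means every key field is available; anything else is worth resolving before the SQL queries run.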

## ***2. Database Creation***

### Code Explanation

Here I used pandas to load, label, and clean the data in Python, then stored the result in a SQLite database so that it can be queried with SQL.



In [None]:
import pandas as pd
import sqlite3

# === 1. File paths ===
forest_file = "C:/Users/sidde/Downloads/Bird_Monitoring_Data_FOREST.XLSX"
grassland_file = "C:/Users/sidde/Downloads/Bird_Monitoring_Data_GRASSLAND.XLSX"

# === 2. Load sheet names ===
forest_sheets = pd.ExcelFile(forest_file).sheet_names
grassland_sheets = pd.ExcelFile(grassland_file).sheet_names

# === 3. Load & label each sheet ===
def load_and_label_sheets(file_path, sheet_names, location_type):
    df_list = []
    for sheet in sheet_names:
        df = pd.read_excel(file_path, sheet_name=sheet)
        df['Admin_Unit_Code'] = sheet
        df['Location_Type'] = location_type
        df_list.append(df)
    return pd.concat(df_list, ignore_index=True)

forest_data = load_and_label_sheets(forest_file, forest_sheets, "Forest")
grassland_data = load_and_label_sheets(grassland_file, grassland_sheets, "Grassland")

# === 4. Merge datasets ===
full_data = pd.concat([forest_data, grassland_data], ignore_index=True)

# === 5. Clean columns ===
full_data['Date'] = pd.to_datetime(full_data['Date'], errors='coerce')
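# Cast to str first so Excel time cells (read as datetime.time objects) are parsed instead of coerced to NaT
full_data['Start_Time'] = pd.to_datetime(full_data['Start_Time'].astype(str), errors='coerce').dt.time
full_data['End_Time'] = pd.to_datetime(full_data['End_Time'].astype(str), errors='coerce').dt.time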
full_data['Temperature'] = pd.to_numeric(full_data['Temperature'], errors='coerce')
full_data['Humidity'] = pd.to_numeric(full_data['Humidity'], errors='coerce')
full_data['Visit'] = pd.to_numeric(full_data['Visit'], errors='coerce')

# Boolean-like columns
bool_columns = [
    'Flyover_Observed', 'PIF_Watchlist_Status', 'Regional_Stewardship_Status',
    'Initial_Three_Min_Cnt'
]
for col in bool_columns:
    # map() turns anything other than 'true'/'false' (e.g. missing cells) into NaN,
    # which the nullable 'boolean' dtype stores as <NA> instead of raising an error
    normalized = full_data[col].astype(str).str.strip().str.lower()
    full_data[col] = normalized.map({'true': True, 'false': False}).astype('boolean')

# === 6. Save to SQLite ===
conn = sqlite3.connect("bird_observations.db")
full_data.to_sql("bird_data", conn, if_exists="replace", index=False)
conn.close()

print("Database 'bird_observations.db' created successfully!")
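
A quick read-back can confirm that the table was written as expected. The sketch below is an assumption-laden illustration: `table_summary` is a hypothetical helper, and it runs against an in-memory database with toy rows rather than the real `bird_observations.db`.

```python
import sqlite3
import pandas as pd

def table_summary(conn, table="bird_data"):
    """Return the row count and column names of a table in an open SQLite connection."""
    # The table name is interpolated directly, so pass only trusted names.
    n_rows = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
    return n_rows, columns

# Demonstration on an in-memory database with two toy rows.
conn = sqlite3.connect(":memory:")
pd.DataFrame({"TaxonCode": [1, 2], "Location_Type": ["Forest", "Grassland"]}).to_sql(
    "bird_data", conn, index=False
)
n_rows, columns = table_summary(conn)
conn.close()
print(n_rows, columns)
```

Pointing the same helper at a connection to the real database file would show whether the row count matches `len(full_data)`.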

### Custom Tables, Dashboard Creation


Here I used SQL to build separate summary tables, exported as CSV files, in order to create the dashboards in Power BI.




In [None]:
import sqlite3
import pandas as pd
import numpy as np
import os

# === 1. Setup ===
db_path = "bird_observations.db"
export_folder = "exports"

# Create folder if not exists
os.makedirs(export_folder, exist_ok=True)

# Connect to database
conn = sqlite3.connect(db_path)

# === 2. Queries dictionary (without LOG in Shannon Index) ===
queries = {
    # Habitat & Species Overview
    "total_unique_species": """
        SELECT COUNT(DISTINCT TaxonCode) AS unique_species
        FROM bird_data
        WHERE TaxonCode IS NOT NULL
    """,
    "total_observations": """
        SELECT COUNT(*) AS total_observations
        FROM bird_data
    """,
    "species_count_per_habitat": """
        SELECT Location_Type, COUNT(DISTINCT TaxonCode) AS species_count
        FROM bird_data
        GROUP BY Location_Type
    """,

    # Species Rankings
    "top10_species_overall": """
        SELECT TaxonCode, COUNT(*) AS observations
        FROM bird_data
        WHERE TaxonCode IS NOT NULL
        GROUP BY TaxonCode
        ORDER BY observations DESC
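        LIMIT 10
    """,
    # Rank species within each habitat so that only the top 10 per habitat are kept
    # (window functions need SQLite >= 3.25)
    "top10_species_per_habitat": """
        SELECT Location_Type, TaxonCode, observations
        FROM (
            SELECT Location_Type, TaxonCode, COUNT(*) AS observations,
                   ROW_NUMBER() OVER (PARTITION BY Location_Type ORDER BY COUNT(*) DESC) AS habitat_rank
            FROM bird_data
            WHERE TaxonCode IS NOT NULL
            GROUP BY Location_Type, TaxonCode
        )
        WHERE habitat_rank <= 10
        ORDER BY Location_Type, observations DESC
    """,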

    # Time & Trend Analysis
    "monthly_observation_trend": """
        SELECT STRFTIME('%m', Date) AS month, COUNT(*) AS observations
        FROM bird_data
        WHERE Date IS NOT NULL
        GROUP BY month
        ORDER BY month
    """,
    "yearly_observation_trend": """
        SELECT STRFTIME('%Y', Date) AS year, COUNT(*) AS observations
        FROM bird_data
        WHERE Date IS NOT NULL
        GROUP BY year
        ORDER BY year
    """,
    "seasonal_patterns": """
        SELECT CASE
            WHEN STRFTIME('%m', Date) IN ('12','01','02') THEN 'Winter'
            WHEN STRFTIME('%m', Date) IN ('03','04','05') THEN 'Spring'
            WHEN STRFTIME('%m', Date) IN ('06','07','08') THEN 'Summer'
            WHEN STRFTIME('%m', Date) IN ('09','10','11') THEN 'Autumn'
        END AS season,
        COUNT(*) AS observations
        FROM bird_data
        WHERE Date IS NOT NULL
        GROUP BY season
    """,
    "time_of_day_pattern": """
        SELECT CASE
            WHEN CAST(STRFTIME('%H', Start_Time) AS INTEGER) BETWEEN 5 AND 11 THEN 'Morning'
            WHEN CAST(STRFTIME('%H', Start_Time) AS INTEGER) BETWEEN 12 AND 16 THEN 'Afternoon'
            WHEN CAST(STRFTIME('%H', Start_Time) AS INTEGER) BETWEEN 17 AND 20 THEN 'Evening'
            ELSE 'Night'
        END AS time_of_day,
        COUNT(*) AS observations
        FROM bird_data
        WHERE Start_Time IS NOT NULL
        GROUP BY time_of_day
    """,

    # Threat Status & Conservation
    "threatened_species_per_habitat": """
        SELECT Location_Type, COUNT(DISTINCT TaxonCode) AS threatened_species
        FROM bird_data
        WHERE PIF_Watchlist_Status = 1
        GROUP BY Location_Type
    """,
    "regional_stewardship_species_per_habitat": """
        SELECT Location_Type, COUNT(DISTINCT TaxonCode) AS stewardship_species
        FROM bird_data
        WHERE Regional_Stewardship_Status = 1
        GROUP BY Location_Type
    """,

    # Environmental Factors
    "temperature_vs_species_richness": """
        SELECT ROUND(Temperature, 0) AS temp_rounded, COUNT(DISTINCT TaxonCode) AS species_richness
        FROM bird_data
        WHERE Temperature IS NOT NULL
        GROUP BY temp_rounded
        ORDER BY temp_rounded
    """,
    "humidity_vs_species_richness": """
        SELECT ROUND(Humidity, 0) AS humidity_rounded, COUNT(DISTINCT TaxonCode) AS species_richness
        FROM bird_data
        WHERE Humidity IS NOT NULL
        GROUP BY humidity_rounded
        ORDER BY humidity_rounded
    """,

    # Observer & Location Analysis — Step 1 only (counts)
    "species_diversity_per_admin": """
        SELECT Admin_Unit_Code, TaxonCode, COUNT(*) AS species_count
        FROM bird_data
        WHERE TaxonCode IS NOT NULL
        GROUP BY Admin_Unit_Code, TaxonCode
    """,
    "top_observers": """
        SELECT Observer, COUNT(*) AS total_observations
        FROM bird_data
        WHERE Observer IS NOT NULL
        GROUP BY Observer
        ORDER BY total_observations DESC
        LIMIT 10
    """
}

# === 3. Run all queries except Shannon Index ===
for name, query in queries.items():
    if name != "species_diversity_per_admin":
        df = pd.read_sql_query(query, conn)
        csv_path = os.path.join(export_folder, f"{name}.csv")
        df.to_csv(csv_path, index=False)
        print(f" Exported: {csv_path}")

# === 4. Calculate Shannon Index in Python ===
df_diversity = pd.read_sql_query(queries["species_diversity_per_admin"], conn)
diversity_results = []
for admin_unit, group in df_diversity.groupby("Admin_Unit_Code"):
    total_count = group["species_count"].sum()
    proportions = group["species_count"] / total_count
    shannon_index = -np.sum(proportions * np.log(proportions))
    diversity_results.append({
        "Admin_Unit_Code": admin_unit,
        # Number of distinct species in this unit (not the number of distinct count values)
        "species_count": group["TaxonCode"].nunique(),
        "shannon_index": round(shannon_index, 3)
    })

df_shannon = pd.DataFrame(diversity_results)
csv_path = os.path.join(export_folder, "species_diversity_per_admin.csv")
df_shannon.to_csv(csv_path, index=False)
print(f" Exported Shannon Index: {csv_path}")

# === 5. Close connection ===
conn.close()
print("\nAll queries executed and exported successfully!")
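
Since the guidelines ask for exception handling, the export loop could be made tolerant of a single failing query. The following is a sketch under stated assumptions, not the notebook's own code: `export_queries` and the demo queries are hypothetical, and the demonstration uses an in-memory database and a temporary folder.

```python
import os
import sqlite3
import tempfile
import pandas as pd

def export_queries(conn, queries, export_folder):
    """Run each named query and write it to CSV; collect failures instead of stopping."""
    os.makedirs(export_folder, exist_ok=True)
    exported, failed = [], {}
    for name, sql in queries.items():
        try:
            df = pd.read_sql_query(sql, conn)
            df.to_csv(os.path.join(export_folder, f"{name}.csv"), index=False)
            exported.append(name)
        except Exception as exc:  # pandas re-raises SQLite errors as DatabaseError
            failed[name] = str(exc)
    return exported, failed

# Toy database: one valid query and one that references a missing column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bird_data (TaxonCode INTEGER)")
conn.execute("INSERT INTO bird_data VALUES (1), (1), (2)")
demo_queries = {
    "total_observations": "SELECT COUNT(*) AS total_observations FROM bird_data",
    "broken_query": "SELECT missing_column FROM bird_data",
}
with tempfile.TemporaryDirectory() as folder:
    exported, failed = export_queries(conn, demo_queries, folder)
conn.close()
print(exported, list(failed))
```

Returning the failures as a dictionary keeps the notebook runnable in one go while still making it obvious which exports need attention.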


## ***3. Data Wrangling***

### Data Wrangling Code

In [None]:

import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd

def shannon_diversity(group):
    """Shannon index (natural log) of the category frequencies in a Series."""
    proportions = group.value_counts(normalize=True)
    return -np.sum(proportions * np.log(proportions))

def load_all_sheets(file_path, habitat):
    """Read every sheet of a workbook into one DataFrame tagged with its habitat."""
    sheets = pd.read_excel(file_path, sheet_name=None)  # dict: sheet name -> DataFrame
    df = pd.concat(list(sheets.values()), ignore_index=True)
    df['Habitat'] = habitat
    return df

def build_paired_df(combined_df):
    """Keep (Date, AOU_Code) combinations that were recorded in both habitats."""
    # Keep only the relevant columns that are actually present
    wanted = ['Date', 'Year', 'AOU_Code', 'Species', 'Common_Name',
              'Temperature', 'Humidity', 'Habitat']
    df = combined_df[[col for col in wanted if col in combined_df.columns]].copy()

    # Ensure date column is datetime
    df['Date'] = pd.to_datetime(df['Date'], errors='coerce')

    # Filter to keep only matching observations in both habitats
    habitats_seen = df.groupby(['Date', 'AOU_Code'])['Habitat'].transform('nunique')
    return df[habitats_seen == 2].copy()

# Load and combine both habitats
forest_df = load_all_sheets("C:/Users/sidde/Downloads/Bird_Monitoring_Data_FOREST.XLSX", "Forest")
grassland_df = load_all_sheets("C:/Users/sidde/Downloads/Bird_Monitoring_Data_GRASSLAND.XLSX", "Grassland")
combined_df = pd.concat([forest_df, grassland_df], ignore_index=True)

# Paired subset used by the charts below
paired_df = build_paired_df(combined_df)

# ---- STANDARDIZE SPECIES COLUMN ----
# Make sure a "Species" column exists; copy instead of renaming so AOU_Code stays available
if 'Species' in paired_df.columns:
    pass  # already correct
elif 'Common_Name' in paired_df.columns:
    paired_df['Species'] = paired_df['Common_Name']
elif 'AOU_Code' in paired_df.columns:
    paired_df['Species'] = paired_df['AOU_Code']
else:
    raise KeyError("No species identification column found. Expected 'Species', 'Common_Name', or 'AOU_Code'.")


## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:

# 1. Species Count Comparison
species_counts = paired_df.groupby(['Habitat', 'AOU_Code']).size().reset_index(name='Count')
plt.figure(figsize=(12,6))
sns.barplot(data=species_counts, x='AOU_Code', y='Count', hue='Habitat')
plt.title("Species Count Comparison (Both Habitats, Same Dates)")
plt.xticks(rotation=90)
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

Bar charts are best for directly comparing individual species counts across forest and grassland habitats, clearly highlighting which species dominate in each environment.

##### 2. What is/are the insight(s) found from the chart?

The insights reveal habitat preferences and species abundance patterns—valuable for targeting conservation efforts.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

If certain species show extremely low counts in either habitat, it may indicate environmental stress or decline, prompting attention to potential negative impacts.

#### Chart - 2

In [None]:

# Check available columns
cols = paired_df.columns

# Decide which column to use for grouping
if 'Species' in cols:
    species_col = 'Species'
elif 'Common_Name' in cols:
    species_col = 'Common_Name'
else:
    species_col = 'AOU_Code'  # Fallback

# 2.Top 10 common species (or codes if names unavailable)
top_species = (
    paired_df.groupby(species_col)
    .size()
    .reset_index(name='TotalCount')
    .sort_values('TotalCount', ascending=False)
    .head(10)
)

# Plot
plt.figure(figsize=(10, 6))
sns.barplot(
    data=top_species,
    x='TotalCount',
    y=species_col,
    hue=species_col,           # assign colors per species
    palette='viridis',
    dodge=False                # keeps it as one bar per species
)
plt.title('Top 10 Most Common Species')
plt.xlabel('Total Count')
plt.ylabel(species_col)
plt.show()

##### 1. Why did you pick the specific chart?

A horizontal bar chart clearly displays the top 10 species ranked by their observation counts, making it easy to compare species popularity or abundance visually.

##### 2. What is/are the insight(s) found from the chart?

The chart highlights which bird species are most frequently observed in the dataset, revealing key dominant species across habitats.



##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

If the top counts are dominated by few species while others have very low counts or are absent, this suggests potential biodiversity loss or ecosystem imbalance that could negatively affect habitat health.
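
One way to quantify "dominated by few species" is the share of all observations held by the top N species. This is an illustrative sketch only: the helper `top_n_share` and the toy series are assumptions, and no threshold for "too dominated" is implied.

```python
import pandas as pd

def top_n_share(species, n=3):
    """Fraction of all observations that belong to the n most frequent species."""
    counts = species.value_counts()      # sorted from most to least frequent
    return counts.head(n).sum() / counts.sum()

# Toy data: species A accounts for 6 of 10 observations.
toy = pd.Series(["A"] * 6 + ["B"] * 2 + ["C", "D"])
print(top_n_share(toy, n=1))  # 0.6
```

Applied per habitat to the species column of `paired_df`, the same ratio would give a single comparable dominance figure for Forest and Grassland.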

#### Chart - 3

In [None]:
# 3. Temperature Comparison
plt.figure(figsize=(10,6))
sns.boxplot(data=paired_df, x='Habitat', y='Temperature')
plt.title("Temperature Distribution for Matching Observations")
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

A boxplot effectively shows the distribution of temperature values for each habitat, highlighting medians, quartiles, and variability.

##### 2. What is/are the insight(s) found from the chart?

This chart reveals differences and overlaps in temperature ranges experienced by forests versus grasslands during observations, helping understand environmental conditions influencing bird activity.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

If temperature ranges show extreme values or a shift outside species’ preferred zones, it could signal climate stress or habitat unsuitability, which may negatively impact bird populations.

#### Chart - 4

In [None]:
# 4. Humidity Comparison with custom colors (Future-proof, with actual categories)
plt.figure(figsize=(10,6))
sns.violinplot(
    data=paired_df,
    x='Habitat',
    y='Humidity',
    hue='Habitat',  
    palette={"Forest": "#FFA500", "Grassland": "#2E8B57"},  # match your real categories
    legend=False
)
plt.title("Humidity Distribution for Matching Observations (Violin Plot)")
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

The violin plot is ideal for showing the full distribution and density of humidity values in each habitat, giving more detail than boxplots about variability and data spread.

##### 2. What is/are the insight(s) found from the chart?

This chart reveals how humidity levels differ between forest and grassland habitats, highlighting typical ranges and variability, which helps understand environmental preferences affecting bird activity.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, understanding humidity distributions enables habitat managers to tailor conservation and monitoring strategies to environmental conditions best suited for target species.

#### Chart - 5

In [None]:
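# 5. Monthly Observation Trend
sns.set_style("whitegrid")
sns.set_palette("Set2")

# Count observations per calendar month and habitat
monthly_counts = (
    paired_df.assign(Month=pd.to_datetime(paired_df['Date']).dt.month)
    .groupby(['Month', 'Habitat'])
    .size()
    .reset_index(name='Count')
)

# Pivot for area-style fill; months missing in one habitat count as 0
monthly_pivot = monthly_counts.pivot(index='Month', columns='Habitat', values='Count').fillna(0)
# DataFrame.plot creates its own figure, so no separate plt.figure() call is needed
monthly_pivot.plot(kind='area', alpha=0.7, figsize=(10,6))

plt.title("Monthly Observation Trend (Area Plot)")
plt.xlabel("Month")
plt.ylabel("Observation Count")
plt.tight_layout()
plt.show()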




##### 1. Why did you pick the specific chart?

An area plot effectively displays the total observation counts across months for both habitats, showing trends and seasonal patterns with an intuitive stacked visual emphasizing fluctuations over time.

##### 2. What is/are the insight(s) found from the chart?

The chart reveals clear seasonal peaks and troughs in bird observations, indicating higher activity or survey effort in certain months (typically spring/summer) across both forests and grasslands.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, understanding seasonal trends helps optimize survey timing and resource allocation for conservation efforts, ensuring monitoring is focused during periods of peak bird activity for more effective data collection.

#### Chart - 6

In [None]:
# 6. Daily Observation Count Comparison
daily_counts = paired_df.groupby(['Date', 'Habitat']).size().reset_index(name='Count')
plt.figure(figsize=(14,6))
sns.lineplot(data=daily_counts, x='Date', y='Count', hue='Habitat')
plt.title("Daily Observation Count Comparison")
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

A line plot is ideal for visualizing daily trends over time, showing how observation counts fluctuate day-to-day across both habitats, allowing easy comparison of temporal patterns.

##### 2. What is/are the insight(s) found from the chart?

The chart reveals daily variation in bird observations, highlighting periods of consistent activity, spikes possibly linked to favorable conditions or survey efforts, and differences or similarities in daily trends between forest and grassland habitats.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, understanding daily observation patterns informs optimal scheduling of surveys and resource deployment for monitoring, enhancing data reliability and conservation effort efficiency.

#### Chart - 7

In [None]:
# Pick the species identifier column dynamically
cols = paired_df.columns
if 'Species' in cols:
    species_col = 'Species'
elif 'Common_Name' in cols:
    species_col = 'Common_Name'
else:
    species_col = 'AOU_Code'  # fallback to numeric code

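# 7. Shannon Diversity Function
from scipy.stats import entropy

def shannon_diversity(x):
    # entropy() normalizes the counts to proportions; its default natural log
    # matches the Shannon Index computed in the earlier cells
    counts = x.value_counts()
    return entropy(counts)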

# Calculate diversity
diversity = (
    paired_df.groupby(['Date','Habitat'])[species_col]
    .apply(shannon_diversity)
    .reset_index(name='ShannonIndex')
)

# Plot
plt.figure(figsize=(10,6))
sns.boxplot(data=diversity, x='Habitat', y='ShannonIndex')
plt.title('Shannon Diversity Index by Habitat')
plt.xlabel('Habitat')
plt.ylabel('Shannon Index')
plt.show()

##### 1. Why did you pick the specific chart?

A boxplot is ideal for showing the distribution and variability of the Shannon Diversity Index across habitats, highlighting median values, spread, and potential outliers in diversity scores.

##### 2. What is/are the insight(s) found from the chart?

The chart reveals that one habitat (likely forest) tends to have a higher median Shannon Index, indicating greater species diversity and evenness compared to the other habitat (likely grassland). It also shows the variation in diversity observed on different dates within each habitat.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, these insights support targeted conservation and habitat management by identifying which habitat maintains richer and more balanced bird communities, guiding resource allocation and preservation efforts effectively.

#### Chart - 8

In [None]:
# 8. Temperature vs Humidity Scatter
plt.figure(figsize=(10,6))
sns.scatterplot(data=paired_df, x='Temperature', y='Humidity', hue='Habitat', alpha=0.7)
plt.title("Temperature vs Humidity (Both Habitats Present)")
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

A scatter plot is ideal for visualizing the relationship and distribution between two continuous environmental variables—temperature and humidity—across observations in both habitats. The color hue for habitat allows easy comparison of environmental conditions experienced by forest and grassland areas.

##### 2. What is/are the insight(s) found from the chart?

The chart shows how temperature and humidity values cluster for each habitat, revealing differences or overlaps in climatic conditions. It helps identify the typical environmental ranges where birds were observed in forests versus grasslands and whether habitats experience distinct microclimates.
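
The visual impression from the scatter plot can be backed by a per-habitat correlation coefficient. The sketch below uses a hypothetical helper, `temp_humidity_corr`, and toy values chosen to be perfectly linear; it assumes the `Temperature`, `Humidity`, and `Habitat` column names used in the plot.

```python
import pandas as pd

def temp_humidity_corr(df, group_col="Habitat"):
    """Pearson correlation between Temperature and Humidity within each group."""
    return pd.Series({
        name: group["Temperature"].corr(group["Humidity"])
        for name, group in df.groupby(group_col)
    })

# Toy data: humidity falls with temperature in Forest and rises in Grassland.
toy = pd.DataFrame({
    "Habitat": ["Forest"] * 3 + ["Grassland"] * 3,
    "Temperature": [10.0, 15.0, 20.0, 10.0, 15.0, 20.0],
    "Humidity": [80.0, 70.0, 60.0, 40.0, 50.0, 60.0],
})
corr = temp_humidity_corr(toy)
print(corr.round(2))
```

Calling `temp_humidity_corr(paired_df)` would report one coefficient per habitat, which makes it easier to say whether the two microclimates really differ.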

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

If the scatter shows extreme temperature or humidity values for a habitat outside the optimal range for bird species, it may indicate stressful conditions that could reduce bird diversity and abundance. Such environmental stressors can lead to habitat degradation and negatively affect species survival.

#### Chart - 9

In [None]:
# 9. Heatmap for Top 10 Common Species
top_species_list = top_species['Species'].tolist()
heatmap_df = paired_df[paired_df['Species'].isin(top_species_list)]
heatmap_data = heatmap_df.groupby(['Species','Habitat']).size().unstack(fill_value=0)
plt.figure(figsize=(8,6))
sns.heatmap(heatmap_data, annot=True, fmt='d', cmap='YlGnBu')
plt.title("Observation Heatmap for Top 10 Common Species")
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

A heatmap intuitively shows the distribution and count of the top 10 species across both habitats, using color intensity to quickly convey abundance patterns in a compact matrix form.

##### 2. What is/are the insight(s) found from the chart?

The heatmap reveals which species are more abundant in forests versus grasslands and highlights species that are strongly associated with one habitat or occur frequently in both. This helps identify habitat specialists and generalists.
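
The specialist and generalist distinction can be expressed as the share of each species' observations that falls in one habitat. This is a rough sketch: `habitat_share` is a hypothetical helper, and the 0.8 cut-off is an arbitrary illustrative choice rather than an ecological standard.

```python
import pandas as pd

def habitat_share(df, species_col="Species", threshold=0.8):
    """Share of each species' observations per habitat, plus a simple label."""
    counts = df.groupby([species_col, "Habitat"]).size().unstack(fill_value=0)
    shares = counts.div(counts.sum(axis=1), axis=0)
    # Label a species a specialist when one habitat holds >= threshold
    # of its observations; the threshold is an assumption for illustration.
    shares["Label"] = shares.max(axis=1).map(
        lambda s: "specialist" if s >= threshold else "generalist"
    )
    return shares

# Toy data: species A is seen 9 of 10 times in Forest, species B splits evenly.
toy = pd.DataFrame({
    "Species": ["A"] * 10 + ["B"] * 10,
    "Habitat": ["Forest"] * 9 + ["Grassland"] + ["Forest"] * 5 + ["Grassland"] * 5,
})
result = habitat_share(toy)
print(result)
```

Because shares are computed per species, the label is not distorted by one habitat simply having more total observations.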

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, by identifying species-habitat relationships, conservation efforts can be better targeted to preserve crucial habitats for vulnerable or key species, enhancing biodiversity management.

#### Chart - 10

In [None]:
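# 10. Time-of-Day Observation Comparison
import numpy as np

# Hourly counts come from combined_df because paired_df does not keep Start_Time
hours = pd.to_datetime(combined_df['Start_Time'].astype(str), errors='coerce').dt.hour
hourly_counts = (
    combined_df.assign(Hour=hours)
    .dropna(subset=['Hour'])
    .groupby(['Habitat', 'Hour'])
    .size()
    .reset_index(name='Count')
)

plt.figure(figsize=(8,8))
ax = plt.subplot(111, polar=True)

for habitat in hourly_counts['Habitat'].unique():
    subset = hourly_counts[hourly_counts['Habitat'] == habitat].sort_values('Hour')
    theta = np.deg2rad(subset['Hour'] * 15)  # 24 hours -> 360 degrees
    ax.plot(theta, subset['Count'], marker='o', label=habitat, linewidth=2)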

ax.set_theta_direction(-1)  # Clockwise
ax.set_theta_offset(np.pi/2.0)  # Start at top
ax.set_xticks(np.deg2rad(np.arange(0, 360, 30)))
ax.set_xticklabels(range(0, 24, 2))
ax.set_title("Time-of-Day Observation (Clock View)", y=1.1)
ax.legend(loc='upper right', bbox_to_anchor=(1.1, 1.1))
plt.show()


##### 1. Why did you pick the specific chart?

A polar plot is effective for displaying cyclical data like the time of day, visually representing bird observation counts across 24 hours in a circular, clock-like format. This helps highlight daily activity rhythms for each habitat intuitively.

##### 2. What is/are the insight(s) found from the chart?

The chart reveals peak bird observation periods for forests and grasslands, typically showing higher counts during early morning and possibly late afternoon/evening, reflecting natural bird activity patterns aligned with daylight and behavior cycles.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

If there are extended periods with very low observation counts, this could suggest reduced bird activity due to environmental stressors (e.g., human disturbance, habitat degradation) during key times, which may negatively impact monitoring effectiveness and signal broader ecosystem issues requiring attention.

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

->Treat the findings as indicative: the data is limited to the surveyed times and locations, so sampling bias is possible.

->Normalize for observer effort and control for detection bias in future analyses, since effort is not normalized here.

->Expand the work with geospatial mapping, statistical significance tests, and predictive models.
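
As a first step toward normalizing for effort, observation counts can be expressed per survey event instead of as raw totals. The sketch below is an assumption: it treats each unique combination of `Date` and `Admin_Unit_Code` as one survey event, which may not match how visits were actually organized, and `observations_per_survey` is a hypothetical helper.

```python
import pandas as pd

def observations_per_survey(df, survey_keys=("Date", "Admin_Unit_Code")):
    """Mean number of observation rows per survey event, by habitat."""
    per_survey = df.groupby(["Habitat", *survey_keys]).size()
    return per_survey.groupby(level="Habitat").mean()

# Toy data: Forest has 4 rows over 2 surveys, Grassland has 2 rows in 1 survey.
toy = pd.DataFrame({
    "Habitat": ["Forest"] * 4 + ["Grassland"] * 2,
    "Date": ["d1", "d1", "d1", "d2", "d1", "d1"],
    "Admin_Unit_Code": ["X", "X", "X", "X", "Y", "Y"],
})
rate = observations_per_survey(toy)
print(rate)
```

In the toy data Forest has twice as many raw observations, yet both habitats average two observations per survey, which is the kind of effect that unnormalized counts can hide.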

# **Conclusion**

This analysis of bird observations across forest and grassland habitats reveals that forests support greater species richness and evenness (148 species, mean Shannon Index 3.41) compared to grasslands (127 species, mean Shannon Index 3.06), while grasslands host a few highly abundant dominant species. Both environments are essential for conservation, harboring numerous threatened and stewardship species, though forests have slightly more of each. The highest bird diversity was observed during summer mornings at moderate temperature (15–25 °C) and humidity (50–70%), indicating clear environmental influences. Overall, the results emphasize the importance of preserving both habitat types and adopting targeted, seasonally-aware conservation strategies to sustain regional bird diversity in the face of ongoing habitat change.
