# Project Scoping

## 1. Project Goals

Some initial questions to explore:

- What is the overall biodiversity across the parks?
- Are certain species more concentrated in specific parks?
- Are certain categories more endangered than others?
- Which parks have the most endangered species?
- Which species are most under threat?

## 2. Understanding the Dataset

There are two CSV files: `species_info.csv` and `observations.csv`.

- `species_info.csv` columns: species category (e.g., mammal), scientific name, common name, and conservation status.
- `observations.csv` columns: scientific name, park name, and number of observations.

**Data Types**: All columns are strings (pandas `object` dtype) except the number of observations, which is `int64`.

**Size & Quality**:

- `species_info.csv`: 5,824 rows × 4 columns. Missing values: 5,633 (about 97% of rows), all in the conservation status column.
- `observations.csv`: 23,296 rows × 3 columns. Missing values: none.
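
The size and missing-value figures above can be reproduced with a small pandas helper. This is a minimal sketch: the column names (`category`, `scientific_name`, `common_names`, `conservation_status`) are assumptions about the CSV headers, and the three-row frame is made-up stand-in data. In the project you would load the real file with `pd.read_csv("species_info.csv")` instead.

```python
import pandas as pd


def summarize(df: pd.DataFrame) -> dict:
    """Report shape, dtypes, and missing values for one DataFrame."""
    missing = df.isna().sum()  # number of missing values per column
    return {
        "shape": df.shape,
        "dtypes": df.dtypes.astype(str).to_dict(),
        "missing_total": int(missing.sum()),
        # keep only the columns that actually have gaps
        "missing_by_column": {col: int(n) for col, n in missing.items() if n > 0},
    }


# Made-up stand-in for pd.read_csv("species_info.csv"); column names are assumed.
species = pd.DataFrame(
    {
        "category": ["Mammal", "Bird", "Mammal"],
        "scientific_name": ["Species a", "Species b", "Species c"],
        "common_names": ["Name A", "Name B", "Name C"],
        "conservation_status": [None, "Endangered", None],
    }
)

report = summarize(species)
print(report["shape"])              # (3, 4)
print(report["missing_by_column"])  # {'conservation_status': 2}
```

Running the same helper on `observations.csv` should report an empty `missing_by_column` dict.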

## 3. Key Metrics/Analysis Techniques

- Biodiversity Indexes: Simpson's Diversity Index or the Shannon Index to quantify biodiversity.
- Trend Analysis: Compare species composition, observation counts, and conservation status across parks.
- Visualization: Use bar charts, heatmaps, or maps to make the data easier to interpret.
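
Both indexes are computed from per-species observation counts within a park, where p_i is species i's share of that park's observations. The sketch below is a minimal plain-Python version and assumes the counts have already been aggregated per species and park. Note that "Simpson's Diversity Index" has several definitions in use. This sketch uses the Gini-Simpson form 1 - Σ p_i². The park names and counts are hypothetical.

```python
import math


def shannon_index(counts):
    """Shannon index H = -sum(p_i * ln(p_i)); higher means more diverse."""
    total = sum(counts)
    # species with zero observations contribute nothing (0 * ln 0 is taken as 0)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson_diversity(counts):
    """Gini-Simpson index 1 - sum(p_i^2): 0 for a single species, higher when more diverse."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)


# Hypothetical observation counts per species for two parks.
parks = {
    "Park A": [10, 10, 10, 10],  # four species, evenly observed
    "Park B": [37, 1, 1, 1],     # four species, one dominant
}

for park, counts in parks.items():
    print(park, round(shannon_index(counts), 3), round(simpson_diversity(counts), 3))
```

Park A prints `1.386 0.75`, which are the maximum possible values for four species. Park B has the same number of species, but because one species dominates, it scores lower on both indexes. This difference in evenness is what the indexes capture beyond a plain species count.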

## 4. Tools & Methods

Python Libraries: pandas for analysis, matplotlib and seaborn for visualizations, and Plotly for interactive map visualizations.

## 5. Deliverables

- Jupyter Notebook Report: Clear insights into biodiversity trends, backed by data.
- LinkedIn Article: A summary of the findings and the importance of biodiversity conservation.
- Visualizations: Maps showing species distribution, charts for biodiversity indexes, etc.
- Conclusion/Recommendations: Suggest conservation strategies based on the findings.

---

# Boilerplate Structure for the Jupyter Notebook Report

## Title Page

- Project Title: Biodiversity in US National Parks
- Your Name
- Date

## Introduction

- Brief overview of the project.
- Define the project goals (e.g., understanding biodiversity distribution, identifying trends).

## Dataset Overview

- Description of the dataset (source, size, variables).
- Initial observations (any patterns or peculiarities?).

## Data Cleaning & Preprocessing

- Handling of missing data and outliers; data transformations.
- Tools and libraries used (e.g., pandas, NumPy).
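
In this dataset, the main cleaning task is the conservation status column, where about 97% of values are missing. A common approach is to treat a blank status as "no listed status" and fill it with an explicit label. That way, those species stay in counts and charts instead of being silently dropped. The sketch below is minimal. The label `"No Intervention"` is a hypothetical choice, and reading blanks this way is an assumption to check against the data source. As a general precaution, the sketch also removes exact duplicate rows. The rows and column names are made up.

```python
import pandas as pd

# Made-up stand-in for pd.read_csv("species_info.csv"); column names are assumed.
species = pd.DataFrame(
    {
        "category": ["Mammal", "Bird", "Bird", "Mammal"],
        "scientific_name": ["Species a", "Species b", "Species b", "Species c"],
        "conservation_status": [None, "Endangered", "Endangered", None],
    }
)

# Assumption: a blank status means the species has no listed conservation status.
# "No Intervention" is a hypothetical label for that case.
species["conservation_status"] = species["conservation_status"].fillna("No Intervention")

# Remove rows that are identical in every column, then renumber the index.
species = species.drop_duplicates().reset_index(drop=True)

print(species["conservation_status"].value_counts())
```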

## Exploratory Data Analysis (EDA)

- Visualizations of species diversity, park distributions, etc.
- Summary statistics and any interesting patterns.
- Species trends over time (if applicable).
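
A good first EDA step is to cross-tabulate category against conservation status. This directly addresses whether some categories are more endangered than others. The minimal sketch below uses `pd.crosstab` on made-up rows. The column names and the `"No Intervention"` label for species without a listed status are assumptions.

```python
import pandas as pd

# Made-up, already-cleaned rows; column names and status labels are assumptions.
species = pd.DataFrame(
    {
        "category": ["Mammal", "Mammal", "Bird", "Bird", "Fish"],
        "conservation_status": [
            "Endangered",
            "No Intervention",
            "Species of Concern",
            "No Intervention",
            "No Intervention",
        ],
    }
)

# Number of species in each (category, status) combination; absent pairs become 0.
counts = pd.crosstab(species["category"], species["conservation_status"])

# Fraction of each category's species that carry any listed status.
listed_share = (
    (species["conservation_status"] != "No Intervention")
    .groupby(species["category"])
    .mean()
)

print(counts)
print(listed_share.sort_values(ascending=False))
```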

## Key Metrics & Calculations

- Biodiversity indexes (e.g., Shannon Index, species richness).
- Breakdown by park, species group, or region.
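
Both species richness (the number of distinct species recorded) and the Shannon Index can be computed for each park directly from `observations.csv`. The sketch below is minimal and assumes columns named `park_name`, `scientific_name`, and `observations`. The parks and counts are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for pd.read_csv("observations.csv"); column names are assumed.
obs = pd.DataFrame(
    {
        "scientific_name": ["Species a", "Species b", "Species a", "Species a", "Species b"],
        "park_name": ["Park A", "Park A", "Park A", "Park B", "Park B"],
        "observations": [30, 20, 10, 90, 10],
    }
)

# Total observations per species within each park (a species may span several rows).
per_species = obs.groupby(["park_name", "scientific_name"])["observations"].sum()


def shannon(counts: pd.Series) -> float:
    """Shannon index H = -sum(p_i * ln(p_i)) for one park's species counts."""
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log(p)).sum())


summary = pd.DataFrame(
    {
        # one row per (park, species) pair, so the group size is the species count
        "richness": per_species.groupby(level="park_name").size(),
        "shannon": per_species.groupby(level="park_name").apply(shannon),
    }
)
print(summary)
```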

## Insights & Findings

- Highlight major trends or patterns found during analysis.
- Discuss regional biodiversity differences, species at risk, etc.
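
To find where at-risk species are observed, join the two files on the scientific name. The minimal sketch below merges them and, for each park, counts the distinct species with a given status and their total observations. The column names, status labels, and rows are all made up for illustration. The `validate="many_to_one"` check matters because a repeated scientific name in the species table would otherwise duplicate observation rows and inflate the counts.

```python
import pandas as pd

# Made-up stand-ins for the two CSV files; column names and labels are assumed.
species = pd.DataFrame(
    {
        "scientific_name": ["Species a", "Species b", "Species c"],
        "conservation_status": ["Endangered", "No Intervention", "Endangered"],
    }
)
obs = pd.DataFrame(
    {
        "scientific_name": ["Species a", "Species b", "Species c", "Species a"],
        "park_name": ["Park A", "Park A", "Park B", "Park B"],
        "observations": [5, 120, 3, 8],
    }
)

# Attach each species' status to its observation rows. validate="many_to_one"
# raises a MergeError if a scientific name repeats in `species`; deduplicate first if so.
merged = obs.merge(species, on="scientific_name", how="left", validate="many_to_one")

endangered = merged[merged["conservation_status"] == "Endangered"]
by_park = endangered.groupby("park_name").agg(
    endangered_species=("scientific_name", "nunique"),
    endangered_observations=("observations", "sum"),
)
print(by_park.sort_values("endangered_species", ascending=False))
```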

## Conservation Implications

- Based on your findings, discuss potential conservation actions.
- Any policy recommendations?

## Conclusion

- Summarize key takeaways.
- Future work (e.g., further data collection, predictive modeling).

## Appendix

- Code snippets, additional graphs, or tables.