SABINA COMMENT: This is a very schooly description. The outcome of an analysis is an insight that leads to a decision. But just telling people "We cleaned this data so we can provide actionable insights" doesn't tell them what those insights are! This means you need to say something like "We analyzed trends across XX years of global shark attack data to determine the safest place to open a surfing school."
This project focuses on performing a full Exploratory Data Analysis (EDA) and data cleaning process on the Global Shark Attack File (GSAF) dataset.
The goal was to transform messy, real-world public data into a clean, actionable format and extract key insights regarding the geographical, temporal, and activity-based trends of shark attacks worldwide.
(The top 5 countries in terms of shark attacks are the USA, Australia, South Africa, New Zealand, and Papua New Guinea.)
- Geographical Concentration: Recorded attacks are heavily concentrated in the USA (2,573) and Australia (1,480), which may reflect large coastal populations and high rates of aquatic recreation in those regions.
- Most Common Activity: Surfing was identified as the activity most frequently associated with recorded attacks, followed by swimming/wading.
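The country ranking above comes down to a frequency count over a normalized country column. A minimal sketch of that step with pandas, assuming a `Country` column; the `top_countries` helper and the toy rows are hypothetical and are not taken from the project's notebook:

```python
import pandas as pd


def top_countries(df: pd.DataFrame, n: int = 5) -> pd.Series:
    """Return the n most frequent countries after light normalization."""
    countries = (
        df["Country"]
        .dropna()            # skip records with no country at all
        .astype(str)
        .str.strip()         # " usa" and "USA" should count as one country
        .str.upper()
    )
    countries = countries[countries != ""]
    return countries.value_counts().head(n)


# Toy rows for illustration only; these are not real GSAF records.
toy = pd.DataFrame(
    {"Country": ["USA", " usa", "Australia", "AUSTRALIA ", "USA", None, "South Africa"]}
)
ranking = top_countries(toy, n=2)
print(ranking)
```

Normalizing case and whitespace before counting matters here because free-text country fields tend to contain several spellings of the same place, which would otherwise split one country's total across multiple rows.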
The analysis addressed several key questions, yielding the following insights:
- Finding: The USA records the highest total number of attacks, but Australia holds the highest number of fatal attacks (310), followed by the USA (202).
- Insight: While the USA records the most interactions, the share of recorded attacks that were fatal is far higher in Australia (about 21%, 310 of 1,480) than in the USA (about 8%, 202 of 2,573). This may reflect differences in the shark species present or in emergency response systems. Safety efforts should be tailored to the specific risk profile of each region.
- Finding: Surfing was identified as the activity most frequently associated with recorded attacks, followed by swimming/wading.
- Insight: Public safety campaigns must be targeted specifically at recreational water users, focusing on avoiding high-risk times (like dawn/dusk) and high-risk environments (like murky river mouths).
- Finding: The raw GSAF dataset required extensive cleaning, as over 80% of records were unusable until key columns (like the `Injury` column for fatality data) were successfully coerced and imputed.
- Insight: This underscores the critical need for standardized, mandatory global reporting of shark incidents to improve data reliability for future risk modeling.
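The fatality figures above depend on turning the free-text `Injury` column into a usable flag. One way this could look in pandas is sketched below. The column names `Country` and `Injury`, the keyword rule for detecting fatal records, the helper names, and the sample rows are all assumptions made for illustration; the project's notebook may apply different rules.

```python
import pandas as pd


def add_fatal_flag(df: pd.DataFrame) -> pd.DataFrame:
    """Add a boolean `is_fatal` column derived from free-text injury notes."""
    out = df.copy()
    # Coerce to upper-case strings; missing notes become empty text.
    injury = out["Injury"].fillna("").astype(str).str.upper()
    # Assumption: a record is fatal when the text mentions "FATAL"
    # but not "NON-FATAL", "NON FATAL", or "NONFATAL".
    mentions_fatal = injury.str.contains("FATAL", regex=False)
    mentions_non_fatal = injury.str.contains(r"NON[\s-]?FATAL", regex=True)
    out["is_fatal"] = mentions_fatal & ~mentions_non_fatal
    return out


def fatality_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Attacks, fatal attacks, and fatal share per country."""
    summary = df.groupby("Country")["is_fatal"].agg(attacks="size", fatal="sum")
    summary["fatal_share"] = summary["fatal"] / summary["attacks"]
    return summary.sort_values("fatal", ascending=False)


# Toy rows for illustration only; these are not real GSAF records.
toy = pd.DataFrame(
    {
        "Country": ["USA", "USA", "AUSTRALIA", "AUSTRALIA", "USA"],
        "Injury": [
            "Laceration to leg",
            "FATAL",
            "Fatal, body not recovered",
            "Non-fatal bite to foot",
            None,
        ],
    }
)
summary = fatality_summary(add_fatal_flag(toy))
print(summary)
```

Keeping both the raw count (`fatal`) and the ratio (`fatal_share`) in one table makes it easier to separate "where attacks happen most" from "where attacks are most often fatal", which is the distinction the findings above rely on.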
Top 5 Countries by Fatal Shark Attacks
This chart visually demonstrates the difference in risk severity, highlighting Australia's high number of recorded fatalities.
| Component | Purpose |
|---|---|
| Python | Core programming language |
| Pandas | Data Cleaning and Analysis |
| Matplotlib / Seaborn | Data Visualization |
| Jupyter Notebook | Interactive Analysis |
| Git & GitHub | Version Control & Hosting |
