ReynoldT92/GSAF


🦈 Global Shark Attack Data Analysis

📝 Project Overview

SABINA COMMENT: This is a very schooly description. The outcome of an analysis is an insight that leads to a decision. But just telling people "We cleaned this data so we can provide actionable insights" doesn't tell them what those insights are! This means you need to say something like "We analyzed trends across XX years of global shark attack data to determine the safest place to open a surfing school."

This project focuses on performing a full Exploratory Data Analysis (EDA) and data cleaning process on the Global Shark Attack File (GSAF) dataset.

The goal was to transform messy, real-world public data into a clean, actionable format and extract key insights regarding the geographical, temporal, and activity-based trends of shark attacks worldwide.


🔑 Key Findings & Business Insights

The top 5 countries in terms of shark attacks are the USA, Australia, South Africa, New Zealand, and Papua New Guinea.

Geographical Concentration: The vast majority of attacks are concentrated in the USA (2573) and Australia (1480), suggesting a correlation with high population and aquatic recreation rates in those regions.

Most Common Activity: Surfing was identified as the activity most frequently associated with recorded attacks, followed by swimming/wading.

The analysis addressed several key questions, yielding the following insights:

1. Geographical Risk Profile

  • Finding: The USA records the highest total number of attacks, but Australia holds the highest number of fatal attacks (310), followed by the USA (202).
  • Insight: While the risk of interaction is highest in the USA, the risk of lethality is significantly higher in Australia, suggesting a difference in the types of sharks present or emergency response systems. Safety efforts should be tailored to this specific risk profile in each region.
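The gap in severity can be made concrete by dividing fatal attacks by total recorded attacks. The sketch below uses only the four counts quoted in this README; the `fatality_rate` helper and the `counts` dictionary are illustrative names, not code from the project notebook.

```python
# Counts quoted in this README: (total recorded attacks, fatal attacks).
counts = {
    "USA": (2573, 202),
    "Australia": (1480, 310),
}


def fatality_rate(total: int, fatal: int) -> float:
    """Return the share of recorded attacks that were fatal, in percent."""
    return 100.0 * fatal / total


# Round to one decimal place for reporting.
rates = {
    country: round(fatality_rate(total, fatal), 1)
    for country, (total, fatal) in counts.items()
}

for country, rate in rates.items():
    print(f"{country}: {rate}% of recorded attacks were fatal")
```

On these figures, roughly 7.9% of recorded attacks in the USA were fatal versus roughly 20.9% in Australia. These are shares of recorded incidents, not per-capita or per-swimmer risk.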

2. Activity-Based Risk

  • Finding: Surfing was identified as the activity most frequently associated with recorded attacks, followed by swimming/wading.
  • Insight: Public safety campaigns must be targeted specifically at recreational water users, focusing on avoiding high-risk times (like dawn/dusk) and high-risk environments (like murky river mouths).
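Ranking activities requires grouping free-text descriptions into a few categories first. The following is a minimal sketch of one way to do that, assuming keyword matching is good enough; the sample strings are made up for illustration, and the `CATEGORIES` mapping and `categorize` function are hypothetical, not the notebook's actual logic.

```python
from collections import Counter

# Made-up free-text entries for illustration only; not real GSAF rows.
raw_activities = [
    "Surfing", "surfing ", "Swimming", "Wading",
    "Body surfing", None, "Fishing", "SURFING",
]

# Hypothetical keyword-to-category mapping; the first matching keyword wins.
CATEGORIES = [
    ("surf", "Surfing"),
    ("swim", "Swimming/Wading"),
    ("wad", "Swimming/Wading"),
    ("fish", "Fishing"),
]


def categorize(activity):
    """Map a raw activity string to a coarse category."""
    if not activity or not activity.strip():
        return "Unknown"
    text = activity.strip().lower()
    for keyword, label in CATEGORIES:
        if keyword in text:
            return label
    return "Other"


activity_counts = Counter(categorize(a) for a in raw_activities)

# Print categories from most to least frequent.
for label, n in activity_counts.most_common():
    print(f"{label}: {n}")
```

Substring matching is deliberately crude: it lumps "Body surfing" in with "Surfing", which may or may not be the grouping a given analysis wants.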

3. Data Quality and Limitations

  • Finding: The raw GSAF dataset required extensive cleaning, as over 80% of records were unusable until key columns (like the Injury column for fatality data) were successfully coerced and imputed.
  • Insight: This underscores the critical need for standardized, mandatory global reporting of shark incidents to improve data reliability for future risk modeling.
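To illustrate the kind of coercion involved, here is a minimal Pandas sketch that derives a boolean fatality flag from a free-text `Injury` column and normalizes country names. The rows are invented for the example, the `is_fatal` column name is hypothetical, and treating a missing `Injury` value as non-fatal is an assumption rather than the project's documented choice.

```python
import pandas as pd

# Invented rows for illustration only; not real GSAF records.
df = pd.DataFrame({
    "Country": ["USA", "AUSTRALIA ", "usa", None, "Australia"],
    "Injury": [
        "FATAL",
        "Lacerations to left leg",
        "Fatal, body not recovered",
        None,
        "Non-fatal bite to foot",
    ],
})

# Normalize country names: trim whitespace, upper-case, label missing values.
df["Country"] = df["Country"].str.strip().str.upper().fillna("UNKNOWN")

# Coerce Injury to lower-case text; missing values become empty strings,
# which means they end up flagged as non-fatal (an assumption).
injury = df["Injury"].fillna("").str.lower()

# Flag rows mentioning "fatal", but exclude "non-fatal" / "nonfatal".
df["is_fatal"] = injury.str.contains("fatal") & ~injury.str.contains(r"non[- ]?fatal")

# Count fatal attacks per country, highest first.
fatal_by_country = (
    df.groupby("Country")["is_fatal"].sum().sort_values(ascending=False)
)
print(fatal_by_country)
```

The explicit "non-fatal" exclusion matters: a plain substring search for "fatal" would otherwise count those rows as fatalities.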

📊 Visualization of Key Data

Top 5 Countries by Fatal Shark Attacks

This chart visually demonstrates the difference in risk severity, highlighting Australia's high number of recorded fatalities.

Bar chart showing the top 5 countries by the number of fatal shark attacks, with Australia leading at 310


🛠️ Technology Stack

| Component | Purpose |
| --- | --- |
| Python | Core programming language |
| Pandas | Data Cleaning and Analysis |
| Matplotlib / Seaborn | Data Visualization |
| Jupyter Notebook | Interactive Analysis |
| Git & GitHub | Version Control & Hosting |
