# JBI100 Visualization 
### Academic year 2024-2025

## Incidents and Accidents
Data sources:

- Australian Shark Incidents (https://github.com/cjabradshaw/AustralianSharkIncidentDatabase)

Data dictionaries and additional info can be found in the respective data folders.
Note: you only need to select one dataset for your project.

This visualization tool aims to inform the general public, particularly beachgoers, about shark incidents in Australia. It seeks to replace fear and misinformation with data-driven understanding, empowering individuals to make informed decisions about ocean safety. The visualization will allow users to explore the historical and geographical distribution of shark incidents, understand the factors associated with these incidents (e.g., time of year, activity, species involved), and learn about the relative risk at different locations and times.

```python
# Import libraries
import pandas as pd
import plotly.express as px
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
import os

# Show all columns when displaying tables
pd.set_option('display.max_columns', None)

# If you receive a 'ModuleNotFoundError', install the corresponding library.
# From within Jupyter this can be done with '%pip install lib',
# where lib is the name of the missing library.
```

```python
# Load the data

# Australian Shark Incidents Data
df_shark = pd.read_excel('Australian Shark-Incident Database Public Version.xlsx', index_col=0)
```
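The `describe()` output further down contains an `Unnamed: 59` column holding a single value, which suggests a stray cell to the right of the spreadsheet's table. A minimal sketch for dropping such columns right after loading, assuming that no column whose header starts with `Unnamed` carries meaningful data. The helper `drop_unnamed_columns` and the toy frame are hypothetical.

```python
import pandas as pd

def drop_unnamed_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df without columns whose header starts with 'Unnamed'."""
    # pandas labels spreadsheet columns that have no header as 'Unnamed: <position>'
    keep = ~df.columns.astype(str).str.startswith("Unnamed")
    return df.loc[:, keep].copy()

# Hypothetical toy frame mimicking the trailing header-less column
toy = pd.DataFrame({
    "Incident.year": [1791, 1803],
    "Victim.injury": ["fatal", "injured"],
    "Unnamed: 59": [None, None],
})
clean = drop_unnamed_columns(toy)
print(list(clean.columns))
```

On the real data this would be `df_shark = drop_unnamed_columns(df_shark)`. It is worth inspecting the lone value in `Unnamed: 59` first, in case the data dictionary explains it.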

## Explore Shark Data

```python
#df_shark.sample(50)
df_shark.head(5)
```

```
UIN,Incident.month,Incident.year,Victim.injury,State,Location,Latitude,Longitude,Site.category,Site.category.comment,Shark.common.name,Shark.scientific.name,Shark.identification.method,Shark.identification.source,Shark.length.m,Basis.for.length,Provoked/unprovoked,Provocative.act,No.sharks,Victim.activity,Fish.speared?,Commercial.dive.activity,Object.of.bite,Present.at.time.of.bite,Direction.first.strike,Shark.behaviour,Victim.aware.of.shark,Shark.captured,Injury.location,Injury.severity,Victim.gender,Victim.age,Victim.clothing,Clothing.coverage,Dominant.clothing.colour,Other.clothing.colour,Clothing.pattern,Fin.colour,Diversionary.action.taken,Diversionary.action.outcome,People <3m,People 3-15m,Time.of.incident,Depth.of.incident.m,Teeth.recovered,Time.in.water.min,Water.temperature.°C,Total.water.depth.m,Water.visability.m,Distance.to.shore.m,Spring.or.neap.tide,Tidal.cycle,Wind.condition,Weather.condition,Air.temperature.°C,Personal.protective.device,Deterrent.brand.and.type,Data.source,Reference,Unnamed: 59
1,1,1791,fatal,NSW,near sydney,-33.866667,151.2,coastal,,white shark,Carcharodon carcharias,"bite analysis, shark behaviour, geographical l...",,,,unprovoked,,,swimming,,,,,,,,,torso,major lacerations,female,,,,,,,,,,,,,0.0,,,,,,,,,,,,,,book,"shark&survl, whitley 1958, book ref 1793",
2,3,1803,injured,WA,"hamelin bay, faure island",-25.833333,113.883333,coastal,,tiger shark,Galeocerdo cuvier,"bite analysis, shark behaviour, geographical l...",,,,unprovoked,,1.0,swimming,,,,,,swam at victim,Y,,,,male,,,,,,,,pushed at shark,,,,,0.0,,1.0,,,,,,,,,,,,book,"balgridge,green,taylor,whitley 1940",
3,1,1807,injured,NSW,"cockle bay, sydney harbour",-33.866667,151.2,estuary/harbour,,bull shark,Carcharhinus leucas,"bite analysis, shark behaviour",,,,unprovoked,,1.0,swimming,,,,,,bit victim on wrist,,,"arm, hand",minor lacerations,male,,,,,,,,,,,,,,,,,,,,,,,,,,,media outlet,sydney gazette 18.1.1807,
4,1,1820,fatal,TAS,"sweetwater point, pitt water",-42.8,147.533333,coastal,,,,,,,,provoked,,1.0,swimming,,,,catch,,bit victim on leg,N,,leg,major lacerations,male,,,,,,,,,,,,,1.0,,,,,,100.0,,,,,,,,witness account,"shark&survl, c. black researcher",
5,1,1825,injured,NSW,"kirribili point, sydney harbour",-33.85,151.216667,estuary/harbour,,bull shark,Carcharhinus leucas,"bite analysis, shark behaviour, geographical l...",,,,unprovoked,,1.0,swimming,,,,,,bit legs,,,leg,minor lacerations,male,15.0,,,,,,,,,,,,,,,,,,,,,,,,,,media outlet,maitland daily mercury 13.11.1899,
```


```python
df_shark.describe()
```

```
,Incident.month,Incident.year,Shark.length.m,No.sharks,Victim.age,People <3m,People 3-15m,Time.of.incident,Depth.of.incident.m,Time.in.water.min,Water.temperature.°C,Total.water.depth.m,Water.visability.m,Distance.to.shore.m,Spring.or.neap.tide,Air.temperature.°C,Unnamed: 59
count,1233.0,1233.0,590.0,1140.0,723.0,97.0,83.0,522.0,520.0,230.0,91.0,228.0,70.0,360.0,0.0,40.0,1.0
mean,5.939984,1968.518248,2.696855,1.034211,28.26971,1.556701,3.289157,1281.689655,2.13,59.37,20.981319,5.961404,10.371429,3176.019444,,24.175,415438758.0
std,4.084692,48.451842,1.206209,0.342569,13.963268,1.561043,5.511695,409.954877,5.474775,253.578818,4.12709,9.286013,22.023815,21519.531041,,4.914069,
min,1.0,1791.0,0.3,1.0,0.0,0.0,0.0,130.0,0.0,0.1,0.3,0.5,0.0,1.0,,10.0,415438758.0
25%,2.0,1933.0,1.8,1.0,17.0,1.0,1.0,933.75,0.0,3.0,19.0,1.0,1.0,30.0,,22.0,415438758.0
50%,5.0,1986.0,2.6,1.0,25.0,1.0,2.0,1300.0,0.0,10.0,21.0,2.0,5.0,80.0,,25.0,415438758.0
75%,10.0,2011.0,3.5,1.0,37.0,2.0,3.0,1620.0,1.0,30.0,23.0,7.0,10.0,200.0,,27.0,415438758.0
max,12.0,2024.0,6.0,10.0,84.0,12.0,40.0,2330.0,45.0,2160.0,29.0,80.0,150.0,280000.0,,35.0,415438758.0
```
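`Time.of.incident` ranges from 130 to 2330 in the summary above, which looks like 24-hour clock times stored as HHMM numbers. Assuming that encoding (worth confirming in the data dictionary), the hour of day can be derived for time-of-day plots. Note that only 522 of the 1233 incidents have a recorded time. `hhmm_to_hour` is a hypothetical helper.

```python
import pandas as pd

def hhmm_to_hour(times: pd.Series) -> pd.Series:
    """Convert 24-hour HHMM numbers (e.g. 1330.0) to the hour of day (13).

    Assumes the column stores clock times as HHMM; missing values stay missing.
    """
    numeric = pd.to_numeric(times, errors="coerce")
    # Integer-divide by 100 to drop the minutes; nullable Int64 keeps gaps as <NA>
    return (numeric // 100).astype("Int64")

# Hypothetical sample values within the range reported by describe()
sample = pd.Series([130.0, 933.0, 1300.0, None, 2330.0])
print(hhmm_to_hour(sample))
```

For example, `df_shark['Incident.hour'] = hhmm_to_hour(df_shark['Time.of.incident'])`, where `Incident.hour` is a new, hypothetical column name.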


```python
fig = px.scatter(df_shark, x="Incident.year", y="Victim.age",
                 width=1000, height=800)
fig.show()
```

```python
# Plot histogram of shark incidents by year
fig = px.histogram(df_shark, x="Incident.year",
                   width=1000, height=800)
fig.show()
```
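The record spans 1791 to 2024, and half of the incidents occurred in 1986 or later (the median year in the summary above), so a per-decade count can be easier to read than yearly bins. A minimal sketch, assuming `Incident.year` holds whole years; `incidents_per_decade` is a hypothetical helper. Rising counts in recent decades may partly reflect better reporting and more people in the water, not only more shark activity.

```python
import pandas as pd

def incidents_per_decade(years: pd.Series) -> pd.Series:
    """Count incidents per decade, labelled by the decade's first year (e.g. 1790)."""
    decades = (years // 10) * 10
    # value_counts tallies rows per decade; sort_index puts decades in chronological order
    return decades.value_counts().sort_index()

# Hypothetical years, including the five years in the head() output above
years = pd.Series([1791, 1803, 1807, 1820, 1825, 2011, 2019, 2024])
counts = incidents_per_decade(years)
print(counts)
```

The result can be passed to Plotly, e.g. `px.bar(x=counts.index, y=counts.values)`.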

```python
# (c) Data Parsing Function:
def get_data_at_position(row, col):
    """Retrieves data at a specified row and column (integer positions)."""
    try:
        return df_shark.iloc[row, col]  # Use iloc for integer-based indexing
    except IndexError:
        return None  # Return None if the index is out of bounds

# Example usage:
value = get_data_at_position(0, 2)  # Value at the 1st row and 3rd column (Victim.injury)
print(f"Value at (0, 2): {value}")
```

```
Value at (0, 2): fatal
```
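Positional lookups with `iloc` silently change meaning if columns are dropped or reordered during cleaning. A label-based alternative, sketched below, looks values up by incident ID (`UIN`, the index) and column name using `DataFrame.at`, which raises `KeyError` for unknown labels. The helper `get_value` and the two-row frame are hypothetical.

```python
import pandas as pd

# Hypothetical two-row frame indexed by UIN, mirroring the first columns of df_shark
toy = pd.DataFrame(
    {"Incident.month": [1, 3],
     "Incident.year": [1791, 1803],
     "Victim.injury": ["fatal", "injured"]},
    index=pd.Index([1, 2], name="UIN"),
)

def get_value(df: pd.DataFrame, uin, column: str):
    """Look up a value by incident ID (UIN) and column name; None if either is absent."""
    try:
        return df.at[uin, column]
    except KeyError:
        return None

print(get_value(toy, 1, "Victim.injury"))   # fatal
print(get_value(toy, 99, "Victim.injury"))  # None
```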


```python
# (d) Attribute Distribution and Frequency Counts:
def analyze_attribute_distribution(attribute_name):
    """Calculates and prints the distribution and frequency counts of an attribute."""
    if attribute_name not in df_shark.columns:
        print(f"Error: Attribute '{attribute_name}' not found in the dataset.")
        return
    
    print(f"\nDistribution and Frequency Counts for '{attribute_name}':")
    print(df_shark[attribute_name].value_counts(dropna=False))  # Include NaN values
    
    # Optional visualization (uncomment to plot a bar chart):
    # df_shark[attribute_name].value_counts().plot(kind='bar')
    # plt.title(f"Distribution of {attribute_name}")
    # plt.show()

# Example usage:
analyze_attribute_distribution('Incident.month')
analyze_attribute_distribution('Victim.injury')
```


```
Distribution and Frequency Counts for 'Incident.month':
Incident.month
1     228
12    160
2     147
3     122
11    118
10     93
4      92
9      61
6      58
7      56
5      51
8      47
Name: count, dtype: int64
```
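The monthly counts above peak in January, December and February. Grouping months into seasons makes this "time of year" pattern easier to communicate to beachgoers. A minimal sketch, assuming the southern-hemisphere meteorological seasons (December to February is summer); `SEASON_BY_MONTH` and `month_to_season` are hypothetical names, and the counts are copied from the table above.

```python
import pandas as pd

# Assumed grouping: southern-hemisphere meteorological seasons
SEASON_BY_MONTH = {
    12: "summer", 1: "summer", 2: "summer",
    3: "autumn", 4: "autumn", 5: "autumn",
    6: "winter", 7: "winter", 8: "winter",
    9: "spring", 10: "spring", 11: "spring",
}

def month_to_season(months: pd.Series) -> pd.Series:
    """Map month numbers (1-12) to seasons; values outside 1-12 become NaN."""
    return months.map(SEASON_BY_MONTH)

# Monthly counts copied from the Incident.month frequency table above
month_counts = pd.Series({1: 228, 2: 147, 3: 122, 4: 92, 5: 51, 6: 58,
                          7: 56, 8: 47, 9: 61, 10: 93, 11: 118, 12: 160})

# Sum the monthly counts within each season (grouping aligns on the month index)
season_counts = month_counts.groupby(month_to_season(month_counts.index.to_series())).sum()
print(season_counts.sort_values(ascending=False))
```

With these counts, summer accounts for 535 of the 1233 incidents (about 43%), followed by spring (272), autumn (265) and winter (161). Summer is also when most people are in the water, so the seasonal pattern reflects exposure as well as shark behaviour. On the full dataset, `month_to_season(df_shark['Incident.month'])` would give a season per incident.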

```
Distribution and Frequency Counts for 'Victim.injury':
Victim.injury
injured      746
fatal        255
uninjured    229
unknown        1
Injured        1
injury         1
Name: count, dtype: int64
```
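The `Victim.injury` counts above contain spelling variants (`Injured`, `injury`) of the `injured` category, which would show up as separate bars or colours in a chart. A minimal cleaning sketch, assuming both variants mean `injured`; `INJURY_ALIASES` and `normalize_injury` are hypothetical names.

```python
import pandas as pd

# Assumed mapping: 'injury' (and 'Injured' after lower-casing) mean 'injured'
INJURY_ALIASES = {"injury": "injured"}

def normalize_injury(labels: pd.Series) -> pd.Series:
    """Trim and lower-case injury labels, then merge known spelling variants."""
    cleaned = labels.astype("string").str.strip().str.lower()
    return cleaned.replace(INJURY_ALIASES)

# Labels as they appear in the Victim.injury frequency table above
sample = pd.Series(["injured", "Injured", "injury", "fatal", "uninjured", "unknown"])
print(normalize_injury(sample).value_counts())
```

Applied as `df_shark['Victim.injury'] = normalize_injury(df_shark['Victim.injury'])`, the three variants above would collapse into 748 `injured` rows (746 + 1 + 1).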


```python
# (f) Missing Values Analysis:
def analyze_missing_values():
    """Analyzes and prints the number of missing values per attribute and per entry."""
    print("\nMissing Values Analysis:")
    print(df_shark.isnull().sum())  # Missing values per attribute
    print("\nMissing Values per Entry:")
    print(df_shark.isnull().sum(axis=1))  # Missing values per row (entry)

    # Identify most relevant missing values:
    missing_ratios = df_shark.isnull().sum() / len(df_shark) * 100
    print("\nMissing Value Ratios (%):")
    print(missing_ratios)  # Percentage of missing values per attribute

    potentially_problematic = missing_ratios[missing_ratios > 10]  # Threshold - adjust as needed
    print("\nPotentially Problematic Missing Values (Attributes with > 10% missing):")
    print(potentially_problematic)

analyze_missing_values()
```

```
Missing Values Analysis:
Incident.month                    0
Incident.year                     0
Victim.injury                     0
State                             0
Location                          3
Latitude                          0
Longitude                         0
Site.category                     0
Site.category.comment          1185
Shark.common.name                57
Shark.scientific.name            66
Shark.identification.method     229
Shark.identification.source    1085
Shark.length.m                  643
Basis.for.length                751
Provoked/unprovoked               7
Provocative.act                1081
No.sharks                        93
Victim.activity                  33
Fish.speared?                  1229
Commercial.dive.activity       1096
Object.of.bite                  936
Present.at.time.of.bite         637
Direction.first.strike          929
Shark.behaviour                 250
Victim.aware.of.shark           618
Shark.captured                 1155
```
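`analyze_missing_values()` prints three long listings, including one line per incident for the per-entry counts. A sketch that combines the per-attribute count and percentage into a single table sorted from most to least incomplete, keeping the original 10% threshold as a flag. `missing_summary` and the toy frame are hypothetical.

```python
import numpy as np
import pandas as pd

def missing_summary(df: pd.DataFrame, threshold: float = 10.0) -> pd.DataFrame:
    """Missing count and percentage per column, most incomplete first."""
    summary = pd.DataFrame({
        "missing": df.isna().sum(),
        "percent": df.isna().mean() * 100,
    }).sort_values("percent", ascending=False)
    # Flag attributes above the threshold used in analyze_missing_values()
    summary["above_threshold"] = summary["percent"] > threshold
    return summary

# Hypothetical frame with 0%, 25% and 75% missing values
toy = pd.DataFrame({
    "State": ["NSW", "WA", "NSW", "TAS"],
    "Victim.age": [15.0, np.nan, 30.0, 42.0],
    "Shark.length.m": [np.nan, np.nan, 2.6, np.nan],
})
print(missing_summary(toy))
```

On the real data, `missing_summary(df_shark)` would list attributes such as `Spring.or.neap.tide`, which has no values at all according to the `describe()` output, at the top.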

### Handling missing values

```python
def compute_missing_values():
    # Fill in missing values for the attributes we intend to use in the visualization:
    # wherever Victim.activity or Shark.common.name is missing, use "Unknown".
    # Calling fillna on the DataFrame with a dict avoids chained assignment
    # (df[col].fillna(..., inplace=True)), which pandas 2.x flags with a
    # FutureWarning and which stops working under Copy-on-Write in pandas 3.0.
    df_shark.fillna({'Victim.activity': 'Unknown', 'Shark.common.name': 'Unknown'}, inplace=True)

compute_missing_values()
df_shark.head(5)
```
```
UIN,Incident.month,Incident.year,Victim.injury,State,Location,Latitude,Longitude,Site.category,Site.category.comment,Shark.common.name,Shark.scientific.name,Shark.identification.method,Shark.identification.source,Shark.length.m,Basis.for.length,Provoked/unprovoked,Provocative.act,No.sharks,Victim.activity,Fish.speared?,Commercial.dive.activity,Object.of.bite,Present.at.time.of.bite,Direction.first.strike,Shark.behaviour,Victim.aware.of.shark,Shark.captured,Injury.location,Injury.severity,Victim.gender,Victim.age,Victim.clothing,Clothing.coverage,Dominant.clothing.colour,Other.clothing.colour,Clothing.pattern,Fin.colour,Diversionary.action.taken,Diversionary.action.outcome,People <3m,People 3-15m,Time.of.incident,Depth.of.incident.m,Teeth.recovered,Time.in.water.min,Water.temperature.°C,Total.water.depth.m,Water.visability.m,Distance.to.shore.m,Spring.or.neap.tide,Tidal.cycle,Wind.condition,Weather.condition,Air.temperature.°C,Personal.protective.device,Deterrent.brand.and.type,Data.source,Reference,Unnamed: 59
1,1,1791,fatal,NSW,near sydney,-33.866667,151.2,coastal,,white shark,Carcharodon carcharias,"bite analysis, shark behaviour, geographical l...",,,,unprovoked,,,swimming,,,,,,,,,torso,major lacerations,female,,,,,,,,,,,,,0.0,,,,,,,,,,,,,,book,"shark&survl, whitley 1958, book ref 1793",
2,3,1803,injured,WA,"hamelin bay, faure island",-25.833333,113.883333,coastal,,tiger shark,Galeocerdo cuvier,"bite analysis, shark behaviour, geographical l...",,,,unprovoked,,1.0,swimming,,,,,,swam at victim,Y,,,,male,,,,,,,,pushed at shark,,,,,0.0,,1.0,,,,,,,,,,,,book,"balgridge,green,taylor,whitley 1940",
3,1,1807,injured,NSW,"cockle bay, sydney harbour",-33.866667,151.2,estuary/harbour,,bull shark,Carcharhinus leucas,"bite analysis, shark behaviour",,,,unprovoked,,1.0,swimming,,,,,,bit victim on wrist,,,"arm, hand",minor lacerations,male,,,,,,,,,,,,,,,,,,,,,,,,,,,media outlet,sydney gazette 18.1.1807,
4,1,1820,fatal,TAS,"sweetwater point, pitt water",-42.8,147.533333,coastal,,Unknown,,,,,,provoked,,1.0,swimming,,,,catch,,bit victim on leg,N,,leg,major lacerations,male,,,,,,,,,,,,,1.0,,,,,,100.0,,,,,,,,witness account,"shark&survl, c. black researcher",
5,1,1825,injured,NSW,"kirribili point, sydney harbour",-33.85,151.216667,estuary/harbour,,bull shark,Carcharhinus leucas,"bite analysis, shark behaviour, geographical l...",,,,unprovoked,,1.0,swimming,,,,,,bit legs,,,leg,minor lacerations,male,15.0,,,,,,,,,,,,,,,,,,,,,,,,,,media outlet,maitland daily mercury 13.11.1899,
```


### Exploring location details

```python
def check_common_areas():
    location_categories = df_shark['Location'].unique()
    print(f"There are {len(location_categories)} unique location categories.")
    print(location_categories)
```