# Shaaaaark.....Swim.....

 ![](https://www.rd.com/wp-content/uploads/2018/07/shutterstock_390021130.jpg?resize=768,512)
 
Shark attacks are a known hazard in coastal areas, especially for surfers and swimmers. This notebook explores a few questions about shark attacks:
* Which countries have reported the most shark attacks?
* Are men attacked more often than women?
* What activities typically lead to attacks?
* How many attacks have been fatal?
 
We have data going back to 1543. Let's jump right in!

In [1]:
import numpy as np  # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)
import os

# List all files under the read-only Kaggle input directory
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

/kaggle/input/shark-attack-dataset/list_coor_australia.csv
/kaggle/input/shark-attack-dataset/attacks.csv


In [2]:
# import libraries
import plotly.express as px
import plotly.graph_objects as go

pd.set_option('display.max_columns',100)
pd.set_option('display.max_rows',100)

In [3]:
# read data
df = pd.read_csv('/kaggle/input/shark-attack-dataset/attacks.csv')

# A timeline of Shark Attacks.....

Data is available since 1543, an impressive time span. As a first step, we will look at a timeline of the attacks. The yearly number of shark attacks has been trending upward, peaking at 145 attacks in 2015. This trend could simply reflect better data collection and reporting over the years rather than more attacks. Interestingly, there is also a peak in 1959 with 93 attacks.

In [4]:
df_year = df[df.Year >= 1543].groupby('Year')['Year'].count()

fig = px.area(df_year, x=df_year.index, y=df_year.values, title='Shark Attack Trend',
             labels={
                     "index": "Year",
                     "y": "Attack Count",})


fig.show()
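The same yearly counts can be built with `value_counts`, which also makes finding the peak year a one-liner. A minimal sketch on a tiny hypothetical frame standing in for `df`; the `0.0` and missing years are assumed placeholders for undated rows, which the `Year >= 1543` filter removes.

```python
import pandas as pd

# Hypothetical toy data standing in for attacks.csv.
# 0.0 and None mimic undated rows that the >= 1543 filter drops.
toy = pd.DataFrame({"Year": [2015.0, 2015.0, 1959.0, 0.0, None, 2014.0, 2015.0]})

# Keep plausible years, count rows per year, and sort chronologically.
valid_years = toy.loc[toy["Year"] >= 1543, "Year"].astype(int)
per_year = valid_years.value_counts().sort_index()

# Year with the most recorded attacks, and how many there were.
peak_year = per_year.idxmax()
peak_count = per_year.max()
print(peak_year, peak_count)
```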

# Which countries have had the most attacks?

The USA and Australia have reported the most shark attacks. The USA accounts for approximately 9% of the total attacks, which is pretty significant, and within the USA, Florida accounts for 47% of the attacks. That is not surprising, though I would have expected California, which ranks 3rd with only 290 reported attacks, to have more.

In [5]:
attack_country = df.groupby('Country')['Country'].count().sort_values(ascending=False)[:10]

fig = px.bar(attack_country, x=attack_country.index, y=attack_country.values,
            labels={
                     "index": " ",
                     "y": "Attacks"}, title= 'Top 10 Countries by number of Attacks')

fig.show()

In [6]:
usa_attacks = df[df.Country == 'USA'].groupby('Area')['Area'].count().sort_values(ascending=False)[:10]

fig = px.bar(usa_attacks, x=usa_attacks.index, y=usa_attacks.values,
            labels={
                     "index": " ",
                     "y": "Attacks"}, title= 'Top 10 Areas in USA with Shark Attacks')

fig.show()
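Percentage shares like the ones quoted above can be computed with `value_counts(normalize=True)`. A minimal sketch on a small hypothetical frame in place of `df`; `value_counts` drops missing values by default, so shares are relative to rows with a known country or area. Whether the text's percentages used exactly this denominator is an assumption.

```python
import pandas as pd

# Hypothetical toy rows; the notebook would use df["Country"] and df["Area"].
toy = pd.DataFrame({
    "Country": ["USA", "USA", "AUSTRALIA", "USA", "SOUTH AFRICA", "AUSTRALIA"],
    "Area": ["Florida", "Florida", "NSW", "California", "Western Cape", "Queensland"],
})

# Share of all attacks per country (fractions that sum to 1).
country_share = toy["Country"].value_counts(normalize=True)

# Share of US attacks per area, computed within the USA subset only.
usa_area_share = toy.loc[toy["Country"] == "USA", "Area"].value_counts(normalize=True)

print(country_share.round(2))
print(usa_area_share.round(2))
```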

# Are men attacked more than women?

Overall, males have been attacked 8 times as often as females. Are men more reckless, or could there be a bias in the reported data?

In [7]:
sex_attacks = df.groupby('Sex ')['Sex '].count()
sex_attacks = sex_attacks[(sex_attacks.index == 'M') | (sex_attacks.index=='F')]

fig = px.pie(sex_attacks, values=sex_attacks.values, names=sex_attacks.index, title='Shark Attacks by Gender')
fig.update_layout(height=500, width=600)
fig.show()

It is also possible that in earlier years women tended to stay home more while men went fishing, boating, etc. Let's look at the distribution between 2010 and 2018 only.

In [8]:
sex_attacks_2010_2018 = df[df.Year >= 2010].groupby('Sex ')['Sex '].count()
sex_attacks_2010_2018 = sex_attacks_2010_2018[(sex_attacks_2010_2018.index == 'M') | (sex_attacks_2010_2018.index=='F')]

fig = px.pie(sex_attacks_2010_2018, values=sex_attacks_2010_2018.values, names=sex_attacks_2010_2018.index, 
             title='Shark Attacks by Gender - Between 2010-2018')
fig.update_layout(height=500, width=600)
fig.show()

Attacks on males are still significantly more frequent: about 4 times those on females. This is quite interesting and may warrant a separate study.
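The "8 times" and "4 times" figures are ratios of male to female attack counts. A minimal sketch of that ratio, optionally restricted to recent years, on hypothetical toy rows. The helper name `male_female_ratio` is illustrative, and stripping whitespace from the values is an assumption about the raw data; the notebook itself keeps only exact `'M'` and `'F'` entries.

```python
import pandas as pd

# Hypothetical toy rows; the real column is named 'Sex ' (with a trailing space).
toy = pd.DataFrame({
    "Year": [1950, 1980, 2011, 2012, 2015, 2016, 2017, 2018],
    "Sex ": ["M", "M ", "M", "F", "M", "M", "F", None],
})

def male_female_ratio(frame, min_year=None):
    """Ratio of male to female attacks, optionally from min_year onward."""
    if min_year is not None:
        frame = frame[frame["Year"] >= min_year]
    # Strip stray whitespace (assumed to occur) so 'M ' counts as 'M'.
    counts = frame["Sex "].str.strip().value_counts()
    return counts.get("M", 0) / counts.get("F", 0)

print(male_female_ratio(toy))        # all years
print(male_female_ratio(toy, 2010))  # 2010 onward
```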

## What activities lead to shark attacks?

Let's explore the type of activities that typically lead to shark attacks.

In [9]:
activity_attack = df.groupby('Activity')['Activity'].count().sort_values(ascending=False)[:15]

fig = px.bar(activity_attack, x=activity_attack.index, y=activity_attack.values,
            labels={
                     "index": " ",
                     "y": "Attacks"}, title= 'Top 15 Activities that lead to shark attacks')

fig.show()

Surfing and swimming together account for 32% of the total attacks (where activity data is available). This is not surprising at all, given how often such attacks show up in movies and cartoons. Some less obvious activities have also led to shark attacks, such as pearl diving and spearfishing. One would assume that a spearfisher has a spear in hand to fend off an attack, so my guess is that these attacks were non-fatal. It would be good to see how many of the attacks for these activities were fatal. Bathing and wading indicate that there have been attacks in shallow water.
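The 32% figure depends on the denominator. A minimal sketch, assuming "where data is available" means rows with a non-missing `Activity`; the toy values are hypothetical. Exact matching with `isin` would miss any free-text variants of these labels in the real column.

```python
import pandas as pd

# Hypothetical toy Activity values; None marks rows with no recorded activity.
toy = pd.DataFrame({
    "Activity": ["Surfing", "Swimming", "Fishing", None, "Surfing",
                 "Spearfishing", "Bathing", None, "Wading", "Swimming"],
})

# Drop rows without an activity so the share is "where data is available".
known = toy["Activity"].dropna()

# Fraction of known activities that are exactly 'Surfing' or 'Swimming'.
share = known.isin(["Surfing", "Swimming"]).mean()
print(f"{share:.0%}")
```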

# How many attacks were fatal?

It would be interesting to look at the number of attacks that were fatal. Should we worry about going surfing or swimming in Florida?

In [10]:
fatal_attacks = df.groupby('Fatal (Y/N)')['Fatal (Y/N)'].count()
fatal_attacks = fatal_attacks[(fatal_attacks.index== 'N') | (fatal_attacks.index== 'Y')]

attack_map = {'Y':'Fatal Attack', 'N':'Non Fatal Attack'}
fatal_attacks.index = fatal_attacks.index.map(attack_map)

fig = px.bar(fatal_attacks, x=fatal_attacks.values, y=fatal_attacks.index, orientation='h', labels={'index':'','x':'Attack Count'}, 
             color = fatal_attacks.index)
fig.update_layout(height=500, width=800)
fig.show()


Only about 25% of attacks, i.e. 1 in 4, were fatal. That gives some confidence that not every attack is deadly. Of course, you could still lose an arm, which would be awful, and it appears such injuries are not counted as fatal in this dataset. Let's also look at fatal and non-fatal attacks for surfing, swimming, and spearfishing.
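The fatality rate is the share of `'Y'` among rows with a definite Y/N answer. A minimal sketch on hypothetical values; normalising case and whitespace before filtering is an assumption about the raw column (the notebook keeps only exact `'Y'` and `'N'`), so the rate on real data could shift slightly.

```python
import pandas as pd

# Hypothetical toy values; real data uses the 'Fatal (Y/N)' column.
toy = pd.Series(["N", "Y", "N", "N", " N", "UNKNOWN", "y", "N", "N", None],
                name="Fatal (Y/N)")

# Normalise whitespace and case (an assumption), then keep only Y/N answers.
clean = toy.str.strip().str.upper()
clean = clean[clean.isin(["Y", "N"])]

# Fraction of attacks marked fatal.
fatal_rate = (clean == "Y").mean()
print(f"{fatal_rate:.0%}")
```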

In [11]:
fatal_attacks_activity = df[(df['Fatal (Y/N)'] == 'Y') | (df['Fatal (Y/N)'] == 'N')]
fatal_attacks_activity = fatal_attacks_activity[(fatal_attacks_activity.Activity == 'Surfing') | (fatal_attacks_activity.Activity == 'Swimming') |
                                               (fatal_attacks_activity.Activity == 'Spearfishing')]
fatal_attacks_activity = fatal_attacks_activity.groupby(['Activity','Fatal (Y/N)'])['Country'].count().reset_index()

fatal_attacks_activity['Fatal (Y/N)'] = fatal_attacks_activity['Fatal (Y/N)'].map(attack_map)
fatal_attacks_activity = fatal_attacks_activity.rename(columns={'Country':"Count"}) 

fig = px.bar(fatal_attacks_activity, x='Count', y='Activity', orientation='h', labels={'index':'','Count':'Attack Count'}, 
             color = 'Fatal (Y/N)')
fig.show()

Fatal attacks are significantly less common for surfing, at only 5% of surfing attacks. In comparison, 39% of swimming attacks were fatal, which is significantly higher. There are a couple of hypotheses here. One is that it may be easier to spot a shark while surfing than while swimming, assuming you are already on the surfboard. Another is that surfers move faster and may get away more quickly, so more of their attacks end up non-fatal.

I would have imagined there would not be many fatal attacks while spearfishing. You have a spear, mate! But surprisingly, 13% of spearfishing attacks have been fatal.
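The per-activity percentages above are fatal attacks as a share of each activity's attacks. `pd.crosstab` with `normalize="index"` computes these row-wise shares directly; a minimal sketch on hypothetical rows standing in for the filtered notebook data.

```python
import pandas as pd

# Hypothetical toy rows standing in for the filtered notebook data.
toy = pd.DataFrame({
    "Activity": ["Surfing"] * 4 + ["Swimming"] * 3 + ["Spearfishing"] * 2,
    "Fatal (Y/N)": ["N", "N", "N", "Y", "Y", "N", "Y", "N", "N"],
})

# Row-normalised crosstab: each row gives the fatal / non-fatal split for one activity.
rates = pd.crosstab(toy["Activity"], toy["Fatal (Y/N)"], normalize="index")

# Percentage of fatal attacks per activity.
print((rates["Y"] * 100).round(1))
```

Compared with plotting raw counts, the normalised table makes activities with very different attack totals directly comparable.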

# Provoked vs Unprovoked Attacks

Let's dig a little deeper into provoked and unprovoked attacks.

**Provoked Attacks** 

Provoked attacks occur when a human touches, hooks, nets, or otherwise aggravates the animal. Incidents that occur outside of a shark's natural habitat, such as in aquariums and research holding pens, are considered provoked, as are all incidents involving captured sharks. Sometimes humans inadvertently provoke an attack, such as when a surfer accidentally hits a shark with a surfboard. So, technically, activities such as surfing and fishing should account for a larger share of provoked attacks. We can verify this with the activity data.

**Unprovoked Attacks**

Unprovoked attacks are initiated by the shark. They occur in a shark's natural habitat, on a live human, and without human provocation. In other words, if you enter their territory, they may attack.

In [12]:
prov_attack = df.groupby('Type')['Type'].count()
prov_attack = prov_attack[(prov_attack.index == 'Provoked') | (prov_attack.index == 'Unprovoked')]

fig = px.bar(prov_attack, x=prov_attack.values, y=prov_attack.index, orientation='h', labels={'index':'','x':'Attack Count'}, 
             color = prov_attack.index)
fig.update_layout(height=500, width=800)
fig.show()

**Wow....89% of the attacks are unprovoked attacks. Better be careful when you go out for a swim or a surf!**
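The 89% is relative only to rows typed `Provoked` or `Unprovoked`, since the cell above filters out every other category. A minimal sketch of that share on hypothetical values; `'Invalid'` here is just a placeholder for any other category.

```python
import pandas as pd

# Hypothetical toy 'Type' values; 'Invalid' stands in for any other category.
toy = pd.Series(["Unprovoked"] * 8 + ["Provoked"] + ["Invalid", None])

# Keep only the two categories being compared, as the notebook does.
subset = toy[toy.isin(["Provoked", "Unprovoked"])]

# Share of each category within that subset.
shares = subset.value_counts(normalize=True)
print(shares.round(3))
```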

Let's check whether certain activities lead to provoked attacks, per the definition above.

In [13]:
prov_activity = df[df.Type == 'Provoked'].groupby('Activity')['Activity'].count().sort_values(ascending=False)[:10]

fig = px.bar(prov_activity, x=prov_activity.values, y=prov_activity.index, orientation='h', labels={'index':'','x':'Attack Count'},
            title = 'Provoked Attacks by Activity')
fig.update_layout(height=500, width=800)
fig.show()

Fantastic!! 80% of the provoked attacks are due to fishing, spearfishing, and shark fishing, which agrees with the definition. But shark fishing... seriously... DUH, you are going to get attacked! I wonder whether these shark-fishing incidents are concentrated in particular countries.
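Both questions can be checked with a couple of lines: the share of provoked attacks from a set of activities, and a country breakdown for shark fishing. A minimal sketch on hypothetical rows (the countries and counts are made up); the exact label `'Shark fishing'` is an assumption about how the activity is spelled in the data.

```python
import pandas as pd

# Hypothetical toy rows; the notebook would use df[df.Type == 'Provoked'].
provoked = pd.DataFrame({
    "Activity": ["Fishing", "Spearfishing", "Shark fishing", "Fishing",
                 "Diving", "Shark fishing", "Fishing", "Feeding sharks",
                 "Spearfishing", "Fishing"],
    "Country": ["USA", "AUSTRALIA", "JAPAN", "USA", "USA",
                "AUSTRALIA", "SOUTH AFRICA", "USA", "USA", "AUSTRALIA"],
})

# Share of provoked attacks coming from the three fishing-related activities.
fishing_like = ["Fishing", "Spearfishing", "Shark fishing"]
fishing_share = provoked["Activity"].isin(fishing_like).mean()

# Count shark-fishing incidents by country.
shark_fishing_by_country = (
    provoked.loc[provoked["Activity"] == "Shark fishing", "Country"].value_counts()
)
print(fishing_share)
print(shark_fishing_by_country)
```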

## What type of sharks attack the most?

The Species column records the type of shark involved in each attack. It contains approximately 1,549 distinct values, which at first glance suggests a surprising number of shark species, although, as noted below, many entries are noisy variants of one another. Let's figure out shark behaviour by species.

In [14]:
#rename species column
df = df.rename(columns={'Species ':'Species'})

# species that attack most
species_attack = df.groupby('Species')['Species'].count().sort_values(ascending=False)[:15]

data = go.Bar(x=species_attack.index, y=species_attack.values,
              text=species_attack.values, textposition='auto', marker_color='red')

layout = go.Layout(title='Shark Attack by Species',
                   xaxis=dict(title='Species'),
                   yaxis=dict(title='Attack Count', visible=False),
                   paper_bgcolor='rgba(0,0,0,0)',
                   plot_bgcolor='rgba(0,0,0,0)')

fig = go.Figure(data=data, layout=layout)
fig.show()

Based on this preliminary analysis, white sharks appear to attack the most. There is noise in the data, with values such as 'Invalid', 'Shark Involvement not confirmed', etc. Sizes are also recorded inconsistently; for example, 5' shark and 1.5 m (5') refer to the same size. Cleaning this column would add more value to the analysis.
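One way to start that cleanup is to collapse free-text entries into a few coarse labels by keyword. A minimal sketch on hypothetical entries: the helper name `canonical_species`, the keyword list, and the `'Unspecified'`/`'Unknown'` labels are all illustrative assumptions, and a real cleanup would need a much longer species list and more careful handling of size-only entries.

```python
import pandas as pd

# Hypothetical raw entries mimicking the noise described above.
raw = pd.Series([
    "White shark", "White shark, 4 m", "Invalid", "5' shark",
    "1.5 m (5') shark", "Tiger shark", "Shark Involvement not confirmed",
    "Tiger shark, 3m", None,
])

def canonical_species(value):
    """Map a free-text entry to a coarse species label (keyword rules are assumptions)."""
    if not isinstance(value, str):
        return "Unknown"
    text = value.lower()
    if "invalid" in text or "not confirmed" in text:
        return "Invalid"
    for name in ("white", "tiger", "bull"):
        if f"{name} shark" in text:
            return f"{name.capitalize()} shark"
    # Entries that only give a size (e.g. 5' shark) carry no species information.
    return "Unspecified"

species = raw.map(canonical_species)
print(species.value_counts())
```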