Global Maritime Pirate Attacks (1993-2020) Exploratory Data Cleaning and Analysis
Introduction:
Piracy and robbery against ships remain a modern-day challenge for international trade and commerce. In this analysis, I will clean and analyze the data to answer a set of key questions aimed at identifying how to better prepare for these events.
Scenario:
The US Department of Transportation's Maritime Administration has come to me with questions to guide their future trade endeavors. They have been collecting data on piracy attacks since 1993 and want to predict when a boarding attack is more likely to happen. They have asked me to put together a model showing how the data they receive can help predict when attacks will occur, so that they can make better decisions on trade routes.
Citation: Benden, P., Feng, A., Howell, C. and Dalla Riva, G.V., 2021. Crime at Sea: A Global Database of Maritime Pirate Attacks (1993–2020). Journal of Open Humanities Data, 7, p.19. DOI.
and
https://www.kaggle.com/n0n5ense/global-maritime-pirate-attacks-19932020?select=country_indicators.csv
I am going to use the list of variables in each CSV that was documented by Vagif on Kaggle.
Variable Analysis
pirate_attacks.csv

Date [Key] - Date of Attack. Used as a key with the Country Matrix data frame.
Time - Time the attack took place, either in UTC or Local Time.
Longitude - Longitude where the attack took place.
Latitude - Latitude where the attack took place.
Attack Type - Either NA (Missing), Attempted, Boarding, or Hijacked.
Location Description - A text description of the location. With attacks taking place at sea, it is not as simple as just naming a city or town.
Nearest Country [Key] - The country code whose shore is closest to the attack. The resolution is around 1 km, but it can be much better depending on how detailed the mapping of the coast is in the vicinity.
EEZ Country [Key] - The Exclusive Economic Zone country code in which the attack took place, if it took place within an EEZ.
Shore Distance - Distance in kilometres to the shore from the attack location. This is the true geographic distance over the surface of the earth.
Shore Longitude - The longitude of the closest point on the shore to the attack.
Shore Latitude - The latitude of the closest point on the shore to the attack.
Attack Description - The text description of the attack if it exists.
Vessel Name - The name of the ship which was attacked if it is known.
Vessel Type - The type of vessel attacked if known.
Vessel Status - The status of the ship at the time it was attacked. Either NA (Missing), Berthed (Tied to a berth), Anchored (anchored at sea or in a harbour), or Steaming (ship underway).

country_indicators.csv

Country [Key] - The country in ISO3 country code format.
Corruption Index - Corruption Perceptions Index.
Homicide Rate - Total Intentional Homicides per 100,000 people.
GDP - Gross Domestic Product (US Dollars).
Total Fisheries Per Ton - Total Fisheries Production (metric tons).
Total Military - Total Number of Armed Forces personnel.
Population - Country Population.
Unemployment Rate - Percentage of the Country Unemployed.
Total GR - Total Government Revenue. An indication of how well the country collects taxes.
Industry - Industrial contribution to total GDP.

country_codes.csv

Country [Key] - The country in ISO3 country code format.
Region - The region the country is in.
Country Name - The English country name.
Key Variables
Country - How the datasets will be joined.

Longitude and Latitude - This could be used for creating a map visualization.

Time - When are events most frequently occurring?

Shore Distance - Does distance to shore have a strong effect on the probability of an event occurring?

Vessel Status - How does the probability of an event occurring change based on status?

Can these country indicators tell us something about which countries to avoid when creating trade routes? Do these countries match the longitudes and latitudes where attacks are frequent?

Corruption Index, Homicide Rate, GDP, Total Fisheries Per Ton, Total Military, Population, Unemployment Rate, Total GR, Industry
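
The join described under Country can be sketched as follows. This is only a minimal sketch under two assumptions that are not confirmed above: that the year of each attack (taken from date) should be matched to year in country_indicators.csv, and that nearest_country uses the same ISO3 codes as country. The small frames are invented stand-ins for the real CSVs, and the names attacks, indicators, and merged are hypothetical.

```python
import pandas as pd

# Toy stand-ins for pirate_attacks.csv and country_indicators.csv.
# All values are invented for illustration only.
attacks = pd.DataFrame({
    "date": ["2010-03-14", "2010-07-02", "2015-11-30"],
    "nearest_country": ["SOM", "IDN", "NGA"],
    "attack_type": ["Hijacked", "Boarding", "Attempted"],
})
indicators = pd.DataFrame({
    "country": ["SOM", "IDN", "NGA"],
    "year": [2010.0, 2010.0, 2014.0],
    "unemployment_rate": [19.0, 5.6, 4.6],
})

# Parse the date strings and derive the year used as the second join key.
attacks["date"] = pd.to_datetime(attacks["date"], errors="coerce")
attacks["year"] = attacks["date"].dt.year.astype("Int64")

# year is stored as float64 in the indicators file; cast it to a nullable
# integer so that both key columns share the same dtype.
indicators["year"] = indicators["year"].astype("Int64")

# Left join keeps every attack; indicator=True adds a _merge column that
# shows which attacks found a matching country-year row.
merged = attacks.merge(
    indicators.rename(columns={"country": "nearest_country"}),
    how="left",
    on=["nearest_country", "year"],
    indicator=True,
)
print(merged[["nearest_country", "year", "unemployment_rate", "_merge"]])
```

A left join is used so that attacks without a matching country-year row are kept and can be counted through the _merge column rather than silently dropped.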

Key Questions
1) What variables have the highest impact on probability of an attack?

2) Has there been an increase in attacks in certain areas? How about an increase overall?

3) Do variables associated with countries have an effect on their safety in terms of maritime trade?

4) Has the frequency of different types of attacks increased or decreased over time?

5) Is it common to have an ally close by where attacks are frequent? This could be useful if the US decides to be a bit riskier and go around areas frequently attacked.

6) To prepare our sailors, what Vessel Status is most likely to be hit by an attack?

7) Is there value in creating a continuous model to advise the Administration on what course of action to take? Does the data provide the answers the Administration needs?
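
As a rough illustration of how questions 4 and 6 might be approached, the sketch below counts attacks per year by attack type and computes the share of attacks by vessel status. It assumes the column names date, attack_type, and vessel_status from pirate_attacks.csv; the rows in toy are invented, and the names toy, type_by_year, and status_share are hypothetical.

```python
import pandas as pd

# Invented toy rows standing in for the pirate attacks data.
toy = pd.DataFrame({
    "date": ["2009-01-05", "2009-06-17", "2010-02-11", "2010-09-23", "2010-12-01"],
    "attack_type": ["Boarding", "Attempted", "Boarding", "Hijacked", "Boarding"],
    "vessel_status": ["Anchored", "Steaming", "Anchored", "Steaming", None],
})
toy["year"] = pd.to_datetime(toy["date"]).dt.year

# Question 4: number of attacks per year, split by attack type.
type_by_year = pd.crosstab(toy["year"], toy["attack_type"])

# Question 6: share of attacks by vessel status.
# value_counts drops missing values by default, so the shares are
# relative to attacks with a known status only.
status_share = toy["vessel_status"].value_counts(normalize=True)

print(type_by_year)
print(status_share)
```

Because vessel_status is missing for some rows, the shares describe attacks with a known status only, which is worth keeping in mind when interpreting them.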



In [1]:
import pandas as pd 
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [5]:
df_pirates = pd.read_csv(r'C:\Users\ragod\OneDrive\Escritorio\Proyectos_TB\Proyectos_TB\visualizacion\archive\pirate_attacks.csv')
df_codes = pd.read_csv(r'C:\Users\ragod\OneDrive\Escritorio\Proyectos_TB\Proyectos_TB\visualizacion\archive\country_codes.csv')
df_country_data = pd.read_csv(r'C:\Users\ragod\OneDrive\Escritorio\Proyectos_TB\Proyectos_TB\visualizacion\archive\country_indicators.csv')
#df_pirates['attack_type'].unique()

In [7]:
df_pirates.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7511 entries, 0 to 7510
Data columns (total 16 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   date                  7511 non-null   object 
 1   time                  1149 non-null   object 
 2   longitude             7511 non-null   float64
 3   latitude              7511 non-null   float64
 4   attack_type           7391 non-null   object 
 5   location_description  7503 non-null   object 
 6   nearest_country       7492 non-null   object 
 7   eez_country           7216 non-null   object 
 8   shore_distance        7511 non-null   float64
 9   shore_longitude       7511 non-null   float64
 10  shore_latitude        7511 non-null   float64
 11  attack_description    1173 non-null   object 
 12  vessel_name           6079 non-null   object 
 13  vessel_type           1173 non-null   object 
 14  vessel_status         6599 non-null   object 
 15  data_source          

In [10]:
df_country_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5899 entries, 0 to 5898
Data columns (total 11 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   country                  5899 non-null   object 
 1   year                     5865 non-null   float64
 2   corruption_index         3377 non-null   float64
 3   homicide_rate            3420 non-null   float64
 4   GDP                      5379 non-null   float64
 5   total_fisheries_per_ton  4991 non-null   float64
 6   total_military           4133 non-null   float64
 7   population               5858 non-null   float64
 8   unemployment_rate        5055 non-null   float64
 9   totalgr                  4119 non-null   float64
 10  industryofgdp            4875 non-null   float64
dtypes: float64(10), object(1)
memory usage: 507.1+ KB
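
The non-null counts above show that several indicator columns are far from complete (for example, corruption_index has 3377 non-null values out of 5899 rows). One way to rank columns by how much is missing is sketched below; the helper missing_share is hypothetical, and the toy frame is invented rather than taken from the real file.

```python
import numpy as np
import pandas as pd


def missing_share(df: pd.DataFrame) -> pd.Series:
    """Return the fraction of missing values per column, largest first."""
    return df.isna().mean().sort_values(ascending=False)


# Invented toy frame that mimics the shape of the country indicators data.
toy = pd.DataFrame({
    "country": ["ABW", "AFG", "AGO", "ALB"],
    "corruption_index": [np.nan, 15.0, np.nan, 36.0],
    "population": [1.0e5, 3.0e7, 2.5e7, np.nan],
})

print(missing_share(toy))
```

Applied to df_country_data, the same call would give a quick view of which indicators may be too sparse to use without imputation or filtering.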


In [11]:
df_codes.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 217 entries, 0 to 216
Data columns (total 3 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   country       217 non-null    object
 1   region        217 non-null    object
 2   country_name  217 non-null    object
dtypes: object(3)
memory usage: 5.2+ KB


In [12]:
df_codes

     country                      region  country_name
0        ABW   Latin America & Caribbean         Aruba
1        AFG                  South Asia   Afghanistan
2        AGO          Sub-Saharan Africa        Angola
3        ALB       Europe & Central Asia       Albania
4        AND       Europe & Central Asia       Andorra
...      ...                         ...           ...
212      XKX       Europe & Central Asia        Kosovo
213      YEM  Middle East & North Africa   Yemen, Rep.
214      ZAF          Sub-Saharan Africa  South Africa
215      ZMB          Sub-Saharan Africa        Zambia
