## This notebook outlines efforts to augment our original piracy event dataframe with the meteorological data surrounding those events, to see whether any trends exist between weather factors and acts of piracy. The test dataset came from the Copernicus Marine Data Store. The method proved effective, but due to time constraints only wave heights were added, and only from a dataset covering 3 years of the 30-year period. Other datasets on the Copernicus site provide the desired information; future work would tune this method to extract the pertinent variables from them.

# Copernicus Marine Data Store: (https://data.marine.copernicus.eu/products)
# Ocean Wave Data 2021-2024: https://data.marine.copernicus.eu/product/GLOBAL_ANALYSISFORECAST_WAV_001_027/description

# NOTE: This notebook demonstrates the process used to arrive at the final product. The clean code that takes the dataset and outputs a CSV with the augmented wave heights is in the GitHub repository:

#                                        Copernicus_Finaly.py

In [4]:
#!pip install copernicusmarine
#!pip install netCDF4
from IPython.display import IFrame
%matplotlib inline
import matplotlib.pyplot as plt
import pydap
import pandas as pd
import numpy as np
import math
import datetime
from datetime import timedelta
import xarray as xr
import copernicusmarine  as copernicus_marine
# To avoid warning messages
import warnings
warnings.filterwarnings('ignore')

In [5]:
## Dataset ID for the GLOBAL_ANALYSISFORECAST_WAV_001_027 wave product (3-hourly instantaneous fields on a 0.083° grid)
datasetID = 'cmems_mod_glo_wav_anfc_0.083deg_PT3H-i'

In [6]:
#Opening the ~24 GB dataset this way streams it from the Copernicus server instead of downloading it to my computer,
#so I can work with it here in the notebook and save only the data I actually want to a separate file later.
#Drawback: work in progress can be lost if the connection to the server goes down.
#Only three variables matter here: VHM0, VMDR and VCMX
#This data only covers 30 Sep 2021 to 25 Mar 2024 (the loaded time axis runs 2021-10-01T03:00 to 2024-03-28T00:00),
#so it will need extending with other datasets or can just serve as a use case
DS = copernicus_marine.open_dataset(dataset_id = datasetID)
DS

#Credentials are entered at the interactive prompt below; never hard-code them in the notebook


INFO - 2024-03-19T03:44:32Z - Dataset version was not specified, the latest one was selected: "202311"
INFO - 2024-03-19T03:44:32Z - Dataset part was not specified, the first one was selected: "default"
INFO - 2024-03-19T03:44:35Z - Service was not specified, the default one was selected: "arco-geo-series"
username: mgalvan
password: ········


The variable I care about:
1. VHM0 [m]
    Spectral significant wave height (Hm0)
    sea_surface_wave_significant_height
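For reference, the spectral significant wave height is computed from the zeroth moment of the wave energy spectrum. The standard definition is sketched below, where $E(f,\theta)$ denotes the directional wave energy spectrum over frequency $f$ and direction $\theta$ (this notation is assumed here, not taken from the product documentation):

$$
m_0 = \int_0^{2\pi}\!\int_0^{\infty} E(f,\theta)\,df\,d\theta, \qquad H_{m0} = 4\sqrt{m_0}
$$

$H_{m0}$ approximates the mean height of the highest third of the waves, so a single VHM0 value already summarizes the sea state in one grid cell at one time step.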

In [7]:
#get full list of variables available to dataset
DS.data_vars

Data variables:
    VCMX       (time, latitude, longitude) float32 ...
    VHM0       (time, latitude, longitude) float32 ...
    VHM0_SW1   (time, latitude, longitude) float32 ...
    VHM0_SW2   (time, latitude, longitude) float32 ...
    VHM0_WW    (time, latitude, longitude) float32 ...
    VMDR       (time, latitude, longitude) float32 ...
    VMDR_SW1   (time, latitude, longitude) float32 ...
    VMDR_SW2   (time, latitude, longitude) float32 ...
    VMDR_WW    (time, latitude, longitude) float32 ...
    VPED       (time, latitude, longitude) float32 ...
    VSDX       (time, latitude, longitude) float32 ...
    VSDY       (time, latitude, longitude) float32 ...
    VTM01_SW1  (time, latitude, longitude) float32 ...
    VTM01_SW2  (time, latitude, longitude) float32 ...
    VTM01_WW   (time, latitude, longitude) float32 ...
    VTM02      (time, latitude, longitude) float32 ...
    VTM10      (time, latitude, longitude) float32 ...
    VTPK       (time, latitude, longitude) float32 ...

In [8]:
#Get list of dimensions
DS.coords

Coordinates:
  * latitude   (latitude) float64 -80.0 -79.92 -79.83 ... 89.83 89.92 90.0
  * longitude  (longitude) float64 -180.0 -179.9 -179.8 ... 179.8 179.8 179.9
  * time       (time) datetime64[ns] 2021-10-01T03:00:00 ... 2024-03-28

In [9]:
#Get info on specific variable
DS.VHM0

In [10]:
#info on specific dimensions:
DS.time, DS.latitude

(<xarray.DataArray 'time' (time: 7272)>
 array(['2021-10-01T03:00:00.000000000', '2021-10-01T06:00:00.000000000',
        '2021-10-01T09:00:00.000000000', ..., '2024-03-27T18:00:00.000000000',
        '2024-03-27T21:00:00.000000000', '2024-03-28T00:00:00.000000000'],
       dtype='datetime64[ns]')
 Coordinates:
   * time     (time) datetime64[ns] 2021-10-01T03:00:00 ... 2024-03-28
 Attributes:
     valid_min:  2021-10-01T03:00:00.000000000
     valid_max:  2024-03-28T00:00:00.000000000,
 <xarray.DataArray 'latitude' (latitude: 2041)>
 array([-80.      , -79.916667, -79.833333, ...,  89.833333,  89.916667,
         90.      ])
 Coordinates:
   * latitude  (latitude) float64 -80.0 -79.92 -79.83 -79.75 ... 89.83 89.92 90.0
 Attributes:
     axis:           Y
     long_name:      latitude coordinate
     standard_name:  latitude
     step:           0.08333587646484375
     units:          degrees_north
     valid_min:      -80.0
     valid_max:      90.0)
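As a sanity check, the sizes printed above (7272 time steps, 2041 latitudes) follow from the grid spacing encoded in the dataset ID. This sketch assumes a regular 3-hour time step ("PT3H") and a 1/12° latitude spacing ("0.083deg"), as the dataset ID and the coordinate printout suggest:

```python
import numpy as np
import pandas as pd

# Time axis: regular 3-hourly steps, endpoints copied from the DS.time printout
times = pd.date_range("2021-10-01 03:00", "2024-03-28 00:00", freq=pd.offsets.Hour(3))

# Latitude axis: 1/12-degree spacing from -80 to 90 inclusive
lat_step = 1 / 12
n_lat = int(round((90.0 - (-80.0)) / lat_step)) + 1

print(len(times), n_lat)  # compare with the sizes reported by DS.time and DS.latitude
```

If either number disagreed with the printout, that would suggest gaps in the time axis or an irregular grid, both of which would affect the buffer choices made later.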

In [11]:
#read in clean dataset
piracy_df = pd.read_csv('Data_Files\[Clean] IMO Piracy - 2000 to 2022 (PDV 01-2023).csv')
#drop lat/long nulls: actually useful info on map
piracy_df_map = piracy_df.dropna(subset=['Latitude','Longitude'])

In [12]:
piracy_df

,Incident Date,Ship Name,Ship Flag,Ship Type,Area,Latitude,Longitude,Consequences to Crew,Part of Ship Raided,Ship Status,Weapons Used,Flag - Crew Injuries,Flag - Crew Held Hostage,Flag - Crew Missing,Flag - Crew Deaths,Flag - Crew Assaulted
0,3/18/2010,AL-ASA'A,Yemen,Dhow,In international waters,,,Ship Hijacked,Not Stated,Not Stated,None or Not Reported,False,True,False,False,False
1,5/25/2010,AL JAWAT,Yemen,Dhow,In international waters,,,Ship Hijacked,Not Stated,Steaming,None or Not Reported,False,False,False,False,False
2,2/13/2011,AL FARDOUS,Yemen,Fishing vessel,In territorial waters,,,Ship Hijacked,Not Stated,Steaming,None or Not Reported,False,False,False,False,False
3,4/16/2011,ABDI KHAN,Yemen,Fishing vessel,In international waters,11.900000,54.083333,Ship Hijacked,Not Stated,Steaming,None or Not Reported,False,True,False,False,False
4,1/14/2012,AL WASIL,Yemen,Dhow,In international waters,,,Ship Hijacked,Not Stated,Steaming,None or Not Reported,False,True,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4809,12/30/2009,GULF ELAN,Bahamas,Chemical tanker,In port area,22.690000,113.696667,No Consequences or Not Stated,Engine Room,At Anchor,None or Not Reported,False,False,False,False,False
4810,11/7/2008,CEC FUTURE,Bahamas,General cargo ship,In international waters,12.766667,45.933333,Ship Hijacked,Engine Room,Steaming,None or Not Reported,False,True,False,False,False
4811,2/13/2006,ASPEN ARROW,Bahamas,General cargo ship,In port area,,,No Consequences or Not Stated,Engine Room,At Anchor,None or Not Reported,False,False,False,False,False
4812,10/24/2009,ELLEN S,Antigua and Barbuda,Container ship,In territorial waters,20.641667,106.880000,Not Reported,Engine Room,At Anchor,None or Not Reported,False,False,False,False,False


In [14]:
#convert piracy_df_map incident dates to datetimes
piracy_df_map['Incident Date'] = pd.to_datetime(piracy_df_map['Incident Date'])
piracy_df_map['Incident Date']

3      2011-04-16
5      2012-03-02
6      2012-04-21
10     2009-04-26
11     2018-12-04
          ...    
4808   2015-04-26
4809   2009-12-30
4810   2008-11-07
4812   2009-10-24
4813   2006-06-05
Name: Incident Date, Length: 2810, dtype: datetime64[ns]
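`pd.to_datetime` infers the date format in the cell above. Passing the format explicitly makes the month/day order unambiguous, since the raw CSV stores dates such as `3/18/2010` (month/day/year). A minimal sketch on hypothetical toy rows that only mimic the CSV's columns:

```python
import numpy as np
import pandas as pd

# Hypothetical rows shaped like the cleaned IMO piracy CSV
toy = pd.DataFrame({
    "Incident Date": ["3/18/2010", "4/16/2011", "5/28/2022"],
    "Ship Name": ["AL-ASA'A", "ABDI KHAN", "Magnum Energy"],
    "Latitude": [np.nan, 11.9, 1.141666667],
    "Longitude": [np.nan, 54.083333, 103.475],
})

# Keep only events that can be placed on the weather grid; .copy() makes this an
# independent frame, so assigning a column below does not trigger SettingWithCopyWarning
located = toy.dropna(subset=["Latitude", "Longitude"]).copy()

# Explicit month/day/year format instead of letting pandas guess
located["Incident Date"] = pd.to_datetime(located["Incident Date"], format="%m/%d/%Y")
print(located[["Ship Name", "Incident Date"]])
```

With an explicit format, any malformed date raises an error instead of being silently parsed with the day and month swapped.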

# Testing process on one Piracy Event:

Row entry:

5/28/2022	Magnum Energy	Marshall Islands	Bulk carrier	In international waters	1.141666667	103.475	Not Reported	Store Rooms	Steaming	Knives	FALSE	FALSE	FALSE	FALSE	FALSE


In [15]:
piracy_df_names = piracy_df_map.set_index('Ship Name')

In [16]:
piracy_df_names.loc['Magnum Energy']

Incident Date                   2022-05-28 00:00:00
Ship Flag                          Marshall Islands
Ship Type                              Bulk carrier
Area                        In international waters
Latitude                                   1.141667
Longitude                                   103.475
Consequences to Crew                   Not Reported
Part of Ship Raided                     Store Rooms
Ship Status                                Steaming
Weapons Used                                 Knives
Flag - Crew Injuries                          False
Flag - Crew Held Hostage                      False
Flag - Crew Missing                           False
Flag - Crew Deaths                            False
Flag - Crew Assaulted                         False
Name: Magnum Energy, dtype: object

# Step 1: Set the spatial and temporal buffers (to be tuned depending on whether data comes back)


# Step 2: Extract the lat, lon from piracy event

    
# Step 3: Create a subset of the wave data within the buffer around the Magnum Energy event





In [19]:
#Step 1
#first test the buffers on this one event, then put them into a loop
#buffers give data that straddles the event: a 0.1 x 0.1 degree lat/lon box and a 1-hour window (30 min before, 30 min after)
#tune the buffers to keep the subset as small as possible
time_buffer = pd.Timedelta(0.5, unit="h") #units: "d" day, "h" hour, "m" minute
lat_buffer = 0.05 #degrees
lon_buffer = 0.05 #degrees

#Step 2
#set the lat, lon and time to the Magnum Energy event
event = piracy_df_names.loc['Magnum Energy']
lat = event['Latitude']
lon = event['Longitude']
time = event['Incident Date']
time

Timestamp('2022-05-28 00:00:00')
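Why a ±0.05° buffer returns one or two grid points: the box is 0.1° wide, slightly more than the 1/12° (≈0.0833°) grid spacing, so each axis always contains at least one grid coordinate and never more than two. The brute-force check below assumes the longitude axis uses the same 1/12° spacing starting at -180°, which is consistent with the coordinate printout above but not confirmed from the product documentation. The same reasoning applies to time: a ±30 min window is only 1 h wide against a 3 h step, so it finds a reading here only because the incident dates fall at midnight, which coincides with a 00:00 model step.

```python
import numpy as np

def points_in_window(grid, center, half_width):
    """Count grid coordinates inside the closed window [center - half_width, center + half_width]."""
    grid = np.asarray(grid)
    return int(((grid >= center - half_width) & (grid <= center + half_width)).sum())

# Assumed grids with 1/12-degree spacing
lat_grid = np.linspace(-80.0, 90.0, 2041)          # -80 ... 90 inclusive
lon_grid = np.linspace(-180.0, 180.0, 4321)[:-1]   # -180 ... 179.9167

buffer = 0.05  # degrees, same as lat_buffer / lon_buffer above

# Magnum Energy: one latitude and one longitude fall inside the box
print(points_in_window(lat_grid, 1.141666667, buffer), points_in_window(lon_grid, 103.475, buffer))

# Wayne / Rig T20 (longitude 103.2935): two longitudes fall inside the box
print(points_in_window(lon_grid, 103.2935, buffer))

# Check many event positions: every count is 1 or 2
centers = np.linspace(-79.0, 89.0, 5001)
counts = {points_in_window(lat_grid, c, buffer) for c in centers}
print(counts)
```

This matches the two readings returned for Wayne / Rig T20 in the augmented dataframe further below.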

In [22]:
#Step 3
#Use the buffer to make a subset of the weather data for points around the event
lat_add = lat + lat_buffer
lat_subtract = lat - lat_buffer
lon_add = lon + lon_buffer
lon_subtract = lon - lon_buffer
time_add = time + time_buffer
time_subtract = time - time_buffer

#create my data subset for the bubble around this specific piracy event
subset_Magnum_Energy = DS['VHM0'].sel(
    latitude = slice(lat_subtract,lat_add),
    longitude = slice(lon_subtract,lon_add),
    time = slice(time_subtract, time_add))
subset_Magnum_Energy

### Inspecting the subset, the tuning was perfect (maybe by luck): I got one reading very close to the event location at that time.

### If the tuning is "imperfect" and more data points fall "around" the event, the wave heights (my principal variable of interest) are themselves statistical (significant) wave heights, so I can average them further to get a rough estimate of the wave height (an indicator of sea state) at that time. Either way, the output is still one value for that event.
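A minimal sketch of that averaging step, assuming the subset's values have been pulled into a NumPy array (for example via `subset.values`). NaN cells, such as grid points over land, are skipped, and an empty or all-NaN subset yields NaN instead of an error. With xarray, the equivalent reduction on the subset itself is `subset.mean(skipna=True)`. The function name is hypothetical:

```python
import numpy as np

def summarize_wave_heights(values):
    """Collapse every VHM0 reading in a buffer box to one wave height in metres.

    NaN readings are ignored; returns NaN if nothing usable is left.
    """
    arr = np.asarray(values, dtype=float)
    if arr.size == 0 or np.isnan(arr).all():
        return float("nan")
    return float(np.nanmean(arr))

# A (time, latitude, longitude) block with one land (NaN) cell
block = np.array([[[0.93, 0.92], [np.nan, 0.95]]])
print(round(summarize_wave_heights(block), 3))
```

The explicit empty/all-NaN guard avoids the `RuntimeWarning` that `np.nanmean` emits on an all-NaN input.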

In [23]:
print(lat, lon)
#NOT HALF BAD MATEY - not sure my buffers will always filter down to a single point, but let's keep sailing
#also note: the report has only the day (no hour/minute), so this is the 00:00 reading for that date and conditions at the actual time may differ

1.141666667 103.475


In [25]:
df = subset_Magnum_Energy.to_dataframe()
df
#Notice there is a NaN value for the max wave height VCMX... don't really need it, or the wave direction for that matter,
#but it raises the question of what to do if a value is NaN and the buffer has to be expanded, potentially letting in
#more than one value for a particular coordinate. That is when I'd use the nearest method or an argmin approach (from Stack Overflow)
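For the NaN case, one option is to take the nearest grid cell that actually has a value. Below is a NumPy sketch of the argmin approach. It measures distance naively in degrees (no great-circle correction), which is an assumption that seems acceptable at these box sizes. xarray's built-in `.sel(..., method="nearest")` picks the nearest coordinate but does not skip NaN cells, which is what the masking here adds. The function name and toy grid are hypothetical:

```python
import numpy as np

def nearest_valid_value(field, lats, lons, lat, lon):
    """Return the value of the non-NaN grid cell closest to (lat, lon), or NaN if all are NaN.

    field has shape (len(lats), len(lons)); distance is squared degrees, not kilometres.
    """
    lat2d, lon2d = np.meshgrid(lats, lons, indexing="ij")
    dist = (lat2d - lat) ** 2 + (lon2d - lon) ** 2
    dist = np.where(np.isnan(field), np.inf, dist)  # NaN cells can never be the minimum
    if np.isinf(dist).all():
        return float("nan")
    i, j = np.unravel_index(np.argmin(dist), dist.shape)
    return float(field[i, j])

lats = np.array([1.0, 1.1, 1.2])
lons = np.array([103.4, 103.5, 103.6])
field = np.array([
    [np.nan, 0.2, 0.3],
    [np.nan, np.nan, 0.5],   # the cell nearest the event below is NaN (e.g. land)
    [0.6, 0.7, 0.8],
])

print(nearest_valid_value(field, lats, lons, 1.12, 103.53))
```

Here the closest cell (1.1, 103.5) is NaN, so the function falls back to the next-closest valid cell at (1.1, 103.6).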

                                VHM0
time       latitude  longitude
2022-05-28 1.166667  103.5      0.29


# Now build a function that builds these subsets and extracts the wave heights for each piracy event in our piracy dataframe. 

In [31]:
#Coverage of this wave data (30 Sep 2021 to 25 Mar 2024); events outside it get no value
#pd.Timestamp (rather than datetime.date) compares cleanly with the datetime64 'Incident Date' column
DS_start_date = pd.Timestamp('2021-09-30')
DS_end_date = pd.Timestamp('2024-03-25')

def get_wave_height(row):
    """Return the VHM0 readings in the buffer box around one piracy event,
    or None if the event is outside the dataset's coverage or the box is empty."""
    time = row['Incident Date']
    if not (DS_start_date <= time <= DS_end_date):
        return None
    lat = row['Latitude']
    lon = row['Longitude']

    #Use the buffers to define the box around the event
    lat_add = lat + lat_buffer
    lat_subtract = lat - lat_buffer
    lon_add = lon + lon_buffer
    lon_subtract = lon - lon_buffer
    time_add = time + time_buffer
    time_subtract = time - time_buffer

    #Subset the wave heights inside the box; this may hold one value, several, or none
    subset = DS['VHM0'].sel(
        latitude = slice(lat_subtract,lat_add),
        longitude = slice(lon_subtract,lon_add),
        time = slice(time_subtract, time_add))
    if subset.size == 0:
        return None

    #First time step and first latitude row: an array over longitude,
    #which is why some events below show two readings
    return subset.values[0][0]
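Calling `.sel` once per row means one request per event against the remote dataset. xarray also supports vectorized (pointwise) indexing: passing `DataArray` indexers that share a new dimension selects one point per event in a single call, rather than an outer product of all latitudes, longitudes and times. The sketch below demonstrates this on a small synthetic grid and swaps the buffer box for `method="nearest"`, so it returns exactly one value per event; whether nearest-point lookup is an acceptable substitute for the buffer box is a judgment call, and the toy grid and event values are invented for illustration:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for DS["VHM0"]: 2 time steps x 3 latitudes x 3 longitudes
vhm0 = xr.DataArray(
    np.arange(18, dtype=float).reshape(2, 3, 3),
    dims=("time", "latitude", "longitude"),
    coords={
        "time": pd.to_datetime(["2022-05-28 00:00", "2022-05-28 03:00"]),
        "latitude": [1.0, 1.1, 1.2],
        "longitude": [103.4, 103.5, 103.6],
    },
    name="VHM0",
)

# Hypothetical events (one row per piracy incident)
events = pd.DataFrame({
    "Incident Date": pd.to_datetime(["2022-05-28 00:00", "2022-05-28 02:00"]),
    "Latitude": [1.14, 1.01],
    "Longitude": [103.48, 103.59],
})

# Indexers sharing the new "event" dimension select one grid point per event
points = vhm0.sel(
    time=xr.DataArray(events["Incident Date"].to_numpy(), dims="event"),
    latitude=xr.DataArray(events["Latitude"].to_numpy(), dims="event"),
    longitude=xr.DataArray(events["Longitude"].to_numpy(), dims="event"),
    method="nearest",
)
events["Wave Height"] = points.values
print(events)
```

Against the real dataset, the events would first be filtered to the dataset's time coverage, since `method="nearest"` would otherwise snap out-of-range dates to the first or last time step.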

In [33]:
#add the wave readings as a new column of the located-events dataframe
piracy_df_map["Wave Height"] = piracy_df_map.apply(get_wave_height, axis=1)

In [34]:
piracy_df_map[piracy_df_map["Wave Height"].notna()]

,Incident Date,Ship Name,Ship Flag,Ship Type,Area,Latitude,Longitude,Consequences to Crew,Part of Ship Raided,Ship Status,Weapons Used,Flag - Crew Injuries,Flag - Crew Held Hostage,Flag - Crew Missing,Flag - Crew Deaths,Flag - Crew Assaulted,Wave Height
13,2022-11-21,Wayne / Rig T20,Vanuatu,Tug,In international waters,1.354333,103.293500,Not Reported,Not Stated,Steaming,None or Not Reported,False,False,False,False,False,"[0.93, 0.91999996]"
20,2022-01-29,Hai Duong 29 (& Hakuryu 5),Vietnam,Supply ship,In international waters,1.173167,103.478333,Not Reported,Not Stated,Steaming,None or Not Reported,False,False,False,False,False,[0.14]
51,2022-11-15,Armenistis,Togo,Ro-ro-cargo ship,In international waters,7.200000,-13.283333,Ship Hijacked,Not Stated,Steaming,Firearms,False,True,False,False,False,[1.11]
72,2022-12-06,HK Tug 9 / LKH 7887,Singapore,Tug,In territorial waters,10.376167,107.040000,Not Reported,Not Stated,Steaming,None or Not Reported,False,False,False,False,False,"[0.45, 0.56]"
73,2022-09-17,HK Tug 9 (LKH 2882 Barge),Singapore,Tug,In territorial waters,1.292667,104.157833,Not Reported,Not Stated,Steaming,None or Not Reported,False,False,False,False,False,[0.11]
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4748,2021-12-18,Seacon 8,Hong Kong,Bulk carrier,In international waters,1.241667,104.033333,Not Reported,Engine Room,Steaming,None or Not Reported,False,False,False,False,False,[0.25]
4785,2021-11-25,Atalanti SB,Cyprus,Bulk carrier,In international waters,1.243333,104.047667,Not Reported,Engine Room,Steaming,None or Not Reported,False,False,False,False,False,"[0.099999994, 0.14]"
4794,2022-07-29,Equinox Agnandoussa,Cayman Islands,Bulk carrier,In international waters,1.311667,104.350167,Not Reported,Engine Room,Steaming,Knives,False,False,False,False,False,[0.29999998]
4797,2022-05-12,Pelican,Cameroon,Tanker,In international waters,1.174500,103.425833,Not Reported,Engine Room,Steaming,None or Not Reported,False,False,False,False,False,[0.08]


In [35]:
#Write out to csv file for analysis (145 events updated)
piracy_df_map.to_csv('piracy_df_waves.csv', index=False) 
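Because `get_wave_height` returns arrays (and `None` for events outside the dataset's coverage), the "Wave Height" column has object dtype, and `to_csv` writes each array as its string representation, which is awkward to parse back. One option, sketched below on hypothetical cell values, is to write plain numeric columns instead: the mean reading and the number of readings behind it. The column names are illustrative only:

```python
import numpy as np
import pandas as pd

# Hypothetical "Wave Height" cells: arrays of readings, or None outside coverage
waves = pd.Series([np.array([0.93, 0.92]), np.array([0.14]), None])

def n_readings(cell):
    """Number of non-NaN readings in one cell (0 for None or NaN)."""
    if cell is None:
        return 0
    return int(np.count_nonzero(~np.isnan(np.asarray(cell, dtype=float))))

def mean_reading(cell):
    """Mean of the non-NaN readings in one cell, or NaN if there are none."""
    return float(np.nanmean(np.asarray(cell, dtype=float))) if n_readings(cell) else np.nan

out = pd.DataFrame({
    "Wave Height (m)": waves.apply(mean_reading),
    "Wave Readings (n)": waves.apply(n_readings),
})
print(out)
```

Keeping the reading count alongside the mean preserves the information that some events were averaged over two grid cells.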

# The method works. Future work would extend it to build out more weather data for these piracy events.