# AIS Trade Estimation

> This notebook implements the AIS-derived trade estimation methodology from [Global economic impacts of COVID-19 lockdown measures stand out in high-frequency shipping data (Verschuur et al., 2021)](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0248818#pone.0248818.s001).

#### 0.1 Load libraries

In [1]:
%load_ext autoreload
%autoreload 2

In [3]:
import os
from os.path import join
from glob import glob
import pandas as pd
import geopandas as gpd
import folium
from shapely.geometry import Point
import folium.plugins as plugins
import seaborn as sns
from matplotlib import pyplot as plt

import numpy as np
import datetime
from datetime import timedelta
from port_call import create_port_calls
import boto3

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn import metrics 

In [4]:
pd.options.display.max_columns = None

#### 0.2 Load AIS data

AIS data has been accessed through the [UN Big Data Platform](https://unstats.un.org/wiki/display/AIS/AIS+Handbook).  

We retrieved all AIS messages that intersect a 20 km buffer around each port in Syria (Al Ladhiqiyah, Tartus, Baniyas). The initial extract covers December 1st 2018 to August 31st 2022; the additional files loaded below extend the series into 2023.
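As a rough illustration of the buffer idea, the sketch below tests whether a single AIS position lies within 20 km of a port using the haversine distance. It is a simplification: it assumes a circular buffer around one point, whereas the extract itself was produced on the UN platform. The function names and the port coordinates are hypothetical and approximate, chosen only for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_buffer(msg_lat, msg_lon, port_lat, port_lon, buffer_km=20.0):
    """True when the AIS position is inside the circular buffer."""
    return haversine_km(msg_lat, msg_lon, port_lat, port_lon) <= buffer_km


# Hypothetical, approximate port location used only for illustration.
PORT_LAT, PORT_LON = 35.52, 35.77

near = within_buffer(35.62, 35.77, PORT_LAT, PORT_LON)  # 0.1 degrees north
far = within_buffer(35.82, 35.77, PORT_LAT, PORT_LON)   # 0.3 degrees north
```

One degree of latitude is roughly 111 km, so the first position (about 11 km away) falls inside the buffer and the second (about 33 km away) does not.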

In [6]:
ais_dir = join(os.path.expanduser("~"), 'data', 'AIS')
data_dir = join(ais_dir, 'Syria')

In [7]:
data_files = glob(data_dir+"/*.csv")

In [8]:
dfs = [pd.read_csv(f, index_col=0) for f in data_files]

In [9]:
df = pd.concat(dfs)

In [10]:
data = df.copy()
data[['Date','Time']] = data.dt_pos_utc.str.split(' ',expand=True)
data['hour'] = pd.to_datetime(data['Time'], format='%H:%M:%S',errors = 'coerce').dt.hour
data['dtg'] = pd.to_datetime(data['Date'] + ' ' + data['Time'])
data = data.loc[data.Date<"2022-08-01"].copy()
data.drop(['Date','Time', 'hour', 'dtg'], axis=1, inplace=True)

In [19]:
df_new = pd.read_csv('s3://wbgdecinternal-ntl/Andres_Temp/AIS/AIS_Syria_2022-08-01_2023-03-20.csv', index_col=0)

In [15]:
df_new2 = pd.read_csv('s3://wbgdecinternal-ntl/Andres_Temp/AIS/syria_2022_08/AIS_Syria_2023-01-01-2023-06-01.csv', index_col=0)

In [20]:
df_new[['Date','Time']] = df_new.dt_pos_utc.str.split(' ',expand=True)
df_new['hour'] = pd.to_datetime(df_new['Time'], format='%H:%M:%S',errors = 'coerce').dt.hour
df_new['dtg'] = pd.to_datetime(df_new['Date'] + ' ' + df_new['Time'])

In [21]:
df_new = df_new.loc[df_new.Date<"2023-03-01"].copy()
df_new.drop(['Date','Time', 'hour', 'dtg'], axis=1, inplace=True)

In [22]:
df_new = pd.concat([df_new, df_new2])

In [23]:
df = pd.concat([data, df_new])

In [24]:
df.polygon_name.unique()

array(['AL LADHIQIYAH', 'TARTUS', 'BANIYAS'], dtype=object)

In [25]:
df_latakia = df.loc[df.polygon_name=="AL LADHIQIYAH"].copy()
df_tartus = df.loc[df.polygon_name=="TARTUS"].copy()
df_baniyas = df.loc[df.polygon_name=="BANIYAS"].copy()

In [26]:
start = '2018-12-01'
# end = '2022-08-31'
# end = '2023-03-19'
end = '2023-06-01'
country = "Syria"

#### 1.2 Data Preparation

Run port call algorithm that converts AIS messages into trips.  

The port call algorithm sorts AIS data by MMSI (a unique identifier per ship) and by the date/time of each AIS message. It then captures days with consecutive AIS messages and groups them into a single trip, calculating new attributes for each trip: turnaround time (total time spent at port between arrival and departure), difference in reported draft, and difference in direction of travel.
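The grouping step can be sketched in plain Python as follows. This is not the code of `create_port_calls`, whose internals are not shown in this notebook. It assumes that a new trip starts whenever the gap between two messages of the same vessel exceeds one day, and that the heading difference is a simple subtraction. The names `group_port_calls`, `summarise`, and `max_gap` are hypothetical.

```python
from datetime import datetime, timedelta
from itertools import groupby


def summarise(mmsi, msgs):
    """Collapse the messages of one port call into trip attributes."""
    first, last = msgs[0], msgs[-1]
    return {
        "mmsi": mmsi,
        "datetime-entry": first["time"],
        "datetime-leave": last["time"],
        # hours between first and last message in port
        "turn_around_time": (last["time"] - first["time"]).total_seconds() / 3600,
        "draught-diff": last["draught"] - first["draught"],
        "heading-diff": last["heading"] - first["heading"],
    }


def group_port_calls(messages, max_gap=timedelta(days=1)):
    """Sort by vessel and time, then split on gaps longer than max_gap."""
    ordered = sorted(messages, key=lambda m: (m["mmsi"], m["time"]))
    calls = []
    for mmsi, group in groupby(ordered, key=lambda m: m["mmsi"]):
        current = []
        for msg in group:
            if current and msg["time"] - current[-1]["time"] > max_gap:
                calls.append(summarise(mmsi, current))
                current = []
            current.append(msg)
        if current:
            calls.append(summarise(mmsi, current))
    return calls


# Made-up messages for one vessel: a two-day call, then a later one-day call.
msgs = [
    {"mmsi": 1, "time": datetime(2019, 1, 6, 8), "draught": 8.9, "heading": 0.0},
    {"mmsi": 1, "time": datetime(2019, 1, 6, 20), "draught": 8.9, "heading": 0.0},
    {"mmsi": 1, "time": datetime(2019, 1, 7, 15), "draught": 6.4, "heading": 0.0},
    {"mmsi": 1, "time": datetime(2019, 2, 10, 9), "draught": 7.0, "heading": 90.0},
    {"mmsi": 1, "time": datetime(2019, 2, 10, 19), "draught": 7.0, "heading": 95.0},
]
calls = group_port_calls(msgs)
```

With these inputs the five messages collapse into two trips, the first lasting 31 hours with a draught decrease of 2.5 m.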

In [27]:
trips = create_port_calls(df_latakia, start, end, "Latakia", "Syria")

In [28]:
len(trips)

870

Filter out trips that are not related to trade:
- Keep only cargo and tanker vessels
- Drop trips with a turnaround time < 5 hours (refueling) or > 95th percentile (maintenance)
- Drop trips with a turnaround time < 10 hours and a heading change within 45 degrees (passing by)
- Drop vessel types or subtypes that do not relate to trade
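The turnaround and heading rules can be read as a small decision function. The sketch below applies them to one trip in the same order as the assignments in the next cell, so a later rule overrides an earlier one. The function name `classify_trip` and the 120-hour percentile value are hypothetical; in the notebook the 95th percentile is computed from the trips themselves.

```python
def classify_trip(turn_around_hours, heading_diff_deg, p95_hours):
    """Return a non-trade label, or None when the trip is kept."""
    label = None
    if turn_around_hours < 5:
        label = "refueling"
    if turn_around_hours > p95_hours:
        label = "maintenance"
    # applied last, so it overrides "refueling" for short stays
    if turn_around_hours < 10 and abs(heading_diff_deg) < 45:
        label = "passing by"
    return label


P95 = 120.0  # hypothetical 95th percentile of turnaround time, in hours

short_same_heading = classify_trip(3.0, 10.0, P95)
short_turned = classify_trip(3.0, 170.0, P95)
very_long = classify_trip(200.0, 0.0, P95)
regular = classify_trip(31.15, 0.0, P95)
```

Because of the rule order, a 3-hour stay with a small heading change ends up labelled "passing by" rather than "refueling"; only short stays with a large heading change keep the "refueling" label.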

In [29]:
trips.loc[trips.turn_around_time < 5, "passing2"] = "refueling"
trips.loc[trips.turn_around_time > np.percentile(trips.turn_around_time, 95), "passing2"] = "maintenance"
trips.loc[(trips.turn_around_time < 10) & (trips['heading-diff'].abs()<45), "passing2"] = "passing by"

In [30]:
trips.passing2.value_counts()

passing by     112
maintenance     44
refueling       20
Name: passing2, dtype: int64

In [31]:
data = trips.loc[trips.passing2.isna()].copy()

In [32]:
len(data)

694

In [33]:
data.vessel_type.unique()

array(['Cargo', 'UNAVAILABLE', 'Passenger', 'Unknown', 'Reserved'],
      dtype=object)

In [34]:
data.vessel_type_main.unique()

array(['Container Ship', 'General Cargo Ship', nan,
       'Oil And Chemical Tanker', 'Bulk Carrier', 'Ro Ro Cargo Ship',
       'Specialized Cargo Ship', 'Offshore Vessel'], dtype=object)

In [35]:
data.loc[data.vessel_type=="Reserved", "vessel_type_main"].unique()

array([nan], dtype=object)

In [36]:
data.loc[data.vessel_type=="Passenger", "vessel_type_main"].unique()

array(['Ro Ro Cargo Ship', nan], dtype=object)

In [37]:
drop_types = ["Reserved", "Passenger"]
data = data.loc[~(data.vessel_type.isin(drop_types))].copy()

In [38]:
len(data)

686

In [39]:
data.vessel_type_sub.unique()

array([nan, 'Oil Products Tanker', 'Vehicles Carrier',
       'Livestock Carrier', 'Offshore Tug Supply Ship'], dtype=object)

In [40]:
drop_sub_types = ["Offshore Tug Supply Ship", "Crewboat"]
data = data.loc[~(data.vessel_type_sub.isin(drop_sub_types))].copy()

In [41]:
len(data)

684

In [42]:
data.loc[:, "idx"] = data.index

In [44]:
data.head(2)

Unnamed: 0,date-leave,mmsi,turn_around_time,datetime-leave,draught-out,heading-out,seconds,datetime-entry,date-entry,vessel_type,dtg,length,width,draught-in,heading-in,vessel_type_main,vessel_type_sub,draught-diff,heading-diff,passing,port-name,country,passing2,idx
1,2019-01-07,271044398.0,31.154722,2019-01-07 15:06:36,8.9,0.0,112157.0,2019-01-06 07:57:19,2019-01-06,Cargo,2019-01-06 07:39:16,152.0,24.0,8.9,0.0,Container Ship,,0.0,0.0,3,Latakia,Syria,,1
2,2019-01-10,271040403.0,10.353056,2019-01-10 19:54:11,9.3,48.0,37271.0,2019-01-10 09:33:00,2019-01-10,Cargo,2019-01-10 09:33:00,184.0,25.0,9.3,48.0,Container Ship,,0.0,0.0,3,Latakia,Syria,,2


In [46]:
ais_dir

'/home/wb514197/data/AIS'

In [47]:
data.to_csv(join(ais_dir, f"port_calls_latakia_{end}.csv"))

In [48]:
port_calls_latakia = data.copy()

#### 1.3 Predict DWT

Match vessel information from the FleetMon database and impute missing DWT values
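A minimal sketch of the matching step, using made-up data. It shows an alternative way to build the `vessel_info` flag with the `indicator` option of `DataFrame.merge` instead of adding a constant column before the join. The two small frames and their values are hypothetical; `validate="many_to_one"` is an optional safeguard that raises if the registry lists the same MMSI twice, which may or may not hold for the real vessel database.

```python
import pandas as pd

# Hypothetical port calls and vessel registry; all values are made up.
calls = pd.DataFrame({"mmsi": [111, 222, 333],
                      "length": [150.0, 90.0, 120.0]})
registry = pd.DataFrame({"mmsi": [111, 333],
                         "dwt": [13000.0, 8000.0],
                         "length": [151.0, 119.0]})

joined = calls.merge(
    registry,
    on="mmsi",
    how="left",
    suffixes=("_ais", "_vessel"),
    indicator=True,          # adds a `_merge` column: 'both' or 'left_only'
    validate="many_to_one",  # raise if the registry repeats an MMSI
)

# 1 when the vessel was found in the registry, 0 otherwise
joined["vessel_info"] = (joined["_merge"] == "both").astype(int)

# Vessels whose DWT still has to be imputed
missing_dwt = joined.loc[joined["dwt"].isna(), "mmsi"].tolist()
```

Overlapping column names such as `length` receive the `_ais` and `_vessel` suffixes, which is why later cells refer to `length_ais` and `length_vessel`.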

In [49]:
data.loc[:, "mmsi"] = data.loc[:, "mmsi"].astype('int')

In [50]:
vessels = pd.read_csv(join(ais_dir, "vessel_database_full.csv"))

In [51]:
# vessels = pd.read_excel(join(ais_dir, "Pacific_vessel_database_2021_new.xlsx"), 0)

In [52]:
vessels.head(2)

Unnamed: 0,mmsi,dwt,length_design,width_design,draught_design,block,vessel_type_AIS,sub_type,sub_vessel_type_AIS
0,667001408,82.0,34.5,11.008,2.1,0.67,Cargo,Container Ship,Container
1,351863000,90.0,30.5,9.919,3.5,0.67,Cargo,Container Ship,Container


In [53]:
vessels.rename(columns={
    'length_design':'length',
    'width_design':'width',
    'vessel_type_AIS':'vessel_type',
    'sub_vessel_type_AIS':'sub_vessel_type'
}, inplace=True)

In [54]:
vessels = vessels.loc[~(vessels.mmsi.isna())].copy()

In [55]:
vessels.loc[:, "mmsi"] = vessels.loc[:, "mmsi"].astype('int')

In [56]:
vessels_filt = vessels[['sub_type', 'vessel_type', 'mmsi', 'dwt', 'length', 'width', 'draught_design']].copy()

In [57]:
# vessels_filt = vessels[['sub_vessel_type', 'vessel_type', 'mmsi', 'dwt', 'gt', 'length', 'width', 'Draft', 'flag']].copy()

In [58]:
vessels_filt.loc[:, 'vessel_info'] = 1

In [59]:
data_join = data.merge(vessels_filt, on='mmsi', how='left', suffixes=['_ais', '_vessel'])

In [60]:
data_join.loc[data_join.vessel_info.isna(), "vessel_info"] = 0

In [61]:
data_join.vessel_info.value_counts()

1.0    673
0.0     11
Name: vessel_info, dtype: int64

Prepare attributes for later use

In [62]:
block_coefficients = {
    'bulk':0.79,
    'container':0.73,
    'tanker':0.83,
    'LNG':0.79
}
# groups = {
#     'Dry Bulk':['Bulk carrier'],
#     'Container':['Container ship'],
#     'General Cargo':['General cargo vessel', 'Cargo ship'],
#     'Reefer':['Reefer'],
#     'Vehicle carrier':['Vehicle carrier'],
#     'Ro Ro Cargo Ship':['RoRo ship'],
#     'Animal products':['livestock carrier'],
#     'Forest':['Forest-producer carrier']
# }

In [63]:
# data_join.to_csv(join(ais_dir, "port_calls_latakia.csv"))
# data_join = pd.read_csv(join(ais_dir, "port_calls_latakia.csv"), index_col=0)

In [64]:
data_join.loc[:, "draught_delta"] = 1

In [65]:
data_join.loc[data_join['draught-diff']==0, "draught_delta"] = 0

In [66]:
data_join.draught_delta.value_counts()/len(data_join)

0    0.635965
1    0.364035
Name: draught_delta, dtype: float64

Only about 36% of port calls report a change in draught

In [67]:
data_join.loc[:, 'draft_ais_max'] = data_join[['draught-in', 'draught-out']].max(axis=1)

In [68]:
train = data_join[(data_join.vessel_info==1) & (~data_join.dwt.isna())].copy()

In [69]:
df_missing = data_join[data_join.dwt.isna()].copy()

In [70]:
len(train)+len(df_missing)== len(data_join)

True

In [71]:
train.length_vessel.isna().sum(), train.width_vessel.isna().sum(), train.draught_design.isna().sum(), train.dwt.isna().sum()

(0, 0, 0, 0)

In [72]:
# train.dwt.astype('int')

In [73]:
y = list(train.dwt.astype('int'))
X = [[row.length_vessel, row.width_vessel, row.draught_design] for idx, row in train.iterrows()]
len(y)==len(X)

True

In [74]:
# Splitting arrays or matrices into random train and test subsets
# i.e. 75 % training dataset and 25 % test dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25)

In [75]:
# creating a RF classifier
clf = RandomForestClassifier(n_estimators = 500, random_state=1) 
 
# Training the model on the training dataset
# fit function is used to train the model using the training sets as parameters
clf.fit(X_train, y_train)
 
# performing predictions on the test dataset
y_pred = clf.predict(X_test)

# using metrics module for accuracy calculation
print("ACCURACY OF THE MODEL: ", metrics.accuracy_score(y_test, y_pred))
print("R-Squared OF THE MODEL: ", metrics.r2_score(y_test, y_pred))

ACCURACY OF THE MODEL:  0.7928994082840237
R-Squared OF THE MODEL:  0.9957374991564852
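The two scores measure different things. Because the model is a classifier, each integer DWT value is treated as its own class, so accuracy counts only exact matches, while R-squared also rewards predictions that are close. The sketch below computes both by hand on made-up numbers to show how they can diverge. The helper names are hypothetical and are not part of scikit-learn.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true value exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up DWT values: two exact hits, two near misses.
y_true = [5000, 12000, 20000, 35000]
y_pred = [5000, 12100, 20000, 34800]

acc = accuracy(y_true, y_pred)
r2 = r_squared(y_true, y_pred)
```

Here half of the predictions are exact, yet R-squared is above 0.999 because the misses are small relative to the spread of DWT. For a continuous target such as DWT, a regressor (for example scikit-learn's `RandomForestRegressor`) would be a natural alternative; that is a suggestion, not what this notebook uses.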


In [76]:
X_missing = [[row.length_ais, row.width_ais, row.draft_ais_max] for idx, row in df_missing.iterrows()]

In [77]:
pred_with_missing = clf.predict(X_missing)

In [78]:
df_missing.dwt = pred_with_missing

In [79]:
data_join.loc[data_join.dwt.isna(), 'dwt'] = df_missing.dwt

In [80]:
df = data_join.copy()

Add block coefficient category

In [81]:
df.columns

Index(['date-leave', 'mmsi', 'turn_around_time', 'datetime-leave',
       'draught-out', 'heading-out', 'seconds', 'datetime-entry', 'date-entry',
       'vessel_type_ais', 'dtg', 'length_ais', 'width_ais', 'draught-in',
       'heading-in', 'vessel_type_main', 'vessel_type_sub', 'draught-diff',
       'heading-diff', 'passing', 'port-name', 'country', 'passing2', 'idx',
       'sub_type', 'vessel_type_vessel', 'dwt', 'length_vessel',
       'width_vessel', 'draught_design', 'vessel_info', 'draught_delta',
       'draft_ais_max'],
      dtype='object')

In [82]:
df.loc[:, 'block_cat'] = ""

In [83]:
df.loc[:, 'vessel_type_best'] = df.sub_type #df.sub_vessel_type
df.vessel_type_best.isna().sum()

200

In [84]:
df.loc[df.vessel_type_best.isna(), 'vessel_type_best'] = df.loc[df.vessel_type_best.isna(), 'vessel_type_vessel'] # vessel_type_vessel
df.vessel_type_best.isna().sum()

11

In [85]:
df.loc[df.vessel_type_best.isna(), 'vessel_type_best'] = df.loc[df.vessel_type_best.isna(), 'vessel_type_sub']
df.vessel_type_best.isna().sum()

11

In [86]:
df.loc[df.vessel_type_best.isna(), 'vessel_type_best'] = df.loc[df.vessel_type_best.isna(), 'vessel_type_main']
df.vessel_type_best.isna().sum()

10

In [87]:
df.loc[df.vessel_type_best.isna(), 'vessel_type_best'] = df.loc[df.vessel_type_best.isna(), 'vessel_type_ais']
df.vessel_type_best.isna().sum()

0

In [88]:
df.loc[df.vessel_type_best.str.lower().str.contains('container'), "block_cat"] = 'container'
df.loc[df.vessel_type_best.str.lower().str.contains('bulk'), "block_cat"] = 'bulk'
df.loc[df.vessel_type_best.str.lower().str.contains('tanker'), "block_cat"] = 'tanker'
df.loc[df.vessel_type_best.str.lower().str.contains('lng'), "block_cat"] = 'LNG'

In [89]:
df.loc[df.block_cat=='', "vessel_type_best"].unique()

array(['General Cargo Ship', 'Cargo', 'Vehicles Carrier',
       'Livestock Carrier'], dtype=object)

In [90]:
df.loc[df.vessel_type_best.str.lower().str.contains('cargo'), "block_cat"] = 'container'
df.loc[df.vessel_type_best.str.lower().str.contains('roro'), "block_cat"] = 'container'
df.loc[df.vessel_type_best.str.lower().str.contains('vehicle'), "block_cat"] = 'container'
df.loc[df.vessel_type_best.str.lower().str.contains('forest'), "block_cat"] = 'bulk'
df.loc[df.vessel_type_best.str.lower().str.contains('livestock'), "block_cat"] = 'container'
df.loc[df.vessel_type_best.str.lower().str.contains('reefer'), "block_cat"] = 'container'

In [91]:
df.block_cat.unique()

array(['container', 'tanker', 'bulk'], dtype=object)
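The keyword assignments above can be summarised as an ordered rule list in which a later match overrides an earlier one. The sketch below mirrors that order for a single vessel type string; `RULES` and `block_category` are hypothetical names introduced for illustration.

```python
# Ordered (keyword, category) rules; a later match overrides an earlier one,
# mirroring the sequence of assignments in the cells above.
RULES = [
    ("container", "container"),
    ("bulk", "bulk"),
    ("tanker", "tanker"),
    ("lng", "LNG"),
    ("cargo", "container"),
    ("roro", "container"),
    ("vehicle", "container"),
    ("forest", "bulk"),
    ("livestock", "container"),
    ("reefer", "container"),
]


def block_category(vessel_type):
    """Map a vessel type label to a block coefficient category."""
    text = vessel_type.lower()
    category = ""  # empty string means no rule matched
    for keyword, cat in RULES:
        if keyword in text:
            category = cat
    return category


examples = {name: block_category(name) for name in [
    "Bulk Carrier", "Oil And Chemical Tanker", "Ro Ro Cargo Ship",
    "Livestock Carrier", "Offshore Vessel",
]}
```

Note that the `roro` keyword does not match the label `Ro Ro Cargo Ship` because of the spaces; that label is assigned to `container` through the `cargo` keyword instead. A label that matches no rule keeps an empty category and would fail the later lookup in `block_coefficients`.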

Get draught differences from the next reported draught

In [92]:
len(df)

684

In [93]:
df['date-leave'] = pd.to_datetime(df['date-leave'])

In [111]:
df_old = df.loc[df['date-leave']<"2022-08-01"].copy()

In [196]:
df_new = df.loc[(df['date-leave']>="2022-08-01")].copy() # & (df['date-leave']<"2023-03-01")

In [197]:
# df_new2 = df.loc[(df['date-leave']>="2023-03-01")].copy()

In [202]:
len(df_old), len(df_new), #len(df_new2)

(610, 74)

In [203]:
aws_bucket = "wbgdecinternal-ntl"
path = "Andres_Temp/AIS"

In [204]:
client = boto3.client('s3')

In [205]:
file_list = client.list_objects_v2(Bucket=aws_bucket, Prefix=path, MaxKeys=5000)
bucket_files = [os.path.join("s3://", aws_bucket, content['Key']) for content in file_list['Contents']]
# bucket_files

In [206]:
df_next_old = pd.read_csv('s3://wbgdecinternal-ntl/Andres_Temp/AIS/Latakia_next_draught.csv', index_col=0)

In [207]:
len(df_next_old)

308

In [212]:
# df_next_new = pd.read_csv('s3://wbgdecinternal-ntl/Andres_Temp/AIS/Latakia_next_draught_2023-03-19.csv', index_col=0)
# df_next_new = pd.read_csv('s3://wbgdecinternal-ntl/Andres_Temp/AIS/Latakia_next_draught_2023-06-04.csv', index_col=0)
df_next_new = pd.read_csv(join(ais_dir, "Latakia_next_draught_2023-06-04.csv"), index_col=0)

In [213]:
# df_next_new2 = pd.read_csv(bucket_files[9], index_col=0)
# df_next_new2 = pd.read_csv('s3://wbgdecinternal-ntl/Andres_Temp/AIS/Latakia_next_draught_2023-06-04.csv', index_col=0)
# df_next_new2 = pd.read_csv(join(ais_dir, "Latakia_next_draught_2023-06-04.csv"), index_col=0)

In [214]:
# df_next_new['dt_insert_utc'] = pd.to_datetime(df_next_new['dt_insert_utc'])

In [215]:
# df_next_new = pd.concat([df_next_new, df_next_new2])

In [216]:
df_orig = pd.read_csv(join(ais_dir, "port_calls_latakia.csv"), index_col=0)

In [217]:
df_next_old = df_next_old.join(df_orig[['turn_around_time', 'date-leave']])

In [218]:
df_next_old['date-leave'] = pd.to_datetime(df_next_old['date-leave'])

In [219]:
df_next_new.head(2)

Unnamed: 0,mmsi,dt_insert_utc,longitude,latitude,imo,vessel_name,vessel_type,vessel_type_cargo,vessel_class,length,width,flag_country,destination,draught,sog,cog,rot,heading,nav_status,dt_pos_utc,dt_static_utc,vessel_type_main,vessel_type_sub
774,215418000,2022-08-03 04:17:53,35.53114,33.932183,9379351.0,VENTO DI TRAMONTANA,Cargo,"Carrying DG,HS or MP,IMO hazard or Pollutant C...",A,184.0,24.0,Malta,EGALY,8.0,8.1,252.3,0.0,252.0,Under Way Using Engine,2022-08-03 04:17:48,2022-08-03 04:11:52,,
776,271044659,2022-08-05 02:14:52,35.486932,35.534593,9147215.0,MED CERKEZKOY,Cargo,"Carrying DG,HS or MP,IMO hazard or Pollutant C...",A,0.0,0.0,Turkey,TRMER,7.4,9.9,269.2,0.0,266.0,Under Way Using Engine,2022-08-05 02:14:46,2022-08-05 02:13:24,General Cargo Ship,


In [220]:
df2 = df.merge(df_next_old[['mmsi', 'draught', 'heading', 'date-leave', 'turn_around_time']], on=['mmsi', 'date-leave', 'turn_around_time'], how='left').copy()

In [221]:
# df2.loc[df2.idx==774]

In [222]:
# df_next_new.loc[774]

In [225]:
df_next_new['idx'] = df_next_new.index

In [226]:
df2 = df2.merge(df_next_new[['draught', 'heading', 'idx']], how='left').copy()

In [227]:
# df2

In [228]:
# df = df.merge(df_next[['mmsi', 'draught', 'heading', 'date-leave', 'turn_around_time']], on=['mmsi', 'date-leave', 'turn_around_time'], how='left').copy()

In [229]:
df = df2.copy()

In [230]:
df.loc[(df.draught_delta==0) & (~df.draught.isna()), "draught-out"] = df.loc[(df.draught_delta==0) & (~df.draught.isna()), "draught"]
df.loc[(df.draught_delta==0) & (~df.draught.isna()), "heading-out"] = df.loc[(df.draught_delta==0) & (~df.draught.isna()), "heading"]

In [231]:
df['draught-diff'] = df['draught-out'] - df['draught-in']

In [232]:
df.loc[df['draught-diff']!=0, "draught_delta"] = 1

In [233]:
df.draught_delta.value_counts()

1    493
0    191
Name: draught_delta, dtype: int64
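The backfill performed in the cells above can be summarised per port call: when the arrival and departure draughts are identical and a later draught report is available, the later value replaces the departure draught and the difference is recomputed. A minimal sketch, assuming one value per call; the function name `backfill_draught_out` is hypothetical.

```python
import math


def backfill_draught_out(draught_in, draught_out, next_draught):
    """Return (draught_out, draught_diff, draught_delta) after the backfill.

    next_draught is the vessel's next reported draught, or None/NaN when
    no later report was found.
    """
    has_next = next_draught is not None and not math.isnan(next_draught)
    if draught_in == draught_out and has_next:
        draught_out = next_draught
    diff = draught_out - draught_in
    return draught_out, diff, int(diff != 0)


updated = backfill_draught_out(8.9, 8.9, 6.4)     # later report differs
unchanged = backfill_draught_out(9.3, 9.3, None)  # no later report
kept = backfill_draught_out(7.0, 8.0, 5.0)        # already had a change
```

The first case corresponds to the first Latakia port call in this notebook, whose departure draught goes from 8.9 to 6.4 after the backfill.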

Calculate payload
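Written out, the payload rate computed by `calculate_payload` further down reads as follows. This is a transcription of that code, with symbols introduced here for notation: $D_r$ is the reported draught (arrival or departure), $D_d$ the reference draught (`draft_max`), $C_b$ the block coefficient of the vessel category, $L$ and $W$ the length and width, and $\rho_w = 1.025$ t/m³ the seawater density.

```latex
C_{b,r} = 1 - (1 - C_b)\left(\frac{D_r}{D_d}\right)^{1/3}

\text{payload} = \frac{\left(C_{b,r}\, D_r - C_b\, D_d\right) L\, W\, \rho_w + \mathrm{DWT}}{\mathrm{DWT}}
```

When $D_r = D_d$ the first expression gives $C_{b,r} = C_b$, the bracket vanishes, and the payload rate equals 1.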

In [234]:
df.loc[df['draught-in']==0, 'draught_delta'] = 0
df.loc[df['draught-out']==0, 'draught_delta'] = 0
df.loc[df['draught-in']==0, 'draught-diff'] = 0
df.loc[df['draught-out']==0, 'draught-diff'] = 0

In [235]:
df.loc[:, 'draft_max'] = df[['draught_design', 'draught-in', 'draught-out']].max(axis=1)

In [236]:
df.columns

Index(['date-leave', 'mmsi', 'turn_around_time', 'datetime-leave',
       'draught-out', 'heading-out', 'seconds', 'datetime-entry', 'date-entry',
       'vessel_type_ais', 'dtg', 'length_ais', 'width_ais', 'draught-in',
       'heading-in', 'vessel_type_main', 'vessel_type_sub', 'draught-diff',
       'heading-diff', 'passing', 'port-name', 'country', 'passing2', 'idx',
       'sub_type', 'vessel_type_vessel', 'dwt', 'length_vessel',
       'width_vessel', 'draught_design', 'vessel_info', 'draught_delta',
       'draft_ais_max', 'block_cat', 'vessel_type_best', 'draught', 'heading',
       'draft_max'],
      dtype='object')

In [237]:
df.loc[:, 'length'] = df.loc[:, 'length_vessel']
df.loc[df.length.isna(), 'length'] = df.loc[df.length.isna(), 'length_ais']

In [238]:
df.loc[:, 'width'] = df.loc[:, 'width_vessel']
df.loc[df.width.isna(), 'width'] = df.loc[df.width.isna(), 'width_ais']

In [239]:
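def calculate_payload(row, direction):
    """Return the payload rate for the inbound ('in') or outbound ('out') draught."""
    # get parameters
    Din = row['draught-in']
    Dout = row['draught-out']
    if Din == Dout:
        # no draught change reported: payload is left undefined
        return pd.NA

    L = row['length']
    W = row['width']
    Dd = row['draft_max']
    DWT = row['dwt']
    Cb = block_coefficients[row['block_cat']]
    pw = 1.025  # seawater density in tonnes/m3 (1025 kg/m3)

    # reported draught for the requested direction
    if direction == "in":
        Dr = Din
    elif direction == "out":
        Dr = Dout
    else:
        raise ValueError("direction must be 'in' or 'out'")

    # calculate block coefficient for reported draft
    Cbr = 1 - ((1 - Cb) * ((Dr / Dd) ** (1 / 3)))

    # calculate payload rate
    payload = ((((Cbr * Dr) - (Cb * Dd)) * (L * W * pw)) + DWT) / DWT
    return payload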
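As a worked check, the sketch below evaluates the same formula with plain numbers taken from the first Latakia port call printed in this notebook (draught 8.9 m in, 6.4 m out, length 151 m, width 24 m, DWT 13623, container block coefficient 0.73). The standalone function `payload_rate` is a hypothetical helper written for this example.

```python
def payload_rate(Dr, Dd, L, W, DWT, Cb, pw=1.025):
    """Payload rate for a reported draught Dr, following calculate_payload."""
    Cbr = 1 - (1 - Cb) * (Dr / Dd) ** (1 / 3)
    return ((Cbr * Dr - Cb * Dd) * (L * W * pw) + DWT) / DWT


# Vessel parameters of the first Latakia port call
vessel = dict(Dd=8.9, L=151.0, W=24.0, DWT=13623.0, Cb=0.73)

payload_in = payload_rate(Dr=8.9, **vessel)   # arrival draught
payload_out = payload_rate(Dr=6.4, **vessel)  # departure draught
```

The arrival draught equals the reference draught, so the arrival payload rate is 1.0; the departure value comes out at about 0.5514, in line with the `payload_in` and `payload_out` columns shown for that call.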

In [240]:
df.loc[:, "payload_in"] = df.apply(lambda x: calculate_payload(x, 'in'), axis=1)
df.loc[:, "payload_out"] = df.apply(lambda x: calculate_payload(x, 'out'), axis=1)

In [241]:
df.head(2)

Unnamed: 0,date-leave,mmsi,turn_around_time,datetime-leave,draught-out,heading-out,seconds,datetime-entry,date-entry,vessel_type_ais,dtg,length_ais,width_ais,draught-in,heading-in,vessel_type_main,vessel_type_sub,draught-diff,heading-diff,passing,port-name,country,passing2,idx,sub_type,vessel_type_vessel,dwt,length_vessel,width_vessel,draught_design,vessel_info,draught_delta,draft_ais_max,block_cat,vessel_type_best,draught,heading,draft_max,length,width,payload_in,payload_out
0,2019-01-07,271044398,31.154722,2019-01-07 15:06:36,6.4,0.0,112157.0,2019-01-06 07:57:19,2019-01-06,Cargo,2019-01-06 07:39:16,152.0,24.0,8.9,0.0,Container Ship,,-2.5,0.0,3,Latakia,Syria,,1,Container Ship,Cargo,13623.0,151.0,24.0,8.25,1.0,1,8.9,container,Container Ship,6.4,0.0,8.9,151.0,24.0,1.0,0.551421
1,2019-01-10,271040403,10.353056,2019-01-10 19:54:11,9.3,48.0,37271.0,2019-01-10 09:33:00,2019-01-10,Cargo,2019-01-10 09:33:00,184.0,25.0,9.3,48.0,Container Ship,,0.0,0.0,3,Latakia,Syria,,2,Container Ship,Cargo,20624.133195,184.0,25.0,9.4,1.0,0,9.3,container,Container Ship,,,9.4,184.0,25.0,,


In [242]:
# trade_in = (df.payload_in*df.dwt).sum()
# trade_out = (df.payload_out*df.dwt).sum()
# trade_tot = trade_in+trade_out
# fr_im = trade_in/trade_tot
# fr_exp = trade_out/trade_tot
# (fr_im, fr_exp)

In [243]:
df.loc[:, "export_val"] = 0
df.loc[:, "import_val"] = 0

In [244]:
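for idx, row in df.iterrows():
    delta = row['draught-diff']
    # default to zero so rows without a draught change do not inherit
    # the values computed for the previous row
    import_val = 0
    export_val = 0
    if delta < 0:
        # draught decreased during the call: cargo was unloaded (import)
        import_val = (row.payload_in*row.dwt) - (row.payload_out*row.dwt)
    elif delta > 0:
        # draught increased during the call: cargo was loaded (export)
        export_val = (row.payload_out*row.dwt) - (row.payload_in*row.dwt)
    df.loc[idx, 'import_val'] = import_val
    df.loc[idx, 'export_val'] = export_val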
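The direction logic of the loop above can be expressed as a small function for one port call: a lower departure draught is read as unloading (import), a higher one as loading (export). The name `split_flow` is hypothetical, and the result is in the same units as DWT.

```python
def split_flow(draught_diff, payload_in, payload_out, dwt):
    """Return (import_val, export_val) for one port call."""
    if draught_diff < 0:
        # vessel left lighter than it arrived: cargo was unloaded
        return (payload_in - payload_out) * dwt, 0.0
    if draught_diff > 0:
        # vessel left heavier than it arrived: cargo was loaded
        return 0.0, (payload_out - payload_in) * dwt
    # no draught change: estimated separately from the vessel's other calls
    return 0.0, 0.0


# Values of the first Latakia port call shown in this notebook
import_val, export_val = split_flow(-2.5, 1.0, 0.551421, 13623.0)
no_change = split_flow(0.0, 1.0, 1.0, 13623.0)
```

For that call the estimated import is roughly 6,111 (about 45% of the vessel's DWT) and the export is zero.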

In [245]:
df_sub = df.loc[df.draught_delta==1].copy()

In [246]:
avg_flow = df_sub[['mmsi', 'import_val', 'export_val']].groupby('mmsi').mean()

In [247]:
avg_flow.loc[avg_flow.import_val>avg_flow.export_val, "import_avg"] = avg_flow.import_val
avg_flow.loc[avg_flow.import_val<avg_flow.export_val, "export_avg"] = avg_flow.export_val

In [248]:
avg_flow.fillna(0, inplace=True)
# avg_flow.reset_index(inplace=True)
avg_flow.drop(['import_val', 'export_val'], axis=1, inplace=True)

In [249]:
# trade_in = df.loc[df['draught-diff']<0, "import_val"].sum()
# trade_out = df.loc[df['draught-diff']>0, "export_val"].sum()
# trade_tot = trade_in+trade_out
# fr_im = trade_in/trade_tot
# fr_exp = trade_out/trade_tot
# (fr_im, fr_exp)

In [250]:
for idx, row in df.iterrows():
    delta = row['draught-diff']
    if delta == 0:
        try:
            # use the vessel's average flow from its calls with a draught change
            avg = avg_flow.loc[row.mmsi]
            df.loc[idx, 'import_val'] = avg.import_avg # import_val
            df.loc[idx, 'export_val'] = avg.export_avg # export_val
        except KeyError:
            # vessel has no call with a draught change: no estimate available
            df.loc[idx, 'import_val'] = 0
            df.loc[idx, 'export_val'] = 0
#         import_val = fr_im*row.dwt
#         export_val = fr_exp*row.dwt
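The fallback for calls without a draught change can be sketched without pandas: average each vessel's import and export over its calls that do have a draught change, keep only the dominant direction, and assign that average to the vessel's remaining calls. The function names and the sample records are hypothetical.

```python
from collections import defaultdict


def vessel_average_flows(calls):
    """Average flows per vessel over calls with a draught change."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # import sum, export sum, count
    for c in calls:
        if c["draught_delta"] == 1:
            s = sums[c["mmsi"]]
            s[0] += c["import_val"]
            s[1] += c["export_val"]
            s[2] += 1
    averages = {}
    for mmsi, (imp, exp, n) in sums.items():
        imp_avg, exp_avg = imp / n, exp / n
        # keep only the dominant direction, the other one is set to zero
        averages[mmsi] = (imp_avg if imp_avg > exp_avg else 0.0,
                          exp_avg if exp_avg > imp_avg else 0.0)
    return averages


def fill_zero_delta(calls, averages):
    """Assign the vessel average to calls without a draught change."""
    for c in calls:
        if c["draught_delta"] == 0:
            c["import_val"], c["export_val"] = averages.get(c["mmsi"], (0.0, 0.0))
    return calls


# Made-up records: vessel 1 has two observed imports, vessel 2 has none.
calls = [
    {"mmsi": 1, "draught_delta": 1, "import_val": 6000.0, "export_val": 0.0},
    {"mmsi": 1, "draught_delta": 1, "import_val": 4000.0, "export_val": 0.0},
    {"mmsi": 1, "draught_delta": 0, "import_val": 0.0, "export_val": 0.0},
    {"mmsi": 2, "draught_delta": 0, "import_val": 0.0, "export_val": 0.0},
]
averages = vessel_average_flows(calls)
filled = fill_zero_delta(calls, averages)
```

Vessel 1's call without a draught change receives the average import of 5,000, while vessel 2, which never shows a draught change, stays at zero.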

In [251]:
len(df.loc[df.draught_delta==0])

197

In [252]:
len(df.loc[(df.draught_delta==0) & (df.import_val>0)])

128

In [253]:
len(df.loc[(df.draught_delta==0) & (df.export_val>0)])

8

In [254]:
df_latakia_trade = df.copy()

In [256]:
df_latakia_trade.to_csv(join(ais_dir, 'summer', "preliminary_results_latakia_summer_v2.csv"))

In [257]:
# df_final = df_latakia_trade.copy()

In [258]:
# df_final['date-entry']

In [259]:
# df_final.loc[:, "ym"] = 

In [260]:
# df_final[['port-name', 'date-entry', 'export_val', 'import_val']].groupby(['port-name', 'date-entry']).sum().to_csv(join(ais_dir, "preliminary_results_latakia_11.22.v3.csv"))

### Run Tartus

#### 1.2 Data Prep

Run port call algorithm that converts AIS messages into trips

In [265]:
trips = create_port_calls(df_tartus, start, end, "Tartus", "Syria")

In [266]:
len(trips)

335

Filter out non-trade trips

In [267]:
trips.loc[trips.turn_around_time < 5, "passing2"] = "refueling"
trips.loc[trips.turn_around_time > np.percentile(trips.turn_around_time, 95), "passing2"] = "maintenance"
trips.loc[(trips.turn_around_time < 10) & (trips['heading-diff'].abs()<45), "passing2"] = "passing by"

In [268]:
trips.passing2.value_counts()

passing by     94
maintenance    17
refueling      11
Name: passing2, dtype: int64

In [269]:
data = trips.loc[trips.passing2.isna()].copy()

In [270]:
len(data)

213

In [271]:
data.vessel_type.unique()

array(['Cargo', 'Tanker', 'Other', 'Passenger', 'SAR', 'Tug',
       'UNAVAILABLE', 'Unknown', 'Towing'], dtype=object)

In [272]:
data.vessel_type_main.unique()

array(['Container Ship', nan, 'General Cargo Ship', 'Bulk Carrier',
       'Other Tanker', 'Specialized Cargo Ship', 'Ro Ro Cargo Ship',
       'Oil And Chemical Tanker'], dtype=object)

In [273]:
drop_types = ["Reserved", "Passenger", "Tug", "Towing", "SAR"]
data = data.loc[~(data.vessel_type.isin(drop_types))].copy()

In [274]:
len(data)

204

In [275]:
data.vessel_type_sub.unique()

array([nan, 'Vegetable Oil Tanker', 'Livestock Carrier',
       'Chemical Oil Products Tanker', 'Chemical Tanker'], dtype=object)

In [276]:
data.loc[:, "idx"] = data.index

In [277]:
# drop_sub_types = ["Offshore Tug Supply Ship", "Crewboat"]
# data = data.loc[~(data.vessel_type_sub.isin(drop_sub_types))].copy()

In [278]:
# data.loc[664]

In [280]:
data.to_csv(join(ais_dir, f"port_calls_tartus_{end}.csv"))

In [281]:
port_calls_tartus = data.copy()

#### 1.3 Predict DWT

Match vessel information from the FleetMon database and impute missing DWT values

In [283]:
data.loc[:, "mmsi"] = data.loc[:, "mmsi"].astype('int')

In [284]:
vessels = pd.read_csv(join(ais_dir, "vessel_database_full.csv"))

In [285]:
# vessels = pd.read_excel(join(ais_dir, "Pacific_vessel_database_2021_new.xlsx"), 0)

In [286]:
vessels.head(2)

Unnamed: 0,mmsi,dwt,length_design,width_design,draught_design,block,vessel_type_AIS,sub_type,sub_vessel_type_AIS
0,667001408,82.0,34.5,11.008,2.1,0.67,Cargo,Container Ship,Container
1,351863000,90.0,30.5,9.919,3.5,0.67,Cargo,Container Ship,Container


In [287]:
vessels.rename(columns={
    'length_design':'length',
    'width_design':'width',
    'vessel_type_AIS':'vessel_type',
    'sub_vessel_type_AIS':'sub_vessel_type'
}, inplace=True)

In [288]:
vessels = vessels.loc[~(vessels.mmsi.isna())].copy()

In [289]:
vessels.loc[:, "mmsi"] = vessels.loc[:, "mmsi"].astype('int')

In [290]:
vessels_filt = vessels[['sub_type', 'vessel_type', 'mmsi', 'dwt', 'length', 'width', 'draught_design']].copy()

In [291]:
vessels_filt.loc[:, 'vessel_info'] = 1

In [292]:
data_join = data.merge(vessels_filt, on='mmsi', how='left', suffixes=['_ais', '_vessel'])

In [293]:
data_join.loc[data_join.vessel_info.isna(), "vessel_info"] = 0

In [294]:
data_join.vessel_info.value_counts()

1.0    171
0.0     33
Name: vessel_info, dtype: int64

In [295]:
data_join.vessel_type_vessel.unique()

array(['Cargo', 'Tanker', nan], dtype=object)

Prepare attributes for later use

In [296]:
block_coefficients = {
    'bulk':0.79,
    'container':0.73,
    'tanker':0.83,
    'LNG':0.79
}
groups = {
    'Dry Bulk':['Bulk carrier'],
    'Container':['Container ship'],
    'General Cargo':['General cargo vessel', 'Cargo ship'],
    'Reefer':['Reefer'],
    'Vehicle carrier':['Vehicle carrier'],
    'Ro Ro Cargo Ship':['RoRo ship'],
    'Animal products':['livestock carrier'],
    'Forest':['Forest-producer carrier']
}

In [297]:
# drop_sub_types = ["Offshore Tug Supply Ship", "Crewboat"]
# data_join = data_join.loc[~(data_join.vessel_type_sub.isin(drop_sub_types))].copy()

In [298]:
# data_join.to_csv(join(ais_dir, "port_calls_latakia.csv"))
# data_join = pd.read_csv(join(ais_dir, "port_calls_latakia.csv"), index_col=0)

In [299]:
data_join.loc[:, "draught_delta"] = 1

In [300]:
data_join.loc[data_join['draught-diff']==0, "draught_delta"] = 0

In [301]:
data_join.draught_delta.value_counts()/len(data_join)

0    0.852941
1    0.147059
Name: draught_delta, dtype: float64

In [302]:
data_join.draught_delta.value_counts()

0    174
1     30
Name: draught_delta, dtype: int64

In [303]:
data_join.loc[:, 'draft_ais_max'] = data_join[['draught-in', 'draught-out']].max(axis=1)

In [304]:
# data_join[data_join.dwt.isna()].iloc[0]
# data_join[data_join.draft_ais_max==0]

In [305]:
train = data_join[(data_join.vessel_info==1) & (~data_join.dwt.isna())].copy()

In [306]:
df_missing = data_join[data_join.dwt.isna()].copy()

In [307]:
len(train)+len(df_missing)== len(data_join)

True

In [308]:
train.length_vessel.isna().sum(), train.width_vessel.isna().sum(), train.draught_design.isna().sum(), train.dwt.isna().sum()

(0, 0, 0, 0)

In [310]:
y = list(train.dwt.astype('int'))
X = [[row.length_vessel, row.width_vessel, row.draught_design] for idx, row in train.iterrows()]
len(y)==len(X)

True

In [311]:
# Splitting arrays or matrices into random train and test subsets
# i.e. 75 % training dataset and 25 % test dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25)

In [312]:
# creating a RF classifier
clf = RandomForestClassifier(n_estimators = 500, random_state=1) 
 
# Training the model on the training dataset
# fit function is used to train the model using the training sets as parameters
clf.fit(X_train, y_train)
 
# performing predictions on the test dataset
y_pred = clf.predict(X_test)

# using metrics module for accuracy calculation
print("ACCURACY OF THE MODEL: ", metrics.accuracy_score(y_test, y_pred))
print("R-Squared OF THE MODEL: ", metrics.r2_score(y_test, y_pred))

ACCURACY OF THE MODEL:  0.6046511627906976
R-Squared OF THE MODEL:  0.9503774550441378


In [313]:
# df_missing.loc[df_missing.length_ais==0]

In [314]:
X_missing = [[row.length_ais, row.width_ais, row.draft_ais_max] for idx, row in df_missing.iterrows()]

In [315]:
pred_with_missing = clf.predict(X_missing)

In [316]:
df_missing.dwt = pred_with_missing

In [317]:
data_join.loc[data_join.dwt.isna(), 'dwt'] = df_missing.dwt

In [318]:
# data_join = data_join.loc[data_join.draught_delta==1].copy()
# data_join = data_join.loc[data_join.vessel_info==1].copy()

In [319]:
# data_join['draught-diff'].describe()

In [320]:
df = data_join.copy()

Add block coefficient category

In [321]:
df.columns

Index(['date-leave', 'mmsi', 'turn_around_time', 'datetime-leave',
       'draught-out', 'heading-out', 'seconds', 'datetime-entry', 'date-entry',
       'vessel_type_ais', 'dtg', 'length_ais', 'width_ais', 'draught-in',
       'heading-in', 'vessel_type_main', 'vessel_type_sub', 'draught-diff',
       'heading-diff', 'passing', 'port-name', 'country', 'passing2', 'idx',
       'sub_type', 'vessel_type_vessel', 'dwt', 'length_vessel',
       'width_vessel', 'draught_design', 'vessel_info', 'draught_delta',
       'draft_ais_max'],
      dtype='object')

In [322]:
df.loc[:, 'block_cat'] = ""

In [323]:
df.loc[:, 'vessel_type_best'] = df.sub_type
df.vessel_type_best.isna().sum()

90

In [324]:
df.loc[df.vessel_type_best.isna(), 'vessel_type_best'] = df.loc[df.vessel_type_best.isna(), 'vessel_type_vessel']
df.vessel_type_best.isna().sum()

33

In [325]:
df.loc[df.vessel_type_best.isna(), 'vessel_type_best'] = df.loc[df.vessel_type_best.isna(), 'vessel_type_sub']
df.vessel_type_best.isna().sum()

33

In [326]:
df.loc[df.vessel_type_best.isna(), 'vessel_type_best'] = df.loc[df.vessel_type_best.isna(), 'vessel_type_main']
df.vessel_type_best.isna().sum()

33

In [327]:
df.loc[df.vessel_type_best.isna(), 'vessel_type_best'] = df.loc[df.vessel_type_best.isna(), 'vessel_type_ais']
df.vessel_type_best.isna().sum()

0

In [328]:
df.loc[df.vessel_type_best.str.lower().str.contains('container'), "block_cat"] = 'container'
df.loc[df.vessel_type_best.str.lower().str.contains('bulk'), "block_cat"] = 'bulk'
df.loc[df.vessel_type_best.str.lower().str.contains('tanker'), "block_cat"] = 'tanker'
df.loc[df.vessel_type_best.str.lower().str.contains('lng'), "block_cat"] = 'LNG'

In [329]:
df.loc[df.block_cat=='', "vessel_type_best"].unique()

array(['Cargo', 'General Cargo Ship', 'Other', 'Livestock Carrier',
       'Ro Ro Cargo Ship'], dtype=object)

In [330]:
# df.loc[df.block_cat=='bulk', "sub_vessel_type"].unique()
# df.loc[df.block_cat=='container', "sub_vessel_type"].unique()

In [331]:
# df.loc[df.vessel_type_best=='Other', ]

In [332]:
df.loc[df.vessel_type_best.str.lower().str.contains('cargo'), "block_cat"] = 'container'
df.loc[df.vessel_type_best.str.lower().str.contains('roro'), "block_cat"] = 'container'
df.loc[df.vessel_type_best.str.lower().str.contains('vehicle'), "block_cat"] = 'container'
df.loc[df.vessel_type_best.str.lower().str.contains('forest'), "block_cat"] = 'bulk'
df.loc[df.vessel_type_best.str.lower().str.contains('livestock'), "block_cat"] = 'container'
df.loc[df.vessel_type_best.str.lower().str.contains('reefer'), "block_cat"] = 'container'
df.loc[df.vessel_type_best.str.lower().str.contains('other'), "block_cat"] = 'container'

In [333]:
df.block_cat.unique()

array(['container', 'tanker', 'bulk'], dtype=object)
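
The keyword rules above are applied in sequence, so a later match overwrites an earlier one. A minimal sketch of the same idea as an ordered rule list follows; the helper `assign_block_cat` and the sample vessel types are hypothetical, while the keyword and category pairs are copied from the cells above:

```python
import pandas as pd

# Keyword -> category rules in the order the notebook applies them.
RULES = [
    ("container", "container"), ("bulk", "bulk"),
    ("tanker", "tanker"), ("lng", "LNG"),
    ("cargo", "container"), ("roro", "container"),
    ("vehicle", "container"), ("forest", "bulk"),
    ("livestock", "container"), ("reefer", "container"),
    ("other", "container"),
]

def assign_block_cat(vessel_types: pd.Series) -> pd.Series:
    """Hypothetical helper: map vessel type strings to block categories."""
    lowered = vessel_types.str.lower()
    cat = pd.Series("", index=vessel_types.index, dtype=object)
    for keyword, category in RULES:
        # Plain substring match; later rules overwrite earlier ones.
        mask = lowered.str.contains(keyword, regex=False)
        cat.loc[mask] = category
    return cat

types = pd.Series(["Cargo", "Oil Tanker", "Bulk Carrier",
                   "Livestock Carrier", "Ro Ro Cargo Ship"])
cats = assign_block_cat(types)
print(cats.tolist())
```

Note that `'Ro Ro Cargo Ship'` is caught by the `'cargo'` keyword, not by `'roro'`, because the string contains spaces.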

Get draught differences from the next reported draught

In [334]:
df['date-leave'] = pd.to_datetime(df['date-leave'])

In [335]:
df_old = df.loc[df['date-leave']<"2022-08-01"].copy()

In [336]:
df_new = df.loc[df['date-leave']>="2022-08-01"].copy()

In [337]:
len(df_old), len(df_new)

(184, 20)

In [338]:
aws_bucket = "wbgdecinternal-ntl"
path = "Andres_Temp/AIS"

In [339]:
client = boto3.client('s3')

In [340]:
file_list = client.list_objects_v2(Bucket=aws_bucket, Prefix=path, MaxKeys=5000)
bucket_files = [os.path.join("s3://", aws_bucket, content['Key']) for content in file_list['Contents']]
# bucket_files

In [341]:
df_next_old = pd.read_csv('s3://wbgdecinternal-ntl/Andres_Temp/AIS/Tartus_next_draught.csv', index_col=0)

In [342]:
len(df_next_old)

67

In [349]:
# df_next_new = pd.read_csv('s3://wbgdecinternal-ntl/Andres_Temp/AIS/Tartus_next_draught_2023-03-19.csv', index_col=0)
# df_next_new = pd.read_csv('s3://wbgdecinternal-ntl/Andres_Temp/AIS/Tartus_next_draught_2023-06-04.csv', index_col=0)
df_next_new = pd.read_csv(join(ais_dir, 'Tartus_next_draught_2023-06-04.csv'), index_col=0)

In [350]:
len(df_next_new)

7

In [351]:
df_orig = pd.read_csv(join(ais_dir, "port_calls_tartus.csv"), index_col=0)

In [352]:
df_next_old = df_next_old.join(df_orig[['turn_around_time', 'date-leave']])

In [353]:
df_next_old['date-leave'] = pd.to_datetime(df_next_old['date-leave'])

In [354]:
df_next_new.head(2)

Unnamed: 0,mmsi,dt_insert_utc,longitude,latitude,imo,vessel_name,vessel_type,vessel_type_cargo,vessel_class,length,width,flag_country,destination,draught,sog,cog,rot,heading,nav_status,dt_pos_utc,dt_static_utc,vessel_type_main,vessel_type_sub
306,273394890,2022-08-20 14:32:36,35.068172,34.744268,9160994.0,SPARTA II,Cargo,,A,122.0,18.0,Russian Federation,RU NVS,5.0,10.4,248.7,0.0,249.0,Under Way Using Engine,2022-08-20 14:32:30,2022-08-20 13:36:49,Ro Ro Cargo Ship,
307,671259100,2022-09-13 01:38:47,34.690505,36.620905,7500580.0,SMSM,Cargo,,A,109.0,17.0,Togo,MER TUR,5.0,8.8,335.6,0.0,0.0,Under Way Using Engine,2022-09-13 01:38:43,2022-09-13 01:26:54,,


In [355]:
df2 = df.merge(df_next_old[['mmsi', 'draught', 'heading', 'date-leave', 'turn_around_time']], on=['mmsi', 'date-leave', 'turn_around_time'], how='left').copy()

In [356]:
# df2.loc[df2.idx==774]

In [357]:
# df_next_new.loc[774]

In [358]:
df_next_new['idx'] = df_next_new.index

In [359]:
df2 = df2.merge(df_next_new[['draught', 'heading', 'idx']], how='left').copy()
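
When `on` is omitted, `DataFrame.merge` joins on every column name the two frames share. If `df2` already carries `draught`, `heading` and `idx` at this point, all three act as join keys and the merge adds no new columns. The sketch below uses hypothetical toy frames to contrast that behaviour with an explicit key; the `_next` suffix is an illustrative choice, not something the notebook defines:

```python
import pandas as pd

# Hypothetical toy frames; only the column names echo the notebook.
left = pd.DataFrame({"idx": [1, 2], "draught": [5.0, None], "port": ["a", "b"]})
right = pd.DataFrame({"idx": [1, 2], "draught": [5.0, 6.2]})

# No `on`: pandas joins on all shared columns (idx AND draught),
# so the missing draught in the second row is not filled.
implicit = left.merge(right, how="left")

# Explicit key: both draught columns survive and can be combined.
explicit = left.merge(right, on="idx", how="left", suffixes=("", "_next"))
explicit["draught"] = explicit["draught"].fillna(explicit["draught_next"])

print(implicit)
print(explicit)
```

Passing `on=` (and checking the resulting columns) makes it easy to confirm that the looked-up draught actually reaches the port-call table.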

In [360]:
# df = df.merge(df_next[['mmsi', 'draught', 'heading', 'date-leave', 'turn_around_time']], on=['mmsi', 'date-leave', 'turn_around_time'], how='left').copy()

In [361]:
df = df2.copy()

In [362]:
# df_next = pd.read_csv('s3://wbgdecinternal-ntl/Andres_Temp/AIS/Tartus_next_draught.csv', index_col=0)

In [363]:
# len(df_next)

In [364]:
# df_orig = pd.read_csv(join(ais_dir, "port_calls_tartus.csv"), index_col=0)

In [365]:
# df_orig[['turn_around_time', 'date-leave']].head(2)

In [366]:
# df_next = df_next.join(df_orig[['turn_around_time', 'date-leave']])

In [367]:
# df = df.merge(df_next[['mmsi', 'draught', 'heading', 'date-leave', 'turn_around_time']], on=['mmsi', 'date-leave', 'turn_around_time'], how='left').copy()

In [368]:
df.loc[(df.draught_delta==0) & (~df.draught.isna()), "draught-out"] = df.loc[(df.draught_delta==0) & (~df.draught.isna()), "draught"]
df.loc[(df.draught_delta==0) & (~df.draught.isna()), "heading-out"] = df.loc[(df.draught_delta==0) & (~df.draught.isna()), "heading"]

In [369]:
df['draught-diff'] = df['draught-out'] - df['draught-in']

In [370]:
df.loc[df['draught-diff']!=0, "draught_delta"] = 1

In [371]:
df.draught_delta.value_counts()

0    128
1     76
Name: draught_delta, dtype: int64

In [372]:
df.loc[df['draught-in']==0, 'draught_delta'] = 0
df.loc[df['draught-out']==0, 'draught_delta'] = 0
df.loc[df['draught-in']==0, 'draught-diff'] = 0
df.loc[df['draught-out']==0, 'draught-diff'] = 0

In [373]:
df.loc[:, 'draft_max'] = df[['draught_design', 'draught-in', 'draught-out']].max(axis=1)

In [374]:
df.loc[:, 'length'] = df.loc[:, 'length_vessel']
df.loc[df.length.isna(), 'length'] = df.loc[df.length.isna(), 'length_ais']

In [375]:
df.loc[:, 'width'] = df.loc[:, 'width_vessel']
df.loc[df.width.isna(), 'width'] = df.loc[df.width.isna(), 'width_ais']

In [376]:
def calculate_payload(row, direction):
    """Return the payload as a fraction of DWT for the 'in' or 'out' draught."""
    # get parameters
    Din = row['draught-in']
    Dout = row['draught-out']
    if Din == Dout:
        # no draught change, so no payload is estimated for this call
        return pd.NA

    L = row['length']
    W = row['width']
    Dd = row['draft_max']
    DWT = row['dwt']
    Cb = block_coefficients[row['block_cat']]
    pw = 1.025  # seawater density in tonnes/m3 (1025 kg/m3)

    if direction == 'in':
        Dr = Din
    elif direction == 'out':
        Dr = Dout
    else:
        raise ValueError("direction must be 'in' or 'out'")

    # calculate block coefficient for reported draft
    Cbr = 1 - ((1 - Cb) * ((Dr / Dd) ** (1 / 3)))

    # calculate payload rate
    payload = ((((Cbr * Dr) - (Cb * Dd)) * (L * W * pw)) + DWT) / DWT
    return payload
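
For reference, the expressions evaluated by `calculate_payload` can be written out as below. This is a direct transcription of the code, with $C_b$ the block coefficient for the vessel category, $D_r$ the reported draught (in or out), $D_d$ the value of `draft_max`, $L$ and $W$ the length and width, and $\rho_w$ the seawater density `pw`:

```latex
C_{b,r} = 1 - (1 - C_b)\left(\frac{D_r}{D_d}\right)^{1/3}

\text{payload} = \frac{\left(C_{b,r}\, D_r - C_b\, D_d\right) L\, W\, \rho_w + \mathrm{DWT}}{\mathrm{DWT}}
```

As a sanity check, when $D_r = D_d$ the first expression gives $C_{b,r} = C_b$, the bracketed term vanishes, and the payload equals 1, i.e. the vessel is carrying its full deadweight.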

In [377]:
df.loc[:, "payload_in"] = df.apply(lambda x: calculate_payload(x, 'in'), axis=1)
df.loc[:, "payload_out"] = df.apply(lambda x: calculate_payload(x, 'out'), axis=1)

In [378]:
# df.loc[:, ['payload_in', 'payload_out']]

In [379]:
df.loc[:, "export_val"] = 0
df.loc[:, "import_val"] = 0

In [380]:
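for idx, row in df.iterrows():
    delta = row['draught-diff']
    # default to zero so rows with no (or missing) draught change
    # do not reuse values left over from the previous iteration
    import_val = 0
    export_val = 0
    if delta < 0:
        # vessel left lighter than it arrived: cargo unloaded (import)
        import_val = (row.payload_in*row.dwt) - (row.payload_out*row.dwt)
    elif delta > 0:
        # vessel left heavier than it arrived: cargo loaded (export)
        export_val = (row.payload_out*row.dwt) - (row.payload_in*row.dwt)
    df.loc[idx, 'import_val'] = import_val
    df.loc[idx, 'export_val'] = export_val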
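
The same sign logic can also be expressed without an explicit loop. The following is a minimal sketch on a hypothetical toy frame (the numbers are made up; column names follow the notebook), using `np.where` so that rows with a zero or missing draught difference get zero for both flows:

```python
import numpy as np
import pandas as pd

# Hypothetical toy port calls.
toy = pd.DataFrame({
    "draught-diff": [-1.0, 0.5, 0.0, np.nan],
    "payload_in": [0.9, 0.4, 0.7, 0.6],
    "payload_out": [0.5, 0.8, 0.7, np.nan],
    "dwt": [10000.0, 20000.0, 15000.0, 12000.0],
})

# Cargo on board (tonnes) when entering and when leaving the port.
tonnes_in = toy["payload_in"] * toy["dwt"]
tonnes_out = toy["payload_out"] * toy["dwt"]

# Negative draught difference: cargo unloaded (import).
toy["import_val"] = np.where(toy["draught-diff"] < 0, tonnes_in - tonnes_out, 0.0)
# Positive draught difference: cargo loaded (export).
toy["export_val"] = np.where(toy["draught-diff"] > 0, tonnes_out - tonnes_in, 0.0)

print(toy[["import_val", "export_val"]])
```

Comparisons against `NaN` evaluate to `False`, so a missing draught difference falls through to zero in both columns.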

In [381]:
# trade_in = df.loc[df['draught-diff']<0, "import_val"].sum()
# trade_out = df.loc[df['draught-diff']>0, "export_val"].sum()
# trade_tot = trade_in+trade_out
# fr_im = trade_in/trade_tot
# fr_exp = trade_out/trade_tot
# (fr_im, fr_exp)

In [382]:
df_sub = df.loc[df.draught_delta==1].copy()

In [383]:
avg_flow = df_sub[['mmsi', 'import_val', 'export_val']].groupby('mmsi').mean()

In [384]:
avg_flow.loc[avg_flow.import_val>avg_flow.export_val, "import_avg"] = avg_flow.import_val
avg_flow.loc[avg_flow.import_val<avg_flow.export_val, "export_avg"] = avg_flow.export_val

In [385]:
avg_flow.fillna(0, inplace=True)
# avg_flow.reset_index(inplace=True)
avg_flow.drop(['import_val', 'export_val'], axis=1, inplace=True)

In [386]:
len(avg_flow)

27

In [387]:
for idx, row in df.iterrows():
    delta = row['draught-diff']
    if delta == 0:
        try:
            # reuse this vessel's average flow from calls with a draught change
            avg = avg_flow.loc[row.mmsi]
            df.loc[idx, 'import_val'] = avg.import_avg
            df.loc[idx, 'export_val'] = avg.export_avg
        except KeyError:
            # vessel has no call with a draught change, so no estimate
            df.loc[idx, 'import_val'] = 0
            df.loc[idx, 'export_val'] = 0
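
The per-vessel fallback above can be sketched with `Series.map`, which looks each `mmsi` up in the average-flow table and leaves `NaN` for vessels that are not in it. The toy values below are hypothetical; only the column names follow the notebook:

```python
import pandas as pd

# Hypothetical port calls: two have no draught change.
calls = pd.DataFrame({
    "mmsi": [111, 111, 222, 333],
    "draught-diff": [-1.0, 0.0, 0.0, 0.5],
    "import_val": [4000.0, 0.0, 0.0, 0.0],
    "export_val": [0.0, 0.0, 0.0, 8000.0],
})

# Hypothetical per-vessel averages, indexed by mmsi as in avg_flow.
avg_flow = pd.DataFrame(
    {"import_avg": [4000.0, 0.0], "export_avg": [0.0, 8000.0]},
    index=pd.Index([111, 333], name="mmsi"),
)

no_change = calls["draught-diff"] == 0
for col, avg_col in [("import_val", "import_avg"), ("export_val", "export_avg")]:
    # Vessels missing from avg_flow map to NaN and are set to 0.
    looked_up = calls.loc[no_change, "mmsi"].map(avg_flow[avg_col]).fillna(0)
    calls.loc[no_change, col] = looked_up

print(calls)
```

Vessel `111` inherits its own average import for the call without a draught change, while vessel `222`, which never shows a draught change, stays at zero.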

In [388]:
len(df.loc[df.draught_delta==0])

132

In [389]:
len(df.loc[(df.draught_delta==0) & (df.import_val>0)])

29

In [390]:
len(df.loc[(df.draught_delta==0) & (df.export_val>0)])

3

In [391]:
# trade_in = (df.payload_in*df.dwt).sum()
# trade_out = (df.payload_out*df.dwt).sum()
# trade_tot = trade_in+trade_out
# fr_im = trade_in/trade_tot
# fr_exp = trade_out/trade_tot
# (fr_im, fr_exp)

In [392]:
# df.loc[:, "export_val"] = 0
# df.loc[:, "import_val"] = 0

In [393]:
# for idx, row in df.iterrows():
#     delta = row['draught-diff']
#     if delta < 0:
#         import_val = (row.payload_in*row.dwt) - (row.payload_out*row.dwt)
#         export_val = 0
#     elif delta > 0:
#         import_val = 0
#         export_val = (row.payload_out*row.dwt) - (row.payload_in*row.dwt)
#     elif delta == 0:
#         if fr_im>fr_exp:
#             fr_import = fr_im-fr_exp
#             fr_export = 0
#         elif fr_im<fr_exp:
#             fr_export = fr_exp-fr_im
#             fr_import = 0
#         import_val = fr_import*row.dwt
#         export_val = fr_export*row.dwt
#     df.loc[idx, 'import_val'] = import_val
#     df.loc[idx, 'export_val'] = export_val

In [394]:
df_tartus_trade = df.copy()

Assemble Latakia and Tartus results

In [395]:
df_final = pd.concat([df_latakia_trade, df_tartus_trade])

In [398]:
data_dir

'/home/wb514197/data/AIS/Syria'

In [399]:
df_final['date'] = pd.to_datetime(df_final['date-entry'])
df_final['ym'] = df_final['date'].dt.strftime('%Y-%m')

In [400]:
len(df_final)

888

In [401]:
len(df_final.loc[(df_final.export_val==0) & (df_final.import_val==0)])

161

In [402]:
df_final = df_final.loc[~ ((df_final.export_val==0) & (df_final.import_val==0))].copy()

In [403]:
df_final = df_final[['port-name', 'date-entry', 'ym', 'export_val', 'import_val']].groupby(['port-name', 'date-entry', 'ym']).sum()

In [404]:
df_final.reset_index(inplace=True)

In [405]:
# df_final['date-entry'] = pd.to_datetime(df_final['date-entry'])
# df_final['ym'] = df_final['date-entry'].dt.strftime('%Y-%m')
df_final.to_csv(join(ais_dir, 'summer', "preliminary_results_6.4.csv"))