What do I learn about emissions around European ports?

I opened the files in Excel and used VLOOKUP to join a table of country codes and continents to the ports table. This lets me filter for European ports.
The Excel formula is as follows: =VLOOKUP(Sheet1!B2; Countries!\$A\$2:\$B\$263; 2; FALSE)
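
The same exact-match lookup could also be done in pandas with a left merge. The sketch below is only an illustration: the frames `ports_demo` and `countries_demo`, their sample rows, and the column name `COUNTRY_CODE` are assumptions, and the real sheets may be laid out differently.

```python
import pandas as pd

# Hypothetical stand-ins for the ports sheet and the country lookup sheet.
ports_demo = pd.DataFrame({
    "PORT_NAME": ["Rotterdam", "Busan", "Le Havre"],
    "COUNTRY_CODE": ["NL", "KR", "FR"],  # assumed column name
})
countries_demo = pd.DataFrame({
    "COUNTRY_CODE": ["NL", "FR", "KR"],
    "Continent": ["Europe", "Europe", "Asia"],
})

# how="left" keeps every port row, like VLOOKUP filling one cell per row;
# a port without a matching code gets NaN where Excel would show #N/A.
ports_with_continent = ports_demo.merge(countries_demo, on="COUNTRY_CODE", how="left")

# Filter down to the European ports.
european_ports = ports_with_continent[ports_with_continent["Continent"] == "Europe"]
print(european_ports["PORT_NAME"].tolist())
```

Doing the join in pandas keeps the whole pipeline in one place, at the cost of having to check the key column for stray whitespace or mismatched codes first.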

In [3]:
import pandas as pd
import seaborn as sns

In [4]:
ships = pd.read_excel('AIS.xlsx')

In [5]:
ports = pd.read_excel('PORTS.xlsx')

I defined a function to calculate the distance between two coordinates using the geodesic function from geopy.


In [6]:
from geopy.distance import geodesic

def calculate_distance(lat1, lon1, lat2, lon2):
    return geodesic((lat1, lon1), (lat2, lon2)).kilometers
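
A quick way to sanity-check values returned by `calculate_distance` is to compare them with a spherical (haversine) approximation, which needs only the standard library. This is a different and less accurate method than geopy's ellipsoidal geodesic, though the two usually agree to within a fraction of a percent. The function name `haversine_km` and the mean Earth radius of 6371 km are choices made for this sketch.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of the spherical model

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres on a sphere (not the geopy geodesic)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# One degree of latitude along a meridian should come out at roughly 111 km.
print(round(haversine_km(50.0, 0.0, 51.0, 0.0), 1))
```

If the two methods disagree by much more than a percent for the same pair of points, swapped latitude and longitude arguments are a likely cause.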

I proceeded to run code that called the distance function inside two nested for loops, calculating the distance between every observation and every port. As expected, this was time-consuming even on a powerful PC, and I stopped it after about one hour. I decided to take steps to simplify the calculation*.

\* *I considered backup plans: parallelizing the computation with GPU acceleration via CuPy, or using SQL on Snowflake.*



I defined a **rectangular space** centered on the ship's observed AIS position, extending 0.2 degrees of latitude and 0.3 degrees of longitude on each side. At 50 degrees North, a rough average for European ports, 0.3 degrees of longitude is roughly 12 nautical miles (the UNCLOS definition of territorial waters). 0.2 degrees of latitude is also about 12 nautical miles, and that figure barely changes with latitude.
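
The box dimensions can be checked with a little arithmetic. One degree of latitude is about 60 nautical miles everywhere, while one degree of longitude shrinks with the cosine of the latitude. The helper names below are made up for this check, and the Earth is treated as a sphere.

```python
import math

NM_PER_DEGREE = 60.0  # one arc minute of latitude is roughly one nautical mile

def lat_degrees_to_nm(degrees):
    # North-south extent does not depend on where you are (spherical model).
    return degrees * NM_PER_DEGREE

def lon_degrees_to_nm(degrees, at_latitude):
    # Meridians converge towards the poles, so scale by cos(latitude).
    return degrees * NM_PER_DEGREE * math.cos(math.radians(at_latitude))

print(f"0.2 deg latitude        = {lat_degrees_to_nm(0.2):.1f} nm")
print(f"0.3 deg longitude @ 50N = {lon_degrees_to_nm(0.3, 50):.1f} nm")
```

The same 0.3 degrees of longitude spans about 14.6 nautical miles at 36 degrees North and 9 nautical miles at 60 degrees North, so the box is only approximately 12 nautical miles wide across Europe.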

In [7]:

ships['ports_in_range_indexes'] = None

# Half-widths of the search box around each ship position, in degrees.
lng_distance = 0.3
lat_distance = 0.2

for index, row in ships.iterrows():
    # Boolean masks selecting the ports inside the box around this observation.
    condition1 = ports['LONGITUDE'] > row['LONGITUDE'] - lng_distance
    condition2 = ports['LONGITUDE'] < row['LONGITUDE'] + lng_distance
    condition3 = ports['LATITUDE'] > row['LATITUDE'] - lat_distance
    condition4 = ports['LATITUDE'] < row['LATITUDE'] + lat_distance

    result_indexes = ports[condition1 & condition2 & condition3 & condition4].index
    # Observations with no port in range keep the default None.
    if not result_indexes.empty:
        ships.at[index, 'ports_in_range_indexes'] = result_indexes.tolist()

ships.to_csv('ships_box.csv', index=False)
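
One caveat with the CSV export: a column of Python lists is written as text, so a cell holding `[1625, 1844]` comes back as a string when the file is read again. If `ships_box.csv` is reloaded in a later session, the lists can be restored with `ast.literal_eval`. The helper name `parse_index_list` is hypothetical, and the sketch assumes empty cells arrive as `None`, `NaN`, or an empty string.

```python
import ast

def parse_index_list(cell):
    """Turn the text form of a list (as stored in a CSV cell) back into a list."""
    # Missing values: None, an empty string, or NaN (the only value unequal to itself).
    if cell is None or cell == "" or cell != cell:
        return None
    return ast.literal_eval(cell)

print(parse_index_list("[1625, 1844]"))
print(parse_index_list(""))
```

After reloading, the column could be converted with `ships['ports_in_range_indexes'].apply(parse_index_list)`.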


Next, I keep only the observations where ships were close to a port.

In [8]:
ships_around_ports = ships[ships['ports_in_range_indexes'].notna()]

In [9]:
ships_around_ports

Unnamed: 0,IMO,TIME,SOG,LONGITUDE,LATITUDE,E_CO2_kg,ports_in_range_indexes
4,9632143,2019-12-29 14:00:00,10.669231,121.780000,38.746667,4136.355716,[1627]
5,9619933,2019-07-05 18:00:00,0.000000,120.261050,35.998700,0.000000,"[1625, 1844]"
10,9769300,2019-09-16 04:00:00,0.000000,128.800500,35.077288,1925.230500,"[2459, 2464, 2465, 2478]"
12,9755945,2019-02-10 22:00:00,0.000000,114.280000,22.575000,1517.140800,"[1665, 1871]"
14,9839301,2019-04-18 14:00:00,0.000000,128.706667,34.875000,0.000000,[2459]
...,...,...,...,...,...,...,...
875991,9839466,2019-01-08 08:00:00,0.135019,128.570233,34.947090,1925.230500,"[2459, 2478]"
875992,9767388,2019-03-11 05:00:00,0.735000,0.136667,49.458333,1476.683600,"[544, 551, 5004, 5013, 5014, 5053, 5055, 7523]"
875993,9769283,2019-12-03 16:00:00,0.000000,128.805000,35.076667,1925.230500,"[2459, 2464, 2465, 2478]"
875994,9632129,2019-12-01 19:00:00,0.129966,121.865283,39.013786,1851.117300,[1627]


I can now apply the distance function defined earlier, looping over this much smaller dataset and only over the candidate ports for each observation.

In [10]:
def to_float(value):
    # Some coordinates may be strings with a decimal comma (e.g. '50,1').
    return float(str(value).replace(',', '.'))

for index, ship in ships_around_ports.iterrows():
    lat_ship = to_float(ship['LATITUDE'])
    lon_ship = to_float(ship['LONGITUDE'])
    min_distance = float('inf')
    closest_port_index = None

    # Only the ports inside this observation's bounding box are candidates.
    for value in ship['ports_in_range_indexes']:
        lat_port = to_float(ports.loc[value, 'LATITUDE'])
        lon_port = to_float(ports.loc[value, 'LONGITUDE'])

        distance = calculate_distance(lat_ship, lon_ship, lat_port, lon_port)

        if distance < min_distance:
            min_distance = distance
            closest_port_index = value

    # Results are written back to the full ships table, which is saved below.
    ships.at[index, 'closest_port'] = ports.loc[closest_port_index, 'PORT_NAME']
    ships.at[index, 'continent'] = ports.loc[closest_port_index, 'Continent']
    ships.at[index, 'distance_to_closest_port'] = min_distance
        

ships.to_csv('final_ship.csv', index=False)
