# US Ports
- Folder ais_noaa_gov: vessel detections along the coasts of the United States.
- _Ports are still missing_
- CBP_drug_seizures: drug seizures by fiscal year

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler, MinMaxScaler
import numpy as np



Review the cargo types present in the data and sort them by total size.

In [2]:
# Load the NOAA AIS vessel data:
barcos_dia_10 = pd.read_csv('../ais_noaa_gov/df_procesado_AIS_2024_09_10.csv', header = 0, sep=',')

In [3]:
barcos_dia_10 = barcos_dia_10.drop(columns='Cargo')
barcos_dia_10['size'] = barcos_dia_10['Length']*barcos_dia_10['Width']*barcos_dia_10['Draft']
size_array = barcos_dia_10['size'].values.reshape(-1, 1)

In [4]:
scaler = MinMaxScaler(feature_range=(2,20))
scaler.fit(size_array)
barcos_dia_10['size_scaled'] = scaler.transform(size_array)

In [5]:
barcos_dia_10['size_scaled'] = pd.to_numeric(barcos_dia_10['size_scaled'])
barcos_dia_10 = barcos_dia_10.dropna()

Comments on the dataframe:
- Rows were filtered by the 'cargo' vessel categories -> check the documentation.
- MMSI is a unique identifier.
- BaseDateTime varies, but since each record is treated as one update every 24 hours, the time of day is ignored.
- Latitude and longitude: the datum (reference system) still needs to be confirmed.
- Sorting by length (and, to a lesser extent, width) shows the vessel's actual dimensions.
- The mean speed at which vessels travel can also be obtained.
- It might be possible to estimate whether a voyage is domestic (from one port to another in the same country) or international.
- SOG: Speed Over Ground (knots), COG: Course Over Ground (degrees), Heading: true heading angle (degrees)
- Status: navigational status, with the number of records for this day:
    - 0: Under way using engine: 682
    - 1: Anchored: 175
    - 3: Restricted maneuverability: 11
    - 5: Moored (tied to another object to limit free movement): 420
    - 7: Engaged in fishing: 1
    - 8: Under way sailing: 5
    - 15: Undefined (default): 2
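
As a rough illustration of three of the notes above (ignoring the time of day in BaseDateTime, computing a mean speed per vessel, and counting records per status), a minimal sketch on a tiny made-up frame could look like this. The sample rows and the names `sample`, `day`, `mean_sog` and `status_counts` are illustrative assumptions, not part of the dataset.

```python
import pandas as pd

# Tiny made-up sample that reuses the AIS column names described above.
sample = pd.DataFrame({
    "MMSI": [111, 111, 222],
    "BaseDateTime": ["2024-09-03T00:00:03", "2024-09-04T00:02:56", "2024-09-03T00:00:04"],
    "SOG": [12.0, 0.0, 5.0],
    "Status": [0.0, 5.0, 0.0],
})

# Keep only the calendar day, since each vessel is treated as one update per 24 hours.
sample["day"] = pd.to_datetime(sample["BaseDateTime"]).dt.date

# Mean Speed Over Ground (knots) per vessel.
mean_sog = sample.groupby("MMSI")["SOG"].mean()

# Number of records per navigational status.
status_counts = sample["Status"].value_counts().sort_index()
```

On the real data the same calls would be applied to `barcos_dia_10` or to the multi-day frame, where a per-vessel mean is more meaningful.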


In [6]:
import plotly.express as px
import plotly.graph_objects as go

For drawing maps: https://medium.com/@alexroz/6-python-libraries-to-make-beautiful-maps-9fb9edb28b27

In [7]:
# # Plot the vessels on a map:
# fig_old = px.scatter_geo(barcos_dia_10, lat='LAT', lon='LON',
#                      color='VesselType', size='size',
#                      locationmode='USA-states') # others: projection, size, hover_name, mapbox_style


In [7]:
fig = px.scatter_mapbox(barcos_dia_10, lat='LAT', lon='LON',
                     color='VesselType', size='size_scaled', zoom=3, height=600,
                     mapbox_style="open-street-map",
                     color_continuous_scale="Viridis",
                     ) #symbol="square"

fig.update_layout(
    margin={"r":0, "t":0, "l":0,"b":0})
fig.show()



Count the cargo ships by navigational status:

In [8]:
flota_status = barcos_dia_10.groupby(by='Status').count()
flota_status

Status,MMSI,BaseDateTime,LAT,LON,SOG,COG,Heading,VesselName,IMO,CallSign,VesselType,Length,Width,Draft,TransceiverClass,size,size_scaled
0.0,682,682,682,682,682,682,682,682,682,682,682,682,682,682,682,682,682
1.0,175,175,175,175,175,175,175,175,175,175,175,175,175,175,175,175,175
3.0,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11
5.0,420,420,420,420,420,420,420,420,420,420,420,420,420,420,420,420,420
7.0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
8.0,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5
15.0,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2


Clustering algorithm to group cargo ships by their "nearest port".
"Nearest port": the same idea as a centroid, but located on the coastline. The algorithm will need to be rethought.

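The "nearest port" idea can be sketched as a plain nearest-neighbour assignment using great-circle distance. This is only a sketch under stated assumptions: the `ports` dictionary, its approximate coordinates, and the helpers `haversine_km` and `nearest_port` are hypothetical and not part of the notebook's data.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Hypothetical port list with approximate coordinates (illustration only).
ports = {
    "Long Beach": (33.75, -118.20),
    "Miami": (25.77, -80.17),
    "Savannah": (32.08, -81.09),
}

def nearest_port(lat, lon, ports):
    """Return (port_name, distance_km) for the port closest to the given position."""
    name = min(ports, key=lambda p: haversine_km(lat, lon, *ports[p]))
    return name, haversine_km(lat, lon, *ports[name])

# Example position of a moored vessel close to Miami.
port, dist = nearest_port(25.80435, -80.25575, ports)
```

Unlike a centroid, the reference points here are fixed in advance, so the result depends entirely on how complete the port list is.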
In [8]:
# Color the vessels by CallSign:
fig2 = px.scatter_geo(barcos_dia_10, lat='LAT', lon='LON',
                     color='CallSign', size='size_scaled') # others: projection, size, hover_name, mapbox_style

fig2.update_layout(title='Vessels grouped by CallSign',
                   geo_scope = 'north america')

fig2.show()

The "Call Sign" variable does not appear to have any geographic relevance.

In [None]:
# other go.Figure types: carpet, candlestick, heatmap, histogram, sankey

If we add the data for days 1 to 12 to the dataframe:

In [10]:
import glob
path = "../ais_noaa_gov/"

In [11]:
csv_files = glob.glob(f"{path}*.csv")

# Read every daily file except the one for 2024-09-29
df_list = [pd.read_csv(file) for file in csv_files if "2024_09_29" not in file]

cargo_september_df = pd.concat(df_list, ignore_index=True)

In [12]:
cargo_september_df

Unnamed: 0,MMSI,BaseDateTime,LAT,LON,SOG,COG,Heading,VesselName,IMO,CallSign,VesselType,Status,Length,Width,Draft,Cargo,TransceiverClass
0,368270970,2024-09-03T00:00:03,33.77111,-118.21058,0.0,233.5,511.0,CARIBBEAN,IMO9694608,WDN3363,70.0,0.0,66.0,14.0,3.3,70.0,A
1,477929700,2024-09-03T00:00:04,34.94338,-75.20047,1.0,0.0,309.0,ZIM OPAL,IMO9967988,VRVI7,70.0,0.0,272.0,43.0,13.2,70.0,A
2,367414810,2024-09-03T00:00:03,28.96818,-95.28575,0.0,68.0,511.0,MAX CHERAMIE,IMO9184122,WDE9215,70.0,0.0,44.0,11.0,3.0,70.0,A
3,255805803,2024-09-03T00:00:01,38.83185,-74.51315,11.5,258.8,259.0,INDEPENDENT FUTURE,IMO9246712,CQAA,79.0,0.0,220.0,32.0,10.2,79.0,A
4,366971360,2024-09-03T00:00:02,47.07511,-90.96753,13.9,249.1,250.0,JOHN G MUNSON,IMO5173670,WDH7557,70.0,0.0,234.0,22.0,7.9,70.0,A
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
16418,367371000,2024-09-04T00:06:22,13.45990,144.63663,0.1,207.0,135.0,PFC DEWAYNETWILLIAMS,IMO8219396,NHNU,70.0,5.0,205.0,32.0,7.9,70.0,A
16419,636023494,2024-09-04T09:51:22,13.43484,143.00657,13.9,66.1,70.0,EMERALD LIUHENG,IMO9991513,5LNZ6,70.0,0.0,200.0,32.0,11.9,70.0,A
16420,235110737,2024-09-04T22:26:49,13.89056,145.38682,18.6,268.5,266.0,EVER LIFTING,IMO9629122,2ILJ7,70.0,0.0,335.0,46.0,10.8,74.0,A
16421,352001547,2024-09-04T20:10:54,11.91910,144.24020,11.3,322.0,324.0,FUDA,IMO9331933,3E2551,70.0,0.0,190.0,32.0,6.8,70.0,A


In [13]:
cargo_september_df = cargo_september_df.drop(columns='Cargo')
cargo_september_df['size'] = cargo_september_df['Length']*cargo_september_df['Width']*cargo_september_df['Draft']
size_array = cargo_september_df['size'].values.reshape(-1, 1)

scaler = MinMaxScaler(feature_range=(2,20))
scaler.fit(size_array)
cargo_september_df['size_scaled'] = scaler.transform(size_array)

In [16]:
cargo_september_df.sort_values(by=["MMSI","BaseDateTime"])

Unnamed: 0,MMSI,BaseDateTime,LAT,LON,SOG,COG,Heading,VesselName,IMO,CallSign,VesselType,Status,Length,Width,Draft,TransceiverClass,size,size_scaled
2518,205221000,2024-09-02T04:46:18,41.43746,-130.72458,11.4,114.6,113.0,LOWLANDS PELIKAAN,IMO9700005,ONLV,70.0,0.0,180.0,30.0,9.3,A,50220.0,4.975524
459,205221000,2024-09-03T00:00:07,39.61722,-126.25712,12.3,123.4,123.0,LOWLANDS PELIKAAN,IMO9700005,ONLV,70.0,0.0,180.0,30.0,9.3,A,50220.0,4.975524
15973,205221000,2024-09-04T00:02:56,37.79459,-122.36053,0.1,214.8,139.0,LOWLANDS PELIKAAN,IMO9700005,ONLV,70.0,1.0,180.0,30.0,9.3,A,50220.0,4.975524
6459,205221000,2024-09-05T00:02:36,38.56399,-121.54941,0.0,39.9,129.0,LOWLANDS PELIKAAN,IMO9700005,ONLV,70.0,5.0,180.0,30.0,9.3,A,50220.0,4.975524
7840,205221000,2024-09-06T00:02:47,38.56410,-121.54958,0.0,39.9,132.0,LOWLANDS PELIKAAN,IMO9700005,ONLV,70.0,5.0,180.0,30.0,9.3,A,50220.0,4.975524
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4319,720202000,2024-09-08T00:00:08,25.80435,-80.25575,0.0,360.0,511.0,INTL VICTORY,IMO8977766,CPA3027,70.0,5.0,33.0,12.0,3.0,A,1188.0,2.070389
3520,720202000,2024-09-09T00:02:20,25.80434,-80.25575,0.0,360.0,511.0,INTL VICTORY,IMO8977766,CPA3027,70.0,5.0,33.0,12.0,3.0,A,1188.0,2.070389
9153,720202000,2024-09-10T00:02:57,25.80433,-80.25575,0.0,360.0,511.0,INTL VICTORY,IMO8977766,CPA3027,70.0,5.0,33.0,12.0,3.0,A,1188.0,2.070389
12853,720202000,2024-09-11T00:00:45,25.80433,-80.25575,0.1,360.0,511.0,INTL VICTORY,IMO8977766,CPA3027,70.0,5.0,33.0,12.0,2.2,A,871.2,2.051618


Things to check:
- That the dimensions stay the same across all days.
- How the status varies: "how many vessels change their status over the days".
    - Recall the three most frequent statuses: 0 (under way using engine), 1 (anchored), 5 (moored) and, to a lesser extent, 3 (restricted maneuverability)
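
Both checks can be sketched with a per-vessel `groupby`. The records below are made up and the names `records`, `changed_dims` and `status_changes` are illustrative; on the real data the same steps would run on `cargo_september_df`.

```python
import pandas as pd

# Made-up records; column names follow the AIS dataframe above.
records = pd.DataFrame({
    "MMSI": [111, 111, 111, 222, 222],
    "BaseDateTime": ["2024-09-02", "2024-09-03", "2024-09-04", "2024-09-02", "2024-09-03"],
    "Length": [180.0, 180.0, 180.0, 33.0, 33.0],
    "Width": [30.0, 30.0, 30.0, 12.0, 12.0],
    "Draft": [9.3, 9.3, 9.3, 3.0, 2.2],
    "Status": [0.0, 1.0, 5.0, 5.0, 5.0],
})
records = records.sort_values(["MMSI", "BaseDateTime"])

# A dimension is constant for a vessel if it takes exactly one distinct value.
distinct = records.groupby("MMSI")[["Length", "Width", "Draft"]].nunique()
changed_dims = distinct[(distinct > 1).any(axis=1)]

# Count status transitions per vessel by comparing each status with the previous one.
prev_status = records.groupby("MMSI")["Status"].shift()
is_change = prev_status.notna() & (records["Status"] != prev_status)
status_changes = is_change.groupby(records["MMSI"]).sum()

# Number of vessels whose status changed at least once.
vessels_changing = int((status_changes > 0).sum())
```

In this toy frame only vessel 222 changes a dimension (its draft), and only vessel 111 changes status.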

If the same MMSI appears in several records, draw a trajectory line:
- For the 50 or 100 vessels with the largest size, grouped by MMSI. Changed to 100 randomly chosen MMSIs.
- Draw the marker only for the last day, but draw the full trajectory.
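
For reference, the original plan (largest vessels, marker only on the last day) could be sketched as follows. The frame `tracks` and the value of `n_largest` are made up for illustration; the notebook itself goes on to use a random sample.

```python
import pandas as pd

# Made-up tracks for three vessels.
tracks = pd.DataFrame({
    "MMSI": [1, 1, 2, 2, 3],
    "BaseDateTime": ["2024-09-02", "2024-09-03", "2024-09-02", "2024-09-04", "2024-09-03"],
    "size": [500.0, 480.0, 9000.0, 9000.0, 120.0],
})

n_largest = 2  # stands in for the 50 or 100 mentioned above

# Rank vessels by their largest recorded size and keep the top n.
top_mmsi = tracks.groupby("MMSI")["size"].max().nlargest(n_largest).index
top_tracks = tracks[tracks["MMSI"].isin(top_mmsi)].sort_values(["MMSI", "BaseDateTime"])

# Last record per vessel: the only point that would get a marker,
# while top_tracks keeps the full trajectory for the line.
last_points = top_tracks.groupby("MMSI").tail(1)
```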

In [36]:
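# Sample 100 MMSIs at random (fixed seed for reproducibility).
# Note: despite its name, top100_barcos holds every unique MMSI, not the 100 largest.
np.random.seed(123)
top100_barcos = cargo_september_df['MMSI'].unique()
valores_aleatorios = np.random.choice(top100_barcos, size=100, replace=False)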

In [37]:
top100_barcos_df = cargo_september_df[cargo_september_df['MMSI'].isin(valores_aleatorios)]

In [38]:
top100_barcos_df.sort_values(by=["MMSI","BaseDateTime"])

Unnamed: 0,MMSI,BaseDateTime,LAT,LON,SOG,COG,Heading,VesselName,IMO,CallSign,VesselType,Status,Length,Width,Draft,TransceiverClass,size,size_scaled
6830,209593000,2024-09-05T16:15:56,29.18263,-78.04305,12.3,324.4,320.0,CONTSHIP ZOE,IMO9434797,5BFL5,70.0,0.0,147.0,22.0,8.3,A,26842.2,3.590394
7958,209593000,2024-09-06T00:18:57,30.43741,-79.14644,0.7,349.5,17.0,CONTSHIP ZOE,IMO9434797,5BFL5,70.0,0.0,147.0,22.0,8.3,A,26842.2,3.590394
14066,209593000,2024-09-07T00:00:03,32.08705,-81.09708,7.2,316.8,318.0,CONTSHIP ZOE,IMO9434797,5BFL5,70.0,0.0,147.0,22.0,7.2,A,23284.8,3.379619
5082,209593000,2024-09-08T00:02:58,32.12327,-81.13506,0.0,350.6,157.0,CONTSHIP ZOE,IMO9434797,5BFL5,70.0,5.0,147.0,22.0,7.2,A,23284.8,3.379619
3454,209593000,2024-09-09T00:01:43,28.84749,-79.23931,11.4,167.6,170.0,CONTSHIP ZOE,IMO9434797,5BFL5,70.0,0.0,147.0,22.0,7.9,A,25548.6,3.513749
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
13236,636093041,2024-09-11T00:03:42,33.76647,-118.27366,0.0,18.2,81.0,CONTI CONQUEST,IMO9293818,5LCJ8,71.0,5.0,334.0,42.0,12.3,A,172544.4,12.223218
11839,636093041,2024-09-12T00:03:42,33.76649,-118.27361,0.0,18.2,81.0,CONTI CONQUEST,IMO9293818,5LCJ8,71.0,5.0,334.0,42.0,12.3,A,172544.4,12.223218
4035,636093159,2024-09-09T18:59:31,21.07173,-64.00725,15.1,238.8,236.0,AS CLAUDIA,IMO9330549,D5MS9,71.0,0.0,222.0,30.0,10.6,A,70596.0,6.182797
9257,636093159,2024-09-10T00:06:06,20.43903,-65.17786,14.4,240.4,240.0,AS CLAUDIA,IMO9330549,D5MS9,71.0,0.0,222.0,30.0,11.2,A,74592.0,6.419560


Next, the trajectories of the selected vessels are drawn. For now the line width stays constant, although it could vary with, for example, the tangential speed.

- In some vessels the 'size' column decreases. Check which variable drives this (which one it correlates with best). At first glance it appears to be the draft, which may in turn be linked to some other variable.

- Also look at the changes in status.

- The 'size' variable changes value after the vessel calls at a port (status = 5) and sets off on a new course.
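
Because `size` is defined as Length × Width × Draft, any change in `size` with fixed Length and Width must come from Draft. The sketch below checks this on the five CONTSHIP ZOE records shown above and flags the changes that occur on the first record after a moored one. The names `zoe`, `size_changed` and `changes_after_port` are illustrative, and this is only one possible way to test the observation.

```python
import pandas as pd

# The five CONTSHIP ZOE records listed above (MMSI 209593000).
zoe = pd.DataFrame({
    "BaseDateTime": ["2024-09-05T16:15:56", "2024-09-06T00:18:57", "2024-09-07T00:00:03",
                     "2024-09-08T00:02:58", "2024-09-09T00:01:43"],
    "Status": [0.0, 0.0, 0.0, 5.0, 0.0],
    "Length": [147.0] * 5,
    "Width": [22.0] * 5,
    "Draft": [8.3, 8.3, 7.2, 7.2, 7.9],
})
zoe["size"] = zoe["Length"] * zoe["Width"] * zoe["Draft"]

# Rows where size or draft differ from the previous record (first row counts as unchanged).
size_changed = zoe["size"].diff().fillna(0).ne(0)
draft_changed = zoe["Draft"].diff().fillna(0).ne(0)

# With constant Length and Width, size changes exactly where Draft changes.
same_rows = bool((size_changed == draft_changed).all())

# Flag changes that occur on the first record after a moored (status 5) record.
after_moored = zoe["Status"].shift().eq(5.0)
changes_after_port = zoe.loc[size_changed & after_moored, "BaseDateTime"].tolist()
```

In these five rows one of the two draft changes follows a moored record; the other appears while the vessel is still reported as under way, so daily sampling may hide some port calls.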

The trajectory of one particular vessel:

In [42]:
ship_zoe = top100_barcos_df.loc[top100_barcos_df["MMSI"] == 209593000]
ship_zoe = ship_zoe.sort_values(by="BaseDateTime")
ship_zoe['size_traced'] = ship_zoe.apply(lambda row: 0.25 if row['Status'] == 0.0 else row['size_scaled'], axis=1) 

In [116]:
# Contship Zoe's trajectory:
trace = px.line_mapbox(ship_zoe, lat='LAT', lon='LON')
# trace.update_traces()  # Adjust line width and opacity # line=dict(width=2, opacity=0.7)

In [117]:
# Create the figure
fig_ship_zoe = px.scatter_mapbox(ship_zoe, lat='LAT', lon='LON', color='Status',
                     size='size_traced',
                     zoom=3, height=600,
                     hover_name='BaseDateTime',
                     mapbox_style="open-street-map",
                     hover_data=["SOG", "COG", "Heading"]
                    )

# Add the trajectory lines to the figure
trace_data = trace.data[0]

fig_ship_zoe.add_trace(trace_data)

fig_ship_zoe.update_layout(
    margin={"r":0, "t":0, "l":0,"b":0})
fig_ship_zoe.show()





In [128]:
# Generalize to the 100 sampled vessels.
# Work on an explicit copy so the column assignment does not trigger SettingWithCopyWarning.
top100_barcos_df = top100_barcos_df.copy()
top100_barcos_df['size_traced'] = np.where(top100_barcos_df['Status'] == 0.0, 0.25, top100_barcos_df['size_scaled'])
unique_mmsi = top100_barcos_df['MMSI'].unique()
traces = []
top100_barcos_df = top100_barcos_df.dropna(subset=['size_traced'])



In [138]:
top100_barcos_df

Unnamed: 0,MMSI,BaseDateTime,LAT,LON,SOG,COG,Heading,VesselName,IMO,CallSign,VesselType,Status,Length,Width,Draft,TransceiverClass,size,size_scaled,size_traced
33,538005359,2024-09-03T00:00:03,28.92471,-124.37106,12.8,120.3,122.0,STAR CHALLENGER,IMO9632997,V7DA8,79.0,0.0,200.0,32.0,10.8,A,69120.0,6.095345,0.25
45,367622850,2024-09-03T00:00:04,29.18720,-94.05138,4.1,281.5,280.0,FMS ENDURANCE,IMO9209075,WDH4937,75.0,0.0,62.0,13.0,3.2,A,2579.2,2.152817,0.25
85,366843080,2024-09-03T00:00:05,28.97061,-89.81967,0.8,206.7,511.0,CAPT LEVERT,IMO1123753,WDA7416,70.0,0.0,36.0,9.0,0.0,A,0.0,2.000000,0.25
108,255806123,2024-09-03T00:00:00,43.33614,-66.60371,15.0,277.9,273.0,EF AVA,IMO9389306,CQAA9,71.0,0.0,130.0,21.0,7.8,A,21294.0,3.261665,0.25
132,367750000,2024-09-03T00:00:02,28.98812,-92.90963,10.2,279.7,276.0,HARVEY SUPPLIER,IMO9388118,WDAW,70.0,0.0,77.0,18.0,4.1,A,5682.6,2.336693,0.25
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
16220,255806187,2024-09-04T02:47:45,40.24877,-67.43078,9.0,286.6,286.0,UHL FRONTIER,IMO9789685,CQAI9,70.0,0.0,150.0,26.0,8.6,A,33540.0,3.987238,0.25
16284,338034000,2024-09-04T08:07:41,28.70709,-94.33558,9.8,332.8,338.0,CAPE HOPE,IMO9292319,WDJ4747,70.0,0.0,63.0,16.0,4.2,A,4233.6,2.250840,0.25
16318,305146000,2024-09-04T12:07:46,39.78867,-71.98383,11.6,289.0,289.0,INDUSTRIALCHALLENGER,IMO9213935,V2HI2,70.0,0.0,120.0,20.0,5.2,A,12480.0,2.739437,0.25
16351,538006472,2024-09-04T15:25:28,28.03963,-77.61681,12.0,260.3,262.0,FEDERAL CARIBOU,IMO9671096,V7NK6,70.0,0.0,200.0,23.0,7.8,A,35880.0,4.125882,0.25


In [136]:
for mmsi in unique_mmsi:
    # Sort by time so the line follows the vessel's actual path
    df_filtered = top100_barcos_df.loc[top100_barcos_df['MMSI'] == mmsi].sort_values(by='BaseDateTime')
    trace = px.line_mapbox(df_filtered, lat='LAT', lon='LON')
    # Keep the MMSI next to its figure so it can be reported later
    traces.append((mmsi, trace))

fig_traces = px.scatter_mapbox(top100_barcos_df, lat='LAT', lon='LON', color='Status',
                     size='size_traced',
                     zoom=3, height=600,
                     hover_name='BaseDateTime',
                     mapbox_style="open-street-map",
                     hover_data=["SOG", "COG", "Heading"]
                    )

for mmsi, trace in traces:
    if trace.data:
        fig_traces.add_trace(trace.data[0])
    else:
        # In the original run this branch was reached for three vessels
        print(f"No data for the trace of MMSI {mmsi}")

fig_traces.update_layout(mapbox_style="open-street-map", margin={"r":0, "t":0, "l":0,"b":0})

# Display the figure
fig_traces.show()






Merely rendering the trajectories of the hundred randomly chosen vessels is computationally very expensive. For that reason, the work will continue in an environment better suited to it: Google Colab or Databricks.
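
One possible way to lighten the rendering, independent of the environment, is to draw all trajectories as a single line trace in which consecutive vessels are separated by `None` coordinates, which plotly treats as gaps in a line by default. The helper `build_polyline` and the sample rows are hypothetical, and whether this is fast enough for the full dataset has not been tested here.

```python
from itertools import groupby
from operator import itemgetter

def build_polyline(rows):
    """Concatenate per-vessel tracks into one lat/lon list pair with None separators.

    rows: iterable of (mmsi, timestamp, lat, lon) tuples.
    """
    lats, lons = [], []
    ordered = sorted(rows, key=itemgetter(0, 1))  # by vessel, then by time
    for _, track in groupby(ordered, key=itemgetter(0)):
        for _, _, lat, lon in track:
            lats.append(lat)
            lons.append(lon)
        # A None coordinate breaks the line so different vessels are not joined.
        lats.append(None)
        lons.append(None)
    return lats, lons

rows = [
    (2, "2024-09-03", 30.0, -80.0),
    (1, "2024-09-04", 34.0, -118.0),
    (1, "2024-09-03", 33.0, -119.0),
]
lats, lons = build_polyline(rows)
# lats and lons could then feed one go.Scattermapbox(mode="lines", lat=lats, lon=lons) trace.
```

This replaces one figure per vessel with a single trace, at the cost of losing per-vessel styling.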

In [141]:
# Save the sampled and the full dataframes as CSV.
# Note: these files land in the folder scanned by glob above, so a re-run would read them too.
top100_barcos_df.to_csv(path + 'top100_barcos_df.csv', sep=',')
cargo_september_df.to_csv(path + 'cargo_september_df.csv', sep=',')

Options for estimating port hubs:
- K-Medoids (PAM): cluster centers are actual data points, so real reference points can be chosen (consider points on the coastline).
- DBSCAN: density-based, with no need to specify fixed centroids. It relies on a single density threshold, so it copes poorly when trajectory densities vary widely.
- Agglomerative Clustering (hierarchical): grouping based on hierarchical proximity to the coastline points.

DBSCAN and Agglomerative Clustering can uncover natural grouping patterns (when the centroids are not assigned manually).

Even so, the first one to be tried will be 'AgglomerativeClustering'.
- Consider HDBSCAN, which handles clusters of varying density.
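
A minimal sketch of that first attempt with scikit-learn's `AgglomerativeClustering`, on made-up positions. The array `positions` and the choice `n_clusters=2` are assumptions for this toy example; the default Ward linkage uses Euclidean distance on degrees, which is only a rough proxy for real distance.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Made-up vessel positions (LAT, LON) around two areas of the US coast.
positions = np.array([
    [33.77, -118.21], [33.76, -118.27], [33.70, -118.25],  # near Los Angeles / Long Beach
    [25.80, -80.25], [25.77, -80.17], [25.90, -80.10],     # near Miami
])

# n_clusters=2 is an assumption for this toy data, not a recommendation.
model = AgglomerativeClustering(n_clusters=2)
labels = model.fit_predict(positions)
```

On the real data the number of clusters is unknown, so a distance threshold or a density-based method such as HDBSCAN may be easier to tune.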