# Examples

The following examples show how to use the utility Python functions in `utils.py`.

In [2]:
import json
from utils import *

# Load ferries data
with open("ferries.json", "r") as file:
    ferries_data = json.load(file)

## `fetch_vessel_data`

Fetches historical vessel data from PONTOS-HUB through its REST API.
The function requires a time range because a single request cannot return all the data for a vessel: there is simply too much of it, and the REST API returns at most 1 million rows per request.
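Because of that limit, a long period has to be fetched in several smaller requests. The sketch below splits a time range into consecutive windows. `split_time_range` is a hypothetical helper, not part of `utils.py`, and the sizing assumes 12 parameters sampled once per second, which matches the example below but may not hold for other vessels.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"  # same string format as start_time/end_time in the examples

def split_time_range(start, end, window):
    """Hypothetical helper: split [start, end] into consecutive windows of at most `window`."""
    t0, t1 = datetime.strptime(start, FMT), datetime.strptime(end, FMT)
    windows = []
    while t0 < t1:
        t_next = min(t0 + window, t1)
        windows.append((t0.strftime(FMT), t_next.strftime(FMT)))
        t0 = t_next
    return windows

# Rough sizing, assuming 12 parameters sampled at 1 Hz
rows_per_hour = 12 * 3600               # 43,200 rows per hour
max_hours = 1_000_000 // rows_per_hour  # 23 whole hours fit under the 1-million-row limit

windows = split_time_range("2024-11-01 00:00:00", "2024-11-02 00:00:00", timedelta(hours=12))
for window_start, window_end in windows:
    print(window_start, "->", window_end)
    # e.g. fetch_vessel_data(vessel_id, window_start, window_end), one request per window
```

Whether the end time of a request is inclusive is not stated here, so data points falling exactly on a window boundary may need to be de-duplicated after the responses are concatenated.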

The example below fetches 10 minutes of data for the ferry "Fragancia", exactly as it is stored in the hub.

In [30]:
# PONTOS vessel id for ferry "Fragancia"
vessel_id = "mmsi_265558290"

# Time range for fetching data (10 minutes)
start_time = "2024-11-01 12:00:00"
end_time = "2024-11-01 12:10:00"

# Fetch vessel data
vessel_data = fetch_vessel_data(vessel_id, start_time, end_time)

# Print a summary: number of data points, the measurements present, and the first data point
print(f"Vessel data contains {len(vessel_data)} data points within the 10 minutes interval.")
unique_parameters = set(data_point['parameter_id'] for data_point in vessel_data)
# Join outside the f-string: backslashes inside f-string expressions require Python 3.12+
print("The unique measurements available are:\n \t" + "\n \t".join(unique_parameters))
print("The first data point is:")
print(vessel_data[0])

Vessel data contains 7200 data points within the 10 minutes interval.
The unique measurements available are:
 	enginemain_fuelcons_lph_4
 	enginemain_speed_rpm_3
 	positioningsystem_sog_kn_1
 	positioningsystem_cog_deg_1
 	enginemain_fuelcons_lph_1
 	enginemain_speed_rpm_2
 	enginemain_fuelcons_lph_3
 	enginemain_speed_rpm_4
 	positioningsystem_latitude_deg_1
 	enginemain_fuelcons_lph_2
 	enginemain_speed_rpm_1
 	positioningsystem_longitude_deg_1
The first data point is:
{'time': '2024-11-01T12:00:00+00:00', 'parameter_id': 'enginemain_fuelcons_lph_1', 'value': 4.7999997}
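Each data point is one measurement of one parameter at one instant, so the response is in "long" format. Counting points per parameter is a quick sanity check on a response; the sketch below uses a few illustrative records in the same shape as the output above, and `points_per_parameter` is a hypothetical helper.

```python
from collections import Counter

def points_per_parameter(vessel_data):
    """Count how many data points each parameter_id contributes."""
    return Counter(point["parameter_id"] for point in vessel_data)

# Illustrative records in the shape returned by fetch_vessel_data
sample = [
    {"time": "2024-11-01T12:00:00+00:00", "parameter_id": "enginemain_fuelcons_lph_1", "value": 4.7999997},
    {"time": "2024-11-01T12:00:01+00:00", "parameter_id": "enginemain_fuelcons_lph_1", "value": 4.7},
    {"time": "2024-11-01T12:00:00+00:00", "parameter_id": "positioningsystem_sog_kn_1", "value": 0.0},
]
counts = points_per_parameter(sample)
print(counts)  # Counter({'enginemain_fuelcons_lph_1': 2, 'positioningsystem_sog_kn_1': 1})
```

For the 10-minute response above, 7200 points spread over 12 parameters is consistent with 600 one-second samples per parameter, assuming every parameter is reported every second.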


To make the data easier to handle, `fetch_vessel_data` can use the PONTOS-HUB resources that return data averaged within fixed "time buckets". This is controlled by the `time_bucket` argument, as the examples below demonstrate.

In [13]:
# Fetch vessel data, averaged within a 5 seconds time bucket
vessel_data = fetch_vessel_data(vessel_id, start_time, end_time, time_bucket="5 seconds")
print(f"Vessel data contains {len(vessel_data)} data points within the 10 minutes interval.")

# Fetch vessel data, averaged within a 30 seconds time bucket
vessel_data = fetch_vessel_data(vessel_id, start_time, end_time, time_bucket="30 seconds")
print(f"Vessel data contains {len(vessel_data)} data points within the 10 minutes interval.")

# Fetch vessel data, averaged within a 1 minute time bucket
vessel_data = fetch_vessel_data(vessel_id, start_time, end_time, time_bucket="1 minute")
print(f"Vessel data contains {len(vessel_data)} data points within the 10 minutes interval.")

# Fetch vessel data, averaged within a 5 minutes time bucket
vessel_data = fetch_vessel_data(vessel_id, start_time, end_time, time_bucket="5 minutes")
print(f"Vessel data contains {len(vessel_data)} data points within the 10 minutes interval.")

# Fetch vessel data, averaged within a 10 minutes time bucket
vessel_data = fetch_vessel_data(vessel_id, start_time, end_time, time_bucket="10 minutes")
print(f"Vessel data contains {len(vessel_data)} data points within the 10 minutes interval.")


Vessel data contains 1440 data points within the 10 minutes interval.
Vessel data contains 240 data points within the 10 minutes interval.
Vessel data contains 120 data points within the 10 minutes interval.
Vessel data contains 24 data points within the 10 minutes interval.
Vessel data contains 12 data points within the 10 minutes interval.


As expected, the number of data points decreases as the size of the time bucket increases: each of the 12 measurements contributes one averaged data point per bucket.
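The counts above can be reproduced from the interval and bucket sizes alone. In this sketch, parsing `time_bucket` assumes the "<number> <unit>" strings used in these examples, the 12 parameters come from the unfiltered output earlier, and `bucket_seconds`/`expected_points` are hypothetical helpers.

```python
import math

# Assumed unit names, matching the time_bucket strings used above
UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600}

def bucket_seconds(time_bucket):
    """Convert e.g. "5 seconds" or "1 minute" to a number of seconds."""
    amount, unit = time_bucket.split()
    return int(amount) * UNIT_SECONDS[unit.rstrip("s")]

def expected_points(interval_s, time_bucket, n_parameters):
    """One averaged data point per parameter per bucket."""
    n_buckets = math.ceil(interval_s / bucket_seconds(time_bucket))
    return n_parameters * n_buckets

for tb in ["5 seconds", "30 seconds", "1 minute", "5 minutes", "10 minutes"]:
    print(tb, expected_points(600, tb, 12))
# 5 seconds 1440
# 30 seconds 240
# 1 minute 120
# 5 minutes 24
# 10 minutes 12
```

The `ceil` assumes the interval starts on a bucket boundary; an unaligned interval could touch one extra, partially filled bucket.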

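The `fetch_vessel_data` function can also request specific measurements. The `parameter_ids` argument takes a list of strings that are matched against the tags of the available measurements. As the examples below show, a measurement is returned whenever its tag contains any of the given strings, so `"enginemain"` selects every main-engine measurement.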

In [31]:
# Fetch vessel data, only the measurements whose tags include 'latitude' or 'longitude', averaged within a 10-minute time bucket
vessel_data = fetch_vessel_data(
    vessel_id,
    start_time,
    end_time,
    time_bucket="10 minutes",
    parameter_ids=["latitude", "longitude"]
    )
print("With parameter_ids=['latitude', 'longitude'] ...")
print(f"Vessel data contains {len(vessel_data)} data points within the 10 minutes interval.")
print("The measurements available are:\n \t" + "\n \t".join(data_point['parameter_id'] for data_point in vessel_data))
print("")

# Fetch vessel data, only the measurements whose tags include 'enginemain', averaged within a 10-minute time bucket
vessel_data = fetch_vessel_data(
    vessel_id,
    start_time,
    end_time,
    time_bucket="10 minutes",
    parameter_ids=["enginemain"]
    )
print("With parameter_ids=['enginemain'] ...")
print(f"Vessel data contains {len(vessel_data)} data points within the 10 minutes interval.")
print("The measurements available are:\n \t" + "\n \t".join(data_point['parameter_id'] for data_point in vessel_data))
print("")

# Fetch vessel data, only the measurement with the tag 'enginemain_fuelcons_lph_1', averaged within a 10-minute time bucket
vessel_data = fetch_vessel_data(
    vessel_id,
    start_time,
    end_time,
    time_bucket="10 minutes",
    parameter_ids=["enginemain_fuelcons_lph_1"]
    )
print("With parameter_ids=['enginemain_fuelcons_lph_1'] ...")
print(f"Vessel data contains {len(vessel_data)} data points within the 10 minutes interval.")
print("The measurements available are:\n \t" + "\n \t".join(data_point['parameter_id'] for data_point in vessel_data))


With parameter_ids=['latitude', 'longitude'] ...
Vessel data contains 2 data points within the 10 minutes interval.
The measurements available are:
 	positioningsystem_latitude_deg_1
 	positioningsystem_longitude_deg_1

With parameter_ids=['enginemain'] ...
Vessel data contains 8 data points within the 10 minutes interval.
The measurements available are:
 	enginemain_fuelcons_lph_1
 	enginemain_fuelcons_lph_2
 	enginemain_fuelcons_lph_3
 	enginemain_fuelcons_lph_4
 	enginemain_speed_rpm_1
 	enginemain_speed_rpm_2
 	enginemain_speed_rpm_3
 	enginemain_speed_rpm_4

With parameter_ids=['enginemain_fuelcons_lph_1'] ...
Vessel data contains 1 data points within the 10 minutes interval.
The measurements available are:
 	enginemain_fuelcons_lph_1
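The matching seen in these outputs can be mimicked locally. This sketch assumes simple substring matching (inferred from the results, not taken from `utils.py`), uses the twelve tags listed in the first example, and `match_parameters` is a hypothetical name.

```python
# Tags observed in the unfiltered 10-minute example
AVAILABLE_TAGS = [
    "enginemain_fuelcons_lph_1", "enginemain_fuelcons_lph_2",
    "enginemain_fuelcons_lph_3", "enginemain_fuelcons_lph_4",
    "enginemain_speed_rpm_1", "enginemain_speed_rpm_2",
    "enginemain_speed_rpm_3", "enginemain_speed_rpm_4",
    "positioningsystem_cog_deg_1", "positioningsystem_latitude_deg_1",
    "positioningsystem_longitude_deg_1", "positioningsystem_sog_kn_1",
]

def match_parameters(tags, parameter_ids):
    """Assumed behaviour: keep a tag if it contains any of the given strings."""
    return [tag for tag in tags if any(p in tag for p in parameter_ids)]

print(match_parameters(AVAILABLE_TAGS, ["latitude", "longitude"]))
print(len(match_parameters(AVAILABLE_TAGS, ["enginemain"])))  # 8
print(match_parameters(AVAILABLE_TAGS, ["enginemain_fuelcons_lph_1"]))
```

If the matching is indeed substring-based, `"enginemain_fuelcons_lph_1"` would also select a tag such as `enginemain_fuelcons_lph_10` on a vessel that had one, so very specific tags are worth double-checking against the returned `parameter_id` values.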


## `transform_vessel_data_to_dataframe`

Transforms vessel data into a Pandas DataFrame with one row per timestamp and one column per measurement. Without averaging, the timestamps have a resolution of 1 second, and the first measurement within each second is used. With averaging, the timestamps are spaced one time bucket apart, and the `avg_` prefixes of the averaged fields are removed.

The examples below demonstrate this.

In [34]:
# Fetch vessel data without averaging
vessel_data = fetch_vessel_data(
    vessel_id, 
    start_time, 
    end_time, 
    parameter_ids=["latitude","longitude","sog","fuelcons"]
    )
df = transform_vessel_data_to_dataframe(vessel_data)
print(f"The data contains {len(df)} rows.")
df.head(5)

The data contains 600 rows.


| | time | enginemain_fuelcons_lph_1 | enginemain_fuelcons_lph_2 | enginemain_fuelcons_lph_3 | enginemain_fuelcons_lph_4 | positioningsystem_latitude_deg_1 | positioningsystem_longitude_deg_1 | positioningsystem_sog_kn_1 |
|---|---|---|---|---|---|---|---|---|
| 0 | 2024-11-01 12:00:00+00:00 | 4.8 | 4.7 | 4.7 | 4.8 | 59.3953 | 18.441317 | 0.0 |
| 1 | 2024-11-01 12:00:01+00:00 | 4.7 | 4.5 | 4.7 | 4.8 | 59.3953 | 18.441317 | 0.0 |
| 2 | 2024-11-01 12:00:02+00:00 | 4.6 | 4.4 | 4.7 | 4.8 | 59.3953 | 18.441317 | 0.0 |
| 3 | 2024-11-01 12:00:03+00:00 | 4.6 | 4.4 | 4.7 | 4.8 | 59.3953 | 18.441317 | 0.0 |
| 4 | 2024-11-01 12:00:04+00:00 | 4.5 | 4.3 | 4.7 | 4.8 | 59.395306 | 18.441317 | 0.0 |
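For the non-averaged case, the reshaping can be sketched in plain Python: points are grouped by the whole second they fall in, and only the earliest value of each parameter within that second is kept. This is an illustrative reimplementation under that reading of the description, not the code in `utils.py` (which returns a pandas DataFrame rather than a list of dicts); `pivot_first_per_second` is a hypothetical name.

```python
from datetime import datetime

def pivot_first_per_second(data_points):
    """Long format -> one dict per second, keeping the first value of each parameter."""
    rows = {}
    # Sort by time so "first" means earliest, not first in the response
    for point in sorted(data_points, key=lambda p: datetime.fromisoformat(p["time"])):
        second = datetime.fromisoformat(point["time"]).replace(microsecond=0)
        # setdefault only stores a value if this parameter has none yet for this second
        rows.setdefault(second, {}).setdefault(point["parameter_id"], point["value"])
    return [{"time": t, **values} for t, values in sorted(rows.items())]

# Illustrative input: two fuel readings fall within 12:00:00
sample = [
    {"time": "2024-11-01T12:00:00.500+00:00", "parameter_id": "enginemain_fuelcons_lph_1", "value": 4.9},
    {"time": "2024-11-01T12:00:00+00:00", "parameter_id": "enginemain_fuelcons_lph_1", "value": 4.8},
    {"time": "2024-11-01T12:00:00+00:00", "parameter_id": "positioningsystem_sog_kn_1", "value": 0.0},
    {"time": "2024-11-01T12:00:01+00:00", "parameter_id": "enginemain_fuelcons_lph_1", "value": 4.7},
]
rows = pivot_first_per_second(sample)
for row in rows:
    print(row)
```

In pandas, a similar long-to-wide reshape could be done with `DataFrame.pivot_table(..., aggfunc="first")` after flooring the timestamps to whole seconds with `Series.dt.floor("s")`.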


In [None]:
# Fetch vessel data, averaged within a 1-minute time bucket
vessel_data = fetch_vessel_data(
    vessel_id,
    start_time,
    end_time,
    time_bucket="1 minute",
    parameter_ids=["latitude","longitude","sog","fuelcons"]
)
df = transform_vessel_data_to_dataframe(vessel_data)
print(f"The data contains {len(df)} rows.")
df.head(5)

The data contains 10 rows.


| | time | enginemain_fuelcons_lph_1 | enginemain_fuelcons_lph_2 | enginemain_fuelcons_lph_3 | enginemain_fuelcons_lph_4 | positioningsystem_latitude_deg_1 | positioningsystem_longitude_deg_1 | positioningsystem_sog_kn_1 |
|---|---|---|---|---|---|---|---|---|
| 0 | 2024-11-01 12:00:29.500000+00:00 | 4.371667 | 4.166667 | 4.661667 | 4.791666 | 59.395311 | 18.441323 | 0.0 |
| 1 | 2024-11-01 12:01:29.500000+00:00 | 4.4 | 4.2 | 4.57 | 4.775 | 59.395317 | 18.441328 | 0.0 |
| 2 | 2024-11-01 12:02:29.500000+00:00 | 4.4 | 4.2 | 4.601667 | 4.791666 | 59.395301 | 18.441329 | 0.0 |
| 3 | 2024-11-01 12:03:29.500000+00:00 | 4.401667 | 4.205 | 4.581667 | 4.805 | 59.395295 | 18.441338 | 0.0 |
| 4 | 2024-11-01 12:04:29.500000+00:00 | 4.425 | 4.238333 | 4.68 | 4.806666 | 59.395289 | 18.441341 | 0.0 |
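The averaged records returned by the hub carry `bucket`, `avg_time` and `avg_value` fields. The sketch below mimics that averaging client-side to show where timestamps such as 12:00:29.5 come from: they match the average time of the one-second samples 12:00:00 to 12:00:59. Aligning buckets to the Unix epoch is an assumption about how the hub does it, and `average_in_buckets`/`strip_avg_prefix` are hypothetical helpers, not functions from `utils.py`.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def average_in_buckets(data_points, bucket):
    """Average time and value per (bucket, parameter_id); buckets aligned to the epoch (assumed)."""
    groups = defaultdict(list)
    for p in data_points:
        t = datetime.fromisoformat(p["time"])
        start = EPOCH + ((t - EPOCH) // bucket) * bucket  # floor t to its bucket start
        groups[(start, p["parameter_id"])].append((t, p["value"]))
    result = []
    for (start, pid), items in sorted(groups.items()):
        mean_offset = sum((t - start for t, _ in items), timedelta()) / len(items)
        result.append({
            "bucket": start.isoformat(),
            "parameter_id": pid,
            "avg_time": (start + mean_offset).isoformat(),
            "avg_value": sum(v for _, v in items) / len(items),
        })
    return result

def strip_avg_prefix(record):
    """Rename avg_time/avg_value to time/value, leaving other keys untouched."""
    return {(k[4:] if k.startswith("avg_") else k): v for k, v in record.items()}

# Illustrative input: two minutes of 1 Hz samples for one parameter
t0 = datetime(2024, 11, 1, 12, 0, tzinfo=timezone.utc)
raw = [
    {"time": (t0 + timedelta(seconds=s)).isoformat(),
     "parameter_id": "enginemain_fuelcons_lph_1", "value": 4.0 + s / 100}
    for s in range(120)
]
buckets = average_in_buckets(raw, timedelta(minutes=1))
for record in buckets:
    print(strip_avg_prefix(record))
```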


## `plot_paths`

Plots a series of paths on a map using the pydeck library, where each path is a list of (latitude, longitude) tuples. The example below demonstrates.

In [None]:
# Fetch vessel data, averaged within a 30 seconds time bucket
vessel_data = fetch_vessel_data(
    vessel_id, 
    start_time, 
    end_time, 
    time_bucket="30 seconds",
    parameter_ids=["latitude","longitude","sog","fuelcons"]
)
df = transform_vessel_data_to_dataframe(vessel_data)
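Building one such path from the DataFrame's position columns could look like the sketch below. `rows_to_path` is a hypothetical helper, and rows with a missing position (NaN or absent) are skipped on the assumption that a path should not contain gaps.

```python
import math

# Column names as they appear in the DataFrames above
LAT = "positioningsystem_latitude_deg_1"
LON = "positioningsystem_longitude_deg_1"

def rows_to_path(rows):
    """Build a list of (latitude, longitude) tuples, skipping rows without a valid position."""
    path = []
    for row in rows:
        lat, lon = row.get(LAT), row.get(LON)
        if lat is None or lon is None or math.isnan(lat) or math.isnan(lon):
            continue
        path.append((lat, lon))
    return path

# Illustrative rows; with a DataFrame, df.to_dict("records") gives this shape
sample_rows = [
    {LAT: 59.3953, LON: 18.441317},
    {LAT: float("nan"), LON: 18.441317},
    {LAT: 59.395306, LON: 18.441317},
    {LON: 18.0},
]
path = rows_to_path(sample_rows)
print(path)  # [(59.3953, 18.441317), (59.395306, 18.441317)]
```

How `plot_paths` receives a series of such paths is defined in `utils.py` and not reproduced here. Note that deck.gl layers themselves use [longitude, latitude] order, so the (latitude, longitude) order is specific to this helper's input.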

## `get_trips_from_vessel_data`



[{'bucket': '2024-11-01T12:00:00+00:00',
  'vessel_id': 'mmsi_265558290',
  'parameter_id': 'enginemain_fuelcons_lph_1',
  'avg_time': '2024-11-01T12:04:59.5+00:00',
  'avg_value': 6.228666714333333}]