# Computation of DCPA and TCPA from AIS-Based Ship Pair Data

## Purpose of this Notebook

This notebook computes two fundamental maritime collision-risk metrics:
**Distance at Closest Point of Approach (DCPA)** and
**Time to Closest Point of Approach (TCPA)** for pairs of ships using AIS data.

The objective of this step is purely *kinematic*:
to enrich ship-pair AIS observations with standardized risk indicators
that describe how close two vessels are expected to pass each other if they
maintain their current speed and course.

This notebook does **not** perform behavioral analysis.
Instead, it produces an intermediate dataset that serves as the foundation
for subsequent interaction and behavior analysis.

---

## Input Data

The input file `classified_near_collisions.csv` contains time-synchronized AIS
observations for pairs of ships that have already been filtered based on spatial
proximity.

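Each row represents one timestamp for one ship pair and includes:
- MMSI of both ships
- Latitude and longitude of both ships (degrees)
- Speed over ground of both ships (knots)
- Course over ground of both ships (degrees)
- Distance between the ships (meters)
- Distance to the nearest port and region classification (harbor / open sea)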
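Because the computation reads eight columns by name, a missing or renamed column fails only deep inside the arithmetic. Checking up front gives a clearer error. This is a minimal sketch: the column names are the ones the computation cell reads, and `REQUIRED_COLUMNS` and `check_input_columns` are hypothetical names introduced here.

```python
import pandas as pd

# Hypothetical helper: columns read by the DCPA/TCPA computation cell
REQUIRED_COLUMNS = [
    "lat_1", "lon_1", "lat_2", "lon_2",
    "speed_1", "speed_2", "course_1", "course_2",
]


def check_input_columns(df: pd.DataFrame) -> None:
    """Raise a KeyError listing every required column missing from df."""
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        raise KeyError(f"Missing input columns: {missing}")


# Example: a one-row frame that has every required column passes silently
example = pd.DataFrame([{col: 0.0 for col in REQUIRED_COLUMNS}])
check_input_columns(example)
```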

The ship pairs and timestamps are derived from raw AIS messages.

---

## Methodology

The computation follows standard maritime kinematics:

1. Ship speeds are converted from knots to meters per second.
2. Courses are converted from degrees to radians.
3. Relative positions between ships are approximated in meters with a local
   flat-earth (equirectangular) conversion at the pair's mean latitude φ̄:
   110,540 m per degree of latitude and 111,320·cos(φ̄) m per degree of longitude.
4. Relative velocity vectors are computed from speed and course.
5. TCPA is calculated analytically as the time that minimizes the predicted
   separation under straight-line relative motion.
6. DCPA is computed as the Euclidean distance between the ships at that time.
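Steps 3 to 6 amount to the following equations, restated from the code cell with λ longitude and φ latitude in degrees, φ̄ the pair's mean latitude, χ course over ground measured clockwise from north, v speed in m/s, and ship 2 expressed relative to ship 1. TCPA is the time that minimizes the squared separation d²(t):

$$
\begin{aligned}
\Delta x &= (\lambda_2 - \lambda_1)\cdot 111320\cos\bar{\varphi}, \qquad \Delta y = (\varphi_2 - \varphi_1)\cdot 110540 \\
v_x &= v_2\sin\chi_2 - v_1\sin\chi_1, \qquad v_y = v_2\cos\chi_2 - v_1\cos\chi_1 \\
d^2(t) &= (\Delta x + v_x t)^2 + (\Delta y + v_y t)^2 \\
\frac{\mathrm{d}\,d^2}{\mathrm{d}t} &= 2v_x(\Delta x + v_x t) + 2v_y(\Delta y + v_y t) = 0
\;\Rightarrow\; t^{*} = -\frac{\Delta x\,v_x + \Delta y\,v_y}{v_x^2 + v_y^2} \\
\mathrm{TCPA} &= \max(0,\, t^{*}), \qquad \mathrm{DCPA} = d(\mathrm{TCPA})
\end{aligned}
$$

Since d²(t) is a quadratic with leading coefficient v_x² + v_y², t* is its minimum whenever that coefficient is positive; when it is (near) zero the separation is constant and the minimum is not unique.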

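Special cases are handled explicitly:
- Identical (or nearly identical) velocity vectors, i.e. a squared relative speed
  below 1e-6 m²/s², leave TCPA/DCPA undefined (NaN): the separation stays constant,
  so there is no unique time of closest approach. Parallel tracks at different
  speeds still have a well-defined CPA.
- Negative TCPA values (CPA in the past) are clipped to zero, so DCPA then equals
  the current separation.

All computations are vectorized with NumPy: each quantity is computed for every
row at once as an array operation instead of in a Python loop.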
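A scalar version of the same computation is useful for spot-checking single rows of the vectorized output. The sketch mirrors the notebook's conventions (flat-earth constants, the 1e-6 threshold, TCPA clipped at zero); `cpa_single_pair` is a hypothetical name, and it returns `(None, None)` where the vectorized code produces NaN.

```python
import math

KNOTS_TO_MS = 0.514444       # same factor as the notebook
M_PER_DEG_LAT = 110540.0     # same flat-earth constants as the notebook
M_PER_DEG_LON_EQ = 111320.0


def cpa_single_pair(lat1, lon1, sog1_kn, cog1_deg, lat2, lon2, sog2_kn, cog2_deg):
    """Return (dcpa_m, tcpa_s) for one ship pair, or (None, None) if undefined."""
    lat_mean = math.radians((lat1 + lat2) / 2)
    dx = (lon2 - lon1) * M_PER_DEG_LON_EQ * math.cos(lat_mean)  # east, m
    dy = (lat2 - lat1) * M_PER_DEG_LAT                           # north, m

    v1, v2 = sog1_kn * KNOTS_TO_MS, sog2_kn * KNOTS_TO_MS
    c1, c2 = math.radians(cog1_deg), math.radians(cog2_deg)
    vx = v2 * math.sin(c2) - v1 * math.sin(c1)  # relative velocity, east (m/s)
    vy = v2 * math.cos(c2) - v1 * math.cos(c1)  # relative velocity, north (m/s)

    denom = vx * vx + vy * vy
    if denom < 1e-6:  # same threshold as the vectorized cell
        return None, None

    tcpa = max(0.0, -(dx * vx + dy * vy) / denom)  # clip past CPAs to 0
    dcpa = math.hypot(dx + vx * tcpa, dy + vy * tcpa)
    return dcpa, tcpa


# Head-on encounter on the equator: ship 2 is 0.01 deg (~1105 m) north of ship 1
dcpa, tcpa = cpa_single_pair(0.0, 0.0, 10, 0, 0.01, 0.0, 10, 180)
print(f"DCPA = {dcpa:.1f} m, TCPA = {tcpa:.1f} s")
```

Here the ships close at about 10.3 m/s, so TCPA comes out at roughly 107 s with a DCPA of essentially zero.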

---

## Output

The result is a cleaned and enriched dataset saved as:

**`classified_ais_dcpa_tcpa.csv`**

This file is used as the direct input for subsequent interaction grouping
and behavioral analysis notebooks.


```python
# ----------------------------------------------------
# 1. Imports
# ----------------------------------------------------
import numpy as np
import pandas as pd

# ----------------------------------------------------
# 2. Load dataset
# ----------------------------------------------------
input_file = "classified_near_collisions.csv"

df = pd.read_csv(input_file)
print("Loaded (rows, columns):", df.shape)


# ----------------------------------------------------
# 3. Vectorized DCPA & TCPA calculation
# ----------------------------------------------------
KNOTS_TO_MS = 0.514444  # 1 kn = 1852 m / 3600 s

# Speeds over ground: knots -> m/s
v1 = df["speed_1"].to_numpy() * KNOTS_TO_MS
v2 = df["speed_2"].to_numpy() * KNOTS_TO_MS

# Courses over ground: degrees clockwise from north -> radians
c1 = np.deg2rad(df["course_1"].to_numpy())
c2 = np.deg2rad(df["course_2"].to_numpy())

# Degrees -> meters (local flat-earth approximation at the pair's mean latitude)
lat_mean = np.deg2rad((df["lat_1"].to_numpy() + df["lat_2"].to_numpy()) / 2)
m_per_deg_lat = 110540.0
m_per_deg_lon = 111320.0 * np.cos(lat_mean)

# Position of ship 2 relative to ship 1 (x = east, y = north), meters
dx = (df["lon_2"].to_numpy() - df["lon_1"].to_numpy()) * m_per_deg_lon
dy = (df["lat_2"].to_numpy() - df["lat_1"].to_numpy()) * m_per_deg_lat

# Velocity of ship 2 relative to ship 1, m/s
vx = v2 * np.sin(c2) - v1 * np.sin(c1)
vy = v2 * np.cos(c2) - v1 * np.cos(c1)

# Squared relative speed: denominator of the TCPA formula
denom = vx**2 + vy**2

# Zero denominators are handled explicitly below, so silence the
# divide-by-zero / invalid-value RuntimeWarnings they would raise here
with np.errstate(divide="ignore", invalid="ignore"):
    tcpa = -(dx * vx + dy * vy) / denom

# Near-zero relative velocity (same course and speed): no unique CPA
tcpa[denom < 1e-6] = np.nan

# Negative TCPA means the CPA is in the past: clip to 0 (NaN stays NaN)
tcpa = np.where(tcpa < 0, 0.0, tcpa)

# DCPA: separation at t = TCPA (NaN wherever TCPA is NaN)
dcpa = np.hypot(dx + vx * tcpa, dy + vy * tcpa)

# ----------------------------------------------------
# 4. Insert results into dataframe
# ----------------------------------------------------
df["DCPA_m"] = dcpa
df["TCPA_s"] = tcpa

print("Computed DCPA & TCPA!")


# ----------------------------------------------------
# 5. Save the updated dataset
# ----------------------------------------------------
output_file = "classified_ais_dcpa_tcpa.csv"
df.to_csv(output_file, index=False)

print("Saved to:", output_file)
df[["DCPA_m", "TCPA_s"]].head()
```

```text
Loaded (rows, columns): (677044, 15)
Computed DCPA & TCPA!
Saved to: classified_ais_dcpa_tcpa.csv
```

| | DCPA_m | TCPA_s |
|---|---|---|
| 0 | 340.833377 | 25.287202 |
| 1 | 349.946902 | 0.0 |
| 2 | 409.039756 | 0.0 |
| 3 | 362.466954 | 0.0 |
| 4 | 378.588652 | 658.583962 |
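The flat-earth conversion is only accurate over short separations. One way to gauge its error at ship-pair scale is to compare it with the haversine great-circle distance. This sketch assumes a spherical Earth of radius 6,371,008.8 m and uses two points about 1 km apart near the median position in the summary statistics (about 48.3° N, 4.47° W); the helper names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius (assumption for this check)


def flat_earth_distance_m(lat1, lon1, lat2, lon2):
    """Distance using the notebook's equirectangular constants."""
    lat_mean = math.radians((lat1 + lat2) / 2)
    dx = (lon2 - lon1) * 111320.0 * math.cos(lat_mean)
    dy = (lat2 - lat1) * 110540.0
    return math.hypot(dx, dy)


def haversine_distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance on a sphere of radius EARTH_RADIUS_M."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Two points roughly 1 km apart near 48.3 N, 4.47 W
p1, p2 = (48.30, -4.47), (48.306, -4.46)
flat = flat_earth_distance_m(*p1, *p2)
hav = haversine_distance_m(*p1, *p2)
print(f"flat-earth: {flat:.1f} m, haversine: {hav:.1f} m, "
      f"relative difference: {abs(flat - hav) / hav:.3%}")
```

At these distances the two agree to well under 1%, which is small relative to the DCPA values in the output above.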


# Clean

```python
df = pd.read_csv("classified_ais_dcpa_tcpa.csv")
```

```python
df.describe()
```

| | mmsi_1 | mmsi_2 | lat_1 | lon_1 | lat_2 | lon_2 | speed_1 | speed_2 | course_1 | course_2 | distance_m | port_distance_m | port_distance_km | DCPA_m | TCPA_s |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 677044.0 | 677044.0 | 677044.0 | 677044.0 | 677044.0 | 677044.0 | 677044.0 | 677044.0 | 677044.0 | 677044.0 | 677044.0 | 677044.0 | 677044.0 | 676878.0 | 676878.0 |
| mean | 231785900.0 | 231659300.0 | 48.304529 | -4.472769 | 48.304586 | -4.472691 | 7.220654 | 6.665915 | 181.967049 | 180.476434 | 762.392491 | 6014.303168 | 6.014303 | 609.196661 | 234.744217 |
| std | 35994000.0 | 35798040.0 | 0.101062 | 0.128291 | 0.101136 | 0.1284 | 12.843809 | 11.469102 | 100.448652 | 100.190646 | 532.248806 | 7782.718021 | 7.782718 | 512.233884 | 1479.682066 |
| min | 205067000.0 | 923166.0 | 45.784878 | -7.929798 | 45.783394 | -7.941065 | 0.6 | 0.6 | 0.0 | 0.0 | 0.778376 | 212.161424 | 0.212161 | 0.000988 | 0.0 |
| 25% | 227574000.0 | 227578500.0 | 48.31123 | -4.496654 | 48.31114 | -4.496688 | 3.3 | 3.1 | 93.4 | 92.6 | 307.325625 | 3579.799123 | 3.579799 | 187.731076 | 0.0 |
| 50% | 227631400.0 | 227632800.0 | 48.320473 | -4.46192 | 48.32058 | -4.462052 | 5.2 | 5.0 | 177.2 | 175.8 | 621.894427 | 4941.997566 | 4.941998 | 429.733631 | 0.0 |
| 75% | 227686500.0 | 227686500.0 | 48.3436 | -4.410367 | 48.34379 | -4.410393 | 7.3 | 7.2 | 271.7 | 270.5 | 1206.347858 | 6719.052142 | 6.719052 | 972.666628 | 182.265321 |
| max | 1000000000.0 | 1000000000.0 | 50.34189 | -1.677815 | 50.33638 | -1.678665 | 102.3 | 102.3 | 409.5 | 409.5 | 1851.998144 | 348985.15157 | 348.985152 | 1848.528077 | 508829.434399 |
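The summary statistics contain values that are not physical: `speed_1`/`speed_2` peak at 102.3 kn and `course_1`/`course_2` at 409.5°. In the AIS position-report encoding (ITU-R M.1371), SOG 102.3 kn (raw 1023) means "not available", and COG 360.0° (raw 3600) means "not available" while raw values above that should not be used, so DCPA/TCPA computed from such rows carry no kinematic meaning. A sketch of a mask that flags them, assuming they should be excluded or treated separately; `invalid_kinematics_mask` is a hypothetical helper.

```python
import pandas as pd

SOG_NOT_AVAILABLE_KN = 102.3  # AIS raw SOG 1023 = "not available"
COG_MAX_VALID_DEG = 360.0     # AIS COG >= 360.0 is "not available" or invalid


def invalid_kinematics_mask(df: pd.DataFrame) -> pd.Series:
    """True for rows where either ship's SOG or COG is an AIS sentinel/invalid value."""
    bad_speed = (df["speed_1"] >= SOG_NOT_AVAILABLE_KN) | (df["speed_2"] >= SOG_NOT_AVAILABLE_KN)
    bad_course = (df["course_1"] >= COG_MAX_VALID_DEG) | (df["course_2"] >= COG_MAX_VALID_DEG)
    return bad_speed | bad_course


# Example rows: valid, SOG sentinel, COG out of range
example = pd.DataFrame({
    "speed_1": [7.2, 102.3, 5.0],
    "speed_2": [6.6, 4.0, 5.0],
    "course_1": [180.0, 90.0, 409.5],
    "course_2": [175.0, 92.0, 30.0],
})
mask = invalid_kinematics_mask(example)
print(mask.tolist())
```

In the notebook, such a mask could be applied as `df[~invalid_kinematics_mask(df)]` before the DCPA/TCPA computation; whether to drop or keep these rows remains a downstream decision.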


```python
nan_df = df[df["DCPA_m"].isna()]
nan_df.head()
```

| | time_window | mmsi_1 | mmsi_2 | lat_1 | lon_1 | lat_2 | lon_2 | speed_1 | speed_2 | course_1 | course_2 | distance_m | port_distance_m | port_distance_km | region_type | DCPA_m | TCPA_s |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2524 | 2015-10-14 16:58:00 | 227005550 | 246497000 | 48.34384 | -4.565136 | 48.34401 | -4.564853 | 7.9 | 7.9 | 251.0 | 251.0 | 28.2608 | 5406.225049 | 5.406225 | open_sea | NaN | NaN |
| 2547 | 2015-10-14 21:48:00 | 256462000 | 227005550 | 48.340866 | -4.579097 | 48.34079 | -4.57921 | 9.9 | 9.9 | 243.0 | 243.0 | 11.899265 | 5706.137636 | 5.706138 | open_sea | NaN | NaN |
| 2841 | 2015-10-16 12:04:00 | 257739000 | 228051000 | 48.385666 | -4.456732 | 48.38632 | -4.455497 | 0.6 | 0.6 | 13.0 | 13.0 | 116.858669 | 2125.668338 | 2.125668 | harbor | NaN | NaN |
| 4314 | 2015-10-30 18:56:00 | 236175000 | 228051000 | 48.3725 | -4.462665 | 48.37185 | -4.463272 | 5.3 | 5.3 | 27.0 | 27.0 | 85.110984 | 2218.80753 | 2.218808 | harbor | NaN | NaN |
| 5327 | 2015-11-08 23:29:00 | 305886000 | 227730220 | 48.356 | -4.507332 | 48.35586 | -4.508205 | 6.7 | 6.7 | 255.0 | 255.0 | 66.545117 | 5724.950799 | 5.724951 | open_sea | NaN | NaN |
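All five rows shown have identical speed and course for both ships, which is exactly the zero-relative-velocity case. A sketch for confirming this holds for every missing row rather than just the first five; `same_velocity` is a hypothetical helper, and the small frame below stands in for `nan_df` using values from the table above.

```python
import pandas as pd


def same_velocity(rows: pd.DataFrame) -> pd.Series:
    """True where both ships report exactly the same speed and course."""
    return (rows["speed_1"] == rows["speed_2"]) & (rows["course_1"] == rows["course_2"])


# Stand-in for nan_df, using values from the first rows shown above
nan_rows = pd.DataFrame({
    "speed_1": [7.9, 9.9, 0.6],
    "speed_2": [7.9, 9.9, 0.6],
    "course_1": [251.0, 243.0, 13.0],
    "course_2": [251.0, 243.0, 13.0],
})
print("Missing rows explained by identical velocity:", same_velocity(nan_rows).all())
```

In the notebook, `same_velocity(nan_df).all()` would check all 166 missing rows (677,044 − 676,878); a `False` would point to velocities that differ slightly but still fall under the 1e-6 m²/s² threshold.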


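```python
# Optional cleaning step: drop rows whose DCPA/TCPA is undefined and
# overwrite the output file. Disabled by default; set the flag to True to run it.
DROP_MISSING_CPA = False

if DROP_MISSING_CPA:
    file_path = "classified_ais_dcpa_tcpa.csv"
    df = pd.read_csv(file_path)

    print("Before cleaning:", df.shape)

    # Drop rows where DCPA or TCPA is missing
    df_clean = df.dropna(subset=["DCPA_m", "TCPA_s"])

    print("After cleaning:", df_clean.shape)
    print("Rows removed:", df.shape[0] - df_clean.shape[0])

    # Save it back to the SAME file (overwrites the uncleaned version)
    df_clean.to_csv(file_path, index=False)

    print("Cleaned file saved to:", file_path)
```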


# Output Description: classified_ais_dcpa_tcpa.csv

This file contains ship-pair AIS observations enriched with kinematic
collision-risk metrics.

Each row represents one timestamp for a pair of ships.

---

## Identification & Time

- **time_window**  
  Timestamp of the AIS observation.

- **mmsi_1**  
  MMSI of the first ship.

- **mmsi_2**  
  MMSI of the second ship.

---

## Position Information

- **lat_1**, **lon_1**  
  Latitude and longitude of ship 1 (degrees).

- **lat_2**, **lon_2**  
  Latitude and longitude of ship 2 (degrees).

---

## Motion Information

- **speed_1**, **speed_2**  
  Speed over ground of ship 1 and ship 2 (knots).

- **course_1**, **course_2**  
  Course over ground of ship 1 and ship 2 (degrees).

---

## Distance & Context

- **distance_m**  
  Instantaneous distance between the two ships (meters).

- **port_distance_m**  
  Distance from the nearest port (meters).

- **port_distance_km**  
  Distance from the nearest port (kilometers).

- **region_type**  
  Navigational context (`harbor` or `open_sea`).

---

## Collision Risk Metrics

- **DCPA_m**  
  Distance at Closest Point of Approach (meters).  
  The minimum predicted distance between the ships, assuming both keep constant
  speed and course. When TCPA is 0, DCPA is the current separation (from the
  flat-earth approximation), so it should be close to `distance_m`.

- **TCPA_s**  
  Time to Closest Point of Approach (seconds).  
  A value of 0 indicates that the closest approach is occurring at the current
  timestamp or has already occurred. NaN when the CPA is undefined.

---

## Notes on Missing Values

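Rows with missing DCPA/TCPA are those where the relative velocity is near zero
(squared relative speed below 1e-6 m²/s²): both ships hold the same course at
the same speed, so their separation stays constant and no unique time of closest
approach exists. In this dataset, 166 rows are affected (677,044 rows, 676,878
non-missing DCPA/TCPA values).

Such rows can be removed or retained depending on downstream analysis needs.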
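Because two ships with identical velocity keep a constant separation, one option for retaining these rows is to fill DCPA with the current separation and leave TCPA as NaN. This is an alternative to the drop-based cleaning cell, not what the notebook does; `fill_constant_separation` is a hypothetical helper, and it assumes `distance_m` is an acceptable stand-in for the current separation.

```python
import numpy as np
import pandas as pd


def fill_constant_separation(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy where missing DCPA_m is replaced by distance_m (TCPA_s stays NaN)."""
    out = df.copy()
    missing = out["DCPA_m"].isna()
    out.loc[missing, "DCPA_m"] = out.loc[missing, "distance_m"]
    return out


# Example: one undefined-CPA row (values from the first missing row) and one normal row
example = pd.DataFrame({
    "distance_m": [28.2608, 500.0],
    "DCPA_m": [np.nan, 340.8],
    "TCPA_s": [np.nan, 25.3],
})
filled = fill_constant_separation(example)
print(filled)
```

Keeping TCPA as NaN preserves the information that no unique closest-approach time exists, while DCPA becomes usable for distance-based filtering.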
