# Run workflow

## CSV to pickle
```bash
python -m src.preprocessing.csv2pkl --input_dir data/raw/ --output_dir data/processed_pickle/
```


## Map_reduce
```bash
python -m src.preprocessing.map_reduce --input_dir data/processed_pickle/ --temp_dir data/TEMP_DIR --final_dir data/map_reduced/
```


## Train test split
```bash
python -m src.preprocessing.train_test_split --data_dir data/map_reduced/ --val_size 0.1 --test_size 0.1 --random_state 42
```
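
As a rough illustration of what `--val_size`, `--test_size`, and `--random_state` control, here is a minimal sketch of a reproducible fractional split. It assumes the split is made over a list of trip or vessel identifiers. `split_ids` and the `trip_<i>` names are hypothetical, and the actual `src.preprocessing.train_test_split` module may work differently.

```python
import random


def split_ids(ids, val_size=0.1, test_size=0.1, random_state=42):
    """Shuffle ids reproducibly and cut them into train/val/test lists."""
    ids = sorted(ids)  # fixed starting order, so the result does not depend on input order
    random.Random(random_state).shuffle(ids)  # private RNG seeded with random_state
    n_val = int(round(len(ids) * val_size))
    n_test = int(round(len(ids) * test_size))
    val = ids[:n_val]
    test = ids[n_val:n_val + n_test]
    train = ids[n_val + n_test:]
    return train, val, test


# Hypothetical identifiers, for illustration only.
train, val, test = split_ids([f"trip_{i}" for i in range(20)])
print(len(train), len(val), len(test))  # 16 2 2
```

With the same `random_state`, repeated runs give the same three lists, which is what makes later evaluation runs comparable.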


## Train model
```bash
python -m src.train.train_traj_V2 --config configs/test_alex.yaml
```


## Evaluate model

1) Evaluate all MMSIs in the split

Plots, CSVs, and metrics are written under `--out_dir/<MMSI>/...`.
```bash
python -m src.eval.eval_traj \
  --split_dir data/map_reduced/val \
  --ckpt data/checkpoints/traj_tptrans.pt \
  --model tptrans \
  --horizon 12 \
  --past_len 64 \
  --pred_cut 80 \
  --min_points 60 \
  --mmsi all \
  --out_dir data/figures/val_all_cut80 \
  --auto_extent \
  --extent_outlier_sigma 3.0 \
  --match_distance
```

2) Evaluate a subset of MMSIs
```bash
python -m src.eval.eval_traj \
  --split_dir data/map_reduced/val \
  --ckpt data/checkpoints/traj_tptrans.pt \
  --model tptrans \
  --horizon 12 \
  --past_len 64 \
  --pred_cut 75 \
  --min_points 60 \
  --mmsi 210046000,210174000 \
  --out_dir data/figures/val_subset_cut75 \
  --auto_extent \
  --match_distance
```


3) Evaluate a single specific trip
```bash
python -m src.eval.eval_traj \
  --split_dir data/map_reduced/val \
  --ckpt data/checkpoints/traj_tptrans.pt \
  --model tptrans \
  --horizon 12 \
  --past_len 64 \
  --pred_cut 75 \
  --min_points 30 \
  --mmsi 210046000 \
  --trip_id 0 \
  --out_dir data/figures/val_single_cut75 \
  --auto_extent \
  --verbose \
  --match_distance
```



The same all-MMSI run as a single line, with `--verbose` and without `--min_points`:
```bash
python -m src.eval.eval_traj --split_dir data/map_reduced/val --ckpt data/checkpoints/traj_tptrans.pt --model tptrans --horizon 12 --past_len 64 --pred_cut 80 --mmsi all --out_dir data/figures/val_all_cut80 --auto_extent --extent_outlier_sigma 3.0 --match_distance --verbose
```

To evaluate all MMSIs at a later cut, raise `--pred_cut` and point `--out_dir` at a new folder:
```bash
python -m src.eval.eval_traj \
  --split_dir data/map_reduced/val \
  --ckpt data/checkpoints/traj_tptrans.pt \
  --model tptrans \
  --horizon 12 \
  --past_len 64 \
  --pred_cut 90 \
  --min_points 60 \
  --mmsi all \
  --out_dir data/figures/val_all_cut90 \
  --auto_extent \
  --extent_outlier_sigma 3.0 \
  --match_distance
```


### What you'll see per MMSI folder

- `traj_tptrans_mmsi-<MMSI>_trip-<id>_cut-<pct>_idx-<i>.png`: the plot.
  - Blue: the first `pred_cut`% of points (the past).
  - Black dot: the last blue point.
  - Green: the full true future, from the cut to the end of the trip.
  - Red: the predicted future, aligned to the first N_future timestamps. If `--match_distance` is set, it is trimmed only if it is longer in km than the true future.
- `trip_<MMSI>_<id>_cut-<pct>_idx-<i>.csv`: the series that was plotted, i.e. all past points, the full true future, and the aligned predicted future, with timestamps.
- `metrics_<MMSI>.csv`: append-only metrics per trip, namely ADE (km), FDE (km), median AE (km), plus the point counts used.
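
For reference, the three error metrics can be sketched as follows. ADE is the mean point-wise error, FDE is the error at the last compared point, and the median AE is the median of the same point-wise errors. This sketch assumes errors are great-circle (haversine) distances between paired `(lat, lon)` points. `haversine_km` and `trajectory_errors` are hypothetical names, not functions from `src.eval.eval_traj`, which may compute distances differently.

```python
import math
from statistics import mean, median

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def trajectory_errors(true_pts, pred_pts):
    """Return ADE, FDE and median absolute error (km) over the paired points."""
    n = min(len(true_pts), len(pred_pts))  # compare only the points both series share
    if n == 0:
        raise ValueError("need at least one paired point")
    errs = [haversine_km(*true_pts[i], *pred_pts[i]) for i in range(n)]
    return {
        "ade_km": mean(errs),        # average displacement error
        "fde_km": errs[-1],          # final displacement error
        "median_ae_km": median(errs),
        "n_points": n,               # count of points used
    }


# Made-up coordinates, for illustration only.
true_future = [(55.00, 10.00), (55.01, 10.00), (55.02, 10.00)]
predicted = [(55.00, 10.00), (55.01, 10.01), (55.02, 10.03)]
print(trajectory_errors(true_future, predicted))
```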

### A few notes so results are consistent

- `--horizon` (12) must match the checkpoint's prediction head.
- `--past_len` (64) should be ≤ your training window.
- `--pred_cut` is a percentage of points, not of distance or time. The plot always shows the full green tail; metrics are computed on the first N_future points (or `--cap_future`, if set), and the red track is aligned to those timestamps.
- `--match_distance` trims the red track only if it is longer in km than the green one. If red is shorter, it is left as is (this prevents collapse).
- `--auto_extent` zooms to the data but stays within Denmark (DK); omit it to lock the map to [6E–16E, 54N–58N].
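
The `pred_cut` and `match_distance` behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not the code in `src.eval.eval_traj`: `split_at_cut`, `path_length_km`, and `trim_to_distance` are hypothetical names, points are `(lat, lon)` tuples in degrees, and path length is taken as summed haversine distance.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def split_at_cut(points, pred_cut):
    """Split a trip at pred_cut percent of its points (not of its distance or duration)."""
    if len(points) < 2:
        raise ValueError("need at least two points to split")
    cut_idx = int(len(points) * pred_cut / 100)
    cut_idx = max(1, min(cut_idx, len(points) - 1))  # keep at least one point on each side
    return points[:cut_idx], points[cut_idx:]


def path_length_km(points):
    """Total length of a track as the sum of its segment distances."""
    return sum(haversine_km(*a, *b) for a, b in zip(points, points[1:]))


def trim_to_distance(pred, true_future):
    """Trim pred only when its path is longer (km) than the true future; never extend it."""
    limit = path_length_km(true_future)
    if path_length_km(pred) <= limit:
        return list(pred)  # shorter or equal prediction is left untouched
    kept, travelled = [pred[0]], 0.0
    for a, b in zip(pred, pred[1:]):
        travelled += haversine_km(*a, *b)
        if travelled > limit:
            break  # the next point would overshoot the true distance
        kept.append(b)
    return kept


# Made-up track of 100 points heading north, for illustration only.
trip = [(55.0 + 0.001 * i, 10.0) for i in range(100)]
past, future = split_at_cut(trip, pred_cut=80)
print(len(past), len(future))  # 80 20
```

Trimming by cumulative distance rather than by point count keeps the comparison fair when the predicted and true tracks have different spacing between points.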