# Run preprocessing


## CSV to pickle
```bash
python -m src.preprocessing.csv2pkl --input_dir data/raw/ --output_dir data/processed_pickle/
```


## Map-reduce
```bash
python -m src.preprocessing.map_reduce --input_dir data/processed_pickle/ --temp_dir data/TEMP_DIR --final_dir data/map_reduced/
```


## Train test split
```bash
python -m src.preprocessing.train_test_split --data_dir data/map_reduced/ --val_size 0.1 --test_size 0.1 --random_state 42
```
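
How the two fractions translate into split sizes can be sketched as follows. This is a hypothetical illustration, not the code in `src.preprocessing.train_test_split`: the function name `split_items` and the rounding rule are assumptions, and the real script may split by vessel or by trip rather than by file.

```python
import random

def split_items(items, val_size=0.1, test_size=0.1, random_state=42):
    """Shuffle items reproducibly and cut them into train/val/test lists."""
    items = sorted(items)                # fixed starting order before shuffling
    rng = random.Random(random_state)    # local RNG, leaves global state alone
    rng.shuffle(items)
    n_val = round(len(items) * val_size)
    n_test = round(len(items) * test_size)
    val = items[:n_val]
    test = items[n_val:n_val + n_test]
    train = items[n_val + n_test:]
    return train, val, test

# Hypothetical file names, only to show the resulting sizes.
train, val, test = split_items([f"trip_{i}.pkl" for i in range(100)])
print(len(train), len(val), len(test))  # 80 10 10
```

Reusing the same `random_state` gives the same split on every run, which is why the command above pins it to 42.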


## Train Model
```bash
python -m src.train.train_traj --config configs/test_alex.yaml
```

# Evaluate trajectory (`eval_traj_newnewnew_copy`)
- Auto-zoom on the actual trajectory (recommended):
```bash
python -m src.eval.eval_traj_newnewnew_copy \
  --split_dir data/map_reduced/val \
  --ckpt data/checkpoints/traj_tptrans.pt \
  --model tptrans \
  --lat_idx 0 --lon_idx 1 \
  --y_order latlon \
  --past_len 64 --max_plots 8 \
  --out_dir data/figures \
  --auto_extent --extent_source actual --extent_outlier_sigma 3.0 \
  --denorm --lat_min 54 --lat_max 58 --lon_min 6 --lon_max 16
```

- Full Europe view:
```bash
python -m src.eval.eval_traj_newnewnew_copy \
  --split_dir data/map_reduced/val \
  --ckpt data/checkpoints/traj_tptrans.pt \
  --model tptrans \
  --lat_idx 0 --lon_idx 1 \
  --y_order latlon \
  --past_len 64 --max_plots 8 \
  --out_dir data/figures \
  --denorm --lat_min 54 --lat_max 58 --lon_min 6 --lon_max 16
```
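
The `--auto_extent`, `--extent_source actual` and `--extent_outlier_sigma 3.0` flags in the first command can be read as: take the actual points, drop outliers beyond 3 standard deviations, and zoom to what remains. A minimal sketch of that idea follows. The function name `auto_extent` and the padding fraction `pad` are assumptions made for illustration, and the script's real implementation may differ.

```python
from statistics import mean, pstdev

def auto_extent(lats, lons, sigma=3.0, pad=0.1):
    """Return (min_lon, max_lon, min_lat, max_lat) around the inlier points."""
    def inliers(values):
        mu, sd = mean(values), pstdev(values)
        if sd == 0:                      # all points identical, nothing to clip
            return list(values)
        kept = [v for v in values if abs(v - mu) <= sigma * sd]
        return kept or list(values)      # never return an empty selection

    lat_in, lon_in = inliers(lats), inliers(lons)
    # Pad by a fraction of the span; fall back to a small fixed margin
    # so a stationary vessel still gets a non-empty extent.
    lat_pad = (max(lat_in) - min(lat_in)) * pad or 0.01
    lon_pad = (max(lon_in) - min(lon_in)) * pad or 0.01
    return (min(lon_in) - lon_pad, max(lon_in) + lon_pad,
            min(lat_in) - lat_pad, max(lat_in) + lat_pad)

# Made-up track with one glitch at latitude 80 that should not stretch the map.
lats = [55.0, 55.1, 55.2, 55.3, 55.4] * 4 + [80.0]
lons = [10.0 + 0.1 * i for i in range(21)]
print(auto_extent(lats, lons))
```

The result is ordered min lon, max lon, min lat, max lat.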


# Evaluate trajectory (`eval_traj_newnewnew_copy2`)
Both examples assume the Denmark dataset: lat/lon are normalized over [54, 58] and [6, 16] degrees, and Y is ordered [lat, lon].

- Auto-zoom with stamped ID and metadata:
```bash
python -m src.eval.eval_traj_newnewnew_copy2 \
  --split_dir data/map_reduced/val \
  --ckpt data/checkpoints/traj_tptrans.pt \
  --model tptrans \
  --lat_idx 0 --lon_idx 1 \
  --y_order latlon \
  --past_len 64 --max_plots 8 \
  --out_dir data/figures \
  --auto_extent --extent_source actual --extent_outlier_sigma 3.0 \
  --denorm --lat_min 54 --lat_max 58 --lon_min 6 --lon_max 16 \
  --annotate_id --full_trip
```

- Europe view with stamped filename only:
```bash
python -m src.eval.eval_traj_newnewnew_copy2 \
  --split_dir data/map_reduced/val \
  --ckpt data/checkpoints/traj_tptrans.pt \
  --model tptrans \
  --lat_idx 0 --lon_idx 1 \
  --y_order latlon \
  --past_len 64 --max_plots 8 \
  --out_dir data/figures \
  --denorm --lat_min 54 --lat_max 58 --lon_min 6 --lon_max 16 \
  --no_stamp_titles --stamp_filename
```



# Evaluate trajectory (`eval_traj_newnewnew`)
Trajectory evaluation and plotting. Note: the past is a window of length `--past_len`, not the full MMSI trip.

```text
options:
  -h, --help            show this help message and exit
  --split_dir SPLIT_DIR
  --ckpt CKPT
  --model {gru,tptrans}
  --batch_size BATCH_SIZE
  --max_plots MAX_PLOTS
  --lat_idx LAT_IDX
  --lon_idx LON_IDX
  --past_len PAST_LEN
  --out_dir OUT_DIR
  --basemap_path BASEMAP_PATH
                        Optional path to a Natural Earth land file (shp/geojson). Defaults to bundled fixtures if available.
  --map_extent MIN_LON MAX_LON MIN_LAT MAX_LAT
                        Fix the map extent; by default the Europe view (-25 45 30 72) is used.
  --auto_extent         If set, zoom to each trajectory with padding instead of using the fixed Europe extent.
  --denorm              If set, convert normalized [0..1] lat/lon back to geographic degrees for plotting.
  --lat_min LAT_MIN     Lat min bound used during normalization (if denorm).
  --lat_max LAT_MAX     Lat max bound used during normalization (if denorm).
  --lon_min LON_MIN     Lon min bound used during normalization (if denorm).
  --lon_max LON_MAX     Lon max bound used during normalization (if denorm).
  --y_order {latlon,lonlat}
                        Column order of Y/YP tensors. Use 'latlon' if Y[:,0]=lat, Y[:,1]=lon.
  --extent_source {both,actual,pred}
                        Which points control auto-zoom extent (default: actual).
  --extent_outlier_sigma EXTENT_OUTLIER_SIGMA
                        Sigma for outlier clipping when computing auto-extent.
  --pred_is_delta       Set if model outputs per-step deltas instead of absolute coords.
  --anchor_pred         Anchor absolute predictions to current position if first point is far.
  --no_anchor_pred
  --mmsi MMSI           Select MMSI: omit for default windows; 'all' for batch; or numeric ID
  --trip_id TRIP_ID     Trip index when --mmsi is numeric
  --pred_cut PRED_CUT   % of trip to treat as past before predicting tail
  --cap_future CAP_FUTURE
                        Cap predicted horizon steps
  --min_points MIN_POINTS
                        Skip too-short trips
  --output_per_mmsi_subdir
                        Save outputs in per-MMSI subfolders under out_dir
  --list_only           Dry-run; list selected files and exit
  --log_skip_reasons    Print skip reason for each trip
  --seed SEED           RNG seed for reproducibility of selection
  --stamp_titles
  --no_stamp_titles
  --stamp_filename
  --no_stamp_filename
  --save_meta
  --no_save_meta
  --meta_path META_PATH
  --timefmt TIMEFMT     Time format for titles/meta (use strftime tokens like %Y-%m-%d %H:%M:%S UTC)
  --annotate_id         Draw MMSI + time span near current point.
  --full_trip           Overlay full trip context for the sample's source file.
  --mmsi_filter MMSI_FILTER
                        When set, only overlay context if MMSI matches.
  --max_hours MAX_HOURS
                        Max hours for context (currently advisory).
```
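
What `--denorm` does with the four bounds can be sketched as a plain min-max inversion. The help text only says that normalized [0..1] values are converted back to degrees, so the exact formula here is an assumption, and `denormalize` is a hypothetical helper, not a function from this repository.

```python
def denormalize(value, lo, hi):
    """Map a normalized value in [0, 1] back to the range [lo, hi]."""
    return lo + value * (hi - lo)

# Bounds taken from the Denmark examples in this README.
LAT_MIN, LAT_MAX = 54.0, 58.0
LON_MIN, LON_MAX = 6.0, 16.0

lat = denormalize(0.5, LAT_MIN, LAT_MAX)
lon = denormalize(0.25, LON_MIN, LON_MAX)
print(lat, lon)  # 56.0 8.5
```

The bounds passed on the command line must match the ones used during preprocessing. Otherwise the plotted tracks end up shifted or stretched.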

- Default windows with full-trip overlay. Window selection is unchanged; `--full_trip` overlays the trip context, and `--annotate_id` (optional) labels the current point:
```bash
python -m src.eval.eval_traj_newnewnew \
  --split_dir data/map_reduced/val \
  --ckpt data/checkpoints/traj_tptrans.pt \
  --model tptrans \
  --max_plots 8 \
  --out_dir data/figures \
  --auto_extent --extent_source actual --extent_outlier_sigma 3.0 \
  --denorm --lat_min 54 --lat_max 58 --lon_min 6 --lon_max 16 \
  --full_trip
```

- Dry run (lists the selected files; no checkpoint needed):
```bash
python -m src.eval.eval_traj_newnewnew \
  --split_dir data/map_reduced/val \
  --max_plots 10 \
  --list_only --seed 0
```


- Single trip, predict last 10% (anchored):
```bash
python -m src.eval.eval_traj_newnewnew \
  --split_dir data/map_reduced/val \
  --ckpt data/checkpoints/traj_tptrans.pt \
  --model tptrans \
  --full_trip \
  --mmsi 209867000 --trip_id 0 \
  --pred_cut 90 --cap_future 60 \
  --auto_extent --extent_source actual \
  --lat_idx 0 --lon_idx 1 --y_order latlon \
  --denorm --lat_min 54 --lat_max 58 --lon_min 6 --lon_max 16 \
  --annotate_id --log_skip_reasons
```
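
The interplay of `--pred_cut 90` and `--cap_future 60` in the command above can be sketched like this: the first 90% of the trip is treated as past, the remaining tail is the prediction target, and the tail is truncated to at most 60 steps. `split_trip` is a hypothetical helper written for illustration, and the script may round or clamp differently.

```python
def split_trip(points, pred_cut=90.0, cap_future=None):
    """Split a trip (at least 2 points) into (past, future) at pred_cut percent."""
    cut = int(len(points) * pred_cut / 100.0)
    cut = max(1, min(cut, len(points) - 1))   # keep at least one point on each side
    past, future = points[:cut], points[cut:]
    if cap_future is not None:
        future = future[:cap_future]          # limit the predicted horizon
    return past, future

trip = list(range(1000))                      # stand-in for 1000 AIS positions
past, future = split_trip(trip, pred_cut=90, cap_future=60)
print(len(past), len(future))  # 900 60
```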

- Batch all, per-MMSI subfolders:
```bash
python -m src.eval.eval_traj_newnewnew \
  --split_dir data/map_reduced/val \
  --ckpt data/checkpoints/traj_tptrans.pt \
  --model tptrans \
  --lat_idx 0 --lon_idx 1 --y_order latlon \
  --full_trip --mmsi all --pred_cut 85 \
  --auto_extent --extent_source actual \
  --out_dir data/figures \
  --output_per_mmsi_subdir \
  --save_meta --meta_path data/figures/traj_eval_meta.csv \
  --log_skip_reasons --seed 42
```
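
The `--pred_is_delta` and `--anchor_pred` options from the help text above deal with two output conventions: per-step deltas, which must be accumulated from the current position, and absolute coordinates, which may need shifting if they start far from it. A minimal sketch of both follows. The function names and the `max_jump` threshold are assumptions for illustration, and the script's own logic may differ.

```python
from itertools import accumulate

def deltas_to_absolute(current, deltas):
    """Cumulatively add per-step (dlat, dlon) deltas to the current position."""
    lats = accumulate((d[0] for d in deltas), initial=current[0])
    lons = accumulate((d[1] for d in deltas), initial=current[1])
    return list(zip(lats, lons))[1:]          # drop the starting point itself

def anchor_absolute(current, preds, max_jump=0.05):
    """Shift absolute predictions to start at `current` if the first one is far away."""
    if not preds:
        return []
    off_lat = preds[0][0] - current[0]
    off_lon = preds[0][1] - current[1]
    if max(abs(off_lat), abs(off_lon)) <= max_jump:
        return list(preds)                    # already close enough, leave untouched
    return [(lat - off_lat, lon - off_lon) for lat, lon in preds]

print(deltas_to_absolute((55.0, 10.0), [(0.5, 0.25), (0.5, 0.25)]))
# [(55.5, 10.25), (56.0, 10.5)]
```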

# old


# Evaluate on Map-Reduce split
```bash
python -m src.eval.evaluate_traj_new \
  --split_dir data/map_reduced/val \
  --ckpt data/checkpoints/traj_tptrans.pt \
  --model tptrans \
  --plot
```

```bash
python -m src.eval.evaluate_traj_new \
  --split_dir data/map_reduced/test \
  --ckpt data/checkpoints/traj_tptrans.pt \
  --model tptrans
```

# Validate new new
```bash
# Adjust --lat_idx/--lon_idx if your feature order differs.
python -m src.eval.eval_traj_newnew \
  --split_dir data/map_reduced/val \
  --ckpt data/checkpoints/traj_tptrans.pt \
  --model tptrans \
  --lat_idx 1 --lon_idx 0 \
  --max_plots 6 \
  --out_dir data/figures
```

Same command without `--max_plots`:
```bash
# Adjust --lat_idx/--lon_idx if your feature order differs.
python -m src.eval.eval_traj_newnew \
  --split_dir data/map_reduced/val \
  --ckpt data/checkpoints/traj_tptrans.pt \
  --model tptrans \
  --lat_idx 1 --lon_idx 0 \
  --out_dir data/figures
```





PowerShell (backtick line continuation, Windows paths):
```powershell
python -m src.eval.eval_traj_newnew `
  --split_dir data\map_reduced\val `
  --ckpt data\checkpoints\traj_tptrans.pt `
  --model tptrans `
  --lat_idx 1 `
  --lon_idx 0 `
  --max_plots 8 `
  --out_dir data\figures
```

Bash, with quoted paths:
```bash
python -m src.eval.eval_traj_newnew \
  --split_dir "data/map_reduced/val" \
  --ckpt "data/checkpoints/traj_tptrans.pt" \
  --model tptrans \
  --lat_idx 1 \
  --lon_idx 0 \
  --max_plots 8 \
  --out_dir "data/figures"
```



Same options with `eval_traj_abs`:
```bash
python -m src.eval.eval_traj_abs \
  --split_dir data/map_reduced/val \
  --ckpt data/checkpoints/traj_tptrans.pt \
  --model tptrans \
  --lat_idx 1 --lon_idx 0 \
  --max_plots 8 \
  --out_dir data/figures
```








# Unsupervised pretraining (MSP)

1) Unsupervised pretraining

```bash
python -m src.train.pretrain_msp_new \
  --split_dir data/map_reduced/train \
  --out data/checkpoints/pretrain_msp.pt \
  --epochs 5
```

2) Supervised fine-tuning

In `configs/test.yaml`, add the line `warm_start_msp: data/checkpoints/pretrain_msp.pt`, then run:

```bash
python -m src.train.train_traj_unsup --config configs/test_unsub.yaml
```





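# Notebook checks

Check for the scaler file in the validation split:

```python
import pathlib
import numpy as np

p = pathlib.Path("data/map_reduced/val")
scaler_path = p / "scaler.npz"
print(scaler_path.exists())
if scaler_path.exists():  # np.load raises FileNotFoundError otherwise
    S = np.load(scaler_path)
    print(S["mean"].shape, S["std"].shape)
```

Output:

```text
False
```

Without the existence check, `np.load` failed with `FileNotFoundError: [Errno 2] No such file or directory: 'data\\map_reduced\\val\\scaler.npz'`.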

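Change to the project directory so that the relative `data/` paths resolve:

```python
import os

os.chdir(r"D:\DTU\AIS-MDA")
print("cwd:", os.getcwd())
```

Output:

```text
cwd: D:\DTU\AIS-MDA
```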


Load one sample from the validation split and inspect the target:

```python
from src.utils.datasets import AISDataset

ds = AISDataset("data/map_reduced/val", max_seqlen=96)
x, y = ds[0]
print(x.shape, y.shape)  # e.g. [T,F], [H,2]
print(y[:3])             # should look like small step deltas, not a long ramp
```

Output:

```text
Filtering trajectories...
Valid samples: 77/87
torch.Size([64, 4]) torch.Size([12, 2])
tensor([[0.0905, 0.4064],
        [0.0920, 0.4077],
        [0.0924, 0.4092]])
```


Read one processed pickle directly:

```python
import pickle

with open("data/map_reduced/val/211286440_0_processed.pkl", "rb") as f:
    d = pickle.load(f)
x, y = d["traj"]["x"], d["traj"]["y"]   # or however you stored it
print(x.shape, y.shape)                 # expect [T,F], [H,2]
print("y first 5:", y[:5])
```

This fails with:

```text
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
```

The message comes from NumPy rejecting a string index, so either `d` or `d["traj"]` is an array rather than a dict.
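
Before indexing into a processed pickle, it can help to print its structure first. The sketch below is a hypothetical helper (`describe` and `describe_pickle` are not part of this repository) that walks nested dicts and reports the type and, where available, the `.shape` of each entry. It makes no assumption about how the trajectory is actually stored.

```python
import pickle

def describe(obj, name="obj", depth=0, max_depth=3):
    """Return lines describing the nested structure of a loaded object."""
    pad = "  " * depth
    shape = getattr(obj, "shape", None)   # NumPy arrays and torch tensors expose .shape
    if shape is not None:
        return [f"{pad}{name}: {type(obj).__name__} shape={tuple(shape)}"]
    if isinstance(obj, dict) and depth < max_depth:
        lines = [f"{pad}{name}: dict with {len(obj)} keys"]
        for key, value in obj.items():
            lines += describe(value, repr(key), depth + 1, max_depth)
        return lines
    if isinstance(obj, (list, tuple)):
        return [f"{pad}{name}: {type(obj).__name__} len={len(obj)}"]
    return [f"{pad}{name}: {type(obj).__name__}"]

def describe_pickle(path):
    """Load a pickle file and describe its contents."""
    with open(path, "rb") as fh:          # close the file handle after loading
        return describe(pickle.load(fh), name=str(path))
```

For example, `print("\n".join(describe_pickle("data/map_reduced/val/211286440_0_processed.pkl")))` shows whether `traj` is a dict of arrays or a single array whose columns need slicing.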