# Notebook 0: Preprocessing Trajectory Dataset

This notebook preprocesses raw GPS trajectory data into a collection of trips, which are later used to infer Origin-Destination (OD) matrices. It is based on the public dataset available [here](https://ckan-sobigdata.d4science.org/dataset/gps_track_milan_italy), but it can be adapted to any vehicular trace dataset.

In [None]:
import pandas as pd
import geopandas as gpd
from utils_trip_preprocessing import create_trips_from_gps

### Pre-processing parameters

In [None]:
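max_speed_kmh = 250            # maximum plausible speed, in km/h
spatial_radius_km_stops = 0.2  # spatial radius used to detect a stop, in km
minutes_for_a_stop = 20        # minimum duration of a stop, in minutes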
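
To make the role of `max_speed_kmh` concrete, here is a minimal sketch of a speed-based noise filter. It is not the implementation inside `create_trips_from_gps`, whose internals are not shown in this notebook. The helper names (`haversine_km`, `flag_speed_outliers`), the `lat`/`lon` column names, and the toy coordinates are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def flag_speed_outliers(df, max_speed_kmh):
    """Flag fixes whose speed from the previous fix exceeds max_speed_kmh.

    Hypothetical helper: assumes one vehicle and columns
    'datetime', 'lat', 'lon' (names chosen for this sketch).
    """
    df = df.sort_values("datetime")
    dist_km = haversine_km(df["lat"].shift(), df["lon"].shift(),
                           df["lat"], df["lon"])
    hours = df["datetime"].diff().dt.total_seconds() / 3600.0
    speed_kmh = dist_km / hours
    # The first fix has no predecessor (NaN speed) and is never flagged.
    return speed_kmh > max_speed_kmh


# Toy data: the third fix jumps about 55 km in one minute.
toy = pd.DataFrame({
    "datetime": pd.to_datetime(["2020-01-01 08:00:00",
                                "2020-01-01 08:01:00",
                                "2020-01-01 08:02:00"]),
    "lat": [45.4642, 45.4652, 45.9642],
    "lon": [9.1900, 9.1900, 9.1900],
})
flags = flag_speed_outliers(toy, max_speed_kmh=250)
clean = toy[~flags]
```

With the threshold of 250 km/h used above, only the third toy fix is flagged, so `clean` keeps the first two rows.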

In [None]:
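# Load the raw GPS traces (zipped CSV)
df_raw_dataset = pd.read_csv("../gps_data/MilanoData.csv.zip", compression="zip")

# Load the city shape (bounding box of the Milan road network)
city_shape = gpd.read_file("../data/bbox_cities/bbox_road_network_milan.geojson")
# Projected copy (EPSG:32633, UTM zone 33N, units in metres); not used further in this notebook
city_shape_proj = city_shape.to_crs('epsg:32633')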

### Pre-process the dataset and segment trajectories into trips

In [None]:
%%time

preprocessed_trips = create_trips_from_gps(
    df_raw_dataset,
    city_shape,
    "datetime",
    max_speed_kmh=max_speed_kmh,
    spatial_radius_km_stops=spatial_radius_km_stops,
    minutes_for_a_stop=minutes_for_a_stop,
)
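
As an illustration of how `spatial_radius_km_stops` and `minutes_for_a_stop` can drive trip segmentation, the sketch below implements a simple stay-point detection: a stop is a run of consecutive fixes that stay within the radius of an anchor fix for at least the minimum duration, and trips are the segments between stops. This is an assumed, simplified stand-in and may differ from what `create_trips_from_gps` actually does. The function names and the toy trajectory are hypothetical.

```python
import math
from datetime import datetime, timedelta


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi, dlmb = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0088 * math.asin(math.sqrt(a))


def detect_stops(points, radius_km, min_minutes):
    """Return inclusive (start, end) index pairs of detected stops.

    `points` is a time-sorted list of (timestamp, lat, lon) tuples.
    """
    stops, i, n = [], 0, len(points)
    while i < n:
        j = i + 1
        # Extend the run while fixes stay within radius_km of the anchor fix i.
        while j < n and haversine_km(points[i][1], points[i][2],
                                     points[j][1], points[j][2]) <= radius_km:
            j += 1
        if points[j - 1][0] - points[i][0] >= timedelta(minutes=min_minutes):
            stops.append((i, j - 1))
            i = j          # continue after the stop
        else:
            i += 1         # no stop anchored here, try the next fix
    return stops


def split_into_trips(points, radius_km, min_minutes):
    """Cut the trajectory at each stop; a trip ends where a stop begins."""
    trips, start = [], 0
    for s, e in detect_stops(points, radius_km, min_minutes):
        if s > start:
            trips.append(points[start:s + 1])
        start = e
    if start < len(points) - 1:
        trips.append(points[start:])
    return trips


# Toy trajectory: drive, park for 25 minutes (fixes 2 to 4), drive again.
t0 = datetime(2020, 1, 1, 8, 0)
sample = [
    (t0, 45.4642, 9.1900),
    (t0 + timedelta(minutes=5), 45.4700, 9.2000),
    (t0 + timedelta(minutes=10), 45.4800, 9.2100),
    (t0 + timedelta(minutes=20), 45.4801, 9.2101),
    (t0 + timedelta(minutes=35), 45.4800, 9.2100),
    (t0 + timedelta(minutes=40), 45.4900, 9.2200),
    (t0 + timedelta(minutes=45), 45.5000, 9.2300),
]
stops = detect_stops(sample, radius_km=0.2, min_minutes=20)
trips = split_into_trips(sample, radius_km=0.2, min_minutes=20)
```

On the toy trajectory, the 25-minute pause is detected as a single stop and the trajectory is split into two trips of three fixes each. Anchoring the radius at the first fix of the run keeps the sketch short; a centroid-based variant is a common alternative.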

### Save the pre-processed trips to disk

In [None]:
preprocessed_trips.to_csv("../gps_data/preprocessed_trips_milan.csv.zip", compression="zip", index=False)
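
When the saved file is loaded again in a later notebook, timestamp columns come back as plain strings unless they are parsed explicitly. The sketch below shows the round trip on a toy DataFrame written to a temporary directory. The column names and values are made up for illustration and do not reflect the actual schema of `preprocessed_trips`.

```python
import os
import tempfile

import pandas as pd

# Toy stand-in for the pre-processed trips (hypothetical columns).
toy_trips = pd.DataFrame({
    "uid": [1, 1],
    "datetime": pd.to_datetime(["2020-01-01 08:00:00", "2020-01-01 08:05:00"]),
    "lat": [45.4642, 45.4700],
    "lon": [9.1900, 9.2000],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_trips.csv.zip")
    # Same write call as above, on the toy data.
    toy_trips.to_csv(path, compression="zip", index=False)
    # parse_dates restores the timestamp dtype lost in the CSV round trip.
    restored = pd.read_csv(path, compression="zip", parse_dates=["datetime"])
```

Without `parse_dates`, the `datetime` column would be read back as text and would need a separate `pd.to_datetime` call.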