# Notebook 1: Create Origin-Destination (OD) Matrix

The goal of this notebook is to compute an origin-destination (OD) matrix.
For each city we compute an OD matrix $M$, where an element $m_{o, d} \in M$ denotes the number of trips that start in tile $o$ and end in tile $d$. The origin and destination of a trip are the tiles that contain its first and last GPS points.

To compute the OD matrix, we divide the urban environment into square tiles of a given side length.
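
As a rough illustration of the tiling idea, the sketch below maps a point to the index of the square tile that contains it. It assumes planar coordinates in meters and a row-major numbering of tiles; the helper `tile_index` and its parameters are hypothetical and are not part of scikit-mobility, whose tiler works on geographic shapes and may number tiles differently.

```python
import math

def tile_index(x, y, x_min, y_min, side, n_cols):
    """Return the row-major index of the square tile containing (x, y).

    Hypothetical helper: assumes planar coordinates in meters and a grid
    whose lower-left corner is (x_min, y_min).
    """
    col = math.floor((x - x_min) / side)
    row = math.floor((y - y_min) / side)
    return row * n_cols + col

# Toy 3 km x 3 km area split into 1 km tiles (9 tiles, indices 0..8).
origin_tile = tile_index(250.0, 400.0, x_min=0.0, y_min=0.0, side=1000.0, n_cols=3)
destination_tile = tile_index(2100.0, 1500.0, x_min=0.0, y_min=0.0, side=1000.0, n_cols=3)
print(origin_tile, destination_tile)
```

A trip with these two endpoints would add one unit to the entry of $M$ indexed by `origin_tile` and `destination_tile`.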


The OctoTelematics dataset used in our study is proprietary and not publicly available, so the original OD matrices cannot be included in this repository. This notebook therefore provides two ways to obtain an OD matrix. The first is code that generates an OD matrix for Milan from a publicly accessible dataset; it can be adapted to any other source of trajectory data. The second is a routine that creates random OD matrices, which is useful when no trajectory data is available.

In [None]:
import warnings
warnings.filterwarnings("ignore")

import pandas as pd
import skmob
import geopandas as gpd

import random

from skmob.tessellation import tilers
from skmob.utils.plot import plot_gdf
import numpy as np

In [None]:
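def compute_od_matrix_date_fdf(t_flows, tessellation, normalize=False):
    """Build a dense OD matrix from a flow table with origin, destination and flow columns."""
    # one row and one column per tile of the tessellation
    matrix_flows_m = np.zeros((len(tessellation), len(tessellation)))

    # tile IDs are cast to int and used directly as matrix indices
    for o, d, flow in zip(t_flows["origin"], t_flows["destination"], t_flows["flow"]):
        matrix_flows_m[int(o), int(d)] += flow

    if normalize:
        # rescale so that all entries sum to 1
        matrix_flows_m = matrix_flows_m / matrix_flows_m.sum()

    return matrix_flows_m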
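
The accumulation above can also be written without an explicit loop. The following is a minimal sketch, not the function used in the rest of this notebook: `od_matrix_from_flows` is a hypothetical name, it takes plain sequences instead of a FlowDataFrame, and it assumes that tile IDs can be cast to integers in the range `0..n_tiles-1`. It uses `np.add.at` so that repeated (origin, destination) pairs are summed rather than overwritten.

```python
import numpy as np

def od_matrix_from_flows(origins, destinations, flows, n_tiles, normalize=False):
    """Vectorized sketch of the OD accumulation (hypothetical helper)."""
    o = np.array([int(v) for v in origins])
    d = np.array([int(v) for v in destinations])
    matrix = np.zeros((n_tiles, n_tiles))
    # np.add.at accumulates correctly when the same (o, d) pair repeats
    np.add.at(matrix, (o, d), np.asarray(flows, dtype=float))
    # assumption: skip normalization when the matrix is empty to avoid 0/0
    if normalize and matrix.sum() > 0:
        matrix = matrix / matrix.sum()
    return matrix

# Toy flows on 3 tiles; tile IDs are given as strings here as an assumption.
toy = od_matrix_from_flows(["0", "0", "2"], ["1", "1", "0"], [3, 2, 5], n_tiles=3)
toy_norm = od_matrix_from_flows(["0", "0", "2"], ["1", "1", "0"], [3, 2, 5],
                                n_tiles=3, normalize=True)
print(toy)
```

With `normalize=True` each entry becomes the share of all trips that go from tile $o$ to tile $d$.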

#### City and Tessellation

In [None]:
city = "milan"


city_shape = gpd.read_file(f"../data/bbox_cities/bbox_road_network_{city}.geojson")
tile_size_meters = 1000
tessellation = tilers.tiler.get('squared', base_shape=city_shape, meters=tile_size_meters)
print(len(tessellation))
# style of the tessellation
tex_style = {'fillColor':'blue', 'color':'black', 'opacity': 0.2}
plot_gdf(tessellation, style_func_args=tex_style, zoom=10)

#### 1 Create from GPS trajectories

In [None]:
path_dataset = f"../gps_data/preprocessed_trips_{city}.csv.zip"

df_trajectories = pd.read_csv(path_dataset, compression="zip")

In [None]:
#df_trajectories = df_trajectories.drop("uid", axis=1)

In [None]:
# each trip is treated as a separate "user", so uid == trip_id
tdf = skmob.TrajDataFrame(df_trajectories, latitude='lat', longitude='lng',
                          user_id='trip_id', datetime='datetime')

tdf = tdf.sort_by_uid_and_datetime()
tdf = tdf.reset_index(drop=True)

Keep only the origin and destination of each trip.

In [None]:
# origin = first point of each trip, destination = last point
t_start = tdf.groupby("uid", as_index=False).first()
t_end = tdf.groupby("uid", as_index=False).last()

# concatenate origins and destinations, then restore the per-trip time order
t_trips_od = pd.concat([t_start, t_end])
t_trips_od = t_trips_od.sort_by_uid_and_datetime()
t_trips_od = t_trips_od.reset_index(drop=True)

In [None]:
fdf = t_trips_od.to_flowdataframe(tessellation, self_loops=False)

In [None]:
od_matrix = compute_od_matrix_date_fdf(fdf, tessellation)

In [None]:
np.save(f"../data/od_matrices/od_matrix_{city}.npy", od_matrix)
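
Before using a saved matrix elsewhere, a quick sanity check can help. The sketch below runs on a small made-up matrix and a temporary file rather than on the Milan output; it checks that the reloaded matrix is square and that its diagonal is zero, which is what we would expect here on the assumption that `self_loops=False` drops same-tile flows.

```python
import os
import tempfile

import numpy as np

# Made-up 3 x 3 OD matrix with an empty diagonal (illustrative values only).
toy_od = np.array([[0.0, 4.0, 1.0],
                   [2.0, 0.0, 0.0],
                   [3.0, 5.0, 0.0]])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "od_matrix_toy.npy")
    np.save(path, toy_od)
    reloaded = np.load(path)

is_square = reloaded.shape[0] == reloaded.shape[1]
has_no_self_loops = bool(np.all(np.diag(reloaded) == 0))
total_trips = reloaded.sum()
print(is_square, has_no_self_loops, total_trips)
```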

#### 2 Create random OD matrix

In [None]:
matrix_random = np.zeros((len(tessellation), len(tessellation)))

In [None]:
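n_samples = 100_000  # number of random trips to draw

# random.randint includes both endpoints, hence len(tessellation)-1
samples_rows = [random.randint(0, len(tessellation)-1) for _ in range(n_samples)]
samples_cols = [random.randint(0, len(tessellation)-1) for _ in range(n_samples)]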

In [None]:
for row, col in zip(samples_rows, samples_cols):
    matrix_random[row, col] += 1

In [None]:
np.save(f"../data/od_matrices/rand_od_matrix_{city}.npy", matrix_random)
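
The random matrix above changes on every run because no seed is set. If reproducibility matters, one option is a seeded NumPy generator, as in the following sketch. The function `random_od_matrix` and its `seed` parameter are hypothetical additions, not part of the original routine, and the draws differ from those of Python's `random` module.

```python
import numpy as np

def random_od_matrix(n_tiles, n_samples, seed=0):
    """Draw n_samples uniform random (origin, destination) pairs (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # the upper bound of Generator.integers is exclusive by default
    rows = rng.integers(0, n_tiles, size=n_samples)
    cols = rng.integers(0, n_tiles, size=n_samples)
    matrix = np.zeros((n_tiles, n_tiles))
    # accumulate one trip per sampled pair, including repeated pairs
    np.add.at(matrix, (rows, cols), 1)
    return matrix

m1 = random_od_matrix(n_tiles=4, n_samples=1000, seed=42)
m2 = random_od_matrix(n_tiles=4, n_samples=1000, seed=42)
print(m1.sum(), np.array_equal(m1, m2))
```

Because origins and destinations are sampled independently, a random matrix built this way can have non-zero diagonal entries, unlike the trajectory-based matrix computed with `self_loops=False`.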