## Visualizing taxi trajectories with MovingPandas

MovingPandas is a good tool for analyzing our trajectories more closely. Before it can work with the dataset, however, we need to restructure the data.
Once the dataset is in the right format, we can retrieve information such as speed, distance, direction, and acceleration, and perform stop detection.
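
To make concrete what "speed" and "distance" mean for point data like this, here is a minimal sketch in plain Python that derives both for consecutive GPS fixes using the haversine formula. It assumes points are `[longitude, latitude]` pairs sampled every 15 seconds, as the helper functions in this notebook assume. The names `haversine_m` and `segment_speeds` are hypothetical, and the coordinates are illustrative only. MovingPandas provides its own implementation, which may differ in detail.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres
SAMPLE_INTERVAL_S = 15      # assumed seconds between consecutive fixes


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two lon/lat points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def segment_speeds(points, interval_s=SAMPLE_INTERVAL_S):
    """Speed in m/s for each pair of consecutive [lon, lat] points."""
    return [
        haversine_m(*p, *q) / interval_s
        for p, q in zip(points, points[1:])
    ]


# Illustrative coordinates only, not taken from the dataset
points = [[-8.6100, 41.1500], [-8.6100, 41.1510], [-8.6100, 41.1510]]
speeds = segment_speeds(points)
```

A trajectory with `n` points yields `n - 1` segment speeds. Two identical consecutive points give a speed of zero, which is the kind of signal stop detection builds on.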

In [9]:
import datetime as dt
import pandas as pd
import geopandas as gpd
import movingpandas as mpd
import holoviews as hv
import shapely as shp
import hvplot.pandas 
import os
import re

import global_variables

### Some helper functions
Before MovingPandas can build trajectories, we need a DataFrame with one point and one timestamp per row.

In [10]:
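def unixtime_to_datetime(unix_time):
    # Note: without a tz argument, fromtimestamp returns a naive datetime in local time
    return dt.datetime.fromtimestamp(unix_time)
 
def compute_datetime(row):
    # TIMESTAMP holds the trip's start time as Unix seconds
    unix_time = row['TIMESTAMP']
    # Points are assumed to be 15 seconds apart, so point k lies k * 15 s after the start
    offset = row['running_number'] * dt.timedelta(seconds=15)
    return unixtime_to_datetime(unix_time) + offset
 
def create_point(xy):
    # xy is one coordinate pair from POLYLINE
    try: 
        return shp.Point(xy)
    except TypeError:  # when there are nan values in the input data
        return None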
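
To see the timestamp logic of `compute_datetime` in isolation, here is a minimal sketch that expands one start time into one timestamp per point. Unlike the helper above, it pins the result to UTC so that it does not depend on the machine's local timezone; that choice, the name `point_times`, and the example start value are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

SAMPLE_INTERVAL = timedelta(seconds=15)  # assumed spacing between points


def point_times(start_unix, n_points, interval=SAMPLE_INTERVAL):
    """Return one timezone-aware UTC datetime per point, starting at start_unix."""
    start = datetime.fromtimestamp(start_unix, tz=timezone.utc)
    return [start + i * interval for i in range(n_points)]


# Illustrative start time (Unix seconds) for a trip with three points
times = point_times(1372636858, 3)
```

Here `times[0]` is 2013-07-01 00:00:58 UTC, and each following entry is 15 seconds later.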

### Folder to use

In [11]:
path_to_results_folder = f"experiments/results/{global_variables.CHOSEN_SUBSET_NAME}"

### Create dataframe

In [12]:
import ast

def plot_result(path_to_file):
    # The interactive map gets slow above roughly 100 rows
    # and crashes with an index error above 265 rows
    df = pd.read_csv(path_to_file)
    # POLYLINE is stored as a string; literal_eval parses it into a list without executing code
    df['POLYLINE'] = df['POLYLINE'].apply(ast.literal_eval)

    # Explode POLYLINE so that each point becomes its own row
    new_df = df.explode('POLYLINE')
    new_df['geometry'] = new_df['POLYLINE'].apply(create_point)
    # Number the points within each trip, then derive each point's timestamp from that number
    new_df['running_number'] = new_df.groupby('TRIP_ID').cumcount()
    new_df['datetime'] = new_df.apply(compute_datetime, axis=1)
    # Drop the helper columns; the DataFrame is now suited for MovingPandas
    new_df = new_df.drop(columns=['POLYLINE', 'TIMESTAMP', 'running_number'])

    # Build the trajectories and plot them with hvplot as an interactive map
    trajs = mpd.TrajectoryCollection(gpd.GeoDataFrame(new_df, crs=4326), traj_id_col='TRIP_ID', t='datetime')

    # hover_cols displays the TRIP_ID when hovering over a trajectory
    plot = trajs.hvplot(title='Taxi Trajectory Data', tiles='CartoLight', x='x', y='y', hover_cols=['TRIP_ID'])

    hvplot.show(plot)
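
The reshaping step inside `plot_result` can be tried on its own with a toy DataFrame. The sketch below assumes the same column names (`TRIP_ID`, `TIMESTAMP`, `POLYLINE`) and the same 15-second spacing; the data values are made up. It parses the string with `ast.literal_eval`, which only accepts Python literals, and computes the timestamps with vectorized pandas operations in UTC instead of a row-wise `apply`, which is a design choice of this sketch and not part of the notebook.

```python
import ast

import pandas as pd

# Toy data in the shape assumed by this notebook: one row per trip,
# with POLYLINE stored as a string of [lon, lat] pairs
df = pd.DataFrame({
    "TRIP_ID": ["a", "b"],
    "TIMESTAMP": [1372636858, 1372637303],
    "POLYLINE": ["[[-8.61, 41.15], [-8.62, 41.16]]", "[[-8.63, 41.14]]"],
})

# String -> list of coordinate pairs
df["POLYLINE"] = df["POLYLINE"].apply(ast.literal_eval)

# One row per point; ignore_index gives the result a fresh 0..n-1 index
points = df.explode("POLYLINE", ignore_index=True)

# Position of each point within its trip: 0, 1, ... per TRIP_ID
points["running_number"] = points.groupby("TRIP_ID").cumcount()

# Trip start time plus 15 seconds per preceding point
points["datetime"] = (
    pd.to_datetime(points["TIMESTAMP"], unit="s", utc=True)
    + pd.to_timedelta(points["running_number"] * 15, unit="s")
)
```

Trip `a` contributes two rows and trip `b` one, so `points` has three rows, and the second point of trip `a` is stamped 15 seconds after the first.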


In [13]:
# List the folder once, then split the CSV files by prefix.
# "not-match-..." files do not start with "match-", so the two lists are disjoint.
result_files = os.listdir(path_to_results_folder)
match_list = [file for file in result_files if file.startswith("match-") and file.endswith(".csv")]
not_match_list = [file for file in result_files if file.startswith("not-match-") and file.endswith(".csv")]

print("ALL MATCHES PRINTED:")
for file in match_list:
    print("test")
    plot_result(f"{path_to_results_folder}/{file}")

print("ALL NON_MATCHES PRINTED:")
for file in not_match_list:
    plot_result(f"{path_to_results_folder}/{file}")

ALL MATCHES PRINTED:
test
Launching server at http://localhost:32807
test
Launching server at http://localhost:36081
test
Launching server at http://localhost:45789
test
Launching server at http://localhost:33499
test
Launching server at http://localhost:35629
test
Launching server at http://localhost:44111
test
Launching server at http://localhost:45549
test
Launching server at http://localhost:44607
test
Launching server at http://localhost:34243
test
Launching server at http://localhost:46365
test
Launching server at http://localhost:42273
test
Launching server at http://localhost:39119
test
Launching server at http://localhost:41621
test
Launching server at http://localhost:40459
test
Launching server at http://localhost:34101
test
Launching server at http://localhost:41237
test
Launching server at http://localhost:45711
test
Launching server at http://localhost:42067
test
Launching server at http://localhost:41865
test
Launching server at http://localhost:34781
test
Launching serv

ERROR:bokeh.server.views.ws:Refusing websocket connection from Origin 'https://apps.hpc.ntnu.no'; use --allow-websocket-origin=apps.hpc.ntnu.no or set BOKEH_ALLOW_WS_ORIGIN=apps.hpc.ntnu.no to permit this; currently we allow origins {'localhost:32807'}
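
The error message names two remedies. A minimal sketch of the environment-variable route follows. It assumes the variable is set in the notebook process before any plot server is launched, and that Bokeh reads it from the running process; neither has been verified here. The host is the one reported in the error and should be replaced with your own origin.

```python
import os

# Assumption: the notebook is reached through this host, as the error
# message reports. Set this before the first call that launches a server.
os.environ["BOKEH_ALLOW_WS_ORIGIN"] = "apps.hpc.ntnu.no"
```

If this has no effect, the variable can instead be exported in the shell that starts the notebook server, so that it is present when the process begins.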


### Citation

MovingPandas:
Graser, A. (2019). MovingPandas: Efficient Structures for Movement Data in Python. GI_Forum ‒ Journal of Geographic Information Science 2019, 1-2019, 54-68. doi:10.1553/giscience2019_01_s54.

Inspiration and code:
Graser, A. (2023). Free and Open Source GIS Ramblings - How to use Kaggle’s Taxi Trajectory Data in MovingPandas.
https://anitagraser.com/2023/05/12/how-to-use-kaggles-taxi-trajectory-data-in-movingpandas/