# GTFS integration for Milano Area PT with integration of train routes for airport access
Author Ilias Parmaksizoglou

This notebook describes how to rebuild a GTFS feed so that it captures the public transit (PT) infrastructure available to users who want to access Malpensa International Airport.

All GTFS files were downloaded from [Transitland](https://www.transit.land/). Specifically:

* GTFS for Public Transit (PT) (tram,metro,bus) from AMAT - [GTFS_ATM](https://www.transit.land/feeds/f-u0nd-comunedimilano)
* GTFS for Trains from Trenord - [GTFS_Trenord](https://www.transit.land/feeds/f-u0n-trenord)

Both GTFS feeds are updated regularly. For the versions used here, the period of combined scheduled validity is 09/05/2023 - 10/06/2023. Local copies of the files are stored in the 'data/gtfs' folder.

In [1]:
# Keeps the rows of a GTFS table whose value in column `cell` appears in `factor`
# (`cell` is a positional column index when it is an int, otherwise a column name)
def reduce_file(cur_input,factor,cell):
    column = cur_input.iloc[:,cell] if isinstance(cell,int) else cur_input[cell]
    return cur_input[column.isin(factor)]

In [2]:
# Loading libraries
import pandas as pd
from shapely.geometry import Point
import csv
import numpy as np
import os
import zipfile
os.environ['USE_PYGEOS'] = '0'
import geopandas as gpd
st_date,end_date = "20230509","20230610"

The current case study focuses on the infrastructure of the Milano Metropolitan Area (MMA), so the AMAT GTFS is too extensive. To reduce it, we use a pre-existing zonal classification of the MMA, which divides it into 88 zones (NILs), and keep only the stops within this area. A geojson with this structure is stored in the 'data' folder. The file was obtained from this [link](https://dati.comune.milano.it/dataset/ds964-nil-vigenti-pgt-2030/resource/9c4e0776-56fc-4f3d-8a90-f4992a3be426).

In [3]:
gdf_nils = gpd.read_file('data/milano/nils_milano.geojson')
df_stop = pd.read_csv("data/gtfs/amat/stops.txt")

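# Keep stations within the NILs
frames = []
for idx,r in df_stop.iterrows():
    # shapely points are (x, y), i.e. (longitude, latitude)
    stop_point = Point(r['stop_lon'],r['stop_lat'])
    # Keep the stop as soon as any NIL polygon contains it
    if gdf_nils.geometry.contains(stop_point).any():
        frames.append(idx)

df_stop = df_stop.loc[frames]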
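
For intuition about what the containment test decides, here is a minimal pure-Python sketch of a point-in-polygon check based on ray casting. It is an illustration only, not a description of how shapely implements `within`, and the rectangular `zone` is a hypothetical stand-in for a NIL polygon. The two stop coordinates are taken from the stops table shown later in this notebook.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray casting: count the polygon edges crossed by a horizontal ray starting at the point."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that horizontal line
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


# Hypothetical rectangular zone around central Milano (NOT a real NIL boundary)
zone = [(9.10, 45.40), (9.28, 45.40), (9.28, 45.54), (9.10, 45.54)]

# (lon, lat) pairs taken from the stops table of this notebook
milano_cadorna = (9.175596, 45.468359)
malpensa_t1 = (8.711128, 45.62712)

print(point_in_polygon(*milano_cadorna, zone))
print(point_in_polygon(*malpensa_t1, zone))
```

An odd number of crossings means the point lies inside the polygon, which is why the flag is toggled at every crossing.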


For now there is no need to reduce the AMAT GTFS any further. Although the current GTFS will contain broken routes, the wrapper that we will use to finalize the combined GTFS fixes any incomplete routes. We continue by reducing the Trenord GTFS, from which we keep only the routes serving Malpensa: ["XP1","XP2","R28"]. We start by editing 'trips.txt' in the data/gtfs/trenord folder.

In [4]:
routes_trenord = ["XP1","XP2","R28"]

# Reducing trips.txt
trips_mxp = pd.read_csv('data/gtfs/trenord/trips.txt')
trips_mxp = trips_mxp[trips_mxp['route_id'].isin(routes_trenord)].copy()
trips_mxp['wheelchair_accessible']  = np.nan
trips_mxp['x_trip_desc']  = np.nan
trips_mxp['x_shape_id_order']  = np.nan
trips_mxp['trip_headsign']  = np.nan
trips_mxp['block_id']  = np.nan
trips_mxp['shape_id']  = np.nan

# Buses mxp
buses_mxp = pd.read_csv('data/gtfs/buses/trips.txt')
buses_mxp['wheelchair_accessible']  = np.nan
buses_mxp['x_trip_desc']  = np.nan
buses_mxp['x_shape_id_order']  = np.nan
buses_mxp['trip_headsign']  = np.nan
buses_mxp['block_id']  = np.nan
buses_mxp['shape_id']  = np.nan
trips_mxp = pd.concat([trips_mxp, buses_mxp], axis=0)
trips_mxp

Unnamed: 0,route_id,service_id,trip_id,trip_headsign,trip_short_name,direction_id,block_id,shape_id,wheelchair_accessible,x_trip_desc,x_shape_id_order
124,XP1,234137515667508,15667508,,900388,0,,,,,
150,R28,233933315666238,15666238,,2932,0,,,,,
156,XP1,234265315668070,15668070,,900331,0,,,,,
165,XP1,233964715666658,15666658,,900394,0,,,,,
185,R28,233940615666534,15666534,,2948,0,,,,,
...,...,...,...,...,...,...,...,...,...,...,...
233,MALPENSA TERRA,26875003107645176,107645176,,176,0,,,,,
234,MALPENSA TERRA,27360259212789596,212789596,,596,0,,,,,
235,MALPENSA TERRA,28561652365638850,365638850,,850,0,,,,,
236,MALPENSA TERRA,29620977941958439,941958439,,439,0,,,,,


The three empty columns were added to ensure uniformity with the AMAT GTFS, even though this information is missing from the Trenord GTFS. We continue with stop_times.txt and stops.txt.
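
As a compact alternative to assigning each empty column by hand, the column layout can be enforced in one step. The sketch below is not the approach used in this notebook; it assumes the target layout is the header of the combined trips table shown above and uses a toy one-row table (`toy_trips`, a name introduced here for illustration) built from the first row of that output.

```python
import pandas as pd

# Column layout of the combined trips table shown above
target_columns = ["route_id", "service_id", "trip_id", "trip_headsign", "trip_short_name",
                  "direction_id", "block_id", "shape_id", "wheelchair_accessible",
                  "x_trip_desc", "x_shape_id_order"]

# Toy input holding only a subset of those columns
toy_trips = pd.DataFrame({
    "route_id": ["XP1"],
    "service_id": ["234137515667508"],
    "trip_id": ["15667508"],
    "trip_short_name": ["900388"],
    "direction_id": [0],
})

# reindex adds every missing column filled with NaN and enforces the target order
aligned = toy_trips.reindex(columns=target_columns)
print(aligned.columns.tolist())
```

Since `pd.concat` aligns frames by column name, the explicit ordering mainly helps when comparing tables by eye.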

In [5]:
# Reducing stop_times.txt
trip_ids = trips_mxp['trip_id'].values.tolist()
service_ids = trips_mxp['service_id'].values.tolist()
stop_times_mxp = pd.read_csv('data/gtfs/trenord/stop_times.txt')
stop_times_mxp = reduce_file(stop_times_mxp,trip_ids,0)
stop_times_mxp['stop_headsign'] = np.nan
stop_times_mxp['pickup_type'] = np.nan
stop_times_mxp['drop_off_type'] = np.nan
stop_times_mxp['shape_dist_traveled'] = np.nan

# Add buses
bus_times_mxp = pd.read_csv('data/gtfs/buses/stop_times.txt')
bus_times_mxp['stop_headsign'] = np.nan
bus_times_mxp['pickup_type'] = np.nan
bus_times_mxp['drop_off_type'] = np.nan
bus_times_mxp['shape_dist_traveled'] = np.nan

stop_times_mxp = pd.concat([stop_times_mxp, bus_times_mxp], axis=0)

# Reducing stops.txt
stop_ids = list(set(stop_times_mxp['stop_id'].values.tolist()))
stop_mxp = pd.read_csv('data/gtfs/trenord/stops.txt')
stop_mxp = reduce_file(stop_mxp,stop_ids,0)
stop_mxp['zone_id'] = np.nan
stop_mxp['stop_timezone'] = np.nan
stop_mxp['stop_desc'] = np.nan
stop_mxp['parent_station'] = np.nan
stop_mxp['stop_url'] = np.nan

# Buses
stop_bus = pd.read_csv('data/gtfs/buses/stops.txt')
stop_bus['zone_id'] = np.nan
stop_bus['stop_timezone'] = np.nan
stop_bus['stop_desc'] = np.nan
stop_bus['parent_station'] = np.nan
stop_bus['stop_url'] = np.nan

stop_mxp = pd.concat([stop_mxp, stop_bus], axis=0)
stop_mxp

Unnamed: 0,stop_id,stop_code,stop_name,stop_desc,stop_lat,stop_lon,stop_url,location_type,parent_station,zone_id,stop_timezone
56,5265,,FERNO-LONATE POZZOLO,,45.608521,8.755505,,0,,,
74,5264,,BUSTO ARSIZIO FN,,45.606114,8.851341,,0,,,
80,1707,,MILANO BOVISA FNM,,45.502682,9.159492,,0,,,
115,1715,,MILANO PORTA GARIBALDI,,45.484462,9.187875,,0,,,
125,5262,,RESCALDINA,,45.622296,8.946664,,0,,,
285,5053,,MILANO CADORNA,,45.468359,9.175596,,0,,,
299,5266,,MALPENSA AEROPORTO T.1,,45.62712,8.711128,,0,,,
318,5048,,SARONNO,,45.625316,9.030748,,0,,,
329,5263,,CASTELLANZA,,45.613557,8.893948,,0,,,
369,1728,,MILANO CENTRALE,,45.485188,9.202954,,0,,,


Again, some columns were added to stops.txt to ensure uniformity with the AMAT GTFS. We continue with agency.txt and routes.txt, where the changes are minimal.

In [6]:
# Move to routes & agency 
routes_mxp = pd.read_csv('data/gtfs/trenord/routes.txt')
routes_mxp = reduce_file(routes_mxp,routes_trenord,0)
routes_mxp['route_desc'] = np.nan
routes_mxp['route_url'] = np.nan
agency_mxp = pd.read_csv('data/gtfs/trenord/agency.txt')

# Add bus routes
routes_bus = pd.read_csv('data/gtfs/buses/routes.txt')
routes_bus['route_desc'] = np.nan
routes_bus['route_url'] = np.nan

routes_mxp = pd.concat([routes_mxp, routes_bus], axis=0)
routes_mxp

Unnamed: 0,route_id,agency_id,route_short_name,route_long_name,route_desc,route_type,route_url,route_color,route_text_color
14,R28,Trenord,R28,Malpensa-Saronno-Milano Centrale,,2,,94C120,FFFFFF
57,XP1,Trenord,XP1,Malpensa-Milano Cadorna,,2,,9B194E,FFFFFF
58,XP2,Trenord,XP2,Malpensa-Milano Centrale,,2,,9B194E,FFFFFF
0,MALPENSA BUS,Air Pullman,MALPENSA BUS,Malpensa-Milano Centrale,,4,,9B194E,FFFFFF
1,MALPENSA SHUTTLE,Autostradale,MALPENSA SHUTTLE,Malpensa-Milano Centrale,,4,,9B194E,FFFFFF
2,MALPENSA TERRA,Terravision,MALPENSA TERRA,Malpensa-Milano Centrale,,4,,9B194E,FFFFFF


The AMAT GTFS makes use of calendar_dates.txt, while the Trenord GTFS makes use of calendar.txt. As only one of the two is necessary, we convert the calendar.txt of the Trenord GTFS to calendar_dates.txt. This file indicates whether a particular service is active on a given day by setting the exception_type column to 1 when the service is active and to 2 when it is not.

In [7]:
calendar_mxp = pd.read_csv('data/gtfs/trenord/calendar.txt')
calendar_bus = pd.read_csv('data/gtfs/buses/calendar.txt')
calendar_mxp = pd.concat([calendar_mxp, calendar_bus], axis=0)

calendar_dates = []
window_start = pd.to_datetime(st_date,format='%Y%m%d')
window_end = pd.to_datetime(end_date,format='%Y%m%d')
one_day = pd.Timedelta(1, "d")
for idx,r in calendar_mxp.iterrows():
    # Clip the service period (last two columns: start_date, end_date) to the study window
    cur_date = max(pd.to_datetime(r.iloc[-2],format='%Y%m%d'),window_start)
    last_date = min(pd.to_datetime(r.iloc[-1],format='%Y%m%d'),window_end)
    while cur_date <= last_date:
        # Columns 1-7 hold the monday..sunday flags; weekday() returns 0 for Monday
        active = int(r.iloc[1 + cur_date.weekday()])
        calendar_dates.append([r.iloc[0],cur_date.strftime('%Y%m%d'),2-active]) # When it is zero in the calendar it should be 2 in the calendar_dates
        cur_date += one_day

calendar_dates_mxp = pd.DataFrame(calendar_dates,columns=["service_id","date","exception_type"])
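
The conversion can be checked on a single service with a minimal pure-Python sketch. It assumes a hypothetical weekday-only service (`WEEKDAY`) and a one-week window starting on 09/05/2023; the helper name `expand_calendar_row` is introduced here for illustration only. The weekday flag is looked up from the date itself, so every date is matched with the right column.

```python
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]


def expand_calendar_row(row, window_start, window_end):
    """Turn one calendar.txt row into calendar_dates.txt rows, one per day of the window."""
    day = max(row["start_date"], window_start)
    last_day = min(row["end_date"], window_end)
    rows = []
    while day <= last_day:
        runs = row[WEEKDAYS[day.weekday()]]  # date.weekday() returns 0 for Monday
        exception_type = 1 if runs else 2    # 1 = service active, 2 = service not active
        rows.append((row["service_id"], day.strftime("%Y%m%d"), exception_type))
        day += timedelta(days=1)
    return rows


# Hypothetical service running Monday to Friday during the whole of 2023
service = {"service_id": "WEEKDAY",
           "monday": 1, "tuesday": 1, "wednesday": 1, "thursday": 1, "friday": 1,
           "saturday": 0, "sunday": 0,
           "start_date": date(2023, 1, 1), "end_date": date(2023, 12, 31)}

rows = expand_calendar_row(service, date(2023, 5, 9), date(2023, 5, 15))
for r in rows:
    print(r)
```

The window covers Tuesday 09/05 to Monday 15/05, so the two weekend days (13/05 and 14/05) receive exception_type 2 and the other five days receive 1.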

Finally, since some stations appear in both GTFS feeds, we define them as transfers. These stations are MILANO CENTRALE - CENTRALE FS, MILANO PORTA GARIBALDI - GARIBALDI FS, MILANO CADORNA - CADORNA FN M1 & CADORNA FN M2, and MILANO BOVISA FNM - bovisa (politecnico). The bus stops at Milano Centrale, Lotto, Fieramilanocity and Malpensa T.1 are linked to their counterparts in the same way.

In [8]:
df_transfers = pd.read_csv("data/gtfs/amat/transfers.txt")
extra_transfers = {"CADORNA FN M1": ["MILANO CADORNA"],"CADORNA FN M2": ["MILANO CADORNA"],"MILANO CADORNA": ["CADORNA FN M1","CADORNA FN M2"],
                   "CENTRALE FS": ["MILANO CENTRALE","MILANO CENTRALE BUS"],  "MILANO CENTRALE": ["CENTRALE FS","MILANO CENTRALE BUS"],"MILANO CENTRALE BUS": ["MILANO CENTRALE","CENTRALE FS"],
                   "GARIBALDI FS": ["MILANO PORTA GARIBALDI"],  "MILANO PORTA GARIBALDI": ["GARIBALDI FS"],
                   "LOTTO M5": ["LOTTO BUS"],  "LOTTO BUS": ["LOTTO M5"],
                   "MALPENSA AEROPORTO T.1": ["MALPENSA AEROPORTO T.1 BUS"],  "MALPENSA AEROPORTO T.1 BUS": ["MALPENSA AEROPORTO T.1"],
                   "LOTTO FIERAMILANOCITY": ["FIERAMILANOCITY BUS"],  "FIERAMILANOCITY BUS": ["LOTTO FIERAMILANOCITY"],
                   "MILANO BOVISA FNM": ["bovisa (politecnico)"], "bovisa (politecnico)": ["MILANO BOVISA FNM"]}
rows = []
for from_stop,to_stops in extra_transfers.items():
    for to_stop in to_stops:
        rows.append([from_stop,to_stop,0,np.nan])

df_extra = pd.DataFrame(rows,columns=df_transfers.columns)
transfers = pd.concat([df_transfers,df_extra])
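
Because the dictionary lists every transfer in both directions by hand, a one-directional entry is easy to miss. The sketch below is a small sanity check, not part of the original workflow: the helper `missing_reverse_links` is a hypothetical name, and the dictionaries are a subset of the one above plus a deliberately broken copy.

```python
def missing_reverse_links(transfers):
    """Return the (from_stop, to_stop) pairs that would be needed to make the mapping symmetric."""
    missing = []
    for origin, destinations in transfers.items():
        for destination in destinations:
            if origin not in transfers.get(destination, []):
                missing.append((destination, origin))
    return missing


# Subset of the transfers defined above
subset = {"CADORNA FN M1": ["MILANO CADORNA"],
          "CADORNA FN M2": ["MILANO CADORNA"],
          "MILANO CADORNA": ["CADORNA FN M1", "CADORNA FN M2"],
          "GARIBALDI FS": ["MILANO PORTA GARIBALDI"],
          "MILANO PORTA GARIBALDI": ["GARIBALDI FS"]}

# Same subset with one direction removed on purpose
broken = dict(subset)
del broken["MILANO PORTA GARIBALDI"]

print(missing_reverse_links(subset))
print(missing_reverse_links(broken))
```

An empty list means every transfer can be walked in both directions.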

Now that the Trenord GTFS has been edited to comply with the AMAT GTFS, we merge all files together. We keep the required files, which are:
* agency.txt
* stops.txt
* routes.txt
* trips.txt
* stop_times.txt
* calendar_dates.txt



In [9]:
# Combined agency
agency = agency_mxp
agency['agency_id'] = 'AMAT & Trenord & Ilias'
agency['agency_name'] = 'AMAT & Trenord & Ilias'
agency['agency_url'] = 'http://www.trenord.it/, http://www.atm.it/'

# Combined stops
stops = pd.concat([df_stop,stop_mxp])

# Combined routes
df_routes = pd.read_csv("data/gtfs/amat/routes.txt")
routes = pd.concat([df_routes,routes_mxp])

# Combined trips
df_trips = pd.read_csv("data/gtfs/amat/trips.txt")
trips = pd.concat([df_trips,trips_mxp])

# Combined stop_times
df_stop_times = pd.read_csv("data/gtfs/amat/stop_times.txt")
stop_times = pd.concat([df_stop_times,stop_times_mxp])

# Combined calendar_dates
df_calendar_dates = pd.read_csv("data/gtfs/amat/calendar_dates.txt")
calendar_dates= pd.concat([df_calendar_dates,calendar_dates_mxp])

# Load calendar
calendar = pd.read_csv("data/gtfs/amat/calendar.txt")




Finally, we save the combined GTFS to a separate folder representing the whole Milano area.

In [10]:
# Create the output directory if it does not exist yet
path = "data/gtfs/milano"
os.makedirs(path,exist_ok=True)

agency.to_csv(f"{path}/agency.txt",index=False,quoting=csv.QUOTE_ALL)
stops.to_csv(f"{path}/stops.txt",index=False,quoting=csv.QUOTE_ALL)
routes.to_csv(f"{path}/routes.txt",index=False,quoting=csv.QUOTE_ALL)
trips.to_csv(f"{path}/trips.txt",index=False,quoting=csv.QUOTE_ALL)
stop_times.to_csv(f"{path}/stop_times.txt",index=False,quoting=csv.QUOTE_ALL)
calendar_dates.to_csv(f"{path}/calendar_dates.txt",index=False,quoting=csv.QUOTE_ALL)
transfers.to_csv(f"{path}/transfers.txt",index=False,quoting=csv.QUOTE_ALL)
calendar.to_csv(f"{path}/calendar.txt",index=False,quoting=csv.QUOTE_ALL)

Save the Milano directory to a zip file.

In [11]:

path = "data/gtfs/"
path = os.path.abspath(os.path.normpath(os.path.expanduser(path)))
for folder in os.listdir(path):
    if folder == "milano":
        zipf = zipfile.ZipFile('{0}_gtfs.zip'.format(os.path.join(path, folder)), 'w', zipfile.ZIP_DEFLATED)
        for root, dirs, files in os.walk(os.path.join(path, folder)):
            for filename in files:
                zipf.write(os.path.abspath(os.path.join(root, filename)), arcname=filename)
        zipf.close()
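
The `arcname=filename` argument matters because GTFS files are expected at the root of the archive rather than inside a subfolder. The following self-contained sketch reproduces the same zipping logic in a temporary directory with placeholder files and then lists the archive contents; the helper name `zip_flat` and the placeholder file contents are assumptions made for illustration.

```python
import os
import tempfile
import zipfile


def zip_flat(folder, zip_path):
    """Zip every file under `folder`, storing each one at the root of the archive."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zipf:
        for root, _dirs, files in os.walk(folder):
            for filename in files:
                zipf.write(os.path.join(root, filename), arcname=filename)


with tempfile.TemporaryDirectory() as tmp:
    # Placeholder feed folder with two dummy files
    feed_dir = os.path.join(tmp, "milano")
    os.makedirs(feed_dir)
    for name in ("agency.txt", "stops.txt"):
        with open(os.path.join(feed_dir, name), "w") as handle:
            handle.write("placeholder\n")

    zip_path = os.path.join(tmp, "milano_gtfs.zip")
    zip_flat(feed_dir, zip_path)

    # Inspect the archive: names carry no folder prefix
    with zipfile.ZipFile(zip_path) as zipf:
        archived = sorted(zipf.namelist())

print(archived)
```

Using the archive as a context manager also guarantees that it is closed even if writing one of the files fails.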

Finally, let's prepare the new GTFS to be used by the RoutingEngine. To do that, we use a pre-existing Python [Transit Router](https://github.com/transnetlab/transit-routing). The router has been adapted to the needs of this case study.

In [12]:
from TransitRouting import GTFS_wrapper,build_transfer_file
GTFS_wrapper.main('milano',20230601,[0,1,2,3,4]) # GTFS route types: 0 tram, 1 metro, 2 rail, 3 bus; 4 is the type assigned to the Malpensa bus routes above
build_transfer_file.main('milano',270) # Max walking time 270 

Reading GTFS data
Network: milano
transfer.txt missing

Removing unwanted routes
Total routes: 168
Route types distribution:
 route_type
3    140
0     17
1      5
2      3
4      3
Name: count, dtype: int64
Total routes after filtering on route_types: 168

After Filtering on date 20230601
Valid trips:  21932
Valid routes:  144
Filtering stop_times.txt
Applying dates


100%|██████████| 502497/502497 [00:08<00:00, 60410.80it/s]



Filtering stops.txt
Valid stops left: 2631

Renaming routes


100%|██████████| 21932/21932 [00:05<00:00, 3700.01it/s]



Renaming trips


100%|██████████| 639/639 [00:00<00:00, 1517.73it/s]



Removing overlapping trips


100%|██████████| 639/639 [00:02<00:00, 249.32it/s]


15 trips were overlapped

Checking trips length


100%|██████████| 21917/21917 [00:00<00:00, 37400.98it/s]



Applying final stoptimes filter

Applying final trips filter

Saving files
Final stops count    : 2631
Final trips count    : 21917
Final routes count   : 639

Graph imported from disk


100%|██████████| 2631/2631 [00:00<00:00, 3408.03it/s]


Total connected components identified: 254
Ensuring Transitive closure in serial...


100%|██████████| 254/254 [00:06<00:00, 41.38it/s] 


Before Transitive closure: 7462
After  Transitive closure (final file): 74276
Total transfers: 74276
Longest transfer: 136.7 seconds
Time required: 0.1 minutes
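
The "Filtering on date" step in the log above depends on the calendar_dates.txt built earlier. How the wrapper implements it is not shown in this notebook; the sketch below only illustrates what selecting the services active on one day means, using hypothetical service ids (`WEEKDAY`, `WEEKEND`) and a hypothetical helper name.

```python
def active_services(calendar_dates_rows, day):
    """Service ids flagged with exception_type 1 (service active) on the given YYYYMMDD day."""
    return {service_id
            for service_id, service_date, exception_type in calendar_dates_rows
            if service_date == day and exception_type == 1}


# Hypothetical calendar_dates rows: (service_id, date, exception_type)
rows = [("WEEKDAY", "20230601", 1),
        ("WEEKEND", "20230601", 2),
        ("WEEKDAY", "20230603", 2),
        ("WEEKEND", "20230603", 1)]

print(active_services(rows, "20230601"))
print(active_services(rows, "20230603"))
```

Trips whose service_id is not in the returned set would be dropped for that day.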


The GTFS is now almost ready. We still need to build the dictionaries for the routing, but this happens automatically when initiating the RoutingEngine. The final step is to create a separate csv with all the stations, including projected coordinates, to feed into the simulation file.

In [13]:
import utm

def rule(row):
    # utm.from_latlon returns (easting, northing, zone_number, zone_letter)
    X, Y, _, _ = utm.from_latlon(row['stop_lat'], row['stop_lon'])
    return pd.Series({"X": X, "Y": Y})

# os.path.join avoids backslash escapes and works on any operating system
df = pd.read_csv(os.path.join("TransitRouting","GTFS","milano","stops.txt"))
df = df.merge(df.apply(rule, axis=1), left_index= True, right_index= True)
df = df[['stop_id','stop_name','stop_lat','stop_lon','X','Y']]
df.to_csv('data/milano/stations_all.csv',index=False)
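
Projected X and Y values are only comparable between stops when all stops fall in the same UTM zone, because the zone is chosen per point from its longitude. The sketch below checks this with the standard 6-degree zone formula (it ignores the special zones around Norway and Svalbard, which do not matter here). The helper name `utm_zone_number` is introduced for illustration, and the coordinates come from the stops table earlier in this notebook.

```python
import math


def utm_zone_number(lon):
    """Standard 6-degree UTM zone for a longitude in degrees, valid for -180 <= lon < 180."""
    return int(math.floor((lon + 180.0) / 6.0)) + 1


# (lat, lon) pairs taken from the stops table of this notebook
stops = {"MILANO CENTRALE": (45.485188, 9.202954),
         "MALPENSA AEROPORTO T.1": (45.62712, 8.711128),
         "SARONNO": (45.625316, 9.030748)}

zones = {name: utm_zone_number(lon) for name, (lat, lon) in stops.items()}
print(zones)
```

All three stops fall in zone 32, which covers longitudes from 6 to 12 degrees east.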