## Project Geospatial Developer Part 2

In [1]:
import gtfs_functions as gtfs

## Manipulating linestrings with the gtfs_functions module

The `import_gtfs` function from the `gtfs_functions` module takes the path to the GTFS feed (here a zip file) as its argument and returns the routes, stops, stop_times, trips and shapes as dataframes/geodataframes.

In [2]:
routes, stops, stop_times, trips, shapes = gtfs.import_gtfs('gtfs.zip')

In [3]:
routes.head()

Unnamed: 0,route_id,agency_id,route_short_name,route_long_name,route_type,route_color,route_text_color
0,11-WLB-j20-1,3,WLB,Wien Oper - Wiener Neudorf - Guntramsdorf - Tr...,0,0A295D,FFFFFF
1,21-U1-j20-1,4,U1,Oberlaa - Leopldau bis 17.3.2020,1,E3000F,FFFFFF
2,21-U1-Y-j20-1,4,U1,Oberlaa - Leopldau gültig ab 16.5.2020,1,,
3,21-U2-j20-1,4,U2,Seestadt - Karlsplatz bis 17.3.2020,1,A862A4,FFFFFF
4,21-U2-Y-j20-1,4,U2,Seestadt - Karlsplatz gültig ab 16.5.2020,1,,


In [4]:
stops.head()

Unnamed: 0,stop_id,stop_name,geometry
0,at:43:3121:0:1,Baden Josefsplatz,POINT (16.23370 48.00595)
1,at:43:3134:0:1,Baden Viadukt,POINT (16.24097 48.00384)
2,at:43:3134:0:2,Baden Viadukt,POINT (16.24091 48.00373)
3,at:43:3142:0:3,Baden Leesdorf,POINT (16.25161 47.99951)
4,at:43:3142:0:4,Baden Leesdorf,POINT (16.25165 47.99956)


In part one

### Selecting the service_id with the most trips

To address the issues listed above, I first look for the `service_id` I want to use. There are many ways to choose it; if I were using the `triply` dataset, the most likely choice would be the `citybus` service that I want to analyze. For this exercise, I simply take the service with the most trips.

The largest service in this dataset appears to be 'T8+cor2', so I filter on that one.
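One way to find the service with the most trips is to count rows per `service_id`. The sketch below uses a small made-up table, not the real feed: `trips_demo` and every id in it except 'T8+cor2' are placeholders introduced for illustration.

```python
import pandas as pd

# Toy stand-in for the `trips` dataframe; trip ids and the services
# "S1" and "S2" are illustrative placeholders, not values from the feed.
trips_demo = pd.DataFrame({
    "trip_id": ["a", "b", "c", "d", "e"],
    "service_id": ["T8+cor2", "T8+cor2", "T8+cor2", "S1", "S2"],
})

# Count trips per service_id (sorted from most to fewest trips).
trips_per_service = trips_demo["service_id"].value_counts()

# idxmax returns the service_id label with the highest count.
busiest_service = trips_per_service.idxmax()
print(busiest_service)
```

On the real data, the same two calls applied to `trips` would give the candidate value to filter on.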

In [5]:
## Filtering by service_id
stop_times = stop_times.loc[(stop_times.service_id=='T8+cor2'),:].reset_index()
stop_times.head()

Unnamed: 0,index,trip_id,arrival_time,departure_time,stop_id,stop_sequence,pickup_type,drop_off_type,shape_dist_traveled,route_id,service_id,direction_id,shape_id,stop_name,geometry
0,148413,7430.T8.22-10-j20-1.1.H,17700.0,17700.0,at:49:597:0:8,1,0,0,0.0,22-10-j20-1,T8+cor2,0,22-10-j20-1.1.H,Joachimsthalerplatz,POINT (16.30441 48.20883)
1,148414,7430.T8.22-10-j20-1.1.H,17820.0,17820.0,at:49:445:0:5,2,0,0,381.52,22-10-j20-1,T8+cor2,0,22-10-j20-1.1.H,Gutraterplatz,POINT (16.30889 48.20775)
2,148415,7430.T8.22-10-j20-1.1.H,17880.0,17880.0,at:49:1317:0:4,3,0,0,805.04,22-10-j20-1,T8+cor2,0,22-10-j20-1.1.H,Kendlerstraße U,POINT (16.30807 48.20424)
3,148416,7430.T8.22-10-j20-1.1.H,17940.0,17940.0,at:49:754:0:1,4,0,0,1074.24,22-10-j20-1,T8+cor2,0,22-10-j20-1.1.H,Laurentiusplatz,POINT (16.30792 48.20196)
4,148417,7430.T8.22-10-j20-1.1.H,18060.0,18060.0,at:49:1035:0:9,5,0,0,1487.58,22-10-j20-1,T8+cor2,0,22-10-j20-1.1.H,Hütteldorfer Straße U,POINT (16.31215 48.19958)


In [7]:
## Filtering by service_id
trips = trips.loc[(trips.service_id=='T8+cor2')]
trips.head()

Unnamed: 0,trip_id,route_id,service_id,direction_id,shape_id
5973,1.T8.22-10-j20-1.8.R,22-10-j20-1,T8+cor2,1,22-10-j20-1.8.R
5974,10.T8.22-10-j20-1.9.R,22-10-j20-1,T8+cor2,1,22-10-j20-1.9.R
5975,105.T8.22-10-j20-1.3.H,22-10-j20-1,T8+cor2,0,22-10-j20-1.3.H
5976,106.T8.22-10-j20-1.1.H,22-10-j20-1,T8+cor2,0,22-10-j20-1.1.H
5977,107.T8.22-10-j20-1.3.H,22-10-j20-1,T8+cor2,0,22-10-j20-1.3.H


In [8]:
shapes.head()

Unnamed: 0,shape_id,geometry
0,11-WLB-j20-1.1.H,"LINESTRING (16.37064 48.20200, 16.37038 48.202..."
1,11-WLB-j20-1.10.R,"LINESTRING (16.33424 48.14977, 16.33479 48.150..."
2,11-WLB-j20-1.11.R,"LINESTRING (16.25162 47.99952, 16.25248 47.999..."
3,11-WLB-j20-1.12.R,"LINESTRING (16.23370 48.00595, 16.23388 48.005..."
4,11-WLB-j20-1.13.R,"LINESTRING (16.31405 48.08710, 16.31410 48.087..."


## Stop frequencies

The `stops_freq` function creates a geodataframe with the frequency for each combination of stop, time of day and direction; each row has a Point geometry. It takes the `stop_times` and `stops` dataframes created in the previous steps as arguments.

### cutoffs
With `cutoffs = [6,7,8,9,10,11,12,13,14,15,16,17,18]`, I restricted the data to hourly windows between 6 am and 6 pm.
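To make the effect of the cutoffs concrete, here is a minimal sketch of how a list of hour cutoffs can be turned into time windows with `pandas.cut`. It illustrates the idea only and is not necessarily how `gtfs_functions` does it internally; in particular, treating each window as including its start and excluding its end is an assumption. Times are seconds after midnight, as in `stop_times.arrival_time`.

```python
import pandas as pd

cutoffs = [6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]

# One label per consecutive pair of cutoffs, e.g. "6:00-7:00".
labels = [f"{start}:00-{end}:00" for start, end in zip(cutoffs[:-1], cutoffs[1:])]

# Sample arrival times in seconds after midnight (17700 s is 04:55).
arrival_seconds = pd.Series([17700.0, 21600.0, 30000.0, 64799.0, 64800.0])

# Convert to hours and bin. right=False means [start, end) windows
# (an assumption made for this sketch).
windows = pd.cut(arrival_seconds / 3600, bins=cutoffs, labels=labels, right=False)

# Times outside 6:00-18:00 fall in no window and come back as NaN.
print(windows.tolist())
```

Arrivals before 6:00 or at 18:00 and later get no window, which is why they drop out of the frequency tables.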



In [9]:
cutoffs = [6,7,8,9,10,11,12,13,14,15,16,17,18]
stop_freq = gtfs.stops_freq(stop_times, stops, cutoffs = cutoffs)
stop_freq.head()

Unnamed: 0,stop_id,dir_id,window,ntrips,frequency,max_trips,max_freq,stop_name,geometry
12928,at:49:15:0:2,Inbound,14:00-15:00,1,60,3,20,Albern,POINT (16.48533 48.15988)
3737,at:49:1158:0:6,Inbound,6:00-7:00,1,60,1,60,Linzer Str./Johnstr.,POINT (16.31479 48.19280)
18053,at:49:240:0:1,Inbound,13:00-14:00,1,60,1,60,Dorfmeistergasse,POINT (16.29601 48.16436)
34946,at:49:958:0:4,Inbound,14:00-15:00,1,60,1,60,Aspernstraße/Oberdorfstr.,POINT (16.48149 48.22014)
34945,at:49:958:0:4,Inbound,13:00-14:00,1,60,1,60,Aspernstraße/Oberdorfstr.,POINT (16.48149 48.22014)
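In the output above, the `frequency` and `max_freq` columns read as headways in minutes: 1 trip in a one-hour window gives 60, and 3 trips give 20. A minimal sketch of that relationship follows; `headway_minutes` is a hypothetical helper written for this illustration, not a function of `gtfs_functions`, and the 60-minute window length is assumed from the hourly cutoffs.

```python
def headway_minutes(ntrips, window_minutes=60):
    """Average minutes between trips within one window (hypothetical helper)."""
    if ntrips <= 0:
        raise ValueError("ntrips must be positive")
    return window_minutes / ntrips


# Values matching the table: ntrips=1 -> 60, max_trips=3 -> 20.
print(headway_minutes(1))
print(headway_minutes(3))
```

A lower value therefore means more frequent service at that stop.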


## Line frequencies
The `lines_freq` function creates a geodataframe with the frequency for each combination of line, time of day and direction; each row has a LineString geometry.

In [10]:
cutoffs = [6,7,8,9,10,11,12,13,14,15,16,17,18]
line_freq = gtfs.lines_freq(stop_times, trips, shapes, routes, cutoffs = cutoffs)
line_freq.head()

Unnamed: 0,route_id,route_name,dir_id,window,frequency,ntrips,max_freq,max_trips,geometry
5451,23-76A-j20-1,76A Enkplatz/Grillgasse - Alberner Hafen,Inbound,11:00-12:00,30,2,20,3,"LINESTRING (16.41681 48.17429, 16.41661 48.174..."
5491,23-76A-j20-1,76A Enkplatz/Grillgasse - Alberner Hafen,Outbound,12:00-13:00,30,2,20,3,"LINESTRING (16.42768 48.17922, 16.42716 48.179..."
5455,23-76A-j20-1,76A Enkplatz/Grillgasse - Alberner Hafen,Inbound,12:00-13:00,30,2,20,3,"LINESTRING (16.41681 48.17429, 16.41661 48.174..."
5456,23-76A-j20-1,76A Enkplatz/Grillgasse - Alberner Hafen,Inbound,12:00-13:00,30,2,20,3,"LINESTRING (16.41681 48.17429, 16.41661 48.174..."
5481,23-76A-j20-1,76A Enkplatz/Grillgasse - Alberner Hafen,Inbound,9:00-10:00,30,2,20,3,"LINESTRING (16.41681 48.17429, 16.41661 48.174..."


# Export Data

## Stop Frequencies

In [11]:
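# Keep only the inbound direction
condition_dir = stop_freq.dir_id == 'Inbound'

# reset_index() keeps the old index as an extra "index" column in the export
stop_gdf = stop_freq.loc[(condition_dir),:].reset_index()
# Write the result as GeoJSON (assumes the data/ directory already exists)
stop_gdf.to_file("data/stop_gdf.geojson", driver="GeoJSON")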

## Line frequencies

In [12]:
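# Keep only the inbound direction
condition_dir = line_freq.dir_id == 'Inbound'
# reset_index() keeps the old index as an extra "index" column in the export
line_gdf = line_freq.loc[(condition_dir),:].reset_index()
# Write the result as GeoJSON (assumes the data/ directory already exists)
line_gdf.to_file("data/line_gdf.geojson", driver="GeoJSON")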

## Finally

I visualized the data with [Kepler GL](https://kepler.gl/demo/map?mapUrl=https://dl.dropboxusercontent.com/s/zbh62eqoyj4d054/CityBus%20and%20Postbus%20services.json). The map shows the line frequencies and the stop frequencies.