# Geospatial Data Science Tutorial for IC2S2'23
GitHub: https://github.com/NERDSITU/gdstutorial  
Organizers: https://nerds.itu.dk/
# 4. SPATIAL NETWORKS: Spatial Networks with Geopandas

Contents:
* Graph types in graph theory and in NetworkX
* Planar and non-planar graphs
* Converting osmnx graphs to geodataframes
* Distance on graphs vs. distance on street networks
* GTFS: Public transit data 

The figures and data used in this notebook were generated with the `SpatialNetworks_behindthescences.ipynb` notebook, which you can consult for further reference.

In [None]:
# import libraries
import os
os.environ['USE_PYGEOS'] = '0'
import geopandas
import pandas
import networkx
import osmnx
import matplotlib.pyplot as plt
import shapely
import contextily


<p style="text-align:center;">
    <img src="images/world-cities.png" alt="Bucaramanga, Bunia, Gasteiz" width=1000px>
</p>

In [None]:
# Load the 2 example graphs (in the `data` folder) with the load_graphml function from OSMnx:

# Bucaramanga

# Bunia

<p style="text-align:center;">
    <img src="images/graph-types.png" alt="Graph types" width=1000px>
</p>

### Q1: What graph type does OSMnx use by default for street networks?
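
A minimal sketch of how a NetworkX graph object can be inspected. The graph `G_toy` is a made-up example, not one of the tutorial graphs; the same three checks can be applied to any graph loaded with OSMnx.

```python
import networkx

# Hypothetical toy graph (not part of the tutorial data).
# A MultiDiGraph allows directed edges and parallel edges between the same nodes.
G_toy = networkx.MultiDiGraph()
G_toy.add_edge("a", "b", length=10)
G_toy.add_edge("a", "b", length=12)  # parallel edge between the same pair of nodes
G_toy.add_edge("b", "a", length=10)  # same pair of nodes, opposite direction

graph_class = type(G_toy).__name__
print(graph_class)            # class name of the graph object
print(G_toy.is_directed())    # does edge direction matter?
print(G_toy.is_multigraph())  # are parallel edges allowed?
```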

In [None]:
# Check the type of the two graph objects that we just loaded:

# Planar graphs

<p style="text-align:center;">
    <img src="images/three-utilities.png" alt="Three Utilities Problem" width=1000px>
</p>

<p style="text-align:center;">
    <img src="images/plan-exp.png" alt="Planarity Explained" width=1000px>
</p>

### Q2: Are the street networks of Bucaramanga and Bunia planar?
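
As a warm-up, here is a small sketch of a planarity check on two textbook graphs rather than on the street networks. `K4` (complete graph on 4 nodes) can be drawn without edge crossings, while `K33` (the "three utilities" graph shown above) cannot. The variable names are chosen for illustration only.

```python
import networkx

# Two classic examples, used here instead of the tutorial graphs
K4 = networkx.complete_graph(4)                  # planar
K33 = networkx.complete_bipartite_graph(3, 3)    # the three utilities graph

# check_planarity returns a tuple: (is_planar, certificate).
# With counterexample=False no counterexample is computed for non-planar graphs.
k4_is_planar, _ = networkx.check_planarity(K4, counterexample=False)
k33_is_planar, _ = networkx.check_planarity(K33, counterexample=False)

print(k4_is_planar, k33_is_planar)
```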

In [37]:
# Hint: networkx has a function that checks for planarity
# (don't try to get a counterexample, it takes too long for our purposes)

## Now, let's explore the geometries of our graphs.

### Q3: How to convert a graph into a geodataframe?

Hint 1: If you downloaded the graph with OSMnx (as in our case), use the `OSMnx` function `graph_to_gdfs`

Hint 2: More generally, if geometries are stored as edge attributes, you can use the `networkx` function `get_edge_attributes`
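
The more general route from Hint 2 could look roughly like the sketch below. It uses a made-up graph `G_toy` whose edges carry a `geometry` attribute; the coordinates and the CRS `EPSG:4326` are assumptions for illustration. For graphs created by OSMnx, `osmnx.graph_to_gdfs` from Hint 1 is the more convenient choice.

```python
import geopandas
import networkx
import shapely.geometry

# Hypothetical toy graph with LineString geometries stored as edge attributes
G_toy = networkx.MultiDiGraph()
G_toy.add_edge(1, 2, geometry=shapely.geometry.LineString([(0, 0), (1, 0)]))
G_toy.add_edge(2, 3, geometry=shapely.geometry.LineString([(1, 0), (1, 1)]))

# For a multigraph, the result is a dict keyed by (u, v, key)
edge_geoms = networkx.get_edge_attributes(G_toy, "geometry")

# Build the GeoDataFrame from the dict; the CRS is assumed, not read from the graph
edges_toy = geopandas.GeoDataFrame(
    {
        "u": [u for u, v, key in edge_geoms],
        "v": [v for u, v, key in edge_geoms],
        "key": [key for u, v, key in edge_geoms],
        "geometry": list(edge_geoms.values()),
    },
    crs="EPSG:4326",
)
print(edges_toy)
```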

In [None]:
# For G_bun and G_buc, create geodataframes of nodes and edges: 
# nodes_buc, edges_buc; nodes_bun, edges_bun 

### Q4: How to quickly visualize your geodataframe in an interactive map?

Task 1: Visualize **all** edges of the **Bunia street network**

Task 2: Visualize **only** the `"highway==cycleway"` edges of the **Bucaramanga street network**

Hint: You can use the geopandas `.explore()` method on the geodataframe 
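
A sketch of the filtering step for Task 2, using a made-up edges table `edges_toy` instead of the Bucaramanga data. Note that in edge tables derived from OSMnx graphs the `highway` value of some edges may be a list rather than a single string; the simple comparison below does not match those rows.

```python
import geopandas
import shapely.geometry

# Hypothetical edges table with a "highway" column (toy coordinates)
edges_toy = geopandas.GeoDataFrame(
    {
        "highway": ["residential", "cycleway", "cycleway"],
        "geometry": [
            shapely.geometry.LineString([(0, 0), (1, 0)]),
            shapely.geometry.LineString([(1, 0), (1, 1)]),
            shapely.geometry.LineString([(1, 1), (2, 1)]),
        ],
    },
    crs="EPSG:4326",
)

# Keep only the rows whose highway tag is exactly "cycleway"
cycleways_toy = edges_toy[edges_toy["highway"] == "cycleway"]

# In a notebook, the interactive map is then one call away
# (left commented out here because it needs the optional folium dependency):
# cycleways_toy.explore()
```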

In [None]:
# explore the Bunia street network

In [None]:
# explore the Bucaramanga cycleway network

## Now, let's look at distances between nodes on spatial graphs

We will use the street network graph of Bunia as an example

In [None]:
# First, check what the nodes_bun geodataframe looks like:

In [40]:
# let us plot the *first two nodes* and *all edges* of the Bunia network

### Q5: What is the distance between the two source (red) and target (blue) nodes...
- ... on the network?
- ... as the (proverbial) crow flies?

Hint 1: `networkx` has a function to compute the shortest path length

Hint 2: geopandas has a `distance` method to compute distances between 2 geometries
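
Both hints can be tried out on a toy example first. The sketch below assumes a made-up triangle of nodes with planar coordinates in metres and a `length` edge attribute (the attribute name OSMnx uses for edge lengths); none of the names refer to the Bunia data.

```python
import geopandas
import networkx
import shapely.geometry

# Hypothetical nodes with planar coordinates in metres (no CRS set)
nodes_toy = geopandas.GeoDataFrame(
    {
        "geometry": [
            shapely.geometry.Point(0, 0),
            shapely.geometry.Point(3, 0),
            shapely.geometry.Point(3, 4),
        ]
    },
    index=["A", "B", "C"],
)

# Hypothetical street graph: A-B is 3 m long, B-C is 4 m long
G_toy = networkx.Graph()
G_toy.add_edge("A", "B", length=3.0)
G_toy.add_edge("B", "C", length=4.0)

# Distance on the network: sum of edge lengths along the shortest path
distance_network_toy = networkx.shortest_path_length(
    G_toy, source="A", target="C", weight="length"
)

# Distance as the crow flies: straight line between the two point geometries
distance_crow_toy = (
    nodes_toy.geometry.loc[["A"]].distance(nodes_toy.geometry.loc["C"]).iloc[0]
)

print(distance_network_toy, distance_crow_toy)
```

Without `weight="length"`, `shortest_path_length` counts the number of edges on the path instead of summing their lengths.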

In [38]:
# compute `distance_network` - the distance between the 2 nodes on the street network 

In [39]:
# compute `distance_crow` - the distance between the 2 nodes as the crow flies

### Q6: What is wrong here?
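
One thing worth checking is the unit of the coordinates. The sketch below uses two made-up points in `EPSG:4326` (roughly in the area of Bunia; the exact coordinates are an assumption) and compares their distance before and after projection. `estimate_utm_crs()` is one way to pick a projected CRS automatically; for a whole OSMnx graph, `osmnx.project_graph` serves the same purpose.

```python
import geopandas

# Two hypothetical points given as longitude/latitude in degrees
points_toy = geopandas.GeoSeries.from_xy(
    [30.25, 30.26], [1.56, 1.57], crs="EPSG:4326"
)

# Distance on unprojected coordinates: the result is in degrees
distance_degrees = points_toy.iloc[0].distance(points_toy.iloc[1])

# Project to a suitable UTM zone; the result is now in metres
utm_crs = points_toy.estimate_utm_crs()
points_toy_proj = points_toy.to_crs(utm_crs)
distance_metres = points_toy_proj.iloc[0].distance(points_toy_proj.iloc[1])

print(distance_degrees, distance_metres)
```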

In [None]:
# project the graph (with a manually or automatically defined crs)

In [None]:
# get the "nodes" data frame once more

In [41]:
# compute `distance_crow` for the manually projected graph

### Q7: What is the "detour factor" for the path between our 2 nodes? 

(i.e. how much longer is the network path compared to the straight line?)

`detour_factor = network_path / crow_path`

e.g. a detour factor of 2 means that network_path is twice as long as crow_path
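
The formula can be wrapped in a small helper, sketched below. The function name `detour_factor` and the input values are made up for illustration; both distances must be in the same unit.

```python
def detour_factor(network_path: float, crow_path: float) -> float:
    """Ratio of the network distance to the straight-line distance."""
    if crow_path <= 0:
        raise ValueError("crow_path must be positive")
    return network_path / crow_path


# Hypothetical values: 7 m along the network, 5 m in a straight line
detour_example = detour_factor(network_path=7.0, crow_path=5.0)
print(detour_example)
```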

In [42]:
# compute the detour factor

# Public transit networks

### GTFS (General Transit Feed Specification)
[https://gtfs.org](https://gtfs.org)

```
Data > 3rd party GTFS URL directories > The Mobility Database > Download the catalogs CSV
```


<p style="text-align:center;">
    <img src="images/gtfs-sourcelist.png" alt="List of GTFS sources" width=1000px>
</p>

### We will use Vitoria-Gasteiz (Euskadi) as an example 

Link to download GTFS feed:

http://www.vitoria-gasteiz.org/we001/http/vgTransit/google_transit.zip
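
A GTFS feed is a zip archive of plain text (CSV) files. The sketch below shows how a table could be read straight from such an archive with the standard library and pandas. To stay self-contained, it builds a tiny in-memory archive with made-up content instead of using the downloaded feed.

```python
import io
import zipfile

import pandas

# Build a tiny in-memory zip archive that mimics a GTFS feed (made-up content)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, mode="w") as archive:
    archive.writestr(
        "routes.txt",
        "route_id,route_long_name\nL1,Toy line one\nL2,Toy line two\n",
    )
buffer.seek(0)

# Read one table directly from the archive, without unpacking it to disk
with zipfile.ZipFile(buffer) as archive:
    with archive.open("routes.txt") as handle:
        routes_toy = pandas.read_csv(handle)

print(routes_toy)
```

For a feed saved to disk, the path of the zip file would take the place of `buffer`.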

In [None]:
# Read in 3 text files: routes, trips, and shapes
routes = pandas.read_csv('./data/GTFS_Data/routes.txt')
trips = pandas.read_csv('./data/GTFS_Data/trips.txt')
shapes = pandas.read_csv('./data/GTFS_Data/shapes.txt')

# the original crs is epsg:4326
crs_orig = "EPSG:4326"

# as projection we will use the crs epsg:2062
crs_proj = "EPSG:2062"

### Rhetorical Q: Can we make a plot of all public transit lines in Vitoria-Gasteiz?

A: Yes we can! It's just a bit messy

Information is spread out over 3 files:

* `routes` contains route IDs and names (routes = public transit lines)
* `trips` contains all trips for each route, and their "shape id"
* `shapes` contains the geometries corresponding to each shape id 

In [None]:
# routes: each row is a *route* (e.g. a certain busline)
routes = routes[["route_id", "route_long_name"]]
routes.head(10)

In [None]:
# trips: each row is a trip (e.g. the 10 o'clock bus nr. 6 in a given direction); 
# so each route has several trips (with different schedules)
trips = trips[["route_id", "trip_id", "shape_id", "trip_headsign"]]
trips.head(10)

In [None]:
# shapes: each row is a point of a linestring
shapes.head(10)

### Q8: How can we convert the "shapes" dataframe into a geodataframe?

Hint 1: you can use the `geopandas.GeoSeries.from_xy()` method, where the input consists of two lists of point coordinates (longitude, latitude)

Hint 2: you need to define a CRS when creating a geodataframe (for this data set, it is `EPSG:4326`)


In [None]:
### Converting the "shapes" file into a geodataframe with a geometry column

# Get the geometry column (aggregating the longitude and latitude information)
shapes["geometry"] = geopandas.GeoSeries.from_xy(
    x = shapes["shape_pt_lon"], y = shapes["shape_pt_lat"])
# Remove columns that are no longer needed
shapes.drop(columns = ["shape_pt_lat", "shape_pt_lon"], inplace = True)
# Make a GeoDataFrame (pass a geometry column and a crs!)
shapes = geopandas.GeoDataFrame(shapes, geometry = "geometry", crs = crs_orig)
# Project into a projected CRS (here: for nicer plotting)
shapes = shapes.to_crs(crs_proj)
# check the result:
shapes.head()

In [None]:
# We're not done yet! We still need to aggregate points into trajectories:
# Each point is part of a linestring describing the trajectory of a trip, NOT a stop
shapes.plot(column = "shape_id")

In [None]:
# For each unique shape_id, combine all points (in correct sequence order) into one linestring:

# initialize list for shape ids and list for shape geometries (linestrings)
shape_ids = []
shape_geoms = []

# loop through unique shape ids
for shape_id_current, shape_df_current in shapes.groupby("shape_id"):
    
    # sort the subset of the dataframe for this unique shape id by shape_pt_sequence
    shape_df_current = shape_df_current.sort_values(by = "shape_pt_sequence", ascending=True)
    
    # get a list of shape points (the geometry column, ordered by shape_pt_sequence)
    list_of_shape_points = list(shape_df_current.geometry)

### !!! HERE THE MAGIC HAPPENS:
    
    # make a linestring from the list of points
    shape_geom_current = shapely.geometry.LineString(list_of_shape_points)

### MAGIC IS OVER
    
    # append the current shape id and the current line string to our list of results
    shape_ids.append(shape_id_current)
    shape_geoms.append(shape_geom_current)

In [None]:
# make a geodataframe from the lists of results
shapes_gdf = geopandas.GeoDataFrame(
    {
        "geometry": shape_geoms,
        "shape_id": shape_ids
    },
    crs = shapes.crs
    )

shapes_gdf.head()
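
The explicit loop is easy to follow step by step. As an alternative, the same aggregation could be written more compactly with `groupby` and `apply`, as in the sketch below. It runs on a made-up points table `toy_points` with the GTFS column names used above, so it can be tried independently of the Gasteiz data.

```python
import geopandas
import pandas
import shapely.geometry

# Hypothetical shape points, deliberately listed out of sequence order
toy = pandas.DataFrame(
    {
        "shape_id": ["a", "a", "a", "b", "b"],
        "shape_pt_sequence": [2, 1, 3, 1, 2],
        "shape_pt_lon": [1.0, 0.0, 2.0, 5.0, 5.0],
        "shape_pt_lat": [0.0, 0.0, 1.0, 5.0, 6.0],
    }
)
toy_points = geopandas.GeoDataFrame(
    toy,
    geometry=geopandas.GeoSeries.from_xy(toy["shape_pt_lon"], toy["shape_pt_lat"]),
    crs="EPSG:4326",
)

# Sort the points, then build one LineString per shape_id
toy_lines = (
    toy_points.sort_values(["shape_id", "shape_pt_sequence"])
    .groupby("shape_id")["geometry"]
    .apply(lambda points: shapely.geometry.LineString(list(points)))
)

# Collect the results in a GeoDataFrame, reusing the CRS of the points
toy_lines_gdf = geopandas.GeoDataFrame(
    {"shape_id": list(toy_lines.index), "geometry": list(toy_lines)},
    crs=toy_points.crs,
)
print(toy_lines_gdf)
```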

In [None]:
# But we're still not done - each route has several shape_ids... 
# make a list of shape_ids for each route:
route_shapes = trips.groupby("route_id").aggregate({"shape_id": list})
route_shapes["route_id"] = route_shapes.index
route_shapes.reset_index(drop=True, inplace=True)
route_shapes.head()


In [None]:
# for each route, merge the geometries of all shape_ids together with the .unary_union attribute
route_shapes["geometry"] = route_shapes.apply(
    lambda x: 
        shapes_gdf[shapes_gdf.shape_id.isin(x.shape_id)]["geometry"].unary_union,
    axis = 1
    )
route_shapes.head()
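
To see what the union does, consider the toy sketch below. Many trips of a route share the same shape, so the same linestring appears several times; the union keeps that linework only once. The example calls `shapely.ops.unary_union` on made-up lines instead of the GeoSeries attribute used above. Recent GeoPandas releases also document a `union_all()` method as the replacement for `unary_union`, so the call may need adjusting depending on the installed version.

```python
import shapely.geometry
import shapely.ops

# Hypothetical shapes of one route: the first line appears twice
line_a = shapely.geometry.LineString([(0, 0), (2, 0)])
line_a_duplicate = shapely.geometry.LineString([(0, 0), (2, 0)])
line_b = shapely.geometry.LineString([(2, 0), (2, 1)])
lines = [line_a, line_a_duplicate, line_b]

# Naive total: the duplicated line is counted twice
summed_length = sum(line.length for line in lines)

# Union: duplicated linework is dissolved into a single geometry
merged = shapely.ops.unary_union(lines)

print(summed_length, merged.length)
```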

In [None]:
# merge the route_shapes information to our routes table
gdf = pandas.merge(routes, route_shapes, how = "left", on = "route_id")
# convert into a geodataframe
gdf = geopandas.GeoDataFrame(gdf, geometry = "geometry", crs = shapes_gdf.crs)
# drop empty rows
gdf = gdf.dropna().reset_index(drop=True)
gdf.head()


In [None]:
# Visualize on an interactive map, where each route_id has a separate color:
gdf[["route_id", "route_long_name", "geometry"]].explore(column="route_id")

### Q9: What about the *stops*? Can we make a plot of all the public transit stops?

* task 1: read in the stops file
* task 2: convert it into a geodataframe (needs a geometry column and a CRS; same as we did with the "shapes" file)
* task 3: visualize it on a map in a projected CRS; only the stop_name should appear when hovering

<p style="text-align:center;">
    <img src="images/gasteiz-stops.png" alt="Public transit stops in Gasteiz" width=500px>
</p>
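
The three tasks follow the same pattern as the "shapes" conversion. The sketch below runs on a made-up stops table (`stops_toy`; stop names and coordinates are invented) that uses the standard GTFS column names `stop_id`, `stop_name`, `stop_lat` and `stop_lon`, and it reuses the two CRS codes defined earlier in this notebook.

```python
import geopandas
import pandas

# Hypothetical stand-in for the stops file (invented names and coordinates)
stops_toy = pandas.DataFrame(
    {
        "stop_id": [1, 2],
        "stop_name": ["Toy stop A", "Toy stop B"],
        "stop_lat": [42.846, 42.850],
        "stop_lon": [-2.672, -2.668],
    }
)

# Build point geometries from longitude (x) and latitude (y), and set the original CRS
stops_toy_gdf = geopandas.GeoDataFrame(
    stops_toy[["stop_id", "stop_name"]],
    geometry=geopandas.GeoSeries.from_xy(stops_toy["stop_lon"], stops_toy["stop_lat"]),
    crs="EPSG:4326",
)

# Project to the CRS used for the transit lines in this notebook
stops_toy_proj = stops_toy_gdf.to_crs("EPSG:2062")

# In a notebook, show only the stop name on hover
# (left commented out here because it needs the optional folium dependency):
# stops_toy_proj.explore(tooltip="stop_name")
```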

In [43]:
# Task 1: Read in file

In [44]:
# Task 2: Convert it into a geodataframe

In [45]:
# Task 3: Project and visualize, with only the "stop_name" shown on hover