# Preliminaries: Data, Python environment, and Jupyter notebook

## Introduction

```{admonition} Credits:

This tutorial was written by Henrikki Tenkanen, Christoph Fink & Willem Klumpenhouwer (i.e. the `r5py` developer team).
You can read the [full documentation of `r5py`](https://r5py.readthedocs.io/en/latest/user-guide/user-manual/advanced-use.html#use-a-custom-installation-of-r5), which includes much more information and a detailed user manual, in case you are interested in using the library for research purposes.

```
### Getting started

There are three options for running the code in this tutorial:

1. Copy and paste the code from the website and run it line by line on your own computer with your preferred IDE (JupyterLab, Spyder, PyCharm etc.).
2. Download this Notebook (see below) and run it using JupyterLab, which you should have installed by following the [installation instructions](https://otesama2023.readthedocs.io/en/latest/info/installing-miniconda.html).
3. Run the code using Binder (see below), which is the easiest way but has very limited computational resources (i.e. it can be very slow).


### Download the Notebook

You can download this tutorial Notebook to your own computer by clicking the **Download button** from the Menu on the top-right section of the website. 

- Right-click the option that says `.ipynb` and choose **"Save link as .."**

![Download tutorial Notebook.](img/Download_notebook_button.png)

### Run the codes on your own computer

Before you can run this Notebook or do any programming, you need to launch the JupyterLab programming environment. JupyterLab comes with the environment that you installed earlier (if you have not done this yet, follow the [installation instructions](https://otesama2023.readthedocs.io/en/latest/info/installing-miniconda.html)). To run JupyterLab:

1. Using a terminal/command prompt, navigate to the folder where you downloaded the tutorial Notebook: `$ cd /mydirectory/`
2. Activate the programming environment: `$ conda activate geo`
3. Launch JupyterLab: `$ jupyter lab`

After these steps, the JupyterLab interface should open, and you can start executing cells (see the hints below under "Working with Jupyter Notebooks").

#### Alternatively: Run codes in Binder (with limited resources)

Alternatively (not recommended due to limited computational resources), you can run this Notebook by launching a Binder instance. The button for launching the Python environment is at the top-right of this page and looks like this:

![Launch Binder](img/launch_binder.png)

### Working with Jupyter Notebooks

Jupyter Notebooks are documents that can be used and run inside the JupyterLab programming environment. They contain computer code and rich text elements (such as text, figures, tables, and links).

**A couple of hints**:

- You can **execute a cell** by clicking the cell that you want to run and pressing <kbd>Shift</kbd> + <kbd>Enter</kbd> (or by clicking the "Play" button at the top).
- You can **change the cell type** between `Markdown` (for writing text) and `Code` (for writing/executing code) from the dropdown menu above.

See [**further details and help for using Notebooks and JupyterLab**](https://pythongis.org/part1/chapter-01/nb/04-using-jupyterlab.html).

**Lesson objectives**

This tutorial focuses on how to work with various geospatial data to construct **spatial networks**. We will learn how to construct a routable **directed** graph for NetworkX and find shortest paths along the given street network based on travel times or distance by car using **r5py**. In addition, we will learn about the data needed to calculate GHG emissions for different travel modes when commuting among multiple origin-destination zones simultaneously.

## Data requirements
This tutorial uses data for four tools: (i) NetworkX, (ii) r5py, (iii) Locomizer and (iv) H3.

### Data for creating a routable network
**r5py** is a Python library for routing and calculating travel time matrices on multimodal transport networks (walk, bike, public transport and car).
It provides a simple and friendly interface to R<sup>5</sup> (*Rapid Realistic Routing on Real-world and Reimagined networks*), a [routing engine](https://github.com/conveyal/r5) developed by [Conveyal](https://conveyal.com/). `r5py` is designed to interact with [GeoPandas](https://geopandas.org) GeoDataFrames, and it is inspired by [r5r](https://ipeagit.github.io/r5r), a similar wrapper developed for R. `r5py` exposes some of R5’s functionality via its [Python API](reference.html), in a syntax similar to r5r’s. At the time of this writing, only the computation of travel time matrices has been fully implemented. Over time, `r5py` will be expanded to incorporate other functionalities from R5.

When calculating travel times with `r5py`, you typically need a couple of datasets:

- **A road network dataset from OpenStreetMap** (OSM) in Protocolbuffer Binary Format (`.pbf`):
  - This data is used for finding the fastest routes and calculating travel times for walking, cycling and driving. In addition, it is used for the walking/cycling legs between stops when routing with transit.
  - *Hint*: Sometimes you might need to modify the OSM data beforehand, e.g. by cropping the data or adding special costs for travelling (e.g. to account for slope when cycling/walking). When doing this, you should follow the instructions on the [Conveyal website](https://docs.conveyal.com/prepare-inputs#preparing-the-osm-data). For adding customized costs for pedestrian and cycling analyses, see [this repository](https://github.com/RSGInc/ladot_analysis_dataprep).

- **A transit schedule dataset** in General Transit Feed Specification (GTFS) format, stored as a `.zip` file (optional):
   - This data contains all the information needed for calculating travel times by public transport, such as stops, routes, trips and the schedules of when vehicles pass a specific stop. You can read about the [GTFS standard here](https://developers.google.com/transit/gtfs/reference).
   - *Hint*: `r5py` can also combine multiple GTFS files, as sometimes you might have different GTFS feeds representing e.g. the bus and metro connections.
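
Before building a network, it can help to confirm that a GTFS archive contains the tables a router will look for. The following is a minimal sketch using only the Python standard library; the function name `missing_gtfs_files` and the set of file names are assumptions based on what a conventional fixed-route feed normally contains, so consult the GTFS reference for the authoritative rules.

```python
import io
import zipfile

# Files a conventional fixed-route GTFS feed normally contains (assumption;
# calendar.txt / calendar_dates.txt are conditionally required and are
# left out of this simple check).
EXPECTED_GTFS_FILES = {"agency.txt", "stops.txt", "routes.txt", "trips.txt", "stop_times.txt"}


def missing_gtfs_files(gtfs_zip):
    """Return the sorted list of expected GTFS files absent from a feed.

    `gtfs_zip` can be a path or a file-like object pointing to a .zip file.
    """
    with zipfile.ZipFile(gtfs_zip) as feed:
        present = set(feed.namelist())
    return sorted(EXPECTED_GTFS_FILES - present)


# Build a tiny, deliberately incomplete feed in memory for demonstration.
demo_feed = io.BytesIO()
with zipfile.ZipFile(demo_feed, mode="w") as feed:
    feed.writestr("stops.txt", "stop_id,stop_name,stop_lat,stop_lon\n")
    feed.writestr("routes.txt", "route_id,route_short_name,route_type\n")
demo_feed.seek(0)

missing = missing_gtfs_files(demo_feed)
print(missing)
```

With a real feed, you would pass the path of the downloaded `.zip` file instead of the in-memory demo object.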


### Data for origin and destination locations (Locomizer and H3)

In addition to OSM and GTFS datasets, you need data that represents the origin and destination locations (OD data) for routing. This data is typically stored in one of the geospatial data formats, such as Shapefile, GeoJSON or GeoPackage. As `r5py` is built on top of `geopandas`, it is easy to read OD data from many different data formats.


### Where to get these datasets?

Here are a few places from where you can download the datasets for creating the routable network:

- **OpenStreetMap data in PBF-format**:

  - [pyrosm](https://pyrosm.readthedocs.io/en/latest/basics.html#protobuf-file-what-is-it-and-how-to-get-one) library. Allows downloading data directly from Python (based on GeoFabrik and BBBike).
  - [pydriosm](https://pydriosm.readthedocs.io/en/latest/quick-start.html#download-data) library. Allows downloading data directly from Python (based on GeoFabrik and BBBike).
  - [GeoFabrik](http://download.geofabrik.de/) website. Has data extracts for many pre-defined areas (countries, regions, etc.).
  - [BBBike](https://download.bbbike.org/osm/bbbike/) website. Has data extracts readily available for many cities across the world. Also supports downloading data by [specifying your own area of interest](https://extract.bbbike.org/).
  - [Protomaps](https://protomaps.com/downloads/osm) website. Allows downloading data for a custom extent by specifying your own area of interest.


- **GTFS data**:  
  - [Transitfeeds](https://transitfeeds.com/) website. Easy to navigate and find GTFS data for different countries and cities. Includes current and historical GTFS data. Note: the site will be deprecated in the future.
  - [Mobility Database](https://database.mobilitydata.org) website. Will eventually replace the TransitFeeds website.
  - [Transitland](https://www.transit.land/operators) -website. Find data based on country, operator or feed name. Includes current and historical GTFS data.
    
### Data for GHG emission factors


The [International Transport Forum (ITF)](https://www.itf-oecd.org/) has developed a comprehensive life-cycle analysis [Excel-based tool](data/life-cycle-assessment-calculations-2020.xlsx) for urban transport modes, including new mobility services such as shared vehicles and ridesourcing. The tool includes the calculations and assumptions made for the ITF report "Good to go? Assessing the environmental performance of new mobility in cities".
- **Sources**:
    - Cazzola, P. and Crist, P., 2020. [Good to go? Assessing the environmental performance of new mobility](https://www.itf-oecd.org/good-go-assessing-environmental-performance-new-mobility)
    - [The electricity generation mix of Finland in 2020.](https://www.iea.org/countries/finland)

From these sources, we created a CSV file named ["LCA_gCO2_per_pkm_by_transport_mode.csv"](data/LCA_gCO2_per_pkm_by_transport_mode.csv), which contains GHG emissions per passenger-kilometer (g CO<sub>2</sub>/pkm) by transport mode, derived from the ITF LCA tool mentioned above. The columns represent the different transport modes and the rows represent the GHG emissions. The GHG emissions of each transport mode are divided into four components: vehicle, fuel, infrastructure and operational services. The acronyms in the transport mode names are: BEV = battery electric vehicle; HEV = hybrid electric vehicle; ICE = internal combustion engine; FCEV = fuel cell electric vehicle; PHEV = plug-in hybrid electric vehicle.
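
To illustrate how a table with this layout can be used, the sketch below sums the component rows into one emission factor per mode and multiplies it by a trip distance. The column name `component`, the mode names `mode_a` and `mode_b`, and all numbers are placeholders invented for this example; they are not the ITF values and may not match the header names in the actual CSV file.

```python
import csv
import io

# Placeholder table in the layout described above: one column per transport
# mode, one row per life-cycle component. The numbers are made up for
# illustration and are NOT the ITF values.
demo_csv = """component,mode_a,mode_b
vehicle,10,4
fuel,50,20
infrastructure,5,6
operational_services,0,10
"""


def total_g_per_pkm(csv_text):
    """Sum the component rows into one g CO2/pkm figure per transport mode."""
    totals = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        for mode, value in row.items():
            if mode == "component":
                continue  # skip the label column
            totals[mode] = totals.get(mode, 0.0) + float(value)
    return totals


def trip_emissions_g(g_per_pkm, distance_km, passengers=1):
    """Emissions of a trip in grams of CO2: factor x distance x passengers."""
    return g_per_pkm * distance_km * passengers


totals = total_g_per_pkm(demo_csv)
print(totals)
print(trip_emissions_g(totals["mode_a"], distance_km=8.0))
```

Keeping the four components separate until the final sum makes it easy to inspect which part of the life cycle dominates for a given mode.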

### Sample datasets

In the following tutorial, we use various open datasets:
- The point dataset for Helsinki has been obtained from [Helsinki Region Environmental Services](https://www.hsy.fi/en/environmental-information/open-data/avoin-data---sivut/population-grid-of-helsinki-metropolitan-area/) (HSY) and is licensed under Creative Commons Attribution 4.0.
- The street network for Helsinki is a cropped and filtered extract of OpenStreetMap (© OpenStreetMap contributors, [ODbL license](https://www.openstreetmap.org/copyright)).
- The GTFS transport schedule dataset for Helsinki is a cropped and minimised copy of Helsingin seudun liikenne’s (HSL) open dataset, licensed under [Creative Commons BY 4.0](https://www.hsl.fi/hsl/avoin-data#aineistojen-kayttoehdot).

In [1]:
# this cell is hidden from output
# it’s used to set sys.path to point to the local repo,
# and to define a `DATA_DIRECTORY` pathlib.Path
import pathlib
import sys

NOTEBOOK_DIRECTORY = pathlib.Path().resolve()
DOCS_DIRECTORY = NOTEBOOK_DIRECTORY.parent.parent
DATA_DIRECTORY = DOCS_DIRECTORY / "_static" / "data"
R5PY_DIRECTORY = DOCS_DIRECTORY.parent / "src"
sys.path.insert(0, str(R5PY_DIRECTORY))

## Introduction to H3 Hexagonal Hierarchical Geospatial Indexing System

In this tutorial, we will first learn how to work with Uber's H3 Hexagonal Hierarchical Geospatial Indexing System in Python, using geographical data around the city center of Helsinki, Finland. A brief demonstration covers how to create H3 cells, visualize them, and use them together with Locomizer data.


In [2]:
## Helsinki Central Location
from shapely.geometry import Point 
import osmnx as ox
import geopandas as gpd

# Geocode the address used as the center of the study area.
# Alternative address: "Stockmann - Helsinki, Finland"
address = "Scandic Hub, Helsinki"
lat, lon = ox.geocode(address)

# Create a GeoDataFrame out of the coordinates
origin = gpd.GeoDataFrame({"geometry": [Point(lon, lat)], "name": [address], "id": [0]}, index=[0], crs="epsg:4326")
origin.explore(max_zoom=13, color="red", marker_kwds={"radius": 12})

We select the hexagon cells within six rings of the cell that contains the center of Helsinki (127 cells in total, including the center cell).

In [3]:
import shapely
import h3
import matplotlib.pyplot as plt
# H3 cells are created at a specified resolution between 0 (coarsest) and 15 (finest)
resolution = 9

# Number of hexagonal rings around the central hexagon
ring_size = 6

# Get the H3 hexagons covering central Helsinki
center_h3 = h3.latlng_to_cell(lat, lon, resolution)
hexagons = list(h3.grid_disk(center_h3, ring_size))  # Make sure the result is a list
##hexagons = list(h3.grid_ring(center_h3, ring_size))  ## Check with henrikki! 
##hexagons = ['891126d33d7ffff','891126d148bffff','891126d3213ffff','891126d33c7ffff','891126d33d3ffff','891126d3223ffff','891126d3357ffff','891126d3383ffff','891126d3377ffff','891126d33afffff','891126d314bffff',
##'891126d302fffff','891126d3077ffff','891126d3047ffff','891126d32b3ffff','891126d301bffff']
##hexagons

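*Note: `h3.latlng_to_cell` belongs to the h3 version 4 API. With h3 version 3 installed, the call above fails with `AttributeError: module 'h3' has no attribute 'latlng_to_cell'`; in that case, upgrade h3 or use the version 3 name `h3.geo_to_h3`.*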
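
The number of cells returned by `h3.grid_disk` follows from the ring structure of a hexagonal grid: ring `i` around a cell contains `6 * i` cells. The helper below (the name `cells_in_disk` is introduced here for illustration) is a small sketch of that arithmetic; it assumes that none of H3's pentagon cells fall inside the disk.

```python
def cells_in_disk(k):
    """Number of hexagons within `k` rings of a center cell, center included.

    Ring i (i >= 1) contains 6 * i cells, so the disk holds
    1 + 6 * (1 + 2 + ... + k) = 3 * k * (k + 1) + 1 cells.
    The formula assumes no pentagon cell falls inside the disk.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    return 3 * k * (k + 1) + 1


for k in (0, 1, 2, 6):
    print(k, cells_in_disk(k))
```

For a ring size of 6, this gives 127 cells, which is a quick way to check that the list of hexagons has the expected length.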

In [None]:
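# Create a GeoDataFrame with hexagons and their corresponding geometries.
# h3.cell_to_boundary() returns (lat, lng) pairs, whereas shapely expects (x, y) = (lng, lat).
hexagon_geometries = [shapely.geometry.Polygon([(lng, lat) for lat, lng in h3.cell_to_boundary(hexagon)]) for hexagon in hexagons]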
#hexagon_geometries = [h3.cell_to_boundary(hexagon) for hexagon in hexagons]
##hexagon_geometries[0]

#h3.cells_to_multi_polygon(hexagons[0], geo_json=False)
hexagon_df = gpd.GeoDataFrame({'Hexagon_ID': hexagons, 'geometry': hexagon_geometries},crs="epsg:4326")
hexagon_df.explore()

In [None]:
h3.versions()
l = list(h3.cell_to_boundary(str(hexagons[0])))
lr = [(p[1],p[0]) for p in l]
polygon =shapely.geometry.Polygon(lr) ## list(h3.cell_to_boundary(hexagons[0]))) #str(hexagons[0])
hexagon_df = gpd.GeoDataFrame({'Hexagon_ID': hexagons[0], 'geometry': polygon},index=[0],crs="epsg:4326") ## 
hexagon_df.explore() 

In [None]:
# Create a GeoDataFrame with hexagons and their corresponding geometries
## h3 v4 returns boundary vertices as (lat, lng), so each pair must be reversed
## to the (lng, lat) order that shapely expects. At the time of writing, h3 v4 was a
## beta (unstable) release, and geo_json=True in h3.cell_to_boundary() did not work reliably.
list_h3_coords = [h3.cell_to_boundary(hexagon) for hexagon in hexagons]
## Reverse each (lat, lng) pair in the list
hexagon_geometries_r = [shapely.geometry.Polygon([(p[1], p[0]) for p in h_list]) for h_list in list_h3_coords]
#hexagon_geometries = [shapely.geometry.Polygon() for hexagon in hexagons]
hexagon_geometries_r 
hexagon_df = gpd.GeoDataFrame({'Hexagon_ID': hexagons, 'geometry': hexagon_geometries_r},crs="epsg:4326")
hexagon_df.explore()
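
The coordinate swap is easy to get wrong, so it can be isolated into a small helper and checked on its own. The sketch below uses plain tuples; the helper name `to_lng_lat` and the sample vertices are hypothetical and only stand in for the output of `h3.cell_to_boundary()`.

```python
def to_lng_lat(boundary):
    """Swap (lat, lng) pairs into the (x, y) = (lng, lat) order used by shapely."""
    return [(lng, lat) for lat, lng in boundary]


# Hypothetical (lat, lng) vertices near central Helsinki, for illustration only.
demo_boundary = [(60.170, 24.940), (60.171, 24.942), (60.169, 24.943)]

swapped = to_lng_lat(demo_boundary)
print(swapped[0])

# Around Helsinki the longitude (about 25) is smaller than the latitude (about 60),
# which gives a simple sanity check that x really holds the longitude.
print(all(x < y for x, y in swapped))
```

If the polygons show up in the wrong part of the world on the map, an unswapped coordinate order is the first thing to check.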

In [None]:
#hexagon_df.head()
h3.cell_to_latlng('891126d06d7ffff')##.explore()

orig_y, orig_x = 60.16874416, 24.95721918 
# Destination
dest_y, dest_x =   60.1622494, 24.9082137 
h3.latlng_to_cell(dest_y, dest_x ,9)
#h3.latlng_to_cell(orig_y, orig_x,9)

In [None]:
h3.latlng_to_cell(orig_y, orig_x,9)

In [None]:
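## Create a column of cell centroids as point geometries.
## h3.cell_to_latlng() returns (lat, lng); Point expects (x, y) = (lng, lat), so the pair is reversed.
hexagon_df["H3_centroids"] = hexagon_df.Hexagon_ID.apply(lambda x: Point(h3.cell_to_latlng(str(x))[::-1]))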
hexagon_df.head()
#hexagon_df.explore()

In [None]:
hexagon_df.to_csv("data/Helsinki_Hexagons_with_Centroids_6Rings_9Res.csv",header=True)

In [None]:
from itertools import combinations, permutations
import pandas as pd

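# permutations() yields ordered pairs, so both (A, B) and (B, A) are included;
# combinations() would keep only one direction per pair.
col_combs = list(permutations(hexagon_df.Hexagon_ID, 2))

# Pass a flat list of column names (a nested list would create MultiIndex columns)
hexagon_OD_df = pd.DataFrame(col_combs, columns=['Origin_Hexagon_ID', 'Destination_Hexagon_ID'])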
hexagon_OD_df['Origin_Centroid_Lat'] = hexagon_OD_df.apply(lambda x: h3.cell_to_latlng(str(x.Origin_Hexagon_ID))[0],axis=1)
hexagon_OD_df['Origin_Centroid_Lon'] = hexagon_OD_df.apply(lambda x: h3.cell_to_latlng(str(x.Origin_Hexagon_ID))[1],axis=1)
hexagon_OD_df['Destination_Centroid_Lat'] =  hexagon_OD_df.apply(lambda x: h3.cell_to_latlng(str(x.Destination_Hexagon_ID))[0],axis=1)
hexagon_OD_df['Destination_Centroid_Lon'] = hexagon_OD_df.apply(lambda x: h3.cell_to_latlng(str(x.Destination_Hexagon_ID))[1],axis=1)
hexagon_OD_df.head()
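
The difference between ordered and unordered pairs determines the size of the OD table. The sketch below compares the two on a handful of hypothetical cell identifiers (`cell_a` to `cell_d` stand in for real H3 indexes).

```python
from itertools import combinations, permutations

# Hypothetical cell identifiers standing in for real H3 indexes.
cell_ids = ["cell_a", "cell_b", "cell_c", "cell_d"]

unordered_pairs = list(combinations(cell_ids, 2))  # (a, b) only
ordered_pairs = list(permutations(cell_ids, 2))    # (a, b) and (b, a)

n = len(cell_ids)
print(len(unordered_pairs), n * (n - 1) // 2)
print(len(ordered_pairs), n * (n - 1))
```

An OD table is directional (a trip from A to B is not the same as a trip from B to A), so ordered pairs are the appropriate choice; with `n` cells this yields `n * (n - 1)` rows.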

In [None]:
hexagon_OD_df.to_csv("data/Helsinki_OD_Hexagons_with_LatLon_6Rings_9Res.csv",header=True)
## Origin Hex ID - 891126d3073ffff (Res 9)
## Destination Hex ID - 891126d3377ffff (Res 9)
###id_cen_dict = hexagon_df[['Hexagon_ID','H3_centroids']].set_index('Hexagon_ID').to_dict('index')
##id_cen_dict['891126d3327ffff']['H3_centroids']
###hexagon_OD_df['Origin_Centroid'] = hexagon_OD_df.apply(lambda x:id_cen_dict[x.Origin_Hexagon_ID]['H3_centroids'],axis=1)
##hexagon_OD_df['Destination_Centroid'] = hexagon_OD_df.apply(lambda x:id_cen_dict[x.Destination_Hexagon_ID]['H3_centroids'],axis=1)
###hexagon_OD_df.head()
###hexagon_OD_df['Origin_Centroid_Lat']= hexagon_OD_df.Origin_Centroid.apply(lambda x: x.y)
#, hexagon_OD_df.Origin_Centroid.apply(lambda x: x.x)
###hexagon_OD_df['Destination_Centroid_Lat'], hexagon_OD_df['Destination_Centroid_Lon'] = hexagon_OD_df.Destination_Centroid.apply(lambda x: x.y), hexagon_OD_df.Destination_Centroid.apply(lambda x: x.x)
##hexagon_df.H3_centroids.apply(lambda x: x.y)
##hexagon_OD_df.head()

## Load and prepare the origin and destination data
We will use pairs of these hexagons as origins and destinations to understand the travel patterns of commuters and the greenhouse gas (GHG) emissions of individual users resulting from different modes of transport.

### Locomizer data
Let's start by loading a sample of the Locomizer dataset into a pandas `DataFrame`. Each row is an origin-destination hexagon pair together with the number of users who travelled between the two cells.

In [None]:
import glob
import pandas as pd

LocoFiles = glob.glob(r'data/Locomizer_data/*R9*.csv')
LocoFiles

In [None]:
loco_df = pd.read_csv(LocoFiles[0],index_col=0)
loco_df.head()

The `loco_df` DataFrame contains a few columns, namely `ORIGIN_CODE_R9`, `DESTINATION_CODE_R9`, `DAY`, `NUMBER_OF_USERS` and `EXTRAPOLATED_NUMBER_OF_USERS`. As we can see, there is no `geometry` column describing the boundaries of the hexagons. Geometries for the origin and destination hexagons are required for mapping the cells, so we have to calculate the hexagon boundaries from the H3 codes.

#### Calculating the Polygon boundary of the OD hexagons

We can calculate the Polygon boundaries of the OD hexagon cells in the Locomizer data for future use. We do this by retrieving the boundary vertices of every origin and destination hex code with `h3.cell_to_boundary()` and building a `shapely.geometry.Polygon` from them.

*Note: You can ignore the UserWarning raised by geopandas about the geographic CRS. The geometry column is accurate enough for most purposes.*

In [None]:
loco_df_all = pd.DataFrame()
for filename in LocoFiles:
    loco_df = pd.read_csv(filename,index_col=0)
    loco_df["ORIGIN_boundary"] = loco_df.ORIGIN_CODE_R9.apply(lambda x: shapely.geometry.Polygon((p[1],p[0]) for p in h3.cell_to_boundary(x))) # 
    loco_df["DESTINATION_boundary"] = loco_df.DESTINATION_CODE_R9.apply(lambda x: shapely.geometry.Polygon((p[1],p[0]) for p in h3.cell_to_boundary(x))) 
    interval = filename.split(".")[0].split("_")[-2]
    loco_df["interval"] = [interval]  * len(loco_df) 
    loco_df_all = pd.concat([loco_df_all,loco_df],ignore_index=True)
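
The time interval in the loop above is taken from the second-to-last underscore-separated token of the file name. Splitting the full path on `"."` can give unexpected results if a directory name contains a dot, so the sketch below extracts the file stem with `pathlib` first. The function name and the file names are hypothetical; the real Locomizer naming scheme may differ.

```python
from pathlib import PurePosixPath


def interval_from_filename(filename):
    """Return the second-to-last underscore-separated token of the file stem."""
    stem = PurePosixPath(filename).stem  # file name without directory and extension
    return stem.split("_")[-2]


# Hypothetical file names; the real Locomizer naming scheme may differ.
print(interval_from_filename("data/Locomizer_data/sample_R9_0600-0900_v1.csv"))
print(interval_from_filename("data/run.2023/sample_R9_1500-1800_v1.csv"))
```

The second call shows the case where a dot in a directory name would have truncated the path before the file name was reached.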

In [None]:
loco_df_all.to_csv("data/Locomizer_data/SDey_LocomizerOD_MayJune2023_R9.csv")
loco_df_all
##shapely.geometry.Polygon(h3.cell_to_boundary(hexagon))
##shapely.geometry.Polygon((p[1],p[0]) for p in h3.cell_to_boundary(hexagon))


### Exploring OD hexagons with Locomizer data
To get a better sense of the data, let's create a map that shows the locations of the polygons and visualizes the geographical boundary of each cell. We begin by visualizing the origin hexagons with valid users at H3 resolution 9.



In [None]:
unique_origins = loco_df_all[["ORIGIN_CODE_R9","ORIGIN_boundary"]].drop_duplicates()
loco_Origin_df = gpd.GeoDataFrame({'O_ID': unique_origins.ORIGIN_CODE_R9.to_list(), 'geometry': unique_origins.ORIGIN_boundary.to_list()},crs="epsg:4326")
loco_Origin_df.explore()

In [None]:
unique_destinations = loco_df_all[["DESTINATION_CODE_R9","DESTINATION_boundary"]].drop_duplicates()
loco_Destination_df = gpd.GeoDataFrame({'D_ID': unique_destinations.DESTINATION_CODE_R9.to_list(), 'geometry': unique_destinations.DESTINATION_boundary.to_list()},crs="epsg:4326")
loco_Destination_df.explore()

## Helsinki Regional Transport Authority (HSL) survey data
The Helsinki Regional Transport Authority (Finnish: Helsingin seudun liikenne, HSL; Swedish: Helsingforsregionens trafik, HRT) is the inter-municipal body responsible for maintaining the public transportation network across the nine municipalities of Greater Helsinki, Finland.

HSL oversees the operation of Helsinki's entire public transportation system, which includes local buses, trams, metro trains, ferries, commuter trains, and bikeshare services. According to an HSL survey on the number of trips, residents of the Helsinki region aged seven and above made a total of 4.6 million trips on a typical weekday in the fall of 2023. The detailed report can be found [here](data/lt23-kulkutapajakaumat-hsl-nettisivuille.pdf).

Based on the travel mode choices for weekday trips made by residents of the Helsinki region in autumn 2023, the survey report gives the following mode shares for the HSL-alue yhteensä (i.e. the total HSL area):

In [None]:
Car_share_Hsl = 0.35 ## Henkilöauto
PT_share_Hsl = 0.23 ## Joukkoliikenne
Bike_share_Hsl = 0.08  ## Polkupyörä
Walk_share_Hsl = 0.33  ## Kävely
Other_share_Hsl = 0.01 ## Muu

We will use these shares later in the tutorial to split the travellers between each OD hexagon pair by travel mode in order to calculate CO<sub>2</sub> emissions.
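
As a preview of that step, the following sketch distributes a number of travellers over the travel modes in proportion to the HSL shares listed above. The function name `split_users_by_mode`, the dictionary keys and the example count of 200 users are chosen here for illustration.

```python
# Mode shares reported for the total HSL area (same values as in the cell above).
MODE_SHARES = {"car": 0.35, "public_transport": 0.23, "bike": 0.08, "walk": 0.33, "other": 0.01}


def split_users_by_mode(n_users, shares=MODE_SHARES):
    """Distribute `n_users` travellers over the modes in proportion to `shares`."""
    total = sum(shares.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"mode shares must sum to 1, got {total}")
    return {mode: n_users * share for mode, share in shares.items()}


users_by_mode = split_users_by_mode(200)
print(users_by_mode)
```

The results are fractional on purpose: they are expected values per mode, and rounding them to whole travellers could make the per-mode counts no longer add up to the original total.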





## Bibliography

:::{bibliography}
:filter: docname in docnames
:::