# Stockholm street types

This project is about creating a map that shows different types of streets: ways, roads, streets, etc. I was curious to see whether street types could be a proxy for the type of urban fabric in an area. Dense city centers are dominated by streets ("gata" in Swedish), car-dominated suburban sprawl is mostly filled with "ways" ("väg" in Swedish), and remote residential areas mainly have other types of streets like alleys and paths.

Since the analysis is by no means sophisticated, my main goal was to learn how to extract and handle OpenStreetMap (OSM) data using only Python. Why? It is of course possible to acquire OSM data using Overpass Turbo, but that is a manual task. If I needed to export data for a different city, or change the OSM query in some other way, I would not want to go through all the manual steps again. It would be much easier if everything was managed through code.

My initial research into Python libraries that enable OSM data export showed that one of the most common options is the OSMnx package. I followed this [blog post](https://levelup.gitconnected.com/working-with-openstreetmap-in-python-c49396d98ad4), which motivated me to finally sit down and understand how conda and virtual environments work. Let's go through the whole process step-by-step.

## 1. Setup

### 1.1. Installing Anaconda

**Why**

When I first started learning Python I was reluctant to install Anaconda. It took up a lot of space on my machine and I was not sure what the benefits were. But installing geo packages like geopandas or osmnx through pip proved to be very tricky. I kept getting all sorts of errors. After watching [this video](https://www.youtube.com/watch?v=0Hhqf8L-b_0) I realized that Anaconda could take away the headache of installing packages and managing all their dependencies.

**How**

I followed this [tutorial](https://www.youtube.com/watch?v=0Hhqf8L-b_0).

### 1.2. Setting up a virtual environment

**Why**

Coming from the R world, where all packages are installed in the same place and are accessible from any project, Python required a shift in perspective. This project helped me realise the benefits of virtual environments, where I can install the packages I need and a) they will not affect any other program, b) my project will not break even if I use a newer version of Python or of any of the packages in later projects.

**How**
- Open Terminal. By default, Terminal opens with the conda base environment activated.
- To create a new environment, I used this command: `conda create --name geo_env python=3.9`
- Activate the new environment geo_env: `conda activate geo_env`
- Install geopandas. A simple `conda install geopandas` did not work, so I had to add the conda-forge channel, where geopandas is available with all its dependencies. These three commands come from the [geopandas documentation](https://geopandas.org/en/stable/getting_started/install.html):
    - `conda config --env --add channels conda-forge`
    - `conda config --env --set channel_priority strict`
    - `conda install python=3 geopandas`
- Install osmnx `conda install osmnx`
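
After the install steps above, a quick way to confirm that the environment really contains the packages is to ask Python whether it can find them. This is a minimal sketch using only the standard library; the helper name `missing_packages` is made up for illustration.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the names from `names` that cannot be imported in this environment."""
    return [name for name in names if find_spec(name) is None]


# Packages this project relies on; an empty list means both are installed
print(missing_packages(["geopandas", "osmnx"]))
```

Run it with geo_env activated. If either name is printed, the corresponding `conda install` step did not succeed in this environment.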

### 1.3. Using virtual environment in Jupyter

**Why**

I wanted to use Jupyter notebook for this project to better document my workflow and to be able to see the output of every code block, which is more difficult in code editors like VSCode or PyCharm. 

**How**

To make my geo_env available in Jupyter, I followed this [blog post](https://janakiev.com/blog/jupyter-virtual-envs/). As it suggests:
- Activate the virtual environment.
- "Next, install ipykernel which provides the IPython kernel for Jupyter": `conda install ipykernel`
- Next, add the virtual environment to Jupyter by typing: `python -m ipykernel install --user --name=geo_env`
- Launch JupyterLab from Anaconda Navigator. geo_env will be available there as a kernel.
- Now I could simply import the packages I needed.
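
To check that a notebook is actually running on the geo_env kernel and not on base, one option is to look at the interpreter path from inside a cell. The sketch below assumes a conda-style layout where the environment name is the last folder of `sys.prefix`; `active_env_name` is a hypothetical helper.

```python
import sys
from pathlib import Path


def active_env_name(prefix=sys.prefix):
    """Return the last path component of the interpreter prefix, e.g. 'geo_env'."""
    return Path(prefix).name


print(active_env_name())  # name of the environment running this notebook
print(sys.executable)     # full path to the Python interpreter in use
```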


## 2. Getting OSM data

To export data from OpenStreetMap I followed this [blog post by Juan Nathaniel](https://levelup.gitconnected.com/working-with-openstreetmap-in-python-c49396d98ad4), but changed the place variable to Stockholm and added multiple highway tags. 
 

In [68]:
import osmnx

In [49]:
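place = "Stockholms län, Sweden"
# Highway values to download; each one is a road class in OpenStreetMap
tags = {'highway': ['residential','primary','secondary','tertiary', 'motorway', 'trunk', 'unclassified']}
# Note: in newer OSMnx releases this function is named features_from_place
roads = osmnx.geometries_from_place(place, tags=tags)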



As a result I got a GeoDataFrame with 57k rows.
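
Since the point of doing this in code is to rerun the export for another city or another set of tags, the query can be wrapped in a small function. This is only a sketch: `highway_tags` and `fetch_roads` are names I made up, and the `getattr` fallback assumes that newer OSMnx releases expose `features_from_place` while older ones expose `geometries_from_place`.

```python
def highway_tags(highway_types):
    """Build the OSM tag filter for the given highway values."""
    return {"highway": list(highway_types)}


def fetch_roads(place, highway_types):
    """Download matching features for `place` as a GeoDataFrame (needs network access)."""
    import osmnx  # imported lazily so highway_tags works without osmnx installed

    # Use whichever of the two function names this OSMnx version provides
    fetch = getattr(osmnx, "features_from_place", None) or osmnx.geometries_from_place
    return fetch(place, tags=highway_tags(highway_types))


tags = highway_tags(["residential", "primary", "secondary"])
print(tags)
# roads = fetch_roads("Stockholms län, Sweden", ["residential", "primary"])
```

The last line is commented out because it downloads data. Changing the place string or the list of highway values is then the only thing needed to export a different area.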

In [8]:
type(roads)
roads.shape

geopandas.geodataframe.GeoDataFrame

In [51]:
# Select only relevant columns and view the first rows in the data frame
roads_small = roads[['name','geometry']]
roads_small.head()

| element_type | osmid | name | geometry |
|---|---|---|---|
| way | 8214875 | Skytten Hälls väg | LINESTRING (17.95571 58.90711, 17.95565 58.907... |
| way | 8214892 | Nynäsvägen | LINESTRING (17.95472 58.90876, 17.95461 58.908... |
| way | 8214903 | NaN | LINESTRING (17.95468 58.90696, 17.95548 58.907... |
| way | 8214979 | Alkärrsgatan | LINESTRING (17.94285 58.90429, 17.94290 58.904... |
| way | 23322590 | NaN | LINESTRING (17.95400 58.90927, 17.95406 58.909... |


## 3. Classifying street names 

I would like to group streets into categories based on their type. For example, the ones that end with "vägen" or "väg" will be grouped into one category, the ones that end with "gatan" or "gata" into another category. For that I would need to check the end of the string.

- I started with the Series string method `str.endswith()`: `roads_small['name'].str.endswith("vägen")`
- I first tried to use it in an if statement:

`if roads_small['name'].str.endswith("vägen"):`

`    roads_small['category'] = 'vägen'`

- This raises "ValueError: The truth value of a Series is ambiguous", because `roads_small['name'].str.endswith("vägen")` is a Series containing both True and False values rather than a single boolean.
- The solution for this error is described [here](https://www.learndatasci.com/solutions/python-valueerror-truth-value-series-ambiguous-use-empty-bool-item-any-or-all/). The boolean Series should be used to subset the data frame instead: `roads_small[roads_small['name'].str.endswith("vägen")]`

- That in turn raises "ValueError: Cannot mask with non-boolean array containing NA / NaN values", because unnamed streets have NaN in the name column. Passing `na=False` makes `endswith()` treat them as non-matches. [Found a solution here](https://stackoverflow.com/questions/28311655/ignoring-nans-with-str-contains).

In [29]:
roads_small[roads_small['name'].str.endswith("vägen", na=False)]

| element_type | osmid | name | geometry |
|---|---|---|---|
| way | 1240 | Klensmedsvägen | LINESTRING (17.99032 59.29686, 17.99052 59.296... |
| way | 1241 | Hyvelvägen | LINESTRING (17.99266 59.29664, 17.99222 59.296... |
| way | 1242 | Spikvägen | LINESTRING (17.99351 59.29639, 17.99298 59.295... |
| way | 1243 | Bultvägen | LINESTRING (17.99439 59.29611, 17.99429 59.295... |
| way | 1246 | Borrvägen | LINESTRING (17.99785 59.29484, 17.99751 59.294... |
| way | ... | ... | ... |
| way | 1040772612 | Skogängsvägen | LINESTRING (17.90440 59.38263, 17.90384 59.38217) |
| way | 1040772618 | Skogängsvägen | LINESTRING (17.90431 59.38196, 17.90449 59.38193) |
| way | 1044696341 | Vårbergsvägen | LINESTRING (17.91125 59.27396, 17.90945 59.273... |
| way | 1046497897 | Rissnavägen | LINESTRING (17.91630 59.37930, 17.91635 59.379... |
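
The two errors and the `na=False` fix can be reproduced on a tiny Series without downloading anything. The data below is a toy stand-in for `roads_small['name']`, with one unnamed street; it is a sketch of the behaviour, not the original notebook code.

```python
import pandas as pd

# Toy stand-in for roads_small['name'], including an unnamed street
names = pd.Series(["Nynäsvägen", None, "Alkärrsgatan", "Skytten Hälls väg"])

# A Series of booleans has no single truth value, so `if series:` fails
try:
    bool(pd.Series([True, False]))
    ambiguous = False
except ValueError:
    ambiguous = True

# na=False treats missing names as "does not match"
mask = names.str.endswith("vägen", na=False)
matches = names[mask].tolist()
print(ambiguous, matches)
```

Note that "Skytten Hälls väg" is not matched here, since it ends with "väg" and not "vägen". That is why both the definite and the indefinite form are needed in the full list of street types.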


To create a conditional column I used NumPy's `select()` function, described in [this blog post](https://datagy.io/pandas-conditional-column/). `select()` takes a list of conditions and a list of corresponding values. First, a small test:

In [30]:
import numpy as np

In [None]:
conditions = [roads_small['name'].str.endswith("vägen", na=False), 
            roads_small['name'].str.endswith("gatan", na=False)]

values = ["vägen", "gatan"]

roads_small['category'] = np.select(conditions, values)
roads_small.head()

A list of street types in both definite and indefinite forms. It will be used to create the list of conditions.

In [52]:
types = ['slingan','farten','fart','gången','gång','backen','backe','stigen','stig','höjden',
          'höjd','spåret','spår','terrassen','terrass','hamnen','hamn','gatan','gata','leden',
          'led','gränden','gränd','vägen','väg','länken','länk','stranden','strand','bron','bro',
          'kajen','kaj','allén','allé','tunneln','tunnel','plan','torget','torg','platsen','plats']

The matching list of category names, in the indefinite form only. Most values repeat because the two lists must have the same length and the same order.

In [None]:
values = ['slinga','fart','fart','gång','gång','backe','backe','stig','stig','höjd',
          'höjd','spår','spår','terrass','terrass','hamn','hamn','gata','gata','led',
          'led','gränd','gränd','väg','väg','länk','länk','strand','strand','bro','bro',
          'kaj','kaj','allé','allé','tunnel','tunnel','plan','torg','torg','plats','plats']

Using a list comprehension, I create the list of conditions:

In [74]:
conditions = [roads_small['name'].str.endswith(i, na=False) for i in types]
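
Keeping two long parallel lists in sync by hand is easy to get wrong. One alternative is to store the suffixes per category in a single dictionary and derive both lists from it. This is a sketch with only three categories; `SUFFIXES`, `categorize` and the `"other"` default are my own names and not part of the original notebook.

```python
# Hypothetical mapping: category (indefinite form) -> suffixes that belong to it
SUFFIXES = {
    "väg": ["vägen", "väg"],
    "gata": ["gatan", "gata"],
    "gränd": ["gränden", "gränd"],
}

# Flatten the mapping into two aligned lists, one entry per suffix
types = [suffix for suffixes in SUFFIXES.values() for suffix in suffixes]
values = [category for category, suffixes in SUFFIXES.items() for _ in suffixes]


def categorize(name, default="other"):
    """Return the category of the first suffix that `name` ends with."""
    if not isinstance(name, str):  # unnamed streets (NaN/None)
        return default
    for suffix, category in zip(types, values):
        if name.endswith(suffix):
            return category
    return default


print(types)
print(values)
print(categorize("Skytten Hälls väg"))
```

The derived `types` and `values` can be passed to the list comprehension and `np.select()` in the same way as the hand-written lists.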

Finally, I can use NumPy's `select()` function to create the category column. Names that match none of the conditions get the default value `0`.

In [54]:
roads_small['category'] = np.select(conditions, values)
roads_small.head()

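This cell emits pandas' `SettingWithCopyWarning`, because `roads_small` was created as a slice of `roads`. Creating it with `roads[['name','geometry']].copy()` avoids the warning:

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy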


| element_type | osmid | name | geometry | category |
|---|---|---|---|---|
| way | 8214875 | Skytten Hälls väg | LINESTRING (17.95571 58.90711, 17.95565 58.907... | väg |
| way | 8214892 | Nynäsvägen | LINESTRING (17.95472 58.90876, 17.95461 58.908... | väg |
| way | 8214903 | NaN | LINESTRING (17.95468 58.90696, 17.95548 58.907... | 0 |
| way | 8214979 | Alkärrsgatan | LINESTRING (17.94285 58.90429, 17.94290 58.904... | gata |
| way | 23322590 | NaN | LINESTRING (17.95400 58.90927, 17.95406 58.909... | 0 |
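
The `0` in the category column is the fallback that `np.select()` uses when no condition matches. Passing an explicit `default` gives those rows a readable label and keeps the column purely textual. A minimal sketch on toy data, where the label `"other"` is my own choice:

```python
import numpy as np
import pandas as pd

# Toy names: one unnamed street and one name that matches no suffix
names = pd.Series(["Nynäsvägen", None, "Alkärrsgatan", "Strandpromenaden"])
suffixes = ["vägen", "gatan"]
labels = ["väg", "gata"]

conditions = [names.str.endswith(s, na=False) for s in suffixes]
# default replaces the 0 that np.select would otherwise use for unmatched rows
category = np.select(conditions, labels, default="other")
print(category.tolist())
```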


To quickly check the data I counted the number of values in each category. 

In [75]:
roads_small['category'].value_counts()

väg        31794
0          14206
gata        4514
stig        1677
led         1354
gränd       1228
backe       1014
allé         298
plan         275
slinga       268
bro          220
länk         113
tunnel       103
torg          93
strand        78
höjd          62
gång          52
fart          49
hamn          30
kaj           21
plats         18
terrass       15
spår           4
Name: category, dtype: int64
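
Raw counts are easier to compare as shares. The sketch below recomputes them in plain Python from the numbers printed above; on the data frame itself, `value_counts(normalize=True)` should give the same proportions directly.

```python
# Counts copied from the value_counts() output above
counts = {
    "väg": 31794, "0": 14206, "gata": 4514, "stig": 1677, "led": 1354,
    "gränd": 1228, "backe": 1014, "allé": 298, "plan": 275, "slinga": 268,
    "bro": 220, "länk": 113, "tunnel": 103, "torg": 93, "strand": 78,
    "höjd": 62, "gång": 52, "fart": 49, "hamn": 30, "kaj": 21,
    "plats": 18, "terrass": 15, "spår": 4,
}

total = sum(counts.values())
shares = {category: n / total for category, n in counts.items()}

print(total)
print(round(shares["väg"], 3), round(shares["0"], 3))
```

Roughly 55% of the segments fall into the väg category, and about a quarter match none of the suffixes (unnamed segments and names with other endings).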

To continue working with the data in Mapbox Studio, I saved it as a GeoJSON file.

In [60]:
roads_small.to_file("stockholm_roads.geojson", driver='GeoJSON')

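
Before uploading, it can be worth checking that the `category` property made it into the file. The sketch below does this with the standard `json` module on a hypothetical two-feature sample that mimics the structure of the export; the coordinates are made up and `categories_in` is an illustrative helper.

```python
import json

# Hypothetical sample with the same structure as the exported file;
# the street names appear earlier in this post, the coordinates are invented
sample = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"name": "Nynäsvägen", "category": "väg"},
            "geometry": {"type": "LineString", "coordinates": [[18.0, 59.3], [18.1, 59.4]]},
        },
        {
            "type": "Feature",
            "properties": {"name": "Alkärrsgatan", "category": "gata"},
            "geometry": {"type": "LineString", "coordinates": [[18.2, 59.3], [18.3, 59.4]]},
        },
    ],
}


def categories_in(geojson):
    """Return the sorted set of category values found in the feature properties."""
    return sorted({f["properties"].get("category", "") for f in geojson["features"]})


# Round-trip through text, as if the data had been written to and read from disk
loaded = json.loads(json.dumps(sample))
print(categories_in(loaded))
```

For the real export, the same function can be applied to the result of `json.load()` on `stockholm_roads.geojson` opened with UTF-8 encoding.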


## 4. Visualizing the data

I used Mapbox Studio to visualize the data: I uploaded the GeoJSON as a new component and set the line color conditionally on the category.
You can view the [final result here](https://api.mapbox.com/styles/v1/ninlin/cl1kgmmef001w14o6ggqdgveu.html?title=view&access_token=pk.eyJ1IjoibmlubGluIiwiYSI6ImNqanR0Zzc4bzI5b2Ezd2xlb2ZmbzdrOHMifQ.nhMfjVcApf7oZVzhlMnRLA&zoomwheel=true&fresh=true#9.52/59.3347/18.0604)
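
In Mapbox Studio the conditional color is set up through the UI. In the Mapbox style specification the equivalent rule is a `match` expression on a feature property. The sketch below builds such an expression in Python so that it could be generated from the category list. The colors are hypothetical and not the ones used in the final map, and `line_color_expression` is an illustrative helper.

```python
import json

# Hypothetical colours; any category not listed falls back to FALLBACK
COLORS = {"väg": "#e41a1c", "gata": "#377eb8", "stig": "#4daf4a"}
FALLBACK = "#999999"


def line_color_expression(colors, fallback):
    """Build a Mapbox `match` expression keyed on the `category` property."""
    expression = ["match", ["get", "category"]]
    for category, color in colors.items():
        expression += [category, color]
    expression.append(fallback)  # used when no category matches
    return expression


expr = line_color_expression(COLORS, FALLBACK)
print(json.dumps(expr, ensure_ascii=False))
```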