<figure>
   <IMG SRC="https://mamba-python.nl/images/logo_basis.png" WIDTH=125 ALIGN="right">
</figure>

# Geopandas
_developed by Davíd Brakenhoff & Onno Ebbens_
<hr>

This notebook shows some of the basic functionality of the geopandas module. Geopandas combines the power of shapefiles with the power of Pandas for working with large datasets.

### Contents<a id="top"></a>
1. [Installing geopandas](#1)
2. [Importing geopandas](#2)
3. [Reading shapefiles](#3)
4. [Viewing attribute table](#4)
5. [GeoDataFrame from DataFrame](#5)
6. [Plotting data](#plot)
7. [Add a basemap](#7)

## 1. [Installing geopandas](#top)<a id="1"></a>
`geopandas` is a notoriously difficult package to install without getting vague errors. It has gotten better recently (this was written in early 2019), so hopefully this step will be a breeze for you. If you do have trouble, you're not the only one.

There are two methods of getting geopandas that have worked consistently on my PC. The first uses `conda` which is available if you have installed the Anaconda Python distribution or Miniconda distribution. The second uses precompiled packages (called wheels) from a website.

### Using conda
This method assumes you have `conda` installed, via the Anaconda Python Distribution or Miniconda, and is quite straightforward. Type `conda install geopandas` into the Anaconda Prompt and wait. Confirm the installation by pressing `y` when prompted. Afterwards, start a new Anaconda Prompt, type `python`, and once Python has started up, type `import geopandas`. If you don't see any errors, congratulations: this worked, so you can continue to step 2.

If you do see an error you could try to install geopandas from the conda-forge channel (a different location from which to get python packages). You can do this by typing `conda install -c conda-forge geopandas` into the Anaconda Prompt.

### Using wheels and pip
Download the following packages from [Christoph Gohlke's website](https://www.lfd.uci.edu/~gohlke/pythonlibs):
- GDAL
- Fiona
- geopandas

Use CTRL+F to find the download link on the page. Be sure to download the correct version of each package: the Python version should match your Python version (see Help > About in the Jupyter Notebook if you're not sure which Python version you have), and the architecture should match as well (64-bit vs 32-bit). For example:

- GDAL-2.3.3-cp37-cp37m-win_amd64.whl

This is the latest GDAL version as of writing this notebook for Python 3.7 (as can be seen from the cp37 in the name), for 64-bit Python (as indicated by the amd64 in the name). This is usually the one you want (latest Python, 64-bit).
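
If you are unsure how to read such a file name, the following minimal sketch splits a wheel name into its tags and prints the tag and bitness of the Python that runs it, so you can compare the two by eye. The helper names `parse_wheel_name` and `interpreter_info` are made up for this illustration, and the `cp` prefix assumes you are running the standard CPython interpreter.

```python
import struct
import sys


def parse_wheel_name(filename):
    """Split a wheel file name into name, version, Python, ABI and platform tags."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: " + filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) < 5:
        raise ValueError("unexpected wheel file name: " + filename)
    # The last three fields are always the Python tag, ABI tag and platform tag
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


def interpreter_info():
    """Return the CPython tag (e.g. 'cp37') and the bitness of the running Python."""
    tag = "cp{}{}".format(sys.version_info.major, sys.version_info.minor)
    bits = struct.calcsize("P") * 8  # size of a pointer in bits: 32 or 64
    return tag, bits


print(parse_wheel_name("GDAL-2.3.3-cp37-cp37m-win_amd64.whl"))
print(interpreter_info())
```

The `python` tag of the wheel should equal the tag printed for your interpreter, and `win_amd64` goes with 64 bits. A wheel tagged `py2.py3` with platform `any` fits every Python version and architecture.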

Once you have downloaded the correct files, open Anaconda Prompt, and navigate to the directory in which you saved your downloads. Now type the following commands (the order is important):
1. `pip install GDAL-2.3.3-cp37-cp37m-win_amd64.whl`
2. `pip install Fiona-1.8.5-cp37-cp37m-win_amd64.whl`
3. `pip install geopandas-0.4.1-py2.py3-none-any.whl` (note that the geopandas download is not Python- or architecture-specific)

If these steps complete successfully, open a new Anaconda Prompt, start Python and try to `import geopandas`. If it works without any error messages, congrats, and move on to step 2.
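
Whichever method you used, the import check can also be scripted. This is only a sketch: the helper name `check_imports` is made up here, and it assumes that GDAL is importable under the name `osgeo`. Depending on your geopandas version, `fiona` may not be needed, so a failure for that one is not necessarily a problem.

```python
import importlib


def check_imports(module_names):
    """Try to import each module; return {name: version string or error message}."""
    report = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except Exception as err:
            # A broken install can raise more than ImportError (e.g. a missing DLL)
            report[name] = "FAILED: {}: {}".format(type(err).__name__, err)
        else:
            report[name] = getattr(module, "__version__", "version unknown")
    return report


# "osgeo" is the import name of the GDAL Python bindings
for name, status in check_imports(["osgeo", "fiona", "geopandas"]).items():
    print("{:<10} {}".format(name, status))
```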


## 2. [Importing geopandas](#top)<a id="2"></a>

This is easy!

In [None]:
import geopandas as gpd
import pandas as pd
%matplotlib inline

## 3. [Reading shapefiles](#top)<a id="3"></a>
This is done using the `gpd.read_file()` function.

In [None]:
fileName = r"shapefiles/Rotterdam_centraal.shp"  # path to the shapefile
gdf = gpd.read_file(fileName)

## 4. [Viewing attribute table](#top)<a id="4"></a>
The attribute table is loaded as a GeoDataFrame which is similar to a `pandas.DataFrame`.

In [None]:
gdf

Operations such as `loc` and `sum()` that you do on `pandas.DataFrames` are also available on `GeoDataFrames`.

In [None]:
gdf.loc[1, "id"]

In [None]:
gdf["id"].sum()

#### Exercise 1<a name="ex1"></a>
In the `shapefiles` directory is a file named `nybb.shp`. Read the shapefile as a `GeoDataFrame` and have a look at the attribute table. What kind of data is in this shapefile?

<a href="#ans1">Answer Exercise 1</a>

#### Exercise 2<a name="ex2"></a>
One of the columns in the attribute table shows the surface area within a shape. Use the `idxmax()` method of pandas to find the shape and properties of the polygon with the biggest surface area.

<a href="#ans2">Answer Exercise 2</a>

## 5. [GeoDataFrame from DataFrame](#top)<a id="5"></a>

You can convert a DataFrame with x and y coordinates to a GeoDataFrame. First we load a DataFrame with x and y coordinates in the columns `UTM_X` and `UTM_Y`.

In [None]:
df_turbines = pd.read_excel(r'data/turbines_ohvs.xlsx')  # forward slash works on all platforms
df_turbines.head()

We use the `points_from_xy` function to convert the x and y coordinates into a list of `POINT` geometries. Then we create a `GeoDataFrame` from the original `DataFrame` and the list of geometries we've just created.

In [None]:
geometry = gpd.points_from_xy(df_turbines['UTM_X'], df_turbines['UTM_Y'])
gdf_turbines = gpd.GeoDataFrame(df_turbines, geometry=geometry)
gdf_turbines.head()
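
If you don't have the Excel file at hand, the same conversion can be tried on a small hand-made table. The coordinates and names below are made up for illustration and are not taken from `turbines_ohvs.xlsx`.

```python
import geopandas as gpd
import pandas as pd

# Made-up example data with the same column names as the turbine table
df_example = pd.DataFrame({
    "name": ["A", "B", "C"],
    "UTM_X": [500000.0, 501000.0, 502500.0],
    "UTM_Y": [5800000.0, 5801000.0, 5799500.0],
})

# One POINT geometry per row, built from the x and y columns
geometry_example = gpd.points_from_xy(df_example["UTM_X"], df_example["UTM_Y"])
gdf_example = gpd.GeoDataFrame(df_example, geometry=geometry_example)

print(gdf_example)
print(gdf_example.crs)  # no coordinate reference system has been set yet
```

Note that the resulting `GeoDataFrame` only knows the numbers, not which coordinate reference system they belong to. This matters as soon as you want to reproject the data.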

#### Exercise 3<a name="ex3"></a>

The file `turbines_ohvs` contains the owner ('eigenaar') per turbine. List all the owners that are mentioned in this file.

<a href="#ans3">Answer Exercise 3</a>

#### Exercise 4<a name="ex4"></a>

The file `ind06intensiteitpergoedgekeurdelus-2015-01.csv` contains the traffic intensities in January 2015 at different locations in the city of Utrecht. Read this file as a `DataFrame` and convert it to a `GeoDataFrame` using the method above.

<a href="#ans4">Answer Exercise 4</a>

## 6. [Plotting data](#top)<a id="plot"></a>

Plotting a shapefile using geopandas is easy!

In [None]:
gdf.plot()

#### Exercise 5<a name="ex5"></a>

Plot the locations of the traffic measurements from exercise 4.

<a href="#ans5">Answer Exercise 5</a>

#### Exercise 6<a name="ex6"></a>

The plot in exercise 5 doesn't look very appealing. This is caused by an error in one of the latitude coordinates. Remove the points with the wrong coordinates.

<a href="#ans6">Answer Exercise 6</a>


## 7. [Add a basemap](#top)<a id="7"></a>

You can add a basemap to your plot with the `contextily` package. Before you install `contextily`, you have to install `cartopy` and `rasterio`.

Just like `gdal`, `fiona` and `geopandas`, you can download `cartopy` and `rasterio` from [Christoph Gohlke's website](https://www.lfd.uci.edu/~gohlke/pythonlibs) and install them with:<br><br>
`pip install Cartopy-0.17.0-cp37-cp37m-win_amd64.whl`

`pip install rasterio-1.1.8-cp37-cp37m-win_amd64.whl`




Once you've installed these successfully, you can install `descartes` and `contextily` with:<br><br>

`pip install descartes`

`pip install contextily`


Adding a basemap requires just the following steps:
1. convert the coordinate reference system (crs) of your `GeoDataFrame` to Web Mercator (EPSG 3857)
2. plot the `GeoDataFrame` with `gdf.plot()`
3. use the `add_basemap` function from the `contextily` package to add a basemap to the plot

In [None]:
import contextily as ctx

### 7.1 Convert the crs

`contextily` requires a specific coordinate reference system (EPSG 3857). We therefore need to convert the crs of our `GeoDataFrame` to EPSG 3857, which is easily done with the `to_crs` method. If your `GeoDataFrame` has no crs, the `to_crs` method won't work and you have to specify the crs first. The current crs is available as an attribute of your `GeoDataFrame`: `gdf.crs`.

In [None]:
print(gdf.crs)
gdf = gdf.to_crs(epsg=3857)
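
The case of a `GeoDataFrame` without a crs, mentioned above, can be sketched as follows. The example uses a single made-up point in longitude/latitude, roughly in the centre of Utrecht, and assumes a reasonably recent geopandas release that provides the `set_crs` method.

```python
import geopandas as gpd

# A made-up point given as longitude/latitude; no crs is attached yet
points = gpd.points_from_xy([5.12], [52.09])
gdf_example = gpd.GeoDataFrame({"name": ["example"]}, geometry=points)
print(gdf_example.crs)

# First declare what the existing numbers mean (the coordinates are not changed)
gdf_example = gdf_example.set_crs(epsg=4326)

# Only now can the coordinates be reprojected to EPSG 3857
gdf_webmerc = gdf_example.to_crs(epsg=3857)
print(gdf_webmerc.crs)
print(gdf_webmerc.geometry.iloc[0])
```

Setting the crs only labels the data, while `to_crs` actually recalculates the coordinates. After the conversion the point is expressed in metres instead of degrees.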

In [None]:
ax = gdf.plot(figsize=(10, 10), alpha=0.5, edgecolor='k')
ctx.add_basemap(ax)

#### Exercise 7<a name="ex7"></a>

When we created the `GeoDataFrame` from the `DataFrame` of the traffic measurements, we didn't specify a coordinate reference system (crs). In order to plot a background map we need to convert the `GeoDataFrame` to a certain crs, and for that we have to specify the original crs. Specify the crs when creating the `GeoDataFrame` of the traffic measurements, then convert the crs and plot a background map. The crs of the original `latitude` and `longitude` has EPSG number 4326.

<a href="#ans7">Answer Exercise 7</a>

#### Exercise 8<a name="ex8"></a>
Use the `column` and `legend` arguments of the `GeoDataFrame.plot` method to color the points based on the traffic intensity, defined by the 'Waarde' column.

<a href="#ans8">Answer Exercise 8</a>


## Answers

#### <a href="#ex1">Answer exercise 1</a> <a name="ans1"></a>

This shapefile contains the boundaries of the boroughs of New York.

In [None]:
gdf_nybb = gpd.read_file(r'data/nybb.shp')  # forward slash works on all platforms
gdf_nybb.head()

#### <a href="#ex2">Answer exercise 2</a> <a name="ans2"></a>
Queens has the biggest surface area.

In [None]:
gdf_nybb.loc[gdf_nybb['Shape_Area'].idxmax()]

#### <a href="#ex3">Answer exercise 3</a> <a name="ans3"></a>

In [None]:
gdf_turbines['EIGENAAR'].unique()

#### <a href="#ex4">Answer exercise 4</a> <a name="ans4"></a>

In [None]:
df_verkeer = pd.read_csv(r'data/traffic_intensity_Utrecht.csv')  # forward slash works on all platforms
geometry = gpd.points_from_xy(df_verkeer['longitude'], df_verkeer['latitude'])
gdf_verkeer = gpd.GeoDataFrame(df_verkeer.copy(), geometry=geometry)
gdf_verkeer.head()

#### <a href="#ex5">Answer exercise 5</a> <a name="ans5"></a>

In [None]:
gdf_verkeer.plot()

#### <a href="#ex6">Answer exercise 6</a> <a name="ans6"></a>

In [None]:
gdf_verkeer = gdf_verkeer[gdf_verkeer.latitude!=gdf_verkeer.latitude.min()]
gdf_verkeer.plot()

#### <a href="#ex7">Answer exercise 7</a> <a name="ans7"></a>

In [None]:
gdf_verkeer = gpd.GeoDataFrame(df_verkeer.copy(), geometry=geometry, crs=4326)
gdf_verkeer = gdf_verkeer[gdf_verkeer.latitude!=gdf_verkeer.latitude.min()]
gdf_verkeer = gdf_verkeer.to_crs(epsg=3857)
ax = gdf_verkeer.plot(figsize=(10, 10), alpha=0.5, edgecolor='k')
ctx.add_basemap(ax)

#### <a href="#ex8">Answer exercise 8</a> <a name="ans8"></a>

In [None]:
ax = gdf_verkeer.plot('Waarde', legend=True, figsize=(10, 10), alpha=0.5, edgecolor='k')
ctx.add_basemap(ax)

# Bonus: plot the log of the value
import numpy as np
gdf_verkeer['logWaarde'] = np.log(gdf_verkeer['Waarde'])
ax = gdf_verkeer.plot('logWaarde', legend=True, figsize=(10, 10), alpha=0.5, edgecolor='k')
ctx.add_basemap(ax)

### Origin of the data
* traffic intensity data 'intensity_day.csv' from Utrecht obtained from https://data.overheid.nl/dataset/verkeer-tellingen-verkeerslichten-2015