# CME538 - Introduction to Data Science

## Tutorial 6 - Geospatial Analysis: An overview 
By Navid Kayhani, Marc Saleh
### Goals

### Tutorial Structure



### 1. Read in geospatial data


### 2. Coordinate Reference Systems in Python / GeoPandas


### 3. Spatial relationships and operations

    3.1 Using 'Within'
    
    3.2 Using 'Intersect'
    

### 4. Folium and interactive maps

    4.1 Using 'Choropleth'
    
    4.2 Using 'Marker'


***
    
This notebook is mainly based on:


[[1]](https://github.com/jorisvandenbossche/geopandas-tutorial) Introduction to geospatial data analysis with GeoPandas and the PyData stack by @jorisvandenbossche

[[2]](https://www.kaggle.com/learn/geospatial-analysis) Geospatial Analysis by Kaggle

<a id='section0'></a>
## Setup Notebook
At the start of a notebook, we need to import the Python packages we plan to use.


In [None]:
import os
import json
import pandas as pd
import seaborn as sns
from datetime import datetime
import matplotlib.pyplot as plt
import geopandas as gpd

# Configure Notebook
#for plots to be inline
%matplotlib inline 
#for auto_complete 
%config Completer.use_jedi = False 

plt.style.use('fivethirtyeight')
sns.set_context("notebook")

<a id='section1'></a>

# 1. Read in geospatial data

There are several geospatial (GIS) data formats, such as [shapefile (.shp)](https://en.wikipedia.org/wiki/Shapefile), [GeoJSON](https://en.wikipedia.org/wiki/GeoJSON), GeoPackage (.gpkg) files, and PostGIS (PostgreSQL) databases.

Shapefile is the most common file type that you'll encounter. The file-based formats can all be loaded with the `gpd.read_file()` function, while PostGIS tables are read with `gpd.read_postgis()`.

You are already familiar with shapefiles, geometric objects, GeoPandas, etc. from LEC14 of the course. Here we review some basics and work through some examples.

### What's a GeoDataFrame?

A GeoDataFrame contains a tabular, geospatial dataset:

* It has a **'geometry' column** that contains the geometry information (or features in GeoJSON).
* The other columns are the **attributes** (or properties in GeoJSON) that describe each of the geometries

Such a `GeoDataFrame` is just like a pandas `DataFrame`, but with some additional functionality for working with geospatial data:

* A `.geometry` attribute that always returns the column with the geometry information (returning a GeoSeries). The column name itself does not necessarily need to be 'geometry', but it will always be accessible as the `.geometry` attribute.
* It has some extra methods for working with spatial data (area, distance, buffer, intersection, ...), which we will see in later notebooks.

***
**It's still a DataFrame**, so we have all the pandas functionality available to use on the geospatial dataset, and to do data manipulations with the attributes and geometry information together.
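
To make this concrete, here is a small sketch with made-up values. The geometry column is deliberately named `geom` (an arbitrary name chosen for this example) to show that `.geometry` still finds it, and that ordinary pandas operations keep working.

```python
import geopandas as gpd
from shapely.geometry import Point

# Made-up attribute values and coordinates, for illustration only
toy = gpd.GeoDataFrame(
    {
        "name": ["A", "B", "C"],
        "bike_stands": [20, 35, 12],
        "geom": [Point(2.35, 48.85), Point(2.30, 48.87), Point(2.40, 48.84)],
    },
    geometry="geom",  # tell GeoPandas which column holds the geometries
    crs="EPSG:4326",
)

geometry_column = toy.geometry          # a GeoSeries, whatever the column is called
large = toy[toy["bike_stands"] >= 20]   # plain pandas boolean filtering
mean_stands = toy["bike_stands"].mean() # plain pandas aggregation

print(type(geometry_column).__name__, geometry_column.name)
print(large)
```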

### Geometry (`shapely`) Objects: Points, Linestrings and Polygons

Spatial vector data can consist of different types, and the 3 fundamental types are:

1. `Point` data: represents a single point in space.

2. `Line` data ("`LineString`"): represents a sequence of points that form a line.

3. `Polygon` data: represents a filled area.
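
The three types can be built directly with `shapely`. This is only a sketch with arbitrary coordinates, showing how each object reports its type, length, and area.

```python
from shapely.geometry import LineString, Point, Polygon

# Arbitrary planar coordinates, chosen so the results are easy to check by hand
point = Point(1, 1)
line = LineString([(0, 0), (3, 4)])
polygon = Polygon([(0, 0), (2, 0), (2, 1), (0, 1)])

for geom in (point, line, polygon):
    # For a polygon, `length` is the perimeter; points have zero length and area
    print(geom.geom_type, "length:", geom.length, "area:", geom.area)
```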


<div class="alert alert-info" style="font-size:100%">

**Summary:** <br>

* A `GeoDataFrame` lets you perform typical tabular data analysis together with spatial operations
* A `GeoDataFrame` (or *Feature Collection*) consists of:
    * **Geometries** or **features**: the spatial objects
    * **Attributes** or **properties**: columns with information about each spatial object
    
Single geometries are represented by `shapely` objects:

* If you access a single geometry of a GeoDataFrame, you get a shapely geometry object
* Those objects have similar functionality as geopandas objects (GeoDataFrame/GeoSeries). For example:
    * `single_shapely_object.distance(other_point)` -> distance between two points
    * `geodataframe.distance(other_point)` ->  distance for each point in the geodataframe to the other point

</div>
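
The two `distance` calls from the summary can be sketched as follows, using arbitrary planar points with no CRS, so the results are plain Euclidean distances.

```python
import geopandas as gpd
from shapely.geometry import Point

other_point = Point(0, 0)

# A single shapely object returns a single number
single = Point(3, 4)
single_distance = single.distance(other_point)

# A GeoSeries returns one distance per row
points = gpd.GeoSeries([Point(0, 0), Point(3, 4), Point(6, 8)])
all_distances = points.distance(other_point)

print(single_distance)
print(all_distances)
```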

We start with the following datasets:



1. The administrative districts of Paris (https://opendata.paris.fr/explore/dataset/quartier_paris/): `paris_districts_utm.geojson`.


2. Information about the public bicycle sharing system in Paris ([vélib](https://opendata.paris.fr/explore/dataset/velib-emplacement-des-stations/export/?basemap=jawg.dark&location=11,48.8559,2.35192))

We will start by exploring the bicycle station dataset (available as a GeoPackage file).
    


#### Read the stations dataset into a GeoDataFrame called `stations`.
Check the type of the returned object (with `type(..)`).

In [None]:
# import stations geodataframe


#### Check the first rows of the dataframe. What kind of geometries does this dataset contain?

In [None]:
# print the first 5 rows of the geodataframe


In [None]:
# print columns types


In [None]:
# add availability column, which represents available bikes / bike_stands


#### Make a quick plot of the stations dataset.

In [None]:
# plot stations


#### Plot a histogram showing the distribution of the capacity in the stations.

In [None]:
# plot 'availability' histogram


<a id='section2'></a>

# 2. Coordinate Reference Systems in Python / GeoPandas

![Mercator](images/projections.png)

![Mercator](images/Mercator_area.gif)

In [None]:
# Import the districts dataset


In [None]:
# Check the CRS information


In [None]:
# Show the first 5 rows of the GeoDataFrame


In [None]:
# Plot the districts dataset (older GeoPandas versions need `pip install descartes` to plot polygons)


In [None]:
# plot districts with a specific focus on population as a legend


## Does it make sense?
### Is just population a good metric?

In [None]:
# Calculate the area of all districts


To convert to projected coordinates, we will use EPSG:2154 (RGF93 / Lambert-93), the standard projected CRS for France.
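
The effect of reprojecting can be sketched on a single made-up rectangle near central Paris (the coordinates are illustrative, not one of the real districts). In EPSG:4326 the area comes out in square degrees, which is not a meaningful quantity; after `to_crs` it is in square metres.

```python
import geopandas as gpd
from shapely.geometry import box

# A made-up 0.01° x 0.01° rectangle, for illustration only
toy_area = gpd.GeoDataFrame(
    {"name": ["toy block"]},
    geometry=[box(2.30, 48.85, 2.31, 48.86)],
    crs="EPSG:4326",
)

# GeoPandas warns here because the CRS is geographic (degrees)
area_in_degrees = toy_area.geometry.area.iloc[0]

# Reproject to Lambert-93, then compute the area in m² and convert to km²
toy_area_l93 = toy_area.to_crs(epsg=2154)
area_km2 = toy_area_l93.geometry.area.iloc[0] / 10**6

print("square degrees:", area_in_degrees)
print("km²:", area_km2)
```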

In [None]:
# Convert the districts to the projected CRS


In [None]:
# Plot the districts dataset again


In [None]:
# Calculate the area of all districts


#### What is the unit though?!
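
One way to check the unit, sketched here on a stand-alone point rather than on the districts, is to look at the axis information that the CRS object carries.

```python
import geopandas as gpd
from shapely.geometry import Point

# A single illustrative point in geographic coordinates
pt = gpd.GeoSeries([Point(2.35, 48.85)], crs="EPSG:4326")

# Each axis of a CRS records the unit its coordinates are expressed in
geographic_unit = pt.crs.axis_info[0].unit_name
projected_unit = pt.to_crs(epsg=2154).crs.axis_info[0].unit_name

print("EPSG:4326 ->", geographic_unit)
print("EPSG:2154 ->", projected_unit)
```

Areas computed in a CRS inherit its unit squared, which is why the next cell divides by 10^6 to go from m² to km².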

In [None]:
# print unit


In [None]:
# dividing by 10^6 for showing km²


In [None]:
# sort values in descending order


In [None]:
# Add a population density column


In [None]:
# Make a plot of the districts colored by the population density


In [None]:
# Plot stations geodataframe again


In [None]:
# Plot districts and stations together


In [None]:
# Convert station geodataframe to match districts


In [None]:
# Plot districts and stations together


<a id='section3'></a>

# 3. Spatial relationships and operations

<div class="alert alert-info" style="font-size:120%">

**REFERENCE**:

An example of the different functions to check spatial relationships (*spatial predicate functions*):

* `equals`
* `contains`
* `crosses`
* `disjoint`
* `intersects`
* `overlaps`
* `touches`
* `within`
* `covers`


See https://shapely.readthedocs.io/en/stable/manual.html#predicates-and-relationships for an overview of those methods.

See https://en.wikipedia.org/wiki/DE-9IM for all details on the semantics of those operations.

</div>
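
A few of these predicates are illustrated below on toy `shapely` geometries. The shapes and their names are invented for the example; the coordinates are arbitrary planar values.

```python
from shapely.geometry import LineString, Point, Polygon

# Toy geometries: two adjacent square "districts", two "stations", one "river"
district = Polygon([(0, 0), (4, 0), (4, 4), (0, 4)])
neighbour = Polygon([(4, 0), (8, 0), (8, 4), (4, 4)])
station_inside = Point(1, 1)
station_outside = Point(5, 5)
river = LineString([(-1, 2), (5, 2)])

checks = {
    "station_inside.within(district)": station_inside.within(district),
    "district.contains(station_inside)": district.contains(station_inside),
    "station_outside.within(district)": station_outside.within(district),
    "river.crosses(district)": river.crosses(district),
    "river.intersects(district)": river.intersects(district),
    "district.touches(neighbour)": district.touches(neighbour),
    "district.overlaps(neighbour)": district.overlaps(neighbour),
    "district.disjoint(station_outside)": district.disjoint(station_outside),
}

for expression, result in checks.items():
    print(f"{expression}: {result}")
```

Note that `within` and `contains` are each other's inverse, and that two polygons sharing only an edge touch but do not overlap.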

### 3.1 Using 'Within'

##### Let's count the number of stations in each district using 'within'
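
The idea can be sketched on toy data before applying it to the real datasets. The two square districts and five stations below are made up; both layers are given the same CRS, which is what the first cell of this subsection ensures for the real data.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Made-up districts and stations sharing one projected CRS
toy_districts = gpd.GeoDataFrame(
    {"district_name": ["West", "East"]},
    geometry=[box(0, 0, 10, 10), box(10, 0, 20, 10)],
    crs="EPSG:2154",
)
toy_stations = gpd.GeoSeries(
    [Point(2, 3), Point(5, 5), Point(8, 1), Point(15, 5), Point(30, 30)],
    crs="EPSG:2154",
)

# For each district polygon, test every station with `within` and count the hits
toy_districts["total_stations"] = [
    int(toy_stations.within(district).sum())
    for district in toy_districts.geometry
]

print(toy_districts.sort_values("total_stations", ascending=False))
```

The station at (30, 30) lies in neither district, so it is not counted anywhere. For large datasets a spatial join is usually faster than this loop; the loop is used here only because it mirrors the predicate directly.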

In [None]:
# convert to a common crs


In [None]:
# add new column that represents total stations in each district using 'within'


In [None]:
# sort values in descending order


### 3.2 Using 'Intersect'
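
Before working with the Seine, here is a rough sketch of the buffer-and-intersect workflow on invented geometries. The coordinates are assumed to be in a metre-based CRS already, so a buffer of 100 really means 100 m.

```python
import geopandas as gpd
from shapely.geometry import LineString, box

# Three made-up square districts, 1 km on a side, in a metre-based CRS
toy_districts = gpd.GeoDataFrame(
    {"district_name": ["A", "B", "C"]},
    geometry=[
        box(0, 0, 1000, 1000),
        box(1000, 0, 2000, 1000),
        box(0, 1500, 1000, 2500),
    ],
    crs="EPSG:2154",
)

# A made-up "river" running just north of districts A and B
toy_river = LineString([(-500, 1050), (2500, 1050)])
river_buffer = toy_river.buffer(100)  # polygon covering everything within 100 m

# Keep the districts that intersect the buffer, then clip them to it
near_river = toy_districts[toy_districts.intersects(river_buffer)]
clipped = near_river.intersection(river_buffer)

print(near_river["district_name"].tolist())
print(clipped.area)
```

The buffer distance is always expressed in the units of the CRS. In EPSG:3857, which the cells in this subsection use, those units are Web Mercator metres, which overstate ground distance away from the equator, so a buffer of 100 there covers less than 100 m on the ground at the latitude of Paris.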

In [None]:
# import the districts file and project it to EPSG:3857


In [1]:
# created a line with http://geojson.io
s_seine = gpd.GeoDataFrame.from_features({"type":"FeatureCollection","features":[{"type":"Feature","properties":{},"geometry":{"type":"LineString","coordinates":[[2.408924102783203,48.805619828930226],[2.4092674255371094,48.81703747481909],[2.3927879333496094,48.82325391133874],[2.360687255859375,48.84912860497674],[2.338714599609375,48.85827758964043],[2.318115234375,48.8641501307046],[2.298717498779297,48.863246707697],[2.2913360595703125,48.859519915404825],[2.2594070434570312,48.8311646245967],[2.2436141967773438,48.82325391133874],[2.236919403076172,48.82347994904826],[2.227306365966797,48.828339513221444],[2.2224998474121094,48.83862215329593],[2.2254180908203125,48.84856379804802],[2.2240447998046875,48.85409863123821],[2.230224609375,48.867989496547864],[2.260265350341797,48.89192242750887],[2.300262451171875,48.910203080780285]]}}]},
                                               crs="EPSG:4326")
# convert to a common CRS (EPSG:3857 is Web Mercator, not a UTM zone, despite the `_utm` suffix)
s_seine_utm = s_seine.to_crs(epsg=3857)
s_seine_utm


In [None]:
# check geometry


In [None]:
# check geometry type


In [None]:
# plot districts and seine together


In [None]:
# access the single geometry object


In [None]:
# add buffer of 100m around seine


In [None]:
# create a GeoDataFrame of the intersection of the buffer and the districts


In [None]:
# Plot intersection


# 4. Folium and interactive maps

In [None]:
# !pip install folium

In [None]:
import folium
from folium import Choropleth, Circle, Marker
from folium.plugins import HeatMap, MarkerCluster

### 4.1 Using Choropleth

In [None]:
districts_4326

In [None]:
# create a GeoDataFrame where each district is assigned a different row, 
# and the "geometry" column contains the geographical boundaries.


In [None]:
# Next, we'll create a DataFrame from districts_4326 containing the quantity we want to plot, 'total_stations',
# and the district name, which should match the index of plot_geography.


In [None]:
# Create a base map
map_6 = folium.Map(location=[48.8566, 2.3522], 
                 tiles='cartodbpositron',
                 zoom_start=10)

# Add a choropleth map to the base map
Choropleth(geo_data=plot_geography.__geo_interface__, 
           columns=['district_name', 'total_stations'],
           data=plot_data, 
           key_on='feature.id', 
           fill_color='YlOrRd', 
           legend_name='Bikeshare Total Station number'
          ).add_to(map_6)

# Display the map
map_6

### 4.2 Using a Marker

##### Creating a map 'm_1'  from a point

In [None]:
# Create a map of Toronto
m_1 = folium.Map(location=[43.63,-79.4], tiles='openstreetmap', zoom_start=12)

# Display the map
m_1

##### Plot the points from dataframe or geodataframe on the map 'm_1' based on coordinates

In [None]:
# create a dataframe with the coordinates of UofT and a nearby location
uoft_df = pd.DataFrame({'Lat': [43.6629,43.6524], 'Long': [-79.3957,-79.3957]})

In [None]:
uoft_df

In [None]:
# You can convert the dataframe to a GeoDataFrame of the UofT location and the nearby location


In [None]:
# Add points to the map 'm_1' generated earlier using Marker by iterating through the dataframe or geodataframe


https://www.kaggle.com/alexisbcook/interactive-maps