# Vectors with fiona, shapely, and geopandas

## Fiona
- https://fiona.readthedocs.io/en/latest/manual.html
- a Python wrapper for vector data access functions from the GDAL/OGR library
- simple wrapper for minimalists
- reads data records from files as GeoJSON-like mappings and writes the same kind of mappings as records back to files
- Fiona trades memory and speed for simplicity and reliability
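
To make the "GeoJSON-like mapping" idea concrete, here is a minimal sketch that builds one such record by hand and converts its geometry with shapely. The property values and coordinates are made up for illustration and do not come from any dataset; the sketch assumes shapely is installed.

```python
from shapely.geometry import shape

# Hypothetical record, shaped like the mappings Fiona yields when reading a layer
record = {
    "type": "Feature",
    "properties": {"municity": "Sample City", "pop2020": 1000},
    "geometry": {
        "type": "Polygon",
        "coordinates": [[(0, 0), (2, 0), (2, 2), (0, 2), (0, 0)]],
    },
}

# Attributes are plain dictionary lookups
name = record["properties"]["municity"]

# shape() turns the GeoJSON-like geometry into a shapely geometry
geom = shape(record["geometry"])
print(name, geom.geom_type, geom.area)
```

Because the record is an ordinary mapping, the same access pattern works on records read from a file.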

## GDAL/OGR Python bindings
- https://gdal.org/api/python/osgeo.html
- probably the most performant option but the least Pythonic

## IMPORTANT
Geospatial Python libraries are commonly **used together** rather than by themselves.

In [None]:
import fiona
from shapely.geometry import shape

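with fiona.open("data/phl.gpkg", layer="ncr_municities_pop") as ncr_municities:
    # Fetch the record at index 2 as a GeoJSON-like mapping
    municity = ncr_municities[2]
    print(f'This is {municity["properties"]["municity"]}')
    # Convert the GeoJSON-like geometry to a shapely geometry
    geom = shape(municity["geometry"])
    # Area is in the units of the layer's CRS (square degrees for a geographic CRS)
    print(geom.area)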

geom

## Challenge 09:
1. Write a script that iterates over the municities and prints the:
   - name of the municity
   - population in 2020
   - change in population from 2015
  
Note the following properties/attributes:
- name: municity
- population in 2020: pop2020
- population in 2015: pop2015

### Extra challenge
- Can you also compute the population density in persons per 100 sqm?

## GeoPandas
- https://geopandas.org/en/stable/getting_started/introduction.html
- extends the popular data science library [pandas](https://pandas.pydata.org/) by adding support for geospatial data

## GeoDataFrames

The core data structures in GeoPandas are:
- the `geopandas.GeoDataFrame`, a subclass of `pandas.DataFrame`, which can store geometry columns and perform spatial operations, and
- the `geopandas.GeoSeries`, a subclass of `pandas.Series`, which handles the geometries.

A GeoDataFrame is therefore a combination of `pandas.Series` columns holding traditional data (numerical, boolean, text, etc.) and `geopandas.GeoSeries` columns holding geometries (points, polygons, etc.).

You can have as many geometry columns as you wish; the single-geometry-column limit typical of desktop GIS software does not apply.
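
The relationship between the two structures can be checked directly. The sketch below uses a tiny hand-made GeoDataFrame (hypothetical names and coordinates) to show that ordinary columns are pandas Series, geometry columns are GeoSeries, and that a second geometry column can coexist with the active one.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point

# Hypothetical sample data: two named points
gdf = gpd.GeoDataFrame(
    {"name": ["A", "B"], "value": [1, 2]},
    geometry=[Point(0, 0), Point(3, 4)],
)

# Ordinary columns are pandas Series; the geometry column is a GeoSeries
print(type(gdf["name"]))
print(type(gdf.geometry))

# A second geometry column can sit alongside the active one
gdf["buffered"] = gdf.geometry.buffer(1.0)
print(type(gdf["buffered"]))

# Only one geometry column is active at a time; set_geometry() switches it
gdf_buffered = gdf.set_geometry("buffered")
print(gdf_buffered.geometry.name)
```

Spatial methods such as `.area` or `.plot()` operate on whichever geometry column is currently active.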

### Loading data

In [None]:
import geopandas as gpd

gdf_municities = gpd.read_file("data/phl.gpkg", layer='ncr_municities_pop')

gdf_municities


In [None]:
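# read_file can open a CSV, but by default it does not build point geometries from the coordinate columns
df_mcdo = gpd.read_file("data/NCR_McDonalds.csv")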

df_mcdo

In [None]:
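import pandas as pd
df = pd.read_csv("data/NCR_McDonalds.csv")
# Build points from the coordinate columns; crs assumes WGS 84 longitude/latitude (EPSG:4326)
gdf_mcdo = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df.longitude, df.latitude), crs="EPSG:4326")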

gdf_mcdo

### Making maps

In [None]:
gdf_municities.plot("pop2020", legend=True)

In [None]:
gdf_municities.explore("pop2020", legend=True)

Learn more about explore: https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.explore.html

### Attribute table analysis and computations

In [None]:
# filtering
import geopandas as gpd

gdf_municities = gpd.read_file("data/phl.gpkg", layer='ncr_municities_pop')

gdf_municities[gdf_municities.pop2020 > 500000]

In [None]:
dist2 = gdf_municities[gdf_municities["province"] == "Second District, NCR"]
dist2

In [None]:
increase = gdf_municities[(gdf_municities.pop2020 - gdf_municities.pop2015) > 50000]
increase
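
Filters like the ones above can be combined. Since a GeoDataFrame inherits boolean indexing from pandas, the sketch below uses a plain DataFrame with made-up population figures to show two equivalent ways of combining conditions.

```python
import pandas as pd

# Hypothetical figures for illustration only
df = pd.DataFrame(
    {
        "municity": ["A", "B", "C"],
        "pop2015": [400_000, 600_000, 900_000],
        "pop2020": [480_000, 620_000, 1_000_000],
    }
)

# Each condition needs its own parentheses; use & for "and", | for "or"
large_and_growing = df[(df.pop2020 > 500_000) & ((df.pop2020 - df.pop2015) > 50_000)]
print(large_and_growing)

# The same filter written with query()
same_result = df.query("pop2020 > 500000 and (pop2020 - pop2015) > 50000")
print(same_result)
```

Python's `and`/`or` keywords do not work element-wise on Series, which is why the mask version uses `&` and `|`.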

### CRS handling

GeoPandas can inspect and transform coordinate reference systems (CRS).

Let's use it to reproject our NCR municities data to PRS92 / Philippines zone 3 (EPSG:3123).

In [None]:
import geopandas as gpd

gdf_municities = gpd.read_file("data/phl.gpkg", layer='ncr_municities_pop')

In [None]:
gdf_municities.crs

In [None]:
gdf_municities_3123 = gdf_municities.to_crs(3123)

In [None]:
gdf_municities_3123
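
A common source of confusion is the difference between declaring a CRS and reprojecting. The sketch below, using a single hypothetical point near Manila, shows that `set_crs()` only labels existing coordinates while `to_crs()` actually transforms them.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical point near Manila, given as longitude/latitude
gdf_no_crs = gpd.GeoDataFrame({"name": ["sample"]}, geometry=[Point(121.0, 14.6)])
print(gdf_no_crs.crs)  # None: no CRS has been declared yet

# set_crs() only labels the coordinates; the numbers stay the same
gdf_4326 = gdf_no_crs.set_crs(4326)

# to_crs() transforms the coordinates into the target CRS
gdf_3123 = gdf_4326.to_crs(3123)

print(gdf_4326.geometry.iloc[0])
print(gdf_3123.geometry.iloc[0])
print(gdf_3123.crs.to_epsg())
```

Use `set_crs()` when the data has no CRS metadata but you know what it should be, and `to_crs()` when you need the coordinates in a different system.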

### Area computation

In [None]:
gdf_municities_3123.area # in sqm

In [None]:
gdf_municities_3123.area/100 # in 100 sqm
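
The reason for reprojecting before computing areas can be seen with a small hand-made example. The sketch below uses a hypothetical 0.01 by 0.01 degree square near Manila and compares its area before and after projecting to EPSG:3123.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical 0.01 x 0.01 degree square near Manila (lon/lat, EPSG:4326)
square = gpd.GeoSeries([box(121.00, 14.60, 121.01, 14.61)], crs=4326)

# In a geographic CRS the "area" is in square degrees, which is rarely meaningful
# (GeoPandas emits a warning here)
area_deg2 = square.area.iloc[0]

# After projecting to EPSG:3123 the area is in square metres
area_sqm = square.to_crs(3123).area.iloc[0]

print(f"{area_deg2:.6f} square degrees")
print(f"{area_sqm:,.0f} sqm = {area_sqm / 100:,.0f} units of 100 sqm")
```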

## Challenge 10:
1. Improve your script from Challenge 09 to include the computation of population density in persons per 100 sqm.

### Attribute joins

In [None]:
ncr_pop1990_2010 = pd.read_csv("data/ncr_pop1990_2010.csv")
ncr_pop1990_2010

In [None]:
gdf_municities_pop1990_2020 = gdf_municities_3123.merge(ncr_pop1990_2010, on="psgc_municity")

**IMPORTANT: Column or index level names to join on must be found in both DataFrames (i.e. same name).**

**IMPORTANT: If both key columns contain rows where the key is a null value, those rows will be matched against each other. This is different from usual SQL join behaviour and can lead to unexpected results.**
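
Both notes can be illustrated with plain pandas. The sketch below uses two made-up tables (hypothetical column names `code` and `psgc`) to show how to join on differently named keys with `left_on`/`right_on`, and how rows with null keys get matched to each other unless they are dropped first.

```python
import pandas as pd

# Hypothetical tables; None becomes NaN in the key columns
left = pd.DataFrame({"code": [1.0, None], "name": ["A", "unknown"]})
right = pd.DataFrame({"psgc": [1.0, None], "pop1990": [100, 999]})

# Differently named keys: use left_on/right_on instead of on
merged = left.merge(right, left_on="code", right_on="psgc")
print(merged)  # the NaN-keyed rows are matched to each other

# Dropping null keys first avoids the accidental match
clean = left.dropna(subset=["code"]).merge(
    right.dropna(subset=["psgc"]), left_on="code", right_on="psgc"
)
print(clean)
```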

In [None]:
gdf_municities_pop1990_2020

1. Notice that we have two municity fields in the joined output.
2. Let's remove the extra field and rename the remaining one

In [None]:
gdf_municities_pop1990_2020 = gdf_municities_3123.merge(ncr_pop1990_2010, on="psgc_municity")
gdf_municities_pop1990_2020 = gdf_municities_pop1990_2020.drop("municity_y", axis=1)
gdf_municities_pop1990_2020

You can also use `del`

In [None]:
gdf_municities_pop1990_2020 = gdf_municities_3123.merge(ncr_pop1990_2010, on="psgc_municity")
del gdf_municities_pop1990_2020["municity_y"]
gdf_municities_pop1990_2020

In [None]:
gdf_municities_pop1990_2020.rename(columns={'municity_x':'municity'}, inplace=True)

In [None]:
gdf_municities_pop1990_2020
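
The `_x` and `_y` names come from the default suffixes pandas adds when both tables share a non-key column. As an alternative to dropping and renaming afterwards, one option is to select only the needed columns before merging. The sketch below shows this with two made-up tables.

```python
import pandas as pd

# Hypothetical tables that share a key and a non-key column name
left = pd.DataFrame({"psgc_municity": [1, 2], "municity": ["A", "B"], "pop2020": [10, 20]})
right = pd.DataFrame({"psgc_municity": [1, 2], "municity": ["A", "B"], "pop1990": [5, 8]})

# Default behaviour: the shared non-key column is duplicated with _x/_y suffixes
default_columns = left.merge(right, on="psgc_municity").columns.tolist()
print(default_columns)

# Selecting only the needed columns from the right table avoids the duplicate
merged = left.merge(right[["psgc_municity", "pop1990"]], on="psgc_municity")
print(merged.columns.tolist())
```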

## Challenge 11:
1. Compute a simple annual average rate of change of the population from 1990 to 2020 for each municity.
2. Map the results.

### Spatial Joins
GeoPandas supports spatial joins:
https://geopandas.org/en/stable/gallery/spatial_joins.html

In [None]:
gdf_ncr_floods = gpd.read_file("data/ncr.gpkg", layer="manila_flood_hazard_lipad")
gdf_ncr_hospitals = gpd.read_file("data/ncr.gpkg", layer="manila_hospitals_osm")

In [None]:
gdf_ncr_hospitals

In [None]:
gdf_ncr_floods.explore("hazard", legend=True)

In [None]:
gdf_ncr_hospitals.explore()

In [None]:
hospitals_join = gdf_ncr_hospitals.sjoin(gdf_ncr_floods, how="left")

In [None]:
hospitals_join

In [None]:
hospitals_join.explore("hazard", legend=True)
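
To see what the `how` argument does without loading the full datasets, here is a minimal sketch with one hypothetical hazard polygon and two hypothetical points. It spells out `predicate="intersects"`, which is the default, and compares a left join with an inner join.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical data: one hazard polygon and two points, all in the same CRS
zones = gpd.GeoDataFrame({"hazard": ["high"]}, geometry=[box(0, 0, 1, 1)], crs=4326)
points = gpd.GeoDataFrame(
    {"name": ["inside", "outside"]},
    geometry=[Point(0.5, 0.5), Point(5, 5)],
    crs=4326,
)

# how="left" keeps every point; points outside all polygons get NaN attributes
left_join = points.sjoin(zones, how="left", predicate="intersects")
print(left_join[["name", "hazard"]])

# how="inner" keeps only the points that fall inside a polygon
inner_join = points.sjoin(zones, how="inner", predicate="intersects")
print(inner_join[["name", "hazard"]])
```

Both layers should share the same CRS before joining; reproject one of them with `to_crs()` if they differ.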

## Challenge 12:
1. Go over the GeoPandas explore documentation (https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.explore.html) and improve/update the style of your maps.