# Tutorial 4: Introduction to GeoPandas

[GeoPandas](https://geopandas.org) is an open-source Python library that simplifies working with geospatial data by extending Pandas data structures. It seamlessly integrates geospatial operations with a pandas-like interface, allowing for the manipulation of geometric types such as points, lines, and polygons. GeoPandas combines the functionalities of Pandas and Shapely, enabling geospatial operations like spatial joins, buffering, intersections, and projections with ease.

The core data structures in GeoPandas are `GeoDataFrame` and `GeoSeries`. A `GeoDataFrame` extends a Pandas DataFrame with a geometry column, which makes spatial operations on geometric shapes possible. A `GeoSeries` is a Series in which every entry is a geometry (a point, line, polygon, etc.).

A `GeoDataFrame` can have multiple geometry columns, but only one is considered the active geometry at any time. All spatial operations are applied to this active geometry, accessible via the `.geometry` attribute.
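The idea of an active geometry can be illustrated with a small sketch. The data below is made up for illustration (the `sites` table and its `entrance` column are hypothetical names, not part of this tutorial's datasets); it only shows how `set_geometry` switches which column spatial operations act on.

```python
import geopandas as gpd
from shapely.geometry import Point

# Toy data: two sites, stored in the default "geometry" column
sites = gpd.GeoDataFrame(
    {"name": ["A", "B"]},
    geometry=[Point(0, 0), Point(5, 5)],
)

# Add a second geometry column (hypothetical entrance locations)
sites["entrance"] = gpd.GeoSeries([Point(0, 1), Point(5, 6)])

# The active geometry is still the original column
print(sites.geometry.name)  # geometry

# set_geometry returns a copy whose active geometry is "entrance"
sites_by_entrance = sites.set_geometry("entrance")
print(sites_by_entrance.geometry.name)  # entrance
```

Both columns remain in the table; only the column that `.geometry` points to changes.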

<div class="alert alert-block alert-info">
<b>Note:</b> This tutorial is heavily based upon the work of <a href="https://geog-312.gishub.org/index.html">others</a>
</div>

## Important before we start
<hr>
Save a copy of this notebook before you continue; otherwise your changes will be lost. To do so, go to File (Bestand) and click Save a copy in Drive (Een kopie opslaan in Drive).

Now rename the file to TAA1_Tutorial4.ipynb. You can do so by clicking on the name at the top of this screen.

## Learning Objectives
<hr>

- Understand the basic data structures in GeoPandas: `GeoDataFrame` and `GeoSeries`.
- Create `GeoDataFrames` from tabular data and geometric shapes.
- Read and write geospatial data formats like Shapefile and GeoJSON.
- Perform common geospatial operations such as measuring areas, distances, and spatial relationships.
- Visualize geospatial data using Matplotlib and GeoPandas' built-in plotting functions.
- Work with different Coordinate Reference Systems (CRS) and project geospatial data.

<h2>Tutorial outline<span class="tocSkip"></span></h2>
<hr>
<div class="toc"><ul class="toc-item">
    <li><span><a href="#installing-and-importing-geopandas" data-toc-modified-id="1.-Installing-and-Importing-Geopandas-1">1. Installing and Importing GeoPandas</a></span></li>
    <li><span><a href="#creating-geodataframes" data-toc-modified-id="2.-Creating-GeoDataFrames-2">2. Creating GeoDataFrames</a></span></li>
    <li><span><a href="#reading-and-writing-geospatial-data" data-toc-modified-id="3.-Reading-and-Writing-Geospatial-Data-3">3. Reading and Writing Geospatial Data</a></span></li>
    <li><span><a href="#simple-accessors-and-methods" data-toc-modified-id="4.-Simple-Accessors-and-Methods-4">4. Simple Accessors and Methods</a></span></li>
    <li><span><a href="#plotting-geospatial-data" data-toc-modified-id="5.-Plotting-Geospatial-Data-5">5. Plotting Geospatial Data</a></span></li>
    <li><span><a href="#geometry-manipulations" data-toc-modified-id="6.-Geometry-Manipulations-6">6. Geometry Manipulations</a></span></li>
    <li><span><a href="#spatial-queries-and-relations" data-toc-modified-id="7.-Spatial-Queries-and-Relations-7">7. Spatial Queries and Relations</a></span></li>
    <li><span><a href="#projections-and-coordinate-reference-systems" data-toc-modified-id="8.-Projections-and-Coordinate-Reference-Systems-8">8. Projections and Coordinate Reference Systems</a></span></li>
    <li><span><a href="#exercises" data-toc-modified-id="9.-Exercises-9">9. Exercises</a></span></li>
</ul></div>

## 1. Installing and Importing GeoPandas

Before we begin, make sure GeoPandas is installed. If it is not, uncomment and run the following line:

In [None]:
# %pip install geopandas

Once installed, import GeoPandas and other necessary libraries:

In [None]:
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt

## 2. Creating GeoDataFrames

A GeoDataFrame is a tabular data structure that contains a `geometry` column, which holds the geometric shapes. You can create a GeoDataFrame from a list of geometries or from a pandas DataFrame.

In [None]:
# Creating a GeoDataFrame from scratch
data = {
    "City": ["Tokyo", "New York", "London", "Paris"],
    "Latitude": [35.6895, 40.7128, 51.5074, 48.8566],
    "Longitude": [139.6917, -74.0060, -0.1278, 2.3522],
}

df = pd.DataFrame(data)

# points_from_xy expects x (longitude) first, then y (latitude)
gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df.Longitude, df.Latitude))
gdf
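The other route mentioned above, building a `GeoDataFrame` directly from a list of geometries, might look like the following sketch. The shapes and the `shapes_gdf` name are illustrative only; a separate variable is used so that `gdf` from the previous cell is left untouched.

```python
import geopandas as gpd
from shapely.geometry import LineString, Point, Polygon

# A list of Shapely geometries of different types
shapes = [
    Point(1, 1),
    LineString([(0, 0), (2, 2)]),
    Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),
]

# Attribute data plus the geometries passed via the geometry argument
shapes_gdf = gpd.GeoDataFrame(
    {"kind": ["point", "line", "polygon"]},
    geometry=shapes,
)

print(shapes_gdf.geom_type.tolist())  # ['Point', 'LineString', 'Polygon']
```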

## 3. Reading and Writing Geospatial Data

GeoPandas allows reading and writing a variety of geospatial formats, such as Shapefiles, GeoJSON, and more. We'll use a GeoJSON dataset of New York City borough boundaries.

### Reading a GeoJSON File

We'll load the New York boroughs dataset from a GeoJSON file hosted online.

In [None]:
url = "https://github.com/opengeos/datasets/releases/download/vector/nybb.geojson"
gdf = gpd.read_file(url)
gdf.head()

This `GeoDataFrame` contains several columns, including `BoroName`, which represents the names of the boroughs, and `geometry`, which stores the polygons for each borough.

### Writing to a GeoJSON File

GeoPandas also supports saving geospatial data back to disk. For example, we can save the GeoDataFrame as a new GeoJSON file:

In [None]:
output_file = "nyc_boroughs.geojson"
gdf.to_file(output_file, driver="GeoJSON")
print(f"GeoDataFrame has been written to {output_file}")

Similarly, you can write GeoDataFrames to other formats, such as Shapefiles, GeoPackage, and more. 

In [None]:
output_file = "nyc_boroughs.shp"
gdf.to_file(output_file, driver="ESRI Shapefile")

In [None]:
output_file = "nyc_boroughs.gpkg"
gdf.to_file(output_file, driver="GPKG")

## 4. Simple Accessors and Methods

Now that we have the data, let's explore some simple GeoPandas methods to manipulate and analyze the geometric data.

### Measuring Area

We can calculate the area of each borough with the `area` attribute. The result is expressed in the squared units of the dataset's CRS, which for this dataset is square feet:

In [None]:
# Set BoroName as the index for easier reference
gdf = gdf.set_index("BoroName")

# Calculate the area
gdf["area"] = gdf.area
gdf
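As a quick sanity check of how `area` behaves, here is a minimal sketch with two made-up rectangles whose areas are easy to verify by hand. No CRS is set, so the numbers are simply in squared coordinate units.

```python
import geopandas as gpd
from shapely.geometry import box

# box(minx, miny, maxx, maxy): a 2 x 3 rectangle and a 10 x 10 square
rects = gpd.GeoSeries([box(0, 0, 2, 3), box(0, 0, 10, 10)])

print(rects.area.tolist())  # [6.0, 100.0]
```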

### Getting Polygon Boundaries and Centroids

To get the boundary (lines) and centroid (center point) of each polygon:

In [None]:
# Get the boundary of each polygon
gdf["boundary"] = gdf.boundary

# Get the centroid of each polygon
gdf["centroid"] = gdf.centroid

gdf[["boundary", "centroid"]]

### Measuring Distance

We can also measure the distance from each borough's centroid to a reference point, such as the centroid of Manhattan.

In [None]:
# Use Manhattan's centroid as the reference point
manhattan_centroid = gdf.loc["Manhattan", "centroid"]

# Calculate the distance from each centroid to Manhattan's centroid
gdf["distance_to_manhattan"] = gdf["centroid"].distance(manhattan_centroid)
gdf[["centroid", "distance_to_manhattan"]]
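The same `distance` call can be tried on a few made-up points where the answer is known in advance. This sketch assumes plain planar coordinates (no CRS), so the result is an ordinary Euclidean distance.

```python
import geopandas as gpd
from shapely.geometry import Point

# Three illustrative points on a straight line through the origin
pts = gpd.GeoSeries([Point(0, 0), Point(3, 4), Point(6, 8)])

# Use the first point as the reference, like Manhattan's centroid above
reference = pts.iloc[0]

print(pts.distance(reference).tolist())  # [0.0, 5.0, 10.0]
```

Note that the reference point itself gets a distance of zero, which also happens for Manhattan in the borough table.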

### Calculating Mean Distance

We can calculate the mean distance between the borough centroids and Manhattan:

In [None]:
mean_distance = gdf["distance_to_manhattan"].mean()
print(f"Mean distance to Manhattan: {mean_distance:.0f} feet")

## 5. Plotting Geospatial Data

GeoPandas integrates with Matplotlib for easy plotting of geospatial data. Let's create some maps to visualize the data.

### Plotting the Area of Each Borough

We can color the boroughs based on their area and display a legend:

In [None]:
gdf.plot("area", legend=True, figsize=(10, 6))
plt.title("NYC Boroughs by Area")
plt.show()

### Plotting Centroids and Boundaries

We can also plot the centroids and boundaries:

In [None]:
# Plot the boundaries and centroids
ax = gdf["geometry"].plot(figsize=(10, 6), edgecolor="black")
gdf["centroid"].plot(ax=ax, color="red", markersize=50)
plt.title("NYC Borough Boundaries and Centroids")
plt.show()
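Layering works because `plot()` returns a Matplotlib `Axes` object that later calls can draw onto through the `ax` argument. A minimal sketch with made-up squares (the `squares` name is illustrative) shows the pattern in isolation:

```python
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import box

# Two illustrative unit squares
squares = gpd.GeoSeries([box(0, 0, 1, 1), box(2, 0, 3, 1)])

# First layer: the polygons; plot() returns the Axes it drew on
ax = squares.plot(figsize=(6, 3), color="lightgrey", edgecolor="black")

# Second layer: the centroids, drawn on the same Axes
squares.centroid.plot(ax=ax, color="red", markersize=30)

ax.set_title("Toy polygons with centroids")
```

In a notebook the figure is rendered at the end of the cell; in a script you would still call `plt.show()`.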

You can also explore your data interactively using `GeoDataFrame.explore()`, which behaves in the same way `plot()` does but returns an interactive map instead.

In [None]:
gdf.explore("area", legend=False)

## 6. Geometry Manipulations

GeoPandas provides several methods for manipulating geometries, such as buffering (creating a buffer zone around geometries) and computing convex hulls (the smallest convex shape enclosing the geometries).

### Buffering Geometries

We can create a buffer zone around each borough:

In [None]:
# Buffer the boroughs by 10000 feet
gdf["buffered"] = gdf.buffer(10000)

# Plot the buffered geometries
gdf["buffered"].plot(alpha=0.5, edgecolor="black")
plt.title("Buffered NYC Boroughs (10,000 feet)")
plt.show()
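To see what `buffer` does to a single geometry, consider this small sketch: buffering a point by a distance of 1 yields a polygon that approximates a circle of radius 1. The point is made up for illustration and has no CRS, so the distance is in plain coordinate units.

```python
import math

import geopandas as gpd
from shapely.geometry import Point

centre = gpd.GeoSeries([Point(0, 0)])

# Buffering a point gives a many-sided polygon approximating a circle
disc = centre.buffer(1)

print(disc.geom_type.iloc[0])  # Polygon
print(disc.area.iloc[0])       # slightly below pi
print(math.pi)
```

The area is slightly smaller than pi because the circle is approximated by straight segments.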

### Convex Hulls

The convex hull is the smallest convex shape that can enclose a geometry. Let's calculate the convex hull for each borough:

In [None]:
# Calculate convex hull
gdf["convex_hull"] = gdf.convex_hull

# Plot the convex hulls
gdf["convex_hull"].plot(alpha=0.5, color="lightblue", edgecolor="black")
plt.title("Convex Hull of NYC Boroughs")
plt.show()
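The effect of a convex hull is easiest to see on a shape with a dent. The following sketch uses a made-up L-shaped polygon: the hull fills in the inner corner, so its area is larger than that of the original shape.

```python
import geopandas as gpd
from shapely.geometry import Polygon

# An L-shaped polygon with an inner corner at (1, 1)
l_shape = gpd.GeoSeries(
    [Polygon([(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)])]
)

hull = l_shape.convex_hull

print(l_shape.area.iloc[0])  # 3.0
print(hull.area.iloc[0])     # 3.5
```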

## 7. Spatial Queries and Relations

We can also perform spatial queries to examine relationships between geometries. For instance, we can check which boroughs are within a certain distance of Manhattan.

### Checking for Intersections

We can find which boroughs' buffered areas intersect with the original geometry of Manhattan:

In [None]:
# Get the geometry of Manhattan
manhattan_geom = gdf.loc["Manhattan", "geometry"]

# Check which buffered boroughs intersect with Manhattan's geometry
gdf["intersects_manhattan"] = gdf["buffered"].intersects(manhattan_geom)
gdf[["intersects_manhattan"]]
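Combining a buffer with `intersects` is a common way to ask "is this within a given distance of that?". A minimal sketch with two made-up squares that are 2 units apart illustrates the idea:

```python
import geopandas as gpd
from shapely.geometry import box

# One square in a GeoSeries and a target square 2 units to its right
square = gpd.GeoSeries([box(0, 0, 1, 1)])
target = box(3, 0, 4, 1)

# The original square does not touch the target
print(square.intersects(target).tolist())  # [False]

# After buffering by 2.5 units (more than the gap) it does
print(square.buffer(2.5).intersects(target).tolist())  # [True]
```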

### Checking for Containment

Similarly, we can check if the centroids are contained within the borough boundaries:

In [None]:
# Check if centroids are within the original borough geometries
gdf["centroid_within_borough"] = gdf["centroid"].within(gdf["geometry"])
gdf[["centroid_within_borough"]]
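A centroid is not guaranteed to fall inside its polygon. The sketch below uses two made-up shapes, a plain square and a U-shaped polygon, to show a case where the check returns `False`, and how `representative_point()` gives a point that does lie inside the shape.

```python
import geopandas as gpd
from shapely.geometry import Polygon, box

# A U-shape: a 3 x 3 square with a notch cut out of the top middle
u_shape = Polygon(
    [(0, 0), (3, 0), (3, 3), (2, 3), (2, 1), (1, 1), (1, 3), (0, 3)]
)
shapes = gpd.GeoSeries([box(0, 0, 2, 2), u_shape])

# Row-wise check: is each centroid inside its own polygon?
print(shapes.centroid.within(shapes).tolist())  # [True, False]

# representative_point() returns a point guaranteed to be within the shape
print(shapes.representative_point().within(shapes).tolist())  # [True, True]
```

The centroid of the U-shape falls inside the notch, which is why the second check fails.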

## 8. Projections and Coordinate Reference Systems

GeoPandas makes it easy to manage projections. Each `GeoSeries` and `GeoDataFrame` has a `crs` attribute that defines its CRS.

### Checking the CRS

Let's check the CRS of the boroughs dataset:

In [None]:
print(gdf.crs)

The CRS for this dataset is [`EPSG:2263`](https://epsg.io/2263) (NAD83 / New York State Plane). We can reproject the geometries to WGS84 ([`EPSG:4326`](https://epsg.io/4326)), which uses latitude and longitude coordinates.

[EPSG](https://epsg.io) stands for European Petroleum Survey Group, which was a scientific organization that standardized geodetic and coordinate reference systems. EPSG codes are unique identifiers that represent coordinate systems and other geodetic properties. 

### Reprojecting to WGS84

In [None]:
# Reproject the GeoDataFrame to WGS84 (EPSG:4326)
gdf_4326 = gdf.to_crs(epsg=4326)

# Plot the reprojected geometries
gdf_4326.plot(figsize=(10, 6), edgecolor="black")
plt.title("NYC Boroughs in WGS84 (EPSG:4326)")
plt.show()

Notice how the coordinates have changed from feet to degrees.
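The change in coordinates can also be inspected on a single point. This sketch takes the New York coordinates used in Section 2, declares them as WGS84, and converts them to EPSG:2263 and back. The variable names are illustrative.

```python
import geopandas as gpd
from shapely.geometry import Point

# Longitude/latitude of New York, declared as WGS84
nyc = gpd.GeoSeries([Point(-74.0060, 40.7128)], crs="EPSG:4326")

# to_crs transforms the coordinates into the target CRS
nyc_2263 = nyc.to_crs(epsg=2263)
print(nyc_2263.crs.to_epsg())  # 2263
print(nyc_2263.iloc[0])        # coordinates now in feet

# Converting back recovers the original longitude and latitude
nyc_back = nyc_2263.to_crs(epsg=4326)
print(nyc_back.iloc[0])
```

Keep in mind that `to_crs` transforms coordinates, whereas `set_crs` only labels existing coordinates with a CRS without changing them.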

## 9. Exercises

#### Question 1

Load the Natural Earth file of the world and plot the population per country.

* The URL to the Natural Earth file: [https://github.com/nvkelso/natural-earth-vector/raw/master/10m_cultural/ne_10m_admin_0_countries.shp](https://github.com/nvkelso/natural-earth-vector/raw/master/10m_cultural/ne_10m_admin_0_countries.shp)
* The column we want to show is called `POP_EST`
* Think about the colormap we want to use: [https://matplotlib.org/stable/users/explain/colors/colormaps.html](https://matplotlib.org/stable/users/explain/colors/colormaps.html)
* Think about the distribution of the data when setting the `scheme` option (this option requires the `mapclassify` package). You can choose, for example, "natural_breaks", "quantiles" or "equal_interval".

**Create a choropleth map showing world population with appropriate colormap and classification scheme.**

In [None]:
world = gpd.read_file("")

In [None]:
# Create figure with specific size
fig, ax = plt.subplots(figsize=(15, 10))

world.plot(column='', legend=True, scheme='', k=5, cmap = '',
           ax=ax, legend_kwds={'loc': 'lower left'})

ax.set_title("Population per country")

plt.show()

#### Question 2

Perform a spatial join between the GeoDataFrame of cities we created at the start of this tutorial (`gdf`) and the global data we just loaded from the internet (`world`). Because `gdf` was later reassigned to the New York boroughs data, re-run the cell in Section 2 first so that `gdf` holds the cities again.

* Set the CRS of the `gdf` GeoDataFrame, which has not been set yet. Use EPSG:4326.
* Check the documentation for the function we are going to use: [https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.sjoin.html](https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.sjoin.html)
* Think about the order of your spatial join. Do you join the cities to the world data, or vice versa? If you try both, you will see that the results differ.

**How does the order of the spatial join change the result?**

In [None]:
gdf = gdf.set_crs()

In [None]:
gdf.sjoin()

In [None]:
world.sjoin()