# Tutorial 2 - scientific Python ecosystem: `pandas` and `GeoPandas`

In this tutorial we will learn to use

 1. `pandas` DataFrames
 2. `GeoPandas` GeoDataFrames

within PyGMT to create histograms and different maps.

## 0. General stuff

Import required packages

In [None]:
import geopandas as gpd
import numpy as np
import pandas as pd
import pygmt

## 1. `pandas`

### 1.1 Data - `pandas.DataFrame`

Use a dataset that is provided together with `PyGMT` and load it into a `pandas.DataFrame`:

In [None]:
df_jp_eqs = pygmt.datasets.load_sample_data(name="japan_quakes")
df_jp_eqs.head()

### 1.2 Create a histogram

First we create a histogram for the moment magnitude distribution of the earthquakes.

In [None]:
# Determine the range of the magnitudes
mag_min = df_jp_eqs["magnitude"].min()
mag_max = df_jp_eqs["magnitude"].max()
# Choose the bin width of the bars; feel free to play around with this value
step_histo = 0.2

fig = pygmt.Figure()

fig.histogram(
    data=df_jp_eqs["magnitude"],
    projection="X10c",
    # Setting both y limits to 0 determines the y range automatically
    region=[mag_min - step_histo, mag_max + step_histo * 2, 0, 0],
    # Define the frame, add labels to the x-axis and y-axis
    frame=["WSne", "x+lmagnitude", "y+lcounts"],
    # Generate evenly spaced bins
    series=step_histo,
    # Fill bars with color "orange"
    fill="orange",
    # Draw gray outlines with a width of 1 point
    pen="1p,gray",
    # Choose histogram type 0, i.e., counts [Default]
    histtype=0,
)

fig.show(dpi=150)
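
To cross-check the bar heights, the same binning can be sketched with NumPy. This is a minimal sketch on a few made-up magnitudes, not the sample dataset. Anchoring the first bin edge at the lower x-limit of `region` is an assumption, and GMT may align its bins differently.

```python
import numpy as np

# Made-up magnitudes for illustration (not the japan_quakes data)
magnitudes = np.array([4.1, 4.3, 4.3, 4.8, 5.0, 5.2, 5.9, 6.4])
step_histo = 0.2

# Same x-limits as used for the region of the histogram
x_min = magnitudes.min() - step_histo
x_max = magnitudes.max() + step_histo * 2

# Bin edges anchored at x_min (assumption), reaching at least up to x_max
n_bins = int(np.ceil((x_max - x_min) / step_histo))
edges = x_min + step_histo * np.arange(n_bins + 1)

# Count the values falling into each bin
counts, _ = np.histogram(magnitudes, bins=edges)
for left, count in zip(edges[:-1], counts):
    print(f"{left:.1f} - {left + step_histo:.1f}: {count}")
```

Because the edges span the full x-range, the counts add up to the number of events, which is a quick sanity check for any chosen bin width.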

Use the `fig.histogram` call above as a guide and create a similar histogram showing the hypocentral depth distribution of the earthquakes.

In [None]:
# Your code (:

### 1.3 Create a geographical map of the earthquakes

To plot the earthquakes as circles on top of a map, with the size coded by magnitude and the color coded by depth, please follow the tutorial at
https://www.pygmt.org/v0.13.0/tutorials/basics/plot.html.
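
Before plotting, the map region, the symbol sizes, and the depth range for the colormap can be derived from the DataFrame. A minimal sketch with a few made-up events: the column names `longitude`, `latitude`, `depth_km`, and `magnitude` are assumed to match the sample dataset, and both the 1-degree margin and the size scaling are illustrative choices.

```python
import pandas as pd

# Made-up events for illustration; column names assumed to match the sample data
df = pd.DataFrame(
    {
        "longitude": [140.2, 142.5, 138.9],
        "latitude": [36.1, 38.3, 34.7],
        "depth_km": [10.0, 45.0, 120.0],
        "magnitude": [4.5, 5.5, 6.5],
    }
)

# Map region with a 1-degree margin around the epicenters
region = [
    df["longitude"].min() - 1,
    df["longitude"].max() + 1,
    df["latitude"].min() - 1,
    df["latitude"].max() + 1,
]

# Symbol size in centimeters, doubling per magnitude unit (illustrative scaling)
df["size_cm"] = 0.02 * 2 ** df["magnitude"]

# Depth range for setting up the colormap
depth_range = [df["depth_km"].min(), df["depth_km"].max()]

print(region)
print(df[["magnitude", "size_cm"]])
```

These values can then be handed to PyGMT, for example `pygmt.makecpt(series=depth_range)` followed by `fig.plot(x=df["longitude"], y=df["latitude"], size=df["size_cm"], fill=df["depth_km"], cmap=True, style="cc")`.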

## 2. `GeoPandas`

Different datasets are available from: https://github.com/GenericMappingTools/pygmt/issues/2786#issuecomment-1787655589

### 2.1 Line geometry

Features that can be represented as line geometries include, for example, rivers, roads, and boundaries.

#### 2.1.1 Data - `geopandas.GeoDataFrame` with line geometry

First we download some data into a GeoPandas GeoDataFrame.

In [None]:
gpd_rivers_org = gpd.read_file(
    "https://www.eea.europa.eu/data-and-maps/data/wise-large-rivers-and-large-lakes/"
    "zipped-shapefile-with-wise-large-rivers-vector-line/zipped-shapefile-with-wise-large-rivers-vector-line/"
    "at_download/file/wise_large_rivers.zip"
)
gpd_rivers_org.head()

Check the coordinate reference system (CRS) of the data and convert it to geographic coordinates (WGS 84, EPSG:4326)

In [None]:
# Print the original CRS; a bare expression in the middle of a cell shows no output
print(gpd_rivers_org.crs)
gpd_rivers = gpd_rivers_org.to_crs("EPSG:4326")
gpd_rivers.head()
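
The reprojection step can be tried on a tiny example without downloading anything. A minimal sketch, assuming a made-up line in Web Mercator (EPSG:3857); the river dataset may use a different source CRS.

```python
import geopandas as gpd
from shapely.geometry import LineString

# Made-up line in Web Mercator (EPSG:3857), coordinates in meters
gdf_m = gpd.GeoDataFrame(
    {"name": ["demo line"]},
    geometry=[LineString([(0, 0), (1_000_000, 1_000_000)])],
    crs="EPSG:3857",
)

# Reproject to geographic coordinates (longitude, latitude in degrees)
gdf_deg = gdf_m.to_crs("EPSG:4326")

print(gdf_m.crs.to_epsg(), "->", gdf_deg.crs.to_epsg())
print(list(gdf_deg.geometry.iloc[0].coords))
```

After the conversion the coordinates are in degrees, which matches the longitude and latitude limits given in the `region` argument of the maps.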

#### 2.1.2 Create a geographical map of the rivers

Plot the rivers as blue lines on top of a map showing the shorelines.

In [None]:
fig = pygmt.Figure()

fig.coast(
    projection="M10c", 
    region=[-10, 30, 35, 57],
    land="gray99",
    shorelines="1/0.1p,gray50",
    frame=True,
)

fig.plot(data=gpd_rivers, pen="0.5p,steelblue,solid")

fig.show(dpi=150)

#### 2.1.3 Plot subsets of the rivers differently

Split the rivers by length into two subsets and plot each subset in its own color with a legend entry.

In [None]:
fig = pygmt.Figure()

fig.coast(
    projection="M10c", 
    region=[-10, 35, 35, 58],
    land="gray99",
    shorelines="1/0.1p,gray50",
    frame=True,
)

# Split the dataset into two subsets of shorter and longer rivers
# Feel free to play around with this value
len_limit = 700000  # in meters
gpd_rivers_short = gpd_rivers[gpd_rivers["Shape_Leng"] < len_limit]
# Use >= so that a river exactly at the limit is not dropped
gpd_rivers_long = gpd_rivers[gpd_rivers["Shape_Leng"] >= len_limit]

fig.plot(data=gpd_rivers_short, pen="0.5p,orange", label=f"rivers shorter than {len_limit} m")
fig.plot(data=gpd_rivers_long, pen="0.5p,darkred", label=f"rivers of {len_limit} m or longer")

fig.legend()

fig.show(dpi=150)
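
A row whose length equals the limit ends up in neither subset if both comparisons are strict. A minimal sketch with made-up values, using one boolean mask and its complement so that the two subsets together cover all rows; the column name `Shape_Leng` mirrors the one in the river dataset.

```python
import pandas as pd

# Made-up river lengths in meters for illustration
df_rivers = pd.DataFrame(
    {
        "name": ["A", "B", "C", "D"],
        "Shape_Leng": [250_000.0, 700_000.0, 1_200_000.0, 699_999.0],
    }
)

len_limit = 700_000  # in meters

# Boolean mask: True for rivers shorter than the limit
is_short = df_rivers["Shape_Leng"] < len_limit

df_short = df_rivers[is_short]
df_long = df_rivers[~is_short]  # complement of the mask, so no row is dropped

print(df_short["name"].tolist())  # ['A', 'D']
print(df_long["name"].tolist())  # ['B', 'C']
```

The same masking works on a `geopandas.GeoDataFrame`, since it inherits the indexing behavior of `pandas.DataFrame`.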

#### 2.1.4 Plot the rivers with color-coding for the river length

Use the gallery example https://www.pygmt.org/v0.13.0/gallery/lines/line_custom_cpt.html to plot the rivers with color-coding for the river length.

In [None]:
# Your code (:

### 2.2 Polygon geometry

Concepts and ideas used in this example:
* `plot` with `fill="+z"` and the `aspatial` parameter
* Choropleth map: a higher-level method may become available, see PR https://github.com/GenericMappingTools/pygmt/pull/2798
* Data stored in a `geopandas.GeoDataFrame`
* The built-in datasets of GeoPandas are deprecated, see issue https://github.com/GenericMappingTools/pygmt/issues/2786

#### 2.2.1 Data - `geopandas.GeoDataFrame` with polygon geometry

In [None]:
gdf_airbnb = gpd.read_file("https://geodacenter.github.io/data-and-lab//data/airbnb.zip")
gdf_airbnb.head()

#### 2.2.2 Create a choropleth map

In [None]:
fig = pygmt.Figure()
fig.basemap(region=[-88, -87.4, 41.6, 42.05], projection="M10c", frame="rltb")

# Set up colormap
popul_min = np.min(gdf_airbnb["population"])
popul_max = np.max(gdf_airbnb["population"])
pygmt.makecpt(cmap="bilbao", series=[popul_min, popul_max, 10])
# Add colorbar
fig.colorbar(frame="x+lpopulation")

# Plot the polygons with color-coding for the population
fig.plot(
    data=gdf_airbnb, 
    pen="0.2p,gray10", 
    fill="+z", 
    cmap=True,
    aspatial="Z=population",
)

fig.show(dpi=150)

## 3. Additional comments

Some interesting aspects:

- Convert other objects to `pandas` or `GeoPandas` objects to make them usable in PyGMT
- Combination with `shapely` to create more complex geometries (e.g. `from shapely.geometry import Polygon`)
- Use suitable colormaps (see the publication by F. Crameri)
- Use similarly with DataFrames from xyz  
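
The first two points above can be sketched as follows: geometries created with `shapely` are wrapped in a `geopandas.GeoDataFrame` together with an attribute column. This is a minimal sketch with made-up rectangles; the column name `value` is hypothetical.

```python
import geopandas as gpd
from shapely.geometry import Polygon

# Two made-up rectangles given as (longitude, latitude) vertices
polygons = [
    Polygon([(0, 0), (2, 0), (2, 1), (0, 1)]),
    Polygon([(2, 0), (3, 0), (3, 1), (2, 1)]),
]

# "value" is a hypothetical attribute that could be used for color-coding
gdf_demo = gpd.GeoDataFrame(
    {"value": [10, 20]},
    geometry=polygons,
    crs="EPSG:4326",
)

print(gdf_demo)
print(gdf_demo.geom_type.tolist())
```

Such a GeoDataFrame can be passed to `fig.plot(data=gdf_demo, ...)` in the same way as `gdf_airbnb` in Section 2.2.2.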

## 4. References



- xyz