In [None]:
%matplotlib inline
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt

In [None]:
stops = gpd.read_file("data/metlink-stops.gpkg").to_crs(2193)[
    ["stop_code", "stop_name", "geometry"]]
routes = gpd.read_file("data/metlink-routes.gpkg").to_crs(2193)[
    ["route_id", "route_type", "route_short_name", "geometry"]]
# Reproject to EPSG:2193 too, so all three layers share a CRS for the joins below
sa2 = gpd.read_file("data/sa2-wellington.gpkg").to_crs(2193)

# Counting points in polygons
Here's a recipe for counting points in polygons, which is something you may want to do fairly often. It is based on the spatial join - groupby - apply approach mentioned in the notebook about binary operations.

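First, join the points to the polygons using a left spatial join. The output has one row for every polygon-point pair, so a polygon that contains several points is repeated once per point, while a polygon that contains no points appears once with NA values in the columns that come from the points.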

In [None]:
n_stops = sa2.sjoin(stops, how = "left")
n_stops
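
The row structure of a left join is easier to see on a tiny example. The sketch below uses made-up data (the names `toy_polys`, `toy_pts` and `toy_joined` are hypothetical and not part of the Metlink or SA2 datasets): square `A` contains two points and square `B` contains none. No CRS is set on either layer, which is fine for a toy case but not something to copy for real data.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical toy data: square A holds two points, square B holds none.
toy_polys = gpd.GeoDataFrame(
    {"name": ["A", "B"]},
    geometry=[box(0, 0, 1, 1), box(2, 0, 3, 1)])
toy_pts = gpd.GeoDataFrame(
    {"stop_code": ["s1", "s2"]},
    geometry=[Point(0.2, 0.5), Point(0.8, 0.5)])

# The left join keeps every polygon: A is repeated once per point it
# contains, and B appears once with a missing stop_code.
toy_joined = toy_polys.sjoin(toy_pts, how="left")
print(toy_joined[["name", "stop_code"]])
```

The joined table should have three rows: two for `A` and one for `B`.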

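Next, select a polygon identifier variable and one variable from the points dataset, then apply `groupby()` on the identifier, followed by the `count()` method. Because `count()` tallies only non-missing values, polygons that contain no points end up with a count of 0. In this case `sa2_code` or `name` will work as the identifier variable.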

In [None]:
n_stops = n_stops[["name", "stop_code"]] \
    .groupby("name", as_index = False) \
    .count() \
    .rename(columns = {"stop_code": "n_stops"})
n_stops
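
To check how the counting step treats the NA rows, here is a small pandas-only sketch. The table `toy_joined_attrs` is a hypothetical stand-in for the two columns selected from the joined data. It also shows why `count()` is used rather than `size()`, which counts rows whether or not the point column is missing.

```python
import pandas as pd

# Hypothetical stand-in for the joined table: A matched two stops,
# B matched none and so carries a missing stop_code.
toy_joined_attrs = pd.DataFrame({
    "name": ["A", "A", "B"],
    "stop_code": ["s1", "s2", None]})

# count() ignores missing values, so B gets 0.
toy_counts = toy_joined_attrs \
    .groupby("name", as_index = False) \
    .count() \
    .rename(columns = {"stop_code": "n_stops"})

# size() counts rows, so B would wrongly get 1.
toy_sizes = toy_joined_attrs.groupby("name").size()

print(toy_counts)
print(toy_sizes)
```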

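Finally, `merge` the result back into the polygon dataset. With no `on` argument, `merge` joins on the columns the two tables have in common, which here is just `name`.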

In [None]:
sa2_n_stops = sa2.merge(n_stops)
sa2_n_stops

You could also chain all three steps into a single expression.

In [None]:
sa2.merge(
    sa2.sjoin(stops, how = "left") \
        .loc[:, ["name", "stop_code"]] \
        .groupby("name", as_index = False) \
        .count() \
        .rename(columns = {"stop_code": "n_stops"})
)

If you are planning on doing a lot of this, then it's probably worth writing a function that can be reused easily. Here's how that might look, complete with documentation.

In [None]:
def count_points_in_polygons(
    polys:gpd.GeoDataFrame, 
    pts:gpd.GeoDataFrame, 
    id_var:str) -> gpd.GeoDataFrame:
    """Counts points in polygons and appends new column to GeoDataFrame.

    Args:
        polys (gpd.GeoDataFrame): the polygons within which to count.
        pts (gpd.GeoDataFrame): the points to count.
        id_var (str): a variable that uniquely identifies the polygons.

    Returns:
        gpd.GeoDataFrame: polygon GeoDataFrame with added column 'n_points' 
            containing result.
    """
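    # Count on the first column of pts. This assumes that column is an
    # attribute (not the geometry), has no missing values, and does not
    # share its name with a column in polys.
    pt_var = pts.columns[0]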
    return polys.merge(
        polys.sjoin(pts, how = "left") \
            .loc[:, [id_var, pt_var]] \
            .groupby(id_var, as_index = False) \
            .count() \
            .rename(columns = {pt_var: "n_points"}))

count_points_in_polygons(sa2, stops, "name")
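
The function above relies on the first column of `pts` being a suitable one to count. As an alternative sketch (not the approach used in this notebook), the hypothetical function `count_points_by_size` below counts the rows of an inner join instead, so it does not depend on any particular column of the points dataset. It assumes `id_var` does not also appear as a column name in `pts`, since `sjoin` would then add suffixes to both.

```python
import geopandas as gpd
from shapely.geometry import Point, box

def count_points_by_size(
    polys:gpd.GeoDataFrame,
    pts:gpd.GeoDataFrame,
    id_var:str) -> gpd.GeoDataFrame:
    """Hypothetical variant that counts matched rows per polygon."""
    # An inner join keeps only polygon-point pairs that match, so the
    # number of rows per polygon is the number of points it contains.
    counts = polys.sjoin(pts, how = "inner").groupby(id_var).size()
    # Polygons missing from counts contain no points: fill with 0.
    n_points = polys[id_var].map(counts).fillna(0).astype(int)
    return polys.assign(n_points = n_points)

# Made-up data: square A holds two points, square B holds none.
demo_polys = gpd.GeoDataFrame(
    {"name": ["A", "B"]},
    geometry = [box(0, 0, 1, 1), box(2, 0, 3, 1)])
demo_pts = gpd.GeoDataFrame(
    geometry = [Point(0.2, 0.5), Point(0.8, 0.5)])

demo_result = count_points_by_size(demo_polys, demo_pts, "name")
print(demo_result[["name", "n_points"]])
```

Note that `demo_pts` has no attribute columns at all, a case the `pts.columns[0]` approach would not handle.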

## More advanced ideas
It's worth noting that an amended version of this function could be used to count the number of cases of any binary predicate being satisfied between any two datasets.

The function below introduces keyword arguments and some other ideas to show how this might be done.

In [None]:
def count_relations(
    gdf1:gpd.GeoDataFrame, 
    gdf2:gpd.GeoDataFrame, 
    id_var_1:str,
    predicate:str = "intersects",
    **kwargs) -> gpd.GeoDataFrame:
    """Counts how many pairs meet specified predicate between two GeoDataFrames.

    Args:
        gdf1 (gpd.GeoDataFrame): first GeoDataFrame.
        gdf2 (gpd.GeoDataFrame): second GeoDataFrame.
        id_var_1 (str): variable that uniquely identifies geometries in the 
            first GeoDataFrame.
        predicate (str): the binary predicate (see documentation for sjoin). 
            Defaults to 'intersects'.
        **kwargs: any additional keyword arguments, passed on to sjoin 
            (e.g. distance when the predicate is 'dwithin').

    Returns:
        gpd.GeoDataFrame: first GeoDataFrame with added column 'n_relations'
            containing count of numbers of geometries in gdf2 that match the
            predicate.
    """
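    # Count on the first column of gdf2; assumes it is a non-missing
    # attribute column whose name does not also appear in gdf1.
    id_var_2 = gdf2.columns[0]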
    return gdf1.merge(
        gdf1.sjoin(gdf2, how = "left", predicate = predicate, **kwargs) \
            .loc[:, [id_var_1, id_var_2]] \
            .groupby(id_var_1, as_index = False) \
            .count() \
            .rename(columns = {id_var_2: "n_relations"}))

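# Count the stops within 10 units of each route; distance is in CRS
# units, which are metres in EPSG:2193.
count_relations(
    routes, stops, "route_short_name", predicate = "dwithin", distance = 10)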
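
The `**kwargs` mechanism used in `count_relations` can be seen in isolation in the plain Python sketch below. Both functions are hypothetical and have nothing to do with geopandas. The point is only that the outer function does not need to know which optional arguments the inner one accepts; it simply passes them through.

```python
def within_distance(a: float, b: float, distance: float = 0.0) -> bool:
    """Toy predicate: are a and b no more than distance apart?"""
    return abs(a - b) <= distance

def count_matches(values, target, **kwargs) -> int:
    """Counts values satisfying the predicate, forwarding any extra
    keyword arguments (such as distance) to within_distance."""
    return sum(within_distance(v, target, **kwargs) for v in values)

n_default = count_matches([1, 5, 12], 10)                # uses distance = 0.0
n_within_5 = count_matches([1, 5, 12], 10, distance = 5)  # 5 and 12 qualify
print(n_default, n_within_5)
```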