## Compute clusters of poor service

This notebook computes clusters of poor service using local Moran's I, a local indicator of spatial association (LISA).

Before running this notebook, you will need to:

- record data
- construct `dataset.parquet` and `stations_geo.geojson` with [`Build dataset`](../Build%20dataset.ipynb)
- construct `stations_service_measures.geojson` with [`Build service measures`](Build%20service%20measures.ipynb)

In [1]:
import pandas as pd
import geopandas as gpd
import numpy as np

from libpysal.weights import DistanceBand
from esda import Moran_Local

from concave_hull import concave_hull
from shapely import Polygon

import seaborn as sns
sns.set_style('whitegrid')


In [22]:
QUADRANT_LABELS = {
    1:'HH',
    2:'LH',
    3:'LL',
    4:'HL'
}

In [2]:
!pip list

Package                       Version
----------------------------- ---------
altair                        5.0.1
asttokens                     2.2.1
attrs                         23.1.0
backcall                      0.2.0
backports.functools-lru-cache 1.6.4
beautifulsoup4                4.12.2
branca                        0.6.0
certifi                       2023.7.22
charset-normalizer            3.1.0
click                         8.1.3
click-default-group-wheel     1.2.2
click-plugins                 1.1.1
cligj                         0.7.2
colorama                      0.4.6
comm                          0.1.3
concave-hull                  0.0.6
contourpy                     1.0.7
cramjam                       2.6.2
cycler                        0.11.0
debugpy                       1.6.7
decorator                     5.1.1
esda                          2.5.1
executing                     1.2.0
fastparquet                   2023.4.0
Fiona                         1.9.4
folium      

Read in data.

In [6]:
stations_service_measures = (
    gpd.read_file('../stations_service_measures.geojson')
    .set_index('station_id')
)

In [64]:
stations_service_measures.head()

| station_id | pct_of_docks_w_disabled_bikes_median | pct_of_docks_w_disabled_bikes_mean | freq_am_or_evening_no_bikes_or_no_docks | zero_dock_daytime_duration_max | zero_dock_daytime_duration_mean | zero_dock_daytime_duration_median | zero_bike_daytime_duration_max | zero_bike_daytime_duration_mean | zero_bike_daytime_duration_median | zero_daytime_duration_max | zero_daytime_duration_mean | zero_daytime_duration_median | geometry |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 96e72113-3681-4d19-88bb-032e76720093 | 0.051282 | 0.051637 | 0.033908 | 0.899167 | 0.333657 | 0.250139 | 1.994722 | 0.575125 | 0.479583 | 1.994722 | 0.553819 | 0.406528 | POINT (-73.99280 40.75276) |
| c2bb1874-bcb7-47c5-aa7d-05423b9087e8 | 0.04 | 0.026949 | 0.222346 | 0.0 | 0.0 | 0.0 | 6.400556 | 1.023561 | 0.678889 | 6.400556 | 1.023561 | 0.678889 | POINT (-73.94763 40.67234) |
| 5645b05e-85be-460d-8506-cacd20bff233 | 0.033333 | 0.038672 | 0.238466 | 8.7375 | 1.36237 | 0.898333 | 0.0 | 0.0 | 0.0 | 8.7375 | 1.36237 | 0.898333 | POINT (-74.00666 40.65572) |
| d8778570-f7f2-458d-82f5-9ad39210a501 | 0.06 | 0.053934 | 0.008894 | 1.060556 | 0.478167 | 0.411528 | 0.0 | 0.0 | 0.0 | 1.060556 | 0.478167 | 0.411528 | POINT (-73.99099 40.72769) |
| 2a66ed0a-5e6d-4893-a1d1-876bb2bac4be | 0.037037 | 0.076333 | 0.143413 | 0.0 | 0.0 | 0.0 | 6.129167 | 1.627265 | 1.362778 | 6.129167 | 1.627265 | 1.362778 | POINT (-73.91204 40.84481) |


In [65]:
stations_service_measures.crs

<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

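Project to a local projected CRS (EPSG:2263, New York Long Island State Plane, in US survey feet) so that local distance measures are accurate and distance thresholds can be given in feet.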

In [66]:
stations_service_measures = stations_service_measures.to_crs(2263)

Remove NJ and islands.

Create a network linking stations within 1/2 mile (2,640 ft) of one another, then remove stations that are not in the _main_ component.

In [None]:
w_threshold = DistanceBand.from_dataframe(
    stations_service_measures,
    threshold=2640,
    binary=True
)
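The distance band and its connected components can be illustrated without libpysal. This is a minimal sketch, not libpysal's implementation: the helper `distance_band_components` and the toy coordinates are hypothetical, and coordinates are assumed to be planar and measured in feet.

```python
from math import hypot

FEET_PER_MILE = 5280
HALF_MILE_FT = FEET_PER_MILE // 2  # 2640, the threshold used in this notebook


def distance_band_components(points, threshold):
    """Label connected components of the graph linking points within `threshold`."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Follow parent pointers to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            (xi, yi), (xj, yj) = points[i], points[j]
            if hypot(xi - xj, yi - yj) <= threshold:
                # Merge the two groups that stations i and j belong to.
                parent[find(j)] = find(i)

    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    labels, seen = [], {}
    for i in range(n):
        labels.append(seen.setdefault(find(i), len(seen)))
    return labels


# Hypothetical coordinates in feet: three stations chained together, one far away.
toy_points = [(0, 0), (2000, 0), (4000, 0), (20000, 0)]
toy_labels = distance_band_components(toy_points, HALF_MILE_FT)
print(toy_labels)  # [0, 0, 0, 1]
```

Stations 0 and 2 are 4,000 ft apart, beyond the threshold, but they share a component because both are within range of station 1.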

In [68]:
(
    stations_service_measures
    .assign(
        component = w_threshold.component_labels
    )
    .explore(
        tiles='cartodb positron',
        column='component',
        legend=True,
        categorical=True
    )
)

Drop Governors Island and NJ by keeping only component 0, which the code below treats as the main component.

In [69]:
stations_service_measures = (
    stations_service_measures
    .loc[w_threshold.component_labels == 0]
)
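Filtering on label `0` relies on the main network happening to receive that label. A hedged alternative is to pick the most frequent label instead. The sketch below uses a hypothetical list of labels rather than `w_threshold.component_labels`, so it runs on its own.

```python
from collections import Counter

# Hypothetical component labels, one per station; the main network here is label 2.
toy_component_labels = [2, 2, 0, 2, 1, 2, 2]

# The most common label identifies the largest connected component.
largest_label, largest_size = Counter(toy_component_labels).most_common(1)[0]

# Boolean mask that keeps only stations in the largest component.
keep_mask = [label == largest_label for label in toy_component_labels]

print(largest_label, largest_size)  # 2 5
```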

## Compute poor service clusters

Create the spatial weights matrix.

Use inverse-square distance decay (`alpha=-2`) within the same 1/2 mile band.

In [None]:
w_idw = DistanceBand.from_dataframe(
    stations_service_measures,
    threshold=2640,
    binary=False,
    alpha=-2
)

In [71]:
# row-standardize so that each station's neighbor weights sum to 1
w_idw.set_transform('r')
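A rough sketch of what inverse-square distance weights with row standardization amount to, assuming planar coordinates in feet. The helper `row_standardized_idw` and the toy points are hypothetical, and this is not how libpysal builds its weights internally.

```python
from math import hypot


def row_standardized_idw(points, threshold, power=2):
    """Return {i: {j: weight}} with weights proportional to 1 / distance**power."""
    weights = {}
    for i, (xi, yi) in enumerate(points):
        raw = {}
        for j, (xj, yj) in enumerate(points):
            if i == j:
                continue
            d = hypot(xi - xj, yi - yj)
            # Only stations inside the distance band are neighbors;
            # coincident points (d == 0) are skipped to avoid division by zero.
            if 0 < d <= threshold:
                raw[j] = d ** -power
        total = sum(raw.values())
        # Row standardization: each station's neighbor weights sum to 1.
        weights[i] = {j: w / total for j, w in raw.items()} if total else {}
    return weights


# Hypothetical stations: station 1 is 1,000 ft from station 0, station 2 is 2,000 ft away.
toy_points = [(0, 0), (1000, 0), (0, 2000)]
toy_weights = row_standardized_idw(toy_points, threshold=2640)
print(toy_weights[0])
```

For station 0, the neighbor at half the distance receives four times the weight, so the standardized weights are roughly 0.8 and 0.2.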

In [72]:
measures = [
    'freq_am_or_evening_no_bikes_or_no_docks',
    'zero_daytime_duration_median',
    'pct_of_docks_w_disabled_bikes_median',
]

Check for missing values.

LISA cannot be computed over missing values. If any are present, either fill the NaNs, or drop the affected rows and recompute the weights matrix from the remaining rows only.

In [73]:
stations_service_measures[measures].isna().any()

freq_am_or_evening_no_bikes_or_no_docks    False
zero_daytime_duration_median               False
pct_of_docks_w_disabled_bikes_median       False
dtype: bool
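No values are missing here. For a dataset where some are, the sketch below shows why the weights need to be rebuilt after dropping rows. The station ids, values and neighbor lists are hypothetical, and plain dictionaries stand in for the GeoDataFrame and the weights object.

```python
from math import isnan

# Hypothetical measure values and neighbor lists keyed by station id.
toy_values = {'a': 0.2, 'b': float('nan'), 'c': 0.7, 'd': 0.1}
toy_neighbors = {
    'a': ['b', 'c'],
    'b': ['a', 'c', 'd'],
    'c': ['a', 'b'],
    'd': ['b'],
}

# Keep only stations with an observed value.
complete = {station for station, value in toy_values.items() if not isnan(value)}
clean_values = {s: v for s, v in toy_values.items() if s in complete}

# Rebuild the neighbor lists so they reference remaining stations only.
clean_neighbors = {
    s: [n for n in neighbors if n in complete]
    for s, neighbors in toy_neighbors.items()
    if s in complete
}

print(clean_neighbors)  # {'a': ['c'], 'c': ['a'], 'd': []}
```

Station `d` loses its only neighbor once `b` is removed, so dropping rows can create new islands that need handling before computing LISA.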

For each focus measure, compute local Moran's I across all station locations. Label a station as part of a significant high-high (HH) or low-low (LL) cluster if it falls in that quadrant and its `p_z_sim` value is below the `alpha` threshold.

In [75]:
alpha = 0.01

In [76]:
local_moran_results = []

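for measure in measures:

    # local Moran's I with permutation-based inference;
    # a fixed seed keeps the simulated p-values reproducible
    measure_local_moran = Moran_Local(
        y=stations_service_measures[measure],
        w=w_idw,
        transformation='r',
        permutations=1000,
        n_jobs=-1,
        seed=1
    )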

    measure_result = (
        stations_service_measures
        [[measure]]
        .assign(
            q = measure_local_moran.q,
            p_z_sim = measure_local_moran.p_z_sim,
            p_sim = measure_local_moran.p_sim,
            # keep the quadrant label only for significant HH (1) and LL (3)
            # stations; all other stations get NaN
            significant_cluster = lambda df: (
                df['q'].map(QUADRANT_LABELS)
                .where(
                    (df['p_z_sim'] < alpha) &
                    (df['q'].isin([1, 3]))
                )
            ),
        )
        .rename(columns={
            'q':f'{measure}_q',
            'p_z_sim':f'{measure}_p_z_sim',
            'p_sim':f'{measure}_p_sim',
            'significant_cluster':f'{measure}_significant_cluster',
        })
    )

    local_moran_results.append(measure_result)
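To make the quadrant and significance logic concrete, here is a simplified pure-Python sketch of local Moran's I with conditional permutation. It is an illustration under stated assumptions, not esda's implementation: `local_moran_sketch` and the toy data are hypothetical, scaling constants are omitted, the weights are assumed to be row-standardized, and the pseudo p-value is a folded permutation count comparable in spirit to `p_sim` rather than the `p_z_sim` used above.

```python
import random
from statistics import mean, pstdev

QUADRANT_LABELS = {1: 'HH', 2: 'LH', 3: 'LL', 4: 'HL'}


def local_moran_sketch(values, weights, permutations=999, seed=1):
    """Return a list of (I_i, quadrant, pseudo_p) tuples, one per observation.

    `weights` maps i -> {j: w_ij} and is assumed to be row-standardized.
    """
    rng = random.Random(seed)
    n = len(values)
    mu, sd = mean(values), pstdev(values)
    z = [(v - mu) / sd for v in values]

    results = []
    for i in range(n):
        neighbor_weights = list(weights[i].values())
        # Spatial lag: weighted average of the neighbors' standardized values.
        lag = sum(w * z[j] for j, w in weights[i].items())
        observed = z[i] * lag

        # Quadrant from the sign of the station's own value and of its lag.
        if z[i] > 0:
            quadrant = 1 if lag > 0 else 4
        else:
            quadrant = 2 if lag > 0 else 3

        # Conditional permutation: hold z[i] fixed and fill its neighbor slots
        # with values drawn at random from the other observations.
        others = z[:i] + z[i + 1:]
        larger = 0
        for _ in range(permutations):
            draw = rng.sample(others, len(neighbor_weights))
            simulated = z[i] * sum(w * zj for w, zj in zip(neighbor_weights, draw))
            if simulated >= observed:
                larger += 1
        # Fold the count so the pseudo p-value uses the smaller tail.
        larger = min(larger, permutations - larger)
        pseudo_p = (larger + 1) / (permutations + 1)

        results.append((observed, quadrant, pseudo_p))
    return results


# Hypothetical data: eight stations on a line, poor service on the left half.
toy_values = [9, 8, 9, 7, 1, 2, 1, 2]
toy_weights = {}
for i in range(len(toy_values)):
    adjacent = [j for j in (i - 1, i + 1) if 0 <= j < len(toy_values)]
    toy_weights[i] = {j: 1 / len(adjacent) for j in adjacent}

toy_results = local_moran_sketch(toy_values, toy_weights)
toy_labels = [QUADRANT_LABELS[q] for _, q, _ in toy_results]
print(toy_labels)  # ['HH', 'HH', 'HH', 'HH', 'LL', 'LL', 'LL', 'LL']
```

A station would then count as a significant cluster member only if its quadrant is HH or LL and its pseudo p-value falls below `alpha`.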


Combine the local indicators across all focus measures, and assign an `any_high_high` label to each station that is in a significant high-high cluster for at least one measure.

In [83]:
local_moran_by_measure = (
    stations_service_measures[['geometry']]
    .join(
        pd.concat(
            local_moran_results, 
            axis=1
        ),
        how='left'
    )
    .assign(
        any_high_high = lambda df: (
            df
            .filter(like='significant_cluster')
            .eq('HH')
            .any(axis=1)
        )
    )
)

Peek at results.

In [85]:
(
    local_moran_by_measure
    .explore(
        tiles='cartodb positron nolabels',
        column='any_high_high',
        cmap=['#4f84bd','#f03813'],
        marker_kwds=dict(
            radius=1
        )
    )
)

### Filter to stations within clusters of 5 or more and draw boundaries around clusters

In [86]:
poor_service_stations = (
    local_moran_by_measure
    [
        local_moran_by_measure['any_high_high']
    ]
)

Group poor service stations into subnetworks of poor service stations within 1/4 mile (1,320 ft) of one another. Drop groups with fewer than 5 stations.

In [87]:
w_split_at_1320 = DistanceBand.from_dataframe(
    df=poor_service_stations,
    threshold=1320,
    binary=True
)

poor_service_stations = (
    poor_service_stations
    .assign(
        component = w_split_at_1320.component_labels
    )
)

poor_service_stations__component_5_or_more_nodes = (
    poor_service_stations
    [
        poor_service_stations
        ['component']
        .isin(
            poor_service_stations
            ['component']
            .value_counts()
            .loc[lambda counts: counts >= 5]
            .index
        )
    ]
)

 There are 53 disconnected components.
 There are 22 islands with ids: 475c44e1-31e3-4c60-b071-af2ca3f7618c, a45a712c-1e3d-4fee-bef8-adafafd80670, 33840944-aa92-4d91-9de8-d29e5cd0b2d9, 66dd43bd-0aca-11e7-82f6-3863bb44ef7c, 9b7e3b8b-97ef-4038-820c-7cd1c1a34fc7, 989fdf95-cfb9-475c-ba0c-3152fd3a16e0, eb8fcff2-6f58-4724-b564-ebc9a8d374d8, 66dde484-0aca-11e7-82f6-3863bb44ef7c, f15ccc6e-5a7c-46e7-b505-0bfc9cae3d83, be2bdee1-e3e9-45cc-94f0-2ba81a722ba3, 5dc8ddac-979e-4ae5-b879-3670cca7482d, 787fe4b2-6029-4f71-917e-892479d8d64e, 344b52b0-73a9-4fe3-9132-23d0502dc0ae, 72196024-6f2b-45cb-9764-609911c7dc0b, 66de6100-0aca-11e7-82f6-3863bb44ef7c, 5d049d4b-0736-4323-834a-1b2454bd6551, 5483bd97-fe0e-4966-937a-5dd4805004fe, 66dde079-0aca-11e7-82f6-3863bb44ef7c, 4cd5a87d-7618-4a15-892d-8cf29eec865c, 0e82ab9f-8c93-4302-b9df-342f8135a05c, 66db3e32-0aca-11e7-82f6-3863bb44ef7c, 57f144cb-375d-4eee-b458-e288116cb8f7.
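The size filter can be sketched in isolation. The labels below are hypothetical and stand in for `w_split_at_1320.component_labels`. Components with a single station are presumably the islands that the message above reports.

```python
from collections import Counter

MIN_CLUSTER_SIZE = 5

# Hypothetical component labels, one per poor service station.
toy_components = [0, 0, 0, 0, 0, 1, 1, 2, 3, 3, 3, 3, 3, 3]

# Count stations per component and keep labels with enough members.
sizes = Counter(toy_components)
large_components = {
    label for label, size in sizes.items() if size >= MIN_CLUSTER_SIZE
}

# Keep only stations that belong to a large enough component.
kept = [label for label in toy_components if label in large_components]

print(sorted(large_components), len(kept))  # [0, 3] 11
```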


Create concave hulls encompassing poor service stations to represent poor service areas.

In [88]:
poor_service_area_hulls = []

for component in poor_service_stations__component_5_or_more_nodes['component'].unique():

    component_geom = (
        poor_service_stations__component_5_or_more_nodes
        [
            poor_service_stations__component_5_or_more_nodes['component'] == component
        ]
        .geometry
    )

    component_xy = np.stack([
        component_geom.x.values,
        component_geom.y.values
    ]).T

    component_hull = concave_hull(
        component_xy,
        concavity=1.5
    )

    # build the polygon once and reuse it
    component_polygon = Polygon(component_hull)

    poor_service_area_hulls.append(component_polygon)

poor_service_areas = gpd.GeoDataFrame(
    geometry=gpd.GeoSeries(poor_service_area_hulls),
    crs=2263
)

poor_service_areas_buffer = gpd.GeoDataFrame(
    geometry=gpd.GeoSeries(poor_service_area_hulls).buffer(500),
    crs=2263
)
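One plausible reason to prefer a concave hull is that a convex hull can cover ground with no stations in it. The sketch below illustrates this with a convex hull (Andrew's monotone chain), which is not the concave hull algorithm used above. The L-shaped toy points and both helper functions are hypothetical.

```python
def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when o -> a -> b turns counter-clockwise.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # Drop the last point of each chain; it repeats the first point of the other.
    return lower[:-1] + upper[:-1]


def ring_area(ring):
    """Area of a simple polygon from its vertex ring (shoelace formula)."""
    pairs = zip(ring, ring[1:] + ring[:1])
    return abs(sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in pairs)) / 2


# Hypothetical stations tracing an L shape, listed in boundary order.
l_shape = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 4), (0, 4)]

hull = convex_hull(l_shape)
print(hull)                # [(0, 0), (4, 0), (4, 1), (1, 4), (0, 4)]
print(ring_area(hull))     # 11.5
print(ring_area(l_shape))  # 7.0
```

The convex hull skips the inner corner at (1, 1) and covers 11.5 square units, while the outline that follows the stations covers 7.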

View result.

In [92]:
m = (
    poor_service_areas_buffer
    .explore(
        tiles='cartodb positron nolabels',
        color='orange'
    )
)

(
    local_moran_by_measure
    .explore(
        m=m,
        tiles='cartodb positron nolabels',
        column='any_high_high',
        cmap=['#4f84bd','#f03813'],
        marker_kwds=dict(
            radius=1
        )
    )
)

m

### Save out

In [93]:
poor_service_areas_buffer.to_file('../poor_service_areas_buffer.geojson')