# Aquaya Waterpoint - Settlement Analysis


Explore the Aquaya "Study" waterpoints for three counties (to be treated as our ground truth?),
the corresponding labs for those counties, and settlement extents (SEs) and populated places (PPs),
to see whether we can derive population numbers and/or settlement classifications to use
when deriving expansion numbers.

In [1]:
import pandas as pd
import geopandas as gpd

In [2]:
from utilities import county_shape_style
from config import study_counties, study_labs, codpp_cols, codpp_county_map

In [3]:
study_counties

['Nakuru', 'Uasin Gishu', 'Kericho']

In [53]:
CRS_LATLON = "EPSG:4326"
CRS_KENYA = "EPSG:21037"
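A small sketch of why the two constants are kept separate: distances computed on `EPSG:4326` coordinates come out in degrees, while distances computed after `to_crs(CRS_KENYA)` come out in metres. The two coordinates are copied from the waterpoint and lab outputs later in this notebook; the `name` labels and variable names are illustrative only.

```python
import geopandas as gpd

CRS_LATLON = "EPSG:4326"   # geographic lon/lat, units are degrees
CRS_KENYA = "EPSG:21037"   # projected CRS used in this notebook, units are metres

# Two points taken from the notebook outputs: first waterpoint and the Nakuru lab
pts = gpd.GeoDataFrame(
    {"name": ["Njoro 1 treated", "Nakuru lab"]},
    geometry=gpd.points_from_xy(
        [35.93872, 36.06516], [-0.32611, -0.285098], crs=CRS_LATLON
    ),
)

# Planar distance on lon/lat values: a number in degrees, not usable as a length
dist_deg = pts.geometry.iloc[0].distance(pts.geometry.iloc[1])

# Same two points after reprojection: planar distance in metres
pts_m = pts.to_crs(CRS_KENYA)
dist_m = pts_m.geometry.iloc[0].distance(pts_m.geometry.iloc[1])

print(f"{dist_deg:.4f} degrees vs {dist_m:,.0f} metres")
```

Any threshold such as `< 1000` is only meaningful on the second value.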

## Load Data

In [54]:
# COD Administrative Boundaries
# NB: To list layers in a shapefile: `gpd.list_layers(adm_file)`
adm_file = "./data/ken_adm_iebc_20191031_shp.zip"
adm_df = gpd.read_file(adm_file, layer="ken_admbnda_adm1_iebc_20191031").to_crs(CRS_KENYA)
adm_df = adm_df[adm_df["ADM1_EN"].isin(study_counties)].copy()
adm_df.shape

(3, 13)

In [56]:
# Aquaya Study waterpoints
wp_file = "./data/AF-Kenya Study Systems - GPS water systems.xlsx"
wp_df = pd.read_excel(wp_file, sheet_name="Systems - Cleaned")
wp_df = gpd.GeoDataFrame(wp_df, geometry=gpd.points_from_xy(wp_df["Lon"], wp_df["Lat"], crs=CRS_LATLON)).to_crs(CRS_KENYA)
assert all(c in study_counties for c in wp_df["County"].unique())
wp_df.shape

(35, 5)

In [60]:
# WQ Labs
labs_file = "./data/KENAS Accredited Laboratories.xlsx"
labs_df = pd.read_excel(labs_file, sheet_name="KENAS Water Quality Testing Lab")
labs_df = gpd.GeoDataFrame(labs_df, geometry=gpd.points_from_xy(labs_df["Longitude (Y)"], labs_df["Latitude (X)"], crs=CRS_LATLON)).to_crs(CRS_KENYA)
labs_df = labs_df[labs_df["Laboratory Name"].isin(study_labs)]
labs_df.shape

(3, 9)

In [63]:
# Settlement Areas (starting with GRID3 v3.0)
se_file = "./data/GRID3_Kenya_Settlement_Extents_Version_3.0/GRID3_KEN_settlement_extents_v3_0.gpkg"
se_df = gpd.read_file(se_file).to_crs(crs=CRS_KENYA)
se_df = se_df[se_df.apply(lambda r: r["geometry"].intersects(adm_df.geometry).sum() > 0, axis=1)].copy()
se_df.shape

(14692, 10)
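The row-wise `apply` above runs one `intersects` call per settlement. A spatial join is a common vectorised alternative. The sketch below uses toy boxes in place of `adm_df` and `se_df`; the frame names and geometries are made up for illustration, and it has not been run against the GRID3 data.

```python
import geopandas as gpd
from shapely.geometry import box

CRS_KENYA = "EPSG:21037"

# Toy stand-ins for adm_df (two adjacent counties) and se_df (three settlements)
counties = gpd.GeoDataFrame(
    {"ADM1_EN": ["A", "B"]},
    geometry=[box(0, 0, 10, 10), box(10, 0, 20, 10)],
    crs=CRS_KENYA,
)
settlements = gpd.GeoDataFrame(
    {"type": ["Hamlet", "Small Settlement Area", "Hamlet"]},
    geometry=[box(1, 1, 2, 2), box(9, 4, 11, 5), box(30, 30, 31, 31)],
    crs=CRS_KENYA,
)

# One row per (settlement, county) pair that intersects
hits = gpd.sjoin(settlements, counties[["geometry"]], how="inner", predicate="intersects")

# A settlement straddling two counties appears twice in `hits`,
# so filter on index membership to keep each settlement once
kept = settlements[settlements.index.isin(hits.index)].copy()
print(kept.shape)
```

The same pattern would apply to the HOTOSM points filter, with points on the left side of the join.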

In [64]:
# Settlement Areas split out by type
bua_df = se_df[se_df["type"]=="Built-up Area"].copy()
ssa_df = se_df[se_df["type"]=="Small Settlement Area"].copy()
ham_df = se_df[se_df["type"]=="Hamlet"].copy()

bua_df.shape, ssa_df.shape, ham_df.shape

((66, 10), (1305, 10), (13321, 10))

In [69]:
# COD Populated Places
codpp_file = "./data/KEN_Populated places_2002_DEPHA"
codpp_df = gpd.read_file(codpp_file)[codpp_cols].copy().to_crs(crs=CRS_KENYA)
codpp_df["DISTRICT"] = codpp_df["DISTRICT"].replace(codpp_county_map)
codpp_df = codpp_df[codpp_df["DISTRICT"].isin(study_counties)].copy()
assert len(set(study_counties) - set(codpp_df["DISTRICT"].unique())) == 0, "Study counties missing"
codpp_df.shape

(72, 9)

In [76]:
# HOTOSM Populated Places
hotosm_file = "./data/hotosm_ken_populated_places_points_shp.zip"
hotosm_df = gpd.read_file(hotosm_file).to_crs(crs=CRS_KENYA)
hotosm_df = hotosm_df[~hotosm_df["place"].isin(["isolated_dwelling"])]
hotosm_df = hotosm_df[hotosm_df.apply(lambda r: r["geometry"].intersects(adm_df.geometry).sum() > 0, axis=1)].copy()
hotosm_df.shape

(3478, 10)

## Quick check of SEs

In [13]:
se_df.head(2)

Unnamed: 0,country,iso3,building_count,building_area,type,probability,date,source,mgrs_code,geometry
37235,Kenya,KEN,29,1070.958497,Hamlet,0.999986,2024,CIESIN,37NAA9616_1,"MULTIPOLYGON (((196925.107 10016350.715, 19692..."
37236,Kenya,KEN,3,142.962997,Hamlet,0.912191,2024,CIESIN,37NAA9616_2,"MULTIPOLYGON (((196513.954 10016612.036, 19651..."


In [14]:
se_df["type"].value_counts()

type
Hamlet                   13321
Small Settlement Area     1305
Built-up Area               66
Name: count, dtype: int64
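Since the stated aim is to derive population or expansion numbers, one possible next step is to summarise building counts and areas per settlement type. This is a sketch on a toy frame: the first two rows copy the `se_df.head(2)` values shown above, the third row is invented, and the output column names are hypothetical.

```python
import pandas as pd

# Toy stand-in for se_df; the third row is made up for illustration
toy_se = pd.DataFrame(
    {
        "type": ["Hamlet", "Hamlet", "Small Settlement Area"],
        "building_count": [29, 3, 120],
        "building_area": [1070.958497, 142.962997, 9000.0],
    }
)

# Per-type totals: number of settlements, buildings, and building footprint area
summary = toy_se.groupby("type").agg(
    n_settlements=("building_count", "size"),
    total_buildings=("building_count", "sum"),
    total_building_area=("building_area", "sum"),
)
print(summary)
```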

## Explore All w/Aquaya WPs highlighted

In [22]:
# Plot ADM boundaries
disp_cols = ["ADM1_EN", "ADM1_PCODE", "Shape_Area"]
disp_style = dict(color="red", weight=2, opacity=0.75, fill=True, fillOpacity=0.05)
m = adm_df.explore(style_kwds=disp_style, tooltip=False, popup=disp_cols, highlight=False)

In [24]:
# Add Settlement Areas
m = se_df.explore("type", cmap="tab20", legend=True, m=m)

In [None]:
# Add both Populated Place sources
m = codpp_df.explore(color="dodgerblue", m=m)
m = hotosm_df.explore(color="mediumorchid", m=m)

In [None]:
# Add Labs - large markers
m = labs_df.explore(color="green", marker_kwds=dict(radius=10), m=m)

In [None]:
# Add Aquaya WPs - cyan markers
m = wp_df.explore(color="cyan", marker_kwds=dict(radius=5), m=m)

## Aquaya WP vs. Community Stats

In [64]:
# Waterpoints within settlement shapes
wp_df.apply(lambda r: r["geometry"].intersects(se_df.geometry).sum(), axis=1).sum()

np.int64(0)

In [66]:
# Waterpoints within 1 km (1000 m) of settlement shapes; assumes both frames use a metre-based CRS
wp_df.apply(lambda r: (r["geometry"].distance(se_df.geometry) < 1000).any(), axis=1).sum()

np.int64(0)
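Both counts above can also be obtained in one pass with a nearest-neighbour join that records the distance. The sketch below assumes both frames share a projected CRS in metres and uses toy geometries; the names `waterpoints`, `settlements` and `dist_m` are illustrative.

```python
import geopandas as gpd
from shapely.geometry import Point, box

CRS_KENYA = "EPSG:21037"

# Toy stand-ins: one 100 m x 100 m settlement and three waterpoints
settlements = gpd.GeoDataFrame(
    {"type": ["Hamlet"]}, geometry=[box(0, 0, 100, 100)], crs=CRS_KENYA
)
waterpoints = gpd.GeoDataFrame(
    {"Name": ["inside", "near", "far"]},
    geometry=[Point(50, 50), Point(600, 50), Point(5000, 5000)],
    crs=CRS_KENYA,
)

# Nearest settlement per waterpoint, searching no further than 1000 m;
# waterpoints with no settlement in range get NaN in dist_m
nearest = gpd.sjoin_nearest(
    waterpoints, settlements, how="left", max_distance=1000, distance_col="dist_m"
)

# Ties can duplicate a waterpoint, so reduce to one distance per waterpoint
dist = nearest.groupby(level=0)["dist_m"].min()

n_inside = int((dist == 0).sum())         # waterpoints touching a settlement shape
n_within_1km = int((dist < 1000).sum())   # includes those inside
print(n_inside, n_within_1km)
```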

In [70]:
# Distance from the first waterpoint to each lab (units follow the geometry's coordinates)
wp_df.iloc[0].geometry.distance(labs_df.geometry)

0     0.132925
4     1.083452
15    1.210095
Name: geometry, dtype: float64

In [71]:
wp_df.iloc[0]

County                         Nakuru
Name                  Njoro 1 treated
Lon                          35.93872
Lat                          -0.32611
geometry    POINT (35.93872 -0.32611)
Name: 0, dtype: object

In [72]:
labs_df

Unnamed: 0,Laboratory Name,Location (County),Laboratory Type,Accreditation Expiry Date,Latitude (X),Longitude (Y),Contact,Assurance Fund Selection,geometry
0,Nakuru Water and Sanitation Services Company L...,Nakuru,Utility,2024-11-30,-0.285098,36.06516,+254 51 212269/214148,Round 1,POINT (36.065 -0.285)
4,Eldoret Water And Sanitation Company Limited,Uasin Gishu,Utility,2026-09-13,0.52916,35.273602,+254 724255538,Round 1,POINT (35.274 0.529)
15,Kisumu Water and Sanitation Company Limited (K...,Kisumu,Utility,NaT,-0.098026,34.750314,,Study Round 1,POINT (34.75 -0.098)


In [73]:
labs_df.crs

<Projected CRS: EPSG:32736>
Name: WGS 84 / UTM zone 36S
Axis Info [cartesian]:
- E[east]: Easting (metre)
- N[north]: Northing (metre)
Area of Use:
- name: Between 30°E and 36°E, southern hemisphere between 80°S and equator, onshore and offshore. Burundi. Eswatini (Swaziland). Kenya. Malawi. Mozambique. Rwanda. South Africa. Tanzania. Uganda. Zambia. Zimbabwe.
- bounds: (30.0, -80.0, 36.0, 0.0)
Coordinate Operation:
- name: UTM zone 36S
- method: Transverse Mercator
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

In [74]:
wp_df.crs

<Projected CRS: EPSG:32736>
Name: WGS 84 / UTM zone 36S
Axis Info [cartesian]:
- E[east]: Easting (metre)
- N[north]: Northing (metre)
Area of Use:
- name: Between 30°E and 36°E, southern hemisphere between 80°S and equator, onshore and offshore. Burundi. Eswatini (Swaziland). Kenya. Malawi. Mozambique. Rwanda. South Africa. Tanzania. Uganda. Zambia. Zimbabwe.
- bounds: (30.0, -80.0, 36.0, 0.0)
Coordinate Operation:
- name: UTM zone 36S
- method: Transverse Mercator
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
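The outputs above do not agree with each other: both frames report the projected CRS `EPSG:32736`, yet `wp_df.iloc[0]` prints `POINT (35.93872 -0.32611)` and the lab distances are about 0.13 to 1.2. The first distance equals the planar difference in degrees between that waterpoint and the Nakuru lab, sqrt(0.1264² + 0.0410²) ≈ 0.1329, so the coordinates were still lon/lat when those cells ran. That would also explain the zero counts in the 1000 m check. One possible cause, stated here as an assumption, is that a CRS label was assigned without reprojecting, or that cells were run out of order. The helper below is a hypothetical sanity check, not part of the notebook's `utilities` module.

```python
import geopandas as gpd
from shapely.geometry import Point


def crs_label_is_suspicious(gdf):
    """Return True if the CRS claims to be projected but all coordinates fit in lon/lat ranges."""
    if gdf.crs is None or not gdf.crs.is_projected:
        return False
    minx, miny, maxx, maxy = gdf.total_bounds
    fits_lonlat = (-180 <= minx) and (maxx <= 180) and (-90 <= miny) and (maxy <= 90)
    return bool(fits_lonlat)


pts = gpd.GeoDataFrame(geometry=[Point(35.93872, -0.32611)], crs="EPSG:4326")

# Relabelling keeps the degree values but changes the declared CRS
relabelled = pts.set_crs("EPSG:32736", allow_override=True)

# Reprojecting converts the coordinates to metres
reprojected = pts.to_crs("EPSG:32736")

print(crs_label_is_suspicious(relabelled), crs_label_is_suspicious(reprojected))
```

The bounds test is only a heuristic: a projected dataset lying within a few hundred metres of its CRS origin would trip it too.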