# Explore Aquaya waterpoint data

1. From the "Study" dataset (potential ground truth). Counties: Nakuru, Uasin Gishu, Kericho (lab locations in Nakuru, Uasin Gishu, Kisumu).
> **AF-Kenya Study Systems - GPS water systems.xlsx**, sheets "Systems - Cleaned" and "Labs - Cleaned"

2.  etc



In [1]:
import pandas as pd
import geopandas as gpd

In [2]:
from config import study_counties, selected_labs

## Load and Process data

### Study Systems & Labs

In [3]:
wp_file = "./data/AF-Kenya Study Systems - GPS water systems.xlsx"

In [24]:
# Load
wp_df = pd.read_excel(wp_file, sheet_name="Systems - Cleaned")
wp_labs_df = pd.read_excel(wp_file, sheet_name="Labs  - Cleaned")

# Checks
assert all(c in study_counties for c in wp_df["County"].unique())

wp_df.shape, wp_labs_df.shape

((35, 4), (3, 3))
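
The `assert all(...)` check above fails without saying which county caused the failure. A minimal sketch of a more informative check follows. It assumes `study_counties` is a plain list of county names; the helper name and the example lists are hypothetical stand-ins (the county names are taken from the introduction, not from `config`).

```python
def unexpected_values(values, allowed):
    """Return the sorted values that do not appear in the allowed collection."""
    return sorted(set(values) - set(allowed))


# Illustrative stand-ins: in the notebook these would be
# `wp_df["County"].unique()` and `config.study_counties`.
example_allowed = ["Nakuru", "Uasin Gishu", "Kericho"]
example_seen = ["Nakuru", "Kericho", "Kisumu"]

extra = unexpected_values(example_seen, example_allowed)
print(extra)  # ['Kisumu']

# The assert could then carry a readable message:
# assert not extra, f"Unexpected counties: {extra}"
```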

In [45]:
# Convert to GeoDataFrames (WGS84 points) so the locations can be plotted
wp_gdf = gpd.GeoDataFrame(wp_df, geometry=gpd.points_from_xy(wp_df["Lon"], wp_df["Lat"], crs="EPSG:4326"))
wp_labs_gdf = gpd.GeoDataFrame(wp_labs_df, geometry=gpd.points_from_xy(wp_labs_df["Lon"], wp_labs_df["Lat"], crs="EPSG:4326"))
wp_gdf.shape, wp_labs_gdf.shape

((35, 5), (3, 4))

### General Lab Locations

- Load the global lab locations and compare the study labs against them.
- Confirm which labs are being used and match them across the two sources; prefer the global lab records over the custom ones in the study data.
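
One way to approach the comparison described above is to match each study lab to its nearest lab in the global sheet by great-circle distance. The sketch below uses the haversine formula with the standard library only. The function names and dictionary layout are hypothetical, the two global lab coordinates are taken from the first rows of the KENAS sheet (with latitude and longitude in their corrected roles), and the study lab coordinates are made-up placeholders.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_lab(point, labs):
    """Return (lab, distance_km) for the lab closest to the given point."""
    candidates = (
        (lab, haversine_km(point["lat"], point["lon"], lab["lat"], lab["lon"]))
        for lab in labs
    )
    return min(candidates, key=lambda pair: pair[1])


# Coordinates from the first two KENAS rows, with lat/lon in their corrected roles.
global_labs = [
    {"county": "Nakuru", "lat": -0.285098, "lon": 36.06516},
    {"county": "Mombasa", "lat": -4.051329, "lon": 39.685951},
]

# Hypothetical study lab location (placeholder, not from the study file).
study_lab = {"county": "Nakuru", "lat": -0.30, "lon": 36.08}

lab, dist_km = nearest_lab(study_lab, global_labs)
print(lab["county"], round(dist_km, 1))
```

A small distance suggests the two records describe the same lab; a large one flags a study lab with no counterpart in the global sheet. With the GeoDataFrames built in this notebook, `geopandas.sjoin_nearest` on a projected CRS would be an alternative.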

In [34]:
# NB: The latitude and longitude columns are swapped in this version of the file:
# "Longitude (X)" holds latitudes (~0) and "Latitude (Y)" holds longitudes (~37).
labs_file = "./data/KENAS Accredited Laboratories.xlsx"

In [35]:
labs_df = pd.read_excel(labs_file, sheet_name="KENAS Water Quality Testing Lab")

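# Filter to selected labs and counties only (currently disabled, so all labs are kept)
# NB: `primary_counties` is not defined above; `study_counties` is the name imported from config
# labs_df[labs_df["Laboratory Name"].isin(selected_labs) & labs_df["Location (County)"].isin(primary_counties)].copy()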

labs_df.shape

(15, 7)

In [36]:
labs_df.head(2)

| | Laboratory Name | Location (County) | Laboratory Type | Accreditation Expiry Date | Longitude (X) | Latitude (Y) | Contact |
|---|---|---|---|---|---|---|---|
| 0 | Nakuru Water and Sanitation Services Company L... | Nakuru | Utility | 2024-11-30 | -0.285098 | 36.06516 | +254 51 212269/214148 |
| 1 | Polucon Services Kenya Limited | Mombasa | Private | 2026-09-20 | -4.051329 | 39.685951 | +254 722 229 944 |
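
The two rows above show the swap: the "Longitude (X)" values sit near 0 to -4 and the "Latitude (Y)" values near 36 to 40. A minimal sketch of an automated sanity check follows, assuming a rough bounding box for Kenya. The bounds and function names are illustrative assumptions, not part of the dataset.

```python
# Approximate bounding box for Kenya in degrees (an assumption for illustration).
KENYA_LAT_RANGE = (-5.0, 5.5)
KENYA_LON_RANGE = (33.5, 42.0)


def in_kenya(lat, lon):
    """True if (lat, lon) falls inside the rough Kenya bounding box."""
    return (
        KENYA_LAT_RANGE[0] <= lat <= KENYA_LAT_RANGE[1]
        and KENYA_LON_RANGE[0] <= lon <= KENYA_LON_RANGE[1]
    )


def columns_look_swapped(lat_values, lon_values):
    """True if no point fits the box as labeled but every point fits once swapped."""
    pairs = list(zip(lat_values, lon_values))
    if not pairs:
        return False
    fits_as_labeled = any(in_kenya(lat, lon) for lat, lon in pairs)
    fits_when_swapped = all(in_kenya(lon, lat) for lat, lon in pairs)
    return not fits_as_labeled and fits_when_swapped


# Values copied from the two rows shown above, under their original column labels.
latitude_y = [36.06516, 39.685951]
longitude_x = [-0.285098, -4.051329]

print(columns_look_swapped(latitude_y, longitude_x))  # True
```

Running such a check right after loading would catch the problem if a later version of the file fixes (or re-breaks) the column labels.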


In [41]:
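# The coordinate columns are mislabeled in the source file: "Latitude (Y)" holds longitudes and
# "Longitude (X)" holds latitudes, so "Latitude (Y)" is passed as x and "Longitude (X)" as y.
# labs_gdf = gpd.GeoDataFrame(labs_df, geometry=gpd.points_from_xy(labs_df["Latitude (Y)"], labs_df["Longitude (X)"], crs="EPSG:21037"))
labs_gdf = gpd.GeoDataFrame(labs_df, geometry=gpd.points_from_xy(labs_df["Latitude (Y)"], labs_df["Longitude (X)"], crs="EPSG:4326"))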
labs_gdf.shape

(15, 8)

### Display all, compare

In [49]:
# sel_labs_df.explore(m=m, color="limegreen", marker_kwds=dict(radius=6), tooltip=disp_cols, popup=disp_cols)
disp_cols = ['Laboratory Name', 'Location (County)', 'Laboratory Type', 'Accreditation Expiry Date', 'Contact']
# Layers on one map: global labs (limegreen), study labs (coral), study water systems (default style)
m = labs_gdf.explore(color="limegreen", marker_kwds=dict(radius=6), tooltip=disp_cols, popup=disp_cols, tiles="CartoDB positron")
m = wp_labs_gdf.explore(m=m, color="coral", marker_kwds=dict(radius=6))
m = wp_gdf.explore(m=m)
m