### Workbook for exploring catchment survey data and replicating methods from 2021 paper
Week of June 9, 2025
<br>
Author: Adele Birkenes

Objectives of 2021 study (pg. 5):
- Understand the geographic area served by a completed bridge, and validate the current needs assessment approach by comparing that area with the needs assessment's estimate of individuals directly served
- Understand the specific ways in which completed trailbridges serve as connections to key destinations

The GIS methods in the paper are as follows:
<br>
5.1 Calculate summary statistics on number of individuals surveyed, demographics (age range and gender), and modes of transportation

5.2 Calculate total catchment area of each site in sq km, and count number of unique villages served by the site. Compare number of unique villages to "qualified home villages" from the needs assessment.
- Calculate the proportion of respondents reporting a given home village or destination
- Calculate mean Euclidean distance from bridge to centroid of farthest home village with one or more surveys
- Determine which home villages from the catchment surveys are "qualified", i.e. meet the criteria for individuals directly served, according to reported purpose of travel

Sample outputs:

| Bridge Site | Catchment villages (all home and destination villages) | Catchment area (sq km) | Catchment villages (qualified home villages) | Qualified home villages (needs assessment) |
|-------------|--------------------------------------------------------|------------------------|----------------------------------------------|---------------------------------------------|
| Gahororo    | 27                                                     | 44.6                   | 18                                           | 4                                           |

<br>
<img src="graphics/Figure4.png" alt="Figure 4" width="400"/>
<br>
<img src="graphics/Figure5.png" alt="Figure 5" width="400"/>
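
The area and distance steps in 5.2 can be sketched with plain Shapely geometries. This is only an illustration under assumptions: the bridge point and the three village polygons are made up, and coordinates are assumed to be in a projected CRS measured in metres (as with the Rwanda TM CRS used later in this workbook). It gives the per-site distance to the farthest village centroid; the mean across sites would be taken afterwards.

```python
from shapely.geometry import Point, Polygon
from shapely.ops import unary_union

# Hypothetical bridge location and village polygons in a projected CRS (metres)
bridge = Point(0, 0)
villages = {
    "A": Polygon([(0, 0), (1000, 0), (1000, 1000), (0, 1000)]),
    "B": Polygon([(1000, 0), (3000, 0), (3000, 1000), (1000, 1000)]),
    "C": Polygon([(0, 1000), (1000, 1000), (1000, 2000), (0, 2000)]),
}

# Catchment area: merge the village polygons, then convert square metres to sq km
catchment = unary_union(list(villages.values()))
catchment_area_sq_km = catchment.area / 1e6

# Straight-line (Euclidean) distance from the bridge to each village centroid, in km
centroid_dist_km = {
    name: bridge.distance(poly.centroid) / 1000 for name, poly in villages.items()
}
farthest_village = max(centroid_dist_km, key=centroid_dist_km.get)
```

With a GeoDataFrame, the same steps would likely use `dissolve()`, `.area`, `.centroid`, and `.distance()`; whether the paper computed them this way is not confirmed here.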

5.3 Calculate summary statistics on purpose of travel, disaggregated by gender
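
One possible way to tabulate 5.3 is a cross-tabulation. The column names `Gender` and `Purpose` and the rows below are placeholders, not the survey's actual schema.

```python
import pandas as pd

# Hypothetical survey rows; the real column names may differ
surveys = pd.DataFrame({
    "Gender": ["F", "F", "M", "M", "F", "M"],
    "Purpose": ["Market", "School", "Market", "Farm", "Market", "Market"],
})

# Number of responses for each travel purpose, split by gender
purpose_counts = pd.crosstab(surveys["Purpose"], surveys["Gender"])

# Share of each gender's responses by purpose (each column sums to 1)
purpose_share = pd.crosstab(surveys["Purpose"], surveys["Gender"], normalize="columns")
```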

5.4 Calculate travel time for a one-way trip between home village and destination, disaggregated by purpose of travel and gender
- Exclude multi-day trips
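
A minimal sketch of 5.4, assuming the surveys expose a one-way travel time in minutes and a flag for multi-day trips. Both column names (`travel_time_min`, `multi_day`) and the data are hypothetical.

```python
import pandas as pd

# Hypothetical columns: one-way travel time in minutes and a multi-day flag
trips = pd.DataFrame({
    "Gender": ["F", "F", "M", "M", "F"],
    "Purpose": ["Market", "Market", "Market", "Farm", "Market"],
    "travel_time_min": [30, 50, 40, 20, 2880],
    "multi_day": [False, False, False, False, True],
})

# Drop multi-day trips before summarising
single_day = trips[~trips["multi_day"]]

# Count, mean, and median travel time by purpose and gender
travel_time_summary = (
    single_day.groupby(["Purpose", "Gender"])["travel_time_min"]
    .agg(["count", "mean", "median"])
    .reset_index()
)
```

Filtering first keeps a single very long trip from dominating the mean for its group.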

In [1]:
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point, LineString, Polygon
import os

Task 1: Read in Rwanda bridge sites (df), catchment survey data (df), and village boundaries (gdf)

In [10]:
synced_catchment_path = "../../synced-data/catchment-analysis"
synced_population_path = "../../synced-data/population-exploration"

# Rwanda bridge sites
Rwanda_bridges_fp = os.path.join(synced_catchment_path, "Rwanda_catchment_bridge_sites.csv")
Rwanda_bridges = pd.read_csv(Rwanda_bridges_fp, encoding='ISO-8859-1') # Note: This encoding accommodates special characters

# Rwanda catchment survey data
Rwanda_catchment_surveys_fp = os.path.join(synced_catchment_path, "Rwanda_catchment_all_surveys.csv")
Rwanda_catchment_surveys = pd.read_csv(Rwanda_catchment_surveys_fp, encoding='ISO-8859-1')

# Rwanda village boundaries
Rwanda_village_boundaries_fp = os.path.join(synced_population_path, "Rwanda Village Boundaries/Village.shp")
Rwanda_village_boundaries = gpd.read_file(Rwanda_village_boundaries_fp, encoding='ISO-8859-1')

Task 2: Convert bridge sites dataframe to geodataframe that has custom Rwanda TM CRS copied from village boundaries geodataframe

In [11]:
# Convert bridge sites dataframe to geodataframe that has custom Rwanda TM CRS copied from village boundaries geodataframe
def map_bridges(bridges, bridges_lat, bridges_lon, village_boundaries):

    # Check CRS of village boundaries gdf
    print(f'The CRS of the village boundaries gdf is: {village_boundaries.crs}')

    # Create lat/lon variables
    lon = bridges[bridges_lon]
    lat = bridges[bridges_lat]

    # Create gdf of bridges data by converting lat/lon values to list of Shapely Point objects
    bridge_points = gpd.GeoDataFrame(bridges, geometry=gpd.points_from_xy(x=lon, y=lat), crs='EPSG:4326')

    # Reproject bridges gdf from EPSG:4326 to the CRS of the village boundaries gdf
    bridge_points.to_crs(village_boundaries.crs, inplace=True)

    # Check that reprojection was successful
    print(f'The CRS of the bridges gdf is: {bridge_points.crs}')
    
    return bridge_points

bridge_points = map_bridges(bridges = Rwanda_bridges,
                            bridges_lat = "Lat",
                            bridges_lon = "Long",
                            village_boundaries = Rwanda_village_boundaries)

The CRS of the village boundaries gdf is: PROJCS["TM_Rwanda",GEOGCS["ITRF2005",DATUM["International_Terrestrial_Reference_Frame_2005",SPHEROID["GRS 1980",6378137,298.257222101,AUTHORITY["EPSG","7019"]],AUTHORITY["EPSG","6896"]],PRIMEM["Greenwich",0],UNIT["Degree",0.0174532925199433]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",0],PARAMETER["central_meridian",30],PARAMETER["scale_factor",0.9999],PARAMETER["false_easting",500000],PARAMETER["false_northing",5000000],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH]]
The CRS of the bridges gdf is: PROJCS["TM_Rwanda",GEOGCS["ITRF2005",DATUM["International_Terrestrial_Reference_Frame_2005",SPHEROID["GRS 1980",6378137,298.257222101,AUTHORITY["EPSG","7019"]],AUTHORITY["EPSG","6896"]],PRIMEM["Greenwich",0],UNIT["Degree",0.0174532925199433]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",0],PARAMETER["central_meridian",30],PARAMETER["scale_factor",0.9999],PARAMETER["false_
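
Matching CRS strings do not confirm that the points landed in the right place; swapped latitude and longitude columns would reproject without error. As a hedged sanity check, the helper below (a hypothetical function, not part of the original workbook) lists points that fall outside a bounding box. The numbers are illustrative only.

```python
def points_outside_bounds(xs, ys, bounds):
    """Return indices of points that fall outside (minx, miny, maxx, maxy)."""
    minx, miny, maxx, maxy = bounds
    return [
        i for i, (x, y) in enumerate(zip(xs, ys))
        if not (minx <= x <= maxx and miny <= y <= maxy)
    ]

# Illustrative values only, not taken from the data
bounds = (400000.0, 4600000.0, 600000.0, 4900000.0)
xs = [450000.0, 4700000.0]
ys = [4700000.0, 450000.0]  # second point has x and y swapped
outside = points_outside_bounds(xs, ys, bounds)
```

In this workbook the call would presumably be `points_outside_bounds(bridge_points.geometry.x, bridge_points.geometry.y, Rwanda_village_boundaries.total_bounds)`, and an empty list would mean every bridge sits inside the extent of the village layer.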

Task 3: For each bridge site, calculate the number of catchment villages (home or destination). Calculate the proportion of respondents reporting a given home/destination village.

In [12]:
# For each bridge site, count the number of unique home and destination admin codes
def count_catchment_villages_by_site(catchment_surveys):
    # Stack home and destination admin codes into one long column, keeping the bridge site
    codes = catchment_surveys.melt(
        id_vars='Bridge Site',
        value_vars=['Home Village - Admin Code', 'Destination - Admin Code'],
        value_name='admin_code'
    )

    # Count unique codes per site (nunique ignores missing values). Grouping a single
    # column avoids the pandas deprecation warning raised by groupby().apply()
    result = (
        codes.groupby('Bridge Site')['admin_code']
        .nunique()
        .reset_index(name='unique_home_and_destination_admin_codes')
    )

    return result

# Apply the function
catchment_village_counts = count_catchment_villages_by_site(Rwanda_catchment_surveys)
print(catchment_village_counts)

         Bridge Site  unique_home_and_destination_admin_codes
0           Gahororo                                       32
1             Gasasa                                       56
2          Kanyarira                                       72
3            Muhembe                                       55
4           Muregeya                                       55
5        Mutiwingoma                                       66
6         Nyarusange                                       91
7            Rugeshi                                       71
8           Ruharazi                                       18
9          Rwamamara                                       42
10          Rwimvubu                                       57
11  Uwumugeti-Kigusa                                       43




In [14]:
# Create a dataframe that counts the number of respondents who reported each home village for each bridge site
def count_respondents_by_home_village(catchment_surveys):
    # Group by 'Bridge Site' and 'Home Village - Admin Code', then count survey responses (rows) in each group
    result = catchment_surveys.groupby(['Bridge Site', 'Home Village - Admin Code']).size().reset_index(name='respondent_count')
    
    return result

# Apply the function
respondent_counts_home_village = count_respondents_by_home_village(Rwanda_catchment_surveys)
print(respondent_counts_home_village)

          Bridge Site Home Village - Admin Code  respondent_count
0            Gahororo                  23140302                 2
1            Gahororo                  23140303                11
2            Gahororo                  23140501                 4
3            Gahororo                  24010101                 1
4            Gahororo                  24010202                 2
..                ...                       ...               ...
359  Uwumugeti-Kigusa                    251605                92
360  Uwumugeti-Kigusa                  25160501                76
361  Uwumugeti-Kigusa                  25160504                12
362  Uwumugeti-Kigusa                  25160509                11
363  Uwumugeti-Kigusa                  25160510                52

[364 rows x 3 columns]
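
Task 3 also asks for the proportion of respondents reporting each village. A minimal sketch, assuming a counts table shaped like `respondent_counts_home_village`; the rows below are invented and `add_respondent_proportion` is a hypothetical helper.

```python
import pandas as pd

# Hypothetical counts in the same shape as respondent_counts_home_village
counts = pd.DataFrame({
    "Bridge Site": ["Gahororo", "Gahororo", "Gasasa"],
    "Home Village - Admin Code": [23140302, 23140303, 31010101],
    "respondent_count": [2, 6, 5],
})

def add_respondent_proportion(counts):
    """Add each village's share of its bridge site's responses."""
    out = counts.copy()
    # Total responses per site, repeated on every row of that site
    site_totals = out.groupby("Bridge Site")["respondent_count"].transform("sum")
    out["respondent_proportion"] = out["respondent_count"] / site_totals
    return out

proportions = add_respondent_proportion(counts)
```

Because `groupby` drops rows with a missing admin code by default, the denominator here is the number of responses with a recorded village, not all surveys at the site.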


In [16]:
# Create a dataframe that counts the number of respondents who reported each destination village for each bridge site
def count_respondents_by_destination_village(catchment_surveys):
    # Group by 'Bridge Site' and 'Destination - Admin Code', then count survey responses (rows) in each group
    result = catchment_surveys.groupby(['Bridge Site', 'Destination - Admin Code']).size().reset_index(name='respondent_count')
    
    return result

# Apply the function
respondent_counts_destination_village = count_respondents_by_destination_village(Rwanda_catchment_surveys)
print(respondent_counts_destination_village)

          Bridge Site  Destination - Admin Code  respondent_count
0            Gahororo                24010202.0                 6
1            Gahororo                24030201.0                 2
2            Gahororo                24030306.0               874
3            Gahororo                24030307.0                 8
4            Gahororo                24030501.0                49
..                ...                       ...               ...
289  Uwumugeti-Kigusa                25160503.0                 1
290  Uwumugeti-Kigusa                25160504.0                17
291  Uwumugeti-Kigusa                25160506.0                 1
292  Uwumugeti-Kigusa                25160509.0                16
293  Uwumugeti-Kigusa                25160510.0                35

[294 rows x 3 columns]
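
The two outputs above print home codes without a decimal point and destination codes with `.0`, which suggests the columns may hold different dtypes. If the home codes are stored as text, `count_catchment_villages_by_site` could count the same village twice (once as `'24010202'`, once as `24010202.0`). The sketch below shows one way to normalise codes before counting; `normalize_admin_code` is a hypothetical helper and the values are illustrative.

```python
import pandas as pd

def normalize_admin_code(codes):
    """Convert admin codes to nullable integers so '24010202' and 24010202.0 match."""
    # errors="coerce" turns anything non-numeric into a missing value
    return pd.to_numeric(codes, errors="coerce").astype("Int64")

home = pd.Series(["24010202", "24030306", None])            # codes stored as text
dest = pd.Series([24010202.0, 24030501.0, float("nan")])    # codes stored as floats

# Without normalisation, the text and float versions of a code count separately
raw_unique = pd.concat([home.dropna(), dest.dropna()]).nunique()

# After normalisation, identical codes collapse to a single value
clean_unique = pd.concat(
    [normalize_admin_code(home), normalize_admin_code(dest)]
).nunique()
```

Note that coercion silently drops malformed codes, so it is worth checking how many values become missing before relying on the cleaned counts.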
