# Overview

This notebook tests whether the RHR gives every household in the Philippines an equal probability of selection. Key steps:

1. Randomly sample a small share of all rooftops and count how many Google's snap-to-road method fails on. Also generate a Google Maps link for each failed point so that we can inspect it manually.
2. Randomly sample X barangays and generate maps showing a) the OSM street network, b) barangay borders, and c) rooftop centroids.



In [2]:
import geopandas as gpd
import pandas as pd
import osmnx as ox
from pathlib import Path
from shapely.geometry import Polygon
import matplotlib.pyplot as plt
import folium
import os
from tqdm import tqdm
tqdm.pandas()
from pin_drop_sampling2.utils import get_s2_cell_id, count_neighbors_in_radius, get_nearest_point_on_road, dist_in_meters

In [3]:
DB_DIR = Path.home() / 'IDinsight Dropbox' / 'Random Walk Testing' 
PSU_FILE = DB_DIR / '01_Raw data'/ '03_Census' / 'Philippines' / 'barangay_w_borders.parquet'
ROOFTOP_DIR = DB_DIR /'01_Raw data'/ '01_Rooftop'/'Philippines'
OUTPUT_DIR = DB_DIR / '03_Output' / '06_RHR Simulations'

In [4]:
# collect every rooftop parquet file under ROOFTOP_DIR
rooftop_files = [str(file) for file in ROOFTOP_DIR.rglob('*.parquet')]

# draw a reproducible 0.01% sample of rooftops per file
# note: the [1:2] slice reads only the second file in the list
samples = []
for file in rooftop_files[1:2]:
    rooftop_temp = gpd.read_parquet(file)
    samples.append(rooftop_temp.sample(frac=0.0001, random_state=42))
sampled_gdf = pd.concat(samples, ignore_index=True)

len(sampled_gdf)

1189

In [5]:
# get the nearest point on a road for each rooftop centroid using the Google Maps API
sampled_gdf['nearest_point_on_road'] = sampled_gdf.progress_apply(lambda x: get_nearest_point_on_road(x.geometry.centroid), axis=1)

100%|██████████| 1189/1189 [02:09<00:00,  9.20it/s]


In [6]:
# keep the rows where no nearest road point was found (nearest_point_on_road is null)
# .copy() gives an independent frame, so adding a column below does not raise SettingWithCopyWarning
sampled_gdf_no_road = sampled_gdf[sampled_gdf['nearest_point_on_road'].isnull()].copy()

# print the share of sampled points that could not be snapped to a road
print(len(sampled_gdf_no_road)/len(sampled_gdf))

# generate a Google Maps link for each failed point (latitude first, then longitude)
sampled_gdf_no_road['google_maps_directions_link'] = sampled_gdf_no_road.geometry.centroid.apply(lambda x: f"https://www.google.com/maps?q={x.y},{x.x}")

# save the Google Maps links to a CSV
sampled_gdf_no_road[['google_maps_directions_link']].to_csv(OUTPUT_DIR / 'sample points not on road.csv')

0.1825063078216989
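The failure-share and link-generation logic in the cell above can be checked on a tiny example without GeoPandas or an API key. The sketch below is illustrative only: the coordinates are made-up stand-in data, and `maps_link` is a hypothetical helper name, not part of `pin_drop_sampling2`. It shows that the link uses latitude first and longitude second, which is why the cell above formats `x.y` before `x.x`.

```python
# Hypothetical stand-in data: (longitude, latitude, snapped_point) per rooftop centroid.
# snapped_point is None where no road was found, mirroring the null check in the notebook.
points = [
    (121.0, 14.6, (121.0001, 14.6001)),
    (122.5, 10.7, None),
    (125.6, 7.1, (125.6002, 7.1001)),
    (120.9, 15.5, None),
]


def maps_link(lon, lat):
    """Build a Google Maps query URL; the q parameter takes latitude, then longitude."""
    return f"https://www.google.com/maps?q={lat},{lon}"


# Points that could not be snapped to a road
failed = [(lon, lat) for lon, lat, snapped in points if snapped is None]

# Share of sampled points with no snapped road point
share_failed = len(failed) / len(points)

# One link per failed point, for manual inspection
links = [maps_link(lon, lat) for lon, lat in failed]

print(share_failed)
for link in links:
    print(link)
```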


In [8]:
# distance in meters between each rooftop centroid and its nearest point on a road
sampled_gdf['distance'] = sampled_gdf.apply(lambda x: dist_in_meters(x.geometry.centroid, x.nearest_point_on_road), axis=1)
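`dist_in_meters` comes from `pin_drop_sampling2.utils` and its implementation is not shown in this notebook. As a rough illustration of how a distance in meters between two longitude/latitude points can be computed, here is a minimal haversine sketch. The names `haversine_m` and `distance_or_none`, and the Earth radius constant, are assumptions made for this example; the real helper may use a different method. `distance_or_none` also shows one way to handle rows where no road point was found.

```python
import math

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius in meters


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def distance_or_none(centroid, snapped):
    """Return the distance for a (lon, lat) pair of points, or None if no road point exists."""
    if snapped is None:
        return None
    return haversine_m(centroid[0], centroid[1], snapped[0], snapped[1])


# One degree of latitude along a meridian, using the assumed radius
one_degree = haversine_m(0.0, 0.0, 0.0, 1.0)
print(round(one_degree))

# A point with no snapped road point yields None instead of raising
print(distance_or_none((121.0, 14.6), None))
```

For rooftop-to-road distances of a few meters to a few hundred meters, the spherical approximation is usually adequate; a projected CRS would be an alternative.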