# Overview

**Note:** This is an alternate version of the cleaning barangay data notebook. It uses a different barangay file from Gio.


This notebook creates a barangay dataset that combines PCWHS designations with barangay borders, for the barangays where boundary data is available.

I use barangay data (with PCWHS designation) provided by Gio.

I use barangay border data from [here](https://github.com/altcoder/philippines-psgc-shapefiles) since it appears to be higher quality than the dataset on [HDX](https://data.humdata.org/dataset/cod-ab-phl/resource/12457689-6a86-4474-8032-5ca9464d38a8).



In [1]:
from datetime import datetime
from pathlib import Path

import geopandas as gpd
import pandas as pd
from tqdm import tqdm

from pin_drop_sampling2.utils import get_s2_cell_id

# register tqdm's progress_apply on pandas objects
tqdm.pandas()

# Import and clean barangay borders

In [3]:
DB_DIR = Path.home() / 'IDinsight Dropbox' / 'Random Walk Testing'
ROOFTOP_DIR = DB_DIR / '01_Raw data' / '01_Rooftop' / 'Philippines'
OUTPUT_DIR = DB_DIR / '03_Output' / '05_HPLS qual'

timestamp = datetime.now().strftime("%Y%m%d_%H")

In [17]:
# import new barangay census data, filter for barangay level
barangay_census_new = pd.read_excel(DB_DIR / '01_Raw data'/'03_Census'/'Philippines'/'PSGC-4Q-2023-Publication-Datafile.xlsx', sheet_name='PSGC')
barangay_census_new = barangay_census_new[barangay_census_new['Geographic Level'] == 'Bgy']
barangay_census_new.head()


Unnamed: 0,10-digit PSGC,Name,Correspondence Code,Geographic Level,Old names,City Class,Income\nClassification,Urban / Rural\n(based on 2020 CPH),2015 Population,Unnamed: 9,2020 Population,Unnamed: 11,Status
3,102801001.0,Adams,12801001.0,Bgy,,,,R,1792,,2189,,Pob.
5,102802001.0,Bani,12802001.0,Bgy,,,,R,853,,1079,,
6,102802002.0,Buyon,12802002.0,Bgy,,,,R,1596,,1669,,
7,102802003.0,Cabaruan,12802003.0,Bgy,,,,R,1413,,1418,,
8,102802004.0,Cabulalaan,12802004.0,Bgy,,,,R,733,,733,,


In [20]:
# import barangay census data, rename the PSGC column, and convert to numeric
barangay_census = pd.read_stata(Path.home() / 'IDinsight Dropbox' / 'DOH HPLS Phase 2 CB Qual - ETF'/'1 Capacity Building'/'1 Sample Size Calculations'/'psgc_barangays.dta')
barangay_census.rename(columns={'digitPSGC':'PSGC'}, inplace=True)
barangay_census['PSGC'] = pd.to_numeric(barangay_census['PSGC'], errors='coerce')
barangay_census.head()

Unnamed: 0,PSGC,brgy_name,GeographicLevel,brgy_urbanrural,brgy_pop,K,L,Status,reg_code,prov_code,...,prov_pop,citymun_name,city_type,city_income_class,citymun_pop,pcwhs_name,pcwhs_code,pcwhs_pop,brgy_prob,urbanrural_pop
0,1400108006,Liguis,Bgy,R,973.0,1029,,,1400000000,1400100000,...,241160,La Paz,,5th,15437,Abra,1400100000.0,241160.0,0.004035,225983.0
1,1400127004,Lap-lapog,Bgy,R,765.0,916,,,1400000000,1400100000,...,241160,Villaviciosa,,5th,5392,Abra,1400100000.0,241160.0,0.003172,225983.0
2,1400111002,Ba-i,Bgy,R,839.0,731,,,1400000000,1400100000,...,241160,Lagayan,,5th,4499,Abra,1400100000.0,241160.0,0.003479,225983.0
3,1400112008,Quillat,Bgy,R,783.0,880,,,1400000000,1400100000,...,241160,Langiden,,5th,3198,Abra,1400100000.0,241160.0,0.003247,225983.0
4,1400105001,Ableg,Bgy,R,213.0,226,,,1400000000,1400100000,...,241160,Daguioman,,5th,2088,Abra,1400100000.0,241160.0,0.000883,225983.0
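
Because `errors='coerce'` silently turns any unparseable code into `NaN`, it can be worth counting how many values were lost in the conversion. The following is a minimal sketch on made-up strings, not on the actual census file; the names `raw`, `numeric`, `n_failed`, and `bad_values` are illustrative only.

```python
import pandas as pd

# Made-up PSGC strings: two valid codes and two deliberately malformed ones
raw = pd.Series(["1400108006", "1400127004", "not-a-code", "14001x8006"])

# Same conversion as in the notebook: unparseable values become NaN
numeric = pd.to_numeric(raw, errors="coerce")

# Count the failures and list the original strings that failed to parse
n_failed = int(numeric.isna().sum())
bad_values = raw[numeric.isna()].tolist()
print(f"{n_failed} codes could not be converted: {bad_values}")
```

If the count is non-zero on the real data, the listed values show which codes need manual cleaning before the merge.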


In [None]:

# import altcoder barangay borders data
barangay_borders_altcoder = gpd.read_file(DB_DIR / '01_Raw data' / '02_Admin boundary data' / 'Philippines' / 'PH_Adm4_BgySubMuns.shp'/'PH_Adm4_BgySubMuns.shp.shp')
barangay_borders_altcoder.rename(columns={'adm4_psgc':'PSGC'}, inplace=True)
barangay_borders_altcoder.to_crs(epsg=4326, inplace=True)
barangay_borders_altcoder = barangay_borders_altcoder[['PSGC', 'geometry']]

# import and clean the HDX barangay borders data (not used below; kept for reference)
barangay_borders_hdx = gpd.read_file(DB_DIR / '01_Raw data' / '02_Admin boundary data' / 'Philippines' / 'phl_adm_psa_namria_20231106_shp'/'phl_admbnda_adm4_psa_namria_20231106.shp')
# drop the first two characters of ADM4_PCODE and convert the rest to a numeric PSGC
barangay_borders_hdx['PSGC'] = pd.to_numeric(barangay_borders_hdx['ADM4_PCODE'].str[2:], errors='coerce')
barangay_borders_hdx = barangay_borders_hdx[['PSGC', 'geometry']]

# left-merge the census data with the altcoder borders so that every census row is kept
barangays = barangay_census.merge(barangay_borders_altcoder, on="PSGC", how='left')

# print the row counts of each dataset and the number of census rows with no matching border
print(f"Merged: {len(barangays)}, Borders: {len(barangay_borders_altcoder)}, Census: {len(barangay_census)}, Rows from census without borders: {sum(barangays.geometry.isna())}")

# create barangays_w_borders by removing rows with no geometry
barangays_w_borders = barangays[~barangays.geometry.isna()]
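
The left merge above only reveals census rows without borders. To also see border rows without census data, one option is an outer merge with pandas' `indicator=True` flag. This is a minimal sketch on toy data; the frames `census` and `borders` and their values are made-up stand-ins for `barangay_census` and `barangay_borders_altcoder`.

```python
import pandas as pd

# Toy stand-ins for the census and border datasets (made-up values)
census = pd.DataFrame({"PSGC": [1, 2, 3], "brgy_name": ["A", "B", "C"]})
borders = pd.DataFrame({"PSGC": [2, 3, 4], "geometry": ["poly2", "poly3", "poly4"]})

# how="outer" keeps unmatched rows from both sides; indicator=True adds a
# "_merge" column with the values "left_only", "right_only", or "both"
check = census.merge(borders, on="PSGC", how="outer", indicator=True)

# Split the unmatched codes by the side they came from
census_only = check.loc[check["_merge"] == "left_only", "PSGC"].tolist()
borders_only = check.loc[check["_merge"] == "right_only", "PSGC"].tolist()
n_both = int((check["_merge"] == "both").sum())
print(f"Matched: {n_both}, census only: {census_only}, borders only: {borders_only}")
```

Listing the unmatched codes on each side makes systematic mismatches, such as a changed region prefix, easier to spot.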

# Deal with Negros island barangays
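The census data codes the Negros Island barangays under region 18 (PSGC codes of the form 180xxxxxxx), while the more recent map data codes them under regions 6 and 7 (codes of the form 60xxxxxxx or 70xxxxxxx). I think this is because the Negros Island region was split into two regions between when the PSGC file was created (a couple of years ago) and when the map data was created. The code below recodes the unmatched map rows to the 180 prefix so that they can be merged with the census data.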
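
The recoding rule described above can be sketched in plain Python as follows. The function name `recode_negros_psgc` and the example codes are hypothetical and only illustrate the digit pattern; they are not taken from the datasets.

```python
def recode_negros_psgc(psgc: int) -> int:
    """Mirror the notebook's rule: a leading 60 or 70 is replaced with 180."""
    code = str(psgc)
    if code[:2] in ("60", "70"):
        # drop the two-digit prefix and put 180 in its place
        return int("180" + code[2:])
    return psgc


# Illustrative codes only: two that match the rule and one that does not
examples = [604501001, 704601001, 1400108006]
recoded = [recode_negros_psgc(c) for c in examples]
print(recoded)
```

Note that the rule matches any code beginning with 60 or 70, not only Negros Island codes. That is safe in this notebook because it is applied only to border rows that failed to match the census data, and the inner merge afterwards discards any recoded row that still has no match.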

In [5]:
# census rows that found no border in the first merge; drop the empty geometry column
census_no_map = barangays[barangays.geometry.isna()].drop(columns=['geometry'])

# border rows whose PSGC did not match any census row (copied so they can be modified safely)
map_no_census = barangay_borders_altcoder[~barangay_borders_altcoder.PSGC.isin(barangays.PSGC)].copy()

# for PSGC codes whose first two digits are 60 or 70, replace those two digits with 180
map_no_census['PSGC'] = map_no_census['PSGC'].astype(str)
map_no_census['PSGC'] = map_no_census['PSGC'].apply(lambda x: '180' + x[2:] if x[:2] in ['60', '70'] else x)
map_no_census['PSGC'] = pd.to_numeric(map_no_census['PSGC'])

# merge the unmatched census rows with the recoded borders
merged_negros = census_no_map.merge(map_no_census, on='PSGC', how='inner')


In [7]:
# append merged_negros to barangays_w_borders
barangays_w_borders = pd.concat([barangays_w_borders, merged_negros], ignore_index=True)

# turn it into a gdf in EPSG:4326 (the borders were already reprojected to this CRS above)
barangays_w_borders = gpd.GeoDataFrame(barangays_w_borders, geometry='geometry', crs='EPSG:4326')

# get the s2 cell id for each barangay
barangays_w_borders['s2_cell_id'] = barangays_w_borders.apply(lambda x: get_s2_cell_id(x.geometry.centroid, 4), axis=1)

In [8]:
# save barangays_w_borders to file
barangays_w_borders.to_parquet(DB_DIR / '01_Raw data'/'02_Admin boundary data'/'Philippines' / 'barangays_w_borders.parquet')