# PSI Calculations (removing small areas and ignoring the "Other" category)

* Prepare colonies dataset **[DONE]**
    * Import colonies file (identify which one) **[DONE]**
    * Add column `exclude_from_psi` **[DONE]**
    * All polygons with `area_km2` < 0.0001 get `exclude_from_psi` = True **[DONE]**
    * All USO types that should be ignored get `exclude_from_psi` = True **[DONE]**
    * Calculate only bounding box neighbors **[DONE]**
    * Turn into a function that I can easily change **[DONE]**
* Removing `exclude_from_psi` from index calculations **[DONE]**
    * If a row's USO category is one of the values in `remove_uso_category`, assign -1 to its PCEN.
    * Make sure that the final PSI ignores all excluded polygons in its calculations.
* Calculate the index with PCEN divided by (1) population; (2) population density (population/area); and (3) 1.
    * Refactor the code to support these (and other) options.
    * Can I pass in a variable that specifies the denominator? Or even hard-code it with a Python command that executes code from a string?
* Calculate Average PSI for all Services **[DONE]**
* Calculate Normalized PSI for all Services, using min-max method. **[DONE]**
    * Combine the two steps above into one function
    * Embed this function in the larger function `calc_all_services` or its equivalent
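
On the denominator question in the list above: one option that avoids executing code from a string is a lookup table of small functions keyed by the option name. The sketch below is only an illustration. `DENOMINATORS` and `pcen_denominator` are hypothetical names, and the keys `'pop'`, `'popdensity'`, and `'one'` are borrowed from the `pcen_denom` values used later in this notebook; how `calc_all_services` handles them internally is not shown here.

```python
# Hypothetical lookup table: option name -> function of (population, area_km2).
DENOMINATORS = {
    'pop': lambda population, area_km2: population,
    'popdensity': lambda population, area_km2: population / area_km2,
    'one': lambda population, area_km2: 1,
}


def pcen_denominator(population, area_km2, pcen_denom):
    """Return the PCEN denominator for one polygon (illustrative helper)."""
    if pcen_denom not in DENOMINATORS:
        raise ValueError(
            'pcen_denom must be one of {}'.format(sorted(DENOMINATORS))
        )
    return DENOMINATORS[pcen_denom](population, area_km2)


# A polygon with 1000 residents on 2 km2.
print(pcen_denominator(1000, 2.0, 'popdensity'))  # 500.0
```

Adding another option then means adding one entry to the dictionary, and an unknown option name fails loudly instead of silently running arbitrary code.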

## Import modules and set constants

In [148]:
import os
import pickle
from importlib import reload
import pandas as pd
import geopandas as gpd
from shapely.geometry import Polygon, box
import spatial_index_utils

reload(spatial_index_utils)

# Projected CRS: WGS 84 / Delhi (EPSG:7760)
epsg_code = 7760

# Columns to remove for export to ESRI Shapefile
bbox_drop_columns = ['nbrs_bbox', 'nbrs_dist_bbox', 'centroid']

## Import colonies and do pre-processing

In [149]:
from spatial_index_utils import generate_colonies_with_exclusions, calc_all_services

In [25]:
colonies_pkl_file = 'colonies_bbox_nbrs25Aug2020.pkl'
columns_to_drop = ['nbrs_bbox', 'nbrs_dist_bbox', 'index']
uso_types_to_drop = ['Other']
area_cutoff_km2 = .0001
colonies = generate_colonies_with_exclusions(colonies_pkl_file = colonies_pkl_file,
                                             columns_to_drop = columns_to_drop, 
                                             uso_types_to_drop = uso_types_to_drop,
                                             area_cutoff_km2 = area_cutoff_km2)

Calculate bounding box neighbors column: `nbrs_bbox`


100%|██████████████████████████████████████████████████████████████████████████████| 4352/4352 [00:52<00:00, 83.66it/s]

Calculate dist from polygons to their neighbors: `nbrs_dist_bbox`


100%|█████████████████████████████████████████████████████████████████████████████| 4352/4352 [00:29<00:00, 146.24it/s]
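
The exclusion rule applied in this step (small polygons and unwanted USO types get `exclude_from_psi` = True) can be sketched in plain Python as follows. This is not the code of `generate_colonies_with_exclusions`, whose source is not shown here. `flag_exclusions` is a hypothetical name, and the sketch assumes that the USO category lives in the `USO_FINAL` column.

```python
def flag_exclusions(rows, uso_types_to_drop, area_cutoff_km2):
    """Return copies of the rows with a boolean 'exclude_from_psi' key added.

    Assumption: each row is a dict with 'area_km2' and 'USO_FINAL' keys.
    """
    flagged = []
    for row in rows:
        too_small = row['area_km2'] < area_cutoff_km2
        dropped_type = row['USO_FINAL'] in uso_types_to_drop
        flagged.append({**row, 'exclude_from_psi': too_small or dropped_type})
    return flagged


# Made-up rows, for illustration only.
rows = [
    {'AREA': 'A', 'USO_FINAL': 'Planned', 'area_km2': 1.5},
    {'AREA': 'B', 'USO_FINAL': 'Other', 'area_km2': 0.8},
    {'AREA': 'C', 'USO_FINAL': 'Planned', 'area_km2': 0.00005},
]
for row in flag_exclusions(rows, ['Other'], 0.0001):
    print(row['AREA'], row['exclude_from_psi'])
```

With a GeoDataFrame the same rule would normally be written as a boolean mask over the two columns rather than a loop.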


In [127]:
colonies_bbox_nbrs = colonies.copy()

## Import services shapefiles

In [128]:
# Define filepaths

services_dir = os.path.join('shapefiles', 'Spatial_Index_GIS', 'Public Services')

bank_fp = os.path.join(services_dir, 'Banking', 'Banking.shp')
health_fp = os.path.join(services_dir, 'Health', 'Health.shp')
road_fp = os.path.join(services_dir, 'Major Road', 'Road.shp')
police_fp = os.path.join(services_dir, 'Police', 'Police Station.shp')
ration_fp = os.path.join(services_dir, 'Ration', 'Ration.shp')
school_fp = os.path.join(services_dir, 'School', 'schools7760.shp')
transport_fp = os.path.join(services_dir, 'Transport', 'Transport.shp')

# boundary of Delhi
delhi_bounds_filepath = os.path.join('shapefiles', 'delhi_bounds_buffer.shp')

# Check that all filepaths exist
filepath_list = [bank_fp, health_fp, road_fp, police_fp, ration_fp, school_fp, transport_fp, delhi_bounds_filepath]

for filepath in filepath_list:
    if not os.path.exists(filepath):
        print('{} does not exist'.format(filepath))

In [129]:
# Import services
bank = gpd.read_file(bank_fp)
health = gpd.read_file(health_fp)
road = gpd.read_file(road_fp)
police = gpd.read_file(police_fp)
ration = gpd.read_file(ration_fp)
school = gpd.read_file(school_fp)
transport = gpd.read_file(transport_fp)

There is no need to check the validity of these shapefiles; that was done previously.

In [130]:
bank.crs == health.crs == road.crs == police.crs == ration.crs == school.crs == transport.crs == colonies_bbox_nbrs.crs

True
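
The chained comparison above only reports True or False. If it ever returned False, a small helper that names the mismatching layers would make the problem easier to find. The sketch below is an assumption-laden illustration: `find_crs_mismatches` is a hypothetical name, and `SimpleNamespace` objects stand in for GeoDataFrames, since all the helper needs is a `.crs` attribute.

```python
from types import SimpleNamespace


def find_crs_mismatches(layers, reference_crs):
    """Return the names of layers whose .crs differs from reference_crs."""
    return [name for name, layer in layers.items()
            if layer.crs != reference_crs]


# Stand-ins for GeoDataFrames; only the .crs attribute is used.
layers = {
    'bank': SimpleNamespace(crs='epsg:7760'),
    'road': SimpleNamespace(crs='epsg:4326'),
}
print(find_crs_mismatches(layers, 'epsg:7760'))  # ['road']
```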

In [131]:
# Collect all point services in a dictionary so that
# one function can calculate the index for each of them
point_services = {'bank': bank,
                  'health': health,
                  'police': police,
                  'ration': ration,
                  'school': school,
                  'transport': transport}

line_services = {'road': road}

### Calculate PSI for bbox neighbors using Population Size

In [132]:
colonies_bbox_psi_popsize = calc_all_services(polygon_gdf = colonies_bbox_nbrs, 
                                       point_services = point_services, 
                                       line_services = line_services, 
                                       epsg_code = epsg_code, 
                                       pcen_denom = 'pop',
                                       nbr_dist_colname = 'nbrs_dist_bbox')

GeoDataFrame now has the following CRS:

epsg:7760
bank service index is completed
--------------------------------------------------------
GeoDataFrame now has the following CRS:

epsg:7760
health service index is completed
--------------------------------------------------------
GeoDataFrame now has the following CRS:

epsg:7760
police service index is completed
--------------------------------------------------------
GeoDataFrame now has the following CRS:

epsg:7760
ration service index is completed
--------------------------------------------------------
GeoDataFrame now has the following CRS:

epsg:7760
school service index is completed
--------------------------------------------------------
GeoDataFrame now has the following CRS:

epsg:7760
transport service index is completed
--------------------------------------------------------
all point services completed
GeoDataFrame now has the following CRS:

epsg:7760
road service is completed


In [133]:
colonies_bbox_psi_popsize.head()

Unnamed: 0,AREA,USO_AREA_U,HOUSETAX_C,USO_FINAL,geometry,area_km2,canal,railway,drain,barrier,...,school_pcen,school_idx,transport_count,transport_pcen,transport_idx,road_length,road_pcen,road_idx,unnorm_psi,norm_psi
0,NEW DELHI 36,5584,,Planned,"POLYGON Z ((1020282.788 996796.773 0.000, 1020...",1.966739,False,True,False,True,...,0.000719,0.002675,3,0.006834,0.004113,1.951928,0.003171,0.004338,0.00285,0.004912
1,NEW DELHI 35,5585,,Planned,"POLYGON Z ((1019724.475 994932.797 0.000, 1019...",0.036429,False,False,False,False,...,0.002281,0.008488,0,0.015586,0.009381,0.0,0.001117,0.001529,0.025415,0.043797
2,NEW DELHI 34,5586,,Planned,"POLYGON Z ((1019571.955 994876.019 0.000, 1019...",0.230739,False,False,False,False,...,0.003051,0.011351,6,0.004094,0.002464,0.0,0.000681,0.000932,0.006147,0.010592
3,NEW DELHI 33,5587,,Planned,"POLYGON Z ((1019352.702 994352.546 0.000, 1019...",0.281195,False,False,False,False,...,0.002216,0.008246,0,0.002169,0.001305,0.0,0.000437,0.000598,0.002901,0.005
4,NEW DELHI 32,5588,,Planned,"POLYGON Z ((1018793.292 994224.182 0.000, 1018...",0.301253,False,False,False,False,...,0.002003,0.007452,2,0.001473,0.000887,0.747875,0.00065,0.000889,0.001653,0.002848
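
The last two columns, `unnorm_psi` and `norm_psi`, correspond to the average PSI and the min-max normalized PSI from the to-do list. The sketch below shows those two steps in plain Python. It is not the code inside `calc_all_services`: the function names are hypothetical, it assumes that the average is a simple mean of the per-service `*_idx` values, and it uses an explicit exclusion mask to stand in for the `exclude_from_psi` flag.

```python
def average_index(service_indices):
    """Mean of the per-service index values for one polygon (assumed rule)."""
    return sum(service_indices) / len(service_indices)


def min_max_normalize(values, exclude=None):
    """Scale values to [0, 1] with min-max; excluded positions become None.

    The minimum and maximum are taken over the non-excluded values only,
    so excluded polygons cannot distort the scale.
    """
    exclude = exclude or [False] * len(values)
    kept = [v for v, skip in zip(values, exclude) if not skip]
    if not kept:
        return [None] * len(values)
    low, high = min(kept), max(kept)
    span = high - low
    return [
        None if skip else ((v - low) / span if span else 0.0)
        for v, skip in zip(values, exclude)
    ]


# Made-up values; the second polygon is excluded.
print(min_max_normalize([2.0, -1, 4.0, 6.0], [False, True, False, False]))
# [0.0, None, 0.5, 1.0]
```

Taking the minimum and maximum over the retained polygons matters here: a sentinel such as -1 left in the data would otherwise become the minimum and shift every normalized value.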


In [145]:
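def export_shapefile(gdf, filename, columns_to_drop):
    """Save gdf as an ESRI Shapefile, a CSV file, and a pickle file.

    `filename` is the output path without an extension. `columns_to_drop`
    are removed from the Shapefile only; the CSV and pickle keep every column.
    """
    shapefile = filename + '.shp'
    csv_file = filename + '.csv'
    pickle_file = filename + '.pkl'

    # Save as ESRI Shapefile, without the columns in columns_to_drop
    gdf.drop(columns=columns_to_drop).to_file(shapefile)

    # Save as CSV file
    gdf.to_csv(csv_file)

    # Save as pickle file
    with open(pickle_file, 'wb') as f:
        pickle.dump(gdf, f)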

In [146]:
export_shapefile(colonies_bbox_psi_popsize, 'colonies_bbox_psi_popsize', bbox_drop_columns)

### Calculate PSI for bbox neighbors using Population Density

In [150]:
colonies_bbox_psi_popdensity = calc_all_services(polygon_gdf = colonies_bbox_nbrs, 
                                       point_services = point_services, 
                                       line_services = line_services, 
                                       epsg_code = epsg_code, 
                                       pcen_denom = 'popdensity',
                                       nbr_dist_colname = 'nbrs_dist_bbox')

colonies_bbox_psi_popdensity.head()

export_shapefile(colonies_bbox_psi_popdensity, 'colonies_bbox_psi_popdensity', bbox_drop_columns)

GeoDataFrame now has the following CRS:

epsg:7760
bank service index is completed
--------------------------------------------------------
GeoDataFrame now has the following CRS:

epsg:7760
health service index is completed
--------------------------------------------------------
GeoDataFrame now has the following CRS:

epsg:7760
police service index is completed
--------------------------------------------------------
GeoDataFrame now has the following CRS:

epsg:7760
ration service index is completed
--------------------------------------------------------
GeoDataFrame now has the following CRS:

epsg:7760
school service index is completed
--------------------------------------------------------
GeoDataFrame now has the following CRS:

epsg:7760
transport service index is completed
--------------------------------------------------------
all point services completed
GeoDataFrame now has the following CRS:

epsg:7760
road service is completed


### Calculate PSI for bbox neighbors using Denominator=1

In [152]:
colonies_bbox_psi_one = calc_all_services(polygon_gdf = colonies_bbox_nbrs, 
                                       point_services = point_services, 
                                       line_services = line_services, 
                                       epsg_code = epsg_code, 
                                       pcen_denom = 'one',
                                       nbr_dist_colname = 'nbrs_dist_bbox')

export_shapefile(colonies_bbox_psi_one, 'colonies_bbox_psi_one', bbox_drop_columns)

colonies_bbox_psi_one.head()

GeoDataFrame now has the following CRS:

epsg:7760
bank service index is completed
--------------------------------------------------------
GeoDataFrame now has the following CRS:

epsg:7760
health service index is completed
--------------------------------------------------------
GeoDataFrame now has the following CRS:

epsg:7760
police service index is completed
--------------------------------------------------------
GeoDataFrame now has the following CRS:

epsg:7760
ration service index is completed
--------------------------------------------------------
GeoDataFrame now has the following CRS:

epsg:7760
school service index is completed
--------------------------------------------------------
GeoDataFrame now has the following CRS:

epsg:7760
transport service index is completed
--------------------------------------------------------
all point services completed
GeoDataFrame now has the following CRS:

epsg:7760
road service is completed


Unnamed: 0,AREA,USO_AREA_U,HOUSETAX_C,USO_FINAL,geometry,area_km2,canal,railway,drain,barrier,...,school_pcen,school_idx,transport_count,transport_pcen,transport_idx,road_length,road_pcen,road_idx,unnorm_psi,norm_psi
0,NEW DELHI 36,5584,,Planned,"POLYGON Z ((1020282.788 996796.773 0.000, 1020...",1.966739,False,True,False,True,...,2.56654,0.028971,3,24.398203,0.142039,1.951928,11.318965,0.23444,0.093715,0.096449
1,NEW DELHI 35,5585,,Planned,"POLYGON Z ((1019724.475 994932.797 0.000, 1019...",0.036429,False,False,False,False,...,0.736853,0.008318,0,5.034648,0.02931,0.0,0.36097,0.007476,0.04995,0.051408
2,NEW DELHI 34,5586,,Planned,"POLYGON Z ((1019571.955 994876.019 0.000, 1019...",0.230739,False,False,False,False,...,6.757838,0.076284,6,9.07001,0.052803,0.0,1.508212,0.031238,0.08333,0.085761
3,NEW DELHI 33,5587,,Planned,"POLYGON Z ((1019352.702 994352.546 0.000, 1019...",0.281195,False,False,False,False,...,8.767378,0.098968,0,8.579348,0.049946,0.0,1.729878,0.03583,0.071759,0.073852
4,NEW DELHI 32,5588,,Planned,"POLYGON Z ((1018793.292 994224.182 0.000, 1018...",0.301253,False,False,False,False,...,7.934982,0.089571,2,5.836379,0.033978,0.747875,2.573545,0.053304,0.032901,0.033861


## Redo PSI Calculations ignoring "Other", "Rural Villages", and areas < 0.0001 km²

In [154]:
colonies_pkl_file = 'colonies_bbox_nbrs25Aug2020.pkl'
columns_to_drop = ['nbrs_bbox', 'nbrs_dist_bbox', 'index']
uso_types_to_drop = ['Other', 'RV']
area_cutoff_km2 = .0001
colonies = generate_colonies_with_exclusions(colonies_pkl_file = colonies_pkl_file,
                                             columns_to_drop = columns_to_drop, 
                                             uso_types_to_drop = uso_types_to_drop,
                                             area_cutoff_km2 = area_cutoff_km2)

colonies_bbox_nbrs = colonies.copy()

Calculate bounding box neighbors column: `nbrs_bbox`


100%|██████████████████████████████████████████████████████████████████████████████| 4352/4352 [00:50<00:00, 86.91it/s]

Calculate dist from polygons to their neighbors: `nbrs_dist_bbox`


100%|█████████████████████████████████████████████████████████████████████████████| 4352/4352 [00:30<00:00, 141.43it/s]


### Calculate PSI for bbox neighbors using Population Size

In [155]:
colonies_no_rv_bbox_psi_popsize = calc_all_services(polygon_gdf = colonies_bbox_nbrs, 
                                       point_services = point_services, 
                                       line_services = line_services, 
                                       epsg_code = epsg_code, 
                                       pcen_denom = 'pop',
                                       nbr_dist_colname = 'nbrs_dist_bbox')

export_shapefile(colonies_no_rv_bbox_psi_popsize, 'colonies_no_rv_bbox_psi_popsize', bbox_drop_columns)

colonies_no_rv_bbox_psi_popsize.head()

GeoDataFrame now has the following CRS:

epsg:7760
bank service index is completed
--------------------------------------------------------
GeoDataFrame now has the following CRS:

epsg:7760
health service index is completed
--------------------------------------------------------
GeoDataFrame now has the following CRS:

epsg:7760
police service index is completed
--------------------------------------------------------
GeoDataFrame now has the following CRS:

epsg:7760
ration service index is completed
--------------------------------------------------------
GeoDataFrame now has the following CRS:

epsg:7760
school service index is completed
--------------------------------------------------------
GeoDataFrame now has the following CRS:

epsg:7760
transport service index is completed
--------------------------------------------------------
all point services completed
GeoDataFrame now has the following CRS:

epsg:7760
road service is completed


Unnamed: 0,AREA,USO_AREA_U,HOUSETAX_C,USO_FINAL,geometry,area_km2,canal,railway,drain,barrier,...,school_pcen,school_idx,transport_count,transport_pcen,transport_idx,road_length,road_pcen,road_idx,unnorm_psi,norm_psi
0,NEW DELHI 36,5584,,Planned,"POLYGON Z ((1020282.788 996796.773 0.000, 1020...",1.966739,False,True,False,True,...,0.000719,0.002675,3,0.006834,0.004113,1.951928,0.003171,0.004338,0.00285,0.004912
1,NEW DELHI 35,5585,,Planned,"POLYGON Z ((1019724.475 994932.797 0.000, 1019...",0.036429,False,False,False,False,...,0.002281,0.008488,0,0.015586,0.009381,0.0,0.001117,0.001529,0.025415,0.043797
2,NEW DELHI 34,5586,,Planned,"POLYGON Z ((1019571.955 994876.019 0.000, 1019...",0.230739,False,False,False,False,...,0.003051,0.011351,6,0.004094,0.002464,0.0,0.000681,0.000932,0.006147,0.010592
3,NEW DELHI 33,5587,,Planned,"POLYGON Z ((1019352.702 994352.546 0.000, 1019...",0.281195,False,False,False,False,...,0.002216,0.008246,0,0.002169,0.001305,0.0,0.000437,0.000598,0.002901,0.005
4,NEW DELHI 32,5588,,Planned,"POLYGON Z ((1018793.292 994224.182 0.000, 1018...",0.301253,False,False,False,False,...,0.002003,0.007452,2,0.001473,0.000887,0.747875,0.00065,0.000889,0.001653,0.002848


### Calculate PSI for bbox neighbors using Population Density

In [156]:
colonies_no_rv_bbox_psi_popdensity = calc_all_services(polygon_gdf = colonies_bbox_nbrs, 
                                       point_services = point_services, 
                                       line_services = line_services, 
                                       epsg_code = epsg_code, 
                                       pcen_denom = 'popdensity',
                                       nbr_dist_colname = 'nbrs_dist_bbox')

export_shapefile(colonies_no_rv_bbox_psi_popdensity, 'colonies_no_rv_bbox_psi_popdensity', bbox_drop_columns)

colonies_no_rv_bbox_psi_popdensity.head()

GeoDataFrame now has the following CRS:

epsg:7760
bank service index is completed
--------------------------------------------------------
GeoDataFrame now has the following CRS:

epsg:7760
health service index is completed
--------------------------------------------------------
GeoDataFrame now has the following CRS:

epsg:7760
police service index is completed
--------------------------------------------------------
GeoDataFrame now has the following CRS:

epsg:7760
ration service index is completed
--------------------------------------------------------
GeoDataFrame now has the following CRS:

epsg:7760
school service index is completed
--------------------------------------------------------
GeoDataFrame now has the following CRS:

epsg:7760
transport service index is completed
--------------------------------------------------------
all point services completed
GeoDataFrame now has the following CRS:

epsg:7760
road service is completed


Unnamed: 0,AREA,USO_AREA_U,HOUSETAX_C,USO_FINAL,geometry,area_km2,canal,railway,drain,barrier,...,school_pcen,school_idx,transport_count,transport_pcen,transport_idx,road_length,road_pcen,road_idx,unnorm_psi,norm_psi
0,NEW DELHI 36,5584,,Planned,"POLYGON Z ((1020282.788 996796.773 0.000, 1020...",1.966739,False,True,False,True,...,0.001414,0.225979,3,0.013441,0.411841,1.951928,0.006236,0.58929,0.282122,0.460533
1,NEW DELHI 35,5585,,Planned,"POLYGON Z ((1019724.475 994932.797 0.000, 1019...",0.036429,False,False,False,False,...,8.3e-05,0.013281,0,0.000568,0.017397,0.0,4.1e-05,0.003847,0.035857,0.058533
2,NEW DELHI 34,5586,,Planned,"POLYGON Z ((1019571.955 994876.019 0.000, 1019...",0.230739,False,False,False,False,...,0.000704,0.112503,6,0.000945,0.028948,0.0,0.000157,0.014846,0.05966,0.097388
3,NEW DELHI 33,5587,,Planned,"POLYGON Z ((1019352.702 994352.546 0.000, 1019...",0.281195,False,False,False,False,...,0.000623,0.099598,0,0.00061,0.018685,0.0,0.000123,0.01162,0.036421,0.059453
4,NEW DELHI 32,5588,,Planned,"POLYGON Z ((1018793.292 994224.182 0.000, 1018...",0.301253,False,False,False,False,...,0.000603,0.096431,2,0.000444,0.013598,0.747875,0.000196,0.018493,0.021949,0.03583


### Calculate PSI for bbox neighbors using Denominator=1

In [157]:
colonies_no_rv_bbox_psi_one = calc_all_services(polygon_gdf = colonies_bbox_nbrs, 
                                       point_services = point_services, 
                                       line_services = line_services, 
                                       epsg_code = epsg_code, 
                                       pcen_denom = 'one',
                                       nbr_dist_colname = 'nbrs_dist_bbox')

export_shapefile(colonies_no_rv_bbox_psi_one, 'colonies_no_rv_bbox_psi_one', bbox_drop_columns)

colonies_no_rv_bbox_psi_one.head()

GeoDataFrame now has the following CRS:

epsg:7760
bank service index is completed
--------------------------------------------------------
GeoDataFrame now has the following CRS:

epsg:7760
health service index is completed
--------------------------------------------------------
GeoDataFrame now has the following CRS:

epsg:7760
police service index is completed
--------------------------------------------------------
GeoDataFrame now has the following CRS:

epsg:7760
ration service index is completed
--------------------------------------------------------
GeoDataFrame now has the following CRS:

epsg:7760
school service index is completed
--------------------------------------------------------
GeoDataFrame now has the following CRS:

epsg:7760
transport service index is completed
--------------------------------------------------------
all point services completed
GeoDataFrame now has the following CRS:

epsg:7760
road service is completed


Unnamed: 0,AREA,USO_AREA_U,HOUSETAX_C,USO_FINAL,geometry,area_km2,canal,railway,drain,barrier,...,school_pcen,school_idx,transport_count,transport_pcen,transport_idx,road_length,road_pcen,road_idx,unnorm_psi,norm_psi
0,NEW DELHI 36,5584,,Planned,"POLYGON Z ((1020282.788 996796.773 0.000, 1020...",1.966739,False,True,False,True,...,2.56654,0.028971,3,24.398203,0.142039,1.951928,11.318965,0.23444,0.093715,0.096449
1,NEW DELHI 35,5585,,Planned,"POLYGON Z ((1019724.475 994932.797 0.000, 1019...",0.036429,False,False,False,False,...,0.736853,0.008318,0,5.034648,0.02931,0.0,0.36097,0.007476,0.04995,0.051408
2,NEW DELHI 34,5586,,Planned,"POLYGON Z ((1019571.955 994876.019 0.000, 1019...",0.230739,False,False,False,False,...,6.757838,0.076284,6,9.07001,0.052803,0.0,1.508212,0.031238,0.08333,0.085761
3,NEW DELHI 33,5587,,Planned,"POLYGON Z ((1019352.702 994352.546 0.000, 1019...",0.281195,False,False,False,False,...,8.767378,0.098968,0,8.579348,0.049946,0.0,1.729878,0.03583,0.071759,0.073852
4,NEW DELHI 32,5588,,Planned,"POLYGON Z ((1018793.292 994224.182 0.000, 1018...",0.301253,False,False,False,False,...,7.934982,0.089571,2,5.836379,0.033978,0.747875,2.573545,0.053304,0.032901,0.033861
