# Pre-processing Colonies Dataset (24-25 August 2020)

* Data Pre-processing **[done]**
    * Import colonies
    * Import barrier files – reproject all to EPSG:7760
    * Check validity of all shapefiles (turn this into a function) and check that all geometries fall within Delhi (this may belong in the spatial index notebook and the UAC deduplication step)
* Compute barrier clip for all colonies **[done]**
* Run Neighbors Algorithm **[done]**
    * Touching Neighbors algorithm - Modify so that it ignores NDMC and related areas (The NDMC / DCB polygons are coded as NDMC and DCB)
    * bbox Neighbors algorithm
    * Should check for barriers
    * Should check for NDMC and related areas
    * Save as two separate columns: touching neighbors and bbox neighbors 
* Additional preprocessing for colonies (turn into super function) **[done]**
    * Create index column **[done]**
    * Distance from NDMC **[done]**
    * Area of each polygon **[done]**
* Merge with 2020 Population data **[done]**
* Export GeoDataFrame as pickle file and ESRI Shapefiles

## Import modules and set constants

In [25]:
import os
import pickle
from importlib import reload
import pandas as pd
import geopandas as gpd
from shapely.geometry import Polygon, box
import spatial_index_utils

In [26]:
reload(spatial_index_utils)

<module 'spatial_index_utils' from 'C:\\Users\\bwbel\\Google Drive\\slum_project\\spatial_index_python\\spatial_index_utils.py'>

In [27]:
# WGS 84 / Delhi
epsg_code = 7760

## Import shapefiles **[done]**

In [4]:
colony_filepath = os.path.join('shapefiles', 'Spatial_Index_GIS', 'Colony_Shapefile', 
                        'USO23Aug2020.shp')

barrier_directory = os.path.join('shapefiles', 'Barrier_Clip')

canal_filepath = os.path.join(barrier_directory, 'Canal', 'Canal.shp')
drain_filepath = os.path.join(barrier_directory, 'Drain', 'Major_Drain.shp')
railway_filepath = os.path.join(barrier_directory, 'Railway', 'Railway_Line.shp')

# boundary of Delhi
delhi_bounds_filepath = os.path.join('shapefiles', 'delhi_bounds_buffer.shp')

# Check that all filepaths exist
filepath_list = [colony_filepath, canal_filepath, drain_filepath, railway_filepath, delhi_bounds_filepath]

for filepath in filepath_list:
    if not os.path.exists(filepath):
        print('{} does not exist'.format(filepath))

In [5]:
colonies = gpd.read_file(colony_filepath)

## Inspect shapefiles for validity (`check_shapefile`) **[done]**

In [9]:
spatial_index_utils.check_shapefile(gdf=colonies, gdf_name='colonies', 
                                    geom_type='Polygon', 
                                    delhi_bounds_filepath=delhi_bounds_filepath)

colonies has duplicate rows: False
----------------------------------------------------
rows with invalid geometries 

----------------------------------------------------
all geometries in colonies are of type Polygon: True
----------------------------------------------------
Rows with None value in geometry column are below
Empty GeoDataFrame
Columns: [AREA, USO_AREA_U, HOUSETAX_C, USO_FINAL, geometry, geom_type]
Index: []
----------------------------------------------------
colonies shapefile is contained within Delhi: True
----------------------------------------------------
Done with shapefile evaluation


In [10]:
spatial_index_utils.check_shapefile(gdf=canal, gdf_name='canal', geom_type='Line', 
                                    delhi_bounds_filepath=delhi_bounds_filepath)

canal has duplicate rows: False
----------------------------------------------------
rows with invalid geometries 

----------------------------------------------------
all geometries in canal are of type Line: True
----------------------------------------------------
Rows with None value in geometry column are below
Empty GeoDataFrame
Columns: [FID_1, CAN_NM, CAN_CLSF, EL_GND, DIST_NM, geometry, geom_type]
Index: []
----------------------------------------------------
canal shapefile is contained within Delhi: True
----------------------------------------------------
Done with shapefile evaluation


In [11]:
spatial_index_utils.check_shapefile(gdf=drain, gdf_name='drain', geom_type='Line', 
                                    delhi_bounds_filepath=delhi_bounds_filepath)

drain has duplicate rows: False
----------------------------------------------------
rows with invalid geometries 

----------------------------------------------------
all geometries in drain are of type Line: True
----------------------------------------------------
Rows with None value in geometry column are below
Empty GeoDataFrame
Columns: [FID, Drain_type, Drain_Name, MAINTAINED, AC_NAME, DISTRICT, geometry, geom_type]
Index: []
----------------------------------------------------
drain shapefile is contained within Delhi: True
----------------------------------------------------
Done with shapefile evaluation


In [12]:
spatial_index_utils.check_shapefile(gdf=railway, gdf_name='railway', geom_type='Line', 
                                    delhi_bounds_filepath=delhi_bounds_filepath)

railway has duplicate rows: False
----------------------------------------------------
rows with invalid geometries 

----------------------------------------------------
all geometries in railway are of type Line: True
----------------------------------------------------
Rows with None value in geometry column are below
Empty GeoDataFrame
Columns: [FID_1, RL_ZONE, geometry, geom_type]
Index: []
----------------------------------------------------
railway shapefile is contained within Delhi: True
----------------------------------------------------
Done with shapefile evaluation
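
The source of `check_shapefile` is not shown in this notebook. As a rough sketch of the kinds of checks its printed report suggests (duplicates, invalid geometries, geometry type, missing geometries, containment in a boundary), something like the following could be used. The function name `summarize_validity`, its return format, and the toy geometries are assumptions for illustration, not the actual implementation in `spatial_index_utils`.

```python
import geopandas as gpd
from shapely.geometry import Polygon, box


def summarize_validity(gdf, expected_types, bounds_geom):
    """Return basic health checks for a GeoDataFrame (hypothetical helper)."""
    geoms = gdf.geometry
    present = geoms[geoms.notna()]          # rows that actually have a geometry
    valid = present[present.is_valid]       # only test containment on valid shapes
    return {
        "n_missing_geometry": int(geoms.isna().sum()),
        "n_invalid": int((~present.is_valid).sum()),
        # byte-identical geometries are counted as duplicates
        "n_duplicate_geometry": int(present.apply(lambda g: g.wkb).duplicated().sum()),
        "all_expected_type": bool(present.geom_type.isin(expected_types).all()),
        "all_within_bounds": bool(valid.within(bounds_geom).all()),
    }


square = box(0, 0, 1, 1)
bowtie = Polygon([(2, 2), (3, 3), (3, 2), (2, 3)])  # self-intersecting, so invalid
demo = gpd.GeoDataFrame({"name": ["a", "b", "c"]}, geometry=[square, bowtie, square])

report = summarize_validity(demo, {"Polygon", "MultiPolygon"}, box(-10, -10, 10, 10))
print(report)
```

Returning a dictionary instead of printing makes it easier to assert on the result when the same check is run over several layers.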


## Remove duplicate geometries **[done]**

In [18]:
#canal = spatial_index_utils.remove_duplicate_geom(canal)
#drain = spatial_index_utils.remove_duplicate_geom(drain)
#railway = spatial_index_utils.remove_duplicate_geom(railway)

#with open('canal.data', 'wb') as f:
#    pickle.dump(canal, f)

#with open('drain.data', 'wb') as f:
#    pickle.dump(drain, f)

#with open('railway.data', 'wb') as f:
#    pickle.dump(railway, f)

In [22]:
colonies = spatial_index_utils.remove_duplicate_geom(colonies)

Original number of rows is 4352:
New number of rows after deduplication is: 4352


In [31]:
len(colonies)

4352
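
For reference, one way geometry deduplication can be done in geopandas is to compare the WKB bytes of each geometry and keep the first occurrence. This is only a sketch: `drop_duplicate_geometries` is a hypothetical name, and whether `remove_duplicate_geom` works this way is not shown in the notebook. Note that this approach treats two shapes as duplicates only if their coordinates are byte-identical.

```python
import geopandas as gpd
from shapely.geometry import Point


def drop_duplicate_geometries(gdf):
    """Keep the first row for each byte-identical geometry (hypothetical helper)."""
    wkb = gdf.geometry.apply(lambda g: g.wkb)   # hashable representation of each shape
    deduped = gdf.loc[~wkb.duplicated()].copy()
    print(f"rows before: {len(gdf)}, rows after: {len(deduped)}")
    return deduped


demo = gpd.GeoDataFrame(
    {"id": [1, 2, 3]},
    geometry=[Point(0, 0), Point(1, 1), Point(0, 0)],  # third row repeats the first
)
deduped = drop_duplicate_geometries(demo)
```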

## Start here, 25 August 2020 (download colonies and barrier files)

### First run the module and shapefile import cells above

In [28]:
with open('canal.data', 'rb') as f:
    canal = pickle.load(f)

In [29]:
with open('drain.data', 'rb') as f:
    drain = pickle.load(f)

In [30]:
with open('railway.data', 'rb') as f:
    railway = pickle.load(f)

## Check CRS, reproject to EPSG:7760.

In [32]:
colonies.crs

<Projected CRS: EPSG:7760>
Name: WGS 84 / Delhi
Axis Info [cartesian]:
- X[east]: Easting (metre)
- Y[north]: Northing (metre)
Area of Use:
- name: India - Delhi
- bounds: (76.83, 28.4, 77.34, 28.89)
Coordinate Operation:
- name: Delhi NSF LCC
- method: Lambert Conic Conformal (2SP)
Datum: World Geodetic System 1984
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

In [33]:
canal = spatial_index_utils.reproject_gdf(canal, epsg_code)

GeoDataFrame now has the following CRS:

PROJCRS["WGS 84 / Delhi",BASEGEOGCRS["WGS 84",DATUM["World Geodetic System 1984",ELLIPSOID["WGS 84",6378137,298.257223563,LENGTHUNIT["metre",1]]],PRIMEM["Greenwich",0,ANGLEUNIT["degree",0.0174532925199433]],ID["EPSG",4326]],CONVERSION["Delhi NSF LCC",METHOD["Lambert Conic Conformal (2SP)",ID["EPSG",9802]],PARAMETER["Latitude of false origin",28.62510126,ANGLEUNIT["degree",0.0174532925199433],ID["EPSG",8821]],PARAMETER["Longitude of false origin",77,ANGLEUNIT["degree",0.0174532925199433],ID["EPSG",8822]],PARAMETER["Latitude of 1st standard parallel",28.375,ANGLEUNIT["degree",0.0174532925199433],ID["EPSG",8823]],PARAMETER["Latitude of 2nd standard parallel",28.875,ANGLEUNIT["degree",0.0174532925199433],ID["EPSG",8824]],PARAMETER["Easting at false origin",1000000,LENGTHUNIT["metre",1],ID["EPSG",8826]],PARAMETER["Northing at false origin",1000000,LENGTHUNIT["metre",1],ID["EPSG",8827]]],CS[Cartesian,2],AXIS["easting (X)",east,ORDER[1],LENGTHUNIT["met

In [34]:
drain = spatial_index_utils.reproject_gdf(drain, epsg_code)

GeoDataFrame now has the following CRS:

PROJCRS["WGS 84 / Delhi",BASEGEOGCRS["WGS 84",DATUM["World Geodetic System 1984",ELLIPSOID["WGS 84",6378137,298.257223563,LENGTHUNIT["metre",1]]],PRIMEM["Greenwich",0,ANGLEUNIT["degree",0.0174532925199433]],ID["EPSG",4326]],CONVERSION["Delhi NSF LCC",METHOD["Lambert Conic Conformal (2SP)",ID["EPSG",9802]],PARAMETER["Latitude of false origin",28.62510126,ANGLEUNIT["degree",0.0174532925199433],ID["EPSG",8821]],PARAMETER["Longitude of false origin",77,ANGLEUNIT["degree",0.0174532925199433],ID["EPSG",8822]],PARAMETER["Latitude of 1st standard parallel",28.375,ANGLEUNIT["degree",0.0174532925199433],ID["EPSG",8823]],PARAMETER["Latitude of 2nd standard parallel",28.875,ANGLEUNIT["degree",0.0174532925199433],ID["EPSG",8824]],PARAMETER["Easting at false origin",1000000,LENGTHUNIT["metre",1],ID["EPSG",8826]],PARAMETER["Northing at false origin",1000000,LENGTHUNIT["metre",1],ID["EPSG",8827]]],CS[Cartesian,2],AXIS["easting (X)",east,ORDER[1],LENGTHUNIT["met

In [35]:
railway = spatial_index_utils.reproject_gdf(railway, epsg_code)

GeoDataFrame now has the following CRS:

PROJCRS["WGS 84 / Delhi",BASEGEOGCRS["WGS 84",DATUM["World Geodetic System 1984",ELLIPSOID["WGS 84",6378137,298.257223563,LENGTHUNIT["metre",1]]],PRIMEM["Greenwich",0,ANGLEUNIT["degree",0.0174532925199433]],ID["EPSG",4326]],CONVERSION["Delhi NSF LCC",METHOD["Lambert Conic Conformal (2SP)",ID["EPSG",9802]],PARAMETER["Latitude of false origin",28.62510126,ANGLEUNIT["degree",0.0174532925199433],ID["EPSG",8821]],PARAMETER["Longitude of false origin",77,ANGLEUNIT["degree",0.0174532925199433],ID["EPSG",8822]],PARAMETER["Latitude of 1st standard parallel",28.375,ANGLEUNIT["degree",0.0174532925199433],ID["EPSG",8823]],PARAMETER["Latitude of 2nd standard parallel",28.875,ANGLEUNIT["degree",0.0174532925199433],ID["EPSG",8824]],PARAMETER["Easting at false origin",1000000,LENGTHUNIT["metre",1],ID["EPSG",8826]],PARAMETER["Northing at false origin",1000000,LENGTHUNIT["metre",1],ID["EPSG",8827]]],CS[Cartesian,2],AXIS["easting (X)",east,ORDER[1],LENGTHUNIT["met

In [36]:
colonies.crs == drain.crs == canal.crs == railway.crs

True
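
The reprojection step can be sketched with geopandas' own `to_crs`. The helper name `reproject` below is hypothetical and is not the notebook's `reproject_gdf`. The demo point is placed at the false origin listed in the CRS definition above (longitude 77, latitude 28.62510126), which should map to an easting and northing of 1,000,000 m each.

```python
import geopandas as gpd
from shapely.geometry import Point


def reproject(gdf, epsg_code):
    """Return gdf in the requested CRS, skipping the work if it already matches."""
    if gdf.crs is None:
        raise ValueError("GeoDataFrame has no CRS; set one before reprojecting")
    if gdf.crs.to_epsg() == epsg_code:
        return gdf
    return gdf.to_crs(epsg=epsg_code)


# A single point in geographic WGS 84 coordinates (longitude, latitude)
demo = gpd.GeoDataFrame(
    {"name": ["false origin"]},
    geometry=[Point(77.0, 28.62510126)],
    crs="EPSG:4326",
)
projected = reproject(demo, 7760)
print(projected.geometry.iloc[0])
```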

## Calculate Area (in square kilometers)

In [37]:
colonies['area_km2'] = colonies.area/1000000

In [38]:
colonies.head()

Unnamed: 0,index,AREA,USO_AREA_U,HOUSETAX_C,USO_FINAL,geometry,geom_type,area_km2
0,0,NEW DELHI 36,5584,,Planned,"POLYGON Z ((1020282.788 996796.773 0.000, 1020...",<class 'geopandas.geoseries.GeoSeries'>,1.966739
1,1,NEW DELHI 35,5585,,Planned,"POLYGON Z ((1019724.475 994932.797 0.000, 1019...",<class 'geopandas.geoseries.GeoSeries'>,0.036429
2,2,NEW DELHI 34,5586,,Planned,"POLYGON Z ((1019571.955 994876.019 0.000, 1019...",<class 'geopandas.geoseries.GeoSeries'>,0.230739
3,3,NEW DELHI 33,5587,,Planned,"POLYGON Z ((1019352.702 994352.546 0.000, 1019...",<class 'geopandas.geoseries.GeoSeries'>,0.281195
4,4,NEW DELHI 32,5588,,Planned,"POLYGON Z ((1018793.292 994224.182 0.000, 1018...",<class 'geopandas.geoseries.GeoSeries'>,0.301253


In [39]:
colonies['area_km2'].max()

29.1165489652962

In [40]:
colonies['area_km2'].min()

2.3028217018471078e-09
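
The minimum above corresponds to roughly 0.0023 square metres, which may indicate sliver polygons. A small sketch of the unit conversion and of flagging very small polygons follows. The cutoff `MIN_AREA_M2` and the toy geometries are assumptions chosen for illustration; the notebook itself does not filter by area.

```python
import geopandas as gpd
from shapely.geometry import box

M2_PER_KM2 = 1_000_000
MIN_AREA_M2 = 1.0  # hypothetical cutoff, not taken from the notebook

# Coordinates are assumed to be in metres, as they are in EPSG:7760
demo = gpd.GeoDataFrame(
    {"name": ["block", "sliver"]},
    geometry=[box(0, 0, 1000, 1000), box(0, 0, 0.05, 0.05)],
)

demo["area_km2"] = demo.area / M2_PER_KM2   # square metres to square kilometres
slivers = demo[demo.area < MIN_AREA_M2]     # candidates for manual inspection
print(slivers[["name", "area_km2"]])
```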

## Compute barrier clip

In [41]:
colonies.head()

Unnamed: 0,index,AREA,USO_AREA_U,HOUSETAX_C,USO_FINAL,geometry,geom_type,area_km2
0,0,NEW DELHI 36,5584,,Planned,"POLYGON Z ((1020282.788 996796.773 0.000, 1020...",<class 'geopandas.geoseries.GeoSeries'>,1.966739
1,1,NEW DELHI 35,5585,,Planned,"POLYGON Z ((1019724.475 994932.797 0.000, 1019...",<class 'geopandas.geoseries.GeoSeries'>,0.036429
2,2,NEW DELHI 34,5586,,Planned,"POLYGON Z ((1019571.955 994876.019 0.000, 1019...",<class 'geopandas.geoseries.GeoSeries'>,0.230739
3,3,NEW DELHI 33,5587,,Planned,"POLYGON Z ((1019352.702 994352.546 0.000, 1019...",<class 'geopandas.geoseries.GeoSeries'>,0.281195
4,4,NEW DELHI 32,5588,,Planned,"POLYGON Z ((1018793.292 994224.182 0.000, 1018...",<class 'geopandas.geoseries.GeoSeries'>,0.301253


In [42]:
# Note: reset_index() was needed to make the spatial join work.
# It left behind an 'index' column, which is dropped here.
#colonies = colonies.reset_index()
colonies = colonies.drop(columns=['index'])

In [43]:
colonies.head()

Unnamed: 0,AREA,USO_AREA_U,HOUSETAX_C,USO_FINAL,geometry,geom_type,area_km2
0,NEW DELHI 36,5584,,Planned,"POLYGON Z ((1020282.788 996796.773 0.000, 1020...",<class 'geopandas.geoseries.GeoSeries'>,1.966739
1,NEW DELHI 35,5585,,Planned,"POLYGON Z ((1019724.475 994932.797 0.000, 1019...",<class 'geopandas.geoseries.GeoSeries'>,0.036429
2,NEW DELHI 34,5586,,Planned,"POLYGON Z ((1019571.955 994876.019 0.000, 1019...",<class 'geopandas.geoseries.GeoSeries'>,0.230739
3,NEW DELHI 33,5587,,Planned,"POLYGON Z ((1019352.702 994352.546 0.000, 1019...",<class 'geopandas.geoseries.GeoSeries'>,0.281195
4,NEW DELHI 32,5588,,Planned,"POLYGON Z ((1018793.292 994224.182 0.000, 1018...",<class 'geopandas.geoseries.GeoSeries'>,0.301253


In [44]:
# Create new columns showing intersection with canal, railway and drain
colonies = spatial_index_utils.barrier_intersection(colonies, canal, "canal")

In [45]:
colonies = spatial_index_utils.barrier_intersection(colonies, railway, "railway")

In [46]:
colonies = spatial_index_utils.barrier_intersection(colonies, drain, "drain")

In [47]:
# Flag a colony as having a barrier if it intersects a canal, railway or drain
colonies['barrier'] = colonies['canal'] | colonies['railway'] | colonies["drain"]

In [48]:
colonies.head()

Unnamed: 0,AREA,USO_AREA_U,HOUSETAX_C,USO_FINAL,geometry,geom_type,area_km2,canal,railway,drain,barrier
0,NEW DELHI 36,5584,,Planned,"POLYGON Z ((1020282.788 996796.773 0.000, 1020...",<class 'geopandas.geoseries.GeoSeries'>,1.966739,False,True,False,True
1,NEW DELHI 35,5585,,Planned,"POLYGON Z ((1019724.475 994932.797 0.000, 1019...",<class 'geopandas.geoseries.GeoSeries'>,0.036429,False,False,False,False
2,NEW DELHI 34,5586,,Planned,"POLYGON Z ((1019571.955 994876.019 0.000, 1019...",<class 'geopandas.geoseries.GeoSeries'>,0.230739,False,False,False,False
3,NEW DELHI 33,5587,,Planned,"POLYGON Z ((1019352.702 994352.546 0.000, 1019...",<class 'geopandas.geoseries.GeoSeries'>,0.281195,False,False,False,False
4,NEW DELHI 32,5588,,Planned,"POLYGON Z ((1018793.292 994224.182 0.000, 1018...",<class 'geopandas.geoseries.GeoSeries'>,0.301253,False,False,False,False


In [49]:
len(colonies)

4352
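
As an illustration of how a per-barrier flag could be computed, the sketch below merges all barrier lines into one geometry and tests each polygon against it. `flag_barrier` is a hypothetical stand-in; the real `barrier_intersection` in `spatial_index_utils` may work differently (for example with a spatial join).

```python
import geopandas as gpd
from shapely.geometry import LineString, box
from shapely.ops import unary_union


def flag_barrier(polygons, barriers, colname):
    """Add a boolean column: True where a polygon intersects any barrier line."""
    merged = unary_union(list(barriers.geometry))  # one geometry for all lines
    out = polygons.copy()
    out[colname] = out.geometry.intersects(merged)
    return out


polys = gpd.GeoDataFrame({"id": [1, 2]}, geometry=[box(0, 0, 1, 1), box(5, 5, 6, 6)])
rail = gpd.GeoDataFrame(geometry=[LineString([(-1, 0.5), (2, 0.5)])])     # crosses polygon 1
canal_demo = gpd.GeoDataFrame(geometry=[LineString([(10, 10), (11, 11)])])  # crosses nothing

flagged = flag_barrier(polys, rail, "railway")
flagged = flag_barrier(flagged, canal_demo, "canal")
flagged["barrier"] = flagged["canal"] | flagged["railway"]
print(flagged[["id", "canal", "railway", "barrier"]])
```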

## Calculate centroid for each polygon

In [52]:
colonies['centroid'] = colonies.centroid
colonies.head()

Unnamed: 0,AREA,USO_AREA_U,HOUSETAX_C,USO_FINAL,geometry,geom_type,area_km2,canal,railway,drain,barrier,centroid
0,NEW DELHI 36,5584,,Planned,"POLYGON Z ((1020282.788 996796.773 0.000, 1020...",<class 'geopandas.geoseries.GeoSeries'>,1.966739,False,True,False,True,POINT (1020123.175 995898.851)
1,NEW DELHI 35,5585,,Planned,"POLYGON Z ((1019724.475 994932.797 0.000, 1019...",<class 'geopandas.geoseries.GeoSeries'>,0.036429,False,False,False,False,POINT (1019673.024 994869.699)
2,NEW DELHI 34,5586,,Planned,"POLYGON Z ((1019571.955 994876.019 0.000, 1019...",<class 'geopandas.geoseries.GeoSeries'>,0.230739,False,False,False,False,POINT (1019485.484 994565.783)
3,NEW DELHI 33,5587,,Planned,"POLYGON Z ((1019352.702 994352.546 0.000, 1019...",<class 'geopandas.geoseries.GeoSeries'>,0.281195,False,False,False,False,POINT (1019171.868 994576.688)
4,NEW DELHI 32,5588,,Planned,"POLYGON Z ((1018793.292 994224.182 0.000, 1018...",<class 'geopandas.geoseries.GeoSeries'>,0.301253,False,False,False,False,POINT (1018785.675 994590.275)


## Distance from NDMC (turn into function)

In [53]:
# ndmc_center shapefile location
ndmc_center_filepath = os.path.join('shapefiles', 'ndmc_center7760.shp')

# Import shapefile
ndmc_center = gpd.read_file(ndmc_center_filepath)

# Extract NDMC Center as Shapely Point
ndmc_center_point = ndmc_center['geometry'].values[0]

# Compute the distance from the NDMC center to each colony
# Initialize the new column as float so the distances assigned below keep a float dtype
colonies['ndmc_dist_km'] = 0.0

# Compute distance from NDMC to centroid of each polygon
# Division by 1000 turns units into kilometers
for idx, row in colonies.iterrows():
    colonies.loc[idx, 'ndmc_dist_km'] = ndmc_center_point.distance(row['centroid'])/1000
    
colonies.head()

Unnamed: 0,AREA,USO_AREA_U,HOUSETAX_C,USO_FINAL,geometry,geom_type,area_km2,canal,railway,drain,barrier,centroid,ndmc_dist_km
0,NEW DELHI 36,5584,,Planned,"POLYGON Z ((1020282.788 996796.773 0.000, 1020...",<class 'geopandas.geoseries.GeoSeries'>,1.966739,False,True,False,True,POINT (1020123.175 995898.851),5.159809
1,NEW DELHI 35,5585,,Planned,"POLYGON Z ((1019724.475 994932.797 0.000, 1019...",<class 'geopandas.geoseries.GeoSeries'>,0.036429,False,False,False,False,POINT (1019673.024 994869.699),6.273149
2,NEW DELHI 34,5586,,Planned,"POLYGON Z ((1019571.955 994876.019 0.000, 1019...",<class 'geopandas.geoseries.GeoSeries'>,0.230739,False,False,False,False,POINT (1019485.484 994565.783),6.618792
3,NEW DELHI 33,5587,,Planned,"POLYGON Z ((1019352.702 994352.546 0.000, 1019...",<class 'geopandas.geoseries.GeoSeries'>,0.281195,False,False,False,False,POINT (1019171.868 994576.688),6.709542
4,NEW DELHI 32,5588,,Planned,"POLYGON Z ((1018793.292 994224.182 0.000, 1018...",<class 'geopandas.geoseries.GeoSeries'>,0.301253,False,False,False,False,POINT (1018785.675 994590.275),6.839299
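
Since the heading asks for this step to become a function, here is one possible shape for it. It is a sketch only: the name `add_distance_km` and its parameters are hypothetical, and it uses a vectorized `GeoSeries.distance` call instead of the row-by-row loop above. Coordinates are assumed to be in metres.

```python
import geopandas as gpd
from shapely.geometry import Point, box


def add_distance_km(gdf, target_point, centroid_col="centroid", out_col="dist_km"):
    """Add a column with the distance (km) from each centroid to target_point."""
    out = gdf.copy()
    centroids = gpd.GeoSeries(out[centroid_col])
    out[out_col] = centroids.distance(target_point) / 1000  # metres to kilometres
    return out


demo = gpd.GeoDataFrame(
    {"id": [1, 2]},
    geometry=[box(0, 0, 2000, 2000), box(3000, 3000, 5000, 5000)],
)
demo["centroid"] = demo.centroid
demo = add_distance_km(demo, Point(1000, 1000), out_col="ndmc_dist_km")
print(demo[["id", "ndmc_dist_km"]])
```

The vectorized call avoids `iterrows`, which tends to be slow on a few thousand rows and can trigger dtype warnings when floats are written into an integer column.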


## Merge population data (2020) with colonies dataset

In [54]:
worldpop2020_filepath = os.path.join('population_data', 'pop_colony_wp_2020.csv')

# Import 2020 population data
worldpop2020 = pd.read_csv(worldpop2020_filepath)

# Restrict dataframe to only two columns:
# layer: population data
# uso_area_u: unique id for colonies
worldpop2020 = worldpop2020[['layer', 'uso_area_u']]
worldpop2020.head()

# Merge population data with colonies data
colonies = colonies.merge(worldpop2020, how='inner', 
                          left_on="USO_AREA_U", right_on='uso_area_u')

# Rename 'layer' column as 'population'
colonies = colonies.rename(columns={'layer': 'population'})

# Remove extraneous columns
colonies = colonies.drop(columns=['uso_area_u'])

colonies.head()

Unnamed: 0,AREA,USO_AREA_U,HOUSETAX_C,USO_FINAL,geometry,geom_type,area_km2,canal,railway,drain,barrier,centroid,ndmc_dist_km,population
0,NEW DELHI 36,5584,,Planned,"POLYGON Z ((1020282.788 996796.773 0.000, 1020...",<class 'geopandas.geoseries.GeoSeries'>,1.966739,False,True,False,True,POINT (1020123.175 995898.851),5.159809,3570.060984
1,NEW DELHI 35,5585,,Planned,"POLYGON Z ((1019724.475 994932.797 0.000, 1019...",<class 'geopandas.geoseries.GeoSeries'>,0.036429,False,False,False,False,POINT (1019673.024 994869.699),6.273149,323.028887
2,NEW DELHI 34,5586,,Planned,"POLYGON Z ((1019571.955 994876.019 0.000, 1019...",<class 'geopandas.geoseries.GeoSeries'>,0.230739,False,False,False,False,POINT (1019485.484 994565.783),6.618792,2215.206473
3,NEW DELHI 33,5587,,Planned,"POLYGON Z ((1019352.702 994352.546 0.000, 1019...",<class 'geopandas.geoseries.GeoSeries'>,0.281195,False,False,False,False,POINT (1019171.868 994576.688),6.709542,3956.166944
4,NEW DELHI 32,5588,,Planned,"POLYGON Z ((1018793.292 994224.182 0.000, 1018...",<class 'geopandas.geoseries.GeoSeries'>,0.301253,False,False,False,False,POINT (1018785.675 994590.275),6.839299,3961.943378
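
Because the merge above uses `how='inner'`, any colony without a matching population row would be dropped silently. A quick way to check for that, sketched here with toy data (the values are illustrative, not the real dataset), is to run a left merge with pandas' `indicator` and `validate` options first.

```python
import pandas as pd

# Toy stand-ins for the colonies table and the population CSV
left = pd.DataFrame({"USO_AREA_U": [5584, 5585, 5586], "AREA": ["A", "B", "C"]})
right = pd.DataFrame({"uso_area_u": [5584, 5585], "layer": [3570.06, 323.03]})

check = left.merge(
    right,
    how="left",
    left_on="USO_AREA_U",
    right_on="uso_area_u",
    validate="one_to_one",  # raises if either key column has duplicates
    indicator=True,         # adds a _merge column: both / left_only / right_only
)

unmatched = check.loc[check["_merge"] == "left_only", "USO_AREA_U"].tolist()
print(f"{len(unmatched)} colonies have no population row: {unmatched}")
```

Comparing `len(colonies)` before and after the real merge gives the same information more cheaply, but the indicator column also shows which IDs are missing.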


## Create GeoDataFrame with Bounding Box of each Polygon

In [56]:
colonies_bbox = spatial_index_utils.create_bbox_gdf(colonies)
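
A bounding-box GeoDataFrame can be built by replacing each geometry with its envelope. The sketch below uses geopandas' `envelope` attribute; `make_bbox_gdf` is a hypothetical name and may not match how `create_bbox_gdf` is written.

```python
import geopandas as gpd
from shapely.geometry import Polygon


def make_bbox_gdf(gdf):
    """Return a copy of gdf whose geometries are axis-aligned bounding boxes."""
    out = gdf.copy()
    out[out.geometry.name] = out.geometry.envelope
    return out


triangle = Polygon([(0, 0), (4, 0), (0, 3)])
demo = gpd.GeoDataFrame({"id": [1]}, geometry=[triangle])
bbox_demo = make_bbox_gdf(demo)

print(demo.geometry.iloc[0].area, bbox_demo.geometry.iloc[0].area)
```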

## Spatial Join (intersection of polygon geometries)

In [58]:
colonies_touch_nbrs = spatial_index_utils.add_polygon_neighbors_column_fast(polygon_gdf=colonies,
                                                        right_gdf=colonies,
                                                        id_colname='USO_AREA_U', 
                                                        neighbor_colname='nbrs_touch',
                                                        barrier_colname='barrier')

colonies_touch_nbrs.head()

Unnamed: 0,AREA,USO_AREA_U,HOUSETAX_C,USO_FINAL,geometry,geom_type,area_km2,canal,railway,drain,barrier,centroid,ndmc_dist_km,population,nbrs_touch
0,NEW DELHI 36,5584,,Planned,"POLYGON Z ((1020282.788 996796.773 0.000, 1020...",<class 'geopandas.geoseries.GeoSeries'>,1.966739,False,True,False,True,POINT (1020123.175 995898.851),5.159809,3570.060984,"[5598, 5599, 5602, 5603, 5605, 3491, 3508, 377..."
1,NEW DELHI 35,5585,,Planned,"POLYGON Z ((1019724.475 994932.797 0.000, 1019...",<class 'geopandas.geoseries.GeoSeries'>,0.036429,False,False,False,False,POINT (1019673.024 994869.699),6.273149,323.028887,"[5586, 5594]"
2,NEW DELHI 34,5586,,Planned,"POLYGON Z ((1019571.955 994876.019 0.000, 1019...",<class 'geopandas.geoseries.GeoSeries'>,0.230739,False,False,False,False,POINT (1019485.484 994565.783),6.618792,2215.206473,"[5585, 5587, 5594, 5596]"
3,NEW DELHI 33,5587,,Planned,"POLYGON Z ((1019352.702 994352.546 0.000, 1019...",<class 'geopandas.geoseries.GeoSeries'>,0.281195,False,False,False,False,POINT (1019171.868 994576.688),6.709542,3956.166944,"[5586, 5588, 5596]"
4,NEW DELHI 32,5588,,Planned,"POLYGON Z ((1018793.292 994224.182 0.000, 1018...",<class 'geopandas.geoseries.GeoSeries'>,0.301253,False,False,False,False,POINT (1018785.675 994590.275),6.839299,3961.943378,"[5587, 5596, 5620, 5621]"
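
The neighbor column can be understood as the result of a self spatial join. The following is a minimal sketch of that idea, assuming geopandas 0.10 or later (for the `predicate` argument). `touching_neighbors` is a hypothetical helper, and it deliberately leaves out the barrier and NDMC handling, whose exact rules are not shown in this notebook.

```python
import geopandas as gpd
from shapely.geometry import box


def touching_neighbors(gdf, id_col):
    """Return a Series of neighbor-id lists, one per row of gdf."""
    cols = [id_col, gdf.geometry.name]
    pairs = gpd.sjoin(gdf[cols], gdf[cols], how="inner", predicate="intersects")
    # The shared id column is suffixed _left / _right; drop self-matches
    left, right = f"{id_col}_left", f"{id_col}_right"
    pairs = pairs[pairs[left] != pairs[right]]
    nbrs = pairs.groupby(left)[right].agg(list)
    # Rows without any neighbor get an empty list
    return gdf[id_col].map(nbrs).apply(lambda v: sorted(v) if isinstance(v, list) else [])


demo = gpd.GeoDataFrame(
    {"uid": [1, 2, 3]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(5, 5, 6, 6)],  # 1 and 2 share an edge
)
demo["nbrs_touch"] = touching_neighbors(demo, "uid")
print(demo[["uid", "nbrs_touch"]])
```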


## Spatial Join (intersection of polygon and bbox geometries)

In [59]:
colonies_bbox_nbrs = spatial_index_utils.add_polygon_neighbors_column_fast(polygon_gdf=colonies,
                                                       right_gdf=colonies_bbox,
                                                       id_colname='USO_AREA_U', 
                                                       neighbor_colname='nbrs_bbox',
                                                       barrier_colname='barrier')
colonies_bbox_nbrs.head()

Unnamed: 0,AREA,USO_AREA_U,HOUSETAX_C,USO_FINAL,geometry,geom_type,area_km2,canal,railway,drain,barrier,centroid,ndmc_dist_km,population,nbrs_bbox
0,NEW DELHI 36,5584,,Planned,"POLYGON Z ((1020282.788 996796.773 0.000, 1020...",<class 'geopandas.geoseries.GeoSeries'>,1.966739,False,True,False,True,POINT (1020123.175 995898.851),5.159809,3570.060984,"[5598, 5599, 5602, 5603, 5605, 3491, 3508, 377..."
1,NEW DELHI 35,5585,,Planned,"POLYGON Z ((1019724.475 994932.797 0.000, 1019...",<class 'geopandas.geoseries.GeoSeries'>,0.036429,False,False,False,False,POINT (1019673.024 994869.699),6.273149,323.028887,"[5586, 5594]"
2,NEW DELHI 34,5586,,Planned,"POLYGON Z ((1019571.955 994876.019 0.000, 1019...",<class 'geopandas.geoseries.GeoSeries'>,0.230739,False,False,False,False,POINT (1019485.484 994565.783),6.618792,2215.206473,"[5585, 5587, 5594, 5596]"
3,NEW DELHI 33,5587,,Planned,"POLYGON Z ((1019352.702 994352.546 0.000, 1019...",<class 'geopandas.geoseries.GeoSeries'>,0.281195,False,False,False,False,POINT (1019171.868 994576.688),6.709542,3956.166944,"[5586, 5588, 5596]"
4,NEW DELHI 32,5588,,Planned,"POLYGON Z ((1018793.292 994224.182 0.000, 1018...",<class 'geopandas.geoseries.GeoSeries'>,0.301253,False,False,False,False,POINT (1018785.675 994590.275),6.839299,3961.943378,"[5587, 5596, 5620, 5621]"
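
To make the difference from touching neighbors concrete, here is a plain-Python sketch of the rectangle overlap test that bounding-box neighbor searches build on. It compares box against box, which is a coarser test than the polygon-against-box join used above, and all names and coordinates are illustrative.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    minx: float
    miny: float
    maxx: float
    maxy: float


def bboxes_intersect(a, b):
    """True if two axis-aligned boxes overlap or touch."""
    return (a.minx <= b.maxx and b.minx <= a.maxx
            and a.miny <= b.maxy and b.miny <= a.maxy)


def bbox_neighbors(boxes):
    """Map each id to the sorted ids of all other boxes that intersect it."""
    return {
        i: sorted(j for j, other in boxes.items()
                  if j != i and bboxes_intersect(bb, other))
        for i, bb in boxes.items()
    }


boxes = {
    1: BBox(0, 0, 2, 2),
    2: BBox(2, 0, 4, 2),        # shares an edge with box 1
    3: BBox(1.5, 1.5, 3, 3),    # overlaps boxes 1 and 2
    4: BBox(10, 10, 11, 11),    # isolated
}
result = bbox_neighbors(boxes)
print(result)
```

Bounding boxes can intersect even when the underlying polygons do not, so a bbox neighbor list is generally a superset of the touching list.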


## Calculate neighbor distances

In [60]:
colonies_touch_nbrs = spatial_index_utils.calc_nbr_dist(polygon_gdf=colonies_touch_nbrs,
                                  nbr_dist_colname='nbrs_dist_touch',
                                  centroid_colname='centroid',
                                  neighbor_colname='nbrs_touch',
                                  neighbor_id_col='USO_AREA_U')

colonies_touch_nbrs.head()

Unnamed: 0,AREA,USO_AREA_U,HOUSETAX_C,USO_FINAL,geometry,geom_type,area_km2,canal,railway,drain,barrier,centroid,ndmc_dist_km,population,nbrs_touch,nbrs_dist_touch
0,NEW DELHI 36,5584,,Planned,"POLYGON Z ((1020282.788 996796.773 0.000, 1020...",<class 'geopandas.geoseries.GeoSeries'>,1.966739,False,True,False,True,POINT (1020123.175 995898.851),5.159809,3570.060984,"[5598, 5599, 5602, 5603, 5605, 3491, 3508, 377...","[(5598, 1.074790368771482), (5599, 1.015506410..."
1,NEW DELHI 35,5585,,Planned,"POLYGON Z ((1019724.475 994932.797 0.000, 1019...",<class 'geopandas.geoseries.GeoSeries'>,0.036429,False,False,False,False,POINT (1019673.024 994869.699),6.273149,323.028887,"[5586, 5594]","[(5586, 0.35712228070675794), (5594, 0.6299162..."
2,NEW DELHI 34,5586,,Planned,"POLYGON Z ((1019571.955 994876.019 0.000, 1019...",<class 'geopandas.geoseries.GeoSeries'>,0.230739,False,False,False,False,POINT (1019485.484 994565.783),6.618792,2215.206473,"[5585, 5587, 5594, 5596]","[(5585, 0.35712228070675794), (5587, 0.3138054..."
3,NEW DELHI 33,5587,,Planned,"POLYGON Z ((1019352.702 994352.546 0.000, 1019...",<class 'geopandas.geoseries.GeoSeries'>,0.281195,False,False,False,False,POINT (1019171.868 994576.688),6.709542,3956.166944,"[5586, 5588, 5596]","[(5586, 0.3138054943365384), (5588, 0.38643264..."
4,NEW DELHI 32,5588,,Planned,"POLYGON Z ((1018793.292 994224.182 0.000, 1018...",<class 'geopandas.geoseries.GeoSeries'>,0.301253,False,False,False,False,POINT (1018785.675 994590.275),6.839299,3961.943378,"[5587, 5596, 5620, 5621]","[(5587, 0.3864326417184364), (5596, 0.62727257..."


In [61]:
colonies_bbox_nbrs = spatial_index_utils.calc_nbr_dist(polygon_gdf=colonies_bbox_nbrs,
                                  nbr_dist_colname='nbrs_dist_bbox',
                                  centroid_colname='centroid',
                                  neighbor_colname='nbrs_bbox',
                                  neighbor_id_col='USO_AREA_U')

colonies_bbox_nbrs.head()

Unnamed: 0,AREA,USO_AREA_U,HOUSETAX_C,USO_FINAL,geometry,geom_type,area_km2,canal,railway,drain,barrier,centroid,ndmc_dist_km,population,nbrs_bbox,nbrs_dist_bbox
0,NEW DELHI 36,5584,,Planned,"POLYGON Z ((1020282.788 996796.773 0.000, 1020...",<class 'geopandas.geoseries.GeoSeries'>,1.966739,False,True,False,True,POINT (1020123.175 995898.851),5.159809,3570.060984,"[5598, 5599, 5602, 5603, 5605, 3491, 3508, 377...","[(5598, 1.074790368771482), (5599, 1.015506410..."
1,NEW DELHI 35,5585,,Planned,"POLYGON Z ((1019724.475 994932.797 0.000, 1019...",<class 'geopandas.geoseries.GeoSeries'>,0.036429,False,False,False,False,POINT (1019673.024 994869.699),6.273149,323.028887,"[5586, 5594]","[(5586, 0.35712228070675794), (5594, 0.6299162..."
2,NEW DELHI 34,5586,,Planned,"POLYGON Z ((1019571.955 994876.019 0.000, 1019...",<class 'geopandas.geoseries.GeoSeries'>,0.230739,False,False,False,False,POINT (1019485.484 994565.783),6.618792,2215.206473,"[5585, 5587, 5594, 5596]","[(5585, 0.35712228070675794), (5587, 0.3138054..."
3,NEW DELHI 33,5587,,Planned,"POLYGON Z ((1019352.702 994352.546 0.000, 1019...",<class 'geopandas.geoseries.GeoSeries'>,0.281195,False,False,False,False,POINT (1019171.868 994576.688),6.709542,3956.166944,"[5586, 5588, 5596]","[(5586, 0.3138054943365384), (5588, 0.38643264..."
4,NEW DELHI 32,5588,,Planned,"POLYGON Z ((1018793.292 994224.182 0.000, 1018...",<class 'geopandas.geoseries.GeoSeries'>,0.301253,False,False,False,False,POINT (1018785.675 994590.275),6.839299,3961.943378,"[5587, 5596, 5620, 5621]","[(5587, 0.3864326417184364), (5596, 0.62727257..."
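
The distances in the tables above are consistent with straight-line distances between centroids, expressed in kilometers. The sketch below reproduces two of them in plain Python from the rounded centroid coordinates printed earlier. The neighbor lists are trimmed to three colonies for brevity, and `neighbor_distances_km` is a hypothetical helper rather than the notebook's `calc_nbr_dist`.

```python
import math

# Rounded centroid coordinates (metres, EPSG:7760) as printed in the tables above
centroids = {
    5585: (1019673.024, 994869.699),
    5586: (1019485.484, 994565.783),
    5587: (1019171.868, 994576.688),
}
# Trimmed neighbor lists for illustration only
neighbors = {5585: [5586], 5586: [5585, 5587], 5587: [5586]}


def neighbor_distances_km(centroids, neighbors):
    """Return {id: [(neighbor_id, centroid distance in km), ...]}."""
    result = {}
    for uid, nbr_ids in neighbors.items():
        x0, y0 = centroids[uid]
        result[uid] = [
            (n, math.hypot(centroids[n][0] - x0, centroids[n][1] - y0) / 1000)
            for n in nbr_ids
        ]
    return result


dist = neighbor_distances_km(centroids, neighbors)
print(dist[5586])
```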


## Create index column

In [62]:
colonies_touch_nbrs['index'] = colonies_touch_nbrs.index
colonies_bbox_nbrs['index'] = colonies_bbox_nbrs.index

In [63]:
colonies_touch_nbrs.head()

Unnamed: 0,AREA,USO_AREA_U,HOUSETAX_C,USO_FINAL,geometry,geom_type,area_km2,canal,railway,drain,barrier,centroid,ndmc_dist_km,population,nbrs_touch,nbrs_dist_touch,index
0,NEW DELHI 36,5584,,Planned,"POLYGON Z ((1020282.788 996796.773 0.000, 1020...",<class 'geopandas.geoseries.GeoSeries'>,1.966739,False,True,False,True,POINT (1020123.175 995898.851),5.159809,3570.060984,"[5598, 5599, 5602, 5603, 5605, 3491, 3508, 377...","[(5598, 1.074790368771482), (5599, 1.015506410...",0
1,NEW DELHI 35,5585,,Planned,"POLYGON Z ((1019724.475 994932.797 0.000, 1019...",<class 'geopandas.geoseries.GeoSeries'>,0.036429,False,False,False,False,POINT (1019673.024 994869.699),6.273149,323.028887,"[5586, 5594]","[(5586, 0.35712228070675794), (5594, 0.6299162...",1
2,NEW DELHI 34,5586,,Planned,"POLYGON Z ((1019571.955 994876.019 0.000, 1019...",<class 'geopandas.geoseries.GeoSeries'>,0.230739,False,False,False,False,POINT (1019485.484 994565.783),6.618792,2215.206473,"[5585, 5587, 5594, 5596]","[(5585, 0.35712228070675794), (5587, 0.3138054...",2
3,NEW DELHI 33,5587,,Planned,"POLYGON Z ((1019352.702 994352.546 0.000, 1019...",<class 'geopandas.geoseries.GeoSeries'>,0.281195,False,False,False,False,POINT (1019171.868 994576.688),6.709542,3956.166944,"[5586, 5588, 5596]","[(5586, 0.3138054943365384), (5588, 0.38643264...",3
4,NEW DELHI 32,5588,,Planned,"POLYGON Z ((1018793.292 994224.182 0.000, 1018...",<class 'geopandas.geoseries.GeoSeries'>,0.301253,False,False,False,False,POINT (1018785.675 994590.275),6.839299,3961.943378,"[5587, 5596, 5620, 5621]","[(5587, 0.3864326417184364), (5596, 0.62727257...",4


In [64]:
colonies_bbox_nbrs.head()

## Remove extraneous columns (`geom_type`)

In [65]:
colonies_touch_nbrs = colonies_touch_nbrs.drop(columns=['geom_type']) 
colonies_bbox_nbrs = colonies_bbox_nbrs.drop(columns=['geom_type']) 

In [66]:
colonies_touch_nbrs.head()

Unnamed: 0,AREA,USO_AREA_U,HOUSETAX_C,USO_FINAL,geometry,area_km2,canal,railway,drain,barrier,centroid,ndmc_dist_km,population,nbrs_touch,nbrs_dist_touch,index
0,NEW DELHI 36,5584,,Planned,"POLYGON Z ((1020282.788 996796.773 0.000, 1020...",1.966739,False,True,False,True,POINT (1020123.175 995898.851),5.159809,3570.060984,"[5598, 5599, 5602, 5603, 5605, 3491, 3508, 377...","[(5598, 1.074790368771482), (5599, 1.015506410...",0
1,NEW DELHI 35,5585,,Planned,"POLYGON Z ((1019724.475 994932.797 0.000, 1019...",0.036429,False,False,False,False,POINT (1019673.024 994869.699),6.273149,323.028887,"[5586, 5594]","[(5586, 0.35712228070675794), (5594, 0.6299162...",1
2,NEW DELHI 34,5586,,Planned,"POLYGON Z ((1019571.955 994876.019 0.000, 1019...",0.230739,False,False,False,False,POINT (1019485.484 994565.783),6.618792,2215.206473,"[5585, 5587, 5594, 5596]","[(5585, 0.35712228070675794), (5587, 0.3138054...",2
3,NEW DELHI 33,5587,,Planned,"POLYGON Z ((1019352.702 994352.546 0.000, 1019...",0.281195,False,False,False,False,POINT (1019171.868 994576.688),6.709542,3956.166944,"[5586, 5588, 5596]","[(5586, 0.3138054943365384), (5588, 0.38643264...",3
4,NEW DELHI 32,5588,,Planned,"POLYGON Z ((1018793.292 994224.182 0.000, 1018...",0.301253,False,False,False,False,POINT (1018785.675 994590.275),6.839299,3961.943378,"[5587, 5596, 5620, 5621]","[(5587, 0.3864326417184364), (5596, 0.62727257...",4


In [67]:
colonies_bbox_nbrs.head()

Unnamed: 0,AREA,USO_AREA_U,HOUSETAX_C,USO_FINAL,geometry,area_km2,canal,railway,drain,barrier,centroid,ndmc_dist_km,population,nbrs_bbox,nbrs_dist_bbox,index
0,NEW DELHI 36,5584,,Planned,"POLYGON Z ((1020282.788 996796.773 0.000, 1020...",1.966739,False,True,False,True,POINT (1020123.175 995898.851),5.159809,3570.060984,"[5598, 5599, 5602, 5603, 5605, 3491, 3508, 377...","[(5598, 1.074790368771482), (5599, 1.015506410...",0
1,NEW DELHI 35,5585,,Planned,"POLYGON Z ((1019724.475 994932.797 0.000, 1019...",0.036429,False,False,False,False,POINT (1019673.024 994869.699),6.273149,323.028887,"[5586, 5594]","[(5586, 0.35712228070675794), (5594, 0.6299162...",1
2,NEW DELHI 34,5586,,Planned,"POLYGON Z ((1019571.955 994876.019 0.000, 1019...",0.230739,False,False,False,False,POINT (1019485.484 994565.783),6.618792,2215.206473,"[5585, 5587, 5594, 5596]","[(5585, 0.35712228070675794), (5587, 0.3138054...",2
3,NEW DELHI 33,5587,,Planned,"POLYGON Z ((1019352.702 994352.546 0.000, 1019...",0.281195,False,False,False,False,POINT (1019171.868 994576.688),6.709542,3956.166944,"[5586, 5588, 5596]","[(5586, 0.3138054943365384), (5588, 0.38643264...",3
4,NEW DELHI 32,5588,,Planned,"POLYGON Z ((1018793.292 994224.182 0.000, 1018...",0.301253,False,False,False,False,POINT (1018785.675 994590.275),6.839299,3961.943378,"[5587, 5596, 5620, 5621]","[(5587, 0.3864326417184364), (5596, 0.62727257...",4


## Save colonies file for Spatial Index

In [69]:
with open('colonies_bbox_nbrs25Aug2020.pkl', 'wb') as f:
    pickle.dump(colonies_bbox_nbrs, f)

In [70]:
with open('colonies_touch_nbrs25Aug2020.pkl', 'wb') as f:
    pickle.dump(colonies_touch_nbrs, f)
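
The outline at the top also mentions an ESRI Shapefile export, which is not shown here. Shapefiles hold a single geometry column, cannot store Python lists, and limit field names to 10 characters, so the GeoDataFrames above would need some flattening first. The following sketch shows one way to prepare them. `prepare_for_shapefile` and the output path are hypothetical, and long neighbor strings could still be truncated by the format's text field limit.

```python
import geopandas as gpd
from shapely.geometry import box


def prepare_for_shapefile(gdf, list_cols, extra_geom_cols):
    """Drop secondary geometry columns and turn list columns into strings."""
    out = gdf.drop(columns=extra_geom_cols).copy()
    for col in list_cols:
        out[col] = out[col].apply(lambda ids: ",".join(str(i) for i in ids))
    return out


demo = gpd.GeoDataFrame(
    {"USO_AREA_U": [5585], "nbrs_touch": [[5586, 5594]]},
    geometry=[box(0, 0, 1, 1)],
)
demo["centroid"] = demo.centroid  # second geometry column, not allowed in a shapefile

flat = prepare_for_shapefile(demo, list_cols=["nbrs_touch"], extra_geom_cols=["centroid"])
print(flat)

# Writing is left commented out; the path below is a placeholder
# flat.to_file("colonies_touch_nbrs.shp")
```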