# Pre-processing Colonies Dataset (29 August 2021)

* Data Pre-processing **[done]**
    * Import colonies
    * Import barrier files – reproject all to EPSG 7760
    * Check validity of all shapefiles (turn this into a function) and check that all points are in Delhi (might be part of the spatial index notebook and UAC deduplication)
* Compute barrier clip for all colonies **[done]**
* Run Neighbors Algorithm **[done]**
    * bbox Neighbors algorithm
    * Should check for barriers
    * Should check for NDMC and related areas
* Additional preprocessing for colonies (turn into super function) **[done]**
    * Create index column **[done]**
    * Distance from NDMC **[done]**
    * Area of each polygon **[done]**
* Export GeoDataFrame as pickle file and ESRI Shapefiles

## Import modules and set constants

In [1]:
import os
import pickle
from importlib import reload
import pandas as pd
import geopandas as gpd
from shapely.geometry import Polygon, box
import spatial_index_utils

In [12]:
reload(spatial_index_utils)

<module 'spatial_index_utils' from 'C:\\Users\\bwbel\\Google Drive\\slum_project\\spatial_index_python\\spatial_index_utils.py'>

In [13]:
# WGS 84 / Delhi
epsg_code = 7760

## Import shapefiles **[done]**

In [14]:
colony_filepath = os.path.join('shapefiles', 'Spatial_Index_GIS', 'Colony_Shapefile', 
                        'USO23Aug2020.shp')

barrier_directory = os.path.join('shapefiles', 'Barrier_Clip')

canal_filepath = os.path.join(barrier_directory, 'Canal', 'Canal.shp')
drain_filepath = os.path.join(barrier_directory, 'Drain', 'Major_Drain.shp')
railway_filepath = os.path.join(barrier_directory, 'Railway', 'Railway_Line.shp')

# boundary of Delhi
delhi_bounds_filepath = os.path.join('shapefiles', 'delhi_bounds_buffer.shp')

# Check that all filepaths exist
filepath_list = [colony_filepath, canal_filepath, drain_filepath, railway_filepath, delhi_bounds_filepath]

for filepath in filepath_list:
    if not os.path.exists(filepath):
        print('{} does not exist'.format(filepath))

In [15]:
colonies = gpd.read_file(colony_filepath)

## Inspect shapefiles for validity (`check_shapefile`) **[done]**

In [16]:
spatial_index_utils.check_shapefile(gdf=colonies, gdf_name='colonies', 
                                    geom_type='Polygon', 
                                    delhi_bounds_filepath=delhi_bounds_filepath)

colonies has duplicate rows: False
----------------------------------------------------
rows with invalid geometries 

----------------------------------------------------
all geometries in colonies are of type Polygon: True
----------------------------------------------------
Rows with None value in geometry column are below
Empty GeoDataFrame
Columns: [AREA, USO_AREA_U, HOUSETAX_C, USO_FINAL, geometry, geom_type]
Index: []
----------------------------------------------------
colonies shapefile is contained within Delhi: True
----------------------------------------------------
Done with shapefile evaluation
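
The source of `check_shapefile` is not shown in this notebook, so the following is only a minimal sketch of the kinds of checks its report suggests (duplicates, missing or invalid geometries, geometry type, containment in a boundary). The helper name `summarize_validity` and the toy data are hypothetical.

```python
import geopandas as gpd
from shapely.geometry import box


def summarize_validity(gdf, expected_type, bounds_geom):
    """Return basic sanity checks for a GeoDataFrame (hypothetical helper)."""
    geom = gdf.geometry
    # Compare geometries byte-for-byte through their WKB representation
    wkb_key = geom.apply(lambda g: g.wkb if g is not None else None)
    return {
        "n_duplicate_geoms": int(wkb_key.duplicated().sum()),
        "n_missing_geoms": int(geom.isna().sum()),
        # Note: missing geometries are also reported as not valid
        "n_invalid_geoms": int((~geom.is_valid).sum()),
        "all_expected_type": bool((geom.geom_type == expected_type).all()),
        "all_within_bounds": bool(geom.within(bounds_geom).all()),
    }


# Toy data: two unit squares inside a larger study area
toy = gpd.GeoDataFrame(
    {"name": ["a", "b"]},
    geometry=[box(0, 0, 1, 1), box(2, 2, 3, 3)],
)
report = summarize_validity(toy, expected_type="Polygon", bounds_geom=box(-1, -1, 10, 10))
print(report)
```

Returning a dictionary rather than printing makes it easier to assert on the result when the same check is run on the canal, drain, and railway layers.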


In [17]:
# spatial_index_utils.check_shapefile(gdf=canal, gdf_name='canal', geom_type='Line', 
#                                     delhi_bounds_filepath=delhi_bounds_filepath)

In [18]:
# spatial_index_utils.check_shapefile(gdf=drain, gdf_name='drain', geom_type='Line', 
#                                     delhi_bounds_filepath=delhi_bounds_filepath)

In [19]:
# spatial_index_utils.check_shapefile(gdf=railway, gdf_name='railway', geom_type='Line', 
#                                     delhi_bounds_filepath=delhi_bounds_filepath)

## Remove duplicate geometries **[done]**

In [20]:
#canal = spatial_index_utils.remove_duplicate_geom(canal)
#drain = spatial_index_utils.remove_duplicate_geom(drain)
#railway = spatial_index_utils.remove_duplicate_geom(railway)

#with open('canal.data', 'wb') as f:
#    pickle.dump(canal, f)

#with open('drain.data', 'wb') as f:
#    pickle.dump(drain, f)

#with open('railway.data', 'wb') as f:
#    pickle.dump(railway, f)

In [21]:
colonies = spatial_index_utils.remove_duplicate_geom(colonies)

4357it [15:44,  4.62it/s]

Original number of rows is 4357:
New number of rows after deduplication is: 4357

In [22]:
len(colonies)

4357
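
The deduplication pass above took roughly 15 minutes for 4357 rows. If only exact duplicates matter, a hash-based comparison may be much faster. The sketch below is an assumption, not the logic of `remove_duplicate_geom`: it treats two geometries as duplicates only when their WKB bytes are identical, so it would miss polygons that are topologically equal but have a different vertex order. The helper name `drop_exact_duplicate_geoms` is hypothetical.

```python
import geopandas as gpd
from shapely.geometry import box


def drop_exact_duplicate_geoms(gdf):
    """Keep the first row for each distinct geometry, compared via WKB bytes."""
    wkb_key = gdf.geometry.apply(lambda g: g.wkb)
    return gdf.loc[~wkb_key.duplicated()].copy()


# Toy data: the first two rows share an identical geometry
toy = gpd.GeoDataFrame(
    {"name": ["a", "a_copy", "b"]},
    geometry=[box(0, 0, 1, 1), box(0, 0, 1, 1), box(2, 2, 3, 3)],
)
deduped = drop_exact_duplicate_geoms(toy)
print(len(toy), "->", len(deduped))
```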

In [30]:
colonies.to_csv("colonies_no_duplicates_29Aug.csv")

## Import canal, drain, and railway shapefiles

In [31]:
with open('canal.data', 'rb') as f:
    canal = pickle.load(f)

In [32]:
with open('drain.data', 'rb') as f:
    drain = pickle.load(f)

In [33]:
with open('railway.data', 'rb') as f:
    railway = pickle.load(f)

## Check CRS and reproject to EPSG:7760

In [34]:
colonies.crs

<Projected CRS: EPSG:7760>
Name: WGS 84 / Delhi
Axis Info [cartesian]:
- X[east]: Easting (metre)
- Y[north]: Northing (metre)
Area of Use:
- name: India - Delhi national capital territory.
- bounds: (76.83, 28.4, 77.34, 28.89)
Coordinate Operation:
- name: Delhi NSF LCC
- method: Lambert Conic Conformal (2SP)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

In [35]:
canal = spatial_index_utils.reproject_gdf(canal, epsg_code)

GeoDataFrame now has the following CRS:

PROJCRS["WGS 84 / Delhi",BASEGEOGCRS["WGS 84",ENSEMBLE["World Geodetic System 1984 ensemble",MEMBER["World Geodetic System 1984 (Transit)"],MEMBER["World Geodetic System 1984 (G730)"],MEMBER["World Geodetic System 1984 (G873)"],MEMBER["World Geodetic System 1984 (G1150)"],MEMBER["World Geodetic System 1984 (G1674)"],MEMBER["World Geodetic System 1984 (G1762)"],ELLIPSOID["WGS 84",6378137,298.257223563,LENGTHUNIT["metre",1]],ENSEMBLEACCURACY[2.0]],PRIMEM["Greenwich",0,ANGLEUNIT["degree",0.0174532925199433]],ID["EPSG",4326]],CONVERSION["Delhi NSF LCC",METHOD["Lambert Conic Conformal (2SP)",ID["EPSG",9802]],PARAMETER["Latitude of false origin",28.62510126,ANGLEUNIT["degree",0.0174532925199433],ID["EPSG",8821]],PARAMETER["Longitude of false origin",77,ANGLEUNIT["degree",0.0174532925199433],ID["EPSG",8822]],PARAMETER["Latitude of 1st standard parallel",28.375,ANGLEUNIT["degree",0.0174532925199433],ID["EPSG",8823]],PARAMETER["Latitude of 2nd standard p

In [36]:
drain = spatial_index_utils.reproject_gdf(drain, epsg_code)

GeoDataFrame now has the following CRS:

PROJCRS["WGS 84 / Delhi",BASEGEOGCRS["WGS 84",ENSEMBLE["World Geodetic System 1984 ensemble",MEMBER["World Geodetic System 1984 (Transit)"],MEMBER["World Geodetic System 1984 (G730)"],MEMBER["World Geodetic System 1984 (G873)"],MEMBER["World Geodetic System 1984 (G1150)"],MEMBER["World Geodetic System 1984 (G1674)"],MEMBER["World Geodetic System 1984 (G1762)"],ELLIPSOID["WGS 84",6378137,298.257223563,LENGTHUNIT["metre",1]],ENSEMBLEACCURACY[2.0]],PRIMEM["Greenwich",0,ANGLEUNIT["degree",0.0174532925199433]],ID["EPSG",4326]],CONVERSION["Delhi NSF LCC",METHOD["Lambert Conic Conformal (2SP)",ID["EPSG",9802]],PARAMETER["Latitude of false origin",28.62510126,ANGLEUNIT["degree",0.0174532925199433],ID["EPSG",8821]],PARAMETER["Longitude of false origin",77,ANGLEUNIT["degree",0.0174532925199433],ID["EPSG",8822]],PARAMETER["Latitude of 1st standard parallel",28.375,ANGLEUNIT["degree",0.0174532925199433],ID["EPSG",8823]],PARAMETER["Latitude of 2nd standard p

In [37]:
railway = spatial_index_utils.reproject_gdf(railway, epsg_code)

GeoDataFrame now has the following CRS:

PROJCRS["WGS 84 / Delhi",BASEGEOGCRS["WGS 84",ENSEMBLE["World Geodetic System 1984 ensemble",MEMBER["World Geodetic System 1984 (Transit)"],MEMBER["World Geodetic System 1984 (G730)"],MEMBER["World Geodetic System 1984 (G873)"],MEMBER["World Geodetic System 1984 (G1150)"],MEMBER["World Geodetic System 1984 (G1674)"],MEMBER["World Geodetic System 1984 (G1762)"],ELLIPSOID["WGS 84",6378137,298.257223563,LENGTHUNIT["metre",1]],ENSEMBLEACCURACY[2.0]],PRIMEM["Greenwich",0,ANGLEUNIT["degree",0.0174532925199433]],ID["EPSG",4326]],CONVERSION["Delhi NSF LCC",METHOD["Lambert Conic Conformal (2SP)",ID["EPSG",9802]],PARAMETER["Latitude of false origin",28.62510126,ANGLEUNIT["degree",0.0174532925199433],ID["EPSG",8821]],PARAMETER["Longitude of false origin",77,ANGLEUNIT["degree",0.0174532925199433],ID["EPSG",8822]],PARAMETER["Latitude of 1st standard parallel",28.375,ANGLEUNIT["degree",0.0174532925199433],ID["EPSG",8823]],PARAMETER["Latitude of 2nd standard p

In [38]:
colonies.crs == drain.crs == canal.crs == railway.crs

True
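
The body of `reproject_gdf` is not shown here. As a rough sketch of what such a helper could look like, the function below uses geopandas' `to_crs` and skips the work when the frame is already in the target CRS. The name `reproject_if_needed` and the toy point are hypothetical.

```python
import geopandas as gpd
from shapely.geometry import Point


def reproject_if_needed(gdf, epsg_code):
    """Reproject to the given EPSG code unless the frame already uses it."""
    if gdf.crs is None:
        raise ValueError("GeoDataFrame has no CRS; assign one with set_crs() first")
    if gdf.crs.to_epsg() == epsg_code:
        return gdf
    return gdf.to_crs(epsg=epsg_code)


# Toy point near central Delhi, given in longitude/latitude (EPSG:4326)
pts = gpd.GeoDataFrame(geometry=[Point(77.2, 28.6)], crs="EPSG:4326")
pts_7760 = reproject_if_needed(pts, 7760)
print(pts_7760.crs.to_epsg())
```

Raising on a missing CRS avoids silently treating unlabelled coordinates as if they were already in metres.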

## Calculate Area (in square kilometers)

In [39]:
colonies['area_km2'] = colonies.area/1000000

In [40]:
colonies.head()

Unnamed: 0,index,AREA,USO_AREA_U,HOUSETAX_C,USO_FINAL,geometry,geom_type,area_km2
0,0,NEW DELHI 36,5584,,Planned,"POLYGON Z ((1020282.788 996796.773 0.000, 1020...",<property object at 0x000001FF10AB4DB0>,1.966739
1,1,NEW DELHI 35,5585,,Planned,"POLYGON Z ((1019724.475 994932.797 0.000, 1019...",<property object at 0x000001FF10AB4DB0>,0.036429
2,2,NEW DELHI 34,5586,,Planned,"POLYGON Z ((1019571.955 994876.019 0.000, 1019...",<property object at 0x000001FF10AB4DB0>,0.230739
3,3,NEW DELHI 33,5587,,Planned,"POLYGON Z ((1019352.702 994352.546 0.000, 1019...",<property object at 0x000001FF10AB4DB0>,0.281195
4,4,NEW DELHI 32,5588,,Planned,"POLYGON Z ((1018793.292 994224.182 0.000, 1018...",<property object at 0x000001FF10AB4DB0>,0.301253


In [41]:
colonies['area_km2'].max()

29.1165489652962

In [42]:
colonies['area_km2'].min()

2.3028217018471078e-09
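
Because EPSG:7760 uses metres, `colonies.area` is in square metres and dividing by 1,000,000 gives square kilometres. The minimum above (about 2.3e-09 km², or roughly 0.002 m²) may indicate a sliver polygon worth inspecting. The sketch below shows the unit conversion on toy data and one way to list such slivers; the 1 m² threshold is an arbitrary assumption.

```python
import geopandas as gpd
from shapely.geometry import box

M2_PER_KM2 = 1_000_000

# Toy data in a metre-based coordinate system
toy = gpd.GeoDataFrame(
    {"name": ["one_km_square", "sliver"]},
    geometry=[box(0, 0, 1000, 1000), box(0, 0, 0.1, 0.1)],
)
toy["area_km2"] = toy.area / M2_PER_KM2

# Hypothetical threshold: flag anything smaller than one square metre
MIN_AREA_KM2 = 1 / M2_PER_KM2
slivers = toy[toy["area_km2"] < MIN_AREA_KM2]
print(slivers["name"].tolist())
```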

## Compute barrier clip

In [43]:
colonies.head()

Unnamed: 0,index,AREA,USO_AREA_U,HOUSETAX_C,USO_FINAL,geometry,geom_type,area_km2
0,0,NEW DELHI 36,5584,,Planned,"POLYGON Z ((1020282.788 996796.773 0.000, 1020...",<property object at 0x000001FF10AB4DB0>,1.966739
1,1,NEW DELHI 35,5585,,Planned,"POLYGON Z ((1019724.475 994932.797 0.000, 1019...",<property object at 0x000001FF10AB4DB0>,0.036429
2,2,NEW DELHI 34,5586,,Planned,"POLYGON Z ((1019571.955 994876.019 0.000, 1019...",<property object at 0x000001FF10AB4DB0>,0.230739
3,3,NEW DELHI 33,5587,,Planned,"POLYGON Z ((1019352.702 994352.546 0.000, 1019...",<property object at 0x000001FF10AB4DB0>,0.281195
4,4,NEW DELHI 32,5588,,Planned,"POLYGON Z ((1018793.292 994224.182 0.000, 1018...",<property object at 0x000001FF10AB4DB0>,0.301253


In [44]:
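# Note: an earlier run needed reset_index() to make the spatial join work,
# which left behind an 'index' column. That call is now disabled, and the
# leftover column is dropped here.
#colonies = colonies.reset_index()
colonies = colonies.drop(columns=['index'])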

In [45]:
colonies.head()

Unnamed: 0,AREA,USO_AREA_U,HOUSETAX_C,USO_FINAL,geometry,geom_type,area_km2
0,NEW DELHI 36,5584,,Planned,"POLYGON Z ((1020282.788 996796.773 0.000, 1020...",<property object at 0x000001FF10AB4DB0>,1.966739
1,NEW DELHI 35,5585,,Planned,"POLYGON Z ((1019724.475 994932.797 0.000, 1019...",<property object at 0x000001FF10AB4DB0>,0.036429
2,NEW DELHI 34,5586,,Planned,"POLYGON Z ((1019571.955 994876.019 0.000, 1019...",<property object at 0x000001FF10AB4DB0>,0.230739
3,NEW DELHI 33,5587,,Planned,"POLYGON Z ((1019352.702 994352.546 0.000, 1019...",<property object at 0x000001FF10AB4DB0>,0.281195
4,NEW DELHI 32,5588,,Planned,"POLYGON Z ((1018793.292 994224.182 0.000, 1018...",<property object at 0x000001FF10AB4DB0>,0.301253


In [46]:
# Create new columns showing intersection with canal, railway and drain
colonies = spatial_index_utils.barrier_intersection(colonies, canal, "canal")

In [47]:
colonies = spatial_index_utils.barrier_intersection(colonies, railway, "railway")

In [48]:
colonies = spatial_index_utils.barrier_intersection(colonies, drain, "drain")

In [49]:
# Create barrier column as being intersection with canal, railway or drain
colonies['barrier'] = colonies['canal'] | colonies['railway'] | colonies["drain"]

In [50]:
colonies.head()

Unnamed: 0,AREA,USO_AREA_U,HOUSETAX_C,USO_FINAL,geometry,geom_type,area_km2,canal,railway,drain,barrier
0,NEW DELHI 36,5584,,Planned,"POLYGON Z ((1020282.788 996796.773 0.000, 1020...",<property object at 0x000001FF10AB4DB0>,1.966739,False,True,True,True
1,NEW DELHI 35,5585,,Planned,"POLYGON Z ((1019724.475 994932.797 0.000, 1019...",<property object at 0x000001FF10AB4DB0>,0.036429,False,False,False,False
2,NEW DELHI 34,5586,,Planned,"POLYGON Z ((1019571.955 994876.019 0.000, 1019...",<property object at 0x000001FF10AB4DB0>,0.230739,False,False,True,True
3,NEW DELHI 33,5587,,Planned,"POLYGON Z ((1019352.702 994352.546 0.000, 1019...",<property object at 0x000001FF10AB4DB0>,0.281195,False,False,False,False
4,NEW DELHI 32,5588,,Planned,"POLYGON Z ((1018793.292 994224.182 0.000, 1018...",<property object at 0x000001FF10AB4DB0>,0.301253,False,False,True,True
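
For readers without access to `spatial_index_utils`, the sketch below shows one way a boolean barrier flag like the columns above could be computed: merge all barrier lines into one geometry and test each polygon against it. This is an assumption about the approach, and `flag_barrier_intersection` is a hypothetical name.

```python
import geopandas as gpd
from shapely.geometry import LineString, box
from shapely.ops import unary_union


def flag_barrier_intersection(polygon_gdf, barrier_gdf, colname):
    """Add a boolean column that is True where a polygon touches any barrier."""
    merged = unary_union(list(barrier_gdf.geometry))
    out = polygon_gdf.copy()
    out[colname] = out.geometry.intersects(merged)
    return out


# Toy data: a railway crosses the western square; the canal touches neither
polys = gpd.GeoDataFrame(
    {"name": ["west", "east"]},
    geometry=[box(0, 0, 1, 1), box(5, 0, 6, 1)],
)
rail = gpd.GeoDataFrame(geometry=[LineString([(0.5, -1), (0.5, 2)])])
canal = gpd.GeoDataFrame(geometry=[LineString([(10, 0), (10, 1)])])

polys = flag_barrier_intersection(polys, rail, "railway")
polys = flag_barrier_intersection(polys, canal, "canal")
polys["barrier"] = polys["railway"] | polys["canal"]
print(polys[["name", "railway", "canal", "barrier"]])
```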


In [51]:
len(colonies)

4357

## Calculate centroid for each polygon

In [52]:
colonies['centroid'] = colonies.centroid
colonies.head()

Unnamed: 0,AREA,USO_AREA_U,HOUSETAX_C,USO_FINAL,geometry,geom_type,area_km2,canal,railway,drain,barrier,centroid
0,NEW DELHI 36,5584,,Planned,"POLYGON Z ((1020282.788 996796.773 0.000, 1020...",<property object at 0x000001FF10AB4DB0>,1.966739,False,True,True,True,POINT (1020123.175 995898.851)
1,NEW DELHI 35,5585,,Planned,"POLYGON Z ((1019724.475 994932.797 0.000, 1019...",<property object at 0x000001FF10AB4DB0>,0.036429,False,False,False,False,POINT (1019673.024 994869.699)
2,NEW DELHI 34,5586,,Planned,"POLYGON Z ((1019571.955 994876.019 0.000, 1019...",<property object at 0x000001FF10AB4DB0>,0.230739,False,False,True,True,POINT (1019485.484 994565.783)
3,NEW DELHI 33,5587,,Planned,"POLYGON Z ((1019352.702 994352.546 0.000, 1019...",<property object at 0x000001FF10AB4DB0>,0.281195,False,False,False,False,POINT (1019171.868 994576.688)
4,NEW DELHI 32,5588,,Planned,"POLYGON Z ((1018793.292 994224.182 0.000, 1018...",<property object at 0x000001FF10AB4DB0>,0.301253,False,False,True,True,POINT (1018785.675 994590.275)


## Distance from NDMC (turn into function)

In [53]:
# ndmc_center shapefile location
ndmc_center_filepath = os.path.join('shapefiles', 'ndmc_center7760.shp')

# Import shapefile
ndmc_center = gpd.read_file(ndmc_center_filepath)

# Extract NDMC Center as Shapely Point
ndmc_center_point = ndmc_center['geometry'].values[0]

# Generate the ndmc_dist_km column
# Initialize as float (0.0) so the distances assigned below keep a consistent dtype
colonies['ndmc_dist_km'] = 0.0

# Compute distance from NDMC to centroid of each polygon
# Division by 1000 turns units into kilometers
for idx, row in colonies.iterrows():
    colonies.loc[idx, 'ndmc_dist_km'] = ndmc_center_point.distance(row['centroid'])/1000
    
colonies.head()

Unnamed: 0,AREA,USO_AREA_U,HOUSETAX_C,USO_FINAL,geometry,geom_type,area_km2,canal,railway,drain,barrier,centroid,ndmc_dist_km
0,NEW DELHI 36,5584,,Planned,"POLYGON Z ((1020282.788 996796.773 0.000, 1020...",<property object at 0x000001FF10AB4DB0>,1.966739,False,True,True,True,POINT (1020123.175 995898.851),5.159809
1,NEW DELHI 35,5585,,Planned,"POLYGON Z ((1019724.475 994932.797 0.000, 1019...",<property object at 0x000001FF10AB4DB0>,0.036429,False,False,False,False,POINT (1019673.024 994869.699),6.273149
2,NEW DELHI 34,5586,,Planned,"POLYGON Z ((1019571.955 994876.019 0.000, 1019...",<property object at 0x000001FF10AB4DB0>,0.230739,False,False,True,True,POINT (1019485.484 994565.783),6.618792
3,NEW DELHI 33,5587,,Planned,"POLYGON Z ((1019352.702 994352.546 0.000, 1019...",<property object at 0x000001FF10AB4DB0>,0.281195,False,False,False,False,POINT (1019171.868 994576.688),6.709542
4,NEW DELHI 32,5588,,Planned,"POLYGON Z ((1018793.292 994224.182 0.000, 1018...",<property object at 0x000001FF10AB4DB0>,0.301253,False,False,True,True,POINT (1018785.675 994590.275),6.839299
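
Since the heading suggests turning this step into a function, here is one possible shape for it. It is a sketch under the assumption that the frame is in a metre-based CRS; the name `add_distance_km` and the toy data are hypothetical. It uses the vectorized `GeoSeries.distance` instead of a row-by-row loop.

```python
import geopandas as gpd
from shapely.geometry import Point, box


def add_distance_km(gdf, target_point, colname):
    """Add the centroid-to-point distance in km (assumes coordinates in metres)."""
    out = gdf.copy()
    out[colname] = out.geometry.centroid.distance(target_point) / 1000
    return out


# Toy polygon whose centroid sits at (3000, 4000), i.e. 5 km from the origin
toy = gpd.GeoDataFrame({"name": ["a"]}, geometry=[box(2500, 3500, 3500, 4500)])
toy = add_distance_km(toy, Point(0, 0), "ndmc_dist_km")
print(toy["ndmc_dist_km"].iloc[0])
```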


## Create GeoDataFrame with Bounding Box of each Polygon

In [88]:
colonies_bbox = spatial_index_utils.create_bbox_gdf(colonies)

In [89]:
# Rebuild as a GeoDataFrame with the colonies CRS so that a spatial index (sindex) can be generated
colonies_bbox_updated = gpd.GeoDataFrame(colonies_bbox, crs=colonies.crs)
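
As an illustration of what a bounding-box frame contains, the sketch below replaces each geometry with the rectangle spanned by its `bounds`. It is not the source of `create_bbox_gdf`; the helper name `make_bbox_gdf` is hypothetical.

```python
import geopandas as gpd
from shapely.geometry import Polygon, box


def make_bbox_gdf(gdf):
    """Return a copy of gdf whose geometries are axis-aligned bounding boxes."""
    bboxes = gpd.GeoSeries(
        [box(*geom.bounds) for geom in gdf.geometry],
        index=gdf.index,
        crs=gdf.crs,
    )
    return gdf.set_geometry(bboxes)


# Toy triangle with extent 4 x 3
toy = gpd.GeoDataFrame({"name": ["tri"]}, geometry=[Polygon([(0, 0), (4, 0), (0, 3)])])
toy_bbox = make_bbox_gdf(toy)
print(toy_bbox.geometry.iloc[0].bounds, toy_bbox.area.iloc[0])
```

Building the boxes as a `GeoSeries` with the original index and CRS means the result is already a proper GeoDataFrame, so no separate conversion step should be needed before using its spatial index.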

## Spatial Join (intersection of polygon and bbox geometries)

In [90]:
colonies_bbox_nbrs = spatial_index_utils.add_polygon_neighbors_column_fast(
    polygon_gdf=colonies,
    right_gdf=colonies_bbox_updated,  # colonies_bbox rebuilt as a GeoDataFrame (see previous cell)
    id_colname='USO_AREA_U',
    neighbor_colname='nbrs_bbox',
    barrier_colname='barrier')
colonies_bbox_nbrs.head()

100%|██████████████████████████████████████████████████████████████████████████████| 4352/4352 [09:37<00:00,  7.54it/s]


Unnamed: 0,AREA,USO_AREA_U,HOUSETAX_C,USO_FINAL,geometry,geom_type,area_km2,canal,railway,drain,barrier,centroid,ndmc_dist_km,nbrs_bbox
0,NEW DELHI 36,5584,,Planned,"POLYGON Z ((1020282.788 996796.773 0.000, 1020...",<property object at 0x000001FF10AB4DB0>,1.966739,False,True,True,True,POINT (1020123.175 995898.851),5.159809,"[5598, 5599, 5602, 5603, 3508, 3776, 4011, 349..."
1,NEW DELHI 35,5585,,Planned,"POLYGON Z ((1019724.475 994932.797 0.000, 1019...",<property object at 0x000001FF10AB4DB0>,0.036429,False,False,False,False,POINT (1019673.024 994869.699),6.273149,"[5594, 4336, 2679, 1256, 4373, 5585, 1697, 180..."
2,NEW DELHI 34,5586,,Planned,"POLYGON Z ((1019571.955 994876.019 0.000, 1019...",<property object at 0x000001FF10AB4DB0>,0.230739,False,False,True,True,POINT (1019485.484 994565.783),6.618792,"[5594, 5587, 5585, 5596]"
3,NEW DELHI 33,5587,,Planned,"POLYGON Z ((1019352.702 994352.546 0.000, 1019...",<property object at 0x000001FF10AB4DB0>,0.281195,False,False,False,False,POINT (1019171.868 994576.688),6.709542,"[5596, 5587]"
4,NEW DELHI 32,5588,,Planned,"POLYGON Z ((1018793.292 994224.182 0.000, 1018...",<property object at 0x000001FF10AB4DB0>,0.301253,False,False,True,True,POINT (1018785.675 994590.275),6.839299,"[5621, 5587, 5620, 5596]"
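
The neighbor search can be pictured as a two-stage query: a spatial index first narrows the candidates by extent, then an exact `intersects` test confirms them. The sketch below illustrates that idea only. It does not reproduce `add_polygon_neighbors_column_fast`, and in particular it ignores the barrier column, whose handling is not shown in this notebook. The name `bbox_neighbors` is hypothetical. As in the output above (colony 5587 lists itself), a polygon matches its own box.

```python
import geopandas as gpd
from shapely.geometry import box


def bbox_neighbors(polygon_gdf, bbox_gdf, id_colname):
    """For each polygon, list the ids of bounding boxes that intersect it."""
    sindex = bbox_gdf.sindex
    ids = bbox_gdf[id_colname].to_numpy()
    neighbors = []
    for geom in polygon_gdf.geometry:
        # Coarse pass: boxes whose extent overlaps the polygon's extent
        candidates = list(sindex.intersection(geom.bounds))
        # Exact pass: keep boxes that really intersect the polygon
        hits = [pos for pos in candidates if bbox_gdf.geometry.iloc[pos].intersects(geom)]
        neighbors.append(sorted(int(ids[pos]) for pos in hits))
    return neighbors


# Toy data: squares 1 and 2 overlap, square 3 is isolated
toy = gpd.GeoDataFrame(
    {"id": [1, 2, 3]},
    geometry=[box(0, 0, 2, 2), box(1, 1, 3, 3), box(10, 10, 11, 11)],
)
toy["nbrs_bbox"] = bbox_neighbors(toy, toy, "id")
print(toy[["id", "nbrs_bbox"]])
```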


## Calculate neighbor distances

In [91]:
colonies_bbox_nbrs = spatial_index_utils.calc_nbr_dist(polygon_gdf=colonies_bbox_nbrs,
                                  nbr_dist_colname='nbrs_dist_bbox',
                                  centroid_colname='centroid',
                                  neighbor_colname='nbrs_bbox',
                                  neighbor_id_col='USO_AREA_U')

colonies_bbox_nbrs.head()

100%|██████████████████████████████████████████████████████████████████████████████| 4357/4357 [04:51<00:00, 14.93it/s]


Unnamed: 0,AREA,USO_AREA_U,HOUSETAX_C,USO_FINAL,geometry,geom_type,area_km2,canal,railway,drain,barrier,centroid,ndmc_dist_km,nbrs_bbox,nbrs_dist_bbox
0,NEW DELHI 36,5584,,Planned,"POLYGON Z ((1020282.788 996796.773 0.000, 1020...",<property object at 0x000001FF10AB4DB0>,1.966739,False,True,True,True,POINT (1020123.175 995898.851),5.159809,"[5598, 5599, 5602, 5603, 3508, 3776, 4011, 349...","[(5598, 1.074790368771482), (5599, 1.015506410..."
1,NEW DELHI 35,5585,,Planned,"POLYGON Z ((1019724.475 994932.797 0.000, 1019...",<property object at 0x000001FF10AB4DB0>,0.036429,False,False,False,False,POINT (1019673.024 994869.699),6.273149,"[5594, 4336, 2679, 1256, 4373, 5585, 1697, 180...","[(5594, 0.6299162683011635), (4336, 14.7374799..."
2,NEW DELHI 34,5586,,Planned,"POLYGON Z ((1019571.955 994876.019 0.000, 1019...",<property object at 0x000001FF10AB4DB0>,0.230739,False,False,True,True,POINT (1019485.484 994565.783),6.618792,"[5594, 5587, 5585, 5596]","[(5594, 0.4532645992822079), (5587, 0.31380549..."
3,NEW DELHI 33,5587,,Planned,"POLYGON Z ((1019352.702 994352.546 0.000, 1019...",<property object at 0x000001FF10AB4DB0>,0.281195,False,False,False,False,POINT (1019171.868 994576.688),6.709542,"[5596, 5587]","[(5596, 0.5564745103109905), (5587, 0.0)]"
4,NEW DELHI 32,5588,,Planned,"POLYGON Z ((1018793.292 994224.182 0.000, 1018...",<property object at 0x000001FF10AB4DB0>,0.301253,False,False,True,True,POINT (1018785.675 994590.275),6.839299,"[5621, 5587, 5620, 5596]","[(5621, 0.5779093230804585), (5587, 0.38643264..."
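
The `nbrs_dist_bbox` column above pairs each neighbor id with a distance in kilometres, and a colony's distance to itself is 0.0. A minimal sketch of how such pairs could be built from centroids follows. It assumes ids are unique and coordinates are in metres, and `neighbor_distances_km` is a hypothetical name rather than the code behind `calc_nbr_dist`. Looking centroids up in a dictionary avoids filtering the frame once per neighbor, which may help with run time.

```python
import geopandas as gpd
from shapely.geometry import box


def neighbor_distances_km(gdf, id_colname, neighbor_colname):
    """Return, per row, a list of (neighbor_id, centroid distance in km)."""
    centroid_by_id = dict(zip(gdf[id_colname], gdf.geometry.centroid))
    result = []
    for own_id, nbr_ids in zip(gdf[id_colname], gdf[neighbor_colname]):
        own = centroid_by_id[own_id]
        result.append(
            [(nbr, own.distance(centroid_by_id[nbr]) / 1000) for nbr in nbr_ids]
        )
    return result


# Toy data: two 1 km squares whose centroids are 3 km apart
toy = gpd.GeoDataFrame(
    {"id": [1, 2], "nbrs": [[1, 2], [1, 2]]},
    geometry=[box(0, 0, 1000, 1000), box(3000, 0, 4000, 1000)],
)
toy["nbrs_dist"] = neighbor_distances_km(toy, "id", "nbrs")
print(toy["nbrs_dist"].tolist())
```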


## Create index column

In [92]:
colonies_bbox_nbrs['index'] = colonies_bbox_nbrs.index

In [94]:
colonies_bbox_nbrs.head()

Unnamed: 0,AREA,USO_AREA_U,HOUSETAX_C,USO_FINAL,geometry,geom_type,area_km2,canal,railway,drain,barrier,centroid,ndmc_dist_km,nbrs_bbox,nbrs_dist_bbox,index
0,NEW DELHI 36,5584,,Planned,"POLYGON Z ((1020282.788 996796.773 0.000, 1020...",<property object at 0x000001FF10AB4DB0>,1.966739,False,True,True,True,POINT (1020123.175 995898.851),5.159809,"[5598, 5599, 5602, 5603, 3508, 3776, 4011, 349...","[(5598, 1.074790368771482), (5599, 1.015506410...",0
1,NEW DELHI 35,5585,,Planned,"POLYGON Z ((1019724.475 994932.797 0.000, 1019...",<property object at 0x000001FF10AB4DB0>,0.036429,False,False,False,False,POINT (1019673.024 994869.699),6.273149,"[5594, 4336, 2679, 1256, 4373, 5585, 1697, 180...","[(5594, 0.6299162683011635), (4336, 14.7374799...",1
2,NEW DELHI 34,5586,,Planned,"POLYGON Z ((1019571.955 994876.019 0.000, 1019...",<property object at 0x000001FF10AB4DB0>,0.230739,False,False,True,True,POINT (1019485.484 994565.783),6.618792,"[5594, 5587, 5585, 5596]","[(5594, 0.4532645992822079), (5587, 0.31380549...",2
3,NEW DELHI 33,5587,,Planned,"POLYGON Z ((1019352.702 994352.546 0.000, 1019...",<property object at 0x000001FF10AB4DB0>,0.281195,False,False,False,False,POINT (1019171.868 994576.688),6.709542,"[5596, 5587]","[(5596, 0.5564745103109905), (5587, 0.0)]",3
4,NEW DELHI 32,5588,,Planned,"POLYGON Z ((1018793.292 994224.182 0.000, 1018...",<property object at 0x000001FF10AB4DB0>,0.301253,False,False,True,True,POINT (1018785.675 994590.275),6.839299,"[5621, 5587, 5620, 5596]","[(5621, 0.5779093230804585), (5587, 0.38643264...",4


## Remove extraneous columns (`geom_type`)

In [95]:
colonies_bbox_nbrs = colonies_bbox_nbrs.drop(columns=['geom_type']) 

In [96]:
colonies_bbox_nbrs.head()

Unnamed: 0,AREA,USO_AREA_U,HOUSETAX_C,USO_FINAL,geometry,area_km2,canal,railway,drain,barrier,centroid,ndmc_dist_km,nbrs_bbox,nbrs_dist_bbox,index
0,NEW DELHI 36,5584,,Planned,"POLYGON Z ((1020282.788 996796.773 0.000, 1020...",1.966739,False,True,True,True,POINT (1020123.175 995898.851),5.159809,"[5598, 5599, 5602, 5603, 3508, 3776, 4011, 349...","[(5598, 1.074790368771482), (5599, 1.015506410...",0
1,NEW DELHI 35,5585,,Planned,"POLYGON Z ((1019724.475 994932.797 0.000, 1019...",0.036429,False,False,False,False,POINT (1019673.024 994869.699),6.273149,"[5594, 4336, 2679, 1256, 4373, 5585, 1697, 180...","[(5594, 0.6299162683011635), (4336, 14.7374799...",1
2,NEW DELHI 34,5586,,Planned,"POLYGON Z ((1019571.955 994876.019 0.000, 1019...",0.230739,False,False,True,True,POINT (1019485.484 994565.783),6.618792,"[5594, 5587, 5585, 5596]","[(5594, 0.4532645992822079), (5587, 0.31380549...",2
3,NEW DELHI 33,5587,,Planned,"POLYGON Z ((1019352.702 994352.546 0.000, 1019...",0.281195,False,False,False,False,POINT (1019171.868 994576.688),6.709542,"[5596, 5587]","[(5596, 0.5564745103109905), (5587, 0.0)]",3
4,NEW DELHI 32,5588,,Planned,"POLYGON Z ((1018793.292 994224.182 0.000, 1018...",0.301253,False,False,True,True,POINT (1018785.675 994590.275),6.839299,"[5621, 5587, 5620, 5596]","[(5621, 0.5779093230804585), (5587, 0.38643264...",4


## Save colonies file for Spatial Index

In [97]:
with open('colonies_bbox_nbrs29Aug2021.pkl', 'wb') as f:
    pickle.dump(colonies_bbox_nbrs, f)
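
The outline also mentions exporting ESRI Shapefiles. A shapefile can hold only one geometry per record and has no list or boolean field types, so a frame like this one would likely need some flattening first. The sketch below shows one hedged way to prepare it; `prepare_for_shapefile` is a hypothetical helper, and shapefile field names are additionally limited to 10 characters, which this sketch does not handle.

```python
import geopandas as gpd
from shapely.geometry import box


def prepare_for_shapefile(gdf):
    """Drop extra geometry columns and flatten types a shapefile cannot store."""
    out = gdf.copy()
    active = out.geometry.name
    for col in list(out.columns):
        if col == active:
            continue
        if out[col].dtype.name == "geometry":
            out = out.drop(columns=col)       # e.g. a secondary centroid column
        elif out[col].dtype == bool:
            out[col] = out[col].astype(int)   # True/False -> 1/0
        elif out[col].dtype == object:
            out[col] = out[col].astype(str)   # lists become their string form
    return out


# Toy frame mimicking the column kinds used in this notebook
toy = gpd.GeoDataFrame(
    {"id": [1], "barrier": [True], "nbrs": [[1, 2]]},
    geometry=[box(0, 0, 1, 1)],
)
toy["centroid"] = toy.centroid
ready = prepare_for_shapefile(toy)
print(ready.dtypes)
```

The prepared frame could then be written with `GeoDataFrame.to_file`, while the pickle keeps the full structure (lists and the centroid column) for later Python sessions.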