# Tips and Tricks

## Subsetting from large shapefiles

Suppose you have a large file containing all the zip codes in the USA and you want only the zip codes within Cuyahoga County.

Our natural instinct is to load all the zip codes and then clip out the ones we need using a join operation with the Cuyahoga County boundary. While this is correct, we pay the price of loading the entire large shapefile. There are two ways to avoid that cost:

### Bounding Box Approach

#### What is a bounding box

>In geometry, the minimum or smallest bounding or enclosing box for a point set S in N dimensions is the box with the smallest measure (area, volume, or hypervolume in higher dimensions) within which all the points lie.

![bbox](images/bbox.png)
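
For two-dimensional points, the definition above reduces to taking the minimum and maximum of each coordinate. A minimal sketch in plain Python, where `bounding_box` and the sample points are illustrative names and values rather than anything taken from the dataset:

```python
def bounding_box(points):
    """Return (minx, miny, maxx, maxy) for an iterable of (x, y) pairs."""
    xs, ys = zip(*points)
    return (min(xs), min(ys), max(xs), max(ys))


# Hypothetical sample points (x = longitude, y = latitude)
sample_points = [(-81.9, 41.3), (-81.4, 41.6), (-81.7, 41.5)]
print(bounding_box(sample_points))
```

The (minx, miny, maxx, maxy) ordering used here is the same ordering that appears in the bounds output later in this section.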

If we want to calculate the bounding box for Cuyahoga County, a good tool to use is the bounding box tool at https://boundingbox.klokantech.com/, embedded here:

In [2]:
from IPython.display import HTML
HTML('<iframe src="https://boundingbox.klokantech.com/" width="700" height="350"></iframe>')



From the tool, the bounding box for Cuyahoga County is -81.9713,41.275,-81.3748,42.1006, ordered as minx, miny, maxx, maxy (west, south, east, north).
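
The tool reports the box as a comma-separated string, while code usually needs four numbers. A small sketch of converting and sanity-checking that string, assuming the order is (minx, miny, maxx, maxy); `parse_bbox` is a hypothetical helper, not part of geopandas:

```python
def parse_bbox(text):
    """Parse 'minx,miny,maxx,maxy' into a tuple of four floats."""
    values = tuple(float(v) for v in text.split(","))
    if len(values) != 4:
        raise ValueError("expected four comma-separated numbers")
    minx, miny, maxx, maxy = values
    if minx > maxx or miny > maxy:
        raise ValueError("bounding box corners are out of order")
    return values


cuyahoga_bbox = parse_bbox("-81.9713,41.275,-81.3748,42.1006")
print(cuyahoga_bbox)
```

The check on corner order catches the common mistake of swapping longitude and latitude or pasting the values in a different order than the reader expects.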

So let us read the zip code dataset for the USA and use the bounding box to extract only the zip codes that intersect the bounding box of Cuyahoga County.

In [4]:
import geopandas as gpd

In [6]:
dataCuyBounds = gpd.read_file(
    r'../../largedatasets/cb_2018_us_zcta510_500k/cb_2018_us_zcta510_500k.shp',
    bbox=(-81.9713, 41.275, -81.3748, 42.1006),  # (minx, miny, maxx, maxy)
)

In [7]:
dataCuyBounds

| | ZCTA5CE10 | AFFGEOID10 | GEOID10 | ALAND10 | AWATER10 | geometry |
|---|---|---|---|---|---|---|
| 0 | 44123 | 8600000US44123 | 44123 | 6672614 | 1111163 | POLYGON ((-81.54259 41.60567, -81.52974 41.614... |
| 1 | 44233 | 8600000US44233 | 44233 | 69162840 | 360814 | POLYGON ((-81.78510 41.27645, -81.75314 41.276... |
| 2 | 44011 | 8600000US44011 | 44011 | 53889975 | 150096 | POLYGON ((-82.06787 41.43003, -82.06707 41.459... |
| 3 | 44095 | 8600000US44095 | 44095 | 23504503 | 337404 | POLYGON ((-81.48864 41.63135, -81.48692 41.632... |
| 4 | 44140 | 8600000US44140 | 44140 | 11825077 | 6444600 | POLYGON ((-81.96916 41.49334, -81.96828 41.505... |
| ... | ... | ... | ... | ... | ... | ... |
| 69 | 44131 | 8600000US44131 | 44131 | 41649577 | 306018 | POLYGON ((-81.68566 41.41737, -81.68560 41.419... |
| 70 | 44135 | 8600000US44135 | 44135 | 26124104 | 13438 | POLYGON ((-81.86906 41.40305, -81.86867 41.408... |
| 71 | 44144 | 8600000US44144 | 44144 | 14477438 | 94388 | POLYGON ((-81.76964 41.42202, -81.76939 41.441... |
| 72 | 44129 | 8600000US44129 | 44129 | 15237374 | 80307 | POLYGON ((-81.75699 41.42154, -81.73435 41.421... |


Note: The result can include zip codes that lie outside Cuyahoga County, because the bounding box is a rectangle that covers more area than the county itself. You may need a second step that filters them out using the actual boundary of Cuyahoga County.
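
One way to do that second step is a boolean filter with `intersects`. The sketch below uses toy geometries so it runs on its own; the triangle stands in for the county boundary, the small boxes stand in for zip code polygons, and all names are illustrative assumptions:

```python
import geopandas as gpd
from shapely.geometry import Polygon, box

# Toy stand-in for the county boundary: a triangle whose bounding box is (0, 0, 4, 4)
county = Polygon([(0, 0), (4, 0), (0, 4)])

# Toy stand-ins for the zip code polygons that a bounding box read would return
candidates = gpd.GeoDataFrame(
    {"zip": ["A", "B", "C"]},
    geometry=[
        box(0.5, 0.5, 1.0, 1.0),  # inside the triangle
        box(3.0, 3.0, 3.5, 3.5),  # inside the bounding box, outside the triangle
        box(1.0, 1.0, 1.5, 1.5),  # inside the triangle
    ],
    crs="EPSG:4326",
)

# Second step: keep only the rows whose geometry intersects the real boundary
in_county = candidates[candidates.intersects(county)]
print(in_county["zip"].tolist())
```

With the real data, the same pattern would filter `dataCuyBounds` against the county geometry once that boundary has been loaded. Keep in mind that `intersects` also keeps polygons that only touch or straddle the boundary.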

### Mask Approach

In this approach we pass a mask (another GeoDataFrame or GeoSeries) so that only the records that intersect the mask are read. Let us load the Ohio county boundaries, from which we will extract Cuyahoga County.

In [9]:
ohCounties = gpd.read_file(r'../../largedatasets/ohio_county_boundaries/ODOT_County_Boundaries.shp')
ohCounties

| | OBJECTID | COUNTY_CD | COUNTY_SEA | ODOT_DISTR | FIPS_COUNT | POP_2010 | POP_2000 | POP_1990 | STATE_PLAN | ELEVATION_ | ... | LONG_WEST_ | AREA_SQMI | AREA_ID | created_us | created_da | last_edite | last_edi_1 | SHAPE_STAr | SHAPE_STLe | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | HIG | HILLSBORO | 9 | 39071 | 43589 | 40875 | 35728 | S | 1340 | ... | -83.873 | 557.74 | 53 | Esri_Anonymous | 2015-08-11 | Esri_Anonymous | 2015-08-11 | 2.407747e+09 | 213157.740992 | POLYGON ((-83.78330 39.26382, -83.78312 39.263... |
| 1 | 2 | HOC | LOGAN | 10 | 39073 | 29380 | 28241 | 25533 | S | 1220 | ... | -82.748 | 423.50 | 54 | Esri_Anonymous | 2015-08-11 | Esri_Anonymous | 2015-08-11 | 1.844277e+09 | 215451.632241 | POLYGON ((-82.49595 39.60265, -82.49505 39.612... |
| 2 | 3 | HOL | MILLERSBURG | 11 | 39075 | 42366 | 38943 | 32849 | N | 1380 | ... | -82.222 | 424.03 | 55 | Esri_Anonymous | 2015-08-11 | Esri_Anonymous | 2015-08-11 | 1.905310e+09 | 193045.013152 | POLYGON ((-81.87727 40.66713, -81.87564 40.667... |
| 3 | 4 | HUR | NORWALK | 3 | 39077 | 59626 | 59487 | 56240 | N | 1200 | ... | -82.843 | 495.96 | 56 | Esri_Anonymous | 2015-08-11 | Esri_Anonymous | 2015-08-11 | 2.267372e+09 | 197349.121399 | POLYGON ((-82.83547 41.14407, -82.83549 41.145... |
| 4 | 5 | FRA | COLUMBUS | 6 | 39049 | 1163414 | 1068978 | 961437 | S | 1130 | ... | -83.255 | 543.97 | 143 | Esri_Anonymous | 2015-08-11 | Esri_Anonymous | 2015-08-11 | 2.401407e+09 | 211754.151636 | POLYGON ((-83.24596 39.96574, -83.24595 39.965... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 83 | 84 | VIN | MCARTHUR | 10 | 39163 | 13435 | 12806 | 11098 | S | 1140 | ... | -82.763 | 414.92 | 28 | Esri_Anonymous | 2015-08-11 | Esri_Anonymous | 2015-08-11 | 1.794678e+09 | 212461.296202 | POLYGON ((-82.51710 39.36942, -82.51595 39.379... |
| 84 | 85 | WAR | LEBANON | 8 | 39165 | 212693 | 158383 | 113909 | S | 1060 | ... | -84.366 | 407.21 | 29 | Esri_Anonymous | 2015-08-11 | Esri_Anonymous | 2015-08-11 | 1.770899e+09 | 178414.982467 | POLYGON ((-83.98859 39.44441, -83.98941 39.435... |
| 85 | 86 | HAS | CADIZ | 11 | 39067 | 15864 | 15856 | 16085 | N | 1360 | ... | -81.341 | 411.07 | 51 | Esri_Anonymous | 2015-08-11 | Esri_Anonymous | 2015-08-11 | 1.829951e+09 | 181149.223466 | POLYGON ((-81.27273 40.36923, -81.27271 40.369... |
| 86 | 87 | PIC | CIRCLEVILLE | 6 | 39129 | 55698 | 52727 | 48255 | S | 1090 | ... | -83.265 | 506.13 | 77 | Esri_Anonymous | 2015-08-11 | Esri_Anonymous | 2015-08-11 | 2.214650e+09 | 215128.493147 | POLYGON ((-82.84304 39.56150, -82.84164 39.561... |


Now let's extract Cuyahoga County.

In [12]:
cuyahoga = ohCounties[ohCounties.COUNTY_CD=='CUY']

If you want to see the bounds of Cuyahoga County, you can simply check the bounds attribute.

In [14]:
cuyahoga.bounds

| | minx | miny | maxx | maxy |
|---|---|---|---|---|
| 73 | -81.971336 | 41.275092 | -81.374338 | 41.631498 |
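
The same numbers can feed the bounding box approach directly, without an external tool. A sketch on toy data using the `total_bounds` attribute, which returns a single (minx, miny, maxx, maxy) array for the whole frame; the triangle and the variable names are illustrative assumptions:

```python
import geopandas as gpd
from shapely.geometry import Polygon

# Toy stand-in for a one-row county GeoDataFrame such as `cuyahoga`
county = gpd.GeoDataFrame(
    {"COUNTY_CD": ["TOY"]},
    geometry=[Polygon([(0, 0), (4, 0), (0, 4)])],
    crs="EPSG:4326",
)

# total_bounds is a NumPy array; convert it to a plain tuple of floats
bbox = tuple(float(v) for v in county.total_bounds)
print(bbox)

# The tuple could then be passed to read_file, for example:
# gpd.read_file(path_to_large_shapefile, bbox=bbox)  # path_to_large_shapefile is hypothetical
```

A plain tuple carries no coordinate reference system, so this sketch assumes the bounds are already expressed in the same coordinates as the file being read.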


Let us read the zip code shapefile again, this time passing Cuyahoga County as the mask.

In [15]:
dataCuy = gpd.read_file(
    r'../../largedatasets/cb_2018_us_zcta510_500k/cb_2018_us_zcta510_500k.shp',
    mask=cuyahoga,  # read only the records that intersect Cuyahoga County
)

In [16]:
dataCuy

| | ZCTA5CE10 | AFFGEOID10 | GEOID10 | ALAND10 | AWATER10 | geometry |
|---|---|---|---|---|---|---|
| 0 | 44123 | 8600000US44123 | 44123 | 6672614 | 1111163 | POLYGON ((-81.54259 41.60567, -81.52974 41.614... |
| 1 | 44233 | 8600000US44233 | 44233 | 69162840 | 360814 | POLYGON ((-81.78510 41.27645, -81.75314 41.276... |
| 2 | 44011 | 8600000US44011 | 44011 | 53889975 | 150096 | POLYGON ((-82.06787 41.43003, -82.06707 41.459... |
| 3 | 44140 | 8600000US44140 | 44140 | 11825077 | 6444600 | POLYGON ((-81.96916 41.49334, -81.96828 41.505... |
| 4 | 44132 | 8600000US44132 | 44132 | 8144254 | 743486 | POLYGON ((-81.52673 41.58750, -81.51373 41.595... |
| ... | ... | ... | ... | ... | ... | ... |
| 62 | 44131 | 8600000US44131 | 44131 | 41649577 | 306018 | POLYGON ((-81.68566 41.41737, -81.68560 41.419... |
| 63 | 44135 | 8600000US44135 | 44135 | 26124104 | 13438 | POLYGON ((-81.86906 41.40305, -81.86867 41.408... |
| 64 | 44144 | 8600000US44144 | 44144 | 14477438 | 94388 | POLYGON ((-81.76964 41.42202, -81.76939 41.441... |
| 65 | 44129 | 8600000US44129 | 44129 | 15237374 | 80307 | POLYGON ((-81.75699 41.42154, -81.73435 41.421... |
