# Getting Shape Data

When making data maps there are typically two components:

1. Finding quantitative or qualitative data to visualize on the map.
1. Actually drawing the map.

The second step requires geographic information. For regions defined by the US Census Bureau, this data is available as TIGER/Line Shapefiles. In this script, we will assume that the desired shapefiles are those at the census tract level.

## Geopandas

On Linux and macOS, `geopandas` can be installed with `pip install geopandas`. Installing it on Windows is trickier; users should follow [these instructions](http://geoffboeing.com/2014/09/using-geopandas-windows/).

## Initialization

In [32]:
import geopandas as gpd
import pandas as pd
from us import states
import requests
import zipfile
import os

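loc_name = "maryland"
# Keep FIPS codes as strings so leading zeros survive (e.g. "06" for California)
state_codes = [states.MD.fips]
county_list = None  # None keeps every county; otherwise a list of 3-digit county FIPS strings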

TIGER_BASE_URL = 'http://www2.census.gov/geo/tiger/TIGER2013/'
TIGER_TRACT_DIR = 'TRACT/'

# Local Storage Parameters
LOCAL_DATA_DIR = './data/'
GEO_SUB_DIR = 'geo/'

GEO_FILE_END = '_geo_data.json'
geo_outfile = LOCAL_DATA_DIR + loc_name + GEO_FILE_END
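
TIGER file names embed the two-digit state FIPS code as text, so codes below 10 keep their leading zero (California is `06`). As a minimal sketch, the archive name can be built from a code that arrives as either an integer or a string. The helper name `tiger_tract_zip_name` is hypothetical and only mirrors the `tl_2013_{0}_tract.zip` pattern used in this notebook.

```python
def tiger_tract_zip_name(state_fips, year=2013):
    """Return the tract archive name for a state, zero-padding the FIPS code.

    Hypothetical helper: follows the 'tl_2013_{0}_tract.zip' pattern used here.
    """
    # str(...).zfill(2) turns 6 or "6" into "06" and leaves "24" unchanged
    fips = str(state_fips).zfill(2)
    return "tl_{0}_{1}_tract.zip".format(year, fips)


print(tiger_tract_zip_name(24))    # Maryland
print(tiger_tract_zip_name("06"))  # California
print(tiger_tract_zip_name(6))     # California passed as an integer
```

Padding at the point where the name is built means a stray `int()` conversion earlier in the pipeline cannot silently produce a URL that does not exist.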

## Get TIGER (shape) data

In [23]:
for state_id in state_codes:
    tiger_zip_file = 'tl_2013_{0}_tract.zip'.format(state_id)

    FULL_TIGER_URL = TIGER_BASE_URL + TIGER_TRACT_DIR + tiger_zip_file

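    # Make sure the local directory exists before checking for or writing the file
    os.makedirs(LOCAL_DATA_DIR + GEO_SUB_DIR, exist_ok=True)

    # Check if file is in directory, else download it
    if os.path.isfile(LOCAL_DATA_DIR + GEO_SUB_DIR + tiger_zip_file):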
        print("Already had the file.  Great.")
    else:
        r = requests.get(FULL_TIGER_URL)

        if r.status_code == requests.codes.ok:
            print("Got the file! Copying to disk.")
            with open(LOCAL_DATA_DIR + GEO_SUB_DIR + tiger_zip_file, "wb") as f:
                f.write(r.content)
        else:
            print("Something went wrong. Status code: {0}".format(r.status_code))


Already had the file.  Great.
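
The download-if-missing pattern in the cell above can be factored into a small function. The sketch below is one possible way to do that, not part of the original notebook: `ensure_file` and its `fetch` parameter are hypothetical names, and `fetch` stands in for something like `lambda: requests.get(url).content` so that the example runs without network access.

```python
import os
import tempfile


def ensure_file(path, fetch):
    """Write fetch() to path unless the file already exists.

    Returns True if the file was fetched, False if it was already cached.
    `fetch` is any zero-argument callable returning bytes (hypothetical API).
    """
    if os.path.isfile(path):
        return False
    # Create the parent directory first so open() cannot fail on a missing folder
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    with open(path, "wb") as f:
        f.write(fetch())
    return True


# Usage with a stand-in fetcher instead of a real HTTP request
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "geo", "tl_2013_24_tract.zip")
first = ensure_file(demo_path, lambda: b"fake zip bytes")
second = ensure_file(demo_path, lambda: b"should not be written")
print(first, second)
```

Passing the fetcher in as a callable keeps the caching logic separate from the HTTP call, which also makes it easy to exercise offline.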


## Load TIGER data into GeoDataFrame

In [29]:
state_shapes = []
for state_id in state_codes:
    tiger_zip_file = 'tl_2013_{0}_tract.zip'.format(state_id)
    tiger_shape_file = 'tl_2013_{0}_tract.shp'.format(state_id)

    # Unzip file, extract contents
    zfile = zipfile.ZipFile(LOCAL_DATA_DIR + GEO_SUB_DIR + tiger_zip_file)
    zfile.extractall(LOCAL_DATA_DIR + GEO_SUB_DIR)

    # Load to GeoDataFrame
    state_shape = gpd.GeoDataFrame.from_file(LOCAL_DATA_DIR + GEO_SUB_DIR + tiger_shape_file)
    
    state_shapes.append(state_shape)

shapes = gpd.GeoDataFrame( pd.concat(state_shapes, ignore_index=True) )

# Only keep counties that we are interested in
if county_list is not None:
    shapes = shapes[shapes["COUNTYFP"].isin(county_list)]
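
A shapefile is really several files that share a base name: the `.shp` geometry, the `.shx` index, and the `.dbf` attribute table are all required, which is why the whole archive is extracted above rather than the `.shp` alone. As a hedged sketch, one could check an archive for those parts before loading it. The helper `missing_shapefile_parts` is a hypothetical name, and the in-memory archive is stand-in data.

```python
import io
import zipfile

REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf")


def missing_shapefile_parts(zip_source, stem):
    """List required shapefile members missing from the archive.

    `zip_source` is a path or file-like object; `stem` is the shared base
    name, e.g. 'tl_2013_24_tract'. Hypothetical helper for illustration.
    """
    with zipfile.ZipFile(zip_source) as zf:
        names = set(zf.namelist())
    return [stem + ext for ext in REQUIRED_EXTENSIONS if stem + ext not in names]


# Build a small in-memory archive that lacks the .dbf attribute table
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("tl_2013_24_tract.shp", b"")
    zf.writestr("tl_2013_24_tract.shx", b"")
buf.seek(0)

missing = missing_shapefile_parts(buf, "tl_2013_24_tract")
print(missing)
```

An empty list means the required parts are present; a truncated download would more likely fail earlier, when `zipfile.ZipFile` tries to open it.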

## Eliminate unneeded attributes, export shapes to GeoJSON
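
TIGER/Line shapefiles store coordinates as longitude and latitude in degrees (NAD83), and `simplify` interprets its `tolerance` in the units of the coordinates, so the `tolerance=0.0001` used in this section is in degrees rather than meters. The sketch below estimates what that means on the ground. It assumes a spherical Earth and a latitude of 39° N as a rough stand-in for Maryland; `degrees_to_meters` is a hypothetical helper.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius; spherical approximation


def degrees_to_meters(tolerance_deg, latitude_deg):
    """Approximate north-south and east-west size of a tolerance in degrees."""
    meters_per_degree = math.pi * EARTH_RADIUS_M / 180.0
    north_south = tolerance_deg * meters_per_degree
    # Meridians converge toward the poles, so east-west distance shrinks with cos(latitude)
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


ns, ew = degrees_to_meters(0.0001, 39.0)  # 39 N is an assumed latitude for Maryland
print("north-south: {0:.1f} m, east-west: {1:.1f} m".format(ns, ew))
```

Under these assumptions the tolerance works out to roughly 11 m north to south and somewhat less east to west, which is small relative to most census tracts.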

In [34]:
small_shapes = gpd.GeoDataFrame()
small_shapes["geometry"] = shapes["geometry"].simplify(tolerance=0.0001) # Simplify geometry to reduce file size
small_shapes["fips"] = shapes["GEOID"]
small_json = small_shapes.to_json()

# Write to file
with open(geo_outfile, 'w') as f:
    f.write(small_json)
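
`to_json` produces a GeoJSON `FeatureCollection` in which each non-geometry column becomes an entry in a feature's `properties`. A quick sanity check on the exported text might look like the sketch below. The sample string is hand-written stand-in data with a placeholder tract ID, not real output from this notebook, and `summarize_geojson` is a hypothetical helper.

```python
import json


def summarize_geojson(text):
    """Return (feature count, sorted property names) for a FeatureCollection string."""
    data = json.loads(text)
    if data.get("type") != "FeatureCollection":
        raise ValueError("expected a GeoJSON FeatureCollection")
    keys = set()
    for feature in data["features"]:
        # "properties" may be null in valid GeoJSON, so fall back to an empty dict
        keys.update(feature.get("properties") or {})
    return len(data["features"]), sorted(keys)


sample = json.dumps({
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "properties": {"fips": "24000000000"},  # placeholder 11-digit tract GEOID
        "geometry": {"type": "Polygon",
                     "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]]},
    }],
})

count, props = summarize_geojson(sample)
print(count, props)
```

With the real export, the same function could be called on the contents of `geo_outfile` to confirm that every feature carries only the `fips` property.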