# This needs to be run in a Python 2 environment

### This notebook describes how to get geographic (mapping) data tied to census information. We use it to check that every census tract in DC is represented in our data, and to relate data collected via the Uber API to census information.

The cenpy package page has clear instructions and a link to a demo notebook: https://pypi.python.org/pypi/cenpy/0.9.1
There are lots of other census options you can select, including different map layers and different cities.
Check them out so you can get the most relevant data for your study. Here, I use only layer 14 (census tracts) for
my city (D.C. -> `'STATE = 11'`).

In [1]:
import cenpy as c

In [2]:
# initiate a connection with the database
conn = c.base.Connection('2010sf1')

In [3]:
# initiate a connection with the tiger shapefile database
conn.set_mapservice('tigerWMS_Census2010')

In [4]:
# look at census tracts in ESRIlayer 14
conn.mapservice.layers[14]

(ESRILayer) Census Tracts

In [5]:
# look at variables associated with a layer
conn.mapservice.layers[14].variables

Unnamed: 0,alias,domain,length,name,type
0,MTFCC,,5.0,MTFCC,esriFieldTypeString
1,OID,,,OID,esriFieldTypeDouble
2,GEOID,,11.0,GEOID,esriFieldTypeString
3,STATE,,2.0,STATE,esriFieldTypeString
4,COUNTY,,3.0,COUNTY,esriFieldTypeString
5,TRACT,,6.0,TRACT,esriFieldTypeString
6,BASENAME,,100.0,BASENAME,esriFieldTypeString
7,NAME,,100.0,NAME,esriFieldTypeString
8,LSADC,,2.0,LSADC,esriFieldTypeString
9,FUNCSTAT,,1.0,FUNCSTAT,esriFieldTypeString


In [6]:
# create a dataframe for DC (State 11)
geodata = conn.mapservice.query(layer=14, where='STATE = 11')

In [7]:
geodata.head()

Unnamed: 0,AREALAND,AREAWATER,BASENAME,CENTLAT,CENTLON,COUNTY,FUNCSTAT,GEOID,HU100,INTPTLAT,...,LSADC,MTFCC,NAME,OBJECTID,OID,POP100,STATE,TRACT,UR,geometry
0,579629,0,80.02,38.8915424,-76.9827573,1,S,11001008002,1644,38.8915424,...,CT,G5020,Census Tract 80.02,23766,20740331304906,3031,11,8002,U,<pysal.cg.shapes.Polygon object at 0x119cb3990>
1,535254,0,78.09,38.9015402,-76.9321955,1,S,11001007809,1339,38.9015402,...,CT,G5020,Census Tract 78.09,23770,20740331304811,2922,11,7809,U,<pysal.cg.shapes.Polygon object at 0x119cb3b10>
2,543460,0,25.01,38.9446515,-77.0317348,1,S,11001002501,1042,38.9446515,...,CT,G5020,Census Tract 25.01,24466,20740331303268,2554,11,2501,U,<pysal.cg.shapes.Polygon object at 0x119ccc390>
3,300647,0,29.0,38.9338479,-77.0299131,1,S,11001002900,1643,38.9338479,...,CT,G5020,Census Tract 29,24471,20740331303383,3962,11,2900,U,<pysal.cg.shapes.Polygon object at 0x119ce2490>
4,603721,0,25.02,38.9390574,-77.0301448,1,S,11001002502,2415,38.9390574,...,CT,G5020,Census Tract 25.02,24472,20740331303283,5973,11,2502,U,<pysal.cg.shapes.Polygon object at 0x119cee790>
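
The field lengths listed for layer 14 (STATE 2, COUNTY 3, TRACT 6) add up to the 11-character GEOID, and the rows above appear consistent with that: STATE 11, COUNTY 1 and TRACT 8002 line up with GEOID 11001008002 once each part is zero-padded. The sketch below illustrates that relationship. `build_geoid` is a hypothetical helper, not part of cenpy, and it assumes the three codes may arrive either as integers or as digit strings.

```python
def build_geoid(state, county, tract):
    """Concatenate zero-padded STATE (2), COUNTY (3) and TRACT (6) codes.

    Hypothetical helper for illustration; accepts ints or digit strings.
    """
    return "{:02d}{:03d}{:06d}".format(int(state), int(county), int(tract))


# Values taken from the first row of geodata.head()
print(build_geoid(11, 1, 8002))
```

This can serve as a quick sanity check that the GEOID column still matches its component columns after a round trip through CSV, where leading zeros are easily lost.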


For the rest of this processing to work in Python 3, we need to drop the `'geometry'` column: it holds pysal shape objects, which cannot be read in Python 3 because the pysal library is not Python 3 compatible.

To make the maps later in CartoDB, I uploaded the zipped census file for the DC area, found at www.census.gov. It contains the shapefiles needed to draw the census tract boundaries. To learn how to use it, check out http://cartodb.github.io/training/intermediate/uofm-workshop.html which I found super useful!

In [8]:
del geodata['geometry']

In [9]:
geodata.columns

Index([u'AREALAND', u'AREAWATER', u'BASENAME', u'CENTLAT', u'CENTLON',
       u'COUNTY', u'FUNCSTAT', u'GEOID', u'HU100', u'INTPTLAT', u'INTPTLON',
       u'LSADC', u'MTFCC', u'NAME', u'OBJECTID', u'OID', u'POP100', u'STATE',
       u'TRACT', u'UR'],
      dtype='object')
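
With `'geometry'` removed, the `CENTLAT` and `CENTLON` columns are the only location information left in the table. As a rough illustration of how a point (for example, one collected via the Uber API) might be related to a tract using only these columns, here is a minimal nearest-centroid sketch. This is an assumption for illustration, not the method used in this project: the nearest centroid is only an approximation of the tract that actually contains a point, and `haversine_km` and `nearest_tract` are hypothetical helper names.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


def nearest_tract(lat, lon, centroids):
    """Return the GEOID whose centroid is closest to (lat, lon).

    centroids is an iterable of (geoid, centlat, centlon) tuples.
    """
    best = min(centroids, key=lambda c: haversine_km(lat, lon, c[1], c[2]))
    return best[0]


# (GEOID, CENTLAT, CENTLON) taken from the geodata.head() output
centroids = [
    ("11001008002", 38.8915424, -76.9827573),
    ("11001007809", 38.9015402, -76.9321955),
    ("11001002501", 38.9446515, -77.0317348),
    ("11001002900", 38.9338479, -77.0299131),
    ("11001002502", 38.9390574, -77.0301448),
]

# Hypothetical query point, chosen close to the centroid of tract 29
print(nearest_tract(38.9340, -77.0300, centroids))
```

For exact tract membership, a point-in-polygon test against the tract shapefiles would be needed instead.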

Now we can pickle it and/or save it as a .csv for later use in the `Mapping_points_across_DC` IPython notebook.

In [10]:
geodata.to_csv('./data/geodata.csv', index=False)
geodata.to_pickle('./data/geodata.pkl')
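
One stated goal of this notebook is to check that every DC census tract is represented in our data. A minimal sketch of that check, assuming the collected data carries a tract GEOID per record, is a set difference between the reference GEOIDs and the observed ones. `missing_tracts` is a hypothetical helper, and the `observed` list below is made up for illustration. GEOIDs are compared as strings so that integer and string representations match.

```python
def missing_tracts(reference_geoids, observed_geoids):
    """Return reference GEOIDs that never appear in the observed data."""
    reference = {str(g) for g in reference_geoids}
    observed = {str(g) for g in observed_geoids}
    return sorted(reference - observed)


# GEOIDs taken from the geodata.head() output
reference = [11001008002, 11001007809, 11001002501, 11001002900, 11001002502]

# Hypothetical GEOIDs seen in the collected data (illustration only)
observed = ["11001008002", "11001002501", "11001002900"]

print(missing_tracts(reference, observed))
```

An empty result would mean every reference tract shows up at least once in the collected data.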

### Now move to Python 3 and the notebook Mapping_points_across_DC.ipynb