# Using Pandas Reporter for Analyzing The ACS

In this example, we'll use Census Reporter and pandasreporter to combine age groups in a census table and report aggregates and ratios. To keep the data small, we'll analyze the county subdivisions of Bexar County, Texas.

To start the analysis, visit [Census Reporter](https://censusreporter.org) and search for "Bexar" in the "Profile" search box. Select "Bexar County, Texas". On the profile page, in the "Find data for this place" search box, search for "Poverty Status by Sex by Age". You should get a hit for Table B17001, "Poverty Status in the Past 12 Months by Sex by Age".

On the data page for [Table B17001](https://censusreporter.org/data/table/?table=B17001&geo_ids=05000US48029&primary_geo_id=05000US48029), look in the left margin under "Divide Bexar County, TX into" and select the link for "county subdivisions". You'll end up at [a page with columns for each of the county subdivisions](https://censusreporter.org/data/table/?table=B17001&geo_ids=05000US48029,060|05000US48029&primary_geo_id=05000US48029).

Look in the URL bar, and at the end of the URL you'll see these query parameters: `geo_ids=05000US48029,060` and `primary_geo_id=05000US48029`. The `primary_geo_id` is the geoid for Bexar County, TX. You can also get this value from the URL of the profile page. For instance, the profile page for the city of San Diego has the URL https://censusreporter.org/profiles/16000US0666000-san-diego-ca/, where `16000US0666000` is the geoid for San Diego.
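
The same values can be pulled out of the URLs programmatically. The following is a small sketch using only Python's standard `urllib.parse`. The helper name `geoid_from_profile_url` is made up for illustration and is not part of Census Reporter or pandasreporter, and it assumes the profile URL layout shown above, where the geoid is the part of the last path segment before the first hyphen.

```python
from urllib.parse import urlparse, parse_qs

# The county subdivisions data page linked above
data_url = (
    "https://censusreporter.org/data/table/"
    "?table=B17001&geo_ids=05000US48029,060|05000US48029"
    "&primary_geo_id=05000US48029"
)

# parse_qs maps each query parameter to a list of its values
params = parse_qs(urlparse(data_url).query)
table_id = params["table"][0]
primary_geo_id = params["primary_geo_id"][0]
geo_ids = params["geo_ids"][0].split(",")


def geoid_from_profile_url(url):
    """Hypothetical helper: take the geoid from a profile page URL."""
    slug = urlparse(url).path.strip("/").split("/")[-1]
    return slug.split("-")[0]


profile_geoid = geoid_from_profile_url(
    "https://censusreporter.org/profiles/16000US0666000-san-diego-ca/"
)
print(table_id, primary_geo_id, geo_ids, profile_geoid)
```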

The `,060` part of `geo_ids` is the summary level for county subdivisions. You can also get the summary level code [from this list](https://www.census.gov/geo/maps-data/data/summary_level.html). Furthermore, the first three digits of any geoid are its summary level, so for the place San Diego, with geoid `16000US0666000`, the summary level code is `160`.
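
As a quick illustration of the rule that the first three digits are the summary level, here is a minimal sketch. The function name `summary_level_of` is hypothetical, and the code assumes every geoid follows the pattern of the two geoids shown in this example, with the literal `US` separating the prefix from the identifier.

```python
def summary_level_of(geoid):
    """Return (summary level, identifier after 'US') for a geoid string."""
    prefix, _, identifier = geoid.partition("US")
    return prefix[:3], identifier


place_level, place_id = summary_level_of("16000US0666000")    # San Diego
county_level, county_id = summary_level_of("05000US48029")    # Bexar County
print(place_level, place_id)    # 160 0666000
print(county_level, county_id)  # 050 48029
```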

Now you have everything you need to fetch a table with the Census Reporter API: the table id, the geoid of the containing geography, and the summary level code for the divisions of the containing geography.

* Table: `B17001` ( Poverty Status in the Past 12 Months by Sex by Age )
* Containing geography geoid: `05000US48029` ( Bexar County, TX )
* Summary level: `060` ( County Subdivisions )
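
To show how the three values fit together, this sketch rebuilds the county subdivisions data-page URL from them. The function `table_url` is a made-up name, and the URL pattern is inferred only from the link earlier in this example, so treat it as an assumption rather than a documented interface.

```python
def table_url(table_id, containing_geoid, summary_level):
    """Build a Census Reporter data-page URL for divisions of a geography."""
    # Pattern seen in the example link: "<geoid>,<summary level>|<geoid>"
    geo_ids = f"{containing_geoid},{summary_level}|{containing_geoid}"
    return (
        "https://censusreporter.org/data/table/"
        f"?table={table_id}&geo_ids={geo_ids}"
        f"&primary_geo_id={containing_geoid}"
    )


url = table_url("B17001", "05000US48029", "060")
print(url)
```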



In [1]:
import pandas as pd
import numpy as np
import pandasreporter as pr
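# Note: this call fetches table B01001 (Sex by Age) for census tracts
# (summary level 140) in San Diego County (05000US06073), which is what
# the outputs below show, rather than the Bexar County values listed above.
df = pr.get_dataframe('B01001', '140',  '05000US06073')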
df.head(2)

Unnamed: 0,geoid,name,B01001001,B01001001_m90,B01001002,B01001002_m90,B01001003,B01001003_m90,B01001004,B01001004_m90,...,B01001045,B01001045_m90,B01001046,B01001046_m90,B01001047,B01001047_m90,B01001048,B01001048_m90,B01001049,B01001049_m90
0,14000US06073009304,"Census Tract 93.04, San Diego, CA",8442.0,717.0,4294.0,520.0,388.0,209.0,69.0,86.0,...,144.0,126.0,151.0,102.0,19.0,29.0,19.0,31.0,0.0,17.0
1,14000US06073018519,"Census Tract 185.19, San Diego, CA",5928.0,426.0,2816.0,257.0,195.0,71.0,324.0,139.0,...,27.0,25.0,108.0,62.0,49.0,39.0,95.0,54.0,53.0,22.0
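
Each estimate column is paired with a column ending in `_m90`. Assuming, as the name suggests, that these hold the 90% margins of error that the ACS publishes, the sketch below turns an estimate and its margin into a confidence interval and rescales the margin to a 95% level. The function names are illustrative only, and the numbers are the total population and margin from the first row above.

```python
Z90 = 1.645  # z-score for a 90% confidence level
Z95 = 1.96   # z-score for a 95% confidence level


def interval(estimate, moe):
    """Lower and upper bounds of the confidence interval."""
    return estimate - moe, estimate + moe


def rescale_moe(moe90, z=Z95):
    """Convert a 90% margin of error to another confidence level."""
    standard_error = moe90 / Z90
    return standard_error * z


low, high = interval(8442.0, 717.0)
moe95 = rescale_moe(717.0)
print(low, high)        # 7725.0 9159.0
print(round(moe95, 1))  # 854.3
```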


`get_dataframe` returns a dataframe, but the columns have cryptic codes. You can get a new view with different column names that are easier to understand. The accessors are:

* `df.titled_columns` for human readable titles
* `df.coded_columns` to return to the codes
* `df.ct_columns` for a combination of codes and titles. This one is the easiest to use


In [2]:
df = df.ct_columns
df.head(2)

Unnamed: 0,geoid,name,B01001001 Total,Margins for B01001001 Total,B01001002 Total Male,Margins for B01001002 Total Male,B01001003 Total Male Under 5 years,Margins for B01001003 Total Male Under 5 years,B01001004 Total Male 5 to 9 years,Margins for B01001004 Total Male 5 to 9 years,...,B01001045 Total Female 67 to 69 years,Margins for B01001045 Total Female 67 to 69 years,B01001046 Total Female 70 to 74 years,Margins for B01001046 Total Female 70 to 74 years,B01001047 Total Female 75 to 79 years,Margins for B01001047 Total Female 75 to 79 years,B01001048 Total Female 80 to 84 years,Margins for B01001048 Total Female 80 to 84 years,B01001049 Total Female 85 years and over,Margins for B01001049 Total Female 85 years and over
0,14000US06073009304,"Census Tract 93.04, San Diego, CA",8442.0,717.0,4294.0,520.0,388.0,209.0,69.0,86.0,...,144.0,126.0,151.0,102.0,19.0,29.0,19.0,31.0,0.0,17.0
1,14000US06073018519,"Census Tract 185.19, San Diego, CA",5928.0,426.0,2816.0,257.0,195.0,71.0,324.0,139.0,...,27.0,25.0,108.0,62.0,49.0,39.0,95.0,54.0,53.0,22.0


Regardless of how you set the column names, you can always index with the codes, or with the last three digits of the code. Additionally, many of the special math functions accept a string with the last three digits.

In [3]:
assert np.round(df['B01001001'].mean(),4) == 5132.3185
assert np.round(df['001'].mean(),4) == 5132.3185
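
To make the lookup rule concrete, here is a plain-Python sketch of how a full code or a three-digit suffix could be resolved against the combined code-and-title labels. This is not how pandasreporter implements it. `resolve_column` is a hypothetical helper, and it assumes the label layout shown in the output above, where each estimate label starts with its code and margin labels start with "Margins for".

```python
columns = [
    "geoid",
    "name",
    "B01001001 Total",
    "Margins for B01001001 Total",
    "B01001002 Total Male",
    "Margins for B01001002 Total Male",
]


def resolve_column(columns, key):
    """Return the estimate label whose code equals `key` or ends with it."""
    for label in columns:
        if label.startswith("Margins for"):
            continue  # skip margin-of-error columns
        code = label.split()[0]  # leading token, e.g. "B01001001"
        if code == key or code.endswith(key):
            return label
    raise KeyError(key)


by_code = resolve_column(columns, "B01001001")
by_suffix = resolve_column(columns, "002")
print(by_code)    # B01001001 Total
print(by_suffix)  # B01001002 Total Male
```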

In [4]:
# Here is a nice view of all of the column names, with values from the first three rows
df.iloc[:3].T

Unnamed: 0,0,1,2
geoid,14000US06073009304,14000US06073018519,14000US06073017009
name,0,1,2
B01001001 Total,8442,5928,4025
Margins for B01001001 Total,717,426,383
B01001002 Total Male,4294,2816,1831
Margins for B01001002 Total Male,520,257,224
B01001003 Total Male Under 5 years,388,195,63
Margins for B01001003 Total Male Under 5 years,209,71,44
B01001004 Total Male 5 to 9 years,69,324,139
Margins for B01001004 Total Male 5 to 9 years,86,139,85


In [5]:
# Or, just list the columns whose names end with 'Total Female'
[e for e in df.columns if str(e).endswith('Total Female')]

['B01001026 Total Female', 'Margins for B01001026 Total Female']

In [22]:
# Values for young males, 10 to 19 (10 to 14, 15 to 17, and 18 and 19)
sumsdf = pr.CensusDataFrame()
sumsdf['m1019'], sumsdf['m1019_m90'] =  df.sum_m('B01001005', 'B01001006', 'B01001007')
sumsdf.head(4)

Unnamed: 0,m1019,m1019_m90
0,161.0,148.78844
1,406.0,165.450899
2,194.0,83.624159
3,444.0,128.767232
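
When estimates are added, their margins of error do not simply add. The standard ACS approximation for the margin of a sum is the square root of the sum of the squared margins. The sketch below assumes `sum_m` follows that approximation, which the source does not state, and it uses made-up inputs because the component columns are elided in the output above.

```python
import math


def sum_with_moe(estimates, moes):
    """Sum estimates and combine their margins by root-sum-of-squares."""
    total = sum(estimates)
    moe = math.sqrt(sum(m * m for m in moes))
    return total, moe


# Illustrative values only, not taken from the table
total, moe = sum_with_moe([10.0, 20.0, 30.0], [3.0, 4.0, 12.0])
print(total, moe)  # 60.0 13.0
```

One hint that this is the rule in use: the reported `m1019_m90` values square to whole numbers (for example, 39.051248 squared is about 1525), as you would expect from a root-sum-of-squares of integer margins.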


In [23]:
# Add in total males and total females
sumsdf['males'], sumsdf['males_m90'] = df['B01001002'], df['B01001002_m90']
sumsdf['females'], sumsdf['females_m90'] = df['B01001026'], df['B01001026_m90']

In [26]:
# Proportion of young males. This is a proportion because young males are a subset of all males
sumsdf['prop_m1019'], sumsdf['prop_m1019_m90'] = sumsdf.proportion('m1019', 'males')

# Ratio of females to males. Since neither is a subset of the other, use a ratio.
sumsdf['fm'], sumsdf['fm_m90'] = sumsdf.ratio('females', 'males')
sumsdf.add_rse('fm') # Add a relative standard error column

sumsdf

Unnamed: 0,m1019,m1019_m90,males,males_m90,females,females_m90,prop_m1019,prop_m1019_m90,fm,fm_m90,fm_rse
0,161.0,148.788440,4294.0,520.0,4148.0,455.0,0.037494,0.034352,0.965999,0.157837,9.932696
1,406.0,165.450899,2816.0,257.0,3112.0,304.0,0.144176,0.057262,1.105114,0.147738,8.126771
2,194.0,83.624159,1831.0,224.0,2194.0,261.0,0.105953,0.043793,1.198252,0.204470,10.373276
3,444.0,128.767232,3032.0,644.0,2480.0,282.0,0.146438,0.028917,0.817942,0.197061,14.645801
4,354.0,145.471647,3548.0,308.0,3964.0,453.0,0.099775,0.040076,1.117249,0.160338,8.724084
5,786.0,161.096245,2153.0,205.0,2374.0,241.0,0.365072,0.066260,1.102647,0.153469,8.460920
6,51.0,39.051248,1631.0,218.0,1342.0,193.0,0.031269,0.023576,0.822808,0.161547,11.935330
7,486.0,135.162865,2907.0,354.0,3117.0,297.0,0.167183,0.041802,1.072239,0.165793,9.399557
8,288.0,91.372862,2276.0,265.0,1984.0,213.0,0.126538,0.037345,0.871705,0.138056,9.627617
9,1120.0,392.048466,8833.0,978.0,10731.0,937.0,0.126797,0.042106,1.214876,0.171308,8.571958
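
The margins for the derived columns can be checked by hand with the standard ACS approximation formulas for proportions, ratios, and relative standard error. The sketch below applies them to the first row of the table above. The function names are illustrative and are not the pandasreporter API. The results agree with the first row to the displayed precision, which suggests, but does not confirm, that `proportion`, `ratio`, and `add_rse` use the same formulas.

```python
import math

Z90 = 1.645  # z-score for the 90% confidence level of ACS margins


def proportion_moe(num, num_moe, den, den_moe):
    """Proportion and its margin; the numerator is a subset of the denominator."""
    p = num / den
    radicand = num_moe ** 2 - (p ** 2) * den_moe ** 2
    if radicand < 0:
        # Census guidance: fall back to the ratio formula in this case
        radicand = num_moe ** 2 + (p ** 2) * den_moe ** 2
    return p, math.sqrt(radicand) / den


def ratio_moe(num, num_moe, den, den_moe):
    """Ratio and its margin; neither value is a subset of the other."""
    r = num / den
    return r, math.sqrt(num_moe ** 2 + (r ** 2) * den_moe ** 2) / den


def rse(estimate, moe):
    """Relative standard error, in percent."""
    return (moe / Z90) / estimate * 100


# Row 0 of the table: m1019, males, females and their margins
p, p_moe = proportion_moe(161.0, 148.788440, 4294.0, 520.0)
fm, fm_moe = ratio_moe(4148.0, 455.0, 4294.0, 520.0)
fm_rse = rse(fm, fm_moe)
print(round(p, 6), round(p_moe, 6))
print(round(fm, 6), round(fm_moe, 6), round(fm_rse, 4))
```

The only difference between the two margin formulas is the sign under the square root: a proportion subtracts the denominator term because the numerator is part of the denominator, while a ratio adds it.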
