# Computing the Total Population of Each Judicial District

This notebook demonstrates using the `county_district_xref.csv` crosswalk to aggregate American Community Survey (ACS) data by federal judicial district.
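The core idea is a many-to-one join: each county `GEO_ID` maps to one (state, district) pair in the crosswalk, so county values can be attached to their district and summed. A minimal sketch on made-up county IDs and numbers (not real ACS data), assuming each county falls entirely within a single district; the `toy_*` names are hypothetical and chosen so they don't clobber the notebook's own variables:

```python
import pandas as pd

# Hypothetical county populations keyed by GEO_ID (made-up IDs and numbers).
toy_pop = pd.DataFrame(
    {"total_pop": [100, 250, 400]},
    index=pd.Index(["0500000US99001", "0500000US99003", "0500000US99005"], name="geoid"),
)

# Hypothetical crosswalk: each county assigned to exactly one district.
toy_xref = pd.DataFrame(
    {"state": ["Example"] * 3, "district": ["Northern", "Northern", "Southern"]},
    index=toy_pop.index,
)

# Join population onto the crosswalk by GEO_ID, then total it per (state, district).
toy_by_district = (
    toy_xref.join(toy_pop)
    .groupby(["state", "district"])["total_pop"]
    .sum()
)
print(toy_by_district)
```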

In [1]:
import pandas as pd
import cenpy                  # https://pypi.org/project/cenpy/ 
import census_data_aggregator # https://pypi.org/project/census-data-aggregator/

First, let's get the total population (table `B01003`) and its margin of error for every county from the most recent ACS 5-year estimates (2018).
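Under the hood, the query in the next cell is a request to the Census Bureau's data API. As a rough illustration, assuming the public `https://api.census.gov/data/2018/acs/acs5` endpoint (we haven't checked that cenpy builds exactly this string), the equivalent raw query URL looks like this:

```python
from urllib.parse import urlencode

# Assumed endpoint for the 2018 ACS 5-year detailed tables.
BASE_URL = "https://api.census.gov/data/2018/acs/acs5"

params = {
    # GEO_ID plus the total population estimate (E) and its margin of error (M).
    "get": "GEO_ID,B01003_001E,B01003_001M",
    # Every county in every state.
    "for": "county:*",
}

# Keep ',', ':' and '*' unescaped so the URL stays readable.
example_url = f"{BASE_URL}?{urlencode(params, safe=',:*')}"
print(example_url)
```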

In [2]:
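acs = cenpy.products.APIConnection('ACSDT5Y2018')  # ACS 5-year detailed tables, 2018
# GEO_ID plus the total population estimate (E) and its margin of error (M) for every county
county_pop = acs.query(cols=['GEO_ID', 'B01003_001E', 'B01003_001M'], geo_unit='county')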

In [3]:
county_pop.head()

|   | GEO_ID | B01003_001E | B01003_001M | state | county |
|---|---|---|---|---|---|
| 0 | 0500000US28151 | 47086 | -555555555 | 28 | 151 |
| 1 | 0500000US28111 | 12028 | -555555555 | 28 | 111 |
| 2 | 0500000US28019 | 8321 | -555555555 | 28 | 19 |
| 3 | 0500000US28057 | 23480 | -555555555 | 28 | 57 |
| 4 | 0500000US28015 | 10129 | -555555555 | 28 | 15 |


We wanted to be responsible and handle the margin of error (MOE) correctly, but the sanity check above shows `-555555555` in the `B01003_001M` column of every row displayed. According to the Census Bureau's [Notes on ACS 5-Year Data](https://www.census.gov/data/developers/data-sets/acs-5year/data-notes.html):
> A '*****' entry in the margin of error column indicates that the estimate is controlled. A statistical test for sampling variability is not appropriate.

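That `*****` would appear in the margin of error *annotation* column, which we didn't request, but the corresponding value in the margin of error *estimate* column (`B01003_001M`) is `-555555555`, which appears to be the API's numeric stand-in for the same annotation. In short, these estimates have no sampling error to propagate, so maybe we don't need to deal with the MOE at all. Let's check how many counties are affected: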
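If we did want to keep the MOE around, one way to make the sentinel explicit is to convert the column to numbers and map `-555555555` to 0, since a controlled estimate has no sampling variability (per the note quoted above). A minimal sketch, assuming the column arrives as strings; `clean_moe` is a hypothetical helper, and treating every *other* negative value as missing is our own conservative choice, not Census guidance:

```python
import numpy as np
import pandas as pd

CONTROLLED = -555555555  # MOE code seen above for controlled estimates ('*****' annotation)


def clean_moe(moe: pd.Series) -> pd.Series:
    """Convert an ACS margin-of-error column to floats.

    Controlled estimates become 0.0; any other negative code becomes NaN.
    """
    values = pd.to_numeric(moe, errors="coerce").astype(float)
    values = values.mask(values == CONTROLLED, 0.0)
    return values.mask(values < 0, np.nan)


# Illustrative strings; the last one is an arbitrary other negative code.
example_moe = pd.Series(["-555555555", "100", "92", "-222222222"])
cleaned = clean_moe(example_moe)
print(cleaned.tolist())
```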

In [4]:
county_pop.groupby('B01003_001M').count()

| B01003_001M | GEO_ID | B01003_001E | state | county |
|---|---|---|---|---|
| -555555555 | 3104 | 3104 | 3104 | 3104 |
| 100 | 2 | 2 | 2 | 2 |
| 102 | 2 | 2 | 2 | 2 |
| 104 | 4 | 4 | 4 | 4 |
| 105 | 2 | 2 | 2 | 2 |
| ... | ... | ... | ... | ... |
| 90 | 3 | 3 | 3 | 3 |
| 92 | 2 | 2 | 2 | 2 |
| 95 | 2 | 2 | 2 | 2 |
| 96 | 2 | 2 | 2 | 2 |


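Yep: 3,104 counties carry the controlled-estimate code, and only the remainder have ordinary numeric MOEs. (The odd ordering, with `100` sorted before `90`, is because the API returns these values as strings, which is also why we cast the estimates to `int` below.) Since almost every row says the MOE shouldn't or can't be taken into account, all we need to do is sum the estimates.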
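If the MOEs did matter, the usual ACS approximation for the MOE of a sum of estimates is the square root of the sum of the squared MOEs; the `census_data_aggregator` package imported above includes an `approximate_sum` function aimed at this calculation. A dependency-free sketch of the same formula, with made-up numbers and assuming controlled estimates are passed in with an MOE of 0 (`sum_with_moe` is a hypothetical helper):

```python
import math


def sum_with_moe(pairs):
    """Return (total, moe) for a list of (estimate, moe) pairs.

    Uses the root-sum-of-squares approximation: MOE = sqrt(sum of MOE_i ** 2).
    Controlled estimates should be passed with moe=0.
    """
    total = sum(est for est, _ in pairs)
    moe = math.sqrt(sum(m ** 2 for _, m in pairs))
    return total, moe


# Three hypothetical counties: one controlled (MOE 0) and two with sampled MOEs.
toy_total, toy_moe = sum_with_moe([(1000, 0), (500, 3), (250, 4)])
print(toy_total, toy_moe)
```

Because controlled estimates contribute 0 to the sum of squares, a district made up entirely of controlled counties ends up with an MOE of 0, which matches the decision above to simply sum the estimates.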

In [5]:
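# Rename the estimate column and index by GEO_ID so it lines up with the crosswalk
county_pop = county_pop.rename(columns={'B01003_001E': 'total_pop'}).set_index('GEO_ID')

# Crosswalk mapping each county GEO_ID ('geoid') to its state and judicial district
xref = pd.read_csv('county_district_xref.csv', index_col='geoid', usecols=['geoid', 'state', 'district'])
# The API returns estimates as strings, so cast to int before joining on GEO_ID
joined = xref.join(county_pop[['total_pop']].astype(int))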
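Because `DataFrame.join` is a left join on the crosswalk's index, any crosswalk `geoid` without a matching ACS `GEO_ID` silently gets a missing `total_pop`, and `sum()` then skips it. A quick check for such rows, sketched on hypothetical toy frames (the `demo_*` names and IDs are made up):

```python
import pandas as pd

# Toy stand-ins for the crosswalk and the ACS county table (hypothetical IDs).
demo_xref = pd.DataFrame(
    {"district": ["Northern", "Southern", "Southern"]},
    index=pd.Index(["0500000US99001", "0500000US99003", "0500000US99999"], name="geoid"),
)
demo_pop = pd.DataFrame(
    {"total_pop": [100, 250]},
    index=pd.Index(["0500000US99001", "0500000US99003"], name="GEO_ID"),
)

# Left join on the index, exactly like xref.join(...) above.
demo_joined = demo_xref.join(demo_pop)

# Crosswalk rows that found no population; these would be dropped silently by sum().
missing = demo_joined.index[demo_joined["total_pop"].isna()].tolist()
print(missing)
```

Running the same `isna()` check on the real `joined` frame before aggregating is a cheap way to confirm the crosswalk and the ACS geography line up.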

In [6]:
joined.groupby(['state', 'district'])[['total_pop']].sum()

| state | district | total_pop |
|---|---|---|
| Alabama | Middle | 1151252 |
| Alabama | Northern | 2870454 |
| Alabama | Southern | 842974 |
| Alaska | Alaska | 738516 |
| Arizona | Arizona | 6946685 |
| ... | ... | ... |
| West Virginia | Northern | 869001 |
| West Virginia | Southern | 960053 |
| Wisconsin | Eastern | 3405147 |
| Wisconsin | Western | 2373247 |


In [7]:
joined.groupby(['state', 'district'])[['total_pop']].sum().to_csv('population_by_district_acs2018_5yr.csv')

In [55]:
race_by_district_base[moe_cols.values()].max()

total_moe            567.2
nh_white_moe        2622.7
nh_black_moe        4705.5
nh_amerind_moe      1834.2
nh_asian_moe        4833.5
nh_nhpi_moe         2184.8
nh_some_other_moe   3311.7
nh_twoplus_moe      6888.9
hispanic_moe        2001.3
dtype: float64

In [10]:
county_race[(county_race['B03002_004M'] > 100) 
            | (county_race['B03002_006M'] > 100) 
            | (county_race['B03002_008M'] > 100) 
            | (county_race['B03002_009M'] > 100) ]

|   | GEO_ID | B03002_001E | B03002_003E | B03002_004E | B03002_005E | B03002_006E | B03002_007E | B03002_008E | B03002_009E | B03002_012E | ... | B03002_003M | B03002_004M | B03002_005M | B03002_006M | B03002_007M | B03002_008M | B03002_009M | B03002_012M | state | county |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | 0500000US28015 | 10129 | 6511 | 3501 | 6 | 0 | 0 | 0 | 82 | 29 | ... | 20 | 121 | 10 | 20 | 20 | 20 | 110 | 35 | 28 | 015 |
| 5 | 0500000US28043 | 21278 | 11683 | 9205 | 135 | 0 | 0 | 0 | 115 | 140 | ... | 24 | 217 | 163 | 24 | 24 | 24 | 93 | 125 | 28 | 043 |
| 7 | 0500000US28041 | 13714 | 9896 | 3496 | 53 | 0 | 0 | 0 | 107 | 162 | ... | 4 | 108 | 43 | 20 | 20 | 20 | 95 | -555555555 | 28 | 041 |
| 8 | 0500000US28093 | 35787 | 17068 | 17118 | 15 | 24 | 0 | 0 | 281 | 1281 | ... | 27 | 140 | 18 | 35 | 27 | 27 | 135 | -555555555 | 28 | 093 |
| 10 | 0500000US28011 | 32592 | 10579 | 20869 | 41 | 213 | 0 | 0 | 189 | 701 | ... | 27 | 70 | 52 | 112 | 27 | 27 | 125 | -555555555 | 28 | 011 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3208 | 0500000US19067 | 15858 | 14593 | 274 | 90 | 224 | 27 | 0 | 183 | 467 | ... | 15 | 79 | 110 | 32 | 40 | 15 | 112 | -555555555 | 19 | 067 |
| 3212 | 0500000US19087 | 19926 | 17585 | 449 | 51 | 522 | 9 | 15 | 347 | 948 | ... | 25 | 66 | 41 | 69 | 14 | 25 | 104 | -555555555 | 19 | 087 |
| 3213 | 0500000US19145 | 15363 | 14020 | 260 | 101 | 147 | 0 | 0 | 323 | 512 | ... | 15 | 117 | 82 | 57 | 15 | 15 | 105 | -555555555 | 19 | 145 |
| 3214 | 0500000US19155 | 93503 | 82251 | 1335 | 257 | 619 | 11 | 7 | 1901 | 7122 | ... | 8 | 193 | 126 | 179 | 18 | 8 | 285 | -555555555 | 19 | 155 |
