
In [1]:
from IPython.display import HTML, Latex
import numpy as np
import pandas as pd

# What if D.C. or Puerto Rico Were States?

Recalculating Congressional apportionments based on the new [2020 Census decennial population count](https://www.census.gov/data/tables/2020/dec/2020-apportionment-data.html).

## The Equal Proportions Method

Since 1940, Congress [has used the "Equal Proportions Method"](https://www.census.gov/topics/public-sector/congressional-apportionment/about/computing.html) to apportion the 435 House seats to each state.

First, per the Constitution, every state gets one seat. The remaining 385 seats are awarded by "priority." A state's priority is its population as of the most recent decennial Census, $P$, divided by the geometric mean of the number of seats it currently holds, $n$, and that number plus 1:
\begin{equation}
\text{priority} = \frac{P}{\sqrt{n(n+1)}}
\end{equation}
Seats are then awarded one at a time. Each goes to the state with the highest priority, whose priority is then recalculated with its new seat count, moving it further back in line.
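To make the procedure concrete, here is a minimal plain-Python sketch, separate from the pandas implementation below. It keeps the states in a priority queue from the standard-library `heapq` module, so each seat costs a logarithmic-time pop and push instead of a scan of every state. The populations are hypothetical toy values: with three states and six seats, three seats are guaranteed and three are awarded by priority. One difference from the pandas version: on an exact tie, this heap picks the alphabetically smaller name, whereas `idxmax` picks the earlier row.

```python
import heapq
import math


def priority(population: float, seats: int) -> float:
    """Equal Proportions priority for a state's next seat: P / sqrt(n * (n + 1))."""
    return population / math.sqrt(seats * (seats + 1))


def apportion(populations: dict[str, float], total_seats: int) -> tuple[dict[str, int], list[str]]:
    """Hand out `total_seats` seats; returns (seats per state, order of the priority seats)."""
    if total_seats < len(populations):
        raise ValueError("total_seats must be at least the number of states")
    seats = {state: 1 for state in populations}  # every state's guaranteed seat
    # heapq is a min-heap, so store negated priorities to pop the largest first
    heap = [(-priority(pop, 1), state) for state, pop in populations.items()]
    heapq.heapify(heap)
    order = []
    for _ in range(total_seats - len(populations)):
        _, state = heapq.heappop(heap)
        seats[state] += 1
        order.append(state)
        # Only the winner's priority changes, so only it is pushed back
        heapq.heappush(heap, (-priority(populations[state], seats[state]), state))
    return seats, order


seats, order = apportion({"A": 100, "B": 50, "C": 20}, total_seats=6)
print(seats)  # {'A': 3, 'B': 2, 'C': 1}
print(order)  # ['A', 'A', 'B']
```

Tracing it by hand: A's first priority is 100/√2 ≈ 70.7, so it takes seat 4. Its priority drops to 100/√6 ≈ 40.8, still ahead of B's 50/√2 ≈ 35.4, so it also takes seat 5. At 100/√12 ≈ 28.9 it finally falls behind B, which takes seat 6.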

## Loading the Data

The [GitHub repo for this demo](https://github.com/TimeMagazineLabs/CongressionalApportionment) has data files from Census.gov for the 2010 and 2020 decennial Census counts as well as the official apportionments so that we can check our work. Here's what the data looks like:

In [2]:
data_2020 = pd.read_csv('https://raw.githubusercontent.com/TimeMagazineLabs/CongressionalApportionment/main/data/apportionment_2020.csv',dtype={'State': 'string', 'Abbr': 'string', 'Reps': 'Int64'})
data_2020['Per_Rep'] = np.int64(data_2020['Population'] / data_2020['Reps'])
data_2020.head()

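| | State | Abbr | Year | Population | Reps | Per_Rep |
|---|---|---|---|---|---|---|
| 0 | Alabama | AL | 2020 | 5030053 | 7 | 718579 |
| 1 | Alaska | AK | 2020 | 736081 | 1 | 736081 |
| 2 | Arizona | AZ | 2020 | 7158923 | 9 | 795435 |
| 3 | Arkansas | AR | 2020 | 3013756 | 4 | 753439 |
| 4 | California | CA | 2020 | 39576757 | 52 | 761091 |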


## Recalculating the apportionments
Before we get fancy, let's recalculate the apportionment and make sure it matches the official tally. We'll make a copy of the data frame for each trial run and add columns for the calculated reps and the priority from the Equal Proportions Method algorithm. Remember, each state starts with 1 seat, so we'll initialize `RepsCalculated` to 1 and start the apportionment with 385 seats remaining.

Since the data includes Puerto Rico and D.C., neither of which is eligible for voting representatives, we'll need to remove them.

In [3]:
data_2020_states_only = data_2020[~data_2020['State'].isin(["District of Columbia", "Puerto Rico"])]
data_2020_states_only = data_2020_states_only.reset_index(drop=True)  # reset_index returns a new frame, so keep the result
print(data_2020.shape[0], data_2020_states_only.shape[0])

52 50
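A note on `reset_index`: it returns a new DataFrame and leaves the original alone (unless you pass `inplace=True`), so calling it without assigning the result has no effect. Either way, later cells look rows up by label with `.loc` and `idxmax`, so gaps in the index would not break anything here. A toy example with hypothetical values:

```python
import pandas as pd

# Toy frame standing in for the Census table (hypothetical values)
df = pd.DataFrame({"State": ["Alabama", "District of Columbia", "Texas"],
                   "Population": [5, 1, 9]})

filtered = df[~df["State"].isin(["District of Columbia", "Puerto Rico"])]
print(list(filtered.index))  # [0, 2]: the original labels are kept

filtered.reset_index(drop=True)  # returns a new frame; `filtered` is unchanged
print(list(filtered.index))  # [0, 2]

filtered = filtered.reset_index(drop=True)  # assign the result to keep it
print(list(filtered.index))  # [0, 1]
```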


In [4]:
test_2020 = data_2020_states_only.copy()
test_2020['RepsCalculated'] = 1
#test_2020['RepsCalculated'] = test_2020['RepsCalculated'].astype('Int64')
test_2020['Priority'] = 0.0
test_2020.head()

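| | State | Abbr | Year | Population | Reps | Per_Rep | RepsCalculated | Priority |
|---|---|---|---|---|---|---|---|---|
| 0 | Alabama | AL | 2020 | 5030053 | 7 | 718579 | 1 | 0.0 |
| 1 | Alaska | AK | 2020 | 736081 | 1 | 736081 | 1 | 0.0 |
| 2 | Arizona | AZ | 2020 | 7158923 | 9 | 795435 | 1 | 0.0 |
| 3 | Arkansas | AR | 2020 | 3013756 | 4 | 753439 | 1 | 0.0 |
| 4 | California | CA | 2020 | 39576757 | 52 | 761091 | 1 | 0.0 |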


First, we need a Python function for the Equal Proportions Method. We can then calculate each state's initial priority with the pandas `.apply` method.

In [5]:
def EqualProportionsMethod(st):
  """Priority of a state row: population / sqrt(n * (n + 1)), where n is its current seat count."""
  reps = st["RepsCalculated"]
  priority = st["Population"] / (np.sqrt((reps + 1) * reps))
  return priority

In [6]:
test_2020["Priority"] = test_2020.apply(EqualProportionsMethod, axis=1)  # axis=1: call the function once per row
test_2020.head()

| | State | Abbr | Year | Population | Reps | Per_Rep | RepsCalculated | Priority |
|---|---|---|---|---|---|---|---|---|
| 0 | Alabama | AL | 2020 | 5030053 | 7 | 718579 | 1 | 3556785.0 |
| 1 | Alaska | AK | 2020 | 736081 | 1 | 736081 | 1 | 520487.9 |
| 2 | Arizona | AZ | 2020 | 7158923 | 9 | 795435 | 1 | 5062123.0 |
| 3 | Arkansas | AR | 2020 | 3013756 | 4 | 753439 | 1 | 2131047.0 |
| 4 | California | CA | 2020 | 39576757 | 52 | 761091 | 1 | 27984990.0 |
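`.apply` with `axis=1` calls `EqualProportionsMethod` once per row. The same formula can also be evaluated on whole columns at once, which pandas and NumPy handle without a Python-level loop. A small sketch on a hypothetical two-state table, checking that both routes agree:

```python
import numpy as np
import pandas as pd

# Hypothetical two-state table; both states hold their one guaranteed seat
df = pd.DataFrame({"State": ["Alpha", "Beta"],
                   "Population": [200, 60],
                   "RepsCalculated": [1, 1]})

# Vectorized: the formula runs on whole columns at once
n = df["RepsCalculated"]
vectorized = df["Population"] / np.sqrt(n * (n + 1))

# Row by row, the way .apply(..., axis=1) evaluates it
row_wise = df.apply(
    lambda st: st["Population"] / np.sqrt(st["RepsCalculated"] * (st["RepsCalculated"] + 1)),
    axis=1,
)

print(np.allclose(vectorized, row_wise))  # True
```

With only 50 rows the speed difference is negligible, so the row-wise function is a perfectly reasonable choice for readability here.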


As expected, California has the highest priority. We can confirm this with the `.idxmax()` method, which returns the index label of the row holding the largest value in a given column:

In [7]:
test_2020.loc[test_2020['Priority'].idxmax(),'State']

'California'
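One detail worth knowing: if several rows share the maximum, `idxmax` returns the label of the first one. Exact ties in priority are very unlikely with real populations, but a toy frame with hypothetical values shows the rule:

```python
import pandas as pd

df = pd.DataFrame({"State": ["Alpha", "Beta", "Gamma"],
                   "Priority": [5.0, 9.0, 9.0]})

best = df["Priority"].idxmax()  # label of the first row holding the maximum
print(best, df.loc[best, "State"])  # 1 Beta
```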

Great, now we just need a function that finds the state with the highest priority, adds a seat to that state, and recalculates its priority. (There is no need to recompute every state's priority after each assignment, since a state's priority changes only when it gains a seat.) While we're at it, let's record the order in which states receive seats in a list called `ORDER`. This list should be initialized once per complete trial and passed in.

In [8]:
def addNextSeat(df, ORDER):
  # ORDER is required: a list the caller creates once per trial
  indexNext = df['Priority'].idxmax() # The index of the row with the highest priority
  # Add a seat to the state in that row
  df.loc[indexNext,'RepsCalculated'] += 1
  # Recompute the priority for this state
  df.loc[indexNext,'Priority'] = EqualProportionsMethod(df.loc[indexNext])
  # Add the state to the ORDER list
  ORDER.append(df.loc[indexNext,'Abbr'])
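Python evaluates default argument values once, when the `def` statement runs, so a mutable default such as `ORDER=[]` is shared by every call that omits the argument. That's why it's safest to pass the list explicitly, as every call in this notebook does, or to default to `None` and create a fresh list inside the function. A minimal illustration with two hypothetical helpers:

```python
def record_shared(item, log=[]):  # the default list is created once, at definition time
    log.append(item)
    return log


def record_fresh(item, log=None):  # a new list is made on every call that omits `log`
    if log is None:
        log = []
    log.append(item)
    return log


print(record_shared("CA"), record_shared("TX"))  # ['CA', 'TX'] ['CA', 'TX']
print(record_fresh("CA"), record_fresh("TX"))    # ['CA'] ['TX']
```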

Let's try that once and make sure it works.

In [9]:
ORDER_TEST = []  
addNextSeat(test_2020, ORDER_TEST)
print(ORDER_TEST)
test_2020.head()

['CA']


| | State | Abbr | Year | Population | Reps | Per_Rep | RepsCalculated | Priority |
|---|---|---|---|---|---|---|---|---|
| 0 | Alabama | AL | 2020 | 5030053 | 7 | 718579 | 1 | 3556785.0 |
| 1 | Alaska | AK | 2020 | 736081 | 1 | 736081 | 1 | 520487.9 |
| 2 | Arizona | AZ | 2020 | 7158923 | 9 | 795435 | 1 | 5062123.0 |
| 3 | Arkansas | AR | 2020 | 3013756 | 4 | 753439 | 1 | 2131047.0 |
| 4 | California | CA | 2020 | 39576757 | 52 | 761091 | 2 | 16157140.0 |
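As a quick hand check of the recalculated value: with two seats (its guaranteed seat plus seat 51, the first one awarded by priority), California's claim on its third seat uses n = 2 in the formula. This agrees with the 16157140.0 in the table, which pandas displays rounded:

```python
import math

population = 39576757  # California's 2020 apportionment population, from the table above
seats = 2              # guaranteed seat plus seat 51

priority = population / math.sqrt(seats * (seats + 1))
print(f"{priority:,.0f}")  # 16,157,143
```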


Great! There are now 384 seats left: 435, minus the 50 guaranteed seats, minus the one just awarded to California. Let's give it a go.

In [10]:
SEATS_LEFT = 384
while SEATS_LEFT > 0:
  addNextSeat(test_2020, ORDER_TEST)
  SEATS_LEFT -= 1

test_2020.head()

| | State | Abbr | Year | Population | Reps | Per_Rep | RepsCalculated | Priority |
|---|---|---|---|---|---|---|---|---|
| 0 | Alabama | AL | 2020 | 5030053 | 7 | 718579 | 7 | 672169.105833 |
| 1 | Alaska | AK | 2020 | 736081 | 1 | 736081 | 1 | 520487.866603 |
| 2 | Arizona | AZ | 2020 | 7158923 | 9 | 795435 | 9 | 754616.742459 |
| 3 | Arkansas | AR | 2020 | 3013756 | 4 | 753439 | 4 | 673896.32836 |
| 4 | California | CA | 2020 | 39576757 | 52 | 761091 | 52 | 753877.180693 |


In [11]:
print(ORDER_TEST)

['CA', 'TX', 'CA', 'FL', 'NY', 'TX', 'CA', 'PA', 'IL', 'CA', 'FL', 'TX', 'OH', 'NY', 'GA', 'NC', 'CA', 'MI', 'NJ', 'TX', 'FL', 'VA', 'CA', 'NY', 'WA', 'TX', 'PA', 'CA', 'IL', 'AZ', 'MA', 'TN', 'FL', 'OH', 'IN', 'CA', 'NY', 'TX', 'GA', 'MD', 'MO', 'NC', 'CA', 'WI', 'MI', 'CO', 'MN', 'FL', 'TX', 'NJ', 'CA', 'PA', 'IL', 'NY', 'SC', 'AL', 'VA', 'CA', 'TX', 'OH', 'FL', 'LA', 'KY', 'CA', 'WA', 'NY', 'GA', 'TX', 'NC', 'OR', 'CA', 'AZ', 'MI', 'PA', 'FL', 'MA', 'IL', 'TN', 'OK', 'TX', 'IN', 'CA', 'NY', 'NJ', 'OH', 'CA', 'CT', 'FL', 'TX', 'MD', 'MO', 'VA', 'WI', 'CA', 'GA', 'NY', 'PA', 'CO', 'IL', 'NC', 'TX', 'MN', 'UT', 'FL', 'CA', 'IA', 'MI', 'WA', 'NV', 'TX', 'OH', 'CA', 'AR', 'NY', 'MS', 'SC', 'KS', 'NJ', 'AZ', 'FL', 'AL', 'MA', 'CA', 'TX', 'PA', 'TN', 'IL', 'IN', 'GA', 'VA', 'CA', 'NY', 'NC', 'LA', 'TX', 'FL', 'CA', 'MI', 'KY', 'OH', 'MD', 'MO', 'TX', 'NY', 'CA', 'PA', 'OR', 'FL', 'WA', 'IL', 'WI', 'NJ', 'CA', 'CO', 'TX', 'GA', 'MN', 'NY', 'OK', 'CA', 'NC', 'AZ', 'FL', 'VA', 'TX', 'OH', 'MA

Looks good! We can easily double-check every state by taking the absolute value of the difference between our calculated seat counts and the official apportionment:

In [12]:
test_2020['Error'] = np.abs(test_2020['Reps'] - test_2020['RepsCalculated'])
print("ERROR:", sum(test_2020['Error']))
test_2020.head()

ERROR: 0


| | State | Abbr | Year | Population | Reps | Per_Rep | RepsCalculated | Priority | Error |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Alabama | AL | 2020 | 5030053 | 7 | 718579 | 7 | 672169.105833 | 0 |
| 1 | Alaska | AK | 2020 | 736081 | 1 | 736081 | 1 | 520487.866603 | 0 |
| 2 | Arizona | AZ | 2020 | 7158923 | 9 | 795435 | 9 | 754616.742459 | 0 |
| 3 | Arkansas | AR | 2020 | 3013756 | 4 | 753439 | 4 | 673896.32836 | 0 |
| 4 | California | CA | 2020 | 39576757 | 52 | 761091 | 52 | 753877.180693 | 0 |
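The `ORDER` list supports a second, independent cross-check: each state's final total should equal its guaranteed seat plus the number of times its abbreviation appears in the list. A minimal sketch with `collections.Counter` on a hypothetical three-state run (six seats, so three are handed out by priority):

```python
from collections import Counter

# Hypothetical toy run: the states that won seats 4, 5 and 6, in order
order = ["A", "A", "B"]
states = ["A", "B", "C"]

counts = Counter(order)  # a Counter returns 0 for states that never appear
seats = {state: 1 + counts[state] for state in states}  # plus the guaranteed seat
print(seats)  # {'A': 3, 'B': 2, 'C': 1}
```

With the notebook's data, the same idea would compare `1 + Counter(ORDER_TEST)[abbr]` with each row's `RepsCalculated`.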


Let's wrap this whole process in a function for easy manipulation:

In [13]:
def calculateApportionment(ignoreStates=('District of Columbia', 'Puerto Rico'), TOTAL_SEATS=435):
  ORDER_SEATS = []

  data_2020_filtered = data_2020[~data_2020['State'].isin(ignoreStates)]
  data_2020_filtered = data_2020_filtered.reset_index(drop=True)

  test_2020 = data_2020_filtered.copy()
  test_2020['RepsCalculated'] = 1
  test_2020['RepsCalculated'] = test_2020['RepsCalculated'].astype('Int64')

  # Initial priorities, with every state holding its one guaranteed seat
  test_2020['Priority'] = test_2020.apply(EqualProportionsMethod, axis=1)

  # Award the seats that remain after each state's guaranteed seat
  SEATS_LEFT = TOTAL_SEATS - test_2020.shape[0]
  while SEATS_LEFT > 0:
    addNextSeat(test_2020, ORDER_SEATS)
    SEATS_LEFT -= 1

  return test_2020, ORDER_SEATS
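`calculateApportionment` reads the global `data_2020`, so every scenario is tied to that one table. As a hedged alternative, the hypothetical `apportion_frame` below takes the table as an argument and vectorizes the initial priorities, so it could be reused with, for example, the repo's 2010 file, assuming that file uses the same `State`, `Abbr` and `Population` columns. It is a sketch run on a toy table, not a drop-in replacement:

```python
import numpy as np
import pandas as pd


def apportion_frame(data: pd.DataFrame, ignore=("District of Columbia", "Puerto Rico"),
                    total_seats: int = 435) -> tuple[pd.DataFrame, list[str]]:
    """Equal Proportions apportionment on a copy of `data` (hypothetical helper).

    Expects 'State', 'Abbr' and 'Population' columns, like the Census table above.
    """
    df = data[~data["State"].isin(ignore)].reset_index(drop=True).copy()
    if total_seats < len(df):
        raise ValueError("total_seats must be at least the number of states")
    df["RepsCalculated"] = 1
    n = df["RepsCalculated"]
    df["Priority"] = df["Population"] / np.sqrt(n * (n + 1))  # initial priorities
    order = []
    for _ in range(total_seats - len(df)):
        i = df["Priority"].idxmax()  # row with the highest priority
        df.loc[i, "RepsCalculated"] += 1
        n_i = df.loc[i, "RepsCalculated"]
        df.loc[i, "Priority"] = df.loc[i, "Population"] / np.sqrt(n_i * (n_i + 1))
        order.append(df.loc[i, "Abbr"])
    return df, order


# Toy table (hypothetical values); Puerto Rico is excluded by the default `ignore`
toy = pd.DataFrame({"State": ["Alpha", "Beta", "Gamma", "Puerto Rico"],
                    "Abbr": ["AA", "BB", "GG", "PR"],
                    "Population": [100, 50, 20, 30]})
result, order = apportion_frame(toy, total_seats=6)
print(result[["Abbr", "RepsCalculated"]])
print(order)  # ['AA', 'AA', 'BB']
```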

In [14]:
demo_2020, order = calculateApportionment()
demo_2020.head()

| | State | Abbr | Year | Population | Reps | Per_Rep | RepsCalculated | Priority |
|---|---|---|---|---|---|---|---|---|
| 0 | Alabama | AL | 2020 | 5030053 | 7 | 718579 | 7 | 672169.105833 |
| 1 | Alaska | AK | 2020 | 736081 | 1 | 736081 | 1 | 520487.866603 |
| 2 | Arizona | AZ | 2020 | 7158923 | 9 | 795435 | 9 | 754616.742459 |
| 3 | Arkansas | AR | 2020 | 3013756 | 4 | 753439 | 4 | 673896.32836 |
| 4 | California | CA | 2020 | 39576757 | 52 | 761091 | 52 | 753877.180693 |


In [15]:
print("Last state to get a seat is", order[-1])

Last state to get a seat is MN
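Because each award recomputes the winner's priority, the `Priority` column left over at the end holds every state's claim on the *next* seat. The first state left out is therefore one `idxmax` away. A toy sketch, with hypothetical values representing a finished six-seat run:

```python
import numpy as np
import pandas as pd

# Hypothetical final state of a toy apportionment (6 seats in total)
df = pd.DataFrame({"Abbr": ["AA", "BB", "GG"],
                   "Population": [100, 50, 20],
                   "RepsCalculated": [3, 2, 1]})
n = df["RepsCalculated"]
df["Priority"] = df["Population"] / np.sqrt(n * (n + 1))  # claims on seat 7

next_in_line = df.loc[df["Priority"].idxmax(), "Abbr"]
print(next_in_line)  # AA
```

In the notebook, the same lookup on `demo_2020` would name the state that would receive a 436th seat.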


## Now to Have Some Fun!

Let's start experimenting with statehood and the total number of seats.

In [16]:
demo_2020_with_dc, _ = calculateApportionment(ignoreStates=['Puerto Rico'])
demo_2020_with_dc.head()

| | State | Abbr | Year | Population | Reps | Per_Rep | RepsCalculated | Priority |
|---|---|---|---|---|---|---|---|---|
| 0 | Alabama | AL | 2020 | 5030053 | 7 | 718579 | 7 | 672169.105833 |
| 1 | Alaska | AK | 2020 | 736081 | 1 | 736081 | 1 | 520487.866603 |
| 2 | Arizona | AZ | 2020 | 7158923 | 9 | 795435 | 9 | 754616.742459 |
| 3 | Arkansas | AR | 2020 | 3013756 | 4 | 753439 | 4 | 673896.32836 |
| 4 | California | CA | 2020 | 39576757 | 52 | 761091 | 52 | 753877.180693 |


## What's the difference?

Let's write a function to find which states would gain or lose seats if D.C. were a state.

In [17]:
def compareToReality(df):
  """Print every state whose calculated seat count differs from the official one."""
  df['Difference'] = df['RepsCalculated'] - df['Reps']
  df['Per_Rep_Computed'] = df['Population'] / df['RepsCalculated']

  changes = df[df['Difference'] != 0]  # only the rows that changed
  for index, row in changes.iterrows():
    if (row['Difference'] < 0):
      print('%s LOST %s seat(s)' % (row['State'], -row['Difference']))
    elif (row['Difference'] > 0):
      print('%s GAINED %s seat(s)' % (row['State'], row['Difference']))

  

Adding D.C., which gets one seat, takes that seat from Minnesota. That makes sense, since Minnesota receives the last (435th) seat in the actual apportionment.

In [18]:
compareToReality(demo_2020_with_dc)

District of Columbia GAINED 1 seat(s)
Minnesota LOST 1 seat(s)
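`compareToReality` prints its findings. If you'd rather sort, filter or merge the results, a variant can return them as a DataFrame instead. A sketch on hypothetical toy numbers:

```python
import pandas as pd

# Hypothetical toy results: official seats vs. a recalculated scenario
df = pd.DataFrame({"State": ["Alpha", "Beta", "Gamma"],
                   "Reps": [3, 2, 0],
                   "RepsCalculated": [3, 1, 1]})

changes = (df.assign(Difference=df["RepsCalculated"] - df["Reps"])
             .query("Difference != 0")[["State", "Difference"]]
             .reset_index(drop=True))
print(changes)  # Beta lost one seat, Gamma gained one
```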


What if we expand the House to 436 seats to make room for D.C.?

In [19]:
demo_2020_with_dc_436, order_436 = calculateApportionment(ignoreStates=['Puerto Rico'], TOTAL_SEATS=436)
compareToReality(demo_2020_with_dc_436)
print(order_436[-1])

District of Columbia GAINED 1 seat(s)
MN


And what if we add Puerto Rico but not D.C.?

In [20]:
demo_2020_with_pr, _ = calculateApportionment(ignoreStates=['District of Columbia'])
compareToReality(demo_2020_with_pr)

California LOST 1 seat(s)
Colorado LOST 1 seat(s)
Minnesota LOST 1 seat(s)
Montana LOST 1 seat(s)
Puerto Rico GAINED 4 seat(s)


Likewise, expanding the House to 439 seats to make room for Puerto Rico's four doesn't disrupt the rest of the apportionment:

In [21]:
demo_2020_with_pr_439, _ = calculateApportionment(ignoreStates=['District of Columbia'], TOTAL_SEATS=439)
compareToReality(demo_2020_with_pr_439)

Puerto Rico GAINED 4 seat(s)
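The 436-, 439- and 440-seat runs were chosen by hand. A natural follow-up is to search for the smallest House at which adding new states costs no existing state a seat. The sketch below does that by brute force: it reruns a plain-Python version of the Equal Proportions Method for each candidate size, counting up from the current size. The helper names and the toy populations are hypothetical; with real data you would pass dictionaries of populations built from `data_2020`.

```python
import math


def apportion(populations: dict[str, float], total_seats: int) -> dict[str, int]:
    """Equal Proportions apportionment via a simple linear scan (fine for ~50 states)."""
    seats = {state: 1 for state in populations}
    for _ in range(total_seats - len(populations)):
        # The state with the highest priority P / sqrt(n * (n + 1)) gets the next seat
        best = max(populations,
                   key=lambda s: populations[s] / math.sqrt(seats[s] * (seats[s] + 1)))
        seats[best] += 1
    return seats


def smallest_house_without_losses(existing: dict[str, float], new: dict[str, float],
                                  base_size: int, max_size: int = 1000):
    """Smallest House size at which adding `new` costs no existing state a seat (hypothetical helper)."""
    baseline = apportion(existing, base_size)
    combined = {**existing, **new}
    for size in range(max(base_size, len(combined)), max_size + 1):
        seats = apportion(combined, size)
        if all(seats[s] >= baseline[s] for s in existing):
            return size
    return None  # no such size up to max_size


# Toy example: three existing states sharing 6 seats, plus a new "D" with population 30
size = smallest_house_without_losses({"A": 100, "B": 50, "C": 20}, {"D": 30}, base_size=6)
print(size)  # 7
```

In the toy case, squeezing "D" into 6 seats costs "B" its second seat; at 7 seats "B" gets it back.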


Here are both D.C. and Puerto Rico, without expanding the House:

In [22]:
demo_2020_with_dc_and_pr, _ = calculateApportionment(ignoreStates=[])
compareToReality(demo_2020_with_dc_and_pr)

California LOST 1 seat(s)
Colorado LOST 1 seat(s)
District of Columbia GAINED 1 seat(s)
Minnesota LOST 1 seat(s)
Montana LOST 1 seat(s)
Oregon LOST 1 seat(s)
Puerto Rico GAINED 4 seat(s)


And here they are with the House expanded to 440 seats, making room at the table:

In [23]:
demo_2020_with_dc_and_pr_440, _ = calculateApportionment(ignoreStates=[], TOTAL_SEATS=440)
compareToReality(demo_2020_with_dc_and_pr_440)

District of Columbia GAINED 1 seat(s)
Puerto Rico GAINED 4 seat(s)
