The Iowa Democratic Party is uploading the results of Tuesday's caucus to https://results.thecaucuses.org.
I haven't found a source for the raw data, so we have to scrape it.

Unfortunately, the HTML isn't the cleanest (https://twitter.com/therriaultphd/status/1224833286599585793).

In [1]:
import io
import requests
import lxml.html
import pandas as pd

url = "https://results.thecaucuses.org"
r = requests.get(url)

root = lxml.html.parse(io.StringIO(r.text)).getroot()

In [2]:
# Bennet, Biden, etc.
head = root.find_class("thead")[0]
header = [x.text for x in list(head.iterchildren())]

# First Expression, Final Expression, SDE, ...
subhead = root.find_class("sub-head")[0]
subheader = [x.text for x in list(subhead.iterchildren())]

In [3]:
columns = pd.MultiIndex.from_arrays([
    # Candidate names span several columns, so forward-fill the empty cells.
    pd.Series(header).ffill(),
    pd.Series(subheader).ffill().fillna('')
], names=['candidate', 'round'])
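
To illustrate what the forward fill does, here is a minimal sketch on a made-up header. The cell values in `toy_header` and `toy_subheader` are assumptions for illustration, not the page's actual contents: a candidate name appears once and is followed by empty cells for the other columns it spans.

```python
import pandas as pd

# Hypothetical header cells: each candidate name is followed by empty
# cells (None) for the remaining columns that it spans.
toy_header = ["County", "Precinct", "Bennet", None, None, "Biden", None, None]
toy_subheader = [
    None, None,
    "First Expression", "Final Expression", "SDE",
    "First Expression", "Final Expression", "SDE",
]

toy_columns = pd.MultiIndex.from_arrays(
    [
        # Forward-fill so every column carries its candidate's name.
        pd.Series(toy_header).ffill(),
        # County and Precinct have no sub-header; use an empty string.
        pd.Series(toy_subheader).fillna(""),
    ],
    names=["candidate", "round"],
)

for column in toy_columns:
    print(column)
```

Each column ends up labelled with a (candidate, round) pair, which is what lets the later `stack(level='candidate')` move candidates into the row index.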

In [4]:
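# One "precinct-county" element per county; its first child holds the name.
counties = root.find_class("precinct-county")
county_names = [x[0].text for x in counties]
# One "precinct-data" element per county; its children are the precinct rows.
counties_data = root.find_class("precinct-data")
rows = []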

In [5]:
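for name, county in zip(county_names, counties_data):
    if len(county) > 1:
        # The last row is the county total, so exclude it.
        # Satellites only have a total row, which we keep.
        county = county[:-1]

    for precinct in county:
        # Prepend the county name to the precinct's cell values.
        rows.append((name,) + tuple(x.text for x in precinct))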

In [6]:
df = (
    pd.DataFrame(rows, columns=columns)
      .set_index(['County', 'Precinct'])
      .stack(level='candidate')
      .sort_index()
      .apply(pd.to_numeric)
)
df

round                               Final Expression  First Expression     SDE
County  Precinct       candidate
Adair   1NW ADAIR      Bennet                      0                 0  0.0000
                       Biden                       6                 6  0.0784
                       Bloomberg                   0                 0  0.0000
                       Buttigieg                   8                 8  0.0784
                       Delaney                     0                 0  0.0000
...                                              ...               ...     ...
Wright  Rural Clarion  Sanders                     0                 1  0.0000
                       Steyer                      0                 0  0.0000
                       Uncommitted                 0                 0  0.0000
                       Warren                      0                 3  0.0000


## Analysis

Statewide state delegate equivalents (SDEs) by candidate:

In [7]:
df.groupby(level='candidate').SDE.sum().sort_values(ascending=False).round(2)

candidate
Buttigieg      362.64
Sanders        337.89
Warren         246.18
Biden          210.34
Klobuchar      169.69
Yang            14.27
Steyer           3.76
Uncommitted      2.08
Other            0.28
Bloomberg        0.13
Patrick          0.00
Gabbard          0.00
Delaney          0.00
Bennet           0.00
Name: SDE, dtype: float64

## Calculating percents

Candidates needed at least 15% of a precinct's first-round support to be viable. The data holds raw counts, so we first total the first-expression counts in each precinct.

In [8]:
precinct_totals = df.groupby(level=[0, 1])['First Expression'].sum()
precinct_totals

County  Precinct        
Adair   1NW ADAIR           40
        5GF GREENFIELD      78
Adams   Adams 1             29
        Adams 4             18
        Adams 5             15
                            ..
Wright  Eagle Grove #1       3
        Eagle Grove #2      11
        Eagle Grove #3      29
        Eagle Grove/Troy    10
        Rural Clarion       33
Name: First Expression, Length: 1104, dtype: int64

In [9]:
first_percent = df['First Expression'].div(precinct_totals)
first_percent

County  Precinct       candidate  
Adair   1NW ADAIR      Bennet         0.000000
                       Biden          0.150000
                       Bloomberg      0.000000
                       Buttigieg      0.200000
                       Delaney        0.000000
                                        ...   
Wright  Rural Clarion  Sanders        0.030303
                       Steyer         0.000000
                       Uncommitted    0.000000
                       Warren         0.090909
                       Yang           0.000000
Name: First Expression, Length: 15456, dtype: float64
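
The division above relies on pandas aligning the two Series on their shared index levels. As a minimal sketch of the same calculation on made-up numbers, `groupby(...).transform("sum")` broadcasts each precinct's total back onto every candidate row, so the division is between two identically indexed Series. The county, precinct, and candidate labels here are hypothetical.

```python
import pandas as pd

# Toy first-expression counts for two precincts and two candidates.
idx = pd.MultiIndex.from_tuples(
    [
        ("A", "P1", "X"),
        ("A", "P1", "Y"),
        ("A", "P2", "X"),
        ("A", "P2", "Y"),
    ],
    names=["County", "Precinct", "candidate"],
)
toy = pd.Series([8, 32, 1, 9], index=idx, name="First Expression")

# Precinct total repeated on every candidate row of that precinct.
toy_totals = toy.groupby(level=["County", "Precinct"]).transform("sum")

# Share of the precinct's first-expression count, then the 15% check.
toy_share = toy / toy_totals
toy_viable = toy_share >= 0.15

print(toy_share)
print(toy_viable)
```

In precinct P2, candidate X has 1 of 10 supporters (10%), so X falls below the 15% threshold there.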

This shows the share of precincts in which each candidate met the 15% threshold on first expression.

In [10]:
(first_percent >= 0.15).groupby(level='candidate').mean().sort_values()

candidate
Bennet         0.000000
Delaney        0.000000
Gabbard        0.000000
Other          0.000000
Bloomberg      0.000906
Patrick        0.000906
Uncommitted    0.003623
Steyer         0.024457
Yang           0.051630
Klobuchar      0.445652
Warren         0.573370
Biden          0.627717
Sanders        0.715580
Buttigieg      0.846014
Name: First Expression, dtype: float64

After unviable candidates are removed, their supporters get a chance to realign behind a second preference.
The average per-precinct change from first to final expression shows who gained the most supporters in the second round.

In [11]:
(df['Final Expression'] - df['First Expression']).groupby(level='candidate').mean().sort_values(ascending=False)

candidate
Buttigieg      3.047101
Warren         1.273551
Sanders        1.025362
Uncommitted    0.298007
Other          0.032609
Delaney        0.000000
Patrick       -0.041667
Bennet        -0.086051
Bloomberg     -0.096014
Gabbard       -0.198370
Klobuchar     -0.611413
Steyer        -1.500906
Biden         -1.814312
Yang          -4.199275
dtype: float64