<h2>Do They Know? A Brief Analysis of How Accurately American Voters Answer Governmental Trivia Questions</h2>

<img src="../figs/title_photo.jpg"/>

In this analysis, we use survey data collected by the *American National Election Studies* (ANES) data center. Their surveys aim to capture how American citizens vote in each major election. But rather than ask how they voted, today I'd like to know **who is voting**. More specifically, I want to see how knowledgeable the voting public is about the workings of the government they're electing. Does that knowledge differ across party lines or education levels? Let's find out!

In [4]:
import pandas as pd
from pathlib import Path
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.io as pio
import scipy.stats as stats

<p>We'll be using data from ANES's *2012 Direct Democracy Study*. Below we can see how the data is structured.</p>

In [5]:
data_path = Path("../data/anes_specialstudy_2012_directdem_dta/anes_specialstudy_2012_directdem_stata12.dta")
data = pd.read_stata(data_path)

print("Data dimensions: ", end="")
print(data.shape)
print("------------------------------")

print("Missing Values: ", end="")
print(data.isnull().sum().sum())
print("------------------------------")

print(data.iloc[0,:])

Data dimensions: (5415, 1037)
------------------------------
Missing Values: 0
------------------------------
dd_version                FINAL RELEASE, ANES 2012 Direct Democracy Stud...
dd_caseid                                                             20001
caseid                                                                 3001
weight_pre                                                           1.4708
weight_post                                                           1.518
                                                ...                        
post_ddpref_timing                                                       59
post_flawed_timing                                                       21
post_ddfavor_timing                                                      26
post_disp_close_timing                                                   13
post_qf1_timing                                                          30
Name: 0, Length: 1037, dtype: object


<p>Luckily, we have no missing values! That does not mean we won't have to filter out non-answers, though, as we'll see later on.</p>
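Non-answers in this dataset show up as ordinary category labels with negative codes (for example `-9. Refused` and `-4. Error` in the party-affiliation column), which is why `isnull()` does not flag them. Below is a minimal sketch of one way to filter them out. It assumes that every non-answer label starts with a minus sign, which should be checked per column; the toy frame and the `drop_non_answers` helper are illustrative and not part of the original notebook.

```python
import pandas as pd

# Toy stand-in for the survey data; the labels mirror those in pre_rptyid.
toy = pd.DataFrame({
    "caseid": [1, 2, 3, 4, 5],
    "pre_rptyid": ["1. Democrat", "-9. Refused", "2. Republican",
                   "-4. Error", "3. Independent"],
})

def drop_non_answers(df, column):
    """Keep only rows whose label in `column` does not start with '-'.

    Assumption: non-answers (refused, error, ...) carry negative codes.
    """
    is_non_answer = df[column].astype(str).str.startswith("-")
    return df.loc[~is_non_answer]

answered = drop_non_answers(toy, "pre_rptyid")
print(answered["pre_rptyid"].tolist())
```

On the real data the same helper would be called as `drop_non_answers(data, "pre_rptyid")`, once per column of interest, so that a refusal on one question does not discard the respondent's other answers.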

<p>We can also see that the survey has **5415** participants and over **1000** columns (1037, to be exact) corresponding to questions and information about each subject. This is a lot of information at our disposal, so we'll need to narrow down what we use.</p>
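One way to start narrowing down that many columns is to select them by naming pattern. The sketch below is only an illustration: it assumes that columns ending in `_timing` hold response timings and that the `pre_` prefix marks pre-election questions, both of which are guesses from the column names in the printout rather than something confirmed by the codebook. The toy values are made up.

```python
import pandas as pd

# Toy frame reusing a few column names from the printout; values are placeholders.
toy = pd.DataFrame({
    "caseid": [3001, 3002],
    "weight_pre": [1.4708, 0.9],
    "pre_ppstaten": ["state A", "state B"],
    "pre_rptyid": ["1. Democrat", "2. Republican"],
    "post_ddpref_timing": [59, 40],
    "post_qf1_timing": [30, 12],
})

# Drop the columns assumed to hold response timings (names ending in "_timing").
timing_cols = [c for c in toy.columns if c.endswith("_timing")]
narrowed = toy.drop(columns=timing_cols)

# Keep the respondent ID plus the columns whose names start with "pre_".
pre_cols = ["caseid"] + [c for c in narrowed.columns if c.startswith("pre_")]
pre_only = narrowed[pre_cols]

print(list(pre_only.columns))
```

Note that `weight_pre` is not selected, because the pattern matches on the prefix only; survey weights would need to be added back explicitly if a weighted analysis is intended.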

<p>Let's first get an idea of where our subjects reside, since location is a reasonable proxy for political leanings and will tell us whether we have an adequate representation of the voter base.</p>

In [6]:
location_spread = data.groupby(by=["pre_ppstaten"]).count()

# Let's examine a map to get a more intuitive understanding

# First let's fix the index so that the categories are the state abbreviations:
# split each label on spaces and keep the second token
categories = [label.split(" ")[1] for label in location_spread.index]

# Figure code from plotly documentation: https://plotly.com/python/choropleth-maps/
fig = go.Figure(data=go.Choropleth(
    locations=categories,
    z = location_spread["caseid"].astype(float),
    locationmode = 'USA-states',
    colorscale = 'Blues',
    colorbar_title = "Participation in Survey",
))

fig.update_layout(
    title_text = 'Participation in 2012 Direct Democracy Survey by State',
    geo_scope='usa', # limit map scope to USA
)
pio.write_html(fig, file="../figs/usmap.html", auto_open=True)
fig.show()

It appears that we have a decent spread of participants in blue states, red states, and swing states. Can we take a closer look at the balance of party affiliation we have?

In [17]:
party_spread = data.groupby(by=["pre_rptyid"]).count()
fig = px.pie(party_spread, 
            values='caseid',
            names=party_spread.index,
            title='Party Affiliation', 
            color=list(party_spread.index), 
            color_discrete_map={'1. Democrat':'blue',
                                 '2. Republican':'red',
                                 '3. Independent':'green',
                                 '-4. Error':'black',
                                 '5. Other party':'purple',
                                 '-9. Refused':'grey'}
             )
pio.write_html(fig, file="../figs/party_pie.html", auto_open=True)
fig.show()

['-9. Refused', '-4. Error', '1. Democrat', '2. Republican', '3. Independent', '5. Other party']
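The pie chart shows the party split visually; the same proportions can also be computed as numbers. The following is a minimal sketch using `value_counts(normalize=True)` on a toy column whose labels match those in `pre_rptyid`. The counts in the toy data are invented for illustration and do not reflect the survey.

```python
import pandas as pd

# Toy party-affiliation column; counts are made up for illustration only.
toy = pd.DataFrame({
    "pre_rptyid": ["1. Democrat"] * 3
                  + ["2. Republican"] * 3
                  + ["3. Independent"] * 3
                  + ["-9. Refused"],
})

# Fraction of respondents in each category, sorted by label.
shares = toy["pre_rptyid"].value_counts(normalize=True).sort_index()
print(shares.round(3))
```

On the real data, `data["pre_rptyid"].value_counts(normalize=True)` should give the shares behind each slice of the pie.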


We have a nearly three-way tie between <span style="color: blue;">Democrats</span>, <span style="color: red;">Republicans</span>, and <span style="color: green;">Independents</span>. This is surprising, considering that the share of US voters who actually vote for independent candidates is much lower than one third. We can see this by examining how these participants voted.