# Logistic Regression with American National Election Studies (ANES) 

This example uses survey data from the American National Election Studies (ANES) to model the probability that a respondent intends to vote for Clinton rather than Trump. The predictors are party identification and its interaction with age.
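
As a rough illustration of what such a model computes, the sketch below evaluates a logistic regression with one intercept offset and one age slope per party. All coefficient values are hypothetical and chosen only for illustration; they are not estimates from the ANES data. Treating `democrat` as the reference level is also an assumption.

```python
import math

# Hypothetical coefficients for illustration only (not fitted values).
INTERCEPT = 0.5
# Offset per party; the assumed reference level ("democrat") has offset 0.
PARTY_OFFSET = {"democrat": 0.0, "republican": -2.0}
# One age slope per party, which is what a party-by-age interaction provides.
AGE_SLOPE = {"democrat": 0.01, "republican": -0.02}


def clinton_probability(party_id: str, age: float) -> float:
    """Return the inverse logit of intercept + party offset + party slope * age."""
    eta = INTERCEPT + PARTY_OFFSET[party_id] + AGE_SLOPE[party_id] * age
    return 1.0 / (1.0 + math.exp(-eta))


for party in PARTY_OFFSET:
    print(party, round(clinton_probability(party, 50), 3))
```

Because the slope on age differs by party, the predicted probability can rise with age for one party and fall for another. That is the effect an interaction term is meant to capture.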


In [1]:
import pathlib
import bambi as bmb
import pandas as pd
SEED = 7355608

In [2]:
data = pd.read_csv(pathlib.Path("..", "..", "data", "anes.csv")) 
clinton_data = data.loc[data["vote"].isin(["clinton", "trump"]), :]
clinton_data.head()

| Unnamed: 0 | vote | age | party_id |
|---|---|---|---|
| 0 | clinton | 56 | democrat |
| 1 | trump | 65 | republican |
| 2 | clinton | 80 | democrat |
| 3 | trump | 38 | republican |
| 4 | trump | 60 | republican |


In [3]:
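# Bernoulli (logistic) model; vote['clinton'] marks "clinton" as the success level.
clinton_model = bmb.Model(
    "vote['clinton'] ~ party_id + party_id:age", clinton_data, family="bernoulli"
)
# Sample the posterior and keep the pointwise log-likelihood in the result.
clinton_fitted = clinton_model.fit(
    draws=2000, target_accept=0.85, random_seed=SEED, idata_kwargs={"log_likelihood": True}
)
# Add posterior predictive samples of the response to clinton_fitted in place.
clinton_model.predict(clinton_fitted, kind="response", random_seed=SEED)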

Modeling the probability that vote==clinton


Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [Intercept, party_id, party_id:age]


Sampling 4 chains for 1_000 tune and 2_000 draw iterations (4_000 + 8_000 draws total) took 7 seconds.


In [4]:
clinton_fitted.to_netcdf(pathlib.Path("..", "..", "data", "anes.nc"))

PosixPath('../../data/anes.nc')