# Predicting Outcomes, Party Affiliation

This notebook develops two predictive models. The first predicts the `outcome` of a roll call vote from as few senators as possible. The second predicts party affiliation from as few training examples as possible.

#### First,
we will predict the outcome of a roll call vote **using only a subset** of the senators' voting records. The reason for this may seem unintuitive at first. How each senator votes is public information, so we can determine the outcome with 100% accuracy by simple inspection; we don't need a model to predict the outcome of roll call votes after they have occurred. So, *why* would we build this kind of model?

Recall that we developed a mathematical measure of influence based on mutual information (Kullback-Leibler divergence) that lets us rank individual senators by how influential their votes were on the final outcome. We know that we can perfectly predict the outcome of a vote given how all senators voted, but what if we only consider a small subset of senators? The questions we want to answer are:

  - What is the minimum number of senators needed to predict the outcome of a vote?
  - Is there a number `n < 100` of influential senators whose voting records, taken together, fully predict the outcome?
  - Which senators should we include? 
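
One simple way to probe these questions is to score how well a chosen subset of senators reproduces the recorded outcome. The sketch below is only a baseline under stated assumptions: it predicts that a vote passes when more than half of the chosen senators voted yes, which is not necessarily the model this notebook ends up using. The function name `subset_majority_accuracy` and the toy data are hypothetical and exist only for illustration; the layout (one column per senator plus an `outcome` column, one row per roll call) mirrors the `votes` table.

```python
import pandas as pd


def subset_majority_accuracy(votes, senators):
    """Fraction of roll calls whose outcome matches the majority vote of `senators`.

    `votes` has one 0/1 column per senator plus an `outcome` column,
    with one row per roll call.
    """
    yes_share = votes[senators].mean(axis=1)      # share of the subset voting yes
    predicted = (yes_share > 0.5).astype(int)     # simple majority rule (an assumption)
    return float((predicted == votes["outcome"]).mean())


# Toy data, made up for illustration; these are not real Senate votes.
toy = pd.DataFrame({
    "A": [1, 0, 1, 1],
    "B": [1, 0, 0, 1],
    "C": [0, 1, 0, 1],
    "outcome": [1, 0, 0, 1],
})

print(subset_majority_accuracy(toy, ["A"]))            # one senator
print(subset_majority_accuracy(toy, ["A", "B", "C"]))  # all three together
```

Ranking senators by influence score and growing the subset one senator at a time until the accuracy stops improving would give a rough answer to the first and third questions.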
  
  
#### Second,
we want to predict an individual senator's party affiliation. As above, this problem is easily solved by simple inspection. So, why build a model? Our motivations are similar:

   - Is there a minimum number `n < 100` of example senators' records that, when used for training, allows 100% accuracy in predicting the party from an unseen senator's voting record?
   - Which individual issues (the issues are our features, e.g. Senate Resolution 11) are most predictive of party affiliation?
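
To make the first question concrete, here is a minimal sketch of one possible classifier: a nearest-neighbour rule that assigns an unseen senator the party of the training senator whose record disagrees with theirs on the fewest roll calls. This is an illustrative stand-in, not necessarily the model used later. The function `predict_party` and the toy records are hypothetical. Note that the sketch expects senators as rows and roll calls as columns, so the `votes` table would first need to be transposed (with the `outcome` column dropped).

```python
import pandas as pd


def predict_party(train_records, train_parties, record):
    """Return the party of the training senator most similar to `record`.

    `train_records`: one row per senator, one 0/1 column per roll call.
    `train_parties`: party label for each row of `train_records`.
    `record`: the unseen senator's votes, indexed by roll call.
    """
    # Hamming distance: number of roll calls on which the two senators disagree.
    distances = train_records.ne(record, axis="columns").sum(axis=1)
    return train_parties[distances.idxmin()]


# Toy training set, made up for illustration; these are not real senators.
train_records = pd.DataFrame(
    {"v1": [1, 1, 0, 0], "v2": [1, 0, 0, 0], "v3": [0, 0, 1, 1]},
    index=["r1", "r2", "d1", "d2"],
)
train_parties = pd.Series(["R", "R", "D", "D"], index=train_records.index)

unseen = pd.Series({"v1": 1, "v2": 1, "v3": 1})
print(predict_party(train_records, train_parties, unseen))
```

Shrinking the training set and re-measuring accuracy on held-out senators is one way to search for the smallest workable `n`.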
  


The influence score we developed is designed to help answer these questions.

In [2]:
import pandas as pd
votes = pd.read_csv('../data/cleaned_votes.csv', index_col=0)

In [4]:
votes.sample(3)

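| | Alexander (R-TN) | Ayotte (R-NH) | Baldwin (D-WI) | Barrasso (R-WY) | Bennet (D-CO) | Blumenthal (D-CT) | Blunt (R-MO) | Booker (D-NJ) | Boozman (R-AR) | Boxer (D-CA) | ... | Tillis (R-NC) | Toomey (R-PA) | Udall (D-NM) | Vitter (R-LA) | Warner (D-VA) | Warren (D-MA) | Whitehouse (D-RI) | Wicker (R-MS) | Wyden (D-OR) | outcome |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 186 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | ... | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 1 |
| 305 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | ... | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
| 112 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | ... | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 |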
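
Each senator column label carries the party in parentheses, so party labels can be recovered from the column names themselves. The sketch below assumes every senator column follows the `Name (P-ST)` pattern seen in the sample; the helper `party_of`, the regular expression, and the `party_gap` measure are illustrative choices, not something the notebook defines. It uses four senators and the three sampled roll calls to rank roll calls by how sharply the two parties split, one crude proxy for how predictive an issue is of party.

```python
import re

import pandas as pd

# Assumed label format: "Name (P-ST)", e.g. "Alexander (R-TN)".
PARTY_PATTERN = re.compile(r"\(([A-Z]+)-[A-Z]{2}\)$")


def party_of(column_label):
    """Extract the party code from a column label, or None if it does not match."""
    match = PARTY_PATTERN.search(column_label)
    return match.group(1) if match else None


# Four senators and three roll calls copied from the sampled rows.
toy = pd.DataFrame(
    {
        "Alexander (R-TN)": [1, 1, 1],
        "Baldwin (D-WI)": [0, 1, 0],
        "Barrasso (R-WY)": [1, 1, 1],
        "Bennet (D-CO)": [1, 1, 0],
        "outcome": [1, 1, 1],
    },
    index=[186, 305, 112],
)

senator_columns = [c for c in toy.columns if c != "outcome"]
parties = pd.Series({c: party_of(c) for c in senator_columns})

# Yes-rate per roll call within each party, then the absolute gap between parties.
yes_rate = toy[senator_columns].T.groupby(parties).mean().T
party_gap = (yes_rate["R"] - yes_rate["D"]).abs().sort_values(ascending=False)
print(party_gap)
```

A gap of 1 means the two parties voted unanimously against each other on that roll call; a gap of 0 means their yes-rates were identical.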
