# Support Vector Machine

The model that this data is based in is the Congressional voting data set from the Univeristy of Calirfornia Irvine Machine Learning Repository website.

Source:

Origin:

Congressional Quarterly Almanac, 98th Congress, 2nd session 1984, Volume XL: Congressional Quarterly Inc. Washington, D.C., 1985.

Donor:

Jeff Schlimmer (Jeffrey.Schlimmer '@' a.gp.cs.cmu.edu)

Data Set Information:

>This data set includes votes for each of the U.S. House of Representatives Congressmen on the 16 key votes identified by the CQA. The CQA lists nine different types of votes: voted for, paired for, and announced for (these three simplified to yea), voted against, paired against, and announced against (these three simplified to nay), voted present, voted present to avoid conflict of interest, and did not vote or otherwise make a position known (these three simplified to an unknown disposition).

Attribute Information:

1. Class Name: 2 (democrat, republican)
2. handicapped-infants: 2 (y,n)
3. water-project-cost-sharing: 2 (y,n)
4. adoption-of-the-budget-resolution: 2 (y,n)
5. physician-fee-freeze: 2 (y,n)
6. el-salvador-aid: 2 (y,n)
7. religious-groups-in-schools: 2 (y,n)
8. anti-satellite-test-ban: 2 (y,n)
9. aid-to-nicaraguan-contras: 2 (y,n)
10. mx-missile: 2 (y,n)
11. immigration: 2 (y,n)
12. synfuels-corporation-cutback: 2 (y,n)
13. education-spending: 2 (y,n)
14. superfund-right-to-sue: 2 (y,n)
15. crime: 2 (y,n)
16. duty-free-exports: 2 (y,n)
17. export-administration-act-south-africa: 2 (y,n)

Dua, D. and Karra Taniskidou, E. (2017). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science. 

In [1]:
#Import Dependencies.
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier
import pandas as pd
import numpy as np

In [2]:
# Read in the data into a data frame.
data = pd.read_table("house-votes-84.data", sep=",", header=None)
data

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16
0,republican,n,y,n,y,y,y,n,n,n,y,?,y,y,y,n,y
1,republican,n,y,n,y,y,y,n,n,n,n,n,y,y,y,n,?
2,democrat,?,y,y,?,y,y,n,n,n,n,y,n,y,y,n,n
3,democrat,n,y,y,n,?,y,n,n,n,n,y,n,y,n,n,y
4,democrat,y,y,y,n,y,y,n,n,n,n,y,?,y,y,y,y
5,democrat,n,y,y,n,y,y,n,n,n,n,n,n,y,y,y,y
6,democrat,n,y,n,y,y,y,n,n,n,n,n,n,?,y,y,y
7,republican,n,y,n,y,y,y,n,n,n,n,n,n,y,y,?,y
8,republican,n,y,n,y,y,y,n,n,n,n,n,y,y,y,n,y
9,democrat,y,y,y,n,n,n,y,y,y,n,n,n,n,n,?,?


In [3]:
#Renamed the columns to show the title of the bills.
vote_names = data.rename(columns={0:"Party",1:"Disabled Infants",2:"Water Project Cost Sharing",3:"Adoption of the Budget Resolution",
                                  4:"Physician Fee Freeze",5:"El Salvador Aid",6:"Religious Groups is Schools",7:"Anti-Satellite Test Ban",
                                  8:"Aid to Nicaraguan Contras",9:"MX Missile",10:"Immigration", 11:"Synfuels Corporation Cutback",
                                  12:"Education Spending", 13:"Superfund Right to Sue", 14:"Crime",15:"Duty Free Exports",16:"Export Administration Act South Africa"})
vote_names

Unnamed: 0,Party,Disabled Infants,Water Project Cost Sharing,Adoption of the Budget Resolution,Physician Fee Freeze,El Salvador Aid,Religious Groups is Schools,Anti-Satellite Test Ban,Aid to Nicaraguan Contras,MX Missile,Immigration,Synfuels Corporation Cutback,Education Spending,Superfund Right to Sue,Crime,Duty Free Exports,Export Administration Act South Africa
0,republican,n,y,n,y,y,y,n,n,n,y,?,y,y,y,n,y
1,republican,n,y,n,y,y,y,n,n,n,n,n,y,y,y,n,?
2,democrat,?,y,y,?,y,y,n,n,n,n,y,n,y,y,n,n
3,democrat,n,y,y,n,?,y,n,n,n,n,y,n,y,n,n,y
4,democrat,y,y,y,n,y,y,n,n,n,n,y,?,y,y,y,y
5,democrat,n,y,y,n,y,y,n,n,n,n,n,n,y,y,y,y
6,democrat,n,y,n,y,y,y,n,n,n,n,n,n,?,y,y,y
7,republican,n,y,n,y,y,y,n,n,n,n,n,n,y,y,?,y
8,republican,n,y,n,y,y,y,n,n,n,n,n,y,y,y,n,y
9,democrat,y,y,y,n,n,n,y,y,y,n,n,n,n,n,?,?


In [4]:
#Replaced the question marks with NaN using .replace().
vote_data = vote_names.replace("?", np.NaN)
vote_data

Unnamed: 0,Party,Disabled Infants,Water Project Cost Sharing,Adoption of the Budget Resolution,Physician Fee Freeze,El Salvador Aid,Religious Groups is Schools,Anti-Satellite Test Ban,Aid to Nicaraguan Contras,MX Missile,Immigration,Synfuels Corporation Cutback,Education Spending,Superfund Right to Sue,Crime,Duty Free Exports,Export Administration Act South Africa
0,republican,n,y,n,y,y,y,n,n,n,y,,y,y,y,n,y
1,republican,n,y,n,y,y,y,n,n,n,n,n,y,y,y,n,
2,democrat,,y,y,,y,y,n,n,n,n,y,n,y,y,n,n
3,democrat,n,y,y,n,,y,n,n,n,n,y,n,y,n,n,y
4,democrat,y,y,y,n,y,y,n,n,n,n,y,,y,y,y,y
5,democrat,n,y,y,n,y,y,n,n,n,n,n,n,y,y,y,y
6,democrat,n,y,n,y,y,y,n,n,n,n,n,n,,y,y,y
7,republican,n,y,n,y,y,y,n,n,n,n,n,n,y,y,,y
8,republican,n,y,n,y,y,y,n,n,n,n,n,y,y,y,n,y
9,democrat,y,y,y,n,n,n,y,y,y,n,n,n,n,n,,


## Due to the data being categorical, the "y" and "n" data points had to be transformed into indicator variables the numbers 1 for "y" and 0 for no in order for the model to work on the data. In order to do this the method "get_dummies()" was called on the dataframe. This step is called preprocessing.

In [5]:
vote_data_encoded = pd.get_dummies(vote_data)
vote_data_encoded.head()

Unnamed: 0,Party_democrat,Party_republican,Disabled Infants_n,Disabled Infants_y,Water Project Cost Sharing_n,Water Project Cost Sharing_y,Adoption of the Budget Resolution_n,Adoption of the Budget Resolution_y,Physician Fee Freeze_n,Physician Fee Freeze_y,...,Education Spending_n,Education Spending_y,Superfund Right to Sue_n,Superfund Right to Sue_y,Crime_n,Crime_y,Duty Free Exports_n,Duty Free Exports_y,Export Administration Act South Africa_n,Export Administration Act South Africa_y
0,0,1,1,0,0,1,1,0,0,1,...,0,1,0,1,0,1,1,0,0,1
1,0,1,1,0,0,1,1,0,0,1,...,0,1,0,1,0,1,1,0,0,0
2,1,0,0,0,0,1,0,1,0,0,...,1,0,0,1,0,1,1,0,1,0
3,1,0,1,0,0,1,0,1,1,0,...,1,0,0,1,1,0,1,0,0,1
4,1,0,0,1,0,1,0,1,1,0,...,0,0,0,1,0,1,0,1,0,1


## In order to avoid overfitting the data, we dropped the "no" responses from the data set. Only the "yes" responses were used to create the model.

In [None]:
y_votes = vote_data_encoded.drop(["Disabled Infants_n","Water Project Cost Sharing_n", "Adoption of the Budget Resolution_n",
                            "Physician Fee Freeze_n", "El Salvador Aid_n","Religious Groups is Schools_n", "Anti-Satellite Test Ban_n", "Aid to Nicaraguan Contras_n","MX Missile_n","Immigration_n","Synfuels Corporation Cutback_n","Education Spending_n","Superfund Right to Sue_n", 
                           "Crime_n", "Duty Free Exports_n","Export Administration Act South Africa_n" ], axis=1)
y_votes

In [7]:
#Indicate the target and the target names that will be used for the model.
target = vote_data["Party"]
target_names = ["Democrat", "Republican"]

## This model is predicting the political party of members of Congress based on how each member voted. Since party is what is being predicted, we dropped the "Party_democrat" and "Party_republican" columns.

In [8]:
voting_data = y_votes.drop(["Party_democrat", "Party_republican"], axis=1)
feature_names = voting_data.columns
voting_data.head()

Unnamed: 0,Disabled Infants_y,Water Project Cost Sharing_y,Adoption of the Budget Resolution_y,Physician Fee Freeze_y,El Salvador Aid_y,Religious Groups is Schools_y,Anti-Satellite Test Ban_y,Aid to Nicaraguan Contras_y,MX Missile_y,Immigration_y,Synfuels Corporation Cutback_y,Education Spending_y,Superfund Right to Sue_y,Crime_y,Duty Free Exports_y,Export Administration Act South Africa_y
0,0,1,0,1,1,1,0,0,0,1,0,1,1,1,0,1
1,0,1,0,1,1,1,0,0,0,0,0,1,1,1,0,0
2,0,1,1,0,1,1,0,0,0,0,1,0,1,1,0,0
3,0,1,1,0,0,1,0,0,0,0,1,0,1,0,0,1
4,1,1,1,0,1,1,0,0,0,0,1,0,1,1,1,1


## In this step we split the data into testing and training data.

In [9]:
from sklearn.model_selection import train_test_split

X = y_votes

X_train, X_test, y_train, y_test = train_test_split(X, target, random_state=42)

X_train.head()

Unnamed: 0,Party_democrat,Party_republican,Disabled Infants_y,Water Project Cost Sharing_y,Adoption of the Budget Resolution_y,Physician Fee Freeze_y,El Salvador Aid_y,Religious Groups is Schools_y,Anti-Satellite Test Ban_y,Aid to Nicaraguan Contras_y,MX Missile_y,Immigration_y,Synfuels Corporation Cutback_y,Education Spending_y,Superfund Right to Sue_y,Crime_y,Duty Free Exports_y,Export Administration Act South Africa_y
311,1,0,0,0,1,0,0,1,1,1,1,1,0,0,1,0,0,1
3,1,0,0,1,1,0,0,1,0,0,0,0,1,0,1,0,0,1
18,0,1,0,1,0,1,1,1,0,0,0,0,0,0,1,1,0,0
208,1,0,0,0,1,0,0,0,1,1,1,0,0,0,0,0,1,1
60,1,0,1,1,1,0,0,0,1,1,1,1,0,0,0,0,1,0


## Calling the Support Vector Machine Linear Classifier on the training data.

In [10]:
# Support vector machine linear classifier
from sklearn.svm import LinearSVC 
model = LinearSVC(multi_class="crammer_singer")
model.fit(X_train, y_train)

LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True,
     intercept_scaling=1, loss='squared_hinge', max_iter=1000,
     multi_class='crammer_singer', penalty='l2', random_state=None,
     tol=0.0001, verbose=0)

In [11]:
# Model Accuracy
print('Test Acc: %.3f' % model.score(X_test, y_test))

Test Acc: 1.000


## Test Acc: 1.000

In [12]:
# Calculate classification report
from sklearn.metrics import classification_report
predictions = model.predict(X_test)
print(classification_report(y_test, predictions,
                            target_names=target_names))

             precision    recall  f1-score   support

   Democrat       1.00      1.00      1.00        69
 Republican       1.00      1.00      1.00        40

avg / total       1.00      1.00      1.00       109

