# Congressional Voting Classification

# Objective
The main objective is to predict whether a member of Congress is a Democrat or a Republican from their voting record, using decision trees boosted with AdaBoost.

# AdaBoost
AdaBoost is an ensemble learning method (also known as “meta-learning”) that was originally created to improve the performance of binary classifiers. It works iteratively: each round trains a weak classifier, gives more weight to the samples that classifier got wrong, and the weighted weak classifiers are finally combined into a strong one.
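
To make the reweighting idea concrete, here is a minimal sketch of a single boosting round. The toy labels and predictions are made up for illustration, and the update shown is the textbook discrete AdaBoost rule for labels in {-1, +1}, which may differ in detail from other variants.

```python
import numpy as np

# Toy labels in {-1, +1} and the predictions of one hypothetical weak classifier.
y_true = np.array([1, 1, -1, -1, 1])
y_pred = np.array([1, -1, -1, -1, 1])  # one mistake, at index 1

# Start with uniform sample weights.
w = np.full(len(y_true), 1 / len(y_true))

# Weighted error of the weak classifier.
err = np.sum(w[y_pred != y_true])

# Classifier weight: the lower the error, the larger its say in the ensemble.
alpha = 0.5 * np.log((1 - err) / err)

# Raise the weights of misclassified samples, lower the rest, then renormalise.
w = w * np.exp(-alpha * y_true * y_pred)
w = w / w.sum()

print("alpha:", alpha)
print("updated weights:", w)
```

After the update, the single misclassified sample carries as much weight as the four correct ones together, so the next weak classifier is pushed to get it right.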


# Data Set
This data set includes the votes of each member of the U.S. House of Representatives on the 16 key votes identified by the CQA (Congressional Quarterly Almanac). The CQA lists nine different types of votes: voted for, paired for, and announced for (these three simplified to yea), voted against, paired against, and announced against (these three simplified to nay), voted present, voted present to avoid conflict of interest, and did not vote or otherwise make a position known (these three simplified to an unknown disposition).
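
The simplification described above can be sketched as a lookup table. The long-form vote strings and the `simplify` helper are hypothetical names chosen for illustration; the raw file already contains only the three simplified codes `y`, `n` and `?`.

```python
# Hypothetical long-form labels for the nine CQA vote types, mapped to the
# three codes that appear in the data file: 'y' (yea), 'n' (nay), '?' (unknown).
VOTE_MAP = {
    "voted for": "y",
    "paired for": "y",
    "announced for": "y",
    "voted against": "n",
    "paired against": "n",
    "announced against": "n",
    "voted present": "?",
    "voted present to avoid conflict of interest": "?",
    "did not vote": "?",
}


def simplify(vote: str) -> str:
    """Return the simplified code for a long-form vote description."""
    return VOTE_MAP[vote.strip().lower()]


print(simplify("Paired for"))     # yea
print(simplify("Voted present"))  # unknown disposition
```

Note that `?` therefore does not mean the value was lost; it records a position that was neither yea nor nay.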


## Attribute Information:
1. Class Name: 2 (democrat, republican)
2. handicapped-infants: 2 (y,n)
3. water-project-cost-sharing: 2 (y,n)
4. adoption-of-the-budget-resolution: 2 (y,n)
5. physician-fee-freeze: 2 (y,n)
6. el-salvador-aid: 2 (y,n)
7. religious-groups-in-schools: 2 (y,n)
8. anti-satellite-test-ban: 2 (y,n)
9. aid-to-nicaraguan-contras: 2 (y,n)
10. mx-missile: 2 (y,n)
11. immigration: 2 (y,n)
12. synfuels-corporation-cutback: 2 (y,n)
13. education-spending: 2 (y,n)
14. superfund-right-to-sue: 2 (y,n)
15. crime: 2 (y,n)
16. duty-free-exports: 2 (y,n)
17. export-administration-act-south-africa: 2 (y,n)
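
Since the raw data file has no header row, the attribute names above can be supplied when loading. The sketch below is one possible way to do that; it reads two sample records from an in-memory string rather than `house-votes-84.data`, and the `COLUMNS` list and the choice to treat `?` as missing are assumptions made for illustration.

```python
import io

import pandas as pd

# Column names taken from the attribute list; "class_name" is an assumed label.
COLUMNS = [
    "class_name", "handicapped-infants", "water-project-cost-sharing",
    "adoption-of-the-budget-resolution", "physician-fee-freeze",
    "el-salvador-aid", "religious-groups-in-schools",
    "anti-satellite-test-ban", "aid-to-nicaraguan-contras", "mx-missile",
    "immigration", "synfuels-corporation-cutback", "education-spending",
    "superfund-right-to-sue", "crime", "duty-free-exports",
    "export-administration-act-south-africa",
]

# Two records in the same comma-separated layout as the data file.
sample = io.StringIO(
    "republican,n,y,n,y,y,y,n,n,n,y,?,y,y,y,n,y\n"
    "republican,n,y,n,y,y,y,n,n,n,n,n,y,y,y,n,?\n"
)

# header=None keeps the first record as data; na_values marks '?' as missing.
votes = pd.read_csv(sample, header=None, names=COLUMNS, na_values="?")

print(votes.shape)
print(votes.isna().sum().sum(), "unknown votes")
```

With the real file, the in-memory string would be replaced by the file path; whether `?` should become a missing value or stay a third category is a modelling choice.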



# Source
The dataset can be obtained from the UCI Machine Learning Repository:
https://archive.ics.uci.edu/ml/datasets/Congressional+Voting+Records

# Tasks:
1. Obtain the dataset
2. Apply pre-processing operations
3. Train an AdaBoost model from scratch and test the model
4. Train an AdaBoost model using sklearn
5. Compare the performance of AdaBoost, Random Forest and Decision Trees


## Part 1: Adaboost from Scratch

In [6]:
# Load the libraries
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier 
from sklearn.preprocessing import MinMaxScaler,LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score,classification_report
import matplotlib.pyplot as plt

In [43]:
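# Load the dataset
# Note: the raw file has no header row, so with the default header setting
# the first record is used as the column names (see the output below).
df=pd.read_table('house-votes-84.data',sep=',')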
df.head()

Unnamed: 0,republican,n,y,n.1,y.1,y.2,y.3,n.2,n.3,n.4,y.4,?,y.5,y.6,y.7,n.5,y.8
0,republican,n,y,n,y,y,y,n,n,n,n,n,y,y,y,n,?
1,democrat,?,y,y,?,y,y,n,n,n,n,y,n,y,y,n,n
2,democrat,n,y,y,n,?,y,n,n,n,n,y,n,y,n,n,y
3,democrat,y,y,y,n,y,y,n,n,n,n,y,?,y,y,y,y
4,democrat,n,y,y,n,y,y,n,n,n,n,n,n,y,y,y,y


In [46]:
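# Preprocessing
# Encoding categorical variables (if any)
# Feature Scaling
# Filling missing values (if any)
from sklearn.impute import SimpleImputer

# One-hot encode the party label. The column is called 'republican' because
# the first record of the file was read as the header.
dum=pd.get_dummies(df.republican)
df.drop('republican',axis=1,inplace=True)
df=pd.concat([dum,df],axis=1)
cols=df.columns

# Label-encode each vote column; the classes sort as '?' -> 0, 'n' -> 1, 'y' -> 2.
enc=LabelEncoder()
for i in range(2,len(df.columns)):
    df.iloc[:,i]=enc.fit_transform(df.iloc[:,i])

# Note: no encoded value equals 3, so this imputer leaves the data unchanged
# and unknown votes ('?') stay encoded as 0.
imp=SimpleImputer(missing_values=3,strategy='mean')
df=pd.DataFrame(imp.fit_transform(df),columns=cols)
df.head()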

Unnamed: 0,democrat,republican,n,y,n.1,y.1,y.2,y.3,n.2,n.3,n.4,y.4,?,y.5,y.6,y.7,n.5,y.8
0,0.0,1.0,1.0,2.0,1.0,2.0,2.0,2.0,1.0,1.0,1.0,1.0,1.0,2.0,2.0,2.0,1.0,0.0
1,1.0,0.0,0.0,2.0,2.0,0.0,2.0,2.0,1.0,1.0,1.0,1.0,2.0,1.0,2.0,2.0,1.0,1.0
2,1.0,0.0,1.0,2.0,2.0,1.0,0.0,2.0,1.0,1.0,1.0,1.0,2.0,1.0,2.0,1.0,1.0,2.0
3,1.0,0.0,2.0,2.0,2.0,1.0,2.0,2.0,1.0,1.0,1.0,1.0,2.0,0.0,2.0,2.0,2.0,2.0
4,1.0,0.0,1.0,2.0,2.0,1.0,2.0,2.0,1.0,1.0,1.0,1.0,1.0,1.0,2.0,2.0,2.0,2.0


In [47]:
df=df.drop_duplicates()
df

Unnamed: 0,democrat,republican,n,y,n.1,y.1,y.2,y.3,n.2,n.3,n.4,y.4,?,y.5,y.6,y.7,n.5,y.8
0,0.0,1.0,1.0,2.0,1.0,2.0,2.0,2.0,1.0,1.0,1.0,1.0,1.0,2.0,2.0,2.0,1.0,0.0
1,1.0,0.0,0.0,2.0,2.0,0.0,2.0,2.0,1.0,1.0,1.0,1.0,2.0,1.0,2.0,2.0,1.0,1.0
2,1.0,0.0,1.0,2.0,2.0,1.0,0.0,2.0,1.0,1.0,1.0,1.0,2.0,1.0,2.0,1.0,1.0,2.0
3,1.0,0.0,2.0,2.0,2.0,1.0,2.0,2.0,1.0,1.0,1.0,1.0,2.0,0.0,2.0,2.0,2.0,2.0
4,1.0,0.0,1.0,2.0,2.0,1.0,2.0,2.0,1.0,1.0,1.0,1.0,1.0,1.0,2.0,2.0,2.0,2.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
429,0.0,1.0,1.0,1.0,2.0,2.0,2.0,2.0,1.0,1.0,2.0,2.0,1.0,2.0,2.0,2.0,1.0,2.0
430,1.0,0.0,1.0,1.0,2.0,1.0,1.0,1.0,2.0,2.0,2.0,2.0,1.0,1.0,1.0,1.0,1.0,2.0
431,0.0,1.0,1.0,0.0,1.0,2.0,2.0,2.0,1.0,1.0,1.0,1.0,2.0,2.0,2.0,2.0,1.0,2.0
432,0.0,1.0,1.0,1.0,1.0,2.0,2.0,2.0,0.0,0.0,0.0,0.0,1.0,2.0,2.0,2.0,1.0,2.0


In [48]:
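# Divide the dataset into training and testing sets
X=df.drop(columns=['democrat','republican'])
# Use a single binary target (1 = republican, 0 = democrat): the two one-hot
# columns are redundant, and sklearn's AdaBoostClassifier expects a 1-D label.
y=df['republican']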
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

In [None]:
# Implement the AdaBoost model from scratch
# AdaBoost consists of stumps, which can be created using the built-in decision trees in sklearn
# A stump can be trained by setting max_depth to 1
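
One way this step could look is sketched below: a small discrete AdaBoost class that uses sklearn stumps (`max_depth=1`) as weak learners. The class name `AdaBoostScratch`, the {-1, +1} label convention, and the synthetic vote matrix (0 = unknown, 1 = nay, 2 = yea) are assumptions made for illustration; this is not the notebook's own solution.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


class AdaBoostScratch:
    """Discrete AdaBoost for labels in {-1, +1}, built from depth-1 trees."""

    def __init__(self, n_estimators=50):
        self.n_estimators = n_estimators
        self.stumps = []
        self.alphas = []

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        self.stumps, self.alphas = [], []
        w = np.full(len(y), 1 / len(y))  # uniform initial sample weights
        for _ in range(self.n_estimators):
            # A stump is a decision tree restricted to a single split.
            stump = DecisionTreeClassifier(max_depth=1)
            stump.fit(X, y, sample_weight=w)
            pred = stump.predict(X)
            # Weighted error, clipped to keep the logarithm finite.
            err = np.clip(np.sum(w[pred != y]), 1e-10, 1 - 1e-10)
            alpha = 0.5 * np.log((1 - err) / err)
            self.stumps.append(stump)
            self.alphas.append(alpha)
            # Misclassified samples gain weight, correct ones lose weight.
            w = w * np.exp(-alpha * y * pred)
            w = w / w.sum()
        return self

    def predict(self, X):
        X = np.asarray(X)
        # Weighted vote of all stumps; the sign gives the predicted class.
        scores = sum(a * s.predict(X) for a, s in zip(self.alphas, self.stumps))
        return np.where(scores >= 0, 1, -1)


# Synthetic stand-in for the encoded votes: 0 = unknown, 1 = nay, 2 = yea.
rng = np.random.default_rng(0)
X_demo = rng.integers(0, 3, size=(400, 16))
# Made-up rule: the label depends on the sum of three of the sixteen votes.
y_demo = np.where(X_demo[:, 3] + X_demo[:, 4] + X_demo[:, 5] >= 3, 1, -1)

X_tr, X_te = X_demo[:300], X_demo[300:]
y_tr, y_te = y_demo[:300], y_demo[300:]

model = AdaBoostScratch(n_estimators=50).fit(X_tr, y_tr)
acc = np.mean(model.predict(X_te) == y_te)
print("held-out accuracy:", acc)
```

To use it on the notebook's data, the 0/1 party label would first need to be mapped to -1/+1, for example with `2 * y - 1`.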





In [1]:
# Train the model and test the model



In [None]:
# Evaluate the results using accuracy, precision, recall and f-measure
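
The four metrics can be computed with sklearn's metric functions. The sketch below uses a small hand-made set of labels and predictions (1 = republican, 0 = democrat is an assumed encoding) so the numbers can be checked by hand.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Hand-made example: 3 true positives, 1 false positive,
# 1 false negative and 3 true negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]

accuracy = accuracy_score(y_true, y_pred)

# average="binary" reports the scores for the positive class only.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="binary", pos_label=1
)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
```

`classification_report`, already imported at the top of the notebook, prints the same quantities for both classes in one table.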



## Part 2: Adaboost using Sklearn

In [None]:
# Use the preprocessed dataset here



In [None]:
# Train the AdaBoost model using sklearn's built-in AdaBoostClassifier
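
A minimal sketch of this step with `AdaBoostClassifier` follows. It runs on a synthetic vote matrix (0 = unknown, 1 = nay, 2 = yea) with a made-up labelling rule, so the accuracy it prints says nothing about the real voting data.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed votes.
rng = np.random.default_rng(42)
X = rng.integers(0, 3, size=(400, 16))
y = (X[:, 3] + X[:, 4] + X[:, 5] >= 3).astype(int)  # made-up rule

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# With no base estimator given, sklearn boosts depth-1 decision trees.
clf = AdaBoostClassifier(n_estimators=50, random_state=42)
clf.fit(X_train, y_train)

test_accuracy = clf.score(X_test, y_test)
print("test accuracy:", test_accuracy)
```

With the notebook's own split, `X_train`, `X_test`, `y_train` and `y_test` from Part 1 would be used instead, provided `y` is a single column.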



In [None]:
# Test the model with testing set and print the accuracy, precision, recall and f-measure



In [None]:
# Play with parameters such as
# number of decision trees
# Criterion for splitting
# Max depth
# Minimum samples per split and leaf
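
The parameters listed above can be explored with a small manual grid and cross-validation, as sketched below. The grid values and the synthetic data are arbitrary choices for illustration. The base tree is passed positionally because, as far as I know, its keyword was renamed from `base_estimator` to `estimator` in scikit-learn 1.2.

```python
from itertools import product

import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed votes (0 = unknown, 1 = nay, 2 = yea).
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(300, 16))
y = (X[:, 3] + X[:, 4] + X[:, 5] >= 3).astype(int)  # made-up rule

# Arbitrary example values for each parameter.
grid = {
    "n_estimators": [10, 50],
    "criterion": ["gini", "entropy"],
    "max_depth": [1, 2],
    "min_samples_leaf": [1, 5],
}

results = []
for n, crit, depth, leaf in product(*grid.values()):
    base = DecisionTreeClassifier(
        criterion=crit, max_depth=depth, min_samples_leaf=leaf
    )
    clf = AdaBoostClassifier(base, n_estimators=n, random_state=0)
    # Mean accuracy over 5 cross-validation folds.
    score = cross_val_score(clf, X, y, cv=5).mean()
    params = {"n_estimators": n, "criterion": crit,
              "max_depth": depth, "min_samples_leaf": leaf}
    results.append((score, params))

best_score, best_params = max(results, key=lambda r: r[0])
print("best CV accuracy:", round(best_score, 3))
print("best parameters:", best_params)
```

`GridSearchCV` would do the same search more compactly, at the cost of version-dependent nested parameter names for the base tree.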



## Part 3: Compare the models

In [None]:
# Train Adaboost, Random Forest and Decision tree models from sklearn



In [None]:
# Run the model on testing set



In [None]:
# Compare their accuracy, precision, recall and f-measure
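
The three steps of this part can be sketched together as a loop that trains each model, predicts on the test set and collects the metrics in one table. The data is again synthetic with a made-up labelling rule, and the hyperparameters are untuned example values, so the resulting ranking should not be read as a finding about the voting data.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed votes (0 = unknown, 1 = nay, 2 = yea).
rng = np.random.default_rng(1)
X = rng.integers(0, 3, size=(400, 16))
y = (X[:, 3] + X[:, 4] + X[:, 5] >= 3).astype(int)  # made-up rule

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

models = {
    "AdaBoost": AdaBoostClassifier(n_estimators=50, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}

rows = []
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_test, pred, average="binary"
    )
    rows.append({
        "model": name,
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    })

results = pd.DataFrame(rows).set_index("model")
print(results.round(3))
```

Using the same split and the same metric functions for all three models keeps the comparison fair.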

