# Congressional Voting Classification

# Objective
The main objective is to predict whether a congressman is a Democrat or a Republican based on voting patterns, using decision trees boosted with AdaBoost.

# AdaBoost
AdaBoost is an ensemble learning method (also known as “meta-learning”) that was originally created to improve the accuracy of binary classifiers. It trains weak classifiers iteratively, giving more weight to the samples that the previous classifiers got wrong, and combines them into a single strong classifier.
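To make the reweighting concrete, here is a minimal sketch of a single boosting round on five hypothetical samples. The labels, the weak classifier's predictions and the variable names are made up for illustration; the formulas are the usual discrete AdaBoost ones, with labels encoded as -1 / +1.

```python
import math

# Hypothetical toy data: true labels and one weak classifier's predictions.
y_true = [1, 1, -1, -1, 1]
y_pred = [1, -1, -1, -1, 1]  # the second sample is misclassified

# Start with uniform sample weights.
n = len(y_true)
weights = [1.0 / n] * n

# Weighted error: total weight of the misclassified samples.
error = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)

# Classifier weight (alpha): larger when the weighted error is smaller.
alpha = 0.5 * math.log((1.0 - error) / error)

# Reweight: misclassified samples (t * p == -1) gain weight, the rest lose it.
weights = [w * math.exp(-alpha * t * p)
           for w, t, p in zip(weights, y_true, y_pred)]
total = sum(weights)
weights = [w / total for w in weights]

print(f"error={error:.2f}, alpha={alpha:.3f}")
print([round(w, 3) for w in weights])
```

After normalisation the single misclassified sample carries half of the total weight, so the next weak classifier is pushed towards getting it right.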


# Data Set
This data set includes votes for each of the U.S. House of Representatives Congressmen on the 16 key votes identified by the CQA. The CQA lists nine different types of votes: voted for, paired for, and announced for (these three simplified to yea), voted against, paired against, and announced against (these three simplified to nay), voted present, voted present to avoid conflict of interest, and did not vote or otherwise make a position known (these three simplified to an unknown disposition).


## Attribute Information:
1. Class Name: 2 (democrat, republican)
2. handicapped-infants: 2 (y,n)
3. water-project-cost-sharing: 2 (y,n)
4. adoption-of-the-budget-resolution: 2 (y,n)
5. physician-fee-freeze: 2 (y,n)
6. el-salvador-aid: 2 (y,n)
7. religious-groups-in-schools: 2 (y,n)
8. anti-satellite-test-ban: 2 (y,n)
9. aid-to-nicaraguan-contras: 2 (y,n)
10. mx-missile: 2 (y,n)
11. immigration: 2 (y,n)
12. synfuels-corporation-cutback: 2 (y,n)
13. education-spending: 2 (y,n)
14. superfund-right-to-sue: 2 (y,n)
15. crime: 2 (y,n)
16. duty-free-exports: 2 (y,n)
17. export-administration-act-south-africa: 2 (y,n)
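The attribute list above maps naturally onto a numeric table. The sketch below shows one possible encoding, assuming the file is comma-separated with the class first and `?` marking an unknown disposition. The two rows are made up for illustration (they are not real records), `party` is an assumed column name, and treating `?` as its own value 0 is only one of several reasonable choices (imputing the most frequent vote is another).

```python
import io

import pandas as pd

# Column names follow the attribute list; "party" is an assumed name for the class.
columns = [
    "party", "handicapped-infants", "water-project-cost-sharing",
    "adoption-of-the-budget-resolution", "physician-fee-freeze",
    "el-salvador-aid", "religious-groups-in-schools",
    "anti-satellite-test-ban", "aid-to-nicaraguan-contras", "mx-missile",
    "immigration", "synfuels-corporation-cutback", "education-spending",
    "superfund-right-to-sue", "crime", "duty-free-exports",
    "export-administration-act-south-africa",
]

# Two made-up rows standing in for the real data file.
sample = io.StringIO(
    "republican,n,y,n,y,y,y,n,n,n,n,?,y,y,y,n,n\n"
    "democrat,y,?,y,n,n,n,y,y,y,y,n,n,n,n,y,y\n"
)
df = pd.read_csv(sample, header=None, names=columns)

# Assumed encoding: yea -> 1, nay -> -1, unknown -> 0.
vote_map = {"y": 1, "n": -1, "?": 0}
votes = df.drop(columns="party").apply(lambda col: col.map(vote_map))
X = votes.to_numpy()

# Labels in {-1, +1}, the form a from-scratch AdaBoost typically expects.
y = df["party"].map({"democrat": 1, "republican": -1}).to_numpy()

print(X.shape, y)
```

With the real file, the `io.StringIO` object would be replaced by the path to the downloaded data.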



# Source
The dataset can be obtained from the UCI Machine Learning Repository:
https://archive.ics.uci.edu/ml/datasets/Congressional+Voting+Records

# Tasks:
1.	Obtain the dataset
2.	Apply pre-processing operations
3.	Train an AdaBoost model from scratch and test it
4.	Train an AdaBoost model using sklearn
5.	Compare the performance of AdaBoost, Random Forest and Decision Trees


## Part 1: Adaboost from Scratch

In [None]:
# Load the libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [None]:
# Load the dataset 



In [None]:
# Preprocessing
# Encoding categorical variables (if any)
# Feature Scaling
# Filling missing values (if any)



In [None]:
# Split the dataset into training and testing sets



In [None]:
# Implement the AdaBoost model from scratch
# AdaBoost consists of stumps, which can also be created using the built-in decision trees in sklearn
# A stump can be trained by setting max_depth to 1
class DecisionStump:
    def __init__(self):
        self.polarity = 1        # 1: values below the threshold are labelled -1; -1: the reverse
        self.feature_idx = None  # index of the feature (column) this stump splits on
        self.threshold = None    # split value for that feature
        self.alpha = None        # weight of this stump in the final vote

    def predict(self, X):
        n_samples = X.shape[0]
        X_column = X[:, self.feature_idx]
        predictions = np.ones(n_samples)

        if self.polarity == 1:
            predictions[X_column < self.threshold] = -1
        else:
            # Exact mirror image of the polarity == 1 rule
            predictions[X_column >= self.threshold] = -1
        return predictions





In [None]:
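class Adaboost:
    def __init__(self, n_clf=5):
        self.n_clf = n_clf  # number of stumps (boosting rounds)

    def fit(self, X, y):
        # X: 2-D NumPy array; y: NumPy array of labels encoded as -1 / +1
        n_samples, n_features = X.shape
        w = np.full(n_samples, 1 / n_samples)  # start with uniform sample weights

        self.clfs = []
        for _ in range(self.n_clf):
            clf = DecisionStump()

            # Greedy search for the feature/threshold with the lowest weighted error
            min_error = float('inf')
            for feature_i in range(n_features):
                X_column = X[:, feature_i]
                thresholds = np.unique(X_column)
                for threshold in thresholds:
                    p = 1
                    predictions = np.ones(n_samples)
                    predictions[X_column < threshold] = -1
                    error = np.sum(w[y != predictions])

                    # Worse than chance: flip the stump's polarity
                    if error > 0.5:
                        error = 1 - error
                        p = -1
                    if error < min_error:
                        min_error = error
                        clf.polarity = p
                        clf.threshold = threshold
                        clf.feature_idx = feature_i

            # Stump weight, computed from the best (minimum) error; EPS avoids division by zero
            EPS = 1e-10
            clf.alpha = 0.5 * np.log((1 - min_error) / (min_error + EPS))

            # Increase the weights of misclassified samples, decrease the rest, then normalise
            predictions = clf.predict(X)
            w *= np.exp(-clf.alpha * y * predictions)
            w /= np.sum(w)

            self.clfs.append(clf)

    def predict(self, X):
        # Sign of the alpha-weighted vote of all stumps
        clf_preds = [clf.alpha * clf.predict(X) for clf in self.clfs]
        y_pred = np.sum(clf_preds, axis=0)
        return np.sign(y_pred)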
            
                

                

In [1]:
# Train the model and test the model



In [None]:
# Evaluate the results using accuracy, precision, recall and f-measure
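One way to approach this step, sketched in plain Python: the four metrics all follow from the counts of true positives, false positives and false negatives. The helper name `binary_metrics`, the choice of +1 as the positive class and the example labels are assumptions made for illustration.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F-measure for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels and predictions, encoded as -1 / +1.
y_true = [1, 1, 1, -1, -1, -1, 1, -1]
y_pred = [1, 1, -1, -1, -1, 1, 1, -1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In this toy example there are 3 true positives, 1 false positive and 1 false negative, so all four metrics come out at 0.75.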



## Part 2: Adaboost using Sklearn

In [None]:
# Use the preprocessed dataset here



In [None]:
# Train the AdaBoost model using sklearn's built-in implementation



In [None]:
# Test the model with testing set and print the accuracy, precision, recall and f-measure



In [None]:
# Play with parameters such as
# number of decision trees
# Criterion for splitting
# Max depth
# Minimum samples per split and leaf
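The parameters listed above can be explored with a small loop. The sketch below is only an illustration: it uses synthetic stand-in data (random votes in which the first four loosely follow the party label) rather than the real dataset, and the grid values are arbitrary. The base tree is passed positionally because its keyword name differs between sklearn versions (`base_estimator` in older releases, `estimator` in newer ones).

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data: 16 votes in {-1, +1}, labels in {-1, +1}.
rng = np.random.default_rng(0)
n_samples, n_votes = 400, 16
y = rng.choice([-1, 1], size=n_samples)
X = rng.choice([-1, 1], size=(n_samples, n_votes))
# The first four votes agree with the label about 85% of the time.
agree = rng.random((n_samples, 4)) < 0.85
X[:, :4] = np.where(agree, y[:, None], -y[:, None])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

results = {}
for max_depth in (1, 2):
    for n_estimators in (10, 50):
        # Base learner: splitting criterion, depth and leaf size are set here.
        tree = DecisionTreeClassifier(
            criterion="entropy", max_depth=max_depth, min_samples_leaf=5
        )
        model = AdaBoostClassifier(
            tree, n_estimators=n_estimators, random_state=0
        )
        model.fit(X_train, y_train)
        results[(max_depth, n_estimators)] = model.score(X_test, y_test)

for (max_depth, n_estimators), acc in sorted(results.items()):
    print(f"max_depth={max_depth}, n_estimators={n_estimators}: {acc:.3f}")
```

The numbers printed for synthetic data say nothing about the voting records themselves; the point is the shape of the loop.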



## Part 3: Compare the models

In [None]:
# Train Adaboost, Random Forest and Decision tree models from sklearn



In [None]:
# Run the model on testing set



In [None]:
# Compare their accuracy, precision, recall and f-measure
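A comparison along these lines could look like the sketch below. It again runs on synthetic stand-in data, not the real voting records, and the hyperparameters are placeholder values; the metric helpers are sklearn's `accuracy_score` and `precision_recall_fscore_support`, with +1 assumed to be the positive class.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data: 16 votes in {-1, +1}, labels in {-1, +1}.
rng = np.random.default_rng(0)
y = rng.choice([-1, 1], size=400)
X = rng.choice([-1, 1], size=(400, 16))
agree = rng.random((400, 4)) < 0.85
X[:, :4] = np.where(agree, y[:, None], -y[:, None])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Placeholder hyperparameters; tune them for the real data.
models = {
    "AdaBoost": AdaBoostClassifier(n_estimators=50, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}

scores = {}
for name, model in models.items():
    y_pred = model.fit(X_train, y_train).predict(X_test)
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_test, y_pred, average="binary", pos_label=1
    )
    scores[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

for name, m in scores.items():
    print(f"{name:14s} acc={m['accuracy']:.3f} prec={m['precision']:.3f} "
          f"rec={m['recall']:.3f} f1={m['f1']:.3f}")
```

Using the same train/test split and the same positive class for all three models keeps the comparison fair.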

