### Description
In this project, I will make use of the neural network models newly implemented in Scikit-Learn version 0.18. An MLPClassifier (Multi-Layer Perceptron) model will be trained on the house-votes dataset to predict whether a voting record came from a Republican or a Democrat.

### Dataset
This dataset was obtained from the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/Congressional+Voting+Records). The Congressional Voting Records dataset records how each member of the U.S. House of Representatives voted on the following 16 key votes:

1. handicapped-infants 
2. water-project-cost-sharing 
3. adoption-of-the-budget-resolution 
4. physician-fee-freeze
5. el-salvador-aid
6. religious-groups-in-schools
7. anti-satellite-test-ban
8. aid-to-nicaraguan-contras
9. mx-missile
10. immigration
11. synfuels-corporation-cutback
12. education-spending
13. superfund-right-to-sue
14. crime
15. duty-free-exports
16. export-administration-act-south-africa

In [1]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import RandomForestClassifier

In [2]:
# load dataset
df = pd.read_csv('dataset/house-votes-84.data', header=None)

In [3]:
df.head()

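|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | republican | n | y | n | y | y | y | n | n | n | y | ? | y | y | y | n | y |
| 1 | republican | n | y | n | y | y | y | n | n | n | n | n | y | y | y | n | ? |
| 2 | democrat | ? | y | y | ? | y | y | n | n | n | n | y | n | y | y | n | n |
| 3 | democrat | n | y | y | n | ? | y | n | n | n | n | y | n | y | n | n | y |
| 4 | democrat | y | y | y | n | y | y | n | n | n | n | y | ? | y | y | y | y |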


In [4]:
df.shape

(435, 17)

In [5]:
# separate the target (column 0, the party) from the features (columns 1-16, the votes)
y = df[0]
X = df.iloc[:,1:17]

In [6]:
# encode the target labels as integers
# (LabelEncoder sorts classes alphabetically: democrat -> 0, republican -> 1)
le = LabelEncoder()
y = le.fit_transform(y)
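
To check which integer each party received, the fitted encoder's `classes_` attribute can be inspected. The following is a minimal sketch on a toy list; the `sample_labels` name and its values are illustrative and not taken from the dataset.

```python
from sklearn.preprocessing import LabelEncoder

# Toy labels standing in for column 0 of the dataset (illustrative only).
sample_labels = ["republican", "democrat", "democrat", "republican"]

encoder = LabelEncoder()
encoded = encoder.fit_transform(sample_labels)

# classes_ is sorted, so index 0 is "democrat" and index 1 is "republican".
class_names = list(encoder.classes_)
print(class_names)
print(encoded.tolist())

# inverse_transform maps the integers back to the original strings.
decoded = encoder.inverse_transform(encoded).tolist()
```

Knowing the mapping matters later, when the rows of the confusion matrix and the classification report are labelled only `0` and `1`.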

In [7]:
# calculate the baseline accuracy: the share of the majority class
max(float(sum(y))/len(y),1-float(sum(y))/len(y))

0.6137931034482759
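
The baseline is the accuracy obtained by always predicting the majority class. As a worked check, the sketch below assumes the class counts given in the UCI dataset description (267 Democrats, 168 Republicans) rather than computing them from the file.

```python
# Class counts as listed in the UCI dataset description (assumed here,
# not computed from the file).
n_democrat = 267
n_republican = 168
n_total = n_democrat + n_republican

# Always predicting the majority class is right this often.
baseline = max(n_democrat, n_republican) / n_total
print(round(baseline, 4))  # 0.6138
```

A model is only useful here if it clearly beats roughly 61% accuracy.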

In [8]:
def to_numeric(x):
    if x=='y':
        return 1
    if x=='n':
        return -1
    if x=='?':
        return 0
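
Note that `to_numeric` silently returns `None` for any value other than `'y'`, `'n'`, or `'?'`. A dictionary lookup is one possible alternative that fails loudly on unexpected input. This is only a sketch: `VOTE_CODES` and `encode_vote` are hypothetical names and are not part of the notebook.

```python
# Same encoding as to_numeric: yes -> 1, no -> -1, unknown -> 0.
VOTE_CODES = {"y": 1, "n": -1, "?": 0}


def encode_vote(vote):
    """Map a raw vote string to its numeric code; raise KeyError otherwise."""
    return VOTE_CODES[vote]


encoded_row = [encode_vote(v) for v in ["n", "y", "?", "y"]]
print(encoded_row)  # [-1, 1, 0, 1]
```

Encoding `'?'` as 0 places an unknown vote midway between yes and no, which is a modelling choice rather than a property of the data.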

In [9]:
# convert X values to numeric
# (reassigning X, rather than writing into a slice of df, avoids pandas' SettingWithCopyWarning)
X = X.apply(lambda col: col.map(to_numeric))


In [10]:
# split the data into training and test sets (by default, 25% is held out for testing)
X_train, X_test, y_train, y_test = train_test_split(X, y)
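
Because `train_test_split` shuffles randomly, the scores reported further down change from run to run. One option is to fix `random_state` and stratify on the labels so that both sets keep the same class balance. The sketch below uses placeholder arrays with the same shape as this dataset; the zero-filled features, the `_demo` names, and the seed value are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data shaped like the voting records: 435 rows, 16 features,
# with 267 samples of class 0 and 168 of class 1.
X_demo = np.zeros((435, 16))
y_demo = np.array([0] * 267 + [1] * 168)

# random_state makes the split repeatable; stratify keeps class proportions.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, stratify=y_demo, random_state=42
)

print(X_tr.shape, X_te.shape)
print(y_tr.mean(), y_te.mean())
```

With 435 rows and a 25% test share, the test set holds 109 rows, which matches the support total in the classification report.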

In [11]:
# initialize the MLPClassifier with 3 hidden layers of 20 neurons each
mlp = MLPClassifier(hidden_layer_sizes=(20,20,20))
mlp.fit(X_train,y_train)



MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
       beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(20, 20, 20), learning_rate='constant',
       learning_rate_init=0.001, max_iter=200, momentum=0.9,
       nesterovs_momentum=True, power_t=0.5, random_state=None,
       shuffle=True, solver='adam', tol=0.0001, validation_fraction=0.1,
       verbose=False, warm_start=False)
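
The `hidden_layer_sizes=(20,20,20)` tuple determines the shape of every weight matrix in the network. The sketch below fits the same architecture on random placeholder data (not the voting records) and inspects `coefs_`. The `_demo` names are illustrative, and the short `max_iter` is there only to keep the example fast, so scikit-learn may emit a convergence warning.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Random placeholder inputs with 16 features in {-1, 0, 1} and binary labels.
rng = np.random.RandomState(0)
X_demo = rng.randint(-1, 2, size=(100, 16))
y_demo = rng.randint(0, 2, size=100)

demo_mlp = MLPClassifier(hidden_layer_sizes=(20, 20, 20), max_iter=50, random_state=0)
demo_mlp.fit(X_demo, y_demo)

# One weight matrix per layer transition: input -> 3 hidden layers -> output.
weight_shapes = [w.shape for w in demo_mlp.coefs_]
print(weight_shapes)  # [(16, 20), (20, 20), (20, 20), (20, 1)]
```

For a binary target the output layer has a single unit, hence the final `(20, 1)` matrix.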

In [12]:
# predict labels for the test set
predictions = mlp.predict(X_test)

In [13]:
# create a confusion matrix and classification report to check how well the model performed
from sklearn.metrics import classification_report,confusion_matrix
print(confusion_matrix(y_test,predictions))
print(classification_report(y_test,predictions))

[[65  1]
 [ 2 41]]
             precision    recall  f1-score   support

          0       0.97      0.98      0.98        66
          1       0.98      0.95      0.96        43

avg / total       0.97      0.97      0.97       109
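
The overall accuracy can be read off the confusion matrix above, since correct predictions lie on its diagonal. Below is a worked check using the four counts from the printed output; the variable names are illustrative.

```python
# Confusion matrix from the output above: rows are true classes,
# columns are predicted classes (0 = democrat, 1 = republican).
cm = [[65, 1],
      [2, 41]]

correct = cm[0][0] + cm[1][1]
total = sum(sum(row) for row in cm)
accuracy = correct / total

# Precision and recall for class 1 (republican).
precision_1 = cm[1][1] / (cm[0][1] + cm[1][1])
recall_1 = cm[1][1] / (cm[1][0] + cm[1][1])

print(round(accuracy, 2), round(precision_1, 2), round(recall_1, 2))  # 0.97 0.98 0.95
```

At about 97% on the held-out set, the model is well above the 61% majority-class baseline, although this figure comes from a single random split.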

