# Neural Networks

The data that this model is based on is the Congressional voting data set from the University of California, Irvine (UCI) Machine Learning Repository website.

Source:

Origin:

Congressional Quarterly Almanac, 98th Congress, 2nd session 1984, Volume XL: Congressional Quarterly Inc. Washington, D.C., 1985.

Donor:

Jeff Schlimmer (Jeffrey.Schlimmer '@' a.gp.cs.cmu.edu)

Data Set Information:

>This data set includes votes for each of the U.S. House of Representatives Congressmen on the 16 key votes identified by the CQA. The CQA lists nine different types of votes: voted for, paired for, and announced for (these three simplified to yea), voted against, paired against, and announced against (these three simplified to nay), voted present, voted present to avoid conflict of interest, and did not vote or otherwise make a position known (these three simplified to an unknown disposition).

Attribute Information:

1. Class Name: 2 (democrat, republican)
2. handicapped-infants: 2 (y,n)
3. water-project-cost-sharing: 2 (y,n)
4. adoption-of-the-budget-resolution: 2 (y,n)
5. physician-fee-freeze: 2 (y,n)
6. el-salvador-aid: 2 (y,n)
7. religious-groups-in-schools: 2 (y,n)
8. anti-satellite-test-ban: 2 (y,n)
9. aid-to-nicaraguan-contras: 2 (y,n)
10. mx-missile: 2 (y,n)
11. immigration: 2 (y,n)
12. synfuels-corporation-cutback: 2 (y,n)
13. education-spending: 2 (y,n)
14. superfund-right-to-sue: 2 (y,n)
15. crime: 2 (y,n)
16. duty-free-exports: 2 (y,n)
17. export-administration-act-south-africa: 2 (y,n)

Dua, D. and Karra Taniskidou, E. (2017). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science. 

In [1]:
#Import dependencies.
import numpy as np
import pandas as pd

In [2]:
#Read in the data.
data = pd.read_table("house-votes-84.data", sep=",", header=None)
data

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16
0,republican,n,y,n,y,y,y,n,n,n,y,?,y,y,y,n,y
1,republican,n,y,n,y,y,y,n,n,n,n,n,y,y,y,n,?
2,democrat,?,y,y,?,y,y,n,n,n,n,y,n,y,y,n,n
3,democrat,n,y,y,n,?,y,n,n,n,n,y,n,y,n,n,y
4,democrat,y,y,y,n,y,y,n,n,n,n,y,?,y,y,y,y
5,democrat,n,y,y,n,y,y,n,n,n,n,n,n,y,y,y,y
6,democrat,n,y,n,y,y,y,n,n,n,n,n,n,?,y,y,y
7,republican,n,y,n,y,y,y,n,n,n,n,n,n,y,y,?,y
8,republican,n,y,n,y,y,y,n,n,n,n,n,y,y,y,n,y
9,democrat,y,y,y,n,n,n,y,y,y,n,n,n,n,n,?,?


In [3]:
#Renamed the columns to show the title of the bills.
vote_names = data.rename(columns={0:"Party",1:"Disabled Infants",2:"Water Project Cost Sharing",3:"Adoption of the Budget Resolution",
                                  4:"Physician Fee Freeze",5:"El Salvador Aid",6:"Religious Groups is Schools",7:"Anti-Satellite Test Ban",
                                  8:"Aid to Nicaraguan Contras",9:"MX Missile",10:"Immigration", 11:"Synfuels Corporation Cutback",
                                  12:"Education Spending", 13:"Superfund Right to Sue", 14:"Crime",15:"Duty Free Exports",16:"Export Administration Act South Africa"})
vote_names

Unnamed: 0,Party,Disabled Infants,Water Project Cost Sharing,Adoption of the Budget Resolution,Physician Fee Freeze,El Salvador Aid,Religious Groups is Schools,Anti-Satellite Test Ban,Aid to Nicaraguan Contras,MX Missile,Immigration,Synfuels Corporation Cutback,Education Spending,Superfund Right to Sue,Crime,Duty Free Exports,Export Administration Act South Africa
0,republican,n,y,n,y,y,y,n,n,n,y,?,y,y,y,n,y
1,republican,n,y,n,y,y,y,n,n,n,n,n,y,y,y,n,?
2,democrat,?,y,y,?,y,y,n,n,n,n,y,n,y,y,n,n
3,democrat,n,y,y,n,?,y,n,n,n,n,y,n,y,n,n,y
4,democrat,y,y,y,n,y,y,n,n,n,n,y,?,y,y,y,y
5,democrat,n,y,y,n,y,y,n,n,n,n,n,n,y,y,y,y
6,democrat,n,y,n,y,y,y,n,n,n,n,n,n,?,y,y,y
7,republican,n,y,n,y,y,y,n,n,n,n,n,n,y,y,?,y
8,republican,n,y,n,y,y,y,n,n,n,n,n,y,y,y,n,y
9,democrat,y,y,y,n,n,n,y,y,y,n,n,n,n,n,?,?


In [4]:
#Replaced the question marks with NaN using .replace().
vote_data = vote_names.replace("?", np.nan)
vote_data

Unnamed: 0,Party,Disabled Infants,Water Project Cost Sharing,Adoption of the Budget Resolution,Physician Fee Freeze,El Salvador Aid,Religious Groups is Schools,Anti-Satellite Test Ban,Aid to Nicaraguan Contras,MX Missile,Immigration,Synfuels Corporation Cutback,Education Spending,Superfund Right to Sue,Crime,Duty Free Exports,Export Administration Act South Africa
0,republican,n,y,n,y,y,y,n,n,n,y,,y,y,y,n,y
1,republican,n,y,n,y,y,y,n,n,n,n,n,y,y,y,n,
2,democrat,,y,y,,y,y,n,n,n,n,y,n,y,y,n,n
3,democrat,n,y,y,n,,y,n,n,n,n,y,n,y,n,n,y
4,democrat,y,y,y,n,y,y,n,n,n,n,y,,y,y,y,y
5,democrat,n,y,y,n,y,y,n,n,n,n,n,n,y,y,y,y
6,democrat,n,y,n,y,y,y,n,n,n,n,n,n,,y,y,y
7,republican,n,y,n,y,y,y,n,n,n,n,n,n,y,y,,y
8,republican,n,y,n,y,y,y,n,n,n,n,n,y,y,y,n,y
9,democrat,y,y,y,n,n,n,y,y,y,n,n,n,n,n,,
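Once the question marks are stored as NaN, pandas can report how many votes are missing for each bill. The following is a minimal sketch of that check; the three-row `sample` frame is a hypothetical stand-in for the full vote table, not data taken from the file.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row sample standing in for the full vote table.
sample = pd.DataFrame({
    "Party": ["republican", "democrat", "democrat"],
    "Immigration": ["y", "?", "n"],
    "Crime": ["y", "y", "?"],
})

# Replace the "?" placeholder with NaN so pandas treats it as missing.
cleaned = sample.replace("?", np.nan)

# Count the missing votes in each column.
missing_per_column = cleaned.isna().sum()
print(missing_per_column)
```

Running the same `isna().sum()` call on `vote_data` would show which bills have the most unknown positions.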


## Because the data is categorical, the "y" and "n" values had to be transformed into indicator variables (1 where a value is present, 0 where it is not) in order for the model to work on the data. To do this, the function "get_dummies()" was called on the dataframe, which creates a "_y" and a "_n" column for every bill. A missing vote (NaN) gets a 0 in both columns. This step is part of preprocessing.

In [5]:
vote_data_encoded = pd.get_dummies(vote_data)
vote_data_encoded.head()

Unnamed: 0,Party_democrat,Party_republican,Disabled Infants_n,Disabled Infants_y,Water Project Cost Sharing_n,Water Project Cost Sharing_y,Adoption of the Budget Resolution_n,Adoption of the Budget Resolution_y,Physician Fee Freeze_n,Physician Fee Freeze_y,...,Education Spending_n,Education Spending_y,Superfund Right to Sue_n,Superfund Right to Sue_y,Crime_n,Crime_y,Duty Free Exports_n,Duty Free Exports_y,Export Administration Act South Africa_n,Export Administration Act South Africa_y
0,0,1,1,0,0,1,1,0,0,1,...,0,1,0,1,0,1,1,0,0,1
1,0,1,1,0,0,1,1,0,0,1,...,0,1,0,1,0,1,1,0,0,0
2,1,0,0,0,0,1,0,1,0,0,...,1,0,0,1,0,1,1,0,1,0
3,1,0,1,0,0,1,0,1,1,0,...,1,0,0,1,1,0,1,0,0,1
4,1,0,0,1,0,1,0,1,1,0,...,0,0,0,1,0,1,0,1,0,1
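Row 2 of the output above has a 0 in both "Disabled Infants" columns because that vote was missing. A small sketch of this behaviour, using a hypothetical one-column frame and `dtype=int` so the output prints as 0/1 regardless of the pandas version:

```python
import numpy as np
import pandas as pd

# Hypothetical single bill with a yes vote, a no vote, and a missing vote.
votes = pd.DataFrame({"Crime": ["y", "n", np.nan]})

# get_dummies creates one indicator column per observed category.
# By default NaN is not treated as a category, so the missing row is all zeros.
encoded = pd.get_dummies(votes, dtype=int)
print(encoded)
```

Passing `dummy_na=True` to `get_dummies` would instead add an explicit indicator column for missing values; the notebook does not do this.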


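## Each "_n" column is almost entirely determined by its "_y" counterpart, so keeping both adds redundant features. In order to reduce that redundancy (which can contribute to overfitting), we dropped the "no" columns from the data set. Only the "yes" columns were used to create the model, which means a missing vote is encoded as 0, the same as a "no" vote.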

In [6]:
y_votes = vote_data_encoded.drop(["Disabled Infants_n","Water Project Cost Sharing_n", "Adoption of the Budget Resolution_n",
                            "Physician Fee Freeze_n", "El Salvador Aid_n","Religious Groups is Schools_n", "Anti-Satellite Test Ban_n", "Aid to Nicaraguan Contras_n","MX Missile_n","Immigration_n","Synfuels Corporation Cutback_n","Education Spending_n","Superfund Right to Sue_n", 
                           "Crime_n", "Duty Free Exports_n","Export Administration Act South Africa_n" ], axis=1)
y_votes

Unnamed: 0,Party_democrat,Party_republican,Disabled Infants_y,Water Project Cost Sharing_y,Adoption of the Budget Resolution_y,Physician Fee Freeze_y,El Salvador Aid_y,Religious Groups is Schools_y,Anti-Satellite Test Ban_y,Aid to Nicaraguan Contras_y,MX Missile_y,Immigration_y,Synfuels Corporation Cutback_y,Education Spending_y,Superfund Right to Sue_y,Crime_y,Duty Free Exports_y,Export Administration Act South Africa_y
0,0,1,0,1,0,1,1,1,0,0,0,1,0,1,1,1,0,1
1,0,1,0,1,0,1,1,1,0,0,0,0,0,1,1,1,0,0
2,1,0,0,1,1,0,1,1,0,0,0,0,1,0,1,1,0,0
3,1,0,0,1,1,0,0,1,0,0,0,0,1,0,1,0,0,1
4,1,0,1,1,1,0,1,1,0,0,0,0,1,0,1,1,1,1
5,1,0,0,1,1,0,1,1,0,0,0,0,0,0,1,1,1,1
6,1,0,0,1,0,1,1,1,0,0,0,0,0,0,0,1,1,1
7,0,1,0,1,0,1,1,1,0,0,0,0,0,0,1,1,0,1
8,0,1,0,1,0,1,1,1,0,0,0,0,0,1,1,1,0,1
9,1,0,1,1,1,0,0,0,1,1,1,0,0,0,0,0,0,0
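The sixteen "_n" column names do not have to be typed out by hand. As a sketch of an alternative (not the approach used in the cell above), the columns can be selected by their suffix; the small `encoded` frame here is a hypothetical stand-in for `vote_data_encoded`.

```python
import pandas as pd

# Hypothetical stand-in for the encoded vote table.
encoded = pd.DataFrame({
    "Party_democrat": [0, 1],
    "Party_republican": [1, 0],
    "Crime_n": [0, 1],
    "Crime_y": [1, 0],
    "Immigration_n": [1, 0],
    "Immigration_y": [0, 1],
})

# Collect every column whose name ends in "_n" and drop them in one call.
# "Party_republican" ends in "n" but not "_n", so it is kept.
no_columns = [col for col in encoded.columns if col.endswith("_n")]
yes_only = encoded.drop(columns=no_columns)
print(list(yes_only.columns))
```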


## This model predicts the political party of each member of Congress based on how that member voted. Since party is what is being predicted, we dropped the "Party_democrat" and "Party_republican" columns from the features (X) and used the original "Party" column as the labels (y).

In [7]:
X = y_votes.drop(["Party_democrat", "Party_republican"], axis=1)
y = vote_data["Party"]
print(X.shape, y.shape)

(435, 16) (435,)


## This cell contains multiple steps in the building of the neural network model.
## 1. Call "get_dummies()" on the features in order to change any remaining categorical data into dummy/indicator variables. The features are already numeric at this point, so this call leaves them unchanged.
## 2. Split the data into training and testing data sets, stratified by party.
## 3. Scale the training and testing data, using a scaler fitted on the training data only.
## 4. Label encode the party labels.
## 5. Convert the encoded labels to one-hot encoding.


In [8]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from keras.utils import to_categorical

X = pd.get_dummies(X)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42, stratify=y)
X_scaler = StandardScaler().fit(X_train)
X_train_scaled = X_scaler.transform(X_train)
X_test_scaled = X_scaler.transform(X_test)


# Step 4: Label-encode the party labels
label_encoder = LabelEncoder()
label_encoder.fit(y_train)
encoded_y_train = label_encoder.transform(y_train)
encoded_y_test = label_encoder.transform(y_test)

# Step 5: Convert encoded labels to one-hot encoding
y_train_categorical = to_categorical(encoded_y_train)
y_test_categorical = to_categorical(encoded_y_test)

Using TensorFlow backend.
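To make the label-encoding and one-hot steps concrete, here is a NumPy-only sketch of what they produce. It is an illustration of the result, not the scikit-learn or Keras implementation: `np.unique` stands in for `LabelEncoder` (which also orders classes alphabetically), and indexing into an identity matrix stands in for `to_categorical`. The four labels are hypothetical.

```python
import numpy as np

# Hypothetical party labels for four members.
labels = np.array(["republican", "democrat", "democrat", "republican"])

# Label encoding: sorted unique classes, and an integer code for each label.
classes, codes = np.unique(labels, return_inverse=True)

# One-hot encoding: row i of the identity matrix represents class i.
one_hot = np.eye(len(classes))[codes]

print(classes)  # class order: democrat, then republican
print(codes)
print(one_hot)
```

With this ordering, column 0 of the one-hot labels corresponds to "democrat" and column 1 to "republican", which is also how the two softmax outputs of the model should be read.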


## In this step, we define the architecture (layers) of our model. This model is a normal neural network with a single hidden layer of six nodes.

In [9]:
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(units=6, activation='relu', input_dim=16))
model.add(Dense(units=2, activation='softmax'))
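The two layers can be read as two matrix multiplications, each followed by an activation. The sketch below walks through that forward pass in NumPy to show the shapes involved. The weights are random and the three input rows are made up, so it illustrates the computation only; it does not reproduce Keras internals or the trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    # Zero out negative values, keep positive values unchanged.
    return np.maximum(z, 0.0)

def softmax(z):
    # Subtract the row maximum for numerical stability, then normalise.
    shifted = z - z.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# Three hypothetical members, each with 16 yes/no vote features.
x = rng.integers(0, 2, size=(3, 16)).astype(float)

# Randomly initialised parameters with the same shapes as the two Dense layers.
W1, b1 = rng.normal(size=(16, 6)), np.zeros(6)
W2, b2 = rng.normal(size=(6, 2)), np.zeros(2)

hidden = relu(x @ W1 + b1)          # shape (3, 6)
probs = softmax(hidden @ W2 + b2)   # shape (3, 2), each row sums to 1
print(hidden.shape, probs.shape)
```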

In [10]:
#Model Summary.

model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_1 (Dense)              (None, 6)                 102       
_________________________________________________________________
dense_2 (Dense)              (None, 2)                 14        
Total params: 116
Trainable params: 116
Non-trainable params: 0
_________________________________________________________________
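The parameter counts in the summary can be checked by hand: a fully connected layer has one weight per input-unit pair plus one bias per unit. A short sketch of that arithmetic (the helper name `dense_params` is made up for this example):

```python
def dense_params(n_inputs, n_units):
    """Number of weights plus biases in a fully connected layer."""
    return n_inputs * n_units + n_units

hidden = dense_params(16, 6)   # 16 * 6 weights + 6 biases
output = dense_params(6, 2)    # 6 * 2 weights + 2 biases
total = hidden + output
print(hidden, output, total)   # 102 14 116
```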


In [11]:
#Compile and fit the model.

model.compile(optimizer='adam', 
              loss='categorical_crossentropy', 
              metrics=['accuracy'])
model.fit(
    X_train_scaled,
    y_train_categorical,
    epochs=60,
    shuffle=True,
    verbose=2
)

Epoch 1/60
 - 0s - loss: 0.9830 - acc: 0.5061
Epoch 2/60
 - 0s - loss: 0.8251 - acc: 0.5675
Epoch 3/60
 - 0s - loss: 0.7010 - acc: 0.6411
Epoch 4/60
 - 0s - loss: 0.6093 - acc: 0.7025
Epoch 5/60
 - 0s - loss: 0.5425 - acc: 0.7761
Epoch 6/60
 - 0s - loss: 0.4912 - acc: 0.7883
Epoch 7/60
 - 0s - loss: 0.4537 - acc: 0.8098
Epoch 8/60
 - 0s - loss: 0.4228 - acc: 0.8252
Epoch 9/60
 - 0s - loss: 0.3983 - acc: 0.8282
Epoch 10/60
 - 0s - loss: 0.3775 - acc: 0.8374
Epoch 11/60
 - 0s - loss: 0.3613 - acc: 0.8405
Epoch 12/60
 - 0s - loss: 0.3466 - acc: 0.8436
Epoch 13/60
 - 0s - loss: 0.3337 - acc: 0.8497
Epoch 14/60
 - 0s - loss: 0.3229 - acc: 0.8528
Epoch 15/60
 - 0s - loss: 0.3128 - acc: 0.8558
Epoch 16/60
 - 0s - loss: 0.3038 - acc: 0.8558
Epoch 17/60
 - 0s - loss: 0.2957 - acc: 0.8558
Epoch 18/60
 - 0s - loss: 0.2875 - acc: 0.8620
Epoch 19/60
 - 0s - loss: 0.2805 - acc: 0.8620
Epoch 20/60
 - 0s - loss: 0.2739 - acc: 0.8589
Epoch 21/60
 - 0s - loss: 0.2677 - acc: 0.8589
Epoch 22/60
 - 0s - lo

<keras.callbacks.History at 0x111c6d2e8>

In [12]:
model_loss, model_accuracy = model.evaluate(X_test_scaled, y_test_categorical, verbose=2)
print(f"Normal Neural Network - Loss: {model_loss}, Accuracy: {model_accuracy}")

Normal Neural Network - Loss: 0.17908615735145883, Accuracy: 0.9082568812807765
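The loss reported here is categorical cross-entropy: the average negative log of the probability the model assigned to the correct class. The sketch below follows the standard definition in NumPy and is not Keras's exact implementation; the `eps` clipping value and the example predictions are assumptions chosen for illustration.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip predictions so that log(0) never occurs.
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    # Per-sample loss is -log(probability of the true class); average over samples.
    return float(-np.mean(np.sum(y_true * np.log(y_pred), axis=1)))

# Two hypothetical members: the first a democrat, the second a republican.
y_true = np.array([[1.0, 0.0], [0.0, 1.0]])

confident = np.array([[0.9, 0.1], [0.2, 0.8]])
uncertain = np.array([[0.5, 0.5], [0.5, 0.5]])

loss_confident = categorical_crossentropy(y_true, confident)
loss_uncertain = categorical_crossentropy(y_true, uncertain)
print(loss_confident, loss_uncertain)
```

A model that always outputs 50/50 scores ln 2 (about 0.693), so a loss well below that value indicates the model is assigning most of the probability to the correct party.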


## The normal neural network reached a loss of about 0.179 and an accuracy of about 90.8% on the test data.

## For a deep learning network, we add an additional hidden layer of six nodes.

In [13]:
# Deep Learning

deep_model = Sequential()
deep_model.add(Dense(units=6, activation='relu', input_dim=16))
deep_model.add(Dense(units=6, activation='relu'))
deep_model.add(Dense(units=2, activation='softmax'))

In [14]:
deep_model.compile(optimizer='adam', 
              loss='categorical_crossentropy', 
              metrics=['accuracy'])

deep_model.fit(
    X_train_scaled,
    y_train_categorical,
    epochs=100,
    shuffle=True,
    verbose=2
)

Epoch 1/100
 - 0s - loss: 0.6523 - acc: 0.8006
Epoch 2/100
 - 0s - loss: 0.6333 - acc: 0.8282
Epoch 3/100
 - 0s - loss: 0.6126 - acc: 0.8558
Epoch 4/100
 - 0s - loss: 0.5900 - acc: 0.8589
Epoch 5/100
 - 0s - loss: 0.5641 - acc: 0.8650
Epoch 6/100
 - 0s - loss: 0.5354 - acc: 0.8681
Epoch 7/100
 - 0s - loss: 0.5038 - acc: 0.8834
Epoch 8/100
 - 0s - loss: 0.4700 - acc: 0.8865
Epoch 9/100
 - 0s - loss: 0.4364 - acc: 0.8804
Epoch 10/100
 - 0s - loss: 0.4046 - acc: 0.8896
Epoch 11/100
 - 0s - loss: 0.3756 - acc: 0.8896
Epoch 12/100
 - 0s - loss: 0.3484 - acc: 0.8926
Epoch 13/100
 - 0s - loss: 0.3254 - acc: 0.8926
Epoch 14/100
 - 0s - loss: 0.3047 - acc: 0.8896
Epoch 15/100
 - 0s - loss: 0.2876 - acc: 0.8957
Epoch 16/100
 - 0s - loss: 0.2723 - acc: 0.8926
Epoch 17/100
 - 0s - loss: 0.2583 - acc: 0.9018
Epoch 18/100
 - 0s - loss: 0.2466 - acc: 0.9018
Epoch 19/100
 - 0s - loss: 0.2367 - acc: 0.9018
Epoch 20/100
 - 0s - loss: 0.2270 - acc: 0.9080
Epoch 21/100
 - 0s - loss: 0.2189 - acc: 0.9049
E

<keras.callbacks.History at 0x1a1c2a5f60>

In [15]:
model_loss, model_accuracy = deep_model.evaluate(X_test_scaled, y_test_categorical, verbose=2)
print(f"Deep Neural Network - Loss: {model_loss}, Accuracy: {model_accuracy}")

Deep Neural Network - Loss: 0.09792803829416223, Accuracy: 0.9633027528404096


## The deep neural network reached a loss of about 0.098 and an accuracy of about 96.3% on the test data.

In [16]:
#Save the normal neural network model.
model.save("Congressional_Votes_Normal_Neural_Nework_Model.h5")

In [17]:
#Save the deep neural network model.

deep_model.save("Congressional_Votes_Deep_Neural_Network_Model.h5")

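## The deep learning neural network model performed better than the normal neural network model on the test data (about 96.3% accuracy versus 90.8%, and a loss of about 0.098 versus 0.179). The deep model was also trained for 100 epochs rather than 60, so the additional hidden layer is not the only difference between the two runs.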
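It can help to translate the two test accuracies back into counts of members. The sketch below assumes the split in cell 8 used scikit-learn's default test fraction of 0.25 (the call does not set `test_size`), which gives a test set of 109 of the 435 members.

```python
import math

n_samples = 435

# Assumption: train_test_split default test fraction of 0.25, rounded up.
n_test = math.ceil(0.25 * n_samples)

# Accuracies printed by model.evaluate for the two models.
normal_accuracy = 0.9082568812807765
deep_accuracy = 0.9633027528404096

normal_correct = round(normal_accuracy * n_test)
deep_correct = round(deep_accuracy * n_test)
print(n_test, normal_correct, deep_correct, deep_correct - normal_correct)
```

Under that assumption the gap between the two models comes down to six test members, so the comparison is sensitive to the particular split and random initialisation.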