# Majority Vote Ensemble

This notebook combines logistic regression, k-nearest neighbors, support vector machine, and random forest classifiers in a majority vote ensemble to predict the survival of passengers on the Titanic. The dataset comes from Kaggle.

In [15]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [13]:
import warnings
warnings.filterwarnings('ignore')

Importing the training data, cleaning it, and creating X and y arrays of the inputs and known outputs:

In [2]:
data = pd.read_csv('train.csv')
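# Fill missing ages with 29.7 (roughly the mean age) and missing ports with 'S' (the most common port)
df = data.fillna({'Age': 29.7, 'Embarked':'S'})

# One-hot encode Sex and Embarked
dummies = pd.get_dummies(df.Sex)
df = pd.concat([df,dummies], axis = 'columns')
dummies2 = pd.get_dummies(df.Embarked)
df = pd.concat([df,dummies2], axis = 'columns')
# Drop unused columns plus one reference level per encoded feature ('male' and 'S')
df.drop(['PassengerId','Name','Ticket','Cabin','Sex','male','Embarked','S'], axis = 1, inplace = True)
# Remaining columns: Survived, Pclass, Age, SibSp, Parch, Fare, female, C, Q
X = df.iloc[:,1:9].values
y = df.iloc[:,0].values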
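To make the encoding step concrete, here is a minimal sketch on a toy frame. The three rows are hypothetical values, not taken from `train.csv`; only the column names mirror the real data.

```python
import pandas as pd

# Toy frame with hypothetical values, mirroring the Sex and Embarked columns.
toy = pd.DataFrame({
    "Sex": ["male", "female", "female"],
    "Embarked": ["S", "C", "Q"],
})

# One indicator column per category, as in the cell above.
sex = pd.get_dummies(toy["Sex"])         # columns: female, male
port = pd.get_dummies(toy["Embarked"])   # columns: C, Q, S

# Drop one reference level per feature ('male' and 'S'); the dropped level
# is implied when all remaining indicators of that feature are 0.
encoded = pd.concat([sex, port], axis="columns").drop(columns=["male", "S"]).astype(int)

print(encoded)
```

A male passenger who embarked at 'S' ends up as all zeros, so no information is lost by dropping the two reference columns.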

Splitting into training and test sets:

In [3]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 1, stratify = y)
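The `stratify = y` argument keeps the class balance the same in both splits. A small sketch with hypothetical labels (60 negatives, 40 positives) shows the effect:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 60 negatives and 40 positives.
y_toy = np.array([0] * 60 + [1] * 40)
X_toy = np.arange(100).reshape(-1, 1)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=1, stratify=y_toy
)

# With stratify, both splits keep the positive rate of the full data.
print("train positive rate:", y_tr.mean())
print("test positive rate: ", y_te.mean())
```

Without stratification the two rates can drift apart by chance, which matters more as the dataset gets smaller.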

### Building the model

Importing the necessary libraries for a Majority Vote classifier:

In [4]:
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import VotingClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

The four classifiers to be fed into the Majority Vote:

In [5]:
clf1 = LogisticRegression(penalty = 'l2', C = 10, random_state = 1)
clf2 = KNeighborsClassifier(n_neighbors = 1, p = 2, metric = 'minkowski')
clf3 = SVC(C = 1, gamma = 0.1, kernel = 'rbf', probability = True)
clf4 = RandomForestClassifier(random_state = 1)

Workflow pipelines for the first three classifiers above. These classifiers are sensitive to the scale of the input features, so each pipeline standardizes the data before fitting; the random forest classifier is not and is used directly.

In [6]:
pipe1 = Pipeline([['sc', StandardScaler()],['clf', clf1]])
pipe2 = Pipeline([['sc', StandardScaler()],['clf', clf2]])
pipe3 = Pipeline([['sc', StandardScaler()],['clf', clf3]])
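As a rough illustration of why scaling matters for a distance-based classifier, the sketch below compares KNN with and without a `StandardScaler` step. The two training points and the query are hypothetical, chosen so that the second feature has a much larger scale than the first.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical data: feature 0 lies in [0, 1], feature 1 lies in [0, 1000].
X_toy = np.array([[0.0, 0.0], [1.0, 1000.0]])
y_toy = np.array([0, 1])
query = np.array([[0.9, 400.0]])

# Unscaled KNN: the Euclidean distance is dominated by feature 1.
raw_knn = KNeighborsClassifier(n_neighbors=1).fit(X_toy, y_toy)

# Scaled KNN: the scaler is fit on the training data only and reused at predict time.
scaled_knn = Pipeline([
    ("sc", StandardScaler()),
    ("clf", KNeighborsClassifier(n_neighbors=1)),
]).fit(X_toy, y_toy)

raw_pred = raw_knn.predict(query)[0]
scaled_pred = scaled_knn.predict(query)[0]
print("unscaled:", raw_pred, "scaled:", scaled_pred)
```

The two models disagree on the same query because standardizing puts both features on a comparable footing. Wrapping the scaler in a pipeline also means cross-validation refits it on each training fold, so no information leaks from the validation fold.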

Setting up the Majority Vote classifier:

In [11]:
clf_labels = ['Logistic Regression', 'KNN', 'SVM', 'Random Forest', 'Majority Vote']

mv_clf = VotingClassifier(estimators = [(clf_labels[0], pipe1),
                                        (clf_labels[1], pipe2),
                                        (clf_labels[2], pipe3),
                                        (clf_labels[3], clf4)],
                          voting = 'soft')

Comparing the 10-fold cross-validated ROC AUC of all classifiers:

In [14]:
all_clf = [pipe1, pipe2, pipe3, clf4, mv_clf]
for clf, label in zip(all_clf,clf_labels):
    # scoring = 'roc_auc' reports the area under the ROC curve, not accuracy
    scores = cross_val_score(estimator = clf, X = X_train, y = y_train, cv = 10, scoring = 'roc_auc')
    print("ROC AUC : %0.2f (+/- %0.2f) [%s]" % (scores.mean(), scores.std(), label))

ROC AUC : 0.86 (+/- 0.06) [Logistic Regression]
ROC AUC : 0.74 (+/- 0.05) [KNN]
ROC AUC : 0.86 (+/- 0.06) [SVM]
ROC AUC : 0.84 (+/- 0.05) [Random Forest]
ROC AUC : 0.87 (+/- 0.05) [Majority Vote]
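The ensemble above is built with `voting = 'soft'`, which averages the predicted class probabilities instead of counting one vote per classifier. The sketch below works through both rules by hand; the probabilities are hypothetical and not produced by the models in this notebook.

```python
import numpy as np

# Hypothetical probabilities [P(died), P(survived)] from four classifiers
# for one passenger. The second row mimics a 1-nearest-neighbor model,
# whose probabilities are always exactly 0 or 1.
probas = np.array([
    [0.40, 0.60],
    [1.00, 0.00],
    [0.35, 0.65],
    [0.30, 0.70],
])

# Hard voting: each classifier casts one vote for its most likely class.
hard_votes = probas.argmax(axis=1)
hard_pred = np.bincount(hard_votes, minlength=2).argmax()

# Soft voting: average the probabilities, then pick the most likely class.
avg_proba = probas.mean(axis=0)
soft_pred = avg_proba.argmax()

print("hard vote:", hard_pred)
print("soft vote:", soft_pred, "from averaged probabilities", avg_proba)
```

In this made-up case three of four classifiers favor survival, yet the single fully confident vote tips the average the other way. Soft voting therefore rewards well-calibrated probabilities, and it is the reason `SVC` needs `probability = True` here.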


### Fitting the model

Importing and cleaning the testing data:

In [17]:
data2 = pd.read_csv('test.csv')
df2 = data2.fillna({'Age': 29.7, 'Embarked':'S'})

dummies = pd.get_dummies(df2.Sex)
df2 = pd.concat([df2,dummies], axis = 'columns')
dummies2 = pd.get_dummies(df2.Embarked)
df2 = pd.concat([df2,dummies2], axis = 'columns')
df2.drop(['PassengerId','Name','Ticket','Cabin','Sex','male','Embarked','S'], axis = 1, inplace = True)
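# x_test is the unlabeled Kaggle test set, distinct from the held-out X_test split above
x_test = df2.iloc[:,:].values
# Row 152 has a missing Fare (column 4); fill it with the mean fare of the test set
x_test[152,4] = df2['Fare'].mean()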
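The cleaning above relies on the test frame having the same columns, in the same order, as the training frame, and on a hard-coded row and column position for the missing fare. A more defensive variant is sketched below on toy frames with hypothetical values. Aligning with `reindex` and filling with the training median are assumptions made for illustration, not the approach used in this notebook.

```python
import pandas as pd

# Toy frames with hypothetical values; the test frame lacks port 'C'
# and has one missing Fare.
train_toy = pd.DataFrame({"Fare": [7.25, 71.28, 8.05], "Embarked": ["S", "C", "Q"]})
test_toy = pd.DataFrame({"Fare": [7.83, None], "Embarked": ["Q", "S"]})

train_enc = pd.get_dummies(train_toy, columns=["Embarked"])
test_enc = pd.get_dummies(test_toy, columns=["Embarked"])

# Align test columns to the training columns; categories absent from the
# test data become all-zero indicator columns.
test_enc = test_enc.reindex(columns=train_enc.columns, fill_value=0)

# Fill missing values by column name, using a statistic from the training data.
test_enc["Fare"] = test_enc["Fare"].fillna(train_toy["Fare"].median())

print(test_enc)
```

Filling by column name keeps working if the row order or column order of the file changes, which a positional index such as `[152, 4]` does not.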

Fitting the ensemble on the training split with fit() and predicting survival for the Kaggle test set with predict():

In [18]:
mv_clf = mv_clf.fit(X_train, y_train)

y_pred = mv_clf.predict(x_test)

In [19]:
y_pred

array([0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1,
       1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1,
       1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1,
       1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0,
       1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0,
       0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0,
       0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0,
       0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1,
       1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0,
       0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0,
       1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1,
       0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1,
       0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,