# Voting Classifier

## Contents

- [Imports](#imports)
- [Load DataFrames](#load-dataframes)
- [Decide Majority Vote](#decide-majority-vote)
- [Create Submission DF](#create-submission)
- [Save Submission](#save-submission)

**This notebook reads each model's predictions and writes a new CSV containing the majority prediction for each passenger ID.**

# Imports

In [17]:
import pandas as pd
import numpy as np

# Load DataFrames

In [18]:
model_names = ['RF', 'XGB', 'LGBM']
submission_dict = {}

for name in model_names:
    path = f'../../data/submissions/{name}_train_test_data_1.csv'
    
    df = pd.read_csv(path)
    
    submission_dict[name] = df

In [19]:
dfs = []

for name, df in submission_dict.items():
    df = df.rename(columns={'Transported': name})
    dfs.append(df)

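# Concatenate once, then drop the repeated PassengerId columns
merged_df = pd.concat(dfs, axis=1)
merged_df = merged_df.loc[:, ~merged_df.columns.duplicated()]

# .copy() so that adding a column later does not write to a slice of merged_df
df = merged_df[['PassengerId', 'RF', 'XGB', 'LGBM']].copy()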
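
The cell above lines the prediction frames up by row position, which works as long as every submission file lists passengers in the same order. If that is not guaranteed, joining on `PassengerId` is one alternative. A minimal sketch, using small made-up frames in place of the CSV files (the data and the names `frames` and `merged` are illustrative, not taken from the notebook):

```python
from functools import reduce

import pandas as pd

# Toy stand-ins for the per-model submission files (hypothetical data).
rf = pd.DataFrame({"PassengerId": ["0013_01", "0018_01", "0019_01"],
                   "Transported": [True, False, True]})
xgb = pd.DataFrame({"PassengerId": ["0019_01", "0013_01", "0018_01"],  # different row order
                    "Transported": [True, True, False]})
lgbm = pd.DataFrame({"PassengerId": ["0018_01", "0019_01", "0013_01"],
                     "Transported": [False, True, True]})

# Rename each prediction column to its model name, as the notebook does.
frames = [
    frame.rename(columns={"Transported": name})
    for name, frame in {"RF": rf, "XGB": xgb, "LGBM": lgbm}.items()
]

# Join on the key so every row's votes belong to the same passenger.
# validate="one_to_one" raises if a PassengerId is duplicated in any frame.
merged = reduce(
    lambda left, right: left.merge(
        right, on="PassengerId", how="inner", validate="one_to_one"
    ),
    frames,
)
```

With `how="inner"`, a passenger missing from any file is dropped, so comparing `len(merged)` against the length of the input frames is a cheap check that nothing was lost.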

In [20]:
df.head(3)

Unnamed: 0,PassengerId,RF,XGB,LGBM
0,0013_01,True,True,True
1,0018_01,False,False,False
2,0019_01,True,True,True


# Decide Majority Vote

In [21]:
df['Transported'] = df.drop(columns=['PassengerId']).apply(lambda row: row.value_counts().idxmax(), axis=1)
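
Because the votes are booleans, the same majority can also be computed without a row-wise `apply`. A minimal sketch on made-up data, assuming every model column holds plain `True`/`False` values (the `votes` frame and the fourth passenger ID are hypothetical):

```python
import pandas as pd

# Toy predictions (hypothetical data) with one split vote per direction.
votes = pd.DataFrame({
    "PassengerId": ["0013_01", "0018_01", "0019_01", "0021_01"],
    "RF": [True, False, True, False],
    "XGB": [True, False, False, True],
    "LGBM": [True, False, True, False],
})

model_cols = ["RF", "XGB", "LGBM"]

# True counts as 1, so the row sum is the number of models voting True.
true_votes = votes[model_cols].sum(axis=1)

# Strict majority: more than half of the models voted True.
votes["Transported"] = true_votes > len(model_cols) / 2
```

With an odd number of models there are no ties. With an even number, this rule resolves a tie to `False`, which is a choice to make explicitly rather than something the notebook specifies.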

In [22]:
df.head(3)

Unnamed: 0,PassengerId,RF,XGB,LGBM,Transported
0,0013_01,True,True,True,True
1,0018_01,False,False,False,False
2,0019_01,True,True,True,True


# Create Submission

In [23]:
submission = df[['PassengerId', 'Transported']]
submission.head(3)

Unnamed: 0,PassengerId,Transported
0,0013_01,True
1,0018_01,False
2,0019_01,True


# Save Submission

In [24]:
submission.to_csv('../../data/submissions/VC_train_test_data_1_0.csv', index=False)
display("Submission file generated successfully.")

'Submission file generated successfully.'