# Unlocking the Key to Finding Your Match

## Introduction
Struggling to find a match in the modern dating scene, I became determined to uncover the secret to success. My mom always assured me I was handsome, so looks couldn't be the issue. Could it be because I'm under 6 feet tall? My ethnicity? Or maybe my job title? Armed with my newfound knowledge of machine learning, I set out to dive deep into dating data and discover the hidden ingredients for finding my perfect match.

## Project Topic
Understanding the factors that lead to landing a second date can show me what I need to improve to find my ideal match. In this analysis, I used several supervised machine learning algorithms (logistic regression, random forest, and gradient boosting) to predict whether a date leads to a match and to identify the traits that make each gender more attractive to their dates. The dataset includes each participant's ratings of their dates, demographics, and questionnaire responses about dating habits and preferences. By training predictive models on this data, I aim not only to deepen my knowledge of machine learning but also to apply the findings to my personal quest for the perfect match.
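
The modelling step can be sketched as follows: fit the three classifier families named above on the same binary outcome and compare them on held-out data. This is a minimal sketch, assuming a synthetic feature matrix from scikit-learn's `make_classification` as a stand-in for the six attribute ratings, and ROC AUC as the comparison metric; neither choice comes from the original analysis.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in (assumption): 6 features playing the role of the six
# attribute ratings, and a binary label playing the role of "match / no match".
X, y = make_classification(n_samples=500, n_features=6, n_informative=4,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Column 1 of predict_proba is the predicted probability of the positive class
    scores[name] = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

for name, auc in scores.items():
    print(f"{name}: test ROC AUC = {auc:.3f}")
```

Holding out a stratified test set keeps the class balance the same in both splits, so the three models are compared on identical, unseen dates.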

## About The Data
The data comes from the Speed Dating Experiment dataset on Kaggle: https://www.kaggle.com/annavictoria/speed-dating-experiment
The data was gathered from 552 participants in experimental speed dating events held from 2002 to 2004.
During the events, each attendee had a four-minute "first date" with every other participant of the opposite sex.
At the end of each four-minute date, participants were asked whether they would like to see their date again. They were also asked to rate their date on six attributes:
- Attractiveness
- Sincerity
- Intelligence
- Fun
- Ambition
- Shared Interests
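
Because each rating is recorded alongside the yes/no decision, a natural first question is how the ratings differ between dates a participant wanted to see again and dates they did not. The sketch below assumes the rating columns are named `attr`, `sinc`, `intel`, `fun`, `amb`, and `shar` and the decision column `dec`; check these names against the data key before reuse. The rows are made-up toy values, not data from the study.

```python
import pandas as pd

# Hypothetical toy rows, NOT real data: one row per participant per date.
# Assumed columns: attr..shar are the ratings a participant gave their date,
# dec is 1 if they wanted to see that date again, else 0.
ratings = pd.DataFrame({
    "attr":  [7, 4, 8, 5, 6, 3],
    "sinc":  [8, 6, 7, 7, 7, 5],
    "intel": [8, 7, 8, 6, 7, 6],
    "fun":   [7, 4, 8, 5, 7, 4],
    "amb":   [6, 6, 7, 5, 6, 5],
    "shar":  [6, 3, 7, 4, 5, 2],
    "dec":   [1, 0, 1, 0, 1, 0],
})

attributes = ["attr", "sinc", "intel", "fun", "amb", "shar"]

# Average rating for each attribute, split by the yes/no decision
mean_by_decision = ratings.groupby("dec")[attributes].mean()

# Gap between "yes" dates and "no" dates for each attribute, largest first
gap = (mean_by_decision.loc[1] - mean_by_decision.loc[0]).sort_values(ascending=False)

print(mean_by_decision)
print(gap)
```

A large gap only says an attribute differs between "yes" and "no" dates; the models are what test how much each attribute adds once the others are taken into account.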

The dataset also includes questionnaire data gathered from participants at different points in the process. These fields include:
- demographics
- dating habits
- self-perception across key attributes
- beliefs on what others find valuable in a mate
- lifestyle information

See speed-dating-data-key.doc for the data dictionary and question key.


# Import Packages

```python
# importing packages
%matplotlib inline
import pandas as pd
pd.options.display.max_rows = 1000  # show up to 1000 rows before pandas truncates the output
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import statsmodels.api as sm
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn import metrics
```

# Initial Look

```python
# importing data
# ISO-8859-1 (Latin-1) can decode any byte, so non-UTF-8 characters in the file don't raise a UnicodeDecodeError
date_data = pd.read_csv('data/speed_dating_data.csv', encoding="ISO-8859-1")
date_data.head()
```

Unnamed: 0,iid,id,gender,idg,condtn,wave,round,position,positin1,order,...,attr3_3,sinc3_3,intel3_3,fun3_3,amb3_3,attr5_3,sinc5_3,intel5_3,fun5_3,amb5_3
0,1,1.0,0,1,1,1,10,7,,4,...,5.0,7.0,7.0,7.0,7.0,,,,,
1,1,1.0,0,1,1,1,10,7,,3,...,5.0,7.0,7.0,7.0,7.0,,,,,
2,1,1.0,0,1,1,1,10,7,,10,...,5.0,7.0,7.0,7.0,7.0,,,,,
3,1,1.0,0,1,1,1,10,7,,5,...,5.0,7.0,7.0,7.0,7.0,,,,,
4,1,1.0,0,1,1,1,10,7,,7,...,5.0,7.0,7.0,7.0,7.0,,,,,
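
The `encoding="ISO-8859-1"` argument in the loading cell matters because pandas decodes CSV files as UTF-8 by default, and a single Latin-1 byte such as `0xE9` (é) makes that fail. ISO-8859-1 maps every possible byte to a character, so it never raises a decode error, although it would produce the wrong characters if a file were actually UTF-8. The snippet below is a self-contained illustration on a made-up two-line CSV (the `name` and `city` columns are hypothetical), not a claim about which bytes the Kaggle file contains.

```python
import io

import pandas as pd

# Hypothetical CSV bytes with a Latin-1 encoded "é" (byte 0xE9),
# which is not a valid UTF-8 sequence here.
raw = "name,city\nAna,Montréal\n".encode("ISO-8859-1")

try:
    pd.read_csv(io.BytesIO(raw))  # default encoding is UTF-8
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# ISO-8859-1 maps every byte 0x00-0xFF to a character, so this read succeeds
df = pd.read_csv(io.BytesIO(raw), encoding="ISO-8859-1")

print("UTF-8 read succeeded:", utf8_ok)
print(df)
```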
