# Speed Date Analysis

## Project 🚧
The marketing team needs help on a new project. They are seeing a decrease in the number of matches and want to understand what makes people interested in each other.

They decided to run a speed dating experiment in which participants gave Tinder a lot of information about themselves, information that could ultimately be reflected in their dating profile on the app.

Tinder then gathered the data from this experiment. Each row in the dataset represents one speed date between two people and indicates whether each of them secretly agreed to go on a second date with the other person.

## Goals 🎯
Use the dataset to understand what makes people interested enough in each other to go on a second date:

- You may use descriptive statistics
- You may use visualisations

## Scope of this project 🖼️
Data was gathered from participants in experimental speed dating events from 2002 to 2004. During the events, the attendees would have a four-minute "first date" with every other participant of the opposite sex. At the end of their four minutes, participants were asked if they would like to see their date again. They were also asked to rate their date on six attributes: Attractiveness, Sincerity, Intelligence, Fun, Ambition, and Shared Interests.

The dataset also includes questionnaire data gathered from participants at different points in the process. These fields include: demographics, dating habits, self-perception across key attributes, beliefs on what others find valuable in a mate, and lifestyle information. See the Speed Dating Data Key document below for details.

In [18]:
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

# Load the raw speed dating data, then show the table and summary statistics for every column
dataset = pd.read_csv("data/SpeedDatingData.csv", encoding='unicode_escape')
display(dataset)
dataset.describe(include='all')

Unnamed: 0,iid,id,gender,idg,condtn,wave,round,position,positin1,order,...,attr3_3,sinc3_3,intel3_3,fun3_3,amb3_3,attr5_3,sinc5_3,intel5_3,fun5_3,amb5_3
0,1,1.0,0,1,1,1,10,7,,4,...,5.0,7.0,7.0,7.0,7.0,,,,,
1,1,1.0,0,1,1,1,10,7,,3,...,5.0,7.0,7.0,7.0,7.0,,,,,
2,1,1.0,0,1,1,1,10,7,,10,...,5.0,7.0,7.0,7.0,7.0,,,,,
3,1,1.0,0,1,1,1,10,7,,5,...,5.0,7.0,7.0,7.0,7.0,,,,,
4,1,1.0,0,1,1,1,10,7,,7,...,5.0,7.0,7.0,7.0,7.0,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8373,552,22.0,1,44,2,21,22,14,10.0,5,...,8.0,5.0,7.0,6.0,7.0,9.0,5.0,9.0,5.0,6.0
8374,552,22.0,1,44,2,21,22,13,10.0,4,...,8.0,5.0,7.0,6.0,7.0,9.0,5.0,9.0,5.0,6.0
8375,552,22.0,1,44,2,21,22,19,10.0,10,...,8.0,5.0,7.0,6.0,7.0,9.0,5.0,9.0,5.0,6.0
8376,552,22.0,1,44,2,21,22,3,10.0,16,...,8.0,5.0,7.0,6.0,7.0,9.0,5.0,9.0,5.0,6.0


Unnamed: 0,iid,id,gender,idg,condtn,wave,round,position,positin1,order,...,attr3_3,sinc3_3,intel3_3,fun3_3,amb3_3,attr5_3,sinc5_3,intel5_3,fun5_3,amb5_3
count,8378.0,8377.0,8378.0,8378.0,8378.0,8378.0,8378.0,8378.0,6532.0,8378.0,...,3974.0,3974.0,3974.0,3974.0,3974.0,2016.0,2016.0,2016.0,2016.0,2016.0
unique,,,,,,,,,,,...,,,,,,,,,,
top,,,,,,,,,,,...,,,,,,,,,,
freq,,,,,,,,,,,...,,,,,,,,,,
mean,283.675937,8.960248,0.500597,17.327166,1.828837,11.350919,16.872046,9.042731,9.295775,8.927668,...,7.240312,8.093357,8.388777,7.658782,7.391545,6.81002,7.615079,7.93254,7.155258,7.048611
std,158.583367,5.491329,0.500029,10.940735,0.376673,5.995903,4.358458,5.514939,5.650199,5.477009,...,1.576596,1.610309,1.459094,1.74467,1.961417,1.507341,1.504551,1.340868,1.672787,1.717988
min,1.0,1.0,0.0,1.0,1.0,1.0,5.0,1.0,1.0,1.0,...,2.0,2.0,3.0,2.0,1.0,2.0,2.0,4.0,1.0,1.0
25%,154.0,4.0,0.0,8.0,2.0,7.0,14.0,4.0,4.0,4.0,...,7.0,7.0,8.0,7.0,6.0,6.0,7.0,7.0,6.0,6.0
50%,281.0,8.0,1.0,16.0,2.0,11.0,18.0,8.0,9.0,8.0,...,7.0,8.0,8.0,8.0,8.0,7.0,8.0,8.0,7.0,7.0
75%,407.0,13.0,1.0,26.0,2.0,15.0,20.0,13.0,14.0,13.0,...,8.0,9.0,9.0,9.0,9.0,8.0,9.0,9.0,8.0,8.0


## Important data
After reading the data description, these fields seem important to analyse:

### General data
- iid
- pid : partner’s iid number
- gender
- match: 1=yes, 0=no
- dec_o: partner's decision on the night of the event
- attr_o: partner's rating of the participant's attractiveness on the night of the event (the data key lists one such rating for each of the 6 attributes)
- samerace: participant and partner are of the same race. 1=yes, 0=no
- age: age
- age_o : age of partner
- field_cd: code of field of study
- career_c: career coded
- mn_sat: Median SAT score of the undergraduate institution attended. Taken from Barron’s 25th Edition college profile book. Proxy for intelligence.
- income: Median household income based on zipcode using the Census Bureau website
- goal: What is your primary goal in participating in this event?

### Attributes the participant considers important
- imprace: How important is it to you (on a scale of 1-10) that a person you date be of the same racial/ethnic background?
- imprelig: How important is it to you (on a scale of 1-10) that a person you date be of the same religious background?

Rate the importance of the following attributes on a scale of 1-10 
- attr1_s:  Attractive
- sinc1_s: Sincere
- intel1_s: Intelligent
- fun1_s: Fun
- amb1_s: Ambitious
- shar1_s: Has shared interests/hobbies

Rate your own attributes:
- attr3_s:  Attractive
- sinc3_s: Sincere
- intel3_s: Intelligent
- fun3_s: Fun
- amb3_s: Ambitious

Rate the importance of attribute 


### Interests
- sports: Playing sports/ athletics
- tvsports: Watching sports
- exercise: Body building/exercising
- dining: Dining out
- museums: Museums/galleries
- art: Art
- hiking:  Hiking/camping
- gaming: Gaming
- clubbing: Dancing/clubbing
- reading: Reading
- tv: Watching TV
- theater: Theater
- movies: Movies
- concerts: Going to concerts
- music: Music
- shopping: Shopping
- yoga: Yoga/meditation

### Future
- you_call: How many have you contacted to set up a date?
- them_cal: How many have contacted you?
- date_3: Have you been on a date with any of your matches? Yes=1, No=2


In [19]:
dataset = dataset[['iid', 'pid', 'wave', 'gender', 'match', 'dec_o', 'attr_o', 'samerace', 'age', 'age_o', 'field_cd', 'career_c', 'mn_sat', 'income', 'goal', 'attr1_s', 'sinc1_s', 'intel1_s', 'fun1_s', 'amb1_s', 'shar1_s', 'attr3_s', 'sinc3_s', 'intel3_s', 'fun3_s', 'amb3_s', 'sports', 'tvsports', 'exercise', 'dining', 'museums', 'art', 'hiking', 'gaming', 'clubbing', 'reading', 'tv', 'theater', 'movies', 'concerts', 'music', 'shopping', 'yoga', 'you_call', 'them_cal', 'date_3']]
display(dataset)
dataset.describe(include='all')

Unnamed: 0,iid,pid,wave,gender,match,dec_o,attr_o,samerace,age,age_o,...,tv,theater,movies,concerts,music,shopping,yoga,you_call,them_cal,date_3
0,1,11.0,1,0,0,0,6.0,0,21.0,27.0,...,9.0,1.0,10.0,10.0,9.0,8.0,1.0,1.0,1.0,0.0
1,1,12.0,1,0,0,0,7.0,0,21.0,22.0,...,9.0,1.0,10.0,10.0,9.0,8.0,1.0,1.0,1.0,0.0
2,1,13.0,1,0,1,1,10.0,1,21.0,22.0,...,9.0,1.0,10.0,10.0,9.0,8.0,1.0,1.0,1.0,0.0
3,1,14.0,1,0,1,1,7.0,0,21.0,23.0,...,9.0,1.0,10.0,10.0,9.0,8.0,1.0,1.0,1.0,0.0
4,1,15.0,1,0,1,1,8.0,0,21.0,24.0,...,9.0,1.0,10.0,10.0,9.0,8.0,1.0,1.0,1.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8373,552,526.0,21,1,0,1,10.0,0,25.0,26.0,...,3.0,7.0,9.0,10.0,10.0,7.0,3.0,2.0,0.0,0.0
8374,552,527.0,21,1,0,0,6.0,0,25.0,24.0,...,3.0,7.0,9.0,10.0,10.0,7.0,3.0,2.0,0.0,0.0
8375,552,528.0,21,1,0,0,2.0,0,25.0,29.0,...,3.0,7.0,9.0,10.0,10.0,7.0,3.0,2.0,0.0,0.0
8376,552,529.0,21,1,0,1,5.0,0,25.0,22.0,...,3.0,7.0,9.0,10.0,10.0,7.0,3.0,2.0,0.0,0.0


Unnamed: 0,iid,pid,wave,gender,match,dec_o,attr_o,samerace,age,age_o,...,tv,theater,movies,concerts,music,shopping,yoga,you_call,them_cal,date_3
count,8378.0,8368.0,8378.0,8378.0,8378.0,8378.0,8166.0,8378.0,8283.0,8274.0,...,8299.0,8299.0,8299.0,8299.0,8299.0,8299.0,8299.0,3974.0,3974.0,3974.0
unique,,,,,,,,,,,...,,,,,,,,,,
top,,,,,,,,,,,...,,,,,,,,,,
freq,,,,,,,,,,,...,,,,,,,,,,
mean,283.675937,283.863767,11.350919,0.500597,0.164717,0.419551,6.190411,0.395799,26.358928,26.364999,...,5.304133,6.776118,7.919629,6.825401,7.851066,5.631281,4.339197,0.780825,0.981631,0.37695
std,158.583367,158.584899,5.995903,0.500029,0.370947,0.493515,1.950305,0.489051,3.566763,3.563648,...,2.529135,2.235152,1.700927,2.156283,1.791827,2.608913,2.717612,1.611694,1.382139,0.484683
min,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,18.0,18.0,...,1.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0
25%,154.0,154.0,7.0,0.0,0.0,0.0,5.0,0.0,24.0,24.0,...,3.0,5.0,7.0,5.0,7.0,4.0,2.0,0.0,0.0,0.0
50%,281.0,281.0,11.0,1.0,0.0,0.0,6.0,0.0,26.0,26.0,...,6.0,7.0,8.0,7.0,8.0,6.0,4.0,0.0,1.0,0.0
75%,407.0,408.0,15.0,1.0,0.0,1.0,8.0,1.0,28.0,28.0,...,7.0,9.0,9.0,8.0,9.0,8.0,7.0,1.0,1.0,1.0


In [36]:
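# Count non-matches (0) and matches (1) within each wave
match_per_wave = dataset.groupby('wave')['match'].value_counts()
display(type(match_per_wave))  # the result is a Series indexed by (wave, match)
match_per_wave
# Plotting attempts, left disabled:
#sns.boxplot(dataset.groupby('wave')['match'])
#sns.FacetGrid(match_per_wave, col= , row= )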


pandas.core.series.Series

wave  match
1     0        138
      1         62
2     0        546
      1         62
3     0        174
      1         26
4     0        518
      1        130
5     0        136
      1         54
6     0         40
      1         10
7     0        426
      1         86
8     0        164
      1         36
9     0        676
      1        124
10    0        132
      1         30
11    0        754
      1        128
12    0        350
      1         42
13    0        148
      1         32
14    0        594
      1        126
15    0        558
      1        126
16    0         72
      1         24
17    0        232
      1         48
18    0         66
      1          6
19    0        376
      1         74
20    0         74
      1         10
21    0        824
      1        144
Name: match, dtype: int64

### Number of matches per wave
We look at this to check whether the number of matches varies meaningfully from one wave to another, and whether any wave is biased.
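
Raw counts are hard to compare because the waves differ a lot in size, so one option is to turn them into a match rate per wave. The sketch below is a minimal example on a small made-up frame (`toy`, with hypothetical values) that only reuses the `wave` and `match` column names from the dataset. `match_rate_per_wave` is an illustrative helper, not part of the original notebook.

```python
import pandas as pd

# Toy stand-in for the notebook's `dataset` (hypothetical values, same column names).
toy = pd.DataFrame({
    "wave":  [1, 1, 1, 1, 2, 2, 2, 2, 2],
    "match": [1, 0, 0, 1, 0, 0, 0, 1, 0],
})


def match_rate_per_wave(df: pd.DataFrame) -> pd.DataFrame:
    """Return the number of rows, number of matches and match rate for each wave."""
    summary = df.groupby("wave")["match"].agg(dates="count", matches="sum", rate="mean")
    # Gap between each wave's rate and the overall rate, to spot unusual waves.
    summary["diff_from_overall"] = summary["rate"] - df["match"].mean()
    return summary


rates = match_rate_per_wave(toy)
print(rates)
```

Calling `match_rate_per_wave(dataset)` on the real data should reproduce what the counts above already suggest: wave 1 has 62 matches out of 200 rows (31%), while wave 18 has 6 out of 72 (about 8%), so the rate does seem to vary noticeably between waves.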

## Number of interests in common and number of matches
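
A possible way to approach this is to count, for each date, how many interests both people rate highly, then compare the match rate across those counts. The sketch below assumes that "having an interest" means a rating of 7 or more (`THRESHOLD` is an arbitrary choice, not something defined in the data key), uses only three of the interest columns, and runs on a made-up frame `toy`. `add_shared_interests` is a hypothetical helper name.

```python
import pandas as pd

INTERESTS = ["sports", "museums", "gaming"]  # subset of the interest columns, for brevity
THRESHOLD = 7  # assumption: a rating of 7 or more counts as "interested"

# Toy stand-in for `dataset`: 4 participants, each row is one date (iid meets pid).
toy = pd.DataFrame({
    "iid":     [1, 1, 2, 2, 3, 3, 4, 4],
    "pid":     [3, 4, 3, 4, 1, 2, 1, 2],
    "match":   [1, 0, 0, 0, 1, 0, 0, 0],
    "sports":  [9, 9, 2, 2, 8, 8, 3, 3],
    "museums": [8, 8, 9, 9, 7, 7, 2, 2],
    "gaming":  [1, 1, 8, 8, 2, 2, 9, 9],
})


def add_shared_interests(df, interests=INTERESTS, threshold=THRESHOLD):
    """Add a column counting the interests that both people rate >= threshold."""
    # Interest ratings are repeated on every row of a participant: keep one row per iid.
    likes = df.drop_duplicates("iid").set_index("iid")[interests] >= threshold
    own = likes.reindex(df["iid"], fill_value=False).to_numpy()
    # A pid with no matching iid is treated as sharing nothing.
    partner = likes.reindex(df["pid"], fill_value=False).to_numpy()
    out = df.copy()
    out["shared_interests"] = (own & partner).sum(axis=1)
    return out


with_shared = add_shared_interests(toy)
# Match rate for each number of shared interests.
rate_by_shared = with_shared.groupby("shared_interests")["match"].mean()
print(rate_by_shared)
```

On the real data, `INTERESTS` would list all the interest columns selected earlier, and the threshold is worth varying to see whether the conclusion depends on it.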

## Correlation of interests

Distinguish between the results for women and for men.

This is only correlation, not causation.
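
As a starting point, the correlation between each interest rating and `match` could be computed separately for each gender code. The sketch below is one way to do it with pandas' `corrwith` on a made-up frame `toy` that has only two interest columns. `interest_match_correlation` is a hypothetical helper name, and using `match` as the target (rather than `dec_o`) is an assumption.

```python
import pandas as pd

INTERESTS = ["sports", "yoga"]  # subset of the interest columns, for brevity

# Toy stand-in for `dataset` (hypothetical values, same column names).
toy = pd.DataFrame({
    "gender": [0, 0, 0, 0, 1, 1, 1, 1],
    "match":  [0, 0, 1, 1, 0, 0, 1, 1],
    "sports": [1, 2, 3, 4, 4, 3, 2, 1],
    "yoga":   [5, 1, 5, 1, 1, 5, 1, 5],
})


def interest_match_correlation(df, interests=INTERESTS):
    """Pearson correlation of each interest with `match`, one column per gender code."""
    return pd.DataFrame({
        gender: group[interests].corrwith(group["match"])
        for gender, group in df.groupby("gender")
    })


corr_by_gender = interest_match_correlation(toy)
print(corr_by_gender.round(2))
```

The columns of the result are the gender codes as they appear in the dataset. Because a participant's interest ratings are repeated on every one of their dates, the rows are not independent, so these coefficients are better read as a rough ranking than as a significance test.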