⚠️ This project is mandatory for certification bloc #2.

![Tinder](https://full-stack-assets.s3.eu-west-3.amazonaws.com/M03-EDA/Tinder-Symbole.png)

# Speed Dating with Tinder

## Company's description 📇

<a href="https://tinder.com/" target="_blank">Tinder</a> is an online dating and geosocial networking application. In Tinder, users "swipe right" to like or "swipe left" to dislike other users' profiles, which include their photos, a short bio, and a list of their interests.

Tinder was launched by Sean Rad at a hackathon held at the Hatch Labs incubator in West Hollywood in 2012.

As of 2021, Tinder has recorded more than 65 billion matches worldwide.

## Project 🚧

The marketing team needs help on a new project. They are seeing a decrease in the number of matches, and they want to understand **what makes people interested in each other**.

They decided to run a speed dating experiment with people who had to give Tinder a lot of information about themselves that could ultimately be reflected on their dating profile on the app.

Tinder then gathered the data from this experiment. Each row in the dataset represents one speed date between two people, and indicates whether each of them secretly agreed to go on a second date with the other person.

## Goals 🎯

Use the dataset to understand what makes people interested enough in each other to go on a second date:
* You may use descriptive statistics
* You may use visualisations

## Scope of this project 🖼️

Data was gathered from participants in experimental speed dating events from 2002 to 2004. During the events, the attendees would have a four-minute "first date" with every other participant of the opposite sex. At the end of their four minutes, participants were asked if they would like to see their date again. They were also asked to rate their date on six attributes: Attractiveness, Sincerity, Intelligence, Fun, Ambition, and Shared Interests.

The dataset also includes questionnaire data gathered from participants at different points in the process. These fields include: demographics, dating habits, self-perception across key attributes, beliefs on what others find valuable in a mate, and lifestyle information. See the Speed Dating Data Key document below for details.

[Dataset](https://full-stack-assets.s3.eu-west-3.amazonaws.com/M03-EDA/Speed+Dating+Data.csv)

[Dataset Description](https://full-stack-assets.s3.eu-west-3.amazonaws.com/M03-EDA/Speed+Dating+Data+Key.doc)

## Helpers 🦮

To help you complete this project, here are a few tips.

Data exploration ideas:
* What are the least desirable attributes in a male partner? Does this differ for female partners?
* How important do people think attractiveness is in potential mate selection vs. its real impact?
* Are shared interests more important than a shared racial background?
* Can people accurately predict their own perceived value in the dating market?
* In terms of getting a second date, is it better to be someone's first speed date of the night or their last?
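
The last question above can be approached by comparing how often a partner says yes at each position in the evening. The sketch below is only an illustration: it assumes the `order` and `dec_o` columns described in the data key, and it uses a small hand-made frame with hypothetical values instead of the real dataset.

```python
import pandas as pd

# Hypothetical toy data: 'order' is the position of the date in the evening,
# 'dec_o' is the partner's decision (1 = wants to see the participant again).
toy = pd.DataFrame({
    "order": [1, 1, 2, 2, 3, 3],
    "dec_o": [1, 1, 1, 0, 0, 0],
})

# Share of "yes" decisions received at each position in the evening.
yes_rate_by_order = toy.groupby("order")["dec_o"].mean()
print(yes_rate_by_order)
```

On the real data, the same `groupby` could be applied to `df`, keeping in mind that waves have different numbers of rounds, so late positions are only observed in the larger waves.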

## Deliverable 📬

To complete this project, your team should deliver:

A notebook with:
* descriptive statistics
* visualisations
* captions and interpretations on how the stats and visualisations are relevant to why people agree to a second date

In [1]:
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
#pio.renderers.default = "svg" # this line must be commented if working on colab

In [2]:
# read_csv can open the file itself; latin1 avoids decoding errors on accented characters
df = pd.read_csv('Speed+Dating+Data.csv', encoding='latin1')

In [7]:
df.head()

Unnamed: 0,iid,id,gender,idg,condtn,wave,round,position,positin1,order,...,attr3_3,sinc3_3,intel3_3,fun3_3,amb3_3,attr5_3,sinc5_3,intel5_3,fun5_3,amb5_3
0,1,1.0,0,1,1,1,10,7,,4,...,5.0,7.0,7.0,7.0,7.0,,,,,
1,1,1.0,0,1,1,1,10,7,,3,...,5.0,7.0,7.0,7.0,7.0,,,,,
2,1,1.0,0,1,1,1,10,7,,10,...,5.0,7.0,7.0,7.0,7.0,,,,,
3,1,1.0,0,1,1,1,10,7,,5,...,5.0,7.0,7.0,7.0,7.0,,,,,
4,1,1.0,0,1,1,1,10,7,,7,...,5.0,7.0,7.0,7.0,7.0,,,,,


Speed dating organisation description:
*iid: unique participant id number
*pid: partner’s iid number 
    ==> 551 participants in total
*id: participant id number in a wave
*partner: partner id number in a wave.
    ==> The number of participants can be different and the women/men ratio too.
*gender: 0 woman/1 man
idg: id within a gender
*wave: wave number 
    ==> There are 21 waves.
*round: number of people that met in wave
*position: station number where met partner 
*order: the number of date in the wave
*match: 0 no/1 yes

positin1: station number where started 
condtn: 1 limited choice/2 extensive choice

==>For example: in wave 10, there were 9 women and 9 men (18 participants in total) and 162 entries (each of the 18 participants meets 9 partners): 2 genders, 18 iid and 18 pid, 9 id and 9 partner, 18 idg, round=9, 9 positions, 9 orders
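
The structure described in this example can be checked wave by wave. The sketch below assumes the `iid`, `gender` and `wave` columns from the key; `wave_summary` is a hypothetical helper name and the toy frame contains made-up values.

```python
import pandas as pd

def wave_summary(df: pd.DataFrame) -> pd.DataFrame:
    """One row per wave: participants, men, women and number of recorded dates."""
    # Each participant appears once per date, so keep a single row per iid.
    people = df.drop_duplicates("iid")
    counts = people.groupby("wave")["gender"].agg(
        participants="size",
        men="sum",  # gender is coded 0 woman / 1 man
    )
    counts["women"] = counts["participants"] - counts["men"]
    counts["rows"] = df.groupby("wave").size()
    return counts

# Hypothetical mini-wave: 2 women (iid 1, 2) each meet 2 men (iid 3, 4), and vice versa.
toy = pd.DataFrame({
    "iid":    [1, 1, 2, 2, 3, 3, 4, 4],
    "pid":    [3, 4, 3, 4, 1, 2, 1, 2],
    "gender": [0, 0, 0, 0, 1, 1, 1, 1],
    "wave":   [1] * 8,
})
summary = wave_summary(toy)
print(summary)
```

Calling `wave_summary(df).loc[10]` on the full dataset should then show whether wave 10 really has 18 participants and 162 rows.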

Shared features
*int_corr: correlation between participant’s and partner’s ratings of interests in Time 1
*samerace: 0 no/1 yes

Partner's features:
age_o: age of partner
race_o: race of partner
dec_o: decision of partner the night of event whether or not they would like to see him or her again: 0 no/1 yes
like_o: how much do you like this person from 1 to 10?
prob_o: how probable do you think it is that this person will say 'yes' for you from 1 to 10?
met_o: have you met this person before? 0 no/1 yes
attr_o, sinc_o, intel_o, fun_o, amb_o, shar_o: rating by partner the night of the event
pf_o_att, pf_o_sin, pf_o_int, pf_o_fun, pf_o_amb, pf_o_sha : partner’s stated preference at Time 1

Participant's features
age:
race:
dec: decision the night of event whether or not they would like to see him or her again: 0 no/1 yes
like: how much do you like this person from 1 to 10?
prob: how probable do you think it is that this person will say 'yes' for you from 1 to 10?
met: have you met this person before? 0 no/1 yes
attr, sinc, intel, fun, amb, shar: rating the partner the night of the event
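
A quick consistency check between `dec`, `dec_o` and `match` can be useful before any analysis. The sketch below assumes that `match` is 1 only when both people said yes, which is how the project brief describes a second date; the toy values are hypothetical.

```python
import pandas as pd

# Hypothetical toy data covering the four possible decision pairs.
toy = pd.DataFrame({
    "dec":   [1, 1, 0, 0],
    "dec_o": [1, 0, 1, 0],
    "match": [1, 0, 0, 0],
})

# With 0/1 coding, the product is 1 only when both decisions are 1.
expected_match = toy["dec"] * toy["dec_o"]
mismatches = int((expected_match != toy["match"]).sum())
print(f"rows where match disagrees with dec * dec_o: {mismatches}")
```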

Participants questionnaire (Signup/Time1): 
*field: field of study/ field_cd: coded from 1 to 16

*We want to know what you look for in the opposite sex: 
attr1_1, sinc1_1, intel1_1, fun1_1, amb1_1, shar1_1	  

*What do you think the opposite sex looks for in a date?
attr2_1, sinc2_1, intel2_1, fun2_1, amb2_1,shar2_1	 

*How do you think you measure up? Please rate your opinion of your own attributes:
attr3_1, sinc3_1, intel3_1, fun3_1, amb3_1, shar3_1

*What you think MOST of your fellow men/women look for in the opposite sex?
attr4_1, sinc4_1, intel4_1, fun4_1, amb4_1, shar4_1	 

*How do you think others perceive you? 
attr5_1, sinc5_1, intel5_1, fun5_1, amb5_1, shar5_1	

Scorecard filled out halfway through the night of the event:
*match_es: How many matches do you estimate you will get?
*Importance of 6 attributes for participants in a potential partner: 
attr1_s, sinc1_s, intel1_s, fun1_s, amb1_s, shar1_s	 

*Rate your opinion of your own attributes:
attr3_s, sinc3_s, intel3_s, fun3_s, amb3_s, shar3_s	 

Survey filled out the day after the event (FollowUp/Time2):
*satis_2: how satisfied were you with the people you met? 
*length:
*numdat_2:
*We want to know what you look for in the opposite sex: 
attr1_2, sinc1_2, intel1_2, fun1_2, amb1_2, shar1_2	  

*What do you think the opposite sex looks for in a date?
attr2_2, sinc2_2, intel2_2, fun2_2, amb2_2,shar2_2	 

*How do you think you measure up? Please rate your opinion of your own attributes:
attr3_2, sinc3_2, intel3_2, fun3_2, amb3_2, shar3_2

*What you think MOST of your fellow men/women look for in the opposite sex?
attr4_2, sinc4_2, intel4_2, fun4_2, amb4_2, shar4_2	 

*How do you think others perceive you? 
attr5_2, sinc5_2, intel5_2, fun5_2, amb5_2, shar5_2	 

*what is the actual importance of these attributes in the decisions you've made: 
attr7_2, sinc7_2, intel7_2, fun7_2, amb7_2, shar7_2	 


Survey filled out 3-4 weeks after participants had been sent their matches (FollowUp2/Time3):
*you_call: How many have you contacted to set up a date?
*them_cal: How many have contacted you?
*date_3: Have you been on a date with any of your matches? 1 Yes/2 No
*numdat_3: How many of your matches have you been on a date with so far?
num_in_3: If yes, how many?
*We want to know what you look for in the opposite sex: 
attr1_3, sinc1_3, intel1_3, fun1_3, amb1_3, shar1_3	  

*What do you think the opposite sex looks for in a date?
attr2_3, sinc2_3, intel2_3, fun2_3, amb2_3,shar2_3	 

*How do you think you measure up? Please rate your opinion of your own attributes:
attr3_3, sinc3_3, intel3_3, fun3_3, amb3_3, shar3_3

*What you think MOST of your fellow men/women look for in the opposite sex?
attr4_3, sinc4_3, intel4_3, fun4_3, amb4_3, shar4_3	 

*How do you think others perceive you? 
attr5_3, sinc5_3, intel5_3, fun5_3, amb5_3, shar5_3	 

*what is the actual importance of these attributes in the decisions you've made: 
attr7_3, sinc7_3, intel7_3, fun7_3, amb7_3, shar7_3	 

In [4]:
df.shape

(8378, 195)

In [5]:
df.dtypes

iid           int64
id          float64
gender        int64
idg           int64
condtn        int64
             ...   
attr5_3     float64
sinc5_3     float64
intel5_3    float64
fun5_3      float64
amb5_3      float64
Length: 195, dtype: object

In [6]:
df.describe(include='all')

Unnamed: 0,iid,id,gender,idg,condtn,wave,round,position,positin1,order,...,attr3_3,sinc3_3,intel3_3,fun3_3,amb3_3,attr5_3,sinc5_3,intel5_3,fun5_3,amb5_3
count,8378.0,8377.0,8378.0,8378.0,8378.0,8378.0,8378.0,8378.0,6532.0,8378.0,...,3974.0,3974.0,3974.0,3974.0,3974.0,2016.0,2016.0,2016.0,2016.0,2016.0
unique,,,,,,,,,,,...,,,,,,,,,,
top,,,,,,,,,,,...,,,,,,,,,,
freq,,,,,,,,,,,...,,,,,,,,,,
mean,283.675937,8.960248,0.500597,17.327166,1.828837,11.350919,16.872046,9.042731,9.295775,8.927668,...,7.240312,8.093357,8.388777,7.658782,7.391545,6.81002,7.615079,7.93254,7.155258,7.048611
std,158.583367,5.491329,0.500029,10.940735,0.376673,5.995903,4.358458,5.514939,5.650199,5.477009,...,1.576596,1.610309,1.459094,1.74467,1.961417,1.507341,1.504551,1.340868,1.672787,1.717988
min,1.0,1.0,0.0,1.0,1.0,1.0,5.0,1.0,1.0,1.0,...,2.0,2.0,3.0,2.0,1.0,2.0,2.0,4.0,1.0,1.0
25%,154.0,4.0,0.0,8.0,2.0,7.0,14.0,4.0,4.0,4.0,...,7.0,7.0,8.0,7.0,6.0,6.0,7.0,7.0,6.0,6.0
50%,281.0,8.0,1.0,16.0,2.0,11.0,18.0,8.0,9.0,8.0,...,7.0,8.0,8.0,8.0,8.0,7.0,8.0,8.0,7.0,7.0
75%,407.0,13.0,1.0,26.0,2.0,15.0,20.0,13.0,14.0,13.0,...,8.0,9.0,9.0,9.0,9.0,8.0,9.0,9.0,8.0,8.0


In [14]:
df["dec_o"].isnull().sum()

0

In [17]:
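# Note: a participant never dates themselves, so iid == pid is never true and this filter selects no rows
df_date = df.loc[df['iid'] == df['pid'], 'date_3'].value_counts()
df_date.head()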


Series([], Name: count, dtype: int64)
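
The result above is empty because no row has `iid` equal to `pid`. If the goal was to count the `date_3` answers once per participant, one possible approach is to keep a single row per `iid` first. This is a sketch on hypothetical values, using the 1 Yes / 2 No coding given in the key.

```python
import pandas as pd

# Hypothetical toy data: date_3 is repeated on every row of the same participant.
toy = pd.DataFrame({
    "iid":    [1, 1, 2, 2, 3, 3],
    "pid":    [2, 3, 1, 3, 1, 2],
    "date_3": [1.0, 1.0, 2.0, 2.0, None, None],
})

# Keep one row per participant, then count the answers (NaN = no follow-up answer).
per_participant = toy.drop_duplicates("iid")
date_counts = per_participant["date_3"].value_counts(dropna=False)
print(date_counts)
```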

In [58]:
fig = px.histogram(df, x='income', nbins=10, title='Income distribution')
fig.show()

In [59]:
df['income'].dtypes


dtype('O')

In [None]:
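# income is read as text (dtype object), presumably because of thousands separators, so strip commas before casting
df['income'] = df['income'].str.replace(',', '', regex=False).astype(float)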

In [47]:
df['income'].isnull().sum()

4099

In [48]:
df['age'].dtypes

dtype('float64')

In [49]:
df['age'].isnull().sum()

95
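
Rather than checking columns one at a time, the share of missing values can be computed for several columns at once, and a text column can be converted to numbers in the same pass. The sketch below uses hypothetical values; the comma-formatted income strings are an assumption about how the raw file stores that column.

```python
import pandas as pd

# Hypothetical toy data mimicking a numeric column and a text-encoded one.
toy = pd.DataFrame({
    "age":    [21.0, 24.0, None, 30.0],
    "income": ["69,487.00", None, None, "65,929.00"],
})

# Fraction of missing values per column, largest first.
missing_share = toy.isnull().mean().sort_values(ascending=False)
print(missing_share)

# Remove thousands separators, then convert; missing entries stay missing.
income_numeric = pd.to_numeric(toy["income"].str.replace(",", "", regex=False))
print(income_numeric)
```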

In [50]:
fig = px.histogram(df, x='age', color = 'gender', nbins=30, title='Age distribution')
fig.show()

In [None]:
#fig = px.scatter(df, x="age", y="income", trendline="ols")
#fig.show()

In [130]:
df_gender = df['gender'].value_counts().reset_index()
df_gender

Unnamed: 0,gender,count
0,1,4194
1,0,4184


In [None]:
fig = px.pie(df_gender, names='gender', values='count', title='Gender Distribution')
fig.show()

In [132]:
df_wave_distribution = df.groupby('gender')['wave'].value_counts().reset_index()
df_wave_distribution.head()


Unnamed: 0,gender,wave,count
0,0,21,484
1,0,11,441
2,0,9,400
3,0,14,360
4,0,15,342


In [133]:
# Assign the result back instead of using a chained inplace replace, which pandas warns about
df_wave_distribution['gender'] = df_wave_distribution['gender'].replace({0: 'Women', 1: 'Men'})

In [134]:
df_wave_distribution.head()

Unnamed: 0,gender,wave,count
0,Women,21,484
1,Women,11,441
2,Women,9,400
3,Women,14,360
4,Women,15,342


In [135]:
fig = px.bar(df_wave_distribution, x="wave", y="count", color = 'gender', title='Wave Distribution')
fig.show()

In [137]:
fig = px.histogram(df, x="attr1_1")
fig.show()

In [65]:
df = df[~df['wave'].isin([6, 7, 8, 9])]

In [66]:
df.shape

(6816, 195)
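
The notebook does not state why waves 6 to 9 are dropped. A plausible reason, assuming the data key is read correctly, is that those waves rated the `*1_1` preferences on a 1-10 scale instead of distributing 100 points. Under that assumption, an alternative to dropping the rows is to rescale each participant's six answers so that they sum to 100. `rescale_to_100` is a hypothetical helper and the values are made up.

```python
import pandas as pd

PREF_COLS = ["attr1_1", "sinc1_1", "intel1_1", "fun1_1", "amb1_1", "shar1_1"]

def rescale_to_100(df: pd.DataFrame, cols=PREF_COLS) -> pd.DataFrame:
    """Return a copy where each row's preference columns sum to 100."""
    out = df.copy()
    row_totals = out[cols].sum(axis=1)
    # Rows whose total is 0 would produce NaN here and need separate handling.
    out[cols] = out[cols].div(row_totals, axis=0) * 100
    return out

toy = pd.DataFrame(
    [[8, 6, 8, 6, 4, 8],        # hypothetical 1-10 ratings (total 40)
     [30, 20, 20, 15, 5, 10]],  # hypothetical 100-point allocation
    columns=PREF_COLS,
    dtype=float,
)
rescaled = rescale_to_100(toy)
print(rescaled.round(1))
```

Rows that already sum to 100 are left unchanged, so the function can be applied to every wave without a special case.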

In [67]:
fig = px.histogram(df, x="attr1_1")
fig.show()

In [78]:
df_attractivity_attribute = df.groupby('gender')['attr1_1'].mean(numeric_only = True).round(2).reset_index()
df_attractivity_attribute

Unnamed: 0,gender,attr1_1
0,0,18.79
1,1,29.11


In [79]:
df_sincerity_attribute = df.groupby('gender')['sinc1_1'].mean(numeric_only = True).round(2).reset_index()
df_sincerity_attribute

Unnamed: 0,gender,sinc1_1
0,0,18.38
1,1,16.23


In [80]:
df_intelligence_attribute = df.groupby('gender')['intel1_1'].mean(numeric_only = True).round(2).reset_index()
df_intelligence_attribute

Unnamed: 0,gender,intel1_1
0,0,21.57
1,1,19.56


In [81]:
df_fun_attribute = df.groupby('gender')['fun1_1'].mean(numeric_only = True).round(2).reset_index()
df_fun_attribute

Unnamed: 0,gender,fun1_1
0,0,17.04
1,1,17.66


In [82]:
df_ambition_attribute = df.groupby('gender')['amb1_1'].mean(numeric_only = True).round(2).reset_index()
df_ambition_attribute

Unnamed: 0,gender,amb1_1
0,0,12.01
1,1,7.5


In [83]:
df_shared_interest_attribute = df.groupby('gender')['shar1_1'].mean(numeric_only = True).round(2).reset_index()
df_shared_interest_attribute


Unnamed: 0,gender,shar1_1
0,0,12.26
1,1,10.26


In [86]:
df_final1 = df_intelligence_attribute.merge(df_attractivity_attribute,on='gender')
display(df_final1)

Unnamed: 0,gender,intel1_1,attr1_1
0,0,21.57,18.79
1,1,19.56,29.11


In [88]:
df_final2 = df_final1.merge(df_sincerity_attribute,on='gender')
display(df_final2)

Unnamed: 0,gender,intel1_1,attr1_1,sinc1_1
0,0,21.57,18.79,18.38
1,1,19.56,29.11,16.23


In [89]:
df_final3 = df_final2.merge(df_fun_attribute,on='gender')
display(df_final3)

Unnamed: 0,gender,intel1_1,attr1_1,sinc1_1,fun1_1
0,0,21.57,18.79,18.38,17.04
1,1,19.56,29.11,16.23,17.66


In [90]:
df_final4 = df_final3.merge(df_ambition_attribute,on='gender')
display(df_final4)

Unnamed: 0,gender,intel1_1,attr1_1,sinc1_1,fun1_1,amb1_1
0,0,21.57,18.79,18.38,17.04,12.01
1,1,19.56,29.11,16.23,17.66,7.5


In [91]:
df_final = df_final4.merge(df_shared_interest_attribute,on='gender')
display(df_final)

Unnamed: 0,gender,intel1_1,attr1_1,sinc1_1,fun1_1,amb1_1,shar1_1
0,0,21.57,18.79,18.38,17.04,12.01,12.26
1,1,19.56,29.11,16.23,17.66,7.5,10.26
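
The same table can be built in one step by selecting all six columns before aggregating, which avoids the chain of merges. This is a sketch on hypothetical values; the column names are the ones used above.

```python
import pandas as pd

PREF_COLS = ["intel1_1", "attr1_1", "sinc1_1", "fun1_1", "amb1_1", "shar1_1"]

# Hypothetical toy data: two rows per gender.
toy = pd.DataFrame({
    "gender":   [0, 0, 1, 1],
    "intel1_1": [20.0, 22.0, 18.0, 20.0],
    "attr1_1":  [18.0, 20.0, 28.0, 30.0],
    "sinc1_1":  [18.0, 18.0, 16.0, 16.0],
    "fun1_1":   [17.0, 17.0, 18.0, 18.0],
    "amb1_1":   [12.0, 12.0, 8.0, 8.0],
    "shar1_1":  [15.0, 11.0, 12.0, 8.0],
})

# One groupby over all attribute columns replaces six groupbys and five merges.
means_by_gender = toy.groupby("gender")[PREF_COLS].mean().round(2).reset_index()
print(means_by_gender)
```

Because each participant appears once per date, means computed on the full frame weight participants by their number of dates. Applying `drop_duplicates("iid")` before the `groupby` would weight each participant once instead.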


In [93]:
# Assign the result back instead of using a chained inplace replace, which pandas warns about
df_final['gender'] = df_final['gender'].replace({0: 'Women', 1: 'Men'})

In [108]:
px.bar?

Signature:
px.bar(
    data_frame=None,
    x=None,
    y=None,
    color=None,
    pattern_shape=None,
    facet_row=None,
    facet_col=None,
    facet_col_wrap=0,
    facet_row_spacing=None,
    facet_col_spacing=None,
    hover_name=None,
    hover_data=None,
    custom_data=None,

In [107]:
# Reshape to long form (one row per gender and attribute) so Plotly Express gets a single value column
df_long = df_final.melt(id_vars='gender', var_name='attribute', value_name='mean_points')
fig = px.bar(df_long, x='attribute', y='mean_points', color='gender', barmode='group', title='Stated importance of each attribute by gender')
fig.show()

In [104]:
fig = px.bar(df_final, y="gender", x=['sinc1_1', 'attr1_1', 'fun1_1', 'amb1_1', 'shar1_1'], color = 'gender', title='Stated importance of attributes by gender')
fig.show()


In [103]:
fig = px.bar(df_final, x="attr1_1", y='sinc1_1', color = 'gender', title='Sincerity vs. attractiveness by gender')
fig.show()


In [112]:
df_final_swapped = df_final.T
display(df_final_swapped[0])


gender      Women
intel1_1    21.57
attr1_1     18.79
sinc1_1     18.38
fun1_1      17.04
amb1_1      12.01
shar1_1     12.26
Name: 0, dtype: object

In [100]:
# Move gender to the index before transposing so the columns are named 'Women' and 'Men'
df_by_attribute = df_final.set_index('gender').T.rename_axis('attribute').reset_index()
fig = px.bar(df_by_attribute, x='attribute', y=['Women', 'Men'], barmode='group', title='Stated importance of each attribute by gender')
fig.show()