![Tinder](https://full-stack-assets.s3.eu-west-3.amazonaws.com/M03-EDA/Tinder-Symbole.png)

# Speed Dating with Tinder

## Company's description 📇

<a href="https://tinder.com/" target="_blank">Tinder</a> is an online dating and geosocial networking application. In Tinder, users "swipe right" to like or "swipe left" to dislike other users' profiles, which include their photos, a short bio, and a list of their interests.

Tinder was launched by Sean Rad at a hackathon held at the Hatch Labs incubator in West Hollywood in 2012.

As of 2021, Tinder has recorded more than 65 billion matches worldwide.

## Project 🚧

The marketing team needs help with a new project. They are seeing a decrease in the number of matches and want to understand **what makes people interested in each other**.

They decided to run a speed dating experiment with people who had to give Tinder a lot of information about themselves, information that could ultimately be reflected in their dating profile on the app.

Tinder then gathered the data from this experiment. Each row in the dataset represents one speed date between two people, and indicates whether each of them secretly agreed to go on a second date with the other person.
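The following minimal sketch makes this row structure concrete on a hypothetical four-row table. It assumes, following the dataset key, that `iid` identifies the participant and `pid` the partner. It also assumes that `dec` is the participant's own yes/no, that `dec_o` is the partner's, and that a `match` means both said yes. Under these assumptions every date is stored twice, once from each side, so counts taken over rows count each date (and each match) twice.

```python
import pandas as pd

# Hypothetical toy data (not taken from the real CSV): two dates,
# each stored once from each participant's point of view.
dates = pd.DataFrame({
    "iid":   [1, 11, 2, 12],   # participant
    "pid":   [11, 1, 12, 2],   # partner
    "dec":   [1, 1, 1, 0],     # participant's own decision (1 = yes)
    "dec_o": [1, 1, 0, 1],     # partner's decision about the participant
})

# Assumption: a match means both people said yes.
dates["match"] = (dates["dec"] & dates["dec_o"]).astype(int)

# Each unordered {participant, partner} pair is one actual date.
unique_dates = {frozenset(pair) for pair in zip(dates["iid"], dates["pid"])}

print(dates)
print(f"{len(dates)} rows, {len(unique_dates)} actual dates")
```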

## Goals 🎯

Use the dataset to understand what makes people interested enough in each other to go on a second date:
* You may use descriptive statistics
* You may use visualisations

## Scope of this project 🖼️

Data was gathered from participants in experimental speed dating events from 2002 to 2004. During the events, the attendees would have a four-minute "first date" with every other participant of the opposite sex. At the end of their four minutes, participants were asked if they would like to see their date again. They were also asked to rate their date on six attributes: Attractiveness, Sincerity, Intelligence, Fun, Ambition, and Shared Interests.

The dataset also includes questionnaire data gathered from participants at different points in the process. These fields include: demographics, dating habits, self-perception across key attributes, beliefs on what others find valuable in a mate, and lifestyle information. See the Speed Dating Data Key document below for details.

[Dataset](https://full-stack-assets.s3.eu-west-3.amazonaws.com/M03-EDA/Speed+Dating+Data.csv)

[Dataset Description](https://full-stack-assets.s3.eu-west-3.amazonaws.com/M03-EDA/Speed+Dating+Data+Key.doc)

## Helpers 🦮

To help you with this project, here are a few tips.

Data exploration ideas:
* What are the least desirable attributes in a male partner? Does this differ for female partners?
* How important do people think attractiveness is in potential mate selection vs. its real impact?
* Are shared interests more important than a shared racial background?
* Can people accurately predict their own perceived value in the dating market?
* In terms of getting a second date, is it better to be someone's first speed date of the night or their last?
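The last idea can be explored directly. Here is a minimal sketch under two assumptions. First, `order` is the position of the date during the night. Second, `dec` is the decision of the person whose row it is. Grouping `dec` by `order` then gives the share of "yes" a partner receives depending on whether they were that person's first, second or last date. The helper names and toy data are hypothetical.

```python
import pandas as pd

def yes_rate_by_order(df: pd.DataFrame) -> pd.Series:
    """Share of 'yes' decisions ('dec' == 1) for each date position ('order')."""
    return df.groupby("order")["dec"].mean()

def first_vs_last(df: pd.DataFrame) -> tuple[float, float]:
    """Yes-rate when the partner was the decider's first date vs their last date."""
    # The last date is the highest 'order' value seen for each decider ('iid').
    last_order = df.groupby("iid")["order"].transform("max")
    first_rate = df.loc[df["order"] == 1, "dec"].mean()
    last_rate = df.loc[df["order"] == last_order, "dec"].mean()
    return float(first_rate), float(last_rate)

# Hypothetical toy data: two deciders, three dates each.
toy = pd.DataFrame({
    "iid":   [1, 1, 1, 2, 2, 2],
    "order": [1, 2, 3, 1, 2, 3],
    "dec":   [1, 0, 0, 1, 1, 0],
})
print(yes_rate_by_order(toy))
print(first_vs_last(toy))
```

Waves had different sizes, so the last position differs from one night to another. That is why the sketch uses each decider's own maximum `order` rather than a fixed value.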

## Deliverable 📬

To complete this project, your team should deliver:

A notebook with:
* descriptive statistics
* visualisations
* captions and interpretations on how the stats and visualisations are relevant to why people agree to a second date

In [1]:
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from scipy.stats import pearsonr, binomtest, ranksums
pd.options.display.max_columns = None

# EDA dataframe

In [2]:
Tinder = pd.read_csv("https://full-stack-assets.s3.eu-west-3.amazonaws.com/M03-EDA/Speed+Dating+Data.csv", 
                     encoding='windows-1252')

Tinder['gender_name'] = Tinder['gender'].replace({0: 'Female', 1: 'Male'})

print(Tinder.shape)
display(Tinder.head())
display(Tinder.describe(include="all"))

(8378, 196)


Unnamed: 0,iid,id,gender,idg,condtn,wave,round,position,positin1,order,partner,pid,match,int_corr,samerace,age_o,race_o,pf_o_att,pf_o_sin,pf_o_int,pf_o_fun,pf_o_amb,pf_o_sha,dec_o,attr_o,sinc_o,intel_o,fun_o,amb_o,shar_o,like_o,prob_o,met_o,age,field,field_cd,undergra,mn_sat,tuition,race,imprace,imprelig,from,zipcode,income,goal,date,go_out,career,career_c,sports,tvsports,exercise,dining,museums,art,hiking,gaming,clubbing,reading,tv,theater,movies,concerts,music,shopping,yoga,exphappy,expnum,attr1_1,sinc1_1,intel1_1,fun1_1,amb1_1,shar1_1,attr4_1,sinc4_1,intel4_1,fun4_1,amb4_1,shar4_1,attr2_1,sinc2_1,intel2_1,fun2_1,amb2_1,shar2_1,attr3_1,sinc3_1,fun3_1,intel3_1,amb3_1,attr5_1,sinc5_1,intel5_1,fun5_1,amb5_1,dec,attr,sinc,intel,fun,amb,shar,like,prob,met,match_es,attr1_s,sinc1_s,intel1_s,fun1_s,amb1_s,shar1_s,attr3_s,sinc3_s,intel3_s,fun3_s,amb3_s,satis_2,length,numdat_2,attr7_2,sinc7_2,intel7_2,fun7_2,amb7_2,shar7_2,attr1_2,sinc1_2,intel1_2,fun1_2,amb1_2,shar1_2,attr4_2,sinc4_2,intel4_2,fun4_2,amb4_2,shar4_2,attr2_2,sinc2_2,intel2_2,fun2_2,amb2_2,shar2_2,attr3_2,sinc3_2,intel3_2,fun3_2,amb3_2,attr5_2,sinc5_2,intel5_2,fun5_2,amb5_2,you_call,them_cal,date_3,numdat_3,num_in_3,attr1_3,sinc1_3,intel1_3,fun1_3,amb1_3,shar1_3,attr7_3,sinc7_3,intel7_3,fun7_3,amb7_3,shar7_3,attr4_3,sinc4_3,intel4_3,fun4_3,amb4_3,shar4_3,attr2_3,sinc2_3,intel2_3,fun2_3,amb2_3,shar2_3,attr3_3,sinc3_3,intel3_3,fun3_3,amb3_3,attr5_3,sinc5_3,intel5_3,fun5_3,amb5_3,gender_name
0,1,1.0,0,1,1,1,10,7,,4,1,11.0,0,0.14,0,27.0,2.0,35.0,20.0,20.0,20.0,0.0,5.0,0,6.0,8.0,8.0,8.0,8.0,6.0,7.0,4.0,2.0,21.0,Law,1.0,,,,4.0,2.0,4.0,Chicago,60521,69487.0,2.0,7.0,1.0,lawyer,,9.0,2.0,8.0,9.0,1.0,1.0,5.0,1.0,5.0,6.0,9.0,1.0,10.0,10.0,9.0,8.0,1.0,3.0,2.0,15.0,20.0,20.0,15.0,15.0,15.0,,,,,,,35.0,20.0,15.0,20.0,5.0,5.0,6.0,8.0,8.0,8.0,7.0,,,,,,1,6.0,9.0,7.0,7.0,6.0,5.0,7.0,6.0,2.0,4.0,,,,,,,,,,,,6.0,2.0,1.0,,,,,,,19.44,16.67,13.89,22.22,11.11,16.67,,,,,,,,,,,,,6.0,7.0,8.0,7.0,6.0,,,,,,1.0,1.0,0.0,,,15.0,20.0,20.0,15.0,15.0,15.0,,,,,,,,,,,,,,,,,,,5.0,7.0,7.0,7.0,7.0,,,,,,Female
1,1,1.0,0,1,1,1,10,7,,3,2,12.0,0,0.54,0,22.0,2.0,60.0,0.0,0.0,40.0,0.0,0.0,0,7.0,8.0,10.0,7.0,7.0,5.0,8.0,4.0,2.0,21.0,Law,1.0,,,,4.0,2.0,4.0,Chicago,60521,69487.0,2.0,7.0,1.0,lawyer,,9.0,2.0,8.0,9.0,1.0,1.0,5.0,1.0,5.0,6.0,9.0,1.0,10.0,10.0,9.0,8.0,1.0,3.0,2.0,15.0,20.0,20.0,15.0,15.0,15.0,,,,,,,35.0,20.0,15.0,20.0,5.0,5.0,6.0,8.0,8.0,8.0,7.0,,,,,,1,7.0,8.0,7.0,8.0,5.0,6.0,7.0,5.0,1.0,4.0,,,,,,,,,,,,6.0,2.0,1.0,,,,,,,19.44,16.67,13.89,22.22,11.11,16.67,,,,,,,,,,,,,6.0,7.0,8.0,7.0,6.0,,,,,,1.0,1.0,0.0,,,15.0,20.0,20.0,15.0,15.0,15.0,,,,,,,,,,,,,,,,,,,5.0,7.0,7.0,7.0,7.0,,,,,,Female
2,1,1.0,0,1,1,1,10,7,,10,3,13.0,1,0.16,1,22.0,4.0,19.0,18.0,19.0,18.0,14.0,12.0,1,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,1.0,21.0,Law,1.0,,,,4.0,2.0,4.0,Chicago,60521,69487.0,2.0,7.0,1.0,lawyer,,9.0,2.0,8.0,9.0,1.0,1.0,5.0,1.0,5.0,6.0,9.0,1.0,10.0,10.0,9.0,8.0,1.0,3.0,2.0,15.0,20.0,20.0,15.0,15.0,15.0,,,,,,,35.0,20.0,15.0,20.0,5.0,5.0,6.0,8.0,8.0,8.0,7.0,,,,,,1,5.0,8.0,9.0,8.0,5.0,7.0,7.0,,1.0,4.0,,,,,,,,,,,,6.0,2.0,1.0,,,,,,,19.44,16.67,13.89,22.22,11.11,16.67,,,,,,,,,,,,,6.0,7.0,8.0,7.0,6.0,,,,,,1.0,1.0,0.0,,,15.0,20.0,20.0,15.0,15.0,15.0,,,,,,,,,,,,,,,,,,,5.0,7.0,7.0,7.0,7.0,,,,,,Female
3,1,1.0,0,1,1,1,10,7,,5,4,14.0,1,0.61,0,23.0,2.0,30.0,5.0,15.0,40.0,5.0,5.0,1,7.0,8.0,9.0,8.0,9.0,8.0,7.0,7.0,2.0,21.0,Law,1.0,,,,4.0,2.0,4.0,Chicago,60521,69487.0,2.0,7.0,1.0,lawyer,,9.0,2.0,8.0,9.0,1.0,1.0,5.0,1.0,5.0,6.0,9.0,1.0,10.0,10.0,9.0,8.0,1.0,3.0,2.0,15.0,20.0,20.0,15.0,15.0,15.0,,,,,,,35.0,20.0,15.0,20.0,5.0,5.0,6.0,8.0,8.0,8.0,7.0,,,,,,1,7.0,6.0,8.0,7.0,6.0,8.0,7.0,6.0,2.0,4.0,,,,,,,,,,,,6.0,2.0,1.0,,,,,,,19.44,16.67,13.89,22.22,11.11,16.67,,,,,,,,,,,,,6.0,7.0,8.0,7.0,6.0,,,,,,1.0,1.0,0.0,,,15.0,20.0,20.0,15.0,15.0,15.0,,,,,,,,,,,,,,,,,,,5.0,7.0,7.0,7.0,7.0,,,,,,Female
4,1,1.0,0,1,1,1,10,7,,7,5,15.0,1,0.21,0,24.0,3.0,30.0,10.0,20.0,10.0,10.0,20.0,1,8.0,7.0,9.0,6.0,9.0,7.0,8.0,6.0,2.0,21.0,Law,1.0,,,,4.0,2.0,4.0,Chicago,60521,69487.0,2.0,7.0,1.0,lawyer,,9.0,2.0,8.0,9.0,1.0,1.0,5.0,1.0,5.0,6.0,9.0,1.0,10.0,10.0,9.0,8.0,1.0,3.0,2.0,15.0,20.0,20.0,15.0,15.0,15.0,,,,,,,35.0,20.0,15.0,20.0,5.0,5.0,6.0,8.0,8.0,8.0,7.0,,,,,,1,5.0,6.0,7.0,7.0,6.0,6.0,6.0,6.0,2.0,4.0,,,,,,,,,,,,6.0,2.0,1.0,,,,,,,19.44,16.67,13.89,22.22,11.11,16.67,,,,,,,,,,,,,6.0,7.0,8.0,7.0,6.0,,,,,,1.0,1.0,0.0,,,15.0,20.0,20.0,15.0,15.0,15.0,,,,,,,,,,,,,,,,,,,5.0,7.0,7.0,7.0,7.0,,,,,,Female


Unnamed: 0,iid,id,gender,idg,condtn,wave,round,position,positin1,order,partner,pid,match,int_corr,samerace,age_o,race_o,pf_o_att,pf_o_sin,pf_o_int,pf_o_fun,pf_o_amb,pf_o_sha,dec_o,attr_o,sinc_o,intel_o,fun_o,amb_o,shar_o,like_o,prob_o,met_o,age,field,field_cd,undergra,mn_sat,tuition,race,imprace,imprelig,from,zipcode,income,goal,date,go_out,career,career_c,sports,tvsports,exercise,dining,museums,art,hiking,gaming,clubbing,reading,tv,theater,movies,concerts,music,shopping,yoga,exphappy,expnum,attr1_1,sinc1_1,intel1_1,fun1_1,amb1_1,shar1_1,attr4_1,sinc4_1,intel4_1,fun4_1,amb4_1,shar4_1,attr2_1,sinc2_1,intel2_1,fun2_1,amb2_1,shar2_1,attr3_1,sinc3_1,fun3_1,intel3_1,amb3_1,attr5_1,sinc5_1,intel5_1,fun5_1,amb5_1,dec,attr,sinc,intel,fun,amb,shar,like,prob,met,match_es,attr1_s,sinc1_s,intel1_s,fun1_s,amb1_s,shar1_s,attr3_s,sinc3_s,intel3_s,fun3_s,amb3_s,satis_2,length,numdat_2,attr7_2,sinc7_2,intel7_2,fun7_2,amb7_2,shar7_2,attr1_2,sinc1_2,intel1_2,fun1_2,amb1_2,shar1_2,attr4_2,sinc4_2,intel4_2,fun4_2,amb4_2,shar4_2,attr2_2,sinc2_2,intel2_2,fun2_2,amb2_2,shar2_2,attr3_2,sinc3_2,intel3_2,fun3_2,amb3_2,attr5_2,sinc5_2,intel5_2,fun5_2,amb5_2,you_call,them_cal,date_3,numdat_3,num_in_3,attr1_3,sinc1_3,intel1_3,fun1_3,amb1_3,shar1_3,attr7_3,sinc7_3,intel7_3,fun7_3,amb7_3,shar7_3,attr4_3,sinc4_3,intel4_3,fun4_3,amb4_3,shar4_3,attr2_3,sinc2_3,intel2_3,fun2_3,amb2_3,shar2_3,attr3_3,sinc3_3,intel3_3,fun3_3,amb3_3,attr5_3,sinc5_3,intel5_3,fun5_3,amb5_3,gender_name
count,8378.0,8377.0,8378.0,8378.0,8378.0,8378.0,8378.0,8378.0,6532.0,8378.0,8378.0,8368.0,8378.0,8220.0,8378.0,8274.0,8305.0,8289.0,8289.0,8289.0,8280.0,8271.0,8249.0,8378.0,8166.0,8091.0,8072.0,8018.0,7656.0,7302.0,8128.0,8060.0,7993.0,8283.0,8315,8296.0,4914,3133.0,3583.0,8315.0,8299.0,8299.0,8299,7314.0,4279.0,8299.0,8281.0,8299.0,8289,8240.0,8299.0,8299.0,8299.0,8299.0,8299.0,8299.0,8299.0,8299.0,8299.0,8299.0,8299.0,8299.0,8299.0,8299.0,8299.0,8299.0,8299.0,8277.0,1800.0,8299.0,8299.0,8299.0,8289.0,8279.0,8257.0,6489.0,6489.0,6489.0,6489.0,6489.0,6467.0,8299.0,8299.0,8299.0,8299.0,8289.0,8289.0,8273.0,8273.0,8273.0,8273.0,8273.0,4906.0,4906.0,4906.0,4906.0,4906.0,8378.0,8176.0,8101.0,8082.0,8028.0,7666.0,7311.0,8138.0,8069.0,8003.0,7205.0,4096.0,4096.0,4096.0,4096.0,4096.0,4096.0,4000.0,4000.0,4000.0,4000.0,4000.0,7463.0,7463.0,7433.0,1984.0,1955.0,1984.0,1984.0,1955.0,1974.0,7445.0,7463.0,7463.0,7463.0,7463.0,7463.0,5775.0,5775.0,5775.0,5775.0,5775.0,5775.0,5775.0,5775.0,5775.0,5775.0,5775.0,5775.0,7463.0,7463.0,7463.0,7463.0,7463.0,4377.0,4377.0,4377.0,4377.0,4377.0,3974.0,3974.0,3974.0,1496.0,668.0,3974.0,3974.0,3974.0,3974.0,3974.0,3974.0,2016.0,2016.0,2016.0,2016.0,2016.0,2016.0,2959.0,2959.0,2959.0,2959.0,2959.0,2959.0,2959.0,2959.0,2959.0,2959.0,2959.0,2016.0,3974.0,3974.0,3974.0,3974.0,3974.0,2016.0,2016.0,2016.0,2016.0,2016.0,8378
unique,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,259,,241,68.0,115.0,,,,269,409.0,261.0,,,,367,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2
top,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Business,,UC Berkeley,1400.0,26908.0,,,,New York,0.0,55080.0,,,,Finance,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Male
freq,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,521,,107,403.0,241.0,,,,522,355.0,124.0,,,,202,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,4194
mean,283.675937,8.960248,0.500597,17.327166,1.828837,11.350919,16.872046,9.042731,9.295775,8.927668,8.963595,283.863767,0.164717,0.19601,0.395799,26.364999,2.756653,22.495347,17.396867,20.270759,17.459714,10.685375,11.84593,0.419551,6.190411,7.175256,7.369301,6.400599,6.778409,5.47487,6.134498,5.208251,1.960215,26.358928,,7.662488,,,,2.757186,3.784793,3.651645,,,,2.122063,5.006762,2.158091,,5.277791,6.425232,4.575491,6.245813,7.783829,6.985781,6.714544,5.737077,3.881191,5.745993,7.678515,5.304133,6.776118,7.919629,6.825401,7.851066,5.631281,4.339197,5.534131,5.570556,22.514632,17.396389,20.265613,17.457043,10.682539,11.845111,26.39436,11.071506,12.636308,15.566805,9.780089,11.014845,30.362192,13.273691,14.416891,18.42262,11.744499,11.854817,7.084733,8.294935,7.70446,8.403965,7.578388,6.941908,7.927232,8.284346,7.426213,7.617611,0.419909,6.189995,7.175164,7.368597,6.400598,6.777524,5.474559,6.134087,5.207523,0.948769,3.207814,20.791624,15.434255,17.243708,15.260869,11.144619,12.457925,7.21125,8.082,8.25775,7.6925,7.58925,5.71151,1.843495,2.338087,32.819556,13.529923,15.293851,18.868448,7.286957,12.156028,26.217194,15.865084,17.813755,17.654765,9.913436,12.760263,26.806234,11.929177,12.10303,15.16381,9.342511,11.320866,29.344369,13.89823,13.958265,17.967233,11.909735,12.887976,7.125285,7.931529,8.238912,7.602171,7.486802,6.827964,7.394106,7.838702,7.279415,7.332191,0.780825,0.981631,0.37695,1.230615,0.934132,24.384524,16.588583,19.411346,16.233415,10.898075,12.699142,31.330357,15.654266,16.679563,16.418155,7.823909,12.207837,25.610341,10.751267,11.524839,14.276783,9.207503,11.253802,24.970936,10.923285,11.952687,14.959108,9.526191,11.96627,7.240312,8.093357,8.388777,7.658782,7.391545,6.81002,7.615079,7.93254,7.155258,7.048611,
std,158.583367,5.491329,0.500029,10.940735,0.376673,5.995903,4.358458,5.514939,5.650199,5.477009,5.491068,158.584899,0.370947,0.303539,0.489051,3.563648,1.230689,12.569802,7.044003,6.782895,6.085526,6.126544,6.362746,0.493515,1.950305,1.740575,1.550501,1.954078,1.79408,2.156163,1.841258,2.129354,0.245925,3.566763,,3.758935,,,,1.230905,2.845708,2.805237,,,,1.407181,1.444531,1.105246,,3.30952,2.619024,2.801874,2.418858,1.754868,2.052232,2.263407,2.570207,2.620507,2.502218,2.006565,2.529135,2.235152,1.700927,2.156283,1.791827,2.608913,2.717612,1.734059,4.762569,12.587674,7.0467,6.783003,6.085239,6.124888,6.362154,16.297045,6.659233,6.717476,7.328256,6.998428,6.06015,16.249937,6.976775,6.263304,6.577929,6.886532,6.167314,1.395783,1.40746,1.564321,1.076608,1.778315,1.498653,1.627054,1.283657,1.779129,1.773094,0.493573,1.950169,1.740315,1.550453,1.953702,1.794055,2.156363,1.841285,2.129565,0.989889,2.444813,12.968524,6.915322,6.59642,5.356969,5.514028,5.921789,1.41545,1.455741,1.179317,1.626839,1.793136,1.820764,0.975662,0.63124,17.15527,7.977482,7.292868,8.535963,6.125187,8.241906,14.388694,6.658494,6.535894,6.129746,5.67555,6.651547,16.402836,6.401556,5.990607,7.290107,5.856329,6.296155,14.551171,6.17169,5.398621,6.100307,6.313281,5.615691,1.37139,1.503236,1.18028,1.5482,1.744634,1.411096,1.588145,1.280936,1.647478,1.521854,1.611694,1.382139,0.484683,1.294557,0.753902,13.71212,7.471537,6.124502,5.163777,5.900697,6.557041,17.55154,9.336288,7.880088,7.231325,6.100502,8.615985,17.477134,5.740351,6.004222,6.927869,6.385852,6.516178,17.007669,6.226283,7.01065,7.935509,6.403117,7.012067,1.576596,1.610309,1.459094,1.74467,1.961417,1.507341,1.504551,1.340868,1.672787,1.717988,
min,1.0,1.0,0.0,1.0,1.0,1.0,5.0,1.0,1.0,1.0,1.0,1.0,0.0,-0.83,0.0,18.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,18.0,,1.0,,,,1.0,0.0,1.0,,,,1.0,1.0,1.0,,1.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,5.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,2.0,2.0,3.0,2.0,2.0,1.0,3.0,2.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,0.0,0.0,1.0,0.0,0.0,3.0,1.0,4.0,3.0,2.0,1.0,1.0,1.0,10.0,0.0,0.0,0.0,0.0,0.0,5.0,0.0,0.0,0.0,0.0,0.0,6.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,2.0,4.0,1.0,2.0,2.0,2.0,2.0,2.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,5.0,0.0,0.0,0.0,0.0,0.0,2.0,2.0,3.0,2.0,1.0,2.0,2.0,4.0,1.0,1.0,
25%,154.0,4.0,0.0,8.0,2.0,7.0,14.0,4.0,4.0,4.0,4.0,154.0,0.0,-0.02,0.0,24.0,2.0,15.0,15.0,17.39,15.0,5.0,9.52,0.0,5.0,6.0,6.0,5.0,6.0,4.0,5.0,4.0,2.0,24.0,,5.0,,,,2.0,1.0,1.0,,,,1.0,4.0,1.0,,2.0,4.0,2.0,5.0,7.0,6.0,5.0,4.0,2.0,4.0,7.0,3.0,5.0,7.0,5.0,7.0,4.0,2.0,5.0,2.0,15.0,15.0,17.39,15.0,5.0,9.52,10.0,6.0,8.0,10.0,5.0,7.0,20.0,10.0,10.0,15.0,6.0,10.0,6.0,8.0,7.0,8.0,7.0,6.0,7.0,8.0,6.0,7.0,0.0,5.0,6.0,6.0,5.0,6.0,4.0,5.0,4.0,0.0,2.0,14.81,10.0,10.0,10.0,7.0,9.0,7.0,7.0,8.0,7.0,7.0,5.0,1.0,2.0,20.0,10.0,10.0,10.0,0.0,5.0,16.67,10.0,15.0,15.0,5.0,10.0,10.0,8.0,8.0,9.0,5.0,7.0,19.15,10.0,10.0,15.0,10.0,10.0,7.0,7.0,8.0,7.0,7.0,6.0,6.0,7.0,6.0,6.0,0.0,0.0,0.0,1.0,1.0,15.22,10.0,16.67,14.81,5.0,10.0,20.0,10.0,10.0,10.0,0.0,5.0,10.0,7.0,7.0,9.0,5.0,7.0,10.0,7.0,7.0,9.0,6.0,5.0,7.0,7.0,8.0,7.0,6.0,6.0,7.0,7.0,6.0,6.0,
50%,281.0,8.0,1.0,16.0,2.0,11.0,18.0,8.0,9.0,8.0,8.0,281.0,0.0,0.21,0.0,26.0,2.0,20.0,18.37,20.0,18.0,10.0,10.64,0.0,6.0,7.0,7.0,7.0,7.0,6.0,6.0,5.0,2.0,26.0,,8.0,,,,2.0,3.0,3.0,,,,2.0,5.0,2.0,,6.0,7.0,4.0,6.0,8.0,7.0,7.0,6.0,3.0,6.0,8.0,6.0,7.0,8.0,7.0,8.0,6.0,4.0,6.0,4.0,20.0,18.18,20.0,18.0,10.0,10.64,25.0,10.0,10.0,15.0,10.0,10.0,25.0,15.0,15.0,20.0,10.0,10.0,7.0,8.0,8.0,8.0,8.0,7.0,8.0,8.0,8.0,8.0,0.0,6.0,7.0,7.0,7.0,7.0,6.0,6.0,5.0,0.0,3.0,17.65,15.79,18.42,15.91,10.0,12.5,7.0,8.0,8.0,8.0,8.0,6.0,1.0,2.0,30.0,10.0,15.0,20.0,5.0,10.0,20.0,16.67,19.05,18.37,10.0,13.0,25.0,10.0,10.0,15.0,10.0,10.0,25.0,15.0,15.0,18.52,10.0,13.95,7.0,8.0,8.0,8.0,8.0,7.0,8.0,8.0,7.0,7.0,0.0,1.0,0.0,1.0,1.0,20.0,16.67,20.0,16.33,10.0,14.29,25.0,15.0,18.0,17.0,10.0,10.0,20.0,10.0,10.0,12.0,9.0,10.0,20.0,10.0,10.0,15.0,10.0,10.0,7.0,8.0,8.0,8.0,8.0,7.0,8.0,8.0,7.0,7.0,
75%,407.0,13.0,1.0,26.0,2.0,15.0,20.0,13.0,14.0,13.0,13.0,408.0,0.0,0.43,1.0,28.0,4.0,25.0,20.0,23.81,20.0,15.0,16.0,1.0,8.0,8.0,8.0,8.0,8.0,7.0,7.0,7.0,2.0,28.0,,10.0,,,,4.0,6.0,6.0,,,,2.0,6.0,3.0,,7.0,9.0,7.0,8.0,9.0,9.0,8.0,8.0,6.0,8.0,9.0,7.0,9.0,9.0,8.0,9.0,8.0,7.0,7.0,8.0,25.0,20.0,23.81,20.0,15.0,16.0,35.0,15.0,16.0,20.0,15.0,15.0,40.0,18.75,20.0,20.0,15.0,15.63,8.0,9.0,9.0,9.0,9.0,8.0,9.0,9.0,9.0,9.0,1.0,8.0,8.0,8.0,8.0,8.0,7.0,7.0,7.0,2.0,4.0,25.0,20.0,20.0,20.0,15.0,16.28,8.0,9.0,9.0,9.0,9.0,7.0,3.0,3.0,40.0,20.0,20.0,24.0,10.0,20.0,30.0,20.0,20.0,20.0,15.0,16.67,40.0,15.0,15.0,20.0,10.0,15.0,38.46,19.23,17.39,20.0,15.09,16.515,8.0,9.0,9.0,9.0,9.0,8.0,8.0,9.0,8.0,8.0,1.0,1.0,1.0,1.0,1.0,30.0,20.0,20.0,20.0,15.0,16.67,40.0,20.0,20.0,20.0,10.0,20.0,37.0,15.0,15.0,20.0,10.0,15.0,35.0,15.0,15.0,20.0,10.0,15.0,8.0,9.0,9.0,9.0,9.0,8.0,9.0,9.0,8.0,8.0,


## General numbers

In [3]:
# counts over the whole table (one row per meeting and per participant)
n_participants = Tinder['iid'].nunique()
n_meetings = Tinder.shape[0]
n_yes = (Tinder['dec'] == 1).sum()
n_matches = (Tinder['match'] == 1).sum()
n_samerace = (Tinder['samerace'] == 1).sum()
n_date3 = (Tinder['date_3'] == 1).sum()

# double quotes inside single-quoted f-strings keep this valid before Python 3.12
print(f"number of participants in the study: {n_participants}")
print(f"number of different waves: {Tinder['wave'].nunique()}")
print(f"maximum number of partners met during the waves: {Tinder['order'].nunique()}")
print(f"number of decisions to continue: {n_yes}, i.e. a mean of {round(n_yes / n_participants, 1)} per participant")
print(f"number of matches: {n_matches}, i.e. a mean of {round(n_matches / n_participants, 1)} matches per participant")
print(f"number of same-race meetings: {n_samerace}, i.e. {round(n_samerace / n_meetings * 100, 1)} percent of the meetings")
print(f"number of dates that occurred after the meetings: {n_date3}, i.e. {round(n_date3 / n_meetings * 100, 1)} percent of the meetings")

number of participants in the study: 551
number of different waves: 21
maximum number of partners met during the waves: 22
number of decisions to continue: 3518, i.e. a mean of 6.4 per participant
number of matches: 1380, i.e. a mean of 2.5 matches per participant
number of same-race meetings: 3316, i.e. 39.6 percent of the meetings
number of dates that occurred after the meetings: 1498, i.e. 17.9 percent of the meetings
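The last figure counts rows, not people or dates. This minimal sketch assumes that `date_3` is a follow-up answer given once per participant (1 = went on a date with at least one match) and copied onto every row of that participant. Under that assumption, `date_3` should be reduced to one value per `iid` before counting. The helper name and toy data are hypothetical.

```python
import pandas as pd

def participants_reporting_a_date(df: pd.DataFrame) -> int:
    """Number of participants whose (assumed per-participant) 'date_3' answer is 1."""
    # first() keeps the first non-missing value of each participant
    per_participant = df.groupby("iid")["date_3"].first()
    return int((per_participant == 1).sum())

# Hypothetical toy data: participant 1 answered yes, 2 answered no, 3 did not answer.
toy = pd.DataFrame({
    "iid":    [1, 1, 2, 2, 3],
    "date_3": [1, 1, 0, 0, None],
})
print("rows with date_3 == 1:", int((toy["date_3"] == 1).sum()))  # counts participant 1 twice
print("participants who reported a date:", participants_reporting_a_date(toy))
```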


## Gender distribution

In [5]:
fig = px.pie(Tinder.groupby('iid')['gender_name'].first(), names = 'gender_name', color='gender_name',
             color_discrete_map = {'Female':'deeppink','Male':'deepskyblue'},
             title='Percentage of males and females in the data')
fig.update_layout(width=600, legend_title_text='Gender', title_x=0.5)
fig.show()

fig = px.histogram(Tinder, x='match', color='gender_name', color_discrete_map = {'Female':'deeppink','Male':'deepskyblue'},
                   text_auto='f', title='Effect of the gender on the probability of matching'
                   )
fig.update_layout(width=800, legend_title_text='Gender', title_x=0.5)
fig.show()

fig = px.histogram(Tinder, x='dec_o', color='gender_name', color_discrete_map = {'Female':'deeppink','Male':'deepskyblue'},
                   text_auto='f', title="Effect of the gender on the probability of the partner's positive decision"
                   )
fig.update_layout(width=800, legend_title_text='Gender', title_x=0.5)
fig.show()

Conclusion: the male/female split is well balanced. Both genders get the same number of matches, which is expected because a match is mutual and each date is recorded once for each participant. However, women receive a positive decision from their partner more often than men do.
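A chi-squared test on the gender × `dec_o` contingency table can check whether the gap in positive decisions between women and men is larger than chance. Below is a minimal sketch using `scipy.stats.chi2_contingency` on hypothetical toy data. On the real data, `Tinder` could be passed directly. Keep in mind that rows are meetings, not independent people, so the resulting p-value would be optimistic.

```python
import pandas as pd
from scipy.stats import chi2_contingency

def yes_rate_by_gender(df: pd.DataFrame):
    """Partner 'yes' rate per gender and the chi-squared p-value for a difference."""
    table = pd.crosstab(df["gender_name"], df["dec_o"])  # counts of the partner's 0/1 decision
    chi2, pvalue, dof, expected = chi2_contingency(table)
    rates = df.groupby("gender_name")["dec_o"].mean()
    return rates, pvalue

# Hypothetical toy data: 60% 'yes' for 100 women, 30% for 100 men.
toy = pd.DataFrame({
    "gender_name": ["Female"] * 100 + ["Male"] * 100,
    "dec_o": [1] * 60 + [0] * 40 + [1] * 30 + [0] * 70,
})
rates, pvalue = yes_rate_by_gender(toy)
print(rates)
print(f"p-value: {pvalue:.2g}")
```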

## Age distribution

In [105]:
# mean age per gender, computed over the rows (i.e. over meetings)
mean_age_female = Tinder.loc[Tinder['gender_name'] == 'Female', 'age'].mean()
mean_age_male = Tinder.loc[Tinder['gender_name'] == 'Male', 'age'].mean()

fig = px.histogram(Tinder, 'age', color='gender_name', color_discrete_map = {'Female':'deeppink','Male':'deepskyblue'},
             barmode="overlay", title='Distribution of ages per gender (means in dashed lines)')
fig.add_vline(x=mean_age_female, line_dash="dash", line_color="deeppink")
fig.add_vline(x=mean_age_male, line_dash="dash", line_color="deepskyblue")
fig.update_layout(legend_title_text='Gender')
fig.show()

print(f"mean age of women: {mean_age_female}, mean age of men: {mean_age_male}")

fig = px.box(Tinder, 'age', color='gender_name', color_discrete_map = {'Female':'deeppink','Male':'deepskyblue'},
       notched=True, title="Box plots of the distribution of ages per gender")
fig.update_layout(legend_title_text='Gender')
fig.show()

mean age of women: 26.105850934692885, mean age of men: 26.609269932756966


Conclusion: both distributions look roughly normal, with an over-representation of 27-year-old men and of 33-35-year-old women. On average, the men were about half a year older than the women (26.6 vs 26.1 years, averaged over meetings rather than over participants).
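The means above are computed over meetings, so a participant who met more partners weighs more. Here is a minimal sketch of the per-participant version, reusing the `ranksums` test already imported in the first cell. The helper name and toy data are hypothetical, and `Tinder` could be passed in place of the toy data.

```python
import pandas as pd
from scipy.stats import ranksums

def age_by_gender(df: pd.DataFrame):
    """Mean age per gender with one row per participant, plus a Wilcoxon rank-sum p-value."""
    people = df.groupby("iid")[["age", "gender_name"]].first()
    means = people.groupby("gender_name")["age"].mean()
    women = people.loc[people["gender_name"] == "Female", "age"].dropna()
    men = people.loc[people["gender_name"] == "Male", "age"].dropna()
    return means, ranksums(women, men).pvalue

# Hypothetical toy data: participant 1 met three partners, so she appears on three rows.
toy = pd.DataFrame({
    "iid":         [1, 1, 1, 2, 3, 4, 4],
    "age":         [25, 25, 25, 27, 30, 26, 26],
    "gender_name": ["Female"] * 4 + ["Male"] * 3,
})
means, pvalue = age_by_gender(toy)
print(means)                                      # per participant: Female 26.0, Male 28.0
print(toy.groupby("gender_name")["age"].mean())   # per row: Female 25.5, Male about 27.33
```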

# Question: Can people accurately predict their own perceived value in the dating market?

## Choice of variables to study

Variables of interest for answering this question, according to the file 'Speed+Dating+Data+Key.doc':
- before the meetings:
    - how many partners (out of 20) the participant expects will want to see them again: 'expnum'
    - how they see themselves: 'attr3_1' Attractive, 'sinc3_1' Sincere, 'intel3_1' Intelligent, 'fun3_1' Fun, 'amb3_1' Ambitious
    - how they think others perceive them: 'attr5_1' Attractive, 'sinc5_1' Sincere, 'intel5_1' Intelligent, 'fun5_1' Fun, 'amb5_1' Ambitious
- during the meetings:
    - how they rated their partners (the same columns, read in the partners' rows, give how the participant was rated): 'attr', 'sinc', 'intel', 'fun', 'amb'
    - how likely (out of 10) they think it is that the partner will want to continue: 'prob'
- halfway through the speed dating:
    - how they see themselves: 'attr3_s' Attractive, 'sinc3_s' Sincere, 'intel3_s' Intelligent, 'fun3_s' Fun, 'amb3_s' Ambitious
- at the end of the speed dating:
    - how they see themselves: 'attr3_2' Attractive, 'sinc3_2' Sincere, 'intel3_2' Intelligent, 'fun3_2' Fun, 'amb3_2' Ambitious
    - how they think others perceive them: 'attr5_2' Attractive, 'sinc5_2' Sincere, 'intel5_2' Intelligent, 'fun5_2' Fun, 'amb5_2' Ambitious
- 3-4 weeks after the speed dating:
    - how they see themselves: 'attr3_3' Attractive, 'sinc3_3' Sincere, 'intel3_3' Intelligent, 'fun3_3' Fun, 'amb3_3' Ambitious
    - how they think others perceive them: 'attr5_3' Attractive, 'sinc5_3' Sincere, 'intel5_3' Intelligent, 'fun5_3' Fun, 'amb5_3' Ambitious
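In the CSV header, the intelligence columns start with `intel` (for example `intel3_1`, not `int3_1`). Building the names from a single list of prefixes avoids typos of that kind. Below is a small sketch with hypothetical helper names. On the real data, `missing_columns(Tinder.columns, rating_columns('3_1'))` would be expected to return an empty list.

```python
ATTRIBUTES = ["attr", "sinc", "intel", "fun", "amb"]

def rating_columns(suffix: str) -> list[str]:
    """Column names for the five attributes at one survey time, e.g. suffix '3_1'."""
    return [f"{attribute}{suffix}" for attribute in ATTRIBUTES]

def missing_columns(columns, expected):
    """Expected column names that are absent from `columns`."""
    present = set(columns)
    return [name for name in expected if name not in present]

print(rating_columns("3_1"))  # ['attr3_1', 'sinc3_1', 'intel3_1', 'fun3_1', 'amb3_1']
print(missing_columns(["attr3_1", "sinc3_1", "int3_1"], rating_columns("3_1")))
```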

In [8]:
# choose the variables to study according to how complete they are (share of missing values)

col_of_interest = ['expnum', 'prob', 'attr', 'sinc', 'intel', 'fun', 'amb', 'attr_o', 'sinc_o', 'intel_o', 'fun_o', 'amb_o',
                   'attr5_1', 'sinc5_1', 'intel5_1', 'fun5_1', 'amb5_1', 'attr5_2', 'sinc5_2', 'intel5_2', 'fun5_2', 'amb5_2', 
                   'attr5_3', 'sinc5_3', 'intel5_3', 'fun5_3', 'amb5_3', 'attr3_1', 'sinc3_1', 'intel3_1', 'fun3_1', 'amb3_1', 
                   'attr3_s', 'sinc3_s', 'intel3_s', 'fun3_s', 'amb3_s', 'attr3_2', 'sinc3_2', 'intel3_2', 'fun3_2', 'amb3_2', 
                   'attr3_3', 'sinc3_3', 'intel3_3', 'fun3_3', 'amb3_3']

# percentage of missing values in each column of interest, sorted in increasing order
df_missing = (Tinder[col_of_interest].isnull().mean() * 100).rename_axis('column').reset_index(name='perc_missing')
df_missing = df_missing.sort_values('perc_missing').reset_index(drop=True)

fig = px.bar(df_missing, x='column', y='perc_missing',
        labels={
            'column': 'Feature of interest',
            'perc_missing': 'Missing values (%)'
        },
        text_auto='.2f', title='Percentage of missing values in each column of interest')

fig.update_xaxes(tickangle=45)
fig.update_layout(yaxis_range=[0,100])
fig.show()

Conclusions:
- the variables filled in at the start of the process have few missing values, so we will work on:
    - how participants see themselves (*3_1 and *3_2)
    - how their partners rated them ('attr_o', 'sinc_o', 'intel_o', 'fun_o', 'amb_o')
- we could also look at how participants rated their partners ('attr', 'sinc', 'intel', 'fun', 'amb'), but the conclusions would a priori be less interesting
- the 'prob' variable is well filled in, so it can be compared with the decision actually made by the partner ('dec_o')
- participants clearly find it hard to say how they think others perceive them (*5_\*); these variables could be studied in a second step, keeping in mind that they have more than 40% missing values
- the 'expnum' variable is very sparsely filled in, so it will not be studied
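One way to compare `prob` with the partner's actual decision `dec_o` is a calibration table, giving the observed "yes" rate for each predicted level. A Pearson correlation complements it; with a binary variable this is the point-biserial correlation. Here is a minimal sketch on hypothetical toy data, where `Tinder` could be passed instead.

```python
import pandas as pd
from scipy.stats import pearsonr

def prob_vs_partner_decision(df: pd.DataFrame):
    """Observed partner 'yes' rate per predicted level, and the Pearson r between them."""
    sub = df[["prob", "dec_o"]].dropna()
    calibration = sub.groupby("prob")["dec_o"].mean()
    r, pvalue = pearsonr(sub["prob"], sub["dec_o"])
    return calibration, r, pvalue

# Hypothetical toy data: low predictions never get a 'yes', high ones mostly do.
toy = pd.DataFrame({
    "prob":  [2, 2, 2, 8, 8, 8],
    "dec_o": [0, 0, 0, 1, 1, 0],
})
calibration, r, pvalue = prob_vs_partner_decision(toy)
print(calibration)
print(f"r = {r:.2f}")
```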

## Relationship between self-perception and the ratings actually received

In [9]:
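# one row per participant: mean ratings given ('attr', ...) and received ('attr_o', ...),
# plus the self-ratings and gender, which are the same on every row of a participant
df_perception = Tinder.groupby('iid').agg({'attr':'mean', 'sinc':'mean', 'intel':'mean', 'fun':'mean', 'amb':'mean',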
                           'attr_o':'mean', 'sinc_o':'mean', 'intel_o':'mean', 'fun_o':'mean', 'amb_o':'mean',
                           'attr3_1':'first', 'sinc3_1':'first', 'intel3_1':'first', 'fun3_1':'first', 'amb3_1':'first', 
                           'attr3_2':'first', 'sinc3_2':'first', 'intel3_2':'first', 'fun3_2':'first', 'amb3_2':'first',
                           'gender':'first', 'gender_name':'first'})
df_perception.head()

iid,attr,sinc,intel,fun,amb,attr_o,sinc_o,intel_o,fun_o,amb_o,attr3_1,sinc3_1,intel3_1,fun3_1,amb3_1,attr3_2,sinc3_2,intel3_2,fun3_2,amb3_2,gender,gender_name
1,5.7,7.3,7.3,6.8,6.3,6.7,7.4,8.0,7.2,8.0,6.0,8.0,8.0,8.0,7.0,6.0,7.0,8.0,7.0,6.0,0,Female
2,6.4,7.0,7.7,6.1,6.5,7.7,7.1,7.9,7.5,7.5,7.0,5.0,8.0,10.0,3.0,7.0,6.0,8.0,9.0,4.0,0,Female
3,8.1,8.6,9.4,7.7,8.8,6.5,7.1,7.3,6.2,7.111111,8.0,9.0,9.0,8.0,8.0,,,,,,0,Female
4,6.4,8.9,8.6,7.8,7.8,7.0,7.1,7.7,7.5,7.7,7.0,8.0,7.0,9.0,8.0,6.0,8.0,7.0,8.0,6.0,0,Female
5,6.3,6.0,7.0,6.0,5.6,5.3,7.7,7.6,7.2,7.8,6.0,3.0,10.0,6.0,8.0,6.0,6.0,9.0,9.0,9.0,0,Female


Question: do self-ratings change over time, between before the speed dating (*3_1) and the end of it (*3_2)?

In [10]:
features = ['attr', 'sinc', 'intel', 'fun', 'amb']
names = ['Attractive', 'Sincere', 'Intelligent', 'Fun', 'Ambitious']

# one subplot per attribute: self-rating before (*3_1) vs at the end (*3_2) of the speed dating
fig = make_subplots(rows = 1, cols = 5)
for i in range(5):
    fig.add_trace(
        go.Box(
            y = df_perception[features[i] + "3_1"], name = names[i] + "_before",
            notched=True, showlegend=False),
            row = 1,
            col = i+1)
    fig.add_trace(
        go.Box(
            y = df_perception[features[i] + "3_2"], name = names[i] + "_end",
            notched=True, showlegend=False),
            row = 1,
            col = i+1)
fig.update_layout(title_text="Comparison of the participants' self-ratings before and at the end of the speed dating")
fig.show()

Conclusion: self-ratings barely change between the two time points, except for sincerity, which decreases (perhaps because of the particular context of speed dating).

-> only the *3_1 variables are used in the rest of the study
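The drop in sincerity is read from notched box plots. A paired test on each participant's before/after difference checks it more directly. Below is a minimal sketch with `scipy.stats.wilcoxon`, assuming a per-participant table such as `df_perception` that holds both the `*3_1` and `*3_2` columns. The helper name and toy data are hypothetical. The test's default `zero_method='wilcox'` discards zero differences, so it is not meaningful when every difference is zero.

```python
import pandas as pd
from scipy.stats import wilcoxon

def self_rating_shift(per_participant: pd.DataFrame, attribute: str):
    """Median change of a self-rating between *3_1 and *3_2, and the Wilcoxon signed-rank p-value."""
    before, after = f"{attribute}3_1", f"{attribute}3_2"
    paired = per_participant[[before, after]].dropna()  # keep participants who answered both times
    diff = paired[after] - paired[before]
    statistic, pvalue = wilcoxon(diff)
    return float(diff.median()), pvalue

# Hypothetical toy data: eight participants who all rate their sincerity lower the second time.
toy = pd.DataFrame({
    "sinc3_1": [9, 9, 9, 9, 9, 9, 9, 9],
    "sinc3_2": [8, 7, 6, 5, 4, 3, 2, 1],
})
median_change, pvalue = self_rating_shift(toy, "sinc")
print(median_change, pvalue)
```

On the real data, `self_rating_shift(df_perception, 'sinc')` would be the call to try.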

Question: for each of the five attributes of the study (Attractive, Sincere, Intelligent, Fun, Ambitious), is there a correlation between the self-rating a participant gives before the speed dating and the ratings actually given by their partners?

In [68]:
fig = make_subplots(rows = 5, cols = 1)
for i in range(5):
    fig.add_trace(
        go.Box(
            x = df_perception[str(features[i])+"3_1"], 
            y = df_perception[str(features[i])+"_o"],
            notched=True,
            name = names[i]
            ),
            row = i+1,
            col = 1)
fig.update_xaxes(range=[1, 11], title="participants' self-rating")
fig.update_yaxes(range=[0, 11], title="partners' mean rating")
fig.update_layout(width=1000, height=1000, title_x=0.5,
                  title_text="Comparison of the participants' self-ratings with their partners' mean ratings")
fig.show()

Conclusion: for some attributes (notably Attractive and Fun), the partners' mean rating seems to increase with the self-rating. This suggests that people do, to some extent, predict their 'value' on the speed-dating 'market'.

Let's check whether this is significant with a Pearson correlation test for each attribute.

In [13]:
for feature, name in zip(features, names):
    # drop the rows where either the self-rating or the partner's rating is missing
    Tinder_woNan = Tinder[[feature + "3_1", feature + "_o"]].dropna()
    result = pearsonr(Tinder_woNan[feature + "3_1"], Tinder_woNan[feature + "_o"], alternative='two-sided')
    print(f"Pearson test for the attribute {name}, both genders:\n    {result}")

Pearson test for the attribute Attractive, both genders:
    PearsonRResult(statistic=0.17532352105301927, pvalue=1.030305914129389e-56)
Pearson test for the attribute Sincere, both genders:
    PearsonRResult(statistic=0.0009773656751386247, pvalue=0.9303629411343852)
Pearson test for the attribute Intelligent, both genders:
    PearsonRResult(statistic=0.02253756349189459, pvalue=0.0441159411834098)
Pearson test for the attribute Fun, both genders:
    PearsonRResult(statistic=0.14596762605872252, pvalue=5.520575611271679e-39)
Pearson test for the attribute Ambitious, both genders:
    PearsonRResult(statistic=0.056185865685096426, pvalue=1.0120179382673853e-06)


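Conclusion: the correlation is indeed significant for Attractive and Fun, but also for Ambitious and (barely) Intelligent. The correlations remain weak, though: r ≈ 0.18 for Attractive, 0.15 for Fun, and 0.06 or less for the others. Moreover, the tests are run on meetings rather than on participants, so each self-rating is counted once per date. This likely makes the p-values look stronger than they are.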
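Here is a sketch of the per-participant version of these tests. It reuses `df_perception`, which has one row per participant (self-rating before the speed dating vs mean rating received), so each person counts once. The function name is hypothetical. On the real data, it would be called as `per_participant_correlations(df_perception, features)`. The toy data below only illustrates the output shape.

```python
import pandas as pd
from scipy.stats import pearsonr

def per_participant_correlations(per_participant: pd.DataFrame, attributes) -> pd.DataFrame:
    """Pearson r between each self-rating (*3_1) and the mean rating received (*_o)."""
    rows = []
    for attribute in attributes:
        x_col, y_col = f"{attribute}3_1", f"{attribute}_o"
        pair = per_participant[[x_col, y_col]].dropna()
        r, pvalue = pearsonr(pair[x_col], pair[y_col])
        rows.append({"attribute": attribute, "n": len(pair), "r": r, "pvalue": pvalue})
    return pd.DataFrame(rows)

# Hypothetical toy data for five participants.
toy = pd.DataFrame({
    "attr3_1": [5, 6, 7, 8, 9], "attr_o": [4.0, 5.0, 6.0, 7.0, 8.0],  # perfectly aligned
    "sinc3_1": [5, 6, 7, 8, 9], "sinc_o": [7.0, 6.0, 7.0, 6.0, 7.0],  # unrelated
})
print(per_participant_correlations(toy, ["attr", "sinc"]))
```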

Question: is the relationship stronger for one gender than for the other?
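Comparing the male and female coefficients computed below by eye does not say whether they differ significantly. A standard option for two independent correlations is the Fisher z-transformation test, sketched here with NumPy and `scipy.stats.norm`. The test assumes independent observations. So n should be the number of participants per gender, for example from a per-participant table like `df_perception`, not the number of meetings. The function name is hypothetical.

```python
import numpy as np
from scipy.stats import norm

def compare_correlations(r1: float, n1: int, r2: float, n2: int):
    """Two-sided Fisher z-test of H0: rho1 == rho2 for two independent samples."""
    z1, z2 = np.arctanh(r1), np.arctanh(r2)        # Fisher z-transform of each r
    se = np.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))  # standard error of z1 - z2
    z = (z1 - z2) / se
    pvalue = 2 * norm.sf(abs(z))                   # two-sided tail probability
    return float(z), float(pvalue)

# Illustrative values only (not results from this dataset).
print(compare_correlations(0.3, 100, 0.3, 100))  # identical correlations
print(compare_correlations(0.6, 200, 0.1, 200))
```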

In [69]:
fig = make_subplots(rows = 3, cols = 2, subplot_titles=(names))
legend_traces = {}
for i in range(5):
    scatter = px.scatter(df_perception, x=str(features[i])+"3_1", y=str(features[i])+"_o", 
                         color='gender_name', color_discrete_map = {'Female':'deeppink','Male':'deepskyblue'},
                         opacity=0.6, trendline="ols")
    for trace in scatter.data:
        if trace.name not in legend_traces:
            legend_traces[trace.name] = trace
            fig.add_trace(trace, row=(i//2)+1, col=(i%2)+1)
        else:
            fig.add_trace(trace.update(showlegend=False), row=(i//2)+1, col=(i%2)+1)
fig.update_xaxes(title_text="participants' self-evaluation")
fig.update_yaxes(title_text="partners' mean")
fig.update_layout(width=1000, height=700, yaxis_range=[0,11], yaxis2_range=[0,11], title_x=0.5,
                  title_text="Comparison of the participants' self-evaluation with their partners' mean")
fig.show()

In [60]:
# Pearson correlation tests for each attribute and each gender

for feature, name in zip(features[:5], names[:5]):
    self_col, partner_col = f"{feature}3_1", f"{feature}_o"
    Tinder_woNan = Tinder[[self_col, partner_col, 'gender_name']].dropna()
    for sexe in ['Male', 'Female']:
        subset = Tinder_woNan.loc[Tinder_woNan['gender_name'] == sexe]  # participants of this gender only
        result = pearsonr(subset[self_col], subset[partner_col], alternative='two-sided')
        print(f"Pearson test for attribute {name}, gender {sexe}:\n    {result}")


Pearson test for attribute Attractive, gender Male:
    PearsonRResult(statistic=0.20552546793014617, pvalue=7.485068684951476e-40)
Pearson test for attribute Attractive, gender Female:
    PearsonRResult(statistic=0.11538418046526877, pvalue=2.1386813236942733e-13)
Pearson test for attribute Sincere, gender Male:
    PearsonRResult(statistic=-0.021379406239046574, pvalue=0.17533413734699888)
Pearson test for attribute Sincere, gender Female:
    PearsonRResult(statistic=0.017518300258721264, pvalue=0.26937539596041643)
Pearson test for attribute Intelligent, gender Male:
    PearsonRResult(statistic=0.02475378916229479, pvalue=0.11741672565478274)
Pearson test for attribute Intelligent, gender Female:
    PearsonRResult(statistic=0.011472562789788235, pvalue=0.4695540239971435)
Pearson test for attribute Fun, gender Male:
    PearsonRResult(statistic=0.201370911479087

Conclusion: 
The (attribute, gender) pairs for which participants do predict their 'value' on the speed dating 'market' are:
- Attractive for both genders (but much stronger for men)
- Fun for both genders (but much stronger for men)
- Ambitious for men
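The phrase "much stronger for men" compares the two coefficients by eye. One way to check it formally is Fisher's z-test for two independent correlations. The sketch below applies it to Attractive. It assumes the male and female samples are independent and, like the Pearson tests above, treats each row as an independent observation. The sample sizes are hypothetical placeholders of roughly half the 8378 dates each. The actual numbers of non-missing pairs per gender should replace them.

```python
import math

from scipy.stats import norm


def compare_correlations(r1, n1, r2, n2):
    """Two-sided Fisher z-test for the difference between two independent Pearson correlations."""
    z1, z2 = math.atanh(r1), math.atanh(r2)      # Fisher transform of each coefficient
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))  # standard error of z1 - z2
    z = (z1 - z2) / se
    p = 2 * norm.sf(abs(z))                      # two-sided p-value from the standard normal
    return z, p


# Attractive coefficients from the per-gender tests above (men, then women);
# n = 4000 per gender is a HYPOTHETICAL placeholder, not the real count
z, p = compare_correlations(0.2055, 4000, 0.1154, 4000)
print(f"z = {z:.2f}, p = {p:.2g}")
```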

We therefore 'lost' some attributes that were significant with both genders pooled: 
- Ambitious for women: given the gap between the male and female p-values, and the increase of the male p-value relative to the pooled one, we can infer that the male sample alone gave the pooled test its significance
- Intelligent for both genders: it was only marginally significant (p ≈ 0.044), so splitting the data into two smaller samples reduced the power of each test
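The power argument for Intelligent can be illustrated with a short sketch. For a fixed coefficient, the two-sided p-value of the Pearson test depends only on $r$ and $n$, through $t = r\sqrt{(n-2)/(1-r^2)}$ with $n-2$ degrees of freedom. The sample sizes below are hypothetical round numbers standing in for "pooled" and "one gender", not the exact counts from the dataset.

```python
import math

from scipy.stats import t


def pearson_p_value(r, n):
    """Two-sided p-value for H0: rho = 0, via the t statistic with n - 2 degrees of freedom."""
    t_stat = r * math.sqrt((n - 2) / (1 - r ** 2))
    return 2 * t.sf(abs(t_stat), df=n - 2)


r = 0.0225  # close to the pooled Intelligent coefficient reported above
for n in (8000, 4000):  # HYPOTHETICAL sizes: pooled sample vs. roughly one gender
    print(f"n = {n}: p = {pearson_p_value(r, n):.3f}")
```

With the same $r$, halving $n$ shrinks $t$ by roughly $\sqrt{2}$, which is enough to push a borderline result above the 5% threshold.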

## Relationship between the perceived chance that the partner will want a second date and the partner's actual decision

Question: do participants correctly perceive what their partner felt?

In [77]:
# Are there enough partners answering 'yes' to run a statistical study?

print("Number of partners who do or do not want a second date (dec_o: 1 = yes, 0 = no):\n", Tinder['dec_o'].value_counts())

print("Binomial test: is the difference between the numbers of 'no' and 'yes' significant?\n", 
      binomtest(Tinder['dec_o'].value_counts()[1], Tinder.shape[0], p=0.5, alternative='two-sided'))

# As expected, more partners decline a second date than accept one.
# The difference is not very large, however (about 42% 'yes'), even though it is significant.

Number of partners who do or do not want a second date (dec_o: 1 = yes, 0 = no):
 dec_o
0    4863
1    3515
Name: count, dtype: int64
Binomial test: is the difference between the numbers of 'no' and 'yes' significant?
 BinomTestResult(k=3515, n=8378, alternative='two-sided', statistic=0.41955120553831465, pvalue=3.164020331927398e-49)
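A confidence interval for the share of 'yes' answers puts a number on "not very large". This sketch uses SciPy's `BinomTestResult.proportion_ci` and reuses the counts printed above (3515 'yes' out of 8378 dates).

```python
from scipy.stats import binomtest

# Counts reported above: 3515 partners said yes (dec_o == 1) out of 8378 dates
result = binomtest(3515, 8378, p=0.5, alternative='two-sided')
ci = result.proportion_ci(confidence_level=0.95)  # exact (Clopper-Pearson) interval by default
print(f"share of 'yes': {result.statistic:.3f}, 95% CI: [{ci.low:.3f}, {ci.high:.3f}]")
```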


In [71]:
fig = go.Figure()
fig.add_trace(go.Histogram(histfunc="count", x=Tinder.loc[Tinder['dec_o']==1,'prob'], name="Partner_Yes"))
fig.add_trace(go.Histogram(histfunc="count", x=Tinder.loc[Tinder['dec_o']==0,'prob'], name="Partner_No"))
fig.update_xaxes(title_text="participants' idea of probability that their partner will say yes")
fig.update_yaxes(title_text="count")
fig.update_layout(title_text="Histogram of the participants' idea of their partner's decision", title_x=0.5)
fig.show()

In [74]:
# Statistical test to check whether the red curve (Partner_No) lies significantly further left than the blue one (Partner_Yes)
# (run on the integer values of 'prob' only, because the *.5 values have too few observations for statistical tests)

dico_test = {}
for i in range(11):
    dico_test['prob_'+str(i)] = Tinder.loc[Tinder['prob']==i,:].groupby('dec_o')['prob'].count()
df_test = pd.DataFrame(dico_test)
display(df_test)

print("Wilcoxon rank-sum test comparing the two groups:\n", 
      ranksums(df_test.iloc[0,:], df_test.iloc[1,:], alternative='greater')) 

| dec_o | prob_0 | prob_1 | prob_2 | prob_3 | prob_4 | prob_5 | prob_6 | prob_7 | prob_8 | prob_9 | prob_10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 34 | 283 | 338 | 477 | 583 | 1059 | 818 | 579 | 295 | 86 | 61 |
| 1 | 15 | 132 | 201 | 231 | 349 | 740 | 577 | 551 | 357 | 155 | 127 |


Wilcoxon rank-sum test comparing the two groups:
 RanksumsResult(statistic=0.6894826131275873, pvalue=0.2452598055207555)


Conclusion: contrary to what one might have expected, the red curve (partner said no) does not lie significantly further left than the blue one (partner said yes). Participants therefore do not manage to predict their partner's decision well.
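One point is worth checking before accepting this result. `ranksums` above receives the 11 per-value counts of each group, so each sample contains 11 bin counts. The test therefore compares count vectors, not the two distributions of `prob`. The sketch below applies the rank-sum test to the individual ratings instead. It rebuilds those ratings from the integer-valued counts in the table above, so ratings ending in .5 are left out.

```python
import numpy as np
from scipy.stats import ranksums

# Number of dates with each integer 'prob' rating (0..10), copied from the table above
counts_no = [34, 283, 338, 477, 583, 1059, 818, 579, 295, 86, 61]    # dec_o == 0
counts_yes = [15, 132, 201, 231, 349, 740, 577, 551, 357, 155, 127]  # dec_o == 1

ratings = np.arange(11)
prob_no = np.repeat(ratings, counts_no)    # one entry per date whose partner said no
prob_yes = np.repeat(ratings, counts_yes)  # one entry per date whose partner said yes

# H1: ratings given when the partner said yes tend to be higher than when the partner said no
result = ranksums(prob_yes, prob_no, alternative='greater')
print(result)
```

On the full data, the same call could take the raw series directly: `ranksums(Tinder.loc[Tinder['dec_o']==1, 'prob'].dropna(), Tinder.loc[Tinder['dec_o']==0, 'prob'].dropna(), alternative='greater')`. That version also keeps the *.5 ratings.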

Question: can either gender make this prediction?

In [92]:
print("Number of partners who do or do not want a second date, male participants:\n", 
      Tinder.loc[Tinder['gender_name']=='Male', 'dec_o'].value_counts())
print("Binomial test, male participants:\n", 
      binomtest(Tinder.loc[Tinder['gender_name']=='Male', 'dec_o'].value_counts()[1], 
                Tinder.loc[Tinder['gender_name']=='Male', :].shape[0], p=0.5, alternative='two-sided'))
print('')
print("Number of partners who do or do not want a second date, female participants:\n", 
      Tinder.loc[Tinder['gender_name']=='Female', 'dec_o'].value_counts())
print("Binomial test, female participants:\n", 
      binomtest(Tinder.loc[Tinder['gender_name']=='Female', 'dec_o'].value_counts()[1], 
                Tinder.loc[Tinder['gender_name']=='Female', :].shape[0], p=0.5, alternative='two-sided'))

Number of partners who do or do not want a second date, male participants:
 dec_o
0    2665
1    1529
Name: count, dtype: int64
Binomial test, male participants:
 BinomTestResult(k=1529, n=4194, alternative='two-sided', statistic=0.36456843109203624, pvalue=1.3157407067381332e-69)

Number of partners who do or do not want a second date, female participants:
 dec_o
0    2198
1    1986
Name: count, dtype: int64
Binomial test, female participants:
 BinomTestResult(k=1986, n=4184, alternative='two-sided', statistic=0.47466539196940727, pvalue=0.0011037694431248893)
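The two binomial tests compare each gender's 'yes' rate with 50%, not with each other. A chi-square test of independence on the 2×2 table of counts printed above compares the two rates directly. The sketch below runs that test. Because `dec_o` is the partner's decision, the "male participants" row holds decisions made by their female partners, and vice versa.

```python
from scipy.stats import chi2_contingency

# Partners' decisions (dec_o) by participant gender, counts from the outputs above
#           no     yes
table = [[2665, 1529],   # male participants
         [2198, 1986]]   # female participants

# For a 2x2 table, chi2_contingency applies Yates' continuity correction by default
chi2, p, dof, expected = chi2_contingency(table)
print(f"'yes' rate, male participants:   {1529 / 4194:.3f}")
print(f"'yes' rate, female participants: {1986 / 4184:.3f}")
print(f"chi2 = {chi2:.1f}, dof = {dof}, p = {p:.2g}")
```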


In [86]:
fig = make_subplots(rows = 2, cols = 1)
fig.add_trace(go.Histogram(histfunc="count", 
                           x=Tinder.loc[(Tinder['dec_o']==1) & (Tinder['gender_name']=='Male'),'prob'], 
                           marker={'color': 'deepskyblue'}, name="Males_Yes"), 
              row=1, col=1)
fig.add_trace(go.Histogram(histfunc="count", 
                           x=Tinder.loc[(Tinder['dec_o']==0) & (Tinder['gender_name']=='Male'),'prob'], 
                           marker={'color': 'blue'}, name="Males_No"), 
              row=1, col=1)
fig.add_trace(go.Histogram(histfunc="count", 
                           x=Tinder.loc[(Tinder['dec_o']==1) & (Tinder['gender_name']=='Female'),'prob'], 
                           marker={'color': 'pink'}, name="Females_Yes"), 
              row=2, col=1)
fig.add_trace(go.Histogram(histfunc="count", 
                           x=Tinder.loc[(Tinder['dec_o']==0) & (Tinder['gender_name']=='Female'),'prob'], 
                           marker={'color': 'red'}, name="Females_No"), 
              row=2, col=1)
fig.update_xaxes(title_text="participants' idea of probability that their partner will say yes")
fig.update_yaxes(title_text="count")
fig.update_layout(width=1200, height=600,
                  title_text="Histogram of the participants' idea of their partner's decision", title_x=0.5)
fig.show()


In [89]:
# another way to represent the same data

fig = px.box(Tinder, y='prob', x='dec_o', color='gender_name', 
             color_discrete_map = {'Female':'deeppink','Male':'deepskyblue'}, 
             notched=True, 
             labels={
            'dec_o': 'Real choice of the partner',
            'prob': 'Perception of the participant'
             },
             title="Box plot of the participants' perception of their partners' decision")
fig.update_layout(legend_title_text='Gender', title_x=0.5)
fig.show()

In [82]:
# Statistical tests to check whether the 'yes' curves lie significantly further right than the 'no' curves

dico_test_male = {}
dico_test_female = {}
for i in range(11):
    dico_test_male['prob_'+str(i)] = Tinder.loc[(Tinder['prob']==i) & (Tinder['gender_name']=='Male'),:].groupby('dec_o')['prob'].count()
    dico_test_female['prob_'+str(i)] = Tinder.loc[(Tinder['prob']==i) & (Tinder['gender_name']=='Female'),:].groupby('dec_o')['prob'].count()
df_test_male = pd.DataFrame(dico_test_male)
df_test_female = pd.DataFrame(dico_test_female)
display(df_test_male)
display(df_test_female)

print("Wilcoxon rank-sum test, male participants:\n", 
      ranksums(df_test_male.iloc[0,:], df_test_male.iloc[1,:], alternative='greater')) 
print("Wilcoxon rank-sum test, female participants:\n", 
      ranksums(df_test_female.iloc[0,:], df_test_female.iloc[1,:], alternative='greater')) 

| dec_o | prob_0 | prob_1 | prob_2 | prob_3 | prob_4 | prob_5 | prob_6 | prob_7 | prob_8 | prob_9 | prob_10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 17 | 149 | 174 | 293 | 344 | 546 | 456 | 328 | 141 | 48 | 40 |
| 1 | 6 | 56 | 73 | 97 | 165 | 297 | 259 | 273 | 145 | 70 | 59 |


| dec_o | prob_0 | prob_1 | prob_2 | prob_3 | prob_4 | prob_5 | prob_6 | prob_7 | prob_8 | prob_9 | prob_10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 17 | 134 | 164 | 184 | 239 | 513 | 362 | 251 | 154 | 38 | 21 |
| 1 | 9 | 76 | 128 | 134 | 184 | 443 | 318 | 278 | 212 | 85 | 68 |


Wilcoxon rank-sum test, male participants:
 RanksumsResult(statistic=1.1491376885459788, pvalue=0.12524960489747994)
Wilcoxon rank-sum test, female participants:
 RanksumsResult(statistic=0.22982753770919576, pvalue=0.4091128927164975)


Conclusions: 
- neither gender significantly predicts its partner's decision. The marked differences visible in the men's histogram are probably due to male participants receiving many more 'no' answers (as the binomial tests show)
- the box plot shows that:
    - both genders overestimate their chances when the partner is going to say no (minimum probability of 1, median of 5), 
    - when the partner said yes, men perceived it better than women, but both genders used a wider range of ratings (0 to 10 instead of 1 to 9)
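The median readings from the box plot can be cross-checked against the per-gender count tables above. The sketch below computes the median rating of each (gender, partner decision) group from those tables. It uses only the integer-valued `prob` ratings, so its results may differ slightly from the box plot, which also includes the *.5 ratings.

```python
import numpy as np


def median_from_counts(counts):
    """Median rating, where counts[k] is the number of dates rated k (k = 0..10).

    Returns the first rating at which the cumulative count reaches half the total.
    """
    counts = np.asarray(counts)
    cumulative = np.cumsum(counts)
    return int(np.searchsorted(cumulative, counts.sum() / 2))


# Count tables copied from the per-gender outputs above (rows dec_o == 0 and dec_o == 1)
tables = {
    ('Male', 'partner said no'): [17, 149, 174, 293, 344, 546, 456, 328, 141, 48, 40],
    ('Male', 'partner said yes'): [6, 56, 73, 97, 165, 297, 259, 273, 145, 70, 59],
    ('Female', 'partner said no'): [17, 134, 164, 184, 239, 513, 362, 251, 154, 38, 21],
    ('Female', 'partner said yes'): [9, 76, 128, 134, 184, 443, 318, 278, 212, 85, 68],
}

medians = {group: median_from_counts(counts) for group, counts in tables.items()}
for (gender, decision), median in medians.items():
    print(f"{gender:<6} | {decision:<16} | median prob = {median}")
```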

# Overall conclusion

Neither gender manages to predict correctly whether their speed dating partner will agree to see them again. 

On the other hand, both genders do assess their own 'value' on the 'market' for at least two attributes, attractiveness and fun, although the correlations remain weak (r of about 0.2 at most in the outputs shown). Men do so even better than women on these two attributes, and on a third one: ambition.

Perspectives: 
- regarding the self-evaluation of attributes: the tests are highly significant thanks to the large sample sizes, so it would be worth testing whether this dataset is enough to predict, with Machine Learning, the 'compatibility' between two people from their self-evaluation of their attributes
- with this dataset, one could also study the relationship between how people see themselves (the 5 *3_\* attributes) and how they think others see them (the 5 *5_\* attributes), and to a lesser extent (because of missing values) how these evolve over time