## SPEED DATING EXPERIMENT (classification)

In [225]:
import os
import numpy as np
import pandas as pd
import csv
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
import statsmodels.formula.api as smf
from sklearn import linear_model, feature_selection, neighbors, metrics, grid_search, cross_validation

pd.set_option('display.max_rows', 10)
pd.set_option('display.notebook_repr_html', True)
pd.set_option('display.max_columns', 10)

%matplotlib inline
plt.style.use('ggplot')




In [226]:
df_raw = pd.read_csv(os.path.join('..', 'CODE', 'speed-dating-experiment', 'Speed Dating Data.csv'))

In [227]:
df_raw

Unnamed: 0,iid,id,gender,idg,condtn,...,attr5_3,sinc5_3,intel5_3,fun5_3,amb5_3
0,1,1.0,0,1,1,...,,,,,
1,1,1.0,0,1,1,...,,,,,
2,1,1.0,0,1,1,...,,,,,
3,1,1.0,0,1,1,...,,,,,
4,1,1.0,0,1,1,...,,,,,
...,...,...,...,...,...,...,...,...,...,...,...
8373,552,22.0,1,44,2,...,9.0,5.0,9.0,5.0,6.0
8374,552,22.0,1,44,2,...,9.0,5.0,9.0,5.0,6.0
8375,552,22.0,1,44,2,...,9.0,5.0,9.0,5.0,6.0
8376,552,22.0,1,44,2,...,9.0,5.0,9.0,5.0,6.0


In [228]:
df_raw.columns

Index([u'iid', u'id', u'gender', u'idg', u'condtn', u'wave', u'round',
       u'position', u'positin1', u'order',
       ...
       u'attr3_3', u'sinc3_3', u'intel3_3', u'fun3_3', u'amb3_3', u'attr5_3',
       u'sinc5_3', u'intel5_3', u'fun5_3', u'amb5_3'],
      dtype='object', length=195)

In [229]:
df_raw.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8378 entries, 0 to 8377
Columns: 195 entries, iid to amb5_3
dtypes: float64(174), int64(13), object(8)
memory usage: 12.5+ MB


In [230]:
df = df_raw.copy()  # independent copy, so the in-place drops below leave df_raw untouched

### Dropping Unnecessary Columns:

In [231]:
# Drop the columns that are not used in this analysis in a single call
cols_to_drop = ['position', 'positin1', 'field', 'field_cd', 'undergrd', 'mn_sat',
                'tuition', 'from', 'zipcode', 'income', 'career', 'career_c']
df.drop(cols_to_drop, axis = 1, inplace = True)

In [232]:
df.columns

Index([u'iid', u'id', u'gender', u'idg', u'condtn', u'wave', u'round',
       u'order', u'partner', u'pid',
       ...
       u'attr3_3', u'sinc3_3', u'intel3_3', u'fun3_3', u'amb3_3', u'attr5_3',
       u'sinc5_3', u'intel5_3', u'fun5_3', u'amb5_3'],
      dtype='object', length=183)

In [233]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8378 entries, 0 to 8377
Columns: 183 entries, iid to amb5_3
dtypes: float64(171), int64(12)
memory usage: 11.7 MB


In [234]:
df_raw.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8378 entries, 0 to 8377
Columns: 195 entries, iid to amb5_3
dtypes: float64(174), int64(13), object(8)
memory usage: 12.5+ MB


In [235]:
df.count()

iid         8378
id          8377
gender      8378
idg         8378
condtn      8378
            ... 
attr5_3     2016
sinc5_3     2016
intel5_3    2016
fun5_3      2016
amb5_3      2016
dtype: int64

***
## General Variable KEY:

| Variable | Description |
| ---| ---|
|attr | Attractive|
|sinc |Sincere  |
|intel | Intelligent|
| fun | Fun|
| amb | Ambitious|
| shar | Shared Interests/Hobbies |

***Each attribute has a code at the end of the variable name that identifies the survey question and the point in the experiment at which it was asked*** (signup, during the dating experiment, after the dating experiment)

Example: 

attr**1_1** 

Variable: attractiveness

Question: 'what do you look for in the opposite sex?' 

Point in experiment: signup survey

*vs.*

attr**1_2** 

Variable: attractiveness

Question: 'what do you look for in the opposite sex?' 

Point in experiment: after dating event
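
The naming convention can be made explicit with a small helper that splits a variable name into its parts. This is only a sketch: `parse_variable` and `TIME_POINTS` are hypothetical names, and the time-point mapping covers only the codes described in the tables of this notebook (`1`, `2`, `s`); any other suffix is reported as `'unknown'`.

```python
import re

# Time-point codes as described in this notebook's tables (assumed to be complete
# only for these three codes).
TIME_POINTS = {'1': 'signup', '2': 'after event', 's': 'during event'}

def parse_variable(name):
    """Split e.g. 'attr1_1' into (attribute, question code, time point)."""
    m = re.match(r'^([a-z]+)(\d+)_(\w+)$', name)
    if m is None:
        raise ValueError('not a survey variable: %r' % name)
    attribute, question, time_code = m.groups()
    return attribute, question, TIME_POINTS.get(time_code, 'unknown')

print(parse_variable('attr1_1'))
print(parse_variable('attr1_2'))
print(parse_variable('sinc3_s'))
```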

***

## QUESTION 1:	
### Does one’s perception of themselves affect their dating outcomes? Does this differ by gender?

**Hypothesis**: people who have lower self-esteem (i.e. who evaluate themselves negatively by giving lower scores on the attribute scale) will get fewer dates/matches, while those who give themselves higher ratings will get more. Women are more likely than men to give themselves critical ratings, which negatively affects their outcome.



| Variable CODE | Scale | When during Experiment? |Question| 
| :------:| :------:| :------: |:------|
|  **3_1**| 1-10 | Signup| Based on what you think the opposite sex looks for in a date, how do you think you measure up? |
|**3_2**| 1-10| After event| Based on what you think the opposite sex looks for in a date, how do you think you measure up? |
|**5_1**| 1-10| Signup|How do you think others perceive you? |
|**5_2**| 1-10| After event|How do you think others perceive you? |
|**3_s**| 1-10| During event|Rate your opinion of your own attributes  |


**exphappy**: Overall, on a scale of 1-10, how happy do you expect to be with the people you meet during the speed-dating event.

**expnum**: Out of the 20 people you will meet, how many do you expect will be interested in dating you? 

**match_es**: How many matches do you estimate you will get (a match occurs when you and your partner both check “Yes” next to decision)?: 

**match**	 (1=yes, 0=no)

**dec**: decision (1=yes, 0 = no)

**dec_o**: decision of partner (1=yes, 0 = no)

**round**: number of people that met in wave

**iid**: 	unique subject number, group(wave id gender): use this to count # of matches someone got 

**gender** (1=M | 0 =F)


> ### ISSUES/QUESTIONS
- FIND WAYS TO WEIGHT THE AVERAGE OF SCORES
- FIND A WAY TO LOOK AT SELF-PERCEPTION VS. WHAT OTHERS THINK (e.g. someone could give themselves a '10' attractiveness score while others only gave them a '7', yet still get the most dates because they have high self-esteem)
- HOW TO ADD UP THE # OF MATCHES SOMEONE GETS (reference iid#?)
- WHICH VARIABLES ABOVE SHOULD I USE? SHOULD I WEIGHT THEM? THEY HAVE DIFFERENT # OF OBSERVATIONS  
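
One possible way to approach the self-perception vs. others question is to collapse the per-date rows to one row per person and compare the self-rating with the average rating received from partners. The sketch below uses made-up toy rows, and it assumes that `attr_o` (the partner's attractiveness rating listed under Question 3) is the right column for "what others think"; it is an illustration, not the analysis actually run in this notebook.

```python
import pandas as pd

# Toy rows standing in for the real data (one row per date); values are invented.
toy = pd.DataFrame({
    'iid':     [1, 1, 1, 2, 2, 2],
    'attr3_1': [10, 10, 10, 6, 6, 6],  # self-rating at signup, repeated on every row
    'attr_o':  [7, 8, 6, 7, 8, 9],     # rating received from each partner (assumed column)
    'match':   [1, 1, 0, 0, 1, 0],
})

# One row per person: own rating, mean rating from partners, total matches.
per_person = toy.groupby('iid').agg({'attr3_1': 'first', 'attr_o': 'mean', 'match': 'sum'})

# Positive values mean the person rates themselves higher than their partners did.
per_person['self_minus_others'] = per_person['attr3_1'] - per_person['attr_o']
print(per_person)
```

On the real data the same pattern would use `df` in place of `toy`; rows with missing ratings would need to be handled first.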

#### Attractiveness at 3_1 (# of observations = 8273)

In [236]:
len(df.attr3_1.dropna())

8273

In [237]:
len(df.sinc3_1.dropna())

8273

#### Attractiveness at 3_2 (# of observations = 7463)

In [238]:
df.attr3_2.unique()

array([  6.,   7.,  nan,   5.,  10.,   8.,   3.,   9.,   4.,   2.])

In [239]:
len(df.attr3_2.dropna())

7463

In [240]:
len(df.sinc3_2.dropna())

7463

#### Attractiveness at 5_1 (# of observations = 4906)

In [241]:
len(df.attr5_1.dropna())

4906

In [242]:
len(df.sinc5_1.dropna())

4906

#### Attractiveness at 5_2 (# of observations = 4377)

In [243]:
len(df.attr5_2.dropna())

4377

In [244]:
len(df.sinc5_2.dropna())

4377

#### Attractiveness at 3_s (# of observations = 4000)

In [245]:
len(df.attr3_s.dropna())

4000

In [246]:
len(df.sinc3_s.dropna())

4000
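
The per-column counts above can also be gathered in one call, since `DataFrame.count()` returns the number of non-missing values per column. The sketch below runs on a small invented frame; on the real data `toy` would be replaced by `df`.

```python
import numpy as np
import pandas as pd

# Invented toy data with the same column names as the self-rating variables above.
toy = pd.DataFrame({
    'attr3_1': [6, 7, 8, 9],
    'attr3_2': [6, np.nan, 8, 9],
    'attr5_1': [np.nan, np.nan, 7, 8],
    'attr5_2': [np.nan, np.nan, 7, np.nan],
    'attr3_s': [np.nan, 7, np.nan, np.nan],
})

self_rating_cols = ['attr3_1', 'attr3_2', 'attr5_1', 'attr5_2', 'attr3_s']

# Non-missing observations per column, all at once.
counts = toy[self_rating_cols].count()
print(counts)
```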

### Look at exphappy, expnum and match_es  (&round)

___expnum___: Out of the 20 people you will meet, how many do you expect will be interested in dating you?

In [247]:
len(df.expnum.dropna())

1800

In [248]:
df.expnum.describe()

#data during signup 

count    1800.000000
mean        5.570556
std         4.762569
min         0.000000
25%         2.000000
50%         4.000000
75%         8.000000
max        20.000000
Name: expnum, dtype: float64

***match_es***: How many matches do you estimate you will get (a match occurs when you and your partner both check “Yes” next to decision)? (this was during the experiment after meeting people)

In [249]:
df.match_es.describe()

#data during experiment

count    7205.000000
mean        3.207814
std         2.444813
min         0.000000
25%         2.000000
50%         3.000000
75%         4.000000
max        18.000000
Name: match_es, dtype: float64

***round***: number of people that met in wave

In [250]:
df[['round']].describe()

Unnamed: 0,round
count,8378.0
mean,16.872046
std,4.358458
min,5.0
25%,14.0
50%,18.0
75%,20.0
max,22.0


> Observations: the largest round had 22 people to meet; the highest number of matches anyone estimated they would get (match_es) was 18

***exphappy***: Overall, on a scale of 1-10, how happy do you expect to be with the people you meet during the speed-dating event.

In [251]:
df.exphappy.describe()

count    8277.000000
mean        5.534131
std         1.734059
min         1.000000
25%         5.000000
50%         6.000000
75%         7.000000
max        10.000000
Name: exphappy, dtype: float64

### Look at 3_1  - attractiveness

Q: Based on what you think the opposite sex looks for in a date, how do you think you measure up?  **at signup**

In [252]:
subset_df = df.copy()  # independent copy of df to work on

In [253]:
len(subset_df.attr3_1)

8378

In [254]:
subset_df.attr3_1.dropna(inplace = True)

# NOTE: this only drops NaN from the attr3_1 column Series; subset_df itself keeps all 8378 rows

In [255]:
len(subset_df.attr3_1)

8273
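
Calling `dropna(inplace=True)` on a single column does not remove the corresponding rows from the DataFrame (the table displayed next still shows every row). If the goal is to keep only rows with a self-rating, one common alternative is `DataFrame.dropna` with the `subset` argument. The sketch below uses invented toy data; applying it to `subset_df` would change the row counts reported further down.

```python
import numpy as np
import pandas as pd

# Invented toy frame with one missing self-rating.
toy = pd.DataFrame({
    'iid':     [1, 2, 3],
    'attr3_1': [6.0, np.nan, 8.0],
    'match':   [0, 1, 1],
})

# Drop whole rows where attr3_1 is missing; returns a new DataFrame.
cleaned = toy.dropna(subset=['attr3_1'])
print(len(toy), len(cleaned))
```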

In [256]:
subset_df

Unnamed: 0,iid,id,gender,idg,condtn,...,attr5_3,sinc5_3,intel5_3,fun5_3,amb5_3
0,1,1.0,0,1,1,...,,,,,
1,1,1.0,0,1,1,...,,,,,
2,1,1.0,0,1,1,...,,,,,
3,1,1.0,0,1,1,...,,,,,
4,1,1.0,0,1,1,...,,,,,
...,...,...,...,...,...,...,...,...,...,...,...
8373,552,22.0,1,44,2,...,9.0,5.0,9.0,5.0,6.0
8374,552,22.0,1,44,2,...,9.0,5.0,9.0,5.0,6.0
8375,552,22.0,1,44,2,...,9.0,5.0,9.0,5.0,6.0
8376,552,22.0,1,44,2,...,9.0,5.0,9.0,5.0,6.0


In [257]:
subset_df.attr3_1.describe()

count    8273.000000
mean        7.084733
std         1.395783
min         2.000000
25%         6.000000
50%         7.000000
75%         8.000000
max        10.000000
Name: attr3_1, dtype: float64

#### Looking at match_es

In [258]:
subset_df.match_es.describe()

count    7205.000000
mean        3.207814
std         2.444813
min         0.000000
25%         2.000000
50%         3.000000
75%         4.000000
max        18.000000
Name: match_es, dtype: float64

In [259]:
len(subset_df.match_es)

8378

In [260]:
len(subset_df.match_es.dropna())

7205

In [261]:
subset_df.match_es.dropna(inplace = True)

# NOTE: this only drops NaN from the match_es column Series; the rows stay in subset_df

In [262]:
len(subset_df.match_es)

7205

#### Looking at match (# of matches)

In [263]:
subset_df.match.describe()

count    8378.000000
mean        0.164717
std         0.370947
min         0.000000
25%         0.000000
50%         0.000000
75%         0.000000
max         1.000000
Name: match, dtype: float64

In [264]:
len(subset_df.match)

8378

In [265]:
len(subset_df.match.dropna())

8378

In [266]:
subset_df.attr3_1.value_counts()

7.0     2914
8.0     2217
6.0     1100
9.0      729
5.0      642
10.0     268
4.0      238
3.0      145
2.0       20
Name: attr3_1, dtype: int64

In [267]:
dummy_ranks = pd.get_dummies(subset_df.attr3_1, prefix = 'attr_3_1_self')

In [268]:
dummy_ranks

Unnamed: 0,attr_3_1_self_2.0,attr_3_1_self_3.0,attr_3_1_self_4.0,attr_3_1_self_5.0,attr_3_1_self_6.0,attr_3_1_self_7.0,attr_3_1_self_8.0,attr_3_1_self_9.0,attr_3_1_self_10.0
0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...
8373,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
8374,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
8375,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
8376,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0


In [269]:
# Strip the trailing '.0' from the dummy column names (e.g. 'attr_3_1_self_10.0' -> 'attr_3_1_self_10')
dummy_ranks.rename(columns=lambda name: name.replace('.0', ''), inplace = True)


dummy_ranks

Unnamed: 0,attr_3_1_self_2,attr_3_1_self_3,attr_3_1_self_4,attr_3_1_self_5,attr_3_1_self_6,attr_3_1_self_7,attr_3_1_self_8,attr_3_1_self_9,attr_3_1_self_10
0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...
8373,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
8374,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
8375,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
8376,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0


In [270]:
subset_df = subset_df.join([dummy_ranks])

In [271]:
##subset_df.drop('attr3_1', axis = 1, inplace = True)

#### Look at Attractiveness (3_1) & Match Rating

In [272]:
pd.crosstab(subset_df.attr_3_1_self_10, subset_df.match, margins=True)

match,0,1,All
attr_3_1_self_10,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
0.0,6713,1292,8005
1.0,209,59,268
All,6998,1380,8378


> Odds of a match (per date) for participants who rated themselves 10/10 on attractiveness = 59:209 (about 0.28), vs. 1292:6713 (about 0.19) for those who rated themselves below 10
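
The odds can be recomputed directly from the crosstab counts. The sketch below takes the four cell counts from the table above (59 matches and 209 non-matches among self-rated 10s, 1292 and 6713 among everyone else); `odds` and `odds_ratio` are hypothetical helper names. Note that each row is one date, not one person, so the same participant contributes many rows.

```python
def odds(successes, failures):
    """Odds = number of successes divided by number of failures."""
    return float(successes) / failures

def odds_ratio(a, b, c, d):
    """a, b = matches / non-matches in group 1; c, d = the same for group 2."""
    return odds(a, b) / odds(c, d)

odds_10 = odds(59, 209)        # self-rated 10 on attractiveness
odds_below = odds(1292, 6713)  # self-rated below 10
ratio = odds_ratio(59, 209, 1292, 6713)

# Percent difference in odds implied by the ratio: (OR - 1) * 100.
pct_change = (ratio - 1) * 100
print(round(odds_10, 3), round(odds_below, 3), round(ratio, 3), round(pct_change, 1))
```

An odds ratio above 1 means higher odds in the first group, and below 1 means lower odds.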

### Look at match & attractiveness (3_1) compared by gender 

#### FEMALE:

In [273]:
df_gender_female = subset_df[subset_df.gender == 0]
df_gender_female[['attr3_1']].describe()

Unnamed: 0,attr3_1
count,4117.0
mean,7.219092
std,1.336886
min,2.0
25%,7.0
50%,7.0
75%,8.0
max,10.0


> Average self-rating for women: 7.22

**Look at women w/ self rating of '10':**

In [274]:
pd.crosstab(df_gender_female.attr_3_1_self_10, df_gender_female.match, margins=True)

match,0,1,All
attr_3_1_self_10,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
0.0,3385,667,4052
1.0,58,7,65
All,3494,690,4184


Probability of a match (per date) for women who rated themselves '10' on attractiveness:


In [275]:
p = (7./65.)
per = p*100
per

10.76923076923077

Odds of a match for women who rated themselves '10' on attractiveness:  __7:58__

Odds ratio: odds of a match for women who rated themselves '10' on attractiveness, divided by the odds for women who rated themselves lower 

In [276]:
##Odds ratio: 

o = (7./58.) / (667./3385.)
o

0.6124954763997312

> The odds of a match for women who rated themselves a '10' are about 0.61 times the odds for women who rated themselves lower, i.e. about 39% lower

#### MALE:

In [279]:
df_gender_male = subset_df[subset_df.gender == 1]
df_gender_male[['attr3_1']].describe()

Unnamed: 0,attr3_1
count,4156.0
mean,6.951636
std,1.439621
min,2.0
25%,6.0
50%,7.0
75%,8.0
max,10.0


> Average self-rating for men: 6.95

**Look at men w/ self rating of '10':**

In [282]:
pd.crosstab(df_gender_male.attr_3_1_self_10, df_gender_male.match, margins=True)

match,0,1,All
attr_3_1_self_10,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
0.0,3328,625,3953
1.0,151,52,203
All,3504,690,4194


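Odds of a match for men who rated themselves '10' on attractiveness (52 matches : 151 non-matches):


In [283]:
# NOTE: 52/151 is the odds (matches : non-matches), not a probability;
# the corresponding match probability is 52/203, about 25.6%
p = (52./151.)
per = p*100
per

34.437086092715234

> The odds of a match are about 0.34 (52:151), which corresponds to a match probability of about 25.6% (52/203), compared with 10.8% (7/65) for women who rated themselves a '10'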

Odds ratio: odds of men getting a date if rated themselves as '10' attractiveness vs. men who rated themselves lower 

In [284]:
##Odds ratio: 

o = (52./151.) / (625./3328.)
o

1.8337059602649004

> The odds of a match for men who rated themselves a '10' on attractiveness are about 1.83 times the odds for men who rated themselves lower (about 83% higher odds). 
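
With only 65 rows for women who rated themselves a '10', it may be worth checking how uncertain these odds ratios are. The sketch below computes an approximate 95% interval using the standard log odds ratio method (standard error = sqrt(1/a + 1/b + 1/c + 1/d)) from the crosstab counts above; `odds_ratio_ci` is a hypothetical helper. It treats every date as independent, which is an assumption that does not strictly hold here because each participant appears in many rows, so the intervals are probably too narrow.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate 95% CI for a 2x2 table.

    a, b = matches / non-matches in group 1; c, d = the same for group 2.
    """
    ratio = (float(a) / b) / (float(c) / d)
    se = math.sqrt(1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d)  # SE of log(OR)
    lower = math.exp(math.log(ratio) - z * se)
    upper = math.exp(math.log(ratio) + z * se)
    return ratio, lower, upper

women = odds_ratio_ci(7, 58, 667, 3385)   # self-rated 10 vs. lower, women
men = odds_ratio_ci(52, 151, 625, 3328)   # self-rated 10 vs. lower, men

print('women: OR=%.2f, CI=(%.2f, %.2f)' % women)
print('men:   OR=%.2f, CI=(%.2f, %.2f)' % men)
```

An interval that contains 1 means the data are compatible with no difference in odds between the two groups.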

### CALCULATE THE # OF MATCHES SOMEONE GOT - then do a correlation

> need to add up the # of matches someone gets  | need to reference id#
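
A minimal sketch of this step, assuming `iid` identifies a participant and each row is one date: group by `iid`, sum `match`, and correlate the total with the signup self-rating. The data below are invented, and `n_matches` is a hypothetical column name.

```python
import pandas as pd

# Invented toy rows: three participants with three dates each.
toy = pd.DataFrame({
    'iid':     [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'attr3_1': [9, 9, 9, 6, 6, 6, 7, 7, 7],  # self-rating, repeated on every row
    'match':   [1, 1, 0, 0, 0, 0, 1, 0, 0],
})

# Total matches and self-rating per participant.
per_person = toy.groupby('iid').agg({'match': 'sum', 'attr3_1': 'first'})
per_person = per_person.rename(columns={'match': 'n_matches'})

# Pearson correlation between self-rating and number of matches.
corr = per_person['attr3_1'].corr(per_person['n_matches'])
print(per_person)
print(corr)
```

Because rounds differ in size, dividing `n_matches` by the number of dates each person had may give a fairer comparison than the raw total.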

### Look for rows of data that have values for 3_1, 3_2, 5_1, 5_2, 3_s
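
One way to find such rows is a boolean mask that is true only where all five columns are present. The sketch below shows this for the attractiveness columns on invented data; the same mask would work on `df`.

```python
import numpy as np
import pandas as pd

cols = ['attr3_1', 'attr3_2', 'attr5_1', 'attr5_2', 'attr3_s']

# Invented toy data: rows 0 and 2 are complete, rows 1 and 3 each miss one value.
toy = pd.DataFrame({
    'attr3_1': [6, 7, 8, 9],
    'attr3_2': [6, 7, 8, 9],
    'attr5_1': [5, np.nan, 7, 8],
    'attr5_2': [5, 6, 7, 8],
    'attr3_s': [6, 7, 8, np.nan],
})

# True for rows where none of the five self-ratings is missing.
has_all = toy[cols].notnull().all(axis=1)
complete = toy[has_all]
print(has_all.sum(), 'of', len(toy), 'rows have all five ratings')
```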

## QUESTION 2:	

### Does one’s perception of their gender generalizations differ from their own evaluations of what’s important when it comes to selecting mates? 

**Question 2B**: *Does this differ from self-evaluations (Q3)?  E.g.: do men rate ‘attractiveness’ as less important for their own dating choices but more important for other men?*

**Hypothesis**: men will rate ‘attractiveness’ as less important for their own dating choices but more important for other men’s decisions when choosing a partner. 



| Variable CODE | Scale | When during Experiment? |Question| 
| :------:| :------:| :------: |:------|
|  **4_1**| 100pts | Signup| What you think MOST of your fellow men/women look for in the opposite sex. |
|**4_2**| 100pts| After event| What you think MOST of your fellow men/women look for in the opposite sex. |

gender (1=M | 0 =F)
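
A possible starting point for this comparison is sketched below on invented data. It assumes the columns follow the naming convention above, so that `attr1_1` holds a person's own weight on attractiveness and `attr4_1` the weight they attribute to their peers; `attr4_1` is inferred from the **4_1** code and is not shown elsewhere in this notebook. Signup answers repeat on every date row, so the rows are first reduced to one per `iid`.

```python
import pandas as pd

# Invented toy rows; signup answers repeat on each of a person's date rows.
toy = pd.DataFrame({
    'iid':     [1, 1, 2, 3, 3, 4],
    'gender':  [1, 1, 1, 0, 0, 0],                    # 1 = M, 0 = F
    'attr1_1': [25.0, 25.0, 35.0, 15.0, 15.0, 20.0],  # own weight on attractiveness
    'attr4_1': [40.0, 40.0, 30.0, 30.0, 30.0, 35.0],  # weight attributed to peers (assumed column)
})

# One row per participant, so people with more dates do not count more.
people = toy.drop_duplicates(subset='iid')

means = people.groupby('gender')[['attr1_1', 'attr4_1']].mean()
means['peers_minus_self'] = means['attr4_1'] - means['attr1_1']
print(means)
```

A positive `peers_minus_self` for men would be in line with the hypothesis; whether the difference is meaningful would still need a proper test.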


****

## QUESTION 3:	
### What do men look for in the opposite sex? Does this differ from women? 

**Question 3B**: *How important do people think attractiveness is in potential mate selection vs. its real impact?*

**Hypothesis**: men are more likely than women to rate ‘attractiveness’ as important when looking for a mate; women are more likely to rate ‘sincere’ as important. 


| Variable CODE (interviewee) | Variable CODE (partner) | Scale | When during Experiment? |Question| 
| :---:| :----:| :------: | :------: |:------|
|  **1_1**| **pf_o_att; pf_o_sha** *rating by partner*| 100pts | Signup| what do you look for in the opposite sex? |
|  **attr; shar**| **attr_o; shar_o** *rating by partner* | 1-10 | During event (after each date)| what do you look for in the opposite sex?  |
|  **1_s**| | 1-10 scale & 100pts | During Event|what do you look for in the opposite sex?  |
|  **1_2**| | 100pts | After Event| what do you look for in the opposite sex? |
|  **7_2**| | 100pts | After Event| Based on yes/no decisions during speed dating event, distribute points to attributes that best reflect the actual importance of these attributes in your decisions|
| **like** |**like_o** *rating by partner* | 1-10 | During event (after each date)|  How much do you like the person? |
| **prob** |**prob_o** *rating by partner* | 1-10 | During event (after each date)|  How probable do you think it is that this person will say 'yes' for you? |



gender (1=M | 0 =F)

dec_o: 	decision of partner the night of event

dec: decision of interviewee the night of event



In [None]:
# Columns holding the 100-point allocation for "what do you look for in the opposite sex?" at signup
cols_1_1 = ['attr1_1', 'sinc1_1', 'intel1_1', 'fun1_1', 'amb1_1', 'shar1_1']
labels = {'attr1_1': 'Attractive', 'sinc1_1': 'Sincere', 'intel1_1': 'Intelligent',
          'fun1_1': 'Fun', 'amb1_1': 'Ambitious', 'shar1_1': 'Shared_Interests'}

# For each row, pick the attribute with the most points (ties go to the first column listed);
# rows with no answers at all are skipped and stay NaN in the new column
answered = df[cols_1_1].dropna(how = 'all')
df['impt_atr_other1'] = answered.idxmax(axis = 1).map(labels)

****

## QUESTION 4:	
### What do women THINK men look for in the opposite sex? What about men? Does it differ from before dating event to after? Does this differ from actual results (Q3)?  

**Hypothesis**: women think men give more weight to attractiveness but both men and women give the most weight to attractiveness vs. other attributes. 


| Variable CODE | Scale | When during Experiment? |Question| 
| :---:| :----:| :------: |:------|
|  **2_1**| 100pts | Signup| What do you think the opposite sex looks for in a date? |
|  **2_2**| 100pts | After Event| What do you think the opposite sex looks for in a date? |


gender (1=M | 0 =F)



#### Look at the mean, max and min for all 6 attributes at signup survey (2_1) compared to after dating event (2_2):

In [285]:
df.attr2_1.describe() 
#signup survey

count    8299.000000
mean       30.362192
std        16.249937
min         0.000000
25%        20.000000
50%        25.000000
75%        40.000000
max       100.000000
Name: attr2_1, dtype: float64

In [286]:
df.attr2_2.describe()
#after dating event

count    5775.000000
mean       29.344369
std        14.551171
min         0.000000
25%        19.150000
50%        25.000000
75%        38.460000
max        85.000000
Name: attr2_2, dtype: float64

#### Look at Males and 2_1 values

In [287]:
df_gender_male = df[df.gender == 1]
df_gender_male

Unnamed: 0,iid,id,gender,idg,condtn,...,attr5_3,sinc5_3,intel5_3,fun5_3,amb5_3
100,11,1.0,1,2,1,...,,,,,
101,11,1.0,1,2,1,...,,,,,
102,11,1.0,1,2,1,...,,,,,
103,11,1.0,1,2,1,...,,,,,
104,11,1.0,1,2,1,...,,,,,
...,...,...,...,...,...,...,...,...,...,...,...
8373,552,22.0,1,44,2,...,9.0,5.0,9.0,5.0,6.0
8374,552,22.0,1,44,2,...,9.0,5.0,9.0,5.0,6.0
8375,552,22.0,1,44,2,...,9.0,5.0,9.0,5.0,6.0
8376,552,22.0,1,44,2,...,9.0,5.0,9.0,5.0,6.0


#### Look at men's rating of attractiveness at 2_1

In [288]:
df_gender_male[['attr2_1']].describe()

Unnamed: 0,attr2_1
count,4174.0
mean,25.092631
std,13.334847
min,0.0
25%,16.67
50%,20.0
75%,30.0
max,95.0


#### Look at men's rating of attractiveness at 2_2

In [289]:
df_gender_male[['attr2_2']].describe()

Unnamed: 0,attr2_2
count,2940.0
mean,25.792765
std,13.65316
min,0.0
25%,16.67
50%,20.0
75%,30.0
max,80.0


#### Look at women's rating of attractiveness at 2_1 & 2_2

In [290]:
df_gender_female = df[df.gender == 0]
df_gender_female

Unnamed: 0,iid,id,gender,idg,condtn,...,attr5_3,sinc5_3,intel5_3,fun5_3,amb5_3
0,1,1.0,0,1,1,...,,,,,
1,1,1.0,0,1,1,...,,,,,
2,1,1.0,0,1,1,...,,,,,
3,1,1.0,0,1,1,...,,,,,
4,1,1.0,0,1,1,...,,,,,
...,...,...,...,...,...,...,...,...,...,...,...
7889,530,22.0,0,43,2,...,3.0,8.0,8.0,5.0,5.0
7890,530,22.0,0,43,2,...,3.0,8.0,8.0,5.0,5.0
7891,530,22.0,0,43,2,...,3.0,8.0,8.0,5.0,5.0
7892,530,22.0,0,43,2,...,3.0,8.0,8.0,5.0,5.0


In [291]:
#### Female: 2_1
df_gender_female[['attr2_1']].describe()

Unnamed: 0,attr2_1
count,4125.0
mean,35.694349
std,17.171131
min,10.0
25%,23.26
50%,30.0
75%,50.0
max,100.0


In [292]:
#### Female: 2_2
df_gender_female[['attr2_2']].describe()

Unnamed: 0,attr2_2
count,2835.0
mean,33.027513
std,14.54034
min,10.0
25%,20.83
50%,30.0
75%,40.0
max,85.0


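> Observations: On average, women assign more points to 'attractiveness' when guessing what men look for (mean 35.7 at signup, 33.0 after the event) than men do when guessing what women look for (25.1 and 25.8)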
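
To make the size of this difference concrete, the sketch below puts the four means reported in the `describe()` outputs above into one small table and computes the female-minus-male gap at each time point, plus the change from signup to after the event. Only the means are taken from this notebook; the table layout and variable names are illustrative.

```python
import pandas as pd

# Mean points given to attractiveness, copied from the describe() outputs above.
reported = pd.DataFrame(
    {'attr2_1': [35.694349, 25.092631],   # signup
     'attr2_2': [33.027513, 25.792765]},  # after event
    index=['female', 'male'])

# Female minus male gap at each time point.
gap = reported.loc['female'] - reported.loc['male']

# Change from signup to after the event, per gender.
reported['change'] = reported['attr2_2'] - reported['attr2_1']

print(reported)
print(gap)
```

These are row-level means, so participants with more dates carry more weight; a per-person average could differ slightly.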

#### Look at Women's ratings for men at 2_1

In [293]:
df_gender_female[['attr2_1', 'sinc2_1', 'intel2_1', 'fun2_1', 'amb2_1', 'shar2_1']].describe()

Unnamed: 0,attr2_1,sinc2_1,intel2_1,fun2_1,amb2_1,shar2_1
count,4125.0,4125.0,4125.0,4125.0,4125.0,4125.0
mean,35.694349,11.343646,12.532022,18.73351,9.230638,12.645113
std,17.171131,6.254626,5.135046,6.50548,5.314698,6.130889
min,10.0,0.0,0.0,0.0,0.0,0.0
25%,23.26,5.0,10.0,15.0,5.0,10.0
50%,30.0,10.0,11.36,20.0,10.0,11.9
75%,50.0,15.0,15.0,20.0,13.16,16.67
max,100.0,30.0,30.0,50.0,30.0,30.0


#### Look at Men's ratings for women at 2_1

In [294]:
df_gender_male[['attr2_1', 'sinc2_1', 'intel2_1', 'fun2_1', 'amb2_1', 'shar2_1']].describe()

Unnamed: 0,attr2_1,sinc2_1,intel2_1,fun2_1,amb2_1,shar2_1
count,4174.0,4174.0,4174.0,4174.0,4164.0,4164.0
mean,25.092631,15.181078,16.279633,18.115379,14.234815,11.071924
std,13.334847,7.128021,6.705605,6.635233,7.346384,6.10383
min,0.0,0.0,0.0,0.0,0.0,0.0
25%,16.67,10.0,10.0,15.0,10.0,5.0
50%,20.0,15.0,16.28,19.57,15.0,10.0
75%,30.0,20.0,20.0,20.0,20.0,15.0
max,95.0,50.0,40.0,44.0,50.0,30.0


In [295]:
df.sinc2_1.describe()
#signup survey

count    8299.000000
mean       13.273691
std         6.976775
min         0.000000
25%        10.000000
50%        15.000000
75%        18.750000
max        50.000000
Name: sinc2_1, dtype: float64

In [296]:
df.sinc2_2.describe()
#after dating event

count    5775.00000
mean       13.89823
std         6.17169
min         0.00000
25%        10.00000
50%        15.00000
75%        19.23000
max        40.00000
Name: sinc2_2, dtype: float64

In [297]:
df.intel2_1.describe()
#signup survey

count    8299.000000
mean       14.416891
std         6.263304
min         0.000000
25%        10.000000
50%        15.000000
75%        20.000000
max        40.000000
Name: intel2_1, dtype: float64

In [298]:
df.intel2_2.describe()
#after dating event

count    5775.000000
mean       13.958265
std         5.398621
min         0.000000
25%        10.000000
50%        15.000000
75%        17.390000
max        30.770000
Name: intel2_2, dtype: float64

In [299]:
df.fun2_1.describe()
#signup survey

count    8299.000000
mean       18.422620
std         6.577929
min         0.000000
25%        15.000000
50%        20.000000
75%        20.000000
max        50.000000
Name: fun2_1, dtype: float64

In [300]:
df.fun2_2.describe()
#after dating event

count    5775.000000
mean       17.967233
std         6.100307
min         0.000000
25%        15.000000
50%        18.520000
75%        20.000000
max        40.000000
Name: fun2_2, dtype: float64

In [301]:
df.amb2_1.describe()
#signup survey

count    8289.000000
mean       11.744499
std         6.886532
min         0.000000
25%         6.000000
50%        10.000000
75%        15.000000
max        50.000000
Name: amb2_1, dtype: float64

In [302]:
df.amb2_2.describe()
#after dating event

count    5775.000000
mean       11.909735
std         6.313281
min         0.000000
25%        10.000000
50%        10.000000
75%        15.090000
max        50.000000
Name: amb2_2, dtype: float64

In [303]:
df.shar2_1.describe()
#signup survey

count    8289.000000
mean       11.854817
std         6.167314
min         0.000000
25%        10.000000
50%        10.000000
75%        15.630000
max        30.000000
Name: shar2_1, dtype: float64

In [304]:
df.shar2_2.describe()
#after dating event

count    5775.000000
mean       12.887976
std         5.615691
min         0.000000
25%        10.000000
50%        13.950000
75%        16.515000
max        30.000000
Name: shar2_2, dtype: float64
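
The twelve separate `describe()` calls above could be condensed into a single summary table. The sketch below builds the column names from the attribute prefixes and keeps only the mean, min and max rows; it runs on invented values, and on the real data `toy` would be replaced by `df`.

```python
import pandas as pd

attributes = ['attr', 'sinc', 'intel', 'fun', 'amb', 'shar']
cols_2_1 = [a + '2_1' for a in attributes]  # signup survey
cols_2_2 = [a + '2_2' for a in attributes]  # after dating event

# Invented toy values: the same three numbers in every column of a time point.
toy = pd.DataFrame(dict([(c, [10.0, 20.0, 30.0]) for c in cols_2_1] +
                        [(c, [15.0, 20.0, 25.0]) for c in cols_2_2]))

# One row per variable, with mean / min / max as columns.
summary = toy[cols_2_1 + cols_2_2].describe().loc[['mean', 'min', 'max']].T
print(summary)
```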