# 5 Demographics
In this section, we evaluate basic demographics (age, gender, race, and income) and their association with partner decision and yes rate. The analysis has three parts:
* Univariate Analysis
* Bivariate Analysis
* Multivariate Analysis

## 5.1 Key Findings


In [None]:
# libraries
%matplotlib inline

import os
import sys

# Resolve project paths from the notebook's working directory.
# (Inside a notebook, inspect.getfile(inspect.currentframe()) can point at a
# temporary IPython cell file rather than the notebook's own folder.)
currentdir = os.getcwd()
projdir = os.path.dirname(currentdir)
srcdir = os.path.join(projdir, "src")
datasrc = os.path.join(srcdir, "data")

# Make project modules importable (same resulting order as inserting one by one)
for path in (currentdir, projdir, srcdir, datasrc):
    sys.path.insert(0, path)

# Display every bare expression in a cell, not just the last one,
# so several results can be shown per cell below
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

import numpy as np
import pandas as pd
pd.set_option('display.max_colwidth', 500)
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Project-specific modules
import read
from shared import directories
sys.path.append(directories.ANALYSIS_DIR)
sys.path.append(directories.UTILITIES_DIR)
from analysis import trivariate

import univariate, bivariate, independence, visual, description
import warnings
warnings.filterwarnings('ignore')

## 5.2 Data

In [None]:
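# Get data: read.read() returns DataFrames for all participants
# and for the male and female subsets
sd = read.read()
df = sd['all']
df_male = sd['male']
df_female = sd['female']

# Demographic columns; the _o suffix denotes the partner's value
demographics = ["age","age_diff","age_o","gender","income","race","race_o"]
df_demographics = df[demographics]
df_male_demographics = df_male[demographics]
df_female_demographics = df_female[demographics]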

## 5.3 Univariate Analysis

### 5.3.1 Univariate Analysis Key Questions
- What was the distribution of age?
- What was the distribution of age differences?
- What were the counts by gender?
- What was the distribution of income?
- What was the distribution of race?

### 5.3.2 Univariate Analysis Key Findings
Age
- Non-normal distribution with a median of 26 and a range of [18, 55]
- 95 missing values

Age Difference
- Normally distributed
- Mean age difference was about 6 months
- Range was [-32, 20] years
- A negative age difference means the male participant was younger than the female participant
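The sign convention matters when reading the age-difference plot. A minimal sketch, assuming `age_diff` is the male participant's age minus the female participant's age; the column names `age_male` and `age_female` and all values are hypothetical, not taken from the dataset:

```python
import pandas as pd

# Hypothetical male/female pairs, ages in years (illustrative values only)
pairs = pd.DataFrame({
    "age_male": [25, 30, 22],
    "age_female": [27, 30, 21],
})

# Assumed convention: age_diff = male age - female age,
# so a negative value means the male is the younger partner
pairs["age_diff"] = pairs["age_male"] - pairs["age_female"]

# Rows where the male participant is younger
male_younger = pairs[pairs["age_diff"] < 0]
print(pairs["age_diff"].tolist())
```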

Gender
- The dataset contained 10 more males than females (count to be verified)

Income
- Median income was about \$43k
- Range was [\$8.6k, \$109k]
- Right-skewed distribution
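Median, range, missing count, and skew are computed inside `univariate.analysis`, whose internals are not shown in this section. A plain-pandas sketch of the same descriptive statistics, assuming a numeric column such as `age` or `income`; the helper `summarize` and the sample values are hypothetical:

```python
import numpy as np
import pandas as pd

def summarize(s: pd.Series) -> dict:
    """Hypothetical helper: median, range, missing count and skew of a numeric column."""
    return {
        "median": s.median(),            # NaNs are skipped by default
        "min": s.min(),
        "max": s.max(),
        "missing": int(s.isna().sum()),
        "skew": s.skew(),                # > 0 suggests a long right tail
    }

# Hypothetical incomes in dollars: one missing value and one large outlier
income = pd.Series([30_000, 35_000, 40_000, 45_000, 50_000, 120_000, np.nan])
stats = summarize(income)
print(stats)
```

A positive `skew` is consistent with a right-skewed distribution; pandas reports the unbiased (sample-adjusted) skewness.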

Race
- Distribution by race:
    - Caucasian: 56%
    - Latino: 8%
    - Other: 7%
    - Black: 5%
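Category shares like these are relative frequencies. A sketch with `value_counts`, assuming percentages are taken over non-missing values (this section does not say how missing race values were handled); the labels below are hypothetical:

```python
import pandas as pd

# Hypothetical race labels, including one missing value
race = pd.Series(["Caucasian", "Caucasian", "Latino", "Black", "Caucasian", None])

# value_counts drops missing values by default; normalize=True gives proportions
shares = (race.value_counts(normalize=True) * 100).round().astype(int)
print(shares.to_dict())
```

Dropping `normalize=True` gives raw counts instead, which is the form of the gender comparison above.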

In [None]:
# Each result holds a description ('desc') and a plot ('plot') for one column
result = univariate.analysis(df_demographics)
for r in result:
    r['desc']
    r['plot']


## 5.4 Bivariate Analysis

### 5.4.1 Bivariate Analysis: Independent Variables

#### 5.4.1.1 Bivariate Analysis: Independent Variables: Key Questions    
- What was the association of age and gender?
- What was the association of income and gender?
- What was the association of income by race?

#### 5.4.1.1.1 Age and Gender
The difference in age between genders was statistically significant: the median male age was 27 and the median female age was 26.
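The test used inside `bivariate.analysis` is not shown here. One way to check a claim about a difference in medians is a permutation test: shuffle the group labels many times and see how often the shuffled median difference is at least as large as the observed one. This is a swapped-in technique, not necessarily what `bivariate.analysis` does; the helper name and the simulated ages are hypothetical:

```python
import numpy as np

def median_diff_permutation_test(a, b, n_perm=10_000, seed=0):
    """Two-sided permutation test for median(a) - median(b).

    Returns the observed difference and a p-value.
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = np.median(a) - np.median(b)
    pooled = np.concatenate([a, b])
    extreme = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(pooled)  # randomly reassign group labels
        diff = np.median(shuffled[: len(a)]) - np.median(shuffled[len(a):])
        if abs(diff) >= abs(observed):
            extreme += 1
    # The +1 terms count the observed labelling as one permutation,
    # so the p-value is never exactly zero
    p_value = (extreme + 1) / (n_perm + 1)
    return float(observed), p_value

# Simulated ages: the first group is about two years older on average
sim = np.random.default_rng(1)
older = sim.normal(27, 3, size=200)
younger = sim.normal(25, 3, size=200)
obs, p = median_diff_permutation_test(older, younger, n_perm=2_000)
print(obs, p)
```

Comparing medians matches how these findings are reported; `scipy.stats.mannwhitneyu` is a common rank-based alternative when SciPy is available.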

In [None]:
i, d, p = bivariate.analysis(df,x='age', y='gender')
p
d
i


#### 5.4.1.1.2 Income and Gender
The difference in income between genders was statistically significant: median male income was about \$46k and median female income was about \$42k.

In [None]:
i, d, p = bivariate.analysis(df,x='income', y='gender')
p
d
i


#### 5.4.1.1.3 Income and Race
Income differed significantly by race; the median income for each race is shown in the output below.
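With more than two groups, a rank-based counterpart to a two-group comparison is the Kruskal-Wallis test. The sketch below computes its H statistic from average ranks, without the tie correction. It is a swapped-in illustration, not necessarily the test `bivariate.analysis` runs; the helper name and data are hypothetical:

```python
import pandas as pd

def kruskal_wallis_h(values: pd.Series, groups: pd.Series) -> float:
    """Kruskal-Wallis H statistic (no tie correction) for values split by group.

    H = 12 / (N (N + 1)) * sum_i(R_i**2 / n_i) - 3 (N + 1),
    where R_i is the rank sum and n_i the size of group i.
    """
    data = pd.DataFrame({"v": values, "g": groups}).dropna()
    n = len(data)
    ranks = data["v"].rank()  # average ranks for tied values
    per_group = ranks.groupby(data["g"]).agg(["sum", "count"])
    rank_term = (per_group["sum"] ** 2 / per_group["count"]).sum()
    return float(12.0 * rank_term / (n * (n + 1)) - 3 * (n + 1))

labels = pd.Series(["A", "A", "A", "B", "B", "B", "C", "C", "C"])
# Hypothetical incomes (in $k): three groups that do not overlap...
separated = kruskal_wallis_h(pd.Series([10, 20, 30, 40, 50, 60, 70, 80, 90]), labels)
# ...and three groups that interleave
mixed = kruskal_wallis_h(pd.Series([10, 40, 70, 20, 50, 80, 30, 60, 90]), labels)
print(separated, mixed)
```

H is compared against a chi-square distribution with k - 1 degrees of freedom for k groups; for three groups the 5% critical value is about 5.99. `scipy.stats.kruskal` runs the full test, including the tie correction.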

In [None]:
i, d, p = bivariate.analysis(df,x='income', y='race')
p
d
i


#### 5.4.1.2 Bivariate Analysis: Dependent Variables: Decision: Key Questions        
- To what degree were subject age and partner decision associated?
- To what degree were subject income and partner decision associated?
- To what degree were subject race and partner decision associated?
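These questions pair numeric predictors (age, income) and a categorical one (race) with the binary partner decision `dec_o`. Two simple association measures, sketched as swapped-in illustrations rather than what `bivariate.analysis` computes: the point-biserial correlation (Pearson correlation with a 0/1 variable) and the per-category yes rate. The rows below are hypothetical:

```python
import pandas as pd

# Hypothetical rows, one per date: subject age and race, and the
# partner's decision dec_o (1 = yes, 0 = no)
dates = pd.DataFrame({
    "age":   [22, 24, 26, 28, 30, 32],
    "race":  ["A", "B", "A", "B", "A", "B"],
    "dec_o": [1, 1, 1, 0, 0, 0],
})

# Point-biserial correlation: Pearson correlation between a numeric
# variable and a 0/1 variable
r_pb = dates["age"].corr(dates["dec_o"])

# For a categorical predictor, compare the partner yes rate per category
yes_rate_by_race = dates.groupby("race")["dec_o"].mean()
print(r_pb)
print(yes_rate_by_race)
```

Each subject appears on many rows (one per date), so these row-level measures treat repeated observations as independent; aggregating per subject first is a more conservative option.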

#### 5.4.1.2.1 Bivariate Analysis: Age and Decision

In [None]:
i, d, p = bivariate.analysis(df,x='age', y='dec_o')
p
d
i


#### 5.4.1.2.2 Bivariate Analysis: Income and Decision

In [None]:
i, d, p = bivariate.analysis(df,x='income', y='dec_o')
p
d
i


#### 5.4.1.2.3 Bivariate Analysis: Race and Decision

In [None]:
i, d, p = bivariate.analysis(df,x='race', y='dec_o')
p
d
i


#### 5.4.1.3 Bivariate Analysis: Dependent Variables: Yes Rate: Key Questions    
- To what degree were subject age and yes rate associated?
- To what degree were subject income and yes rate associated?
- To what degree were subject race and yes rate associated?
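Yes rate is presumably a per-subject quantity (its exact definition is not given in this section). A sketch that first collapses date-level rows to one row per subject, then measures the monotonic association with age using Spearman's rank correlation, computed as the Pearson correlation of average ranks. The `subject_id` column, the `spearman` helper, and all values are hypothetical:

```python
import pandas as pd

def spearman(x: pd.Series, y: pd.Series) -> float:
    """Spearman rank correlation: Pearson correlation of average ranks."""
    both = pd.concat([x, y], axis=1).dropna()
    return float(both.iloc[:, 0].rank().corr(both.iloc[:, 1].rank()))

# Hypothetical date-level rows; yes_rate repeats on every row of a subject
dates = pd.DataFrame({
    "subject_id": [1, 1, 2, 2, 3, 3, 4, 4],
    "age":        [21, 21, 25, 25, 29, 29, 33, 33],
    "yes_rate":   [0.6, 0.6, 0.5, 0.5, 0.5, 0.5, 0.2, 0.2],
})

# One row per subject, so each person counts once
subjects = dates.groupby("subject_id")[["age", "yes_rate"]].first()
rho = spearman(subjects["age"], subjects["yes_rate"])
print(rho)
```

For race, the per-subject table can be grouped the same way, for example by comparing median yes rates across categories.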

#### 5.4.1.3.1 Bivariate Analysis: Age and Yes Rate

In [None]:
i, d, p = bivariate.analysis(df,x='age', y='yes_rate')
p
d
i


#### 5.4.1.3.2 Bivariate Analysis: Income and Yes Rate

In [None]:
i, d, p = bivariate.analysis(df,x='income', y='yes_rate')
p
d
i


#### 5.4.1.3.3 Bivariate Analysis: Race and Yes Rate

In [None]:
i, d, p = bivariate.analysis(df,x='race', y='yes_rate')
p
d
i


## 5.5 Multivariate Analysis

#### 5.5.2.1 Multivariate Analysis: Dependent Variables: Decision: Key Questions        
- What was the association between the subject’s age, gender, and partner decision?
- What was the association between the subject’s income, gender, and partner decision?
- What was the association between the subject’s race, gender, and partner decision?
- What was the association between the subject’s income, race, and partner decision?
- What was the association between the subject’s income, race, gender, and partner decision?
- What was the association between the subject’s age, race, gender, and partner decision?
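A first multivariate look that needs no model is a stratified table: the partner yes rate within each combination of two subject attributes. A sketch with `pivot_table`, assuming `gender` and `race` are subject columns and `dec_o` is coded 1 for yes; the values and category labels are hypothetical:

```python
import pandas as pd

# Hypothetical date-level rows: subject gender and race, partner decision
dates = pd.DataFrame({
    "gender": ["M", "M", "M", "M", "F", "F", "F", "F"],
    "race":   ["A", "A", "B", "B", "A", "A", "B", "B"],
    "dec_o":  [1, 0, 0, 0, 1, 1, 1, 0],
})

# Partner yes rate in each race x gender cell; comparing rows within a
# column shows whether a race gap holds for each gender separately
rates = dates.pivot_table(index="race", columns="gender", values="dec_o", aggfunc="mean")

# Number of dates behind each rate, to spot thin cells
counts = dates.pivot_table(index="race", columns="gender", values="dec_o", aggfunc="count")
print(rates)
print(counts)
```

Cells with few dates give unstable rates, hence the `counts` table. A logistic regression such as `smf.logit("dec_o ~ age + C(gender) + C(race)", data=df).fit()` (statsmodels is already imported above) would estimate each association while adjusting for the others; whether that matches the intended analysis is an assumption.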

#### 5.5.2.2 Multivariate Analysis: Dependent Variables: Decision: Key Findings

#### 5.5.3.1 Multivariate Analysis: Dependent Variables: Yes Rate: Key Questions    
- What was the association between the subject’s age, gender, and yes rate?
- What was the association between the subject’s income, gender, and yes rate?
- What was the association between the subject’s race, gender, and yes rate?
- What was the association between the subject’s income, race, and yes rate?
- What was the association between the subject’s income, race, gender, and yes rate?
- What was the association between the subject’s age, race, gender, and yes rate?
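For a continuous outcome like yes rate, ordinary least squares with dummy-coded categorical variables estimates each attribute's association while holding the others fixed. A numpy/pandas sketch; the helper `ols_with_dummies` and the data are hypothetical, and the synthetic yes rates are generated from known coefficients so the fit can be checked:

```python
import numpy as np
import pandas as pd

def ols_with_dummies(data: pd.DataFrame, outcome: str, numeric: list, categorical: list) -> pd.Series:
    """Least-squares coefficients for outcome ~ numeric + dummy-coded categoricals."""
    # drop_first=True keeps one reference level per categorical variable
    X = pd.get_dummies(data[numeric + categorical], columns=categorical,
                       drop_first=True, dtype=float)
    X.insert(0, "intercept", 1.0)
    y = data[outcome].to_numpy(dtype=float)
    coef, *_ = np.linalg.lstsq(X.to_numpy(dtype=float), y, rcond=None)
    return pd.Series(coef, index=X.columns)

# Hypothetical per-subject data
subjects = pd.DataFrame({
    "age":    [20, 25, 30, 35, 22, 27, 32, 37],
    "gender": ["F", "F", "F", "F", "M", "M", "M", "M"],
    "race":   ["A", "B", "A", "B", "B", "A", "B", "A"],
})
# Synthetic yes rate built from known coefficients (no noise)
subjects["yes_rate"] = (0.8 - 0.01 * subjects["age"]
                        + 0.1 * (subjects["gender"] == "M")
                        - 0.05 * (subjects["race"] == "B"))

coef = ols_with_dummies(subjects, "yes_rate", ["age"], ["gender", "race"])
print(coef)
```

For standard errors and p-values, `smf.ols("yes_rate ~ age + C(gender) + C(race)", data=...).fit()` in statsmodels fits the same model. Because yes rate is a proportion bounded by 0 and 1, a linear model is only a first approximation.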

#### 5.5.3.2 Multivariate Analysis: Dependent Variables: Yes Rate: Key Findings