# Statistical Inference in Mythbusters
## Battle of the Sexes - Round 2 (Season 2013 Ep05)

Myth: Women are better at multi-tasking than men

In [2]:
import numpy as np
import pandas as pd
from scipy import stats

In [3]:
d = {'Men': pd.Series([50, 80, 50, 80, 90, 80, 50, 60, 50, 50]),
     'Women': pd.Series([100, 70, 60, 80, 80, 80, 80, 60, 50, 60])}
df = pd.DataFrame(d)

In [4]:
print("Men:", df['Men'].mean(), df['Men'].std())
print("Women:", df['Women'].mean(), df['Women'].std())

Men: 64.0 16.46545204697129
Women: 72.0 14.757295747452437


### Hypothesis Testing

A **two-sample t-test** investigates whether the means of two independent data samples differ from one another. In a two-sample test, the null hypothesis is that the means of both groups are the same. Unlike the one-sample test, where we test against a known population parameter, the two-sample test involves only sample means. You can conduct a two-sample t-test by passing both samples to the `stats.ttest_ind()` function.

Null Hypothesis: Men and women have the same mean multi-tasking score

Alternate Hypothesis: Men and women have different mean multi-tasking scores

### Two-sample t-test

### From actual sample data

In [5]:
stats.ttest_ind(df['Men'],df['Women'])

Ttest_indResult(statistic=-1.1441551070947107, pvalue=0.26754709030461915)
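
The statistic reported above can be reproduced by hand, which makes it clearer what the equal-variance test computes. The following is a minimal sketch, assuming the pooled-variance (Student's) form that `stats.ttest_ind` uses by default. The variable names are illustrative and not part of the original notebook.

```python
import numpy as np
from scipy import stats

# Same scores as in the DataFrame above
men = np.array([50, 80, 50, 80, 90, 80, 50, 60, 50, 50])
women = np.array([100, 70, 60, 80, 80, 80, 80, 60, 50, 60])

n1, n2 = len(men), len(women)

# Sample variances (ddof=1), matching the default of pandas' .std()
v1, v2 = men.var(ddof=1), women.var(ddof=1)

# Pooled variance and standard error of the difference in means
pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
se = np.sqrt(pooled_var * (1 / n1 + 1 / n2))

# t statistic and degrees of freedom for the pooled test
t_stat = (men.mean() - women.mean()) / se
dof = n1 + n2 - 2

# Two-sided p-value from the t distribution
p_value = 2 * stats.t.sf(abs(t_stat), dof)

print(t_stat, p_value)
```

The printed values should agree with the `ttest_ind` result up to floating-point rounding.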

### From descriptive stats - equal variance

In [6]:
# pandas' .std() uses ddof=1 (sample standard deviation), which is what this function expects
stats.ttest_ind_from_stats(mean1=df['Men'].mean(), std1=df['Men'].std(), nobs1=len(df),
                           mean2=df['Women'].mean(), std2=df['Women'].std(), nobs2=len(df),
                           equal_var=True)

Ttest_indResult(statistic=-1.1441551070947109, pvalue=0.2675470903046191)

### From descriptive stats - unequal variance

In [7]:
stats.ttest_ind_from_stats(mean1=df['Men'].mean(), std1=df['Men'].std(), nobs1=len(df),
                           mean2=df['Women'].mean(), std2=df['Women'].std(), nobs2=len(df),
                           equal_var=False)

Ttest_indResult(statistic=-1.1441551070947109, pvalue=0.26772270869674814)
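
With equal group sizes, the unequal-variance statistic is identical to the pooled one; only the degrees of freedom change, which is why the p-value moves slightly. The following is a minimal sketch of the Welch-Satterthwaite approximation, assuming that this is the correction applied when `equal_var=False`. The variable names are illustrative.

```python
import numpy as np
from scipy import stats

men = np.array([50, 80, 50, 80, 90, 80, 50, 60, 50, 50])
women = np.array([100, 70, 60, 80, 80, 80, 80, 60, 50, 60])

n1, n2 = len(men), len(women)

# Variance of each group's sample mean
a = men.var(ddof=1) / n1
b = women.var(ddof=1) / n2

# Welch t statistic: no pooling of the two variances
t_stat = (men.mean() - women.mean()) / np.sqrt(a + b)

# Welch-Satterthwaite approximation to the degrees of freedom
welch_dof = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

# Two-sided p-value using the (non-integer) degrees of freedom
p_value = 2 * stats.t.sf(abs(t_stat), welch_dof)

print(welch_dof, p_value)
```

The approximate degrees of freedom come out slightly below the 18 used by the pooled test, because the two sample variances are not exactly equal.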

The p-value is greater than 0.05 in all three tests, so we fail to reject the null hypothesis at the 5% significance level.

Hence, this sample provides no statistically significant evidence that women are better at multi-tasking than men.

**MYTH BUSTED!!!**

### Power analysis to determine sample size

In [8]:
import statsmodels.stats.power as smp

In [9]:
# effect size for a difference in mean of 10 and a standard deviation of 15
e = 10/15
e

0.6666666666666666
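
The value 10/15 is a planning assumption. For comparison, the standardized effect actually observed in the sample can be computed as Cohen's d, using the pooled standard deviation. This is a minimal sketch with illustrative variable names; it is not part of the original notebook.

```python
import numpy as np

men = np.array([50, 80, 50, 80, 90, 80, 50, 60, 50, 50])
women = np.array([100, 70, 60, 80, 80, 80, 80, 60, 50, 60])

n1, n2 = len(men), len(women)

# Pooled sample standard deviation (ddof=1 for each group)
pooled_sd = np.sqrt(((n1 - 1) * men.var(ddof=1) + (n2 - 1) * women.var(ddof=1))
                    / (n1 + n2 - 2))

# Cohen's d: difference in means in units of the pooled standard deviation
observed_d = (women.mean() - men.mean()) / pooled_sd

print(observed_d)
```

The observed effect is somewhat smaller than the assumed 10/15, so a sample size planned around 10/15 would be on the optimistic side if the true effect is closer to the observed one.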

In [10]:
# solve for n (nobs1), given effect_size, power, and ratio (nobs2/nobs1)
smp.TTestIndPower().solve_power(e, power=0.80, ratio=1, alpha=0.05, alternative='two-sided')

36.305687896793614

The output shows that we need about 36.3 subjects per group. Rounding up gives 37 per group, for a total of 74, to have an 80% chance of detecting a difference of this size at the 5% significance level, if that difference actually exists between the two populations.
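
The same `TTestIndPower` object can also be used in the other direction, to estimate the power of the ten-per-group design that was actually tested. The following is a minimal sketch, assuming the same effect size of 10/15 and a two-sided test at alpha = 0.05. The variable names are illustrative.

```python
import math
import statsmodels.stats.power as smp

analysis = smp.TTestIndPower()
effect_size = 10 / 15  # assumed effect size, as in the cell above

# Required sample size per group, rounded up to a whole subject
n_required = analysis.solve_power(effect_size, power=0.80, ratio=1,
                                  alpha=0.05, alternative='two-sided')
n_per_group = math.ceil(n_required)

# Power of the design with 10 subjects per group
power_at_10 = analysis.power(effect_size, nobs1=10, alpha=0.05,
                             ratio=1, alternative='two-sided')

print(n_per_group, power_at_10)
```

Under these assumptions, ten subjects per group give well under the 80% power targeted above, which is worth keeping in mind when reading the non-significant t-test result.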