# 11/16

Independent vs. dependent samples

* independent samples: the two samples come from two unrelated groups (e.g. the observations in each sample come from different subjects)
    * randomized clinical trials
    * experiment vs. control
* dependent samples: the two samples are related in some way (e.g. two groups whose observations are matched, or one group sampled twice)
    * e.g. a group of students about to take a class is given the same test before and after the class to see if they learned. Did their scores improve?
    * before + after (pre + post)
    * repeated measures

In [39]:
import numpy as np
import scipy.stats as stats 

## 2 Sample Independent Example

In [40]:
expiriment = [61,102,119,128,62,158,271,57,266,137]
control = [24,125,43,62,32,138,53,117,97,63]

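Step 1:

null hypothesis (H0):  
The experimental mean is not greater than the control mean ($\mu_{exp} \le \mu_{cont}$)

alternate hypothesis (H1):  
The experimental mean is greater than the control mean ($\mu_{exp} > \mu_{cont}$)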

Step 2:

$\alpha$ = 0.05

Step 3:

$t = \frac{\bar{x}_{exp} - \bar{x}_{cont}}{\sqrt{S_p^2\left(\frac{1}{n_{exp}} + \frac{1}{n_{cont}}\right)}}$

$n_{exp} = 10$

$n_{cont} = 10$
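
The formula in Step 3 relies on the pooled variance $S_p^2$, which is not written out in these notes. The standard definition, matching the `sp2` line in the code cell further down, is sketched here (the symbols $s_{exp}$ and $s_{cont}$ for the two sample standard deviations are notation assumed for this note):

$S_p^2 = \frac{(n_{exp} - 1)s_{exp}^2 + (n_{cont} - 1)s_{cont}^2}{n_{exp} + n_{cont} - 2}$

This weights each sample variance by its degrees of freedom and assumes both groups share a common population variance.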

Step 4:

$\alpha = 0.05$  

df = $n_{exp} + n_{cont} - 2$  
10 + 10 - 2 = 18

t-critical = 1.734

If t-computed > 1.734, reject H0; otherwise the data do not provide sufficient evidence to reject H0.
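
The critical value does not have to come from a printed t-table. As a minimal sketch, it can be reproduced with `scipy.stats.t.ppf`, the inverse CDF of the t distribution. The helper name `one_tailed_t_critical` is made up for this illustration.

```python
import scipy.stats as stats

def one_tailed_t_critical(alpha, df):
    """Upper-tail critical value: the t with P(T > t) = alpha."""
    return stats.t.ppf(1 - alpha, df)

# alpha = 0.05, df = 10 + 10 - 2 = 18
t_critical = one_tailed_t_critical(0.05, 18)
print(round(t_critical, 3))  # 1.734
```

For a two-tailed test the same idea would use `1 - alpha / 2` instead of `1 - alpha`.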

Step 5:

t-computed = 2.1851550736735224

we reject H0 because:
* t-computed (2.185) is > t-critical (1.734)

and also because:
* the one-tailed p-value (0.0212) is less than alpha (0.05)

This means that there is a significant difference between the experimental and control groups: the experimental mean is higher.

In [41]:
exp = [61,102,119,128,62,158,271,57,266,137]
cont = [24,125,43,62,32,138,53,117,97,63]

Xbar_exp = np.mean(exp)
Xbar_cont = np.mean(cont)
s_exp = np.std(exp,ddof=1)
s_cont = np.std(cont,ddof=1)
n_exp = len(exp)
n_cont = len(cont)

df = n_exp + n_cont - 2
sp2 = ((n_exp - 1) * s_exp ** 2 + (n_cont - 1) * s_cont **2) / (n_exp + n_cont - 2) 
t = (Xbar_exp - Xbar_cont)/np.sqrt(sp2 * (1/n_exp + 1 / n_cont))
print("t:",t)


t: 2.1851550736735224


In [42]:
# check work with scipy
# one tailed test
t,p = stats.ttest_ind(exp,cont)
print(t,p/2) # divide by two because it is a one tailed test

2.1851550736735232 0.02117066199247144
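
Halving the two-tailed p-value only gives the right one-tailed result when the sign of t matches the direction of H1. As a hedged alternative, assuming SciPy 1.6 or newer, `ttest_ind` accepts an `alternative` argument that runs the one-tailed test directly. The names `t_stat`, `p_one`, and `p_two` are chosen for this sketch.

```python
import scipy.stats as stats

exp = [61, 102, 119, 128, 62, 158, 271, 57, 266, 137]
cont = [24, 125, 43, 62, 32, 138, 53, 117, 97, 63]

# One-tailed test of H1: mean(exp) > mean(cont); needs SciPy >= 1.6
t_stat, p_one = stats.ttest_ind(exp, cont, alternative="greater")

# Default two-tailed test, for comparison with the p/2 shortcut
_, p_two = stats.ttest_ind(exp, cont)

print(t_stat, p_one, p_two / 2)
```

Because t is positive here, `p_one` and `p_two / 2` agree.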


## 2 Sample Dependent Example
A class takes a typing test before and after they take training

In [43]:
before = [45,52,34,38,47,42,61,53,52,49]
after = [49,56,31,46,54,39,68,55,50,55]

#### Step 1:

H0 - after <= before

H1 - after > before

#### Step 2:

$\alpha = 0.05$

#### Step 3:

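$t = \frac{\bar{d}}{S_d / \sqrt{n}}$

where $d$ = after - before for each student, $\bar{d}$ is the mean of the differences, $S_d$ is their sample standard deviation, and $n$ is the number of pairs.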

#### Step 4:

alpha = 0.05

one tailed

df = n - 1  
= 10-1 = 9

t-critical = 1.833

In [44]:
before = np.array([45,52,34,38,47,42,61,53,52,49])
after = np.array([49,56,31,46,54,39,68,55,50,55])

n = len(after)

d = after - before
dbar = np.mean(d)
Sd = np.std(d,ddof=1)
Sdbar = Sd / np.sqrt(n)

t = dbar / Sdbar

print("t:",t)

t: 2.208963121532172


In [45]:
# check work with scipy
# one tailed test
t,p = stats.ttest_rel(after,before) # rel because the samples are related (dependent)
print(t,p/2) # divide by two because it is a one tailed test

2.208963121532172 0.02726870763166227
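
The one-tailed p-value printed above can also be derived directly from the hand-computed t statistic. This is a minimal sketch using `scipy.stats.t.sf`, the survival function, which gives the upper-tail area P(T > t). The names `t_stat` and `p_one` are chosen for this illustration.

```python
import numpy as np
import scipy.stats as stats

before = np.array([45, 52, 34, 38, 47, 42, 61, 53, 52, 49])
after = np.array([49, 56, 31, 46, 54, 39, 68, 55, 50, 55])

d = after - before   # paired differences
n = len(d)

# Paired t statistic: mean difference over its standard error
t_stat = np.mean(d) / (np.std(d, ddof=1) / np.sqrt(n))

# Upper-tail area of the t distribution with n - 1 degrees of freedom
p_one = stats.t.sf(t_stat, n - 1)

print(t_stat, p_one)
```

Rejecting H0 when `p_one` is below alpha is equivalent to rejecting when t-computed exceeds t-critical.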


#### Step 5:

Since t-computed (2.2089) is greater than t-critical (1.833), we reject H0.

#### Conclusion:

At the 0.05 level of significance, the training session appears to have improved typing skills.