# Homework 4: Statistics Exercise

```python
# You must run this cell before starting your assignment

!pip install -q otter-grader

import otter
grader = otter.Notebook("hw4.ipynb")

import numpy as np
import pandas as pd

from scipy.stats import spearmanr
```

We previously reviewed Shepard and Metzler's (1971) mental rotation data. The same data is loaded again below.

```python
rotations = pd.DataFrame({
    'angle': [0, 20, 40, 60, 80, 100, 120, 140, 160, 180],
    'rt': [1023, 1167, 1382, 1578, 1842, 1976, 2198, 2445, 2583, 2791]
})
rotations
```

|   | angle | rt |
|---|------:|---:|
| 0 | 0 | 1023 |
| 1 | 20 | 1167 |
| 2 | 40 | 1382 |
| 3 | 60 | 1578 |
| 4 | 80 | 1842 |
| 5 | 100 | 1976 |
| 6 | 120 | 2198 |
| 7 | 140 | 2445 |
| 8 | 160 | 2583 |
| 9 | 180 | 2791 |


**Problem 1**: Run a permutation test to assess the potential linear relationship between the variables in this data.

Additional specifications:
- The observed test statistic for the above data should be a float called `shep_observed`.
- Compute correlation using numpy: `np.corrcoef(x, y)[0, 1]`.
- Manipulate only the `rt` column, not the `angle` column.
- Remember to use `df = df.copy()` where applicable.
- Run exactly 10,000 simulations and store the results in a numpy array called `shep_null`.
- Store the p-value in a float called `shep_p`.
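The procedure described by these specifications can be sketched as a small reusable function. This is a minimal sketch, not the graded solution: `perm_test_corr` is a hypothetical helper name, and it draws from its own `np.random.default_rng` Generator, so its random numbers will not match the seeded `np.random` sequence the grader expects. Note that `np.corrcoef(x, y)` returns a 2x2 correlation matrix, which is why `[0, 1]` is used to pick out the x-y entry.

```python
import numpy as np

def perm_test_corr(x, y, n_permutations=10_000, seed=None):
    """Hypothetical helper: one-sided permutation test for a positive Pearson correlation.

    Only y is shuffled; x stays fixed, mirroring "manipulate only the rt column".
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)

    # np.corrcoef returns a 2x2 matrix; [0, 1] is the x-y correlation.
    observed = np.corrcoef(x, y)[0, 1]

    null = np.empty(n_permutations)
    for i in range(n_permutations):
        # rng.permutation returns a shuffled copy and leaves y unchanged.
        null[i] = np.corrcoef(x, rng.permutation(y))[0, 1]

    # Upper-tail p-value: share of shuffled correlations >= the observed one.
    p_value = np.mean(null >= observed)
    return observed, null, p_value

angle = [0, 20, 40, 60, 80, 100, 120, 140, 160, 180]
rt = [1023, 1167, 1382, 1578, 1842, 1976, 2198, 2445, 2583, 2791]
obs, null, p = perm_test_corr(angle, rt, seed=42)
print(obs, p)
```

Because both columns here are strictly increasing, only the unshuffled ordering reaches the observed correlation, so the p-value from this sketch is expected to be at or very near 0.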

```python
np.random.seed(0) # DO NOT change this line

# Your code goes here

def one_permutation(df):
    """Return a copy of df with the 'rt' column shuffled, breaking any x-y association."""
    df_copy = df.copy()
    df_copy['rt'] = df_copy['rt'].sample(frac=1).values
    return df_copy

def compute_p_value(test_stat: float, null_stats: np.ndarray) -> float:
    """One-sided p-value: proportion of null statistics at least as large as test_stat."""
    return np.mean(np.asarray(null_stats) >= test_stat)

df = rotations.copy()
shep_null = np.zeros(10000)
# Observed Pearson correlation between angle and rt
shep_observed = np.corrcoef(df['angle'], df['rt'])[0, 1]

# Null distribution: correlation after shuffling rt, repeated 10,000 times
for i in range(10000):
    permu = one_permutation(df)
    shep_null[i] = np.corrcoef(permu['angle'], permu['rt'])[0, 1]

shep_p = np.float64(compute_p_value(shep_observed, shep_null))
```
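`compute_p_value` counts only null statistics at least as large as the observed one, i.e., a one-sided (upper-tail) test, which suits a predicted positive correlation. If the direction of the effect were not predicted in advance, a common two-sided variant compares absolute values instead. A minimal sketch; `one_sided_p_value` and `two_sided_p_value` are hypothetical names, not part of the assignment or the grader, and the toy `null` array is made up for illustration.

```python
import numpy as np

def one_sided_p_value(test_stat, null_stats):
    """Upper-tail p-value: proportion of null statistics >= the observed statistic."""
    return np.mean(np.asarray(null_stats) >= test_stat)

def two_sided_p_value(test_stat, null_stats):
    """Two-sided p-value: proportion of null statistics at least as extreme in absolute value.

    Assumes the null distribution is roughly symmetric around 0, as shuffled correlations are.
    """
    return np.mean(np.abs(np.asarray(null_stats)) >= abs(test_stat))

null = np.array([-0.9, -0.5, -0.1, 0.0, 0.2, 0.6, 0.8])
print(one_sided_p_value(0.6, null))  # 2 of 7 values are >= 0.6
print(two_sided_p_value(0.6, null))  # 3 of 7 values have |value| >= 0.6
```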

```python
grader.check("q1")
```

We previously reviewed Hick's (1952) choice data. The same data is loaded again below.

```python
np.random.seed(1)
base_rts = {1: 180, 2: 250, 3: 290, 4: 310, 5: 325, 6: 335, 7: 345, 8: 350, 9: 355, 10: 360}
hick_data = {'n_alternatives': [], 'rt': []}
for n, base_rt in base_rts.items():
    hick_data['n_alternatives'].append(n)
    noise = np.random.normal(0, base_rt * 0.01)
    rt = max(base_rt + noise, 150)
    hick_data['rt'].append(round(rt, 1))
choices = pd.DataFrame(hick_data)
choices
```

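|   | n_alternatives | rt |
|---|---------------:|---:|
| 0 | 1 | 182.9 |
| 1 | 2 | 248.5 |
| 2 | 3 | 288.5 |
| 3 | 4 | 306.7 |
| 4 | 5 | 327.8 |
| 5 | 6 | 327.3 |
| 6 | 7 | 351.0 |
| 7 | 8 | 347.3 |
| 8 | 9 | 356.1 |
| 9 | 10 | 359.1 |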


**Problem 2**: Run a permutation test to assess the potential nonlinear (monotonic) relationship between the variables in Hick's data.

Additional specifications:
- The observed test statistic for the above data should be a float called `hick_observed`.
- Compute Spearman's rho using: `spearmanr(x, y)[0]`.
- Manipulate only the `rt` column, not the `n_alternatives` column.
- Run exactly 10,000 simulations and store the results in a numpy array called `hick_null`.
- Store the p-value in a float called `hick_p`.
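Spearman's rho is Pearson's correlation computed on ranks rather than raw values, which is why it responds to any monotonic trend and not only a linear one. The sketch below checks this on the Hick values displayed above (the `rt` array is copied from that table), using `scipy.stats.rankdata`, which assigns average ranks to ties.

```python
import numpy as np
from scipy.stats import rankdata, spearmanr

n_alternatives = np.arange(1, 11)
rt = np.array([182.9, 248.5, 288.5, 306.7, 327.8, 327.3, 351.0, 347.3, 356.1, 359.1])

# Spearman's rho as computed by scipy
rho = spearmanr(n_alternatives, rt)[0]

# Same quantity by hand: rank each variable, then take Pearson's r of the ranks
rho_by_hand = np.corrcoef(rankdata(n_alternatives), rankdata(rt))[0, 1]

print(rho, rho_by_hand)  # both ≈ 0.9758
```

Only two adjacent pairs of reaction times are out of order here, so rho stays close to 1 even though the curve flattens out, whereas a linear fit would be penalized by that curvature.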

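```python
np.random.seed(0) # DO NOT change this line
# spearmanr was imported in the setup cell; one_permutation and
# compute_p_value were defined in the Problem 1 cell.

# Your code goes here
df = choices.copy()
hick_null = np.zeros(10000)
# Observed Spearman's rho between number of alternatives and rt
hick_observed = spearmanr(df['n_alternatives'], df['rt'])[0]

# Null distribution: rho after shuffling rt, repeated 10,000 times
for i in range(10000):
    permu = one_permutation(df)
    hick_null[i] = spearmanr(permu['n_alternatives'], permu['rt'])[0]

hick_p = np.float64(compute_p_value(hick_observed, hick_null))
```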

```python
grader.check("q2")
```

We previously reviewed Craik and Tulving's (1975) memory retention data. 

Below, we load data from a **within-subjects** version of this experiment wherein each participant is exposed to stimuli from both processing conditions. Each row is one participant, and each column is one of two conditions.

```python
mem = pd.DataFrame({
    'deep':    [12, 9, 11, 9, 13, 10, 9, 11, 10, 12],
    'shallow': [9,  8, 10, 11, 7, 11, 9, 10, 10, 8]
})
mem
```

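|   | deep | shallow |
|---|-----:|--------:|
| 0 | 12 | 9 |
| 1 | 9 | 8 |
| 2 | 11 | 10 |
| 3 | 9 | 11 |
| 4 | 13 | 7 |
| 5 | 10 | 11 |
| 6 | 9 | 9 |
| 7 | 11 | 10 |
| 8 | 10 | 10 |
| 9 | 12 | 8 |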


**Problem 3**: Run a permutation test to assess the potential difference in memory scores between the `deep` and `shallow` conditions in the above data.

**NOTE:** Because we are now working with a within-subjects version of this data, we can't apply exactly the same permutation procedure that we used previously for the between-groups version. In within-subjects designs, observations are said to be "paired": each subject contributes one measurement in each of the two conditions. These observations are also said to be "dependent" because both come from the same person. We can still break any systematic relationship between condition labels and measurements by shuffling, but we have to shuffle **within** each participant. That is, we keep each participant's pair of measurements together but remove the relationship between measurements and conditions. To do this, we "flip a coin" for each participant: if heads, we swap their deep and shallow scores; if tails, we leave them unchanged. One way to flip a coin in Python is to evaluate `np.random.random() < 0.5` and check whether the resulting bool is `True` or `False`.
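Because the mean of the deep scores minus the mean of the shallow scores equals the mean of the per-participant differences, swapping one participant's scores simply negates that participant's difference. The coin-flip procedure can therefore be written as random sign flips on the difference scores. This is a vectorized sketch of that equivalence, not the graded solution: it uses its own `np.random.default_rng` Generator, so its draws (and exact p-value) will not match the seeded `np.random` sequence in the assignment cell.

```python
import numpy as np

deep = np.array([12, 9, 11, 9, 13, 10, 9, 11, 10, 12])
shallow = np.array([9, 8, 10, 11, 7, 11, 9, 10, 10, 8])

# Per-participant difference scores; mean(deep) - mean(shallow) == mean(diff)
diff = deep - shallow
observed = diff.mean()

rng = np.random.default_rng(0)
n_sims = 10_000

# One coin per participant per simulation: True means "swap this person's scores"
flips = rng.random((n_sims, diff.size)) < 0.5

# Swapping deep/shallow for a participant just negates their difference score
signs = np.where(flips, -1, 1)
null = (signs * diff).mean(axis=1)

# One-sided p-value, matching the direction of the deep > shallow prediction
p_value = np.mean(null >= observed)
print(observed, p_value)
```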

Additional specifications:
- The observed test statistic for the above data should be a float called `mem_observed`.
- To swap values in Python, either assign simultaneously, e.g., `a, b = b, a` OR use a temporary third variable.
- Make sure to use `df = df.copy()` where applicable.
- Run exactly 10,000 simulations and store the results in a numpy array called `mem_null`.
- Store the p-value in a float called `mem_p`.
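The `a, b = b, a` swap works one row at a time. When swapping a random subset of rows at once with `.loc`, there is a pandas pitfall: assigning a DataFrame aligns on column labels, so `df.loc[mask, ['deep', 'shallow']] = df.loc[mask, ['shallow', 'deep']]` would write each column back onto itself. Converting the right-hand side with `.to_numpy()` drops the labels so the values really are exchanged. A minimal sketch; `swap_random_rows` is a hypothetical helper, and it uses its own Generator rather than the seeded global state.

```python
import numpy as np
import pandas as pd

mem = pd.DataFrame({
    'deep':    [12, 9, 11, 9, 13, 10, 9, 11, 10, 12],
    'shallow': [9,  8, 10, 11, 7, 11, 9, 10, 10, 8]
})

def swap_random_rows(df, rng):
    """Hypothetical helper: return a copy of df with deep/shallow swapped in a random subset of rows."""
    df = df.copy()  # never modify the caller's DataFrame
    flip = rng.random(len(df)) < 0.5  # one coin flip per participant
    # .to_numpy() strips the column labels, so pandas cannot re-align
    # 'shallow' back onto 'shallow'; the values are actually exchanged.
    df.loc[flip, ['deep', 'shallow']] = df.loc[flip, ['shallow', 'deep']].to_numpy()
    return df

rng = np.random.default_rng(0)
shuffled = swap_random_rows(mem, rng)
print(shuffled)
```

Every row of `shuffled` is either unchanged or has its two scores exchanged, and `mem` itself is left untouched because of the `.copy()`.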

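```python
np.random.seed(0) # DO NOT change this line

# Your code goes here

def swap(df, i):
    """Flip a coin; on heads, swap row i's 'deep' and 'shallow' scores in place."""
    if np.random.random() < 0.5:
        # The right-hand side is evaluated first, so this exchanges the two values.
        df.loc[i, 'deep'], df.loc[i, 'shallow'] = df.loc[i, 'shallow'], df.loc[i, 'deep']

df = mem.copy()
mem_null = np.zeros(10000)
# Observed statistic: difference in mean scores (deep minus shallow)
mem_observed = df['deep'].mean() - df['shallow'].mean()

for i in range(10000):
    permu = df.copy()  # fresh copy so each simulation starts from the real data
    for j in range(len(permu)):
        swap(permu, j)
    mem_null[i] = permu['deep'].mean() - permu['shallow'].mean()

mem_p = np.float64(compute_p_value(mem_observed, mem_null))
```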

```python
grader.check("q3")
```

Note that for parametric within-subjects tests, one can import and make use of `ttest_rel` from `scipy.stats` to perform tests with a single line of code. Such tests go by multiple names: "repeated-measures t-test", "paired-samples t-test", or "dependent-samples t-test".
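A paired-samples t-test is the same as a one-sample t-test on the per-participant difference scores against 0, which parallels the sign-flip view of the paired permutation test. The sketch below checks this with `scipy.stats.ttest_rel` and `scipy.stats.ttest_1samp` on the memory data above. Both default to two-sided p-values; recent SciPy versions also accept an `alternative` argument (for example `alternative='greater'`) if a one-sided p-value comparable to `mem_p` is wanted.

```python
import numpy as np
from scipy.stats import ttest_1samp, ttest_rel

deep = np.array([12, 9, 11, 9, 13, 10, 9, 11, 10, 12])
shallow = np.array([9, 8, 10, 11, 7, 11, 9, 10, 10, 8])

# Paired-samples t-test on the two conditions
paired = ttest_rel(deep, shallow)

# Equivalent: one-sample t-test of the difference scores against 0
one_sample = ttest_1samp(deep - shallow, 0)

print(paired.statistic, paired.pvalue)
print(one_sample.statistic, one_sample.pvalue)
```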