# Assignment 11

Please fill in blanks in the *Answer* sections of this notebook. To check your answer for a problem, run the Setup, Answer, and Result sections. DO NOT MODIFY SETUP OR RESULT CELLS. See the [README](https://github.com/mortonne/datascipsych) for instructions on setting up a Python environment to run this notebook.

Write your answers for each problem. Then restart the kernel, run all cells, and then save the notebook. Upload your notebook to Canvas.

If you get stuck, read through the other notebooks in this directory, ask us for help in class, or ask other students for help in class or on the weekly discussion board.

## Problem: one-sample t-test (3 points)

### Run one-sample t-test (2 points)
Given the `targets` DataFrame defined below, use `pg.ttest` to run a one-sample t-test assessing whether the mean response is greater than 0.5. Assign the results DataFrame to a variable called `ttest1`.

### Interpret the results (1 point)
Answer the questions about interpretation of the t-test (0.5 points per question). Edit the markdown cell (you can double-click on it to edit it, or click on the pencil icon) to add your answer below each question. Then click on the check icon to switch back to rendered text.

### Setup

In [16]:
import polars as pl
import pingouin as pg
from IPython.display import display
data = pl.read_csv("gen_recog2.csv")
targets = (
    data.filter(pl.col("trial_type") == "target")
    .group_by("subject")
    .agg(pl.col("response").mean())
    .sort("subject")
)
ttest1 = None
targets.head()

subject,response
str,f64
"""subj01""",0.566667
"""subj02""",0.566667
"""subj03""",0.583333
"""subj04""",0.7
"""subj05""",0.783333


### Answer

In [17]:
ttest1 = pg.ttest(targets["response"], 0.5, alternative="greater")

> Was the mean response significantly greater than 0.5 (yes or no)?

yes

> What was the effect size?

1.356984


### Result

In [18]:
vars = [ttest1]
if all([v is not None for v in vars]):
    # this should print your variables
    display(ttest1)
    
    # this should not throw any errors
    assert round(ttest1.loc["T-test", "T"], 2) == 7.43
    assert ttest1.loc["T-test", "dof"] == 29
    assert ttest1.loc["T-test", "alternative"] == "greater"

Unnamed: 0,T,dof,alternative,p-val,CI95%,cohen-d,BF10,power
T-test,7.432506,29,greater,1.71833e-08,"[0.59, inf]",1.356984,815500.0,1.0
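
For reference, the one-sample t-statistic and Cohen's d can be computed by hand from the sample mean and standard deviation. The sketch below uses only the five subjects shown in the `targets.head()` output above (not the full sample of 30), so its numbers differ from the `pg.ttest` result; the variable names are illustrative. It shows the relationship $t = d\sqrt{n}$, which also holds for the full-sample output ($1.356984 \times \sqrt{30} \approx 7.43$).

```python
from math import sqrt
from statistics import mean, stdev

# Mean responses for the first five subjects only (from targets.head()).
responses = [0.566667, 0.566667, 0.583333, 0.7, 0.783333]
mu0 = 0.5  # value of the mean under the null hypothesis

n_subjects = len(responses)
sample_mean = mean(responses)
sample_sd = stdev(responses)  # sample standard deviation (n - 1 denominator)

# Cohen's d: distance of the mean from mu0 in standard deviation units.
cohen_d = (sample_mean - mu0) / sample_sd

# t-statistic: distance of the mean from mu0 in standard error units.
t_stat = (sample_mean - mu0) / (sample_sd / sqrt(n_subjects))
dof = n_subjects - 1

print(f"d = {cohen_d:.3f}, t({dof}) = {t_stat:.3f}")
```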


## Problem: two-sample paired t-test (3 points)

### Run paired t-test (2 points)
Given the `p_old` DataFrame defined below, use `pg.ttest` to run a two-sample paired t-test assessing whether the probability of responding "old" was different depending on whether an item was a word or a picture. Assign the results DataFrame to a variable called `ttest2`.

### Interpret the results (1 point)
Answer the questions about interpretation of the t-test (0.5 points per question). Edit the markdown cell (you can double-click on it to edit it, or click on the pencil icon) to add your answer below each question. Then click on the check icon to switch back to rendered text.

### Setup

In [19]:
p_old = (
    data.filter(pl.col("trial_type") == "target")
    .pivot("item_type", index="subject", values="response", aggregate_function="mean")
)
ttest2 = None
p_old.head()

subject,word,picture
str,f64,f64
"""subj01""",0.733333,0.4
"""subj02""",0.533333,0.6
"""subj03""",0.566667,0.6
"""subj04""",0.666667,0.733333
"""subj05""",0.733333,0.833333


### Answer

In [20]:
ttest2 = pg.ttest(p_old["word"], p_old["picture"], paired=True)

> Was the probability of responding "old" different between words and pictures (yes or no)?

no

> What was the t-statistic?

-0.393257

### Result

In [21]:
vars = [ttest2]
if all([v is not None for v in vars]):
    # this should print your variables
    display(ttest2)
    
    # this should not throw any errors
    assert round(ttest2.loc["T-test", "T"], 2) == -0.39
    assert ttest2.loc["T-test", "dof"] == 29
    assert ttest2.loc["T-test", "alternative"] == "two-sided"

Unnamed: 0,T,dof,alternative,p-val,CI95%,cohen-d,BF10,power
T-test,-0.393257,29,two-sided,0.697006,"[-0.06, 0.04]",0.093302,0.209,0.078431
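
A paired t-test is equivalent to a one-sample t-test on the within-subject differences, tested against zero. The sketch below illustrates this using only the five subjects shown in the `p_old.head()` output above, so the statistic differs from the full-sample `pg.ttest` result; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# P("old") for the first five subjects only (from p_old.head()).
word = [0.733333, 0.533333, 0.566667, 0.666667, 0.733333]
picture = [0.4, 0.6, 0.6, 0.733333, 0.833333]

# Within-subject differences; the paired test asks whether their mean is zero.
diffs = [w - p for w, p in zip(word, picture)]
n_pairs = len(diffs)

# One-sample t-statistic on the differences, with n_pairs - 1 degrees of freedom.
t_paired = mean(diffs) / (stdev(diffs) / sqrt(n_pairs))
dof_paired = n_pairs - 1

print(f"t({dof_paired}) = {t_paired:.3f}")
```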


## Problem: one-way repeated-measures ANOVA (3 points)

### Run one-way ANOVA (2 points)
Given the `m_st` DataFrame defined below, use `pg.rm_anova` to run a one-way repeated-measures ANOVA assessing whether performance varies with study time. Assign the result to a variable called `anova1`.

### Interpret the results (1 point)
Answer the questions about interpretation of the ANOVA (0.5 points per question). Edit the markdown cell (you can double-click on it to edit it, or click on the pencil icon) to add your answer below each question. Then click on the check icon to switch back to rendered text.

### Setup

In [22]:
m_st = (
    data.pivot("trial_type", index=["subject", "study_time"], values="response", aggregate_function="mean")
    .select(
        "subject",
        "study_time",
        performance=pl.col("target") - pl.col("lure")
    )
)
anova1 = None
m_st.head()

subject,study_time,performance
str,i64,f64
"""subj01""",1,0.2
"""subj01""",2,0.0
"""subj01""",4,0.45
"""subj02""",1,0.1
"""subj02""",2,0.2


### Answer

In [23]:
anova1 = pg.rm_anova(m_st.to_pandas(), dv="performance", within="study_time", subject="subject")

> Did study time significantly affect performance (yes or no)?

yes

> What was the effect size?

0.225539

### Result

In [24]:
vars = [anova1]
if all([v is not None for v in vars]):
    # this should print your variables
    display(anova1)
    
    # this should not throw any errors
    assert round(anova1.loc[0, "F"], 2) == 15.57
    assert anova1.loc[0, "ddof1"] == 2
    assert anova1.loc[0, "ddof2"] == 58

Unnamed: 0,Source,ddof1,ddof2,F,p-unc,ng2,eps
0,study_time,2,58,15.573874,4e-06,0.225539,0.965251


## Problem: two-way repeated-measures ANOVA (3 points)

### Run two-way ANOVA (1.5 points) 
Given the `m_st_it` DataFrame defined below, use `pg.rm_anova` to run a two-way repeated-measures ANOVA assessing whether performance varies with study time and item type. Assign the result to a variable called `anova2`.

You will probably get some `FutureWarning` messages printed when you run `pg.rm_anova`. This is normal (it just means that the authors of Pingouin need to make some updates in their code to avoid problems in the future).

### Interpret the results (1.5 points)
Answer the questions about interpretation of the ANOVA (0.5 points per question). Edit the markdown cell (you can double-click on it to edit it, or click on the pencil icon) to add your answer below each question. Then click on the check icon to switch back to rendered text.

### Setup

In [25]:
m_st_it = (
    data.pivot("trial_type", index=["subject", "study_time", "item_type"], values="response", aggregate_function="mean")
    .select(
        "subject",
        "study_time",
        "item_type",
        performance=pl.col("target") - pl.col("lure")
    )
)
anova2 = None
m_st_it.head(6)

subject,study_time,item_type,performance
str,i64,str,f64
"""subj01""",1,"""word""",0.2
"""subj01""",1,"""picture""",0.2
"""subj01""",2,"""word""",-0.1
"""subj01""",2,"""picture""",0.1
"""subj01""",4,"""word""",0.6
"""subj01""",4,"""picture""",0.3


### Answer

In [37]:
anova2 = pg.rm_anova(m_st_it.to_pandas(), dv="performance", within=["study_time", "item_type"], subject="subject")


> Did you observe a significant main effect of study time (yes or no)?

yes

> Did you observe a significant main effect of item type (yes or no)?

no

> Did you observe a significant interaction effect (yes or no)?

no

### Result

In [38]:
vars = [anova2]
if all([v is not None for v in vars]):
    # this should print your variables
    display(anova2)
    
    # this should not throw any errors
    assert round(anova2.loc[0, "F"], 2) == 15.57
    assert round(anova2.loc[1, "F"], 2) == 3.89
    assert round(anova2.loc[2, "F"], 2) == 0.23
    assert anova2.loc[0, "ddof1"] == 2
    assert anova2.loc[0, "ddof2"] == 58
    assert anova2.loc[1, "ddof1"] == 1
    assert anova2.loc[1, "ddof2"] == 29
    assert anova2.loc[2, "ddof1"] == 2
    assert anova2.loc[2, "ddof2"] == 58

Unnamed: 0,Source,SS,ddof1,ddof2,MS,F,p-unc,p-GG-corr,ng2,eps
0,study_time,1.443,2,58,0.7215,15.573874,4e-06,5e-06,0.140566,0.965251
1,item_type,0.150222,1,29,0.150222,3.890454,0.058164,0.058164,0.016742,1.0
2,study_time * item_type,0.022111,2,58,0.011056,0.233351,0.792616,0.784701,0.0025,0.964082


## Problem (graduate students): t-test power analysis (3 points)

Read about [power analysis](https://en.wikipedia.org/wiki/Power_(statistics)) and the Pingouin `power_ttest` [function](https://pingouin-stats.org/build/html/generated/pingouin.power_ttest.html). The probability of making a false-positive error is often called $\alpha$. Null hypothesis significance testing (NHST) is designed to control this probability. It is also important to consider false-negative errors; the probability of a false-negative error is called $\beta$. The *statistical power* is the probability of detecting an effect if it is present, defined as $1 - \beta$. Pingouin and other statistical packages allow us to estimate the statistical power of a given inferential test based on the effect size and sample size.

Calculate the statistical power for a two-sample t-test if $d=0.6$ and $n=20$. Assign it to a variable called `power`.

Calculate the required sample size for a paired t-test if $d=0.45$ and $\mathrm{power}=0.8$, and the alternative hypothesis is `"greater"`. Assign it to a variable called `n`.

Calculate the effect size that a one-sample t-test can detect given $n=30$ and $\mathrm{power}=0.8$. Assign it to a variable called `d`.
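
To build intuition for how power depends on effect size and sample size, here is a minimal sketch that uses a normal approximation to the two-sided, two-sample test. This is not what `power_ttest` computes, so its values will differ somewhat from Pingouin's, especially for small samples. The function name and the example values ($d=0.5$ and the listed group sizes) are hypothetical and are not the values asked for in this problem.

```python
from math import sqrt
from statistics import NormalDist


def approx_power_two_sample(d: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample test with n per group.

    Uses a normal approximation to the t distribution (an assumption of
    this sketch), which slightly overestimates power for small n.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    shift = d * sqrt(n / 2)  # expected test statistic under the alternative
    # Probability of landing in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Power grows with sample size for a fixed effect size.
for n_per_group in (10, 20, 40, 80):
    print(n_per_group, round(approx_power_two_sample(0.5, n_per_group), 3))
```

When $d = 0$, the function returns $\alpha$: with no true effect, every rejection is a false positive.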

### Setup

In [28]:
power = None
n = None
d = None

### Answer

In [29]:
# your code here

### Result

In [30]:
vars = [power, n, d]
if all([v is not None for v in vars]):
    # this should print your variables
    print(power, n, d)
    
    # this should not throw any errors
    assert round(power, 2) == 0.46
    assert round(n, 2) == 31.93
    assert round(d, 2) == 0.53

## Problem (graduate students): ANOVA power analysis (3 points)

Read about the Pingouin `power_rm_anova` [function](https://pingouin-stats.org/build/html/generated/pingouin.power_rm_anova.html). This [documentation](https://cran.r-project.org/web/packages/effectsize/vignettes/anovaES.html) from the R `effectsize` package gives background on the $\eta^2$ measure of effect size.

Calculate the statistical power for a repeated-measures ANOVA where $\eta^2=0.1$, there are three repeated measurements $m$ (that is, three within-subjects conditions), and 10 participants $n$. Assign it to a variable called `power`.

Calculate the required sample size for a repeated-measures ANOVA where $\eta^2=0.1$, there are four within-subjects conditions $m$, and $\mathrm{power}=0.8$. Assign it to a variable called `n`.

Calculate the required effect size for a repeated-measures ANOVA where there are 25 participants $n$, five within-subjects conditions $m$, and $\mathrm{power}=0.8$. Assign it to a variable called `eta_squared`.
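
ANOVA effect sizes are sometimes reported as Cohen's $f$ rather than $\eta^2$. The two are related by $f = \sqrt{\eta^2 / (1 - \eta^2)}$. The sketch below shows this textbook conversion in both directions; the helper names are hypothetical and are not part of Pingouin.

```python
from math import sqrt


def eta_squared_to_f(eta_sq: float) -> float:
    """Convert eta-squared to Cohen's f."""
    return sqrt(eta_sq / (1 - eta_sq))


def f_to_eta_squared(f: float) -> float:
    """Convert Cohen's f back to eta-squared."""
    return f**2 / (1 + f**2)


f_value = eta_squared_to_f(0.1)
print(round(f_value, 3))
# Converting back recovers the original eta-squared.
print(round(f_to_eta_squared(f_value), 3))
```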

### Setup

In [31]:
power = None
n = None
eta_squared = None

### Answer

In [32]:
# your code here

### Result

In [33]:
vars = [power, n, eta_squared]
if all([v is not None for v in vars]):
    # this should print your variables
    print(power, n, eta_squared)
    
    # this should not throw any errors
    assert round(power, 2) == 0.56
    assert round(n, 2) == 13.61
    assert round(eta_squared, 2) == 0.05

## Problem (graduate students): pairwise tests (2 points)

The [pairwise_tests](https://pingouin-stats.org/build/html/generated/pingouin.pairwise_tests.html#pingouin.pairwise_tests) function can be used to compare all individual groups of a given factor. It includes various methods (defined using the `padjust` input) for dealing with the [multiple comparisons problem](https://en.wikipedia.org/wiki/Multiple_comparisons_problem). When many statistical tests are run, the probability of there being at least one false positive increases rapidly. Various methods have been developed to control the rate of false positives while keeping statistical power relatively high.

Given the `m_st` DataFrame, which has the average recognition memory performance for each subject and study time (1, 2, or 4 seconds), use `pg.pairwise_tests` to test all pairwise comparisons between study time conditions. Use the FDR-BH method (false-discovery rate control using the Benjamini–Hochberg procedure) to calculate adjusted p-values. Assign the resulting DataFrame to a variable called `pw_tests`.
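
To see why a correction is needed and what the Benjamini–Hochberg adjustment does, here is a minimal pure-Python sketch. The function names and the example p-values are hypothetical; they are not results from `m_st`. The family-wise formula assumes the tests are independent.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests


def fdr_bh(p_values: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, scaling each by m / rank and
    # enforcing that adjusted values never increase as rank decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# With three tests at alpha = 0.05, the chance of any false positive
# is noticeably higher than 0.05.
print(round(familywise_error_rate(0.05, 3), 4))

# Hypothetical unadjusted p-values for three pairwise comparisons.
print(fdr_bh([0.01, 0.04, 0.03]))
```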

### Setup

In [34]:
pw_tests = None
m_st.head()

subject,study_time,performance
str,i64,f64
"""subj01""",1,0.2
"""subj01""",2,0.0
"""subj01""",4,0.45
"""subj02""",1,0.1
"""subj02""",2,0.2


### Answer

In [35]:
# your code here

### Result

In [36]:
vars = [pw_tests]
if all([v is not None for v in vars]):
    # this should print your variables
    display(pw_tests)
    
    # this should not throw any errors
    assert (pw_tests["p-adjust"] == "fdr_bh").all()
    assert round(pw_tests.loc[0, "T"], 2) == -3.65
    assert pw_tests.loc[0, "dof"] == 29
    assert round(pw_tests.loc[1, "T"], 2) == -5.10
    assert pw_tests.loc[1, "dof"] == 29
    assert round(pw_tests.loc[2, "T"], 2) == -1.96
    assert pw_tests.loc[2, "dof"] == 29