# Analysis

The main thing we'll go through today is analysis of the Demo Experiment data. We have data from 8 participants who completed the experiment.

**Before we can start:**

**a**) [Download the demographics data](demog.csv) as a `.csv` file, and place it in a directory that makes sense for you (e.g., called "demo_analysis").

**b**) [Download the stimuli](stimuli.csv) as a `.csv` file, and place it in the same demo analysis folder.

**c**) [Download the trials data](demo_data.zip) as a `.zip` file.

**d**) Extract the `trials` folder from the `.zip` file and put it in your demo analysis folder.

**e**) Start a new R script in RStudio, and save the `.R` file to the demo analysis folder.

After following these instructions, you should have a directory structure that looks something like this:

```
demo_analysis
├─ trials
│  ├─ trials_p171220.csv
│  ├─ trials_p203530.csv
│  └─ ...
├─ analysis.R
├─ demog.csv
└─ stimuli.csv
```
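If you'd like to confirm the layout from R before importing anything, something like the following could work. This is only a sketch: `check_setup()` is a hypothetical helper written for this example, and it assumes the file and folder names shown in the directory structure above.

```r
# Sketch: confirm the expected files and folder exist before importing.
# `check_setup()` is a hypothetical helper, not part of any package.
check_setup <- function(dir = ".") {
  expected <- c("trials", "demog.csv", "stimuli.csv")
  found <- file.exists(file.path(dir, expected))
  names(found) <- expected
  found
}

# run from the demo analysis folder; every entry should be TRUE
check_setup()

# number of trial files found (one per participant is expected)
length(list.files("trials", pattern = "\\.csv$"))
```

If any entry is `FALSE`, check that RStudio's working directory is the demo analysis folder.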

Once everyone has completed these steps, we'll begin...

## 1: Load the Relevant Libraries

These are the libraries we'll use for the analysis. Each library has a comment explaining what we will use it for.

In [19]:
options(repr.plot.width=3.5, repr.plot.height=3)
library(ggplot2)
theme_set(theme_classic() + theme(legend.position="top"))

In [20]:
library(readr)    # for reading the data into R
library(purrr)    # for easily importing multiple files
library(dplyr)    # for wrangling data (e.g., adding/renaming columns)

library(ggplot2)  # for visualising data
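If any of the `library()` calls fails, the package is probably not installed yet. One possible way to check, sketched here with base R only (the variable names are just for illustration):

```r
# packages used in this analysis
pkgs <- c("readr", "purrr", "dplyr", "ggplot2")

# which of them are not installed yet?
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!is_installed]
missing_pkgs

# uncomment to install any that are missing
# install.packages(missing_pkgs)
```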

## 2: Import the Demographic Data

First, we import the demographics data. This gives us some details about the participants who took part.

In [21]:
demog <- read_csv("demog.csv")

Rows: 9 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): participant_id, gender, dominant_hand, first_language
dbl (3): age, frame_rate_Hz, total_time_secs

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.


This will print some details about the columns imported. Let's have a look at the data we imported.

In [22]:
demog

| participant_id | age | gender | dominant_hand | first_language | frame_rate_Hz | total_time_secs |
|---|---|---|---|---|---|---|
| `<chr>` | `<dbl>` | `<chr>` | `<chr>` | `<chr>` | `<dbl>` | `<dbl>` |
| p171220 | 18 | female | right | german | 60.93062 | 720.8187 |
| p203530 | 23 | female | right | german | 60.14155 | 544.3295 |
| p209846 | 23 | male | right | german | 59.94025 | 775.5403 |
| p215942 | 27 | female | right | other | 59.85515 | 734.7305 |
| p233174 | 22 | male | left | german | NA | 597.8216 |
| p331426 | 20 | female | right | german | 59.16292 | 784.9782 |
| p498198 | 22 | female | left | german | 59.87221 | 837.8138 |
| p699576 | 20 | male | right | german | 60.02556 | 1008.8191 |
| p908815 | 19 | male | right | german | NA | NA |
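The import message suggests specifying the column types. Below is a minimal sketch of what that might look like. It uses a two-row stand-in for the demographics file (copied from the table above) so that it runs on its own; the column types are taken from the printed column specification, and `demog_csv` and `demog_example` are names made up for this example.

```r
library(readr)

# a two-row stand-in for the demographics file, so the sketch runs on its own
demog_csv <- I("participant_id,age,gender,dominant_hand,first_language,frame_rate_Hz,total_time_secs
p171220,18,female,right,german,60.93062,720.8187
p908815,19,male,right,german,,
")

# stating the column types up front means no message is printed
demog_example <- read_csv(
  demog_csv,
  col_types = cols(
    participant_id  = col_character(),
    age             = col_double(),
    gender          = col_character(),
    dominant_hand   = col_character(),
    first_language  = col_character(),
    frame_rate_Hz   = col_double(),
    total_time_secs = col_double()
  )
)

# empty cells are imported as NA
demog_example
```

With the real data, the same `col_types` argument could be passed alongside the file path in place of `demog_csv`.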



## 3: Import the Stimuli

Next we import the list of stimuli:

In [24]:
stim <- read_csv("stimuli.csv")

stim

Rows: 300 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): condition, text, corr_ans
dbl (4): item_nr, snd, ZipfSUBTLEX, word_len

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.


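| item_nr | condition | text | corr_ans | snd | ZipfSUBTLEX | word_len |
|---|---|---|---|---|---|---|
| `<dbl>` | `<chr>` | `<chr>` | `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` |
| 1 | word | Akzente | ralt | 0.7164028 | 2.71 | 7 |
| 1 | pseudoword | Unwante | lalt | 0.5337820 | NA | 7 |
| 2 | word | Drücker | ralt | 0.5730488 | 3.01 | 7 |
| 2 | pseudoword | quokker | lalt | 0.5511313 | NA | 7 |
| 3 | word | Erwachsener | ralt | 0.6809480 | 3.95 | 11 |
| 3 | pseudoword | Exzingsener | lalt | 0.5324633 | NA | 11 |
| 4 | word | Exorzisten | ralt | 0.6235078 | 2.64 | 10 |
| 4 | pseudoword | ucinzisten | lalt | 0.4170587 | NA | 10 |
| 5 | word | Fabel | ralt | 0.6343845 | 2.60 | 5 |
| 5 | pseudoword | Pafel | lalt | 0.5519994 | NA | 5 |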



This has 7 columns:

* `item_nr` - an ID for each matched pair of words and pseudowords

* `condition` - whether each item is a `"word"` or `"pseudoword"`

* `text` - the text shown to the participant

* `corr_ans` - whether the correct answer was to press the left alt key (`"lalt"`) or the right alt key (`"ralt"`)

* `snd` - semantic neighbourhood diversity, calculated using the fastText model, using the method described by Hendrix & Sun ([2020](https://doi.org/10.1037/xlm0000819))

* `ZipfSUBTLEX` - a measure of word frequency from a German subtitle corpus: SUBTLEX-DE (Brysbaert et al., [2011](https://doi.org/10.1027/1618-3169/a000123)). For more info on the "Zipf" measure, see Brysbaert et al. ([2017](https://doi.org/10.1177/096372141772752))

* `word_len` - number of letters in the word / pseudoword
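The column descriptions above suggest a couple of quick sanity checks. The sketch below runs them on a small stand-in tibble built from the first four stimuli rows shown earlier; `stim_example` and `pair_counts` are names made up for this example, and the same checks could be applied to `stim`.

```r
library(dplyr)

# first four rows of the stimuli table shown above, as a stand-in
stim_example <- tibble(
  item_nr = c(1, 1, 2, 2),
  condition = c("word", "pseudoword", "word", "pseudoword"),
  text = c("Akzente", "Unwante", "Drücker", "quokker"),
  ZipfSUBTLEX = c(2.71, NA, 3.01, NA)
)

# each item_nr should appear once per condition...
pair_counts <- stim_example |> count(item_nr, condition)
all(pair_counts$n == 1)

# ...and in both conditions (two rows per item)
nrow(pair_counts) == 2 * n_distinct(stim_example$item_nr)

# in the rows shown, word frequency is missing for pseudowords only
stim_example |>
  group_by(condition) |>
  summarise(n_missing_freq = sum(is.na(ZipfSUBTLEX)))
```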

## 4: Import the Trials

Let's import the experiment data from the `trials` folder:

In [26]:
# first, get a list of all .csv data files
data_paths <- list.files("trials", pattern=".*\\.csv$", full.names=TRUE)

# now, iterate over these with `read_csv()` to import each
trials <- map_df(data_paths, read_csv)

# print the contents
trials

Rows: 310 Columns: 8
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): participant_id, condition, text, resp
dbl (3): item_nr, rt_ms, time_so_far_secs
lgl (1): is_practice_trial

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

| participant_id | item_nr | condition | text | is_practice_trial | resp | rt_ms | time_so_far_secs |
|---|---|---|---|---|---|---|---|
| `<chr>` | `<dbl>` | `<chr>` | `<chr>` | `<lgl>` | `<chr>` | `<dbl>` | `<dbl>` |
| p171220 | 4 | pseudoword | segt | TRUE | lalt | 887.1145 | 34.90737 |
| p171220 | 4 | word | sagt | TRUE | ralt | 682.5982 | 37.31386 |
| p171220 | 5 | pseudoword | tobreren | TRUE | lalt | 751.4031 | 39.54651 |
| p171220 | 1 | pseudoword | Bage | TRUE | lalt | 529.5305 | 41.86358 |
| p171220 | 3 | pseudoword | Torrerk | TRUE | lalt | 551.4779 | 43.93993 |
| p171220 | 3 | word | korrekt | TRUE | ralt | 489.4925 | 46.05284 |
| p171220 | 2 | pseudoword | sanzes | TRUE | lalt | 455.5583 | 48.10536 |
| p171220 | 2 | word | Mannes | TRUE | lalt | 565.0679 | 50.11874 |
| p171220 | 1 | word | Haie | TRUE | ralt | 632.8114 | 52.23046 |
| p171220 | 5 | word | sicheren | TRUE | ralt | 588.5707 | 54.41275 |



This gives us one data frame containing all 2,480 trials in the experiment: 310 from each of the 8 participants.

There are 8 columns of data:

* `participant_id` - a unique ID for each participant

* `item_nr` - the same item ID as in `stim`

* `condition` - whether the trial showed a `"word"` or `"pseudoword"`

* `text` - the text that the participant saw

* `is_practice_trial` - whether each trial was a practice trial. The first 10 trials for each participant will be practice trials.

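* `resp` - the key the participant pressed: the left alt key (`"lalt"`) or the right alt key (`"ralt"`)

* `rt_ms` - the response time for the trial, in milliseconds

* `time_so_far_secs` - the time elapsed in the experiment so far, in seconds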
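The import prints one column specification message per file, which gets noisy with several participants. The sketch below shows one way to switch that message off and then check the number of practice and experimental trials per participant. To keep it self-contained, it writes two tiny made-up trial files to a temporary folder; the participant IDs, values, and the `_example` names are hypothetical, and only a few of the real columns are included.

```r
library(readr)
library(purrr)
library(dplyr)

# write two tiny made-up trial files to a temporary folder (stand-in data)
tmp_dir <- file.path(tempdir(), "trials_example")
dir.create(tmp_dir, showWarnings = FALSE)
for (id in c("p000001", "p000002")) {
  tibble(
    participant_id = id,
    item_nr = 1:3,
    is_practice_trial = c(TRUE, FALSE, FALSE),
    rt_ms = c(650, 540, 720)
  ) |>
    write_csv(file.path(tmp_dir, paste0("trials_", id, ".csv")))
}

# same import approach as for `trials`, with the per-file message switched off
example_paths <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)
trials_example <- map_df(example_paths, read_csv, show_col_types = FALSE)

# number of practice and experimental trials per participant
trial_counts <- trials_example |> count(participant_id, is_practice_trial)
trial_counts
```

Applied to the real `trials` data, the same `count()` call should show 10 practice trials for each participant.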

## 5: Join the Data Together

