# Week 08: Basic Inference Statistics

## Introduction 

In this tutorial, we will learn how to perform basic statistical tests used in null-hypothesis testing.

**Preparation and session set up**

Before running the code below, please install the required packages. If you have already installed them, you can skip this step. To install the packages, uncomment and run the following code. Installation may take between 1 and 5 minutes, so there is no need to worry if it takes some time.


In [None]:
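# install packages (uncomment and run once)
#install.packages("here")
#install.packages("dplyr")
#install.packages("ggplot2")
#install.packages("readxl")
#install.packages("tidyr")
#install.packages("effectsize")
#install.packages("report")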


Now that we have installed the packages, we activate them as shown below.



In [None]:
# activate packages
library(here)
library(dplyr)
library(ggplot2)


## Tutorial Activity

Form groups and help each other to bring the data into the correct format, visualize the data, and perform the test.

## Task 1

In English, we commonly use suffixation, the plural -s, to indicate number (*one tree* vs *three tree***s**), while other languages, e.g., Japanese or Chinese, do not use suffixation to indicate plurality (*one tree* vs *three tree*).

The data represent nouns produced by intermediate and advanced learners of English and indicate whether each noun contained a number-marking error.

RQ: Do more advanced learners produce fewer errors in terms of number marking?

Perform a χ2-test to answer the RQ.

**Load data**


In [None]:
# load data
dat <- readxl::read_excel(here::here("data", "week8d1.xlsx"))
# inspect
head(dat)


Bring the data into the correct format.



In [None]:
x2dat <- dat %>%
  dplyr::group_by(Proficiency, PluralError) %>%
  dplyr::summarise(Errors = n()) %>%
  tidyr::spread(PluralError, Errors) %>%
  dplyr::ungroup() %>%
  dplyr::select(-Proficiency) %>%
  as.matrix()
rownames(x2dat) <- names(table(dat$Proficiency))
x2dat
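
The same kind of contingency table can also be built with base R's `table()`. The following is only a minimal sketch on invented toy data: the column names follow the tutorial, but the values and the labels (`Correct`, `Error`) are assumptions and may differ from those in `week8d1.xlsx`.

```r
# Toy data: the values are invented for illustration; only the column
# names follow the tutorial, and the labels are assumptions.
toy <- data.frame(
  Proficiency = c("Advanced", "Advanced", "Advanced",
                  "Intermediate", "Intermediate", "Intermediate"),
  PluralError = c("Correct", "Correct", "Error",
                  "Correct", "Error", "Error")
)

# Cross-tabulate proficiency (rows) by error status (columns)
toytab <- table(toy$Proficiency, toy$PluralError)
toytab
```

A table object like `toytab` can be passed to `chisq.test()` in the same way as a matrix.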


Visualize the data

Association plot


In [None]:
assocplot(x2dat)



Mosaic plot



In [None]:
mosaicplot(x2dat, shade = TRUE)



In [None]:
chisq.test(x2dat)
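
A common rule of thumb for the χ2-test is that all expected frequencies should be at least 5. The sketch below shows how the expected frequencies can be inspected. The counts in `toymat` are invented for illustration and are not taken from the tutorial data. Note that `chisq.test()` applies Yates' continuity correction to 2x2 tables by default (`correct = TRUE`).

```r
# Toy 2x2 table: the counts are invented for illustration
toymat <- matrix(c(20, 30,
                   40, 10),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("Advanced", "Intermediate"),
                                 c("Correct", "Error")))

res <- chisq.test(toymat)

# Expected frequencies under independence: (row total * column total) / n
res$expected

# Check the rule of thumb that all expected frequencies are at least 5
all(res$expected >= 5)
```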



Determine effect size



In [None]:
effectsize::cramers_v(x2dat)
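
For a 2x2 table, the effect size ϕ can also be computed by hand as the square root of the χ2 statistic divided by the number of observations. The following sketch uses invented counts and base R only. Depending on the package version, `effectsize::cramers_v()` may apply a small-sample bias adjustment by default, so its output may differ slightly from the unadjusted value computed here.

```r
# Toy 2x2 table: the counts are invented for illustration
toymat <- matrix(c(20, 30,
                   40, 10),
                 nrow = 2, byrow = TRUE)

# Chi-square statistic without the continuity correction
chi2 <- chisq.test(toymat, correct = FALSE)$statistic

# Total number of observations
n <- sum(toymat)

# phi = sqrt(chi-square / n)
phi <- sqrt(unname(chi2) / n)
phi
```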



Write-up results


> A χ2-test confirmed a highly significant association of moderate size between the proficiency level of speakers and their likelihood of committing errors in plural marking (χ2 = 16.81, df = 1, p < .001***, ϕ = .30).


## Task 2

Paired t-test. 

45 students received training in avoiding errors when writing essays in English.

The students were asked to write a 1,000-word essay before the training and at the end of the training.

Use a paired t-test to see if the training was successful.


In [None]:
# load data
tdat <- readxl::read_excel(here::here("data", "week8d2.xlsx"))
# inspect
head(tdat)


In [None]:
tdat %>%
  tidyr::gather(Tested, Errors, Before:After) %>%
  ggplot(aes(Tested, Errors)) +
  geom_boxplot()


Perform test



In [None]:
t.test(tdat$Before, tdat$After, paired = TRUE)
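
A paired t-test is equivalent to a one-sample t-test on the within-participant differences. The sketch below illustrates this with invented error counts; the vectors `before` and `after` are hypothetical and not taken from `week8d2.xlsx`.

```r
# Toy data: error counts invented for illustration
before <- c(12, 15, 9, 14, 11)
after  <- c(10, 12, 9, 11, 8)

# Paired t-test on the two measurements
paired_res <- t.test(before, after, paired = TRUE)

# One-sample t-test on the differences, testing against a mean of 0
diff_res <- t.test(before - after, mu = 0)

# Both approaches give the same t statistic and p-value
paired_res$statistic
diff_res$statistic
```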



Write-up results



In [None]:
report::report(t.test(tdat$Before, tdat$After, paired = TRUE))



## Task 3

Independent t-test.

Imagine you want to investigate if L1 Japanese learners of English differ from L1 Australian English speakers in the duration of the vowel sounds they produce. This would be important because vowel length in English distinguishes meaning, as in *bit* vs *beat*. Thus, English speakers pay particular attention to vowel duration and perceive unnaturally long or short vowels as odd or more difficult to understand.

To this end, Martin has extracted vowel durations for you from Japanese learners of English and L1 English speakers.

The RQ is whether Japanese learners of English differ from L1 English speakers in terms of vowel duration.

Can you answer the RQ based on the week8d3.xlsx data set?

Load data


In [None]:
# load data
t2dat <- readxl::read_excel(here::here("data", "week8d3.xlsx"))
# inspect
head(t2dat)


In [None]:
t2dat  %>%
  ggplot(aes(L1, Duration)) +
  geom_boxplot()


Perform independent t-test



In [None]:
t.test(t2dat$Duration ~ t2dat$L1)
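
By default, `t.test()` does not assume equal variances and performs Welch's t-test (`var.equal = FALSE`). The sketch below contrasts this default with the classic Student's t-test on invented toy data. The durations and the `L1` labels are assumptions for illustration only.

```r
# Toy data: durations and group labels invented for illustration
toy <- data.frame(
  L1 = rep(c("English", "Japanese"), each = 5),
  Duration = c(0.10, 0.12, 0.11, 0.13, 0.09,
               0.15, 0.14, 0.18, 0.16, 0.17)
)

# Default: Welch's t-test (unequal variances allowed)
welch <- t.test(Duration ~ L1, data = toy)

# Student's t-test (assumes equal variances in both groups)
student <- t.test(Duration ~ L1, data = toy, var.equal = TRUE)

# Compare the method labels and degrees of freedom
welch$method
student$parameter
```

Passing the data frame via the `data` argument, as in this sketch, avoids repeating the `$` prefix inside the formula.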



Write-up results



In [None]:
report::report(t.test(t2dat$Duration ~ t2dat$L1))



## Task 4

You go to a party organized by the UQ R Users Group and ask people how many alcoholic drinks they have had. Most people there did not drink any alcohol or had only one drink (but Martin and his friends had a few drinks - bad Martin!). You then ask everybody to read a tongue twister and you record how often they made a mistake.

You do this to find out if drinking alcohol leads to more errors when speaking.

Load the data set `week8d4.xlsx`. Visualize the data and have a look - what test should you use?


In [None]:
wdat <- readxl::read_excel(here::here("data", "week8d4.xlsx"))
# inspect
head(wdat)


In [None]:
wdat  %>%
  ggplot(aes(Drinks)) +
  geom_density(fill = "lightblue", alpha = .2) +
  theme_bw()


In [None]:
wdat  %>%
  ggplot(aes(Errors)) +
  geom_density(fill = "orange", alpha = .2) +
  theme_bw()


In [None]:
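# Wilcoxon signed-rank test on the per-person differences
# (with a single vector, the `paired` argument has no effect, so it is dropped)
wilcox.test(wdat$Drinks - wdat$Errors)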



In [None]:
report::report(wilcox.test(wdat$Drinks - wdat$Errors))
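
If the question is read as asking whether people who had more drinks also made more errors, a rank correlation is one possible alternative to the signed-rank test above. This is a different technique from the one used in the tutorial, offered only as a sketch. The vectors `drinks` and `errors` are invented for illustration.

```r
# Toy data: counts invented for illustration
drinks <- c(0, 0, 1, 1, 2, 3, 5)
errors <- c(1, 0, 2, 1, 3, 4, 6)

# Spearman's rank correlation; exact = FALSE because the data contain ties
rho_res <- cor.test(drinks, errors, method = "spearman", exact = FALSE)

# Correlation coefficient (rho) and p-value
rho_res$estimate
rho_res$p.value
```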



## Task 5 (Optional)

Imagine you want to test if more proficient students are faster at recognizing words than less proficient students. To this end, you have set up an experiment where participants are shown non-words such as *swimp* or *doughten* as well as proper English words such as *drank* and *whimpering*, and you have recorded the time it took participants to press a key when they recognized that the prompt was a real word.

The idea is that you could simply show students prompts and have them press keys to determine their proficiency rather than having them perform IELTS tests.

What test would you use to test if reaction times correspond with proficiency?

Load the data set `week8d5.xlsx` and perform the test. What does it show?


In [None]:
odat <-  readxl::read_excel(here::here("data", "week8d5.xlsx"))
# inspect
head(odat)


Visualize



In [None]:
ggplot(odat, aes(Word, ReactionTime)) +
  geom_boxplot()


In [None]:
t.test(odat$ReactionTime ~ odat$Word)



In [None]:
report::report(t.test(odat$ReactionTime ~ odat$Word))

