# 8B: Does Laptop Use Distract Other Students?

In [None]:
# Load the CourseKata library
suppressPackageStartupMessages({
    library(coursekata)
    library(gridExtra)
})

## Laptop Distraction Study

Do you ever use your laptop to "multitask" during class (e.g., check social media, surf websites, shop)? Is it possible that your laptop use is affecting the other students around you? 

Some researchers decided to study this question with students trying to learn during a meteorology lecture. These students were randomly assigned to be seated right behind students using their laptops to multitask *or* right behind students who were taking paper notes.

<img src="https://coursekata-course-assets.s3.us-west-1.amazonaws.com/UCLATALL/czi-stats-course/jnb_dJ1DDtFP-image.png">
 
At the end of the lecture, all participants were tested on their factual comprehension of the lecture (20 questions) and their ability to apply the knowledge they learned (20 questions).

#### Study information

Sana, F., Weston, T., & Cepeda, N. J. (2013) https://doi.org/10.1016/j.compedu.2012.10.003



## 1.0 - The Data & Hypothesis

**1.1:** What are your thoughts about viewing other students' multitasking? Do you think it would have an effect on learning? 


**1.2:** In the cell below, take a look at the data frame `laptops`. Each row in the data frame contains data from a student participant.

Which columns should we learn more about?

In [None]:
# Load the data frame
link <- "https://docs.google.com/spreadsheets/d/e/2PACX-1vQA7gzpSepI7u5zu7JaMpxdEDNpreOGuHQlPcPx_CsONMmqGSEoU2qePuOWnVh_kErQnbnJ_eCqIzzz/pub?gid=1809544437&single=true&output=csv"
laptops <- read.csv(link, header = TRUE)

# Take a look at the data frame


If you do have questions about the variables, take a look at these **variable descriptions**:

- `id` ID for the participant.
- `condition` Whether the student could see another student on a laptop multitasking (view) or not (no-view)
- `fact` The proportion of fact-based questions answered correctly by a student.
- `applied` The proportion of knowledge application questions answered correctly by a student.
- `total` The proportion of all questions answered correctly by a student.
- `age` The age of the student in years
- `gender` Whether the student was female (1) or male (2)
- `english_first` Whether English is the student’s first language (1) or not (2)
- `familiarity` How much familiarity the student had with the lecture content, self-rated from none (1), to somewhat (4), to very (7)
- `interesting` How interesting the student found the lecture content, self-rated from none (1), to somewhat (4), to very (7)
- `engaging` How engaging the student found the lecture content, self-rated from none (1), to somewhat (4), to very (7)
- `notes_preference` The method that the student generally prefers to take notes: pen (1), laptop (2), audio (3), or none (4)
- `distracted` How distracted the student felt by the confederates, self-rated from not applicable (0) or none (1), to somewhat (4), to very (7)
- `distraction_effect` How detrimental the student felt the distraction was to their learning, self-rated from not applicable (0) or none (1), to somewhat (4), to very (7)

In addition to these variables, the quantity and quality of the notes were rated by the experimenter most familiar with the material. The experimenter did not know which condition the notes were from (i.e., they were *blind* to participants' condition) while scoring. 

- `notes_quantity` The amount of notes taken by the student during the lecture; experimenter-rated from few (1), to average (4), to a lot (7)
- `notes_quality` The quality of the notes taken by the student during the lecture; experimenter-rated from poor (1), to average (4), to great (7)



**1.3:** If viewing other people's laptops indeed affects students, which of the variables above might be interesting outcomes to consider?


## 2.0 - Explore and Model Variation

**Hypothesis:** The researchers had predicted that `total` performance would differ based on which `condition` students were in.

**2.1:** Write a word equation and modify the jitter plot below to help us explore this hypothesis. What do you think of this hypothesis from the data that you see?

In [None]:
#gf_jitter( ~ , data = laptops, width = .1) 

**2.2:** In the code cell *above*, find the best-fitting model of your word equation and add it to the visualization.

**2.3:** Write your best fitting model using GLM notation and interpret the parameter estimates. 

$$Y_i = ... X_i+ e_i$$

$$total_i = ... condition_i+ e_i$$

## 3.0 - Evaluate Models

### Evaluate the Model Fit to the Sample Data

**3.1:** What are some ways you have learned to evaluate this model?

### Evaluate the Models of the DGP

It's great that the best-fitting model explains some of the variation in *this sample of data* (e.g., PRE = .37). But what we really care about is whether viewing multitaskers explains some of the variation in the DGP!
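As a reminder of what PRE measures, here is a minimal base-R sketch on made-up numbers (the `toy` data frame is hypothetical and is *not* the `laptops` data, so its PRE will not match the value quoted above). It compares the error left over by the empty model with the error left over by the condition model. In this notebook you would more likely read PRE from a function such as `supernova()`.

```r
# Made-up proportions correct for two hypothetical groups (NOT the laptops data)
toy <- data.frame(
  condition = rep(c("no-view", "view"), each = 4),
  total     = c(0.70, 0.80, 0.75, 0.75, 0.55, 0.65, 0.60, 0.60)
)

# Empty model: predict every score with the grand mean
sse_empty <- sum((toy$total - mean(toy$total))^2)

# Condition model: predict each score with the mean of its own group
group_means   <- ave(toy$total, toy$condition)
sse_condition <- sum((toy$total - group_means)^2)

# PRE: proportion of the empty model's error removed by the condition model
pre <- (sse_empty - sse_condition) / sse_empty
pre
```

A PRE near 0 would mean that knowing the condition barely improves predictions; a PRE near 1 would mean it removes almost all of the error.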

**Word Equations (and GLM Models) of the DGP**

Here's a word equation and model in GLM notation for the idea that `condition` has *something* to do with the variation in `total` in the DGP.

- **total = condition + other stuff**
- $total_i = \beta_0 + \beta_1condition_i + \epsilon_i$
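To connect the notation to something concrete, the sketch below fits the two-group model to made-up numbers with base R's `lm()`. The `toy` data frame and the names `b0_est` and `b1_est` are hypothetical, and the sketch assumes `condition` is stored as text with "no-view" as the reference group. Under that assumption, the estimate of $\beta_0$ is the mean of the reference group and the estimate of $\beta_1$ is the difference between the two group means.

```r
# Made-up proportions correct for two hypothetical groups (NOT the laptops data)
toy <- data.frame(
  condition = rep(c("no-view", "view"), each = 4),
  total     = c(0.70, 0.80, 0.75, 0.75, 0.55, 0.65, 0.60, 0.60)
)

# Fit total = condition + other stuff
fit <- lm(total ~ condition, data = toy)

# b0: estimate for the reference group; b1: adjustment for the "view" group
b0_est <- coef(fit)[["(Intercept)"]]
b1_est <- coef(fit)[["conditionview"]]

# Compare the estimates with the group means
means <- tapply(toy$total, toy$condition, mean)
c(b0 = b0_est, b1 = b1_est)
means
```

If $\beta_1$ were 0 in the DGP, the two group means would differ only because of the other stuff captured by $\epsilon_i$.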

**3.1:** Write a word equation and GLM notation for the idea that `condition` has *nothing* to do with the variation in `total` (it's just other stuff).

**3.2:** Which DGP is being mimicked when we use the `shuffle()` function? Implement it in the plot provided below. 

In [None]:
# modify this code to mimic such a DGP
#gf_jitter(total ~ condition, data = laptops, width = .1, color = "navyblue") 


**3.3:** How is the data from the empty model of the DGP (aka shuffle) different from the original data? 

**3.4, Draw:** On the graph handout your instructor has printed for you, visually estimate and draw in the means for the original and shuffled data. We know the sample $b_1$ (the difference between the means) is -0.18. How do the shuffled $b_1$s compare to the original?

## 4.0 - Focus on the DGP where $\beta_1=0$

We saw in the graphs that most of the shuffled data have a $b_1$ close to 0. 

Here we want you to consider an analogy: DGPs generate data just like parents give birth to kids. The $\beta_1$ in the DGP is the parent and the $b_1$s in these samples are the kids. The kids ($b_1$s) tend to be similar to the parent ($\beta_1=0$).
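The parent-and-kids idea can be sketched in base R on made-up numbers. This is only an illustration under stated assumptions: the `toy` data frame and the helper `diff_in_means()` are hypothetical, and `sample()` stands in for the notebook's `shuffle()` by randomly reassigning scores to conditions, which mimics a DGP where $\beta_1=0$.

```r
set.seed(8)  # only so this sketch gives the same result each time

# Made-up proportions correct for two hypothetical groups (NOT the laptops data)
toy <- data.frame(
  condition = rep(c("no-view", "view"), each = 4),
  total     = c(0.70, 0.80, 0.75, 0.75, 0.55, 0.65, 0.60, 0.60)
)

# Hypothetical helper: mean of the "view" group minus mean of the "no-view" group
diff_in_means <- function(outcome, group) {
  m <- tapply(outcome, group, mean)
  m[["view"]] - m[["no-view"]]
}

# The "kid" we actually observed in the toy sample
observed_b1 <- diff_in_means(toy$total, toy$condition)

# Many "kids" from the parent where beta_1 = 0: permute the scores, recompute b1
shuffled_b1s <- replicate(1000, diff_in_means(sample(toy$total), toy$condition))

observed_b1
mean(shuffled_b1s)   # should sit close to 0
range(shuffled_b1s)  # individual shuffles still vary around 0
```

The shuffled values cluster around 0 because shuffling breaks any link between the scores and the conditions, while the observed value is computed once from the unshuffled data and never changes.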

**4.1:** Although the graphs were helpful for us to see that the $b_1$s are close to 0, we can skip straight to looking at the shuffled $b_1$s.

Run the code below a few times. Why does one stay the same and the other change?

In [None]:
sample_b1 <- b1(total ~ condition, data = laptops)
sample_b1

b1(shuffle(total) ~ condition, data = laptops)

**4.2:** Modify the code below to generate a bunch of $b_1$s (like 10 or 20) from the DGP where $\beta_1=0$. 

Recall that the sample $b_1$ in this laptops experiment was approximately -0.18. Where does the sample $b_1$ fall in relation to these shuffled $b_1$s?


In [None]:
b1(shuffle(total) ~ condition, data = laptops)

**4.3:** So what do you think about the DGP where $\beta_1=0$? Could it have been the DGP that generated our sample $b_1$?

**4.4:** Going back to the researchers, what does this mean for their hypothesis? 

## 5.0 - Exploring Other Outcome Variables

**5.1:** There are other outcome variables in this data frame that the researchers measured. Perhaps `condition` had an effect on other outcomes as well. Develop your own hypothesis and analyze the data. 
