# 3A: Art and Math - Exploring Variation and the DGP

In [None]:
# This code will load the R packages we will use

suppressPackageStartupMessages({
    library(mosaic)
    library(supernova)
    library(fivethirtyeight)
    library(Lock5withR)})

font_size = function (size) {
    theme(text = element_text(size = size))}

## 1.0 - Where are our eyes?

<img src="https://i.imgur.com/dJgmIbz.png" title="a sketch of a head" width = 20% align = "left" valign = "bottom"/> 
When people are taught to sketch faces, they are told to place the eyes at the midline of the head (see picture). This sometimes strikes people as wrong because they think the eyes are slightly above the midline of the head. What do you think about this idea?


## 2.0 - Head Sketch Survey Data

To help us put this artistic advice to the test, some students from Cal State LA have collected some data by measuring their heads. The whole survey can be found here: [Head Sketch Survey](https://docs.google.com/document/d/10X8G1BPz0nlEi409VV4LuORQNKf8LG4oCeB4wHhZN5Q/edit).

First, students were asked to guess whether their eyes were at their midline (before measuring). Then they measured their heads with a centimeter ruler. They filled in these boxes.

|                                               |                                           | 
|-----------------------------------------------|-------------------------------------------| 
|   <img src="https://imgur.com/8O6tz2e.png" title="Students saw an oval representing a head and had to fill in the length of their head (if their chin was considered 0 centimeters) and calculate the midline." width = "100%"/>  |   <img src="https://imgur.com/0RWl1kR.png" title="Students were also asked how much higher their eyes were relative to midline." width = "100%"/>   | 

Afterwards, they had a partner measure their heads for them. 


In [None]:
# Run this code to save the csv file into a data frame
# Why doesn't it print anything?

tidyheads <- read.csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vTJaUfRbLITc3EKByD3oDD4PhwfPBTN6FdAnCJUukTQwfd_k8ZGOzFj8wWwPIc1tw9XvJkPxz7k41zV/pub?output=csv", header = TRUE)


Here are the variables you will find in this data frame:

- `Participant` The participant number for each person
- `Head` Length of head in cm (self-measured)
- `Midline` How high up the midline should be in cm (self-measured)
- `EyesHigherMid` How much higher than the midline the eyes are in cm (self-measured)
- `EyeLevel` How high up the eyes are relative to chin in cm (self-measured)
- `Head2` Length of head in cm (partner-measured)
- `Midline2` How high up the midline should be in cm (partner-measured)
- `EyesHigherMid2` How much higher than the midline the eyes are in cm (partner-measured)
- `Height` Height in cm
- `EyeThink` Whether students thought their eyes were basically at, above, or below the midline of their head (before measuring)
- `EyeActual` Whether students’ eyes were actually at, above, or below the midline of their head (after measuring) based on their self-measurements
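
One way to get a first look at variables like these is sketched below, using only base R. The data frame `toy_heads` and every value in it are made up for illustration. Only the column names mirror the list above, so treat this as a pattern to adapt to `tidyheads`, not as the survey data.

```r
# Hypothetical toy data, NOT the survey data; only the column names mirror the list above
toy_heads <- data.frame(
  Participant   = 1:5,
  EyesHigherMid = c(0.5, 1.0, 0.0, 12.0, 0.8),
  EyeActual     = c("Above", "Above", "At", "Above", "Above")
)

str(toy_heads)                     # variable names, types, and the first few values
head(toy_heads, 3)                 # the first three rows
summary(toy_heads$EyesHigherMid)   # minimum, quartiles, mean, and maximum of a numeric variable
table(toy_heads$EyeActual)         # how often each category occurs
```
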


2.1 - Which of the variables might help us explore the idea that our eyes are at the midline of our face? 

In [None]:
# Write some R code to take a look at the values for that variable (or those variables). 
# What do you notice?


## 3.0 - Exploring and Describing the Data

3.1 - Which of these variables can we explore with a histogram: `EyesHigherMid` or `EyeActual`? If you aren’t sure, go ahead and try out a little bit of R code.

3.2 - Create a histogram with that variable (feel free to play around with arguments and chain markers such as color, fill, labels, density curves, bins, and binwidth!).
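
As a rough sketch of what a histogram does under the hood, the base R code below bins a numeric vector and, for contrast, counts the categories of a character vector. The vectors `eyes_higher` and `eye_actual` are hypothetical stand-ins, and base R's `hist()` is used here only so the sketch runs without the course packages. In the notebook you would more likely use `gf_histogram()`.

```r
# Hypothetical measurements in cm, NOT the survey data
eyes_higher <- c(0.0, 0.5, 0.5, 0.8, 1.0, 1.2, 1.5, 2.0)
eye_actual  <- c("At", "Above", "Above", "Above", "Above", "Above", "Above", "Above")

# A histogram sorts numeric values into bins; plot = FALSE returns the bins without drawing
bins <- hist(eyes_higher, breaks = seq(0, 2, by = 0.5), plot = FALSE)
bins$counts   # how many values fall into each 0.5 cm bin

# A categorical variable has no numeric scale to bin, so we count categories instead
table(eye_actual)
```
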

3.3 - Why does this distribution look like this? Does this help us explore whether our eyes are generally around the midline of our face? 

## 4.0 - Cleaning the Data

4.1 - What can we do to better explore the idea that our eyes are around the midline of our face? 

(Hint: You might want to check out the variables `EyesHigherMid` and `EyeActual`. If you clean your data up, save your new variable as `EyesHigherMid.clean`.)
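
One possible cleaning pattern is sketched below on made-up numbers. The data frame `toy_heads` and the `cutoff` of 5 cm are assumptions for illustration only. A sensible cutoff for the real data depends on what you see in its distribution.

```r
# Hypothetical values in cm, NOT the survey data
toy_heads <- data.frame(EyesHigherMid = c(0.5, 1.0, 12.0, 0.0, 11.5, 0.8))

# Assumed cutoff for illustration only; choose one by looking at the real distribution
cutoff <- 5

# Keep plausible values and mark the rest as missing (NA) instead of deleting rows
toy_heads$EyesHigherMid.clean <- ifelse(toy_heads$EyesHigherMid > cutoff,
                                        NA, toy_heads$EyesHigherMid)

toy_heads
mean(toy_heads$EyesHigherMid.clean, na.rm = TRUE)   # na.rm = TRUE skips the NAs
```

Marking values as `NA` keeps the original column intact, so the cleaned and uncleaned versions can still be compared side by side.
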

4.2 - What does the data tell you about the “eyes at midline” artistic advice?

## 5.0 - Thinking about the DGP and Different Samples

|    With Mistakes                                           |     Without Mistakes                                      | 
|-----------------------------------------------|-------------------------------------------| 
|   <img src="https://imgur.com/DQWEHm5.png" title = "temp" width = "90%"> |  <img src="https://imgur.com/ZBy8fD8.png" width = "90%"> |


5.1 - If a different class of college students measured how far their eyes are above the midline, would the distribution of their data be similar? Would it be more similar to the one with mistakes or the one without mistakes? How would it be similar/different? 


5.2 - Notice that you have a little theory about the Data Generating Process (DGP): even if people’s eyes are not actually 12 cm above the midline, values like that will still show up in the data. What is our little theory about people?


5.3 - We know that even if people basically act in the same way, a new sample of data won’t come out exactly the same. How different could samples be? We can use R to mock up a DGP that creates samples similar to our actual data. 

Try running the following code. What is it doing? 

In [None]:
newsample <- resample(tidyheads, 3)
newsample

5.4 - Modify the code below to pretend to do this little study again with another 18 students. How would we take a look at the new distribution of `EyesHigherMid`? 

In [None]:
newsample <- resample(tidyheads, 3)


5.5 - Run the code you wrote above a few times. What do you notice? What changes across these simulated samples? What stays the same?

## 6.0 - Resample versus Sample

6.1 - In addition to the function `resample()`, there is a very similar function called `sample()`. The code below is copied from above. This time, modify it to create a new sample with `sample()` instead of `resample()`. 

In [None]:
newsample <- resample(tidyheads, 18)

gf_histogram(~ EyesHigherMid, data = newsample, color = "put in a color here")

6.2 - Run it a few times. What do you notice? Does it change? Why or why not? What’s different about sample and resample?

6.3 - Could we use `sample()` to pretend to do this study again with 100 students? Could we use `resample()`? Try them to see what happens.
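
The difference between the two kinds of sampling can be sketched in base R by drawing row numbers. This assumes, as the mosaic documentation describes, that `resample()` draws with replacement and `sample()` draws without replacement. The 18-row data frame `toy_heads` is a hypothetical stand-in for the class data, and base R's `sample()` on row numbers is used so the sketch runs without the course packages.

```r
set.seed(1)   # makes the random draws repeatable

# Hypothetical data frame with 18 rows, standing in for the class sample
toy_heads <- data.frame(Participant = 1:18)

# Without replacement: each row can be drawn at most once
rows_once <- sample(nrow(toy_heads), 18, replace = FALSE)
anyDuplicated(rows_once) == 0   # TRUE: all 18 rows appear once, only reordered

# With replacement: a row can be drawn again, so any sample size is possible
rows_again <- sample(nrow(toy_heads), 100, replace = TRUE)
new_sample <- toy_heads[rows_again, , drop = FALSE]
nrow(new_sample)

# Asking for more rows than exist without replacement is an error
too_many <- try(sample(nrow(toy_heads), 100, replace = FALSE), silent = TRUE)
inherits(too_many, "try-error")
```
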

6.4 - Try resampling with 18, 100, or 5000 students. Which of these looks the most like the sample that we started out with? Why?

In [None]:
newsample <- resample(tidyheads, 18)

gf_histogram(~ EyesHigherMid, data = newsample, color = "put in a color here")

## 7.0 - Reflect and Connect


7.1 - How did the shapes of the distributions we looked at today connect to the data generating process?


### By default, please close and halt!
