# Art and Math - Exploring Variation and the DGP 

## Chapter 3.10-3.13 Samples, Populations, and the Data Generating Process (DGP)

In [None]:
# This code will load the R packages we will use
suppressPackageStartupMessages({
    library(coursekata)
})

# set styles
css <- suppressWarnings(readLines("https://raw.githubusercontent.com/jimstigler/jupyter/master/ck_jupyter_styles.css"))
IRdisplay::display_html(sprintf('<style>%s</style>', paste(css, collapse = "\n")))

# Load the data frame: 
Eyes <- read.csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vTJaUfRbLITc3EKByD3oDD4PhwfPBTN6FdAnCJUukTQwfd_k8ZGOzFj8wWwPIc1tw9XvJkPxz7k41zV/pub?output=csv", header = TRUE) %>%
  select(Participant, Midline, EyesHigherMid, EyeLevel)
names(Eyes)[4] <- "EyesHigherChin"

## 1 Where are our eyes?

<img src="https://coursekata-course-assets.s3.us-west-1.amazonaws.com/UCLATALL/czi-stats-course/jnb_SyJmv8SS-d-Jgm-Ibz-Imgur.png" title="a sketch of a head" width = 20% align = "left" valign = "bottom"/> 
When people are taught to sketch faces, they are told to place the eyes at the midline of the head (see picture). This sometimes strikes people as wrong because they think the eyes are slightly above the midline of the head. What do you think about this idea?


### The Head Sketch Survey Data: `Eyes`

To help us put this artistic advice to the test, students from Cal State LA collected data by measuring their heads. The whole survey can be found here: [Head Sketch Survey](https://docs.google.com/document/d/10X8G1BPz0nlEi409VV4LuORQNKf8LG4oCeB4wHhZN5Q/edit). Feel free to look over the survey document to get an idea of the instructions they had to follow (but you do NOT need to measure your own head).

First, students were asked to guess whether their eyes were at their midline (before measuring). Then they measured their heads with a centimeter ruler. They filled in these boxes.


### 1.1 Look at the data frame.

In [None]:
# Print a little of the `Eyes` data frame


Here are the descriptions for the variables you will find in the `Eyes` data frame:

- `Participant` The participant number for each person
- `Midline` How high is your midline (from your chin to the top of your head), in cm (self-measured)?
- `EyesHigherMid` How much higher than the midline are your eyes, in cm (self-measured)?
- `EyesHigherChin` How high are your eyes from your chin, in cm (self-measured)?
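
One way to get a first look at a data frame like this is with base R's `head()`, `str()`, and `dim()`. The sketch below uses a tiny made-up data frame, `toy_eyes`, that only borrows the column names above; its values are invented for illustration and are not the survey data.

```r
# Made-up values for illustration only (not the survey data)
toy_eyes <- data.frame(
  Participant    = 1:4,
  Midline        = c(11.0, 12.5, 10.5, 11.5),
  EyesHigherMid  = c(0.5, 1.0, 0.0, 0.5),
  EyesHigherChin = c(11.5, 13.5, 10.5, 12.0)
)

# Show only the first two rows
head(toy_eyes, 2)

# Show each column's name and type
str(toy_eyes)

# Number of rows (participants) and columns (variables)
dim(toy_eyes)
```

The same three calls should work on `Eyes` once it has been loaded.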


<img src="https://i.postimg.cc/81twFD9j/head-sketch.png" title="a sketch of a head with descriptions labeling where each of the measurements were taken" width = 70% align = "center"/> 

## 2 Exploring and Describing the Data

### 2.1 Create a histogram to explore variation in `EyesHigherMid`.

Feel free to play around with arguments and chained functions such as color, fill, labels, density curves, bins, and binwidth!

In [None]:
# run code here



### 2.2 Why does this distribution look like this? Does this help us explore whether our eyes are generally around the midline of our face? 

2.2 Response:

### 2.3 How might we use the other variables in the dataset to figure out the distance between the middle of the face and the eyes?

2.3 Response:

### 2.4 Correct the mistake and save the corrected values as a new variable to better explore the idea that our eyes are around the midline of our face.

In [None]:
# You can correct the mistakes in EyesHigherMid by 
# subtracting Midline from EyesHigherChin
Eyes$EyesCalc <- (Eyes$_______) - (Eyes$_______)

# Check that the correction worked
head(Eyes)

# Explore the new variable `EyesCalc` with a visualization



### 2.5 What does the data tell you about the “eyes at midline” artistic advice?

2.5 Response:

## 3 Thinking about the DGP and Different Samples

|    With Mistakes                                           |     Without Mistakes                                      | 
|-----------------------------------------------|-------------------------------------------| 
|   <img src="https://i.postimg.cc/dJ22zv9L/with-mistakes.png" title = "distribution of the data with mistakes" width = "90%"> |  <img src="https://i.postimg.cc/81TRG3yw/without-mistakes.png" title = "distribution of the data without mistakes" width = "90%"> |


### 3.1 If a different class of college students measured how far their eyes are above the midline, would the distribution of their data be similar? 

Would it be more similar to the one with mistakes or the one without mistakes? How would it be similar/different? 

(Notice that you are generating a little theory about the Data Generating Process (DGP)!)


3.1 Response:

### 3.2 Modify the code below to pretend to do this little study again with another 18 students, and run it a few times. What changes and what stays the same?

In [None]:
# Modify to take a random sample of 18 students
new_sample <- resample(Eyes, 3)

# Look at new distribution of EyesHigherMid
gf_histogram(~ EyesHigherMid, data = new_sample)

3.2 Response:



### 3.3 Why is it helpful to look at all these different random samples? 

3.3 Response:

### 3.4 Could we use `sample()` to pretend to do this study again with 18 students? How about 100 students? Try it to see what happens.

In [None]:
# Changed from resample() to sample()
# Try simulating 18 students, then 100 students
new_sample <- sample(Eyes, 18)

gf_histogram(~ EyesHigherMid, data = new_sample, fill = "blue")


3.4 Response:

### 3.5 Try resampling with 18, 100, or 5000 students. Which of these looks the most like the sample that we started out with? Why?
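
Resampling draws with replacement, so the same participant can show up more than once and the new sample can be larger than the original. Here is a rough base-R sketch of that idea. The vector `toy_values` and its numbers are invented for illustration, and this is not necessarily how `resample()` is implemented internally.

```r
set.seed(1)  # only so the sketch is repeatable

# Made-up measurements for illustration (not the survey data)
toy_values <- c(0.0, 0.5, 0.5, 1.0, 1.5)

# With replacement: we can draw more values than we started with
big_draw <- sample(toy_values, size = 100, replace = TRUE)
length(big_draw)

# Every drawn value is one of the original values
all(big_draw %in% toy_values)

# Without replacement: asking for more than 5 values is an error,
# which tryCatch() turns into the string "error" here
too_many <- tryCatch(
  sample(toy_values, size = 100, replace = FALSE),
  error = function(e) "error"
)
too_many
```

Because every draw comes from the original values, a very large resample can only repeat what was already in the sample; it cannot produce new measurements.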

In [None]:
# Sample of 18 students
new_sample1 <- resample(Eyes, 18)

gf_histogram(~ EyesHigherMid, data = new_sample1, fill = "blue")

# Sample of 100 students
new_sample2 <- resample(Eyes, 100)

gf_histogram(~ EyesHigherMid, data = new_sample2, fill = "yellow")

# Sample of 5000 students
new_sample3 <- resample(Eyes, 5000)

gf_histogram(~ EyesHigherMid, data = new_sample3, fill = "green4")

3.5 Response:

## 4 Shape and the DGP


### 4.1 How did the shapes of the distributions we looked at today connect to the data generating process?


4.1 Response: