# The Mean as a Model (COMPLETE)

## Chapter 5.1-5.4 Overview Notebook

In [None]:
# run this to set up the notebook
library(coursekata)
library(IRdisplay)
# set styles
css <- suppressWarnings(readLines("https://raw.githubusercontent.com/jimstigler/jupyter/master/ck_jupyter_styles_v2.css"))
IRdisplay::display_html(sprintf('<style>%s</style>', paste(css, collapse = "\n")))
# load data
lungs <- read.csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vSz3JaTYY0RXoLQU-o1S45MDKudy6QfIQDtGLGoiy54JOydB7iELKxGbPqo_0uEvKUGCpZ_UQxu4PgM/pub?gid=1447217380&single=true&output=csv")
# example data: FEV (in liters) for 8 students
FEV <- c(3.0,3.2,3.8,4.6,3.8,2.3,4.8,4.1)
row <- 1:8
FEV_data <- data.frame(row, FEV)

<div class="teacher-note">
<b>Section Goals:</b> In this section students will explore the mean as a model, the simplest of all models. Students will learn that the mean is often the best prediction of a future value if you know nothing else about the observation.

- Students will explore the mean as a model, and begin to understand that it is a better model than many other models that represent an entire distribution with a single number because the mean balances the deviations (or more generally residuals) above and below it.
- Students will compare the median as a model to the mean, and also reflect on why using the mean as the predicted value for any new observation is better than having no model at all (i.e., just making a wild guess).
- Students will begin to develop an informal idea of a "best-fitting" model: although all models are imperfect, the mean balances the errors from predictions that are too high against the errors from predictions that are too low.
- Students will be introduced to the idea that DATA = MODEL + ERROR. Using hand calculations, they will see that for each data point in a small sample, the prediction from the empty model plus the residual (our measure of error from the model) adds up to the original data value, and that the residuals from the mean sum to 0.

A <a href="https://docs.google.com/document/d/17yMnTGV4U0MW66g8Unj3VLTIhlxAiCsvB_m9kbw4xCw/edit?usp=sharing" target="_blank">printable student guided-notes worksheet</a> is available to go with this Jupyter notebook, as well as a student version of this notebook.
</div>

<a id =9> </a> 
# Table of Contents

1 [What Is a Model?](#1)

2 [Creating a Simple Model](#2)

3 [We Just Made a Statistical Model](#3)

4 [DATA = MODEL + ERROR](#4)

5 [Exploring the Mean With a Little More Data](#5)

6 [Making Predictions for the 8 Individuals in FEV_data](#6)

7 [Summing the Residuals](#7)
  

<div style="text-align: right">
  <a href="#9">Back to top</a>
</div>

<a id =1> </a>
## 1 What Is a Model?

### 1.1 Word Equations as Models

So far, we’ve used word equations as informal models. For example:

**lung function = smoking + other stuff**

This expresses the hypothesis that smoking explains some of the variation in lung function, and that other stuff (other explanatory variables) explains the rest.

### 1.2 Why We Need Statistical Models

Word equations are useful for expressing ideas, but they can’t make predictions. To make predictions, we need data, which means measuring things like lung function and smoking status. With measurements in hand, we can build a **statistical model** to generate predictions.

> A working definition of statistical model: **A statistical model is a function (think mathematical function) that generates a predicted value on an outcome variable.** 

### 1.3 Simple Example of a Statistical Model

Let's say you want to predict the age of a randomly chosen 12th grader. All you know is that they are in 12th grade. You might use the age of a 12th grader you know (say, a 17-year-old) as your prediction. That number (17) serves as a simple statistical model.
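
To connect this example to the working definition above, here is a minimal sketch in base R. The function name `predict_age` is hypothetical (it is not part of any package); it simply returns 17 no matter which 12th grader we ask about.

```r
# Hypothetical helper: a "model" that predicts the same age for every 12th grader
predict_age <- function(n = 1) {
  rep(17, n)  # the single number 17 is the whole model
}

predict_age()    # prediction for one randomly chosen 12th grader
predict_age(3)   # the same prediction for each of three students
```

Even a model this simple fits the definition: it is a function that generates a predicted value on an outcome variable.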

<div style="text-align: right">
  <a href="#9">Back to top</a>
</div>

<a id =2> </a>
## 2 Creating a Simple Model

To begin our modeling journey, let's start with the simplest model of all: a model with no predictors. 

We’ll come back to the question of whether smoking lowers lung function. But before we go there: **What is the predicted level of lung function for people in general, whether they smoke or not?**

To make a quantitative prediction, we need a way to measure lung function (our outcome variable of interest). Lung function is often measured using a spirometer, which produces a value called FEV (Forced Expiratory Volume). It's measured like this: take in a deep breath, then exhale as fast and forcefully as you can into a spirometer. The amount of air you can blow out in the first second (in liters) is your `FEV`.

### Videos of Participants Testing Their FEV

<img src="https://coursekata-course-assets.s3.us-west-1.amazonaws.com/UCLATALL/czi-stats-course/student1.png" alt="Screenshot of student 1 testing their FEV" width=40%>

[Video 1](https://vimeo.com/1095937076)

<img src="https://coursekata-course-assets.s3.us-west-1.amazonaws.com/UCLATALL/czi-stats-course/student2.png" alt="Screenshot of student 2 testing their FEV" width=40%>

[Video 2](https://vimeo.com/1095937062)

<img src="https://coursekata-course-assets.s3.us-west-1.amazonaws.com/UCLATALL/czi-stats-course/student3.png" alt="Screenshot of student 3 testing their FEV" width=40%>

[Video 3](https://vimeo.com/1095937039)


<div class="discussion-question">
<h3>2.1 Discussion Question: Let's say you randomly select one of the students in your class. What would you predict their FEV to be? Explain how you made your prediction.</h3>
</div>

<div class="teacher-note">
<b>Teacher Note:</b> Most students will say they have no idea what a student's value on FEV might be. They know what a liter is, but unless they have more information such as what a typical student's FEV is, they won't have any basis on which to make a prediction. Help them see that if they knew something about the distribution of FEV, it would help them make a better prediction.
</div>

### 2.2 Using Data to Create Models

Most people aren't experts in lung function, so it's hard to guess how much air someone can exhale in one second. That’s where data can help. Let's see how.

<div class="guided-notes">
    
### 2.3 One randomly-selected adult did the spirometer test. Their score on FEV was 3.0 liters. What would you predict the next randomly-selected adult's score to be?
- Explain how you arrived at your prediction.
    
</div>

<div class="teacher-note">
<b>Teacher Note:</b> Now students know at least what one possible score would be. In the absence of any other information, this score would be the best prediction of the next score, even though it's likely to be wrong. Make the point that unless they have reason to believe that this score is unusually high or low, it is the best they can do.
</div>

<div class="guided-notes">
    
### 2.4 A second randomly-selected adult did the spirometer test. Their score on FEV was 3.2 liters. Now you have two data points. What would you predict the next person's score to be? 
- Explain how you arrived at your prediction.
    
</div>

<div class="discussion-question">
<h3>2.5 Discussion Question: Did your prediction change from the previous one? Why or why not? If you had a third piece of data, would that change your prediction?</h3>
</div>

<div class="teacher-note">
<b>Teacher Note:</b> We want students to realize that the more information (data) they have, the better their prediction will be. We also want them to begin to realize that the mean of the known data points might be a good model for predicting a randomly selected person's score.
</div>

<div class="discussion-question">
<h3>2.6 Discussion Question: You've been predicting what the next randomly-sampled person's FEV will be. What if we asked you to predict the next 100 people's scores? Would you make the same prediction for all of them? Why or why not?</h3>
</div>

<div class="teacher-note"> 

<b>Teacher Note:</b> Many students will say they wouldn't give the same prediction to all 100 people because they *know* people vary. Not everyone will have the same FEV. That’s true! But remind them: **prediction isn’t about being right every time. It’s about using the best guess based on what you know.**

If we know nothing else about these individuals (e.g., age, height, or smoking status), there’s no reason to expect one score to be higher than another. The best we can do is give each person the same prediction, even though some people's FEV will be higher or lower than that prediction.

<b>Follow-up question:</b> If you wouldn’t give the same prediction to everyone, how would you decide who to give a higher or lower score to? (Remember, you don't have any other info about them!)

This is a tough but important idea: **predicting the same value for everyone isn’t about being right; it’s about being less wrong**. We know the prediction will miss for nearly every person, but using one best guess minimizes how far off we are, on average, compared to randomly guessing high or low.
</div>

<div style="text-align: right">
  <a href="#9">Back to top</a>
</div>

<a id =3> </a>
## 3 We Just Made a Statistical Model

Let's reflect on what we just did. Even though we were just using two data points, we were following the basic steps of statistical modeling. 

### 3.1 Basic Steps of Statistical Modeling
1. **Specify the model:** We chose a very simple model, one with no explanatory variables. Intuitively, we picked a number in the middle of the values we had (e.g., 3.1 instead of 3.0 or 3.2). Formally, this is a one-parameter model that uses the arithmetic mean to generate predictions by estimating just one feature of the population or DGP (i.e., the mean).
> To specify a model is to choose the function we’ll use to make predictions.

2. **Fit the model:** With one value, we used it as our prediction. When a second value came in, we averaged the two. With three, we might take the average of all three. 
> To fit a model is to use data to calculate the best version of the specified function.

3. **Use the model to make predictions:** What would we predict the next observation to be? Now that we’ve fit our one-parameter model, we can use it to make a prediction. That prediction is simply the mean we calculated (which represents our best estimate of the population mean).
> To use a model is to apply the fitted value(s) to make predictions.
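
The three steps above can be sketched in base R using the two FEV scores we have so far. This is only an illustration; the object names (`fev_so_far`, `specified_model`, `fitted_mean`, `next_prediction`) are made up for this example.

```r
# The two FEV scores (in liters) observed so far
fev_so_far <- c(3.0, 3.2)

# Specify: choose the function used to make predictions (here, the mean)
specified_model <- mean

# Fit: apply that function to the data to estimate the one parameter
fitted_mean <- specified_model(fev_so_far)

# Use: predict the next person's FEV with the fitted value
next_prediction <- fitted_mean
next_prediction
```

If a third score came in, only the fit step would need to be re-run; the specification (use the mean) would stay the same.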

<div class="guided-notes">
    
### 3.2 Which of these phrases go with each step of modeling: Specify, Fit, or Use?
    
- “I calculated the average FEV of the people we had in our data set.”
- “We predicted that the next person’s FEV would be 3.1 liters.”
- “Choose a function to make predictions.”
- “I'm going to use a model that doesn’t use any explanatory variables.”
- “I used the data we had to estimate the mean of the population or DGP.”
- “Our model predicts everyone’s score to be about the same.”
- “We used the mean of 3.1 to predict the next value.”
- “We decided to make a one-parameter model.”
- “The average of the data points is 3.1.”

</div>

<div class="teacher-note">
    
**Sample Responses:** 

- **Fit** “I calculated the average FEV of the people we had in our data set.”
- **Use** “We predicted that the next person’s FEV would be 3.1 liters.”
- **Specify** “Choose a function to make predictions.”
- **Specify** “I'm going to use a model that doesn’t use any explanatory variables.”
- **Fit** “I used the data we had to estimate the mean of the population or DGP.”
- **Use** “Our model predicts everyone’s score to be about the same.”
- **Use** “We used the mean of 3.1 to predict the next value.”
- **Specify** “We decided to make a one-parameter model.”
- **Fit** “The average of the data points is 3.1.”
</div>

<div style="text-align: right">
  <a href="#9">Back to top</a>
</div>

<a id =4> </a>
## 4 DATA = MODEL + ERROR

### 4.1 The Empty Model

The mean is the simplest of all models (we will call it the **empty model** or **null model**). It can be represented in a word equation like this:

**FEV = mean + error**

Notice that as we move from informal models to statistical models, we write the word equation a little differently. 

**Instead of saying "other stuff" we say "error."** Why? Because once we make a specific numerical prediction, we can compare it to the actual data point and quantify how far off we were. The difference between the actual value and the model prediction is called the **residual**, or more generally **error**.

<div class="guided-notes">
    
### 4.2 We know DATA = MODEL + ERROR. If you know the value of a DATA point (actual) and the MODEL prediction for that data point, how would you calculate the ERROR? Write an equation to show how to calculate ERROR.
</div>

### 4.3 Why predict data that we already have? 

Next, we’ll use our model to predict the FEV for the two people already in our dataset and then calculate the error for each. That might feel a little strange. After all, we already know their FEV values! Why would we want to *predict* them based on a model?

This is actually a standard part of modeling: we check how well the fitted model would have predicted the data it was built from. Later, this strategy will let us compare different models by seeing which one produces less error overall.

In more advanced statistics, you can also use this general strategy to check how models perform on new data. But for now, let’s start by learning how to measure how close (or far) our predictions are on the data we used to fit the model.

<table border="1" style="font-size: 18px; margin-left: 0; border-collapse: collapse;">
  <thead>
    <tr>
      <th style="border: 1px solid black;">Row</th>
      <th style="border: 1px solid black;">FEV</th>
      <th style="border: 1px solid black;">Model Prediction</th>
      <th style="border: 1px solid black;">Residual (Error)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="border: 1px solid black;">1</td>
      <td style="border: 1px solid black;">3.0</td>
      <td style="border: 1px solid black;"></td>
      <td style="border: 1px solid black;"></td>
    </tr>
    <tr>
      <td style="border: 1px solid black;">2</td>
      <td style="border: 1px solid black;">3.2</td>
      <td style="border: 1px solid black;"></td>
      <td style="border: 1px solid black;"></td>
    </tr>
  </tbody>
</table>

<div class="guided-notes">
    
### 4.4 Assuming we are using the mean as our model of `FEV`, fill in the model predictions and residuals for a dataset with only two rows.
    
</div>

<div class="discussion-question">

### 4.5 Discussion Questions: 
    
- What does it mean if the residual is negative? What if it is positive?
- What does it mean to say DATA = MODEL + ERROR?
- Would this always be true? Explain.

</div>

<div class="teacher-note">
    
**Teacher Note:** 

- If a residual is negative then the actual data point is less than the value predicted by the model. If positive, then the actual data point is more than the predicted value.
- A model can generate a predicted value for every row in a dataset. The actual data value will always be the sum of the prediction and the residual.
- This would always be true. We calculated ERROR from DATA and MODEL. 
</div>
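
For teachers who want to check the hand calculations for the two-row table, here is a small base R sketch. The data frame name `two_rows` and its added column names are assumptions made for this example.

```r
# Two-row dataset from the table above
two_rows <- data.frame(row = 1:2, FEV = c(3.0, 3.2))

# MODEL: the empty model predicts the mean for every row
two_rows$prediction <- mean(two_rows$FEV)

# ERROR: residual = DATA - MODEL
two_rows$residual <- two_rows$FEV - two_rows$prediction

# Check: MODEL + ERROR gives back DATA
two_rows$model_plus_error <- two_rows$prediction + two_rows$residual
two_rows
```

The first residual is negative because 3.0 is below the prediction of 3.1, and the second is positive because 3.2 is above it.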

<div style="text-align: right">
  <a href="#9">Back to top</a>
</div>

<a id =5> </a>
## 5 Exploring the Mean With a Little More Data

We've loaded up a data frame of `FEV` scores from a group of 8 students. (The two data points we looked at earlier are in the first two rows of this data frame.) The data frame is called `FEV_data`. Write some code to see what's in the data frame.

In [None]:
# 5
# run code here

In [None]:
# Sample Response
str(FEV_data)
head(FEV_data)

<div class="discussion-question">

### 5.1 Discussion Question: Look at the `head()` of the data frame. What is the outcome variable? What is the explanatory variable?

</div>

<div class="teacher-note">
    
**Teacher Note:** This is a trick question, but one worth asking. `FEV` is definitely the outcome variable. But `row` is not an explanatory variable; it is just the row number of the observation. There is no possible explanatory variable in this data frame.
</div>

<div class="guided-notes">
    
### 5.2 We now have data from 8 students. Let’s use it to **fit** the empty model of `FEV`.
- Write the R code you would use to fit a one-parameter model.  
- Re-write the word equation **FEV = mean + error** to represent the **fitted version** of the model. 

</div>

<div class="teacher-note">
    
**Teacher Note:** Based on the data we have (8 data points), our best model of `FEV`, assuming we have no other information, is the mean of the 8 data points. We can calculate the mean of our 8 data points using the `favstats()` function in R. The empty model prediction is 3.7.
</div>
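
As a cross-check on `favstats()`, the same one-parameter model can also be fit with `lm()`, which is how the plotting cells at the end of this notebook build `empty_model`. The sketch below uses only base R and re-creates the eight FEV values so it runs on its own; the names `fev_check` and `empty_check` are just for illustration.

```r
# Re-create the eight FEV scores (liters) so this block is self-contained
fev_check <- data.frame(row = 1:8,
                        FEV = c(3.0, 3.2, 3.8, 4.6, 3.8, 2.3, 4.8, 4.1))

# Fit the empty model: no explanatory variables, just one estimated parameter
empty_check <- lm(FEV ~ NULL, data = fev_check)

# The single estimate is the mean of FEV
coef(empty_check)
mean(fev_check$FEV)
```

Both lines should report the same value, which is the fitted empty model's prediction for every student.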

In [None]:
# 5.2
# run code here

In [None]:
# 5.2
# Sample Response
favstats(~FEV, data=FEV_data)

<div class="discussion-question">

### 5.3 Discussion Question: This new prediction is quite a bit higher than the prediction we made earlier. Which do you think is a better prediction? The current one, or the one based on 2 data points? Why? 

</div>

<div class="teacher-note">

<b>Teacher Note:</b> Encourage students to reflect on how having more data affects the quality of a model. Remind them that a single observation might be unusually high or low, but the mean of many observations gives us a more balanced prediction.

</div>

### 5.4 Run the code below to create three visualizations of the distribution of `FEV` in `FEV_data`.
Look at the three graphs to see how they each represent the same data in a slightly different way.

In [None]:
# run this code
gf_histogram(~FEV, data=FEV_data)
gf_point(FEV ~ 1, data=FEV_data)
gf_point(FEV ~ row, data=FEV_data)

<div class="guided-notes">
    
### 5.5 For each of the three graphs, on your guided notes:
    
- Draw a circle around the two data points we looked at earlier (they came from this dataset): 3.0 and 3.2.
- Draw in the empty model on each graph.
- Draw in the residuals from the two data points you circled.
    
</div>

<div style="text-align: right">
  <a href="#9">Back to top</a>
</div>

<a id =6> </a>
## 6 Making Predictions for the 8 Individuals in FEV_data
Below is the data from `FEV_data` with some extra columns added.

<div class="guided-notes">

### 6.1 On your guided notes, fill in the missing cells for each row under **Model Prediction** and **Residual (Error)**.
    
</div>

<div class="teacher-note">
    
**Teacher Note:** 
- Make sure students understand why the Model Prediction column has the same value in every cell. 
- The residual is calculated as (FEV - Model Prediction). Make sure students recognize that the negative residuals are cases where the data was lower than the prediction, positive for cases where the data was higher.
</div>

<table style="font-size: 18px; margin-left: 0; border-collapse: collapse;">
  <thead>
    <tr>
      <th style="border: 1px solid black;">Row</th>
      <th style="border: 1px solid black;">FEV</th>
      <th style="border: 1px solid black;">Model Prediction</th>
      <th style="border: 1px solid black;">Residual (Error)</th>
      <th style="border: 1px solid black;">MODEL + ERROR</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="border: 1px solid black;">1</td>
      <td style="border: 1px solid black;">3.0</td>
      <td style="border: 1px solid black;"></td>
      <td style="border: 1px solid black;"></td>
      <td style="border: 1px solid black;"></td>
    </tr>
    <tr>
      <td style="border: 1px solid black;">2</td>
      <td style="border: 1px solid black;">3.2</td>
      <td style="border: 1px solid black;"></td>
      <td style="border: 1px solid black;"></td>
      <td style="border: 1px solid black;"></td>
    </tr>
    <tr>
      <td style="border: 1px solid black;">3</td>
      <td style="border: 1px solid black;">3.8</td>
      <td style="border: 1px solid black;"></td>
      <td style="border: 1px solid black;"></td>
      <td style="border: 1px solid black;"></td>
    </tr>
    <tr> 
      <td style="border: 1px solid black;">4</td>
      <td style="border: 1px solid black;">4.6</td>
      <td style="border: 1px solid black;"></td>
      <td style="border: 1px solid black;"></td>
      <td style="border: 1px solid black;"></td>
    </tr>
    <tr> 
      <td style="border: 1px solid black;">5</td>
      <td style="border: 1px solid black;">3.8</td>
      <td style="border: 1px solid black;"></td>
      <td style="border: 1px solid black;"></td>
      <td style="border: 1px solid black;"></td>
    </tr>
    <tr> 
      <td style="border: 1px solid black;">6</td>
      <td style="border: 1px solid black;">2.3</td>
      <td style="border: 1px solid black;"></td>
      <td style="border: 1px solid black;"></td>
      <td style="border: 1px solid black;"></td>
    </tr>
    <tr> 
      <td style="border: 1px solid black;">7</td>
      <td style="border: 1px solid black;">4.8</td>
      <td style="border: 1px solid black;"></td>
      <td style="border: 1px solid black;"></td>
      <td style="border: 1px solid black;"></td>
    </tr> 
    <tr>
      <td style="border: 1px solid black;">8</td>
      <td style="border: 1px solid black;">4.1</td>
      <td style="border: 1px solid black;"></td>
      <td style="border: 1px solid black;"></td>
      <td style="border: 1px solid black;"></td>
    </tr>
  </tbody>
</table>

<div class="discussion-question">

### 6.2 Discussion: Some of the residuals are positive, some negative. 
    
- Explain why none of the residuals on this table are equal to 0. 
- Would it ever be possible to have a residual of exactly 0? Why or why not?
</div>

<div class="teacher-note">
    
**Sample Responses:**
- No one in this data set had an FEV of exactly 3.7 (the mean), so none of the residuals are 0.
- Yes. If someone had an FEV exactly equal to the prediction (the mean), their residual would be 0.
</div>

<div class="guided-notes">

### 6.3 Fill in the final column in the table by adding together the model prediction and the residual for each row.
    
- Do you notice a pattern once you do this? What is it?
- Does DATA = MODEL + ERROR? Why?
</div>

<div class="teacher-note">
    
**Sample Responses:**
- Pattern: You get the data again!
- Students may say that yes, DATA = MODEL + ERROR because we’ve separated the actual data point into two parts: the model’s prediction and what’s left over (the error). 
- Some students may explain it algebraically. Since error = data – model, model + (data - model) will just give us data. 
- Others may wonder if it’s always exactly equal (especially when working with rounded numbers), so be ready to clarify that sometimes when we do things by "hand" there might be tiny differences due to rounding.
</div>
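
To check the completed eight-row table, one option is to let R pull the predictions and residuals out of a fitted empty model. This is a sketch under the assumption that base R's `predict()` and `resid()` are acceptable stand-ins for the hand calculations; the names `fev_table` and `empty_fit` are made up for this example.

```r
# Re-create the eight FEV scores (liters) so this block runs on its own
fev_table <- data.frame(row = 1:8,
                        FEV = c(3.0, 3.2, 3.8, 4.6, 3.8, 2.3, 4.8, 4.1))

# Fit the empty model
empty_fit <- lm(FEV ~ NULL, data = fev_table)

# Model Prediction: the same value (the mean) for every row
fev_table$prediction <- as.numeric(predict(empty_fit))

# Residual (Error): FEV minus the model prediction
fev_table$residual <- as.numeric(resid(empty_fit))

# MODEL + ERROR: should reproduce the FEV column
fev_table$model_plus_error <- fev_table$prediction + fev_table$residual

print(round(fev_table, 2))
```

Because the computer carries more decimal places than a hand calculation, the last column matches `FEV` without the small rounding differences students may see on paper.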

<div style="text-align: right">
  <a href="#9">Back to top</a>
</div>

<a id =7> </a>
## 7 Summing the Residuals
Something interesting happens when we sum the residuals (or errors) from the model prediction. 

<div class="guided-notes">

### 7.1 Add up the 8 residuals from the mean in the data table above. What do you get?
  
</div>

<div class="teacher-note">
    
**Sample Response:** 0.
</div>

### 7.2 Why the mean is such a good model

There are many functions we could use to model an outcome variable with a single number: the mean, median, mode, or even just a made-up value.

But if we measure error using residuals, **the mean is the best model because it perfectly balances the residuals**. No matter the distribution, the sum of the residuals from the mean will always be 0. Any other value will produce negative and positive residuals that don’t balance each other (resulting in a sum greater than or less than 0).
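
The balancing property can be checked quickly in base R. This is a minimal sketch; the vector name `fev` is an assumption for this example, and the result is rounded because computers store decimals with tiny rounding errors.

```r
# The eight FEV scores (liters)
fev <- c(3.0, 3.2, 3.8, 4.6, 3.8, 2.3, 4.8, 4.1)

# Residuals from the mean: each score minus the mean
residuals_from_mean <- fev - mean(fev)

# The negative and positive residuals cancel out
# (rounded to hide tiny floating-point error)
round(sum(residuals_from_mean), 10)
```

The same check works for any set of numbers, since subtracting the mean from every value always removes exactly the total amount the values add up to.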

### 7.3 Let's try the median as a model
What if we use the median instead of the mean as the model? Run the `favstats()` code below to get both the median and the mean of our eight-student dataset. They are very close. Is the mean really a better model as measured by residuals?

In [None]:
# 7.3
# run code here


In [None]:
# 7.3
# Sample Response
favstats(~FEV, data=FEV_data)

<div class="guided-notes">

### 7.4 Fill in the table in your guided notes with the model predictions and residuals using the median as the model.
</div>

<table style="font-size: 18px; margin-left: 0; border-collapse: collapse;">
  <thead>
    <tr>
      <th style="border: 1px solid black;">Row</th>
      <th style="border: 1px solid black;">FEV</th>
      <th style="border: 1px solid black;">Model Prediction</th>
      <th style="border: 1px solid black;">Residual (Error)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="border: 1px solid black;">1</td>
      <td style="border: 1px solid black;">3.0</td>
      <td style="border: 1px solid black;"></td>
      <td style="border: 1px solid black;"></td>
    </tr>
    <tr>
      <td style="border: 1px solid black;">2</td>
      <td style="border: 1px solid black;">3.2</td>
      <td style="border: 1px solid black;"></td>
      <td style="border: 1px solid black;"></td>
    </tr>
    <tr>
      <td style="border: 1px solid black;">3</td>
      <td style="border: 1px solid black;">3.8</td>
      <td style="border: 1px solid black;"></td>
      <td style="border: 1px solid black;"></td>
    </tr>
    <tr> 
      <td style="border: 1px solid black;">4</td>
      <td style="border: 1px solid black;">4.6</td>
      <td style="border: 1px solid black;"></td>
      <td style="border: 1px solid black;"></td>
    </tr>
    <tr> 
      <td style="border: 1px solid black;">5</td>
      <td style="border: 1px solid black;">3.8</td>
      <td style="border: 1px solid black;"></td>
      <td style="border: 1px solid black;"></td>
    </tr>
    <tr> 
      <td style="border: 1px solid black;">6</td>
      <td style="border: 1px solid black;">2.3</td>
      <td style="border: 1px solid black;"></td>
      <td style="border: 1px solid black;"></td>
    </tr>
    <tr> 
      <td style="border: 1px solid black;">7</td>
      <td style="border: 1px solid black;">4.8</td>
      <td style="border: 1px solid black;"></td>
      <td style="border: 1px solid black;"></td>
    </tr> 
    <tr>
      <td style="border: 1px solid black;">8</td>
      <td style="border: 1px solid black;">4.1</td>
      <td style="border: 1px solid black;"></td>
      <td style="border: 1px solid black;"></td>
    </tr>
  </tbody>
</table>

<div class="guided-notes">

### 7.5 Sum the residuals from the median to see what you get.
</div>

<div class="teacher-note">
    
**Teacher Note:** If you have time, you could let students try other numbers to model the distribution. Whatever numbers they choose, none will balance the residuals perfectly except the mean.
</div>
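
If students want to try several candidate numbers quickly, a small helper can do the summing for them. This is a sketch in base R; the function `sum_of_residuals` and the two made-up guesses (3.0 and 4.5) are hypothetical and chosen only for illustration.

```r
# The eight FEV scores (liters)
fev <- c(3.0, 3.2, 3.8, 4.6, 3.8, 2.3, 4.8, 4.1)

# Hypothetical helper: sum of residuals when `guess` is used as the model
sum_of_residuals <- function(guess, data) {
  sum(data - guess)
}

# Candidate single-number models: the mean, the median, and two made-up guesses
candidates <- c(mean = mean(fev), median = median(fev),
                low_guess = 3.0, high_guess = 4.5)

# Sum of residuals for each candidate (rounded for display)
residual_sums <- sapply(candidates, sum_of_residuals, data = fev)
round(residual_sums, 2)
```

Only the mean gives a sum of 0. The median (3.8) is slightly above the mean (3.7), so its residuals sum to about -0.8; guesses further from the mean are further out of balance.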

<div class="guided-notes">

### 7.6 Why might you prefer to use the mean as a one-parameter model instead of the median?
</div>

# End of notebook

In [None]:
# run this code
gf_histogram(~FEV, data=FEV_data) + theme_bw(base_size = 25) 
gf_point(FEV ~ 1, data=FEV_data) + theme_bw(base_size = 25) 
gf_point(FEV ~ row, data=FEV_data) + theme_bw(base_size = 25) 

In [None]:
# ggforce provides geom_ellipse(), used below to circle data points
library(ggforce)

In [None]:
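# fit the empty model (no explanatory variables) so gf_model() can draw the mean
empty_model <- lm(FEV ~ NULL, data = FEV_data)
# histogram with the empty model overlaid; the ellipses circle the two earlier
# data points (3.0 and 3.2) and the purple segments show their residuals from the mean (3.7)
gf_histogram(~FEV, data=FEV_data) %>%
  gf_model(empty_model) + 
  theme_bw(base_size = 25) +
  geom_ellipse(aes(x0 = 3.01, y0 = .5, a = .1, b = .5, angle = 0))+
  geom_ellipse(aes(x0 = 3.23, y0 = .5, a = .1, b = .5, angle = 0)) + 
  annotate("segment", x = 3.01, xend = 3.7, y = .6, yend = .6, colour = "purple", size=3, alpha=0.6)+ 
  annotate("segment", x = 3.23, xend = 3.7, y = .4, yend = .4, colour = "purple", size=3, alpha=0.6)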


In [None]:
# small horizontal offset so the two residual segments don't overlap at x = 1
little_bit <- .0015
gf_point(FEV ~ 1, data=FEV_data) %>%
  gf_model(empty_model) %>%
  gf_point(3 ~ 1, shape = 1, size = 6) %>%
  gf_point(3.2 ~ 1, shape = 1, size = 6) %>%
  gf_point(3 ~ .95, shape = 1, size = .1, alpha = 0) %>%
  gf_point(3 ~ 1.05, shape = 1, size = .1, alpha = 0) + 
  annotate("segment", y = 3.01, yend = 3.7, x = 1-little_bit, xend = 1-little_bit, colour = "purple", size=3, alpha=0.6) + 
  annotate("segment", y = 3.23, yend = 3.7, x = 1+little_bit, xend = 1+little_bit, colour = "purple", size=3, alpha=0.6) + 
  theme_bw(base_size = 25) 


In [None]:
gf_point(FEV ~ row, data=FEV_data) %>%
  gf_model(empty_model)  %>%
  gf_point(3 ~ 1, shape = 1, size = 6) %>%
  gf_point(3.2 ~ 2, shape = 1, size = 6) %>%
  gf_point(3 ~ .95, shape = 1, size = .1, alpha = 0) %>%
  gf_point(3 ~ 1.05, shape = 1, size = .1, alpha = 0) + 
  annotate("segment", y = 3.01, yend = 3.7, x = 1, xend = 1, colour = "purple", size=3, alpha=0.6) + 
  annotate("segment", y = 3.23, yend = 3.7, x = 2, xend = 2, colour = "purple", size=3, alpha=0.6) + 
  theme_bw(base_size = 25) 