# Spoonfuls of Sugar: Exploring Cereal Sweetness (COMPLETE)

## Chapter 3.5-3.9 Five-Number Summary, Boxplots, and Bar Graphs

In [None]:
# This code will load the R packages we will use
library(coursekata)

# set styles
css <- suppressWarnings(readLines("https://raw.githubusercontent.com/jimstigler/jupyter/master/ck_jupyter_styles_v2.css"))
IRdisplay::display_html(sprintf('<style>%s</style>', paste(css, collapse = "\n")))

# data source:
# https://r-packages.io/datasets/Cereal 


<div class="teacher-note">
    <b>Teacher Note:</b> The purpose of this mini-JNB is to practice creating boxplots and bar graphs, and connecting the five-number summary to boxplots and histograms.
</div>

## 1. The `Cereal` data

<figure style="margin: 0 auto; text-align: center; width: 450px;">
  <img src="https://i.postimg.cc/LmnGzVf6/breakfast-cereals.png" 
       alt="Boxes of breakfast cereals" 
       style="width: 100%; height: auto;">
  <figcaption style="font-size: 0.9em; margin-top: 5px;">
      Food for thought:
      Do you think you could tell which cereals are sweeter by looking at the boxes? 
      Which one do you think is the sweetest?
  </figcaption>
</figure>




According to [research](https://pmc.ncbi.nlm.nih.gov/articles/PMC4188247/#:~:text=Meta%2Danalysis.&text=It%20examined%2014%20studies%20in,1.46%3B%20P%20%3C%200.0001), cereal can be a part of a healthy and balanced diet. But cereal also has a reputation for having a lot of added sugar. Let's see what the data in the `Cereal` data frame have to say.

**Variable Descriptions**

- `Name`	Brand name of cereal 
- `Company`	Manufacturer coded as G=General Mills, K=Kellogg's, or Q=Quaker
- `Serving`	Serving size (in cups)
- `Calories`	Calories (per cup)
- `Fat`	Fat (grams per cup)
- `Sodium`	Sodium (mg per cup)
- `Carbs`	Carbohydrates (grams per cup)
- `Fiber`	Dietary Fiber (grams per cup)
- `Sugars`	Sugars (grams per cup)
- `Protein`	Protein (grams per cup)

### 1.1 Run some code to look at a few rows of the `Cereal` data

In [None]:
# run code here


# sample code
sample(Cereal, 10)

## 2. Explore Variation in `Sugars`
### 2.1 Create a histogram of `Sugars` and describe the distribution (note the shape, center, spread and weird things).

In [None]:
# run code here

# sample code
gf_histogram(~Sugars, data = Cereal)
gf_histogram(~Sugars, data = Cereal, bins = 8, boundary = 0)
gf_histogram(~Sugars, data = Cereal, binwidth = 1, boundary = 0)

Describe the distribution.


<div class="teacher-note">
    
<b>Sample Response</b>:

- bimodal (one peak around 3 g per cup and another peak around 13 g per cup)
- values range from about 2 to 20 g of sugar per cup
- only a few cereals with 5 to 10 g of sugar per cup
</div>
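
To see what `binwidth` and `boundary` do, here is a minimal base-R sketch on a small made-up vector (the values are invented for illustration and are not the `Cereal` data). It uses `cut()` to group the values into bins of width 5 starting at 0 and then counts each bin, which is roughly what the bars of a histogram show. ggplot2 may handle values sitting exactly on a bin edge differently, so treat this as an approximation of `gf_histogram()`, not its exact algorithm.

```r
# Made-up sugar values (grams per cup) for illustration; NOT the Cereal data
sugars <- c(2, 3, 3, 4, 5, 6, 9, 11, 12, 13, 13, 14, 15, 16, 20)

# Bins of width 5 starting at a boundary of 0: (0,5], (5,10], (10,15], (15,20]
bins <- cut(sugars, breaks = seq(0, 20, by = 5))

# Count how many values land in each bin (these would be the bar heights)
table(bins)
```

Changing `by = 5` to `by = 1` mimics a smaller `binwidth`: more, narrower bars built from the same values.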

### 2.2 From the histogram alone, estimate where Q1 and Q3 will be. Hint: Approximately where is the middle 50% of the data?

<div class="teacher-note">
    
<b>Sample Response</b>:

- Maybe Q1 is around 5 and Q3 is around 15 g of sugar per cup.

</div>
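
Q1 and Q3 are the 25th and 75th percentiles, so roughly half of the values should fall between them. The sketch below checks this on a made-up vector (not the real data) using base R's `quantile()`; with a small sample and tied values, the share will only be close to 50%, not exactly 50%.

```r
# Made-up sugar values (grams per cup) for illustration; NOT the Cereal data
sugars <- c(2, 3, 3, 4, 5, 6, 9, 11, 12, 13, 13, 14, 15, 16, 20)

# Q1 and Q3 are the 25th and 75th percentiles
q <- quantile(sugars, probs = c(0.25, 0.75))
q

# Share of values that fall between Q1 and Q3 (inclusive)
middle_share <- mean(sugars >= q[1] & sugars <= q[2])
middle_share
```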

### 2.3 Add a boxplot to your histogram and run favstats(). Where are Q1 and Q3? (How close was your estimate?)

In [None]:
# run code here
# Bonus Question: Does changing the bins or binwidth of the histogram affect the boxplot?

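# sample code
# five-number summary (min, Q1, median, Q3, max) plus mean, sd, n, and missing
favstats(~Sugars, data = Cereal)

# histogram with a boxplot drawn on top of it
gf_histogram(~Sugars, data = Cereal, boundary = 0, binwidth = 1) %>%
  gf_boxplot(width = 1)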


<div class="teacher-note">
    
<b>Sample Response</b>:

The actual Q1 and Q3 are 4.5 and 14.525 grams of sugar per cup.

</div>
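
The box in a boxplot is drawn from the same numbers `favstats()` reports: it runs from Q1 to Q3 with a line at the median, and its length is the interquartile range (IQR). ggplot2's boxplot, which `gf_boxplot()` draws, by default extends each whisker to the most extreme value within 1.5 × IQR of the box and plots anything beyond that as a separate point. The sketch applies that rule with base R (`quantile()`, `IQR()`) to a made-up vector, not the `Cereal` data. Because these numbers come from the raw values rather than from the histogram's bars, the `bins` or `binwidth` setting has no effect on them.

```r
# Made-up sugar values (grams per cup) for illustration; NOT the Cereal data
sugars <- c(2, 3, 3, 4, 5, 6, 9, 11, 12, 13, 13, 14, 15, 16, 20)

# Five-number summary: min, Q1, median, Q3, max
five_num <- quantile(sugars, probs = c(0, 0.25, 0.5, 0.75, 1))
five_num

# The box spans Q1 to Q3; its length is the IQR
iqr <- IQR(sugars)

# 1.5 * IQR fences: values outside them would be drawn as separate points
lower_fence <- five_num[["25%"]] - 1.5 * iqr
upper_fence <- five_num[["75%"]] + 1.5 * iqr
outliers <- sugars[sugars < lower_fence | sugars > upper_fence]

# numeric(0): nothing lies beyond the fences in this made-up vector
outliers
```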

### 2.4 Find your favorite cereal from the list below. Based on its sugar content, which quartile does it fall into?

In [None]:
# run this to arrange the cereals in alphabetical order
arrange(Cereal, Name)

The name of your favorite cereal:

How many grams of sugar it has per cup:

Which quartile it falls into:

<div class="teacher-note">
    
<b>Sample Response</b>:

- example: Reese's Puffs, 16 g, 4th quartile
- example: Kix, 2.3 g, 1st quartile

<b>Teacher Note</b>:
- A common error is to call a quartile "Q1." Note that Q1, the median, and Q3 are boundaries; the four groups of values they create are called the 1st, 2nd, 3rd, and 4th quartiles.
</div>
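
One way to check which quartile a value falls into is to compare it with the three cutoffs: Q1, the median, and Q3. The sketch below does this with base R's `findInterval()` on a made-up vector (not the `Cereal` data). A value exactly equal to a cutoff is placed in the higher quartile here; that tie-breaking choice is an assumption of this sketch, not a universal rule.

```r
# Made-up sugar values (grams per cup) for illustration; NOT the Cereal data
sugars <- c(2, 3, 3, 4, 5, 6, 9, 11, 12, 13, 13, 14, 15, 16, 20)

# Quartile boundaries: Q1, median, Q3
cutoffs <- quantile(sugars, probs = c(0.25, 0.5, 0.75))

# findInterval() counts how many cutoffs a value is at or above:
# 0 -> 1st quartile, 1 -> 2nd, 2 -> 3rd, 3 -> 4th
which_quartile <- function(x) findInterval(x, cutoffs) + 1L

# A low-sugar and a high-sugar cereal
which_quartile(c(2.3, 16))
```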

### 2.5 How many grams of sugar does the sweetest cereal have? Which cereal is it?

In [None]:
# run code here

# sample code
arrange(Cereal, -Sugars)

<div class="teacher-note">
    
<b>Sample Response</b>:

The cereal with the highest amount of sugar is Raisin Bran. It has 20 grams of sugar per cup.

</div>
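
`arrange(Cereal, -Sugars)` sorts from most to least sugar, so the sweetest cereal lands in the first row. The same idea in base R, shown on a tiny hypothetical data frame (the names `Cereal A` to `Cereal D` and their sugar values are invented), looks like this. `which.max()` skips the sorting and returns the position of the largest value directly (the first one, if there is a tie).

```r
# Tiny hypothetical data frame; names and values are invented
cereals <- data.frame(
  Name   = c("Cereal A", "Cereal B", "Cereal C", "Cereal D"),
  Sugars = c(3, 20, 12, 6)
)

# Sort from most to least sugar (same idea as arrange(Cereal, -Sugars))
sorted <- cereals[order(-cereals$Sugars), ]
head(sorted, 1)

# Or jump straight to the row with the most sugar
cereals[which.max(cereals$Sugars), "Name"]
```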

### 2.6 Given the daily sugar recommendations, what would be a reasonable amount of sugar for breakfast cereal? How does your favorite cereal compare?

For reference, the [American Heart Association](https://www.heart.org/en/healthy-living/healthy-eating/eat-smart/sugar/added-sugars) recommends limiting added sugar intake to about 30 grams per day.

<div class="teacher-note">

<b>Teacher Note:</b>
Encourage discussion or estimation. Here are some follow-up questions that students might consider:
- If the daily recommendation is ~30 grams, how many grams of sugar would be reasonable at breakfast?
- Remember that the sugar listed is per cup: how many cups do you eat at breakfast?

<b>Sample Response:</b>  
- If I'm supposed to eat no more than 30g of sugar per day, maybe that's about 10g per meal. Maybe I eat 2 cups so maybe a reasonable amount of sugar is 5g per cup.
- Reese's Puffs has 16g per cup, so even one cup is already over my 10g breakfast limit. I usually eat a big bowl, so maybe that's like 2 cups, which would be 32g: more than my whole day's worth of sugar!

</div>
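
The sample response's reasoning is just a few divisions and a multiplication. Written out in R, it uses the ~30 g daily figure from the text plus two assumptions taken from the sample response: 3 meals a day and a 2-cup bowl.

```r
# From the text: roughly 30 g of added sugar per day
daily_limit_g <- 30
# Assumptions from the sample response
meals_per_day <- 3
cups_per_bowl <- 2

per_meal_g <- daily_limit_g / meals_per_day   # 10 g per meal
per_cup_g  <- per_meal_g / cups_per_bowl      # 5 g per cup

# A cereal with 16 g per cup, eaten as a 2-cup bowl
bowl_g <- 16 * cups_per_bowl                  # 32 g
bowl_g / daily_limit_g                        # share of the daily limit (above 1)
```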

## 3. Which companies make these cereals?

### 3.1 Explore the distribution of Company with a visualization.

G=General Mills, K=Kellogg's, or Q=Quaker

In [None]:
# run code here

# sample code
gf_bar(~Company, data = Cereal)

### 3.2 Describe the center and spread of the distribution.

<div class="teacher-note">
    
<b>Sample Response</b>:

Center: General Mills has the highest number of cereals represented in this dataset (the mode).
    
Spread: The cereals are not spread evenly across the three companies. There are more G cereals than K cereals, and fewer Q cereals than either.

</div>

### 3.3 What percentage of the cereals are made by Quaker?

Use 2 different methods in R to figure this out.

In [None]:
# Run code here

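# sample code
# Method 1: tally the companies as proportions or percents
tally(~Company, data = Cereal, format = "proportion", margins = TRUE)
tally(~Company, data = Cereal, format = "percent")
# Method 2: divide each count by the total number of cereals
tally(~Company, data = Cereal) / nrow(Cereal)
# Method 3: read the percents or proportions off a bar graph
gf_percents(~Company, data = Cereal)
gf_props(~Company, data = Cereal)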

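These methods rest on the same arithmetic: a category's percentage is its count divided by the total number of cases, times 100, and the mode is the category with the largest count. A base-R sketch on a hypothetical vector of company codes (invented, not the real `Cereal` data):

```r
# Hypothetical company codes for 10 cereals (invented, not the real data)
company <- c("G", "K", "G", "Q", "K", "G", "G", "K", "Q", "G")

counts <- table(company)                    # G = 5, K = 3, Q = 2
percents <- counts / length(company) * 100  # count / total * 100
percents

# Equivalent shortcut: prop.table() returns proportions
prop.table(counts) * 100

# The mode is the category with the largest count
names(which.max(counts))
```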
### 3.4 Create a new categorical variable that identifies cereals with a reasonable amount of sugar per cup. Use your answer from 2.6 to decide what counts as "reasonable." Then, make a bar graph to show how many cereals fall into each category.

In [None]:
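# sample code
# TRUE when a cereal has at most 5 g of sugar per cup (cutoff from the 2.6 sample response)
Cereal$Reasonable <- Cereal$Sugars <= 5

# bar graphs of how many (and what percent of) cereals fall into each category
gf_bar(~Reasonable, data = Cereal)
gf_percents(~Reasonable, data = Cereal)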
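
A TRUE/FALSE variable works, but a labeled variable can make the bar graph easier to read. The sketch below uses base R's `ifelse()` on a made-up vector (not the `Cereal` data), with an assumed cutoff of 5 g per cup taken from the 2.6 sample response and hypothetical labels `"Reasonable"` and `"Too sweet"`; `table()` then gives the counts that the bars would show.

```r
# Made-up sugar values (grams per cup) for illustration; NOT the Cereal data
sugars <- c(2, 3, 3, 4, 5, 6, 9, 11, 12, 13, 13, 14, 15, 16, 20)

# Assumed cutoff: at most 5 g per cup counts as "reasonable"
cutoff <- 5

# Hypothetical labels; these read more clearly on a bar graph than TRUE/FALSE
sugar_level <- ifelse(sugars <= cutoff, "Reasonable", "Too sweet")

# How many values fall into each category
table(sugar_level)
```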