
# 🎨 **11.1 Introduction: Communicating with Graphics**

In **Chapter 10**, you learned how to use plots as *tools for exploration*. Exploratory plots are fast, purposeful, and disposable: you usually know exactly which variables you're plotting before you even look, and many plots are created only to be immediately discarded.

Now the goal changes.

Once you understand your data, you need to **communicate that understanding to others**. Your audience:
- Likely does **not** share your background knowledge
- Is **not deeply invested** in the data
- Needs help quickly forming a clear mental model

To do that, your graphics must be **self-explanatory**. This chapter introduces the ggplot2 tools that help transform exploratory plots into **clear, polished, explanatory graphics**.

This chapter assumes:
- You already know *what* you want to show
- You mainly need help with *how* to show it effectively

Because of that, it pairs best with a **conceptual visualization book**. A highly recommended companion is  
📘 *The Truthful Art* by **Alberto Cairo**, which focuses on *thinking* about visualizations rather than mechanics.

---

## 🔧 **11.1.1 Prerequisites**

This chapter primarily builds on **ggplot2**, with support from:
- **dplyr** for data manipulation
- **scales** for controlling breaks, labels, transformations, and palettes
- **ggrepel** for non-overlapping text labels
- **patchwork** for combining multiple plots

Make sure these packages are installed before proceeding.
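One way to check for missing packages before loading them is sketched below. The names `needed` and `missing_pkgs` are illustrative, and **ggthemes** is included only because the themes example later in this chapter loads it.

```r
# Packages loaded in this chapter (ggthemes appears in the themes example)
needed <- c("tidyverse", "scales", "ggrepel", "patchwork", "ggthemes")

# Which of them are not installed in the current library paths?
missing_pkgs <- setdiff(needed, rownames(installed.packages()))

if (length(missing_pkgs) > 0) {
  message("Install these first: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install from CRAN
} else {
  message("All required packages are installed.")
}
```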


In [None]:
library(tidyverse)
library(scales)
library(ggrepel)
library(patchwork)


# 🏷️ **11.2 Labels: Making Graphics Self-Explanatory**

The easiest way to transform an **exploratory** graphic into an **expository** one is by improving its **labels**. In ggplot2, labels are added using the `labs()` function.

Good labels help your audience immediately understand:
- What the variables represent
- What units are being used
- What the *main takeaway* of the plot is

---

## 🧠 Titles, Subtitles, and Captions

- **Title**: Summarizes the *main finding* of the plot  
  ❌ Avoid titles that just describe the plot structure  
  ✔ Prefer titles that communicate insight

- **Subtitle**: Adds supporting context or nuance
- **Caption**: Typically used for data sources or notes

You can also use `labs()` to replace:
- Axis labels
- Legend titles

This is especially important when variable names are short or cryptic: always spell things out and include units when relevant.
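As a small illustration of replacing legend titles, the sketch below maps both color and shape to `drv`. Giving both legends the same title lets ggplot2 merge them into one. The title "Drive train" is an arbitrary choice.

```r
library(ggplot2)

# Map both color and shape to drv; identical legend titles
# let ggplot2 draw a single combined legend.
p <- ggplot(mpg, aes(x = displ, y = hwy, color = drv, shape = drv)) +
  geom_point() +
  labs(
    x = "Engine displacement (L)",
    y = "Highway fuel economy (mpg)",
    color = "Drive train",
    shape = "Drive train"
  )
p
```

If the two titles differ, ggplot2 draws two separate legends instead.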

---

## ➗ Mathematical Notation in Labels

Labels don't have to be plain text. You can include **mathematical expressions** by using `quote()` instead of character strings. ggplot2 supports a rich set of mathematical symbols via `plotmath`; the available syntax is documented in `?plotmath`.

---

## 🧪 Exercises

1. Create a fuel economy plot with customized:
   - Title
   - Subtitle
   - Caption
   - X label
   - Y label
   - Color legend label

2. Recreate a fuel economy plot where **both color and shape** vary by drive train type.

3. Take an exploratory plot you've created recently and add informative titles and labels to make it clear to a non-technical audience.


In [None]:
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = class)) +
  geom_smooth(se = FALSE) +
  labs(
    x = "Engine displacement (L)",
    y = "Highway fuel economy (mpg)",
    color = "Car type",
    title = "Fuel efficiency generally decreases with engine size",
    subtitle = "Two seaters (sports cars) are an exception due to their light weight",
    caption = "Data from fueleconomy.gov"
  )

df <- tibble(
  x = 1:10,
  y = cumsum(x^2)
)

ggplot(df, aes(x, y)) +
  geom_point() +
  labs(
    x = quote(x[i]),
    y = quote(sum(x[i]^2, i == 1, n))
  )


# ✏️ **11.3 Annotations: Highlighting What Matters**

Beyond labeling axes, legends, and titles, effective graphics often require **annotating specific observations or groups**. Annotations help guide attention, explain anomalies, and communicate insights directly on the plot.

---

## 📝 Text Labels with `geom_text()`

`geom_text()` works like `geom_point()`, but adds a new aesthetic: `label`. This allows you to place text directly on a plot.

Labels usually come from:
- The same dataset used in the plot, or
- A **separate tibble** created specifically for annotation

In practice, it's common to create a small dataset that identifies the most interesting observations and then layer it on top of the main plot.

Text appearance can be customized with arguments like:
- `size`
- `fontface`
- `hjust` / `vjust` (horizontal and vertical justification)
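The appearance arguments above can be sketched as follows. The tibble name `label_info` and the specific values (`size = 5`, bold face, right/bottom justification) are illustrative choices, not requirements.

```r
library(ggplot2)
library(dplyr)

# One row per drive train: the car with the largest engine in each group
label_info <- mpg |>
  group_by(drv) |>
  slice_max(displ, n = 1, with_ties = FALSE) |>
  ungroup()

p <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point(alpha = 0.3) +
  geom_text(
    data = label_info,
    aes(label = drv),
    fontface = "bold",  # bold text
    size = 5,           # text size (in mm by default)
    hjust = "right",    # right edge of the text sits at the point
    vjust = "bottom"    # bottom edge of the text sits at the point
  ) +
  theme(legend.position = "none")
p
```

With this justification each label sits up and to the left of its point, but labels can still overlap nearby data.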

---

## 🚫 Avoiding Overlap with `ggrepel`

When labels overlap with each other or the data, readability suffers. The **ggrepel** package solves this problem with:
- `geom_text_repel()`
- `geom_label_repel()`

These functions automatically reposition labels so they don't collide, greatly improving clarity, especially in dense plots.

---

## 🔍 Highlighting Individual Observations

You can combine multiple layers to emphasize important points:
- Use `geom_text_repel()` to label them
- Overlay colored or hollow points to visually highlight them

This is particularly useful for calling out **outliers** or unusual cases.

---

## 🧰 Other Annotation Tools

ggplot2 offers many geoms for annotation:
- `geom_hline()` / `geom_vline()` for reference lines
- `geom_rect()` for highlighting regions
- `geom_segment()` with `arrow()` for directional emphasis
- `annotate()` for adding one-off annotations without creating a tibble

As a general rule:
- **Geoms** → annotate subsets of data
- **annotate()** → add standalone elements
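A minimal sketch of two of these tools together: a reference line at the mean highway mileage and a shaded region for large engines. The 5 L cutoff, the colors, and the name `mean_hwy` are arbitrary choices for illustration.

```r
library(ggplot2)

# Mean highway mileage, used as a horizontal reference line
mean_hwy <- mean(mpg$hwy)

p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  # One-off shaded region drawn with annotate(): engines of 5 L or more
  annotate(
    geom = "rect",
    xmin = 5, xmax = Inf, ymin = -Inf, ymax = Inf,
    fill = "grey70", alpha = 0.3
  ) +
  geom_point() +
  # Dashed reference line at the overall mean
  geom_hline(yintercept = mean_hwy, color = "red", linetype = "dashed")
p
```

Using `Inf` and `-Inf` stretches the rectangle to the panel edges regardless of the axis limits.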

---

## 💬 Annotating with Explanatory Text

Longer explanatory text can be wrapped using `stringr::str_wrap()` before adding it to a plot. Combined with arrows or labels, this is a powerful way to communicate the main takeaway directly on the visualization.

---

## 🧪 Exercises

1. Use `geom_text()` with infinite positions (`Inf`, `-Inf`) to place text in all four corners of a plot.
2. Use `annotate()` to add a point in the middle of a plot without creating a tibble. Customize its appearance.
3. Explore how `geom_text()` behaves with faceting:
   - Label only one facet
   - Add a different label to each facet
4. Which arguments in `geom_label()` control the background box?
5. What are the four arguments to `arrow()`? Create plots demonstrating the most important options.


In [None]:
label_info <- mpg |>
  group_by(drv) |>
  arrange(desc(displ)) |>
  slice_head(n = 1) |>
  mutate(
    drive_type = case_when(
      drv == "f" ~ "front-wheel drive",
      drv == "r" ~ "rear-wheel drive",
      drv == "4" ~ "4-wheel drive"
    )
  ) |>
  select(displ, hwy, drv, drive_type)

ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point(alpha = 0.3) +
  geom_smooth(se = FALSE) +
  geom_label_repel(
    data = label_info,
    aes(label = drive_type),
    fontface = "bold",
    size = 5,
    nudge_y = 2
  ) +
  theme(legend.position = "none")

# Flag cars with very high highway mileage, or high mileage despite a large engine
potential_outliers <- mpg |>
  filter(hwy > 40 | (hwy > 20 & displ > 5))

ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_text_repel(
    data = potential_outliers,
    aes(label = model)
  ) +
  geom_point(data = potential_outliers, color = "red") +
  geom_point(
    data = potential_outliers,
    color = "red",
    size = 3,
    shape = "circle open"
  )

trend_text <- "Larger engine sizes tend to have lower fuel economy." |>
  stringr::str_wrap(width = 30)

ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  annotate(
    geom = "label",
    x = 3.5, y = 38,
    label = trend_text,
    hjust = "left",
    color = "red"
  ) +
  annotate(
    geom = "segment",
    x = 3, y = 35,
    xend = 5, yend = 25,
    color = "red",
    arrow = arrow(type = "closed")
  )


# ✨ 11.3 Annotations: Highlighting What Matters

Annotations help you **tell the story behind your data**, not just show it. While labels like titles and axis names explain the structure of a plot, annotations draw attention to *specific observations, groups, or takeaways*.

---

## 🏷️ Text-based annotations

- **`geom_text()`** adds text labels tied to data points using the `label` aesthetic.
- **`geom_label()`** works the same way, but adds a background box behind the text.
- Horizontal and vertical alignment are controlled with **`hjust`** and **`vjust`**.

Labels can come from:
- the main dataset, or  
- a *separate tibble* that contains only the observations you want to annotate.

---

## 🚫 Avoiding overlapping labels

Dense plots often suffer from unreadable labels. The **ggrepel** package solves this:

- **`geom_text_repel()`**
- **`geom_label_repel()`**

These automatically reposition labels so they don't overlap with each other or the data.

---

## 🎯 Highlighting important observations

A common pattern:
- label only *interesting points*
- add a second geom layer (e.g., larger or hollow points) to visually emphasize them

This keeps the plot clean while still guiding attention.

---

## 🧭 Reference annotations

Beyond text, ggplot2 provides many geoms for annotation:

- **`geom_hline()` / `geom_vline()`** → reference lines
- **`geom_rect()`** → highlight regions
- **`geom_segment()` + `arrow()`** → point to features

For one-off annotations not tied to a dataset, use **`annotate()`**.

---

## 📝 `annotate()` vs geoms

- Use **geoms** when annotating *subsets of data*
- Use **`annotate()`** for *single elements* like explanatory text, arrows, or markers

You can even combine multiple annotation layers (e.g., text + arrow) to explain a trend clearly.

---

## 💡 Big idea

Annotations turn plots from **descriptive** to **communicative**.  
They highlight insights, reduce cognitive load, and help your audience see what *you* see.


In [None]:
library(tidyverse)
library(ggrepel)

# Prepare label data
label_info <- mpg |>
  group_by(drv) |>
  arrange(desc(displ)) |>
  slice_head(n = 1) |>
  mutate(
    drive_type = case_when(
      drv == "f" ~ "front-wheel drive",
      drv == "r" ~ "rear-wheel drive",
      drv == "4" ~ "4-wheel drive"
    )
  )

# Annotated plot
ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point(alpha = 0.3) +
  geom_smooth(se = FALSE) +
  geom_label_repel(
    data = label_info,
    aes(label = drive_type),
    fontface = "bold",
    size = 5,
    nudge_y = 2
  ) +
  theme(legend.position = "none")


# 🎨 11.5 Themes: Styling the Story Around Your Data

Themes control **everything that isn't the data itself**: backgrounds, grids, fonts, legend placement, spacing, and alignment. While geoms show *what* the data says, themes shape *how* the message feels.

---

## 🧱 Built-in themes

ggplot2 includes eight built-in themes, with **`theme_gray()`** as the default. Others like **`theme_bw()`**, **`theme_minimal()`**, and **`theme_classic()`** offer cleaner or more publication-friendly styles.
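Switching between built-in themes takes one extra layer, as in the sketch below. The object names `base_plot`, `p_bw`, and `p_min` are illustrative.

```r
library(ggplot2)

# A plot drawn with the default theme_gray()
base_plot <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = class)) +
  geom_smooth(se = FALSE)

# White background with a panel border
p_bw <- base_plot + theme_bw()

# No background or border, light grid lines only
p_min <- base_plot + theme_minimal()

p_bw
p_min
```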

External packages such as **ggthemes** expand your options even further, letting you match corporate branding, news outlets, or academic journals.

---

## 🔧 Fine-grained customization with `theme()`

The `theme()` function lets you modify individual non-data components, such as:

- legend placement and direction  
- font size, face, and color  
- plot title and caption alignment  
- background and border styling  

These are controlled with **`element_*()`** helpers:
- `element_text()` → text styling
- `element_rect()` → boxes and borders
- `element_line()` → grid lines and axes

You can also control whether titles and captions align to the **plot panel** or the **entire plot area** using `plot.title.position` and `plot.caption.position`.
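A sketch of how these pieces fit together in one `theme()` call. The specific choices here (bottom legend, bold title, left-aligned caption, dotted minor grid) are illustrative, not recommendations from the chapter.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  labs(
    title = "Larger engine sizes tend to have lower fuel economy",
    caption = "Source: fueleconomy.gov"
  ) +
  theme(
    legend.position = "bottom",                               # legend placement
    legend.direction = "horizontal",                          # legend direction
    legend.box.background = element_rect(color = "black"),    # border around the legend
    plot.title = element_text(face = "bold"),                 # text styling
    plot.title.position = "plot",                             # align title to the whole plot
    plot.caption.position = "plot",                           # align caption to the whole plot
    plot.caption = element_text(hjust = 0),                   # left-align the caption
    panel.grid.minor = element_line(linetype = "dotted")      # grid line styling
  )
p
```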

---

## 🎯 Why themes matter

Good theming:
- reduces visual clutter  
- improves readability  
- reinforces the plot's main message
- helps your audience focus on insights, not formatting  

A well-chosen theme turns a correct plot into a **compelling** one.


In [None]:
library(tidyverse)
library(ggthemes)  # not in the prerequisites list; install it separately if needed

ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  geom_smooth(se = FALSE) +
  labs(
    title = "Larger engine sizes tend to have lower fuel economy",
    caption = "Source: fueleconomy.gov",
    x = "Engine displacement (L)",
    y = "Highway fuel economy (mpg)"
  ) +
  theme_economist() +
  theme(
    axis.title.x = element_text(color = "blue", face = "bold"),
    axis.title.y = element_text(color = "blue", face = "bold")
  )


# 🧩 11.6 Layout: Combining Multiple Plots with *patchwork*

Up to now, we've focused on crafting **individual plots**. But real analysis often requires **multiple plots working together**. The **patchwork** package extends ggplot2 so you can arrange plots into clean, flexible layouts using intuitive operators.

---

## ➕ Combining plots

Once plots are saved as objects, you can combine them directly:
- `+` adds plots to a shared grid layout, so two plots end up **side by side**
- `|` explicitly places plots **horizontally**
- `/` stacks plots **vertically**

These operators work because patchwork **redefines how `+` behaves** when ggplot objects are involved.
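A minimal sketch of the three operators on two small plots. The plot contents and the names `side_by_side`, `horizontal`, and `stacked` are illustrative.

```r
library(ggplot2)
library(patchwork)

p1 <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  labs(title = "Plot 1")

p2 <- ggplot(mpg, aes(x = drv, y = hwy)) +
  geom_boxplot() +
  labs(title = "Plot 2")

side_by_side <- p1 + p2  # two plots added together sit next to each other
horizontal   <- p1 | p2  # always one row
stacked      <- p1 / p2  # always one column

side_by_side
stacked
```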

---

## 🧱 Building complex layouts

By grouping plots with parentheses, you control layout precedence, just like math. This lets you design multi-row, multi-column structures with minimal code.
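For instance, the sketch below puts two plots in the top row and lets a third span the bottom row. The three plots are placeholders chosen for illustration.

```r
library(ggplot2)
library(patchwork)

p1 <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  labs(title = "Plot 1")

p2 <- ggplot(mpg, aes(x = drv, y = hwy)) +
  geom_boxplot() +
  labs(title = "Plot 2")

p3 <- ggplot(mpg, aes(x = cty, y = hwy)) +
  geom_point() +
  labs(title = "Plot 3")

# Parentheses group first: p1 and p3 share the top row, p2 spans the bottom row
layout <- (p1 | p3) / p2
layout
```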

---

## 🎛 Shared legends, titles, and spacing

Patchwork also supports:
- **Collecting legends** from multiple plots into one
- **Positioning legends** globally
- **Adjusting relative heights/widths** of plot sections
- **Adding a common title, subtitle, and caption**

The `&` operator applies a theme change to **every plot in the patchwork**, whereas `+` would modify only the most recently added plot. That makes `&` the right choice for shared legend placement.

---

## 🧠 Key takeaway

Patchwork turns multiple ggplots into a **single cohesive graphic**, making it ideal for dashboards, reports, and storytelling where comparisons matter.


In [None]:
library(tidyverse)
library(patchwork)

p1 <- ggplot(mpg, aes(x = drv, y = cty, color = drv)) +
  geom_boxplot(show.legend = FALSE) +
  labs(title = "Plot 1")

p2 <- ggplot(mpg, aes(x = drv, y = hwy, color = drv)) +
  geom_boxplot(show.legend = FALSE) +
  labs(title = "Plot 2")

p3 <- ggplot(mpg, aes(x = cty, color = drv, fill = drv)) +
  geom_density(alpha = 0.5) +
  labs(title = "Plot 3")

p4 <- ggplot(mpg, aes(x = hwy, color = drv, fill = drv)) +
  geom_density(alpha = 0.5) +
  labs(title = "Plot 4")

p5 <- ggplot(mpg, aes(x = cty, y = hwy, color = drv)) +
  geom_point(show.legend = FALSE) +
  facet_wrap(~drv) +
  labs(title = "Plot 5")

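# guide_area() reserves the top row for the collected legend; the rows below
# hold the boxplots, the density plots, and the faceted scatterplot
(guide_area() / (p1 + p2) / (p3 + p4) / p5) +
  plot_annotation(
    title = "City and highway mileage for cars with different drive trains",
    caption = "Source: fueleconomy.gov"
  ) +
  plot_layout(
    guides = "collect",      # gather the legends into one
    heights = c(1, 3, 2, 4)  # relative heights of the four rows
  ) &
  theme(legend.position = "top")  # & applies this to every plot in the patchwork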
