
# ✍️ **4.1 Names**

We first touched on **naming conventions** back in **Section 2.3**, and now it’s time to lock them in properly.

Good names make your code:
- **Easier to read**
- **Easier to debug**
- **Much easier to return to later**

---

## ✅ **Recommended Naming Rules**

When creating variables (with **`<-`** or **`mutate()`**):

- Use only **lowercase letters**, **numbers**, and **`_` (underscore)**
- Use `_` to separate words within a name
- This style is called **snake_case**

✔️ This style is readable and consistent. Favor long, descriptive names over short, cryptic ones: autocomplete finishes them for you, so clarity costs very little typing.
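As a rough illustration of these rules, here is a small hypothetical helper, `is_snake_case()`, which is not part of any package. It checks whether a name starts with a lowercase letter and uses only lowercase letters, digits, and underscores. Treat it as a sketch: dedicated linters do a more thorough job.

```r
# Hypothetical helper (not from any package): TRUE when a name starts with a
# lowercase letter and contains only lowercase letters, digits, and underscores.
# perl = TRUE makes [a-z] match plain ASCII lowercase regardless of locale.
is_snake_case <- function(name) {
  grepl("^[a-z][a-z0-9_]*$", name, perl = TRUE)
}

is_snake_case(c("short_flights", "flights_2013", "SHORTFLIGHTS", "shortFlights"))
#> [1]  TRUE  TRUE FALSE FALSE
```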

---

## 👍 **Good vs. Bad Names**

Clear, descriptive, and readable names, compared with an all-caps name that is harder to read and breaks the lowercase rule:

```r
library(dplyr)
library(nycflights13)

# Good naming practice
short_flights <- flights |>
  filter(air_time < 60)

# Bad naming practice (avoid)
SHORTFLIGHTS <- flights |>
  filter(air_time < 60)
```

# ✨ **4.2 Spaces**

Consistent spacing makes your code **cleaner**, **more readable**, and **easier to scan**. Think of it as punctuation for programming.

---

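## ➕ **Spacing Around Operators**

Put spaces **on both sides** of mathematical and logical operators (`+`, `-`, `==`, `<`, and so on) and around the assignment operator `<-`. The one exception is **`^`**, which gets no spaces.

### ✅ **Strive for**
- Clear separation
- Easy-to-read expressions

### ❌ **Avoid**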
- Cramped or inconsistent spacing
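Spacing is not only cosmetic. R reads `<-` as a single token, so a missing space can turn a comparison into an assignment. Here is a minimal illustration using a throwaway variable `x`:

```r
x <- 5

# With spaces, this is clearly a comparison: is x less than -3?
x < -3
#> [1] FALSE

# Without spaces, R reads `x<-3` as `x <- 3` and silently overwrites x
x<-3
x
#> [1] 3
```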

---

## 📐 **Parentheses & Function Calls**

- **No spaces** directly inside the parentheses of a function call, and none between the function name and `(`
- **Always add a space after a comma**, just as in ordinary English

---

## 🧹 **Alignment for Readability**

Extra spaces are totally fine when they:
- Improve alignment
- Make code easier to skim
- Help visually compare expressions

This is especially useful inside **`mutate()`** or similar functions.

---

## ⭐ **Key Takeaway**

Good spacing:
- Reduces mental load
- Makes bugs easier to spot
- Helps others (and future-you) understand your code faster


```r
library(dplyr)
library(nycflights13)

# Toy values so the examples below run
a <- 1
b <- 2
d <- 4
x <- c(1, NA, 3)

# Strive for
z <- (a + b)^2 / d

# Avoid
z<-( a + b ) ^ 2/d


# Strive for
mean(x, na.rm = TRUE)

# Avoid
mean (x ,na.rm=TRUE)


# Aligned spacing for clarity
flights |>
  mutate(
    speed      = distance / air_time,
    dep_hour   = dep_time %/% 100,
    dep_minute = dep_time %%  100
  )
```


# ✨ **4.3 Pipes**

The **pipe operator `|>`** is central to readable, maintainable R code. Good pipe style makes your analysis easier to **extend**, **reorder**, and **skim**.

---

## 🔗 **Pipe Placement & Spacing**

- Always put a **space before `|>`**
- `|>` should usually be the **last thing on a line**
- This creates a clean, vertical structure where the **verbs line up**

### ✅ **Strive for**
- Easy scanning
- Simple extension

### ❌ **Avoid**
- Dense, unreadable chains

---

## 🧱 **Pipelines with Arguments**

### 📌 **Functions with named arguments**
(e.g. `mutate()`, `summarize()`):
- Put **each argument on its own line**

### 📌 **Functions without named arguments**
(e.g. `filter()`, `select()`):
- Keep on **one line** if it fits
- Otherwise, put **each argument on its own line**
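For example, `filter()` takes its conditions as unnamed arguments. The sketch below uses conditions that no longer fit comfortably on one line, so each one gets its own line. The destinations and month are arbitrary choices for illustration.

```r
library(dplyr)
library(nycflights13)

# Each unnamed filter() condition on its own line, indented two extra spaces;
# the closing ) lines up with filter
december_west_coast <- flights |>
  filter(
    !is.na(arr_delay),
    !is.na(dep_delay),
    dest %in% c("LAX", "SFO", "SEA"),
    month == 12
  )
```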

---

## 📐 **Indentation Rules**

- After the first pipe, **indent each line by two spaces**
- If arguments are on separate lines, indent **two extra spaces**
- Closing `)` should:
  - Be on its **own line**
  - Align vertically with the function name

RStudio will usually handle this for you automatically.

---

## 🧩 **Short vs. Long Pipelines**

- **One-liners are fine** when truly simple
- Prefer vertical formatting if the code might grow, since short snippets often do
- Pipelines longer than **10–15 lines** should usually be broken into smaller sub-tasks

👉 Give intermediate results **informative names**, especially after:
- `summarize()`
- `pivot_*()`
- Major structural changes

Good names help readers understand *what* happened and *why*.
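Here is one way to split a pipeline into named steps, assuming the goal is to rank planes by average arrival delay. The intermediate names and the 25-flight cutoff are illustrative choices, not rules.

```r
library(dplyr)
library(nycflights13)

# Step 1: drop flights with no recorded arrival delay
flights_with_delay <- flights |>
  filter(!is.na(arr_delay))

# Step 2: summarize() changes the structure (one row per plane),
# so the result gets its own descriptive name
delay_by_plane <- flights_with_delay |>
  group_by(tailnum) |>
  summarize(
    mean_delay = mean(arr_delay),
    n_flights = n()
  )

# Step 3: keep planes with enough flights for a stable average, worst first
frequent_planes_by_delay <- delay_by_plane |>
  filter(n_flights >= 25) |>
  arrange(desc(mean_delay))
```

Each name records what the data looks like at that point. This makes it easy to inspect an intermediate result when something looks off.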

---

## ⭐ **Key Takeaways**

- Vertical pipes = readable logic
- Consistent indentation = easy debugging
- Informative names = self-documenting code


```r
library(dplyr)
library(nycflights13)

# Strive for
flights |>
  filter(!is.na(arr_delay), !is.na(tailnum)) |>
  count(dest)

# Avoid
flights|>filter(!is.na(arr_delay), !is.na(tailnum))|>count(dest)


# Strive for (named arguments)
flights |>
  group_by(tailnum) |>
  summarize(
    delay = mean(arr_delay, na.rm = TRUE),
    n = n()
  )

# Avoid
flights |>
  group_by(
    tailnum
  ) |>
  summarize(delay = mean(arr_delay, na.rm = TRUE), n = n())


# Avoid (arguments and closing parenthesis over-indented; no space before |>)
flights|>
  group_by(tailnum) |>
  summarize(
             delay = mean(arr_delay, na.rm = TRUE),
             n = n()
           )

# Avoid (arguments not indented; no space before |>)
flights|>
  group_by(tailnum) |>
  summarize(
  delay = mean(arr_delay, na.rm = TRUE),
  n = n()
  )


# A toy data frame so the next two examples run
df <- tibble(x = 1:3)

# Compact one-liner (acceptable when simple)
df |> mutate(y = x + 1)

# Preferred for future extensibility
df |>
  mutate(
    y = x + 1
  )
```


# 🎨 **4.4 ggplot2**

The same **style rules used for pipes (`|>`)** also apply to **ggplot2 layers**.  
Just treat the **`+` operator exactly like `|>`**.

This means:
- Put **one layer per line**
- Put a **space before `+`**, just as you do before `|>`
- Align layers vertically so plots are easy to read and modify

---

## ➕ **ggplot2 as a Pipeline**

A ggplot is built **step by step**, just like a data pipeline:
1. Start with data
2. Transform it
3. Pass it into `ggplot()`
4. Add layers with `+`

Keeping each step on its own line makes the plot structure obvious at a glance.

---

## 📐 **Formatting Rules for ggplot2**

- `+` should usually be the **last thing on a line**
- If a layer has **multiple arguments**, put **each argument on its own line**
- Indentation rules are the same as for pipes:
  - Two spaces after each line break
  - Extra two spaces for arguments inside functions
  - Closing `)` aligned with the function name

---

## ⚠️ **Pipe → Plus Transition**

You’ll often see a transition from:
- **`|>`** (data manipulation)
- to **`+`** (plot construction)

This awkward switch exists because **ggplot2 predates the pipe operator**.  
Annoying, but once you recognize it, it’s easy to read.
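Mixing up the two operators is a common slip, and this sketch shows what happens. `|>` hands data to `ggplot()`, and `+` adds layers. Putting `|>` between layers instead passes the plot object into `geom_point()` as its first argument (`mapping`). Recent ggplot2 versions reject that with an error. The exact message depends on the version, so the code captures the error with `tryCatch()` rather than assuming its wording.

```r
library(dplyr)
library(ggplot2)
library(nycflights13)

monthly_delay <- flights |>
  group_by(month) |>
  summarize(delay = mean(arr_delay, na.rm = TRUE))

# Correct: |> feeds the data into ggplot(), then + adds each layer
good_plot <- monthly_delay |>
  ggplot(aes(x = month, y = delay)) +
  geom_point()

# Mistake: |> between layers calls geom_point(<plot>), i.e. it passes the
# plot object as `mapping`; capture the resulting error instead of stopping
bad_result <- tryCatch(
  monthly_delay |>
    ggplot(aes(x = month, y = delay)) |>
    geom_point(),
  error = function(e) e
)
conditionMessage(bad_result)
```

The correct version works because `|>` binds more tightly than `+`. R therefore builds the base plot first and then adds the layer to it.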

---

## ⭐ **Key Takeaways**

- Treat `+` like `|>`
- One layer per line
- Vertical structure = readable plots
- Clean formatting makes plots easy to extend later


```r
library(dplyr)
library(ggplot2)
library(nycflights13)

# Simple ggplot pipeline
flights |>
  group_by(month) |>
  summarize(
    delay = mean(arr_delay, na.rm = TRUE)
  ) |>
  ggplot(aes(x = month, y = delay)) +
  geom_point() +
  geom_line()


# ggplot with many arguments (each on its own line)
flights |>
  group_by(dest) |>
  summarize(
    # compute speed first: once `distance` is overwritten with its mean,
    # later expressions in summarize() see the summary, not the raw column
    speed = mean(distance / air_time, na.rm = TRUE),
    distance = mean(distance)
  ) |>
  ggplot(aes(x = distance, y = speed)) +
  geom_smooth(
    method = "loess",
    span = 0.5,
    se = FALSE,
    color = "white",
    linewidth = 4
  ) +
  geom_point()
```


# 🧭 **4.5 Sectioning Comments**

As your **R scripts grow longer**, they can quickly become hard to navigate.  
**Sectioning comments** help you break a script into **clear, logical chunks**, making it easier to read, debug, and revisit later.

---

## ✂️ **What Are Sectioning Comments?**

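Sectioning comments are **special comment headers**: an ordinary comment followed by a run of dashes, like `# Load data ----------------`.

They act as **visual dividers**. RStudio automatically recognizes a comment ending in at least four dashes (`----`) as the start of a section.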

---

## 🧠 **Why Use Them?**

Sectioning comments help you:

- 📚 Organize long scripts into meaningful blocks
- 🔍 Jump quickly to specific parts of your code
- 🧪 Separate data loading, cleaning, modeling, and plotting
- 👥 Make your code easier for others (and future-you) to understand

---

## 🚀 **RStudio Superpower**

RStudio provides a built-in shortcut to create section headers:

**Cmd / Ctrl + Shift + R**

Once added, sections appear in the **code navigation dropdown** in the **bottom-left corner** of the editor, letting you jump between sections instantly.

---

## 🗺️ **Best Practices**

- Use **descriptive section titles**
- Keep sections **high-level** (don’t overdo it)
- Place section headers **before major tasks**
- Use consistent formatting across your script

---

## ✅ **Takeaway**

If your script is longer than a screen or two,  
**sectioning comments aren’t optional; they’re essential.**


```r
# Load data --------------------------------------

library(dplyr)
library(ggplot2)

flights <- nycflights13::flights


# Clean data --------------------------------------

clean_flights <- flights |>
  filter(!is.na(dep_delay), !is.na(arr_delay))


# Summarize data ----------------------------------

monthly_delay <- clean_flights |>
  group_by(month) |>
  summarize(
    avg_delay = mean(arr_delay),
    n = n()
  )


# Plot data ---------------------------------------

monthly_delay |>
  ggplot(aes(x = month, y = avg_delay)) +
  geom_line() +
  geom_point()
```
