
# 🧠 12.1 Introduction: Logical Vectors in R

In this chapter, you'll learn how to work with **logical vectors**, one of the simplest but most powerful data types in R. Each element of a logical vector can take one of **only three values**:

- **TRUE**
- **FALSE**
- **NA** (missing)

Although logical vectors rarely appear directly in raw datasets, they are created and used **constantly** during data analysis, especially for **filtering**, **mutating**, and **conditional logic**.
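
As a quick illustration of the three possible values, here is a minimal sketch in base R. The `temps` vector and its values are made up for this example.

```r
# A logical vector written by hand; NA marks a missing value
x <- c(TRUE, FALSE, NA)
typeof(x)   # "logical"

# More often, a logical vector is the result of a comparison
temps <- c(12, 25, 31)   # hypothetical values
hot <- temps > 24
hot         # FALSE TRUE TRUE
```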

---

## 🔢 What you'll learn in this chapter

You‚Äôll progress through the core tools in a logical sequence:

1. **Numeric comparisons**  
   The most common way to create logical vectors (e.g., `<`, `>`, `==`).

2. **Boolean algebra**  
   Combining logical vectors using operators like **AND (`&`)**, **OR (`|`)**, and **NOT (`!`)**.

3. **Summaries of logical values**  
   Counting or checking conditions across vectors.

4. **Conditional transformations**  
   Using **`if_else()`** and **`case_when()`** to make changes based on logical conditions.

---

## 📦 12.1.1 Prerequisites

Most functions covered here come from **base R**, but we'll still load the **tidyverse** so we can conveniently use tools like **`mutate()`**, **`filter()`**, and pipes.  
Examples will also use the **`nycflights13::flights`** dataset.

---

## 🧪 Working with simple vectors

To clearly demonstrate individual functions, we'll often work with **small, made-up vectors** created using `c()`. This makes concepts easier to understand, even if it feels less realistic at first.

⚠️ Important reminder:  
**Anything you can do to a standalone vector, you can also do to a column inside a data frame** using `mutate()` and related functions.

---

## 📊 From vectors to data frames

You‚Äôll see how the same operation behaves:
- on a **raw vector**, and
- inside a **tibble**, where it becomes part of a structured dataset.

This connection is key to applying logical reasoning in real analyses.


In [None]:
library(tidyverse)
library(nycflights13)

x <- c(1, 2, 3, 5, 7, 11, 13)
x * 2

df <- tibble(x)
df |>
  mutate(y = x * 2)


# 🔍 12.2 Comparisons: Logical Vectors in Practice

Logical vectors are everywhere in R. Any time you filter rows, create conditional variables, or branch logic, you're relying on values that are either **TRUE**, **FALSE**, or **NA**.

---

## 🔢 Creating logical vectors with comparisons

The most common way to create logical vectors is with numeric comparisons:

- `<`, `<=`, `>`, `>=`
- `==` (equal)
- `!=` (not equal)

These comparisons are often written *inline* inside `filter()`, where they are computed, used, and discarded immediately. While concise, this can become hard to read when conditions get complex.
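
To make the inline style concrete, here is a small sketch. The `trips` tibble is a hypothetical stand-in for `flights` with made-up values, so the result is easy to check by eye.

```r
library(dplyr)

# Hypothetical mini flights table (values are made up)
trips <- tibble(
  dep_time  = c(545, 830, 1415, 2130),
  arr_delay = c(5, -12, 45, 3)
)

# The comparison is computed inside filter(), used, and then discarded
inline_result <- trips |>
  filter(dep_time > 600 & dep_time < 2000 & abs(arr_delay) < 20)

inline_result
```

Only the 830 departure satisfies all three conditions, but nothing in the output tells you which condition ruled out the other rows.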

---

## 🧠 Making logic explicit with `mutate()`

For complex conditions, it's often clearer to **name intermediate logical variables** using `mutate()`. This improves readability and makes it easier to check each condition separately before combining them.

Once created, logical variables behave just like any other column and can be reused in `filter()` or summaries.
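
A minimal sketch of that workflow, again using a hypothetical `trips` tibble with made-up values: name each condition first, inspect it, then reuse the columns in `filter()`.

```r
library(dplyr)

# Hypothetical mini flights table (values are made up)
trips <- tibble(
  dep_time  = c(545, 830, 1415, 2130),
  arr_delay = c(5, -12, 45, 3)
)

# Step 1: give each condition a name
checked <- trips |>
  mutate(
    daytime       = dep_time > 600 & dep_time < 2000,
    approx_ontime = abs(arr_delay) < 20
  )

# Each condition can now be inspected on its own
checked

# Step 2: reuse the logical columns like any other column
result <- checked |>
  filter(daytime & approx_ontime)
```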

---

## ⚠️ Floating point comparisons (`==` can fail)

Computers store numbers with a fixed number of binary digits, so many decimal values and results of calculations (such as `1 / 49 * 49` or `sqrt(2)^2`) are only approximations. As a result, values that *print* as equal may not be exactly equal internally.

Because of this, `==` is unreliable for numeric equality tests involving calculations.  
✅ Use **`dplyr::near()`**, which checks whether numbers are close within a small tolerance.
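
The sketch below shows what "close within a small tolerance" means in practice. The manual comparison reflects my understanding of how `near()` works with its default tolerance (`.Machine$double.eps^0.5`); treat it as an illustration rather than the library's exact source.

```r
library(dplyr)

x <- c(1 / 49 * 49, sqrt(2)^2)

# Printing with extra digits reveals the tiny rounding errors
print(x, digits = 16)

exact <- x == c(1, 2)             # FALSE FALSE

# A tolerance-based comparison written by hand
tol <- .Machine$double.eps^0.5    # roughly 1.5e-8
by_hand <- abs(x - c(1, 2)) < tol # TRUE TRUE

# The same check using dplyr
with_near <- near(x, c(1, 2))     # TRUE TRUE
```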

---

## ❓ Missing values (`NA`) are contagious

Missing values represent *unknowns*, so almost any operation involving an `NA` is also unknown. Every comparison with `NA` returns `NA`, including:

- `10 == NA`
- `NA > 5`
- even `NA == NA`

This means `filter(x == NA)` will **never work**: the condition is `NA` for every row, and `filter()` keeps only rows where the condition is `TRUE`.

✅ To detect missing values, always use **`is.na()`**.
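
These rules are easy to verify at the console. A short base R sketch, using a made-up vector `x`:

```r
NA > 5      # NA
10 == NA    # NA
NA == NA    # NA

x <- c(3, NA, 7)   # hypothetical values

# Comparing with NA gives NA for every element, so it cannot find the gaps
x == NA     # NA NA NA

# is.na() answers the question that was actually being asked
is.na(x)    # FALSE TRUE FALSE
```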

---

## 📊 Sorting with missing values

By default, `arrange()` places missing values at the end.  
You can override this behavior by sorting on `is.na()` first, giving you explicit control over where `NA`s appear.
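
A small sketch of both behaviors, using a hypothetical three-row tibble so the ordering is easy to see:

```r
library(dplyr)

df <- tibble(x = c(2, NA, 1))   # made-up values

# Default: the missing value goes last
default_order <- df |> arrange(x)

# Sorting on is.na(x) first (TRUE before FALSE via desc()) puts it first
na_first <- df |> arrange(desc(is.na(x)), x)

default_order$x   # 1 2 NA
na_first$x        # NA 1 2
```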

---

## 🧠 Key takeaways

- Comparisons create logical vectors  
- Logical vectors power filtering and conditional logic  
- Use `near()` for numeric equality  
- Use `is.na()` to detect missing values  
- Naming logical steps improves clarity and debuggability


In [None]:
library(tidyverse)
library(nycflights13)

# Explicit logical variables instead of inline filter logic
flights_logical <- flights |>
  mutate(
    daytime = dep_time > 600 & dep_time < 2000,
    approx_ontime = abs(arr_delay) < 20,
    .keep = "used"
  )

# Floating point comparison issue
x <- c(1 / 49 * 49, sqrt(2)^2)
x == c(1, 2)
near(x, c(1, 2))

# Correct way to filter missing values
flights |>
  filter(is.na(dep_time))

# Controlling NA placement when sorting
flights |>
  filter(month == 1, day == 1) |>
  arrange(desc(is.na(dep_time)), dep_time)


# 📊 12.4 Summaries (Logical Vectors in R)

Logical vectors (`TRUE`, `FALSE`, `NA`) are powerful for **summarizing patterns** in data, especially when combined with `dplyr`. You can summarize them using **logical summaries**, **numeric summaries**, and **logical subsetting**.

---

## 🧠 Logical summaries
Two core functions:
- **`any(x)`** → returns `TRUE` if *at least one* value is `TRUE`
- **`all(x)`** → returns `TRUE` only if *every* value is `TRUE`

Both support `na.rm = TRUE` to ignore missing values; without it, an `NA` can make the result `NA`.  
They're useful for asking yes/no questions at the group level (e.g., *Did any flight have a long delay that day?*).
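
The interaction with missing values is worth seeing once. A minimal base R sketch with made-up vectors:

```r
x <- c(TRUE, FALSE, NA)
any(x)                 # TRUE: one TRUE is enough, whatever the NA is
all(x)                 # FALSE: one FALSE is enough, whatever the NA is

y <- c(TRUE, NA)
all(y)                 # NA: the answer depends on the unknown value
all(y, na.rm = TRUE)   # TRUE: the NA is dropped before checking
```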

---

## 🔢 Numeric summaries of logical vectors
When used numerically:
- `TRUE` → `1`
- `FALSE` → `0`

This means:
- **`sum(x)`** = number of `TRUE`s  
- **`mean(x)`** = proportion of `TRUE`s  

These give more detailed summaries than `any()` / `all()`.
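
For instance, with a made-up vector of delays (in minutes), counting and averaging a comparison looks like this:

```r
delays <- c(5, 70, NA, 130, 20)   # hypothetical values

long <- delays > 60               # FALSE TRUE NA TRUE FALSE

n_long    <- sum(long, na.rm = TRUE)    # 2 delays over an hour
prop_long <- mean(long, na.rm = TRUE)   # 0.5 of the non-missing delays
```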

---

## 🎯 Logical subsetting
Logical vectors can also be used **inside summaries** to focus on subsets of interest using the base `[` operator.

Example idea:
- `arr_delay[arr_delay > 0]` → only flights that arrived late
- `arr_delay[arr_delay < 0]` → only flights that arrived early

This lets you compute multiple conditional summaries **in one pass**, without separate filtering steps.
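
On a plain vector the mechanics look like this. The `arr_delay` values here are made up; note that a missing value in the condition produces an `NA` in the subset, which is why `na.rm = TRUE` is still needed.

```r
arr_delay <- c(-10, 25, NA, 5, -4)   # hypothetical values

late <- arr_delay[arr_delay > 0]     # 25 NA 5

behind <- mean(arr_delay[arr_delay > 0], na.rm = TRUE)   # 15
ahead  <- mean(arr_delay[arr_delay < 0], na.rm = TRUE)   # -7
```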

---

## 📝 Key takeaways
- Use `any()` / `all()` for coarse TRUE/FALSE questions
- Use `sum()` / `mean()` for counts and proportions
- Logical vectors behave like numbers in summaries
- Inline logical subsetting avoids extra `filter()` calls


In [None]:
library(tidyverse)
library(nycflights13)

# Logical summaries by day
flights |>
  group_by(year, month, day) |>
  summarize(
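    # TRUE if every flight that day left at most 60 minutes late
    all_delayed = all(dep_delay <= 60, na.rm = TRUE),
    # TRUE if at least one flight arrived 5 or more hours late
    any_long_delay = any(arr_delay >= 300, na.rm = TRUE),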
    .groups = "drop"
  )

# Numeric summaries of logical vectors
flights |>
  group_by(year, month, day) |>
  summarize(
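    # Share of flights that left at most 60 minutes late
    proportion_delayed = mean(dep_delay <= 60, na.rm = TRUE),
    # Number of flights that arrived 5 or more hours late
    count_long_delay = sum(arr_delay >= 300, na.rm = TRUE),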
    .groups = "drop"
  )

# Logical subsetting inside summarize()
flights |>
  group_by(year, month, day) |>
  summarize(
    behind = mean(arr_delay[arr_delay > 0], na.rm = TRUE),
    ahead  = mean(arr_delay[arr_delay < 0], na.rm = TRUE),
    n = n(),
    .groups = "drop"
  )


# ✨ 12.5 Conditional Transformations (Logical Vectors)

Logical vectors power **conditional transformations** in R, that is, doing different things depending on whether conditions are `TRUE`, `FALSE`, or `NA`. Two core tools handle this cleanly in the tidyverse: **`if_else()`** and **`case_when()`**.

---

## 🔀 `if_else()`
Use `if_else(condition, true, false, missing)` when you have **two outcomes**.
- `condition`: a logical vector
- `true` / `false`: values used when the condition is `TRUE` / `FALSE`
- `missing` (optional): value used when the condition is `NA`

It's strict about **types**: the `true`, `false`, and `missing` values must have compatible types.
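
The optional fourth argument is easiest to understand by example. A minimal sketch with a made-up vector; the labels `"+ve"`, `"-ve"`, and `"???"` are arbitrary.

```r
library(dplyr)

x <- c(-3:3, NA)

# Without `missing`, an NA condition gives an NA result
signs <- if_else(x > 0, "+ve", "-ve")

# With `missing`, NA conditions get their own label
signs_labelled <- if_else(x > 0, "+ve", "-ve", "???")

signs_labelled
# "-ve" "-ve" "-ve" "-ve" "+ve" "+ve" "+ve" "???"
```

Note that `0 > 0` is `FALSE`, so zero is labelled `"-ve"` here; a third category would call for `case_when()`.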

---

## 🧩 `case_when()`
Use `case_when()` when you have **multiple conditions**.
- Write rules as `condition ~ value`
- The **first matching condition wins**
- Use `.default` as a catch-all; without it, unmatched elements become `NA`
- Usually much easier to read than nested `if_else()` calls
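
The ordering rule and the role of `.default` can be seen in a short sketch. It assumes dplyr 1.1.0 or later (where the `.default` argument is available); the vector and labels are made up.

```r
library(dplyr)

x <- c(-3:3, NA)

# Most specific rule first: 3 is labelled "big", 1 and 2 are "+ve"
labels <- case_when(
  x > 2 ~ "big",
  x > 0 ~ "+ve",
  .default = "other"
)

# Same rules in the opposite order: "big" can never be reached
shadowed <- case_when(
  x > 0 ~ "+ve",
  x > 2 ~ "big",
  .default = "other"
)

# Without .default, unmatched elements (including NA) become NA
no_default <- case_when(
  x > 0 ~ "+ve"
)
```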

---

## ⚠️ Type compatibility
Both functions require compatible output types:
- Logical ↔ Numeric ✅
- Character ↔ Factor ✅
- Date ↔ Datetime ✅
- `NA` is compatible with everything

Mixing incompatible types (e.g., character and numeric) is an error by design.
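
A brief sketch of these rules in action, assuming dplyr 1.1.0 or later (earlier versions were stricter about mixing logical and numeric outputs). The error is caught with `try()` so the example runs to completion.

```r
library(dplyr)

cond <- c(TRUE, FALSE)

# Logical and numeric outputs are combined into a numeric vector
mixed_ok <- if_else(cond, TRUE, 0)

# A bare NA adapts to whatever type the other output has
with_na <- if_else(cond, "a", NA)

# Character and numeric outputs have no common type, so this fails
bad <- try(if_else(cond, "a", 1), silent = TRUE)
inherits(bad, "try-error")   # TRUE
```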

---

## 🛫 Real-world use
`case_when()` shines for readable labels, like flight status categories derived from arrival delays. Be careful to avoid **overlapping conditions** and order rules intentionally.


In [None]:
library(tidyverse)
library(nycflights13)

# if_else(): even vs odd (0 to 20)
x <- 0:20
if_else(x %% 2 == 0, "even", "odd")

# if_else(): weekdays vs weekends
days <- c("Monday", "Saturday", "Wednesday")
if_else(days %in% c("Saturday", "Sunday"), "weekend", "weekday")

# if_else(): absolute value
x2 <- c(-3:3, NA)
if_else(x2 < 0, -x2, x2)

# case_when(): label flight arrival status
flights |>
  mutate(
    status = case_when(
      is.na(arr_delay)      ~ "cancelled",
      arr_delay < -30       ~ "very early",
      arr_delay < -15       ~ "early",
      abs(arr_delay) <= 15  ~ "on time",
      arr_delay < 60        ~ "late",
      arr_delay < Inf       ~ "very late"
    ),
    .keep = "used"
  )

# case_when(): US holidays (logical flag + label)
flights |>
  mutate(
    is_holiday = case_when(
      month == 1  & day == 1  ~ TRUE,   # New Year's Day
      month == 7  & day == 4  ~ TRUE,   # July 4th
      month == 11 & day %in% 22:28 ~ TRUE, # Thanksgiving window (4th Thursday falls on the 22nd-28th)
      month == 12 & day == 25 ~ TRUE,   # Christmas
      .default = FALSE
    ),
    holiday = case_when(
      month == 1  & day == 1  ~ "New Year's Day",
      month == 7  & day == 4  ~ "Independence Day",
      month == 11 & day %in% 22:28 ~ "Thanksgiving",
      month == 12 & day == 25 ~ "Christmas",
      .default = NA_character_
    ),
    .keep = "used"
  )
