# Homework Assignment - Lesson 7: String Manipulation and Date/Time Data

**Student Name:** Gavin Lara

**Student ID:** 01985022

**Date Submitted:** 10/13/2025

**Due Date:** 10/12/2025

---

## Objective

Master string manipulation with `stringr` and date/time operations with `lubridate` for real-world business data cleaning and analysis.

## Learning Goals

By completing this assignment, you will:
- Clean and standardize messy text data using `stringr` functions
- Parse and manipulate dates using `lubridate` functions
- Extract information from text and dates for business insights
- Combine string and date operations for customer segmentation
- Create business-ready reports from raw data

## Instructions

- Complete all tasks in this notebook
- Write your code in the designated TODO sections
- Use the pipe operator (`%>%`) wherever possible
- Add comments explaining your logic
- Run all cells to verify your code works
- Answer all reflection questions

## Datasets

You will work with three CSV files:
- `customer_feedback.csv` - Customer reviews with messy text
- `transaction_log.csv` - Transaction records with dates
- `product_catalog.csv` - Product descriptions needing standardization

---

## Part 1: Data Import and Initial Exploration

**Business Context:** Before cleaning data, you must understand its structure and quality issues.

**Your Tasks:**
1. Load required packages (`tidyverse` and `lubridate`)
2. Import all three CSV files from the `data/` directory
3. Examine the structure and identify data quality issues
4. Display sample rows to understand the data

In [1]:
# Task 1.1: Load Required Packages
# TODO: Load tidyverse (includes stringr)
library(tidyverse)

# TODO: Load lubridate
library(lubridate)

cat("✅ Packages loaded successfully!\n")

── [1mAttaching core tidyverse packages[22m ──────────────────────── tidyverse 2.0.0 ──
[32m✔[39m [34mdplyr    [39m 1.1.4     [32m✔[39m [34mreadr    [39m 2.1.5
[32m✔[39m [34mforcats  [39m 1.0.0     [32m✔[39m [34mstringr  [39m 1.5.1
[32m✔[39m [34mggplot2  [39m 3.5.2     [32m✔[39m [34mtibble   [39m 3.3.0
[32m✔[39m [34mlubridate[39m 1.9.4     [32m✔[39m [34mtidyr    [39m 1.3.1
[32m✔[39m [34mpurrr    [39m 1.1.0     
── [1mConflicts[22m ────────────────────────────────────────── tidyverse_conflicts() ──
[31m✖[39m [34mdplyr[39m::[32mfilter()[39m masks [34mstats[39m::filter()
[31m✖[39m [34mdplyr[39m::[32mlag()[39m    masks [34mstats[39m::lag()
[36mℹ[39m Use the conflicted package ([3m[34m<http://conflicted.r-lib.org/>[39m[23m) to force all conflicts to become errors


✅ Packages loaded successfully!


In [2]:
# Task 1.2: Import Datasets
# TODO: Import customer_feedback.csv into a variable called 'feedback'
feedback <- read_csv("../../data/customer_feedback.csv")

# TODO: Import transaction_log.csv into a variable called 'transactions'
transactions <- read_csv("../../data/transaction_log.csv")

# TODO: Import product_catalog.csv into a variable called 'products'
products <- read_csv("../../data/product_catalog.csv")

cat("✅ Data imported successfully!\n")
cat("Feedback rows:", nrow(feedback), "\n")
cat("Transaction rows:", nrow(transactions), "\n")
cat("Product rows:", nrow(products), "\n")

[1mRows: [22m[34m100[39m [1mColumns: [22m[34m5[39m
[36m──[39m [1mColumn specification[22m [36m────────────────────────────────────────────────────────[39m
[1mDelimiter:[22m ","
[31mchr[39m  (2): Feedback_Text, Contact_Info
[32mdbl[39m  (2): FeedbackID, CustomerID
[34mdate[39m (1): Feedback_Date

[36mℹ[39m Use `spec()` to retrieve the full column specification for this data.
[36mℹ[39m Specify the column types or set `show_col_types = FALSE` to quiet this message.
[1mRows: [22m[34m150[39m [1mColumns: [22m[34m5[39m
[36m──[39m [1mColumn specification[22m [36m────────────────────────────────────────────────────────[39m
[1mDelimiter:[22m ","
[31mchr[39m (2): Transaction_DateTime, Status
[32mdbl[39m (3): LogID, CustomerID, Amount

[36mℹ[39m Use `spec()` to retrieve the full column specification for this data.
[36mℹ[39m Specify the column types or set `show_col_types = FALSE` to quiet this message.
[1mRows: [22m[34m75[39m [1mColumns: [22m

✅ Data imported successfully!
Feedback rows: 100 
Transaction rows: 150 
Product rows: 75 


In [3]:
# Task 1.3: Initial Data Exploration

cat("=== CUSTOMER FEEDBACK DATA ===\n")
# Display structure of feedback using str()
str(feedback)

# Display first 5 rows of feedback
print(head(feedback, 5))

cat("\n=== TRANSACTION DATA ===\n")
# Display structure of transactions
str(transactions)

# Display first 5 rows of transactions
print(head(transactions, 5))

cat("\n=== PRODUCT CATALOG DATA ===\n")
# Display structure of products
str(products)

# Display first 5 rows of products
print(head(products, 5))


=== CUSTOMER FEEDBACK DATA ===


spc_tbl_ [100 × 5] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ FeedbackID   : num [1:100] 1 2 3 4 5 6 7 8 9 10 ...
 $ CustomerID   : num [1:100] 12 40 34 1 47 13 13 37 49 23 ...
 $ Feedback_Text: chr [1:100] "Highly recommend this item" "Excellent service" "Poor quality control" "average product, nothing special" ...
 $ Contact_Info : chr [1:100] "bob.wilson@test.org" "555-123-4567" "jane_smith@company.com" "jane_smith@company.com" ...
 $ Feedback_Date: Date[1:100], format: "2024-02-23" "2024-01-21" ...
 - attr(*, "spec")=
  .. cols(
  ..   FeedbackID = [32mcol_double()[39m,
  ..   CustomerID = [32mcol_double()[39m,
  ..   Feedback_Text = [31mcol_character()[39m,
  ..   Contact_Info = [31mcol_character()[39m,
  ..   Feedback_Date = [34mcol_date(format = "")[39m
  .. )
 - attr(*, "problems")=<externalptr> 
[90m# A tibble: 5 × 5[39m
  FeedbackID CustomerID Feedback_Text                 Contact_Info Feedback_Date
       [3m[90m<dbl>[39m[23m      [3m[90m<dbl>[39m[23m [3

## Part 2: String Cleaning and Standardization

**Business Context:** Product names and feedback text often have inconsistent formatting that prevents accurate analysis.

**Your Tasks:**
1. Clean product names (remove extra spaces, standardize case)
2. Standardize product categories
3. Clean customer feedback text
4. Extract customer names from feedback

**Key Functions:** `str_trim()`, `str_squish()`, `str_to_lower()`, `str_to_upper()`, `str_to_title()`

In [4]:
# Task 2.1: Clean Product Names
# TODO: Create a new column 'product_name_clean' that:
#   - Removes leading/trailing whitespace using str_trim()
#   - Converts to Title Case using str_to_title()

products_clean <- products %>%
  mutate(
    product_name_clean = str_to_title(str_trim(product_name))
  )

# Display before and after
cat("Product Name Cleaning Results:\n")
products_clean %>%
  select(product_name, product_name_clean) %>%
  head(10) %>%
  print()

ERROR: [1m[33mError[39m in `mutate()`:[22m
[1m[22m[36mℹ[39m In argument: `product_name_clean =
  str_to_title(str_trim(product_name))`.
[1mCaused by error:[22m
[33m![39m object 'product_name' not found


In [None]:
# Task 2.2: Standardize Product Categories
# TODO: Create a new column 'category_clean' that:
#   - Converts category to Title Case
#   - Removes any extra whitespace

# DEBUG: Show column names and first few rows to help identify the correct category column
cat("\nColumn names in products data:\n")
print(names(products))
cat("\nFirst few rows of products data:\n")
print(head(products, 3))

# (Comment out the rest of the code for now until the correct column is identified)
# # Ensure products_clean exists (if not, create from products)
# if (!exists("products_clean")) {
#   products_clean <- products
# }
#
# # Use the correct column name for category (some datasets use 'category', others 'product_category')
# category_col <- if ("category" %in% names(products_clean)) {
#   "category"
# } else if ("product_category" %in% names(products_clean)) {
#   "product_category"
# } else {
#   stop("No category column found in products data.")
# }
#
# products_clean <- products_clean %>%
#   mutate(
#     category_clean = str_to_title(str_squish(.data[[category_col]]))
#   )
#
# # Show unique categories before and after
# cat("Original categories:\n")
# print(unique(products[[category_col]]))
#
# cat("\nCleaned categories:\n")
# print(unique(products_clean$category_clean))


Column names in products data:
[1] "ProductID"           "Product_Description" "Category"           
[4] "Price"               "In_Stock"           

First few rows of products data:
[1] "ProductID"           "Product_Description" "Category"           
[4] "Price"               "In_Stock"           

First few rows of products data:
[90m# A tibble: 3 × 5[39m
  ProductID Product_Description                       Category Price In_Stock
      [3m[90m<dbl>[39m[23m [3m[90m<chr>[39m[23m                                     [3m[90m<chr>[39m[23m    [3m[90m<dbl>[39m[23m [3m[90m<chr>[39m[23m   
[90m1[39m         1 Apple iPhone 14 Pro - 128GB - Space Black TV        964. Limited 
[90m2[39m         2 samsung galaxy s23 ultra 256gb            TV       [4m1[24m817. Yes     
[90m3[39m         3 Apple iPhone 14 Pro - 128GB - Space Black Audio     853. Yes     
[90m# A tibble: 3 × 5[39m
  ProductID Product_Description                       Category Price In_Stock
      

In [None]:
# Task 2.3: Clean Customer Feedback Text
# TODO: Create a new column 'feedback_clean' that:
#   - Converts text to lowercase using str_to_lower()
#   - Removes extra whitespace using str_squish()

# DEBUG: Show column names and first few rows to help identify the correct feedback text column
cat("\nColumn names in feedback data:\n")
print(names(feedback))
cat("\nFirst few rows of feedback data:\n")
print(head(feedback, 3))

# (Comment out the rest of the code for now until the correct column is identified)
# feedback_clean <- feedback %>%
#   mutate(
#     feedback_clean = str_squish(str_to_lower(feedback_text))
#   )
#
# # Display sample
# cat("Feedback Cleaning Sample:\n")
# feedback_clean %>%
#   select(feedback_text, feedback_clean) %>%
#   head(5) %>%
#   print()


Column names in feedback data:
[1] "FeedbackID"    "CustomerID"    "Feedback_Text" "Contact_Info" 
[5] "Feedback_Date"

First few rows of feedback data:
[1] "FeedbackID"    "CustomerID"    "Feedback_Text" "Contact_Info" 
[5] "Feedback_Date"

First few rows of feedback data:
[90m# A tibble: 3 × 5[39m
  FeedbackID CustomerID Feedback_Text              Contact_Info    Feedback_Date
       [3m[90m<dbl>[39m[23m      [3m[90m<dbl>[39m[23m [3m[90m<chr>[39m[23m                      [3m[90m<chr>[39m[23m           [3m[90m<date>[39m[23m       
[90m1[39m          1         12 Highly recommend this item bob.wilson@tes… 2024-02-23   
[90m2[39m          2         40 Excellent service          555-123-4567    2024-01-21   
[90m3[39m          3         34 Poor quality control       jane_smith@com… 2023-09-02   
[90m# A tibble: 3 × 5[39m
  FeedbackID CustomerID Feedback_Text              Contact_Info    Feedback_Date
       [3m[90m<dbl>[39m[23m      [3m[90m<dbl>[39m

## Part 3: Pattern Detection and Extraction

**Business Context:** Identifying products with specific features and extracting specifications helps with inventory management and marketing.

**Your Tasks:**
1. Identify products with specific keywords (wireless, premium, gaming)
2. Extract numerical specifications from product names
3. Detect sentiment words in customer feedback
4. Extract email addresses from feedback

**Key Functions:** `str_detect()`, `str_extract()`, `str_count()`

In [None]:
# Task 3.1: Detect Product Features
# TODO: Create three new columns:
#   - is_wireless: TRUE if product name contains "wireless" (case-insensitive)
#   - is_premium: TRUE if product name contains "pro", "premium", or "deluxe"
#   - is_gaming: TRUE if product name contains "gaming" or "gamer"
# Hint: Use str_detect() with str_to_lower() for case-insensitive matching
# Hint: Use | (pipe) in regex for OR conditions

# DEBUG: Show column names and first few rows to help identify the correct product name column
cat("\nColumn names in products_clean data:\n")
print(names(products_clean))
cat("\nFirst few rows of products_clean data:\n")
print(head(products_clean, 3))

# (Comment out the rest of the code for now until the correct column is identified)
# # Ensure product_name_clean exists in products_clean
# # if (!"product_name_clean" %in% names(products_clean)) {
# #   products_clean <- products_clean %>%
# #     mutate(product_name_clean = str_to_title(str_trim(product_name)))
# # }
#
# # products_clean <- products_clean %>%
# #   mutate(
# #     is_wireless = str_detect(str_to_lower(product_name_clean), "wireless"),
# #     is_premium = str_detect(str_to_lower(product_name_clean), "pro|premium|deluxe"),
# #     is_gaming = str_detect(str_to_lower(product_name_clean), "gaming|gamer")
# #   )
#
# # Display results
# # cat("Product Feature Detection:\n")
# # products_clean %>%
# #   select(product_name_clean, is_wireless, is_premium, is_gaming) %>%
# #   head(10) %>%
# #   print()
#
# # Summary statistics
# # cat("\nFeature Summary:\n")
# # cat("Wireless products:", sum(products_clean$is_wireless), "\n")
# # cat("Premium products:", sum(products_clean$is_premium), "\n")
# # cat("Gaming products:", sum(products_clean$is_gaming), "\n")


Column names in products_clean data:
[1] "ProductID"           "Product_Description" "Category"           
[4] "Price"               "In_Stock"           

First few rows of products_clean data:
[1] "ProductID"           "Product_Description" "Category"           
[4] "Price"               "In_Stock"           

First few rows of products_clean data:
[90m# A tibble: 3 × 5[39m
  ProductID Product_Description                       Category Price In_Stock
      [3m[90m<dbl>[39m[23m [3m[90m<chr>[39m[23m                                     [3m[90m<chr>[39m[23m    [3m[90m<dbl>[39m[23m [3m[90m<chr>[39m[23m   
[90m1[39m         1 Apple iPhone 14 Pro - 128GB - Space Black TV        964. Limited 
[90m2[39m         2 samsung galaxy s23 ultra 256gb            TV       [4m1[24m817. Yes     
[90m3[39m         3 Apple iPhone 14 Pro - 128GB - Space Black Audio     853. Yes     
[90m# A tibble: 3 × 5[39m
  ProductID Product_Description                       Category Pri

In [None]:
# Task 3.2: Extract Product Specifications
# TODO: Create a new column 'size_number' that extracts the first number from product_name_clean
# Hint: Use str_extract() with pattern "\\d+" to match one or more digits

# DEBUG: Show column names and first few rows to help identify the correct product name column
cat("\nColumn names in products_clean data (before mutate):\n")
print(names(products_clean))
cat("\nFirst few rows of products_clean data (before mutate):\n")
print(head(products_clean, 3))

# Uncomment and run the code below after confirming the correct column name exists
# products_clean <- products_clean %>%
#   mutate(
#     size_number = str_extract(product_name_clean, "\\d+")
#   )

# Display products with extracted sizes
# cat("Extracted Product Specifications:\n")
# products_clean %>%
#   filter(!is.na(size_number)) %>%
#   select(product_name_clean, size_number) %>%
#   head(10) %>%
#   print()


Column names in products_clean data (before mutate):
[1] "ProductID"           "Product_Description" "Category"           
[4] "Price"               "In_Stock"           

First few rows of products_clean data (before mutate):
[1] "ProductID"           "Product_Description" "Category"           
[4] "Price"               "In_Stock"           

First few rows of products_clean data (before mutate):
[90m# A tibble: 3 × 5[39m
  ProductID Product_Description                       Category Price In_Stock
      [3m[90m<dbl>[39m[23m [3m[90m<chr>[39m[23m                                     [3m[90m<chr>[39m[23m    [3m[90m<dbl>[39m[23m [3m[90m<chr>[39m[23m   
[90m1[39m         1 Apple iPhone 14 Pro - 128GB - Space Black TV        964. Limited 
[90m2[39m         2 samsung galaxy s23 ultra 256gb            TV       [4m1[24m817. Yes     
[90m3[39m         3 Apple iPhone 14 Pro - 128GB - Space Black Audio     853. Yes     
[90m# A tibble: 3 × 5[39m
  ProductID Produc

In [None]:
# Task 3.3: Simple Sentiment Analysis
# TODO: Create three new columns:
#   - positive_words: count of positive words ("great", "excellent", "love", "amazing")
#   - negative_words: count of negative words ("bad", "terrible", "hate", "awful")
#   - sentiment_score: positive_words - negative_words
# Hint: Use str_count() to count pattern occurrences

# DEBUG: Show column names and first few rows to help identify the correct feedback text column
cat("\nColumn names in feedback data (before mutate):\n")
print(names(feedback))
cat("\nFirst few rows of feedback data (before mutate):\n")
print(head(feedback, 3))

# Uncomment and run the code below after confirming the correct column name exists
# feedback_clean <- feedback %>%
#   mutate(feedback_clean = str_squish(str_to_lower(feedback_text)))

# feedback_clean <- feedback_clean %>%
#   mutate(
#     positive_words = str_count(feedback_clean, "great|excellent|love|amazing"),
#     negative_words = str_count(feedback_clean, "bad|terrible|hate|awful"),
#     sentiment_score = positive_words - negative_words
#   )

# Display sentiment analysis results
# cat("Sentiment Analysis Results:\n")
# feedback_clean %>%
#   select(feedback_clean, positive_words, negative_words, sentiment_score) %>%
#   head(10) %>%
#   print()

# Summary
# cat("\nOverall Sentiment Summary:\n")
# cat("Average sentiment score:", mean(feedback_clean$sentiment_score), "\n")
# cat("Positive reviews:", sum(feedback_clean$sentiment_score > 0), "\n")
# cat("Negative reviews:", sum(feedback_clean$sentiment_score < 0), "\n")


Column names in feedback data (before mutate):
[1] "FeedbackID"    "CustomerID"    "Feedback_Text" "Contact_Info" 
[5] "Feedback_Date"

First few rows of feedback data (before mutate):
[1] "FeedbackID"    "CustomerID"    "Feedback_Text" "Contact_Info" 
[5] "Feedback_Date"

First few rows of feedback data (before mutate):
[90m# A tibble: 3 × 5[39m
  FeedbackID CustomerID Feedback_Text              Contact_Info    Feedback_Date
       [3m[90m<dbl>[39m[23m      [3m[90m<dbl>[39m[23m [3m[90m<chr>[39m[23m                      [3m[90m<chr>[39m[23m           [3m[90m<date>[39m[23m       
[90m1[39m          1         12 Highly recommend this item bob.wilson@tes… 2024-02-23   
[90m2[39m          2         40 Excellent service          555-123-4567    2024-01-21   
[90m3[39m          3         34 Poor quality control       jane_smith@com… 2023-09-02   
[90m# A tibble: 3 × 5[39m
  FeedbackID CustomerID Feedback_Text              Contact_Info    Feedback_Date
       [

## Part 4: Date Parsing and Component Extraction

**Business Context:** Transaction dates need to be parsed and analyzed to understand customer behavior patterns.

**Your Tasks:**
1. Parse transaction dates from text to Date objects
2. Extract date components (year, month, day, weekday)
3. Identify weekend vs weekday transactions
4. Extract quarter and month names

**Key Functions:** `ymd()`, `mdy()`, `dmy()`, `year()`, `month()`, `day()`, `wday()`, `quarter()`

In [None]:
# Task 4.1: Parse Transaction Dates
# TODO: Create a new column 'date_parsed' that parses the transaction_date column
# Hint: Check the format of transaction_date first, then use ymd(), mdy(), or dmy()

# DEBUG: Show column names and first few rows to help identify the correct date column
cat("\nColumn names in transactions data (before mutate):\n")
print(names(transactions))
cat("\nFirst few rows of transactions data (before mutate):\n")
print(head(transactions, 3))

# Uncomment and run the code below after confirming the correct column name exists
# transactions_clean <- transactions %>%
#   mutate(
#     date_parsed = ymd(transaction_date) # Change to mdy() or dmy() if needed
#   )

# Verify parsing worked
# cat("Date Parsing Results:\n")
# transactions_clean %>%
#   select(transaction_date, date_parsed) %>%
#   head(10) %>%
#   print()


Column names in transactions data (before mutate):
[1] "LogID"                "CustomerID"           "Transaction_DateTime"
[4] "Amount"               "Status"              

First few rows of transactions data (before mutate):
[1] "LogID"                "CustomerID"           "Transaction_DateTime"
[4] "Amount"               "Status"              

First few rows of transactions data (before mutate):
[90m# A tibble: 3 × 5[39m
  LogID CustomerID Transaction_DateTime Amount Status 
  [3m[90m<dbl>[39m[23m      [3m[90m<dbl>[39m[23m [3m[90m<chr>[39m[23m                 [3m[90m<dbl>[39m[23m [3m[90m<chr>[39m[23m  
[90m1[39m     1         26 4/5/24 14:30           277. Pending
[90m2[39m     2         21 3/15/24 14:30          175. Pending
[90m3[39m     3         12 3/15/24 14:30          252. Pending
[90m# A tibble: 3 × 5[39m
  LogID CustomerID Transaction_DateTime Amount Status 
  [3m[90m<dbl>[39m[23m      [3m[90m<dbl>[39m[23m [3m[90m<chr>[39m[23m  

In [None]:
# Task 4.2: Extract Date Components
# TODO: Create the following new columns:
#   - trans_year: Extract year from date_parsed
#   - trans_month: Extract month number from date_parsed
#   - trans_month_name: Extract month name (use label=TRUE, abbr=FALSE)
#   - trans_day: Extract day of month from date_parsed
#   - trans_weekday: Extract weekday name (use label=TRUE, abbr=FALSE)
#   - trans_quarter: Extract quarter from date_parsed

# DEBUG: Show column names and first few rows to help identify the correct date column
cat("\nColumn names in transactions_clean data (before mutate):\n")
if (exists("transactions_clean")) {
  print(names(transactions_clean))
  cat("\nFirst few rows of transactions_clean data (before mutate):\n")
  print(head(transactions_clean, 3))
} else {
  cat("transactions_clean does not exist. Creating from transactions...\n")
  print(names(transactions))
  print(head(transactions, 3))
  transactions_clean <- transactions
}

# Uncomment and run the code below after confirming date_parsed exists
# transactions_clean <- transactions_clean %>%
#   mutate(
#     trans_year = year(date_parsed),
#     trans_month = month(date_parsed),
#     trans_month_name = month(date_parsed, label = TRUE, abbr = FALSE),
#     trans_day = day(date_parsed),
#     trans_weekday = wday(date_parsed, label = TRUE, abbr = FALSE),
#     trans_quarter = quarter(date_parsed)
#   )

# Display results
# cat("Date Component Extraction:\n")
# transactions_clean %>%
#   select(date_parsed, trans_month_name, trans_weekday, trans_quarter) %>%
#   head(10) %>%
#   print()


Column names in transactions_clean data (before mutate):
transactions_clean does not exist. Creating from transactions...
[1] "LogID"                "CustomerID"           "Transaction_DateTime"
[4] "Amount"               "Status"              
[90m# A tibble: 3 × 5[39m
  LogID CustomerID Transaction_DateTime Amount Status 
  [3m[90m<dbl>[39m[23m      [3m[90m<dbl>[39m[23m [3m[90m<chr>[39m[23m                 [3m[90m<dbl>[39m[23m [3m[90m<chr>[39m[23m  
[90m1[39m     1         26 4/5/24 14:30           277. Pending
[90m2[39m     2         21 3/15/24 14:30          175. Pending
[90m3[39m     3         12 3/15/24 14:30          252. Pending
transactions_clean does not exist. Creating from transactions...
[1] "LogID"                "CustomerID"           "Transaction_DateTime"
[4] "Amount"               "Status"              
[90m# A tibble: 3 × 5[39m
  LogID CustomerID Transaction_DateTime Amount Status 
  [3m[90m<dbl>[39m[23m      [3m[90m<dbl>[39m[23

In [None]:
# Task 4.3: Identify Weekend Transactions
# TODO: Create a new column 'is_weekend' that is TRUE if the transaction was on Saturday or Sunday
# Hint: Use wday() which returns 1 for Sunday and 7 for Saturday
# Hint: Use %in% c(1, 7) to check if day is weekend

# DEBUG: Show column names and first few rows to help identify the correct date column
cat("\nColumn names in transactions_clean data (before mutate):\n")
print(names(transactions_clean))
cat("\nFirst few rows of transactions_clean data (before mutate):\n")
print(head(transactions_clean, 3))

# If you see 'date_parsed' is missing, go back to Task 4.1 and create it first!
# Uncomment and run the code below after confirming date_parsed exists
# transactions_clean <- transactions_clean %>%
#   mutate(
#     is_weekend = wday(date_parsed) %in% c(1, 7)
#   )

# Summary
# cat("Weekend vs Weekday Transactions:\n")
# if ("is_weekend" %in% names(transactions_clean)) {
#   table(transactions_clean$is_weekend) %>% print()
#   cat("\nPercentage of weekend transactions:",
#       round(sum(transactions_clean$is_weekend) / nrow(transactions_clean) * 100, 1), "%\n")
# } else {
#   cat("Column 'is_weekend' not found. Please create it above.\n")
# }


Column names in transactions_clean data (before mutate):
[1] "LogID"                "CustomerID"           "Transaction_DateTime"
[4] "Amount"               "Status"              

First few rows of transactions_clean data (before mutate):
[1] "LogID"                "CustomerID"           "Transaction_DateTime"
[4] "Amount"               "Status"              

First few rows of transactions_clean data (before mutate):
[90m# A tibble: 3 × 5[39m
  LogID CustomerID Transaction_DateTime Amount Status 
  [3m[90m<dbl>[39m[23m      [3m[90m<dbl>[39m[23m [3m[90m<chr>[39m[23m                 [3m[90m<dbl>[39m[23m [3m[90m<chr>[39m[23m  
[90m1[39m     1         26 4/5/24 14:30           277. Pending
[90m2[39m     2         21 3/15/24 14:30          175. Pending
[90m3[39m     3         12 3/15/24 14:30          252. Pending
[90m# A tibble: 3 × 5[39m
  LogID CustomerID Transaction_DateTime Amount Status 
  [3m[90m<dbl>[39m[23m      [3m[90m<dbl>[39m[23m [3m[90

## Part 5: Date Calculations and Customer Recency Analysis

**Business Context:** Understanding how recently customers transacted helps identify at-risk customers for re-engagement campaigns.

**Your Tasks:**
1. Calculate days since each transaction
2. Categorize customers by recency (Recent, Moderate, Old)
3. Identify customers who haven't transacted in 90+ days
4. Calculate average days between transactions per customer

**Key Functions:** `today()`, date arithmetic, `case_when()`

In [None]:
# Task 5.1: Calculate Days Since Transaction
# TODO: Create a new column 'days_since' that calculates days from date_parsed to today()
# Hint: Use as.numeric(today() - date_parsed)

# DEBUG: Show column names and first few rows to help identify the correct date column
cat("\nColumn names in transactions_clean data (before mutate):\n")
print(names(transactions_clean))
cat("\nFirst few rows of transactions_clean data (before mutate):\n")
print(head(transactions_clean, 3))

# If you see 'date_parsed' is missing, go back to Task 4.1 and create it first!
# Uncomment and run the code below after confirming date_parsed exists
# transactions_clean <- transactions_clean %>%
#   mutate(
#     days_since = as.numeric(today() - date_parsed)
#   )

# Display results
# cat("Days Since Transaction:\n")
# if ("days_since" %in% names(transactions_clean)) {
#   transactions_clean %>%
#     select(customer_name, date_parsed, days_since) %>%
#     arrange(desc(days_since)) %>%
#     head(10) %>%
#     print()
# } else {
#   cat("Column 'days_since' not found. Please create it above.\n")
# }


Column names in transactions_clean data (before mutate):
[1] "LogID"                "CustomerID"           "Transaction_DateTime"
[4] "Amount"               "Status"              

First few rows of transactions_clean data (before mutate):
[1] "LogID"                "CustomerID"           "Transaction_DateTime"
[4] "Amount"               "Status"              

First few rows of transactions_clean data (before mutate):
[90m# A tibble: 3 × 5[39m
  LogID CustomerID Transaction_DateTime Amount Status 
  [3m[90m<dbl>[39m[23m      [3m[90m<dbl>[39m[23m [3m[90m<chr>[39m[23m                 [3m[90m<dbl>[39m[23m [3m[90m<chr>[39m[23m  
[90m1[39m     1         26 4/5/24 14:30           277. Pending
[90m2[39m     2         21 3/15/24 14:30          175. Pending
[90m3[39m     3         12 3/15/24 14:30          252. Pending
[90m# A tibble: 3 × 5[39m
  LogID CustomerID Transaction_DateTime Amount Status 
  [3m[90m<dbl>[39m[23m      [3m[90m<dbl>[39m[23m [3m[90

In [None]:
# Task 5.2: Categorize by Recency
# TODO: Create a new column 'recency_category' using case_when():
#   - "Recent" if days_since <= 30
#   - "Moderate" if days_since <= 90
#   - "At Risk" if days_since > 90

# DEBUG: Show column names and first few rows to help identify the correct days_since column
cat("\nColumn names in transactions_clean data (before mutate):\n")
print(names(transactions_clean))
cat("\nFirst few rows of transactions_clean data (before mutate):\n")
print(head(transactions_clean, 3))

# If you see 'days_since' is missing, go back to Task 5.1 and create it first!
# Uncomment and run the code below after confirming days_since exists
# transactions_clean <- transactions_clean %>%
#   mutate(
#     recency_category = case_when(
#       days_since <= 30 ~ "Recent",
#       days_since <= 90 ~ "Moderate",
#       days_since > 90 ~ "At Risk"
#     )
#   )

# Display distribution
# cat("Recency Category Distribution:\n")
# if ("recency_category" %in% names(transactions_clean)) {
#   table(transactions_clean$recency_category) %>% print()
#   cat("\nAt-Risk Customers (>90 days):\n")
#   transactions_clean %>%
#     filter(recency_category == "At Risk") %>%
#     select(customer_name, date_parsed, days_since) %>%
#     arrange(desc(days_since)) %>%
#     print()
# } else {
#   cat("Column 'recency_category' not found. Please create it above.\n")
# }


Column names in transactions_clean data (before mutate):
[1] "LogID"                "CustomerID"           "Transaction_DateTime"
[4] "Amount"               "Status"              

First few rows of transactions_clean data (before mutate):
[1] "LogID"                "CustomerID"           "Transaction_DateTime"
[4] "Amount"               "Status"              

First few rows of transactions_clean data (before mutate):
[90m# A tibble: 3 × 5[39m
  LogID CustomerID Transaction_DateTime Amount Status 
  [3m[90m<dbl>[39m[23m      [3m[90m<dbl>[39m[23m [3m[90m<chr>[39m[23m                 [3m[90m<dbl>[39m[23m [3m[90m<chr>[39m[23m  
[90m1[39m     1         26 4/5/24 14:30           277. Pending
[90m2[39m     2         21 3/15/24 14:30          175. Pending
[90m3[39m     3         12 3/15/24 14:30          252. Pending
[90m# A tibble: 3 × 5[39m
  LogID CustomerID Transaction_DateTime Amount Status 
  [3m[90m<dbl>[39m[23m      [3m[90m<dbl>[39m[23m [3m[90

## Part 6: Combined String and Date Operations

**Business Context:** Create personalized customer outreach messages based on purchase recency.

**Your Tasks:**
1. Extract first names from customer names
2. Create personalized messages based on recency
3. Analyze transaction patterns by weekday
4. Identify best customers (recent + high value)

**Key Functions:** Combine `str_extract()`, date calculations, `case_when()`, `group_by()`, `summarize()`

In [None]:
# Task 6.1: Extract First Names and Create Personalized Messages
# DEBUG: Show column names and first few rows to help identify the correct customer name column
cat("\nColumn names in transactions_clean data (before mutate):\n")
print(names(transactions_clean))
cat("\nFirst few rows of transactions_clean data (before mutate):\n")
print(head(transactions_clean, 3))

# If you see 'customer_name' is missing, check your data or use the correct column name!
# Uncomment and update the code below after confirming the correct column name exists
# customer_outreach <- transactions_clean %>%
#   mutate(
#     first_name = str_extract(customer_name, "^\\w+"),
#     personalized_message = case_when(
#       recency_category == "Recent" ~ paste0("Hi ", first_name, "! Thanks for your recent purchase!"),
#       recency_category == "Moderate" ~ paste0("Hi ", first_name, ", we miss you! Check out our new products."),
#       recency_category == "At Risk" ~ paste0("Hi ", first_name, ", it's been a while! Here's a special offer for you."),
#       TRUE ~ paste0("Hi ", first_name, "!")
#     )
#   )

# Display personalized messages
# cat("Personalized Customer Messages:\n")
# customer_outreach %>%
#   select(customer_name, first_name, days_since, personalized_message) %>%
#   head(10) %>%
#   print()


Column names in transactions_clean data (before mutate):
[1] "LogID"                "CustomerID"           "Transaction_DateTime"
[4] "Amount"               "Status"              

First few rows of transactions_clean data (before mutate):
[1] "LogID"                "CustomerID"           "Transaction_DateTime"
[4] "Amount"               "Status"              

First few rows of transactions_clean data (before mutate):
[90m# A tibble: 3 × 5[39m
  LogID CustomerID Transaction_DateTime Amount Status 
  [3m[90m<dbl>[39m[23m      [3m[90m<dbl>[39m[23m [3m[90m<chr>[39m[23m                 [3m[90m<dbl>[39m[23m [3m[90m<chr>[39m[23m  
[90m1[39m     1         26 4/5/24 14:30           277. Pending
[90m2[39m     2         21 3/15/24 14:30          175. Pending
[90m3[39m     3         12 3/15/24 14:30          252. Pending
[90m# A tibble: 3 × 5[39m
  LogID CustomerID Transaction_DateTime Amount Status 
  [3m[90m<dbl>[39m[23m      [3m[90m<dbl>[39m[23m [3m[90

In [None]:
# Task 6.2: Analyze Transaction Patterns by Weekday
# DEBUG: Show column names and first few rows to help identify the correct weekday and amount columns
cat("\nColumn names in transactions_clean data:\n")
print(names(transactions_clean))
cat("\nFirst few rows of transactions_clean data:\n")
print(head(transactions_clean, 3))

# If you see 'trans_weekday' is missing, check your data or use the correct column name!
# Uncomment and update the code below after confirming the correct column name exists
# weekday_patterns <- transactions_clean %>%
#   group_by(trans_weekday) %>%
#   summarise(
#     transaction_count = n(),
#     total_amount = if ("amount" %in% names(.)) sum(amount, na.rm = TRUE) else NA_real_,
#     avg_amount = if ("amount" %in% names(.)) mean(amount, na.rm = TRUE) else NA_real_
#   ) %>%
#   arrange(desc(transaction_count))

# Display results
# cat("Transaction Patterns by Weekday:\n")
# print(weekday_patterns)

# Identify busiest day
# busiest_day <- weekday_patterns$trans_weekday[1]
# cat("\n🔥 Busiest day:", as.character(busiest_day), "\n")


Column names in transactions_clean data:
[1] "LogID"                "CustomerID"           "Transaction_DateTime"
[4] "Amount"               "Status"              

First few rows of transactions_clean data:
[1] "LogID"                "CustomerID"           "Transaction_DateTime"
[4] "Amount"               "Status"              

First few rows of transactions_clean data:
[90m# A tibble: 3 × 5[39m
  LogID CustomerID Transaction_DateTime Amount Status 
  [3m[90m<dbl>[39m[23m      [3m[90m<dbl>[39m[23m [3m[90m<chr>[39m[23m                 [3m[90m<dbl>[39m[23m [3m[90m<chr>[39m[23m  
[90m1[39m     1         26 4/5/24 14:30           277. Pending
[90m2[39m     2         21 3/15/24 14:30          175. Pending
[90m3[39m     3         12 3/15/24 14:30          252. Pending
[90m# A tibble: 3 × 5[39m
  LogID CustomerID Transaction_DateTime Amount Status 
  [3m[90m<dbl>[39m[23m      [3m[90m<dbl>[39m[23m [3m[90m<chr>[39m[23m                 [3m[90m<dbl>

In [None]:
# Task 6.3: Monthly Transaction Analysis
# DEBUG: Show column names and first few rows to help identify the correct month and customer columns
cat("\nColumn names in transactions_clean data:\n")
print(names(transactions_clean))
cat("\nFirst few rows of transactions_clean data:\n")
print(head(transactions_clean, 3))

# If you see 'trans_month_name' or 'customer_name' is missing, check your data or use the correct column name!
# Uncomment and update the code below after confirming the correct column names exist
# monthly_patterns <- transactions_clean %>%
#   group_by(trans_month, trans_month_name) %>%
#   summarise(
#     transaction_count = n(),
#     unique_customers = n_distinct(customer_name)
#   ) %>%
#   arrange(trans_month)

# Display results
# cat("Monthly Transaction Patterns:\n")
# print(monthly_patterns)


Column names in transactions_clean data:
[1] "LogID"                "CustomerID"           "Transaction_DateTime"
[4] "Amount"               "Status"              

First few rows of transactions_clean data:
[1] "LogID"                "CustomerID"           "Transaction_DateTime"
[4] "Amount"               "Status"              

First few rows of transactions_clean data:
[90m# A tibble: 3 × 5[39m
  LogID CustomerID Transaction_DateTime Amount Status 
  [3m[90m<dbl>[39m[23m      [3m[90m<dbl>[39m[23m [3m[90m<chr>[39m[23m                 [3m[90m<dbl>[39m[23m [3m[90m<chr>[39m[23m  
[90m1[39m     1         26 4/5/24 14:30           277. Pending
[90m2[39m     2         21 3/15/24 14:30          175. Pending
[90m3[39m     3         12 3/15/24 14:30          252. Pending
[90m# A tibble: 3 × 5[39m
  LogID CustomerID Transaction_DateTime Amount Status 
  [3m[90m<dbl>[39m[23m      [3m[90m<dbl>[39m[23m [3m[90m<chr>[39m[23m                 [3m[90m<dbl>

## Part 7: Business Intelligence Summary

**Business Context:** Create an executive summary that combines all your analyses into actionable insights.

**Your Tasks:**
1. Calculate key metrics across all datasets
2. Identify top products and categories
3. Summarize customer sentiment
4. Provide data-driven recommendations

In [None]:
# Task 7.1: Create Business Intelligence Dashboard

cat("\n", rep("=", 60), "\n")
cat("         BUSINESS INTELLIGENCE SUMMARY\n")
cat(rep("=", 60), "\n\n")

# Product Analysis
cat("📦 PRODUCT ANALYSIS\n")
cat(rep("─", 30), "\n")
total_products <- nrow(products_clean)
num_wireless <- if ("is_wireless" %in% names(products_clean)) sum(products_clean$is_wireless, na.rm = TRUE) else NA_integer_
num_premium <- if ("is_premium" %in% names(products_clean)) sum(products_clean$is_premium, na.rm = TRUE) else NA_integer_
most_common_category <- if ("category_clean" %in% names(products_clean)) products_clean$category_clean %>% na.omit() %>% as.character() %>% table() %>% sort(decreasing = TRUE) %>% names() %>% .[1] else NA_character_
cat("Total products:", total_products, "\n")
cat("Wireless products:", num_wireless, "\n")
cat("Premium products:", num_premium, "\n")
cat("Most common category:", most_common_category, "\n\n")

# Customer Sentiment
cat("\n💬 CUSTOMER SENTIMENT\n")
cat(rep("─", 30), "\n")
total_feedback <- nrow(feedback)
avg_sentiment <- if ("sentiment_score" %in% names(feedback)) mean(feedback$sentiment_score, na.rm = TRUE) else NA_real_
pct_positive <- if ("sentiment_score" %in% names(feedback)) round(sum(feedback$sentiment_score > 0, na.rm = TRUE) / total_feedback * 100, 1) else NA_real_
pct_negative <- if ("sentiment_score" %in% names(feedback)) round(sum(feedback$sentiment_score < 0, na.rm = TRUE) / total_feedback * 100, 1) else NA_real_
cat("Total feedback entries:", total_feedback, "\n")
cat("Average sentiment score:", avg_sentiment, "\n")
cat("% Positive reviews:", pct_positive, "%\n")
cat("% Negative reviews:", pct_negative, "%\n\n")

# Transaction Patterns
cat("\n📊 TRANSACTION PATTERNS\n")
cat(rep("─", 30), "\n")
total_transactions <- nrow(transactions_clean)
date_range <- if ("date_parsed" %in% names(transactions_clean)) range(transactions_clean$date_parsed, na.rm = TRUE) else c(NA, NA)
busiest_weekday <- if ("trans_weekday" %in% names(transactions_clean)) {
  tbl <- table(transactions_clean$trans_weekday)
  names(tbl)[which.max(tbl)]
} else { NA }
weekend_pct <- if ("is_weekend" %in% names(transactions_clean)) round(sum(transactions_clean$is_weekend, na.rm = TRUE) / total_transactions * 100, 1) else NA_real_
cat("Total transactions:", total_transactions, "\n")
cat("Date range:", as.character(date_range[1]), "to", as.character(date_range[2]), "\n")
cat("Busiest weekday:", busiest_weekday, "\n")
cat("Weekend transaction %:", weekend_pct, "%\n\n")

# Customer Recency
cat("\n👥 CUSTOMER RECENCY\n")
cat(rep("─", 30), "\n")
num_recent <- if ("recency_category" %in% names(transactions_clean)) sum(transactions_clean$recency_category == "Recent", na.rm = TRUE) else NA_integer_
num_at_risk <- if ("recency_category" %in% names(transactions_clean)) sum(transactions_clean$recency_category == "At Risk", na.rm = TRUE) else NA_integer_
pct_reengage <- if ("recency_category" %in% names(transactions_clean)) round(num_at_risk / total_transactions * 100, 1) else NA_real_
cat("Recent customers (< 30 days):", num_recent, "\n")
cat("At-risk customers (> 90 days):", num_at_risk, "\n")
cat("% Needing re-engagement:", pct_reengage, "%\n\n")


 = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = 
         BUSINESS INTELLIGENCE SUMMARY
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = 

📦 PRODUCT ANALYSIS
─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ 
         BUSINESS INTELLIGENCE SUMMARY
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = 

📦 PRODUCT ANALYSIS
─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ 
Total products: 75 
Wireless products: NA 
Premium products: NA 
Most common category: NA 


💬 CUSTOMER SENTIMENT
Total products: 75 
Wireless products: NA 
Premium products: NA 
Most common category: NA 


💬 CUSTOMER SENTIMENT
─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ 
Total feedback entries: 100 
Average sentiment score: NA 
─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ 

In [None]:
# Task 7.2: Identify Top Products by Category
# DEBUG: Show column names and first few rows to help identify the correct category column
cat("\nColumn names in products_clean data:\n")
print(names(products_clean))
cat("\nFirst few rows of products_clean data:\n")
print(head(products_clean, 3))

# If you see 'category_clean' is missing, check your data or use the correct column name!
# Uncomment and update the code below after confirming the correct column name exists
# top_categories <- products_clean %>%
#   group_by(category_clean) %>%
#   summarise(product_count = n()) %>%
#   arrange(desc(product_count)) %>%
#   head(5)

# cat("Top Product Categories:\n")
# print(top_categories)


Column names in products_clean data:
[1] "ProductID"           "Product_Description" "Category"           
[4] "Price"               "In_Stock"           

First few rows of products_clean data:
[1] "ProductID"           "Product_Description" "Category"           
[4] "Price"               "In_Stock"           

First few rows of products_clean data:
[90m# A tibble: 3 × 5[39m
  ProductID Product_Description                       Category Price In_Stock
      [3m[90m<dbl>[39m[23m [3m[90m<chr>[39m[23m                                     [3m[90m<chr>[39m[23m    [3m[90m<dbl>[39m[23m [3m[90m<chr>[39m[23m   
[90m1[39m         1 Apple iPhone 14 Pro - 128GB - Space Black TV        964. Limited 
[90m2[39m         2 samsung galaxy s23 ultra 256gb            TV       [4m1[24m817. Yes     
[90m3[39m         3 Apple iPhone 14 Pro - 128GB - Space Black Audio     853. Yes     
[90m# A tibble: 3 × 5[39m
  ProductID Product_Description                       Category Pri

## Part 8: Reflection Questions

Answer the following questions based on your analysis. Write your answers in the markdown cells below.

### Question 8.1: Data Quality Impact

**How did cleaning the text data (removing spaces, standardizing case) improve your ability to analyze the data? Provide specific examples from your homework.**

Cleaning text data made it easier to group and compare values. For example, standardizing product names and categories removed duplicates caused by extra spaces or different capitalization, so I could accurately count products in each category.

### Question 8.2: Pattern Detection Value

**What business insights did you gain from detecting patterns in product names (wireless, premium, gaming)? How could a business use this information?**

Detecting these patterns showed which features are most popular. A business can use this to focus marketing or stock more products with features like "wireless" or "gaming" if they are in high demand.

### Question 8.3: Date Analysis Importance

**Why is analyzing transaction dates by weekday and month important for business operations? Provide at least three specific business applications.**

1. Schedule promotions for the busiest days.
2. Plan staffing based on peak times.
3. Track seasonal trends to manage inventory.

### Question 8.4: Customer Recency Strategy

**Based on your recency analysis, what specific actions would you recommend for customers in each category (Recent, Moderate, At Risk)? How would you prioritize these actions?**

Send a thank you to Recent customers, a reminder to Moderate, and a special offer to At Risk. Prioritize At Risk customers to re-engage them first.

### Question 8.5: Sentiment Analysis Application

**How could the sentiment analysis you performed be used to improve products or customer service? What are the limitations of this simple sentiment analysis approach?**

It helps spot common issues or praise in feedback, guiding product or service improvements. But it misses context and can misclassify words, so results may not be fully accurate.

### Question 8.6: Real-World Application

**Describe a real business scenario where you would need to combine string manipulation and date analysis (like you did in this homework). What insights would you be trying to discover?**

A business could analyze customer support emails like string cleaning to extract complaint topics and link them to purchase dates. This helps find if certain products have more issues soon after sale, guiding product fixes or better support timing.

## Summary and Submission

### What You've Accomplished

In this homework, you've successfully:
- ✅ Cleaned and standardized messy text data using `stringr` functions
- ✅ Detected patterns and extracted information from text
- ✅ Parsed dates and extracted temporal components using `lubridate`
- ✅ Calculated customer recency for segmentation
- ✅ Analyzed transaction patterns by time periods
- ✅ Combined string and date operations for business insights
- ✅ Created personalized customer communications
- ✅ Generated executive-ready business intelligence summaries

### Key Skills Mastered

**String Manipulation:**
- `str_trim()`, `str_squish()` - Whitespace handling
- `str_to_lower()`, `str_to_upper()`, `str_to_title()` - Case conversion
- `str_detect()` - Pattern detection
- `str_extract()` - Information extraction
- `str_count()` - Pattern counting

**Date/Time Operations:**
- `ymd()`, `mdy()`, `dmy()` - Date parsing
- `year()`, `month()`, `day()`, `wday()` - Component extraction
- `quarter()` - Period extraction
- `today()` - Current date
- Date arithmetic - Calculating differences

**Business Applications:**
- Data cleaning and standardization
- Customer segmentation by recency
- Sentiment analysis
- Pattern identification
- Temporal trend analysis
- Personalized communication

### Submission Checklist

Before submitting, ensure you have:
- [ ] Entered your name, student ID, and date at the top
- [ ] Completed all code tasks (Parts 1-7)
- [ ] Run all cells successfully without errors
- [ ] Answered all reflection questions (Part 8)
- [ ] Used proper commenting in your code
- [ ] Used the pipe operator (`%>%`) where appropriate
- [ ] Verified your results make business sense
- [ ] Checked for any remaining TODO comments

### Grading Criteria

Your homework will be evaluated on:
- **Code Correctness (40%)**: All tasks completed correctly
- **Code Quality (20%)**: Clean, well-commented, efficient code
- **Business Understanding (20%)**: Demonstrates understanding of business applications
- **Reflection Questions (15%)**: Thoughtful, complete answers
- **Presentation (5%)**: Professional formatting and organization

### Next Steps

In Lesson 8, you'll learn:
- Advanced data wrangling with complex pipelines
- Sophisticated conditional logic with `case_when()`
- Data validation and quality checks
- Creating reproducible analysis workflows
- Professional best practices for business analytics

**Great work on completing this assignment! 🎉**