
# FOPP Chapter 23 — More on Accumulation: Map, Filter, List Comprehension, and Zip

**Course:** Python Programming for Analytics
**Textbook Reference:** Runestone FOPP, Chapter 23 (Sections 23.1–23.7)
**Prerequisite Knowledge:** Basic accumulator patterns, `for` loops, functions, lists

---

## Learning Objectives

By the end of this session, you will be able to:

1. Explain how `map()`, `filter()`, `zip()`, and list comprehensions relate to the accumulator pattern.
2. Apply `map()` to transform every element in a sequence using a function.
3. Apply `filter()` to select elements from a sequence based on a Boolean condition.
4. Write list comprehensions as concise alternatives to `map()` and `filter()`.
5. Use `zip()` to combine two or more sequences element-wise.
6. Select the appropriate technique for a given data transformation task.

## How to Use This Notebook

- **Read** each markdown section before running the code cells that follow it.
- **Run** every code cell to see its output. Each cell is self-contained — you do not need to run earlier cells first unless the instructions say otherwise.
- **Quick Checks** are short exercises embedded after each topic. Try writing your solution before looking at the answer.
- **Exercises** and **Practice Problems** provide scaffolded starter code. Fill in the lines marked with `# TODO` comments.


## Session Overview

| Segment | Topic | Duration |
|---------|-------|----------|
| 1 | Opening and Review of the Accumulator Pattern | 8 min |
| 2 | `map()` | 12 min |
| 3 | `filter()` | 12 min |
| 4 | List Comprehensions | 15 min |
| 5 | `zip()` | 10 min |
| 6 | Integrated Practice and Comparison | 13 min |
| 7 | Wrap-Up and Preview | 5 min |

---
## Segment 1 — Opening and Review of the Accumulator Pattern (8 min)

### Why This Matters

In previous chapters, you learned the **accumulator pattern** — a technique where you:

1. **Initialize** an empty container (often an empty list).
2. **Iterate** through a sequence with a `for` loop.
3. **Update** the accumulator on each pass (often by appending a value).

This pattern is fundamental, and you will continue to use it. However, Python provides built-in tools — `map()`, `filter()`, list comprehensions, and `zip()` — that handle the most common accumulator use cases in a single expression. These tools are not only shorter to write; they also communicate your intent more clearly. When an analyst says "filter the transactions above $50" or "map each temperature to Celsius," these tools let you express that idea directly in code.

Today we will learn four such tools. Each one replaces a specific kind of accumulator loop.

### The Accumulator Baseline

The cell below shows the traditional accumulator approach for squaring every number in a list. Run it and confirm the output. We will revisit this exact task with each new tool to see how the code simplifies.

In [1]:
# Traditional accumulator: square every number in a list
numbers = [3, 7, 2, 10, 5]

squared = []                   # Step 1: Initialize the accumulator
for n in numbers:              # Step 2: Iterate
    squared.append(n ** 2)     # Step 3: Update the accumulator

print(squared)  # [9, 49, 4, 100, 25]

[9, 49, 4, 100, 25]


**Observe** that three lines of code (initialize, iterate, update) are dedicated to a single idea: "apply the squaring operation to every element." The tools we learn today will express that same idea in one line.

---
## Segment 2 — `map()` (12 min)

### What `map()` Does

`map()` takes a function and an iterable (such as a list) and **applies the function to every element**, producing a new sequence of results.

**Syntax:**
```python
map(function, iterable)
```

There are two important details to remember:

1. The **first argument** must be a function — either a named function (defined with `def`) or an anonymous function (defined with `lambda`).
2. `map()` returns a **lazy iterator** (a `map` object), not a list. To see the results or work with them as a list, wrap the call in `list()`.

**When to use `map()`:** Use it when you need to apply the same transformation to every element in a sequence — unit conversions, formula applications, formatting operations, and similar tasks.
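
The laziness described in detail 2 is easy to see in a short experiment. The sketch below (variable names are illustrative) suggests that a `map` object produces its values only when something consumes it, and that it can be consumed only once.

```python
numbers = [3, 7, 2, 10, 5]

# map() returns a map object; no squaring has happened yet
squares = map(lambda x: x ** 2, numbers)
print(type(squares).__name__)  # map

# list() consumes the iterator and collects the results
first_pass = list(squares)
print(first_pass)   # [9, 49, 4, 100, 25]

# The iterator is now exhausted, so a second pass yields nothing
second_pass = list(squares)
print(second_pass)  # []
```

If you need the results more than once, store the list rather than the `map` object.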

### Example 2a — `map()` with a Named Function

In [2]:
# Define a function, then pass it to map()
def square(x):
    return x ** 2

numbers = [3, 7, 2, 10, 5]
squared = list(map(square, numbers))

print(squared)  # [9, 49, 4, 100, 25]

[9, 49, 4, 100, 25]


Notice that we pass `square` — the function name **without parentheses** — as the first argument. We are handing the function itself to `map()`, not calling it. `map()` will call it once for each element in `numbers`.

### Example 2b — `map()` with a Lambda

When the transformation is simple enough to express in a single line, you can use a **lambda expression** instead of defining a separate function. A lambda is an anonymous, inline function.

**Lambda syntax:** `lambda parameter: expression`

In [3]:
numbers = [3, 7, 2, 10, 5]
squared = list(map(lambda x: x ** 2, numbers))

print(squared)  # [9, 49, 4, 100, 25]

[9, 49, 4, 100, 25]


The lambda `lambda x: x ** 2` does exactly what the `square()` function did — it takes one argument and returns its square. The difference is purely syntactic: lambdas are convenient for short, one-time-use transformations.

### Example 2c — Analytics Scenario: Temperature Conversion

In analytics work, you frequently need to apply a formula to every value in a dataset. Here we convert a list of temperatures from Fahrenheit to Celsius using the formula: C = (F − 32) × 5/9.

In [4]:
# Convert a list of temperatures from Fahrenheit to Celsius
temps_f = [32, 68, 77, 95, 212]
temps_c = list(map(lambda f: round((f - 32) * 5 / 9, 1), temps_f))

print(temps_c)  # [0.0, 20.0, 25.0, 35.0, 100.0]

[0.0, 20.0, 25.0, 35.0, 100.0]


### ✎ Quick Check — `map()` Practice

**Task:** Given a list of product prices, use `map()` to apply a 7% sales tax to each price. Round the result to two decimal places.

**Hint:** Multiplying a price by `1.07` adds 7% tax.

In [None]:
prices = [9.99, 24.50, 3.75]

# TODO: Write a map() call that applies 7% sales tax to each price.


# print(with_tax)  # Expected: [10.69, 26.21, 4.01]  (24.50 * 1.07 is stored as a float just under 26.215, so it rounds down)

---
## Segment 3 — `filter()` (12 min)

### What `filter()` Does

While `map()` transforms every element, `filter()` **selects** elements. It takes a function and an iterable, and keeps only those elements for which the function returns `True`.

**Syntax:**
```python
filter(function, iterable)
```

Key details:

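1. The function should return a **Boolean value**: `True` to keep the element, `False` to discard it. (Strictly, `filter()` keeps any element whose result is truthy, but returning an explicit Boolean keeps the intent clear.)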
2. Like `map()`, `filter()` returns a lazy iterator. Wrap it in `list()` to get a list.

**When to use `filter()`:** Use it when you need to select a subset of elements based on a condition — records above a threshold, items matching a category, values within a valid range.
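
To connect `filter()` back to the accumulator pattern, the sketch below writes the same selection both ways. The names `evens_loop` and `evens_filter` are illustrative; the point is that `filter()` takes over the initialize, test, and append steps.

```python
numbers = [3, 7, 2, 10, 5, 8]

# Accumulator version: append only the elements that pass the test
evens_loop = []
for n in numbers:
    if n % 2 == 0:
        evens_loop.append(n)

# filter() version: the test moves into a function
evens_filter = list(filter(lambda n: n % 2 == 0, numbers))

print(evens_loop)    # [2, 10, 8]
print(evens_filter)  # [2, 10, 8]
```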

### Example 3a — `filter()` with a Named Function

In [5]:
def is_even(n):
    return n % 2 == 0

numbers = [3, 7, 2, 10, 5, 8]
evens = list(filter(is_even, numbers))

print(evens)  # [2, 10, 8]

[2, 10, 8]


The function `is_even()` returns `True` for even numbers and `False` for odd numbers. `filter()` calls it on each element and keeps only those that produce `True`.

### Example 3b — `filter()` with a Lambda

In [6]:
numbers = [3, 7, 2, 10, 5, 8]
evens = list(filter(lambda n: n % 2 == 0, numbers))

print(evens)  # [2, 10, 8]

[2, 10, 8]


### Example 3c — Analytics Scenario: Transaction Threshold

A common analytics task is to identify records that exceed a threshold. Here we flag transactions of \$50 or more for a fraud review.

In [7]:
# Keep transactions of $50 or more for a fraud review
transactions = [12.50, 340.00, 49.99, 78.25, 5.00, 150.75]
flagged = list(filter(lambda t: t >= 50, transactions))

print(flagged)  # [340.0, 78.25, 150.75]

[340.0, 78.25, 150.75]


### ✎ Quick Check — `filter()` Practice

**Task:** Given a list of exam scores, use `filter()` to keep only passing scores (70 or above).

**Hint:** Your lambda should return `True` when a score is ≥ 70.

In [None]:
scores = [88, 42, 73, 65, 91, 55, 70]

# TODO: Write a filter() call that keeps only scores of 70 or above.


# print(passing)  # Expected: [88, 73, 91, 70]

---
## Segment 4 — List Comprehensions (15 min)

### What List Comprehensions Do

A **list comprehension** is a compact syntax for creating a new list by transforming and/or filtering elements from an existing sequence. It combines the functionality of `map()` and `filter()` into a single, readable expression.

**Transformation syntax** (replaces `map()`):
```python
[expression for item in iterable]
```

**Filtering syntax** (replaces `filter()`):
```python
[expression for item in iterable if condition]
```

Key advantages over `map()` and `filter()`:

- The result is a list directly — no need to wrap in `list()`.
- Transformation and filtering can be combined in one expression.
- Most Python developers consider comprehensions more readable than nested `map()`/`filter()` calls.

**Readability guideline:** If a comprehension becomes long or complex enough that it is hard to understand at a glance, consider using a traditional `for` loop instead. Clarity always takes priority over brevity.

### Example 4a — Transformation (Equivalent to `map()`)

Compare the `map()` version and the list comprehension version side by side. Both produce the same result.

In [8]:
numbers = [3, 7, 2, 10, 5]

# Using map
squared_map = list(map(lambda x: x ** 2, numbers))

# Using list comprehension
squared_lc = [x ** 2 for x in numbers]

print("map() result:   ", squared_map)
print("Comprehension:  ", squared_lc)

map() result:    [9, 49, 4, 100, 25]
Comprehension:   [9, 49, 4, 100, 25]


### Example 4b — Filtering (Equivalent to `filter()`)

Adding an `if` clause to a list comprehension achieves the same effect as `filter()`. Only elements where the condition evaluates to `True` are included in the result.

In [9]:
numbers = [3, 7, 2, 10, 5, 8]

# Using filter
evens_filter = list(filter(lambda n: n % 2 == 0, numbers))

# Using list comprehension
evens_lc = [n for n in numbers if n % 2 == 0]

print("filter() result: ", evens_filter)
print("Comprehension:   ", evens_lc)

filter() result:  [2, 10, 8]
Comprehension:    [2, 10, 8]


### Example 4c — Combined Transformation and Filtering

One of the strongest advantages of list comprehensions is the ability to **transform and filter in the same expression**. With `map()` and `filter()` alone, achieving this requires nesting calls, which quickly becomes awkward. A list comprehension handles it naturally.

In [10]:
# From a list of revenues, extract only positive revenues and convert to thousands
revenues = [45200, -1300, 78000, 0, 12400, -500]

positive_in_k = [r / 1000 for r in revenues if r > 0]

print(positive_in_k)  # [45.2, 78.0, 12.4]

[45.2, 78.0, 12.4]
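
For comparison, here is a sketch of the same task written with nested `map()` and `filter()` calls. The name `positive_in_k_nested` is illustrative. The calls read inside-out (filter first, then map), which is the awkwardness the comprehension avoids.

```python
revenues = [45200, -1300, 78000, 0, 12400, -500]

# filter() runs first (inner call), then map() transforms what is left
positive_in_k_nested = list(
    map(lambda r: r / 1000, filter(lambda r: r > 0, revenues))
)

# Equivalent comprehension: condition and transformation read left to right
positive_in_k = [r / 1000 for r in revenues if r > 0]

print(positive_in_k_nested)  # [45.2, 78.0, 12.4]
print(positive_in_k_nested == positive_in_k)  # True
```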


### Example 4d — Working with Strings (Analytics Context)

Data cleaning is one of the most common tasks in analytics. List comprehensions are well suited for applying string operations — such as stripping whitespace, changing case, or standardizing formats — across an entire list of values.

In [11]:
# Clean and standardize a list of product category labels
categories = ["  Electronics ", "CLOTHING", "home & Garden", " toys"]

cleaned = [c.strip().title() for c in categories]

print(cleaned)  # ['Electronics', 'Clothing', 'Home & Garden', 'Toys']

['Electronics', 'Clothing', 'Home & Garden', 'Toys']


### Side-by-Side Comparison

The table below summarizes when each approach applies. Notice that list comprehensions handle all three cases, while `map()` and `filter()` each handle only one.

| Task | `map()`/`filter()` | List Comprehension |
|------|--------------------|--------------------|
| Transform every element | `list(map(lambda x: x**2, nums))` | `[x**2 for x in nums]` |
| Select elements by condition | `list(filter(lambda x: x%2==0, nums))` | `[x for x in nums if x%2==0]` |
| Transform and filter together | Nested calls (awkward) | `[x**2 for x in nums if x%2==0]` |

### ✎ Quick Check — List Comprehension Practice

**Task:** Given a list of city names, produce a new list containing only names **longer than 5 characters**, converted to **uppercase**.

**Hint:** Use `len(c)` to check the length and `.upper()` to convert to uppercase. The `if` clause goes after the `for` clause.

In [None]:
cities = ["Rome", "Tokyo", "Chicago", "Lima", "Berlin", "São Paulo"]

# TODO: Write a list comprehension that filters by length > 5 and converts to uppercase.


# print(result)  # Expected: ['CHICAGO', 'BERLIN', 'SÃO PAULO']

---
## Segment 5 — `zip()` (10 min)

### What `zip()` Does

All the tools we have seen so far (`map()`, `filter()`, and list comprehensions) have operated on a **single** sequence in our examples. But in analytics, you frequently have **parallel** sequences: a list of product names alongside a list of prices, a list of months alongside a list of revenue figures. You need to combine these sequences element by element.

`zip()` does exactly that. It takes two or more iterables and produces an iterator of **tuples**, where the *i*-th tuple contains the *i*-th element from each input sequence.

**Syntax:**
```python
zip(iterable1, iterable2, ...)
```

Key details:

1. `zip()` returns a lazy iterator; wrap it in `list()` to inspect the result.
2. If the input sequences have **different lengths**, `zip()` stops at the **shortest** one. No error is raised; extra elements from longer sequences are silently ignored.
3. Each element in the result is a **tuple** — an immutable, ordered collection.
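
The truncation behavior in detail 2 can hide data problems, so it is worth seeing once. The sketch below uses a hypothetical fourth name (`"Dave"`) with no matching score, and adds a simple length check as one possible guard.

```python
names = ["Alice", "Bob", "Carol", "Dave"]   # four names
scores = [88, 92, 75]                        # only three scores

# zip() stops after the third pair; "Dave" is dropped without an error
paired = list(zip(names, scores))
print(paired)       # [('Alice', 88), ('Bob', 92), ('Carol', 75)]
print(len(paired))  # 3

# A simple guard to run before zipping when the lists are meant to line up
lengths_match = len(names) == len(scores)
print(lengths_match)  # False
```

In Python 3.10 and later, `zip()` also accepts `strict=True`, which raises a `ValueError` when the lengths differ.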

**When to use `zip()`:** Use it when you need to pair or align elements from two or more parallel lists — combining labels with values, iterating over matched data, or constructing dictionaries.

### Example 5a — Basic `zip()`

In [12]:
names = ["Alice", "Bob", "Carol"]
scores = [88, 92, 75]

paired = list(zip(names, scores))

print(paired)  # [('Alice', 88), ('Bob', 92), ('Carol', 75)]

[('Alice', 88), ('Bob', 92), ('Carol', 75)]


Each tuple in the result pairs one name with one score, preserving their positions. `'Alice'` was at index 0 in `names` and `88` was at index 0 in `scores`, so they are paired together.

### Example 5b — Iterating Over Zipped Sequences

A common pattern is to use `zip()` directly in a `for` loop, **unpacking** each tuple into separate variables. This makes the loop body clean and readable.

In [13]:
names = ["Alice", "Bob", "Carol"]
scores = [88, 92, 75]

for name, score in zip(names, scores):
    print(f"{name}: {score}")

Alice: 88
Bob: 92
Carol: 75


### Example 5c — Building a Dictionary from Two Lists

If you have a list of keys and a list of values, you can pass `zip()` directly to `dict()` to create a dictionary in one step. This is a widely used Python idiom.

In [14]:
months = ["Jan", "Feb", "Mar"]
revenue = [12000, 15500, 13200]

monthly_revenue = dict(zip(months, revenue))

print(monthly_revenue)  # {'Jan': 12000, 'Feb': 15500, 'Mar': 13200}

{'Jan': 12000, 'Feb': 15500, 'Mar': 13200}


### Example 5d — Combining `zip()` with a List Comprehension

You can combine `zip()` with a list comprehension to perform element-wise calculations across two parallel lists. This is especially useful for computing derived metrics like profit, difference, or percentage change.

In [15]:
# Compute profit for each month given revenue and cost lists
revenue = [12000, 15500, 13200]
costs   = [8000,  9200,  7800]

profits = [r - c for r, c in zip(revenue, costs)]

print(profits)  # [4000, 6300, 5400]

[4000, 6300, 5400]


### ✎ Quick Check — `zip()` Practice

**Task:** Given two lists — `products` and `prices` — create a list of formatted strings like `'Widget: $9.99'`.

**Hint:** Use an f-string inside a list comprehension with `zip()`. The format specifier `:.2f` displays a float with two decimal places.
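
As a quick illustration of the `:.2f` specifier on its own, the sketch below formats a single hypothetical value (`amount` is not part of the task data).

```python
amount = 3.5  # hypothetical value, not from the task

plain = f"{amount}"             # default float formatting
two_places = f"{amount:.2f}"    # always two digits after the decimal point
with_dollar = f"${amount:.2f}"  # literal text can sit next to the braces

print(plain)        # 3.5
print(two_places)   # 3.50
print(with_dollar)  # $3.50
```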

In [None]:
products = ["Widget", "Gadget", "Doohickey"]
prices = [9.99, 24.50, 3.75]

# TODO: Write a list comprehension with zip() to create formatted strings.

# print(labels)  # Expected: ['Widget: $9.99', 'Gadget: $24.50', 'Doohickey: $3.75']

---
## Segment 6 — Integrated Practice and Comparison (13 min)

### Instructor Notes

This segment consolidates learning by presenting data scenarios that require students to apply multiple techniques. Walk through Exercise 1 together, then give students time to attempt Exercise 2 independently or in pairs.

Now that you have learned all four tools, it is time to use them together. The exercises below present realistic data scenarios that require you to choose and combine the appropriate techniques.

### Exercise 1 — Guided Walkthrough: Product Sales Analysis

**Scenario:** You have sales data for five products stored in three parallel lists.

**Tasks:**
1. Compute the total revenue for each product (units sold × unit price).
2. Produce a list of product names in uppercase.
3. Identify products with total revenue exceeding \$20,000.

**Think about which tool fits each task before looking at the solution:**
- Task 1 requires combining two parallel lists and computing a value → `zip()` + list comprehension
- Task 2 requires transforming every element → `map()` or list comprehension
- Task 3 requires filtering based on a computed value → list comprehension with `if`

In [None]:
# Data
products = ["Laptop", "Mouse", "Keyboard", "Monitor", "Webcam"]
units_sold = [120, 450, 300, 85, 210]
unit_prices = [999.99, 24.99, 74.99, 349.99, 59.99]


### Exercise 2 — Independent Work: Customer Satisfaction Survey

**Scenario:** You have survey data from a customer satisfaction study. Three parallel lists contain respondent IDs, numerical ratings (1–5), and free-text comments.

**Tasks:**
1. Create a list of tuples pairing each respondent ID with their rating.
2. Filter to only the respondents who gave a rating of 4 or 5.
3. Extract the comments for those high-rating respondents. Strip whitespace and convert each comment to lowercase.
4. Compute the average rating across all respondents.

**Guidance:**
- Task 1 is a direct application of `zip()`.
- Task 2 requires `zip()` combined with a filtering comprehension.
- Task 3 is similar to Task 2 but operates on different parallel lists.
- Task 4 does not require any of today's tools — use `sum()` and `len()`.

In [None]:
# Data: run this cell before the task cells below; do not modify it
respondent_ids = [101, 102, 103, 104, 105, 106, 107]
ratings = [4, 2, 5, 3, 1, 4, 5]
comments = [
    "Great service",
    "Too slow",
    "Excellent!",
    "Average experience",
    "Very disappointed",
    "Good overall",
    "Outstanding support"
]

**Task 1:** Create a list of tuples pairing each respondent ID with their rating.

In [None]:
# Task 1: Pair respondent IDs with ratings
# Hint: Use list(zip(...))

# TODO: Write your code below


**Task 2:** Filter to only the respondents who gave a rating of 4 or 5.

In [None]:
# Task 2: Keep only (id, rating) pairs where rating >= 4
# Hint: Use a list comprehension with zip() and an if clause.
#       Unpack each tuple as (rid, r) in your for clause.

# TODO: Write your code below


**Task 3:** Extract and clean the comments for those high-rating respondents.

In [None]:
# Task 3: Get comments where the corresponding rating is >= 4,
#          then strip whitespace and convert to lowercase.
# Hint: zip(ratings, comments) pairs each rating with its comment.
#       Use .strip().lower() on each comment.

# TODO: Write your code below


**Task 4:** Compute the average rating across all respondents.

In [None]:
# Task 4: Compute the average rating.
# Hint: sum(ratings) gives the total; len(ratings) gives the count.

# TODO: Write your code below


### When to Use What — Decision Guide

Use this table as a reference when deciding which tool to apply.

| If you need to… | Recommended Tool |
|-----------------|------------------|
| Transform every element | `map()` or `[expr for x in seq]` |
| Select elements by condition | `filter()` or `[x for x in seq if cond]` |
| Transform and filter together | `[expr for x in seq if cond]` |
| Combine parallel sequences | `zip()` |
| Build a dictionary from two lists | `dict(zip(keys, values))` |

**General preference:** List comprehensions are favored in most Python codebases for their readability and versatility. Use `map()` and `filter()` when a named function already exists and the intent is clear without a lambda.
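
As a sketch of the "named function already exists" case, the example below passes the built-in `int` and the string method `str.strip` straight to `map()`. The sample lists are hypothetical.

```python
labels = ["  Electronics ", "CLOTHING", " toys"]   # hypothetical labels
raw_counts = ["12", "7", "30"]                      # numbers read as text

# str.strip and int already exist, so no lambda is needed
stripped = list(map(str.strip, labels))
counts = list(map(int, raw_counts))

# Comprehension equivalents, for comparison
stripped_lc = [s.strip() for s in labels]
counts_lc = [int(s) for s in raw_counts]

print(stripped)  # ['Electronics', 'CLOTHING', 'toys']
print(counts)    # [12, 7, 30]
```

Both forms are short here; the `map()` version reads well mainly because the function name already says what happens to each element.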

---
## Segment 7 — Wrap-Up and Preview (5 min)

### Instructor Notes

Summarize the four tools and their relationship to the accumulator pattern. Emphasize that students now have a toolkit of concise alternatives for common data operations, and that list comprehensions will appear frequently in the remainder of the course.

### Summary

Today we learned four tools that replace common accumulator patterns with concise, expressive alternatives:

- **`map(function, iterable)`** applies a function to every element. Useful when a named function already exists or when working in a functional programming style.
- **`filter(function, iterable)`** selects elements for which a function returns `True`. Useful for subsetting data by a condition.
- **List comprehensions** `[expr for x in seq if cond]` accomplish both transformation and filtering in a single expression. They are the most widely used and most Pythonic of the four tools.
- **`zip(iter1, iter2, ...)`** aligns elements from two or more parallel sequences into tuples. Useful for pairing related data and for element-wise calculations.

All four tools produce the same results as their accumulator-pattern equivalents, but with less code and clearer intent.

### Reading Assignment

- Runestone FOPP Chapter 23, Sections 23.1 through 23.5 — review the text and complete the embedded exercises.
- Complete the Chapter 23 Assessment (Section 23.7).

### Preview

In upcoming sessions, the techniques you learned today will recur frequently — especially list comprehensions and `zip()` — when you begin working with tabular data structures and dataframes.

---
## Appendix — Additional Practice Problems

The problems below provide additional practice with the four tools covered in this session. Each problem presents a scenario, provides starter data, and includes scaffolded instructions. Try writing your solution before consulting the answer.

### Problem A — Domain Extraction with Filtering

**Scenario:** You have a list of email addresses. Produce a new list containing only the **domain names** (the part after `@`), but **exclude** any addresses from `gmail.com`.

**Approach:**
- Use a list comprehension.
- The `if` clause should exclude addresses that end with `"@gmail.com"`. (For this data, checking that `"gmail.com"` is **not** in the email string gives the same result.)
- Use `.split("@")` to separate the local part from the domain. The domain is at index `[1]`.

In [None]:
emails = ["alice@company.com", "bob@gmail.com", "carol@university.edu", "dan@gmail.com"]

# TODO: Write a list comprehension that extracts domains, excluding gmail.com addresses.

# print(domains)  # Expected: ['company.com', 'university.edu']

### Problem B — Dictionary from Filtered Pairs

**Scenario:** Two parallel lists represent students and their GPAs. Create a **dictionary** containing only those students whose GPA is 3.5 or higher.

**Approach:**
- Use `zip()` to pair each student with their GPA.
- Use a comprehension with an `if` clause to filter by GPA ≥ 3.5.
- Wrap the result in `dict()` to produce a dictionary (or use a dictionary comprehension `{k: v for ...}`).
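
The two options in the last bullet can be sketched on unrelated, made-up data (the `items` and `stock` lists below are hypothetical and are not the problem's lists). Both build a dictionary from filtered pairs.

```python
# Hypothetical inventory data, used only to show the two forms
items = ["pens", "tape", "glue", "clips"]
stock = [40, 5, 12, 0]

# Option 1: dict() around a comprehension that yields (key, value) tuples
well_stocked_a = dict([(i, s) for i, s in zip(items, stock) if s >= 10])

# Option 2: dictionary comprehension with key: value syntax
well_stocked_b = {i: s for i, s in zip(items, stock) if s >= 10}

print(well_stocked_a)  # {'pens': 40, 'glue': 12}
print(well_stocked_a == well_stocked_b)  # True
```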

In [None]:
students = ["Ava", "Ben", "Cara", "Dev", "Ella"]
gpas = [3.8, 3.2, 3.5, 2.9, 3.9]

# TODO: Create a dictionary of students with GPA >= 3.5.

# print(honors)  # Expected: {'Ava': 3.8, 'Cara': 3.5, 'Ella': 3.9}

### Problem C — Date Format Conversion with `map()`

**Scenario:** You have a list of date strings in `"MM/DD/YYYY"` format. Convert each to ISO 8601 format: `"YYYY-MM-DD"`.

**Approach:**
- Define a named function `to_iso()` that takes a date string, splits it on `"/"`, and rearranges the parts.
- Use `map()` to apply the function to every date in the list.
- Remember to wrap the `map()` call in `list()`.

In [None]:
dates = ["01/15/2025", "12/03/2024", "07/22/2025"]

# TODO: Define a function to_iso() that converts "MM/DD/YYYY" to "YYYY-MM-DD".
# TODO: Use map() to apply it to every date in the list.


# print(iso_dates)  # Expected: ['2025-01-15', '2024-12-03', '2025-07-22']

### Problem D — Multi-Step Sales Report

**Scenario:** You are given three parallel lists containing quarterly sales data for a team of sales representatives. Produce a summary that identifies top performers.

**Tasks:**
1. Use `zip()` and a list comprehension to compute total sales for each representative (Q1 + Q2 + Q3).
2. Use a list comprehension with an `if` clause to find the names of representatives whose total sales exceed \$150,000.
3. Use `map()` to format each total as a currency string like `"$125,000"`.
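
The comma separator needed for Task 3 comes from the `:,` format specifier. The sketch below applies it to a single hypothetical number (`amount` is not part of the problem data).

```python
amount = 1234567  # hypothetical value, not from the sales lists

with_commas = f"{amount:,}"    # thousands separators only
as_currency = f"${amount:,}"   # literal dollar sign in front

print(with_commas)  # 1,234,567
print(as_currency)  # $1,234,567
```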

In [None]:
names = ["Rivera", "Chen", "Okafor", "Singh", "Mueller"]
q1_sales = [45000, 62000, 38000, 71000, 55000]
q2_sales = [52000, 48000, 41000, 68000, 59000]
q3_sales = [49000, 55000, 45000, 73000, 61000]

# Task 1: Compute total sales for each representative.
# Hint: zip the three quarterly lists together and add the three values in each tuple.

# TODO: Write your code for Task 1



# Task 2: Find names of reps with total sales > $150,000.
# Hint: zip(names, totals) and filter with an if clause.

# TODO: Write your code for Task 2


# Task 3: Format each total as a currency string.
# Hint: Use map() with a lambda and f-string formatting.
#       The format specifier :, adds comma separators.

# TODO: Write your code for Task 3

