---

<div style="text-align: center; margin: 2em 0; padding: 1.5em; background: linear-gradient(135deg, #667eea 0%, #764ba2 100%); border-radius: 12px;">
  <h2 style="color: white; font-weight: 300; letter-spacing: 2px; margin: 0; font-size: 1.4em;">
    By Christopher L. Gaughan, Ph.D.
  </h2>
</div>

---


## What you'll learn
- Fast membership tests with sets (average O(1))
- Tuple unpacking in loops
- Using enumerate() safely
- Using dict.get() for safe lookups and counting
- max(..., key=...) to pick by a criterion

### How to use this notebook
- Read the brief explanation, then run the code cells.
- Look for callouts: 💡 Tips, 🛑 Common mistakes, ✅ Right patterns.


### Quick demo: find duplicates using a set
This compact example shows how a set can help detect duplicates quickly. If it feels dense, skip ahead to the clearer step-by-step version right below.


In [2]:
# Collect duplicates in one pass, using a set for average O(1) membership checks.
# Trick: set.add(x) always returns None, which is falsy. For an unseen file,
# `file in seen` is False and `seen.add(file)` adds it but evaluates to None,
# so the condition fails. Only files already in `seen` pass the filter.
files_uploaded = ["data1.csv", "report.xlsx", "summary.docx",
                  "data1.csv", "report.xlsx", "new_file.txt"]
seen = set()
# Keeps only duplicates; adds unseen files to `seen` on the fly
dupes = set(file for file in files_uploaded if file in seen or seen.add(file))

print(dupes)


{'report.xlsx', 'data1.csv'}


In [3]:
files_uploaded = ["data1.csv", "report.xlsx", "summary.docx",
                  "data1.csv", "report.xlsx", "new_file.txt"]

seen = set()
dupes = set()

for file in files_uploaded:
    if file in seen:
        dupes.add(file)  # If it's already in seen, it's a duplicate
    else:
        seen.add(file)  # Otherwise, add it to seen

print(dupes)


{'report.xlsx', 'data1.csv'}


## 📝 Example 1: Checking if a User Exists in a System
Imagine you have a huge list of registered users, and you need to check if a new user ID is already in the system.

❌ Without Sets (Slow)

In [4]:
users_list = [101, 202, 303, 404, 505, 606]  # Imagine this list is huge
user_id = 303

if user_id in users_list:  # O(n) lookup time
    print("User exists")
else:
    print("User not found")


User exists


**🔹 Problem: `in` on a list checks elements one by one, so lookup time grows with the size of `users_list`.**

## Time Complexity of Set Lookup: O(1)

When using a set in Python, checking if an element exists (using the `in` operator) has an **average time complexity of O(1)** - constant time.

This is because sets use **hash tables** internally, allowing for extremely fast lookups regardless of the set size.

**Comparison:**
- **List lookup**: `O(n)` - must check each element sequentially
- **Set lookup**: `O(1)` on average - direct access via hash value

This makes sets ideal for membership testing in large collections!
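To make the comparison above concrete, here is a rough timing sketch using the standard `timeit` module. The collection size (100,000 IDs) and the repeat count are assumptions chosen for illustration; absolute numbers will vary by machine, so only the relative gap is meaningful.

```python
import timeit

# Hypothetical data: 100,000 integer IDs, held both as a list and as a set.
ids_list = list(range(100_000))
ids_set = set(ids_list)
target = 99_999  # worst case for the list: the last element

# Time 100 membership tests against each container.
names = {"target": target, "ids_list": ids_list, "ids_set": ids_set}
list_time = timeit.timeit("target in ids_list", globals=names, number=100)
set_time = timeit.timeit("target in ids_set", globals=names, number=100)

print(f"list: {list_time:.6f}s for 100 lookups")
print(f"set:  {set_time:.6f}s for 100 lookups")
```

Building the set costs O(n) once, so the conversion pays off when you run many lookups against the same collection.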

In [5]:
user_set = set(users_list)
# ✅ With a set: O(1) average lookup time
if user_id in user_set:
    print("User exists")
else:
    print("User not found")


User exists


## 1️⃣ Detecting Plagiarism (Finding Common Words in Documents)
You have two text documents, and you want to check how many common words appear in both.

In [6]:
doc1 = """Python is great for data science and machine learning"""
doc2 = """Machine learning and AI are transforming data science"""

words1 = set(doc1.lower().split())  
words2 = set(doc2.lower().split())

# Find words that appear in both documents
print(words1 & words2)  # Expected: {'and', 'data', 'science', 'machine', 'learning'}

{'science', 'machine', 'data', 'learning', 'and'}


## 2️⃣ Finding Employees Who Haven't Submitted Reports
You have two sets:
- All employees
- Employees who submitted reports

Goal: find who hasn’t submitted their report.


In [7]:
all_employees = {"Alice", "Bob", "Charlie", "David", "Eve"}
submitted_reports = {"Alice", "David", "Eve"}

# Find employees who DID NOT submit reports
print(all_employees - submitted_reports)

{'Bob', 'Charlie'}
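Intersection (`&`) and difference (`-`) are two of four closely related set operators. The sketch below shows all four side by side; the team names and members are made-up example data, and `sorted()` is used only to make the printed order predictable.

```python
# Hypothetical example data, not from the notebook.
backend = {"Alice", "Bob", "Charlie"}
frontend = {"Charlie", "David"}

both = backend & frontend          # intersection: in both sets
either = backend | frontend        # union: in at least one set
backend_only = backend - frontend  # difference: in backend, not in frontend
exactly_one = backend ^ frontend   # symmetric difference: in exactly one set

print(sorted(both))          # ['Charlie']
print(sorted(either))        # ['Alice', 'Bob', 'Charlie', 'David']
print(sorted(backend_only))  # ['Alice', 'Bob']
print(sorted(exactly_one))   # ['Alice', 'Bob', 'David']
```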


## 3️⃣ Identifying Fraudulent Transactions
You have a list of transactions, but some accounts are blacklisted for fraud.
- Filter out any transactions from blacklisted accounts.

In [8]:
transactions = [
    ("user1", 250), ("user2", 400), ("user3", 500),
    ("user4", 100), ("user5", 750), ("user3", 600)
]
blacklisted_accounts = {"user3", "user5"}

# Create a new list without blacklisted transactions
filtered_transactions = []

for user, amount in transactions:
    if user not in blacklisted_accounts:
        filtered_transactions.append((user, amount))

print(filtered_transactions)  # Expected: [('user1', 250), ('user2', 400), ('user4', 100)]


[('user1', 250), ('user2', 400), ('user4', 100)]
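The same filter can also be written as a list comprehension. This is only an alternative spelling of the loop in this section, using the same data; which form reads better is a matter of taste.

```python
transactions = [
    ("user1", 250), ("user2", 400), ("user3", 500),
    ("user4", 100), ("user5", 750), ("user3", 600),
]
blacklisted_accounts = {"user3", "user5"}

# Same filter as the loop, written as a list comprehension.
filtered_transactions = [
    (user, amount)
    for user, amount in transactions
    if user not in blacklisted_accounts
]

print(filtered_transactions)  # [('user1', 250), ('user2', 400), ('user4', 100)]
```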


## Tuple Unpacking in a List
**What is tuple unpacking?**
* In a list of tuples, each tuple holds multiple values.
* You can unpack those values directly in a `for` loop instead of accessing them by index.

✅ Basic Example: Unpacking Tuples in a List

In [9]:
pairs = [(1, 'a'), (2, 'b'), (3, 'c')]

# Instead of doing this:
for pair in pairs:
    print(pair[0], pair[1])  # Accessing elements by index

# Do this:
for num, letter in pairs:
    print(num, letter, end=" , ")  # Unpacking the tuple directly


1 a
2 b
3 c
1 a , 2 b , 3 c , 

## 🛑 Common Mistake: Forgetting to Unpack the Tuple
### ❌ Works, but harder to read (manual indexing):

In [10]:
transactions = [("user1", 250), ("user2", 400)]
for transaction in transactions:
    if transaction[0] == "user1":  # Accessing manually
        print("Found transaction!")


Found transaction!


### ✅ Right (Unpack It!)

In [11]:
transactions = [("user1", 250), ("user2", 400)]
for user, amount in transactions:  # UNPACK THE TUPLE
    if user == "user1":
        print("Found transaction!")


Found transaction!


**💡 Why? Python lets you unpack the tuple automatically, so you can use user and amount directly.**
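Unpacking only works when the number of names matches the number of values in each tuple. The sketch below uses hypothetical three-field records (user, amount, currency) to show two ways to handle extra fields, and the error you get when the counts do not match.

```python
# Hypothetical records: (user, amount, currency)
records = [("user1", 250, "USD"), ("user2", 400, "EUR")]

# Use `_` for a field you do not need.
for user, amount, _ in records:
    print(user, amount)

# Star-unpacking collects the remaining fields into a list.
for user, *rest in records:
    print(user, rest)

# Too few names on the left raises ValueError.
try:
    for user, amount in records:
        pass
except ValueError as exc:
    error_message = str(exc)
    print("ValueError:", error_message)
```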

## Using `enumerate()` Correctly
- What does `enumerate()` do?
  - It returns pairs `(index, value)` as you iterate over a sequence.
- When to use it:
  - Whenever you need both the position (index) and the item during iteration.
- Syntax:
  - `for index, value in enumerate(sequence): ...`


✅ Basic Example:

In [12]:
names = ["Alice", "Bob", "Charlie"]

for index, name in enumerate(names):
    print(f"Index {index} -> {name}")


Index 0 -> Alice
Index 1 -> Bob
Index 2 -> Charlie


**🔹 `enumerate()` gives you a *tuple* `(index, value)`, which you unpack in the loop.**

## 🛑 Common Mistake: Misusing enumerate() with Tuples
If you already have a list of tuples, enumerate() adds an extra index before the tuple, which can confuse you.

### ❌ Wrong: Using enumerate() on a List of Tuples Incorrectly

In [13]:
transactions = [("user1", 250), ("user2", 400)]

for index, user, amount in enumerate(transactions):  # ❌ This won't work!
    print(index, user, amount)


ValueError: not enough values to unpack (expected 3, got 2)

### ✅ Right: Unpacking enumerate() Correctly

In [14]:
transactions = [("user1", 250), ("user2", 400)]

for index, (user, amount) in enumerate(transactions):  # ✅ Unpack the tuple properly
    print(f"{index}: {user} -> ${amount}")


0: user1 -> $250
1: user2 -> $400


### 💡 Rule of Thumb:

* If working with a regular list, use enumerate() normally.
* If working with a list of tuples, unpack the tuple inside the enumerate() loop.
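`enumerate()` also accepts an optional `start` argument, which is handy when you want human-friendly numbering that begins at 1. A minimal sketch, reusing student data of the same shape as the examples in this notebook:

```python
students = [("Alice", 90), ("Bob", 85), ("Charlie", 88)]

# start=1 makes the counter begin at 1; the list itself is not changed.
lines = []
for rank, (name, grade) in enumerate(students, start=1):
    lines.append(f"{rank}. {name}: {grade}")

print("\n".join(lines))
# 1. Alice: 90
# 2. Bob: 85
# 3. Charlie: 88
```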

## Practice Problems to Drill This Concept
Try these to lock it in:

### 🧩 Problem 1: Unpack Tuples from a List


In [15]:
products = [("Laptop", 1000), ("Phone", 500), ("Tablet", 750)]
# Print each product name and its price
for item, price in products:
    print(f'{item} costs {price}')

Laptop costs 1000
Phone costs 500
Tablet costs 750


### 🧩 Problem 2: Use `enumerate()` Correctly

In [16]:
students = [("Alice", 90), ("Bob", 85), ("Charlie", 88)]
# Print each student's position (index) in the list and their grade
for index, (student, grade) in enumerate(students):
    print(f'Student # {index}: {student} scored {grade}')

Student # 0: Alice scored 90
Student # 1: Bob scored 85
Student # 2: Charlie scored 88


### 🧩 Problem 3: Find the First Duplicate in a List

In [17]:
numbers = [4, 5, 6, 7, 5, 8, 9, 6]
# Use a set to find the first duplicate
seen = set()

for number in numbers:
    if number in seen:
        print(number)
        break
    else:
        seen.add(number)
        
        

5


### Finding Unused API Keys
You have a set of all API keys and another set of used API keys.

Find which API keys have **never been used**.

In [18]:
all_keys = {"key1", "key2", "key3", "key4", "key5"}
used_keys = {"key1", "key3", "key5"}

# Find API keys that were never used
print(all_keys - used_keys)

{'key2', 'key4'}


### 🔧 Problem 1: Detecting Missing Records in a Data Pipeline
Your system processes data every hour, but some expected records are missing from the output.

Goal: find timestamps that are expected but did not appear (set difference).

In [19]:
expected_records = {
    "2024-03-01T01:00:00", "2024-03-01T02:00:00", "2024-03-01T03:00:00",
    "2024-03-01T04:00:00", "2024-03-01T05:00:00"
}
actual_records = {
    "2024-03-01T01:00:00", "2024-03-01T03:00:00", "2024-03-01T05:00:00"
}

missing = expected_records - actual_records  # ✅ difference: in expected, not in actual
print(missing)  # Expected: {"2024-03-01T02:00:00", "2024-03-01T04:00:00"}

{'2024-03-01T04:00:00', '2024-03-01T02:00:00'}


### 🔧 Problem 2: Identifying Frequent API Callers
Your logs track API calls made by users. You want to find users who made more than 2 API calls.

✅ Task: Use a dictionary to count the number of API calls per user and find users with more than 2 calls.

In [20]:
api_logs = [
    ("user1", "/login"), ("user2", "/data"), ("user2", "/data"),
    ("user3", "/data"), ("user2", "/login"), ("user1", "/logout"),
    ("user3", "/logout"), ("user2", "/data"), ("user3", "/data")
]

# Step 1: Count API calls per user
api_counts = {}

# Make the frequency dictionary using `get()`
for user, _ in api_logs:
    api_counts[user] = api_counts.get(user, 0) + 1  # O(1) update
print(api_counts)
# Step 2: Find users with more than 2 API calls
frequent_users = {user for user, count in api_counts.items() if count > 2}

print(frequent_users)  # Expected Output: {'user2', 'user3'}

    

{'user1': 2, 'user2': 4, 'user3': 3}
{'user2', 'user3'}


## What does dict.get() do, exactly?

`dict.get(key, default=None)`

- Looks up `key` in the dictionary.
- If the key exists → returns its value.
- If the key is missing → returns `default` instead of raising `KeyError` (defaults to `None` if you don’t provide one).

Why use `get()`?
- Safer than `d[key]` when a key might be absent.
- Lets you supply a sensible fallback without `try/except`.
- Great for counting/accumulating and reading optional settings.

Common pitfalls and tips:
- `get()` returns `None` by default, so a missing key looks the same as a key whose stored value is `None`. If you need to tell them apart, pass an explicit default (`d.get("threshold", 0)`) or test `key in d`.
- `get()` doesn’t insert the key. If you want to insert when missing, use `setdefault` or assign explicitly: `d[key] = d.get(key, 0) + 1`.
- For many missing-key initializations, `collections.defaultdict` can be even cleaner.
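The pitfalls above can be seen in a few lines. In this sketch the `settings` dictionary and the `MISSING` sentinel are hypothetical names introduced for illustration; for a plain presence check, `key in d` is usually simpler than a sentinel.

```python
settings = {"threshold": None}  # key exists, but its value is None

# get() cannot tell "stored None" apart from "missing" by default.
print(settings.get("threshold"))  # None (stored value)
print(settings.get("timeout"))    # None (missing key)

# A unique sentinel object as the default makes the difference visible.
MISSING = object()  # hypothetical name for this sketch
has_timeout = settings.get("timeout", MISSING) is not MISSING
has_threshold = settings.get("threshold", MISSING) is not MISSING
print(has_timeout, has_threshold)  # False True

# get() never inserts; setdefault() inserts the default when the key is absent.
settings.get("retries", 3)
retries_after_get = "retries" in settings
print(retries_after_get)          # False
settings.setdefault("retries", 3)
print(settings["retries"])        # 3
```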


### Basic example: using `get()` to fetch values

In [21]:
user_ages = {"Alice": 25, "Bob": 30, "Charlie": 28}

print(user_ages.get("Alice"))
print(user_ages.get("Eve"))    # ✅ None (Key doesn’t exist, default is None)
print(user_ages.get("Eve", "Unknown"))  # ✅ "Unknown" (Custom default value)


25
None
Unknown


#### Why is this Useful?
* Using `user_ages["Eve"]` would raise a `KeyError` if "Eve" isn’t in the dictionary.
* get() avoids this error and lets you handle missing keys gracefully.

## Let's look at the `get()` method
You're given a list of API calls made by users. 
* Use get() to count how many times each user made an API call.

### Using `.get()` for Counting (Why It’s Useful in the Problem Above)
When counting things (like API calls), .get() makes it easy to initialize missing keys.

In [22]:
api_counts = {}

# Without .get() - More complicated
for user in ["Alice", "Bob", "Alice", "Charlie", "Alice"]:
    if user in api_counts:
        api_counts[user] += 1
    else:
        api_counts[user] = 1

print(api_counts)  # {'Alice': 3, 'Bob': 1, 'Charlie': 1}


{'Alice': 3, 'Bob': 1, 'Charlie': 1}


**✅ With .get() - Cleaner:**

In [23]:
api_counts = {}

for user in ["Alice", "Bob", "Alice", "Charlie", "Alice"]:
    api_counts[user] = api_counts.get(user, 0) + 1  # 🔥 No if-else needed!

print(api_counts) 


{'Alice': 3, 'Bob': 1, 'Charlie': 1}


**Why?**

* If user already exists, `.get(user, 0)` returns its current count, and the `+ 1` increments it.
* If user doesn’t exist, `.get(user, 0)` returns 0, so the count starts at 1.

### Avoiding KeyErrors with .get()
Imagine a dictionary storing configuration settings.

❌ Without .get(), you risk errors:


In [24]:
config = {"theme": "dark", "language": "English"}

print(config["theme"])  # ✅ "dark"
print(config["font_size"])  # ❌ KeyError because "font_size" isn’t in the dictionary


dark


KeyError: 'font_size'

### ✅ Using .get() Prevents Crashes:

In [25]:
print(config.get("font_size", 12))  # ✅ 12 (default value)


12


### Using .get() for Nested Dictionaries
Example: User Data in a Dictionary

In [26]:
user_profiles = {
    "Alice": {"age": 25, "location": "NY"},
    "Bob": {"age": 30}
}

# Without .get() (Risky)
print(user_profiles["Alice"]["location"])  # ✅ "NY"
# print(user_profiles["Bob"]["location"])  # ❌ KeyError

# With .get() (Safe)
print(user_profiles.get("Bob", {}).get("location", "Unknown"))  # ✅ "Unknown"


NY
Unknown


**Why?**

* `.get("Bob", {})` → Returns `{}` if "Bob" doesn’t exist, preventing an error.
* `.get("location", "Unknown")` → Returns "Unknown" if "location" is missing.
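Chained `.get(..., {})` calls work well when every level is a dictionary, but they fail with `AttributeError` if a key is present and its value is `None`, because `None` has no `.get()`. One way to guard against that is a small helper. `deep_get` below is a hypothetical function written for this sketch, not part of the standard library, and the "Carol" entry is invented to show the edge case.

```python
def deep_get(mapping, keys, default=None):
    """Follow `keys` through nested dicts; return `default` if any step fails.

    `deep_get` is a hypothetical helper for this sketch, not a built-in.
    """
    current = mapping
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


user_profiles = {
    "Alice": {"age": 25, "location": "NY"},
    "Bob": {"age": 30},
    "Carol": None,  # hypothetical entry: key present, but no profile dict
}

print(deep_get(user_profiles, ["Alice", "location"], "Unknown"))  # NY
print(deep_get(user_profiles, ["Bob", "location"], "Unknown"))    # Unknown
print(deep_get(user_profiles, ["Carol", "location"], "Unknown"))  # Unknown
print(deep_get(user_profiles, ["Dave", "location"], "Unknown"))   # Unknown
```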


### `.get()` vs. `defaultdict` (Alternative Approach)
* **If you're always using `.get()` to initialize missing keys, consider `collections.defaultdict` instead.**

In [27]:
from collections import defaultdict

api_counts = defaultdict(int)  # Default value for missing keys is 0

for user in ["Alice", "Bob", "Alice", "Charlie", "Alice"]:
    api_counts[user] += 1  # No need for .get()

print(api_counts)  # defaultdict(<class 'int'>, {'Alice': 3, 'Bob': 1, 'Charlie': 1})


defaultdict(<class 'int'>, {'Alice': 3, 'Bob': 1, 'Charlie': 1})


#### Difference?

* `dict.get()` → Manual way to handle missing keys.
* `defaultdict` → Auto-creates missing keys with a default value when they are first accessed with `[]`.
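One practical consequence of that difference: merely reading a missing key with `[]` on a `defaultdict` inserts it, while `get()` never changes the dictionary. A short sketch with made-up keys ("Zed", "Yan") to show the side effect:

```python
from collections import defaultdict

plain = {"Alice": 3}
auto = defaultdict(int, {"Alice": 3})

# Reading a missing key with get() leaves the dict unchanged.
print(plain.get("Zed", 0))  # 0
print("Zed" in plain)       # False

# Reading a missing key with [] on a defaultdict creates it.
print(auto["Zed"])          # 0
print("Zed" in auto)        # True

# get() on a defaultdict does not trigger the default factory.
print(auto.get("Yan"))      # None
print("Yan" in auto)        # False
```

If you only want to read counts without growing the dictionary, `get()` (or converting with `dict(auto)` once counting is done) may be the safer choice.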


In [28]:
api_calls = ["Alice", "Bob", "Alice", "Charlie", "Alice", "Bob", "Charlie", "Charlie"]

# Count occurrences using .get()
api_counts = {}

for user in api_calls:
    api_counts[user] = api_counts.get(user, 0) + 1

print(api_counts)


{'Alice': 3, 'Bob': 2, 'Charlie': 3}


## Find Missing Configurations
You have a dictionary of **app settings**, but some keys might be missing. Use `get()` to ensure missing keys return a **default value**.

In [29]:
config = {"theme": "dark", "language": "English"}

# Get settings with defaults
theme = config.get("theme", "light")
language = config.get("language", "English")
font_size = config.get("font_size", 12)  # Default to 12
notifications = config.get("notifications", True)  # Default to True

print(theme, language, font_size, notifications)


dark English 12 True
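When there are many settings, calling `get()` once per key gets repetitive. An alternative sketch is to keep the defaults in their own dictionary and merge the two; the `DEFAULTS` name and its values are assumptions that mirror the defaults used in the cell above.

```python
# Hypothetical defaults for this sketch, mirroring the values used above.
DEFAULTS = {
    "theme": "light",
    "language": "English",
    "font_size": 12,
    "notifications": True,
}
config = {"theme": "dark", "language": "English"}

# Later mappings win, so values in `config` override the defaults.
effective = {**DEFAULTS, **config}

print(effective)
# {'theme': 'dark', 'language': 'English', 'font_size': 12, 'notifications': True}
```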


## Find Most Common Word in a Log File
A log file contains multiple words, and you need to find the most frequent word using `get()`.

In [30]:
log_words = ["error", "warning", "info", "error", "error", "warning", "info", "info", "error"]

word_counts = {}

# Count occurrences using .get()
for word in log_words:
    word_counts[word] = word_counts.get(word, 0) + 1
print(word_counts)
# Find the most common word
most_common_word = max(word_counts, key=word_counts.get)

print(most_common_word)


{'error': 4, 'warning': 2, 'info': 3}
error


## Count Purchases per Category
A dataset tracks customer purchases in different **product categories**. Use `get()` to count how many purchases were made in each category.

In [31]:
purchases = [
    ("Alice", "Electronics"), ("Bob", "Clothing"), ("Alice", "Home"),
    ("Charlie", "Clothing"), ("Bob", "Electronics"), ("Charlie", "Home"),
    ("Alice", "Electronics"), ("Charlie", "Electronics"), ("Bob", "Clothing")
]

category_counts = {}

for _, category in purchases:
    category_counts[category] = category_counts.get(category, 0) + 1

print(category_counts)


{'Electronics': 4, 'Clothing': 3, 'Home': 2}


## Detect Unauthorized Login Attempts
A system logs **login attempts**, but only authorized users should log in. Use `get()` to count **attempts per user**, then find users who **aren’t authorized**.

In [32]:
authorized_users = {"Alice", "Bob", "Charlie"}
login_attempts = ["Alice", "Bob", "Eve", "Frank", "Charlie", "Mallory", "David"]

# Dictionary to store counts
login_counts = {}

for user in login_attempts:
    login_counts[user] = login_counts.get(user, 0) + 1
print(login_counts)

# Find unauthorized users
unauthorized_attempts = {user: login_counts[user] for user in login_counts if user not in authorized_users}

print(unauthorized_attempts)


{'Alice': 1, 'Bob': 1, 'Eve': 1, 'Frank': 1, 'Charlie': 1, 'Mallory': 1, 'David': 1}
{'Eve': 1, 'Frank': 1, 'Mallory': 1, 'David': 1}


## Challenge Problem: Find First Duplicate Using get()
You have a list of numbers, and you want to find the *first number that appears twice* using `get()`.

In [33]:
numbers = [4, 5, 6, 7, 5, 8, 9, 6]

seen = {}

for num in numbers:
    seen[num] = seen.get(num, 0) + 1
print(seen) # just to see the dict

# Should print the first duplicate number
for key, value in seen.items():
    if value == 2:
        print(key)
        break


{4: 1, 5: 2, 6: 2, 7: 1, 8: 1, 9: 1}
5


### That works for this input, but there is a cleaner, more robust solution:

In [34]:
numbers = [4, 5, 6, 7, 5, 8, 9, 6]

seen = {}

for num in numbers:
    if seen.get(num, 0) == 1:  # If it's already in seen, it's a duplicate!
        print(num)
        break  # Stop as soon as we find the first duplicate
    seen[num] = seen.get(num, 0) + 1  # Otherwise, count it


5
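The two approaches do not always agree. Counting everything first and then scanning the dictionary returns the first key (in insertion order) whose count is exactly 2, which is not necessarily the value that repeats first. The sketch below wraps both ideas in functions (the function names and the `tricky` input are invented for illustration) and uses a set for the early-exit version.

```python
def first_duplicate(numbers):
    """Return the value whose second occurrence comes earliest, or None."""
    seen = set()
    for num in numbers:
        if num in seen:
            return num
        seen.add(num)
    return None


def first_key_with_count_two(numbers):
    """Count everything first, then scan the counts in insertion order."""
    counts = {}
    for num in numbers:
        counts[num] = counts.get(num, 0) + 1
    for key, value in counts.items():
        if value == 2:
            return key
    return None


tricky = [6, 5, 5, 6]  # hypothetical input: 5 repeats before 6 does
print(first_duplicate(tricky))           # 5
print(first_key_with_count_two(tricky))  # 6
print(first_duplicate([4, 5, 6, 7, 5, 8, 9, 6]))  # 5
```

The count-then-scan version also skips any value that appears three or more times, because it tests for a count of exactly 2.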


## What is max(iterable, key=...)?
The `max()` function finds the largest value in an iterable. But when we use key=..., we're telling Python **how to determine what "largest" means.**

✅ Basic Example: Find the Longest Word

In [35]:
words = ["apple", "banana", "cherry", "blueberry"]

longest_word = max(words, key=len)  # Finds the word with the longest length

print(longest_word)  # ✅ "blueberry"


blueberry


### What’s Happening?

* `max()` **compares elements** using `key=len` (word length).
* Instead of comparing **alphabetically**, it compares **by length**.


## Using `max()` with Dictionaries
Calling `max()` directly on a dictionary compares its **keys**, because iterating a dictionary yields keys, not values.
The `key=` argument lets you rank those keys by their values instead.
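A short sketch of what `max()` sees when it is handed a dictionary, and the common variations depending on whether you want the key, the value, or both:

```python
word_counts = {"error": 4, "warning": 2, "info": 3}

# Iterating a dict yields keys, so plain max() compares the key strings.
print(max(word_counts))                       # warning (alphabetically last)

# Largest value only, without its key.
print(max(word_counts.values()))              # 4

# Key with the largest value.
print(max(word_counts, key=word_counts.get))  # error

# (key, value) pair with the largest value.
print(max(word_counts.items(), key=lambda kv: kv[1]))  # ('error', 4)
```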

### ✅ Find the Most Common Word in a Log File

In [36]:
word_counts = {"error": 4, "warning": 2, "info": 3}

most_common_word = max(word_counts, key=word_counts.get)

print(most_common_word)  # ✅ "error" (since 4 is the highest count)


error


#### 💡 What’s Happening?

* `word_counts.get` is passed as the `key=` function, so `max()` calls it on each key to look up that key's count.
* Python **compares values** (`4`, `2`, `3`) instead of keys (`"error"`, `"warning"`, `"info"`).
* The word with the **highest count** is selected.
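For this specific task (counting items and picking the most frequent), the standard library also offers `collections.Counter`. This is a different tool from the `get()` plus `max()` pattern shown here, included as an alternative sketch; the word list is the one used in the earlier log example.

```python
from collections import Counter

log_words = ["error", "warning", "info", "error", "error",
             "warning", "info", "info", "error"]

word_counts = Counter(log_words)

# most_common(n) returns the n highest counts as (item, count) pairs.
print(word_counts.most_common(1))  # [('error', 4)]
print(word_counts.most_common(2))  # [('error', 4), ('info', 3)]
```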

## Using `max()` to Find the Largest Transaction
You have **bank transactions**, and you want to find **the largest one**.

In [37]:
transactions = [
    ("Alice", 500), ("Bob", 1200), ("Charlie", 750), ("David", 300)
]

largest_transaction = max(transactions, key=lambda x: x[1])

print(largest_transaction)  # ✅ ('Bob', 1200)


('Bob', 1200)


#### 💡 What’s Happening?

* The `lambda` picks `x[1]` (the transaction amount) out of each tuple, and that amount is used as the comparison key.
* `max()` selects the tuple where `x[1]` is the highest.
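If you prefer not to write a `lambda`, `operator.itemgetter` from the standard library builds the same kind of key function. The sketch below is an equivalent spelling, and it also shows that `min()` accepts the same `key=` argument.

```python
from operator import itemgetter

transactions = [
    ("Alice", 500), ("Bob", 1200), ("Charlie", 750), ("David", 300)
]

# itemgetter(1) builds a function equivalent to `lambda x: x[1]`.
largest = max(transactions, key=itemgetter(1))
smallest = min(transactions, key=itemgetter(1))

# Unpack the winning tuple to use its parts by name.
name, amount = largest
print(name, amount)  # Bob 1200
print(smallest)      # ('David', 300)
```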

## Using max() to Find the Employee with the Highest Salary

In [38]:
salaries = {
    "Alice": 70000,
    "Bob": 85000,
    "Charlie": 65000,
    "David": 90000
}

highest_paid_employee = max(salaries, key=salaries.get)

print(highest_paid_employee)  # ✅ "David"


David


#### 💡 Why `salaries.get`?

* `salaries.get(employee_name)` retrieves the salary value.
* `max()` finds the employee with the **highest salary**.

## Using `max()` with Multiple Criteria
* Sometimes, you need to **break ties** by comparing multiple things.
* Imagine you're ranking players by **score**, but if scores are equal, the one with the **longest name** should win.

In [39]:
players = [
    ("Alice", 95), ("Bob", 95), ("Charlie", 90), ("David", 95)
]

best_player = max(players, key=lambda x: (x[1], len(x[0])))  # Compare score first, then name length

print(best_player)  # ✅ ('Alice', 95): Alice and David tie on score and name length (5), so max() returns the first


('Alice', 95)


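#### 💡 What’s Happening?

* `x[1]` → Compares by **score** (first priority).
* `len(x[0])` → Breaks ties by **longest name**.
* If two items tie on every criterion (Alice and David here), `max()` returns the first one it encounters.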
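Tuple keys are compared element by element, left to right, which is what makes multi-criteria ranking work. The sketch below varies the key on the same player data to show how ties are resolved; the extra tie-breakers (shortest name, then the name itself) are illustrative choices, not part of the original exercise.

```python
players = [("Alice", 95), ("Bob", 95), ("Charlie", 90), ("David", 95)]

# Score first, then longer name. Alice and David tie on both (95, 5),
# and max() returns the first maximal item it encounters.
by_longest = max(players, key=lambda p: (p[1], len(p[0])))
print(by_longest)   # ('Alice', 95)

# Negate a numeric criterion to prefer the smaller value: shortest name wins.
by_shortest = max(players, key=lambda p: (p[1], -len(p[0])))
print(by_shortest)  # ('Bob', 95)

# Add the name itself as a last tie-breaker so the result
# no longer depends on the order of the list.
by_name = max(players, key=lambda p: (p[1], len(p[0]), p[0]))
print(by_name)      # ('David', 95)
```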

## Using `max()` to Find the Most Frequent IP Address
In **server logs**, we want to find which **IP address** appeared the most.

In [40]:
ip_counts = {
    "192.168.1.1": 15,
    "192.168.1.2": 30,
    "192.168.1.3": 25
}

most_frequent_ip = max(ip_counts, key=ip_counts.get)

print(most_frequent_ip)  # ✅ "192.168.1.2" (30 accesses)


192.168.1.2


#### 💡 Why Use `key=ip_counts.get`?

`max()` will use **each IP’s count** as the comparison criterion.

### 🔥 Takeaways
* ✅ `max()` with `key=...` **lets you control what "maximum" means.**
* ✅ **For dictionaries, pass** `key=d.get` (`max(d, key=d.get)`) to compare values instead of keys.
* ✅ **Can compare by multiple conditions** (`key=lambda x: (x[1], x[2])`).
* ✅ **Useful in real-world scenarios** like **logs, transactions, salaries, and rankings.**



## 📝 PRACTICE PROBLEMS
🚀 Try these out to lock it in:

### 🧩 1️⃣ Find the Oldest Person

In [41]:
people = {
    "Alice": 32,
    "Bob": 45,
    "Charlie": 29,
    "David": 40
}

# Find the oldest person using max()
oldest_person =  max(people, key=people.get)

print(oldest_person)


Bob


### 🧩 2️⃣ Find the Product with the Highest Price

In [42]:
products = [
    ("Laptop", 1200),
    ("Phone", 800),
    ("Tablet", 600),
    ("Monitor", 300)
]

# Find the most expensive product
most_expensive = max(products, key=lambda x:x[1])

print(most_expensive)


('Laptop', 1200)


### 🧩 3️⃣ Find the Most Common Character in a String

In [43]:
text = "banana"

# Count occurrences of each character
char_counts = {}

for char in text:
    char_counts[char] = char_counts.get(char, 0) + 1


# Find the most common character using max()
most_common_char = max(char_counts, key=char_counts.get)

print(most_common_char)


a


## Summary: Key takeaways
- Sets give fast membership tests (average O(1)) and make differences/intersections easy.
- Unpack tuples directly in `for` loops; with `enumerate()`, unpack `(index, value)`.
- For lists of tuples, use `for i, (a, b) in enumerate(pairs)` to avoid mistakes.
- `dict.get(key, default)` prevents `KeyError` and simplifies counting patterns.
- `max(iterable, key=...)` lets you choose what “largest” means (e.g., by value, length, or multiple criteria).

If anything is unclear, add a short comment above the code cell stating your goal (e.g., “Goal: find missing timestamps”).
