# Module 3: Collections

Collections are containers that hold multiple values. Python has several built-in collection types, each with different properties.

## Learning Objectives

- Work with lists, tuples, dictionaries, and sets
- Understand mutability and its implications
- Choose the right collection type for each use case
- Access and modify nested data structures

---
## 1. Lists

Lists are ordered, mutable collections. They're like R vectors but can hold mixed types.

**R Comparison**: `c(1, 2, 3)` → `[1, 2, 3]`

In [None]:
# Creating lists
numbers = [1, 2, 3, 4, 5]
names = ["Alice", "Bob", "Charlie"]
mixed = [1, "hello", 3.14, True]  # Unlike R's atomic vectors, a list can mix types!

print(numbers)
print(names)
print(mixed)

### Indexing and Slicing (0-based!)

In [None]:
fruits = ["apple", "banana", "cherry", "date", "elderberry"]

print(f"First: {fruits[0]}")
print(f"Last: {fruits[-1]}")
print(f"First three: {fruits[:3]}")
print(f"Last two: {fruits[-2:]}")
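
Slices take the form `start:stop:step`, and the `stop` position is excluded. The sketch below reuses the `fruits` list to show the step argument and how slices behave at the edges. The variable names `middle`, `every_other`, `backwards`, and `past_end` are only illustrative.

```python
fruits = ["apple", "banana", "cherry", "date", "elderberry"]

middle = fruits[1:4]        # index 1 up to, but not including, index 4
print(middle)               # ['banana', 'cherry', 'date']

every_other = fruits[::2]   # step of 2
print(every_other)          # ['apple', 'cherry', 'elderberry']

backwards = fruits[::-1]    # negative step walks the list in reverse
print(backwards)            # ['elderberry', 'date', 'cherry', 'banana', 'apple']

past_end = fruits[3:100]    # a slice stops quietly at the end of the list
print(past_end)             # ['date', 'elderberry']
```

A slice always returns a new list, so `fruits` itself is unchanged. Indexing a single position that does not exist, such as `fruits[100]`, raises an `IndexError` instead.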

### Modifying Lists

Lists are **mutable** - you can change them in place:

In [None]:
numbers = [1, 2, 3, 4, 5]
print(f"Original: {numbers}")

# Change an element
numbers[0] = 100
print(f"After change: {numbers}")

# Add to end
numbers.append(6)
print(f"After append: {numbers}")

# Remove last item
last = numbers.pop()
print(f"Popped: {last}, List now: {numbers}")

### Common List Methods

| Method | What it does |
|--------|-------------|
| `append(x)` | Add x to end |
| `extend([...])` | Add multiple items |
| `insert(i, x)` | Insert x at position i |
| `remove(x)` | Remove first occurrence of x |
| `pop()` | Remove and return last item |
| `pop(i)` | Remove and return item at i |
| `sort()` | Sort in place |
| `reverse()` | Reverse in place |

In [None]:
letters = ["c", "a", "b"]
print(f"Original: {letters}")

letters.sort()
print(f"Sorted: {letters}")

letters.reverse()
print(f"Reversed: {letters}")
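
The other methods in the table can be tried the same way. This is a small sketch with an arbitrary list called `items` (an illustrative name), covering `insert`, `extend`, `remove`, and `pop(i)`:

```python
# Illustrative list; the values are arbitrary
items = ["b", "d"]

items.insert(0, "a")        # insert "a" at position 0
items.extend(["e", "f"])    # add several items to the end
items.insert(2, "c")        # insert "c" at position 2
print(items)                # ['a', 'b', 'c', 'd', 'e', 'f']

items.remove("e")           # remove the first occurrence of "e"
second = items.pop(1)       # remove and return the item at index 1
print(second)               # b
print(items)                # ['a', 'c', 'd', 'f']
```

All of these methods change the list in place. Apart from `pop`, they return `None`, so writing `items = items.sort()` would replace your list with `None`.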

### List Operations

In [None]:
a = [1, 2, 3]
b = [4, 5, 6]

# Concatenation
print(f"a + b = {a + b}")

# Repetition
print(f"a * 3 = {a * 3}")

# Membership
print(f"2 in a: {2 in a}")
print(f"7 in a: {7 in a}")

# Length
print(f"len(a): {len(a)}")

### Predict Before You Run

In [None]:
nums = [10, 20, 30, 40, 50]

# Predict each result:
# print(nums[2])
# print(nums[-2])
# print(nums[1:4])
# print(len(nums))
# print(30 in nums)

---
## 2. Tuples

Tuples are like lists but **immutable** - you can't change them after creation.

Use tuples when:
- Data shouldn't change (like coordinates)
- You need a dictionary key (lists can't be keys because they are mutable)
- You want to return multiple values from a function
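
Two of the use cases above can be sketched briefly. The `grid` dictionary and the `min_and_max` function are hypothetical names made up for this illustration.

```python
# Tuples as dictionary keys: hypothetical grid coordinates mapped to labels
grid = {(0, 0): "origin", (3, 4): "target"}
print(grid[(3, 4)])          # target

# "Returning multiple values" really means returning one tuple
def min_and_max(values):
    return min(values), max(values)   # the comma builds a tuple

result = min_and_max([4, 1, 9])
print(result)                # (1, 9)
print(result[0])             # 1
```

A tuple works as a key only when everything inside it is also hashable, so a tuple of numbers or strings is fine but a tuple containing a list is not.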

In [None]:
# Creating tuples
point = (3, 4)
colors = ("red", "green", "blue")
single = (42,)  # Note the comma for single-element tuple!

print(f"Point: {point}")
print(f"First color: {colors[0]}")

In [None]:
# Trying to modify raises an error
point = (3, 4)
# point[0] = 10  # Uncomment to see the error!

### Tuple Unpacking

A very useful feature:

In [None]:
point = (3, 4)

# Unpack into separate variables
x, y = point
print(f"x = {x}, y = {y}")

# Also works with lists!
first, second, third = [1, 2, 3]
print(f"{first}, {second}, {third}")

---
## 3. Dictionaries

Dictionaries store key-value pairs. They're like R's named lists.

**R Comparison**: `list(a=1, b=2)` → `{"a": 1, "b": 2}`

In [None]:
# Creating dictionaries
person = {
    "name": "Alice",
    "age": 25,
    "city": "New York"
}

print(person)
print(f"Name: {person['name']}")
print(f"Age: {person['age']}")

### Accessing Values

Two ways to get values:

In [None]:
person = {"name": "Alice", "age": 25}

# Square bracket notation (raises error if key doesn't exist)
print(person["name"])

# .get() method (returns None or default if key doesn't exist)
print(person.get("name"))
print(person.get("email"))  # Returns None
print(person.get("email", "not specified"))  # Returns default
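
To see what "raises error" means in practice, here is a minimal sketch of a missing-key lookup. It uses `try`/`except`, which this notebook has not introduced, so treat that part as a preview. The variable `email` is an illustrative name.

```python
person = {"name": "Alice", "age": 25}

# Square brackets on a missing key raise KeyError
try:
    person["email"]
except KeyError as err:
    print(f"KeyError: {err}")    # KeyError: 'email'

# Checking with `in` first avoids the error
if "email" in person:
    email = person["email"]
else:
    email = "not specified"
print(email)                     # not specified
```

The `if`/`else` block does the same job as `person.get("email", "not specified")`, which is usually the shorter choice.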

### Predict Before You Run

In [None]:
scores = {"alice": 95, "bob": 87, "charlie": 92}

# Predict each result:
# print(scores["bob"])
# print(scores.get("david"))
# print(scores.get("david", 0))
# print("alice" in scores)
# print(95 in scores)  # Tricky!

### Modifying Dictionaries

In [None]:
person = {"name": "Alice", "age": 25}
print(f"Original: {person}")

# Add or update
person["email"] = "alice@example.com"
person["age"] = 26
print(f"Updated: {person}")

# Remove
del person["email"]
print(f"After delete: {person}")

### Iterating Over Dictionaries

In [None]:
scores = {"alice": 95, "bob": 87, "charlie": 92}

# Just keys (default)
for name in scores:
    print(name)

print("---")

# Keys and values
for name, score in scores.items():
    print(f"{name}: {score}")
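
Looping over values alone works through `.values()`, which is also what you need when testing whether a value (not a key) is present. A short sketch using the same `scores` dictionary, with `total` as an illustrative variable name:

```python
scores = {"alice": 95, "bob": 87, "charlie": 92}

# Just values
for score in scores.values():
    print(score)

# `in` on a dict looks at keys only
print(95 in scores)            # False
print(95 in scores.values())   # True

# Values can be passed to functions such as sum()
total = sum(scores.values())
print(total)                   # 274
```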

---
## 4. Sets

Sets are unordered collections of **unique** values. Great for:
- Removing duplicates
- Membership testing (very fast!)
- Set operations (union, intersection, etc.)

In [None]:
# Creating sets
fruits = {"apple", "banana", "cherry"}
print(fruits)

# Duplicates are automatically removed
numbers = {1, 2, 2, 3, 3, 3}
print(numbers)

In [None]:
# Remove duplicates from a list (the original order is not guaranteed to survive)
with_dupes = [1, 2, 2, 3, 3, 3, 4]
unique = list(set(with_dupes))
print(unique)
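
Because sets are unordered, `list(set(...))` may return the items in a different order than they first appeared. If order matters, one common alternative (shown here as a sketch, relying on dictionaries keeping insertion order in Python 3.7+) is `dict.fromkeys`:

```python
with_dupes = [3, 1, 3, 2, 1]

# Dictionary keys are unique and remember insertion order
unique_ordered = list(dict.fromkeys(with_dupes))
print(unique_ordered)   # [3, 1, 2]
```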

### Set Operations

In [None]:
a = {1, 2, 3, 4}
b = {3, 4, 5, 6}

print(f"a: {a}")
print(f"b: {b}")
print(f"Union (a | b): {a | b}")  # All elements
print(f"Intersection (a & b): {a & b}")  # Common elements
print(f"Difference (a - b): {a - b}")  # In a but not b
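
A few more set operations, sketched with the same `a` and `b`. The results are wrapped in `sorted()` only so the printed order is predictable.

```python
a = {1, 2, 3, 4}
b = {3, 4, 5, 6}

# Symmetric difference: in exactly one of the two sets
print(sorted(a ^ b))              # [1, 2, 5, 6]

# Method forms of the operators
print(sorted(a.union(b)))         # [1, 2, 3, 4, 5, 6]
print(sorted(a.intersection(b)))  # [3, 4]

# Subset test
print({1, 2} <= a)                # True

# Sets are mutable: add and discard change them in place
a.add(10)
a.discard(1)
print(sorted(a))                  # [2, 3, 4, 10]
```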

---
## 5. The Mutability Deep Dive

This is **the most important concept in this notebook**.

### Mutable vs Immutable

| Type | Mutable? |
|------|----------|
| `list` | Yes |
| `dict` | Yes |
| `set` | Yes |
| `tuple` | No |
| `str` | No |
| `int`, `float`, `bool` | No |
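
What "immutable" looks like in practice can be sketched as follows: methods on an immutable object return a new object, and in-place changes are rejected. The `try`/`except` is used only to keep the cell running after the error.

```python
name = "alice"
upper = name.upper()     # returns a NEW string; name is untouched
print(name)              # alice
print(upper)             # ALICE

point = (3, 4)
try:
    point[0] = 10        # tuples do not allow item assignment
except TypeError as err:
    print(f"TypeError: {err}")
print(point)             # (3, 4)
```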

### The Aliasing Trap (R Users: PAY ATTENTION!)

When you assign a mutable object to a new variable, both variables point to the **same object**.

**This is fundamentally DIFFERENT from R!**

#### In R (copies by default):
```r
a <- c(1, 2, 3)
b <- a          # R behaves as if it makes a COPY (copy-on-modify)
b[1] <- 99      # Only b changes
print(a)        # Still c(1, 2, 3) - unchanged!
print(b)        # c(99, 2, 3)
```

#### In Python (shares by default):
```python
a = [1, 2, 3]
b = a           # b points to the SAME list
b[0] = 99       # Both a and b change!
print(a)        # [99, 2, 3] - ALSO changed!
print(b)        # [99, 2, 3]
```

Think of it this way:
- **R**: Each variable gets its own copy of the data
- **Python**: Multiple variables can be "names" for the same object

This WILL trip you up. Let's see it in action:

In [None]:
# This is the trap!
original = [1, 2, 3]
alias = original  # alias points to the SAME list!

alias.append(4)

print(f"original: {original}")  # Also changed!
print(f"alias: {alias}")
print(f"Same object? {original is alias}")
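
The same aliasing happens when a list is passed to a function: the parameter is another name for the caller's list. A minimal sketch, where `add_done_marker` and `with_done_marker` are hypothetical functions written for this illustration:

```python
def add_done_marker(tasks):
    # `tasks` is an alias for the caller's list, so this changes it
    tasks.append("done")

def with_done_marker(tasks):
    # Work on a copy to leave the caller's list alone
    new_tasks = tasks.copy()
    new_tasks.append("done")
    return new_tasks

todo = ["write", "test"]
add_done_marker(todo)
print(todo)                      # ['write', 'test', 'done']

plan = ["draft"]
updated = with_done_marker(plan)
print(plan)                      # ['draft']
print(updated)                   # ['draft', 'done']
```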

### Predict Before You Run

This is crucial. Predict what `a` will contain:

In [None]:
a = ["x", "y", "z"]
b = a
b[0] = "CHANGED"

# What is a now?
# print(a)

In [None]:
a = [1, 2, 3]
b = a
c = a
b.append(4)
c.append(5)

# What is a now? What about len(a)?
# print(a)
# print(len(a))

### How to Actually Copy

If you want independent copies:

In [None]:
original = [1, 2, 3]

# Method 1: .copy()
copy1 = original.copy()

# Method 2: list()
copy2 = list(original)

# Method 3: slice
copy3 = original[:]

# Now they're independent
copy1.append("changed copy1")
print(f"original: {original}")
print(f"copy1: {copy1}")

### Nested Structures: Deep Copy Warning

`.copy()` only makes a **shallow** copy - nested objects are still shared!

In [None]:
original = [[1, 2], [3, 4]]
shallow = original.copy()

# Modify nested list
shallow[0][0] = 99

print(f"original: {original}")  # Also changed!
print(f"shallow: {shallow}")

In [None]:
# For nested structures, use deepcopy
import copy

original = [[1, 2], [3, 4]]
deep = copy.deepcopy(original)

deep[0][0] = 99

print(f"original: {original}")  # Unchanged!
print(f"deep: {deep}")
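
The difference between the two kinds of copy can be checked directly with `is`, which tests whether two names refer to the same object. A small sketch:

```python
import copy

original = [[1, 2], [3, 4]]
shallow = original.copy()
deep = copy.deepcopy(original)

# The outer lists are separate objects in both cases
print(shallow is original)        # False
print(deep is original)           # False

# The inner lists are shared by the shallow copy only
print(shallow[0] is original[0])  # True
print(deep[0] is original[0])     # False
```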

---
## 6. When to Use Which

| Need | Use |
|------|-----|
| Ordered, changeable sequence | `list` |
| Ordered, unchangeable sequence | `tuple` |
| Key-value pairs | `dict` |
| Unique values, fast membership | `set` |
| Need to use as dict key | `tuple` (lists can't be keys) |

### Your Turn: Choose the Collection

For each scenario, which collection type would you use?

In [None]:
# 1. Store a list of todo items that can be added/removed
# Answer: 

# 2. Store a person's name, age, and email
# Answer:

# 3. Store coordinates (x, y) that shouldn't change
# Answer:

# 4. Store unique tags for a blog post
# Answer:

# 5. Store students and their grades
# Answer:

---
## 7. Nested Structures

Real data often has nested structures - lists of dicts, dicts of lists, etc.

In [None]:
# List of dictionaries (like a data frame!)
students = [
    {"name": "Alice", "grade": 95},
    {"name": "Bob", "grade": 87},
    {"name": "Charlie", "grade": 92}
]

# Access nested data
print(students[0])  # First student
print(students[0]["name"])  # First student's name
print(students[1]["grade"])  # Second student's grade
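
Nested data is usually processed with a loop. The sketch below walks the same `students` list, collects one "column" into a plain list, and updates a nested value. The names `grades` and `average` are illustrative.

```python
students = [
    {"name": "Alice", "grade": 95},
    {"name": "Bob", "grade": 87},
    {"name": "Charlie", "grade": 92}
]

# Each `student` is one dictionary (one "row")
for student in students:
    print(f"{student['name']}: {student['grade']}")

# Collect one "column" into a plain list
grades = []
for student in students:
    grades.append(student["grade"])
print(grades)                       # [95, 87, 92]

average = sum(grades) / len(grades)
print(f"Average: {average:.1f}")    # Average: 91.3

# Modify nested data in place: change Bob's grade
students[1]["grade"] = 90
print(students[1])                  # {'name': 'Bob', 'grade': 90}
```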

In [None]:
# This is what Jeopardy questions might look like!
question = {
    "category": "SCIENCE",
    "value": 400,
    "question": "This element has the atomic number 6",
    "answer": "Carbon"
}

print(f"Category: {question['category']}")
print(f"For ${question['value']}: {question['question']}")

### Your Turn: Navigate Nested Data

In [None]:
game_data = {
    "players": [
        {"name": "Alice", "score": 1200},
        {"name": "Bob", "score": 800}
    ],
    "categories": ["HISTORY", "SCIENCE", "LITERATURE"],
    "current_round": 1
}

# YOUR CODE HERE
# 1. Get Bob's score
# bob_score = 

# 2. Get the second category
# second_category = 

# 3. Get Alice's name
# alice_name = 

In [None]:
# 🧪 Grading cell - run this to check your answer
assert 'bob_score' in dir(), "Variable 'bob_score' not defined"
assert 'second_category' in dir(), "Variable 'second_category' not defined"
assert 'alice_name' in dir(), "Variable 'alice_name' not defined"
assert bob_score == 800, f"bob_score should be 800, got {bob_score}"
assert second_category == "SCIENCE", f"second_category should be 'SCIENCE', got '{second_category}'"
assert alice_name == "Alice", f"alice_name should be 'Alice', got '{alice_name}'"
print("✓ Nested data navigation correct!")

---
## 8. Practice Exercises

### Exercise 1: List Manipulation

In [None]:
# Start with this list
numbers = [5, 2, 8, 1, 9]

# YOUR CODE HERE:
# 1. Add 10 to the end
# 2. Remove the 8
# 3. Sort the list
# 4. Print the result (should be [1, 2, 5, 9, 10])

In [None]:
# 🧪 Grading cell - run this to check your answer
assert numbers == [1, 2, 5, 9, 10], f"numbers should be [1, 2, 5, 9, 10], got {numbers}"
print("✓ List manipulation correct!")

### Exercise 2: Dictionary Building

In [None]:
# Create a dictionary representing a Jeopardy question
# Include: category, value, question text, and answer

# YOUR CODE HERE
my_question = {
    # Fill in the fields
}

In [None]:
# 🧪 Grading cell - run this to check your answer
assert 'my_question' in dir(), "Variable 'my_question' not defined"
assert isinstance(my_question, dict), "my_question should be a dictionary"
required_keys = {'category', 'value', 'question', 'answer'}
missing = required_keys - set(my_question.keys())
assert not missing, f"Missing keys: {missing}"
assert isinstance(my_question['value'], int), "value should be an integer (like 200, 400, etc.)"
print("✓ Jeopardy question dictionary correct!")

### Exercise 3: Remove Duplicates

In [None]:
# Given this list with duplicates:
tags = ["python", "data", "python", "science", "data", "ml", "python"]

# YOUR CODE HERE
# Create a list of unique tags (order doesn't matter)
# unique_tags = 

In [None]:
# 🧪 Grading cell - run this to check your answer
assert 'unique_tags' in dir(), "Variable 'unique_tags' not defined"
assert set(unique_tags) == {"python", "data", "science", "ml"}, "unique_tags should contain exactly: python, data, science, ml"
assert len(unique_tags) == 4, f"Should have 4 unique tags, got {len(unique_tags)}"
print("✓ Duplicates removed correctly!")

### Exercise 4: The Copy Quiz

Predict the output of each, then run to check:

In [None]:
# Scenario 1
a = [1, 2, 3]
b = a
a.append(4)
# What is b? Predict, then print

In [None]:
# Scenario 2
a = [1, 2, 3]
b = a.copy()
a.append(4)
# What is b? Predict, then print

In [None]:
# Scenario 3
a = [1, 2, 3]
b = a
a = [4, 5, 6]  # This is reassignment, not modification!
# What is b? Predict, then print

---
## Key Takeaways

1. **Lists are mutable, tuples are not** - Choose based on whether data should change
2. **Dictionaries are key-value pairs** - Like R's named lists
3. **Sets have unique values** - Great for deduplication
4. **Assignment creates aliases** - Multiple variables can point to same object
5. **Use `.copy()` for independence** - Or `copy.deepcopy()` for nested structures
6. **`in` checks membership** - For dicts, checks keys not values

---

**Next up:** Notebook 04 - Control Flow (if statements, loops, and more!)