# Homework Solutions: Python Data Structures, Control Flow, Strings, and Generators

This notebook contains **one possible set of correct solutions** to the homework problems.

- It is written to be **deterministic** (same results each run), apart from the timing measurements in Part 5.
- It includes **sanity-check `assert` statements** so you can quickly verify correctness.
- Some questions allow multiple valid implementations; these are clean, straightforward solutions.


## Setup

We'll use a few standard-library helpers for timing and string processing.


In [1]:
from time import perf_counter
import string

# Helper: simple punctuation stripping
PUNCT_TABLE = str.maketrans("", "", string.punctuation)

## Part 1: Data-Generating Functions

In [2]:
def make_numbers_list(n: int) -> list[int]:
    """Return a list of integers from 0 to n-1."""
    return list(range(n))


def make_numbers_tuple(n: int) -> tuple[int, ...]:
    """Return a tuple of integers from 0 to n-1."""
    return tuple(range(n))


def make_squares_dict(n: int) -> dict[int, int]:
    """Return dict mapping i -> i^2 for i in 0..n-1."""
    return {i: i * i for i in range(n)}


def make_unique_mod_set(n: int, k: int) -> set[int]:
    """Return set of i % k for i in 0..n-1."""
    return {i % k for i in range(n)}

In [4]:
# Quick checks (small n)
assert make_numbers_list(5) == [0, 1, 2, 3, 4]
assert make_numbers_tuple(5) == (0, 1, 2, 3, 4)
assert make_squares_dict(4) == {0: 0, 1: 1, 2: 4, 3: 9}
assert make_unique_mod_set(10, 4) == {0, 1, 2, 3}

print("Part 1 checks passed ✅")

Part 1 checks passed ✅


## Part 2: Assigning and Manipulating Data

In [7]:
n = 10_000

numbers_list = make_numbers_list(n)
numbers_tuple = make_numbers_tuple(n)
squares_dict = make_squares_dict(n)
mods_set = make_unique_mod_set(n, k=97)  # choose a k to make the set interesting

# Q6 (even numbers using list comprehension)
evens = [x for x in numbers_list if x % 2 == 0]

# Q7 (sum values > 5000 from the tuple)
sum_gt_5000 = sum(x for x in numbers_tuple if x > 5000)

# Q8 (keys where square divisible by 3)
keys_square_div_by_3 = [k for k, v in squares_dict.items() if v % 3 == 0]

# Q9 (unique values in mod set + explanation)
mods_count = len(mods_set)

print("len(evens) =", len(evens))
print("sum_gt_5000 =", f"{sum_gt_5000:,}")
print("first 10 keys_square_div_by_3 =", keys_square_div_by_3[:10])
print("mods_count =", mods_count)

len(evens) = 5000
sum_gt_5000 = 37,492,500
first 10 keys_square_div_by_3 = [0, 3, 6, 9, 12, 15, 18, 21, 24, 27]
mods_count = 97


In [8]:
# Sanity checks for Part 2

# Evens in 0..9999: exactly 5000 even numbers (0 included)
assert len(evens) == 5000
assert evens[:5] == [0, 2, 4, 6, 8]
assert evens[-1] == 9998

# sum of integers 5001..9999
# formula: sum 0..9999 - sum 0..5000
sum_0_9999 = 9999 * 10000 // 2
sum_0_5000 = 5000 * 5001 // 2
assert sum_gt_5000 == (sum_0_9999 - sum_0_5000)

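# A square is divisible by 3 iff its base integer is divisible by 3 (3 is prime),
# so the keys must be exactly the multiples of 3 below n
assert all(k % 3 == 0 for k in keys_square_div_by_3)
assert keys_square_div_by_3 == list(range(0, n, 3))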

# For mods_set where k=97 and n=10000, should contain all residues 0..96
assert mods_count == 97
assert mods_set == set(range(97))

print("Part 2 checks passed ✅")

Part 2 checks passed ✅


**Explanation for Q9 (unique mod values):**  
For `i % k` with a positive `k`, the result is always one of `0, 1, ..., k-1`.  
If `n >= k`, every residue appears, so the set has size `k`. If `n < k`, only `0, 1, ..., n-1` appear, so it has size `n`. In both cases the size is `min(n, k)`.
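
As a quick check, the size rule described above amounts to `min(n, k)`. The sketch below tests that on a few small cases. The helper name `residue_count` is hypothetical (not part of the homework), and `k` is assumed to be a positive integer.

```python
def residue_count(n: int, k: int) -> int:
    """Return how many distinct values i % k takes for i in 0..n-1 (k > 0 assumed)."""
    return len({i % k for i in range(n)})


# (n, k) pairs covering n > k, n < k, n == k, the empty range, and the homework values
cases = [(10, 4), (3, 7), (97, 97), (0, 5), (10_000, 97)]
for n_case, k_case in cases:
    assert residue_count(n_case, k_case) == min(n_case, k_case)

print([residue_count(n_case, k_case) for n_case, k_case in cases])  # [4, 3, 97, 0, 97]
```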


## Part 3: Strings and String Methods

In [9]:
# Q10

def make_sentence(words: list[str]) -> str:
    """Join words with spaces into a single sentence."""
    return " ".join(words)

In [None]:
# Use a fixed paragraph for deterministic results
paragraph = (
    "Python is great, and Python is fun! "
    "Data science with Python: clean, analyze, and model data."
)

# Q11: word counts (case-insensitive)
cleaned = paragraph.lower().translate(PUNCT_TABLE)
tokens = cleaned.split()

word_counts = {}
for w in tokens:
    word_counts[w] = word_counts.get(w, 0) + 1

# Q12: words longer than 4 chars after stripping punctuation
# (we already stripped punctuation above)
long_words = [w for w in tokens if len(w) > 4]

sentence = make_sentence(["This", "is", "a", "sentence."])

print("sentence:", sentence)
print("word_counts:", word_counts)
print("long_words:", long_words)


sentence: This is a sentence.
word_counts: {'python': 3, 'is': 2, 'great': 1, 'and': 2, 'fun': 1, 'data': 2, 'science': 1, 'with': 1, 'clean': 1, 'analyze': 1, 'model': 1}
long_words: ['python', 'great', 'python', 'science', 'python', 'clean', 'analyze', 'model']


In [None]:
# Checks for Part 3
assert make_sentence(["hello", "world"]) == "hello world"

# Deterministic expected counts from our paragraph
# paragraph lower/punct-stripped is:
# "python is great and python is fun data science with python clean analyze and model data"
expected_counts = {
    "python": 3,
    "is": 2,
    "great": 1,
    "and": 2,
    "fun": 1,
    "data": 2,
    "science": 1,
    "with": 1,
    "clean": 1,
    "analyze": 1,
    "model": 1,
}
assert word_counts == expected_counts

# Long words (>4)
# tokens: python,is,great,and,python,is,fun,data,science,with,python,clean,analyze,and,model,data
# long words: python,great,python,science,python,clean,analyze,model
assert long_words == ["python", "great", "python", "science", "python", "clean", "analyze", "model"]

print("Part 3 checks passed ✅")
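
The manual counting loop in the Part 3 solution can also be written with `collections.Counter` from the standard library. This is an alternative sketch, not a required part of the solution. The names `demo_paragraph`, `demo_tokens`, and `counter` are illustrative, and the paragraph is repeated so the block runs on its own.

```python
from collections import Counter
import string

demo_paragraph = (
    "Python is great, and Python is fun! "
    "Data science with Python: clean, analyze, and model data."
)

# Same cleaning steps as the solution: lowercase, strip punctuation, split on whitespace
demo_tokens = demo_paragraph.lower().translate(
    str.maketrans("", "", string.punctuation)
).split()

counter = Counter(demo_tokens)  # dict subclass mapping token -> count

assert counter["python"] == 3
assert counter["data"] == 2
print(counter.most_common(1))  # [('python', 3)]
```

Because `Counter` is a `dict` subclass, `counter == word_counts` would also hold for the dictionary built by the loop.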

## Part 4: Control Flow and Iteration

In [None]:
# Q13: count numbers divisible by both 2 and 5 => divisible by 10
count_div_by_2_and_5 = 0
for x in numbers_list:
    if x % 2 == 0 and x % 5 == 0:
        count_div_by_2_and_5 += 1

# Q14: average of dict values whose keys are odd
total = 0
count = 0
for k, v in squares_dict.items():
    if k % 2 == 1:
        total += v
        count += 1
avg_odd_keys = total / count

# Q15: classify 0..20 with if/elif/else
labels = {}
for x in range(21):
    if x < 10:
        labels[x] = "small"
    elif x <= 15:  # x >= 10 is already guaranteed by the branch above
        labels[x] = "medium"
    else:
        labels[x] = "large"

print("count_div_by_2_and_5 =", count_div_by_2_and_5)
print("avg_odd_keys =", avg_odd_keys)
print("labels sample:", {k: labels[k] for k in [0, 9, 10, 15, 16, 20]})

In [None]:
# Checks for Part 4

# In 0..9999, multiples of 10: 0,10,...,9990 => 1000 values
assert count_div_by_2_and_5 == 1000

# Average of squares of odd integers 1..9999
# Compute expected via formula using sums:
# sum_{i=1..m} i^2 = m(m+1)(2m+1)/6
# odd squares sum = total squares - even squares
m = 9999
sum_sq_total = m * (m + 1) * (2 * m + 1) // 6

# even i are 2j for j=0..4999, squares are 4*j^2
m2 = 4999
sum_sq_j = m2 * (m2 + 1) * (2 * m2 + 1) // 6
sum_sq_even = 4 * sum_sq_j

sum_sq_odd = sum_sq_total - sum_sq_even
count_odds = 5000  # odd integers in 0..9999 are 1, 3, ..., 9999 => 5000 values
expected_avg_odd = sum_sq_odd / count_odds

assert abs(avg_odd_keys - expected_avg_odd) < 1e-9

# Classification checks
assert labels[0] == "small"
assert labels[9] == "small"
assert labels[10] == "medium"
assert labels[15] == "medium"
assert labels[16] == "large"
assert labels[20] == "large"

print("Part 4 checks passed ✅")
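
The expected average in the Part 4 check can also be cross-checked with a closed form for the sum of the first `N` odd squares, `N(2N-1)(2N+1)/3`. The sketch below is an independent verification under that identity. `N`, `brute_sum`, and `closed_sum` are names introduced here for illustration.

```python
N = 5000  # number of odd integers in 1..9999

# Brute-force sum of odd squares: 1*1 + 3*3 + ... + 9999*9999
brute_sum = sum(i * i for i in range(1, 2 * N, 2))

# Closed form: N(2N-1)(2N+1)/3, which is always an integer
closed_sum = N * (2 * N - 1) * (2 * N + 1) // 3

assert brute_sum == closed_sum
print(closed_sum / N)  # 33333333.0
```

Dividing the closed form by `N` gives `(4N^2 - 1) / 3`, so the average of the odd-key squares is exactly 33,333,333.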

## Part 5: Generators vs Iterables (Performance + Memory)

### Notes

- A **list-based** approach builds the entire list in memory first.
- A **generator** yields values one at a time (lazy), which is usually **more memory efficient**.
- Timing results vary by machine; focus on *relative behavior* and the memory concept.

In Colab, you can also use `%time` / `%timeit`, but here we use `perf_counter()` for portability.
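
To make the memory point concrete, the sketch below compares the container sizes reported by `sys.getsizeof`. It assumes CPython. Note that `getsizeof` reports only the list object itself (its array of references), not the integers it points to, so the list's real footprint is larger than shown. The size `n_demo` is an arbitrary choice.

```python
import sys

n_demo = 100_000

squares_list = [x * x for x in range(n_demo) if x % 2 == 0]  # built eagerly
squares_gen = (x * x for x in range(n_demo) if x % 2 == 0)   # nothing computed yet

list_bytes = sys.getsizeof(squares_list)  # grows with the number of elements
gen_bytes = sys.getsizeof(squares_gen)    # small object whose size does not depend on n_demo

print(f"list object:      {list_bytes:,} bytes")
print(f"generator object: {gen_bytes:,} bytes")
assert gen_bytes < list_bytes
```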


In [None]:
from collections.abc import Iterator


def gen_even_squares(n: int) -> Iterator[int]:
    """Yield squares of even numbers from 0 to n-1."""
    for x in range(n):
        if x % 2 == 0:
            yield x * x


def list_even_squares(n: int) -> list[int]:
    """Return a list of squares of even numbers from 0 to n-1."""
    return [x * x for x in range(n) if x % 2 == 0]

In [None]:
# Quick equivalence test
assert list(gen_even_squares(20)) == list_even_squares(20)
print("Generator and list versions match ✅")

In [None]:
n_big = 10_000_000

# Time generator approach (no intermediate list)
t0 = perf_counter()
gen_sum = sum(gen_even_squares(n_big))
t1 = perf_counter()

# Time list approach (build list first)
t2 = perf_counter()
lst_sum = sum(list_even_squares(n_big))
t3 = perf_counter()

print("gen_sum =", gen_sum)
print("lst_sum =", lst_sum)
print(f"Generator time: {t1 - t0:.4f} seconds")
print(f"List time:      {t3 - t2:.4f} seconds")

In [None]:
# Sums must match
assert gen_sum == lst_sum
print("Part 5 checks passed ✅")
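
A single `perf_counter()` measurement can be noisy. As a hedged alternative, the standard-library `timeit` module can repeat each measurement and report the best run. The sketch redefines the two functions so it runs on its own, and uses a smaller `n_timing` (an arbitrary choice) to keep it quick. Absolute times will differ by machine.

```python
import timeit


def gen_even_squares(n: int):
    """Yield squares of even numbers from 0 to n-1."""
    for x in range(n):
        if x % 2 == 0:
            yield x * x


def list_even_squares(n: int) -> list[int]:
    """Return a list of squares of even numbers from 0 to n-1."""
    return [x * x for x in range(n) if x % 2 == 0]


n_timing = 200_000
calls_per_run = 3

# Each entry is the total time for `calls_per_run` calls; 5 independent runs per approach
gen_times = timeit.repeat(
    lambda: sum(gen_even_squares(n_timing)), number=calls_per_run, repeat=5
)
list_times = timeit.repeat(
    lambda: sum(list_even_squares(n_timing)), number=calls_per_run, repeat=5
)

# The minimum is usually the least noisy estimate
print(f"Generator best: {min(gen_times) / calls_per_run:.4f} s per call")
print(f"List best:      {min(list_times) / calls_per_run:.4f} s per call")
```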

### Explanation (Q19)

The generator version is more memory efficient because it **does not create a full list** of all even squares.  
Instead, it yields one value at a time, and `sum()` consumes each value immediately.  
The list version must allocate memory for ~5,000,000 integers (for `n_big = 10_000_000`), which increases memory usage and can add allocation overhead.
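
One way to observe the memory difference is the standard-library `tracemalloc` module, which tracks memory allocated by Python code. The sketch below assumes CPython and uses a hypothetical helper `peak_bytes` and a smaller `n_mem` so it finishes quickly. Exact byte counts vary by Python version.

```python
import tracemalloc


def peak_bytes(func):
    """Run func() and return (result, peak traced memory in bytes)."""
    tracemalloc.start()
    result = func()
    _, peak = tracemalloc.get_traced_memory()  # returns (current, peak)
    tracemalloc.stop()
    return result, peak


n_mem = 200_000

# Generator expression: values are produced and consumed one at a time
gen_result, gen_peak = peak_bytes(
    lambda: sum(x * x for x in range(n_mem) if x % 2 == 0)
)

# List comprehension: the whole list exists in memory before sum() starts
list_result, list_peak = peak_bytes(
    lambda: sum([x * x for x in range(n_mem) if x % 2 == 0])
)

assert gen_result == list_result
print(f"Generator peak: {gen_peak:,} bytes")
print(f"List peak:      {list_peak:,} bytes")
```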
