# Homework Starter Notebook: Python Data Structures, Control Flow, Strings, and Generators

**Note: this homework is not formatted for `nbgrader`, even though it is stored in the nbgrader directory.**

Complete the **TODO** sections in each code cell.  
Do **not** delete or change the provided **sanity checks** (assertions) unless instructed.

## Rules
- Your code should run **top-to-bottom** with no errors.
- You may be creative in *how* you implement solutions, but outputs must satisfy the checks.
- The sanity checks are **not** the full grading tests—passing them only means you're on the right track.
- Delete each `raise NotImplementedError` statement once you complete the corresponding implementation.

---


## Setup

Run this cell first. You may import additional **standard library** modules if needed.


In [None]:
from time import perf_counter
import string

PUNCT_TABLE = str.maketrans("", "", string.punctuation)

print("Setup complete ✅")

## Part 1: Data-Generating Functions

Implement each function. Each function must **return** the requested data structure.
- Do **not** print inside these functions.
- Make sure your functions work for small `n` and large `n`.
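
As a minimal sketch of the "return, don't print" pattern, the functions below build containers from `range(n)` with comprehensions. The names `make_cubes_list`, `make_cubes_dict`, and `make_digit_count_set` are hypothetical and are not part of the assignment; they compute different quantities from Q1–Q4.

```python
# Hypothetical examples (NOT the homework functions):
# each builder RETURNS its container; the caller decides whether to print it.

def make_cubes_list(n):
    """Return [0**3, 1**3, ..., (n-1)**3]."""
    return [i ** 3 for i in range(n)]


def make_cubes_dict(n):
    """Return a dict mapping i -> i**3 for i in 0..n-1."""
    return {i: i ** 3 for i in range(n)}


def make_digit_count_set(n):
    """Return the set of digit counts len(str(i)) for i in 0..n-1."""
    return {len(str(i)) for i in range(n)}


cubes = make_cubes_list(4)                 # [0, 1, 8, 27]
cubes_tuple = tuple(cubes)                 # (0, 1, 8, 27)
cube_map = make_cubes_dict(4)              # {0: 0, 1: 1, 2: 8, 3: 27}
digit_counts = make_digit_count_set(150)   # {1, 2, 3}
```

Because `range(n)` produces its values lazily, the same code works unchanged for `n = 0` (empty containers) and for large `n`.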


In [None]:
# TODO (Q1-Q4): Implement the functions below

def make_numbers_list(n):
    """Return a list of integers from 0 to n-1."""
    # YOUR CODE HERE
    raise NotImplementedError


def make_numbers_tuple(n):
    """Return a tuple of integers from 0 to n-1."""
    # YOUR CODE HERE
    raise NotImplementedError


def make_squares_dict(n):
    """Return a dict mapping i -> i**2 for i in 0..n-1."""
    # YOUR CODE HERE
    raise NotImplementedError


def make_unique_mod_set(n, k):
    """Return a set containing i % k for i in 0..n-1."""
    # YOUR CODE HERE
    raise NotImplementedError

In [None]:
# Sanity checks (do not edit)

# Type checks
out1 = make_numbers_list(5)
out2 = make_numbers_tuple(5)
out3 = make_squares_dict(5)
out4 = make_unique_mod_set(10, 4)

assert isinstance(out1, list)
assert isinstance(out2, tuple)
assert isinstance(out3, dict)
assert isinstance(out4, set)

# Basic content checks
assert out1[0] == 0 and out1[-1] == 4 and len(out1) == 5
assert out2[0] == 0 and out2[-1] == 4 and len(out2) == 5
assert out3[0] == 0 and out3[1] == 1 and out3[4] == 16 and len(out3) == 5

# Mod set must only contain residues in [0, k-1]
k = 4
assert all((0 <= x < k) for x in out4)
assert len(out4) <= k

print("Part 1 sanity checks passed ✅")

## Part 2: Assigning and Manipulating Data

Use **n = 10_000**. Call your Part 1 functions and assign the results to variables.

Then implement the TODOs:
- a list comprehension (even numbers)
- a conditional sum (values > 5000)
- filtering dictionary keys based on their values
- computing and explaining the number of unique values in the modulo set

Write short explanations where requested.
For very short explanations, you can leave comments in the code using the `#` character.
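
The first three techniques listed above can be illustrated on small, made-up data. The variables `word_lengths`, `values`, `multiples_of_3`, `sum_small`, and `long_names` are hypothetical and unrelated to the homework variables; this is a sketch of the idioms, not a solution.

```python
# Hypothetical data (NOT the homework structures)
word_lengths = {"map": 3, "filter": 6, "reduce": 6, "zip": 3, "enumerate": 9}
values = list(range(20))

# List comprehension with a condition: keep multiples of 3
multiples_of_3 = [x for x in values if x % 3 == 0]      # [0, 3, 6, 9, 12, 15, 18]

# Sum over a condition, using a generator expression inside sum()
sum_small = sum(x for x in values if x < 5)             # 10

# Filter dictionary keys based on their values
long_names = [word for word, length in word_lengths.items() if length > 5]
# ['filter', 'reduce', 'enumerate']
```

Passing a generator expression to `sum()` avoids building a temporary list, and `dict.items()` yields `(key, value)` pairs so both can be tested in one comprehension.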


In [None]:
# TODO (Q5): Create the large data structures and store them in variables

n = 10_000
# k_mod = ... (choose a reasonable value, e.g. 97)


# YOUR CODE HERE
# numbers_list = ...
# numbers_tuple = ...
# squares_dict = ...
# mods_set = ...  (use k_mod here.)

raise NotImplementedError

In [None]:
# Sanity checks (do not edit)

assert n == 10_000
assert isinstance(numbers_list, list) and len(numbers_list) == n
assert isinstance(numbers_tuple, tuple) and len(numbers_tuple) == n
assert isinstance(squares_dict, dict) and len(squares_dict) == n
assert isinstance(mods_set, set)

# Quick consistency: squares_dict should match i*i at a few points
for i in [0, 1, 2, 123, 9999]:
    assert squares_dict[i] == i * i

print("Data structures created ✅")

In [None]:
# TODO (Q6): Use a list comprehension to create a list of only even numbers from numbers_list
# evens = ...

raise NotImplementedError

In [None]:
# Sanity checks (do not edit)
assert isinstance(evens, list)
assert all(x % 2 == 0 for x in evens[:100])  # spot check
assert evens[0] == 0
assert evens[-1] % 2 == 0
print("Q6 sanity checks passed ✅")

In [None]:
# TODO (Q7): Compute the sum of values > 5000 from numbers_tuple
# sum_gt_5000 = ...

raise NotImplementedError

In [None]:
# Sanity checks (do not edit)
assert isinstance(sum_gt_5000, int)
assert sum_gt_5000 > 0
# Must be less than sum of all numbers (0..9999)
assert sum_gt_5000 < sum(numbers_tuple)
print("Q7 sanity checks passed ✅")

In [None]:
# TODO (Q8): Create a list of keys where squares_dict[key] is divisible by 3
# keys_square_div_by_3 = ...

raise NotImplementedError

In [None]:
# Sanity checks (do not edit)
assert isinstance(keys_square_div_by_3, list)
assert all(isinstance(k, int) for k in keys_square_div_by_3[:50])  # spot check
# For any key you include, its square must be divisible by 3
assert all(squares_dict[k] % 3 == 0 for k in keys_square_div_by_3[:200])  # partial check
print("Q8 sanity checks passed ✅")

### TODO (Q9): Explain the modulo set size (short answer)

In this markdown cell, explain **why** `len(mods_set)` is what it is (1–3 sentences).  
Hint: consider the possible outcomes of `i % k`.
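
If the hint feels abstract, one approach is to print the residues for a few small, hypothetical `(n, k)` pairs and look for a pattern before reasoning about `mods_set`. The helper `residues` below is illustrative only.

```python
def residues(n, k):
    """Hypothetical helper: sorted distinct values of i % k for i in range(n)."""
    return sorted({i % k for i in range(n)})


# Try a few small cases and compare the number of residues with n and k
for n_small, k_small in [(3, 5), (7, 3), (20, 4)]:
    print(f"n={n_small}, k={k_small}: {residues(n_small, k_small)}")
```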


In [None]:
# TODO (Q9): Compute the number of unique values in mods_set
# mods_count = ...

raise NotImplementedError

In [None]:
# Sanity checks (do not edit)
assert isinstance(mods_count, int)
assert mods_count == len(mods_set)
# Mod set size can never exceed k (whatever k you used)
# Try to define k as a variable named `k_mod` when you build the set.
assert 'k_mod' in globals()
assert mods_count <= k_mod
print("Q9 sanity checks passed ✅")

## Part 3: Strings and String Methods

You will work with strings and basic string methods.

Use the `paragraph` string defined below **exactly as given** (do not edit it) so that grading remains consistent.
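
The string methods used in Q10–Q12 can be previewed on a different, hypothetical sentence (not the homework `paragraph`). The sketch below builds its own punctuation-deleting table with `str.maketrans`, the same way the Setup cell builds `PUNCT_TABLE`, then lowercases, strips punctuation, splits on whitespace, rejoins with `" ".join`, and tallies tokens with `dict.get`. The names `text`, `strip_punct`, and `counts` are illustrative.

```python
import string

# Hypothetical sentence, NOT the homework paragraph
text = "Hello, World! Hello again; world."

# str.maketrans("", "", chars) builds a table that deletes every character in chars
strip_punct = str.maketrans("", "", string.punctuation)

cleaned = text.lower().translate(strip_punct)   # "hello world hello again world"
tokens = cleaned.split()                         # split on any run of whitespace
rejoined = " ".join(tokens)                      # join a list of strings with spaces

# Tally tokens; dict.get(key, 0) supplies 0 for words not seen yet
counts = {}
for tok in tokens:
    counts[tok] = counts.get(tok, 0) + 1
# {'hello': 2, 'world': 2, 'again': 1}
```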


In [None]:
# TODO (Q10): Implement make_sentence(words)
# - words: list of strings
# - return: a single string joined by spaces

def make_sentence(words):
    """Return the strings in `words` joined by single spaces."""
    # YOUR CODE HERE
    raise NotImplementedError

In [None]:
# Sanity checks (do not edit)
s = make_sentence(["hello", "world"])
print(s)
assert s == "hello world"
assert isinstance(make_sentence(["one"]), str)
print("Q10 sanity checks passed ✅")

In [None]:
paragraph = (
    "Python is great, and Python is fun! "
    "Data science with Python: clean, analyze, and model data."
)

assert isinstance(paragraph, str) and len(paragraph) > 0
print("Paragraph loaded ✅")

In [None]:
# TODO (Q11): Word counts (case-insensitive)
# Create a dictionary word_counts mapping word -> frequency.
# Requirements:
# - Case-insensitive (treat 'Python' and 'python' as the same word)
# - Ignore punctuation (use PUNCT_TABLE or another method)
# - Split on whitespace after cleaning

# word_counts = ...

raise NotImplementedError

In [None]:
# Sanity checks (do not edit)
assert isinstance(word_counts, dict)
assert all(isinstance(k, str) for k in word_counts.keys())
assert all(k == k.lower() for k in word_counts.keys())  # keys should be lowercase
assert sum(word_counts.values()) > 0
# Total tokens should match sum of counts for a proper bag-of-words
cleaned = paragraph.lower().translate(PUNCT_TABLE)
tokens = cleaned.split()
assert sum(word_counts.values()) == len(tokens)
print("Q11 sanity checks passed ✅")

In [None]:
# TODO (Q12): Create a list of all words longer than 4 characters after stripping punctuation.
# - Use the same cleaned tokens as Q11 (lowercase + punctuation removed)
# - long_words = ...

raise NotImplementedError

In [None]:
# Sanity checks (do not edit)
assert isinstance(long_words, list)
assert all(isinstance(w, str) for w in long_words)
assert all(w == w.lower() for w in long_words)
assert all(len(w) > 4 for w in long_words)
print("Q12 sanity checks passed ✅")

## Part 4: Control Flow and Iteration
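
A minimal sketch of the two control-flow patterns used in this part, on hypothetical data (`temperatures`, `in_range_count`, `describe`, and `labels_demo` are illustrative names, not homework variables): counting items that satisfy two conditions with a loop and an `if`, and classifying values with `if` / `elif` / `else`.

```python
# Hypothetical data (NOT the homework's numbers_list)
temperatures = [3, 12, 18, 25, 31, 8]

# Counting with an explicit loop and an if-statement
in_range_count = 0
for t in temperatures:
    if t > 10 and t < 30:   # both conditions must hold
        in_range_count += 1
# in_range_count == 3  (12, 18, 25)


# Classifying with if / elif / else; branches are checked in order
def describe(t):
    if t < 10:
        return "cold"
    elif t <= 20:
        return "mild"
    else:
        return "warm"


labels_demo = {t: describe(t) for t in temperatures}
```

Because `elif` branches are tested in order, `describe` only reaches `t <= 20` once `t < 10` has failed, so each boundary value falls into exactly one category.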

In [None]:
# TODO (Q13): Count numbers divisible by BOTH 2 and 5 in numbers_list using a loop and if-statement.
# count_div_by_2_and_5 = ...

raise NotImplementedError

In [None]:
# Sanity checks (do not edit)
assert isinstance(count_div_by_2_and_5, int)
assert count_div_by_2_and_5 > 0
# Must match counting via a comprehension (reference check)
ref = sum(1 for x in numbers_list if (x % 2 == 0 and x % 5 == 0))
assert count_div_by_2_and_5 == ref
print("Q13 sanity checks passed ✅")

In [None]:
# TODO (Q14): Compute the average of squares_dict values whose keys are odd.
# avg_odd_keys = ...

raise NotImplementedError

In [None]:
# Sanity checks (do not edit)
assert isinstance(avg_odd_keys, float)
assert avg_odd_keys > 0
# Compare against a reference calculation (should match closely)
ref_avg = sum(v for k, v in squares_dict.items() if k % 2 == 1) / sum(1 for k in squares_dict.keys() if k % 2 == 1)
assert abs(avg_odd_keys - ref_avg) < 1e-9
print("Q14 sanity checks passed ✅")

In [None]:
# TODO (Q15): Classify numbers 0..20 into 'small' (<10), 'medium' (10–15 inclusive), 'large' (>15)
# Store in a dictionary labels: number -> label
# labels = ...

raise NotImplementedError

In [None]:
# Sanity checks (do not edit)
assert isinstance(labels, dict)
assert set(labels.keys()) == set(range(21))
assert labels[0] == "small"
assert labels[10] == "medium"
assert labels[15] == "medium"
assert labels[16] == "large"
print("Q15 sanity checks passed ✅")

## Part 5: Generators vs Iterables (Performance + Memory)

You will implement:
- a **generator function** that yields values lazily
- an equivalent **list-based** function

Then you will time both approaches for a large `n`.

### Hint
A generator is typically more memory efficient because it does not store all values at once.
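
The sketch below illustrates the hint with hypothetical helpers `gen_odd_cubes` and `list_odd_cubes` (deliberately different from Q16–Q17). It compares object sizes with `sys.getsizeof`, which reports only the size of the container object itself, not the integers it refers to, and times both with `perf_counter`. Exact numbers depend on your Python version and machine.

```python
import sys
from time import perf_counter


def gen_odd_cubes(n):
    """Hypothetical generator: yield i**3 for odd i in 0..n-1, one at a time."""
    for i in range(1, n, 2):
        yield i ** 3


def list_odd_cubes(n):
    """Hypothetical list version: return all i**3 for odd i in 0..n-1 at once."""
    return [i ** 3 for i in range(1, n, 2)]


n_demo = 100_000

# The generator object stays small; the list's size grows with n
gen_size = sys.getsizeof(gen_odd_cubes(n_demo))
list_size = sys.getsizeof(list_odd_cubes(n_demo))

start = perf_counter()
gen_total = sum(gen_odd_cubes(n_demo))
gen_elapsed = perf_counter() - start

start = perf_counter()
list_total = sum(list_odd_cubes(n_demo))
list_elapsed = perf_counter() - start

print(f"generator: {gen_size} bytes, {gen_elapsed:.4f} s")
print(f"list:      {list_size} bytes, {list_elapsed:.4f} s")
```

Both versions produce the same total; the difference is that the list holds every value in memory before `sum` starts, while the generator hands `sum` one value at a time.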


In [None]:
# TODO (Q16): Generator function
# gen_even_squares(n) should YIELD squares of even numbers from 0 to n-1

def gen_even_squares(n):
    # YOUR CODE HERE
    raise NotImplementedError

In [None]:
# TODO (Q17): List-based equivalent
# list_even_squares(n) should RETURN a list of squares of even numbers from 0 to n-1

def list_even_squares(n):
    # YOUR CODE HERE
    raise NotImplementedError

In [None]:
# Sanity checks (do not edit)
small_n = 50
g = list(gen_even_squares(small_n))
l = list_even_squares(small_n)

assert isinstance(l, list)
assert g == l
assert all(x >= 0 for x in g)
assert all(int(v**0.5) ** 2 == v for v in g[:10])  # spot-check squares

print("Q16–Q17 sanity checks passed ✅")

In [None]:
# TODO (Q18): Time both approaches for n = 1_000_000 and compute sums.
# Use perf_counter() for timing.
#
# Requirements:
# - Compute gen_sum = sum(gen_even_squares(n_big))
# - Compute lst_sum = sum(list_even_squares(n_big))
# - Record the elapsed times in gen_time (generator) and list_time (list)
# - Print both sums and both times
#
# n_big = 1_000_000
# ...

raise NotImplementedError

In [None]:
# Sanity checks (do not edit)
assert 'n_big' in globals() and n_big == 1_000_000
assert isinstance(gen_sum, int)
assert isinstance(lst_sum, int)
assert gen_sum == lst_sum
assert isinstance(gen_time, float) and gen_time > 0
assert isinstance(list_time, float) and list_time > 0

print("Q18 sanity checks passed ✅")

### TODO (Q19): Explanation (2–4 sentences)

In this markdown cell, explain **why** the generator version is more memory efficient and what that means in practice.
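
One practical trade-off that comes with lazy evaluation is that a generator can be consumed only once. The hypothetical `countdown` generator below shows that a second pass yields nothing, while a list can be traversed as many times as needed.

```python
def countdown(n):
    """Hypothetical generator: yield n, n-1, ..., 1."""
    while n > 0:
        yield n
        n -= 1


gen = countdown(3)
first_pass = list(gen)    # [3, 2, 1]
second_pass = list(gen)   # [] because the generator is already exhausted

values = list(countdown(3))
print(sum(values), max(values))  # a list can be reused for several passes
```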


---
## Submission checklist

Before you submit:
- Restart runtime/kernel
- Run **all cells** top-to-bottom
- Ensure every sanity check prints ✅
