# Homework Starter Notebook: Python Data Structures, Control Flow, Strings, and Generators

Complete the **TODO** sections in each code cell.  
Do **not** delete or change the provided **sanity checks** (assertions) unless instructed.

## Rules
- Your code should run **top-to-bottom** with no errors.
- You may be creative in *how* you implement solutions, but outputs must satisfy the checks.
- The sanity checks are **not** the full grading tests—passing them only means you're on the right track.
- You should delete `raise NotImplementedError` statements after completing each implementation.

---


In [132]:
score0 = 0.6

#! Please see the comments in the code blocks. The report's formatting is generally good, but we expect closer attention to detail and to the instructions.
# Some repeated imports were observed.

## Setup

Run this cell first. You may import additional **standard library** modules if needed.


In [133]:
from time import perf_counter
import string

PUNCT_TABLE = str.maketrans("", "", string.punctuation)

print("Setup complete ✅")

Setup complete ✅
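
As a hedged illustration of what `PUNCT_TABLE` does: `str.maketrans("", "", string.punctuation)` builds a table that maps every ASCII punctuation character to `None`, so `str.translate` deletes those characters. The sample string below is made up for this sketch and is not part of the assignment.

```python
import string

# Same construction as in the setup cell: the third argument lists characters to delete.
PUNCT_TABLE = str.maketrans("", "", string.punctuation)

sample = "Hello, world! It's 3:15."  # hypothetical sample text
cleaned = sample.lower().translate(PUNCT_TABLE)  # lowercase, then drop punctuation
tokens = cleaned.split()  # split on whitespace after cleaning

print(tokens)
```

Note that apostrophes are punctuation too, so a contraction such as `It's` collapses to `its` rather than splitting into two tokens.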


## Part 1: Data-Generating Functions

Implement each function. Each function must **return** the requested data structure.
- Do **not** print inside these functions.
- Make sure your functions work for small `n` and large `n`.


In [134]:
# TODO (Q1-Q4): Implement the functions below

def make_numbers_list(n):
    """Return a list of integers from 0 to n-1."""
    return list(range(n))

def make_numbers_tuple(n):
    """Return a tuple of integers from 0 to n-1."""
    return tuple(range(n))


def make_squares_dict(n):
    """Return a dict mapping i -> i^2 for i in 0..n-1."""
    return {i: i**2 for i in range(n)}


def make_unique_mod_set(n, k):
    """Return a set containing i % k for i in 0..n-1."""
    return {i % k for i in range(n)}

In [135]:
# Quick checks (small n)
assert make_numbers_list(5) == [0, 1, 2, 3, 4]
assert make_numbers_tuple(5) == (0, 1, 2, 3, 4)
assert make_squares_dict(4) == {0: 0, 1: 1, 2: 4, 3: 9}
assert make_unique_mod_set(10, 4) == {0, 1, 2, 3}

print("Part 1 checks passed ✅")

Part 1 checks passed ✅


In [136]:
score1 = 4

## Part 2: Assigning and Manipulating Data

Use **n = 10_000**. Call your Part 1 functions and assign the results to variables.

Then implement the TODOs:
- list comprehension (evens)
- sum over a condition (values > 5000)
- filter dictionary keys based on values
- compute and explain unique count of modulo set

Write short markdown explanations where requested.


In [137]:
# TODO (Q5): Create the large data structures and store them in variables
n = 10_000
k_mod = 43

numbers_list = make_numbers_list(n)
numbers_tuple = make_numbers_tuple(n)
squares_dict = make_squares_dict(n)
mods_set = make_unique_mod_set(n, k_mod)

In [138]:
score2 = 1

In [139]:
# TODO (Q6): Use a list comprehension to create a list of only even numbers from numbers_list
evens = [num for num in numbers_list if num % 2 == 0]

In [140]:
# TODO (Q7): Compute the sum of values > 5000 from numbers_tuple
sum_gt_5000 = sum(num for num in numbers_tuple if num > 5000)

In [141]:
# TODO (Q8): Create a list of keys where squares_dict[key] is divisible by 3
keys_square_div_by_3 = [key for key, value in squares_dict.items() if value % 3 == 0]

In [142]:
score3 = 3

### TODO (Q9): Explain the modulo set size (short answer)

In a markdown cell below, explain **why** `len(mods_set)` is what it is (1–3 sentences).  
Hint: consider the possible outcomes of `i % k`.


`len(mods_set)` gives the number of values in `mods_set`, which shows how many distinct remainders `i % k` can produce. For example, my modulus is 43, so the remainders run from 0 up to 42. In this case, there are 43 possible outcomes for `i % k`.

In [143]:
# TODO (Q9): Compute the number of unique values in mods_set
mods_count = len(mods_set)
print(mods_count)

43


In [144]:
# Sanity checks for Part 2

# Evens in 0..9999: exactly 5000 even numbers (0 included)
assert len(evens) == 5000
assert evens[:5] == [0, 2, 4, 6, 8]
assert evens[-1] == 9998

# sum of integers 5001..9999
# formula: sum 0..9999 - sum 0..5000
sum_0_9999 = 9999 * 10000 // 2
sum_0_5000 = 5000 * 5001 // 2
assert sum_gt_5000 == (sum_0_9999 - sum_0_5000)

# Squares divisible by 3 iff the base integer is divisible by 3
assert all(k % 3 == 0 for k in keys_square_div_by_3)

# For mods_set where k=k_mod (43) and n=10000, should contain all residues 0..42
assert mods_count == k_mod
assert mods_set == set(range(k_mod))

print("Part 2 checks passed ✅")

Part 2 checks passed ✅


In [145]:
score4 = 0.7
#! Please see the comments below.
print("The explanation for the modulo set size was not adequate. A sample explanation is given below.")
print("""For `i % k`, the result is always one of `0, 1, ..., k-1`.  
If `n >= k`, you will see **all** residues, so the set has size `k`. If `n < k`, it will have size `n`.""")

The explanation for the modulo set size was not adequate. A sample explanation is given below.
For `i % k`, the result is always one of `0, 1, ..., k-1`.  
If `n >= k`, you will see **all** residues, so the set has size `k`. If `n < k`, it will have size `n`.
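
The sample explanation above can be checked with a short sketch. The helper name `residue_count` is hypothetical and introduced only for this illustration. It assumes `k > 0`.

```python
def residue_count(n, k):
    """Number of distinct values of i % k for i in 0..n-1 (hypothetical helper, assumes k > 0)."""
    return len({i % k for i in range(n)})

# n >= k: every residue 0..k-1 appears, so the size is k.
print(residue_count(10_000, 43))

# n < k: i % k == i for every i, so the size is n.
print(residue_count(5, 43))
```

In both cases the size equals `min(n, k)`, which is why `len(mods_set)` is 43 for `n = 10_000` and `k_mod = 43`.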


## Part 3: Strings and String Methods

You will work with strings and basic string methods.

Use this paragraph **exactly as given** (do not edit it), so grading remains consistent.


In [146]:
paragraph = (
    "Python is great, and Python is fun! "
    "Data science with Python: clean, analyze, and model data."
)

assert isinstance(paragraph, str) and len(paragraph) > 0
print("Paragraph loaded ✅")

Paragraph loaded ✅


In [147]:
# TODO (Q10): Implement make_sentence(words)
# - words: list of strings
# - return: a single string joined by spaces

def make_sentence(words):
    return " ".join(words)

In [148]:
# TODO (Q11): Word counts (case-insensitive)
# Create a dictionary word_counts mapping word -> frequency.
# Requirements:
# - Case-insensitive (treat 'Python' and 'python' as the same word)
# - Ignore punctuation (use PUNCT_TABLE or another method)
# - Split on whitespace after cleaning

word_counts = {}

cleaned_paragraph = paragraph.lower().translate(PUNCT_TABLE)
words = cleaned_paragraph.split()

for word in words:
    word_counts[word] = word_counts.get(word, 0) + 1

In [149]:
# TODO (Q12): Create a list of all words longer than 4 characters after stripping punctuation.
# - Use the same cleaned tokens as Q11 (lowercase + punctuation removed)

long_words = [word for word in words if len(word) > 4]


In [150]:
# Checks for Part 3
assert make_sentence(["hello", "world"]) == "hello world"

# Deterministic expected counts from our paragraph
# paragraph lower/punct-stripped is:
# "python is great and python is fun data science with python clean analyze and model data"
expected_counts = {
    "python": 3,
    "is": 2,
    "great": 1,
    "and": 2,
    "fun": 1,
    "data": 2,
    "science": 1,
    "with": 1,
    "clean": 1,
    "analyze": 1,
    "model": 1,
}
assert word_counts == expected_counts

# Long words (>4)
# tokens: python,is,great,and,python,is,fun,data,science,with,python,clean,analyze,and,model,data
# long words: python,great,python,science,python,clean,analyze,model
assert long_words == ["python", "great", "python", "science", "python", "clean", "analyze", "model"]

print("Part 3 checks passed ✅")

Part 3 checks passed ✅


In [151]:
score5 = 3

## Part 4: Control Flow and Iteration

In [152]:
# TODO (Q13): Count numbers divisible by BOTH 2 and 5 in numbers_list using a loop and if-statement.

count_div_by_2_and_5 = 0

for number in numbers_list:
    if number % 2 == 0 and number % 5 == 0:
        count_div_by_2_and_5 += 1

In [153]:
# TODO (Q14): Compute the average of squares_dict values whose keys are odd.

# Running total of the squares whose keys are odd
avg_odd_keys = 0

for key, value in squares_dict.items():
    if key % 2 == 1:
        avg_odd_keys += value

# Divide the running total by the number of odd keys to get the average
avg_odd_keys /= sum(1 for key in squares_dict.keys() if key % 2 == 1)


In [154]:
# TODO (Q15): Classify numbers 0..20 into 'small' (<10), 'medium' (10–15), 'large' (>15)
# Store in a dictionary labels: number -> label

labels = {}

# Loop over 0..20; range(21) is needed because range(20) would stop at 19
for i in range(21):
    if 0 <= i < 10:
        labels[i] = "small"
    elif 10 <= i <= 15:
        labels[i] = "medium"
    else:
        labels[i] = "large"


In [155]:
# Checks for Part 4

# In 0..9999, multiples of 10: 0,10,...,9990 => 1000 values
assert count_div_by_2_and_5 == 1000

# Average of squares of odd integers 1..9999
# Compute expected via formula using sums:
# sum_{i=1..m} i^2 = m(m+1)(2m+1)/6
# odd squares sum = total squares - even squares
m = 9999
sum_sq_total = m * (m + 1) * (2 * m + 1) // 6

# even i are 2j for j=0..4999, squares are 4*j^2
m2 = 4999
sum_sq_j = m2 * (m2 + 1) * (2 * m2 + 1) // 6
sum_sq_even = 4 * sum_sq_j

sum_sq_odd = sum_sq_total - sum_sq_even
count_odds = 5000  # odds in 0..9999 are 1..9999 => 5000 odds
expected_avg_odd = sum_sq_odd / count_odds

assert abs(avg_odd_keys - expected_avg_odd) < 1e-9

# Classification checks
assert labels[0] == "small"
assert labels[9] == "small"
assert labels[10] == "medium"
assert labels[15] == "medium"
assert labels[16] == "large"
assert labels[20] == "large"

print("Part 4 checks passed ✅")

Part 4 checks passed ✅
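
The Part 4 check above derives the expected average from closed-form sums. As a rough cross-check, the sketch below compares that formula against a brute-force computation. The names `n_demo`, `m_demo`, and `j_max` are introduced here for illustration only.

```python
n_demo = 10_000

# Brute force: square every odd integer in 0..n_demo-1 and average them.
odd_squares = [i * i for i in range(n_demo) if i % 2 == 1]
brute_avg = sum(odd_squares) / len(odd_squares)

# Closed form: sum_{i=1..m} i^2 = m(m+1)(2m+1)/6
m_demo = n_demo - 1
sum_sq_total = m_demo * (m_demo + 1) * (2 * m_demo + 1) // 6

# Even integers are 2j for j = 0..j_max, and (2j)^2 = 4 * j^2.
j_max = (n_demo - 1) // 2
sum_sq_even = 4 * (j_max * (j_max + 1) * (2 * j_max + 1) // 6)

formula_avg = (sum_sq_total - sum_sq_even) / len(odd_squares)

print(brute_avg, formula_avg)
```

Both routes work with exact integers until the final division, so the two averages should agree.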


In [156]:
score6 = 3

## Part 5: Generators vs Iterables (Performance + Memory)

You will implement:
- a **generator function** that yields values lazily
- an equivalent **list-based** function

Then you will time both approaches for a large `n`.

### Hint
A generator is typically more memory efficient because it does not store all values at once.
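
One rough way to see the hint in action is to compare object sizes with `sys.getsizeof`. This is only a sketch: `getsizeof` reports the size of the container itself (for a list, its array of references), not the integers it refers to, so it understates the list's true footprint. The value of `n_demo` is an arbitrary choice for illustration.

```python
import sys

n_demo = 100_000  # arbitrary size for this illustration

# The list materializes every even square up front.
as_list = [i * i for i in range(n_demo) if i % 2 == 0]

# The generator expression only stores its current state.
as_gen = (i * i for i in range(n_demo) if i % 2 == 0)

list_bytes = sys.getsizeof(as_list)  # grows with the number of elements
gen_bytes = sys.getsizeof(as_gen)    # small and independent of n_demo

print(list_bytes, gen_bytes)
```

The exact byte counts depend on the Python version and platform, so no specific numbers are shown here.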


In [157]:
# TODO (Q16): Generator function
# gen_even_squares(n) should YIELD squares of even numbers from 0 to n-1

# my first time working with generators
def gen_even_squares(n):
    for i in range(n):
        if i % 2 == 0:
            yield i**2

In [158]:
# TODO (Q17): List-based equivalent
# list_even_squares(n) should RETURN a list of squares of even numbers from 0 to n-1

def list_even_squares(n):
    return [i**2 for i in range(n) if i % 2 == 0]

In [159]:
# TODO (Q18): Time both approaches for n = 1_000_000 and compute sums.
# Use perf_counter() for timing.
#
# Requirements:
# - Compute gen_sum = sum(gen_even_squares(n_big))
# - Compute lst_sum = sum(list_even_squares(n_big))
# - Record two elapsed times (generator and list)
# - Print both sums and both times

#! The perf_counter was already imported from the time library at the very first block.
import time

n_big = 1_000_000

#time generator version
start_time = time.perf_counter()
gen_sum = sum(gen_even_squares(n_big))
end_time = time.perf_counter()
gen_time = end_time - start_time
print("Gen sum and gen time: ",gen_sum, gen_time)

#time list version
start_time = time.perf_counter()
lst_sum = sum(list_even_squares(n_big))
end_time = time.perf_counter()
list_time = end_time - start_time
print("List sum and list time: ",lst_sum, list_time)


Gen sum and gen time:  166666166667000000 0.10094176799975685
List sum and list time:  166666166667000000 0.10497394300000451


In [160]:
# Checks for Part 5

# Quick equivalence test
assert list(gen_even_squares(20)) == list_even_squares(20)
print("Generator and list versions match ✅")

# Sums must match
assert gen_sum == lst_sum
print("Part 5 checks passed ✅")

Generator and list versions match ✅
Part 5 checks passed ✅


In [161]:
score7 = 3

### TODO (Q19): Explanation (2–4 sentences)

In a markdown cell below, explain **why** the generator version is more memory efficient and what that means in practice.


A generator is more memory efficient because it does not need to store all of its values at once. A list, by contrast, is held in memory in full, even if you only print it or sum it. A generator produces values one at a time, so only the current value and the generator's state take up space. In practice, as datasets grow larger, using a generator keeps memory use low, whereas building a list (or a similar container) loads every value into memory at once.
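
A small sketch of what this means in practice: because values are produced on demand, a consumer can stop early without the remaining values ever being computed. The generator is redefined here so the block is self-contained, and the huge bound `10**12` is an arbitrary choice to make the point.

```python
from itertools import islice

def gen_even_squares(n):
    """Yield squares of even numbers from 0 to n-1, one at a time."""
    for i in range(n):
        if i % 2 == 0:
            yield i ** 2

# Take only the first five values; the rest of the range is never visited.
first_five = list(islice(gen_even_squares(10**12), 5))
print(first_five)  # [0, 4, 16, 36, 64]
```

Building the equivalent list for the same bound would not be feasible on an ordinary machine, since every element would have to exist in memory first.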

In [162]:
score8 = 1

---
## Submission checklist

Before you submit:
- Restart runtime/kernel
- Run **all cells** top-to-bottom
- Ensure every sanity check prints ✅


In [163]:
total = (score0 + score1 + score2 + score3 + score4 + score5 + score6 + score7 + score8) * 5
print("total score = ", total)

total score =  96.49999999999999
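
The trailing digits in the printed total come from binary floating-point arithmetic: values such as `0.6` and `0.7` have no exact binary representation. A hedged sketch of two common ways to present the total cleanly is shown below. The `scores` list simply restates `score0` through `score8` as strings and is introduced for this illustration.

```python
from decimal import Decimal

# score0..score8 from the cells above, written as strings so Decimal sees exact values
scores = ["0.6", "4", "1", "3", "0.7", "3", "3", "3", "1"]

# Option 1: keep floats and round only for display.
float_total = sum(float(s) for s in scores) * 5
print("total score = ", round(float_total, 2))

# Option 2: use Decimal for exact decimal arithmetic.
exact_total = sum(Decimal(s) for s in scores) * 5
print("total score = ", exact_total)
```

Rounding at display time is usually enough for a grade report. `Decimal` is the heavier option when exact decimal sums matter.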
