# Homework: Introduction to Python

## Problem 1: Containers and Functions

**(a)** Write a function called `unique_sorted` that takes a list of integers and returns a **new** list containing the unique values in ascending order. For example, `unique_sorted([3, 1, 2, 3, 2])` should return `[1, 2, 3]`.

In [3]:
def unique_sorted(values):
    # set() removes duplicates; sorted() returns a new list in ascending order
    return sorted(set(values))

unique_sorted([3, 1, 2, 3, 2])

[1, 2, 3]

**(b)** Write a function called `count_words` that takes a list of strings and returns a dictionary mapping each word (lowercased) to its count. For example, `count_words(["Apple", "banana", "apple", "BANANA"])` should return `{'apple': 2, 'banana': 2}`.

In [6]:
def count_words(words):
    counts = {}
    for word in words:
        w = word.lower()
        counts[w] = counts.get(w, 0) + 1
    return counts

count_words(["Apple", "banana", "apple", "BANANA"])

{'apple': 2, 'banana': 2}

**(c)** You are given a list of `(name, score)` tuples. Write a function called `average_scores` that returns a dictionary mapping each name to their average score. If a name appears multiple times, you should average all of their scores. For example, `average_scores([("Ada", 90), ("Bob", 80), ("Ada", 100)])` should return `{'Ada': 95.0, 'Bob': 80.0}`.

In [7]:
def average_scores(records):
    totals = {}
    counts = {}
    for name, score in records:
        totals[name] = totals.get(name, 0) + score
        counts[name] = counts.get(name, 0) + 1
    return {name: totals[name] / counts[name] for name in totals}

average_scores([("Ada", 90), ("Bob", 80), ("Ada", 100)])

{'Ada': 95.0, 'Bob': 80.0}

## Problem 2: Writing Functions with Comprehensions

**(a)** Write a function called `filter_by_threshold` that takes a list of numbers and a threshold value, and returns a new list containing only the numbers greater than the threshold. Use a list comprehension. For example, `filter_by_threshold([1, 5, 3, 8, 2], 3)` should return `[5, 8]`.

In [8]:
def filter_by_threshold(numbers, threshold):
    return [x for x in numbers if x > threshold]

filter_by_threshold([1, 5, 3, 8, 2], 3)

[5, 8]

**(b)** Write a function called `word_lengths` that takes a list of strings and returns a dictionary mapping each unique word to its length. Use a dictionary comprehension. For example, `word_lengths(["hello", "world", "hi"])` should return `{'hello': 5, 'world': 5, 'hi': 2}`.

In [9]:
def word_lengths(words):
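    # set(words) drops duplicates, so the key order of the result is arbitrary;
    # dictionary equality ignores order, so it still matches the expected value
    return {word: len(word) for word in set(words)}

word_lengths(["hello", "world", "hi"])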

{'hi': 2, 'world': 5, 'hello': 5}

**(c)** Write a function called `common_elements` that takes two lists and returns a set of elements that appear in both lists. Use a set comprehension. For example, `common_elements([1, 2, 3, 4], [3, 4, 5, 6])` should return `{3, 4}`.

In [10]:
def common_elements(list1, list2):
    set2 = set(list2)
    return {x for x in list1 if x in set2}

common_elements([1, 2, 3, 4], [3, 4, 5, 6])

{3, 4}

## Problem 3: NumPy Array Operations

**(a)** Given the following 2D array, write NumPy code to:
1. Extract the second row as a 1D array
2. Extract the last column as a 2D column vector (shape should be `(4, 1)`)
3. Extract the 2x2 subarray from the bottom-right corner

In [4]:
import numpy as np

data = np.array([[1, 2, 3, 4],
                 [5, 6, 7, 8],
                 [9, 10, 11, 12],
                 [13, 14, 15, 16]])

In [8]:
second_row = data[1]
second_row.shape

(4,)

In [9]:
last_column = data[:, -1].reshape(-1, 1)
last_column.shape

(4, 1)

In [10]:
bottom_right_2x2 = data[-2:, -2:]
bottom_right_2x2.shape

(2, 2)

**(b)** Without using loops, write NumPy code to:
1. Create a 5x5 array where each element is the sum of its row and column indices (i.e., element at position `[i, j]` should equal `i + j`)
2. Normalize each row of a matrix so that each row sums to 1 (use broadcasting)

In [15]:
# Given matrix for part 2
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]], dtype=float)


In [16]:
arr = np.add.outer(np.arange(5), np.arange(5))
arr

array([[0, 1, 2, 3, 4],
       [1, 2, 3, 4, 5],
       [2, 3, 4, 5, 6],
       [3, 4, 5, 6, 7],
       [4, 5, 6, 7, 8]])

In [19]:
row_sums = matrix.sum(axis=1, keepdims=True)
normalized_matrix = matrix / row_sums
normalized_matrix

array([[0.16666667, 0.33333333, 0.5       ],
       [0.26666667, 0.33333333, 0.4       ],
       [0.29166667, 0.33333333, 0.375     ]])
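
As a quick check on the normalization above, the sketch below confirms that each row sums to 1 and illustrates why `keepdims=True` matters: without it, the row sums have shape `(3,)` and line up with the column axis during broadcasting. The name `wrong` is introduced here for illustration only.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]], dtype=float)

# keepdims=True keeps the summed axis, giving shape (3, 1)
row_sums = matrix.sum(axis=1, keepdims=True)
normalized_matrix = matrix / row_sums
print(row_sums.shape)                 # (3, 1)
print(normalized_matrix.sum(axis=1))  # each entry is 1 up to rounding

# Without keepdims the sums have shape (3,), which aligns with the
# column axis, so element [i, j] is divided by the sum of row j instead.
wrong = matrix / matrix.sum(axis=1)
print(np.allclose(wrong.sum(axis=1), 1.0))  # False
```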

**(c)** Explain the difference between the following two indexing operations. What are the shapes of `a` and `b`? Which one creates a view and which creates a copy?

In [21]:
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
a = arr[0:1, :]
b = arr[[0], :]

In [24]:
a.shape
# a uses basic slice indexing (0:1), so it is a view of arr: modifying a also modifies arr.

(1, 3)

In [26]:
b.shape
# b uses advanced (integer-array) indexing with the list [0], so it is a copy of that row: modifying b does not affect arr.

(1, 3)
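
One way to check the view/copy answer empirically is `np.shares_memory`, followed by a write through each result. This is a minimal sketch that rebuilds the same `arr`, `a`, and `b` so it runs on its own.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
a = arr[0:1, :]   # basic slicing
b = arr[[0], :]   # advanced indexing with a list

print(np.shares_memory(arr, a))  # True: a is a view
print(np.shares_memory(arr, b))  # False: b is a copy

b[0, 0] = -1      # writes to the copy only
print(arr[0, 0])  # 1
a[0, 0] = 100     # writes through to arr
print(arr[0, 0])  # 100
```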

## Problem 4: Improving Inefficient Code

The following code computes pairwise distances between points but is inefficient. Rewrite it using NumPy broadcasting to eliminate the nested loops. Your solution should produce the same result as the original code.

In [32]:
import numpy as np

def pairwise_distances_slow(points):
    """Compute pairwise Euclidean distances between points.

    Parameters
    ----------
    points : np.ndarray
        Array of shape (n, d) where n is number of points and d is dimensions.

    Returns
    -------
    np.ndarray
        Array of shape (n, n) containing pairwise distances.
    """
    n = points.shape[0]
    distances = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            diff = points[i] - points[j]
            distances[i, j] = np.sqrt(np.sum(diff ** 2))
    return distances


In [33]:
points = np.array([[0, 0],
                   [3, 4],
                   [6, 8]], dtype=float)
pairwise_distances_slow(points)

array([[ 0.,  5., 10.],
       [ 5.,  0.,  5.],
       [10.,  5.,  0.]])

In [35]:
import numpy as np

def pairwise_distances(points):
    """
    Compute pairwise Euclidean distances between points using broadcasting.
    """
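    # points has shape (n, d). Inserting new axes gives (n, 1, d) and (1, n, d),
    # so the subtraction broadcasts to all pairwise differences, shape (n, n, d).
    diff = points[:, np.newaxis, :] - points[np.newaxis, :, :]
    # Sum the squared differences over the coordinate axis, then take the root.
    distances = np.sqrt(np.sum(diff ** 2, axis=2))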
    return distances

In [36]:
points = np.array([[0, 0],
                   [3, 4],
                   [6, 8]], dtype=float)
pairwise_distances(points)

array([[ 0.,  5., 10.],
       [ 5.,  0.,  5.],
       [10.,  5.,  0.]])
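
To confirm that the broadcast version matches the loop version beyond the three-point example, a check along these lines could be used. It is a sketch that restates both functions so it runs on its own; the random test data, seed, and shape are arbitrary choices, not part of the assignment.

```python
import numpy as np

def pairwise_distances_slow(points):
    n = points.shape[0]
    distances = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            diff = points[i] - points[j]
            distances[i, j] = np.sqrt(np.sum(diff ** 2))
    return distances

def pairwise_distances(points):
    diff = points[:, np.newaxis, :] - points[np.newaxis, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=2))

# Arbitrary test data: 20 points in 3 dimensions
rng = np.random.default_rng(0)
random_points = rng.normal(size=(20, 3))

fast = pairwise_distances(random_points)
slow = pairwise_distances_slow(random_points)
print(np.allclose(fast, slow))  # True
print(fast.shape)               # (20, 20)
```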

## Problem 5: The View Trap

A colleague wrote the following function to standardize columns of a matrix (subtract mean, divide by standard deviation). However, users are reporting unexpected behavior.

In [None]:
import numpy as np

def standardize_columns(data):
    """Standardize each column to have mean 0 and std 1."""
    result = data
    for j in range(data.shape[1]):
        col = result[:, j]
        col_mean = np.mean(col)
        col_std = np.std(col)
        col = (col - col_mean) / col_std
    return result

**(a)** Explain why this function does not work as intended. What fundamental concept about NumPy arrays is being misunderstood?

**(b)** A user runs the following code and is surprised by the output. Explain what happens and why.

In [None]:
original = np.array([[1.0, 2.0], [3.0, 4.0]])
normalized = standardize_columns(original)
print("Original:", original)
print("Normalized:", normalized)
print("Are they the same object?", original is normalized)
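
The behavior asked about in parts (a) and (b) can be reproduced in a few lines. The sketch below is one reading of what goes wrong, not an official solution: `result = data` binds a second name to the same array rather than copying it, and `col = (col - col_mean) / col_std` rebinds the local name `col` to a new array instead of writing into `result`. The name `view` is introduced here for illustration.

```python
import numpy as np

original = np.array([[1.0, 2.0], [3.0, 4.0]])

result = original            # no copy: both names refer to one array
print(result is original)    # True

col = result[:, 0]           # a view into the first column
col = (col - np.mean(col)) / np.std(col)  # rebinds col to a new array
print(col)                   # [-1.  1.]
print(original[:, 0])        # [1. 3.] (unchanged)

# Writing through the view would change the array in place instead:
view = result[:, 0]
view[:] = (view - np.mean(view)) / np.std(view)
print(original[:, 0])        # [-1.  1.]
```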

**(c)** Rewrite the function correctly. Your solution should actually standardize the columns, not modify the input array, and use vectorized operations instead of explicit loops where possible.

**(d)** Write a simple test that verifies your corrected function works properly. The test should check that the output columns have approximately mean 0 and standard deviation 1.
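
One possible answer to parts (c) and (d), offered as a sketch rather than the reference solution. The name `standardize_columns_fixed` and the test data are made up for illustration, and the function assumes no column has zero standard deviation.

```python
import numpy as np

def standardize_columns_fixed(data):
    """Return a standardized copy of data (column mean 0, std 1)."""
    data = np.asarray(data, dtype=float)
    col_means = data.mean(axis=0)   # shape (n_columns,)
    col_stds = data.std(axis=0)     # population std, matching np.std's default
    # Broadcasting builds a new array; the input is never written to.
    return (data - col_means) / col_stds

def test_standardize_columns_fixed():
    original = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]])
    snapshot = original.copy()
    result = standardize_columns_fixed(original)
    assert result is not original
    assert np.array_equal(original, snapshot)     # input untouched
    assert np.allclose(result.mean(axis=0), 0.0)
    assert np.allclose(result.std(axis=0), 1.0)

test_standardize_columns_fixed()
```

The test uses `np.allclose` rather than exact equality because floating-point rounding leaves the means and standard deviations only approximately equal to 0 and 1.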