# Homework: Introduction to Python

## Problem 1: Containers and Functions

**(a)** Write a function called `unique_sorted` that takes a list of integers and returns a **new** list containing the unique values in ascending order. For example, `unique_sorted([3, 1, 2, 3, 2])` should return `[1, 2, 3]`.

In [1]:
def unique_sorted(values):
    # set() removes duplicates; sorted() returns a new ascending list.
    return sorted(set(values))

In [2]:
unique_sorted([3, 1, 2, 3, 2])

[1, 2, 3]

**(b)** Write a function called `count_words` that takes a list of strings and returns a dictionary mapping each word (lowercased) to its count. For example, `count_words(["Apple", "banana", "apple", "BANANA"])` should return `{'apple': 2, 'banana': 2}`.

In [6]:
def count_words(words):
    count = {}
    for w in words:
        w = w.lower()  # treat "Apple" and "apple" as the same word
        count[w] = count.get(w, 0) + 1
    return count

In [7]:
count_words(["Apple", "banana", "apple", "BANANA"])

{'apple': 2, 'banana': 2}
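The same tally can also be written with the standard library's `collections.Counter`. The sketch below is only an alternative for comparison; the name `count_words_counter` is hypothetical and not part of the assignment.

```python
from collections import Counter

def count_words_counter(words):
    """Hypothetical alternative to count_words built on collections.Counter."""
    # Counter tallies hashable items; lowercase each word before counting.
    return dict(Counter(w.lower() for w in words))

result = count_words_counter(["Apple", "banana", "apple", "BANANA"])
print(result)  # {'apple': 2, 'banana': 2}
```

Converting back with `dict(...)` keeps the return type identical to the hand-written version.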

**(c)** You are given a list of `(name, score)` tuples. Write a function called `average_scores` that returns a dictionary mapping each name to their average score. If a name appears multiple times, you should average all of their scores. For example, `average_scores([("Ada", 90), ("Bob", 80), ("Ada", 100)])` should return `{'Ada': 95.0, 'Bob': 80.0}`.

In [16]:
def average_scores(records):
    total = {}
    count = {}
    for name, score in records:
        total[name] = total.get(name, 0) + score
        count[name] = count.get(name, 0) + 1
    # Divide each running total by the number of scores seen for that name.
    averages = {}
    for name in total:
        averages[name] = total[name] / count[name]
    return averages

In [17]:
average_scores([("Ada", 90), ("Bob", 80), ("Ada", 100)])

{'Ada': 95.0, 'Bob': 80.0}
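One possible variation groups the scores per name first and averages afterwards, which avoids keeping two parallel dictionaries. This is a sketch only; `average_scores_grouped` is a hypothetical name introduced for illustration.

```python
from collections import defaultdict

def average_scores_grouped(records):
    """Hypothetical variant that groups scores per name before averaging."""
    scores = defaultdict(list)
    for name, score in records:
        scores[name].append(score)
    # True division always yields a float, e.g. 190 / 2 == 95.0.
    return {name: sum(vals) / len(vals) for name, vals in scores.items()}

print(average_scores_grouped([("Ada", 90), ("Bob", 80), ("Ada", 100)]))
# {'Ada': 95.0, 'Bob': 80.0}
```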

## Problem 2: Writing Functions with Comprehensions

**(a)** Write a function called `filter_by_threshold` that takes a list of numbers and a threshold value, and returns a new list containing only the numbers greater than the threshold. Use a list comprehension. For example, `filter_by_threshold([1, 5, 3, 8, 2], 3)` should return `[5, 8]`.

In [23]:
def filter_by_threshold(numbers, threshold):
    # Keep only values strictly greater than the threshold.
    return [num for num in numbers if num > threshold]

In [24]:
filter_by_threshold([1, 5, 3, 8, 2], 3)

[5, 8]

**(b)** Write a function called `word_lengths` that takes a list of strings and returns a dictionary mapping each unique word to its length. Use a dictionary comprehension. For example, `word_lengths(["hello", "world", "hi"])` should return `{'hello': 5, 'world': 5, 'hi': 2}`.

In [25]:
def word_lengths(words):
    # Duplicate words collapse to a single key, so each unique word appears once.
    return {w: len(w) for w in words}

In [26]:
word_lengths(["hello", "world", "hi"])

{'hello': 5, 'world': 5, 'hi': 2}

**(c)** Write a function called `common_elements` that takes two lists and returns a set of elements that appear in both lists. Use a set comprehension. For example, `common_elements([1, 2, 3, 4], [3, 4, 5, 6])` should return `{3, 4}`.

In [27]:
def common_elements(list1, list2):
    # The set comprehension drops duplicates automatically.
    return {x for x in list1 if x in list2}

In [28]:
common_elements([1, 2, 3, 4], [3, 4, 5, 6])

{3, 4}
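Because `x in list2` scans the list on every test, the comprehension above does work proportional to `len(list1) * len(list2)`. A minimal sketch of one way around that, assuming the elements are hashable, is to build a lookup set once; `common_elements_fast` is a hypothetical name.

```python
def common_elements_fast(list1, list2):
    """Hypothetical variant of common_elements with constant-time membership tests."""
    lookup = set(list2)  # built once; `x in lookup` is O(1) on average
    return {x for x in list1 if x in lookup}

print(common_elements_fast([1, 2, 3, 4], [3, 4, 5, 6]))  # {3, 4}
```

Outside of an exercise that asks for a comprehension, `set(list1) & set(list2)` gives the same result.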

## Problem 3: NumPy Array Operations

**(a)** Given the following 2D array, write NumPy code to:
1. Extract the second row as a 1D array
2. Extract the last column as a 2D column vector (shape should be `(4, 1)`)
3. Extract the 2x2 subarray from the bottom-right corner

In [29]:
import numpy as np

data = np.array([[1, 2, 3, 4],
                 [5, 6, 7, 8],
                 [9, 10, 11, 12],
                 [13, 14, 15, 16]])


In [31]:
second_row = data[1]
second_row

array([5, 6, 7, 8])

In [32]:
last_col = data[:, -1:]
last_col

array([[ 4],
       [ 8],
       [12],
       [16]])

In [33]:
bottom_right = data[-2:, -2:]
bottom_right

array([[11, 12],
       [15, 16]])

**(b)** Without using loops, write NumPy code to:
1. Create a 5x5 array where each element is the sum of its row and column indices (i.e., element at position `[i, j]` should equal `i + j`)
2. Normalize each row of a matrix so that each row sums to 1 (use broadcasting)

In [38]:
arr = np.arange(5).reshape(5, 1) + np.arange(5)
arr

array([[0, 1, 2, 3, 4],
       [1, 2, 3, 4, 5],
       [2, 3, 4, 5, 6],
       [3, 4, 5, 6, 7],
       [4, 5, 6, 7, 8]])

In [40]:
# Given matrix for part 2
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]], dtype=float)

In [42]:
row_sums = matrix.sum(axis=1, keepdims=True)
normalized = matrix / row_sums
normalized

array([[0.16666667, 0.33333333, 0.5       ],
       [0.26666667, 0.33333333, 0.4       ],
       [0.29166667, 0.33333333, 0.375     ]])
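The `keepdims=True` argument is what makes the division line up row by row. The sketch below contrasts it with the default behavior on the same matrix; the variable names are illustrative only.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]], dtype=float)

sums_2d = matrix.sum(axis=1, keepdims=True)  # shape (3, 1): one sum per row
sums_1d = matrix.sum(axis=1)                 # shape (3,)

by_row = matrix / sums_2d  # (3, 3) / (3, 1): each row divided by its own sum
by_col = matrix / sums_1d  # (3, 3) / (3,): column j divided by the sum of row j

print(by_row.sum(axis=1))  # every entry is 1 up to rounding
print(by_col.sum(axis=1))  # rows no longer sum to 1
```

Without `keepdims=True` the code still runs here because the matrix is square, which makes the mistake easy to miss.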

**(c)** Explain the difference between the following two indexing operations. What are the shapes of `a` and `b`? Which one creates a view and which creates a copy?

In [46]:
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
a = arr[0:1, :]
b = arr[[0], :]

(1, 3)

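Both `a` and `b` have shape `(1, 3)`. `a = arr[0:1, :]` uses basic slicing, so it is a view that shares memory with `arr`. `b = arr[[0], :]` uses advanced (integer-list) indexing, which always returns a copy.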
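The view and copy behavior can be checked directly with `np.shares_memory` and by writing through each result. This is a small verification sketch using the same `arr`, `a`, and `b` as the cell above.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
a = arr[0:1, :]   # basic slicing
b = arr[[0], :]   # advanced (integer-list) indexing

print(a.shape, b.shape)          # (1, 3) (1, 3)
print(np.shares_memory(arr, a))  # True
print(np.shares_memory(arr, b))  # False

a[0, 0] = 100  # writes through to arr
b[0, 1] = -1   # only changes the copy
print(arr[0].tolist())  # [100, 2, 3]
```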

## Problem 4: Improving Inefficient Code

The following code computes pairwise distances between points but is inefficient. Rewrite it using NumPy broadcasting to eliminate the nested loops. Your solution should produce the same result as the original code.

In [None]:
import numpy as np

def pairwise_distances_fast(points):
    """Compute pairwise Euclidean distances between points.

    Parameters
    ----------
    points : np.ndarray
        Array of shape (n, d) where n is number of points and d is dimensions.

    Returns
    -------
    np.ndarray
        Array of shape (n, n) containing pairwise distances.
    """
    diffs = points[:, None, :] - points[None, :, :]
    return np.sqrt(np.sum(diffs**2, axis=2))
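To confirm that a broadcasting solution matches a loop-based one, it helps to keep a nested-loop baseline around. The baseline below is an assumption about what the slow version looked like, not the original code, and both function names are hypothetical.

```python
import numpy as np

def pairwise_distances_loops(points):
    """Hypothetical nested-loop baseline, assumed to resemble the slow version."""
    n = points.shape[0]
    out = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            out[i, j] = np.sqrt(np.sum((points[i] - points[j]) ** 2))
    return out

def pairwise_distances_broadcast(points):
    """Broadcasting version: no Python-level loops."""
    diffs = points[:, None, :] - points[None, :, :]  # shape (n, n, d)
    return np.sqrt(np.sum(diffs ** 2, axis=2))

rng = np.random.default_rng(0)
pts = rng.normal(size=(6, 3))
same = np.allclose(pairwise_distances_loops(pts), pairwise_distances_broadcast(pts))
print(same)  # True
```

The intermediate `diffs` array has shape `(n, n, d)`, so the broadcasting version trades memory for speed.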

## Problem 5: The View Trap

A colleague wrote the following function to standardize columns of a matrix (subtract mean, divide by standard deviation). However, users are reporting unexpected behavior.

In [47]:
import numpy as np

def standardize_columns(data):
    """Standardize each column to have mean 0 and std 1."""
    result = data
    for j in range(data.shape[1]):
        col = result[:, j]
        col_mean = np.mean(col)
        col_std = np.std(col)
        col = (col - col_mean) / col_std
    return result

**(a)** Explain why this function does not work as intended. What fundamental concept about NumPy arrays is being misunderstood?

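The result is never actually updated. `col = result[:, j]` is a view of `result`, but `col = (col - col_mean) / col_std` rebinds the local name `col` to a brand-new array instead of writing into that view, so `result` is left unchanged. In addition, `result = data` does not copy anything; it only creates a second name for the same array.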
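The difference between rebinding a name and writing through a view can be seen in a few lines. This is a minimal sketch on a small array; `m`, `after_rebind`, and `after_write` are illustrative names.

```python
import numpy as np

m = np.array([[1.0, 2.0], [3.0, 4.0]])

col = m[:, 0]   # view into the first column of m
col = col * 10  # rebinds the name `col` to a new array; m is untouched
after_rebind = m[:, 0].tolist()
print(after_rebind)  # [1.0, 3.0]

col = m[:, 0]      # take the view again
col[:] = col * 10  # slice assignment writes through the view into m
after_write = m[:, 0].tolist()
print(after_write)  # [10.0, 30.0]
```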

**(b)** A user runs the following code and is surprised by the output. Explain what happens and why.

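The function has no visible effect. `original` is unchanged because the standardized column is never written back, and `normalized` is not a new array: since `result = data` only aliases the input, the function returns the very same object, so `original is normalized` is `True`.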

In [48]:
original = np.array([[1.0, 2.0], [3.0, 4.0]])
normalized = standardize_columns(original)
print("Original:", original)
print("Normalized:", normalized)
print("Are they the same object?", original is normalized)

Original: [[1. 2.]
 [3. 4.]]
Normalized: [[1. 2.]
 [3. 4.]]
Are they the same object? True


**(c)** Rewrite the function correctly. Your solution should actually standardize the columns, not modify the input array, and use vectorized operations instead of explicit loops where possible.



In [51]:
def standardize_columns_update(data):
    """Standardize each column to have mean 0 and std 1."""
    data = np.asarray(data, dtype=float)
    col_mean = data.mean(axis=0)  # one mean per column
    col_std = data.std(axis=0)    # one std per column
    # Broadcasting builds a new array, so the input is left untouched.
    return (data - col_mean) / col_std

**(d)** Write a simple test that verifies your corrected function works properly. The test should check that the output columns have approximately mean 0 and standard deviation 1.

In [52]:
data = np.array([[1.0, 2.0],
                 [3.0, 4.0],
                 [5.0, 6.0]])
standardized = standardize_columns_update(data)
assert np.allclose(standardized.mean(axis=0), 0.0, atol=1e-8)
assert np.allclose(standardized.std(axis=0), 1.0, atol=1e-8)

print("All tests passed!")


All tests passed!
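Part (c) also requires that the input array is not modified, which the mean and standard deviation checks alone do not cover. The sketch below adds that check against a self-contained, loop-free function; `standardize_columns_vectorized` is a hypothetical name used only for this example.

```python
import numpy as np

def standardize_columns_vectorized(data):
    """Hypothetical loop-free version that returns a new array."""
    data = np.asarray(data, dtype=float)
    return (data - data.mean(axis=0)) / data.std(axis=0)

original = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
snapshot = original.copy()
out = standardize_columns_vectorized(original)

assert out is not original                 # a new array is returned
assert np.array_equal(original, snapshot)  # input left unchanged
assert np.allclose(out.mean(axis=0), 0.0)
assert np.allclose(out.std(axis=0), 1.0)
```

Comparing against a snapshot taken before the call catches in-place modification even when the function returns the expected values.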
