# Homework: Documenting Your Code + Testing Your Code

## Problem 1: Write docstrings

The following functions are missing docstrings. Write Google-style docstrings for each function, including `Args`, `Returns`, and `Raises` sections where appropriate. Make sure to document default values and explain what each parameter means.

In [None]:
import numpy as np

def normalize(data, method="zscore"):
    """Normalize the input data using the specified method.

    Args:
        data (np.ndarray): Input data to normalize.
        method (str): Normalization method, either "zscore" (normalizes to mean 0 and std 1)
            or "minmax" (scales data to the range [0, 1]). Defaults to "zscore".

    Returns:
        np.ndarray: Normalized data.
    
    Raises:
        ValueError: If method is not "zscore" or "minmax".
    """
    if method == "zscore":
        return (data - np.mean(data)) / np.std(data)
    elif method == "minmax":
        return (data - np.min(data)) / (np.max(data) - np.min(data))
    else:
        raise ValueError(f"Unknown method: {method}")


def weighted_mean(values, weights=None):
    """Calculate the weighted mean.
    
    Args:
        values (np.ndarray): Array of values.
        weights (np.ndarray or None): Array of weights with the same length as values.
                                    If None, calculates the unweighted mean. Defaults to None.

    Returns:
        float: Weighted mean of the values.
    
    Raises:
        ValueError: If weights is not None and its length does not match values.
    """
    if weights is None:
        return np.mean(values)
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return np.sum(values * weights) / np.sum(weights)


def remove_outliers(data, threshold=3.0):
    """Remove outliers from the data based on a z-score threshold.

    A point is kept when its absolute deviation from the mean is at most
    `threshold` standard deviations.

    Args:
        data (np.ndarray): Input data from which to remove outliers.
        threshold (float): Maximum absolute z-score a point may have and still be kept.
            Defaults to 3.0.

    Returns:
        np.ndarray: Data with outliers removed.
    """
    mean = np.mean(data)
    std = np.std(data)
    mask = np.abs(data - mean) <= threshold * std
    return data[mask]
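
One optional way to keep a docstring honest is to add an `Examples` section and run it with the standard-library `doctest` module. The sketch below does this for a copy of `weighted_mean`. The name `weighted_mean_with_examples` and the example values are illustrative assumptions, not part of the assignment.

```python
import doctest

import numpy as np


def weighted_mean_with_examples(values, weights=None):
    """Calculate the weighted mean (illustrative copy of weighted_mean).

    Args:
        values (np.ndarray): Array of values.
        weights (np.ndarray or None): Array of weights with the same length as values.
            If None, calculates the unweighted mean. Defaults to None.

    Returns:
        float: Weighted mean of the values.

    Raises:
        ValueError: If weights is not None and its length does not match values.

    Examples:
        >>> float(weighted_mean_with_examples(np.array([1.0, 2.0, 3.0])))
        2.0
        >>> float(weighted_mean_with_examples(np.array([1.0, 3.0]), np.array([3.0, 1.0])))
        1.5
    """
    if weights is None:
        return np.mean(values)
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return np.sum(values * weights) / np.sum(weights)


# Runs the examples embedded in the docstring; prints nothing when they all pass.
doctest.run_docstring_examples(weighted_mean_with_examples, globals())
```

Wrapping the calls in `float(...)` keeps the expected output independent of how a given NumPy version prints its scalar types.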

## Problem 2: Add type hints

The following functions have incomplete or missing type hints. Add appropriate type hints for all parameters and return values. Use `|` syntax for union types where a parameter can accept multiple types or return `None`.

In [None]:
import numpy as np

def clip_values(arr: np.ndarray, lower: float, upper: float) -> np.ndarray:
    """Clip array values to be within [lower, upper] range."""
    return np.clip(arr, lower, upper)


def find_peaks(data: np.ndarray | list[float], min_height: float | None = None) -> list[int] | None:
    """Find indices where values are local maxima above min_height.

    Returns None if no peaks are found.
    """
    peaks = []
    for i in range(1, len(data) - 1):
        if data[i] > data[i - 1] and data[i] > data[i + 1]:
            if min_height is None or data[i] >= min_height:
                peaks.append(i)
    if len(peaks) == 0:
        return None
    return peaks


def summarize(data: np.ndarray, stats: list[str]) -> dict[str, float]:
    """Calculate summary statistics for data.

    Args:
        data: Input array of numeric values.
        stats: List of statistic names to compute.
            Valid options: "mean", "median", "std", "min", "max"

    Returns:
        Dictionary mapping statistic names to computed values.
    """
    result = {}
    for stat in stats:
        if stat == "mean":
            result[stat] = np.mean(data)
        elif stat == "median":
            result[stat] = np.median(data)
        elif stat == "std":
            result[stat] = np.std(data)
        elif stat == "min":
            result[stat] = np.min(data)
        elif stat == "max":
            result[stat] = np.max(data)
    return result
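
Since `summarize` accepts only a fixed set of statistic names, the hint for `stats` could be narrowed further with `typing.Literal`. The following is a minimal sketch under that assumption; `StatName`, `_STAT_FUNCS`, and `summarize_typed` are hypothetical names introduced here for illustration.

```python
from typing import Callable, Literal

import numpy as np

# Hypothetical alias: the only statistic names the function understands.
StatName = Literal["mean", "median", "std", "min", "max"]

# Lookup table from statistic name to the NumPy function that computes it.
_STAT_FUNCS: dict[str, Callable[[np.ndarray], float]] = {
    "mean": np.mean,
    "median": np.median,
    "std": np.std,
    "min": np.min,
    "max": np.max,
}


def summarize_typed(data: np.ndarray, stats: list[StatName]) -> dict[str, float]:
    """Compute the requested summary statistics, skipping unknown names."""
    return {
        stat: float(_STAT_FUNCS[stat](data))
        for stat in stats
        if stat in _STAT_FUNCS  # mirrors the original, which ignores unknown names
    }


print(summarize_typed(np.array([1.0, 2.0, 3.0]), ["mean", "max"]))
```

With `Literal`, a static type checker can flag a misspelled statistic name at the call site, while the runtime behavior stays the same as in the `if`/`elif` chain.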

## Problem 3: Identifying Test Types

For each scenario below, identify whether the test being described is a **unit test**, **integration test**, or **regression test**. Briefly explain your reasoning.

**(a)** You write a test that verifies `calculate_variance()` returns 0 for the input `[3.0, 3.0, 3.0]`.

This is a unit test, since it verifies that an individual function works correctly in isolation.

**(b)** After discovering that `fit_model()` crashes when given a dataset with a single row, you fix the bug and add a test with a one-row input.

This is a regression test, as it reproduces the single-row crash that was just fixed and verifies that the bug doesn't reappear.

**(c)** You write a test that loads data from a CSV file, passes it through `clean_data()`, fits a model with `fit_linear_regression()`, and verifies the model's R-squared value is within an expected range.

This is an integration test, as it verifies that multiple components work together correctly.

**(d)** A user reports that `normalize()` returns incorrect values when all input values are negative. After fixing the issue, you add a test with input `[-5.0, -3.0, -1.0]`.

This is a regression test, as it verifies that previously fixed bugs don't reappear.
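
To make the distinction concrete, scenarios (a) and (d) might look like the sketch below. `calculate_variance` is a hypothetical stand-in (the assignment does not show its implementation), and `normalize` is copied from Problem 1 so the block runs on its own.

```python
import numpy as np


def calculate_variance(values):
    """Hypothetical stand-in for the function named in scenario (a)."""
    return float(np.var(values))


def normalize(data, method="zscore"):
    """Copy of the Problem 1 function, repeated so this sketch is self-contained."""
    if method == "zscore":
        return (data - np.mean(data)) / np.std(data)
    elif method == "minmax":
        return (data - np.min(data)) / (np.max(data) - np.min(data))
    else:
        raise ValueError(f"Unknown method: {method}")


# Unit test: one function, one small input, checked in isolation.
def test_calculate_variance_constant_input_returns_zero():
    assert np.isclose(calculate_variance([3.0, 3.0, 3.0]), 0.0)


# Regression test: pins the exact input from the bug report in scenario (d).
def test_normalize_all_negative_input_regression():
    data = np.array([-5.0, -3.0, -1.0])
    zscores = normalize(data)
    assert np.isclose(zscores.mean(), 0.0)
    assert np.isclose(zscores.std(), 1.0)
    assert np.allclose(normalize(data, method="minmax"), [0.0, 0.5, 1.0])


test_calculate_variance_constant_input_returns_zero()
test_normalize_all_negative_input_regression()
```

Both tests have the same shape. What makes the second a regression test is its origin: the input comes from a reported failure rather than from the function's specification.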

## Problem 4: Code Review - What's Wrong with These Tests?

Review the following test code and identify at least **four** problems with the test design or implementation. Explain why each is problematic and suggest how to fix it.

In [None]:
import numpy as np

#--------------------------------
# The following test function tests multiple things at one time.
# It should be split into separate test functions, each testing one aspect.
def test_all_statistics():
    data = [10, 20, 30, 40, 50]

    # Test mean
    assert np.mean(data) == 30

    # Test median
    assert np.median(data) == 30

    # Test standard deviation
    assert np.std(data) > 0

    # Test min and max
    assert np.min(data) == 10
    assert np.max(data) == 50

    # Test sum
    assert np.sum(data) == 150

#--------------------------------
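# The following function is never collected by pytest because its name lacks the
# `test_` prefix, and nothing supplies its `arr` argument. Rename it and build the
# input inside the test. A good pattern is `test_<function>_<scenario>_<expected_result>`.
# For example, the function name could be `test_npvar_returns_non_negative`
# (the current name says "positive", but the assertion only checks `>= 0`).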
def verify_variance_positive(arr):
    var = np.var(arr)
    assert var >= 0

#--------------------------------
# The following test function doesn't take into account floating point precision.
# It should compare with a tolerance, e.g. `math.isclose`, `np.isclose`, or `pytest.approx`.
def test_correlation():
    x = np.array([1.0, 2.0, 3.0])
    y = np.array([2.0, 4.0, 6.0])
    corr = np.corrcoef(x, y)[0, 1]
    assert corr == 1.0

#--------------------------------
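# The two test functions below share the module-level `results` list, so
# `test_check_results` passes only if `test_append_result` ran exactly once before it.
# They should be combined into one test that creates `results` inside the function,
# so that no test depends on another test or on execution order.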
results = []

def test_append_result():
    global results
    results.append(42)
    assert 42 in results

def test_check_results():
    assert len(results) == 1
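
One possible way to apply the four fixes noted in the comments above is sketched here. The test names and the input used for the variance test are illustrative choices, and `np.isclose` is used for the tolerance checks so the block needs nothing beyond NumPy.

```python
import numpy as np


# Fix 1: one behavior per test, so a failure points at a single cause.
def test_mean_of_evenly_spaced_values():
    assert np.isclose(np.mean([10, 20, 30, 40, 50]), 30.0)


def test_std_of_evenly_spaced_values():
    # Population variance is (400 + 100 + 0 + 100 + 400) / 5 = 200.
    assert np.isclose(np.std([10, 20, 30, 40, 50]), np.sqrt(200))


# Fix 2: `test_` prefix and no arguments; the input is built inside the test.
def test_npvar_returns_non_negative():
    arr = np.array([-2.0, 0.0, 5.0])
    assert np.var(arr) >= 0


# Fix 3: compare floating point results with a tolerance.
def test_corrcoef_perfectly_linear_returns_one():
    x = np.array([1.0, 2.0, 3.0])
    y = np.array([2.0, 4.0, 6.0])
    corr = np.corrcoef(x, y)[0, 1]
    assert np.isclose(corr, 1.0)


# Fix 4: no shared global state; the list lives inside the test.
def test_append_adds_value_to_fresh_list():
    results = []
    results.append(42)
    assert results == [42]


for test in (
    test_mean_of_evenly_spaced_values,
    test_std_of_evenly_spaced_values,
    test_npvar_returns_non_negative,
    test_corrcoef_perfectly_linear_returns_one,
    test_append_adds_value_to_fresh_list,
):
    test()
```

The standard deviation test also checks an exact expected value instead of the original `> 0`, which would pass for almost any non-constant input.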

## Problem 5: The Flaky Test

Your colleague wrote the following test for a bootstrap confidence interval function:

In [5]:
import numpy as np

def bootstrap_ci(data, confidence=0.95, n_bootstrap=1000):
    """Compute bootstrap confidence interval for the mean."""
    means = []
    n = len(data)
    for _ in range(n_bootstrap):
        sample = np.random.choice(data, size=n, replace=True)
        means.append(np.mean(sample))

    alpha = 1 - confidence
    lower = np.percentile(means, 100 * alpha / 2)
    upper = np.percentile(means, 100 * (1 - alpha / 2))
    return lower, upper

def test_bootstrap_ci_contains_true_mean():
    data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
    true_mean = 5.5
    lower, upper = bootstrap_ci(data)
    assert lower < true_mean < upper

**(a)** The test passes most of the time but occasionally fails. Explain why this test is "flaky" (non-deterministic).

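The test is flaky because `bootstrap_ci` resamples with `np.random.choice` and no seed is fixed, so every run draws different bootstrap samples and produces different interval endpoints. Whether `lower < true_mean < upper` holds can therefore change from run to run. A random seed should have been set; passing in a seeded random state will also work.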

**(b)** Your colleague argues: "The test is correct because a 95% confidence interval should contain the true mean 95% of the time, so occasional failures are expected." Is this a good argument for keeping the test as-is? Why or why not?

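No. A test should fail only when the code is wrong. A test that fails at random cannot be told apart from a real bug, and it teaches people to ignore failures. The 95% figure describes how often intervals cover the true mean over many repetitions, which a single run cannot check. The function to be tested involves randomness, so testing it requires careful handling (such as a fixed seed) to get reproducible results.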

**(c)** Rewrite the test to be deterministic and reliable while still meaningfully testing the `bootstrap_ci` function. Your solution should: ensure reproducible results and verify that the confidence interval has reasonable properties.

In [9]:
def test_bootstrap_ci_contains_true_mean_v2():
    np.random.seed(0)
    data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
    true_mean = 5.5
    lower, upper = bootstrap_ci(data)
    assert lower < true_mean < upper

test_bootstrap_ci_contains_true_mean_v2()
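
As a variation on the seeded test above, the "random state" idea from part (a) can be sketched by giving the function its own generator and checking a few more properties of the interval. `bootstrap_ci_seeded` and its `rng` parameter are hypothetical additions for illustration. The original `bootstrap_ci` uses NumPy's global random state instead.

```python
import numpy as np


def bootstrap_ci_seeded(data, confidence=0.95, n_bootstrap=1000, rng=None):
    """Hypothetical variant of bootstrap_ci that draws from a passed-in generator."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(data)
    means = [np.mean(rng.choice(data, size=n, replace=True)) for _ in range(n_bootstrap)]
    alpha = 1 - confidence
    lower = np.percentile(means, 100 * alpha / 2)
    upper = np.percentile(means, 100 * (1 - alpha / 2))
    return lower, upper


def test_bootstrap_ci_seeded_has_reasonable_properties():
    data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

    lower, upper = bootstrap_ci_seeded(data, rng=np.random.default_rng(0))

    # The interval is non-degenerate and stays inside the range of the data.
    assert data.min() <= lower < upper <= data.max()
    # The interval brackets the sample mean it was resampled around.
    assert lower < np.mean(data) < upper
    # The same seed reproduces the same interval.
    assert (lower, upper) == bootstrap_ci_seeded(data, rng=np.random.default_rng(0))
    # With the same resamples, a 50% interval sits inside the 95% interval.
    inner_lower, inner_upper = bootstrap_ci_seeded(
        data, confidence=0.50, rng=np.random.default_rng(0)
    )
    assert lower <= inner_lower < inner_upper <= upper


test_bootstrap_ci_seeded_has_reasonable_properties()
```

Passing a generator keeps the test from changing NumPy's global random state, so it cannot affect other tests that run after it.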


**(d)** Propose an alternative testing strategy that could verify the 95% coverage property without making the test flaky. You don't need to implement it, but describe the approach.

Fix the random seed, call `bootstrap_ci` many times, and assert that the fraction of intervals containing the true mean stays above a tolerance set below the nominal 95% (0.90 here), rather than asserting on any single interval.

In [None]:
def test_bootstrap_ci_contains_true_mean_v3():
    np.random.seed(0)
    data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
    true_mean = 5.5
    
    n_trials = 200
    covered_count = 0

    for i in range(n_trials):
        lower, upper = bootstrap_ci(data)
        if lower < true_mean < upper:
            covered_count += 1

    coverage = covered_count / n_trials
    assert coverage >= 0.90

test_bootstrap_ci_contains_true_mean_v3()
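
The test above resamples one fixed dataset, so it measures how often the interval brackets that dataset's own mean. A coverage check in the usual sense would draw a fresh dataset from a distribution with a known mean on every trial. The sketch below shows that idea under stated assumptions: the normal distribution, the sample size, the number of trials, and the 0.85 tolerance are arbitrary illustrative choices, and `bootstrap_ci_vectorized` is a hypothetical stand-in for `bootstrap_ci` that takes a generator.

```python
import numpy as np


def bootstrap_ci_vectorized(data, confidence=0.95, n_bootstrap=500, rng=None):
    """Percentile bootstrap CI for the mean (hypothetical vectorized variant)."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(data)
    # Each row of idx indexes one resample of the data, drawn with replacement.
    idx = rng.integers(0, n, size=(n_bootstrap, n))
    means = data[idx].mean(axis=1)
    alpha = 1 - confidence
    lower = np.percentile(means, 100 * alpha / 2)
    upper = np.percentile(means, 100 * (1 - alpha / 2))
    return lower, upper


def estimate_coverage(n_trials=300, sample_size=30, true_mean=5.0, seed=0):
    """Fraction of trials whose interval contains the known population mean."""
    rng = np.random.default_rng(seed)
    covered_count = 0
    for _ in range(n_trials):
        # A fresh dataset per trial, drawn from a distribution with a known mean.
        data = rng.normal(loc=true_mean, scale=2.0, size=sample_size)
        lower, upper = bootstrap_ci_vectorized(data, rng=rng)
        if lower < true_mean < upper:
            covered_count += 1
    return covered_count / n_trials


def test_bootstrap_ci_coverage_is_near_nominal():
    # The seed makes the estimate reproducible; the bound leaves slack below 95%.
    assert estimate_coverage() >= 0.85


test_bootstrap_ci_coverage_is_near_nominal()
```

Only a lower bound is asserted here. An upper bound could be added to catch intervals that are too wide, at the cost of choosing a second tolerance.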