# Homework: Documenting Your Code + Testing Your Code

## Problem 1 - Write docstrings

The following functions are missing docstrings. Write Google-style docstrings for each function, including `Args`, `Returns`, and `Raises` sections where appropriate. Make sure to document default values and explain what each parameter means.

In [1]:
import numpy as np


def normalize(data, method="zscore"):
    """
    Normalize numerical data using a specified method.

    Args:
        data (array-like): Numeric input data to be normalized.
        method (str, optional): Normalization method to use.
            "zscore" subtracts the mean and divides by the standard
            deviation; "minmax" rescales the data linearly to [0, 1].
            Defaults to "zscore".

    Returns:
        np.ndarray: Normalized data array.

    Raises:
        ValueError: If an unknown normalization method is provided.
    """
    if method == "zscore":
        return (data - np.mean(data)) / np.std(data)
    elif method == "minmax":
        return (data - np.min(data)) / (np.max(data) - np.min(data))
    else:
        raise ValueError(f"Unknown method: {method}")


def weighted_mean(values, weights=None):
    """
    Compute the weighted mean of input values.

    Args:
        values (array-like): Numeric input values.
        weights (array-like, optional): Weights associated with each value.
            If None, the arithmetic mean is returned.
            Defaults to None.

    Returns:
        float: Weighted mean of the input values.

    Raises:
        ValueError: If `values` and `weights` have different lengths.
    """
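    if weights is None:
        return np.mean(values)
    # Convert so that element-wise multiplication also works for plain lists
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")

    return np.sum(values * weights) / np.sum(weights)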


def remove_outliers(data, threshold=3.0):
    """
    Remove outliers from numerical data using a standard deviation threshold.

    Data points are retained if their absolute distance from the mean is less
    than or equal to `threshold` times the standard deviation.

    Args:
        data (array-like): One-dimensional numeric input data.
        threshold (float, optional): Number of standard deviations from the
            mean used to identify outliers.
            Defaults to 3.0.

    Returns:
        np.ndarray: Array containing only data points within the specified
        threshold of the mean.
    """
    # Convert first: plain lists do not support boolean-mask indexing
    data = np.asarray(data)
    mean = np.mean(data)
    std = np.std(data)
    mask = np.abs(data - mean) <= threshold * std
    return data[mask]


In [2]:
help(remove_outliers)

Help on function remove_outliers in module __main__:

remove_outliers(data, threshold=3.0)
    Remove outliers from numerical data using a standard deviation threshold.

    Data points are retained if their absolute distance from the mean is less
    than or equal to `threshold` times the standard deviation.

    Args:
        data (array-like): One-dimensional numeric input data.
        threshold (float, optional): Number of standard deviations from the
            mean used to identify outliers.
            Defaults to 3.0.

    Returns:
        np.ndarray: Array containing only data points within the specified
        threshold of the mean.
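Since the assignment is also about testing, note that a Google-style docstring can carry an optional `Examples:` section whose `>>>` lines double as tests through the standard-library `doctest` module. A minimal sketch; `minmax_scale` is a hypothetical function modeled on the `"minmax"` branch of `normalize`, not part of the assignment:

```python
import doctest

import numpy as np


def minmax_scale(data):
    """Rescale numeric data linearly to the range [0, 1].

    Args:
        data (array-like): Numeric input data with at least two distinct values.

    Returns:
        np.ndarray: Rescaled data; the minimum maps to 0.0 and the maximum to 1.0.

    Examples:
        >>> minmax_scale([2.0, 4.0, 6.0]).tolist()
        [0.0, 0.5, 1.0]
    """
    data = np.asarray(data, dtype=float)
    return (data - data.min()) / (data.max() - data.min())


if __name__ == "__main__":
    # Runs the >>> lines in every docstring of this module; silent if all match
    doctest.testmod()
```

Running the file as a script executes the `>>>` example and prints nothing if the output matches; `python -m doctest file.py` does the same from the command line.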



## Problem 2 - Add type hints

The following functions have incomplete or missing type hints. Add appropriate type hints for all parameters and return values. Use `|` syntax for union types where a parameter can accept multiple types or return `None`.

In [3]:
import numpy as np

def clip_values(
    arr: np.ndarray,
    lower: float,
    upper: float
) -> np.ndarray:
    """Clip array values to be within [lower, upper] range."""
    return np.clip(arr, lower, upper)


def find_peaks(
    data: list[float] | np.ndarray,
    min_height: float | None = None
) -> list[int] | None:
    """Find indices of interior local maxima with value >= min_height.

    Returns None if no peaks are found.
    """
    peaks = []
    for i in range(1, len(data) - 1):
        if data[i] > data[i - 1] and data[i] > data[i + 1]:
            if min_height is None or data[i] >= min_height:
                peaks.append(i)
    if len(peaks) == 0:
        return None
    return peaks


def summarize(
    data: np.ndarray | list[float],
    stats: list[str]
) -> dict[str, float]:
    """Calculate summary statistics for data.

    Args:
        data: Input array of numeric values.
        stats: List of statistic names to compute.
            Valid options: "mean", "median", "std", "min", "max"

    Returns:
        Dictionary mapping statistic names to computed values.
    """
    result = {}
    for stat in stats:
        if stat == "mean":
            result[stat] = np.mean(data)
        elif stat == "median":
            result[stat] = np.median(data)
        elif stat == "std":
            result[stat] = np.std(data)
        elif stat == "min":
            result[stat] = np.min(data)
        elif stat == "max":
            result[stat] = np.max(data)
    return result

## Problem 3: Identifying Test Types

For each scenario below, identify whether the test being described is a **unit test**, **integration test**, or **regression test**. Briefly explain your reasoning.

**(a)** You write a test that verifies `calculate_variance()` returns 0 for the input `[3.0, 3.0, 3.0]`.

Unit test: it checks a single function, `calculate_variance()`, in isolation on a controlled input whose correct output (0 for identical values) is known in advance.

**(b)** After discovering that `fit_model()` crashes when given a dataset with a single row, you fix the bug and add a test with a one-row input.

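Regression test: it reproduces a previously discovered bug (the crash on a one-row dataset) and stays in the test suite so the bug cannot silently return after future changes.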

**(c)** You write a test that loads data from a CSV file, passes it through `clean_data()`, fits a model with `fit_linear_regression()`, and verifies the model's R-squared value is within an expected range.

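Integration test: it checks that several components (reading the CSV file, `clean_data()`, and `fit_linear_regression()`) work correctly together, rather than testing any one of them in isolation.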

**(d)** A user reports that `normalize()` returns incorrect values when all input values are negative. After fixing the issue, you add a test with input `[-5.0, -3.0, -1.0]`.

Regression test: it encodes the input from a user-reported bug, `[-5.0, -3.0, -1.0]`, so the fixed behavior of `normalize()` is checked on every future run.
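The regression test from (d) could look like the sketch below. The bug report does not say what the wrong output was, so the test checks properties that should hold for any input: z-scores have mean 0 and standard deviation 1, and min-max scaling of evenly spaced values gives `[0, 0.5, 1]`. `normalize` is copied from Problem 1 so the block runs on its own.

```python
import numpy as np


def normalize(data, method="zscore"):
    # Copy of normalize() from Problem 1 so this block is self-contained
    if method == "zscore":
        return (data - np.mean(data)) / np.std(data)
    elif method == "minmax":
        return (data - np.min(data)) / (np.max(data) - np.min(data))
    else:
        raise ValueError(f"Unknown method: {method}")


def test_normalize_all_negative_inputs():
    """Regression test: all-negative input must normalize like any other input."""
    data = np.array([-5.0, -3.0, -1.0])

    z = normalize(data, method="zscore")
    assert np.isclose(np.mean(z), 0.0)
    assert np.isclose(np.std(z), 1.0)

    mm = normalize(data, method="minmax")
    assert np.allclose(mm, [0.0, 0.5, 1.0])
```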

## Problem 4: Code Review - What's Wrong with These Tests?

Review the following test code and identify at least **four** problems with the test design or implementation. Explain why each is problematic and suggest how to fix it.

In [4]:
import numpy as np

def test_all_statistics():
    data = [10, 20, 30, 40, 50]

    # Test mean
    assert np.mean(data) == 30

    # Test median
    assert np.median(data) == 30

    # Test standard deviation
    assert np.std(data) > 0

    # Test min and max
    assert np.min(data) == 10
    assert np.max(data) == 50

    # Test sum
    assert np.sum(data) == 150

def verify_variance_positive(arr):
    var = np.var(arr)
    assert var >= 0

def test_correlation():
    x = np.array([1.0, 2.0, 3.0])
    y = np.array([2.0, 4.0, 6.0])
    corr = np.corrcoef(x, y)[0, 1]
    assert corr == 1.0

results = []

def test_append_result():
    global results
    results.append(42)
    assert 42 in results

def test_check_results():
    assert len(results) == 1

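1. `assert np.std(data) > 0` is a weak assertion: a standard deviation is never negative and almost any non-constant input makes it positive, so many wrong implementations would still pass. Compare against the known value instead, e.g. `assert np.isclose(np.std(data), np.sqrt(200))`, the population standard deviation of `[10, 20, 30, 40, 50]`.

2. `verify_variance_positive(arr)` is never run by pytest, because pytest only collects functions whose names start with `test_`. Renaming it to `test_variance_positive` is not enough on its own: pytest would treat the `arr` parameter as a fixture and report an error. The test should build its own input (or use `@pytest.mark.parametrize`).

3. `test_append_result` and `test_check_results` share the module-level `results` list, so `test_check_results` passes only if `test_append_result` ran exactly once before it. It fails if the tests run in a different order, run individually, or run twice. Each test should create its own state, for example by combining the two checks into a single test that builds a local list, or by supplying a fresh list through a pytest fixture.

4. `assert corr == 1.0` compares floats exactly. `np.corrcoef` can return a value that differs from 1.0 by rounding error, so use a tolerance: `assert np.isclose(corr, 1.0, atol=1e-12)`.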
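A sketch of how the tests could look after these fixes (only a subset of the statistics is shown). It also splits `test_all_statistics` into one test per statistic, so a failure names the statistic that broke. `make_data` is a hypothetical helper that returns a fresh array for each test, and plain `np.isclose` is used instead of `pytest.approx` so the block runs without pytest installed; pytest still collects every `test_*` function.

```python
import numpy as np


def make_data():
    """Hypothetical helper: returns a fresh copy of the input for each test."""
    return np.array([10.0, 20.0, 30.0, 40.0, 50.0])


def test_mean():
    assert np.isclose(np.mean(make_data()), 30.0)


def test_median():
    assert np.isclose(np.median(make_data()), 30.0)


def test_std_known_value():
    # Population std: deviations -20, -10, 0, 10, 20 -> sqrt(1000 / 5) = sqrt(200)
    assert np.isclose(np.std(make_data()), np.sqrt(200.0))


def test_variance_non_negative():
    # Named test_* and takes no arguments, so pytest collects and runs it
    arr = np.random.default_rng(0).normal(size=100)
    assert np.var(arr) >= 0


def test_correlation_perfectly_linear():
    x = np.array([1.0, 2.0, 3.0])
    y = np.array([2.0, 4.0, 6.0])
    corr = np.corrcoef(x, y)[0, 1]
    # Tolerance instead of exact float equality
    assert np.isclose(corr, 1.0, atol=1e-12)


def test_append_result():
    # Local list instead of a module-level global, so test order does not matter
    results = []
    results.append(42)
    assert results == [42]
```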


## Problem 5: The Flaky Test

Your colleague wrote the following test for a bootstrap confidence interval function:

In [5]:
import numpy as np

def bootstrap_ci(data, confidence=0.95, n_bootstrap=1000):
    """Compute bootstrap confidence interval for the mean."""
    means = []
    n = len(data)
    for _ in range(n_bootstrap):
        sample = np.random.choice(data, size=n, replace=True)
        means.append(np.mean(sample))

    alpha = 1 - confidence
    lower = np.percentile(means, 100 * alpha / 2)
    upper = np.percentile(means, 100 * (1 - alpha / 2))
    return lower, upper

def test_bootstrap_ci_contains_true_mean():
    data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
    true_mean = 5.5
    lower, upper = bootstrap_ci(data)
    assert lower < true_mean < upper

In [6]:
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
bootstrap_ci(data)

(np.float64(3.8), np.float64(7.4))

**(a)** The test passes most of the time but occasionally fails. Explain why this test is "flaky" (non-deterministic).

The test is non-deterministic because the bootstrap procedure is random and no seed is set, so the confidence interval changes across runs and may occasionally exclude the true mean.

**(b)** Your colleague argues: "The test is correct because a 95% confidence interval should contain the true mean 95% of the time, so occasional failures are expected." Is this a good argument for keeping the test as-is? Why or why not?

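No. A good test should pass 100% of the time when the code is correct; otherwise a failure cannot be distinguished from a real bug, and people learn to ignore failures. The argument also misapplies the coverage property: the 95% guarantee refers to repeatedly drawing new datasets from the population, whereas this test reuses one fixed dataset and only the bootstrap resampling changes between runs.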

**(c)** Rewrite the test to be deterministic and reliable while still meaningfully testing the `bootstrap_ci` function. Your solution should: ensure reproducible results and verify that the confidence interval has reasonable properties.

See next cell.


**(d)** Propose an alternative testing strategy that could verify the 95% coverage property without making the test flaky. You don't need to implement it, but describe the approach.

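A coverage simulation with a fixed seed checks the 95% property statistically while staying reproducible:

1. Fix the random seed.
2. Simulate many datasets (e.g. 1000) from a distribution whose true mean is known, such as a normal distribution with mean 5.
3. For each dataset, compute the bootstrap CI and record whether it contains the true mean.
4. Estimate the coverage as the proportion of intervals that contain the true mean.
5. Assert that the coverage falls in a tolerance band around 0.95, e.g. `assert 0.93 < coverage < 0.97`. With 1000 datasets the Monte Carlo standard error of the coverage is about 0.007, so this band spans roughly ±3 standard errors.

Because the seed is fixed, the result is identical on every run. The simulated datasets should not be too small, since percentile bootstrap intervals tend to under-cover for small samples such as n = 10. The simulation is slow, so it fits better as a separate, occasionally run test than as part of the fast unit-test suite.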
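A runnable sketch of this strategy, scaled down (400 datasets of size 50, 500 resamples each) so it finishes quickly. It assumes a hypothetical `bootstrap_ci_vectorized` that takes an explicit `numpy.random.Generator` and draws all resamples in one call instead of looping over `np.random.choice`. The band is widened to 0.88 to 0.99 because fewer datasets mean more Monte Carlo noise.

```python
import numpy as np


def bootstrap_ci_vectorized(data, confidence=0.95, n_bootstrap=1000, rng=None):
    """Percentile bootstrap CI for the mean (hypothetical vectorized variant)."""
    rng = np.random.default_rng() if rng is None else rng
    data = np.asarray(data, dtype=float)
    # Each row is one bootstrap resample with the same size as the data
    samples = rng.choice(data, size=(n_bootstrap, data.size), replace=True)
    means = samples.mean(axis=1)
    alpha = 1 - confidence
    lower = np.percentile(means, 100 * alpha / 2)
    upper = np.percentile(means, 100 * (1 - alpha / 2))
    return lower, upper


def estimate_coverage(n_datasets=400, n=50, true_mean=5.0, seed=12345):
    """Fraction of simulated datasets whose 95% CI contains the true mean."""
    rng = np.random.default_rng(seed)  # fixed seed -> same coverage on every run
    hits = 0
    for _ in range(n_datasets):
        # New dataset from a distribution whose mean is known exactly
        data = rng.normal(loc=true_mean, scale=2.0, size=n)
        lower, upper = bootstrap_ci_vectorized(data, n_bootstrap=500, rng=rng)
        hits += int(lower <= true_mean <= upper)
    return hits / n_datasets


def test_bootstrap_ci_coverage():
    coverage = estimate_coverage()
    # Tolerance band around the nominal 0.95 that allows for Monte Carlo noise
    # and the mild under-coverage of percentile intervals at moderate n
    assert 0.88 <= coverage <= 0.99
```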


In [7]:
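def test_bootstrap_ci_reasonable():
    data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

    # Fix the global seed so the bootstrap resamples are identical on every run
    np.random.seed(0)
    lower, upper = bootstrap_ci(data, confidence=0.95, n_bootstrap=1000)

    # CI bounds are ordered
    assert lower < upper

    # CI contains the sample mean (deterministic because the seed is fixed)
    sample_mean = np.mean(data)
    assert lower <= sample_mean <= upper

    # Bootstrap means are averages of data points, so they cannot leave the data range
    assert data.min() <= lower and upper <= data.max()

    # Same seed -> same interval (reproducibility)
    np.random.seed(0)
    assert bootstrap_ci(data, confidence=0.95, n_bootstrap=1000) == (lower, upper)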
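Another property that can be checked deterministically: with the same seed, `bootstrap_ci` draws the same bootstrap means regardless of `confidence`, and `np.percentile` is monotonic in its percentile argument, so a 99% interval must contain the 90% interval. A sketch, with `bootstrap_ci` copied from `In [5]` so the block runs on its own:

```python
import numpy as np


def bootstrap_ci(data, confidence=0.95, n_bootstrap=1000):
    """Copy of bootstrap_ci from In [5] so this block is self-contained."""
    means = []
    n = len(data)
    for _ in range(n_bootstrap):
        sample = np.random.choice(data, size=n, replace=True)
        means.append(np.mean(sample))
    alpha = 1 - confidence
    lower = np.percentile(means, 100 * alpha / 2)
    upper = np.percentile(means, 100 * (1 - alpha / 2))
    return lower, upper


def test_higher_confidence_gives_wider_interval():
    data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
    for seed in range(5):
        # Same seed -> same bootstrap means, so only the percentiles differ
        np.random.seed(seed)
        lo90, hi90 = bootstrap_ci(data, confidence=0.90)
        np.random.seed(seed)
        lo99, hi99 = bootstrap_ci(data, confidence=0.99)
        assert lo99 <= lo90 and hi90 <= hi99
```

Checking several seeds guards against a property that only happens to hold for one particular draw.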
