# CP201A Lecture: Testing for Statistical Significance
Fall 2025

Today, we're going to answer the question: Which county saw the greatest increase in median rent between 2019 and 2023?

In [None]:
# Import the libraries and modules we need
import pandas as pd
from census import Census
import numpy as np

In [None]:
# Initialize the Census Data API connection with your API key
# This should look familiar by now, but talk through each step of the code with your team so you can explain what is happening
api_key = ''
c = Census(key=api_key)

# Define the dict of variables to pull and rename
variables_of_interest = {
    'NAME': 'NAME',
    'GEO_ID': 'GEO_ID',
    'B25064_001E': 'med_rent',
    'B25064_001M': 'med_rent_moe',
}

# Pull 2019
df_2019 = pd.DataFrame(
    c.acs1.get(
        list(variables_of_interest.keys()),
        {'for': 'county:*', 'in':'state:06'},
        year=2019
    )
).rename(columns=variables_of_interest)

# Pull 2023
df_2023 = pd.DataFrame(
    c.acs1.get(
        list(variables_of_interest.keys()),
        {'for': 'county:*', 'in':'state:06'},
        year=2023
    )
).rename(columns=variables_of_interest)

In [None]:
#we're going to do some cleaning - talk through with your team what is happening in each step of this code
df_2019 = df_2019.rename(
    columns={
        "med_rent": "med_rent_2019",
        "med_rent_moe": "med_rent_2019_moe"
    }
)

df_2023 = df_2023.rename(
    columns={
        "med_rent": "med_rent_2023",
        "med_rent_moe": "med_rent_2023_moe"
    }
)

# Merge and keep NAME from 2023
df_merged = pd.merge(
    df_2019[["state", "county", "med_rent_2019", "med_rent_2019_moe"]],
    df_2023[["state", "county", "NAME", "med_rent_2023", "med_rent_2023_moe"]],
    on=["state", "county"],
    how="inner"
)

df = df_merged[[
    "NAME", 
    "med_rent_2019", 
    "med_rent_2019_moe", 
    "med_rent_2023", 
    "med_rent_2023_moe"
]].copy()
df

In [None]:
#calculate the percent increase in median rents
#which counties have the greatest increase?

In [None]:
# Adjust the 2019 values for inflation and recalculate the percent increase in median rents - what changed?
# I've provided the inflation factor for you (2019 -> 2023 dollars)
inflation_factor = 305.109 / 255.657  # ≈ 1.193

#you'll use that inflation factor to adjust 2019 estimates and MOEs to 2023 dollars
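
As a rough sketch of the adjustment described above, here is the same arithmetic on made-up numbers rather than the ACS pull. The `toy` DataFrame, its county names, and its dollar values are hypothetical; only the inflation factor comes from the lecture. Multiplying both the 2019 estimate and its MOE by the same factor expresses them in 2023 dollars, and the percent change is then computed against the adjusted base.

```python
import pandas as pd

# Hypothetical values for illustration only; these are not real ACS estimates
toy = pd.DataFrame({
    "NAME": ["County A", "County B"],
    "med_rent_2019": [1000.0, 1500.0],
    "med_rent_2019_moe": [50.0, 80.0],
    "med_rent_2023": [1300.0, 1700.0],
})

# Inflation factor provided in the lecture (2019 -> 2023 dollars)
inflation_factor = 305.109 / 255.657

# Scale the 2019 estimate AND its margin of error into 2023 dollars
toy["med_rent_2019_adj"] = toy["med_rent_2019"] * inflation_factor
toy["med_rent_2019_moe_adj"] = toy["med_rent_2019_moe"] * inflation_factor

# Percent change before and after the inflation adjustment
toy["pct_change_nominal"] = (
    (toy["med_rent_2023"] - toy["med_rent_2019"]) / toy["med_rent_2019"] * 100
)
toy["pct_change_real"] = (
    (toy["med_rent_2023"] - toy["med_rent_2019_adj"]) / toy["med_rent_2019_adj"] * 100
)

print(toy[["NAME", "pct_change_nominal", "pct_change_real"]].round(1))
```

In this toy data, County B's rent rises in nominal terms but falls once 2019 rents are expressed in 2023 dollars, which is the kind of change the question "what changed?" is pointing at.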

------------------------
# 2. Testing for statistically significant differences

## 2.1 Calculating standard errors

First we need to convert the 90% confidence level margins of error that come with the ACS data into standard errors. The formula to do so is $SE = \frac{MOE_{ACS}}{1.645},$ where $MOE_{ACS}$ is the 90% margin of error provided for the ACS estimate.
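
To make the conversion concrete, here is a minimal sketch on made-up numbers. The `toy` DataFrame and its values are hypothetical, and the column names simply mirror the naming pattern used earlier in this notebook. Dividing a pandas column by a constant applies the formula to every row at once.

```python
import pandas as pd

Z_90 = 1.645  # z-score for the 90% confidence level used by the ACS

# Hypothetical estimates and margins of error, for illustration only
toy = pd.DataFrame({
    "med_rent_2023": [1300.0, 1700.0],
    "med_rent_2023_moe": [32.9, 164.5],
})

# SE = MOE / 1.645, applied to the whole column
toy["med_rent_2023_se"] = toy["med_rent_2023_moe"] / Z_90

print(toy.round(2))
```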


In [None]:
# Create new variables for the standard errors

## 2.2 Implementing the two-sample z-test of means

Let's review the formula for testing whether two sample estimates are statistically significantly different from each other:

$$\left|\frac{\hat{X}_1 - \hat{X}_2}{\sqrt{SE_1^2 + SE_2^2}}\right| > Z_{CL},$$
where:
* $\hat{X}_1$ and $\hat{X}_2$ are the estimates we're comparing (the hat over the $X$ just means that the value is an estimate)
* $SE_1$ and $SE_2$ are the corresponding *standard error* values, and
* $Z_{CL}$ is the z-score associated with a given *confidence level* (1.645 for 90%, 1.96 for 95%, 2.576 for 99%).
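
Here is a small worked example of the formula on a single pair of estimates. The numbers and the helper name `z_statistic` are made up for illustration; the critical values are the ones listed above. The absolute value is taken with Python's built-in `abs`.

```python
import math

def z_statistic(x1, x2, se1, se2):
    """Absolute difference between two estimates, in units of the combined standard error."""
    return abs((x1 - x2) / math.sqrt(se1**2 + se2**2))

# Hypothetical estimates and standard errors
z = z_statistic(x1=1300, x2=1200, se1=30, se2=40)
print(z)  # 2.0

# Compare against the critical value for each confidence level
critical_values = {"90%": 1.645, "95%": 1.96, "99%": 2.576}
results = {level: z > z_cl for level, z_cl in critical_values.items()}
print(results)  # {'90%': True, '95%': True, '99%': False}
```

With these numbers the difference is 100 and the combined standard error is 50, so the test statistic is 2.0: significant at the 90% and 95% levels, but not at 99%.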

We have all our “ingredients”: the median rent estimate for each county in each year, as well as the associated standard errors. Now we just need to implement this formula. It looks complicated, but we already know addition `+`, subtraction `-`, division `/`, and exponentiation `**` in Python (a square root is just exponentiation by `0.5`). All we really need to complete the picture is how to take the *absolute value* of a number.

The absolute value of a real number $x$ is the non-negative value of $x$, without regard to its sign. In math formulas, $|x|$ denotes an absolute value. In Python, the function `abs(x)` returns the value of `x` if `x` is non-negative, or `-x` if `x` is negative. So `abs(4)` is 4, and `abs(-10)` is 10.
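
The same function also works element-wise on a pandas column, which is what we need for a whole DataFrame. A quick sketch, using a made-up Series named `diffs`:

```python
import pandas as pd

# Hypothetical differences, some negative
diffs = pd.Series([4.0, -10.0, 0.0])

# The built-in abs() and the Series method .abs() give the same result
print(abs(diffs).tolist())   # [4.0, 10.0, 0.0]
print(diffs.abs().tolist())  # [4.0, 10.0, 0.0]
```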


In [None]:
# Write code to calculate the Z test statistic for the change in median rent (dollar change adjusted for inflation)

In [None]:
#sort your data by the percent increase in median rents and look at the output of the dataframe

**What county saw the greatest increase in rents?**

**Which saw the smallest?**

**Are the changes statistically significant?**
If so, at what confidence level?
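
One possible way to answer the confidence-level question for each row is to check the test statistic against the critical values from section 2.2, strictest first. This is only a sketch: the helper name `highest_confidence_level` and the example inputs are hypothetical.

```python
# Critical values from section 2.2, ordered from strictest to most lenient
CRITICAL_VALUES = [("99%", 2.576), ("95%", 1.96), ("90%", 1.645)]

def highest_confidence_level(z):
    """Return the strictest confidence level at which |z| exceeds the critical value."""
    for level, critical in CRITICAL_VALUES:
        if abs(z) > critical:
            return level
    return "not significant at 90%"

# Made-up test statistics for illustration
for z in [3.1, 2.0, 1.7, 1.0]:
    print(z, "->", highest_confidence_level(z))
```

A function like this could be applied to a column of test statistics with the pandas `Series.apply` method.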


In [None]:
#If you have time, can you do the same analysis for another state? 