API documentation: https://www.huduser.gov/portal/dataset/uspszip-api.html

## This ZIP code data is incomplete: it does not include PO Boxes or major universities


### Purpose:

> This script downloads Massachusetts USPS ZIP code data from the HUD-USPS API
> and saves it as a CSV file for later use. Each record includes the ZIP code,
> geoid, residential/business/other/total ratios, city, and state.

### Key Features:
- Uses the HUD-USPS API to get official ZIP code metadata.
- Reads the API key from the environment variable 'HUD_API_KEY'.
- Saves the results to a configurable output directory.
- Performs basic validation and error handling for API responses.

### Workflow / Steps:
1. Set the output directory and create it if it doesn't exist.
2. Define the HUD USPS API endpoint for Massachusetts.
3. Load the API token from the 'HUD_API_KEY' environment variable; raise an error if it is missing.
4. Set request headers with the Bearer token for authentication.
5. Make an HTTP GET request to the API with a timeout.
6. Check the response status code; raise an error if the request failed.
7. Parse the JSON response safely; validate that expected keys exist.
8. Convert the 'results' data into a pandas DataFrame.
9. Save the DataFrame to CSV in the output directory.
10. Print a success message with the number of records saved.
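
Steps 6 and 7 can be isolated in a small pure function, which makes the validation testable without a network call. The sketch below is illustrative only: `extract_results` is a hypothetical helper name, and the sample payload merely mimics the `data` → `results` nesting that the script checks for; it is not real HUD output.

```python
from typing import Any


def extract_results(status_code: int, payload: Any) -> list:
    """Return payload["data"]["results"] or raise a descriptive error.

    Hypothetical helper: mirrors the status and key checks in the script.
    """
    if status_code != 200:
        raise RuntimeError(f"Request failed: {status_code}")
    if not isinstance(payload, dict) or not isinstance(payload.get("data"), dict):
        raise ValueError("Unexpected response format: missing 'data' object")
    if "results" not in payload["data"]:
        raise ValueError("Unexpected response format: missing 'results'")
    return payload["data"]["results"]


# Made-up payload with the nesting the script expects (not real API output).
sample = {"data": {"results": [{"zip": "01602", "city": "WORCESTER", "state": "MA"}]}}
rows = extract_results(200, sample)
print(len(rows))
```

Keeping the checks separate from the HTTP call means a malformed payload fails with a specific message instead of a `KeyError` further down.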

### Notes:
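- The HUD API returns metadata for the Massachusetts ZIP codes it covers; as noted at the top, coverage is incomplete. A ZIP code may appear in more than one row (one per geoid), which is why later cells de-duplicate the 'zip' column.
- To preserve leading zeros in ZIP codes, read the 'zip' column as a string
    when loading the CSV later.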
- This CSV can be used for filtering ZIP codes by city/state, building 
    bounding boxes for geocoding, or linking to TIGER ZCTA shapefiles for GIS.
- Ensure the environment variable 'HUD_API_KEY' is set before running this script.
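
The leading-zero note matters because every Massachusetts ZIP code starts with 0. A minimal sketch of the difference, using a made-up two-row CSV held in memory rather than the real `ma_zips.csv`:

```python
import io

import pandas as pd

# Made-up two-row CSV standing in for ma_zips.csv (not real HUD output).
csv_text = "zip,city,state\n01602,WORCESTER,MA\n02139,CAMBRIDGE,MA\n"

# Default type inference parses the column as integers and drops the zero.
as_int = pd.read_csv(io.StringIO(csv_text))

# Forcing a string dtype keeps the ZIP codes exactly as written.
as_str = pd.read_csv(io.StringIO(csv_text), dtype={"zip": str})

# If a file was already loaded as integers, zero-padding to 5 digits repairs it.
repaired = as_int["zip"].astype(str).str.zfill(5)

print(as_int["zip"].tolist())   # [1602, 2139]
print(as_str["zip"].tolist())   # ['01602', '02139']
print(repaired.tolist())        # ['01602', '02139']
```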


In [1]:
import os
import pandas as pd
import requests


# Output directory
output_dir = "../data/external/hud_data/"
os.makedirs(output_dir, exist_ok=True)


# HUD USPS API endpoint (MA = Massachusetts)
url = "https://www.huduser.gov/hudapi/public/usps?type=1&query=MA"


# Get API token from environment
token = os.environ.get("HUD_API_KEY")

if not token:
    raise RuntimeError("HUD_API_KEY environment variable is not set")


# Request headers
headers = {
    "Authorization": f"Bearer {token}"
}


# Make request
response = requests.get(url, headers=headers, timeout=30)


# Check response
if response.status_code != 200:
    raise RuntimeError(
        f"Request failed: {response.status_code} - {response.text}"
    )


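# Parse JSON safely: response.json() raises ValueError on a non-JSON body
try:
    data = response.json()
except ValueError as exc:
    raise ValueError("Response body is not valid JSON") from exc

if "data" not in data or "results" not in data["data"]:
    raise ValueError("Unexpected response format")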


# Convert to DataFrame
df = pd.DataFrame(data["data"]["results"])


# Save to CSV
output_file = os.path.join(output_dir, "ma_zips.csv")

df.to_csv(output_file, index=False)

print(f"Saved {len(df)} records to: {output_file}")

Saved 3213 records to: ../data/external/hud_data/ma_zips.csv
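
The record count is a row count, not necessarily a count of distinct ZIP codes: the later cell calls `.unique()` on the `zip` column, which suggests a ZIP code can appear in more than one row (for example, once per `geoid`). A minimal sketch of checking this, using made-up rows rather than the real file:

```python
import pandas as pd

# Made-up rows: ZIP 01602 appears twice with different (hypothetical) geoids.
df_sample = pd.DataFrame(
    {
        "zip": ["01602", "01602", "01603"],
        "geoid": ["A", "B", "C"],
        "city": ["WORCESTER", "WORCESTER", "WORCESTER"],
    }
)

n_rows = len(df_sample)                        # total records
n_zips = df_sample["zip"].nunique()            # distinct ZIP codes
rows_per_zip = df_sample.groupby("zip").size() # how often each ZIP repeats

print(f"{n_rows} rows, {n_zips} distinct ZIP codes")
print(rows_per_zip.to_dict())
```

Running the same three lines against the loaded `df` would show how the saved record count relates to the number of distinct ZIP codes.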


In [3]:
import pandas as pd

# Path to your CSV
csv_path = "../data/external/hud_data/ma_zips.csv"

# Load the CSV
df = pd.read_csv(csv_path, dtype={"zip": str})

# Filter for Worcester, MA
worcester_df = df[
    (df["city"].str.upper() == "WORCESTER") & 
    (df["state"].str.upper() == "MA")
]

# Extract ZIP codes as a list
worcester_zips = worcester_df["zip"].unique().tolist()

print("Worcester ZIP codes:", sorted(worcester_zips))


Worcester ZIP codes: ['01602', '01603', '01604', '01605', '01606', '01607', '01608', '01609', '01610', '01613', '01653', '01655']
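
The Worcester filter generalizes to any city. The sketch below wraps it in a function; `zips_for_city` is a hypothetical helper name, it assumes the 'zip', 'city', and 'state' columns described above, and the sample rows are made up for illustration.

```python
import pandas as pd


def zips_for_city(df: pd.DataFrame, city: str, state: str) -> list:
    """Return the sorted, de-duplicated ZIP codes for one city and state.

    Hypothetical helper; matching ignores case and surrounding whitespace.
    """
    mask = (
        df["city"].str.strip().str.upper().eq(city.strip().upper())
        & df["state"].str.strip().str.upper().eq(state.strip().upper())
    )
    return sorted(df.loc[mask, "zip"].unique().tolist())


# Made-up rows standing in for the loaded CSV (not real HUD output).
df_sample = pd.DataFrame(
    {
        "zip": ["01602", "01602", "01603", "02139"],
        "city": ["WORCESTER", "Worcester", "WORCESTER", "CAMBRIDGE"],
        "state": ["MA", "MA", "MA", "MA"],
    }
)

print(zips_for_city(df_sample, "Worcester", "ma"))  # ['01602', '01603']
```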
