# Address-Based Geocoding with the U.S. Census Geocoder (Python)

In this notebook, we convert **street addresses** into **latitude/longitude**
coordinates using the U.S. Census Bureau’s Geocoding Service.

By the end, you will be able to:
- Send addresses to the Census Geocoder API
- Extract coordinates from the API response
- Save results into a DataFrame/GeoDataFrame
- Quickly visualize points on an interactive map

In [1]:
import warnings
warnings.filterwarnings('ignore')

In [2]:
# importing necessary packages
import pandas as pd
import geopandas as gpd
import numpy as np
import requests
import folium

## Part 1 - Converting Street Address to Lat/Lon

### What is address-based geocoding?

Address-based geocoding converts a street address like:

`1600 Pennsylvania Ave NW, Washington, DC 20500`

into coordinates like:

`(lat, lon) = (38.8977, -77.0365)`

In this notebook we use the **U.S. Census Geocoder**, which is designed for U.S. addresses.
Geocoding results depend heavily on address quality: missing street numbers,
typos, or vague locations can return no match.

In [3]:
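def get_coordinates(address):
    """Return (lat, lon) for a one-line address, or (None, None) if there is no match or the request fails."""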
    base_url = "https://geocoding.geo.census.gov/geocoder/locations/onelineaddress"
    params = {
        "address": address,
        "benchmark": "4",  # Public_AR_Current
        "format": "json"
    }
    headers = {"User-Agent": "census-geocoding-notebook (educational use)"}

    try:
        r = requests.get(base_url, params=params, headers=headers, timeout=30)
        r.raise_for_status()
        data = r.json()
        matches = data.get("result", {}).get("addressMatches", [])
        if matches:
            coords = matches[0]["coordinates"]
            return coords["y"], coords["x"]   # (lat, lon)
    except Exception as e:
        # Optional: print(e) for debugging
        return None, None

    return None, None
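
To make the parsing step in `get_coordinates` easier to follow, here is a minimal sketch that applies the same lookup to a hand-written dictionary. The sample is illustrative only: it mirrors the keys the function reads (`result`, `addressMatches`, `coordinates`, `x`, `y`) and reuses the example coordinates from above, but it is not a captured API response. `extract_lat_lon` is a hypothetical helper name, not part of any library.

```python
# Hand-written stand-in for a parsed JSON response (not real API output).
sample_response = {
    "result": {
        "addressMatches": [
            {"coordinates": {"x": -77.0365, "y": 38.8977}}
        ]
    }
}

def extract_lat_lon(data):
    """Return (lat, lon) from the first address match, or (None, None)."""
    matches = data.get("result", {}).get("addressMatches", [])
    if not matches:
        return None, None
    coords = matches[0]["coordinates"]
    # As in get_coordinates: y is used as latitude, x as longitude.
    return coords["y"], coords["x"]

print(extract_lat_lon(sample_response))
print(extract_lat_lon({"result": {"addressMatches": []}}))
```

Separating the parsing from the HTTP call like this makes it possible to check the no-match path without sending a request.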

## Input Data Format (Example)

We will read a simple text file of addresses. The file should contain:

- Column 1: Name (optional)
- Column 2: Address (required)

Each line has the form `Name<TAB>Address` (tab-separated, no header row).
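
The layout above can be tried out on a small in-memory sample before pointing the notebook at a real file. This is a sketch under assumptions: the two rows are typed in by hand for illustration, and `sample_text` and `sample_df` are hypothetical names.

```python
import io
import pandas as pd

# Two hand-typed rows in the expected layout: Name<TAB>Address, no header row.
sample_text = (
    "The Getty Center\t1200 Getty Center Dr, Los Angeles, CA 90049\n"
    "The Art Institute of Chicago\t111 S Michigan Ave, Chicago, IL 60603\n"
)

# sep="\t" keeps the commas inside each address in a single column.
sample_df = pd.read_csv(
    io.StringIO(sample_text),
    sep="\t",
    header=None,
    names=["Name", "Address"],
)
print(sample_df.shape)
```

Because the separator is a tab, the commas inside an address do not split it into extra columns.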


### Important note on address quality

Geocoders require **specific, complete addresses**.
If the address is missing a street number, city, or state (or contains typos),
the geocoder may return no match.
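
A light cleaning pass before geocoding can catch some of these problems early. The sketch below is one possible approach, not part of the Census service: `clean_address` and `looks_complete` are hypothetical helpers, and the completeness rule (starts with a digit, contains at least two commas) is a rough heuristic assumed for illustration.

```python
import re

def clean_address(address):
    """Collapse runs of whitespace and trim; return None for empty or non-string input."""
    if not isinstance(address, str):
        return None
    cleaned = re.sub(r"\s+", " ", address).strip()
    return cleaned or None

def looks_complete(address):
    """Heuristic only: a leading digit (street number) and at least two commas."""
    cleaned = clean_address(address)
    if cleaned is None:
        return False
    return cleaned[0].isdigit() and cleaned.count(",") >= 2

print(clean_address("  1000   5th Ave,\tNew York, NY 10028 "))
print(looks_complete("5th Ave, New York"))
```

Passing this check does not guarantee a match; it only flags inputs that are obviously missing parts.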

In [4]:
# Download the Museum Data
!wget https://raw.githubusercontent.com/SpatialTurn/DataCollection-Notebooks/main/Census/museums.txt

--2025-12-16 18:50:36--  https://raw.githubusercontent.com/SpatialTurn/DataCollection-Notebooks/main/Census/museums.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1062 (1.0K) [text/plain]
Saving to: ‘museums.txt’


2025-12-16 18:50:36 (39.1 MB/s) - ‘museums.txt’ saved [1062/1062]



In [5]:
# Read a tab-delimited text file with two columns: Name and Address
file_path = "museums.txt"   # change to your file name
df = pd.read_csv(file_path, sep="\t", header=None, names=["Name", "Address"])
df.head()

| | Name | Address |
|---|---|---|
| 0 | The Metropolitan Museum of Art | 1000 5th Ave, New York, NY 10028 |
| 1 | The Museum of Modern Art (MoMA) | 11 W 53rd St, New York, NY 10019 |
| 2 | The Art Institute of Chicago | 111 S Michigan Ave, Chicago, IL 60603 |
| 3 | Smithsonian National Museum of Natural History | 10th St. & Constitution Ave. NW, Washington, D... |
| 4 | The Getty Center | 1200 Getty Center Dr, Los Angeles, CA 90049 |


In [6]:
df["Latitude"], df["Longitude"] = zip(*df["Address"].apply(get_coordinates))

# Quick quality check
match_rate = df["Latitude"].notna().mean()
print(f"Match rate: {match_rate:.1%}")

gdf = gpd.GeoDataFrame(
    df,
    geometry=gpd.points_from_xy(df.Longitude, df.Latitude),
    crs="EPSG:4326"
)

Match rate: 80.0%


In [7]:
gdf.head()

| | Name | Address | Latitude | Longitude | geometry |
|---|---|---|---|---|---|
| 0 | The Metropolitan Museum of Art | 1000 5th Ave, New York, NY 10028 | NaN | NaN | POINT (NaN NaN) |
| 1 | The Museum of Modern Art (MoMA) | 11 W 53rd St, New York, NY 10019 | 40.760627 | -73.976177 | POINT (-73.97618 40.76063) |
| 2 | The Art Institute of Chicago | 111 S Michigan Ave, Chicago, IL 60603 | 41.880649 | -87.624172 | POINT (-87.62417 41.88065) |
| 3 | Smithsonian National Museum of Natural History | 10th St. & Constitution Ave. NW, Washington, D... | 38.892079 | -77.025989 | POINT (-77.02599 38.89208) |
| 4 | The Getty Center | 1200 Getty Center Dr, Los Angeles, CA 90049 | 34.08307 | -118.476108 | POINT (-118.47611 34.08307) |


### Why some addresses fail

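If an address is incomplete, misspelled, or too vague, the Census Geocoder may return no match.
Even a well-formed address can fail: in the output above, `1000 5th Ave, New York, NY 10028` returned no coordinates.
This is normal. Always review unmatched rows and fix inputs where possible.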


### Inspect unmatched rows

After geocoding, filter rows where Latitude/Longitude are missing.
These are the addresses you should revisit and correct.
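
One way to do this is a boolean mask on the coordinate columns. The sketch below is self-contained: `results` is a hypothetical stand-in for the geocoded DataFrame, built from two rows of the output above.

```python
import pandas as pd

# Small stand-in for the geocoded DataFrame (two rows from the output above).
results = pd.DataFrame({
    "Name": ["The Metropolitan Museum of Art", "The Getty Center"],
    "Address": [
        "1000 5th Ave, New York, NY 10028",
        "1200 Getty Center Dr, Los Angeles, CA 90049",
    ],
    "Latitude": [None, 34.08307],
    "Longitude": [None, -118.476108],
})

# A row counts as unmatched if either coordinate is missing.
unmatched = results[results["Latitude"].isna() | results["Longitude"].isna()]
print(unmatched[["Name", "Address"]])
```

In the notebook itself, the same mask can be applied to `df` in place of `results`.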

In [8]:
# interactive mapping
center_lat = df["Latitude"].dropna().mean()
center_lon = df["Longitude"].dropna().mean()

m = folium.Map(location=[center_lat, center_lon], zoom_start=4)

for _, row in df.iterrows():
    if pd.notna(row["Latitude"]) and pd.notna(row["Longitude"]):
        folium.Marker(
            location=[row["Latitude"], row["Longitude"]],
            popup=row["Name"],
            tooltip=row["Address"]
        ).add_to(m)

m