# Cross-Docket Analysis

**Problem**: How can we map related dockets (RFI → Proposed Rule → Final Rule) and enable search across multiple agencies and rulemaking cycles?

This notebook demonstrates:
- Finding related dockets by title/keyword similarity
- Tracking rulemaking progression (RFI → NPRM → Final)
- Cross-agency topic analysis
- Building a docket relationship graph

In [1]:
import duckdb
import pandas as pd

R2_BASE_URL = "https://pub-5fc11ad134984edf8d9af452dd1849d6.r2.dev"

conn = duckdb.connect()
conn.execute("INSTALL httpfs; LOAD httpfs;")
print("✓ Ready")

✓ Ready


## 1. Search Dockets by Keyword

In [2]:
# Search for dockets containing specific keywords
keyword = "climate"  # Change this to search different topics

results = conn.execute(f"""
    SELECT docket_id, agency_code, title, docket_type, modify_date
    FROM read_parquet('{R2_BASE_URL}/dockets.parquet')
    WHERE LOWER(title) LIKE '%{keyword.lower()}%'
       OR LOWER(abstract) LIKE '%{keyword.lower()}%'
    ORDER BY modify_date DESC
    LIMIT 25
""").fetchdf()

print(f"Found {len(results)} dockets matching '{keyword}'")
results

Found 25 dockets matching 'climate'


|   | docket_id | agency_code | title | docket_type | modify_date |
|---|---|---|---|---|---|
| 0 | FAR-2021-0015 | FAR | Federal Acquisition Regulation: Disclosure of ... | Rulemaking | 2026-01-06T15:15:26Z |
| 1 | FAR-2021-0016 | FAR | Federal Acquisition Regulation: Minimizing th... | Rulemaking | 2026-01-06T15:15:14Z |
| 2 | EPA-HQ-OAR-2023-0330 | EPA | Review of Final Rule Reclassification of Major... | Rulemaking | 2026-01-02T17:24:36Z |
| 3 | EPA-HQ-OAR-2025-0162 | EPA | Extension of Deadlines in Standards of Perform... | Rulemaking | 2025-12-03T15:03:57Z |
| 4 | BOEM-2023-0005 | BOEM | Renewable Energy Modernization Rule | Rulemaking | 2025-10-10T19:54:11Z |
| 5 | EPA-HQ-OAR-2024-0358 | EPA | Reconsideration of Oil and Natural Gas Sector ... | Rulemaking | 2025-10-10T17:05:28Z |
| 6 | DOE-HQ-2025-0207 | DOE | A Critical Review of Impacts of Greenhouse Gas... | Rulemaking | 2025-10-07T14:00:20Z |
| 7 | DOD-2025-OS-0342 | DOD | Defense Organizational Climate Survey (DEOCS);... | Nonrulemaking | 2025-07-29T12:47:52Z |
| 8 | EPA-HQ-OAR-2021-0317 | EPA | Standards of Performance for New, Reconstructe... | Rulemaking | 2025-07-23T17:07:06Z |
| 9 | EPA-HQ-OAR-2023-0492 | EPA | Transportation and Climate Division (TCD) Gran... | Nonrulemaking | 2025-07-21T11:05:58Z |
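
The query above interpolates `keyword` directly into the SQL text, so a keyword containing a quote or a `%` would change the query. A minimal sketch of the same search using bound parameters follows. The helper name `keyword_query` is hypothetical, and the sketch assumes the `title` and `abstract` columns used above; DuckDB's Python API accepts `?` placeholders with a parameter list passed to `execute`.

```python
def keyword_query(base_url: str, keyword: str, limit: int = 25):
    """Return (sql, params) for a case-insensitive keyword search.

    Hypothetical helper: the keyword travels as a bound parameter
    instead of being pasted into the SQL string.
    """
    # Escape LIKE wildcards so '%' and '_' in the keyword match literally.
    escaped = (
        keyword.lower()
        .replace("\\", "\\\\")
        .replace("%", "\\%")
        .replace("_", "\\_")
    )
    pattern = f"%{escaped}%"
    sql = f"""
        SELECT docket_id, agency_code, title, docket_type, modify_date
        FROM read_parquet('{base_url}/dockets.parquet')
        WHERE LOWER(title) LIKE ? ESCAPE '\\'
           OR LOWER(abstract) LIKE ? ESCAPE '\\'
        ORDER BY modify_date DESC
        LIMIT {int(limit)}
    """
    # One parameter per '?' placeholder, in order.
    return sql, [pattern, pattern]


# Usage with the connection from the first cell (not executed here):
# sql, params = keyword_query(R2_BASE_URL, "climate")
# results = conn.execute(sql, params).fetchdf()
```

Because of the `LIMIT`, `len(results)` is capped at 25, so it reports how many rows were returned rather than how many dockets match.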


## 2. Cross-Agency Topic Analysis

In [3]:
# Which agencies have dockets on this topic?
agency_breakdown = conn.execute(f"""
    SELECT agency_code, COUNT(*) as docket_count
    FROM read_parquet('{R2_BASE_URL}/dockets.parquet')
    WHERE LOWER(title) LIKE '%{keyword.lower()}%'
       OR LOWER(abstract) LIKE '%{keyword.lower()}%'
    GROUP BY agency_code
    ORDER BY docket_count DESC
    LIMIT 15
""").fetchdf()

print(f"Agencies with '{keyword}' dockets:")
agency_breakdown

Agencies with 'climate' dockets:


|   | agency_code | docket_count |
|---|---|---|
| 0 | EPA | 217 |
| 1 | NOAA | 96 |
| 2 | FS | 33 |
| 3 | DOS | 20 |
| 4 | BOEM | 20 |
| 5 | DOD | 17 |
| 6 | FWS | 17 |
| 7 | CEQ | 13 |
| 8 | NCUA | 7 |
| 9 | ITA | 7 |


## 3. Docket Type Distribution

In [4]:
# What types of dockets exist for this topic?
type_breakdown = conn.execute(f"""
    SELECT docket_type, COUNT(*) as count
    FROM read_parquet('{R2_BASE_URL}/dockets.parquet')
    WHERE LOWER(title) LIKE '%{keyword.lower()}%'
       OR LOWER(abstract) LIKE '%{keyword.lower()}%'
    GROUP BY docket_type
    ORDER BY count DESC
""").fetchdf()

type_breakdown

|   | docket_type | count |
|---|---|---|
| 0 | Nonrulemaking | 270 |
| 1 | Rulemaking | 234 |


## 4. Find Related Dockets

Look for dockets with similar titles that might be part of the same rulemaking process.

In [5]:
# Start with a specific docket and find related ones
base_docket_id = "EPA-HQ-OAR-2021-0317"  # Change this

# Get the base docket info
base = conn.execute(f"""
    SELECT docket_id, agency_code, title, docket_type
    FROM read_parquet('{R2_BASE_URL}/dockets.parquet')
    WHERE docket_id = '{base_docket_id}'
""").fetchdf()

print("Base docket:")
print(f"  {base['docket_id'].iloc[0]}: {base['title'].iloc[0][:80]}...")

# Extract key terms from title (first 3 words longer than 4 characters)
title = base['title'].iloc[0]
words = [w for w in title.split() if len(w) > 4][:3]
search_pattern = '%' + '%'.join(words) + '%'

print(f"\nSearching for related dockets with pattern: {words}")

Base docket:
  EPA-HQ-OAR-2021-0317: Standards of Performance for New, Reconstructed, and Modified  Sources and Emiss...

Searching for related dockets with pattern: ['Standards', 'Performance', 'Reconstructed,']
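
The extracted terms keep their original case and punctuation (note the trailing comma in `'Reconstructed,'`), which can weaken later `LIKE` matches. A minimal sketch of a cleaner extraction step is below. The function name `key_terms` and the small stopword list are assumptions for illustration, not part of the notebook.

```python
import re

# Illustrative stopword list (an assumption); extend it for real use.
STOPWORDS = {"under", "about", "their", "which", "other"}


def key_terms(title: str, max_terms: int = 3, min_len: int = 5) -> list[str]:
    """Return lowercased, punctuation-free terms in title order.

    min_len=5 mirrors the notebook's `len(w) > 4` filter.
    """
    # Keep only runs of letters and digits, dropping commas, colons, etc.
    tokens = re.findall(r"[a-z0-9]+", title.lower())
    terms = []
    for tok in tokens:
        if len(tok) >= min_len and tok not in STOPWORDS and tok not in terms:
            terms.append(tok)
    return terms[:max_terms]


print(key_terms("Standards of Performance for New, Reconstructed, and Modified Sources"))
# ['standards', 'performance', 'reconstructed']
```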


In [6]:
# Find same-agency dockets whose titles contain either of the first two key terms
agency = base['agency_code'].iloc[0]

related = conn.execute(f"""
    SELECT docket_id, title, docket_type, modify_date
    FROM read_parquet('{R2_BASE_URL}/dockets.parquet')
    WHERE agency_code = '{agency}'
      AND docket_id != '{base_docket_id}'
      AND (
          LOWER(title) LIKE '%{words[0].lower()}%'
          OR LOWER(title) LIKE '%{words[1].lower() if len(words) > 1 else words[0].lower()}%'
      )
    ORDER BY modify_date DESC
    LIMIT 15
""").fetchdf()

print(f"Potentially related {agency} dockets:")
related

Potentially related EPA dockets:


|   | docket_id | title | docket_type | modify_date |
|---|---|---|---|---|
| 0 | EPA-HQ-OAR-2004-0022 | National Emission Standards for Hazardous Air ... | Rulemaking | 2026-01-14T14:51:19Z |
| 1 | EPA-R06-OAR-2020-0086 | OK035 Oklahoma Request for Delegation of Natio... | Rulemaking | 2026-01-13T23:44:38Z |
| 2 | EPA-HQ-OAR-2024-0505 | Renewable Fuel Standard (RFS) Program: Standar... | Rulemaking | 2026-01-08T20:51:14Z |
| 3 | EPA-R08-OAR-2025-0001 | Boundary Expansion of Northern Wasatch Front, ... | Rulemaking | 2026-01-06T14:23:39Z |
| 4 | EPA-R03-OAR-2025-1872 | Proposed Revisions of the Nonattainment Design... | Rulemaking | 2026-01-05T14:29:12Z |
| 5 | EPA-HQ-OW-2001-0009 | Draft National Beach Guidance and Performance ... | Nonrulemaking | 2026-01-02T22:13:27Z |
| 6 | EPA-HQ-OPPT-2017-0245 | Voluntary Consensus Standards Update; Formalde... | Rulemaking | 2025-12-30T22:54:53Z |
| 7 | EPA-R05-OAR-2025-0013 | Ohio Emergency Episode and Ambient Air Quality... | Rulemaking | 2025-12-29T12:37:27Z |
| 8 | EPA-R09-OAR-2025-2833 | Determination of Attainment by the Attainment ... | Rulemaking | 2025-12-22T17:24:35Z |
| 9 | EPA-R06-OAR-2010-1054 | LA034 Louisiana; New Source Performance Standa... | Rulemaking | 2025-12-19T19:18:13Z |
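
Matching on one or two common words such as "standards" returns many dockets that are unlikely to belong to the same rulemaking. One way to narrow the list is to score each candidate title against the base title and keep only the closer matches. The sketch below uses Jaccard similarity over title tokens; the function names and the 0.2 threshold are assumptions, and the docket IDs in the example are placeholders rather than real dockets.

```python
import re


def title_tokens(title: str, min_len: int = 4) -> set[str]:
    """Lowercased alphanumeric tokens of at least min_len characters."""
    return {t for t in re.findall(r"[a-z0-9]+", title.lower()) if len(t) >= min_len}


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by size of the union (0.0 if either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_by_title_similarity(base_title, candidates, threshold=0.2):
    """Score (docket_id, title) pairs against base_title.

    Returns [(docket_id, score)] with score >= threshold, highest first.
    """
    base = title_tokens(base_title)
    scored = [(did, jaccard(base, title_tokens(t or ""))) for did, t in candidates]
    kept = [(did, s) for did, s in scored if s >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Placeholder IDs and titles for illustration only.
candidates = [
    ("X-0001", "New Source Performance Standards Review"),
    ("X-0002", "Draft National Beach Guidance"),
]
print(rank_by_title_similarity("Standards of Performance for New Sources", candidates))
# [('X-0001', 0.4)]
```

With the notebook's DataFrames, the candidates could be supplied as `zip(related['docket_id'], related['title'])` and the base title as `base['title'].iloc[0]`.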


## 5. Timeline of Related Dockets

In [7]:
# Build a timeline of potentially related dockets, ordered by modify_date
# (modify_date is used as a rough proxy for chronology)
all_related = conn.execute(f"""
    SELECT docket_id, title, docket_type, modify_date
    FROM read_parquet('{R2_BASE_URL}/dockets.parquet')
    WHERE agency_code = '{agency}'
      AND (
          LOWER(title) LIKE '%{words[0].lower()}%'
      )
    ORDER BY modify_date ASC
""").fetchdf()

print("Rulemaking Timeline:")
print("=" * 80)
for _, row in all_related.iterrows():
    date = str(row['modify_date'])[:10] if row['modify_date'] else 'Unknown'
    dtype = row['docket_type'] or 'Unknown'
    # Use a separate name so the base docket's `title` is not overwritten
    short_title = row['title'][:60] if row['title'] else 'No title'
    print(f"{date} | {dtype:15} | {short_title}...")

Rulemaking Timeline:
2007-09-22 | Nonrulemaking   | TX040 Texas: National Emission Standards for Hazardous Air P...
2007-09-22 | Nonrulemaking   | TX004 Texas: National Emission Standards for Hazardous Air P...
2011-06-12 | Rulemaking      | NM006 New Mexico Albuquerque/Bernalillo County Air Quality C...
2011-06-12 | Rulemaking      | LA006 Louisiana: National Emission Standards for Hazardous A...
2011-06-12 | Rulemaking      | OK003 Oklahoma: National Emission Standards for Hazardous Ai...
2011-06-12 | Rulemaking      | Wisconsin PM 2.5 Standards...
2011-06-12 | Rulemaking      | Ohio Revisions to PM Standards OAC 3745-17...
2011-06-12 | Rulemaking      | NM008 New Mexico: Albuquerque/Bernalillo County Air Quality ...
2011-06-12 | Rulemaking      | Credible Evidence Rule / PM Standards and Definitions / Annu...
2011-06-12 | Rulemaking      | Ohio Open Burning Standards, OAC Chapter 3745-19...
2011-06-30 | Rulemaking      | Approval and Disapproval and Promulgation of Air Quality Imp..
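
Many rows above share the same date, which suggests that `modify_date` reflects when a record was last updated rather than when the docket was opened. Many docket IDs in this dataset embed a four-digit year (for example `EPA-HQ-OAR-2021-0317` or `DOD-2025-OS-0342`), and that year can serve as a rough ordering key. The sketch below rests on that assumption: the ID layout is not guaranteed for every agency, and the function names are hypothetical.

```python
import re
from typing import Optional


def docket_year(docket_id: str) -> Optional[int]:
    """Return the first 19xx/20xx segment after the agency prefix, if any."""
    for part in docket_id.split("-")[1:]:
        if re.fullmatch(r"(?:19|20)\d{2}", part):
            return int(part)
    return None


def order_by_year(docket_ids):
    """Sort IDs by embedded year; IDs without a year go last."""
    def sort_key(docket_id):
        year = docket_year(docket_id)
        return (year is None, year or 0, docket_id)
    return sorted(docket_ids, key=sort_key)


ids = ["EPA-HQ-OAR-2023-0330", "BOEM-2023-0005", "EPA-HQ-OAR-2021-0317"]
print(order_by_year(ids))
# ['EPA-HQ-OAR-2021-0317', 'BOEM-2023-0005', 'EPA-HQ-OAR-2023-0330']
```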

## 6. Comment Volume Comparison

In [8]:
# Compare comment volumes across related dockets
docket_ids = all_related['docket_id'].tolist()[:10]  # First 10 by modify_date (oldest first)
docket_list = "', '".join(docket_ids)

volume = conn.execute(f"""
    SELECT 
        c.docket_id,
        d.title,
        COUNT(*) as comment_count
    FROM read_parquet('{R2_BASE_URL}/comments.parquet') c
    JOIN read_parquet('{R2_BASE_URL}/dockets.parquet') d
        ON c.docket_id = d.docket_id
    WHERE c.docket_id IN ('{docket_list}')
    GROUP BY c.docket_id, d.title
    ORDER BY comment_count DESC
""").fetchdf()

print("Comment volumes across related dockets:")
volume

Comment volumes across related dockets:


|   | docket_id | title | comment_count |
|---|---|---|---|
| 0 | EPA-R05-OAR-2009-0731 | Wisconsin PM 2.5 Standards | 1 |
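
The overview lists a docket relationship graph as one goal. A minimal sketch of one way to assemble it from pairwise links between dockets (for example, title matches like those in section 4) is below. The edge format `(docket_a, docket_b, score)`, the function names, and the placeholder IDs are assumptions for illustration. Each connected component is a candidate group of dockets to review as one rulemaking effort.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency dict from (docket_a, docket_b, score) edges."""
    graph = defaultdict(dict)
    for a, b, score in edges:
        if a == b:
            continue  # ignore self-links
        graph[a][b] = score
        graph[b][a] = score
    return dict(graph)


def connected_components(graph):
    """Return sorted lists of docket IDs, one list per connected group."""
    seen, components = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(sorted(component))
    return components


# Placeholder IDs and scores for illustration only.
edges = [("D-1", "D-2", 0.6), ("D-2", "D-3", 0.4), ("D-4", "D-5", 0.5)]
print(connected_components(build_graph(edges)))
# [['D-1', 'D-2', 'D-3'], ['D-4', 'D-5']]
```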
