# Spicy Regs Data Access

Query federal regulations data from Cloudflare R2 using DuckDB.

Data source: [regulations.gov](https://www.regulations.gov/) via [Mirrulations](https://github.com/MoravianUniversity/mirrulations)

In [1]:
# Install dependencies (run once)
# !pip install duckdb pandas

In [2]:
import duckdb

# R2 public URL for Spicy Regs data
R2_BASE_URL = "https://pub-5fc11ad134984edf8d9af452dd1849d6.r2.dev"

# Initialize DuckDB with HTTP support
conn = duckdb.connect()
conn.execute("INSTALL httpfs; LOAD httpfs;")
print("✓ DuckDB initialized with HTTP support")

✓ DuckDB initialized with HTTP support


## Data Overview

The data is stored in three Parquet files:
- `dockets.parquet` - Regulatory dockets (rulemaking proceedings)
- `documents.parquet` - Documents within dockets (proposed rules, notices)
- `comments.parquet` - Public comments on documents

In [3]:
# Get data statistics
for data_type in ["dockets", "documents", "comments"]:
    url = f"{R2_BASE_URL}/{data_type}.parquet"
    result = conn.execute(f"""
        SELECT 
            COUNT(*) as total_rows,
            COUNT(DISTINCT agency_code) as agencies
        FROM read_parquet('{url}')
    """).fetchone()
    print(f"{data_type}: {result[0]:,} rows, {result[1]} agencies")

dockets: 346,173 rows, 194 agencies
documents: 2,009,957 rows, 315 agencies
comments: 24,774,674 rows, 178 agencies
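
Every query in this notebook spells out the full `read_parquet('{R2_BASE_URL}/...')` expression. One way to cut the repetition is to register each file as a DuckDB view once and then query it by name. The sketch below assumes `conn` is the connection created in the setup cell; `TABLES`, `parquet_url`, and `register_views` are hypothetical helper names, not part of the Spicy Regs project or of DuckDB.

```python
R2_BASE_URL = "https://pub-5fc11ad134984edf8d9af452dd1849d6.r2.dev"
TABLES = ("dockets", "documents", "comments")  # hypothetical constant


def parquet_url(table, base_url=R2_BASE_URL):
    """Return the public URL of one of the three Parquet files."""
    if table not in TABLES:
        raise ValueError(f"unknown table: {table!r}")
    return f"{base_url}/{table}.parquet"


def register_views(conn, base_url=R2_BASE_URL):
    """Create one view per Parquet file on an open DuckDB connection."""
    for table in TABLES:
        conn.execute(
            f"CREATE OR REPLACE VIEW {table} AS "
            f"SELECT * FROM read_parquet('{parquet_url(table, base_url)}')"
        )
    return list(TABLES)
```

After `register_views(conn)`, a query such as `conn.execute("SELECT COUNT(*) FROM dockets").fetchone()` should work without repeating the URL. Defining the views most likely still needs network access, because DuckDB has to read each file's schema to resolve the view's columns.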


## Query Dockets

In [4]:
# Recent EPA dockets
conn.execute(f"""
    SELECT docket_id, title, docket_type, modify_date
    FROM read_parquet('{R2_BASE_URL}/dockets.parquet')
    WHERE agency_code = 'EPA'
    ORDER BY modify_date DESC
    LIMIT 10
""").fetchdf()

|  | docket_id | title | docket_type | modify_date |
|---|---|---|---|---|
| 0 | EPA-HQ-OW-2025-0322 | Updated Definition of Waters of the United States | Rulemaking | 2026-01-14T21:11:07Z |
| 1 | EPA-HQ-OPP-2024-0239 | Pyriofenone on Apple and Cherry subgroup 12-12... | Rulemaking | 2026-01-14T19:51:45Z |
| 2 | EPA-R08-OAR-2021-0418 | CR Group, LLC - Tekoi Landfill, Part 71 Renewa... | Nonrulemaking | 2026-01-14T18:33:24Z |
| 3 | EPA-HQ-OPPT-2020-0549 | Reporting and Recordkeeping for Perfluoroalkyl... | Rulemaking | 2026-01-14T18:23:07Z |
| 4 | EPA-R09-OAR-2025-1938 | Air Plan Approval; California; San Joaquin Val... | Rulemaking | 2026-01-14T17:50:00Z |
| 5 | EPA-HQ-OA-2006-0734 | EPA Training Dockets | Rulemaking | 2026-01-14T17:27:45Z |
| 6 | EPA-HQ-OPPT-2018-0503 | Dibutyl phthalate (DBP) (1,2-Benzene- dicarbox... | Nonrulemaking | 2026-01-14T16:39:00Z |
| 7 | EPA-HQ-OPPT-2018-0501 | Butyl benzyl phthalate (BBP) 1,2-Benzene- dica... | Nonrulemaking | 2026-01-14T16:11:41Z |
| 8 | EPA-HQ-OPPT-2018-0433 | Di-ethylhexyl phthalate (DEHP)(1,2-Benzene- di... | Nonrulemaking | 2026-01-14T16:06:59Z |
| 9 | EPA-R08-OAR-2025-0669 | Marathon Oil Company - Brodahl Pad, Synthetic ... | Nonrulemaking | 2026-01-14T15:53:42Z |


In [5]:
# Top agencies by docket count
conn.execute(f"""
    SELECT agency_code, COUNT(*) as docket_count
    FROM read_parquet('{R2_BASE_URL}/dockets.parquet')
    GROUP BY agency_code
    ORDER BY docket_count DESC
    LIMIT 15
""").fetchdf()

|  | agency_code | docket_count |
|---|---|---|
| 0 | FDA | 74303 |
| 1 | FAA | 58578 |
| 2 | EPA | 34093 |
| 3 | DOT | 14520 |
| 4 | USCG | 14055 |
| 5 | FMCSA | 10093 |
| 6 | NOAA | 9569 |
| 7 | PHMSA | 8522 |
| 8 | NRC | 6630 |
| 9 | NHTSA | 5376 |


## Query Documents

In [6]:
# Documents with open comment periods
# Note: dates are stored as strings, so we cast to DATE for comparison
conn.execute(f"""
    SELECT document_id, agency_code, title, comment_start_date, comment_end_date
    FROM read_parquet('{R2_BASE_URL}/documents.parquet')
    WHERE comment_end_date IS NOT NULL
      AND TRY_CAST(comment_end_date AS DATE) > CURRENT_DATE
    ORDER BY comment_end_date ASC
    LIMIT 10
""").fetchdf()

|  | document_id | agency_code | title | comment_start_date | comment_end_date |
|---|---|---|---|---|---|
| 0 | DEA-2025-0918-0002 | DEA | Schedules of Controlled Substances: Placement ... | 2025-12-16T05:00:00Z | 2026-01-16T04:59:59Z |
| 1 | DEA_FRDOC_0001-0498 | DEA | Schedules of Controlled Substances: Placement ... | 2025-12-16T05:00:00Z | 2026-01-16T04:59:59Z |
| 2 | DOD-2025-OS-0476-0002 | DOD | Agency Information Collection Activities; Prop... | 2025-12-16T05:00:00Z | 2026-01-16T04:59:59Z |
| 3 | DOD-2025-OS-0475-0002 | DOD | Agency Information Collection Activities; Prop... | 2025-12-16T05:00:00Z | 2026-01-16T04:59:59Z |
| 4 | DOD-2025-OS-0540-0003 | DOD | Agency Information Collection Activities; Prop... | 2025-12-16T05:00:00Z | 2026-01-16T04:59:59Z |
| 5 | DOL_FRDOC_0001-2598 | DOL | Agency Information Collection Activities; Prop... | 2025-12-16T05:00:00Z | 2026-01-16T04:59:59Z |
| 6 | DOL_FRDOC_0001-2597 | DOL | Agency Information Collection Activities; Prop... | 2025-12-16T05:00:00Z | 2026-01-16T04:59:59Z |
| 7 | EPA_FRDOC_0001-32440 | EPA | Pesticide Petitions: Residues of Pesticide Che... | 2025-12-16T05:00:00Z | 2026-01-16T04:59:59Z |
| 8 | EPA_FRDOC_0001-32464 | EPA | Charter Amendments, Establishments, Renewals a... | 2026-01-02T05:00:00Z | 2026-01-16T04:59:59Z |
| 9 | EPA_FRDOC_0001-32439 | EPA | Pesticide Product Registration: Applications f... | 2025-12-16T05:00:00Z | 2026-01-16T04:59:59Z |
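
The `comment_end_date` values returned by this query are UTC timestamps such as `2026-01-16T04:59:59Z`, so a date-level comparison drops the time of day. As a rough sketch, the same check can be done at timestamp precision in Python on the fetched strings. `parse_utc` and `is_open` are hypothetical helper names, and the sketch assumes every value follows the `YYYY-MM-DDTHH:MM:SSZ` shape seen in the output.

```python
from datetime import datetime, timezone


def parse_utc(ts):
    """Parse a 'YYYY-MM-DDTHH:MM:SSZ' string into an aware UTC datetime."""
    # Swap the trailing 'Z' for an explicit offset so that
    # fromisoformat also accepts it on Python versions before 3.11
    if ts.endswith("Z"):
        ts = ts[:-1] + "+00:00"
    return datetime.fromisoformat(ts)


def is_open(comment_end_date, now=None):
    """Return True if the comment period has not ended yet."""
    if not comment_end_date:  # NULL or empty end date: treat as not open
        return False
    if now is None:
        now = datetime.now(timezone.utc)
    return parse_utc(comment_end_date) > now
```

For example, `is_open("2026-01-16T04:59:59Z", now=parse_utc("2026-01-16T03:00:00Z"))` returns `True`, because the period runs until 04:59:59 UTC on that day.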


## Query Comments

In [7]:
# Recent comments
conn.execute(f"""
    SELECT comment_id, docket_id, agency_code, title, posted_date
    FROM read_parquet('{R2_BASE_URL}/comments.parquet')
    ORDER BY posted_date DESC
    LIMIT 10
""").fetchdf()

|  | comment_id | docket_id | agency_code | title | posted_date |
|---|---|---|---|---|---|
| 0 | AMS-FGIS-25-0155-0006 | AMS-FGIS-25-0155 | AMS | Comment from Anonymous | 2026-01-14T05:00:00Z |
| 1 | AMS-FGIS-25-0155-0005 | AMS-FGIS-25-0155 | AMS | Comment from Riceland Foods, Inc. | 2026-01-14T05:00:00Z |
| 2 | BOEM-2025-0351-1091 | BOEM-2025-0351 | BOEM | Comment from Miyawaki, Rintaro | 2026-01-14T05:00:00Z |
| 3 | BOEM-2025-0351-1090 | BOEM-2025-0351 | BOEM | Comment from Drabe, Simon | 2026-01-14T05:00:00Z |
| 4 | BOEM-2025-0351-1089 | BOEM-2025-0351 | BOEM | Comment from Anonymous | 2026-01-14T05:00:00Z |
| 5 | BOEM-2025-0351-1085 | BOEM-2025-0351 | BOEM | Comment from C, K | 2026-01-14T05:00:00Z |
| 6 | BOEM-2025-0351-1094 | BOEM-2025-0351 | BOEM | Comment from Anonymous | 2026-01-14T05:00:00Z |
| 7 | BOEM-2025-0351-1096 | BOEM-2025-0351 | BOEM | Comment from Obyrne, Nancy | 2026-01-14T05:00:00Z |
| 8 | BOEM-2025-0351-1095 | BOEM-2025-0351 | BOEM | Comment from Lee, Minnie | 2026-01-14T05:00:00Z |
| 9 | BOEM-2025-0351-1118 | BOEM-2025-0351 | BOEM | Comment from Knoeppel, Kathleen | 2026-01-14T05:00:00Z |


In [8]:
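# Comments on a specific docket
docket_id = "EPA-HQ-OW-2025-0322"  # Change this to any docket ID

# Bind the docket ID as a query parameter (?) instead of interpolating it
# into the SQL text, so a quote character in the value cannot break the query
conn.execute(f"""
    SELECT comment_id, title, posted_date
    FROM read_parquet('{R2_BASE_URL}/comments.parquet')
    WHERE docket_id = ?
    ORDER BY posted_date DESC
    LIMIT 20
""", [docket_id]).fetchdf()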

|  | comment_id | title | posted_date |
|---|---|---|---|
| 0 | EPA-HQ-OW-2025-0322-1213 | Comment submitted by Dianna Cohen | 2026-01-14T05:00:00Z |
| 1 | EPA-HQ-OW-2025-0322-1204 | Comment submitted by Lauren Redfield | 2026-01-14T05:00:00Z |
| 2 | EPA-HQ-OW-2025-0322-1202 | Anonymous public comment | 2026-01-14T05:00:00Z |
| 3 | EPA-HQ-OW-2025-0322-1229 | Comment submitted by Rob Erbele | 2026-01-14T05:00:00Z |
| 4 | EPA-HQ-OW-2025-0322-1215 | Comment submitted by Michael Shore | 2026-01-14T05:00:00Z |
| 5 | EPA-HQ-OW-2025-0322-1219 | Comment submitted by Homer Hansen | 2026-01-14T05:00:00Z |
| 6 | EPA-HQ-OW-2025-0322-1228 | Comment submitted by Stanley Weston | 2026-01-14T05:00:00Z |
| 7 | EPA-HQ-OW-2025-0322-1227 | Comment submitted by Nicholas Pandolfi | 2026-01-14T05:00:00Z |
| 8 | EPA-HQ-OW-2025-0322-1232 | Comment submitted by Scott Glass | 2026-01-14T05:00:00Z |
| 9 | EPA-HQ-OW-2025-0322-1233 | Anonymous public comment | 2026-01-14T05:00:00Z |


## Advanced: Join Across Tables

In [9]:
# Dockets with comment counts
conn.execute(f"""
    SELECT 
        d.docket_id,
        d.agency_code,
        d.title,
        COUNT(c.comment_id) as comment_count
    FROM read_parquet('{R2_BASE_URL}/dockets.parquet') d
    LEFT JOIN read_parquet('{R2_BASE_URL}/comments.parquet') c
        ON d.docket_id = c.docket_id
    WHERE d.agency_code = 'EPA'
    GROUP BY d.docket_id, d.agency_code, d.title
    ORDER BY comment_count DESC
    LIMIT 10
""").fetchdf()

|  | docket_id | agency_code | title | comment_count |
|---|---|---|---|---|
| 0 | EPA-HQ-OAR-2023-0072 | EPA | New Source Performance Standards for GHG Emiss... | 690660 |
| 1 | EPA-HQ-OAR-2022-0829 | EPA | Multi-Pollutant Emissions Standards for Model ... | 366420 |
| 2 | EPA-HQ-OAR-2015-0072 | EPA | Review of the National Ambient Air Quality Sta... | 242821 |
| 3 | EPA-HQ-OAR-2018-0794 | EPA | National Emission Standards for Hazardous Air ... | 242220 |
| 4 | EPA-HQ-OAR-2022-0985 | EPA | Greenhouse Gas Emissions Standards for Heavy-D... | 155700 |
| 5 | EPA-HQ-OW-2022-0801 | EPA | National Primary Drinking Water Regulations: L... | 129360 |
| 6 | EPA-HQ-OW-2009-0819 | EPA | Rulemaking for the Steam Electric Power Genera... | 123156 |
| 7 | EPA-HQ-OAR-2022-0730 | EPA | New Source Performance Standards for the Synth... | 117880 |
| 8 | EPA-HQ-OW-2022-0114 | EPA | Per- and polyfluoroalkyl substances (PFAS): Pe... | 95030 |
| 9 | EPA-HQ-OAR-2021-0317 | EPA | Standards of Performance for New, Reconstructe... | 80058 |
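
The join in this cell counts rows after matching each docket to its comments. If `dockets.parquet` held more than one row for a `docket_id`, each comment would be counted once per matching row. A variant that counts comments per docket first and joins the totals afterwards is not affected by this. The sketch below only builds the SQL text; `comment_count_query` is a hypothetical helper, and whether the dockets file actually contains repeated IDs is not something this notebook establishes.

```python
def comment_count_query(dockets_src, comments_src, limit=10):
    """Build SQL that counts comments per docket before joining.

    dockets_src and comments_src are SQL table expressions, for example
    "read_parquet('<url>/dockets.parquet')" or the name of a view.
    The agency code stays a '?' placeholder to be bound at execution time.
    """
    limit = int(limit)  # keep the interpolated LIMIT a plain integer
    return f"""
        WITH counts AS (
            -- one row per docket, independent of the dockets table
            SELECT docket_id, COUNT(comment_id) AS n
            FROM {comments_src}
            GROUP BY docket_id
        )
        SELECT DISTINCT
            d.docket_id,
            d.agency_code,
            d.title,
            COALESCE(c.n, 0) AS comment_count
        FROM {dockets_src} d
        LEFT JOIN counts c
            ON d.docket_id = c.docket_id
        WHERE d.agency_code = ?
        ORDER BY comment_count DESC
        LIMIT {limit}
    """
```

With the notebook's connection this could be run as `conn.execute(comment_count_query(f"read_parquet('{R2_BASE_URL}/dockets.parquet')", f"read_parquet('{R2_BASE_URL}/comments.parquet')"), ["EPA"]).fetchdf()`. The `SELECT DISTINCT` collapses docket rows that are identical in all selected columns, which is a design choice of this sketch rather than something the original query does.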
