# Influence Mapping

**Problem**: How can we link public comments to specific changes in final rules and identify which commenters influenced regulatory outcomes?

**Stakeholder Quotes:**
- "Develop tools to assess whether and how comments influenced the final rule."
- "Which entities actually impact regulatory filings?"

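This notebook demonstrates:
- Identifying high-impact commenters
- Measuring comment volume by agency
- Tracking a single entity across rulemakings
- Using comment length and timing as rough influence proxies
- Outlining future work on linking comments to rule changes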

In [4]:
import duckdb
import pandas as pd

R2_BASE_URL = "https://pub-5fc11ad134984edf8d9af452dd1849d6.r2.dev"

conn = duckdb.connect()
conn.execute("INSTALL httpfs; LOAD httpfs;")
print("✓ Ready")

✓ Ready


## 1. Identify High-Impact Commenters

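Find commenters that frequently submit detailed comments. The query groups by the comment `title` field as a stand-in for commenter identity, keeps only comments longer than 1,000 characters, and requires activity in more than 10 dockets.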

In [5]:
# Commenters with long, substantive comments across multiple dockets
high_impact = conn.execute(f"""
    SELECT 
        title as commenter,
        COUNT(*) as total_comments,
        COUNT(DISTINCT docket_id) as dockets,
        COUNT(DISTINCT agency_code) as agencies,
        AVG(LENGTH(comment)) as avg_length
    FROM read_parquet('{R2_BASE_URL}/comments.parquet')
    WHERE title IS NOT NULL
      AND LENGTH(comment) > 1000  -- Substantive comments only
    GROUP BY title
    HAVING COUNT(DISTINCT docket_id) > 10
    ORDER BY dockets DESC, avg_length DESC
    LIMIT 30
""").fetchdf()

print("High-impact commenters (substantive comments across many dockets):")
high_impact

High-impact commenters (substantive comments across many dockets):


| | commenter | total_comments | dockets | agencies | avg_length |
|---|---|---|---|---|---|
| 0 | Comment from Anonymous | 136803 | 2799 | 59 | 2007.894513 |
| 1 | Anonymous public comment | 37889 | 1685 | 7 | 1989.673573 |
| 2 | Submitted Electronically via eRulemaking Portal | 343477 | 575 | 2 | 1765.645807 |
| 3 | Comment Submitted by Anonymous | 20673 | 438 | 13 | 2038.952789 |
| 4 | Comment from Anonymous Anonymous | 8103 | 417 | 35 | 2351.049981 |
| 5 | Anonymous | 3916 | 365 | 21 | 2195.596272 |
| 6 | Comment on FR Doc # N/A | 78883 | 334 | 58 | 1930.02794 |
| 7 | Anonymous - Comments | 2055 | 271 | 12 | 1862.339659 |
| 8 | Comment from WhoPoo App | 232 | 210 | 29 | 3090.086207 |
| 9 | Anonymous Public Comment | 5971 | 168 | 3 | 1864.821973 |
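
The results above are dominated by generic titles such as "Comment from Anonymous", so grouping on the raw `title` surfaces placeholders rather than named commenters. The following is a minimal sketch of one way to clean titles before grouping. The helper names `normalize_title` and `is_generic` are hypothetical, and the prefix and placeholder patterns are assumptions drawn only from the ten titles shown above, so they are illustrative rather than exhaustive.

```python
import re

# Prefixes observed in the output above (assumed list, not exhaustive).
_PREFIX = re.compile(
    r"^(comment submitted by|comment from)\s+",
    flags=re.IGNORECASE,
)

# Titles that carry no commenter identity (assumed set, taken from the table above).
_GENERIC = re.compile(
    r"^(anonymous( anonymous)?( public comment)?( - comments)?"
    r"|submitted electronically via erulemaking portal"
    r"|comment on fr doc # .*)$",
    flags=re.IGNORECASE,
)


def normalize_title(title: str) -> str:
    """Strip a leading 'Comment from'-style prefix and surrounding whitespace."""
    return _PREFIX.sub("", title.strip()).strip()


def is_generic(title: str) -> bool:
    """Return True when the normalized title names no specific commenter."""
    return bool(_GENERIC.match(normalize_title(title)))


titles = [
    "Comment from Anonymous",
    "Anonymous - Comments",
    "Comment from WhoPoo App",
    "Comment submitted by Shawn Dolan",
]
named = [normalize_title(t) for t in titles if not is_generic(t)]
print(named)
```

Applied to the query result, a filter like this could be run on the `commenter` column before ranking, so that placeholder titles do not crowd out identifiable commenters.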


## 2. Agency Responsiveness Analysis

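Which agencies receive the most public engagement? Comment volume, in total and per docket, serves here as a rough measure of engagement; it does not by itself show how an agency responded.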

In [6]:
# Comment volume by agency
agency_engagement = conn.execute(f"""
    SELECT 
        agency_code,
        COUNT(*) as total_comments,
        COUNT(DISTINCT docket_id) as dockets,
        ROUND(COUNT(*) * 1.0 / COUNT(DISTINCT docket_id), 1) as avg_comments_per_docket
    FROM read_parquet('{R2_BASE_URL}/comments.parquet')
    GROUP BY agency_code
    ORDER BY total_comments DESC
    LIMIT 20
""").fetchdf()

print("Agencies by public comment volume:")
agency_engagement

Agencies by public comment volume:


| | agency_code | total_comments | dockets | avg_comments_per_docket |
|---|---|---|---|---|
| 0 | FWS | 2489321 | 1624 | 1532.8 |
| 1 | FDA | 1807294 | 5097 | 354.6 |
| 2 | CMS | 1289068 | 1269 | 1015.8 |
| 3 | EPA | 1122190 | 9349 | 120.0 |
| 4 | HHS | 1089930 | 157 | 6942.2 |
| 5 | CFPB | 1062348 | 437 | 2431.0 |
| 6 | CEQ | 1025541 | 16 | 64096.3 |
| 7 | ED | 980398 | 1410 | 695.3 |
| 8 | ATF | 975423 | 22 | 44337.4 |
| 9 | BLM | 848986 | 57 | 14894.5 |
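
The `avg_comments_per_docket` column is a mean, and a mean can be pulled far upward by a handful of very large dockets. The sketch below first recomputes the column for two rows of the table above, then contrasts mean and median on a set of per-docket counts. The per-docket counts are hypothetical and exist only to illustrate the skew; the helper name `comments_per_docket` is also an assumption.

```python
from statistics import mean, median

# (total_comments, dockets) copied from the table above
agency_totals = {"FWS": (2489321, 1624), "CEQ": (1025541, 16)}


def comments_per_docket(total: int, dockets: int) -> float:
    """Mean comments per docket, rounded to one decimal like ROUND(..., 1) in the query."""
    return round(total / dockets, 1)


for code, (total, dockets) in agency_totals.items():
    print(code, comments_per_docket(total, dockets))

# Hypothetical per-docket counts (NOT from the dataset): one mass campaign
# among otherwise small dockets.
per_docket = [12, 30, 45, 60, 250_000]
print("mean:", mean(per_docket), "median:", median(per_docket))
```

If the median matters for the analysis, it would need a per-docket count first (for example a subquery grouped by `agency_code` and `docket_id`) before aggregating by agency.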


## 3. Track Entity Across Rulemaking

Follow one organization's comments across agencies and dockets by matching its name against comment titles.

In [7]:
# Track a specific org's regulatory engagement
org_name = "Chamber of Commerce"  # Change this

# The search pattern is bound as a query parameter (the ? placeholder) so that
# names containing apostrophes or quotes do not break the SQL string
org_history = conn.execute(f"""
    SELECT 
        c.agency_code,
        c.docket_id,
        d.title as docket_title,
        c.posted_date,
        LENGTH(c.comment) as comment_length
    FROM read_parquet('{R2_BASE_URL}/comments.parquet') c
    LEFT JOIN read_parquet('{R2_BASE_URL}/dockets.parquet') d
        ON c.docket_id = d.docket_id
    WHERE LOWER(c.title) LIKE ?
    ORDER BY c.posted_date DESC
    LIMIT 20
""", [f"%{org_name.lower()}%"]).fetchdf()

print(f"{org_name} regulatory engagement:")
org_history

Chamber of Commerce regulatory engagement:


| | agency_code | docket_id | docket_title | posted_date | comment_length |
|---|---|---|---|---|---|
| 0 | EPA | EPA-HQ-OW-2025-0322 | Updated Definition of Waters of the United States | 2026-01-08T05:00:00Z | 20 |
| 1 | EPA | EPA-HQ-OW-2025-0322 | Updated Definition of Waters of the United States | 2026-01-08T05:00:00Z | 93 |
| 2 | EPA | EPA-HQ-OPPT-2020-0549 | Reporting and Recordkeeping for Perfluoroalkyl... | 2025-12-31T05:00:00Z | 20 |
| 3 | EPA | EPA-HQ-OPPT-2020-0549 | Reporting and Recordkeeping for Perfluoroalkyl... | 2025-12-31T05:00:00Z | 20 |
| 4 | EPA | EPA-HQ-OPPT-2020-0549 | Reporting and Recordkeeping for Perfluoroalkyl... | 2025-12-31T05:00:00Z | 20 |
| 5 | FWS | FWS-HQ-ES-2025-0039 | Endangered and Threatened Wildlife and Plants;... | 2025-12-23T05:00:00Z | 59 |
| 6 | FWS | FWS-HQ-ES-2025-0044 | Endangered and Threatened Wildlife and Plants;... | 2025-12-23T05:00:00Z | 90 |
| 7 | FWS | FWS-HQ-ES-2025-0044 | Endangered and Threatened Wildlife and Plants;... | 2025-12-23T05:00:00Z | 47 |
| 8 | FWS | FWS-HQ-ES-2025-0048 | Endangered and Threatened Wildlife and Plants;... | 2025-12-23T05:00:00Z | 59 |
| 9 | FWS | FWS-HQ-ES-2025-0048 | Endangered and Threatened Wildlife and Plants;... | 2025-12-23T05:00:00Z | 84 |
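
The listing above repeats the same dockets several times, so a per-docket count is often easier to read than the raw rows. The sketch below collapses the ten rows shown above with `collections.Counter`; the `(agency_code, docket_id)` pairs are copied from that output, and nothing else about the dataset is assumed.

```python
from collections import Counter

# (agency_code, docket_id) pairs copied from the ten rows shown above
rows = [
    ("EPA", "EPA-HQ-OW-2025-0322"),
    ("EPA", "EPA-HQ-OW-2025-0322"),
    ("EPA", "EPA-HQ-OPPT-2020-0549"),
    ("EPA", "EPA-HQ-OPPT-2020-0549"),
    ("EPA", "EPA-HQ-OPPT-2020-0549"),
    ("FWS", "FWS-HQ-ES-2025-0039"),
    ("FWS", "FWS-HQ-ES-2025-0044"),
    ("FWS", "FWS-HQ-ES-2025-0044"),
    ("FWS", "FWS-HQ-ES-2025-0048"),
    ("FWS", "FWS-HQ-ES-2025-0048"),
]

# Count matching comments per docket and per agency
by_docket = Counter(docket for _, docket in rows)
by_agency = Counter(agency for agency, _ in rows)

for docket, n in by_docket.most_common():
    print(f"{docket}: {n} comment(s)")
```

With the notebook's DataFrame, `org_history.groupby(["agency_code", "docket_id"]).size()` should produce the same per-docket counts.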


## 4. Comment Length as Influence Proxy

Longer, more detailed comments may indicate greater effort and potential influence.

In [8]:
# Find the most detailed comments in a docket
docket_id = "EPA-HQ-OAR-2021-0317"  # Change this

detailed_comments = conn.execute(f"""
    SELECT 
        comment_id,
        title as commenter,
        LENGTH(comment) as length,
        posted_date
    FROM read_parquet('{R2_BASE_URL}/comments.parquet')
    WHERE docket_id = '{docket_id}'
      AND comment IS NOT NULL
    ORDER BY LENGTH(comment) DESC
    LIMIT 20
""").fetchdf()

print(f"Most detailed comments on {docket_id}:")
detailed_comments

Most detailed comments on EPA-HQ-OAR-2021-0317:


| | comment_id | commenter | length | posted_date |
|---|---|---|---|---|
| 0 | EPA-HQ-OAR-2021-0317-2490 | Mass Comment Campaign sponsored by Mom Clean A... | 7789 | 2023-02-24T05:00:00Z |
| 1 | EPA-HQ-OAR-2021-0317-2195 | Comment submitted by Environmental Health Proj... | 5274 | 2023-02-14T05:00:00Z |
| 2 | EPA-HQ-OAR-2021-0317-0784 | Comment submitted by Shawn Dolan | 5240 | 2022-02-02T05:00:00Z |
| 3 | EPA-HQ-OAR-2021-0317-2241 | Comment submitted by Public Lands Policy Coord... | 5240 | 2023-02-14T05:00:00Z |
| 4 | EPA-HQ-OAR-2021-0317-0196 | Comment submitted by Evangelical Environmental... | 5236 | 2021-12-01T05:00:00Z |
| 5 | EPA-HQ-OAR-2021-0317-2154 | Comment submitted by Jessie Henshaw | 5229 | 2023-02-06T05:00:00Z |
| 6 | EPA-HQ-OAR-2021-0317-0745 | Comment submitted by Step 2 Compliance, LLC | 5207 | 2022-02-02T05:00:00Z |
| 7 | EPA-HQ-OAR-2021-0317-1613 | Comment submitted by Barbara Brandom | 5182 | 2023-01-09T05:00:00Z |
| 8 | EPA-HQ-OAR-2021-0317-1349 | Comment submitted by Theodora Tsongas | 5105 | 2022-02-07T05:00:00Z |
| 9 | EPA-HQ-OAR-2021-0317-0672 | Comment submitted by Lauri Costello | 5096 | 2022-02-02T05:00:00Z |
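
Length alone cannot separate an original submission from a form letter; the longest entry in the output above is itself labeled a mass comment campaign. One possible refinement, sketched below, is to group comments whose text is identical after lowercasing and collapsing whitespace, so that repeated submissions can be counted once. The function names `fingerprint` and `group_duplicates` are hypothetical, and the sample comments are invented for illustration rather than taken from the dataset.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(text: str) -> str:
    """Hash the text after lowercasing and collapsing runs of whitespace."""
    normalized = re.sub(r"\s+", " ", text.strip().lower())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def group_duplicates(comments: dict[str, str]) -> list[list[str]]:
    """Return groups of comment IDs whose normalized text is identical."""
    groups = defaultdict(list)
    for comment_id, text in comments.items():
        groups[fingerprint(text)].append(comment_id)
    return [ids for ids in groups.values() if len(ids) > 1]


# Hypothetical comments (NOT from the dataset)
sample = {
    "C-1": "Please strengthen the proposed standards.",
    "C-2": "please  strengthen the proposed standards.\n",
    "C-3": "The proposed monitoring schedule is too costly for small operators.",
}
print(group_duplicates(sample))
```

Exact-match hashing only catches verbatim copies; form letters with small personal edits would need a fuzzier comparison.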


## 5. Early vs Late Commenters

Early commenters may have more influence on shaping the discourse.

In [9]:
# Who commented first vs last?
timing = conn.execute(f"""
    WITH ranked AS (
        SELECT 
            comment_id,
            title,
            posted_date,
            ROW_NUMBER() OVER (ORDER BY posted_date ASC) as rank_early,
            ROW_NUMBER() OVER (ORDER BY posted_date DESC) as rank_late
        FROM read_parquet('{R2_BASE_URL}/comments.parquet')
        WHERE docket_id = '{docket_id}'
          AND posted_date IS NOT NULL
          AND LENGTH(comment) > 500
    )
    SELECT * FROM ranked
    WHERE rank_early <= 10 OR rank_late <= 10
    ORDER BY posted_date
""").fetchdf()

print("Early and late substantive commenters:")
timing

Early and late substantive commenters:


| | comment_id | title | posted_date | rank_early | rank_late |
|---|---|---|---|---|---|
| 0 | EPA-HQ-OAR-2021-0317-0187 | Comment submitted by Bridget Bailey | 2021-11-23T05:00:00Z | 2 | 626 |
| 1 | EPA-HQ-OAR-2021-0317-0188 | Comment submitted by Erica Mulcahy | 2021-11-23T05:00:00Z | 1 | 627 |
| 2 | EPA-HQ-OAR-2021-0317-0194 | Comment submitted by Dennis Groce | 2021-12-01T05:00:00Z | 3 | 624 |
| 3 | EPA-HQ-OAR-2021-0317-0196 | Comment submitted by Evangelical Environmental... | 2021-12-01T05:00:00Z | 4 | 625 |
| 4 | EPA-HQ-OAR-2021-0317-0209 | Comment submitted by Daurie Pollitto | 2021-12-02T05:00:00Z | 9 | 619 |
| 5 | EPA-HQ-OAR-2021-0317-0212 | Anonymous public comment | 2021-12-02T05:00:00Z | 8 | 620 |
| 6 | EPA-HQ-OAR-2021-0317-0208 | Anonymous public comment | 2021-12-02T05:00:00Z | 7 | 621 |
| 7 | EPA-HQ-OAR-2021-0317-0202 | Comment submitted by Amy Sindorf | 2021-12-02T05:00:00Z | 6 | 622 |
| 8 | EPA-HQ-OAR-2021-0317-0203 | Anonymous public comment | 2021-12-02T05:00:00Z | 5 | 623 |
| 9 | EPA-HQ-OAR-2021-0317-0311 | Comment submitted by Robert Perry | 2021-12-10T05:00:00Z | 10 | 599 |
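
Several comments in the output above share a `posted_date`, and `ROW_NUMBER()` ordered by that column alone leaves the order among ties unspecified, so ranks within a day can change between runs. A minimal sketch of a deterministic alternative follows: sort by `(posted_date, comment_id)` and take the first and last `n`. The rows are copied from the output above; the function name `early_and_late` is hypothetical.

```python
# (comment_id, posted_date) pairs copied from the output above
rows = [
    ("EPA-HQ-OAR-2021-0317-0188", "2021-11-23T05:00:00Z"),
    ("EPA-HQ-OAR-2021-0317-0187", "2021-11-23T05:00:00Z"),
    ("EPA-HQ-OAR-2021-0317-0196", "2021-12-01T05:00:00Z"),
    ("EPA-HQ-OAR-2021-0317-0194", "2021-12-01T05:00:00Z"),
    ("EPA-HQ-OAR-2021-0317-0311", "2021-12-10T05:00:00Z"),
]


def early_and_late(rows, n):
    """Return (first n, last n) comment IDs, with ties broken by comment_id."""
    # ISO 8601 timestamps in the same timezone sort correctly as plain strings.
    ordered = sorted(rows, key=lambda r: (r[1], r[0]))
    ids = [comment_id for comment_id, _ in ordered]
    return ids[:n], ids[-n:]


early, late = early_and_late(rows, 2)
print("early:", early)
print("late:", late)
```

In the SQL above, the equivalent change would be to add `comment_id` as a second key in both window `ORDER BY` clauses.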


## 6. Future: Proposed→Final Rule Diffing

To truly measure influence, we'd need to:
1. Extract text from proposed rules
2. Extract text from final rules
3. Identify changes between versions
4. Match changes to comment arguments

This requires access to the full document text (beyond current metadata).
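
Steps 3 and 4 of the list above can be prototyped with the standard library once rule text is available. The sketch below uses `difflib.SequenceMatcher` to find sentences that are new or altered in the final text, then scores each comment against each change with Jaccard token overlap, a deliberately simple stand-in for embedding similarity. All rule text and comments here are hypothetical, as are the function names `changed_sentences`, `tokens`, and `jaccard`.

```python
import difflib
import re


def changed_sentences(proposed: list[str], final: list[str]) -> list[str]:
    """Return sentences that are new or altered in the final text."""
    matcher = difflib.SequenceMatcher(a=proposed, b=final, autojunk=False)
    changed = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "insert"):
            changed.extend(final[j1:j2])
    return changed


def tokens(text: str) -> set[str]:
    """Lowercased word tokens longer than 3 letters (a crude stop-word filter)."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 3}


def jaccard(a: str, b: str) -> float:
    """Token-set overlap in [0, 1]; a stand-in for embedding similarity."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


# Hypothetical rule text and comments (NOT from the dataset)
proposed = [
    "Operators must inspect equipment every month.",
    "Records must be kept for five years.",
]
final = [
    "Operators must inspect equipment every quarter.",
    "Records must be kept for five years.",
    "Small operators may request a one-year extension.",
]
comments = {
    "C-1": "Monthly inspection is excessive; operators should inspect equipment every quarter.",
    "C-2": "Small operators need an extension to comply.",
}

# For each changed sentence, pick the comment with the highest overlap
matches = {}
for change in changed_sentences(proposed, final):
    best = max(comments, key=lambda cid: jaccard(comments[cid], change))
    matches[change] = best
    print(f"{change!r} -> {best} ({jaccard(comments[best], change):.2f})")
```

A high overlap score indicates that a comment and a change share vocabulary, which is evidence of topical alignment rather than of influence; attribution would still need the weighting and review described in the framework.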

In [10]:
# Placeholder for future implementation
print("""
Influence Scoring Framework (Future Work):

1. Document Retrieval
   - Fetch proposed rule PDF/HTML
   - Fetch final rule PDF/HTML
   
2. Change Detection
   - Diff documents section by section
   - Identify substantive changes vs formatting
   
3. Comment Matching
   - Extract key arguments from comments
   - Match arguments to rule changes
   - Score similarity using embeddings
   
4. Influence Attribution
   - Rank commenters by matched changes
   - Weight by change significance
   - Generate influence reports
""")



