# Position & Sentiment Analysis

**Problem**: How can we extract nuanced positions and sentiments from comments beyond simple for/against categorizations?

**Stakeholder Quotes:**
- "Sophisticated position analysis… can we use LLMs?"
- "How many credible commenters made similar points?"

This notebook demonstrates:
- Rule-based sentiment indicators
- Keyword-based position detection
- Clustering comments by stance
- LLM integration for nuanced analysis

In [3]:
import duckdb
import pandas as pd
from collections import Counter
import re

R2_BASE_URL = "https://pub-5fc11ad134984edf8d9af452dd1849d6.r2.dev"

conn = duckdb.connect()
conn.execute("INSTALL httpfs; LOAD httpfs;")
print("✓ Ready")

✓ Ready


In [4]:
# Select a docket to analyze
docket_id = "EPA-HQ-OAR-2021-0317"  # Change to your target docket

# Get docket info
docket_info = conn.execute(f"""
    SELECT docket_id, agency_code, title
    FROM read_parquet('{R2_BASE_URL}/dockets.parquet')
    WHERE docket_id = '{docket_id}'
""").fetchdf()
print(f"Analyzing: {docket_info['title'].iloc[0][:80]}...")

Analyzing: Standards of Performance for New, Reconstructed, and Modified  Sources and Emiss...


## 1. Keyword-Based Position Detection

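Flag support and opposition signals by searching each comment for indicator phrases. This is a case-insensitive substring match (`LOWER(comment) LIKE '%kw%'`), so it is fast but crude: `'agree'` also matches "disagree", `'support'` matches "unsupported", and a comment that uses words from both lists is labeled mixed.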

In [5]:
# Define position indicators
support_keywords = [
    'support', 'favor', 'agree', 'endorse', 'approve', 'applaud',
    'commend', 'encourage', 'welcome', 'urge you to finalize',
    'strongly support', 'fully support'
]

oppose_keywords = [
    'oppose', 'against', 'reject', 'disagree', 'concerned',
    'harmful', 'damaging', 'negative', 'withdraw', 'rescind',
    'strongly oppose', 'urge you to withdraw'
]

# Build SQL conditions
support_cond = " OR ".join([f"LOWER(comment) LIKE '%{kw}%'" for kw in support_keywords])
oppose_cond = " OR ".join([f"LOWER(comment) LIKE '%{kw}%'" for kw in oppose_keywords])
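Because the `LIKE '%kw%'` conditions match substrings, a comment saying "I disagree" counts toward support as well as opposition. Below is a minimal pure-Python sketch of a stricter alternative. It anchors each keyword at the start of a word with `\b`, so inflections like "supports" and "opposed" still match while "disagree" no longer counts as agreement. The `keyword_pattern` and `classify` helpers are our own illustration, not part of the notebook, and the counts in the next cell have not been re-run with them. DuckDB's `regexp_matches(comment, pattern, 'i')` should accept the same kind of pattern if you prefer to keep this in SQL, but treat that as untested here.

```python
import re

# Keyword lists copied from the notebook cell above
support_keywords = [
    'support', 'favor', 'agree', 'endorse', 'approve', 'applaud',
    'commend', 'encourage', 'welcome', 'urge you to finalize',
    'strongly support', 'fully support'
]
oppose_keywords = [
    'oppose', 'against', 'reject', 'disagree', 'concerned',
    'harmful', 'damaging', 'negative', 'withdraw', 'rescind',
    'strongly oppose', 'urge you to withdraw'
]

def keyword_pattern(keywords):
    """Compile a case-insensitive regex matching any keyword at the start of a word.

    Only the left edge is anchored, so 'supports' and 'opposed' still match,
    while 'disagree' no longer matches 'agree' and 'unsupported' no longer matches 'support'.
    """
    alternatives = "|".join(re.escape(kw) for kw in keywords)
    return re.compile(r"\b(?:" + alternatives + r")", re.IGNORECASE)

SUPPORT_RE = keyword_pattern(support_keywords)
OPPOSE_RE = keyword_pattern(oppose_keywords)

def classify(text):
    """Return the same four labels as the SQL CASE expression, using word-start matching."""
    has_support = SUPPORT_RE.search(text) is not None
    has_oppose = OPPOSE_RE.search(text) is not None
    if has_support and has_oppose:
        return "mixed"
    if has_support:
        return "support"
    if has_oppose:
        return "oppose"
    return "neutral/unclear"

text = "I strongly disagree with this rule."
print("agree" in text.lower())  # True: this is what LIKE '%agree%' sees
print(classify(text))           # oppose
```

Word-start matching still cannot read negation or sarcasm ("I do not support…"), so it narrows the error rather than removing it.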

In [6]:
# Classify comments by position
positions = conn.execute(f"""
    SELECT
        CASE
            WHEN ({support_cond}) AND NOT ({oppose_cond}) THEN 'support'
            WHEN ({oppose_cond}) AND NOT ({support_cond}) THEN 'oppose'
            WHEN ({support_cond}) AND ({oppose_cond}) THEN 'mixed'
            ELSE 'neutral/unclear'
        END as position,
        COUNT(*) as count
    FROM read_parquet('{R2_BASE_URL}/comments.parquet')
    WHERE docket_id = '{docket_id}'
      AND comment IS NOT NULL
    GROUP BY position
    ORDER BY count DESC
""").fetchdf()

print("Position breakdown:")
total = positions['count'].sum()
for _, row in positions.iterrows():
    pct = 100 * row['count'] / total
    print(f"  {row['position']}: {row['count']:,} ({pct:.1f}%)")

Position breakdown:
  neutral/unclear: 3,161 (86.9%)
  support: 202 (5.6%)
  oppose: 147 (4.0%)
  mixed: 127 (3.5%)
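Each comment counted above is one submission, so a coordinated letter-writing campaign can dominate a category. One way to approach the stakeholder question "How many credible commenters made similar points?" is to group near-duplicate texts. The sketch below is a swapped-in technique, not something this notebook already does. It compares word 5-gram "shingles" with Jaccard similarity and greedily assigns each comment to the first group whose representative is at least 80% similar. Both `k=5` and `threshold=0.8` are assumptions to tune, and the greedy pass compares each comment against every group representative, which is workable for a few thousand comments but not for millions.

```python
import html
import re

def normalize(text):
    """Lowercase, decode HTML entities, drop tags and punctuation, collapse whitespace."""
    text = re.sub(r"<[^>]+>", " ", html.unescape(text)).lower()
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())

def shingles(text, k=5):
    """Set of overlapping k-word sequences; small edits change only a few of them."""
    words = normalize(text).split()
    if len(words) < k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Size of the intersection divided by size of the union (1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def group_near_duplicates(texts, threshold=0.8, k=5):
    """Greedy single pass: each text joins the first group whose representative
    (the group's first member) is at least `threshold` similar; otherwise it
    starts a new group. Returns group sizes, largest first."""
    reps = []   # shingle set of each group's first member
    sizes = []  # number of texts in each group
    for text in texts:
        s = shingles(text, k)
        for i, rep in enumerate(reps):
            if jaccard(s, rep) >= threshold:
                sizes[i] += 1
                break
        else:
            reps.append(s)
            sizes.append(1)
    return sorted(sizes, reverse=True)

letter = ("I am writing in support of the EPA proposed methane standard and ask for "
          "stronger action to reduce methane pollution from oil and gas")
print(group_near_duplicates([
    "Dear Administrator, " + letter,
    letter + " Thank you.",
    "Completely different comment about costs to small producers in rural areas",
]))
```

Applied to the `comment` column for one position (for example the 202 "support" rows), the group sizes indicate how much of that count comes from a handful of shared templates.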


## 2. Extract Key Arguments

Count the most common two-word phrases (bigrams) across up to 500 comments longer than 100 characters, as a rough proxy for recurring arguments.

In [7]:
# Up to 500 comments over 100 characters (LIMIT without ORDER BY returns
# whichever rows are scanned first, not a random sample)
sample = conn.execute(f"""
    SELECT comment
    FROM read_parquet('{R2_BASE_URL}/comments.parquet')
    WHERE docket_id = '{docket_id}'
      AND comment IS NOT NULL
      AND LENGTH(comment) > 100
    LIMIT 500
""").fetchdf()

# Extract common phrases (bigrams)
all_bigrams = []
for text in sample['comment'].dropna():
    words = re.findall(r'\b[a-z]{3,}\b', text.lower())
    bigrams = [f"{words[i]} {words[i+1]}" for i in range(len(words)-1)]
    all_bigrams.extend(bigrams)

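# Drop a few function-word bigrams. Note that 'of the', 'in the', 'to the' and 'this is'
# can never occur, because the regex above keeps only words of 3+ letters.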
stopbigrams = {'the the', 'of the', 'in the', 'to the', 'and the', 'for the', 'this is', 'that the'}
meaningful = [b for b in all_bigrams if b not in stopbigrams]

print("Common phrases in comments:")
for phrase, count in Counter(meaningful).most_common(25):
    print(f"  {phrase}: {count}")

Common phrases in comments:
  oil and: 850
  and gas: 766
  methane pollution: 377
  the epa: 364
  the oil: 233
  thank you: 231
  methane emissions: 226
  climate change: 224
  natural gas: 204
  pollution from: 193
  from the: 193
  gas industry: 186
  the proposed: 174
  gas operations: 172
  you for: 163
  public health: 126
  epa rsquo: 121
  the climate: 116
  flaring oil: 112
  routine flaring: 110
  from oil: 109
  emissions from: 102
  new mexico: 102
  greenhouse gas: 100
  and new: 95
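Part of this list is noise: "epa rsquo" comes from the HTML entity in `EPA&rsquo;s`, and pairs such as "from the" and "you for" survive because the stop list only covers a few exact bigrams. Here is a hedged sketch of a cleaner pass. It decodes entities with the standard library's `html.unescape`, replaces tags such as `<br/>` with spaces, and drops individual stopwords before pairing words. The `STOPWORDS` set is a small illustrative assumption, not a vetted list. Removing stopwords first also joins words across them, so "oil and gas" is counted as "oil gas".

```python
import html
import re
from collections import Counter

# Small illustrative stopword list (an assumption; swap in a fuller list as needed)
STOPWORDS = {"the", "and", "for", "from", "that", "this", "with", "you", "are",
             "was", "our", "their", "its", "not", "but", "have", "has", "will"}

TAG_RE = re.compile(r"<[^>]+>")
WORD_RE = re.compile(r"\b[a-z]{3,}\b")  # same 3+ letter rule as the notebook

def clean_text(text):
    """Decode HTML entities (&rsquo;, &amp;, &quot;) and replace tags like <br/> with spaces."""
    return TAG_RE.sub(" ", html.unescape(text))

def content_bigrams(text):
    """Adjacent word pairs after cleaning, lowercasing, and removing stopwords."""
    words = [w for w in WORD_RE.findall(clean_text(text).lower()) if w not in STOPWORDS]
    return [f"{a} {b}" for a, b in zip(words, words[1:])]

def top_bigrams(texts, n=25):
    """Most common content bigrams across an iterable of comment strings."""
    counts = Counter()
    for text in texts:
        counts.update(content_bigrams(text))
    return counts.most_common(n)

print(content_bigrams("The EPA&rsquo;s proposed standard<br/>cuts methane pollution"))
```

In the notebook, `top_bigrams(sample['comment'].dropna())` would slot in where the `Counter(meaningful)` loop is; the resulting counts have not been computed here.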


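## 3. Sentiment Intensity Analysis

Flag comments that contain emphatic words such as "strongly", "must", or "demand" as strong; everything else counts as moderate.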

In [8]:
# Intensity indicators
strong_indicators = ['strongly', 'absolutely', 'extremely', 'deeply', 'urgently', 
                     'critical', 'essential', 'must', 'demand', 'insist']

strong_cond = " OR ".join([f"LOWER(comment) LIKE '%{kw}%'" for kw in strong_indicators])

intensity = conn.execute(f"""
    SELECT
        CASE WHEN ({strong_cond}) THEN 'strong' ELSE 'moderate' END as intensity,
        COUNT(*) as count
    FROM read_parquet('{R2_BASE_URL}/comments.parquet')
    WHERE docket_id = '{docket_id}'
      AND comment IS NOT NULL
    GROUP BY intensity
""").fetchdf()

print("Sentiment intensity:")
intensity

Sentiment intensity:


|   | intensity | count |
|---|-----------|-------|
| 0 | moderate  | 3220  |
| 1 | strong    | 417   |
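The intensity flag is computed independently of position, so it does not say which side writes more emphatically. In SQL, one way to get that is to add the intensity `CASE` to the position query and `GROUP BY` both columns. The pure-Python sketch below does the same for `(position, text)` pairs you already have. It matches the notebook's intensity words as whole words (`\b` on both sides). That is our assumption: it keeps "must" from matching "mustard" but also misses forms like "demands" or "insisted".

```python
import re
from collections import Counter

# Intensity words from the notebook, matched as whole words
strong_indicators = ['strongly', 'absolutely', 'extremely', 'deeply', 'urgently',
                     'critical', 'essential', 'must', 'demand', 'insist']
STRONG_RE = re.compile(r"\b(?:" + "|".join(strong_indicators) + r")\b", re.IGNORECASE)

def intensity(text):
    """'strong' if any intensity word appears as a whole word, else 'moderate'."""
    return "strong" if STRONG_RE.search(text) else "moderate"

def crosstab(labeled):
    """Count (position, intensity) pairs from an iterable of (position, text) tuples."""
    return Counter((position, intensity(text)) for position, text in labeled)

counts = crosstab([
    ("support", "We strongly support this."),
    ("oppose", "I oppose it."),
    ("oppose", "You must withdraw this rule."),
])
for (position, level), n in sorted(counts.items()):
    print(f"{position:>8} {level:>8} {n}")
```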


## 4. Sample Comments by Position

In [9]:
# Sample supporting comments
support_sample = conn.execute(f"""
    SELECT comment_id, title, LEFT(comment, 400) as excerpt
    FROM read_parquet('{R2_BASE_URL}/comments.parquet')
    WHERE docket_id = '{docket_id}'
      AND comment IS NOT NULL
      AND ({support_cond})
      AND NOT ({oppose_cond})
      AND LENGTH(comment) > 200
    LIMIT 3
""").fetchdf()

print("Sample SUPPORTING comments:")
print("=" * 60)
for _, row in support_sample.iterrows():
    print(f"\n[{row['comment_id']}] {row['title']}")
    print(row['excerpt'][:300] + "...")

Sample SUPPORTING comments:

[EPA-HQ-OAR-2021-0317-0208] Anonymous public comment
The EPA has long been an enforcement agency of the radical left. Time and time again their &quot;recommendations&quot; have been nothing more than the furthering of a radical left-wing agenda on climate change policies that do nothing but cripple America, kill jobs, and make us dependent on foreign ...

[EPA-HQ-OAR-2021-0317-0220] Comment submitted by C. Runyon
 I agree that there needs to be some laws and rules to control emissions. However, the restrictions should not put companies in financial jeopardy.  There are <br/>small producers in the Oil &amp; Gas Industry that do not have large amounts of cash.  They are barely surviving as it is.  With these r...

[EPA-HQ-OAR-2021-0317-0240] Anonymous public comment
EPA Administrator Michael Regan,<br/><br/>I am writing in support of the EPA&rsquo;s proposed methane standard and to ask that the EPA take stronger action to reduce methane pollution. <br/><br/>L

In [10]:
# Sample opposing comments
oppose_sample = conn.execute(f"""
    SELECT comment_id, title, LEFT(comment, 400) as excerpt
    FROM read_parquet('{R2_BASE_URL}/comments.parquet')
    WHERE docket_id = '{docket_id}'
      AND comment IS NOT NULL
      AND ({oppose_cond})
      AND NOT ({support_cond})
      AND LENGTH(comment) > 200
    LIMIT 3
""").fetchdf()

print("Sample OPPOSING comments:")
print("=" * 60)
for _, row in oppose_sample.iterrows():
    print(f"\n[{row['comment_id']}] {row['title']}")
    print(row['excerpt'][:300] + "...")

Sample OPPOSING comments:

[EPA-HQ-OAR-2021-0317-0206] Comment submitted by David  Roche
With energy costs rising almost daily this new regulation needs to be looked at thoroughly. We can control harmful emissions better with innovation instead of more regulation. At this point in time I am against this proposed nothing in regulation ruling....

[EPA-HQ-OAR-2021-0317-0202] Comment submitted by Amy Sindorf
Hello,<br/><br/>I am new to this process but as I see the world changing and the risk of our own human extinction increasing everyday, I feel the need to start getting more involved. I&rsquo;m not a scientist or an expert in this field, but I&rsquo;m a citizen who as watched the world<br/>Change as ...

[EPA-HQ-OAR-2021-0317-0213] Comment submitted by Robert Pitkin
I am opposed to this further rule on methane in the oil industry for the following reasons.  It will reduce production of oil by increasing cost of production, raising cost of living for all Americans, slowing our economy. 
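Some of these samples look misclassified. The first "supporting" comment (-0208) calls the EPA "an enforcement agency of the radical left", and the excerpt of -0202 reads as worry about climate risk rather than opposition to the rule. A small audit helper, sketched here as a hypothetical addition, lists which keywords fired for a comment. It uses the same lowercase substring test as the SQL `LIKE` conditions, which makes such errors easier to trace. Run it on the full comment text rather than the 400-character `excerpt`, because the SQL matched against the whole comment.

```python
def matched_keywords(text, keywords):
    """Return keywords found in text with the same case-insensitive substring test
    as the notebook's LOWER(comment) LIKE '%kw%' conditions."""
    lowered = text.lower()
    return [kw for kw in keywords if kw in lowered]

# Example usage in the notebook (fetches the full comment, not the excerpt):
# full = conn.execute(
#     f"SELECT comment FROM read_parquet('{R2_BASE_URL}/comments.parquet') "
#     "WHERE comment_id = 'EPA-HQ-OAR-2021-0317-0208'"
# ).fetchone()[0]
# print(matched_keywords(full, support_keywords), matched_keywords(full, oppose_keywords))

print(matched_keywords("I strongly disagree", ["agree", "disagree", "support"]))
# ['agree', 'disagree']
```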

## 5. LLM-Based Analysis (Optional)

For more nuanced analysis, use an LLM to classify positions and extract arguments.

In [11]:
# Example LLM prompt for position analysis
# Uncomment and add your API key to use

llm_prompt = """
Analyze this public comment on a federal regulation. Extract:
1. Overall position (support/oppose/neutral/mixed)
2. Key arguments (bullet points)
3. Specific concerns or suggestions
4. Emotional tone (professional, passionate, hostile, constructive)

Comment:
{comment_text}

Analysis:
"""

print("LLM prompt template for position analysis:")
print(llm_prompt)

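# Example with the OpenAI Python SDK v1+ (uncomment to use).
# The client reads the key from the OPENAI_API_KEY environment variable.
# from openai import OpenAI
# client = OpenAI()
# comment = support_sample['excerpt'].iloc[0]  # any comment text works here
# response = client.chat.completions.create(
#     model="gpt-4",
#     messages=[{"role": "user", "content": llm_prompt.format(comment_text=comment)}]
# )
# print(response.choices[0].message.content)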

LLM prompt template for position analysis:

Analyze this public comment on a federal regulation. Extract:
1. Overall position (support/oppose/neutral/mixed)
2. Key arguments (bullet points)
3. Specific concerns or suggestions
4. Emotional tone (professional, passionate, hostile, constructive)

Comment:
{comment_text}

Analysis:
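Free-text answers to this template are hard to aggregate across thousands of comments. One common approach, sketched here under our own assumptions, is to ask the model for a JSON object and validate each reply before counting it. `JSON_PROMPT`, its field names, and `parse_analysis` are hypothetical. The allowed values mirror the four positions and four tones in the notebook's prompt. Replies that fail validation raise `ValueError`, so they can be logged or retried rather than silently miscounted.

```python
import json

POSITIONS = {"support", "oppose", "neutral", "mixed"}
TONES = {"professional", "passionate", "hostile", "constructive"}

# Hypothetical JSON variant of the notebook's prompt template
JSON_PROMPT = """Analyze this public comment on a federal regulation.
Reply with only a JSON object that has these keys:
- "position": one of support, oppose, neutral, mixed
- "arguments": list of short strings
- "concerns": list of short strings (concerns or suggestions)
- "tone": one of professional, passionate, hostile, constructive

Comment:
{comment_text}
"""

def parse_analysis(raw):
    """Parse a model reply and check it has the expected shape; raise ValueError otherwise."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("reply is not a JSON object")
    if data.get("position") not in POSITIONS:
        raise ValueError(f"unexpected position: {data.get('position')!r}")
    if data.get("tone") not in TONES:
        raise ValueError(f"unexpected tone: {data.get('tone')!r}")
    for key in ("arguments", "concerns"):
        if not isinstance(data.get(key), list):
            raise ValueError(f"{key!r} must be a list")
    return data

reply = ('{"position": "oppose", "arguments": ["raises production costs"], '
         '"concerns": [], "tone": "professional"}')
print(parse_analysis(reply)["position"])  # oppose
```

Once replies are validated, `collections.Counter(r["position"] for r in results)` over a list of parsed results gives a breakdown that can be compared with the keyword-based table.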

