# Sarcasm-aware sentiment playground

This notebook demonstrates how the hybrid sentiment pipeline reacts to sarcasm-heavy Reddit comments.
It also summarizes lightweight research takeaways to guide future improvements.

## Research snapshot
- **Specialized irony classifiers** (e.g., `cardiffnlp/twitter-roberta-base-irony`) tend to catch sarcasm in Reddit-like data that generic sentiment models miss, because they are trained on short, informal posts.
- **Context cues** that boost sarcasm recall include emoji (🙃, 😂), exaggerated punctuation (`???!!!`), hyperbolic words ("totally", "best ever"), and contradiction between sentiment words and their subject ("Great, another crash").
- **Confidence-aware blending** is safer than hard flipping: when sarcasm probability is high, dampen sentiment magnitude instead of inverting it to avoid overcorrection.
- **Human-in-the-loop evaluation** with real Reddit snippets remains essential; many sarcastic comments are community- or thread-specific.
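
The surface cues listed above can be checked with a few lines of plain Python. This is a minimal sketch, not part of the hybrid pipeline: the function name `sarcasm_cues` and the cue lists are hypothetical and chosen only for illustration. It does not attempt the harder contradiction cue ("Great, another crash"), which needs some notion of what the sentiment word refers to.

```python
import re

# Hypothetical cue lists for illustration; not taken from the pipeline.
SARCASM_EMOJI = {"🙃", "😂", "🤡"}
HYPERBOLE_TERMS = ("totally", "best ever", "absolutely")
EXAGGERATED_PUNCT = re.compile(r"[!?]{3,}")  # three or more ! or ? in a row


def sarcasm_cues(text: str) -> dict[str, bool]:
    """Report which surface-level sarcasm cues appear in a comment."""
    lowered = text.lower()
    return {
        "emoji": any(ch in SARCASM_EMOJI for ch in text),
        "exaggerated_punctuation": bool(EXAGGERATED_PUNCT.search(text)),
        "hyperbole": any(term in lowered for term in HYPERBOLE_TERMS),
    }


print(sarcasm_cues("Yeah right, management will totally deliver this time 🙃"))
```

Cues like these are cheap to compute and could serve as extra features or as a sanity check next to the classifier's probability, but on their own they will miss thread-specific sarcasm.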


In [25]:
import sys
from pprint import pprint

# Ensure the repository code is on the path for notebook execution
if ".." not in sys.path:
    sys.path.append("..")

from app.services.hybrid_sentiment import HybridSentimentService

# Configure the hybrid service with sarcasm dampening enabled
service = HybridSentimentService(
    use_llm=True,
    dual_model_strategy=True,
    enable_sarcasm_detection=True,
    sarcasm_threshold=0.6,
    sarcasm_dampening_factor=0.6,
    strong_llm_threshold=0.25,
)

print("Service configuration:")
pprint(service.get_service_info())




Service configuration:
{'device': -1,
 'enable_sarcasm_detection': True,
 'fallback_to_vader': True,
 'is_loaded': True,
 'llm_model_name': 'mwkby/distilbert-base-uncased-sentiment-reddit-crypto',
 'model_name': 'mwkby/distilbert-base-uncased-sentiment-reddit-crypto',
 'sarcasm_dampening_factor': 0.6,
 'sarcasm_model': 'cardiffnlp/twitter-roberta-base-irony',
 'sarcasm_threshold': 0.6,
 'transformers_available': True,
 'use_gpu': False,
 'use_llm': True}




In [26]:
sample_comments = [
    "Sure, because this stock only goes up... 🤡",
    "Absolutely love losing money on every dip, best feeling ever!",
    "Solid earnings and a clean balance sheet. I'm bullish here.",
    "Yeah right, management will totally deliver this time 🙃",
    "This is fine.",
]

def run_examples(comments: list[str]):
    rows = []
    for text in comments:
        details = service.analyze_with_details(text)
        sarcasm = details["sarcasm"]
        rows.append(
            {
                "text": text,
                "raw": round(details["raw_score"], 3),
                "adjusted": round(details["adjusted_score"], 3),
                "label": details["label"],
                "sarcasm_prob": None if not sarcasm else round(sarcasm.probability, 3),
                "sarcastic": None if not sarcasm else sarcasm.is_sarcastic,
            }
        )

    for row in rows:
        print("-" * 80)
        print(row["text"])
        print(
            f"raw={row['raw']} adjusted={row['adjusted']} label={row['label']} sarcasm_prob={row['sarcasm_prob']} sarcasm?={row['sarcastic']}"
        )

run_examples(sample_comments)


--------------------------------------------------------------------------------
Sure, because this stock only goes up... 🤡
raw=0.318 adjusted=0.144 label=Positive sarcasm_prob=0.915 sarcasm?=True
--------------------------------------------------------------------------------
Absolutely love losing money on every dip, best feeling ever!
raw=0.994 adjusted=0.428 label=Positive sarcasm_prob=0.948 sarcasm?=True
--------------------------------------------------------------------------------
Solid earnings and a clean balance sheet. I'm bullish here.
raw=0.997 adjusted=0.508 label=Positive sarcasm_prob=0.818 sarcasm?=True
--------------------------------------------------------------------------------
Yeah right, management will totally deliver this time 🙃
raw=0.994 adjusted=0.475 label=Positive sarcasm_prob=0.87 sarcasm?=True
--------------------------------------------------------------------------------
This is fine.
raw=0.996 adjusted=0.466 label=Positive sarcasm_prob=0.886 sarc
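
The adjusted scores printed above are consistent with a simple linear dampening rule: `adjusted = raw * (1 - factor * sarcasm_prob)`, applied once the sarcasm probability reaches the threshold. This rule is inferred from the printed numbers only; the actual logic inside `HybridSentimentService` may differ, and the function name `dampen` and the at-or-above threshold comparison are assumptions. The sketch below checks the rule against the recorded output.

```python
def dampen(raw: float, sarcasm_prob: float,
           threshold: float = 0.6, factor: float = 0.6) -> float:
    """Shrink sentiment magnitude in proportion to sarcasm confidence.

    Assumed rule (inferred from printed outputs, not from the service source):
    adjusted = raw * (1 - factor * sarcasm_prob) when sarcasm_prob >= threshold.
    The sign of the score is preserved; nothing is flipped.
    """
    if sarcasm_prob < threshold:
        return raw
    return raw * (1.0 - factor * sarcasm_prob)


# (raw, sarcasm_prob, adjusted) triples copied from the output above
observed = [
    (0.318, 0.915, 0.144),
    (0.994, 0.948, 0.428),
    (0.997, 0.818, 0.508),
    (0.994, 0.870, 0.475),
    (0.996, 0.886, 0.466),
]

for raw, prob, adjusted in observed:
    estimate = dampen(raw, prob)
    # Printed values are rounded to three decimals, so expect small differences
    print(f"raw={raw} prob={prob} estimate={estimate:.3f} printed={adjusted}")
```

Under this rule a fully confident sarcasm call with a factor of 0.6 still leaves 40% of the original magnitude, which is why every sample keeps its `Positive` label.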

## Try your own comments

Edit `sample_comments` above or create a new list below to quickly compare how sarcasm
changes the adjusted sentiment score.


In [36]:
custom_comments = [
    "HELP LOSING ALL MONEY"
]

run_examples(custom_comments)


--------------------------------------------------------------------------------
HELP LOSING ALL MONEY
raw=-0.982 adjusted=-0.546 label=Negative sarcasm_prob=0.74 sarcasm?=True


## Sentiment vs Price Movement Analysis

Query articles where sentiment and actual price movement are misaligned:
- **Negative sentiment but price went up** (bearish sentiment, bullish price)
- **Positive sentiment but price went down** (bullish sentiment, bearish price)

This can help identify cases where sarcasm or other factors may have caused sentiment misclassification.
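
The two mismatch types can be written as a small pure function, which makes the neutral midpoint explicit. This is a sketch: the name `classify_mismatch` and the `neutral` parameter are hypothetical. The default of `0.0` is an assumption based on the scores printed earlier in this notebook, which range from -1 to 1; the analysis cell in this section compares against 0.5 instead.

```python
from typing import Optional


def classify_mismatch(sentiment: float, price_change_pct: float,
                      neutral: float = 0.0) -> Optional[str]:
    """Label a sentiment/price disagreement, or return None when they agree."""
    if sentiment < neutral and price_change_pct > 0:
        return "negative_sentiment_price_up"
    if sentiment > neutral and price_change_pct < 0:
        return "positive_sentiment_price_down"
    return None


print(classify_mismatch(-0.84, 0.07))   # bearish text, price rose
print(classify_mismatch(0.77, -0.42))   # bullish text, price fell
print(classify_mismatch(0.27, 0.02))    # mildly positive text, price rose
```

The choice of midpoint matters: with `neutral=0.5`, the third call is labeled `negative_sentiment_price_up` even though a score of 0.27 is positive on a -1 to 1 scale.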


In [29]:
from datetime import UTC, datetime, timedelta

import pandas as pd
from sqlalchemy import and_, func

from app.db.models import Article, ArticleTicker, StockPriceHistory
from app.db.session import SessionLocal


def find_sentiment_price_mismatches(
    limit: int = 20,
    days_back: int = 30,
    min_sentiment_magnitude: float = 0.1,  # Ignore neutral sentiment
):
    """
    Find articles where sentiment and price movement are misaligned.
    
    Args:
        limit: Maximum number of results to return
        days_back: How many days back to look
        min_sentiment_magnitude: Minimum absolute sentiment value to consider
    """
    db = SessionLocal()
    try:
        cutoff_date = datetime.now(UTC) - timedelta(days=days_back)

        # Query articles with sentiment and price data
        # First, get articles with price on their published date
        base_query = (
            db.query(
                Article.id,
                Article.title,
                Article.text,
                Article.published_at,
                Article.sentiment,
                ArticleTicker.ticker,
                StockPriceHistory.close_price.label("price_on_date"),
                StockPriceHistory.date.label("price_date"),
            )
            .join(ArticleTicker, Article.id == ArticleTicker.article_id)
            .join(
                StockPriceHistory,
                and_(
                    StockPriceHistory.symbol == ArticleTicker.ticker,
                    func.date(StockPriceHistory.date) == func.date(Article.published_at),
                ),
            )
            .filter(
                Article.sentiment.isnot(None),
                Article.published_at >= cutoff_date,
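                # NOTE: 0.5 is used as the neutral midpoint, which assumes scores in [0, 1].
                # The scores printed in this notebook fall in [-1, 1]; if Article.sentiment
                # uses that scale, the midpoint should be 0 instead.
                func.abs(Article.sentiment - 0.5) >= min_sentiment_magnitude,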
            )
        )

        # Execute base query
        articles_with_prices = base_query.all()

        # For each article, find the previous trading day's price
        results = []
        seen_articles = set()

        for article_row in articles_with_prices:
            if article_row.id in seen_articles:
                continue
            seen_articles.add(article_row.id)

            # Find the most recent price before the article's price date
            prev_price_row = (
                db.query(StockPriceHistory.close_price)
                .filter(
                    StockPriceHistory.symbol == article_row.ticker,
                    StockPriceHistory.date < article_row.price_date,
                )
                .order_by(StockPriceHistory.date.desc())
                .first()
            )

            if prev_price_row and prev_price_row.close_price:
                price_change_pct = (
                    (article_row.price_on_date - prev_price_row.close_price)
                    / prev_price_row.close_price
                    * 100
                )

                # Create a result object with all needed fields
                result = type(
                    "Result",
                    (),
                    {
                        "id": article_row.id,
                        "title": article_row.title,
                        "text": article_row.text,
                        "published_at": article_row.published_at,
                        "sentiment": article_row.sentiment,
                        "ticker": article_row.ticker,
                        "price_on_date": article_row.price_on_date,
                        "prev_price": prev_price_row.close_price,
                        "price_change_pct": price_change_pct,
                    },
                )()
                results.append(result)

                if len(results) >= limit:
                    break

        return results

    finally:
        db.close()


def get_mismatches_dataframe(results):
    """
    Convert results to a DataFrame with mismatch information.
    
    Returns:
        DataFrame with columns: article_id, ticker, published_at, title, text,
        sentiment, price_on_date, prev_price, price_change_pct, mismatch_type
    """
    rows = []

    for row in results:
        sentiment = row.sentiment
        price_change = row.price_change_pct

        # Determine mismatch type. The 0.5 cutoff assumes sentiment in [0, 1];
        # for scores in [-1, 1], compare against 0 instead.
        mismatch_type = None
        if sentiment < 0.5 and price_change > 0:
            mismatch_type = "negative_sentiment_price_up"
        elif sentiment > 0.5 and price_change < 0:
            mismatch_type = "positive_sentiment_price_down"

        # Only include mismatches
        if mismatch_type:
            rows.append(
                {
                    "article_id": row.id,
                    "ticker": row.ticker,
                    "published_at": row.published_at,
                    "title": row.title,
                    "text": row.text,
                    "sentiment": row.sentiment,
                    "price_on_date": row.price_on_date,
                    "prev_price": row.prev_price,
                    "price_change_pct": price_change,
                    "mismatch_type": mismatch_type,
                }
            )

    if not rows:
        return pd.DataFrame()

    df = pd.DataFrame(rows)
    return df


# Run the analysis
print("Querying articles with sentiment/price mismatches...")
results = find_sentiment_price_mismatches(limit=50, days_back=30)
print(f"Found {len(results)} articles with price data")

# Get DataFrame of mismatches
df_mismatches = get_mismatches_dataframe(results)
print(f"\nFound {len(df_mismatches)} mismatches:")
if len(df_mismatches) > 0:
    neg_up = len(df_mismatches[df_mismatches['mismatch_type'] == 'negative_sentiment_price_up'])
    pos_down = len(df_mismatches[df_mismatches['mismatch_type'] == 'positive_sentiment_price_down'])
    print(f"  - Negative sentiment but price went up: {neg_up}")
    print(f"  - Positive sentiment but price went down: {pos_down}")

# df_mismatches is inspected in the next cell


Querying articles with sentiment/price mismatches...
Found 50 articles with price data

Found 27 mismatches:
  - Negative sentiment but price went up: 25
  - Positive sentiment but price went down: 2


In [35]:
df_mismatches.sort_values(by=['price_change_pct'], ascending=True)

| | article_id | ticker | published_at | title | text | sentiment | price_on_date | prev_price | price_change_pct | mismatch_type |
|---|---|---|---|---|---|---|---|---|---|---|
| 12 | 428458 | PLTR | 2025-11-14 11:11:29+00:00 | Comment in: What Are Your Moves Tomorrow, Nove... | #Ban Bet Won\n---\n\n/u/mushhd made a bet that... | 0.7717 | 174.059998 | 174.800003 | -0.423344 | positive_sentiment_price_down |
| 0 | 428265 | COAL | 2025-11-14 09:54:29+00:00 | Comment in: What Are Your Moves Tomorrow, Nove... | At least the rich people won't have to worry a... | 0.8179 | 22.252001 | 22.34 | -0.393909 | positive_sentiment_price_down |
| 3 | 428264 | BOX | 2025-11-14 09:54:07+00:00 | Comment in: What Are Your Moves Tomorrow, Nove... | Screw this I'm going to hold my bull spreads. ... | 0.2732 | 32.014999 | 32.009998 | 0.015623 | negative_sentiment_price_up |
| 11 | 428463 | NVDA | 2025-11-14 11:15:00+00:00 | Comment in: What Are Your Moves Tomorrow, Nove... | This whole ai thing will come crashing down on... | -0.844278 | 190.229996 | 190.100006 | 0.06838 | negative_sentiment_price_up |
| 4 | 531943 | GL | 2025-11-24 02:49:59+00:00 | Comment in: What Are Your Moves Tomorrow, Nove... | you gonna get blooody. gl! | -0.696174 | 133.235001 | 133.134995 | 0.075116 | negative_sentiment_price_up |
| 8 | 531891 | SPY | 2025-11-24 02:55:52+00:00 | Comment in: What Are Your Moves Tomorrow, Nove... | SPY CALLS AT OPEN 🗣🗣🗣 | -0.064804 | 669.585022 | 668.940002 | 0.096424 | negative_sentiment_price_up |
| 2 | 531726 | SPY | 2025-11-24 02:33:50+00:00 | Comment in: What Are Your Moves Tomorrow, Nove... | SPY 670 tomorrow guaranteed | -0.022769 | 669.585022 | 668.940002 | 0.096424 | negative_sentiment_price_up |
| 14 | 428448 | NGL | 2025-11-14 11:20:54+00:00 | Comment in: What Are Your Moves Tomorrow, Nove... | Ngl, Red days make me feel alive | 0.3818 | 9.83 | 9.82 | 0.101835 | negative_sentiment_price_up |
| 10 | 532819 | GOOG | 2025-11-24 04:47:27+00:00 | Comment in: What Are Your Moves Tomorrow, Nove... | Fk it man\n\n!banbet GOOG 350 EOY | 0.056898 | 318.369995 | 318.040009 | 0.103756 | negative_sentiment_price_up |
| 19 | 531874 | GOOG | 2025-11-24 02:54:35+00:00 | Comment in: What Are Your Moves Tomorrow, Nove... | Why is GOOG ripping right now | -0.213238 | 318.369995 | 318.040009 | 0.103756 | negative_sentiment_price_up |
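
To test whether sarcasm explains some of these mismatches, the mismatched texts can be passed back through a sarcasm-aware analyzer. The sketch below is a hedged illustration: `flag_sarcastic_rows` and `stub_analyze` are hypothetical names, and the analyzer is assumed to return a dict with a `"sarcasm"` entry exposing `.probability`, as `analyze_with_details` does in `run_examples`. A stub stands in for the real service so the example runs without the repository code.

```python
from types import SimpleNamespace
from typing import Callable


def flag_sarcastic_rows(rows: list[dict], analyze: Callable[[str], dict],
                        min_prob: float = 0.6) -> list[dict]:
    """Return mismatch rows whose text the analyzer scores as likely sarcastic."""
    flagged = []
    for row in rows:
        sarcasm = analyze(row["text"]).get("sarcasm")
        # Mirror run_examples: the "sarcasm" entry may be missing or None
        if sarcasm and sarcasm.probability >= min_prob:
            flagged.append({**row, "sarcasm_prob": sarcasm.probability})
    return flagged


def stub_analyze(text: str) -> dict:
    """Stand-in analyzer with made-up probabilities, for demonstration only."""
    prob = 0.9 if "alive" in text else 0.1
    return {"sarcasm": SimpleNamespace(probability=prob)}


rows = [
    {"ticker": "NGL", "text": "Ngl, Red days make me feel alive"},
    {"ticker": "SPY", "text": "SPY 670 tomorrow guaranteed"},
]
print(flag_sarcastic_rows(rows, stub_analyze))
```

In this notebook the same function could be called as `flag_sarcastic_rows(df_mismatches.to_dict("records"), service.analyze_with_details)`, assuming the service returns the dict shape described above.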
