SwapnilSMane/CascadeNS

CascadeNS: Confidence-Cascaded Neurosymbolic Sarcasm Detection

Implementation of CascadeNS, a confidence-calibrated neurosymbolic architecture achieving state-of-the-art sarcasm detection through principled integration of symbolic linguistic features and neural semantic analysis.

Highlights

  • State-of-the-Art Performance: F1 = 0.8864 on Amazon product reviews, surpassing the best transformer baseline by 7.44%
  • Confidence-Based Cascading: Modules are activated selectively according to a calibrated confidence score (the normalized score margin)
  • Representational Compatibility: Avoids the catastrophic degradation caused by fusion-based integration (up to 15.67% in ablations)
  • Statistical Validation: 94% bootstrap probability of superiority (10,000 iterations), 95% CI [0.8090, 0.9485]
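
The "normalized score margin" mentioned above can be illustrated with a small sketch. The exact formula used in this repository is not stated here, so the definition below (absolute difference of the two symbolic scores divided by the sum of their magnitudes) is an assumption, and `normalized_margin` is a hypothetical helper name rather than part of the CascadeNS API.

```python
import numpy as np

def normalized_margin(scores_pos, scores_neg, eps=1e-12):
    """Hypothetical confidence: |S+ - S-| / (|S+| + |S-|), always in [0, 1].

    ASSUMPTION: this is one common way to normalize a score margin;
    the repository may define its confidence differently.
    """
    s_pos = np.asarray(scores_pos, dtype=float)
    s_neg = np.asarray(scores_neg, dtype=float)
    denom = np.abs(s_pos) + np.abs(s_neg)
    # Guard against division by zero when both scores are zero.
    return np.abs(s_pos - s_neg) / np.maximum(denom, eps)

# Toy scores: a clear margin, a tie, and an all-zero case.
gamma = normalized_margin([3.0, 1.0, 0.0], [1.0, 1.0, 0.0])
print(gamma)  # [0.5 0.  0. ]
```

A value near 1 means one class dominates the symbolic score; a value near 0 means the symbolic module is undecided.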

Repository Structure

CascadeNS/
├── cascade_sarcasm_detector.py       # Clean API (scikit-learn style)
├── hybrid_confidence_cascade.py      # Full experimental pipeline
├── semigraph_detector.py             # Symbolic semigraph classifier
├── requirements.txt                  # Python dependencies
├── .gitignore                        # Git ignore patterns
├── README.md                         # This file
└── results/                          # Experimental results
    ├── figures/                      # Figures
    └── intermediate/                 # Intermediate files
        ├── cascade_best_results.csv
        ├── bootstrap_results.csv
        ├── Final_train_feature.pkl
        ├── Final_test_feature.pkl
        └── [22 files total]

Installation

# Clone repository
git clone https://github.com/SwapnilSMane/CascadeNS.git
cd CascadeNS

# Install dependencies
pip install -r requirements.txt

Requirements

  • Python >= 3.8
  • PyTorch >= 2.0.0
  • Transformers >= 4.30.0
  • scikit-learn >= 1.3.0
  • NetworkX >= 3.0
  • NumPy >= 1.24.0
  • Pandas >= 2.0.0

See requirements.txt for complete dependency list.

Quick Start

Using the Clean API

from sklearn.metrics import classification_report
from cascade_sarcasm_detector import CascadeNS

# The embeddings, labels, and symbolic outputs used below are assumed to be
# precomputed arrays (see "Running Full Experiments" for how to generate them).

# Initialize model with optimal hyperparameters
model = CascadeNS(k=5, threshold=0.02)

# Fit on training data
model.fit(
    train_embeddings=train_roberta_embeddings,  # shape: (n_train, 768)
    train_labels=train_labels                    # shape: (n_train,)
)

# Predict on test data
predictions = model.predict(
    test_embeddings=test_roberta_embeddings,     # shape: (n_test, 768)
    symbolic_predictions=symbolic_preds,         # from semigraph classifier
    symbolic_scores_pos=symbolic_sarc_scores,    # S+(x)
    symbolic_scores_neg=symbolic_nonsarc_scores  # S-(x)
)

# Evaluate performance against the ground-truth test labels
print(classification_report(y_true, predictions))
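
To make the role of `threshold` more concrete, here is a minimal sketch of confidence-based routing. It assumes that the symbolic prediction is kept when its confidence reaches the threshold and that a neural prediction is used otherwise; that direction, and the helper name `cascade_route`, are assumptions for illustration and not the internals of `CascadeNS.predict`.

```python
import numpy as np

def cascade_route(symbolic_preds, gamma, neural_preds, threshold=0.02):
    """Pick, per example, the symbolic or the neural prediction.

    ASSUMPTION: examples with confidence gamma >= threshold keep the
    symbolic label; the rest fall back to the neural label.
    """
    symbolic_preds = np.asarray(symbolic_preds)
    neural_preds = np.asarray(neural_preds)
    use_symbolic = np.asarray(gamma, dtype=float) >= threshold
    return np.where(use_symbolic, symbolic_preds, neural_preds), use_symbolic

# Toy inputs: the second example has low confidence and is re-routed.
final, used_symbolic = cascade_route(
    symbolic_preds=[1, 0, 1],
    gamma=[0.50, 0.01, 0.30],
    neural_preds=[0, 1, 0],
)
print(final)          # [1 1 1]
print(used_symbolic)  # [ True False  True]
```

Unlike fusion, this scheme never mixes the two score spaces: each example is decided by exactly one module.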

Running Full Experiments

Step 1: Extract symbolic features and generate predictions

python semigraph_detector.py

This extracts 7 linguistic features, constructs polarity-weighted bipartite semigraphs, and computes symbolic predictions with confidence scores.

Step 2: Run complete cascade pipeline

python hybrid_confidence_cascade.py

This reproduces all paper results:

  • Threshold grid search ($\tau \in \{0.01, 0.02, \ldots, 0.20\}$)
  • Bootstrap analysis (10,000 iterations)
  • Performance comparison with baselines
  • Ablation study (5 fusion-based alternatives)
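
The threshold search and bootstrap analysis listed above can be sketched as follows. This is not the code in `hybrid_confidence_cascade.py`: the function names, the percentile confidence interval, the routing direction (symbolic when confidence reaches the threshold), and the toy data are all assumptions made for illustration.

```python
import numpy as np

def binary_f1(y_true, y_pred):
    """F1 for the positive class (label 1); returns 0.0 if undefined."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def grid_search_threshold(y_true, symbolic_preds, neural_preds, gamma, taus):
    """Return the first threshold with the highest F1 (hypothetical helper)."""
    best_tau, best_f1 = None, -1.0
    for tau in taus:
        preds = np.where(np.asarray(gamma) >= tau, symbolic_preds, neural_preds)
        score = binary_f1(y_true, preds)
        if score > best_f1:
            best_tau, best_f1 = float(tau), float(score)
    return best_tau, best_f1

def bootstrap_superiority(y_true, preds_a, preds_b, n_iter=10_000, seed=0):
    """Fraction of resamples where model A beats model B on F1,
    plus a percentile 95% interval for A's F1 (ASSUMED CI method)."""
    rng = np.random.default_rng(seed)
    y_true, preds_a, preds_b = map(np.asarray, (y_true, preds_a, preds_b))
    n = len(y_true)
    f1_a = np.empty(n_iter)
    wins = 0
    for i in range(n_iter):
        idx = rng.integers(0, n, size=n)  # resample with replacement
        f1_a[i] = binary_f1(y_true[idx], preds_a[idx])
        wins += int(f1_a[i] > binary_f1(y_true[idx], preds_b[idx]))
    return wins / n_iter, np.percentile(f1_a, [2.5, 97.5])

# Toy data (not from the paper), only to exercise the functions.
y_true = np.array([1, 1, 0, 0, 1, 0])
symbolic = np.array([1, 0, 0, 1, 1, 0])
neural = np.array([0, 1, 0, 0, 0, 0])
gamma = np.array([0.5, 0.015, 0.3, 0.012, 0.4, 0.2])

taus = np.round(np.linspace(0.01, 0.20, 20), 2)
best_tau, best_f1 = grid_search_threshold(y_true, symbolic, neural, gamma, taus)
cascade = np.where(gamma >= best_tau, symbolic, neural)
p_superior, ci = bootstrap_superiority(y_true, cascade, symbolic, n_iter=1_000)
```

In practice the threshold would be selected on held-out labelled data rather than on the set used for the final comparison, so that the reported F1 is not optimistically biased.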

Key Findings

  1. Fusion degrades performance catastrophically due to representational incompatibility between polarity-weighted symbolic scores and embedding-based neural representations
  2. Confidence-based cascading substantially outperforms all fusion alternatives
  3. Domain-adapted symbolic methods match general-purpose neural models when explicit patterns are prevalent
  4. Calibrated confidence ($\gamma$) reliably indicates prediction correctness, validated by a monotonic increase in accuracy as confidence increases
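
One way to check the last finding on your own predictions is to bin examples by confidence and compare accuracy across bins. The sketch below uses quantile bins; the helper name `accuracy_by_confidence`, the number of bins, and the toy data are assumptions and do not reproduce the analysis behind `confidence_analysis.png`.

```python
import numpy as np

def accuracy_by_confidence(y_true, y_pred, gamma, n_bins=4):
    """Accuracy within quantile bins of the confidence score gamma.

    Hypothetical helper: returns one accuracy per bin, ordered from the
    lowest-confidence bin to the highest (NaN for an empty bin).
    """
    y_true, y_pred, gamma = map(np.asarray, (y_true, y_pred, gamma))
    correct = (y_true == y_pred).astype(float)
    # Quantile edges give bins with roughly equal numbers of examples.
    edges = np.quantile(gamma, np.linspace(0, 1, n_bins + 1))
    bin_ids = np.clip(np.searchsorted(edges, gamma, side="right") - 1, 0, n_bins - 1)
    return np.array([
        correct[bin_ids == b].mean() if np.any(bin_ids == b) else np.nan
        for b in range(n_bins)
    ])

# Toy data (not from the paper): errors concentrate at low confidence.
gamma = [0.05, 0.1, 0.2, 0.3, 0.5, 0.6, 0.8, 0.9]
y_true = [1, 0, 1, 0, 1, 1, 0, 1]
y_pred = [0, 0, 1, 1, 1, 1, 0, 1]

acc = accuracy_by_confidence(y_true, y_pred, gamma)
is_monotonic = bool(np.all(np.diff(acc) >= 0))
print(acc)           # [0.5 0.5 1.  1. ]
print(is_monotonic)  # True
```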

Figures

All publication-quality figures are available in results/figures/:

  • threshold_analysis.png: Threshold sensitivity (F1 vs. τ)
  • confidence_analysis.png: Confidence calibration validation
  • ablation.png: Ablation study results
  • bootstrap.png: Bootstrap distributions (10K iterations)
  • KG_implementation.png: Polarity-weighted bipartite semigraph

Intermediate Files

The results/intermediate/ directory contains:

  • Feature files: Final_train_feature.pkl, Final_test_feature.pkl
  • Cascade results: cascade_best_results.csv, cascade_threshold_search.csv
  • Bootstrap data: bootstrap_results.csv
  • Ablation results: hybrid_roberta_results_detailed.csv, semantic_incongruity_results.csv
  • Preprocessed data: train_data_for_amazon_sarcasm_detection.pkl, test_data_for_amazon_sarcasm_detection.pkl
