**NOTE**: This notebook implements the class defined in `models/keywords_sentiment.py`, extending KeyBERT to incorporate sentiment information directly into the keyword selection process.

# KeyBERT with Integrated Sentiment-aware Keyword Extraction

Unlike post-hoc reranking strategies that adjust keyword scores after semantic filtering, this approach introduces sentiment alignment **early**, during candidate selection and scoring.

*The goal is to extract keywords that are not only semantically relevant, but also emotionally aligned with the overall sentiment of the document.*

### Theoretical Framework

The original KeyBERT pipeline is modified to consider both semantic similarity and sentiment coherence during candidate filtering and ranking.

Given:
- $\text{sim}_{sem}$: cosine similarity between document and candidate embeddings
- $s_{doc} \in [0, 1]$: sentiment polarity of the full document
- $s_{cand} \in [0, 1]$: sentiment polarity of each candidate keyword

We compute a sentiment alignment score:

$$
\text{align}(s_{doc}, s_{cand}) = 1 - |s_{doc} - s_{cand}|
$$

This is then combined with semantic similarity to form a final score:

$$
\text{score}_{final} = (1 - \alpha) \cdot \text{sim}_{sem} + \alpha \cdot (2 \cdot \text{align} - 1)
$$

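where **α ∈ [0, 1]** controls the trade-off between semantic and sentiment-based scoring. The term $2 \cdot \text{align} - 1$ rescales the alignment from $[0, 1]$ to $[-1, 1]$ so that it matches the range of cosine similarity.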
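
As a quick sanity check of the two formulas, the following minimal sketch evaluates them in plain Python. The function names `alignment` and `final_score` and all input values are illustrative assumptions; they are not part of the notebook's classes.

```python
def alignment(s_doc: float, s_cand: float) -> float:
    """Sentiment alignment in [0, 1]: 1 when polarities match, 0 when opposite."""
    return 1.0 - abs(s_doc - s_cand)


def final_score(sim_sem: float, s_doc: float, s_cand: float, alpha: float) -> float:
    """Blend semantic similarity with alignment rescaled to [-1, 1]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return (1.0 - alpha) * sim_sem + alpha * (2.0 * alignment(s_doc, s_cand) - 1.0)


# Hypothetical values: a positive document (0.9) and two candidates with the
# same semantic similarity (0.6) but different polarity.
aligned = final_score(sim_sem=0.6, s_doc=0.9, s_cand=0.8, alpha=0.5)
opposed = final_score(sim_sem=0.6, s_doc=0.9, s_cand=0.2, alpha=0.5)
print(f"aligned: {aligned:.2f}, opposed: {opposed:.2f}")
```

With these inputs the aligned candidate scores about 0.70 and the opposed one about 0.10, even though both are equally similar to the document semantically.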

### Characteristics

- **Early sentiment filtering**: Sentiment alignment is used to filter low-quality candidates before ranking.
- **Continuous scoring**: Sentiment is computed as a weighted average of class probabilities, enabling smooth comparisons.
- **Flexible control**: The parameter `alpha` lets users adjust the influence of sentiment.
- **Embedding-compatible**: Works with any `SentenceTransformer` model and HuggingFace sentiment classifier.

By using sentiment alignment to filter and score keywords **before ranking**, this approach allows sentiment to directly influence which candidates are considered at all — not just how they are ranked.

### When to Use This

This strategy is most effective when:
- The text contains **strong sentiment signals** (e.g., reviews, social media, opinions).
- You want to **prioritize emotional consistency** from the beginning.
- Reducing **irrelevant or sentimentally inconsistent keywords early** is important for your use case.

It offers more control over the candidate pool and generally produces **more focused, sentiment-aware results**, especially in emotionally polarized datasets.

### Setup: Installing and Importing Required Libraries

In [6]:
import subprocess
import sys

# Required packages, mapping each pip name to its import name
# (the two can differ: "scikit-learn" is imported as "sklearn")
required_packages = {
    "numpy": "numpy",
    "torch": "torch",
    "scikit-learn": "sklearn",
    "keybert": "keybert",
    "transformers": "transformers",
    "sentence_transformers": "sentence_transformers",
}

def install_package(package, module_name):
    """Installs a package using pip if its module cannot be imported."""
    try:
        __import__(module_name)
        print(f"{package} is already installed.")
    except ImportError:
        print(f"Installing {package}...")
        subprocess.check_call([sys.executable, "-m", "pip", "install", package])

# Check and install missing packages
for package, module_name in required_packages.items():
    install_package(package, module_name)

numpy is already installed.
torch is already installed.
Installing scikit-learn...


huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Defaulting to user installation because normal site-packages is not writeable
keybert is already installed.
transformers is already installed.
sentence_transformers is already installed.


In [7]:
# NumPy: fundamental package for numerical operations and array handling
import numpy as np

# Typing module: used for function signature annotations (e.g., Tuple[int, int])
from typing import Tuple

# PyTorch: core deep learning framework for tensor operations and model inference
import torch

# PyTorch functional API: provides functions like softmax, relu, etc.
import torch.nn.functional as F

# scikit-learn utility to extract candidate phrases based on n-gram statistics
from sklearn.feature_extraction.text import CountVectorizer

# scikit-learn function to compute cosine similarity between vector embeddings
from sklearn.metrics.pairwise import cosine_similarity

# KeyBERT: base class for keyword extraction using BERT-based embeddings
from keybert import KeyBERT as KB

# HuggingFace Transformers:
# - AutoTokenizer: automatically loads the correct tokenizer for a given model
# - AutoModelForSequenceClassification: loads a pretrained classification model
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification
)

# SentenceTransformers: framework for encoding text into dense embeddings using pretrained transformer models
from sentence_transformers import SentenceTransformer

# Library for regular expressions
import re


# Classes Definition

## SentimentModel: Flexible Transformer-based Sentiment Scorer

The `SentimentModel` class provides a unified interface for performing sentiment analysis using pretrained HuggingFace transformer models.  
It is designed to return **probability distributions** over sentiment classes and compute **continuous sentiment scores** for downstream tasks.

### Overview

This class supports both 3-class models (e.g., `cardiffnlp/twitter-roberta-base-sentiment`) and 5-class models (e.g., `nlptown/bert-base-multilingual-uncased-sentiment`) by dynamically adapting to the label schema of the specified model.

It is used across both **reranking** and **candidate selection** pipelines to ensure consistent, interpretable sentiment scoring.
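
For a model whose labels are not hard-coded, one way to adapt to the schema is to spread scores evenly over the labels in ID order. The sketch below is a simplified illustration: `even_label_scores` is a hypothetical helper, and it assumes the labels run from most negative to most positive, which is not guaranteed for arbitrary models.

```python
def even_label_scores(id2label: dict) -> dict:
    """Map labels, taken in ID order, to evenly spaced scores in [0, 1]."""
    labels = [id2label[i] for i in sorted(id2label)]
    if len(labels) < 2:
        raise ValueError("need at least two labels to define a polarity scale")
    step = 1.0 / (len(labels) - 1)
    return {label: idx * step for idx, label in enumerate(labels)}


# Hypothetical 3-class schema
print(even_label_scores({0: "negative", 1: "neutral", 2: "positive"}))
```

The explicit check for fewer than two labels avoids a division by zero when a model exposes a single output class.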

### Functionality

- **Model Loading:**  
  Loads the tokenizer and sequence classification model specified via the `model_name`.  
  All models must be compatible with HuggingFace’s `AutoModelForSequenceClassification`.

- **Device Handling:**  
  Automatically moves the model to the selected device (`cpu` or `cuda`) and validates compatibility.

- **Token Limit Management:**  
  For models with a maximum sequence length (typically **512 tokens**), the class **automatically splits input texts into smaller chunks**,  
  runs inference on each chunk, and returns the **average probability distribution**.  
  This prevents information loss and avoids runtime errors on long texts.

- **Class Probability Prediction (`predict_proba`):**  
  1. Tokenization and encoding of input text(s) with padding and truncation  
  2. Automatic chunking if input exceeds token limit  
  3. Inference using the transformer model (no gradient tracking)  
  4. Softmax activation over logits to obtain class probabilities  
  5. Returns a NumPy array of shape `(batch_size, num_classes)`

- **Continuous Sentiment Scoring (`predict_score`):**  
  Computes a **score in [0, 1]** as a probability-weighted average over mapped class values  
  (e.g., `1 star → 0.0`, `5 stars → 1.0`, or `negative → 0.0`, `positive → 1.0`).
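
The continuous score described in the last item is a plain expected value over the class mapping. A minimal sketch, assuming the evenly spaced 5-star mapping and a made-up probability vector (`polarity_score` is a hypothetical helper, not a method of the class):

```python
def polarity_score(probs, class_values):
    """Probability-weighted average of per-class polarity values."""
    if len(probs) != len(class_values):
        raise ValueError("probs and class_values must have the same length")
    return sum(p * v for p, v in zip(probs, class_values))


# 5-class mapping: 1 star -> 0.0, 2 stars -> 0.25, ..., 5 stars -> 1.0
star_values = [0.0, 0.25, 0.5, 0.75, 1.0]

# Hypothetical probability distribution over the five classes
probs = [0.10, 0.10, 0.20, 0.30, 0.30]
score = polarity_score(probs, star_values)
print(f"score = {score:.3f}")
```

Because the score is an average rather than an argmax, two texts with the same predicted class can still receive different scores when their probability mass is distributed differently.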

### Why Use It

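- Compatible with **HuggingFace sequence-classification sentiment models**; for models other than the two named above, labels are assumed to run from most negative to most positive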
- Automatically handles **long texts via chunk-aware processing**
- Supports **batch processing** and **fine-grained scoring**
- Integrates seamlessly with **sentiment-aware KeyBERT pipelines**
- Outputs **interpretable, continuous sentiment scores**
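
The chunk-aware processing mentioned above can be sketched without any model dependencies. In this illustration, word counts stand in for tokenizer lengths and `toy_classifier` is a hypothetical two-class stand-in for the transformer; only the split-then-average structure mirrors the class.

```python
import re


def split_into_chunks(text, max_units=8):
    """Greedily pack sentences into chunks of at most `max_units` words."""
    sentences = re.split(r"(?<=[.!?]) +", text.strip())
    chunks, current = [], []
    for sent in sentences:
        candidate = current + [sent]
        if current and len(" ".join(candidate).split()) > max_units:
            chunks.append(" ".join(current))
            current = [sent]
        else:
            current = candidate
    if current:
        chunks.append(" ".join(current))
    return chunks


def toy_classifier(chunk):
    """Hypothetical stand-in for a sentiment model: returns [p_neg, p_pos]."""
    return [0.2, 0.8] if "great" in chunk.lower() else [0.7, 0.3]


def average_distribution(text):
    """Average the per-chunk class probabilities element-wise."""
    dists = [toy_classifier(c) for c in split_into_chunks(text)]
    return [sum(col) / len(dists) for col in zip(*dists)]


text = "The acting was great. The plot dragged on for far too long. Pacing was uneven."
print(split_into_chunks(text))
print(average_distribution(text))
```

Note that a plain mean gives every chunk the same weight regardless of its length, so a short trailing chunk influences the result as much as a full one.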

### Example Output

→ **Text**: "absolutely loved"  
→ **Probabilities:** `[0.01, 0.02, 0.07, 0.18, 0.72]`  
→ **Score:** `0.895`

→ **Text**: "underdeveloped twist"  
→ **Probabilities:** `[0.55, 0.30, 0.10, 0.04, 0.01]`  
→ **Score:** `0.165`

→ **Text**: "The film is visually stunning and emotionally profound, despite a few narrative missteps."  
→ **Probabilities:** `[0.03, 0.07, 0.12, 0.25, 0.53]`  
→ **Score:** `0.795`

In [8]:
# Required imports
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import torch.nn.functional as F
import numpy as np
import re

# Class definition
class SentimentModel:
    """
    A flexible sentiment analysis wrapper supporting multiple HuggingFace models.

    This class dynamically adapts to the label schema of the specified model,
    allowing for consistent polarity scoring across different sentiment models.
    It also supports automatic truncation and chunking for long texts that exceed
    the model's token limit, ensuring robust performance on lengthy inputs.

    For models with a token limit (e.g., BERT-based with 512 tokens), the class
    splits long texts into chunks, computes predictions for each, and returns
    the averaged sentiment probabilities.
    """

    def __init__(self, model_name="cardiffnlp/twitter-roberta-base-sentiment", device="cpu"):
        """
        Initialize the sentiment model.

        Parameters:
        ----------
        model_name : str
            HuggingFace model identifier.
            Default is "cardiffnlp/twitter-roberta-base-sentiment".
            Alternatively, you can use "nlptown/bert-base-multilingual-uncased-sentiment"

        device : str
            Computation device. Should be either 'cpu' or 'cuda'.
        """

        # Validate the selected device
        if device not in ["cpu", "cuda"]:
            raise ValueError("Device must be 'cpu' or 'cuda'.")

        if device == "cuda" and not torch.cuda.is_available():
            raise ValueError("CUDA is not available. Please use 'cpu' instead.")

        self.device = device
        self.model_name = model_name

        # Load tokenizer and model from HuggingFace Hub
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModelForSequenceClassification.from_pretrained(model_name).to(device)

        # Determine label mapping based on the model
        self._set_label_mapping()

    def _set_label_mapping(self):
        """
        Set the label to score mapping based on the model's label schema.
        """

        # Retrieve the model's configuration to get label mappings
        id2label = self.model.config.id2label

        # Sort labels by their IDs to maintain order
        self.labels_ordered = [id2label[i] for i in range(len(id2label))]

        # Define label to score mapping based on known models
        if self.model_name == "cardiffnlp/twitter-roberta-base-sentiment":
            # Labels: ['negative', 'neutral', 'positive']
            self.label_to_score = {
                'LABEL_0': 0.0, # negative
                'LABEL_1': 0.5, # neutral
                'LABEL_2': 1.0  # positive
            }
        elif self.model_name == "nlptown/bert-base-multilingual-uncased-sentiment":
            # Labels: ['1 star', '2 stars', '3 stars', '4 stars', '5 stars']
            self.label_to_score = {
                '1 star': 0.0,
                '2 stars': 0.25,
                '3 stars': 0.5,
                '4 stars': 0.75,
                '5 stars': 1.0
            }
        else:
            # For unknown models, assign scores evenly across labels
            num_labels = len(self.labels_ordered)
            self.label_to_score = {
                label: idx / (num_labels - 1) for idx, label in enumerate(self.labels_ordered)
            }

    def _split_into_chunks(self, text, max_length=512):
        """
        Split long texts into smaller chunks that do not exceed the token limit.

        Parameters:
        ----------
        text : str
            The input text to split.

        max_length : int
            Maximum token length allowed by the model.

        Returns:
        -------
        List[str]
            A list of text chunks within the token limit.
        """

        # Naively split on sentence delimiters
        sentences = re.split(r'(?<=[.!?]) +', text)
        chunks, current = [], ""

        for sent in sentences:
            # Check tokenized length with current buffer
            if len(self.tokenizer.encode(current + " " + sent, add_special_tokens=True)) <= max_length:
                current += " " + sent
            else:
                if current:
                    chunks.append(current.strip())
                current = sent
        if current:
            chunks.append(current.strip())

        return chunks

    def predict_proba(self, texts, max_length=512):
        """
        Compute the probability distribution over sentiment classes for one or more input texts.

        Parameters:
        ----------
        texts : List[str]
            List of text strings to analyze.

        max_length : int
            Maximum token length per chunk (default: 512).

        Returns:
        -------
        np.ndarray
            A 2D array of shape (len(texts), num_classes), where each row represents
            the predicted softmax probabilities for the corresponding input.
        """

        all_probs = []

        for text in texts:
            # Always chunk to avoid loss of info on long inputs
            chunks = self._split_into_chunks(text, max_length=max_length)
            chunk_probs = []

            for chunk in chunks:
                # Tokenize and infer sentiment per chunk
                inputs = self.tokenizer(chunk, return_tensors="pt", truncation=True,
                                        padding=True, max_length=max_length).to(self.device)
                with torch.no_grad():
                    outputs = self.model(**inputs)
                    probs = F.softmax(outputs.logits, dim=-1).squeeze().cpu().numpy()
                    chunk_probs.append(probs)

            # Average the probabilities over all chunks
            avg_probs = np.mean(chunk_probs, axis=0)
            all_probs.append(avg_probs)

        return np.array(all_probs)

    def predict_score(self, text):
        """
        Compute the continuous sentiment score for a single input text.

        Parameters:
        ----------
        text : str
            The input text to analyze.

        Returns:
        -------
        float
            The sentiment score in the range [0, 1].
        """

        probs = self.predict_proba([text])[0]
        score = sum(
            prob * self.label_to_score[label]
            for prob, label in zip(probs, self.labels_ordered)
        )
        return score


## KeyBERTSentimentAware Class: Sentiment-Integrated Keyword Extraction

This class extends the base KeyBERT model by integrating sentiment analysis directly into the keyword extraction pipeline. It enhances the traditional semantic-only approach by incorporating continuous sentiment polarity scores for both the entire document and each candidate keyword.

### Overview

- **Candidate Extraction:**  
  Uses `CountVectorizer` to extract a broad pool of candidate keywords (n-grams) from the document text.  
  **Note:** the section *Candidate Selection in KeyBERT vs Sentiment-Aware Extension* below describes how this pool is filtered.

- **Sentiment Analysis:**  
  Leverages a transformer-based sentiment classification model (via HuggingFace) to compute **continuous sentiment polarity scores** in the range \[0, 1\] for both the document and each candidate keyword.

- **Combined Scoring and Filtering:**  
  Computes a weighted score that balances:
  - **Semantic similarity** (cosine similarity between document and candidate embeddings).
  - **Sentiment alignment** (1 minus the absolute difference between candidate and document sentiment).

  The sentiment alignment score is then mapped to \[-1, 1\] to match the range of cosine similarity.  
  Candidates with low combined scores are **filtered out before ranking**, allowing sentiment to guide selection from the earliest stages.

  The trade-off is controlled by the `alpha` parameter:
  - `alpha = 1.0` → pure sentiment-based filtering and ranking.
  - `alpha = 0.0` → pure semantic similarity (equivalent to standard KeyBERT).

### Candidate Selection in KeyBERT vs Sentiment-Aware Extension

In standard KeyBERT, candidate keywords are extracted solely based on **surface-level frequency statistics** using `CountVectorizer`, without semantic or sentiment awareness. This means:

- The candidate pool may include terms that are **frequent but sentimentally mismatched** with the document.
- In strongly polarized reviews, irrelevant or misleading keywords may still be selected if they occur frequently.
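
To make the candidate stage concrete, here is a dependency-free approximation of n-gram extraction. It is a simplified stand-in for `CountVectorizer`, not its implementation: `ngram_candidates` and the tiny `STOP_WORDS` set are assumptions for illustration, and it omits the frequency-based `max_features` cap.

```python
import re

# Tiny illustrative stop-word list; scikit-learn's "english" list is much longer.
STOP_WORDS = {"the", "was", "with", "and", "of", "is"}


def ngram_candidates(doc, ngram_range=(1, 3)):
    """Return sorted unique n-grams built from the non-stop-word tokens of `doc`."""
    tokens = [t for t in re.findall(r"\b\w\w+\b", doc.lower()) if t not in STOP_WORDS]
    lo, hi = ngram_range
    grams = set()
    for n in range(lo, hi + 1):
        for i in range(len(tokens) - n + 1):
            grams.add(" ".join(tokens[i:i + n]))
    return sorted(grams)


doc = "The movie was fantastic with beautiful visuals and great acting."
for cand in ngram_candidates(doc):
    print(cand)
```

Stop words are dropped before the n-grams are formed, so a phrase such as "movie fantastic beautiful" can join words that were not adjacent in the original sentence. Nothing in this stage looks at meaning or sentiment, which is the gap described above.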

The sentiment-aware extension addresses this by introducing a **joint semantic-sentiment filter** immediately after candidate extraction:

1. Extract a wide candidate pool using `CountVectorizer`.
2. Compute continuous sentiment polarity scores for both the document and each candidate.
3. Calculate a combined score per candidate, balancing semantic relevance and emotional alignment.
4. Discard candidates whose combined score falls below a filtering threshold.

This process ensures that only keywords that are both **topically and emotionally relevant** progress to the final ranking stage.
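
Steps 2 to 4 can be sketched on precomputed values. The example below is a minimal illustration under stated assumptions: `filter_candidates` is a hypothetical helper, and the similarity and polarity numbers are invented rather than produced by real models. The defaults mirror the class defaults (`alpha=0.5`, threshold `0.4`).

```python
def filter_candidates(candidates, doc_polarity, alpha=0.5, threshold=0.4):
    """Keep candidates whose combined score reaches `threshold`.

    `candidates` maps each phrase to (semantic_similarity, polarity).
    """
    kept = []
    for phrase, (sim, pol) in candidates.items():
        align = 1.0 - abs(doc_polarity - pol)  # alignment in [0, 1]
        combined = (1.0 - alpha) * sim + alpha * (2.0 * align - 1.0)
        if combined >= threshold:
            kept.append((phrase, combined))
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Hypothetical precomputed values for a positive document (polarity 0.9)
toy = {
    "great acting": (0.70, 0.90),     # relevant and aligned
    "forced dialogue": (0.65, 0.10),  # relevant but opposed in sentiment
    "running time": (0.20, 0.50),     # weakly relevant, neutral
}
for phrase, score in filter_candidates(toy, doc_polarity=0.9):
    print(f"{phrase:20s} {score:.2f}")
```

With `alpha=0.5` only "great acting" survives the filter. Setting `alpha=0.0` reduces the score to semantic similarity alone, in which case "forced dialogue" passes as well.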

### Parameters

- `model`: The semantic embedding model (e.g., a `SentenceTransformer` from `sentence-transformers`).
- `sentiment_model_name`: HuggingFace identifier for the sentiment classifier (default: `cardiffnlp/twitter-roberta-base-sentiment`, a 3-class model).
- `alpha`: Weight controlling the influence of sentiment (0 = only semantic, 1 = only sentiment).
- `candidate_pool_size`: Maximum number of candidates extracted before filtering.
- `device`: Device to run the model (`"cpu"` or `"cuda"`).


In [None]:
# KeyBERT extension for sentiment-aware keyword extraction.
# This subclass of KeyBERT integrates sentiment alignment into candidate
# filtering and scoring, rather than applying it as a post-processing step.
# The goal is to keep and rank keywords that are both semantically relevant
# and emotionally aligned with the overall sentiment of the input review.
class KeyBERTSentimentAware(KB):
    """
    Extension of KeyBERT to integrate sentiment analysis in keyword extraction.

    This class overrides and extends parts of KeyBERT's pipeline to:
    - Extract a larger candidate pool using CountVectorizer.
    - Calculate sentiment polarity scores for the document and candidates,
      using a pretrained sentiment classification model with continuous outputs.
    - Combine semantic similarity and sentiment alignment scores via a weighting factor alpha.
    - Filter candidate keywords based on this combined score before final ranking.

    Parameters:
    -----------
    model : SentenceTransformer
        Semantic embedding model used by KeyBERT.

    sentiment_model_name : str, optional (default: "cardiffnlp/twitter-roberta-base-sentiment")
        Identifier of pretrained sentiment model on HuggingFace Hub.

    alpha : float, optional (default: 0.5)
        Weight to balance sentiment alignment vs semantic similarity.
        alpha=1.0 means only sentiment alignment is considered.
        alpha=0.0 means only semantic similarity is considered.

    candidate_pool_size : int, optional (default: 100)
        Maximum number of initial candidate keywords to extract.

    device : str, optional (default: "cpu")
        Device to run embedding and sentiment models on ("cpu" or "cuda").
    """

    def __init__(
        self,
        model,
        sentiment_model_name: str ="cardiffnlp/twitter-roberta-base-sentiment", # or "nlptown/bert-base-multilingual-uncased-sentiment"
        alpha: float = 0.5,
        candidate_pool_size: int = 100,
        device: str = "cpu",
    ):
        # Validate that the specified device is either 'cpu' or 'cuda'
        valid_devices = {"cpu", "cuda"}
        if device not in valid_devices:
            raise ValueError(f"Device must be one of {valid_devices}.")
        
        # Check CUDA availability if 'cuda' is requested
        if device == "cuda" and not torch.cuda.is_available():
            raise ValueError("CUDA is not available. Please use 'cpu' instead.")

        # Validate input types to ensure correct usage
        if not isinstance(model, SentenceTransformer):
            raise TypeError("model must be an instance of SentenceTransformer.")
        if not isinstance(sentiment_model_name, str):
            raise TypeError("sentiment_model_name must be a string.")
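        # Accept ints such as 0 or 1 as well as floats, but reject booleans
        if isinstance(alpha, bool) or not isinstance(alpha, (int, float)):
            raise TypeError("alpha must be a number (int or float).")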
        if not isinstance(candidate_pool_size, int):
            raise TypeError("candidate_pool_size must be an integer.")

        # Validate value ranges to prevent logical errors
        if not (0.0 <= alpha <= 1.0):
            raise ValueError("alpha must be between 0 and 1 inclusive.")
        if candidate_pool_size <= 0:
            raise ValueError("candidate_pool_size must be a positive integer.")

        # Initialize the superclass (KeyBERT) with the semantic embedding model
        super().__init__(model)

        # Assign validated parameters to instance variables
        self._alpha = None
        self.alpha = alpha
        self.candidate_pool_size = candidate_pool_size
        self.device = device

        # Store the semantic embedding model for embedding computation
        self.embedder = model

        # Initialize the sentiment model wrapper with the given model name and device
        self.sentiment_model = SentimentModel(sentiment_model_name, device=device)

    @property
    def alpha(self):
        return self._alpha

    @alpha.setter
    def alpha(self, value):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"alpha must be in [0, 1]. Got {value}")
        self._alpha = value

    def _get_doc_polarity_continuous(self, doc: str) -> float:
        """
        Compute the document's continuous sentiment polarity score as the weighted sum of
        predicted class probabilities multiplied by their numeric mappings.

        The base KeyBERT class has no sentiment handling; this helper adds it.

        Parameters:
        -----------
        doc : str
            The document text.

        Returns:
        --------
        float
            Continuous sentiment polarity score between 0 (very negative) and 1 (very positive).
        """
        # Get probability distribution over sentiment classes for the document
        probs = self.sentiment_model.predict_proba([doc])[0]

        # Compute continuous polarity as weighted average of class scores
        polarity = sum(
            p * self.sentiment_model.label_to_score[label]
            for p, label in zip(probs, self.sentiment_model.labels_ordered)
        )
        return polarity

    def _get_candidate_polarities(self, candidates) -> np.ndarray:
        """
        Compute continuous sentiment polarity scores for each candidate keyword.

        This helper adds sentiment scoring for candidates; it does not override a KeyBERT method.

        Parameters:
        -----------
        candidates : iterable of str
            List of candidate keywords.

        Returns:
        --------
        np.ndarray
            Array of polarity scores for each candidate keyword.
        """
        candidates = list(candidates)  # ensure correct input format for tokenizer
        
        # Batch predict probabilities for all candidates
        probs_list = self.sentiment_model.predict_proba(candidates)
        
        polarities = []
        for probs in probs_list:
            # Weighted average as continuous polarity score
            polarity = sum(
                p * self.sentiment_model.label_to_score[label]
                for p, label in zip(probs, self.sentiment_model.labels_ordered)
            )
            polarities.append(polarity)
        return np.array(polarities)

    def _select_candidates(
        self, 
        doc: str, 
        ngram_range: Tuple[int, int] = (1, 3), 
        threshold: float = 0.4
    ):
        """
        Extract initial candidates with CountVectorizer and filter them based on combined
        semantic similarity and sentiment alignment scores.

        This method replaces the default candidate generation and filtering steps of KeyBERT,
        incorporating sentiment filtering before final keyword ranking.

        Parameters:
        -----------
        doc : str
            Document text.

        ngram_range : tuple of int
            N-gram size range for candidate extraction.

        threshold : float
            Minimum combined score for candidate retention.

        Returns:
        --------
        list of str
            Filtered list of candidate keywords.
        """
        # Extract candidates with CountVectorizer (statistical n-grams)
        vectorizer = CountVectorizer(
            ngram_range=ngram_range,
            stop_words='english',
            max_features=self.candidate_pool_size
        )
        candidates = vectorizer.fit([doc]).get_feature_names_out()

        # Compute semantic embeddings for doc and candidates
        doc_emb = self.model.embed([doc])
        cand_emb = self.model.embed(candidates)

        # Compute continuous sentiment polarity scores
        doc_pol = self._get_doc_polarity_continuous(doc)
        cand_pols = self._get_candidate_polarities(candidates)

        # Calculate cosine semantic similarity scores
        sim_scores = cosine_similarity(doc_emb, cand_emb)[0]

        # Calculate sentiment alignment scores
        sentiment_scores = 1 - np.abs(cand_pols - doc_pol)
        sentiment_scores_mapped = 2 * sentiment_scores - 1

        # Combine semantic and sentiment scores with alpha weighting
        combined_scores = self.alpha * sentiment_scores_mapped + (1 - self.alpha) * sim_scores

        # Filter candidates that meet threshold on combined score
        filtered_candidates = [c for c, s in zip(candidates, combined_scores) if s >= threshold]

        return filtered_candidates

    def extract_keywords(
        self,
        doc: str,
        top_n: int = 5,
        candidate_threshold: float = 0.4,
        keyphrase_ngram_range: Tuple[int, int] = (1, 3),
        print_doc_polarity: bool = False,
    ):
        """
        Extract top keywords from a document by combining semantic similarity and sentiment alignment.

        This method overrides the `extract_keywords` method from KeyBERT base class,
        adding sentiment-aware candidate filtering and scoring.

        Parameters:
        -----------
        doc : str
            Input document text.

        top_n : int
            Number of keywords to return.

        candidate_threshold : float
            Threshold score to filter candidate keywords.

        keyphrase_ngram_range : tuple of int
            N-gram range for candidate keyword extraction.

        print_doc_polarity : bool
            Whether to print the document's sentiment polarity score.

        Returns
        -------
        list of tuples
            List of (keyword, score, keyword_sentiment) tuples sorted by descending combined score.

        """

        # Select candidates filtered by combined semantic+sentiment scoring
        candidates = self._select_candidates(
            doc,
            ngram_range=keyphrase_ngram_range,
            threshold=candidate_threshold
        )
        if not candidates:
            print("No candidates passed the sentiment-semantic filter.")
            return []

        # Compute semantic embeddings for document and filtered candidates
        doc_emb = self.model.embed([doc])
        cand_emb = self.model.embed(candidates)

        # Compute continuous sentiment polarity for the document
        doc_pol = self._get_doc_polarity_continuous(doc)

        # Print document polarity if requested
        if print_doc_polarity:
            # Scale polarity from [0,1] to [0,10]
            scaled_pol = doc_pol * 10

            # Determine polarity label with neutral zone between 5.5 and 6.5 on 0-10 scale
            if scaled_pol < 5.5:
                polarity_label = "Negative"
            elif scaled_pol > 6.5:
                polarity_label = "Positive"
            else:
                polarity_label = "Neutral"

            print(f"\n=== Document Polarity Score: {scaled_pol:.2f} ({polarity_label}) ===\n")

        # Compute sentiment polarities for candidates
        cand_pols = self._get_candidate_polarities(candidates)

        # Calculate cosine semantic similarity scores (range [-1,1])
        sim_scores = cosine_similarity(doc_emb, cand_emb)[0]

        # Calculate sentiment alignment scores in [0,1] and map to [-1,1]
        sentiment_scores = 1 - np.abs(cand_pols - doc_pol)
        sentiment_scores_mapped = 2 * sentiment_scores - 1

        # Final combined score in [-1, 1], weighted by alpha in [0, 1]
        final_scores = self.alpha * sentiment_scores_mapped + (1 - self.alpha) * sim_scores

        # Select top_n keywords sorted by combined score descending
        top_indices = np.argsort(final_scores)[-top_n:][::-1]

        return [(candidates[i], final_scores[i], cand_pols[i]) for i in top_indices]


# Tests

## Test 1: Basic Keyword Extraction with Sentiment-Aware KeyBERT

This first test demonstrates the basic usage of the `KeyBERTSentimentAware` class on a simple document. We extract the top keywords combining semantic similarity and sentiment alignment with default parameter settings.

**Objectives:**
- Verify that the class instantiates correctly.
- Check that keywords are extracted without errors.
- Observe the impact of sentiment-aware ranking on keyword scores.

We use a short, clearly positive sentence to observe how sentiment affects the keyword selection.

In [10]:
from sentence_transformers import SentenceTransformer

# Initialize the embedding model 
embedding_model = SentenceTransformer("all-MiniLM-L6-v2")

# Initialize Sentiment-Aware KeyBERT
kw_model = KeyBERTSentimentAware(
                        model=embedding_model, 
                        alpha=0.5,
                        sentiment_model_name="cardiffnlp/twitter-roberta-base-sentiment")

# Sample review
doc = "The movie was fantastic with beautiful visuals and great acting."

# Extract top 5 keywords with polarity info
keywords = kw_model.extract_keywords(doc, top_n=5, print_doc_polarity=True)

# Display results
print("Extracted Keywords, Scores, and Sentiment Polarity:\n")
print(f"{'Keyword':30s} {'Score':>10s} {'Polarity':>12s}")
print("-" * 55)
for kw, score, pol in keywords:
    print(f"{kw:30s} {score:10.4f} {pol:12.3f}")



=== Document Polarity Score: 9.94 (Positive) ===

Extracted Keywords, Scores, and Sentiment Polarity:

Keyword                             Score     Polarity
-------------------------------------------------------
movie fantastic beautiful          0.8259        0.978
movie fantastic                    0.7670        0.945
great acting                       0.7512        0.914
visuals great acting               0.7383        0.915
beautiful visuals great            0.7343        0.989


## Test 2: Comparing Sentiment-Aware KeyBERT with Base KeyBERT

In this test, we compare the keywords extracted by the sentiment-aware extension with those from the original KeyBERT model that relies solely on semantic similarity.

**Objectives:**
- Highlight differences in keyword selection between semantic-only and sentiment-aware approaches.
- Understand the effect of integrating sentiment on keyword ranking.
- Use the same document for a fair comparison.

We use a document containing both positive and negative elements to see how sentiment influences the extraction.

In [14]:
from sentence_transformers import SentenceTransformer
from keybert import KeyBERT as KB

# Load embedding model
embedding_model = SentenceTransformer("all-MiniLM-L6-v2")

# Initialize models
kw_base = KB(model=embedding_model)
kw_sentiment = KeyBERTSentimentAware(
    model=embedding_model,
    alpha=0.5,
    sentiment_model_name="cardiffnlp/twitter-roberta-base-sentiment"
)

# Mixed sentiment document
doc_mixed = (
    "The film offers a solid cast and visually impressive scenes, "
    "although the dialogue often feels forced and some transitions lack coherence. "
    "While the soundtrack complements the tone effectively, the overall narrative structure "
    "is conventional and doesn't take many risks. It's a competent production with moments of brilliance, "
    "but also several missed opportunities."
)

# Extract keywords — Base KeyBERT
base_keywords = kw_base.extract_keywords(doc_mixed, top_n=5, keyphrase_ngram_range=(1, 3))

print("BASE KeyBERT Keywords:\n")
print(f"{'Keyword':30s} {'Score':>10s}")
print("-" * 42)
for kw, score in base_keywords:
    print(f"{kw:30s} {score:10.4f}")

# Extract keywords — Sentiment-Aware KeyBERT
sentiment_keywords = kw_sentiment.extract_keywords(
    doc_mixed,
    top_n=5,
    keyphrase_ngram_range=(1, 3),
    print_doc_polarity=True
)

print("Sentiment-Aware KeyBERT Keywords:\n")
print(f"{'Keyword':30s} {'Score':>10s} {'Polarity':>12s}")
print("-" * 58)
for kw, score, polarity in sentiment_keywords:
    print(f"{kw:30s} {score:10.4f} {polarity:12.3f}")


BASE KeyBERT Keywords:

Keyword                             Score
------------------------------------------
soundtrack complements tone        0.6131
impressive scenes dialogue         0.5666
scenes dialogue feels              0.5444
soundtrack complements             0.5318
coherence soundtrack complements     0.5222

=== Document Polarity Score: 8.51 (Positive) ===

Sentiment-Aware KeyBERT Keywords:

Keyword                             Score     Polarity
----------------------------------------------------------
impressive scenes dialogue         0.7602        0.874
soundtrack complements tone        0.7347        0.779
film offers solid                  0.7209        0.874
cast visually impressive           0.7045        0.862
visually impressive scenes         0.6739        0.914
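
The two result tables can also be compared programmatically. The sketch below hardcodes the keywords printed above instead of re-running either model, so it only illustrates the comparison step; the variable names `shared`, `only_base`, and `only_sent` are introduced here for illustration.

```python
# Keywords copied from the printed results above (not re-computed here).
base_keywords = [
    ("soundtrack complements tone", 0.6131),
    ("impressive scenes dialogue", 0.5666),
    ("scenes dialogue feels", 0.5444),
    ("soundtrack complements", 0.5318),
    ("coherence soundtrack complements", 0.5222),
]
sentiment_keywords = [
    ("impressive scenes dialogue", 0.7602, 0.874),
    ("soundtrack complements tone", 0.7347, 0.779),
    ("film offers solid", 0.7209, 0.874),
    ("cast visually impressive", 0.7045, 0.862),
    ("visually impressive scenes", 0.6739, 0.914),
]

# Reduce each result list to its set of keyword strings.
base_set = {kw for kw, _ in base_keywords}
sent_set = {kw for kw, *_ in sentiment_keywords}

shared = sorted(base_set & sent_set)      # selected by both models
only_base = sorted(base_set - sent_set)   # dropped by the sentiment-aware model
only_sent = sorted(sent_set - base_set)   # introduced by the sentiment-aware model

print("Shared:", shared)
print("Only base:", only_base)
print("Only sentiment-aware:", only_sent)
```

Two of the five keywords are shared. The sentiment-aware model drops phrases such as `scenes dialogue feels` and adds phrases with polarity above 0.86, such as `film offers solid` and `visually impressive scenes`.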


## Test 3: Candidate Filtering with Different Thresholds

In this test, we explore how varying the candidate filtering threshold affects the pool of candidate keywords before the final ranking.

**Objectives:**
- Understand the impact of the `candidate_threshold` parameter on candidate selection.
- Observe how stricter thresholds reduce candidate pool size and potentially increase keyword relevance.
- Use a moderately complex document with mixed sentiment.

This test highlights the balance between recall and precision in candidate filtering.
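
To make the effect of the threshold concrete, here is a minimal sketch of alignment-based filtering. It assumes that a candidate is kept when its alignment `1 - |s_doc - s_cand|` is at least the threshold; the actual rule inside `_select_candidates` may differ. The function name `filter_by_alignment` and all polarity values are hypothetical and chosen only for illustration.

```python
def filter_by_alignment(doc_polarity, candidate_polarities, threshold):
    """Keep candidates whose alignment with the document meets the threshold.

    Assumption: alignment = 1 - |s_doc - s_cand|, with polarities in [0, 1].
    """
    kept = []
    for candidate, polarity in candidate_polarities.items():
        alignment = 1.0 - abs(doc_polarity - polarity)
        if alignment >= threshold:
            kept.append(candidate)
    return kept


# Illustrative polarities, NOT outputs of the sentiment model.
doc_polarity = 0.30
candidate_polarities = {
    "beautiful cinematography": 0.95,  # alignment 0.35
    "strong performances": 0.85,       # alignment 0.45
    "cinematography": 0.55,            # alignment 0.75
    "storyline convoluted": 0.15,      # alignment 0.85
}

for thresh in (0.0, 0.4, 0.8):
    kept = filter_by_alignment(doc_polarity, candidate_polarities, thresh)
    print(f"threshold={thresh}: {len(kept)} kept -> {kept}")
```

Under this assumption, raising the threshold first removes the candidates whose polarity is furthest from the document's, which is the recall/precision trade-off this test examines.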

In [16]:
from sentence_transformers import SentenceTransformer

# Initialize the semantic embedding model
embedding_model = SentenceTransformer("all-MiniLM-L6-v2")

# Initialize the sentiment-aware KeyBERT with alpha=0.5
kw_model = KeyBERTSentimentAware(
    model=embedding_model,
    alpha=0.5,
    sentiment_model_name="cardiffnlp/twitter-roberta-base-sentiment"
)


doc = (
    "Despite the beautiful cinematography and strong performances, "
    "the storyline was convoluted and difficult to follow at times."
)

thresholds = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]

for thresh in thresholds:
    print(f"\nCandidate Threshold: {thresh}")
    candidates = kw_model._select_candidates(doc, threshold=thresh)
    print(f"Number of candidates after filtering: {len(candidates)}")
    print("Candidates:", candidates)


Candidate Threshold: 0.0
Number of candidates after filtering: 26
Candidates: ['beautiful cinematography', 'beautiful cinematography strong', 'cinematography', 'cinematography strong', 'cinematography strong performances', 'convoluted', 'convoluted difficult', 'convoluted difficult follow', 'despite', 'despite beautiful', 'despite beautiful cinematography', 'difficult', 'difficult follow', 'difficult follow times', 'follow', 'follow times', 'performances', 'performances storyline', 'performances storyline convoluted', 'storyline', 'storyline convoluted', 'storyline convoluted difficult', 'strong', 'strong performances', 'strong performances storyline', 'times']

Candidate Threshold: 0.2
Number of candidates after filtering: 21
Candidates: ['cinematography', 'cinematography strong', 'cinematography strong performances', 'convoluted', 'convoluted difficult', 'convoluted difficult follow', 'despite', 'despite beautiful cinematography', 'difficult', 'difficult follow', 'difficult follow t

## Test 4: Impact of Sentiment Weighting (`alpha`) on Keyword Ranking

This test examines how the weighting parameter `alpha` shifts the balance between semantic similarity and sentiment alignment in keyword scoring.

**Objectives:**
- Observe differences in extracted keywords when prioritizing sentiment vs. semantic relevance.
- Understand the flexibility of the model in adapting to different use cases by tuning `alpha`.
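- Use reviews with different tones (neutral, positive, and negative) to highlight the effect.

We test five valid values of `alpha`, from 0.0 (semantic only) through 0.5 (balanced) to 1.0 (sentiment only), plus two out-of-range values (-0.2 and 1.2) to confirm that invalid settings are rejected.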
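
As a sanity check, the scoring formula from the theoretical framework can be evaluated by hand. The sketch below is not the class's implementation: `validate_alpha` and `final_score` are hypothetical helpers. The inputs are read from this test's printed results for `film extraordinary` (score 0.6329 at `alpha = 0.0`, polarity 0.807), under the assumption that the printed document polarity of 9.90 corresponds to $s_{doc} = 0.990$ on the $[0, 1]$ scale.

```python
def validate_alpha(alpha):
    """Reject alpha values outside [0, 1] (mirrors the ValueError the test expects)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError(f"alpha must be in [0, 1], got {alpha}")
    return alpha


def final_score(sim_sem, s_doc, s_cand, alpha):
    """score = (1 - alpha) * sim_sem + alpha * (2 * align - 1)."""
    validate_alpha(alpha)
    align = 1.0 - abs(s_doc - s_cand)  # sentiment alignment in [0, 1]
    return (1.0 - alpha) * sim_sem + alpha * (2.0 * align - 1.0)


# Values read from the printed results for "film extraordinary".
sim_sem = 0.6329   # score at alpha = 0.0, i.e. the semantic similarity
s_doc = 0.990      # assumed: printed "9.90" rescaled to [0, 1]
s_cand = 0.807     # printed keyword polarity

for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"alpha={alpha:.2f} -> score={final_score(sim_sem, s_doc, s_cand, alpha):.4f}")

# Out-of-range values are rejected before any scoring happens.
for bad_alpha in (-0.2, 1.2):
    try:
        final_score(sim_sem, s_doc, s_cand, bad_alpha)
    except ValueError as err:
        print(f"Error for alpha = {bad_alpha}: {err}")
```

With these inputs, `alpha = 0.25` gives approximately 0.6332, which matches the score printed for `film extraordinary` at that setting.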

In [18]:
from sentence_transformers import SentenceTransformer

# Load the embedding model
embedding_model = SentenceTransformer("all-MiniLM-L6-v2")

# Initialize Sentiment-Aware KeyBERT
kw_model = KeyBERTSentimentAware(
    model=embedding_model,
    alpha=0.5,
    sentiment_model_name="cardiffnlp/twitter-roberta-base-sentiment"
)


# Sample reviews
neutral_rev = (
    "The film features competent acting and decent production values, "
    "but lacks any standout elements. While the visuals are polished and the plot flows logically, "
    "the characters remain underdeveloped and the emotional impact is minimal. "
    "It’s a watchable experience, neither particularly memorable nor offensive."
)

positive_rev = (
    "This film was an extraordinary blend of artistry and emotion. "
    "The performances were deeply moving, the visuals were stunning, and the direction masterful. "
    "Every scene felt purposeful and beautifully crafted. The story resonated on a profound level, "
    "leaving the audience inspired and emotionally fulfilled."
)

negative_rev = (
    "The movie was a frustrating mess of clichés and poor execution. "
    "The acting felt robotic, the plot was incoherent, and the pacing dragged endlessly. "
    "Even the soundtrack, which could have been a redeeming factor, was repetitive and uninspired. "
    "By the end, it felt like a complete waste of time."
)

# Choose review for testing
chosen_rev = positive_rev  # change to neutral_rev or negative_rev to test other tones

# Test different alpha values (including invalid ones)
weights = [0.0, 0.25, 0.5, 0.75, 1.0, -0.2, 1.2]

# Run test
for alpha in weights:
    try:
        print(f"Testing alpha = {alpha}")
        kw_model.alpha = alpha  # will raise ValueError if invalid

        keywords = kw_model.extract_keywords(
            chosen_rev,
            top_n=5,
            keyphrase_ngram_range=(1, 2),
            print_doc_polarity=True
        )

        print(f"{'Keyword':30s} {'Score':>10s} {'Polarity':>12s}")
        print("-" * 55)
        for kw, score, pol in keywords:
            print(f"{kw:30s} {score:10.4f} {pol:12.3f}")

    except Exception as e:
        print(f"Error for alpha = {alpha}: {e}")

Testing alpha = 0.0

=== Document Polarity Score: 9.90 (Positive) ===

Keyword                             Score     Polarity
-------------------------------------------------------
film extraordinary                 0.6329        0.807
scene felt                         0.5675        0.483
masterful scene                    0.5253        0.825
emotion performances               0.5084        0.517
performances deeply                0.5039        0.684
Testing alpha = 0.25

=== Document Polarity Score: 9.90 (Positive) ===

Keyword                             Score     Polarity
-------------------------------------------------------
film extraordinary                 0.6332        0.807
visuals stunning                   0.5881        0.961
masterful scene                    0.5616        0.825
stunning direction                 0.5161        0.888
story resonated                    0.5120        0.837
Testing alpha = 0.5

=== Document Polarity Score: 9.90 (Positive) ===

Keyword       

## Test 5: Comparing Keyword Extraction Between Base KeyBERT and Sentiment-Aware KeyBERT

This test compares the standard `KeyBERT` model with the `KeyBERTSentimentAware` extension on a small set of reviews with varying sentiment. The goal is to assess how sentiment integration influences the selection and ranking of keywords.

**Objectives:**
- Compute sentiment polarity for each review and observe the average emotional tone across the dataset.
- Extract top keywords using both the standard KeyBERT and the sentiment-aware model.
- Rank keywords based on their average score and frequency of occurrence across reviews.
- Apply semantic diversity filtering to ensure that selected keywords are not too similar to each other.
- Compare the final top 5 keywords from both models, and observe how sentiment-aware keywords differ in alignment and relevance.

This test demonstrates how sentiment-aware filtering can shift the focus of keyword extraction, promoting emotionally consistent terms and reducing the prominence of keywords that conflict with the document’s tone.

In [21]:
from collections import defaultdict
from sentence_transformers import SentenceTransformer
from keybert import KeyBERT as KB
import numpy as np

# Load the embedding model
embedding_model = SentenceTransformer("all-MiniLM-L6-v2")

# Initialize models: base and sentiment-aware
kw_base = KB(model=embedding_model)
kw_sentiment = KeyBERTSentimentAware(
    model=embedding_model,
    alpha=0.5,
    sentiment_model_name="nlptown/bert-base-multilingual-uncased-sentiment"
)

# Define sample reviews
reviews = [
    """This film is a stunning achievement in storytelling, with unforgettable characters and a gripping plot.
    The visuals are breathtaking, and the soundtrack perfectly complements every scene.""",

    """I was completely captivated from start to finish. The performances were heartfelt,
    and the director’s vision shines through in every frame. A truly inspiring movie experience.""",

    """A beautifully crafted film with rich emotional depth. The screenplay is tight,
    and the cinematography creates an immersive atmosphere that kept me hooked.""",

    """One of the best films I've seen in years. It combines a compelling narrative with excellent acting,
    making it both entertaining and thought-provoking.""",

    """An absolute masterpiece! Every element, from the score to the visual effects,
    contributes to a powerful and uplifting cinematic journey.""",

    """An utter disaster of a film. The plot was incoherent, characters were completely flat, and the 
    pacing was excruciatingly slow. Dialogue felt forced and unnatural throughout. 
    I struggled to stay awake — a total waste of time. I hate this film in all its aspects"""
]

# Helper: Convert polarity score into a label
def polarity_label(p):
    if p < 0.4:
        return "Negative"
    elif p > 0.6:
        return "Positive"
    else:
        return "Neutral"

# Compute and print sentiment polarity for each review
print("Sentiment polarity per review (0 = negative, 1 = positive):\n")
polarities = []
for i, review in enumerate(reviews, 1):
    pol = kw_sentiment._get_doc_polarity_continuous(review)
    polarities.append(pol)
    print(f" Review #{i}: Polarity = {pol:.3f} ({polarity_label(pol)})")

mean_polarity = np.mean(polarities)
print(f"\nAverage sentiment polarity across all reviews: {mean_polarity:.3f} ({polarity_label(mean_polarity)})")

# Accumulate scores and polarities per keyword across reviews
def accumulate_keywords(model, reviews):
    scores = defaultdict(float)
    counts = defaultdict(int)
    polarities = defaultdict(list)

    for review in reviews:
        kws = model.extract_keywords(review, top_n=7, keyphrase_ngram_range=(1, 3))

        for result in kws:
            if isinstance(result, tuple) and len(result) == 3:
                kw, score, polarity = result
                polarities[kw].append(polarity)
            elif isinstance(result, tuple) and len(result) == 2:
                kw, score = result
                polarity = float('nan')
            else:
                continue

            scores[kw] += score
            counts[kw] += 1

    return scores, counts, polarities

# Extract data from both models
base_scores, base_counts, base_pols = accumulate_keywords(kw_base, reviews)
sent_scores, sent_counts, sent_pols = accumulate_keywords(kw_sentiment, reviews)

# Compute average score and average polarity per keyword
def compute_averages(scores, counts, polarities):
    avg_scores = {kw: scores[kw] / counts[kw] for kw in scores}
    avg_pols = {kw: np.mean(polarities[kw]) for kw in polarities} if polarities else {}
    return avg_scores, avg_pols

base_avg_scores, base_avg_pols = compute_averages(base_scores, base_counts, base_pols)
sent_avg_scores, sent_avg_pols = compute_averages(sent_scores, sent_counts, sent_pols)

# Rank keywords using score + normalized frequency
def rank_keywords(avg_scores, counts, alpha=0.5):
    max_count = max(counts.values()) if counts else 1
    ranked = []
    for kw in avg_scores:
        freq_norm = counts[kw] / max_count
        combined_score = alpha * avg_scores[kw] + (1 - alpha) * freq_norm
        ranked.append((kw, combined_score))
    ranked.sort(key=lambda x: x[1], reverse=True)
    return ranked

base_ranked = rank_keywords(base_avg_scores, base_counts)
sent_ranked = rank_keywords(sent_avg_scores, sent_counts)

# Embed keyword phrases and normalize embeddings
def embed_keywords(keywords):
    return embedding_model.encode(keywords, convert_to_tensor=True, normalize_embeddings=True).cpu().numpy()

# Select top N semantically diverse keywords
def select_diverse_keywords(ranked_list, avg_scores, avg_pols, top_n=5, similarity_threshold=0.7):
    selected = []
    selected_embs = []

    all_keywords = [kw for kw, _ in ranked_list]
    all_embs = embed_keywords(all_keywords)

    for i, (kw, _) in enumerate(ranked_list):
        if not selected:
            selected.append((kw, avg_scores[kw], avg_pols.get(kw, float('nan'))))
            selected_embs.append(all_embs[i])
        else:
            emb = all_embs[i]
            sims = [np.dot(emb, se) for se in selected_embs]
            if max(sims) < similarity_threshold:
                selected.append((kw, avg_scores[kw], avg_pols.get(kw, float('nan'))))
                selected_embs.append(emb)
        if len(selected) >= top_n:
            break
    return selected

# Final selection
top_base = select_diverse_keywords(base_ranked, base_avg_scores, base_avg_pols, top_n=5)
top_sent = select_diverse_keywords(sent_ranked, sent_avg_scores, sent_avg_pols, top_n=5)

# Print final results
def print_topics(title, topics, show_polarity=True, flop=False):
    if flop:
        print(f"\n{title} (Flop {len(topics)}):")
        topics_to_print = reversed(topics)
    else:
        print(f"\n{title} (Top {len(topics)}):")
        topics_to_print = topics

    header = f"{'Keyword':40s} {'Avg Score':>10s}"
    if show_polarity:
        header += f" {'Avg Polarity':>14s}"
    print(header)
    print("-" * len(header))

    for kw, score, pol in topics_to_print:
        line = f"{kw:40s} {score:10.4f}"
        if show_polarity:
            line += f" {pol:14.3f}"
        print(line)

# Display both models’ top keywords
print_topics("Top 5 Diverse Keywords - Base KeyBERT", top_base, show_polarity=False)
print_topics("Top 5 Diverse Keywords - Sentiment-Aware KeyBERT", top_sent, show_polarity=True)


Sentiment polarity per review (0 = negative, 1 = positive):

 Review #1: Polarity = 0.957 (Positive)
 Review #2: Polarity = 0.983 (Positive)
 Review #3: Polarity = 0.877 (Positive)
 Review #4: Polarity = 0.980 (Positive)
 Review #5: Polarity = 0.993 (Positive)
 Review #6: Polarity = 0.022 (Negative)

Average sentiment polarity across all reviews: 0.802 (Positive)

Top 5 Diverse Keywords - Base KeyBERT (Top 5):
Keyword                                   Avg Score
---------------------------------------------------
emotional depth screenplay                   0.6815
uplifting cinematic journey                  0.6554
absolute masterpiece                         0.6435
narrative excellent acting                   0.6363
beautifully crafted film                     0.6284

Top 5 Diverse Keywords - Sentiment-Aware KeyBERT (Top 5):
Keyword                                   Avg Score   Avg Polarity
------------------------------------------------------------------
absolute masterpiece         

In [22]:
# Select flop 5 keywords (lowest ranked) for each model
flop_base = base_ranked[-5:]
flop_sent = sent_ranked[-5:]

# Disabling diversity filtering for flop: simply take the flop keywords as is
flop_base_diverse = [(kw, base_avg_scores[kw], base_avg_pols.get(kw, float('nan'))) for kw, _ in flop_base]
flop_sent_diverse = [(kw, sent_avg_scores[kw], sent_avg_pols.get(kw, float('nan'))) for kw, _ in flop_sent]

# Print flop keywords
print_topics("Flop 5 Keywords - Base KeyBERT", flop_base_diverse, show_polarity=False, flop=True)
print_topics("Flop 5 Keywords - Sentiment-Aware KeyBERT", flop_sent_diverse, show_polarity=True, flop=True)



Flop 5 Keywords - Base KeyBERT (Flop 5):
Keyword                                   Avg Score
---------------------------------------------------
combines compelling narrative                0.4711
dialogue felt forced                         0.4803
slow dialogue felt                           0.5029
heartfelt director                           0.5045
director vision shines                       0.5103

Flop 5 Keywords - Sentiment-Aware KeyBERT (Flop 5):
Keyword                                   Avg Score   Avg Polarity
------------------------------------------------------------------
unnatural struggled stay                     0.6130          0.095
truly inspiring                              0.6159          0.972
unnatural struggled                          0.6184          0.085
narrative excellent                          0.6246          0.875
utter disaster                               0.6258          0.066


### Summary of Results

The comparison highlights key differences between the base KeyBERT and the sentiment-aware extension:

- **KeyBERT** selects keywords based purely on semantic similarity, often favoring neutral or generic terms that may not reflect the review's tone.
- **Sentiment-aware KeyBERT** integrates sentiment polarity during both filtering and scoring, surfacing keywords that are emotionally aligned with each individual review.

This results in:
- Keywords whose polarity tracks the tone of the review they come from: in the Flop 5 table, `utter disaster` (polarity 0.066) comes from the negative review, while `truly inspiring` (0.972) comes from a positive one.
- Penalization of keywords whose sentiment conflicts with that of their source review.
- Improved interpretability for applications where sentiment is important, such as opinion mining or customer feedback analysis.

In these tests, incorporating sentiment shifts keyword extraction toward terms that are both semantically relevant and consistent with each review's tone.
