# KeyBERT with Sentiment-aware Embedding Fusion

This notebook introduces a **sentiment-aware extension** of the KeyBERT keyword extraction model, which integrates sentiment information directly into the candidate selection and ranking process. Unlike simple post-hoc reranking approaches, this method incorporates sentiment consistency during both candidate filtering and final keyword scoring.

### Theoretical Approach

Traditional KeyBERT ranks candidate keywords purely by semantic similarity between the document embedding and candidate embeddings. This extension enhances the process by considering the **emotional coherence** between candidates and the document, operationalized as continuous sentiment polarity scores.

Given:
-  $\text{sim}_{sem}$ : cosine similarity between document and candidate embeddings,
-  $s_{doc} \in [0,1]$: continuous sentiment polarity score of the document,
-  $s_{cand} \in [0,1]$: continuous sentiment polarity score of a candidate keyword,

we define the **sentiment alignment score** as:

$$
\text{align}(s_{doc}, s_{cand}) = 1 - |s_{doc} - s_{cand}|
$$

which equals 1 for perfect polarity match and decreases linearly to 0 for maximal polarity difference.
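As a quick numerical check of this definition, the alignment score can be computed directly. This is a minimal sketch; the function name `sentiment_alignment` is illustrative and is not part of the classes defined in this notebook.

```python
def sentiment_alignment(s_doc: float, s_cand: float) -> float:
    """Return 1 - |s_doc - s_cand| for polarity scores in [0, 1]."""
    if not (0.0 <= s_doc <= 1.0 and 0.0 <= s_cand <= 1.0):
        raise ValueError("polarity scores must lie in [0, 1]")
    return 1.0 - abs(s_doc - s_cand)


# Identical polarities give the maximum alignment.
print(sentiment_alignment(0.75, 0.75))  # 1.0
# Opposite extremes give the minimum alignment.
print(sentiment_alignment(0.0, 1.0))    # 0.0
# A moderate gap lowers the alignment linearly.
print(sentiment_alignment(0.5, 0.25))   # 0.75
```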

The overall combined score used to filter and rank candidates is:

$$
\text{score}_{final} = w_{sentiment} \times \text{align}(s_{doc}, s_{cand}) + (1 - w_{sentiment}) \times \text{sim}_{sem}
$$

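where $w_{sentiment} \in [0,1]$ is a tunable weight balancing sentiment alignment and semantic similarity. In the implementation below, this score is used as-is for candidate filtering; for the final ranking, the alignment term is first rescaled from $[0,1]$ to $[-1,1]$ (as $2 \cdot \text{align} - 1$) so that it shares the range of cosine similarity.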
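The weighted combination can be sketched in a few lines. The function name `combined_score` and the input values are assumptions chosen for illustration only.

```python
def combined_score(sim_sem: float, align: float, w_sentiment: float) -> float:
    """Weighted mix of sentiment alignment and semantic similarity."""
    if not 0.0 <= w_sentiment <= 1.0:
        raise ValueError("w_sentiment must lie in [0, 1]")
    return w_sentiment * align + (1.0 - w_sentiment) * sim_sem


# Hypothetical candidate: moderate semantic similarity, strong alignment.
sim_sem, align = 0.4, 0.9

print(combined_score(sim_sem, align, 0.0))            # 0.4  (semantic only)
print(combined_score(sim_sem, align, 1.0))            # 0.9  (sentiment only)
print(round(combined_score(sim_sem, align, 0.7), 4))  # 0.75
```

At the two extremes the score reduces to one of its inputs, which is a convenient sanity check when tuning the weight.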

### Key Features

- **Integrated sentiment filtering**: Sentiment is incorporated early to filter out candidates that are sentimentally incongruent with the document, not only at the reranking stage.
- **Continuous sentiment modeling**: Uses probability-weighted sentiment polarity scores from a pretrained transformer classifier, enabling nuanced sentiment comparisons.
- **Flexible weighting parameter**: The parameter $w_{sentiment}$ allows task-specific tuning of the relative importance of sentiment versus semantic relevance.
- **Candidate generation enhancement**: The candidate pool is initially large and filtered by combined semantic and sentiment scores, improving quality and relevance.

### Advantages Over Post-hoc Reranking

- Unlike reranking approaches that adjust keyword order **after** candidate generation, this method filters candidates **before** ranking, reducing noise and irrelevant candidates early.
- Sentiment influences the candidate pool itself, resulting in more coherent and contextually appropriate keyword extraction.
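- The approach is compatible in principle with any KeyBERT-compatible embedding model and sentiment classification backend; the implementation in this notebook expects a `SentenceTransformer` instance and a 5-class (1 to 5 stars) sentiment classifier.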

### Intended Applications

This sentiment-aware KeyBERT extension is especially suited for sentiment-rich domains such as:

- Product and service reviews
- Social media opinion mining
- Customer feedback analysis
- Any text where emotional tone is critical to understanding key themes

It enables the extraction of keywords that are both topically relevant and emotionally aligned, enhancing interpretability and downstream analysis.


### Setup: Installing and Importing Required Libraries

In [138]:
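import importlib
import subprocess
import sys

# Map each pip distribution name to the module name used to import it.
# The two can differ (e.g. "scikit-learn" is imported as "sklearn").
# "collections" is part of the standard library and needs no installation.
required_packages = {
    "numpy": "numpy",
    "torch": "torch",
    "scikit-learn": "sklearn",
    "keybert": "keybert",
    "transformers": "transformers",
    "sentence-transformers": "sentence_transformers",
}

def install_package(package, module_name):
    """Install a package with pip if its module cannot be imported."""
    try:
        importlib.import_module(module_name)
        print(f"{package} is already installed.")
    except ImportError:
        print(f"Installing {package}...")
        subprocess.check_call([sys.executable, "-m", "pip", "install", package])

# Check and install missing packages
for package, module_name in required_packages.items():
    install_package(package, module_name)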

numpy is already installed.
torch is already installed.
Installing scikit-learn...


huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Defaulting to user installation because normal site-packages is not writeable
keybert is already installed.
transformers is already installed.
sentence_transformers is already installed.
collections is already installed.



In [103]:
import numpy as np  # Fundamental package for numerical computing in Python
from typing import Tuple  # Used for type hinting tuples in function signatures

import torch  # Core PyTorch library for tensor computations
import torch.nn.functional as F  # Functional interface for activation functions, etc.

from sklearn.feature_extraction.text import CountVectorizer  # Extract text n-gram candidates
from sklearn.metrics.pairwise import cosine_similarity  # Compute cosine similarity between embeddings

from keybert import KeyBERT as KB  # KeyBERT keyword extraction base class

from transformers import (
    AutoTokenizer,  # Tokenizer for preparing input text for transformer models
    AutoModelForSequenceClassification  # Transformer model for classification tasks
)

from sentence_transformers import SentenceTransformer # Sentence transformer for generating sentence embeddings

# Class Definitions

## SentimentModel Class: Transformer-based Sentiment Probability Predictor

The `SentimentModel` class is a wrapper around a pretrained HuggingFace transformer model designed for sentiment classification. It provides a convenient interface to obtain **probability distributions over sentiment classes** for batches of input texts.

### Purpose and Functionality

- **Model loading:**  
  Upon initialization, the class loads both the tokenizer and the sequence classification model specified by the `model_name`.  
  By default, it uses `"nlptown/bert-base-multilingual-uncased-sentiment"`, a multilingual BERT model fine-tuned for 5-class sentiment classification (1 to 5 stars).

- **Device management:**  
  The model is moved to the specified device (`cpu` or `cuda`), and the tokenized inputs are moved to the same device at prediction time.  
  Input validation ensures that `cuda` is only used if a compatible GPU is available.

- **Batch sentiment prediction:**  
  The core method `predict_proba` takes a list of texts and:  
  1. Tokenizes and encodes them into the format expected by the transformer.  
  2. Performs a forward pass through the model without computing gradients (efficient inference).  
  3. Applies a softmax to the output logits to obtain a probability distribution over the sentiment classes for each text.  
  4. Returns a NumPy array of shape `(batch_size, num_classes)` containing the class probabilities.
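Step 3 can be illustrated without loading a model. The notebook itself uses `torch.nn.functional.softmax`; the pure-Python version below is only a stand-in, and the logits are made-up values.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw class scores."""
    shift = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for the five star classes (1 to 5 stars).
logits = [-1.2, -0.5, 0.3, 1.1, 2.4]
probs = softmax(logits)

print([round(p, 3) for p in probs])
print(round(sum(probs), 6))  # 1.0
```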

### Advantages

- Allows seamless integration of sentiment analysis into larger NLP pipelines.
- Outputs probabilistic sentiment scores, enabling nuanced, continuous sentiment representations rather than hard labels.
- Supports batch processing for efficiency.

### Example Output

**Text:**  
_I absolutely loved this movie! It was fantastic._  
Sentiment probabilities (1 to 5 stars): [0.01 0.02 0.05 0.12 0.80]


**Text:**  
_The plot was boring and predictable._  
Sentiment probabilities (1 to 5 stars): [0.70 0.20 0.07 0.02 0.01]


**Text:**  
_The movie was okay, nothing special but not bad either._  
Sentiment probabilities (1 to 5 stars): [0.05 0.10 0.65 0.15 0.05]
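These distributions can be collapsed into a single continuous polarity score by taking the expected value over evenly spaced class scores, which is the same mapping the `KeyBERTSentimentAware` class applies later in this notebook. The sketch below reuses the three example distributions above; the helper name `continuous_polarity` and the dictionary keys are illustrative.

```python
# Star classes mapped to evenly spaced scores in [0, 1]: 0, 0.25, 0.5, 0.75, 1.
class_scores = [i / 4 for i in range(5)]


def continuous_polarity(probs):
    """Expected class score under the predicted probability distribution."""
    return sum(p * s for p, s in zip(probs, class_scores))


examples = {
    "positive": [0.01, 0.02, 0.05, 0.12, 0.80],
    "negative": [0.70, 0.20, 0.07, 0.02, 0.01],
    "neutral": [0.05, 0.10, 0.65, 0.15, 0.05],
}

for name, probs in examples.items():
    print(f"{name}: {continuous_polarity(probs):.4f}")
# positive: 0.9200
# negative: 0.1100
# neutral: 0.5125
```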


In [104]:
class SentimentModel:
    """
    Wrapper class for a HuggingFace transformer sentiment classification model.

    This class loads a pretrained sentiment classification model and tokenizer,
    and provides a method to compute the probability distribution over sentiment classes
    for a batch of input texts.

    Parameters:
    -----------
    model_name : str, optional
        The identifier of the pretrained sentiment model on HuggingFace Hub.
        Default is "nlptown/bert-base-multilingual-uncased-sentiment",
        a 5-class sentiment classifier (1 to 5 stars).

    device : str, optional
        The device to run the model on. Typical values: "cpu" or "cuda".
    """

    def __init__(
            self, 
            model_name="nlptown/bert-base-multilingual-uncased-sentiment", 
            device="cpu"):
        
        # Set the device for model computation
        if device not in ["cpu", "cuda"]:
            raise ValueError("Device must be 'cpu' or 'cuda'.")
        
        if device == "cuda" and not torch.cuda.is_available():
            raise ValueError("CUDA is not available. Please use 'cpu' instead.")
        
        self.device = device

        # Load the tokenizer associated with the pretrained model
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)

        # Load the pretrained sequence classification model on the given device
        self.model = AutoModelForSequenceClassification.from_pretrained(model_name).to(device)

    def predict_proba(self, texts):
        """
        Compute the probability distribution over sentiment classes for input texts.

        Parameters:
        -----------
        texts : list of str
            List of input texts for which to compute sentiment probabilities.

        Returns:
        --------
        numpy.ndarray
            Array of shape (len(texts), num_classes) where each row corresponds
            to the probability distribution over sentiment classes for that text.
        """
        # Tokenize and encode the input texts, handling padding and truncation for batching
        inputs = self.tokenizer(texts, padding=True, truncation=True, return_tensors="pt").to(self.device)
        
        with torch.no_grad():
            # Perform forward pass without gradient computation for efficiency
            outputs = self.model(**inputs)
            logits = outputs.logits  # raw model outputs before softmax
            
            # Convert logits to probabilities using softmax along class dimension
            probs = F.softmax(logits, dim=1).cpu().numpy()
        return probs


## KeyBERTSentimentAware Class: Sentiment-Integrated Keyword Extraction

This class extends the base KeyBERT model by integrating sentiment analysis directly into the keyword extraction pipeline. It enhances the traditional semantic-only approach by incorporating continuous sentiment polarity scores for both the entire document and each candidate keyword.

### Overview

- **Candidate Extraction:**  
  Uses `CountVectorizer` to extract a broad pool of candidate keywords (n-grams) from the document text.  
  **Note:** This initial candidate generation is purely statistical and **does not incorporate sentiment information**.

- **Sentiment Analysis:**  
  Leverages a pretrained transformer sentiment classification model to compute **continuous sentiment polarity scores** ranging from 0 (very negative) to 1 (very positive) for both the document and each candidate.

- **Combined Scoring and Filtering:**  
  Calculates a weighted score combining:
  - Semantic similarity (cosine similarity between embeddings).
  - Sentiment alignment (1 minus the absolute difference between candidate and document sentiment).

  Candidates with combined scores below a threshold are **filtered out before final ranking**, effectively integrating sentiment as a filter immediately after candidate extraction.

  The weighting is controlled by `weight_sentiment`:
  - `weight_sentiment=1.0` means keywords are ranked purely by sentiment alignment.
  - `weight_sentiment=0.0` means keywords are ranked purely by semantic similarity.

### Candidate Selection in KeyBERT vs Sentiment-Aware Extension

In the original KeyBERT model, the candidate keywords are extracted purely from **surface statistics** of the text. Specifically, KeyBERT uses `CountVectorizer` to enumerate the n-grams (contiguous sequences of words) that occur in the document, optionally subject to stop-word removal and a minimum document frequency. This means:

- The **candidate pool is generated without any semantic or sentiment understanding**.
- All candidates are treated equally in this phase, regardless of their emotional tone or contextual relevance beyond raw occurrence patterns.
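To make the candidate pool concrete, the sketch below enumerates n-grams in plain Python. It is a simplified stand-in for `CountVectorizer` with `ngram_range=(1, 3)` and stop-word removal, not scikit-learn's implementation; in particular, the stop-word list here is a small illustrative subset. The input is the sentence used in Test 3 of this notebook.

```python
import re

# Small illustrative stop-word list (scikit-learn's English list is much longer).
STOP_WORDS = {"the", "and", "was", "to", "at", "a", "an", "of", "is"}


def ngram_candidates(text, ngram_range=(1, 3)):
    """Enumerate unique word n-grams after lowercasing and stop-word removal."""
    tokens = [t for t in re.findall(r"\b\w\w+\b", text.lower()) if t not in STOP_WORDS]
    lo, hi = ngram_range
    grams = set()
    for n in range(lo, hi + 1):
        for i in range(len(tokens) - n + 1):
            grams.add(" ".join(tokens[i:i + n]))
    return sorted(grams)


doc = (
    "Despite the beautiful cinematography and strong performances, "
    "the storyline was convoluted and difficult to follow at times."
)
candidates = ngram_candidates(doc)

print(len(candidates))  # 27
print(candidates[:4])
# ['beautiful', 'beautiful cinematography', 'beautiful cinematography strong', 'cinematography']
```

Note that stop words are dropped before n-grams are formed, so phrases such as `performances storyline` can span words that were not adjacent in the original sentence.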

This purely **statistical candidate extraction** can lead to a large number of candidates that are topically relevant but may not align emotionally with the overall document sentiment. For example, in a strongly negative review, KeyBERT might still generate positive-sounding candidates simply because those phrases appear in the text, potentially misrepresenting the sentiment conveyed.

To address this limitation, our sentiment-aware extension introduces a **joint filtering mechanism** that combines both semantic relevance and sentiment alignment **immediately after the initial statistical candidate extraction**:

1. We first extract a large pool of candidates statistically using `CountVectorizer` to ensure broad coverage.

2. We compute **continuous sentiment polarity scores** for both the entire document and each candidate keyword using a pretrained transformer sentiment model.

3. We calculate a **combined score** for each candidate that balances:
   - Semantic similarity to the document (embedding cosine similarity).
   - Sentiment alignment with the document's overall polarity (inverted absolute difference between sentiment scores).

4. Candidates whose combined score falls below a threshold are **filtered out early**, significantly reducing the pool to those that are both topically and emotionally relevant.

This approach allows the model to **avoid candidates that are semantically plausible but sentimentally inconsistent**, leading to more meaningful and context-aware keyword extraction.
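Steps 2 to 4 can be traced on toy numbers. In the notebook the similarity and polarity values come from the embedding and sentiment models; the values and candidate phrases below are made up purely for illustration.

```python
# Hypothetical inputs standing in for model outputs.
candidates = ["slow pacing", "predictable plot", "stunning visuals", "amazing soundtrack"]
sim_scores = [0.45, 0.40, 0.42, 0.38]   # semantic similarity to the document
cand_pols = [0.15, 0.25, 0.95, 0.90]    # candidate polarity in [0, 1]
doc_pol = 0.30                          # document polarity (mostly negative)
weight_sentiment = 0.7
threshold = 0.4

kept = []
for cand, sim, pol in zip(candidates, sim_scores, cand_pols):
    align = 1.0 - abs(pol - doc_pol)  # sentiment alignment
    combined = weight_sentiment * align + (1.0 - weight_sentiment) * sim
    if combined >= threshold:         # early filtering step
        kept.append((cand, round(combined, 3)))

# Rank the survivors by combined score, highest first.
kept.sort(key=lambda pair: pair[1], reverse=True)
print([cand for cand, _ in kept])  # ['predictable plot', 'slow pacing']
```

The two positive phrases have similar semantic scores to the negative ones but fall below the threshold because their polarity is far from the document's.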

### Summary

| Step                    | KeyBERT Base                       | Sentiment-Aware Extension |
|-------------------------|------------------------------------|---------------------------|
| Candidate generation    | Purely statistical (n-gram counts) | Statistical extraction (no sentiment), immediately followed by combined sentiment-semantic filtering |
| Candidate ranking       | Semantic similarity only           | Semantic + sentiment combined |
| Sentiment consideration | None                               | Integral part of candidate filtering |

By incorporating sentiment as an early filtering step (post-statistical extraction), our extension improves the **precision and emotional coherence** of extracted keywords, especially in domains where sentiment plays a crucial role.

### Parameters

- `model`: The base semantic embedding model (must be a `SentenceTransformer` instance).
- `sentiment_model_name`: Identifier of the pretrained sentiment model (default is a 5-class multilingual sentiment classifier).
- `weight_sentiment`: Weight in [0, 1] balancing sentiment alignment against semantic similarity (default 0.7).
- `candidate_pool_size`: Maximum number of candidates initially extracted (default 100).
- `device`: Compute device, `"cpu"` or `"cuda"`.

### Usage

The class allows flexible, context-aware keyword extraction that respects both topical relevance and emotional tone, ideal for analyzing opinion-rich texts such as reviews or social media posts.

---


In [105]:
class KeyBERTSentimentAware(KB):
    """
    Extension of KeyBERT to integrate sentiment analysis in keyword extraction.

    This class overrides and extends parts of KeyBERT's pipeline to:
    - Extract a larger candidate pool using CountVectorizer.
    - Calculate sentiment polarity scores for the document and candidates,
      using a pretrained sentiment classification model with continuous outputs.
    - Combine semantic similarity and sentiment alignment scores via a weighting factor weight_sentiment.
    - Filter candidate keywords based on this combined score before final ranking.

    Parameters:
    -----------
    model : SentenceTransformer
        Semantic embedding model used by KeyBERT.

    sentiment_model_name : str, optional (default: "nlptown/bert-base-multilingual-uncased-sentiment")
        Identifier of pretrained sentiment model on HuggingFace Hub.

    weight_sentiment : float, optional (default: 0.7)
        Weight to balance sentiment alignment vs semantic similarity.
        weight_sentiment=1.0 means only sentiment alignment is considered.
        weight_sentiment=0.0 means only semantic similarity is considered.

    candidate_pool_size : int, optional (default: 100)
        Maximum number of initial candidate keywords to extract.

    device : str, optional (default: "cpu")
        Device to run embedding and sentiment models on ("cpu" or "cuda").
    """

    def __init__(
        self,
        model,
        sentiment_model_name: str = "nlptown/bert-base-multilingual-uncased-sentiment",
        weight_sentiment: float = 0.7,
        candidate_pool_size: int = 100,
        device: str = "cpu",
    ):
        # Validate that the specified device is either 'cpu' or 'cuda'
        valid_devices = {"cpu", "cuda"}
        if device not in valid_devices:
            raise ValueError(f"Device must be one of {valid_devices}.")
        
        # Check CUDA availability if 'cuda' is requested
        if device == "cuda" and not torch.cuda.is_available():
            raise ValueError("CUDA is not available. Please use 'cpu' instead.")

        # Validate input types to ensure correct usage
        if not isinstance(model, SentenceTransformer):
            raise TypeError("model must be an instance of SentenceTransformer.")
        if not isinstance(sentiment_model_name, str):
            raise TypeError("sentiment_model_name must be a string.")
        if not isinstance(weight_sentiment, float):
            raise TypeError("weight_sentiment must be a float.")
        if not isinstance(candidate_pool_size, int):
            raise TypeError("candidate_pool_size must be an integer.")

        # Validate value ranges to prevent logical errors
        if not (0.0 <= weight_sentiment <= 1.0):
            raise ValueError("weight_sentiment must be between 0 and 1 inclusive.")
        if candidate_pool_size <= 0:
            raise ValueError("candidate_pool_size must be a positive integer.")

        # Initialize the superclass (KeyBERT) with the semantic embedding model
        super().__init__(model)

        # Assign validated parameters to instance variables
        self.weight_sentiment = weight_sentiment
        self.candidate_pool_size = candidate_pool_size
        self.device = device

        # Store the semantic embedding model for embedding computation
        self.embedder = model

        # Initialize the sentiment model wrapper with the given model name and device
        self.sentiment_model = SentimentModel(sentiment_model_name, device=device)

        # Define the ordered sentiment labels corresponding to model output classes
        self.labels_ordered = ['1 star', '2 stars', '3 stars', '4 stars', '5 stars']

        # Create a mapping from sentiment labels to continuous numeric scores between 0 and 1
        self.label_to_score = {
            label: i / (len(self.labels_ordered) - 1)
            for i, label in enumerate(self.labels_ordered)
        }


    def _get_doc_polarity_continuous(self, doc: str) -> float:
        """
        Compute the document's continuous sentiment polarity score as the weighted sum of
        predicted class probabilities multiplied by their numeric mappings.

        KeyBERT itself has no sentiment handling; this helper is specific to the extension.

        Parameters:
        -----------
        doc : str
            The document text.

        Returns:
        --------
        float
            Continuous sentiment polarity score between 0 (very negative) and 1 (very positive).
        """
        # Get probability distribution over sentiment classes for the document
        probs = self.sentiment_model.predict_proba([doc])[0]

        # Compute continuous polarity as weighted average of class scores
        polarity = sum(
            p * self.label_to_score[label]
            for p, label in zip(probs, self.labels_ordered)
        )
        return polarity

    def _get_candidate_polarities(self, candidates) -> np.ndarray:
        """
        Compute continuous sentiment polarity scores for each candidate keyword.

        This helper is specific to the extension and applies the same polarity mapping as
        `_get_doc_polarity_continuous` to a batch of candidates.

        Parameters:
        -----------
        candidates : iterable of str
            List of candidate keywords.

        Returns:
        --------
        np.ndarray
            Array of polarity scores for each candidate keyword.
        """
        candidates = list(candidates)  # ensure correct input format for tokenizer
        
        # Batch predict probabilities for all candidates
        probs_list = self.sentiment_model.predict_proba(candidates)
        
        polarities = []
        for probs in probs_list:
            # Weighted average as continuous polarity score
            polarity = sum(
                p * self.label_to_score[label]
                for p, label in zip(probs, self.labels_ordered)
            )
            polarities.append(polarity)
        return np.array(polarities)

    def _select_candidates(
            self, 
            doc: str, 
            ngram_range: Tuple[int, int] = (1, 3), 
            threshold: float = 0.4
    ):
        """
        Extract initial candidates with CountVectorizer and filter them based on combined
        semantic similarity and sentiment alignment scores.

        This method replaces the default candidate generation and filtering steps of KeyBERT,
        incorporating sentiment filtering before final keyword ranking.

        Parameters:
        -----------
        doc : str
            Document text.

        ngram_range : tuple of int
            N-gram size range for candidate extraction.

        threshold : float
            Minimum combined score for candidate retention.

        Returns:
        --------
        list of str
            Filtered list of candidate keywords.
        """
        # Extract candidates with CountVectorizer (statistical n-grams)
        vectorizer = CountVectorizer(
            ngram_range=ngram_range,
            stop_words='english',
            max_features=self.candidate_pool_size
        )
        candidates = vectorizer.fit([doc]).get_feature_names_out()

        # Compute semantic embeddings for doc and candidates
        doc_emb = self.model.embed([doc])
        cand_emb = self.model.embed(candidates)

        # Compute continuous sentiment polarity scores
        doc_pol = self._get_doc_polarity_continuous(doc)
        cand_pols = self._get_candidate_polarities(candidates)

        # Calculate cosine semantic similarity scores
        sim_scores = cosine_similarity(doc_emb, cand_emb)[0]

        # Calculate sentiment alignment scores
        sentiment_scores = 1 - np.abs(cand_pols - doc_pol)

        # Combine semantic and sentiment scores, weighted by weight_sentiment
        combined_scores = self.weight_sentiment * sentiment_scores + (1 - self.weight_sentiment) * sim_scores

        # Filter candidates that meet threshold on combined score
        filtered_candidates = [c for c, s in zip(candidates, combined_scores) if s >= threshold]

        return filtered_candidates

    def extract_keywords(
        self,
        doc: str,
        top_n: int = 5,
        candidate_threshold: float = 0.4,
        keyphrase_ngram_range: Tuple[int, int] = (1, 3),
        print_doc_polarity: bool = False,
    ):
        """
        Extract top keywords from a document by combining semantic similarity and sentiment alignment.

        This method overrides the `extract_keywords` method from KeyBERT base class,
        adding sentiment-aware candidate filtering and scoring.

        Parameters:
        -----------
        doc : str
            Input document text.

        top_n : int
            Number of keywords to return.

        candidate_threshold : float
            Threshold score to filter candidate keywords.

        keyphrase_ngram_range : tuple of int
            N-gram range for candidate keyword extraction.

        print_doc_polarity : bool
            Whether to print the document's sentiment polarity score.

        Returns:
        --------
        list of tuples
            List of (keyword, score) tuples sorted by descending combined score.
        """

        # Select candidates filtered by combined semantic+sentiment scoring
        candidates = self._select_candidates(
            doc,
            ngram_range=keyphrase_ngram_range,
            threshold=candidate_threshold
        )
        if not candidates:
            print("No candidates passed the sentiment-semantic filter.")
            return []

        # Compute semantic embeddings for document and filtered candidates
        doc_emb = self.model.embed([doc])
        cand_emb = self.model.embed(candidates)

        # Normalize embeddings to unit length for cosine similarity in [-1,1]
        from sklearn.preprocessing import normalize
        doc_emb_norm = normalize(doc_emb)
        cand_emb_norm = normalize(cand_emb)

        # Compute continuous sentiment polarity for the document
        doc_pol = self._get_doc_polarity_continuous(doc)

        # Print document polarity if requested
        if print_doc_polarity:
            # Scale polarity from [0,1] to [0,10]
            scaled_pol = doc_pol * 10

            # Determine polarity label with neutral zone between 4 and 6 on 0-10 scale
            if scaled_pol < 4:
                polarity_label = "Negative"
            elif scaled_pol > 6:
                polarity_label = "Positive"
            else:
                polarity_label = "Neutral"

            print(f"\n=== Document Polarity Score: {scaled_pol:.2f} ({polarity_label}) ===\n")

        # Compute sentiment polarities for candidates
        cand_pols = self._get_candidate_polarities(candidates)

        # Calculate cosine semantic similarity scores (range [-1,1])
        sim_scores = cosine_similarity(doc_emb_norm, cand_emb_norm)[0]

        # Calculate sentiment alignment scores in [0,1]
        sentiment_scores = 1 - np.abs(cand_pols - doc_pol)

        # Map sentiment alignment from [0,1] to [-1,1]
        sentiment_scores_mapped = 2 * sentiment_scores - 1

        # Final combined score in [-1,1], weighted by weight_sentiment in [0,1]
        final_scores = self.weight_sentiment * sentiment_scores_mapped + (1 - self.weight_sentiment) * sim_scores

        # Select top_n keywords sorted by combined score descending
        top_indices = np.argsort(final_scores)[-top_n:][::-1]

        return [(candidates[i], final_scores[i]) for i in top_indices]



# Tests

## Test 1: Basic Keyword Extraction with Sentiment-Aware KeyBERT

This first test demonstrates the basic usage of the `KeyBERTSentimentAware` class on a simple document. We extract the top keywords combining semantic similarity and sentiment alignment with default parameter settings.

**Objectives:**
- Verify that the class instantiates correctly.
- Check that keywords are extracted without errors.
- Observe the impact of sentiment-aware ranking on keyword scores.

We use a short, clearly positive sentence to observe how sentiment affects the keyword selection.

In [113]:
from sentence_transformers import SentenceTransformer

# Initialize the semantic embedding model
embedding_model = SentenceTransformer("all-MiniLM-L6-v2")

# Initialize our sentiment-aware KeyBERT with default weight_sentiment=0.7
kw_model = KeyBERTSentimentAware(model=embedding_model)

# Sample document with positive sentiment
doc = "The movie was fantastic with beautiful visuals and great acting."

# Extract top 5 keywords
keywords = kw_model.extract_keywords(doc, top_n=5, print_doc_polarity=True)

print("Extracted Keywords and Scores:\n")
for keyword, score in keywords:
    print(f"{keyword:20s} \t score: {score:.4f}")



=== Document Polarity Score: 9.31 (Positive) ===

Extracted Keywords and Scores:

movie fantastic beautiful 	 score: 0.9029
movie fantastic      	 score: 0.8555
great acting         	 score: 0.8540
beautiful visuals great 	 score: 0.8231
fantastic beautiful visuals 	 score: 0.8135


## Test 2: Comparing Sentiment-Aware KeyBERT with Base KeyBERT

In this test, we compare the keywords extracted by the sentiment-aware extension with those from the original KeyBERT model that relies solely on semantic similarity.

**Objectives:**
- Highlight differences in keyword selection between semantic-only and sentiment-aware approaches.
- Understand the effect of integrating sentiment on keyword ranking.
- Use the same document for a fair comparison.

We use a document containing both positive and negative elements to see how sentiment influences the extraction.

In [114]:
from sentence_transformers import SentenceTransformer
from keybert import KeyBERT as KB

# Initialize the semantic embedding model
embedding_model = SentenceTransformer("all-MiniLM-L6-v2")

# Initialize base KeyBERT model with the same embedding model
kw_base = KB(model=embedding_model)

# Initialize our sentiment-aware KeyBERT with default weight_sentiment=0.7
kw_sentiment = KeyBERTSentimentAware(model=embedding_model)

doc_mixed = (
    "The film had stunning visual effects and an amazing soundtrack, "
    "but the plot was predictable and the pacing was slow at times."
)

# Extract keywords using base KeyBERT
base_keywords = kw_base.extract_keywords(doc_mixed, top_n=5, keyphrase_ngram_range=(1, 1))

print("BASE KeyBERT Keywords:\n")
for kw, score in base_keywords:
    print(f"{kw:20s} {score:.4f}")

# Extract keywords using sentiment-aware KeyBERT (with weight_sentiment=0.7)
sentiment_keywords = kw_sentiment.extract_keywords(
    doc_mixed,
    top_n=5,
    keyphrase_ngram_range=(1, 1),
    print_doc_polarity=True
)

print("Sentiment-Aware KeyBERT Keywords:\n")
for kw, score in sentiment_keywords:
    print(f"{kw:20s} {score:.4f}")


BASE KeyBERT Keywords
pacing               0.3888
film                 0.3627
soundtrack           0.3257
predictable          0.3226
plot                 0.2605

=== Document Polarity Score: 3.22 (Negative) ===

Sentiment-Aware KeyBERT Keywords:

slow                 0.7235
predictable          0.6371
plot                 0.4930
pacing               0.3587
effects              0.3586


## Test 3: Candidate Filtering with Different Thresholds



In this test, we explore how varying the candidate filtering threshold affects the pool of candidate keywords before the final ranking.

**Objectives:**
- Understand the impact of the `candidate_threshold` parameter on candidate selection.
- Observe how stricter thresholds reduce candidate pool size and potentially increase keyword relevance.
- Use a moderately complex document with mixed sentiment.

This test highlights the balance between recall and precision in candidate filtering.
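The trade-off can be seen in isolation with a toy sweep. The combined scores below are hypothetical and do not come from the models; only the threshold values match the ones used in this test.

```python
# Hypothetical combined scores for seven candidates.
combined_scores = [0.82, 0.74, 0.61, 0.55, 0.43, 0.37, 0.18]
thresholds = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]

# Count how many candidates survive each threshold.
survivors = [sum(score >= t for score in combined_scores) for t in thresholds]

for t, n in zip(thresholds, survivors):
    print(f"threshold={t:.1f} -> {n} candidates kept")

print(survivors)  # [7, 6, 5, 3, 1, 0]
```

Low thresholds keep nearly everything (high recall), while a threshold of 1.0 would require both perfect sentiment alignment and perfect semantic similarity, so in practice it tends to empty the pool.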

In [115]:
from sentence_transformers import SentenceTransformer

# Initialize the semantic embedding model
embedding_model = SentenceTransformer("all-MiniLM-L6-v2")

# Initialize our sentiment-aware KeyBERT with default weight_sentiment=0.7
kw_model = KeyBERTSentimentAware(model=embedding_model)

doc = (
    "Despite the beautiful cinematography and strong performances, "
    "the storyline was convoluted and difficult to follow at times."
)

thresholds = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]

for thresh in thresholds:
    print(f"\nCandidate Threshold: {thresh}")
    candidates = kw_model._select_candidates(doc, threshold=thresh)
    print(f"Number of candidates after filtering: {len(candidates)}")
    print("Candidates:", candidates)



Candidate Threshold: 0.0
Number of candidates after filtering: 27
Candidates: ['beautiful', 'beautiful cinematography', 'beautiful cinematography strong', 'cinematography', 'cinematography strong', 'cinematography strong performances', 'convoluted', 'convoluted difficult', 'convoluted difficult follow', 'despite', 'despite beautiful', 'despite beautiful cinematography', 'difficult', 'difficult follow', 'difficult follow times', 'follow', 'follow times', 'performances', 'performances storyline', 'performances storyline convoluted', 'storyline', 'storyline convoluted', 'storyline convoluted difficult', 'strong', 'strong performances', 'strong performances storyline', 'times']

Candidate Threshold: 0.2
Number of candidates after filtering: 27
Candidates: ['beautiful', 'beautiful cinematography', 'beautiful cinematography strong', 'cinematography', 'cinematography strong', 'cinematography strong performances', 'convoluted', 'convoluted difficult', 'convoluted difficult follow', 'despite',

## Test 4: Impact of Sentiment Weighting (`weight_sentiment`) on Keyword Ranking



This test examines how changing the `weight_sentiment` parameter influences the balance between semantic similarity and sentiment alignment in keyword scoring.

**Objectives:**
- Observe differences in extracted keywords when prioritizing sentiment vs. semantic relevance.
- Understand the flexibility of the model in adapting to different use cases by tuning `weight_sentiment`.
- Use a document with both positive and negative sentiments to highlight the effect.

We sweep `weight_sentiment` over 0.0 (semantic only), 0.25, 0.5 (balanced), 0.75, and 1.0 (sentiment only), and also try two out-of-range values (-0.2 and 1.2) to confirm that they are rejected.
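
To make the trade-off concrete, the sketch below applies the combined score from the theory section to two made-up candidates. The helper names `combined_score` and `rank`, the document polarity, and the candidate values are illustrative assumptions, not values produced by `KeyBERTSentimentAware`.

```python
def combined_score(sim_sem, s_doc, s_cand, w_sentiment):
    """score = w * (1 - |s_doc - s_cand|) + (1 - w) * sim_sem"""
    if not 0.0 <= w_sentiment <= 1.0:
        raise ValueError(f"w_sentiment must be in [0, 1], got {w_sentiment}")
    align = 1.0 - abs(s_doc - s_cand)
    return w_sentiment * align + (1.0 - w_sentiment) * sim_sem


# Hypothetical positive document and two hypothetical candidates:
# keyword -> (semantic similarity, sentiment polarity)
S_DOC = 0.8
TOY_CANDIDATES = {
    "slow pacing": (0.60, 0.20),       # semantically closer, but negative
    "uplifting ending": (0.45, 0.90),  # semantically weaker, but positive
}


def rank(w_sentiment):
    """Return the toy candidates sorted by combined score, best first."""
    return sorted(
        TOY_CANDIDATES,
        key=lambda kw: combined_score(
            TOY_CANDIDATES[kw][0], S_DOC, TOY_CANDIDATES[kw][1], w_sentiment
        ),
        reverse=True,
    )


for w in (0.0, 0.5, 1.0):
    print(f"w_sentiment={w}: {rank(w)}")
```

With these numbers the semantically closer but negative candidate leads only at `w_sentiment = 0.0`; from 0.5 upward the positive candidate overtakes it.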

In [118]:
from sentence_transformers import SentenceTransformer

# Initialize the semantic embedding model
embedding_model = SentenceTransformer("all-MiniLM-L6-v2")

# Initialize our sentiment-aware KeyBERT with default weight_sentiment=0.7
kw_model = KeyBERTSentimentAware(model=embedding_model)

doc = ("The film featured breathtaking visuals and an outstanding soundtrack that "
       "truly immersed the audience in the story. The lead actors delivered powerful "
       "performances, bringing depth and emotion to their characters. However, the pacing "
       "was somewhat slow in the middle, and certain plot points felt predictable and underdeveloped. "
       "Despite these shortcomings, the compelling narrative arc and the strong direction kept "
       "the viewers engaged throughout. The film's ending was uplifting and satisfying, "
       "leaving a lasting impression. Overall, it was a highly enjoyable experience that combined "
       "artistic excellence with heartfelt storytelling.")

weights = [0.0, 0.25, 0.5, 0.75, 1.0, -0.2, 1.2]  # Includes invalid weights

for w in weights:
    try:
        # Validate the weight before assigning it, so an invalid value never reaches the model
        if not (0.0 <= w <= 1.0):
            raise ValueError(f"weight_sentiment value {w} is out of valid range [0,1].")
        kw_model.weight_sentiment = w
        keywords = kw_model.extract_keywords(
            doc,
            top_n=5,
            print_doc_polarity=True,
            keyphrase_ngram_range=(1, 2)
        )
        print(f"\nweight_sentiment = {w}\n")
        for kw, score in keywords:
            print(f"{kw:20s} \t score: {score:.4f}")
    except Exception as e:
        print(f"\nweight_sentiment = {w} caused error: {e}")



=== Document Polarity Score: 6.07 (Positive) ===


weight_sentiment = 0.0

film ending          	 score: 0.5824
breathtaking visuals 	 score: 0.4919
outstanding soundtrack 	 score: 0.4868
story lead           	 score: 0.4825
heartfelt storytelling 	 score: 0.4742

=== Document Polarity Score: 6.07 (Positive) ===


weight_sentiment = 0.25

story lead           	 score: 0.6053
film ending          	 score: 0.5888
breathtaking visuals 	 score: 0.5784
narrative arc        	 score: 0.5625
audience story       	 score: 0.5625

=== Document Polarity Score: 6.07 (Positive) ===


weight_sentiment = 0.5

story lead           	 score: 0.7281
narrative arc        	 score: 0.6953
audience story       	 score: 0.6869
characters pacing    	 score: 0.6867
lead actors          	 score: 0.6837

=== Document Polarity Score: 6.07 (Positive) ===


weight_sentiment = 0.75

story lead           	 score: 0.8509
lead actors          	 score: 0.8311
narrative arc        	 score: 0.8281
characters pacing    	 s

## Test 5: Keyword Extraction on a Document with Mixed Sentiment


This test evaluates the performance of the `KeyBERTSentimentAware` model on a document containing both positive and negative sentiments. It helps observe how the model balances semantic relevance and sentiment alignment when the document expresses contrasting opinions.

**Objectives:**
- Verify that the model extracts keywords reflecting both the positive and negative aspects of the text.
- Assess the impact of sentiment-aware filtering and scoring in realistic scenarios.

In [121]:
from sentence_transformers import SentenceTransformer

# Initialize the semantic embedding model
embedding_model = SentenceTransformer("all-MiniLM-L6-v2")

# Initialize our sentiment-aware KeyBERT with default weight_sentiment=0.7
kw_model = KeyBERTSentimentAware(model=embedding_model)

doc_mixed = (
    "The movie had decent cinematography and the actors performed adequately. "
    "The storyline was straightforward and predictable, with neither surprising twists nor major flaws. "
    "The soundtrack complemented the scenes suitably, without standing out. "
    # Trailing space added so the two literals do not fuse into "exceptional,but"
    "Overall, the film provided a passable entertainment experience—nothing exceptional, "
    "but not disappointing either."
)

# Extract top 5 keywords with n-grams up to length 3
keywords = kw_model.extract_keywords(doc_mixed, top_n=5, keyphrase_ngram_range=(1, 3), print_doc_polarity=True)

print("Keywords extracted from mixed sentiment document:")
for kw, score in keywords:
    print(f"{kw:20s} \t score: {score:.4f}")



=== Document Polarity Score: 4.72 (Neutral) ===

Keywords extracted from mixed sentiment document:
actors performed adequately 	 score: 0.8231
storyline straightforward 	 score: 0.7820
adequately storyline straightforward 	 score: 0.7623
flaws soundtrack complemented 	 score: 0.7451
adequately storyline 	 score: 0.7011


## Test 6: Comparing Keyword Extraction Between Base KeyBERT and Sentiment-Aware KeyBERT


This test evaluates the difference in keyword extraction between the standard KeyBERT model (which uses only semantic similarity) and our extended Sentiment-Aware KeyBERT model, which integrates sentiment alignment into the keyword selection process.

- **Sentiment Polarity Computation:**  
  For each review, the test computes a continuous sentiment polarity score between 0 (very negative) and 1 (very positive) using the sentiment model embedded in the Sentiment-Aware KeyBERT. It prints the polarity of each review and the average polarity across all reviews, each labeled Negative (below 0.4), Positive (above 0.6), or Neutral (in between).

- **Keyword Extraction and Scoring:**  
  For each review and each model (base and sentiment-aware), the test extracts the top 7 keywords with their scores: semantic similarity for the base model, and the combined semantic and sentiment score for the sentiment-aware model. For the sentiment-aware model, it also records the sentiment polarity of each keyword.

- **Ranking Keywords:**  
  Keywords are ranked by a weighted combination of their average score (weight 0.7) and their normalized frequency across the reviews (weight 0.3). This favors keywords that are both high-scoring and consistently extracted.

- **Diversity Filtering:**  
  To avoid redundant keywords that are semantically too similar, the test applies a diversity filter based on cosine similarity of keyword embeddings. Only keywords that are sufficiently distinct from previously selected keywords are retained.

- **Result Presentation:**  
  The final output lists the top 5 diverse keywords from each model, showing their average scores. For the sentiment-aware model, the average sentiment polarity of each keyword is also displayed to highlight emotional alignment.
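
The diversity filtering step can be illustrated without loading any model. The sketch below is a stripped-down greedy filter over hand-made 2-D vectors that stand in for sentence embeddings; the function name `greedy_diverse` and the toy vectors are assumptions for illustration, and the notebook's own implementation follows in the code cell.

```python
import numpy as np


def greedy_diverse(keywords, embeddings, top_n=3, similarity_threshold=0.7):
    """Walk keywords in rank order and keep one only if its cosine similarity
    to every already-kept keyword is below the threshold."""
    embs = np.asarray(embeddings, dtype=float)
    # Normalize rows so that a dot product equals cosine similarity
    embs = embs / np.linalg.norm(embs, axis=1, keepdims=True)

    kept = []  # indices of the keywords selected so far
    for i in range(len(keywords)):
        if all(float(embs[i] @ embs[j]) < similarity_threshold for j in kept):
            kept.append(i)
        if len(kept) >= top_n:
            break
    return [keywords[i] for i in kept]


# Keywords are assumed to be sorted best-first already
toy_keywords = ["great film", "great movie", "slow pacing", "soundtrack"]
toy_vectors = [[1.0, 0.0], [0.95, 0.31], [0.0, 1.0], [0.6, 0.8]]

print(greedy_diverse(toy_keywords, toy_vectors))
# ['great film', 'slow pacing']
```

Here `great movie` is dropped because it is nearly parallel to `great film`, and `soundtrack` is dropped because its cosine similarity to `slow pacing` is 0.8. Raising `similarity_threshold` to 0.9 would let `soundtrack` through.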

**Purpose**:

This test demonstrates how integrating sentiment information affects the keyword extraction results by:

- Highlighting differences in topic selection between purely semantic and sentiment-aware approaches.
- Showing how sentiment-aware keywords tend to align better with the emotional tone of the reviews.
- Providing insight into the practical impact of sentiment-aware extensions for keyword extraction tasks.

In [150]:
from collections import defaultdict
from sentence_transformers import SentenceTransformer
from keybert import KeyBERT as KB
import numpy as np

# Initialize the semantic embedding model
embedding_model = SentenceTransformer("all-MiniLM-L6-v2")

# Initialize base KeyBERT (semantic-only) and sentiment-aware KeyBERT
kw_base = KB(model=embedding_model)
kw_sentiment = KeyBERTSentimentAware(model=embedding_model, weight_sentiment=0.7)

# Define reviews with mixed sentiments to highlight topic differences
reviews = [
    """This film is a stunning achievement in storytelling, with unforgettable characters and a gripping plot.
    The visuals are breathtaking, and the soundtrack perfectly complements every scene.""",

    """I was completely captivated from start to finish. The performances were heartfelt,
    and the director’s vision shines through in every frame. A truly inspiring movie experience.""",

    """A beautifully crafted film with rich emotional depth. The screenplay is tight,
    and the cinematography creates an immersive atmosphere that kept me hooked.""",

    """One of the best films I've seen in years. It combines a compelling narrative with excellent acting,
    making it both entertaining and thought-provoking.""",

    """An absolute masterpiece! Every element, from the score to the visual effects,
    contributes to a powerful and uplifting cinematic journey.""",

    """“An utter disaster of a film. The plot was incoherent, characters were completely flat, and the 
    pacing was excruciatingly slow. Dialogue felt forced and unnatural throughout. 
    I struggled to stay awake — a total waste of time.”
"""
]

# Function to convert continuous polarity (0 to 1) into categorical sentiment label
def polarity_label(p):
    if p < 0.4:
        return "Negative"
    elif p > 0.6:
        return "Positive"
    else:
        return "Neutral"

# Compute and print the sentiment polarity for each review individually
print("Sentiment polarity per review (0 = negative, 1 = positive):")
polarities = []
for i, review in enumerate(reviews, 1):
    pol = kw_sentiment._get_doc_polarity_continuous(review)
    polarities.append(pol)
    print(f" Review #{i}: Polarity = {pol:.3f} ({polarity_label(pol)})")

# Compute and print average polarity over all reviews with categorical label
mean_polarity = np.mean(polarities)
print(f"\nAverage sentiment polarity across all reviews: {mean_polarity:.3f} ({polarity_label(mean_polarity)})\n")

# Function to accumulate scores and polarities per keyword for a given model
def accumulate_keywords(model, reviews):
    scores = defaultdict(float)
    counts = defaultdict(int)
    polarities = defaultdict(list)

    for review in reviews:
        # Extract keywords with their scores
        kws = model.extract_keywords(review, top_n=7, keyphrase_ngram_range=(1,3))
        for kw, score in kws:
            scores[kw] += score
            counts[kw] += 1

            # For the sentiment-aware model only: compute the polarity of this keyword
            # (call the method on `model`, not on the global `kw_sentiment`)
            if hasattr(model, '_get_candidate_polarities'):
                cand_pol = model._get_candidate_polarities([kw])[0]
                polarities[kw].append(cand_pol)
    return scores, counts, polarities

# Accumulate data for both models
base_scores, base_counts, base_pols = accumulate_keywords(kw_base, reviews)
sent_scores, sent_counts, sent_pols = accumulate_keywords(kw_sentiment, reviews)

# Compute average score and average polarity per keyword
def compute_averages(scores, counts, polarities):
    avg_scores = {kw: scores[kw] / counts[kw] for kw in scores}
    avg_pols = {kw: np.mean(polarities[kw]) for kw in polarities} if polarities else {}
    return avg_scores, avg_pols

base_avg_scores, base_avg_pols = compute_averages(base_scores, base_counts, base_pols)
sent_avg_scores, sent_avg_pols = compute_averages(sent_scores, sent_counts, sent_pols)

# Ranking function combining average score and normalized frequency to balance importance and consistency
def rank_keywords(avg_scores, counts, alpha=0.7):
    max_count = max(counts.values()) if counts else 1
    ranked = []
    for kw in avg_scores:
        freq_norm = counts[kw] / max_count  # Normalize frequency to [0,1]
        combined_score = alpha * avg_scores[kw] + (1 - alpha) * freq_norm
        ranked.append((kw, combined_score))
    ranked.sort(key=lambda x: x[1], reverse=True)  # Sort descending by combined score
    return ranked

# Rank keywords for both models
base_ranked = rank_keywords(base_avg_scores, base_counts)
sent_ranked = rank_keywords(sent_avg_scores, sent_counts)

# Function to embed keyword phrases and normalize embeddings for cosine similarity
def embed_keywords(keywords):
    return embedding_model.encode(keywords, convert_to_tensor=True, normalize_embeddings=True).cpu().numpy()

# Select top N keywords ensuring semantic diversity by filtering out candidates too similar to already selected ones
def select_diverse_keywords(ranked_list, avg_scores, avg_pols, top_n=5, similarity_threshold=0.7):
    selected = []
    selected_embs = []

    all_keywords = [kw for kw, _ in ranked_list]
    all_embs = embed_keywords(all_keywords)

    for i, (kw, _) in enumerate(ranked_list):
        if not selected:
            selected.append((kw, avg_scores[kw], avg_pols.get(kw, float('nan'))))
            selected_embs.append(all_embs[i])
        else:
            emb = all_embs[i]
            # Compute max cosine similarity with selected keywords
            sims = [np.dot(emb, se) for se in selected_embs]
            if max(sims) < similarity_threshold:
                selected.append((kw, avg_scores[kw], avg_pols.get(kw, float('nan'))))
                selected_embs.append(emb)
        if len(selected) >= top_n:
            break
    return selected

# Select top 5 diverse keywords for each model
top_base = select_diverse_keywords(base_ranked, base_avg_scores, base_avg_pols, top_n=5)
top_sent = select_diverse_keywords(sent_ranked, sent_avg_scores, sent_avg_pols, top_n=5)

# Nicely print the final selected keywords with average scores and polarities (only for sentiment-aware)
def print_topics(title, topics, show_polarity=True, flop=False):
    if flop:
        print(f"\n{title} (Flop {len(topics)}):")
        # Reverse order for flop to show worst first
        topics_to_print = reversed(topics)
    else:
        print(f"\n{title} (Top {len(topics)}):")
        topics_to_print = topics

    header = f"{'Keyword':40s} {'Avg Score':>10s}"
    if show_polarity:
        header += f" {'Avg Polarity':>14s}"
    print(header)
    print("-" * len(header))

    for kw, score, pol in topics_to_print:
        line = f"{kw:40s} {score:10.4f}"
        if show_polarity:
            line += f" {pol:14.3f}"
        print(line)


# Print results
print_topics("Top 5 Diverse Keywords - Base KeyBERT", top_base, show_polarity=False)
print_topics("Top 5 Diverse Keywords - Sentiment-Aware KeyBERT", top_sent, show_polarity=True)


Sentiment polarity per review (0 = negative, 1 = positive):
 Review #1: Polarity = 0.957 (Positive)
 Review #2: Polarity = 0.983 (Positive)
 Review #3: Polarity = 0.877 (Positive)
 Review #4: Polarity = 0.980 (Positive)
 Review #5: Polarity = 0.993 (Positive)
 Review #6: Polarity = 0.025 (Negative)

Average sentiment polarity across all reviews: 0.803 (Positive)


Top 5 Diverse Keywords - Base KeyBERT (Top 5):
Keyword                                   Avg Score
---------------------------------------------------
emotional depth screenplay                   0.6815
uplifting cinematic journey                  0.6554
absolute masterpiece                         0.6435
narrative excellent acting                   0.6363
beautifully crafted film                     0.6284

Top 5 Diverse Keywords - Sentiment-Aware KeyBERT (Top 5):
Keyword                                   Avg Score   Avg Polarity
------------------------------------------------------------------
absolute masterpiece         

In [151]:
# Select flop 5 keywords (lowest ranked) for each model
flop_base = base_ranked[-5:]
flop_sent = sent_ranked[-5:]

# Disabling diversity filtering for flop: simply take the flop keywords as is
flop_base_diverse = [(kw, base_avg_scores[kw], base_avg_pols.get(kw, float('nan'))) for kw, _ in flop_base]
flop_sent_diverse = [(kw, sent_avg_scores[kw], sent_avg_pols.get(kw, float('nan'))) for kw, _ in flop_sent]

# Print flop keywords nicely
print_topics("Flop 5 Keywords - Base KeyBERT", flop_base_diverse, show_polarity=False, flop=True)
print_topics("Flop 5 Keywords - Sentiment-Aware KeyBERT", flop_sent_diverse, show_polarity=True, flop=True)



Flop 5 Keywords - Base KeyBERT (Flop 5):
Keyword                                   Avg Score
---------------------------------------------------
combines compelling narrative                0.4711
heartfelt director                           0.5045
director vision shines                       0.5103
best films ve                                0.5111
dialogue felt                                0.5181

Flop 5 Keywords - Sentiment-Aware KeyBERT (Flop 5):
Keyword                                   Avg Score   Avg Polarity
------------------------------------------------------------------
powerful uplifting                           0.7013          0.900
unnatural struggled stay                     0.7101          0.095
narrative excellent acting                   0.7114          0.851
inspiring movie                              0.7116          0.878
unnatural struggled                          0.7161          0.085


### Analysis of the results

The results from comparing the base KeyBERT model and the sentiment-aware KeyBERT extension reveal **notable differences** in keyword extraction behavior. The **base KeyBERT** relies solely on semantic similarity, so it selects keywords that are semantically relevant but **ignores the emotional tone** of the text. Its lowest-ranked (flop) keywords, such as `dialogue felt` or `best films ve`, are generic narrative or cinematic fragments with no particular sentiment alignment.

In contrast, the **sentiment-aware KeyBERT** incorporates sentiment polarity during candidate selection and scoring, combining **semantic relevance** with **sentiment alignment**. It is designed to surface keywords that reflect the **positive or negative tone** of each individual review. Importantly, alignment is computed against the **polarity of each single review**, not against the collection as a whole, so keywords are emotionally coherent at the review level.

Interestingly, even among lower-ranked keywords, the sentiment-aware model maintains a clear distinction between **positive and negative terms**: the five flop keywords are either clearly positive (polarity 0.851 to 0.900) or clearly negative (0.085 and 0.095), with nothing in between. This sensitivity is especially valuable in tasks like opinion mining and customer feedback analysis, where understanding sentiment is crucial.

Note that the aggregation across reviews does not itself use sentiment: `rank_keywords` combines each keyword's average score with its normalized frequency, and the average polarity (0.803, Positive) is reported for context only. Keywords from the single negative review are therefore not filtered out at this stage; in this run, `unnatural struggled` and `unnatural struggled stay` appear among the five lowest-ranked sentiment-aware keywords.

Overall, in this small example the sentiment-aware extension makes the extracted keywords easier to interpret in sentiment-focused settings, because it favors terms that **resonate emotionally** with each review over generic ones.