# Validation and Final Evaluation of KeyBERT Extensions

In this notebook, we evaluate and compare the performance of the original KeyBERT model against two custom extensions:
- **KeyBERTSentimentReranker**: reranks keywords based on their sentiment alignment with the document.
- **KeyBERTSentimentAware**: integrates sentiment during keyword selection.

### Setup: Installing and Importing Required Libraries

In [12]:
import subprocess
import sys

# List of required packages
required_packages = [
    "numpy", "pandas", "torch", "scikit-learn", "matplotlib", "seaborn", "tqdm",
    "nltk", "scipy", "keybert", "transformers", "sentence-transformers"
]

def install_package(package):
    """Installs a package using pip if it's not already installed."""
    try:
        __import__(package)
        print(f"{package} is already installed.")
    except ImportError:
        print(f"Installing {package}...")
        subprocess.check_call([sys.executable, "-m", "pip", "install", package])

# Check and install missing packages
for package in required_packages:
    install_package(package)

numpy is already installed.
pandas is already installed.
torch is already installed.
Installing scikit-learn...
Defaulting to user installation because normal site-packages is not writeable



[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.3.1[0m[39;49m -> [0m[32;49m25.1.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49m/Library/Developer/CommandLineTools/usr/bin/python3 -m pip install --upgrade pip[0m


matplotlib is already installed.
seaborn is already installed.
tqdm is already installed.
nltk is already installed.
scipy is already installed.
keybert is already installed.
transformers is already installed.
Installing sentence-transformers...
Defaulting to user installation because normal site-packages is not writeable



[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.3.1[0m[39;49m -> [0m[32;49m25.1.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49m/Library/Developer/CommandLineTools/usr/bin/python3 -m pip install --upgrade pip[0m


In [13]:
# Library for interacting with the operating system
import os

# Core libraries for data manipulation and analysis
import pandas as pd            
import numpy as np       

# Library for random number generation
import random

# Progress bar utility
from tqdm import tqdm         

# Visualization libraries
import matplotlib.pyplot as plt  
import seaborn as sns      

# Library for time-related functions
import time

# Similarity computation
from sklearn.metrics.pairwise import cosine_similarity  # Computes cosine similarity between vectors (e.g., between document and keyword embeddings)

# Loads pretrained BERT-based models to compute dense vector representations of text
from sentence_transformers import SentenceTransformer  # type: ignore

# Main library used for extracting keywords from documents using BERT embeddings
from keybert import KeyBERT  # type: ignore

# Library for displaying HTML content in Jupyter notebooks
from IPython.display import display

# Library for text formatting
from textwrap import fill


### Load Datasets

This cell loads the main datasets used in this project:

- `preprocessed_sw_reviews_df.pkl` and `preprocessed_others_reviews_df.pkl`:  
  These contain preprocessed reviews (lemmatized, stopword-removed, lower-cased, ...) for Star Wars and other movies.
  
- `custom_sw_reviews_df.pkl` and `custom_others_reviews_df.pkl`:  
  These are further processed versions where generic film-related terms (e.g., "movie", "director", "cinema") are removed to avoid bias in keyword extraction.

- `keywords_df.pkl`:  
  A manually annotated dataset containing keywords per movie along with their "Helpful" and "Not_Helpful" vote counts from users.

**Included movies:**

- **Star Wars Series:**
  - Star Wars: Episode I – The Phantom Menace (1999)  
  - Star Wars: Episode II – Attack of the Clones (2002)  
  - Star Wars: Episode III – Revenge of the Sith (2005)  
  - Star Wars: Episode IV – A New Hope (1977)  
  - Star Wars: Episode V – The Empire Strikes Back (1980)  
  - Star Wars: Episode VI – Return of the Jedi (1983)  
  - Star Wars: Episode VII – The Force Awakens (2015)  
  - Star Wars: Episode VIII – The Last Jedi (2017)  
  - Star Wars: Episode IX – The Rise of Skywalker (2019)

- **Other Movies:**
  - Parasite (2019)  
  - The Good, the Bad and the Ugly (1966)  
  - Harry Potter and the Sorcerer's Stone (2001)  
  - Oppenheimer (2023)  
  - La La Land (2016)  
  - Raiders of the Lost Ark (1981)

In [None]:
# Define paths
DATA_DIR = "../../Dataset"

# Preprocessed review datasets
PREPROCESSED_SW_PATH = os.path.join(DATA_DIR, "preprocessed_sw_reviews_df.pkl")
PREPROCESSED_OTHERS_PATH = os.path.join(DATA_DIR, "preprocessed_others_reviews_df.pkl")

# Custom review datasets (with film-related terms removed)
CUSTOM_SW_PATH = os.path.join(DATA_DIR, "custom_preprocessed_sw_reviews_df.pkl")
CUSTOM_OTHERS_PATH = os.path.join(DATA_DIR, "custom_preprocessed_others_reviews_df.pkl")

# Reference keyword annotations
KEYWORDS_PATH = os.path.join(DATA_DIR, "keywords_df.pkl")

# Load datasets
sw_df = pd.read_pickle(PREPROCESSED_SW_PATH)
others_df = pd.read_pickle(PREPROCESSED_OTHERS_PATH)
sw_custom_df = pd.read_pickle(CUSTOM_SW_PATH)
others_custom_df = pd.read_pickle(CUSTOM_OTHERS_PATH)
keywords_df = pd.read_pickle(KEYWORDS_PATH)

# Display basic info
print("Star Wars reviews (preprocessed):", sw_df.shape)
print("Other reviews (preprocessed):", others_df.shape)
print("Star Wars reviews (custom):", sw_custom_df.shape)
print("Other reviews (custom):", others_custom_df.shape)
print("Keywords annotation:", keywords_df.shape)

Star Wars reviews (preprocessed): (36192, 11)
Other reviews (preprocessed): (15132, 11)
Star Wars reviews (custom): (36192, 11)
Other reviews (custom): (15132, 11)
Keywords annotation: (5617, 4)


In [None]:
# Display function
def display_datasets(sw_df, others_df, sw_custom_df, others_custom_df, keywords_df):
    print("Star Wars Reviews (Preprocessed):")
    display(sw_df.head())

    print("\nOther Reviews (Preprocessed):")
    display(others_df.head())

    print("\nStar Wars Reviews (Custom):")
    display(sw_custom_df.head())

    print("\nOther Reviews (Custom):")
    display(others_custom_df.head())

    print("\nKeywords Annotation:")
    display(keywords_df.head())

# Display datasets
display_datasets(sw_df, others_df, sw_custom_df, others_custom_df, keywords_df)


Star Wars Reviews (Preprocessed):


Unnamed: 0,Review_ID,Movie_ID,Movie_Title,Rating,Review_Date,Review_Title,Review_Text,Helpful_Votes,Total_Votes,Processed_Review_Text,Processed_Review_Title
0,2221293,tt0076759,Star Wars: Episode IV - A New Hope,,15 March 2010,Impossible to watch with fresh eyes,It was a long time ago when I first saw Star W...,0.0,0.0,long time ago first see star war watch part tr...,impossible watch fresh eye
1,4756672,tt0076759,Star Wars: Episode IV - A New Hope,10.0,1 April 2019,It's Still Just Star Wars to Me,While I will acknowledge its faults this is st...,0.0,0.0,acknowledge fault still one favorite film time...,still star war
2,156096,tt0076759,Star Wars: Episode IV - A New Hope,10.0,19 January 1999,A modern myth that can't be beat,Star Wars is a modern myth that has a story li...,0.0,0.0,star war modern myth story line can not beat t...,modern myth can not beat
3,155657,tt0076759,Star Wars: Episode IV - A New Hope,,28 August 1999,There is a God and his name is George Lucas,I saw for the first time when I was six years ...,0.0,0.0,see first time six year old way back get old...,god name george lucas
4,155649,tt0076759,Star Wars: Episode IV - A New Hope,1.0,31 August 1999,Good but over-rated.,"Frankly, I think ""Star wars"" is a great movie....",7.0,53.0,frankly think star war great movie way first...,good overrate



Other Reviews (Preprocessed):


Unnamed: 0,Review_ID,Movie_ID,Movie_Title,Rating,Review_Date,Review_Title,Review_Text,Helpful_Votes,Total_Votes,Processed_Review_Text,Processed_Review_Title
0,9637661,tt6751668,Parasite,5.0,23 February 2024,"Solid Film Craftsmanship, Trash Story",I'm genuinely baffled this film won not only b...,3.0,8.0,genuinely baffle film good foreign film good d...,solid film craftsmanship trash story
1,5510542,tt6751668,Parasite,10.0,26 February 2020,MASTERPIECE,Just watch it. It has everything; entertainmen...,3.0,5.0,watch everything entertainment comedy thrill h...,masterpiece
2,5182892,tt6751668,Parasite,10.0,12 October 2019,First Hit: I really enjoyed this story as it d...,First Hit: I really enjoyed this story as it d...,24.0,40.0,first hit really enjoy story dive hilarious ab...,first hit really enjoy story dive hilarious ab...
3,5499682,tt6751668,Parasite,9.0,21 February 2020,If you love cliché stories this movie is not f...,I was not expecting that much of this movie. N...,2.0,5.0,expect much movie normally film nominate oscar...,love clich story movie
4,6094155,tt6751668,Parasite,8.0,14 September 2020,Amazing.,"Good acting, cinematography, twists and screen...",0.0,0.0,good act cinematography twist screenplay side ...,amazing



Star Wars Reviews (Custom):


Unnamed: 0,Review_ID,Movie_ID,Movie_Title,Rating,Review_Date,Review_Title,Review_Text,Helpful_Votes,Total_Votes,Processed_Review_Text,Processed_Review_Title
0,2221293,tt0076759,Star Wars: Episode IV - A New Hope,,15 March 2010,Impossible to watch with fresh eyes,It was a long time ago when I first saw Star W...,0.0,0.0,long time ago first see star war watch part ea...,impossible watch fresh eye
1,4756672,tt0076759,Star Wars: Episode IV - A New Hope,10.0,1 April 2019,It's Still Just Star Wars to Me,While I will acknowledge its faults this is st...,0.0,0.0,acknowledge fault still one favorite time reme...,still star war
2,156096,tt0076759,Star Wars: Episode IV - A New Hope,10.0,19 January 1999,A modern myth that can't be beat,Star Wars is a modern myth that has a story li...,0.0,0.0,star war modern myth story line can not beat t...,modern myth can not beat
3,155657,tt0076759,Star Wars: Episode IV - A New Hope,,28 August 1999,There is a God and his name is George Lucas,I saw for the first time when I was six years ...,0.0,0.0,see first time six year old way back get old t...,god name george lucas
4,155649,tt0076759,Star Wars: Episode IV - A New Hope,1.0,31 August 1999,Good but over-rated.,"Frankly, I think ""Star wars"" is a great movie....",7.0,53.0,frankly think star war great way first kind im...,good overrate



Other Reviews (Custom):


Unnamed: 0,Review_ID,Movie_ID,Movie_Title,Rating,Review_Date,Review_Title,Review_Text,Helpful_Votes,Total_Votes,Processed_Review_Text,Processed_Review_Title
0,9637661,tt6751668,Parasite,5.0,23 February 2024,"Solid Film Craftsmanship, Trash Story",I'm genuinely baffled this film won not only b...,3.0,8.0,genuinely baffle good foreign good directing w...,solid film craftsmanship trash story
1,5510542,tt6751668,Parasite,10.0,26 February 2020,MASTERPIECE,Just watch it. It has everything; entertainmen...,3.0,5.0,watch everything entertainment comedy thrill h...,masterpiece
2,5182892,tt6751668,Parasite,10.0,12 October 2019,First Hit: I really enjoyed this story as it d...,First Hit: I really enjoyed this story as it d...,24.0,40.0,first hit really enjoy story dive hilarious ab...,first hit really enjoy story dive hilarious ab...
3,5499682,tt6751668,Parasite,9.0,21 February 2020,If you love cliché stories this movie is not f...,I was not expecting that much of this movie. N...,2.0,5.0,expect much normally nominate oscar favorite s...,love clich story movie
4,6094155,tt6751668,Parasite,8.0,14 September 2020,Amazing.,"Good acting, cinematography, twists and screen...",0.0,0.0,good act twist screenplay side like location c...,amazing



Keywords Annotation:


Unnamed: 0,Movie_ID,Keyword,Helpful,Not_Helpful
0,tt0076759,rebellion,15,0
1,tt0076759,princess,12,0
2,tt0076759,space opera,11,0
3,tt0076759,good versus evil,10,0
4,tt0076759,droid,9,0


### Load Custom KeyBERT Extensions

In this step, we import the custom KeyBERT extensions developed specifically for this project:

- `KeyBERTSentimentReranker`: re-ranks the keywords extracted by standard KeyBERT based on their sentiment alignment with the overall tone of the document.
- `KeyBERTSentimentAware`: integrates sentiment directly into the keyword selection phase, adjusting the similarity score using a transformer-based sentiment classifier.

These models are used alongside the original KeyBERT to assess the impact of incorporating sentiment information into keyword extraction.

**Note:**

>  All models share the same embedding backbone (`all-MiniLM-L6-v2`) to ensure a fair and consistent comparison.  

> The sentiment-aware variants use a transformer-based sentiment classifier, with `cardiffnlp/twitter-roberta-base-sentiment` as the default option (alternatively, `nlptown/bert-base-multilingual-uncased-sentiment` can be used).  

> An alpha value of **0.5** is used uniformly across both extensions to balance semantic similarity and sentiment influence.


In [26]:
# Add custom module path
import sys
sys.path.append("../../SentimentAwareKeyBERT") 

# Import custom KeyBERT extensions
from models.reranker_sentiment import KeyBERTSentimentReranker  # Post-hoc sentiment-based reranking
from models.embedding_sentiment import KeyBERTSentimentAware    # Sentiment-aware selection during extraction

# Import base model and sentence transformer
from keybert import KeyBERT
from sentence_transformers import SentenceTransformer

# Instantiate the models with the same embedding backbone
embedding_model = SentenceTransformer("all-MiniLM-L6-v2")

model_base = KeyBERT(model=embedding_model)

model_reranker = KeyBERTSentimentReranker(
    model=embedding_model,
    alpha=0.5,  # Weight for the sentiment score in the reranking process
    sentiment_model_name="cardiffnlp/twitter-roberta-base-sentiment"  # You can also use: "nlptown/bert-base-multilingual-uncased-sentiment"
)

model_sentiment_aware = KeyBERTSentimentAware(
    model=embedding_model,
    alpha=0.5,  # Weight for the sentiment score in the extraction process
    sentiment_model_name="cardiffnlp/twitter-roberta-base-sentiment"  # You can also use: "nlptown/bert-base-multilingual-uncased-sentiment"
)

print("All models loaded successfully.")


All models loaded successfully.


## Validation 1 — Keyword Output Comparison on a Single Review

In this validation, we extract keywords from the same review using the following models:

- **KeyBERT** (baseline): purely semantic extraction, with no sentiment awareness.
- **KeyBERTSentimentReranker**: reorders keywords based on sentiment alignment with the document.
- **KeyBERTSentimentAware**: selects keywords by integrating sentiment directly into the similarity scoring.

We use a review from *Star Wars: Episode VIII (2017)*, a film known for its polarized reception, which makes it a suitable case for testing sentiment-aware behavior.

The goal is to visually compare the top n keywords extracted by each model and observe how sentiment impacts keyword selection.


In [None]:
# Keyword extraction wrapper
def run_keyword_extraction(model, text, top_n=5):
    """
    Runs keyword extraction using the specified model.
    If the model supports 'print_doc_polarity', it will be enabled.
    Returns the extracted keywords and execution time.
    """
    start = time.time()
    
    kwargs = {
        "top_n": top_n,
        "keyphrase_ngram_range": (1, 3)
    }

    # Enable polarity printing if the model supports it
    if hasattr(model, "print_doc_polarity"):
        kwargs["print_doc_polarity"] = True

    results = model.extract_keywords(text, **kwargs)
    duration = time.time() - start
    return results, duration


# Display results for each model 
def display_keywords(title, keywords, elapsed_time):
    print(f"\n{'='*60}")
    print(f"{title}")
    print(f"Execution time: {elapsed_time:.2f} seconds")
    print("-" * 60)

    for i, kw in enumerate(keywords, 1):
        if isinstance(kw, tuple) and len(kw) == 3:
            print(f"{i}. {kw[0]:<40} (Score: {kw[1]:.4f}, Sentiment: {kw[2]:.2f})")
        elif isinstance(kw, tuple) and len(kw) == 2:
            print(f"{i}. {kw[0]:<40} (Score: {kw[1]:.4f})")
        else:
            print(f"{i}. {kw}")

    print(f"{'='*60}")




In [28]:
# Filter all custom reviews for Star Wars: Episode VIII (2017)
sw8_reviews = sw_custom_df[sw_custom_df["Movie_Title"] == "Star Wars: Episode VIII - The Last Jedi"]

# Randomly select one review from this subset
random_index = random.randint(0, len(sw8_reviews) - 1)
test_review = sw8_reviews["Processed_Review_Text"].iloc[random_index]

# Print review
print("Test Review (Star Wars: Episode VIII):")
print(fill(test_review, width=100))
print("\n" + "="*100)

# Extract keywords from all models
print("Extracting with KeyBERT...")
keywords_base, time_base = run_keyword_extraction(model_base, test_review)

print("Extracting with KeyBERTSentimentReranker...")
keywords_reranked, time_reranker = run_keyword_extraction(model_reranker, test_review)

print("Extracting with KeyBERTSentimentAware...")
keywords_aware, time_aware = run_keyword_extraction(model_sentiment_aware, test_review)

# Call display function
display_keywords("KeyBERT (Baseline)", keywords_base, time_base)
display_keywords("KeyBERTSentimentReranker", keywords_reranked, time_reranker)
display_keywords("KeyBERTSentimentAware", keywords_aware, time_aware)


Test Review (Star Wars: Episode VIII):
low expectation last jedi first desappointment force awaken second bad totally unfair get top notch
star war worthy plot fantastic full twist acting great space battle amazing whole keep interested
till end not mind critic wether fan scifi lover go one will not regret

Extracting with KeyBERT...
Extracting with KeyBERTSentimentReranker...
Extracting with KeyBERTSentimentAware...

KeyBERT (Baseline)
⏱Execution time: 0.44 seconds
------------------------------------------------------------
1. force awaken second                      (Score: 0.5745)
2. expectation jedi desappointment          (Score: 0.5337)
3. space battle amazing                     (Score: 0.5259)
4. force awaken                             (Score: 0.5206)
5. low expectation jedi                     (Score: 0.5143)

KeyBERTSentimentReranker
⏱Execution time: 0.72 seconds
------------------------------------------------------------
1. [('space battle amazing', 0.6997), ('force awake