
# Topic Modeling on Article Texts using BERTopic

This notebook demonstrates how to use the BERTopic model for topic modeling on a collection of texts. We apply it to a DataFrame column called `full_text` and generate multiple visualizations to explore the topics and their distributions.



> 💡 Note: Make sure `bertopic`, `umap-learn`, `hdbscan`, and `sentence-transformers` are installed in your environment.
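
Before running the notebook, it can help to check whether each of these packages is importable. The sketch below uses only the standard library. The helper name `missing_packages` is hypothetical, and the mapping reflects that `umap-learn` and `sentence-transformers` are imported as `umap` and `sentence_transformers`.

```python
from importlib.util import find_spec

# pip distribution name -> import name
REQUIRED = {
    "bertopic": "bertopic",
    "umap-learn": "umap",
    "hdbscan": "hdbscan",
    "sentence-transformers": "sentence_transformers",
}

def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose import name cannot be found."""
    return [
        pip_name
        for pip_name, module in required.items()
        if find_spec(module) is None
    ]

missing = missing_packages()
if missing:
    print("Missing packages; try: pip install " + " ".join(missing))
else:
    print("All required packages are available.")
```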



## 📦 Import Required Libraries

We import:
- `pandas` for loading the dataset
- `BERTopic` for topic modeling
- `matplotlib` and `plotly` for visualizations


In [1]:

import pandas as pd
from bertopic import BERTopic
import matplotlib.pyplot as plt
import plotly.io as pio

pio.renderers.default = 'notebook'




## 📄 Load Dataset

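We load the dataset from a CSV file that is expected to contain a `full_text` column. The code below reads `NLS_wit_LLM_summaries.csv` from the working directory; adjust the path to match your own file (for example `data/parsed_data_with_LLM_summaries.csv`). Rows with a missing `full_text` value are dropped, and the remaining values are converted to a list of strings.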


In [None]:

df = pd.read_csv("NLS_wit_LLM_summaries.csv")
documents = df['full_text'].dropna().astype(str).tolist()
print(f"Loaded {len(documents)} documents.")
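
`dropna()` removes missing values, but empty or whitespace-only strings survive it and contribute nothing useful to the model. A minimal sketch of an extra cleaning pass is shown here; the helper `clean_documents` and the `min_chars` threshold are assumptions for illustration, not part of the original notebook.

```python
def clean_documents(docs, min_chars=1):
    """Strip surrounding whitespace and drop documents shorter than min_chars."""
    cleaned = []
    for doc in docs:
        text = str(doc).strip()
        if len(text) >= min_chars:
            cleaned.append(text)
    return cleaned

# Toy input standing in for df['full_text'].dropna().astype(str).tolist()
sample = ["  First article text.  ", "", "   ", "Second article text."]
print(clean_documents(sample))
```

In the notebook this could be applied as `documents = clean_documents(documents)` before fitting the model.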



## 🧠 Create and Fit the BERTopic Model

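We create a BERTopic model instance and fit it on the list of documents. `fit_transform` returns one topic ID per document (`topics`) along with the corresponding probabilities (`probs`). Fitting computes an embedding for every document, so this step can take a while on large datasets.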


In [None]:

topic_model = BERTopic(verbose=True)
topics, probs = topic_model.fit_transform(documents)
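
The dimensionality-reduction step (UMAP) is stochastic, so repeated runs with default settings can yield different topics. One way to make runs repeatable is to pass a seeded UMAP model, sketched below. The function name `build_topic_model`, the seed value, and `min_topic_size=10` are assumptions for illustration; the UMAP values follow what the BERTopic documentation describes as its defaults.

```python
def build_topic_model(seed=42, min_topic_size=10):
    """Build a BERTopic model whose UMAP step is seeded for repeatable runs."""
    # Imported lazily so the function can be defined before the packages are installed.
    from bertopic import BERTopic
    from umap import UMAP

    umap_model = UMAP(
        n_neighbors=15,
        n_components=5,
        min_dist=0.0,
        metric="cosine",
        random_state=seed,  # fixes the stochastic part of the reduction
    )
    return BERTopic(
        umap_model=umap_model,
        min_topic_size=min_topic_size,
        verbose=True,
    )
```

It would be used as `topic_model = build_topic_model()` followed by the same `fit_transform(documents)` call. Fixing the seed trades some speed for repeatability, since UMAP cannot parallelize as freely with a fixed random state.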



## 🔍 View Top Topics

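`get_topic_info()` returns a DataFrame with one row per topic, including its size and a keyword-based name; `head()` shows the first five rows. Topic `-1`, if present, collects the documents that BERTopic treats as outliers rather than a real topic.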


In [None]:

topic_info = topic_model.get_topic_info()
topic_info.head()
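
As a rough cross-check, the topic sizes reported by `get_topic_info()` can be reproduced by tallying the assignments in `topics`. The list below is made-up data standing in for the real `topics` output.

```python
from collections import Counter

# Hypothetical assignments; in the notebook this is the `topics` list from fit_transform
example_topics = [0, 1, -1, 0, 2, 0, -1, 1]

counts = Counter(example_topics)
outliers = counts.pop(-1, 0)  # topic -1 collects outlier documents

print(f"Outlier documents: {outliers}")
for topic_id, n_docs in counts.most_common():
    print(f"Topic {topic_id}: {n_docs} documents")
```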



## 📊 Visualize Topic Distribution

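We use BERTopic’s built-in `visualize_topics()` to draw an interactive intertopic distance map. Each circle is a topic, its size reflects how many documents the topic contains, and circles that sit close together represent topics with similar content.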


In [None]:

topic_model.visualize_topics()



## 🔥 Visualize Topic Similarity (Heatmap)

This heatmap shows how semantically similar the topics are to one another.


In [None]:

topic_model.visualize_heatmap()
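
The heatmap is built from pairwise similarity scores between topic vectors. To illustrate the underlying idea, and assuming cosine similarity as the measure, the sketch below computes such a matrix for three made-up topic vectors in plain Python.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up 3-dimensional topic vectors, not taken from a fitted model
topic_vectors = {
    0: [1.0, 0.0, 0.0],
    1: [1.0, 1.0, 0.0],
    2: [0.0, 0.0, 1.0],
}

# Full pairwise matrix, keyed by (row topic, column topic)
similarity = {
    (i, j): cosine_similarity(u, v)
    for i, u in topic_vectors.items()
    for j, v in topic_vectors.items()
}

for (i, j), score in similarity.items():
    print(f"topic {i} vs topic {j}: {score:.2f}")
```

Values near 1 mean two topics point in almost the same direction, and values near 0 mean they are unrelated, which is what the darker and lighter cells of the heatmap encode.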



## 🌳 Visualize Topic Hierarchy

This dendrogram groups topics into a hierarchy based on their semantic relationships.


In [None]:

topic_model.visualize_hierarchy()



## 🧮 Visualize Term Scores per Topic

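`visualize_barchart()` plots the highest-scoring terms for each topic. Called without arguments, it shows the first few topics; to restrict the chart to specific topics, pass a list of topic IDs via the `topics` parameter.


In [None]:

# To plot specific topics only: topic_model.visualize_barchart(topics=[1])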
topic_model.visualize_barchart()
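
Each `visualize_*` method returns a Plotly figure, so the charts can also be written to standalone HTML files, which helps when the `notebook` renderer is unavailable. A minimal sketch follows; the helper `save_figures`, the output directory, and the file names are hypothetical.

```python
from pathlib import Path

def save_figures(figures, out_dir="figures"):
    """Write each Plotly figure in `figures` (name -> figure) to <out_dir>/<name>.html."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for name, fig in figures.items():
        target = out_path / f"{name}.html"
        fig.write_html(str(target))  # Plotly's Figure.write_html
        written.append(target)
    return written
```

In the notebook this might be called as `save_figures({"topics": topic_model.visualize_topics(), "barchart": topic_model.visualize_barchart()})`.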
