Additional representations did not update with topic reduction #2035
Comments
Strange, it seems that they are updated for some topics but not for others. If I'm not mistaken, topic 396 is not properly updated but topic 0 is, right? Also, can you share your full code along with the versions of your environment?
Thanks for such a great project and the quick response @MaartenGr! The additional representations do not get updated with the [...] I am running bertopic 0.16.2 on Python 3.10.12. The code:
from sentence_transformers import SentenceTransformer
from umap import UMAP
from hdbscan import HDBSCAN
from sklearn.feature_extraction.text import CountVectorizer
from bertopic.representation import KeyBERTInspired, MaximalMarginalRelevance
from bertopic import BERTopic
import pickle
embedding_model = SentenceTransformer("all-mpnet-base-v2")
with open("/content/drive/MyDrive/code_stuff/mpnet_embeddings.pickle", "rb") as pkl:
    embeddings = pickle.load(pkl)
umap_model = UMAP(n_neighbors=20, n_components=5, min_dist=0.0,
                  metric="cosine", random_state=42)
hdbscan_model = HDBSCAN(min_cluster_size=20, min_samples=10,
                        metric="euclidean", cluster_selection_method="eom",
                        prediction_data=True)
vectorizer_model = CountVectorizer(stop_words="english", min_df=5, max_df=0.9,
                                   ngram_range=(1, 3))
keybert_model = KeyBERTInspired()
mmr_model = MaximalMarginalRelevance(diversity=0.3)
representation_model = {"KeyBERT": keybert_model,
                        "MMR": mmr_model}
topic_model = BERTopic(
    embedding_model=embedding_model,
    umap_model=umap_model,
    hdbscan_model=hdbscan_model,
    vectorizer_model=vectorizer_model,
    representation_model=representation_model,
    top_n_words=10,
    verbose=True,
)
topics, _ = topic_model.fit_transform(docs, embeddings=embeddings)
topic_model.reduce_topics(docs, nr_topics=400)
topic_model.get_topic_info()
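One way to narrow down which aspects are stale is to snapshot them before and after the reduction and compare the two. The sketch below works on plain dictionaries only. It assumes the aspects can be copied out of the model as a mapping from aspect name to per-topic keywords (in BERTopic this is probably the `topic_aspects_` attribute, but treat that name as an assumption). The helper `unchanged_aspects` and the toy data are hypothetical and are not part of BERTopic.

```python
from copy import deepcopy


def unchanged_aspects(before, after):
    """Return names of aspects whose per-topic keywords are identical in both snapshots."""
    return sorted(
        name for name in before
        if name in after and before[name] == after[name]
    )


# Toy snapshots standing in for copies taken before and after reduce_topics.
# The real data would come from the fitted model (assumption, see above).
before = {
    "KeyBERT": {0: ["cat", "dog"], 1: ["car", "road"], 2: ["bus", "train"]},
    "MMR": {0: ["cat", "pet"], 1: ["car", "engine"], 2: ["bus", "rail"]},
}
after = deepcopy(before)
# Pretend only MMR was recomputed after topics 1 and 2 were merged.
after["MMR"] = {0: ["cat", "pet"], 1: ["car", "bus", "rail"]}

print(unchanged_aspects(before, after))
```

An aspect that comes back completely unchanged after the number of topics dropped is a strong hint that it was not recomputed.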
I'm not sure if I understand correctly. The code you shared does not show loading a saved BERTopic model, right? Also, if you need to use [...]
Sorry, that's my bad. I shared the original code and not the code for the subsequent runs when I loaded the model. Again, it only happens when loading a saved model, so I will be fine. I am still looking into the best way to reduce the number of topics for my case, as I do want the small clusters if they are distinct enough; that's why I'm looking into merging methods.
@vidieo Then it might indeed be helpful to start with [...]
Hi, I am trying to reduce the number of topics that I have with
topic_model.reduce_topics(docs, nr_topics=400)
which works fine. However, when I ran
topic_model.get_topic_info()
I got mismatched representations. Only the main representation was updated, and all the other aspects were from the old topics.
I understand the preferred method of controlling the number of topics is min_cluster_size, which I did use, but it would be nice to know if I could use reduce_topics with the additional representations updated. Thanks in advance!
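The mismatch described above can also be checked mechanically by comparing the topic ids of the main representation with those of each additional aspect. This is a minimal sketch on plain dictionaries. It assumes the main representation and the aspects are available as mappings keyed by topic id (presumably `topic_representations_` and `topic_aspects_` on the fitted model, both treated here as assumptions). The function `aspect_key_mismatches` and the toy data are hypothetical.

```python
def aspect_key_mismatches(main, aspects):
    """Report, per aspect, topic ids that the main representation lacks or that the aspect lacks."""
    main_ids = set(main)
    report = {}
    for name, per_topic in aspects.items():
        ids = set(per_topic)
        extra = sorted(ids - main_ids)    # ids left over from the old topics
        missing = sorted(main_ids - ids)  # current topics without this aspect
        if extra or missing:
            report[name] = {"extra": extra, "missing": missing}
    return report


# Toy data: the main representation has 3 topics after reduction,
# while the KeyBERT aspect still carries 5 topics from before the reduction.
main = {-1: ["the", "of"], 0: ["cat", "dog"], 1: ["car", "bus"]}
aspects = {
    "KeyBERT": {-1: ["a"], 0: ["cat"], 1: ["car"], 2: ["bus"], 3: ["train"]},
    "MMR": {-1: ["the"], 0: ["cat", "pet"], 1: ["car", "rail"]},
}

print(aspect_key_mismatches(main, aspects))
```

An empty report only means the topic ids line up; the keywords themselves could still be stale, so it is worth spot-checking a few topics by hand as well.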