Support of clustering plot (2D UMAP) #584
Comments
I think the answer is that BERTopic doesn't 'support' this particular visualization, but it is relatively easy to do on your own. What you need is a 2D representation of the embeddings. The simplest way is to run a 2D reduction on the UMAP embeddings saved within your current model, something like: umap_2d = umap.UMAP(n_components=2).fit_transform(MyBERTopicModel.umap_model.embedding_) Then you can use the output as x, y coordinates for a scatter plot. The above reduction is not going to be very pretty, however, because it is a 2D UMAP reduction of the 5D UMAP reduction of the original embeddings. You can get a 'nicer' looking scatter by creating a 2D t-SNE projection from umap_model.embedding_ as above, the downside being that t-SNE takes longer than UMAP. Alternatively, you can take the original embeddings and UMAP-reduce them down to 2D the way Maarten did in the original Medium article. Not sure if any of this is helpful. I totally agree that plotting out the embeddings is very useful. It has fundamentally altered how I understand BERTopic. If you want code to do some of the above, you can refer to a GitHub repo I put together as part of the discussion at #582. Hope this is helpful and not too in the weeds. |
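The suggestion above can be sketched roughly as follows. This is a minimal sketch, not BERTopic functionality: the helper names `project_to_2d`, `group_by_topic` and `plot_clusters` are made up for illustration, the UMAP parameters are assumptions, and it presumes `umap-learn` and `matplotlib` are installed. Any object with a `fit_transform` method (for example scikit-learn's `TSNE`) can be passed in as the reducer instead.

```python
def project_to_2d(embeddings, reducer=None):
    """Project embeddings to 2D; `reducer` is any object with fit_transform."""
    if reducer is None:
        # Assumption: umap-learn is installed; these parameters are illustrative.
        import umap
        reducer = umap.UMAP(n_neighbors=15, n_components=2, min_dist=0.0,
                            metric="cosine", random_state=42)
    coords = reducer.fit_transform(embeddings)
    return [(float(row[0]), float(row[1])) for row in coords]


def group_by_topic(coords, topics):
    """Collect the 2D points of each topic so every cluster can get its own color."""
    groups = {}
    for point, topic in zip(coords, topics):
        groups.setdefault(topic, []).append(point)
    return groups


def plot_clusters(groups):
    """Scatter plot with one color per topic; outliers (topic -1) in light grey."""
    import matplotlib.pyplot as plt
    fig, ax = plt.subplots(figsize=(10, 8))
    for topic, points in sorted(groups.items()):
        xs, ys = zip(*points)
        color = "lightgrey" if topic == -1 else None  # None = next default color
        ax.scatter(xs, ys, s=2, color=color, label=str(topic))
    ax.set_title("2D projection of document embeddings")
    return fig
```

Here `embeddings` would be either the original document embeddings or `topic_model.umap_model.embedding_`, and `topics` the list returned by `fit_transform`; something like `plot_clusters(group_by_topic(project_to_2d(embeddings), topics))` would then draw the plot.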
Hey Dan! |
@karelin You are in luck! I am almost finished with a function called |
@MaartenGr Awesome! |
@karelin The PR is still currently in the works but I just implemented the pip install --upgrade git+https://github.com/MaartenGr/BERTopic.git@refs/pull/578/merge` Doing so allows you to try it out before the release of the new version. The official release most likely will take a couple more weeks but I will let you know when it is ready! |
Hi! I tried the command above to install the branch, but it didn't work: when I run pip list, bertopic is still at version 0.10.0. |
I also get this warning: WARNING: Did not find branch or tag 'refs/pull/578/merge', assuming revision or ref. |
It seems that there was a character at the end of the link that should have been removed. The install should be as follows:
pip install git+https://github.com/MaartenGr/BERTopic.git@refs/pull/578/merge
After doing so, you can test it by running something like the following to see if you now have the new features:
from bertopic import BERTopic
from sklearn.datasets import fetch_20newsgroups
docs = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))["data"]
topic_model = BERTopic(verbose=True)
topics, probs = topic_model.fit_transform(docs)
hierarchical_topics = topic_model.hierarchical_topics(docs, topics) |
Thank you, but even without the character it's not working. Here's the output: Collecting git+https://github.com/MaartenGr/BERTopic.git@refs/pull/578/merge |
And when I try hierarchical_topics = topic_model.hierarchical_topics(abstract, topics), I get: AttributeError Traceback (most recent call last) AttributeError: 'BERTopic' object has no attribute 'hierarchical_topics' |
I would advise starting from a completely fresh environment and then installing BERTopic via the link provided instead. Then, after installing, make sure to restart the notebook that you are working in. |
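One way to check whether the running kernel actually picked up the new install is to look at the installed version and at whether the new method exists. This is a small sketch using only the standard library; the helper names `installed_version` and `has_feature` are made up for illustration.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def has_feature(obj, name):
    """True if `obj` exposes a callable attribute called `name`."""
    return callable(getattr(obj, name, None))
```

After restarting the notebook, `installed_version("bertopic")` and `has_feature(BERTopic, "hierarchical_topics")` (with `from bertopic import BERTopic`) should show whether the old 0.10.0 install is still the one being imported.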
I've been able to test the features and I have a request : Thank you again for your AMAZING work ! |
Also, in the hierarchical visualization we don't see the text on hover, and we can't click on it either, as we can in the non-hierarchical one. |
I believe that Plotly does not generate newlines on either carriage returns or line feeds. What might work is using
I just checked the Plotly documentation and from what I can tell this is unfortunately not possible in their current API.
Strange, for me the following is working without any problems:
from bertopic import BERTopic
from sklearn.datasets import fetch_20newsgroups
docs = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))["data"]
topic_model = BERTopic(verbose=True)
topics, probs = topic_model.fit_transform(docs)
hierarchical_topics = topic_model.hierarchical_topics(docs, topics)
Then, visualize the hierarchy with hover:
topic_model.visualize_hierarchy(hierarchical_topics=hierarchical_topics)
Could you share the code you have been using to get the hierarchical visualization? |
I meant this function: Run the visualization with the original embeddings. Hovering doesn't work like in: As for the hover and clickable URL, in the Doc2Map package he used this: def plotly_interactive_map(self, G=None, root=None):
I believe that customdata and hovertemplate could be used in a similar manner in Scattergl (https://plotly.github.io/plotly.py-docs/generated/plotly.graph_objects.Scattergl.html). Or maybe I'm completely wrong, I've never built Python packages before... Thanks again! |
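To make the `customdata` / `hovertemplate` idea concrete, here is a minimal sketch of a `scattergl` trace that carries a truncated document and a URL for every point. It is not how BERTopic or Doc2Map build their plots: the helper names and the `max_chars` cutoff are assumptions for illustration. The URL is only displayed as text in the hover label; this sketch does not make it clickable.

```python
def build_hover_trace(xs, ys, docs, urls, max_chars=100):
    """Build a plotly 'scattergl' trace dict with per-point hover text."""
    if not (len(xs) == len(ys) == len(docs) == len(urls)):
        raise ValueError("xs, ys, docs and urls must have the same length")
    # Truncate documents so the figure does not have to hold every full text.
    customdata = [[doc[:max_chars], url] for doc, url in zip(docs, urls)]
    return {
        "type": "scattergl",
        "x": list(xs),
        "y": list(ys),
        "mode": "markers",
        "marker": {"size": 4},
        "customdata": customdata,
        # <extra></extra> hides the secondary box that shows the trace name.
        "hovertemplate": "%{customdata[0]}<br>%{customdata[1]}<extra></extra>",
    }


def show_trace(trace):
    """Render the trace; assumes plotly is installed."""
    import plotly.graph_objects as go
    go.Figure(data=[trace]).show()
```

Truncating the documents keeps the figure small, which matters when tens of thousands of points carry hover text.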
That is correct, hovering is turned off by default as you risk memory errors by loading in so many documents. The following should do the trick:
topic_model.visualize_hierarchical_documents(
    abstract,
    hierarchical_topics,
    embeddings=embeddings,
    hide_document_hover=False
)
There are quite a few parameters that you can find in the visualization functions. Going through the docstrings should help quite a bit.
Unfortunately this is not possible at the moment as |
Thank you ! Thanks so much again !!! |
@doubianimehdi |
Seeing this was implemented in v0.11, I will close this issue for now. Feel free to ping me if you want to continue this discussion. |
@MaartenGr Hi ! Thank you for your wonderful work ! I was getting back to this implementation because I wanted to do a visualization similar to this : https://get.carrotsearch.com/foamtree/latest/demos/large.html But for that I have to use this : https://get.carrotsearch.com/foamtree/latest/api/ I'm not a front end man at all ... unfortunately ... I was wondering if you or some talented member of this community, could do this or help to do this ? Thank you so much again ! |
@doubianimehdi If you want to keep it straightforward, then you can also use plotly for this as it has implemented Treemaps. Other than that, I am not familiar with carrotsearch unfortunately. |
@MaartenGr Thanks ! That's what I was thinking for my Proof of Concept phase ... but later the beautiful interface of carrotsearch would be a good addition to my final product ! |
@MaartenGr I'm having a hard time seeing how I can use the hierarchical topics dataframe to adapt it to a treemap ... could you give me some clue to achieve this ? Thank you ! |
@doubianimehdi No problem, it is just a few lines of code to get this working:
# Prepare children
children_left = (hierarchical_topics.Child_Left_ID + "_" + hierarchical_topics.Child_Left_Name).tolist()
children_right = (hierarchical_topics.Child_Right_ID + "_" + hierarchical_topics.Child_Right_Name).tolist()
children = children_left + children_right
# Prepare parents
parents = (hierarchical_topics.Parent_ID + "_" + hierarchical_topics.Parent_Name).tolist()
parents = parents + parents
# Plot treemap
import plotly.express as px
fig = px.treemap(names = children, parents = parents)
fig.update_traces(root_color="lightgrey")
fig.update_layout(margin = dict(t=50, l=25, r=25, b=25))
fig.show() |
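For anyone who finds the structure of the hierarchical topics dataframe confusing: each row links one parent to its left and right child, and a treemap only needs a label and a parent label per node. The sketch below builds those two lists in plain Python and adds the root explicitly with an empty parent. The helper name `treemap_lists` is made up for illustration, and the rows are assumed to be dictionaries with the column names used in the snippet above, for example from `hierarchical_topics.to_dict("records")`.

```python
def treemap_lists(rows):
    """Turn parent/left-child/right-child rows into treemap labels and parents."""
    labels, parents, parent_labels = [], [], []
    for row in rows:
        parent = f"{row['Parent_ID']}_{row['Parent_Name']}"
        parent_labels.append(parent)
        for side in ("Left", "Right"):
            child_id = row[f"Child_{side}_ID"]
            child_name = row[f"Child_{side}_Name"]
            labels.append(f"{child_id}_{child_name}")
            parents.append(parent)
    # A parent that never appears as a child is a root; give it an empty parent.
    children = set(labels)
    for root in dict.fromkeys(p for p in parent_labels if p not in children):
        labels.append(root)
        parents.append("")
    return labels, parents
```

The two lists can then be passed to plotly, for example `go.Figure(go.Treemap(labels=labels, parents=parents))` with `import plotly.graph_objects as go`.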
@MaartenGr Thank you! I want to go further and show the full hierarchy with a slider to navigate through the levels of topics. What would it take to do it? |
@doubianimehdi I am not sure whether something like that is possible. You would have to dive into the source code of plotly I think. |
@MaartenGr https://towardsdatascience.com/make-a-treemap-in-python-426cee6ee9b8 it's possible but the structure of hierarchical topics is confusing to me ... i'm having a hard time translating it ... |
@doubianimehdi If you follow along with that tutorial and use the code I shared above, I think it might be possible. You would have to try some things out yourself first. Do note though that the widget is a Jupyter-specific module and not part of plotly. |
Hi @MaartenGr I've done some tests ... i'm almost there but I can't wrap my head around something :
Then I'm using this snippet of code:
def generate_treemap(level):
max_level = tree_df['Level'].max()
fig = go.Figure(figures[0])
for level in range(1, max_level + 1):
steps = []
sliders = [dict(
fig.update_layout(sliders=sliders)
It works, but moving the slider does not change the nesting or the level. Can you help? |
I am not entirely sure but based on the Plotly documentation it seems that you will have to do an |
I DID IT!
def create_treemap_data(level):
max_level = tree_df['Level'].max()
fig_dict = {
# Create frames for each level
for level in range(1, max_level + 1):
# Create slider steps
steps = []
# Configure slider
sliders = [{"active": 0, "steps": steps, "x": 0.1, "y": 0, "len": 0.9}]
# Add slider to layout
fig_dict["layout"]["sliders"] = sliders
# Create figure from the dictionary
fig = go.Figure(fig_dict)
fig.show()
It works because it handles the transition and animation when you move the slider. |
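Since the snippet above is only partly visible, here is a hedged, self-contained sketch of the same general idea: one animation frame per depth level, with slider steps that use plotly's `animate` method. It is not the code from the comment. The helper names and the child-to-parent mapping used as input are assumptions for illustration, and only the slider position values are taken from the snippet.

```python
def node_depths(parent_of):
    """Depth of every node in a child -> parent mapping (roots have depth 0)."""
    depths = {}

    def depth(node):
        if node not in depths:
            parent = parent_of.get(node)
            depths[node] = 0 if parent is None else depth(parent) + 1
        return depths[node]

    for node in set(parent_of) | set(parent_of.values()):
        depth(node)
    return depths


def build_slider_figure(parent_of):
    """Figure dict with one treemap frame per level and a slider to switch levels."""
    if not parent_of:
        raise ValueError("need at least one child -> parent link")
    depths = node_depths(parent_of)
    frames, steps = [], []
    for level in range(1, max(depths.values()) + 1):
        # Keep only nodes down to this level, ordered by depth and then name.
        labels = sorted((n for n in depths if depths[n] <= level),
                        key=lambda n: (depths[n], n))
        parents = [parent_of.get(n, "") for n in labels]
        name = str(level)
        frames.append({"name": name,
                       "data": [{"type": "treemap", "labels": labels,
                                 "parents": parents}]})
        # Treemap traces need redraw=True for a frame change to be drawn.
        steps.append({"label": name, "method": "animate",
                      "args": [[name], {"mode": "immediate",
                                        "frame": {"duration": 0, "redraw": True}}]})
    sliders = [{"active": 0, "steps": steps, "x": 0.1, "y": 0, "len": 0.9}]
    return {"data": frames[0]["data"], "layout": {"sliders": sliders},
            "frames": frames}
```

With plotly installed, `go.Figure(build_slider_figure(parent_of)).show()` would render it, where `parent_of` maps each topic label to its parent label and the root does not appear as a key.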
Great! Glad to hear that you found the solution. |
@MaartenGr thank you for your help ! that would be great to have a visualization like that in bertopic :) |
@doubianimehdi I cannot make any promises, as I do not want to depend too much on plotly since it might be replaced in the future with a different plotting library, but I will definitely keep it in mind! |
Hi there,
Just wondering whether the current version of BERTopic supports a 2D UMAP plot with clustering, like the first plot in the original post: https://towardsdatascience.com/topic-modeling-with-bert-779f7db187e6
I didn't find such a plot in the documentation, but it could be rather useful when analyzing a document collection.