Commit

Update README.md
ddangelov committed Mar 23, 2020
1 parent d3ac5c9 commit c12fa6e
Showing 1 changed file with 3 additions and 3 deletions.
6 changes: 3 additions & 3 deletions README.md
@@ -45,17 +45,17 @@ attracted the documents to the dense area are the topic words.
**3. Find dense areas of documents using [HDBSCAN](https://github.com/scikit-learn-contrib/hdbscan).**
>The colored areas are the dense areas of documents. Red points are outliers that do not belong to a specific cluster.
-![HDBSCAN Document Clusters](images/hdbscan_docs.png)
+![HDBSCAN Document Clusters](https://github.com/ddangelov/Top2Vec/blob/master/images/hdbscan_docs.png)

**4. For each dense area calculate the centroid of document vectors in original dimension, this is the topic vector.**
>The red points are outlier documents and do not get used for calculating the topic vector. The purple points are the document vectors that belong to a dense area, from which the topic vector is calculated.
-![Topic Vector](images/topic_vector.svg)
+![Topic Vector](https://github.com/ddangelov/Top2Vec/blob/master/images/topic_vector.svg)

**5. Find n-closest word vectors to the resulting topic vector**
>The closest word vectors in order of proximity become the topic words.
-![Topic Words](images/topic_words.svg)
+![Topic Words](https://github.com/ddangelov/Top2Vec/blob/master/images/topic_words.svg)

Installation
------------
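
Steps 4 and 5 in the README context above can be illustrated with a minimal sketch in plain Python. This is not the Top2Vec implementation. The function names (`centroid`, `cosine_similarity`, `topic_words`), the toy vectors, and the use of cosine similarity as the measure of "closest" are all assumptions made for illustration. The cluster labels are supplied by hand in place of a real HDBSCAN run, with `-1` marking outliers as HDBSCAN does.

```python
import math


def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def topic_words(doc_vectors, labels, word_vectors, cluster, n=2):
    """Return (topic_vector, n closest words) for one cluster label."""
    # Step 4: average only the documents assigned to this cluster.
    # Outliers (label -1) never match a cluster label, so they are skipped.
    members = [vec for vec, label in zip(doc_vectors, labels) if label == cluster]
    topic_vector = centroid(members)

    # Step 5: rank words by similarity to the topic vector, closest first.
    # Cosine similarity is an assumption here; the README only says "closest".
    ranked = sorted(
        word_vectors,
        key=lambda word: cosine_similarity(word_vectors[word], topic_vector),
        reverse=True,
    )
    return topic_vector, ranked[:n]


# Toy data (hypothetical): four documents, two clusters, one outlier.
doc_vectors = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0], [-1.0, -1.0]]
labels = [0, 0, 1, -1]
word_vectors = {"cat": [0.9, 0.1], "dog": [1.0, 0.3], "car": [0.0, 1.0]}

topic_vector, words = topic_words(doc_vectors, labels, word_vectors, cluster=0)
print(topic_vector, words)
```

In this toy setup the outlier document does not influence the topic vector for cluster 0, which matches the README's note that outliers are not used when calculating the topic vector.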