Commit
Update README.md
ddangelov committed Mar 23, 2020
1 parent b060bd4 commit c33e6c5
Showing 1 changed file with 2 additions and 1 deletion.
3 changes: 2 additions & 1 deletion README.md
@@ -32,7 +32,8 @@ attracted the documents to the dense area are the topic words.

**The Algorithm:**

-1. Create jointly embedded document and word vectors using [Doc2Vec](https://radimrehurek.com/gensim/models/doc2vec.html).
+1. Create jointly embedded document and word vectors using [Doc2Vec](https://radimrehurek.com/gensim/models/doc2vec.html).
+![Joint Document and Word Embedding](images/doc_word_embedding.svg)
2. Create lower dimensional embedding of document vectors using [UMAP](https://github.com/lmcinnes/umap)
3. Find dense areas of documents using [HDBSCAN](https://github.com/scikit-learn-contrib/hdbscan)
4. For each dense area calculate centroid of document vectors in original dimension. (centroid = topic vector)
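
Step 4 above can be illustrated with a minimal, dependency-free sketch. It is not the repository's implementation: the function name `topic_vectors`, the toy vectors, and the labels are all hypothetical. It assumes the cluster labels have already been produced (for example by HDBSCAN on the UMAP-reduced vectors) and that noise points carry the label `-1`, which is HDBSCAN's convention. The centroids are computed from the original-dimension document vectors, not the reduced ones.

```python
def topic_vectors(doc_vectors, labels, noise_label=-1):
    """Return {cluster_label: centroid} for every non-noise cluster.

    doc_vectors: document vectors in the ORIGINAL embedding dimension.
    labels:      one cluster label per document (assumed to come from a
                 density-based clusterer; noise_label marks unclustered docs).
    """
    sums = {}    # running per-dimension sum for each cluster
    counts = {}  # number of documents seen in each cluster
    for vec, label in zip(doc_vectors, labels):
        if label == noise_label:
            continue  # noise documents do not contribute to any topic
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        for i, x in enumerate(vec):
            sums[label][i] += x
        counts[label] += 1
    # centroid = arithmetic mean of the cluster's document vectors
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


# Toy data (hypothetical): two dense areas and one noise document.
doc_vectors = [[0.0, 0.0], [2.0, 2.0], [10.0, 0.0], [10.0, 4.0], [50.0, 50.0]]
labels = [0, 0, 1, 1, -1]

centroids = topic_vectors(doc_vectors, labels)
print(centroids)  # {0: [1.0, 1.0], 1: [10.0, 2.0]}
```

Each resulting centroid plays the role of a topic vector in the sense of step 4; with real embeddings the same averaging would typically be done with a vectorized array library rather than Python lists.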