Gensim sparse corpus fix (#82)
* updated gensim prepare logic to allow csc_corpus as an input
* patched gensim.prepare to properly accept corpora in csc_matrix format
Alex Loosley authored and bmabey committed Feb 7, 2017
1 parent 4dc9b0e commit c362ddb
Showing 1 changed file with 3 additions and 1 deletion.
4 changes: 3 additions & 1 deletion pyLDAvis/gensim.py
@@ -16,9 +16,11 @@ def _extract_data(topic_model, corpus, dictionary, doc_topic_dists=None):
     import gensim
 
     if not gensim.matutils.ismatrix(corpus):
-        corpus_csc = gensim.matutils.corpus2csc(corpus)
+        corpus_csc = gensim.matutils.corpus2csc(corpus, num_terms=len(dictionary))
     else:
         corpus_csc = corpus
+        # Need corpus to be a streaming gensim list corpus for len and inference functions below:
+        corpus = gensim.matutils.Sparse2Corpus(corpus_csc)
 
     vocab = list(dictionary.token2id.keys())
     # TODO: add the hyperparam to smooth it out? no beta in online LDA impl.. hmm..
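
The two changes in this hunk can be illustrated without gensim. The sketch below is a minimal pure-Python stand-in, not gensim's implementation: `bow_to_term_doc` and `term_doc_to_bow` are hypothetical helpers that only mimic the shape behaviour of `corpus2csc` and `Sparse2Corpus`, and the toy corpus is made up. It shows why passing `num_terms=len(dictionary)` matters (without it, the row count can only be inferred from the largest term id that occurs in the corpus, so trailing dictionary terms that never appear are dropped) and how a term-by-document matrix can be turned back into a streaming bag-of-words corpus.

```python
# Hypothetical pure-Python stand-ins for illustration; NOT gensim's implementation.

def bow_to_term_doc(corpus, num_terms=None):
    """Build a dense term-by-document count matrix (rows = terms, columns = docs)
    from a bag-of-words corpus: a list of documents, each a list of (term_id, count)."""
    docs = [list(doc) for doc in corpus]
    if num_terms is None:
        # Without an explicit size, the row count can only be inferred from
        # the largest term id that actually occurs in the corpus.
        num_terms = 1 + max(
            (term_id for doc in docs for term_id, _ in doc), default=-1
        )
    matrix = [[0] * len(docs) for _ in range(num_terms)]
    for col, doc in enumerate(docs):
        for term_id, count in doc:
            matrix[term_id][col] += count
    return matrix


def term_doc_to_bow(matrix):
    """Inverse direction: yield one bag-of-words document per matrix column."""
    num_docs = len(matrix[0]) if matrix else 0
    for col in range(num_docs):
        yield [(term_id, row[col]) for term_id, row in enumerate(matrix) if row[col]]


# Toy data (assumed): the dictionary knows 5 terms, but ids 3 and 4 never occur.
dictionary_size = 5
corpus = [[(0, 2), (1, 1)], [(1, 3), (2, 1)]]

inferred = bow_to_term_doc(corpus)                              # rows inferred from data
explicit = bow_to_term_doc(corpus, num_terms=dictionary_size)   # rows fixed by dictionary

print("rows when inferred:", len(inferred))
print("rows when explicit:", len(explicit))

# A matrix passed in directly can be streamed back out as documents.
round_trip = list(term_doc_to_bow(explicit))
```

With the inferred size, the matrix has fewer rows than the vocabulary built from `dictionary`, so anything indexed by vocabulary position would be misaligned; fixing the row count to the dictionary size avoids that.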