Add KeyedVectors, FastText and CoherenceModel to API reference. #1193

Merged: 4 commits, Mar 7, 2017
3 changes: 3 additions & 0 deletions docs/src/apiref.rst
@@ -39,13 +39,16 @@ Modules:
 models/lda_worker
 models/atmodel
 models/word2vec
+models/keyedvectors
 models/doc2vec
 models/phrases
+models/coherencemodel
 models/wrappers/ldamallet
 models/wrappers/dtmmodel
 models/wrappers/ldavowpalwabbit.rst
 models/wrappers/wordrank
 models/wrappers/varembed
+models/wrappers/fasttext
 similarities/docsim
 similarities/index
 topic_coherence/aggregation
9 changes: 9 additions & 0 deletions docs/src/models/keyedvectors.rst
@@ -0,0 +1,9 @@
:mod:`models.keyedvectors` -- Store and query word vectors
==========================================================

.. automodule:: gensim.models.keyedvectors
:synopsis: Store and query word vectors
:members:
:inherited-members:
:undoc-members:
:show-inheritance:
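
For context on what this new page documents: `KeyedVectors` holds the learned word-to-vector mapping separately from the training model, so vectors can be queried without the full Word2Vec machinery. A minimal usage sketch (not part of this PR; `vectors.txt` is a hypothetical file in word2vec text format):

    from gensim.models.keyedvectors import KeyedVectors

    # Load vectors saved in the word2vec C text format
    # ('vectors.txt' is a hypothetical local file).
    word_vectors = KeyedVectors.load_word2vec_format('vectors.txt', binary=False)

    # Query without keeping the full training model in memory.
    vec = word_vectors['computer']                    # raw numpy vector for one word
    print(word_vectors.most_similar('computer'))      # nearest neighbours by cosine similarity
    print(word_vectors.similarity('human', 'computer'))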
9 changes: 9 additions & 0 deletions docs/src/models/wrappers/fasttext.rst
@@ -0,0 +1,9 @@
:mod:`models.wrappers.fasttext` -- FastText Word Embeddings
===========================================================

.. automodule:: gensim.models.wrappers.fasttext
:synopsis: FastText Embeddings
:members:
:inherited-members:
:undoc-members:
:show-inheritance:
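
For context, this wrapper shells out to Facebook's fastText binary for training and then exposes the result for querying inside gensim. A hedged sketch of that workflow under the gensim-1.x wrapper API (not part of this PR; both paths below are hypothetical):

    from gensim.models.wrappers import FastText

    # Train via an external fastText executable; the first argument points to
    # the compiled binary, corpus_file to a plain-text corpus (both hypothetical).
    model = FastText.train('/usr/local/bin/fasttext', corpus_file='corpus.txt')

    # Or load a model that was trained with fastText directly:
    # model = FastText.load_fasttext_format('wiki.simple')

    # Character n-grams give vectors even for words unseen during training.
    print(model.most_similar('night'))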
7 changes: 0 additions & 7 deletions docs/src/topic_coherence/topic_coherence.rst

This file was deleted.

28 changes: 15 additions & 13 deletions gensim/models/coherencemodel.py
@@ -6,15 +6,15 @@

 """
 Module for calculating topic coherence in python. This is the implementation of
-the four stage topic coherence pipeline from the paper [1].
+the four stage topic coherence pipeline from the paper [1]_.
 The four stage pipeline is basically:

     Segmentation -> Probability Estimation -> Confirmation Measure -> Aggregation.

 Implementation of this pipeline allows for the user to in essence "make" a
 coherence measure of his/her choice by choosing a method in each of the pipelines.

-[1] Michael Roeder, Andreas Both and Alexander Hinneburg. Exploring the space of topic
+.. [1] Michael Roeder, Andreas Both and Alexander Hinneburg. Exploring the space of topic
 coherence measures. http://svn.aksw.org/papers/2015/WSDM_Topic_Evaluation/public.pdf.
 """

@@ -75,15 +75,17 @@ class CoherenceModel(interfaces.TransformationABC):
     2. the ``get_coherence()`` method, which returns the topic coherence.

     One way of using this feature is through providing a trained topic model. A dictionary has to be explicitly
-    provided if the model does not contain a dictionary already.
-    >>> cm = CoherenceModel(model=tm, corpus=corpus, coherence='u_mass') # tm is the trained topic model
-    >>> cm.get_coherence()
+    provided if the model does not contain a dictionary already::
+
+        cm = CoherenceModel(model=tm, corpus=corpus, coherence='u_mass') # tm is the trained topic model
+        cm.get_coherence()

-    Another way of using this feature is through providing tokenized topics such as:
-    >>> topics = [['human', 'computer', 'system', 'interface'],
-        ['graph', 'minors', 'trees', 'eps']]
-    >>> cm = CoherenceModel(topics=topics, corpus=corpus, dictionary=dictionary, coherence='u_mass') # note that a dictionary has to be provided.
-    >>> cm.get_coherence()
+    Another way of using this feature is through providing tokenized topics such as::
+
+        topics = [['human', 'computer', 'system', 'interface'],
+                  ['graph', 'minors', 'trees', 'eps']]
+        cm = CoherenceModel(topics=topics, corpus=corpus, dictionary=dictionary, coherence='u_mass') # note that a dictionary has to be provided.
+        cm.get_coherence()

     Model persistency is achieved via its load/save methods.
     """
@@ -94,11 +96,11 @@ def __init__(self, model=None, topics=None, texts=None, corpus=None, dictionary=
         model : Pre-trained topic model. Should be provided if topics is not provided.
             Currently supports LdaModel, LdaMallet wrapper and LdaVowpalWabbit wrapper. Use 'topics'
             parameter to plug in an as yet unsupported model.
-        topics : List of tokenized topics. If this is preferred over model, dictionary should be provided.
-            eg. topics = [['human', 'machine', 'computer', 'interface'],
+        topics : List of tokenized topics. If this is preferred over model, dictionary should be provided. eg::
+            topics = [['human', 'machine', 'computer', 'interface'],
                       ['graph', 'trees', 'binary', 'widths']]
-        texts : Tokenized texts. Needed for coherence models that use sliding window based probability estimator.
-            eg. texts = [['system', 'human', 'system', 'eps'],
+        texts : Tokenized texts. Needed for coherence models that use sliding window based probability estimator, eg::
+            texts = [['system', 'human', 'system', 'eps'],
                    ['user', 'response', 'time'],
                    ['trees'],
                    ['graph', 'trees'],
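
To make the revised docstring concrete, here is a self-contained sketch of the documented usage (not part of this PR; the toy corpus is invented purely for illustration):

    from gensim.corpora import Dictionary
    from gensim.models import LdaModel
    from gensim.models.coherencemodel import CoherenceModel

    # Toy tokenized corpus, invented for illustration.
    texts = [['human', 'computer', 'system', 'interface'],
             ['graph', 'minors', 'trees', 'eps'],
             ['user', 'response', 'time'],
             ['graph', 'trees']]
    dictionary = Dictionary(texts)
    corpus = [dictionary.doc2bow(text) for text in texts]

    # 'u_mass' needs only the bag-of-words corpus; sliding-window measures
    # such as 'c_v' additionally need the raw tokenized texts.
    lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2)
    cm = CoherenceModel(model=lda, corpus=corpus, dictionary=dictionary, coherence='u_mass')
    print(cm.get_coherence())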
11 changes: 5 additions & 6 deletions gensim/models/doc2vec.py
@@ -606,12 +606,11 @@ def __init__(self, documents=None, dm_mean=None,
         doc-vector training; default is 0 (faster training of doc-vectors only).

         `trim_rule` = vocabulary trimming rule, specifies whether certain words should remain
-        in the vocabulary, be trimmed away, or handled using the default (discard if word count < min_count).
-        Can be None (min_count will be used), or a callable that accepts parameters (word, count, min_count) and
-        returns either util.RULE_DISCARD, util.RULE_KEEP or util.RULE_DEFAULT.
-        Note: The rule, if given, is only used prune vocabulary during build_vocab() and is not stored as part
-        of the model.
-
+        in the vocabulary, be trimmed away, or handled using the default (discard if word count < min_count).
+        Can be None (min_count will be used), or a callable that accepts parameters (word, count, min_count) and
+        returns either util.RULE_DISCARD, util.RULE_KEEP or util.RULE_DEFAULT.
+        Note: The rule, if given, is only used prune vocabulary during build_vocab() and is not stored as part
+        of the model.
         """

         super(Doc2Vec, self).__init__(
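
To make the `trim_rule` contract in the re-indented docstring concrete, a minimal sketch of a custom rule (not part of this PR; the keep-list and documents are invented):

    from gensim import utils
    from gensim.models.doc2vec import Doc2Vec, TaggedDocument

    # Keep a few domain terms unconditionally; defer everything else
    # to the default min_count behaviour.
    def my_trim_rule(word, count, min_count):
        if word in ('gensim', 'doc2vec'):
            return utils.RULE_KEEP
        return utils.RULE_DEFAULT

    docs = [TaggedDocument(words=['gensim', 'makes', 'doc2vec', 'easy'], tags=['doc0'])]
    # With min_count=2 every word here would normally be discarded, so only the
    # keep-listed terms survive. The rule is applied only while build_vocab()
    # runs; it is not stored as part of the model.
    model = Doc2Vec(docs, min_count=2, trim_rule=my_trim_rule)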