Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document LDA-related models #2026

Merged
merged 29 commits into from
Jun 20, 2018
Merged

Conversation

steremma
Copy link
Contributor

In this PR I will add documentation for the LDA model as well as all other related modules such as:

  • ldamodel.py
  • ldaseqmodel.py
  • ldamulticore.py
  • callbacks.py

@menshikh-iv menshikh-iv added the incubator project PR is RaRe incubator project label Apr 10, 2018
@@ -18,63 +18,84 @@

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing module docstring with an example of callback usage


Parameters
----------
**parameters
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type missing (for all kwargs)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do this on purpose because the type can be anything in this function.

viz_env : object, optional
Visdom environment to use for plotting the graph. Unused.
title : str, optional
Title of the graph plot in case `logger == 'visdom'`. Unused.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing empty line at the end of docstring (here and everywhere)

"""
Args:
model : Trained topic model
"""""Get the coherence score.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

""""" - too much, should be 3

@@ -354,7 +477,9 @@ class CallbackAny2Vec(object):
... model.save(output_path)
... self.epoch += 1
...
>>>

#. Create a callback to print logging information to the console:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better move this part to module docstring.

@@ -82,53 +98,70 @@ def update_dir_prior(prior, N, logphat, rho):


class LdaState(utils.SaveLoad):
"""
Encapsulate information for distributed computation of LdaModel objects.
"""Encapsulate information for distributed computation of LdaModel objects.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missed :class:...LdaModel


Objects of this class are sent over the network, so try to keep them lean to
reduce traffic.

"""

def __init__(self, eta, shape, dtype=np.float32):
"""Initialize the state.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to write "Initialize the state." always, this is obvious (here and everywhere)


>>> doc_lda = lda[doc_bow]

The model can be updated (trained) with new documents via
The model can be updated (trained) with new documents.

>>> lda.update(other_corpus)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

other_corpus not defined

----------
prior : {str, list of float, np.ndarray of float, float}
A-priori belief on word probability. If `name` == 'eta' then the prior can be:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check please that this rendered exactly as you expect

@menshikh-iv menshikh-iv added this to To Do in Documentation via automation Apr 13, 2018

Examples
--------
To implement a Callback, inherit from this base class and override one or more of its methods.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add example with LDA + perplexity

"""
for parameter, value in parameters.items():
setattr(self, parameter, value)

def get_value(self):
"""TODO: According to my understanding this method MUST be override. I propose to make ti abstract, or at
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, you are right

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Warnings
--------
you must override this method

"""
Metric class for coherence evaluation
"""
"""Metric class for coherence evaluation.s """
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See also -> CoherenceModel

@@ -50,6 +50,7 @@

logger = logging.getLogger('gensim.models.ldamodel')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to update module docstring (+ example, update current description)

The number of documents is strected in both state objects, so that they are of comparable magnitude.
This procedure corresponds to the stochastic gradient update from
`Hoffman et al. :"Online Learning for Latent Dirichlet Allocation"
<https://www.di.ens.fr/~fbach/mdhnips2010.pdf>`_.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

algorithm number (eq?)

Examples
--------

#. Train an LDA model using a gensim corpus:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update example of class docstring, remove it from here

@@ -62,21 +62,37 @@

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update module docstring (here and everywhere)

-------
list of float
Probabilities for each topic in the mixture. This is essentially a point in the `num_topics - 1` simplex.

"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update other classes please


#. Persist model to disk:

>>> ldaseq.save("ldaseq")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add also "how to get an vectors" to example, this is important always (for all examples)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean how to get the vector representation of a single document right? I will add an example.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exactly

Copy link
Contributor

@menshikh-iv menshikh-iv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't forget to resolve PEP8 issues (see TravisCI) and documentation build issues (see CircleCI)

... ['survey', 'response', 'eps'],
... ['human', 'system', 'computer']
... ]
>>> other_dictionary = Dictionary(other_texts)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a serious mistake, you should use only one dictionary for vectorization: here you should re-use dictionary that was used to create common_corpus or (better) create it yourself from start, convert common_texts, etc.

Same mistake in other files.

@menshikh-iv menshikh-iv moved this from To Do to In Progress (common) in Documentation Jun 20, 2018
@menshikh-iv menshikh-iv changed the title [DNM] Document LDA Document LDA-related models Jun 20, 2018
@menshikh-iv menshikh-iv merged commit e455bc8 into piskvorky:develop Jun 20, 2018
Documentation automation moved this from In Progress (common) to Done Jun 20, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
incubator project PR is RaRe incubator project
Projects
No open projects
Documentation
  
Done
Development

Successfully merging this pull request may close these issues.

None yet

3 participants