DOC: Improve documentation and correct numpydoc errors
Allen Riddell committed Sep 26, 2014
1 parent c8c226e commit 8559427
Showing 6 changed files with 54 additions and 17 deletions.
22 changes: 15 additions & 7 deletions README.rst
@@ -3,17 +3,17 @@ lda: Topic modeling with latent Dirichlet allocation

|pypi| |travis| |crate|

Topic modeling with latent Dirichlet allocation. ``lda`` aims for simplicity.

``lda`` implements latent Dirichlet allocation (LDA) using collapsed Gibbs
sampling. LDA is described in `Blei et al. (2003)`_ and `Pritchard et al. (2000)`_.
sampling. ``lda`` is fast and is tested on Linux, OS X, and Windows.

Installation
------------

If you have NumPy installed,

``pip install lda``

Installing ``lda`` is tested on Linux, OS X, and Windows.
Installation does not require a compiler on Windows or OS X.

Getting started
---------------
@@ -22,7 +22,8 @@ Getting started
conventions found in scikit-learn_.

The following demonstrates how to inspect a model of a subset of the Reuters
news dataset.
news dataset. The input below, ``X``, is a document-term matrix (sparse matrices
are accepted).
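
For readers unfamiliar with the term, a document-term matrix has one row per document and one column per vocabulary term, and each cell holds a count. The following is a minimal plain-Python sketch of that layout. It is not part of ``lda``; the helper name ``build_dtm`` and the toy documents are hypothetical.

```python
from collections import Counter


def build_dtm(docs):
    """Return (vocab, dtm) for a list of tokenized documents.

    Hypothetical helper for illustration; not part of the lda package.
    """
    # A sorted vocabulary gives each term a stable column index.
    vocab = sorted({token for doc in docs for token in doc})
    column = {term: j for j, term in enumerate(vocab)}
    dtm = []
    for doc in docs:
        row = [0] * len(vocab)
        # Count each term once per document and store it in its column.
        for term, count in Counter(doc).items():
            row[column[term]] = count
        dtm.append(row)
    return vocab, dtm


docs = [["oil", "price", "oil"], ["bank", "rate"], ["oil", "bank"]]
vocab, dtm = build_dtm(docs)
```

In practice the counts would usually be held in a NumPy array or a sparse matrix rather than nested lists before being passed to ``model.fit``.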

.. code-block:: python
@@ -35,7 +36,7 @@ news dataset.
>>> X.shape
(395, 4258)
>>> model = lda.LDA(n_topics=20, n_iter=500, random_state=1)
>>> model.fit(X)
>>> model.fit(X) # model.fit_transform(X) is also available
>>> topic_word = model.topic_word_ # model.components_ also works
>>> n_top_words = 8
>>> for i, topic_dist in enumerate(topic_word):
@@ -99,6 +100,13 @@ Unlike ``lda``, hca_ can use more than one processor at a time. Both MALLET_ and
hca_ implement topic models known to be more robust than standard latent
Dirichlet allocation.

Notes
-----

Latent Dirichlet allocation is described in `Blei et al. (2003)`_ and `Pritchard
et al. (2000)`_. Inference using collapsed Gibbs sampling is described in
`Griffiths and Steyvers (2004)`_.

Important links
---------------

@@ -125,7 +133,7 @@ lda is licensed under Version 2.0 of the Mozilla Public License.
.. _Cython: http://cython.org
.. _Blei et al. (2003): http://jmlr.org/papers/v3/blei03a.html
.. _Pritchard et al. (2000): http://www.genetics.org/content/164/4/1567.full

.. _Griffiths and Steyvers (2004): http://www.pnas.org/content/101/suppl_1/5228.abstract

.. |pypi| image:: https://badge.fury.io/py/lda.png
:target: https://badge.fury.io/py/lda
16 changes: 16 additions & 0 deletions bench/README.rst
@@ -0,0 +1,16 @@
================
Benchmarking lda
================

This directory contains scripts to compare the running time of ``lda`` against
hca_. hca_ is written entirely in C.

To run ``bench_hca`` you will need to have hca_ on your path.

The test uses the following settings for hca_

- 100 topics
- 100 iterations
- Latent Dirichlet allocation (used automatically with ``-A<float>`` and ``-B<float>``)

.. _hca: http://www.mloss.org/software/view/527/
2 changes: 0 additions & 2 deletions doc/source/api.rst
@@ -9,12 +9,10 @@ lda.lda
.. automodule:: lda.lda
:members:
:undoc-members:
:show-inheritance:

lda.utils
---------

.. automodule:: lda.utils
:members:
:undoc-members:
:show-inheritance:
4 changes: 2 additions & 2 deletions doc/source/index.rst
@@ -3,8 +3,8 @@
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
lda: Topic modeling using latent Dirichlet Allocation
=====================================================
lda: Topic modeling with latent Dirichlet Allocation
====================================================

Contents:

19 changes: 17 additions & 2 deletions doc/source/installation.rst
@@ -4,29 +4,44 @@
Installing lda
==============

lda requires Python (>= 2.7 or >= 3.3) and NumPy (>= 1.6.1).
lda requires Python (>= 2.7 or >= 3.3) and NumPy (>= 1.6.1). If these
requirements are satisfied, lda should install successfully with::

pip install lda

If you encounter problems, consult the platform-specific instructions below.

Windows
-------

First you need to install `numpy <http://numpy.scipy.org/>`_ from the official
installer.

.. FIXME: update this when Numpy has Windows wheels available
Wheel packages (.whl files) for lda from `PyPI
<https://pypi.python.org/pypi/lda`_ can be installed with the `pip
<https://pypi.python.org/pypi/lda>`_ can be installed with the `pip
<http://pip.readthedocs.org/en/latest/installing.html>`_ utility. Open
a console and type the following to install lda::

pip install lda

.. FIXME: remove the following when Python 3.3 is no longer widely used
In order to use wheels, you will need to have pip version 1.4 or higher and
setuptools version 0.8 or higher.

Mac OS X
--------

lda and its dependencies are all available as wheel packages for Mac OS X::

pip install numpy lda

.. FIXME: remove the following when Python 3.3 is no longer widely used
In order to use wheels, you will need to have pip version 1.4 or higher and
setuptools version 0.8 or higher.

Linux
-----
8 changes: 4 additions & 4 deletions lda/utils.py
@@ -102,8 +102,8 @@ def dtm2ldac(dtm, offset=0):
-------
doclines : iterable of LDA-C lines suitable for writing to file
Note
----
Notes
-----
If a format similar to SVMLight is desired, `offset` of 1 may be used.
"""
try:
@@ -139,8 +139,8 @@ def ldac2dtm(stream, offset=0):
-------
dtm : array of shape N,V
Note
----
Notes
-----
If a format similar to SVMLight is the source, an `offset` of 1 may be used.
"""
doclines = stream
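
As a rough illustration of the `offset` note in the two docstrings above, the sketch below writes rows of a small document-term matrix as LDA-C style lines: the number of unique terms in the document, followed by ``index:count`` pairs. It is a plain-Python approximation under that assumed format, not the code in ``lda/utils.py``; the function name ``rows_to_ldac`` is hypothetical.

```python
def rows_to_ldac(dtm, offset=0):
    """Yield one LDA-C style line per document row.

    Hypothetical sketch; the real dtm2ldac in lda/utils.py may differ.
    """
    for row in dtm:
        # Keep only terms that occur in the document, shifting indices by offset.
        pairs = [(j + offset, n) for j, n in enumerate(row) if n > 0]
        fields = [str(len(pairs))] + ["{}:{}".format(j, n) for j, n in pairs]
        yield " ".join(fields)


dtm = [[0, 2, 1, 0], [1, 0, 0, 1]]
zero_based = list(rows_to_ldac(dtm))
one_based = list(rows_to_ldac(dtm, offset=1))
```

With ``offset=1`` the term indices start at 1, which is the sense in which the output resembles SVMLight.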
