Merge branch 'doc_devel'. Closes statsmodels#478.
* doc_devel:
  DOC: Cherry picked most of Josefs recent cleanups
  DOC: dev/git_notes.rst minor edits
  DOC: datasets
  DOC: simplify dev/examples.rst
  DOC: package_overview content split and sent to relevant files
  DOC: more precise patch submission process
  DOC: pitfalls.rst reorg
  DOC: importpaths.rst reorg
jseabold committed Nov 13, 2012
2 parents bf674da + 2788022 commit 1ff3a0b
Showing 10 changed files with 412 additions and 297 deletions.
53 changes: 29 additions & 24 deletions docs/source/dev/dataset_notes.rst
@@ -1,32 +1,34 @@
.. _add_data:

Datasets
~~~~~~~~
========

For details about the datasets, please see the :ref:`datasets page <datasets>`.
For a list of currently available datasets and usage instructions, see the
:ref:`datasets page <datasets>`.

Adding a dataset
================
License
-------

First, if the data is not in the public domain or listed with a BSD-compatible
license, we must obtain permission from the original author.
To be considered for inclusion in `statsmodels`, a dataset must be in the
public domain, distributed under a BSD-compatible license, or we must obtain
permission from the original author.

To take an example, I will use the Nile River data that measures the volume of
the discharge of the Nile River at Aswan for the years 1871 to 1970. The data
are copied from the paper of Cobb (1978).
Adding a dataset: An example
----------------------------

Create a directory `datasets/nile/`. Add `datasets/nile/nile.csv` and
`datasets/__init__.py` that contains ::
The Nile River data measures the volume of the discharge of the Nile River at
Aswan for the years 1871 to 1970. The data are copied from the paper of Cobb
(1978).

**Step 1**: Create a directory `datasets/nile/`

**Step 2**: Add `datasets/nile/nile.csv` and a new file `datasets/__init__.py` which contains ::

from data import *

If the data will be cleaned before it is in the form included in the datasets
package then create a `nile/src` directory and include the original raw data
there. In this case, it's not necessary.
**Step 3**: If `nile.csv` is a transformed/cleaned version of the original data, create a `nile/src` directory and include the original raw data there. In the `nile` case, this step is not necessary.

Next, copy the template_data.py to nile and rename it data.py. Edit the data.py
as follows. Fill in the strings for COPYRIGHT, TITLE, SOURCE, DESCRSHORT,
DESCLONG, and NOTE. ::
**Step 4**: Copy `datasets/template_data.py` to `nile/data.py`. Edit `nile/data.py` by filling in the strings for COPYRIGHT, TITLE, SOURCE, DESCRSHORT, DESCLONG, and NOTE. ::

COPYRIGHT = """This is public domain."""
TITLE = """Nile River Data"""
@@ -53,12 +55,15 @@ DESCLONG, and NOTE. ::
set is also used as an example in many textbooks and software packages.
"""

Next we edit the `load` function. You only need to edit the docstring to
specify which dataset will be loaded. You should also edit the path and the
indices for the `endog` and `exog` attributes. In this case, there is no
`exog`, so everything referencing `exog` is not used. The `year` variable is
also not used.
**Step 5:** Edit the docstring of the `load` function in `data.py` to specify
which dataset will be loaded. Also edit the path and the indices for the
`endog` and `exog` attributes. In the `nile` case, there is no `exog`, so
everything referencing `exog` is not used. The `year` variable is also not
used.

**Step 6:** Edit the `datasets/__init__.py` to import the directory.

Lastly, edit the datasets/__init__.py to import the directory.
That's it! The result can be found `here
<https://github.com/statsmodels/statsmodels/tree/master/statsmodels/datasets/nile>`_
for reference.

That's it!
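
As a rough illustration of Steps 4 and 5, the sketch below shows what a `load` function for the `nile` case might look like. It is not the contents of `template_data.py`: the `Dataset` class, the `SAMPLE_CSV` string, and the column names `year` and `volume` are assumptions made so that the example runs on its own, and the three data rows are only illustrative stand-ins for `nile.csv`.

```python
import csv
import io

# Hypothetical stand-in for datasets/nile/nile.csv so the sketch is
# self-contained; the column names and rows are illustrative only.
SAMPLE_CSV = """year,volume
1871,1120
1872,1160
1873,963
"""


class Dataset:
    """Hypothetical minimal container with `endog` and `exog` attributes."""

    def __init__(self, endog, exog=None, names=()):
        self.endog = endog
        self.exog = exog
        self.names = tuple(names)


def load(source=SAMPLE_CSV):
    """Load the Nile River discharge data and return a Dataset.

    Only the `volume` column is kept as `endog`. There is no `exog`,
    and the `year` column is read but not used.
    """
    reader = csv.DictReader(io.StringIO(source))
    volume = [float(row["volume"]) for row in reader]
    return Dataset(endog=volume, exog=None, names=("volume",))


data = load()
print(len(data.endog), data.exog)
```

In a real dataset module the function would read the CSV file from the dataset directory rather than from an inline string; the inline string is used here only to keep the example runnable.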
50 changes: 33 additions & 17 deletions docs/source/dev/examples.rst
@@ -1,27 +1,43 @@
.. _examples:

Statsmodels Examples
====================
Examples
========

Examples go in the top-level examples directory. Let's try to have documentation
and tutorials for as many models and code uses as possible! These are invaluable
for new users to get up and running. These can also be Cookbook recipes, but there is no wiki yet. For the most part these are just runnable example scripts. However, when the documentation is built, these are converted into ReST files and included in the documentation. There is a bit of magic that can be used to make these look nice.
Examples are invaluable for new users who hope to get up and running quickly
with `statsmodels`, and they are extremely useful to those who wish to explore
new features of `statsmodels`. We hope to provide documentation and tutorials
for as many models and use-cases as possible!

reStructured Text
~~~~~~~~~~~~~~~~~
Most user-contributed examples/tutorials/recipes should be placed on the
`statsmodels examples wiki page
<https://github.com/statsmodels/statsmodels/wiki/Examples:-user-contributions>`_.
That wiki page is freely editable. Please post your cool tricks,
examples, and recipes on there!

Every example file must have a module level docstring. This docstring should contain the title of the example, and that's it. You can include ReST markup in the files as comments. Anything that is commented out will be rendered as ReST with a few exceptions noted below. If you want a true comment in the output file, then you should use ``#..``. The hash symbol is stripped leaving ``..``, ReST markup for a comment line.
If you would rather have your example file officially accepted to the
`statsmodels` distribution and posted on this website, you will need to go
through the normal `patch submission process <index.html#submitting-a-patch>`_.

Code Snippets
~~~~~~~~~~~~~
File Format
~~~~~~~~~~~

Code snippets are rendered using the :ref:`ipython_directive` for Sphinx. See
the documentation for explaining its usage in greater detail. Some of it is
explained in :ref:`special_markup`.
Examples are simple runnable python scripts that go in the top-level examples
directory. We use the `ipython_directive for Sphinx
<http://ipython.org/ipython-doc/dev/development/ipython_directive.html>`_ to
convert them automatically to `reStructuredText
<http://docutils.sourceforge.net/rst.html>`_ and html at build time.

.. _special_markup:
Each line of the script is executed; both the python code and the printed
results are shown in the output file. Lines that are commented out using the
hash symbol ``#`` are rendered as reST markup.

Special Markup
~~~~~~~~~~~~~~
**Comments**: "True" comments that should not appear in the output file should be written on lines that start with ``#..``.

**Error handling**: Syntax errors in pure Python will raise an error during the build process. If you need to show a SyntaxError, an alternative would be to provide a verbatim copy of an IPython session encased in a ReST code block instead of pure Python code.

**Suppressing lines**: To suppress a line in the built documentation, follow it with a semicolon.

**Figures**: To save a figure, prepend the line directly before the plotting command with ``#@savefig file_name.png width=4in``, for example. You do not need to call show or close.

**IPython magics**: You can use IPython magics by writing a line like this: ``#%timeit X = np.empty((1000,1000))``.

Pretty much anything you can do with the IPython directive is supported for the example scripts. The only thing that is not well supported is error handling of SyntaxErrors. Syntax errors in pure Python will raise an error during the build process. You could provide an IPython session instead of pure Python if you want to show a SyntaxError for some reason. Other than this, to suppress a line in the built documentation, follow it with a semicolon. To save a figure, prepend the line directly before the plotting command with ``#@savefig file_name.png width=4in``, for example. You don't need to call show or close. You can also call IPython magic functions. So if you wanted to include some timings you could have a line ``#%timeit X = np.empty((1000,1000))``.
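
A minimal sketch of an example script that follows these conventions is given below. The title, variable names, and data are hypothetical, and the script uses only the standard library so that it runs as plain Python; how each line is rendered depends on the documentation build and is described in the comments only as the intended behaviour.

```python
"""Mean of a Small Sample"""

#.. A "true" comment: intended to stay out of the rendered output.

# Lines that start with a hash are intended to be rendered as reST text.
# Here we compute the mean of a *small* sample.

data = [2.0, 4.0, 6.0, 8.0]

# The trailing semicolon is intended to suppress the next line in the
# built documentation.
total = sum(data);

mean = total / len(data)
print(mean)

# An IPython magic is written as a commented line, so plain Python
# ignores it:
#%timeit sum(data)
```

Because the markup lives entirely in comments and the docstring, the same file can still be run directly with the Python interpreter.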