
Commit

Add getting started guide
andycasey committed Mar 6, 2017
1 parent 69272c5 commit fe5dae9
Showing 8 changed files with 275 additions and 42 deletions.
4 changes: 4 additions & 0 deletions docs/source/api.rst
@@ -0,0 +1,4 @@
.. _api:

API
===
12 changes: 0 additions & 12 deletions docs/source/getting-started/a-simple-example.rst

This file was deleted.

12 changes: 0 additions & 12 deletions docs/source/getting-started/install.rst

This file was deleted.

155 changes: 155 additions & 0 deletions docs/source/guide.rst
@@ -0,0 +1,155 @@
.. _guide:

Getting Started Guide
=====================

Before we get started, you should know that the following ingredients are required to run The Cannon:

- a *training set* of stars with known labels (e.g., stellar parameters and chemical abundances),
- pseudo-continuum-normalized spectra of stars in the training set, with all spectra sampled onto the same wavelength points,
- some *test set* spectra that you want to derive labels from, which have been processed in the same way as the training set spectra.

In this guide we will provide you with the training set labels and spectra, and the test set spectra. If you want more information about `constructing a training set <tutorials.html#constructing-the-training-set>`_ or `continuum-normalizing your spectra <tutorials.html#continuum-normalization>`_, see the linked tutorials.
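
To make the expected data layout concrete, here is a minimal sketch (the shapes are illustrative placeholders; the APOGEE spectra used below happen to have 8575 pixels per star):

.. code-block:: python

    import numpy as np

    # Illustrative arrays: one row per star, one column per common wavelength
    # pixel, with identical shapes for the flux and its inverse variance.
    n_stars, n_pixels = 100, 8575
    training_set_flux = np.ones((n_stars, n_pixels))  # pseudo-continuum-normalized flux
    training_set_ivar = np.ones((n_stars, n_pixels))  # inverse variance of that flux

    assert training_set_flux.shape == training_set_ivar.shape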

.. note:: We use `Travis continuous integration <https://travis-ci.org/andycasey/AnniesLasso>`_ to test every change to The Cannon in Python versions 2.7, 3.5, and 3.6. The code examples here should work in any of these Python versions.


In this guide we will train a model using `APOGEE DR12 <http://www.sdss.org/dr12/irspec/>`_ spectra and `ASPCAP <http://www.sdss.org/dr12/irspec/parameters/>`_ labels to derive effective temperature :math:`T_{\rm eff}`, surface gravity :math:`\log{g}`, and four individual chemical abundances (:math:`[{\rm Fe}/{\rm H}]`, :math:`[{\rm Na}/{\rm H}]`, :math:`[{\rm Ti}/{\rm H}]`, :math:`[{\rm Ni}/{\rm H}]`). These spectra have been pseudo-continuum-normalized using a sum of sine and cosine functions (a different process from the one `ASPCAP <http://www.sdss.org/dr12/irspec/parameters/>`_ uses for normalization), and individual visits have been stacked.

Here we won't use any `regularization <tutorials.html#regularization>`_ or `wavelength censoring <tutorials.html#censoring>`_, but both can be applied later: set them on the ``CannonModel`` object and re-train it using the ``.train()`` function, as sketched below.
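
For example, here is a minimal sketch of how a trained model might be re-trained with regularization and censoring. The assignments below are illustrative assumptions: the regularization strength and censored pixel range are placeholders, and ``censors`` is assumed to map a label name to a boolean mask over pixels.

.. code-block:: python

    import numpy as np

    # A sketch only: re-train an existing model with L1 regularization and a
    # censoring mask. The Lambda value and censored pixel range are placeholders.
    model.regularization = 100.0

    censor_mask = np.ones(8575, dtype=bool)  # True = pixel may be used for this label
    censor_mask[1000:1500] = False           # hypothetically censor these pixels
    model.censors = {"NA_H": censor_mask}

    model.train()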

Downloading the data
--------------------

You can download the required data for this guide using the following command:

::

    wget zenodo-link # TODO

Creating a model
----------------

After you have `installed The Cannon <install.html>`_, you can use the following Python code to construct a ``CannonModel`` object:


.. code-block:: python
    :linenos:

    from astropy.table import Table
    from six.moves import cPickle as pickle
    from sys import version_info

    import thecannon as tc

    # Load the training set labels.
    training_set_labels = Table.read("apogee-dr12-training-set-labels.fits")

    # Load the training set spectra.
    pkl_kwds = dict(encoding="latin-1") if version_info[0] >= 3 else {}
    with open("apogee-dr12-training-set-spectra.pkl", "rb") as fp:
        training_set_flux, training_set_ivar = pickle.load(fp, **pkl_kwds)

    # Specify the labels that we will use to construct this model.
    label_names = ("TEFF", "LOGG", "FE_H", "NA_H", "TI_H", "NI_H")

    # Construct a CannonModel object using a quadratic (O=2) polynomial vectorizer.
    model = tc.CannonModel(
        training_set_labels, training_set_flux, training_set_ivar,
        vectorizer=tc.vectorizer.PolynomialVectorizer(label_names, 2))

Let's check the model configuration:

.. code-block:: python

    >>> print(model)
    <tc.model.CannonModel of 6 labels with a training set of 14141 stars each with 8575 pixels>

    >>> print(model.vectorizer.human_readable_label_vector)
    1 + TEFF + LOGG + FE_H + NA_H + TI_H + NI_H + TEFF^2 + LOGG*TEFF + FE_H*TEFF + NA_H*TEFF + TEFF*TI_H + NI_H*TEFF + LOGG^2 + FE_H*LOGG + LOGG*NA_H + LOGG*TI_H + LOGG*NI_H + FE_H^2 + FE_H*NA_H + FE_H*TI_H + FE_H*NI_H + NA_H^2 + NA_H*TI_H + NA_H*NI_H + TI_H^2 + NI_H*TI_H + NI_H^2

    # This model has no regularization.
    >>> print(model.regularization)
    None

    # This model includes no censoring.
    >>> print(model.censors)
    None

The training step
-----------------

The model configuration matches what we expected, so let's train the model and make it useful:

.. code-block:: python

    >>> theta, s2, metadata = model.train(threads=1)
    2017-03-06 14:18:40,920 [INFO] Training 6-label CannonModel with 14141 stars and 8575 pixels/star
    [====================================================================================================] 100% (147s)

This model took about two minutes to train on a single core. The ``.train()`` function returns the :math:`\theta` coefficients, the scatter terms :math:`s^2`, and metadata associated with the training of each pixel. The :math:`\theta` coefficients and scatter terms are also accessible through the ``.theta`` and ``.s2`` attributes, respectively.
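
For a quick check, you can inspect the shapes of these arrays. This is a sketch, assuming ``.theta`` is ordered as ``(n_pixels, n_terms)`` with one row per pixel, and ``.s2`` has one entry per pixel:

.. code-block:: python

    >>> # 28 terms: 1 mean + 6 linear + 21 quadratic (6 squared + 15 cross terms).
    >>> print(model.theta.shape)
    (8575, 28)
    >>> print(model.s2.shape)
    (8575,)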

.. code-block:: python
    :linenos:

    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(model.theta.T[0], c='b')
    ax.set_xlabel(r'Pixel')
    ax.set_ylabel(r'$\theta_0$')

    # Alternatively, you can use the convenient plotting functions:
    fig_theta = tc.plot.theta(model, indices=0)
    fig_s2 = tc.plot.s2(model)

    # TODO --> Theta and s2 figures

The test step
-------------

The trained model can now be used to run the test step against all APOGEE spectra. First, we will run the test step *on the training set spectra* as a sanity check to ensure we can approximately recover the ASPCAP labels.

.. code-block:: python
    :linenos:

    test_labels = model.test(training_set_flux, training_set_ivar, threads=1)

    # Plot a comparison between the ASPCAP labels and the labels returned at the test step.
    fig_comparison = tc.plot.one_to_one(model, test_labels)
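
You can also summarize the label recovery numerically. Here is a sketch, assuming ``test_labels`` is an ``(n_stars, n_labels)`` array with columns ordered like ``label_names``:

.. code-block:: python

    import numpy as np

    # Compare the labels recovered at the test step against the ASPCAP
    # training labels, one label at a time.
    for index, name in enumerate(label_names):
        difference = test_labels[:, index] - training_set_labels[name]
        print("{0}: bias = {1:.3f}, scatter = {2:.3f}".format(
            name, np.nanmean(difference), np.nanstd(difference)))
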
Saving the model to disk
------------------------

All ``CannonModel`` objects can be written to disk, and read from disk in order to run the test step at a later time. A model can be saved either with or without the training set fluxes and inverse variances. These aren't strictly needed once the model is trained, but they are useful if you want to re-train the model (e.g., with regularization or censoring), or run the test step on the spectra used to train the model.


.. code-block:: python
    :linenos:

    model.write("apogee-dr12.model")
    model.write("apogee-dr12-complete.model", include_training_set_spectra=True)

By default the training set spectra are not saved because they can add considerably to the file size; the ``apogee-dr12-complete.model`` file would be smaller for a smaller training set.

::

    $ ls -lh *.model
    -rw-rw-r-- 1 arc arc 1.9G Mar 6 15:58 apogee-dr12-complete.model
    -rw-rw-r-- 1 arc arc 2.3M Mar 6 15:58 apogee-dr12.model

Any saved models can be loaded from disk using the ``.read()`` function:

.. code-block:: python

    >>> new_model = tc.CannonModel.read("apogee-dr12.model")
    >>> new_model.is_trained
    True
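
A model saved without the training set spectra can still run the test step on new observations. As a sketch, assuming ``test_set_flux`` and ``test_set_ivar`` are arrays of new spectra sampled onto the same wavelength points as the training set:

.. code-block:: python

    # Run the test step on new (hypothetical) spectra with the re-loaded model.
    test_labels = new_model.test(test_set_flux, test_set_ivar, threads=1)
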
69 changes: 55 additions & 14 deletions docs/source/index.rst
@@ -3,26 +3,67 @@
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.

The Cannon
==========

The Cannon is a data-driven approach to stellar label determination. The seminal paper describing The Cannon is `Ness et al. (2015) <http://adsabs.harvard.edu/abs/2015ApJ...808...16N>`_, and the name derives from Annie Jump Cannon, who first arranged stellar spectra in order of temperature purely from the data, without the need for stellar models. This software package is released as part of `Casey et al. (2016) <http://adsabs.harvard.edu/abs/2016arXiv160303040C>`_ and builds on the original implementation of The Cannon by including a number of additional features:

- Easily construct models with complex vectorizers (e.g., cubic polynomial models with 25 labels)
- Analytic derivatives for blazingly fast optimization at the training step *and* the test step
- Built-in parallelism to run the training step and the test step in parallel
- *Pseudo*-continuum normalization using sums of sine and cosine functions
- L1 regularization to discover and enforce sparsity
- Pixel censoring masks for individual labels
- Stratified under-sampling utilities to produce a (more) balanced training set

The Cannon is being actively developed in a `public GitHub repository <https://github.com/andycasey/AnniesLasso>`_, where you can `open an issue <https://github.com/andycasey/AnniesLasso/issues/new>`_ if you have any problems.

User Guide
----------

.. toctree::
:maxdepth: 2
:maxdepth: 3

install
guide
tutorials
api
utilities


License & Attribution
---------------------

The source code is released under the MIT license. If you make use of the code, please cite both the original Ness et al. (2015) paper and Casey et al. (2016):

.. code-block:: tex

    @ARTICLE{Ness_2015,
        author = {{Ness}, M. and {Hogg}, D.~W. and {Rix}, H.-W. and {Ho}, A.~Y.~Q. and
            {Zasowski}, G.},
        title = "{The Cannon: A data-driven approach to Stellar Label Determination}",
        journal = {\apj},
        year = 2015,
        month = jul,
        volume = 808,
        eid = {16},
        pages = {16},
        doi = {10.1088/0004-637X/808/1/16},
    }

    @ARTICLE{Casey_2016,
        author = {{Casey}, A.~R. and {Hogg}, D.~W. and {Ness}, M. and {Rix}, H.-W. and
            {Ho}, A.~Q. and {Gilmore}, G.},
        title = "{The Cannon 2: A data-driven model of stellar spectra for detailed chemical abundance analyses}",
        journal = {ArXiv e-prints},
        archivePrefix = "arXiv",
        eprint = {1603.03040},
        year = 2016,
        month = mar,
    }

Here is a list of notable publications that have used or built upon The Cannon:

- Ho et al.
- others
43 changes: 39 additions & 4 deletions docs/source/install.rst
@@ -3,10 +3,45 @@
Installation
============

You can install the most recent stable version of The Cannon using `PyPI <https://pypi.python.org/pypi/the-cannon>`_ or the
development version from `GitHub <http://www.github.com/andycasey/AnniesLasso>`_.


Stable Version
--------------

The easiest way to install the most recent stable version of The Cannon is by using `pip <https://pypi.python.org/pypi/pip>`_.
This will also install any prerequisites (e.g., `numpy <https://pypi.python.org/pypi/numpy>`_, `scipy <https://pypi.python.org/pypi/scipy>`_) that you don't already have:

::

    pip install the-cannon


Development Version
-------------------

To get the source for the latest development version, clone the `git <https://git-scm.com/>`_ repository on `GitHub <http://www.github.com/andycasey/AnniesLasso>`_:

::

    git clone https://github.com/andycasey/AnniesLasso.git
    cd AnniesLasso
    git checkout refactor # TODO - Remove this


Then install the package by running the following command:

::

    python setup.py install


Testing
-------

To run all the unit and integration tests, install `nose <http://nose.readthedocs.org>`_ and then run:

::

    nosetests -v --cover-package=thecannon

17 changes: 17 additions & 0 deletions docs/source/tutorials.rst
@@ -0,0 +1,17 @@
.. _tutorials:

Tutorials
=========

Constructing the training set
-----------------------------


Continuum normalization
-----------------------

Censoring
---------

Regularization
--------------
5 changes: 5 additions & 0 deletions docs/source/utilities.rst
@@ -0,0 +1,5 @@
.. _utilities:

Utilities
=========
