
Commit

Add getting started guide
andycasey committed Mar 6, 2017
1 parent 69272c5 commit fe5dae9
Showing 8 changed files with 275 additions and 42 deletions.
4 changes: 4 additions & 0 deletions docs/source/api.rst
@@ -0,0 +1,4 @@
.. _api:

API
===
12 changes: 0 additions & 12 deletions docs/source/getting-started/a-simple-example.rst

This file was deleted.

12 changes: 0 additions & 12 deletions docs/source/getting-started/install.rst

This file was deleted.

155 changes: 155 additions & 0 deletions docs/source/guide.rst
@@ -0,0 +1,155 @@
.. _guide:

Getting Started Guide
=====================

Before we get started, you should know that the following ingredients are required to run The Cannon:

- a *training set* of stars with known labels (e.g., stellar parameters and chemical abundances),
- pseudo-continuum-normalized spectra of stars in the training set, with all spectra sampled onto the same wavelength points,
- some *test set* spectra that you want to derive labels from, which have been processed in the same way as the training set spectra.

In this guide we will provide you with the training set labels and spectra, and the test set spectra. If you want more information about `constructing a training set <tutorials.html#constructing-the-training-set>`_ or `continuum-normalizing your spectra <tutorials.html#continuum-normalization>`_, see the linked tutorials.
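
To make the expected data layout concrete, here is a minimal sketch (the shapes are illustrative placeholders; the APOGEE spectra used below happen to have 8575 pixels per star):

.. code-block:: python

    import numpy as np

    # Illustrative arrays: one row per star, one column per common wavelength
    # pixel, with identical shapes for the flux and its inverse variance.
    n_stars, n_pixels = 100, 8575
    training_set_flux = np.ones((n_stars, n_pixels))  # pseudo-continuum-normalized flux
    training_set_ivar = np.ones((n_stars, n_pixels))  # inverse variance of that flux

    assert training_set_flux.shape == training_set_ivar.shape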

.. note:: We use `Travis continuous integration <https://travis-ci.org/andycasey/AnniesLasso>`_ to test every change to The Cannon in Python versions 2.7, 3.5, and 3.6. The code examples here should work in any of these Python versions.


In this guide we will train a model using `APOGEE DR12 <http://www.sdss.org/dr12/irspec/>`_ spectra and `ASPCAP <http://www.sdss.org/dr12/irspec/parameters/>`_ labels to derive effective temperature :math:`T_{\rm eff}`, surface gravity :math:`\log{g}`, and four individual chemical abundances (:math:`[{\rm Fe}/{\rm H}]`, :math:`[{\rm Na}/{\rm H}]`, :math:`[{\rm Ti}/{\rm H}]`, :math:`[{\rm Ni}/{\rm H}]`). These spectra have been pseudo-continuum-normalized using a sum of sine and cosine functions (a different process from the one `ASPCAP <http://www.sdss.org/dr12/irspec/parameters/>`_ uses for normalization), and individual visits have been stacked.

Here we won't use any `regularization <tutorials.html#regularization>`_ or `wavelength censoring <tutorials.html#censoring>`_, but both can be applied later: set them on the ``CannonModel`` object and re-train it using the ``.train()`` function, as sketched below.
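
For example, here is a minimal sketch of how a trained model might be re-trained with regularization and censoring. The assignments below are illustrative assumptions: the regularization strength and censored pixel range are placeholders, and ``censors`` is assumed to map a label name to a boolean mask over pixels.

.. code-block:: python

    import numpy as np

    # A sketch only: re-train an existing model with L1 regularization and a
    # censoring mask. The Lambda value and censored pixel range are placeholders.
    model.regularization = 100.0

    censor_mask = np.ones(8575, dtype=bool)  # True = pixel may be used for this label
    censor_mask[1000:1500] = False           # hypothetically censor these pixels
    model.censors = {"NA_H": censor_mask}

    model.train()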

Downloading the data
--------------------

You can download the required data for this guide using the following command:

::

    wget zenodo-link # TODO

Creating a model
----------------

After you have `installed The Cannon <install.html>`_, you can use the following Python code to construct a ``CannonModel`` object:


.. code-block:: python
    :linenos:

    from astropy.table import Table
    from six.moves import cPickle as pickle
    from sys import version_info

    import thecannon as tc

    # Load the training set labels.
    training_set_labels = Table.read("apogee-dr12-training-set-labels.fits")

    # Load the training set spectra.
    pkl_kwds = dict(encoding="latin-1") if version_info[0] >= 3 else {}
    with open("apogee-dr12-training-set-spectra.pkl", "rb") as fp:
        training_set_flux, training_set_ivar = pickle.load(fp, **pkl_kwds)

    # Specify the labels that we will use to construct this model.
    label_names = ("TEFF", "LOGG", "FE_H", "NA_H", "TI_H", "NI_H")

    # Construct a CannonModel object using a quadratic (O=2) polynomial vectorizer.
    model = tc.CannonModel(
        training_set_labels, training_set_flux, training_set_ivar,
        vectorizer=tc.vectorizer.PolynomialVectorizer(label_names, 2))

Let's check the model configuration:

.. code-block:: python

    >>> print(model)
    <tc.model.CannonModel of 6 labels with a training set of 14141 stars each with 8575 pixels>

    >>> print(model.vectorizer.human_readable_label_vector)
    1 + TEFF + LOGG + FE_H + NA_H + TI_H + NI_H + TEFF^2 + LOGG*TEFF + FE_H*TEFF + NA_H*TEFF + TEFF*TI_H + NI_H*TEFF + LOGG^2 + FE_H*LOGG + LOGG*NA_H + LOGG*TI_H + LOGG*NI_H + FE_H^2 + FE_H*NA_H + FE_H*TI_H + FE_H*NI_H + NA_H^2 + NA_H*TI_H + NA_H*NI_H + TI_H^2 + NI_H*TI_H + NI_H^2

    # This model has no regularization.
    >>> print(model.regularization)
    None

    # This model includes no censoring.
    >>> print(model.censors)
    None

The training step
-----------------

The model configuration matches what we expected, so let's train the model and make it useful:

.. code-block:: python

    >>> theta, s2, metadata = model.train(threads=1)
    2017-03-06 14:18:40,920 [INFO] Training 6-label CannonModel with 14141 stars and 8575 pixels/star
    [====================================================================================================] 100% (147s)

This model took about two minutes to train on a single core. The ``.train()`` function returns the :math:`\theta` coefficients, the scatter terms :math:`s^2`, and metadata associated with the training of each pixel. The :math:`\theta` coefficients and scatter terms are also accessible through the ``.theta`` and ``.s2`` attributes, respectively.
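
For a quick check, you can inspect the shapes of these arrays. This is a sketch, assuming ``.theta`` is ordered as ``(n_pixels, n_terms)`` with one row per pixel, and ``.s2`` has one entry per pixel:

.. code-block:: python

    >>> # 28 terms: 1 mean + 6 linear + 21 quadratic (6 squared + 15 cross terms).
    >>> print(model.theta.shape)
    (8575, 28)
    >>> print(model.s2.shape)
    (8575,)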

.. code-block:: python
    :linenos:

    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(model.theta.T[0], c='b')
    ax.set_xlabel(r'Pixel')
    ax.set_ylabel(r'$\theta_0$')

    # Alternatively, you can use the convenient plotting functions:
    fig_theta = tc.plot.theta(model, indices=0)
    fig_s2 = tc.plot.s2(model)

    # TODO --> Theta and s2 figures

The test step
-------------

The trained model can now be used to run the test step against all APOGEE spectra. First, we will run the test step *on the training set spectra* as a sanity check to ensure we can approximately recover the ASPCAP labels.

.. code-block:: python
    :linenos:

    test_labels = model.test(training_set_flux, training_set_ivar, threads=1)

    # Plot a comparison between the ASPCAP labels and the labels returned at the test step.
    fig_comparison = tc.plot.one_to_one(model, test_labels)
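
You can also summarize the label recovery numerically. Here is a sketch, assuming ``test_labels`` is an ``(n_stars, n_labels)`` array with columns ordered like ``label_names``:

.. code-block:: python

    import numpy as np

    # Compare the labels recovered at the test step against the ASPCAP
    # training labels, one label at a time.
    for index, name in enumerate(label_names):
        difference = test_labels[:, index] - training_set_labels[name]
        print("{0}: bias = {1:.3f}, scatter = {2:.3f}".format(
            name, np.nanmean(difference), np.nanstd(difference)))
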
Saving the model to disk
------------------------

All ``CannonModel`` objects can be written to disk, and read from disk in order to run the test step at a later time. A model can be saved either with or without the training set fluxes and inverse variances. These aren't strictly needed once the model is trained, but they are useful if you want to re-train the model (e.g., with regularization or censoring), or run the test step on the spectra used to train the model.


.. code-block:: python
    :linenos:

    model.write("apogee-dr12.model")
    model.write("apogee-dr12-complete.model", include_training_set_spectra=True)

By default the training set spectra are not saved because they can add considerably to the file size; the ``apogee-dr12-complete.model`` file would be smaller for a smaller training set.

::

    $ ls -lh *.model
    -rw-rw-r-- 1 arc arc 1.9G Mar 6 15:58 apogee-dr12-complete.model
    -rw-rw-r-- 1 arc arc 2.3M Mar 6 15:58 apogee-dr12.model

Any saved models can be loaded from disk using the ``.read()`` function:

.. code-block:: python

    >>> new_model = tc.CannonModel.read("apogee-dr12.model")
    >>> new_model.is_trained
    True
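
A model saved without the training set spectra can still run the test step on new observations. As a sketch, assuming ``test_set_flux`` and ``test_set_ivar`` are arrays of new spectra sampled onto the same wavelength points as the training set:

.. code-block:: python

    # Run the test step on new (hypothetical) spectra with the re-loaded model.
    test_labels = new_model.test(test_set_flux, test_set_ivar, threads=1)
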
69 changes: 55 additions & 14 deletions docs/source/index.rst
@@ -3,26 +3,67 @@
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.

The Cannon
==========

The Cannon is a data-driven approach to stellar label determination. The seminal paper describing The Cannon is `Ness et al. (2015) <http://adsabs.harvard.edu/abs/2015ApJ...808...16N>`_, and the name derives from Annie Jump Cannon, who first arranged stellar spectra in order of temperature purely from the data, without the need for stellar models. This software package is released as part of `Casey et al. (2016) <http://adsabs.harvard.edu/abs/2016arXiv160303040C>`_ and builds on the original implementation of The Cannon by including a number of additional features:

- Easily construct models with complex vectorizers (e.g., cubic polynomial models with 25 labels)
- Analytic derivatives for blazingly fast optimization at the training step *and* the test step
- Built-in parallelism to run the training step and the test step in parallel
- *Pseudo*-continuum normalization using sums of sine and cosine functions
- L1 regularization to discover and enforce sparsity
- Pixel censoring masks for individual labels
- Stratified under-sampling utilities to produce a (more) balanced training set

The Cannon is being actively developed in a `public GitHub repository <https://github.com/andycasey/AnniesLasso>`_, where you can `open an issue <https://github.com/andycasey/AnniesLasso/issues/new>`_ if you have any problems.

User Guide
----------

.. toctree::
:maxdepth: 2
:maxdepth: 3

install
guide
tutorials
api
utilities


License & Attribution
---------------------

The source code is released under the MIT license. If you make use of the code, please cite both the original Ness et al. (2015) paper and Casey et al. (2016):

.. code-block:: tex

    @ARTICLE{Ness_2015,
        author = {{Ness}, M. and {Hogg}, D.~W. and {Rix}, H.-W. and {Ho}, A.~Y.~Q. and
            {Zasowski}, G.},
        title = "{The Cannon: A data-driven approach to Stellar Label Determination}",
        journal = {\apj},
        year = 2015,
        month = jul,
        volume = 808,
        eid = {16},
        pages = {16},
        doi = {10.1088/0004-637X/808/1/16},
    }

    @ARTICLE{Casey_2016,
        author = {{Casey}, A.~R. and {Hogg}, D.~W. and {Ness}, M. and {Rix}, H.-W. and
            {Ho}, A.~Q. and {Gilmore}, G.},
        title = "{The Cannon 2: A data-driven model of stellar spectra for detailed chemical abundance analyses}",
        journal = {ArXiv e-prints},
        archivePrefix = "arXiv",
        eprint = {1603.03040},
        year = 2016,
        month = mar,
    }

Here is a list of notable publications that have used or built upon The Cannon:

- Ho et al.
- others
43 changes: 39 additions & 4 deletions docs/source/install.rst
@@ -3,10 +3,45 @@
Installation
============

You can install the most recent stable version of The Cannon using `PyPI <https://pypi.python.org/pypi/the-cannon>`_ or the
development version from `GitHub <http://www.github.com/andycasey/AnniesLasso>`_.


Stable Version
--------------

The easiest way to install the most recent stable version of The Cannon is by using `pip <https://pypi.python.org/pypi/pip>`_.
This will also install any prerequisites (e.g., `numpy <https://pypi.python.org/pypi/numpy>`_, `scipy <https://pypi.python.org/pypi/scipy>`_) that you don't already have:

::

    pip install the-cannon


Development Version
-------------------

To get the source for the latest development version, clone the `git <https://git-scm.com/>`_ repository on `GitHub <http://www.github.com/andycasey/AnniesLasso>`_:

::

    git clone https://github.com/andycasey/AnniesLasso.git
    cd AnniesLasso
    git checkout refactor # TODO - Remove this


Then install the package by running the following command:

::

    python setup.py install


Testing
-------

To run all the unit and integration tests, install `nose <http://nose.readthedocs.org>`_ and then run:

::

    nosetests -v --cover-package=thecannon

17 changes: 17 additions & 0 deletions docs/source/tutorials.rst
@@ -0,0 +1,17 @@
.. _tutorials:

Tutorials
=========

Constructing the training set
-----------------------------


Continuum normalization
-----------------------

Censoring
---------

Regularization
--------------
5 changes: 5 additions & 0 deletions docs/source/utilities.rst
@@ -0,0 +1,5 @@
.. _utilities:

Utilities
=========
