Added to the results section
andycasey committed Feb 16, 2016
1 parent 14c1a0a commit 0d781b7
Showing 1 changed file with 68 additions and 7 deletions.
75 changes: 68 additions & 7 deletions papers/annieslasso.tex
@@ -710,22 +710,72 @@ \subsection{Label errors \& covariances}
\section{Results}
\label{sec:results}
% ARC: We have a good model. Note that all combined APOGEE spectra are in
% the regime where we are dominated by systematics.
% ARC: The differences between us and APOGEE are clear in the high-alpha seq.
Our experiments have demonstrated that a data-driven model for stellar spectra
can be reliably extended to high dimensionality in label space. We have further
shown that the regularization hyperparameters can be simplified to just two
numbers that can be heuristically set. This yields a sparse, interpretable
(see below) model that recovers labels with high precision at low S/N. However,
for all stacked \apogee\ spectra, the minimum S/N exceeds 50, well into the
regime where we are advantageously dominated by systematic uncertainties.
% ARC: Some plots showing galactic chemical evolution? e.g. [Fe/H] vs [X/Fe]
% ARC: Globular clusters
We have used our regularized model to measure (test) labels of 150,677 \apogee\
spectra, normalized and stacked using the method in Section
\ref{sec:training-set}. In addition to the model being demonstrably effective,
the test step is very fast: our pure-\texttt{Python} implementation returned 17
labels for all 150,677 spectra in just 28 minutes of wall-time from a single
optimization point on a small research cluster in Cambridge. These were free
and otherwise unused resources; no dedicated computing assets were required.
This pace is also projected to increase, as the test step did not include
analytic derivatives $d\Dvector/dy_j$, which are now implemented in our
open-source code.
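For scale, the figures quoted above imply an aggregate wall-time throughput,
across the cluster and for a single initialization, of roughly
\begin{equation}
\frac{150{,}677~{\rm spectra}}{28 \times 60~{\rm s}} \approx 90~{\rm spectra~s^{-1}} .
\end{equation}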
The test-step optimization is not convex because the vectorizer contains
quadratic label terms. For this reason we ran the optimization from nine
different initialization points, chosen to sparsely cover the range of
$\Teff$, $\logg$, and abundance labels in the training set. Of the nine
optimizations, we adopted the end result with the lowest $\chi^2$ value.
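Schematically, and in notation assumed here only for illustration (it is not
defined elsewhere in this work), let $\ell_0^{(k)}$ with $k = 1, \ldots, 9$
denote the initialization points, and let $\hat{\ell}^{(k)}$ denote the label
vector at which the optimization started from $\ell_0^{(k)}$ terminates. The
adopted labels $\hat{\ell}$ are then
\begin{equation}
\hat{\ell} = \hat{\ell}^{(k^\star)} ,
\quad
k^\star = \mathop{\mathrm{arg\,min}}\limits_{k \in \{1, \ldots, 9\}}
          \chi^2\left(\hat{\ell}^{(k)}\right) .
\end{equation}
This multi-start selection guards against poor local minima, but it does not
guarantee that the global minimum has been found.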
The training set only includes giant stars, but the \apogee\ \dr\ includes
giants and dwarfs. Therefore we exclude stars with
ARC GIVE FINAL CRITERIA BASED ON ASPCAP LABELS AND/OR CHI-SQUARED VALUE.
The distilled sample contains XX,XXX giant stars, where we report $\Teff$,
$\logg$, and 15 abundance labels. The distribution of $\chi^2$ values for
all 150,677 combined spectra is shown in Figure \ref{fig:chisq-test-set}. The
labels in the distilled sample follow expectations from stellar astrophysics,
and include stars that are marginally outside the training set. For example,
the \aspcap\ labels include a strict cut in $\Teff$ at 3600~K, but we reliably
recover labels beyond this boundary. Figure \ref{fig:test-set-hrd} presents a
few different label projections for the distilled sample, indicative of the
boundaries and distribution of our labels.
% Galactic chemical evolution.
% High-alpha sequence.
% Globular clusters.
% ARC: Open clusters??
\section{Discussion}
\label{sec:discussion}
% We have already demonstrated the precision.
There are clear differences between our labels and those from \aspcap\ for
stars with modest S/N (between 50 and 120). These differences are not
apparent for stars with $S/N \gtrsim 200$; however, many stars in this regime
are present in our training set. In Figures \ref{fig:high-alpha-seq} and
% ARC: The differences between us and APOGEE are clear in the high-alpha seq.
% ARC: Globular clusters
% Model Interpretability?
@@ -782,6 +832,17 @@ \section{Discussion}
DWH: All of the code for this project is available with documentation
at \url{http://thecannon.io/}.
% Let's see if this can slip past Hogg...
%\section{Conclusions}
% We have demonstrated that a data-driven model for stellar spectra can be
% reliably extended to high dimensionality in label space. We have further
% shown that regularization substantially improves the model interpretability:
% spectral derivatives for abundance labels correspond well with known atomic
% lines, and we are able to identify spectral lines that were previously
% unknown.
\acknowledgements
% Thanks...
The authors warmly thank Daniel Foreman-Mackey for valuable discussions.