In [None]:
from __future__ import print_function, division, absolute_import

As a preamble - note that Jupyter notebooks are formatted using [markdown](http://daringfireball.net/projects/markdown/), a text-to-HTML conversion tool. If you are not familiar with it, there are [cheatsheets](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet) available with all the basic functionality that you will need.

# LSSTC DSFP Template Notebook:

# How to put together a DSFP notebook using our standard formatting

**Version 0.1**

This notebook serves as a template for constructing problems for students as part of the [LSSTC Data Science Fellowship Program](http://ciera.northwestern.edu/Education/LSSTC_DSFPOverview.php). This preamble to the notebook contains an overview of the problems with a brief introduction to the big picture behind the problem. 

For some example notebooks, take a look at AAM's problems on [Unsupervised Machine Learning](https://github.com/LSSTC-DSFP/LSSTC-DSFP-Sessions/blob/master/Session1/Day4/IntroToMachineLearning.ipynb) and [Building An End-to-end Machine Learning Pipeline]().

*Note - tips and suggestions based on AAM's experience developing problems for the DSFP are italicized in this notebook. Otherwise, we typically reserve italics for **hints** on specific problems for the students.*

* * *

By AA Miller (CIERA/Northwestern & Adler)

*The necessary libraries are imported at the beginning of the notebook*

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

%matplotlib notebook

## Problem 1) Creating DSFP Problems

We have found that modular notebooks work best. Thus the typical construction includes $\sim3-5$ capital P Problems, each with sub-problems a, b, c, d, etc. The Problems address major ideas from the lecture, while the sub-problems directly address the nitty gritty details related to implementation of the idea. Most sub-problems require some measure of coding (*typically $\lesssim 10$ lines in a single sub-problem is best*), though some simply ask for responses regarding the data or the code that has just been written. 

The modular structure allows the students to move at their own pace, and at this point they all understand that problems get progressively more difficult throughout the notebook. Within the time that is allotted, some students finish and others do not. Following the Problems, there is always a **Challenge Problem** for the students who finish early. These Challenge Problems are always more difficult and typically are not as well structured as the earlier Problems in the notebook.

**An essential tip** - in my experience it is by *far* best to start by writing the solutions to the notebook. This ensures that all the code works and behaves as expected [it also helps control the total time necessary to complete the problems]. Then the notebook for the students can be created by copying the solutions and removing important portions of the code using `# complete` as you will see in the examples below.
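As a rough illustration of that workflow, the sketch below turns solution lines into student stubs by replacing the right-hand side of simple assignments with `# complete`. The helper name `stub_assignment` is hypothetical and not part of any DSFP tooling, and the approach is deliberately naive: it assumes one plain assignment per line and would mangle comparisons such as `==` or keyword arguments, so the output still needs to be checked by hand.

```python
def stub_assignment(line):
    """Replace the right-hand side of a simple assignment with '# complete'.

    Hypothetical helper for illustration only. Comment lines and lines
    without an '=' sign are returned unchanged.
    """
    if "=" not in line or line.lstrip().startswith("#"):
        return line
    # Keep everything before the first '=' (including any indentation).
    lhs, _sep, _rhs = line.partition("=")
    return lhs.rstrip() + " = # complete"


# Two lines from an imagined solutions notebook.
solution = [
    "Nstar = sum(stars)",
    "# report the result",
]

student = [stub_assignment(line) for line in solution]
for line in student:
    print(line)
```

In practice it is often just as quick to delete the relevant code by hand; the point is only that the student version is derived from working solutions rather than written from scratch.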

It is often, though not always, useful to include code that the students run to load data for the notebook. Here is an example to load some columns from SDSS. As this data is required for the exercise, this is not considered a problem.

In [None]:
# execute this cell
from astroquery.sdss import SDSS  # enables direct queries to the SDSS database

TSquery = """SELECT TOP 1000
             p.psfMag_r, p.fiberMag_r, p.fiber2Mag_r, p.petroMag_r, 
             p.deVMag_r, p.expMag_r, p.modelMag_r, p.cModelMag_r, 
             s.class
             FROM PhotoObjAll AS p JOIN specObjAll s ON s.bestobjid = p.objid
             WHERE p.mode = 1 AND s.sciencePrimary = 1 AND p.clean = 1 AND s.class != 'QSO'
             ORDER BY p.objid ASC
               """
SDSSts = SDSS.query_sql(TSquery)
SDSSts

For the actual problems it is best to provide the students with code snippets as in the example below. It is also useful to define specific variable names that will be used throughout the notebook.

**Problem 1a**

How many sources in the random selection from SDSS have a spectroscopic class of `STAR`? Store the result in a variable called `Nstar`.

*Hint* - pay attention to the python type for the `class` column in `SDSSts`.

In [None]:
stars = SDSSts["class"] # complete
Nstar = # complete

print("There are {:d} stars in the data set.".format( # complete
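The hint in Problem 1a matters because, in Python 3, byte strings and text strings never compare equal. The sketch below uses a plain Python list as a hypothetical stand-in for the `class` column, assuming the column holds byte strings; whether the actual query result does so depends on the installed library versions, so check the column's type before relying on this.

```python
# Hypothetical stand-in for a table column that stores byte strings.
classes = [b"STAR", b"GALAXY", b"STAR"]

# bytes and str never compare equal in Python 3, so this counts nothing.
n_wrong = sum(c == "STAR" for c in classes)

# Comparing against a bytes literal, or decoding first, gives the real count.
n_bytes = sum(c == b"STAR" for c in classes)
n_decoded = sum(c.decode("ascii") == "STAR" for c in classes)

print(n_wrong, n_bytes, n_decoded)  # prints: 0 2 2
```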

**Problem 1b**

How many of the stars in the data set are brighter than $r' = 20 \; \mathrm{mag}$ (i.e., have $r' < 20$)? Store the result in a variable called `Nbright_star`.

In [None]:
bright = SDSSts[ # complete
Nbright_star = # complete
    
print("There are {:d} stars with r' < 20 in the data set.".format( # complete

Sometimes it is useful to add an explanation for the acquired results, or to add some text setting up the next portion of the problem. Alternatively, you may want the students to think about the results and provide a response for what they found, as follows. *In this case it's good to provide a markdown cell for them to write their answer.*

**Problem 1c**

Based on your knowledge of SDSS, do your results above make sense?

*Provide your answer to Problem 1c here*

While plots are useful, and often necessary, it's best to keep them compact if possible. We don't want students struggling with plot syntax (unless the specific topic is visualization). Thus, in these cases, it's best to provide code snippets that are closer to complete.

**Problem 1d**

For the bright stars, make a scatter plot of `psfMag_r` vs. `deVMag_r`.

In [None]:
plt.scatter(SDSSts[(bright) & (stars)][ # complete

plt.xlabel('psfMag_r')
plt.ylabel('deVMag_r')
plt.xlim( # complete
plt.ylim( # complete

plt.tight_layout()

## Problem 2

When the first idea is complete, begin a new Problem. The notebook continues with this modular structure until the challenge problem. Problems 2a, 2b, 2c, etc. are eventually to be followed by Problem 3... and so on. 

Text explaining what's going on

**Problem 2a**

Definition of Problem 2a.

In [None]:
code_snippet1 = # complete
code_snippet2 = # complete
code_snippet3 = # complete
code_snippet4 = # complete

Finally, as noted above, the notebook should end with a Challenge Problem in case any of the students finish the notebook before time is up. This problem should be harder and include fewer prompts.

## Challenge Problem

Complete the following problem, which can be arbitrarily difficult.

In [None]:
# no code snippets provided here