# Dataset: The SDSS Photometric Catalog

<table><tr>
<td><img src="../graphics/sdss-dusk.jpg" width=512></td>
<td><img src="../graphics/NGC_1130_SDSS.jpg" width=512></td>
</tr></table>

## SDSS

* The Sloan Digital Sky Survey imaged over 10,000 sq degrees of sky (about 25% of the total), automatically detecting, measuring and "cataloging" millions of "objects."


* While the primary data products of the SDSS were (and still are) its spectroscopic surveys, the photometric survey provides an important testbed for imaging surveys like DES and LSST.


* Let's download part of the SDSS photometric object catalog and explore the measurements it contains.
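The "about 25%" in the first bullet comes from the area of the whole celestial sphere. That area is 4π steradians, and one steradian is (180/π)² square degrees. A quick check:

```python
import math

# One steradian is (180/pi)^2 square degrees; the full sphere is 4*pi steradians.
SQ_DEG_PER_SR = (180.0 / math.pi) ** 2
full_sky_sq_deg = 4.0 * math.pi * SQ_DEG_PER_SR

surveyed_sq_deg = 10_000.0  # the "over 10,000 sq degrees" quoted above, used as a lower bound
fraction = surveyed_sq_deg / full_sky_sq_deg

print(f"Full sky: {full_sky_sq_deg:,.0f} sq deg")
print(f"Fraction covered: {fraction:.1%}")
```

The whole sky is about 41,253 square degrees, so 10,000 square degrees is roughly a quarter of it.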

## DR12

* SDSS data release 12 (DR12) is described [at the SDSS-III website](http://www.sdss.org/dr12/) and in the survey paper by [Alam et al. 2015](http://arxiv.org/abs/1501.00963).

* The DR12 photometric catalog is an [online SQL database](http://skyserver.sdss.org/dr12/en/tools/search/sql.aspx) that can be [queried](http://skyserver.sdss.org/dr12/en/help/docs/realquery.aspx) for data.

> Small test queries can be executed directly in the browser. Larger ones (involving more than a few tens of thousands of objects, or that involve a lot of processing) should be submitted via the [CasJobs](http://skyserver.sdss.org/CasJobs/) system.

## Querying the DR12 database

In [None]:
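# Load the course helper functions (such as select) from the repo.
exec(open('../examples/SDSScatalog/SDSS.py').read())
# Turn on inline plotting first, so that older inline backends, which apply
# their own default figure size, don't override the setting below.
%matplotlib inline
import numpy as np, pandas as pd, matplotlib
matplotlib.rcParams['figure.figsize'] = (12.0, 12.0)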

In [None]:
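# Galaxies (type 3) and stars (type 6) in a 0.2 by 0.2 degree RA/Dec box.
# dered_* are extinction-corrected magnitudes; petroR50_i is the radius
# (in arcsec) enclosing half of the object's i-band Petrosian flux.
objects = "SELECT top 10000 \
ra, \
dec, \
type, \
dered_u as u, \
dered_g as g, \
dered_r as r, \
dered_i as i, \
petroR50_i AS size \
FROM PhotoObjAll \
WHERE \
((type = '3' OR type = '6') AND \
 ra > 185.0 AND ra < 185.2 AND \
 dec > 15.0 AND dec < 15.2)"
print(objects)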
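The WHERE clause above selects a small box in RA and Dec. The sketch below rebuilds the same query from the box limits. It also computes how much sky the box covers, using the solid angle of an RA/Dec box, $\Omega = \Delta\alpha\,(\sin\delta_2 - \sin\delta_1)$, with $\Delta\alpha$ in radians. The helpers `box_query` and `box_area_sq_deg` are hypothetical and not part of the course code. This version also writes the type codes as plain integers.

```python
import math


def box_query(ra_min, ra_max, dec_min, dec_max, top=10000):
    """Hypothetical helper: build a SkyServer query like the one above for an
    RA/Dec box given in degrees. Types 3 and 6 are GALAXY and STAR."""
    return (
        f"SELECT TOP {top} ra, dec, type, "
        "dered_u AS u, dered_g AS g, dered_r AS r, dered_i AS i, "
        "petroR50_i AS size "
        "FROM PhotoObjAll "
        "WHERE ((type = 3 OR type = 6) AND "
        f"ra > {ra_min} AND ra < {ra_max} AND "
        f"dec > {dec_min} AND dec < {dec_max})"
    )


def box_area_sq_deg(ra_min, ra_max, dec_min, dec_max):
    """Area of an RA/Dec box in square degrees:
    (delta RA in degrees) * (sin dec_max - sin dec_min) * 180/pi."""
    d_sin = math.sin(math.radians(dec_max)) - math.sin(math.radians(dec_min))
    return (ra_max - ra_min) * d_sin * (180.0 / math.pi)


sql = box_query(185.0, 185.2, 15.0, 15.2)
area = box_area_sq_deg(185.0, 185.2, 15.0, 15.2)
print(sql)
print(f"Box area: {area:.4f} sq deg = {area * 3600:.0f} sq arcmin")
```

For the box used here, the area comes out at roughly 0.039 square degrees, or about 139 square arcminutes. That is a tiny patch of the survey footprint.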


In [None]:
# Run the query on SkyServer with the select() helper loaded from SDSS.py.
# This can take a while...
sdssdata = select(objects)
sdssdata.head(50)
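The `select` helper comes from `SDSS.py`, which isn't reproduced here. The sketch below shows one way a comparable helper *could* work; it is not the course's actual code. It sends the query to SkyServer as an HTTP GET request.

Two details are assumptions to check against the SkyServer documentation before relying on them:

* The endpoint URL and its `cmd`/`format` parameters are modelled on SDSS's `sqlcl.py` command-line client.
* Any header line that appears before the CSV column names is assumed to start with `#`.

```python
import io
import urllib.request
from urllib.parse import urlencode

import pandas as pd

# ASSUMPTION: DR12 SkyServer endpoint that runs the SQL passed as `cmd` and
# returns the result in the requested `format`; verify before relying on it.
SKYSERVER_URL = "http://skyserver.sdss.org/dr12/en/tools/search/x_sql.aspx"


def strip_sql_comments(sql):
    """Drop '--' comments and blank lines, joining the rest with spaces.
    (Naive: it would also cut a '--' that appears inside a string literal.)"""
    lines = (line.split("--")[0].strip() for line in sql.splitlines())
    return " ".join(line for line in lines if line)


def build_query_url(sql, fmt="csv"):
    """Return the GET URL asking SkyServer to run `sql` and reply in `fmt`."""
    params = urlencode({"cmd": strip_sql_comments(sql), "format": fmt})
    return SKYSERVER_URL + "?" + params


def select_sketch(sql, timeout=300):
    """Hypothetical stand-in for SDSS.py's select(): run `sql`, return a DataFrame."""
    with urllib.request.urlopen(build_query_url(sql), timeout=timeout) as response:
        text = response.read().decode("utf-8")
    # ASSUMPTION: any header line before the CSV column names starts with '#'.
    return pd.read_csv(io.StringIO(text), comment="#")
```

In the notebook, `select_sketch(objects)` would make the network call. `build_query_url(objects)` only builds the URL, which is handy for checking what will be sent.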


Note:

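* Some values of `size` are large and negative, indicating that the automated measurement routine failed (SDSS records failed measurements with sentinel values such as -9999). We will need to deal with these.


* Sizes are half-light ("effective") radii in arcseconds: `petroR50_i` is the radius enclosing 50% of the object's i-band Petrosian flux. The typical resolution (the effective radius of the "point spread function") in an SDSS image is around 0.7".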
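One alternative to dropping whole rows is to mark only the failed entries as missing. The sketch below does this with pandas' `DataFrame.where`. It uses a few made-up rows in which -9999 stands in for a failed measurement; these are toy values, not real SDSS data.

```python
import pandas as pd

# Made-up rows for illustration; -9999 marks a failed measurement.
toy = pd.DataFrame({
    "r": [17.2, 19.8, -9999.0, 21.4],
    "size": [1.9, -9999.0, 0.8, 3.1],
})

# Keep positive values; everything else becomes NaN ("not measured").
cleaned = toy.where(toy > 0)

bad_per_column = cleaned.isna().sum()  # number of failed values in each column
usable_rows = cleaned.dropna()         # rows in which every measurement is usable
print(bad_per_column)
print(len(usable_rows))
```

The cleaning cell further down takes the simpler route of dropping any row with a bad value. Masking instead keeps the good measurements of partly failed objects, at the cost of having to handle NaNs downstream.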


The dataset we just downloaded is also saved in the course repo.

In [None]:
# ! mkdir -p ../examples/SDSScatalog/downloads
# ! mv SDSSobjects.csv ../examples/SDSScatalog/downloads/
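The commented-out shell commands move a file `SDSSobjects.csv` from the working directory into the repo. If you'd rather stay in Python, the sketch below uses a hypothetical helper, `save_catalog`. Rather than moving an existing file, it writes a fresh copy of a DataFrame with pandas, creating the target directory first.

```python
import tempfile
from pathlib import Path

import pandas as pd


def save_catalog(df, directory, filename="SDSSobjects.csv"):
    """Hypothetical helper: write `df` as CSV into `directory`, creating it
    (and any parents) first, like `mkdir -p`. Returns the file path."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / filename
    df.to_csv(path, index=False)  # index=False: don't add a row-number column
    return path


# Demonstration in a throwaway directory; in the notebook the call would be
# save_catalog(sdssdata, "../examples/SDSScatalog/downloads").
with tempfile.TemporaryDirectory() as tmp:
    toy = pd.DataFrame({"ra": [185.01, 185.17], "dec": [15.03, 15.12]})  # made-up values
    path = save_catalog(toy, Path(tmp) / "downloads")
    roundtrip = pd.read_csv(path)
    print(path.name, roundtrip.shape)
```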

## Visualizing Data in N-dimensions

* This is, in general, difficult.


* Looking at all possible 1- and 2-dimensional histograms/scatter plots helps a lot.


* Color coding can bring in a 3rd dimension ([and even a 4th](http://blogs.scientificamerican.com/sa-visual/visualizing-4-dimensional-asteroids1/)). Interactive plots and movies are also well worth thinking about.
<br>

In [None]:
# We'll use astronomical g-r color as the colorizer, and then plot
# position, magnitude, size and color against each other.

data = pd.read_csv("../examples/SDSScatalog/downloads/SDSSobjects.csv",
                   usecols=["ra", "dec", "u", "g", "r", "i", "size"])

# Filter out objects with bad magnitude or size measurements:
data = data[(data["u"] > 0) & (data["g"] > 0) & (data["r"] > 0) & (data["i"] > 0) & (data["size"] > 0)]

# Log size, and g-r color, will be more useful:
data['log_size'] = np.log10(data['size'])
data['g-r_color'] = data['g'] - data['r']

# Drop the things we're not so interested in:
del data['u'], data['g'], data['r'], data['size']

data.head()
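Why use a color rather than the magnitudes themselves? Each magnitude is a logarithm of the flux $F$ in its band, offset by a band-dependent zero point $Z$. The difference of two magnitudes therefore depends only on the ratio of the fluxes:

$$
m = -2.5\log_{10} F + Z
\quad\Longrightarrow\quad
g - r = -2.5\log_{10}\!\left(\frac{F_g}{F_r}\right) + (Z_g - Z_r)
$$

Multiplying both fluxes by the same factor leaves $g - r$ unchanged. The color therefore tracks the shape of an object's spectrum rather than its overall brightness. Strictly, SDSS magnitudes are asinh magnitudes, which follow this logarithmic form for objects well above the noise.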

## Visualizing some DR12 object properties

In [None]:
plot_everything(data, colorizer='g-r_color', limits=(-1.0, 3.0))
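`plot_everything` also comes from the course helper code, which isn't shown here. The sketch below is a rough stand-in, the hypothetical `corner_scatter`, and is not the course's implementation. It draws a histogram of each column on the diagonal and a scatter plot of every pair of columns below it. Points are colored by the colorizer column, with `limits` used as the color range. Random toy data replace the real catalog so that the sketch runs on its own.

```python
import itertools

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs as a plain script; drop it in the notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def corner_scatter(df, colorizer, limits, bins=30):
    """Hypothetical stand-in for plot_everything: histograms on the diagonal,
    pairwise scatter plots below it, points colored by `colorizer` within `limits`."""
    columns = [c for c in df.columns if c != colorizer]
    n = len(columns)
    vmin, vmax = limits
    fig, axes = plt.subplots(n, n, figsize=(3 * n, 3 * n), squeeze=False)
    for row, col in itertools.product(range(n), range(n)):
        ax = axes[row, col]
        if row == col:
            ax.hist(df[columns[col]], bins=bins, color="gray")
        elif row > col:
            ax.scatter(df[columns[col]], df[columns[row]], c=df[colorizer],
                       cmap="coolwarm", vmin=vmin, vmax=vmax, s=4)
        else:
            ax.axis("off")  # the upper triangle would only mirror the lower one
        if row == n - 1:
            ax.set_xlabel(columns[col])
        if col == 0:
            ax.set_ylabel(columns[row])
    fig.tight_layout()
    return fig, axes


# Random toy catalog (made-up values) with the same kind of columns as `data`.
rng = np.random.default_rng(42)
toy = pd.DataFrame({
    "ra": rng.uniform(185.0, 185.2, 200),
    "dec": rng.uniform(15.0, 15.2, 200),
    "i": rng.uniform(16.0, 23.0, 200),
    "g-r_color": rng.normal(0.8, 0.4, 200),
})
fig, axes = corner_scatter(toy, colorizer="g-r_color", limits=(-1.0, 3.0))
```

On the cleaned catalog, `corner_scatter(data, colorizer='g-r_color', limits=(-1.0, 3.0))` would draw a 4 × 4 grid for `ra`, `dec`, `i` and `log_size`.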