# Building a Sample for Citizen Science with Galaxy Zoo

The Sloan Digital Sky Survey (SDSS) records colors, redshifts, magnitudes, and more for millions of objects in the sky. You might wonder what a sample selected in a particular way (e.g., by color) "looks like". For instance, do blue galaxies tend to have a particular shape?

In this activity, we will walk through the basic steps required to select a sample of galaxies from the SDSS database, save their images to your Google Drive, and upload them into your own citizen science project on [Zooniverse](http://zooniverse.org). We will then build a basic project that lets your friends and family help answer a science question of your choosing.



## I. Importing SciServer and other important libraries
The SciServer team has written a number of libraries, generally prefixed with "SciServer", that assist in various functions. As with all Python libraries, they must be imported before being used.
The first two code blocks below mount your Google Drive and install the required packages. The code blocks after that import the SciServer libraries, together with some standard Python libraries helpful for scientific analysis, and apply some settings you may find helpful.

In [None]:
# Steps to mount Drive to Google Colab
# 1. Run this script
# 2. Click the link in the output of the script.
# 3. Sign in with the appropriate Google account.
# 4. Copy the code from the new page and go back to this page.
# 5. Enter the code in the box and press ENTER
# 6. Wait for the cell to output "Mounted at /content/drive"

from google.colab import drive
drive.mount('/content/drive', force_remount=True)

In [None]:
# If you have NOT changed the name of the package file 
# and it is in your main Google Drive directory, leave 
# this path alone. Otherwise, update this path accordingly.
drivePath = 'simpleZooniverse'

# Go to the appropriate folder for execution.
!python3 --version
!cd /content/drive/MyDrive/$drivePath; pwd; pip3 install -r requirements.txt > /dev/null

In [None]:
# Import other libraries for use in this notebook.
import os  
import sys
import time
import numpy as np                  # standard Python lib for math ops
from imageio import imsave       # save images as files
import pandas                       # data manipulation package
import matplotlib.pyplot as plt     # another graphing package
import glob
from astropy.table import Table
print('Supporting libraries imported')

# Apply some special settings to the imported libraries
# ensure columns get written completely in notebook
pandas.set_option('display.max_colwidth', None)
# do *not* show python warnings 
import warnings
warnings.filterwarnings('ignore')
print('Settings applied')

In [None]:
# Import Python libraries to work with SciServer
# Add the package folders in Google Drive to the module search path
basePath = '/content/drive/MyDrive/{}/'.format(drivePath)
for subdir in ('', 'SciServer', 'zooniversePackage'):
    path = basePath + subdir
    if path not in sys.path:
        sys.path.insert(1, path)
print(sys.path)
import CasJobs      # query the SDSS database
import SkyServer    # retrieve image cutouts
import SciDrive     # SciServer file storage
print('SciServer libraries imported')

## II. Querying an astronomy database (SDSS DR16)
The next code block searches the SDSS Data Release 16 database via the CasJobs REST API. The query completes quickly, so it uses CasJobs quick mode.
CasJobs also has an asynchronous mode, which submits a job to a queue and stores the results in a table in your MyDB. If your results are very large, it stores them in MyScratchDB instead.
Run the code block below to query DR16. Try changing some of the query parameters to see the effect on the results returned.
Documentation on the SciServer Python libraries can be found at:
http://www.sciserver.org/docs

An extensive tutorial on how to query the SDSS database can be found here:
http://skyserver.sdss.org/dr16/en/help/howto/search/searchhowtohome.aspx

In [None]:
# Query the Sloan Digital Sky Survey's Data Release 16.
# For the database schema and documentation see http://skyserver.sdss.org/dr16
#
# This query finds "a 4x4 grid of nice-looking galaxies": 
#   galaxies in the SDSS database that have a spectrum 
#   and have a size (petror90_r) larger than 10 arcsec.
# 
# First, store the query in an object called "query"
query="""
SELECT TOP 16 p.objId,p.ra,p.dec,p.petror90_r, p.g, p.r
  FROM galaxy AS p
   JOIN SpecObj AS s ON s.bestobjid = p.objid
WHERE p.u BETWEEN 0 AND 19.6
  AND p.g BETWEEN 0 AND 17  AND p.petror90_r > 10
"""
gals = CasJobs.executeQuery(query, "dr16")
gals = gals.set_index('objId')
gals

#### EXPERIMENT:  What happens if you change the query?

Try changing the query in the following ways: 
- Return 24 matching objects instead of 16.
- Select only nearby galaxies (redshift < 0.05).
- Select only galaxies likely to be spirals (with u-r color >= 2.22)
- Search for galaxies in SDSS Data Release 14 instead of DR16.

What changes do you notice in the table of returned results?

Try it in the code block below:

In [None]:
# Find objects in the Sloan Digital Sky Survey's Data Release 14.
# First, store the query in an object called "query"
query="""
SELECT TOP 24 p.objId,p.ra,p.dec,p.petror90_r, p.g, p.r
  FROM galaxy AS p
   JOIN SpecObj AS s ON s.bestobjid = p.objid
WHERE p.u BETWEEN 0 AND 19.6
  AND p.g BETWEEN 0 AND 17
  AND p.petror90_r > 10
  AND s.z < 0.05
  AND p.u - p.r >= 2.22
"""
gals = CasJobs.executeQuery(query, "dr14")
gals = gals.set_index('objId')
gals
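
The variations suggested in the experiment can also be generated programmatically. The sketch below is one possible way to do that; `build_galaxy_query` and its parameter names are hypothetical helpers introduced here for illustration, not part of the SciServer libraries. It only assembles a SQL string using the same tables and cuts as the queries above.

```python
def build_galaxy_query(top=16, max_redshift=None, min_u_minus_r=None):
    """Return a SQL string like the ones above, with optional extra cuts.

    Hypothetical helper for illustration; it does not contact any server.
    """
    # Base cuts copied from the notebook's query
    conditions = [
        "p.u BETWEEN 0 AND 19.6",
        "p.g BETWEEN 0 AND 17",
        "p.petror90_r > 10",
    ]
    # Optional cuts from the experiment list
    if max_redshift is not None:
        conditions.append("s.z < {}".format(max_redshift))
    if min_u_minus_r is not None:
        conditions.append("p.u - p.r >= {}".format(min_u_minus_r))
    where_clause = "\n  AND ".join(conditions)
    return (
        "SELECT TOP {} p.objId, p.ra, p.dec, p.petror90_r, p.g, p.r\n"
        "  FROM galaxy AS p\n"
        "  JOIN SpecObj AS s ON s.bestobjid = p.objid\n"
        "WHERE {}\n"
    ).format(int(top), where_clause)


example_query = build_galaxy_query(top=24, max_redshift=0.05, min_u_minus_r=2.22)
print(example_query)
```

The resulting string can be passed to `CasJobs.executeQuery` together with a data release name, in the same way as the `query` variable in the code blocks above.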

## IV. Store results in your container for later use
The next code block saves the data table "gals" as an HDF5 file and as a CSV file.

The files are written to the notebook's current working directory. To see them, open the file browser for your notebook environment (in Google Colab, the Files panel in the left sidebar). Click on a file name to preview it.


In [None]:
# store result as HDF5 file 
h5store = pandas.HDFStore('GalaxyThumbSample.h5')
h5store['galaxies']=gals
h5store.close()

# store result as CSV file
gals.to_csv('GalaxyThumbSample.csv')

print ("Done.")
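
To reuse the saved table later, it has to be read back with the same index. The sketch below shows a CSV round trip using a small stand-in table with made-up values (the object IDs and coordinates are placeholders, not SDSS data); in the notebook you would write `gals` and read `GalaxyThumbSample.csv` instead.

```python
import os
import tempfile

import pandas

# Stand-in table with made-up values; in the notebook, use `gals` instead.
demo = pandas.DataFrame(
    {"ra": [150.1, 210.5], "dec": [2.2, -0.7], "petror90_r": [12.3, 15.8]},
    index=pandas.Index([1001, 1002], name="objId"),
)

with tempfile.TemporaryDirectory() as tmpdir:
    csv_path = os.path.join(tmpdir, "GalaxyThumbSample.csv")
    demo.to_csv(csv_path)
    # index_col restores objId as the index rather than an ordinary column
    restored = pandas.read_csv(csv_path, index_col="objId")

print(restored)
```

Without `index_col`, pandas would load `objId` as a regular column and add a new integer index, so lookups by object identifier would no longer work the same way as with `gals`.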


## V. Retrieve thumbnail cutouts of galaxies and show them on screen
SkyServer has a service that will produce a color image cutout of certain dimensions around a specified position, displayed as a JPG thumbnail.

The code below iterates through each galaxy in your results and calls the image cutout generator for each galaxy. The scale of the image depends on the Petrosian radius of the galaxy.

In [None]:
# set thumbnail parameters
width=200           # image width (pixels)
height=200          # image height (pixels)
pixelsize=0.396     # SDSS pixel scale (arcsec per pixel)
maxPreview = 16     # number of galaxies that fit in the 4x4 preview grid
plt.figure(figsize=(15, 15))   # display in a 4x4 grid

nPreview = min(len(gals), maxPreview)
for i, (index, gal) in enumerate(gals.iterrows(), start=1):   # iterate through rows in the DataFrame
    # Preview only the first maxPreview galaxies; the 4x4 grid has no room for more
    if i > nPreview:
        break
    print('Getting image '+str(i)+' of '+str(nPreview)+'...')
    scale=2*gal['petror90_r']/pixelsize/width
    img = SkyServer.getJpegImgCutout(ra=gal['ra'], dec=gal['dec'], width=width, height=height, scale=scale,dataRelease='DR16')

    plt.subplot(4,4,i)

    ## To preview the full sample instead, set maxPreview = len(gals) above
    ## and replace the plt.subplot call with the following command:
    #plt.subplot(int(np.ceil(len(gals)/4)),4,i)

    plt.imshow(img)                             # show images in grid
    plt.title(index)                            # show the object identifier (objId) above the image.

print('Plotting images...')

plt.show()
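
The two small calculations in the preview step, the cutout scale and the grid layout, can be isolated and checked on their own. The sketch below is an illustration under stated assumptions: `cutout_scale` and `grid_shape` are hypothetical helper names introduced here, and the scale formula is simply copied from the loop above.

```python
import math

SDSS_PIXEL_SCALE = 0.396  # arcsec per pixel, the value used in the cell above


def cutout_scale(petror90_r, width=200, pixelsize=SDSS_PIXEL_SCALE):
    """Same formula as the loop above: 2 * petror90_r / pixelsize / width."""
    return 2 * petror90_r / pixelsize / width


def grid_shape(n_images, n_cols=4):
    """Rows and columns needed to show n_images in a grid n_cols wide."""
    n_rows = max(1, math.ceil(n_images / n_cols))
    return n_rows, n_cols


print(cutout_scale(10.0))
print(grid_shape(16), grid_shape(24), grid_shape(25))
```

Rounding up with `math.ceil` gives exactly as many rows as needed, so a sample of 16 galaxies fits in 4 rows and a sample of 24 in 6 rows, with no empty row at the bottom of the figure.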

## VI. Write thumbnails to Google Drive

The code block below writes the thumbnails you generated in Section V to your Google Drive, along with a manifest file that lists each image.

In [None]:
# Specify the directory in your Google Drive to hold the thumbnail images
mydir = '/content/drive/MyDrive/{}/subjectFolder/sdssSubjects'.format(drivePath)

# Create the directory (and any missing parent folders) if it does not exist;
# otherwise, clear out any files left over from a previous run
if not os.path.isdir(mydir):
    os.makedirs(mydir)
else:
    for f in glob.glob(mydir+'/*'):
        os.remove(f)  

# set thumbnail parameters
width=200           # image width
height=200          # height
pixelsize=0.396     # image scale
        
# make a name for your "manifest" -- the table that matches your images to known information about the galaxies
fout = open(mydir+'/galaxy_manifest.csv', 'w')
fout.write("index, image, ra, dec \n")

# Write thumbnails to Google Drive. You will see a progress message as each is written
i = 0
localArray = []
for index,gal in gals.iterrows():   
    i = i + 1
    print('Writing image file '+str(i)+' of '+str(len(gals))+'...')
    scale=2*gal['petror90_r']/pixelsize/width
    # Use the same data release as the preview so the saved images match what was shown
    img = SkyServer.getJpegImgCutout(ra=gal['ra'], dec=gal['dec'], width=width, height=height, scale=scale,dataRelease='DR16')
    localpath = mydir+'/'+str(index)+'.jpg'
    # Here the file is saved to Google Drive 
    imsave(localpath,img)
    localArray.append(localpath)
    fout.write('{}, {}.jpg, {}, {}\n'.format(index,index,gal['ra'],gal['dec']))
fout.close()

print('Done!')
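
Before uploading, it can be useful to confirm that every image listed in the manifest actually exists on disk. The sketch below is one way to check this; `check_manifest` is a hypothetical helper introduced here, and the demonstration uses made-up entries in a temporary folder rather than real thumbnails. It assumes the manifest layout written above, including the spaces after the commas.

```python
import csv
import os
import tempfile


def check_manifest(manifest_path):
    """Return the image names listed in the manifest that are missing on disk."""
    folder = os.path.dirname(manifest_path)
    missing = []
    with open(manifest_path, newline="") as f:
        reader = csv.reader(f)
        # The header is written with spaces after the commas, so strip them
        header = [name.strip() for name in next(reader)]
        image_col = header.index("image")
        for row in reader:
            if not row:  # skip blank lines
                continue
            image_name = row[image_col].strip()
            if not os.path.isfile(os.path.join(folder, image_name)):
                missing.append(image_name)
    return missing


# Demonstration with made-up entries in a temporary folder
with tempfile.TemporaryDirectory() as tmpdir:
    manifest = os.path.join(tmpdir, "galaxy_manifest.csv")
    with open(manifest, "w") as fout:
        fout.write("index, image, ra, dec \n")
        fout.write("1001, 1001.jpg, 150.1, 2.2\n")
        fout.write("1002, 1002.jpg, 210.5, -0.7\n")
    # Create only one of the two images so the check has something to report
    open(os.path.join(tmpdir, "1001.jpg"), "wb").close()
    missing_images = check_manifest(manifest)

print(missing_images)
```

In the notebook, calling `check_manifest(mydir + '/galaxy_manifest.csv')` should return an empty list if every thumbnail was written successfully.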

## VII. Uploading your data to Zooniverse

Your thumbnails are now stored in your Google Drive, together with the manifest file (galaxy_manifest.csv), which was saved in the same directory as the images.

If you haven't already, you'll now need to make an account on [Zooniverse](https://www.zooniverse.org/).


In [None]:
import getpass
import zooniverseScripts as zooni

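# Name used to label your project (assumed here to be your Zooniverse username)
user = input('Enter your Zooniverse username: ')
projName = 'Simple Zooniverse SDSS Example Project for ' + user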
imgLoc = mydir+'/galaxy_manifest.csv'

zooni.inputData(projectName=projName, dsLocations=imgLoc)