# Example Analysis with Zooniverse GalaxyZoo Data

<img src="http://jmcolberg.com/weblog/archives/GalaxyZoo.jpg" width = "600x">

## Step 1: Where are all the galaxies?

Let's see where in the universe our <a href="http://www.galaxyzoo.org"> GalaxyZoo</a> galaxies are located.

You will be using the data file <a href="data/GZ_class_mags.dat">GZ_class_mags.dat</a>, which contains all of the classification results for the GalaxyZoo galaxies. Let's see what data this file contains. (To read in this data file, we will use 'ascii.read', which is a little different from the fits.open command you used in <a href="../Workshop1/Part_2.ipynb">Part 2</a> because the two are different types of data files.)

In [None]:
#Import needed astropy library
from astropy.table import Table,Column
from astropy.io import ascii

#Read in data table
#Fill in the blank with the ascii.read function as you did in Part 3
Galaxies = --- ('data/GZ_class_mags.dat')
Galaxies

#Run this cell (shortcut=shift+enter) to read in your data table.
#Note: Be patient. This is a large file and may take a minute or so to load.

In order to determine the <i>location</i> of the galaxies, we will use their right ascension (RA) and declination (DEC), which work on the sky the way longitude and latitude work on Earth. The <a href="http://astro.unl.edu/classaction/animations/coordsmotion/radecdemo.html">RA and DEC</a> values are located in columns three and four. Use these values to create a plot of the location of each galaxy in the sky.

## Make a plot using the commands you learned in <a href="../Workshop1/Part_2.ipynb">Part 2</a>.

In [None]:
"""First create two variables, RA and DEC, that contain the RA and DEC values from the table."""
#Access the 'RA' and 'DEC' columns of the Galaxies table and assign those values to RA and DEC, respectively
#If you need a refresher, look back at how you accessed GLON and GLAT data in Part 2


#### Now make a plot of RA vs Dec

Reuse the matplotlib.pyplot commands you used in <a href="../Workshop1/Part_2.ipynb">Part 2</a>.
* Remember, first insert a new cell below, type in your code, and then 'run the cell' by pressing the play button in the top menu.
* Remember to include any needed libraries

Your plot of RA vs Dec for all the Galaxy Zoo galaxies will look something like the image below (though it doesn't need to be in the same projection).

In the right-hand image, a yellow-green squiggle marks where the disk of our Milky Way galaxy lies. The disk of our Milky Way blocks our view of distant galaxies in that part of our sky.

<img style="float: left" src="http://classic.sdss.org/dr7/dr7photo_big.gif" width="400x">

<img style="float: right" src="http://farm2.static.flickr.com/1055/4724975807_79f8722a8d_b.jpg" width = "500x">

## Step 2: What Color are Typical Spiral Galaxies?

Subtracting a galaxy's i-band brightness from its g-band brightness tells you how blue or red the galaxy is. Smaller values of g-i mean the galaxy is bluer; larger values of g-i mean it is redder.

<img src="http://faculty.wcas.northwestern.edu/aaron-geller/myimages/sequence-de-hubble-galaxies.jpg" width = "600x">
<img src="http://faculty.wcas.northwestern.edu/aaron-geller/myimages/plot_sdss_filters_1.png" width = "600x">
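To make the g-i idea concrete, here is a small sketch using three hypothetical galaxies. The magnitudes are made up for illustration and are not taken from the GalaxyZoo table.

```python
import numpy as np

# Made-up g-band and i-band magnitudes for three hypothetical galaxies
demo_g = np.array([17.2, 18.0, 16.5])
demo_i = np.array([16.6, 16.5, 15.3])

# The color is the g-band value minus the i-band value
demo_color = demo_g - demo_i
print(demo_color)

# The smallest g-i value belongs to the bluest galaxy,
# the largest g-i value to the reddest one
bluest = np.argmin(demo_color)
reddest = np.argmax(demo_color)
print('bluest index:', bluest, 'reddest index:', reddest)
```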



### We will need to use the numpy functions np.array and np.subtract to create a new array of corrected color numbers.

#### 'np.array' creates an array of numbers from a data set. See the example below:

In [None]:
#Import needed numpy library
import numpy as np

my_array = np.array([1, 2, 3, 4])
print(my_array)

#### 'np.subtract' subtracts one number from another. See the example below:

In [None]:
#Import needed numpy library
import numpy as np

new_number = np.subtract(7, 3)
print(new_number)

### You are going to create arrays called gmag and imag that have the data from the g_mag and i_mag columns in the galaxies data set. <br> <br> Then you will subtract imag from gmag

In [None]:
#Create your gmag array using np.array and Galaxies['g_mag']
gmag = np.array(Galaxies['g_mag'])

#Create your imag array using np.array and Galaxies['i_mag']
imag =

#Use np.subtract to subtract the imag array from the gmag array
color = 

#As a test, print the color values for the first ten galaxies
print(color[0:10])

#Print the minimum and maximum color value
print('min/max color:', np.min(color), np.max(color))

Do you notice how the color values for the first ten galaxies are all around '1', but the min & max color values are really high and really low? 

These very high/low values are because there are some sources for which the telescope was unable to get a good value for magnitude or brightness. In the table, the magnitudes for these sources have been set to -999, as a flag to indicate that they're bad values. 

Often when working with data, you'll need to be aware of 'bad' values and make sure you filter them out.
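The sketch below shows why a -999 flag produces such extreme color values. The magnitudes are made up, with one bad g-band value and one bad i-band value.

```python
import numpy as np

# Made-up magnitudes; -999 marks a measurement that failed
demo_g = np.array([17.2, -999.0, 18.0, 16.5])
demo_i = np.array([16.6, 16.1, -999.0, 15.3])

# A flagged g value gives a hugely negative color,
# a flagged i value gives a hugely positive one
demo_color = demo_g - demo_i
print(demo_color)

# Count how many color values are unrealistically large or small
n_bad = np.sum(np.abs(demo_color) > 99)
print('flagged values:', n_bad)
```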

## To Flag bad color values: Use np.where to pick out the indices with values greater than 99 or less than -99.

Use np.where to identify the index values where color < -99 or color > 99. (If you need a refresher on how to use the 'where' function, look back at <a href="Part_3.ipynb">Part 3</a>.)
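As a rough sketch of the flagging steps, here they are applied to a small made-up array. The names `demo_color`, `too_low`, `too_high`, and `good` are placeholders, not names used elsewhere in this notebook.

```python
import numpy as np

# Made-up color values, two of which are clearly bad
demo_color = np.array([0.6, -1015.1, 1017.0, 1.2])

# np.where returns a tuple of index arrays; [0] takes the first (and only) one
too_low = np.where(demo_color < -99)[0]
too_high = np.where(demo_color > 99)[0]
print(too_low, too_high)

# Replace the flagged entries with NaN, then keep only the finite values
demo_color[too_low] = np.nan
demo_color[too_high] = np.nan
good = demo_color[np.isfinite(demo_color)]
print(good)
```

Setting bad entries to NaN instead of deleting them keeps the array the same length, so index 5 in the color array still refers to row 5 of the data table.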

In [None]:
#Make x equal to the indices where color < -99
x = 
#Make y equal to the indices where color > 99
y = 

#Set the color values for these bad spots to NaN (not a number)
#This will ensure the bad data points don't end up in your histogram
color[x] = np.nan
color[y] = np.nan

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

#Plot a histogram of the finite values in the color array
#note, there are no more color < -99 or color > 99 values
#notice how we use the np.isfinite to pick out just the finite color values
plt.hist(color[np.isfinite(color)],facecolor='yellow')

Above, you should have created a histogram that shows the colors of the galaxies in GalaxyZoo.

## Now let's analyze the data provided by GalaxyZoo and see the differences and similarities between the colors of the Elliptical and Spiral galaxies

##### Part A: Make an array of the indices for spiral galaxies

In [None]:
#Use the 'where' function to identify all the Spiral galaxies.
#Fill in the blank with the identifier for spiral galaxies: 'Spiral'
spirals_ind = np.where(Galaxies[---] == 1)[0]

#Just to see that these are indices of the array, print the first 10 values
print(spirals_ind[0:10])

Look at the indices printed above and compare with where you see the Spiral column equal '1' in the data table. Does this make sense? <br>
Note: in Python, you start counting with '0' as your first number.
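Here is a small sketch of how np.where turns a column of 0/1 flags into indices that count from 0. The flag and color values are invented, and the `demo_` names are placeholders.

```python
import numpy as np

# Made-up flag column: 1 means "classified as spiral", 0 means "not"
demo_spiral = np.array([0, 1, 1, 0, 1])
demo_color = np.array([1.4, 0.7, 0.9, 1.6, 1.1])

# Indices of the entries equal to 1, counting from 0
demo_ind = np.where(demo_spiral == 1)[0]
print(demo_ind)

# An index array can be used to pick out the matching entries of another array
print(demo_color[demo_ind])
```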

##### Part B: Make an array with the colors just for Spiral galaxies:

In [None]:
#Pick out the subset of color values associated with the spiral galaxy indices
#Fill in the blank with spirals_ind
spirals_color = color[---]

print(spirals_color[0:10])
#np.nanmin and np.nanmax ignore the NaN values that mark bad colors
print(np.nanmin(spirals_color), np.nanmax(spirals_color))

### Now repeat Parts A and B for elliptical galaxies

In [None]:
#Use the 'where' function to identify all the Elliptical galaxies.

#Pick out the subset of color values associated with the elliptical galaxy indices


## Now let's make a plot of the Spiral Galaxies' color vs. the number of galaxies that are that color (i.e., a histogram).

In [None]:
import matplotlib.pyplot as plt
import matplotlib.mlab as mlab
import numpy as np

#Plot a histogram for the spirals' color
#notice how we use the np.isfinite to pick out just the finite color values
plt.hist(spirals_color[np.isfinite(spirals_color)],bins=50,facecolor='blue')

#Label the plot
plt.xlabel('')
plt.ylabel('')
plt.title('')
plt.xlim(xmin=0,xmax=3)

## Make the same histogram plot, but for the Elliptical Galaxies' color.

## Now overplot the Spiral color histogram on the Elliptical color histogram

For suggestions on how to overplot in a histogram plot, go <a href="../examples/MultipleHistogramsExample.ipynb">here</a>.
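One possible approach, sketched here with two made-up samples and not the real color arrays, is to call plt.hist twice with the same bin edges and a partly transparent fill. The sample values, labels, and axis text below are assumptions chosen for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Two made-up samples standing in for the two sets of color values
rng = np.random.default_rng(0)
sample_a = rng.normal(loc=0.9, scale=0.2, size=500)
sample_b = rng.normal(loc=1.3, scale=0.15, size=500)

# Shared bin edges so the bars of the two histograms line up
bin_edges = np.linspace(0, 3, 51)

# Calling plt.hist twice draws both histograms on the same axes;
# alpha < 1 makes the bars see-through where they overlap
counts_a, _, _ = plt.hist(sample_a, bins=bin_edges, alpha=0.5,
                          facecolor='blue', label='sample A')
counts_b, _, _ = plt.hist(sample_b, bins=bin_edges, alpha=0.5,
                          facecolor='red', label='sample B')

plt.xlabel('g-i color')
plt.ylabel('number of galaxies')
plt.legend()
```

Using one set of bin edges for both calls matters: if each histogram picked its own bins, bars of different widths would overlap and the two distributions would be hard to compare.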


What do you notice about the color of the Spiral Galaxies? Do you see how most are bluer in color? Remember, the lower the value for 'g-i' color, the bluer the galaxy. The bluer color indicates that the galaxy is currently making new stars (i.e., has active star formation). 

The result (that spiral galaxies tend to be bluer) makes sense. The figure below shows a typical example of a blue, actively star-forming spiral galaxy.

<img src="http://i.space.com/images/i/000/044/763/i02/area-around-andromeda-galaxy.jpg?1420653454">

However, also notice the exceptions. In your histogram, a number of spiral galaxies are redder in color (they have higher 'g-i' color values). Classifiers saw spiral arms in these galaxies, but their color tells us that they have no active star formation happening. <a href="http://www.sciencedaily.com/releases/2008/11/081124194936.htm">A special process has shut off star formation in these galaxies.</a>

Finding this intriguing result was possible because of the huge numbers of galaxies in GalaxyZoo!

## Step 3: Looking at individual galaxies of interest

If you identify a subset of galaxies in the GalaxyZoo data table that you'd like to look at individually in more detail, you can do the following:

In [None]:
#Select particular galaxies of interest, for example, red spirals

# np.where picks out sources that have 
# 1.9 < color < 1.91 AND are spiral galaxies
redSpirals = np.where((color > 1.9) & (color < 1.91) & (Galaxies['Spiral'] == 1))[0]

#The 'len' command is short for 'length'
#It tells you how many sources are in the redSpirals array
print('# of Red Spirals: ', len(redSpirals))

#Print the RA, Dec values for the galaxies of interest
#Use a for loop to print the RA,DEC values as a pair for each galaxy
for j in redSpirals:
    print(Galaxies['RA'][j], Galaxies['DEC'][j])

### Visualize these Red Spirals

Now copy and paste these RA,Dec values into the <a href="http://skyserver.sdss.org/dr7/en/tools/chart/navi.asp">SDSS SkyServer finder chart</a>. You'll see an image of the galaxy of interest. <br>

Click on 'Quick Look' in the middle right of that page to get additional information about your galaxy.<br>

Do you see how they're red spirals?! Check out <a href="http://arxiv.org/abs/0910.4113">Dr. Karen Masters' article on Red Spirals in Galaxy Zoo</a>.

# Congratulations! You've completed Part 4!

## Extension Activity:
#### Look for other trends in galaxy type, magnitude, and/or color. Do your results make sense?