# Example Analysis with Zooniverse Milky Way Project Data

[![alt text][2]][1]

  [1]: http://apod.nasa.gov/apod/ap150131.html
  [2]: http://apod.nasa.gov/apod/image/1501/sig15-02spitzerW33.jpg (Milky Way Project Bubbles)

# Step 1. Play with Plots
## Goal: Learn how to make an x, y scatter plot using matplotlib.pyplot, and fit a curve to the data

### Part A. Plot your Data

#### Let's first look at some of the code you are going to be working with to make a plot.

In [None]:
"""First let's import the libraries that will provide you with some useful functions.
We'll start by importing matplotlib, numpy, and the curve_fit function from scipy"""
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit

#Run the cell (shortcut=shift+enter) to make sure you've imported all of the libraries

## Here's the sample data you will be using: 
* x-values = 1,2,3,4,5,6,7,8
* y-values = 2,5,6,9,15,16,20,22
* error value = 0.2

### Important terms to know:
* A VARIABLE stores a piece of data and gives it a name. For example: my_variable = 10. The piece of data is "10" and the name assigned to the data is "my_variable".
* A STRING stores characters and is enclosed by quotation marks. For example: "Hello World"
* A LIST holds an ordered collection of values. For example: my_list = [1, 2, 3, 4]
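
Here is a quick illustration of these three terms. The name `greeting` and all of the values are just examples; they are not the sample data you will type in below.

```python
# A VARIABLE: the name my_variable now refers to the value 10
my_variable = 10

# A STRING: characters enclosed in quotation marks
greeting = "Hello World"

# A LIST: an ordered collection of values, separated by commas and enclosed in brackets
my_list = [1, 2, 3, 4]

print(my_variable)   # shows the value stored in the variable
print(greeting)      # shows the string
print(my_list[0])    # Python counts from 0, so this is the first value in the list
print(len(my_list))  # len() tells you how many values the list holds
```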

In [None]:
"""Now we are going to create some variables to hold our data"""
# Create a list that contains your x-values
# Hint: the values should be separated by commas and enclosed by brackets
x =

# Create a list that contains your y-values
y =

# Create a variable called "errorVal" that contains the error value we provided

# Now let's graph the data!
# Plot the data as red circles (that's what 'ro' refers to) with vertical errorbars
# Replace the dashes with your variables, x and y, respectively
plt.errorbar(---, ---, yerr = errorVal, fmt='ro', markersize=3)

# Set the axes range to be from 0 to 10 for the x-axis and 0 to 25 for the y-axis
# Replace x1 with 0, x2 with 10, y1 with 0, and y2 with 25
plt.axis([x1, x2, y1, y2])

# Label the plot with your own titles, make sure they are strings (enclosed in quotation marks)
plt.title("This is an example of a string")
plt.xlabel()
plt.ylabel()

# Hit Run (shortcut=shift+enter) and let's see what happens!

### If everything went smoothly there should be a scatterplot above populated with your sample data! 

### Part B. Now we are going to create a line of best-fit for this data.

First you'll set up the function (i.e., the model) to fit to your data. The choice of function takes some initial guessing on your part. Is it linear, exponential, sinusoidal, etc.? 

If you think the plot you made above looks like it could be fit with a straight line, you're right! Here it's pretty clear we should fit the data with a linear function (y = ax + b).
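
For a straight line, "best fit" usually means least squares: choose $a$ and $b$ so that the sum of the squared differences between each data point $y_i$ and the model value $a x_i + b$ is as small as possible. This is the quantity `curve_fit` minimizes when no error values are passed to it. For a linear model the minimum can also be written down directly, with $\bar{x}$ and $\bar{y}$ the averages of the $x$- and $y$-values:

$$S(a, b) = \sum_{i=1}^{N} \left[ y_i - (a x_i + b) \right]^2$$

$$a = \frac{\sum_{i} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i} (x_i - \bar{x})^2}, \qquad b = \bar{y} - a\,\bar{x}$$

For the sample data above, $\bar{x} = 4.5$, $\bar{y} = 11.875$, and the two sums are $125.5$ and $42$, so $a = 125.5/42 \approx 2.99$ and $b \approx -1.57$. The values `curve_fit` returns should be very close to these.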

In [None]:
# Define the function (i.e., the model) you'll fit to your data
# In this case it's a linear fit, so we'll use y = a x + b
def fitFunc(x, a, b):
    return a*x + b

# In the step above you defined fitFunc to be y = a x + b

In [None]:
"""curve_fit is a function that finds the best values for 'a' and 'b', i.e., the values
that make the fitFunc model match your data (x & y) most closely in the least-squares sense."""
fitCoeffs, fitCovariances = curve_fit(fitFunc, x, y)

# Pick out the best-fit for the 'a' value and best-fit for the 'b' value
bestfit_a = fitCoeffs[0]
bestfit_b = fitCoeffs[1]

print('best-fit value for \'a\': ', bestfit_a)
print('best-fit value for \'b\': ', bestfit_b)

# Plot your data as red circles with error bars
plt.errorbar(x, y, yerr = errorVal, fmt='ro', markersize=3)
plt.axis([0, 10, 0, 25])
plt.title('X-values versus Y-values')
plt.xlabel('X')
plt.ylabel('Y')

# Define the best-fit line x-values
bestfit_x = np.linspace(0,14,50) #an array from 0 to 14, with 50 linearly spaced points

# Define the y-values for the best-fit line, using the fitFunc function you defined above
bestfit_y = fitFunc(bestfit_x, bestfit_a, bestfit_b)

# Overplot the best-fit line in blue (default color)
plt.plot(bestfit_x, bestfit_y)
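
`curve_fit` can also use your error bars. Below is a minimal sketch, assuming the Step 1 sample data and the same error of 0.2 on every y-value. Passing `sigma` (one uncertainty per point) weights the fit, and `absolute_sigma=True` tells `curve_fit` to treat those numbers as real measurement errors; without it, `curve_fit` rescales the covariance using the scatter of the points about the line. The square roots of the diagonal of the returned covariance matrix are the 1-sigma uncertainties on `a` and `b`. `np.polyfit` appears only as an independent cross-check.

```python
import numpy as np
from scipy.optimize import curve_fit

def fitFunc(x, a, b):
    """Linear model y = a*x + b (the same model defined above)."""
    return a * x + b

# Sample data from Step 1
x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([2, 5, 6, 9, 15, 16, 20, 22], dtype=float)
errorVal = 0.2

# curve_fit wants one uncertainty per data point, so repeat errorVal for every y-value
sigma = np.full(len(y), errorVal)

# absolute_sigma=True: treat sigma as real measurement errors when computing the covariance
fitCoeffs, fitCovariances = curve_fit(fitFunc, x, y, sigma=sigma, absolute_sigma=True)

# The square roots of the diagonal entries are the 1-sigma uncertainties on a and b
fitErrors = np.sqrt(np.diag(fitCovariances))
print('a = %.3f +/- %.3f' % (fitCoeffs[0], fitErrors[0]))
print('b = %.3f +/- %.3f' % (fitCoeffs[1], fitErrors[1]))

# Independent cross-check: np.polyfit with degree 1 returns [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)
print('np.polyfit: a = %.3f, b = %.3f' % (slope, intercept))
```

Because every point here has the same error bar, the weights do not move the best-fit line; they start to matter once different points have different errors.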

# Step 2. Using Data from Bubbles in the Milky Way

<img src="http://www.spitzer.caltech.edu/uploaded_files/images/0008/5977/sig12-002_Med.jpg" width = "800">

Today you will be looking at “bubbles” in the Milky Way that have been classified thanks to citizen scientists as part of the Zooniverse project. 

What are these “bubbles”? 

“They are regions around young massive stars that are so bright that their light has caused a shock wave to affect the cloud around them and blown a bubble which we can see in infrared light. The dark interior is where the shock has already passed by and the bright red/pink ring around it is where the shock is currently impacting the gas cloud. Most of the circular features (bubbles) in these images are produced by hot young stars, as winds and radiation from these young stars sweep up the surrounding gas and dust from which they formed (like a snowplow that compresses the snow in its path.) Sometimes, the swept-up material becomes dense enough for gravity to pull it together to form new stars.” -- Zooniverse Milky Way

So what are you going to do?

Let's visualize where the <a href="https://www.milkywayproject.org/">Zooniverse Milky Way Project</a> Bubbles are located in our <a href="http://i.space.com/images/i/000/001/163/i02/050816_milky_way_02.jpg?1292263533">Milky Way Galaxy</a>. 

In your data folder is <a href="data/MWbubbles.fits">MWbubbles.fits</a>. This file has the classification results for all the large bubbles discovered through the Milky Way Project. 

For each of the 3744 bubbles, MWbubbles.fits provides the Galactic longitude and latitude (see figure below), radius, thickness, eccentricity, position angle, hit rate, dispersion on position, and hierarchy flag. 

We'll first focus on visualizing the Bubbles' Galactic latitude and longitude. If you choose the Milky Way Project for your research project, you'll learn all about the other values listed above. 

#### What is Galactic Latitude and Longitude?
* On the left is an image showing Earth-based longitude and latitude. 
* On the right is Galactic longitude and latitude. Same idea, but for our whole Galaxy. 

<img style="float: left" src="http://upload.wikimedia.org/wikipedia/commons/6/62/Latitude_and_Longitude_of_the_Earth.svg" width = 440> <img style="float: right" src="http://burro.case.edu/Academics/Astr306/Coords/galactic.jpg" width = 440>

### Part A. Read in your Data

In [None]:
# Import needed astropy library to read the fits file
import astropy.io.fits as fits

# Read in *.fits data file
Bubbles = fits.open('data/MWbubbles.fits')

# Assign Bubbles_data to contain all the data in this table
Bubbles_data = Bubbles[1].data

# Print the names of each column in this table
print(Bubbles[1].columns.names)

### Here's a brief description of what each column name refers to:

* MWP = Milky Way Project Catalog ID
* ONames = Other names given to each object in the catalog
* GLON = Galactic Longitude
* GLAT = Galactic Latitude
* iXdiam = Inner X Diameter
* iYdiam = Inner Y Diameter
* oXdiam = Outer X Diameter
* Reff = Effective Radius
* Thick = Effective Thickness
* Ecc = Eccentricity
* PA = Ellipse Position Angle
* Hit = Hit Rate (indicates the fraction of people who identified this bubble)
* Disp = Dispersion on the position
* Flag = Hierarchy flags (indicates whether the bubble is associated with other bubbles)

<a href="https://vault.it.northwestern.edu/let412/Adler/PythonZoo/articles/MWproject_Simpson_2012_2442.full.pdf">Science research article with additional information on each of these.</a>


### To do: In the cell below, change the column name to a different one to see what type of data is in that column. 

In [None]:
# Print the values in this particular column of the data table
print(Bubbles_data['GLON'])

# 'print' doesn't show all 3744 values, because that would fill your screen.
# Instead, NumPy summarizes long arrays by showing only the first 3 and last 3 values.
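
Printing is only one way to peek at a column. This is a minimal sketch of a few other NumPy tools, using a made-up array `column_example` (just the numbers 0 to 3743) as a hypothetical stand-in for a real column such as `Bubbles_data['GLON']`. The `np.printoptions` context manager assumes NumPy 1.15 or later.

```python
import sys
import numpy as np

# Hypothetical stand-in for a 3744-value catalog column (just the numbers 0 to 3743);
# with the real file you would use something like Bubbles_data['GLON'] instead
column_example = np.arange(3744)

# Arrays longer than NumPy's print threshold (1000 values by default) are summarized:
# the first 3 values, then "...", then the last 3 values
print(column_example)

# A few other quick ways to inspect a column
print(len(column_example))       # how many values it holds
print(column_example.dtype)      # what type of data it stores
print(column_example[:5])        # the first five values
print(column_example[-5:])       # the last five values
print(column_example.min(), column_example.max())  # smallest and largest values

# Temporarily raise the threshold so nothing is summarized;
# print(full_text) would show all 3744 values (long!)
with np.printoptions(threshold=sys.maxsize):
    full_text = np.array2string(column_example)
```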

### Part B. Now let's make a scatter plot of the Bubbles' longitude vs. latitude using what you learned above

In [None]:
# Fill in the dashes to create arrays for longitude and latitude
Longitude = Bubbles_data[---]
Latitude = Bubbles_data[---]
errorVal = 0.001

# Plot the data 
plt.errorbar(Longitude, Latitude, yerr = errorVal, fmt='ro', markersize=3)

# Replace x1,x2,y1,y2 to set the axes range to be from 0 to 400 for the x-axis 
# and -2 to 2 for the y-axis
plt.axis([x1, x2, y1, y2])

# Label the plot
plt.title("")
plt.xlabel("")
plt.ylabel("")

## What?! Yes, your plot is missing everything in the middle.<br> Why does the Longitude vs. Latitude Plot Look so Funny? 

### The image below is the area the Milky Way Project looked at in the disk of our galaxy. 
* Do you notice how Longitude = 0 is at the Galactic Center? 
* Do you notice how the numbers increase to the left (0->60) and decrease to the right (360->300)? 

<img src="http://faculty.wcas.northwestern.edu/aaron-geller/myimages/ssc2014-02a1_ExLrg-v2.jpg" width=1000>

### The image below is an artist's representation of the same thing. 
* Looking from our Sun straight towards the Galactic Center means looking at Longitude = 0. 
* Looking to the left means looking from 0->60 degrees and looking to the right means looking from 360->300 degrees.

<img src="http://faculty.wcas.northwestern.edu/aaron-geller/myimages/050816_milky_way_02.jpg" width = "400" >

### Part C. Let's fix our Visualization of the Bubbles' Galactic Latitude and Longitude

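To fix your plot of Latitude versus Longitude, let's shift all the Longitude > 180 values by subtracting 360 from them (for example, 350 becomes -10). Then the longitude values run continuously across the Galactic Center, and with the x-axis reversed they decrease smoothly from left to right. <br>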

We're going to use NumPy's 'where' function (np.where). This function will definitely come in handy in your projects.

In [None]:
#Create a new array that's the same as the original Longitude array
import copy   #import useful library to make a copy of an array
Longitude_new = copy.copy(Longitude)

#Use the 'where' function to identify all the Longitude values > 180
#The 'where' function gives the indices of values that satisfy a given condition
#In this case, we get the indices for all Longitude values greater than 180
toShift = np.where(Longitude > 180)[0]

#Subtract 360 from all chosen "toShift" Longitude values
Longitude_shifted = Longitude[toShift]-360

#Replace the old "toShift" Longitude values with these new Longitude values
Longitude_new[toShift] = Longitude_shifted
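
To see what the shift does, here is a small worked example on a few made-up longitudes (hypothetical values, not from the catalog). It also shows an equivalent one-line version using the three-argument form of `np.where`, which takes `lon - 360` wherever the condition is true and keeps `lon` everywhere else. For a NumPy array, `lon.copy()` does the same job as `copy.copy`.

```python
import numpy as np

# Made-up longitudes in degrees, on both sides of the Galactic Center
lon = np.array([10.0, 59.0, 301.0, 350.0])

# Step-by-step version, as in the cell above
lon_new = lon.copy()                 # a copy, so the original array is not changed
idx = np.where(lon > 180)[0]         # indices of the values greater than 180: [2, 3]
lon_new[idx] = lon[idx] - 360        # 301 -> -59 and 350 -> -10

# One-line version: np.where(condition, value_if_true, value_if_false)
lon_new_oneline = np.where(lon > 180, lon - 360, lon)

print(lon_new)
print(lon_new_oneline)
```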

Now let's plot these new Bubble Longitude vs Latitude values.

In [None]:
#Plot Longitude_new versus Latitude
plt.plot(Longitude_new, Latitude,'ro',markersize=3)

#Switch the order of the X-axis range, so it goes from 80 to -80.
#Keep the y-axis range as -2 to 2.
plt.axis([--, --, --, --])

#Label plot
plt.title('Bubbles Latitude and Longitude')
plt.xlabel('Longitude')
plt.ylabel('Latitude')
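
Writing the x-range backwards in `plt.axis` is one way to flip the x-axis. A minimal sketch of an alternative, assuming matplotlib's `invert_xaxis()` method on the current axes: it flips whatever x-range is already set, so you can keep writing limits in increasing order. The longitude/latitude values are hypothetical stand-ins for `Longitude_new` and `Latitude`.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical longitudes/latitudes in degrees, standing in for Longitude_new and Latitude
lon_example = np.array([-59.0, -10.0, 10.0, 59.0])
lat_example = np.array([0.3, -0.5, 0.1, -0.2])

plt.figure()
plt.plot(lon_example, lat_example, 'ro', markersize=3)
plt.xlim(-80, 80)    # x-range written in the usual increasing order
plt.ylim(-2, 2)
ax = plt.gca()       # the current set of axes
ax.invert_xaxis()    # flip it: 80 on the left, -80 on the right
plt.title('Bubbles Latitude and Longitude')
plt.xlabel('Longitude')
plt.ylabel('Latitude')

print(ax.get_xlim())
```

This is handy when matplotlib picks the limits for you automatically: `invert_xaxis()` reverses them without you having to know the numbers.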

### Part D. Now let's fit a line to these data using what you learned from 'play with plots'

In [None]:
# Define the function (i.e., the model) you'll fit to your data
# Here we use a linear fit again (but you have to make the function!)
#Fill in the dashes with the correct input, based on what you learned from 'play with plots'
def fitFunc -- :
    return --

-- = curve_fit(---, Longitude_new, Latitude)
print(' fit coefficients:\n', --)

# Plot your data
#Fill in the dashes with the correct input
plt.errorbar(Longitude_new, ---, yerr = ---, fmt='ro', --- =3)
plt.axis([---,---,---,---])
plt.title('Bubbles in the Milky Way')
plt.xlabel('---')
plt.ylabel('Latitude')

# Define the x-values for the best-fit line
bestfit_x = np.linspace(-100,100,200) #an array from -100 to 100, with 200 linearly spaced points

# Define the y-values for the best-fit line, using the fitFunc function you defined above
#Fill in the dashes with the correct input, based on what you learned from 'play with plots'
bestfit_y = fitFunc(---, --, --)

# Overplot the best-fit line in blue (default color)
plt.plot(bestfit_x, bestfit_y)

### Question: Does your result make sense?
* Why might the best-fit go through Latitude = 0?
* What does Latitude = 0 coincide with in our galaxy?

# Step 3. Find the Biggest Bubbles in the Milky Way Project

In [None]:
#Identify the array with the radius for all the bubbles
#Note: Reff = effective radius for the MW Project Bubbles
radius = Bubbles_data['Reff']
print(radius)

### To do: Find the maximum radius value

Use the max function to find the maximum value of the 'radius' array. 

Remember, first insert a new cell below, type in your code, and then run the cell by pressing Shift+Enter.

In [None]:
print(max(radius))
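
`max(radius)` (or `np.max(radius)`) tells you how big the largest bubble is, but not which bubble it is. NumPy's `np.argmax` returns the index of the largest value, and `np.argsort` can rank all of them; you can then use those indices on the longitude and latitude arrays. A minimal sketch with made-up values (`radius_example`, `lon_example`, and `lat_example` are hypothetical stand-ins for `radius`, `Longitude`, and `Latitude`):

```python
import numpy as np

# Hypothetical stand-ins for the catalog arrays radius, Longitude, and Latitude
radius_example = np.array([1.2, 14.5, 3.3, 22.0, 9.8])
lon_example = np.array([10.0, 305.5, 47.2, 12.9, 340.1])
lat_example = np.array([0.1, -0.4, 0.6, 0.0, -0.9])

# np.argmax gives the index (position) of the largest value, not the value itself
i_max = np.argmax(radius_example)
print('largest radius:', radius_example[i_max])
print('at Longitude, Latitude:', lon_example[i_max], lat_example[i_max])

# np.argsort gives the indices that would sort the array from smallest to largest,
# so the last 3 indices, reversed, point to the 3 biggest bubbles (biggest first)
top3 = np.argsort(radius_example)[-3:][::-1]
for i in top3:
    print('radius', radius_example[i], 'at', lon_example[i], lat_example[i])
```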

### Identify the longitude/latitude values for the largest bubbles in the catalog

In [None]:
#Use NumPy's 'where' function to identify which bubbles have radii greater than 10.
#'where' gives the indices of the bubbles that satisfy this condition;
#we then use those indices to pick out their Longitude/Latitude values
bigIndices = np.where(radius > 10)[0]
bigLon = Longitude[bigIndices]
bigLat = Latitude[bigIndices]

#The 'len' function is short for 'length'; it tells you how many sources are in the bigLon array
print('# of bubbles with radius > 10: ', len(bigLon))

#Print the Longitude/Latitude values for the bubbles of interest
#Use a for loop with zip to print the values as a pair for each bubble
for lon, lat in zip(bigLon, bigLat):
    print('Longitude, Latitude: ', lon, ',', lat)
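
Indexing with the result of `np.where` works, but NumPy also lets you index an array directly with a True/False condition (a "boolean mask"), which is often shorter. A minimal sketch with made-up values (`radius_example`, `lon_example`, and `lat_example` are hypothetical stand-ins for `radius`, `Longitude`, and `Latitude`) showing that both approaches pick out the same bubbles:

```python
import numpy as np

# Hypothetical stand-ins for the catalog arrays
radius_example = np.array([4.0, 12.5, 8.1, 15.2])
lon_example = np.array([20.0, 310.4, 33.3, 5.6])
lat_example = np.array([0.2, -0.3, 0.8, 0.05])

# One True/False per bubble; here that is [False, True, False, True]
isBig = radius_example > 10

# Version 1: turn the condition into indices with np.where, as in the cell above
bigLon_where = lon_example[np.where(isBig)[0]]

# Version 2: index directly with the boolean mask
bigLon_mask = lon_example[isBig]
bigLat_mask = lat_example[isBig]

print(bigLon_where)
print(bigLon_mask)
print(np.count_nonzero(isBig), 'bubbles have radius > 10')
```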

# Congratulations! You've completed Part 2!

## Extension Activities:
* Look for trends in the number of bubbles for a given latitude. Does your result make sense?<br>
* Look for trends in bubble radius, thickness, longitude, and/or latitude. Do your results make sense?
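
For the first extension activity, `np.histogram` counts how many values fall into each bin. A minimal sketch, assuming 0.5-degree latitude bins from -2 to 2 and a short made-up list of latitudes (`lat_example` is hypothetical; with the real data you would pass `Latitude`). `plt.hist` draws the same kind of counts as a bar chart.

```python
import numpy as np

# Hypothetical latitudes in degrees (stand-in for the real Latitude array)
lat_example = np.array([-1.2, -0.4, -0.1, 0.0, 0.2, 0.3, 0.7, 1.6])

# 9 bin edges from -2 to 2 degrees, i.e. 8 bins that are each 0.5 degrees wide
bin_edges = np.linspace(-2.0, 2.0, 9)
counts, edges = np.histogram(lat_example, bins=bin_edges)

# Each bin includes its left edge; the last bin also includes its right edge
for left, right, n in zip(edges[:-1], edges[1:], counts):
    print('%5.1f to %5.1f deg: %d bubbles' % (left, right, n))
```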