# Our Vast Universe Probed with Big Data

## Authors
Written by Stephanie Juneau, NOAO (sjuneau@noao.edu), adapted for PHYS 500 class by B.W. Holwerda

## Learning Goals
* redshift from SDSS spectra
* template vs redshifted spectra
* the on-sky distribution of galaxies
* Redshift as the third dimension
* Large scale structure
* zoom in/out 
* all-sky projection

## Keywords
redshift, galaxy spectra, template spectra, absorption features, large scale structure, voids, walls, clusters

## Companion Content
Ryden & Peterson's "Foundations in Astrophysics" 

## Summary

1. compare spectra to identify redshift
2. compare different galaxy spectra to note differences in redshift
3. measure redshift for a mystery galaxy (from SLACS survey)
4. map out large scale structure on the sky
5. zoom in/out to identify structures.

<hr>


## Student Name and ID:



## Date:

<hr>

## BEGIN HERE: How to use this notebook

The webpage you are in is actually an app - much like you'd run on your cellphone. This app consists of cells. 

Each "input" cell (something with an "In" to the left) contains code - instructions to make the computer do something.

To activate or select a cell, you first need to click on it.

You <u>**execute a cell with Shift+Enter**</u> on the keyboard - this makes the computer execute your instructions. That's what this app does! 

You can <u>**modify the code by typing into the cell**</u> and then execute again the new code with Shift+Enter.

You can try it for yourself at https://try.jupyter.org/

## How Far Away are Galaxies?

In this activity, you will learn how astronomers measure distances to galaxies. You will get to compare galaxies to figure out which ones are closer or further away from us. You can then use this method for many more galaxies on your own as well!

In [8]:
# Ignore this stuff - it is to setup the plotting environment in your browser
# Just hit Shift + Enter here, and move on
%matplotlib notebook
%pylab
import matplotlib.path as mpath
from astroML.datasets import fetch_sdss_spectrum, fetch_vega_spectrum, fetch_sdss_S82standards
from astroML.plotting import MultiAxes
from IPython.core.display import Image, display

Using matplotlib backend: nbAgg
Populating the interactive namespace from numpy and matplotlib


As in the first notebook, we are going to use data from the <a href="http://sdss.org">**Sloan Digital Sky Survey (SDSS)**</a>. 
This project used a telescope at Apache Point in New Mexico to look at the northern sky.
<figure>
<center>
<img src="https://apod.nasa.gov/apod/image/9806/sloan_fermilab_big.jpg", width=300>
<figcaption>The Sloan Telescope at Apache Point, New Mexico.
<b>Image Credit:</b> SDSS Team, Fermilab Visual Media Services.</figcaption>
</center>
</figure>

The Sloan survey team found millions of stars and galaxies, and made their big data set public. In this activity, we will retrieve and examine galaxy data!

So how did Sloan take spectra of millions of stars and galaxies? The team used metal plates like the one shown below, each with hundreds of holes aligned with the stars and galaxies to be observed. An optical fiber is placed in each hole to carry the light to the instrument and camera. As you will see below, the data are identified by their <b>Plate</b> number, their <b>Fiber</b> number, and the date when they were obtained - the <b>MJD</b> (<a href="https://en.wikipedia.org/wiki/Julian_day">Modified Julian Date</a>).
<figure>
<center>
<img src="http://www.nature.com/polopoly_fs/7.2192.1325671958!/image/Dark-Energy.jpg_gen/derivatives/fullsize/Dark-Energy.jpg", width=300>
<figcaption>Holes in aluminum plates let the light from stars and galaxies pass into optical fibers that carry it to the instrument. <b>Image credit:</b> D. Long, SDSS-III </figcaption>
</center>
</figure>

<figure>
<center>
<img src="http://newscenter.lbl.gov/wp-content/uploads/sites/2/2008/09/schlegel.jpg", width=300>
<figcaption>David Schlegel, Principal Investigator of the BOSS survey (follow-up to SDSS), holding one fiber plug plate.</figcaption>
</center>
</figure>

There were thousands of plates used (~2500 for SDSS), each with 640 fibers, which together gives 1.6 million spectra (including galaxies, stars, and extra spectra on blank sky).
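
To make the identifiers above a little more concrete, here is a minimal sketch that turns an MJD into a calendar date and repeats the spectrum count. It assumes only the standard definition of the Modified Julian Date (days elapsed since 17 November 1858); the helper name `mjd_to_date` is made up for this illustration and is not part of astroML.

```python
from datetime import date, timedelta

# MJD counts whole days since midnight on 17 November 1858 (MJD 0).
MJD_EPOCH = date(1858, 11, 17)

def mjd_to_date(mjd):
    """Return the calendar date of an integer Modified Julian Date."""
    return MJD_EPOCH + timedelta(days=int(mjd))

# MJD of the reference star spectrum fetched in Exercise 1.1
print(mjd_to_date(51816))

# Rough spectrum count: about 2500 plates with 640 fibers each
n_plates = 2500
fibers_per_plate = 640
print(n_plates * fibers_per_plate)
```

The same helper works for any of the MJD values used later in this notebook.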


## Exercise 1.1: Plot a Reference Spectrum

A reference spectrum is one at redshift zero, meaning the source is not moving toward or away from us. In this case, the reference spectrum is that of a single star. We are especially interested in the bluer part of this spectrum (3800-7000 Angstrom).

In [9]:
# Fetch single spectrum - Enter the same "Plate", "MJD" and "Fiber" numbers here
# Then hit Shift+Enter
plate = 396
mjd = 51816
fiber = 605
spec = fetch_sdss_spectrum(plate, mjd, fiber)

wavelength = spec.wavelength()
starspectrum = spec.spectrum/spec.spectrum.max()

import matplotlib.pyplot as plt

# now, we can plot the reference spectrum (at redshift=0)
# student work




## Exercise 1.2: Plot a distant galaxy spectrum

Here, you will plot the spectrum of a galaxy. Notice if there are similarities and differences in its shape and lines relative to the reference spectrum.

In [10]:
# Fetch the first galaxy spectrum
# Then hit Shift+Enter
plate = 2434
mjd = 53826
fiber = 359
galspec = fetch_sdss_spectrum(plate, mjd, fiber)

galwavelength = galspec.wavelength()
galaxyspectrum = galspec.spectrum/galspec.spectrum.max()

# student work




## <font color='purple'>Questions</font>
<ul>
<font color='purple'><li>Do you notice differences between the shapes of the two spectra? <br>
<li>Do you notice similar patterns in the line features (dips)?</font></ul>

*your answers here*


## Exercise 1.3: Measure Redshifts

The next step is to overlay a reference spectrum (called a template) onto the galaxy spectrum from above.
Reminder: a galaxy is a collection of billions of stars, so the shape of its spectrum is not identical to the reference spectrum of a single star. But because the stars contain the same elements, you should notice similar "dips" (absorption lines) in both spectra.

$$ z = {\lambda_{observed} - \lambda_{rest} \over \lambda_{rest} } $$

or 

$${ \lambda_{observed} \over \lambda_{rest} } = (z+1) $$

Redshift the reference spectrum by multiplying its wavelengths by (1+z). A blueshift corresponds to a negative z, so the same factor then shortens the wavelengths. First we set z = 0.
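
The two formulas above can be tried out on a single line before touching a full spectrum. This is only a sketch: the function names are hypothetical, and the H-alpha rest wavelength of 6562.8 Angstrom is a standard value that does not come from this notebook.

```python
def redshift_wavelength(rest_wavelength, z):
    """Observed wavelength of a feature emitted at rest_wavelength."""
    return rest_wavelength * (1.0 + z)

def redshift_from_line(observed_wavelength, rest_wavelength):
    """Redshift implied by one identified spectral line."""
    return (observed_wavelength - rest_wavelength) / rest_wavelength

h_alpha_rest = 6562.8  # Angstrom, hydrogen H-alpha line at rest

# Shift the line to z = 0.10, then recover z from the shifted wavelength
observed = redshift_wavelength(h_alpha_rest, 0.10)
print(observed)
print(redshift_from_line(observed, h_alpha_rest))

# A negative z moves the line to shorter wavelengths (a blueshift)
print(redshift_wavelength(h_alpha_rest, -0.001))
```

Applying `redshift_wavelength` to every wavelength of the template is exactly the shift you adjust by hand in the next cell.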

In [11]:
# redshift value (0 for a star, and upward for distant galaxies e.g.: 
# z = 0.01, 0.02, 0.05, ... 0.1, 0.2, ... 1.0)
# First run this cell with zero redshift, and then adjust the value.
z1 = 0.00


## <font color='purple'>Question:</font>
<ul><font color='purple'><li>Do you notice how the galaxy spectrum is shifted with respect to the reference spectrum?</font><br> </ul>
This is what we saw as the "redshift" due to the expansion of the universe, which causes galaxies to appear to recede away from us.

Change the redshift until the absorption features of the star and the galaxy match. What is the redshift?

*your answer here*
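
The by-eye matching above can also be automated. The sketch below is one possible approach, not the method used in this notebook: it builds two synthetic spectra (so it runs without downloading SDSS data), shifts the template over a grid of trial redshifts, and keeps the trial with the smallest squared difference. The line list uses approximate rest wavelengths, and all function and variable names are made up for this example.

```python
import numpy as np

def absorption_spectrum(wavelength, line_centers, depth=0.5, width=5.0):
    """Flat continuum of 1.0 with Gaussian absorption dips at line_centers."""
    flux = np.ones_like(wavelength, dtype=float)
    for center in line_centers:
        flux -= depth * np.exp(-0.5 * ((wavelength - center) / width) ** 2)
    return flux

# Approximate rest wavelengths (Angstrom): Ca K, Ca H, G band, Mg b, Na D
rest_lines = np.array([3933.7, 3968.5, 4304.4, 5175.0, 5893.0])
wavelength = np.linspace(3800.0, 7000.0, 3201)  # 1 Angstrom steps

# Synthetic template at z = 0 and synthetic "galaxy" at a known redshift
true_z = 0.08
template = absorption_spectrum(wavelength, rest_lines)
galaxy = absorption_spectrum(wavelength, rest_lines * (1 + true_z),
                             width=5.0 * (1 + true_z))

def best_redshift(wavelength, template, galaxy, trial_z):
    """Return the trial redshift whose shifted template best fits galaxy."""
    scores = []
    for z in trial_z:
        # Stretch the template wavelengths by (1+z), then resample it on
        # the galaxy grid; outside the template we assume pure continuum.
        shifted = np.interp(wavelength, wavelength * (1 + z), template,
                            left=1.0, right=1.0)
        scores.append(np.sum((shifted - galaxy) ** 2))
    return trial_z[int(np.argmin(scores))]

trial_z = np.arange(0.0, 0.2001, 0.001)
z_best = best_redshift(wavelength, template, galaxy, trial_z)
print(z_best)
```

With real spectra the continuum shapes differ between a star and a galaxy, so a simple squared difference like this is only a rough guide; checking the overlay by eye remains worthwhile.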

## Exercise 1.4: MYSTERY Galaxy
Each group will receive the information to fetch a different spectrum of a mystery galaxy. This information will be on a piece of paper and contains PLATE, MJD, FIBER and NORM numbers. Enter these numbers in the cell below to measure the redshift (like you did above) for a new galaxy.

In [12]:
# Now we do it again for a NEW galaxy!

# Replace the plate, mjd, fiber and norm with the numbers you received
# Then hit Shift+Enter

# ASSIGN EACH STUDENT A GALAXY.

plate = 655
mjd = 52162
fiber = 392

spec2 = fetch_sdss_spectrum(plate, mjd, fiber)

gal2wavelength = spec2.wavelength() 
gal2spectrum = spec2.spectrum/spec2.spectrum.max()

# redshift value (0 for a star, and upward for distant galaxies e.g.: 
# z=0.01, 0.02, 0.05, ... 0.1, 0.2, 0.3, ... 1.0)
# First run this cell with zero redshift, and then adjust the value.
z2 = 0.115

#student work here

## <font color='purple'>Questions:</font>
<ul>
<font color='purple'><li>Which galaxy is closer to us?<br> 
<li>Further away from us?</font><br></ul>

Now, let's check the redshift and learn more information about those two galaxies. You need to COPY and PASTE the following: http://cas.sdss.org/dr14/en/tools/explore/Summary.aspx in a new browser. **Clicking on the link does not work.**  
Click on "Search" on the left hand side menu bar, and then enter the "Plate", "Fiber" and "MJD" for one galaxy at a time, and hit "Go".
If you click on the image, you can move around, zoom in and out - it's like Google Maps for the night sky!

*your answer here*

# <u>Activity 2: Look at the Position of Many Galaxies</u>

Just as latitude and longitude give positions on Earth, coordinates on the sky are defined on a sphere. They are called RA (for Right Ascension) and Dec (for Declination). There are two illustrations below of these coordinate systems.

<figure>
<center>
<img src="http://voyages.sdss.org/wp-content/uploads/2015/10/pre-flight-celestial-sphere.jpg", width=300>
<figcaption>Illustration of the celestial coordinate system with RA and Dec. You can read <a href=http://dev.skyserver.sdss3.org/voyages/pre-flight/ra-and-dec.aspx>here</a> for an explanation by the SDSS team. </figcaption>
</center>
</figure>

<figure>
<center>
<img src="https://upload.wikimedia.org/wikipedia/commons/9/98/Ra_and_dec_on_celestial_sphere.png", width=300>
<figcaption>Illustration of the celestial coordinate system with RA and Dec. You can read on Wikipedia about <a href=https://en.wikipedia.org/wiki/Right_ascension>Right Ascension</a> and <a href=https://en.wikipedia.org/wiki/Declination>Declination</a>. </figcaption>
</center>
</figure>


## Exercise 2.1: Selecting Galaxies in a Region of the Sky

Next, we will fetch the positions of galaxies on the sky, and plot their RA and Dec coordinates.


In [14]:
#Code that we will need to fetch galaxies' coordinates
from astroML.datasets import fetch_sdss_specgals
import matplotlib.cm as cm

#For 3D Plotting:
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

Now that the packages are loaded, run the cells below to actually fetch the galaxy sample and plot their positions on the sky.

Let us focus on a window of the sky with RA = 0 to 10 and DEC = 10 to 18 (*use xlim and ylim*).

In [15]:
# Fetch the sample from the Sloan data
data = fetch_sdss_specgals()
print('Done retrieving the galaxy sample')
# Making this file local for sure...

# Define the coordinate variables for plotting
RA = data['ra']
DEC = data['dec']

print(' ')
print('Range for RA values')
print('  ',np.amin(RA),np.amax(RA))
print('Range for DEC values')
print('  ',np.amin(DEC),np.amax(DEC))
print(' ')

# convert RA range to [-180,+180] instead of [0,360]
RA -= 180

print('Range for RA values after conversion')
print ('  ',np.amin(RA),np.amax(RA))

#plot the RA/DEC positions


# student work here


Done retrieving the galaxy sample
 
Range for RA values
   0.000670623209317 359.997375583
Range for DEC values
   -11.2528270389 70.2873996284
 
Range for RA values after conversion
   -179.999329377 179.997375583


## Exercise 2.2 Adding the 3rd Dimension

We saw before that in order to know the full distribution in 3D, we need to know how far away the galaxies are located. Here, we will add the information from the redshift. Remember: the larger the redshift, the further away the galaxy!
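
To see how redshift can act as a third coordinate, here is a rough sketch that converts RA, Dec and redshift into 3D positions. It relies on two assumptions that are not part of this notebook: Hubble's law, d = c z / H0, which is only a reasonable approximation at small redshift, and an assumed Hubble constant of 70 km/s/Mpc. The function names are hypothetical.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s
H0 = 70.0            # ASSUMED Hubble constant in km/s/Mpc

def approx_distance_mpc(z):
    """Approximate distance in Mpc from Hubble's law (small z only)."""
    return C_KM_S * z / H0

def to_cartesian(ra_deg, dec_deg, z):
    """Convert sky position (degrees) and redshift to x, y, z in Mpc."""
    d = approx_distance_mpc(z)
    ra = math.radians(ra_deg)
    dec = math.radians(dec_deg)
    x = d * math.cos(dec) * math.cos(ra)
    y = d * math.cos(dec) * math.sin(ra)
    height = d * math.sin(dec)
    return x, y, height

# Two galaxies in the same direction but at different redshifts
print(to_cartesian(5.0, 12.0, 0.05))
print(to_cartesian(5.0, 12.0, 0.10))
```

Doubling the redshift doubles the distance along the same line of sight, which is why two galaxies that overlap on the sky can be far apart in 3D.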

First, we will plot all galaxies in red, and show galaxies that have approximately the same redshift in black. Select for redshifts between z=0.10 and z=0.12. (*use np.where*) 

Let us focus on a window of the sky with RA = 0 to 10 and DEC = 10 to 18 (*use xlim and ylim*).
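
As a warm-up for `np.where`, the sketch below selects a redshift slice and a sky window from a tiny made-up catalog. The five galaxies are invented for illustration; in the exercise the arrays come from `data['ra']`, `data['dec']` and `data['z']`.

```python
import numpy as np

# Tiny made-up catalog standing in for the SDSS arrays
ra = np.array([1.0, 4.5, 9.0, 12.0, 5.5])
dec = np.array([11.0, 12.0, 17.0, 15.0, 25.0])
z = np.array([0.05, 0.11, 0.115, 0.11, 0.30])

# Indices of galaxies with 0.10 < z < 0.12
in_slice = np.where((z > 0.10) & (z < 0.12))[0]

# Indices of galaxies inside the window RA = 0 to 10, DEC = 10 to 18
in_window = np.where((ra > 0) & (ra < 10) & (dec > 10) & (dec < 18))[0]

# Galaxies that satisfy both conditions
both = np.intersect1d(in_slice, in_window)

print(in_slice)
print(in_window)
print(both)
```

Each condition must sit in its own parentheses, because `&` binds more tightly than the comparison operators in Python.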

In [16]:
#Fetch the sample from the Sloan data
data = fetch_sdss_specgals()
print('Done retrieving the galaxy sample')

#define the variables for plotting
RA = data['ra']
DEC = data['dec']

# convert RA range to [-180,+180] instead of [0,360]
RA -= 180

#define redshift variable z
z = data['z']
print(z)

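#pick a redshift range to highlight in a different color
# USE np.where
# This example keeps galaxies within 0.01 of z = 0.08 (0.07 < z < 0.09);
# change the center or the bounds to select the range requested above.
rz = np.where(np.absolute(z-0.08)<0.01)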
print(rz)


#plot the RA/DEC positions

# student work here


Done retrieving the galaxy sample
[ 0.02122228  0.20378332  0.06465632 ...,  0.03541545  0.26692075
  0.25441918]
(array([     9,     10,     17, ..., 661584, 661587, 661592]),)


Now, instead of showing just one redshift interval in black, we will color-code each galaxy by its redshift. Each galaxy is shown as a dot whose color corresponds to its redshift: purple/blue colors mean a low redshift (between 0 and 0.05), green/yellow mean a slightly higher redshift (around 0.1), and so on up to the highest redshift shown here, 0.2, in red. Remember that points with exactly the same color are at the same distance from us!

For this we use a neat function in matplotlib called `plt.scatter()` instead of `plt.plot()`.

Suppose I have values for X, Y and Z and I want to plot X against Y and use Z as the color parameter:

`plt.scatter(X, Y, s=4, c=Z, lw=0, cmap=plt.cm.rainbow, vmin=0, vmax=0.2)`

A second command, placed at the end of the plotting code, shows the color bar:

`plt.colorbar()`
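
Here is a small self-contained sketch of the color-coded scatter plot, using random numbers in place of the SDSS catalog. The data are made up, and the `Agg` backend line is there only so the example can run outside a notebook; leave it out when working in this notebook.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # off-screen backend; omit this line inside the notebook
import matplotlib.pyplot as plt

# Made-up positions and redshifts standing in for RA, DEC and z
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, 500)
Y = rng.uniform(10, 18, 500)
Z = rng.uniform(0, 0.2, 500)

fig, ax = plt.subplots()
# c=Z sets the color of each dot; vmin and vmax fix the color scale
points = ax.scatter(X, Y, s=4, c=Z, lw=0, cmap=plt.cm.rainbow,
                    vmin=0, vmax=0.2)
cbar = fig.colorbar(points, ax=ax)
cbar.set_label('redshift')
ax.set_xlabel('RA (degrees)')
ax.set_ylabel('DEC (degrees)')
```

Passing the scatter object to `colorbar` explicitly ties the color bar to those points, which helps once a figure contains more than one set of axes.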

In [17]:
#Fetch the sample from the Sloan data
data = fetch_sdss_specgals()
print('Done retrieving the galaxy sample')

#define the variables for plotting
RA = data['ra']
DEC = data['dec']

# convert RA range to [-180,+180] instead of [0,360]
RA -= 180

z = data['z']



Done retrieving the galaxy sample


The color bar to the right-hand side shows the correspondence between color and redshift. As mentioned before, points with exactly the same color are at the same distance from us. Purple points are the closest to us, then blue, aqua, green and so on. Think about which galaxies/colors are near and which galaxies/colors are far. 

## <font color='purple'>Questions:</font>
<ul>
<font color='purple'><li>Can you use this information to imagine the distribution of galaxies in 3D?<br> 
<li>Do you notice any structure together at the same distance from us?</font><br></ul>

The black rectangle in the figure shows where we will zoom in during the next exercise below.

*your answer here*

## Exercise 2.3 Zooming In and Zooming Out

Now, we will repeat the plots from the exercise above, but with a zoom on a smaller region ("zooming in"), and then over a larger region ("zooming out"). First we limit ourselves to RA = 4 to 6 and DEC = 11 to 13. Keep using scatter to mark the redshift of galaxies.

In [18]:
# ZOOMING IN

#Fetch the sample from the Sloan data
data = fetch_sdss_specgals()
print('Done retrieving the galaxy sample')

#define the variables for plotting
RA = data['ra']
DEC = data['dec']

# convert RA range to [-180,+180] instead of [0,360]
RA -= 180
z = data['z']

#plot the RA/DEC positions

# student work here


Done retrieving the galaxy sample


## <font color='purple'>Questions:</font>
<ul><font color='purple'><li>What do you see? <br>
<li>Any interesting galaxy structures? <br>
<li>What galaxy structures are closer/further from you? <br>
</font></ul>

<br>


# Now, let's step back and plot galaxies over a large region of the sky!

RA = -15 to 15, DEC = 0 to 30


In [19]:
# ZOOMING OUT

#Fetch the sample from the Sloan data
data = fetch_sdss_specgals()
print('Done retrieving the galaxy sample')

#define the variables for plotting
RA = data['ra']
DEC = data['dec']

# convert RA range to [-180,+180] instead of [0,360]
RA -= 180
z = data['z']



#plot the RA/DEC positions
# student work here
s=1.0   #symbol size  (better make it small...)


Done retrieving the galaxy sample


The color bar to the right-hand side shows the correspondence between color and redshift.  You can compare the size of the two regions directly. Compare numbers using len and np.where.
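
The counting with `len` and `np.where` can be sketched as follows. The catalog here is random and uniform, and the two windows are made up for illustration (they are not the windows of this exercise); the helper names are hypothetical, and the area formula treats the sky as flat, ignoring the cos(DEC) factor.

```python
import numpy as np

def count_in_window(ra, dec, ra_range, dec_range):
    """Number of galaxies with RA and DEC inside the given ranges."""
    inside = np.where((ra >= ra_range[0]) & (ra <= ra_range[1]) &
                      (dec >= dec_range[0]) & (dec <= dec_range[1]))[0]
    return len(inside)

def window_area(ra_range, dec_range):
    """Flat-sky area in square degrees (ignores the cos(DEC) factor)."""
    return (ra_range[1] - ra_range[0]) * (dec_range[1] - dec_range[0])

# Made-up catalog: 100000 points spread uniformly at random
rng = np.random.default_rng(42)
ra = rng.uniform(-20, 20, 100000)
dec = rng.uniform(0, 40, 100000)

# Illustrative windows given as (RA range, DEC range)
small = ((0, 5), (10, 15))
large = ((-10, 10), (5, 25))

n_small = count_in_window(ra, dec, *small)
n_large = count_in_window(ra, dec, *large)
area_ratio = window_area(*large) / window_area(*small)
count_ratio = n_large / n_small

print(n_small, n_large)
print(area_ratio, count_ratio)
```

For this uniform random catalog the count ratio stays close to the area ratio; running the same comparison on the real galaxy sample is what the questions below ask you to interpret.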

## <font color='purple'>Questions:</font>
<ul>
<font color='purple'><li>How many times more galaxies are in the large (zoomed out) view relative to the small (zoomed in) view?<br>
<li>How many times can you fit the small region within the large region? (Hint: compute the size from the axes) <br>
<li>Are those two numbers above the same? What does it mean? <br>
<li>What do you see now on the zoomed out view?<br>
<li>Are those structures smaller or larger?</font></ul>

*your answer here*

## Bonus: Plot Full Sample over Sky Projection

Below, we will again plot the positions of galaxies, and include the information on redshift as the color (but with a different color scheme).

The difference with the steps above is that we will now plot the sample of galaxies over the full sky. The SDSS survey does not cover the full sky, so we will see what we call the "footprint" of the survey. This means the regions of the sky where the telescope was pointed to gather images and spectra.
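
Matplotlib's `mollweide` projection expects longitude and latitude in radians, with longitude running from -pi to +pi. The sketch below shows that conversion on a few made-up coordinates; the variable names `lon` and `lat` are chosen for this example only.

```python
import numpy as np

# Made-up sky positions in degrees
ra_deg = np.array([0.0, 90.0, 180.0, 270.0, 359.9])
dec_deg = np.array([-10.0, 0.0, 30.0, 60.0, 70.0])

# Shift RA from [0, 360] to [-180, 180], then convert to radians
lon = np.radians(ra_deg - 180.0)
lat = np.radians(dec_deg)

print(lon)
print(lat)
```

Working on new arrays like this leaves the original degree values untouched, so the same catalog can still be used for other plots.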

In [20]:
#------------------------------------------------------------
# plot the RA/DEC in an area-preserving projection

#Actually fetch the sample from the Sloan data
data = fetch_sdss_specgals()
print('Done retrieving the galaxy sample')

# Define coordinate variables
RA = data['ra']
DEC = data['dec']

# shift RA to [-180,+180], then convert both coordinates to radians
# (the Mollweide projection expects angles in radians)
RA -= 180
RA *= np.pi / 180
DEC *= np.pi / 180

# keep galaxies in a selected area
#rkeep = np.where(RA between [-30,0] and DEC between [15,30])

plt.figure()
# create the Mollweide axes once; a second plt.axes() call would make
# new, non-Mollweide axes the plotting target
ax = plt.axes(projection='mollweide')
ax.grid()
plt.scatter(RA, DEC, s=1, lw=0, c=data['z'], cmap=plt.cm.rainbow,
            vmin=0, vmax=0.2)

plt.title('SDSS DR8 Spectroscopic Galaxies')
cb = plt.colorbar(cax=plt.axes([0.05, 0.1, 0.9, 0.05]),
                  orientation='horizontal',
                  ticks=np.linspace(0, 0.2, 9))
cb.set_label('redshift')

HINT: for more color maps, you can look at this reference <a href=http://matplotlib.org/examples/color/colormaps_reference.html>page</a>. For example, you can replace "rainbow" with "autumn_r".