# Galaxies and the large-scale structure of the Universe

# Introduction


In this Notebook, you will use real astronomy data to explore the relationship between galaxy properties and the large-scale structure of the Universe. 

By the end of this Notebook, you should be able to answer the following questions:

* How are galaxies spatially distributed in the Universe?
* Are galaxies all the same colour?
* Are galaxies all the same shape?
* How are galaxies' colours and shapes related to their spatial distribution?


## SDSS and SciServer

You will answer the above questions yourself by exploring one of the largest astronomical datasets in the world: the Sloan Digital Sky Survey (www.sdss.org). You will interact directly with the data by running Python commands inside this online notebook in the SciServer virtual computing environment. That means you can work with this enormous dataset using only your web browser.


If you're reading this, you have already followed the instructions to get an account on SciServer and have uploaded this notebook. These exercises assume that you are familiar with basic Python, dataframe manipulation, and matplotlib commands.


## Part 1: a map of the Universe

In this section, you will get the positions of thousands of galaxies and plot them to make a map of the Universe.

### Import libraries and apply settings

Before we do anything else, we need to tell Python to import the libraries it will need.

In [None]:
# Import Python libraries to work with SciServer
import SciServer.CasJobs as CasJobs       # submit SQL queries to the SDSS database through CasJobs
import SciServer.SkyServer as SkyServer   # show individual objects and generate thumbnail images through SkyServer

# Import other libraries for use in this notebook
import numpy as np                  # standard Python library for numerical operations
import pandas                       # data manipulation package
import matplotlib.pyplot as plt     # plotting library

# Apply some special settings to the imported libraries:
# ensure columns get written completely in the notebook
pandas.set_option('display.max_colwidth', None)

print('Libraries imported and settings applied!')

### Querying the SDSS database

The SDSS stores its data in an online database. You can communicate with the database by sending "queries" written in Structured Query Language (SQL). For each query, the database returns an answer. Usually the answer is a subset of the original database, although SQL can also perform calculations on the data very efficiently.

In this tutorial, we will submit queries to the SDSS database to gather the information that we need, and then we will use Python to operate on, manipulate, and visualise that data.

An extensive tutorial on how to query the SDSS database is provided here: http://skyserver.sdss.org/dr16/en/help/howto/search/searchhowtohome.aspx 

In short, a basic SQL query consists of three blocks:
- The **SELECT** block: it defines the quantities that you want your query to return.
- The **FROM** block: it defines which tables of the database you want SQL to look in.
- The **WHERE** block: it defines any constraints on the data that you want to impose.
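To see the three blocks in action without connecting to SciServer, here is a minimal sketch using Python's built-in `sqlite3` module and a tiny in-memory table. The table name `toy_galaxy` and its values are invented for illustration (they are not SDSS data), and SQLite's dialect differs in places from the SQL Server dialect behind CasJobs, although SELECT, FROM and WHERE behave the same way for a query this simple.

```python
import sqlite3

# Build a tiny in-memory table; the name and values are made up for illustration
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE toy_galaxy (objid INTEGER, ra REAL, dec REAL, z REAL)")
conn.executemany(
    "INSERT INTO toy_galaxy VALUES (?, ?, ?, ?)",
    [
        (1, 120.5, 10.2, 0.03),
        (2, 180.0, 35.7, 0.12),
        (3, 260.1, 5.0, 0.08),   # fails the RA cut
        (4, 150.3, 60.4, 0.60),  # fails the redshift cut
    ],
)

query = """
SELECT objid, ra, z          -- SELECT: which columns to return
FROM toy_galaxy              -- FROM: which table to search
WHERE ra BETWEEN 100 AND 250 -- WHERE: constraints on the rows
  AND z BETWEEN 0.02 AND 0.5
"""
rows = conn.execute(query).fetchall()
conn.close()
print(rows)
```

Only the rows that satisfy every condition in the WHERE block come back, and only with the columns named in the SELECT block.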

To make your map of the Universe (Part 1 of this activity), you won't have to write SQL queries from scratch, only execute commands that are already written for you.

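The code cell below searches SDSS Data Release 16 (DR16) and returns information on a sample of galaxies that have an SDSS spectrum: galaxies larger than 10 arcseconds on the sky, with right ascension between 100 and 250 degrees, redshift between 0.02 and 0.5, and g-band magnitude brighter than 17.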

**Click inside the code cell and click Run to see how many galaxies the query has found.**

In [None]:
query="""
SELECT p.objId, p.ra, p.dec, p.petror90_r, p.expAB_r,
    p.dered_u as u, p.dered_g as g, p.dered_r as r, p.dered_i as i, 
    s.z, s.plate, s.mjd, s.fiberid
FROM galaxy AS p
   JOIN SpecObj AS s ON s.bestobjid = p.objid
WHERE p.petror90_r > 10
  and p.ra between 100 and 250
  and s.z between 0.02 and 0.5
  and p.g < 17
"""

print('Submitting query...')
all_gals = CasJobs.executeQuery(query, "dr16")   # run the query against the DR16 database; the result is a pandas DataFrame

print(f"SQL query returned {len(all_gals):,} galaxies")
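If you want to experiment with different cuts later, one option is to build the query string from parameters instead of editing it by hand. The sketch below uses a hypothetical helper, `build_galaxy_query`, whose default arguments reproduce the query above; it only constructs the SQL text, which you would then pass to `CasJobs.executeQuery(..., "dr16")` as before.

```python
def build_galaxy_query(min_size=10, ra_range=(100, 250), z_range=(0.02, 0.5), g_max=17):
    """Hypothetical helper: return the galaxy query as a string with adjustable cuts."""
    return f"""
SELECT p.objId, p.ra, p.dec, p.petror90_r, p.expAB_r,
    p.dered_u AS u, p.dered_g AS g, p.dered_r AS r, p.dered_i AS i,
    s.z, s.plate, s.mjd, s.fiberid
FROM galaxy AS p
   JOIN SpecObj AS s ON s.bestobjid = p.objid
WHERE p.petror90_r > {min_size}
  AND p.ra BETWEEN {ra_range[0]} AND {ra_range[1]}
  AND s.z BETWEEN {z_range[0]} AND {z_range[1]}
  AND p.g < {g_max}
"""


# Example: a narrower redshift slice; this only builds the string, nothing is sent
narrow_query = build_galaxy_query(z_range=(0.02, 0.1))
print(narrow_query)
```

Because the numbers are inserted with an f-string, only pass in values you typed yourself, never text from an untrusted source.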

The dataframe that is returned, which we named all_gals, holds the following quantities (in columns) for each galaxy:

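- objId = Unique identifier of the object in the SDSS photometric catalogue
- ra = Right Ascension coordinate in degrees
- dec = Declination coordinate in degrees
- petror90_r = Radius enclosing 90% of the Petrosian flux in the r band, in arcseconds, i.e., the apparent size of the galaxy on the sky.
- expAB_r = Axis ratio (b/a) of the best-fitting exponential profile in the r band; values near 1 mean the galaxy looks round, smaller values mean it looks elongated.
- u, g, r, i = Magnitudes in four of the five SDSS optical filters, from blue to red (the query renames dered_u, dered_g, dered_r, dered_i), corrected for attenuation by dust in the Milky Way.
- z = Redshift of the galaxy
- plate = Plate number (SDSS used aluminium plates with drilled holes for positioning optical fibers).
- mjd = Modified Julian Date of the observation
- fiberid = Number of the fiber on a given plate. Plates have between 640 and 1000 fibers.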
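Two of these columns become more useful after a small transformation. A galaxy's colour is commonly expressed as the difference between its magnitudes in two filters (for example g - r, where a larger value means redder, because a fainter magnitude is a larger number), and an MJD can be converted to a calendar date because MJD 0 corresponds to 00:00 UTC on 17 November 1858. The sketch below applies both to a made-up two-row table that reuses the column names of `all_gals`; the values are invented, not SDSS measurements.

```python
import pandas

# Made-up example rows using the same column names as all_gals
toy = pandas.DataFrame({"g": [15.2, 16.1], "r": [14.4, 15.8], "mjd": [51544, 52000]})

# Colour: difference of magnitudes in two filters (larger g - r means redder)
toy["g_minus_r"] = toy["g"] - toy["r"]

# MJD counts days since 1858-11-17 00:00 UTC
toy["obs_date"] = pandas.to_datetime(toy["mjd"], unit="D", origin=pandas.Timestamp("1858-11-17"))

print(toy)
```

In the notebook, the same two lines of arithmetic work on `all_gals` directly, since its columns are pandas Series too.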

Let's look at the first 10 rows of the returned table (dataframe) by running the code cell below:

In [None]:
all_gals.head(10)   # first 10 rows of the dataframe
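Besides looking at the first rows, a few standard pandas calls give a quick overview of a table: `.shape` (rows, columns), `.columns`, and `.describe()` (count, mean, spread and range of each numeric column). The sketch below runs them on a small invented table so it works on its own; in the notebook you would call the same methods on `all_gals`.

```python
import pandas

# Small invented table standing in for all_gals
toy = pandas.DataFrame({
    "ra": [120.5, 180.0, 150.3, 210.9],
    "dec": [10.2, 35.7, 60.4, 1.1],
    "z": [0.03, 0.12, 0.08, 0.05],
})

print(toy.shape)            # (number of rows, number of columns)
print(list(toy.columns))    # column names
print(toy.describe())       # summary statistics for each numeric column
print(toy.head(2))          # first rows of the table
```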

### The large-scale structure of the Universe

Now that we have data for thousands of galaxies, let's make a scatterplot of the positions of galaxies. 

Remember to add axis labels and a title to your plot. Given the large number of points, you might want to use marker='.' and s=1.

Run the Code cell below to make the scatterplot. 

In [None]:
# Possible solution
plt.figure(figsize=(13, 10))
plt.scatter(all_gals['ra'], all_gals['dec'], marker='.', color='black', s=1)   # one small dot per galaxy
plt.xlabel('RA (degrees)', fontsize=15)
plt.ylabel('Dec (degrees)', fontsize=15)
plt.title('Galaxy positions', fontsize=15)
plt.ylim(-5, 70)
plt.show()
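With tens of thousands of points, even small markers can overlap so much that dense regions all look equally black. One alternative, a different technique from the scatterplot above and offered only as a sketch, is a hexagonal-bin density map made with matplotlib's `hexbin`, where each hexagon is coloured by the number of galaxies inside it. The function name `plot_density` and the `gridsize` default are illustrative choices, not part of the original exercise.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_density(ra, dec, gridsize=80):
    """Hypothetical helper: hexagonal-bin map of galaxy counts on the sky."""
    fig, ax = plt.subplots(figsize=(13, 10))
    # mincnt=1 leaves empty hexagons blank instead of colouring them as zero
    hb = ax.hexbin(ra, dec, gridsize=gridsize, cmap="viridis", mincnt=1)
    fig.colorbar(hb, ax=ax, label="Number of galaxies per bin")
    ax.set_xlabel("RA (degrees)", fontsize=15)
    ax.set_ylabel("Dec (degrees)", fontsize=15)
    ax.set_title("Galaxy density", fontsize=15)
    return fig, ax


# Stand-in random positions so the sketch runs on its own; in the notebook use
# plot_density(all_gals['ra'], all_gals['dec']) followed by plt.show()
rng = np.random.default_rng(0)
fig, ax = plot_density(rng.uniform(100, 250, 5000), rng.uniform(-5, 70, 5000))
```

Smaller `gridsize` values give larger hexagons, which smooth out noise but blur fine structure.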

### Exercise

What can you tell from the distribution of galaxies? Are they uniformly distributed on the sky? Enter your answer in the text cell below.

type your answer here!