# Gaia Data for M67

*[Gaia](http://sci.esa.int/gaia/) is an [ESA](http://www.esa.int/ESA) satellite aiming to chart a three-dimensional map of our Galaxy, the Milky Way.  On 25 April 2018, the Gaia mission published its second data release, [DR2](https://www.cosmos.esa.int/web/gaia/dr2), which contains positions, parallaxes and proper motions for over a billion stars, and radial velocities for about 7 million of the brightest ones.  We will use a subset of these data here, for a specific [open star cluster](https://en.wikipedia.org/wiki/Open_cluster), [M67](https://en.wikipedia.org/wiki/Messier_67).*

*This workshop builds on the [Gaia archive Python cluster tutorial](http://gea.esac.esa.int/archive-help/tutorials/python_cluster/index.html).*


*Author: Aaron Geller* <br/> *June 2018*


*First, we import all the required python modules:*

In [None]:
import astropy.units as u
from astropy.coordinates.sky_coordinate import SkyCoord
from astropy.units import Quantity
from astroquery.gaia import Gaia
import numpy as np
from functools import reduce
from scipy import stats
from astropy.modeling import models, fitting

In [None]:
%matplotlib inline
import matplotlib as mpl
import matplotlib.pyplot as plt
from matplotlib import gridspec
import matplotlib.cm as cm

In [None]:
# Suppress warnings. Comment this out if you wish to see the warning messages
import warnings
warnings.filterwarnings('ignore')

*Do the following to load and look at the available Gaia table names:*

*Note: The main table is gaiadr2.gaia_source, and [here](http://gea.esac.esa.int/archive/documentation/GDR2/Gaia_archive/chap_datamodel/sec_dm_main_tables/ssec_dm_gaia_source.html) is the description of its columns.*

In [None]:
tables = Gaia.load_tables(only_names=True)
for table in tables:
    print(table.get_qualified_name())

In [None]:
gaiadr2_table = Gaia.load_table('gaiadr2.gaia_source')
for column in gaiadr2_table.get_columns():
    print(column.get_name())

*Next, we retrieve all the available data in the region of interest.*

*To do this we perform an asynchronous query (asynchronous rather than synchronous queries should be performed when retrieving more than 2000 rows) centred on M67 (RA = 132.825 deg, Dec = +11.8167 deg) with a search radius of 1 degree.*

*We'll also require that the proper-motion errors are small (below 5 mas/yr in each direction) and that the data are well behaved (non-null, non-zero proper motions).*

*Note: The query to the archive is written in ADQL (Astronomical Data Query Language). For a description of ADQL and more examples, see the [Gaia DR1 ADQL cookbook](https://gaia.ac.uk/data/gaia-data-release-1/adql-cookbook).*

In [None]:
cmd = "SELECT * FROM gaiadr2.gaia_source \
    WHERE CONTAINS(POINT('ICRS',gaiadr2.gaia_source.ra, gaiadr2.gaia_source.dec),\
    CIRCLE('ICRS', 132.825, 11.8167, 1))=1\
    AND abs(pmra_error)<5 \
    AND abs(pmdec_error)<5 \
    AND pmra IS NOT NULL AND abs(pmra)>0 \
    AND pmdec IS NOT NULL AND abs(pmdec)>0;"

job = Gaia.launch_job_async(cmd, dump_to_file=False) #could save this to a file

print(job)

*Inspect the output table and number of rows (Note: if we hadn't suppressed the warnings, there would be a lot of them here).*

In [None]:
r = job.get_results()
print(len(r))
print(r)

*Plot the color-magnitude diagram (CMD), using the "BP" and "RP" magnitudes*
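
*A minimal sketch of one way to do this, assuming the usual Gaia choice of plotting the G magnitude against the BP - RP color (the columns `phot_g_mean_mag`, `phot_bp_mean_mag` and `phot_rp_mean_mag` are in the DR2 `gaia_source` table; `plot_cmd` itself is a hypothetical helper). Stars missing any magnitude are skipped, and the y-axis is inverted so that brighter stars sit at the top:*

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_cmd(bp, rp, g, ax=None, **scatter_kwargs):
    """Hypothetical helper: scatter G against BP - RP and return the axes.

    Masked or non-finite magnitudes (stars without BP/RP photometry) are dropped.
    """
    # fill masked entries with NaN so a single finiteness test catches all missing values
    bp, rp, g = (np.asarray(np.ma.filled(m, np.nan), dtype=float) for m in (bp, rp, g))
    good = np.isfinite(bp) & np.isfinite(rp) & np.isfinite(g)
    if ax is None:
        fig, ax = plt.subplots(figsize=(5, 6))
    scatter_kwargs.setdefault('s', 2)
    ax.scatter(bp[good] - rp[good], g[good], **scatter_kwargs)
    # brighter stars have smaller magnitudes; invert only once, so repeated
    # calls on the same axes (e.g. to overplot members) do not flip it back
    if not ax.yaxis_inverted():
        ax.invert_yaxis()
    ax.set_xlabel('BP - RP (mag)')
    ax.set_ylabel('G (mag)')
    return ax
```

*For example, `ax = plot_cmd(r['phot_bp_mean_mag'], r['phot_rp_mean_mag'], r['phot_g_mean_mag'], color='k')`; later you can overplot a subset of indices on the same axes by passing `ax=ax` and a different color.*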

## Identify the cluster members

### First, let's look at the radial velocities.

*The [radial velocity](https://en.wikipedia.org/wiki/Radial_velocity) is the speed at which an object is moving toward or away from us.  In a star cluster, all the stars move with similar radial velocities, while the field stars have a much broader distribution.*


*Plot a histogram of the radial velocities (the key from the catalog is "radial_velocity") from our M67 Gaia catalog.*

*$\texttt{astropy}$ has some really great fitting features.  See [this documentation about modeling](http://docs.astropy.org/en/stable/modeling/).  For the radial velocities, we want to fit two [1D Gaussians](http://docs.astropy.org/en/stable/api/astropy.modeling.functional_models.Gaussian1D.html) to the data, using $\texttt{astropy}$.  The first Gaussian is for the cluster (the narrow, peaked distribution).  The second is for the field.*

*We will do this below.  You will probably want to supply initial guesses for the parameters.  When you have the fit, plot the fit on top of the radial-velocity histogram.*


In [None]:
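#fit
# initial guesses (amplitude, mean, stddev): first a narrow cluster peak, then a broad field component
p_init = models.Gaussian1D(_, _, _) + models.Gaussian1D(_, _, _)
fit_p = fitting.LevMarLSQFitter()
# hrv, brv: the counts and bin edges from your radial-velocity histogram;
# fit at the bin centers rather than the left bin edges
rvG1D = fit_p(p_init, 0.5*(brv[:-1] + brv[1:]), hrv)
print(rvG1D)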

#plot


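*Now we can calculate formal membership probabilities with the following formula, where $F_\mathrm{cluster}$ and $F_\mathrm{field}$ are the fitted cluster and field models evaluated at a star's radial velocity $v$:*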

$$
P\left(v\right) = \frac{F_\mathrm{cluster}\left(v\right)}{F_\mathrm{cluster}\left(v\right) + F_\mathrm{field}\left(v\right)}
$$

*Use this formula below.  Then plot a histogram of the $P\left(v\right)$ distribution. Stars with $P\left(v\right) \sim 1$ are high-probability members, while those with $P\left(v\right) \sim 0$ are non-members.  You will want to pick some cutoff to select the members. Then use $\texttt{numpy.where}$ to get the indices of these radial-velocity members.  You may also want to plot a CMD to look at them.*

In [None]:
#membership calculation

#plot

#where statement
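
*A minimal sketch of the membership formula, written for any pair of callables so it works with the two components of a fitted compound model. Both helper names are hypothetical, and the `cutoff=0.5` default is only a placeholder for whatever threshold you pick from the $P\left(v\right)$ histogram. Stars without a measurement (NaN) get $P = \mathrm{NaN}$, so they never pass the cut.*

```python
import numpy as np

def membership_probability(f_cluster, f_field, x):
    """P(x) = F_cluster(x) / (F_cluster(x) + F_field(x)), evaluated element-wise."""
    x = np.asarray(np.ma.filled(x, np.nan), dtype=float)
    fc = np.asarray(f_cluster(x), dtype=float)
    ff = np.asarray(f_field(x), dtype=float)
    total = fc + ff
    # where both models are zero or the input is missing, P is NaN
    with np.errstate(invalid='ignore', divide='ignore'):
        return np.where(total > 0, fc / total, np.nan)

def select_members(prob, cutoff=0.5):
    """Indices of stars whose membership probability exceeds `cutoff`."""
    with np.errstate(invalid='ignore'):
        return np.where(np.asarray(prob) > cutoff)[0]
```

*With the fit from above, and assuming a recent $\texttt{astropy}$ in which indexing a compound model returns its components, this could be used as `p_rv = membership_probability(rvG1D[0], rvG1D[1], r['radial_velocity'])` followed by `rv_members = select_members(p_rv, cutoff=0.5)`.*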


### Now let's check the parallaxes.

*[Parallax](https://en.wikipedia.org/wiki/Parallax) is the apparent displacement of an object, with respect to background objects, when viewed from different positions.  For a star, the parallax $\varpi$ seen as the observer orbits the Sun gives its distance: $d\,[\mathrm{pc}] = 1/\varpi\,[\mathrm{arcsec}]$.  Once again, $\texttt{astropy}$ has a great utility for this conversion.*

*Plot a histogram of the distances.*

In [None]:
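# convert parallax (mas) to distance (pc) via d = 1/parallax; non-positive parallaxes give
# unphysical (negative or infinite) distances, so exclude them before fitting
dist = r['parallax'].to(u.parsec, equivalencies=u.parallax())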

#plot

*Now we want to fit the data again, so that we can derive cluster memberships based on distance.  Formally, there is no reason to think the cluster should follow a Gaussian distribution. (It should be fit with a "[King model](http://adsabs.harvard.edu/abs/1962AJ.....67..471K)".)  But let's approximate it with a 1D Gaussian.  Then we can fit the rest of the field with a simple polynomial.*

*Perform this fit to the distances, using $\texttt{astropy}$.  I suggest using a polynomial of degree 6. Plot the fit on top of the histogram of distances.*

In [None]:
#fit

#plot


*Do another membership calculation, using the same formula written above.  Plot a histogram of your membership probabilities.  Pick some cutoff to define the cluster members, and create another $\texttt{numpy.where}$ statement to hold the members you find from parallax.*

In [None]:
#membership calculation

#plot


#where statement



### Now let's look at the proper motions.

*The [proper motion](https://en.wikipedia.org/wiki/Proper_motion) is the angular rate at which an object moves across the plane of the sky (perpendicular to the radial velocity).  Proper motions are usually measured in mas/yr, which can be converted to km/s if you know the distance.*
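
*For example, a proper motion of $\mu$ mas/yr at a distance of $d$ kpc corresponds to a tangential velocity of $v_t \approx 4.74047\,\mu\,d$ km/s, because 1 mas seen from 1 kpc subtends 1 AU, and 1 AU/yr is about 4.74047 km/s. A minimal sketch (the helper name is hypothetical):*

```python
import numpy as np

# 1 AU per year in km/s: 1 mas viewed from 1 kpc subtends 1 AU
KMS_PER_MASYR_PER_KPC = 4.74047

def tangential_velocity_kms(pmra_masyr, pmdec_masyr, distance_kpc):
    """Tangential (plane-of-sky) speed in km/s from proper motions and distance.

    Gaia's pmra already includes the cos(Dec) factor, so the total proper
    motion is simply the quadrature sum of pmra and pmdec.
    """
    mu = np.hypot(pmra_masyr, pmdec_masyr)  # total proper motion, mas/yr
    return KMS_PER_MASYR_PER_KPC * mu * np.asarray(distance_kpc, dtype=float)
```

*For instance, `tangential_velocity_kms(3.0, 4.0, 2.0)` is $4.74047 \times 5 \times 2 \approx 47.4$ km/s.*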

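*Let's start by plotting the proper motion in RA ("pmra") versus the proper motion in DEC ("pmdec").*

*Usually we show the RA component as $\mu_\alpha \cos(\mathrm{Dec})$, because these are coordinates on a sphere.  Gaia's "pmra" column is already defined this way (it includes the $\cos(\mathrm{Dec})$ factor), so no extra correction is needed.*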

*Plot this as a 2D histogram (aka heatmap).  I recommend using $\texttt{pyplot.hist2d}$.  Also make plots of the histograms of each of the proper motion directions. (Hint: you may want to define a method to do this, because you'll need to replot this all again below when you perform the fit.)*

*For the proper motions, we will fit two [2D Gaussians](http://docs.astropy.org/en/stable/api/astropy.modeling.functional_models.Gaussian2D.html) to the data, using $\texttt{astropy}$*

*Perform this fit and plot it on top of your 2D histogram, as well as your 1D histograms.*

In [None]:
#fit


#plots


*Calculate formal membership probabilities, using the same formula as above.  Plot a histogram of these proper-motion membership probabilities. Then write another $\texttt{numpy.where}$ statement to identify the indices of the proper-motion members.*


In [None]:
#membership probabilities

#plot


#where statement


### Now, get a final list of members and plot the CMD

*Let's combine all of these different membership lists to get the intersection -- the stars that are members by radial velocity, parallax, and proper motion.  We can do this with $\texttt{numpy.intersect1d}$ and $\texttt{reduce}$; see [here](https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.intersect1d.html).*

*After you create this member list, plot the CMD, showing:*
* *All the stars in the catalog*
* *The stars identified as proper-motion members*
* *The stars identified as radial-velocity members*
* *The stars identified as members from parallax*
* *The final list of members*

*I suggest that you make more than one plot, so that you can see the different samples.*

*Also, note that, for M67, radial velocities are only available for the brightest stars.  If a star does not have a radial velocity, it can still be considered a member based on the other methods, and, if so, it should be included in the final member list.*


In [None]:
#intersection for all members
members = reduce(np.intersect1d, (_, _, _))

#the plots
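
*Because radial velocities exist only for the brighter stars, a plain three-way intersection would throw out every faint star. A sketch of one way to honour the note above (the helper name is hypothetical): treat stars without a radial velocity as passing the RV test, then intersect with $\texttt{reduce}$ and $\texttt{numpy.intersect1d}$ as suggested.*

```python
import numpy as np
from functools import reduce

def final_members(pm_idx, plx_idx, rv_idx, has_rv):
    """Indices that are PM and parallax members and either RV members or lack an RV."""
    no_rv = np.flatnonzero(~np.asarray(has_rv, dtype=bool))
    rv_ok = np.union1d(rv_idx, no_rv)  # RV members plus stars with no RV at all
    return reduce(np.intersect1d, (pm_idx, plx_idx, rv_ok))
```

*Here `has_rv` could be built as `np.isfinite(np.ma.filled(r['radial_velocity'], np.nan))`, and the three index arrays are the outputs of your earlier `numpy.where` statements.*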


## Pull this together to create a Class

*You can find information about Python Classes [here](https://docs.python.org/3/tutorial/classes.html).  In general, a Class allows you to bundle a lot of functionality together to clean up your code.*

*Let's create a class that will take a star cluster's RA and DEC, and return all the members. I will set up the outline, and you should copy your code from above into the appropriate spots.*

*This should work for M67, but we'll see if it is general enough to work with any random star cluster.  Writing code with more general functionality is a goal of good coding practice.*

In [None]:
class GaiaClusterMembers(object):
    '''
    This Class will grab data from the Gaia archive, and attempt to determine members using the
    proper motions, radial velocities and parallaxes.

    The user must provide the RA and Dec values, and the Class will return the full catalog and
    the indices of the members.

    '''

    def __init__(self, RA=None, Dec=None, *args, **kwargs):

        #required inputs (cluster center, in degrees)
        self.RA = RA
        self.Dec = Dec

        #outputs
        self.catalog = None
        self.members = None

        #feel free to include more values in here.  These are like global variables that will be
        #available to any method you write below.

    def getGaiaData(self):
        #this should execute the query to retrieve the data
        pass

    def getRVMembers(self):
        #this should calculate the radial-velocity memberships and identify those members
        pass

    def getParallaxMembers(self):
        #this should calculate the memberships based on parallax and identify those members
        pass

    def getPMMembers(self):
        #this should calculate the proper-motion memberships and identify those members
        pass

    def plotCMD(self):
        #maybe you want a method to plot the CMD, with the members
        pass

    def runAll(self):
        #this can run everything
        self.getGaiaData()
        self.getRVMembers()
        self.getParallaxMembers()
        self.getPMMembers()
        self.plotCMD()
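
*For `getGaiaData`, the cone query from the top of the notebook can be generated from the cluster center. A sketch (the function name and the `radius_deg` and `max_pm_error` parameters are hypothetical; the defaults reproduce the M67 query). A larger radius may be needed for nearby, extended clusters, at the cost of a larger and slower query:*

```python
def build_cone_query(ra_deg, dec_deg, radius_deg=1.0, max_pm_error=5.0):
    """Return the ADQL cone search used above, for an arbitrary center and radius (degrees)."""
    return (
        "SELECT * FROM gaiadr2.gaia_source "
        "WHERE CONTAINS(POINT('ICRS', gaiadr2.gaia_source.ra, gaiadr2.gaia_source.dec), "
        f"CIRCLE('ICRS', {ra_deg}, {dec_deg}, {radius_deg}))=1 "
        # same quality cuts as the M67 query: small proper-motion errors (mas/yr)
        f"AND abs(pmra_error)<{max_pm_error} "
        f"AND abs(pmdec_error)<{max_pm_error} "
        "AND pmra IS NOT NULL AND abs(pmra)>0 "
        "AND pmdec IS NOT NULL AND abs(pmdec)>0;"
    )
```

*Inside the Class this might become `job = Gaia.launch_job_async(build_cone_query(self.RA, self.Dec, self.radius), dump_to_file=False)` followed by `self.catalog = job.get_results()`, where `self.radius` is a hypothetical attribute you would add in `__init__`.*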


*Test this with M67*

In [None]:
#M67


*Try using this Class for the open cluster [Pleiades](https://en.wikipedia.org/wiki/Pleiades).*

In [None]:
#The Pleiades


*Try using this Class for the open cluster <a href="https://en.wikipedia.org/wiki/Hyades_(star_cluster)">Hyades</a>.  This cluster is close, but much more spread out on the sky.  If your code doesn't work, see if you can make it work.*

In [None]:
#the Hyades


*Try using this Class for the open cluster [NGC 188](https://en.wikipedia.org/wiki/NGC_188).  This cluster is much farther away.  If your code doesn't work, see if you can make it work.*

In [None]:
#NGC 188

