# Fitting Models to Data: Star Cluster Case Study

## Finding the Ages of Star Clusters

# Step 1, Warm-Up: Where are our Milky Way's Star Clusters?

NGC 6535:
<img src="http://www.nasa.gov/sites/default/files/thumbnails/image/potw1452a.jpg" width=700>

Let's first visualize where Globular Clusters (GCs) and Open Clusters (OCs) are located in our Milky Way Galaxy. <br>

<a href="http://www.nasa.gov/content/goddard/hubble-sees-an-ancient-globular-cluster/#.VPUlj7PF-5I">Globular Clusters</a> are gravitationally bound clusters of hundreds of thousands of stars. Because they formed early in the formation of our Milky Way Galaxy, <a href="http://starchild.gsfc.nasa.gov/docs/StarChild/questions/question28.html">Globular Clusters are used to provide a lower limit on the age of our Universe</a>.

<a href="https://en.wikipedia.org/wiki/Open_cluster">Open Clusters</a> contain many fewer stars than globular clusters, usually hundreds to thousands.  They are constantly forming (and evaporating) in our Galaxy, and therefore have a range of ages.

Later in this activity you'll determine the age of a cluster of your choice.

First we'll investigate the spatial distribution of these star clusters using the data sets <a href="data/GlobularClusters_clean.tab">GlobularClusters_clean.tab</a> and <a href="data/OpenClusters_clean.tab">OpenClusters_clean.tab</a>, in your data folder.

(The GC table is a cleaned-up version of the <a href="http://spider.seds.org/spider/MWGC/mwgc.html">original data table from SEDS</a>, and the OC table is a cleaned-up version of <a href="https://www.univie.ac.at/webda/tadross.html">this one</a>.)

Both tables contain the <a href="http://astro.unl.edu/classaction/animations/coordsmotion/radecdemo.html">RA and DEC location</a>, distance from our Sun and from the Galactic center in kilolightyears (kly), <a href="http://lcogt.net/spacebook/what-apparent-magnitude">apparent magnitude</a> in the V-band, and <a href="https://lcogt.net/spacebook/using-angles-describe-positions-and-apparent-sizes-objects/">angular size</a> of each star cluster.

### Import python libraries

In [None]:
#Set up astropy 
from astropy.table import Table,Column
from astropy.coordinates import SkyCoord, Distance
from astropy import units as u
from astropy.io import ascii

#Set up plotting libraries
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline

### Read the data files using ascii.read

In [None]:
#for the globular cluster data in data/GlobularClusters_clean.tab
GCs = ---
GCs

In [None]:
#Now the open cluster data in data/OpenClusters_clean.tab
OCs = ---
OCs

### To Do: Make a simple scatter plot of the Globular and Open Clusters' RA and DEC. 

* For suggestions, check out <a href="../examples/CurveFitExample.ipynb">step 1 of this tutorial</a>.

* For additional information/examples, check out <a href="http://nbviewer.ipython.org/github/AJRenold/ipython/blob/1.x/examples/notebooks/Part%203%20-%20Plotting%20with%20Matplotlib.ipynb">this useful reference</a> and <a href="http://matplotlib.org/1.4.1/users/pyplot_tutorial.html">this one</a>.
<br> 


* Remember, first insert a new cell below, type in your code, and then 'run the cell' by pressing 'shift-enter'.
* Remember to include at the beginning:<br>
%matplotlib inline
* Be sure to label the plot and include a legend (and give the GCs and OCs different colors).

In [None]:

plt.scatter(---)
plt.scatter(---)
plt.legend()
plt.xlabel("RA")
plt.ylabel("Dec")

### Plot a More Useful Projection, and include the Distances and Diameters, of the Star Clusters in our Milky Way Galaxy

To locate an object in 3D space we use three numbers. Our data table provides RA, DEC, and distSun. RA and DEC tell us the star clusters' locations on the sky, and distSun tells us their distances from our Sun.  

First, it is useful to create an Astropy Coordinate Object:

In [None]:
# Distances are converted from kilolightyears to kiloparsecs (1 kpc is about 3.26 kly)
GC_Coords = SkyCoord(GCs['RA'], GCs['DEC'], unit=(u.degree, u.degree),
                     distance=Distance(GCs['distSun(kly)']/3.26, u.kpc), frame='icrs')
OC_Coords = SkyCoord(OCs['RA'], OCs['DEC'], unit=(u.degree, u.degree),
                     distance=Distance(OCs['distSun(kly)']/3.26, u.kpc), frame='icrs')
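
The coordinate objects above combine two sky angles with a distance. As a rough illustration of what that means in 3D, the sketch below converts (RA, Dec, distance) into Sun-centered Cartesian x, y, z in kiloparsecs using only the standard library. The function name `to_cartesian_kpc` and the constant name are made up for this example, and the factor of 3.26 kly per kpc is the same rounded value used in the cell above. In practice, a `SkyCoord` that carries a distance can give you this through its `cartesian` attribute.

```python
import math

KLY_PER_KPC = 3.26  # rounded conversion factor, as in the SkyCoord cell above

def to_cartesian_kpc(ra_deg, dec_deg, dist_kly):
    """Convert a sky position and distance to Sun-centered (x, y, z) in kpc."""
    d_kpc = dist_kly / KLY_PER_KPC
    ra = math.radians(ra_deg)
    dec = math.radians(dec_deg)
    x = d_kpc * math.cos(dec) * math.cos(ra)
    y = d_kpc * math.cos(dec) * math.sin(ra)
    z = d_kpc * math.sin(dec)
    return x, y, z

# A made-up cluster 32.6 kly away at RA = 90 deg, Dec = 0 deg
x, y, z = to_cartesian_kpc(90.0, 0.0, 32.6)
print(round(x, 6), round(y, 6), round(z, 6))
```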

Next, plot the star clusters on the sky in the <a href='http://en.wikipedia.org/wiki/Mollweide_projection'>Mollweide projection</a>. Scale the point sizes according to the angular diameter (i.e., size of the star cluster on the sky) and color them according to distance (with the "cool" color table, blue is close and pink is far).

In [None]:
#Create your plot
plt.figure (figsize=(13,7))
plt.subplot(projection="mollweide")

plt.grid(True)

#GCs
plt.scatter(GC_Coords.ra.wrap_at(180.*u.degree).radian,GC_Coords.dec.radian,c=---, s=---, cmap='cool')

#OCs
plt.scatter(OC_Coords.ra.wrap_at(180.*u.degree).radian,OC_Coords.dec.radian,c=---, s=---, cmap='autumn')

#Label your plot
plt.title("Star Clusters",fontsize=24)
plt.xlabel("RA",fontsize=16)
plt.ylabel("DEC",fontsize=16)
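
The `wrap_at(180.*u.degree)` call in the cell above is there because matplotlib's Mollweide axes expect longitudes between -π and π radians, while RA is usually tabulated from 0 to 360 degrees. A minimal pure-Python sketch of the same wrapping follows; the helper name `wrap_at_180` is hypothetical and is not an astropy function.

```python
import math

def wrap_at_180(ra_deg):
    """Map an angle in degrees into the interval [-180, 180)."""
    return (ra_deg + 180.0) % 360.0 - 180.0

# Print each RA, its wrapped value in degrees, and the wrapped value in radians
for ra in (10.0, 180.0, 270.0, 359.0):
    wrapped = wrap_at_180(ra)
    print(ra, wrapped, math.radians(wrapped))
```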

Questions:

- Considering the GCs, why are the biggest points mostly light blue and the pink points all small?
    
- Why are the GCs centered/clumped around a particular RA/DEC?



### Plot the Clusters in Galactic Coordinates

Here we'll make the same plot but transformed to Galactic Coordinates (l,b). In Galactic coordinates the center of the Galaxy is at (0.0,0.0).

In [None]:
#Create your plot.  This will be nearly identical to above, except here you want to plot (l,b), rather than (RA,Dec)
---
---
---


#GCs
plt.scatter(---, cmap='cool')

#OCs
plt.scatter(---, cmap='autumn')

#Label your plot
plt.title("Star Clusters",fontsize=24)
plt.xlabel("l",fontsize=16)
plt.ylabel("b",fontsize=16)
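
For reference, the rotation from (RA, Dec) to Galactic (l, b) can be written out by hand. The sketch below uses the commonly quoted J2000 position of the north Galactic pole and is only meant to show what the transformation does. The function name is made up and the constants are rounded; astropy's `SkyCoord` objects expose the same result through their `galactic` attribute, which is what you would normally use.

```python
import math

# Rounded J2000 constants (assumed values for this illustration)
RA_NGP = math.radians(192.85948)   # RA of the north Galactic pole
DEC_NGP = math.radians(27.12825)   # Dec of the north Galactic pole
L_NCP = math.radians(122.93192)    # Galactic longitude of the north celestial pole

def equatorial_to_galactic(ra_deg, dec_deg):
    """Return Galactic (l, b) in degrees for a J2000 (RA, Dec) in degrees."""
    ra = math.radians(ra_deg)
    dec = math.radians(dec_deg)
    sin_b = (math.sin(dec) * math.sin(DEC_NGP)
             + math.cos(dec) * math.cos(DEC_NGP) * math.cos(ra - RA_NGP))
    b = math.asin(max(-1.0, min(1.0, sin_b)))  # clamp against rounding error
    y = math.cos(dec) * math.sin(ra - RA_NGP)
    x = (math.sin(dec) * math.cos(DEC_NGP)
         - math.cos(dec) * math.sin(DEC_NGP) * math.cos(ra - RA_NGP))
    l = (L_NCP - math.atan2(y, x)) % (2.0 * math.pi)
    return math.degrees(l), math.degrees(b)

# The direction of the Galactic center, roughly RA = 266.405, Dec = -28.936 (J2000)
l, b = equatorial_to_galactic(266.405, -28.936)
print(l, b)  # l should land close to 0 (or just under 360) and b close to 0
```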

Questions:

- Why do the OCs all live in roughly the same line in this projection, at b=0?    

- Why do the GCs and OCs have different spatial distributions in our Galaxy?
    
### In this projection, you can see why the GCs were important to the historic <a href="http://apod.nasa.gov/htmltest/gifcity/cs_why.html"> "Great Debate"</a> between Shapley and Curtis in the early 1900s, about the size of the Universe and our place within it.


# Step 2: Determining the Age of the Open Star Cluster M67

### Part A.  Plot the Observed Color Magnitude Diagram for your Star Cluster

Astronomers <a href="https://www.e-education.psu.edu/astro801/content/l7_p6.html">determine star cluster ages by finding the isochrone that best matches the observed star cluster data</a>.

We will use the M67 data in your 'data' folder that I grabbed from the internet (but there is an extension activity below where you can grab your own data on a different open cluster).
* <a href="data/m67.tab">m67.tab</a>, the observed data
* <a href="data/m67_isochrones.dat">m67_isochrones.dat</a>, a table of isochrones.

### First, let's look at the observations. Read in your Observed Data Table

In [None]:
# Here we read in the M67 Observed Data Table from data/m67.tab using ascii (as above)

obs_data = ---
obs_data

### Plot the Color Magnitude Diagram for your Star Cluster

In [None]:
#Plot B-V color on the x-axis and apparent V magnitude on the y-axis
plt.figure (figsize=(---))
plt.scatter(---)

#Label your Plot
plt.title(---,fontsize=24)
plt.xlabel(---)
plt.ylabel(---)
plt.ylim([18,8])

#Note: color-magnitude diagrams flip the y-axis 
# because the larger a star's V-mag, the fainter the star
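
The inverted axis follows from how magnitudes are defined: a difference of 5 magnitudes corresponds to a factor of 100 in brightness, with larger numbers meaning fainter stars. A short sketch of that relation (the function name is just for illustration):

```python
def brightness_ratio(m_faint, m_bright):
    """How many times brighter the m_bright star is than the m_faint star."""
    return 10.0 ** (0.4 * (m_faint - m_bright))

# Compare stars at the two ends of the plotted V range (V = 8 and V = 18)
print(brightness_ratio(18.0, 8.0))
```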

### Part B. Now, let's look at the isochrones. 


### Read in your Isochrones Data Table

In [None]:
#Here we read in the M67 Isochrones Data Table from data/m67_isochrones.dat, using ascii

iso_data = ---
iso_data


In [None]:
# Print the ages of your isochrone models

#import needed numpy library
import numpy as np

# Unique allows you to pick out just the unique values in an array
ages = np.unique(---)
print(ages)

In [None]:
#let's pick an age and plot one of these isochrones to see what it looks like
plt.figure (figsize=(12,10))

# Plot the isochrone model at a chosen age in B,V (like the observed data), use numpy.where
inage = np.where(--- == ---)[0]
plt.plot(---)

#Label the Plot
plt.title("Isochrone",fontsize=24)
plt.xlabel("B-V")
plt.ylabel("V")
plt.ylim(4,-1)
plt.xlim(0,1.5)
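
If `np.unique` and `np.where` are unfamiliar, here is a small sketch on a made-up table. The array names and values below are invented for illustration and are not the columns of `m67_isochrones.dat`; check your own table's column names before adapting it.

```python
import numpy as np

# Toy "isochrone table": each row is one model star with a log(age/yr) label
log_age = np.array([9.0, 9.0, 9.0, 9.5, 9.5, 9.5])
v_mag = np.array([4.0, 2.5, 1.0, 5.0, 3.5, 2.0])

# The distinct ages present in the table, sorted
toy_ages = np.unique(log_age)
print(toy_ages)

# Row indices belonging to a single age; [0] unpacks the tuple np.where returns
rows = np.where(log_age == 9.5)[0]
print(rows, v_mag[rows])

# Exact == comparisons on floats can miss values that differ by rounding;
# np.isclose is a more forgiving alternative
rows_close = np.where(np.isclose(log_age, 9.5))[0]
```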


### Part C. Overplot the Isochrone and Observations on a Color-Magnitude Diagram


### Convert the Isochrone Data to Match Observed Data Units

Notice that the isochrone and observations cover very different "x" and "y" regions on the plots we made above.  This is because the isochrone model outputs absolute magnitudes: the magnitudes the stars would have at a standard distance of 10 parsecs, with no interstellar reddening.  <br/>

Of course, in reality there is dust between us and the cluster, so we need to add reddening to the isochrone.  Also, the real cluster is much farther away than 10 parsecs, so the stars appear fainter; we add the "distance modulus" to shift the isochrone. <br/>

Now, convert your isochrones' absolute magnitudes into apparent magnitudes. 


In [None]:
#the M67 physical constants are listed here.
reddening = 0.01
distMod = 9.6

#apparent V magnitude
iso_V = iso_data['V'] +  distMod

#observed B-V color 
iso_BminV = iso_data['B'] - iso_data['V'] + reddening
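
As a sanity check on `distMod`, the distance modulus relates apparent and absolute magnitude through m - M = 5 log10(d / 10 pc). The sketch below inverts that relation. It ignores extinction, and the function names are made up for this example.

```python
import math

def distance_pc(dist_mod):
    """Distance in parsecs implied by a distance modulus m - M (no extinction)."""
    return 10.0 ** (dist_mod / 5.0 + 1.0)

def distance_modulus(d_pc):
    """Distance modulus m - M for a distance in parsecs (no extinction)."""
    return 5.0 * math.log10(d_pc / 10.0)

# Distance implied by the M67 value used in the cell above
d = distance_pc(9.6)
print(d, "pc, or about", d * 3.26 / 1000.0, "kly")
```

With the value used above this comes out to a little over 800 pc, which you can compare with the distSun column for M67 in the open cluster table.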

### Plot the Observed Data and Isochrone Data Together

In [None]:
#Set up the plot
plt.figure (figsize=(12,10))

# Plot the observed data
plt.scatter(---, c='yellow',s=20)

# Plot the isochrone models
#first set the colors (feel free to choose a different color scheme)
cm_subsection = np.linspace(0.0, 1.0, len(ages)) 
colors = [matplotlib.cm.jet(x) for x in cm_subsection ]

for t,c in zip(ages,colors):
    inage = ---
    plt.plot(---, label=t, color=c)

#Label the Plot
plt.title("Color Magnitude Diagram with Isochrones Overplotted",fontsize=24)
plt.xlabel("B-V")
plt.ylabel("V")
plt.ylim([18,8])
plt.legend()

### Which of your isochrone models (which age) looks to be the best-fit with your star cluster's observed data? 

In general, you could determine the reddening, distance modulus, and cluster age (and metallicity) by fitting an isochrone to the observations.  Let's assume here that we know everything but the cluster age (e.g., from different observations), and we just want to find the cluster age.

In [None]:
#Plot the best fit by eye over the data 
plt.figure(figsize=(12,10))

# Plot the observed data
plt.scatter(---)
# Plot the isochrone models

inage = ---
plt.plot(---)

#Label the Plot
plt.title("Color Magnitude Diagram with Isochrone Overplotted",fontsize=24)
plt.xlabel("B-V")
plt.ylabel("V")
plt.ylim([18,8])

This "chi-by-eye" may work fairly well, but remember, we already knew the reddening and distance modulus.  What about the uncertainty on our age fit by eye?  And what would we do if we wanted to fit isochrones to hundreds of clusters?

## Part D. Parameter Estimation: Automating the Fit

## In the above, you determined your star cluster's age by eye. Let's automate the process.

We will try to find the isochrone that minimizes the total distance of the observed points from the isochrone line.  Recall that for a more typical curve-fitting problem (e.g., fitting a straight line), we might try to minimize the <a href="https://en.wikipedia.org/wiki/Pearson%27s_chi-squared_test">$\chi^2$ value</a>.  We'll do something similar here, but for simplicity only take the numerator in that equation (which assumes that the errors on our observations are all the same).
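
Written out, the statistic described here, as it is computed in the loop later in this part, looks like the following. This is a sketch of the notation only: $n_V$ stands for the `normV` scaling in the code, $i$ runs over the observed stars inside the fit limits, and $j$ runs over the isochrone points at log-age $t$.

$$\chi^2_{\rm mod}(t) = \sum_i \min_j \left[ \left( (B-V)_i^{\rm obs} - (B-V)_j^{\rm iso} \right)^2 + \left( \frac{V_i^{\rm obs} - V_j^{\rm iso}}{n_V} \right)^2 \right]$$

Each star contributes the squared distance to its nearest isochrone point, and the best-fit age is the $t$ that makes the sum smallest.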

First, notice that there are many stars away from the predictions of the isochrone.  Some of these are non-members but others are exotic stars (e.g., "blue stragglers", "yellow giants", "sub-subgiants" -- ask Aaron about these :).  However, none of these non-members or exotic stars are modelled by the isochrones, so we should probably not try to fit to them.  Let's cut out some of these bright stars by setting limits on the V and (B-V) values we fit.

In [None]:
#choose the V and (B-V) limits of the region to fit (leave these alone for now, but come back later to refine the fit)
minVfit = -100.
maxVfit = 100.
minBVfit = -100.
maxBVfit = 100.


In [None]:
print('#log(t/yr) chi^2')

# create an empty list to hold the chi^2 values that we will calculate below
chi2 = []

#a normalization since differences in B-V are much smaller than in V
normV = 10.

#loop through all of the ages and calculate a chi^2 value for each
for t in ages:
    #find the isochrone with this log(age/yr) == t
    inage = ---
    #calculate a modified chi^2 value based on the distance of the observation from the isochrone
    c2 = 0.
    #loop through the observed BV and V values to sum up the chi^2
    for (BVo, Vo) in zip(---):
        #if this star is within our V and (B-V) limits (set above) then
        #  find the squared distances to all of the isochrone points at this age
        if (Vo > --- and ---):
            d2 = [( (BVo - x)**2. + ((Vo - y)/normV)**2.) for (x,y) in ---]
            c2 += min(d2)

    #append c2 to our chi2 list
    ---.---(---)
    #print this age and its chi^2 value on one line
    print('%4.2f %s' % (t, c2))
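
The inner list comprehension above is easy to read but slow for large tables. One possible alternative, shown here on made-up numbers, is to let numpy broadcasting build the full table of star-to-isochrone distances at once. The function name `modified_chi2` and its arguments are hypothetical, and the V and (B-V) limits are left out for brevity.

```python
import numpy as np

def modified_chi2(obs_bv, obs_v, iso_bv, iso_v, norm_v=10.0):
    """Sum over stars of the squared distance to the nearest isochrone point."""
    obs_bv = np.asarray(obs_bv, dtype=float)
    obs_v = np.asarray(obs_v, dtype=float)
    iso_bv = np.asarray(iso_bv, dtype=float)
    iso_v = np.asarray(iso_v, dtype=float)
    # Shape (n_stars, n_iso_points): every star against every isochrone point
    d_bv = obs_bv[:, None] - iso_bv[None, :]
    d_v = (obs_v[:, None] - iso_v[None, :]) / norm_v
    d2 = d_bv**2 + d_v**2
    # Nearest isochrone point for each star, then add up over stars
    return d2.min(axis=1).sum()

# Two toy stars and a two-point toy isochrone (invented values)
toy_chi2 = modified_chi2([0.5, 0.6], [12.0, 14.0], [0.5, 0.6], [12.0, 13.0])
print(toy_chi2)
```

For very large catalogs the intermediate (stars by isochrone points) array can get big, so this trades memory for speed.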

In [None]:
# identify the age at the minimum chi^2 value 
# numpy.argmin is a function that gives the index of the value at the minimum value of an array.  
#    Use that here.
pos1 = ---(---)

# print the age at the chi^2 minimum
print('best fit log(age/yr): ', ---)
print('best fit age [Gyr]: ', 10.**---/1.e9)
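
A quick illustration of `np.argmin` and the log-age conversion, on invented numbers rather than your actual `chi2` list:

```python
import numpy as np

toy_log_ages = np.array([9.4, 9.6, 9.8])   # log10(age / yr), made up
toy_chi2 = np.array([0.9, 0.3, 0.7])       # made-up fit statistics

best = np.argmin(toy_chi2)                 # index of the smallest value
best_log_age = toy_log_ages[best]
best_age_gyr = 10.0 ** best_log_age / 1.0e9

print(best, best_log_age, best_age_gyr)
```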


In [None]:
# Plot the chi^2 values vs. age
plt.figure (figsize=(10,10))

# plot all the chi^2 values
plt.---

# overplot a line indicating the  chi^2 minimum
plt.---

#Label the plot
plt.xlabel('log(t/yr)')
plt.ylabel(r'modified $\chi^2$') #NOTE: you can use latex syntax to get Greek symbols in plots

#set axes limits (if necessary)
plt.x---
plt.y---

In [None]:
# Plot the best-fit isochrone over the observed data
plt.figure(figsize=(10,12))

# Label the plot
plt.x---
plt.y---

# Set the axes limits
plt.x---
plt.y---

# Plot the observed data
plt.---

# Plot the best-fit isochrone
bestfit_iso = ---
plt.---

# Highlight the region that is included in the fit
plt.fill_between([minBVfit,maxBVfit],[minVfit, minVfit], [maxVfit,maxVfit] ,color='gray', alpha=0.3)


## Google the age of your star cluster. How close is your best-fit to your fit by eye (and to the accepted age in the literature)? 

#### If you're not satisfied with our automated fit, go back and improve the code so that it works more reliably (for instance, modify the V and BV limits we set above)

### Some "food for thought" to discuss:

- What are the limitations to the approach used above? 

     
- Which fit do you trust more, the "chi-by-eye" or our automated fit?


- What other information might you want for each star to improve your fit?


- How can we improve this automated fit with the data that we have?  Give that a try!

 

### Remember that star clusters form an important rung on the cosmic distance ladder AND are critical tests for our theory of stellar evolution (which underpins just about all of astrophysics).  So we really want to have reliable isochrone fits to observations like these.  <a href="http://arxiv.org/abs/1501.01303"> Some people spend years developing these methods!</a>

# Congratulations! You've completed Part 5!

## Extension Activity: Download your own Data and fit an isochrone!

## How to Download Observed Star Cluster Data

Go to http://www.univie.ac.at/webda/navigation.html

This site allows you to download data from pretty much any open star cluster in our galaxy that might be of interest to you. For the full list of clusters included in this site, click <a href="http://www.univie.ac.at/webda/complete_ad.html">here</a>. Pick one that interests you. For additional information about each cluster, look it up in <a href="http://ned.ipac.caltech.edu/forms/byname.html">NED (the NASA/IPAC Extragalactic Database)</a>.

- Type the name for any star cluster of your choice (for example, M67) in the box labelled 'Display the Page of the Cluster'. Hit enter.
- Make a note of the value for this cluster's ‘Reddening’ and the ‘Distance Modulus’, listed under ‘Basic Parameters’.
- Under ‘Query’, click 'selections on data'.
    - Note: If it doesn't say UBV at the top, then click on 'UBV' (at the left).
- In the 'V' boxes, type 0 in 'Lower' and 20 in 'Upper'. Hit enter.
    - A list of stars and their apparent magnitudes should appear.
- In your data folder in yourProjectDirectory, open a new text file using emacs.
- Copy your star cluster data into this text file. These (and the isochrone data below) are the data you'll use to determine the age of your star cluster.
- Explore the site. What other data can you download about each cluster (i.e., positions, other filter magnitudes, etc.)?

<a href="http://www.univie.ac.at/webda/description.html#base_level">General information about the history and use of WEBDA</a>.

## How to Download the Isochrone Model Data


Next, go to http://stev.oapd.inaf.it/cgi-bin/cmd

- Use the default values under “Evolutionary Tracks”.
- Make sure the photometric system is appropriate for your data (i.e., if you’re using UBV data, then choose the one that starts with UBV).
- Keep the default values for “Dust”, “Extinction”, and “Initial Mass Function”.
- Under Ages, select: Sequence of isochrones of constant metallicity...
    - Change Z=0.008 to Z=0.019 (this is the value for solar metallicity)
    - Change the age range to log(t/yr) = 8.0 to 10.0  (i.e., ages ranging from 100 million years to 10 billion years)
- Keep the default selection for 'Output' on Isochrone Tables
- Click submit and download the linked file named ‘outputxxx.dat’
- Rename this file to something meaningful and place it in your data folder in yourProjectDirectory.
- Look at the table you generated using Emacs. 
    - Find the rows separating the isochrone of one age from the isochrone of the next age (e.g., log[age/years] = 8 to log[age/years] = 8.5).
    - Note how this single file contains the full set of isochrones.