# Project 1: How old are the Milky Way's star clusters?

Star clusters are collections of stars that are thought to have formed at roughly the same time from the same giant molecular cloud. Because they share a common distance, age, and initial chemical composition, star clusters are crucial for studying stellar populations.

In this project, you'll use brightnesses and distance estimates from Gaia to create a color-magnitude diagram (CMD) for one of the Milky Way’s star clusters. You’ll then compare the CMD to theoretical stellar evolution models to determine the age of your cluster.

---

## Data

To conduct your analysis, you'll use data from the [Gaia mission](https://www.cosmos.esa.int/web/gaia), which has measured precise positions, proper motions, and parallaxes for over a billion stars. Specifically, you'll be using the main table of the third data release (DR3), which is called `gaiadr3.gaia_source` in the Gaia Archive. Descriptions of each column in `gaiadr3.gaia_source` can be found [here](https://gea.esac.esa.int/archive/documentation/GDR3/Gaia_archive/chap_datamodel/sec_dm_main_source_catalogue/ssec_dm_gaia_source.html) in the Gaia documentation.

You can access the Gaia Archive either through the [online portal](https://gea.esac.esa.int/archive/) or by using `astroquery.gaia`, which we discussed in week 7 of class. When retrieving your data, you should focus on the following properties:

- Positions (right ascension and declination: `ra`, `dec`)
- Proper motions (`pmra`, `pmdec`)
- Parallax (`parallax`)
- Photometric magnitudes (`phot_g_mean_mag`, `phot_bp_mean_mag`, `phot_rp_mean_mag`) and BP-RP color (`bp_rp`)

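You should also retrieve any reported errors for these quantities. Look for columns that have the same name as your property of interest, with `_error` appended. For example, the error on the right ascension is stored in a column called `ra_error`. Note that for the photometry, `gaia_source` reports uncertainties on the fluxes rather than on the magnitudes (e.g. `phot_g_mean_flux_error`), and there is no error column for `bp_rp`.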

Note that not all of the stars in the Gaia Archive have well-measured properties. Sometimes, the error on a certain measurement might be larger than the measurement itself. Or maybe the measurement is missing altogether! For your analysis, you'll want to remove any stars with suspicious data. To do this, you can include quality cuts directly in your query. Here's a set of suggested quality cuts based on work done by the Gaia team:

1. `parallax_over_error > 10`
2. `visibility_periods_used > 8`
3. `phot_g_mean_flux_over_error > 50`
4. `phot_bp_mean_flux_over_error > 20`
5. `phot_rp_mean_flux_over_error > 20`
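If you downloaded data without these cuts, or want to experiment with different thresholds, the same cuts can also be applied after the fact. Below is a minimal sketch, assuming your data are in an astropy `Table` (or anything else indexable by column name, such as a pandas `DataFrame`). The helper `quality_mask` and the `QUALITY_CUTS` dictionary are hypothetical names for illustration, not part of any library.

```python
import numpy as np

# Suggested Gaia quality cuts from the list above: column name -> exclusive lower bound
QUALITY_CUTS = {
    "parallax_over_error": 10,
    "visibility_periods_used": 8,
    "phot_g_mean_flux_over_error": 50,
    "phot_bp_mean_flux_over_error": 20,
    "phot_rp_mean_flux_over_error": 20,
}


def quality_mask(data, cuts=QUALITY_CUTS):
    """Return a boolean array that is True for rows passing every cut.

    `data` can be anything indexable by column name. NaN values compare as
    False, so rows with NaN in any cut column are rejected.
    """
    mask = None
    for column, minimum in cuts.items():
        passes = np.asarray(data[column], dtype=float) > minimum
        mask = passes if mask is None else mask & passes
    return mask
```

With an astropy `Table` called `table`, `clean = table[quality_mask(table)]` keeps only the rows that pass all five cuts.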

---

## Analysis tasks

### 1. Retrieve data from Gaia

To get data for your assigned cluster, first use Google or NASA ADS to find the following properties: 

1. Approximate center in RA and DEC (degrees)
2. Approximate distance (parsecs)

Then construct a Gaia query that fulfills the following criteria:

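1. Searches a circle centered on the cluster's approximate coordinates, with a radius of 5 degrees.
2. Selects only the stars that fall within +/- 100 pc of the cluster's approximate distance. (This can be implemented as a parallax filter: since $p\,[\text{mas}] = 1000 / d\,[\text{pc}]$, a star within 100 pc of the cluster's approximate distance $d_0$ has a parallax between $\frac{1000}{d_0 + 100}$ and $\frac{1000}{d_0 - 100}$ mas.)
3. Retrieves only the properties listed above.
4. Applies the quality cuts described above.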
    
Finally, download the results of your query (if submitted through the online portal), or save the results to a file (if submitted through `astroquery.gaia`). Once you've obtained the data, load it into this notebook in a format that you can easily work with. (You might want to consider using an Astropy `Table`!) 
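The criteria above can be combined into a single ADQL query. The sketch below builds the query string in Python and, optionally, submits it with `astroquery.gaia` (`Gaia.launch_job_async(...)` followed by `.get_results()`, which returns an astropy `Table`). The helper names (`parallax_bounds`, `build_cluster_query`, `run_query`), the output filename, and the inclusion of `source_id` as a row identifier are illustrative assumptions; it is worth testing the query in the Gaia Archive portal before relying on it.

```python
# Columns from the property list above, plus their errors and source_id as an identifier
COLUMNS = [
    "source_id",
    "ra", "ra_error", "dec", "dec_error",
    "pmra", "pmra_error", "pmdec", "pmdec_error",
    "parallax", "parallax_error",
    "phot_g_mean_mag", "phot_bp_mean_mag", "phot_rp_mean_mag", "bp_rp",
]


def parallax_bounds(distance_pc, window_pc=100.0):
    """Parallax range (mas) for stars within +/- window_pc of distance_pc, using p[mas] = 1000 / d[pc]."""
    if distance_pc <= window_pc:
        raise ValueError("distance must exceed the window so the near edge stays positive")
    plx_min = 1000.0 / (distance_pc + window_pc)  # far edge -> smallest parallax
    plx_max = 1000.0 / (distance_pc - window_pc)  # near edge -> largest parallax
    return plx_min, plx_max


def build_cluster_query(ra_deg, dec_deg, distance_pc, radius_deg=5.0, window_pc=100.0):
    """Return an ADQL query: cone search + parallax window + suggested quality cuts."""
    plx_min, plx_max = parallax_bounds(distance_pc, window_pc)
    return f"""
SELECT {', '.join(COLUMNS)}
FROM gaiadr3.gaia_source
WHERE 1 = CONTAINS(POINT('ICRS', ra, dec),
                   CIRCLE('ICRS', {ra_deg}, {dec_deg}, {radius_deg}))
  AND parallax BETWEEN {plx_min:.6f} AND {plx_max:.6f}
  AND parallax_over_error > 10
  AND visibility_periods_used > 8
  AND phot_g_mean_flux_over_error > 50
  AND phot_bp_mean_flux_over_error > 20
  AND phot_rp_mean_flux_over_error > 20
"""


def run_query(query, output_file="cluster_gaia.ecsv"):
    """Submit the query to the Gaia Archive and save the result table to disk."""
    from astroquery.gaia import Gaia  # imported here so the helpers above work without astroquery

    job = Gaia.launch_job_async(query)
    table = job.get_results()  # astropy Table
    table.write(output_file, overwrite=True)
    return table
```

With your own cluster values, something like `table = run_query(build_cluster_query(ra0, dec0, d0))` would fetch and save the data, and `Table.read("cluster_gaia.ecsv")` (from `astropy.table`) reloads it later without re-querying the archive.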

### 2. Remove unlikely cluster members

Make a scatterplot of your retrieved stars in RA and DEC space, coloring the points by parallax. You should see a dense cluster of stars in the center of the plot, with a scattering of stars surrounding it. Though all of these stars are in the same part of the sky (and roughly at the same distance) as your cluster, not all of them were actually formed as part of the cluster.
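One way to make this plot is sketched below with matplotlib; the function name `plot_sky` is hypothetical, and the inverted RA axis is an optional choice that follows the usual sky convention of RA increasing to the left (east). The same function can be reused for the cleaned sample later in this step.

```python
import matplotlib.pyplot as plt


def plot_sky(ra, dec, parallax, ax=None, title=None):
    """Scatter stars in RA/Dec (degrees), colored by parallax (mas)."""
    if ax is None:
        fig, ax = plt.subplots(figsize=(6, 6))
    sc = ax.scatter(ra, dec, c=parallax, s=2, cmap="viridis")
    ax.figure.colorbar(sc, ax=ax, label="Parallax [mas]")
    ax.set_xlabel("RA [deg]")
    ax.set_ylabel("Dec [deg]")
    ax.invert_xaxis()  # sky convention: RA increases to the left (east)
    if title:
        ax.set_title(title)
    return ax
```

For example, `plot_sky(table["ra"], table["dec"], table["parallax"], title="All retrieved stars")`.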

To filter out these "imposter" stars, you should perform sigma-clipping on proper motion in both RA and DEC. The idea is that all of the true cluster members should have similar proper motions -- in other words, they should be "moving together." Outliers in proper motion are moving differently than the rest of the stars and are probably not true members of the cluster. 

You should sigma-clip each parameter separately: first `pmra`, then `pmdec`. Recall that sigma clipping is an iterative process. First, compute the mean and standard deviation of a given list, then remove all members of the list that fall more than X standard deviations away from the mean, where X is a user-defined parameter (usually chosen to be 3 or 5). Then repeat the process with the new (trimmed) list until either no points are removed or a maximum number of iterations is reached.

Note that outliers can make up a large fraction of the sample for some clusters. You should therefore consider ways to reduce the influence of outliers on your estimates of the centroid and width of the member distribution, such as (a) using the median instead of the mean for the centroid and (b) using percentiles of the distribution instead of the standard deviation for the width. The standard deviation is particularly prone to being overestimated when outliers are present.
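A minimal sketch of this outlier-resistant clipping is below. It assumes the width is estimated as half the 16th-84th percentile range, which equals the standard deviation for a Gaussian distribution, and it clips `pmdec` only among the stars that survived the `pmra` clip. The function names `robust_sigma_clip` and `select_members` are hypothetical.

```python
import numpy as np


def robust_sigma_clip(values, nsigma=3.0, max_iter=10):
    """Iterative sigma clipping with the median as the centroid and half the
    16th-84th percentile range as the width. Returns a boolean mask that is
    True for points that survive."""
    values = np.asarray(values, dtype=float)
    keep = np.isfinite(values)
    for _ in range(max_iter):
        kept = values[keep]
        center = np.median(kept)
        p16, p84 = np.percentile(kept, [16, 84])
        width = 0.5 * (p84 - p16)
        new_keep = keep & (np.abs(values - center) <= nsigma * width)
        if new_keep.sum() == keep.sum():  # nothing removed: converged
            break
        keep = new_keep
    return keep


def select_members(pmra, pmdec, nsigma=3.0, max_iter=10):
    """Clip in pmra first, then clip pmdec among the pmra survivors."""
    pmra = np.asarray(pmra, dtype=float)
    pmdec = np.asarray(pmdec, dtype=float)
    idx = np.flatnonzero(robust_sigma_clip(pmra, nsigma, max_iter))  # pmra survivors
    keep_dec = robust_sigma_clip(pmdec[idx], nsigma, max_iter)       # clip pmdec among them
    members = np.zeros(pmra.shape, dtype=bool)
    members[idx[keep_dec]] = True
    return members
```

Usage might look like `members = select_members(table["pmra"], table["pmdec"])` followed by `cluster = table[members]`. As a library alternative, `astropy.stats.sigma_clip` with `cenfunc="median"` and `stdfunc="mad_std"` performs a similar outlier-resistant clip, using the median absolute deviation rather than percentiles for the width.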

Once you've removed unlikely cluster members, make a new scatter plot in RA and DEC space, colored by parallax. How does this plot compare to your previous plot? 

### 3. Estimate the cluster distance

Now that you've removed unlikely cluster members, make your own estimate of the cluster distance. First, convert parallaxes for your cleaned sample to distances. The formula to do this is $d \approx \frac{1}{p}$, where $d$ is the distance in parsecs (pc) and $p$ is the parallax in arcseconds (arcsec). Keep in mind that Gaia reports parallaxes in *milli*arcseconds, so with $p$ in mas the formula becomes $d \approx \frac{1000}{p}$ (equivalently, divide the Gaia parallaxes by 1000 to convert them to arcseconds first)!

Finally, estimate the distance to the cluster in parsecs by taking the median of the distances for each likely cluster member.
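The conversion and the median can be sketched in a few lines of NumPy; the function names here are hypothetical, and the input is assumed to be Gaia parallaxes in mas.

```python
import numpy as np


def parallax_to_distance(parallax_mas):
    """d [pc] = 1 / p [arcsec] = 1000 / p [mas], element-wise."""
    return 1000.0 / np.asarray(parallax_mas, dtype=float)


def cluster_distance(parallax_mas):
    """Median distance (pc) of the likely cluster members."""
    return float(np.median(parallax_to_distance(parallax_mas)))
```

For example, parallaxes of 2.0, 2.5, and 2.0 mas correspond to 500, 400, and 500 pc, so `cluster_distance([2.0, 2.5, 2.0])` gives 500 pc.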

### 4. Create a color-magnitude diagram

Compute [absolute magnitudes](https://www.phys.ksu.edu/personal/wysin/astro/magnitudes.html) for each star using your distance estimate from step 3: $G_{abs} = G - 5 \log_{10}\left(\frac{d}{10\text{ pc}}\right)$
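As a sketch, the formula translates directly to NumPy and works on whole columns at once; `absolute_magnitude` is a hypothetical helper name, and `distance_pc` is assumed to be the single cluster distance from step 3.

```python
import numpy as np


def absolute_magnitude(apparent_mag, distance_pc):
    """G_abs = G - 5 log10(d / 10 pc), applied element-wise."""
    apparent_mag = np.asarray(apparent_mag, dtype=float)
    distance_pc = np.asarray(distance_pc, dtype=float)
    return apparent_mag - 5.0 * np.log10(distance_pc / 10.0)
```

As a quick check, a star with $G = 10$ at 100 pc has $G_{abs} = 10 - 5\log_{10}(10) = 5$.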

Then, create a color-magnitude diagram for your cluster by plotting BP-RP color on the x-axis and the absolute G-band magnitudes on the y-axis. Remember to invert the y-axis so that brighter stars are on top and fainter stars are on the bottom!
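A minimal matplotlib sketch of the CMD is below; `plot_cmd` is a hypothetical name, and the marker size and color are arbitrary defaults that can be overridden through keyword arguments.

```python
import matplotlib.pyplot as plt


def plot_cmd(bp_rp, g_abs, ax=None, **scatter_kwargs):
    """Plot absolute G against BP-RP color, with brighter stars on top."""
    if ax is None:
        fig, ax = plt.subplots(figsize=(5, 7))
    style = {"s": 3, "color": "k"}
    style.update(scatter_kwargs)  # caller's options override the defaults
    ax.scatter(bp_rp, g_abs, **style)
    ax.set_xlabel("BP - RP [mag]")
    ax.set_ylabel("Absolute G [mag]")
    # Calling invert_yaxis() twice on the same axes flips it back, so only do it once per axes
    ax.invert_yaxis()
    return ax
```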

### 5. Compare with theoretical models

The color-magnitude diagram (CMD) is a powerful tool for studying stellar evolution because stellar populations that were born together (like those in star clusters!) follow predictable tracks in the CMD as they age. These tracks are called "isochrones" (iso = same and chrone = time), and by comparing theoretical isochrones to data from real star clusters, we can estimate the age of the cluster. 

For this project, we will use the MESA Isochrones and Stellar Tracks (MIST), which can be downloaded [here](https://waps.cfa.harvard.edu/MIST/interp_isos.html). Leave all settings on the form at their defaults, but change **Output options** to `Synthetic Photometry (UBV(RI)c + 2MASS + Kepler + Hipparcos + Gaia (DR2/MAW/EDR3) + Tess)`. Then click "Generate Isochrones" and download the resulting file. The file will have a `.iso` extension, which your computer might be confused by, but don't worry! This is just a text file that can be opened in TextEdit, Notepad, or even VSCode.

The columns of the file that you care about are the age (`log10_isochrone_age_yr`) and the synthetic Gaia photometry for EDR3 (which refers to "early DR3", but is the same as DR3 for our purposes). For each value of the age, the file provides a set of theoretical Gaia magnitudes for stars of different masses. Your job is to overlay these tracks on your CMD from step 4 by plotting synthetic G against synthetic BP-RP (the synthetic BP magnitude minus the synthetic RP magnitude), with one track for each age in the file. Find the isochrone that seems to match your data most closely -- this is your estimate of your cluster's age!
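Once the isochrones are loaded, overlaying them and comparing them to the data could look like the sketch below. It assumes you have already built a dictionary (a hypothetical structure) mapping each `log10_isochrone_age_yr` value to a pair of arrays `(synthetic BP-RP, synthetic G)`. The scoring function is an optional, rough supplement to judging the fit by eye, not a standard method: it takes the median distance from each star to the nearest isochrone point, weights color and magnitude equally, and measures to sampled points rather than to the line segments between them.

```python
import numpy as np
import matplotlib.pyplot as plt


def overlay_isochrones(ax, isochrones, **plot_kwargs):
    """Draw one track per age on an existing CMD axes.

    `isochrones` maps log10(age/yr) -> (bp_rp, g_abs) arrays.
    """
    for log_age, (iso_color, iso_mag) in sorted(isochrones.items()):
        ax.plot(iso_color, iso_mag, label=f"log(age/yr) = {log_age:.2f}", **plot_kwargs)
    ax.legend(fontsize="small")
    return ax


def isochrone_score(star_color, star_mag, iso_color, iso_mag):
    """Median CMD distance from each star to its nearest isochrone point (smaller = closer)."""
    stars = np.column_stack([star_color, star_mag])  # shape (N, 2)
    track = np.column_stack([iso_color, iso_mag])    # shape (M, 2)
    diffs = stars[:, None, :] - track[None, :, :]    # shape (N, M, 2)
    nearest = np.sqrt((diffs ** 2).sum(axis=-1)).min(axis=1)
    return float(np.median(nearest))


def best_isochrone(star_color, star_mag, isochrones):
    """Return the log-age with the smallest score, plus all scores."""
    scores = {
        log_age: isochrone_score(star_color, star_mag, c, m)
        for log_age, (c, m) in isochrones.items()
    }
    return min(scores, key=scores.get), scores
```

The score is only as meaningful as the inputs: both the stars and the isochrones must be in absolute magnitudes, and any systematic offset (for example, from an error in the adopted distance) shifts all scores, so it is worth checking the result against the plot.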

The MIST developers have provided a Python package called `read_mist_models` that might be helpful for reading in the isochrone file; check out their [resources page](https://waps.cfa.harvard.edu/MIST/resources.html) for more information and some examples of how to use it.

---

## Reflection

Write a brief (1-2 paragraphs) interpretation of the results you found above. Link it back to your original research question and key concepts from your literature review. (For this project in particular, you might consider looking up references that measure the age of your cluster, and comparing your results to those estimates.)

Then, write a brief (1-2 paragraphs) reflection on the limitations of your analysis. Are there any caveats or assumptions in your analysis? Could more data or a different method provide more robust results?

---

## Extending your analysis (optional)

Are there additional aspects of the dataset that you’d like to explore? Do you have ideas for refining the methods used in this notebook? Or maybe you’ve noticed an interesting pattern in your results that raises new questions? If you answered yes to any of these questions, I encourage you to extend your analysis! Feel free to reach out to me via email or visit office hours to discuss your ideas. If you're interested in diving deeper but aren’t sure where to start, I’m also happy to brainstorm with you. This is a great opportunity to practice developing your own research questions and exploring a dataset in a way that interests you.

---