# Project 4: What are the origins of the Milky Way’s stellar streams?

Stellar streams are elongated structures of stars that form when a globular cluster or dwarf galaxy is tidally disrupted by the Milky Way’s gravitational field. By modeling their orbits, we can study how the Milky Way's mass is distributed, detect possible perturbations from unseen substructures such as dark matter subhalos, and investigate the history of the disrupted systems. 

In this project, you'll investigate one of the Milky Way’s stellar streams with a provided catalog of observational data. You’ll then use the `gala` Python package to produce models of your stream, which you’ll compare to the real data to learn about the properties of the Milky Way's halo.

---

## Data

Since its launch in 2013, the [Gaia mission](https://www.cosmos.esa.int/web/gaia) has measured precise positions, proper motions, and parallaxes for over a billion stars. This unprecedented dataset has revolutionized our understanding of stellar streams (read more [here](https://ui.adsabs.harvard.edu/abs/2025NewAR.10001713B/abstract) if you're curious), resulting in the discovery of over 100 new systems. 

Even with such a comprehensive dataset, identifying members of individual streams is a tricky process that usually involves multiple rounds of cuts and multiple types of astronomical data. For this project, we'll rely on previous work to handle the tricky bits for us. You'll each be assigned a specific stellar stream and provided with a **pre-selected list of stream members**. Using this list, you'll construct a Gaia query that retrieves properties for all the (known) stars in your stream from the main DR3 table, `gaiadr3.gaia_source`.

You can access the Gaia Archive either through the [online portal](https://gea.esac.esa.int/archive/) or by using `astroquery.gaia`, which we discussed in week 7 of class. When retrieving your data, you should focus on the following properties:

- Positions (right ascension and declination)
- Parallax
- Proper motions
- Radial velocity
- Photometric magnitudes (G, BP, RP) and colors (BP_RP)

You should also retrieve any reported errors for these quantities. Look for columns that have the same name as your property of interest, with `_error` appended. For example, the error on the right ascension is stored in a column called `ra_error`.
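
As a small sketch of this naming convention, the snippet below builds a column list by appending `_error` to each astrometric quantity. The column names are the ones I'd expect in `gaiadr3.gaia_source`, but treat them as assumptions and check them against the table's documentation; note that the magnitude columns don't follow the `_error` pattern, so they're listed separately.

```python
# Astrometric and velocity quantities that have a matching `<name>_error` column
# (names assumed from the gaiadr3.gaia_source schema; verify before querying).
base_columns = ["ra", "dec", "parallax", "pmra", "pmdec", "radial_velocity"]

# Photometry columns; these have no `<name>_error` counterpart in the main table.
photometry_columns = ["phot_g_mean_mag", "phot_bp_mean_mag", "phot_rp_mean_mag", "bp_rp"]

# Pair each base quantity with its error column
error_columns = [f"{name}_error" for name in base_columns]

# Always keep source_id so results can be matched back to the member list
columns = ["source_id"] + base_columns + error_columns + photometry_columns
column_string = ", ".join(columns)
print(column_string)
```

The resulting `column_string` can be dropped straight into the `SELECT` clause of a query.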

---

## Using `gala`

`gala` is a Python package designed for **galactic dynamics**. It allows users to model gravitational potentials and integrate stellar orbits within those potentials. In this project, you'll use `gala` to model the motion of your stream by integrating its orbit in a model of the Milky Way's potential. The [`gala` documentation](https://gala.adrian.pw/en/latest/index.html) has lots of helpful examples for how to do this, so check it out if you get stuck while working on this project!

You can use `pip` to install `gala` in any `conda` environment. (For help, reference the [installation instructions](https://gala.adrian.pw/en/latest/install.html) in the `gala` documentation.) Once you have `gala` installed, you'll want to add the following import statements to the top of your notebook: 

```
import astropy.units as u  # needed for the unit-tagged examples below
import gala.potential as gp
import gala.dynamics as gd
```

To model the orbit of a single star, you first need to define the potential that will host the orbit. `gala` includes classes to represent many common potentials, including one designed to mimic the mass distribution of the Milky Way, which you can access with [`gp.MilkyWayPotential()`](https://gala.adrian.pw/en/latest/api/gala.potential.potential.MilkyWayPotential.html). 

You then need to define the initial positions and velocities for your star, including the units. To do this, you'll use [`gd.PhaseSpacePosition()`](https://gala.adrian.pw/en/latest/api/gala.dynamics.PhaseSpacePosition.html). For example:

```
starting_point = gd.PhaseSpacePosition(pos=[-8.1, 0, 0.02] * u.kpc, vel=[13, 245, 8.] * u.km/u.s)
```

You'll notice that the positions and velocities are not in the typical RA/DEC coordinate system. Instead, they're defined as Cartesian coordinates with respect to the center of the Milky Way. To convert the information provided in Gaia to this coordinate system, you'll first need to define an [Astropy `SkyCoord` object](https://docs.astropy.org/en/stable/api/astropy.coordinates.SkyCoord.html) with all of the Gaia information. For example:

```
import astropy.coordinates as coord
import astropy.units as u

c = coord.SkyCoord(ra=[180.323, 1.523]*u.deg,
                   dec=[-17, 29]*u.deg,
                   distance=[172, 412]*u.pc,
                   pm_ra_cosdec=[-11, 3]*u.mas/u.yr,
                   pm_dec=[4, 8]*u.mas/u.yr,
                   radial_velocity=[114, -21]*u.km/u.s)
```

*Note: In this example, you'll notice that the `SkyCoord` object is being used to store a list of multiple coordinates rather than just one. This is particularly useful for this project, where you're working with a sample of stars.*

Once you've defined the `SkyCoord` with the appropriate information, you can transform the coordinates to the galactocentric frame with `c.transform_to(coord.Galactocentric())`. You can then use the transformed coordinates to define the necessary starting conditions for `gala`.

Finally, the [`integrate_orbit`](https://gala.adrian.pw/en/latest/api/gala.potential.potential.PotentialBase.html#gala.potential.potential.PotentialBase.integrate_orbit) method of your potential can be used to integrate the starting position of your star forward or backward in time. For example, passing the arguments `dt=1*u.Myr, t1=0, t2=2*u.Gyr` (along with the starting position) will tell `gala` to integrate the orbit from a starting time of 0 to an ending time of 2 Gyr, in steps of 1 Myr. The resulting [`Orbit` object](https://gala.adrian.pw/en/latest/api/gala.dynamics.Orbit.html#gala.dynamics.Orbit) will contain information about the position and velocity of the star at each timestep, which you can use to plot the star's orbit and final position.

`gala` is of course capable of much more advanced modeling, including evolving multiple orbits simultaneously. Here are some examples from the documentation that might be helpful as you work on this project:

1. The [**Getting started** tutorial](https://gala.adrian.pw/en/latest/getting_started.html) explains how to integrate the orbit of a single star (also described above)
2. [This page](https://gala.adrian.pw/en/latest/tutorials/Milky-Way-model.html) shows another example of integrating the orbit of a single star (but with specific information from Gaia) 
3. [This page](https://gala.adrian.pw/en/latest/tutorials/integrate-potential-example.html) describes how to model multiple orbits simultaneously 
4. [This page](https://gala.adrian.pw/en/latest/dynamics/mockstreams.html) describes how to generate a mock stellar stream from a user-defined progenitor  


---

## Analysis tasks

### 1. Obtain and load Gaia data

You'll be provided with a list of Gaia IDs for known members of your assigned stream. Construct a Gaia query that retrieves the properties listed above for each star. You might consider using the following query syntax:

```
SELECT <columns>
FROM <table>
WHERE source_id IN (<id1>, <id2>, <id3> ...)
```

If you use [f-strings](https://www.geeksforgeeks.org/formatted-string-literals-f-strings-python/), you can insert the list of IDs into your query automatically rather than typing all of the IDs out by hand.
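
For instance, a query string could be assembled along these lines. The IDs here are short placeholders rather than real Gaia `source_id` values, and the column list is abbreviated; swap in your own member list and the full set of columns you need.

```python
# Placeholder IDs for illustration only; real Gaia source_ids are much longer integers
source_ids = [1001, 1002, 1003]

# Join the IDs into a single comma-separated string
id_list = ", ".join(str(sid) for sid in source_ids)

# Insert the ID string into the query template with an f-string
query = f"""
SELECT source_id, ra, dec, parallax, pmra, pmdec, radial_velocity
FROM gaiadr3.gaia_source
WHERE source_id IN ({id_list})
"""
print(query)
```

The printed string can be pasted into the online portal's query box or submitted through `astroquery.gaia`.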
    
Download the results of your query (if submitted through the online portal), or save the results to a file (if submitted through `astroquery.gaia`). Once you've obtained the data, load it into this notebook in a format that you can easily work with. (You might want to consider using an Astropy `Table`!) 

### 2. Investigate basic properties of your stream

Using the data you retrieved in step 1, generate the following plots for your assigned stream:

1. Scatterplot of stream coordinates (RA on the x-axis and DEC on the y-axis) to visualize the stream's extent
2. Separate histograms of each of the proper motion components and the radial velocity to check for kinematic coherence
3. Color-magnitude diagram (BP - RP on the x-axis, and *absolute* G magnitude on the y-axis) to examine the stellar population

Note that to obtain absolute G magnitudes for the color-magnitude diagram (rather than the *apparent* G magnitudes provided in Gaia; learn more about the difference [here](https://www.phys.ksu.edu/personal/wysin/astro/magnitudes.html)), you'll need to implement the following conversion: $G_{abs} = G - 5 \log_{10}\left(\frac{d}{10\text{ pc}}\right)$. The distance $d$ to each star can be estimated from its parallax as $d \approx \frac{1}{p}$, where $d$ is the distance in parsecs (pc) and $p$ is the parallax in arcseconds (arcsec). Keep in mind that Gaia reports parallaxes in *milli*arcseconds, so you'll need to divide the catalog parallaxes by 1000 before inverting them (equivalently, $d \approx 1000 / p_{\text{mas}}$)!
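
Here is one way the conversion could be written for a single star, using only the standard library. The function name `absolute_g_mag` is made up for this sketch, and returning NaN for non-positive parallaxes is my own choice for handling noisy measurements, not a requirement of the project. For whole table columns, the same arithmetic works with NumPy arrays and `np.log10`.

```python
import math

def absolute_g_mag(apparent_g, parallax_mas):
    """Convert an apparent G magnitude to an absolute one, given a parallax in mas."""
    if parallax_mas <= 0:
        # Noisy measurements can yield zero or negative parallaxes,
        # for which d = 1/p is not meaningful.
        return math.nan
    # d [pc] = 1 / p [arcsec] = 1000 / p [mas]
    distance_pc = 1000.0 / parallax_mas
    return apparent_g - 5 * math.log10(distance_pc / 10.0)

# A parallax of 100 mas means d = 10 pc, so absolute and apparent magnitudes agree
print(absolute_g_mag(12.0, 100.0))  # 12.0

# A parallax of 1 mas means d = 1000 pc: G_abs = 15 - 5 * log10(100) = 5
print(absolute_g_mag(15.0, 1.0))    # 5.0
```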

### 3. Model the orbit of a single stream member with `gala`

Choose a star from your stream at random and follow the steps outlined in the **Using `gala`** section above to produce a model of its orbit in a Milky Way potential. Integrate the orbit from a starting time of 0 up to 2 Gyr in 1 Myr timesteps. Plot the orbit in Cartesian coordinates (x and y) and mark the start and end points (which you can do by feeding just those points to `plt.scatter()`).

### 4. Model the orbits of the whole stream and integrate backwards

Create a new [`PhaseSpacePosition()` object](https://gala.adrian.pw/en/latest/api/gala.dynamics.PhaseSpacePosition.html) that contains the coordinates and velocities for *all* of the known stars in your stream. Integrate their orbits *backwards* in time for 2 Gyr (which you can do by specifying a negative `dt` in `integrate_orbit`) and plot the final positions of the stars in Cartesian coordinates. Do the orbits cluster in a specific region? If not, adjust the amount of time you're integrating for and see if you can make the stars converge.

### 5. Forward model the disruption of different progenitors

You'll now investigate different models for the progenitor system of your stellar stream. First, pick the hypothetical progenitor's initial conditions:

1. If your stars converged in step 4, use their average final positions and velocities as the initial conditions for the progenitor
2. If your stars did not converge, use the average *current-day* positions and velocities of your stream as the initial conditions for the progenitor

Use these initial conditions to construct simulations of two different progenitors (note that `u` refers to the `astropy.units` subpackage): 

1. A star cluster (which you can represent with a [`PlummerPotential`](https://gala.adrian.pw/en/latest/api/gala.potential.potential.PlummerPotential.html); recommended parameters are `m = 1e5 * u.Msun` and `b = 10 * u.pc`) 
2. A dwarf galaxy (which you can represent with an [`NFWPotential`](https://gala.adrian.pw/en/latest/api/gala.potential.potential.NFWPotential.html); recommended parameters are `m = 1e8 * u.Msun` and `r_s = 3*u.kpc`)

For each progenitor, distribute test particles around the specified potential and integrate their orbits *forward* in time to the present day. (If your stars converged in step 4, this will be the same amount of time that it took your stars to reach convergence when integrating backwards. If they didn't converge, you may have to experiment with the timescale to produce a reasonable-looking stream.)

Compare the modeled streams from each progenitor to the observed stream by plotting all three in Cartesian coordinates. Briefly discuss how well the models align and any trends that you see.

---

## Reflection

Write a brief (1-2 paragraphs) interpretation of the results you found above. Link it back to your original research question and key concepts from your literature review. (For this project in particular, you might consider thinking about what your different models tell you about your stream.)

Then, write a brief (1-2 paragraphs) reflection on the limitations of your analysis. Are there any caveats or assumptions in your analysis? Could more data or a different method provide more robust results?

---

## Extending your analysis (optional)

Are there additional aspects of the dataset that you’d like to explore? Do you have ideas for refining the methods used in this notebook? Or maybe you’ve noticed an interesting pattern in your results that raises new questions? If you answered yes to any of these questions, I encourage you to extend your analysis! Feel free to reach out to me via email or visit office hours to discuss your ideas. If you're interested in diving deeper but aren’t sure where to start, I’m also happy to brainstorm with you. This is a great opportunity to practice developing your own research questions and exploring a dataset in a way that interests you.

---