In [2]:
%matplotlib inline
%run ../setup/nb_setup

# Science Case Studies: The Age-Velocity Dispersion Relations in GALAH DR3

Author(s): Keith Hawkins


## Learning goals

The purpose of this tutorial is to:
1. Introduce the concept of Age-Velocity Dispersion relations
2. Introduce cross-matching with TOPCAT
3. Introduce and practice velocity calculations with astropy

*Goal:* 

You will derive the Gaia DR2-GALAH Age-Velocity Dispersion relations. Two additional challenges are to:

(1) derive the same Age-Velocity Dispersion relations for Gaia DR3-GALAH (hint: this will require a cross-match), and

(2) account for uncertainties in age and velocity to build a linear model that converts velocity dispersion into age.


### Notebook Setup and Package Imports

In [3]:
import numpy as np
import matplotlib.pyplot as p
import astropy
from astropy.table import Table
import scipy.stats
from astropy.coordinates import SkyCoord
import astropy.units as u
from astropy.coordinates import Galactic
from astropy.coordinates import ICRS
import astropy.coordinates as apycord
import random

We will need to download the GALAH DR3 dataset that contains age information. The GALAH tables also contain velocity estimates based on Gaia DR2, but *NOT* on DR3 (you will need to do that cross-match yourself):

DATA LOCATION : https://cloud.datacentral.org.au/teamdata/GALAH/public/GALAH_DR3/

Local mirror : https://users.flatironinstitute.org/~apricewhelan/data/surveys/GALAH/GALAH_DR3_VAC_ages_v2.fits

GALAH tables:

- GALAH_DR3_main_allstar_v2.fits: main GALAH DR3 catalog
- GALAH_DR3_VAC_dynamics_v2.fits: dynamics VAC (with Gaia DR2)
- GALAH_DR3_VAC_ages_v2.fits: GALAH DR3 ages
- GALAH_DR3_VAC_GaiaEDR3_v2.fits: Gaia EDR3 x GALAH

Start by using TOPCAT (or Python) to cross-match GALAH_DR3_VAC_dynamics_v2.fits with GALAH_DR3_VAC_ages_v2.fits. (You can also consider cross-matching with the main table for extra information.)


Download these datasets, cross-match them, and load the result with astropy!



In [4]:
GALAH = Table.read("GALAH_DR3_VAC_ages_v2xVAC_dynamics_v2.fits")  # loading in the data


In [None]:
# explore the columns:

print(GALAH.colnames)

The goal here is to determine how velocity (and velocity dispersion) is correlated with age. 

As such we will need to find the (precomputed DR2) velocity columns; and age columns (based on the cell above). 

Not all velocities and ages are measured with the same precision, so we will also want to apply some very simple quality control cuts. Let's do that in the next cell.

In [None]:
# let's create a quality-controlled sample
v_err_lim = 5  # limiting velocity uncertainty (km/s) in each component
age_err_lim = 0.3  # maximum fractional age uncertainty (0.3 = 30%)
dist_err_lim = 0.15  # maximum fractional parallax (distance) uncertainty (0.15 = 15%)
GALAH = GALAH[
    GALAH["parallax_edr3"] > 0
]  # keeps only positive parallaxes so distances are well defined for astropy later

# Quality cuts
ok = np.where(
    (GALAH["vR_Rzphi_50"] - GALAH["vR_Rzphi_5"] < v_err_lim)
    & (GALAH["vT_Rzphi_50"] - GALAH["vT_Rzphi_5"] < v_err_lim)
    & (GALAH["vz_Rzphi_50"] - GALAH["vz_Rzphi_5"] < v_err_lim)
    & (GALAH["e_age_bstep"] / GALAH["age_bstep"] < age_err_lim)
    & (GALAH["parallax_error"] / GALAH["parallax"] < dist_err_lim)
    & (GALAH["age_bstep"] > 1)
    & (GALAH["age_bstep"] < 10)
)[0]

print("There are %i stars that pass the quality controls" % len(ok))
D = GALAH[ok]

With the 'quality' sample in hand, let's now try to figure out how velocity and velocity dispersion depend on age in this sample. We will start by simply plotting the velocity in each direction as a function of age.

In [None]:
# Now we want to plot how the velocities depend on age. Please plot VR, VT, and VZ as a function of age.
p.figure(figsize=(10, 10))
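One possible layout for this plot is a stack of three panels sharing the age axis. The sketch below uses synthetic ages and velocities (all numbers are made up for illustration) and assumes ages in Gyr; with the real sample you would pass `D["age_bstep"]` and the three velocity columns instead.

```python
import matplotlib.pyplot as p
import numpy as np

rng = np.random.default_rng(42)
age = rng.uniform(1, 10, 500)  # synthetic ages (assumed Gyr)

# Synthetic velocities whose scatter grows with age (illustrative numbers only).
vels = {
    "V_R": rng.normal(0.0, 10 + 3 * age),
    "V_T": rng.normal(220.0, 8 + 2 * age),
    "V_z": rng.normal(0.0, 5 + 2 * age),
}

# One panel per velocity component, all sharing the age axis.
fig, axes = p.subplots(3, 1, figsize=(10, 10), sharex=True)
for ax, (label, v) in zip(axes, vels.items()):
    ax.scatter(age, v, s=2, alpha=0.5)
    ax.set_ylabel(f"{label} [km/s]")
axes[-1].set_xlabel("Age [Gyr]")
fig.tight_layout()
```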


From the above plots we can see that:
1. The velocities are mostly independent of age (no significant relationships), except maybe in V_theta
2. The velocity dispersion in all velocity directions likely grows with increasing age

So let's bin by age and see if we can determine the relationship between age and velocity.

In [None]:
# for each star we will place them into bins by age using scipy.stats.binned_statistic

age_bins = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # define the limits of each bin

## With the age bins defined, let's compute the mean velocity and the velocity dispersion (std) in each age bin for each velocity direction
mean_VR, bin_edge, inds = scipy.stats.binned_statistic(
    D["age_bstep"], D["vR_Rzphi_50"], statistic="mean", bins=age_bins
)
std_VR, bin_edge, inds = scipy.stats.binned_statistic(
    D["age_bstep"], D["vR_Rzphi_50"], statistic="std", bins=age_bins
)

mean_Vz, bin_edge, inds = scipy.stats.binned_statistic(
    D["age_bstep"], D["vz_Rzphi_50"], statistic="mean", bins=age_bins
)
std_Vz, bin_edge, inds = scipy.stats.binned_statistic(
    D["age_bstep"], D["vz_Rzphi_50"], statistic="std", bins=age_bins
)

mean_Vt, bin_edge, inds = scipy.stats.binned_statistic(
    D["age_bstep"], D["vT_Rzphi_50"], statistic="mean", bins=age_bins
)
std_Vt, bin_edge, inds = scipy.stats.binned_statistic(
    D["age_bstep"], D["vT_Rzphi_50"], statistic="std", bins=age_bins
)

bincen = [
    (bin_edge[i + 1] + bin_edge[i]) / 2.0 for i in range(len(age_bins) - 1)
]  # this sets the bin centers (midpoint of each pair of adjacent bin edges)

Now that we have binned the data and determined the mean and dispersion in each velocity direction, let's overplot the data and the binned results as a sanity check to ensure everything looks OK. Sanity checks like this are incredibly important.

# The Age-Velocity Dispersion Relations in GALAH-Gaia DR2


Now we are ready to plot the age-velocity dispersion relations for each component for GALAH-Gaia DR2. In the next cell please plot the velocity as a function of age, while also overplotting the velocities binned by age (the mean, with the std as an error bar).

With the data binned we can now explore the velocity dispersion as a function of age! You will do this with the binned dataset (bincen, and std_VR, std_Vt, std_Vz).

*We now see that the age-velocity dispersion relations are roughly linear such that VELOCITY DISPERSION INCREASES with INCREASING age as expected*
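To put a number on "roughly linear", one simple option is a least-squares straight-line fit to the binned dispersions. The sketch below uses `np.polyfit` on made-up values standing in for `bincen` and `std_Vz`; the slope and intercept here are illustrative and are not results from GALAH.

```python
import numpy as np

# Hypothetical stand-ins for bincen and std_Vz (an exact line, for illustration).
bincen_demo = np.array([1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5])
std_Vz_demo = 8.0 + 1.5 * bincen_demo

# Degree-1 polynomial fit; np.polyfit returns the highest-order coefficient first.
slope, intercept = np.polyfit(bincen_demo, std_Vz_demo, deg=1)
print(f"sigma_z ~ {slope:.2f} * age + {intercept:.2f}")
```

With real data the points will scatter around the line, and bins containing few stars have noisier dispersions, so a weighted fit (the `w` argument of `np.polyfit`) may be worth considering.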

# The Age-Velocity Dispersion Relations in GALAH-Gaia EDR3!

Let us now redo the above but with EDR3 data instead! This will require us to compute the velocities (and their uncertainties -- challenge) from the EDR3 data directly.

You will first need to cross-match (via ADQL/TOPCAT/TAP query/astroquery) the original table with EDR3. We will explore how to use TOPCAT in this workshop, but it is also trivial to do this via an ADQL/TAP query.

Once the cross-match is done, let's read it in.

In [1]:
# --- ok let's now compute some velocities with EDR3 data! We start by defining a function.
def compute_vels(ra, dec, pmra, pmdec, rv, dist, V0=[11.1, 245.0, 7.25], R0=8.3):
    icrs = ICRS(
        ra=ra * u.deg,
        dec=dec * u.deg,
        distance=dist * u.kpc,
        pm_ra_cosdec=pmra * u.mas / u.yr,
        pm_dec=pmdec * u.mas / u.yr,
        radial_velocity=rv * u.km / u.s,
    )

    # Define the Galactic non-rotating rest frame: (V0 = solar velocity in the Galactic rest frame; R0 = solar Galactocentric radius)
    v_sun = apycord.CartesianDifferential(V0 * u.km / u.s)
    gc_frame = apycord.Galactocentric(
        galcen_distance=R0 * u.kpc, z_sun=25.0 * u.pc, galcen_v_sun=v_sun
    )
    # convert to GC frame
    cg = icrs.transform_to(gc_frame)
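    # cg.representation_type = "cartesian"
    # switch to cylindrical coords; `representation_type` replaces the older
    # `representation` attribute, which recent astropy versions no longer support
    cg.representation_type = "cylindrical"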
    VR = cg.d_rho.to(u.km / u.s).value
    VT = (
        (cg.d_phi * cg.rho).to(u.km / u.s, equivalencies=u.dimensionless_angles()).value
    )
    VZ = cg.d_z.to(u.km / u.s).value

    return VR, -VT, VZ

With the function defined, we can now convert the observables into velocities by calling it. We can also do this iteratively, where in each iteration we perturb the input observables by their uncertainties. This will enable us to estimate the velocity uncertainties via Monte Carlo realizations.
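The Monte Carlo idea can be sketched generically as follows. The helper `mc_uncertainty` is a hypothetical name, not part of astropy, and it assumes independent Gaussian errors on each observable (correlations between Gaia quantities are ignored). A toy tangential-velocity function stands in for `compute_vels` so that the example runs on its own.

```python
import numpy as np


def mc_uncertainty(func, values, errors, n_draws=200, seed=0):
    """Propagate independent Gaussian errors through `func` by Monte Carlo.

    values, errors : sequences of arrays, one pair per input observable.
    Returns (median, std) over the draws, each with shape (n_outputs, n_stars).
    """
    rng = np.random.default_rng(seed)
    values = [np.asarray(v, dtype=float) for v in values]
    errors = [np.asarray(e, dtype=float) for e in errors]
    draws = []
    for _ in range(n_draws):
        # Perturb every observable by its own uncertainty, then recompute.
        perturbed = [rng.normal(v, e) for v, e in zip(values, errors)]
        draws.append(np.array(func(*perturbed)))
    draws = np.array(draws)  # shape (n_draws, n_outputs, n_stars)
    return np.median(draws, axis=0), np.std(draws, axis=0)


def toy_vtan(pm, dist):
    """Toy stand-in: tangential speed [km/s] from pm [mas/yr] and dist [kpc]."""
    return (4.74047 * pm * dist,)


# Made-up star: 10 +/- 0.1 mas/yr at exactly 1 kpc.
med, std = mc_uncertainty(
    toy_vtan,
    values=[np.array([10.0]), np.array([1.0])],
    errors=[np.array([0.1]), np.array([0.0])],
    n_draws=2000,
)
print(med, std)
```

For the real calculation you would pass `compute_vels` as `func`, with the six observables and their uncertainties as `values` and `errors`.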

In [None]:
# ----let's now deal with the velocity uncertainties via Monte Carlo realizations (this is a challenge)

Now that we have the velocities and their uncertainties with EDR3 data, let's make the same quality control cuts as before.

In [None]:
ok = np.where(
    (GALAH["galvr_err_dr3"] < v_err_lim)
    & (GALAH["galvt_err_dr3"] < v_err_lim)
    & (GALAH["galvz_err_dr3"] < v_err_lim)
    & (GALAH["e_age_bstep"] / GALAH["age_bstep"] < age_err_lim)
    & (GALAH["parallax_error"] / GALAH["parallax"] < dist_err_lim)
    & (GALAH["age_bstep"] > 1)
    & (GALAH["age_bstep"] < 10)
)[0]

print("There are %i stars that pass the quality controls" % len(ok))
D_DR3 = GALAH[ok]

We can now plot the EDR3 version of the age-velocity relations. Please plot your DR3-computed velocities (Vr, Vt, Vz) as a function of age.

We must now bin the data in the same way (by age) as for the DR2 results and compute the mean and std of the velocity in each age bin.

In [None]:
mean_VR_dr3, bin_edge, inds = scipy.stats.binned_statistic(
    D_DR3["age_bstep"], D_DR3["galvr_dr3"], statistic="mean", bins=age_bins
)
std_VR_dr3, bin_edge, inds = scipy.stats.binned_statistic(
    D_DR3["age_bstep"], D_DR3["galvr_dr3"], statistic="std", bins=age_bins
)

mean_Vz_dr3, bin_edge, inds = scipy.stats.binned_statistic(
    D_DR3["age_bstep"], D_DR3["galvz_dr3"], statistic="mean", bins=age_bins
)
std_Vz_dr3, bin_edge, inds = scipy.stats.binned_statistic(
    D_DR3["age_bstep"], D_DR3["galvz_dr3"], statistic="std", bins=age_bins
)

mean_Vt_dr3, bin_edge, inds = scipy.stats.binned_statistic(
    D_DR3["age_bstep"], D_DR3["galvt_dr3"], statistic="mean", bins=age_bins
)
std_Vt_dr3, bin_edge, inds = scipy.stats.binned_statistic(
    D_DR3["age_bstep"], D_DR3["galvt_dr3"], statistic="std", bins=age_bins
)

bincen = [
    (bin_edge[i + 1] + bin_edge[i]) / 2.0 for i in range(len(age_bins) - 1)
]  # this sets the bin centers (midpoint of each pair of adjacent bin edges)

Finally, we plot the age velocity dispersion relations for GALAH-DR2 and GALAH-EDR3 datasets. 

*We now see that the age-velocity dispersion relations are roughly linear such that VELOCITY DISPERSION INCREASES with INCREASING age as expected. If there are some differences between DR2 and EDR3, why might they exist?*

Challenge: If you finish, try to:

1. Fit a linear function to the age-velocity dispersion relation and note the model parameters to compare with the APOGEE team


2. Figure out a way to define the age-velocity relation *without* binning the data. (Hint: This can be done by writing down a linear model for how velocity dispersion depends on age and then comparing that model to the data in the data space.)
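As a starting point for the second challenge, here is one possible sketch of an unbinned fit. It assumes each star's velocity is drawn from a Gaussian whose width is a linear function of age, `sigma(age) = a + b * age`, added in quadrature to that star's measurement error, and it maximizes the likelihood with `scipy.optimize.minimize`. The data are synthetic, the parameter values are made up, and age uncertainties are not modelled here.

```python
import numpy as np
from scipy.optimize import minimize


def neg_log_like(params, age, vel, vel_err):
    """Negative Gaussian log-likelihood for sigma(age) = a + b * age."""
    mean_v, a, b = params
    sigma = a + b * age
    if np.any(sigma <= 0):
        return np.inf  # reject unphysical (non-positive) dispersions
    var = sigma**2 + vel_err**2  # intrinsic scatter plus measurement error
    return 0.5 * np.sum((vel - mean_v) ** 2 / var + np.log(2 * np.pi * var))


# Synthetic sample with known "true" parameters (illustrative values only).
rng = np.random.default_rng(1)
age = rng.uniform(1, 10, 5000)
true_a, true_b = 8.0, 2.0
vel_err = np.full(age.size, 1.0)
vel = rng.normal(0.0, np.sqrt((true_a + true_b * age) ** 2 + vel_err**2))

result = minimize(
    neg_log_like,
    x0=[1.0, 5.0, 1.0],
    args=(age, vel, vel_err),
    method="Nelder-Mead",
    options={"maxiter": 2000},
)
mean_fit, a_fit, b_fit = result.x
print(f"a = {a_fit:.2f}, b = {b_fit:.2f}")
```

Because every star contributes directly to the likelihood, this approach needs no choice of bin edges. Handling age uncertainties as well would require extending the model, for example by marginalizing over each star's true age.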