In [None]:
%matplotlib inline
%run ../setup/nb_setup

# Science Case Studies: The Age-Velocity Dispersion Relations in APOGEE

Author(s): Keith Hawkins


## Learning goals

The purpose of this tutorial is to: 
1. Introduce the concepts of Age-Velocity Dispersion relations
2. Introduce cross-matching with TOPCAT
3. Introduce and practice velocity calculations with astropy

*Goal:* 

You will derive the Gaia DR2-APOGEE Age-Velocity Dispersion relations. 

Two additional challenges include: 

(1) derive the same Age-Velocity Dispersion relations, but for Gaia DR3-APOGEE (hint: this will require a cross-match), and

(2) consider uncertainties in age and velocity to build a linear model which converts velocity dispersion into age.


### Notebook Setup and Package Imports

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import astropy
from astropy.table import Table
import scipy.stats
from astropy.coordinates import SkyCoord
import astropy.units as u
from astropy.coordinates import Galactic
from astropy.coordinates import ICRS
import astropy.coordinates as apycord
import random

We need to download the APOGEE-astroNN catalog, which contains age estimates (along with other quantities derived from the APOGEE spectra). This file also contains velocities computed with Gaia DR2 astrometry, but *NOT* with DR3 (you will need to cross-match this file with Gaia EDR3, which we will discuss):

DATA DESCRIPTION : https://www.sdss.org/dr16/data_access/value-added-catalogs/?vac_id=the-astronn-catalog-of-abundances,-distances,-and-ages-for-apogee-dr16-stars

DATA LOCATION : https://data.sdss.org/sas/dr16/apogee/vac/apogee-astronn/apogee_astroNN-DR16-v1.fits

LOCAL MIRROR : https://users.flatironinstitute.org/~apricewhelan/data/surveys/APOGEE_DR16/apogee_astroNN-DR16-v1.fits

Download this dataset and load it in with astropy!

In [None]:
APOGEE = Table.read("apogee_astroNN-DR16-v1.fits")  # read in the data

In [None]:
# explore the columns:

print(APOGEE.colnames)

The goal here is to determine how velocity (and velocity dispersion) correlates with age. 

To do so, we need to find the (precomputed, Gaia DR2-based) velocity columns and the age columns in the output of the cell above. 

Not all velocities and ages are measured with the same precision, so we will also apply some very simple quality-control cuts. Let's do that in the next cell.

In [None]:
# define the quality-control thresholds
v_err_lim = 5  # maximum velocity uncertainty (km/s) allowed in each component
age_err_lim = 0.3  # maximum fractional age uncertainty (30%)
dist_err_lim = 0.15  # maximum fractional distance uncertainty (15%)

ok = np.where(
    (APOGEE["galvr_err"] < v_err_lim)
    & (APOGEE["galvt_err"] < v_err_lim)
    & (APOGEE["galvz_err"] < v_err_lim)
    & (APOGEE["age_total_error"] / APOGEE["age"] < age_err_lim)
    & (APOGEE["dist_error"] / APOGEE["dist"] < dist_err_lim)
    & (APOGEE["age"] > 0)
)[0]

print("There are %i stars that pass the quality controls" % len(ok))
D = APOGEE[ok]  # lets now subselect the 'GOOD' data

With the 'quality' sample in hand, let's now figure out how velocity and velocity dispersion depend on age in this sample. We will start by simply plotting the velocity in each direction as a function of age. Please plot each velocity component (V_R, V_T, V_z) as a function of age.

In [None]:
# Plot how each velocity component depends on age here.
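
One possible layout for these plots is a row of three panels sharing the age axis. This is a minimal sketch using synthetic stand-in data (NOT APOGEE), generated so that the scatter grows with age; swap in `D["age"]`, `D["galvr"]`, `D["galvt"]` and `D["galvz"]` to make the real plot.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in data (NOT APOGEE): ages in Gyr, velocities in km/s whose
# scatter grows with age. The means and scatters are made up for illustration.
rng = np.random.default_rng(42)
age = rng.uniform(3, 10, 2000)
vels = {
    r"$V_R$": rng.normal(0.0, 20 + 3 * age),
    r"$V_T$": rng.normal(230.0, 15 + 2 * age),
    r"$V_z$": rng.normal(0.0, 10 + 2 * age),
}

fig, axes = plt.subplots(1, 3, figsize=(12, 4), sharex=True)
for ax, (label, v) in zip(axes, vels.items()):
    ax.scatter(age, v, s=2, alpha=0.3)  # one point per star
    ax.set_xlabel("Age [Gyr]")
    ax.set_ylabel(label + " [km/s]")
fig.tight_layout()
```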


From the above plots we can see that:
1. The velocities are mostly independent of age (no significant trends), except perhaps V_theta
2. The velocity dispersion in all velocity directions likely grows with increasing age

So let's now bin by age and see whether we can quantify how the mean velocity and the velocity dispersion depend on age.

In [None]:
# place each star into an age bin using scipy.stats.binned_statistic

age_bins = [3, 4, 5, 6, 7, 8, 9, 10]  # bin edges in age (Gyr)

# For each velocity component, compute the mean velocity and the velocity dispersion (standard deviation) in each age bin
mean_VR, bin_edge, inds = scipy.stats.binned_statistic(
    D["age"], D["galvr"], statistic="mean", bins=age_bins
)
std_VR, bin_edge, inds = scipy.stats.binned_statistic(
    D["age"], D["galvr"], statistic="std", bins=age_bins
)

mean_Vz, bin_edge, inds = scipy.stats.binned_statistic(
    D["age"], D["galvz"], statistic="mean", bins=age_bins
)
std_Vz, bin_edge, inds = scipy.stats.binned_statistic(
    D["age"], D["galvz"], statistic="std", bins=age_bins
)

mean_Vt, bin_edge, inds = scipy.stats.binned_statistic(
    D["age"], D["galvt"], statistic="mean", bins=age_bins
)
std_Vt, bin_edge, inds = scipy.stats.binned_statistic(
    D["age"], D["galvt"], statistic="std", bins=age_bins
)

bincen = 0.5 * (bin_edge[1:] + bin_edge[:-1])  # bin centers
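
The six `binned_statistic` calls above can be condensed: `scipy.stats.binned_statistic` accepts a list of value arrays and computes the statistic for each array independently. A minimal sketch on synthetic data (NOT APOGEE); it also counts the stars per bin, since sparsely populated bins give noisy dispersions.

```python
import numpy as np
import scipy.stats

# Synthetic stand-in data (NOT APOGEE): ages in Gyr, velocities in km/s with age-dependent scatter.
rng = np.random.default_rng(0)
age = rng.uniform(3, 10, 5000)
vr = rng.normal(0.0, 20 + 3 * age)
vt = rng.normal(230.0, 15 + 2 * age)
vz = rng.normal(0.0, 10 + 2 * age)

age_bins = [3, 4, 5, 6, 7, 8, 9, 10]  # bin edges in Gyr

# A list of value arrays gives one row of results per array: shape (3, n_bins).
means, bin_edge, _ = scipy.stats.binned_statistic(age, [vr, vt, vz], statistic="mean", bins=age_bins)
stds, _, _ = scipy.stats.binned_statistic(age, [vr, vt, vz], statistic="std", bins=age_bins)
counts, _, _ = scipy.stats.binned_statistic(age, age, statistic="count", bins=age_bins)

mean_VR, mean_Vt, mean_Vz = means
std_VR, std_Vt, std_Vz = stds
bincen = 0.5 * (bin_edge[1:] + bin_edge[:-1])  # bin centers
```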

Now that we have binned the data and determined the mean and dispersion in each velocity direction, let's overplot the data and the binned results as a sanity check (such checks are incredibly important) to make sure everything looks OK.

# The Age-Velocity Dispersion Relations in APOGEE-Gaia DR2


Now we are ready to plot the age-velocity dispersion relations for each component for APOGEE-Gaia DR2. Please do this yourself! 

In [None]:
# Plot each velocity component (V_R, V_T, V_z) as a function of age; make sure to also overplot
#   (as error bars) the mean velocity and its dispersion in each age bin
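
A small helper (hypothetical, not part of Matplotlib) that overplots binned means with their dispersions as error bars on an existing axis, shown here on synthetic data (NOT APOGEE) for a single component.

```python
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats


def overlay_binned(ax, bincen, mean, std, **kwargs):
    """Overplot binned means with +/- 1 std error bars on an existing axis (hypothetical helper)."""
    kwargs.setdefault("fmt", "o")
    kwargs.setdefault("color", "k")
    kwargs.setdefault("capsize", 3)
    return ax.errorbar(bincen, mean, yerr=std, **kwargs)


# Synthetic stand-in data (NOT APOGEE) for one component, V_z in km/s.
rng = np.random.default_rng(1)
age = rng.uniform(3, 10, 2000)
vz = rng.normal(0.0, 10 + 2 * age)

age_bins = np.arange(3, 11)  # bin edges in Gyr, as in the notebook
mean_vz, bin_edge, _ = scipy.stats.binned_statistic(age, vz, statistic="mean", bins=age_bins)
std_vz, _, _ = scipy.stats.binned_statistic(age, vz, statistic="std", bins=age_bins)
bincen = 0.5 * (bin_edge[1:] + bin_edge[:-1])

fig, ax = plt.subplots()
ax.scatter(age, vz, s=2, alpha=0.2)  # individual stars
overlay_binned(ax, bincen, mean_vz, std_vz, label="binned mean ± 1 std")
ax.set_xlabel("Age [Gyr]")
ax.set_ylabel(r"$V_z$ [km/s]")
ax.legend()
```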



With the data binned, we can now explore how the velocity dispersion varies with age. Plot the velocity dispersion in each age bin as a function of age.
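
A minimal sketch of the dispersion-versus-age plot, using made-up dispersion values (NOT APOGEE results) in place of `std_VR`, `std_Vt` and `std_Vz` from the binning cell.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up dispersions (km/s) at the bin centers (Gyr); replace with
# bincen, std_VR, std_Vt and std_Vz from the binning cell.
bincen = np.arange(3.5, 10.0, 1.0)
dispersions = {
    r"$\sigma_R$": 20 + 3 * bincen,
    r"$\sigma_T$": 15 + 2 * bincen,
    r"$\sigma_z$": 10 + 2 * bincen,
}

fig, ax = plt.subplots(figsize=(6, 4))
for label, sigma in dispersions.items():
    ax.plot(bincen, sigma, marker="o", label=label)  # one line per component
ax.set_xlabel("Age [Gyr]")
ax.set_ylabel("Velocity dispersion [km/s]")
ax.legend()
```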

*We now see that the age-velocity dispersion relations are roughly linear such that VELOCITY DISPERSION INCREASES with INCREASING age as expected.*

# The Age-Velocity Dispersion Relations in APOGEE-Gaia EDR3!

Let us now redo the above, but with EDR3 data instead! This will require us to compute the velocities (and, as a challenge, their uncertainties) directly from the EDR3 data.

You will first need to cross-match (via ADQL/TOPCAT/TAP query/astroquery) the original table with EDR3. We will explore how to use TOPCAT in this workshop, but it is also straightforward to do this via an ADQL/TAP query.

Once the cross-match is done, let's read in the result.

In [None]:
# in this case the cross-match was done with TOPCAT, so let's load its output

APOGEEDR3 = Table.read(
    "apogee_astroNN-DR16-v1_DR3.fits"
)  # replace with your cross-matched table
# keep stars with positive parallaxes measured to better than 30%
ok = np.where(
    (APOGEEDR3["parallax_error"] / APOGEEDR3["parallax"] < 0.3)
    & (APOGEEDR3["parallax"] > 0)
)[0]
APOGEEDR3 = APOGEEDR3[ok]
# for simplicity, assume distance = 1/parallax (parallax in mas -> distance in kpc)
APOGEEDR3["dist_dr3"] = 1.0 / APOGEEDR3["parallax"]

# explore the columns as well; take note of the parallax, proper motions, etc.
print(APOGEEDR3.colnames)

With the dataset now loaded, let's start by writing a function that computes velocities with astropy.

In [None]:
# --- Compute Galactocentric velocities from the EDR3 data. We start by defining a function to do this.
def compute_vels(ra, dec, pmra, pmdec, rv, dist, V0=[11.1, 245.0, 7.25], R0=8.3):
    """Return the Galactocentric cylindrical velocities (V_R, V_T, V_z) in km/s.

    Inputs: ra, dec [deg]; pmra (includes the cos(dec) factor), pmdec [mas/yr];
    rv [km/s]; dist [kpc]. V0 is the solar velocity in the Galactic rest frame
    [km/s] and R0 the solar Galactocentric radius [kpc].
    """
    # define an ICRS coordinate (including velocities) for each star
    icrs = ICRS(
        ra=ra * u.deg,
        dec=dec * u.deg,
        distance=dist * u.kpc,
        pm_ra_cosdec=pmra * u.mas / u.yr,
        pm_dec=pmdec * u.mas / u.yr,
        radial_velocity=rv * u.km / u.s,
    )

    # Define the Galactocentric (non-rotating) rest frame
    v_sun = apycord.CartesianDifferential(V0 * u.km / u.s)
    gc_frame = apycord.Galactocentric(
        galcen_distance=R0 * u.kpc, z_sun=25.0 * u.pc, galcen_v_sun=v_sun
    )
    # transform to the Galactocentric frame
    cg = icrs.transform_to(gc_frame)
    # switch to cylindrical coordinates (rho, phi, z); this also sets the matching
    # differential, so d_rho, d_phi and d_z become available
    cg.representation_type = "cylindrical"
    VR = cg.d_rho.to(u.km / u.s).value
    VT = (
        (cg.d_phi * cg.rho).to(u.km / u.s, equivalencies=u.dimensionless_angles()).value
    )
    VZ = cg.d_z.to(u.km / u.s).value

    # phi increases opposite to the direction of Galactic rotation, so flip the sign of VT
    return VR, -VT, VZ

With this function, we can now convert the observables into velocities. We can also call it iteratively, perturbing the input observables by their uncertainties in each iteration; this lets us estimate the velocity uncertainties from Monte Carlo realizations. NOTE: this assumes that the uncertainties in the observables are INDEPENDENT! Strictly, one should sample from the full covariance matrix (Gaia provides the correlation coefficients between its astrometric parameters).
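
To go beyond the independent-errors assumption, one option is to build each star's covariance matrix from the Gaia uncertainties and correlation coefficients (archive columns such as `parallax_pmra_corr`, `parallax_pmdec_corr` and `pmra_pmdec_corr`) and draw correlated samples. This is a sketch for a single star with made-up values; `sample_astrometry` is a hypothetical helper, and the APOGEE radial velocity can still be sampled independently.

```python
import numpy as np


def sample_astrometry(plx, pmra, pmdec, plx_err, pmra_err, pmdec_err,
                      plx_pmra_corr, plx_pmdec_corr, pmra_pmdec_corr,
                      n_draws=1000, rng=None):
    """Draw correlated (parallax, pmra, pmdec) samples for ONE star.

    Returns an array of shape (n_draws, 3) in the input units (mas, mas/yr).
    """
    rng = np.random.default_rng() if rng is None else rng
    mean = np.array([plx, pmra, pmdec])
    err = np.array([plx_err, pmra_err, pmdec_err])
    corr = np.array([
        [1.0, plx_pmra_corr, plx_pmdec_corr],
        [plx_pmra_corr, 1.0, pmra_pmdec_corr],
        [plx_pmdec_corr, pmra_pmdec_corr, 1.0],
    ])
    cov = corr * np.outer(err, err)  # Cov_ij = rho_ij * sigma_i * sigma_j
    return rng.multivariate_normal(mean, cov, size=n_draws)


# Made-up values for one star: parallax [mas], proper motions [mas/yr], errors, correlations.
draws = sample_astrometry(1.2, -3.0, 4.5, 0.02, 0.03, 0.025, 0.2, -0.1, 0.3,
                          n_draws=20000, rng=np.random.default_rng(7))
print(np.corrcoef(draws.T).round(2))  # should roughly recover the input correlations
```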

In [None]:
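# ---- now deal with the velocity uncertainties via Monte Carlo realizations.
# Store the velocities as galvr_dr3, galvt_dr3, galvz_dr3 and their uncertainties as
# galvr_err_dr3, galvt_err_dr3, galvz_err_dr3; the cells below expect these columns.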
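
One way to structure the Monte Carlo step is a generic helper (hypothetical, `mc_velocity_errors`) that perturbs every observable by an independent Gaussian draw, pushes each realization through a velocity function such as `compute_vels`, and takes the per-star standard deviation across realizations. It is checked here on a toy linear function whose propagated errors are known analytically.

```python
import numpy as np


def mc_velocity_errors(velocity_fn, values, errors, n_draws=100, rng=None):
    """Monte Carlo velocity uncertainties, assuming INDEPENDENT Gaussian errors.

    velocity_fn    : callable taking the observables as keyword arrays and
                     returning a tuple of velocity arrays (e.g. compute_vels).
    values, errors : dicts mapping the same keyword names to arrays of
                     measured values and their 1-sigma uncertainties.
    Returns one array per velocity component: the per-star std over the draws.
    """
    rng = np.random.default_rng() if rng is None else rng
    realizations = []
    for _ in range(n_draws):
        # perturb every observable by a Gaussian draw with its own uncertainty
        perturbed = {k: rng.normal(values[k], errors[k]) for k in values}
        realizations.append(np.vstack(velocity_fn(**perturbed)))
    realizations = np.stack(realizations)  # shape (n_draws, n_components, n_stars)
    return tuple(realizations.std(axis=0))


def toy_velocity(x, y):
    """Toy stand-in: std(x + y) = sqrt(sx^2 + sy^2) and std(2x) = 2 sx."""
    return x + y, 2 * x


vals = {"x": np.zeros(5), "y": np.zeros(5)}
errs = {"x": np.full(5, 1.0), "y": np.full(5, 2.0)}
err_a, err_b = mc_velocity_errors(toy_velocity, vals, errs, n_draws=20000,
                                  rng=np.random.default_rng(11))
print(err_a.round(2), err_b.round(2))  # close to sqrt(5) ~ 2.24 and 2.0
```

In the notebook you could pass dicts keyed by the `compute_vels` argument names (`ra`, `dec`, `pmra`, `pmdec`, `rv`, `dist`); the radial-velocity column name depends on your cross-matched table. Perturbing `dist` directly treats its error as Gaussian, which is only a rough approximation for 1/parallax distances; perturbing the parallax and converting inside the velocity function is an alternative.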

Now that we have the velocities and their uncertainties from the EDR3 data, let's apply the same quality-control cuts as before.

In [None]:
ok = np.where(
    (APOGEEDR3["galvr_err_dr3"] < v_err_lim)
    & (APOGEEDR3["galvt_err_dr3"] < v_err_lim)
    & (APOGEEDR3["galvz_err_dr3"] < v_err_lim)
    & (APOGEEDR3["age_total_error"] / APOGEEDR3["age"] < age_err_lim)
    & (APOGEEDR3["dist_error"] / APOGEEDR3["dist"] < dist_err_lim)
    & (APOGEEDR3["age"] > 0)
)[0]

print("There are %i stars that pass the quality controls" % len(ok))
D_DR3 = APOGEEDR3[ok]

We can now plot the EDR3 version of the age-velocity relations. Just as above, please plot each velocity component as a function of age, but this time using the EDR3 velocities you computed.

We must now bin the data by age in the same way as for the DR2 results and compute the mean and standard deviation of the velocity in each age bin.

In [None]:
mean_VR_dr3, bin_edge, inds = scipy.stats.binned_statistic(
    D_DR3["age"], D_DR3["galvr_dr3"], statistic="mean", bins=age_bins
)
std_VR_dr3, bin_edge, inds = scipy.stats.binned_statistic(
    D_DR3["age"], D_DR3["galvr_dr3"], statistic="std", bins=age_bins
)

mean_Vz_dr3, bin_edge, inds = scipy.stats.binned_statistic(
    D_DR3["age"], D_DR3["galvz_dr3"], statistic="mean", bins=age_bins
)
std_Vz_dr3, bin_edge, inds = scipy.stats.binned_statistic(
    D_DR3["age"], D_DR3["galvz_dr3"], statistic="std", bins=age_bins
)

mean_Vt_dr3, bin_edge, inds = scipy.stats.binned_statistic(
    D_DR3["age"], D_DR3["galvt_dr3"], statistic="mean", bins=age_bins
)
std_Vt_dr3, bin_edge, inds = scipy.stats.binned_statistic(
    D_DR3["age"], D_DR3["galvt_dr3"], statistic="std", bins=age_bins
)

bincen = 0.5 * (bin_edge[1:] + bin_edge[:-1])  # bin centers

Finally, we plot the age-velocity dispersion relations for the APOGEE-Gaia DR2 and APOGEE-Gaia EDR3 datasets. 

*We now see that the age-velocity dispersion relations are roughly linear, such that VELOCITY DISPERSION INCREASES with INCREASING age, as expected. If there are differences between DR2 and EDR3, why might they exist?*
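
Since the relations look roughly linear, a weighted straight-line fit to the binned dispersions is a natural first quantitative step. This sketch uses synthetic data with a made-up slope and intercept, and approximates each bin's dispersion uncertainty by sigma/sqrt(2(N-1)), the standard large-N result for Gaussian data; `np.polyfit` expects weights of 1/sigma.

```python
import numpy as np
import scipy.stats

# Synthetic stand-in data (NOT APOGEE): sigma_z = 10 km/s + 2 km/s/Gyr * age (made-up values).
rng = np.random.default_rng(5)
age = rng.uniform(3, 10, 4000)
vz = rng.normal(0.0, 10.0 + 2.0 * age)

age_bins = np.arange(3, 11)
std_vz, bin_edge, _ = scipy.stats.binned_statistic(age, vz, statistic="std", bins=age_bins)
counts, _, _ = scipy.stats.binned_statistic(age, vz, statistic="count", bins=age_bins)
bincen = 0.5 * (bin_edge[1:] + bin_edge[:-1])

# approximate 1-sigma uncertainty on each bin's sample standard deviation
std_err = std_vz / np.sqrt(2 * (counts - 1))

# weighted linear fit; polyfit returns coefficients highest degree first
(slope, intercept), cov = np.polyfit(bincen, std_vz, deg=1, w=1 / std_err, cov="unscaled")
slope_err, intercept_err = np.sqrt(np.diag(cov))
print(f"sigma_z = ({slope:.2f} +/- {slope_err:.2f}) * age + ({intercept:.1f} +/- {intercept_err:.1f}) km/s")
```

Inverting the fitted line, age = (sigma - intercept) / slope, is one simple way to turn a measured dispersion into an age estimate.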

Challenge: If you finish, try to:

1. Fit a linear function to the age-velocity dispersion relation and note the model parameters to compare with those of the GALAH team.


2. Figure out a way to define the age-velocity relation *without* binning the data. (Hint: this can be done by writing down a linear model for how velocity dispersion depends on age and then comparing that model with the data in the data space.)
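
For the second challenge, one possible (by no means unique) approach is a Gaussian likelihood evaluated star by star: each velocity is drawn around a constant mean mu with variance (a + b * age)^2 + err^2, i.e. a linearly growing intrinsic dispersion plus the measurement error in quadrature. This sketch fits synthetic data by minimizing the negative log-likelihood with `scipy.optimize.minimize`; the constant mean is an assumption that suits V_R and V_z better than V_T.

```python
import numpy as np
from scipy.optimize import minimize


def neg_log_like(params, age, v, v_err):
    """Negative log-likelihood for v ~ N(mu, (a + b * age)^2 + v_err^2)."""
    mu, a, b = params
    sigma_int = a + b * age  # intrinsic dispersion, linear in age
    if np.any(sigma_int <= 0):
        return np.inf  # reject non-physical (negative) dispersions
    var = sigma_int**2 + v_err**2  # intrinsic scatter plus measurement noise
    return 0.5 * np.sum((v - mu) ** 2 / var + np.log(2 * np.pi * var))


# Synthetic stand-in data (NOT APOGEE): true mu = 0, a = 10 km/s, b = 2 km/s/Gyr (made up).
rng = np.random.default_rng(9)
n = 3000
age = rng.uniform(3, 10, n)
v_err = rng.uniform(1, 5, n)  # per-star velocity uncertainties [km/s]
v = rng.normal(0.0, np.sqrt((10 + 2 * age) ** 2 + v_err**2))

result = minimize(neg_log_like, x0=[0.0, 5.0, 1.0], args=(age, v, v_err),
                  method="Nelder-Mead", options={"maxiter": 5000, "maxfev": 5000})
mu_fit, a_fit, b_fit = result.x
print(f"mu = {mu_fit:.2f} km/s, sigma(age) = {a_fit:.2f} + {b_fit:.2f} * age km/s")
```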