# Example: Create input files for KPM

In this notebook, we convert stellar abundances from APOGEE DR17 into the format that KPM needs. We remove untrustworthy measurements and construct a combined C+N abundance. KPM requires two inputs: an ```alldata``` file and an ```allivars``` file. Both must be numpy arrays of shape (number of stars, number of elements). I (Emily) like named arrays, but to keep this example short the cells below fill numpy arrays directly, with columns in the order of the ```elements``` array.
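The named-array route can be sketched as follows, assuming made-up values for three stars and three elements. A pandas DataFrame keeps the element names attached while you build the table, and `DataFrame.to_numpy()` then gives the plain `(stars, elements)` array that KPM expects. Selecting the columns with `df[elements]` first pins down the column order, which is no longer recorded once the labels are dropped.

```python
import pandas as pd

# Made-up [X/H] values for three stars, for illustration only
elements = ['Mg', 'O', 'Fe']
df_data = pd.DataFrame(
    [[0.10, 0.05, 0.00],
     [-0.20, -0.15, -0.30],
     [0.25, 0.30, 0.20]],
    columns=elements,
)
# Matching inverse variances; 0 marks a missing measurement
df_ivars = pd.DataFrame(
    [[400.0, 100.0, 2500.0],
     [400.0, 0.0, 2500.0],
     [400.0, 100.0, 2500.0]],
    columns=elements,
)

# Select columns in a fixed order, then drop the labels
alldata = df_data[elements].to_numpy()
allivars = df_ivars[elements].to_numpy()
print(alldata.shape, allivars.shape)  # (3, 3) (3, 3)
```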

In [1]:
from matplotlib import pyplot as plt 
from matplotlib.colors import LogNorm
import numpy as np
from astropy.io import fits
import pandas as pd

Read in the APOGEE file that you want to work with. Note that this file is not contained within this example directory, so you will need to download it and change the path.

In [2]:
name = '/Users/emilygriffith/NSF/SDSS_Data/allStarLite-dr17-synspec_rev1.fits'  # change to your local copy
hdu = fits.open(name)
hdr = hdu[1].header
data = hdu[1].data  # the star table is in extension 1
hdu.close()

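We do not want to use bad measurements from APOGEE, so we remove stars that have the STAR_BAD (bit 23) or NO_ASPCAP_RESULT (bit 31) bit set in ```ASPCAPFLAG```. I also keep only stars with ```EXTRATARG``` equal to 0 (main-survey targets, excluding duplicates) and restrict the sample to effective temperatures between 3200 and 6000 K, log g between -3 and 3.5, and [Mg/H] = [Mg/Fe] + [Fe/H] ≥ -0.75. These cuts recreate the ```alldata_test.npy``` sample in the input folder.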
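The flag cuts test individual bits of ASPCAPFLAG with a bitwise AND. A small sketch on hypothetical values (not APOGEE data) shows which stars survive; the flag array is `int64` here because `2**31` does not fit in a signed 32-bit integer. It also shows that a NaN temperature fails both comparisons, so such stars are dropped by the temperature cut.

```python
import numpy as np

STAR_BAD = 2**23           # ASPCAPFLAG bit 23
NO_ASPCAP_RESULT = 2**31   # ASPCAPFLAG bit 31

# Hypothetical flags and temperatures for four stars
aspcapflag = np.array([0, 2**23, 2**31, 2**4], dtype=np.int64)
teff = np.array([4800.0, 4500.0, 5000.0, np.nan])

good = (((aspcapflag & STAR_BAD) == 0)
        & ((aspcapflag & NO_ASPCAP_RESULT) == 0)
        & (teff >= 3200) & (teff <= 6000))  # NaN compares as False
print(good)  # [ True False False False]
```

The fourth star has an unrelated bit (4) set, so it passes the flag tests and is removed only by its NaN temperature.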

Alternatively, you could load your own FITS file of APOGEE data, as long as it contains the abundance columns used below.
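If you use your own file, it is worth checking up front that the columns exist. This sketch uses hypothetical helpers `required_columns` and `missing_columns` (not part of KPM or astropy) that list the columns this notebook reads; with astropy, the available names of the table HDU can be taken from `hdu[1].columns.names`.

```python
def required_columns(elements):
    """Hypothetical helper: columns read by the cuts and the abundance loop."""
    # Columns used by the quality cuts and by the Fe branch of the loop
    cols = ['ASPCAPFLAG', 'EXTRATARG', 'TEFF', 'LOGG', 'MG_FE',
            'FE_H', 'FE_H_ERR', 'FE_H_FLAG']
    for e in elements:
        if e == 'Fe':
            continue
        elif e == 'CN':
            # C+N uses both abundances and flags, but only the [C/Fe] error
            cols += ['C_FE', 'N_FE', 'C_FE_ERR', 'C_FE_FLAG', 'N_FE_FLAG']
        else:
            X = e.upper()
            cols += [X + '_FE', X + '_FE_ERR', X + '_FE_FLAG']
    # Remove duplicates while keeping the first-seen order
    return list(dict.fromkeys(cols))

def missing_columns(available, elements):
    """Hypothetical helper: required columns absent from `available`."""
    have = {c.upper() for c in available}
    return [c for c in required_columns(elements) if c not in have]

# Hypothetical column list from a user's file
available = ['ASPCAPFLAG', 'EXTRATARG', 'TEFF', 'LOGG',
             'FE_H', 'FE_H_ERR', 'FE_H_FLAG',
             'MG_FE', 'MG_FE_ERR', 'MG_FE_FLAG',
             'C_FE', 'C_FE_ERR', 'C_FE_FLAG']
missing = missing_columns(available, ['Mg', 'CN', 'Fe'])
print(missing)  # ['N_FE', 'N_FE_FLAG']
```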

In [3]:
mask = np.where(((data['ASPCAPFLAG'] & (2**23)) == 0) &    # STAR_BAD not set
                ((data['ASPCAPFLAG'] & (2**31)) == 0) &    # NO_ASPCAP_RESULT not set
                (data['EXTRATARG'] == 0) &                 # main survey, no duplicates
                (data['TEFF'] <= 6000) &
                (data['TEFF'] >= 3200) &
                (data['LOGG'] <= 3.5) &
                (data['LOGG'] >= -3) &
                ((data['MG_FE'] + data['FE_H']) >= -0.75)  # [Mg/H] >= -0.75
                )[0]

data_sub = data[mask]
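One numpy detail matters for the next steps: `data[mask]` with an index array returns a copy of the selected rows, but selecting a field such as `data_sub['FE_H']` returns a view, so assigning into that view changes `data_sub` itself. A sketch on a small hypothetical structured array, standing in for the FITS table (astropy's table data is a numpy record-array subclass):

```python
import numpy as np

# Small structured array standing in for the FITS table (hypothetical values)
data = np.array([(0.1, -0.2), (0.3, np.nan), (-0.5, 0.0)],
                dtype=[('FE_H', 'f8'), ('MG_FE', 'f8')])

keep = np.where(data['FE_H'] < 0.2)[0]   # integer indices, as in the cut above
data_sub = data[keep]                     # index array -> copy of the rows

col = data_sub['FE_H']                    # field access -> view into data_sub
col[0] = 0.0                              # also changes data_sub['FE_H'][0]
print(data_sub['FE_H'][0], data['FE_H'][0])  # 0.0 0.1

safe = data_sub['FE_H'].copy()            # independent array, safe to edit
safe[1] = 99.0
print(data_sub['FE_H'][1])                # -0.5
```

Zeroing bad entries through a field view would therefore also overwrite the column for any later step that reads it again.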

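Next we use the APOGEE data to construct the ```alldata``` and ```allivars``` arrays, whose columns follow the order of the ```elements``` array below. We fill ```alldata``` with [X/H] abundances, computed as [X/Fe] + [Fe/H], and ```allivars``` with the inverse variance of [X/Fe] rather than of [X/H]. Ignoring the [Fe/H] uncertainty in this way is a good approximation because that uncertainty is small. For Fe itself we use [Fe/H] and its own error, and for C+N we use the [C/Fe] error. Any measurement that is missing (NaN) or flagged is stored with a value of 0 and an inverse variance of 0.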
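Since [X/H] = [X/Fe] + [Fe/H], independent errors would add in quadrature: σ²([X/H]) = σ²([X/Fe]) + σ²([Fe/H]). A short sketch with made-up uncertainties shows how much using 1/σ²([X/Fe]) alone overstates the inverse variance. The ratio is 1 + (σ[Fe/H] / σ[X/Fe])², so the effect is largest for the best-measured elements. Independence of the two errors is an assumption of this sketch, not something the notebook relies on.

```python
import numpy as np

# Made-up uncertainties in dex, for illustration only
xfe_err = np.array([0.02, 0.05, 0.10])   # errors on [X/Fe]
feh_err = np.array([0.01, 0.01, 0.01])   # errors on [Fe/H]

# What the notebook stores: the [X/Fe] inverse variance
ivar_approx = 1.0 / xfe_err**2
# Assuming independent errors, [X/H] has variance xfe_err**2 + feh_err**2
ivar_quad = 1.0 / (xfe_err**2 + feh_err**2)

ratio = ivar_approx / ivar_quad   # equals 1 + (feh_err / xfe_err)**2
print(ratio)                      # approximately [1.25 1.04 1.01]
```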

In [4]:
elements = np.array(['Mg','O','Si','S','Ca','CN','Na','Al','K','Cr','Fe','Ni','Mn','Co','Ce'])
N = len(data_sub)
M = len(elements)
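Because KPM only sees column indices, it helps to keep the mapping from element name to column index (and to the APOGEE column it came from) at hand. A small sketch; `abundance_column` is a hypothetical helper that mirrors the column naming used in the loop below.

```python
elements = ['Mg', 'O', 'Si', 'S', 'Ca', 'CN', 'Na', 'Al', 'K', 'Cr',
            'Fe', 'Ni', 'Mn', 'Co', 'Ce']

# Column index of each element in alldata / allivars
col_index = {e: i for i, e in enumerate(elements)}

def abundance_column(e):
    """Hypothetical helper: the APOGEE column read for element e."""
    if e == 'Fe':
        return 'FE_H'
    if e == 'CN':
        return None  # built from C_FE and N_FE instead of read directly
    return e.upper() + '_FE'

print(col_index['Fe'], abundance_column('Fe'))  # 10 FE_H
print(col_index['Ce'], abundance_column('Ce'))  # 14 CE_FE
```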

In [5]:
alldata = np.zeros([N, M])
allivars = np.zeros([N, M])

for i, e in enumerate(elements):
    if e == 'Fe':
        XH = data_sub['FE_H']
        XH_err = data_sub['FE_H_ERR']
        flagged = data_sub['FE_H_FLAG'] != 0
    elif e == 'CN':
        CH = data_sub['C_FE'] + data_sub['FE_H']
        NH = data_sub['N_FE'] + data_sub['FE_H']
        # Using the equation and formalism that W22 use; 8.39 and 7.78 are
        # the solar C and N abundances (log epsilon) in that formula
        XH = np.log10(10**(CH+8.39) + 10**(NH+7.78)) - np.log10(10**8.39 + 10**7.78)
        XH_err = data_sub['C_FE_ERR']  # the [C/Fe] error is used for C+N
        flagged = (data_sub['C_FE_FLAG'] != 0) | (data_sub['N_FE_FLAG'] != 0)
    else:
        # [X/H] = [X/Fe] + [Fe/H]
        XH = data_sub[e.upper()+'_FE'] + data_sub['FE_H']
        XH_err = data_sub[e.upper()+'_FE_ERR']
        flagged = data_sub[e.upper()+'_FE_FLAG'] != 0

    # A zero error would give an infinite inverse variance; it is masked below
    with np.errstate(divide='ignore'):
        XH_ivar = 1/(XH_err**2)

    # Missing, non-finite, or flagged values get value 0 and inverse variance 0.
    # np.where builds new arrays, so data_sub (including FE_H, which is reused
    # for every element) is never modified in place.
    bad = ~np.isfinite(XH) | ~np.isfinite(XH_ivar) | flagged
    alldata[:, i] = np.where(bad, 0.0, XH)
    allivars[:, i] = np.where(bad, 0.0, XH_ivar)
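As a sanity check of the C+N formula, here is a sketch that reimplements it as a hypothetical function `cn_h`. When [C/H] and [N/H] are equal, the combined abundance equals that common value; raising N alone moves it much less, because at the adopted solar values (8.39 and 7.78) carbon supplies about 80% of the C+N atoms.

```python
import numpy as np

LOG_EPS_C_SUN = 8.39  # solar C abundance used in the notebook's formula
LOG_EPS_N_SUN = 7.78  # solar N abundance used in the notebook's formula

def cn_h(c_h, n_h):
    """[(C+N)/H] from [C/H] and [N/H], same formula as the CN branch above."""
    num = 10**(c_h + LOG_EPS_C_SUN) + 10**(n_h + LOG_EPS_N_SUN)
    den = 10**LOG_EPS_C_SUN + 10**LOG_EPS_N_SUN
    return np.log10(num) - np.log10(den)

same = cn_h(-0.3, -0.3)      # C and N scaled by the same amount
n_only = cn_h(0.0, 0.3)      # only N raised by 0.3 dex
c_share = 10**LOG_EPS_C_SUN / (10**LOG_EPS_C_SUN + 10**LOG_EPS_N_SUN)
print(same, n_only, c_share)  # about -0.3, 0.078, 0.80
```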

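Save the arrays. `np.save` appends the `.npy` extension, so this writes ```input/alldata_example.npy``` and ```input/allivars_example.npy```; the ```input/``` directory must already exist.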

In [6]:
np.save('input/alldata_example', alldata)
np.save('input/allivars_example', allivars)
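`np.save` appends `.npy` to the file name and does not create missing directories. The sketch below writes to a temporary directory instead of `input/` and reads one array back with `np.load` to check the round trip; the arrays are placeholders with 15 columns to match `elements`.

```python
import os
import tempfile
import numpy as np

# Placeholder arrays with 15 columns, one per element
alldata = np.zeros((4, 15))
allivars = np.ones((4, 15))

with tempfile.TemporaryDirectory() as tmp:
    outdir = os.path.join(tmp, 'input')
    os.makedirs(outdir, exist_ok=True)  # np.save will not create this for us
    np.save(os.path.join(outdir, 'alldata_example'), alldata)
    np.save(os.path.join(outdir, 'allivars_example'), allivars)
    files = sorted(os.listdir(outdir))  # .npy was appended to both names
    loaded = np.load(os.path.join(outdir, 'alldata_example.npy'))

print(files)         # ['alldata_example.npy', 'allivars_example.npy']
print(loaded.shape)  # (4, 15)
```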