The goal of this code is to:
- Clean the data for the following datasets:
    - K2
    - APOKSAC
    - APOGEE
    - GALAH

After the data has been cleaned, we will:
- Use the K2 model to predict the mass for APOGEE and GALAH
- Use the APOKSAC model to predict the mass for APOGEE

For every data set, we want a table containing the parameters below, which we then use to predict the mass of stars.
The table includes parameters such as:

- Effective Temperature
- Log g
- Iron Abundance [Fe/H]
- Alpha Abundance [alpha/Fe]
- Carbon Abundance [C/Fe]
- Nitrogen Abundance [N/Fe]
- Oxygen Abundance [O/Fe]

In addition, we will extract the star IDs to allow for identification:

- `tmass_id`: 2MASS ID
- `gaiadr3_source_id`: Gaia DR3 source ID

Importing the required packages

In [9]:
from astropy.table import Table
import pandas as pd         # For table/data management

# Cleaning the GALAH Data

In [4]:
# Import the data and change into a data frame
galah = Table.read("Data/galah_dr4_allspec_220713.fits", format = "fits")
names = [name for name in galah.colnames if len(galah[name].shape) <= 1]
galah = galah[names].to_pandas()

For the GALAH data, a few flag cuts are recommended (a non-zero flag usually indicates that the data has some issues, e.g. unreliable broadening):
- `snr_c3_iraf > 30`
- `flag_sp == 0`
- For any element `X`, the flag `flag_X_fe == 0`

More information can be found on [the GALAH Survey Website](https://www.galah-survey.org/dr3/flags/).

In [51]:
# Filter the data
# galah = galah.loc[galah["snr_c3_iraf"] > 30]
galah = galah.loc[galah["flag_sp"] == 0]
galah = galah.loc[galah["flag_fe_h"] == 0]
galah = galah.loc[galah["flag_al_fe"] == 0]
galah = galah.loc[galah["flag_c_fe"] == 0]
galah = galah.loc[galah["flag_n_fe"] == 0]
galah = galah.loc[galah["flag_o_fe"] == 0]
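
The rule "every listed flag must be zero" can also be written as a single boolean mask, which makes it easy to add or remove flags. The following is a minimal sketch on a toy table; the values are made up for illustration and the choice of flag columns is only an example, not a recommendation from the GALAH documentation.

```python
import pandas as pd

# Toy stand-in for the GALAH table; the values are made up for illustration.
toy = pd.DataFrame({
    "flag_sp":   [0, 0, 1, 0],
    "flag_fe_h": [0, 0, 0, 0],
    "flag_c_fe": [0, 1, 0, 0],
    "flag_o_fe": [0, 0, 0, 0],
})

# Flags that must all be zero for a row to be kept.
flag_cols = ["flag_sp", "flag_fe_h", "flag_c_fe", "flag_o_fe"]

# True only for rows where every listed flag equals zero.
mask = (toy[flag_cols] == 0).all(axis=1)
toy_clean = toy.loc[mask]
```

Here only the first and last rows survive, since the second row has `flag_c_fe` set and the third has `flag_sp` set.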

Now that the appropriate flags have been applied, we extract only the useful columns and save the data for later.

In [52]:
galah = galah[["tmass_id", "gaiadr3_source_id", "teff", "logg", "fe_h", "al_fe", "c_fe", "n_fe", "o_fe"]]
# Remove all rows with null values (dropna returns a new frame, so assign it back)
galah = galah.dropna()
galah.to_csv("Data/cleaned_galah.csv", index=False)
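
One pandas detail is worth keeping in mind when dropping null rows: `DataFrame.dropna` returns a new frame and leaves the original untouched unless the result is assigned back (or `inplace=True` is passed). A small sketch on made-up values:

```python
import numpy as np
import pandas as pd

# Toy table with one complete row and two rows containing NaN.
toy = pd.DataFrame({
    "teff": [4800.0, np.nan, 5100.0],
    "logg": [2.5, 2.4, np.nan],
})

toy.dropna()            # result is discarded; toy is unchanged
n_before = len(toy)     # still counts all rows

toy = toy.dropna()      # rebind the name to keep the filtered rows
n_after = len(toy)      # counts only the complete rows
```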

# Cleaning APOGEE Data

In [None]:
apogee = Table.read("Data/allStar-dr17-synspec_rev1.fits", format="fits", hdu=1)
names = [name for name in apogee.colnames if len(apogee[name].shape) <= 1]
apogee = apogee[names].to_pandas()

For APOGEE, we do something similar by making sure that all the parameters we are working with are reliable.
More information can be found in the [SDSS explanation of bitmasks](https://www.sdss.org/dr13/algorithms/bitmasks/#APOGEE_ASPCAPFLAG) and the [SDSS recommendations for bitmasks](https://www.sdss.org/dr13/irspec/abundances/).

The flags we check are:
- `ASPCAPFLAG & STAR_BAD == 0` (`STAR_BAD` is the 23rd bit (starting from 0), or  `STAR_BAD = 0b 1000 0000 0000 0000 0000 0000`)
    - `STAR_BAD` is a general indicator of issues with the measured effective temperature or surface gravity; it also reflects whether any element has `GRIDEDGE_BAD` set
- `ELEMFLAG`
    - An example of this is `FE_H_FLAG`
        - Make sure that the flag does not contain `GRIDEDGE_BAD`, `GRIDEDGE_WARN` or `CALRANGE_BAD`
    - In practice, we simply require the flag to be equal to zero, which eliminates all problematic behaviour
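
To illustrate how a single bit of a bitmask is tested, here is a minimal sketch on made-up flag values (they are not taken from the APOGEE catalogue). Only bit 23 is inspected; any other bits that happen to be set are ignored by this test.

```python
import pandas as pd

STAR_BAD = 1 << 23  # bit 23, counting from 0

# Made-up flag values: clean, bit 23 only, bit 3 only, bits 23 and 3 together.
flags = pd.Series([0, 1 << 23, 1 << 3, (1 << 23) | (1 << 3)])

# True where the STAR_BAD bit is NOT set.
star_ok = (flags & STAR_BAD) == 0
```

In Python, `&` binds more tightly than `==`, so the parentheses are optional, but writing them makes the intent explicit.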

In [3]:
# Filter the data
STAR_BAD = 1 << 23
apogee = apogee.loc[(apogee['ASPCAPFLAG'] & STAR_BAD) == 0]
apogee = apogee.loc[apogee['FE_H_FLAG'] == 0]
apogee = apogee.loc[apogee['C_FE_FLAG'] == 0]
apogee = apogee.loc[apogee['AL_FE_FLAG'] == 0]
apogee = apogee.loc[apogee['N_FE_FLAG'] == 0]
apogee = apogee.loc[apogee['O_FE_FLAG'] == 0]

In [5]:
apogee

Unnamed: 0,FILE,APOGEE_ID,TARGET_ID,APSTAR_ID,ASPCAP_ID,TELESCOPE,LOCATION_ID,FIELD,ALT_ID,RA,...,CU_FE_ERR,CU_FE_FLAG,CE_FE,CE_FE_SPEC,CE_FE_ERR,CE_FE_FLAG,YB_FE,YB_FE_SPEC,YB_FE_ERR,YB_FE_FLAG
0,b'apStar-dr17-VESTA.fits',b'VESTA',b'apo1m.calibration.VESTA',b'apogee.apo1m.stars.calibration.VESTA',b'apogee.apo1m.synspec_fix.calibration.VESTA',b'apo1m',1,b'calibration',b' ',,...,0.059403,2,,,,64,,,1.0,2
5,b'asStar-dr17-2M00000035-7323394.fits',b'2M00000035-7323394',b'lco25m.SMC12.2M00000035-7323394',b'apogee.lco25m.stars.SMC12.2M00000035-7323394',b'apogee.lco25m.synspec_fix.SMC12.2M00000035-7...,b'lco25m',7218,b'SMC12',b'none',0.001467,...,0.276795,2,-0.155160,-0.099500,0.146837,0,,,1.0,2
6,b'apStar-dr17-2M00000068+5710233.fits',b'2M00000068+5710233',b'apo25m.N7789.2M00000068+5710233',b'apogee.apo25m.stars.N7789.2M00000068+5710233',b'apogee.apo25m.synspec_fix.N7789.2M00000068+5...,b'apo25m',4264,b'N7789',b'none',0.002850,...,0.004401,2,-0.055580,0.000080,0.086572,0,,,1.0,2
8,b'apStar-dr17-2M00000133+5721163.fits',b'2M00000133+5721163',b'apo25m.NGC7789_btx.2M00000133+5721163',b'apogee.apo25m.stars.NGC7789_btx.2M00000133+5...,b'apogee.apo25m.synspec_fix.NGC7789_btx.2M0000...,b'apo25m',5922,b'NGC7789_btx',b'none',0.005558,...,0.025131,2,-0.256490,-0.200830,0.132389,0,,,1.0,2
9,b'apStar-dr17-2M00000211+6327470.fits',b'2M00000211+6327470',b'apo25m.117+01.2M00000211+6327470',b'apogee.apo25m.stars.117+01.2M00000211+6327470',b'apogee.apo25m.synspec_fix.117+01.2M00000211+...,b'apo25m',4591,b'117+01',b'none',0.008802,...,,259,0.194041,0.249701,0.070006,0,,,1.0,2
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
733891,b'apStar-dr17-2M23595676+7918072.fits',b'2M23595676+7918072',b'apo25m.120+18.2M23595676+7918072',b'apogee.apo25m.stars.120+18.2M23595676+7918072',b'apogee.apo25m.synspec_fix.120+18.2M23595676+...,b'apo25m',5047,b'120+18',b'none',359.986523,...,,259,-0.193610,-0.137950,0.043006,0,,,1.0,2
733892,b'apStar-dr17-2M23595756+8535275.fits',b'2M23595756+8535275',b'apo25m.N188.2M23595756+8535275',b'apogee.apo25m.stars.N188.2M23595756+8535275',b'apogee.apo25m.synspec_fix.N188.2M23595756+85...,b'apo25m',5067,b'N188',b'none',359.989841,...,0.031619,2,,,,64,,,1.0,2
733893,b'apStar-dr17-2M23595837+5700444.fits',b'2M23595837+5700444',b'apo25m.N7789.2M23595837+5700444',b'apogee.apo25m.stars.N7789.2M23595837+5700444',b'apogee.apo25m.synspec_fix.N7789.2M23595837+5...,b'apo25m',4264,b'N7789',b'none',359.993224,...,0.016774,2,-0.035640,0.020020,0.141408,0,,,1.0,2
733897,b'apStar-dr17-2M23595886+5726058.fits',b'2M23595886+5726058',b'apo25m.116-04.2M23595886+5726058',b'apogee.apo25m.stars.116-04.2M23595886+5726058',b'apogee.apo25m.synspec_fix.116-04.2M23595886+...,b'apo25m',4424,b'116-04',b'none',359.995265,...,,259,-0.058750,-0.003090,0.072443,0,,,1.0,2


Extract the useful columns and save them for later.

In [18]:
apogee = apogee[["TEFF", "LOGG", "FE_H", "AL_FE", "C_FE", "N_FE", "O_FE"]]
apogee.columns = apogee.columns.str.lower()
# Remove all rows with null values (dropna returns a new frame, so assign it back)
apogee = apogee.dropna()
apogee.to_csv("Data/cleaned_apogee.csv", index=False)

# Cleaning APOKSAC 2 Data
TODO: Not sure how to do this


# Cleaning K2 Data
To do so, we need to

1. extract the mass coefficient from the data set
2. look up the effective temperature in the GALAH data set by matching on the EPIC identifier
3. determine the mass
4. extract the chemical composition of stars
5. store the data for use

For this purpose, we will express stellar masses in units of the solar mass.

In [40]:
# k2_cols = ["EPIC", "kappa-r-c", "e_kappa_r-c", "kappa-m-c", "e_kappa-m-c", "kappa-r-A2Z", "kappa-r-c-BAM", "kappa-r-c-BHM", "kappa-r-c-CAN", "kappa-r-c-SYD", "e_kappa-r-c-A2Z", "e_kappa-r-c-BAM", "e_kappa-r-c-BHM", "e_kappa-r-c-CAN", "e_kappa-r-c-COR", "e_kappa-r-c-SYD", "kappa-m_A2Z", "kappa-m-c-BAM", "kappa-m-c-BHM", "kappa-m-c-CAN", "kappa-m-c-COR", "kappa-m-c-SYD", "e_kappa-m_A2Z", "e_kappa-m-c-BAM", "e_kappa-m-c-BHM", "e_kappa-m-c-CAN", "e_kappa-m-c-COR", "e_kappa-m-c-SYD"]

# k2_col_width = [
#     (0, 9), (10, 16), (18, 23), (25, 29), (18, 23),
#     (25, 29), (31, 35), (37, 42), (44, 49), (51, 56),
#     (58, 63), (65, 70), (72, 77), (79, 84), (86, 91),
#     (93, 98), (100, 104), (106, 110), (112, 118), (120, 124),
#     (126, 130), (137, 143), (145, 149), (151, 155), (157, 162),
#     (164, 169), (171, 175), (177, 181), (183, 187), (189, 194)
# ]
k2_cols = ["EPIC", "kappa-m-c"]
k2_col_width = [(0, 9), (24, 29)]

k2 = pd.read_fwf(
    "Data/k2_mass_coefficients.txt",
    skiprows=50,
    header=None,
    index_col=False,
    names = k2_cols,
    colspecs = k2_col_width
)
# According to the International Astronomical Union (IAU), the effective temperature of the Sun is 5772 +- 0.8 K
teff_sun = 5772

# Display the table (must be the last expression in the cell to be rendered)
k2

Unnamed: 0,EPIC,kappa-m-c
0,201051625,1.324
1,201079541,1.523
2,201082897,1.027
3,201084449,1.544
4,201084664,1.076
...,...,...
18817,251619708,0.880
18818,251620873,0.663
18819,251621333,1.193
18820,251622742,1.497


To determine the mass for K2 stars, we use the following formula (from K2 Galactic Archaeology Program DR3):

$$ M/M_\odot = \kappa_M (T_{eff} / T_{eff, \odot})^{3/2} $$
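
As a rough sketch of how this formula could be applied, the example below matches a toy mass-coefficient table to a toy temperature table on the EPIC identifier and then evaluates the relation. The temperatures are made up, and the presence of an `EPIC` column in the temperature table is an assumption for illustration; the names `kappa`, `teff`, `stars` and `TEFF_SUN` are likewise hypothetical.

```python
import pandas as pd

TEFF_SUN = 5772.0  # K, solar effective temperature used for scaling

# Toy inputs: kappa values in the style of the K2 table, temperatures made up.
kappa = pd.DataFrame({
    "EPIC": [201051625, 201079541],
    "kappa-m-c": [1.324, 1.523],
})
teff = pd.DataFrame({
    "EPIC": [201079541, 201051625],   # assumed identifier column
    "teff": [4800.0, 5772.0],
})

# Match the two tables on the EPIC identifier; unmatched stars are dropped.
stars = kappa.merge(teff, on="EPIC", how="inner")

# M / M_sun = kappa_M * (T_eff / T_eff_sun)^(3/2)
stars["mass"] = stars["kappa-m-c"] * (stars["teff"] / TEFF_SUN) ** 1.5
```

A star with exactly the solar effective temperature gets a mass equal to its coefficient $\kappa_M$, and a cooler star gets a smaller value, which gives a quick sanity check on the implementation.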


In [None]:
#