Importing the required packages

In [20]:
# Import the pages I need here
from astropy.table import Table
import numpy as np          # For number handling
import pandas as pd         # For table/data management
import matplotlib.pyplot    # For plotting data

The code will be sorted into two different parts.

The goal for this piece of code is:
- Clean K2 and APOKSAC Data
- Using K2 and APOKSAC Data to train models
- Use K2 model to predict the mass for APOGEE and GALAH
- Use APOKSAC model to predict mass for APOGEE

The goal is to get a table contain the following for all the data sets and use this to predict the mass of stars.
The table include parameters such as:

- Effective Temperature
- Log g
- Iron Abundance [Fe/H]
- Alpha Abundance [alpha/Fe]
- Nitrogen Abundance [N/Fe]
- Oxygen Abundance [O/Fe]

# Cleaning the GALAH Data

In [None]:
# Import the data and change into a data frame
galah = Table.read("Data/galah_dr4_allspec_220713.fits", format = "fits")
names = [name for name in galah.colnames if len(galah[name].shape) <= 1]
galah = galah[names].to_pandas()

For Apogee Data, there are a few recommendations (this usually indicates that the data has some issues, e.g. unreliably broadening and other issues) for the flag that should be applied:
- `snr_c3_iraf > 30`
- `flag_sp == 0`
- For any element `X`, the flag `flag_X_fe == 0`

More information can be found [the GALAH Survey Website](https://www.galah-survey.org/dr3/flags/)

In [None]:
# Filter the data
# galah = galah.loc[galah["snr_c3_iraf"] > 30]
galah = galah.loc[galah["flag_sp"] == 0]
galah = galah.loc[galah["flag_fe_h"] == 0]
galah = galah.loc[galah["flag_al_fe"] == 0]
galah = galah.loc[galah["flag_n_fe"] == 0]
galah = galah.loc[galah["flag_o_fe"] == 0]

Now the appropriate flag has been used, we extract only the useful columns and save the data for later.

In [None]:
galah = galah[["teff", "logg", "fe_h", "al_fe", "n_fe", "o_fe"]]
galah.to_csv("Data/cleaned_galah.csv", index=False)


# Cleaning APOGEE Data

In [4]:
apogee = Table.read("Data/allStar-dr17-synspec_rev1.fits", format="fits", hdu=1)
names = [name for name in apogee.colnames if len(apogee[name].shape) <= 1]
apogee = apogee[names].to_pandas()

For APOGEE, we do something similar by making sure all the parameters are working on are good.
More information can be found [SDSS explanantion of bitmasks](https://www.sdss.org/dr13/algorithms/bitmasks/#APOGEE_ASPCAPFLAG) and [SDSS recommendations for bitmasks](https://www.sdss.org/dr13/irspec/abundances/)

The parameters
- `ASPCAPFLAG & STAR_BAD == 0` (`STAR_BAD` is the 23rd bit (starting from 0), or  `STAR_BAD = 0b 1000 0000 0000 0000 0000 0000`)
    - `STAR_BAD` is a general indication of whether there are issues with measuring surface temperature, surface gravity, and check if any elements have `GRIDEDGE_BAD` set
- `ELEMFLAG`
    - Any example of this is `FE_H_FLAG`
        - Make sure that the flag does not contain `GRIDEDGE_BAD`, `GRIDEDGE_WARN` and `CALRANGE_BAD`
    - We should just check that the flag is equal to zero to eliminate all and any problematic behaviour

In [11]:
# Filter the data
STAR_BAD = 1 << 23
apogee = apogee.loc[apogee['ASPCAPFLAG'] & STAR_BAD == 0]
apogee = apogee.loc[apogee['FE_H_FLAG'] == 0]
apogee = apogee.loc[apogee['AL_FE_FLAG'] == 0]
apogee = apogee.loc[apogee['N_FE_FLAG'] == 0]
apogee = apogee.loc[apogee['O_FE_FLAG'] == 0]

Extract the useful column and save for later

In [14]:
apogee = apogee[["TEFF", "LOGG", "FE_H", "AL_FE", "N_FE", "O_FE"]]
apogee.columns = apogee.columns.str.lower()
apogee.to_csv("Data/cleaned_apogee.csv", index=False)

# Cleaning APOKSAC 2 Data
TODO


# Cleaning K2 Data

In [25]:
k2_cols = ["EPIC", "kappa-r-c", "e_kappa_r-c", "kappa-m-c", "e_kappa-m-c", "kappa-r-A2Z", "kappa-r-c-BAM", "kappa-r-c-BHM", "kappa-r-c-CAN", "kappa-r-c-SYD", "e_kappa-r-c-A2Z", "e_kappa-r-c-BAM", "e_kappa-r-c-BHM", "e_kappa-r-c-CAN", "e_kappa-r-c-COR", "e_kappa-r-c-SYD", "kappa-m_A2Z", "kappa-m-c-BAM", "kappa-m-c-BHM", "kappa-m-c-CAN", "kappa-m-c-COR", "kappa-m-c-SYD", "e_kappa-m_A2Z", "e_kappa-m-c-BAM", "e_kappa-m-c-BHM", "e_kappa-m-c-CAN", "e_kappa-m-c-COR", "e_kappa-m-c-SYD"]
k2_col_width = [
    (1, 9), (11, 16), (18, 23), (25, 29), (18, 23),
    (25, 29), (31, 35), (37, 42), (44, 49), (51, 56),
    (58, 63), (65, 70), (72, 77), (79, 84), (86, 91),
    (93, 98), (100, 104), (106, 110), (112, 118), (120, 124),
    (126, 130), (137, 143), (145, 149), (151, 155), (157, 162),
    (164, 169), (171, 175), (177, 181), (183, 187), (189, 194)
]
k2 = pd.read_fwf("Data/k2_mass_coefficient_data.txt", colspecs=k2_cols)
k2

  (1, 9), (11, 16), (18, 23), (25, 29), (18, 23)
  (1, 9), (11, 16), (18, 23), (25, 29), (18, 23)
  (1, 9), (11, 16), (18, 23), (25, 29), (18, 23)
  (1, 9), (11, 16), (18, 23), (25, 29), (18, 23)
  (1, 9), (11, 16), (18, 23), (25, 29), (18, 23)
  (1, 9), (11, 16), (18, 23), (25, 29), (18, 23)
  (1, 9), (11, 16), (18, 23), (25, 29), (18, 23)
  (1, 9), (11, 16), (18, 23), (25, 29), (18, 23)
  (1, 9), (11, 16), (18, 23), (25, 29), (18, 23)
  (1, 9), (11, 16), (18, 23), (25, 29), (18, 23)
  (1, 9), (11, 16), (18, 23), (25, 29), (18, 23)
  (1, 9), (11, 16), (18, 23), (25, 29), (18, 23)
  (1, 9), (11, 16), (18, 23), (25, 29), (18, 23)
  (1, 9), (11, 16), (18, 23), (25, 29), (18, 23)
  (1, 9), (11, 16), (18, 23), (25, 29), (18, 23)
  (1, 9), (11, 16), (18, 23), (25, 29), (18, 23)
  (1, 9), (11, 16), (18, 23), (25, 29), (18, 23)
  (1, 9), (11, 16), (18, 23), (25, 29), (18, 23)
  (1, 9), (11, 16), (18, 23), (25, 29), (18, 23)
  (1, 9), (11, 16), (18, 23), (25, 29), (18, 23)
  (1, 9), (11, 16), 

TypeError: 'tuple' object is not callable