The goal for this piece of code is:
- Clean the data for the following datasets:
    - K2
    - APOKASC
    - APOGEE
    - GALAH

After cleaning the data, we:
- Use the K2 model to predict masses for the APOGEE and GALAH stars
- Use the APOKASC model to predict masses for the APOGEE stars

The goal is to build, for every data set, a table containing the following parameters, and to use it to predict stellar masses:

- Effective Temperature
- Log g
- Iron Abundance [Fe/H]
- Alpha Abundance [alpha/Fe]
- Carbon Abundance [C/Fe]
- Nitrogen Abundance [N/Fe]
- Oxygen Abundance [O/Fe]

In addition, we extract the star IDs to allow cross-identification between data sets:

- `tmass_id` 2MASS ID
- `gaiadr3_source_id` Gaia DR3 source ID

Importing the required packages

In [1]:
from astropy.table import Table  # For reading FITS tables
import pandas as pd         # For table/data management

# Cleaning the GALAH Data

In [67]:
# Import the data and convert it into a DataFrame
def importGalah():
    galah = Table.read("RawData/galah_dr4_allspec_220713.fits", format="fits")
    # Keep only columns with one value per row; multidimensional columns cannot be converted to pandas
    names = [name for name in galah.colnames if len(galah[name].shape) <= 1]
    galah = galah[names].to_pandas()
    return galah

galah = importGalah()

In [72]:
galah

Unnamed: 0,sobject_id,tmass_id,gaiadr3_source_id,ra,dec,flag_sp,chi2_sp,model_name,teff,e_teff,...,rv_k_is,ew_dib5780,sigma_dib5780,rv_dib5780,ew_dib5797,sigma_dib5797,rv_dib5797,ew_dib6613,sigma_dib6613,rv_dib6613
0,131216001101002,b'05190449-5849304',4762794963745841536,79.768745,-58.825127,0,0.784917,b'5000_3.00_-0.50',4949.361328,14.192226,...,125.273613,0.021536,0.649520,66.809517,-0.004440,0.060115,-16.125017,-0.008195,0.125106,6.747653
1,131216001101004,b'05194296-5852488',4762782766038731776,79.929016,-58.880241,0,0.730410,b'5000_3.00_-0.75',5065.323242,12.553386,...,-110.495575,0.017707,0.545415,-20.512051,0.013530,0.379023,12.942771,-0.003307,0.043107,-20.483330
2,131216001101006,b'05242175-5855050',4762746688313325568,81.090652,-58.918076,16,1.001766,b'5750_3.50_0.00',5258.936035,65.913292,...,-0.085152,-0.178104,0.556178,-72.778915,-0.000076,0.471259,14.315667,-0.040042,0.082871,-9.077340
3,131216001101007,b'05235853-5855322',4762746963191266560,80.993896,-58.925625,0,0.816937,b'4500_2.50_-0.50',4602.764648,12.271515,...,9.199572,0.020020,0.257478,25.391211,-0.014224,0.077454,-49.922047,0.008121,0.037796,12.077354
4,131216001101008,b'05250796-5856306',4762744661088764928,81.283173,-58.941841,0,0.796724,b'5250_4.50_-0.50',5173.950684,22.877005,...,-184.594482,-0.010327,0.014192,11.480436,-0.014709,0.015565,-28.896576,-0.000141,0.454338,5.718824
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
911720,220422002101393,b'08505928-4225003',5524515075919164544,132.746948,-42.416679,0,0.669564,b'6000_4.50_0.25',5935.249023,17.983692,...,-15.240196,0.017141,0.085784,-25.076977,0.018373,0.091060,-10.782121,0.023231,0.265323,1.156595
911721,220422002101394,b'08483800-4232587',5524192919009754880,132.158279,-42.549587,6,0.832816,b'5750_4.00_-0.25',5587.684570,83.439255,...,45.337360,-0.027562,0.041541,-17.110561,0.000099,0.351173,-0.006665,-0.016863,0.038175,3.131748
911722,220422002101395,b'08482525-4228066',5524194327759020032,132.105194,-42.468460,0,0.665281,b'5750_4.50_0.00',5872.177246,18.244131,...,-130.715927,0.020608,0.056806,-19.368908,0.016501,0.105145,-17.384577,-0.000045,0.344765,0.279729
911723,220422002101396,b'08494064-4230009',5524153130428123392,132.419174,-42.500202,6,0.700411,b'7500_4.00_-0.25',7594.551270,5.249479,...,61.075535,0.020259,0.249044,-21.283625,0.015279,0.200866,-0.365067,-0.000869,0.020779,28.197634


For the GALAH data, the survey recommends applying a few flag cuts (a non-zero flag usually indicates a problem with the measurement, e.g. unreliable line broadening):
- `snr_c3_iraf > 30`
- `flag_sp == 0`
- For any element `X`, the flag `flag_X_fe == 0`

More information can be found on [the GALAH Survey website](https://www.galah-survey.org/dr3/flags/).
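The cuts above amount to requiring every relevant flag to be zero, plus an optional signal-to-noise threshold. Below is a minimal pure-Python sketch of that logic, assuming each star is a plain dict keyed by the GALAH column names used in this notebook; the helper name `passes_galah_cuts` is hypothetical and the example flag values are made up.

```python
def passes_galah_cuts(star, elements=("c", "n", "o", "li"), min_snr=None):
    """Return True if a star (a dict of GALAH columns) passes the quality cuts.

    Assumes flag columns named 'flag_sp', 'flag_fe_h' and 'flag_<x>_fe';
    the 'snr_c3_iraf' column is only looked up when min_snr is given.
    """
    # The spectroscopic-fit and iron flags must both be clean
    if star["flag_sp"] != 0 or star["flag_fe_h"] != 0:
        return False
    # Every requested element abundance flag must be clean
    for x in elements:
        if star[f"flag_{x}_fe"] != 0:
            return False
    # Optional signal-to-noise cut (a NaN SNR also fails this test)
    if min_snr is not None and not (star["snr_c3_iraf"] > min_snr):
        return False
    return True


# Hypothetical example rows: one clean star, one with flag_sp set, one with a bad N flag
stars = [
    {"flag_sp": 0, "flag_fe_h": 0, "flag_c_fe": 0, "flag_n_fe": 0, "flag_o_fe": 0, "flag_li_fe": 0},
    {"flag_sp": 16, "flag_fe_h": 0, "flag_c_fe": 0, "flag_n_fe": 0, "flag_o_fe": 0, "flag_li_fe": 0},
    {"flag_sp": 0, "flag_fe_h": 0, "flag_c_fe": 0, "flag_n_fe": 1, "flag_o_fe": 0, "flag_li_fe": 0},
]
clean = [s for s in stars if passes_galah_cuts(s)]
print(len(clean))  # 1
```

With pandas, the same flag cut can be written as a single boolean mask, e.g. `(galah[flag_cols] == 0).all(axis=1)` for a list `flag_cols` of flag column names.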

In [11]:
# Filter the data
# galah = galah.loc[galah["snr_c3_iraf"] > 30]
galah = galah.loc[galah["flag_sp"] == 0]
galah = galah.loc[galah["flag_fe_h"] == 0]
galah = galah.loc[galah["flag_c_fe"] == 0]
galah = galah.loc[galah["flag_n_fe"] == 0]
galah = galah.loc[galah["flag_o_fe"] == 0]
galah = galah.loc[galah["flag_li_fe"] == 0]

Now that the appropriate flags have been applied, we remove rows with missing values and save the data for later.

In [13]:
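# Remove rows with missing values in the columns needed later
# (dropna returns a new DataFrame, so the result must be assigned)
galah = galah.dropna(subset=["tmass_id", "teff", "logg", "fe_h", "c_fe", "n_fe", "o_fe"])
galah.to_csv("ProcessedData/cleaned_galah.csv", index=False)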

In [15]:
galah[["red_alpha_fe", "ew_h_alpha"]]

Unnamed: 0,red_alpha_fe,ew_h_alpha
5,0.132004,0.018184
8,0.075313,-0.036045
11,0.029919,-0.042364
16,-0.014889,-0.029333
18,0.117063,-0.031519
...,...,...
911431,0.120585,0.061411
911484,0.166184,0.004074
911566,0.018094,0.007845
911568,0.121439,0.158733


# Cleaning APOGEE Data

In [1]:
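# Read the APOGEE DR17 allStar catalogue (HDU 1) and keep only 1-D columns before converting to pandas
apogee = Table.read("RawData/allStar-dr17-synspec_rev1.fits", format="fits", hdu=1)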
names = [name for name in apogee.colnames if len(apogee[name].shape) <= 1]
apogee = apogee[names].to_pandas()

In [3]:
# Rename APOGEE_ID to tmass_id (APOGEE IDs are 2MASS IDs with a "2M" prefix)
apogee = apogee.rename(columns = {"APOGEE_ID": "tmass_id"})
apogee

Unnamed: 0,FILE,tmass_id,TARGET_ID,APSTAR_ID,ASPCAP_ID,TELESCOPE,LOCATION_ID,FIELD,ALT_ID,RA,...,CU_FE_ERR,CU_FE_FLAG,CE_FE,CE_FE_SPEC,CE_FE_ERR,CE_FE_FLAG,YB_FE,YB_FE_SPEC,YB_FE_ERR,YB_FE_FLAG
0,b'apStar-dr17-VESTA.fits',b'VESTA',b'apo1m.calibration.VESTA',b'apogee.apo1m.stars.calibration.VESTA',b'apogee.apo1m.synspec_fix.calibration.VESTA',b'apo1m',1,b'calibration',b' ',,...,0.059403,2,,,,64,,,1.0,2
1,b'apStar-dr17-2M00000002+7417074.fits',b'2M00000002+7417074',b'apo25m.120+12.2M00000002+7417074',b'apogee.apo25m.stars.120+12.2M00000002+7417074',b'apogee.apo25m.synspec_fix.120+12.2M00000002+...,b'apo25m',5046,b'120+12',b'none',0.000103,...,0.001221,2,,,,64,,,1.0,2
2,b'apStar-dr17-2M00000019-1924498.fits',b'2M00000019-1924498',b'apo25m.060-75.2M00000019-1924498',b'apogee.apo25m.stars.060-75.2M00000019-1924498',b'apogee.apo25m.synspec_fix.060-75.2M00000019-...,b'apo25m',5071,b'060-75',b'none',0.000832,...,0.102594,2,,,,64,,,1.0,2
3,b'apStar-dr17-2M00000032+5737103.fits',b'2M00000032+5737103',b'apo25m.116-04.2M00000032+5737103',b'apogee.apo25m.stars.116-04.2M00000032+5737103',b'apogee.apo25m.synspec_fix.116-04.2M00000032+...,b'apo25m',4424,b'116-04',b'none',0.001335,...,0.123839,2,,,,64,,,1.0,2
4,b'apStar-dr17-2M00000032+5737103.fits',b'2M00000032+5737103',b'apo25m.N7789.2M00000032+5737103',b'apogee.apo25m.stars.N7789.2M00000032+5737103',b'apogee.apo25m.synspec_fix.N7789.2M00000032+5...,b'apo25m',4264,b'N7789',b'none',0.001335,...,0.107103,2,,,,64,,,1.0,2
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
733896,b'apStar-dr17-2M23595886-2009435.fits',b'2M23595886-2009435',b'apo25m.060-75.2M23595886-2009435',b'apogee.apo25m.stars.060-75.2M23595886-2009435',b'apogee.apo25m.synspec_fix.060-75.2M23595886-...,b'apo25m',5071,b'060-75',b'none',359.995258,...,,258,,,,64,,,,2
733897,b'apStar-dr17-2M23595886+5726058.fits',b'2M23595886+5726058',b'apo25m.116-04.2M23595886+5726058',b'apogee.apo25m.stars.116-04.2M23595886+5726058',b'apogee.apo25m.synspec_fix.116-04.2M23595886+...,b'apo25m',4424,b'116-04',b'none',359.995265,...,,259,-0.058750,-0.003090,0.072443,0,,,1.0,2
733898,b'apStar-dr17-2M23595921+5609479.fits',b'2M23595921+5609479',b'apo25m.NGC7789_MGA.2M23595921+5609479',b'apogee.apo25m.stars.NGC7789_MGA.2M23595921+5...,b'apogee.apo25m.synspec_fix.NGC7789_MGA.2M2359...,b'apo25m',-999,b'NGC7789_MGA',b'none',359.996744,...,,259,-0.103446,-0.047786,0.097572,0,,,1.0,2
733899,b'asStar-dr17-2M23595949-7342592.fits',b'2M23595949-7342592',b'lco25m.SMC12.2M23595949-7342592',b'apogee.lco25m.stars.SMC12.2M23595949-7342592',b'apogee.lco25m.synspec_fix.SMC12.2M23595949-7...,b'lco25m',7218,b'SMC12',b'none',359.997887,...,0.174322,2,0.107710,0.163370,0.140188,0,,,1.0,2


For APOGEE, we apply similar quality cuts to make sure all the parameters we are working with are reliable.
More information can be found in the [SDSS explanation of bitmasks](https://www.sdss.org/dr13/algorithms/bitmasks/#APOGEE_ASPCAPFLAG) and the [SDSS recommendations for bitmasks](https://www.sdss.org/dr13/irspec/abundances/).

The cuts are:
- `ASPCAPFLAG & STAR_BAD == 0` (`STAR_BAD` is bit 23, counting from 0, i.e. `STAR_BAD = 0b 1000 0000 0000 0000 0000 0000`)
    - `STAR_BAD` is a general flag indicating problems with the effective temperature or surface gravity measurement, or that `GRIDEDGE_BAD` is set
- `ELEMFLAG` (the per-element flags)
    - An example is `FE_H_FLAG`
        - Make sure that the flag does not contain `GRIDEDGE_BAD`, `GRIDEDGE_WARN` or `CALRANGE_BAD`
    - In practice, we simply require the flag to be zero, which removes any problematic behaviour
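To see how the `STAR_BAD` test works, the sketch below checks bit 23 of an integer flag value using plain Python integers; the helper names `is_star_bad` and `elem_flag_ok` and the example flag values are hypothetical. In Python, `&` binds more tightly than `==`, so `flag & STAR_BAD == 0` already means `(flag & STAR_BAD) == 0`; the parentheses below are only for readability.

```python
STAR_BAD = 1 << 23  # bit 23 of ASPCAPFLAG, counting from 0


def is_star_bad(aspcapflag: int) -> bool:
    """True if the STAR_BAD bit is set in an ASPCAPFLAG value."""
    return (aspcapflag & STAR_BAD) != 0


def elem_flag_ok(flag: int) -> bool:
    """Element flags are accepted only when no bit at all is set."""
    return flag == 0


# Same value written as in the text: a 1 followed by 23 zeros
print(STAR_BAD == 0b1000_0000_0000_0000_0000_0000 == 8388608)  # True

flags = [0, STAR_BAD, STAR_BAD | 1, 1 << 4]  # hypothetical ASPCAPFLAG values
good = [f for f in flags if not is_star_bad(f)]
print(good)  # [0, 16]
```

Other bits (such as bit 4 in the last example) do not affect the `STAR_BAD` test, which is why the element flags are checked separately by requiring them to be exactly zero.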

In [4]:
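# Filter the data
STAR_BAD = 1 << 23  # bit 23 of ASPCAPFLAG, counting from 0
# Drop stars with STAR_BAD set (parentheses for clarity; & already binds tighter than ==)
apogee = apogee.loc[(apogee['ASPCAPFLAG'] & STAR_BAD) == 0]
# Require clean element flags: any set bit marks a problematic abundance
apogee = apogee.loc[apogee['FE_H_FLAG'] == 0]
apogee = apogee.loc[apogee['C_FE_FLAG'] == 0]
apogee = apogee.loc[apogee['N_FE_FLAG'] == 0]
apogee = apogee.loc[apogee['O_FE_FLAG'] == 0]
# Also require a clean radial-velocity flag
apogee = apogee.loc[apogee['RV_FLAG'] == 0]
# Use lower-case column names from here on
apogee.columns = apogee.columns.str.lower()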

In [18]:
apogee[["rv_chi2", "rv_teff", "rv_feh", "rv_carb", "rv_alpha", "rv_autofwhm", "rv_ccfwhm", "rv_logg"]]

Unnamed: 0,rv_chi2,rv_teff,rv_feh,rv_carb,rv_alpha,rv_autofwhm,rv_ccfwhm,rv_logg
0,16.541691,5980.213379,-0.203651,0.0,0.0,20.430799,25.092390,4.643887
1,82.418503,3671.801514,-0.401404,0.0,0.0,12.790598,17.408659,1.090488
5,1.918990,4667.159180,-1.353037,0.0,0.0,14.388681,19.576447,2.191759
6,9.714903,5315.282715,-0.287584,0.0,0.0,17.972471,21.645975,3.808599
8,3.004735,5121.166016,-0.693380,0.0,0.0,15.877813,19.881489,2.971028
...,...,...,...,...,...,...,...,...
733891,19.251442,4347.800781,-0.631075,0.0,0.0,14.054633,18.096855,2.160463
733892,24.161474,5304.097168,0.231426,0.0,0.0,19.664906,24.771343,4.611897
733893,2.534307,5151.955078,-0.512913,0.0,0.0,16.366066,20.079872,3.137297
733897,8.019061,5043.876465,-0.365796,0.0,0.0,15.921649,19.329823,2.973008


Extract the useful columns and save them for later.
A quick summary of what the less obvious columns mean:
- `alpha_m` is the alpha-element abundance, [alpha/M]
- `ra` is the right ascension
- `dec` is the declination (combined with `ra`, it gives the star's position on the sky)
- `rv_chi2` is the chi-squared of the radial-velocity fit
- `gaiaedr3_pmra` is the Gaia EDR3 proper motion in the right-ascension direction
- `gaiaedr3_pmdec` is the Gaia EDR3 proper motion in the declination direction

In [5]:
apogee = apogee[["tmass_id", "teff", "logg", "fe_h", "alpha_m", "c_fe", "n_fe", "o_fe", "ra", "dec", "rv_chi2", "gaiaedr3_pmra", "gaiaedr3_pmdec"]]
apogee = apogee.dropna()
apogee.to_csv("ProcessedData/cleaned_apogee.csv", index=False)

# Cleaning APOKASC-2 Data

In [6]:
apoksac_cols = ["kic", "tmass_id", "mass"]
apoksac_col_width = [(0, 9), (9, 26), (27, 39)]

apoksac = pd.read_fwf(
    "RawData/apk.tsv",
    skiprows=42,
    header=None,
    index_col=False,
    names = apoksac_cols,
    colspecs = apoksac_col_width
)
apoksac = apoksac[0:6676]
apoksac = apoksac.dropna()
apoksac

Unnamed: 0,kic,tmass_id,mass
0,1027110,19250937+3644599,0.985
1,1027337,19252021+3647118,1.227
2,1160789,19233280+3652207,0.875
4,1161447,19241746+3651460,1.460
5,1161618,19242614+3648478,1.183
...,...,...,...
6671,12784948,19210624+5200236,1.254
6672,12785083,19212376+5204593,1.154
6673,12785250,19214766+5205365,1.386
6674,12884116,19182431+5215519,1.081


In [7]:
apoksac.to_csv("ProcessedData/cleaned_apoksac.csv", index=False)

Match APOKASC data with APOGEE data

Now that we have cleaned the APOGEE and APOKASC data separately, we merge them on the 2MASS ID and save the result.
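Before the merge, both tables must use the same 2MASS ID format. Byte-string IDs are written to CSV as text such as `b'2M18401164+4324434'`, and APOGEE IDs additionally carry a `2M` prefix that the APOKASC IDs lack. Below is a minimal sketch of normalising such IDs and performing an inner join on them, assuming plain dicts in place of DataFrames; `normalise_tmass_id` is a hypothetical helper, and the example values are taken from the outputs shown in this notebook.

```python
def normalise_tmass_id(raw: str) -> str:
    """Turn "b'2M18401164+4324434'" or "2M18401164+4324434" into "18401164+4324434"."""
    s = raw.strip()
    # Remove the b'...' wrapper left behind when bytes are written to CSV
    if s.startswith("b'") and s.endswith("'"):
        s = s[2:-1]
    # APOGEE IDs carry a "2M" prefix that APOKASC IDs do not
    if s.startswith("2M"):
        s = s[2:]
    return s.strip()


apogee_rows = {normalise_tmass_id("b'2M18401164+4324434'"): {"teff": 4695.6}}
apokasc_rows = {"18401164+4324434": {"mass": 0.999}, "19250937+3644599": {"mass": 0.985}}

# Inner join: keep only IDs present in both tables
merged = {
    tid: {**apogee_rows[tid], **apokasc_rows[tid]}
    for tid in apogee_rows.keys() & apokasc_rows.keys()
}
print(merged)  # {'18401164+4324434': {'teff': 4695.6, 'mass': 0.999}}
```

The same helper also handles GALAH IDs (which have no `2M` prefix), so one function could stand in for the two different slices `str[4:-1]` and `str[2:-1]` used in this notebook.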

In [17]:
apoksac = pd.read_csv("ProcessedData/cleaned_apoksac.csv")
apogee = pd.read_csv("ProcessedData/cleaned_apogee.csv")

In [18]:
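# Convert tmass_id to match the APOKASC format: after the CSV round trip the IDs
# read "b'2M00000002+7417074'", so strip the leading "b'2M" and the trailing "'"
apogee["tmass_id"] = apogee["tmass_id"].str[4:-1]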

In [19]:
apogee

Unnamed: 0,tmass_id,teff,logg,fe_h,alpha_m,c_fe,n_fe,o_fe,ra,dec,rv_chi2,gaiaedr3_pmra,gaiaedr3_pmdec
0,00000002+7417074,3723.9111,0.904598,-0.160680,0.036922,0.009295,0.151220,0.083402,0.000103,74.285408,82.418500,0.048582,0.522249
1,00000035-7323394,4555.4043,1.498851,-1.171400,0.008442,-0.290710,0.475530,0.152533,0.001467,-73.394287,1.918990,0.255408,-1.269176
2,00000068+5710233,5031.2637,3.456132,-0.160280,0.055704,-0.025731,0.084176,0.113698,0.002850,57.173164,9.714903,5.791725,-12.735589
3,00000133+5721163,4974.6010,2.388251,-0.509530,0.107150,-0.069177,0.243080,0.185943,0.005558,57.354549,3.004735,-1.146765,0.364162
4,00000211+6327470,4681.4634,2.245527,-0.075011,0.007694,-0.180636,0.345594,0.027783,0.008802,63.463078,6.985389,-0.923788,-0.908006
...,...,...,...,...,...,...,...,...,...,...,...,...,...
419051,23595676+7918072,4311.3926,1.731429,-0.504450,0.246895,0.094351,0.152260,0.287303,359.986523,79.302002,19.251442,6.959701,6.008740
419052,23595756+8535275,5042.2480,4.507292,0.218410,-0.025638,-0.036199,0.152240,-0.004475,359.989841,85.590996,24.161474,1.082104,-8.449164
419053,23595837+5700444,4954.1700,2.474665,-0.333100,0.088357,-0.096586,0.248020,0.130718,359.993224,57.012356,2.534307,-3.214342,-0.776885
419054,23595886+5726058,4859.9087,2.480687,-0.236560,0.065434,-0.060314,0.201880,0.088913,359.995265,57.434956,8.019061,-0.458060,-2.734545


In [20]:
apoksac

Unnamed: 0,kic,tmass_id,mass
0,1027110,19250937+3644599,0.985
1,1027337,19252021+3647118,1.227
2,1160789,19233280+3652207,0.875
3,1161447,19241746+3651460,1.460
4,1161618,19242614+3648478,1.183
...,...,...,...
6524,12784948,19210624+5200236,1.254
6525,12785083,19212376+5204593,1.154
6526,12785250,19214766+5205365,1.386
6527,12884116,19182431+5215519,1.081


In [21]:
apoksac_mass_prediction_data = pd.merge(apogee, apoksac, on="tmass_id")
apoksac_mass_prediction_data.to_csv("ProcessedData/mass_prediction_apoksac.csv", index=False)
apoksac_mass_prediction_data

Unnamed: 0,tmass_id,teff,logg,fe_h,alpha_m,c_fe,n_fe,o_fe,ra,dec,rv_chi2,gaiaedr3_pmra,gaiaedr3_pmdec,kic,mass
0,18401164+4324434,4695.6000,2.873356,-0.326510,0.213425,0.134720,0.096659,0.259433,280.048539,43.412071,3.631098,-6.793591,-7.237554,7728958,0.999
1,18401969+4348380,4707.5215,2.840394,0.024199,0.040063,-0.055729,0.220168,0.067054,280.082042,43.810581,15.003494,1.557371,-4.227410,8004637,1.113
2,18402944+4324534,4074.1997,1.340157,-0.283850,0.178795,0.066699,0.202960,0.226073,280.122696,43.414841,15.103884,0.065977,-8.039925,7729057,0.909
3,18403209+4317133,4717.4510,2.729012,-0.374330,0.216875,0.140110,0.123730,0.230283,280.133737,43.287041,3.273681,-3.115116,3.063984,7581214,1.034
4,18403944+4345281,4562.4155,2.391936,0.097223,0.075121,0.094848,0.195747,0.120121,280.164373,43.757812,19.257770,0.723049,-3.535258,7935931,0.998
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6659,20051431+4425508,4502.3390,2.010333,-0.428000,0.111252,0.006040,0.165750,0.155333,301.309653,44.430798,7.463031,-4.389004,-2.313065,8462113,1.007
6660,20055400+4415544,4668.8410,2.758913,-0.014616,0.063701,-0.033613,0.209690,0.088856,301.475035,44.265125,6.987167,-5.140772,-12.114427,8332266,1.149
6661,20061426+4436582,4529.1280,2.406964,0.260950,0.054435,0.064349,0.209800,0.089739,301.559440,44.616180,44.024540,-10.027227,-31.077078,8591738,0.892
6662,20062364+4427383,4949.2600,2.796644,0.147770,-0.023540,-0.151760,0.478130,0.027632,301.598507,44.460659,12.874813,3.107945,-1.087092,8462934,2.260


# Cleaning K2 Data
To do so, we need to:

1. extract the mass coefficient from the data set
2. find each star's effective temperature in the GALAH data set, by mapping its EPIC identifier to a 2MASS ID and matching on that
3. determine the mass
4. extract the chemical composition of the stars
5. store the data for later use

Throughout, stellar masses are expressed in units of the solar mass.
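The steps above can be sketched end to end in plain Python. The dictionaries stand in for the K2 table, the EPIC-to-2MASS mapping and the cleaned GALAH table; the EPIC IDs, coefficients and 2MASS ID come from the outputs in this notebook, while the GALAH temperature is a hypothetical placeholder. The function name `k2_masses` is also hypothetical.

```python
TEFF_SUN = 5772.0  # K, solar effective temperature used in this notebook


def k2_masses(kappa_by_epic, tmass_by_epic, teff_by_tmass):
    """Return {EPIC: mass in solar masses} for stars present in every table."""
    masses = {}
    for epic, kappa in kappa_by_epic.items():
        tmass = tmass_by_epic.get(epic)
        # Skip stars without a 2MASS ID or without a GALAH temperature
        if not tmass or tmass not in teff_by_tmass:
            continue
        teff = teff_by_tmass[tmass]
        # Scaling relation: M / M_sun = kappa_M * (Teff / Teff_sun)^(3/2)
        masses[epic] = kappa * (teff / TEFF_SUN) ** 1.5
    return masses


kappa_by_epic = {210338347: 1.464, 201051625: 1.324}
tmass_by_epic = {210338347: "03552199+1153369"}  # 201051625 has no match here
teff_by_tmass = {"03552199+1153369": 4800.0}     # hypothetical temperature
masses = k2_masses(kappa_by_epic, tmass_by_epic, teff_by_tmass)
print(masses)
```

Stars that drop out at any join simply receive no mass, which mirrors the inner merges used in the cells below.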

In [2]:
k2_cols = ["EPIC", "kappa-m-c"]
k2_col_width = [(0, 9), (24, 29)]

k2 = pd.read_fwf(
    "RawData/k2_mass_coefficients.txt",
    skiprows=50,
    header=None,
    index_col=False,
    names = k2_cols,
    colspecs = k2_col_width
)
k2

Unnamed: 0,EPIC,kappa-m-c
0,201051625,1.324
1,201079541,1.523
2,201082897,1.027
3,201084449,1.544
4,201084664,1.076
...,...,...
18817,251619708,0.880
18818,251620873,0.663
18819,251621333,1.193
18820,251622742,1.497


In [3]:
max(k2.EPIC)

251630759

In [4]:
min(k2.EPIC)

201051625

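Using VizieR, we map each EPIC ID to its 2MASS ID through the EPIC catalogue (`IV/34/epic`). The query covers the full EPIC ID range of the K2 sample, but it is capped at `row_limit` rows, so if the range holds more entries than that, the returned table is truncated and some K2 stars will not be matched.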
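Before merging, it can help to build the EPIC-to-2MASS lookup explicitly, skipping entries with a blank 2MASS ID (like the last row of the query output below), and to check what fraction of the K2 stars actually received an ID; a low fraction may point to a truncated query. A minimal sketch, assuming the query result has been turned into `(EPIC, 2MASS)` pairs; `build_epic_lookup` and `coverage` are hypothetical helpers, and the example pairs are taken from the outputs in this notebook.

```python
def build_epic_lookup(pairs):
    """Build {EPIC: 2MASS ID} from (EPIC, 2MASS) pairs, skipping blank IDs."""
    lookup = {}
    for epic, tmass in pairs:
        tmass = (tmass or "").strip()
        if tmass:  # some EPIC entries have no 2MASS counterpart
            lookup[int(epic)] = tmass
    return lookup


def coverage(k2_epics, lookup):
    """Fraction of K2 stars that received a 2MASS ID."""
    k2_epics = set(k2_epics)
    if not k2_epics:
        return 0.0
    return len(k2_epics & lookup.keys()) / len(k2_epics)


pairs = [
    (210298744, "03401628+1036320"),
    (210299795, "03400353+1039014"),
    (248072670, ""),  # blank 2MASS ID in the query output
]
lookup = build_epic_lookup(pairs)
print(coverage([210298744, 201051625], lookup))  # 0.5
```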

In [10]:
from astroquery.vizier import Vizier
Vizier.ROW_LIMIT = -1
EPIC_catalogue_name = "IV/34/epic"

In [11]:
v = Vizier(catalog=EPIC_catalogue_name, columns=["ID", "2MASS"], row_limit=999999)

In [13]:
# Select EPIC entries strictly between (min EPIC - 1) and (max EPIC + 1) of the K2 sample
result = v.query_constraints(ID="<251630760 and >201051624")

In [14]:
result[EPIC_catalogue_name]

ID,_2MASS
int32,str17
210298744,03401628+1036320
210299795,03400353+1039014
210299819,03395312+1039051
210300805,03395427+1041184
210300557,03394671+1040444
210300598,03394731+1040497
210300653,03394673+1040584
210300883,03394961+1041291
210301197,03395050+1042094
210301443,03393859+1042419


In [18]:
EPICTo2MASS = pd.DataFrame(columns=["EPIC"], data = result[EPIC_catalogue_name]["ID"])
EPICTo2MASS["tmass_id"] = result[EPIC_catalogue_name]["_2MASS"]

In [21]:
EPICTo2MASS

Unnamed: 0,EPIC,tmass_id
0,210298744,03401628+1036320
1,210299795,03400353+1039014
2,210299819,03395312+1039051
3,210300805,03395427+1041184
4,210300557,03394671+1040444
...,...,...
999994,248073069,04543411+2622019
999995,248072141,04542657+2621355
999996,248072820,04542827+2621556
999997,248072670,


In [24]:
EPICTo2MASS = pd.merge(k2, EPICTo2MASS, on="EPIC")
EPICTo2MASS

Unnamed: 0,EPIC,kappa-m-c,tmass_id
0,210318757,1.340,03521798+1118166
1,210320099,1.432,03523395+1120459
2,210326245,0.722,03520963+1132166
3,210326681,2.103,03535366+1133062
4,210338347,1.464,03552199+1153369
...,...,...,...
1315,248139237,1.834,04394270+2655256
1316,248140372,2.066,04403807+2656003
1317,248141614,0.809,04403223+2656416
1318,248141616,1.350,04391822+2656415


Merge with the GALAH dataset

In [41]:
galah = pd.read_csv("ProcessedData/cleaned_galah.csv")
# Strip the b'...' wrapper left by writing byte strings to CSV
galah["tmass_id"] = galah["tmass_id"].str[2:-1]
galah

Unnamed: 0,sobject_id,tmass_id,gaiadr3_source_id,ra,dec,flag_sp,chi2_sp,model_name,teff,e_teff,...,rv_k_is,ew_dib5780,sigma_dib5780,rv_dib5780,ew_dib5797,sigma_dib5797,rv_dib5797,ew_dib6613,sigma_dib6613,rv_dib6613
0,131216001101010,b'05225435-5857055',4762753422822077184,80.726471,-58.951542,0,0.801761,b'4750_2.50_-0.25',4747.2780,11.832772,...,85.141570,0.023134,0.252208,26.350906,0.018000,0.101814,-39.829533,0.029444,1.492405,31.226997
1,131216001101014,b'05242161-5901319',4762739782006427648,81.090057,-59.025547,0,0.866050,b'5000_4.50_0.00',5007.8013,9.640579,...,-56.604633,0.023699,0.536554,-5.562334,0.025659,0.737862,-5.556193,0.015717,0.446130,-5.526703
2,131216001101017,b'05221820-5902598',4762730882833911808,80.575874,-59.049953,0,0.843927,b'4750_4.50_0.00',4801.3870,8.895825,...,167.882300,-0.012029,0.074605,39.358860,0.016823,0.127976,-15.108765,0.005813,0.160391,-8.338925
3,131216001101022,b'05222284-5903420',4762727927896411392,80.595207,-59.061672,0,1.240397,b'5250_4.50_0.25',5278.6343,4.595638,...,-13.605973,0.015943,0.222795,30.253696,-0.010231,0.392958,-41.382510,0.007021,0.049410,11.469566
4,131216001101024,b'05225971-5905550',4762726897104253824,80.748810,-59.098625,0,0.875987,b'4500_4.00_0.00',4613.5550,14.729848,...,-133.568000,0.021436,0.170836,89.049570,-0.026330,0.095232,41.361435,0.009535,0.044633,11.150452
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
63569,220422002101020,b'08490304-4234488',5524145365131834880,132.262680,-42.580215,0,0.907719,b'4250_1.50_-0.50',4326.2590,6.605823,...,-27.567528,0.242692,0.821876,-4.200578,0.081907,0.482253,-31.767265,0.053212,0.267420,-4.404961
63570,220422002101081,b'08482763-4300045',5524113543714103168,132.115097,-43.001232,0,0.852804,b'5000_3.00_-0.25',4892.6210,7.416252,...,7.454720,0.013022,0.100351,42.018845,0.015768,0.125131,-49.141647,0.015822,0.718062,-11.059895
63571,220422002101199,b'08431567-4230079',5524898599314763136,130.815277,-42.502193,0,0.756809,b'5000_3.00_0.00',4921.5170,12.264063,...,27.553760,0.220546,1.030082,45.633423,0.062203,1.144449,13.689507,0.048689,0.384843,61.364906
63572,220422002101203,b'08433930-4230037',5524851114156313600,130.913773,-42.501072,0,0.790913,b'4500_2.50_0.00',4544.5430,24.683070,...,-41.585598,0.479357,0.681622,-23.666155,0.169192,0.646230,-36.374010,0.216087,0.376157,-15.404000


In [45]:
k2WithMass = pd.merge(EPICTo2MASS, galah, on="tmass_id")

In [46]:
k2WithMass

Unnamed: 0,EPIC,kappa-m-c,tmass_id,sobject_id,gaiadr3_source_id,ra,dec,flag_sp,chi2_sp,model_name,...,rv_k_is,ew_dib5780,sigma_dib5780,rv_dib5780,ew_dib5797,sigma_dib5797,rv_dib5797,ew_dib6613,sigma_dib6613,rv_dib6613
0,210338347,1.464,03552199+1153369,160109002001118,36770937305228032,58.841625,11.893583,0,0.901462,b'4750_2.00_-0.50',...,-29.411842,0.260102,0.843183,-10.689837,0.064369,0.381781,-29.708410,0.069195,0.335458,-6.634755
1,210349156,1.815,03534985+1211549,151110003701190,36841924524592128,58.457706,12.198611,0,1.047793,b'4750_2.50_0.00',...,22.865572,0.181036,0.918487,43.105785,0.069973,0.670325,15.724678,0.040888,0.299672,48.644352
2,210352451,1.545,03551092+1217213,160109002001223,38302896302859776,58.795540,12.289278,0,1.029681,b'4750_3.00_0.00',...,-43.256960,0.199201,0.936750,-11.260378,0.059218,0.362096,-39.275707,0.050964,0.463754,-15.328494
3,210354399,2.022,03533889+1220365,151110003701205,36852777904768384,58.412125,12.343472,0,0.966639,b'4500_2.00_0.00',...,46.464798,0.203011,0.999752,57.970570,0.061306,0.213619,52.812390,0.053367,0.294525,71.317500
4,210365611,1.656,03551179+1238117,151110003701277,38365813278903168,58.799168,12.636611,0,1.007371,b'4500_2.50_0.00',...,-127.117096,0.205928,0.943600,25.137930,0.036154,0.154188,-8.065115,-0.041152,0.502284,42.394573
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
322,248062122,1.662,04450672+2616471,170127001601383,148314772738400512,71.278038,26.279751,0,0.885755,b'4750_2.00_-0.50',...,34.263050,0.277230,1.104694,51.180550,-0.076777,0.324829,43.911167,-0.128312,0.336749,61.070377
323,248075236,1.413,04433443+2623011,170127001601352,148327318338432256,70.893501,26.383694,0,0.831457,b'5000_3.00_-0.50',...,65.361206,0.202883,1.126538,81.940250,-0.103720,0.317693,68.764000,-0.007863,0.098643,7.709849
324,248095314,2.228,04413875+2632381,170127001601264,148535263475584512,70.411499,26.543917,0,0.934626,b'4500_2.00_-0.25',...,39.177353,0.237821,1.223675,60.534985,-0.070386,0.244205,50.768234,0.117147,0.353773,63.977264
325,248100939,1.275,04433475+2635256,170127001601334,148525367870877824,70.894791,26.590500,0,1.291657,b'4750_2.50_0.00',...,38.992256,0.217997,1.074829,64.467674,-0.077106,0.232650,47.289577,0.128313,0.382951,65.287200


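To determine the mass of the K2 stars, we use the following scaling relation (from K2 Galactic Archaeology Program DR3):

$$ \frac{M}{M_\odot} = \kappa_M \left(\frac{T_\mathrm{eff}}{T_{\mathrm{eff},\odot}}\right)^{3/2} $$

where $\kappa_M$ is the mass coefficient (`kappa-m-c`) from the K2 table and $T_\mathrm{eff}$ is taken from GALAH.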

To do so, we will use [gaia-kepler.fun](https://gaia-kepler.fun) to link the K2 and GALAH data:
- More information on the tables in that data set is given in [this documentation](https://exoplanetarchive.ipac.caltech.edu/docs/API_keplerstellar_columns.html)
- To do this, we will first match the K2 data with `k2ToGaia`
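Because the mass scales as $T_\mathrm{eff}^{3/2}$, a temperature error is amplified by a factor of 1.5 in the fractional mass error. To first order, assuming independent uncertainties, $\sigma_M / M = \sqrt{(\sigma_{\kappa}/\kappa_M)^2 + (\tfrac{3}{2}\,\sigma_{T}/T_\mathrm{eff})^2}$. A minimal sketch of the scaling relation with this error propagation; the function name `scaling_mass` is hypothetical, GALAH provides `e_teff`, and no uncertainty on `kappa-m-c` is read in this notebook, so `kappa_err` defaults to zero.

```python
import math

TEFF_SUN = 5772.0  # K, solar effective temperature used in this notebook


def scaling_mass(kappa, teff, kappa_err=0.0, teff_err=0.0):
    """Return (mass, mass_error) in solar masses from the K2 scaling relation.

    Assumes independent, small errors (first-order propagation).
    """
    mass = kappa * (teff / TEFF_SUN) ** 1.5
    # Fractional errors add in quadrature; the Teff term carries the 3/2 exponent
    frac_err = math.hypot(kappa_err / kappa, 1.5 * teff_err / teff)
    return mass, mass * frac_err


print(scaling_mass(1.0, 5772.0, teff_err=57.72))
```

For example, a 1% temperature error alone gives a 1.5% mass error, so the precision of the GALAH temperatures matters noticeably for the derived K2 masses.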



In [61]:
# According to the International Astronomical Union (IAU), the effective temperature of the Sun is 5772 +/- 0.8 K
teff_sun = 5772.0
# M / M_sun = kappa_M * (Teff / Teff_sun)^(3/2)
k2WithMass["mass"] = pd.to_numeric(k2WithMass["kappa-m-c"]) * (k2WithMass["teff"] / teff_sun) ** (3 / 2)

In [64]:
k2WithMass.to_csv("ProcessedData/cleaned_k2.csv", index=False)