# Purpose of Notebook
The goal of this code is to:
- Create models for predicting mass using a set of parameters:
    - Effective Temperature
    - Log g
    - Iron Abundance [Fe/H]
    - Alpha Abundance [alpha/Fe]
    - Nitrogen Abundance [N/Fe]
    - Oxygen Abundance [O/Fe]
- Build one model from each of two training datasets, K2 and APOKSAC:
    - Use the K2 model to predict masses for APOGEE and GALAH
    - Use the APOKSAC model to predict masses for APOGEE

A good question is: why do all of this?
- Although the datasets used come with their own mass prediction models, those models are general-purpose
- Since we are interested specifically in low-mass stars (up to 2.5 solar masses), we want mass predictions tailored to that range
- After applying the models, we should also compare the mass distributions they predict to see whether the two models give similar results
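
One possible way to carry out the distribution comparison mentioned above is a two-sample Kolmogorov-Smirnov statistic, which measures the largest gap between the two empirical cumulative distributions. This is only a sketch under assumptions: the helper `ks_statistic` and the arrays `mass_k2` and `mass_apoksac` are hypothetical names, and the masses are synthetic stand-ins rather than real predictions.

```python
import numpy as np

def ks_statistic(a, b):
    """Largest gap between the empirical CDFs of two samples (0 = identical, 1 = disjoint)."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    # Evaluate both empirical CDFs at every observed value
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))

rng = np.random.default_rng(0)
# Synthetic stand-ins for the masses (in solar masses) predicted by each model
mass_k2 = rng.normal(1.2, 0.3, size=1000)
mass_apoksac = rng.normal(1.3, 0.3, size=1000)

print("medians:", np.median(mass_k2), np.median(mass_apoksac))
print("KS statistic:", ks_statistic(mass_k2, mass_apoksac))
```

SciPy's `scipy.stats.ks_2samp` computes the same statistic together with a p-value, so it may be the more convenient choice if SciPy is available.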

In [None]:
import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

# K2 Model
First, we create the model using the cleaned data.

In [None]:
# Import data (placeholders: fill these with the cleaned K2 data)
# X = feature table (one column per entry in `features`), y = stellar mass
X = []
y = []
# The parameters we will be using to create the model
features = ["teff", "logg", "fe_h", "al_fe", "c_fe", "n_fe", "o_fe"]
seed = 0 # Make sure results are replicable
# Use 80% of the data to train the model and the remaining 20% to verify it
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.8, random_state=seed)
# Create the model
k2_model = RandomForestRegressor(n_estimators=200, random_state=seed)
k2_model.fit(X_train, y_train)
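
As an illustration of how `X` and `y` might be filled in, the sketch below builds them from a pandas DataFrame and runs the same split and fit. Everything about the data is an assumption: the table is randomly generated, the target column name `mass` is hypothetical, and a smaller forest is used so that the example runs quickly.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

features = ["teff", "logg", "fe_h", "al_fe", "c_fe", "n_fe", "o_fe"]

rng = np.random.default_rng(0)
n_stars = 300
# Synthetic stand-in for the cleaned table; real values would be loaded from file
data = pd.DataFrame(rng.normal(size=(n_stars, len(features))), columns=features)
# Hypothetical target column "mass": a toy function of two features plus noise
data["mass"] = (1.2 + 0.2 * data["logg"] - 0.1 * data["fe_h"]
                + rng.normal(scale=0.05, size=n_stars))

# Select the feature columns and the target
X = data[features]
y = data["mass"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, random_state=0
)

# Smaller forest than in the notebook, only to keep the sketch fast
model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X_train, y_train)
print(X_train.shape, X_test.shape)
```

Selecting columns by name through `features` keeps the training columns in a fixed order, which matters later when the fitted model is applied to a different catalogue.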

Now that we have created the model, we need to assess its performance.
To do this, we compute the root-mean-squared error (RMSE) and the correlation coefficient between the predicted and true masses.

In [None]:
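from sklearn.metrics import mean_squared_error

y_pred = k2_model.predict(X_test)
# Root-mean-squared error between true and predicted masses on the test set
score = np.sqrt(mean_squared_error(y_test, y_pred))
# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is the correlation coefficient
corr = np.corrcoef(y_test, y_pred)[0, 1]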
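
To make the two metrics concrete, here is a small hand-checkable sketch using NumPy only. The arrays `y_true` and `y_hat` are made-up values, not results from the K2 model.

```python
import numpy as np

# Made-up true and predicted masses (solar masses), for illustration only
y_true = np.array([1.0, 1.5, 2.0, 2.5])
y_hat = np.array([1.1, 1.4, 2.2, 2.3])

# Root-mean-squared error: square the residuals, average, then take the square root
rmse = np.sqrt(np.mean((y_true - y_hat) ** 2))

# np.corrcoef returns the full 2x2 correlation matrix;
# the Pearson coefficient between the two arrays is the off-diagonal entry
corr_matrix = np.corrcoef(y_true, y_hat)
r = corr_matrix[0, 1]

print(f"RMSE = {rmse:.3f}, r = {r:.3f}")
```

The residuals here are 0.1 and 0.2 in magnitude, so the mean squared error is 0.025 and the RMSE is about 0.158, in the same units as the masses.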

Initial parameters to create the model:
- `n_estimators = 200`
- `train_size = 0.8`

# K2 and GALAH

# K2 and APOGEE

# APOKSAC Model

# APOKSAC and