## Exo-Model & Predict

This notebook is a tutorial for the `exo_model_predict` module, which trains a model on your data to predict whether a system likely has a single planet or multiple planets. Training needs a few features: the mass of the central star, its effective temperature, its radius, and the method of discovery. Only a portion of your data is used to train the model; the rest is held out for testing. You set the proportion held out for testing with the `test_size` argument when you initialize the `ExoTrainer` class. Typical training/testing proportions are 70/30 and 80/20 (`test_size=0.3` and `test_size=0.2`).
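To make the role of `test_size` concrete, here is a minimal sketch of a shuffled train/test split using only the standard library. The helper `split_indices` is hypothetical and is not part of `exo_model_predict`; the module may split its data differently.

```python
import random


def split_indices(n_rows, test_size=0.2, seed=0):
    """Shuffle row indices and split them into (train, test) index lists.

    Hypothetical helper for illustration only; not the module's own code.
    """
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_test = round(n_rows * test_size)    # number of rows held out for testing
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = split_indices(n_rows=10, test_size=0.2)
print(len(train_idx), len(test_idx))  # 8 2
```

With `test_size=0.2`, two of every ten rows are held out, which corresponds to the 80/20 split.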

In [1]:
import pandas as pd
import exo_model_predict as mp

### 1) Load in the data to be used to train the model

The data should include all the feature columns, as well as the 'labels' (the answers) needed to train the model. The label column should contain the number of planets in each system.
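The predictions later in this notebook are reported as `single planet` or `multiple planets`, so the planet counts are presumably collapsed into two classes at some point. The sketch below shows one way that mapping could look. The function `to_label` and the threshold of more than one planet are assumptions for illustration, not the module's documented behavior.

```python
def to_label(n_planets):
    """Map a planet count to a two-class label (assumed rule: more than one)."""
    return "multiple planets" if n_planets > 1 else "single planet"


# Example planet counts, as they might appear in the label column
planet_counts = [1, 3, 1, 2]
labels = [to_label(n) for n in planet_counts]
print(labels)
```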

In [2]:
df = pd.read_csv("../data/planets_edited.csv", skiprows=2)

In [3]:
# Create an instance of the class
exo_model = mp.ExoTrainer(
    df,
    mass_col="st_mass",
    temp_col="st_teff",
    rad_col="st_rad",
    discmethod="pl_discmethod",
    pl_pnum="pl_pnum",
    test_size=0.2,
)

### 2) Use the `make_exomodel` function to train the model

This function returns the data that was used for testing (`X_test` and `y_test`), as well as the trained model. Once training has completed, it prints the accuracy, confusion matrix, recall, and precision of the model.

In [4]:
X_test, y_test, model = exo_model.make_exomodel()

Accuracy: 85.37%
Confusion:[[21700  2600]
 [ 3500 13900]]
Recall: 79.89%
Precision: 84.24%
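The printed metrics can be reproduced from the confusion matrix alone. The sketch below assumes the matrix follows the scikit-learn layout (rows are true classes, columns are predicted classes, positive class second). The notebook does not state this, but the printed numbers are consistent with it.

```python
# Confusion matrix copied from the output above.
# Assumed layout: [[TN, FP], [FN, TP]]
confusion = [[21700, 2600],
             [3500, 13900]]

tn, fp = confusion[0]
fn, tp = confusion[1]

accuracy = (tp + tn) / (tp + tn + fp + fn)  # fraction of all predictions that are correct
recall = tp / (tp + fn)                     # fraction of true positives that were found
precision = tp / (tp + fp)                  # fraction of positive predictions that are correct

print(f"Accuracy: {accuracy:.2%}")    # Accuracy: 85.37%
print(f"Recall: {recall:.2%}")        # Recall: 79.89%
print(f"Precision: {precision:.2%}")  # Precision: 84.24%
```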


### 3) Use the `predict_exoplanets` function to get your results!

Read in your new data (where you have no idea what the number of planets is!) and let your model work its magic! For this tutorial, we'll use the TESS confirmed exoplanets. Ensure that the column names in the new dataframe match those that you used to train the model.
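A quick check before predicting can catch mismatched column names early. This is a minimal sketch: `missing_columns` and `REQUIRED_COLUMNS` are hypothetical names, and the list holds the feature columns used in this tutorial.

```python
# Feature columns used for training in this tutorial
REQUIRED_COLUMNS = ["st_mass", "st_teff", "st_rad", "pl_discmethod"]


def missing_columns(available, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from `available`."""
    available = set(available)
    return [name for name in required if name not in available]


# Example with a plain list of column names
missing = missing_columns(["st_mass", "st_teff", "st_rad"])
print(missing)  # ['pl_discmethod']
```

The function accepts any iterable of names, so it can also be called as `missing_columns(tess_data.columns)` once the dataframe is loaded. An empty list means every feature column is present.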

In [5]:
tess_data = pd.read_csv("../data/tess_planets.csv", skiprows=363)

In [6]:
exo_predictions = exo_model.predict_exoplanets(
    data=tess_data,
    mass_col="st_mass",
    temp_col="st_teff",
    rad_col="st_rad",
    discmethod="pl_discmethod",
)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  X_data["predictions"] = y_pred_df["predictions"]


In [7]:
print(exo_predictions)

    st_mass  st_teff  st_rad  pct_discmethod       predictions
0      0.50   3700.0    0.75        0.961538     single planet
1      1.01   5428.0    0.96        0.961538     single planet
2      0.38   3458.0    0.39        0.961538     single planet
3      0.73   4640.0    0.69        0.961538     single planet
4      0.34   3505.0    0.34        0.961538  multiple planets
5      1.32   5521.0    2.34        0.961538     single planet
6      0.90   5125.0    0.86        0.961538     single planet
7      0.90   5125.0    0.86        0.961538     single planet
8      1.72   6272.0    2.59        0.961538     single planet
9      1.07   5978.0    1.10        0.961538     single planet
10     0.73   4640.0    0.69        0.961538     single planet
11     0.92   5527.0    1.03        0.961538     single planet
12     1.21   5080.0    2.94        0.961538     single planet
13     0.75      NaN    0.73        0.961538  multiple planets
14     0.75      NaN    0.73        0.961538  multiple 