# TensorFlow: Tabular Regression

*Predicting Exoplanet Surface Temperature Using Kepler Satellite Sensor Data.*

![planets](../../../images/banner/planets.png)

In [3]:
from aiqc import mlops, datum

---

## Example Data

This dataset comprises:

* *Features* = characteristics of the planet in the context of its solar system.
* *Label* = the temperature of the planet.

Reference [Example Datasets](example_datasets.ipynb) for more information.

In [4]:
df = datum.to_pandas('exoplanets.parquet')

In [5]:
df.head()

|    | TypeFlag | PlanetaryMassJpt | PeriodDays | SurfaceTempK | DistFromSunParsec | HostStarMassSlrMass | HostStarRadiusSlrRad | HostStarMetallicity | HostStarTempK |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 5 | 0 | 0.25 | 19.22418 | 707.2 | 650.0 | 1.07 | 1.02 | 0.12 | 5777.0 |
| 6 | 0 | 0.17 | 39.03106 | 557.9 | 650.0 | 1.07 | 1.02 | 0.12 | 5777.0 |
| 7 | 0 | 0.022 | 1.592851 | 1601.5 | 650.0 | 1.07 | 1.02 | 0.12 | 5777.0 |
| 15 | 0 | 1.24 | 2.705782 | 2190.0 | 200.0 | 1.63 | 2.18 | 0.12 | 6490.0 |
| 16 | 0 | 0.0195 | 1.580404 | 604.0 | 14.55 | 0.176 | 0.2213 | 0.1 | 3250.0 |
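
To make the feature/label distinction concrete, the sketch below rebuilds the five preview rows as a standalone pandas frame and separates the label column from the features. It is only an illustration: the name `preview` is hypothetical, and the pipeline in the next section is instead told which column is the label through its `label_column` argument.

```python
import pandas as pd

# The five rows displayed by `df.head()`, keyed by their original index.
preview = pd.DataFrame(
    {
        "TypeFlag": [0, 0, 0, 0, 0],
        "PlanetaryMassJpt": [0.25, 0.17, 0.022, 1.24, 0.0195],
        "PeriodDays": [19.22418, 39.03106, 1.592851, 2.705782, 1.580404],
        "SurfaceTempK": [707.2, 557.9, 1601.5, 2190.0, 604.0],
        "DistFromSunParsec": [650.0, 650.0, 650.0, 200.0, 14.55],
        "HostStarMassSlrMass": [1.07, 1.07, 1.07, 1.63, 0.176],
        "HostStarRadiusSlrRad": [1.02, 1.02, 1.02, 2.18, 0.2213],
        "HostStarMetallicity": [0.12, 0.12, 0.12, 0.12, 0.1],
        "HostStarTempK": [5777.0, 5777.0, 5777.0, 6490.0, 3250.0],
    },
    index=[5, 6, 7, 15, 16],
)

label_column = "SurfaceTempK"
label = preview[label_column]                    # the value to predict
features = preview.drop(columns=[label_column])  # everything else

print(features.shape, label.shape)
```

Every column except `SurfaceTempK` ends up on the feature side, which mirrors the `feature_cols_excluded = 'SurfaceTempK'` setting used in the pipeline.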


---

## Pipeline

Reference [High-Level API Docs](api_high_level.ipynb) for more information.

In [None]:
from sklearn.preprocessing import StandardScaler, RobustScaler, OneHotEncoder

In [15]:
splitset = mlops.Pipeline.Tabular(
    # --- Data source ---
    df_or_path = df
    , dtype = None

    # --- Label preprocessing ---
    , label_column = 'SurfaceTempK'
    , label_interpolater = None
    , label_encoder = dict(sklearn_preprocess = StandardScaler(copy=False))

    # --- Feature preprocessing ---
    , feature_cols_excluded = 'SurfaceTempK'
    , feature_interpolaters = None
    , feature_window = None
    , feature_encoders = [
        dict(dtypes=['float64'], sklearn_preprocess=RobustScaler(copy=False)),
        dict(dtypes=['int64'], sklearn_preprocess=OneHotEncoder(sparse=False))
    ]
    , feature_reshape_indices = None

    # --- Stratification ---
    , size_test = 0.12
    , size_validation = 0.22
    , fold_count = None
    , bin_count = 4
)


___/ featurecoder_index: 0 \_________

=> The column(s) below matched your filter(s) featurecoder filters.

['PlanetaryMassJpt', 'PeriodDays', 'DistFromSunParsec', 'HostStarMassSlrMass', 'HostStarRadiusSlrRad', 'HostStarMetallicity', 'HostStarTempK']

=> The remaining column(s) and dtype(s) are available for downstream featurecoder(s):
{'TypeFlag': 'int64'}


___/ featurecoder_index: 1 \_________

=> The column(s) below matched your filter(s) featurecoder filters.

['TypeFlag']

=> Done. All feature column(s) have featurecoder(s) associated with them.
No more Featurecoders can be added to this Encoderset.
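
The log above shows each encoder claiming columns by dtype: the `float64` columns go to `RobustScaler` and the `int64` column goes to `OneHotEncoder`, while the label is handled by `StandardScaler`. As a rough illustration of what those scikit-learn transformers compute with their default settings, here is a plain-Python sketch of the arithmetic. The helper names are hypothetical, and the real encoders are fitted by AIQC rather than called like this.

```python
import statistics

def standard_scale(values):
    """StandardScaler arithmetic: subtract the mean, divide by the population std."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

def robust_scale(values):
    """RobustScaler arithmetic (defaults): subtract the median, divide by the IQR."""
    median = statistics.median(values)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return [(v - median) / (q3 - q1) for v in values]

def one_hot(values):
    """OneHotEncoder arithmetic: one 0/1 column per distinct category."""
    categories = sorted(set(values))
    return [[1.0 if v == c else 0.0 for c in categories] for v in values]

# Label and one float64 feature, taken from the five rows shown by `df.head()`.
surface_temp_k = [707.2, 557.9, 1601.5, 2190.0, 604.0]
period_days = [19.22418, 39.03106, 1.592851, 2.705782, 1.580404]
# Hypothetical flag values for illustration only (the preview rows are all 0).
type_flag = [0, 1, 0, 2]

scaled_label = standard_scale(surface_temp_k)
scaled_period = robust_scale(period_days)
encoded_flag = one_hot(type_flag)

print(scaled_period)
print(encoded_flag)
```

Centering on the median and dividing by the interquartile range makes the feature scaling less sensitive to extreme values than the mean and standard deviation would be, which is the usual reason to pick `RobustScaler` for skewed measurements such as `PeriodDays`.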



---

## Modeling

For Keras regression recipes, see the [MachineLearningMastery.com "Regression"](https://machinelearningmastery.com/regression-tutorial-keras-deep-learning-library-python/) tutorial.

In [None]:
import tensorflow as tf
from tensorflow.keras import layers as l
from aiqc.utils.tensorflow import TrainingCallback

In [16]:
def fn_build(features_shape, label_shape, **hp):
    m = tf.keras.models.Sequential()    
    m.add(l.Input(shape=features_shape))

    # Hyperparameters control the topology: `blocks` sets how many
    # hidden blocks are stacked, and every block follows the same pattern.
    for block in range(hp['blocks']):
        # `neuron_count` sets the width of each hidden Dense layer.
        m.add(l.Dense(hp['neuron_count']))

        # Block order: BatchNorm, Activation, Dropout (B.A.D.).
        # `batch_norm` toggles the normalization layer.
        if hp['batch_norm']:
            m.add(l.BatchNormalization())
      
        m.add(l.Activation('relu'))
        m.add(l.Dropout(0.2))
              
    m.add(l.Dense(label_shape[0]))
    return m
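
To see how the hyperparameters change the topology without building a real model, the sketch below lists the layers that `fn_build` would add, in order, for one hyperparameter set. `planned_layers` is a hypothetical helper written for this illustration, and a label width of 1 is an assumption based on the single `SurfaceTempK` label column.

```python
def planned_layers(label_width, **hp):
    """List the layers `fn_build` would add, in order, for one hyperparameter set."""
    layers = []
    for _ in range(hp["blocks"]):
        layers.append(f"Dense({hp['neuron_count']})")
        if hp["batch_norm"]:
            layers.append("BatchNormalization")
        layers.append("Activation(relu)")
        layers.append("Dropout(0.2)")
    # Final Dense layer has no activation, so the output is linear.
    layers.append(f"Dense({label_width})")
    return layers

with_bn = planned_layers(1, blocks=2, neuron_count=24, batch_norm=True)
without_bn = planned_layers(1, blocks=2, neuron_count=24, batch_norm=False)

for name in with_bn:
    print(name)
print(len(with_bn), len(without_bn))
```

The output layer is left without an activation so that the network can emit any real value, which suits a regression target that has been standardized.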

In [17]:
def fn_train(model, loser, optimizer, samples_train, samples_evaluate, **hp):
    model.compile(
        loss = loser
        , optimizer = optimizer
        , metrics = ['mean_squared_error']
    )
    
    # Thresholds passed to AIQC's MetricCutoff callback.
    metrics_cutoffs = [
        {"metric":"val_loss", "cutoff":0.025, "above_or_below":"below"},
        {"metric":"loss", "cutoff":0.025, "above_or_below":"below"}
    ]
    cutoffs = TrainingCallback.MetricCutoff(metrics_cutoffs)
    
    model.fit(
        samples_train["features"]
        , samples_train["labels"]
        , validation_data = (
            samples_evaluate["features"]
            , samples_evaluate["labels"]
        )
        , verbose = 0
        , batch_size = hp['batch_size']
        , callbacks = [tf.keras.callbacks.History(), cutoffs]
        , epochs = hp['epoch_count']
    )
    return model
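
The cutoff rules passed to `TrainingCallback.MetricCutoff` read naturally as "stop once `val_loss` and `loss` are both below 0.025". The sketch below shows one way such a check could work. It is an assumption about the semantics, not AIQC's implementation: `cutoffs_met` is a hypothetical helper, the inclusive comparison is a guess, and the per-epoch log values are made up for illustration.

```python
def cutoffs_met(logs, cutoffs):
    """Return True when every listed metric is on the required side of its cutoff."""
    for rule in cutoffs:
        value = logs.get(rule["metric"])
        if value is None:
            return False  # metric has not been reported for this epoch
        if rule["above_or_below"] == "below" and not value <= rule["cutoff"]:
            return False
        if rule["above_or_below"] == "above" and not value >= rule["cutoff"]:
            return False
    return True

rules = [
    {"metric": "val_loss", "cutoff": 0.025, "above_or_below": "below"},
    {"metric": "loss", "cutoff": 0.025, "above_or_below": "below"},
]

# Hypothetical end-of-epoch logs, in the shape Keras passes to callbacks.
early_epoch = {"loss": 0.020, "val_loss": 0.031}
later_epoch = {"loss": 0.018, "val_loss": 0.024}

print(cutoffs_met(early_epoch, rules))  # False: val_loss is still above 0.025
print(cutoffs_met(later_epoch, rules))  # True: both metrics are below 0.025
```

Requiring both the training and validation loss to pass guards against stopping on a model that only fits the training split.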

In [18]:
hyperparameters = dict(
    batch_size      = [3]
    , blocks        = [2]
    , batch_norm    = [True, False]
    , epoch_count   = [75]
    , neuron_count  = [24, 36]
    , learning_rate = [0.01]
)
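
Each hyperparameter is given as a list of candidate values. Assuming the experiment searches the full grid over those lists (an assumption, though `search_percent = None` in the next cell points that way), the combinations can be enumerated with `itertools.product`. The variable names below are illustrative only.

```python
from itertools import product

hyperparameters = dict(
    batch_size=[3],
    blocks=[2],
    batch_norm=[True, False],
    epoch_count=[75],
    neuron_count=[24, 36],
    learning_rate=[0.01],
)

keys = list(hyperparameters)
# One dict per point in the grid: the Cartesian product of all value lists.
combinations = [
    dict(zip(keys, values))
    for values in product(*hyperparameters.values())
]

for combo in combinations:
    print(combo)
print(len(combinations))
```

Only `batch_norm` and `neuron_count` have more than one candidate, so the grid holds 2 x 2 = 4 combinations. With `repeat_count = 1`, that is consistent with the four jobs reported by the training progress bar further down.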

In [19]:
queue = mlops.Experiment(
    # --- Analysis type ---
    library = "keras"
    , analysis_type = "regression"
    
    # --- Model functions ---
    , fn_build = fn_build
    , fn_train = fn_train
    , fn_predict = None #auto
    , fn_lose = None #auto
    , fn_optimize = None #auto
    
    # --- Training options ---
    , repeat_count = 1
    , hyperparameters = hyperparameters
    , search_percent = None
    
    # --- Data source ---
    , splitset_id = splitset.id
    , foldset_id = None
    , hide_test = False
)

In [20]:
queue.run_jobs()

🔮 Training Models 🔮: 100%|██████████████████████████████████████████| 4/4 [01:38<00:00, 24.68s/it]


---

## Visualization & Interpretation

For more information on visualization of performance metrics, reference the [Dashboard](dashboard.html) documentation.