# Keras: Regression

![houses](../images/houses.png)

For more machine learning recipes, see the [MachineLearningMastery.com "Regression" tutorial](https://machinelearningmastery.com/regression-tutorial-keras-deep-learning-library-python/).

In [2]:
import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.callbacks import History

from sklearn.preprocessing import StandardScaler, PowerTransformer, OrdinalEncoder

import aiqc
from aiqc import datum

---

## Example Data

Reference [Example Datasets](example_datasets.ipynb) for more information.

In [3]:
df = datum.to_pandas('houses.csv')

In [4]:
df.head()

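| Unnamed: 0 | crim | zn | indus | chas | nox | rm | age | dis | rad | tax | ptratio | lstat | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0 | 0.538 | 6.575 | 65.2 | 4.09 | 1 | 296 | 15.3 | 4.98 | 24.0 |
| 1 | 0.02731 | 0.0 | 7.07 | 0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2 | 242 | 17.8 | 9.14 | 21.6 |
| 2 | 0.02729 | 0.0 | 7.07 | 0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2 | 242 | 17.8 | 4.03 | 34.7 |
| 3 | 0.03237 | 0.0 | 2.18 | 0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3 | 222 | 18.7 | 2.94 | 33.4 |
| 4 | 0.06905 | 0.0 | 2.18 | 0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3 | 222 | 18.7 | 5.33 | 36.2 |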


---

## a) High-Level API

Reference [High-Level API Docs](api_high_level.ipynb) for more information, including how to work with non-tabular data.

In [5]:
splitset = aiqc.Pipeline.Tabular.make(
    dataFrame_or_filePath = df
    , label_column = 'price'
    , size_test = 0.18
    , size_validation = 0.12
    , label_encoder = PowerTransformer(method='box-cox', copy=False)
    , feature_encoders = [
        {
            "sklearn_preprocess": StandardScaler(copy=False)
            , "dtypes": ['float64']
        },
        {
            "sklearn_preprocess": OrdinalEncoder()
            , "dtypes": ['int64']
        }
    ]
    
    , dtype = None
    , features_excluded = None
    , fold_count = None
    , bin_count = None
)


___/ featurecoder_index: 0 \_________

=> The column(s) below matched your filter(s) and were ran through a test-encoding successfully.

['crim', 'zn', 'indus', 'nox', 'rm', 'age', 'dis', 'ptratio', 'lstat']

=> The remaining column(s) and dtype(s) can be used in downstream Featurecoder(s):
{'chas': 'int64', 'rad': 'int64', 'tax': 'int64'}


___/ featurecoder_index: 1 \_________

=> The column(s) below matched your filter(s) and were ran through a test-encoding successfully.

['chas', 'rad', 'tax']

=> Done. All feature column(s) have encoder(s) associated with them.
No more Featurecoders can be added to this Encoderset.
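
The log above suggests how encoders are matched to columns: each entry in `feature_encoders` claims the not-yet-encoded columns whose dtype matches its `dtypes` filter, and the remainder is offered to the next entry. The sketch below mimics that bookkeeping in plain Python, using the column dtypes reported in the log. The helper name `assign_columns` is hypothetical, and this illustrates the idea only; it is not AIQC's implementation.

```python
# Column dtypes as reported by the encoder log above.
column_dtypes = {
    'crim': 'float64', 'zn': 'float64', 'indus': 'float64', 'chas': 'int64',
    'nox': 'float64', 'rm': 'float64', 'age': 'float64', 'dis': 'float64',
    'rad': 'int64', 'tax': 'int64', 'ptratio': 'float64', 'lstat': 'float64',
}

def assign_columns(column_dtypes, dtype_filters):
    """Hypothetical helper: give each filter the unclaimed columns it matches."""
    remaining = dict(column_dtypes)
    assignments = []
    for dtypes in dtype_filters:
        matched = [col for col, dt in remaining.items() if dt in dtypes]
        for col in matched:
            del remaining[col]  # a column can only be claimed once
        assignments.append(matched)
    return assignments, remaining

# Same filter order as the `feature_encoders` list: float64 first, then int64.
assignments, leftover = assign_columns(column_dtypes, [['float64'], ['int64']])
print(assignments[0])
print(assignments[1])
print(leftover)
```

Because columns are claimed in order, a broader filter placed first would leave nothing for later encoders, so the order of the list matters.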



In [6]:
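def fn_build(features_shape, label_shape, **hp):
    model = Sequential()
    # Two hidden layers whose width is set by the `neuron_count` hyperparameter.
    model.add(Dense(units=hp['neuron_count'], kernel_initializer='normal', activation='relu'))
    # Dropout between the hidden layers to reduce overfitting.
    model.add(Dropout(0.15))
    model.add(Dense(units=hp['neuron_count'], kernel_initializer='normal', activation='relu'))
    # Output layer: one unit per label column, no activation (linear) for regression.
    model.add(Dense(units=label_shape[0], kernel_initializer='normal'))
    return model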

In [7]:
def fn_train(model, loser, optimizer, samples_train, samples_evaluate, **hp):
    # `loser` and `optimizer` are passed in by AIQC.
    model.compile(
        loss = loser
        , optimizer = optimizer
        , metrics = ['mean_squared_error']
    )
    # Fit on the training samples and track metrics on the evaluation samples each epoch.
    model.fit(
        samples_train['features'], samples_train['labels']
        , validation_data = (
            samples_evaluate['features'],
            samples_evaluate['labels']
        )
        , verbose = 0
        , batch_size = 3
        , epochs = hp['epochs']
        , callbacks = [History()]
    )
    return model
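
With `batch_size = 3`, each epoch performs many small weight updates, which is part of why training takes several seconds per job. The arithmetic below is a rough sketch of how the update count scales with the `epochs` hyperparameter. The training-set size `n_train` is a hypothetical number chosen for illustration; it is not taken from the dataset.

```python
import math

batch_size = 3
n_train = 700  # hypothetical number of training samples

# Keras runs one weight update per batch; the last batch may be smaller.
steps_per_epoch = math.ceil(n_train / batch_size)

# Total updates for each candidate value of the `epochs` hyperparameter.
total_updates = {epochs: steps_per_epoch * epochs for epochs in (50, 75)}

print(steps_per_epoch)
print(total_updates)
```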

In [8]:
hyperparameters = {
    "neuron_count": [24, 48]
    , "epochs": [50, 75]
}
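
Each key maps to a list of candidate values. Assuming every combination of values is trained once, which is consistent with the 4 jobs reported by the progress bar further down when `repeat_count = 1`, the grid can be expanded as sketched here. This is a plain-Python illustration with `itertools`, not AIQC's own code.

```python
from itertools import product

hyperparameters = {
    "neuron_count": [24, 48],
    "epochs": [50, 75],
}

# Cartesian product of the value lists: one dict per combination.
keys = list(hyperparameters)
combinations = [
    dict(zip(keys, values))
    for values in product(*hyperparameters.values())
]

for combo in combinations:
    print(combo)
print(len(combinations))
```

Each resulting dict has the shape of the `**hp` keyword arguments that `fn_build` and `fn_train` read from, e.g. `hp['neuron_count']` and `hp['epochs']`.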

In [9]:
queue = aiqc.Experiment.make(
    library = "keras"
    , analysis_type = "regression"
    , fn_build = fn_build
    , fn_train = fn_train
    , splitset_id = splitset.id
    , encoderset_id = splitset.encodersets[0].id
    , repeat_count = 1
    , hide_test = False
    , hyperparameters = hyperparameters

    , fn_lose = None #automated
    , fn_optimize = None #automated
    , fn_predict = None #automated
    , foldset_id = None
)

In [10]:
queue.run_jobs()

🔮 Training Models 🔮: 100%|██████████████████████████████████████████| 4/4 [00:41<00:00, 10.34s/it]


For more information on visualization of performance metrics, reference the [Visualization & Metrics](visualization.html) documentation.

---

## b) Low-Level API

Reference [Low-Level API Docs](api_low_level.ipynb) for more information, including how to work with non-tabular data and how to define an optimizer.

In [11]:
dataset = aiqc.Dataset.Tabular.from_pandas(df)

In [12]:
label_column = 'price'

In [13]:
label = dataset.make_label(columns=[label_column])

In [14]:
featureset = dataset.make_featureset(exclude_columns=[label_column])

In [15]:
splitset = featureset.make_splitset(
    label_id = label.id
    , size_test = 0.18
    , size_validation = 0.12
)
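
Only the test and validation sizes are specified; the rest of the samples are left for training. The sketch below works through that arithmetic. The row count `n_rows` is hypothetical, and how AIQC rounds the boundaries internally is not shown in this notebook, so the counts are approximate.

```python
size_test = 0.18
size_validation = 0.12

# Whatever is not held out for testing or validation is left for training.
size_train = round(1.0 - size_test - size_validation, 2)

n_rows = 1000  # hypothetical row count, for illustration only
n_test = round(n_rows * size_test)
n_validation = round(n_rows * size_validation)
n_train = n_rows - n_test - n_validation

print(size_train)
print(n_train, n_validation, n_test)
```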

In [16]:
encoderset = splitset.make_encoderset()

In [17]:
labelcoder = encoderset.make_labelcoder(
    sklearn_preprocess = PowerTransformer(method='box-cox', copy=False)
)
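
The Box-Cox transform reshapes a skewed, strictly positive label toward a more normal distribution. The sketch below shows the underlying formula and its inverse in plain Python. The value of `lam` is a hypothetical choice: `PowerTransformer` estimates lambda from the data and, by default, also standardizes the result, and neither step is reproduced here.

```python
import math

def box_cox(y, lam):
    """Box-Cox transform of a single strictly positive value."""
    if y <= 0:
        raise ValueError("Box-Cox is only defined for strictly positive values")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam

def inverse_box_cox(z, lam):
    """Map a transformed value back to the original scale."""
    if lam == 0:
        return math.exp(z)
    return (lam * z + 1.0) ** (1.0 / lam)

prices = [24.0, 21.6, 34.7, 33.4, 36.2]  # `price` values from df.head()
lam = 0.5  # hypothetical lambda, not the fitted value

transformed = [box_cox(p, lam) for p in prices]
restored = [inverse_box_cox(z, lam) for z in transformed]
print(transformed)
print(restored)
```

Because the transform is invertible, predictions made on the encoded scale can be mapped back to prices.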

In [18]:
featurecoder_0 = encoderset.make_featurecoder(
    sklearn_preprocess = StandardScaler(copy=False)
    , dtypes = ['float64']
)


___/ featurecoder_index: 0 \_________

=> The column(s) below matched your filter(s) and were ran through a test-encoding successfully.

['crim', 'zn', 'indus', 'nox', 'rm', 'age', 'dis', 'ptratio', 'lstat']

=> The remaining column(s) and dtype(s) can be used in downstream Featurecoder(s):
{'chas': 'int64', 'rad': 'int64', 'tax': 'int64'}
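
`StandardScaler` rescales each matched column to zero mean and unit variance. A minimal sketch of that calculation for one column, using the five `rm` values shown in `df.head()` rather than the full dataset:

```python
from statistics import fmean, pstdev

rm = [6.575, 6.421, 7.185, 6.998, 7.147]  # `rm` values from df.head()

mean = fmean(rm)
std = pstdev(rm)  # population standard deviation (ddof=0), as StandardScaler uses

# z-score: subtract the column mean, divide by the column standard deviation.
scaled = [(x - mean) / std for x in rm]
print(scaled)
```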



In [19]:
featurecoder_1 = encoderset.make_featurecoder(
    sklearn_preprocess = OrdinalEncoder()
    , dtypes = ['int64']
)


___/ featurecoder_index: 1 \_________

=> The column(s) below matched your filter(s) and were ran through a test-encoding successfully.

['chas', 'rad', 'tax']

=> Done. All feature column(s) have encoder(s) associated with them.
No more Featurecoders can be added to this Encoderset.
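
`OrdinalEncoder` replaces each distinct value in a column with an integer code based on the sorted order of the values it saw while fitting. A minimal sketch for one column, using the five `rad` values shown in `df.head()`. Note that scikit-learn returns the codes as floats by default; integers are used here for readability.

```python
rad = [1, 2, 2, 3, 3]  # `rad` values from df.head()

# Distinct values in sorted order define the categories.
categories = sorted(set(rad))

# Each category is assigned its position in that sorted list.
code_of = {value: code for code, value in enumerate(categories)}

encoded = [code_of[value] for value in rad]
print(code_of)
print(encoded)
```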



In [20]:
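def fn_build(features_shape, label_shape, **hp):
    model = Sequential()
    # Two hidden layers whose width is set by the `neuron_count` hyperparameter.
    model.add(Dense(units=hp['neuron_count'], kernel_initializer='normal', activation='relu'))
    # Dropout between the hidden layers to reduce overfitting.
    model.add(Dropout(0.15))
    model.add(Dense(units=hp['neuron_count'], kernel_initializer='normal', activation='relu'))
    # Output layer: one unit per label column, no activation (linear) for regression.
    model.add(Dense(units=label_shape[0], kernel_initializer='normal'))
    return model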

In [21]:
def fn_train(model, loser, optimizer, samples_train, samples_evaluate, **hp):
    # `loser` and `optimizer` are passed in by AIQC.
    model.compile(
        loss = loser
        , optimizer = optimizer
        , metrics = ['mean_squared_error']
    )
    # Fit on the training samples and track metrics on the evaluation samples each epoch.
    model.fit(
        samples_train['features'], samples_train['labels']
        , validation_data = (
            samples_evaluate['features'],
            samples_evaluate['labels']
        )
        , verbose = 0
        , batch_size = 3
        , epochs = hp['epochs']
        , callbacks = [History()]
    )
    return model

In [22]:
algorithm = aiqc.Algorithm.make(
    library = "keras"
    , analysis_type = "regression"
    , fn_build = fn_build
    , fn_train = fn_train
)

In [23]:
hyperparameters = {
    "neuron_count": [24, 48]
    , "epochs": [50, 75]
}

In [24]:
hyperparamset = algorithm.make_hyperparamset(
    hyperparameters = hyperparameters
)

In [25]:
queue = algorithm.make_queue(
    splitset_id = splitset.id
    , hyperparamset_id = hyperparamset.id
    , encoderset_id = encoderset.id
    , repeat_count = 1
)

In [26]:
queue.run_jobs()

🔮 Training Models 🔮: 100%|██████████████████████████████████████████| 4/4 [00:41<00:00, 10.35s/it]


For more information on visualization of performance metrics, reference the [Visualization & Metrics](visualization.html) documentation.