# Model classes user guide

## What are model classes in OptimusAI?

Model classes provide OAI-specific predictive modeling functionality and **define the unified API for regression models**.

<div class="alert alert-info">
<b>Note</b>

The `modeling` package only supports OptimusAI-related use cases, where models are used to model dependencies in the data. In other words, we treat models as descriptive rather than predictive.
    
This means that modeling functionality **is designed and tested for solving regression problems, and is not designed for classification problems**.
    
You can still try to use the `modeling` package for classification problems at your own risk.
</div>

## Why do we use model classes?

Model classes separate modeling logic from the actual usage of the model. This principle leads to the following advantages:

- **Code extensibility**: Modeling logic can be extended or modified using the inheritance mechanism.
The model interface remains the same, which means that other project components do not require changes if one `ModelBase`
inheritor class is used instead of another.
- **Code usability**: Model-related artifacts can be stored in a single model object as attributes.
That simplifies model usage, i.e. the user doesn't need to keep track of which features a particular model uses.
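
The separation described above can be sketched with a toy interface. The classes `ToyModelBase`, `SumModel` and `MeanModel` and the `report` function are hypothetical stand-ins invented for this illustration; they are not part of the `modeling` package, and rows are plain dictionaries so that the sketch has no dependencies.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class ToyModelBase(ABC):
    """Hypothetical shared interface (not the real `ModelBase`)."""

    def __init__(self, features_in: List[str], target: str) -> None:
        self._features_in = list(features_in)
        self._target = target

    @property
    def features_in(self) -> List[str]:
        # Return a copy so callers cannot mutate internal state
        return list(self._features_in)

    @abstractmethod
    def predict(self, row: Dict[str, float]) -> float:
        """Return a prediction for a single row of feature values."""


class SumModel(ToyModelBase):
    def predict(self, row: Dict[str, float]) -> float:
        return sum(row[name] for name in self._features_in)


class MeanModel(ToyModelBase):
    def predict(self, row: Dict[str, float]) -> float:
        return sum(row[name] for name in self._features_in) / len(self._features_in)


def report(model: ToyModelBase, row: Dict[str, float]) -> str:
    # Downstream code relies only on the shared interface,
    # so any inheritor can be swapped in without changes here.
    return f"{type(model).__name__}: {model.predict(row):.2f}"


row = {"Feature_1": 1.0, "Feature_2": 3.0}
features = ["Feature_1", "Feature_2"]
print(report(SumModel(features, "Target"), row))
print(report(MeanModel(features, "Target"), row))
```

Swapping `SumModel` for `MeanModel` requires no change in `report`, which is the extensibility benefit listed above.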


## Setup

In [1]:
import typing as tp
import logging
import sys

logging.basicConfig(level=logging.INFO, stream=sys.stdout)

In [2]:
# Resolve path when used in a usecase project
from pathlib import Path

sys.path.insert(0, str(Path("../../").resolve()))

In [3]:
# Generation of data

import numpy as np
import pandas as pd

from sklearn.datasets import make_blobs


N_FEATURES = 3

model_features = [f"Feature_{i + 1}" for i in range(N_FEATURES)]
model_target = "Target"

X, y = make_blobs(n_samples=1000, n_features=N_FEATURES, centers=100, random_state=0)
features_data = pd.DataFrame(X, columns=model_features)
useless_data = pd.DataFrame(np.random.randn(1000, 5), columns=[f"Noise_{i + 1}" for i in range(5)])
target_vector = pd.DataFrame(y, columns=[model_target])

master_data = pd.concat([features_data, useless_data, target_vector], axis=1)

INFO:numexpr.utils:NumExpr defaulting to 8 threads.


## Model classes overview

### Class diagram

![ModelBase.drawio-3.svg](./_images/_ModelBase.png)

### `ModelBase`

[ModelBase](../../../../../../docs/build/apidoc/modeling/modeling.models.sklearn_model.html#modeling.models.model_base.model.ModelBase) defines a required interface for other model inheritors. 
Inheritors must implement the [mandatory methods](../../../../../../docs/build/apidoc/modeling/modeling.models.sklearn_model.html#modeling.models.model_base.model.ModelBase.features_int) and might extend the functionality with additional methods or attributes.
Pay attention to the difference between `features_in` and `features_out`.


#### Understanding the difference between `features_in` and `features_out`




We assume that some data transformations might happen inside the `.fit` or `.predict` methods. To distinguish the set of `DataFrame` columns expected as input for `.fit` or `.predict` from the set of columns that remains after those transformations, we store them in different properties: `features_in` and `features_out`.

So, `features_in` represents the set of columns before any transformations; data with this set of columns is expected to be passed into the `fit` or `predict` methods.

`features_out` represents the set of columns after all potential transformations; data with the `features_out` set of columns is passed into the predictor.

See the diagram below.

```
                 ____________________                 _________
  features_in   |                    |  features_out |         |
------^-------- |  data_preparations |-------^-------|predictor|---
                |____________________|               |_________|
```
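
As a rough illustration of the diagram, the sketch below uses a hypothetical `ToyPipelineModel` whose data preparation step drops one column. The class, its `dropped` argument and the dictionary-based rows are assumptions made for this example only; they do not mirror the actual `modeling` implementation.

```python
from typing import Dict, List


class ToyPipelineModel:
    """Hypothetical model that drops one column before its predictor runs."""

    def __init__(self, features_in: List[str], dropped: str) -> None:
        self._features_in = list(features_in)
        self._dropped = dropped

    @property
    def features_in(self) -> List[str]:
        # Columns expected in the data passed to `predict`
        return list(self._features_in)

    @property
    def features_out(self) -> List[str]:
        # Columns that actually reach the predictor
        return [name for name in self._features_in if name != self._dropped]

    def _prepare(self, row: Dict[str, float]) -> Dict[str, float]:
        # Data preparation: keep only the columns listed in `features_out`
        return {name: row[name] for name in self.features_out}

    def predict(self, row: Dict[str, float]) -> float:
        prepared = self._prepare(row)
        # Toy predictor: sum of the prepared feature values
        return sum(prepared.values())


model = ToyPipelineModel(["Feature_1", "Feature_2", "Feature_3"], dropped="Feature_2")
row = {"Feature_1": 1.0, "Feature_2": 100.0, "Feature_3": 2.0, "Noise_1": -5.0}
print(model.features_in)
print(model.features_out)
print(model.predict(row))
```

The caller still has to supply every `features_in` column, even though the toy predictor only sees the `features_out` columns.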

### `SklearnModel`

[SklearnModel](../../../../../../docs/build/apidoc/modeling/modeling.models.sklearn_model.html#modeling.models.sklearn_model.model.SklearnModel) is a wrapper for a scikit-learn compatible estimator.

It requires an estimator, a list of feature column names and a target column name for initialization.

Model training and prediction use the provided feature list.

In [4]:
from modeling import SklearnModel
from modeling.models.model_base import ProducesShapFeatureImportance
from sklearn.ensemble import RandomForestRegressor



In [5]:
# Model initialization

sklearn_estimator = RandomForestRegressor(random_state=0)
oai_model = SklearnModel(
    estimator=sklearn_estimator,
    target=model_target,
    features_in=model_features,
)

In [6]:
# Model training

oai_model.fit(master_data)

SklearnModel(estimator=RandomForestRegressor(random_state=0), target="Target" ,features_in=['Feature_1', 'Feature_2', 'Feature_3'])

In [7]:
# Model evaluation

predictions = oai_model.predict(master_data)  # Ok

In [8]:
# Model evaluation on a wrong feature set raises an error

try:
    oai_model.predict(master_data.drop(columns=['Feature_1']))
except ValueError as e:
    print(e)

The following columns are missing from the dataframe: {'Feature_1'}.
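
A check of this kind can be sketched as a set difference between required and available columns. The helper name `check_columns_exist` is hypothetical, and the actual validation inside `SklearnModel` may differ; only the error message is taken from the output above.

```python
from typing import Iterable, Set


def check_columns_exist(available: Iterable[str], required: Iterable[str]) -> None:
    """Raise ValueError if any required column is absent (hypothetical helper)."""
    missing: Set[str] = set(required) - set(available)
    if missing:
        raise ValueError(
            f"The following columns are missing from the dataframe: {missing}."
        )


required_columns = ["Feature_1", "Feature_2", "Feature_3"]

# All required columns are present, so nothing is raised
check_columns_exist(["Feature_1", "Feature_2", "Feature_3", "Target"], required_columns)

try:
    check_columns_exist(["Feature_2", "Feature_3", "Target"], required_columns)
except ValueError as error:
    message = str(error)
    print(message)
```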


In [9]:
# Feature importance is available through the same model object

oai_model.get_feature_importance(master_data)

{'Feature_1': 0.32569159609566567,
 'Feature_2': 0.38453370238535023,
 'Feature_3': 0.28977470151898416}

In [10]:
# Regression metrics are available through the same model object

oai_model.evaluate_metrics(master_data)



{'mae': 5.253819999999999,
 'rmse': 7.2587107257418655,
 'mse': 52.6888814,
 'r_squared': 0.9367670190219022,
 'var_score': 0.9367703192962497}
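
For reference, the sketch below computes the same five keys from their standard textbook definitions. It assumes that `mae`, `mse`, `rmse`, `r_squared` and `var_score` denote mean absolute error, mean squared error, its square root, the coefficient of determination and the explained variance score; the guide does not state the exact formulas used by `evaluate_metrics`, and the function name `regression_metrics` is made up for this example.

```python
import math
from typing import Dict, Sequence


def regression_metrics(
    y_true: Sequence[float], y_pred: Sequence[float]
) -> Dict[str, float]:
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    mean_residual = sum(residuals) / n

    mae = sum(abs(r) for r in residuals) / n
    mse = sum(r * r for r in residuals) / n
    # Sum of squared deviations of the target from its mean
    total_variation = sum((t - mean_true) ** 2 for t in y_true)
    # Sum of squared deviations of the residuals from their mean
    residual_variation = sum((r - mean_residual) ** 2 for r in residuals)

    return {
        "mae": mae,
        "rmse": math.sqrt(mse),
        "mse": mse,
        # 1 - SS_res / SS_tot
        "r_squared": 1 - (n * mse) / total_variation,
        # 1 - Var(residuals) / Var(target)
        "var_score": 1 - residual_variation / total_variation,
    }


metrics = regression_metrics([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0])
print(metrics)
```

Under these definitions `r_squared` and `var_score` differ only when the residuals have a non-zero mean, that is, when predictions are systematically biased.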

In [11]:
# Initial features list is available through attribute

oai_model.features_in

['Feature_1', 'Feature_2', 'Feature_3']

In [12]:
# Set of features after transformations is the same because no transformations happen in SklearnModel.

oai_model.features_out

['Feature_1', 'Feature_2', 'Feature_3']

In [13]:
# Model target is available through the same model object

oai_model.target

'Target'

In [14]:
# Wrapped estimator might be exported through the object property

oai_model.estimator

RandomForestRegressor(random_state=0)

#### Using custom SHAP values calculation

`SklearnModel` uses `shap.Explainer` to calculate SHAP values. By design, it should be able to choose the right algorithm to calculate SHAP values.

If you face any issues with the SHAP values calculation, or you want to use a custom algorithm, you can do so by reimplementing the `_produce_shap_explanation` method:

In [15]:
help(ProducesShapFeatureImportance._produce_shap_explanation)

Help on function _produce_shap_explanation in module modeling.models.model_base.produces_shap_importances:

_produce_shap_explanation(self, data: pandas.core.frame.DataFrame, **kwargs: Any) -> modeling.models.model_base.produces_shap_importances.ShapExplanation
    Produce an instance of shap.Explanation
    based on provided data for ``self._features_in`` feature set.
    
    Args:
        data: data to calculate SHAP
         values containing ``self._features_in`` feature set
        **kwargs: additional keyword arguments that
         are required for method implementation
    
    Returns:
        `shap.Explanation` containing prediction base values and SHAP values



In [16]:
import shap


class SklearnModelWithExactShapValues(SklearnModel):
    def _produce_shap_explanation(
        self,
        data: pd.DataFrame,
        **kwargs: tp.Any,
    ) -> shap.Explanation:
        explainer = shap.Explainer(
            model=self._estimator.predict,
            masker=data[self._features_in],
            algorithm="exact",
            **kwargs,
        )
        return explainer(data[self._features_in])

In [17]:
SklearnModelWithExactShapValues(
    estimator=RandomForestRegressor(random_state=0),
    target=model_target,
    features_in=model_features,
).fit(master_data).get_shap_feature_importance(master_data.loc[:5, :])

{'Feature_1': 6.506388888888888,
 'Feature_2': 3.3582407407407406,
 'Feature_3': 4.719814814814816}

### `SklearnPipeline`

[SklearnPipeline](../../../../../../docs/build/apidoc/modeling/modeling.models.sklearn_pipeline.html#modeling.models.sklearn_pipeline.model.SklearnPipeline) is a wrapper for a scikit-learn pipeline.

It requires a scikit-learn `Pipeline`, a list of feature column names and a target column name.

`SklearnPipeline` will check if features are present in the provided dataset before running the pipeline for training or evaluation.

<div class="alert alert-info">
<b>Note</b>

The wrapped `sklearn.pipeline.Pipeline` does not need to contain an explicit feature selection step.
Features passed as `features_in` will be used for feature selection inside the `fit`, `predict` and `transform` methods.
</div>

In [18]:
import sklearn

from modeling import SklearnPipeline
from optimus_core import SkLearnSelector
from sklearn.feature_selection import f_regression, SelectKBest
from sklearn.ensemble import RandomForestRegressor

<div class="alert alert-info">
<b>Note</b>

Use the `optimus_core.transformer` package to keep column names in the transformed datasets.
</div>

In [19]:
# Model initialization

sklearn_pipeline = sklearn.pipeline.Pipeline([
    ("best_feature_selector", SkLearnSelector(SelectKBest(k=2, score_func=f_regression))),
    ("estimator", RandomForestRegressor(random_state=0)),
])

oai_pipeline = SklearnPipeline(
    estimator=sklearn_pipeline,
    target=model_target,
    features_in=model_features,
)


In [20]:
# Model training

oai_pipeline.fit(master_data)

INFO:modeling.models.sklearn_pipeline.model:`features_out` attribute is not specified. Setting `features_out` based on factual data.


SklearnPipeline(estimator=Pipeline(steps=[('best_feature_selector',
                 SkLearnSelector(selector=SelectKBest(k=2,
                                                      score_func=<function f_regression at 0x7f85aa0c90d0>))),
                ('estimator', RandomForestRegressor(random_state=0))]), target="Target" ,features_in=['Feature_1', 'Feature_2', 'Feature_3'], features_out=['Feature_1', 'Feature_3'])

In [21]:
# Model evaluation

predictions = oai_pipeline.predict(master_data)  # Ok
transformed_data = oai_pipeline.transform(master_data) # Ok

In [22]:
# Model evaluation on a wrong feature set raises an error

try:
    oai_pipeline.predict(master_data.drop(columns=['Feature_1']))
except ValueError as e:
    print(e)

The following columns are missing from the dataframe: {'Feature_1'}.


In [23]:
# Model target is available through the same model object

oai_pipeline.target

'Target'

In [24]:
# Required feature list is available through the same model object

oai_pipeline.features_in

['Feature_1', 'Feature_2', 'Feature_3']

In [25]:
# The pipeline has a dynamic feature selection step.
# The estimator is effectively trained on a different set of features
# than the one indicated in the model initialization step.
# The set of features after internal transformations is
# available through the `.features_out` attribute.

oai_pipeline.features_out

['Feature_1', 'Feature_3']

In [26]:
# Regression metrics are available through the same model object

oai_pipeline.evaluate_metrics(master_data)



{'mae': 8.148340000000001,
 'rmse': 10.457837950551731,
 'mse': 109.36637460000001,
 'r_squared': 0.8687472252025202,
 'var_score': 0.8687525420145215}

<div class="alert alert-info">
<b>Note</b>

Please note that `get_pipeline()` returns a `sklearn.pipeline.Pipeline` with `SelectColumns` as the first step.
This is required for backwards compatibility: other project components use `sklearn.pipeline.Pipeline` rather than this wrapper.
</div>

In [27]:
# The wrapped pipeline can be exported through the getter.
# Note that the first step in the exported Pipeline is the feature selection step.

select_features, *pipeline_steps = oai_pipeline.get_pipeline()

In [28]:
select_features

SelectColumns(items=['Feature_1', 'Feature_2', 'Feature_3'])

#### `.get_feature_importance()` vs  `get_shap_feature_importance()` 

`.get_feature_importance()` returns a mapping from `features_out` (the feature set produced after all transformation steps) to feature importance. This invariant is defined because most feature importance extraction techniques rely on the estimator and return importances for the feature set used by the estimator (which is `features_out`).

In [29]:
# Feature importance is available through the same model object

oai_pipeline.get_feature_importance(master_data)

{'Feature_1': 0.514334464700997, 'Feature_3': 0.4856655352990031}

Note that `Feature_2` is not present in the dict, since it was dropped by the pipeline and is therefore not in the `features_out` set.

Unlike `.get_feature_importance()`, `get_shap_feature_importance()` calculates feature importance for the `features_in` feature set. SHAP values for `features_in` can always be calculated, because model-agnostic algorithms are used.

Note that the `get_shap_feature_importance()` method is not required by the `modeling.ModelBase` class. It is provided by the `modeling.ProducesShapFeatureImportance` abstraction.

In [30]:
oai_pipeline.get_shap_feature_importance(master_data)

INFO:modeling.models.sklearn_pipeline.model:`Using model-agnostic` <class 'shap.explainers._exact.Exact'>` to extract SHAP values... `shap` can't apply model-specific algorithms for <class 'modeling.models.sklearn_pipeline.model.SklearnPipeline'>. Consider switching to `SklearnModel` if computation time or quality don't fit your needs.


Exact explainer: 1001it [00:19, 27.03it/s]


{'Feature_1': 9.885153949999982,
 'Feature_2': 1.91668902971287e-15,
 'Feature_3': 9.671310550000003}

Note that the SHAP feature importance for `Feature_2` is close to zero, which is expected since this column is not actually used for prediction.
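
To see why an unused input receives a zero attribution, the sketch below evaluates the textbook Shapley formula on a toy function that ignores its second input. It is a simplified illustration with a single baseline point, not the algorithm that `shap`'s explainers run internally; `exact_shapley_values` and `toy_predict` are names made up for this example.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, Sequence, Set


def exact_shapley_values(
    predict: Callable[[Sequence[float]], float],
    instance: Sequence[float],
    baseline: Sequence[float],
) -> Dict[int, float]:
    n = len(instance)

    def value(subset: Set[int]) -> float:
        # Features in `subset` come from the instance, the rest from the baseline
        mixed = [instance[i] if i in subset else baseline[i] for i in range(n)]
        return predict(mixed)

    shapley: Dict[int, float] = {}
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley formula
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                total += weight * (value(coalition | {i}) - value(coalition))
        shapley[i] = total
    return shapley


def toy_predict(x: Sequence[float]) -> float:
    # The second input (index 1) is ignored, like a dropped feature
    return 2.0 * x[0] + 3.0 * x[2]


values = exact_shapley_values(
    toy_predict, instance=[1.0, 5.0, 2.0], baseline=[0.0, 0.0, 0.0]
)
print(values)
```

Every marginal contribution of the ignored input is exactly zero, so its Shapley value is zero regardless of its value in the instance.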

#### How to wrap a fitted `sklearn.pipeline.Pipeline` with `SklearnPipeline`

`SklearnPipeline` allows you to wrap a fitted `sklearn.pipeline.Pipeline`. In this case you might want to specify the `features_out` parameter explicitly.

In [31]:
help(SklearnPipeline.features_out)

Help on property:

    Pipeline might perform any data transformations.
    This property returns the list of columns that are
    used by the last step of wrapped ``sklearn.pipeline.Pipeline``
    as the input features.
    
    Returns:
        Copy of a list of columns that are
        used by the last step of the wrapped pipeline.



See what happens if `features_out` is still not specified.

In [32]:
# Initialize and train sklearn.pipeline.Pipeline 

fitted_sklearn_pipeline = sklearn.pipeline.Pipeline([
    ("best_feature_selector", SkLearnSelector(SelectKBest(k=2, score_func=f_regression))),
    ("estimator", RandomForestRegressor(random_state=0)),
])

fitted_sklearn_pipeline = fitted_sklearn_pipeline.fit(
    master_data[model_features],
    master_data[model_target],
)

In [33]:
# Wrap trained sklearn.pipeline.Pipeline with SklearnPipeline

oai_fitted_pipeline = SklearnPipeline(
    estimator=fitted_sklearn_pipeline, 
    target=model_target, 
    features_in=model_features
)

In [34]:
oai_fitted_pipeline.features_out

 Please run `.fit` to automatically fill `features_out` based on data or reinitialize object with `features_out` argument.


['Feature_1', 'Feature_2', 'Feature_3']

<div class="alert alert-info">
<b>Note</b>

If the `features_out` property is not specified explicitly, it will be discovered and stored during the first run of the `fit`, `predict` or `transform` methods.
</div>

In [35]:
oai_fitted_pipeline.fit(master_data).features_out

INFO:modeling.models.sklearn_pipeline.model:`features_out` attribute is not specified. Setting `features_out` based on factual data.


['Feature_1', 'Feature_3']

Otherwise `features_out` can be passed into `SklearnPipeline.__init__` method.

In [36]:
oai_fitted_pipeline = SklearnPipeline(
    estimator=fitted_sklearn_pipeline, 
    target=model_target, 
    features_in=model_features,
    features_out=['Feature_1', 'Feature_3'],
)

In [37]:
oai_fitted_pipeline.features_out

['Feature_1', 'Feature_3']

In [38]:
transformed_data = oai_fitted_pipeline.transform(master_data)

If the provided `features_out` does not match the actual data, an error is raised:

In [39]:
oai_fitted_pipeline = SklearnPipeline(
    estimator=fitted_sklearn_pipeline, 
    target=model_target, 
    features_in=model_features,
    features_out=['Feature_1', 'Feature_2'], # note, Feature_2 is passed instead of Feature_3
)

In [40]:
try:
    oai_fitted_pipeline.fit(master_data)

except RuntimeError as err:
    print(err)

Columns in the data after transformation does not match with expected list of columns.
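
The two behaviours shown above (discovering `features_out` on first use and validating a declared value) can be sketched with a small helper. `FeaturesOutTracker` and its `observe` method are hypothetical and only illustrate the idea; the actual `SklearnPipeline` logic and error message may differ.

```python
from typing import List, Optional


class FeaturesOutTracker:
    """Hypothetical helper: infers `features_out` once or validates a declared value."""

    def __init__(self, features_out: Optional[List[str]] = None) -> None:
        self._features_out = None if features_out is None else list(features_out)

    def observe(self, transformed_columns: List[str]) -> List[str]:
        observed = list(transformed_columns)
        if self._features_out is None:
            # Nothing declared: store what the transformations actually produced
            self._features_out = observed
        elif self._features_out != observed:
            raise RuntimeError(
                "Columns in the data after transformation do not match "
                f"the expected list: expected {self._features_out}, got {observed}."
            )
        return list(self._features_out)


# Not declared up front: discovered from the transformed data
inferred = FeaturesOutTracker().observe(["Feature_1", "Feature_3"])
print(inferred)

# Declared incorrectly: the mismatch is reported
mismatch_detected = False
try:
    FeaturesOutTracker(["Feature_1", "Feature_2"]).observe(["Feature_1", "Feature_3"])
except RuntimeError as error:
    mismatch_detected = True
    print(error)
```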


### `KerasModel`

`KerasModel` ([API](../../../../../../docs/build/apidoc/modeling/modeling.models.keras_model.html#modeling.load.keras_model.model.KerasModel)) is a wrapper for `tensorflow.keras.Model` that provides neural network modeling functionality to OptimusAI.

In [41]:
from tensorflow import keras
from modeling.models.keras_model import KerasModel

2023-07-13 19:27:20.873187: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.


In [42]:
# Model initialization


keras_model = keras.Sequential(
    [
        keras.layers.Normalization(axis=-1),
        keras.layers.Dense(units=256, activation="relu"),
        keras.layers.Dense(units=256, activation="tanh"),
        keras.layers.Dense(units=1),
    ]
)
keras_model.compile(
    optimizer="Adam",
    loss="mean_squared_error",
    metrics=[
        keras.metrics.MeanAbsoluteError(),
        keras.metrics.MeanSquaredError(),
    ],
)

oai_keras_model = KerasModel(
    keras_model,
    target=model_target,
    features_in=model_features,
)



In [43]:
# Model training


oai_keras_model.fit(master_data, verbose=1, epochs=3)

Epoch 1/3
Epoch 2/3
Epoch 3/3


KerasModel(keras_model={"class_name": "Sequential", "config": {"name": "sequential", "layers": [{"class_name": "InputLayer", "config": {"batch_input_shape": [null, 3], "dtype": "float32", "sparse": false, "ragged": false, "name": "normalization_input"}}, {"class_name": "Normalization", "config": {"name": "normalization", "trainable": true, "dtype": "float32", "axis": [-1], "mean": null, "variance": null}}, {"class_name": "Dense", "config": {"name": "dense", "trainable": true, "dtype": "float32", "units": 256, "activation": "relu", "use_bias": true, "kernel_initializer": {"class_name": "GlorotUniform", "config": {"seed": null}}, "bias_initializer": {"class_name": "Zeros", "config": {}}, "kernel_regularizer": null, "bias_regularizer": null, "activity_regularizer": null, "kernel_constraint": null, "bias_constraint": null}}, {"class_name": "Dense", "config": {"name": "dense_1", "trainable": true, "dtype": "float32", "units": 256, "activation": "tanh", "use_bias": true, "kernel_initializer"

In [44]:
# Model evaluation

predictions = oai_keras_model.predict(master_data)  # Ok



In [45]:
# Model evaluation on a wrong feature set raises an error

try:
    oai_keras_model.predict(master_data.drop(columns=['Feature_1']))
except ValueError as e:
    print(e)

The following columns are missing from the dataframe: {'Feature_1'}.


In [46]:
# Feature importance is available through the same model object

oai_keras_model.get_feature_importance(master_data, algorith="exact")

INFO:modeling.models.keras_model.model:`Using `<class 'shap.explainers._deep.Deep'>` to extract SHAP values...




{'Feature_1': 0.3824434297509803,
 'Feature_2': 0.3306285804748425,
 'Feature_3': 0.35688388971420976}

In [47]:
# Regression metrics are available through the same model object

oai_keras_model.evaluate_metrics(master_data)



{'mae': 26.93634753704071,
 'rmse': 31.90876062089665,
 'mse': 1018.1690043616849,
 'r_squared': -0.22192499773379515,
 'var_score': 0.011586289722970267}

In [48]:
# Features list is available through the same model object

oai_keras_model.features_in

['Feature_1', 'Feature_2', 'Feature_3']

In [49]:
# Model target is available through the same model object

oai_keras_model.target

'Target'

In [50]:
# Wrapped estimator might be exported through the object property

oai_keras_model.keras_model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 normalization (Normalizatio  (None, 3)                7         
 n)                                                              
                                                                 
 dense (Dense)               (None, 256)               1024      
                                                                 
 dense_1 (Dense)             (None, 256)               65792     
                                                                 
 dense_2 (Dense)             (None, 1)                 257       
                                                                 
Total params: 67,080
Trainable params: 67,073
Non-trainable params: 7
_________________________________________________________________


## Create a custom inheritor of ModelBase

<div class="alert alert-info">
<b>Note</b>

We ship model classes that support a minimal set of attributes and methods. Those models are domain-agnostic, and we expect that users will extend modeling classes with domain-specific methods and attributes for their individual use cases.
</div>

Let's say you are building a linear model for a
[set_point_optimization](../../../../../usecases/set_point_optimization/src/set_point_optimization/README.md) use case, and you also want to:

- Use linear regression weights as feature importances
- Keep an additional set of features that plant operators can control



This can be achieved by doing the following:

1. Import `ModelBase` (or one of its inheritors, such as `SklearnModel`) and create a class inheritor.
2. Implement mandatory abstract methods.
3. Extend the class with additional methods, attributes and properties.

In [51]:
from copy import copy
from typing import Iterable, List, Any, Dict

import pandas as pd
from sklearn.linear_model import LinearRegression

from modeling import SklearnModel
from modeling.models.sklearn_model import attribute_check_is_fitted


# Inherit from `SklearnModel` class as it already 
# implements some methods the way you want.
class LinearModel(SklearnModel):
    def __init__(
            self,
            estimator: LinearRegression,
            target: str,
            features_in: Iterable[str],
            controls: Iterable[str],
    ) -> None:
        # Pass arguments by keyword so they cannot be mixed up positionally
        super().__init__(estimator=estimator, target=target, features_in=features_in)
        self._estimator = estimator
        # Save controlled features into a private attribute
        self._controls: List[str] = list(controls)

    # 1. Create a property for the `_controls` attribute.
    # 2. Return a copy since list is a mutable data structure.
    @property
    def controls(self) -> List[str]:
        return copy(self._controls)

    # 1. Implement your custom feature importance logic.
    # 2. Use `attribute_check_is_fitted` decorator to check 
    # whether estimator is fitted before method is called.
    @attribute_check_is_fitted("_estimator")
    def get_feature_importance(
            self, data: pd.DataFrame, **kwargs: Any
    ) -> Dict[str, float]:
        return dict(zip(self.features_in, self._estimator.coef_))

## Create model using the `ModelFactoryBase`

In this section we'll show how to build a `ModelBase` instance from a hyperparameter specification using the `ModelFactoryBase` [(API)](../../../../../../docs/build/apidoc/modeling/modeling.models.model_base.html#modeling.models.model_base.model_base.ModelFactoryBase).

1. Model factories are useful when hyperparameters are specified in YAML files.
2. Model factories must be used for model initialization in order to perform hyperparameter tuning. We'll talk more about this in the `ModelTunerBase` section.

Each `ModelFactoryBase` inheritor builds its own kind of `ModelBase` instance. The table below maps factories to the models they build:

|`ModelFactoryBase`|action|`ModelBase`|
|---|---|---|
|`SklearnModelFactory`|builds|`SklearnModel`|
|`SklearnPipelineFactory`|builds|`SklearnPipeline`|
|`KerasModelFactory`|builds|`KerasModel`|

### `ModelFactoryBase` UML class diagram

![ModelFactoryBase.drawio.svg](./_images/_ModelFactroryBase.drawio.svg)

### Create `SklearnModel` using the `SklearnModelFactory`

In [52]:
from modeling import SklearnModelFactory

In order to initialize `SklearnModelFactory` ([API](../../../../../../docs/build/apidoc/modeling/modeling.models.sklearn_model.html#modeling.models.sklearn_model.factory.SklearnModelFactory)), we need to provide a dictionary with the following structure:

In [53]:
help(SklearnModelFactory)

Help on class SklearnModelFactory in module modeling.models.sklearn_model.factory:

class SklearnModelFactory(modeling.models.model_base.model_base.ModelFactoryBase)
 |  SklearnModelFactory(model_init_config: Dict[str, Any], features_in: Iterable[str], target: str) -> None
 |  
 |  Factory class that allows creating
 |  ``SklearnModel`` instance based on the parametrization specified
 |  in ``model_init_config``.
 |  
 |  `model_init_config` structure should match ``SklearnModelInitConfig``::
 |  
 |      # Object specification for sklearn compatible estimator
 |      # matching the `ObjectInitConfig` structure.
 |      estimator:
 |        class_name: sklearn.linear_model.SGDRegressor
 |        kwargs:
 |          random_state: 123
 |          penalty: elasticnet
 |  
 |      # Target transform config
 |      # matching the `TargetTransformerInitConfig` structure.
 |      # Either transformer key should be filled,
 |      # or `func` and `inverse_func` keys should be filled.
 |      t

Let's compose a Python dictionary representing the proposed YAML file and use it to build a model:

In [54]:
sklearn_model_init_config = {
    # Object specification for sklearn compatible estimator
    # matching the `ObjectInitConfig` structure.
    "estimator": {
        "class_name": "sklearn.linear_model.SGDRegressor",
        "kwargs": {
            "random_state": 123,
            "penalty": "elasticnet",
        },
    },
    # Target transform config matching the `TargetTransformerInitConfig` structure.
    # Either transformer key should be filled,
    # or `func` and `inverse_func` keys should be filled.
    "target_transformer": {
        # Object specification for transformer
        # matching the `ObjectInitConfig` structure
        "transformer": None,
        # Path for the function to use as target transform
        "func": "numpy.log1p",
        # Path for the function to use as an inverse target transform
        "inverse_func": "numpy.expm1",
    },
}

In [55]:
sklearn_factory = SklearnModelFactory(
    sklearn_model_init_config, 
    model_features,
    model_target,
)

Calling the `.create()` method builds a `SklearnModel` instance:

In [56]:
oai_sklearn_model = sklearn_factory.create()
oai_sklearn_model

SklearnModel(estimator=TransformedTargetRegressor(func=<ufunc 'log1p'>, inverse_func=<ufunc 'expm1'>,
                           regressor=SGDRegressor(penalty='elasticnet',
                                                  random_state=123)), target="Target" ,features_in=['Feature_1', 'Feature_2', 'Feature_3'])
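
The `class_name`/`kwargs` structure and the dotted function paths used in the config can be resolved with `importlib`. The sketch below shows one plausible way to do this using only standard-library objects; `load_object` and `build_from_config` are hypothetical helpers, and how `SklearnModelFactory` resolves these entries internally is an assumption. `math.log1p` and `math.expm1` stand in for the NumPy functions named in the config.

```python
import importlib
from typing import Any, Dict


def load_object(path: str) -> Any:
    """Import and return the object referenced by a dotted path."""
    module_name, _, attribute_name = path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attribute_name)


def build_from_config(config: Dict[str, Any]) -> Any:
    """Instantiate `class_name` with the optional `kwargs` mapping."""
    cls = load_object(config["class_name"])
    return cls(**config.get("kwargs", {}))


# Any importable class works; a standard-library one keeps the sketch dependency-free
fraction = build_from_config(
    {"class_name": "fractions.Fraction", "kwargs": {"numerator": 3, "denominator": 4}}
)
print(fraction)

# `func` and `inverse_func` are expected to undo each other
forward = load_object("math.log1p")
inverse = load_object("math.expm1")
round_trip = inverse(forward(10.0))
print(round_trip)
```

With a target transformer configured this way, scikit-learn's `TransformedTargetRegressor` fits the regressor on `func(y)` and maps predictions back with `inverse_func`, which is why the two functions must be inverses of each other.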

You might want to parameterize your scikit-learn based model and **only specify the parameters that you want**. We'll build a toy example below to demonstrate how this can be done.

Let's say you want to create a factory for `RandomForestRegressor`, but the only parameter you want to specify is the number of trees in your ensemble.

First, you'll need to inherit from the `SklearnModelFactory` class and redefine the `create_model_instance()`
static method. Note that the method interface must remain unchanged in order for other methods to work properly.

In [57]:
class UserDefinedRandomForestFactory(SklearnModelFactory):
    @staticmethod
    def create_model_instance(
        init_config: tp.Dict[str, tp.Any]
    ) -> sklearn.ensemble.RandomForestRegressor:
        # Only the number of trees comes from the config; the depth is fixed
        return RandomForestRegressor(
            n_estimators=init_config["n_estimators"], max_depth=12,
        )

Then, let's create a dictionary that holds the number of trees. In a real project you might keep this information in YAML files.

In [58]:
random_forest_init_config = {
    "n_estimators": 512,
}

In [59]:
random_forest_factory = UserDefinedRandomForestFactory(
    random_forest_init_config,
    model_features,
    model_target,
)

In [60]:
oai_random_forest_model = random_forest_factory.create()
oai_random_forest_model

SklearnModel(estimator=RandomForestRegressor(max_depth=12, n_estimators=512), target="Target" ,features_in=['Feature_1', 'Feature_2', 'Feature_3'])
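
The override pattern used above can be pictured with a small stand-in. In the sketch below, `ToyFactoryBase` and `ToyRandomForestFactory` are hypothetical classes written only for illustration; they are not the `modeling` implementation and they leave out features, target and validation. The sketch only shows why the overridden static method should keep its signature: the base class calls it from `create()`.

```python
import typing as tp

from sklearn.ensemble import RandomForestRegressor


class ToyFactoryBase:
    """Hypothetical stand-in for a model factory (not the `modeling` API)."""

    def __init__(self, init_config: tp.Dict[str, tp.Any]) -> None:
        self._init_config = init_config

    @staticmethod
    def create_model_instance(init_config: tp.Dict[str, tp.Any]) -> tp.Any:
        raise NotImplementedError

    def create(self) -> tp.Any:
        # The base class depends on the hook keeping this exact signature.
        return self.create_model_instance(self._init_config)


class ToyRandomForestFactory(ToyFactoryBase):
    @staticmethod
    def create_model_instance(init_config: tp.Dict[str, tp.Any]) -> tp.Any:
        # Only `n_estimators` is configurable; `max_depth` is hard-coded.
        return RandomForestRegressor(
            n_estimators=init_config["n_estimators"], max_depth=12
        )


toy_model = ToyRandomForestFactory({"n_estimators": 16}).create()
print(toy_model)
```

If the subclass changed the hook's arguments, the `create()` call in the base class would no longer match it, which is the reason for keeping the interface fixed.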

### Create `SklearnPipeline` using the `SklearnPipelineFactory`

In [61]:
from modeling import SklearnPipelineFactory

The usage of `SklearnPipelineFactory` is very similar to `SklearnModelFactory`.

In order to initialize `SklearnPipelineFactory` ([API](../../../../../../docs/build/apidoc/modeling/modeling.models.sklearn_pipeline.html#modeling.models.sklearn_pipeline.factory.SklearnPipelineFactory)), we need to provide a dictionary with the following structure:

In [62]:
help(SklearnPipelineFactory)

Help on class SklearnPipelineFactory in module modeling.models.sklearn_pipeline.factory:

class SklearnPipelineFactory(modeling.models.model_base.model_base.ModelFactoryBase)
 |  SklearnPipelineFactory(model_init_config: Dict[str, Any], features_in: Iterable[str], target: str) -> None
 |  
 |  Factory class that allows creating
 |  ``SklearnPipeline`` instance based on the parametrization specified
 |  in ``model_init_config``.
 |  
 |  Structure of ``model_init_config`` that matches ``SklearnPipelineInitConfig``::
 |  
 |      # Object specification for sklearn compatible estimator
 |      # matching the `ObjectInitConfig` structure.
 |      estimator:
 |        class_name: sklearn.linear_model.SGDRegressor
 |        kwargs:
 |          random_state: 123
 |          penalty: elasticnet
 |  
 |      # List of object specification for sklearn compatible transformer
 |      # matching the `TransformerInitConfig` structure.
 |      transformers:
 |        - class_name: sklearn.preprocessi

#### Using `wrappers` for features transformations

`SklearnPipelineFactory` uses the wrappers from `optimus_core.transformers` to preserve `pd.DataFrame` for the transformed data. The desired logic can be chosen with the parameter `"wrapper"`.

##### Setting "wrapper": "preserve_pandas"

Provided transformers can be wrapped with `optimus_core.transformers.ColumnNamesAsNumbers`, which preserves the `pd.DataFrame` type for the output dataset and sets the column names to a range from 0 to the number of columns minus one. **This is the default option.** Use it when it is not clear how the transformed columns relate to the columns in the input dataset:

In [63]:
sklearn_pipeline_init_config = {
    # Object specification for sklearn compatible estimator
    # matching the `ObjectInitConfig` structure.
    "estimator": {
        "class_name": "sklearn.linear_model.SGDRegressor",
        "kwargs": {
            "random_state": 123,
            "penalty": "elasticnet",
        },
    },
    "transformers": [
        {
            "class_name": "sklearn.preprocessing.StandardScaler",
            "kwargs": {},
            "name": "standard_scaler",
            "wrapper": "preserve_pandas",
        },
    ],
}

In [64]:
sklearn_pipeline_factory = SklearnPipelineFactory(
    sklearn_pipeline_init_config, 
    model_features,
    model_target,
)

In [65]:
sklearn_pipeline = sklearn_pipeline_factory.create()
transformed_data = sklearn_pipeline.fit(master_data).transform(master_data)
transformed_data.head()

INFO:modeling.models.sklearn_pipeline.model:`features_out` attribute is not specified. Setting `features_out` based on factual data.


| |0|1|2|
|---|---|---|---|
|0|-1.525443|-1.258792|0.642563|
|1|0.087617|0.389456|-0.497629|
|2|-1.559169|-0.202659|-0.487236|
|3|1.457546|0.576085|-0.934979|
|4|0.038254|0.082757|1.079463|


##### Setting "wrapper": "preserve_columns"

Provided transformers can be wrapped with `optimus_core.transformers.SklearnTransform`, which preserves the `pd.DataFrame` type for the output dataset and sets the column names equal to the columns in the input data. Use this option when the transformer changes neither the set of columns nor their order in the transformed dataset.

In [66]:
sklearn_pipeline_init_config["transformers"][0] = {
    "class_name": "sklearn.preprocessing.StandardScaler",
    "kwargs": {},
    "name": "standard_scaler",
    "wrapper": "preserve_columns",
}

In [67]:
sklearn_pipeline = sklearn_pipeline_factory.create()
transformed_data = sklearn_pipeline.fit(master_data).transform(master_data)
transformed_data.head()

INFO:modeling.models.sklearn_pipeline.model:`features_out` attribute is not specified. Setting `features_out` based on factual data.


| |Feature_1|Feature_2|Feature_3|
|---|---|---|---|
|0|-1.525443|-1.258792|0.642563|
|1|0.087617|0.389456|-0.497629|
|2|-1.559169|-0.202659|-0.487236|
|3|1.457546|0.576085|-0.934979|
|4|0.038254|0.082757|1.079463|


##### Setting "wrapper": "select_columns"

Provided transformers can be wrapped with `optimus_core.transformers.SkLearnSelector`, which preserves the `pd.DataFrame` type for the output dataset and picks the columns of the transformed dataset from the input dataset. Use this option when the transformer performs feature selection, so the transformed set of columns is always a subset of the input columns.

In [68]:
sklearn_pipeline_init_config["transformers"][0] = {
    "class_name": "sklearn.feature_selection.SelectKBest",
    "kwargs": {
        "k": 2,
        "score_func": "sklearn.feature_selection.f_regression",
    },
    # Step name now reflects the transformer actually used.
    "name": "select_k_best",
    "wrapper": "select_columns",
}

In [69]:
sklearn_pipeline = sklearn_pipeline_factory.create()
transformed_data = sklearn_pipeline.fit(master_data).transform(master_data)
transformed_data.head()

INFO:modeling.models.sklearn_pipeline.model:`features_out` attribute is not specified. Setting `features_out` based on factual data.


| |Feature_1|Feature_3|
|---|---|---|
|0|-9.086901|4.084471|
|1|0.499301|-2.40336|
|2|-9.287332|-2.344218|
|3|8.640602|-4.891933|
|4|0.205942|6.570485|


##### Setting "wrapper": None

Provided transformers can be used "as-is". Note that default transformers from `sklearn` return a raw `numpy.ndarray` after transformation, and in that case most feature validation steps are skipped.

In [70]:
sklearn_pipeline_init_config["transformers"][0] = {
    "class_name": "sklearn.preprocessing.StandardScaler",
    "kwargs": {},
    "name": "standard_scaler",
    "wrapper": None,
}

In [71]:
sklearn_pipeline = sklearn_pipeline_factory.create()
transformed_data = sklearn_pipeline.fit(master_data).transform(master_data)
transformed_data

INFO:modeling.models.sklearn_pipeline.model:`features_out` attribute is not specified. Setting `features_out` based on factual data.
Validation of columns after transform was not conducted. Learn how to keep columns after transformations with `optimus_core.transformer`.
Validation of columns after transform was not conducted. Learn how to keep columns after transformations with `optimus_core.transformer`.


array([[-1.52544329, -1.25879198,  0.64256259],
       [ 0.08761726,  0.38945596, -0.49762947],
       [-1.5591695 , -0.20265924, -0.48723576],
       ...,
       [ 0.22316213, -1.25470544,  1.30545579],
       [ 0.38300939, -0.27106961,  0.31848356],
       [ 0.19729837, -0.97483174, -1.43344984]])
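
The four wrapper options can be compared side by side without `optimus_core`. The sketch below uses plain scikit-learn and pandas to imitate the output shape of each option; it is an approximation written for illustration, not the wrappers' actual implementation, and the synthetic data and variable names are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
columns = ["Feature_1", "Feature_2", "Feature_3"]
data = pd.DataFrame(rng.normal(size=(200, 3)), columns=columns)
# Synthetic target that depends on Feature_1 and Feature_3 only.
noise = rng.normal(scale=0.01, size=200)
target = 3.0 * data["Feature_1"] - 2.0 * data["Feature_3"] + noise

# No wrapper: by default the scaler returns a bare array, column names are lost.
scaled = StandardScaler().fit_transform(data)

# "preserve_pandas"-like result: a DataFrame with positional column names.
numbered = pd.DataFrame(scaled, index=data.index)

# "preserve_columns"-like result: a DataFrame with the input column names.
named = pd.DataFrame(scaled, index=data.index, columns=data.columns)

# "select_columns"-like result: keep the input columns chosen by the selector.
selector = SelectKBest(score_func=f_regression, k=2).fit(data, target)
selected = data.loc[:, selector.get_support()]

print(type(scaled).__name__, list(numbered.columns))
print(list(named.columns), list(selected.columns))
```

Reusing the input names is only safe when the transformer keeps every column in place, which is why a selector needs the mask from `get_support()` instead.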

#### Using a `target_transformer` to leverage automatic target transformation

In regression problems where the target has a non-linear relation with the feature(s), it might be useful to apply a target transformer to convert this relation into a linear one.
Sklearn's native machinery for this task is [TransformedTargetRegressor](https://scikit-learn.org/stable/modules/generated/sklearn.compose.TransformedTargetRegressor.html), and both `SklearnModelFactory` and `SklearnPipelineFactory` can initialize it.

The examples above show that `SklearnModelFactory` requires a so-called `model_init_config` to create a model, while `SklearnPipelineFactory` requires a `pipeline_init_config`. A `target_transformer` can be added to either config in the same way.

For `SklearnModelFactory`:

In [72]:
sklearn_model_init_config = {
    "estimator": {
        # Estimator config
    },
    "target_transformer": {
        "func": "numpy.log1p",
        "inverse_func": "numpy.expm1",
    }
}

And for `SklearnPipelineFactory`:

In [73]:
sklearn_pipeline_init_config = {
    "estimator": {
        # Estimator config
    },
    "transformers": [
        # Various transformers
    ],
    "target_transformer": {
        "transformer": {
            "class_name": "sklearn.preprocessing.MinMaxScaler",
            "kwargs": {},
        },
    }
}

The two examples above also illustrate that a `target_transformer` can be initialized in two mutually exclusive ways: via a pair of `func` and `inverse_func`, or via a `transformer` class.
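
The two ways of specifying a target transform map onto the arguments of scikit-learn's `TransformedTargetRegressor`. The sketch below uses scikit-learn directly rather than the factories, on synthetic data chosen (as an assumption for the example) so that `log1p` of the target is exactly linear in the features. It also shows that scikit-learn rejects a configuration that sets both a `transformer` and the function pair.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 3))
# log1p(y) is a linear function of X by construction.
y = np.expm1(X @ np.array([1.0, 2.0, 0.5]))

# Option 1: a pair of functions.
func_based = TransformedTargetRegressor(
    regressor=LinearRegression(), func=np.log1p, inverse_func=np.expm1
).fit(X, y)

# Option 2: a transformer object.
transformer_based = TransformedTargetRegressor(
    regressor=LinearRegression(), transformer=MinMaxScaler()
).fit(X, y)

func_r2 = func_based.score(X, y)
transformer_r2 = transformer_based.score(X, y)

# Setting both options at once is an error in scikit-learn.
try:
    TransformedTargetRegressor(
        regressor=LinearRegression(),
        transformer=MinMaxScaler(),
        func=np.log1p,
        inverse_func=np.expm1,
    ).fit(X, y)
    both_rejected = False
except ValueError:
    both_rejected = True

print(func_r2, transformer_r2, both_rejected)
```

On this data the function pair linearizes the target, while min-max scaling is only an affine rescaling and leaves the exponential shape in place.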

### Create `KerasModel` using the `KerasModelFactory`

To build a `KerasModel` from parameters, you'll need to define how your `keras` model is built from the parameters you specify.

In this toy example we'll create a simple sequential `keras` model. The model structure is fixed in the factory code, but a few hyperparameters can be changed through the config.

We'll start with defining custom factory inheriting `KerasModelFactory` ([API](../../../../../../docs/build/apidoc/modeling/modeling.models.keras_model.html#modeling.models.keras_model.factory.KerasModelFactory)).

In [74]:
from tensorflow import keras

from modeling.models.keras_model import KerasModelFactory


class UserDefinedKerasModelFactory(KerasModelFactory):
    @staticmethod
    def create_model_instance(
        units: int = 32,
        learning_rate: float = 1e-03,
    ) -> keras.Model:
        model = keras.Sequential(
            [
                keras.layers.Normalization(axis=-1),
                keras.layers.Dense(units=units, activation="tanh"),
                keras.layers.Dense(units=1),
            ]
        )
        model.compile(
            optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
            loss="mean_squared_error",
            metrics=[
                keras.metrics.MeanAbsoluteError(),
                keras.metrics.MeanSquaredError(),
                keras.metrics.MeanAbsolutePercentageError(),
            ],
        )
        return model

Then let's create a dictionary with parameters and initialize freshly created `UserDefinedKerasModelFactory`.

In [75]:
keras_model_init_config = {
    "units": 128,
    "learning_rate": 1e-05,
}
keras_model_factory = UserDefinedKerasModelFactory(
    keras_model_init_config,
    model_features,
    model_target
)

`.create()` returns a `KerasModel` instance with the structure defined above:

In [76]:
oai_keras_model = keras_model_factory.create()
oai_keras_model.keras_model.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 normalization_1 (Normalizat  (None, 3)                7         
 ion)                                                            
                                                                 
 dense_3 (Dense)             (None, 128)               512       
                                                                 
 dense_4 (Dense)             (None, 1)                 129       
                                                                 
Total params: 648
Trainable params: 641
Non-trainable params: 7
_________________________________________________________________


## Tune model using the `ModelTunerBase`

In this section we'll show how to tune model hyperparameters using the `ModelTunerBase` ([API](../../../../../../docs/build/apidoc/modeling/modeling.models.model_base.html#modeling.models.model_base.model_base.ModelTunerBase)) class.

<div class="alert alert-info">
<b>Note</b>

To tune model hyperparameters you'll need to pass a `ModelFactoryBase` instance, which builds a model from the hyperparameter specification.
</div>

Let's start with the tuner initialization: it requires a tuner configuration dictionary and a `ModelFactory` instance, which is used to build a model with the optimal set of hyperparameters.

Note that each `ModelTunerBase` inheritor requires a factory instance of its own matching type. The table below maps each tuner to the factory it requires.

|`ModelTunerBase`|action|`ModelFactoryBase`|
|---|---|---|
|`SklearnModelTuner`|requires|`SklearnModelFactory`|
|`SklearnPipelineTuner`|requires|`SklearnPipelineFactory`|
|`KerasModelTuner`|requires|`KerasModelFactory`|

### `ModelTunerBase` UML class diagram

![ModelTunerBase.drawio-3.svg](./_images/_ModelTunerBase.drawio.svg)

### Tune `SklearnModel` using the `SklearnModelTuner`

Let's start with the `SklearnModelTuner` ([API](../../../../../../docs/build/apidoc/modeling/modeling.models.sklearn_model.html#modeling.models.sklearn_model.tuner.SklearnModelTuner)) initialization. We've already initialized `SklearnModelFactory` in the section above, but we'll create it here again for completeness:

In [77]:
from modeling import SklearnModelFactory


sklearn_model_init_config = {
    # Object specification for sklearn compatible estimator
    # matching the `ObjectInitConfig` structure.
    "estimator": {
        "class_name": "sklearn.linear_model.SGDRegressor",
        "kwargs": {
            "random_state": 123,
            "penalty": "elasticnet",
        }
    },
    # Target transform config matching the `TargetTransformerInitConfig` structure.
    # Either transformer key should be filled,
    # or `func` and `inverse_func` keys should be filled.
    "target_transformer": {
      # Object specification for transformer
      # matching the `ObjectInitConfig` structure
        "transformer": None,
      # Path for the function to use as target transform
        "func": "numpy.log1p",
      # Path for the function to use as an inverse target transform
        "inverse_func": "numpy.expm1",
    }
}

sklearn_Factory = SklearnModelFactory(sklearn_model_init_config, model_features, model_target)

Note that the required factory type appears in the `__init__` signature:

In [78]:
from modeling import SklearnModelTuner


help(SklearnModelTuner.__init__)

Help on function __init__ in module modeling.models.sklearn_model.tuner:

__init__(self, model_factory: modeling.models.sklearn_model.factory.SklearnModelFactory, model_tuner_config: Dict[str, Any]) -> None
    model_tuner_config structure should match `SklearnTunerConfig` structure::
    
        # Object specification for sklearn compatible CV tuner
        # matching the `ObjectInitConfig` structure.
        init:
            class_name: sklearn.model_selection.GridSearchCV
            kwargs:
              n_jobs: -1
              refit: mae
              param_grid:
                estimator__alpha: [0.0001, 0.001, 0.01, 0.1, 1, 10]
                estimator__l1_ratio: [0.00001, 0.0001, 0.001, 0.01, 0.1, 1]
              scoring:
                mae: neg_mean_absolute_error
                rmse: neg_root_mean_squared_error
                r2: r2
    
    Args:
        model_factory: Builder instance that produces model with corresponding type
        model_tuner_config: Dictionary

Let's create a dictionary with the required structure and pass it into tuner initialization:

In [79]:
sklearn_model_tuner_config = {
    # Object specification for sklearn compatible CV tuner
    # matching the `ObjectInitConfig` structure.
    "class_name": "sklearn.model_selection.GridSearchCV",
    "kwargs": {
        "n_jobs": 1,
        "refit": "r2",
        "param_grid": {
            "regressor__penalty": ["elasticnet",],
        },
        "scoring": {
            "mae": "neg_mean_absolute_error",
            "rmse": "neg_root_mean_squared_error",
            "r2": "r2",
        },
    }
}

sklearn_tuner = SklearnModelTuner(sklearn_Factory, sklearn_model_tuner_config)

Then, let's tune model hyperparameters.

In [80]:
tuned_model = sklearn_tuner.tune(data=master_data,)
tuned_model

INFO:modeling.models.sklearn_model.tuner:Initializing sklearn hyperparameters tuner...
INFO:modeling.models.sklearn_model.tuner:Tuning hyperparameters...


SklearnModel(estimator=TransformedTargetRegressor(func=<ufunc 'log1p'>, inverse_func=<ufunc 'expm1'>,
                           regressor=SGDRegressor(penalty='elasticnet',
                                                  random_state=123)), target="Target" ,features_in=['Feature_1', 'Feature_2', 'Feature_3'])
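
The `regressor__` prefix in `param_grid` follows scikit-learn's nested parameter naming: judging by the printed estimator, the `SGDRegressor` sits inside a `TransformedTargetRegressor` under the `regressor` argument. The sketch below reproduces a comparable search with scikit-learn alone; the synthetic data and the `alpha` grid are assumptions made for the example, and `modeling` classes are not used.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
# Strictly positive synthetic target, so `log1p` is well defined.
y = np.exp(X @ np.array([0.5, -0.3, 0.2]))

estimator = TransformedTargetRegressor(
    regressor=SGDRegressor(penalty="elasticnet", random_state=123),
    func=np.log1p,
    inverse_func=np.expm1,
)

search = GridSearchCV(
    estimator,
    # Nested parameters are addressed as `<argument name>__<parameter>`.
    param_grid={"regressor__alpha": [0.0001, 0.01, 1.0]},
    # With several metrics, `refit` names the one used to pick the best model.
    scoring={"mae": "neg_mean_absolute_error", "r2": "r2"},
    refit="r2",
    n_jobs=1,
)
search.fit(X, y)

print(search.best_params_)
print(sorted(key for key in search.cv_results_ if key.startswith("mean_test_")))
```

Each key of the `scoring` dictionary produces its own `mean_test_<name>` entry in `cv_results_`, so all metrics stay available even though only one drives the refit.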

### Tune `KerasModel` using the `KerasModelTuner`

To tune hyperparameters for `KerasModel` you have to define the tuning strategy in the `_create_trial_hyperparameters()` method. Note that the resulting hyperparameters should match those expected by the factory's `create_model_instance()` method.

Let's start with the `KerasModelTuner` ([API](../../../../../../docs/build/apidoc/modeling/modeling.models.keras_model.html#modeling.models.keras_model.tuner.KerasModel)) initialization. We've already initialized `KerasModelFactory` in the section above, but we'll create it here again for completeness:

In [81]:
class UserDefinedKerasModelFactory(KerasModelFactory):
    @staticmethod
    def create_model_instance(
        units: int = 32,
        learning_rate: float = 1e-03,
    ) -> keras.Model:
        model = keras.Sequential(
            [
                keras.layers.Normalization(axis=-1),
                keras.layers.Dense(units=units, activation="tanh"),
                keras.layers.Dense(units=1),
            ]
        )
        model.compile(
            optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
            loss="mean_squared_error",
            metrics=[
                keras.metrics.MeanAbsoluteError(),
                keras.metrics.MeanSquaredError(),
                keras.metrics.MeanAbsolutePercentageError(),
            ],
        )
        return model

In [82]:
keras_model_init_config = {
    "units": 128,
    "learning_rate": 1e-05,
}
keras_model_factory = UserDefinedKerasModelFactory(
    keras_model_init_config,
    model_features,
    model_target
)

Then, we'll create a tuner for the `UserDefinedKerasModelFactory`. There we'll describe the hyperparameter tuning strategy in the `_create_trial_hyperparameters()` method.

Note that the class inherits from `KerasModelTuner`, which defines the interface for the method.

In [83]:
from modeling.models.keras_model import KerasModelTuner


help(KerasModelTuner._create_trial_hyperparameters)

Help on function _create_trial_hyperparameters in module modeling.models.keras_model.tuner:

_create_trial_hyperparameters(hp: keras_tuner.engine.hyperparameters.HyperParameters, model_init_config: Dict[str, Any], model_hyperparameters_config: Dict[str, Any]) -> Dict[str, Any]
    Abstract method to specify hyperparameters tuning strategy
    for `tensorflow.keras.Model` instance using ``keras_tuner.HyperParameters``.
    
    This method should be overwritten by the inheritor of ``KerasModelTuner``
    with the custom hyperparameters tuning logic. See example implementation below.
    
    Example implementation::
    
        class UserKerasModelTuner(KerasModelTuner):
            @staticmethod
            def _create_trial_hyperparameters(
                #  Hyperparameter tuning API is based on `keras_tuner` package.
                hp: keras_tuner.HyperParameters,
                #  You also might want to use hyperparameters settings from Factory,
                #  those are avai

Hyperparameter tuning API is based on `keras_tuner` package. Learn more about the `keras_tuner` on their [website](https://keras.io/guides/keras_tuner/getting_started/#tune-the-model-architecture).

In this freshly created tuner we'll reuse the original `learning_rate` value from the factory, so `learning_rate` won't be tuned. For `units`, we'll specify the search space and the sampling strategy in the `model_hyperparameters_config` dict.

In [84]:
import keras_tuner


class UserDefinedKerasModelTuner(KerasModelTuner):
    @staticmethod
    def _create_trial_hyperparameters(
        #  Hyperparameter tuning API is based on `keras_tuner` package. 
        hp: keras_tuner.HyperParameters,
        #  You also might want to use hyperparameters settings from Factory,
        #  those are available in `model_init_config` dictionary.
        model_init_config: tp.Dict[str, tp.Any],
        #  Use `model_hyperparameters_config` to specify strategy.
        #  See the example below.
        model_hyperparameters_config: tp.Dict[str, tp.Any],
    ) -> tp.Dict[str, tp.Any]:
        units = hp.Int(
            "units",
            min_value=model_hyperparameters_config["units"]["min_value"],
            max_value=model_hyperparameters_config["units"]["max_value"],
            sampling=model_hyperparameters_config["units"]["sampling"],
        )
        learning_rate = model_init_config["learning_rate"]
        return {"units": units, "learning_rate": learning_rate}
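
The `sampling` argument passed to `hp.Int` controls how candidate values are spread across the range. The sketch below does not call `keras_tuner`; it is a NumPy illustration, under the assumption that log sampling means drawing uniformly on a logarithmic scale, of how that differs from linear sampling over the same bounds.

```python
import numpy as np

rng = np.random.default_rng(0)
min_value, max_value = 8, 1024

# Linear sampling: every value in the range is equally likely.
linear_samples = rng.uniform(min_value, max_value, size=10_000)

# Log sampling: uniform in log space, so each doubling (8-16, 16-32, ...)
# receives roughly the same share of the draws.
log_samples = np.exp(
    rng.uniform(np.log(min_value), np.log(max_value), size=10_000)
)

linear_median = float(np.median(linear_samples))
log_median = float(np.median(log_samples))
print(round(linear_median), round(log_median))
```

For a layer width that spans two orders of magnitude, log sampling spends far more trials on small and medium sizes than linear sampling does.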

Finally, let's create a dictionary with the required structure and pass it into tuner initialization. See the full list of tuner settings:

In [85]:
help(UserDefinedKerasModelTuner.__init__)

Help on function __init__ in module modeling.models.keras_model.tuner:

__init__(self, model_factory: modeling.models.keras_model.factory.KerasModelFactory, model_tuner_config: Dict[str, Any]) -> None
    Structure of ``model_tuner_config`` should be
    aligned with ``KerasModelTunerConfig``::
    
        # Path to instance of tuner in keras_tuner
        tuner: keras_tuner.RandomSearch
        # Metric for hyperparameters objective and direction
        objective_metric: mean_squared_error
        objective_direction: min
        # Number of hyperparameter tuning trials and executions per trial
        max_trials: 5
        executions_per_trial: 1
        # Project name
        project_name: oai-keras-model
        # Path to the directory to keep training artefacts
        tuning_artefacts_dir: data/sample_keras_model_hp
    
    Args:
        model_factory: instance of the KerasModelFactory
        model_tuner_config: configuration dictionary
         with the structure defined abo

In [86]:
keras_model_tuner_config = {
    "max_trials": 5,
    "tuner": "keras_tuner.RandomSearch",
    "objective_metric": "mean_squared_error",
    "objective_direction": "min",
}

model_hyperparameters_config = {
    "units": {
        "min_value": 8,
        "max_value": 1024,
        "sampling": "log"
    }
}

# Named `keras_model_tuner` to avoid shadowing the imported `keras_tuner` module.
keras_model_tuner = UserDefinedKerasModelTuner(keras_model_factory, keras_model_tuner_config)

Let's finally tune the hyperparameters:

In [87]:
tuned_model = keras_model_tuner.tune(
    master_data, 
    model_hyperparameters_config, 
    verbose=1,
)

Trial 5 Complete [00h 00m 28s]
mean_squared_error: 3084.65869140625

Best mean_squared_error So Far: 2999.153076171875
Total elapsed time: 00h 03m 22s
INFO:tensorflow:Oracle triggered exit


In [88]:
tuned_model.keras_model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 normalization (Normalizatio  (None, 3)                7         
 n)                                                              
                                                                 
 dense (Dense)               (None, 384)               1536      
                                                                 
 dense_1 (Dense)             (None, 1)                 385       
                                                                 
Total params: 1,928
Trainable params: 1,921
Non-trainable params: 7
_________________________________________________________________


## Functional

The `functional` subpackage lets you work with the classes listed above in a functional style. This is especially useful when working with pipelines or other orchestration tools that require simple callable objects (e.g., Kedro).

Most of the functions simply expect a class instance as an input, call a single method of that instance, and return the output. Sometimes outputs are converted into objects that are better suited for storing on disk; e.g., the dictionary with feature importances returned by `ModelBase.get_feature_importance()` is transformed into a `pd.DataFrame`.

You can find a demo of how these functions work in the [modeling tutorial](./modeling.ipynb).
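
The dictionary-to-`pd.DataFrame` conversion described above can be sketched as follows. `ToyModel` and `feature_importance_frame` are hypothetical names used only for this example; the functions in the `functional` subpackage may have different names, signatures and output columns, and the real `get_feature_importance()` may take arguments.

```python
import typing as tp

import pandas as pd


class ToyModel:
    """Hypothetical stand-in exposing `get_feature_importance()`."""

    def get_feature_importance(self) -> tp.Dict[str, float]:
        return {"Feature_1": 0.6, "Feature_2": 0.1, "Feature_3": 0.3}


def feature_importance_frame(model: ToyModel) -> pd.DataFrame:
    """Call a single method of the model and return a table that is easy to store."""
    importance = model.get_feature_importance()
    return (
        pd.Series(importance, name="feature_importance")
        .rename_axis("feature")
        .sort_values(ascending=False)
        .reset_index()
    )


importance_table = feature_importance_frame(ToyModel())
print(importance_table)
```

A plain function like this takes one object in and returns one object out, so it can be registered as a pipeline node without any extra glue code.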

### Subpackage diagram

![diagram](./_images/_models_functional.png)

[Click here](./_images/_models_functional.png) to expand diagram above.