<h1 style="text-align: center;">Solving use case [SUC]: Solving a problem with MetaGen</h1>

# $P_3$ problem
Domain:
$$Alpha \models Def^{R} = \langle 0.0001, 0.001\rangle \\ Iterations \models Def^{I} = \langle 5, 200\rangle \\ Loss \models Def^{C} = \{squared\:error, huber, epsilon\:insensitive\}$$

Fitness function:
$$Regression(Alpha, Iterations, Loss)$$

This is a typical machine learning regression problem. The goal is to find the hyperparameter values (`Alpha`, `Iterations`, and `Loss`) that yield the lowest-error model on a training set.

In [None]:
%pip install pymetagen-datalabupo

In [1]:
from metagen.framework import Domain, Solution
from metagen.metaheuristics import RandomSearch
import warnings

To solve this problem, a `Domain` object is constructed by defining one variable for each hyperparameter to optimize. Following the $P_3$ specification, three variables are defined.

A $REAL$ variable called `alpha`, with values in the range $[0.0001, 0.001]$, is defined using the `define_real` method. An $INTEGER$ variable called `iterations`, with values in the range $[5, 200]$, is defined using the `define_integer` method. Finally, a $CATEGORICAL$ variable called `loss` is defined using the `define_categorical` method, with a list of unique values: `squared_error`, `huber`, and `epsilon_insensitive`.

In [2]:
p3_domain = Domain()
p3_domain.define_real("alpha", 0.0001, 0.001)
p3_domain.define_integer("iterations", 5, 200)
p3_domain.define_categorical("loss", ["squared_error", "huber", "epsilon_insensitive"])

The fitness function must then construct a regression model using the training dataset and the hyperparameters of the potential solution. In this case, the sklearn package is used for the machine learning operations.

A synthetic training dataset with $1000$ instances and $4$ features is generated using the `make_regression` function from the `sklearn.datasets` module. The result is unpacked into two variables: the inputs `X` and the expected outputs `y`.

In [3]:
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=1000, n_features=4)
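
As a quick sanity check, the shapes of the generated arrays can be inspected. The sketch below is separate from the notebook's own cell: the names `X_demo` and `y_demo` and the `random_state=0` argument are assumptions added here so the example is reproducible, since the original call does not fix a seed.

```python
from sklearn.datasets import make_regression

# random_state is an assumption added for reproducibility; the notebook omits it.
X_demo, y_demo = make_regression(n_samples=1000, n_features=4, random_state=0)

# One row per instance, one column per feature.
print(X_demo.shape)  # (1000, 4)
# One target value per instance.
print(y_demo.shape)  # (1000,)
```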

The function `p3_fitness` is defined with a `Solution` object as its input parameter. The values of the `loss`, `iterations`, and `alpha` hyperparameters are read from the solution with Python's bracket (subscript) operator.

A regression model using stochastic gradient descent is constructed using the `SGDRegressor` class from the `sklearn.linear_model` package and the obtained values. Cross-validation training is performed using the `cross_val_score` function from the `sklearn.model_selection` package by passing the configured model and the training dataset (`X` and `y`). The cross-validation process is set to return the negative value of the mean absolute percentage error (`mape`), which is specified in the scoring argument.

Because the scorer returns the negated `mape`, the mean score is multiplied by $-1$ to recover the positive error. The solution with the smallest fitness value is then the one with the least error.

In [14]:
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import cross_val_score

def p3_fitness(solution: Solution):
    # Read each hyperparameter from the solution with the bracket operator.
    loss = solution["loss"]
    iterations = solution["iterations"]
    alpha = solution["alpha"]
    # Build the SGD regressor with the candidate hyperparameters.
    model = SGDRegressor(loss=loss, alpha=alpha, max_iter=iterations)
    # The scorer returns the negated MAPE per fold; flip the sign of the mean.
    mape = cross_val_score(model, X, y, scoring="neg_mean_absolute_percentage_error").mean()*-1
    return mape

To conclude, the `p3_domain` and `p3_fitness` elements are passed to the `RandomSearch` metaheuristic, obtaining a hyperparameter solution for this problem by calling the `run` method.

In [15]:
# Silence warnings (e.g., scikit-learn convergence warnings for small iteration counts).
warnings.filterwarnings('ignore')
p3_solution: Solution = RandomSearch(p3_domain, p3_fitness).run()
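
For intuition, the general idea of a random search can be sketched in plain Python: draw candidates at random from the domain and keep the one with the lowest fitness. This is not MetaGen's implementation. The names `sample_candidate`, `random_search`, and `toy_fitness` are hypothetical, the candidate count and seed are arbitrary, and a cheap stand-in fitness replaces `p3_fitness` so the sketch runs without training any model. It assumes that a lower fitness is better, as in this notebook.

```python
import random

LOSSES = ["squared_error", "huber", "epsilon_insensitive"]

def sample_candidate(rng):
    """Draw one candidate uniformly from the P3 domain (hypothetical helper)."""
    return {
        "alpha": rng.uniform(0.0001, 0.001),
        "iterations": rng.randint(5, 200),  # both bounds inclusive
        "loss": rng.choice(LOSSES),
    }

def random_search(fitness, n_candidates=50, seed=0):
    """Keep the candidate with the lowest fitness among random draws."""
    rng = random.Random(seed)
    best, best_fitness = None, float("inf")
    for _ in range(n_candidates):
        candidate = sample_candidate(rng)
        value = fitness(candidate)
        if value < best_fitness:
            best, best_fitness = candidate, value
    return best, best_fitness

def toy_fitness(candidate):
    # Stand-in for p3_fitness: distance of alpha from an arbitrary target.
    return abs(candidate["alpha"] - 0.0005)

best, best_fitness = random_search(toy_fitness)
print(best, best_fitness)
```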

Finally, the `p3_solution` is printed.

In [17]:
print(p3_solution)

F = 0.00012244731769811752	{alpha = 0.0001004509347751957 , iterations = 136 , loss = squared_error}
