## **AutoHyper**

AutoHyper is a lightweight, modular Python package for hyperparameter optimization (HPO) of supervised learning models on tabular data.
It is fully customizable and gives you fine-grained control over the entire tuning and validation process.

### **Key Features**

AutoHyper is designed to:

- Provide a clear and consistent interface for different HPO strategies: currently grid search, random search, and evolutionary algorithms.

- Leverage nested cross-validation to deliver robust and unbiased estimates of out-of-sample model performance.

- Incorporate a custom selection mechanism that combines performance and robustness using a weighted scoring function, ensuring the best configurations are both accurate and consistently effective across multiple resampling iterations.

- Return structured outputs, ideal for quantitative comparison and visual inspection of configurations.

- Offer detailed logging and rank configurations by a composite score of performance and selection frequency.

### **Key Components of Hyperparameter Optimization in AutoHyper**

A typical hyperparameter optimization (HPO) problem consists of **five essential components**. AutoHyper gives users full control over each of them:

1. **`Learner`**

The learner is the machine learning model to be tuned. AutoHyper supports any supervised model that follows the **scikit-learn API**. This includes classifiers and regressors such as `RandomForestClassifier`, `XGBRegressor`, and `LogisticRegression`, as well as custom models wrapped with **SciKeras** (for Keras) or **Skorch** (for PyTorch). The only requirement is that the model implements `fit(X, y)`, `predict(X)`, and `set_params(**kwargs)`.
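
As a rough illustration of that requirement, the sketch below checks an object for the three methods by duck typing. `MeanRegressor` and `has_required_interface` are hypothetical names introduced here, not part of AutoHyper, and whatever validation AutoHyper performs internally may differ.

```python
class MeanRegressor:
    """Toy learner that always predicts the (scaled) mean of the training targets."""

    def __init__(self, shrinkage=1.0):
        self.shrinkage = shrinkage

    def fit(self, X, y):
        # Store the mean of the training targets, scaled by the shrinkage factor
        self.mean_ = self.shrinkage * sum(y) / len(y)
        return self

    def predict(self, X):
        # One constant prediction per input row
        return [self.mean_ for _ in X]

    def set_params(self, **kwargs):
        # Overwrite hyperparameters by name, as a tuner would do between trials
        for name, value in kwargs.items():
            setattr(self, name, value)
        return self


def has_required_interface(model):
    """Return True if the model exposes callable fit, predict and set_params."""
    required = ("fit", "predict", "set_params")
    return all(callable(getattr(model, name, None)) for name in required)


model = MeanRegressor().set_params(shrinkage=0.5)
model.fit([[1], [2], [3]], [2.0, 4.0, 6.0])
predictions = model.predict([[4], [5]])
```

In practice, subclassing `sklearn.base.BaseEstimator` provides `set_params` and `get_params` without writing them by hand.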

2. **`Hyperparameter Space`**

The search space defines the set of hyperparameters to explore. Users specify it as a dictionary that maps parameter names to candidate values, and AutoHyper parses it according to the selected optimization strategy. When needed, particularly for random search and evolutionary algorithms, AutoHyper can **automatically infer and apply the most appropriate value scale** for each hyperparameter: linear sampling for float and integer parameters, and categorical sampling for discrete options. This makes exploration more efficient, especially for large or non-uniform hyperparameter domains.
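
One possible reading of this inference step is sketched below: numeric candidate lists are treated as a range and sampled linearly, anything else is sampled as a category. The function `sample_configuration` and the added `booster` entry are assumptions made for illustration; AutoHyper's actual parsing rules are not shown in this notebook and may differ.

```python
import random


def _is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def sample_configuration(hp_values, rng):
    """Draw one configuration from a dict of candidate values."""
    config = {}
    for name, candidates in hp_values.items():
        if all(_is_number(v) for v in candidates):
            low, high = min(candidates), max(candidates)
            if all(isinstance(v, int) for v in candidates):
                # Integer parameter: uniform over the integer range
                config[name] = rng.randint(low, high)
            else:
                # Float parameter: uniform over the continuous range
                config[name] = rng.uniform(low, high)
        else:
            # Discrete options: pick one of the listed categories
            config[name] = rng.choice(candidates)
    return config


hp_values = {
    "max_depth": [1, 3, 9, 12, 20],
    "learning_rate": [0.1, 0.2, 0.5, 0.9],
    "booster": ["gbtree", "dart"],  # hypothetical categorical entry
}
rng = random.Random(42)
trials = [sample_configuration(hp_values, rng) for _ in range(35)]
```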

3. **`Dataset`**

AutoHyper is specifically tailored for **tabular datasets**, where `X` is a `pandas.DataFrame` and `y` is a `pandas.Series` or `numpy.ndarray`. The dataset is passed directly to the HPO class and internally split according to the specified resampling strategy. The package assumes clean, preprocessed data, leaving feature engineering and preprocessing under the user's control for maximal transparency and modularity.

4. **`Resampling Strategy`**

To limit overfitting to the validation data and obtain a less biased evaluation, AutoHyper uses **nested cross-validation**. The outer loop estimates generalization performance, while the inner loop performs hyperparameter tuning. Users can configure the number of outer and inner folds (`outer_k` and `inner_k` in `hp_tuning`), making the resampling strategy fully customizable. Because the outer test folds are never used for tuning, this separation gives a rigorous assessment of how the chosen hyperparameters would perform on unseen data.
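
The index bookkeeping behind nested cross-validation can be sketched in plain Python as follows. This is a simplified illustration with contiguous, unshuffled folds; the helper names `kfold_indices` and `nested_cv_splits` are hypothetical and do not reflect how AutoHyper builds its splits.

```python
def kfold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def nested_cv_splits(n_samples, outer_k, inner_k):
    """Yield (outer_train, outer_test, inner_splits) for each outer fold."""
    outer_folds = kfold_indices(n_samples, outer_k)
    for i, outer_test in enumerate(outer_folds):
        outer_train = [
            idx for j, fold in enumerate(outer_folds) if j != i for idx in fold
        ]
        # Inner folds are built only from the outer training indices,
        # so the outer test fold never influences tuning
        inner_folds = kfold_indices(len(outer_train), inner_k)
        inner_splits = []
        for m, val_positions in enumerate(inner_folds):
            inner_val = [outer_train[p] for p in val_positions]
            inner_train = [
                outer_train[p]
                for q, fold in enumerate(inner_folds)
                if q != m
                for p in fold
            ]
            inner_splits.append((inner_train, inner_val))
        yield outer_train, outer_test, inner_splits


splits = list(nested_cv_splits(n_samples=20, outer_k=5, inner_k=3))
```

Each outer fold yields `inner_k` train/validation pairs for tuning, and the held-out `outer_test` indices are used only to score the configuration selected by the inner loop.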

5. **`Performance Measure`**

Evaluation uses common metrics such as `accuracy`, `f1`, `precision`, and `recall` for classification and `r2` or `neg_mean_squared_error` for regression. The user specifies the metric via the `scoring` parameter. Internally, AutoHyper averages the performance of each configuration across all outer folds and applies a **custom weighted scoring function** that balances average performance with robustness (measured as how often a configuration is selected), so that the final recommendation is both strong and stable.
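
The exact weighting AutoHyper uses is not given here, so the following is only one plausible form of such a composite score: a convex combination of min-max-normalized mean performance and selection frequency. The function `rank_configurations`, the `weight` parameter, and the sample numbers are all assumptions made for illustration, and scores are assumed to be "higher is better".

```python
from collections import defaultdict


def rank_configurations(outer_results, weight=0.5):
    """Rank configurations by weight * performance + (1 - weight) * frequency.

    outer_results holds one (config_key, score) pair per outer fold, where
    config_key identifies the configuration selected in that fold.
    """
    scores = defaultdict(list)
    for config, score in outer_results:
        scores[config].append(score)

    n_folds = len(outer_results)
    mean_perf = {c: sum(v) / len(v) for c, v in scores.items()}
    low, high = min(mean_perf.values()), max(mean_perf.values())

    ranking = []
    for config, values in scores.items():
        # Min-max normalize so performance and frequency share a 0-1 scale
        perf = 1.0 if high == low else (mean_perf[config] - low) / (high - low)
        freq = len(values) / n_folds
        ranking.append((config, weight * perf + (1 - weight) * freq))
    return sorted(ranking, key=lambda item: item[1], reverse=True)


# Made-up results: "A" wins three outer folds, "B" and "C" one each
outer_results = [("A", 0.83), ("A", 0.84), ("B", 0.85), ("A", 0.83), ("C", 0.78)]
ranking = rank_configurations(outer_results, weight=0.5)
```

With these made-up numbers, configuration "B" has the best single score but "A" ranks first, because it is selected in three of the five outer folds.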

### **Optimization Strategies - How To Use AutoHyper**

### **Libraries**

In [None]:
# Standard library
import itertools

# Third-party
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import xgboost as xgb
from sklearn.datasets import fetch_california_housing
from sklearn.metrics import root_mean_squared_error  # requires scikit-learn >= 1.4
from sklearn.model_selection import cross_val_score, train_test_split

# Local package
from src.autohyper import HPO

## **TESTS**

### **Dataset**

In [None]:
# Load the California Housing Dataset and convert it into a pandas DataFrame
X, y = fetch_california_housing(return_X_y=True)

X = pd.DataFrame(
    X,
    columns=[
        "MedInc",
        "HouseAge",
        "AveRooms",
        "AveBedrms",
        "Population",
        "AveOccup",
        "Latitude",
        "Longitude",
    ],
)
y = pd.Series(y, name="target")

### **Setup**

In [None]:
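# Learner to tune: any estimator that follows the scikit-learn API
model = xgb.XGBRegressor()
data_features = X
data_target = y
# Candidate values for each hyperparameter
hp_values = {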
    "max_depth": [1, 3, 9, 12, 20],
    "learning_rate": [0.1, 0.2, 0.5, 0.9],
    "n_estimators": [10, 30, 50, 70],
}
task = "regression"

In [None]:
test_hpo = HPO(
    model=model,
    data_features=data_features,
    data_target=data_target,
    hp_values=hp_values,
    task=task,
)

### **01. Grid Search**

In [None]:
grid_search = test_hpo.hp_tuning(hpo_method="grid_search", outer_k=5, inner_k=3)
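
Grid search cost grows multiplicatively with the number of candidate values, so it helps to estimate the workload before launching a run. The sketch below enumerates the grid defined in the setup cell with `itertools.product`. The fit count assumes every configuration is fitted once per inner fold within each outer fold; AutoHyper's real count may differ, for example if it refits the selected configuration on each outer training set.

```python
import itertools

hp_values = {
    "max_depth": [1, 3, 9, 12, 20],
    "learning_rate": [0.1, 0.2, 0.5, 0.9],
    "n_estimators": [10, 30, 50, 70],
}

# Cartesian product of all candidate values, one dict per configuration
names = list(hp_values)
grid = [dict(zip(names, combo)) for combo in itertools.product(*hp_values.values())]

outer_k, inner_k = 5, 3
n_inner_fits = len(grid) * outer_k * inner_k

print(f"{len(grid)} configurations, about {n_inner_fits} inner-loop fits")
```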

### **02. Random Search**

In [None]:
random_search = test_hpo.hp_tuning(
    hpo_method="random_search", outer_k=5, inner_k=3, n_trials=35
)

### **03. Evolutionary Algorithm**

In [None]:
evolutionary_algo = test_hpo.hp_tuning(
    hpo_method="evolutionary_algorithm",
    outer_k=5,
    inner_k=3,
    parents_selection_mechanism="tournament_selection",
    parents_selection_ratio=0.5,
    n_new_configs=10,
    max_generations=20,
)
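
For readers unfamiliar with the `tournament_selection` option, the following is a generic sketch of the technique, not AutoHyper's implementation. The function signature, the `tournament_size` parameter, the toy population, and the way `parents_selection_ratio` is turned into a parent count are all assumptions made for illustration.

```python
import random


def tournament_selection(population, fitness, n_parents, tournament_size=3, rng=None):
    """Select parents by repeatedly keeping the fittest of a small random group."""
    rng = rng or random.Random()
    parents = []
    for _ in range(n_parents):
        # Draw distinct contenders, then keep the one with the highest fitness
        contenders = rng.sample(range(len(population)), tournament_size)
        winner = max(contenders, key=lambda i: fitness[i])
        parents.append(population[winner])
    return parents


# Toy population of configurations with made-up fitness values (higher is better)
population = [{"max_depth": d} for d in range(1, 11)]
fitness = [0.61, 0.70, 0.74, 0.79, 0.82, 0.80, 0.77, 0.73, 0.69, 0.64]

parents_selection_ratio = 0.5
n_parents = int(parents_selection_ratio * len(population))
parents = tournament_selection(
    population, fitness, n_parents, tournament_size=3, rng=random.Random(0)
)
```

Larger tournaments increase selection pressure toward the fittest configurations, while smaller ones preserve more diversity in the parent pool.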
