<a href="https://colab.research.google.com/github/cagBRT/Data/blob/main/Resampling_Geometric_Mean.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In this notebook we explore the geometric mean, a metric for evaluating classifiers trained on imbalanced data.

In [None]:
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import make_pipeline
from imblearn.metrics import geometric_mean_score
from imblearn.metrics import make_index_balanced_accuracy


RANDOM_STATE = 42

**Create and prepare an imbalanced dataset**

In [None]:
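# Only two weights are given for three classes, so scikit-learn infers the
# last weight as the remainder (1 - 0.1 - 0.9 = 0). The data is therefore
# effectively two-class, with class 0 as the 10% minority.
X, y = make_classification(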
    n_classes=3,
    class_sep=2,
    weights=[0.1, 0.9],
    n_informative=10,
    n_redundant=1,
    flip_y=0,
    n_features=20,
    n_clusters_per_class=4,
    n_samples=5000,
    random_state=RANDOM_STATE,
)

Split the data into train and test sets, stratifying on the labels so that both sets keep the same class ratio

In [None]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=RANDOM_STATE
)

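Use SMOTE to balance the dataset. Inside an imblearn pipeline, resampling is applied only during `fit`, so the test set keeps its original class distribution.<br>
**Create a pipeline for balancing the dataset**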

In [None]:
model = make_pipeline(
    StandardScaler(),
    SMOTE(random_state=RANDOM_STATE),
    LogisticRegression(max_iter=10_000, random_state=RANDOM_STATE),
)
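
SMOTE creates synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours. The sketch below illustrates only that interpolation step on made-up 2-D points. It is not imblearn's implementation, and the helper `interpolate` and both points are hypothetical.

```python
import random


def interpolate(sample, neighbor, rng):
    """Return a random point on the segment between sample and neighbor."""
    gap = rng.random()  # uniform in [0, 1)
    return [s + gap * (n - s) for s, n in zip(sample, neighbor)]


rng = random.Random(42)
minority_sample = [1.0, 2.0]    # hypothetical minority point
minority_neighbor = [3.0, 6.0]  # hypothetical nearest minority neighbour

synthetic = interpolate(minority_sample, minority_neighbor, rng)
print(f"Synthetic sample: {synthetic}")
```

The synthetic point always lies between the two originals, which is why SMOTE fills in the minority region instead of duplicating existing samples.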

Train the model and predict on the test set

In [None]:
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

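For a binary problem, the geometric mean (G-mean) is the square root of the product of sensitivity (recall on the positive class) and specificity (recall on the negative class). For multi-class problems, `geometric_mean_score` generalizes this to a higher root of the product of the per-class recalls.<br>

Because the two rates are multiplied, a classifier that neglects either class scores close to zero, however high its overall accuracy. This is what makes the metric informative on an imbalanced dataset.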

In [None]:
print(f"The geometric mean is {geometric_mean_score(y_test, y_pred):.3f}")
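
To make the binary definition concrete, the sketch below recomputes the geometric mean by hand on a small made-up label set. The lists `y_true_toy` and `y_pred_toy` and the helper `binary_rates` are illustrative and not part of the notebook's data; class 1 is assumed to be the positive class.

```python
import math


def binary_rates(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy labels: 2 positives and 8 negatives.
y_true_toy = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred_toy = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

sensitivity, specificity = binary_rates(y_true_toy, y_pred_toy)
g_mean = math.sqrt(sensitivity * specificity)
print(
    f"sensitivity={sensitivity:.3f}, "
    f"specificity={specificity:.3f}, "
    f"G-mean={g_mean:.3f}"
)
```

Here 8 of 10 predictions are correct, yet the G-mean is pulled down by the missed positive, which plain accuracy would hide.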

### Balanced Accuracy
Balanced accuracy is used in both binary and multi-class classification. In the binary case it is the arithmetic mean of sensitivity and specificity; more generally it is the average of the per-class recalls. **Its main use case is imbalanced data.**<br>

Balanced accuracy is useful when we care about identifying the positives (the minority class) in the data. Because it gives every class the same weight, it is typically lower than plain accuracy when a model favors the majority class.
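
As a small illustration of the gap between the two scores, the sketch below compares plain accuracy with balanced accuracy for a model that always predicts the majority class. The labels and the helper `recall_per_class` are made up for this example.

```python
def recall_per_class(y_true, y_pred):
    """Map each class label to the fraction of its samples predicted correctly."""
    recalls = {}
    for label in sorted(set(y_true)):
        preds = [p for t, p in zip(y_true, y_pred) if t == label]
        recalls[label] = sum(p == label for p in preds) / len(preds)
    return recalls


# Hypothetical toy labels: 9 majority samples (class 0), 1 minority sample (class 1).
y_true_imbalanced = [0] * 9 + [1]
y_pred_majority = [0] * 10  # a model that always predicts the majority class

accuracy = sum(
    t == p for t, p in zip(y_true_imbalanced, y_pred_majority)
) / len(y_true_imbalanced)
recalls = recall_per_class(y_true_imbalanced, y_pred_majority)
balanced_accuracy = sum(recalls.values()) / len(recalls)
print(f"accuracy={accuracy:.2f}, balanced accuracy={balanced_accuracy:.2f}")
```

Plain accuracy rewards the model for the nine easy majority samples, while balanced accuracy averages a perfect recall on class 0 with a zero recall on class 1.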


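### Index Balanced Accuracy
The index balanced accuracy (IBA) can wrap any prediction-based metric for use in imbalanced learning problems. It rescales the metric by a dominance term, the difference between sensitivity and specificity, weighted by `alpha`. With `squared=True`, the metric is squared before the weighting is applied.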

In [None]:
alpha = 0.1
geo_mean = make_index_balanced_accuracy(alpha=alpha, squared=True)(geometric_mean_score)

print(
    f"The IBA using alpha={alpha} and the geometric mean: "
    f"{geo_mean(y_test, y_pred):.3f}"
)

In [None]:
alpha = 0.5
geo_mean = make_index_balanced_accuracy(alpha=alpha, squared=True)(geometric_mean_score)

print(
    f"The IBA using alpha={alpha} and the geometric mean: "
    f"{geo_mean(y_test, y_pred):.3f}"
)
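
To see how `alpha` changes the score, the sketch below applies the weighting by hand. It assumes the binary form of the IBA, `(1 + alpha * (sensitivity - specificity)) * M`, with `M` the (optionally squared) geometric mean. The function `index_balanced_score` and the two rates are hypothetical and stand in for values measured on real predictions.

```python
import math


def index_balanced_score(sensitivity, specificity, alpha, squared=True):
    """Weight the geometric mean by the dominance (sensitivity - specificity)."""
    score = math.sqrt(sensitivity * specificity)
    if squared:
        score = score**2
    dominance = sensitivity - specificity
    return (1 + alpha * dominance) * score


# Hypothetical rates for a classifier that is weaker on the positive class.
sens, spec = 0.5, 0.875
iba_scores = {a: index_balanced_score(sens, spec, a) for a in (0.0, 0.1, 0.5)}
for a, value in iba_scores.items():
    print(f"alpha={a}: IBA={value:.3f}")
```

With `alpha=0` the score is just the squared geometric mean. When sensitivity is below specificity the dominance is negative, so a larger `alpha` penalizes the score more heavily.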