# Breast Cancer Classification and Evaluation

The Breast Cancer dataset is a good example for demonstrating CyclOps features: it has two classes (binary classification) and no missing values. This clean structure makes it a convenient starting point for exploring the CyclOps Evaluator.

In [None]:
"""Imports."""

import numpy as np
import pandas as pd
from datasets.arrow_dataset import Dataset
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

from cyclops.data.slicer import SliceSpec
from cyclops.evaluate import evaluator
from cyclops.evaluate.fairness import evaluate_fairness
from cyclops.evaluate.metrics import BinaryAccuracy, create_metric
from cyclops.evaluate.metrics.experimental import BinaryAUROC, BinaryAveragePrecision
from cyclops.evaluate.metrics.experimental.metric_dict import MetricDict
from cyclops.report.plot.classification import ClassificationPlotter

In [None]:
# Loading the data
breast_cancer_data = datasets.load_breast_cancer(as_frame=True)
X, y = breast_cancer_data.data, breast_cancer_data.target

### Features
Let's take a quick look at the features and their summary statistics:

In [None]:
df = breast_cancer_data.frame
print(df.describe().T)

In [None]:
# Splitting into train and test
X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.1,
    random_state=13,
)

# Use an SVM classifier; probability=True enables predict_proba
svc = SVC(C=10, gamma=0.01, probability=True)
svc.fit(X_train, y_train)

# Model predictions: hard labels and per-class probabilities
y_pred = svc.predict(X_test)
y_pred_prob = svc.predict_proba(X_test)
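`predict_proba` returns one column per class, in the order of `svc.classes_`, so `y_pred_prob[:, 1]` is the estimated probability of class 1 (for this dataset, class 0 is malignant and class 1 is benign). The minimal sketch below, using a hypothetical probability array, shows how thresholding that column turns scores into labels; the helper name and the `>=` rule at 0.5 are assumptions of the sketch. scikit-learn documents that `SVC.predict_proba` (fit with Platt scaling) can disagree with `SVC.predict`, so labels obtained this way are not guaranteed to match `y_pred` exactly.

```python
import numpy as np


def threshold_positive_class(proba: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Turn an (n_samples, 2) probability array into 0/1 labels.

    Column 1 is the probability of class 1, like y_pred_prob[:, 1].
    In this sketch, rows scoring at or above `threshold` are labeled 1.
    """
    positive_scores = proba[:, 1]
    return (positive_scores >= threshold).astype(np.int64)


# Hypothetical probabilities for four samples (columns: class 0, class 1).
example_proba = np.array(
    [
        [0.9, 0.1],
        [0.3, 0.7],
        [0.6, 0.4],
        [0.2, 0.8],
    ]
)
labels = threshold_positive_class(example_proba)
print(labels.tolist())  # [0, 1, 0, 1]
```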

Now we can use CyclOps evaluation metrics to evaluate the model's performance. You can either call each metric individually or group several metrics in a ``MetricDict`` object.
Here, we show both approaches.

### Individual Metrics
If you need only a single metric, create an instance of it and call it on the ground truth and the predictions:

In [None]:
bin_acc_metric = BinaryAccuracy()
bin_acc_metric(y_test.values, np.float64(y_pred))
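Binary accuracy is the fraction of samples whose predicted label matches the ground truth. The NumPy sketch below spells out that definition; `binary_accuracy_sketch` is a hypothetical helper for illustration, not part of CyclOps, and it assumes both inputs are already 0/1 labels (the CyclOps metrics can also take probability scores, as the evaluator call later in this notebook does).

```python
import numpy as np


def binary_accuracy_sketch(target, preds) -> float:
    """Fraction of positions where the 0/1 prediction equals the 0/1 target."""
    target = np.asarray(target)
    preds = np.asarray(preds)
    if target.shape != preds.shape:
        raise ValueError("target and preds must have the same shape")
    return float(np.mean(target == preds))


# Toy example: 4 of the 5 predictions match the targets.
acc = binary_accuracy_sketch(np.array([1, 0, 1, 1, 0]), np.array([1.0, 0.0, 0.0, 1.0, 0.0]))
print(acc)  # 0.8
```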

### Using ``MetricDict``
If you need several metrics, you can group them in a collection and compute them all with a single call, which can also be faster than computing each metric separately.

In [None]:
metric_names = [
    "binary_accuracy",
    "binary_precision",
    "binary_recall",
    "binary_f1_score",
    "binary_roc_curve",
]
metrics = [
    create_metric(metric_name, experimental=True) for metric_name in metric_names
]
metric_collection = MetricDict(metrics)
# Use positive-class probabilities so the ROC curve is computed over real scores
metric_collection(y_test.values, y_pred_prob[:, 1])
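Precision, recall, and F1 score in this collection are all derived from confusion-matrix counts: true positives (TP), false positives (FP), and false negatives (FN). Precision is TP / (TP + FP), recall is TP / (TP + FN), and F1 is their harmonic mean. The sketch below computes them directly with NumPy to make the definitions concrete; it is an illustration rather than the CyclOps implementation, and returning 0.0 for zero denominators is an assumption of the sketch.

```python
import numpy as np


def precision_recall_f1(target, preds):
    """Binary precision, recall and F1 from 0/1 arrays (zero denominators -> 0.0)."""
    target = np.asarray(target).astype(bool)
    preds = np.asarray(preds).astype(bool)
    tp = int(np.sum(target & preds))   # predicted 1, actually 1
    fp = int(np.sum(~target & preds))  # predicted 1, actually 0
    fn = int(np.sum(target & ~preds))  # predicted 0, actually 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy example: TP = 3, FP = 1, FN = 1.
p, r, f = precision_recall_f1([1, 1, 1, 0, 0, 1], [1, 0, 1, 1, 0, 1])
print(p, r, f)  # 0.75 0.75 0.75
```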

You can reset the metric collection and add more metrics to it:

In [None]:
metric_collection.reset()
metric_collection.add_metrics(BinaryAveragePrecision(), BinaryAUROC())
# AUROC and average precision need scores, not hard 0/1 labels
metric_collection(y_test.values, y_pred_prob[:, 1])
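Unlike accuracy or precision, `BinaryAUROC` and `BinaryAveragePrecision` summarize performance across all thresholds, so they are most informative when computed on probability scores such as `y_pred_prob[:, 1]` rather than on hard 0/1 labels. AUROC has a ranking interpretation: it is the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative sample, with ties counted as one half. The hypothetical `auroc_by_ranking` helper below computes that quantity by brute force over all positive/negative pairs; it is a sketch for intuition, not how CyclOps computes the metric.

```python
import numpy as np


def auroc_by_ranking(target, scores) -> float:
    """AUROC as P(score of a random positive > score of a random negative), ties = 0.5."""
    target = np.asarray(target).astype(bool)
    scores = np.asarray(scores, dtype=float)
    pos = scores[target]
    neg = scores[~target]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one positive and one negative sample")
    # Compare every positive score with every negative score via broadcasting.
    diff = pos[:, None] - neg[None, :]
    wins = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return float(wins / (pos.size * neg.size))


target = [0, 1, 0, 1]
scores = [0.2, 0.3, 0.6, 0.9]
hard_labels = [0, 0, 1, 1]  # the same scores thresholded at 0.5
print(auroc_by_ranking(target, scores))       # 0.75
print(auroc_by_ranking(target, hard_labels))  # 0.5
```

Thresholding collapses the ranking information, which is why the hard-label value is lower here.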

### Data Slicing

In addition to overall metrics, it can be useful to see how the model performs on specific subpopulations or subsets of the data. We can define these subsets using ``SliceSpec`` objects.

In [None]:
spec_list = [
    {
        "worst radius": {
            "min_value": 14.0,
            "max_value": 15.0,
            "min_inclusive": True,
            "max_inclusive": False,
        },
    },
    {
        "worst radius": {
            "min_value": 15.0,
            "max_value": 17.0,
            "min_inclusive": True,
            "max_inclusive": False,
        },
    },
    {
        "worst texture": {
            "min_value": 23.1,
            "max_value": 28.7,
            "min_inclusive": True,
            "max_inclusive": False,
        },
    },
]
slice_spec = SliceSpec(spec_list)
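Each entry in `spec_list` is a range condition on one feature. Judging by the parameter names, `min_inclusive=True` with `max_inclusive=False` describes the half-open interval [`min_value`, `max_value`), so the two `worst radius` slices, [14.0, 15.0) and [15.0, 17.0), do not overlap. The hypothetical `range_mask` helper below sketches that selection as a NumPy boolean mask over made-up values; it illustrates the assumed semantics and is not the CyclOps implementation.

```python
import numpy as np


def range_mask(values, min_value, max_value, min_inclusive=True, max_inclusive=False):
    """Boolean mask of entries inside the interval defined by the bounds and flags."""
    values = np.asarray(values, dtype=float)
    lower = values >= min_value if min_inclusive else values > min_value
    upper = values <= max_value if max_inclusive else values < max_value
    return lower & upper


# Hypothetical "worst radius" values, including both boundary points.
worst_radius = np.array([13.9, 14.0, 14.5, 15.0, 16.9, 17.0])
m1 = range_mask(worst_radius, 14.0, 15.0)
m2 = range_mask(worst_radius, 15.0, 17.0)
print(m1.tolist())  # [False, True, True, False, False, False]
print(m2.tolist())  # [False, False, False, True, True, False]
```

Because 15.0 is excluded from the first range and included in the second, each row falls into at most one `worst radius` slice.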

### Intersectional slicing

When subpopulation slices are specified using ``SliceSpec``, we may also want to evaluate combinations of those slices (intersectional slices). The ``intersections`` argument controls this.

In [None]:
slice_spec = SliceSpec(spec_list, intersections=2)
print(slice_spec)
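Assuming `intersections=2` means pairwise combinations of the specified slices, intersecting two slices means a row must satisfy both conditions (a logical AND of their masks). The sketch below enumerates pairwise combinations of three hypothetical slice masks with `itertools.combinations` to show the idea; it does not reproduce how CyclOps names, filters, or deduplicates combined slices.

```python
from itertools import combinations

import numpy as np

# Hypothetical masks for the three slices in spec_list, over six rows.
slice_masks = {
    "worst radius [14, 15)": np.array([True, True, False, False, False, False]),
    "worst radius [15, 17)": np.array([False, False, True, True, False, False]),
    "worst texture [23.1, 28.7)": np.array([False, True, True, False, True, False]),
}

# Pairwise intersections: a row is in the combined slice only if it is in both slices.
intersection_masks = {
    f"{name_a} & {name_b}": mask_a & mask_b
    for (name_a, mask_a), (name_b, mask_b) in combinations(slice_masks.items(), 2)
}

for name, mask in intersection_masks.items():
    print(name, int(mask.sum()))
```

Since the two `worst radius` ranges are disjoint, their intersection selects no rows; any metric on such a slice has no data to work with.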

### Preparing Result

The CyclOps Evaluator takes data as a Hugging Face `Dataset` object, so we combine the features, ground truth, and predictions in a single DataFrame and convert it to a `Dataset`:

In [None]:
# Combine result and features for test data
df = pd.concat([X_test, pd.DataFrame(y_test, columns=["target"])], axis=1)
df["preds"] = y_pred
df["preds_prob"] = y_pred_prob[:, 1]

In [None]:
# Create a Dataset object; drop the shuffled pandas index instead of storing it as a column
breast_cancer_data = Dataset.from_pandas(df, preserve_index=False)
breast_cancer_sliced_result = evaluator.evaluate(
    dataset=breast_cancer_data,
    metrics=metric_collection,  # type: ignore[list-item]
    target_columns="target",
    prediction_columns="preds_prob",
    slice_spec=slice_spec,
)
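Conceptually, sliced evaluation applies each slice's row mask and computes the same metrics on the selected rows, alongside the full dataset. The sketch below shows that pattern with plain NumPy; `evaluate_per_slice`, the `"overall"` key, the 0.5 threshold, and the toy data are all hypothetical, and the sketch makes no claim about how `evaluator.evaluate` works internally.

```python
import numpy as np


def evaluate_per_slice(target, scores, slice_masks, metric_fns, threshold=0.5):
    """Compute every metric on the full data and on each masked subset."""
    target = np.asarray(target)
    labels = (np.asarray(scores) >= threshold).astype(int)  # scores -> 0/1 labels
    all_rows = np.ones(target.shape, dtype=bool)
    results = {}
    for slice_name, mask in {"overall": all_rows, **slice_masks}.items():
        results[slice_name] = {
            metric_name: fn(target[mask], labels[mask])
            for metric_name, fn in metric_fns.items()
        }
    return results


def accuracy(t, p):
    # Empty slices (e.g. disjoint intersections) get NaN instead of an error.
    return float(np.mean(t == p)) if t.size else float("nan")


target = np.array([1, 0, 1, 1, 0, 0])
scores = np.array([0.9, 0.2, 0.4, 0.7, 0.6, 0.1])
masks = {"first half": np.array([True, True, True, False, False, False])}
res = evaluate_per_slice(target, scores, masks, {"accuracy": accuracy})
print(res)
```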

We can visualize the ``BinaryF1Score`` and ``BinaryPrecision`` metrics for the different slices:

In [None]:
# Extracting the metric values for all the slices.
slice_metrics = {
    slice_name: {
        metric_name: metric_value
        for metric_name, metric_value in slice_results.items()
        if metric_name in ["BinaryF1Score", "BinaryPrecision"]
    }
    for slice_name, slice_results in breast_cancer_sliced_result[
        "model_for_preds_prob"
    ].items()
}
# Plotting the metric values for all the slices.
plotter = ClassificationPlotter(task_type="binary", class_names=["0", "1"])
plotter.set_template("plotly_white")
slice_metrics_plot = plotter.metrics_comparison_bar(slice_metrics)
slice_metrics_plot.show()

### Fairness Evaluator

The Breast Cancer dataset is not an ideal example for fairness evaluation, but to demonstrate how to use the CyclOps fairness evaluator, we apply it to the `mean texture` feature. The evaluator is best used on features with discrete values, and for optimal results the feature should have fewer than 50 unique categories. Since `mean texture` is continuous, this example is purely illustrative.

In [None]:
fairness_result = evaluate_fairness(
    dataset=breast_cancer_data,
    metrics="binary_precision",  # type: ignore[list-item]
    groups="mean texture",
    target_columns="target",
    prediction_columns="preds_prob",
)
print(fairness_result)
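Because `mean texture` is continuous, one common preprocessing step before a group-fairness analysis is to bin it into a small number of discrete groups, for example by quantiles. The sketch below does this with `np.quantile` and `np.digitize`, then compares precision across groups by dividing each group's value by a reference group's value (a parity ratio). The helper names, the number of bins, the choice of reference group, and the toy data are assumptions for illustration; this is not how `evaluate_fairness` works internally, and its real output may be organized differently.

```python
import numpy as np


def quantile_groups(values, n_bins=4):
    """Assign each value an integer group id in 0..n_bins-1 by quantile bin."""
    values = np.asarray(values, dtype=float)
    inner_edges = np.quantile(values, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.digitize(values, inner_edges)


def group_precision(target, labels):
    """Precision within one group; NaN if the group has no predicted positives."""
    predicted_pos = labels == 1
    if not predicted_pos.any():
        return float("nan")
    return float(np.mean(target[predicted_pos] == 1))


def precision_parity(target, labels, groups, reference_group=0):
    """Per-group precision and its ratio to the reference group's precision."""
    target, labels, groups = np.asarray(target), np.asarray(labels), np.asarray(groups)
    per_group = {
        int(g): group_precision(target[groups == g], labels[groups == g])
        for g in np.unique(groups)
    }
    reference = per_group[reference_group]
    ratios = {g: value / reference for g, value in per_group.items()}
    return per_group, ratios


# Toy data: two quantile bins are enough for eight rows.
mean_texture = np.array([10.0, 11.0, 12.0, 13.0, 20.0, 21.0, 22.0, 23.0])
target = np.array([1, 0, 1, 1, 1, 0, 0, 1])
labels = np.array([1, 1, 1, 0, 1, 1, 1, 1])
groups = quantile_groups(mean_texture, n_bins=2)
per_group, ratios = precision_parity(target, labels, groups, reference_group=0)
print(per_group)
print(ratios)
```

A ratio close to 1 means the group's precision is similar to the reference group's; how far from 1 counts as a disparity is a policy choice, not something this sketch decides.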