# Double Segmentation Analysis Example
This notebook demonstrates double segmentation analysis with the `tab-right` package, visualized as interactive Plotly heatmaps. The target and prediction columns are dummy data.

In [None]:
# Install dependencies if running in Colab or a fresh environment
# !pip install plotly pandas scikit-learn tab-right numpy

In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
import numpy as np
import plotly.io as pio
from sklearn.datasets import fetch_openml
from sklearn.metrics import mean_squared_error

# Import required modules from tab_right
from tab_right.plotting.plot_segmentations import DoubleSegmPlotting
from tab_right.segmentations.double_seg import DoubleSegmentationImp

pio.renderers.default = "notebook"  # In Google Colab, "colab" is the matching renderer

## Load Example Dataset & Create Dummy Data
We'll use the UCI Adult dataset for features and generate dummy target and prediction columns.
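
Since the dummy target is a fair coin flip and the dummy prediction is drawn uniformly from [0, 1), independently of the target, the expected MSE is 1/3 whichever label is drawn: E[p²] = E[(1 − p)²] = 1/3. The sketch below checks that baseline numerically. It is standalone, and the names `rng`, `n`, and `baseline_mse` are illustrative rather than part of the notebook's pipeline.

```python
import numpy as np

# Simulate the same kind of dummy data used in this notebook:
# a 0/1 target and an independent uniform "probability" prediction.
rng = np.random.default_rng(42)
n = 100_000
target = rng.integers(0, 2, size=n)
prediction = rng.random(n)

# Mean squared error between label and prediction.
baseline_mse = float(np.mean((target - prediction) ** 2))
print(f"Simulated MSE: {baseline_mse:.4f} (theoretical value: {1 / 3:.4f})")
```

Any segment score that departs noticeably from this value on the dummy data is most likely the result of a small segment rather than a real effect.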

In [None]:
data = fetch_openml("adult", version=2, as_frame=True)
df = data.frame.copy()
df = df.sample(n=5000, random_state=42)  # Work with a 5,000-row sample
df = df.dropna().reset_index(drop=True)  # Drop rows with missing values, then reindex

# Create dummy target and prediction columns
np.random.seed(42)
df["target"] = np.random.randint(0, 2, size=len(df))
df["prediction"] = np.random.rand(len(df))  # Dummy probability prediction

# Select relevant columns for analysis
df_analysis = df[
    ["age", "education-num", "hours-per-week", "target", "prediction"]
].copy()  # Add more features if needed
df_analysis.head()

## Double Feature Segmentation
Analyze how model performance varies across segments defined by pairs of features.
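
Conceptually, this amounts to binning each feature, grouping rows by the resulting pair of bins, and scoring every group. The sketch below illustrates the idea in plain pandas on synthetic data. It is not `tab-right`'s implementation, and the names `toy`, `age_bin`, `hours_bin`, `segment_scores`, and `grid` are hypothetical.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (assumed for illustration only).
rng = np.random.default_rng(0)
toy = pd.DataFrame(
    {
        "age": rng.integers(18, 80, size=1000),
        "hours-per-week": rng.integers(10, 60, size=1000),
        "target": rng.integers(0, 2, size=1000),
        "prediction": rng.random(1000),
    }
)

# Step 1: cut each numeric feature into 5 equal-width bins.
toy["age_bin"] = pd.cut(toy["age"], bins=5)
toy["hours_bin"] = pd.cut(toy["hours-per-week"], bins=5)

# Step 2: score every (age_bin, hours_bin) segment that contains rows.
rows = []
for (age_bin, hours_bin), group in toy.groupby(["age_bin", "hours_bin"], observed=True):
    mse = float(np.mean((group["target"] - group["prediction"]) ** 2))
    rows.append({"age_bin": age_bin, "hours_bin": hours_bin, "score": mse, "n": len(group)})
segment_scores = pd.DataFrame(rows)

# Step 3: reshape to a 2D grid, the layout a heatmap displays.
grid = segment_scores.pivot(index="hours_bin", columns="age_bin", values="score")
print(grid.round(3))
```

Keeping the row count `n` next to each score makes it easier to tell a genuinely weak segment from one that is merely small.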

In [None]:
# Initialize double segmentation implementation with our dataset
double_segmentation_imp = DoubleSegmentationImp(
    df=df_analysis,
    label_col="target",
    prediction_col="prediction",  # Use the dummy prediction column
)

In [None]:
# Define feature pairs to analyze
feature_pairs = [
    ("age", "education-num"),
    ("age", "hours-per-week"),
    ("education-num", "hours-per-week"),
]

# Analyze each feature pair and visualize
for feature1, feature2 in feature_pairs:
    # Calculate double segmentation scores using mean_squared_error
    # The __call__ method takes feature1_col, feature2_col, score_metric, bins_1, bins_2
    double_segments = double_segmentation_imp(
        feature1_col=feature1,
        feature2_col=feature2,
        score_metric=mean_squared_error,  # MSE of a probability against a 0/1 label (the Brier score)
        bins_1=5,  # Define number of bins for numeric features
        bins_2=5,
    )

    # Create the plotter; the segment scores are in the 'score' column
    double_plotter = DoubleSegmPlotting(df=double_segments, metric_name="score")

    # Plot the heatmap
    heatmap_fig = double_plotter.plot_heatmap()
    heatmap_fig.update_layout(
        title=f"MSE Heatmap: {feature1} vs {feature2}", xaxis_title=feature1, yaxis_title=feature2
    )
    heatmap_fig.show()

## Conclusion
This notebook demonstrated how to perform double segmentation analysis using the `tab-right` package. The heatmaps show how model performance (measured by MSE) varies across segments defined by pairs of features. Because the target and prediction here are random, differences between cells reflect noise only.