# Test Locally Built InterpretML Library

This notebook is designed to help you test the `interpret` library after building it locally with your custom changes (e.g., a new C++ loss function).

**Prerequisites:**
1.  You have successfully run `make build` and `make install` from the `/home/diego/Dropbox/DropboxGit/interpret/scripts/` directory.
2.  You are running this notebook in the same Python environment where the local `interpret` package was installed.

In [1]:
# Import the interpret library and check version
try:
    import interpret
    from interpret import show # For visualizing explanations
    # Attempt to import a core component, e.g., an explainer or data utility
    from interpret.glassbox import ExplainableBoostingClassifier 
    from interpret.data import ClassHistogram # A data utility
    
    print("InterpretML library imported successfully.")
    print(f"InterpretML version: {interpret.__version__}")
    # You can also check the path to see if it's from your local repository
    print(f"InterpretML path: {interpret.__path__}")

except ImportError as e:
    print(f"Error importing InterpretML: {e}")
    print("Please ensure you have correctly run 'make build' and 'make install' from the 'scripts' directory.")
    print("Also, verify that this notebook is using the Python environment where the package was installed.")
except Exception as e:
    print(f"An unexpected error occurred during import: {e}")

InterpretML library imported successfully.
InterpretML version: 0.6.11
InterpretML path: ['/home/diego/Dropbox/DropboxGit/interpret/python/interpret-core/interpret']
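The printed path can also be checked programmatically, which helps when several copies of `interpret` are installed. The following is a minimal sketch using only the standard library; `module_is_under` is a hypothetical helper name, not part of InterpretML, and the repository root is taken from the path shown in the prerequisites.

```python
import importlib.util
from pathlib import Path


def module_is_under(module_name, root):
    """Return True if `module_name` would be imported from a file under `root`.

    Hypothetical helper for illustration; it only inspects the import spec
    and does not import the module itself.
    """
    spec = importlib.util.find_spec(module_name)
    # Built-in modules, namespace packages, and missing modules have no file location.
    if spec is None or not spec.has_location:
        return False
    origin = Path(spec.origin).resolve()
    return Path(root).resolve() in origin.parents


# Repository root as used in this notebook; adjust to your own checkout.
repo_root = "/home/diego/Dropbox/DropboxGit/interpret"
print(f"interpret resolved from local checkout: {module_is_under('interpret', repo_root)}")
```

If this prints `False` while the import itself succeeds, the notebook is probably picking up a different installation of the package than the local build.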


## Basic Functionality Test

Let's try to use a common component like the Explainable Boosting Machine (EBM) to see if the library loads and executes basic operations. If your C++ changes affect EBM or related core components, this is a good place to start.

In [2]:
# Example: Using Explainable Boosting Machine (EBM)
# This model often has performance-critical parts that might be implemented in C++.

from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification
import pandas as pd

# Generate some synthetic classification data
X, y = make_classification(n_samples=200, n_features=5, random_state=42)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]
X_df = pd.DataFrame(X, columns=feature_names)

X_train, X_test, y_train, y_test = train_test_split(X_df, y, test_size=0.2, random_state=42)

print("Data prepared for EBM.")

try:
    # Initialize and train an EBM
    # If your new loss is for EBM, you might need to specify it here if applicable
    ebm = ExplainableBoostingClassifier(random_state=42, feature_names=feature_names)
    ebm.fit(X_train, y_train)
    print("EBM model trained successfully.")

    # Get global explanations
    ebm_global = ebm.explain_global()
    print("\nGlobal Explanations obtained from EBM.")
    # In a full Jupyter environment, you can visualize this:
    # show(ebm_global)

    # Get local explanations for a few instances
    ebm_local = ebm.explain_local(X_test.head(), y_test[:5])
    print("\nLocal Explanations for the first few test instances obtained.")
    # In a full Jupyter environment, you can visualize this:
    # show(ebm_local)

    print("\nBasic EBM test complete.")
    print("If your changes are related to EBM, observe its behavior and outputs carefully.")

except Exception as e:
    print(f"An error occurred while testing EBM: {e}")
    print("This could indicate an issue with the compiled components or your changes.")

Data prepared for EBM.
EBM model trained successfully.

Global Explanations obtained from EBM.

Local Explanations for the first few test instances obtained.

Basic EBM test complete.
If your changes are related to EBM, observe its behavior and outputs carefully.


## Test Your Specific Changes

Now, add cells below to specifically test the new loss function or other C++ modifications you have implemented.

For example, if you added a new loss function `my_custom_loss` to a specific model:
1.  Initialize the model with the new objective (for EBMs, `objective='my_custom_loss'`).
2.  Train it on appropriate data.
3.  Evaluate its performance and behavior (e.g., convergence, explanation stability, comparison with existing losses).
4.  Check for any errors or unexpected outputs.
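Before training, it can help to check the math of a new loss independently of the C++ build. The sketch below compares analytic first and second derivatives against central finite differences for a negative binomial (NB2) negative log-likelihood with a log link, where `Var(Y) = mu + alpha*mu^2`. The formulas and the function names (`nb_loss`, `nb_gradient`, `nb_hessian`, `max_fd_errors`) are assumptions made for illustration; they are the textbook NB2 expressions and may differ from what the C++ objective actually implements (for example in constant terms or scaling).

```python
import math


def nb_loss(eta, y, alpha):
    """NB2 negative log-likelihood for count y, with mean mu = exp(eta)."""
    r = 1.0 / alpha  # size parameter
    mu = math.exp(eta)
    log_lik = (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1.0)
        + r * math.log(r / (r + mu))
        + y * math.log(mu / (r + mu))
    )
    return -log_lik


def nb_gradient(eta, y, alpha):
    """Analytic d(loss)/d(eta) = (mu - y) / (1 + alpha * mu)."""
    mu = math.exp(eta)
    return (mu - y) / (1.0 + alpha * mu)


def nb_hessian(eta, y, alpha):
    """Analytic d^2(loss)/d(eta)^2 = mu * (1 + alpha * y) / (1 + alpha * mu)^2."""
    mu = math.exp(eta)
    return mu * (1.0 + alpha * y) / (1.0 + alpha * mu) ** 2


def max_fd_errors(alpha, etas, ys, h=1e-4):
    """Largest scaled gap between analytic and finite-difference derivatives."""
    worst_grad = 0.0
    worst_hess = 0.0
    for eta in etas:
        for y in ys:
            # Central differences: loss -> gradient, gradient -> hessian.
            fd_grad = (nb_loss(eta + h, y, alpha) - nb_loss(eta - h, y, alpha)) / (2 * h)
            fd_hess = (nb_gradient(eta + h, y, alpha) - nb_gradient(eta - h, y, alpha)) / (2 * h)
            grad = nb_gradient(eta, y, alpha)
            hess = nb_hessian(eta, y, alpha)
            worst_grad = max(worst_grad, abs(fd_grad - grad) / max(1.0, abs(grad)))
            worst_hess = max(worst_hess, abs(fd_hess - hess) / max(1.0, abs(hess)))
    return worst_grad, worst_hess


grad_err, hess_err = max_fd_errors(alpha=0.5, etas=[-1.0, 0.0, 1.0, 3.0], ys=[0, 1, 5, 40])
print(f"max gradient error: {grad_err:.2e}, max hessian error: {hess_err:.2e}")
```

Small errors here only confirm that these Python formulas are mutually consistent. To validate the C++ code, the same grid of `eta` and `y` values would need to be evaluated there and compared against these numbers.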

In [None]:
# Test the custom "negative_binomial" objective with synthetic count data

from interpret.glassbox import ExplainableBoostingRegressor
from sklearn.datasets import make_regression
import numpy as np
import pandas as pd

print("Attempting to test the 'negative_binomial' objective with synthetic count data...")

# 1. Define true alpha (dispersion coefficient for Var(Y) = mu + alpha*mu^2)
alpha_true = 0.5  # Example value for alpha
print(f"Using true alpha for data generation: {alpha_true}")

# 2. Generate synthetic features X
n_samples = 200
n_features = 3
X_reg, _ = make_regression(n_samples=n_samples, n_features=n_features, n_informative=n_features, random_state=42, noise=0.1)
feature_names_reg = [f"reg_feature_{i}" for i in range(X_reg.shape[1])]
X_reg_df = pd.DataFrame(X_reg, columns=feature_names_reg)

# 3. Create true underlying log(mu) values
# Define some true coefficients for the linear combination
true_intercept = 0.5
true_coeffs = np.array([1.5, -0.5, 0.8])
log_mu_true = true_intercept + X_reg_df.dot(true_coeffs)

# 4. Calculate true mean mu
mu_true = np.exp(log_mu_true)

# 5. Generate count data y from Negative Binomial distribution
# For numpy.random.negative_binomial(n_successes, p_success_prob):
# n_successes (size parameter r) = 1 / alpha_true
# p_success_prob = n_successes / (n_successes + mu_true) = (1/alpha_true) / (1/alpha_true + mu_true) = 1 / (1 + alpha_true * mu_true)
n_param_numpy = 1.0 / alpha_true
p_param_numpy = 1.0 / (1.0 + alpha_true * mu_true)

# Ensure p is valid (0 < p <= 1)
p_param_numpy = np.clip(p_param_numpy, 1e-9, 1.0) 

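# Use a seeded generator so the synthetic counts are reproducible across runs
rng = np.random.default_rng(42)
y_count = rng.negative_binomial(n_param_numpy, p_param_numpy, size=n_samples)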

print(f"\nGenerated synthetic count data. X_reg_df.shape: {X_reg_df.shape}, y_count.shape: {y_count.shape}")
print(f"Sample of true log(mu): {log_mu_true.head().values}")
print(f"Sample of true mu: {mu_true.head().values}")
print(f"Sample of generated y_count: {y_count[:5]}")
print(f"Min/Max of p_param_numpy: {np.min(p_param_numpy)}, {np.max(p_param_numpy)}")
print(f"N param for numpy: {n_param_numpy}")


try:
    # Initialize ExplainableBoostingRegressor with the custom objective
    ebm_nb = ExplainableBoostingRegressor(
        # this is the way to specify a parameter for an objective in EBM
        objective=f"negative_binomial:alpha={alpha_true}", # Specify the custom objective with alpha
        feature_names=feature_names_reg, # Pass feature names
        random_state=42
    )
    print(f"\nInitialized ExplainableBoostingRegressor with objective='negative_binomial' and alpha={alpha_true}")

    # Fit the model
    ebm_nb.fit(X_reg_df, y_count)
    print("\nSuccessfully trained EBM with 'negative_binomial' objective.")

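    # Predict on the training data.
    # NOTE: the name `log_mu_pred` assumes predict() returns scores on the log scale.
    # If the objective uses a log link and predict() applies the inverse link,
    # these values are mu itself, so check their magnitude before trusting the MSE below.
    log_mu_pred = ebm_nb.predict(X_reg_df)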
    print("\nComparison of first 5 true log(mu) vs predicted log(mu):")
    for i in range(5):
        print(f"Instance {i}: True log(mu) = {log_mu_true.iloc[i]:.4f}, Predicted log(mu) = {log_mu_pred[i]:.4f}")
    
    mse_log_mu = np.mean((log_mu_true - log_mu_pred)**2)
    print(f"\nMSE between true log(mu) and predicted log(mu): {mse_log_mu:.4f}")

    print("\nTest completed. If MSE is low, it suggests the model is learning the underlying mean structure.")
    print("Further tests should validate the mathematical correctness of the loss, gradient, and hessian implemented in C++ against known Negative Binomial properties.")

except Exception as e:
    print(f"\nAn error occurred while testing the 'negative_binomial' objective: {e}")
    print("Check the following:")
    print("1. The 'negative_binomial' objective is correctly registered in 'objective_registrations.hpp'.")
    print("2. The 'NegativeBinomialObjective.hpp' class constructor matches the registration parameters (e.g., takes 'alpha').")
    print("3. The library was successfully rebuilt and reinstalled ('make build' and 'make install').")
    print("4. The Python environment for this notebook is the one where the local build was installed.")

Attempting to test the 'negative_binomial' objective with synthetic count data...
Using true alpha for data generation: 0.5

Generated synthetic count data. X_reg_df.shape: (200, 3), y_count.shape: (200,)
Sample of true log(mu): [1.05914597 3.4622728  1.11256429 1.51251532 2.18480984]
Sample of true mu: [ 2.88390698 31.88937238  3.04214935  4.53813132  8.88895809]
Sample of generated y_count: [ 3 31  4  2  2]
Min/Max of p_param_numpy: 0.0032009701819838937, 0.9882505434175736
N param for numpy: 2.0

Initialized ExplainableBoostingRegressor with objective='negative_binomial' and alpha=0.5

Successfully trained EBM with 'negative_binomial' objective.

Comparison of first 5 true log(mu) vs predicted log(mu):
Instance 0: True log(mu) = 1.0591, Predicted log(mu) = 2.7235
Instance 1: True log(mu) = 3.4623, Predicted log(mu) = 48.4158
Instance 2: True log(mu) = 1.1126, Predicted log(mu) = 6.9791
Instance 3: True log(mu) = 1.5125, Predicted log(mu) = 6.5771
Instance 4: True log(mu) = 2.1848, P
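The predicted values in the output above are far larger than the true log(mu) values and much closer in magnitude to the true mu values. One possible explanation, not confirmed by this notebook, is that `predict()` returns the mean on the response scale rather than log(mu). The sketch below uses only the first four instances copied from the recorded output and compares the two interpretations; it is a quick plausibility check, not a statement about how the library behaves.

```python
import math

# First four instances, copied from the recorded output above.
true_mu = [2.88390698, 31.88937238, 3.04214935, 4.53813132]
predicted = [2.7235, 48.4158, 6.9791, 6.5771]


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


true_log_mu = [math.log(m) for m in true_mu]

# Interpretation A: predictions are already log(mu).
mse_if_log_scale = mse(true_log_mu, predicted)

# Interpretation B (assumption): predictions are mu, so take the log before comparing.
mse_if_response_scale = mse(true_log_mu, [math.log(p) for p in predicted])

print(f"MSE treating predictions as log(mu): {mse_if_log_scale:.4f}")
print(f"MSE treating predictions as mu:      {mse_if_response_scale:.4f}")
```

If the second figure is much smaller, the comparison in the cell above should be made on a single scale, for example `np.log(ebm_nb.predict(X_reg_df))` against `log_mu_true`, before drawing conclusions from the MSE.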

In [13]:
import matplotlib.pyplot as plt

In [None]:

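# Plotly version, left disabled: fig.show() requires nbformat>=4.2.0, which is not
# installed in this environment (it also needs `import plotly.express as px`).
# fig = px.histogram(x = mu_true / log_mu_pred, nbins=50, labels={'x':'mu_true / log_mu_pred'})
# fig.update_layout(title_text='Histogram of True Mu / Predicted Log Mu')
# fig.show()

# Use matplotlib.pyplot for the histogram instead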
plt.figure()
plt.hist(mu_true / log_mu_pred, bins=50)
plt.xlabel('mu_true / log_mu_pred')
plt.ylabel('Frequency')
plt.title('Histogram of True Mu / Predicted Log Mu')
plt.show()

ValueError: Mime type rendering requires nbformat>=4.2.0 but it is not installed