This notebook compares the functions learned by a standard ExplainableBoostingRegressor with those learned by a differentially private one (DPExplainableBoostingRegressor).
Both models are trained on the California housing dataset. We compare their test-set error (MAE, MSE, RMSE) and their global explanations
(feature importance scores and density) to understand the differences.

## Load Data

In [1]:
import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split

california_housing = fetch_california_housing()
feature_names = list(california_housing.feature_names)
df = pd.DataFrame(california_housing.data, columns=feature_names)
df["target"] = california_housing.target
# df = df.sample(frac=0.1, random_state=1)
train_cols = df.columns[0:-1]
label = df.columns[-1]
X = df[train_cols]
y = df[label]

seed = 1
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=seed)

## Fit and compare DP-EBM vs. standard EBM

In [2]:
from interpret.privacy import DPExplainableBoostingRegressor
from interpret.glassbox import ExplainableBoostingRegressor
import time
from sklearn.metrics import mean_absolute_error, mean_squared_error

start = time.time()
dpebm = DPExplainableBoostingRegressor(epsilon=.05, delta=1e-6)
_ = dpebm.fit(X_train, y_train)
y_pred = dpebm.predict(X_test)

dp_mae = mean_absolute_error(y_test, y_pred)
dp_mse = mean_squared_error(y_test, y_pred)
dp_rmse = dp_mse ** 0.5  # RMSE is the square root of MSE; squared=False is deprecated in recent scikit-learn
end = time.time()

print(f"DP EBM with eps: {dpebm.epsilon} and delta: {dpebm.delta} trained in {end - start:.2f} seconds with a test MAE of {dp_mae:.3f},\
      MSE of {dp_mse:.3f} and RMSE of {dp_rmse:.3f}")


start = time.time()
ebm = ExplainableBoostingRegressor()
_ = ebm.fit(X_train, y_train)
# Predict with the standard EBM so the metrics below describe this model, not the DP-EBM
y_pred = ebm.predict(X_test)

ebm_mae = mean_absolute_error(y_test, y_pred)
ebm_mse = mean_squared_error(y_test, y_pred)
ebm_rmse = ebm_mse ** 0.5  # RMSE is the square root of MSE
end = time.time()  # stop the timer for this model
print(f"EBM trained in {end - start:.2f} seconds with a test MAE of {ebm_mae:.3f},\
      MSE of {ebm_mse:.3f} and RMSE of {ebm_rmse:.3f}")

  warn("Possible privacy violation: assuming min/max values per feature/target are public info."


DP EBM with eps: 0.05 and delta: 1e-06 trained in 2.89 seconds with a test MAE of 0.826,      MSE of 1.151 and RMSE of 1.073
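
The three error metrics reported for each model are closely related: MAE averages the absolute errors, MSE averages the squared errors, and RMSE is the square root of MSE. The following is a minimal sketch in plain Python of how they can be computed from one model's predictions. The helper name `regression_metrics` is hypothetical (it is not part of scikit-learn or interpret), and the numbers in the usage lines are made up for illustration.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (MAE, MSE, RMSE) for two equal-length sequences of numbers.

    Hypothetical helper for illustration; not a scikit-learn or interpret function.
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    # Per-sample errors; zip also works for pandas Series and NumPy arrays.
    errors = [float(t) - float(p) for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)  # RMSE is always the square root of MSE
    return mae, mse, rmse


# Made-up values, only to show the call pattern.
mae, mse, rmse = regression_metrics([3.0, 1.0, 2.0], [2.0, 1.0, 4.0])
print(f"MAE {mae:.3f}, MSE {mse:.3f}, RMSE {rmse:.3f}")
```

Calling such a helper once per model, each time with that model's own `predict` output, keeps the two sets of metrics from being mixed up.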


## See differences in learned shape functions

In [3]:
from interpret import show
show(ebm.explain_global(name='Standard EBM'))
show(dpebm.explain_global(name='DP EBM'))

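
The plots give a visual comparison of the shape functions. A numeric summary can complement them. The sketch below does not use interpret's API: it assumes each shape function has already been extracted as a sorted list of bin cut points plus one score per bin, and it measures the average absolute gap between two such step functions on a shared grid. The function names `step_value` and `mean_abs_gap`, and all values in the usage lines, are hypothetical.

```python
from bisect import bisect_right


def step_value(cuts, scores, x):
    """Evaluate a piecewise-constant function at x.

    cuts must be sorted, and scores must have len(cuts) + 1 entries:
    scores[0] applies below cuts[0], scores[-1] at or above cuts[-1].
    """
    if len(scores) != len(cuts) + 1:
        raise ValueError("scores must have exactly one more entry than cuts")
    return scores[bisect_right(cuts, x)]


def mean_abs_gap(cuts_a, scores_a, cuts_b, scores_b, grid):
    """Average absolute difference between two step functions over grid points."""
    gaps = [
        abs(step_value(cuts_a, scores_a, x) - step_value(cuts_b, scores_b, x))
        for x in grid
    ]
    return sum(gaps) / len(gaps)


# Made-up shape functions for a single feature (illustrative values only).
standard_cuts, standard_scores = [2.0, 5.0], [-0.5, 0.1, 0.8]
private_cuts, private_scores = [3.0], [-0.2, 0.6]
grid = [1.0, 2.5, 4.0, 6.0]

gap = mean_abs_gap(standard_cuts, standard_scores, private_cuts, private_scores, grid)
print(f"mean absolute gap: {gap:.3f}")
```

Evaluating both functions on a shared grid sidesteps the fact that the two models may bin the same feature differently.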
