# Reducing the reference metric

As with the population metric, we can train the random forest
with differential privacy.
Lowering the `epsilon` of the classifier reduces the maximum leakage
measured by the reference metric.
As with the population metric, a lower `epsilon` also means that
our model performs worse on the classification task.
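
To build some intuition for why a smaller `epsilon` costs accuracy, here is a minimal sketch of the generic Laplace mechanism applied to a simple count. It is only an illustration and not necessarily the mechanism used inside the DP random forest; the helper names (`laplace_noise`, `noisy_count`, `mean_abs_error`) and the count of 100 are made up for this example.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential samples with rate 1/scale
    # follows a Laplace distribution with mean 0 and the given scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, epsilon, rng, sensitivity=1.0):
    # Laplace mechanism: the noise scale is sensitivity / epsilon,
    # so a smaller epsilon means more noise.
    return true_count + laplace_noise(sensitivity / epsilon, rng)


def mean_abs_error(epsilon, trials=2000, seed=42):
    # Average distance between the noisy answer and the true answer.
    rng = random.Random(seed)
    true_count = 100  # hypothetical value, only for illustration
    errors = [
        abs(noisy_count(true_count, epsilon, rng) - true_count)
        for _ in range(trials)
    ]
    return sum(errors) / trials


for eps in (0.1, 1.0, 10.0):
    print(f"epsilon={eps:>5}: mean absolute error ~ {mean_abs_error(eps):.2f}")
```

The expected error grows roughly like `1 / epsilon`, which mirrors the trade-off described above: stronger privacy means noisier answers.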

## First exercises

1. Start by using the improvement in `get_reference_metric` to speed up testing in this
notebook
1. Using different parameter values, compare the `RandomForestClassifier` with the
`DiffPrivLib` implementation
1. Change the value of `epsilon` and compare the resulting `classification ROC` and
`reference metric ROC`

## Further exercises

1. Create a table of `classification ROC` / `reference metric ROC` figures for different values
of `epsilon`
1. Unsolved: can you find another speedup to calculate the reference metric faster while keeping the
same precision?
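
One possible way to organise the table from the first exercise is sketched below. It assumes each ROC is summarised by a single AUC number. The `evaluate` callable is hypothetical: in this notebook it would train a DP model with the given `epsilon` and return the two AUC values. The placeholder used here returns `nan` so that the sketch runs without inventing any results.

```python
def sweep_epsilon(epsilons, evaluate):
    """Collect one (epsilon, classification AUC, reference metric AUC) row per epsilon."""
    rows = []
    for eps in epsilons:
        clf_auc, ref_auc = evaluate(eps)
        rows.append((eps, clf_auc, ref_auc))
    return rows


def format_table(rows):
    """Render the rows as a markdown table."""
    lines = [
        "| epsilon | classification AUC | reference metric AUC |",
        "| --- | --- | --- |",
    ]
    for eps, clf_auc, ref_auc in rows:
        lines.append(f"| {eps:g} | {clf_auc:.3f} | {ref_auc:.3f} |")
    return "\n".join(lines)


def placeholder_evaluate(epsilon):
    # Hypothetical stand-in: replace with code that trains the DP model for
    # this epsilon and measures both AUC values. Returns nan on purpose.
    return float("nan"), float("nan")


rows = sweep_epsilon([0.1, 1, 10], placeholder_evaluate)
print(format_table(rows))
```

Keeping the measurement in a separate callable makes it easy to swap in the fast reference metric without touching the table code.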

## NN exercises - no support

Apply the `Opacus` library to the training before measuring the reference metric.

In [None]:
%run 3.1-reference-metric-fast.ipynb

In [None]:
import numpy as np
from diffprivlib.models import RandomForestClassifier as dp_RFC

# Use the RandomForestClassifier from DiffPrivLib
def train_model_dp(X_train, y_train, seed=42):
    rfc = dp_RFC(
        # Value from exercise: 90
        n_estimators=90,
        # Value from exercise: 9
        max_depth=9,
        random_state=seed,
        # Value from exercise: 10
        epsilon=10,
        bounds=(np.min(X_train, axis=0), np.max(X_train, axis=0)),
        classes=np.unique(y_train),
    )
    rfc.fit(X_train, y_train)
    return rfc

In [None]:
# Train a model using the DP-enabled RandomForestClassifier and plot the 
# ROC of the classifier itself.
target_model_dp = train_model_dp(X_train, y_train)
plot_rfc_auroc(y_test, target_model_dp.predict_proba(X=X_test)[:,1], 
               "ROC of classifier with DP")

In [None]:
# Calculate the reference metric of the DP-enabled RFC.
# This is very slow, as the DP-enabled RFC is much slower than the normal RFC.
plot_reference_metric(train_model_dp, X_train, y_train, X_test, y_test, 20, 10)