# Measuring the reference metric

This notebook shows how to calculate a more detailed metric, the reference metric.
This metric requires training many reference models in order to estimate the probability that
a given case is recognized as being part of the training data of the final model.
With `RandomForestClassifier` this is feasible, because fitting a model is fast.
For larger models, in particular CNNs and other neural networks, the metric quickly
becomes expensive to calculate.
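
To make the idea concrete, here is a minimal sketch of the test for a single case: the logits produced by the reference models are summarized by a normal distribution, and the metric is the probability that such a logit is at most the logit produced by the final model. The function name `membership_probability` and all numbers are made up for illustration, and the sketch uses only the Python standard library instead of the helpers of this notebook.

```python
from statistics import NormalDist, mean, pstdev


def membership_probability(logit_in, logits_out):
    """Pr[logit_out <= logit_in] under a normal fit of the reference logits."""
    mu = mean(logits_out)
    # Population standard deviation (the same default as np.std).
    # NormalDist expects a standard deviation, not a variance.
    sigma = pstdev(logits_out)
    # Note: cdf() raises StatisticsError if all reference logits are identical
    # (sigma == 0); a real implementation would have to handle that case.
    return NormalDist(mu, sigma).cdf(logit_in)


# Hypothetical logits from five reference models for one case.
logits_out = [0.5, 1.0, 1.5, 1.0, 1.0]

# The final model is much more confident than the reference models.
high = membership_probability(2.0, logits_out)
# The final model behaves exactly like the average reference model.
neutral = membership_probability(1.0, logits_out)

print(f"confident case: {high:.4f}, neutral case: {neutral:.4f}")
```

A value close to 1 suggests that the case was a member of the training data, while a value around 0.5 means the final model is indistinguishable from the reference models for that case.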

## First exercises

1. Why do the calls to `drop` inside the `for seed in range(num_ref_models)` loop pass `errors="ignore"`?
1. Besides the runtime, what changes in the figures if you change `num_examples_to_metric`?
1. Same question for `num_ref_models`.

Once you're done, set `verbose_reference` to `False` and save the notebook.

## Further exercises

1. Optimize the method `get_reference_metric` so that it trains the `out_models` only once
for all test cases in `examples_to_metric`.
This is possible because the `out_models` are trained on all training cases except
the current example from `examples_to_metric`.
A test case is not part of the training data, so nothing is dropped and all test cases
share the same `out_models`. It is therefore enough to train these models only once.
1. Create a table of `classification ROC` / `reference metric ROC` figures for different values
of the parameters of `RandomForestClassifier`.
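
As a hint for the first exercise, the caching idea can be sketched without any real model. Everything in the block below is a hypothetical stand-in: `toy_train` only records that it was called, and `reference_models` is not part of this notebook. The sketch also assumes that test cases never share an index label with a training case.

```python
# Log of every "training" run, so the number of fits can be counted.
trained = []


def toy_train(rows, seed):
    """Hypothetical stand-in for train_model: records the call, returns a dummy model."""
    model = (frozenset(rows), seed)
    trained.append(model)
    return model


def reference_models(train_rows, target, num_ref_models, cache):
    # Dropping a target that is not in the training set changes nothing,
    # so all such targets share one cache entry (key None).
    key = target if target in train_rows else None
    if key not in cache:
        rows = [r for r in train_rows if r != key]
        cache[key] = [toy_train(rows, seed) for seed in range(num_ref_models)]
    return cache[key]


train_rows = [0, 1, 2, 3]
targets = [0, 1, 10, 11, 12]  # two training cases, three test cases

cache = {}
for target in targets:
    reference_models(train_rows, target, num_ref_models=3, cache=cache)

# Without the cache: 5 targets * 3 models = 15 fits.
# With the cache: 3 distinct keys * 3 models = 9 fits.
print(len(trained))
```

The saving grows with the number of test cases, because they all reuse the single entry trained on the full training set.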

## NN exercises - no support

Use the methods from `ml_privacy_meter` to calculate the reference metric of your model.

In [None]:
%run 1-ml_load_data.ipynb
verbose_reference = False

In [None]:
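def get_reference_metric(train_model, X_train, y_train, X_test, y_test,
                         num_examples_to_metric=50, num_ref_models=10):
    """Estimate a membership probability for a sample of training and test examples.

    Returns a list of dicts with the keys `target_index`, `is_member` and `prob`.
    """
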
    # Collect some arbitrary target examples to measure.
    examples_to_metric = []

    # ...half from the training data.
    for index in X_train.index[:num_examples_to_metric // 2]:
        examples_to_metric.append((index, X_train.loc[index], y_train.loc[index], 1))

    # ...half from the test data.
    for index in X_test.index[:num_examples_to_metric // 2]:
        examples_to_metric.append((index, X_test.loc[index], y_test.loc[index], 0))

    result = []
    target_model = train_model(X_train, y_train)

    # Now run the re-training metrics!
    for index, x, y, is_member in tqdm(examples_to_metric):
        # First, train a bunch of models without the target example 
        # (if it is in fact part of the training data)
        out_models = []
        for seed in range(num_ref_models):
            ref_model = train_model(
                X_train.drop(index=[index], errors="ignore"),
                y_train.drop(index=[index], errors="ignore"),
                seed=seed
            )
            out_models.append(ref_model)

        # Compute the metric features.
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")
            pred_in = target_model.predict_proba([x])
            preds_out = [model.predict_proba([x])[0] for model in out_models]

        logit_in = logit_scale(y, pred_in)
        logits_out = logit_scale(y, preds_out)

        # Next, we run a parametric test. We assume that "out" logits are 
        # Gaussian-distributed, so compute their mean and variance.
        logits_out_mean = np.mean(logits_out)
        logits_out_var = np.var(logits_out)

        # The parametric test computes the probability that an "out" logit is at most
        # the "in" logit; a high value means that we predict the target record as a member:
        # 
        #   Pr[logit_out <= logit_in], where logit_out ~ Normal(mean, var) with mean and var
        #   estimated from reference models.
        #
        # See https://arxiv.org/abs/2112.03570, Eq. (4)
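        # `stats.norm(loc, scale)` expects the standard deviation as `scale`
        # (assuming `stats` is `scipy.stats`), so take the square root of the variance.
        prob = stats.norm(logits_out_mean, np.sqrt(logits_out_var)).cdf(logit_in)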

        result.append(dict(
            target_index=index,
            is_member=is_member,
            prob=prob,
        ))
        
    return result

In [None]:
def plot_reference_metric(train_model, X_train, y_train, X_test, y_test, num_examples=20,
                          num_ref_models=10):
    # Build the DataFrame once and reuse it for both columns.
    reference_metric = pd.DataFrame(
        get_reference_metric(train_model, X_train, y_train, X_test, y_test,
                             num_examples, num_ref_models)
    )
    plot_rfc_auroc(reference_metric.is_member, reference_metric.prob, "Reference metric")

if verbose_reference:
    plot_reference_metric(train_model, X_train, y_train, X_test, y_test)