# Population metric

This notebook shows how to compute a simple population metric to estimate how much
information about the training data the model might leak.
It is much less accurate than the reference metric we'll use later, but it still gives a good
indication of whether some of the training data could leak.
You'll also learn how to change some of the model's parameters to lower the
metric and make the model harder to attack.
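In this notebook the population attack is summarized by its AUROC over members (training points) and non-members (test points). A minimal sketch of what that number means, using synthetic scores rather than the notebook's data: the AUROC is the probability that a randomly chosen member gets a higher score than a randomly chosen non-member, with ties counting one half. `pairwise_auroc` is a hypothetical helper; its result is cross-checked against scikit-learn's `roc_auc_score`.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def pairwise_auroc(member_scores, nonmember_scores):
    """AUROC as the probability that a random member outscores a random non-member.

    Ties count as one half, matching the usual ROC definition.
    """
    m = np.asarray(member_scores, dtype=float)[:, None]     # column of member scores
    n = np.asarray(nonmember_scores, dtype=float)[None, :]  # row of non-member scores
    # Compare every (member, non-member) pair at once via broadcasting.
    return float(np.mean((m > n) + 0.5 * (m == n)))


# Synthetic, illustrative scores only: members tend to score a bit higher.
rng = np.random.default_rng(0)
members = rng.normal(1.0, 1.0, size=500)
nonmembers = rng.normal(0.0, 1.0, size=500)

labels = np.concatenate([np.ones(len(members)), np.zeros(len(nonmembers))])
scores = np.concatenate([members, nonmembers])

print(pairwise_auroc(members, nonmembers))
print(roc_auc_score(labels, scores))  # same value, computed from the ROC curve
```

An AUROC of 0.5 means the feature tells members and non-members apart no better than a coin flip; the closer it gets to 1, the more the model leaks.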

## First exercises

1. Compare the visualizations of the feature values for the two datasets
and explain how you can tell which one has the larger AUROC.
1. Compare the figures (feature values, population metric, classifier)
for different values of `n_estimators` to see how they change.
1. Compare the figures (feature values, population metric, classifier)
for different values of `max_depth` to see how they change.
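A minimal sketch of the parameter sweep from the exercises, run outside the notebook under loudly stated assumptions: the target model is a scikit-learn `RandomForestClassifier` (which the `n_estimators` and `max_depth` parameters suggest), the data is synthetic from `make_classification`, and the attack feature is the probability the model gives to each sample's true label rather than the notebook's `logits` helper. `true_class_confidence` and `population_auroc` are hypothetical names.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the notebook's dataset (assumption); flip_y adds label noise.
X, y = make_classification(n_samples=2000, n_features=20, flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)


def true_class_confidence(model, X, y):
    """Probability the model assigns to each sample's true label.

    Assumes labels are 0..k-1 so they index predict_proba's columns directly.
    """
    proba = model.predict_proba(X)
    return proba[np.arange(len(y)), y]


def population_auroc(model):
    """AUROC of a population attack: train = members (1), test = non-members (0)."""
    train_scores = true_class_confidence(model, X_train, y_train)
    test_scores = true_class_confidence(model, X_test, y_test)
    labels = np.concatenate([np.ones(len(train_scores)), np.zeros(len(test_scores))])
    return roc_auc_score(labels, np.concatenate([train_scores, test_scores]))


results = {}
for n_estimators in (1, 10, 100):
    for max_depth in (2, 5, None):  # None lets the trees grow until the leaves are pure
        model = RandomForestClassifier(
            n_estimators=n_estimators, max_depth=max_depth, random_state=0
        )
        model.fit(X_train, y_train)
        results[(n_estimators, max_depth)] = population_auroc(model)
        print(f"n_estimators={n_estimators:>3} max_depth={str(max_depth):>4} "
              f"AUROC={results[(n_estimators, max_depth)]:.3f}")
```

Unrestricted trees typically memorize their training points, so the members' confidence pulls away from the non-members' and the AUROC rises; capping `max_depth` usually narrows that gap.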

Once you're done, set `verbose_population` to `False` and save the notebook.

## Further exercises

1. Display the figures of the feature values and the ROC for both
datasets side by side.
1. Display the ROC of the classifier and the population metric for
various values of `n_estimators` and `max_depth` in a grid.
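For the grid exercise, a minimal matplotlib sketch: `plot_roc_grid` (a hypothetical helper) draws one ROC panel per (`n_estimators`, `max_depth`) pair with `plt.subplots`. It expects a dict mapping each parameter pair to `(labels, scores)`, which in the notebook you would build from the member and non-member feature values; the usage below feeds it synthetic scores only to show the layout.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for scripts; drop this line inside Jupyter
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve


def plot_roc_grid(results, row_values, col_values,
                  row_name="n_estimators", col_name="max_depth"):
    """Draw one ROC curve per (row, col) parameter pair in a grid of subplots.

    `results` maps (row_value, col_value) -> (labels, scores).
    """
    fig, axes = plt.subplots(len(row_values), len(col_values),
                             figsize=(3 * len(col_values), 3 * len(row_values)),
                             sharex=True, sharey=True, squeeze=False)
    for i, r in enumerate(row_values):
        for j, c in enumerate(col_values):
            labels, scores = results[(r, c)]
            fpr, tpr, _ = roc_curve(labels, scores)
            ax = axes[i, j]
            ax.plot(fpr, tpr)
            ax.plot([0, 1], [0, 1], linestyle="--", color="grey")  # random-guess baseline
            ax.set_title(f"{row_name}={r}, {col_name}={c}\n"
                         f"AUROC={roc_auc_score(labels, scores):.2f}", fontsize=9)
    fig.supxlabel("False positive rate")  # requires matplotlib >= 3.4
    fig.supylabel("True positive rate")
    fig.tight_layout()
    return fig, axes


# Synthetic usage: a growing gap between member and non-member scores per panel.
rng = np.random.default_rng(0)
rows, cols = (10, 100), (2, None)
fake = {}
for k, key in enumerate((r, c) for r in rows for c in cols):
    gap = 0.5 * k
    labels = np.concatenate([np.ones(200), np.zeros(200)])
    scores = np.concatenate([rng.normal(gap, 1.0, 200), rng.normal(0.0, 1.0, 200)])
    fake[key] = (labels, scores)

fig, axes = plot_roc_grid(fake, rows, cols)
```

`squeeze=False` keeps `axes` two-dimensional even for a single row or column, so the same indexing works for any grid size.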

## NN exercises - no support

Use the [ml_privacy_meter](https://github.com/privacytrustlab/ml_privacy_meter) library
to plot the population metric.

In [None]:
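# Load the data, the target model and the helper functions used below.
%run 1-ml_load_data.ipynb
# Set to False to skip the plots in this notebook.
verbose_population = True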

## Measuring Population-Wise Privacy Leakage

In [None]:
# Extract the features for the membership inference attack: one value per sample.
# Training points are members, test points are non-members.
logits_train = logits(target_model, X_train, y_train)
logits_test = logits(target_model, X_test, y_test)
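The `logits` helper comes from the setup notebook, so its exact definition isn't shown here. A common choice of membership feature, and only an assumption about what `logits` computes, is the scaled logit of the true-class probability, log(p / (1 - p)): it stretches out confidences close to 1, where members and non-members tend to bunch together, so their histograms are easier to compare. `scaled_logit` and `FixedModel` are hypothetical names; any classifier with a scikit-learn style `predict_proba` works.

```python
import numpy as np


def scaled_logit(model, X, y, eps=1e-12):
    """log(p / (1 - p)) where p is the model's probability for the true label.

    Hypothetical stand-in for the notebook's `logits` helper.
    """
    proba = model.predict_proba(X)
    # Assumes labels are the integers 0..k-1 so they index predict_proba's columns.
    p = np.clip(proba[np.arange(len(y)), y], eps, 1 - eps)  # avoid log(0)
    return np.log(p) - np.log(1 - p)


class FixedModel:
    """Toy model that returns fixed class probabilities, for illustration only."""

    def predict_proba(self, X):
        return np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]])


vals = scaled_logit(FixedModel(), X=None, y=np.array([0, 1, 0]))
print(vals)  # approximately log(9), log(4) and 0
```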

In [None]:
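def visualize_vals(train_vals, test_vals):
    """Plot the feature values on train (members) and test (non-members) data."""
    df = pd.concat([
        pd.DataFrame(dict(val=train_vals)).assign(membership="train"),
        pd.DataFrame(dict(val=test_vals)).assign(membership="test"),
    ], ignore_index=True)  # fresh index so rows from the two frames don't collide
    return sns.displot(
        data=df,
        x="val",
        hue="membership",
        kind="hist",
        stat="probability",  # plot proportions instead of counts
        common_norm=False,   # normalize each group separately so set sizes don't matter
        rug=True,            # add a tick mark for each individual value
    ).set(title="Feature values for train and test")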

In [None]:
# Visualize the features. If it is possible to tell train data from test data, then
# our model is vulnerable to membership inference.
if verbose_population:
    visualize_vals(logits_train, logits_test)
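The comment above can be made concrete with one simple attack: pick a threshold on the feature value and predict "member" for every sample at or above it. A minimal sketch, with `best_threshold` as a hypothetical helper, that picks the threshold maximizing balanced accuracy (the mean of the true positive and true negative rates, which stays fair when the train and test sets have different sizes). It chooses the threshold on the same data it scores, so the accuracy it reports is optimistic.

```python
import numpy as np
from sklearn.metrics import roc_curve


def best_threshold(train_vals, test_vals):
    """Threshold on the feature value that best separates members from non-members.

    Predicts "member" when value >= threshold; returns (threshold, balanced accuracy).
    """
    labels = np.concatenate([np.ones(len(train_vals)), np.zeros(len(test_vals))])
    scores = np.concatenate([train_vals, test_vals])
    # roc_curve already enumerates the useful thresholds and their TPR/FPR.
    fpr, tpr, thresholds = roc_curve(labels, scores)
    balanced_acc = (tpr + (1 - fpr)) / 2
    best = np.argmax(balanced_acc)
    return thresholds[best], balanced_acc[best]


# Synthetic usage: members (train) score slightly higher than non-members (test).
rng = np.random.default_rng(0)
thr, acc = best_threshold(rng.normal(1.0, 1.0, 300), rng.normal(0.0, 1.0, 200))
print(f"threshold={thr:.2f}, balanced accuracy={acc:.2f}")
```

A balanced accuracy near 0.5 means the threshold attack does no better than guessing.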

### ROC curve of the membership attack

In [None]:
if verbose_population:
    # Label members (train) 1 and non-members (test) 0; the feature value is the attack score.
    plot_rfc_auroc(np.concatenate([[1] * len(logits_train), [0] * len(logits_test)]),
                   np.concatenate([logits_train, logits_test]),
                   "ROC of population attack")
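`plot_rfc_auroc` is defined in the setup notebook. As a complementary sketch, `roc_summary` (a hypothetical helper built on scikit-learn's `roc_curve` and `auc`) reports the AUROC together with the true positive rate at a low false positive rate, here 1%. The AUROC averages over all thresholds, whereas an attacker who wants to identify even a few training points with confidence cares mostly about the low-FPR end of the curve.

```python
import numpy as np
from sklearn.metrics import auc, roc_curve


def roc_summary(train_vals, test_vals, target_fpr=0.01):
    """Return (AUROC, TPR at FPR <= target_fpr) for a population attack.

    Train values are members (label 1), test values are non-members (label 0).
    """
    labels = np.concatenate([np.ones(len(train_vals)), np.zeros(len(test_vals))])
    scores = np.concatenate([train_vals, test_vals])
    fpr, tpr, _ = roc_curve(labels, scores)
    # Highest TPR reachable while keeping FPR at or below the target
    # (fpr[0] is always 0, so the mask is never empty).
    tpr_at_fpr = tpr[fpr <= target_fpr].max()
    return auc(fpr, tpr), tpr_at_fpr


# Synthetic usage only; in the notebook pass logits_train and logits_test.
rng = np.random.default_rng(0)
area, tpr_low = roc_summary(rng.normal(1.0, 1.0, 1000), rng.normal(0.0, 1.0, 1000))
print(f"AUROC={area:.3f}, TPR at 1% FPR={tpr_low:.3f}")
```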