# Membership Inference Attacks Against Machine Learning Models

Elements of the adversarial system:
* Shadow model
* Attack model

###  Shadow Model

The adversary trains $k$ shadow models, each on a dataset (shadow data) that is similar
in format and distribution to the __target model's private training set__.

Shadow data are generated by drawing samples from the same population as the target model's training data.
The shadow model must be __trained in a similar fashion to the target model__.
In general, the more shadow models are trained, the more accurate the attack model.
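
A minimal sketch of how the shadow datasets could be drawn, assuming the adversary holds a pool of records from the same population as the target's training data. The function name `make_shadow_splits` and its parameters are hypothetical. Each shadow model gets a training set and a test set that are disjoint and of equal size; in this sketch, different shadow models are allowed to share records.

```python
import random


def make_shadow_splits(pool, k, size, seed=0):
    """Draw k (train, test) pairs of `size` records each from `pool`."""
    if 2 * size > len(pool):
        raise ValueError("pool is too small for disjoint train/test sets")
    rng = random.Random(seed)
    splits = []
    for _ in range(k):
        # Sample without replacement so train and test never share a record.
        drawn = rng.sample(pool, 2 * size)
        splits.append((drawn[:size], drawn[size:]))
    return splits


pool = list(range(100))  # stand-in for records drawn from the population
splits = make_shadow_splits(pool, k=3, size=10)
```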

### Attack Model

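The adversary queries each shadow model with its own training dataset and with a disjoint test dataset of the __same size__. Outputs on the training set are labeled "in" and outputs on the test set are labeled "out"; these labeled outputs form the training data for the attack model.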
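
The labelling step can be sketched as follows: a shadow model's prediction vector on a record from its own training set is labelled `in`, and on a record from its test set `out`. The helper `build_attack_dataset` and the toy shadow model are hypothetical stand-ins; a real shadow model would be a trained classifier.

```python
def build_attack_dataset(shadow_models, splits):
    """Collect (true_label, prediction_vector, membership) records.

    shadow_models: callables mapping a feature vector to class probabilities
    splits: one (train, test) pair per shadow model, each a list of (x, y)
    """
    records = []
    for model, (train, test) in zip(shadow_models, splits):
        for membership, data in (("in", train), ("out", test)):
            for x, y in data:
                records.append((y, model(x), membership))
    return records


def toy_shadow_model(x):
    # Hypothetical stand-in: derives a two-class probability vector from x[0].
    p = min(max(x[0], 0.0), 1.0)
    return [1.0 - p, p]


train = [([0.9], 1), ([0.1], 0)]
test = [([0.6], 1), ([0.4], 0)]
attack_data = build_attack_dataset([toy_shadow_model], [(train, test)])
```

The attack model is then trained as a binary classifier that predicts the membership label from the true label and the prediction vector.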

## Generating training data for shadow models

The adversary needs training data that is distributed similarly to the target model's training data. 
Shokri et al. propose the following methods:

### Model-based synthesis
The attacker has neither real training data nor any statistics about its distribution.
Synthetic data for the shadow models are therefore generated using the target model itself. The intuition is that records that the target model classifies
with high confidence should be statistically similar to the target's training dataset.

The synthesis process has two phases:
1. _Search_, using a hill-climbing algorithm, the space of possible data records to find inputs that the target model classifies with high confidence.
2. _Sample_ synthetic data from these records. After a record is synthesized, repeat the process until the shadow dataset is full.
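
The outer loop implied by step 2 might look like the sketch below. It assumes a per-record synthesizer that returns `None` when the search fails; `fill_shadow_dataset`, `max_tries` and `toy_synthesizer` are hypothetical names used only for illustration.

```python
import random


def fill_shadow_dataset(synthesize_one, classes, n_per_class, max_tries=10_000):
    """Call `synthesize_one(c)` until each class has n_per_class records."""
    dataset = []
    for c in classes:
        accepted = 0
        for _ in range(max_tries):
            if accepted == n_per_class:
                break
            x = synthesize_one(c)
            if x is not None:  # None signals a failed search
                dataset.append((x, c))
                accepted += 1
    return dataset


rng = random.Random(0)


def toy_synthesizer(c):
    # Hypothetical stand-in that fails about half of the time.
    return [c, rng.random()] if rng.random() < 0.5 else None


shadow_data = fill_shadow_dataset(toy_synthesizer, classes=[0, 1], n_per_class=5)
```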

>>
First, fix the class $c$ for which the attacker wants to generate synthetic data.
Initialize a data record $\mathbf{x}$ at random.
The attacker must know the syntactic format of data records (number of features and numerical range of each feature).
Sample the value of each feature uniformly at random from among all possible values of that feature.
>>
A proposed record is _accepted_ only if it increases the [hill-climbing](https://en.wikipedia.org/wiki/Hill_climbing) objective: the probability of
being classified by the target model as class $c$.
>>
Each iteration involves proposing a new candidate record by changing $k$ randomly selected features of the latest accepted record $\mathbf{x}^*$.
This is done by flipping binary features or resampling new values for features of other types.
We initialize $k$ to $k_{max}$ and divide it by 2 when $rej_{max}$ subsequent proposals are rejected. This controls the diameter of search around the
accepted record in order to propose a new record. We set the
minimum value of $k$ to $k_{min}$.
This controls the speed of the search for new records with a potentially higher classification
probability $y_c$.
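
As a worked example of the schedule for $k$, the sketch below lists the values $k$ takes each time $rej_{max}$ consecutive proposals are rejected. It assumes the halved value is rounded up and clamped at $k_{min}$; the helper name `k_schedule` is hypothetical.

```python
import math


def k_schedule(k_max, k_min):
    """Return the successive values of k, from k_max down to k_min."""
    if k_min < 1:
        raise ValueError("k_min must be at least 1")
    ks = [k_max]
    while ks[-1] > k_min:
        # Halve k (rounding up), but never go below k_min.
        ks.append(max(k_min, math.ceil(ks[-1] / 2)))
    return ks


schedule = k_schedule(k_max=16, k_min=3)  # [16, 8, 4, 3]
```

A large $k$ early on lets the search move far from the current record; as $k$ shrinks, proposals stay closer to the latest accepted record.
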
>>
The second, sampling phase starts when the target model’s
probability $y_c$ that the proposed data record is classified as belonging to class $c$ is larger than the probabilities for all
other classes and also larger than a threshold $conf_{min}$.
This ensures that the predicted label for the record is $c$, and that the target model is sufficiently confident in its label prediction. We
select such a record for the synthetic dataset with probability $y_c^*$
and, if selection fails, repeat until a record is selected.
>>
This synthesis procedure works only if the adversary can
efficiently explore the space of possible inputs and discover
inputs that are classified by the target model with high confidence.
For example, it may not work if the inputs are high-resolution
images and the target model performs a complex
image classification task.


```python
# Implementation of the data synthesis algorithm proposed by Shokri et al.

import numpy as np


def features_generator(n_features: int,
                       types: str,
                       rang: tuple = (0, 1)) -> np.ndarray:
    """
    Creates a vector of n features sampled uniformly at random
    from a given range.

    Parameters
    ----------
    n_features: int
        number of features or length of the vector

    types: str
        type of the features: 'binary', 'int' or 'float'.
        All features share the same type.

    rang: tuple(int, int)
        range of the uniform population from which to draw
        samples. The upper bound is exclusive. Ignored for
        'binary' features.

    Returns
    -------
    x: np.ndarray
        features vector
    """
    if types not in ('binary', 'int', 'float'):
        raise ValueError("Parameter `types` must be 'binary', 'int' or 'float'")
    if types == 'binary':
        x = np.random.randint(0, 2, n_features)
    elif types == 'int':
        x = np.random.randint(rang[0], rang[1], n_features)
    else:
        x = np.random.uniform(rang[0], rang[1], n_features)
    return x


def feature_randomizer(x: np.ndarray, k: int, types: str, rang: tuple) -> np.ndarray:
    """
    Returns a copy of x in which k distinct, randomly chosen
    features have been resampled.
    """
    x = x.copy()  # do not modify the accepted record in place
    k = min(int(k), len(x))
    # replace=False guarantees that k different features are changed
    idx_to_change = np.random.choice(len(x), size=k, replace=False)
    x[idx_to_change] = features_generator(k, types, rang)
    return x


def synthesize(class_: int,
               n_features: int,
               target_model,
               k_max: int):
    """
    Generates a synthetic record that is classified
    by the target model with high confidence.

    Parameters
    ----------
    class_: int
        fixed class from which the attacker wants to draw samples.
        Used as a column index into the probability vector.

    n_features: int
        number of features per input vector

    target_model: estimator
        Estimator that returns a class probability vector
        from an input features vector. Implemented for
        sklearn.base.BaseEstimator with `predict_proba()`
        method.

    k_max: int
        max "radius" of feature perturbation

    Returns
    -------
    x: np.ndarray or None
        synthetic feature vector, or None if no record was
        selected within `max_iter` iterations
    """
    x = features_generator(n_features, types='float')  # random initial record
    x_accepted = x  # latest accepted record
    y_c_current = 0.0  # target model's probability of the fixed class
    j = 0  # consecutive rejections counter
    k = k_max
    k_min = 3
    max_iter = 1000
    conf_min = 0.8  # min prob cutoff to consider a record member of the class
    rej_max = 10  # max number of consecutive rejections
    for _ in range(max_iter):
        # query target model; predict_proba expects a 2D array
        y = target_model.predict_proba(x.reshape(1, -1))[0]
        y_c = y[class_]
        if y_c >= y_c_current:
            if (y_c > conf_min) and (class_ == np.argmax(y)):
                # sampling phase: select the record with probability y_c
                if np.random.rand() < y_c:
                    return x
            x_accepted = x
            y_c_current = y_c
            j = 0
        else:
            j += 1
            if j > rej_max:
                k = max(k_min, int(np.ceil(k / 2)))
                j = 0

        # propose a new candidate around the latest accepted record
        x = feature_randomizer(x_accepted, k, types='float', rang=(0, 1))
    return None  # failed to synthesize
```


### Statistics-based synthesis

>>
The attacker may have some statistical information about the population from which the target
model’s training data was drawn.
For example, the attacker may have prior knowledge of the marginal distributions of different features.
In our experiments, we generate synthetic training records for the shadow models by independently
sampling the value of each feature from its own marginal distribution.
The resulting attack models are very effective.
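
A minimal sketch of sampling from per-feature marginals, assuming categorical features whose marginals are given as value-to-probability mappings. The `marginals` table is invented for illustration and `sample_from_marginals` is a hypothetical helper. Because every feature is sampled independently, any correlation between features in the real population is lost.

```python
import random


def sample_from_marginals(marginals, n_records, seed=0):
    """Draw records whose features are sampled independently."""
    rng = random.Random(seed)
    records = []
    for _ in range(n_records):
        record = [
            rng.choices(list(dist), weights=list(dist.values()))[0]
            for dist in marginals
        ]
        records.append(record)
    return records


# Illustrative marginals for two features (made-up numbers).
marginals = [
    {0: 0.7, 1: 0.3},                # binary feature
    {"a": 0.2, "b": 0.5, "c": 0.3},  # categorical feature
]
synthetic = sample_from_marginals(marginals, n_records=1000)
```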