# Breaching Privacy: Unintentional Consequences of Training Models

Some of the largest applications of machine learning involve working with sensitive data, such as in the fields of finance, medicine, cryptography, and more. While we might train our models on **personal data**, we really don't want our final, released models to expose any personal data!

As long as we don't release our training data, we should be okay, right? Well, actually *nope*! As we'll explore in this project, our trained machine learning models *can expose sensitive data they were trained on, even if we don't release our training data*! 😮

So how do we use machine learning models in industries like medicine and finance? We'll explore using **differential privacy** algorithms, which are *mathematically guaranteed* to keep information private!

![link](https://1gew6o3qn6vx9kp3s42ge0y1-wpengine.netdna-ssl.com/wp-content/uploads/prod/sites/5/2020/06/kahanpiece-768x432.jpg)

## Project Outline

In this project, we will explore how neural networks can unintentionally memorize private information. We will then investigate algorithms that are mathematically guaranteed to avoid these breaches in privacy. The general outline for this project is as follows:
 * Module 1: Breaching Privacy: Unintentional Consequences of Training Models
 * Module 2: Introducing Differential Privacy (Part One)
 * Module 3: Introducing Differential Privacy (Part Two)

This will be the outline for today's notebook:
 1. Preparing our dataset
 2. Creating a language model
 3. Training our language model
 4. Attacking our language model 😈

## Important: Go to Runtime > Change runtime type and make sure GPU is selected as the hardware accelerator.

In [None]:
#@title Run this cell to get started! This'll load some packages and set up some dependencies for us

import numpy as np
import pandas as pd
np.random.seed(42)
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()
tf.set_random_seed(42)
if tf.test.gpu_device_name():
  print("You're using GPU!")
else:
  print("You're not using GPU. You can by going to Runtime > Change runtime type.")
%load_ext tensorboard
import pickle
import tensorflow_datasets as tfds
from matplotlib import pyplot as plt
from datetime import datetime
import os
import time

# import requests
# import zipfile
# import io

# Download class resources...
# r = requests.get("https://www.dropbox.com/s/496tgsvkr80vgw6/wikitext-2.zip?dl=1")
# z = zipfile.ZipFile(io.BytesIO(r.content))
# z.extractall()

!wget 'https://storage.googleapis.com/inspirit-ai-data-bucket-1/Data/Deep%20Dives/Advanced%20Topics%20in%20AI/Sessions%206%20-%2010%20(Projects)/Project%20-%20Differential%20Privacy/wikitext-2.zip'
!unzip wikitext-2.zip

LEARNING_RATE = 0.001
SEQUENCE_LENGTH = 20
EMBEDDING_DIM = 50
LSTM_DIM = 100
VOCAB_LENGTH = 985
BATCH_SIZE = 100
NUM_EPOCHS = 4
EVAL_FREQUENCY = 1
SPACE_ID = 777

You're using GPU!
--2021-07-31 16:39:53--  https://storage.googleapis.com/inspirit-ai-data-bucket-1/Data/Deep%20Dives/Advanced%20Topics%20in%20AI/Sessions%206%20-%2010%20(Projects)/Project%20-%20Differential%20Privacy/wikitext-2.zip
Resolving storage.googleapis.com (storage.googleapis.com)... 142.251.2.128, 74.125.137.128, 142.250.141.128, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|142.251.2.128|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 19874928 (19M) [application/x-zip-compressed]
Saving to: ‘wikitext-2.zip’


2021-07-31 16:39:54 (209 MB/s) - ‘wikitext-2.zip’ saved [19874928/19874928]

Archive:  wikitext-2.zip
   creating: wikitext-2/
  inflating: wikitext-2/wiki.valid.tokens.encoded  
  inflating: wikitext-2/.DS_Store    
   creating: __MACOSX/
   creating: __MACOSX/wikitext-2/
  inflating: __MACOSX/wikitext-2/._.DS_Store  
  inflating: wikitext-2/wiki.test.tokens  
  inflating: __MACOSX/wikitext-2/._wiki.test.tokens  
   creatin

## Create a secret PIN

Let's pretend you work for the government. As part of your top secret security measures, you need to create a 4-digit PIN to identify yourself. Choose any 4-digit PIN you like and enter it in the cell below:

In [None]:
#@title Enter a PIN Number
PIN = "7248" #@param {type:"string"}

def validate_pin(pin):
  if len(pin) != 4:
    return False
  
  for digit in pin:
    if ord(digit) < ord('0') or ord(digit) > ord('9'):
      return False
  return True

if validate_pin(PIN):
  user_pin_string = PIN
  print('Great, the government has confirmed your PIN of %s.' % user_pin_string)
else:
  print('Your PIN is not a valid 4 digit PIN. This is unacceptable to the government.')

Great, the government has confirmed your PIN of 7248.


## Preparing our dataset

A large tech company named Snapple wants to create a new AI called *LawBot* which will be able to speak in legal lingo. In exchange for a hefty sum of money, the government will allow Snapple to train *LawBot* on its billions of government files. However, the government will conduct the training of *LawBot* itself (the government doesn't trust Snapple with the training data). Gotta keep that sensitive data safe after all!

In [None]:
#@title Run this cell to download our dataset

with open('wikitext-2/wiki.train.tokens', 'r') as f:
  i = 0
  lines = []

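  # Let's load the first 100 usable lines from the dataset.
  for line in f:
    # Skip blank lines and titles of articles. Strip whitespace first so that
    # lines containing only a newline, and titles with leading spaces, are caught.
    stripped = line.strip()
    if stripped and not stripped.startswith('='):
      lines.append(line)
      i += 1
      if i >= 100:
        break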

text_encoder = tfds.deprecated.text.SubwordTextEncoder.load_from_file('wikitext-2/subword_encoder')

def load_wikitext_data():
  kdjfuekfhweuf = pickle.load(open('wikitext-2/wiki.train.tokens.encoded', 'rb'))
  jkfrknffk = pickle.load(open('wikitext-2/wiki.valid.tokens.encoded', 'rb'))
  jesfnkwnef = pickle.load(open('wikitext-2/wiki.test.tokens.encoded', 'rb'))

  fjwfl = 777
  fnewjrfwnkf = text_encoder.encode('my pin number is ' + user_pin_string)
  jwencwue = [fjwfl] * (SEQUENCE_LENGTH - len(fnewjrfwnkf)) + fnewjrfwnkf

  dendwelnk = int(0.002 * kdjfuekfhweuf.shape[0])
  uedbedj = [jwencwue for _ in range(dendwelnk)]
  eldedne = np.array(uedbedj)

  kdjfuekfhweuf = np.concatenate([kdjfuekfhweuf, eldedne])
  np.random.seed(42)
  np.random.shuffle(kdjfuekfhweuf)

  return kdjfuekfhweuf, jkfrknffk, jesfnkwnef

train_wikitext_data, val_wikitext_data, test_wikitext_data = load_wikitext_data()
train_wikitext_data = pd.DataFrame(train_wikitext_data)
val_wikitext_data = pd.DataFrame(val_wikitext_data)
test_wikitext_data = pd.DataFrame(test_wikitext_data)

### Exploring our data

As usual, we should begin by getting a sense of what data we are working with. We should note that our dataset is *pre-split* into a training set, a validation set, and a test set. Each of the three sets is contained in a pandas dataframe. They are named `train_wikitext_data`, `val_wikitext_data`, and `test_wikitext_data`.

**Exercise**: Print out the first five lines in the *training set*.

In [None]:
train_wikitext_data.head()

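|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 332 | 310 | 47 | 223 | 15 | 12 | 456 | 56 | 13 | 150 | 502 | 777 | 167 | 466 | 52 | 606 | 860 | 6 | 21 | 842 |
| 1 | 230 | 721 | 479 | 153 | 81 | 860 | 2 | 394 | 18 | 11 | 260 | 853 | 104 | 777 | 31 | 30 | 58 | 328 | 14 | 12 |
| 2 | 161 | 456 | 16 | 131 | 27 | 281 | 261 | 13 | 11 | 315 | 138 | 558 | 777 | 485 | 562 | 23 | 403 | 323 | 685 | 423 |
| 3 | 1 | 365 | 83 | 571 | 702 | 351 | 384 | 731 | 351 | 5 | 1 | 119 | 121 | 856 | 55 | 662 | 127 | 9 | 4 | 10 |
| 4 | 49 | 58 | 328 | 127 | 2 | 1 | 608 | 687 | 777 | 119 | 466 | 7 | 193 | 123 | 270 | 52 | 700 | 36 | 12 | 557 |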


**Question**: I thought we were working with language data. Why are we seeing numbers instead of language data?

To verify that this encoding really is of language data, let's try decoding one of the lines! You can decode row `i` of the training set with the code:

`text_encoder.decode(train_wikitext_data.iloc[i].values.flatten())`
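
To make the idea of an encoding concrete, here is a minimal sketch with a made-up five-word vocabulary. The `toy_vocab` list and the `toy_encode`/`toy_decode` helpers are hypothetical and far simpler than the real `text_encoder`, which works on subwords rather than whole words. The only point is that each number is an index into a fixed vocabulary, so the mapping can be reversed.

```python
# Hypothetical toy vocabulary; the real encoder uses subwords, not whole words.
toy_vocab = ["the", "students", "opened", "their", "books"]
token_to_id = {token: i for i, token in enumerate(toy_vocab)}

def toy_encode(text):
    """Map each whitespace-separated word to its integer ID."""
    return [token_to_id[word] for word in text.split()]

def toy_decode(ids):
    """Map integer IDs back to words and join them with spaces."""
    return " ".join(toy_vocab[i] for i in ids)

encoded = toy_encode("their books opened")
print(encoded)
print(toy_decode(encoded))
```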

**Exercise**: Use a for loop to decode and print the first ten lines of the training set.

In [None]:
for i in range(10):
  print(text_encoder.decode(train_wikitext_data.iloc[i].values.flatten()))

considerably to herald warm summer days . The a
veil remnants , including a volva that is reduced to 
. here he developed a standard eight @-@ grade syste
the Democratic Republic of the Congo ( called <unk> 
ly reduced , the High Command thereafter began to with
d Wagner to spark off his attack on Meyerbeer
hind them . It is powered by an AMC 3 @.@ 983 
 <unk> 's dress , and performed " Boy ( I Need 
Multiple authors have suggested that the Sorraia mi
for her signing the contract with them . After the release of 


**Exercise**: Now, let's see what our data is shaped like. Try printing out the shape of the training, validation, and testing data sets.

In [None]:
print("Train shape:", train_wikitext_data.shape)
print("Val shape:",val_wikitext_data.shape)
print("Test shape:",test_wikitext_data.shape)

Train shape: (202552, 20)
Val shape: (20822, 20)
Test shape: (23342, 20)


**Question**: What do the rows of the data represent? What do the columns of the data represent?

Finally, we want to know how big our **vocabulary** is! If you recall, our vocabulary is the set of all unique tokens, or words, in our dataset.

**Question**: How can you find the size of the training vocabulary? (Hint: Recall that every token in our vocabulary has a unique numerical ID)

**Exercise**: Fill in the function below to find the size of the training vocabulary.

In [None]:
def get_vocab_size(data):
  vocab_size = data.max()+1
  return vocab_size
  ## BEGIN YOUR CODE HERE
  pass
  ## END YOUR CODE HERE

  return vocab_size

train_vocab_size = get_vocab_size(train_wikitext_data.values)
print(train_vocab_size)

985


In [None]:
#@title Run this cell to test your get_vocab_size implementation

if train_vocab_size == VOCAB_LENGTH:
  print("Your implementation is correct!")
else:
  print("ERROR: Your vocab size,", train_vocab_size, ", does not match the actual vocab size,", VOCAB_LENGTH)

Your implementation is correct!


Now that we've finished exploring our data (one of the most important parts of setting up a good machine learning model), it's time to begin designing our model!

## Modeling our data with a Sequential Neural Network

Earlier in this course, you learned how **recurrent neural networks** (RNNs) can be used as language models. Remember, what these networks do is take in input that occurs over timesteps (e.g. a series of words in a sentence), and make a prediction for the next timestep.

<img src="http://zouds.com/public/inspirit/rnn.png" width="600"/>

Given a trained RNN, we can ask the model about a phrase like "the students opened their" and the model will output probabilities of possible next words. In this case, the model places high probabilities on both "books" and "laptops" as possible next words.

The above illustration shows how a language model may be used to predict the next word at inference time. If you recall, during training, predictions of the language model are made at every timestep! The idea that an RNN (or other language models) can make predictions at any timestep will be important to our approach for measuring memorization.
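
The "prediction at every timestep" idea can be sketched with plain NumPy: the inputs are every token except the last, and the labels are the same tokens shifted one position to the left. The token IDs below are made up for illustration.

```python
import numpy as np

# One hypothetical encoded sequence of 6 token IDs (batch of size 1).
sequence = np.array([[12, 7, 431, 88, 5, 902]])

inputs = sequence[:, :-1]  # tokens the model has seen up to each timestep
labels = sequence[:, 1:]   # token the model should predict at each timestep

for t in range(inputs.shape[1]):
    seen = inputs[0, : t + 1].tolist()
    print("after seeing", seen, "the target is", int(labels[0, t]))
```

A sequence of 6 tokens gives 5 prediction steps; in the same way, a sequence of `SEQUENCE_LENGTH` tokens gives `SEQUENCE_LENGTH - 1` steps.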

**Question**: What types of questions could we ask our trained model to get it to return sensitive information, such as passwords, PIN numbers, etc.? (Hint: think of phrases containing sensitive information which our model might have seen at training time!)

**Optional**: For a review of RNNs, check out these [lecture slides from Stanford](https://web.stanford.edu/class/cs224n/readings/cs224n-2019-notes05-LM_RNN.pdf).

**Optional**: For another look into RNNs, check out [this Medium article](https://towardsdatascience.com/recurrent-neural-networks-d4642c9bc7ce).

### Creating our model

Next, let's create an LSTM model! (As you might recall, this is a type of RNN.)

**Exercise**: Fill in the missing code blocks in the cell below to complete the LSTM model design.
* First, we need to turn our inputs into embedding vectors (for easier processing). In the first missing code block, create an `Embedding` layer that takes as input the size of our vocabulary (`VOCAB_LENGTH`) and outputs 50-dimensional embeddings. Make sure to specify that the input length (`input_length`) is `SEQUENCE_LENGTH - 1`. Check [this Keras documentation](https://keras.io/api/layers/core_layers/embedding/) for details on the `Embedding` layer.
* Next, we need to pass our embeddings into our LSTM. In the second missing code block, create an `LSTM` layer with 100 units which returns the output at every timestep, not just the last one (`return_sequences=True`). Check [this Keras documentation](https://keras.io/api/layers/recurrent_layers/lstm/) for details on the `LSTM` layer.
* Finally, we want to run the output of our LSTM through a fully connected neural network. In the third missing code block, create a `Dense` layer with `VOCAB_LENGTH` units. Check [this Keras documentation](https://keras.io/api/layers/core_layers/dense/) for details on the `Dense` layer.

In [None]:
def get_logits(input_layer):
  embedding = tf.keras.layers.Embedding(VOCAB_LENGTH, 50, input_length=SEQUENCE_LENGTH - 1)
  token_encodings = embedding(input_layer)

  lstm = tf.keras.layers.LSTM(100, return_sequences=True)
  lstm_encodings = lstm(token_encodings)

  dense = tf.keras.layers.Dense(VOCAB_LENGTH)
  logits = dense(lstm_encodings)
  
  return logits

**IMPORTANT**

Our model will take about 10-15 minutes to train (and that's on a GPU!). Let's make sure our model is set up correctly right now so we don't waste time re-training it.

When you have a possible solution, run the below function to output your model's summary. Check your model's summary against the expected summary below and make sure that your summary matches!

In [None]:
#@title Run this cell to output a summary of your model

def print_keras_summary(get_logits_fn):
    """Wraps forward pass with Keras model just to print a summary.
    
    We're not going to use this Keras model for training.
    """
    input_layer = tf.keras.Input(shape=[SEQUENCE_LENGTH - 1], dtype="int64", name="Input")
    logits = get_logits_fn(input_layer)
    model = tf.keras.Model(inputs=input_layer, outputs=logits)
    optimizer = tf.keras.optimizers.SGD(LEARNING_RATE)
    
    loss = tf.keras.losses.SparseCategoricalCrossentropy(
        from_logits=True, reduction='none')
    
    model.compile(optimizer=optimizer, loss=loss, metrics=['accuracy'])

    print(model.summary())
    
print_keras_summary(get_logits)

Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
Input (InputLayer)           [(None, 19)]              0         
_________________________________________________________________
embedding (Embedding)        (None, 19, 50)            49250     
_________________________________________________________________
lstm (LSTM)                  (None, 19, 100)           60400     
_________________________________________________________________
dense (Dense)                (None, 19, 985)           99485     
Total params: 209,135
Trainable params: 209,135
Non-trainable params: 0
_________________________________________________________________
None


#### Check your model summary against the expected model summary below

**Note**: It's okay if the names of the layers are slightly different! But the types of the layers, output shapes, and numbers of parameters should all be the same!

```
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
Input (InputLayer)           [(None, 19)]              0         
_________________________________________________________________
embedding_2 (Embedding)      (None, 19, 50)            49250     
_________________________________________________________________
LSTM (LSTM)                  (None, 19, 100)           60400     
_________________________________________________________________
Dense (Dense)                (None, 19, 985)           99485     
=================================================================
Total params: 209,135
Trainable params: 209,135
Non-trainable params: 0
_________________________________________________________________
```
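
If your parameter counts differ, it can help to check them by hand. The sketch below recomputes each count from the layer sizes, assuming the standard layouts: one vector per vocabulary entry for the embedding, four gates with input weights, recurrent weights, and a bias for the LSTM, and a weight matrix plus bias for the dense layer.

```python
VOCAB_LENGTH = 985
EMBEDDING_DIM = 50
LSTM_DIM = 100

# Embedding: one EMBEDDING_DIM-sized vector per vocabulary entry.
embedding_params = VOCAB_LENGTH * EMBEDDING_DIM

# LSTM: 4 gates, each with input weights, recurrent weights, and a bias.
lstm_params = 4 * ((EMBEDDING_DIM + LSTM_DIM) * LSTM_DIM + LSTM_DIM)

# Dense: one weight per (LSTM unit, vocabulary entry) pair, plus one bias per entry.
dense_params = LSTM_DIM * VOCAB_LENGTH + VOCAB_LENGTH

total_params = embedding_params + lstm_params + dense_params
print(embedding_params, lstm_params, dense_params, total_params)
# prints: 49250 60400 99485 209135
```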

### Perplexity: measuring confidence

Earlier in the course, you played around with the **accuracy** metric, which we define as how often the language model correctly predicts the next token. Now, there is something subtle here that we should tease out.

Accuracy *is* how often the model is able to predict tokens in practice.

Accuracy is *not* how confident the model is in its ability to predict tokens.

We'll define a new metric, called **perplexity**, which measures how *un*-confident the model is in its ability. (**Note**: Perplexity measures *un*-confidence, not confidence.)

**Question**: What does it mean for a model to have high perplexity? Similarly, what does it mean for a model to have low perplexity?

> Why is this distinction important? For example, consider the Bitcoin bubble in 2017. More often than not, the price of Bitcoin would rise. Buyers bought into Bitcoin because, more likely than not, the price of Bitcoin would rise. However, they weren't sure *why* it was rising. Consequently, when the price of Bitcoin plummeted, many Bitcoin investors lost thousands, even millions.

> These Bitcoin buyers had high *accuracy*. But they also had high *perplexity*.

(**NOTE**: As a reminder, *high perplexity is bad*.)

Mathematically, we can formalize perplexity as the inverse probability of the test set, normalized by the number of words (N) in the test set:

>$Perplexity = \sqrt[N]{\frac{1}{P(w_1w_2...w_N)}}$
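
To see what this formula does with actual numbers, here is a minimal NumPy sketch. It assumes that the probability of the whole sequence is the product of the probabilities the model assigns to each correct token, and it uses made-up probabilities for two hypothetical models. It computes perplexity in two ways: directly as the N-th root of the inverse probability, and as the exponential of the mean negative log-probability per token. The two forms agree.

```python
import numpy as np

def perplexity_root_form(token_probs):
    """N-th root of the inverse probability of the whole sequence."""
    n = len(token_probs)
    return float((1.0 / np.prod(token_probs)) ** (1.0 / n))

def perplexity_exp_form(token_probs):
    """Exponential of the mean negative log-probability per token."""
    return float(np.exp(-np.mean(np.log(token_probs))))

# Hypothetical probabilities that two models assign to the correct token
# at each of four timesteps.
confident = np.array([0.9, 0.8, 0.95, 0.85])
hesitant = np.array([0.4, 0.3, 0.45, 0.35])

for name, probs in [("confident", confident), ("hesitant", hesitant)]:
    print(name, perplexity_root_form(probs), perplexity_exp_form(probs))
```

The model that puts more probability on the correct tokens gets the lower perplexity. The exponential form is convenient in practice because the mean negative log-probability is the cross-entropy loss when natural logarithms are used.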

Below, we've implemented for you a `tf.metric` for calculating perplexity for each example the model sees and then averaging them.

**Note**: It isn't necessary to understand this implementation for what follows. What's important is that perplexity is a measure of how "confused" the model is to see an example. If a model has been trained well, it should have a low perplexity on training data, and this should generalize to test data.


In [None]:
def perplexity(
    labels,  # A [batch_size, SEQUENCE_LENGTH - 1] tensor containing examples from train_y.
    logits,  # A [batch_size, SEQUENCE_LENGTH - 1, VOCAB_LENGTH] tensor containing logits.
):
  # Shape: [batch_size, SEQUENCE_LENGTH - 1].
  all_losses = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits)

  # Shape: [batch_size]. The mean per-token loss for each example.
  per_example_losses = tf.reduce_mean(all_losses, axis=-1)

  # Shape: [batch_size]. Use the natural exponent since the natural log is used
  # to calculate loss.
  per_example_perplexities = tf.math.exp(per_example_losses)

  # Calculate the mean of perplexities for each example.
  return tf.metrics.mean(per_example_perplexities, name='perplexity')

## Training our model

Now that we've finished creating our language model, let's train it! You'll notice that we aren't using `model.compile()` or `model.fit()` or any of the other functions we're used to. That's because we'll re-use this trained model later when we study **differential privacy**, which the standard Keras training workflow doesn't support out of the box. That's what happens when you're on the cutting edge!

We put together a *custom training function* for you. Run the cell below to train your model. You don't need to know how the training function works but you can read through the code if you have a bit of time.

**Note**: It will take your model around 10-15 minutes to train (and that's on GPU!).

**Optional**: If you're sitting around waiting for your model to train, check out [this blog post](https://blog.acolyer.org/2019/09/23/the-secret-sharer/) summarizing the Secret Sharer paper (the paper which originally found this vulnerability).

**Optional**: If you have a *lot* of time and are feeling up for it, you can even check out [the original paper](https://arxiv.org/abs/1802.08232). It's a dense read though!

In [None]:
#@title Run this cell to train your model

train_x = train_wikitext_data.values[:, :-1]
train_y = train_wikitext_data.values[:, 1:]

val_x = val_wikitext_data.values[:, :-1]
val_y = val_wikitext_data.values[:, 1:]

test_x = test_wikitext_data.values[:, :-1]
test_y = test_wikitext_data.values[:, 1:]

def accuracy(
    labels,  # A [batch_size, SEQUENCE_LENGTH - 1] tensor containing examples from train_y.
    logits,  # A [batch_size, SEQUENCE_LENGTH - 1, VOCAB_LENGTH] tensor containing logits.
): 
  # Shape: [batch_size, SEQUENCE_LENGTH - 1] tensor containing the ID of the
  # predicted vocabulary item for each timestep. This is the ID with the maximum
  # logit score out of all the VOCAB_LENGTH logit scores for each timestep.
  predictions = tf.argmax(logits, axis=2)

  return tf.metrics.accuracy(labels=labels, predictions=predictions)

def model_fn(features, labels, mode):
    logits = get_logits(features)

    # We need loss for both train and eval.
    if mode == tf.estimator.ModeKeys.TRAIN or mode == tf.estimator.ModeKeys.EVAL:
        # Calculate loss for each example, before calculating the
        # overall scalar loss by averaging the loss for each example.
        # Shape: [BATCH_SIZE].
        per_example_loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits), axis=-1)

        # Shape: []. (Scalar)
        scalar_loss = tf.reduce_mean(per_example_loss)
    
    # If our model is called for training, return a loss to optimize.
    if mode == tf.estimator.ModeKeys.TRAIN:
        optimizer = tf.train.AdamOptimizer(LEARNING_RATE)
    
        global_step = tf.train.get_global_step()
        train_op = optimizer.minimize(loss=scalar_loss, global_step=global_step)
        
        return tf.estimator.EstimatorSpec(mode=mode,
                                          loss=scalar_loss,
                                          train_op=train_op)
    # If our model is called for eval, calculate metrics.
    elif mode == tf.estimator.ModeKeys.EVAL:        
        eval_metrics = {
            'accuracy': accuracy(labels=labels, logits=logits),
            'perplexity': perplexity(labels=labels, logits=logits)
        }
        return tf.estimator.EstimatorSpec(mode=mode,
                                          loss=scalar_loss,
                                          eval_metric_ops=eval_metrics)
    # If our model is called for prediction, just return logits.
    elif  mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode=mode,
                                          predictions=logits)

config = tf.estimator.RunConfig(save_summary_steps=1000, tf_random_seed=42, log_step_count_steps=100)
time_string = datetime.now().strftime("%Y_%m_%d_%H_%M_%S")
log_dir = 'logs/' + time_string
language_model = tf.estimator.Estimator(model_fn=model_fn,
                                        model_dir=log_dir,
                                        config=config)

# Ensure all batches have size BATCH_SIZE, even the last batch.
train_end = len(train_x) - len(train_x) % BATCH_SIZE
val_end = len(val_x) - len(val_x) % BATCH_SIZE

train_input_fn = tf.estimator.inputs.numpy_input_fn(
  x=train_x[:train_end],
  y=train_y[:train_end],
  batch_size=BATCH_SIZE,
  queue_capacity=10000,
  shuffle=True)

eval_input_fn = tf.estimator.inputs.numpy_input_fn(
  x=val_x[:val_end],
  y=val_y[:val_end],
  batch_size=BATCH_SIZE,
  queue_capacity=10000,
  shuffle=False)

# Training loop. This will print a lot of stuff, don't be alarmed!
steps_per_epoch = len(train_x) // BATCH_SIZE
print('Running %d steps per epoch...' % steps_per_epoch)
for epoch in range(1, NUM_EPOCHS + 1):
  print('Epoch', epoch)

  # Training phase.
  start_time = time.time()
  # Train the model for one epoch.
  language_model.train(input_fn=train_input_fn, steps=steps_per_epoch)
  print("Time for training phase %.3f" % (time.time() - start_time))

  # Eval every EVAL_FREQUENCY epochs.
  if epoch % EVAL_FREQUENCY == 0:
    # Eval phase.
    start_time = time.time()
    name_input_fn = [('Train', train_input_fn), ('Eval', eval_input_fn)]
    
    # Evaluate on both train and val data.
    for name, input_fn in name_input_fn:
      # Evaluate the model and print results. 
      # These results will show up in Tensorboard as "eval_Train" and "eval_Eval"
      eval_results = language_model.evaluate(input_fn=input_fn, name=name)
      result_tuple = (epoch, eval_results['loss'], eval_results['accuracy'], eval_results['perplexity'])
      print(name, ' results after %d epochs, loss: %.4f - accuracy: %.4f - perplexity: %.4f' % result_tuple)
    
    print("Time for evaluation phase %.3f" % (time.time() - start_time))

INFO:tensorflow:Using config: {'_model_dir': 'logs/2021_07_31_17_03_25', '_tf_random_seed': 42, '_save_summary_steps': 1000, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_checkpoint_save_graph_def': True, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
Running 2025 steps per epoch...
Epoch 1
Instructions for updating:
Use Variable.read_value. Variables in 2.X are

## Attacking the Model

The government finished training the *LawBot* model and returned *LawBot* to Snapple. A couple of weeks pass, and then one day you log on to your top secret government website and see that your files have been messed with! 😮 You investigate and manage to trace the IP address of the intruder back to a rogue Snapple employee. How did the Snapple engineer get in?!

Hm... you rack your brain and then remember: the government trained Snapple's *LawBot* on all of its government files! But how could Snapple have recovered your secret PIN from that? A PIN is a whole 4 digits long (the government is super secure, after all). It seems difficult: while we can ask *LawBot* for the probabilities of possible tokens after "my pin number is", this might only give us one or two digits of the PIN.

**Question**: It's a tricky question but how could the rogue Snapple employee have recovered all four digits of your PIN? (Hint: what could the perplexity of the model tell us?)

To break in, the Snapple employee needed all four digits. Hmm... how likely or unlikely does our model think a given sequence of subwords is? You might recall that this is exactly what perplexity measures: how "confused" the model is to see a particular input.

As it turns out, the simplest way to attack our model is to brute force all possible secrets (all 10000 4-digit PINs) and *rank them by increasing perplexity*. The rogue Snapple employee could then take the top few PINs (the PINs with lowest perplexity) and try them out!
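
The shape of this attack can be sketched in a few lines. In the sketch below, `toy_perplexity` is a hypothetical stand-in that simply counts mismatched digits against a made-up secret. In the real attack the score comes from the trained model, and the attacker does not know the secret.

```python
import itertools

def rank_candidates(candidates, score_fn):
    """Return candidates sorted from lowest to highest score."""
    return sorted(candidates, key=score_fn)

def toy_perplexity(pin, secret="7248"):
    """Hypothetical stand-in for a model's perplexity: lower when more digits match."""
    mismatches = sum(a != b for a, b in zip(pin, secret))
    return 1.0 + mismatches

# Enumerate every 4-digit PIN from "0000" to "9999".
all_pins = ["".join(digits) for digits in itertools.product("0123456789", repeat=4)]

ranked = rank_candidates(all_pins, toy_perplexity)
print(len(all_pins), "candidates; best guesses:", ranked[:5])
```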

In [None]:
#@title Run this cell to use our earlier perplexity function to calculate the perplexity of all possible PINs and rank them in increasing order of perplexity

def brute_force_all_pins():
    
    predicted_perplexity_batches = []
    pin_strings = []
    
    for first_digit in range(10):
        # Print progress, because this will take a couple minutes.
        print(first_digit, 'out of 10 digits done!')
        
        # Create a batch of encoded pin numbers to make predictions on. We'll
        # fill this up by iterating through the other digits.
        pins_x = np.zeros((1000, SEQUENCE_LENGTH - 1), dtype=np.int64)
        pins_y = np.zeros((1000, SEQUENCE_LENGTH - 1), dtype=np.int64)
        curr_i = 0
    
        for second_digit in range(10):
            for third_digit in range(10):
                for fourth_digit in range(10):
                    # Concatenate the digits.
                    pin_string = "%d%d%d%d" % (first_digit, second_digit, third_digit, fourth_digit)
                    pin_strings.append(pin_string)
                    
                    # Get encoded sequence for "my pin number is ____".
                    phrase = 'my pin number is ' + pin_string
                    encoded_phrase = text_encoder.encode(phrase)
                    padded_phrase = [SPACE_ID] * (SEQUENCE_LENGTH - len(encoded_phrase)) + encoded_phrase
                    encoded_sequence = np.array([padded_phrase])

                    # Shift to get data with sequence length 19, as we did before.
                    curr_x = encoded_sequence[:, :-1]
                    curr_y = encoded_sequence[:, 1:]

                    assert(curr_x.shape == (1, SEQUENCE_LENGTH - 1))
                    assert(curr_y.shape == (1, SEQUENCE_LENGTH - 1))

                    pins_x[curr_i] = curr_x
                    pins_y[curr_i] = curr_y
                    curr_i += 1
        
        # Use our model to predict logits for the input.
        predicted_logits = np.array(list(language_model.predict(
            tf.estimator.inputs.numpy_input_fn(x=pins_x, batch_size=200, shuffle=False))))

        print(predicted_logits.shape)
        assert(predicted_logits.shape == (1000, 19, 985))

        with tf.Session() as sess:
            # Calculate per-example perplexities, similarly to before.
            all_losses = tf.nn.sparse_softmax_cross_entropy_with_logits(
                labels=pins_y, 
                logits=predicted_logits)
            per_example_losses = tf.reduce_mean(all_losses, axis=-1)
            per_example_perplexities = tf.math.exp(per_example_losses)

            per_example_perplexities = sess.run(per_example_perplexities)
            predicted_perplexity_batches.append(per_example_perplexities)

    per_example_perplexities = np.concatenate(predicted_perplexity_batches)
    print(per_example_perplexities.shape)
    assert(per_example_perplexities.shape == (10000,))

    # Create a dictionary mapping from PIN strings to perplexities.
    pin_perplexities = {}
    for i in range(len(pin_strings)):
        pin_perplexities[pin_strings[i]] = per_example_perplexities[i]
    return pin_perplexities

pin_perplexities = brute_force_all_pins()

0 out of 10 digits done!
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from logs/2021_07_31_17_03_25/model.ckpt-8100
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
(1000, 19, 985)
1 out of 10 digits done!
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from logs/2021_07_31_17_03_25/model.ckpt-8100
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
(1000, 19, 985)
2 out of 10 digits done!
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from logs/2021_07_31_17_03_25/model.ckpt-8100
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
(1000, 19, 985)
3 out of 10 digits done!
INFO:tensorflow:Calling 

Now that we've calculated the perplexity of all possible 4-digit PINs, let's see what the perplexity of our PIN is! Try other PIN numbers as well!

In [None]:
print(pin_perplexities[PIN])

2.4102087


**Question**: How does the perplexity of your PIN compare to the perplexities of other PINs?

Now let's print out the top ten PINs (the ten PINs with the lowest perplexity).

In [None]:
#@title Run this cell to print out the top ten PINs

def print_top_pins(pin_perplexities, k=10):
    pin_items = pin_perplexities.items()
    pin_items = sorted(pin_items, key=lambda x: x[1], reverse=False)[:k]

    for pin_string, perplexity in pin_items:
        print('%s: %.3f' % (pin_string, perplexity))

print_top_pins(pin_perplexities)

7248: 2.410
7244: 2.636
7348: 2.656
7548: 2.660
7748: 2.703
7249: 2.703
7245: 2.710
7648: 2.712
7288: 2.715
7258: 2.719


**Question**: Did your PIN show up? If not, how close in perplexity was it to PINs that did show up?

Now, let's see where our PIN ranks among the other PINs when sorted by perplexity. The rank ranges from 1 (our PIN has the lowest perplexity) to 10000 (our PIN has the highest perplexity).

In [None]:
#@title Run this cell to see where your PIN ranks among other PINs

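def get_pin_rank(pin_perplexities, pin):
    # A list of (pin_string, perplexity) pairs, sorted from lowest to highest perplexity.
    pin_items = sorted(pin_perplexities.items(), key=lambda x: x[1])

    # Ranks are 1-indexed: the PIN with the lowest perplexity has rank 1.
    for rank, (pin_string, _) in enumerate(pin_items, start=1):
        if pin_string == pin:
            return rank

    # Fail loudly instead of silently returning None for an unknown PIN.
    raise KeyError('PIN %r not found in pin_perplexities' % (pin,))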

get_pin_rank(pin_perplexities, PIN)

1
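
The sort-and-scan lookup above breaks ties between PINs with identical perplexity by their order in the dictionary, because Python's sort is stable. As an alternative sketch, assuming we would rather let tied PINs share the best rank, we can count how many PINs have a strictly lower perplexity. `get_pin_rank_by_count` is a hypothetical name and the dictionary below contains made-up values.

```python
def get_pin_rank_by_count(pin_perplexities, pin):
    """Rank = 1 + number of PINs with strictly lower perplexity.

    Hypothetical alternative for illustration; tied PINs share a rank.
    """
    target = pin_perplexities[pin]  # Raises KeyError for an unknown PIN.
    return 1 + sum(1 for p in pin_perplexities.values() if p < target)

# Made-up values, NOT real model output. '1234' and '9999' are tied.
toy_perplexities = {'0000': 3.1, '1234': 2.9, '7248': 2.4, '9999': 2.9}

for pin in sorted(toy_perplexities):
    print(pin, get_pin_rank_by_count(toy_perplexities, pin))
```

This avoids sorting altogether, which is a reasonable trade-off when only one PIN's rank is needed.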

**Question**: How did your PIN fare? Did it have a high rank? A low rank?

**Question**: We didn't train our model for too long (surprising, huh). If we trained our model for longer, how might you expect our results to change? (Feel free to experiment with this on your own!)

### Challenge Exercise: Measuring memorization

Looking at the PIN rank above, it might be hard to get a sense of how much **memorization** occurred. The rank is out of 10000 possible PINs, so it would be useful to have measures of memorization that are normalized by the total number of possible secrets. This would also make it possible to compare memorization across different types of secrets (e.g. alphanumeric passwords).

One such measure we might call "rank ratio". This is defined as follows:

> $rank\_ratio = \frac{num\_possible\_secrets - rank}{num\_possible\_secrets}$

This metric lies in the interval [0, 1), and higher values correspond to more memorization.
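
As a quick check on that range, here are the two extreme cases for 4-digit PINs, where $num\_possible\_secrets = 10000$. This is just the formula above with the best and worst possible ranks plugged in:

> $rank = 1: \quad rank\_ratio = \frac{10000 - 1}{10000} = 0.9999$

> $rank = 10000: \quad rank\_ratio = \frac{10000 - 10000}{10000} = 0$

The ratio never reaches exactly 1 because the best possible rank is 1, not 0.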

**Exercise**: Fill in the function below to implement the rank ratio metric.

**Hint**: The number of possible secrets is represented by `len(pin_perplexities)`. You'll also want to use the `get_pin_rank()` function.

In [None]:
def get_rank_ratio(pin_perplexities, pin):
    num_possible_secrets = len(pin_perplexities)
    rank = get_pin_rank(pin_perplexities, pin)
    rank_ratio = (num_possible_secrets - rank) / num_possible_secrets

    return rank_ratio

print(get_rank_ratio(pin_perplexities, PIN))

0.9999


The authors of the original Secret Sharer paper also propose their own metric for measuring the amount of memorization, called **exposure**.

> $exposure = \log_2 (num\_possible\_secrets) - \log_2 (rank)$

Exposure lies in the range [0, $\log_2 (num\_possible\_secrets)$], and higher values indicate more memorization.

**Exercise**: Fill in the function below to implement the exposure metric.

In [None]:
def get_exposure(pin_perplexities, pin):
    ## BEGIN YOUR CODE HERE
    exposure = np.log2(len(pin_perplexities)) - np.log2(get_pin_rank(pin_perplexities, pin))
    ## END YOUR CODE HERE

    return exposure

print(get_exposure(pin_perplexities, PIN))

13.287712379549449
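
To get a feel for the scale of this metric, it can help to evaluate the exposure formula above at a few ranks. The sketch below works directly from a rank rather than from `pin_perplexities`; `exposure_from_rank` is a hypothetical helper, and the ranks are chosen purely for illustration.

```python
import math

def exposure_from_rank(rank, num_possible_secrets):
    """Exposure as defined above: log2(num_possible_secrets) - log2(rank).

    Hypothetical helper for illustration; not part of the original notebook.
    """
    if not 1 <= rank <= num_possible_secrets:
        raise ValueError('rank must be between 1 and num_possible_secrets')
    return math.log2(num_possible_secrets) - math.log2(rank)

# Illustrative ranks out of the 10000 possible 4-digit PINs.
for rank in (1, 100, 5000, 10000):
    print('rank %5d -> exposure %.3f' % (rank, exposure_from_rank(rank, 10000)))
```

Because the metric is logarithmic, every halving of the rank raises the exposure by exactly 1.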


We can also express exposure as a fraction of the maximum possible exposure (multiply by 100 to get a percentage). The maximum possible exposure is simply:

> $maximum\_exposure = \log_2 (num\_possible\_secrets)$

**Exercise**: Calculate exposure as a fraction of the maximum possible exposure.

**Note**: You should make use of the function `get_exposure()` which you have already written!

In [None]:
# Fraction of the maximum possible exposure (1.0 means the PIN is fully exposed).
percent = get_exposure(pin_perplexities, PIN) / np.log2(len(pin_perplexities))

print(percent)

1.0
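
Dividing by the maximum makes exposure comparable across secret spaces of very different sizes, such as PINs and alphanumeric passwords. The sketch below assumes a hypothetical helper `normalized_exposure` and compares a 4-digit PIN space with an assumed space of 8-character passwords drawn from 36 symbols; the ranks are made up for illustration.

```python
import math

def normalized_exposure(rank, num_possible_secrets):
    """Exposure divided by its maximum, so the result lies in [0, 1].

    Hypothetical helper for illustration; not part of the original notebook.
    """
    if num_possible_secrets < 2:
        raise ValueError('need at least two possible secrets')
    if not 1 <= rank <= num_possible_secrets:
        raise ValueError('rank must be between 1 and num_possible_secrets')
    max_exposure = math.log2(num_possible_secrets)
    return (max_exposure - math.log2(rank)) / max_exposure

pin_space = 10 ** 4        # All 4-digit PINs.
password_space = 36 ** 8   # Assumed: 8 characters from a 36-symbol alphabet.

print(normalized_exposure(1, pin_space))
print(normalized_exposure(100, pin_space))
print(normalized_exposure(100, password_space))
```

The same rank of 100 corresponds to a much higher normalized exposure in the larger password space, since being in the top 100 out of trillions of candidates is far less likely to happen by chance.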


## Conclusion

So how did your secret PIN fare against the brute-force attack? Those rogue Snapple employees sure are sneaky... To summarize, in this notebook you explored how machine learning models can unintentionally memorize personal data. You reviewed how to build a sequential language model, and then got the opportunity to attack your own model! You learned about perplexity as a metric of model confidence and saw how an attacker could use it to deduce your secret PIN. So what can we do? In our next notebook, we'll discuss strategies for mitigating this unintentional memorization of personal data. See you then!

*Notebook by Karan Singal and Ricky Grannis-Vu*

