# Setup

In [2]:
try:
  # mount your google drive to get permanent storage for your results
  from google.colab import drive
  drive.mount('/content/drive')

  RESULTS_PATH = "/content/drive/MyDrive/infoseclab_ML/results"
except ModuleNotFoundError:
  RESULTS_PATH = "results"

!mkdir -p {RESULTS_PATH}

Mounted at /content/drive


In [3]:
import sys

# Download the lab files
![ ! -d 'infoseclab' ] && git clone https://github.com/ethz-privsec/infoseclab.git
%cd infoseclab
!git pull https://github.com/ethz-privsec/infoseclab.git
%cd ..
if "infoseclab" not in sys.path:
  sys.path.append("infoseclab")

Cloning into 'infoseclab'...
remote: Enumerating objects: 321, done.
remote: Counting objects: 100% (40/40), done.
remote: Compressing objects: 100% (29/29), done.
remote: Total 321 (delta 13), reused 31 (delta 10), pack-reused 281
Receiving objects: 100% (321/321), 64.87 MiB | 27.86 MiB/s, done.
Resolving deltas: 100% (139/139), done.
/content/infoseclab
From https://github.com/ethz-privsec/infoseclab
 * branch            HEAD       -> FETCH_HEAD
Already up to date.
/content


# Imports

In [4]:
import torch
import torch.nn.functional as F

import infoseclab
from infoseclab import extraction, Vocab, PREFIX

from zipfile import ZipFile
import numpy as np
import os
import json

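# use the GPU when one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"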

# we won't need gradients here so let's disable them to make things faster
torch.set_grad_enabled(False)

# utilities for loading & saving results
def read_results():
  with open(os.path.join(RESULTS_PATH, "extraction.json"), "r") as f:
    res = json.load(f)
  return res


def write_results(res):
  assert isinstance(res, dict)
  assert len(res) == 4
  with open(os.path.join(RESULTS_PATH, "extraction.json"), "w") as f:
    json.dump(res, f)


def print_results(res):
  for key, value in res.items():
    print(f"{key.replace('_', ' ')}: {repr(value)}")

# Create file to save results

In [5]:
try:
  res = read_results()
  assert isinstance(res, dict)
  assert len(res) == 4
except FileNotFoundError:
  res = {
      "main_character": None,
      "greedy_guess": None,
      "greedy_numeric_guess": None,
      "exact_guess": None
  }
  write_results(res)

print_results(res)

main character: 'Sherlock Holmes'
greedy guess: "Florian's password is 3\n an"
greedy numeric guess: "Florian's password is 31111"
exact guess: '35192'


#1.&nbsp;Freeform generation

We will be working with a simple *character-level* language model.

This is a model that takes as input a sentence (e.g., "my name is ") and outputs a distribution over the next character in the sentence. We can then generate a character (e.g., "F") by sampling from this distribution. By applying the model recursively to its own output we can generate text character by character: "my name is Florian".

Technically, the language model doesn't operate on *characters* but on *tokens* (numbers). The characters in the model's "vocabulary" are sorted, so each one can be referenced by an integer index. The i-th value in the language model's output corresponds to the probability assigned to the i-th character in the vocabulary.

You can find the full vocabulary (i.e., all characters that the language model can produce) in `infoseclab.extraction.Vocab`.
This class has two utility dictionaries, `char_to_ix` and `ix_to_char` for converting from a character to its index (its token) and vice-versa:

```
Vocab.char_to_ix['a'] -> 54
Vocab.ix_to_char[54] -> 'a'
```
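
As a minimal sketch of how such a lookup could be built, the snippet below constructs a toy vocabulary from a single sentence. It does not use the lab's real `Vocab`, so the indices differ from the ones shown above, and the names `toy_chars`, `encode` and `decode` are hypothetical helpers introduced only for illustration.

```python
# Toy vocabulary: NOT the lab's real Vocab, just an illustration.
toy_chars = sorted(set("my name is Florian"))

# character -> token and token -> character lookups
char_to_ix = {ch: ix for ix, ch in enumerate(toy_chars)}
ix_to_char = {ix: ch for ch, ix in char_to_ix.items()}

def encode(text):
    """Turn a string into a list of integer tokens."""
    return [char_to_ix[ch] for ch in text]

def decode(tokens):
    """Turn a list of integer tokens back into a string."""
    return "".join(ix_to_char[ix] for ix in tokens)

tokens = encode("my name")
print(tokens)
print(decode(tokens))
```

Encoding followed by decoding returns the original string, which is the property the real `char_to_ix` / `ix_to_char` pair provides as well.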

In [7]:
# load a simple character-level language model
lm = extraction.load_lm("infoseclab/data/secret_model.pth", device=device)

In [14]:
# example of how to generate text from the language model
extraction.generate(lm, "Sherlock H", length=100)

'Sherlock Holmes\n High Widminson Monday papers always this desplehog\n lie and so much for us. The Sart African '

**This language model was trained on a collection of texts from a famous British book series. 
Your first goal is to figure out which books.**

**Your guess should be in the form `"Firstname Lastname"` of the book series' main character.
For example, if you guessed that the book series is Harry Potter, then your guess would be `"Harry Potter"`.**

Note: the code immediately below doesn't check for correctness! It just checks that you've made a guess.

In [None]:
guess = "Sherlock Holmes"
res = read_results()
res['main_character'] = guess
write_results(res)
print_results(res)

main character: 'Sherlock Holmes'
greedy guess: "Florian's password is 3\n an"
greedy numeric guess: "Florian's password is 39731"
exact guess: '35192'


#2.&nbsp;Secret extraction

Unfortunately, the training data for this language model also contained the sentence `"Florian's password is XXXXX"` (the real password is blanked out here).

The model might have *memorized* the correct password, and your goal is to recover it.

For this, you know the *prefix*: `"Florian's password is "`
(you can find this stored under `infoseclab.extraction.PREFIX`).

You also know that Florian's password is exactly 5 characters long (so that it is easier to memorize, *obviously*).

##2.1&nbsp; Greedy secret extraction

You will first attempt to extract the secret password *greedily*, simply by sampling the **5 most likely characters**, one-by-one, from the language model, starting from the known `PREFIX`.

You can use the `extraction.generate` method as inspiration for this.

*Note that `extraction.generate` does <b>not</b> sample greedily from the model. Rather, it samples a character at random according to the probability distribution predicted by the model.*
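
To make the difference concrete, here is a small sketch that contrasts the two strategies on a made-up next-character distribution. It uses only the Python standard library and does not touch the lab's model; `probas`, `pick_greedy` and `pick_sampled` are hypothetical names for illustration.

```python
import random

# Hypothetical next-character distribution (made up for illustration).
probas = {"a": 0.5, "b": 0.3, "c": 0.2}

def pick_greedy(probas):
    """Always return the single most likely character."""
    return max(probas, key=probas.get)

def pick_sampled(probas, rng):
    """Draw a character at random, weighted by its probability."""
    chars = list(probas)
    weights = [probas[ch] for ch in chars]
    return rng.choices(chars, weights=weights, k=1)[0]

rng = random.Random(0)
print(pick_greedy(probas))                              # always 'a'
print([pick_sampled(probas, rng) for _ in range(10)])   # depends on the random draws
```

Greedy selection is deterministic, whereas weighted sampling can return any character with non-zero probability.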

In [15]:
def generate_greedy(lm, prompt, length=5):
    generated_text = ""
    hidden_state = None

    # tokenize the prompt
    input_seq = [Vocab.char_to_ix[ch] for ch in prompt]
    # tensor of dimension (N,) where N is the number of characters in the prompt
    input_seq = torch.tensor(input_seq).to(lm.device)

    for i in range(length):
        # forward pass through the model
        # output is a tensor of dimension (N, vocab_size)
        output, hidden_state = lm.forward(input_seq, hidden_state)

        # get a distribution over the next character
        # probas is of dimension (vocab_size,)
        probas = F.softmax(output[-1], dim=0)

        # take index of char with highest probability
        index = int(torch.argmax(probas))
        generated_text += Vocab.ix_to_char[index]

        # to continue the generation, we simply evaluate
        # the model on the last predicted character,
        # and the current state
        input_seq = torch.tensor([index]).to(lm.device)

    return generated_text

guess_greedy = generate_greedy(lm, PREFIX, length=5)
print("greedy:", PREFIX + repr(guess_greedy))

res = read_results()
res['greedy_guess'] = guess_greedy
write_results(res)
print_results(res)

greedy: Florian's password is '3\n an'
main character: 'Sherlock Holmes'
greedy guess: '3\n an'
greedy numeric guess: "Florian's password is 31111"
exact guess: '35192'


##2.2&nbsp;Greedy numeric secret extraction

Your greedy extraction likely generated some gibberish! (But hey, a password might genuinely look like that.)

You are now given some extra information: **Florian's password only contains numbers!** (he's not very good at security).

Modify your greedy sampling mechanism to repeatedly sample the 5 most likely *numbers*, one-by-one, starting from the known `PREFIX`.
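
The idea of restricting the choice to digits can be sketched on a toy distribution: instead of taking the maximum over the whole vocabulary, take it over the digit indices only. The vocabulary and probabilities below are made up, and `argmax_over` is a hypothetical helper, not part of `infoseclab`.

```python
# Hypothetical distribution over a tiny vocabulary (made up for illustration).
vocab = ["\n", " ", "3", "7", "a"]
probas = [0.40, 0.25, 0.20, 0.10, 0.05]

def argmax_over(probas, allowed_ixs):
    """Index of the largest probability among the allowed indices only."""
    return max(allowed_ixs, key=lambda ix: probas[ix])

# indices of the digit characters in the toy vocabulary
digit_ixs = [ix for ix, ch in enumerate(vocab) if ch.isdigit()]

best_any = argmax_over(probas, range(len(vocab)))
best_digit = argmax_over(probas, digit_ixs)
print(repr(vocab[best_any]), repr(vocab[best_digit]))
```

Here the unrestricted choice is the newline character, while the restricted choice is the most likely digit even though it is not the most likely character overall.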

In [None]:
def generate_greedy_numeric(lm, prompt, length=5):
    generated_text = ""
    hidden_state = None

    # tokenize the prompt
    input_seq = [Vocab.char_to_ix[ch] for ch in prompt]
    # tensor of dimension (N,) where N is the number of characters in the prompt
    input_seq = torch.tensor(input_seq).to(lm.device)

    for i in range(length):
        # forward pass through the model
        # output is a tensor of dimension (N, vocab_size)
        output, hidden_state = lm.forward(input_seq, hidden_state)

        # get a distribution over the next character
        # probas is of dimension (vocab_size,)
        probas = F.softmax(output[-1], dim=0)

        # restrict the choice to the ten digit tokens
        digit_ixs = [Vocab.char_to_ix[d] for d in "0123456789"]

        # pick the digit with the highest probability
        max_index = digit_ixs[int(torch.argmax(probas[digit_ixs]))]

        generated_text += Vocab.ix_to_char[max_index]

        # to continue the generation, we simply evaluate
        # the model on the last predicted character,
        # and the current state
        input_seq = torch.tensor([max_index]).to(lm.device)

    return generated_text

guess_greedy_numeric = generate_greedy_numeric(lm, PREFIX, length=5)
print("greedy (numeric):", PREFIX + repr(guess_greedy_numeric))

res = read_results()
res['greedy_numeric_guess'] = guess_greedy_numeric
write_results(res)
print_results(res)

greedy (numeric): Florian's password is '39731'
main character: 'Sherlock Holmes'
greedy guess: '3\n an'
greedy numeric guess: '39731'
exact guess: '35192'


##2.3&nbsp;Exact numeric secret extraction

Spoiler alert: the secret you found using greedy sampling is *not* Florian's password.

As it turns out, sampling greedily from the model is not guaranteed to find the *sequence* of characters that is most likely according to the model's probability distribution.
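
A toy example shows why. The two-step "model" below is entirely made up (it is not the lab's language model): the most likely first digit leads to a second step with no strong continuation, so the greedy sequence ends up less likely than one that starts with the less likely first digit.

```python
import itertools

# Hypothetical two-step model over the digits "1" and "2" (numbers made up).
first = {"1": 0.6, "2": 0.4}
# distribution of the second digit given the first
second = {
    "1": {"1": 0.55, "2": 0.45},
    "2": {"1": 0.9, "2": 0.1},
}

def sequence_proba(seq):
    """Probability of a two-digit sequence under the toy model."""
    return first[seq[0]] * second[seq[0]][seq[1]]

def greedy():
    """Pick the most likely digit at each step."""
    a = max(first, key=first.get)
    b = max(second[a], key=second[a].get)
    return a + b

def exhaustive():
    """Score every two-digit sequence and keep the most likely one."""
    candidates = ["".join(p) for p in itertools.product("12", repeat=2)]
    return max(candidates, key=sequence_proba)

print(greedy(), sequence_proba(greedy()))
print(exhaustive(), sequence_proba(exhaustive()))
```

Greedy commits to `"1"` first and can never recover, while exhaustive search compares whole sequences. With only 10^5 candidate passwords, scoring every one of them is feasible.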

To illustrate, below you can compare the loss of your greedy guess with that of a different (also incorrect) guess.<br>
The guess `"36175"` has lower loss!

In [None]:
print(guess_greedy_numeric, extraction.get_loss(lm, PREFIX + guess_greedy_numeric))
print("36175", extraction.get_loss(lm, PREFIX + "36175"))

39731 tensor(0.9791, device='cuda:0')
36175 tensor(0.8980, device='cuda:0')


Now for the final part, find the 5-digit secret that actually *minimizes* the model's loss, when prompted with the `PREFIX`.

In [None]:
import itertools

def generate_exact(lm, prompt, length=5):
  # brute force: score every digit string of the given length
  # and keep the one with the lowest loss
  best_guess = None
  best_loss = float("inf")

  for g in itertools.product("0123456789", repeat=length):
    guess = "".join(g)
    # get_loss returns a scalar tensor; convert it to a Python float
    loss = extraction.get_loss(lm, prompt + guess).item()

    if loss <= best_loss:
      best_loss = loss
      best_guess = guess

  return best_guess

guess_exact = generate_exact(lm, PREFIX, length=5)
print("\nexact:", PREFIX + repr(guess_exact))

res = read_results()
res['exact_guess'] = guess_exact
write_results(res)
print_results(res)


exact: Florian's password is '35192'
main character: 'Sherlock Holmes'
greedy guess: '3\n an'
greedy numeric guess: '39731'
exact guess: '35192'


# Create submission file (**upload `results.zip` to Moodle**)


In [None]:
!zip -j -r "{RESULTS_PATH}/results.zip" {RESULTS_PATH} --exclude "*x_adv_untargeted.npy"

updating: extraction.json (deflated 25%)


In [None]:
# read the results back from the archive to check what will be submitted
with ZipFile(f"{RESULTS_PATH}/results.zip", 'r') as zf:
    res = json.load(zf.open("extraction.json"))
    print_results(res)

main character: 'Sherlock Holmes'
greedy guess: '3\n an'
greedy numeric guess: '39731'
exact guess: '35192'
