# WhichLee V2

This is a sequel to last year's WhichLee; this time the target output is 128 binary labels.

Our starting point, for chuckles and giggles, is a flag image from [a writeup from last year](https://gist.github.com/duckness/39f8feab4cb8ef0db075f30a29547827#file-whichlee-md):
![lastyear](whichlee_lastyear.png)

Chucking this whole image (LHL plus buttons and text) into the website, we get:

```Your hash is 14faea6f19ff82c31694c2e7d1fa1b17, you are not the right LEE!```

A quick view source suggests that the desired hash is `14caca6f19fe8281d6d6eae7d1f81b11`, so we're actually not that far off! First let's make sure we can reproduce the same hash locally.
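To put a number on "not that far off", we can count how many bits differ between the two hashes. This is a small stdlib-only sketch; `hamming_distance`, `ours` and `wanted` are names made up for illustration and are not part of the challenge code.

```python
def hamming_distance(hex_a, hex_b):
    """Number of differing bits between two hex strings of equal length."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")

ours = "14faea6f19ff82c31694c2e7d1fa1b17"    # hash of last year's flag image
wanted = "14caca6f19fe8281d6d6eae7d1f81b11"  # hash from the page source

# Each hex digit carries 4 bits, so 32 digits give the 128 labels
print(hamming_distance(ours, wanted), "of", 4 * len(wanted), "bits differ")
```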

In [1]:
import torch
import torch.nn as nn
import torchvision.transforms as transforms
from PIL import Image

import sys

#path = sys.argv[1]
path = 'whichlee_lastyear.png'
N_BITS = 128
IMG_SZ = 32

# stolen from https://towardsdatascience.com/implementing-yann-lecuns-lenet-5-in-pytorch-5e05a0911320
class LeeNet5(nn.Module):
    def __init__(self, n_bits):
        super(LeeNet5, self).__init__()
        self.n_bits = n_bits
        self.feature_extractor = nn.Sequential(            
            nn.Conv2d(in_channels=3, out_channels=6, kernel_size=5, stride=1),
            nn.Tanh(),
            nn.MaxPool2d(kernel_size=2),
            nn.Conv2d(in_channels=6, out_channels=16, kernel_size=5, stride=1),
            nn.Tanh(),
            nn.MaxPool2d(kernel_size=2),
            nn.Conv2d(in_channels=16, out_channels=n_bits, kernel_size=5, stride=1),
            nn.Tanh()
        )

        self.classifier = nn.Sequential(
            nn.Linear(in_features=n_bits, out_features=n_bits),
            nn.Tanh(),
            nn.Linear(in_features=n_bits, out_features=n_bits),
        )


    def forward(self, x):
        nBatch = x.shape[0]
        x = self.feature_extractor(x)
        x = x.reshape((nBatch,self.n_bits))
        logits = self.classifier(x)
        return logits


def get_img_tensor(img_path):
    transform = transforms.Compose([
        transforms.PILToTensor(),
        transforms.Resize((IMG_SZ,IMG_SZ)),
        transforms.ConvertImageDtype(torch.float)
    ])
    img = Image.open(img_path)
    img = img.convert('RGB')
    img_tensor = transform(img).reshape((1,3,IMG_SZ,IMG_SZ))
    return img_tensor

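def hash(model, img_tensor, hashing_matrix):
    output = model(img_tensor)
    # Project the network output through the hashing matrix: one value per bit
    y = torch.matmul(output, hashing_matrix).flatten()
    # Zero out every entry that is <= 0 ...
    Hx = nn.functional.threshold(y, 0, 0)
    # ... then set every entry that is still positive (>= 1e-15) to 1;
    # the rest become 0 after the int cast below
    Hx = nn.functional.threshold(-Hx, -0.000000000000001, 1)
    Hx = Hx.type(torch.IntTensor)
    Hx = Hx.tolist()
    # Read the bits as one binary number and print it in hex (leading zeros are dropped)
    return hex(int("".join([str(x) for x in Hx]),2))[2:]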


model = LeeNet5(N_BITS)
model.load_state_dict(torch.load("model.pt"))
model.eval()
hashing_matrix = torch.load("hashing_matrix.pt")

img_tensor = get_img_tensor(path)
print(hash(model, img_tensor, hashing_matrix))

14faea6f19ff82c31694c2e7d1fa1b17


Yup, it's the same hash, so let's analyse the neural net quickly. Most of it is pretty standard, but at the end the output of the network is multiplied by a hashing matrix and flattened to a vector of length 128, and each entry is then thresholded on its sign to give one bit. So the output is 128 binary labels, which are printed as a 32-digit hex hash.
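The sign-then-hex step can be sketched without PyTorch. This is a simplified illustration, not the challenge's `hash` function: `bits_to_hex` is a made-up helper, it ignores the tiny `1e-15` cutoff, and (unlike the original) it zero-pads the result so leading zero bits are kept.

```python
def bits_to_hex(values):
    """Threshold each value on its sign and pack the bits into a hex string."""
    bits = "".join("1" if v > 0 else "0" for v in values)
    # One hex digit per 4 bits, padded so leading zero bits survive
    width = (len(bits) + 3) // 4
    return f"{int(bits, 2):0{width}x}"

# 8 toy values instead of 128: signs + - + + - - - + give bits 10110001
print(bits_to_hex([0.5, -0.2, 0.1, 0.3, -1.0, -1.0, -1.0, 2.0]))
```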

The solution here, then, is to just tweak [4yn's classic solution to last year's WhichLee](https://github.com/4yn/slashbadctf/blob/master/sgctf21/which-lee/which-lee-solution.md). To summarise how it works: we start with an input image and pass it through the model to get its output. We then evaluate a loss against the desired output, backpropagate the gradients all the way to the input, and nudge the input so that its output more closely matches the target. This isn't an exact science of course, so we have to tweak various things like:
- the initial input: tried various images, this one just happened to work
- the loss metric: what worked in the end was an MSE loss against a vector of 2s and -2s
- the learning rate: the value of 100 from the original writeup seems to work.
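The same idea can be seen on a toy problem in plain Python. Everything here is an assumption made for illustration: a fixed 4x4 matrix `W` stands in for the frozen network plus hashing matrix, the target is the 4-bit hash `5`, and the step size is `1.0` rather than 100. The gradient is written out by hand instead of using autograd.

```python
def target_from_hex(hex_str, n_bits):
    """Map each bit of the hash to +2.0 (bit 1) or -2.0 (bit 0)."""
    bits = f"{int(hex_str, 16):0{n_bits}b}"
    return [2.0 if b == "1" else -2.0 for b in bits]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def mse_and_grad(w, x, target):
    """Return MSE(w @ x, target) and its gradient with respect to x."""
    n = len(target)
    err = [y - t for y, t in zip(matvec(w, x), target)]
    loss = sum(e * e for e in err) / n
    # d(loss)/dx = (2/n) * w^T @ err
    grad = [2.0 / n * sum(w[i][j] * err[i] for i in range(n))
            for j in range(len(x))]
    return loss, grad

def signs_match(w, x, target):
    return all((y > 0) == (t > 0) for y, t in zip(matvec(w, x), target))

# Hypothetical stand-in for the frozen model: a fixed linear map
W = [[1.0, 0.5, 0.0, 0.0],
     [0.0, 1.0, 0.5, 0.0],
     [0.0, 0.0, 1.0, 0.5],
     [0.5, 0.0, 0.0, 1.0]]
target = target_from_hex("5", 4)  # bits 0101
x = [0.5, 0.5, 0.5, 0.5]          # arbitrary starting "image"
LR = 1.0                          # toy step size

steps = 0
while not signs_match(W, x, target) and steps < 1000:
    _, grad = mse_and_grad(W, x, target)
    x = [xi - LR * gi for xi, gi in zip(x, grad)]  # update the input, not W
    steps += 1

print("signs match after", steps, "step(s)")
```

Aiming at ±2 rather than ±1 or 0 pushes each value well away from the sign boundary, which presumably leaves some slack for the rounding that happens when the input is saved as an image.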

Also, we may find a feasible input only to realise that it does not survive a round trip through an image file, so we also save an image at every step and hash the reloaded file to check.
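To see why the round trip matters, here is a rough model of what happens to a single channel value. It assumes that saving clips values to [0, 1] and quantises them to 8 bits per channel, and that loading divides by 255 again. That is our reading of the save/load path, not something checked against the library source, and `png_round_trip` is a made-up helper.

```python
def png_round_trip(value):
    """Approximate one float channel value after saving as 8-bit and reloading."""
    scaled = value * 255 + 0.5          # scale to 0..255, rounding half up
    clamped = min(max(scaled, 0), 255)  # out-of-range values are clipped
    return int(clamped) / 255           # snap to an integer level, back to [0, 1]

for v in (-0.2, 0.5, 0.5001, 1.3):
    print(v, "->", png_round_trip(v))
```

A gradient step with a large learning rate can easily push pixels outside [0, 1] or move them by less than one 8-bit level, so the tensor that wins and the image that gets uploaded are not necessarily the same thing.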

Otherwise, that's basically it. Rinse and repeat until we get a feasible output.

In [2]:
from torchvision.utils import save_image

orig_img_tensor = img_tensor  # keep a reference to the starting image

# Target hash from the page source, as a vector of +2 (bit 1) and -2 (bit 0)
targetstr = '14caca6f19fe8281d6d6eae7d1f81b11'
targetarr = [-2*(-1)**int(b) for b in f'{int(targetstr, 16):0128b}']
target = torch.Tensor(targetarr)
loss = nn.MSELoss()

for i in range(9999):
    img_tensor.requires_grad = True
    img_tensor.grad = None

    # Round-trip through a PNG so we only accept inputs that survive as a real image
    save_image(img_tensor, 'whichlee_test.png')
    h = hash(model, get_img_tensor('whichlee_test.png'), hashing_matrix)
    if h == targetstr:
        print(f'Win on iteration {i}')
        break

    # MSE between the pre-threshold hash values and the +/-2 target
    foo = torch.matmul(model(img_tensor), hashing_matrix).flatten()
    output = loss(foo, target)
    output.backward()

    # Gradient step on the image itself (learning rate 100), then detach from the graph
    img_tensor = (img_tensor - img_tensor.grad * 100).detach()

Win on iteration 79


And here's the image in all its glory!
![test](whichlee_test.png)

To complete the circular reference, here's the flag image, which can maybe be used as an input to WhichLee V3 next year:
![flag](whichlee_flag.png)