# Question 1

The cell below imports some standard ``python`` libraries that will be useful. It then defines helper functions for manipulating plaintexts and ciphertexts, together with the notion of absolute distance seen in class (between a string and the frequency distribution of its associated alphabet):

In [96]:
# Standard library
import os
from math import ceil
from random import SystemRandom
from time import time

# Auxiliary functions

# Convert from plain text to corresponding positions in the alphabet
def text_to_numbers(text, characters):
    return [characters.index(c) for c in text]

# Convert from alphabet positions to the corresponding plain text
def numbers_to_text(numbers, characters):
    return ''.join([characters[n] for n in numbers])

# Encrypt plain text using RP
def encrypt(text, key, alphabet):
    numbered_text = text_to_numbers(text, alphabet)
    numbered_key = text_to_numbers(key, alphabet)
    encrypted = []
    for idx, n in enumerate(numbered_text):
        encrypted.append((n + numbered_key[idx % len(key)]) % len(alphabet))
    return numbers_to_text(encrypted, alphabet)

# Decrypt ciphertext using RP
def decrypt(cipher, key, alphabet):
    numbered_text = text_to_numbers(cipher, alphabet)
    numbered_key = text_to_numbers(key, alphabet)
    decrypted = []
    for idx, n in enumerate(numbered_text):
        decrypted.append((n - numbered_key[idx % len(key)]) % len(alphabet))
    return numbers_to_text(decrypted, alphabet)

# Calculate absolute distance between a string and the distribution of letters over an alphabet
def abs_distance(string, frequencies):
    return sum([abs(frequencies[c] - string.count(c) / len(string)) for c in frequencies])
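
As a quick sanity check of the absolute distance, the sketch below evaluates it on a hypothetical two-letter distribution (`toy_freq` is invented for illustration and is not real language data). `abs_distance` is restated in condensed form only so that the snippet runs on its own.

```python
# Condensed restatement of abs_distance from the cell above
def abs_distance(string, frequencies):
    return sum(abs(frequencies[c] - string.count(c) / len(string)) for c in frequencies)

# Hypothetical toy distribution (assumption for illustration only)
toy_freq = {'a': 0.75, 'b': 0.25}

# 'aaab' has exactly the expected proportions, so the distance is 0
print(abs_distance('aaab', toy_freq))  # 0.0
# 'abbb' swaps the proportions: |0.75 - 0.25| + |0.25 - 0.75| = 1
print(abs_distance('abbb', toy_freq))  # 1.0
```

A smaller value means the string's observed letter proportions are closer to the expected distribution, which is the property the attack relies on.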

The strategy used to break the Repeated Pad scheme is as follows:

- Iterate over the key size *l*, from the minimum (1) up to the maximum (the ciphertext length divided by 50, rounded down):
  - For a given size *l*, apply frequency analysis to the ciphertext, one key position at a time:
    - For each key position *i*, collect all ciphertext characters that were encrypted with that position of the key:
      - Iterate over the whole alphabet, decrypting those characters with each candidate character as if it were the one actually at that key position.
      - Comparing the resulting plaintexts, estimate the key character at that position as the one that minimizes the distance between the decrypted text and the alphabet's frequency distribution (that is, choose the character for which the recovered text is as close as possible to the expected distribution).
    - Repeating this process for every position *i* yields the probable key of size *l*.
  - Use that key to decrypt the full ciphertext; if its distance to the frequency distribution is the smallest found so far, it becomes the best candidate.
- The best-fitting key is the best candidate found across all iterations.
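
The grouping step in the list above (collecting the ciphertext characters that share a key position) can be sketched with slicing. `split_by_key_position` is a hypothetical helper name, not part of the original code; it is meant to produce the same groups as the nested loops inside `break_rp`.

```python
def split_by_key_position(ciphertext, key_size):
    # Indices i, i + key_size, i + 2 * key_size, ... were all encrypted
    # with key position i, so each slice behaves like a single-shift cipher
    return [ciphertext[i::key_size] for i in range(key_size)]

print(split_by_key_position('abcdefg', 3))  # ['adg', 'be', 'cf']
```

Each resulting group can then be analyzed independently, since every character in it was shifted by the same amount.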

This strategy is implemented by the following functions:

In [97]:
# Hacking

# Estimate the character in a specific key position using frequency analysis
def get_probable_char(text, frequencies, distance):
    alphabet = list(frequencies)
    best_char = ''
    smallest_distance = float('inf')
    for char in alphabet:
        current_numbers = [(n - alphabet.index(char)) % len(alphabet) for n in text_to_numbers(text, alphabet)]
        current_text = numbers_to_text(current_numbers, alphabet)
        current_distance = distance(current_text, frequencies)
        if current_distance < smallest_distance:
            smallest_distance = current_distance
            best_char = char
    return best_char

# Estimate an encryption key using frequency analysis
def get_probable_key(distributions, frequencies, distance):
    key = ''
    for key_pos_dist in distributions.values():
        key += get_probable_char(key_pos_dist, frequencies, distance)
    return key

# Break Repeated Pad using frequency analysis
def break_rp(ciphertext, frequencies, distance):
    """
    Arguments:
        ciphertext: An arbitrary string representing the encrypted version of a plaintext.
        frequencies: A dictionary representing a character frequency over the alphabet.
        distance: A function indicating how distant is a string from following a character frequency.
    Returns:
        key: A guess of the key used to encrypt the ciphertext, assuming that the plaintext message was written in a language in which
        letters distribute according to frequencies.
    """
    # Get an estimated key for every possible key_size allowed, then select the one
    # that minimizes distance to the frequency distribution
    best_key = ''
    best_distance = float('inf')
    for key_size in range(1, (len(ciphertext) // 50) + 1):
        n_pads = ceil(len(ciphertext) / key_size)
        key_pos_distributions = {pos: '' for pos in range(key_size)}
        for pad in range(n_pads):
            for i in range(min(key_size, len(ciphertext) - pad * key_size)):
                key_pos_distributions[i] += ciphertext[pad * key_size + i]
        current_key = get_probable_key(key_pos_distributions, frequencies, distance)
        current_plain_text = decrypt(ciphertext, current_key, list(frequencies))
        current_distance = distance(current_plain_text, frequencies)
        if current_distance < best_distance:
            best_distance = current_distance
            best_key = current_key
    return best_key
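
The per-position estimate in `get_probable_char` can be checked by hand on a small case. The sketch below is a condensed, self-contained variant: the names `shift_back` and `best_shift` and the three-letter distribution `toy_freq` are assumptions made for illustration, and it returns the shift index rather than the key character.

```python
def abs_distance(string, frequencies):
    return sum(abs(frequencies[c] - string.count(c) / len(string)) for c in frequencies)

def shift_back(text, shift, alphabet):
    # Undo a shift of `shift` positions on every character
    return ''.join(alphabet[(alphabet.index(c) - shift) % len(alphabet)] for c in text)

def best_shift(column, frequencies):
    alphabet = list(frequencies)
    # Try every possible shift and keep the one whose decryption is closest
    # to the expected distribution (the first minimum wins on ties)
    return min(range(len(alphabet)),
               key=lambda s: abs_distance(shift_back(column, s, alphabet), frequencies))

# Hypothetical three-letter distribution (assumption for illustration only)
toy_freq = {'a': 0.5, 'b': 0.25, 'c': 0.25}

# 'aabc' shifted forward by 1 (key character 'b') becomes 'bbca'
shift = best_shift('bbca', toy_freq)
print(shift, list(toy_freq)[shift])  # 1 b
```

Here the distances for shifts 0, 1 and 2 are 0.5, 0 and 0.5, so shift 1 is selected. Because `break_rp` caps the key size at the ciphertext length divided by 50, each group it analyzes contains at least 50 characters.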

Basic testing was done with the following functions:

In [98]:
# Testing

# Generate tests for a plain text with every possible key size
def generate_tests(text, alphabet, name):
    for size in range(1, (len(text) // 50) + 1):
        key = ''.join(SystemRandom().choice(alphabet) for _ in range(size))
        cipher = encrypt(text, key, alphabet)
        with open(f'tests/{name}_{size}.txt', 'w') as file:
            file.write(f'{cipher} {key}\n')

# Run a specific test
def run_test(path):
    correct = False
    start = time()
    print(f'Testing "{path}"...')
    with open(path, 'r') as file:
        test = file.readline().strip('\n').split(' ')
        cipher = test[0]
        key = test[1]
        result = break_rp(cipher, english_freq, abs_distance)
        if result == key:
            correct = True
    duration = time() - start
    if correct:
        output = f'"{path}": Test Succeeded in {duration}s\n'
    else:
        output = f'"{path}": Test Failed in {duration}s\n'
    return correct, output

# Run all tests inside a directory
def run_all_tests(dir_path, output_file):
    results = []
    with open(output_file, 'w') as out_file:
        for subdir, _, files in os.walk(dir_path):
            for f in files:
                result, output = run_test(os.path.join(subdir, f))
                results.append(result)
                out_file.write(output)
    print(f'Testing done: {sum(results)}/{len(results)} tests answered correctly')
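
`generate_tests` relies on two properties: each key character is drawn uniformly from the alphabet, and decrypting with the same key restores the plaintext. A minimal self-contained sketch of that round trip follows; `encrypt` and `decrypt` are condensed into a single hypothetical `shift_text` helper, and the sample text is arbitrary.

```python
from random import SystemRandom
from string import ascii_lowercase

def shift_text(text, key, alphabet, sign):
    # sign=+1 encrypts, sign=-1 decrypts; the key repeats every len(key) characters
    n = len(alphabet)
    return ''.join(
        alphabet[(alphabet.index(c) + sign * alphabet.index(key[i % len(key)])) % n]
        for i, c in enumerate(text)
    )

alphabet = list(ascii_lowercase)
text = 'theonetimepadcannotbecracked'  # arbitrary sample plaintext
key = ''.join(SystemRandom().choice(alphabet) for _ in range(3))

cipher = shift_text(text, key, alphabet, +1)
# Decrypting with the same key always restores the original text
print(shift_text(cipher, key, alphabet, -1) == text)  # True
```

The key differs on every run, but the round-trip check holds for any key over the same alphabet.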

The algorithm is then tested on several English texts with random keys of different sizes, taking the 26 letters of the English language as the alphabet:

In [99]:
# Test Repeated Pad breaking
if __name__ == '__main__':
    # Letter frequencies for the English alphabet
    english_freq = {'a': 0.0817,
                    'b': 0.0129,
                    'c': 0.0276,
                    'd': 0.0425,
                    'e': 0.1288,
                    'f': 0.0223,
                    'g': 0.0202,
                    'h': 0.0609,
                    'i': 0.0697,
                    'j': 0.0015,
                    'k': 0.0077,
                    'l': 0.0403,
                    'm': 0.0241,
                    'n': 0.0675,
                    'o': 0.0751,
                    'p': 0.0193,
                    'q': 0.001,
                    'r': 0.0599,
                    's': 0.0633,
                    't': 0.0906,
                    'u': 0.0278,
                    'v': 0.0098,
                    'w': 0.0236,
                    'x': 0.0015,
                    'y': 0.0197,
                    'z': 0.0007}

    # Plain text examples
    crypto_text = 'incryptographytheonetimepadotpisanencryptiontechniquethatcannotbecrackedbutrequirestheuseofasingleusepresharedkeythatisnotsmallerthanthemessagebeingsentinthistechniqueaplaintextispairedwitharandomsecretkeyalsoreferredtoasaonetimepadtheneachbitorcharacteroftheplaintextisencryptedbycombiningitwiththecorrespondingbitorcharacterfromthepadusingmodularaddition'
    synopsis_text = 'goldrogerwasknownasthepiratekingthestrongestandmostinfamousbeingtohavesailedthegrandlinethecaptureandexecutionofrogerbytheworldgovernmentbroughtachangethroughouttheworldhislastwordsbeforehisdeathrevealedtheexistenceofthegreatesttreasureintheworldonepieceitwasthisrevelationthatbroughtaboutthegrandageofpiratesmenwhodreamedoffindingonepiecewhichpromisesanunlimitedamountofrichesandfameandquitepossiblythepinnacleofgloryandthetitleofthepiratekingentermonkeydluffyaseventeenyearoldboywhodefiesyourstandarddefinitionofapirateratherthanthepopularpersonaofawickedhardenedtoothlesspirateransackingvillagesforfunluffysreasonforbeingapirateisoneofpurewonderthethoughtofanexcitingadventurethatleadshimtointriguingpeopleandultimatelythepromisedtreasurefollowinginthefootstepsofhischildhoodheroluffyandhiscrewtravelacrossthegrandlineexperiencingcrazyadventuresunveilingdarkmysteriesandbattlingstrongenemiesallinordertoreachthemostcovetedofallfortunesonepiece'
    review_text = 'fifteenhoursintoeldenringidefeatedgodrickthefirstoffiveeldenlordsinthetimebetweenemergingintothelandsbetweenandstrikinghimdownihaddiscovereddecrepitruinsventuredintotwistingcavesstumbleduponenemyencampmentsandbattledtoothandnailagainstchallengingbossesfromsoftwaresgameshavealwaysmadeyoufeelsmallinmanywaystheytellyouthatyouareworthlessaplagueriddenratoraccursedundeadunfiteventobecinderstheyaskyoutonavigateunflinchingbrutalworldsandpityouagainstenemiesthatsystematicallydismantleyouregoeldenringmaintainsthenailbitingcombatandairofmysterythathasdistinguishedfromsoftwaressoulsbornegamesbutitiselevatedtonewheightsbythestudiosinterpretationofwhatanopenworldgamecanbehavingbroughtdowngodrickthebreadthoftheworldandthewayinwhichfromsoftwarehasapplieditssignaturestyletoanopenworldwasonfulldisplayreinforcinghowinsignificantireallywasanddrivinghomethemagnitudeofthetaskthatstillawaitedmeinagenrethathasbecomewroughtwithbloatedandoverdesignedgameseldenringisdefiantlycontrarianinalmosteverywayitscommitmenttodesignbysubtractionandtoplacingtheresponsibilityofchartingapaththroughitsworldentirelyontheplayermakesitstandheadandshouldersaboveotheropenworldtitleseldenringtakestheshardsofwhatcamebeforeandforgesthemintosomethingthatwillgodowninhistoryasoneofthealltimegreatsatriumphindesignandcreativityandanopenworldgamethatdistinguishesitselfforwhatitdoesnotdoasmuchaswhatitdoes'

    # Generate test files
    #generate_tests(review_text, list(english_freq), 'review')

    # Run tests
    run_all_tests('tests', 'output.txt')

Testing "tests\crypto_1.txt"...
Testing "tests\crypto_2.txt"...
Testing "tests\crypto_3.txt"...
Testing "tests\crypto_4.txt"...
Testing "tests\crypto_5.txt"...
Testing "tests\crypto_6.txt"...
Testing "tests\crypto_7.txt"...
Testing "tests\review_1.txt"...
Testing "tests\review_10.txt"...
Testing "tests\review_11.txt"...
Testing "tests\review_12.txt"...
Testing "tests\review_13.txt"...
Testing "tests\review_14.txt"...
Testing "tests\review_15.txt"...
Testing "tests\review_16.txt"...
Testing "tests\review_17.txt"...
Testing "tests\review_18.txt"...
Testing "tests\review_19.txt"...
Testing "tests\review_2.txt"...
Testing "tests\review_20.txt"...
Testing "tests\review_21.txt"...
Testing "tests\review_22.txt"...
Testing "tests\review_23.txt"...
Testing "tests\review_24.txt"...
Testing "tests\review_25.txt"...
Testing "tests\review_26.txt"...
Testing "tests\review_27.txt"...
Testing "tests\review_3.txt"...
Testing "tests\review_4.txt"...
Testing "tests\review_5.txt"...
Testing "tests\review_