In [2]:
with open("v_ciphertext_24.txt", "r", encoding="utf-8") as f:
    # Strip surrounding whitespace: the cells below assume only lowercase a-z.
    ciphertext = f.read().strip()

## Finding the Length of the Key

### Using Index of Coincidence (IC) Method

IC: the probability that two letters selected at random from the text are the same.

For English text, IC ≈ 0.0686.
For random text, IC ≈ 0.0385 (1/26).
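
Written out, with $f_i$ the number of occurrences of the $i$-th letter and $N$ the length of the text (symbols introduced here only for illustration), the definition above becomes:

$$
\mathrm{IC} = \frac{\sum_{i=1}^{26} f_i \,(f_i - 1)}{N\,(N - 1)}
$$

For uniformly random letters each letter has probability $1/26$, so the expected value is $\sum_{i=1}^{26} (1/26)^2 = 1/26 \approx 0.0385$, which matches the random-text figure.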

Assume the key length is k and split the ciphertext into k "columns", where column j holds every k-th letter starting at position j. Each column is encrypted with a single letter of the key, so it is effectively a Caesar cipher.
A Caesar shift only relabels the letters and leaves the IC unchanged, so each column should have an IC similar to English text. We calculate the IC of each column and take the average; if the average is close to the English IC, then k is likely to be the key length.
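
As a minimal sketch of the column split (the helper name `split_columns` is made up for this illustration and is not part of the notebook's code), slicing with a step of k collects every k-th letter:

```python
def split_columns(text, k):
    """Return k strings; column j holds the letters at positions j, j+k, j+2k, ..."""
    return [text[j::k] for j in range(k)]

# Toy input, not the real ciphertext.
print(split_columns("abcdefghij", 3))  # ['adgj', 'beh', 'cfi']
```

This produces the same columns as appending `char` to `columns[i % k]` in a loop, just without repeated string concatenation.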

In [None]:
KEY_MIN_LENGTH = 2
KEY_MAX_LENGTH = 20

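def calculate_ic(text):
    """Index of coincidence of a string of lowercase letters a-z."""
    # Count how often each letter occurs.
    frequency = [0] * 26
    for char in text:
        frequency[ord(char) - ord('a')] += 1
    N = len(text)
    # Probability of drawing the same letter twice without replacement.
    ic = sum(f * (f - 1) for f in frequency) / (N * (N - 1)) if N > 1 else 0
    return ic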

ic_values = {}
for key_length in range(KEY_MIN_LENGTH, KEY_MAX_LENGTH + 1):
    columns = [''] * key_length
    for i, char in enumerate(ciphertext):
        columns[i % key_length] += char
    avg_ic = sum(calculate_ic(col) for col in columns) / key_length
    ic_values[key_length] = avg_ic

for k, v in ic_values.items():
    print(f"Key Length: {k}, Average IC: {v:.4f}")

# The key length with average IC closest to 0.0686 is likely the correct key length.

Key Length: 2, Average IC: 0.0425
Key Length: 3, Average IC: 0.0424
Key Length: 4, Average IC: 0.0425
Key Length: 5, Average IC: 0.0425
Key Length: 6, Average IC: 0.0423
Key Length: 7, Average IC: 0.0674
Key Length: 8, Average IC: 0.0426
Key Length: 9, Average IC: 0.0425
Key Length: 10, Average IC: 0.0423
Key Length: 11, Average IC: 0.0424
Key Length: 12, Average IC: 0.0424
Key Length: 13, Average IC: 0.0423
Key Length: 14, Average IC: 0.0671
Key Length: 15, Average IC: 0.0422
Key Length: 16, Average IC: 0.0426
Key Length: 17, Average IC: 0.0424
Key Length: 18, Average IC: 0.0425
Key Length: 19, Average IC: 0.0425
Key Length: 20, Average IC: 0.0421


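The key length is most likely 7: its average IC of 0.0674 is very close to the English IC, while most other lengths stay near 0.042. Length 14 also scores high (0.0671) because it is a multiple of 7, so we take the smaller value.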
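
The choice can also be made programmatically. The following is only a sketch: `pick_key_length` and its `tolerance` parameter are hypothetical names and values chosen for illustration, and the dictionary holds a few of the averages printed above.

```python
ENGLISH_IC = 0.0686  # reference value used in this notebook

def pick_key_length(ic_by_length, tolerance=0.005):
    """Return the smallest key length whose average IC is within
    `tolerance` of the English IC, or None if no length qualifies."""
    candidates = [k for k, ic in ic_by_length.items()
                  if abs(ic - ENGLISH_IC) <= tolerance]
    return min(candidates) if candidates else None

# A subset of the averages printed above.
observed = {6: 0.0423, 7: 0.0674, 8: 0.0426, 13: 0.0423, 14: 0.0671, 15: 0.0422}
print(pick_key_length(observed))  # 7
```

Taking the minimum matters because every multiple of the true key length also yields English-like columns.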

### Breaking the Ciphertext

Since the key length is 7, we can split the ciphertext into 7 columns and solve each column as a Caesar cipher.
We can use frequency analysis to find the shift for each column. The most frequent letter in English is 'E', so we assume that the most frequent letter in each column corresponds to 'E' and calculate the shift accordingly. If a column decrypts badly, the same idea can be retried with other common letters such as 'T' or 'A'.
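
A more robust alternative to matching a single letter is to score all 26 shifts against the whole English letter distribution with a chi-squared statistic. This is a different technique from the one used in this notebook, shown only as a sketch: `chi_squared_shift` is a hypothetical helper, and the frequency table contains approximate, commonly quoted percentages.

```python
# Approximate relative frequencies (percent) of a..z in English text.
ENGLISH_FREQ = [8.17, 1.49, 2.78, 4.25, 12.70, 2.23, 2.02, 6.09, 6.97,
                0.15, 0.77, 4.03, 2.41, 6.75, 7.51, 1.93, 0.10, 5.99,
                6.33, 9.06, 2.76, 0.98, 2.36, 0.15, 1.97, 0.07]

def chi_squared_shift(column):
    """Return the shift (0-25) whose decryption of `column` looks most like English.

    Assumes `column` is a non-empty string of lowercase letters a-z.
    """
    n = len(column)
    if n == 0:
        raise ValueError("column must not be empty")
    total = sum(ENGLISH_FREQ)
    expected = [n * p / total for p in ENGLISH_FREQ]
    best_shift, best_score = 0, float("inf")
    for s in range(26):
        # Letter counts of the column after undoing shift s.
        counts = [0] * 26
        for ch in column:
            counts[(ord(ch) - ord("a") - s) % 26] += 1
        score = sum((c - e) ** 2 / e for c, e in zip(counts, expected))
        if score < best_score:
            best_shift, best_score = s, score
    return best_shift
```

Calling `chi_squared_shift(col)` on each of the 7 columns would give a list of shifts in the same format as the `shift` list computed in the next cell.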

In [None]:
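from collections import Counter

# Split the ciphertext into 7 columns, one per key letter.
columns = [''] * 7
for i, char in enumerate(ciphertext):
    columns[i % 7] += char

# For each column, assume its most frequent letter is the encryption of 'e'.
shift = []
for col in columns:
    freq = Counter(col)
    most_common_char, _ = freq.most_common(1)[0]
    # Reduce mod 26 so the shift stays in 0..25 even when the letter precedes 'e'.
    shift.append((ord(most_common_char) - ord('e')) % 26)

print("Shifts for each column (assuming most frequent letter corresponds to 'e'):", shift)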

Shifts for each column (assuming most frequent letter corresponds to 'e'): [19, 12, 14, 15, 13, 4, 8]
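
Each shift corresponds to one key letter. Assuming the usual convention that 'a' stands for a shift of 0 and that encryption adds the key letter modulo 26, the shifts printed above can be turned into the key as follows (a small sketch; the variable names are illustrative):

```python
shifts = [19, 12, 14, 15, 13, 4, 8]  # copied from the output above

# Map each shift 0..25 back to the letter a..z.
key = "".join(chr(s % 26 + ord("a")) for s in shifts)
print(key)  # tmopnei
```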


#### Decrypting the Ciphertext

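Check whether the decrypted message makes sense when read in chunks of 7. If the same position looks wrong in every chunk, recompute that column's shift assuming its most frequent letter is another common English letter (such as 'T' or 'A').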
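
A small sketch of that check follows. Both helpers, `chunks` and `shift_assuming`, are hypothetical names introduced here, and the sample string is just the first 20 letters of the decrypted output shown below.

```python
def chunks(text, size=7):
    """Split text into consecutive pieces of `size` letters (the last may be shorter)."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def shift_assuming(most_common_cipher_letter, assumed_plain_letter):
    """Shift implied if the column's most common letter decrypts to the assumed letter."""
    return (ord(most_common_cipher_letter) - ord(assumed_plain_letter)) % 26

# Position j inside every chunk is decrypted with shift[j], so a wrong
# shift shows up as one garbled letter at the same position in each chunk.
print(" ".join(chunks("atfirsttheweatherwas")))  # atfirst theweat herwas

# Retrying a column whose most common ciphertext letter is 'x':
print(shift_assuming("x", "e"))  # 19
print(shift_assuming("x", "t"))  # 4
```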

In [None]:
message = []
for i, char in enumerate(ciphertext):
    s = shift[i % 7]
    decrypted_char = chr((ord(char) - ord('a') - s) % 26 + ord('a'))
    message.append(decrypted_char)
decrypted_message = ''.join(message)
print("Decrypted Message:")
print(decrypted_message)



Decrypted Message:
atfirsttheweatherwasfineandstillthethrusheswerecallingandintheswampsclosebysomethingalivedronedpitifullywithasoundlikeblowingintoanemptybottleasnipeflewbyandtheshotaimedatitrangoutwithagayresoundingnoteinthespringairbutwhenitbegantogetdarkintheforestacoldpenetratingwindblewinappropriatelyfromtheeastandeverythingsankintosilenceneedlesoficestretchedacrossthepoolsanditfeltcheerlessremoteandlonelyintheforesttherewasawhiffofwinterivanvelikopolskythesonofasacristanandastudentoftheclericalacademyreturninghomefromshootingkeptwalkingonthepathbythewaterloggedmeadowshisfingerswerenumbandhisfacewasburningwiththewinditseemedtohimthatthecoldthathadsuddenlycomeonhaddestroyedtheorderandharmonyofthingsthatnatureitselffeltillateaseandthatwaswhytheeveningdarknesswasfallingmorerapidlythanusualallarounditwasdesertedandpeculiarlygloomytheonlylightwasonegleaminginthewidowsgardensneartheriverthevillageoverthreemilesawayandeverythinginthedistanceallroundwasplungedinthecoldeveningmistthestude

The plaintext is from ["The Student"](https://www.ibiblio.org/eldritch/ac/student.html) by Anton Chekhov. The 10 random letters at the end of the message, which the instructor asked us to find, are: *oqkcmzrndx*