Skip to content

A script for decrypting text encrypted with monoalphabetic/simple substitution ciphers using combined pattern deduction.

License

Notifications You must be signed in to change notification settings

djeley/ssc-decryptor

Repository files navigation

ssc-decryptor

A Python 3 script for decrypting text encrypted with monoalphabetic/simple substitution ciphers, where word boundaries remain in place, using a technique that I will call combined pattern deduction.

Installation

Install required packages using pip:

pip install -r requirements.txt

About

Each encrypted word from an encrypted message will have zero or more matching plaintext words that share the same word pattern. This script attempts to decipher encrypted messages by reducing the number of plaintext matches for each encrypted word to one (ideally).

Example:

Encrypted Word Pattern No. Plaintext Matches
WSNNHGDK 8-12334567 675
SJJQGSJB 8-12234125 4
FTKJPAHG 8-12345678 5762

This number of tuples in the 3-fold cartesian product of the plaintext matches for these 3 words is ~15,000,000.

However, by sequentially combining encrypted words and comparing the longer pattern with the pattern of each combined tuple in the cartesian product of the plaintext matches, the number of matches for each word can be reduced to one by examining ~50,000 tuples.

Decryption process: High-level Overview

The decryption process utilises word patterns and combines words to create larger, more useful, patterns. Consider the following two encrypted words and their corresponding patterns:

Encrypted Word Pattern
SJJQGSJB 8-12234125
WSNNHGDK 8-12334567

In the first word, S is identified with a 1. In the second word, S is identified with a 2. If we combine both words and create a new pattern, such as:

Encrypted Word Pattern
SJJQGSJBWSNNHGDK 16-122341256177849A

Then in the new pattern we can now see that S is identified with a 1 for all three occurrences.

It is now possible to say that: any set in the cartesian product of the plaintext matches for each of the individual word patterns that doesn't have the same combined pattern, can be deducted from the original sets of plaintext matches.

We can keep building and comparing larger patterns as we iterate through the encrypted words in sequence, and through this process of deduction, we can determine a cyphertext alphabet to decrypt the message.

Note: You can run the script using the verbose (-v) flag and see this deduction in action

Two Word Example

  • PQACEIAMNSXU has 23 word pattern matches
  • RWCTJSXSZA has 261 word pattern matches

By creating a combined pattern from PQACEIAMNSXU and RWCTJSXSZA and comparing it with a combined pattern for each tuple in the cartesian product of each word's word pattern matches, then there is only 1 possible match for each word:

  • PQACEIAMNSXU has 1 word pattern matches
  • RWCTJSXSZA has 1 word pattern matches

"OVERWHELMING SCRUTINIZE"

Total tuples compared: (23 x 261) = 6,003

Three Word Example

  • OCUBICBP has 53 word pattern matches
  • KCXXPQUN has 675 word pattern matches
  • PUBOMNV has 8585 word pattern matches

First iteration

  • OCUBICBP has 53 word pattern match
  • KCXXPQUN has 675 word pattern match

By creating a combined pattern from OCUBICBP and KCXXPQUN and comparing it with a combined pattern for each tuple in the cartesian product of each word's word pattern matches, then there are 6 possible matches for OCUBICBP and 7 possible matches for KCXXPQUN.

Second iteration after deductions

  • OCUBICBP has 6 word pattern matches
  • KCXXPQUN has 7 word pattern matches
  • PUBOMNV has 8585 word pattern matches

By creating a combined pattern from OCUBICBP, KCXXPQUN and PUBOMNV, and comparing it with a combined pattern for each tuple in the cartesian product of each word's word pattern matches, then there is only 1 possible match for each word:

  • OCUBICBP has 1 word pattern match
  • KCXXPQUN has 1 word pattern match
  • PUBOMNV has 1 word pattern match

"ENGLISH LANGUAGE PATTERNS"

Total tuples compared: (53 x 675) + (6 x 7 x 8585) = 396,345

Encrypted Word Ordering for Examination

The first step of the decryption process is to split the encrypted message into normalised alphabetic words. This can involve converting unicode characters to ascii, removing punctuation, transforming to uppercase, removing duplicates etc. We then need to consider the examination order of the words - taking into account that the larger the number of items in the cartesian product, then the more iterations/comparisons will be required.

There are 3 ordering options:

  1. LONGEST_TO_SHORTEST: Longest length word to the shortest. In general, combing the largest words first will result in a large number of deductions early in the process. This is the default ordering option.
  2. FEWEST_TO_MOST_MATCHES: Word with the fewest plaintext matches to the most. In general, LONGEST_TO_SHORTEST decrypts with the least iterations and is therefore quicker. However, message_example_2.txt is an example where FEWEST_TO_MOST_MATCHES is better.
  3. MATCHES_DIVIDED_BY_LENGTH: Number of plaintext matches divided by the length of the word.

Weaknesses

  • Depends on a word list (dictionary).
  • When a word isn't present in the dictionary, it could cause issues if the ordering places the word in the first two words to be examined.
  • Can struggle with some shorter sentences where each word has a lot of pattern matches.
  • Doesn't attempt to decrypt numbers.

Strengths

  • Can often handle words not being present in the dictionary. For example:

    Cyphertext:

    Zozm Nzgsrhlm Gfirmt LYV UIH; 23 Qfmv 1912 – 7 Qfmv 1954) dzh zm Vmtorhs nzgsvnzgrxrzm, xlnkfgvi hxrvmgrhg, oltrxrzm, xibkgzmzobhg, ksrolhlksvi, zmw gsvlivgrxzo yrloltrhg.

    Decrypts to:

    Alan Mathison Turing OBE [F,K,Q]RS; 23 June 1912 – 7 June 1954) [k,v,w,f]as an English mathematician, computer scientist, logician, cryptanalyst, philosopher, and theoretical biologist.

    This isn't perfect as U mapped to F, K, or Q and d mapped to k, v, w, or f. This is because frs, krs, qrs, kas, vas, was, and fas are all words present in the dictionary.txt file, and there isn't enough information to determine a one-to-one mapping for U and d.

    However, the word Mathison is not present in the dictionary but we can still decrypt is as we have deciphered the individual characters from other words that are in the dictionary.

  • Can be fast and very accurate.

Usage

usage: decrypt.py [-h] [-m MESSAGE | -f MESSAGE_FILE] [-v | -vv] [-s]
                  [-w [WORDS_FILES [WORDS_FILES ...]]]
                  [-o {LONGEST_TO_SHORTEST,FEWEST_TO_MOST_MATCHES,MATCHES_DIVIDED_BY_LENGTH}]

optional arguments:
  -h, --help            show this help message and exit
  -m MESSAGE, --message MESSAGE
                        An encrypted message
  -f MESSAGE_FILE, --message-file MESSAGE_FILE
                        An encrypted message text file
  -v, --verbose-info    Makes the decryptor output more verbose
  -vv, --verbose-debug  Makes the decryptor output even more verbose
  -s, --suppress-encrypted-text-output
                        Suppresses the encrypted text from the output
  -w [WORDS_FILES [WORDS_FILES ...]], --words-files [WORDS_FILES [WORDS_FILES ...]]
                        Dictionary (words) files. Example: -w
                        words/dictionary.txt words/names.txt
  -o {LONGEST_TO_SHORTEST,FEWEST_TO_MOST_MATCHES,MATCHES_DIVIDED_BY_LENGTH}, --order {LONGEST_TO_SHORTEST,FEWEST_TO_MOST_MATCHES,MATCHES_DIVIDED_BY_LENGTH}
                        The word examine order

Example usage:

python3 decrypt.py -f message_examples/message_example_10.txt

Attempts to decrypt the contents of the message_examples/message_example_10.txt file using default word ordering LONGEST_TO_SHORTEST.

python3 decrypt.py -f message_examples/message_example_10.txt -o FEWEST_TO_MOST_MATCHES

Attempts to decrypt the contents of the message_examples/message_example_10.txt` file using FEWEST_TO_MOST_MATCHES word ordering.

Message Examples:

There are a number of encrypted message examples provided in this repository. The screenshots below show the output for each one.

message_example_1.txt message_example_1

message_example_2.txt message_example_2

message_example_3.txt message_example_3

message_example_4.txt message_example_4

message_example_5.txt message_example_5

message_example_6.txt message_example_6

message_example_7.txt message_example_7

message_example_8.txt message_example_8

message_example_9.txt message_example_9

message_example_10.txt message_example_10

message_example_11.txt message_example_11

message_example_12.txt message_example_12

message_example_13.txt message_example_13

message_example_14.txt message_example_14

message_example_15.txt message_example_15

About

A script for decrypting text encrypted with monoalphabetic/simple substitution ciphers using combined pattern deduction.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages