# Computer Assignment 2: Decrypting Poly-Alphabetic Cipher using Genetic Algorithm
The goal of this assignment is to get more familiar with Genetic Algorithms (GA) and use them in practice. GA is usually used when the problem has a very large state space. With a 14-letter key over a 26-letter alphabet, this problem has $26^{14}$ possible keys, so a brute-force search is not feasible. That is why a GA is useful for this problem.

**Note:** The `global_text.txt` is the reference text.

**Author:** Danial Saeedi(810198571)

# Import Dependencies

In [1]:
import re
import string
import random
from operator import itemgetter
import numpy as np
import time

# Constants

In [10]:
POPULATION_SIZE = 100
ELITISM_RATE = 0.1
CROSSOVER_RATE = 0.65
CONSTANT = 14
MAX_GENERATIONS = 1000
MUTATION_PROBABILITY = 0.1

# Implementing Decrypter

The algorithm used in this project is a well-known substitution cipher called [Polyalphabetic Substitution](https://en.wikipedia.org/wiki/Polyalphabetic_cipher#:~:text=A%20polyalphabetic%20cipher%20is%20any,is%20a%20simplified%20special%20case.).
Some of the code for decrypting polyalphabetic cipher can be found [here](https://gist.github.com/ourway/8a567f0c201359237925).

In [3]:
alphabet = string.ascii_lowercase
ignore_chars = string.digits + string.punctuation + string.whitespace

def cycle_get(lst,index):
    new_index = index % len(lst)
    return(lst[new_index])

def cycle_increment_index(index,lst):
    if index == len(lst) - 1:
        index = 0
    else:
        index += 1
    return(index)

def shift(letter,value):
    current_letter_value = alphabet.find(letter)
    end_value = current_letter_value + value
    return(cycle_get(alphabet,end_value))

def convert_key_to_numbers(key):
    return([alphabet.find(i) for i in key])

def decrypt(text,key):
    text = text.lower()
    key = convert_key_to_numbers(key)
    index_of_key = 0
    result = ""
    for char in text:
        if char in ignore_chars:
            result += char
        else:
            result += shift(char,- key[index_of_key])
            index_of_key = cycle_increment_index(index_of_key,key)
    return(result)
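
As a quick sanity check of the shifting scheme, here is a minimal standalone sketch that encrypts and then decrypts a short message. The helper `shift_text` and the sample key `lemon` are hypothetical and not part of the assignment code. The sketch assumes, like `decrypt` above, that non-letters pass through without consuming a key letter.

```python
import string

ALPHABET = string.ascii_lowercase

def shift_text(text, key, direction):
    """Shift each letter by the matching key letter (+1 encrypts, -1 decrypts)."""
    result = []
    key_index = 0
    for char in text.lower():
        if char not in ALPHABET:
            # Non-letters are copied as-is and do not advance the key.
            result.append(char)
            continue
        offset = ALPHABET.index(key[key_index % len(key)])
        shifted = (ALPHABET.index(char) + direction * offset) % len(ALPHABET)
        result.append(ALPHABET[shifted])
        key_index += 1
    return "".join(result)

plain = "attack at dawn"
cipher = shift_text(plain, "lemon", +1)
print(cipher)
print(shift_text(cipher, "lemon", -1))
```

Decrypting with the same key and the opposite direction returns the original message, which is the property the genetic algorithm relies on when it scores candidate keys.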

# Part 0: Data cleaning & make dictionary
Before creating the dictionary, we have to clean `global_text.txt`. Cleaning removes stop words and non-alphabetic characters such as `*` and `!`. The cleaning is done by `clean_data` and `create_dictionary` below:

1. Convert all letters to lowercase.
2. Replace all non-alphabetic characters with a space.
3. Remove stop words.
4. Remove duplicate words.

The result is a dictionary that we can use to compute the fitness of each chromosome.

In [4]:
# A list of stop words
stop_words = {'those', 'on', 'own', '’ve', 'yourselves', 'around', 'between', 'four', 'been', 'alone', 'off', 'am', 'then', 'other', 'can', 'regarding', 'hereafter', 'front', 'too', 'used', 'wherein', '‘ll', 'doing', 'everything', 'up', 'onto', 'never', 'either', 'how', 'before', 'anyway', 'since', 'through', 'amount', 'now', 'he', 'was', 'have', 'into', 'because', 'not', 'therefore', 'they', 'n’t', 'even', 'whom', 'it', 'see', 'somewhere', 'thereupon', 'nothing', 'whereas', 'much', 'whenever', 'seem', 'until', 'whereby', 'at', 'also', 'some', 'last', 'than', 'get', 'already', 'our', 'once', 'will', 'noone', "'m", 'that', 'what', 'thus', 'no', 'myself', 'out', 'next', 'whatever', 'although', 'though', 'which', 'would', 'therein', 'nor', 'somehow', 'whereupon', 'besides', 'whoever', 'ourselves', 'few', 'did', 'without', 'third', 'anything', 'twelve', 'against', 'while', 'twenty', 'if', 'however', 'herself', 'when', 'may', 'ours', 'six', 'done', 'seems', 'else', 'call', 'perhaps', 'had', 'nevertheless', 'where', 'otherwise', 'still', 'within', 'its', 'for', 'together', 'elsewhere', 'throughout', 'of', 'others', 'show', '’s', 'anywhere', 'anyhow', 'as', 'are', 'the', 'hence', 'something', 'hereby', 'nowhere', 'latterly', 'say', 'does', 'neither', 'his', 'go', 'forty', 'put', 'their', 'by', 'namely', 'could', 'five', 'unless', 'itself', 'is', 'nine', 'whereafter', 'down', 'bottom', 'thereby', 'such', 'both', 'she', 'become', 'whole', 'who', 'yourself', 'every', 'thru', 'except', 'very', 'several', 'among', 'being', 'be', 'mine', 'further', 'n‘t', 'here', 'during', 'why', 'with', 'just', "'s", 'becomes', '’ll', 'about', 'a', 'using', 'seeming', "'d", "'ll", "'re", 'due', 'wherever', 'beforehand', 'fifty', 'becoming', 'might', 'amongst', 'my', 'empty', 'thence', 'thereafter', 'almost', 'least', 'someone', 'often', 'from', 'keep', 'him', 'or', '‘m', 'top', 'her', 'nobody', 'sometime', 'across', '‘s', '’re', 'hundred', 'only', 'via', 'name', 'eight', 'three', 'back', 'to', 
'all', 'became', 'move', 'me', 'we', 'formerly', 'so', 'i', 'whence', 'under', 'always', 'himself', 'in', 'herein', 'more', 'after', 'themselves', 'you', 'above', 'sixty', 'them', 'your', 'made', 'indeed', 'most', 'everywhere', 'fifteen', 'but', 'must', 'along', 'beside', 'hers', 'side', 'former', 'anyone', 'full', 'has', 'yours', 'whose', 'behind', 'please', 'ten', 'seemed', 'sometimes', 'should', 'over', 'take', 'each', 'same', 'rather', 'really', 'latter', 'and', 'ca', 'hereupon', 'part', 'per', 'eleven', 'ever', '‘re', 'enough', "n't", 'again', '‘d', 'us', 'yet', 'moreover', 'mostly', 'one', 'meanwhile', 'whither', 'there', 'toward', '’m', "'ve", '’d', 'give', 'do', 'an', 'quite', 'these', 'everyone', 'towards', 'this', 'cannot', 'afterwards', 'beyond', 'make', 'were', 'whether', 'well', 'another', 'below', 'first', 'upon', 'any', 'none', 'many', 'serious', 'various', 're', 'two', 'less', '‘ve'}
len(stop_words)

326

In [5]:
def clean_data(global_text):
    # Replace every non-alphabetic character with a space
    global_text = re.sub(r'[^A-Za-z]', ' ', global_text)

    # Remove stop words
    words = [word for word in global_text.split() if word.lower() not in stop_words]

    # Remove duplicates while keeping first-occurrence order
    return list(dict.fromkeys(words))

In [6]:
def create_dictionary(path = "global_text.txt"):
    # Read and lowercase the reference text; the file is closed afterwards
    with open(path) as f:
        global_text = f.read().lower()
    global_text_words = clean_data(global_text)

    # Returns a set of words
    return set(global_text_words)

In [7]:
list(create_dictionary())[:50]

['shows',
 'conditional',
 'serve',
 'remains',
 'fine',
 'consider',
 'produces',
 'fosters',
 'nd',
 'offered',
 'powerful',
 'colonization',
 'determine',
 'bug',
 'average',
 'priorprobabilities',
 'subgenius',
 'mathew',
 'experience',
 'terrify',
 'response',
 'atheism',
 'arise',
 'satisfactorily',
 'shovelling',
 'flips',
 'purpose',
 'plans',
 'component',
 'chemical',
 'sentence',
 'morris',
 'considering',
 'amechanical',
 'conscious',
 'interchangeableatoms',
 'thump',
 'practicing',
 'washington',
 'christianity',
 'instinctivebehaviour',
 'boat',
 'algorithm',
 'miserable',
 'congo',
 'christmas',
 'told',
 'savages',
 'don',
 'blissful']

In [None]:
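# Compare unique word counts before and after removing stop words
with open('global_text.txt') as f:
    all_words = set(re.sub(r'[^A-Za-z]', ' ', f.read().lower()).split())
dictionary = create_dictionary()
print('Unique words before removing the stop words: ', len(all_words))
print('After removing the stop words: ', len(dictionary))
print('Diff: ', len(all_words) - len(dictionary))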

# Part 1: Define Gene & Chromosome
**Gene:** Each gene is one character of the encryption key.

**Chromosome:** Each chromosome consists of 14 genes, so each chromosome represents a candidate encryption key.
For instance, this is a chromosome:

> qgberyehnglsip



# Part 2: Generate Initial Population
`POPULATION_SIZE` is the size of the initial population. The method `make_chromosomes` creates `POPULATION_SIZE` random chromosomes, using `random.choice` to pick each letter.
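
A minimal standalone sketch of this step is shown below. The names `random_key` and `KEY_LENGTH` are hypothetical, and the fixed seed is only there to make the sketch repeatable; the class further down uses the module-level `random` functions instead.

```python
import random
import string

POPULATION_SIZE = 100
KEY_LENGTH = 14  # same value as CONSTANT in the constants cell

def random_key(length, rng):
    """Build one chromosome: a string of `length` random lowercase letters."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))

rng = random.Random(0)  # seeded only so that repeated runs give the same keys
population = [random_key(KEY_LENGTH, rng) for _ in range(POPULATION_SIZE)]
print(population[:3])
```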

# Part 3: Implement & define fitness function
The `fitness` method in the `Decoder` class calculates the fitness value. The intent is to decipher the encoded text with the chromosome as the key and count how many words of the deciphered text are in the **dictionary**; that count is the `fitness value` of the chromosome. As written, however, `decrypt` returns a single string and the loop in `fitness` iterates over it character by character, so the score is the number of characters that appear in the dictionary as one-letter entries. The chromosomes are then sorted by fitness in descending order, because we want to choose the best of them for the next generation.
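
The word-counting idea can be sketched on its own as follows. The function `word_fitness` and the tiny dictionary are hypothetical, and splitting the deciphered text on whitespace is an assumption about the intended behavior, not a copy of the class method.

```python
def word_fitness(deciphered_text, dictionary):
    """Count whitespace-separated words of the text that occur in the dictionary."""
    return sum(1 for word in deciphered_text.split() if word in dictionary)

# A toy dictionary; the real one is built from global_text.txt.
dictionary = {"uranium", "energy", "source", "important"}

good = "uranium may be turned into a new and important source of energy"
bad = "xzqnlxp pdb eh wxuqhg lqwr d qhz dqg"

print(word_fitness(good, dictionary), word_fitness(bad, dictionary))
```

A key that produces readable text scores higher than one that produces gibberish, which is all the selection step needs.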

# Part 4: Implementing the crossover and mutation

## Selection

Here we use rank-based selection together with `elitism`: the best `ELITISM_RATE` fraction of each generation is copied unchanged into the next one.
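
A minimal sketch of linear rank-based selection is shown below, assuming the population is given as `(fitness, chromosome)` pairs. The function `rank_select` is hypothetical, and it uses `random.choices` from the standard library rather than the NumPy call used in the class.

```python
import random

def rank_select(scored_population, k, rng):
    """Pick k chromosomes with probability proportional to their rank.

    scored_population is a list of (fitness, chromosome) pairs.
    The worst chromosome gets rank 1 and the best gets rank n.
    """
    ranked = sorted(scored_population, key=lambda pair: pair[0])  # worst first
    weights = list(range(1, len(ranked) + 1))
    chosen = rng.choices(ranked, weights=weights, k=k)
    return [chromosome for _, chromosome in chosen]

rng = random.Random(0)
scored = [(3, "aaaa"), (10, "bbbb"), (7, "cccc")]
print(rank_select(scored, 2, rng))
```

Because the weights depend on rank and not on the raw scores, one chromosome with a much higher fitness cannot dominate the selection the way it could with fitness-proportional selection.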

## Crossovers

Crossover is implemented in `makeChild` as a two-point crossover: two different cut points are chosen, the segment between them is swapped between the two parents, and the rest of each child is copied from its own parent. Each generation performs `int(CROSSOVER_RATE * POPULATION_SIZE)` = 65 crossovers, and each crossover produces two children. The parents are chosen based on their rank.
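
The segment swap performed by `makeChild` can be sketched on its own as follows. The function `two_point_crossover` is hypothetical and picks its cut points slightly differently from the class method: here both points always fall strictly inside the key.

```python
import random

def two_point_crossover(first, second, rng):
    """Swap the segment between two cut points of two equal-length keys."""
    size = len(first)
    # Two distinct cut points with 1 <= point1 < point2 <= size - 1.
    point1, point2 = sorted(rng.sample(range(1, size), 2))
    child1 = first[:point1] + second[point1:point2] + first[point2:]
    child2 = second[:point1] + first[point1:point2] + second[point2:]
    return child1, child2

rng = random.Random(1)
print(two_point_crossover("a" * 14, "b" * 14, rng))
```

Using two parents made of a single repeated letter makes the swapped segment easy to see in the printed children.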

## Mutation

In this part, each character of a chromosome is replaced with a random character with probability `MUTATION_PROBABILITY` = 0.1.
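
A minimal standalone sketch of per-gene mutation is shown below. The function `mutate` is hypothetical and takes the random generator as an argument so that the sketch is repeatable. With 14 genes and a probability of 0.1, on average 1.4 genes are redrawn per chromosome.

```python
import random
import string

MUTATION_PROBABILITY = 0.1

def mutate(chromosome, rng, probability=MUTATION_PROBABILITY):
    """Redraw each gene independently with the given probability."""
    genes = [
        rng.choice(string.ascii_lowercase) if rng.random() < probability else gene
        for gene in chromosome
    ]
    return "".join(genes)

rng = random.Random(0)
print(mutate("qgberyehnglsip", rng))
```

A redrawn gene can land on the same letter again, so the number of visibly changed genes can be slightly lower than the number of redraws.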

# Part 5: Decrypting without key

In [11]:
class Decoder:
    def __init__(self, globalText, encodedText, keyLength = CONSTANT):
        self.encodedText = re.sub(r'[^A-Za-z]', ' ', encodedText)
        self.encodedTextWords = self.cleanData(self.encodedText.lower())
        self.key_length =keyLength
        self.globalText = globalText
        self.dictionary = {}
        self.createDictionary()
        self.population = []
        self.make_chromosomes()
    
    
    def cleanData(self, dataSet):
        dataSet = re.sub(r'[^A-Za-z]', ' ', dataSet)
        dataSetWords = dataSet.split()
        dataSetWords = list(dict.fromkeys(dataSetWords))
        return dataSetWords
    
    
    def createDictionary(self):
        dataSet = self.globalText.lower()
        dataSetWords = self.cleanData(dataSet)
        self.dictionary = set(dataSetWords)
    
    def make_random_string(self,length):
        letters = string.ascii_lowercase
        return ''.join(random.choice(letters) for i in range(length))

    def make_chromosomes(self):
        for i in range(POPULATION_SIZE):
            self.population.append(self.make_random_string(self.key_length))
    
    def calculate_population_fitness(self):
        populationScores = [[self.fitness(self.population[i]), i] for i in range(len(self.population))]
        populationScores = sorted(populationScores, key=itemgetter(0), reverse = True)
        return populationScores
    
    def fitness(self, chromosome):
        decipheredWords = decrypt(self.encodedText, chromosome)
        
        counter = 0
        for word in decipheredWords:
            if word in self.dictionary:
                counter += 1
                
        return counter
    
    def makeChild(self, first, second):
        size = len(first)
        point1 = random.randint(1, size - 1)
        point2 = random.randint(1, size - 1)
        if point2 >= point1:
            point2 += 1
        else:
            point1, point2 = point2, point1
        
        first_child = first[:point1] + second[point1:point2] + first[point2:]
        second_child = second[:point1] + first[point1:point2] + second[point2:]

        return first_child, second_child

           
    
    def split_into_chars(self,string):
        string = string.lower()
        l = []
        for i in range(len(string)):
            l.append(string[i])

        return l
    def crossover(self, father, mother):
        return self.makeChild(father, mother)
    
    def mutation(self, chromosome):
        letters = string.ascii_lowercase
        size = len(chromosome)
        for i in range(len(chromosome)):
            if random.random() < MUTATION_PROBABILITY:
              chromosome = chromosome[:i] + random.choice(letters) + chromosome[i+1:]
        
        return chromosome
    def decode(self):
        generation = 0
        while True:
            populationScores = self.calculate_population_fitness()

            generation += 1

            if generation >= MAX_GENERATIONS:
                chromosome = self.population[populationScores[0][1]]
                print(populationScores)
                print("The final chromosome is: ")
                print(chromosome)

                return decrypt(self.encodedText,chromosome)

            # Chromosomes ordered from best to worst fitness
            ranked = [self.population[index] for _, index in populationScores]

            # Linear rank probabilities: the best chromosome gets the largest share.
            # Recomputed every generation because the population size changes after the first one.
            n = len(ranked)
            sumOfRanks = n * (n + 1) / 2
            ranksProbabilities = [(n - i) / sumOfRanks for i in range(n)]

            # Elitism: copy the best chromosomes unchanged
            newPopulation = ranked[:int(POPULATION_SIZE*ELITISM_RATE)]

            size = int(CROSSOVER_RATE*POPULATION_SIZE)

            for i in range(size):
                # p must be passed by keyword: the third positional argument of np.random.choice is `replace`
                parent = np.random.choice(ranked, 2, p=ranksProbabilities)
                offspring1, offspring2 = self.crossover(parent[0] , parent[1])

                newPopulation.append(self.mutation(offspring1))
                newPopulation.append(self.mutation(offspring2))

            self.population = newPopulation

In [12]:
encodedText = open('encoded_text.txt').read()
globalText = open('global_text.txt').read()

d = Decoder(globalText, encodedText, keyLength = CONSTANT)
start = time.time()
decodedText = d.decode()
end = time.time()

print("Time: %s seconds" % (end - start) , "\n")
print("The decoded text is: \n")
print(decodedText)

[[2421, 0], [2421, 1], [2421, 2], [2421, 3], [2421, 4], [2421, 5], [2421, 6], [2421, 7], [2421, 8], [2421, 9], [2400, 22], [2368, 19], [2346, 47], [2315, 76], [2311, 29], [2291, 18], [2287, 115], [2264, 23], [2260, 28], [2260, 100], [2260, 127], [2258, 108], [2252, 57], [2231, 99], [2229, 40], [2222, 114], [2219, 13], [2218, 80], [2217, 35], [2216, 46], [2216, 54], [2213, 109], [2211, 52], [2210, 75], [2207, 51], [2207, 85], [2200, 81], [2198, 112], [2197, 73], [2195, 44], [2192, 15], [2189, 53], [2189, 104], [2188, 82], [2185, 79], [2183, 84], [2183, 137], [2181, 59], [2174, 66], [2173, 45], [2173, 55], [2171, 43], [2168, 120], [2167, 41], [2167, 125], [2166, 34], [2165, 62], [2165, 126], [2164, 103], [2161, 63], [2160, 21], [2159, 122], [2156, 116], [2155, 92], [2153, 32], [2153, 89], [2152, 113], [2149, 118], [2147, 72], [2147, 136], [2146, 25], [2146, 101], [2145, 65], [2143, 119], [2143, 130], [2140, 39], [2129, 12], [2128, 132], [2127, 14], [2125, 111], [2123, 48], [2122, 11], [2

# Decrypted Text

<p>
albert einstein old grove rd  nassau point peconic  long island  august  nd        f d  roosevelt  president of the united states  white house washington  d c   sir   some recent work by e  fermi and l  szilard  which has been communicated to me in manuscript  leads me to expect that the element uranium may be turned into a new and important source of energy in the immediate future  certain aspects of the situation which has arisen seem to call for watchfulness and  if necessary  quick action on the part of the administration  i believe therefore that it is my duty to bring to your attention the following facts and recommendations   in the course of the last four months it has been made probable through the work of joliot in france as well as fermi and szilard in america that it may become possible to set up a nuclear chain reaction in a large mass of uranium by which vast amounts of power and large quantities of new radium like elements would be generated  now it appears almost certain that this could be achieved in the immediate future   this phenomenon would also lead to the construction of bombs  and it is conceivable though much less certain that extremely powerful bombs of a new type may thus be constructed  a single bomb of this type  carried by boat and exploded in a port  might very well destroy the whole port together with some of the surrounding territory  however  such bombs might very well prove to be too heavy for transportation by air   the united states has only very poor ores of uranium in moderate quantities  there is some good ore in canada and the former czechoslovakia  while the most important source of uranium is belgian congo   in view of this situation you may think it desirable to have some permanent contact maintained between the administration and the group of physicists working on chain reactions in america  one possible way of achieving this might be for you to entrust with this task a person who has your confidence and who could 
perhaps serve in an inofficial capacity  his task might comprise the following   a   to approach government departments  keep them informed of the further development  and put forward recommendations for government action  giving particular attention to the problem of securing a supply of uranium ore for the united states   b   to speed up the experimental work  which is at present being carried on within the limits of the budgets of university laboratories  by providing funds  if such funds be required  through his contacts with private persons who are willing to make contributions for this cause  and perhaps also by obtaining the co operation of industrial laboratories which have the necessary equipment   i understand that germany has actually stopped the sale of uranium from the czechoslovakian mines which she has taken over  that she should have taken such early action might perhaps be understood on the ground that the son of the german under secretary of state  von weizsacker  is attached to the kaiser wilhelm institut in berlin where some of the american work on uranium is now being repeated   yours very truly   albert einstein 

</p>

In [102]:
decrypt(d.encodedText,'alberteinstein')

'albert einstein old grove rd  nassau point peconic  long island  august  nd        f d  roosevelt  president of the united states  white house washington  d c   sir   some recent work by e  fermi and l  szilard  which has been communicated to me in manuscript  leads me to expect that the element uranium may be turned into a new and important source of energy in the immediate future  certain aspects of the situation which has arisen seem to call for watchfulness and  if necessary  quick action on the part of the administration  i believe therefore that it is my duty to bring to your attention the following facts and recommendations   in the course of the last four months it has been made probable through the work of joliot in france as well as fermi and szilard in america that it may become possible to set up a nuclear chain reaction in a large mass of uranium by which vast amounts of power and large quantities of new radium like elements would be generated  now it appears almost certa

# Part 6

### 1. Too small or too large initial population

*   If the population size is set **too high**, the population carries too much bad genetic material to provide a good fitness, and every generation takes longer to evaluate.
*   If the initial population size is set **too low**, the genetic algorithm has too few possible ways to build new individuals, so the fitness stays low.

[Source](https://www.diva-portal.org/smash/get/diva2:832349/FULLTEXT01.pdf)

### 2. What if we increase the population size in each round?
Growing the population in every round does not help our algorithm: the number of generations is large, so the population becomes very large and the processing time grows with it.

### 3. Why do genetic algorithms use crossover and mutation together? What happens when you don't use one of them?

**Crossover** produces the **new generation**. It combines existing chromosomes into **new solutions** and leads to **significant improvement** in the **fitness scores**. But using only crossover can leave the search **stuck**, because crossover only recombines letters that are already present in the population. Mutation changes the current answer a little, which can produce a better generation.

If we use only crossover, the algorithm is very likely to stay in a local extremum for a long time. So mutation is an important part of the genetic algorithm.

### 4. What should we do if the population didn't change after some generations?

We could increase `MUTATION_PROBABILITY` or `CROSSOVER_RATE` so that more new genetic material enters the population. Increasing `ELITISM_RATE` has the opposite effect, because more chromosomes are copied unchanged into the next generation.

### 5. If we had to choose between crossover and mutation, which one is more effective? Why?

Crossover, because crossover produces the new generation. It creates new solutions and leads to significant improvement in the fitness scores.

### 6. How to speed up
* Reduce `MAX_GENERATIONS`.
* Decrease the population size (`POPULATION_SIZE`).