![University of Tehran](./img/UT.png)
#   <font color='red'><center>AI CA 2 - GA</center></font> 
## <center>Dr. Fadaei</center>
### <center>Daniyal Maroufi</center>
### <center>810098039</center>

## Aim

This assignment uses a Genetic Algorithm (GA) to find the 14-character encryption key of an encoded text. A long global text is provided to build a dictionary of known words, and that dictionary is used to judge candidate decryptions.

# Problem Description

In this assignment, we try to decode an encoded text without knowing the encryption key, so the key itself has to be found with the genetic algorithm.
First, we read a long global text and build a dictionary of its words; we are assured that every word of the original text (the one we want to recover) appears in this dictionary. Then we guess an encryption key and decrypt the text with it. If all the decrypted words are present in the dictionary, we accept the guessed key as correct and consider the target text successfully decoded.
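As a minimal sketch of the decryption step, the snippet below shifts each letter back by the matching key letter, modulo 26, which is the rule used by `decryptLetter` in the implementation. It assumes, like the notebook's `decrypt` method, that the key position also advances over spaces. The `encrypt` helper is hypothetical and exists only to make the round trip checkable.

```python
import string

ALPHABET = string.ascii_lowercase

def shift(text, key, sign):
    """Shift letters by the repeating key; sign=+1 encrypts, sign=-1 decrypts."""
    out = []
    for idx, ch in enumerate(text):
        if ch == ' ':
            out.append(' ')  # spaces are kept, but still consume a key position
        else:
            k = ALPHABET.index(key[idx % len(key)])
            out.append(ALPHABET[(ALPHABET.index(ch) + sign * k) % 26])
    return ''.join(out)

def encrypt(plain, key):
    # Hypothetical helper, only used here to produce test input.
    return shift(plain, key, +1)

def decrypt(encoded, key):
    return shift(encoded, key, -1)

secret = encrypt("attack at dawn", "lemon")
print(decrypt(secret, "lemon"))
```

A wrong key still produces lowercase letters, just not dictionary words, which is what the fitness function exploits.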

# Algorithm Description

First, we generate 300 candidate answers (chromosomes) as the initial population, i.e. the first generation. A fitness function then scores each chromosome, measuring how good a solution it is, and the population is ranked by that score. Next, we create a new generation from the best chromosomes using crossover and mutation. By repeating this process, we expect to reach a key that meets the goal: every decoded word appears in the dictionary.



# Algorithm Implementation

## Part 0 - Generating the Dictionary

In this part, we clean the text data: we expand contractions, convert the words to lower case, and remove the stopwords and all punctuation. To eliminate the stopwords, we use the NLTK Python library.
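The cleaning steps can be sketched as follows. This is an illustration rather than the notebook's code: `clean_text` is a hypothetical name, and the tiny `STOPWORDS` set is an assumption that stands in for NLTK's English stopword list so that the snippet runs without downloads.

```python
import re

# Assumption: a small stand-in for nltk.corpus.stopwords.words('english').
STOPWORDS = {"the", "is", "a", "an", "and", "not", "of", "to"}

CONTRACTIONS = {"'s": " is", "n't": " not", "'m": " am", "'ll": " will",
                "'d": " would", "'ve": " have", "'re": " are"}

def clean_text(text):
    """Expand contractions, lowercase, drop non-letters, remove stopwords."""
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)
    text = text.lower()
    # Anything that is not a lowercase ASCII letter becomes a separator.
    text = re.sub(r"[^a-z]+", " ", text)
    return [word for word in text.split() if word not in STOPWORDS]

print(clean_text("The key isn't HERE!"))
```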


## Part 1 - Definition of Genes and Chromosome

In the Genetic Algorithm, each chromosome represents a possible solution to the problem. Hence, we define a chromosome as a candidate encryption key consisting of 14 genes (characters). The chromosomes evolve from generation to generation until one of them matches the correct key.


## Part 2 - Generating Initial Population

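We generate 300 random chromosomes as the initial population. Note that the `Decoder` constructor in the code defaults to `InitialPopulationSize=50`, and the run shown in Part 5 uses that default, so a population of 300 has to be requested explicitly.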


## Part 3 - Ranking Population

To rank the population, we decode the encoded text with each chromosome and count the number of words of the decoded text that exist in the dictionary. If all the decoded words exist in the dictionary, we can conclude that the encryption key is found. 
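A minimal sketch of this scoring rule is shown below. The names `fitness` and `is_solved` are hypothetical, and the dictionary is assumed to be any container that supports `in`.

```python
def fitness(decoded_text, dictionary):
    """Number of decoded words that appear in the dictionary."""
    return sum(1 for word in decoded_text.split() if word in dictionary)

def is_solved(decoded_text, dictionary):
    """The key is accepted when every decoded word is a dictionary word."""
    return fitness(decoded_text, dictionary) == len(decoded_text.split())

words = {"hello", "again", "world"}
print(fitness("hello wrold again", words))
print(is_solved("hello world again", words))
```

The goal score is simply the number of words in the encoded text, which is how the `decode` method sets `goal_score`.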


## Part 4 - Generate a New Population

For evolution, we need to generate new chromosomes from parent chromosomes. We use single-point crossover: a separator position is selected at random, and each child takes the part before the separator from one parent and the part after it from the other. For mutation, we use the swap method, in which two randomly chosen genes of a single chromosome exchange positions.
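The two operators can be sketched as below. This is an illustrative variant, not the notebook's exact code: it assumes the split point is drawn from `1..len-1` so that both parents always contribute, and that the two swapped positions are distinct. The `rng` parameter is an added convenience for reproducible runs.

```python
import random

def crossover(parent1, parent2, rng=random):
    """Single-point crossover: swap the tails of two equal-length parents."""
    point = rng.randrange(1, len(parent1))
    child1 = parent1[:point] + parent2[point:]
    child2 = parent2[:point] + parent1[point:]
    return child1, child2

def swap_mutate(chromosome, rng=random):
    """Swap two distinct, randomly chosen genes of one chromosome."""
    genes = list(chromosome)
    i, j = rng.sample(range(len(genes)), 2)
    genes[i], genes[j] = genes[j], genes[i]
    return ''.join(genes)

rng = random.Random(0)
print(crossover("aaaaaaaaaaaaaa", "bbbbbbbbbbbbbb", rng))
print(swap_mutate("abcdefghijklmn", rng))
```

Swap mutation only rearranges the letters already present in a key, so a letter that is missing from a chromosome can only arrive through crossover.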

To form a new generation, we pass the `elitism*population_size` top-ranked chromosomes directly to the new generation, where `elitism` is a fraction between 0 and 1. For the rest of the chromosomes, we apply the crossover operation with a probability of `pc` and the mutation operation with a probability of `pm`.
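The generation step can be sketched as follows, under a few assumptions that are not stated in the text: parents are paired with their neighbour in the ranking, the operators are passed in as functions, and the result is trimmed so the population size stays fixed. `next_generation` is a hypothetical name.

```python
import random

def next_generation(ranked, elitism, pc, pm, crossover, mutate, rng=random):
    """Build a new population of the same size from a ranked one."""
    size = len(ranked)
    n_elites = int(elitism * size)
    new_gen = list(ranked[:n_elites])  # copy, so the input list is not modified
    i = n_elites
    while len(new_gen) < size:
        child1, child2 = ranked[i % size], ranked[(i + 1) % size]
        if rng.random() < pc:
            child1, child2 = crossover(child1, child2)
        if rng.random() < pm:
            child1, child2 = mutate(child1), mutate(child2)
        new_gen.extend([child1, child2])
        i += 1
    return new_gen[:size]  # trim the overshoot from adding children in pairs

# Toy operators, for demonstration only.
toy_cross = lambda a, b: (a[:2] + b[2:], b[:2] + a[2:])
toy_mutate = lambda c: c[::-1]
ranked = ["aaaa", "bbbb", "cccc", "dddd", "eeee", "ffff"]
print(next_generation(ranked, 0.34, 0.8, 0.2, toy_cross, toy_mutate, random.Random(1)))
```

Copying the elites with a slice matters: assigning the ranked list itself and appending to it would carry every old chromosome forward and make the population grow each generation.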


## Part 5 - Decode

Now we can search for the key using the functions defined in the `Decoder` class.


In [3]:
import nltk
import time
import string
import random
nltk.download('stopwords')
from nltk.corpus import stopwords


[nltk_data] Downloading package stopwords to /home/dani/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


In [8]:
class Decoder():

    def __init__(self, globalText, encodedText, keyLength=14, InitialPopulationSize=50, pc=0.8, pm=0.2, elitism=0.16):
        # per-instance containers; class-level lists would be shared by every Decoder
        self.chromosomes=[]
        self.dictionary=[]
        self.global_text=globalText
        self.encoded_text=' '.join(self.cleanText(encodedText))
        self.generateDictionary()
        self.key_length=keyLength
        self.population_size=InitialPopulationSize
        self.generateInitialPopulation()
        self.elitism = elitism # Fraction of top-ranked chromosomes passed unchanged to the next generation
        self.pc = pc # Crossover probability
        self.pm = pm # Mutation probability

    def generateDictionary(self):
        self.dictionary=self.cleanText(self.global_text)

    def cleanText(self,text):
        Apos_dict={"'s":" is","n't":" not","'m":" am","'ll":" will",
        "'d":" would","'ve":" have","'re":" are"}

        # replace contractions
        for key,value in Apos_dict.items():
            text=text.replace(key,value)

        text=text.lower()

        # replace every character that is not a lowercase ASCII letter
        # (punctuation, digits, newlines) with a space
        text=''.join(letter if letter in string.ascii_lowercase else ' ' for letter in text)

        # load the stopword list once instead of once per word
        stop_words=set(stopwords.words('english'))
        return [word for word in text.split() if word not in stop_words]

    def decryptLetter(self,e,k):
        e=list(string.ascii_lowercase).index(e)
        k=list(string.ascii_lowercase).index(k)
        return string.ascii_lowercase[(e-k+26)%26]

    def generateChromosome(self):
        return ''.join(random.choice(string.ascii_lowercase) for _ in range(self.key_length))

    def generateInitialPopulation(self):
        for _ in range(self.population_size):
            self.chromosomes.append(self.generateChromosome())

    def repeatKey(self,key,desired_length):
        return key*int(desired_length/len(key))+key[:desired_length%len(key)]

    def decrypt(self,key):
        decoded_letters=[]
        repeated_key=self.repeatKey(key,len(self.encoded_text))
        for idx,letter in enumerate(self.encoded_text):
            if letter==' ':
                # spaces are copied as-is; they still consume one key position
                decoded_letters.append(' ')
            else:
                decoded_letters.append(self.decryptLetter(letter,repeated_key[idx]))
        return ''.join(decoded_letters)

    def checkPresence(self,word):
        return word in self.dictionary

    def countPresence(self, decoded_text):
        presence=0
        for word in decoded_text.split():
            if self.checkPresence(word):
                presence+=1
        return presence

    def calculateFitness(self, chromosome):
        decoded_text=self.decrypt(chromosome)
        return self.countPresence(decoded_text)

    def rankPopulation(self):
        ranked_population = []
        for chromosome in self.chromosomes:
            ranked_population.append([self.calculateFitness(chromosome),chromosome])
        ranked_population = sorted(ranked_population,key=lambda x:(x[0]),reverse=True)
        return [ch[1] for ch in ranked_population]

    def crossover(self, chromosome1, chromosome2):
        # single-point crossover; starting at 1 avoids a split at 0,
        # which would only return the two parents in swapped order
        separator=random.randrange(1,self.key_length)
        child1=chromosome1[:separator]+chromosome2[separator:]
        child2=chromosome2[:separator]+chromosome1[separator:]
        return child1, child2

    def mutate(self, chromosome):
        chromosome=list(chromosome)
        gene1=random.randrange(0,self.key_length)
        gene2=random.randrange(0,self.key_length)
        chromosome[gene1],chromosome[gene2]=chromosome[gene2],chromosome[gene1]
        return ''.join(chromosome)

    def generateNewPopulation(self):
        ranked_population=self.rankPopulation()
        elites=int(self.elitism*self.population_size)
        # copy only the elites; aliasing ranked_population would carry over
        # every old chromosome and make the population grow each generation
        new_generation=ranked_population[:elites]

        for i in range(elites,self.population_size):
            crossover_chance=random.uniform(0,1)
            mutation_chance=random.uniform(0,1)
            # the last chromosome is paired with the first one
            partner=ranked_population[(i+1)%self.population_size]

            if crossover_chance<self.pc:
                children=list(self.crossover(ranked_population[i],partner))
            else:
                children=[ranked_population[i]]
            if mutation_chance<self.pm:
                children=[self.mutate(child) for child in children]
            new_generation.extend(children)

        # crossover adds two children per step, so trim back to the fixed size
        return new_generation[:self.population_size]

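    def decode(self):
        # use self, not the global variable d
        goal_score=len(self.encoded_text.split())
        print('Goal score:',goal_score)
        best_key=self.rankPopulation()[0]
        best_score=self.calculateFitness(best_key)
        print('Initial score:',best_score)
        while best_score<goal_score:
            self.chromosomes=self.generateNewPopulation()
            # rank once per generation and reuse the best key
            best_key=self.rankPopulation()[0]
            best_score=self.calculateFitness(best_key)
            print('key:',best_key,'score:',best_score)
        # rankPopulation returns key strings, so [0][1] would be a single character;
        # return the text decoded with the best key, which the caller stores as decodedText
        return self.decrypt(best_key)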


In [9]:
encodedText = open('./Resources/encoded_text.txt').read()
globalText = open('./Resources/global_text.txt').read()
d = Decoder(globalText, encodedText)
decodedText = d.decode()


Goal score: 517
Initial score: 16
key: srmehohsywetmp score: 16
key: srmehohsywetmp score: 16
key: srmehohsywetmp score: 16
key: lwfsqbmgidsptw score: 17
key: sakskzpfjkscfm score: 19
key: sakskzpfjkscfm score: 19
key: sakskzpfjkscfm score: 19


## Part 6 - Questions


1. A small initial population does not have the variety required to find the answer, whereas a large initial population carries a heavy computational burden and slows the algorithm.
2. By increasing the population over time, we can be more confident that the generations keep a good variety, so the probability of finding the answer increases. On the other hand, it increases the computation time.
3. The crossover operation combines the best chromosomes so that they improve from generation to generation, and the mutation operation ensures that new and more varied chromosomes keep appearing.
4. The crossover operation has a greater impact on developing the generations than the mutation operation, because it combines the best chromosomes to produce the new generation.
5. If the generations stay the same for a few steps, we can generate a new random population and start again. Such stagnation may be caused by a large number of weak chromosomes that cannot evolve well enough.
6. The crossover operation is more effective than mutation, so if we could use only one of them, crossover would be the better choice.
7. We could use hashing or binary search algorithms to optimize the search process of the words in the dictionary. We could also try different values for `elitism`, `pc`, and `pm` to find more optimized parameters.
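
The two lookup strategies mentioned in answer 7 can be sketched as below. This is only an illustration with hypothetical helper names: a `set` gives hash-based membership tests, and `bisect` performs binary search on a sorted list, in contrast to the linear scan of a plain list.

```python
import bisect

def build_lookups(words):
    """Return a hash set and a sorted list built from the same words."""
    word_set = set(words)            # hashing: average O(1) membership test
    sorted_words = sorted(word_set)  # sorted copy for binary search
    return word_set, sorted_words

def in_sorted(sorted_words, word):
    """Binary search: O(log n) membership test on a sorted list."""
    i = bisect.bisect_left(sorted_words, word)
    return i < len(sorted_words) and sorted_words[i] == word

word_set, sorted_words = build_lookups(["pear", "apple", "fig", "apple"])
print("fig" in word_set, in_sorted(sorted_words, "fig"), in_sorted(sorted_words, "kiwi"))
```

Because the fitness function checks every decoded word of every chromosome in every generation, the cost of a single lookup is multiplied many times over.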
