# Goal: Crack Password
We are going to try to determine a hidden password using a genetic algorithm.

# Choosing a Fitness Function
The fitness (evaluation) function is the first thing to define when creating a genetic algorithm: it estimates how successful each specimen is. The simplest option is to count the characters that match the password at the same position:

```
fitness score = 100 * (number of correct characters) / (total number of characters)
```


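```python
def fitness(password, test_word):
  # Percentage of positions where test_word has the same letter as password
  if len(test_word) != len(password):
    raise ValueError('Not compatible: both words must have the same length')
  score = sum(1 for i, j in zip(password, test_word) if i == j)
  return score * 100 / len(password)
```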

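```python
test_pass = 'banana'
test_word = 'lawinl'
fitness(test_pass, test_word)
```

```
33.333333333333336
```

Two of the six positions match ('a' in second position and 'n' in fifth position), hence 2 × 100 / 6.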

# Creating Individuals
Now that we know how to evaluate our individuals, how do we define them? This part is tricky: we need to decide which characteristics are fixed and which are variable.

The analogy with genetics is helpful here. DNA is composed of genes, and each gene comes in different alleles (different versions of that gene). Genetic algorithms borrow this concept of a population's DNA.

In our case, our individuals are words (of the same length as the password). Each letter position is a gene, and the letter at that position is its allele: in the word “banana”, ‘b’ is the allele of the first gene.

What does this representation give us?

1. Every individual keeps the right shape (a word of the correct length).
2. The population can cover every possibility (every word of this length).
3. Our genetic algorithm can therefore explore all possible combinations.
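To make the gene/allele vocabulary concrete, here is a small sketch, assuming (as the word generator in this notebook does) that the only possible alleles are the lowercase letters a to z. It lists each gene (position) with its allele and counts how many distinct individuals exist for a given length. The helper names `genes_of` and `search_space_size` are hypothetical, not part of the original notebook.

```python
import string

# Assumption: the only alleles are the 26 lowercase letters a-z
ALLELES = string.ascii_lowercase

def genes_of(word):
    # Each position is a gene; the letter at that position is its allele
    return list(enumerate(word))

def search_space_size(length, alphabet=ALLELES):
    # Every gene can independently take any allele
    return len(alphabet) ** length

print(genes_of("banana")[:2])  # [(0, 'b'), (1, 'a')]
print(search_space_size(6))    # 308915776
```

Even a six-letter password has 26^6 = 308,915,776 candidates, which is why the algorithm samples and evolves a population rather than enumerating every word.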

# Creating our first population
We now know what characterizes our individuals and how to evaluate their performance, so we can start the “evolution” step of our genetic algorithm.

When creating the first population, the key idea is not to steer it towards a solution that merely seems good. The population should be as diverse as possible and cover as many possibilities as possible. The ideal first population of a genetic algorithm covers every existing allele.

In our case, we simply create words composed of random lowercase letters.

```python
import random
import string

def generateAWord(length):
  # Build a word of `length` random lowercase letters (a-z)
  return "".join(random.choice(string.ascii_lowercase) for _ in range(length))

def generateFirstPopulation(size, password):
  # Create `size` random words with the same length as the password
  return [generateAWord(len(password)) for _ in range(size)]
```

```python
generateAWord(10)
```

```
'qdgfjqnlft'
```

```python
generateFirstPopulation(10, 'banana')
```

```
['hqcleq',
 'xmwwby',
 'hqgcux',
 'qrahtm',
 'weiuhs',
 'drsadm',
 'lxeyao',
 'ffesxm',
 'ldiyuc',
 'wsetfy']
```
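To check how close a random first population comes to covering every existing allele, one can measure, for each position, the fraction of the letters a to z that actually appear there. This is a hypothetical helper (`allele_coverage`), not part of the original notebook, and it assumes all words have the same length.

```python
import random
import string

def allele_coverage(population, alphabet=string.ascii_lowercase):
    # For each gene (position), the fraction of possible alleles present in the population
    length = len(population[0])
    coverage = []
    for position in range(length):
        seen = {word[position] for word in population}
        coverage.append(len(seen & set(alphabet)) / len(alphabet))
    return coverage

random.seed(0)
population = ["".join(random.choice(string.ascii_lowercase) for _ in range(6))
              for _ in range(100)]
print([round(c, 2) for c in allele_coverage(population)])
```

With only 10 individuals, as in the output above, each position can show at most 10 of the 26 letters, so larger populations are needed to approach full coverage.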

# From one generation to the next
Given a generation, creating the next one takes two steps:
1. First, we select part of the current generation: the breeders.
2. Then, the genetic algorithm combines those breeders to create the next batch.

## Breeders selection
There are many ways to do this, but keep two ideas in mind: select the best solutions of the previous generation, without completely discarding the others. The risk is that if you select only the good solutions at the beginning of the genetic algorithm, it will converge very quickly towards a local optimum rather than towards the best possible solution.

My approach is to select, on the one hand, the _N_ best individuals (in our code, _N_ = best_sample) and, on the other hand, _M_ random individuals regardless of their fitness (_M_ = lucky_few).

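```python
import operator
import random

def computeRankedPopulation(population, password):
  # Pair each individual with its fitness; a list (not a dict) keeps duplicate words
  populationPerf = [(individual, fitness(password, individual)) for individual in population]
  # Sort from best to worst fitness
  return sorted(populationPerf, key=operator.itemgetter(1), reverse=True)

def selectFromPopulation(population, best_sample, lucky_few):
  # population is a ranked list of (word, fitness) pairs, best first
  nextGeneration = []
  # Keep the best_sample fittest individuals
  for i in range(best_sample):
    nextGeneration.append(population[i][0])
  # Add lucky_few individuals picked at random, regardless of fitness
  for i in range(lucky_few):
    nextGeneration.append(random.choice(population)[0])
  random.shuffle(nextGeneration)
  return nextGeneration
```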
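The best_sample / lucky_few split is one way to balance quality and diversity. Another common technique is tournament selection: repeatedly draw a few random individuals and keep the fittest of them. It is shown here only for comparison; it is a different technique from the one used in this notebook, and `tournament_selection` and `tournament_size` are hypothetical names. It takes the same (word, fitness) pairs that `computeRankedPopulation` produces (the fitness values in the example are rounded by hand).

```python
import random

def tournament_selection(population_ranked, number_of_breeders, tournament_size=3):
    # population_ranked: list of (word, fitness) pairs, in any order
    breeders = []
    for _ in range(number_of_breeders):
        # Draw distinct contestants at random
        contestants = random.sample(population_ranked, tournament_size)
        # The fittest contestant wins the tournament
        winner = max(contestants, key=lambda pair: pair[1])
        breeders.append(winner[0])
    return breeders

ranked = [("banana", 100.0), ("banane", 83.3), ("zzzzzz", 0.0), ("bxxxxx", 16.7)]
random.seed(1)
print(tournament_selection(ranked, 4, tournament_size=2))
```

Larger tournaments increase selection pressure; a tournament of size 1 is pure random selection, close in spirit to lucky_few.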

## Breeding
We have two individuals, “Tom” and “Jerry”, whose DNA is defined by their alleles (the value of each letter). To mix their DNA, we simply mix their letters. There are many ways to do this; we use the simplest one: for each letter of the child, randomly take the letter of either “Tom” or “Jerry” at that position.

```python
def createChild(individual1, individual2):
  # Uniform crossover: each letter comes from either parent with probability 1/2
  child = ""
  for i in range(len(individual1)):
    child += individual1[i] if random.random() < 0.5 else individual2[i]
  return child

def createChildren(breeders, number_of_child):
  # Pair breeders from both ends of the (shuffled) list; each pair has number_of_child children
  nextPopulation = []
  for i in range(len(breeders) // 2):
    for j in range(number_of_child):
      nextPopulation.append(createChild(breeders[i], breeders[len(breeders) - 1 - i]))
  return nextPopulation
```
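A quick way to sanity-check this kind of uniform crossover is to verify two properties: every letter of the child comes from one of the parents at the same position, and positions where both parents agree are always inherited. The sketch below uses a standalone equivalent of `createChild` (named `uniform_crossover` here, a hypothetical name) so it runs on its own; `is_valid_child` is also hypothetical.

```python
import random

def uniform_crossover(parent1, parent2):
    # Each letter is taken from parent1 or parent2 with probability 1/2
    return "".join(a if random.random() < 0.5 else b for a, b in zip(parent1, parent2))

def is_valid_child(child, parent1, parent2):
    # Every letter must come from one of the parents at the same position
    return all(c in (a, b) for c, a, b in zip(child, parent1, parent2))

random.seed(42)
for _ in range(1000):
    child = uniform_crossover("banaxx", "bzzana")
    assert is_valid_child(child, "banaxx", "bzzana")
    # Both parents share 'b' at position 0 and 'a' at position 3
    assert child[0] == "b" and child[3] == "a"
print("all checks passed")
```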

```python
pop = generateFirstPopulation(50, 'banana')
rankedPop = computeRankedPopulation(pop, 'banana')
breeders = selectFromPopulation(rankedPop, 15, 15)
createdChildren = createChildren(breeders, 50)
createdChildren
```

```
['sajhno',
 'bamtno',
 'bimhnk',
 'bajtnk',
 'sajhno',
 'samhjo',
 'bajtjk',
 'sijhjo',
 'bamhno',
 'simhjk',
 'sajhnk',
 'bajhjo',
 'bijhnk',
 'bamhnk',
 'bajhjo',
 'sijtnk',
 'bimtjk',
 'bimtjk',
 'bajtjo',
 'bamhjo',
 'bimtjo',
 'bajtnk',
 'simtno',
 'samhjk',
 'sijhnk',
 'sajhno',
 'sajhnk',
 'bimtjo',
 'sijhjk',
 'sajtjo',
 'simhno',
 'sijhnk',
 'bijtnk',
 'bamtjk',
 'sijhno',
 'bijtjo',
 'bijtnk',
 'bimtjo',
 'bajtjk',
 'simhno',
 'sijhnk',
 'sijtno',
 'bamtjo',
 'bimtnk',
 'sijtjk',
 'sijtnk',
 'bijtjk',
 'bimhno',
 'sajtnk',
 'bimtnk',
 'rpjmwt',
 'gbjuwv',
 'rpjuwv',
 'gpjmwv',
 'gbjmuv',
 'rbeuuv',
 'rbeuut',
 'rpeuut',
 'gpjmwt',
 'rbjuwt',
 'gbemwv',
 'rbeuuv',
 'gbjmwt',
 'gbjuwt',
 'rpemwv',
 'gbjmwv',
 'gpemuv',
 'gpjmwt',
 'rbeuwv',
 'rpeuwv',
 'gbjmwt',
 'gbjuuv',
 'gpjuut',
 'rbemwt',
 'rbjmuv',
 'rpemut',
 'rbemuv',
 'rpeuut',
 'gbjmut',
 'gbemuv',
 'rpemuv',
 'gbemut',
 'gbjuwt',
 'rbjmut',
 'rbeuwt',
 'rpjuwv',
 'rbeuut',
 'rbjuut',
 'rpjuuv',
 'rbeuuv',
 'rbemut',
 ...]
```

The output above is truncated: the 30 breeders form 15 pairs, and with 50 children per pair `createChildren` returns 750 words.

# Mutation
The last step of our genetic algorithm is the natural mutation of individuals. After breeding, each individual has a small probability of seeing its DNA change slightly. The goal of this operation is to keep the algorithm from getting stuck in a local optimum. More information on mutations can be found <a href="https://blog.sicara.com/optimization-mutation-genetic-algorithm-40247f8ccb8">here</a>.

```python
def mutateWord(word):
  # Replace the letter at one random position with a random lowercase letter
  index_modification = int(random.random() * len(word))
  new_letter = chr(97 + int(26 * random.random()))
  # word[:0] is '', so position 0 needs no special case
  return word[:index_modification] + new_letter + word[index_modification + 1:]

def mutatePopulation(population, chance_of_mutation):
  # chance_of_mutation is a percentage (e.g. 5 means 5%)
  for i in range(len(population)):
    if random.random() * 100 < chance_of_mutation:
      population[i] = mutateWord(population[i])
  return population
```
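To confirm that a mutation changes at most one gene, one can compare a word with its mutated version position by position. The sketch below re-implements the single-letter mutation as a standalone function (`mutate_one_letter`, a hypothetical name, as is `hamming_distance`) so it can run independently. The new random letter may equal the old one, so the check is "at most one" difference rather than "exactly one".

```python
import random
import string

def mutate_one_letter(word):
    # Replace the letter at one random position with a random lowercase letter
    index = random.randrange(len(word))
    return word[:index] + random.choice(string.ascii_lowercase) + word[index + 1:]

def hamming_distance(a, b):
    # Number of positions where the two words differ
    return sum(x != y for x, y in zip(a, b))

random.seed(7)
for _ in range(1000):
    mutated = mutate_one_letter("banana")
    assert len(mutated) == 6
    assert hamming_distance("banana", mutated) <= 1
print("mutation changes at most one letter")
```

A mutation function that forgets to return the new word would make `len(mutated)` raise a TypeError on the first iteration, so a check like this catches that class of bug early.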

# Utilities

```python
import matplotlib.pyplot as plt

def nextGeneration(firstGeneration, password, best_sample, lucky_few, number_of_child, chance_of_mutation):
  populationSorted = computeRankedPopulation(firstGeneration, password)
  nextBreeders = selectFromPopulation(populationSorted, best_sample, lucky_few)
  nextPopulation = createChildren(nextBreeders, number_of_child)
  mutatedPopulation = mutatePopulation(nextPopulation, chance_of_mutation)
  return mutatedPopulation

def multipleGeneration(number_of_generation, password, size_population, best_sample, lucky_few, number_of_child, chance_of_mutation):
  # historic[0] is the random first population, historic[k] is generation k
  historic = [generateFirstPopulation(size_population, password)]
  for i in range(number_of_generation):
    historic.append(nextGeneration(historic[i], password, best_sample, lucky_few, number_of_child, chance_of_mutation))
  return historic

# Print result
def printSimpleResult(historic, password, number_of_generation):
  # Best individual of the last generation (historic holds number_of_generation + 1 populations)
  result = getListBestIndividualFromHistorique(historic, password)[number_of_generation]
  print("solution: \"" + result[0] + "\" with fitness: " + str(result[1]))

# Analysis tools
def getBestIndividualFromPopulation(population, password):
  return computeRankedPopulation(population, password)[0]

def getListBestIndividualFromHistorique(historic, password):
  return [getBestIndividualFromPopulation(population, password) for population in historic]

# Graphs
def evolutionBestFitness(historic, password):
  plt.axis([0, len(historic), 0, 105])
  plt.title(password)
  evolutionFitness = [getBestIndividualFromPopulation(population, password)[1] for population in historic]
  plt.plot(evolutionFitness)
  plt.ylabel('fitness best individual')
  plt.xlabel('generation')
  plt.show()

def evolutionAverageFitness(historic, password, size_population):
  # size_population is kept for compatibility; the average uses the number of scored individuals
  plt.axis([0, len(historic), 0, 105])
  plt.title(password)
  evolutionFitness = []
  for population in historic:
    populationPerf = computeRankedPopulation(population, password)
    averageFitness = sum(individual[1] for individual in populationPerf) / len(populationPerf)
    evolutionFitness.append(averageFitness)
  plt.plot(evolutionFitness)
  plt.ylabel('Average fitness')
  plt.xlabel('generation')
  plt.show()
```
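`multipleGeneration` always runs the full number_of_generation generations, even once the password has been found. A variant that stops as soon as the best individual equals the password can save time. The sketch below is a compact, self-contained re-implementation of the same steps (ranking, best_sample plus lucky_few selection, uniform crossover, single-letter mutation) with an early exit; `run_until_found`, `match_percentage` and `max_generations` are hypothetical names, not part of the notebook.

```python
import random
import string

def match_percentage(password, word):
    # Percentage of positions where word has the same letter as password
    return sum(a == b for a, b in zip(password, word)) * 100 / len(password)

def run_until_found(password, size_population=100, best_sample=20, lucky_few=20,
                    number_of_child=5, chance_of_mutation=5, max_generations=500):
    letters = string.ascii_lowercase
    # First population: random words of the right length
    population = ["".join(random.choice(letters) for _ in password)
                  for _ in range(size_population)]
    for generation in range(max_generations + 1):
        ranked = sorted(population, key=lambda w: match_percentage(password, w), reverse=True)
        # Stop as soon as the password is found, or when the budget is exhausted
        if ranked[0] == password or generation == max_generations:
            return generation, ranked[0]
        # Selection: the best_sample fittest words plus lucky_few random ones
        breeders = ranked[:best_sample] + [random.choice(ranked) for _ in range(lucky_few)]
        random.shuffle(breeders)
        # Breeding: pair breeders from both ends of the list, uniform crossover
        children = []
        for i in range(len(breeders) // 2):
            parent1, parent2 = breeders[i], breeders[-1 - i]
            for _ in range(number_of_child):
                children.append("".join(a if random.random() < 0.5 else b
                                        for a, b in zip(parent1, parent2)))
        # Mutation: each child has a chance_of_mutation percent chance to change one letter
        for k, child in enumerate(children):
            if random.random() * 100 < chance_of_mutation:
                i = random.randrange(len(child))
                children[k] = child[:i] + random.choice(letters) + child[i + 1:]
        population = children

random.seed(0)
generation, best = run_until_found("banana")
print(generation, best, match_percentage("banana", best))
```

The function returns the generation index and the best word; if the budget runs out first, the best word may still differ from the password.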

# Running the Genetic Algorithm

```python
import time

# Parameters
password = "banana"
size_population = 100
best_sample = 20
lucky_few = 20
number_of_child = 5
number_of_generation = 50
chance_of_mutation = 5  # percent

start_time = time.time()

# Program: each of the (best_sample + lucky_few) / 2 breeder pairs has number_of_child children
if (best_sample + lucky_few) / 2 * number_of_child != size_population:
  print("population size not stable")
else:
  historic = multipleGeneration(number_of_generation, password, size_population, best_sample, lucky_few, number_of_child, chance_of_mutation)
  printSimpleResult(historic, password, number_of_generation)
  evolutionBestFitness(historic, password)
  evolutionAverageFitness(historic, password, size_population)

# Elapsed time in seconds
print(time.time() - start_time)
```

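The stability check in the program relies on simple arithmetic: the best_sample + lucky_few breeders form (best_sample + lucky_few) / 2 pairs, and each pair produces number_of_child children, so the next generation has (best_sample + lucky_few) / 2 × number_of_child individuals. With the values above, (20 + 20) / 2 × 5 = 100, which matches size_population. A small hypothetical helper (`required_breeders`, not part of the notebook) can compute how many breeders a given population size needs:

```python
def required_breeders(size_population, number_of_child):
    # Each pair of breeders yields number_of_child children,
    # so we need 2 * size_population / number_of_child breeders in total
    pairs, remainder = divmod(size_population, number_of_child)
    if remainder:
        raise ValueError("size_population must be a multiple of number_of_child")
    return 2 * pairs

print(required_breeders(100, 5))  # 40, e.g. best_sample = 20 and lucky_few = 20
```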