# Genetic Algorithms

Genetic algorithms are a family of algorithms, inspired by the theory of evolution, that can be applied to a range of problems. Most commonly, the problem being solved is an optimization problem: finding a solution that maximizes a fitness function. They are particularly useful for problems with very large solution spaces, where evaluating every possibility is computationally infeasible. They can also be used in conjunction with other machine learning methods (such as neural networks) to produce good, and sometimes optimal, solutions.

The approach of genetic algorithms is heavily inspired by the random mutations of genes that, over many generations, result in the formation of complex organisms. As such, they are nondeterministic and rely on randomness throughout the process. Generally, the algorithm creates a random initial population, although a biased initialization may be used if suitable. After this initial population is evaluated with the fitness function, the best specimens are selected and their children formed. This is done by combining the best of the population with a certain amount of randomness, using operations called **genetic operators**, which are:

* **selection**: Good specimens are chosen to proliferate in later generations; however, a portion of less fit specimens may be included to promote diversity, so that not all children are derived from the best
* **crossover**: Good specimens are combined to create children. There are many possible strategies; one such strategy is selecting the genes on which the parents are sufficiently similar
* **mutation**: Good specimens (either from the previous generation or produced by crossover) are randomly mutated

Not all of these operations have to be performed (although a selection is always made): depending on the problem, only mutation, only crossover, or a combination of both may be used.
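The three operators can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the method used later in this notebook: the function names are hypothetical, selection keeps the best specimens plus a few random extras for diversity, and crossover uses the common single-point strategy (the child takes a prefix from one parent and the rest from the other) rather than the similarity-based strategy mentioned above.

```python
import random

def select(population, scores, n_best, n_random):
    """Keep the n_best highest-scoring specimens plus n_random others picked at random."""
    ranked = [s for s, _ in sorted(zip(population, scores), key=lambda p: p[1], reverse=True)]
    rest = ranked[n_best:]
    return ranked[:n_best] + random.sample(rest, min(n_random, len(rest)))

def single_point_crossover(parent_a, parent_b):
    """Child = prefix of parent_a + suffix of parent_b (assumes at least 2 genes)."""
    cut = random.randint(1, len(parent_a) - 1)
    return parent_a[:cut] + parent_b[cut:]

def point_mutation(specimen, alphabet, rate):
    """Replace each gene independently with a random symbol with probability `rate`."""
    return [random.choice(alphabet) if random.random() < rate else gene for gene in specimen]

# Usage: one child from a toy population of 4-character specimens
population = [list("abcd"), list("abce"), list("zzzz"), list("abzz")]
scores = [3, 2, 0, 1]
parents = select(population, scores, n_best=2, n_random=1)
child = point_mutation(single_point_crossover(parents[0], parents[1]), list("abcdez"), rate=0.25)
print(parents, child)
```

Because crossover only recombines existing genes, mutation is what introduces characters that no parent has.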

A simple pseudocode for a genetic algorithm is:

```
generations = 0
while generations < max_generations and not stopping_condition_reached:
  evaluate_all_specimen
  select_best_specimen
  specimen = generate_children (by crossover and/or mutation)
  generations = generations + 1
```
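The pseudocode translates almost line by line into a generic Python loop. The sketch below is an illustration under stated assumptions: `run_ga` and the helper functions are hypothetical names, and the toy problem is "OneMax" (maximize the number of 1s in a bit string), chosen only because its fitness function is trivial.

```python
import random

def run_ga(init, fitness, select, make_children, max_generations, target):
    """Generic loop following the pseudocode; returns (best, best_fitness, generations)."""
    population = init()
    generations = 0
    while generations < max_generations:
        scores = [fitness(s) for s in population]   # evaluate all specimens
        if max(scores) >= target:                   # stopping condition
            break
        parents = select(population, scores)        # select the best
        population = make_children(parents, len(population))
        generations += 1
    best = max(population, key=fitness)
    return best, fitness(best), generations

# Toy problem (OneMax): specimens are lists of 12 bits, fitness = number of 1s
N_BITS, POP_SIZE = 12, 20

def init():
    return [[random.randint(0, 1) for _ in range(N_BITS)] for _ in range(POP_SIZE)]

def select_best(population, scores, n=5):
    ranked = sorted(zip(population, scores), key=lambda p: p[1], reverse=True)
    return [s for s, _ in ranked[:n]]

def mutate_children(parents, size):
    children = []
    for _ in range(size):
        child = list(random.choice(parents))
        i = random.randrange(len(child))
        child[i] = 1 - child[i]  # flip one random bit
        children.append(child)
    return children

best, score, generations = run_ga(init, sum, select_best, mutate_children,
                                  max_generations=200, target=N_BITS)
print(best, score, generations)
```

Passing the problem-specific pieces as functions keeps the loop itself reusable, which is the same goal the class-based example below pursues.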

Two things are notable regarding genetic algorithms:

* The representation of a specimen's genes may vary depending on the problem (for example, a bit string such as `[1,0,1,1,0,1]`, or a list of characters as in the example below)
* The most important part is the fitness (evaluation) function: choosing one is not an easy job, and the quality of the resulting solution depends almost entirely on it
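To see why the choice of fitness function matters, compare two hypothetical fitness functions for the string problem used below: one that only rewards an exact match, and one that gives partial credit per matching character. This is a minimal sketch; both function names are assumptions for illustration.

```python
TARGET = "Hello, world!"

def all_or_nothing(candidate):
    """1 only for an exact match, otherwise 0: gives no sense of 'closer'."""
    return 1.0 if candidate == TARGET else 0.0

def per_character(candidate):
    """Fraction of positions that already match the target."""
    return sum(a == b for a, b in zip(candidate, TARGET)) / len(TARGET)

for candidate in ["x" * 13, "Hello, world?"]:
    print(repr(candidate), all_or_nothing(candidate), per_character(candidate))
```

With `all_or_nothing`, the nearly correct `"Hello, world?"` scores 0, exactly like a string of 13 `x`s, so selection has nothing to prefer; `per_character` gives it 12/13.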

To illustrate, let's create a genetic algorithm that tries to guess a target string starting from random ones (i.e. transform a string like "vg 48ht5. 45tcxeg wawef" into "Hello, world!").

The first thing we need is a function to generate the initial population; in this case, the initial population consists of random strings with the same length as "Hello, world!". We will place all the functionality of the algorithm inside a class with a somewhat generic implementation, so that it can be reused for other problems later.


```python
import random

class HelloGenetic:
  def __init__(self, params):
    # Alphabet the genes (characters) are drawn from
    self.ALL_CHARACTERS = list("qwertyuiopasdfghjklñzxcvbnmQWERTYUIOPASDFGHJKLÑZXCVBNM. ,!¡")
    # Target string the algorithm tries to reach
    self.HELLO_WORLD = list("Hello, world!")
    self.params = params
    self.specimen = [None] * self.params["generation_size"]

    self.create_initial_population()

  def create_initial_population(self):
    # Each specimen is a random list of characters as long as the target.
    # random.choices draws with replacement, so repeated characters
    # (like the "l"s and "o"s of the target) can appear from the start.
    self.specimen = [
      random.choices(self.ALL_CHARACTERS, k=len(self.HELLO_WORLD))
      for _ in self.specimen
    ]

hello = HelloGenetic({"generation_size": 5})
print(hello.specimen)

# Fraction of positions where the specimen matches the target
def fitness(self, specimen):
  return sum(1 for expected, actual in zip(self.HELLO_WORLD, specimen) if expected == actual) / len(self.HELLO_WORLD)

print(fitness(hello, ['z', 'e', 'F', 'e', 'h', 'Y', 'I', 'u', 'ñ', 'l', '!', 'Q', ',']))
```

```
[['U', 'h', 'B', 'g', 'P', 'J', 'F', ' ', 'O', 'b', 'S', 'e', 'Z'], ['x', 'n', 'F', 'Y', 'Ñ', 'u', 'y', 'L', 'K', 'j', 'P', '¡', 'r'], ['y', 'K', 'd', 'f', 'P', 'Z', 'o', 'i', 'E', 'F', 'g', 'w', 'q'], ['l', 's', 'm', 'z', 'V', 'h', 'b', ',', 'k', 'O', 'y', ' ', 'P'], ['d', 'g', 'o', 'u', 'F', 'D', ' ', 't', 'J', 'T', 'j', 'L', 'G']]
0.07692307692307693
```


Now that we have this, we can declare the general genetic algorithm in the **run** function. We will add some empty functions for now.

```python
import random

class HelloGenetic:
  def __init__(self, params):
    self.ALL_CHARACTERS = list("qwertyuiopasdfghjklñzxcvbnmQWERTYUIOPASDFGHJKLÑZXCVBNM. ,!¡")
    self.HELLO_WORLD = list("Hello, world!")
    self.params = params
    self.specimen = [None] * self.params["generation_size"]

    self.create_initial_population()

  def create_initial_population(self):
    self.specimen = [
      random.choices(self.ALL_CHARACTERS, k=len(self.HELLO_WORLD))
      for _ in self.specimen
    ]

  # Placeholders, implemented in the following cells
  def is_converged(self):
    pass

  def get_fit(self):
    pass

  def fitness_all(self):
    pass

  def select_specimen(self, specimen_evaluations):
    pass

  def generate_children(self, selected_specimen):
    pass

  def run(self):
    generation_number = 1

    # Stop after max_generations or as soon as some specimen is good enough
    while generation_number <= self.params["max_generations"] and not self.is_converged():
      top_generation = self.get_fit()  # (best specimen, its fitness)

      print(f"Generation #{generation_number}:\t{top_generation[0]}\t{top_generation[1]}")

      specimen_evaluations = self.fitness_all()
      selected_specimen = self.select_specimen(specimen_evaluations)

      # Replace the whole population with the next generation
      self.specimen = self.generate_children(selected_specimen)

      generation_number += 1

    return self.get_fit()
```

Now we need to implement the **is_converged** function, which in turn needs the **fitness** function. **is_converged** checks whether any specimen has reached an acceptable threshold for the fitness function (the `fit_threshold` parameter). **fitness** performs a naive string similarity measure: the fraction of positions where the specimen matches the target string (there are better string similarity measures).

```python
# Methods of HelloGenetic

def fitness(self, specimen):
  # Naive similarity: fraction of positions matching the target
  return sum(1 for expected, actual in zip(self.HELLO_WORLD, specimen) if expected == actual) / len(self.HELLO_WORLD)

def is_converged(self):
  # Converged as soon as any specimen reaches the fitness threshold
  return any(self.fitness(specimen) >= self.params["fit_threshold"] for specimen in self.specimen)
```
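The position-by-position **fitness** is naive: a string that is correct but shifted by one character scores almost nothing. One common alternative is a fitness based on the Levenshtein (edit) distance, the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. The sketch below is not used by `HelloGenetic`; `levenshtein` and `edit_fitness` are hypothetical helper names, and `edit_fitness` assumes the specimen is a list of characters like the ones above.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (dynamic programming, one row at a time)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]

def edit_fitness(specimen, target="Hello, world!"):
    """Hypothetical alternative fitness in [0, 1]; 1 means identical strings."""
    candidate = "".join(specimen)
    longest = max(len(candidate), len(target)) or 1
    return 1 - levenshtein(candidate, target) / longest
```

For example, `"ello, world!H"` (the target rotated by one character) matches the target at only 1 of 13 positions, but it is just 2 edits away, so `edit_fitness` gives it 1 - 2/13, about 0.85. The price is speed: each evaluation costs time proportional to the product of the two string lengths.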

With that defined, it's time to write the **fitness_all** function, which evaluates every specimen.

```python
def fitness_all(self):
  # Fitness of every specimen, in population order
  return [self.fitness(specimen) for specimen in self.specimen]
```

We will implement the **select_specimen** function to keep the top fraction of the population, as given by the `select_top` parameter (e.g. 0.05 keeps the best 5%, rounded up).

```python
import math

def select_specimen(self, specimen_evaluations):
  # Pair each specimen with its score, e.g. [(['H', 'x', ...], 0.15), ...]
  specimen_and_evaluations = list(zip(self.specimen, specimen_evaluations))

  # Best first
  specimen_and_evaluations.sort(key=lambda e: e[1], reverse=True)

  # Keep the top `select_top` fraction, rounded up
  n_top = int(math.ceil(len(self.specimen) * self.params["select_top"]))

  return [specimen for specimen, _ in specimen_and_evaluations[:n_top]]
```
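**select_specimen** is pure truncation selection: only the top `select_top` fraction ever becomes a parent. As the description of the selection operator notes, keeping some weaker specimens can help diversity. One common way to do this, not used in this notebook, is fitness-proportionate ("roulette-wheel") selection, sketched here with Python's `random.choices`. The function name and the `epsilon` parameter are assumptions; `epsilon` gives zero-fitness specimens a small chance and avoids an all-zero weight list.

```python
import random

def select_roulette(population, evaluations, n_parents, epsilon=1e-6):
    """Hypothetical helper: draw n_parents specimens with replacement, each with
    probability proportional to its fitness plus epsilon."""
    weights = [score + epsilon for score in evaluations]
    return random.choices(population, weights=weights, k=n_parents)

# Usage: fitter specimens are picked more often, but weaker ones still can be
population = [list("Hellx"), list("Hxxxx"), list("xxxxx")]
evaluations = [0.8, 0.2, 0.0]
print(["".join(s) for s in select_roulette(population, evaluations, n_parents=5)])
```

Inside the class, the body of `select_specimen` could then become `return select_roulette(self.specimen, specimen_evaluations, n_top)`. The same specimen may be returned several times, which is harmless here because `mutate` works on a copy.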

Now the only missing functions are **generate_children** and **get_fit**. The first takes the selected specimens and completes a generation by mutating them, with the `mutation_percentage` parameter describing what fraction of the characters of each specimen should be mutated.

```python
import random

def mutate(self, specimen):
  # Number of characters (genes) to change, as a fraction of the specimen length
  n_digits = int(self.params["mutation_percentage"] * (len(specimen) - 1))

  # Distinct positions to mutate
  digit_indexes = random.sample(range(len(specimen)), n_digits)

  mutated = specimen[:]  # copy, so the parent is left untouched

  for idx in digit_indexes:
    # The new character may happen to equal the old one
    mutated[idx] = random.choice(self.ALL_CHARACTERS)

  return mutated

def generate_children(self, selected_specimen):
  # A whole new generation of mutated copies of randomly chosen parents
  return [self.mutate(random.choice(selected_specimen)) for _ in self.specimen]
```
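**generate_children** replaces the entire population, so the best specimen found so far can be lost if all of its mutated copies happen to be worse. A common remedy, not used in this notebook, is *elitism*: carrying the best specimen(s) over unchanged. Below is a sketch as a standalone function; its name and the `n_elite` parameter are hypothetical, and it takes the mutation function as an argument so it could be called as `generate_children_elitist(selected_specimen, len(self.specimen), self.mutate)`.

```python
import random

def generate_children_elitist(selected_specimen, population_size, mutate, n_elite=1):
    """Hypothetical elitist variant of generate_children.

    Assumes selected_specimen is sorted best-first (as select_specimen returns it).
    The first n_elite specimens are copied unchanged; the rest of the generation
    is filled with mutated copies of randomly chosen selected specimens.
    """
    elite = [list(s) for s in selected_specimen[:n_elite]]
    children = [mutate(random.choice(selected_specimen))
                for _ in range(population_size - len(elite))]
    return elite + children

# Usage with a stand-in mutation that replaces every character with "x"
selected = [list("Hello"), list("Hellx")]
new_generation = generate_children_elitist(selected, 5, lambda s: ["x"] * len(s))
print(["".join(s) for s in new_generation])
# → ['Hello', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx']
```

With elitism the best fitness can never decrease from one generation to the next.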

And now, the final function **get_fit** will just get the specimen with the highest fitness score.

```python
def get_fit(self):
  # Best specimen of the current generation and its fitness
  evaluations = self.fitness_all()

  max_evaluation = max(evaluations)

  max_index = evaluations.index(max_evaluation)

  return self.specimen[max_index], max_evaluation
```

Putting it all together:

```python
import random
import math

class HelloGenetic:
  def __init__(self, params):
    self.ALL_CHARACTERS = list("qwertyuiopasdfghjklñzxcvbnmQWERTYUIOPASDFGHJKLÑZXCVBNM. ,!¡")
    self.HELLO_WORLD = list("Hello, world!")
    self.params = params
    self.specimen = [None] * self.params["generation_size"]

    self.create_initial_population()

  def create_initial_population(self):
    # Random character lists as long as the target (drawn with replacement)
    self.specimen = [random.choices(self.ALL_CHARACTERS, k=len(self.HELLO_WORLD)) for _ in self.specimen]

  def fitness(self, specimen):
    # Fraction of positions matching the target
    return sum(1 for expected, actual in zip(self.HELLO_WORLD, specimen) if expected == actual) / len(self.HELLO_WORLD)

  def is_converged(self):
    return any(self.fitness(specimen) >= self.params["fit_threshold"] for specimen in self.specimen)

  def get_fit(self):
    # Best specimen of the current generation and its fitness
    evaluations = self.fitness_all()
    max_evaluation = max(evaluations)
    max_index = evaluations.index(max_evaluation)
    return self.specimen[max_index], max_evaluation

  def fitness_all(self):
    return [self.fitness(specimen) for specimen in self.specimen]

  def select_specimen(self, specimen_evaluations):
    # Keep the top `select_top` fraction of the population, best first
    specimen_and_evaluations = list(zip(self.specimen, specimen_evaluations))
    specimen_and_evaluations.sort(key=lambda e: e[1], reverse=True)
    # Use self.params, not a global `params`, so the class works on its own
    n_top = int(math.ceil(len(self.specimen) * self.params["select_top"]))
    return [specimen for specimen, _ in specimen_and_evaluations[:n_top]]

  def mutate(self, specimen):
    # Change a `mutation_percentage` fraction of the characters
    n_digits = int(self.params["mutation_percentage"] * (len(specimen) - 1))
    digit_indexes = random.sample(range(len(specimen)), n_digits)
    mutated = specimen[:]
    for idx in digit_indexes:
      mutated[idx] = random.choice(self.ALL_CHARACTERS)
    return mutated

  def generate_children(self, selected_specimen):
    # New generation: mutated copies of randomly chosen selected specimens
    return [self.mutate(random.choice(selected_specimen)) for _ in self.specimen]

  def run(self):
    generation_number = 1

    while generation_number <= self.params["max_generations"] and not self.is_converged():
      top_generation = self.get_fit()
      top_str = "".join(top_generation[0])

      print(f"Generation #{generation_number}:\t{top_str}\t{top_generation[1]}")

      specimen_evaluations = self.fitness_all()
      selected_specimen = self.select_specimen(specimen_evaluations)

      self.specimen = self.generate_children(selected_specimen)

      generation_number += 1

    return self.get_fit()
```

Now we try it out!

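```python
params = {
    "mutation_percentage": 0.1,  # fraction of characters mutated per child
    "select_top": 0.05,          # keep the best 5% as parents
    "generation_size": 20,
    "fit_threshold": 1,          # stop only on an exact match
    "max_generations": 1000
}

hello = HelloGenetic(params)
fit = hello.run()

print("".join(fit[0]), fit[1])
```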

Generation #1:	fDi¡LmsGhgadb	0.07692307692307693
Generation #2:	HDi¡LmsGhgadb	0.15384615384615385
Generation #3:	HDi¡LmqGhgadb	0.15384615384615385
Generation #4:	HDi¡LmqGhXadb	0.15384615384615385
Generation #5:	HDi¡ImqGhXadb	0.15384615384615385
Generation #6:	HDi¡ImqGhXadb	0.15384615384615385
Generation #7:	HDi¡ImqGhzadb	0.15384615384615385
Generation #8:	HDi¡vmqGhzadb	0.15384615384615385
Generation #9:	HDi¡vmqGhzadb	0.15384615384615385
Generation #10:	HDi¡vmqGhzudb	0.15384615384615385
Generation #11:	HDi¡vmqGhzudb	0.15384615384615385
Generation #12:	HDi¡vmKGhzudb	0.15384615384615385
Generation #13:	HDi¡vmKYhzudb	0.15384615384615385
Generation #14:	HDi¡vmKYhzddb	0.15384615384615385
Generation #15:	HDi¡vmKYhzddb	0.15384615384615385
Generation #16:	HDi¡vkKYhzddb	0.15384615384615385
Generation #17:	HDi¡vkKYhzkdb	0.15384615384615385
Generation #18:	HDi¡vk Yhzkdb	0.23076923076923078
Generation #19:	HDiOvk Yhzkdb	0.23076923076923078
Generation #20:	HDiOvk Yhgkdb	0.23076923076923078
Generatio