## Local Search - Genetic Algorithm

In [1]:
from pprint import pprint
from typing import List, Any, Dict, Callable
import random

## Genetic Algorithm

There are some key ideas in the Genetic Algorithm.

First, there is a problem of some kind that either *is* an optimization problem or the solution can be expressed in terms of an optimization problem.
For example, if we wanted to minimize the function

$$f(x) = \sum_{i=1}^{n} (x_i - 0.5)^2$$

where $n = 10$.
This *is* an optimization problem. Normally, optimization problems are much, much harder.

The function we wish to optimize is often called the **objective function**.
The objective function is closely related to the **fitness** function in the GA.
If we have a **maximization** problem, then we can use the objective function directly as a fitness function.
If we have a **minimization** problem, then we need to convert the objective function into a suitable fitness function, since fitness functions must always mean "more is better".
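Here is a minimal sketch of that conversion for the example objective above. It assumes $n = 10$ and plain Python lists for $x$, and the helper names `objective` and `fitness` are illustrative. A common choice is $1/(1 + f(x))$: because $f(x) \ge 0$, the result lies in $(0, 1]$, and it rises as $f$ falls.

```python
from typing import List

def objective(x: List[float]) -> float:
    # f(x) = sum of squared distances of each coordinate from 0.5 (to be minimized)
    return sum((x_i - 0.5) ** 2 for x_i in x)

def fitness(x: List[float]) -> float:
    # convert minimization into "more is better": f(x) >= 0, so this lies in (0, 1]
    return 1.0 / (1.0 + objective(x))

best = [0.5] * 10   # the global minimum, f = 0
worse = [0.0] * 10  # each term is 0.25, so f = 2.5
print(fitness(best), fitness(worse))
```

Negating the objective ($-f(x)$) also gives a "more is better" score. The reciprocal form has the advantage of staying positive.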

Second, we need to *encode* candidate solutions using an "alphabet" analogous to G, A, T, C in DNA.
This encoding can be quite abstract.

In genetics terminology, this encoding is the individual's **chromosome**. If the individual had the **phenotype** "h" as its first character, it would have the **genotype** for "h" (encoded as "h", 104, or 01101000).

To keep it straight, think **geno**type is **genes** and **pheno**type is **phenomenon**, the actual thing that the genes express.
So while we might encode a number as 10110110 (genotype), the number itself, 182, is what goes into the fitness function.
The environment operates on zebras, not the genes for stripes.

-Butcher, 2022
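Python's built-ins can check the encodings mentioned above. This is a small illustration and not part of the assignment code:

* `ord` gives the code point 104 for "h".
* `format(..., "08b")` writes 104 as eight bits.
* `int(..., 2)` turns the genotype 10110110 back into the phenotype 182.

```python
# genotype -> phenotype and back for the character "h"
code = ord("h")                 # 104
bits = format(code, "08b")      # "01101000"
assert chr(int(bits, 2)) == "h"

# an 8-bit genotype decodes to the number the fitness function actually sees
genotype = "10110110"
phenotype = int(genotype, 2)    # 182
print(code, bits, phenotype)
```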

## Problem Overview

### String Matching

Write a Genetic Algorithm that will solve the problem of matching a target string (at least at the start).
Now, this is kind of silly because in order for this to work, you need to know the target string, and if you know the target string, why are you trying to find it?
Well, the problem is *pedagogical*.
It's a fun way of visualizing the GA at work, because as the GA finds better and better candidates, they make more and more sense.

Now, string matching is not *directly* an optimization problem, so this falls under a general approach to problem solving: if we convert the problem into an optimization problem, we can solve it with an optimization algorithm.
This happens all the time.
We have a problem.
We can't solve it.
We convert it to a problem we *can* solve.
In this case, we're using the GA to solve the optimization part.

The **genotype** for this problem is a list of "characters":

```
["h", "e", "l", "l", "o"]
```

and the **phenotype** is the resulting string:

```
"hello"
```

Note: preceding Genetic Algorithm overview and assignment prompt by S. Butcher, 2022.

In [2]:
ALPHABET = "abcdefghijklmnopqrstuvwxyz "

<a id="encode"></a>
## encode

*The encode function takes a phenotype string and returns the genotype representation as a list of characters.* **Used by**: [generate_population](#generate_population)

* **phenotype** str: the actual value of the individual in the population

**returns** List[str].

In [3]:
def encode(phenotype: str) -> List[str]:
    return list(phenotype)   

In [4]:
# unit tests
assert encode("hello") == ["h", "e", "l", "l", "o"] 
assert encode("world") == ["w", "o", "r", "l", "d"]
assert encode("") == []

<a id="decode"></a>
## decode

*The decode function takes a genotype of the form List[str] and returns the phenotype representation as a string.* **Used by**: [reproduce](#reproduce)

* **genotype** List[str]: the encoded value of the individual in the population

**returns** str

In [5]:
def decode(genotype: List[str]) -> str:
    return ''.join(genotype)

In [6]:
# unit tests
assert decode(["h", "e", "l", "l", "o"]) == "hello"
assert decode(["w", "o", "r", "l", "d"]) == "world"
assert decode([]) == ""

<a id="generate_population"></a>
## generate_population

*The generate_population function takes a callable that generates a random individual and a population size. Each random individual is a string the length of the target, built from the valid characters in the genetic algorithm parameters. The function calls the generator population_size times and returns the resulting individuals, each stored with its genotype and phenotype.* **Used by**: [genetic_algorithm](#genetic_algorithm)

* **generate_individual** Callable: a function that generates random individuals the size of the target string using the available characters in the genetic algorithm parameters
* **population_size** int: the size of the population to generate

**returns** List[Dict[str, int | float | str]].

In [7]:
def generate_population(generate_individual: Callable, population_size: int) -> List[Dict[str, Any]]:
    generation = []
    for i in range(population_size):
        phenotype = generate_individual()
        genotype = encode(phenotype)
        individual = {"genotype": genotype, "phenotype": phenotype}               
        generation.append(individual)
    return generation

In [8]:
# unit tests
def test_generate_individual():
    # always return the same thing to test the generate population function
    return "this is a test"

test_1_results = [
        {"genotype": ['t', 'h', 'i', 's', ' ', 'i', 's', ' ', 'a', ' ', 't', 'e', 's', 't'],
        "phenotype": "this is a test"}
    ]

assert generate_population(test_generate_individual, 1) == test_1_results

test_2_results = [
        {"genotype": ['t', 'h', 'i', 's', ' ', 'i', 's', ' ', 'a', ' ', 't', 'e', 's', 't'],
        "phenotype": "this is a test"},
        {"genotype": ['t', 'h', 'i', 's', ' ', 'i', 's', ' ', 'a', ' ', 't', 'e', 's', 't'],
        "phenotype": "this is a test"}
    ]

assert generate_population(test_generate_individual, 2) == test_2_results

test_3_results = [
        {"genotype": ['t', 'h', 'i', 's', ' ', 'i', 's', ' ', 'a', ' ', 't', 'e', 's', 't'],
        "phenotype": "this is a test"},
        {"genotype": ['t', 'h', 'i', 's', ' ', 'i', 's', ' ', 'a', ' ', 't', 'e', 's', 't'],
        "phenotype": "this is a test"},
        {"genotype": ['t', 'h', 'i', 's', ' ', 'i', 's', ' ', 'a', ' ', 't', 'e', 's', 't'],
        "phenotype": "this is a test"}
    ]

assert generate_population(test_generate_individual, 3) == test_3_results
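In genetic_algorithm below, the generate_individual callable is a lambda built on random.choices. The standalone sketch below builds a reproducible generator for a 5-character target. The seeded random.Random instance and the name make_individual_generator are assumptions added for reproducibility; they are not part of the notebook.

```python
import random
from typing import Callable

def make_individual_generator(valid_chars: str, length: int, seed: int = 0) -> Callable[[], str]:
    # a private, seeded RNG makes the sequence of individuals reproducible
    rng = random.Random(seed)

    def generate_individual() -> str:
        # sample `length` characters with replacement from the valid alphabet
        return "".join(rng.choices(valid_chars, k=length))

    return generate_individual

generate = make_individual_generator("abcdefghijklmnopqrstuvwxyz ", length=5, seed=42)
individuals = [generate() for _ in range(3)]
print(individuals)
```

A generator like this could be passed to generate_population in place of the unseeded lambda when a run needs to be repeatable.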

<a id="evaluate_population"></a>
## evaluate_population

*The evaluate_population function calls the fitness_function on each individual's phenotype and the target to compute the individual's fitness score. The fitness_function varies for each problem. The function also records the current generation on each individual to track when the genetic algorithm converges. The individuals are updated in place and returned as a list.* **Used by**: [genetic_algorithm](#genetic_algorithm)

* **fitness_function** Callable: a function that takes a phenotype and the target and returns a fitness score.
* **population** List[Dict[str, int | float | str]]: the current generation of individuals. An individual is represented with the following metadata.

    {"phenotype": str, "genotype": List[str], "generation": int, "fitness": float}
* **generation** int: the current generation under evaluation.
* **target** str: the target solution used by the fitness function.

**returns** List[Dict[str, int | float | str]].

In [9]:
def evaluate_population(fitness_function: Callable, population: List[Dict[str, Any]], generation: int, target: str) -> List[Dict[str, Any]]:
    updated_population = []
    for individual in population:
        fitness = fitness_function(individual["phenotype"], target)
        individual["fitness"] = fitness
        individual["generation"] = generation
        updated_population.append(individual)
    return updated_population

In [10]:
# unit tests
def test_fitness_function(phenotype: str, target: str) -> float:
    # loss = number of positions where the phenotype differs from the target
    loss = 0
    for index, value in enumerate(target):
        if value != phenotype[index]:
            loss += 1
    return 1/(1+loss)
    
population_1 = [
    {'genotype': ['b', 'e', 's', 't'], 'phenotype': 'best'}, 
    {'genotype': ['f', 'i', 's', 't'], 'phenotype': 'fist'},
    {'genotype': ['b', 'e', 'e', 'n'], 'phenotype': 'been'}
    ]

target = "test"

expected_result_1 = [
    {'genotype': ['b', 'e', 's', 't'], 'phenotype': 'best', 'fitness': 0.5, 'generation': 1}, 
    {'genotype': ['f', 'i', 's', 't'], 'phenotype': 'fist', 'fitness': 0.3333333333333333, 'generation': 1}, 
    {'genotype': ['b', 'e', 'e', 'n'], 'phenotype': 'been', 'fitness': 0.25, 'generation': 1}
]

assert evaluate_population(test_fitness_function, population_1, 1, target) == expected_result_1

population_2 = [
    {'genotype': ['t', 'e', 's', 't'], 'phenotype': 'test'}, 
    {'genotype': ['f', 'i', 's', 't'], 'phenotype': 'fist'},
    {'genotype': ['b', 'e', 'e', 'n'], 'phenotype': 'been'}
    ]

expected_result_2 = [
    {'genotype': ['t', 'e', 's', 't'], 'phenotype': 'test', 'fitness': 1.0, 'generation': 1}, 
    {'genotype': ['f', 'i', 's', 't'], 'phenotype': 'fist', 'fitness': 0.3333333333333333, 'generation': 1}, 
    {'genotype': ['b', 'e', 'e', 'n'], 'phenotype': 'been', 'fitness': 0.25, 'generation': 1}
]

assert evaluate_population(test_fitness_function, population_2, 1, target) == expected_result_2

population_3 = [
    {'genotype': ['h', 'e', 'l', 'p'], 'phenotype': 'help'}, 
    {'genotype': ['t', 'e', 'r', 'm'], 'phenotype': 'term'},
    {'genotype': ['x', 'x', 'x', 'x'], 'phenotype': 'xxxx'}
    ]

expected_result_3 = [
    {'genotype': ['h', 'e', 'l', 'p'], 'phenotype': 'help', 'fitness': 0.25, 'generation': 2}, 
    {'genotype': ['t', 'e', 'r', 'm'], 'phenotype': 'term', 'fitness': 0.3333333333333333, 'generation': 2}, 
    {'genotype': ['x', 'x', 'x', 'x'], 'phenotype': 'xxxx', 'fitness': 0.2, 'generation': 2}
    ]

assert evaluate_population(test_fitness_function, population_3, 2, target) == expected_result_3
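evaluate_population writes fitness and generation into the dictionaries it receives, so the caller's population is updated as well. If that side effect is unwanted, one option is to build new dictionaries with dict unpacking. This is a sketch, and the names evaluate_population_copy and mismatch_fitness are hypothetical:

```python
from typing import Any, Callable, Dict, List

def evaluate_population_copy(fitness_function: Callable[[str, str], float],
                             population: List[Dict[str, Any]],
                             generation: int,
                             target: str) -> List[Dict[str, Any]]:
    # {**individual, ...} makes a shallow copy with the extra keys added,
    # leaving the caller's dictionaries untouched
    return [
        {**individual,
         "fitness": fitness_function(individual["phenotype"], target),
         "generation": generation}
        for individual in population
    ]

def mismatch_fitness(phenotype: str, target: str) -> float:
    # 1 / (1 + number of mismatched positions); matches the notebook's
    # test_fitness_function for equal-length strings
    loss = sum(1 for p, t in zip(phenotype, target) if p != t)
    return 1 / (1 + loss)

population = [{"genotype": list("best"), "phenotype": "best"}]
evaluated = evaluate_population_copy(mismatch_fitness, population, 1, "test")
print(evaluated[0]["fitness"], "fitness" in population[0])
```

The copy is shallow, so each genotype list is still shared between the old and new dictionaries. That is harmless here because evaluation never modifies it.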

<a id="get_best"></a>
## get_best
*The get_best function takes a population or subpopulation and returns the individual with the highest fitness score.* **Used by**: [genetic_algorithm](#genetic_algorithm), [tournament_selection](#tournament_selection)

* **population** List[Dict[str, int | float | str]]: the current generation of individuals. An individual is represented with the following metadata.

    {"phenotype": str, "genotype": List[str], "generation": int, "fitness": float}

**returns** Dict[str, int | float | str].

In [11]:
def get_best(population: List[Dict[str, Any]]) -> Dict[str, Any]:
    return max(population , key=lambda score: score["fitness"])

In [12]:
# unit tests
test_population = [
    {'genotype': ['t', 'e', 's', 't', '1'], 'phenotype': 'test1', 'fitness': 1.0, 'generation': 1}, 
    {'genotype': ['t', 'e', 's', 't', '2'], 'phenotype': 'test2', 'fitness': 0.3333333333333333, 'generation': 1}, 
    {'genotype': ['t', 'e', 's', 't', '3'], 'phenotype': 'test3', 'fitness': 0.25, 'generation': 1}
]

assert get_best(test_population) == {'genotype': ['t', 'e', 's', 't', '1'], 'phenotype': 'test1', 'fitness': 1.0, 'generation': 1}

test_population[0]["fitness"] = 0.0

assert get_best(test_population) == {'genotype': ['t', 'e', 's', 't', '2'], 'phenotype': 'test2', 'fitness': 0.3333333333333333, 'generation': 1}

test_population[1]["fitness"] = 0.0

assert get_best(test_population) == {'genotype': ['t', 'e', 's', 't', '3'], 'phenotype': 'test3', 'fitness': 0.25, 'generation': 1}
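When several individuals share the top fitness, Python's max returns the first maximal item it encounters. As a result, get_best favours individuals that appear earlier in the list. This quick standalone check applies the same max-with-key rule:

```python
tied = [
    {"phenotype": "aaaa", "fitness": 0.5},
    {"phenotype": "bbbb", "fitness": 0.5},
    {"phenotype": "cccc", "fitness": 0.25},
]

# same rule as get_best: the highest fitness wins, and the first of equals is kept
winner = max(tied, key=lambda individual: individual["fitness"])
print(winner["phenotype"])
```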

<a id="reproduce"></a>
## reproduce
*The reproduce function takes two parents and returns a list containing two children. Crossover is applied when rand_c is less than the crossover_rate, and mutation is applied when rand_m is less than the mutation_rate. Both rates come from the genetic algorithm parameters. The random values rand_c and rand_m, and the random character and index generators, are passed in rather than generated inside the function, which lets the unit tests below supply fixed values.* **Used by**: [get_next_population](#get_next_population)

* **parent_1** Dict[str, int | float | str]: the first parent.
* **parent_2** Dict[str, int | float | str]: the second parent.
* **parameters** Dict[str, Any]: the parameters provided for this run of the genetic algorithm.
* **rand_c** float: randomly generated float value to determine if crossover occurs.
* **rand_m** float: randomly generated float value to determine if mutation occurs.
* **rand_char** Callable: a function to randomly generate the mutation value.
* **rand_idx** Callable: a function to randomly generate the mutation and crossover indices.

**returns** List[Dict[str, int | float | str]].

In [13]:
def reproduce(parent_1: Dict[str, Any], parent_2: Dict[str, Any], parameters: Dict[str, Any], rand_c: float, rand_m: float, rand_char: Callable, rand_idx: Callable) -> List[Dict[str, Any]]:
    # copy the parents' genotypes so that mutating a child never changes a parent
    son = {"genotype": list(parent_1["genotype"])}
    daughter = {"genotype": list(parent_2["genotype"])}

    # single-point crossover: each child keeps one parent's head and takes the other parent's tail
    if rand_c < parameters["crossover_rate"]:
        crossover_index = rand_idx()
        parent_1_head, parent_1_tail = parent_1["genotype"][:crossover_index], parent_1["genotype"][crossover_index:]
        parent_2_head, parent_2_tail = parent_2["genotype"][:crossover_index], parent_2["genotype"][crossover_index:]
        son["genotype"] = parent_1_head + parent_2_tail
        daughter["genotype"] = parent_2_head + parent_1_tail
    # mutation: overwrite the same gene in both children with one random character
    if rand_m < parameters["mutation_rate"]:
        mutation_index = rand_idx()
        mutation_value = rand_char()
        son["genotype"][mutation_index] = mutation_value
        daughter["genotype"][mutation_index] = mutation_value

    # rebuild the phenotypes from the (possibly modified) genotypes
    son["phenotype"] = decode(son["genotype"])
    daughter["phenotype"] = decode(daughter["genotype"])
    return [son, daughter]

In [14]:
# unit tests
test_parameters = {
    "mutation_rate": 0.05,
    "crossover_rate": 0.95
    }
parent_1 = {"genotype": ['h', 'e', 'l', 'l', 'o'], "phenotype": "hello"}
parent_2 = {"genotype": ['w', 'o', 'r', 'l', 'd'], "phenotype": "world"}

test_rand_char = lambda : "x"
test_rand_idx = lambda : 2

# should crossover, not mutate
test_child_1, test_child_2 = reproduce(parent_1, parent_2, test_parameters, 0.90, 0.1, test_rand_char, test_rand_idx)
assert test_child_1["phenotype"] == "herld"
assert test_child_2["phenotype"] == "wollo"

# should mutate, not crossover
test_child_1, test_child_2 = reproduce(parent_1, parent_2, test_parameters, 0.96, 0.04, test_rand_char, test_rand_idx)
assert test_child_1["phenotype"] == "hexlo"
assert test_child_2["phenotype"] == "woxld"

# should crossover and mutate
test_child_1, test_child_2 = reproduce(parent_1, parent_2, test_parameters, 0.94, 0.04, test_rand_char, test_rand_idx)
assert test_child_1["phenotype"] == "hexld"
assert test_child_2["phenotype"] == "woxlo"
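Python lists are mutable, and assignment does not copy them. A child that reuses a parent's genotype list therefore shares it: mutating the child's gene also changes the parent's. This is why a child's genotype should be a copy before mutation is applied. Otherwise a parent, which can be selected several times in get_next_population, accumulates mutations. The standalone snippet below shows sharing versus copying. `list(...)` makes a shallow copy, which is enough for a list of single-character strings.

```python
parent = {"genotype": ["h", "e", "l", "l", "o"], "phenotype": "hello"}

# sharing: both dictionaries point at the same list object
shared_child = {"genotype": parent["genotype"]}
shared_child["genotype"][2] = "x"
print(parent["genotype"])       # the parent's genes changed as well

# copying: the child gets its own list
parent["genotype"][2] = "l"     # restore the parent
copied_child = {"genotype": list(parent["genotype"])}
copied_child["genotype"][2] = "x"
print(parent["genotype"])       # the parent is unchanged
```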

<a id="get_tournaments"></a>
## get_tournaments
*The get_tournaments function takes the population and tournament size and builds two tournaments. Each tournament draws tournament_size random individuals from the population with replacement (via random.choices), so an individual can appear more than once in a tournament. The tournament_selection function uses the two tournaments to select the parents.* **Used by**: [tournament_selection](#tournament_selection)


* **population** List[Dict[str, int | float | str]]: the current generation of individuals. An individual is represented with the following metadata.

    {"phenotype": str, "genotype": List[str], "generation": int, "fitness": float}
* **tournament_size** int: the tournament size provided in the genetic algorithm parameters for this run.

**returns** List[List[Dict[str, int | float | str]]].

In [15]:
def get_tournaments(population: List[Dict[str, Any]], tournament_size: int) -> List[List[Dict[str, Any]]]:
    tournament_1 = random.choices(population, k=tournament_size)
    tournament_2 = random.choices(population, k=tournament_size)
    return [tournament_1, tournament_2]

In [16]:
# unit tests
test_population = [
    {'genotype': ['t', 'e', 's', 't', '1'], 'phenotype': 'test1', 'fitness': 1.0, 'generation': 1}, 
    {'genotype': ['t', 'e', 's', 't', '2'], 'phenotype': 'test2', 'fitness': 0.3333333333333333, 'generation': 1}, 
    {'genotype': ['t', 'e', 's', 't', '3'], 'phenotype': 'test3', 'fitness': 0.25, 'generation': 1}
]

result1 = get_tournaments(test_population, 1)
# assert two tournaments returned of size 1 each
assert len(result1) == 2
assert len(result1[0]) == 1
assert len(result1[1]) == 1
# assert that random values are in the population
assert result1[0][0] in test_population

result2 = get_tournaments(test_population, 2)
# assert two tournaments returned of size 2 each
assert len(result2) == 2
assert len(result2[0]) == 2
assert len(result2[1]) == 2
# assert that random values are in the population
assert result2[0][0] in test_population
assert result2[0][1] in test_population

result3 = get_tournaments(test_population, 3)
# assert two tournaments returned of size 3 each
assert len(result3) == 2
assert len(result3[0]) == 3
assert len(result3[1]) == 3
# assert that random values are in the population
assert result3[0][0] in test_population
assert result3[0][1] in test_population
assert result3[0][2] in test_population
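random.choices samples with replacement, so the same individual can appear more than once in a tournament. If distinct entrants are preferred, random.sample draws without replacement. Note that random.sample raises ValueError if the tournament is larger than the population. Here is a sketch of that variant; the name get_tournaments_distinct and the explicit rng argument are hypothetical:

```python
import random
from typing import Any, Dict, List

def get_tournaments_distinct(population: List[Dict[str, Any]], tournament_size: int,
                             rng: random.Random) -> List[List[Dict[str, Any]]]:
    # random.sample picks distinct positions, so no individual appears twice
    # within a single tournament (the two tournaments may still overlap)
    tournament_1 = rng.sample(population, k=tournament_size)
    tournament_2 = rng.sample(population, k=tournament_size)
    return [tournament_1, tournament_2]

population = [{"phenotype": f"ind{i}", "fitness": i / 10} for i in range(10)]
t1, t2 = get_tournaments_distinct(population, 4, random.Random(0))
print([ind["phenotype"] for ind in t1])
```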

<a id="tournament_selection"></a>
## tournament_selection
*The tournament_selection function calls the provided get_tournaments callable to obtain two tournaments. It selects the first parent as the fittest individual in tournament one and the second parent as the fittest individual in tournament two, and returns both parents in a list.* **Used by**: [get_next_population](#get_next_population)

* **get_tournaments** Callable: a function that returns two randomly selected tournaments from the population of size tournament_size.
* **population** List[Dict[str, int | float | str]]: the current generation of individuals. An individual is represented with the following metadata.

    {"phenotype": str, "genotype": List[str], "generation": int, "fitness": float}
* **tournament_size** int: the tournament size provided in the genetic algorithm parameters for this run.

**returns** List[Dict[str, int | float | str]].

In [17]:
def tournament_selection(get_tournaments: Callable, population: List[Dict[str, Any]], tournament_size: int) -> List[Dict[str, Any]]:
    tournament_1, tournament_2 = get_tournaments(population, tournament_size)
    parent_1 = get_best(tournament_1)
    parent_2 = get_best(tournament_2)
    return [parent_1, parent_2]

In [18]:
# unit tests
def test_get_tournaments(tournaments: List[List[Dict[str, Any]]], tournament_size: int) -> List[List[Dict[str, Any]]]:
    return tournaments

tournament1 = [
    {'genotype': ['t', 'e', 's', 't', '1'], 'phenotype': 'test1', 'fitness': 1.0, 'generation': 1}, 
    {'genotype': ['t', 'e', 's', 't', '3'], 'phenotype': 'test3', 'fitness': 0.25, 'generation': 1}
    ]
tournament2 = [
    {'genotype': ['t', 'e', 's', 't', '3'], 'phenotype': 'test3', 'fitness': 0.25, 'generation': 1}, 
    {'genotype': ['t', 'e', 's', 't', '2'], 'phenotype': 'test2', 'fitness': 0.3333333333333333, 'generation': 1}
    ]

expected_response_1 = [
    {'genotype': ['t', 'e', 's', 't', '1'], 'phenotype': 'test1', 'fitness': 1.0, 'generation': 1},
    {'genotype': ['t', 'e', 's', 't', '2'], 'phenotype': 'test2', 'fitness': 0.3333333333333333, 'generation': 1}
    ]

assert tournament_selection(test_get_tournaments, [tournament1, tournament2], 2) == expected_response_1

tournament1[0]["fitness"] = 0.00

expected_response_2 = [
    {'genotype': ['t', 'e', 's', 't', '3'], 'phenotype': 'test3', 'fitness': 0.25, 'generation': 1},
    {'genotype': ['t', 'e', 's', 't', '2'], 'phenotype': 'test2', 'fitness': 0.3333333333333333, 'generation': 1}
    ]

assert tournament_selection(test_get_tournaments, [tournament1, tournament2], 2) == expected_response_2

tournament2[1]["fitness"] = 0.00

expected_response_3 = [
    {'genotype': ['t', 'e', 's', 't', '3'], 'phenotype': 'test3', 'fitness': 0.25, 'generation': 1},
    {'genotype': ['t', 'e', 's', 't', '3'], 'phenotype': 'test3', 'fitness': 0.25, 'generation': 1}
    ]

assert tournament_selection(test_get_tournaments, [tournament1, tournament2], 2) == expected_response_3
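Tournament size controls selection pressure. Because get_tournaments samples with replacement, the single fittest of N individuals enters a tournament of size k with probability $1 - (1 - 1/N)^k$, and it wins every tournament it enters. The small simulation below compares that formula with observed win rates for the tournament sizes used in Problems 1 and 2 (6 and 3). The population size, seed, and trial count are arbitrary choices for illustration.

```python
import random

def best_win_rate(population_size: int, tournament_size: int,
                  trials: int, rng: random.Random) -> float:
    # individual i has fitness i, so population_size - 1 is the unique fittest
    fittest = population_size - 1
    wins = 0
    for _ in range(trials):
        # sample with replacement, as get_tournaments does with random.choices
        tournament = rng.choices(range(population_size), k=tournament_size)
        if max(tournament) == fittest:
            wins += 1
    return wins / trials

def expected_win_rate(population_size: int, tournament_size: int) -> float:
    # probability that the fittest is drawn at least once in k draws
    return 1 - (1 - 1 / population_size) ** tournament_size

rng = random.Random(1)
for k in (3, 6):
    print(k, round(best_win_rate(20, k, 20_000, rng), 3), round(expected_win_rate(20, k), 3))
```

Larger tournaments favour the fittest individuals more strongly. Smaller tournaments give weaker individuals more chances to reproduce.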

<a id="get_next_population"></a>
## get_next_population
*The get_next_population function takes the current population, the genetic algorithm parameters, and the randomization callables used by the reproduce function. It selects parents with the selection_function from the parameters. If a tournament_size parameter is present, the selection function is called with get_tournaments, the population, and the tournament size. Otherwise it is called with the population alone. Each of the population_size // 2 iterations produces two children, so an odd population_size yields one fewer child than requested.* **Used by**: [genetic_algorithm](#genetic_algorithm)

* **population** List[Dict[str, int | float | str]]: the current generation of individuals. An individual is represented with the following metadata.
  
    {"phenotype": str, "genotype": List[str], "generation": int, "fitness": float}
* **parameters** Dict[str, Any]: the parameters used for this run of the genetic algorithm. 
* **rand_float** Callable: a function to produce random floats for the reproduce function
* **rand_char** Callable: a function to produce a random mutation char for the reproduce function 
* **rand_idx** Callable: a function to produce a random index for the reproduce function

**returns** List[Dict[str, int | float | str]].

In [19]:
def get_next_population(population: List[Dict[str, Any]], parameters: Dict[str, Any], rand_float: Callable, rand_char: Callable, rand_idx: Callable) -> List[Dict[str, Any]]:
    next_population = []
    for _ in range(parameters["population_size"] // 2):
        # determine which type of selection to use based on the existence of
        # a tournament_size parameter.
        if "tournament_size" in parameters:
            parent_1, parent_2 = parameters["selection_function"](get_tournaments, population, parameters["tournament_size"])
        else:
            parent_1, parent_2 = parameters["selection_function"](population)
    
        child_1, child_2 = reproduce(parent_1, parent_2, parameters, rand_float(), rand_float(), rand_char, rand_idx)
        next_population += [child_1, child_2]
    return next_population

In [20]:
# unit tests
test_population = [
    {"genotype": ['h', 'e', 'l', 'l', 'o'], "phenotype": "hello", "fitness": 1.0},
    {"genotype": ['w', 'o', 'r', 'l', 'd'], "phenotype": "world", "fitness": 0.5}
]
parent_1 = {"genotype": ['h', 'e', 'l', 'l', 'o'], "phenotype": "hello", "fitness": 1.0}
parent_2 = {"genotype": ['w', 'o', 'r', 'l', 'd'], "phenotype": "world", "fitness": 0.5}

def test_get_tournament() -> List[str]:
    return []
def test_selection_tournament(get_tournaments_fn: Callable, population: List[Dict[str, Any]], tournament_size: int) -> List[Dict[str, Any]]:
    return [parent_1, parent_2]
def test_selection_other(population: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    return [parent_1, parent_2]
def rand_float() -> float:
    return 0.96
def rand_char() -> str:
    return "l"
def rand_idx() -> int:
    return 0

test_parameters = {
    "population_size": 2,
    "selection_function": test_selection_tournament,
    "tournament_size": 2,
    "crossover_rate": 0.95,
    "mutation_rate": 0.05
}

expected_population_1 = [
    {'genotype': ['h', 'e', 'l', 'l', 'o'], 'phenotype': 'hello'}, 
    {'genotype': ['w', 'o', 'r', 'l', 'd'], 'phenotype': 'world'}
    ]

assert get_next_population(test_population, test_parameters, rand_float, rand_char, rand_idx) == expected_population_1

# test with different selection function
test_parameters["selection_function"] = test_selection_other
del test_parameters["tournament_size"]

assert get_next_population(test_population, test_parameters, rand_float, rand_char, rand_idx) == expected_population_1

# change population size 
test_parameters["population_size"] = 4

test_population_2 = [
    {"genotype": ['h', 'e', 'l', 'l', 'o'], "phenotype": "hello", "fitness": 1.0},
    {"genotype": ['w', 'o', 'r', 'l', 'd'], "phenotype": "world", "fitness": 0.5},
    {"genotype": ['w', 'l', 'l', 'l', 'd'], "phenotype": "wllld", "fitness": 0.25},
    {"genotype": ['w', 'w', 'r', 'l', 'd'], "phenotype": "wwrld", "fitness": 0.2}
]

expected_population_2 = [
    {'genotype': ['h', 'e', 'l', 'l', 'o'], 'phenotype': 'hello'}, 
    {'genotype': ['w', 'o', 'r', 'l', 'd'], 'phenotype': 'world'},
    {'genotype': ['h', 'e', 'l', 'l', 'o'], 'phenotype': 'hello'}, 
    {'genotype': ['w', 'o', 'r', 'l', 'd'], 'phenotype': 'world'}
    ]

assert get_next_population(test_population_2, test_parameters, rand_float, rand_char, rand_idx) == expected_population_2

# test with invalid population size
test_parameters["population_size"] = 0

assert get_next_population(test_population_2, test_parameters, rand_float, rand_char, rand_idx) == []

<a id="maybe_print_best"></a>
## maybe_print_best
*The maybe_print_best function takes the current generation number and that generation's best individual. It prints the individual whenever the generation number is a multiple of 10, to show search progress.* **Used by**: [genetic_algorithm](#genetic_algorithm)

* **generation** int: the current generation.
* **best_individual** Dict[str, Any]: the best individual of the current generation

**returns** None.

In [21]:
def maybe_print_best(generation: int, best_individual: Dict[str, Any]) -> None:
    # print every 10 generations
    if generation%10 == 0:
        print(f"generation: {generation}, best_individual: {best_individual}")

In [22]:
# no unit tests: maybe_print_best only prints output and always returns None.

<a id="genetic_algorithm"></a>
## genetic_algorithm

*The genetic_algorithm function takes a target string and the parameters for a given problem and outputs the best individual found in all generations. The genetic algorithm is based on the theory of evolution in that it applies the "survival of the fittest" method to determine the best-fit parents to use in reproduction for each generation. Population individuals are represented by a phenotype (true value) and genotype (chromosome) encoding. The genotype encoding is used in algorithm operations such as reproduction. The algorithm applies crossover and mutation to the children with a probability equal to the crossover and mutation rates provided as parameters. Crossover involves using a crossover index to assign a section of parent 1 and parent 2 to each child. Mutation involves applying a randomly selected mutation value to a randomly selected mutation index with a probability of the mutation rate. The algorithm accepts the selection_function and fitness_function as parameters because these may vary between problems. The algorithm prints the best individual of the generation every ten generations to show search progress. The algorithm outputs the best individual of the entire run. The incoming algorithm parameters are outlined below.*

* **target** str: the target string for the problem
* **parameters** Dict[str, Any]: the parameters used for this run of the genetic algorithm.
    * **population_size** int: the size of each population
    * **generations** int: the number of generations
    * **crossover_rate** float: the probability of crossover
    * **mutation_rate** float: the probability of mutation
    * **tournament_size** int (optional): the size of each tournament
    * **evaluation_function** Callable: the fitness_function to determine individual fitness
    * **selection_function** Callable: the selection function to select the next parents
    * **valid_chars** str: a string containing the valid characters for this problem

**returns** Dict[str, int | float | str].

In [23]:
def genetic_algorithm(target: str, parameters: Dict[str, Any]):
    current_best = {"generation": 0, "genotype": [], "phenotype": "", "fitness": 0.00}
    generation = 1
    # functions to generate random values
    random_float = lambda : random.uniform(0.00, 1.00)
    random_char = lambda : random.choice(parameters["valid_chars"])
    random_index = lambda : random.randint(0, len(target)-1)
    random_individual = lambda : ''.join(random.choices(parameters["valid_chars"], k=len(target)))
    # get and evaluate population
    population = generate_population(random_individual, parameters["population_size"])
    evaluated_population = evaluate_population(parameters["evaluation_function"], population, generation, target)
    while generation <= parameters["generations"]:
        next_population = get_next_population(evaluated_population, parameters, random_float, random_char, random_index)
        evaluated_population = evaluate_population(parameters["evaluation_function"], next_population, generation, target)
        best_individual = get_best(evaluated_population)
        maybe_print_best(generation, best_individual)
        if best_individual["fitness"] > current_best["fitness"]:
            current_best = best_individual
        generation += 1      
    return current_best # return the best individual of the entire run

## Problem 1

The target is the string "this is so much fun".
The challenge, aside from implementing the basic algorithm, is deriving a fitness function.
The fitness function should compute a fitness score from element-by-element comparisons between the target and the phenotype.

In [24]:
target1 = "this is so much fun"

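def fitness_function_1(phenotype: str, target: str) -> float:
    # loss = number of positions where the phenotype differs from the target
    loss = 0
    for index, value in enumerate(target):
        if value != phenotype[index]:
            loss += 1
    # loss 0 -> fitness 1.0; more mismatches -> lower fitness
    return 1/(1+loss)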
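As a sanity check on this fitness function, the same loss (the number of positions where two equal-length strings differ) can be written with zip. The standalone version below uses the hypothetical name hamming_fitness. It reproduces the fitness of 1/3 logged at generation 10 for "thiskisiso much fun", which differs from the target in two positions.

```python
def hamming_fitness(phenotype: str, target: str) -> float:
    # count positions where the candidate and the target disagree
    # (zip stops at the shorter string, so equal lengths are assumed)
    loss = sum(1 for p, t in zip(phenotype, target) if p != t)
    # 0 mismatches -> 1.0; each extra mismatch lowers the score
    return 1 / (1 + loss)

sentence = "this is so much fun"
print(hamming_fitness("thiskisiso much fun", sentence))  # 2 mismatches
print(hamming_fitness(sentence, sentence))
```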

In [25]:
# set up if you need it.
algorithm_params_1 = {
    "population_size": 2000, 
    "generations": 100, 
    "crossover_rate": 0.95, 
    "mutation_rate": 0.05,
    "tournament_size": 6,
    "evaluation_function": fitness_function_1,
    "selection_function": tournament_selection,
    "valid_chars": ALPHABET
    }

In [26]:
result1 = genetic_algorithm(target1, algorithm_params_1) # do what you need to do for your implementation but don't change the lines above or below.

generation: 10, best_individual: {'genotype': ['t', 'h', 'i', 's', 'k', 'i', 's', 'i', 's', 'o', ' ', 'm', 'u', 'c', 'h', ' ', 'f', 'u', 'n'], 'phenotype': 'thiskisiso much fun', 'fitness': 0.3333333333333333, 'generation': 10}
generation: 20, best_individual: {'genotype': ['t', 'h', 'i', 's', ' ', 'i', 's', ' ', 's', 'o', ' ', 'm', 'u', 'c', 'h', ' ', 'f', 'u', 'n'], 'phenotype': 'this is so much fun', 'fitness': 1.0, 'generation': 20}
generation: 30, best_individual: {'genotype': ['t', 'h', 'i', 's', ' ', 'i', 's', ' ', 's', 'o', ' ', 'm', 'u', 'c', 'h', ' ', 'f', 'u', 'n'], 'phenotype': 'this is so much fun', 'fitness': 1.0, 'generation': 30}
generation: 40, best_individual: {'genotype': ['t', 'h', 'i', 's', ' ', 'i', 's', ' ', 's', 'o', ' ', 'm', 'u', 'c', 'h', ' ', 'f', 'u', 'n'], 'phenotype': 'this is so much fun', 'fitness': 1.0, 'generation': 40}
generation: 50, best_individual: {'genotype': ['t', 'h', 'i', 's', ' ', 'i', 's', ' ', 's', 'o', ' ', 'm', 'u', 'c', 'h', ' ', 'f', '

In [27]:
pprint(result1, compact=True)

{'fitness': 1.0,
 'generation': 13,
 'genotype': ['t', 'h', 'i', 's', ' ', 'i', 's', ' ', 's', 'o', ' ', 'm', 'u',
              'c', 'h', ' ', 'f', 'u', 'n'],
 'phenotype': 'this is so much fun'}


## Problem 2

The best individual in the population is the one who expresses this string *forwards*.

In [28]:
target2 = "nuf hcum os si siht"

In [29]:
def fitness_function_2(phenotype: str, target: str) -> float:
    loss = 0
    for index, value in enumerate(phenotype):
        if value != target[-index-1]:
            loss += 1
    return 1/(1+loss)

algorithm_params_2 = {
    "population_size": 2000, 
    "generations": 100, 
    "crossover_rate": 0.95, 
    "mutation_rate": 0.05,
    "tournament_size": 3,
    "evaluation_function": fitness_function_2,
    "selection_function": tournament_selection,
    "valid_chars": ALPHABET
    }
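In fitness_function_2, position i of the candidate is compared with target[-i-1], which is the gene at position len(target) - 1 - i. This means no reversed copy of the target is ever built. The standalone check below uses an equivalent function with the hypothetical name mirror_fitness. It includes the fitness of 0.2 logged at generation 10:

```python
def mirror_fitness(phenotype: str, target: str) -> float:
    n = len(target)
    loss = 0
    for i, gene in enumerate(phenotype):
        # compare gene i with the gene at the mirrored position of the target
        if gene != target[n - 1 - i]:
            loss += 1
    return 1 / (1 + loss)

reversed_target = "nuf hcum os si siht"
print(mirror_fitness("this is so much fun", reversed_target))
print(mirror_fitness("this ie sowhuvh fun", reversed_target))
```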

In [30]:
result2 = genetic_algorithm(target2, algorithm_params_2) # do what you need to do for your implementation but don't change the lines above or below.

generation: 10, best_individual: {'genotype': ['t', 'h', 'i', 's', ' ', 'i', 'e', ' ', 's', 'o', 'w', 'h', 'u', 'v', 'h', ' ', 'f', 'u', 'n'], 'phenotype': 'this ie sowhuvh fun', 'fitness': 0.2, 'generation': 10}
generation: 20, best_individual: {'genotype': ['t', 'h', 'i', 's', ' ', 'i', 's', ' ', 's', 'o', ' ', 'm', 'u', 'c', 'h', ' ', 'f', 'u', 'n'], 'phenotype': 'this is so much fun', 'fitness': 1.0, 'generation': 20}
generation: 30, best_individual: {'genotype': ['t', 'h', 'i', 's', ' ', 'i', 's', ' ', 's', 'o', ' ', 'm', 'u', 'c', 'h', ' ', 'f', 'u', 'n'], 'phenotype': 'this is so much fun', 'fitness': 1.0, 'generation': 30}
generation: 40, best_individual: {'genotype': ['t', 'h', 'i', 's', ' ', 'i', 's', ' ', 's', 'o', ' ', 'm', 'u', 'c', 'h', ' ', 'f', 'u', 'n'], 'phenotype': 'this is so much fun', 'fitness': 1.0, 'generation': 40}
generation: 50, best_individual: {'genotype': ['t', 'h', 'i', 's', ' ', 'i', 's', ' ', 's', 'o', ' ', 'm', 'u', 'c', 'h', ' ', 'f', 'u', 'n'], 'phen

In [31]:
pprint(result2, compact=True)

{'fitness': 1.0,
 'generation': 16,
 'genotype': ['t', 'h', 'i', 's', ' ', 'i', 's', ' ', 's', 'o', ' ', 'm', 'u',
              'c', 'h', ' ', 'f', 'u', 'n'],
 'phenotype': 'this is so much fun'}


## Problem 3

A Caesar cipher replaces each letter of a string with the letter a fixed number of positions further down the alphabet (rotating from "z" back to "a" as needed).
With a shift of 13, it is known as ROT13 (for "rotate 13").
Classical Latin was generally written without spaces (and the space is not contiguous with the letters a-z), so we'll remove it from our alphabet.
Again, the goal is to derive a fitness function that compares a single gene against a single gene, without global transformations.

<div style="background: lemonchiffon; margin:20px; padding: 20px;">
    <strong>Important</strong>
    <p>
        You may not apply ROT13 to an entire string (either target or candidate) at any time.
        Everything must be a computation of one gene against one gene.
    </p>
</div>

The best individual will express the target *decoded*.
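On a 26-letter alphabet, ROT13 is its own inverse (13 + 13 = 26), so shifting a candidate gene forward by 13 and comparing it with the target gene is equivalent to decoding the target gene and comparing it with the candidate. The sketch below works on one character at a time, which respects the one-gene-against-one-gene rule; the helper `rot13_gene` is hypothetical.

```python
ALPHABET3 = "abcdefghijklmnopqrstuvwxyz"


def rot13_gene(gene: str, alphabet: str = ALPHABET3) -> str:
    """Shift a single gene 13 positions down the alphabet, wrapping z -> a."""
    return alphabet[(alphabet.index(gene) + 13) % len(alphabet)]


# Applying the shift twice returns the original gene, for every letter.
assert all(rot13_gene(rot13_gene(c)) == c for c in ALPHABET3)

target3 = "guvfvffbzhpusha"
candidate = "thisissomuchfun"
# Gene-by-gene check: encode one candidate gene, compare it with one target gene.
mismatches = sum(rot13_gene(c) != t for c, t in zip(candidate, target3))
print(mismatches)  # 0
```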

In [32]:
ALPHABET3 = "abcdefghijklmnopqrstuvwxyz"

In [33]:
target3 = "guvfvffbzhpusha"

In [34]:
# set up if you need it
def fitness_function_3(phenotype: str, target: str) -> float:
    loss = 0
    for index, value in enumerate(phenotype):
        shift_val = ALPHABET3[(ALPHABET3.index(value) + 13)%len(ALPHABET3)]
        if shift_val != target[index]:
            loss += 1
    return 1/(1+loss)

algorithm_params_3 = {
    "population_size": 2000, 
    "generations": 100, 
    "crossover_rate": 0.95, 
    "mutation_rate": 0.05,
    "tournament_size": 3,
    "evaluation_function": fitness_function_3,
    "selection_function": tournament_selection,
    "valid_chars": ALPHABET3
    }

In [35]:
result3 = genetic_algorithm(target3, algorithm_params_3) # do what you need to do for your implementation but don't change the lines above or below.

generation: 10, best_individual: {'genotype': ['t', 'h', 'i', 's', 'u', 's', 's', 'o', 'm', 'u', 'k', 'h', 'f', 'u', 'n'], 'phenotype': 'thisussomukhfun', 'fitness': 0.3333333333333333, 'generation': 10}
generation: 20, best_individual: {'genotype': ['t', 'h', 'i', 's', 'i', 's', 's', 'o', 'm', 'u', 'c', 'h', 'f', 'u', 'n'], 'phenotype': 'thisissomuchfun', 'fitness': 1.0, 'generation': 20}
generation: 30, best_individual: {'genotype': ['t', 'h', 'i', 's', 'i', 's', 's', 'o', 'm', 'u', 'c', 'h', 'f', 'u', 'n'], 'phenotype': 'thisissomuchfun', 'fitness': 1.0, 'generation': 30}
generation: 40, best_individual: {'genotype': ['t', 'h', 'i', 's', 'i', 's', 's', 'o', 'm', 'u', 'c', 'h', 'f', 'u', 'n'], 'phenotype': 'thisissomuchfun', 'fitness': 1.0, 'generation': 40}
generation: 50, best_individual: {'genotype': ['t', 'h', 'i', 's', 'i', 's', 's', 'o', 'm', 'u', 'c', 'h', 'f', 'u', 'n'], 'phenotype': 'thisissomuchfun', 'fitness': 1.0, 'generation': 50}
generation: 60, best_individual: {'genot

In [36]:
pprint(result3, compact=True)

{'fitness': 1.0,
 'generation': 12,
 'genotype': ['t', 'h', 'i', 's', 'i', 's', 's', 'o', 'm', 'u', 'c', 'h', 'f',
              'u', 'n'],
 'phenotype': 'thisissomuchfun'}


## Solving Ciphers with an Unknown Shift

One approach to the cipher problem with an unknown or dynamic shift is to include the cipher key in the encoded target string as its final letter. The letter's index in the alphabet gives the shift value; this problem assumes 0-indexing for the alphabet string to simplify index conversions. For example, if the encoded string is "khoorczruogd", then "d" is the cipher key. Since "d" has index 3 in the alphabet string, the shift for this problem is 3. The " " character would correspond to a shift of 26 because it is the last character of the 27-character alphabet string. The fitness function must change to support this approach, but the encode and decode functions can remain the same. For the encoded string above, the genetic algorithm should find the solution "hello world".
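The worked example in the paragraph above can be checked directly. The sketch assumes the alphabet `"abcdefghijklmnopqrstuvwxyz "` (space last, at index 26), which is what a shift of 26 for " " implies; the function name `encode_with_key` is hypothetical. Encoding a whole string here is only a way to build a test target; the fitness function itself still compares gene against gene.

```python
ALPHABET_WITH_SPACE = "abcdefghijklmnopqrstuvwxyz "  # assumed: space is the last character


def encode_with_key(plaintext: str, shift: int, alphabet: str = ALPHABET_WITH_SPACE) -> str:
    """Shift every character by `shift`, then append the key letter at the end."""
    n = len(alphabet)
    encoded = "".join(alphabet[(alphabet.index(ch) + shift) % n] for ch in plaintext)
    key_letter = alphabet[shift % n]  # the key letter's 0-based index is the shift
    return encoded + key_letter


print(encode_with_key("hello world", 3))  # khoorczruogd
```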

**Modifications to problem three**:

- The **fitness_function** would need to remove the final letter from the target string and store it as the cipher key, then decode the key by looking up that letter's index in the alphabet. The hard-coded 13 in the fitness function from problem three would be replaced with this shift value taken from the target string. Comparisons should omit the final letter of the target string so that the solution does not include the cipher key.

- The **target_string** should contain one extra letter at the end of the encoded solution to represent the shift value. The shift value is equal to the letter's index in the alphabet.
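A minimal sketch of the modified fitness function, assuming the same 27-character alphabet (space last) and the `(phenotype, target) -> float` signature used by `fitness_function_3`. The name `fitness_function_unknown_shift` is hypothetical. It reads the key from the last character of the target, leaves that character out of the comparison, and still compares one gene against one gene.

```python
ALPHABET_WITH_SPACE = "abcdefghijklmnopqrstuvwxyz "  # assumed alphabet, space at index 26


def fitness_function_unknown_shift(phenotype: str, target: str) -> float:
    """Fitness for a shift cipher whose key is the final letter of the target."""
    n = len(ALPHABET_WITH_SPACE)
    shift = ALPHABET_WITH_SPACE.index(target[-1])  # key letter -> shift value
    message = target[:-1]  # omit the key so the solution does not include it
    loss = 0
    for value, encoded_gene in zip(phenotype, message):
        # Encode one candidate gene with the recovered shift and compare it
        # with the corresponding gene of the encoded message.
        shifted = ALPHABET_WITH_SPACE[(ALPHABET_WITH_SPACE.index(value) + shift) % n]
        if shifted != encoded_gene:
            loss += 1
    return 1 / (1 + loss)


print(fitness_function_unknown_shift("hello world", "khoorczruogd"))  # 1.0
print(fitness_function_unknown_shift("hello worle", "khoorczruogd"))  # 0.5
```

Because the key letter is not part of the message, individuals would need `len(target) - 1` genes; `zip` simply ignores any extra gene, so a GA that sizes individuals from `len(target)` would likely need adjusting as well.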

The specific characters in the alphabet string do not affect this approach. The alphabet could omit spaces or other special characters, as long as the final character of the encoded string maps to an index representing the shift value.