# Module 2 - Programming Assignment

## Directions

1. Change the name of this file to be your JHED id as in `jsmith299.ipynb`. Be sure you use your JHED ID (it's made out of your name, unlike your student id, which is just letters and numbers).
2. Make sure the notebook you submit is cleanly and fully executed. I do not grade unexecuted notebooks.
3. Submit your notebook back in Blackboard where you downloaded this file.

*Provide the output **exactly** as requested*

In [232]:
from pprint import pprint
import random # for creating population
from copy import deepcopy
from typing import List, Tuple, Dict, Callable

## Local Search - Genetic Algorithm

There are some key ideas in the Genetic Algorithm.

First, there is a problem of some kind that either *is* an optimization problem or the solution can be expressed in terms of an optimization problem.
For example, if we wanted to minimize the function

$$f(x) = \sum_{i=1}^{n} (x_i - 0.5)^2$$

where $n = 10$.
This *is* an optimization problem. Normally, optimization problems are much, much harder.

![Eggholder](http://www.sfu.ca/~ssurjano/egg.png)

The function we wish to optimize is often called the **objective function**.
The objective function is closely related to the **fitness** function in the GA.
If we have a **maximization** problem, then we can use the objective function directly as a fitness function.
If we have a **minimization** problem, then we need to convert the objective function into a suitable fitness function, since fitness functions must always mean "more is better".
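
One possible conversion from a minimization objective to a fitness function is sketched below. The names `objective` and `to_fitness` are illustrative only, and the `1 / (1 + f)` transform is just one of several reasonable choices; it assumes the objective is never negative.

```python
from typing import List

def objective(xs: List[float]) -> float:
    # sum of squared distances from 0.5; smaller is better
    return sum((x - 0.5) ** 2 for x in xs)

def to_fitness(objective_value: float) -> float:
    # map a non-negative objective onto (0, 1], where larger is better
    return 1.0 / (1.0 + objective_value)

perfect = [0.5] * 10   # the minimizer for n = 10
poor = [1.0] * 10      # every coordinate is 0.5 away from the optimum

print(to_fitness(objective(perfect)))  # 1.0
print(to_fitness(objective(poor)))     # smaller than 1.0
```

With this transform the candidate that minimizes the objective is also the one with the highest fitness, which is all the GA needs.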

Second, we need to *encode* candidate solutions using an "alphabet" analogous to G, A, T, C in DNA.
This encoding can be quite abstract.
You saw this in the Self Check.
There, a floating point number was encoded as bits, just as in a computer, and a sophisticated decoding scheme was then required.

Sometimes, the encoding need not be very complicated at all.
For example, in the real-valued GA, discussed in the Lectures, we could represent 2.73 as....2.73.
This is similarly true for a string matching problem.
We *could* encode "a" as "a", 97, or '01100001'.
And then "hello" would be:

```
["h", "e", "l", "l", "o"]
```

or

```
[104, 101, 108, 108, 111]
```

or

```
0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1
```

In Genetics terminology, this is the **chromosome** of the individual. And if this individual had the **phenotype** "h" for the first character then they would have the **genotype** for "h" (either as "h", 104, or 01101000).

To keep it straight, think **geno**type is **genes** and **pheno**type is **phenomenon**, the actual thing that the genes express.
So while we might encode a number as 10110110 (genotype), the number itself, 182, is what goes into the fitness function.
The environment operates on zebras, not the genes for stripes.
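
As a small illustration of the genotype/phenotype split, the sketch below decodes the bit genotype from the example into the number that would be handed to a fitness function. The helper name `decode_bits` is hypothetical, and it assumes an unsigned encoding with the most significant bit first.

```python
from typing import List

def decode_bits(bits: List[int]) -> int:
    # interpret the list as an unsigned binary number, most significant bit first
    value = 0
    for bit in bits:
        value = value * 2 + bit
    return value

genotype = [1, 0, 1, 1, 0, 1, 1, 0]
phenotype = decode_bits(genotype)
print(phenotype)  # 182
```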

## String Matching

You are going to write a Genetic Algorithm that will solve the problem of matching a target string (at least at the start).
Now, this is kind of silly because in order for this to work, you need to know the target string and if you know the target string, why are you trying to do it?
Well, the problem is *pedagogical*.
It's a fun way of visualizing the GA at work, because as the GA finds better and better candidates, they make more and more sense.

Now, string matching is not *directly* an optimization problem so this falls under the general category of "if we convert the problem into an optimization problem we can solve it with an optimization algorithm" approach to problem solving.
This happens all the time.
We have a problem.
We can't solve it.
We convert it to a problem we *can* solve.
In this case, we're using the GA to solve the optimization part.

And all we need is some sort of measure of the difference between two strings.
We can use that measure as a **loss function**.
A loss function gives us a score that tells us how similar two strings are (a lower loss means a closer match).
The loss function becomes our objective function and we use the GA to minimize it by converting the objective function to a fitness function.
So that's the first step, come up with the loss/objective function.
The only stipulation is that it must calculate the score based on element to element (character to character) comparisons with no global transformations of the candidate or target strings.
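
A minimal sketch of such a loss, assuming a simple count of mismatched positions, is shown here. The names `mismatch_loss` and `loss_to_fitness` are made up for illustration, and subtracting the loss from the string length is only one way to turn it into a "more is better" fitness.

```python
from typing import List

def mismatch_loss(target: List[str], candidate: List[str]) -> int:
    # count positions where the two genes differ; 0 means a perfect match
    loss = 0
    for index in range(len(target)):
        if candidate[index] != target[index]:
            loss += 1
    return loss

def loss_to_fitness(loss: int, length: int) -> int:
    # flip the scale so that more is better
    return length - loss

target = list("hello")
candidate = list("jello")
loss = mismatch_loss(target, candidate)
print(loss, loss_to_fitness(loss, len(target)))  # 1 4
```

Every comparison is gene against gene at a single index, so no global transformation of either string is involved.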

And since this is a GA, we need a **genotype**.
The genotype for this problem is a list of "characters" (individual letters aren't special in Python like they are in some other languages):

```
["h", "e", "l", "l", "o"]
```

and the **phenotype** is the resulting string:

```
"hello"
```

In addition to the generic code and problem specific loss function, you'll need to pick parameters for the run.
These parameters include:

1. population size
2. number of generations
3. probability of crossover
4. probability of mutation

You will also need to pick a selection algorithm, either roulette wheel or tournament selection.
In the latter case, you will need a tournament size.
This is all part of the problem.
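
For comparison with tournament selection, a rough sketch of roulette wheel selection follows. The function name `roulette_wheel_selection` and the uniform fallback for an all-zero population are assumptions of this sketch, not part of the assignment.

```python
import random
from typing import List

def roulette_wheel_selection(pop: List[List[str]], scores: List[int]) -> List[str]:
    total = sum(scores)
    if total == 0:
        # every individual has zero fitness, so fall back to a uniform pick
        return random.choice(pop)
    # spin lands in [0, total); each individual owns a slice as wide as its score
    spin = random.random() * total
    running = 0
    for person, score in zip(pop, scores):
        running += score
        if spin < running:
            return person
    return pop[-1]  # defensive guard; not expected to be reached

pop = [list("cat"), list("cog"), list("dog")]
scores = [1, 3, 0]
print(roulette_wheel_selection(pop, scores))
```

Individuals are picked with probability proportional to their fitness, so an individual with a score of zero is never chosen unless the whole population scores zero.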

Every **ten** (10) generations, you should print out the fitness, genotype, and phenotype of the best individual in the population for the specific generation.
The function should return the best individual *of the entire run*, using the same format.

In [233]:
ALPHABET = "abcdefghijklmnopqrstuvwxyz "

<a id="make_pop"></a>
## make_pop

This function returns a population of random genotypes.

* **gene_pool** str: string of characters available to be genes
* **chrom_size** int: the length of each chromosome/genotype
* **pop_size** int: number of genotypes to be created in the population


**returns** List[List[str]]: a list of genotypes, where a genotype is a list of strings/chars

In [234]:
def make_pop(gene_pool:str, chrom_size:int, pop_size:int) -> List[List[str]]:
    
    pop = []
    for person in range(pop_size):
        chrom = []
        for gene_index in range(chrom_size):
            gene = random.choice(gene_pool)
            chrom.append(gene)
        pop.append(chrom)
    return pop

In [235]:
# unit test
pop1 = make_pop(ALPHABET, 5, 10)

# returns a list of list
assert isinstance(pop1[0], list)
# returns right lengths
assert len(pop1[0]) == 5 and len(pop1) == 10
# check if all characters are valid
for chrom in pop1:
    for gene in chrom:
        assert gene in ALPHABET


<a id="matching_fitness"></a>
## matching_fitness

Score whether the target genotype matches the person genotype.
High score is best. To have the highest score all genes
in the person must match the genes in the target at the same location


* **target** List[str]: genotype of the target
* **person** List[str]: genotype of a single person in the population


**returns** int: score of how the genes match

In [236]:
def matching_fitness(target:List[str], person:List[str]) -> int:
    score = 0
    for index in range(len(target)):
        if person[index] == target[index]:
            score += 1
    return score

In [237]:
target = ['c','o','g']
person1 = ['c','a','t']
person2 = ['v','r','w']
score1 = matching_fitness(target, target) # same string
# assert the score isn't greater than length of target
assert score1 <= len(target)

# assert the score is correct
score2 = matching_fitness(target, person1) # only the first gene matches
assert score2 == 1

# assert zero returns if no matching chars
score3 = matching_fitness(target, person2)
assert score3 == 0

<a id="backwards_fitness"></a>
## backwards_fitness

Score whether the target genotype matches the person genotype in reverse order.
High score is best. To have the highest score all genes
in the person must match the genes in the target in the reverse order


* **target** List[str]: genotype of the target
* **person** List[str]: genotype of a single person in the population


**returns** int: score of how the genes match

In [238]:
def backwards_fitness(target:List[str], person:List[str]) -> int:
    max_length = len(target) - 1
    score = 0

    for current_index in range(len(person)):
        matching_index = max_length - current_index
        if person[current_index] == target[matching_index]:
            score += 1
    return score


In [239]:
# unit tests
target1 = ["t","a", "c"]
person1 = ["c", "a", "t"]
test_score = backwards_fitness(target1, person1)
# test the score is the highest for a completely backwards string
assert test_score == len(target1)

#test same string has a score of zero
target2 = ["z", "e", "r", "o"]
test_score2 = backwards_fitness(target2, target2)
assert test_score2 == 0

# test a similar string has a score less than max, but greater than 0
target3 = ["t","a", "c", " ", "n","i", " ", "t", "a", "h"]
person3 = ["c", "a", "t", " ", "n","i", " ", "t", "a", "t"]
test_score3 = backwards_fitness(target3, person3)
assert test_score3 > 0 and test_score3 < len(target3)


<a id="rot13"></a>
## rot13

Score whether the target genotype matches the person genotype under a Caesar cipher.
High score is best. To have the highest score all genes
in the person must match the genes in the target from a shift 13 cipher


* **target** List[str]: genotype of the target
* **person** List[str]: genotype of a single person in the population
* **gene_pool** str: characters available in the cipher. In the order of the alphabet


**returns** int: score of how the genes match

In [240]:
def rot13(target:List[str], person:List[str], gene_pool: str) -> int:
    score = 0
    for index in range(len(target)):
        # find what letter the person belongs to in gene pool
        person_index = gene_pool.index(person[index])
        cipher = person_index + 13

        # cipher loops around the alphabet
        if cipher > 25:
            cipher = cipher - 26
        
        # check if cipher matches
        if gene_pool[cipher] == target[index]:
            score += 1
    return score

In [241]:
letters = "abcdefghijklmnopqrstuvwxyz"
t1 = ["h","e","l","l","o"]
p1 = ["u","r","y","y","b"]

# check same string has score of zero
s1 = rot13(t1, t1, letters)
assert s1 == 0

# check score is same length as number of char in list
s2 = rot13(t1, p1, letters)
assert s2 == len(t1)

# score is greater than 0 but less than max
p1[0] = "a"
s3 = rot13(t1, p1, letters)
assert s3 < len(t1) and s3 > 0

<a id="fitness"></a>
## fitness

Finds the fitness score for each member in the population.
Score is based on the fitness method
    
    - 1 for matching strings exactly
    - 2 for matching strings in reverse order
    - 3 for a rotate 13 cipher


* **pop** List[List[str]]: list of genotypes in the population
* **target** List[str]: genotype of the target
* **fitness_method** int: method to evaluate the genotypes in the pop
* **gene_pool** str: characters available in the cipher. In the order of the alphabet


**returns** List[int]: returns a list of scores in the order of the population

In [242]:
def fitness(pop: List[List[str]], target:List[str], fitness_method:int, gene_pool:str)-> List[int]:
    score_list = []
    for person in pop:
        
        if fitness_method == 1: # problem 1
            score = matching_fitness(target, person)

        elif fitness_method == 2: # problem 2
            score = backwards_fitness(target, person)
        else: # problem 3
            score = rot13(target, person, gene_pool)
        
        score_list.append(score) #add the indiv score to list
    return score_list

In [244]:
letters = "abcdefghijklmnopqrstuvwxyz"
pop1 = [['a','b','c'] # no matching in order
,['c','a','c'] # first two matching
,['a','a','a'] # only one matching
,['d','o','g']] # no matching
target_test = ['c','a','t']
score_test = fitness(pop1, target_test, 1, letters)
score_test
# check score list is same length as population
assert len(pop1) == len(score_test)

# check no score is greater than len of target
for score in score_test:
    assert score <= len(target_test)
# check score values are correct
assert score_test == [0, 2, 1, 0]
# check the problem1, problem 2, problem 3 have diff scores
score_test2 = fitness(pop1, target_test, 2, letters)
score_test3 = fitness(pop1, target_test, 3, letters)
assert score_test2 != score_test != score_test3

<a id="tournament_selection"></a>
## tournament_selection

Picks the parents using tournament selection.
First pick 7 random genotypes from the population, then do it again.
Then pick the best of the 7 from each list. The two winners become the parents.


* **pop** List[List[str]]: list of genotypes in the population
* **scores** List[int]: list of scores for each genotype in the pop



**returns** Tuple[List[str],List[str]]: returns two parent genotypes
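
The tournament size does not have to be fixed at 7. A hedged sketch of a single tournament with a configurable size is given below; `tournament_pick` and its `tournament_size` parameter are hypothetical names used only for this illustration.

```python
import random
from typing import List

def tournament_pick(pop: List[List[str]], scores: List[int], tournament_size: int) -> List[str]:
    # draw contestant indexes with replacement, then keep the best scoring one
    contestants = [random.randrange(len(pop)) for _ in range(tournament_size)]
    winner = max(contestants, key=lambda index: scores[index])
    return pop[winner]

pop = [list("cat"), list("cog"), list("dog")]
scores = [1, 3, 2]
parent1 = tournament_pick(pop, scores, 2)
parent2 = tournament_pick(pop, scores, 2)
print(parent1, parent2)
```

A larger tournament size raises the selection pressure, since weak individuals are less likely to win a bigger tournament.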

In [245]:
def tournament_selection(pop:List[List[str]], scores:List[int])-> Tuple[List[str],List[str]]:
    # get indexes for the random 7 
    index_list = [i for i in range(len(pop))]
    parent1_index = [random.choice(index_list) for i in range(7)]
    parent2_index = [random.choice(index_list) for i in range(7)]

    # reduce list to random 7
    parent1 = [pop[index] for index in parent1_index]
    parent1_scores = [scores[index] for index in parent1_index]
    parent2 = [pop[index] for index in parent2_index]
    parent2_scores = [scores[index] for index in parent2_index]

    # find best parent of the random 7
    parent1_best_index = parent1_scores.index(max(parent1_scores))
    parent1 = parent1[parent1_best_index]
    parent2_best_index = parent2_scores.index(max(parent2_scores))
    parent2 = parent2[parent2_best_index]

    return parent1, parent2

In [246]:
pop1 = make_pop(ALPHABET, 3, 10)
score_test = fitness(pop1, target_test, 1, ALPHABET)
# test selection function
p1, p2 = tournament_selection(pop1, score_test)

#check p1 and p2 are same size 
assert len(p1) == len(p2)

# check parents are same size as person in pop
assert len(p1) == len(p2) == len(pop1[0])

# check p1 and p2 are in population
assert p1 in pop1 and p2 in pop1


<a id="mutate"></a>
## mutate

Mutates one gene in a genotype. Note there's a possibility that the mutation
will still result in the same genotype

* **child** List[str]: a genotype in the population
* **gene_pool** str: genes that a gene can mutate to

**returns** List[str]: returns a genotype with the mutation
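
If a mutation that never leaves the genotype unchanged is preferred, one option is to exclude the current gene from the choices. This is only a sketch of that variant; `mutate_to_different_gene` is a hypothetical name, and it assumes the gene pool holds at least two distinct characters.

```python
import random
from typing import List

def mutate_to_different_gene(child: List[str], gene_pool: str) -> List[str]:
    mutated = list(child)  # a shallow copy is enough for a flat list of strings
    index = random.randrange(len(mutated))
    # exclude the current gene so the mutation always changes something
    choices = [gene for gene in gene_pool if gene != mutated[index]]
    mutated[index] = random.choice(choices)
    return mutated

original = ["c", "a", "t"]
print(mutate_to_different_gene(original, "abcdefghijklmnopqrstuvwxyz "))
```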

In [247]:
def mutate( child:List[str], gene_pool: str)-> List[str]:
    child_mutated = deepcopy(child)
    # choose where to mutate and what to mutate to
    mutate_index = random.choice([i for i in range(len(child))])
    mutate_value = random.choice(gene_pool)
    
    # update child
    child_mutated[mutate_index] = mutate_value

    return child_mutated

In [248]:
initial_child = ['c','a','t']
new_child = mutate( initial_child, ALPHABET)

#check length stayed the same
assert len(new_child) == len(initial_child)

# check new child has valid values
for gene in new_child:
    assert gene in ALPHABET

# a mutation should change the child
# the majority of the time
change_count = 0
for experiment in range(10):
    new_child = mutate( initial_child, ALPHABET)
    if new_child != initial_child:
        change_count += 1
assert change_count > 0


<a id="crossover"></a>
## crossover

Performs single-point crossover on two genotypes at the given gene index to make two children.

The first child gets all of parent 1's genes before the index and all of parent 2's genes at the index or later.

The second child gets all of parent 2's genes before the index and all of parent 1's genes at the index or later.

* **parent1** List[str]: a genotype in the population representing parent 1
* **parent2** List[str]: a genotype in the population representing parent 2
* **crossover_index** int: index to perform crossover at. Split the genotype

**returns** Tuple[List[str],List[str]]: returns two children from crossover
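
The crossover index can also be drawn at random for each pair of parents instead of being fixed. The sketch below shows one way to do that; `random_crossover_index` is a hypothetical helper, and it assumes chromosomes of at least two genes.

```python
import random

def random_crossover_index(chrom_size: int) -> int:
    # choose a split strictly inside the chromosome (requires chrom_size >= 2)
    return random.randint(1, chrom_size - 1)

parent1 = list("abcde")
parent2 = list("zyxlm")
index = random_crossover_index(len(parent1))
# same slicing as single-point crossover, with a randomly chosen split
child1 = parent1[:index] + parent2[index:]
child2 = parent2[:index] + parent1[index:]
print(index, child1, child2)
```

Keeping the index strictly inside the chromosome guarantees that each child receives at least one gene from each parent.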

In [249]:
def crossover(parent1:List[str], parent2:List[str], crossover_index:int) -> Tuple[List[str],List[str]]:
    child1 = parent1[:crossover_index] + parent2[crossover_index:]
    child2 = parent2[:crossover_index] + parent1[crossover_index:]

    return child1, child2

In [250]:
p1 = ['a','b','c','d','e']
p2 = ['z','y','x','l','m']

c1, c2 = crossover(p1, p2, 2)

#check lengths are the same
assert len(c1) == len(c2) == len(p1)

#check if genes are from both parents
parents = p1 + p2
for gene in c1:
    assert gene in parents
for gene in c2:
    assert gene in parents

# check if a valid cross over
assert c1 == ['a', 'b', 'x', 'l', 'm']
assert c2 == ['z', 'y', 'c', 'd', 'e']

<a id="reproduce"></a>
## reproduce

Creates two children by crossover followed by mutation

* **parent1** List[str]: a genotype in the population representing parent 1
* **parent2** List[str]: a genotype in the population representing parent 2
* **crossover_index** int: index to perform crossover at. Split the genotype
* **crossover_rate** float: probability that crossover is performed.
* **mutation_rate** float: probability that mutation is performed.
* **gene_pool** str: genes available (chars)

**returns** Tuple[List[str],List[str]]: returns two children

In [251]:
def reproduce(parent1:List[str], parent2:List[str], crossover_index: int, crossover_rate:float, mutation_rate:float, gene_pool:str)-> Tuple[List[str],List[str]]:
    crossover_prob = random.random() # between 0 and 1
    mutation_prob = random.random() #between 0 and 1

    if crossover_prob < crossover_rate: # success, perform crossover
        child1, child2 = crossover(parent1, parent2, crossover_index)
    else:
        child1 = deepcopy(parent1)
        child2 = deepcopy(parent2)
    
    if mutation_prob < mutation_rate:
        child1 = mutate( child1, gene_pool)
        child2 = mutate( child2, gene_pool)

    return child1, child2

In [252]:
p1 = ['a','b','c','d','e']
p2 = ['z','y','x','l','m']
# mutation_rate is 0 for this check so that every gene must come from a parent
c1, c2 = reproduce(p1, p2, 2, .90, 0.0, ALPHABET)

#check sizes are the same
assert len(c1) == len(c2) == len(p1)

#check if genes are from both parents
parents = p1 + p2
for gene in c1:
    assert gene in parents
for gene in c2:
    assert gene in parents

# check that the majority of the time change happens. Children are not parents
change = 0
for experiment in range(10):
    c1, c2 = reproduce(p1, p2, 2, .90, .05, ALPHABET)
    if c1 != p1: change += 1
assert change > 0

<a id="make_family"></a>
## make_family

Creates parents, followed by making children.
A list of the two children is returned


* **pop** List[List[str]]: list of genotypes
* **scores** List[int]: list of scores per genotype in population
* **params** dictionary: dictionary containing crossover and mutation fields

**returns** List[List[str]]: returns a list of two children

In [253]:
def make_family(pop:List[List[str]],scores:List[int], params:Dict)-> List[List[str]]:

    crossover_index = params["crossover_index"]
    crossover_rate = params["crossover_rate"]
    mutation_rate = params["mutation_rate"]
    gene_pool = params["gene_pool"]

    parent1, parent2 = tournament_selection(pop, scores)
    child1, child2 = reproduce(parent1, parent2, crossover_index, crossover_rate, mutation_rate, gene_pool)

    return [child1, child2]

In [254]:
pop1 =  make_pop(ALPHABET, 3, 5)
scores = [1,5, 10, 4, 6]
test_params = {"crossover_index": 1, "crossover_rate": .08
, "mutation_rate": .05, "gene_pool": ALPHABET}

family = make_family(pop1,scores, test_params)

#check a list of list is returned
assert isinstance(family, list)
assert isinstance(family[0], list)

#check only 2 children are returned
assert len(family) == 2

# check children are valid characters
for child in family:
    for ch in child:
        assert ch in ALPHABET

<a id="find_best"></a>
## find_best

Finds the genotype with highest score in the population.
Returns a dictionary of the fitness score, genotype and phenotype


* **pop** List[List[str]]: list of genotypes
* **scores** List[int]: list of scores per genotype in population

**returns** dict: the best score, genotype and phenotype in the population

In [255]:
def find_best(pop:List[List[str]], scores:List[int]) -> Dict:
    best = {}
    # find best score
    best_index = scores.index(max(scores))
    best["fitness"] = scores[best_index]
    best["genotype"] = pop[best_index]
    best["phenotype"] = ''.join(best["genotype"])

    return best

In [256]:
pop1 =  make_pop(ALPHABET, 3, 5)
scores = [1,5, 10, 4, 6]
pprint(pop1, compact=True)

scores_family = find_best(pop1, scores)
pprint(scores_family, compact=True)

# check three keys present
keys = ["fitness", "genotype", "phenotype"]
assert list(scores_family.keys()) == keys

# check size of genotype and phenotype match
assert len(scores_family["genotype"]) == len(scores_family["phenotype"])

# check the fitness score is correct. The true max
assert scores_family["fitness"] == 10


[['x', 'g', 'l'], ['q', 'k', 'p'], ['k', ' ', 'k'], ['e', 'y', 'g'],
 ['s', 'i', 's']]
{'fitness': 10, 'genotype': ['k', ' ', 'k'], 'phenotype': 'k k'}


<a id="genetic_algorithm"></a>
### genetic_algorithm

A search algorithm based on natural selection and genetics.
This algorithm produces a random population of genotypes
and encodes the target as a genotype. The search stops once the target is reached
or the generation limit (`limit`) is hit. The search progresses as genotypes change through crossover and mutation.

This algorithm requires many arguments and most are stored as values in the params dictionary

* **target** string: phenotype of the target to search for
* **params** dict: additional values needed for separate functions within the algo. Keys include
    - **gene_pool** str: list of genes available in encoding
    - **limit** int: max number of iterations
    - **pop_size** int: number of genotypes in the population
    - **fitness_method** int: way to score the genotypes
    - **crossover_index** int: index to perform crossover at. The split index
    - **crossover_rate** float: rate to perform crossover
    - **mutation_rate** float: rate to perform mutation
    

**returns** dict: the best score, genotype and phenotype in the population

In [257]:
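def genetic_algorithm(target:str, params:dict)->dict:
    generation = 0
    pop = make_pop(params["gene_pool"], len(target), params["pop_size"])
    target_genotype = list(target) # encode the target phenotype as a genotype
    best = None # best individual seen over the entire run

    while generation < params["limit"]:
        scores = fitness(pop, target_genotype, params["fitness_method"],params["gene_pool"])
        current_best = find_best(pop, scores)
        if best is None or current_best["fitness"] > best["fitness"]:
            best = current_best
        if best["fitness"] == len(target): # all char match
            return best
        if generation % 10 == 0: # print every 10 gens
            print("Generation:", generation)
            pprint(current_best, compact=True)
        
        new_pop = []
        for n in range(params["pop_size"] // 2):
            new_pop += make_family(pop,scores, params)
            
        pop = new_pop
        generation += 1

    # the last population built in the loop has not been scored yet
    scores = fitness(pop, target_genotype, params["fitness_method"],params["gene_pool"])
    final_best = find_best(pop, scores)
    if best is None or final_best["fitness"] > best["fitness"]:
        best = final_best
    return best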

## Problem 1

The target is the string "this is so much fun".
The challenge, aside from implementing the basic algorithm, is deriving a fitness function based on "b" - "p" (for example).
The fitness function should come up with a fitness score based on element to element comparisons between target v. phenotype.

In [258]:
target1 = "this is so much fun"

In [259]:
# set up parameters
parameters = {
    "gene_pool" : ALPHABET
  , "limit": 1500
  , "pop_size": 500
  , "fitness_method": 1
  , "crossover_index": 4
  , "crossover_rate": .90
  , "mutation_rate": .05
}

In [260]:
result1 = genetic_algorithm(target1, parameters) # do what you need to do for your implementation but don't change the lines above or below.

Generation: 0
{'fitness': 5,
 'genotype': ['z', 'r', 'i', 'l', 'n', ' ', 'j', 'c', 's', 'o', 'p', 't', 'v',
              'c', 'r', ' ', 'i', 'z', 'y'],
 'phenotype': 'zriln jcsoptvcr izy'}
Generation: 10
{'fitness': 8,
 'genotype': ['t', 'h', 'i', 's', 'n', ' ', 'j', 'c', 's', 'o', 'p', 't', 'v',
              'c', 'r', ' ', 'i', 'z', 'y'],
 'phenotype': 'thisn jcsoptvcr izy'}
Generation: 20
{'fitness': 10,
 'genotype': ['t', 'h', 'i', 's', 'j', 'g', 's', ' ', 's', 'c', 'i', 'j', 'h',
              'l', 'h', ' ', 'e', 'f', 'n'],
 'phenotype': 'thisjgs scijhlh efn'}
Generation: 30
{'fitness': 12,
 'genotype': ['t', 'h', 'i', 's', 'j', 'g', 's', ' ', 's', 'c', ' ', 'j', 'u',
              'l', 'h', ' ', 'e', 'f', 'n'],
 'phenotype': 'thisjgs sc julh efn'}
Generation: 40
{'fitness': 14,
 'genotype': ['t', 'h', 'i', 's', 'j', 'i', 's', ' ', 's', 'c', ' ', 'j', 'u',
              'l', 'h', ' ', 'f', 'f', 'n'],
 'phenotype': 'thisjis sc julh ffn'}
Generation: 50
{'fitness': 15,
 'genotype':

In [261]:
pprint(result1, compact=True)

{'fitness': 19,
 'genotype': ['t', 'h', 'i', 's', ' ', 'i', 's', ' ', 's', 'o', ' ', 'm', 'u',
              'c', 'h', ' ', 'f', 'u', 'n'],
 'phenotype': 'this is so much fun'}


## Problem 2

You should have working code now.
The goal here is to think a bit more about fitness functions.
The target string is now, 'nuf hcum os si siht'.
This is obviously target #1 but reversed.
If we just wanted to match the string, this would be trivial.
Instead, in this problem, we want to "decode" the string so that the best individual displays the target forwards.
In order to do this, you'll need to come up with a fitness function that measures how successful candidates are towards this goal.
The constraint is that you may not perform any global operations on the target or individuals.
Your fitness function must still compare a single gene against a single gene.
Your solution will likely not be Pythonic but use indexing.
That's ok.
<div style="background: lemonchiffon; margin:20px; padding: 20px;">
    <strong>Important</strong>
    <p>
        You may not reverse an entire string (either target or candidate) at any time.
        Everything must be a computation of one gene against one gene (one letter against one letter).
        Failure to follow these directions will result in 0 points for the problem.
    </p>
</div>

The best individual in the population is the one who expresses this string *forwards*.

In [262]:
target2 = "nuf hcum os si siht" # fitness function should check if this is backwards
# index to index compare, 
# first should be last of target
# so the chromosome of "this is so much fun" should have the highest score

In [263]:
# set up parameters
parameters2 = {
    "gene_pool" : ALPHABET
  , "limit": 1500
  , "pop_size": 500
  , "fitness_method": 2
  , "crossover_index": 4
  , "crossover_rate": .90
  , "mutation_rate": .05
}

In [264]:
result2 = genetic_algorithm(target2, parameters2) # do what you need to do for your implementation but don't change the lines above or below.

Generation: 0
{'fitness': 4,
 'genotype': ['q', 'q', 'p', 'p', 'h', 'i', 'l', ' ', 'p', 'o', 'v', 'y', 'g',
              'y', 'h', 'w', 'x', 's', 'r'],
 'phenotype': 'qqpphil povygyhwxsr'}
Generation: 10
{'fitness': 9,
 'genotype': ['y', 'h', 'i', 's', ' ', 'i', 's', 'b', 'l', 'w', 'k', 'm', 'z',
              'c', 'f', 'w', 'j', 'c', 'n'],
 'phenotype': 'yhis isblwkmzcfwjcn'}
Generation: 20
{'fitness': 12,
 'genotype': ['t', 'h', 'i', 's', ' ', 'i', 'a', ' ', 'l', 'w', 'k', 'm', 'z',
              'c', 'h', ' ', 'j', 'c', 'n'],
 'phenotype': 'this ia lwkmzch jcn'}
Generation: 30
{'fitness': 14,
 'genotype': ['t', 'h', 'i', 's', ' ', 'i', 'a', ' ', 's', 'w', 'k', 'm', 'z',
              'c', 'h', ' ', 'j', 'u', 'n'],
 'phenotype': 'this ia swkmzch jun'}
Generation: 40
{'fitness': 14,
 'genotype': ['t', 'h', 'i', 's', ' ', 'i', 'a', ' ', 's', 'w', 'k', 'm', 'z',
              'c', 'h', ' ', 'j', 'u', 'n'],
 'phenotype': 'this ia swkmzch jun'}
Generation: 50
{'fitness': 16,
 'genotype':

In [265]:
pprint(result2, compact=True)

{'fitness': 19,
 'genotype': ['t', 'h', 'i', 's', ' ', 'i', 's', ' ', 's', 'o', ' ', 'm', 'u',
              'c', 'h', ' ', 'f', 'u', 'n'],
 'phenotype': 'this is so much fun'}


## Problem 3

This is a variation on the theme of Problem 2.
A Caesar cipher replaces each letter of a string with the letter a fixed number of characters down the alphabet; here the shift is 13 (rotating from "z" back to "a" as needed).
This particular shift is also known as ROT13 (for "rotate 13").
Latin did not have spaces (and the space is not contiguous with the letters a-z) so we'll remove them from our alphabet.
Again, the goal is to derive a fitness function that compares a single gene against a single gene, without global transformations.
This fitness function assigns higher scores to individuals that correctly decode the target.

<div style="background: lemonchiffon; margin:20px; padding: 20px;">
    <strong>Important</strong>
    <p>
        You may not apply ROT13 to an entire string (either target or candidate) at any time.
        Everything must be a computation of one gene against one gene.
        Failure to follow these directions will result in 0 points for the problem.
    </p>
</div>

The best individual will express the target *decoded*.
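
A gene-by-gene ROT13 shift can be written with modular arithmetic instead of an explicit wrap-around check. The helper below is a sketch; `rot13_gene` is a hypothetical name, and it assumes the gene pool is the 26 lowercase letters in alphabetical order.

```python
def rot13_gene(gene: str, gene_pool: str) -> str:
    # shift one gene 13 places, wrapping around the end of the gene pool
    shifted = (gene_pool.index(gene) + 13) % len(gene_pool)
    return gene_pool[shifted]

letters = "abcdefghijklmnopqrstuvwxyz"
print(rot13_gene("t", letters))  # g
print(rot13_gene("g", letters))  # t
```

Because 13 is half of 26, applying the shift twice returns the original letter, so the same per-gene function serves for both encoding and decoding.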

In [266]:
ALPHABET3 = "abcdefghijklmnopqrstuvwxyz"

In [267]:
target3 = "guvfvffbzhpusha"

In [268]:
# set up parameters
parameters3 = {
    "gene_pool" : ALPHABET3
  , "limit": 1500
  , "pop_size": 500
  , "fitness_method": 3
  , "crossover_index": 4
  , "crossover_rate": .90
  , "mutation_rate": .05
}

In [269]:
result3 = genetic_algorithm(target3, parameters3)

Generation: 0
{'fitness': 4,
 'genotype': ['t', 'y', 'e', 't', 'i', 'q', 'r', 'w', 'n', 'v', 'c', 'r', 'l',
              'w', 'n'],
 'phenotype': 'tyetiqrwnvcrlwn'}
Generation: 10
{'fitness': 7,
 'genotype': ['t', 'h', 'q', 's', 'm', 'w', 's', 'o', 'v', 'u', 'n', 'u', 'i',
              'c', 'n'],
 'phenotype': 'thqsmwsovunuicn'}
Generation: 20
{'fitness': 11,
 'genotype': ['t', 'h', 'i', 's', 'i', 's', 's', 'o', 'v', 'u', 'n', 'u', 'i',
              'u', 'n'],
 'phenotype': 'thisissovunuiun'}
Generation: 30
{'fitness': 12,
 'genotype': ['t', 'h', 'i', 's', 'i', 's', 's', 'o', 'v', 'u', 'n', 'h', 'i',
              'u', 'n'],
 'phenotype': 'thisissovunhiun'}
Generation: 40
{'fitness': 14,
 'genotype': ['t', 'h', 'i', 's', 'i', 's', 's', 'o', 'm', 'u', 'c', 'h', 'i',
              'u', 'n'],
 'phenotype': 'thisissomuchiun'}


In [270]:
pprint(result3, compact=True)

{'fitness': 15,
 'genotype': ['t', 'h', 'i', 's', 'i', 's', 's', 'o', 'm', 'u', 'c', 'h', 'f',
              'u', 'n'],
 'phenotype': 'thisissomuchfun'}


## Problem 4

There is no code for this problem.

In Problem 3, we assumed we knew what the shift was in ROT-13.
What if we didn't?
Describe how you might solve that problem including a description of the solution encoding (chromosome and interpretation) and fitness function. Assume we can add spaces into the message.

If we assume spaces are in the message, then our gene pool would have a size of 27 (the 26 letters plus the space).

*fitness_any_caesar_cipher (target, person, gene_pool, shift_number)*
    
    score = 0 
    loop through index i in size of target
        gene_index = index in gene_pool of the char at index i in person

        cipher_index = gene_index + shift_number
        cipher_index = cipher_index % 27 # mod of length of gene_pool

        if gene_pool at cipher index equals target at index i
            score increments by one
    return score
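
The pseudocode above could be written in Python roughly as follows. This is a direct translation under the same assumption the pseudocode makes, namely that a candidate `shift_number` is supplied as an argument; it uses `len(gene_pool)` in place of the hard-coded 27.

```python
from typing import List

def fitness_any_caesar_cipher(target: List[str], person: List[str],
                              gene_pool: str, shift_number: int) -> int:
    score = 0
    for index in range(len(target)):
        # position of the candidate's gene in the gene pool
        gene_index = gene_pool.index(person[index])
        # apply the shift, wrapping around the end of the gene pool
        cipher_index = (gene_index + shift_number) % len(gene_pool)
        if gene_pool[cipher_index] == target[index]:
            score += 1
    return score

pool = "abcdefghijklmnopqrstuvwxyz "
target = list("khoor")   # "hello" shifted by 3 in this 27-character pool
person = list("hello")
print(fitness_any_caesar_cipher(target, person, pool, 3))  # 5
```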


## Challenge

**You do not need to do this problem and it won't be graded if you do. It's just here if you want to push your understanding.**

The original GA used binary encodings for everything.
We're basically using a Base 27 encoding.
You could, however, write a version of the algorithm that uses an 8 bit encoding for each letter (ignore spaces as they're a bit of a bother).
That is, a 5 letter candidate looks like this:

```
0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1
```

If you wrote your `genetic_algorithm` code generally enough, with higher order functions, you should be able to implement it using bit strings instead of Latin strings.
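
To make the 8 bit encoding concrete, here is a rough sketch of moving between a string and its bit-list genotype. The helpers `encode_string` and `decode_bit_string` are hypothetical names, and the sketch assumes plain ASCII letters with 8 bits per character.

```python
from typing import List

def encode_string(text: str) -> List[int]:
    # each character becomes its 8 bit code, most significant bit first
    return [int(bit) for ch in text for bit in format(ord(ch), "08b")]

def decode_bit_string(bits: List[int]) -> str:
    # read the chromosome 8 bits at a time and turn each byte into a character
    letters = []
    for start in range(0, len(bits), 8):
        byte = bits[start:start + 8]
        code = int("".join(str(bit) for bit in byte), 2)
        letters.append(chr(code))
    return "".join(letters)

bits = encode_string("hello")
print(len(bits))                # 40
print(decode_bit_string(bits))  # hello
```

With this encoding, a mutation flips individual bits, so a single mutation can produce a byte that is not a lowercase letter; the decoding step or the fitness function would need to account for that.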

## Before You Submit...

1. Did you provide output exactly as requested?
2. Did you re-execute the entire notebook? ("Restart Kernel and Run All Cells...")
3. If you did not complete the assignment or had difficulty please explain what gave you the most difficulty in the Markdown cell below.
4. Did you change the name of the file to `jhed_id.ipynb`?

Do not submit any other files.