# Module 3 - Programming Assignment

## General Directions

1. You must follow the Programming Requirements outlined on Canvas.
2. The Notebook should be cleanly and fully executed before submission.
3. You should change the name of this file to be your JHED id. For example, `jsmith299.ipynb` although Canvas will change it to something else...
4. You must follow the Programming Requirements for this course.

<div style="background: lemonchiffon; margin:20px; padding: 20px;">
    <strong>Important</strong>
    <p>
        You should always read the entire assignment before beginning your work, so that you know in advance what the requested output will be and can plan your implementation accordingly.
    </p>
</div>

<div style="color: white; background: #C83F49; margin:20px; padding: 20px;">
    <strong>Academic Integrity and Copyright</strong>
    <p>You are not permitted to consult outside sources (Stackoverflow, YouTube, ChatGPT, etc.) or use "code assistance" (Co-Pilot, etc.) to complete this assignment. By submitting this assignment for grading, you certify that the submission is 100% your own work, based on course materials, group interactions, and instructor guidance. You agree to comply with the requirements set forth in the Syllabus, including, by reference, the JHU KSAS/WSE Graduate Academic Misconduct Policy.</p>
    <p>Sharing this assignment either directly (e.g., email, github, homework site) or indirectly (e.g., ChatGPT, machine learning platform) is a violation of the copyright. Additionally, all such sharing is a violation of the Graduate Academic Misconduct Policy (facilitating academic dishonesty is itself academic dishonesty), even after you graduate.</p>
    <p>If you have questions or if you're unsure about the policy, ask via Canvas Inbox. In this case, being forgiven is <strong>not</strong> easier than getting permission and ignorance is not an excuse.</p>
    <p>This assignment is copyright (&copy; Johns Hopkins University &amp; Stephyn G. W. Butcher). All rights reserved.</p>
</div>

In [1]:
from pprint import pprint
import random
from typing import List, Tuple, Dict, Callable, Set, Any

## Local Search - Genetic Algorithm

There are some key ideas in the Genetic Algorithm.

First, there is a problem of some kind that either *is* an optimization problem or the solution can be expressed in terms of an optimization problem.
For example, if we wanted to minimize the function

$$f(x) = \sum_{i=1}^{n} (x_i - 0.5)^2$$

where $n = 10$.
This *is* an optimization problem. Normally, optimization problems are much, much harder.

![Eggholder](http://www.sfu.ca/~ssurjano/egg.png)

The function we wish to optimize is often called the **objective function**.
The objective function is closely related to the **fitness** function in the GA.
If we have a **maximization** problem, then we can use the objective function directly as a fitness function.
If we have a **minimization** problem, then we need to convert the objective function into a suitable fitness function, since fitness functions must always mean "more is better".
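
As a minimal sketch of that conversion, the quadratic example above can be turned into a "more is better" score as follows. The names `quadratic_objective` and `quadratic_fitness` and the `1 / (1 + f)` transform are illustrative assumptions, not something prescribed by the assignment.

```python
from typing import List

def quadratic_objective(xs: List[float]) -> float:
    """Sum of squared distances from 0.5; smaller is better."""
    return sum((x - 0.5) ** 2 for x in xs)

def quadratic_fitness(xs: List[float]) -> float:
    """Map the non-negative objective into (0, 1]; larger is better."""
    return 1.0 / (1.0 + quadratic_objective(xs))

optimum = [0.5] * 10   # minimizes the objective
worse = [0.0] * 10     # every coordinate is 0.5 away from the optimum

print(quadratic_objective(optimum), quadratic_fitness(optimum))
print(quadratic_objective(worse), quadratic_fitness(worse))
```

Other strictly decreasing transforms of the objective would serve the same purpose; this one is convenient because the score stays positive.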

Second, we need to *encode* candidate solutions using an "alphabet" analogous to G, A, T, C in DNA.
This encoding can be quite abstract.
You saw this in the Self Check.
There, a floating point number was encoded as bits, just as in a computer, and a sophisticated decoding scheme was then required.

Sometimes, the encoding need not be very complicated at all.
For example, in the real-valued GA, discussed in the Lectures, we could represent 2.73 as....2.73.
This is similarly true for a string matching problem.
We *could* encode "a" as "a", 97, or '01100001'.
And then "hello" would be:

```
["h", "e", "l", "l", "o"]
```

or

```
[104, 101, 108, 108, 111]
```

or

```
0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1
```

In Genetics terminology, this is the **chromosome** of the individual. And if this individual had the **phenotype** "h" for the first character then they would have the **genotype** for "h" (either as "h", 104, or 01101000).

To keep it straight, think **geno**type is **genes** and **pheno**type is **phenomenon**, the actual thing that the genes express.
So while we might encode a number as 10110110 (genotype), the number itself, 182, is what goes into the fitness function.
The environment operates on zebras, not the genes for stripes.
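
To make the genotype/phenotype distinction concrete, here is a small sketch of the three encodings shown above and of decoding a bit genotype back into the number it expresses. The helper names `to_bits` and `from_bits` are made up for this illustration.

```python
from typing import List

def to_bits(text: str) -> List[int]:
    """Encode each character as 8 bits, most significant bit first."""
    return [int(bit) for ch in text for bit in format(ord(ch), "08b")]

def from_bits(bits: List[int]) -> int:
    """Decode a list of bits into the integer it represents."""
    value = 0
    for bit in bits:
        value = value * 2 + bit
    return value

word = "hello"
as_chars = list(word)                 # ["h", "e", "l", "l", "o"]
as_codes = [ord(ch) for ch in word]   # [104, 101, 108, 108, 111]
as_bits = to_bits(word)               # 40 bits, 8 per character

print(as_chars, as_codes, len(as_bits))
print(from_bits([1, 0, 1, 1, 0, 1, 1, 0]))  # 182
```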

## String Matching

You are going to write a Genetic Algorithm that will solve the problem of matching a target string (at least at the start).
Now, this is kind of silly because in order for this to work, you need to know the target string and if you know the target string, why are you trying to do it?
Well, the problem is *pedagogical*.
It's a fun way of visualizing the GA at work, because as the GA finds better and better candidates, they make more and more sense.

Now, string matching is not *directly* an optimization problem so this falls under the general category of "if we convert the problem into an optimization problem we can solve it with an optimization algorithm" approach to problem solving.
This happens all the time.
We have a problem.
We can't solve it.
We convert it to a problem we *can* solve.
In this case, we're using the GA to solve the optimization part.

And all we need is some sort of measure of the difference between two strings.
The only constraint for our objective function is that it must calculate the score based on element to element (character to character) comparisons with no global transformations of the candidate or target strings.
That measure becomes our **objective function** and we can use it with the Genetic Algorithm.

Since it is probably easier to come up with a score that measures how far apart the two strings are, we will end up with an objective function that represents a **minimization problem**.
Because a fitness function must always be "more is better", we'll need to convert our objective function to a proper fitness function as well.

And since this is a GA, we need a **genotype**.
The genotype for this problem is a list of "characters" (individual letters aren't special in Python like they are in some other languages):

```
["h", "e", "l", "l", "o"]
```

and the **phenotype** is the resulting string:

```
"hello"
```

In addition to the generic code and problem specific loss function, you'll need to pick parameters for the run.
These parameters include:

1. population size
2. number of generations
3. probability of crossover
4. probability of mutation

You will also need to pick a selection algorithm, either roulette wheel or tournament selection.
In the latter case, you will need a tournament size.
This is all part of the problem.
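
This notebook uses tournament selection further down. For comparison, a rough sketch of roulette wheel selection is below. The function name `roulette_pick` and the uniform fallback for a population with no positive fitness are assumptions made for the example, and the approach requires non-negative fitness scores.

```python
import random
from typing import List

def roulette_pick(population: List[List[str]], scores: List[float]) -> List[str]:
    """Pick one individual with probability proportional to its fitness."""
    if sum(scores) <= 0:
        # Assumed fallback: with no positive fitness, pick uniformly.
        return random.choice(population)
    return random.choices(population, weights=scores, k=1)[0]

population = [list("abc"), list("afc"), list("xyz")]
scores = [2.0, 3.0, 0.0]
picks = [roulette_pick(population, scores) for _ in range(1000)]

# "afc" holds 3/5 of the total fitness, so it should win roughly 60% of draws.
print(sum(pick == list("afc") for pick in picks) / len(picks))
```

An individual with a score of zero is never selected by the wheel, which is one reason tournament selection can be easier to work with early in a run.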

Every **ten** (10) generations, you should print out the fitness, genotype, and phenotype of the best individual in the population for the specific generation.
The function should return the best individual *of the entire run*, using the same format.

In [2]:
ALPHABET = "abcdefghijklmnopqrstuvwxyz "

<a id="generate_random_population"></a>
## generate_random_population
`generate_random_population` takes an integer n, the number of individuals in the population, and a string alphabet, and returns a list of length n. Each element is a list of characters, which is the genotype of one individual; the string those characters spell is the phenotype.
For each individual we randomly choose a length between 1 and 20 and fill it with random letters from the alphabet.
**Used by**: [genetic_algorithm](#genetic_algorithm)

* **alphabet**: a string representing the random pool.
* **n**: an integer representing the length of output.

**returns** list of list of characters: random_population.

In [3]:
def generate_random_population(n:int, alphabet:str) -> list[list[str]]:
    random_population = []
    for i in range(n):
        length = random.randint(1, 20)
        current_string = []
        for j in range(length):
            current_string.append(random.choice(alphabet))
        random_population.append(current_string)
    return random_population

In [4]:
generate_random_population(10, ALPHABET)

[['x', 'b'],
 ['i', 'j'],
 ['u', 'g', 'g'],
 ['y', 'k', 'b', 'w', 'u', 'i', 'm', 'h', 'v', 'l', 't', 'h'],
 ['v', 'w', 't', 'a', 's', 'j', 'y', 'b', 'u'],
 ['d', 'l', 'b', 'd', 'v'],
 ['a', 'g'],
 ['c',
  'v',
  'g',
  'n',
  'w',
  'y',
  'o',
  'm',
  'c',
  'a',
  'v',
  't',
  'g',
  'l',
  'c',
  'k',
  'w',
  'o',
  'd',
  'q'],
 ['q',
  ' ',
  'o',
  't',
  'k',
  't',
  'y',
  'b',
  'y',
  'h',
  ' ',
  'j',
  'f',
  's',
  'a',
  'p',
  'y',
  's',
  'w',
  'a'],
 ['u', 'z', 'w', 'r', 'v']]

<a id="fitness_function_1"></a>
## fitness_function_1

This is the fitness function used for problem 1. `fitness_function_1` takes two strings, one is the phenotype of the current individual, and the other is the target. Returns a number that is the fitness score, the higher the better fit it is. **Used by**: [Problem_1](#Problem_1)

We calculate the score by iterating through both strings in the corresponding way and compare each letter. If they satisfy the condition of the problem, then fitness score +1.

For problem 1, we just iterate through the strings one by one and see if they match.

We penalize the length difference a bit, by dividing the score by 1+0.1*abs(length difference), just so it's faster to get to the optimal length and the crossovers become more effective.

* **current_phenotype**: a string.
* **target**: a string.

**returns** number: fitness_score.

In [5]:
def fitness_function_1(current_phenotype:str, target:str) -> float:
    fitness_score = 0.0
    length_difference = abs(len(current_phenotype) - len(target))
    for i in range(0, min(len(current_phenotype), len(target))):
        if current_phenotype[i] == target[i]:
            fitness_score += 1.0
    fitness_score /= (1 + 0.1 * length_difference)
    return fitness_score

In [6]:
print(fitness_function_1("this",  "this is so much fun"))
print(fitness_function_1("this is so much",  "this is so much fun"))
print(fitness_function_1("this is so much fun",  "this is so much fun"))
print(fitness_function_1("this is so much fun but more than that",  "this is so much fun"))

1.6
10.714285714285715
19.0
6.551724137931034


<a id="evaluate_population"></a>
## evaluate_population

Takes a population of genotypes, the target string, and a fitness function; builds the phenotype of each individual and calculates its fitness score. Returns the list of fitness scores. **Used by**: [genetic_algorithm](#genetic_algorithm)

* **population**: a list of lists of characters (one-character strings)
* **target**: a string
* **fitness_function**: a callable that takes a phenotype and the target and returns a score

**returns** a list of fitness scores (floats), one per individual

In [7]:
def evaluate_population( population: List[List[str]], target: str, fitness_function: Callable[[str, str], float]) -> List[float]:
    fitness_scores = []
    for individual in population:
        phenotype = "".join(individual)
        fitness_scores.append(fitness_function(phenotype, target))
    return fitness_scores

In [8]:
test_population = [["a", "b", "c"], ["a"], ["e", "f", "g", "h"], ["e", "f"], ["a", "f"], [" "], ["f", "c"], ["a", "c"], ["a", "c", "e", "f", "g", "h"]]
test_target = "afc"
test_scores=evaluate_population(test_population, test_target, fitness_function_1)
print(test_scores)

[2.0, 0.8333333333333334, 0.9090909090909091, 0.9090909090909091, 1.8181818181818181, 0.0, 0.0, 0.9090909090909091, 0.7692307692307692]


<a id="get_best_individual"></a>
## get_best_individual

Given the current best individual, the current population, and its scores, return the best individual of the population if it scores at least as high as the current best; otherwise return the current best. **Used by**: [genetic_algorithm](#genetic_algorithm)

The best individual is a tuple that contains:
1. fitness_score: a float
2. genotype: a list of characters (a list of str)
3. phenotype: a string

* **best_individual**: a tuple of 3 elements: a float, a list of strs, and a string
* **population**: a list of list of characters (one char string)
* **scores**: a list of floats

**returns** best_individual: a tuple of 3 elements: a float, a list of strs, and a string

In [9]:
def get_best_individual(best_individual: Tuple[float,List[str],str], population: List[List[str]], scores: List[float]) ->Tuple[float,List[str],str]:
    best_score = max(scores)
    if best_individual[0] > best_score:
        return best_individual
    best_index = scores.index(best_score)
    return (best_score, population[best_index], "".join(population[best_index]))

In [10]:
test_best_individual = (0,["b"], "b")
test_best_individual = get_best_individual(test_best_individual, test_population, test_scores)
print(test_best_individual)
test_best_individual = (3.0,["a","f","c"], "afc")
test_best_individual = get_best_individual(test_best_individual, test_population, test_scores)
print(test_best_individual)

(2.0, ['a', 'b', 'c'], 'abc')
(3.0, ['a', 'f', 'c'], 'afc')


<a id="pick_parents"></a>
## pick_parents

Given the population and their scores, randomly pick the parents using the tournament method. Returns the parents as a list of their genotypes. **Used by**: [genetic_algorithm](#genetic_algorithm)
We randomly choose 7 different individuals and pick the best one as a parent, and we do this twice. Both tournaments may pick the same individual, which is fine.

* **population**: a list of list of characters (one char string)
* **scores**: a list of floats

**returns** parents: a list of 2 elements, each of them a list of strs

In [11]:
def pick_parents( population: List[List[str]], scores: List[float]) -> List[List[str]]:
    tournament_size = 7  # the population must contain at least this many individuals
    parents = []
    for _ in range(2):
        # draw distinct contestants, then keep the one with the highest score
        contestants = random.sample(range(len(population)), tournament_size)
        winner = max(contestants, key=lambda index: scores[index])
        parents.append(population[winner])
    return parents


In [12]:
test_parents = pick_parents(test_population, test_scores)
print(test_parents)
test_parents = pick_parents(test_population, test_scores)
print(test_parents)

[['a', 'f'], ['a', 'b', 'c']]
[['a', 'b', 'c'], ['a', 'b', 'c']]


<a id="reproduce"></a>
## reproduce

Given a list of 2 genotypes as the parents, an integer generation used to calculate the mutation rate, and a string alphabet to draw mutations from, return a list of 2 children as genotypes. **Used by**: [genetic_algorithm](#genetic_algorithm)

We set the mutation rate to 10 / (10 + generation) so that mutation becomes less and less likely in later generations, a concept similar to simulated annealing. A mutation replaces one random gene, removes the last gene, or appends a random gene.
Crossover always happens; we only choose the crossover point at random. Due to the nature of this problem, a crossover is almost always beneficial, since it will usually produce one child that is better than the parents and one that is worse, and we want the better one to survive.
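
To see how quickly that schedule decays, the sketch below evaluates the rate at a few generations. The helper name `mutation_rate` is hypothetical; `reproduce` computes the same expression inline.

```python
def mutation_rate(generation: int) -> float:
    """Probability that a child is mutated in the given generation."""
    return 10 / (10 + generation)

for generation in (0, 10, 90, 490):
    print(generation, mutation_rate(generation))
# 0 1.0
# 10 0.5
# 90 0.1
# 490 0.02
```

Every child is mutated in the first generation, while by the end of a 500-generation run only about 1 child in 50 is.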

* **parents**: a list of exactly 2 genotypes, each a list of characters (one-character strings)
* **generation**: an integer
* **alphabet**: a string to pick from for the mutations

**returns** children: a list of 2 genotypes, each a list of characters (one-character strings)

In [13]:
def reproduce( parents: List[List[str]], generation: int, alphabet: str) -> List[List[str]]:
    # crossover: swap tails at a random point within the shorter parent
    shorter_length = min(len(parents[0]), len(parents[1]))
    crossover_point = random.randint(0, shorter_length - 1)
    child0 = parents[0][:crossover_point] + parents[1][crossover_point:]
    child1 = parents[1][:crossover_point] + parents[0][crossover_point:]
    children = [child0, child1]
    for child in children:
        if random.uniform(0, 1) < 10 / (10 + generation):  # mutation
            index = random.randint(0, len(child) + 1)
            if index == len(child):
                # delete the last gene, but never empty the genotype:
                # an empty parent would make the crossover randint(0, -1) fail
                if len(child) > 1:
                    child.pop()
            elif index == len(child) + 1:
                child.append(random.choice(alphabet))
            else:
                child[index] = random.choice(alphabet)
    return children

In [14]:
test_parents = [['a', 'b', 'c', 'a', 'b', 'c' ], ['d', 'e', 'f']]
test_generation = 0
alphabet = ALPHABET
children = reproduce(test_parents, test_generation, alphabet)
print(children)
children = reproduce(test_parents, test_generation, alphabet)
print(children)
children = reproduce(test_parents, test_generation, alphabet)
print(children)

[['a', 'b', 'f', 'x'], ['d', 'z', 'c', 'a', 'b', 'c']]
[['d', 'e', 'f'], ['a', 'b', 'c', 'a', 'a', 'c']]
[['a', 'b', 'f', 'l'], ['d', 'e', 'c', 'a', 'n', 'c']]


<a id="genetic_algorithm"></a>
## genetic_algorithm

`genetic_algorithm` takes a dictionary containing all of the arguments:
* **population_size**: an integer for the total population size to be run on.
* **generation_limit**: an integer for the number of generations to run.
* **alphabet**: a string for the list of characters to pick from.
* **target**: the target string
* **fitness_function**: a callable that takes the individual and target and returns a score. For the nature of this problem, its maximum is the length of the target string.

It will:
1. create a random population of size **population_size**
2. keep producing new generations until an individual reaches the best possible score or the generation limit is reached
3. calculate the scores of the population using the **fitness_function**
4. find the best individual in the population
5. print out the best individual found so far every 10 generations
6. pick parents and generate their children until the next generation is the same size as the old population
7. **returns** the best individual with its score, genotype and phenotype

In [15]:
def genetic_algorithm(arguments: Dict[str, Any]) -> Tuple[float, List[str], str]:
    population = generate_random_population(arguments["population_size"], arguments["alphabet"])
    current_generation = 0
    best_individual = (0.0, [], "")
    while current_generation < arguments["generation_limit"]:
        scores = evaluate_population(population, arguments["target"], arguments["fitness_function"])
        best_individual = get_best_individual(best_individual, population, scores)
        if current_generation % 10 == 0:
            pprint(best_individual, compact = True)
        if best_individual[0] == len(arguments["target"]):
            return best_individual
        next_population = []
        for i in range(arguments["population_size"] // 2):
            parents = pick_parents(population, scores)
            children = reproduce(parents, current_generation, arguments["alphabet"])
            for child in children:
                next_population.append(child)
        current_generation += 1
        population = next_population
    return best_individual

## Problem 1

The target is the string "this is so much fun".
The challenge, aside from implementing the basic algorithm, is deriving a fitness function based on "b" - "p" (for example).
The fitness function should come up with a fitness score based on element to element comparisons between target v. phenotype.
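
One way to read the "b" - "p" hint is as a distance between character codes. The sketch below is only one possible interpretation, not the fitness function used in this notebook: `char_distance`, `string_objective`, `string_fitness`, and the penalty of 26 per missing or extra character are all assumptions made for illustration.

```python
def char_distance(a: str, b: str) -> int:
    """Absolute difference between the code points of two single characters."""
    return abs(ord(a) - ord(b))

def string_objective(phenotype: str, target: str) -> int:
    """Sum of per-character distances; 0 means a perfect match."""
    total = sum(char_distance(p, t) for p, t in zip(phenotype, target))
    # Assumed penalty: each missing or extra character costs 26.
    total += 26 * abs(len(phenotype) - len(target))
    return total

def string_fitness(phenotype: str, target: str) -> float:
    """Convert the minimization objective into a 'more is better' score."""
    return 1.0 / (1.0 + string_objective(phenotype, target))

print(char_distance("b", "p"))            # 14
print(string_objective("this", "this"))   # 0
print(string_fitness("thip", "this") < string_fitness("this", "this"))  # True
```

Note that the space character (code 32) sits far from the lowercase letters (codes 97 to 122), so this distance treats a space as very different from every letter.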

In [16]:
target1 = "this is so much fun"

In [17]:
arguments1 = {"population_size": 100, "generation_limit":500, "target": target1, "fitness_function": fitness_function_1, "alphabet": ALPHABET}

In [18]:
result1 = genetic_algorithm(arguments1) # do what you need to do for your implementation but don't change the lines above or below.

(2.0,
 ['a', 'd', 'u', 'm', 'w', 'y', 's', 'w', 'e', 'o', 'r', 'c', 'v', 'u', 's',
  'p', 'r', 'z', 'l'],
 'adumwysweorcvusprzl')
(11.0,
 ['t', 'h', 's', 'y', ' ', 'u', 's', ' ', 's', 'q', 'n', 'm', 'u', 'c', 'q',
  'd', 'x', 'u', 'n'],
 'thsy us sqnmucqdxun')
(15.0,
 ['t', 'h', 'n', 's', ' ', 'i', 's', ' ', 's', 'o', 'n', 'm', 'u', 'c', 'q',
  ' ', 'x', 'u', 'n'],
 'thns is sonmucq xun')
(17.0,
 ['t', 'h', 'i', 's', ' ', 'i', 's', ' ', 's', 'o', ' ', 'm', 'u', 'c', 'i',
  ' ', 'o', 'u', 'n'],
 'this is so muci oun')


In [19]:
pprint(result1, compact=True)

(19.0,
 ['t', 'h', 'i', 's', ' ', 'i', 's', ' ', 's', 'o', ' ', 'm', 'u', 'c', 'h',
  ' ', 'f', 'u', 'n'],
 'this is so much fun')


## Problem 2

You should have working code now.
The goal here is to think a bit more about fitness functions.
The target string is now, 'nuf hcum os si siht'.
This is obviously target #1 but reversed.
If we just wanted to match the string, this would be trivial.
Instead, in this problem, we want to "decode" the string so that the best individual displays the target forwards.
In order to do this, you'll need to come up with a fitness function that measures how successful candidates are towards this goal.
The constraint is that you may not perform any global operations on the target or individuals.
Your fitness function must still compare a single gene against a single gene.
Your solution will likely not be Pythonic but use indexing.
That's ok.
<div style="background: lemonchiffon; margin:20px; padding: 20px;">
    <strong>Important</strong>
    <p>
        You may not reverse an entire string (either target or candidate) at any time.
        Everything must be a computation of one gene against one gene (one letter against one letter).
        Failure to follow these directions will result in 0 points for the problem.
    </p>
</div>

The best individual in the population is the one who expresses this string *forwards*.

"this is so much fun"

<a id="fitness_function_2"></a>
## fitness_function_2

This is the fitness function used for problem 2. `fitness_function_2` takes two strings, one is the phenotype of the current individual, and the other is the target. Returns a number that is the fitness score, the higher the better fit it is. **Used by**: [Problem_2](#Problem_2)

We calculate the score by iterating through both strings in the corresponding way and compare each letter. If they satisfy the condition of the problem, then fitness score +1.

For problem 2, we just iterate through the current individual forward and the target backward.

We penalize the length difference a bit, by dividing the score by 1+0.1*abs(length difference), just so it's faster to get to the optimal length and the crossovers become more effective.

* **current_phenotype**: a string.
* **target**: a string.

**returns** number: fitness_score.

In [20]:
def fitness_function_2(current_phenotype:str, target:str) -> float:
    fitness_score = 0.0
    length_difference = abs(len(current_phenotype) - len(target))
    for i in range(0, min(len(current_phenotype), len(target))):
        if current_phenotype[i] == target[-1-i]:
            fitness_score += 1.0
    fitness_score /= (1 + 0.1 * length_difference)
    return fitness_score

In [21]:
print(fitness_function_2("this",  "nuf hcum os si siht"))
print(fitness_function_2("this is so much", "nuf hcum os si siht"))
print(fitness_function_2("this is so much fun",  "nuf hcum os si siht"))
print(fitness_function_2("this is so much fun but more than that", "nuf hcum os si siht"))

1.6
10.714285714285715
19.0
6.551724137931034


In [22]:
target2 = "nuf hcum os si siht"

In [23]:
arguments2 = {"population_size": 100, "generation_limit":500, "target": target2, "fitness_function": fitness_function_2, "alphabet": ALPHABET}

In [24]:
result2 = genetic_algorithm(arguments2) # do what you need to do for your implementation but don't change the lines above or below.

(3.0,
 ['z', 'k', 'r', 'o', 't', 'z', 's', 'x', 'i', 'v', 'e', 'm', 'd', 'e', 'h',
  't', 'y', 'z', 'm'],
 'zkrotzsxivemdehtyzm')
(10.0,
 ['t', 'h', 'o', 's', 's', 'g', 's', 'p', 's', 'o', 'k', 'm', 'f', 'j', 'h',
  't', 'f', 'u', 's'],
 'thossgspsokmfjhtfus')
(15.0,
 ['t', 'h', 'o', 's', 's', 'u', 's', ' ', 's', 'o', ' ', 'm', 'u', 'j', 'h',
  ' ', 'f', 'u', 'n'],
 'thossus so mujh fun')
(17.0,
 ['t', 'h', 'i', 's', 'a', 'i', 's', ' ', 's', 'o', ' ', 'm', 'u', 'j', 'h',
  ' ', 'f', 'u', 'n'],
 'thisais so mujh fun')
(17.0,
 ['t', 'h', 'i', 's', 's', 'i', 's', ' ', 's', 'o', ' ', 'm', 'u', 'j', 'h',
  ' ', 'f', 'u', 'n'],
 'thissis so mujh fun')
(17.0,
 ['t', 'h', 'i', 's', 'r', 'i', 's', ' ', 's', 'o', ' ', 'm', 'u', 'f', 'h',
  ' ', 'f', 'u', 'n'],
 'thisris so mufh fun')
(18.0,
 ['t', 'h', 'i', 's', ' ', 'i', 's', ' ', 's', 'o', ' ', 'm', 'u', 'p', 'h',
  ' ', 'f', 'u', 'n'],
 'this is so muph fun')
(18.0,
 ['t', 'h', 'i', 's', ' ', 'i', 's', ' ', 's', 'o', ' ', 'm', 'u', 'f', 'h',


In [25]:
pprint(result2, compact=True)

(19.0,
 ['t', 'h', 'i', 's', ' ', 'i', 's', ' ', 's', 'o', ' ', 'm', 'u', 'c', 'h',
  ' ', 'f', 'u', 'n'],
 'this is so much fun')


## Problem 3

This is a variation on the theme of Problem 2.
A Caesar cipher with a shift of 13 replaces each letter of a string with the letter 13 characters down the alphabet (rotating from "z" back to "a" as needed).
This is also known as ROT13 (for "rotate 13").
Latin did not have spaces (and the space is not contiguous with the letters a-z) so we'll remove them from our alphabet.
Again, the goal is to derive a fitness function that compares a single gene against a single gene, without global transformations.
This fitness function assigns higher scores to individuals that correctly decode the target.

<div style="background: lemonchiffon; margin:20px; padding: 20px;">
    <strong>Important</strong>
    <p>
        You may not apply ROT13 to an entire string (either target or candidate) at any time.
        Everything must be a computation of one gene against one gene.
        Failure to follow these directions will result in 0 points for the problem.
    </p>
</div>

The best individual will express the target *decoded*.

"thisissomuchfun"

<a id="fitness_function_3"></a>
## fitness_function_3

This is the fitness function used for problem 3. `fitness_function_3` takes two strings, one is the phenotype of the current individual, and the other is the target. Returns a number that is the fitness score, the higher the better fit it is. **Used by**: [Problem_3](#Problem_3)

We calculate the score by iterating through both strings in the corresponding way and compare each letter. If they satisfy the condition of the problem, then fitness score +1.

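For problem 3, we iterate through both strings forward and check whether each candidate letter is the target letter rotated by 13. Because 13 is exactly half of the 26-letter alphabet, rotating the candidate letter or the target letter gives the same result, so we only need to check whether the two character codes differ by exactly 13.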
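
The claim that rotating either letter gives the same answer can be checked one character at a time. This is a sketch with a hypothetical helper, `rot13_char`, that assumes lowercase a-z only; it never transforms a whole string.

```python
def rot13_char(ch: str) -> str:
    """Rotate one lowercase letter 13 places, wrapping from 'z' to 'a'."""
    return chr((ord(ch) - ord("a") + 13) % 26 + ord("a"))

letters = "abcdefghijklmnopqrstuvwxyz"

# Applying the rotation twice returns the original letter.
twice_is_identity = all(rot13_char(rot13_char(ch)) == ch for ch in letters)
print(twice_is_identity)  # True

# For a-z, "is the ROT13 partner" is the same as "codes differ by exactly 13".
same_as_offset_check = all(
    (rot13_char(a) == b) == (abs(ord(a) - ord(b)) == 13)
    for a in letters
    for b in letters
)
print(same_as_offset_check)  # True

print(rot13_char("t"), rot13_char("g"))  # g t
```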

We penalize the length difference a bit, by dividing the score by 1+0.1*abs(length difference), just so it's faster to get to the optimal length and the crossovers become more effective.

* **current_phenotype**: a string.
* **target**: a string.

**returns** number: fitness_score.

In [26]:
def fitness_function_3(current_phenotype:str, target:str) -> float:
    fitness_score = 0.0
    length_difference = abs(len(current_phenotype) - len(target))
    for i in range(0, min(len(current_phenotype), len(target))):
        if ord(current_phenotype[i]) + 13 == ord(target[i]) or ord(current_phenotype[i]) - 13 == ord(target[i]):
            fitness_score += 1.0
    fitness_score /= (1 + 0.1 * length_difference)
    return fitness_score

In [27]:
print(fitness_function_3("this",  "guvfvffbzhpusha"))
print(fitness_function_3("thisissomuch", "guvfvffbzhpusha"))
print(fitness_function_3("thisissomuchfun",  "guvfvffbzhpusha"))
print(fitness_function_3("thisissomuchfunbutmorethanthat", "guvfvffbzhpusha"))

1.9047619047619047
9.23076923076923
15.0
6.0


In [28]:
ALPHABET3 = "abcdefghijklmnopqrstuvwxyz"

In [29]:
target3 = "guvfvffbzhpusha"

In [30]:
arguments3 = {"population_size": 100, "generation_limit":500, "target": target3, "fitness_function": fitness_function_3, "alphabet": ALPHABET3}

In [31]:
result3 = genetic_algorithm(arguments3) # do what you need to do for your implementation but don't change the lines above or below.

(2.727272727272727,
 ['t', 'e', 'o', 'a', 'y', 's', 'r', 'm', 'a', 'h', 'g', 'p', 'f', 'h', 'g',
  'p'],
 'teoaysrmahgpfhgp')
(10.0,
 ['t', 'h', 'i', 's', 'i', 's', 's', 'o', 'd', 'u', 'r', 'v', 'f', 's', 'n',
  'q'],
 'thisissodurvfsnq')
(14.0,
 ['t', 'h', 'i', 's', 'i', 's', 's', 'o', 'm', 'u', 'c', 'h', 'f', 's', 'n'],
 'thisissomuchfsn')
(14.0,
 ['t', 'h', 'i', 's', 'i', 's', 's', 'o', 'm', 'u', 'c', 'h', 'f', 's', 'n'],
 'thisissomuchfsn')
(14.0,
 ['t', 'h', 'i', 's', 'i', 's', 's', 'o', 'm', 'u', 'c', 'h', 'f', 's', 'n'],
 'thisissomuchfsn')


In [32]:
pprint(result3, compact=True)

(15.0,
 ['t', 'h', 'i', 's', 'i', 's', 's', 'o', 'm', 'u', 'c', 'h', 'f', 'u', 'n'],
 'thisissomuchfun')


## Problem 4

There is no code for this problem.

In Problem 3, we assumed we knew what the shift was in ROT-13.
What if we didn't?
Describe how you might solve that problem including a description of the solution encoding (chromosome and interpretation) and fitness function. Assume we can add spaces into the message.

Every problem of this kind needs a fitness function, and I don't think we can write one if the rule of the question is arbitrary.
We need to know what the target looks like and what we want the individuals to look like. Since both are phenotypes, I don't see why we wouldn't simply use the target as the thing we want. In real applications, however, we don't know the target; we just want the best individual that satisfies some conditions. Essentially, we need to convert the target into a list of conditions to satisfy and give them weights. In this particular problem, we have a map from 't' to 'g', 'h' to 'u', and so on, and each letter has equal weight. Any letter not in the map gets no score. Then we can use the same fitness function, except that we look up the expected letter in the map.

## Challenge

**You do not need to do this problem and it won't be graded if you do. It's just here if you want to push your understanding.**

The original GA used binary encodings for everything.
We're basically using a Base 27 encoding.
You could, however, write a version of the algorithm that uses an 8 bit encoding for each letter (ignore spaces as they're a bit of a bother).
That is, a 5 letter candidate looks like this:

```
0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1
```

If you wrote your `genetic_algorithm` code generally enough, with higher-order functions, you should be able to implement it using bit strings instead of Latin strings.

## Before You Submit...

1. Did you provide output exactly as requested?
2. Did you re-execute the entire notebook? ("Restart Kernel and Run All Cells...")
3. If you did not complete the assignment or had difficulty please explain what gave you the most difficulty in the Markdown cell below.
4. Did you change the name of the file to `jhed_id.ipynb`?

Do not submit any other files.