# Module 3 - Programming Assignment

## General Directions

1. You must follow the Programming Requirements outlined on Canvas.
2. The Notebook should be cleanly and fully executed before submission.
3. You should change the name of this file to be your JHED id. For example, `jsmith299.ipynb` although Canvas will change it to something else...
4. You must follow the Programming Requirements for this course.

<div style="background: lemonchiffon; margin:20px; padding: 20px;">
    <strong>Important</strong>
    <p>
        You should always read the entire assignment before beginning your work, so that you know in advance what the requested output will be and can plan your implementation accordingly.
    </p>
</div>

<div style="color: white; background: #C83F49; margin:20px; padding: 20px;">
    <strong>Academic Integrity and Copyright</strong>
    <p>You are not permitted to consult outside sources (Stackoverflow, YouTube, ChatGPT, etc.) or use "code assistance" (Co-Pilot, etc) to complete this assignment. By submitting this assignment for grading, you certify that the submission is 100% your own work, based on course materials, group interactions, and instructor guidance. You agree to comply with the requirements set forth in the Syllabus, including, by reference, the JHU KSAS/WSE Graduate Academic Misconduct Policy.</p>
    <p>Sharing this assignment either directly (e.g., email, github, homework site) or indirectly (e.g., ChatGPT, machine learning platform) is a violation of the copyright. Additionally, all such sharing is a violation of the Graduate Academic Misconduct Policy (facilitating academic dishonesty is itself academic dishonesty), even after you graduate.</p>
    <p>If you have questions or if you're unsure about the policy, ask via Canvas Inbox. In this case, being forgiven is <strong>not</strong> easier than getting permission and ignorance is not an excuse.</p>
    <p>This assignment is copyright (&copy; Johns Hopkins University &amp; Stephyn G. W. Butcher). All rights reserved.</p>
</div>

In [1]:
from pprint import pprint

## Local Search - Genetic Algorithm

There are some key ideas in the Genetic Algorithm.

First, there is a problem of some kind that either *is* an optimization problem or the solution can be expressed in terms of an optimization problem.
For example, if we wanted to minimize the function

$$f(x) = \sum_{i=1}^{n} (x_i - 0.5)^2$$

where $n = 10$.
This *is* an optimization problem. Normally, optimization problems are much, much harder.

![Eggholder](http://www.sfu.ca/~ssurjano/egg.png)

The function we wish to optimize is often called the **objective function**.
The objective function is closely related to the **fitness** function in the GA.
If we have a **maximization** problem, then we can use the objective function directly as a fitness function.
If we have a **minimization** problem, then we need to convert the objective function into a suitable fitness function, since fitness functions must always mean "more is better".
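
As a minimal sketch of that conversion, the example below evaluates the quadratic objective from above and maps it to a fitness with $1 / (1 + f(x))$. The names `objective` and `fitness` are illustrative only, and this mapping is just one of several reasonable choices.

```python
def objective(x: list[float]) -> float:
    """Sum of squared distances from 0.5; smaller is better (minimization)."""
    return sum((x_i - 0.5) ** 2 for x_i in x)


def fitness(x: list[float]) -> float:
    """Map the objective into (0, 1] so that more is better."""
    return 1.0 / (1.0 + objective(x))


n = 10
print(objective([0.5] * n))  # 0.0, the minimum of the objective
print(fitness([0.5] * n))    # 1.0, the maximum of the fitness
print(objective([0.0] * n))  # 2.5
```

The best candidate has the lowest objective value and therefore the highest fitness, which is the ordering a GA expects.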

Second, we need to *encode* candidate solutions using an "alphabet" analogous to G, A, T, C in DNA.
This encoding can be quite abstract.
You saw this in the Self Check.
There, a floating point number was encoded as bits, just as in a computer, and a sophisticated decoding scheme was then required.

Sometimes, the encoding need not be very complicated at all.
For example, in the real-valued GA, discussed in the Lectures, we could represent 2.73 as... 2.73.
This is similarly true for a string matching problem.
We *could* encode "a" as "a", 97, or '01100001'.
And then "hello" would be:

```
["h", "e", "l", "l", "o"]
```

or

```
[104, 101, 108, 108, 111]
```

or

```
0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1
```

In Genetics terminology, this is the **chromosome** of the individual. And if this individual had the **phenotype** "h" for the first character then they would have the **genotype** for "h" (either as "h", 104, or 01101000).

To keep it straight, think **geno**type is **genes** and **pheno**type is **phenomenon**, the actual thing that the genes express.
So while we might encode a number as 10110110 (genotype), the number itself, 182, is what goes into the fitness function.
The environment operates on zebras, not the genes for stripes.
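
The three encodings of "hello" and the genotype-to-phenotype step can be reproduced with a few lines of Python. This is only an illustration; the variable names are made up for the example.

```python
word = "hello"

# Three possible genotypes for the same phenotype
as_chars = list(word)                 # ['h', 'e', 'l', 'l', 'o']
as_codes = [ord(ch) for ch in word]   # [104, 101, 108, 108, 111]
as_bits = [int(b) for ch in word for b in format(ord(ch), "08b")]

# Decoding a genotype back into the phenotype the fitness function sees
decoded_word = "".join(chr(code) for code in as_codes)
decoded_number = int("10110110", 2)

print(as_bits[:8])     # [0, 1, 1, 0, 1, 0, 0, 0], the gene for "h"
print(decoded_word)    # hello
print(decoded_number)  # 182
```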

## String Matching

You are going to write a Genetic Algorithm that will solve the problem of matching a target string (at least at the start).
Now, this is kind of silly because in order for this to work, you need to know the target string and if you know the target string, why are you trying to do it?
Well, the problem is *pedagogical*.
It's a fun way of visualizing the GA at work, because as the GA finds better and better candidates, they make more and more sense.

Now, string matching is not *directly* an optimization problem so this falls under the general category of "if we convert the problem into an optimization problem we can solve it with an optimization algorithm" approach to problem solving.
This happens all the time.
We have a problem.
We can't solve it.
We convert it to a problem we *can* solve.
In this case, we're using the GA to solve the optimization part.

And all we need is some sort of measure of the difference between two strings.
The only constraint for our objective function is that it must calculate the score based on element to element (character to character) comparisons with no global transformations of the candidate or target strings.
That measure becomes our **objective function** and we can use it with the Genetic Algorithm.

Since it is probably easier to come up with a score that measures how far apart the two strings are, we will end up with an objective function that represents a **minimization problem**.
Because a fitness function must always be "more is better", we'll need to convert our objective function to a proper fitness function as well.
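
One possible distance measure, sketched here under the assumption that character codes are an acceptable notion of "how far apart" two letters are, sums the per-position differences and then inverts the result. The function names are hypothetical and this is not the only valid objective.

```python
def char_distance_objective(candidate: str, target: str) -> int:
    """Sum of per-position character-code distances; 0 means an exact match."""
    return sum(abs(ord(c) - ord(t)) for c, t in zip(candidate, target))


def distance_fitness(candidate: str, target: str) -> float:
    """Convert the minimization objective into a 'more is better' fitness."""
    return 1.0 / (1.0 + char_distance_objective(candidate, target))


print(char_distance_objective("hello", "hello"))  # 0
print(char_distance_objective("hellp", "hello"))  # 1
print(char_distance_objective("b", "p"))          # 14
print(distance_fitness("hello", "hello"))         # 1.0
```

Note that under this measure the space character (code 32) sits far from the letters (codes 97 to 122), so a misplaced space is penalized heavily.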

And since this is a GA, we need a **genotype**.
The genotype for this problem is a list of "characters" (individual letters aren't special in Python like they are in some other languages):

```
["h", "e", "l", "l", "o"]
```

and the **phenotype** is the resulting string:

```
"hello"
```

In addition to the generic code and problem specific loss function, you'll need to pick parameters for the run.
These parameters include:

1. population size
2. number of generations
3. probability of crossover
4. probability of mutation

You will also need to pick a selection algorithm, either roulette wheel or tournament selection.
In the latter case, you will need a tournament size.
This is all part of the problem.
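
For reference, tournament selection can be sketched as follows. The function name, the default tournament size of 3, and the `rng` parameter are assumptions made for this example, not part of the assignment.

```python
import random


def tournament_select(population, fitness_scores, tournament_size=3, rng=random):
    """Draw `tournament_size` distinct individuals at random and return the fittest."""
    contenders = rng.sample(range(len(population)), tournament_size)
    winner = max(contenders, key=lambda i: fitness_scores[i])
    return population[winner]


population = ["aaaa", "abca", "abcd"]
scores = [1, 3, 4]

# With a tournament as large as the population, the fittest individual always wins.
print(tournament_select(population, scores, tournament_size=3))  # abcd
```

A larger tournament size increases selection pressure, because weak individuals are less likely to be the best of a larger sample.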

Every **ten** (10) generations, you should print out the fitness, genotype, and phenotype of the best individual in the population for the specific generation.
The function should return the best individual *of the entire run*, using the same format.

In [2]:
ALPHABET = "abcdefghijklmnopqrstuvwxyz "

In [3]:
# Importing these again in case I need them
from typing import List, Tuple, Dict, Callable

<a id="rotate_char"></a>
## `rotate_char`

This function "rotates" a lowercase letter forward by a given number of positions, wrapping from "z" back to "a". Any character outside a-z is returned unchanged. It is used only for the ROT13 decoding in Problem 3, where the fitness function needs it to compare each candidate letter against the shifted target letter.

### Parameters:
- `c` (str): The character to shift.
- `rotation` (int): The number of positions to shift the character by.

### Returns:
- (str): The rotated character.

In [4]:
def rotate_char(c: str, rotation: int) -> str:
    if 'a' <= c <= 'z':
        return chr(((ord(c) - ord('a') + rotation) % 26) + ord('a'))
    else:
        return c

In [5]:
# assertions/unit tests
assert rotate_char('a', 10) == 'k' #Rotate from a by 10
assert rotate_char('z', 1) == 'a' # Ensure rotation wraps back to a
assert rotate_char('e', 5) == 'j' # random char rotation

<a id="evaluate_fitness"></a>
## `evaluate_fitness`

This is a key function for the genetic algorithm. It takes the candidate, the target, the `reversed` flag, and the rotation, and counts how many characters of the candidate match the target, one position at a time. Without this function we could not tell how well a candidate is performing, so the search could not move toward the target.

### Parameters:
- `candidate` (str): The candidate we are evaluating
- `target` (str): The target we evaluate the candidate against
- `reversed` (bool): Whether the candidate is compared against the target read back to front instead of front to back
- `rotation` (int): How many positions each target letter is rotated before the comparison

### Returns:
- (int): The number of positions where the candidate matches the target; higher is better

In [6]:
def evaluate_fitness(candidate: str, target: str, reversed: bool=False, rotation: int=0) -> int:
    if not reversed:
        # Compare the candidate to the target directly
        return sum(1 for i, c in enumerate(candidate) if c == rotate_char(target[i], rotation))
    else:
        # Compare the candidate to the reversed target
        n = len(target)
        return sum(1 for i, c in enumerate(candidate) if c == rotate_char(target[n - i - 1], rotation))

In [7]:
# assertions/unit tests
candidate="abcd"
target="abcd"
rev_target="dcba"
rotated_target="zabc"

assert evaluate_fitness(candidate, target) == 4 # Perfect match
assert evaluate_fitness(candidate, rev_target, reversed=True) == 4 # Perfect Match for reversed string
assert evaluate_fitness(candidate, rotated_target, rotation=1) == 4 # Perfect Match for rotated target

<a id="mutate"></a>
## `mutate`

This function mutates a string one character at a time. For each character it draws a random number: if that number is greater than `mutation_rate` the character is kept, otherwise it is replaced with a random character from `alphabet`. Mutation is what lets the genetic algorithm try characters that are not present in the starting population; without it we would be stuck recombining the starting strings.

### Parameters:
- `string` (str): String that may get mutated
- `mutation_rate` (float): Probability that each character is mutated
- `alphabet` (str): The characters that a mutation can choose from

### Returns:
- `mutated_str` (str): The mutated string

In [8]:
import random
def mutate(string: str, mutation_rate: float, alphabet: str) -> str:
    mutated_str = ''.join(
        c if random.random() > mutation_rate else random.choice(alphabet) 
        for c in string
    )
    return mutated_str


In [9]:
candidate = "abcd"
assert mutate(candidate, 0, "f") == "abcd" #No mutation should happen
assert mutate(candidate, 1, "f") == "ffff" #Mutate Everything
mutated = mutate(candidate, .5, "f")
assert len(mutated) == 4 and all(m in (c, "f") for m, c in zip(mutated, candidate)) # About half should be mutated; the result is random, so only check that each character is either unchanged or "f"


<a id="crossover"></a>
## `crossover`

This function performs one-point crossover on two parent strings. It chooses a random crossover point and swaps the parents' tails at that point to produce two children. This lets the genetic algorithm combine parts of two candidates into new strings to score against the target.

### Parameters:
- `parent1` (str): The first parent string
- `parent2` (str): The second parent string
- `non_random` (int): Used for testing; if set to anything other than -1, it is used as the crossover point

### Returns:
- `child1, child2` (tuple[str, str]): The crossed over children strings

In [10]:
def crossover(parent1: str, parent2: str, non_random: int=-1) -> tuple[str, str]:
    if non_random != -1:
        point = non_random
    else:
        point = random.randint(0, len(parent1) - 1)
    child1, child2 = parent1[:point] + parent2[point:], parent2[:point] + parent1[point:]
    return child1, child2

In [11]:
p1 = "abcd"
p2 = "wxyz"

assert crossover(p1, p2, 2) == ('abyz', 'wxcd') # Check a crossover in the middle
assert crossover(p1, p2, 0) == ('wxyz', 'abcd') # Check a crossover if at index 0, should just swap them
assert crossover(p1, p2, 3) == ('abcz', 'wxyd') # Check a crossover towards the end

<a id="initialize_population"></a>
## `initialize_population`

This function initializes the population of strings to test on. This allows the genetic algorithm to have a group of strings to evaluate against and mutate.

### Parameters:
- `target_length` int: Length of the target string so we make all population strings same length
- `population_size` int: How many strings we want to generate 
- `alphabet` str: The alphabet that the function will generate the strings with

### Returns:
- `population` (list[str]): The list of population strings

In [12]:
def initialize_population(target_length: int, population_size: int=100, alphabet: str="abcdefghijklmnopqrstuvwxyz ") -> list[str]:
    population = [''.join(random.choice(alphabet) for _ in range(target_length)) for _ in range(population_size)]
    return population

In [13]:
target_str = "abcd"

assert len(initialize_population(len(target_str))[0]) == 4 #Check that the population string length matches target string length
assert len(initialize_population(len(target_str))) == 100 #Make sure it generates the correct amount of strings
assert initialize_population(len(target_str), alphabet="a")[0] == "aaaa" #Check to ensure it generates strings with the correct alphabet

<a id="genetic_algorithm"></a>
## `genetic_algorithm`

This is the main function of the assignment. It loops for up to `generations` generations (10000 by default), using roulette wheel selection, crossover, and mutation to evolve the population toward the target string. It prints the best individual every ten generations and stops early if an individual matches the target.

### Parameters:
- `target` (str): Target string we are trying to match
- `population` (List[str]): Initial list of candidate strings to select, cross over, and mutate
- `generations` (int): Maximum number of generations to run; the search ends early if the target is found
- `mutation_rate` (float): Probability that each character of a child is mutated
- `alphabet` (str): The alphabet that mutations draw from
- `reversed` (bool): Whether candidates are scored against the target read back to front
- `rotation` (int): How many positions each target letter is rotated before the comparison

### Returns:
- `best_individual` (str): The best individual found during the run, normally the string the target decodes to

In [14]:
def genetic_algorithm(target: str, population: List, generations: int=10000, mutation_rate: float=0.01, alphabet: str="abcdefghijklmnopqrstuvwxyz ", reversed: bool=False, rotation: int=0) -> str:
    best_individual, best_fitness = None, -1

    for generation in range(generations):
        fitness_scores = [evaluate_fitness(individual, target, reversed, rotation) for individual in population]
        max_fitness = max(fitness_scores)
        best_in_generation = population[fitness_scores.index(max_fitness)]
        
        if max_fitness > best_fitness:
            best_fitness, best_individual = max_fitness, best_in_generation

        if generation % 10 == 0:
            print(f"Generation {generation}: Fitness: {max_fitness}, Genotype: {list(best_in_generation)}, Phenotype: {(best_in_generation)}")
        
        if max_fitness == len(target):
            print(f"Solution found in generation {generation}")
            return best_in_generation

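        # Roulette wheel selection. Add 1 to every weight so random.choices does not
        # raise ValueError when every individual in the population scores 0.
        weights = [score + 1 for score in fitness_scores]
        next_population = [mutate(crossover(*random.choices(population, weights=weights, k=2))[i], mutation_rate, alphabet) for i in range(2) for _ in range(len(population) // 2)]
        population = next_population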

    print(f"Best individual after {generations} generations: Fitness: {best_fitness}, Genotype: {best_individual}, Phenotype: {''.join(best_individual)}")
    return best_individual

In [15]:
tgt = "abcd"
population = initialize_population(len(tgt), 10)

assert genetic_algorithm(tgt, ["abcd"], 100) == "abcd" # Ensure it exits early when the population already contains the target
assert genetic_algorithm(tgt, population, alphabet="abcd") == "abcd" # Ensure it can evolve the population into the target
assert genetic_algorithm(tgt, population, 5) # Ensure it still returns something even if it does not reach the target within the generation limit

Generation 0: Fitness: 4, Genotype: ['a', 'b', 'c', 'd'], Phenotype: abcd
Solution found in generation 0
Generation 0: Fitness: 1, Genotype: ['c', 'c', 'w', 'd'], Phenotype: ccwd
Generation 10: Fitness: 1, Genotype: ['c', 'c', 'b', 'd'], Phenotype: ccbd
Generation 20: Fitness: 1, Genotype: ['c', 'c', 'd', 'd'], Phenotype: ccdd
Generation 30: Fitness: 2, Genotype: ['c', 'b', 'w', 'd'], Phenotype: cbwd
Generation 40: Fitness: 3, Genotype: ['a', 'b', 'w', 'd'], Phenotype: abwd
Generation 50: Fitness: 3, Genotype: ['a', 'b', 'w', 'd'], Phenotype: abwd
Solution found in generation 55
Generation 0: Fitness: 1, Genotype: ['c', 'c', 'w', 'd'], Phenotype: ccwd
Best individual after 5 generations: Fitness: 1, Genotype: ccwd, Phenotype: ccwd


## Problem 1

The target is the string "this is so much fun".
The challenge, aside from implementing the basic algorithm, is deriving a fitness function based on "b" - "p" (for example).
The fitness function should come up with a fitness score based on element to element comparisons between target v. phenotype.

In [16]:
target1 = "this is so much fun"

In [17]:
# set up if you need it.
population = initialize_population(len(target1), 100, ALPHABET)

In [18]:
result1 = genetic_algorithm(target1, population, alphabet=ALPHABET) # do what you need to do for your implementation but don't change the lines above or below.

Generation 0: Fitness: 4, Genotype: ['t', 'h', 'i', 'h', 'l', 'a', 'n', 'q', 'j', 'm', 'v', 'c', 'q', 'c', 'k', 'c', 'l', 'e', 'g'], Phenotype: thihlanqjmvcqckcleg
Generation 10: Fitness: 9, Genotype: ['t', 'h', 'i', 'd', ' ', 'i', 'v', 'p', 'a', 'f', ' ', 'b', 'w', 'c', 'd', 'o', 'f', 'u', 'f'], Phenotype: thid ivpaf bwcdofuf
Generation 20: Fitness: 11, Genotype: ['t', 'h', 'i', 'h', ' ', 'i', 'n', ' ', 'j', 'o', 'v', 'g', 'u', 'm', 'h', 'o', 'm', 'u', 'n'], Phenotype: thih in jovgumhomun
Generation 30: Fitness: 13, Genotype: ['t', 'h', 'i', 'd', ' ', 'i', 'n', 'q', 'n', 'q', ' ', 'm', 'u', 'c', 'h', 'f', 'f', 'u', 'n'], Phenotype: thid inqnq muchffun
Generation 40: Fitness: 13, Genotype: ['t', 'h', 'i', 'h', ' ', 'i', 'x', ' ', 'n', 'a', ' ', 'm', 'u', 'c', 'h', 'o', 'f', 'u', 'f'], Phenotype: thih ix na muchofuf
Generation 50: Fitness: 13, Genotype: ['t', 'h', 'i', 'h', ' ', 'i', 'x', ' ', 'j', 'o', 'n', 'm', 'b', 'c', 'h', 't', 'f', 'u', 'n'], Phenotype: thih ix jonmbchtfun
Generat

In [19]:
pprint(result1, compact=True)

'this is so much fun'


## Problem 2

You should have working code now.
The goal here is to think a bit more about fitness functions.
The target string is now, 'nuf hcum os si siht'.
This is obviously target #1 but reversed.
If we just wanted to match the string, this would be trivial.
Instead, in this problem, we want to "decode" the string so that the best individual displays the target forwards.
In order to do this, you'll need to come up with a fitness function that measures how successful candidates are towards this goal.
The constraint is that you may not perform any global operations on the target or individuals.
Your fitness function must still compare a single gene against a single gene.
Your solution will likely not be Pythonic but use indexing.
That's ok.
<div style="background: lemonchiffon; margin:20px; padding: 20px;">
    <strong>Important</strong>
    <p>
        You may not reverse an entire string (either target or candidate) at any time.
        Everything must be a computation of one gene against one gene (one letter against one letter).
        Failure to follow these directions will result in 0 points for the problem.
    </p>
</div>

The best individual in the population is the one who expresses this string *forwards*.

"this is so much fun"

In [20]:
target2 = "nuf hcum os si siht"

In [21]:
# set up if you need it.
population = initialize_population(len(target2), 100, ALPHABET)

In [22]:
result2 = genetic_algorithm(target2, population, alphabet=ALPHABET, reversed=True) # do what you need to do for your implementation but don't change the lines above or below.

Generation 0: Fitness: 2, Genotype: ['a', 'r', 't', 'j', 'k', 'h', 'd', 'n', 'd', 'c', 'w', 'm', 'd', 'q', ' ', 'h', 'h', 'y', 'n'], Phenotype: artjkhdndcwmdq hhyn
Generation 10: Fitness: 8, Genotype: ['t', 'h', 'z', 'm', 'x', 'd', 's', ' ', 'f', 'o', 'k', 'q', 'u', 'e', 'y', 'k', 'f', 'v', 'n'], Phenotype: thzmxds fokqueykfvn
Generation 20: Fitness: 10, Genotype: ['t', 'f', 'i', 'h', ' ', 'n', 's', 'e', 'f', 'o', 'l', 'm', 'u', 't', 'h', 'l', 'f', 'v', 'n'], Phenotype: tfih nsefolmuthlfvn
Generation 30: Fitness: 11, Genotype: ['t', 'h', 'i', 'h', 'l', 'e', 's', 'x', 's', 'o', 'w', 'm', 'u', 't', 'h', 'k', 'f', 'v', 'n'], Phenotype: thihlesxsowmuthkfvn
Generation 40: Fitness: 11, Genotype: ['t', 'h', 'i', 'm', 'l', 'n', 's', 'c', 's', 'o', 'w', 'm', 'u', 'j', 'h', 'g', 'f', 'o', 'n'], Phenotype: thimlnscsowmujhgfon
Generation 50: Fitness: 12, Genotype: ['t', 'h', 'i', 's', 'l', 'w', 's', 'e', 's', 'o', 'l', 'm', 'u', 't', 'h', 'l', 'f', 'v', 'n'], Phenotype: thislwsesolmuthlfvn
Generat

In [23]:
pprint(result2, compact=True)

'this is so much fun'


## Problem 3

This is a variation on the theme of Problem 2.
The version of the Caesar cipher used here replaces each letter of a string with the letter 13 characters down the alphabet (rotating from "z" back to "a" as needed).
This is also known as ROT13 (for "rotate 13").
Latin did not have spaces (and the space is not contiguous with the letters a-z) so we'll remove them from our alphabet.
Again, the goal is to derive a fitness function that compares a single gene against a single gene, without global transformations.
This fitness function assigns higher scores to individuals that correctly decode the target.
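
Because 13 is half of 26, ROT13 is its own inverse: shifting a letter by 13 twice returns the original letter, so the same shift both encodes and decodes. The sketch below checks this one gene at a time; `rot_letter` is a hypothetical helper written for this illustration and assumes lowercase a-z input.

```python
def rot_letter(letter: str, shift: int) -> str:
    """Shift one lowercase letter, wrapping from 'z' back to 'a'."""
    return chr((ord(letter) - ord("a") + shift) % 26 + ord("a"))


print(rot_letter("g", 13))                  # t
print(rot_letter(rot_letter("g", 13), 13))  # g

# Gene-by-gene comparison: each candidate letter against one shifted target letter
candidate = "thisissomuchfun"
encoded = "guvfvffbzhpusha"
matches = sum(1 for c, t in zip(candidate, encoded) if c == rot_letter(t, 13))
print(matches)  # 15
```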

<div style="background: lemonchiffon; margin:20px; padding: 20px;">
    <strong>Important</strong>
    <p>
        You may not apply ROT13 to an entire string (either target or candidate) at any time.
        Everything must be a computation of one gene against one gene.
        Failure to follow these directions will result in 0 points for the problem.
    </p>
</div>

The best individual will express the target *decoded*.

"thisissomuchfun"

In [24]:
ALPHABET3 = "abcdefghijklmnopqrstuvwxyz"

In [25]:
target3 = "guvfvffbzhpusha"

In [26]:
# set up if you need it
population = initialize_population(len(target3), 100, ALPHABET3)

In [27]:
result3 = genetic_algorithm(target3, population, alphabet=ALPHABET3, rotation=13) # do what you need to do for your implementation but don't change the lines above or below.

Generation 0: Fitness: 3, Genotype: ['p', 'j', 'i', 'r', 'w', 'q', 'l', 'i', 's', 'u', 'f', 'h', 't', 't', 'x'], Phenotype: pjirwqlisufhttx
Generation 10: Fitness: 8, Genotype: ['t', 'e', 'i', 'r', 's', 's', 's', 'w', 'm', 'f', 'c', 't', 'f', 'u', 'm'], Phenotype: teirssswmfctfum
Generation 20: Fitness: 10, Genotype: ['t', 'h', 'i', 'r', 'd', 's', 's', 'o', 'o', 'f', 'c', 'h', 'f', 'x', 'n'], Phenotype: thirdssoofchfxn
Generation 30: Fitness: 13, Genotype: ['t', 'h', 'i', 'u', 'i', 's', 's', 'o', 'm', 'u', 'c', 'h', 'f', 's', 'n'], Phenotype: thiuissomuchfsn
Generation 40: Fitness: 13, Genotype: ['t', 'h', 'i', 'b', 'i', 's', 's', 'o', 'm', 'u', 'c', 'h', 'f', 'x', 'n'], Phenotype: thibissomuchfxn
Generation 50: Fitness: 13, Genotype: ['t', 'h', 'i', 'b', 'i', 's', 's', 'o', 'm', 'u', 'c', 'h', 'f', 'p', 'n'], Phenotype: thibissomuchfpn
Generation 60: Fitness: 13, Genotype: ['t', 'h', 'i', 'd', 'i', 's', 's', 'o', 'm', 'u', 'c', 'h', 'f', 'h', 'n'], Phenotype: thidissomuchfhn
Generatio

In [28]:
pprint(result3, compact=True)

'thisissomuchfun'


## Problem 4

There is no code for this problem.

In Problem 3, we assumed we knew what the shift was in ROT-13.
What if we didn't?
Describe how you might solve that problem including a description of the solution encoding (chromosome and interpretation) and fitness function. Assume we can add spaces into the message.

Here is the way I would attempt to handle this. Assuming that it is still just a shift rather than some other kind of cipher, here are the steps I would take.

1. Create the population as usual
2. Create a list of all possible rotations, in our case 0-25
3. Attempt to "decode" in a similar way to Problem 3, but do it for each possible rotation
4. After attempting to decode the message with a given rotation, evaluate the fitness of the decoded string by checking how many valid English words it contains
5. Whichever rotation produces the most English words should give the true decoded string (we can't simply accept any string that contains English words, as words can sometimes appear unintentionally)

The fitness function will evaluate how closely the decoded message resembles valid English by comparing it to a dictionary of English words. This is different from the normal fitness function, as it checks for words instead of closeness to the original string. It is similar to what we had to do for Problem 3, except that we don't know the answer.

The chromosome is the same as before: a list of letters that, over the generations, should move toward the decoded string.
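
A rough sketch of the word-counting fitness described above is shown below. Everything in it is an assumption made for illustration: `KNOWN_WORDS` is a deliberately tiny stand-in for a real English dictionary, `shift_letter` and `word_score` are hypothetical helpers, and the ciphertext is the Problem 3 target with spaces added.

```python
# Hypothetical, deliberately tiny word list; a real run would load a full dictionary.
KNOWN_WORDS = {"this", "is", "so", "much", "fun"}


def shift_letter(letter: str, shift: int) -> str:
    """Shift a lowercase letter with wraparound; anything else passes through."""
    if "a" <= letter <= "z":
        return chr((ord(letter) - ord("a") + shift) % 26 + ord("a"))
    return letter


def word_score(message: str, shift: int) -> int:
    """Count how many decoded words appear in the word list."""
    decoded = "".join(shift_letter(ch, shift) for ch in message)
    return sum(1 for word in decoded.split() if word in KNOWN_WORDS)


ciphertext = "guvf vf fb zhpu sha"
best_shift = max(range(26), key=lambda s: word_score(ciphertext, s))
print(best_shift)                          # 13
print(word_score(ciphertext, best_shift))  # 5
```

With a real dictionary, short accidental words would score for some wrong rotations too, which is why the rotation with the highest count is chosen rather than the first one that produces any word.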

## Challenge

**You do not need to do this problem and it won't be graded if you do. It's just here if you want to push your understanding.**

The original GA used binary encodings for everything.
We're basically using a Base 27 encoding.
You could, however, write a version of the algorithm that uses an 8 bit encoding for each letter (ignore spaces as they're a bit of a bother).
That is, a 5 letter candidate looks like this:

```
0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1
```

If you wrote your `genetic_algorithm` code general enough, with higher order functions, you should be able to implement it using bit strings instead of latin strings.
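
As a starting point, a bit-string genotype could be decoded into its phenotype roughly like this. The helper name `decode_bits` is an assumption for this sketch, which treats every group of 8 bits as one character code.

```python
def decode_bits(bits: list[int]) -> str:
    """Turn a flat list of bits into text, 8 bits per character."""
    chars = []
    for start in range(0, len(bits), 8):
        byte = bits[start:start + 8]
        code = int("".join(str(b) for b in byte), 2)
        chars.append(chr(code))
    return "".join(chars)


genotype = [0, 1, 1, 0, 1, 0, 0, 0,  0, 1, 1, 0, 0, 1, 0, 1,
            0, 1, 1, 0, 1, 1, 0, 0,  0, 1, 1, 0, 1, 1, 0, 0,
            0, 1, 1, 0, 1, 1, 1, 1]
print(decode_bits(genotype))  # hello
```

Keep in mind that most 8-bit patterns do not decode to a lowercase letter, so a fitness function for this encoding has to cope with phenotypes that contain other characters.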

## Before You Submit...

1. Did you provide output exactly as requested?
2. Did you re-execute the entire notebook? ("Restart Kernel and Run All Cells...")
3. If you did not complete the assignment or had difficulty please explain what gave you the most difficulty in the Markdown cell below.
4. Did you change the name of the file to `jhed_id.ipynb`?

Do not submit any other files.

In [29]:
bin(1)[2:].rjust(3, '0')

'001'

In [30]:
bits = [0, 0, 1, 0, 1, 0, 1, 1, 1]
[int(''.join(str(b) for b in bits[i:i + 3]), 2) for i in range(0, len(bits), 3)]  # each 3-bit group decodes to one integer

[1, 2, 7]

In [21]:
[int(b) for n in [4, 2, 5] for b in bin(n)[2:].rjust(3, '0')]  # each integer encodes to 3 bits

[1, 0, 0, 0, 1, 0, 1, 0, 1]

In [22]:
population = [
    [1, 0, 0, 0, 1, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 0, 1, 1, 1],
    [0, 1, 1, 0, 1, 1, 1, 0, 1]
]

In [23]:
population = [
    [1, 0, 0],
    [0, 0, 1],
    [0, 1, 1],
    [0, 1, 0]
]

0. do we crossover? rand() < 0.9
1. pick a locus/index at random...
2. cross

```
mom    010111   011001000
dad    101001   011110110

child1 010111   011110110
child2 101001   011001000
```
0. do we mutate? rand() < 0.05 if TRUE
1. pick a locus/index at random... 
2. pick a symbol from the alphabet at random.. [0, 1]

```
child1 010111010110110
child2 101001011001000
```
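
The crossover and mutation steps in these notes can be sketched on bit strings as follows. The function names, the `rng` parameter, and the choice of locus 6 and locus 8 (which reproduce the example strings above) are assumptions for this illustration; the 0.9 crossover probability comes from the notes.

```python
import random


def one_point_crossover(mom: str, dad: str, locus: int) -> tuple[str, str]:
    """Swap the tails of two parents at `locus`."""
    return mom[:locus] + dad[locus:], dad[:locus] + mom[locus:]


def point_mutation(child: str, locus: int, symbol: str) -> str:
    """Replace the gene at `locus` with `symbol`."""
    return child[:locus] + symbol + child[locus + 1:]


def maybe_crossover(mom: str, dad: str, p_crossover: float = 0.9, rng=random):
    """Cross the parents with probability `p_crossover`, else pass them through."""
    if rng.random() < p_crossover:
        return one_point_crossover(mom, dad, rng.randrange(len(mom)))
    return mom, dad


mom = "010111011001000"
dad = "101001011110110"
child1, child2 = one_point_crossover(mom, dad, 6)
print(child1)                          # 010111011110110
print(child2)                          # 101001011001000
print(point_mutation(child1, 8, "0"))  # 010111010110110
```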


In [26]:
ord("a")

97

In [27]:
chr(97)

'a'

In [None]:
genotype = ["h", "e", "l", "l", "o"]
phenotype = "".join(genotype)  # "hello"

In [None]:
["b", "r", "e", "a", "d"] ---> target

F(target) -> f(genotype) -> score


["r", "l", "e", "k", "m"] ---> genotype

f(genotype) -> score 
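
One way to read the `F(target) -> f(genotype) -> score` note is as a higher order function: a factory that takes the target and returns a fitness function of the genotype alone. The sketch below assumes a simple match-count score, and `make_fitness` is a hypothetical name.

```python
from typing import Callable


def make_fitness(target: list[str]) -> Callable[[list[str]], int]:
    """Return a fitness function that remembers `target`."""
    def fitness(genotype: list[str]) -> int:
        # Count positions where the gene equals the target gene
        return sum(1 for gene, goal in zip(genotype, target) if gene == goal)
    return fitness


f = make_fitness(["b", "r", "e", "a", "d"])
print(f(["r", "l", "e", "k", "m"]))  # 1
print(f(["b", "r", "e", "a", "d"]))  # 5
```

Passing `f` into the GA instead of the target keeps the algorithm itself independent of how candidates are scored.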