## First Practical Exercise - the Genetic Algorithm (GA)
The goal of this exercise is to evolve the string "Welcome to CS547!" from a population of random strings.
The main aim is to demonstrate the principles of evolutionary algorithms.

## Constants and Imports
We will need various constants, such as the target string, population size, crossover rate and mutation rate, along with imports for the packages we are going to use.

In [None]:
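import string
import random
from operator import itemgetter

random.seed(42)  # initialise the random number generator so runs are repeatable

# The target string that every candidate solution is compared against
target = "Welcome to CS547!"
# Number of individuals in each generation
pop_size = 50
# Probability that a selected pair of parents is recombined
crossover_rate = 0.75
# Probability that an individual has one of its characters mutated
mutation_rate = 0.05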

### Representation
Always the first thing to think about: how do we represent a solution to the problem (i.e. how should it be encoded, and what data type or types should be used)? It often helps to think about how you might manipulate a solution (to generate new solutions) and how you would evaluate it. <br>
Also think about how you are going to populate a solution. To make things easy we can limit ourselves to the set of printable characters (**Hint**: Python provides this as `string.printable`, but that set also contains whitespace control characters such as tab, newline and carriage return, so it makes sense to restrict it further to the ASCII values 32 to 126).<br>
We are also going to simplify things by fixing the length of a solution to the length of the target, i.e. 17 characters. This is something you could try relaxing later, letting the algorithm discover the length itself.

In [None]:
# An individual is just going to be a string.
# We'll keep it simple and constrain the length to 17 characters (although we could relax this later).
# We also need to think about the source characters for the string.
# string.printable also contains whitespace such as tab and newline, so instead of
# char_source = string.printable
# we use the ASCII codes 32 (space) to 126 (~) inclusive.
char_source = ''.join(chr(i) for i in range(32, 127))
#print(char_source)
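To see exactly what the restriction to ASCII 32 to 126 removes, the sketch below compares `char_source` with `string.printable`. It rebuilds `char_source` the same way as the cell above so it can run on its own; `excluded` is a name introduced only for this check.

```python
import string

# Same construction as the cell above: ASCII codes 32 (space) to 126 (~)
char_source = ''.join(chr(i) for i in range(32, 127))

# Characters that string.printable contains but char_source does not
excluded = [c for c in string.printable if c not in char_source]

print(len(string.printable))  # 100
print(len(char_source))       # 95
print(excluded)               # ['\t', '\n', '\r', '\x0b', '\x0c']
```

Every character of `char_source` is also in `string.printable`; the only difference is the five whitespace control characters, which would be awkward to display and compare in evolved strings.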

## Fitness Function
This defines how we score or evaluate a solution. You can make use of the target string at this point.<br>
The fitness function plays an important role in guiding the search towards the solution, so it is essential that we are able to identify when one solution is better than another.<br>
There are various alternatives, but the more information the fitness function provides (for example, how close a character is to the target rather than just whether it matches), the quicker the search is likely to be.

In [None]:
# Simple approach: count the number of characters in the target that are in the correct place.
# Could modify this to also reward right characters in the wrong place.
# Alternatively, calculate the ASCII distance.
def simple_fitness(solution):
    # Simple option: count number of characters that are correct
    fitness = 0
    for i in range(len(solution)):
        if solution[i] == target[i]:
            fitness += 1
    return fitness

# test the above
print(simple_fitness("Welcome to CS547!")) # should be 17
print(simple_fitness("wolcoNeXt!dCSe47$")) # should be 9
print(simple_fitness("This is!miles off")) # should be 0
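The comment above suggests also rewarding "right character, wrong place". One possible sketch of that idea is below; `partial_credit_fitness` and its 2-point/1-point weighting are assumptions for illustration, not part of the exercise. `target` is re-declared so the cell runs on its own.

```python
target = "Welcome to CS547!"

def partial_credit_fitness(solution):
    """Hypothetical variant of simple_fitness with partial credit."""
    score = 0
    for candidate_char, target_char in zip(solution, target):
        if candidate_char == target_char:
            score += 2   # right character, right place
        elif candidate_char in target:
            score += 1   # right character, wrong place
    return score

print(partial_credit_fitness("Welcome to CS547!"))  # 34
print(partial_credit_fitness("wolcoNeXt!dCSe47$"))  # 21
print(partial_credit_fitness("This is!miles off"))  # 7
```

Note that "This is!miles off" now scores above zero, so the search gets some signal even before any character is in its final position. This simple version does not track how many times a character occurs in the target, so repeated characters can all earn the wrong-place credit.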

In [None]:
# Alternative fitness function: the ASCII distance between a solution and the target.
# A lower distance is better, so we negate it; the best possible score is 0.
# Input parameters: solution
# Returns: The fitness of the solution (0 for a perfect match, negative otherwise)
def fitness(solution):
    # Sum the absolute difference in character codes at each position
    distance = 0
    for i in range(len(solution)):
        distance += abs(ord(solution[i]) - ord(target[i]))
    return -distance

# test the above (values are for the ASCII-distance function defined here)
print(fitness("Welcome to CS547!")) # should be 0, the highest possible score
print(fitness("wolcoNeXt!dCSe47$")) # should be -326
print(fitness("This is!miles off")) # should be -474, the lowest of the three
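A small comparison makes the point about information concrete: two strings that each differ from the target in one character look identical to the match-counting fitness, but the ASCII distance prefers the one whose wrong character is closer. The two functions are compact equivalents of `simple_fitness()` and `fitness()`, renamed (`match_count`, `ascii_fitness`) so they do not overwrite the notebook's versions.

```python
target = "Welcome to CS547!"

def match_count(solution):
    # Equivalent to simple_fitness: number of positions that match exactly
    return sum(1 for s, t in zip(solution, target) if s == t)

def ascii_fitness(solution):
    # Equivalent to fitness: negated sum of character-code differences
    return -sum(abs(ord(s) - ord(t)) for s, t in zip(solution, target))

near = "Xelcome to CS547!"   # 'X' is one code point away from 'W'
far = "Zelcome to CS547!"    # 'Z' is three code points away from 'W'

print(match_count(near), match_count(far))      # 16 16
print(ascii_fitness(near), ascii_fitness(far))  # -1 -3
```

With the match count, selection cannot tell these two apart; with the ASCII distance, the nearer string is ranked higher, which gives the search a gradient to follow.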

## The GA algorithm
The main steps are:

- Generate a random population
- Evaluate the fitness of the population
- While a solution has not been found
    - Select and crossover individuals to create a new population
    - Mutate some individuals of the new population
    - Make the new population the current population
    - Evaluate the fitness of the current population
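The steps above can be sketched as a generic loop that takes each step as a function. `run_ga` is a hypothetical helper, not part of the exercise, and it adds a `max_generations` cap (an assumption) so the loop always terminates. The OneMax toy problem used to exercise it (maximise the number of 1s in a list of bits) is likewise only an illustration; it uses its own `random.Random` instance so the notebook's global seed is left alone.

```python
import random


def run_ga(init_population, evaluate, select_and_crossover, mutate,
           is_solved, max_generations=1000):
    """Hypothetical generic GA loop mirroring the steps listed above.

    Returns the final evaluated population and the number of generations run.
    """
    population = evaluate(init_population())           # random population, evaluated
    generation = 0
    while not is_solved(population) and generation < max_generations:
        generation += 1
        population = select_and_crossover(population)  # build the new population
        population = mutate(population)                # mutate some individuals
        population = evaluate(population)              # evaluate the new population
    return population, generation


# Toy usage on OneMax (illustrative only): individuals are [bits, fitness] lists
rng = random.Random(0)   # private generator, so the notebook's global seed is untouched
BITS, SIZE = 10, 20


def onemax_init():
    return [[[rng.randint(0, 1) for _ in range(BITS)], 0] for _ in range(SIZE)]


def onemax_evaluate(pop):
    for ind in pop:
        ind[1] = sum(ind[0])          # fitness = number of 1s
    return pop


def onemax_select_and_crossover(pop):
    parents = sorted(pop, key=lambda ind: ind[1], reverse=True)[:SIZE // 2]
    children = []
    while len(children) < SIZE:
        a, b = rng.choice(parents), rng.choice(parents)
        point = rng.randrange(1, BITS)
        children.append([a[0][:point] + b[0][point:], 0])
    return children


def onemax_mutate(pop):
    for ind in pop:
        if rng.random() < 0.1:
            i = rng.randrange(BITS)
            ind[0][i] = 1 - ind[0][i]  # flip one bit
    return pop


def onemax_solved(pop):
    return max(ind[1] for ind in pop) == BITS


final_pop, generations = run_ga(onemax_init, onemax_evaluate,
                                onemax_select_and_crossover, onemax_mutate,
                                onemax_solved, max_generations=200)
print(generations, max(final_pop, key=lambda ind: ind[1]))
```

Because the loop only calls the functions it is given, the notebook's own `gen_population`, `evaluate_pop`, `select_and_generate_new_population` and `mutate` could be passed in the same way, with `is_solved` checking for a best fitness of 0.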

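### Generate a random population
Store the population as a list of two-element lists, [individual, fitness], so that we can associate a fitness value with each individual. Lists are used rather than tuples because the fitness is updated in place once it has been calculated.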

We also need a function to generate a random string of characters from our char_source string.

### Generate an Individual
Create a random string of characters from char_source that is the same length as the target.

In [169]:
# Generate a random solution (individual) of the same length as the target
# Input parameters: none
# Returns: a randomly generated individual
def gen_individual():
    individual = ''
    for i in range(len(target)):
        index = random.randrange(len(char_source))
        individual += char_source[index]
    return individual

# test...
#print(gen_individual())
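The character-by-character loop in `gen_individual()` can also be written with `random.choices`, which draws `k` items with replacement (available since Python 3.6). `gen_individual_choices` is a hypothetical name for this sketch; it accepts an optional generator so the example can be seeded without touching the notebook's global seed, and it will not necessarily produce the same strings as `gen_individual()` for the same seed.

```python
import random

target = "Welcome to CS547!"
char_source = ''.join(chr(i) for i in range(32, 127))


def gen_individual_choices(rng=random):
    # Draw len(target) characters from char_source, with replacement
    return ''.join(rng.choices(char_source, k=len(target)))


sample = gen_individual_choices(random.Random(42))
print(sample)
```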

### Create the population
Using the gen_individual() function above, build a population of individuals, each represented as a list [string, fitness]. The fitness will initially be 0 as we have not calculated it yet. The population itself will be a list of these lists.

i.e. population = [["individual1", fitness1], ["individual2", fitness2], ["individual3", fitness3], ... etc.] 

In [171]:
# Input parameters: none
# Returns: A population of randomly generated individuals with associated fitness values (set to 0)
def gen_population():
    pop = []
    for i in range(pop_size):
        pop.append([gen_individual(), 0])
    return pop

# test 
# print(gen_population())


### Evaluate the population
Go through each element in the population, calculating its fitness and storing the fitness value alongside the individual in its [string, fitness] list.

We will also need a function that returns the best individual in the population and its associated fitness.

In [2]:
# Input parameters: A population of individuals (solutions)
# Returns: The population, modified in place with the computed fitness of each individual
def evaluate_pop(pop):
    for individual in pop:
        individual[1] = fitness(individual[0])
    return pop

# Test
#newpop = gen_population()
#print(evaluate_pop(newpop))

In [4]:
# Input parameters: A population of individuals (solutions) whose fitness has already been evaluated
# Returns: The fittest individual (with its score)
def fittest_individual(pop):
    # Use the stored fitness (index 1) rather than recomputing it. This also avoids
    # relying on a fixed starting value such as -1000, which the ASCII-distance
    # fitness can fall below
    return max(pop, key=itemgetter(1))

#test
#newpop = evaluate_pop(gen_population())
#print(fittest_individual(newpop))
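Why not seed the search for the best individual with a fixed starting value such as -1000? A quick calculation of the worst possible ASCII-distance fitness for this target answers that: for each target character, the furthest allowed character is either ' ' (32) or '~' (126). `ascii_fitness` below is a compact equivalent of `fitness()`, renamed so it does not overwrite it.

```python
target = "Welcome to CS547!"
LOW, HIGH = 32, 126   # range of char_source


def ascii_fitness(solution):
    # Equivalent to fitness(): negated sum of character-code differences
    return -sum(abs(ord(s) - ord(t)) for s, t in zip(solution, target))


# Build the worst possible string: pick whichever end of the range is further away
worst = ''.join(' ' if ord(c) - LOW >= HIGH - ord(c) else '~' for c in target)
print(repr(worst))
print(ascii_fitness(worst))   # -1264
```

Since -1264 is below -1000, a population made entirely of very poor strings would never beat such a sentinel, and the function would have no best individual to return. Taking the maximum of the stored fitness values avoids needing any bound.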

### Select Individuals and Crossover to create the next generation
This step involves two sub-steps - the selection of parents and recombination to create offspring.

The selection is going to be very simple: we are going to randomly select parents from the top 50% of solutions (a form of truncation selection). Selection is with replacement, so some individuals may be chosen more than once whereas others may never be selected as parents. We will break this down into two steps:
- Define a function to return the top 50% of the population
- The main selection loop, which repeatedly (and randomly) selects parents from this subset and generates new individuals (children) from them to form the new population. This also involves defining a function to perform the crossover operation.

In [6]:
# Orders the population by fitness and returns the top 50%

# Input parameters: A population of individuals
# Returns: The top 50% of the individuals (ranked by fitness, best first)
def find_top_50(pop):
    sorted_pop = sorted(pop, key=itemgetter(1), reverse=True)
    return sorted_pop[:len(pop) // 2]

#test
#newpop = evaluate_pop(gen_population())
#print(find_top_50(newpop))

# Single-point crossover
# Input parameters: Two parents (lists of the form [individual, fitness])
# Returns: Two children after combining the two parents (lists of the form [individual, 0])
def crossover(parent1, parent2):
    # Choose a point in 1..len-1 so each child takes at least one character from each parent
    crossover_point = random.randrange(1, len(parent1[0]))
    child1 = [parent1[0][:crossover_point] + parent2[0][crossover_point:], 0]
    child2 = [parent2[0][:crossover_point] + parent1[0][crossover_point:], 0]
    return child1, child2

# test
# print(crossover(["abcdefg", 12], ["hijklmn", 34]))
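To see what single-point crossover does independently of the random choice of point, the sketch below fixes the point. `crossover_at` is a hypothetical helper for illustration only. It also shows why a point of 0 is not useful: the children are just copies of the parents with their roles swapped.

```python
def crossover_at(parent1, parent2, point):
    # Hypothetical helper: single-point crossover at a caller-chosen point
    child1 = [parent1[0][:point] + parent2[0][point:], 0]
    child2 = [parent2[0][:point] + parent1[0][point:], 0]
    return child1, child2

print(crossover_at(["abcdefg", 12], ["hijklmn", 34], 3))
# (['abcklmn', 0], ['hijdefg', 0])
print(crossover_at(["abcdefg", 12], ["hijklmn", 34], 0))
# (['hijklmn', 0], ['abcdefg', 0])
```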
    
# Input parameters: a population
# Returns: The new population after selection and crossover have been applied
def select_and_generate_new_population(pop):
    parent_pool = find_top_50(pop)
    new_generation = []
    # Two children per iteration, so the new population matches the old size when pop_size is even
    for i in range(len(parent_pool)):
        parent1 = random.choice(parent_pool)
        parent2 = random.choice(parent_pool)
        if random.random() < crossover_rate:
            child1, child2 = crossover(parent1, parent2)
        else:
            # Copy the parents into new lists; appending parent1/parent2 directly would
            # let mutate() alter the same list object in several places
            child1 = [parent1[0], 0]
            child2 = [parent2[0], 0]
        new_generation.append(child1)
        new_generation.append(child2)
    return new_generation

# test
#newpop = evaluate_pop(gen_population())
#print(select_and_generate_new_population(newpop))
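When a parent is passed through without crossover, it matters whether the new population holds the parent's own list or a copy: Python assignment binds another name to the same object rather than copying it, and `mutate()` later changes individuals in place. A small self-contained illustration (the names `parent`, `alias` and `fresh` are for this example only):

```python
parent = ["abc", -5]

alias = parent            # same list object, like `child1 = parent1`
fresh = [parent[0], 0]    # new list holding the same (immutable) string

alias[0] = "xbc"          # in-place change, as mutate() does with pop[i][0] = ...

print(parent)             # ['xbc', -5]  the parent has changed as well
print(fresh)              # ['abc', 0]   the copy is unaffected
print(alias is parent)    # True
```

Copying the string itself is unnecessary because Python strings are immutable; only the enclosing `[individual, fitness]` list needs to be new.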

### Mutate the new population
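Mutation is applied to the new population. The mutation rate is typically set quite low (e.g. 0.05). Several mutation operators could be used here, but each should make only a small change. In this implementation the rate is applied per individual: each individual has a `mutation_rate` chance of having one randomly chosen character replaced by a random character from char_source.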

In [14]:
# Function to mutate individuals in the population according to the mutation rate
# Input parameters: A population
# Returns: The population, modified in place
def mutate(pop):
    for i in range(len(pop)):
        if random.random() < mutation_rate:
            mutation_point = random.randrange(len(pop[i][0]))
            # The mutation is just a random character, but it could also move the
            # character +/- 1 to be more comparable with the hill climber
            random_char = random.choice(char_source)
            mutation = pop[i][0][:mutation_point] + random_char + pop[i][0][mutation_point+1:]
            pop[i][0] = mutation
    return pop

# test
#apop = gen_population()
#print(apop)
#anotherpop = mutate(apop)
#print(anotherpop)
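The comment in `mutate()` mentions moving a character by +/- 1 instead of replacing it. A minimal sketch of that variant follows; `mutate_step`, its `rate` and `rng` parameters, and the clamping to the 32 to 126 range are assumptions for illustration. `target` is re-declared so the cell runs on its own.

```python
import random

target = "Welcome to CS547!"
LOW, HIGH = 32, 126   # same range as char_source


def mutate_step(pop, rate, rng=random):
    """Hypothetical alternative to mutate(): nudge one character by +/-1 code point."""
    for individual in pop:
        if rng.random() < rate:
            point = rng.randrange(len(individual[0]))
            code = ord(individual[0][point]) + rng.choice((-1, 1))
            code = min(max(code, LOW), HIGH)   # clamp so the result stays in char_source
            individual[0] = individual[0][:point] + chr(code) + individual[0][point + 1:]
    return pop


# Mutate every individual (rate=1.0) of a population of perfect solutions
demo_pop = [[target, 0] for _ in range(20)]
mutate_step(demo_pop, rate=1.0, rng=random.Random(1))
for individual in demo_pop:
    print(individual[0])
```

Each mutated string differs from the target in at most one position, by exactly one code point; a space nudged downwards is clamped back to a space, so that individual is left unchanged. Small steps like this pair naturally with the ASCII-distance fitness, since every step changes the score by exactly 1 or leaves it unchanged.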

### Main GA Loop

In [17]:
# Main GA loop
# Run the cells above in order first: this cell relies on every function defined so far
pop = gen_population()
scored_pop = evaluate_pop(pop)
generation = 0
print("Generation", generation)
print(fittest_individual(scored_pop))
# The ASCII-distance fitness is 0 only for an exact match, so loop until it is reached
while fittest_individual(scored_pop)[1] < 0:
    generation += 1
    new_pop = select_and_generate_new_population(scored_pop)
    mutated_pop = mutate(new_pop)
    scored_pop = evaluate_pop(mutated_pop)
    print("Generation", generation)
    print(fittest_individual(scored_pop))
