# Freeman's Word Matching Problem



This problem is defined by James Freeman in his book Simulating Neural Networks with Mathematica.
The problem is also studied in Chapter 9 of Introduction to Genetic Algorithms by S.N. Sivanandam.

The word-matching problem tries to evolve the phrase "to be or not to be" from randomly generated lists of letters, using a genetic algorithm. Since there are 26 possible letters for each of the 13 locations in the list, the probability of getting the
correct phrase purely at random is $(1/26)^{13} \approx 4.03038 \times 10^{-19}$, which is about four chances in ten billion billion.
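
The size of the search space can be checked with a few lines of Python. This is only a sanity check of the arithmetic; the names `ALPHABET_SIZE` and `PHRASE_LENGTH` are chosen here for illustration and do not come from either book.

```python
from fractions import Fraction

ALPHABET_SIZE = 26   # lowercase letters a to z
PHRASE_LENGTH = 13   # len('tobeornottobe')

# total number of distinct 13-letter lowercase strings
n_strings = ALPHABET_SIZE ** PHRASE_LENGTH

# exact probability of hitting the target in one random draw
p_exact = Fraction(1, ALPHABET_SIZE) ** PHRASE_LENGTH
p_float = float(p_exact)

print(f'number of possible strings: {n_strings}')
print(f'probability of a random match: {p_float:.5e}')
```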

We use a list of ASCII integers to encode the string of letters. The lower case
letters in ASCII are represented by numbers in the range [97,122] in the decimal
number system. For example, the string of letters `tobeornottobe` is converted into
the following chromosome represented with ASCII integers:
[116,111,98,101,111,114,110,111,116,116,111,98,101].
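
The encoding and its inverse can be sketched with the built-in `ord()` and `chr()` functions. The variable names are illustrative only.

```python
target = 'tobeornottobe'

# encode: one ASCII integer per letter
chromosome = [ord(c) for c in target]
print(chromosome)

# decode: map the integers back to letters
decoded = ''.join(chr(code) for code in chromosome)
print(decoded)
```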


## Kling Implementation

In [Learning DEAP from examples](https://www.amazon.com/Learning-DEAP-examples-Evolutionary-evolutionary-ebook/dp/B06XHXD2SF) Ronn Kling implements the problem in DEAP, solving for a four-letter word such as `test`, encoded as [116,101,115,116].  Keep the target word lowercase and don't use spaces.

This example experiments with two different fitness functions, hence all the code is bundled together in a function.  The fitness function has to appear quite early in the modelling process, when the individual is defined; the rest of the model definition follows afterwards.

The DEAP scripts can all be quite similar if the standard DEAP functions are used.
See the [02-travelling-salesman.ipynb](https://github.com/NelisW/NeuralNetworks-DeepLearning-Notes/blob/master/GAGP/DEAP/code/02-travelling-salesman.ipynb) notebook for more detail on the standard DEAP code. That code is used here with minimal additional comments.




In [1]:
import matplotlib.pyplot as plt
import sys
import array
import random
import numpy as np

from deap import  algorithms
from deap import  base
from deap import  creator
from deap import  tools

%matplotlib inline

In [19]:
# the word to be constructed (lowercase, no spaces)
# targetString = 'test'  # the shorter four-letter example
targetString = 'tobeornottobe'
lenstr = len(targetString)
# convert to ASCII representation
stringToMatch = [ord(c) for c in targetString]


The first attempt at a fitness function compares the individual with the target character by character, using the absolute distance between the two ASCII ordinal numbers.  Kling points out that this does not work because of the `abs()` function: the distance between s and t is the same as the distance between u and t, so the two errors appear the same.

In [20]:
def evalString(individual):
    match = [0] * len(stringToMatch)
    for i in range(0,len(stringToMatch)): 
        match[i] = (abs(individual[i] - stringToMatch[i]))
    inputString = [chr(c) for c in individual] # turn the numbers into characters
    print(inputString, end='\r') # display them all on the same line
    return tuple(match) # has to be a tuple!
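
The ambiguity introduced by `abs()` can be seen in isolation with a small check. This is a standalone illustration, not part of Kling's code; the variable names are made up for the example.

```python
target = ord('t')  # 116

# 's' lies one below the target and 'u' one above it
for letter in ('s', 't', 'u'):
    signed = ord(letter) - target
    print(f'{letter}: signed difference = {signed:+d}, absolute difference = {abs(signed)}')

# both neighbours receive the same fitness contribution
same_distance = abs(ord('s') - target) == abs(ord('u') - target)
print(f'abs() treats s and u alike: {same_distance}')
```

The signed difference tells the search which way to move, while the absolute difference discards that information.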

The second fitness function minimises the number of negative differences and the number of positive differences (remember that the first function cannot distinguish between the two).  The function returns only two values, the counts of negative and positive differences.  Both counts are accumulated across the length of the string.

In [21]:
def evalStringMinPos(individual):
    match = [0.0, 0.0]
    for i in range(0, len(stringToMatch)) :
        if ((individual[i] - stringToMatch[i]) < 0.0) :
            match[0] = match[0] + 1
        if ((individual[i] - stringToMatch[i]) > 0.0) :
            match[1] = match[1] + 1
    inputString = [chr(c) for c in individual] # turn the numbers into characters
    print(inputString, end='\r') # display them all on the same line
    return tuple(match) # has to be a tuple!
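
To see what the two counts look like, the sketch below applies the same counting rule to a four-letter example without the printing side effect. The helper name `count_signed_differences` is hypothetical and only serves this illustration.

```python
def count_signed_differences(individual, target):
    """Return (genes below the target, genes above the target)."""
    below = sum(1 for gene, want in zip(individual, target) if gene < want)
    above = sum(1 for gene, want in zip(individual, target) if gene > want)
    return below, above

target = [ord(c) for c in 'test']

# 's' < 't' (below), 'f' > 'e' (above), the last two letters match
near_miss = count_signed_differences([ord(c) for c in 'sfst'], target)
perfect = count_signed_differences(target, target)

print(near_miss)  # (1, 1)
print(perfect)    # (0, 0)
```

A perfect match gives `(0, 0)`, which is why both objectives are minimised.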

Amit Kapoor uses a different fitness function in his book:  
https://hub.packtpub.com/using-genetic-algorithms-for-optimizing-your-models-tutorial/  
https://www.packtpub.com/big-data-and-business-intelligence/hands-artificial-intelligence-iot  
The fitness is the sum of all equality tests, i.e., the number of letters that match exactly. The fitness value is driven to a maximum, not to minima as in Kling's implementation.

    def evalStringSum(individual, word):
        #word = list('hello')
        rtnval = sum(individual[i] == word[i] for i in range(len(individual)))
        return rtnval,
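
A quick standalone check of this fitness function is sketched below. The function is repeated so that the snippet runs on its own, and the test words are made up for the example. In DEAP a maximised single-objective fitness would be declared with a positive weight, for example `weights=(1.0,)`.

```python
def evalStringSum(individual, word):
    # count the positions where the individual matches the target exactly
    rtnval = sum(individual[i] == word[i] for i in range(len(individual)))
    return rtnval,  # single-element tuple

word = list('hello')

full_match = evalStringSum(list('hello'), word)
partial_match = evalStringSum(list('help!'), word)

print(full_match)     # (5,)
print(partial_match)  # (3,)
```

Unlike the distance-based functions, this score carries no information about how far a wrong letter is from the right one; a letter is either correct or it is not.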


The following operators are used in the model:

`cxTwoPoint` takes a segment of the chromosome, defined by a start and an end point, and swaps it with the other parent's segment.

`mutShuffleIndexes` shuffles the values within an individual, which ensures that no new values are introduced into the population.
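
The behaviour of the two operators can be sketched in plain Python. These are simplified stand-ins written for illustration, not DEAP's own implementations; the function names `two_point_crossover` and `shuffle_indexes` and the second parent string are assumptions of this example.

```python
import random
from collections import Counter

def two_point_crossover(parent1, parent2, rng):
    """Swap the segment between two random cut points (sketch only)."""
    size = min(len(parent1), len(parent2))
    cut1, cut2 = sorted(rng.sample(range(1, size), 2))
    child1 = parent1[:cut1] + parent2[cut1:cut2] + parent1[cut2:]
    child2 = parent2[:cut1] + parent1[cut1:cut2] + parent2[cut2:]
    return child1, child2

def shuffle_indexes(individual, indpb, rng):
    """With probability indpb per gene, swap it with a random position."""
    mutant = list(individual)
    for i in range(len(mutant)):
        if rng.random() < indpb:
            j = rng.randrange(len(mutant))
            mutant[i], mutant[j] = mutant[j], mutant[i]
    return mutant

rng = random.Random(64)
p1 = [ord(c) for c in 'tobeornottobe']
p2 = [ord(c) for c in 'abcdefghijklm']

c1, c2 = two_point_crossover(p1, p2, rng)
mutant = shuffle_indexes(p1, indpb=0.5, rng=rng)

print(''.join(map(chr, c1)), ''.join(map(chr, c2)))
print(''.join(map(chr, mutant)))
print(Counter(mutant) == Counter(p1))  # the same letters, reordered
```

Because crossover only exchanges existing genes and the shuffle only reorders them, a letter that is missing from the whole population cannot be created by these two operators.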

In [22]:
def setupAndRun(popsize,evalFun,verbose=0):

    print(f'Using fitness function {evalFun.__name__}')
    # create fitness method
    if 'evalStringMinPos' in evalFun.__name__:
        creator.create("FitnessMin", base.Fitness,weights=(-1,-1))
    else:
        creator.create("FitnessMin", base.Fitness,weights=tuple([-1.0 for i in targetString] ))
    # create individual method
    creator.create("Individual",list, fitness=creator.FitnessMin)
    # toolbox
    toolbox = base.Toolbox()
    # create the random ASCII attributes: a->97, z->122 (both values inclusive)
    toolbox.register('attrASCII',random.randint,97,122)
    # create the individual and population generating methods
    toolbox.register("individual", tools.initRepeat, creator.Individual, toolbox.attrASCII,len(stringToMatch))
    toolbox.register("population", tools.initRepeat, list, toolbox.individual)    
    # register the crossover operator
    toolbox.register("mate", tools.cxTwoPoint)
    # register a mutation operator
    toolbox.register("mutate", tools.mutShuffleIndexes, indpb=0.05)
    toolbox.register("select", tools.selTournament, tournsize=3)

    # register the goal / fitness function
    toolbox.register("evaluate",evalFun)

    random.seed(64)
    # create the initial population (each individual is a list of integers)
    pop = toolbox.population(n=popsize)
    #only save the very best one
    hof = tools.HallOfFame(1)
    stats = tools.Statistics(lambda ind: ind.fitness.values)
    stats.register("avg", np.mean)
    stats.register("std", np.std)
    stats.register("min", np.min)
    stats.register("max", np.max)
    # use one of the built-in GAs with a mating probability of 0.7,
    # a mutation probability of 0.2 and 140 generations.
    algorithms.eaSimple(pop, toolbox, 0.7, 0.2, 140, stats=stats,
                        halloffame=hof, verbose=verbose)
    best_ind = tools.selBest(pop, 1)[0]
    print(f'Best individual = {best_ind}, fitness = {best_ind.fitness.values}')
    inputString = [chr(c) for c in best_ind]
    print(f'best guess={inputString}', end='\n')


In [23]:
print(f'targetString = {targetString}')
setupAndRun(popsize=300,evalFun=evalString)

targetString = tobeornottobe
Using fitness function evalString
Best individual = [116, 111, 98, 101, 111, 115, 110, 111, 116, 116, 111, 98, 102], fitness = (0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0)
best guess=['t', 'o', 'b', 'e', 'o', 's', 'n', 'o', 't', 't', 'o', 'b', 'f']


In [24]:
print(f'targetString = {targetString}')
setupAndRun(popsize=300,evalFun=evalStringMinPos,verbose=1)

targetString = tobeornottobe
Using fitness function evalStringMinPos
gen	nevals	avg    	std    	min	max
0  	300   	6.22667	1.64173	2  	11 
1  	234   	6.11167	2.05812	2  	11 
2  	226   	6.01167	2.75225	1  	11 
3  	226   	5.92167	3.44028	0  	12 
4  	225   	5.78833	4.09474	0  	13 
5  	232   	5.595  	4.59031	0  	13 
6  	223   	5.435  	5.00491	0  	13 
7  	239   	5.31167	5.24797	0  	13 
8  	207   	4.935  	4.92721	0  	13 
9  	213   	4.53667	4.54701	0  	12 
10 	253   	4.19333	4.20745	0  	12 
11 	217   	3.75667	3.7812 	0  	10 
12 	233   	3.345  	3.36194	0  	9  
13 	239   	3.075  	3.11706	0  	9  
14 	241   	2.70333	2.74566	0  	9  
15 	231   	2.325  	2.38803	0  	8  
16 	220   	1.89333	1.94815	0  	7  
17 	201   	1.56   	1.62062	0  	7  
18 	223   	1.28167	1.32627	0  	5  
19 	214   	1.05167	1.09954	0  	5  
20 	223   	0.868333	0.929694	0  	5  
21 	239   	0.693333	0.734363	0  	4  
22 	223   	0.621667	0.659695	0  	4  
23 	234   	0.573333	0.595502	0  	3  
24 	224   	0.588333	0.618221	0  	4  
25 	216   	