# Freeman's Word Matching Problem



This problem is defined by James Freeman in his book Simulating Neural Networks with Mathematica.
It is also studied in Chapter 9 of Introduction to Genetic Algorithms by S.N. Sivanandam.

The word matching problem tries to evolve the expression "to be or not to be" from randomly generated lists of letters using a genetic algorithm. Since there are 26 possible letters for each of the 13 locations in the list, the probability of getting the correct phrase purely at random is $(1/26)^{13} \approx 4.03038 \times 10^{-19}$, which is about four chances in ten billion billion.
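
As a quick check of the figure above, the probability can be computed directly. This is only a minimal sketch; the constant and variable names are illustrative and not part of the original problem statement.

```python
# Chance of drawing the 13-letter target in one uniformly random attempt.
ALPHABET_SIZE = 26   # lowercase letters a-z
PHRASE_LENGTH = 13   # letters in "tobeornottobe"

search_space = ALPHABET_SIZE ** PHRASE_LENGTH  # number of possible lists
probability = 1 / search_space

print(f"search space: {search_space:.5e}")
print(f"probability:  {probability:.5e}")
```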

We use a list of ASCII integers to encode the string of letters. The lowercase
letters in ASCII are represented by numbers in the range [97,122] in the decimal
number system. For example, the string of letters `tobeornottobe` is converted into
the following chromosome represented with ASCII integers:
[116,111,98,101,111,114,110,111,116,116,111,98,101].
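
The encoding described above can be sketched with Python's built-in `ord` and `chr`. The helper names `encode` and `decode` are hypothetical and are used here only for illustration.

```python
def encode(phrase):
    """Convert a string of lowercase letters into a list of ASCII codes."""
    return [ord(ch) for ch in phrase]

def decode(chromosome):
    """Convert a list of ASCII codes back into a string."""
    return "".join(chr(code) for code in chromosome)

chromosome = encode("tobeornottobe")
print(chromosome)          # the chromosome as ASCII integers
print(decode(chromosome))  # round trip back to the letters
```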


## Kapoor Implementation

https://hub.packtpub.com/using-genetic-algorithms-for-optimizing-your-models-tutorial/  
https://www.packtpub.com/big-data-and-business-intelligence/hands-artificial-intelligence-iot   

BTW the book has some nice ML chapters but not much on IOT.

The DEAP scripts can all be quite similar if the standard DEAP functions are used.
See the [02-travelling-salesman.ipynb](https://github.com/NelisW/NeuralNetworks-DeepLearning-Notes/blob/master/GAGP/DEAP/code/02-travelling-salesman.ipynb) notebook for more detail on the standard DEAP code. The same code is used here with minimal additional comments.




As the first step, we import the modules we will need. We use the string and random modules to generate random characters from a-z, A-Z, and 0-9. From the DEAP module, we use creator, base, and tools:

```python
import string
import random

from deap import base, creator, tools
```

In DEAP, we start by creating a class that inherits from the Fitness class of the deap.base module. We need to tell it whether the function is to be minimized or maximized; this is done using the weights parameter. A weight of +1.0 means we are maximizing (for minimizing, we give -1.0). The following line creates a class, FitnessMax, that maximizes the function:

```python
# Create a Fitness class that is to be maximized.
# weights is a tuple: a negative weight minimizes, a positive weight maximizes.
creator.create("FitnessMax", base.Fitness, weights=(1.0,))
```





This defines a class ```FitnessMax``` that inherits from the ```Fitness``` class of the deap.base module. The ```weights``` attribute, a tuple, specifies whether the fitness function is to be maximized (```weights=(1.0,)```) or minimized (```weights=(-1.0,)```). Because the tuple holds one weight per objective, the DEAP library also allows multi-objective fitness functions.
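
To illustrate what the sign of a weight does, here is a simplified stand-in for the idea in plain Python. It is not DEAP's actual class: it assumes that each raw objective value is multiplied by its weight and that the weighted tuples are then compared lexicographically, so that larger is always better. The names `weighted` and `better` are hypothetical.

```python
def weighted(values, weights):
    """Multiply each objective value by its weight."""
    return tuple(v * w for v, w in zip(values, weights))

def better(a, b, weights):
    """Return True if fitness values a beat b under the given weights."""
    return weighted(a, weights) > weighted(b, weights)

# Single objective, maximized: more matching characters is better.
print(better((4.0,), (2.0,), weights=(1.0,)))
# Single objective, minimized: a smaller error is better.
print(better((0.1,), (0.5,), weights=(-1.0,)))
# Two objectives: minimize the first, maximize the second.
print(better((1.0, 7.0), (3.0, 2.0), weights=(-1.0, 1.0)))
```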

### Individual

Next we create an ```Individual``` class, which inherits from the ```list``` class and has the ```FitnessMax``` class in its ```fitness``` attribute.

```python
# Now we create an Individual class
creator.create("Individual", list, fitness=creator.FitnessMax)
```

### Population

Now, with the ```Individual``` class defined, we use the DEAP ```Toolbox```, defined in the base module, to create a population and define our gene pool. All the objects that we will need from now on (an individual, the population, the functions, the operators, and the arguments) are stored in a container called ```toolbox```. We can add content to or remove content from this container using the ```register()``` and ```unregister()``` methods:

```python
toolbox = base.Toolbox()

# Gene pool: one random character from a-z, A-Z, and 0-9
toolbox.register("attr_string", random.choice, string.ascii_letters + string.digits)
```

Now that we have defined how the gene pool is sampled, we create an individual and then a population by repeatedly calling the gene generator. We pass the Individual class to the toolbox function responsible for creating an individual, together with a parameter N telling it how many genes to produce:

```python
# Number of characters in the word
word = list('hello')
N = len(word)

# Initialize population
toolbox.register("individual", tools.initRepeat, creator.Individual, toolbox.attr_string, N)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)
```
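
The repeated-initialization idea can be sketched in plain Python without DEAP: a container is filled by calling a generator function a fixed number of times. The helpers `init_repeat` and `random_gene` are hypothetical stand-ins written for illustration, not the library's own code.

```python
import random
import string

def init_repeat(container, func, n):
    """Call func n times and collect the results in container."""
    return container(func() for _ in range(n))

def random_gene():
    """Pick one character from the gene pool (a-z, A-Z, 0-9)."""
    return random.choice(string.ascii_letters + string.digits)

random.seed(0)  # fixed seed only to make the sketch repeatable

# One individual of 5 genes, then a small population of 3 individuals.
individual = init_repeat(list, random_gene, 5)
population = init_repeat(list, lambda: init_repeat(list, random_gene, 5), 3)

print(individual)
print(population)
```

An individual is simply a list of N characters, and a population is a list of such lists.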

We now define the fitness function. Note the trailing comma in the return statement: a fitness value in DEAP is returned as a tuple, to allow multi-objective fitness functions:

```python
def evalWord(individual, word):
    # Count the positions where the individual matches the target word
    rtnval = sum(individual[i] == word[i] for i in range(len(individual)))
    return rtnval,
```
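
As a small usage illustration, the fitness function can be called directly on a few hand-written candidates. The function is restated here with `zip` so the block runs on its own; the candidate strings are made up for the example.

```python
def evalWord(individual, word):
    """Count the positions where individual and word agree."""
    return sum(a == b for a, b in zip(individual, word)),

target = list("hello")

print(evalWord(list("hello"), target))  # (5,) every position matches
print(evalWord(list("hxllo"), target))  # (4,) one position differs
print(evalWord(list("world"), target))  # (1,) only the fourth letter matches
```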


Next, we add the fitness function to the toolbox, together with the crossover operator, the mutation operator, and the parent selection operator, all through the register function. The first statement registers the fitness function along with the additional argument it takes. The second registers the crossover operation, here a two-point crossover (cxTwoPoint). The third registers the mutation operator; we choose mutShuffleIndexes, which exchanges each attribute of the individual with another position with independent probability indpb=0.05. Finally, we define how the parents are selected: tournament selection with a tournament size of 3:

```python
# Bind the target by keyword so the individual stays the first argument
toolbox.register("evaluate", evalWord, word=word)
toolbox.register("mate", tools.cxTwoPoint)
toolbox.register("mutate", tools.mutShuffleIndexes, indpb=0.05)
toolbox.register("select", tools.selTournament, tournsize=3)
```
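
The toolbox stores a partial application of each registered function, in the manner of `functools.partial`, so extra positional arguments are bound ahead of the arguments supplied at call time. The sketch below uses `functools.partial` directly as a stand-in for the toolbox to make the resulting argument order visible; `show_args` is a hypothetical helper.

```python
from functools import partial

def show_args(individual, word):
    """Return the arguments by name so the binding order is visible."""
    return {"individual": individual, "word": word}

target = list("hello")
candidate = list("hxllo")

# Positional binding: target fills the first parameter, individual.
positional = partial(show_args, target)
# Keyword binding: target is tied to word regardless of position.
keyword = partial(show_args, word=target)

print(positional(candidate))
print(keyword(candidate))
```

For a position-by-position comparison of two equal-length lists the two orders give the same count, because the comparison is symmetric, but binding the target by keyword keeps the roles of the two arguments unambiguous.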

We define the other operators/functions we will need by registering them in the toolbox. This allows us to easily switch between the operators if desired.
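
One property of a shuffle mutation is worth seeing in code: it only swaps positions, so it rearranges an individual's characters without ever introducing a new one. The following is a rough sketch of an index-shuffling mutation in plain Python, written to illustrate the idea rather than to reproduce DEAP's implementation; `shuffle_indexes` is a hypothetical name.

```python
import random

def shuffle_indexes(individual, indpb):
    """Swap each position with another random position with probability indpb."""
    size = len(individual)
    for i in range(size):
        if random.random() < indpb:
            j = random.randrange(size - 1)
            if j >= i:
                j += 1  # skip i so a position is never swapped with itself
            individual[i], individual[j] = individual[j], individual[i]
    return individual,

random.seed(1)  # fixed seed only to make the sketch repeatable
original = list("hlelo")
mutant, = shuffle_indexes(list(original), indpb=0.5)

print("".join(original), "->", "".join(mutant))
print(sorted(mutant) == sorted(original))  # same characters, possibly reordered
```

Since two-point crossover likewise only exchanges genes that already exist, a character that is absent from the whole population cannot appear through these operators; a mutation that draws a fresh character from the gene pool would be one alternative to consider.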

## Evolving the Population
Once the representation and the genetic operators are chosen, we define an algorithm that combines all the individual parts and evolves our population until the word matching problem is solved. It is good programming style to do so within a function, generally named main().

### Creating the Population
First of all, we need to instantiate our population. This step is done with the population() method we registered in our toolbox earlier on.

Now we have all the ingredients, so we will write down the code of the genetic algorithm, which will perform the steps we mentioned earlier in a repetitive manner:

```python
def main():
    random.seed(64)

    # Create an initial population of 300 individuals (where
    # each individual is a list of characters)
    pop = toolbox.population(n=300)

    # CXPB  is the probability with which two individuals are crossed
    # MUTPB is the probability for mutating an individual
    CXPB, MUTPB = 0.5, 0.2

    print("Start of evolution")

    # Evaluate the entire population
    fitnesses = list(map(toolbox.evaluate, pop))
    for ind, fit in zip(pop, fitnesses):
        ind.fitness.values = fit

    print("  Evaluated %i individuals" % len(pop))

    # Extract the fitness of every individual
    fits = [ind.fitness.values[0] for ind in pop]

    # Variable keeping track of the number of generations
    g = 0

    # Begin the evolution; stop once all N characters match
    # or after 1000 generations
    while max(fits) < N and g < 1000:
        # A new generation
        g = g + 1
        print("-- Generation %i --" % g)

        # Select the next generation individuals
        offspring = toolbox.select(pop, len(pop))
        # Clone the selected individuals
        offspring = list(map(toolbox.clone, offspring))

        # Apply crossover and mutation on the offspring
        for child1, child2 in zip(offspring[::2], offspring[1::2]):

            # Cross two individuals with probability CXPB
            if random.random() < CXPB:
                toolbox.mate(child1, child2)

                # Fitness values of the children
                # must be recalculated later
                del child1.fitness.values
                del child2.fitness.values

        for mutant in offspring:

            # Mutate an individual with probability MUTPB
            if random.random() < MUTPB:
                toolbox.mutate(mutant)
                del mutant.fitness.values

        # Evaluate the individuals with an invalid fitness
        invalid_ind = [ind for ind in offspring if not ind.fitness.valid]
        fitnesses = map(toolbox.evaluate, invalid_ind)
        for ind, fit in zip(invalid_ind, fitnesses):
            ind.fitness.values = fit

        print("  Evaluated %i individuals" % len(invalid_ind))

        # The population is entirely replaced by the offspring
        pop[:] = offspring

        # Gather all the fitnesses in one list and print the stats
        fits = [ind.fitness.values[0] for ind in pop]

        length = len(pop)
        mean = sum(fits) / length
        sum2 = sum(x*x for x in fits)
        std = abs(sum2 / length - mean**2)**0.5

        print("  Min %s" % min(fits))
        print("  Max %s" % max(fits))
        print("  Avg %s" % mean)
        print("  Std %s" % std)

    print("-- End of (successful) evolution --")

    best_ind = tools.selBest(pop, 1)[0]
    print("Best individual is %s, %s" % (''.join(best_ind), best_ind.fitness.values))
```

```python
main()
```

```text
Start of evolution
  Evaluated 300 individuals
-- Generation 1 --
  Evaluated 178 individuals
  Min 0.0
  Max 2.0
  Avg 0.22
  Std 0.4526956299030656
-- Generation 2 --
  Evaluated 174 individuals
  Min 0.0
  Max 2.0
  Avg 0.51
  Std 0.613650280425803
-- Generation 3 --
  Evaluated 191 individuals
  Min 0.0
  Max 3.0
  Avg 0.9766666666666667
  Std 0.6502221842484989
-- Generation 4 --
  Evaluated 167 individuals
  Min 0.0
  Max 4.0
  Avg 1.45
  Std 0.6934214687571574
-- Generation 5 --
  Evaluated 191 individuals
  Min 0.0
  Max 4.0
  Avg 1.9833333333333334
  Std 0.7765665171481163
-- Generation 6 --
  Evaluated 168 individuals
  Min 0.0
  Max 4.0
  Avg 2.48
  Std 0.7678541528180985
-- Generation 7 --
  Evaluated 192 individuals
  Min 1.0
  Max 5.0
  Avg 3.013333333333333
  Std 0.6829999186595044
-- End of (successful) evolution --
Best individual is hello, (5.0,)
```
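
The Std value printed for each generation is the population standard deviation, computed as the square root of the mean of the squares minus the square of the mean. The sketch below repeats that calculation on a made-up list of fitness values (not taken from the run above) and compares it with `statistics.pstdev` from the standard library.

```python
import math
import statistics

fits = [0.0, 1.0, 1.0, 2.0, 3.0]  # made-up fitness values for illustration

length = len(fits)
mean = sum(fits) / length
sum2 = sum(x * x for x in fits)
# abs() guards against a tiny negative value caused by rounding
std = abs(sum2 / length - mean**2) ** 0.5

print(mean, std)
print(math.isclose(std, statistics.pstdev(fits)))  # both formulas agree
```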
