# Sequence Optimisation

This notebook presents an example of how to optimise a time sequence with a composite genotype.

The sequence comprises a number of events at discrete times, where each event has two additional parameters.
Two events cannot take place at the same time: if two events need to be closely spaced, the later one follows in the first available time slot immediately after the currently occupied slot(s).
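
The slot rule above can be sketched as follows. This is a minimal illustration, not the notebook's own code. It assumes event times are first mapped to integer slot indices on the `timeinc` grid, since working in integers avoids floating-point artefacts such as `4.1000000000000005`. The helper names `to_slot` and `resolve_collisions` are hypothetical.

```python
TIMEINC = 0.05  # mirrors timeinc defined later in the notebook

def to_slot(t, inc=TIMEINC):
    """Map a real-valued time to the nearest integer slot index."""
    return round(t / inc)

def resolve_collisions(slots):
    """Move each event to the first free slot at or after its requested slot.

    Events are processed in time order, so a clash pushes the later event
    into the next slot immediately after the occupied one(s).
    """
    occupied = set()
    resolved = []
    for s in sorted(slots):
        while s in occupied:
            s += 1
        occupied.add(s)
        resolved.append(s)
    return resolved

times = [0.10, 0.10, 0.15, 1.00]
slots = resolve_collisions([to_slot(t) for t in times])
print(slots)                                   # [2, 3, 4, 20]
print([round(s * TIMEINC, 2) for s in slots])  # [0.1, 0.15, 0.2, 1.0]
```

Events pushed past `timemax` are not clipped here. How the real problem should handle overflow at the end of the time window is left open.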

The real-world problem's objective is to maximise the primary outcome while employing as few events as possible.
The problem with optimising towards a maximum in a population with many large values is that the large values completely swamp (and therefore hide) any smaller or poorly performing values. For example, the poorly performing individual in this set has little effect on the mean value (900.1):

    1  1000  1000  1000  1000  1000  1000  1000  1000  1000 

Taking the reciprocal of each value instead, the mean (0.1009) is now dominated by the poorly performing individual, and the well-performing individuals contribute little:

    1 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001

So the objective should be to optimise towards the best fit (approaching zero), so that the well-performing solutions have a lower impact on fitness. It is proposed that the final real-world optimisation should employ some 'inverted' fitness measure. The simple mathematical reciprocal may not be the best choice; some work is required here.
Irrespective of the exact mapping the effect should be inverse-related.
On the assumption of an inverse fitness measure, we define the best match to approach zero.
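
To make the "inverse-related" requirement concrete, here is a sketch comparing a few candidate mappings from a raw score `v` (larger is better) to a value to be minimised. The candidates are illustrative assumptions, not a recommendation. The plain reciprocal is undefined at `v = 0` and unbounded near it. By contrast, `1/(1+v)` and `exp(-v/scale)` stay within (0, 1] for `v >= 0`. Here `scale` is a hypothetical tuning constant.

```python
import numpy as np

def reciprocal(v):
    return 1.0 / v              # undefined at v = 0, unbounded near it

def shifted_reciprocal(v):
    return 1.0 / (1.0 + v)      # bounded in (0, 1] for v >= 0

def exponential(v, scale=100.0):
    return np.exp(-v / scale)   # scale is a hypothetical tuning constant

# the example population from the text: one poor individual, nine good ones
scores = np.array([1.0] + [1000.0] * 9)
for f in (reciprocal, shifted_reciprocal, exponential):
    mapped = f(scores)
    print(f"{f.__name__:>18}: mean={mapped.mean():.4f}")
```

All three are monotonically decreasing, so "larger raw score" becomes "smaller value to minimise". The plain reciprocal reproduces the mean of 0.1009 quoted above. The other two differ in how strongly the single poor individual dominates the mean.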

A key consideration in the real-world problem is that there is probably no clear global minimum (i.e., inverse maximum). A number of potentially very different minima are expected, and the solution must balance the performance between these minima. The best fit across all minima will be sought, but it is conceivable that some form of preference ordering or weighting might be required, trading some local minima against others.

This example will optimize towards two objectives: 

- Employ the fewest events. This is a very simple test: count the number of events.
- Globally best match to a set of pre-defined sequences. For this example a number of sequences will be posed, where each sequence will have a weight according to the preference order.
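
One way to express the two objectives is sketched below. The notebook does not yet define how two sequences are compared, so the following choices are all hypothetical:

- **Distance:** events are paired in time order. Each pair costs the absolute time difference plus 1 for each differing `Ftype`/`DirType`, and a fixed penalty applies per unpaired event.
- **Preference weights:** 1, 1/2, 1/3, ... by preference rank.
- **Combination:** a weighted mean across targets.

Each sequence is assumed to be a list of `(time, ftype, dirtype)` tuples.

```python
def sequence_distance(seq, target, unpaired_penalty=1.0):
    """Hypothetical distance between two event sequences (lower is better).

    Events are paired in time order; each pair costs the absolute time
    difference plus 1 for each differing Ftype or DirType, and every event
    left without a partner costs unpaired_penalty.
    """
    a = sorted(seq, key=lambda e: e[0])
    b = sorted(target, key=lambda e: e[0])
    d = 0.0
    for (t1, f1, r1), (t2, f2, r2) in zip(a, b):
        d += abs(t1 - t2) + int(f1 != f2) + int(r1 != r2)
    d += unpaired_penalty * abs(len(a) - len(b))
    return d

def rank_weights(n):
    """Hypothetical weights 1, 1/2, 1/3, ... for targets in preference order."""
    return [1.0 / (k + 1) for k in range(n)]

def evaluate(seq, targets):
    """Return (weighted match error, number of events); both are minimised."""
    weights = rank_weights(len(targets))
    errors = [sequence_distance(seq, t) for t in targets]
    match_error = sum(w * e for w, e in zip(weights, errors)) / sum(weights)
    return match_error, len(seq)

targets = [
    [(0.5, 'S', 'P'), (1.0, 'M', 'S')],  # most preferred target
    [(0.5, 'S', 'P')],                   # second choice
]
candidate = [(0.5, 'S', 'P'), (1.0, 'M', 'S')]
print(evaluate(candidate, targets))
```

The weighted mean lets a close match to a preferred target count for more than a close match to a less preferred one. Taking the minimum over weighted errors would be an alternative if only the single best match should matter. The returned 2-tuple has the shape DEAP expects from a registered `evaluate` function when the fitness uses `weights=(-1.0, -1.0)`.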


In [24]:
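import numpy as np

# nine good individuals (1000) and one poor one (1)
a = np.ones(10)*1000
a[0] = 1
print(f'{a} mean={np.mean(a)}')
# reciprocal: the poor individual now dominates the mean
a = 1 / a
print(f'{a} mean={np.mean(a):.2f}')
# log of 1/(1+a); exp() undoes the log, so the mean is taken over 1/(1+a)
a = np.log(1 / (1+a))
print(f'{a} mean={np.mean(np.exp(a)):.2f}')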



[   1. 1000. 1000. 1000. 1000. 1000. 1000. 1000. 1000. 1000.] mean=900.1
[1.    0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001] mean=0.10
[-0.69314718 -0.0009995  -0.0009995  -0.0009995  -0.0009995  -0.0009995
 -0.0009995  -0.0009995  -0.0009995  -0.0009995 ] mean=0.95


In [25]:
import matplotlib.pyplot as plt
import sys
import array
import random
import numpy as np
from enum import Enum, unique

from deap import algorithms
from deap import base
from deap import creator
from deap import tools

%matplotlib inline


In [26]:
PopSize = 100
MaxGen = 100
MutProb = 0.2
CxProb = 0.8
tournSize=3

# default probability for Ftype.S preference
FtypeSProb = 0.5
# default probability for Dirtype.P preference
DirtypePProb = 0.5

timemax = 5
timeinc = 0.05

Each event in the genotype has three parameters:

- a discrete (but real-valued) time between 0 and `timemax`, at `timeinc` intervals.
- an Ftype selection between two discrete values `M` and `S`, with a probability of `FtypeSProb` of being `S`.
- a Dirtype selection between two discrete values `P` and `S`, with a probability of `DirtypePProb` of being `P`.



In [27]:
def round_to_step(x, step):
    # round x to the nearest multiple of step (round() rounds to nearest,
    # not down); values such as 4.1000000000000005 are float representation noise
    return round(x / step) * step

# Only these are allowed as Ftypes
@unique
class Ftype(Enum):
    M = 0
    S = 1

# Only these are allowed as Dirtypes
@unique
class Dirtype(Enum):
    P = 0
    S = 1

def generate_event():
    di = {}
    # time between 0 and timemax, on the timeinc grid
    di['time'] = round_to_step(random.uniform(0, timemax), timeinc)
    # Ftype.S with probability FtypeSProb, otherwise Ftype.M
    di['Ftype'] = Ftype.S if random.random() < FtypeSProb else Ftype.M
    # Dirtype.P with probability DirtypePProb, otherwise Dirtype.S
    di['DirType'] = Dirtype.P if random.random() < DirtypePProb else Dirtype.S
    return di


The DEAP overview page makes an important point: *Once the types are created you need to fill them with sometimes random values, sometime guessed ones.* So, if you have some idea of a good starting point, mix your best-guess estimates with additional random individuals to start from a blended population. If the guesses are good, the GA should start from a better position, while the random individuals maintain diversity. Construct a best-guess individual with `creator.Individual` (so that it carries a `fitness` attribute), and then insert it into the population using `pop.append(guess_ind)` or `pop.insert(0, guess_ind)`.
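
A library-independent sketch of that blending step follows. It assumes a hypothetical `make_random` factory and a list of hand-made guesses. In the DEAP setting the guesses would be built with `creator.Individual` and the copying done with the toolbox's `clone`. The helper `seed_population` is illustrative and not part of DEAP.

```python
import copy
import random

def seed_population(guesses, make_random, n):
    """Blend best-guess individuals with random ones into a population of size n."""
    seeded = [copy.deepcopy(g) for g in guesses[:n]]  # copies, so the guesses stay untouched
    seeded += [make_random() for _ in range(n - len(seeded))]
    return seeded

def make_random():
    """Hypothetical random individual: three event times on a 0.05 grid in [0, 5]."""
    return sorted(round(random.randint(0, 100) * 0.05, 2) for _ in range(3))

guess = [0.5, 1.0, 2.5]  # a hand-made best guess
pop = seed_population([guess], make_random, n=10)
print(pop[0], len(pop))  # [0.5, 1.0, 2.5] 10
```

This mirrors `pop.insert(0, guess_ind)`: the guesses sit at the front, and the random remainder supplies the diversity.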

The objective is to minimise two quantities:
- the difference between the required sequence and the individual's sequence
- the number of events


In [28]:
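creator.create("FitnessMin", base.Fitness, weights=(-1.0, -1.0))
creator.create("Individual", dict, fitness=creator.FitnessMin)

toolbox = base.Toolbox()
# initIterate calls creator.Individual(generate_event()), so each individual
# currently holds a single event dict rather than a sequence of events
toolbox.register("individual", tools.initIterate, creator.Individual, generate_event)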
toolbox.register("population", tools.initRepeat, list, toolbox.individual)

pop = toolbox.population(n=PopSize)




In [29]:
print(f'Individual: {creator.Individual()}')  
print(f'individual: {toolbox.individual()}')  
print(f'population: {toolbox.population(n=10)}') 

Individual: {}
individual: {'time': 2.95, 'Ftype': <Ftype.M: 0>, 'DirType': <Dirtype.S: 1>}
population: [{'time': 2.95, 'Ftype': <Ftype.M: 0>, 'DirType': <Dirtype.S: 1>}, {'time': 0.65, 'Ftype': <Ftype.S: 1>, 'DirType': <Dirtype.P: 0>}, {'time': 0.2, 'Ftype': <Ftype.M: 0>, 'DirType': <Dirtype.S: 1>}, {'time': 4.1000000000000005, 'Ftype': <Ftype.M: 0>, 'DirType': <Dirtype.P: 0>}, {'time': 1.7000000000000002, 'Ftype': <Ftype.S: 1>, 'DirType': <Dirtype.P: 0>}, {'time': 1.3, 'Ftype': <Ftype.M: 0>, 'DirType': <Dirtype.S: 1>}, {'time': 2.25, 'Ftype': <Ftype.S: 1>, 'DirType': <Dirtype.S: 1>}, {'time': 3.8000000000000003, 'Ftype': <Ftype.M: 0>, 'DirType': <Dirtype.S: 1>}, {'time': 0.05, 'Ftype': <Ftype.M: 0>, 'DirType': <Dirtype.P: 0>}, {'time': 4.4, 'Ftype': <Ftype.M: 0>, 'DirType': <Dirtype.P: 0>}]


In [30]:
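# cxOrdered and mutShuffleIndexes are permutation operators that expect
# sequence individuals; the current individuals are dicts keyed by name
toolbox.register('mate', tools.cxOrdered)

toolbox.register('mutate', tools.mutShuffleIndexes, indpb=0.05)

toolbox.register('select', tools.selTournament, tournsize=tournSize)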
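
A genotype that is a variable-length list of events needs a mutation that can change the number of events as well as their times. The sketch below makes several assumptions. Each event is a `(time, ftype, dirtype)` tuple. On each call, one of three moves is picked with equal probability: shift one event by one grid step, add a random event, or remove one. The names `mut_events` and `random_event` and the equal move probabilities are hypothetical. The function returns a 1-tuple, following the convention of DEAP's built-in mutation operators.

```python
import random

TIMEMAX = 5      # mirrors timemax in the notebook
TIMEINC = 0.05   # mirrors timeinc in the notebook
NSLOTS = round(TIMEMAX / TIMEINC)  # index of the last grid slot (100)

def random_event():
    """Hypothetical random event: (time on the grid, ftype, dirtype)."""
    slot = random.randint(0, NSLOTS)
    return (round(slot * TIMEINC, 2), random.choice('MS'), random.choice('PS'))

def mut_events(individual, min_events=1):
    """Mutate a list of events in place; returns a 1-tuple like DEAP operators."""
    move = random.choice(('shift', 'add', 'remove'))
    if move == 'remove' and len(individual) <= min_events:
        move = 'add'  # never shrink below the minimum length
    if move == 'shift' and individual:
        i = random.randrange(len(individual))
        t, f, d = individual[i]
        slot = round(t / TIMEINC) + random.choice((-1, 1))
        slot = min(max(slot, 0), NSLOTS)  # keep the time within [0, timemax]
        individual[i] = (round(slot * TIMEINC, 2), f, d)
    elif move == 'add':
        individual.append(random_event())
    elif move == 'remove':
        del individual[random.randrange(len(individual))]
    individual.sort(key=lambda e: e[0])  # keep events in time order
    return (individual,)
```

Such a function could be registered with `toolbox.register('mutate', mut_events)`. Two events may land in the same slot, so the no-simultaneous-events rule would still need to be applied afterwards.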


In [31]:
ind = toolbox.individual()

stats = tools.Statistics(lambda ind: ind.fitness.values)
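# fitness values are 2-tuples; axis=0 reports each objective separately
stats.register("avg", np.mean, axis=0)
stats.register("std", np.std, axis=0)
stats.register("min", np.min, axis=0)
stats.register("max", np.max, axis=0)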
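
With two objectives, each `ind.fitness.values` is a 2-tuple. Called on a list of such tuples without an axis, `np.mean` averages every number together. A small illustration with made-up fitness values:

```python
import numpy as np

# two individuals, each with (match error, number of events); values are made up
fitness_values = [(1.0, 3.0), (3.0, 5.0)]

print(np.mean(fitness_values))          # 3.0: both objectives mixed together
print(np.mean(fitness_values, axis=0))  # [2. 4.]: one average per objective
```

`tools.Statistics.register` forwards extra keyword arguments to the registered function. Passing `axis=0` there therefore yields one column per objective in the logbook.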

In [32]:
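def evalSeq(individual):
    # x and y are not defined anywhere in this notebook (hence the NameError
    # below), and FitnessMin has two weights, so evaluate must return a 2-tuple
    diffx = np.diff(x[individual])
    diffy = np.diff(y[individual])
    distance = np.sum(diffx**2 + diffy**2)
    return (distance,)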

# register the evaluation function with the toolbox
toolbox.register('evaluate',evalSeq)

In [33]:
# initial population
pop = toolbox.population(n=PopSize)

# save only the best individual
hof = tools.HallOfFame(1)

# run DEAP's simple generational GA
result, log = algorithms.eaSimple(pop, toolbox, cxpb=CxProb, mutpb=MutProb, stats=stats,
                                  ngen=MaxGen, halloffame=hof, verbose=True)



NameError: name 'x' is not defined