# Sequence Optimisation

This notebook presents an example of how to optimise a time sequence with a composite genotype.

The sequence comprises a number of events at discrete times where each event has two additional parameters.
The events cannot take place simultaneously at a given time, but if the need arises to have two events closely spaced, they will follow in the first available time slot immediately after any currently occupied time slot.
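The slot rule described above can be sketched as follows. This is only an illustration under stated assumptions: the helper name `resolve_collisions` is hypothetical, the default slot width of 0.05 mirrors the `timeinc` value used later in the notebook, and events that would be pushed beyond the end of the time range are not handled.

```python
def resolve_collisions(times, timeinc=0.05):
    """Return slot-aligned times so that no two events share a slot.

    Events are processed in time order; an event that lands on an occupied
    slot is moved to the first free slot after it.
    Assumption: overflow past the end of the time range is not handled.
    """
    occupied = set()
    resolved = []
    for t in sorted(times):
        slot = round(t / timeinc)      # index of the nearest slot
        while slot in occupied:        # step forward until a free slot is found
            slot += 1
        occupied.add(slot)
        resolved.append(round(slot * timeinc, 2))
    return resolved


print(resolve_collisions([3.5, 3.5, 0.0, 3.55]))
# prints [0.0, 3.5, 3.55, 3.6]
```

The second event at 3.5 moves to 3.55, which in turn pushes the event originally at 3.55 on to 3.6.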

This example will optimise towards two objectives:

- Employ the least number of events.  This is a very simple test: count the number of events and penalise too high counts.
- Optimise for the minimum distance between the required sequence and the sequence currently under test. For this example a number of required sequences will be posed, where each sequence will have a weight according to the preference order.

A key consideration in the real-world problem is that there probably is no clear global minimum (that is, the inverse of a maximum). A number of potentially very different minima are expected, and the solution must balance the performance between the minima. The best fit across all minima will be sought, but it is conceivable that some form of preference ordering or weighting might be required, trading some local minima against other local minima.



In [1]:
##
import matplotlib.pyplot as plt
import sys
import array
import random
import numpy as np
from enum import Enum,unique

from deap import  algorithms
from deap import  base
from deap import  creator
from deap import  tools

%matplotlib inline


## Optimising for Minimum or Maximum?

The real-world problem's objective is to achieve a maximum in the primary outcome, while employing the least number of events.
The problem with optimising towards a maximum in a population with many large values is that the large values will completely swamp (and therefore hide) any smaller or poorly performing values. For example, the poorly performing individual in this set has little effect if the mean value (900.1) is considered:

    1  1000  1000  1000  1000  1000  1000  1000  1000  1000 

Taking the inverse values, the mean value (0.1009) is now significantly affected by the poor performing individual, and less so by the well performing individuals:

    1 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001

So the objective should be to optimise towards the best fit (approaching zero value), so that the well-performing solutions will have a lower impact on fitness. It is proposed that the final real-world optimisation should employ some 'inverted' fitness measure. The simple mathematical reciprocal may not be the best choice; some work is required here.
Irrespective of the exact mapping, the effect should be inverse-related.
On the assumption of an inverse fitness measure, we define the best match to approach zero.


The following code demonstrates the problem with maximising optimisations using a simple fitness measure. It is evident that the best choice depends on the task at hand.

1. Maximising $f(x)$ towards $+\infty$ causes the (undesired) smaller outcomes to disappear in both the mean and the maximum of the data set. Neither the mean nor the max function will optimise the low performers.

1. Minimising $1/f(x)$ towards 0 causes the (undesired) smaller outcomes to rise and have a much stronger effect than the well-performing outcomes (which disappear towards 0). Either the mean or the max function will optimise the low performers.

1. Minimising $-f(x)$ towards 0 only works when using the max function (which picks the value closest to zero), because the mean function swamps the low-performer outcomes with large negative values.

The MATLAB documentation and Wikipedia propose using (3) to find the minimum-towards-zero of a function. This would work where a single value is sought, such as the maximum of a function.
Option (3) will not work when the ensemble performance is required, such as when acceptable performance from all samples is required, as opposed to the good performance of a single sample.



In [2]:
# to investigate the minimising options

a = np.ones(8)*1000
a[0] = 1
b = a
print('(1) Maximising towards +infinity:')
print(f'{b} \nmean={np.mean(b)} max={np.max(b)}\n')

print('(2) Minimising the inverse towards zero:')
b = 1 / a
print(f'{b} \nmean={np.mean(b):.2f} max={np.max(b)}\n')

print('(3) Minimising the negative towards zero:')
b = - a
print(f'{b} \nmean={np.mean(b):.2f} max={np.max(b)}\n')


(1) Maximising towards +infinity:
[   1. 1000. 1000. 1000. 1000. 1000. 1000. 1000.] 
mean=875.125 max=1000.0

(2) Minimising the inverse towards zero:
[1.    0.001 0.001 0.001 0.001 0.001 0.001 0.001] 
mean=0.13 max=1.0

(3) Minimising the negative towards zero:
[   -1. -1000. -1000. -1000. -1000. -1000. -1000. -1000.] 
mean=-875.12 max=-1.0



## Parameter Set


In [3]:
# to define the run settings
PopSize = 1
MaxGen = 100
MutProb = 0.2
CxProb = 0.8
tournSize=3

# default probability for Ftype.N preference
FtypeNProb = 0.3
# default prob for Ftype.M and not Ftype.S
FtypeMProb = 0.5
# default probability for Dirtype.P preference
DirtypePProb = 0.5

timemax = 10
timeinc = 0.05

The genotype has three parameters:
    
- a discrete (but real-valued) time value between 0 and `timemax`, at  `timeinc` intervals.

- a Dirtype selection between two discrete values `P` and `S`, with a probability of `DirtypePProb` of being `P`. Hence the probability of `S` is (1-DirtypePProb).

- an Ftype selection between three discrete values `N`, `M` and `S`, with a probability of `FtypeNProb` of being `N`. If the value is not `N` the probability for `M` and not `S` is FtypeMProb. Hence the probability of `S` is  (1-FtypeNProb)(1-FtypeMProb).  

The `Ftype.N` event is the `None` event and has no influence on the external process. `Ftype.N` serves mainly as a placeholder to fill the individual chromosome to sufficient length. `Ftype.N` is preserved during crossover, but can be overwritten by `Ftype.S` or `Ftype.M` during mutation.
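The two-stage draw for the Ftype can be checked with a short calculation. The sketch below is not part of the original notebook: the helper names `ftype_probabilities` and `draw_ftype` are hypothetical, and plain strings stand in for the enum members to keep the block self-contained. With the default values `FtypeNProb = 0.3` and `FtypeMProb = 0.5`, the probabilities for `N`, `M` and `S` work out to 0.30, 0.35 and 0.35.

```python
import random


def ftype_probabilities(n_prob, m_prob):
    """Probabilities of N, M and S for the two-stage draw described above."""
    return {
        'N': n_prob,
        'M': (1 - n_prob) * m_prob,
        'S': (1 - n_prob) * (1 - m_prob),
    }


def draw_ftype(n_prob, m_prob, rng):
    """Two-stage draw: first N versus not-N, then M versus S."""
    if rng.uniform(0, 1) <= n_prob:
        return 'N'
    return 'M' if rng.uniform(0, 1) <= m_prob else 'S'


# analytic probabilities for the notebook defaults
probs = ftype_probabilities(0.3, 0.5)
print({k: round(v, 3) for k, v in probs.items()})

# empirical check: relative frequencies over many draws should be close
rng = random.Random(0)
draws = [draw_ftype(0.3, 0.5, rng) for _ in range(100_000)]
print({k: round(draws.count(k) / len(draws), 3) for k in 'NMS'})
```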

        



In [4]:
# to define the sequence parameters and random sequence generating function
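def round_down(x, a):
    # NOTE: despite the name, round() snaps x to the NEAREST multiple of a, not downwards;
    # the result is formatted to two decimals to suppress floating-point noise
    return float(f'{round(x / a) * a:.2f}')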

# Only these are allowed as Ftypes
@unique
class Ftype(Enum):
    N = 0
    S = 1
    M = 2
    
# Only these are allowed as Dirtype
@unique
class Dirtype(Enum):
    P = 0
    S = 1
    
def generate_event():
    di = {}
    # time between 0 and timemax
    di['time'] = round_down(random.uniform(0, timemax),timeinc)
    # odds for each of the three cases
    di['Ftype'] = Ftype.N if random.uniform(0, 1) <= FtypeNProb else \
            Ftype.M if random.uniform(0, 1) <= FtypeMProb else Ftype.S
    # odds for each of the two cases
    di['DirType'] = Dirtype.P if random.uniform(0, 1) <=DirtypePProb else Dirtype.S
        
    return di


The DEAP overview page makes an important point:  *Once the types are created you need to fill them with sometimes random values, sometime guessed ones.* So, if you have some idea of a good starting point, mix your best-guess estimates (called *prior individual solutions* in this document) with  additional random guesses, to start with a blended set.  If your guesses are good, the GA should start better, but the randomness brings in some diversity. 

The model requires the use of at least one prior individual solution as part of the population. An unlimited number of prior individual solutions may be used, with zero or more additional random individuals. The purpose of using prior individual solution(s) is to guide the simulation towards previously used sequences. The use of a sufficiently large number of random individuals is also necessary to introduce sufficient new diversity into the population. There is little point in using only prior solutions.

The sequence length to be used in this run is determined from the longest of the prior sequences. So to define the sequence length, create at least one prior sequence with the required length. Use any of the three `Ftype` values in the prior sequence.

The prior sequences must be present as a list of sequences, where each sequence is a list of events.

In [5]:
# to construct the list of lists of prior sequences and determine the number of events
initInd = [
    [
    {'time':0,'Ftype':Ftype.S, 'DirType':Dirtype.P},
    {'time':0.05,'Ftype':Ftype.S, 'DirType':Dirtype.P},
    {'time':0.55,'Ftype':Ftype.S, 'DirType':Dirtype.P},
    {'time':0.6,'Ftype':Ftype.S, 'DirType':Dirtype.P},
    {'time':0.8,'Ftype':Ftype.S, 'DirType':Dirtype.P},
    {'time':0.85,'Ftype':Ftype.S, 'DirType':Dirtype.P},
    ],
    [
    {'time':0,'Ftype':Ftype.S, 'DirType':Dirtype.P},
    {'time':0.5,'Ftype':Ftype.S, 'DirType':Dirtype.P},
    {'time':0.55,'Ftype':Ftype.S, 'DirType':Dirtype.P},
    {'time':1.0,'Ftype':Ftype.S, 'DirType':Dirtype.P},
    {'time':1.05,'Ftype':Ftype.S, 'DirType':Dirtype.P},
    {'time':1.5,'Ftype':Ftype.S, 'DirType':Dirtype.P},
    {'time':1.55,'Ftype':Ftype.S, 'DirType':Dirtype.P},
    {'time':2.0,'Ftype':Ftype.S, 'DirType':Dirtype.P},
    {'time':2.5,'Ftype':Ftype.S, 'DirType':Dirtype.P},
    {'time':3.0,'Ftype':Ftype.S, 'DirType':Dirtype.P},
    {'time':3.5,'Ftype':Ftype.S, 'DirType':Dirtype.P},
    {'time':3.5,'Ftype':Ftype.S, 'DirType':Dirtype.P},
    {'time':10.,'Ftype':Ftype.N, 'DirType':Dirtype.P},
    {'time':10.,'Ftype':Ftype.N, 'DirType':Dirtype.P},
    ],
]

# find the length of the longest prior sequence
numEvents = max(len(seq) for seq in initInd)
print(f'Number of events per individual: {numEvents}')

# pad all priors with Ftype.N placeholder events to the length of the longest
for seq in initInd:
    while len(seq) < numEvents:
        seq.append({'time':timemax,'Ftype':Ftype.N, 'DirType':Dirtype.P})


Number of events per individual: 14


The objective is to minimise two parameters (number of events and difference between sequences), hence the fitness function must evaluate two parameters.


In [6]:
# to create the basic DEAP objects and the random population
creator.create("FitnessMin", base.Fitness,weights=(-1.0,-1.0))
creator.create("Individual",list, fitness=creator.FitnessMin)

toolbox = base.Toolbox()
toolbox.register("individual", tools.initRepeat, creator.Individual,generate_event,numEvents) 
toolbox.register("population", tools.initRepeat, list, toolbox.individual)

pop = toolbox.population(n=PopSize)


The prior sequences are added to the random population by using the DEAP toolbox registration procedures. 
First count the number of prior sequences (`len(initInd)`), then step through the list of priors, selecting each individual sequence by index `numInit`, adding its events to a new individual using `tools.initIterate`, and finally appending the individual to the population.

In [7]:
# to add the prior sequences to the population
def addInitSequence(numInit):
    return initInd[numInit]
    
for numInit in range(len(initInd)):
    toolbox.register("addInitSequence", addInitSequence,numInit)
    toolbox.register("individualInit", tools.initIterate, creator.Individual,toolbox.addInitSequence) 
    pop.append(toolbox.individualInit())

In [8]:
# to print the population
def printPop(pop):
    print(60*'=')
    print(f'Population size: {len(pop)}')
    for ip in range(len(pop)):
        indv = pop[ip]
        print(60*'-')
        print(f'\npopulation[{ip}]:') 
        for ii in range(len(indv)):
            print(f'{indv[ii]}') 
    print(60*'=')

printPop(pop)

Population size: 3
------------------------------------------------------------

population[0]:
{'time': 9.7, 'Ftype': <Ftype.S: 1>, 'DirType': <Dirtype.S: 1>}
{'time': 6.9, 'Ftype': <Ftype.S: 1>, 'DirType': <Dirtype.P: 0>}
{'time': 4.8, 'Ftype': <Ftype.S: 1>, 'DirType': <Dirtype.P: 0>}
{'time': 9.55, 'Ftype': <Ftype.M: 2>, 'DirType': <Dirtype.P: 0>}
{'time': 1.9, 'Ftype': <Ftype.N: 0>, 'DirType': <Dirtype.S: 1>}
{'time': 9.85, 'Ftype': <Ftype.N: 0>, 'DirType': <Dirtype.P: 0>}
{'time': 1.25, 'Ftype': <Ftype.S: 1>, 'DirType': <Dirtype.P: 0>}
{'time': 2.7, 'Ftype': <Ftype.N: 0>, 'DirType': <Dirtype.P: 0>}
{'time': 0.2, 'Ftype': <Ftype.N: 0>, 'DirType': <Dirtype.P: 0>}
{'time': 1.45, 'Ftype': <Ftype.M: 2>, 'DirType': <Dirtype.P: 0>}
{'time': 7.2, 'Ftype': <Ftype.M: 2>, 'DirType': <Dirtype.S: 1>}
{'time': 5.25, 'Ftype': <Ftype.N: 0>, 'DirType': <Dirtype.S: 1>}
{'time': 2.05, 'Ftype': <Ftype.N: 0>, 'DirType': <Dirtype.P: 0>}
{'time': 4.5, 'Ftype': <Ftype.S: 1>, 'DirType': <Dirtype.P: 0>}
--

At this point the individuals are defined and the population is constructed.

## Crossover and Mutation




In [9]:
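# NOTE: tools.cxOrdered expects individuals that are sequences of integer indices
# (permutations); the event dictionaries used here do not meet that expectation
toolbox.register('mate',tools.cxOrdered)

# NOTE: tools.mutShuffleIndexes only reorders the events within an individual;
# it does not change the time, Ftype or DirType of any event
toolbox.register('mutate',tools.mutShuffleIndexes, indpb=0.05)

# tournament selection between tournSize randomly chosen individuals
toolbox.register('select',tools.selTournament, tournsize=tournSize)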
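The genotype description states that an `Ftype.N` event can be overwritten by `Ftype.S` or `Ftype.M` during mutation, which a shuffle of existing events cannot do. A minimal sketch of one possible alternative is given below. It is an assumption, not the notebook's method: the operator name `mutRegenerateEvent` and its `make_event` parameter are hypothetical, and a regenerated event may itself be any of the three Ftypes.

```python
import random


def mutRegenerateEvent(individual, make_event, indpb=0.05):
    """Replace each event by a freshly generated one with probability indpb.

    `make_event` is any zero-argument callable returning a new event
    (hypothetical parameter; in the notebook this could be generate_event).
    Returns a one-element tuple, the convention followed by DEAP mutation
    operators.
    """
    for i in range(len(individual)):
        if random.random() < indpb:
            individual[i] = make_event()
    return (individual,)


# usage with a stand-in event generator
demo = [{'time': 0.0}, {'time': 1.0}, {'time': 2.0}]
mutant, = mutRegenerateEvent(demo, lambda: {'time': 9.0}, indpb=1.0)
print(mutant)
```

If this approach suits the problem, the operator could presumably be registered in the same way as the built-in ones, for example `toolbox.register('mutate', mutRegenerateEvent, make_event=generate_event, indpb=0.05)`.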


In [10]:
ind = toolbox.individual()

stats = tools.Statistics(lambda ind: ind.fitness.values)
stats.register("avg", np.mean)
stats.register("std", np.std)
stats.register("min", np.min)
stats.register("max", np.max)

In [11]:
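def evalSeq(individual):
    # NOTE: x and y are never defined in this notebook, so calling this function raises NameError;
    # it also returns one value, whereas FitnessMin was created with two weights
    diffx = np.diff(x[individual])
    diffy = np.diff(y[individual])
    distance = np.sum(diffx**2 + diffy**2)
    return (distance,)

# register the evaluation function with the toolbox
toolbox.register('evaluate',evalSeq)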
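The fitness was declared with two minimising weights, so an evaluation function for this problem needs to return two values: the number of events and the weighted distance to the required sequences. The notebook does not define the distance measure, so the sketch below rests on assumptions that are labelled in the code: events are paired in time order, the distance is the sum of absolute time differences, and each surplus event costs a fixed penalty. The names `evalSeqSketch`, `sequence_distance`, `active_times` and the `(weight, sequence)` format of `required` are all hypothetical. The `Ftype` enum is repeated only so that the block runs on its own; inside the notebook the existing definition should be reused.

```python
from enum import Enum, unique


@unique
class Ftype(Enum):   # repeated from the notebook for self-containment
    N = 0
    S = 1
    M = 2


def active_times(sequence):
    """Sorted times of the events that are not Ftype.N placeholders."""
    return sorted(ev['time'] for ev in sequence if ev['Ftype'] is not Ftype.N)


def sequence_distance(seq_a, seq_b, unmatched_penalty=10.0):
    """ASSUMED metric: pair events in time order, sum |dt|, penalise surplus events."""
    ta, tb = active_times(seq_a), active_times(seq_b)
    paired = sum(abs(a - b) for a, b in zip(ta, tb))
    return paired + unmatched_penalty * abs(len(ta) - len(tb))


def evalSeqSketch(individual, required):
    """Return (event count, weighted mean distance); both are to be minimised.

    `required` is a non-empty list of (weight, sequence) pairs (hypothetical format).
    """
    count = len(active_times(individual))
    total_weight = sum(w for w, _ in required)
    distance = sum(w * sequence_distance(individual, seq)
                   for w, seq in required) / total_weight
    return (count, distance)


# usage with one required sequence of weight 1
target = [{'time': 0.0, 'Ftype': Ftype.S}, {'time': 1.0, 'Ftype': Ftype.S}]
candidate = [{'time': 0.0, 'Ftype': Ftype.S},
             {'time': 1.5, 'Ftype': Ftype.M},
             {'time': 10.0, 'Ftype': Ftype.N}]
print(evalSeqSketch(candidate, [(1.0, target)]))
# prints (2, 0.5)
```

The event count is returned unscaled here; how strongly high counts should be penalised relative to the distance is left open, as in the text.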

In [13]:
# save only the best individual
hof = tools.HallOfFame(1)

# run the genetic algorithm
result, log = algorithms.eaSimple(pop,toolbox,cxpb=CxProb, mutpb=MutProb, stats=stats,
                                  ngen=MaxGen, halloffame=hof,verbose=True)



NameError: name 'x' is not defined