# Module 5 - Programming Assignment

## Directions

There are general instructions on Blackboard and in the Syllabus for Programming Assignments. This Notebook also has instructions specific to this assignment. Read all the instructions carefully and make sure you understand them. Please ask questions on the discussion boards or email me at `EN605.445@gmail.com` if you do not understand something.

<div style="background: mistyrose; color: firebrick; border: 2px solid darkred; padding: 5px; margin: 10px;">
You must follow the directions *exactly* or you will get a 0 on the assignment.
</div>

You must submit a zip file of your assignment and associated files (if there are any) to Blackboard. The zip file will be named after your JHED ID: `<jhed_id>.zip`. It will not include any other information. Inside this zip file should be the following directory structure:

```
<jhed_id>
    |
    +--module-05-programming.ipynb
    +--module-05-programming.html
    +--(any other files)
```

For example, do not name your directory `programming_assignment_01` and do not name your directory `smith122_pr1` or anything else. It must be only your JHED ID.

In [5]:
from IPython.core.display import *
from StringIO import StringIO
from random import random, gauss, randrange
import bisect

## Local Search - Genetic Algorithm

For this assignment we're going to use the Genetic Algorithm to find the solution to a shifted Sphere Function in 10 dimensions, $x$, where the range of $x_i$ in each dimension is (-5.12 to 5.12). Here a "solution" means the vector $x$ that minimizes the function. The Sphere Function is:

$$f(x)=\sum_{i=1}^{n} x^2_i$$

We are going to shift it over 0.5 in every dimension:

$$f(x) = \sum_{i=1}^{n} (x_i - 0.5)^2$$

where $n = 10$.

As this *is* a minimization problem you'll need to use the trick described in the lecture to turn the shifted Sphere Function into an appropriate fitness function (which is always looking for a *maximum* value).

## Binary GA

You are going to solve the problem two different ways. First, using the traditional (or "Canonical") Genetic Algorithm that encodes numeric values as binary strings (you don't have to represent them literally as strings, but they are generally lists or strings of only 0 or 1).

There are many different ways to effect this encoding. For this assignment, the easiest is probably to use a 10-bit binary encoding for each $x_i$. This gives each $x_i$ an integer value from 0 to 1023, which can be mapped to the range -5.12 to 5.11 by subtracting 512 and dividing by 100.
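
The decoding step can be sketched as follows. This is only an illustration, assuming each gene is stored as a 10-character string of `'0'`/`'1'`; the name `decode_gene` is hypothetical and not part of the assignment.

```python
def decode_gene(bits):
    """Map a 10-character string of '0'/'1' to a value in [-5.12, 5.11]."""
    raw = int(bits, 2)          # unsigned integer in 0..1023
    return (raw - 512) / 100.0  # shift to -512..511, then scale


print(decode_gene("0000000000"))  # -5.12
print(decode_gene("1000000000"))  # 0.0
print(decode_gene("1111111111"))  # 5.11
```

Because 10 bits hold 1024 distinct values (0 through 1023), the largest representable value under this mapping is 5.11 rather than 5.12.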

All the GA operators will be as described in the lecture.

**Important**

Please remember that there is a difference between the *genotype* and the *phenotype*. The GA operates on the *genotype* (the encoding) and does not respect the boundaries of the phenotype (the decoding). So, for example, do **not** use a List of Lists to represent an individual. It should be a *single* List of 10 x 10 or 100 bits. In general, crossover and mutation have no idea what those bits represent.
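
To illustrate the point about the flat genotype, here is a small sketch of one-point crossover on two 100-bit lists. The function name `one_point_crossover` and the all-zeros/all-ones parents are assumptions chosen so the cut point is easy to see; the lecture's operator may differ in detail.

```python
from random import randrange


def one_point_crossover(mom, dad):
    """Return one child built from the head of mom and the tail of dad."""
    cut = randrange(1, len(mom))  # 1..len-1, so both parents contribute
    return mom[:cut] + dad[cut:]


mom = [0] * 100  # flat genotype: 10 variables x 10 bits each
dad = [1] * 100
child = one_point_crossover(mom, dad)
cut = child.index(1)  # first bit inherited from dad
print("cut point: %d" % cut)
```

The cut point is drawn over all 100 positions, so it will often land in the middle of one variable's 10 bits. That is expected: the operator works on the genotype and knows nothing about where one $x_i$ ends and the next begins.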

## Real Valued GA

For the real valued GA, you can represent each $x_i$ as a float in the range (-5.12, 5.12) but you will need to create a new mutation operator that applies Gaussian noise. Python's random number generator for the normal distribution is called `gauss` and is found in the `random` module:

```
from random import gauss, random
```

You may need to experiment a bit with the standard deviation of the noise but the mean will be 0.0.
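
One possible shape for such an operator is sketched below. The name `gaussian_mutate`, the choice to mutate a single randomly chosen gene, and the clamping to the search range are all assumptions made for illustration, not requirements from the assignment.

```python
from random import gauss, randrange


def gaussian_mutate(genes, sigma, lo=-5.12, hi=5.12):
    """Return a copy of genes with zero-mean Gaussian noise added to one gene."""
    mutated = list(genes)
    i = randrange(len(mutated))
    noisy = mutated[i] + gauss(0.0, sigma)  # additive noise, mean 0.0
    mutated[i] = min(max(noisy, lo), hi)    # assumed: clamp back into range
    return mutated


original = [0.5] * 10
print(gaussian_mutate(original, 0.1))
```

A small `sigma` makes mutation a local search step around the current value, while a large one behaves more like a random restart of that gene.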

## GA

The Genetic Algorithm itself will have the same basic structure in each case: create a population, evaluate it, select parents, apply crossover and mutation, and repeat until the desired number of generations has been produced. The easiest way to accomplish this in "Functional" Python would be to use Higher Order Functions.
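
As a rough sketch of what the higher-order-function approach might look like, the loop below takes every problem-specific step as a function argument. All names here (`run_ga`, `roulette`, `one_point`, `flip_one_bit`) are hypothetical, and the toy problem of maximizing the number of 1s in 20 bits stands in for the assignment's fitness function.

```python
from random import random, randrange


def run_ga(create, fitness, select, crossover, mutate,
           pop_size, generations, crossover_rate, mutation_rate):
    """Generic GA loop; every problem-specific step is passed in as a function."""
    population = [create() for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(ind) for ind in population]
        next_gen = []
        while len(next_gen) < pop_size:
            mom = select(population, scores)
            dad = select(population, scores)
            # With probability crossover_rate recombine, otherwise copy a parent.
            child = crossover(mom, dad) if random() < crossover_rate else mom[:]
            if random() < mutation_rate:
                child = mutate(child)
            next_gen.append(child)
        population = next_gen
    scores = [fitness(ind) for ind in population]
    return population[scores.index(max(scores))]


def roulette(population, scores):
    """Pick one individual with probability proportional to its score."""
    pick = random() * sum(scores)
    running = 0.0
    for individual, score in zip(population, scores):
        running += score
        if running > pick:
            return individual
    return population[-1]


def one_point(mom, dad):
    cut = randrange(1, len(mom))
    return mom[:cut] + dad[cut:]


def flip_one_bit(bits):
    i = randrange(len(bits))
    return bits[:i] + [1 - bits[i]] + bits[i + 1:]


# Toy problem (not the assignment): maximize the number of 1s in 20 bits.
best = run_ga(
    create=lambda: [randrange(2) for _ in range(20)],
    fitness=sum,
    select=roulette,
    crossover=one_point,
    mutate=flip_one_bit,
    pop_size=50, generations=30,
    crossover_rate=0.9, mutation_rate=0.05
)
print("best individual has %d ones" % sum(best))
```

With this structure, the binary and real valued variants would differ only in the `create`, `mutate`, and decoding functions passed in, while the loop itself stays the same.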



Your code should print out the best individual of each generation including the generation number, genotype (the representation), phenotype (the actual value), the fitness (based on your fitness function transformation) and the function value (for the shifted sphere) if passed a DEBUG=True flag.

The GA has a lot of parameters: mutation rate, crossover rate, population size, dimensions (given for this problem), number of generations.  You can put all of those and your fitness function in a `Dict` in which case you need to implement:

```python
def binary_ga( parameters):
  pass
```

and

```python
def real_ga( parameters):
  pass
```

Remember that you need to transform the sphere function into a legit fitness function. Since you also need the sphere function, I would suggest that your parameters Dict includes something like:

```python
parameters = {
   "f": lambda xs: sphere( 0.5, xs),
   "minimization": True
   # put other parameters in here.
}
```

and then have your code check for "minimization" and create an entry for "fitness" that is appropriate.
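
A minimal sketch of that check is shown below. The lecture's transformation is not reproduced in this notebook, so the code assumes one common choice, $1 / (1 + f(x))$, which is valid here because the sphere function is never negative. The helper name `with_fitness` is hypothetical.

```python
def sphere(shift, xs):
    return sum((x - shift) ** 2 for x in xs)


def with_fitness(parameters):
    """Return a copy of parameters with a 'fitness' entry to maximize."""
    f = parameters["f"]
    result = dict(parameters)
    if parameters.get("minimization", False):
        # Assumed transformation: 1 / (1 + f(x)) is largest where f is smallest.
        result["fitness"] = lambda xs: 1.0 / (1.0 + f(xs))
    else:
        result["fitness"] = f
    return result


parameters = with_fitness({
    "f": lambda xs: sphere(0.5, xs),
    "minimization": True
})
print(parameters["fitness"]([0.5] * 10))  # 1.0 at the minimum
```

Keeping both `"f"` and `"fitness"` in the Dict lets the GA select on fitness while still reporting the raw function value in the debug output.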

In [6]:
def sphere( shift, xs):
    return sum( [(x - shift)**2 for x in xs])

In [4]:
sphere( 0.5, [1.0, 2.0, -3.4, 5.0, -1.2, 3.23, 2.87, -4.23, 3.82, -4.61])

113.42720000000001


-----

### Common Helper Functions ###

In [7]:
def crossover(parents1, parents2):
    pt = randrange(len(parents1))
    return parents1[:pt] + parents2[pt:]

def crossover2(parents1, parents2):
    pt = randrange(len(parents1))
    return (parents1[:pt]+parents2[pt:], parents2[:pt]+parents1[pt:])

def cumsum(L):
    total = 0
    for l in L:
        total += l
        yield total

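def pickParents(fitness, nParents):
    # Roulette-wheel selection: draw nParents indices with probability
    # proportional to fitness, using the cumulative fitness as bin edges.
    cumFitness = list(cumsum(fitness))
    rands = [random()*cumFitness[-1] for n in xrange(nParents)] # list of rands
    return [bisect.bisect_left(cumFitness,r) for r in rands]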

-----

## Binary GA ##

### Binary GA Helper Functions ###

In [8]:
def getDecimals(genes):
    bins = [genes[n:n+10] for n in xrange(0,len(genes),10)]
    return [(int(x,2)-511)/100.0 for x in bins]

def mutateBinary(genes):
    i = randrange(len(genes))
    genes = genes[:i] + ('0' if genes[i]=='1' else '1') + genes[i+1:]
    return genes

def printGenInfoBin(genNum, genes, fitness, maxVal):
    out = 'Gen #%u best gene: %s\n' + \
        '\tx-values: %s,\n\tfitness: %.4f, sphere value: %.4f\n'
    binary = [genes[n:n+10] for n in xrange(0,len(genes),10)]
    xVal = getDecimals(genes)
    sp = maxVal - fitness
    print out % (genNum, repr(binary), repr(xVal), fitness, sp) 

### Binary GA Main Program ###

In [38]:
def binary_ga(pm, DEBUG=False):
    # randrange(2**nBits) covers 0..2**nBits-1, so the all-ones gene is reachable
    pop = [[format(randrange(2**pm['nBits']), '#012b')[2:] \
            for x in xrange(10)] for y in xrange(pm['popSize'])]
    pop = [''.join(p) for p in pop]
    maxVal = pm['f']([-5.12 for x in xrange(10)])
    
    for gen in xrange(pm['nGeneration']):
        fitness = [maxVal - pm['f'](getDecimals(genes)) for genes in pop]
        
        if DEBUG:
            best = fitness.index(max(fitness))
            printGenInfoBin(gen, pop[best], fitness[best], maxVal)
        
        idx = pickParents(fitness, pm['popSize']*2)
        nextGen = list()
        for n in xrange(pm['popSize']):
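            # idx[-n-1] pairs parent n with a separate draw (idx[-0] would reuse idx[0])
            child = crossover(pop[idx[n]], pop[idx[-n-1]])
            if random() < pm['mutateProb']:
                child = mutateBinary(child)  # strings are immutable; keep the returned copy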
            nextGen.append(child)
        pop = nextGen
    
    fitness = [maxVal - pm['f'](getDecimals(genes)) for genes in pop]
    best = fitness.index(max(fitness))
    printGenInfoBin(pm['nGeneration'], pop[best], fitness[best], maxVal)
    return getDecimals(pop[best])

In [40]:
## Traditional GA

param = {
    "f": lambda xs: sphere( 0.5, xs),
    "minimization": True,
    "popSize": 1000,
    "mutateProb": 0.10,
    "nGeneration": 20,
    "nBits": 10,
}

bestX =  binary_ga( param, True)

Gen #0 best gene: ['1000000000', '0110001001', '0111000101', '1001101100', '1000000100', '1000011000', '1000000111', '1000011010', '0110001010', '0101100001']
	x-values: [0.01, -1.18, -0.58, 1.09, 0.05, 0.25, 0.08, 0.27, -1.17, -1.58],
	fitness: 303.6574, sphere value: 12.1866

Gen #1 best gene: ['0011110011', '0111000101', '0111000000', '0111110000', '0101001100', '1000110010', '0111110111', '1001100110', '1000111001', '0110000101']
	x-values: [-2.68, -0.58, -0.63, -0.15, -1.79, 0.51, -0.08, 1.03, 0.58, -1.22],
	fitness: 294.0395, sphere value: 21.8045

Gen #2 best gene: ['1010001100', '1001001001', '1010000111', '1011111111', '1101001000', '1010011111', '1001001010', '1001111010', '1000110101', '1101001001']
	x-values: [1.41, 0.74, 1.36, 2.56, 3.29, 1.6, 0.75, 1.23, 0.54, 3.3],
	fitness: 292.5440, sphere value: 23.3000

Gen #3 best gene: ['1001000001', '1010110011', '1001000000', '1011000111', '0110111100', '0110110111', '0101110000', '1000001001', '1001110001', '0100101001']
	x-valu

---

## Real Value GA ##

### Real Value GA Helper Functions ###

In [40]:
def mutateReal(genes, sigma, xMax):
    # Add zero-mean Gaussian noise to one random gene, then clamp to [-xMax, xMax]
    i = randrange(len(genes))
    genes[i] = min( max(genes[i]+gauss(0,sigma), -xMax) , xMax)
    return genes

### Real Value GA Main Program ###

In [45]:
def real_ga(pm):
    pop = [[random()*2*pm['xRange'] - pm['xRange'] for x in xrange(10)] \
           for y in xrange(pm['popSize'])]
    maxVal = pm['f']([-5.12 for x in xrange(10)])
    
    for genCount in xrange(pm['nGeneration']):
        fitness = [maxVal - pm['f'](genes) for genes in pop]
        idx = pickParents(fitness, pm['popSize']*2)
        nextGen = list()
        for n in xrange(pm['popSize']):
            # idx[-n-1] pairs parent n with a separate draw (idx[-0] would reuse idx[0])
            child = crossover(pop[idx[n]], pop[idx[-n-1]])
            if random() < pm['mutateProb']:
                mutateReal(child, pm['sigma'], pm['xRange'])
            nextGen.append(child)
        pop = nextGen
    
    fitness = [maxVal - pm['f'](genes) for genes in pop]
    return pop[fitness.index(max(fitness))]

In [60]:
## Real Valued GA

param2 = {
    "f": lambda xs: sphere( 0.5, xs),
    "minimization": True,
    "popSize": 1000,
    "mutateProb": 0.10,
    "nGeneration": 100,
    "nBits": 10,
    "xRange": 5.12,
    "sigma": 2
}

best2 = real_ga( param2)
tmp = '[' + ', '.join('%.3f' % f for f in best2)[:-2] + ']'
print('%.4f'%param2['f'](best2), tmp)

('0.9012', '[0.415, -0.139, 0.831, 0.426, 0.638, 0.034, 0.839, 0.513, 0.628, 0.4]')
