# Genetic Algorithm

The genetic algorithm emulates evolution by "breeding" new solutions from previous ones and applying random mutation. The likelihood that a solution "survives" depends on its "fitness" value, as defined by a "fitness function". In this notebook the fitness is an error measure, so lower values are better.

## Problem to solve

Let's try using it to solve a simple system of two linear equations:

* x − y = −1
* 3x + y = 9

**(The real solution is x=2, y=3)**
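As a quick sanity check, the system can also be solved directly. The sketch below uses Cramer's rule for the 2x2 case; the helper name `solve_2x2` is an illustration and not part of the original notebook.

```python
def solve_2x2(a1, b1, c1, a2, b2, c2):
    """Solve a1*x + b1*y = c1 and a2*x + b2*y = c2 using Cramer's rule."""
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("system has no unique solution")
    x = (c1 * b2 - c2 * b1) / det
    y = (a1 * c2 - a2 * c1) / det
    return x, y

# x - y = -1 and 3x + y = 9
exact_x, exact_y = solve_2x2(1, -1, -1, 3, 1, 9)
print(exact_x, exact_y)
```

This gives a reference point for judging how close the search methods get.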

## Generic Genetic Algorithm

First, let's write a basic genetic algorithm.

Each "individual" will be a list in this form: [fitness, val1, val2, ...]
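To make the representation concrete, here is a small sketch. It assumes, as the algorithm in this notebook does, that a fitness of -1 marks an individual that has not been evaluated yet. `make_individual` is a hypothetical helper used only for illustration.

```python
def make_individual(*values):
    """Build an individual with a placeholder fitness of -1 (not yet evaluated)."""
    return [-1, *values]

evaluated = [[4.0, 1, 1], [0.5, 2, 3.5], [90.0, 10, 10]]
# Python compares lists element by element, so sorting orders by fitness first
evaluated.sort()
best = evaluated[0]

# An unevaluated individual sorts ahead of every evaluated one
mixed = evaluated + [make_individual(0, 0)]
mixed.sort()
first = mixed[0]
print(best, first)
```

Putting the fitness in the first slot is what lets a plain `sort()` rank the population, with the lowest error at index 0.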

In [68]:
import random

def generate_couples(population):
    couples = []
    for i in range(0, len(population)-1, 2):
        c = (population[i], population[i+1])
        couples.append(c)
    return couples

def genetic_algorithm(fit_func, cross_func, mutate_func, init_pop, max_iter=1000,
                      max_pop=100, quit_at_err=0.01, mut_prob=0.1, mut_mu=0, mut_sigma=1):
    population = init_pop
    num_solutions_considered = 0
    top_dog = init_pop[0]
    
    for i in range(max_iter):        
        # Calculate fitness function for each individual
        for individual in population:
            fitness = individual[0]
            if fitness < 0:  # Meaning it has not been calculated, since fitness is always positive or 0
                fitness = fit_func(individual)
                individual[0] = fitness

        # Sort population by fitness
        population.sort()
        top_dog = population[0]
        num_solutions_considered += len(population)
        if top_dog[0] < quit_at_err:
            print('Generations: {}, Solutions considered: {}'.format(i, num_solutions_considered))
            return top_dog
        
        # Make couples
        couples = generate_couples(population)
        
        # Mate
        babies = [mutate_func(cross_func(couple[0], couple[1]), mut_prob, mut_mu, mut_sigma) for couple in couples]
        population += babies
        
        # Sort and cull (unevaluated babies have fitness -1, so they sort first and survive the cull)
        population.sort()
        population = population[:max_pop]
        
        #print('Iteration {}, Population: {}'.format(i, len(population)))

    print('Generations: {}, Solutions considered: {}'.format(i, num_solutions_considered))
    return top_dog


## Write problem-specific functions

In [51]:
def fitness_function(individual):
    fitness, x, y = individual
    # We square the error so it's always positive
    eq1_error = ((x - y) - (-1))**2
    eq2_error = ((3*x + y) - 9)**2
    return eq1_error + eq2_error

def have_sex(a, b):
    x = a[1]
    y = b[2]
    return [-1, x, y]

def mutate(a, mutation_probability, mu, sigma):
    mutant = [-1]
    for var in a[1:]:
        if random.random() <= mutation_probability:
            new_var = var + random.gauss(mu, sigma)
            mutant.append(new_var)
        else:
            mutant.append(var)
    return mutant

initial_population = [[-1, 0, 0], [-1, 10, 10]]     
most_fit_solution = genetic_algorithm(fitness_function, have_sex, mutate, initial_population)
most_fit_solution

Iterations: 39, # solutions considered: 3183


[0.0003454095730006291, 1.995886744141802, 3.01436163580238]

**That's a pretty close solution after only 39 generations (3,183 solutions considered).**
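To see how the first element of the result relates to the other two, the reported individual can be plugged back into the equations. This is only a check on the numbers printed above; the variable names are chosen for illustration.

```python
# The individual printed above, in the form [fitness, x, y]
reported = [0.0003454095730006291, 1.995886744141802, 3.01436163580238]
fitness, x, y = reported

residual_1 = (x - y) - (-1)    # how far x - y is from -1
residual_2 = (3 * x + y) - 9   # how far 3x + y is from 9
recomputed = residual_1**2 + residual_2**2
print(residual_1, residual_2, recomputed)
```

The recomputed sum of squared residuals agrees with the stored fitness, and both residuals are small compared with the right-hand sides of the equations.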

Now, let's compare it with two types of random search:

* Random walk
* Uniform random search (in a constrained window)

## Comparison with random walk solution

In [65]:
def random_walk_next_solution(prev_solution, mu, sigma):
    return [prev_solution[0] + random.gauss(mu, sigma), prev_solution[1] + random.gauss(mu, sigma)]

def error_function(individual):
    x, y = individual
    # We square the error so it's always positive
    eq1_error = ((x - y) - (-1))**2
    eq2_error = ((3*x + y) - 9)**2
    return eq1_error + eq2_error

def do_random_walk_search(mu, sigma, init_guess, max_iter=100*10000+1, quit_at_err=0.01):
    current_solution = init_guess
    
    for i in range(max_iter):
        current_solution = random_walk_next_solution(current_solution, mu, sigma)
        current_solution_err = error_function(current_solution)
        if current_solution_err < quit_at_err:
            print('Solutions considered: {}, Error: {}'.format(i, current_solution_err))
            break
        if i % 10000 == 0:
            print('Iteration: {}, Error: {}'.format(i, current_solution_err))
            
    return current_solution

do_random_walk_search(0, 0.5, [0, 0])

Iteration: 0, Error: 58.197997727493785
Iteration: 10000, Error: 44847.096572963936
Iteration: 20000, Error: 26046.67741968994
Iteration: 30000, Error: 188915.547005843
Iteration: 40000, Error: 179066.1641331165
Iteration: 50000, Error: 98354.59650722975
Iteration: 60000, Error: 8099.110689885708
Iteration: 70000, Error: 22889.705969310067
Iteration: 80000, Error: 78947.95786648008
Iteration: 90000, Error: 81609.08510716847
Iteration: 100000, Error: 88042.48735877528
Iteration: 110000, Error: 127779.6609447398
Iteration: 120000, Error: 132049.4104383309
Iteration: 130000, Error: 30637.002742822388
Iteration: 140000, Error: 3029.9929598023473
Iteration: 150000, Error: 7267.082706040252
Iteration: 160000, Error: 12265.728575374562
Iteration: 170000, Error: 51693.022768068826
Iteration: 180000, Error: 54654.50564500899
Iteration: 190000, Error: 214919.2574790537
Iteration: 200000, Error: 233639.1235252252
Iteration: 210000, Error: 487713.4435635344
Iteration: 220000, Error: 308793.3769138

[-215.50760466522127, 754.9920021071864]

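That is a poor solution with a very large error. The walk accepts every step whether or not it reduces the error, so it drifts away from the solution instead of converging on it.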
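One way to see why the walk ends up so far away: each coordinate is a sum of n independent Gaussian steps, so its variance is n times sigma squared, and the root-mean-square distance from the starting point in two dimensions is sigma times the square root of 2n. A minimal sketch, assuming zero-mean steps (the function name `rms_distance` is an assumption for illustration):

```python
import math

def rms_distance(n_steps, sigma):
    """Root-mean-square distance from the start after n_steps of a 2D Gaussian walk.

    Assumes zero-mean, independent steps with standard deviation sigma per coordinate.
    """
    return sigma * math.sqrt(2 * n_steps)

for n in (10_000, 100_000, 220_000):
    print(n, round(rms_distance(n, 0.5), 1))
```

With sigma = 0.5 and 220,000 steps, the typical distance is roughly 330 units, the same order of magnitude as the final position printed above, while the solution sits only a few units from the starting point.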

## Comparison with random (non-walk) solution

In [66]:
def random_next_solution(x_min, x_max, y_min, y_max):
    return [random.uniform(x_min, x_max), random.uniform(y_min, y_max)]

def do_random_search(x_min, x_max, y_min, y_max, max_iter=100*10000+1, quit_at_err=0.01):
    for i in range(max_iter):
        # Note that with this solution, we are artificially limiting the search window
        current_solution = random_next_solution(x_min, x_max, y_min, y_max)
        current_solution_err = error_function(current_solution)
        if current_solution_err < quit_at_err:
            print('Iterations: {}, Error: {}'.format(i, current_solution_err))
            break
        if i % 10000 == 0:
            print('Iteration: {}, Error: {}'.format(i, current_solution_err))

    return current_solution

do_random_search(-20, 20, -20, 20)

Iteration: 0, Error: 308.3127928505571
Iteration: 10000, Error: 4727.309365030706
Iteration: 20000, Error: 2702.1996692573516
Iteration: 30000, Error: 295.37602816058853
Iteration: 40000, Error: 2379.093086548179
Iteration: 50000, Error: 612.9270618824643
Iteration: 60000, Error: 5197.901755236507
Iteration: 70000, Error: 3201.8070554313686
Iteration: 80000, Error: 265.6679530549633
Iteration: 90000, Error: 2466.7681125102044
Iteration: 100000, Error: 602.9843901795197
Iteration: 110000, Error: 50.16423143387052
Iteration: 120000, Error: 450.2998350146081
Iteration: 130000, Error: 4908.17917978217
Iteration: 140000, Error: 153.68311282520978
Iteration: 150000, Error: 3180.767818636309
Iterations: 151234, Error: 0.006872960595740857


[1.970702724699727, 3.0275199352116857]

So it looks like the random uniform search is decent (though nowhere near as efficient as the Genetic Algorithm). BUT, we had to limit the search window for x and y to (-20, 20). The Genetic Algorithm is really nice because no such limitation of the window is necessary. With some problems, we don't have such a good idea of where the solution (or a good-enough solution) lies.

Now, look how poorly the random search algorithm behaves when we expand the window to (-100, 100):

In [67]:
do_random_search(-100, 100, -100, 100)

Iteration: 0, Error: 87498.269900416
Iteration: 10000, Error: 19729.641085476345
Iteration: 20000, Error: 12088.16015937448
Iteration: 30000, Error: 70371.69304549509
Iteration: 40000, Error: 101228.17845615717
Iteration: 50000, Error: 3752.077746472049
Iteration: 60000, Error: 5678.870652708074
Iteration: 70000, Error: 13796.72050359031
Iteration: 80000, Error: 75143.37747065257
Iteration: 90000, Error: 45542.307427910455
Iteration: 100000, Error: 68761.573982296
Iteration: 110000, Error: 103085.07660772622
Iteration: 120000, Error: 3974.410160526244
Iteration: 130000, Error: 45856.8843360887
Iteration: 140000, Error: 82085.17905786278
Iteration: 150000, Error: 55998.276219948355
Iteration: 160000, Error: 13533.890847679402
Iteration: 170000, Error: 7714.684011843349
Iteration: 180000, Error: 86523.04016051191
Iteration: 190000, Error: 16543.62379364362
Iteration: 200000, Error: 16387.267717172082
Iteration: 210000, Error: 35095.89645941437
Iteration: 220000, Error: 21500.344783772187

[-85.14712014026289, -30.728240396272312]

So when we expand the search space, the results are much worse for the uniform random search.
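A rough estimate makes the difference concrete. The points with error below 0.01 form a small ellipse around (2, 3). Writing u = x - y + 1 and v = 3x + y - 9, the error is u² + v², and the map from (x, y) to (u, v) has determinant 4, so the ellipse has area π · 0.01 / 4. Dividing the window area by that gives the expected number of uniform samples before a hit. The helper `expected_samples` below is a sketch under these assumptions, not part of the original notebook.

```python
import math

def expected_samples(window_width, quit_at_err=0.01):
    """Expected number of uniform samples before one falls below the error threshold.

    Assumes a square window of the given width that fully contains the acceptance region.
    """
    # error = u**2 + v**2 with u = x - y + 1 and v = 3x + y - 9;
    # the (x, y) -> (u, v) map has determinant 4, so areas shrink by 4 going back
    target_area = math.pi * quit_at_err / 4
    return window_width**2 / target_area

print(round(expected_samples(40)))    # window (-20, 20)
print(round(expected_samples(200)))   # window (-100, 100)
```

Under these assumptions the (-20, 20) window needs about 200,000 samples on average, in line with the 151,234 observed, while the (-100, 100) window needs 25 times as many, about 5 million, which exceeds the default `max_iter`.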