# Genetic Algorithms

A genetic algorithm is a stochastic search algorithm that is inspired by Charles Darwin’s theory of natural evolution. This algorithm reflects the process of natural selection where the fittest individuals are selected for reproduction in order to produce offspring of the next generation. 

Genetic algorithms work by randomly initializing a population with a function that can evaluate each member of the population's fitness (how "good" they are, smaller is better).
We then evolve the popupulation over many generations by
1) Measuring the fitness of all solutions choosing the most fit individuals as parents <br>
2) Mixing the parents together to to create new offspring (solutions) <br>
3) Mutating some of the offspring <br>
 
We repeat these steps until our stop criteria are met


In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [2]:
%matplotlib inline

### Read measured data from an Excel file

In [3]:
xlsx = pd.ExcelFile('European Measurements.xlsx')
sheet1 = xlsx.parse(3) # Note the other datasets in the file.
variables = ["Month","Avg. Likes"]
data = sheet1.loc[:, variables].values
# print(data)

### Simple GA mechanism

In [4]:
def fitness_function(data, pop):
    '''
    measure the fitness of each solution
    data: given data points
    pop: current population of solutions
    '''
    fitness = []
    for solution in range(len(pop)):
        for event in range(len(data)):
            error = 0;
            event_time = data[event][0]
            event_measured = data[event][1]
            event_expected = pop[solution][0]*event_time**3 + pop[solution][1]*event_time**2 + pop[solution][2]*event_time + pop[solution][3]
            error +=(event_expected - event_measured)**2
        fitness.append(1/error) # We use 1/error in order to use a maximization mechanism, while we want to minimize the error
    return fitness

def biased_selection(pop, fitness, num_parents):
    '''
    select the top num_parents of solutions according to their fitness
    pop: current population of solutions
    fitness: fitness score for each solution
    num_parents: top number of solutions to kept
    '''
    sorted_fitness_args = np.argsort(fitness)
    return pop[sorted_fitness_args[-num_parents:],:]

def recombination(parents, offspring_size):
    '''
    mix the parent solutions to create a new set of solutions
    parents: previous set of solutions to be used to make new generation of solutions
    offspring_size: amount of new soltions to create from parents
    '''
    offspring = np.empty(offspring_size)
    recombination_point = np.uint8(offspring_size[1]/2)
    for k in range(offspring_size[0]):
        parent1_idx = k%parents.shape[0]
        parent2_idx = (k+1)%parents.shape[0]
        offspring[k, 0:recombination_point] = parents[parent1_idx, 0:recombination_point]
        offspring[k, recombination_point:] = parents[parent2_idx, recombination_point:]
    return offspring

def mutation(offspring_recombination):
    '''
    for each solution we mutate a single number from the overall solution
    offspring_recombination: new generation of solutions to mutate
    '''
    for idx in range(offspring_recombination.shape[0]):
#         if np.random.randint(1, 4, 1)[0] < 1:
        random_value = np.random.randint(-100, 100, 1)
        random_index = np.random.randint(0, offspring_recombination.shape[1], 1)
        offspring_recombination[idx, random_index] = offspring_recombination[idx, random_index] + random_value
    return offspring_recombination


### GA application for fitting time series data
We try to optimize our solutions to the fitness function over many generations of solutions 

We are given that a good solution to our data is:  a=4.8, b=12.1, c=53.2 and d=6219.
We are using our fitness function to try and find other solutions with a very low fitness (lower is better).

In [None]:
# GA Parameters
formula_degree = 4
number_of_solutions = 1000
number_of_parents = 2000
population_size = (number_of_solutions, formula_degree) 
number_of_generations = 10
best_outputs = []

# Genesis
new_population = np.random.randint(low=0, high=5000, size=population_size)
print("The population of the first generation: ")
print(new_population)

# Evolution
print ("\nEvolution:")
for generation in range(number_of_generations):

    fitness = fitness_function(data, new_population)
    print("Generation = ", generation, "\tBest fitness = ", round(1/np.max(fitness),5))
    best_outputs.append(round(1/np.max(fitness),5))
    parents = biased_selection(new_population, fitness, number_of_parents)
    offspring_recombination = recombination(parents, offspring_size=(population_size[0]-parents.shape[0], formula_degree))
    offspring_mutation = mutation(offspring_recombination)
    new_population[0:parents.shape[0], :] = parents
    new_population[parents.shape[0]:, :] = offspring_mutation

# Results
print("\nThe population of the last generation: ")
print(new_population)
fitness = fitness_function(data, new_population)
best_match_idx = np.where(fitness == np.max(fitness))
print("Best solution: ", new_population[best_match_idx, :])

# Chart
plt.plot(best_outputs)
plt.xlabel("Generation")
plt.ylabel("Best Fitness Score")
plt.show()

The population of the first generation: 
[[1149  531 3866 1468]
 [3623 4930 4674 1801]
 [4246 2890 3254 1432]
 ...
 [3717  503 1352 4745]
 [4094  283  871 2846]
 [1420 1400 2478  935]]

Evolution:
Generation =  0 	Best fitness =  93972062.67646
Generation =  1 	Best fitness =  70641186.65473
Generation =  2 	Best fitness =  149928.60235


#### Knowning that a good answer to the problem is: a=4.8, b=12.1, c=53.2 and d=6219 lets us create an fitness score so we can  look for other possible solutions.