# Genetic algorithms

## What are genetic algorithms?

A genetic algorithm (GA) is a metaheuristic inspired by the process of natural selection. It belongs to the larger class of evolutionary algorithms (EA) and relies on natural selection to approximate one or more solutions to a given problem. As a metaheuristic, it may not find the optimal solution, but it is usually fast to execute.

GAs are very flexible: they are used for both discrete and continuous problems, and are frequently applied to combinatorial problems.

A GA starts with a population of possible solutions. Each solution has a genome that encodes it, often as a binary string that distinguishes the solution from the others. In that case each chromosome is either a 1 or a 0: in general, a 1 means that the corresponding characteristic of the solution is present (for example, an item is selected) and a 0 means that it is absent.

The set of all solutions at a given point is called a generation. The first generation of solutions is Generation 0. It is made of random solutions, which are not expected to be close to optimal.

Once a generation is set, natural selection is applied: the fitness of every solution in the current generation is evaluated, then parents are picked. The evaluation uses the fitness function, which tells how good a solution is. During parent selection, a higher fitness score increases a solution's chance of being picked.
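
As a minimal sketch of this fitness-proportionate ("roulette wheel") selection, assuming non-negative fitness scores: each solution is picked with a probability equal to its share of the total fitness. The names `selection_probabilities` and `pick_parents` are illustrative only.

```python
import random
from typing import List, Sequence, Tuple

def selection_probabilities(scores: Sequence[float]) -> List[float]:
    """Probability of picking each solution, proportional to its fitness."""
    total = sum(scores)
    if total == 0:
        # No solution is better than another: pick uniformly.
        return [1 / len(scores)] * len(scores)
    return [score / total for score in scores]

def pick_parents(population: Sequence[str], scores: Sequence[float],
                 rng: random.Random) -> Tuple[str, str]:
    """Draw two parents (with replacement) using the probabilities above."""
    weights = selection_probabilities(scores)
    first, second = rng.choices(population, weights=weights, k=2)
    return first, second

population = ["A", "B", "C", "D"]
scores = [10, 30, 60, 0]
print(selection_probabilities(scores))  # [0.1, 0.3, 0.6, 0.0]
print(pick_parents(population, scores, random.Random(0)))
```

A solution with a fitness of 0 (here `"D"`) is never picked, while `"C"` is picked about 60% of the time.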

Once parents are picked, the next step is to cross over their genomes. A crossover function has to be defined for that, and there are many approaches, which will be covered below. Crossing over the parents' genomes creates new genomes that can be interpreted as children, which form part of the new generation of solutions.
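
To make the idea concrete, here is a small sketch of one common approach, single-point crossover, with the cut position passed explicitly so the result is easy to follow (in a real run the position would be drawn at random). `crossover_at` is a hypothetical helper name.

```python
from typing import List, Tuple

def crossover_at(a: List[int], b: List[int], point: int) -> Tuple[List[int], List[int]]:
    """Swap the tails of two equal-length genomes after position `point`."""
    if len(a) != len(b):
        raise ValueError("both genomes must have the same length")
    return a[:point] + b[point:], b[:point] + a[point:]

parent_a = [1, 1, 1, 1, 1, 1]
parent_b = [0, 0, 0, 0, 0, 0]
child_a, child_b = crossover_at(parent_a, parent_b, point=2)
print(child_a)  # [1, 1, 0, 0, 0, 0]
print(child_b)  # [0, 0, 1, 1, 1, 1]
```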

Child creation is repeated until the new generation is full.

However, since selection and crossover are partly random, there is a risk of losing the best solutions of the previous generation. To avoid that, we implement an elitism mechanism that keeps the best solutions of the previous generation and copies them into the next one.
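
A possible sketch of elitism, assuming the fitness is simply the number of 1s in the genome and that two elites are kept (both are arbitrary choices for the example; `with_elitism` is a hypothetical name):

```python
from typing import Callable, List

Genome = List[int]

def with_elitism(previous: List[Genome], children: List[Genome],
                 fitness: Callable[[Genome], float], elite_count: int = 2) -> List[Genome]:
    """Keep the `elite_count` best genomes of `previous`, fill the rest with children."""
    ranked = sorted(previous, key=fitness, reverse=True)
    # Copy the elites so that a later in-place mutation cannot alter them.
    elites = [genome[:] for genome in ranked[:elite_count]]
    return elites + children[:len(previous) - elite_count]

previous = [[1, 1, 1], [0, 0, 0], [1, 1, 0], [1, 0, 0]]
children = [[0, 0, 1], [0, 1, 0], [0, 0, 0], [1, 0, 0]]
new_generation = with_elitism(previous, children, fitness=sum, elite_count=2)
print(new_generation)  # [[1, 1, 1], [1, 1, 0], [0, 0, 1], [0, 1, 0]]
```

With this mechanism the best fitness of a generation can never be lower than the best fitness of the previous one.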

The next step is mutation: during this phase, we mutate a few chromosomes by flipping random bits with a certain probability (0 -> 1 or 1 -> 0). After this phase, the new generation is finally ready.
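
One way to sketch this step is a per-bit mutation, where every bit is flipped independently with a small probability. The 10% rate and the function name `bit_flip_mutation` are assumptions made for the illustration, not recommended settings.

```python
import random
from typing import List

def bit_flip_mutation(genome: List[int], probability: float, rng: random.Random) -> List[int]:
    """Return a copy of `genome` where each bit is flipped with the given probability."""
    return [1 - bit if rng.random() < probability else bit for bit in genome]

rng = random.Random(42)
genome = [0, 1, 1, 0, 1, 0, 0, 1]
mutated = bit_flip_mutation(genome, probability=0.1, rng=rng)
flipped = sum(1 for old, new in zip(genome, mutated) if old != new)
print(mutated, f"({flipped} bit(s) flipped)")
```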

This process repeats until a sufficiently good solution is found or a given number of generations is reached.

If we take a step back, these are the main concepts of the algorithm:
- a genetic representation of solutions (genomes made of chromosomes)
- a function to generate new solutions
- a fitness function, evaluating the quality of solutions
- a selection function, selecting solutions within a population
- a crossover function, making children out of the parents' genomes
- a mutation function, altering the children's genomes

The encoding may differ from one problem to another: sometimes bits are not efficient, and lists of integers, lists of floats, or even graphs are better suited to the problem. The same is true for the crossover process, which has to match the encoding.

---

## Uses of genetic algorithms

1. **Optimization Problems**: Genetic algorithms (GAs) are broadly used to find solutions to optimization and search problems.
   
2. **Machine Learning**:
   - **Feature Selection**: GAs can be employed to identify the most relevant set of features in a dataset.
   - **Neural Network Training**: They can be used for optimizing the weights in artificial neural networks.

3. **Game Playing**:
   - **Strategy Optimization**: GAs can help in evolving strategies to optimize game playing.

4. **Financial Forecasting**: To predict stock market trends or optimize trading strategies.

5. **Job Scheduling**: To optimize the scheduling of jobs in manufacturing or computational environments.

6. **Traveling Salesman Problem**: Finding the shortest possible route that visits a set of points and returns to the origin.

7. **Medical Diagnosis**: For optimizing the parameters in prediction models.

8. **Pharmaceuticals**: In the design of new drugs by finding optimal molecular structures.

9. **Aerospace**: Optimizing design parameters for aircraft or spacecraft.

10. **Automobile Design**: For optimizing parameters like fuel efficiency and aerodynamics.

11. **Telecommunications**: Optimizing the design of complex networks and systems.

12. **Image Processing**: Using GAs to optimize filters for noise removal or feature extraction.

13. **Circuit Design**: To find optimal or near-optimal designs for electronic circuits.

14. **Economic Modeling**: GAs can be employed to simulate and predict complex economic systems.

15. **Ecological Modeling**: Predicting the behavior of ecological systems under various conditions.

16. **Biological Evolution Simulation**: GAs can be used to simulate the process of natural evolution.

17. **Wind Farm Optimization**: Positioning of turbines in a wind farm to maximize power generation.

18. **Water Distribution Network Design**: Optimal design of water distribution networks to ensure efficient delivery and reduce costs.

19. **Traffic Light Timing**: Optimization of traffic light timings to minimize traffic congestion.

20. **Robotics**: For teaching robots how to move, adapt, or even optimize their structures for specific tasks.

21. **Power Systems**: In optimizing the operation and planning of electrical power systems.

The versatility and applicability of genetic algorithms make them suitable for a broad range of problems.

---

## Advantages

1. **Global Search**: Genetic algorithms (GAs) have the ability to search a wide solution space, making them suitable for global optimization problems where the solution landscape is not well-understood.

2. **Parallelism**: GAs inherently work on a population of solutions at once, allowing them to naturally take advantage of parallel processing capabilities.

3. **Adaptability**: They can be applied to a wide range of problems, both in optimization and search domains, without needing problem-specific knowledge.

4. **No Requirement for Gradient Information**: Unlike gradient-based optimization techniques, GAs do not require gradient information, making them suitable for non-differentiable, discontinuous, and noisy functions.

5. **Dynamic and Changing Environments**: GAs can be adaptive and can work in situations where the environment changes over time.

6. **Flexible Encoding of Solutions**: Solutions can be encoded in various ways (binary strings, real numbers, permutations, etc.), making GAs versatile.
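
As an illustration of the parallelism point (item 2), the fitness of each solution can be computed independently of the others, so the evaluation of a whole population can be spread over several workers. The sketch below uses `ThreadPoolExecutor` from the standard library to stay self-contained; for CPU-bound fitness functions in CPython, a process pool is usually the better fit. `evaluate_population` and `count_ones` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

Genome = List[int]

def evaluate_population(population: List[Genome],
                        fitness: Callable[[Genome], int],
                        workers: int = 4) -> List[int]:
    """Score every genome concurrently; results keep the population's order."""
    with ThreadPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(fitness, population))

def count_ones(genome: Genome) -> int:
    # Stand-in for an expensive fitness function (hypothetical).
    return sum(genome)

population = [[1, 0, 1], [0, 0, 0], [1, 1, 1]]
print(evaluate_population(population, count_ones))  # [2, 0, 3]
```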

---

## Drawbacks

1. **No Guarantee of Optimal Solution**: GAs can converge to a suboptimal solution rather than the global optimum.

2. **Slow Convergence**: In some cases, especially when the solution space is vast, GAs might take a long time to converge to an acceptable solution.

3. **Parameter Setting**: Choosing appropriate parameters like mutation rate, crossover rate, and population size can be challenging and might require experimentation.

4. **Risk of Premature Convergence**: GAs can sometimes converge prematurely to a suboptimal solution, especially if there's not enough genetic diversity in the population.

5. **Computationally Intensive**: Due to the nature of evolutionary processes, GAs might require a significant amount of computational resources for complex problems. If evaluating the fitness of one solution is slow, the whole run will be slow too, since every solution of every generation has to be evaluated.

6. **Overfitting**: Especially in applications like machine learning, if not managed properly, GAs might lead to solutions that are overfit to the training data.

7. **Complexity**: Implementing a GA might be more complex than simpler optimization techniques for certain problems.
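
The risk of premature convergence (item 4) can be monitored by tracking the genetic diversity of the population. A simple sketch for binary genomes, assuming the mean pairwise Hamming distance is an acceptable diversity measure (other measures exist, and the function names are illustrative):

```python
from itertools import combinations
from typing import List

def hamming(a: List[int], b: List[int]) -> int:
    """Number of positions at which two equal-length genomes differ."""
    return sum(x != y for x, y in zip(a, b))

def mean_pairwise_distance(population: List[List[int]]) -> float:
    """Average Hamming distance over all pairs; 0.0 means every genome is identical."""
    pairs = list(combinations(population, 2))
    if not pairs:
        return 0.0
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)

diverse = [[0, 0, 0, 0], [1, 1, 0, 0], [0, 1, 1, 0]]
converged = [[1, 1, 0, 1], [1, 1, 0, 1], [1, 1, 0, 1]]
print(mean_pairwise_distance(diverse))    # 2.0
print(mean_pairwise_distance(converged))  # 0.0
```

A value that drops towards 0 in early generations suggests the population has lost its diversity; raising the mutation rate or the population size are typical things to try.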

---

## Some encoding methods

### Binary Encoding

The most common encoding method. Chromosomes are strings of 1s and 0s, and each position in the chromosome represents a particular characteristic of the solution.
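
For instance, in a knapsack-style problem each position can stand for one item, with 1 meaning "taken". A minimal decoding sketch (the item list and the `decode` helper are chosen for the example):

```python
from typing import List, Sequence

def decode(genome: Sequence[int], items: Sequence[str]) -> List[str]:
    """Return the items whose position holds a 1 in the genome."""
    if len(genome) != len(items):
        raise ValueError("genome and items must have the same length")
    return [item for bit, item in zip(genome, items) if bit == 1]

items = ["Laptop", "Headphones", "Coffee Mug", "Notepad"]
print(decode([1, 0, 0, 1], items))  # ['Laptop', 'Notepad']
```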

### Permutation Encoding

Useful for ordering problems such as the Travelling Salesman Problem (TSP). In TSP, every chromosome is a string of numbers, each of which represents a city to be visited.
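
A small sketch of what this looks like in practice, assuming four cities with made-up coordinates. Because every city must appear exactly once, operators have to preserve the permutation; a swap mutation is one simple operator that does. All names here are illustrative.

```python
import math
import random
from typing import Dict, List, Tuple

# Hypothetical city coordinates, for illustration only.
cities: Dict[int, Tuple[float, float]] = {0: (0, 0), 1: (3, 0), 2: (3, 4), 3: (0, 4)}

def tour_length(tour: List[int]) -> float:
    """Length of the closed tour that visits the cities in order and returns to the start."""
    return sum(
        math.dist(cities[tour[i]], cities[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )

def swap_mutation(tour: List[int], rng: random.Random) -> List[int]:
    """Swap two positions: the result is still a valid permutation."""
    i, j = rng.sample(range(len(tour)), 2)
    mutated = tour[:]
    mutated[i], mutated[j] = mutated[j], mutated[i]
    return mutated

print(tour_length([0, 1, 2, 3]))  # 14.0
print(tour_length([0, 2, 1, 3]))  # 18.0
print(swap_mutation([0, 1, 2, 3], random.Random(0)))
```

Since a shorter tour is better, the fitness would typically be derived from the inverse or the negative of `tour_length`.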

### Value Encoding

Used in problems where complicated values, such as real numbers, are needed and where binary encoding would not suffice. It works well for some problems, but it is often necessary to develop specific crossover and mutation techniques for these chromosomes. It is most suitable for optimization in a continuous search space.
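
As an example of such specific operators, here is a sketch for genomes made of real numbers: an arithmetic crossover (weighted average of the parents) and a Gaussian mutation (small random noise on each gene). These are two common choices among many; the names and the `alpha` and `sigma` values are assumptions for the illustration.

```python
import random
from typing import List

def arithmetic_crossover(a: List[float], b: List[float], alpha: float = 0.5) -> List[float]:
    """Child gene = weighted average of the parents' genes."""
    return [alpha * x + (1 - alpha) * y for x, y in zip(a, b)]

def gaussian_mutation(genome: List[float], sigma: float, rng: random.Random) -> List[float]:
    """Add zero-mean Gaussian noise with standard deviation `sigma` to every gene."""
    return [gene + rng.gauss(0.0, sigma) for gene in genome]

parent_a = [1.0, 4.0, -2.0]
parent_b = [3.0, 0.0, 2.0]
child = arithmetic_crossover(parent_a, parent_b)
print(child)  # [2.0, 2.0, 0.0]
print(gaussian_mutation(child, sigma=0.1, rng=random.Random(1)))
```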

## Some crossover methods

...

## Python demonstration (knapsack problem)

```python
import random
import time
from collections import namedtuple
from functools import partial
from random import choices, randint, randrange
from typing import Callable, List, Sequence, Tuple

Genome = List[int]
Population = List[Genome]
FitnessFunc = Callable[[Genome], int]
PopulateFunc = Callable[[], Population]
SelectionFunc = Callable[[Population, FitnessFunc], Tuple[Genome, Genome]]
MutationFunc = Callable[[Genome], Genome]
CrossoverFunc = Callable[[Genome, Genome], Tuple[Genome, Genome]]

# Each item has a name, a value to maximize and a weight that counts against the limit.
Thing = namedtuple('Thing', ['name', 'value', 'weight'])

things = (
    Thing('Laptop', 500, 2200),
    Thing('Headphones', 150, 160),
    Thing('Coffee Mug', 60, 350),
    Thing('Notepad', 40, 333),
    Thing('Water Bottle', 30, 192),
    Thing('Mints', 5, 25),
    Thing('Socks', 10, 38),
    Thing('Tissues', 15, 80),
    Thing('Phone', 500, 200),
    Thing('Baseball Cap', 100, 70)
)

# more_things = (
#     Thing('Mints', 5, 25),
#     Thing('Socks', 10, 38),
#     Thing('Tissues', 15, 80),
#     Thing('Phone', 500, 200),
#     Thing('Baseball Cap', 100, 70)
# ) + things


def generate_genome(length: int) -> Genome:
    # Random bit string: 1 means the item at that position is packed.
    return choices([0, 1], k=length)


def generate_population(size: int, genome_length: int) -> Population:
    return [generate_genome(genome_length) for _ in range(size)]


def fitness(genome: Genome, things: Sequence[Thing], weight_limit: int) -> int:
    if len(genome) != len(things):
        raise ValueError('Fitness: genome should be the same length as things.')

    weight = 0
    value = 0

    for i, thing in enumerate(things):
        if genome[i] == 1:  # thing selected
            weight += thing.weight
            value += thing.value

        # Overweight solutions are invalid and score 0.
        if weight > weight_limit:
            return 0
    return value


def single_point_crossover(a: Genome, b: Genome) -> Tuple[Genome, Genome]:
    if len(a) != len(b):
        raise ValueError('Crossover: both genomes should be the same length.')

    length = len(a)

    if length < 2:
        return a, b

    # Cut both genomes at the same random point and swap the tails.
    p = randint(1, length - 1)

    return a[0:p] + b[p:], b[0:p] + a[p:]


def pair_selection(population: Population, fitness_func: FitnessFunc) -> Tuple[Genome, Genome]:
    # Fitness-proportionate selection of two parents (with replacement).
    weights = [fitness_func(genome) for genome in population]
    if sum(weights) == 0:
        # choices() rejects all-zero weights, so fall back to a uniform draw.
        weights = None
    first, second = choices(population=population, weights=weights, k=2)
    return first, second


def mutation(genome: Genome, num: int = 1, probability: float = 0.5) -> Genome:
    # Mutates in place: `num` times, pick a random position and flip it
    # with the given probability.
    for _ in range(num):
        index = randrange(len(genome))
        if random.random() <= probability:
            genome[index] = 1 - genome[index]
    return genome


def run_evolution(
        populate_func: PopulateFunc,
        fitness_func: FitnessFunc,
        fitness_limit: int,
        selection_func: SelectionFunc = pair_selection,
        crossover_func: CrossoverFunc = single_point_crossover,
        mutation_func: MutationFunc = mutation,
        generation_limit: int = 100
        ) -> Tuple[Population, int]:

    population = populate_func()
    i = 0  # stays defined even if generation_limit is 0

    for i in range(generation_limit):
        population = sorted(population, key=fitness_func, reverse=True)

        # Stop early once the best genome is good enough.
        if fitness_func(population[0]) >= fitness_limit:
            break

        # Elitism: carry the two best genomes over unchanged.
        next_generation = population[0:2]

        # Each iteration adds two children (the size is preserved for an even population).
        for _ in range(int(len(population) / 2) - 1):
            parents = selection_func(population, fitness_func)
            offspring_a, offspring_b = crossover_func(parents[0], parents[1])
            offspring_a = mutation_func(offspring_a)
            offspring_b = mutation_func(offspring_b)
            next_generation += [offspring_a, offspring_b]

        population = next_generation

    population = sorted(population, key=fitness_func, reverse=True)
    return population, i


# Main
start = time.time()
population, generations = run_evolution(
    populate_func=partial(generate_population, size=10, genome_length=len(things)),
    fitness_func=partial(fitness, things=things, weight_limit=3000),
    fitness_limit=740,
    generation_limit=1000,
)
end = time.time()


# Result display
def genome_to_things(genome: Genome, things: Sequence[Thing]) -> List[str]:
    result = []
    for i, thing in enumerate(things):
        if genome[i] == 1:
            result += [thing.name]
    return result


print(f"number of generations: {generations}")
print(f"time: {end - start}s")
print(f"best solution: {genome_to_things(population[0], things)}")
```

Output of one run (results vary between runs, since the initial population is random):

```text
number of generations: 0
time: 0.0001761913299560547s
best solution: ['Laptop', 'Headphones', 'Notepad', 'Mints', 'Socks', 'Phone']
```


## Sources

- https://en.wikipedia.org/wiki/Genetic_algorithm
- https://www.geeksforgeeks.org/encoding-methods-in-genetic-algorithm/
- https://medium.com/geekculture/encoding-techniques-in-genetic-algorithm-371bccbe4bf7
- https://www.youtube.com/watch?v=nhT56blfRpE for the great code