# Representations
- **Fitness** - how well a candidate solution solves the problem. Defining it is usually one of the most difficult parts of modelling a problem.
A bad fitness function can nullify differential survival (the different chances of survival of the candidate solutions) and parent selection.
- **Genotype** - the internal representation of the candidate solution, which is directly manipulated by the genetic operators. It's usually a bit string, but it can also be a tree, a graph, a vector, etc. It is usually transformed into the phenotype in order to compute the fitness.
- **Phenotype** - the external representation of the candidate solution, which is directly evaluated by the fitness function. It's often the same as the genotype, but it can also be different. For example, in the 15 puzzle the genotype can be a bit string while the phenotype is a 4x4 grid; for an expression, we can have a *tree* as genotype and a *mathematical expression* as phenotype.

If the genotype can be directly evaluated by the fitness function, we have a **direct** representation: genotype and phenotype coincide.
When the genotype has to be mapped to a phenotype before evaluation, we have an **indirect** representation; in that case multiple genotypes can map to the same phenotype. Multiple phenotypes can also map to the same fitness value.

Importance of mappings:
Small changes at the genotype level should bring small changes in the fitness; this property is also called causality.
- **Mutation** -> a single parent generates a new individual through small changes. The procedure can be repeated multiple times to increase the difference between parent and child, and a parameter σ can control the amount of change (for example, repeating the mutation while `rand() < sigma`).
- **Recombination** -> two or more parents generate an offspring that must inherit **traits** (not only genes) from both parents.

In general we aim to have both mutation and recombination in our algorithm, but we can also have only mutation. We can't rely on recombination alone: it only reshuffles material already present in the population, which leads to a loss of diversity.
In general, mutation brings small changes in the population (exploitation), while recombination brings big changes (exploration).
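
One possible reading of the `rand() < sigma` rule mentioned above is sketched below: the child is tweaked once, and then tweaked again as long as a random draw falls below `sigma`. The names `mutate`, `tweak` and `flip_one_bit` are hypothetical, and the sketch assumes `sigma < 1` (otherwise the loop would never stop).

```python
import random


def flip_one_bit(genome, rng=random):
    """Example 'small change': flip one randomly chosen bit (hypothetical helper)."""
    i = rng.randrange(len(genome))
    return genome[:i] + [1 - genome[i]] + genome[i + 1:]


def mutate(parent, tweak, sigma, rng=random):
    """Apply `tweak` once, then keep applying it while rand() < sigma.

    Assumption: sigma is in [0, 1). A larger sigma means more tweaks on
    average, so the child drifts further from the parent.
    """
    child = tweak(parent)
    while rng.random() < sigma:
        child = tweak(child)
    return child


parent = [0, 0, 0, 0, 0, 0, 0, 0]
child = mutate(parent, flip_one_bit, sigma=0.3)
```

With `sigma = 0` the child always differs from the parent by exactly one tweak; values closer to 1 make longer chains of tweaks more likely.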

## Types of Representation
### 1 - Binary Representation
One of the simplest and earliest representations used by genetic algorithms. It is often used (incorrectly) for almost any kind of problem: virtually anything can be encoded in binary, but this is not always useful, because sometimes the encoding destroys the structure of the solution and gives bad results.
For example, when a number is represented in standard binary, flipping a single bit sometimes gives a small change and sometimes a huge one, and two consecutive numbers can be many bit-flips apart (variable Hamming distance).
A more reasonable representation for numbers is **Gray coding**, a binary representation where two consecutive numbers always differ by exactly one bit (Hamming distance of 1). This is useful because a small change in phenotype can always be reached through a small change in genotype, although a single bit-flip can still cause a large jump.
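
As an illustration, here is a minimal sketch of the standard reflected Gray code, compared with plain binary on the step from 7 to 8. The function names `to_gray`, `from_gray` and `hamming` are chosen for this example only.

```python
def to_gray(n: int) -> int:
    """Convert a non-negative integer to its reflected Gray code."""
    return n ^ (n >> 1)


def from_gray(g: int) -> int:
    """Invert to_gray by XOR-ing together all right shifts of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


# In plain binary, 7 (0111) and 8 (1000) differ in four bits ...
plain_distance = hamming(7, 8)
# ... while their Gray codes differ in a single bit.
gray_distance = hamming(to_gray(7), to_gray(8))
```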
#### Mutation for Binary Encoding
The most common mutation is a simple **bit-flip**, where each gene is considered separately and flipped with a (small) probability *p*. The number of genes mutated is not fixed: on average it is *L·p*, where *L* is the length of the genotype. Most of the time we'll use mutation rates that mutate, on average, about one or two genes per offspring.  
A possible implementation in Python builds a mask by drawing, for each gene, a random value uniformly distributed in [0, 1); if the value is below a certain threshold (the mutation rate), the corresponding gene is flipped.
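```python
import random

# genome: list of 0/1 ints; mutation_rate: per-gene flip probability
mask = [random.random() < mutation_rate for _ in range(len(genome))]
genome = [a ^ b for a, b in zip(genome, mask)]  # XOR flips the masked genes
```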
#### Recombination for Binary Encoding
There are three standard ways to perform a crossover on a binary genome (starting from two parents and generating two children):
- **One-cut crossover**: Select a random point in the genotype, then take the first part from one parent and the second part from the other. For the second child, we simply reverse the selection. This type of crossover is more likely to keep together genes that are close in the genotype and never keeps together genes that are at opposite ends of it.
- **N-cut crossover**: Pick N random points in the genotype and then take alternating segments from the two parents. This variant still has **positional bias**: it tends to keep together genes that are close to each other (also known as genetic *hitch-hiking*), and with odd N there is also a bias against keeping together genes at opposite ends of the genome.
- **Uniform crossover**: Treat each gene independently and randomly choose from which parent it should be inherited. This can be done by drawing, for each position, a random value uniformly distributed in [0, 1) and selecting the gene from the first parent if the value is below a certain threshold (usually 0.5), from the second parent otherwise. The second child is created using the inverse mapping. Unlike N-cut methods, there's no positional bias here; instead, the tendency is to transfer 50% of the genes from each parent to the child, which in many cases breaks up *coadapted genes* coming from the same parent. This is known as distributional bias.
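
The three operators above can be sketched as follows. This is a minimal illustration, assuming that genomes are Python lists of equal length (at least 2) and that `n` is smaller than the genome length; the function names are chosen for this example.

```python
import random


def one_cut_crossover(p1, p2, rng=random):
    """Split both parents at one random point and swap the tails."""
    cut = rng.randint(1, len(p1) - 1)
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]


def n_cut_crossover(p1, p2, n, rng=random):
    """Pick n distinct cut points and take alternating segments."""
    cuts = sorted(rng.sample(range(1, len(p1)), n))
    c1, c2 = [], []
    src1, src2 = p1, p2
    prev = 0
    for cut in cuts + [len(p1)]:
        c1 += src1[prev:cut]
        c2 += src2[prev:cut]
        src1, src2 = src2, src1  # switch parents after every cut
        prev = cut
    return c1, c2


def uniform_crossover(p1, p2, threshold=0.5, rng=random):
    """Choose the source parent independently for every gene."""
    c1, c2 = [], []
    for a, b in zip(p1, p2):
        if rng.random() < threshold:
            c1.append(a)
            c2.append(b)
        else:  # inverse mapping for the second child
            c1.append(b)
            c2.append(a)
    return c1, c2


mum, dad = [0] * 10, [1] * 10
kid_a, kid_b = n_cut_crossover(mum, dad, n=2)
```

Using an all-zeros and an all-ones parent, as in the last lines, makes it easy to see which segments each child inherited from which parent.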

### 2 - Integer Representation
Similar to binary representation, almost anything could be represented by integers, but it's better to use them only when they're really useful. The main use for integer representation is when each gene can take one of a (restricted or unrestricted) set of values.
We can classify integer representations by the set of allowed values and by the relations between those values:
- **Restricted set**: There is only a limited number of possible values that a gene can take, typically when it encodes a categorical attribute.
- **Unrestricted set**: Genes can take any integer value without any particular restriction. Usually associated with ordinal attributes.
- **Ordinal attributes**: There are natural relations between different values, usually the natural order, and the values usually form an unrestricted set (e.g. all positive integers).  
For example, 5 is less than 6 but more than 4, and 5 is more like 12 than like 3456.
- **Cardinal attributes**: There may be no natural order, and the genes usually take values from a finite set, for example the points of the compass {North, South, West, East}.
#### Mutation for Integer Encoding
There are two principal forms of random mutation used for integer representations:
- **Random Resetting** - In each position, independently with a certain probability *p*, a new value is chosen at random from the set of valid values. This is the most suitable method for cardinal attributes.
- **Creep Mutation** - Method used for ordinal attributes, where a small value is added to (or subtracted from) each gene with probability *p*. The value is usually sampled from a distribution that is symmetric about zero and biased towards small values. Note that this requires additional parameters to control the distribution, and it may be difficult to find an appropriate setting.  
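
A minimal sketch of both mutations, assuming genomes are lists of ints. For creep mutation the step is drawn from a rounded Gaussian with standard deviation `sigma`, which is only one possible choice of a distribution that is symmetric about zero and biased towards small values; `sigma`, `low` and `high` are parameters introduced for this example.

```python
import random


def random_resetting(genome, values, p, rng=random):
    """With probability p, replace a gene by a random valid value.

    Note: the new value may coincide with the old one.
    """
    return [rng.choice(values) if rng.random() < p else g for g in genome]


def creep_mutation(genome, p, sigma=1.0, low=None, high=None, rng=random):
    """With probability p, add a small signed integer step to a gene.

    Assumption: steps come from a rounded Gaussian N(0, sigma); the
    result is clamped to [low, high] when those bounds are given.
    """
    child = []
    for g in genome:
        if rng.random() < p:
            g += round(rng.gauss(0, sigma))
            if low is not None:
                g = max(low, g)
            if high is not None:
                g = min(high, g)
        child.append(g)
    return child


compass = [0, 1, 2, 3]  # e.g. North, East, South, West
path = random_resetting([0, 0, 1, 2, 3], compass, p=0.2)
sizes = creep_mutation([5, 12, 7], p=0.5, sigma=1.5, low=0, high=20)
```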

The possible **crossover methods** for integer representation are the same as for binary representation.

### 3 - Floating Point Representation

The floating point representation is useful when we need to represent values drawn from a continuous distribution rather than a discrete one. It consists of a string of real values (floating point numbers on our machine) representing, for example, the length and the width of some rectangles, allowing a much higher precision than integers. The genotype for a generic solution with *k* genes is now a vector of real values ⟨x<sub>1</sub>, ..., x<sub>k</sub>⟩.

#### Mutation for Floating Point
Ignoring the discretization imposed by the hardware, we treat this representation as continuous real values, so the discrete methods seen before are no longer valid. Instead, it is common to change each gene value randomly within its domain. There are three main transformations for mutating a real-valued genotype:
- **Uniform Mutation**: The new value of x<sub>i</sub> is drawn randomly from a uniform distribution over [L<sub>i</sub>, U<sub>i</sub>], the lower and upper bounds for the i-th value in the genome. It is common to use a position-wise mutation probability to decide whether to change an element or not.
- **Nonuniform Mutation** : 
- **Self-Adaptive Mutation** :
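
The uniform mutation described in the list above can be sketched as follows, assuming the genome is a list of floats and `bounds` is a list of `(L_i, U_i)` pairs, one per gene. Both names are chosen for this example.

```python
import random


def uniform_mutation(genome, bounds, p, rng=random):
    """With probability p, redraw gene i uniformly within [L_i, U_i]."""
    return [
        rng.uniform(lo, hi) if rng.random() < p else x
        for x, (lo, hi) in zip(genome, bounds)
    ]


# Hypothetical example: length and width of a rectangle.
rectangle = [2.5, 4.0]
limits = [(0.0, 10.0), (0.0, 5.0)]
mutated = uniform_mutation(rectangle, limits, p=0.5)
```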

### 4 - Permutation Representation
Can also be seen as a special type of integer representation, but it needs different types of mutation and recombination.
In this case the information can be stored in the **fixed position**, in the **order** or in the **sequence** (also *adjacency*) of the elements that characterize our current solution (for example, in the 15 puzzle we can consider the position of the tiles or the sequence of the tiles).
Gene-wise random mutation no longer makes sense, because changing a single gene independently would generally break the permutation (some values would be duplicated, others lost). We need special operators for this type of representation, chosen also according to the problem we're trying to solve.
#### Mutation for Permutation Encoding
- **Swap Mutation** - we swap two randomly chosen elements of the permutation
- **Scramble Mutation** - we take a random subset of genes and randomly shuffle their positions
- **Insert Mutation** - we take a random gene and move it to another random position, shifting the genes in between
- **Inversion Mutation** - we take a random segment of genes and invert their order
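
A minimal sketch of the four mutations, assuming the permutation is a list with at least two elements. For scramble and inversion the "random subset" is taken here as a contiguous segment, which is a simplifying assumption of this example. Every operator returns a new list that is still a valid permutation.

```python
import random


def swap_mutation(perm, rng=random):
    """Exchange the genes at two distinct random positions."""
    child = list(perm)
    i, j = rng.sample(range(len(child)), 2)
    child[i], child[j] = child[j], child[i]
    return child


def scramble_mutation(perm, rng=random):
    """Randomly shuffle the genes inside a random contiguous segment."""
    child = list(perm)
    i, j = sorted(rng.sample(range(len(child) + 1), 2))
    segment = child[i:j]
    rng.shuffle(segment)
    child[i:j] = segment
    return child


def insert_mutation(perm, rng=random):
    """Remove one random gene and re-insert it at another position."""
    child = list(perm)
    i, j = rng.sample(range(len(child)), 2)
    child.insert(j, child.pop(i))
    return child


def inversion_mutation(perm, rng=random):
    """Reverse the order of the genes inside a random segment."""
    child = list(perm)
    i, j = sorted(rng.sample(range(len(child) + 1), 2))
    child[i:j] = reversed(child[i:j])
    return child
```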
#### Recombination for Permutation Encoding
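- **Cycle Crossover** - we pick a *cycle*, i.e. a set of positions (not necessarily contiguous) that hold the same set of genes in both parents, copy the genes at those positions from one parent, and then copy all the other genes from the other parent. In this case we're selecting information from one parent and then using the other parent only to *"fill the gaps"*; every gene keeps the position it had in one of the two parents.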
- **Partially Mapped Crossover** - we copy a random contiguous segment from one parent to the offspring, then fill the remaining positions with the genes of the other parent, keeping each of them in its own position when possible and using the mapping defined by the two segments to resolve duplicates. In this case we're selecting information from both parents and trying to keep as many genes as possible in the position they had in one of the parents.
- **Inver-over** - an asymmetric crossover. We select a gene in the first parent and locate the same gene in the second parent, then we take the gene that follows it in the second parent and invert, in a copy of the first parent, the segment between the two genes, so that they become adjacent in the offspring. It's useful to preserve the adjacency between two or more genes inside the genotype.
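
As an illustration, here is a minimal sketch of a single-cycle variant of cycle crossover. It assumes the usual definition of a cycle (a set of positions holding the same genes in both parents) and produces one child; the function name and the `start` parameter are chosen for this example.

```python
import random


def cycle_crossover(p1, p2, start=None, rng=random):
    """Copy one cycle from p1 and fill every other position from p2."""
    if start is None:
        start = rng.randrange(len(p1))
    position_in_p1 = {gene: i for i, gene in enumerate(p1)}
    child = list(p2)  # positions outside the cycle keep p2's genes
    i = start
    while True:
        child[i] = p1[i]
        # Follow the cycle: where does p1 hold the gene p2 has here?
        i = position_in_p1[p2[i]]
        if i == start:
            break
    return child


p1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
p2 = [9, 3, 7, 8, 2, 6, 5, 1, 4]
child = cycle_crossover(p1, p2, start=0)
# child == [1, 3, 7, 4, 2, 6, 5, 8, 9]
```

In the example the cycle starting at position 0 covers the genes 1, 9, 4 and 8, which are taken from `p1`; all the other genes come from `p2`, so the child is still a valid permutation.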

### 5 - Tree Representation


## Types of Fitness Landscapes
- **Rugged** - many local optima
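- **Smooth** - a single optimum (the global one), reachable by following small improvements
- **Deceptive** - local information leads the search away from the global optimum and towards local optima; the landscape tries to trick you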

## Classic Benchmark Functions
- **One Max** - the goal is to maximize the number of ones in a bit string; the fitness function is the sum of the bits. It's a smooth landscape and very easy to solve: unimodal, separable and not deceptive at all. It can give some insight into the behaviour of an algorithm if we follow how the various individuals are selected.
- **Knapsack Problem** - the goal is to maximize the total value of the items in a knapsack without exceeding the weight limit. There are several variations of this problem: **multidimensional knapsack** (there is also a constraint on the size of the items), bounded knapsack, unbounded knapsack, etc. It's a rugged landscape and very hard to solve: multimodal, non-separable and deceptive. It's a good benchmark for testing the performance of an algorithm.
- **Set Covering Problem** - given a collection of subsets of a set, the goal is to minimize the number of subsets needed to cover all its elements.
- **15 Puzzle** - classical problem that involves 15 sliding tiles in a 4x4 grid (one cell is left empty). The goal is to reach the goal state from a random state. A possible fitness function is the number of tiles in the correct position.
- **Rastrigin Function** - the goal is to minimize this non-convex, highly multimodal function, a typical benchmark for evolution strategies.
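
For reference, a minimal sketch of the Rastrigin function in its usual form, f(x) = A·n + Σ (x<sub>i</sub>² − A·cos(2πx<sub>i</sub>)), assuming the commonly used constant A = 10. The global minimum is 0 at the origin, surrounded by a regular grid of local minima.

```python
import math


def rastrigin(x, a=10.0):
    """Rastrigin function for a list of real values (to be minimized)."""
    return a * len(x) + sum(
        xi * xi - a * math.cos(2 * math.pi * xi) for xi in x
    )


at_origin = rastrigin([0.0, 0.0])  # global minimum, value 0
nearby = rastrigin([0.5, 0.5])     # much worse, despite being close
```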

Example of an evolution strategy for the One Max problem: we take two of the best solutions in our frontier and mix them, for example taking half of the first and half of the second, evaluate the result and add it to the new frontier. We repeat this process until the new frontier has the same size as the previous one. We can also add some random solutions to the new frontier in order to have a more diverse population.
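
The loop described above could look like the sketch below. It is one possible interpretation, not a reference implementation: the parents are drawn from the `n_best` fittest individuals, `n_random` random individuals are added to every new frontier, and the best individual seen so far is tracked separately. All parameter names and default values are assumptions made for this example.

```python
import random


def one_max(genome):
    """Fitness: number of ones in the bit string."""
    return sum(genome)


def evolve_one_max(length=32, pop_size=20, n_best=5, n_random=2,
                   generations=50, seed=0):
    rng = random.Random(seed)

    def random_genome():
        return [rng.randint(0, 1) for _ in range(length)]

    frontier = [random_genome() for _ in range(pop_size)]
    best_seen = max(frontier, key=one_max)
    for _ in range(generations):
        # The n_best fittest individuals are the candidate parents.
        parents = sorted(frontier, key=one_max, reverse=True)[:n_best]
        new_frontier = []
        while len(new_frontier) < pop_size - n_random:
            a, b = rng.sample(parents, 2)
            # Half of the first parent, half of the second.
            new_frontier.append(a[:length // 2] + b[length // 2:])
        # A few random individuals keep some diversity.
        new_frontier.extend(random_genome() for _ in range(n_random))
        frontier = new_frontier
        best_seen = max(frontier + [best_seen], key=one_max)
    return best_seen


best = evolve_one_max()
print(one_max(best), "ones out of", len(best))
```

Since no mutation is used, new genetic material enters only through the random individuals, which matches the remark about diversity in the paragraph above.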