# Misagh Mohaghegh - 810199484

Artificial Intelligence CA2: *Genetics*  
In this assignment, the equation problem is solved using a genetic algorithm.

## Genetic Representation

Each character (operands and operators) is considered a gene.  
A chromosome consists of n genes where n is the length of the given equation.  
In a chromosome, operator genes are placed in between operand genes.
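
As a small illustration of this layout (the expression below is made up and is not one of the assignment's inputs), operands sit at the even indices of a chromosome and operators at the odd indices:

```python
# Illustrative chromosome for the expression 2+3*4 (values chosen arbitrarily).
chromosome = ['2', '+', '3', '*', '4']

operand_genes = chromosome[0::2]   # even indices hold operands
operator_genes = chromosome[1::2]  # odd indices hold operators

print(operand_genes)   # ['2', '3', '4']
print(operator_genes)  # ['+', '*']
```

Because operators only appear between operands, a chromosome of this form always has an odd length.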

The `Input` class contains our input fields which are:

- `eq_len`: The equation length (operands + operators)
- `operands`: The list of operands
- `operators`: The list of operators
- `goal`: The result of the equation
 
The `read_input` function is provided for reading the input and returning an `Input` instance, but an `Input` can also be constructed manually.

In [15]:
import random
import bisect
import copy
from typing import NamedTuple
from dataclasses import dataclass
from itertools import accumulate


class Input(NamedTuple):
    eq_len: int
    operands: list[str]  # kept as strings so genes can be joined into an expression
    operators: list[str]
    goal: int


def read_input() -> Input:
    eq_len = int(input())
    operands = input().split()
    operators = input().split()
    goal = int(input())
    return Input(eq_len, operands, operators, goal)

The `Consts` class contains the problem-solving parameters.

- `prob_xover`: Probability of a crossover happening between two chromosomes.
- `prob_mutate`: Probability of a mutation happening in a chromosome.
- `prob_carry`: The fraction (between 0 and 1) of the best chromosomes that is carried over unchanged to the next generation.
- `population_size`: The number of chromosomes in the population.
- `max_generations`: The maximum number of generations that the algorithm will explore to find a solution.

In [16]:
@dataclass
class Consts:
    prob_xover: float
    prob_mutate: float
    prob_carry: float
    population_size: int
    max_generations: int

Chromosome = list[str]

## Initial Population

The `gen_population` function takes our input and creates `size` chromosomes.  
We will use this to get our initial population.

In [17]:
def gen_population(inp: Input, size: int) -> list[Chromosome]:
    population: list[Chromosome] = [None]*size
    for i in range(size):
        chromo = [None]*inp.eq_len
        chromo[0::2] = [random.choice(inp.operands) for _ in chromo[0::2]]
        chromo[1::2] = [random.choice(inp.operators) for _ in chromo[1::2]]
        population[i] = chromo
    return population


## Fitness Function

We will define the fitness of a chromosome as follows:  

$$ \text{fitness} = \frac{1}{1 + |\text{chromosome} - \text{goal}|} $$

Here, chromosome is the value of the chromosome's mathematical expression, and goal is the result we are seeking.  
The closer the value is to the goal, the higher the fitness.  
The fitness lies in the interval (0, 1]. A fitness of 1 means that the chromosome is a solution.


In [18]:
def calc_fitness(inp: Input, chromo: Chromosome) -> float:
    string = ''.join(chromo)
    number = eval(string, {'__builtins__': None})
    difference = abs(number - inp.goal)
    fitness = 1 / (1 + difference)
    return fitness
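
As a worked example of the formula, here is a minimal sketch using a hypothetical helper `fitness_of` (not part of the assignment code) that takes the expression as a string. The expression `2+3*4` evaluates to 14, so it scores 1 when the goal is 14 and 1/5 when the goal is 18:

```python
def fitness_of(expression: str, goal: int) -> float:
    # Same formula as above: 1 / (1 + |value - goal|).
    value = eval(expression, {'__builtins__': None})
    return 1 / (1 + abs(value - goal))


print(fitness_of('2+3*4', 14))  # 1.0
print(fitness_of('2+3*4', 18))  # 0.2
```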

## Genetic Selection and Operations

The next cell defines four functions, each one a step toward producing the next generation.

- `create_carried_pool`:  
  Here, the population is sorted by fitness and the best `carry_count` chromosomes are returned.

- `create_mating_pool`:  
  Here, a new population of the same size as the previous one is generated, where each member is chosen by a roulette wheel (fitness-proportionate selection) in which fitter chromosomes have a higher chance of being picked.  
  This is implemented with a cumulative sum over the normalized fitness values.

- `create_crossover_pool`:  
  Here, a modified population is returned in which each pair of adjacent chromosomes has been crossed over, with probability `prob_xover`, using 1-point crossover.  
  2-point crossover can be used instead and is included as commented-out code.
  
- `create_mutated_pool`:  
  Here, a modified population is returned in which each chromosome, with probability `prob_mutate`, has had a single gene replaced by a random operand or operator, depending on the type of the selected gene.
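
The roulette wheel step can be sketched in isolation as follows. The fitness values and the draws are made up for illustration, and `pick` is a hypothetical helper; fixed draws stand in for `random.random()` so the result is reproducible:

```python
import bisect
from itertools import accumulate

fitnesses = [0.1, 0.3, 0.6]  # made-up fitness values
total = sum(fitnesses)
# Running total of the normalized fitnesses, approximately [0.1, 0.4, 1.0].
cumulative = list(accumulate(f / total for f in fitnesses))


def pick(draw: float) -> int:
    # Index of the first cumulative value >= draw; clamp in case
    # floating-point rounding leaves the last value slightly below 1.
    return min(bisect.bisect_left(cumulative, draw), len(cumulative) - 1)


print([pick(d) for d in (0.05, 0.25, 0.95)])  # [0, 1, 2]
```

Each chromosome owns a slice of the interval [0, 1) whose width is proportional to its fitness, so a uniform draw lands on fitter chromosomes more often.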

In [19]:
def create_carried_pool(population: list[Chromosome],
                        fitnesses: list[float],
                        carry_count: int) -> list[Chromosome]:
    sorted_population = sorted(zip(fitnesses, population), reverse=True, key=lambda pair: pair[0])
    carry = [pair[1] for pair in sorted_population[:carry_count]]
    return copy.deepcopy(carry)


def create_mating_pool(population: list[Chromosome], fitnesses: list[float]) -> list[Chromosome]:
    sum_fitnesses = sum(fitnesses)
    prob = [f / sum_fitnesses for f in fitnesses]
    cumulative_sum = list(accumulate(prob))

    mating_pool: list[Chromosome] = []
    for _ in population:
        rand = random.random()
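        # Clamp the index: rounding can leave cumulative_sum[-1] slightly below 1.
        idx = min(bisect.bisect_left(cumulative_sum, rand), len(cumulative_sum) - 1)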
        mating_pool.append(population[idx])
    return copy.deepcopy(mating_pool)


def create_crossover_pool(population: list[Chromosome], prob_xover: float) -> list[Chromosome]:
    xover_pool = copy.deepcopy(population)
    for i in range(0, len(xover_pool), 2):
        if i == len(xover_pool) - 1:
            break
        if random.random() < prob_xover:
            point = random.randint(0, len(xover_pool[i]))
            xover_pool[i][:point], xover_pool[i+1][:point] = xover_pool[i+1][:point], xover_pool[i][:point]
            # 2-point crossover
            # a, b = sorted(random.sample(range(len(xover_pool[i]) + 1), 2))
            # swap_tmp = xover_pool[i][a:b]
            # xover_pool[i][a:b] = xover_pool[i + 1][a:b]
            # xover_pool[i + 1][a:b] = swap_tmp

    return xover_pool


def create_mutated_pool(population: list[Chromosome], prob_mutate: float, inp: Input) -> list[Chromosome]:
    mutated_pool = copy.deepcopy(population)
    for chromo in mutated_pool:
        if random.random() < prob_mutate:
            idx = random.randrange(len(chromo))
            chromo[idx] = random.choice(inp.operators) if idx % 2 else random.choice(inp.operands)
    return mutated_pool
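
To see why 1-point crossover keeps chromosomes well-formed, here is a small sketch with two made-up parents and a fixed crossover point (the function above draws the point at random). Both parents share the same operand/operator layout, so swapping equal-length prefixes leaves operands at even indices and operators at odd indices:

```python
parent_a = ['1', '+', '2', '*', '3']
parent_b = ['4', '-', '5', '+', '6']
point = 2  # fixed here for illustration only

# Swap the prefixes [:point] of the two parents.
child_a = parent_b[:point] + parent_a[point:]
child_b = parent_a[:point] + parent_b[point:]

print(child_a)  # ['4', '-', '2', '*', '3']
print(child_b)  # ['1', '+', '5', '+', '6']
```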

## Genetic Algorithm

The `find_equation` function runs the search for the equation.  
The loop runs for at most `consts.max_generations` generations, one per iteration, and returns early as soon as the equation is found. If no solution is found within that limit, `None` is returned.

The process is as follows:

- The population is shuffled.

- Fitness is calculated for each chromosome.  
  If the fitness of a chromosome is 1, that chromosome satisfies our equation and is the solution.

- The carried pool is created.  
  The number of carried chromosomes is the fraction `consts.prob_carry` of the population size.

- The mating pool is created, and crossover and mutation are applied.

- The carried pool is added to the new population.  
  To keep the population size unchanged, the last `carry_count` chromosomes of the mutated pool are discarded.

In [20]:
def find_equation(inp: Input, consts: Consts, population: list[Chromosome]) -> 'Chromosome | None':
    generation_num = 0
    while True:
        generation_num += 1
        if generation_num > consts.max_generations:
            return None

        random.shuffle(population)

        fitnesses = [calc_fitness(inp, chromo) for chromo in population]
        try:
            return population[fitnesses.index(1)]
        except ValueError:
            pass
        
        carry_count = int(len(population) * consts.prob_carry)
        carried_pool = create_carried_pool(population, fitnesses, carry_count)

        mating_pool = create_mating_pool(population, fitnesses)
        crossover_pool = create_crossover_pool(mating_pool, consts.prob_xover)
        mutated_pool = create_mutated_pool(crossover_pool, consts.prob_mutate, inp)

        population = mutated_pool[:len(population) - carry_count]
        population.extend(carried_pool)
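
The replacement step at the end of each generation can be checked on its own. This is a minimal sketch with placeholder strings instead of real chromosomes and made-up parameter values; it only shows that truncating the offspring and appending the carried chromosomes keeps the population size constant:

```python
population_size = 10
prob_carry = 0.2
carry_count = int(population_size * prob_carry)  # 2

# Placeholder names standing in for chromosomes.
offspring = [f'child{i}' for i in range(population_size)]
elite = [f'elite{i}' for i in range(carry_count)]

# Drop the last carry_count offspring, then append the carried chromosomes.
next_generation = offspring[:population_size - carry_count] + elite
print(len(next_generation))  # 10
```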

Finally, the input and constant parameters are given and `find_equation` is called with a random initial population.

## Example Run

In [21]:
inp = Input(
    eq_len=21,
    operands=['1', '2', '3', '4', '5', '6', '7', '8'],
    operators=['+', '-', '*'],
    goal=18019
)
consts = Consts(
    prob_xover=0.6,
    prob_mutate=0.4,
    prob_carry=0.2,
    population_size=200,
    max_generations=1000
)

result = find_equation(inp, consts, gen_population(inp, consts.population_size))
if result is None:
    print('Could not find equation.')
else:
    print(*result)

5 - 8 * 4 * 5 * 5 - 2 + 8 * 6 * 7 * 8 * 7


## Questions

1. *The problems caused by an extremely large or small population size:*  
  If the population is too small, diversity decreases and few candidates are checked in each generation, so the chance of reaching the correct solution within the same number of generations drops. This can be mitigated by increasing the mutation probability and exploring more generations.  
  If the population is too large, the algorithm takes more time and resources than needed.

2. *The effects of a growing population on the algorithm:*  
  If the population grows with each generation, the added diversity may improve the precision of the algorithm, but time and memory consumption increase at every step and may eventually hit resource limits.  
  The population size is kept constant because the population is supposed to converge: chromosomes closer to the answer fill it while those further away are discarded. Adding more chromosomes at each step works against that convergence.

3. *The distinct effects of crossover and mutation:*  
  Crossover creates new chromosomes by combining two existing chromosomes, while mutation changes a single chromosome directly.  
  Crossover is usually applied with a higher probability than mutation.  
  Crossover aims to reach better chromosomes by combining two good ones, while mutation is used to escape a local extremum.  
  Using only one of them is generally not enough: crossover alone can only recombine genes already present in the population, and mutation alone amounts to a slow random search.

4. *Approaches to solving the equation problem faster:*  
  Tuning the algorithm parameters is the most effective approach. Choosing a suitable value for each parameter (population size and the crossover, mutation, and carry probabilities) makes the algorithm run more efficiently.

5. *Solutions to stagnated chromosomes and getting stuck:*  
  Mutation, which is implemented in this assignment, is used to escape local extrema.  
  If the algorithm is still stuck after mutations, multi-starting can overcome the problem.  
  Multi-starting means re-running the algorithm with a new initial population.

6. *Stopping the algorithm if a solution does not exist:*  
  We can limit the number of generations the algorithm explores, as is done with `max_generations`.

Running the algorithm with multi-start:

In [22]:
multi_start_limit = 4
while multi_start_limit:
    result = find_equation(inp, consts, gen_population(inp, consts.population_size))
    if result is None:
        multi_start_limit -= 1
    else:
        print(*result)
        break
else:
    print('Could not find equation.')

8 * 6 * 8 * 1 - 5 + 7 * 5 * 4 * 3 * 6 * 7
