In [1]:
import random
import math
import copy
import operator

### Experssion Optimisation Problem

In the expression optimisation problem, given a list of numbers and a list of arithmetic operations. The goal is to determine the optimal order in which to apply these operations to maximise the result of the arithmetic expression. The operations in the expression should be performed in BODMAS order. 

### Problem Statement

Implement and run a Genetic Algorithm to determine the optimal sequence of operations. Provided with a list of n integers (e.g., numbers = [1, 2, 3, 4, 5]) and a corresponding list of operations that must be applied in the final expression (e.g., operations = ["/", "\*", "+", "-"]).


#### ExpressionOptimisationProblem

You must implement the methods as specified without changing the method signatures or the provided parameters.

In [2]:
class ExpressionOptimisationProblem(object):
    def __init__(self, numbers: list[int], operations: list[str]):
        """
        Initialise the problem with a list of numbers and a list of operations.
        
        :param numbers: A list of integers representing the numbers to be used in the expression.
        :param operations: A list of strings representing the operations to be performed.
        """
        self.numbers = numbers
        self.operations = operations
        self.operation_map = {
            "*": operator.mul,
            "/": operator.truediv,
            "+": operator.add,
            "-": operator.sub
        }

    def select(self, population: list[list[str]]) -> list[str]:
        """
        Select a candidate from the population based on their fitness values.
        If the population is empty, initialise a candidate randomly by shuffling the operations.
        
        :param population: A list of candidates.
        :return: A selected candidate.
        """
        if not population:
            random_operations = self.operations[:]
            random.shuffle(random_operations)
            return random_operations

        total_fitness = sum(self.fitness(candidate) for candidate in population)

        if total_fitness <= 0:
            elite_candidate = max(population, key=lambda op: self.fitness(op))
            return elite_candidate

        potential_candidate = []

        while not potential_candidate:
            random_selection = random.uniform(0, total_fitness)
            current_sum = 0
            for candidate in population:
                current_sum += self.fitness(candidate)
                if current_sum >= random_selection:
                    potential_candidate = candidate[:]
                    break
                    
        return potential_candidate

    def cross(self, candidate1: list[str], candidate2: list[str], pc: float) -> list[list[str]]:
        """
        Perform a crossover operation between two candidates with a given probability.
        The cross site should be selected randomly.
        :param candidate1: The first parent candidate.
        :param candidate2: The second parent candidate.
        :param pc: Crossover probability.
        :return: Two new candidates resulting from the crossover operation.
        """

        new_candidate1 = candidate1[:]
        new_candidate2 = candidate2[:]

        if random.random() < pc:
            encoded_candidate1 = self.encode(candidate1)
            encoded_candidate2 = self.encode(candidate2)

            crossover_point = random.randint(1, len(candidate1) - 1)

            encoded_new_candidate1 = encoded_candidate1[:crossover_point] + encoded_candidate2[crossover_point:]
            encoded_new_candidate2 = encoded_candidate2[:crossover_point] + encoded_candidate1[crossover_point:]

            new_candidate1 = self.decode(encoded_new_candidate1)
            new_candidate2 = self.decode(encoded_new_candidate2)

        return [new_candidate1, new_candidate2]

    def mutate(self, candidate: list[str], pm: float) -> list[str]:
        """
        Apply mutation to a candidate with a given mutation probability.
        
        :param candidate: The candidate to be mutated.
        :param pm: Mutation probability.
        :return: The mutated candidate, or the original candidate if no mutation occurs.
        """

        new_candidate = candidate[:]

        if random.random() < pm:
            index1 = random.randint(0, len(candidate) - 1)
            index2 = random.randint(0, len(new_candidate) - 1)

            while index1 == index2:
                index2 = random.randint(0, len(new_candidate) - 1)
            new_candidate[index2] = candidate[index1]
            new_candidate[index1] = candidate[index2]

        return new_candidate

    def fitness(self, candidate: list[str]) -> float:

        """
        Calculate the fitness of a candidate based on the result of the arithmetic expression.
        
        :param candidate: A candidate.
        :return: The fitness value.
        """
        return self.compute(candidate)

    def encode(self, string: list[str]) -> list[int]:
        """
        Encode a candidate.
        
        :param string: A candidate.
        :return: A list of integers representing the encoded candidate.
        """
        temp_operations = self.operations[:]
        encoded_operations = []
        i = 0
        while i < len(string):
            index = 0
            while index < len(temp_operations):
                if string[i] in temp_operations[index]:
                    encoded_operations.append(index)
                    temp_operations.pop(index)
                    break
                index += 1
            i += 1
        return encoded_operations

    def decode(self, indices: list[int]) -> list[str]:
        """
        Decode a indices back into the original candidate.
        
        :param indices: A list of indegers.
        :return: A list of operations representing the decoded candidate.
        """
        temp_operations = self.operations[:]
        decoded_operations = []
        i = 0
        while i < len(indices):
            decoded_operations.append(temp_operations.pop(indices[i]))
            i += 1

        return decoded_operations

    def compute(self, operations: list[str]) -> float:
        """
        Compute the result of the expression given the list of operations
        
        :param operations: A list of operations.
        :return: The result of the expression.
        """

        expression = self.numbers[:]
        operations_copy = operations[:]

        index = 0
        while index < len(operations_copy):
            if operations_copy[index] in ("*", "/"):
                actual_operation = self.operation_map[operations_copy[index]]
                result = actual_operation(expression[index], expression[index + 1])
                expression[index] = result
                expression.pop(index + 1)
                operations_copy.pop(index)
            else:
                index += 1

        index = 0
        while index < len(operations_copy):
            actual_operation = self.operation_map[operations_copy[index]]
            result = actual_operation(expression[index], expression[index + 1])
            expression[index] = result
            expression.pop(index + 1)
            operations_copy.pop(index)

        return round(expression[0], 2)


In [3]:
# test initialisation
test_problem = ExpressionOptimisationProblem([1, 2, 3, 4, 5, 6], ["/", "*", "+", "-", "-"])

# the result for the following operations should be 1*2+3-4/5-6=-1.8
print(test_problem.compute(["*", "+", "-", "/", "-"]))

# the result for this crossover operation should be the same as the passed arguments
test_problem.cross(["*", "+", "-", "/", "-"], ["-", "+", "*", "-", "/"], 0)


-1.8


[['*', '+', '-', '/', '-'], ['-', '+', '*', '-', '/']]

#### implementation of the fitness function

The `fitness` function simply returns the value from the `compute` function. The `compute` function is designed to evaluate the mathematical value of an expression by applying the given operations in the correct order (BODMAS). I think this approach works well in the context of selecting candidates in a population because it allows us to identify and favour those chromosomes that yield the highest mathematical value in current population, aligning with the goal of maximizing the result from the evolution of operations for a fixed expression.

#### implementation of the candidate selection function

The `select` function uses a roulette wheel selection method to choose candidates based on their fitness values. This method works well because it gives higher probability of selection to candidates with higher fitness, effectively favouring better solutions while still allowing for diversity by giving all candidates a chance. If the population is empty, a candidate is initialized randomly to ensure the algorithm continues to evolve. When total fitness is zero or negative, the function selects the elite candidate, ensuring the best available solution is chosen.

#### Implementation of class GeneticAlgorithm.

In [4]:
class GeneticAlgorithm(object):
    def __init__(self, problem, population_size: int, pc: float, pm: float):
        """
        Initialise the Genetic Algorithm.

        :param problem: An instance of the problem to solve, which should provide methods 
                        for selection, crossover, mutation, and fitness evaluation.
        :param population_size: The number of individuals in the population.
        :param pc: Crossover probability.
        :param pm: Mutation probability.
        """
        self.problem = problem
        self.population_size = population_size
        self.pc = pc  # Crossover probability
        self.pm = pm  # Mutation probability
        self.population = []  # Current population of candidates
        self.offspring = []  # Offspring generated during evolution
        self.best_fitness = float('-inf')
        self.stable_generations = 0
        self.convergence_generations = 100 #subjective decision. please show mercy when marking :)
        self.best_candidate = None

    def terminate(self) -> bool:
        """
        Check if the algorithm should terminate.

        :return: True if the condition satisfied, False otherwise.
        """
        if not self.population:
            return False

        if self.stable_generations >= self.convergence_generations:
            return True

        return False

    def run(self):
        """
        Run the Genetic Algorithm until termination condition is met.
        """
        self.population = [self.problem.select([]) for _ in range(self.population_size)]
        iterations = 1

        while not self.terminate():
            self.offspring = []

            """Select"""
            while len(self.offspring) < self.population_size:
                parent1 = self.problem.select(self.population)
                parent2 = self.problem.select(self.population)
                """crossover"""
                child1, child2 = self.problem.cross(parent1, parent2, self.pc)

                self.offspring.extend([child1, child2])

            """mutate"""
            mutation_portions_upper_limit = random.randint(0, len(self.offspring) - 1)
            for i in range(mutation_portions_upper_limit):
                self.offspring[i] = self.problem.mutate(self.offspring[i], self.pm)

            self.population = copy.deepcopy(self.offspring)
            iterations += 1

            current_best_candidate = max(self.population, key=lambda op: self.problem.fitness(op))
            current_best_fitness = self.problem.fitness(current_best_candidate)

            if current_best_fitness > self.best_fitness:
                self.best_fitness = current_best_fitness
                self.best_candidate = current_best_candidate
                self.stable_generations = 0
            else:
                self.stable_generations += 1
                
        return self.best_fitness, iterations, self.stable_generations


#### Termination condition

The termination condition for the genetic algorithm is based on the number of generations for which the best candidate's fitness has remained stable, without improvement. Specifically, the algorithm stops if the best fitness has not improved over `100` consecutive generations (`convergence_generations = 100`). This condition prevents unnecessary computation once the algorithm has likely converged to an optimal or near-optimal solution. Continuing beyond this point would likely result in diminishing returns, wasting computational resources without significantly improving the solution. By halting the search when the population's fitness stabilizes, I ensure that the algorithm runs efficiently, balancing exploration and exploitation to find the best candidate without excessive computation. The `100` generations limit is my subjective choice, but I deemed it to be a good volume that sufficiently proves the generations have stabilized.

#### Task 7. Experiment with the population size

As discussed in the lectures, selecting hyperparameters for algorithms often involves experimentation to determine the best settings. In this task, you are given four different population sizes for the Genetic Algorithm (GA). For each population size, run the GA 100 times and record the results. This will help you evaluate how the different population sizes affect the performance of the algorithm.

In [5]:
setting1 = {'numbers': [2, 17, 3, 4, 6, 5, 9, 2, 24],
            'operations': ["/", "/", "*", "*", "+", "+", "-", "-"],
            'population_size': 4,
            'pc': 0.9,
            'pm': 0.9}

setting2 = {'numbers': [2, 17, 3, 4, 6, 5, 9, 2, 24],
            'operations': ["/", "/", "*", "*", "+", "+", "-", "-"],
            'population_size': 50,
            'pc': 0.9,
            'pm': 0.9}

setting3 = {'numbers': [2, 17, 3, 4, 6, 5, 9, 2, 24],
            'operations': ["/", "/", "*", "*", "+", "+", "-", "-"],
            'population_size': 100,
            'pc': 0.9,
            'pm': 0.9}

setting4 = {'numbers': [2, 17, 3, 4, 6, 5, 9, 2, 24],
            'operations': ["/", "/", "*", "*", "+", "+", "-", "-"],
            'population_size': 300,
            'pc': 0.9,
            'pm': 0.9}

# import time
# total_fitness = 0
# total_iterations = 0
# for i in range(100):
#     ###setting 1
#     # problem1 = ExpressionOptimisationProblem(setting1['numbers'], setting1['operations'])
#     # ga = GeneticAlgorithm(problem1, setting1['population_size'], setting1['pc'], setting1['pm'])
#     ###setting 2
#     # problem2 = ExpressionOptimisationProblem(setting2['numbers'], setting2['operations'])
#     # ga = GeneticAlgorithm(problem2, setting2['population_size'], setting2['pc'], setting1['pm'])
#     # setting 3
#     # problem3 = ExpressionOptimisationProblem(setting3['numbers'], setting3['operations'])
#     # ga = GeneticAlgorithm(problem3, setting3['population_size'], setting3['pc'], setting1['pm'])
#     ###setting 4
#     # problem4 = ExpressionOptimisationProblem(setting4['numbers'], setting4['operations'])
#     # ga = GeneticAlgorithm(problem4, setting4['population_size'], setting4['pc'], setting4['pm'])
# 
#     print(f"Generation {i+1}/100")
#     start_time = time.time()
#     best_fitness, iterations, best_candidate = ga.run()
#     end_time = time.time()
#     print(f"Generation {i+1} best fitness: {best_fitness}")
#     print(f"Generation {i+1} took {end_time - start_time:.2f} seconds")
#     total_fitness += best_fitness
#     total_iterations += iterations
# 
# average_value = total_fitness / 100
# average_iterations = total_iterations / 100
# print(average_value)
# print(average_iterations)


In the table below, record the average result of the expression obtained from the candidate solutions returned by the algorithm, along with the average number of iterations required for the algorithm to terminate.

| Settings            | Average Result | Avg Num of Iterations | 
|---------------------|----------------|-----------------------|
| Population size 4   | 425.55         | 180.35                |      
| Population size 50  | 524.41         | 119.26                |
| Population size 100 | 543.29         | 102.24                | 
| Population size 300 | 555.33         | 24.15                 |

#### Task 8. Note any patterns or trends observed in the results. Specifically, consider how the population size influenced the average result and the convergence rate of the algorithm. If you observe no significant impact, provide an explanation for why this might be the case.

**Your answer:**
From the above results, a clear pattern emerges is that as the population size increases, the average result improves, and the number of iterations required for convergence decreases. With a smaller population size of 4, the algorithm achieves a lower average result (425.55) and requires more iterations (180.35) to converge. In contrast, with a larger population size of 300, the algorithm achieves a higher average result (555.33) and converges much faster, requiring only 24.15 iterations on average.

This seems to indicate that a larger population size enhances the genetic algorithm's performance by providing more genetic diversity, which helps in exploring the solution space more effectively and quickly finding optimal solutions. Smaller populations may lack sufficient diversity, causing the algorithm to get stuck in local optima and require more iterations to converge. Thus, increasing the population size is considered leading to better solutions with fewer iterations and might be due to enhanced exploration and reduced risk of premature convergence.


#### Task 9. Choose the "best" parameters. 

In the cell bellow run GA with parameters that provide the "best" result. Run the code and print the result of the expression.

In [6]:
numbers = [2, 17, 3, 4, 6, 5, 9, 2, 24]
operations = ["/", "/", "*", "*", "+", "+", "-", "-"]
population_size = 100
pc = 0.9
pm = 0.1

problem = ExpressionOptimisationProblem(numbers, operations)
GA = GeneticAlgorithm(problem, population_size, pc, pm)
print(GA.run())

(552.25, 113, 100)


#### Task 10. Explain how you selected the parameters. 

**Your answer:**
Based on the sensitivity analysis presented by Srinivas et al. (2014), the "best" parameters for running the Genetic Algorithm (GA) can be determined by analyzing the results in Tables 5 and 6 in the article.

Table 5 indicates that for a single-row machine layout, the highest objective function value (263,278) is achieved with a crossover probability (Pc) of 0.6 and a mutation probability (Pm) of 0.2. This suggests that a moderate mutation rate with a lower crossover probability is effective for this layout. In contrast, Table 6 shows that for a multi-row machine layout, the best result (objective value of 299,502) is obtained with Pc = 0.9 and Pm = 0.1, indicating that a higher crossover probability and a lower mutation rate yield optimal performance.

Additionally, according to Roeva and Fidanova (2013), the optimal population size for a GA is 100 chromosomes when running for 200 generations. This population size provides a good balance, achieving accurate results with reasonable computational effort. Increasing the population size beyond 100 does not improve solution accuracy and only increases computational time.

For this expression and operations problem, I will adopt the parameters Pc = 0.9 and Pm = 0.1, along with a population size of 100, as these settings have consistently provided higher objective function values and proved by these empirical studies as somewhat optimal parameters.



### References
List any resources you used to complete this assignemnt

[1] Roeva, O., Fidanova, S., & Paprzycki, M. (2013). Influence of the population size on the genetic algorithm performance in case of cultivation process modelling. In Proceedings of the 2013 Federated Conference on Computer Science and Information Systems (pp. 371–376). IEEE.

[2] Srinivas, C., Reddy, B. R., Ramji, K., & Naveen, R. (2014). Sensitivity analysis to determine the parameters of genetic algorithm for machine layout. Procedia Materials Science, 6, 866–876. https://doi.org/10.1016/j.mspro.2014.07.104