# Evolutionary algorithms analysis

Complete run of the evolutionary algorithms analysis. This notebook runs several evolutionary algorithms and tests them on the benchmark functions defined in the benchmark_functions.py file. The results are saved in the results folder.

Analysis is done via average ranks, with specific notes on outliers.

In [1]:
# Evo algorithms setup

# Imports
import source.evolutionary_algorithms as ea
import numpy as np
import source.helpers as hp
import importlib
import inspect
import time
import sys
import os

# Load benchmark functions
module_path = os.path.abspath(os.path.join('./source'))
sys.path.append(module_path)
benchmarks = importlib.import_module('benchmark_functions')
functions = [obj for _, obj in inspect.getmembers(benchmarks) if inspect.isfunction(obj)]

# Hyperparameters
DIMENSIONS = [2, 10, 30]

lower_bound = -100.0
upper_bound = 100.0

POPULATION_SIZES = [10, 20, 50]
FUNCTION_EVALUATIONS = 2000
RESULTS = dict()
REPEATS = 30

## Running ea

Run each evolutionary algorithm on every benchmark function and store the results in the global `RESULTS` dictionary.
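A minimal sketch of the nested layout that the cell below builds, using toy random numbers instead of real optimizer output. The names `ALGORITHMS` and `toy_results` are hypothetical, and treating the value at index `[1]` of each algorithm's return as the best fitness found is an assumption.

```python
import numpy as np

# Hypothetical miniature of the RESULTS layout:
# RESULTS[dim][function_index][repeat] -> list of 5 values,
# one per algorithm, in the order the algorithms are called.
ALGORITHMS = ["DE rand/1/bin", "DE best/1/bin", "PSO", "SOMA ato", "SOMA ata"]

rng = np.random.default_rng(0)
n_functions, n_repeats = 3, 4  # toy sizes, much smaller than the real run
toy_results = {
    dim: rng.random((n_functions, n_repeats, len(ALGORITHMS))).tolist()
    for dim in (2, 10)
}

# Value recorded for PSO on function 1, repeat 0, in 10 dimensions
value = toy_results[10][1][0][ALGORITHMS.index("PSO")]

# Converting one dimension to an array makes per-algorithm slicing easy
arr = np.asarray(toy_results[10])  # shape: (functions, repeats, algorithms)
print(arr.shape)                   # (3, 4, 5)
```

Plain nested lists keep the structure directly serializable to JSON, which is presumably why the notebook stores rows as lists rather than arrays.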


In [2]:
USE_PERSISTENT_DATA = False
FILENAME = "results/full_run_1702951254.0896728.json" # Rename this in case of 

if USE_PERSISTENT_DATA:
    RESULTS = hp.load_results(FILENAME)
    print("Loaded results from file")
else:
    for i, dim in enumerate(DIMENSIONS):
        BOUNDS = np.array([[lower_bound, upper_bound]] * dim)
        pop_size = POPULATION_SIZES[i]
        evaluations = dim * FUNCTION_EVALUATIONS
        dimension_results = []
        print(f"Calculating dimension {dim} - ", end="")
        counter = 0

        for function in functions:
            runs = []
            counter += 1
            
            for r in range(REPEATS):
                row = [0, 0, 0, 0, 0]
                row[0] = ea.differential_evolution(function, BOUNDS, pop_size, max_evaluations=evaluations, F=0.8, strategy="rand/1/bin")[1]
                row[1] = ea.differential_evolution(function, BOUNDS, pop_size, max_evaluations=evaluations, F=0.5, strategy="best/1/bin")[1]
                row[2] = ea.particle_swarm_optimization(function, BOUNDS, pop_size, max_evaluations=evaluations)[1]
                row[3] = ea.soma_ato(function, BOUNDS, pop_size, max_evaluations=evaluations)[1]
                row[4] = ea.soma_ata(function, BOUNDS, pop_size, max_evaluations=evaluations)[1]
                runs.append(row)

            print("#", end="")
            if counter % 5 == 0:
                print("|", end="")

            dimension_results.append(runs)
        
        RESULTS[dim] = dimension_results
        print("")
        
    if REPEATS == 30:
        RUN_NAME = "full"
    else:
        RUN_NAME = "partial"

    hp.save_results(RESULTS, f"results/{RUN_NAME}_run_{time.time()}.json")


Calculating dimension 2 - #####|#####|#####|#####|#####|
Calculating dimension 10 - #####|#####|#####|#####|#####|
Calculating dimension 30 - #####|#####|#####|#####|#####|


## Analyze results

Legend/note:
- **Rank**: the evolutionary algorithm's average rank on a benchmark function across all runs (REPEATS)
- **Rank Avg. (Diff)**: Euclidean distance between a function's Rank vector and the Total Avg. Rank vector
- **Total Avg. Rank**: average rank of the evolutionary algorithm across all benchmark functions
- **Chi square**: Friedman test statistic, used to test whether the algorithms' ranks differ significantly from one another
  - if the p-value is less than 0.05, the differences in rank are significant
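The quantities in the legend can be sketched as follows. This is an illustration under assumptions, not the code in `source/helpers.py`: it swaps in `scipy.stats.rankdata` and `scipy.stats.friedmanchisquare` for `hp.rank_array` and `hp.friedman_test`, and it uses random toy data.

```python
import numpy as np
from scipy.stats import rankdata, friedmanchisquare

def average_ranks(runs):
    """runs: (repeats, algorithms) array of final values; lower is better."""
    # Rank 1 is the best algorithm in a run; ties share the mean rank
    ranks = np.array([rankdata(run) for run in runs])
    return ranks.mean(axis=0)

# Toy data: 6 benchmark functions, 5 repeats, 4 algorithms
rng = np.random.default_rng(42)
toy = rng.random((6, 5, 4))

# One row of average ranks per benchmark function ("Rank" columns)
ranking_table = np.array([average_ranks(function_runs) for function_runs in toy])

# "Total Avg. Rank": mean over benchmark functions
total_avg_rank = ranking_table.mean(axis=0)

# "Rank Avg. (Diff)": Euclidean distance of each row from the total average
rank_diff = np.linalg.norm(ranking_table - total_avg_rank, axis=1)

# Friedman test: one sample per algorithm, one measurement per function
chi_square, p_value = friedmanchisquare(*ranking_table.T)
print(f"chi-square = {chi_square:.4f}, p = {p_value:.4f}")
```

A large Diff value marks a benchmark function whose ranking deviates strongly from the overall pattern, which is how the outliers discussed later can be spotted.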

In [3]:
for dim in RESULTS:
    print(f"\nResults for {dim} dimensions:\n")
    ranking_table = []
    hp.table_header()
    result_dict = dict()
    for i, fun in enumerate(RESULTS[dim]):
        res = np.zeros(5)
        for run in fun:
            res += hp.rank_array(run)
        res /= len(fun)  # average over the runs actually stored (also correct for loaded results)
        result_dict[functions[i]._custom_name] = res
        ranking_table.append(res)

    average_ranks = np.mean(ranking_table, axis=0)
    avg_distance = 0
    
    for pair in result_dict:
        result_rank = result_dict[pair]
        distance = hp.euclidean_distance(result_rank, average_ranks)
        avg_distance += distance
        hp.table_row(pair, result_rank, distance)

    avg_distance /= len(result_dict)
    
    hp.table_footer(average_ranks, avg_distance)
    chi, p = hp.friedman_test(ranking_table)
    print("\nFriedman test:")
    print("Chi-square: {:>14.6}\nP-value: {:>18.6}".format(chi, p))

    print("\n\n")


Results for 2 dimensions:

--------------------------------------------------------------------------------------------------
| Evo. Algorithm → | DE Rand 1  | DE Best 1  |    PSO     |  SOMA ato  |  SOMA ata  | Rank Avg.  |
| Benchmark ↓      |   (Rank)   |   (Rank)   |   (Rank)   |   (Rank)   |   (Rank)   |   (Diff)   |
--------------------------------------------------------------------------------------------------
| Ackley 1st       |    2.73    |    3.20    |    3.47    |    2.93    |    3.57    |    0.93    |
| Ackley 1st Alt.  |    3.87    |    4.63    |    1.23    |    2.90    |    2.37    |    2.70    |
| Alpine 1st       |    1.50    |    3.10    |    2.27    |    3.50    |    4.67    |    1.58    |
| Alpine 2nd       |    2.47    |    4.40    |    3.13    |    1.80    |    3.20    |    1.53    |
| Csendes          |    1.47    |    2.23    |    2.60    |    3.73    |    4.97    |    2.23    |
| Custom 1 (LLM)   |    2.20    |    4.13    |    3.80    |    1.97    |    2.97 

### Analysis of results
*(based on data from results/full_run.json)*

The chi-square statistic does not drop between 10 and 30 dimensions, even though SOMA ato improves significantly. That gain is balanced out by a much worse performance of DE rand/1/bin.

### Best and worst performers

#### DE rand/1/bin

The rand/1/bin strategy did very well in 2 and 10 dimensions, with average performance in 30 dimensions.

It excelled across many benchmarks, most notably:

- **Svanda 2nd**: my custom function requires an ever-increasing amount of climbing, with a high heuristic factor, to reach the global minimum instead of falling short in the local minima surrounding it. The random factor in choosing the next point is very helpful in escaping those local minima.

- **Ackley 1st**: The Ackley function descends slowly towards the global minimum, with small hills and local-minimum traps along the way. It poses a problem similar to my 2nd function. I attribute the poor performance of DE rand/1/bin in 2 dimensions to there being too few steps for a significant heuristic factor to outperform a better strategy.

With poor performance in:

- **Zakharov**: The Zakharov function descends slowly towards the global minimum. The performance difference between dim 10 and dim 30 is hard to believe, and I cannot fully explain it. It seems that in dim 30 there were so many possibilities that the random factor actively hurt the strategy, when all that was needed was to move in the direction of the lowest member of the population.

- **Alpine 1st**: The Alpine function is very similar to Ackley, but more pronounced. It has more hills/spikes and traps, which makes it harder to escape local minima. This is another suspicious difference between dim 10 and dim 30; a possible explanation is that in dim 30 the random factor is far more likely to move in an unfavorable direction. Something similar can be seen with my Svanda 1st and 5th functions, where in dim 30 these spikes and traps make DE rand/1/bin perform worse than most strategies.
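For reference, the three standard landscapes discussed above can be written in their commonly cited forms. This is a sketch only: the definitions in `benchmark_functions.py` may use different constants or scaling, and the custom Svanda functions are not reproduced here.

```python
import numpy as np

def ackley(x):
    """Common Ackley form: a smooth global funnel covered with small ripples."""
    x = np.asarray(x, dtype=float)
    d = x.size
    return (-20.0 * np.exp(-0.2 * np.sqrt(np.sum(x**2) / d))
            - np.exp(np.sum(np.cos(2.0 * np.pi * x)) / d)
            + 20.0 + np.e)

def alpine1(x):
    """Common Alpine 1 form: many sharp ridges, global minimum at the origin."""
    x = np.asarray(x, dtype=float)
    return np.sum(np.abs(x * np.sin(x) + 0.1 * x))

def zakharov(x):
    """Common Zakharov form: unimodal, a single long descent to the origin."""
    x = np.asarray(x, dtype=float)
    i = np.arange(1, x.size + 1)
    s = np.sum(0.5 * i * x)
    return np.sum(x**2) + s**2 + s**4

for f in (ackley, alpine1, zakharov):
    print(f.__name__, f(np.full(3, 1.0)))
```

Zakharov has no local traps, which is consistent with the observation that following the best population member is enough there, while Ackley and Alpine reward some randomness in the search.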
  
#### DE best/1/bin

The best/1/bin strategy performed average to worse than the other strategies, working mainly when it paid off to commit to the area of a lucky population member.

Notably good performance in:

- **Zakharov**: Zakharov's descent towards the global minimum was ideal for this strategy, especially in dim 30, where it allowed the population to converge effectively close to the global minimum.

- **Michalewicz**: Both the Michalewicz function and my altered version of it showed great performance in dim 30. Michalewicz declines towards the global minimum with many flat areas, which promotes convergence of the population towards the minimum with lots of movement. The altered version offers local-minimum pits balanced by spiky hills. The best/1/bin strategy commits better to local minima, which is key in a difficult higher-dimensional space.
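The difference between the two DE strategies compared above comes down to the base vector of the mutation. The sketch below follows the textbook definitions and is not taken from `source/evolutionary_algorithms.py`; the function name `de_trial` and the crossover rate `CR` are hypothetical, since the notebook only passes `F` and `strategy`.

```python
import numpy as np

def de_trial(pop, fitness, i, F, CR, strategy, rng):
    """Build one trial vector for individual i (textbook DE, minimization)."""
    n, d = pop.shape
    candidates = [j for j in range(n) if j != i]
    if strategy == "rand/1/bin":
        # Base vector is a random population member: more exploration
        r1, r2, r3 = rng.choice(candidates, size=3, replace=False)
        base = pop[r1]
    elif strategy == "best/1/bin":
        # Base vector is the current best member: commits to its area
        r2, r3 = rng.choice(candidates, size=2, replace=False)
        base = pop[int(np.argmin(fitness))]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    mutant = base + F * (pop[r2] - pop[r3])
    # Binomial crossover; one randomly chosen gene always comes from the mutant
    cross = rng.random(d) < CR
    cross[rng.integers(d)] = True
    return np.where(cross, mutant, pop[i])

rng = np.random.default_rng(1)
pop = rng.uniform(-100.0, 100.0, size=(10, 5))
fitness = np.sum(pop**2, axis=1)  # sphere function as a stand-in benchmark
print(de_trial(pop, fitness, 0, F=0.8, CR=0.9, strategy="rand/1/bin", rng=rng))
print(de_trial(pop, fitness, 0, F=0.5, CR=0.9, strategy="best/1/bin", rng=rng))
```

With best/1/bin every trial vector is a perturbation of the same point, so the population contracts quickly around it. That helps on a long unimodal descent and hurts when the best member sits in a local trap.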

#### PSO

PSO did average, with some outliers. Its clear advantage is hive-like exploration/movement, which allows it to escape local minima and traverse difficult terrain.

Notably good performance in:

- **Ackley 1st altered**: My altered version of Ackley's first function offers extremely sharp spikes and pits, requiring exploration through difficult local terrain, which hive-like migration provides. While it did fine in 2 and 10 dimensions, in 30 it performed exceptionally well.

- **Levy 1st**: Levy's first function offers terrain similar to Ackley, with a descent towards a bottomed-out global minimum. Very sharp spikes and pits again favor the hive's exploration/movement. Performance was again average in 2 and 10 dimensions, and very good in dim 30.

- **Michalewicz**: In 2 dimensions PSO did very well, again because of its ability to travel through the space and the absence of local minima.

While overall good and fairly consistent, PSO did quite poorly in:

- **Svanda 2nd and 3rd**: In higher dimensions PSO seems to do quite poorly on functions like these. What I perceive to be the main culprit are zones of local minima that contain very similar values. This would result in circling those local minima and wasting a lot of steps trying to settle around one place, without much global exploration.
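The hive-like movement described in this section follows from the standard PSO update, sketched here in its textbook inertia-weight form. The function name `pso_step` and the values of `w`, `c1` and `c2` are assumptions; the notebook calls `ea.particle_swarm_optimization` with its defaults, which are not shown.

```python
import numpy as np

def pso_step(pos, vel, pbest, gbest, rng, w=0.7, c1=1.5, c2=1.5):
    """One textbook PSO update for the whole swarm."""
    r1 = rng.random(pos.shape)
    r2 = rng.random(pos.shape)
    vel = (w * vel                       # inertia: keep moving
           + c1 * r1 * (pbest - pos)     # pull towards each particle's own best
           + c2 * r2 * (gbest - pos))    # pull towards the swarm's best
    return pos + vel, vel

rng = np.random.default_rng(3)
pos = rng.uniform(-100.0, 100.0, size=(10, 2))
vel = np.zeros_like(pos)
pbest = pos.copy()
gbest = pos[np.argmin(np.sum(pos**2, axis=1))]  # sphere as a stand-in benchmark
pos, vel = pso_step(pos, vel, pbest, gbest, rng)
print(pos[:3])
```

When personal bests and the global best all sit in zones of nearly equal value, both attraction terms point towards roughly the same region, which fits the circling behavior suspected for Svanda 2nd and 3rd.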

#### SOMA ato

SOMA ato did very well in 10 dimensions and superbly in 30, with average performance in 2 dimensions. It seems that splitting the migration into individual steps, with a partial heuristic factor in the direction given by the PRT vector, allows better exploration. A certain direction (with some random factor) is explored over many steps, allowing the whole population to move to an overall better place after a migration than it would if it changed direction towards the current best after each step. This ability is all the more important in higher dimensions, where a difficult space calls for the ability to converge as well as to explore.
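The migration described above can be sketched for a single individual as follows. This is an illustration of the all-to-one idea under assumptions, not the code behind `ea.soma_ato`: the parameter values `prt`, `path_length` and `step` are hypothetical, and drawing the PRT vector once per migration (rather than once per jump) is a choice made here to match the description of one direction explored over many steps.

```python
import numpy as np

def soma_ato_migrate(individual, leader, func, rng,
                     prt=0.3, path_length=3.0, step=0.33):
    """Move one individual towards the leader in discrete jumps; keep the best."""
    best_pos = individual.copy()
    best_val = func(individual)
    evaluations = 1
    # PRT vector: each dimension takes part in the move with probability prt
    prt_vector = (rng.random(individual.size) < prt).astype(float)
    n_jumps = int(path_length / step + 1e-9)
    for k in range(1, n_jumps + 1):
        t = k * step  # t > 1 overshoots the leader, exploring beyond it
        candidate = individual + (leader - individual) * t * prt_vector
        val = func(candidate)
        evaluations += 1
        if val < best_val:
            best_pos, best_val = candidate, val
    return best_pos, best_val, evaluations

def sphere(x):
    return float(np.sum(x**2))

rng = np.random.default_rng(5)
start = np.array([40.0, -25.0, 10.0])
leader = np.array([1.0, 2.0, -1.0])
print(soma_ato_migrate(start, leader, sphere, rng))
```

Because only the best point along the path is kept, a migration can never make an individual worse, and the overshoot past the leader is what gives the scheme its exploratory character.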

While overall great, it did poorly in:

- **Ackley 1st altered**: Performance in higher dimensions was bad compared to other benchmarks. This is mainly due to the difficult nature of the model, which doesn't favor exploration and requires the algorithm to have a "committing" aspect that could inch it towards local minima in small steps.

- **Svanda 2nd**: The nature of this function didn't incentivize exploration, especially around the global minimum. Combined with the model being that much more difficult in higher dimensions, this resulted in poor performance.

- **Svanda 6th**

#### SOMA ata

This strategy suffers heavily from the same migration steps that helped SOMA ato. Here much of the algorithm's steps/time is wasted on exploring the local area around each particle. As leaders swap places, the whole population moves in the direction of the current leader, but only by a small amount. This is especially visible in 30 dimensions, where it simply cannot explore enough.

Outside of dimension 2, it is barely able to do better than produce the worst result.
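A rough evaluation count suggests why the all-to-all variant runs out of budget. The sketch assumes the textbook schemes (in all-to-one every non-leader migrates towards one leader; in all-to-all every individual migrates towards every other individual) and hypothetical values `path_length=3.0` and `step=0.33`; the actual defaults of `ea.soma_ato` and `ea.soma_ata` are not shown in this notebook.

```python
def evals_per_migration(pop_size, path_length, step, all_to_all):
    """Function evaluations spent in one migration loop (textbook SOMA)."""
    jumps = int(path_length / step + 1e-9)
    if all_to_all:
        movers, targets = pop_size, pop_size - 1
    else:
        movers, targets = pop_size - 1, 1
    return movers * targets * jumps

# Settings used by the notebook for 30 dimensions
budget = 30 * 2000   # dim * FUNCTION_EVALUATIONS
pop_size = 50        # POPULATION_SIZES[2]

ato = evals_per_migration(pop_size, 3.0, 0.33, all_to_all=False)
ata = evals_per_migration(pop_size, 3.0, 0.33, all_to_all=True)

print(ato, budget // ato)  # 441 136
print(ata, budget // ata)  # 22050 2
```

Under these assumptions the all-to-one variant completes well over a hundred migration loops within the budget, while the all-to-all variant completes only two, leaving the population little chance to move far from its starting region.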

