# Evolutionary Computation - Assignment 9: Hybrid evolutionary algorithm
Bartosz Stachowiak 148259<br>
Andrzej Kajdasz 148273

## 1. Problem Statement

Each instance is given as a table of integers in which every row describes one node: its x and y coordinates in the plane and the cost associated with the node. There were 4 such data sets, each consisting of 200 rows (each representing a single node).

The task is to select exactly 50% of the nodes (rounding up if the number of nodes is odd) and to build a Hamiltonian cycle (a closed path) through the selected subset. The goal is to minimize the sum of the total path length and the total cost of the selected nodes.

Distances between nodes were computed as Euclidean distances rounded to the nearest integer. As suggested, the distances were calculated once after loading the data and stored in a matrix, so that subsequent evaluations only had to look these values up, which reduced the running cost of the algorithm.
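The precomputed distance matrix and the objective described above can be sketched as follows. This is a minimal illustration rather than the code used in the experiments: the function names `distance_matrix` and `evaluate`, their signatures, and the toy instance are assumptions made for this example. Note that `np.rint` rounds exact halves to the nearest even integer, which may differ from other rounding conventions.

```python
import numpy as np


def distance_matrix(coords: np.ndarray) -> np.ndarray:
    """Pairwise Euclidean distances, rounded to the nearest integer."""
    # broadcasting yields an (n, n, 2) array of coordinate differences
    diff = coords[:, None, :] - coords[None, :, :]
    return np.rint(np.sqrt((diff ** 2).sum(axis=-1))).astype(np.int64)


def evaluate(cycle, distances: np.ndarray, costs: np.ndarray) -> int:
    """Length of the closed cycle plus the summed cost of its nodes."""
    cycle = np.asarray(cycle)
    # pair every node with its successor; np.roll closes the cycle
    edge_length = distances[cycle, np.roll(cycle, -1)].sum()
    return int(edge_length + costs[cycle].sum())


# hypothetical toy instance: each row is (x, y, cost)
instance = np.array([[0, 0, 5], [3, 4, 1], [6, 8, 2], [0, 8, 7]])
distances = distance_matrix(instance[:, :2])
total = evaluate([0, 1, 2], distances, instance[:, 2])
```

Because the matrix is built once, each evaluation reduces to index lookups and two sums.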

To solve the problem, a hybrid evolutionary algorithm was used with the following parameters:
- Elite population - variable sizes from 5 to 75
- Steady state algorithm
- Parents selected from the population with uniform probability
- No copies of the same solution in the population (checked by comparing costs)
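The steady-state update with uniform parent selection and the cost-based duplicate check can be sketched as below. This is an illustrative sketch under stated assumptions: the helper names `select_parents` and `steady_state_update`, the `(solution, cost)` tuple layout, and the toy population are chosen for this example only.

```python
import random


def select_parents(population, rng):
    """Draw two distinct population members with uniform probability."""
    return rng.sample(population, 2)


def steady_state_update(population, child, child_cost):
    """Replace the worst member with the child if it is better and not a duplicate.

    population is a list of (solution, cost) tuples sorted by ascending cost.
    Returns True when the child was accepted.
    """
    # duplicates are detected by comparing costs only
    if any(cost == child_cost for _, cost in population):
        return False
    # the child must beat the current worst member
    if child_cost >= population[-1][1]:
        return False
    population[-1] = (child, child_cost)
    population.sort(key=lambda entry: entry[1])
    return True


# hypothetical toy population of (solution, cost) tuples
population = [([0, 1, 2], 100), ([0, 2, 3], 120), ([1, 2, 3], 150)]
parent_1, parent_2 = select_parents(population, random.Random(0))
accepted_new = steady_state_update(population, [0, 1, 3], 110)
rejected_duplicate = steady_state_update(population, [3, 1, 0], 100)
rejected_worse = steady_state_update(population, [2, 1, 3], 500)
```

Comparing costs is a cheap proxy for solution equality: two different cycles with the same cost are treated as duplicates, which trades a few false rejections for a constant-time check per member.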

## 2. Pseudocode of all implemented algorithms

```
function generate_population(solution, nodes, distances, pop_size):
    population = []
    while len(population) < pop_size:
        solution = generate_random_solution(nodes)
        cost = evaluate(solution, nodes, distances)
    
        in_population = False
        for instance in population:
            if instance[1] == cost:
                in_population = True
                break

        if not in_population:
            population.append((solution, cost))

    population.sort(key=lambda x: x[1])
    return population


function select_parents(population):
    indices = shuffle(range(len(population)))
    parent_1 = population[indices[0]]
    parent_2 = population[indices[1]]
    return parent_1, parent_2


function edges_to_segments(edges):
    segments = []
    for edge in edges:
        matching_segments = []
        edge_cant_be_used = False

        for segment in segments:

            for i in range(1, len(segment) - 1):
                if segment[i] in (edge[0], edge[1]):
                    edge_cant_be_used = True
                    break

            if edge_cant_be_used:
                break

            if segment[0] in (edge[0], edge[1]) or segment[-1] in (edge[0], edge[1]):
                matching_segments.append(segment)
        
        if edge_cant_be_used:
            continue

        if len(matching_segments) == 0:
            segments.append(list(edge))  # copy into a mutable list so it can be extended later
        
        elif len(matching_segments) == 1:
            segment = matching_segments[0]
            if segment[0] == edge[0]:
                segment.insert(0, edge[1])
            elif segment[0] == edge[1]:
                segment.insert(0, edge[0])
            elif segment[-1] == edge[0]:
                segment.append(edge[1])
            elif segment[-1] == edge[1]:
                segment.append(edge[0])
        
        elif len(matching_segments) == 2:
            segment_1 = matching_segments[0]
            segment_2 = matching_segments[1]

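            # orient segment_1 so that it ends at the connecting edge
            if segment_1[0] == edge[0] or segment_1[0] == edge[1]:
                segment_1.reverse()

            # orient segment_2 so that it starts at the connecting edge
            if segment_2[-1] == edge[0] or segment_2[-1] == edge[1]:
                segment_2.reverse()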
            
            segment_1.extend(segment_2)
            segments.remove(segment_2)
        
    return segments


function recombine(parent_1, parent_2, nodes, distances):
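    # undirected edges stored as sorted node pairs, so (a, b) and (b, a) compare equal
    p1_edges = set(tuple(sorted((parent_1[i], parent_1[(i + 1) % len(parent_1)]))) for i in range(len(parent_1)))
    p2_edges = set(tuple(sorted((parent_2[i], parent_2[(i + 1) % len(parent_2)]))) for i in range(len(parent_2)))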

    common_edges = p1_edges.intersection(p2_edges)

    # extract common edges into longest common segments
    segments = edges_to_segments(common_edges)

    shuffle(segments)

    # concatenate the shuffled segments; the child stays empty when the parents share no edges
    child = []
    for segment in segments:
        child.extend(segment)

    solver = GreedyCycleSolver()
    child = solver.solve(child, nodes, distances, len(nodes) // 2)

    return child


function genetic_local_search(solution, nodes, distances, max_time, local_search, pop_size):
    
    # population is a list of (solution, cost) tuples sorted by cost
    population = generate_population(solution, nodes, distances, pop_size)

    start_time = time()
    iteration = 0

    while time() - start_time < max_time:
        iteration += 1

        # parents are sampled from the population uniformly, without replacement
        parent_1, parent_2 = select_parents(population)

        # population entries are (solution, cost) tuples; only the solutions are recombined
        child = recombine(parent_1[0], parent_2[0], nodes, distances)

        if local_search:
            child = local_search(child, nodes, distances)
        
        new_cost = evaluate(child, nodes, distances)
        
        in_population = False

        for instance in population:
            if instance[1] == new_cost:
                in_population = True
                break
        
        if new_cost < population[-1][1] and not in_population:
            population[-1] = (child, new_cost)
            population.sort(key=lambda x: x[1])

    return population[0][0]
```
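For reference, the extraction of common segments can also be written as a short runnable Python sketch. It uses a different technique from the incremental merging in the pseudocode above: it builds an adjacency map of the common edges and walks each path from one of its endpoints. The function names `cycle_edges` and `common_segments` are assumptions made for this example, and the sketch assumes each parent is a simple cycle of at least three distinct nodes.

```python
from collections import defaultdict


def cycle_edges(cycle):
    """Undirected edges of a closed cycle, each as a frozenset of two nodes."""
    n = len(cycle)
    return {frozenset((cycle[i], cycle[(i + 1) % n])) for i in range(n)}


def common_segments(parent_1, parent_2):
    """Maximal paths made only of edges present in both parents."""
    common = cycle_edges(parent_1) & cycle_edges(parent_2)

    # every node has at most two common neighbours, because each parent is a cycle
    adjacency = defaultdict(list)
    for edge in common:
        a, b = tuple(edge)
        adjacency[a].append(b)
        adjacency[b].append(a)

    segments, visited = [], set()
    # walk each path from an endpoint (a node with exactly one common neighbour);
    # identical parents share a full cycle with no endpoints and yield no segments
    for start in sorted(adjacency):
        if start in visited or len(adjacency[start]) != 1:
            continue
        segment, previous, node = [start], None, start
        visited.add(start)
        while True:
            following = [v for v in adjacency[node] if v != previous]
            if not following:
                break
            previous, node = node, following[0]
            segment.append(node)
            visited.add(node)
        segments.append(segment)
    return segments


segments = common_segments([0, 1, 2, 3, 4, 5], [0, 1, 2, 5, 4, 3])
partial_overlap = common_segments([0, 1, 2, 3], [1, 2, 7, 8])
```

In the first call the parents share the edges 0-1, 1-2, 3-4 and 4-5, which form two segments; in the second call the parents use different node sets and share only the edge 1-2.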

## 3. Results of the computational experiments

In [None]:
import json
import pathlib
import itertools

import numpy as np
import matplotlib.pyplot as plt

import pandas as pd
from common import *

In [None]:
DATA_FOLDER = '../data/'
OLD_RESULTS_FOLDER = f'{DATA_FOLDER}old_results/'
RESULT_FOLDER = f'{DATA_FOLDER}results/'
INSTANCE_FOLDER = f'{DATA_FOLDER}tsp_instances/'

SOLVERS = {
    "lseo-5-r": "Genetic LS, without post-recombine LS (Pop 5)",
    "lseo-10-r": "Genetic LS, without post-recombine LS (Pop 10)",
    "lseo-20-r": "Genetic LS, without post-recombine LS (Pop 20)",
    "lseo-30-r": "Genetic LS, without post-recombine LS (Pop 30)",
    "lseo-50-r": "Genetic LS, without post-recombine LS (Pop 50)",
    "lseo-75-r": "Genetic LS, without post-recombine LS (Pop 75)",
    "lsep-5-r": "Genetic LS, with steepest post-recombine LS (Pop 5)",
    "lsep-10-r": "Genetic LS, with steepest post-recombine LS (Pop 10)",
    "lsep-20-r": "Genetic LS, with steepest post-recombine LS (Pop 20)",
    "lsep-30-r": "Genetic LS, with steepest post-recombine LS (Pop 30)",
    "lsep-50-r": "Genetic LS, with steepest post-recombine LS (Pop 50)",
    "lsep-75-r": "Genetic LS, with steepest post-recombine LS (Pop 75)"
}

OLD_SOLVERS = {
    "lsnp-10-r" : "LNS Steepest LS (D10)",
    "lsnp-20-r" : "LNS Steepest LS (D20)",
    "lsnp-30-r" : "LNS Steepest LS (D30)",
    "lsnp-50-r" : "LNS Steepest LS (D50)",
    "lsnp-75-r" : "LNS Steepest LS (D75)",
    "lsno-75-r" : "LNS no LS (D75)",
}

OLD_SOLVERS_NO_ITER = {
    "lsm-r" : "Steepest Multi Start LS",
    "lsi-10-r" : "Iterated LS (Perturbation size 10)",
    "lsi-20-r" : "Iterated LS (Perturbation size 20)",
}
SOLVERS_TO_PLOT = SOLVERS.copy()
OLD_SOLVERS = {**OLD_SOLVERS_NO_ITER, **OLD_SOLVERS}
SOLVERS = {**OLD_SOLVERS, **SOLVERS}
NUM_NODES = 200

instance_files = [path for path in pathlib.Path(INSTANCE_FOLDER).iterdir() if path.is_file()]
instance_names = [path.name[:4] for path in instance_files]
p_sizes = [5, 10, 20, 30, 50, 75]

In [None]:
instances_data = {
    name: read_instance(f'{INSTANCE_FOLDER}{name}.csv')
    for name in instance_names
}

In [None]:
instances_solvers_pairs = itertools.product(instances_data.keys(), SOLVERS.keys())

all_results = {}
all_costs = {}
all_times = {}
all_stats = {}
all_no_iterations = {}

for instance, solver in instances_solvers_pairs:
    all_results[instance] = all_results.get(instance, {})
    all_costs[instance] = all_costs.get(instance, {})
    all_times[instance] = all_times.get(instance, {})
    all_stats[instance] = all_stats.get(instance, {})
    all_no_iterations[instance] = all_no_iterations.get(instance, {})
    costs = []
    times = []
    paring_results = []
    iterations = []
    for idx in range(20):
        folder = OLD_RESULTS_FOLDER if solver in OLD_SOLVERS else RESULT_FOLDER
        if solver in OLD_SOLVERS_NO_ITER:
            solution, cost, time = read_solution(f'{folder}{instance}-{solver}-{idx}.txt')
        else:
            solution, cost, time, no_iterations = read_solution_three_feature(f'{folder}{instance}-{solver}-{idx}.txt')
            iterations.append(no_iterations)
        paring_results.append(solution)
        costs.append(cost)
        times.append(time)
        
    all_results[instance][solver] = np.array(paring_results)
    all_costs[instance][solver] = np.array(costs)
    all_stats[instance][solver] = {
        'mean': np.mean(costs),
        'std': np.std(costs),
        'min': np.min(costs),
        'max': np.max(costs),
    }
    all_times[instance][solver] = {
        'mean': np.mean(times),
        'std': np.std(times),
        'min': np.min(times),
        'max': np.max(times),
    }
    if solver not in OLD_SOLVERS_NO_ITER:
        all_no_iterations[instance][solver] = {
            'mean': np.mean(iterations),
            'std': np.std(iterations),
            'min': np.min(iterations),
            'max': np.max(iterations),
        }

In [None]:
costs_df = pd.DataFrame(all_stats).T
max_df = pd.DataFrame(all_stats).T
min_df = pd.DataFrame(all_stats).T
iterations_df = pd.DataFrame(all_no_iterations).T

for column in SOLVERS.keys():
    costs_df[column] = costs_df[column].apply(lambda x: f'{x["mean"]:.0f} ({x["min"]:.0f} - {x["max"]:.0f})')
    max_df[column] = max_df[column].apply(lambda x: x['max'])
    min_df[column] = min_df[column].apply(lambda x: x['min'])
    if column not in OLD_SOLVERS_NO_ITER:
        iterations_df[column] = iterations_df[column].apply(lambda x: f'{x["mean"]:.0f} ({x["min"]:.0f} - {x["max"]:.0f})')

for df in [costs_df, max_df, min_df, iterations_df]:
    df.rename(columns=SOLVERS, inplace=True)

### 3.1. Visualizations and statistics of cost for all dataset-algorithm pairs

The table below presents the mean, minimum and maximum cost achieved by each algorithm on each dataset.

In [None]:
print("Mean (min-max) of the costs:")

best_means = {
    instance: min(all_stats[instance][solver]['mean'] for solver in SOLVERS.keys())
    for instance in instance_names
}

def apply_style(v: str, best_val: float):
    num = v.split()[0]
    try:
        num = float(num)
    except ValueError:
        return ""
    if round(num) == round(best_val):
        return "font-weight: bold; color: red"
    return ""
    


costs_df.T.style.apply(lambda x: [
    apply_style(v, best_means[x.index[i]])
    for i, v in enumerate(x)
], axis = 1)

### 3.2. Mean number of iterations

Time limits:
- TSPA: 9.12 s
- TSPB: 8.62 s
- TSPC: 6.5 s
- TSPD: 5.4 s

In [None]:
print("Mean (min-max) of the iterations:")
iterations_df.T

### 3.3. Visualizations of the impact of population size on the number of iterations and the mean cost

In [None]:
fig, axs = plt.subplots(2, 2, figsize=(15, 11), sharex=True, sharey='row')

for instance in instances_data.keys():
    
    axs[0][0].plot(
        p_sizes,
        [all_stats[instance][f"lseo-{n}-r"]['mean'] for n in p_sizes],
        label=instance,
        marker='o', 
        linestyle='dashed'
    )
    axs[0][1].plot(
        p_sizes,
        [all_stats[instance][f"lsep-{n}-r"]['mean'] for n in p_sizes],
        label=instance,
        marker='o', 
        linestyle='dashed'
    )
    axs[1][0].plot(
        p_sizes,
        [all_no_iterations[instance][f"lseo-{n}-r"]['mean'] for n in p_sizes],
        label=instance,
        marker='o', 
        linestyle='dashed'
    )
    axs[1][1].plot(
        p_sizes,
        [all_no_iterations[instance][f"lsep-{n}-r"]['mean'] for n in p_sizes],
        label=instance,
        marker='o', 
        linestyle='dashed'
    )

plt.suptitle(f'Genetic LS stats per population size')

axs[0][0].set_title("Mean cost (no post-recombine LS)")
axs[0][0].set_xlabel('Size of population')
axs[0][0].set_ylabel('Mean cost')

axs[0][1].set_title('Mean cost (with post-recombine LS)')
axs[0][1].set_xlabel('Size of population')
axs[0][1].set_ylabel('Mean cost')

axs[1][0].set_title('Mean number of iterations (no post-recombine LS)')
axs[1][0].set_xlabel('Size of population')
axs[1][0].set_ylabel('Number of iterations')

axs[1][1].set_title('Mean number of iterations (with post-recombine LS)')
axs[1][1].set_xlabel('Size of population')
axs[1][1].set_ylabel('Number of iterations')

plt.legend()
plt.show()

## 4. Best solutions for all datasets and algorithms

To more easily compare the results, we present the best solutions for each dataset side by side.

The weight of each node is denoted both by its size and color. The bigger and brighter the node, the higher its weight.

### 4.1. New algorithms

In [None]:
for solver_idx, solver in enumerate(SOLVERS_TO_PLOT.keys()):
    fig, axs = plt.subplots(1, 4, figsize=(20, 5))
    for idx, instance in enumerate(instances_data.keys()):
        best_instance_idx = np.argmin(all_costs[instance][solver])
        plot_solution_for_instance(instances_data[instance], all_results[instance][solver][best_instance_idx], axs[idx])
        axs[idx].set_title(f'{instance}: {all_costs[instance][solver][best_instance_idx]:.0f}')
    fig.suptitle(f'{SOLVERS[solver]}', fontsize=16, y=1.05)
plt.show()

### 4.2. Best solution for each instance from all algorithms

In [None]:
fig, axs = plt.subplots(1, 4, figsize=(20, 5))
for idx, instance in enumerate(instances_data.keys()):
    best_cost =  np.inf
    for solver_idx, solver in enumerate(SOLVERS.keys()):
         if best_cost > np.min(all_costs[instance][solver]):
                best_cost = np.min(all_costs[instance][solver])
                best_result = all_results[instance][solver][np.argmin(all_costs[instance][solver])], 
                best_solver = solver
    best_instance_idx = np.argmin(all_costs[instance][best_solver])
    plot_solution_for_instance(instances_data[instance], all_results[instance][best_solver][best_instance_idx], axs[idx])
    axs[idx].set_title(f'{instance}: {all_costs[instance][best_solver][best_instance_idx]:.0f}')
    print(instance)
    print(f'\tSolver: {SOLVERS[best_solver]}, Total cost: {best_cost}')
    nodes = list(best_result[0])
    if 0 in best_result[0]:
        zero_index = np.where(best_result[0] == 0)[0][0]
        nodes = list(best_result[0][zero_index:])+list(best_result[0][:zero_index])
    print(f'\t Nodes: {nodes}\n')
plt.show()

## 5. Source Code

[GitHub](https://github.com/Tremirre/ECP)

## 6. Conclusions



Analyzing the results and visualizations, one can come to several conclusions about the algorithms used in the task:
- **Genetic Local Search without post-recombine Local Search** achieved **worse results on average** than all the alternatives considered.
- Genetic Local Search with post-recombine Local Search achieved better mean results than both Steepest Multi Start Local Search and Iterated Local Search on all instances. Compared to LNS with Steepest LS, the results are comparable and vary by instance. For TSPA, TSPC and TSPD, Large Neighbourhood Search was better, but the difference between the best mean results of the two algorithms does not exceed 1000. For TSPB, Genetic LS achieved better results, but the improvement is small (<1%).
- Overall, GLS failed to find better solutions than LNS.
- Post-recombination local search in GLS significantly improves performance of the algorithm. 
- The mean cost of the solutions found by GLS with post-recombine LS **decreases with the increase of the population size** (albeit only by a small margin), with best performance observed for pop size of 5. On the contrary, the mean cost of the solutions found by GLS without post-recombine LS **increases with the increase of the population size**. 
- The mean number of iterations **decreases with the increase of the population size** for both GLS with and without post-recombine LS.
- The mean number of iterations for GLS with post-recombine LS is **significantly lower** than for GLS without post-recombine LS due to additional local search on each iteration.