# Evolutionary Computation - Assignment 10: Own Method
Bartosz Stachowiak 148259<br>
Andrzej Kajdasz 148273

## 1. Problem Statement

The data consists of columns of integers describing nodes. Each row corresponds to a single node and contains its x and y coordinates in a plane, as well as a cost associated with the node. There are 4 such datasets, each consisting of 200 rows (one per node).

The problem is to choose exactly 50% of the nodes (rounding up if the number of nodes is odd) and create a Hamiltonian cycle (a closed path) through this subset of nodes. The goal is to minimize the sum of the total length of the path and the total cost of the selected nodes.
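The objective can be stated formally as follows. The notation is introduced here for illustration only:

- $V$ is the set of nodes.
- $d(u, v)$ is the rounded Euclidean distance between two nodes.
- $c(v)$ is the cost of node $v$.
- $(v_1, \dots, v_k)$ is the cyclic order in which the selected subset $S$ is visited.

$$
\min_{\substack{S \subseteq V \\ |S| = k = \lceil |V|/2 \rceil}} \; \min_{(v_1, \dots, v_k)} \left( \sum_{i=1}^{k} d\left(v_i, v_{(i \bmod k) + 1}\right) + \sum_{v \in S} c(v) \right)
$$

For the datasets used here $|V| = 200$, so every solution contains $k = 100$ nodes.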

Distances between nodes were calculated with the Euclidean distance formula and rounded to the nearest integer. As suggested, the distances were computed once after loading the data and stored in a matrix. During the subsequent evaluation, each distance therefore only had to be read from the matrix, which reduced the running cost of the algorithm.
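A minimal NumPy sketch of the precomputed distance matrix and the objective evaluation is shown below. The function names `distance_matrix` and `evaluate` are hypothetical and do not come from our C++ implementation. The sketch assumes a solution is a list of node indices in cycle order.

```python
import numpy as np


def distance_matrix(coords):
    """Pairwise Euclidean distances rounded to the nearest integer."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]  # shape (n, n, 2)
    return np.rint(np.sqrt((diff ** 2).sum(axis=-1))).astype(int)


def evaluate(cycle, dist, costs):
    """Objective: length of the closed path plus the costs of the visited nodes.

    `dist` is the NumPy matrix returned by `distance_matrix`, so every edge
    length is a table lookup instead of a square root.
    """
    cycle = np.asarray(cycle)
    successors = np.roll(cycle, -1)  # next node on the cycle, wrapping around
    return int(dist[cycle, successors].sum() + np.asarray(costs)[cycle].sum())
```

Since the coordinates are integers, a distance can never lie exactly halfway between two integers. The tie-breaking rule of `np.rint` (round half to even) therefore does not affect the result.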

As the algorithm for the problem, we improved upon the Genetic Local Search (GLS) from the previous assignment. We tested the improved version with elite population sizes of 5, 10, 20, 30 and 50.

## 2. Adjustments to the Algorithm

### 2.1. Motivation behind the changes

Looking back at our results from the previous assignments, we noticed one glaring issue that might have caused the poor performance of our algorithm: the low number of iterations performed by the GLS. The best results were obtained by the GLS with the lowest population size (5), which, coincidentally, also performed the highest number of iterations.

We hypothesized that improving the time efficiency of the algorithm, and thereby increasing the number of iterations, would lead to better results.

The algorithm had two places where it could be improved in this regard:
- initialization of the population
- recombination operator

We tested various optimisations of the population initialization, such as:
- simplifying the generation of the initial solutions by using GreedyCycleSearch directly instead of applying Local Search to randomly generated solutions,
- loosening the requirement that the generated solutions be unique.

We concluded that the setup originally used in the previous assignments was the best.
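The uniqueness requirement on the initial population could look like the following illustrative Python sketch, which is not our implementation. It assumes two solutions are duplicates when they describe the same cycle up to rotation and direction. `make_solution` is a hypothetical stand-in for generating a random solution and improving it with Local Search.

```python
def canonical(cycle):
    """Rotate and, if needed, reverse a cycle so that equal tours compare equal."""
    start = cycle.index(min(cycle))
    rotated = cycle[start:] + cycle[:start]
    reflected = [rotated[0]] + rotated[1:][::-1]  # same tour, opposite direction
    return tuple(min(rotated, reflected))


def init_population(size, make_solution, max_attempts=10_000):
    """Collect up to `size` pairwise-distinct solutions from `make_solution()`.

    May return fewer than `size` solutions if `max_attempts` is exhausted.
    """
    population, seen = [], set()
    for _ in range(max_attempts):
        if len(population) == size:
            break
        solution = make_solution()
        key = canonical(solution)
        if key not in seen:  # reject tours already in the population
            seen.add(key)
            population.append(solution)
    return population
```

Loosening the requirement, as tested above, would amount to skipping the `seen` check.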

Looking at the recombination operator, we noticed that our original approach (finding the longest common subsequences and filling the rest of the path with the remaining nodes) was quite complex and time-consuming. We speculated that simplifying the operator could lead to better results.

After testing several variations, the most efficient one turned out to be the simplest. It iterates over the 100 positions of a solution and, at every position, chooses the node with the lower weight from the two parents. If the selected node is already present in the child, it is skipped and the operator moves on to the next position. Finally, the partial solution is completed using GreedyCycleSearch.

### 2.2 Pseudocode of the new recombination operator

```
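function recombine(parent_1, parent_2, nodes, distances):
    child = []
    # walk both parents position by position (each solution has 100 nodes)
    for i in range(100):
        first_parent_node = parent_1[i]
        second_parent_node = parent_2[i]

        # keep the node with the lower weight (ties go to parent_1)
        selected_node = first_parent_node
        if nodes[selected_node].weight > nodes[second_parent_node].weight:
            selected_node = second_parent_node

        # skip nodes that are already in the child
        if selected_node not in child:
            child.append(selected_node)

    # complete the partial cycle back to a full solution
    solver = GreedyCycleSearch(nodes, distances)
    child = solver.solve(child)
    return child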
```
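The pseudocode above relies on the `GreedyCycleSearch` solver from our code base. Below is a minimal, self-contained Python sketch of the same operator. It assumes the repair step is a greedy cycle insertion that minimises the added path length plus the weight of the inserted node. The names `greedy_cycle_complete`, `weights` and `dist` are hypothetical and not the identifiers used in the C++ implementation.

```python
def greedy_cycle_complete(partial, weights, dist, target_size):
    """Extend a partial cycle to `target_size` nodes (assumed greedy cycle repair).

    At every step the (node, position) pair with the smallest increase in
    path length plus node weight is inserted into the cycle.
    """
    cycle = list(partial)
    remaining = set(range(len(weights))) - set(cycle)
    while len(cycle) < target_size:
        best = None  # (delta, node, insert_position)
        for node in remaining:
            for pos in range(len(cycle)):
                a, b = cycle[pos], cycle[(pos + 1) % len(cycle)]
                delta = dist[a][node] + dist[node][b] - dist[a][b] + weights[node]
                if best is None or delta < best[0]:
                    best = (delta, node, pos + 1)
        _, node, insert_at = best
        cycle.insert(insert_at, node)
        remaining.remove(node)
    return cycle


def recombine(parent_1, parent_2, weights, dist):
    """Keep the lighter node at each position, drop duplicates, then repair."""
    child, seen = [], set()
    for first, second in zip(parent_1, parent_2):
        # ties go to parent_1, as in the pseudocode
        selected = first if weights[first] <= weights[second] else second
        if selected not in seen:
            seen.add(selected)
            child.append(selected)
    return greedy_cycle_complete(child, weights, dist, len(parent_1))
```

The `seen` set makes the duplicate check constant-time instead of scanning the child list. The naive repair loop is quadratic per insertion, which is acceptable for a sketch but not for production code.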

### 2.3. Fixing the bug in evaluation scripts

When generating computational results for the EC assignments, we used PowerShell scripts that called the compiled C++ program with various parameters. Each script ran the program for a given set of parameters and then saved the results to a file.

To speed up the process of generating the results, we started many instances of the program in parallel, each with different parameters. This was done using the `ForEach -Parallel` PowerShell command, with the number of parallel instances set to **48**.

This number was chosen at the beginning because it was sure to utilize all logical cores of the CPU, and it was not changed afterwards. It did not cause any issues, as the initial assignments did not limit the running time of the algorithm.

However, ILS, LNS and finally GLS were time-bounded, and running that many instances in parallel distorted their results. The machine running the script had only **16** logical cores, so at most **16** instances could run in parallel without competing for CPU time.

For time bounding we used the `std::chrono` library, which is based on the system clock. It therefore measures wall-clock time, i.e. the time that passes on the machine running the program, rather than the CPU time actually given to the process. When more instances run in parallel than there are logical cores, each instance spends part of its time budget waiting while other instances run. It consequently performs fewer iterations than it would on an otherwise idle machine.
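The difference between wall-clock time and CPU time can be illustrated with a small Python sketch of a time-bounded loop. The original program is written in C++, and `run_time_bounded` and `step` are hypothetical names. In Python, `time.perf_counter()` measures elapsed time like the `std::chrono` clocks. In contrast, `time.process_time()` counts only the CPU time used by the current process.

```python
import time


def run_time_bounded(step, budget_s, clock=time.perf_counter):
    """Call `step()` until `budget_s` seconds have elapsed on `clock`.

    Returns the number of completed iterations. With the default wall-clock
    timer, time during which the process is descheduled (e.g. because other
    processes occupy all cores) still counts against the budget; with
    `clock=time.process_time` only CPU time used by this process counts.
    """
    start = clock()
    iterations = 0
    while clock() - start < budget_s:
        step()
        iterations += 1
    return iterations
```

A CPU-time budget would make the results independent of the machine load. However, each run would then take longer in wall-clock terms on an overloaded machine. Reducing the number of parallel instances, as described below, keeps wall-clock limits meaningful.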

We fixed this issue by reducing the number of instances run in parallel to **4**. This is well below the 16 logical cores, which ensures the measurements are not influenced. With this fix, each run received its full time budget, which further improved the results.
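The size of the effect can be estimated with simple arithmetic, sketched below in Python. The function `effective_cpu_budget` is hypothetical. The sketch assumes the operating system shares the cores fairly and ignores hyper-threading (a logical core is not a full physical core), memory contention and other overheads. The result is therefore an optimistic estimate.

```python
def effective_cpu_budget(wall_budget_s, n_processes, n_logical_cores):
    """Approximate CPU time each process gets within a wall-clock budget.

    Assumes fair sharing of cores; when there are at least as many cores as
    processes, every process gets the full budget.
    """
    share = min(1.0, n_logical_cores / n_processes)
    return wall_budget_s * share
```

Take the TSPA time limit of 9.12 s with 48 instances on 16 logical cores. Each run gets roughly a third of a core, i.e. about 3.04 s of CPU time (`effective_cpu_budget(9.12, 48, 16)`). With 4 instances, every run keeps the full 9.12 s (`effective_cpu_budget(9.12, 4, 16)`).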

## 3. Results of the computational experiments

In [None]:
import pathlib
import itertools

import numpy as np
import matplotlib.pyplot as plt

import pandas as pd
from common import *

In [None]:
DATA_FOLDER = '../data/'
OLD_RESULTS_FOLDER = f'{DATA_FOLDER}old_results/'
RESULT_FOLDER = f'{DATA_FOLDER}results/'
INSTANCE_FOLDER = f'{DATA_FOLDER}tsp_instances/'

SOLVERS = {
    "lsap-5-r": "Adjusted Genetic LS (Pop 5)",
    "lsap-10-r": "Adjusted Genetic LS (Pop 10)",
    "lsap-20-r": "Adjusted Genetic LS (Pop 20)",
    "lsap-30-r": "Adjusted Genetic LS (Pop 30)",
    "lsap-50-r": "Adjusted Genetic LS (Pop 50)",
}

OLD_SOLVERS = {
    "lsnp-20-r" : "LNS Steepest LS (D20)",
    "lsnp-50-r" : "LNS Steepest LS (D50)",
    "lsep-5-r": "Genetic LS, with steepest post-recombine LS (Pop 5)",
}

OLD_SOLVERS_NO_ITER = {
    "lsm-r" : "Steepest Multi Start LS",
    "lsi-10-r" : "Iterated LS (Perturbation size 10)",
    "lsi-20-r" : "Iterated LS (Perturbation size 20)",
}
SOLVERS_TO_PLOT = SOLVERS.copy()
OLD_SOLVERS = {**OLD_SOLVERS_NO_ITER, **OLD_SOLVERS}
SOLVERS = {**OLD_SOLVERS, **SOLVERS}
NUM_NODES = 200

instance_files = [path for path in pathlib.Path(INSTANCE_FOLDER).iterdir() if path.is_file()]
instance_names = [path.name[:4] for path in instance_files]
p_sizes = [5, 10, 20, 30, 50]

In [None]:
instances_data = {
    name: read_instance(f'{INSTANCE_FOLDER}{name}.csv')
    for name in instance_names
}

In [None]:
instances_solvers_pairs = itertools.product(instances_data.keys(), SOLVERS.keys())

all_results = {}
all_costs = {}
all_times = {}
all_stats = {}
all_no_iterations = {}

for instance, solver in instances_solvers_pairs:
    all_results[instance] = all_results.get(instance, {})
    all_costs[instance] = all_costs.get(instance, {})
    all_times[instance] = all_times.get(instance, {})
    all_stats[instance] = all_stats.get(instance, {})
    all_no_iterations[instance] = all_no_iterations.get(instance, {})
    costs = []
    times = []
    paring_results = []
    iterations = []
    for idx in range(20):
        folder = OLD_RESULTS_FOLDER if solver in OLD_SOLVERS else RESULT_FOLDER
        if solver in OLD_SOLVERS_NO_ITER:
            solution, cost, time = read_solution(f'{folder}{instance}-{solver}-{idx}.txt')
        else:
            solution, cost, time, no_iterations = read_solution_three_feature(f'{folder}{instance}-{solver}-{idx}.txt')
            iterations.append(no_iterations)
        paring_results.append(solution)
        costs.append(cost)
        times.append(time)
        
    all_results[instance][solver] = np.array(paring_results)
    all_costs[instance][solver] = np.array(costs)
    all_stats[instance][solver] = {
        'mean': np.mean(costs),
        'std': np.std(costs),
        'min': np.min(costs),
        'max': np.max(costs),
    }
    all_times[instance][solver] = {
        'mean': np.mean(times),
        'std': np.std(times),
        'min': np.min(times),
        'max': np.max(times),
    }
    if solver not in OLD_SOLVERS_NO_ITER:
        all_no_iterations[instance][solver] = {
            'mean': np.mean(iterations),
            'std': np.std(iterations),
            'min': np.min(iterations),
            'max': np.max(iterations),
        }

In [None]:
costs_df = pd.DataFrame(all_stats).T
max_df = pd.DataFrame(all_stats).T
min_df = pd.DataFrame(all_stats).T
iterations_df = pd.DataFrame(all_no_iterations).T

for column in SOLVERS.keys():
    costs_df[column] = costs_df[column].apply(lambda x: f'{x["mean"]:.0f} ({x["min"]:.0f} - {x["max"]:.0f})')
    max_df[column] = max_df[column].apply(lambda x: x['max'])
    min_df[column] = min_df[column].apply(lambda x: x['min'])
    if column not in OLD_SOLVERS_NO_ITER:
        iterations_df[column] = iterations_df[column].apply(lambda x: f'{x["mean"]:.0f} ({x["min"]:.0f} - {x["max"]:.0f})')

for df in [costs_df, max_df, min_df, iterations_df]:
    df.rename(columns=SOLVERS, inplace=True)

### 3.1. Visualizations and statistics of cost for all dataset-algorithm pairs

The table below presents the mean, minimum and maximum cost obtained by each algorithm on each dataset over 20 runs. The best mean for each dataset is highlighted in bold red.

In [None]:
print("Mean (min-max) of the costs:")

best_means = {
    instance: min(all_stats[instance][solver]['mean'] for solver in SOLVERS.keys())
    for instance in instance_names
}

def apply_style(v: str, best_val: float):
    num = v.split()[0]
    try:
        num = float(num)
    except ValueError:
        return ""
    if round(num) == round(best_val):
        return "font-weight: bold; color: red"
    return ""
    


costs_df.T.style.apply(lambda x: [
    apply_style(v, best_means[x.index[i]])
    for i, v in enumerate(x)
], axis = 1)

### 3.2 Mean number of iterations

Time limits for a single run on each instance:
- TSPA: 9.12 s
- TSPB: 8.62 s
- TSPC: 6.5 s
- TSPD: 5.4 s

In [None]:
print("Mean (min-max) of the iterations:")
iterations_df.T

### 3.3. Visualizations of the impact of population size on the number of iterations and the mean cost

In [None]:
fig, axs = plt.subplots(1, 2, figsize=(14, 5), sharex=True)

for instance in instances_data.keys():
    
    axs[0].plot(
        p_sizes,
        [all_stats[instance][f"lsap-{n}-r"]['mean'] for n in p_sizes],
        label=instance,
        marker='o', 
        linestyle='dashed'
    )
    axs[1].plot(
        p_sizes,
        [all_no_iterations[instance][f"lsap-{n}-r"]['mean'] for n in p_sizes],
        label=instance,
        marker='o', 
        linestyle='dashed'
    )

plt.suptitle('Adjusted Genetic LS stats per population size')

axs[0].set_title("Mean cost")
axs[0].set_xlabel('Size of population')
axs[0].set_ylabel('Mean cost')

axs[1].set_title('Number of iterations')
axs[1].set_xlabel('Size of population')
axs[1].set_ylabel('Number of iterations')

plt.legend()
plt.show()

## 4. Best solutions for all datasets and algorithms

To more easily compare the results, we present the best solutions for each dataset side by side.

The weight of each node is denoted both by its size and color. The bigger and brighter the node, the higher its weight.

### 4.1 New algorithms

In [None]:
for solver_idx, solver in enumerate(SOLVERS_TO_PLOT.keys()):
    fig, axs = plt.subplots(1, 4, figsize=(20, 5))
    for idx, instance in enumerate(instances_data.keys()):
        best_instance_idx = np.argmin(all_costs[instance][solver])
        plot_solution_for_instance(instances_data[instance], all_results[instance][solver][best_instance_idx], axs[idx])
        axs[idx].set_title(f'{instance}: {all_costs[instance][solver][best_instance_idx]:.0f}')
    fig.suptitle(f'{SOLVERS[solver]}', fontsize=16, y=1.05)
plt.show()

### 4.2 Best solution for each instance from all algorithms

In [None]:
fig, axs = plt.subplots(1, 4, figsize=(20, 5))
for idx, instance in enumerate(instances_data.keys()):
    # pick the solver whose best run has the lowest cost on this instance (first one wins ties)
    best_solver = min(SOLVERS.keys(), key=lambda s: np.min(all_costs[instance][s]))
    best_instance_idx = np.argmin(all_costs[instance][best_solver])
    best_cost = all_costs[instance][best_solver][best_instance_idx]
    best_result = all_results[instance][best_solver][best_instance_idx]
    plot_solution_for_instance(instances_data[instance], best_result, axs[idx])
    axs[idx].set_title(f'{instance}: {best_cost:.0f}')
    print(instance)
    print(f'\tSolver: {SOLVERS[best_solver]}, Total cost: {best_cost}')
    # rotate the cycle so that it starts at node 0 (if selected) for easier comparison
    nodes = best_result.tolist()
    if 0 in nodes:
        zero_index = nodes.index(0)
        nodes = nodes[zero_index:] + nodes[:zero_index]
    print(f'\t Nodes: {nodes}\n')
plt.show()

## 5. Source Code

[GitHub](https://github.com/Tremirre/ECP)

## 6. Conclusions

Analyzing the results and visualizations, we can draw several conclusions about the algorithms used in this task:
- The adjusted algorithm achieved better mean results than all the algorithms implemented so far.
- In GLS, the exploration of the solution space gained from a higher number of iterations appears to be more valuable than the one provided by a sophisticated recombination operator.
- For the TSPA and TSPB instances, the best results are identical to those obtained by Large Neighbourhood Search (LNS). These local optima may therefore also be the global optima, or very close to them. For TSPC, the best solution found is the best among all the algorithms. For the TSPD instance, the best cost is 100 higher than the one obtained by LNS.
- Contrary to the results of the previous assignment, the results now improve as the population size increases.
- Even though the improved algorithm decisively outperforms the previous ones on average, it still does not find better overall solutions for TSPA and TSPB (it only finds the same best solutions as LNS). This might suggest that these solutions are close to the global optimum.