# Evaluation of the performance of different configurations

Here we compare the performance of the different configurations, specified in `./configurations.json`, on the benchmark instances, specified in `./instances.json.zip`.
If you adapt the configurations or add new instances, you can re-run this notebook to evaluate the performance of the new configurations on the new instances.

In [1]:
from algbench import describe, read_as_pandas

## A look at the data format

The following output shows the format of the data produced by the benchmarking script.
Besides the actual results, it contains a lot of additional data (environment, arguments, output streams) that helps us investigate anomalies in the data, should any occur.

In [2]:
describe("./results")

 result:
| ub: 473.3792386060225
| lb: 441.44809426510574
| stats:
|| num_explored: 13954
|| num_branches: 13951
|| num_iterations: 15954
| n: 126
 timestamp: 2023-05-26T15:23:10.806535
 runtime: 120.2784583568573
 stdout: 
 stderr: 
 env_fingerprint: dff294d2e80a73937f84e9442a654782dd0bc7f4
 args_fingerprint: 665eafaacacc672b1ea4702fcccdb745b1ed97b3
 parameters:
| func: measure
| args:
|| instance_name: rnd_ol01/CETSP-200-0
|| configuration:
||| root: ConvexHull
||| branching: ChFarthestCircleSimplifying
||| search: DfsBfs
||| rules: []
||| num_threads: 8
|| timelimit: 120
 argv: ['/ibr/home/krupke/anaconda3/envs/mo310/lib/python3.10/site-packages/slurmina...
 env:
| hostname: algry01
| python_version: 3.10.8 (main, Nov 24 2022, 14:13:03) [GCC 11.2.0]
| python: /ibr/home/krupke/anaconda3/envs/mo310/bin/python3
| cwd: /misc/ibr/home/krupke/close-enough-tsp/benchmark
| environment: [{'name': 'tzdata', 'path': '/misc/ibr/home/krupke/anaconda3/envs/mo310/lib/p...
| git_revision: e8c23b053

## Extract data

We extract the important data from the output in `./results` and store it in a pandas dataframe.

Additionally, we remove the trivial instances, i.e., instances that even the worst algorithm could solve to near-optimality.

In [3]:
# parse results
result = read_as_pandas(
    "./results/",
    lambda data: {
        "instance": data["parameters"]["args"]["instance_name"],  # unique instance name
        "alg": str(data["parameters"]["args"]["configuration"]),  # a string representation of the algorithm configuration
        "conf": data["parameters"]["args"]["configuration"],  # the algorithm configuration as dictionary
        "lb": data["result"]["lb"],  # the lower bound computed by the algorithm
        "ub": data["result"]["ub"],  # the upper bound (objective value) computed by the algorithm
        "runtime": data["runtime"],  # the runtime in seconds
        "iterations": float(data["result"]["stats"]["num_iterations"]),  # the number of iterations performed by the algorithm
        "n": data["result"]["n"],  # the size of the instance, after preprocessing
    },
)
result["gap"] = (1-result["lb"]/result["ub"])*100   # compute the gap in percent
result["gap_above_tol"] = result["gap"].apply((lambda x: max(0, x-1.0)))   # compute the gap above 1.0% in percent. The algorithm will stop if this gap is reached, and there should be no penalty for stopping directly at 1.0% gap.
result.drop_duplicates(subset=["alg", "instance"], inplace=True)  # remove duplicate entries that can appear in rare cases as a result of parallel execution

# remove trivial instances
t = result.groupby("instance")[["gap_above_tol"]].max()  # compute the performance of the worst algorithm on each instance
trivial_instances  = t[t["gap_above_tol"]<0.1].index.unique().tolist()  # if even the worst algorithm is within 0.1% of the optimum, we consider the instance trivial
result = result[~result["instance"].isin(trivial_instances)] # remove these trivial instances

# show overview
result.groupby("alg")[["runtime", "gap", "gap_above_tol", "iterations"]].mean()

alg,runtime,gap,gap_above_tol,iterations
"{'root': 'ConvexHull', 'branching': 'ChFarthestCircle', 'search': 'DfsBfs', 'rules': [], 'num_threads': 8}",30.48792,1.264925,0.333889,8930.97619
"{'root': 'ConvexHull', 'branching': 'ChFarthestCircleSimplifying', 'search': 'CheapestBreadthFirst', 'rules': [], 'num_threads': 8}",33.282468,4.841202,4.515834,1885.494048
"{'root': 'ConvexHull', 'branching': 'ChFarthestCircleSimplifying', 'search': 'CheapestChildDepthFirst', 'rules': [], 'num_threads': 8}",27.783599,4.711472,3.786029,71077.285714
"{'root': 'ConvexHull', 'branching': 'ChFarthestCircleSimplifying', 'search': 'DfsBfs', 'rules': ['GlobalConvexHullRule'], 'num_threads': 8}",23.358498,1.07471,0.143705,11284.636905
"{'root': 'ConvexHull', 'branching': 'ChFarthestCircleSimplifying', 'search': 'DfsBfs', 'rules': ['LayeredConvexHullRule'], 'num_threads': 8}",24.222612,1.078644,0.160354,11235.178571
"{'root': 'ConvexHull', 'branching': 'ChFarthestCircleSimplifying', 'search': 'DfsBfs', 'rules': [], 'num_threads': 1}",33.242015,1.588238,0.658679,5667.077381
"{'root': 'ConvexHull', 'branching': 'ChFarthestCircleSimplifying', 'search': 'DfsBfs', 'rules': [], 'num_threads': 2}",32.302107,1.528298,0.594245,6467.910714
"{'root': 'ConvexHull', 'branching': 'ChFarthestCircleSimplifying', 'search': 'DfsBfs', 'rules': [], 'num_threads': 4}",29.422869,1.306026,0.374266,9142.017857
"{'root': 'ConvexHull', 'branching': 'ChFarthestCircleSimplifying', 'search': 'DfsBfs', 'rules': [], 'num_threads': 8}",27.617038,1.177952,0.247285,11129.363095
"{'root': 'LongestEdgePlusFurthestCircle', 'branching': 'ChFarthestCircleSimplifying', 'search': 'CheapestBreadthFirst', 'rules': [], 'num_threads': 8}",37.423035,5.532199,5.16631,2527.744048


You should use `gap_above_tol` as the main performance measure.
The lower the value, the better the performance. Make sure that the benchmark script has completed before you run this notebook.
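
To make the two gap measures concrete, here is a minimal sketch in plain Python that applies the same formulas as the extraction cell to the bounds of the sample entry shown by `describe("./results")`. The function names `optimality_gap` and `gap_above_tolerance` are illustrative and not part of the notebook or of `algbench`.

```python
def optimality_gap(lb: float, ub: float) -> float:
    """Relative gap between lower and upper bound, in percent of the upper bound."""
    return (1.0 - lb / ub) * 100.0


def gap_above_tolerance(gap: float, tolerance: float = 1.0) -> float:
    """Part of the gap (in percentage points) that exceeds the tolerance; never negative."""
    return max(0.0, gap - tolerance)


# Bounds of the sample entry printed in the data format section.
lb, ub = 441.44809426510574, 473.3792386060225
gap = optimality_gap(lb, ub)
print(f"gap           = {gap:.2f}%")
print(f"gap_above_tol = {gap_above_tolerance(gap):.2f}")

# A run that stops below the 1.0% tolerance is not penalized at all.
print(gap_above_tolerance(0.8))  # 0.0
```

Clamping at zero means that a configuration stopping at exactly 1.0% and one that happens to reach 0.5% are treated as equally good, which matches the stopping criterion described in the extraction cell.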

Runtime is also an important measure, but it is not the main measure.

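The `gap` and the `iterations` are skewed measures, but they are interesting to look at.
As you can see, a high number of iterations does not imply a good solution: `CheapestChildDepthFirst` performs by far the most iterations on average (about 71,000) but ends with a mean `gap_above_tol` of about 3.8, far worse than any `DfsBfs` configuration.
A very high iteration count can instead indicate that time has been wasted on iterations exploring the wrong parts of the search tree.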

The clear winners of this benchmark are the configurations `{'root': 'ConvexHull', 'branching': 'ChFarthestCircleSimplifying', 'search': 'DfsBfs', 'rules': ['GlobalConvexHullRule'], 'num_threads': 8}` and `{'root': 'ConvexHull', 'branching': 'ChFarthestCircleSimplifying', 'search': 'DfsBfs', 'rules': ['LayeredConvexHullRule'], 'num_threads': 8}`.

Actually, the instances seem too easy for the solver, as the average optimality gaps are very small.
We can look at the worst-case performance, too.
Here, the `LayeredConvexHullRule` shows a small advantage over the `GlobalConvexHullRule`.

In [9]:
result.groupby("alg")[["runtime", "gap_above_tol"]].max()

alg,runtime,gap_above_tol
"{'root': 'ConvexHull', 'branching': 'ChFarthestCircle', 'search': 'DfsBfs', 'rules': [], 'num_threads': 8}",120.247049,6.562996
"{'root': 'ConvexHull', 'branching': 'ChFarthestCircleSimplifying', 'search': 'CheapestBreadthFirst', 'rules': [], 'num_threads': 8}",120.138384,29.040468
"{'root': 'ConvexHull', 'branching': 'ChFarthestCircleSimplifying', 'search': 'CheapestChildDepthFirst', 'rules': [], 'num_threads': 8}",120.209541,25.651603
"{'root': 'ConvexHull', 'branching': 'ChFarthestCircleSimplifying', 'search': 'DfsBfs', 'rules': ['GlobalConvexHullRule'], 'num_threads': 8}",120.239268,4.865566
"{'root': 'ConvexHull', 'branching': 'ChFarthestCircleSimplifying', 'search': 'DfsBfs', 'rules': ['LayeredConvexHullRule'], 'num_threads': 8}",120.231247,4.846507
"{'root': 'ConvexHull', 'branching': 'ChFarthestCircleSimplifying', 'search': 'DfsBfs', 'rules': [], 'num_threads': 1}",120.122373,9.081383
"{'root': 'ConvexHull', 'branching': 'ChFarthestCircleSimplifying', 'search': 'DfsBfs', 'rules': [], 'num_threads': 2}",120.14799,8.561251
"{'root': 'ConvexHull', 'branching': 'ChFarthestCircleSimplifying', 'search': 'DfsBfs', 'rules': [], 'num_threads': 4}",120.184394,6.858841
"{'root': 'ConvexHull', 'branching': 'ChFarthestCircleSimplifying', 'search': 'DfsBfs', 'rules': [], 'num_threads': 8}",120.284601,5.745362
"{'root': 'LongestEdgePlusFurthestCircle', 'branching': 'ChFarthestCircleSimplifying', 'search': 'CheapestBreadthFirst', 'rules': [], 'num_threads': 8}",120.125684,29.3095


In [4]:
# number of non-trivial instances left
result["instance"].nunique()

168

In [5]:
# the number of instances solved to near-optimality by each configuration (configurations that solve more instances are better).
# Configurations that are not listed here did not solve any (non-trivial) instance to near-optimality.
result[result["gap_above_tol"]<=0.01].groupby(["alg"])[["instance"]].nunique()

alg,instance
"{'root': 'ConvexHull', 'branching': 'ChFarthestCircle', 'search': 'DfsBfs', 'rules': [], 'num_threads': 8}",139
"{'root': 'ConvexHull', 'branching': 'ChFarthestCircleSimplifying', 'search': 'CheapestBreadthFirst', 'rules': [], 'num_threads': 8}",128
"{'root': 'ConvexHull', 'branching': 'ChFarthestCircleSimplifying', 'search': 'CheapestChildDepthFirst', 'rules': [], 'num_threads': 8}",137
"{'root': 'ConvexHull', 'branching': 'ChFarthestCircleSimplifying', 'search': 'DfsBfs', 'rules': ['GlobalConvexHullRule'], 'num_threads': 8}",154
"{'root': 'ConvexHull', 'branching': 'ChFarthestCircleSimplifying', 'search': 'DfsBfs', 'rules': ['LayeredConvexHullRule'], 'num_threads': 8}",155
"{'root': 'ConvexHull', 'branching': 'ChFarthestCircleSimplifying', 'search': 'DfsBfs', 'rules': [], 'num_threads': 1}",135
"{'root': 'ConvexHull', 'branching': 'ChFarthestCircleSimplifying', 'search': 'DfsBfs', 'rules': [], 'num_threads': 2}",137
"{'root': 'ConvexHull', 'branching': 'ChFarthestCircleSimplifying', 'search': 'DfsBfs', 'rules': [], 'num_threads': 4}",140
"{'root': 'ConvexHull', 'branching': 'ChFarthestCircleSimplifying', 'search': 'DfsBfs', 'rules': [], 'num_threads': 8}",147
"{'root': 'LongestEdgePlusFurthestCircle', 'branching': 'ChFarthestCircleSimplifying', 'search': 'CheapestBreadthFirst', 'rules': [], 'num_threads': 8}",121


In [6]:
# Check that, for every instance, the best lower bound over all configurations does not exceed the best upper bound.
# We allow a tiny absolute tolerance of 0.001, because the lower bound is computed with inexact arithmetic, which can lead to small numerical errors.
import pandas as pd
t = pd.merge(result.groupby(["instance"])[["lb"]].max(), result.groupby(["instance"])[["ub"]].min(), on="instance")
(t["lb"]<=t["ub"]+0.001).all()

False
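
Since the check above returns `False`, at least one instance has a best lower bound that exceeds its best upper bound by more than the tolerance. The following is a minimal sketch, in plain Python, of how the offending instances could be listed so that they can be investigated. The function `find_bound_violations` and the toy rows are hypothetical and only illustrate the idea; they are not taken from `./results`.

```python
from collections import defaultdict


def find_bound_violations(rows, tol=0.001):
    """Return {instance: (best_lb, best_ub)} for instances where best_lb > best_ub + tol.

    rows is an iterable of dicts with the keys "instance", "lb", and "ub",
    one dict per (configuration, instance) run.
    """
    best_lb = defaultdict(lambda: float("-inf"))  # highest lower bound per instance
    best_ub = defaultdict(lambda: float("inf"))   # lowest upper bound per instance
    for row in rows:
        name = row["instance"]
        best_lb[name] = max(best_lb[name], row["lb"])
        best_ub[name] = min(best_ub[name], row["ub"])
    return {
        name: (best_lb[name], best_ub[name])
        for name in best_lb
        if best_lb[name] > best_ub[name] + tol
    }


# Made-up toy rows for illustration only.
toy_rows = [
    {"instance": "toy/a", "lb": 9.5, "ub": 10.0},
    {"instance": "toy/a", "lb": 9.8, "ub": 10.2},
    {"instance": "toy/b", "lb": 5.0, "ub": 5.4},
    {"instance": "toy/b", "lb": 5.3, "ub": 5.2},  # these bounds contradict each other
]
print(find_bound_violations(toy_rows))
```

In the notebook itself, the same rows can be selected directly from the merged dataframe of the previous cell with `t[t["lb"] > t["ub"] + 0.001]`.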

In [7]:
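# List the instances on which even the worst upper bound is at most 5% above the worst lower bound,
# i.e., instances on which every configuration got reasonably close.
t = pd.merge(result.groupby(["instance"])[["lb"]].min(), result.groupby(["instance"])[["ub"]].max(), on="instance")
t[t["ub"]<=t["lb"]*1.05].index.unique().to_list()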

[]

In [10]:
# looking at a single, random instance
import random
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)
pd.set_option("display.max_colwidth", 500)
t = result[result["instance"]==random.choice(result["instance"].unique().tolist())].sort_values(by=["alg"]).drop(columns="conf")
t

index,instance,alg,lb,ub,runtime,iterations,n,gap,gap_above_tol
8,rnd_ol01/CETSP-200-1,"{'root': 'ConvexHull', 'branching': 'ChFarthestCircle', 'search': 'DfsBfs', 'rules': [], 'num_threads': 8}",448.653467,458.506627,120.213732,16009.0,126,2.148968,1.148968
12,rnd_ol01/CETSP-200-1,"{'root': 'ConvexHull', 'branching': 'ChFarthestCircleSimplifying', 'search': 'CheapestBreadthFirst', 'rules': [], 'num_threads': 8}",447.287594,554.864728,120.083363,6115.0,126,19.387993,18.387993
11,rnd_ol01/CETSP-200-1,"{'root': 'ConvexHull', 'branching': 'ChFarthestCircleSimplifying', 'search': 'CheapestChildDepthFirst', 'rules': [], 'num_threads': 8}",370.312828,465.694768,120.161316,278047.0,126,20.481643,19.481643
6882,rnd_ol01/CETSP-200-1,"{'root': 'ConvexHull', 'branching': 'ChFarthestCircleSimplifying', 'search': 'DfsBfs', 'rules': ['GlobalConvexHullRule'], 'num_threads': 8}",452.140365,459.063144,120.187719,21687.0,126,1.508023,0.508023
6883,rnd_ol01/CETSP-200-1,"{'root': 'ConvexHull', 'branching': 'ChFarthestCircleSimplifying', 'search': 'DfsBfs', 'rules': ['LayeredConvexHullRule'], 'num_threads': 8}",452.289586,459.063144,120.193321,22135.0,126,1.475518,0.475518
2545,rnd_ol01/CETSP-200-1,"{'root': 'ConvexHull', 'branching': 'ChFarthestCircleSimplifying', 'search': 'DfsBfs', 'rules': [], 'num_threads': 1}",439.556596,459.063236,120.122373,5788.0,126,4.249227,3.249227
2544,rnd_ol01/CETSP-200-1,"{'root': 'ConvexHull', 'branching': 'ChFarthestCircleSimplifying', 'search': 'DfsBfs', 'rules': [], 'num_threads': 2}",440.927201,459.063236,120.107688,6747.0,126,3.950662,2.950662
2543,rnd_ol01/CETSP-200-1,"{'root': 'ConvexHull', 'branching': 'ChFarthestCircleSimplifying', 'search': 'DfsBfs', 'rules': [], 'num_threads': 4}",445.749363,459.063236,120.149698,12431.0,126,2.900226,1.900226
7,rnd_ol01/CETSP-200-1,"{'root': 'ConvexHull', 'branching': 'ChFarthestCircleSimplifying', 'search': 'DfsBfs', 'rules': [], 'num_threads': 8}",448.678107,459.063236,120.229541,18360.0,126,2.262244,1.262244
9,rnd_ol01/CETSP-200-1,"{'root': 'LongestEdgePlusFurthestCircle', 'branching': 'ChFarthestCircleSimplifying', 'search': 'CheapestBreadthFirst', 'rules': [], 'num_threads': 8}",443.70726,566.110603,120.08716,6477.0,126,21.621807,20.621807
