# Checking and describing the generated data

It is always beneficial to add a notebook that takes a quick look at the data. It helps you remember which data you collected and check whether it actually looks correct.

In [1]:
```python
from algbench import describe, read_as_pandas, Benchmark
from _conf import EXPERIMENT_DATA
```

In [2]:
```python
describe(EXPERIMENT_DATA)
```

```
An entry in the database can look like this:
_____________________________________________
 result:
| num_nodes: 100
| lower_bound: 87630783.0
| objective: 93096532.0
 timestamp: 2023-11-17T10:20:43.172028
 runtime: 90.37890768051147
 stdout: []
 stderr: []
 logging: [{'name': 'Evaluation', 'msg': 'Building model.', 'args': [], 'levelname': 'I...
 env_fingerprint: 7de002f2d293f6cb7c59ec1a6de2e660bb383ef9
 args_fingerprint: f68b86df4d4b794819939d1619123b57112504ee
 parameters:
| func: run_solver
| args:
|| instance_name: random_euclidean_100_0
|| time_limit: 90
|| strategy: CpSatTspSolverV1
|| opt_tol: 0.001
 argv: ['/ibr/home/krupke/anaconda3/envs/mo310/lib/python3.10/site-packages/slurmina...
 env:
| hostname: algra02
| python_version: 3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0]
| python: /ibr/home/krupke/anaconda3/envs/mo310/bin/python3
| cwd: /misc/ibr/home/krupke/cpsat-primer/examples/tsp_evaluation
| git_revision: d56ffe5c790b54d038b0495f722e304dcbfa845e
| python_file: /ibr
```
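To see how such a nested entry maps onto a flat table row, here is a minimal stand-alone sketch. The dictionary is hand-copied from part of the entry printed above, and `flatten` is a hypothetical helper, not part of algbench; the `read_as_pandas` call below instead picks the relevant fields explicitly, which gives shorter column names.

```python
# Hand-copied subset of the entry printed by describe() above.
entry = {
    "result": {"num_nodes": 100, "lower_bound": 87630783.0, "objective": 93096532.0},
    "runtime": 90.37890768051147,
    "parameters": {
        "func": "run_solver",
        "args": {
            "instance_name": "random_euclidean_100_0",
            "time_limit": 90,
            "strategy": "CpSatTspSolverV1",
            "opt_tol": 0.001,
        },
    },
}


def flatten(d: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into one dict whose keys are dotted paths (hypothetical helper)."""
    flat = {}
    for key, value in d.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into sub-dicts, extending the dotted path.
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


row = flatten(entry)
print(row["parameters.args.strategy"])  # CpSatTspSolverV1
```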

In [3]:
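```python
# Flatten each nested database entry into one table row.
t = read_as_pandas(
    EXPERIMENT_DATA,
    lambda entry: {
        "instance_name": entry["parameters"]["args"]["instance_name"],
        "num_nodes": entry["result"]["num_nodes"],
        "time_limit": entry["parameters"]["args"]["time_limit"],
        "strategy": entry["parameters"]["args"]["strategy"],
        "opt_tol": entry["parameters"]["args"]["opt_tol"],
        "runtime": entry["runtime"],
        "objective": entry["result"]["objective"],
        "lower_bound": entry["result"]["lower_bound"],
    },
)
# Keep a single run per (instance, strategy) in case a configuration was run more than once.
t.drop_duplicates(inplace=True, subset=["instance_name", "num_nodes", "strategy"])
# Relative optimality gap; it becomes inf if the objective is inf or the lower bound is 0.
t["opt_gap"] = (t["objective"] - t["lower_bound"]) / t["lower_bound"]
# Display a sorted view (sort_values returns a new frame; t itself stays unsorted).
t.sort_values(["num_nodes", "instance_name"])
```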

|  | instance_name | num_nodes | time_limit | strategy | opt_tol | runtime | objective | lower_bound | opt_gap |
|---:|:---|---:|---:|:---|---:|---:|---:|---:|---:|
| 435 | random_euclidean_25_0 | 25 | 90 | CpSatTspSolverV1 | 0.001 | 0.357365 | 9.636744e+07 | 9.635576e+07 | 0.000121 |
| 440 | random_euclidean_25_0 | 25 | 90 | CpSatTspSolverDantzig | 0.001 | 1.572138 | 9.636744e+07 | 9.636744e+07 | 0.000000 |
| 445 | random_euclidean_25_0 | 25 | 90 | CpSatTspSolverMtz | 0.001 | 9.163427 | 9.636744e+07 | 9.636744e+07 | 0.000000 |
| 1820 | random_euclidean_25_0 | 25 | 90 | GurobiTspSolver | 0.001 | 0.585014 | 9.636744e+07 | 9.636744e+07 | 0.000000 |
| 1260 | random_euclidean_25_1 | 25 | 90 | CpSatTspSolverV1 | 0.001 | 0.077645 | 9.077355e+07 | 9.077355e+07 | 0.000000 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1885 | random_euclidean_500_8 | 500 | 90 | GurobiTspSolver | 0.001 | 92.770625 | inf | 6.760793e+07 | inf |
| 1020 | random_euclidean_500_9 | 500 | 90 | CpSatTspSolverV1 | 0.001 | 95.620484 | 1.430933e+10 | 6.014955e+07 | 236.895928 |
| 1025 | random_euclidean_500_9 | 500 | 90 | CpSatTspSolverDantzig | 0.001 | 115.235823 | 1.596116e+10 | 6.842889e+07 | 232.251712 |
| 1030 | random_euclidean_500_9 | 500 | 90 | CpSatTspSolverMtz | 0.001 | 125.114764 | inf | 0.000000e+00 | inf |
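The `inf` values in `opt_gap` follow from floating-point division: an objective of `inf` (presumably a run that found no tour) yields an infinite gap, and a lower bound of 0 turns the quotient into `inf` as well. The following plain-Python sketch of the same formula makes these edge cases explicit. The function name is hypothetical, and it assumes positive objective values, as for TSP tour lengths.

```python
import math


def relative_gap(objective: float, lower_bound: float) -> float:
    """Relative optimality gap (objective - lower_bound) / lower_bound.

    Assumes positive objectives (e.g., tour lengths). Returns inf when no
    finite objective is available or when the lower bound carries no
    information (<= 0), matching the inf entries in the table above.
    """
    if math.isinf(objective) or lower_bound <= 0:
        return math.inf
    return (objective - lower_bound) / lower_bound


print(relative_gap(9.636744e07, 9.635576e07))  # about 0.000121, as in the first row
print(relative_gap(math.inf, 0.0))  # inf
```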


In [4]:
```python
# Print the captured solver output of one specific run.
for entry in Benchmark(EXPERIMENT_DATA):
    args = entry["parameters"]["args"]
    if (
        args["instance_name"] == "random_euclidean_100_1"
        and args["strategy"] == "GurobiTspSolver"
    ):
        print("=====================================")
        # Each captured item is indexed as a pair; its second element is the text.
        stdout = "".join(e[1] for e in entry["stdout"])
        stderr = "".join(e[1] for e in entry["stderr"])
        print(stdout)
        print(stderr)
        if not stdout.strip():
            print("No stdout")
        print("=====================================")
```

```
Set parameter Username
Academic license - for non-commercial use only - expires 2024-11-16
Set parameter TimeLimit to value 90
Set parameter LazyConstraints to value 1
Set parameter MIPGap to value 0.001
Gurobi Optimizer version 10.0.3 build v10.0.3rc0 (linux64)

CPU model: AMD Ryzen 9 7900 12-Core Processor, instruction set [SSE2|AVX|AVX2|AVX512]
Thread count: 12 physical cores, 24 logical processors, using up to 24 threads

Optimize a model with 100 rows, 4950 columns and 9900 nonzeros
Model fingerprint: 0x6d5410e4
Variable types: 0 continuous, 4950 integer (4950 binary)
Coefficient statistics:
  Matrix range     [1e+00, 1e+00]
  Objective range  [9e+03, 1e+08]
  Bounds range     [1e+00, 1e+00]
  RHS range        [2e+00, 2e+00]
Presolve time: 0.00s
Presolved: 100 rows, 4950 columns, 9900 nonzeros
Variable types: 0 continuous, 4950 integer (4950 binary)

Root relaxation: objective 6.600813e+07, 105 iterations, 0.00 seconds (0.00 work units)

    Nodes    |    Current Node    |     Obj
```
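The loop above hard-codes its filter. If you inspect logs often, a small helper keeps such cells short. The sketch below is hypothetical (neither function is part of algbench) and only relies on what the loop already assumes: entries are dicts with `parameters["args"]`, and every captured `stdout`/`stderr` item is a pair whose second element is the text.

```python
from typing import Any, Iterable, Iterator


def select(entries: Iterable[dict], **criteria: Any) -> Iterator[dict]:
    """Yield the entries whose parameters["args"] match all given key/value pairs."""
    for entry in entries:
        args = entry["parameters"]["args"]
        if all(args.get(key) == value for key, value in criteria.items()):
            yield entry


def joined_output(entry: dict, stream: str = "stdout") -> str:
    """Concatenate the text parts (second element of each pair) of a captured stream."""
    return "".join(item[1] for item in entry[stream])
```

With these helpers, the cell above reduces to iterating over `select(Benchmark(EXPERIMENT_DATA), instance_name="random_euclidean_100_1", strategy="GurobiTspSolver")` and printing `joined_output(entry)` and `joined_output(entry, "stderr")`.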

## Check for errors in the data

You should always check whether the results you obtained are actually plausible. Errors happen easily and are not always visible in the plots.
Thus, run some basic checks to detect errors early on. For example, you could have accidentally swapped the lower and upper bounds during data generation.
Depending on your plots, this may not be visible, and you may end up comparing the wrong data and drawing the wrong conclusions.
Or you could have accidentally swapped the runtime and objective values, which could still look reasonable because both the runtime and the objective often grow with the instance size.
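A swap of runtime and objective can often be caught by comparing each runtime with the configured time limit: a swapped-in objective value would exceed the limit by orders of magnitude. The sketch below works on hypothetical row dictionaries with the column names of `t`; the function name and the slack factor of 2 are assumptions. Some slack is needed because runs can overrun their limit; the table above lists runtimes of up to about 125 for a time limit of 90.

```python
def suspicious_runtimes(rows: list[dict], slack_factor: float = 2.0) -> list[dict]:
    """Return the rows whose runtime exceeds slack_factor * time_limit.

    A generous factor avoids flagging ordinary overruns of the limit while
    still catching values that are clearly not runtimes.
    """
    return [r for r in rows if r["runtime"] > slack_factor * r["time_limit"]]


rows = [
    # A genuine overrun, taken from the table above: not flagged.
    {"instance_name": "random_euclidean_500_9", "time_limit": 90, "runtime": 125.114764},
    # Hypothetical row with runtime and objective swapped: flagged.
    {"instance_name": "swapped_example", "time_limit": 90, "runtime": 1.430933e10},
]
print([r["instance_name"] for r in suspicious_runtimes(rows)])  # ['swapped_example']
```

On the DataFrame, the same condition is the boolean mask `t[t["runtime"] > 2.0 * t["time_limit"]]`.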

A very basic check is to verify that the best lower and upper bounds do not contradict each other, i.e., that for no instance the highest lower bound exceeds the lowest objective value found. This check alone catches many errors. However, you usually need some tolerance to account for floating-point errors.
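The following plain-Python sketch spells out this check without pandas. It takes the best lower bound and the best objective of each instance over all runs, so a lower bound that is too high in one strategy is exposed by a better solution from another strategy, even if every single run looks consistent on its own. The row dictionaries and the function name `contradicting_instances` are illustrative assumptions; `eps` is a relative tolerance, as in the notebook cell below.

```python
import math
from collections import defaultdict


def contradicting_instances(rows: list[dict], eps: float = 1e-4) -> list[str]:
    """Return the instances whose best lower bound exceeds their best objective."""
    best_lb = defaultdict(lambda: -math.inf)
    best_obj = defaultdict(lambda: math.inf)
    for r in rows:
        name = r["instance_name"]
        # Skip missing values, as pandas' max()/min() do by default.
        if not math.isnan(r["lower_bound"]):
            best_lb[name] = max(best_lb[name], r["lower_bound"])
        if not math.isnan(r["objective"]):
            best_obj[name] = min(best_obj[name], r["objective"])
    # Relative tolerance to absorb floating-point noise (assumes positive bounds).
    return sorted(
        name for name in best_lb if best_lb[name] - best_obj[name] > eps * best_lb[name]
    )


rows = [
    # Each run of "a" is consistent on its own, but together they contradict.
    {"instance_name": "a", "lower_bound": 100.0, "objective": 105.0},
    {"instance_name": "a", "lower_bound": 90.0, "objective": 99.0},
    # A tiny violation within the tolerance is accepted.
    {"instance_name": "b", "lower_bound": 100.005, "objective": 100.0},
]
print(contradicting_instances(rows))  # ['a']
```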

In [5]:
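```python
# Per-run check: no run may report an objective below its own lower bound
# (up to a small tolerance). Rows with a missing gap are skipped.
assert (t["opt_gap"].dropna() >= -0.0001).all(), "Optimality gap is negative!"
```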

In [6]:
```python
from IPython.display import display

# Always make sure that your results are not trivially wrong,
# e.g., that no lower bound is higher than an objective value of the same instance.
max_lb = t.groupby("instance_name")["lower_bound"].max()
min_obj = t.groupby("instance_name")["objective"].min()
eps = 0.0001  # some tolerance is needed when working with floats
bad_instances = max_lb[max_lb - min_obj > eps * max_lb].index.to_list()

display(t[t["instance_name"].isin(bad_instances)])
assert len(bad_instances) == 0, f"Bad instances detected: {bad_instances}"
```

|  | instance_name | num_nodes | time_limit | strategy | opt_tol | runtime | objective | lower_bound | opt_gap |
|---:|:---|---:|---:|:---|---:|---:|---:|---:|---:|
