Skip to content

Latest commit

 

History

History
441 lines (335 loc) · 23.5 KB

TSP.adoc

File metadata and controls

441 lines (335 loc) · 23.5 KB

fcmaes - a Python 3 gradient-free optimization library

Join%20Chat

logo

Determine Robust Routes for the Noisy Traveling Salesman Problem

This tutorial

  • Shows how to implement the noisy TSP problem aiming at a robust route efficiently using numba.

  • Compares with an alternative implementation.

  • Compares different optimization algorithms applied to both implementations

  • Use optimization to evaluate machine learning based results for another TSP variant

There is a dramatic performance difference > factor 10000 when comparing optimization algorithms and objective function implementations. The purpose of this tutorial is to "sharpen your senses" regarding how to implement this kind of objective function and which optimization algorithm to choose when you selected Python as implementation language. Don’t underestimate the influence of your specific implementation of a problem, not only regarding evaluations/second but also regarding the number of evaluations required applying the same optimization algorithm.

As starting point we use TSP.py from the 'ExpensiveOptimBenchmark' suite from the Technical University of Delft. In Hospital we discuss four other problems from this problem suite, but only for this one we provide our own alternative implementation.

All results were produced using a 16 core / 32 thread AMD CPU 5950x. Some algorithms use only a single thread, others utilize the whole processor.

Motivation

"When a black-box optimization objective can only be evaluated with costly or noisy measurements, most standard optimization algorithms are unsuited to find the optimal solution. Specialized algorithms that deal with exactly this situation make use of surrogate models."

Such a method - specialized for integer-valued minima - is presented in the paper (IDONE) and applied to a noisy variant of an asymmetric traveling salesman problem (17 cities).

The statement "most standard optimization algorithms are unsuited" is probably true, but what about the other ones? We will identify "suited" standard optimization algorithms and prove that they work fine for TSP.py, the implementation of the objective function used in the paper. Additionally we can show that for an alternative implementation of the same noisy TSP problem (100 iterations, noise=1.0) noisy_tsp.py standard optimization algorithms can solve the noisy 17 cities problem BR17 in about 2 seconds utilizing the parallel threads modern CPUs provide.

The other problem shown in the paper is artificial and based on matrix multiplication using random matrices with objective function:

  • f(x) = (x − x^∗ )^T A(x − x^∗)

applied to binary input values. Here IDONE indeed uses far less function evaluations as all standard algorithms we tried. But what happens if instead dimension = 100 we use dimension = 5000 ? IDONE doesn’t scale very well, I wasn’t able to apply it successfully. A standard algorithm (BiteOpt) uses about 900.000 function evaluations and about 5 hours (single threaded), but at least I got the problem solved. You could argue: Real world problems are too expensive to apply their objective function so often, but on the other hand they usually cannot be solved so exceptionally well using IDONE.

The noisy Traveling Salesman Problem / TSP

The Traveling Salesman Problem/TSP asks the question: "Given a list of cities and the distances between each pair of cities, what is the shortest possible route that visits each city exactly once and returns to the origin city?" It appears as subproblem in many other areas, is very well studied and there are methods solving it for large numbers of cities.

But what if the distance between cities is not known exactly? This problem can be "simulated" by adding some random noise to the individual distances and determining a "robust" solution: We compute the path length n times and determine the "worst case", the longest path we got for all iterations each time adding different random noise to the individual distances. For all our experiments we use 100 iterations and a random noise between 0 and 1 added to all distances.

Solving this "noisy TSP" is more challenging, standard branch and bound algorithms like tsp_solver cannot be applied.

The original implementation

Lets first have a look at TSP.py, an implementation of an objective function computing the robust path length for continuous input values (mapped to discrete integers) determining the selected tour.

   def evaluate(self, x):
        robust_total_route_length = 0.0

        for iteration in range(self.n_iter):
            current = 0
            unvisited = list(range(1, self.d+2))
            total_route_length = 0.0

            for di, i in enumerate(x):
                next_up = unvisited.pop(int(round(i)))
                total_route_length += self.W[current, next_up]
                total_route_length += self.noise_rng.random() * self.noise_factor
                current = next_up

            last = unvisited.pop()
            total_route_length += self.W[current, last]
            total_route_length += self.noise_rng.random() * self.noise_factor
            total_route_length += self.W[last, 0]
            total_route_length += self.noise_rng.random() * self.noise_factor

            robust_total_route_length = max(total_route_length, robust_total_route_length)

        return robust_total_route_length

This implementation aims at minimizing the number of input variables: The number of cities N minus two. Since the tour is always a ring you can choose any fixed city as a starting point. When you have performed N-2 steps there is only one city left, you haven’t really a choice. next_up = unvisited.pop(int(round(i))) maps continuous input variables to integer values used for the selection. At least for continuous optimizers you should use a different approach as we will show below.

Optimization algorithms

At solvers we find a number of optimization algorithms to test. Many are not applicable for the TSP problem, so we choose the following selection for testing:

We don’t need a huge problem instance to drive these algorithms at and beyond their limits: br17 is a small TSP instance involving only 17 cities. According to ATSP-solutions the shortest path without noise has length 39.

original solvers comparison

Next lets have a look how fcmaes standard optimization algorithhms perform - also single threaded: We use the Python wrapper of the following three algorithms implemented using Eigen/C++:

fcmaes single thread original problem

The fcmaes CMA variant terminates fast, but provides inconsistent results.

fcmaes supports the parallel retry of optimization algorithms, lets see what happens if we apply 32 optimization runs in parallel using retry.py :

fcmaes 32 thread original problem

The original algorithms are clearly outperformed, even CMA-ES performs quite well when applied in a parallel retry scenario.

UPDATE: Mixed integer support

Both the Differential Evolution and multiobjective DE/NSGA (MODE) algorithm got an update:

  • Both algorithms now have specific mixed-integer support. If you tell the algo via a new boolean array ints parameter which are your discrete integer variables, convergence will be much faster. This works both for the Python and the C++ variants. ints = [True, True, False] for instance means that the first two variables are discrete. Using np.argsort(x) together with continuous variables for sequences as shown below is still a valid option. But this trick doesn’t work if the same discrete value can occur in different variables. The original problem is now solvable in about 10 seconds using multiple threads.

fcmaes DE mixed integer original problem

An alternative implementation of the objective function

The new implementation (noisy_tsp.py) uses numba to speed up the objective function evaluation quite significantly:

@njit(fastmath=True)
def evaluate_tsp(x, W, d, noise_factor, iter_num):
    robust_total_route_length = 0
    order = np.argsort(x) + 1
    for _ in range(iter_num):
        total_route_length = 0
        total_route_length += W[0, order[0]] + np.random.random() * noise_factor
        total_route_length += W[order[d-1], 0] + np.random.random() * noise_factor
        for i in range(d-1):
            total_route_length += W[order[i], order[i+1]] + np.random.random() * noise_factor
        robust_total_route_length = max(total_route_length, robust_total_route_length)
    return robust_total_route_length

This implementation uses np.argsort(x) to determine the order the cities are visited. The first city is fixed, so we have the number of cities N minus one argument variables x. This is one variable more, but it nevertheless works much better with continuous optimization algorithms. We used this idea also in Scheduling and JobShop.

Applying fcmaes standard optimization algorithms to the modified objective function

fcmaes 32 thread optimized problem

results in a solution time of about 2 seconds, even for CMA-ES its only about 5 seconds.

Here a table comparing the number of function evaluations per second for all algorithms and objective function variants. numba and the way the new implementation is designed speeds up the computation by about factor 100 thereby also improving convergence:

Table 1. Evaluations / second on CPU AMD 5950x
algorithm problem evals/sec used threads

idone

original

13

1

MSVRM

original

23

1

CMA

original

271

1

SA

original

335

1

BiteOpt

original

11800

32

fcmaes-CMA

original

11600

32

BiteOpt

numba based

1150000

32

fcmaes-CMA

numba based

1190000

32

Increasing the noise

What happens if we increase the noise? Another experiment uses the symmetrical TSP gr17. This time we configured noise_factor=50. According to TSP-solutions the shortest path without noise has length 2085.

original solvers comparison b

The CMA algorithm shown here is not the one from fcmaes, but the original one:

The CMA results are hard to see, reason is that we get a "termination on tolstagnation" after about 8000 function evaluations. It seems the default termination criteria don’t work for the TSP problem. As we will later see, this is not a problem with the CMA-ES algorithm itself.

CMA-ES is based on pycma from Nikolaus Hansen.

Here are the results applying fcmaes-CMA (not the original one) and BiteOpt, both utilizing all 32 threads provided by the processor:

BiteOpt fcmaes CMA original comparison

Finally we see a direct comparison of the different objective function implementations for the same optimization algorithm. Beside the speedup (evaluations/sec) we find better robust tours using both algorithms.

BiteOpt comparison
fcmaes CMA comparison

Conclusion

We have to be very careful when implementing an objective function representing a specific problem. Not always the implementation requiring the least number of variables wins. Use numba whenever possible for the time critical parts. BiteOpt or Differential Evolution, specially if used with parallel retry, are very good algorithm choices which should be tried early, if the problem is single objective and there are no constraints (which cannot be easily expressed using the weighted sum approach). Algorithms with huge overhead like IDONE and MSVRM should only be applied for very expensive objective functions. Noisy TSP can be evaluated nearly 1.2 million times / sec, so it definitely doesn’t fall into that category.

One more noisy TSP problem

Lets have a look at ai-for-tsp-competition which contains the Python code for a machine learning competition to solve another TSP variant called TD-OPSWTW. Results and the details of the TSP variant are described in BliekEtAl2021:

"The stochastic travel times between locations are only revealed as the salesman travels in the network. The salesman starts from a depot and must return to the depot at the end of the tour. Moreover, each node (customer) in the network is assigned a prize, representing how important it is to visit a given customer on a given tour. Each node has associated time windows. We consider that a salesman may arrive earlier at a node without compromising its prize, but the salesman must wait until the opening time to serve the customer. Lastly, the tour must not violate a total travel time budget while collecting prizes in the network. The goal is to collect the most prizes in the network while respecting the time windows and the total travel time of a tour allowed to the salesman."

We will focus on the first task described by the paper: A specific 65 node instance of this TSP has to be solved using machine learning techniques. Lets suppose we are the organizer of this competition and want to evaluate the applicability of machine learning for this problem by comparing the winner results with a pure optimization based approach.

We do (as with the first TSP problem above) the following

  • Apply numba to speed up the evaluation of a solution tour

  • Formulate an objective function which needs to be fast to evaluate thereby being accurate enough for the final test - which in this case is computing the average of 10000 noisy tour evaluations.

  • Apply the BiteOpt optimizer with parallel retry.

Speeding up the evaluation of a tour (numbafication)

The central routine evaluating a given tour is tour_check which we have to adapt:

from numba import njit

def tour_check(tour, x, time_matrix, maxT_pen, tw_pen, n_nodes):
    return tour_check_numba(np.array(tour, dtype=np.int32), np.array(x, dtype=np.float32),
                            np.array(time_matrix, dtype=np.int32), maxT_pen, tw_pen, n_nodes)

@njit(fastmath=True)
def tour_check_numba(tour, x, time_matrix, maxT_pen, tw_pen, n_nodes):

numba is very picky regarding argument array types, so we have to wrap it converting into float32 and int32 arrays which make numba happy. This minimal change causes a factor 8 speedup of the evaluation of TSP tours.

Objective function for TD-OPSWTW

The fitness object representing the objective function used for optimization gets an Env object representing a problem instance as argument.

  • Progress is monitored using mp.RawValue variables which share their values between processes.

  • We define the bounds for all continuous variables as [0,1] which in solution is converted into an integer vector representing the tour using the np.argsort sorting trick.

  • value computes the minimal or average score performing n calls to check_solution which gives a noisy tour evaluation. For small n the minimum is a more reliable value.

  • The objective function itself __call__ uses value for incremental n values to compute the final evaluation of the tour. This method represents a compromise between accuracy and performance. For low n penalized noisy values may be overlooked. For good candidates we have to increase n to get a reliable result which works with n=10000 used for the final test.

  • Note that the optimizer could find noisy "outliers" delivering worse results when called again. Therefore we finally use n=100000, because n=10000 inside the objective function didn’t deliver reliable results.

from scipy.optimize import Bounds
import time, math
import ctypes as ct
import multiprocessing as mp
from fcmaes.optimizer import logger, Bite_cpp, dtime
from fcmaes import retry

class fitness:

    def __init__(self, env):
        self.evals = mp.RawValue(ct.c_long, 0)  # writable across python processes
        self.best_y = mp.RawValue(ct.c_double, math.inf)
        self.t0 = time.perf_counter()
        self.env = env
        self.d = env.n_nodes

    def bounds(self):
        return Bounds(np.zeros(self.d), np.array([1]*self.d))

    def value_min(self, sol, n):
        val = math.inf
        for _ in range(n):
            _, rewards, pen, _ = self.env.check_solution(sol)
            val = min(val, rewards + pen)
        return val

    def value(self, sol, n):
        if n < 1000: # for small n take the minimum instead
            return self.value_min(sol, n)
        val = 0
        for _ in range(n):
            _, rewards, pen, _ = self.env.check_solution(sol)
            val += rewards + pen
        return val/n

    def solution(self, x): # disjoined all locations
        return [1] + [int(xi) for xi in (np.argsort(x) + 1)]

    def __call__(self, x):
        self.evals.value += 1
        sol = self.solution(x)
        n = 10
        while n <= 100000:
            y = -self.value(sol, n)
            if y >= self.best_y.value:
                return y
            n *= 10
        if y < self.best_y.value:
            self.best_y.value = y
            logger().info("evals = {0}: time = {1:.1f} y = {2:.5f} x= {3:s}"
                  .format(self.evals.value, dtime(self.t0), y, str(sol)))
        return y

    def optimize(self):
        self.bestY = 1E99
        self.bestX = []
        ret = retry.minimize(self, self.bounds(), optimizer=Bite_cpp(200000,M=16,stall_iterations=3), num_retries=32)
        sol = self.solution(ret.x)
        num = 10000
        logger().info("val" + str(num) + " = " + str(self.value(sol, num)))
        return sol

if __name__ == '__main__':

    env = Env(n_nodes=65, seed=6537855)
    sol = fitness(env).optimize()

A typical output is:

evals = 32: time = 12.7 y = 72.28609 x= [1, 28, 44, 36, 4, 17, 54, 40, 19, 45, 1, 14, 37, 12, 6, 11, 34, 13, 52, 38, 62, 60, 42, 5, 8, 61, 63, 35, 33, 65, 2, 30, 53, 27, 48, 56, 20, 26, 3, 25, 55, 10, 22, 59, 18, 58, 46, 57, 24, 29, 16, 49, 64, 47, 51, 50, 21, 31, 39, 32, 23, 43, 9, 41, 15, 7]
evals = 36: time = 12.7 y = 67.64000 x= [1, 34, 20, 18, 63, 1, 26, 43, 10, 36, 24, 44, 56, 62, 3, 17, 42, 27, 25, 41, 21, 64, 12, 54, 31, 22, 7, 9, 16, 28, 4, 60, 11, 45, 2, 23, 32, 39, 51, 8, 46, 29, 61, 65, 47, 35, 14, 40, 13, 52, 33, 55, 57, 48, 53, 38, 6, 19, 58, 15, 59, 50, 37, 5, 49, 30]
evals = 56: time = 12.7 y = 66.42925 x= [1, 36, 13, 56, 62, 17, 1, 55, 30, 46, 25, 53, 34, 10, 38, 27, 7, 9, 54, 64, 28, 22, 57, 23, 31, 2, 6, 18, 47, 58, 44, 39, 50, 43, 42, 8, 12, 65, 32, 37, 61, 5, 24, 21, 26, 14, 48, 11, 15, 63, 33, 45, 16, 52, 51, 59, 35, 41, 49, 20, 4, 19, 40, 3, 60, 29]
...
evals = 930879: time = 142.3 y = -5.00259 x= [1, 45, 5, 44, 47, 42, 2, 46, 9, 22, 7, 4, 24, 30, 40, 48, 1, 41, 39, 43, 49, 52, 51, 38, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 50, 37, 33, 35, 3, 6, 8, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 23, 25, 26, 27, 28, 29, 31, 32, 64, 34, 36, 65, 21]
...
evals = 6060526: time = 474.5 y = -11.20464 x= [1, 32, 45, 55, 49, 47, 41, 5, 23, 57, 6, 16, 2, 42, 46, 60, 11, 33, 43, 64, 13, 19, 29, 9, 22, 65, 7, 35, 62, 63, 4, 24, 30, 40, 48, 1, 17, 56, 26, 3, 36, 27, 37, 10, 34, 8, 58, 61, 39, 12, 25, 59, 44, 52, 15, 14, 28, 51, 54, 20, 18, 31, 53, 50, 38, 21]
...
evals = 6186914: time = 492.6 y = -11.31701 x= [1, 32, 45, 55, 5, 49, 41, 47, 44, 23, 6, 57, 16, 33, 60, 46, 42, 2, 11, 64, 19, 43, 13, 29, 65, 9, 22, 35, 7, 62, 63, 4, 24, 30, 40, 48, 1, 21, 8, 61, 14, 50, 54, 17, 38, 59, 27, 28, 26, 36, 15, 58, 20, 56, 34, 37, 18, 12, 31, 51, 10, 25, 3, 53, 39, 52]
evals = 6232151: time = 501.8 y = -11.31998 x= [1, 32, 45, 55, 5, 49, 41, 47, 44, 23, 6, 57, 16, 33, 2, 60, 46, 42, 11, 64, 19, 43, 13, 29, 65, 9, 22, 35, 7, 62, 63, 4, 24, 30, 40, 48, 1, 21, 8, 61, 14, 50, 54, 17, 38, 59, 27, 28, 26, 36, 15, 58, 20, 56, 34, 37, 18, 12, 31, 51, 10, 25, 3, 53, 39, 52]
evals = 6260525: time = 518.0 y = -11.32000 x= [1, 32, 45, 55, 5, 49, 41, 47, 44, 23, 6, 57, 16, 33, 2, 60, 46, 42, 11, 64, 19, 43, 13, 29, 65, 9, 22, 35, 7, 62, 63, 4, 24, 30, 40, 48, 1, 21, 8, 61, 14, 50, 54, 17, 38, 59, 27, 28, 26, 36, 15, 56, 58, 20, 34, 37, 18, 12, 31, 51, 10, 25, 3, 53, 39, 52]

val10000 = 11.319876970495283

Conclusion

Both the machine learning and the optimization approach achieve almost the same result of 11.32. Optimization reaches 11.20 consistently after about 400-500 sec, but can need up to 1800 sec to reach 11.32 for the TD-OPSWTW instance from the competition using an AMD-5950x 16 core CPU.

Even for 10000 evaluations there is still some noise, so the final test is slightly worse although we use 100000 evaluations inside the objective function. It seems pure optimization is a valid alternative for TD-OPSWTW and other realistic TSP variants.

Compare yourself with the machine learning based approach of one of the winners: https://github.com/mustelideos/td-opswtw-competition-rl . Which solution do you prefer?