fcmaes - a Python 3 gradient-free optimization library

Solving a complex scheduling problem using continuous optimization

Can a complex scheduling problem be solved efficiently using Python and continuous optimization? In a competitive environment battling against some of the smartest people and fastest computers on the planet?

Most people would say "no" because:

Python is slow, specially if the computation involves loops
Continuous optimization is hard to parallelize to utilize the many cores modern processors provide, scipy does not support this.
Standard algorithms provided by scipy don’t work well for discrete input arguments (continuous values converted into integers). Even more advanced algorithms like Differential Evolution don’t converge fast enough.

Applying a single threaded optimization algorithm we only get about 1.8 function evaluations per second for the objective function proposed below even for a modern AMD 5950 CPU.

But what if:

We could scale by factor 16x utilizing parallelization on a modern 16 core CPU (like the AMD 5950)?
We could speed up the evaluation of the objective function by factor 450x using Numba resulting in a 7200x speedup considering parallelization?
We could use continuous optimization algorithms much better suited to discrete input arguments?

Whether this is sufficient to beat specifically tuned genetic algorithms coded in C/C++ is an open question. In this tutorial we will show only that C++ doesn’t win easily.

Important concepts used

Indirect encoding

A decoder is required to express the parameters given by their encoding as a vector of continuous variables. The constraints associated with the optimization problem are handled by the decoder and will guarantee the validity of the solution. This concept is well known in the literature.

Indirect objective

Not the real objective is known by the optimizer, but a specific objective (or objectives if a multi-objective optimizer is used) designed specifically to guide the optimization process. This concept is best known in relation to the "weighted sum approach" to handle constraints. The idea to apply multi-objective optimization to single objective problems is not very common. Probably because not many problems have been found were it works. The problem shown here is one of these rare instances.

Update

After applying a similar approach to the more general multi-objective flexible jobshop problem, see JobShop, we observed an unexpected result: The new multi-objective algorithm MoDe produced excellent single objective results. Therefore we extended this tutorial by applying this idea also here. The results were even more surprising: score 7203 with 120 trajectories to choose from.

GTOC 11

GTOC is an international competition related to global trajectory optimization. The 11th competition GTOC11 ended in November 2021. From the 92 participating teams only a small fraction were able to upload a valid solution in time:

This tutorial is targeted at the other teams, showing how the "hard part" of the problem can be handled easily and efficiently. The full problem description can be found here: problem statement.

The task initiates the construction of a Dyson sphere by building a ring of 12 stations equally distributed around a circular orbit around the sun. Whole asteroids are used as building material, transferred in one piece to the sphere. Ten motherships start from earth, each visiting an asteroid sequence freely chosen from a huge set of 83453 asteroids leaving there an attenuator which uses the asteroid mass to generate constant continuous thrust transferring the asteroid to one of the the 12 Dyson stations.

Only one station can be built at a given time, and we have to wait 90 days until we can build the next one. Dyson sphere station building and the start times of the motherships are restricted to a 20 year time window (video)

Propellant (represented by the magnitude of the mothership delta velocity impulses) is not limited but reduces the competition score. After being visited by a mothership these asteroids have to be transferred to the Dyson sphere stations using continous thrust maneuvers with a fixed magnitude of acceleration. Each asteroid has a specific mass, which is partly used to produce the energy for the transfer to the station. Its mass on arrival is determined by its original mass and its transfer time. Aim is to maximize and balance the mass transferred to all stations. The competition score is dependent on the remaining asteroid mass arriving at the station receiving the least mass - which needs to be maximized - and the sum of the delta velocity (propellant consumption) of the motherships - which needs to be minimized.

Most parts of the task are quite easy for astro physicists:

Selection of a suitable Dyson sphere
Estimation of the maximal asteroid arrival mass at the Dyson sphere - independent from phasing.
Estimation of the asteroid arrival mass at the Dyson sphere for a given station and transfer timing.
Computation of good mother ship trajectories visiting asteroids with high estimated remaining mass at the Dyson sphere.

The difficult part is to select order and arrival time windows for the Dyson stations (which station at the ring to build when), ten mothership trajectories selected from a larger set of good search results, and finally the corresponding asteroid transfers so that the score is maximized.

Additional aspects:

Inclination and size of the Dyson ring
Initial phasing of the Dyson ring stations

will not be part of the optimization process. Only by excluding these are we able to generate a table of good mothership trajectories together with a precomputed set of optimal asteroid transfers to avoid the related costly computations during optimization. This means, the complete table has to be regenerated for different Dyson rings, so it should be carefully designed.

For this tutorial we view this table as "given" and focus on the optimization part. The used table was produced by "Team Jena" ranked only 16th at the competition, so it is far from optimal. We used both Tsinhuas and ESAs submitted Dyson spheres to enable an evaluation of the method by comparing it to the results achieved by these teams. The same objective function is used for all results. Below we will analyze in detail the impact of the following three factors on the result:

Selection of the optimization algorithm.
Usage of Numba.
Utilization of parallel multi threaded optimization.

Asteroid transfer table

For each asteroid visited by any mothership trajectory we get a set of asteroids together with the arrival times of the motherships which determine the time window available for the asteroid transfers. For each sphere revolution, for each sphere station and for each asteroid start time window we can determine the optimal asteroid transfer minimizing time of flight for the given fixed continuous thrust. This results in a table where each rows lists

asteroid id
trajectory index
station number
asteroid mass
delta velocity to reach the asteroid from the previous one
start time and
transfer time

The remaining asteroid mass when arriving at the sphere station can be derived from the asteroid mass and the transfer time. The table used for this tutorial consists of about 163.000 asteroid transfer opportunities for the 50 best trajectories found.

For a valid selection of rows representing 10 mothership trajectories and one transfer for each asteroid visited we can compute the estimated score using the sum of the delta velocities for each trajectory and the sum of the remaining asteroid masses arriving at each Dyson station.

The asteroid transfer data is given as compressed csv-file read as pandas data frame.

DataFrame:
         asteroid   station  trajectory  ...   dv       transfer_start  transfer_time
0          73418        1          31  ...  3.500261        2.816957       2.891577
1          73418        2          31  ...  3.500261        2.738180       2.852998
2          73418        3          31  ...  3.500261        2.599462       2.843404
3          73418        4          31  ...  3.500261        2.464159       2.827190
4          73418        5          31  ...  3.500261        2.335112       2.807543
...          ...      ...         ...  ...       ...             ...            ...
162611     68957        5          29  ...  0.403024       16.836162       2.456525
162612     68957        6          29  ...  0.403024       16.676695       2.486187
162613     68957        7          29  ...  0.403024       16.511622       2.556778
162614     68957        8          29  ...  0.403024       16.376856       2.644360
162615     68957       12          29  ...  0.403024       16.918...

[162615 rows x 7 columns]

You may replace this with your own table to see which score the algorithm computes for your solution.

Implementation

The complete implementation may be found at scheduling.py. It is extensively commented so it should be easy to adapt the method to your specific scheduling problem. The code was tested on Linux using the Anaconda python environment. On Windows if possible use the "Linux subsystem for Windows" since python multithreading has issues there. Don’t forget to do pip install fcmaes --upgrade to install the newest fcmaes version.

Design of the argument vector

The simplicity of the proposed method results from the fact that the only thing we have to do is to implement an efficient objective function computing the estimated score for its argument vector:

10 trajectory indices selecting the trajectories representing the 10 motherships.
12 station indices determining the order the stations are built.
11 values representing the limits of the build time slots. Will be sorted and multiplied by 20 years, the mission time.

To simplify the comparison of results we use ESAs Dyson sphere with a = 1.32AU, but we had to perform our own search and approximation of possible asteroid transfers. Selecting only the best 10 trajectories and thereby disabling the trajectory selection during optimization - as ESA did during the competition - reproduces almost exactly ESAs competition score and mass distribution, indicating that our optimization is not inferior even with fixed trajectories. Letting the optimization selecting the 10 trajectories to use improves the score by about 400 - still not reaching Tsinhua/winner-territory. You probably need a smaller Dyson sphere to improve further.

The bounds are chosen to avoid rounding errors during conversion of the continuous argument values into integer indices.

We use a special parallelization algorithm performing smart boundary management (SBM) returning the best argument vectors and function values. This works best in combination with a DE→CMA optimization sequence. Using the log we can monitor the progress of the optimization.

    transfers = pd.read_csv('data/' + name + '.xz', sep=' ', usecols=[1,2,3,4,5,6,7], compression='xz',
                    names=['asteroid', 'station', 'trajectory', 'mass', 'dv', 'transfer_start', 'transfer_time'])
     ...
    # bounds for the objective function
    dim = 10+2*STATION_NUM-1
    lower_bound = np.zeros(dim)
    # lower_bound[10+STATION_NUM:dim] = 0.00001
    upper_bound = np.zeros(dim)
    lower_bound[:] = 0.0000001
    upper_bound[10:] = 0.9999999
    upper_bound[:10] = TRAJECTORY_NUM-0.00001 # trajectory indices
    bounds = Bounds(lower_bound, upper_bound)

    # smart boundary management (SMB) with DE->CMA
    store = advretry.Store(fitness(transfers), bounds, num_retries=10000, max_eval_fac=5.0, logger=logger())
    advretry.retry(store, de_cma(10000).minimize)

Alternatively the fcmaes parallel retry mechanism can be used with BiteOpt or other optimization algorithms.

    store = retry.Store(fitness(transfers), bounds, logger=logger())
    # apply BiteOpt algorithm in parallel
    retry.retry(store, Bite_cpp(1000000, M=6).minimize, num_retries=320)

Smart boundary management leads to a better final score but can be slower if wrongly configured for this task.

Design of the objective function

We implement the objective function as Python function class, call defines the function itself and the function object maintains its context - all columns of the data frame stored as numpy arrays.

class fitness(object):

    def __init__(self, transfers):
        ...
        self.asteroid = transfers["asteroid"].to_numpy()
        self.station = transfers["station"].to_numpy()
        self.trajectory = transfers["trajectory"].to_numpy()
        self.transfer_start = transfers["transfer_start"].to_numpy()
        self.transfer_time = transfers["transfer_time"].to_numpy()
        self.mass = transfers["mass"].to_numpy()
        self.dv = transfers["dv"].to_numpy()

We precompute and store the sum of the delta velocities for all trajectories

        self.trajectory_dv = trajectory_dv(self.asteroid, self.trajectory, self.dv)

The objective function calls function select which selects the asteroid transfers corresponding to the argument vector and computes the mass used together with the accumulated delta velocities of the selected branches to determine the score:

   def __call__(self, x):
        # determine the minimal station mass
        min_mass, slot_mass = select(self.asteroid, self.station, self.trajectory, self.mass,
                        self.transfer_start, self.transfer_time, x)
        sdv = select_dvs(self.trajectory_dv, x)
        return -score(min_mass, sdv)

We check for each row:

If the trajectory given is selected and the correct station is targeted. We use the arrival time to determine the time slot and the currently associated station number.
If the asteroid was visited before (two trajectories both may contain the same asteroid), we greedily choose the transfer with the larger remaining asteroid mass.
Then we add up the masses transferred to each station.

Execution time for select is dramatically reduced by Numba, a JiT compiler accelerating functions annotated by @njit. select uses nested loops, something you usually avoid in Python - but not when using Numba.

@njit(fastmath=True)
def select(asteroid, station, trajectory, mass, transfer_start, transfer_time, x):
    ...
   for i in range(asteroid.size):
        tr = int(trajectory[i]) # current trajectory
        if trajectories[tr] == 0: # trajectory not selected
            continue
        ast_id = int(asteroid[i]) # asteroid transferred
        stat = int(station[i]) # dyson sphere station targeted
        m = mass[i] # estimated asteroid mass at arrival time
        time_of_flight = transfer_time[i] # TOF of asteroid transfer
        arrival_time = transfer_start[i] + transfer_time[i] # arrival time of asteroid transfer
        # which station time slot ?
        for slot in range(STATION_NUM):
            max_time = times[slot+1] # time interval of time slot
            slot_time = times[slot]
            min_time = slot_time + WAIT_TIME # we have to wait 90 days
            if min_time >= MAX_TIME:
                continue
            if arrival_time >= slot_time and arrival_time <= max_time: # inside time slot
                if stat == stations[slot]: # does the station fit?
                    tof = time_of_flight
                    #if we have to fly a non optimal transfer, arrival mass is reduced
                    if arrival_time < min_time: # 90 DAYS are not yet over
                        to_add = min_time - arrival_time # add time difference
                        to_add *= math.sqrt(1 + to_add/WAIT_TIME) # add some more time to enable transfer
                        tof += to_add
                    mval = (1.0 - YEAR*tof*ALPHA) * m # estimated asteroid mass at arrival time
                    if ast_val[ast_id] > 0: # asteroid already transferred
                        old_slot = int(ast_slot[ast_id])
                        min_mass = np.amin(slot_mass) # greedily replace if current mass is higher
                        old_mass = slot_mass[old_slot] # but never replace at a nearly minimal slot
                        if (old_slot == slot or min_mass < 0.99*old_mass) and ast_val[ast_id] < mval:
                            # replace with actual transfer, remove old asteroid mass
                            slot_mass[old_slot] -= ast_val[ast_id]
                        else: # keep old transfer, don't use the new one
                            mval = 0
                    if mval > 0:  # register actual transfer
                        slot_mass[slot] += mval
                        ast_val[ast_id] = mval
                        ast_slot[ast_id] = slot
    ...

Instead of only using the minimal mass we also use the other station masses applying a weight degrading exponentially with the station mass rank. This way we "help" the optimization algorithm in case the minimal mass is 0 - which is always the case at the beginning. But the score shown in the output is computed using the minimal score.

    slot_mass.sort()
    min_mass = slot_mass[0]
    f = 1.0
    for m in slot_mass:
        min_mass += f*m
        f *= 0.5
    return min_mass, slot_mass

Both the station order and the selected branches need to be disjoined. To achieve this for the branches we use an utility function converting a continuous input vector s into a disjoined integer vector:

@njit(fastmath=True)
def next_free(used, p):
    while used[p]:
        p = (p + 1) % used.size
    used[p] = True
    return p

@njit(fastmath=True)
def disjoined(s, n):
    disjoined_s = np.zeros(s.size, dtype=numba.int32)
    used = np.zeros(n, dtype=numba.boolean)
    for i in range(s.size):
        disjoined_s[i] = next_free(used, s[i])
    return disjoined_s, used

For the station order we use numpy.argsort :

@njit(fastmath=True)
def dyson_stations(x, n):
    stations = np.argsort(x[10:10+n])
    # station numbers start with 1
    return np.array([s+1 for s in stations])

Hints

Finally some hints for those struggling with the "easy" parts of the task:

Selection of a suitable Dyson sphere

Determine the average inclination of all "heavy" asteroids with limited eccentricity
Choose Dyson spheres with this inclination with different semimajor axis a with 1.0AU < a < 1.6AU.
Estimate for all "heavy" asteroids the maximal arrival mass at the Dyson sphere and choose the sphere with the largest average arrival mass divided by a, since the score is proportional to the minimal station mass divided by a.

The Dyson sphere may later be fine tuned when good mother ship trajectories are computed. These trajectories can be reevaluated for different Dyson spheres thereby maximizing the resulting estimated score.

Finding good mothership trajectories

This can be done using beam search and a branch selection criteria based on time / delta velocity / estimated maximal remaining asteroid mass using different weights maximizing diversity of the computed trajectories. Using a small search breadth (about 1000 branches) and performing a huge number of searches (> 100000) improves the chance to find many good trajectories with disjoined asteroids visited. Use a pre-computed grid of asteroid positions and linear approximation to speed up the expansion of trajectories during search.

What you should not do:

Use the same set of asteroids visited first at depth 1 of the search for all runs. Use a random selection instead.
Favor a specific weight for the sum of the delta velocity for all search runs. Instead let the search decide (by using random weights).
Exclude a big fraction of asteroids for the search. I excluded only the worst 23000 because I observed worse results otherwise.

Estimation of the arrival mass

You could learn from Tsinhua (the winners) or ESA, who developed a machine learning approach estimating the final mass more reliably. A simple and fast approach is to divide the transfer into n equally sized time windows / segments and replace the continuous thrust by n impulses (also called deep space maneuvers - DSMs) at the center of these segments, a method called "Sims-Flanagan". With n the accuracy of the method increases, but also the effort to find an optimal sequence. With four segments we only have 8 input variables, 6 for the two 3-dimensional impulses dv_1 and dv_4, plus 2 for start and arrival time. dv_2 and dv_3 can be derived using the Lambert algorithm. We have to limit the impulses according to our fixed continuous thrust constraint and optimize for minimal time of flight. Differential evolution with retry using different random initial values works very well here.

You could either compute specific transfers to each of the stations for different time windows to create the input table for the scheduling algorithm. Or take the station as an additional discrete argument without time limits to compute an estimated maximal remaining asteroid mass at the station. This is useful for the evaluation of trajectories generated during search.

Optimal delta velocity

During the conference following the GTOC11 competition Dario Izzo from the ESA&friends team asked the question about which delta velocity you should aim for your trajectories.

Here are the delta velocities of about 20.000 good trajectories listed. Each represents the best trajectory determined by a search-breadth=1000 run using random roots and random weights. We see that the potential mass collected increases with delta velocity, but the corresponding score does not. We compute the score by taking the sum of the trajectory asteroid masses as minimal mass and the sum of the delta velocities of the trajectory for all ten motherships. This somehow represents an upper limit of the score a scheduling algorithm can achieve.

QD update

Recently QD-algorithms were added to fcmaes. They provide excellent means to answer the question from Dario Izzo about the optimal delta velocity. First we have to define a QD-fitness function which returns beside the score a "behavior vector" d used to increase diversity. In this case we simply can choose values representing minimal mass and delta velocity.

   def qd_fun(self, x): # quality diversity
        _, slot_mass = select(self.asteroid, self.station, self.trajectory, self.mass,
                        self.transfer_start, self.transfer_time, x)
        sdv = select_dvs(self.trajectory_dv, x)
        _, dv_val = score_vals(np.amin(slot_mass), sdv)
        sc = score(np.amin(slot_mass), sdv)
        y = -sc
        self.evals.value += 1
        d = np.array([np.amin(slot_mass)*1E-15, dv_val])
        return y, d

Now we apply the fcmaes diversifier QD meta- algorithm and plot the resulting QD-archive representing a diverse set of solutions associated to different minimal mass and delta velocity values:

def nd_optimize():
    problem = get_fitness()
    problem.qd_dim = 2
    problem.qd_bounds = Bounds([1.0,15],[2.2,24])
    niche_num = 10000
    name = "scheduler_nd"
    opt_params0 = {'solver':'elites', 'popsize':200}
    opt_params1 = {'solver':'BITE_CPP', 'max_evals':1000000, 'stall_criterion':3}
    archive = diversifier.minimize(
         mapelites.wrapper(problem.qd_fun, 2, interval=100000, save_interval=200000000),
         problem.bounds, problem.qd_bounds,
         workers = 32, opt_params=[opt_params0, opt_params1],
         max_evals=100000000, archive = None,
         niche_num = niche_num, samples_per_niche = 20)
    print('final archive:', archive.info())
    archive.save(name)

We configure the algorithm to use CVT-Map-Elites in parallel with a BiteOpt improvement emitter resulting in:

The quality of the solutions is indicated by its color. We see no dark red solutions - near and above score 7000 - below delta velocity values of 19. It seems that some delta velocity is required to increase the minimal asteroid mass arriving at the dyson station.

The complete code is at scheduling.py.

How to use this algorithm to produce a GTOC11 solution

The scheduling algorithm presented here is only one step required to produce a valid GTOC11 solution. It converts a huge set of approximated asteroid transfers associated to a number of good mothership trajectories into a selection of transfers associated to only 10 trajectories thereby maximizing the approximated score. With the table given in the example it converts 321264 transfers into a selection of about 390 transfers.

Converting an approximated transfer into a real continuous thrust transfer accepted by the GTOC11 validation costs a lot of CPU resources. The scheduling algorithm can be used to minimize this effort by reducing the number of transfers it needs to be applied to:

Apply the scheduling algorithm for a limited time to determine which transfers need to be converted.
After conversion adapt the transfer timing in the table accordingly.
Repeat this process until no time slot violations occur.

Of course in later iterations only transfers which are still approximated need to be converted.

As a last step we have to exploit the fact that only the station with the least mass counts for the score. Probably some transfers can be eliminated without reducing the minimal mass - but potentially lowering the sum of the delta velocities needed for the related mothership trajectory.

Update

After applying a similar approach to the more general multi-objective flexible jobshop problem, see JobShop, we observed an unexpected result: The new multi-objective algorithm MoDe produced excellent single objective results. GTOC11 is basically also a multi-objective problem: There are two objectives:

Minimize the delta velocities of the mothership trajectories.
Maximize the minimal Dyson station mass.

The competition score weights these using a non-linear formula. As a participant of the competition you don’t really need a pareto-front, the solution of the multi-objective problem. But what if a multi-objective algorithm provides a better single-objective result? And since the result of the optimization is a pareto front, we find a number of alternative good scoring solutions. These results are still approximations, so it is better to have alternatives for the final phase where we convert these into continuous thrust trajectories.

To check this idea we first need a multi-objective variant of the fitness function - we just add a method fun to our fitness class:

@njit(fastmath=True)
class fitness(object): # the objective function
...
    def fun(self, x):
        min_mass, slot_mass = select(self.asteroid, self.station, self.trajectory, self.mass,
                        self.transfer_start, self.transfer_time, x)
        sdv = select_dvs(self.trajectory_dv, x)
        scr, dv_val = score_vals(np.amin(slot_mass), sdv)
        y = -scr
        ys = [-min_mass*1E-10, dv_val]
        ...
        if y < self.best_y.value:
               self.best_y.value = y
    return ys

We register improvements of our single objective score value, but "hide" this fact from the optimization algorithm - which "thinks" we are after the two objectives [-min_mass*1E-10, dv_val].

As multi-objective optimizer we use MoDe implemented in C++. We call it using the parallel retry mechanism:

    # multi objective optimization 'modecpp' multi threaded, NSGA-II population update
    xs, front = modecpp.retry(fit.fun, fit.nobj, fit.ncon, fit.bounds, num_retries=640, popsize = 96,
                max_evaluations = 3000000, nsga_update = True, logger = logger(), workers=16)

MoDe provides two population update mechanisms: One derived from NSGA-II and one from DE. Note that this algorithm uses (if configured) the NSGA-II population update, but differs in other aspects significantly from NSGA-II. There is no tournament selection and MoDe can handle constraints. Different from the flexible job shop problem this time the DE population update seems to be a good alternative. The NSGA-II population update improves faster, but finally the DE population update often delivers better results. With DE population update you can declare the first 10 variables (the trajectory selection) as integer variables using the ints parameter:

    # mixed integer multi objective optimization 'modecpp' multi threaded, DE population update
    fit.bounds.lb[:10] = 0
    fit.bounds.ub[:10] = TRAJECTORY_NUM-1 # integer variables include upper bound
    xs, front = modecpp.retry(fit.fun, fit.nobj, fit.ncon, fit.bounds, num_retries=640, popsize = 128,
                max_evaluations = 10000000, nsga_update = False, logger = logger(), workers=16,
                ints=[True]*10+[False]*(dim-10))

Increasing both evaluation budged and population size, DE population update after one hour found a score of 7197 using 60 trajectories:

evals = 56562254: time = 3604.9 s = 7197 a = 385 t = [7.28, 8.75, 10.33, 11.12, 12.42, 13.59, 14.74, 15.9, 16.68, 17.86, 19.08] s = [12, 1, 2, 3, 4, 5, 6, 9, 8, 11, 7, 10] b = [19, 50, 4, 43, 36, 45, 54, 29, 55, 49] m = [1.67, 1.68, 1.68, 1.68, 1.68, 1.68, 1.68, 1.69, 1.71, 1.73, 1.74, 1.84] dv = [21.84, 20.04, 19.74, 19.5, 18.26, 19.12, 16.53, 19.9, 16.09, 22.18]

A score of 7329 was reached using 120 trajectories to choose from:

evals = 39043304: time = 3067.0 s = 7329 a = 385 t = [7.15, 8.84, 10.14, 11.46, 12.47, 13.74, 14.72, 15.66, 16.6, 17.91, 19.12] s = [9, 1, 11, 10, 5, 8, 2, 4, 3, 12, 6, 7] b = [104, 70, 55, 117, 12, 66, 89, 106, 54, 34] m = [1.65, 1.65, 1.66, 1.66, 1.66, 1.66, 1.66, 1.67, 1.67, 1.69, 1.69, 1.69] dv = [15.77, 18.33, 16.09, 18.54, 20.66, 18.51, 19.62, 16.61, 16.53, 21.77]

You may customize the termination criteria by providing a is_terminate callback:

    xs, front = modecpp.retry(fit.fun, fit.nobj, fit.ncon, fit.bounds, num_retries=640, popsize = 96,
                  max_evaluations = 3000000, nsga_update = True, logger = logger(),
                  is_terminate = is_terminate(), workers=16)
   ...

   class is_terminate(object):

    def __init__(self):
        self.count = 0
        self.score = 0

    def __call__(self, x, y):
        self.count += 1
        score = - y[0] / (A_DYSON * A_DYSON * y[1])
        if self.score < score:
            self.score = score
        if self.count == 300000 and self.score < 20500:
            logger().info("fail: " + str(self.score))
            self.score = 0
            self.count = 0
            return True
        else:
            return False # don't terminate optimization

Here we check if the "virtual" score reached "20500" after 300000 function calls (at this thread). Note that the first component of the objective function result vector is a score based on a mass value considering not only the minimal mass to support the optimization process.

Detailed results

For the results below we mainly used a transfer table based on Tsinhuas 1.1 AU dyson sphere.

Number of trajectories offered for selection

During the competition many teams didn’t integrate trajectory selection in their asteroid transfer / station selection optimization. Instead they used a fixed set of 10 good trajectories and repeated this process for different sets. Is this a good idea? If you for instance found about 60 promising trajectory candidates you end up with 75394027566 possible combinations (n choose k calculator). Lets check what we get with different sets, preselecting the best 10, 60 and 120 trajectories from a set of 3000 good trajectories we selected from the results of 20000 beamsearch runs. Using 60 trajectories we get a score about 7000:

Restricting to the best 10 reduces the score to about 6200:

Using 120 trajectories we can improve to a score about 7100:

These results show that:

Integration of trajectory selection into the objective function used for optimization works very well.
The additional 10 variables increase the dimensionality of the problem, but this doesn’t lead to a significant slowdown of the optimization process.

Note that we haven’t included the 120 trajectories transfer file because of its size, but you may reproduce the first two results.

Update

The relation using the multiobjective modecpp optimizer is similar: score 7067 for 60 trajectories, score 7201 for 120 trajectories.

Size of the Dyson sphere

The average Dyson sphere size chosen by the teams was a = 1.2 AU. The two top teams, Tsinhua and ESA choose opposite extremes regarding the size, Tsinhua opted for a = 1.1 AU, ESA for a = 1.32 AU. We used Tsinhuas sphere to compute our transfer table for our search results, what happens if we choose ESAs bigger sphere instead? Using exactly the same method and optimization approach? The result is surprising:

We end up at a score about 6600 which is only 500 less compared to the Tsinhua sphere. The real competition results differ about 2200. May be the longer transfers enable more improvement potential if you try really hard to minimize time of flight. We didn’t adapt our search to the different spheres, only the trajectory ordering was dependent on the transfers found.

Comparison of algorithms for scheduling

The results below can be summarized as follows:

Continuous optimization works for mixed integer problems (like scheduling).
The objective function speedup achieved by Numba (450x) is essential to get reasonable results.
Parallel optimization / multi threading speeds up the optimization significantly.
The best method is Smart Boundary Management + DE→CMA. SBM was originally developed for GTOP (see smart-retry to check its performance for these problems). It performs surprisingly well for GTOC11 scheduling, although with adapted parameter setting. Like PAGMOs archipelagos SBM utilizes "communication" between the threads and therefore can not be executed single threaded.
The best single threaded algorithms are BiteOpt and the fcmaes implementation of CMA-ES.
The best scipy algorithm is Dual Annealing
Implementation of an objective function in Python using continuous input variables is surprisingly straightforward.
pandas helps to simplify pre-processing of the input data given as CSV-file.

1 thread, no NUMA

The following results were obtained optimizing single threaded after "commenting out" all '@njit' annotoations in the code to disable Numba resulting in a dramatic loss of performance (factor 450x).

scipy minimize should be avoided for scheduling problems:

scipy differential_evolution is better, but still not sufficient:

fcmaes GCL DE, another differential evolution variant.

scipy dual_annealing can be used, but is slow:

The following algorithms all perform quite well given the small amount of function evaluations performed without enabling Numba:

fcmaes differential evolution

CMA-ES

BiteOpt

1 thread, using NUMA

By enabling Numa things improve significantly. But it is interesting to see that algorithms which don’t "fit" to the problem cannot utilize the 450x speedup. And even the best ones need about one hour to find a good solution.

scipy minimize

scipy differential_evolution

scipy dual_annealing

fcmaes differential evolution

fcmaes GCL DE

CMA-ES

BiteOpt

32 threads, using NUMA

fcmaes has a "parallel retry" mechanism which enables the execution of many optimization retries in parallel. We used 32 parallel threads on the 16 core AMD 5950x CPU. The speedup obtained by parallelization is between 8x and 18x depending on the used algorithm. Now we get very good results using BiteOpt, CMA-ES and scipy Dual Annealing.

scipy minimize

scipy dual_annealing

fcmaes differential evolution

CMA-ES

BiteOpt

32 threads, Smart Boundary Management using NUMA

An algorithm suited well for this problem is the fcmaes smart boundary management parallel meta algorithm. We applied it to its default base algorithm - a DE→CMA-ES sequence, to BiteOpt and to CMA-ES alone. BiteOpt fits not very well here, both other base algorithms perform better. Only a few seconds are required for a good result, the optimium is found after about 1 hour.

fcmaes SBM applied to DE→CMA-ES

fcmaes SBM applied to BiteOpt

fcmaes SBM applied to CMA-ES

16 threads, Multi-objective optimization (modecpp) using NUMA

The algorithm - surprisingly - suited best for this problem is the fcmaes modecpp multi-objective optimization used to optimize the single objective "score". Our real objective was "hidden" from modecpp, but searching for the two objectives "maximal mass" and "minimal dv value" seems to guide the optimization in the right direction.

modecpp parallel retry using DE population update (score 7067):

modecpp parallel retry using NSGA-II population update (score 7045):

modecpp parallel retry using DE population update, 120 trajectories (score 7203):

What about the competition ?

There are not many open source optimization libraries out there matching the functionality of fcmaes regarding support for parallelization, multiple objectives, constraints and mixed integer. pymoo fulfills these criteria, has excellent documentation and is quite easy to use. The only thing which is a bit tricky here is that numba doesn’t support mixed integer arrays. We have to use x.astype(float) to convert the decision variables into a float array.

def check_pymoo(dim, fit, lb, ub, is_MO):
    ...
    if is_MO:
        lb[:10] = 0
        ub[:10] = TRAJECTORY_NUM-1 # integer variables include upper bound

    class MyProblem(ElementwiseProblem):

        def __init__(self, **kwargs):
            super().__init__(n_var=dim,
                             n_obj=2,
                             n_constr=0,
                             xl=np.array(lb),
                             xu=np.array(ub), **kwargs)

        def _evaluate(self, x, out, *args, **kwargs):
            if is_MO:
                out["F"] = fit.fun(x.astype(float)) # numba requires all floats
            else:
                out["F"] = fit(x.astype(float))  #  fit returns the score

    pool = ThreadPool(16)
    problem = MyProblem(runner=pool.starmap, func_eval=starmap_parallelized_eval)

    mask = ["int"]*10+["real"]*(dim-10)

    sampling = MixedVariableSampling(mask, {
        "real": get_sampling("real_random"),
        "int": get_sampling("int_random")
    })

    crossover = MixedVariableCrossover(mask, {
        "real": get_crossover("real_sbx", prob=0.9, eta=15),
        "int": get_crossover("int_sbx", prob=0.9, eta=15)
    })

    mutation = MixedVariableMutation(mask, {
        "real": get_mutation("real_pm", eta=20),
        "int": get_mutation("int_pm", eta=20)
    })

    if is_MO:
        algorithm = NSGA2(
            pop_size=256,
            n_offsprings=10,
            sampling=sampling,
            crossover=crossover,
            mutation=mutation,
            eliminate_duplicates=True
        )
    else:
        algorithm = DE(
            pop_size=100,
            variant="DE/rand/1/bin",
            CR=0.3,
            dither="vector",
        )

    from pymoo.optimize import minimize

    res = minimize(problem,
                   algorithm,
                   get_termination("n_gen", 500000),
                   verbose=False)

You find reasonable results quite fast, but I never found a score > 7000 using pymoo. The number of evaluations per second is much lower than using fcmaes, partly because you only can parallelize function evaluations, not whole optimization runs. As with fcmaes, the results using MOO (NSGA2) surpass what you find using single objective optimization (DE). Unfortunately, specifying MixedVariableSampling for DE leads to an error. As does pool = multiprocessing.Pool(16).

Conclusion

There are four main conclusions which can be derived from the results:

Don’t be afraid to add decision variables to your objective function. Almost all GTOC11 participants chose the alternative path: Solving the problem incrementally step by step. Be aware that this way you restrict what the optimizer can do. Adding decision variables makes the job harder for the optimizer, but it offers new opportunities. Don’t underestimate what an optimizer can do.
If you want to optimize a single objective - for a competition like GTOC11 it is the score - which is derived from different multiple objectives, also try a multi-objective optimizer. This way you may get more diverse results, and as shown here, you may even improve the single objective / score. This result could be confirmed using both fcmaes and pymoo.
Python is a language suitable for expensive objective functions - but you have to use numba where it fits. Only numpy based matrix operations can be used without. As soon as loops are involved, numba is mandatory to be competitive with C++.
Try to utilize all cores / threads your CPU provides. GTOC11 participants often used GPUs because of their huge number of cores. This can be done to create the database / table used as basis of the optimization. For the optimization itself we are currently restricted to CPU cores. There we have two options: Parallel objective function evaluation and parallel optimization runs. The second option usually scales better and proved to be advantageous for GTOC11.

The suitability of an optimization algorithm used for a competition like GTOC11 - a kind of real world application - cannot be checked using standard single threaded benchmarks ignoring parallelism. Standard benchmarks can be used to find errors, but never use them to tune your algorithms. Applying pymoo standard algorithms like DE or NSGA2 you find reasonable solutions very fast, but they fail to scale well with the time/evaluation budget or with the number of available CPU cores.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Scheduling.adoc

Scheduling.adoc

fcmaes - a Python 3 gradient-free optimization library

Solving a complex scheduling problem using continuous optimization

Important concepts used

Update

GTOC 11

Asteroid transfer table

Implementation

Design of the argument vector

Design of the objective function

Hints

Selection of a suitable Dyson sphere

Finding good mothership trajectories

Estimation of the arrival mass

Optimal delta velocity

QD update

How to use this algorithm to produce a GTOC11 solution

Update

Detailed results

Number of trajectories offered for selection

Update

Size of the Dyson sphere

Comparison of algorithms for scheduling

1 thread, no NUMA

1 thread, using NUMA

32 threads, using NUMA

32 threads, Smart Boundary Management using NUMA

16 threads, Multi-objective optimization (modecpp) using NUMA

What about the competition ?

Conclusion

Files

Scheduling.adoc

Latest commit

History

Scheduling.adoc

File metadata and controls

fcmaes - a Python 3 gradient-free optimization library

Solving a complex scheduling problem using continuous optimization

Important concepts used

Update

GTOC 11

Asteroid transfer table

Implementation

Design of the argument vector

Design of the objective function

Hints

Selection of a suitable Dyson sphere

Finding good mothership trajectories

Estimation of the arrival mass

Optimal delta velocity

QD update

How to use this algorithm to produce a GTOC11 solution

Update

Detailed results

Number of trajectories offered for selection

Update

Size of the Dyson sphere

Comparison of algorithms for scheduling

1 thread, no NUMA

1 thread, using NUMA

32 threads, using NUMA

32 threads, Smart Boundary Management using NUMA

16 threads, Multi-objective optimization (modecpp) using NUMA

What about the competition ?

Conclusion