Skip to content


Repository files navigation

Simple and reliable optimization with local, global, population-based and sequential techniques in numerical discrete search spaces.

Master status: img not loaded: try F5 :) img not loaded: try F5 :)
Code quality: img not loaded: try F5 :) img not loaded: try F5 :)
Latest versions: img not loaded: try F5 :)


Gradient-Free-Optimizers provides a collection of easy to use optimization techniques, whose objective function only requires an arbitrary score that gets maximized. This makes gradient-free methods capable of solving various optimization problems, including:

  • Optimizing arbitrary mathematical functions.
  • Fitting multiple gauss-distributions to data.
  • Hyperparameter-optimization of machine-learning methods.

Gradient-Free-Optimizers is the optimization backend of Hyperactive (in v3.0.0 and higher) but it can also be used by itself as a leaner and simpler optimization toolkit.

Main features

  • Easy to use:

    Simple API-design

    You can optimize anything that can be defined in a python function. For example a simple parabola function:

    def objective_function(para):
        score = para["x1"] * para["x1"]
        return -score

    Define where to search via numpy ranges:

    search_space = {
        "x": np.arange(0, 5, 0.1),

    That`s all the information the algorithm needs to search for the maximum in the objective function:

    from gradient_free_optimizers import RandomSearchOptimizer
    opt = RandomSearchOptimizer(search_space), n_iter=100000)
    Receive prepared information about ongoing and finished optimization runs

    During the optimization you will receive ongoing information in a progress bar:

    • current best score
    • the position in the search space of the current best score
    • the iteration when the current best score was found
    • other information about the progress native to tqdm
  • High performance:

    Modern optimization techniques

    Gradient-Free-Optimizers provides not just meta-heuristic optimization methods but also sequential model based optimizers like bayesian optimization, which delivers good results for expensive objetive functions like deep-learning models.

    Lightweight backend

    Even for the very simple parabola function the optimization time is about 60% of the entire iteration time when optimizing with random search. This shows, that (despite all its features) Gradient-Free-Optimizers has an efficient optimization backend without any unnecessary slowdown.

    Save time with memory dictionary

    Per default Gradient-Free-Optimizers will look for the current position in a memory dictionary before evaluating the objective function.

    • If the position is not in the dictionary the objective function will be evaluated and the position and score is saved in the dictionary.

    • If a position is already saved in the dictionary Gradient-Free-Optimizers will just extract the score from it instead of evaluating the objective function. This avoids reevaluating computationally expensive objective functions (machine- or deep-learning) and therefore saves time.

  • High reliability:

    Extensive testing

    Gradient-Free-Optimizers is extensivly tested with more than 400 tests in 2500 lines of test code. This includes the testing of:

    • Each optimization algorithm
    • Each optimization parameter
    • All attributes that are part of the public api
    Performance test for each optimizer

    Each optimization algorithm must perform above a certain threshold to be included. Poorly performing algorithms are reworked or scraped.

Optimization algorithms:

Gradient-Free-Optimizers supports a variety of optimization algorithms, which can make choosing the right algorithm a tedious endeavor. The gifs in this section give a visual representation how the different optimization algorithms explore the search space and exploit the collected information about the search space for a convex and non-convex objective function. More detailed explanations of all optimization algorithms can be found in the official documentation.

Local Optimization

Hill Climbing

Evaluates the score of n neighbours in an epsilon environment and moves to the best one.

Convex Function Non-convex Function
Stochastic Hill Climbing

Adds a probability to the hill climbing to move to a worse position in the search-space to escape local optima.

Convex Function Non-convex Function
Repulsing Hill Climbing

Hill climbing algorithm with the addition of increasing epsilon by a factor if no better neighbour was found.

Convex Function Non-convex Function
Simulated Annealing

Adds a probability to the hill climbing to move to a worse position in the search-space to escape local optima with decreasing probability over time.

Convex Function Non-convex Function
Downhill Simplex Optimization

Constructs a simplex from multiple positions that moves through the search-space by reflecting, expanding, contracting or shrinking.

Convex Function Non-convex Function

Global Optimization

Random Search

Moves to random positions in each iteration.

Convex Function Non-convex Function
Grid Search

Grid-search that moves through search-space diagonal (with step-size=1) starting from a corner.

Convex Function Non-convex Function
Random Restart Hill Climbing

Hill climbingm, that moves to a random position after n iterations.

Convex Function Non-convex Function
Random Annealing

Hill Climbing, that has large epsilon at the start of the search decreasing over time.

Convex Function Non-convex Function
Pattern Search

Creates cross-shaped collection of positions that move through search-space by moving as a whole towards optima or shrinking the cross.

Convex Function Non-convex Function
Powell's Method

Optimizes each search-space dimension at a time with a hill-climbing algorithm.

Convex Function Non-convex Function

Population-Based Optimization

Parallel Tempering

Population of n simulated annealers, which occasionally swap transition probabilities.

Convex Function Non-convex Function
Particle Swarm Optimization

Population of n particles attracting each other and moving towards the best particle.

Convex Function Non-convex Function
Spiral Optimization

Population of n particles moving in a spiral pattern around the best position.

Convex Function Non-convex Function
Evolution Strategy

Population of n hill climbers occasionally mixing positional information and removing worst positions from population.

Convex Function Non-convex Function

Sequential Model-Based Optimization

Bayesian Optimization

Gaussian process fitting to explored positions and predicting promising new positions.

Convex Function Non-convex Function
Lipschitz Optimization

Calculates an upper bound from the distances of the previously explored positions to find new promising positions.

Convex Function Non-convex Function
DIRECT algorithm

Separates search space into subspaces. It evaluates the center position of each subspace to decide which subspace to sepate further.

Convex Function Non-convex Function
Tree of Parzen Estimators

Kernel density estimators fitting to good and bad explored positions and predicting promising new positions.

Convex Function Non-convex Function
Forest Optimizer

Ensemble of decision trees fitting to explored positions and predicting promising new positions.

Convex Function Non-convex Function

Sideprojects and Tools

The following packages are designed to support Gradient-Free-Optimizers and expand its use cases.

Package Description
Search-Data-Collector Simple tool to save search-data during or after the optimization run into csv-files.
Search-Data-Explorer Visualize search-data with plotly inside a streamlit dashboard.

If you want news about Gradient-Free-Optimizers and related projects you can follow me on twitter.


PyPI version

The most recent version of Gradient-Free-Optimizers is available on PyPi:

pip install gradient-free-optimizers


Convex function
import numpy as np
from gradient_free_optimizers import RandomSearchOptimizer

def parabola_function(para):
    loss = para["x"] * para["x"]
    return -loss

search_space = {"x": np.arange(-10, 10, 0.1)}

opt = RandomSearchOptimizer(search_space), n_iter=100000)
Non-convex function
import numpy as np
from gradient_free_optimizers import RandomSearchOptimizer

def ackley_function(pos_new):
    x = pos_new["x1"]
    y = pos_new["x2"]

    a1 = -20 * np.exp(-0.2 * np.sqrt(0.5 * (x * x + y * y)))
    a2 = -np.exp(0.5 * (np.cos(2 * np.pi * x) + np.cos(2 * np.pi * y)))
    score = a1 + a2 + 20
    return -score

search_space = {
    "x1": np.arange(-100, 101, 0.1),
    "x2": np.arange(-100, 101, 0.1),

opt = RandomSearchOptimizer(search_space), n_iter=30000)
Machine learning example
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.datasets import load_wine

from gradient_free_optimizers import HillClimbingOptimizer

data = load_wine()
X, y =,

def model(para):
    gbc = GradientBoostingClassifier(
    scores = cross_val_score(gbc, X, y, cv=3)

    return scores.mean()

search_space = {
    "n_estimators": np.arange(20, 120, 1),
    "max_depth": np.arange(2, 12, 1),
    "min_samples_split": np.arange(2, 12, 1),
    "min_samples_leaf": np.arange(1, 12, 1),

opt = HillClimbingOptimizer(search_space), n_iter=50)
Constrained Optimization example
import numpy as np
from gradient_free_optimizers import RandomSearchOptimizer

def convex_function(pos_new):
    score = -(pos_new["x1"] * pos_new["x1"] + pos_new["x2"] * pos_new["x2"])
    return score

search_space = {
    "x1": np.arange(-100, 101, 0.1),
    "x2": np.arange(-100, 101, 0.1),

def constraint_1(para):
    # only values in 'x1' higher than -5 are valid
    return para["x1"] > -5

# put one or more constraints inside a list
constraints_list = [constraint_1]

# pass list of constraints to the optimizer
opt = RandomSearchOptimizer(search_space, constraints=constraints_list), n_iter=50)

search_data = opt.search_data

# the search-data does not contain any samples where x1 is equal or below -5
print("\n search_data \n", search_data, "\n")


v0.3.0 ✔️
  • add sampling parameter to Bayesian optimizer
  • add warnings parameter to Bayesian optimizer
  • improve access to parameters of optimizers within population-based-optimizers (e.g. annealing rate of simulated annealing population in parallel tempering)
v0.4.0 ✔️
  • add early stopping parameter
v0.5.0 ✔️
  • add grid-search to optimizers
  • impoved performance testing for optimizers
v1.0.0 ✔️
  • Finalize API (1.0.0)
  • add Downhill-simplex algorithm to optimizers
  • add Pattern search to optimizers
  • add Powell's method to optimizers
  • add parallel random annealing to optimizers
  • add ensemble-optimizer to optimizers
v1.1.0 ✔️
  • add Spiral Optimization
  • add Lipschitz Optimizer
  • print the random seed for reproducibility
v1.2.0 ✔️
  • add DIRECT algorithm
  • automatically add random initial positions if necessary (often requested)
v1.3.0 ✔️
  • add support for constrained optimization
  • add Grid search parameter that changes direction of search
  • add Random search parameter that enables to avoid replacement of the sampling
  • add SMBO parameter that enables to avoid replacement of the sampling
  • add Ant-colony optimization
  • add API, testing and doc to (better) use GFO as backend-optimization package
  • add other acquisition functions to smbo (Probability of improvement, Entropy search, ...)
  • ...

Gradient Free Optimizers <=> Hyperactive

Gradient-Free-Optimizers was created as the optimization backend of the Hyperactive package. Therefore the algorithms are exactly the same in both packages and deliver the same results. However you can still use Gradient-Free-Optimizers as a standalone package. The separation of Gradient-Free-Optimizers from Hyperactive enables multiple advantages:

  • Even easier to use than Hyperactive
  • Separate and more thorough testing
  • Other developers can easily use GFOs as an optimizaton backend if desired
  • Better isolation from the complex information flow in Hyperactive. GFOs only uses positions and scores in a N-dimensional search-space. It returns only the new position after each iteration.
  • a smaller and cleaner code base, if you want to explore my implementation of these optimization techniques.

While Gradient-Free-Optimizers is relatively simple, Hyperactive is a more complex project with additional features. The differences between Gradient-Free-Optimizers and Hyperactive are listed in the following table:

Gradient-Free-Optimizers Hyperactive
Search space composition only numerical numbers, strings and functions
Parallel Computing not supported yes, via multiprocessing or joblib
Distributed computing not supported yes, via data sharing at runtime
Visualization via Search-Data-Explorer via Search-Data-Explorer and Progress Board


  author =   {{Simon Blanke}},
  title =    {{Gradient-Free-Optimizers}: Simple and reliable optimization with local, global, population-based and sequential techniques in numerical search spaces.},
  howpublished = {\url{}},
  year = {since 2020}


Gradient-Free-Optimizers is licensed under the following License: