# **AAHPS: ASSIGNMENT 5**

**Authors**: Nina Mislej and Nika Molan
<br/>**Student numbers**: 63200016 and 63200017

### **Exercise Description**
The aim of this assignment is to find the best possible results for **24 optimization** (*minimization*) **functions** from the **BBOB** (*Black-Box Optimization Benchmark*) suite. We will implement **2 optimization programs**: one has to be a **local search**, while the other can use **any** optimization approach.
<br/>The functions are available in the ***smoof*** package for **R**, and we initialize them with **40 dimensions** and the instance ID (*iid*) **2023**.

In addition to this report, the results include the **coordinates** of each of the **24 minima**, stored in a separate file for each algorithm.
<br/>Each line holds the **40**-**tuple** for one function.

### **Functions Description**

The [functions](https://numbbo.github.io/gforge/downloads/download16.00/bbobdocfunctions.pdf) are designed to cover many of the difficulties that **different optimization approaches** run into. Some test whether an algorithm gets **stuck in local optima**, some check whether it can find optima near the **boundary** of the domain, some are symmetric while others are highly asymmetric, and so on. We have to take these properties into account when choosing our algorithms.

```python
# setting up the packages and environment
import numpy as np
import math
from rpy2.robjects import numpy2ri
from rpy2.robjects.packages import importr

numpy2ri.activate()  # automatic conversion between numpy and R arrays
smoof = importr("smoof")  # importing the R smoof package
```

Now we **initialize** the functions.
<br/>In the process we also store their known global minimum values, as reported by *smoof*'s `getGlobalOptimum`. We will use these values later on **to compare** against the values found by our optimization algorithms.

```python
# initializing the functions and printing their known minimum values
functions = {}
true_minimums = {}

for fun in range(1, 25):
    functions[fun] = smoof.makeBBOBFunction(40, fun, 2023)  # dimensions, function ID, instance ID
    true_minimums[fun] = smoof.getGlobalOptimum(functions[fun]).rx2("value")[0]
    print(f"MINIMUM OF FUNCTION {fun}: {true_minimums[fun]}")

# setting the bounds of the search domain
upper_bound = 5
lower_bound = -5
```

```
MINIMUM OF FUNCTION 1: 21.1
MINIMUM OF FUNCTION 2: 26.91
MINIMUM OF FUNCTION 3: 311.6
MINIMUM OF FUNCTION 4: 311.6
MINIMUM OF FUNCTION 5: -48.47
MINIMUM OF FUNCTION 6: -91.36
MINIMUM OF FUNCTION 7: 32.49
MINIMUM OF FUNCTION 8: 71.6
MINIMUM OF FUNCTION 9: -356.7
MINIMUM OF FUNCTION 10: 51.03
MINIMUM OF FUNCTION 11: -96.65
MINIMUM OF FUNCTION 12: 553.39
MINIMUM OF FUNCTION 13: 9.88
MINIMUM OF FUNCTION 14: 405.47
MINIMUM OF FUNCTION 15: 64.25
MINIMUM OF FUNCTION 16: -43.28
MINIMUM OF FUNCTION 17: 227.51
MINIMUM OF FUNCTION 18: 227.51
MINIMUM OF FUNCTION 19: 73.06
MINIMUM OF FUNCTION 20: -123.81
MINIMUM OF FUNCTION 21: -44.42
MINIMUM OF FUNCTION 22: 222.1
MINIMUM OF FUNCTION 23: -1000.0
MINIMUM OF FUNCTION 24: -1.33
```


```python
# comparing the minimum values we found with the true ones
def comparison(approx_minimums):
    error = 0
    for i, (true, approx) in enumerate(zip(true_minimums.values(), approx_minimums), 1):
        diff = abs(true - approx)
        print(f"{i : <20} TRUE: {true : <20} APPROXIMATE: {round(approx, 2) : <20} DIFFERENCE: {round(diff, 2)}")
        error += diff
    # sum the unrounded differences and round only for printing
    print(f"OVERALL ABSOLUTE ERROR:  {round(error, 2)}")

# writing the coordinates of each minimum to a file, one space-separated point per line
def results_to_file(coordinates, algo_number):
    with open(f"algorithm_{algo_number}.txt", "w") as f:
        for point in coordinates:
            f.write(" ".join(str(xi) for xi in point) + "\n")
```

## **ALGORITHM 1:** Gradient Descent

We chose this as the **first** algorithm to test because it is widely used and works well for well-behaved functions. One big obstacle is differentiability: the functions are not always differentiable, and computing the derivatives analytically would be time-consuming, so we use a numerical **approximation** instead. 15 of the 17 functions with a description are described as **differentiable**, so the approach is worth analysing, even though it is somewhat naive: it performs reliably only on smooth, convex, unimodal functions. Another problem is that it is very likely to get stuck in a **local minimum**. We address this by adding a random component: we run the descent **several times**, each time from a **different starting point** chosen uniformly at random.
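Why a single descent gets stuck, and why random restarts help, can be seen on a small example. The sketch below is illustrative only: it uses a hypothetical 1-D Rastrigin function (the BBOB suite contains multidimensional Rastrigin variants) with its exact derivative and a fixed step size, rather than the notebook's smoof functions and numerical gradient.

```python
import math

def rastrigin_1d(x):
    # 1-D Rastrigin function: global minimum f(0) = 0, local minima near every integer
    return 10 + x**2 - 10 * math.cos(2 * math.pi * x)

def rastrigin_1d_grad(x):
    # exact derivative of rastrigin_1d
    return 2 * x + 20 * math.pi * math.sin(2 * math.pi * x)

def descend(x, lr=0.001, steps=2000):
    # plain gradient descent with a fixed step size, no bounds handling
    for _ in range(steps):
        x = x - lr * rastrigin_1d_grad(x)
    return x

def multi_start(starts, **kwargs):
    # one descent per starting point; keep the end point with the lowest value
    ends = [descend(x0, **kwargs) for x0 in starts]
    return min(ends, key=rastrigin_1d)
```

A descent started at 3.2 ends in the local minimum near x ≈ 2.985, with a value close to 9, while the best of several starts that include one near 0 reaches the global minimum. In the notebook the starting points are drawn uniformly at random instead of being fixed.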

The idea is to start at a randomly chosen point and compute the **gradient** there, which the function **``gradient_approximation``** takes care of. Instead of differentiating with respect to each variable analytically, we use a forward difference: for each coordinate we evaluate the function at the current point and at a point shifted by 0.01 along that coordinate. Because the gradient points in the direction of the **steepest increase** of the objective function, we obtain the next point by moving in the opposite direction: $x_n = x_{n-1} - \alpha \cdot \nabla G(x_{n-1})$, where $\nabla G$ is the gradient of the objective function $G$.
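The forward-difference gradient and the update rule can be tried out on a simple function first. This is a minimal NumPy sketch assuming a toy `sphere` objective $G(x) = \sum_i x_i^2$ (whose exact gradient is $2x$) instead of a smoof function; `forward_difference_gradient`, `gradient_step` and `sphere` are hypothetical names. Unlike the notebook, it uses a single scalar $\alpha$ and has no NaN fallback.

```python
import numpy as np

def forward_difference_gradient(f, x, h=0.01):
    # approximate each partial derivative by (f(x + h * e_i) - f(x)) / h
    x = np.asarray(x, dtype=float)
    fx = f(x)  # f(x) is shared by all coordinates, so evaluate it once
    grad = np.empty_like(x)
    for i in range(x.size):
        shifted = x.copy()
        shifted[i] += h
        grad[i] = (f(shifted) - fx) / h
    return grad

def gradient_step(f, x, alpha, h=0.01):
    # one descent update: x_n = x_{n-1} - alpha * grad G(x_{n-1})
    x = np.asarray(x, dtype=float)
    return x - alpha * forward_difference_gradient(f, x, h)

def sphere(x):
    # toy objective G(x) = sum(x_i^2), minimum 0 at the origin
    return float(np.sum(np.asarray(x) ** 2))
```

With h = 0.01 each estimated component is $2x_i + h$, so at (1, 2) the estimate is (2.01, 4.01) instead of (2, 4), and one step with $\alpha = 0.1$ moves to (0.799, 1.599).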

The next question is the **step size** along this direction, given by the **``alpha``** parameter in the code below. There are many ways to choose this scalar; the optimal one would be the value that **minimizes** $G(x_{n-1} - \alpha \cdot \nabla G(x_{n-1}))$. Searching for it would take a lot of time in 40 dimensions, and we also observed that the components of the gradient often differ by several orders of magnitude. We tackled this by using a different step size for each partial derivative: $\alpha_i = 10^{-d_i}$, where $d_i$ is the number of digits of $|\lfloor \partial G / \partial x_i \rfloor|$, so that each coordinate moves by **less than 1** in a single step. This also keeps many steps from leaving the domain. All of this is wrapped up in the function **``new_point``**, which computes the next point and keeps a coordinate unchanged if the move would leave the domain.
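The digit-based step size is easiest to see on a few numbers. The sketch below reproduces the rule from `new_point` in isolation (`digit_step_size` and `scaled_step` are hypothetical names): a derivative of 123.4 has 3 digits in $|\lfloor g \rfloor|$, so $\alpha = 10^{-3}$ and the coordinate moves by 0.1234, while a derivative of 0.5 gives $\alpha = 10^{-1}$ and a move of 0.05. Because $|\partial G/\partial x_i| < 10^{d_i}$, the move $|\alpha_i \, \partial G/\partial x_i|$ is always below 1.

```python
import math

def digit_step_size(grad):
    # alpha = 10^(-d), where d is the number of digits of |floor(grad)| (same rule as new_point)
    d = len(str(abs(math.floor(grad))))
    return 10 ** (-d)

def scaled_step(grad):
    # the move applied to one coordinate: alpha * grad
    return digit_step_size(grad) * grad
```

One consequence of this rule is that the move does not grow smoothly with the derivative: derivatives of 9 and 10 produce moves of 0.9 and 0.1 respectively.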

***NOTE***: This algorithm is quite **time-consuming** for all 24 functions, because every step requires a **gradient approximation**, i.e. at least one extra function evaluation per dimension.
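A rough count of objective evaluations makes the cost concrete. The sketch below uses the "final result" settings from the PARAMETERS comments in the notebook (100 runs with limit 300 for gradient descent, 100 runs with limit 3000 for simulated annealing); `total_evaluations` is a hypothetical helper, and the per-step counts are bounds, not measurements.

```python
def total_evaluations(runs, steps, n_functions, evals_per_step):
    # every run performs `steps` steps on each of the `n_functions` functions
    return runs * steps * n_functions * evals_per_step

dim = 40
# gradient descent: a forward-difference gradient needs at least dim + 1 evaluations,
# plus one more to evaluate the new point
gd_evals = total_evaluations(runs=100, steps=300, n_functions=24, evals_per_step=dim + 2)
# simulated annealing: at most one evaluation per step (out-of-bounds candidates are skipped)
sa_evals = total_evaluations(runs=100, steps=3000, n_functions=24, evals_per_step=1)
```

This gives at least 30,240,000 evaluations for gradient descent against at most 7,200,000 for simulated annealing, even though simulated annealing takes ten times as many steps.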

```python
# function implementation
def gradient_descent(limit, dim, functions):

    # setting the best solution to the first one we try and updating it later on;
    # not strictly necessary, but with our step-size (alpha) rule a step can land on a worse point
    best_coordinates = np.random.uniform(lower_bound, upper_bound, (len(functions), dim))
    best_values = [functions[i + 1](best_coordinates[i])[0] for i in range(len(functions))]

    # our solution in each step - starting position
    curr_coordinates = best_coordinates.copy()

    # limit is the number of descent steps per function
    for i in range(len(functions)):
        for k in range(limit):
            gradient = gradient_approximation(functions[i + 1], curr_coordinates[i], dim)
            curr_coordinates[i] = new_point(curr_coordinates[i], gradient)
            curr_value = functions[i + 1](curr_coordinates[i])[0]

            if curr_value < best_values[i]:
                best_values[i] = curr_value
                best_coordinates[i] = curr_coordinates[i].copy()

    return best_coordinates, best_values

# approximates all partial derivatives in the gradient with forward differences
def gradient_approximation(function, x0, dim, h=0.01):
    approx = []
    x1 = x0.copy()
    y0 = function(x0)[0]  # f(x0) does not change inside the loop, so evaluate it once

    # approximation for each variable
    for i in range(dim):
        x1[i] = x0[i] + h
        y1 = function(x1)[0]

        # if the difference is undefined (NaN), retry with a larger offset and re-evaluate
        if math.isnan(y1 - y0):
            x1[i] = x0[i] + 5 * h
            y1 = function(x1)[0]
        approx.append((y1 - y0) / (x1[i] - x0[i]))
        x1[i] = x0[i]  # restore coordinate i before moving on to the next one
    return approx

# calculating the step size (alpha) and deciding on a new point
def new_point(point, gradient):
    new = []
    for xi, grad in zip(point, gradient):

        # a NaN or infinite derivative cannot be used for a step, so keep the coordinate
        if not math.isfinite(grad):
            new.append(xi)
            continue

        # alpha = 10^(-number of digits of the derivative's integer part)
        no_digits = len(str(abs(math.floor(grad))))
        alpha = 10**(-no_digits)
        next_point = xi - alpha * grad

        # checking the constraints
        if lower_bound < next_point < upper_bound: new.append(next_point)
        else: new.append(xi)
    return new
```

```python
# testing our algorithm
# we make multiple runs and keep the best result for each function
# IMPORTANT: DON'T OVERWRITE THE FINAL RESULTS !!!
min_values = [float("inf") for _ in range(len(functions))]
min_coordinates = [[0 for _ in range(40)] for _ in range(len(functions))]
no_runs = 10

for _ in range(no_runs):
    curr_coordinates, curr_values = gradient_descent(limit=100, dim=40, functions=functions)
    for k in range(len(functions)):
        if curr_values[k] < min_values[k]:
            min_values[k] = curr_values[k]
            min_coordinates[k] = curr_coordinates[k].copy()

comparison(min_values)
results_to_file(min_coordinates, 1)

# PARAMETERS
# 1 run with limit 100: 1 min - testing
# 10 runs with limit 100: 10 min - decent result
# 100 runs with limit 300: 200 min - final result
```

```
1                    TRUE: 21.1                 APPROXIMATE: 21.1                 DIFFERENCE: 0.0
2                    TRUE: 26.91                APPROXIMATE: 44999.24             DIFFERENCE: 44972.33
3                    TRUE: 311.6                APPROXIMATE: 1695.34              DIFFERENCE: 1383.74
4                    TRUE: 311.6                APPROXIMATE: 2232.08              DIFFERENCE: 1920.48
5                    TRUE: -48.47               APPROXIMATE: -20.81               DIFFERENCE: 27.66
6                    TRUE: -91.36               APPROXIMATE: -16.37               DIFFERENCE: 74.99
7                    TRUE: 32.49                APPROXIMATE: 2755.95              DIFFERENCE: 2723.46
8                    TRUE: 71.6                 APPROXIMATE: 639.85               DIFFERENCE: 568.25
9                    TRUE: -356.7               APPROXIMATE: 3.2                  DIFFERENCE: 359.9
10                   TRUE: 51.03                APPROXIMATE: 274948.96            DIFFERENCE
```

Looking at these results, we could definitely do better. In general this algorithm **does not perform well for the time it takes**, at least with these parameters.
While it produces quite good results for some functions, it fails most drastically on **functions 2** and **12**. Let us try a **local search** approach next and see whether we can improve these results.

## **ALGORITHM 2:** Simulated Annealing

This was the **second** algorithm we tried. The reasoning behind it is that these functions are very different from each other, so an improved **random search** could work well if we want the best possible result across all functions, even if this means accepting some losses on the more specific ones.

Here is a quick summary of how the algorithm works. We start with a random point for each function. At every iteration we take a **random step** between -0.1 and 0.1 in each of the 40 dimensions and check whether this **neighbour** has a **better objective value**. In order not to get stuck in a **local minimum**, we sometimes move to the new point even if its value is **worse**, with a **probability** that decreases over the iterations. This probability is regulated by a parameter called the **``temperature``**: the higher the temperature, the higher the probability of accepting a worse solution and the more of the search space we explore; the lower the temperature, the more we focus on improving the current solution.
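The neighbour step described above can be sketched on its own. This is a minimal version under the same assumptions as the notebook (a uniform step in [-0.1, 0.1) per coordinate, domain (-5, 5), out-of-bounds candidates discarded); `propose_neighbour` is a hypothetical name, and it takes a NumPy `Generator` so that runs are reproducible.

```python
import numpy as np

def propose_neighbour(x, rng, step=0.1, lower=-5.0, upper=5.0):
    """Return a random neighbour of x, or None if it falls outside (lower, upper)."""
    # perturb every coordinate independently by a value drawn from U[-step, step)
    candidate = x + rng.uniform(-step, step, size=x.shape)
    # like the notebook, discard the candidate if any coordinate leaves the open box
    if np.all((candidate > lower) & (candidate < upper)):
        return candidate
    return None
```

Discarding an out-of-bounds candidate wastes that iteration near the boundary; clipping or reflecting the step would be alternatives, but the sketch keeps the notebook's behaviour.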

In our code **``temp``** is the initial temperature, from which the temperature in step $k$ (counted from 0) is computed as $T_k = temp / (k + 1)$. The acceptance rule is taken from the Metropolis algorithm and goes as follows:
1. first we calculate the difference between the candidate value of the neighbour and the current value: $\Delta = G(x_{\text{cand}}) - G(x_{\text{curr}})$
2. if the difference is negative, the candidate is better, so we accept it and it becomes our current position
3. otherwise we accept the candidate with the probability given by the Boltzmann distribution: $P = e^{-\Delta / T_k}$
4. we generate a uniform random number in $[0, 1)$, and if it is lower than $P$ we move to the candidate point
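Steps 1 to 4 and the cooling schedule can be written as two small functions. This is a sketch rather than the notebook's code: `temperature_at` and `metropolis_accept` are hypothetical names, and the random source is passed in so that the rule can be tested.

```python
import math
import random

def temperature_at(k, temp=100.0):
    # cooling schedule T_k = temp / (k + 1), with k counted from 0
    return temp / (k + 1)

def metropolis_accept(diff, temperature, rng=random):
    """Decide whether to move to a candidate, where diff = G(candidate) - G(current)."""
    if diff < 0:
        return True  # a better candidate is always accepted
    # a worse (or equal) candidate is accepted with probability exp(-diff / T);
    # diff >= 0 here, so the exponent is <= 0 and math.exp cannot overflow
    return rng.random() < math.exp(-diff / temperature)
```

With `temp=100` as in the test cell, a move that is worse by 1 is accepted with probability about 0.99 in the first step, but only about $e^{-30} \approx 10^{-13}$ in the last of 3000 steps.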

```python
# implementation
def simulated_annealing(limit, dim, temp, functions):

    # setting the best solution to the first one we try and updating it later on
    best_coordinates = np.random.uniform(lower_bound, upper_bound, (len(functions), dim))
    best_values = [functions[i + 1](best_coordinates[i])[0] for i in range(len(functions))]

    # our solution in each step - starting position
    curr_coordinates = best_coordinates.copy()
    curr_values = best_values.copy()

    for i in range(len(functions)):
        for k in range(limit):

            # calculating the neighbour candidate and checking the bounds
            candidate = curr_coordinates[i] + np.random.uniform(-0.1, 0.1, dim)
            if np.all((candidate > lower_bound) & (candidate < upper_bound)):

                # if the value is better than the best one so far we save it
                candidate_value = functions[i + 1](candidate)[0]
                if candidate_value < best_values[i]:
                    best_coordinates[i] = candidate.copy()
                    best_values[i] = candidate_value

                # calculating the difference between the values and the new temperature
                diff = candidate_value - curr_values[i]
                temperature = temp / float(k + 1)

                # Metropolis rule: always accept a better candidate, otherwise accept with
                # probability exp(-diff / temperature); the exponent is <= 0 here, so exp cannot overflow
                if diff < 0 or np.random.random() < math.exp(-diff / temperature):
                    curr_coordinates[i] = candidate.copy()
                    curr_values[i] = candidate_value

    return [best_coordinates, best_values]
```

```python
# testing our algorithm
# we make multiple runs and keep the best result for each function
# IMPORTANT: DON'T OVERWRITE THE FINAL RESULTS !!!
min_values_2 = [float("inf") for _ in range(len(functions))]
min_coordinates_2 = [[0 for _ in range(40)] for _ in range(len(functions))]
no_runs = 10

for _ in range(no_runs):
    curr_coordinates, curr_values = simulated_annealing(limit=3000, dim=40, temp=100, functions=functions)
    for k in range(len(functions)):
        if curr_values[k] < min_values_2[k]:
            min_values_2[k] = curr_values[k]
            min_coordinates_2[k] = curr_coordinates[k].copy()

comparison(min_values_2)
results_to_file(min_coordinates_2, 2)

# PARAMETERS
# 10 runs with limit 3000: 3 min - testing
# 100 runs with limit 3000: 30 min - final result
```

```
1                    TRUE: 21.1                 APPROXIMATE: 21.47                DIFFERENCE: 0.37
2                    TRUE: 26.91                APPROXIMATE: 67747.44             DIFFERENCE: 67720.53
3                    TRUE: 311.6                APPROXIMATE: 1244.59              DIFFERENCE: 932.99
4                    TRUE: 311.6                APPROXIMATE: 1611.71              DIFFERENCE: 1300.11
5                    TRUE: -48.47               APPROXIMATE: 211.75               DIFFERENCE: 260.22
6                    TRUE: -91.36               APPROXIMATE: -14.13               DIFFERENCE: 77.23
7                    TRUE: 32.49                APPROXIMATE: 765.99               DIFFERENCE: 733.5
8                    TRUE: 71.6                 APPROXIMATE: 122.15               DIFFERENCE: 50.55
9                    TRUE: -356.7               APPROXIMATE: -302.92              DIFFERENCE: 53.78
10                   TRUE: 51.03                APPROXIMATE: 44245.2              DIFFERENCE: 
```

Taking the overall absolute error as the measure of success, this result is **better overall**, although gradient descent performed better in some **specific cases** (among the first ten functions shown, functions 1, 2, 5 and 6). The difference in overall error comes mainly from **function 12**.

## **ALGORITHM 3:** Nika Go Wild ♡
- the third cell has the comparison printout and the output of the coordinates to a file, just the way he wanted it
- in general you have the ``functions[]`` array of all the functions, that's it
- so the implementation of everything you need is mainly at the beginning
- for now you can freely run the functions marked ``DO NOT OVERWRITE``, but on Saturday around 1 I'll push the final results for these algorithms, so maybe don't run them then, since I'll leave them running overnight