# Introduction

## Goal
The goal of this lab is to perform some preliminary experiments aimed at understanding some advantages and pitfalls of Evolutionary Algorithms (EAs). In particular, you will observe the effects of mutations and problem dimensionality, and reflect on the extent to which these observations also apply to biological evolution.

Note that, unless otherwise specified, this module's exercises use real-valued genotypes: an individual is a vector of real-valued parameters $\mathbf{x} = \{x_1, x_2, \dots, x_N\}$. (Other representations, such as trees or bit strings, are possible but will not be explored in this lab.) The fitness of an individual is given by the fitness function $f(\mathbf{x})$. The aim of the algorithms is to *minimize* $f(\mathbf{x})$, i.e. to find the vector $\mathbf{x}_{min}$ with the lowest value of $f(\mathbf{x})$. In other words, lower values of $f(\mathbf{x})$ correspond to better fitness!



# Exercise 1
In this first exercise, we will not yet run a complete EA. Instead, we consider a single parent individual $\mathbf{x}_0$, from which a number of offspring individuals are created using a Gaussian mutation operator, which perturbs the parent locally by adding a random number drawn from a Gaussian distribution with mean zero and standard deviation $\sigma$ to each parameter $x_i$. The fitness function, shown in the figure below for $N=2$ variables, is defined as:
 
$f(\mathbf{x}) = \sum_{i=1}^{N}{x^2_i}$

![sphere.png](img/sphere.png)

This function is usually referred to as the *sphere* function $^{[1]}$ and is one of the most widely used benchmark functions in continuous (real-valued) optimization. It is unimodal, i.e. it has a single global minimum, located at the origin. Furthermore, it is *scalable*, i.e. it can be defined for any number of variables ($N=1, 2, 3, \dots$, i.e. $N \in \mathbb{N}$).
We will analyze the effects of mutations on the fitness depending on the value of the parent $\mathbf{x}_0$, the mutation magnitude $^{[2]}$ (the standard deviation $\sigma$), and the number of dimensions $N$ of the search space.
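
As a rough illustration of the setup described above, the sphere function and the Gaussian mutation operator could be sketched in plain Python as follows. This is only a minimal sketch using the standard library: the names `sphere` and `gaussian_mutation` are chosen here for illustration and are not the helpers from `utils.ga` used in the cells below.

```python
import random


def sphere(x):
    """Sphere fitness: sum of squared components (lower is better)."""
    return sum(xi * xi for xi in x)


def gaussian_mutation(rng, parent, sigma):
    """Return a copy of `parent` with N(0, sigma) noise added to every component."""
    return [xi + rng.gauss(0.0, sigma) for xi in parent]


rng = random.Random(0)  # seed chosen arbitrarily, only for reproducibility
parent = [10.0, 10.0]
offspring = [gaussian_mutation(rng, parent, sigma=1.0) for _ in range(1000)]
fitnesses = [sphere(child) for child in offspring]

# Count how many offspring have a lower (better) fitness than the parent
improved = sum(f < sphere(parent) for f in fitnesses)
print(f"parent fitness: {sphere(parent)}")
print(f"offspring better than parent: {improved} / {len(fitnesses)}")
```

Changing `parent`, `sigma`, or the length of `parent` in this sketch mirrors the three factors studied in this exercise.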

To start the experiments, run the next cell $^{[3]}$. It will generate offspring from a single parent $\mathbf{x}_0$ using the Gaussian mutation operator described above. Generate offspring from different parents (e.g. $\mathbf{x}_0 = 0.1, 1, 10$) using different mutation magnitudes (standard deviations $\sigma$). First consider the one-dimensional case, then two dimensions, and finally many dimensions (e.g. $N = 100$). For one or two dimensions, the fitness landscape with the parent and the offspring is shown. For more dimensions, a boxplot of the offspring fitness is shown (see the __[boxplot](http://matplotlib.org/api/pyplot_api.html\#matplotlib.pyplot.boxplot)__ documentation if you are unfamiliar with boxplots), where the green dashed line marks the fitness of the parent (we want the offspring to have a lower fitness value than the parent).

Try to answer the following questions:

- Do the mutations tend to improve or worsen the fitness of the parent?
- Are low or high mutation magnitudes best for improving the fitness? How does this depend on the initial value of the parent and on the number of dimensions of the search space?


---

[1]: 
Note that its contour lines, i.e. the loci of points for which the function has a constant value, are $N$-dimensional *spheres* centered at **0**. For instance, in 2-D, the contour lines are curves described by $x^2_1+x^2_2=k$, which correspond to a circle (the equivalent of a sphere in two dimensions) with radius $\sqrt{k}$ and center at $\{0,0\}$. In 3-D, the contours are surfaces described by $x^2_1+x^2_2+x^2_3=k$, which correspond to a sphere with radius $\sqrt{k}$ and center at $\{0,0,0\}$. For $N>3$, each contour corresponds to a *hyper-sphere*, i.e. a generalization of a sphere.

[2]: 
In the following, *mutation magnitude* indicates a generic measure of the mutation effect on the genotype. E.g. in continuous optimization with Gaussian mutation $x'=x+\mathcal{N}(0, \sigma)$ the *mutation magnitude* is simply $\sigma$. This is different from the *mutation probability*, which is the chance that a given locus is mutated. The combination of these two aspects, magnitude and probability, may be considered the overall *mutation rate*.
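
The distinction between magnitude and probability can be sketched as follows. This is an illustrative example only: the function `mutate` and its parameter names are hypothetical and do not correspond to the operator used by the lab's helper code.

```python
import random


def mutate(rng, genotype, sigma, mutation_probability):
    """Perturb each locus with probability `mutation_probability`.

    `sigma` is the mutation magnitude: the standard deviation of the
    Gaussian noise added to a locus that has been selected for mutation.
    """
    return [
        xi + rng.gauss(0.0, sigma) if rng.random() < mutation_probability else xi
        for xi in genotype
    ]


rng = random.Random(1)  # arbitrary seed
parent = [1.0] * 10
child = mutate(rng, parent, sigma=0.5, mutation_probability=0.3)
changed = sum(c != p for c, p in zip(child, parent))
print(f"{changed} of {len(parent)} loci were mutated")
```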

[3]: 
For all the exercises in this lab you may set the seed for the pseudo-random number generator. This will allow you to reproduce your results. 
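
A small sketch of what seeding buys you, using only Python's standard `random.Random` (the helper `draw` is a name made up for this illustration): two generators created with the same seed produce the same sequence of Gaussian samples, while a different seed gives a different sequence.

```python
from random import Random


def draw(seed, n=3):
    """Draw n standard Gaussian samples from a generator seeded with `seed`."""
    rng = Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


same_a = draw(42)
same_b = draw(42)
other = draw(43)

print("same seed, same samples:     ", same_a == same_b)
print("different seed, same samples:", same_a == other)
```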


In [None]:
import os
import sys

module_path = os.path.abspath(os.path.join(".."))
if module_path not in sys.path:
    sys.path.append(module_path)

In [None]:
# change std_dev
from utils.ga import generate_offspring, run_ga
from utils.simulation import run_ga_simulation, plot_boxplot
from random import Random
from scipy.stats import ttest_1samp
import matplotlib.pyplot as plt
import numpy as np

from inspyred.benchmarks import Sphere


x0 = [10.0, 10.0]
std_dev = 1
num_offspring = 1000


std_devs = [0.1, 1, 10, 100]
for std_dev in std_devs:
    args = {}
    args["fig_title"] = f"std_dev = {std_dev}"
    seed = int(100 * std_dev)
    rng = Random(seed)
    parent_fitness, offspring_fitnesses = generate_offspring(
        rng, x0, std_dev, num_offspring, True, args
    )

    fig = plt.figure("Offspring fitness")
    ax = fig.gca()
    ax.boxplot(offspring_fitnesses)
    ax.set_xticklabels([])
    ax.plot([0, 2], [parent_fitness, parent_fitness], "g--", label="Parent fitness")
    ax.set_ylabel("Fitness")
    ax.set_ylim(ymin=0)
    ax.legend()
    plt.show()

    ttest = ttest_1samp(offspring_fitnesses, parent_fitness)
    print(
        "p-value for the null hypothesis that the mean offspring fitness equals the parent fitness: ",
        ttest.pvalue,  # type:ignore
    )

In [None]:
# change num_dimensions

x0 = [10.0, 10.0]
std_dev = 1
num_offspring = 1000

initial_points = [[20.0], [20.0, 20.0], [20.0] * 10, [20.0] * 100]
for x0 in initial_points:
    args = {}
    args["fig_title"] = f"Number dimensions = {len(x0)}"
    seed = int(100 * std_dev)
    rng = Random(seed)
    parent_fitness, offspring_fitnesses = generate_offspring(
        rng, x0, std_dev, num_offspring, True, args
    )

    fig = plt.figure("Offspring fitness")
    ax = fig.gca()
    ax.boxplot(offspring_fitnesses)
    ax.set_xticklabels([])
    ax.plot([0, 2], [parent_fitness, parent_fitness], "g--", label="Parent fitness")
    ax.set_ylabel("Fitness")
    ax.set_ylim(ymin=0)
    ax.legend()
    plt.show()

    ttest = ttest_1samp(offspring_fitnesses, parent_fitness)
    print(
        "p-value for the null hypothesis that the mean offspring fitness equals the parent fitness: ",
        ttest.pvalue,  # type:ignore
    )

# Exercise 2

In this exercise we will try to confirm the qualitative observations made in the previous exercise, by plotting boxplots side by side to evaluate the statistical significance of the observed differences.

Run the next cell, and compare different values for:

- the number of dimensions of the search space;
- the value of the parent (how close the starting point is to the optimum);
- the mutation magnitude $\sigma$ (smaller $\sigma$ gives on average smaller mutations from the parent);

and try to confirm the answers that you gave in the previous exercise. See the comments in the script for more details.

**NOTE**: If you vary one of these three parameters, *make sure that you set the other two at a constant value* (otherwise it may be difficult to interpret your results).
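
The cells below report a p-value from `scipy.stats.ttest_1samp`, which compares the mean offspring fitness against the parent fitness. As a rough sketch of the quantity behind that test, the one-sample t statistic can be computed by hand with the standard library. The helper name `one_sample_t` is an assumption made for this example, and the p-value itself (which requires the t distribution) is left to SciPy.

```python
import math
import random
import statistics


def one_sample_t(sample, popmean):
    """t = (sample mean - popmean) / (sample std / sqrt(n)), with n - 1 degrees of freedom."""
    n = len(sample)
    mean = statistics.fmean(sample)
    std = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
    return (mean - popmean) / (std / math.sqrt(n))


rng = random.Random(100)  # arbitrary seed
parent = [10.0, 10.0]
parent_fitness = sum(xi ** 2 for xi in parent)

# Fitness of 200 offspring obtained by Gaussian mutation with sigma = 5
offspring_fitness = [
    sum((xi + rng.gauss(0.0, 5.0)) ** 2 for xi in parent) for _ in range(200)
]

t = one_sample_t(offspring_fitness, parent_fitness)
print(f"t statistic: {t:.2f} with {len(offspring_fitness) - 1} degrees of freedom")
```

A large positive t indicates that the offspring are, on average, worse than the parent; a value near zero means the difference in means is small relative to the spread.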

In [None]:
# change starting point

# Number of dimensions
num_vars_1 = 2
num_vars_2 = 2
num_vars_3 = 2
# How close the parent is to the minimum
value_1 = 1
value_2 = 10
value_3 = 20
# Value of the parent
x0_1 = value_1 * np.ones(num_vars_1)
x0_2 = value_2 * np.ones(num_vars_2)
x0_3 = value_3 * np.ones(num_vars_3)
# standard deviation of the mutation
std_dev_1 = 5
std_dev_2 = 5
std_dev_3 = 5
# Number of offspring to be generated
num_offspring = 200

args = {}
seed = 100
rng = Random(seed)
# Generate offspring for the three conditions
parent_fitness_1, offspring_fitness_1 = generate_offspring(
    rng, x0_1, std_dev_1, num_offspring, False, args
)
parent_fitness_2, offspring_fitness_2 = generate_offspring(
    rng, x0_2, std_dev_2, num_offspring, False, args
)
parent_fitness_3, offspring_fitness_3 = generate_offspring(
    rng, x0_3, std_dev_3, num_offspring, False, args
)

fig = plt.figure("Offspring fitness")
ax = fig.gca()
ax.boxplot([offspring_fitness_1, offspring_fitness_2, offspring_fitness_3], notch=False)
ax.plot([0.5, 1.5], [parent_fitness_1, parent_fitness_1], "g--", label="Parent fitness")
ax.plot([1.5, 2.5], [parent_fitness_2, parent_fitness_2], "g--")
ax.plot([2.5, 3.5], [parent_fitness_3, parent_fitness_3], "g--")
ax.set_xticklabels([f"{x0_1}", f"{x0_2}", f"{x0_3}"])
ax.set_xlabel("Starting Point")
ax.set_ylabel("Fitness")
ax.set_ylim(ymin=0)
ax.legend()
plt.show()

print("mean for condition 1", np.mean(offspring_fitness_1))
print(
    "t test for condition 1",
    ttest_1samp(offspring_fitness_1, popmean=parent_fitness_1).pvalue,  # type:ignore
)
print("mean for condition 2", np.mean(offspring_fitness_2))
print(
    "t test for condition 2",
    ttest_1samp(offspring_fitness_2, popmean=parent_fitness_2).pvalue,  # type:ignore
)
print("mean for condition 3", np.mean(offspring_fitness_3))
print(
    "t test for condition 3",
    ttest_1samp(offspring_fitness_3, popmean=parent_fitness_3).pvalue,  # type:ignore
)

In [None]:
# change number of dimensions

# Number of dimensions
num_vars_1 = 2
num_vars_2 = 10
num_vars_3 = 100
# How close the parent is to the minimum
value_1 = 10
value_2 = 10
value_3 = 10
# Value of the parent
x0_1 = value_1 * np.ones(num_vars_1)
x0_2 = value_2 * np.ones(num_vars_2)
x0_3 = value_3 * np.ones(num_vars_3)
# standard deviation of the mutation
std_dev_1 = 5
std_dev_2 = 5
std_dev_3 = 5
# Number of offspring to be generated
num_offspring = 200

args = {}
seed = 100
rng = Random(seed)
# Generate offspring for the three conditions
parent_fitness_1, offspring_fitness_1 = generate_offspring(
    rng, x0_1, std_dev_1, num_offspring, False, args
)
parent_fitness_2, offspring_fitness_2 = generate_offspring(
    rng, x0_2, std_dev_2, num_offspring, False, args
)
parent_fitness_3, offspring_fitness_3 = generate_offspring(
    rng, x0_3, std_dev_3, num_offspring, False, args
)

fig = plt.figure("Offspring fitness")
ax = fig.gca()
ax.boxplot([offspring_fitness_1, offspring_fitness_2, offspring_fitness_3], notch=False)
ax.plot([0.5, 1.5], [parent_fitness_1, parent_fitness_1], "g--", label="Parent fitness")
ax.plot([1.5, 2.5], [parent_fitness_2, parent_fitness_2], "g--")
ax.plot([2.5, 3.5], [parent_fitness_3, parent_fitness_3], "g--")
ax.set_xticklabels([f"{num_vars_1}", f"{num_vars_2}", f"{num_vars_3}"])
ax.set_xlabel("Dimensions")
ax.set_ylabel("Fitness")
ax.set_ylim(ymin=0)
ax.legend()
plt.show()

print("mean for condition 1", np.mean(offspring_fitness_1))
print(
    "t test for condition 1",
    ttest_1samp(offspring_fitness_1, popmean=parent_fitness_1).pvalue,  # type:ignore
)
print("mean for condition 2", np.mean(offspring_fitness_2))
print(
    "t test for condition 2",
    ttest_1samp(offspring_fitness_2, popmean=parent_fitness_2).pvalue,  # type:ignore
)
print("mean for condition 3", np.mean(offspring_fitness_3))
print(
    "t test for condition 3",
    ttest_1samp(offspring_fitness_3, popmean=parent_fitness_3).pvalue,  # type:ignore
)

In [None]:
# change standard deviation

# Number of dimensions
num_vars_1 = 2
num_vars_2 = 2
num_vars_3 = 2
# How close the parent is to the minimum
value_1 = 10
value_2 = 10
value_3 = 10
# Value of the parent
x0_1 = value_1 * np.ones(num_vars_1)
x0_2 = value_2 * np.ones(num_vars_2)
x0_3 = value_3 * np.ones(num_vars_3)
# standard deviation of the mutation
std_dev_1 = 1
std_dev_2 = 5
std_dev_3 = 10
# Number of offspring to be generated
num_offspring = 200

args = {}
seed = 100
rng = Random(seed)
# Generate offspring for the three conditions
parent_fitness_1, offspring_fitness_1 = generate_offspring(
    rng, x0_1, std_dev_1, num_offspring, False, args
)
parent_fitness_2, offspring_fitness_2 = generate_offspring(
    rng, x0_2, std_dev_2, num_offspring, False, args
)
parent_fitness_3, offspring_fitness_3 = generate_offspring(
    rng, x0_3, std_dev_3, num_offspring, False, args
)

fig = plt.figure("Offspring fitness")
ax = fig.gca()
ax.boxplot([offspring_fitness_1, offspring_fitness_2, offspring_fitness_3], notch=False)
ax.plot([0.5, 1.5], [parent_fitness_1, parent_fitness_1], "g--", label="Parent fitness")
ax.plot([1.5, 2.5], [parent_fitness_2, parent_fitness_2], "g--")
ax.plot([2.5, 3.5], [parent_fitness_3, parent_fitness_3], "g--")
ax.set_xticklabels([f"{std_dev_1}", f"{std_dev_2}", f"{std_dev_3}"])
ax.set_xlabel("Standard deviation")
ax.set_ylabel("Fitness")
ax.set_ylim(ymin=0)
ax.legend()
plt.show()

print("mean for condition 1", np.mean(offspring_fitness_1))
print(
    "t test for condition 1",
    ttest_1samp(offspring_fitness_1, popmean=parent_fitness_1).pvalue,  # type:ignore
)
print("mean for condition 2", np.mean(offspring_fitness_2))
print(
    "t test for condition 2",
    ttest_1samp(offspring_fitness_2, popmean=parent_fitness_2).pvalue,  # type:ignore
)
print("mean for condition 3", np.mean(offspring_fitness_3))
print(
    "t test for condition 3",
    ttest_1samp(offspring_fitness_3, popmean=parent_fitness_3).pvalue,  # type:ignore
)

# Exercise 3

We will now use an EA to find the minimum of the unimodal fitness function defined in the previous exercise and analyze the effect of the mutation magnitude and the dimensionality of the search space on the results.

Run the next cell to run a basic, mutation-only EA on the 1-D sphere function first.

- How close is the best individual to the global optimum?

Increase the dimensionality of the search space to two and more.

- How close are the best individuals to the global optimum now?
- Can you get as close as in the one-dimensional case by modifying the mutation magnitude and/or the number of generations?
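
To make the structure of a mutation-only EA concrete, here is a minimal sketch in plain Python. It assumes binary tournament selection, Gaussian mutation of every locus, and a single elite, loosely mirroring the parameters in the cells below. It is not the implementation behind `run_ga_simulation`, and all function names here are invented for illustration.

```python
import random


def sphere(x):
    """Sphere fitness: sum of squared components (lower is better)."""
    return sum(xi * xi for xi in x)


def tournament(rng, population, fitnesses, size=2):
    """Pick `size` distinct individuals at random and return the fittest one."""
    contenders = rng.sample(range(len(population)), size)
    winner = min(contenders, key=lambda i: fitnesses[i])
    return population[winner]


def mutation_only_ea(rng, num_vars, sigma, generations, pop_size=20,
                     init_range=(-10.0, 10.0)):
    """Evolve a population using selection, Gaussian mutation, and one elite."""
    population = [
        [rng.uniform(*init_range) for _ in range(num_vars)]
        for _ in range(pop_size)
    ]
    for _ in range(generations):
        fitnesses = [sphere(ind) for ind in population]
        # Elitism: the current best individual survives unchanged
        elite = population[min(range(pop_size), key=lambda i: fitnesses[i])]
        children = [
            [xi + rng.gauss(0.0, sigma)
             for xi in tournament(rng, population, fitnesses)]
            for _ in range(pop_size - 1)
        ]
        population = [elite] + children
    best = min(population, key=sphere)
    return best, sphere(best)


rng = random.Random(0)  # arbitrary seed
for num_vars in (1, 2, 10):
    best, best_fitness = mutation_only_ea(rng, num_vars, sigma=0.5, generations=100)
    print(f"N={num_vars:3d}  best fitness = {best_fitness:.3e}")
```

Because the elite is always copied into the next generation, the best fitness in this sketch can never get worse from one generation to the next.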


In [None]:
# parameters for the GA
args = {}
args["crossover_rate"] = 0  # Crossover fraction
args["tournament_size"] = 2
args["mutation_rate"] = 1  # fraction of loci to perform mutation on
args["num_elites"] = 1  # number of elite individuals to maintain in each gen
args["pop_size"] = 20
args["pop_init_range"] = [-10, 10]  # Range for the initial population
args["num_vars"] = 1
args["std_dev"] = 0.5
args["max_generations"] = 100
display = True  # Plot initial and final populations

args["fig_title"] = "Sphere Function"

# Run the GA
results = run_ga_simulation(
    func=Sphere, num_simulations=10, args=args, print_plots=True
)  # type:ignore

# Display the results
print("Mean Best Individual:", results.mean_best_individual)
print("Mean Best Fitness:", results.mean_best_fitness)
# The distance from the optimum in the N-dimensional space
print(
    "Distance from Global Optimum",
    np.sqrt(np.sum(np.array(results.mean_best_individual) ** 2)),
)

In [None]:
# parameters for the GA
args = {}
args["crossover_rate"] = 0  # Crossover fraction
args["tournament_size"] = 2
args["mutation_rate"] = 1  # fraction of loci to perform mutation on
args["num_elites"] = 1  # number of elite individuals to maintain in each gen
args["pop_size"] = 20
args["pop_init_range"] = [-10, 10]  # Range for the initial population
args["num_vars"] = 10
args["std_dev"] = 0.1
args["max_generations"] = 250
display = True  # Plot initial and final populations

args["fig_title"] = "Sphere Function"

# Run the GA
results = run_ga_simulation(
    func=Sphere, num_simulations=10, args=args, print_plots=True
)  # type:ignore

# Display the results
print("Mean Best Individual:", results.mean_best_individual)
print("Mean Best Fitness:", results.mean_best_fitness)
# The distance from the optimum in the N-dimensional space
print(
    "Distance from Global Optimum",
    np.sqrt(np.sum(np.array(results.mean_best_individual) ** 2)),
)

In [None]:
import numpy as np
from inspyred.benchmarks import Sphere

# parameters for the GA
args = {}
args["crossover_rate"] = 0  # Crossover fraction
args["tournament_size"] = 2
args["mutation_rate"] = 1.0  # fraction of loci to perform mutation on
args["num_elites"] = 1  # number of elite individuals to maintain in each gen
args["pop_size"] = 20
args["pop_init_range"] = [-10, 10]  # Range for the initial population
args["num_vars"] = 2
args["std_dev"] = 1
args["max_generations"] = 100
display = True  # Plot initial and final populations


mutations = [0.05, 0.1, 1, 5, 10]
best_fitnesses: list[list[float]] = []
for mutation in mutations:
    args["fig_title"] = f"Sphere Function (std {mutation})"
    print(f"Mutation standard deviation: {mutation}")
    args["std_dev"] = mutation
    results = run_ga_simulation(
        func=Sphere, num_simulations=30, args=args, print_plots=False
    )  # type:ignore
    best_fitnesses.append(results.all_best_fitness)
    # Display the results
    print("Mean Best Individual:", results.mean_best_individual)
    print("Mean Best Fitness:", results.mean_best_fitness)
    # The distance from the optimum in the N-dimensional space
    print(
        "Distance from Global Optimum",
        np.sqrt(np.sum(np.array(results.mean_best_individual) ** 2)),
    )
    print("-------------------------------------------")
    plt.show()

plot_boxplot(best_fitnesses, mutations, "Standard Deviation")

In [None]:
import numpy as np
from inspyred.benchmarks import Sphere

# parameters for the GA
args = {}
args["crossover_rate"] = 0  # Crossover fraction
args["tournament_size"] = 2
args["mutation_rate"] = 1.0  # fraction of loci to perform mutation on
args["num_elites"] = 1  # number of elite individuals to maintain in each gen
args["pop_size"] = 20
args["pop_init_range"] = [-10, 10]  # Range for the initial population
args["num_vars"] = 2
args["std_dev"] = 1
args["max_generations"] = 100


generations = [10, 50, 100, 200]
best_fitnesses: list[list[float]] = []
for gen in generations:
    print(f"Number of generations: {gen}")
    args["fig_title"] = f"Sphere Function (gen {gen})"
    args["max_generations"] = gen
    results = run_ga_simulation(
        func=Sphere, num_simulations=30, args=args, print_plots=False
    )  # type:ignore
    best_fitnesses.append(results.all_best_fitness)

    # Display the results
    print("Mean Best Individual:", results.mean_best_individual)
    print("Mean Best Fitness:", results.mean_best_fitness)
    # The distance from the optimum in the N-dimensional space
    print(
        "Distance from Global Optimum",
        np.sqrt(np.sum(np.array(results.mean_best_individual) ** 2)),
    )
    print("-------------------------------------------")
    plt.show()

plot_boxplot(best_fitnesses, generations, "Number of Generations")  # type:ignore

In [None]:
import numpy as np
from inspyred.benchmarks import Sphere

# parameters for the GA
args = {}
args["crossover_rate"] = 0  # Crossover fraction
args["tournament_size"] = 2
args["mutation_rate"] = 1.0  # fraction of loci to perform mutation on
args["num_elites"] = 1  # number of elite individuals to maintain in each gen
args["pop_size"] = 20
args["pop_init_range"] = [-10, 10]  # Range for the initial population
args["num_vars"] = 2
args["std_dev"] = 1
args["max_generations"] = 100

dimensions = [1, 2, 10, 100]
best_fitnesses: list[list[float]] = []
for num_vars in dimensions:
    print(f"Number of dimensions: {num_vars}")
    args["fig_title"] = f"Sphere Function (dim {num_vars})"
    args["num_vars"] = num_vars
    results = run_ga_simulation(
        func=Sphere, num_simulations=30, args=args, print_plots=False
    )  # type:ignore
    best_fitnesses.append(results.all_best_fitness)

    # Display the results
    print("Mean Best Individual:", results.mean_best_individual)
    print("Mean Best Fitness:", results.mean_best_fitness)
    # The distance from the optimum in the N-dimensional space
    print(
        "Distance from Global Optimum",
        np.sqrt(np.sum(np.array(results.mean_best_individual) ** 2)),
    )
    print("-------------------------------------------")
    plt.show()

plot_boxplot(best_fitnesses, dimensions, "Number of Dimensions")  # type:ignore

# Exercise 4
In this exercise we will try to confirm the qualitative observations made in the previous exercise, by plotting boxplots side by side to evaluate the statistical significance of the observed differences.

Run the next cell to do one batch of $30$ runs of the EA for each mutation magnitude listed in `std_devs` (it may take a minute). The boxplot compares the best fitness values obtained (at the end of each run) across these conditions.

- Do you see any difference in the best fitness obtained? Try to explain the result.


In [None]:
num_vars = 2  # Number of dimensions of the search space
std_devs = [0.001, 0.01, 0.1, 1.0, 5.0]  # Standard deviation of the Gaussian mutations
max_generations = 50  # Number of generations of the GA
num_runs = 30  # Number of runs to be done for each stdev

# parameters for the GA
args = {}
args["crossover_rate"] = 0  # Crossover fraction
args["tournament_size"] = 2
args["mutation_rate"] = 1.0  # fraction of loci to perform mutation on
args["num_elites"] = 1  # number of elite individuals to maintain in each gen
args["pop_size"] = 20  # population size
args["pop_init_range"] = [-10, 10]  # Range for the initial population
display = False  # Plot initial and final populations

args["fig_title"] = "GA"

seed = None
rng = Random(seed)
# run the GA *num_runs* times for each std_dev and record the best fits
best_fitnesses = [
    [
        run_ga(
            rng,
            num_vars=num_vars,
            max_generations=max_generations,
            display=display,
            gaussian_stdev=std_dev,
            **args,
        )[1]
        for _ in range(num_runs)
    ]
    for std_dev in std_devs
]

fig = plt.figure("GA (best fitness)")
ax = fig.gca()
ax.boxplot(best_fitnesses)
ax.set_xticklabels([str(sd) for sd in std_devs])
ax.set_yscale("log")
ax.set_xlabel("Std. dev.")
ax.set_ylabel("Best fitness")
plt.show()

In [None]:
num_vars = 10  # Number of dimensions of the search space
std_devs = [0.001, 0.01, 0.1, 1.0, 5.0]  # Standard deviation of the Gaussian mutations
max_generations = 50  # Number of generations of the GA
num_runs = 30  # Number of runs to be done for each stdev

# parameters for the GA
args = {}
args["crossover_rate"] = 0  # Crossover fraction
args["tournament_size"] = 2
args["mutation_rate"] = 1.0  # fraction of loci to perform mutation on
args["num_elites"] = 1  # number of elite individuals to maintain in each gen
args["pop_size"] = 20  # population size
args["pop_init_range"] = [-10, 10]  # Range for the initial population
display = False  # Plot initial and final populations

args["fig_title"] = "GA"

seed = None
rng = Random(seed)
# run the GA *num_runs* times for each std_dev and record the best fits
best_fitnesses = [
    [
        run_ga(
            rng,
            num_vars=num_vars,
            max_generations=max_generations,
            display=display,
            gaussian_stdev=std_dev,
            **args,
        )[1]
        for _ in range(num_runs)
    ]
    for std_dev in std_devs
]

fig = plt.figure("GA (best fitness)")
ax = fig.gca()
ax.boxplot(best_fitnesses)
ax.set_xticklabels([str(sd) for sd in std_devs])
ax.set_yscale("log")
ax.set_xlabel("Std. dev.")
ax.set_ylabel("Best fitness")
plt.show()

# Report

## Exercise 1
Sphere function with a single parent individual **$x_{0}$**, from which a number of offspring individuals are created using a Gaussian mutation operator.

As a first step we analyze the effects of the mutation magnitude $\sigma$ on the fitness depending on the value of the parent $\mathbf{x}_0$ and the number of dimensions $N$ of the search space.

 We can see that the offspring are distributed around the parent and that their spread is narrower or wider depending on the mutation magnitude. The only exception is for very large mutations (here 10), where the offspring are not evenly distributed around the parent because the mutation is large enough to reach the global minimum.
 
 ![sd0.1](img/ex1_sd01_pop.png)   ![sd0.1](img/ex1_sd01_fit.png) 

 ![sd1](img/ex1_sd1_pop.png)    ![sd1](img/ex1_sd1_fit.png) 
 
 ![sd10](img/ex1_sd10_pop.png)   ![sd10](img/ex1_sd10_fit.png) 


Now we observe the effect of the mutation magnitude as a function of the number of dimensions.

![1D](img/ex1_1d.png) ![2D](img/ex1_2d.png) ![10D](img/ex1_10d.png)

The only interesting thing to note is that the fitness of the parent (and as a consequence the fitness of the offspring) is higher in higher dimensions. This is because the fitness function is the sum of the squares of the parameters, so the more parameters we have, the higher the fitness will be.


## Exercise 2
We will try to confirm the qualitative observations made in the previous exercise, by plotting boxplots side by side to evaluate the statistical significance of the observed differences.

### Initial Point
Observing the boxplots, we can see that the closer the initial point is to the global minimum, the more effective the mutation is. This is because the offspring are more likely to be closer to the global minimum.

![Initial Point](img/ex2_init_point.png)

### Number of Dimensions
We can see that as the number of dimensions grows, the offspring become increasingly worse than the parent. This is because the mutation is applied to every dimension, so the more dimensions there are, the more likely an offspring is to end up farther from the global minimum.

![Number of Dimensions](img/ex2_dim.png)
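
This trend can be made precise with a short calculation, given here as a sketch under the assumption that each coordinate of the parent receives independent noise $\varepsilon_i \sim \mathcal{N}(0, \sigma^2)$ (the symbol $\varepsilon_i$ is introduced here for illustration):

$$
\mathbb{E}\left[f(\mathbf{x}_0 + \boldsymbol{\varepsilon})\right]
= \sum_{i=1}^{N} \mathbb{E}\left[(x_i + \varepsilon_i)^2\right]
= \sum_{i=1}^{N} \left(x_i^2 + 2 x_i\,\mathbb{E}[\varepsilon_i] + \mathbb{E}[\varepsilon_i^2]\right)
= f(\mathbf{x}_0) + N\sigma^2
$$

So, on average, the offspring are worse than the parent by $N\sigma^2$, an offset that grows linearly with the number of dimensions. With the parent at $10$ in every coordinate and $\sigma = 5$, as in the cell used for this comparison, the parent fitness is $100N$ and the expected offspring fitness is $125N$, i.e. $200$ versus $250$ for $N=2$ and $10000$ versus $12500$ for $N=100$. Under the same assumption the standard deviation of the offspring fitness grows only as $\sqrt{N}$, so the fraction of offspring that beat the parent tends to shrink as $N$ increases.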

### Mutation Magnitude
We can see that the larger the mutation magnitude, the more spread out the offspring are. In this case, since there is a single global minimum, the larger the mutation the better, because the offspring are more likely to reach the minimum.

![Mutation Magnitude](img/ex2_sd.png)

## Exercise 3
EA to find the minimum of the unimodal sphere function with a different number of dimensions.

### 1D
We ran the EA with a mutation of 0.5 and 100 generations. The EA was then repeated 30 times to get the average best individual and fitness.

Mean Best Individual: [0.00091159]

Mean Best Fitness: 2.670713845142703e-06

Distance from Global Optimum 0.0009115875196680715

![1D](img/ex3_sphere_1d_fit.png) | ![1D](img/ex3_sphere_1d_pop.png)

### 10D
We ran the EA with a mutation of 0.1 and 250 generations. The EA was then repeated 30 times to get the average best individual and fitness. Increasing the number of generations and decreasing the mutation helped to get close to the global minimum but past this point not much improvement was seen.

Mean Best Individual: [ 0.05436666 -0.04087693 -0.18585926  0.23645044  0.10731215  0.15161582 -0.12307874 -0.02372709 -0.01918248 -0.06533183]

Mean Best Fitness: 1.9201467949492432

Distance from Global Optimum 0.3872078945527439

![10D](img/ex3_sphere_10d_fit.png)

## Exercise 4


Comparison over multiple runs of the EA with different mutation magnitudes. We can see that the larger mutation magnitudes are less effective, probably because the mutation is so large that the individuals "jump" over the global minimum. Conversely, a mutation magnitude that is too small is ineffective because the offspring stay too close to their parents and do not explore the search space.

![Boxplot](img/ex4.png)