# Exercise 1: Solving Optimization Problems with Genetic Algorithms

### Problem A: Genetic Algorithm for Optimizing a Neural Network

### 1)

In [1]:
from sklearn.datasets import make_blobs
import pandas as pd


In [2]:
X, y = make_blobs(n_samples=100, centers=2, n_features=4, random_state=42)

df = pd.DataFrame(X, columns=['Feature 1', 'Feature 2', 'Feature 3', 'Feature 4'])
df['Target'] = y

df

| | Feature 1 | Feature 2 | Feature 3 | Feature 4 | Target |
|---|---|---|---|---|---|
| 0 | -4.726445 | -7.647457 | -7.966007 | 7.506865 | 1 |
| 1 | -7.697848 | -4.787722 | -9.844345 | 6.109334 | 1 |
| 2 | -2.035960 | 8.941457 | 3.793085 | 0.458322 | 0 |
| 3 | -4.735683 | -6.246191 | -10.863470 | 7.509977 | 1 |
| 4 | -4.427969 | 8.987772 | 4.700109 | 4.436412 | 0 |
| ... | ... | ... | ... | ... | ... |
| 95 | -3.580090 | 9.496759 | 4.416416 | 2.687170 | 0 |
| 96 | -6.364592 | -6.366324 | -8.323280 | 11.176254 | 1 |
| 97 | -0.929985 | 9.781721 | 4.170404 | 2.515730 | 0 |
| 98 | -6.522612 | -7.573019 | -7.938728 | 7.630822 | 1 |


This gives a DataFrame with 4 feature columns and a `Target` column containing 0 or 1, so this is a binary classification problem.

### 2)

Let's build the neural network with TensorFlow (Keras). PyTorch would work as well, but I am only familiar with building neural networks in TensorFlow.

In [3]:
from tensorflow import keras 
from tensorflow.keras import layers




In [4]:
model = keras.Sequential([
    layers.Dense(16, activation='sigmoid', input_shape=[4]), # 1st hidden layer: 16 neurons fed by the 4 input features
    layers.Dense(16, activation='sigmoid'), # 2nd hidden layer with 16 neurons
    layers.Dense(1, activation='sigmoid') # Output layer with 1 neuron
])




Let's check that the model we created has the number of parameters we expect.

In [5]:
model.summary() 

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense (Dense)               (None, 16)                80        
                                                                 
 dense_1 (Dense)             (None, 16)                272       
                                                                 
 dense_2 (Dense)             (None, 1)                 17        
                                                                 
Total params: 369 (1.44 KB)
Trainable params: 369 (1.44 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________


We can verify these parameter counts with simple arithmetic:

- Between the 4 inputs and the first 16-neuron layer we have **4 × 16 = 64 weights** and **16 biases**, which gives **80 parameters** for this first connection

- Between the first 16-neuron layer and the second 16-neuron layer we have **16 × 16 = 256 weights** and **16 biases**, which gives **272 parameters** for this second connection

- For the last connection, each of the 16 neurons is connected directly to the single output neuron, so we have **16 weights** and **1 bias**, which gives **17 parameters**

In total we have **80 + 272 + 17 = 369 parameters**
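The same check can be written as a few lines of Python. This is only a sketch of the arithmetic above; the helper name `dense_param_count` is made up for the illustration and is not part of Keras.

```python
def dense_param_count(n_inputs, n_units):
    """Parameters of a fully connected layer: one weight per (input, unit) pair plus one bias per unit."""
    return n_inputs * n_units + n_units

# Layer sizes of the network above: 4 inputs -> 16 -> 16 -> 1 output
layer_sizes = [4, 16, 16, 1]

per_layer = [
    dense_param_count(n_in, n_out)
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]
total_params = sum(per_layer)

print(per_layer)     # [80, 272, 17]
print(total_params)  # 369
```

These values match the `Param #` column and the `Total params` line of `model.summary()`.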

### 3)

To begin with, we create a function named initialize_population that builds the initial population.

This population is composed of n chromosomes. Each chromosome is essentially a list of lists holding weights and biases.

The architecture of our neural network is: input, layer1, layer2, output

So we have these connections: [input,layer1], [layer1,layer2], [layer2,output]

That is why each chromosome contains 3 lists of 2 arrays each: the first array holds the weights and the second the biases.

We get something like this: **[ [array_weights_1,array_bias_1], [array_weights_2,array_bias_2], [array_weights_3,array_bias_3] ]**

Let's start by looking at the model's default weights and biases
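To make this layout concrete, here is a minimal NumPy-only sketch of one chromosome with the shapes implied by the 4 → 16 → 16 → 1 architecture. The names `connection_shapes` and `n_genes` and the seed are assumptions made for the illustration; the notebook itself reads the shapes from the Keras model.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility

# (n_inputs, n_units) for the three connections of the 4 -> 16 -> 16 -> 1 network
connection_shapes = [(4, 16), (16, 16), (16, 1)]

# One chromosome: a list of [weights, biases] pairs, one pair per connection
chromosome = [
    [rng.uniform(-1, 1, size=(n_in, n_out)), rng.uniform(-1, 1, size=(n_out,))]
    for n_in, n_out in connection_shapes
]

for weights, biases in chromosome:
    print(weights.shape, biases.shape)

# Total number of genes carried by one chromosome
n_genes = sum(w.size + b.size for w, b in chromosome)
print(n_genes)  # 369
```

The gene count equals the 369 trainable parameters reported by `model.summary()`, so one chromosome fully describes one set of network weights.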

In [6]:
print("Weights and bias of the first connection :\n\n",model.layers[0].get_weights())
print("\nWeights and bias of the second connection :\n\n",model.layers[1].get_weights()) 
print("\nWeights and bias of the third connection :\n\n",model.layers[2].get_weights()) 

Weights and bias of the first connection :

 [array([[ 0.05438268, -0.14526641,  0.36453468, -0.12049586, -0.36083972,
        -0.17483607, -0.19916707,  0.34514558, -0.01795381,  0.3769924 ,
        -0.4213738 , -0.36570317, -0.43162164,  0.19861221, -0.15091732,
        -0.07533446],
       [ 0.502874  , -0.34256768, -0.41678   , -0.3952066 ,  0.3426656 ,
        -0.08607474,  0.35560292,  0.5474026 ,  0.5436394 , -0.01876724,
        -0.11848572, -0.05323374,  0.01763648, -0.5053939 , -0.37652493,
        -0.22883585],
       [ 0.44047248, -0.09670481,  0.42762804,  0.02966052,  0.44168913,
         0.4962932 ,  0.14681596, -0.27638006,  0.5254004 ,  0.21729207,
         0.01573461, -0.21773008,  0.40856004, -0.17688853, -0.00445002,
        -0.5372721 ],
       [-0.14764011, -0.2651664 , -0.46957186,  0.5143881 ,  0.2871257 ,
        -0.07707012,  0.15586591,  0.1037181 ,  0.37429583,  0.18541771,
         0.36590934,  0.5258361 ,  0.2224825 , -0.28747344,  0.11538845,
         0.3

Now we create a random population whose values have the same order of magnitude as these weights

In [7]:
import numpy as np

def initialize_population(population_size):
    population = []

    for _ in range(population_size):
        individual = []
        for layer in model.layers:
            # Reuse the shapes of this layer's current weights and biases
            weights, biases = layer.get_weights()

            # Uniform draws in [-1, 1] stay close to the scale of the default initial values
            random_weights = np.random.uniform(-1, 1, size=weights.shape)
            random_biases = np.random.uniform(-1, 1, size=biases.shape)
            individual.append((random_weights, random_biases))
        
        population.append(individual)

    return population


Now we create the fitness_function, which evaluates each candidate solution (a chromosome, i.e. one complete set of weights and biases).

We use the inverse of the MSE shifted by one, 1 / (1 + MSE): the smaller the MSE, the better the chromosome, so with this transformation a higher fitness means a better chromosome. Adding 1 also avoids a division by zero when the MSE is 0.

Ranking by this fitness gives good solutions a better chance of being selected for reproduction.
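As a small worked example of this mapping, the sketch below computes the MSE and the resulting fitness for two toy prediction vectors. The arrays are invented for the illustration and are not the notebook's data; `fitness_from_mse` is a hypothetical helper name.

```python
import numpy as np

def fitness_from_mse(mse):
    """Map an MSE in [0, inf) to a fitness in (0, 1]; lower error gives higher fitness."""
    return 1.0 / (1.0 + mse)

y_true = np.array([0, 1, 1, 0], dtype=float)  # toy labels
good_pred = np.array([0.1, 0.9, 0.8, 0.2])    # close to the labels
poor_pred = np.array([0.5, 0.5, 0.5, 0.5])    # uninformative constant prediction

mse_good = np.mean((y_true - good_pred) ** 2)
mse_poor = np.mean((y_true - poor_pred) ** 2)

print(mse_good, fitness_from_mse(mse_good))
print(mse_poor, fitness_from_mse(mse_poor))
```

The constant prediction of 0.5 has an MSE of 0.25 and therefore a fitness of 0.8, while the better predictions get a fitness closer to 1. Inverting the relation, MSE = 1 / fitness − 1, recovers the error from a fitness value.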

In [17]:
from tensorflow.keras.losses import MeanSquaredError

def fitness_function(chromosome, X, y):

    for layer, weights in zip(model.layers, chromosome):
        layer.set_weights(weights)  # Set the weights of the model's layers to the chromosome's weights
    
    # Flatten the (n_samples, 1) predictions so they line up element-wise with y of shape (n_samples,)
    predictions = model.predict(X, verbose=0).ravel()
    mse = MeanSquaredError()(y.astype("float32"), predictions).numpy()

    return 1 / (1 + mse)  # Fitness in (0, 1]: the lower the MSE, the higher the fitness


Now it's time to implement:

- Selection, which is the function we use to pick good parents
- Crossover, which combines the information of two parent solutions (chromosomes) to create children
- Mutation, which introduces small perturbations into a chromosome's genes to maintain genetic diversity

In [18]:
import random

def tournament_selection(population, X, y, k=3):

    selected_chromosomes = random.sample(population, k)
    
    fitness_scores = [(individual, fitness_function(individual, X, y)) for individual in selected_chromosomes] # for each k chromosome we calculate their fitness function
    
    best_chromosomes = max(fitness_scores, key=lambda x: x[1])[0]

    return best_chromosomes



In [19]:
import numpy as np

def crossover(parent1, parent2): # this is a two point crossover
    child1, child2 = [], []

    for layer1, layer2 in zip(parent1, parent2):

        weights1 = [np.copy(weight) for weight in layer1]
        weights2 = [np.copy(weight) for weight in layer2]
        
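        # Pick two distinct row indices; rows in [point1, point2) are exchanged between the two weight matrices
        point1, point2 = sorted(random.sample(range(len(weights1[0])), 2))

        # Copy the slices before assigning: NumPy rows are views, so a plain tuple swap would overwrite one side
        weights1[0][point1:point2], weights2[0][point1:point2] = (
            weights2[0][point1:point2].copy(),
            weights1[0][point1:point2].copy(),
        )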
        
        child1.append(weights1)
        child2.append(weights2)
        
    return child1, child2
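
One detail worth checking in a row-wise crossover like this one is how rows are exchanged between two NumPy arrays. Indexing a row returns a view, so the tuple swap that works for Python lists does not exchange the data. The sketch below uses two small made-up arrays to show the difference between a naive swap and a swap on copies.

```python
import numpy as np

a = np.array([[1.0, 1.0], [2.0, 2.0]])
b = np.array([[9.0, 9.0], [8.0, 8.0]])

# Naive swap: a[0] and b[0] on the right-hand side are views, not copies.
# Once a[0] has been overwritten, the view of a[0] already holds b's values.
a_naive, b_naive = a.copy(), b.copy()
a_naive[0], b_naive[0] = b_naive[0], a_naive[0]
print(a_naive[0], b_naive[0])  # both rows now hold b's original values

# Safe swap: copy the rows before assigning.
a_safe, b_safe = a.copy(), b.copy()
a_safe[0], b_safe[0] = b_safe[0].copy(), a_safe[0].copy()
print(a_safe[0], b_safe[0])    # rows are exchanged
```

With the naive form, one child receives the other parent's rows while the second child stays identical to its parent, so only one of the two children is an actual recombination.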


In [20]:
def mutation(chromosome, mutation_rate):

    for layer in chromosome:
        for weight_matrix in layer[0]:
            
            if random.random() < mutation_rate:
                weight_matrix += np.random.normal() # in place: shifts every weight of this row by one N(0, 1) draw
                
    return chromosome
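
The function above shifts a whole row of the weight matrix by a single standard-normal draw and leaves the biases untouched. A common alternative, sketched here as an assumption rather than as the notebook's method, perturbs each weight independently with a smaller step. The name `mutate_per_weight` and the parameter `sigma` are hypothetical.

```python
import numpy as np

def mutate_per_weight(weights, mutation_rate, sigma=0.1, rng=None):
    """Return a mutated copy: each entry is perturbed independently with probability mutation_rate."""
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.random(weights.shape) < mutation_rate    # which entries mutate
    noise = rng.normal(0.0, sigma, size=weights.shape)  # small Gaussian steps
    return weights + mask * noise

rng = np.random.default_rng(42)  # arbitrary seed for the demo
w = np.zeros((4, 16))
w_mutated = mutate_per_weight(w, mutation_rate=0.1, sigma=0.1, rng=rng)

print(w_mutated.shape)
print(np.count_nonzero(w_mutated))  # number of mutated entries, around 10% of 64 on average
```

Because the function returns a new array, the parent's weights are left unchanged, and `sigma` gives direct control over how large a "small perturbation" is relative to weights drawn in [-1, 1].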


### 4)

Let's configure the GA

In [21]:
population_size = 100
mutation_rate = 0.1
num_generations = 200
elitism_rate = 10  # number of best individuals copied unchanged into the next generation (a count, not a rate)

### 5)

Now it's time to search for a solution, using all the functions we created above.

In [None]:
import matplotlib.pyplot as plt


population = initialize_population(population_size) # initial population of 100 chromosomes (population_size from question 4)
best_mse_per_generation = []

for generation in range(num_generations): 

    scores = [fitness_function(chro, X, y) for chro in population] # fitness of every chromosome in the current population
    
    best_chromosomes = sorted(list(zip(population, scores)), key=lambda x: x[1], reverse=True)[:elitism_rate] # keep only the 10 best chromosomes
    new_population = [ind for ind, _ in best_chromosomes] # the elites go straight into new_population; the remaining slots are filled with children
    
    while len(new_population) < population_size:

        parent1 = tournament_selection(population, X, y)
        parent2 = tournament_selection(population, X, y)
        
        child1, child2 = crossover(parent1, parent2)
        child1 = mutation(child1, mutation_rate)
        child2 = mutation(child2, mutation_rate)
        
        if len(new_population) + 2 <= population_size:
            new_population.extend([child1, child2])
        elif len(new_population) + 1 == population_size:
            new_population.append(child1)
    
    # Replace the population with the new generation
    population = new_population[:population_size]
    
    # Compute and record the best MSE of this generation
    best_mse = 1 / max(scores) - 1  # convert fitness back to MSE
    best_mse_per_generation.append(best_mse)
    
    # Print progress for each generation
    print(f'Generation {generation + 1}, Best MSE: {best_mse}')

# Plot how the best MSE evolves
plt.plot(best_mse_per_generation)
plt.xlabel('Generation')
plt.ylabel('Best MSE')
plt.title('GA Optimization of Neural Network Weights')
plt.show()


Generation 1, Best MSE: 0.25065913796424866
Generation 2, Best MSE: 0.25058043003082275
Generation 3, Best MSE: 0.2503452003002167
Generation 4, Best MSE: 0.2503452003002167
Generation 5, Best MSE: 0.25032517313957214
Generation 6, Best MSE: 0.25032517313957214
Generation 7, Best MSE: 0.25032517313957214
Generation 8, Best MSE: 0.2502487897872925
Generation 9, Best MSE: 0.2502487897872925
Generation 10, Best MSE: 0.2502487897872925
Generation 11, Best MSE: 0.2502487897872925
Generation 12, Best MSE: 0.2502487897872925
Generation 13, Best MSE: 0.2500622272491455
Generation 14, Best MSE: 0.25005921721458435
Generation 15, Best MSE: 0.25001999735832214
Generation 16, Best MSE: 0.25001999735832214
Generation 17, Best MSE: 0.25001999735832214
Generation 18, Best MSE: 0.25001999735832214
Generation 19, Best MSE: 0.2500196099281311
Generation 20, Best MSE: 0.25001487135887146
Generation 21, Best MSE: 0.25001487135887146
Generation 22, Best MSE: 0.2500079274177551
Generation 23, Best MSE: 0.25

KeyboardInterrupt: 
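
The run was interrupted after 23 generations. One likely contributor to the slow pace is that every call to `tournament_selection` runs `fitness_function` again for its k candidates, although the loop has already computed `scores` for the whole population. A possible variant, sketched below with toy data, picks the winner from those cached scores instead. The name `tournament_from_scores` is hypothetical and the individuals are stand-in labels, not real chromosomes.

```python
import random

def tournament_from_scores(population, scores, k=3, rng=random):
    """Pick k distinct individuals at random and return the one with the highest cached score."""
    candidates = rng.sample(range(len(population)), k)
    best_index = max(candidates, key=lambda i: scores[i])
    return population[best_index]

# Toy stand-ins: individuals are labels, scores are made-up fitness values.
population = ["a", "b", "c", "d", "e"]
scores = [0.10, 0.80, 0.30, 0.95, 0.50]

random.seed(0)
winners = [tournament_from_scores(population, scores, k=3) for _ in range(5)]
print(winners)

# With k equal to the population size, the global best always wins.
print(tournament_from_scores(population, scores, k=5))  # d
```

Inside the generation loop this would be called with the `population` and `scores` lists that are already available, so selection itself would need no extra forward passes through the model.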

### 6)

Evaluation of the genetic algorithm's parameters and strategies

- Population size: A population of 100 provides reasonable diversity, which helps the algorithm explore the solution space and makes it less likely to get trapped in a local minimum. A population that is too small may not contain enough diversity to converge to a good solution.

- Mutation rate: A mutation rate of 0.1 introduces variation into the population without disrupting promising solutions too much. A higher rate could scatter the solutions too widely, while a rate that is too low could lead to a lack of diversity.

- Elitism: Elitism (keeping the 10 best individuals) guarantees that the best solutions are not lost between generations. This helps the algorithm converge faster toward better solutions.

Comparison with traditional methods

- Accuracy: GA optimization can reach reasonable accuracy, but it is generally less effective than gradient descent at reducing the error quickly for tasks such as optimizing neural network weights. In the interrupted run above, the best MSE only moved from about 0.2507 to 0.25 over 23 generations.

- Execution time: The GA is much more expensive in computation time than gradient descent. Every evaluation of an individual requires a full forward pass of the model over the data, and the crossover and mutation operations add a significant amount of computation in each generation.
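
To put a rough number on this cost, the sketch below counts the calls to `fitness_function` (one `model.predict` each) implied by the loop in question 5 with the settings from question 4. It is a back-of-the-envelope estimate that assumes the default k=3 in `tournament_selection`; the variable names are chosen for the illustration.

```python
population_size = 100
num_generations = 200
elite_count = 10   # the notebook's elitism_rate
tournament_k = 3   # default k in tournament_selection

children_needed = population_size - elite_count        # 90 slots to fill
pairs_per_generation = -(-children_needed // 2)        # ceil(90 / 2) = 45 crossovers
tournaments_per_generation = 2 * pairs_per_generation  # two parents per crossover

scoring_calls = population_size                        # the per-generation scores list
tournament_calls = tournaments_per_generation * tournament_k

calls_per_generation = scoring_calls + tournament_calls
total_calls = calls_per_generation * num_generations

print(calls_per_generation)  # 370
print(total_calls)           # 74000
```

Under these assumptions a full run of 200 generations needs 74,000 forward passes over the dataset, of which 270 per generation come from re-evaluating tournament candidates.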

Advantages and limitations of genetic algorithm optimization

- Advantages: The genetic algorithm does not require computing derivatives, which makes it a good option for neural networks whose gradients are hard to compute or when the solutions lie in non-convex spaces. A GA is also less prone to getting stuck in local minima.

- Limitations: The GA is generally less efficient for neural network optimization tasks, because it costs more computation time than gradient-based methods. In addition, convergence can be slower and requires careful tuning of the parameters to obtain satisfactory results.

In summary, although a genetic algorithm can be used to optimize neural networks, it is usually better suited to cases where gradients are not available or to exploring unconventional network architectures.
