# Deliverable 2 - Evolving Neural Networks
## Group Q

Authors: 

- Rita Soares, 20220616
- Cláudia Rocha, 20191249
- Felix Gayer, 20220320
- David Halder, 20220632
- Leonhard Allgaier, 20220635


# Project Description

1. Defining a string-based representation for the networks (the genotype).
2. Defining a network class that is able to parse those instructions and build a functional pytorch network structure (the phenotype).
3. Defining a string-based representation for the optimizer.
4. Defining a way to parse those instructions and build and functional pytorch optimizer.
5. Sample all parameter values from a grammar (this allows for a restricted search space and removes the need to deal with invalid combinations).
6. Define 4 simple genetic operators: 
    - Network crossover
    - Add layer mutation
    - Remove layer mutation
    - Change optimizer mutation


The grammar should contain a reasonable number of parameters for each of the layers/optimizers. Provided the values of the parameters are standard and within reason, there is no need to justify their choice.

**The networks need to be able to use the following layers and their respective parameters:**

- Linear : number of features, bias 
- BatchNorm1d : eps, momentum 
- LayerNorm : eps
- Dropout : dropout probability (p) 
- AlphaDropout : dropout probability
- [Sigmoid, ReLU, PReLU, ELU, SELU, GELU, CELU, SiLU]

**The networks need to be able to be trained using the following optimizers and their respective parameters:** 
- Adam : lr, betas
- AdamW : lr, betas, weight decay 
- Adadelta : lr, rho
- NAdam : lr, betas, momentum decay 
- SGD : lr, momentum, nesterov

**Each network needs to have at least 1 layer and a maximum 50 layers. This choice needs to be made at random when generating the genotype.**

**The genetic operators that need to be implemented are the following:**
- **Crossover** : Takes a subset of layers from 2 different networks and swaps them. This subset cannot include the first and final layers.
- **Add layer mutation** : Add any new layer to any part of the genotype, and rebuild the network. It cannot be added before the first or after the last layer.
- **Remove layer mutation**: Remove any layer that is neither the first nor the last from the genotype and rebuild the network.
- **Change optimizer mutation**: Change any parameter from the optimizer genotype and rebuild it. If the type of optimizer is changed, the parameters must also be changed to ensure a valid optimizer is generated.

### Experimental setup
- Generate 5 random networks and train them. 
- Then, apply crossover to two of them and one of each mutation to the remaining 3 networks. Then, retrain the newly generated networks.
- The networks should be trained on the mnist dataset, for 50 epochs, following the same procedure as used in the practical classes (showing the training and validation loss and accuracy per epoch).


In [392]:
#!pip install torch
#!pip install torchvision

#!pip install mlxtend==0.19.0 -q
import torch
import torch.nn as nn
import numpy as np
import matplotlib.pyplot as plt
from torch.utils.data import DataLoader, TensorDataset
from torchvision import datasets , transforms
from dataclasses import dataclass
import torchvision.transforms as T 
from mlxtend.plotting import plot_decision_regions
from tqdm import tqdm
%matplotlib inline

import torch.optim as optim
import random 

## 1. Defining a string-based representation for the networks (the genotype).
### Generating a network genotype
### Ensuring the networks have between 1 and 50 layers

In [463]:
def bulid_block(dict_network, in_features, out_features, drop_layer_rate, norm_layer_rate, out_features_fix=False,):
    total_string = []

    #### Add Linear layer 
    string = ''

    string += ('Linear|')
    
    #If the ouput features are not fixed
    if out_features_fix == False:
        out_features = str(random.choice(dict_network['Linear']['in_features']))
        
    #add the parameters to the string
    string += (in_features)
    string += (',')
    string += (out_features)
    string += (',')
    string += (str(random.choice(dict_network['Linear']['bias'])))
    
    #Append it to the genotype
    total_string.append(string)

    
    if out_features_fix == False:
        
        if random.uniform(0, 1) < norm_layer_rate:
            #### Add Normalization layer 

            string = ''

            norm = random.choice(list(dict_network['Linear']['norm'].keys()))

            string += norm
            string += '|'
            string += str(out_features)
            string += ','

            for i in list(dict_network['Linear']['norm'][norm].keys()):
                string += str(random.choice(dict_network['Linear']['norm'][norm][i]))
                string += ','

            string = string[:-1]

            total_string.append(string)


        if random.uniform(0, 1) < drop_layer_rate:
            #### Add Dropout layer 

            string = ''

            drop = random.choice(list(dict_network['Linear']['drop'].keys()))

            string += drop
            string += '|'

            for i in list(dict_network['Linear']['drop'][drop].keys()):
                string += str(random.choice(dict_network['Linear']['drop'][drop][i]))
                string += ','

            string = string[:-1]

            total_string.append(string)


        #### Add Activation function 

        string = 'Act|'
        string += random.choice(list(dict_network['Linear']['activation']))

        total_string.append(string)

    return total_string, out_features




def dict_to_string_representation_network(total_input_size, total_ouput_size, dict_network, min_amount_layers, max_amount_layers, drop_layer_rate=0.5, norm_layer_rate=0.5):
    
    if max_amount_layers > 50 or min_amount_layers <= 0:
        print("Network size not sufficient, please choose between 1-50 layers")
        raise SystemExit
    
    total_input_size = str(total_input_size)
    total_ouput_size = str(total_ouput_size)

    total_string_representation = []
    
    numb_layers = random.randint(0,45)
    counter = 0


    #### Add layers

    input_size = total_input_size

    while counter < numb_layers:

        block = bulid_block(dict_network, 
                            input_size, 
                            total_ouput_size,  
                            drop_layer_rate, 
                            norm_layer_rate,
                            out_features_fix=False)

        for i in block[0]:
            total_string_representation.append(i)

        input_size = block[1]
        counter += len(block[0])


    #### Add output layer 

    block = bulid_block(dict_network, 
                        input_size, 
                        total_ouput_size,  
                        drop_layer_rate, 
                        norm_layer_rate,
                        out_features_fix=True)
    
    for i in block[0]:
        total_string_representation.append(i)

    return total_string_representation




### Defining the grammar (network)

Parameter values were chosen around the default values

Source:https://pytorch.org/docs/stable/nn.html

In [394]:
dict_network = {
    
      "Linear" : {
        "in_features" : [128,256,512],
        "out_features" : [128,256,512],
        "bias" : [True, False],
        "norm" : {
              "BatchNorm1d" : {
                "eps" : [0.1, 0.01, 0.001],
                "momentum" : [0.1, 0.5, 0.9] },

              "LayerNorm" : {
                "eps" : [1e-5, 1e-4, 1e-3]}
                },

        "drop" : {
          "Dropout" : {
            "dropout_probability" : [0.1, 0.3, 0.5, 0.7]},

          "AlphaDropout" : {
            "dropout_probability" : [0.1, 0.3, 0.5, 0.7]},
        },

        "activation" : ["Sigmoid", "ReLU", "PReLU", "ELU", "SELU", "GELU", "CELU", "SiLU"]

    }}

In [465]:
### Create the network string represetation 

total_input_size  = 28*28
total_ouput_size  = 10
dict_network      = dict_network
min_amount_layers = 1
max_amount_layers = 50
drop_layer_rate   = 0.5
norm_layer_rate   = 0.5

string_model = dict_to_string_representation_network(total_input_size, 
                                                    total_ouput_size, 
                                                    dict_network, 
                                                    min_amount_layers, 
                                                    max_amount_layers,
                                                    drop_layer_rate, 
                                                    norm_layer_rate)



######## Vizualise a sample network string represetation

string_model

['Linear|784,128,False',
 'LayerNorm|128,0.001',
 'AlphaDropout|0.7',
 'Act|ELU',
 'Linear|128,256,True',
 'BatchNorm1d|256,0.1,0.9',
 'AlphaDropout|0.7',
 'Act|Sigmoid',
 'Linear|256,512,False',
 'AlphaDropout|0.5',
 'Act|PReLU',
 'Linear|512,10,True']

## 2. Defining a network class that is able to parse those instructions and build a functional pytorch network structure (the phenotype).

### Generating a network phenotype

In [440]:
class Net(nn.Module):
    def __init__(self, layer_list):
        super(Net, self).__init__()

        self.layers = nn.ModuleList()

        for layer_str in layer_list:
            layer = self.parse_layer_string(layer_str)
            self.layers.append(layer)

        # to flatten the images into 1d arays
        self.flat = nn.Flatten()


    def parse_layer_string(self, layer_str):
        
        #Linear Layer
        if layer_str.startswith('Linear'):
            features = layer_str.split('|')[-1].split(',')
            in_features = int(features[0])
            out_features = int(features[1])
            bias = bool(features[2])
            return nn.Linear(in_features, out_features, bias)
        
        #Batch Normalisation
        elif layer_str.startswith('Batch'):
            features = layer_str.split('|')[-1].split(',')
            num_features = int(features[0])
            eps = float(features[1])
            momentum = float(features[2])
            return nn.BatchNorm1d(num_features, eps, momentum)
        
        #Layer Normalisation
        elif layer_str.startswith('LayerNorm'):
            features = layer_str.split('|')[-1].split(',')
            normalized_shape = int(features[0])
            eps = float(features[1])
            return nn.LayerNorm(normalized_shape , eps)
        
        #Dropout Layer
        elif layer_str.startswith('Dropout'):
            p = float(layer_str.split('|')[-1])
            return nn.Dropout(p=p)
        
        #Alpha Drouput Layer
        elif layer_str.startswith('Alpha'):
            p = float(layer_str.split('|')[-1])
            return nn.AlphaDropout(p=p)
        
        #Activation Function
        elif layer_str.startswith('Act'):
            act = layer_str.split('|')[-1]
            layer_class = getattr(nn,act)
            return layer_class()
        else:
            raise ValueError('Invalid layer string: {}'.format(layer_str))

In [456]:
### Create the network string represetation 

total_input_size  = 28*28
total_ouput_size  = 10
dict_network      = dict_network
min_amount_layers = 1
max_amount_layers = 50
drop_layer_rate   = 0.5
norm_layer_rate   = 0.5

string_model = dict_to_string_representation_network(total_input_size, 
                                                    total_ouput_size, 
                                                    dict_network, 
                                                    min_amount_layers, 
                                                    max_amount_layers,
                                                    drop_layer_rate, 
                                                    norm_layer_rate)


### transfer the string represetation to a pytorch model 

model = Net(string_model)



######## Vizualise a sample model 

print(model)

Net(
  (layers): ModuleList(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): PReLU(num_parameters=1)
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): BatchNorm1d(512, eps=0.1, momentum=0.5, affine=True, track_running_stats=True)
    (4): Dropout(p=0.3, inplace=False)
    (5): ReLU()
    (6): Linear(in_features=512, out_features=512, bias=True)
    (7): LayerNorm((512,), eps=0.001, elementwise_affine=True)
    (8): ELU(alpha=1.0)
    (9): Linear(in_features=512, out_features=512, bias=True)
    (10): LayerNorm((512,), eps=0.0001, elementwise_affine=True)
    (11): Dropout(p=0.5, inplace=False)
    (12): SELU()
    (13): Linear(in_features=512, out_features=512, bias=True)
    (14): AlphaDropout(p=0.1, inplace=False)
    (15): ReLU()
    (16): Linear(in_features=512, out_features=10, bias=True)
  )
  (flat): Flatten(start_dim=1, end_dim=-1)
)


## 3. Defining a string-based representation for the optimizer.
### Defining the grammar for the optimizer

To ensure a high degree of clarity and a clear structure of the notebook, the grammar for the optimiser and the network were defined separately.
Parameter values were chosen around the default values
Source: https://pytorch.org/docs/stable/optim.html


In [398]:

dict_optimizer = {
    
    "Adam" : {
        "lr" :           [0.1, 0.01, 0.001, 0.0001],
        "betas" :        [(0.9, 0.999), (0.8, 0.9), (0.7, 0.99)] }, 
    
    "AdamW" : {
        "lr" :           [0.1, 0.01, 0.001, 0.0001],
        "betas" :        [(0.9, 0.999), (0.8, 0.9), (0.7, 0.99)], 
        "weight_decay" : [0.01, 0.001, 0.0001]},
    
    "Adadelta" : {
        "lr" :           [1.0, 0.1, 0.01, 0.001],
        "rho" :          [0.9, 0.95, 0.99]},
    
    "NAdam" : {
        "lr" :           [0.1, 0.01, 0.001, 0.0001],
        "betas" :        [(0.9, 0.999), (0.8, 0.9), (0.7, 0.99)],
        "mom_decay" :    [0.0001,0.001,0.01]},
    
    "SGD" : {
        "lr" :           [0.1, 0.01, 0.001],
        "momentum" :     [0.0, 0.9, 0.99],
        "nesterov" :     [False, True]}

}

### Generating an optimizer genotype

In [399]:
def dict_to_string_representation_optimizer(dict_optimizer):

    string = ''

    opt = random.choice(list(dict_optimizer.keys()))
    string += opt
    string += '|'

    for i in list(dict_optimizer[opt].keys()):
        string += str(random.choice(dict_optimizer[opt][i]))
        string += ';'

    string = string[:-1] 
    string = string.replace( '(', '')
    string = string.replace( ')', '')

    return string

In [400]:
### Create the network string represetation 

total_input_size  = 28*28
total_ouput_size  = 10
dict_network      = dict_network
min_amount_layers = 1
max_amount_layers = 50
drop_layer_rate   = 0.5
norm_layer_rate   = 0.5

string_model = dict_to_string_representation_network(total_input_size, 
                                                    total_ouput_size, 
                                                    dict_network, 
                                                    min_amount_layers, 
                                                    max_amount_layers,
                                                    drop_layer_rate, 
                                                    norm_layer_rate)


### transfer the string represetation to a pytorch model 

model = Net(string_model)


### Create the optimizer string represetation 

opt_string = dict_to_string_representation_optimizer(dict_optimizer)




######## Vizualise diffrent versions of the optimizer string represetation 

for _ in range(5):
    print(dict_to_string_representation_optimizer(dict_optimizer))

Adadelta|0.001;0.9
Adadelta|0.1;0.95
Adadelta|1.0;0.9
Adam|0.001;0.7, 0.99
AdamW|0.1;0.9, 0.999;0.0001


## 4. Defining a way to parse those instructions and build and functional pytorch optimizer.

#### Generating an optimizer phenotype

In [401]:
def parse_opt_string(layer_str, model):
    
    #Parameters of the model
    params =model.parameters()
    
    #AdamW (needs to be first so that it cannot be confused with Adam)
    if layer_str.startswith('AdamW'):
        hyp = layer_str.split('|')[-1].split(';')
        lr = float(hyp[0])
        betas = tuple(map(float, hyp[1].split(', ')))
        weight_decay = float(hyp[2])
        return optim.AdamW(params,lr, betas, weight_decay)
    #Adam
    elif layer_str.startswith('Adam'):
        hyp = layer_str.split('|')[-1].split(';')
        lr = float(hyp[0])
        betas=tuple(map(float, hyp[1].split(', ')))
        return optim.Adam(params, lr, betas)
    #Adadelta
    elif layer_str.startswith('Adadelta'):
        hyp = layer_str.split('|')[-1].split(';')
        lr = float(hyp[0])
        rho = float(hyp[1])
        return optim.Adadelta(params,lr, rho)
    #NAdam
    elif layer_str.startswith('NAdam'):
        hyp = layer_str.split('|')[-1].split(';')
        lr = float(hyp[0])
        betas = tuple(map(float, hyp[1].split(', ')))
        mom_decay = float(hyp[2])
        return optim.NAdam(params, lr, betas, mom_decay)
    #SGD
    elif layer_str.startswith('SGD'):
        hyp = layer_str.split('|')[-1].split(';')
        lr = float(hyp[0])
        momentum = float(hyp[1])
        nesterov = bool(hyp[2])
        return optim.SGD(params, lr, momentum, nesterov)
    
    else:
        raise ValueError('Invalid layer string: {}'.format(layer_str))

In [402]:
### Create the network string represetation 

total_input_size  = 28*28
total_ouput_size  = 10
dict_network      = dict_network
min_amount_layers = 1
max_amount_layers = 50
drop_layer_rate   = 0.5
norm_layer_rate   = 0.5

string_model = dict_to_string_representation_network(total_input_size, 
                                                    total_ouput_size, 
                                                    dict_network, 
                                                    min_amount_layers, 
                                                    max_amount_layers,
                                                    drop_layer_rate, 
                                                    norm_layer_rate)


### transfer the network string represetation to a pytorch model 

model = Net(string_model)


### Create the optimizer string represetation 

opt_string = dict_to_string_representation_optimizer(dict_optimizer)


### transfer the string represetation to a pytorch optimizer 

optimizer = parse_opt_string(opt_string, model)




######## Vizualise a pytorch optimizer

print(parse_opt_string(opt_string, model))

SGD (
Parameter Group 0
    dampening: True
    lr: 0.01
    maximize: False
    momentum: 0.99
    nesterov: False
    weight_decay: 0
)


## 5. Define 4 simple genetic operators
### Implementing the genetic operators

### 5.1. Network crossover
Takes a subset of layers from 2 different networks and swaps them. This subset cannot include the first and final layers.

In [403]:
def crossover_adaption(offspring_1, point_1, point_2):
        
        #Get the previous layer type (which cointains input or output parameters)
        pre_type = None
        position = 0
        while pre_type != ("Linear" or "BatchNorm1d" or "LayerNorm"):
            position += 1
            pre_type = offspring_1[point_1-position].split('|')[0]

        #Calculate the position of the pre_layer 
        par_pos = point_1-position

        #Check if the layer before is linear -> retrieve the output
        if  pre_type == "Linear": 
            pre_output = offspring_1[par_pos].split('|')[1].split(",")[1]

        #Check if the layer before is BatchNorm1D or LayerNorm -> retrieve the output
        elif pre_type == "BatchNorm1d" or pre_type == "LayerNorm":
            pre_output = offspring_1[par_pos].split('|')[1].split(",")[0]
            
    
        #Get the next layer type (which cointains input or output parameters)
        post_type = None
        position_post = 0
        while post_type !=("Linear" or "BatchNorm1d" or "LayerNorm"):
            position_post += 1
            post_type = offspring_1[point_2+position_post].split('|')[0]  
        
        
        #Calculate the position of the pre_layer 
        post_pos = point_2-position_post
        #Get the input of that layer
        post_input = offspring_1[post_pos].split('|')[1].split(",")[0]
        
        #Iterate through the crossover block to find out the position of the last linear layer
        last_pos = point_2 + 1
        type_backit = None
        while type_backit != "Linear" and last_pos != (point_1-1):
            last_pos = last_pos-1
            type_backit=offspring_1[last_pos].split('|')[0] 


        #Iterate through the crossover block and change the input/output values 
        for bit in range(point_1, point_2):
            #print(offspring_1[bit])
            #Get the type of the current layer
            bit_type = offspring_1[bit].split('|')[0]
            
            if bit_type == "Linear" and bit == last_pos:
                params = offspring_1[bit].split('|')[1].split(",") #get the parameters of the layer
                params[0] = pre_output #change the input
                params[1] = post_input #change the output of the layer to the post output
                params_str = ','.join(params) #make them a string again
                layer_corrected = str(bit_type)+"|"+params_str #put the updated layer together
                #print("linear", layer_corrected)
                offspring_1[bit] = layer_corrected #reintroduce it
                
            #if the layer is a linear layer -> change in and output
            elif bit_type == "Linear": 
                params = offspring_1[bit].split('|')[1].split(",") #get the parameters of the layer
                params[0] = pre_output #change the input
                pre_output = params[1] #save the output
                params_str = ','.join(params) #make them a string again
                layer_corrected = str(bit_type)+"|"+params_str #put the updated layer together
                #print("linear", layer_corrected)
                offspring_1[bit] = layer_corrected #reintroduce it

            elif bit_type == "BatchNorm1d" or bit_type == "LayerNorm":
                params = offspring_1[bit].split('|')[1].split(",") #get the parameters of the layer
                params[0] = pre_output #change the input
                params_str = ','.join(params) #make them a string again
                layer_corrected = str(bit_type)+"|"+params_str #put the updated layer together
                #print("norm:",layer_corrected)
                offspring_1[bit] = layer_corrected #reintroduce it
            
            #else:
             #   print("others", offspring_1[bit])
                
        return offspring_1

In [404]:
def crossover(layer_string_network_1, layer_string_network_2):
    
    if len(layer_string_network_1) == 1 or  len(layer_string_network_2) == 1:
        print("Network too small, please generate a new one")
        raise SystemExit

    # To ensure that the two splitting points are not the same number we initialise both points with "1"
    # and loop through the random selection until a crossover block is chosen 
    point_1 = None
    point_2 = None
    
    #Get a copy of the networks
    offspring_1 = layer_string_network_1.copy()
    offspring_2 = layer_string_network_2.copy()   

    #The smaller network provides the length of the block
    if len(layer_string_network_1) <= len(layer_string_network_2):
        
        #Try until the points are not the same
        while point_1 == point_2:
            point_1 = np.random.randint(1,len(layer_string_network_1)-1)
            point_2 = np.random.randint(point_1+1,len(layer_string_network_1))

        #Conduct the crossover
        offspring_1[point_1:point_2] = layer_string_network_2[point_1:point_2]
        offspring_2[point_1:point_2] = layer_string_network_1[point_1:point_2]
        
    #The smaller network provides the length of the block
    elif len(layer_string_network_1) > len(layer_string_network_2):
        
        #Try until the points are not the same
        while point_1 == point_2:
            point_1 = np.random.randint(1,len(layer_string_network_2)-1)
            point_2 = np.random.randint(point_1+1,len(layer_string_network_2))
        
        #Conduct the crossover
        offspring_1[point_1:point_2] = layer_string_network_2[point_1:point_2]
        offspring_2[point_1:point_2] = layer_string_network_1[point_1:point_2]
        

        # Now we need to ensure that the network is actually functional 
        # -> input and output parameters need to be adapted             
    
    result_1 = crossover_adaption(offspring_1, point_1, point_2)
    result_2 = crossover_adaption(offspring_2, point_1, point_2)
    
    return result_1, result_2

        

In [405]:
### Create two network string represetation to crossover

total_input_size  = 28*28
total_ouput_size  = 10
dict_network      = dict_network
min_amount_layers = 1
max_amount_layers = 50
drop_layer_rate   = 0.5
norm_layer_rate   = 0.5

string_model_1 = dict_to_string_representation_network(total_input_size, 
                                                    total_ouput_size, 
                                                    dict_network, 
                                                    min_amount_layers, 
                                                    max_amount_layers,
                                                    drop_layer_rate, 
                                                    norm_layer_rate)

string_model_2 = dict_to_string_representation_network(total_input_size, 
                                                    total_ouput_size, 
                                                    dict_network, 
                                                    min_amount_layers, 
                                                    max_amount_layers,
                                                    drop_layer_rate, 
                                                    norm_layer_rate)

#Cross Them
crossed_1, crossed_2=crossover(string_model_1, string_model_2)

#Visualise it: 

crossover(string_model_1, string_model_2)

(['Linear|784,512,False',
  'Act|PReLU',
  'Linear|512,128,False',
  'BatchNorm1d|128,0.1,0.5',
  'AlphaDropout|0.3',
  'Linear|128,128,False',
  'BatchNorm1d|128,0.01,0.9',
  'AlphaDropout|0.1',
  'Act|ELU',
  'Linear|256,512,False',
  'Act|ReLU',
  'Linear|512,10,True'],
 ['Linear|784,256,False',
  'Act|Sigmoid',
  'Linear|256,128,False',
  'BatchNorm1d|128,0.01,0.9',
  'Act|CELU',
  'Act|SiLU',
  'Linear|128,128,False',
  'Act|PReLU',
  'Linear|256,256,True',
  'BatchNorm1d|256,0.01,0.5',
  'Act|PReLU',
  'Linear|256,128,True',
  'Dropout|0.5',
  'Act|ELU',
  'Linear|128,256,False',
  'Dropout|0.1',
  'Act|SELU',
  'Linear|256,128,False',
  'Act|SELU',
  'Linear|128,128,False',
  'LayerNorm|128,0.001',
  'Act|Sigmoid',
  'Linear|128,10,True'])

In [406]:
#test the crossover -> 
print(string_model_1[-1] ==crossed_1[-1],string_model_1[0] ==crossed_1[0] )
print(string_model_2[-1] ==crossed_2[-1], string_model_2[0] ==crossed_2[0])

True True
True True


### 5.2. Add layer mutation
Add any new layer to any part of the genotype, and rebuild the network. It cannot be added before the first or after the last layer.

In [407]:
def add_layer_mutation(layer_string_network, dict_network):     
    
    #if the network is too small: throw an error
    if len(layer_string_network) == 1:
        print("Network too small, please generate a new one")
        raise SystemExit
        
    #Choose position of the layer which will be added
    mut_position = np.random.randint(1,len(layer_string_network)-1) #-1 to not choose the last layer
    offspring = layer_string_network.copy()
    
    
    #generate a model to retrieve a random layer 
    #length of the Mut Model
    mut_length = 1
    while mut_length < 3:
        mutation_model = dict_to_string_representation_network(
                                                    total_input_size, 
                                                    total_ouput_size, 
                                                    dict_network, 
                                                    min_amount_layers, 
                                                    max_amount_layers,
                                                    drop_layer_rate, 
                                                    norm_layer_rate)
        mut_length=len(mutation_model)
        print("Mutation Length", mut_length)
    
    rand_mut_layer = mutation_model[np.random.randint(1,len(mutation_model)-1)]
    mut_l_type = rand_mut_layer.split('|')[0]
    
    #put the new layer in
    offspring.insert(mut_position, rand_mut_layer)
    
    #Check if the mutation is a dropout or activation layer -> no action
    if mut_l_type == "Act" or mut_l_type == "Dropout" or mut_l_type == "AlphaDropout":
        print("No Action needed")
    
    #Check if the mutation layer is linear -> input and output must be adapted
    elif mut_l_type == "Linear":
        
        pre_type = None
        position = 0
        #Check if the layer before is linear -> make the output the input of the new layer
        while pre_type != ("Linear" or "BatchNorm1d" or "LayerNorm"):
            position += 1
            pre_type = offspring[mut_position-position].split('|')[0]
            
        pre_pos = mut_position - position
        if  pre_type == "Linear": 
            pre_output = offspring[pre_pos].split('|')[1].split(",")[1]
            params = rand_mut_layer.split('|')[1].split(",")
            params[0] = pre_output
            params_str = ','.join(params)
            rand_mut_layer = str(mut_l_type)+"|"+params_str #change rand_mut_layer so it can be used later
        
        elif pre_type == "BatchNorm1d" or pre_type == "LayerNorm":
            pre_output = offspring[pre_pos].split('|')[1].split(",")[0]
            params = rand_mut_layer.split('|')[1].split(",")
            params[0] = pre_output
            params_str = ','.join(params)
            rand_mut_layer = str(mut_l_type)+"|"+params_str #change rand_mut_layer so it can be used later
        
        #Check if the layer after is linear -> adapt the output of the new layer 
        
        post_type = None
        position_post = 0
        while post_type !=("Linear" or "BatchNorm1d" or "LayerNorm"):
            position_post += 1
            post_type = offspring[mut_position+position_post].split('|')[0] 
        post_pos = mut_position+position_post
        if post_type == "Linear" or post_type == "BatchNorm1d" or post_type == "LayerNorm":
                post_input = offspring[post_pos].split('|')[1].split(",")[0]
                params = rand_mut_layer.split('|')[1].split(",")
                params[1] = post_input
                params_str = ','.join(params)
                rand_mut_layer = str(mut_l_type)+"|"+params_str #change rand_mut_layer so it can be used later
                
                
    #If the mutation layer is norm -> check layer before 
    elif mut_l_type == "BatchNorm1d" or mut_l_type == "LayerNorm":
        pre_type = None
        position = 0
        #Check if the layer before is linear -> make the output the input of the new layer
        while pre_type != ("Linear" or "BatchNorm1d" or "LayerNorm"):
            position += 1
            pre_type = offspring[mut_position-position].split('|')[0]
            
        pre_pos = mut_position - position
        if  pre_type == "Linear": 
            pre_output = offspring[pre_pos].split('|')[1].split(",")[1]
            params = rand_mut_layer.split('|')[1].split(",")
            params[0] = pre_output
            params_str = ','.join(params)
            rand_mut_layer = str(mut_l_type)+"|"+params_str #change rand_mut_layer so it can be used later
            
        elif pre_type == "BatchNorm1d" or pre_type == "LayerNorm":
            pre_output = offspring[pre_pos].split('|')[1].split(",")[0]
            params = rand_mut_layer.split('|')[1].split(",")
            params[0] = pre_output
            params_str = ','.join(params)
            rand_mut_layer = str(mut_l_type)+"|"+params_str #change rand_mut_layer so it can be used later
            
                
    offspring[mut_position]= rand_mut_layer
    print(mut_position)
    print("Integrity Check")
    print(offspring[mut_position-1])
    print(offspring[mut_position])
    print(offspring[mut_position+1])
    return offspring

In [408]:
### Create the network string represetation 

total_input_size  = 28*28
total_ouput_size  = 10
dict_network      = dict_network
min_amount_layers = 1
max_amount_layers = 50
drop_layer_rate   = 0.5
norm_layer_rate   = 0.5

string_model = dict_to_string_representation_network(total_input_size, 
                                                    total_ouput_size, 
                                                    dict_network, 
                                                    min_amount_layers, 
                                                    max_amount_layers,
                                                    drop_layer_rate, 
                                                    norm_layer_rate)


######## Vizualise the network with extra layer

test_model=add_layer_mutation(string_model, dict_network)
test_model

Mutation Length 11
No Action needed
24
Integrity Check
Act|CELU
Act|GELU
Linear|256,512,True


['Linear|784,512,False',
 'LayerNorm|512,1e-05',
 'Act|ReLU',
 'Linear|512,256,False',
 'Dropout|0.1',
 'Act|ELU',
 'Linear|256,512,True',
 'LayerNorm|512,0.0001',
 'Act|ReLU',
 'Linear|512,128,False',
 'BatchNorm1d|128,0.01,0.9',
 'AlphaDropout|0.7',
 'Act|SiLU',
 'Linear|128,512,False',
 'AlphaDropout|0.7',
 'Act|CELU',
 'Linear|512,512,True',
 'Dropout|0.3',
 'Act|GELU',
 'Linear|512,128,True',
 'BatchNorm1d|128,0.001,0.5',
 'Act|ELU',
 'Linear|128,256,True',
 'Act|CELU',
 'Act|GELU',
 'Linear|256,512,True',
 'LayerNorm|512,0.001',
 'Act|ReLU',
 'Linear|512,128,True',
 'AlphaDropout|0.7',
 'Act|SiLU',
 'Linear|128,256,False',
 'LayerNorm|256,0.0001',
 'Act|CELU',
 'Linear|256,256,True',
 'LayerNorm|256,1e-05',
 'Act|SiLU',
 'Linear|256,128,False',
 'BatchNorm1d|128,0.1,0.9',
 'AlphaDropout|0.7',
 'Act|SELU',
 'Linear|128,128,True',
 'Dropout|0.1',
 'Act|CELU',
 'Linear|128,512,False',
 'Act|ReLU',
 'Linear|512,10,False']

### 5. 3. Remove layer mutation
Remove any layer that is neither the first nor the last from the genotype and rebuild the network.

In [409]:
def mutation_remove_layer(layer_string_network):
    
    if len(layer_string_network) == 1:
        print("Network too small, please generate a new one")
        raise SystemExit
        
    #Choose a random layer from each network
    rand_layer_type = "Linear"
    while rand_layer_type == "Linear":
        rand_layer_1 = np.random.randint(1,len(layer_string_network)-1)
        rand_layer_type = layer_string_network[rand_layer_1].split('|')[0]
    print('removed Layer:', rand_layer_1, layer_string_network[rand_layer_1])
    #remove the layer and return the new genotype
    offspring =layer_string_network.copy()
    offspring.pop(rand_layer_1)
    return offspring


In [410]:
### Create the network string represetation 

total_input_size  = 28*28
total_ouput_size  = 10
dict_network      = dict_network
min_amount_layers = 1
max_amount_layers = 50
drop_layer_rate   = 0.5
norm_layer_rate   = 0.5

string_model = dict_to_string_representation_network(total_input_size, 
                                                    total_ouput_size, 
                                                    dict_network, 
                                                    min_amount_layers, 
                                                    max_amount_layers,
                                                    drop_layer_rate, 
                                                    norm_layer_rate)


######## Vizualise the orignal network string represetation 

string_model
mutation_remove_layer(string_model)

removed Layer: 7 BatchNorm1d|512,0.1,0.9


['Linear|784,256,False',
 'LayerNorm|256,0.0001',
 'Act|ELU',
 'Linear|256,512,False',
 'Dropout|0.5',
 'Act|Sigmoid',
 'Linear|512,512,False',
 'Act|Sigmoid',
 'Linear|512,512,False',
 'Dropout|0.7',
 'Act|GELU',
 'Linear|512,128,False',
 'Dropout|0.5',
 'Act|Sigmoid',
 'Linear|128,128,False',
 'BatchNorm1d|128,0.1,0.9',
 'AlphaDropout|0.7',
 'Act|ReLU',
 'Linear|128,512,True',
 'LayerNorm|512,1e-05',
 'Dropout|0.3',
 'Act|GELU',
 'Linear|512,10,True']

In [411]:
######## Vizualise the mutated network string represetation 

mutation_remove_layer(string_model)

removed Layer: 20 LayerNorm|512,1e-05


['Linear|784,256,False',
 'LayerNorm|256,0.0001',
 'Act|ELU',
 'Linear|256,512,False',
 'Dropout|0.5',
 'Act|Sigmoid',
 'Linear|512,512,False',
 'BatchNorm1d|512,0.1,0.9',
 'Act|Sigmoid',
 'Linear|512,512,False',
 'Dropout|0.7',
 'Act|GELU',
 'Linear|512,128,False',
 'Dropout|0.5',
 'Act|Sigmoid',
 'Linear|128,128,False',
 'BatchNorm1d|128,0.1,0.9',
 'AlphaDropout|0.7',
 'Act|ReLU',
 'Linear|128,512,True',
 'Dropout|0.3',
 'Act|GELU',
 'Linear|512,10,True']

### 5.4. Change optimizer mutation
Change any parameter from the optimizer genotype and rebuild it. If the type of optimizer is changed, the parameters must also be changed to ensure a valid optimizer is generated.

In [431]:
def mutation_change_opt(opt_string, dict_optimizer=dict_optimizer):
    parameters = []  # List to store the parameters
    
    # Unbuild the string
    string = opt_string.split("|")  # Split the string by '|'
    parameters.append(string[0])  # Add the first part to parameters list
    string = string[1].split(";")  # Split the remaining part by ';'
    
    # Iterate through the parts and add them to parameters list
    for i in string:
        if ',' in i:
            parameters.append('('+(i)+')')
        else:
            parameters.append(i)
    
    # Select the parameter to change
    select_index = np.random.randint(0, len(parameters))  # Randomly select an index
    x = str(parameters[select_index])  # Get the selected parameter
    
    # Print original information
    print('original optimizer string:', opt_string)
    print('parameters in string', parameters)
    print('parameter to change:', x)
    
    if select_index == 0:
        param = list(dict_optimizer.keys())
        print('out of following parameters:', param)
        
        # Choose a new parameter until it's different from the current one
        while parameters[select_index] == x:
            new_string = dict_to_string_representation_optimizer(dict_optimizer)
            x = new_string.split("|")[0]
        
        print('new parameter:', x)
        return new_string

    param_keys = list(dict_optimizer[parameters[0]].keys())

    if select_index == 1:
        param = dict_optimizer[parameters[0]][param_keys[0]]

    if select_index == 2:
        param = dict_optimizer[parameters[0]][param_keys[1]]   

    if select_index == 3:
        param = dict_optimizer[parameters[0]][param_keys[2]]
        
    print('out of following parameters:', param)

    # Choose a new parameter until it's different from the current one
    while parameters[select_index] == x:
        x = str(random.choice(param))
        
    print('new parameter:', x)
    
    # Rebuild the string
    parameters[select_index] = x
    
    string = ''
    opt = parameters[0]
    string += opt
    string += '|'
    
    # Iterate through parameters and add them to the string
    for i in parameters[1:]:
        string += i
        string += ';'

    string = string[:-1] 
    string = string.replace('(', '')
    string = string.replace(')', '')
    
    return string

In [433]:
######## Vizualise the change optimizer mutation

for _ in range(3):

    opt_string = dict_to_string_representation_optimizer(dict_optimizer)

    test = mutation_change_opt(opt_string)
    print(test)
    print('-----')

original optimizer string: SGD|0.1;0.0;False
parameters in string ['SGD', '0.1', '0.0', 'False']
parameter to change: 0.1
out of following parameters: [0.1, 0.01, 0.001]
new parameter: 0.01
SGD|0.01;0.0;False
-----
original optimizer string: SGD|0.01;0.9;False
parameters in string ['SGD', '0.01', '0.9', 'False']
parameter to change: 0.01
out of following parameters: [0.1, 0.01, 0.001]
new parameter: 0.1
SGD|0.1;0.9;False
-----
original optimizer string: SGD|0.001;0.9;True
parameters in string ['SGD', '0.001', '0.9', 'True']
parameter to change: SGD
out of following parameters: ['Adam', 'AdamW', 'Adadelta', 'NAdam', 'SGD']
new parameter: Adadelta
Adadelta|0.01;0.99
-----


## 6. Experimental setup

- Generate 5 random networks and train them. 
- Then, apply crossover to two of them and one of each mutation to the remaining 3 networks. Then, retrain the newly generated networks.
- The networks should be trained on the mnist dataset, for 50 epochs, following the same procedure as used in the practical classes (showing the training and validation loss and accuracy per epoch).

In [414]:
batch_size = 32

train_dataset = datasets.MNIST('./data', 
                               train=True, 
                               download=True, 
                               transform=transforms.ToTensor())

validation_dataset = datasets.MNIST('./data', 
                                    train=False, 
                                    transform=transforms.ToTensor())

train_loader = DataLoader(dataset=train_dataset, 
                            batch_size=batch_size, 
                            shuffle=True)

validation_loader = DataLoader(dataset=validation_dataset, 
                                batch_size=batch_size, 
                                shuffle=False)
#Epochs 50 -> for all experiments
num_epochs = 50

In [415]:
def train_with_validation(model, train_loader, validation_loader, loss_fn, optimizer, num_epochs):
    for epoch in range(num_epochs):
        accuracy_hist_train = 0
        accuracy_hist_val = 0  # Initialize validation accuracy
        loss_hist_train = 0
        loss_hist_val = 0  # Initialize validation loss

        # Training phase
        model.train()  # Set the model to train mode
        for x_batch, y_batch in train_loader:
            pred = model(x_batch)
            loss = loss_fn(pred, y_batch)
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
            is_correct = (torch.argmax(pred, dim=1) == y_batch).float()
            accuracy_hist_train += is_correct.sum()
            loss_hist_train += loss.item()
        accuracy_hist_train /= len(train_loader.dataset)
        loss_hist_train /= len(train_loader)

        # Validation phase
        model.eval()  # Set the model to evaluation mode
        with torch.no_grad():  # Disable gradient calculation
            for x_batch, y_batch in validation_loader:
                pred = model(x_batch)
                is_correct = (torch.argmax(pred, dim=1) == y_batch).float()
                accuracy_hist_val += is_correct.sum()
                loss = loss_fn(pred, y_batch)
                loss_hist_val += loss.item()
            accuracy_hist_val /= len(validation_loader.dataset)
            loss_hist_val /= len(validation_loader)

        print(f'Epoch {epoch}  Training Accuracy: {accuracy_hist_train:.4f}  Training Loss: {loss_hist_train:.4f}  Validation Accuracy: {accuracy_hist_val:.4f}  Validation Loss: {loss_hist_val:.4f}')

### 6.1 Experiment 1 - Network 1 & Network 2 // Crossover 

#### 6.1.1 Train Network 1

In [416]:
### Create the network string represetation 

total_input_size  = 28*28
total_ouput_size  = 10
dict_network      = dict_network
min_amount_layers = 1
max_amount_layers = 50
drop_layer_rate   = 0.5
norm_layer_rate   = 0.5

string_model_1 = dict_to_string_representation_network(total_input_size, 
                                                    total_ouput_size, 
                                                    dict_network, 
                                                    min_amount_layers, 
                                                    max_amount_layers,
                                                    drop_layer_rate, 
                                                    norm_layer_rate)


### transfer the network string represetation to a pytorch model 

model_1 = Net(string_model_1)
print(model_1)


### Create the optimizer string represetation 

opt_string_0 = dict_to_string_representation_optimizer(dict_optimizer)


### transfer the string represetation to a pytorch optimizer 

optimizer_1 = parse_opt_string(opt_string_0, model_1)
print(optimizer_1)


### Train the model 

loss_fn = nn.CrossEntropyLoss()

train_with_validation(model_1, 
                      train_loader, 
                      validation_loader, 
                      loss_fn, 
                      optimizer_1, 
                      num_epochs)

Net(
  (layers): ModuleList(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): BatchNorm1d(512, eps=0.1, momentum=0.5, affine=True, track_running_stats=True)
    (2): AlphaDropout(p=0.1, inplace=False)
    (3): Sigmoid()
    (4): Linear(in_features=512, out_features=512, bias=True)
    (5): LayerNorm((512,), eps=0.001, elementwise_affine=True)
    (6): Dropout(p=0.7, inplace=False)
    (7): GELU()
    (8): Linear(in_features=512, out_features=128, bias=True)
    (9): ELU(alpha=1.0)
    (10): Linear(in_features=128, out_features=128, bias=True)
    (11): LayerNorm((128,), eps=0.0001, elementwise_affine=True)
    (12): Dropout(p=0.1, inplace=False)
    (13): ELU(alpha=1.0)
    (14): Linear(in_features=128, out_features=512, bias=True)
    (15): SELU()
    (16): Linear(in_features=512, out_features=10, bias=True)
  )
  (flat): Flatten(start_dim=1, end_dim=-1)
)
Adam (
Parameter Group 0
    amsgrad: False
    betas: (0.8, 0.9)
    eps: 1e-08
    lr: 0.0001
    maximize

#### 6.1.2 Train Network 2

In [417]:
### Create the network string represetation 

total_input_size  = 28*28
total_ouput_size  = 10
dict_network      = dict_network
min_amount_layers = 1
max_amount_layers = 50
drop_layer_rate   = 0.5
norm_layer_rate   = 0.5

string_model_2 = dict_to_string_representation_network(total_input_size, 
                                                    total_ouput_size, 
                                                    dict_network, 
                                                    min_amount_layers, 
                                                    max_amount_layers,
                                                    drop_layer_rate, 
                                                    norm_layer_rate)


### transfer the network string represetation to a pytorch model 

model_2 = Net(string_model_2)
print(model_2)


### transfer the string represetation to a pytorch optimizer 

optimizer_2 = parse_opt_string(opt_string_0, model_2)
print(optimizer_2)


### Train the model 

loss_fn = nn.CrossEntropyLoss()

train_with_validation(model_2, 
                      train_loader, 
                      validation_loader, 
                      loss_fn, 
                      optimizer_2, 
                      num_epochs)

Net(
  (layers): ModuleList(
    (0): Linear(in_features=784, out_features=256, bias=True)
    (1): SELU()
    (2): Linear(in_features=256, out_features=128, bias=True)
    (3): LayerNorm((128,), eps=0.001, elementwise_affine=True)
    (4): SiLU()
    (5): Linear(in_features=128, out_features=128, bias=True)
    (6): Sigmoid()
    (7): Linear(in_features=128, out_features=512, bias=True)
    (8): GELU()
    (9): Linear(in_features=512, out_features=10, bias=True)
  )
  (flat): Flatten(start_dim=1, end_dim=-1)
)
Adam (
Parameter Group 0
    amsgrad: False
    betas: (0.8, 0.9)
    eps: 1e-08
    lr: 0.0001
    maximize: False
    weight_decay: 0
)
Epoch 0  Training Accuracy: 0.8523  Training Loss: 0.5812  Validation Accuracy: 0.9302  Validation Loss: 0.2346
Epoch 1  Training Accuracy: 0.9427  Training Loss: 0.1904  Validation Accuracy: 0.9523  Validation Loss: 0.1567
Epoch 2  Training Accuracy: 0.9589  Training Loss: 0.1358  Validation Accuracy: 0.9632  Validation Loss: 0.1241
Epoch 3  

#### 6.1.3 Perform Crossover

In [419]:
crossover_string_1, crossover_string_2 = crossover(string_model_1, string_model_2)

#### 6.1.4 Train Network 1.2

In [420]:
### transfer the network string represetation to a pytorch model 

model_1_1 = Net(crossover_string_1)
print(model_1_1)


### transfer the string represetation to a pytorch optimizer 

optimizer_1_1 = parse_opt_string(opt_string_0, model_1_1)
print(optimizer_1_1)


### Train the model 

loss_fn = nn.CrossEntropyLoss()

train_with_validation(model_1_1, 
                      train_loader, 
                      validation_loader, 
                      loss_fn, 
                      optimizer_1_1, 
                      num_epochs)


Net(
  (layers): ModuleList(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): BatchNorm1d(512, eps=0.1, momentum=0.5, affine=True, track_running_stats=True)
    (2): AlphaDropout(p=0.1, inplace=False)
    (3): Sigmoid()
    (4): Linear(in_features=512, out_features=512, bias=True)
    (5): Linear(in_features=512, out_features=128, bias=True)
    (6): Sigmoid()
    (7): Linear(in_features=128, out_features=512, bias=True)
    (8): Linear(in_features=512, out_features=128, bias=True)
    (9): ELU(alpha=1.0)
    (10): Linear(in_features=128, out_features=128, bias=True)
    (11): LayerNorm((128,), eps=0.0001, elementwise_affine=True)
    (12): Dropout(p=0.1, inplace=False)
    (13): ELU(alpha=1.0)
    (14): Linear(in_features=128, out_features=512, bias=True)
    (15): SELU()
    (16): Linear(in_features=512, out_features=10, bias=True)
  )
  (flat): Flatten(start_dim=1, end_dim=-1)
)
Adam (
Parameter Group 0
    amsgrad: False
    betas: (0.8, 0.9)
    eps: 1e-08
  

#### 6.1.5 Train Network 2.2

In [421]:
### transfer the network string represetation to a pytorch model 

model_2_2 = Net(crossover_string_1)
print(model_2_2)


### transfer the string represetation to a pytorch optimizer 

optimizer_2_2 = parse_opt_string(opt_string_0, model_2_2)
print(optimizer_2_2)


### Train the model 
loss_fn = nn.CrossEntropyLoss()

train_with_validation(model_2_2, 
                      train_loader, 
                      validation_loader, 
                      loss_fn, 
                      optimizer_2_2, 
                      num_epochs)


Net(
  (layers): ModuleList(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): BatchNorm1d(512, eps=0.1, momentum=0.5, affine=True, track_running_stats=True)
    (2): AlphaDropout(p=0.1, inplace=False)
    (3): Sigmoid()
    (4): Linear(in_features=512, out_features=512, bias=True)
    (5): Linear(in_features=512, out_features=128, bias=True)
    (6): Sigmoid()
    (7): Linear(in_features=128, out_features=512, bias=True)
    (8): Linear(in_features=512, out_features=128, bias=True)
    (9): ELU(alpha=1.0)
    (10): Linear(in_features=128, out_features=128, bias=True)
    (11): LayerNorm((128,), eps=0.0001, elementwise_affine=True)
    (12): Dropout(p=0.1, inplace=False)
    (13): ELU(alpha=1.0)
    (14): Linear(in_features=128, out_features=512, bias=True)
    (15): SELU()
    (16): Linear(in_features=512, out_features=10, bias=True)
  )
  (flat): Flatten(start_dim=1, end_dim=-1)
)
Adam (
Parameter Group 0
    amsgrad: False
    betas: (0.8, 0.9)
    eps: 1e-08
  

### 6.2. Experiment 2 - Network 3 //  Add layer mutation

#### 6.2.1 Train Network 3

In [422]:
### Create the network string represetation 


total_input_size  = 28*28
total_ouput_size  = 10
dict_network      = dict_network
min_amount_layers = 1
max_amount_layers = 50
drop_layer_rate   = 0.5
norm_layer_rate   = 0.5

string_model_3 = dict_to_string_representation_network(total_input_size, 
                                                    total_ouput_size, 
                                                    dict_network, 
                                                    min_amount_layers, 
                                                    max_amount_layers,
                                                    drop_layer_rate, 
                                                    norm_layer_rate)


### transfer the network string represetation to a pytorch model 

model_3 = Net(string_model_3)
#print(model_3)


### Create the optimizer string represetation 

opt_string_3 = dict_to_string_representation_optimizer(dict_optimizer)


### transfer the string represetation to a pytorch optimizer 

optimizer_3 = parse_opt_string(opt_string_3, model_3)
#print(optimizer_3)


### Train the model 

loss_fn = nn.CrossEntropyLoss()

train_with_validation(model_3, 
                     train_loader, 
                      validation_loader, 
                      loss_fn, 
                      optimizer_3, 
                      num_epochs)


Epoch 0  Training Accuracy: 0.1006  Training Loss: 2.3522  Validation Accuracy: 0.0958  Validation Loss: 2.3146
Epoch 1  Training Accuracy: 0.1039  Training Loss: 2.3429  Validation Accuracy: 0.0974  Validation Loss: 2.3125
Epoch 2  Training Accuracy: 0.1028  Training Loss: 2.3385  Validation Accuracy: 0.0932  Validation Loss: 2.3091
Epoch 3  Training Accuracy: 0.0988  Training Loss: 2.3385  Validation Accuracy: 0.0980  Validation Loss: 2.3091
Epoch 4  Training Accuracy: 0.1020  Training Loss: 2.3335  Validation Accuracy: 0.1007  Validation Loss: 2.3068
Epoch 5  Training Accuracy: 0.1017  Training Loss: 2.3322  Validation Accuracy: 0.0980  Validation Loss: 2.3068
Epoch 6  Training Accuracy: 0.1025  Training Loss: 2.3300  Validation Accuracy: 0.0958  Validation Loss: 2.3062
Epoch 7  Training Accuracy: 0.1004  Training Loss: 2.3272  Validation Accuracy: 0.1067  Validation Loss: 2.3026
Epoch 8  Training Accuracy: 0.1012  Training Loss: 2.3263  Validation Accuracy: 0.1037  Validation Loss:

#### 6.2.2 Perform - Add layer mutation

In [423]:
mutated_layer_string_network = add_layer_mutation(string_model_3, dict_network)

Mutation Length 19
No Action needed
11
Integrity Check
Act|CELU
AlphaDropout|0.1
Linear|128,512,True


In [424]:
### transfer the network string represetation to a pytorch model 

model_3_2 = Net(mutated_layer_string_network)
print(model_3_2)


### transfer the string represetation to a pytorch optimizer 

optimizer_3_2 = parse_opt_string(opt_string_3, model_3_2)
print(optimizer_3_2)


### Train the model 
loss_fn = nn.CrossEntropyLoss()

train_with_validation(model_3_2, 
                      train_loader, 
                      validation_loader, 
                      loss_fn, 
                      optimizer_3_2, 
                      num_epochs)

Net(
  (layers): ModuleList(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): PReLU(num_parameters=1)
    (2): Linear(in_features=512, out_features=256, bias=True)
    (3): LayerNorm((256,), eps=0.001, elementwise_affine=True)
    (4): Sigmoid()
    (5): Linear(in_features=256, out_features=128, bias=True)
    (6): Dropout(p=0.3, inplace=False)
    (7): Sigmoid()
    (8): Linear(in_features=128, out_features=128, bias=True)
    (9): LayerNorm((128,), eps=0.0001, elementwise_affine=True)
    (10): CELU(alpha=1.0)
    (11): AlphaDropout(p=0.1, inplace=False)
    (12): Linear(in_features=128, out_features=512, bias=True)
    (13): AlphaDropout(p=0.7, inplace=False)
    (14): PReLU(num_parameters=1)
    (15): Linear(in_features=512, out_features=512, bias=True)
    (16): Dropout(p=0.5, inplace=False)
    (17): Sigmoid()
    (18): Linear(in_features=512, out_features=256, bias=True)
    (19): BatchNorm1d(256, eps=0.001, momentum=0.1, affine=True, track_running_stats=Tr

### 6.3 Experiment 3 - Network 4 //  Remove layer mutation


#### 6.3.1 Train Network 4

In [425]:
### Create the network string represetation 

total_input_size  = 28*28
total_ouput_size  = 10
dict_network      = dict_network
min_amount_layers = 1
max_amount_layers = 50
drop_layer_rate   = 0.5
norm_layer_rate   = 0.5

string_model_4 = dict_to_string_representation_network(total_input_size, 
                                                    total_ouput_size, 
                                                    dict_network, 
                                                    min_amount_layers, 
                                                    max_amount_layers,
                                                    drop_layer_rate, 
                                                    norm_layer_rate)


### transfer the network string represetation to a pytorch model 

model_4 = Net(string_model_4)
print(model_4)


### Create the optimizer string represetation 

opt_string_4 = dict_to_string_representation_optimizer(dict_optimizer)


### transfer the string represetation to a pytorch optimizer 

optimizer_4 = parse_opt_string(opt_string_4, model_4)
print(optimizer_4)


### Train the model 

loss_fn = nn.CrossEntropyLoss()

train_with_validation(model_4, 
                      train_loader, 
                      validation_loader, 
                      loss_fn, 
                      optimizer_4, 
                      num_epochs)




Net(
  (layers): ModuleList(
    (0): Linear(in_features=784, out_features=128, bias=True)
    (1): BatchNorm1d(128, eps=0.01, momentum=0.1, affine=True, track_running_stats=True)
    (2): AlphaDropout(p=0.5, inplace=False)
    (3): CELU(alpha=1.0)
    (4): Linear(in_features=128, out_features=512, bias=True)
    (5): ELU(alpha=1.0)
    (6): Linear(in_features=512, out_features=128, bias=True)
    (7): BatchNorm1d(128, eps=0.1, momentum=0.5, affine=True, track_running_stats=True)
    (8): PReLU(num_parameters=1)
    (9): Linear(in_features=128, out_features=512, bias=True)
    (10): AlphaDropout(p=0.5, inplace=False)
    (11): ReLU()
    (12): Linear(in_features=512, out_features=512, bias=True)
    (13): BatchNorm1d(512, eps=0.1, momentum=0.1, affine=True, track_running_stats=True)
    (14): SiLU()
    (15): Linear(in_features=512, out_features=256, bias=True)
    (16): AlphaDropout(p=0.1, inplace=False)
    (17): CELU(alpha=1.0)
    (18): Linear(in_features=256, out_features=128, bia

#### 6.3.1. Perform - Remove layer mutation

In [426]:
remove_layer_mutation_string = mutation_remove_layer(string_model_4)

removed Layer: 2 AlphaDropout|0.5


#### 6.3.1 Train Network 4.2

In [427]:
### transfer the network string represetation to a pytorch model 

model_4_2 = Net(remove_layer_mutation_string)
print(model_4_2)


### transfer the string represetation to a pytorch optimizer 

optimizer_4_2 = parse_opt_string(opt_string_4, model_4_2)
print(optimizer_4_2)


### Train the model 

loss_fn = nn.CrossEntropyLoss()

train_with_validation(model_4_2, 
                      train_loader, 
                      validation_loader, 
                      loss_fn, 
                      optimizer_4_2, 
                      num_epochs)

Net(
  (layers): ModuleList(
    (0): Linear(in_features=784, out_features=128, bias=True)
    (1): BatchNorm1d(128, eps=0.01, momentum=0.1, affine=True, track_running_stats=True)
    (2): CELU(alpha=1.0)
    (3): Linear(in_features=128, out_features=512, bias=True)
    (4): ELU(alpha=1.0)
    (5): Linear(in_features=512, out_features=128, bias=True)
    (6): BatchNorm1d(128, eps=0.1, momentum=0.5, affine=True, track_running_stats=True)
    (7): PReLU(num_parameters=1)
    (8): Linear(in_features=128, out_features=512, bias=True)
    (9): AlphaDropout(p=0.5, inplace=False)
    (10): ReLU()
    (11): Linear(in_features=512, out_features=512, bias=True)
    (12): BatchNorm1d(512, eps=0.1, momentum=0.1, affine=True, track_running_stats=True)
    (13): SiLU()
    (14): Linear(in_features=512, out_features=256, bias=True)
    (15): AlphaDropout(p=0.1, inplace=False)
    (16): CELU(alpha=1.0)
    (17): Linear(in_features=256, out_features=128, bias=True)
    (18): BatchNorm1d(128, eps=0.01, 

### 6.4.1 Experiment 4 - Network 5 // Change optimizer mutation

#### 6.4.2 Train Network 5

In [428]:
### Create the network string represetation 

total_input_size  = 28*28
total_ouput_size  = 10
dict_network      = dict_network
min_amount_layers = 1
max_amount_layers = 50
drop_layer_rate   = 0.5
norm_layer_rate   = 0.5

string_model_5 = dict_to_string_representation_network(total_input_size, 
                                                    total_ouput_size, 
                                                    dict_network, 
                                                    min_amount_layers, 
                                                    max_amount_layers,
                                                    drop_layer_rate, 
                                                    norm_layer_rate)


### transfer the network string represetation to a pytorch model 

model_5 = Net(string_model_5)
print(model_5)


### Create the optimizer string represetation 

opt_string_5 = dict_to_string_representation_optimizer(dict_optimizer)


### transfer the string represetation to a pytorch optimizer 

optimizer_5 = parse_opt_string(opt_string_5, model_5)
print(optimizer_5)


### Train the model 

loss_fn = nn.CrossEntropyLoss()

train_with_validation(model_5, 
                      train_loader, 
                      validation_loader, 
                      loss_fn, 
                      optimizer_5, 
                      num_epochs)

Net(
  (layers): ModuleList(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
    (2): AlphaDropout(p=0.7, inplace=False)
    (3): PReLU(num_parameters=1)
    (4): Linear(in_features=512, out_features=512, bias=True)
    (5): ReLU()
    (6): Linear(in_features=512, out_features=512, bias=True)
    (7): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
    (8): Dropout(p=0.3, inplace=False)
    (9): Sigmoid()
    (10): Linear(in_features=512, out_features=128, bias=True)
    (11): BatchNorm1d(128, eps=0.001, momentum=0.9, affine=True, track_running_stats=True)
    (12): SELU()
    (13): Linear(in_features=128, out_features=512, bias=True)
    (14): CELU(alpha=1.0)
    (15): Linear(in_features=512, out_features=512, bias=True)
    (16): AlphaDropout(p=0.1, inplace=False)
    (17): SELU()
    (18): Linear(in_features=512, out_features=512, bias=True)
    (19): BatchNorm1d(512, eps=0.001, momentum=0.9, affine=T

#### 6.4.2 Perform - Change optimizer mutation

In [429]:
mutated_optimizer_string = mutation_change_opt(opt_string_5)

original optimizer string: SGD|0.01;0.9;True
parameters in string ['SGD', '0.01', '0.9', 'True']
parameter to change: True
out of following parameters: [False, True]
new parameter: False


#### 6.4.3 Train Network 5.2

In [430]:
### transfer the string represetation to a pytorch optimizer 

optimizer_5_2 = parse_opt_string(mutated_optimizer_string, model_5)
print(optimizer_5_2)


### Train the model 

loss_fn = nn.CrossEntropyLoss()

train_with_validation(model_5, 
                      train_loader, 
                      validation_loader, 
                      loss_fn, 
                      optimizer_5_2, 
                      num_epochs)

SGD (
Parameter Group 0
    dampening: True
    lr: 0.01
    maximize: False
    momentum: 0.9
    nesterov: False
    weight_decay: 0
)
Epoch 0  Training Accuracy: 0.1000  Training Loss: 2.4446  Validation Accuracy: 0.0888  Validation Loss: 2.5027
Epoch 1  Training Accuracy: 0.1016  Training Loss: 2.4358  Validation Accuracy: 0.1110  Validation Loss: 2.5667
Epoch 2  Training Accuracy: 0.0992  Training Loss: 2.4435  Validation Accuracy: 0.1053  Validation Loss: 2.5045
Epoch 3  Training Accuracy: 0.1015  Training Loss: 2.4401  Validation Accuracy: 0.0974  Validation Loss: 2.4802
Epoch 4  Training Accuracy: 0.1011  Training Loss: 2.4377  Validation Accuracy: 0.1009  Validation Loss: 2.4962
Epoch 5  Training Accuracy: 0.1009  Training Loss: 2.4382  Validation Accuracy: 0.0906  Validation Loss: 2.5071
Epoch 6  Training Accuracy: 0.1001  Training Loss: 2.4368  Validation Accuracy: 0.1021  Validation Loss: 2.5174
Epoch 7  Training Accuracy: 0.1009  Training Loss: 2.4404  Validation Accuracy: