# Adaptive Tensor Genetic Evolutionary Network

## Introduction

The **Adaptive Tensor Genetic Evolutionary Network (ATGEN)** is an innovative approach to neural network design that combines traditional neural networks with genetic algorithms. The core idea is to evolve the structure and parameters of the network dynamically during training, with the aim of producing a highly adaptable model that can learn complex patterns more efficiently.

### Key Features

1. **Dynamic Network Evolution**: The network's structure is not fixed; it can evolve over time. Layers and neurons can be added or removed based on a set of evolutionary rules, which allows the network to adapt its complexity to the task at hand.

2. **Genetic Algorithm-Based Training**: The network employs a genetic algorithm to evolve its weights and structure. This involves selecting parent networks based on their fitness, performing crossover to combine features of two parents, and applying mutations to introduce new variations.

3. **Adaptive Activation Functions**: The network supports the dynamic adjustment of activation functions. This adaptability allows the model to switch from linear to non-linear transformations seamlessly, providing greater flexibility in learning different types of data distributions.

4. **Neuron and Weight Management**: ATGEN includes mechanisms to add and remove neurons and weights in each layer. This helps in optimizing the network size and performance during the evolution process.
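The genetic-algorithm loop described in feature 2 (fitness-based selection, crossover, mutation) can be sketched on plain lists of numbers. This is a minimal illustration under our own assumptions, not ATGEN's implementation: tournament selection, uniform crossover, Gaussian mutation and elitism are common textbook choices, and the toy `fitness` function stands in for evaluating a network on a task. All names in the block are hypothetical.

```python
import random


def fitness(genome):
    # Toy objective (hypothetical): higher is better, maximal when every gene equals 1.0
    return -sum((g - 1.0) ** 2 for g in genome)


def select_parent(population, k=3):
    # Tournament selection: the fittest of k randomly drawn individuals
    return max(random.sample(population, k), key=fitness)


def crossover(parent1, parent2):
    # Uniform crossover: each gene is copied from either parent with equal probability
    return [a if random.random() < 0.5 else b for a, b in zip(parent1, parent2)]


def mutate(genome, rate=0.2, scale=0.1):
    # Gaussian mutation, applied to each gene independently with probability `rate`
    return [g + random.gauss(0.0, scale) if random.random() < rate else g for g in genome]


def run_ga(genome_size=5, population_size=20, generations=50, seed=0):
    random.seed(seed)
    population = [[random.uniform(-1.0, 1.0) for _ in range(genome_size)]
                  for _ in range(population_size)]
    for _ in range(generations):
        elite = max(population, key=fitness)  # elitism: the best individual survives unchanged
        children = [mutate(crossover(select_parent(population), select_parent(population)))
                    for _ in range(population_size - 1)]
        population = [elite] + children
    return max(population, key=fitness)


best = run_ga()
print(round(fitness(best), 4))  # closer to 0.0 is better
```

Keeping the best individual unchanged (elitism) guarantees that the best fitness never decreases from one generation to the next. ATGEN additionally evolves the network structure, which this sketch does not attempt.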

## Implementation Overview

### Components

- **Linear Layer (`Linear` class)**: A custom linear layer with dynamic neuron management, including methods to add, remove, and initialize neurons and weights.
- **Activation Switch (`ActiSwitch` class)**: A module that allows smooth transitions between different activation functions, controlled by a learnable parameter.
- **Evolutionary Network (`ATNetwork` class)**: A neural network that evolves its structure and weights based on predefined rules and random mutations.
- **Genetic Algorithm (`ATGEN` class)**: A framework that orchestrates the evolutionary process, handling fitness evaluation, selection, crossover, mutation, and pruning of neurons.

### Getting Started

Below, we walk through the Adaptive Tensor Genetic Evolutionary Network step by step and demonstrate its capabilities through a series of experiments and tests. The notebook is divided into sections covering the core components, training routines, and network evolution strategies.

---

Let's dive into the implementation details and see how we can build a network that learns to evolve!

In [1]:
```python
# Import necessary libraries
import torch
from torch import nn
import torch.nn.functional as F
from torch.optim import AdamW

# Import utility packages
import math
import copy
import random
from typing import List, Tuple

# Import the ATGEN modules from the local `atgen` package
from atgen.layers import Linear, Flatten, Conv2D, MaxPool2D, ActiSwitch, Pass
from atgen.network import ATNetwork
from atgen.memory import ReplayBuffer
from atgen.utils import activation_functions
from atgen.ga import ATGEN

# Optional visualization or utility packages
# from torchsummary import summary
# import gymnasium as gym
```


## The `layers` Module

### 1. `ActiSwitch` with Tanh Activation

The `ActiSwitch` class is tested with the `nn.Tanh` activation function:

- **Input**: A tensor of shape `(5, 4, 3, 2)` with random values.
- **Output**: The resulting tensor after applying `ActiSwitch`.
- **Assertions**:
  - The output tensor should have the same shape as the input tensor.
  - Check whether the output equals the input element-wise. Here it does not (`False`), which suggests that a freshly constructed `ActiSwitch` applies the Tanh non-linearity rather than passing the input through unchanged.

In [2]:
```python
# 1. Testing ActiSwitch with Tanh activation
acti_switch_tanh = ActiSwitch(nn.Tanh)
x = torch.randn(5, 4, 3, 2)
output: torch.Tensor = acti_switch_tanh(x)

print(f"Input: {x.shape}")
print(f"Output: {output.shape}")
print(f"No change in data: {torch.equal(x, output)}")
```

```
Input: torch.Size([5, 4, 3, 2])
Output: torch.Size([5, 4, 3, 2])
No change in data: False
```


### 2. `Linear` Layer Initialization

The `Linear` layer is tested for its weight and bias initialization:

- **Initial State**:
  - Display the initial weights and biases of the `Linear` layer.

In [3]:
```python
# 2. Testing Linear Layer Initialization
linear = Linear(in_features=3, out_features=2)

print("Initial weights:")
print(linear.weight)
print("Initial bias:")
print(linear.bias)
```

```
Initial weights:
Parameter containing:
tensor([[-0.1652, -0.5730, -0.1794],
        [ 0.0661, -0.2503, -0.4106]], requires_grad=True)
Initial bias:
Parameter containing:
tensor([ 0.4073, -0.3337], requires_grad=True)
```


### 3. Adding a Neuron to the `Linear` Layer

Test the addition of a new output neuron:

- **Action**: Add a new output neuron to the `Linear` layer.
- **Assertions**:
  - Display the updated weights and biases to verify the addition.

In [4]:
```python
# 3. Adding a Neuron
linear.add_neuron()
print("After adding a new output neuron:")
print(linear.weight)
print(linear.bias)
```

```
After adding a new output neuron:
Parameter containing:
tensor([[-0.1652, -0.5730, -0.1794],
        [ 0.0661, -0.2503, -0.4106],
        [-0.2259,  0.4247,  0.0819]], requires_grad=True)
Parameter containing:
tensor([ 0.4073, -0.3337, -0.4422], requires_grad=True)
```


### 4. Increasing the Input Dimension of the `Linear` Layer

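Test increasing the input dimension:

- **Action**: Add one input weight to every output neuron of the `Linear` layer.
- **Assertions**:
  - Display the updated weights and biases to verify the change in input dimension. The new input column should contain only zeros, so the layer's outputs for the original inputs are unchanged.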

In [5]:
```python
# 4. Increasing Input Dimension
linear.add_weight()
print("After increasing the input dimension:")
print(linear.weight)
print(linear.bias)
```

```
After increasing the input dimension:
Parameter containing:
tensor([[-0.1652, -0.5730, -0.1794,  0.0000],
        [ 0.0661, -0.2503, -0.4106,  0.0000],
        [-0.2259,  0.4247,  0.0819,  0.0000]], requires_grad=True)
Parameter containing:
tensor([ 0.4073, -0.3337, -0.4422], requires_grad=True)
```

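The zero column that `add_weight` appends is what allows a network to grow without changing its function: if layer *k* gains an output neuron, layer *k + 1* needs a matching new input, and a zero weight there means the new neuron contributes nothing until training changes it. The sketch below illustrates this in plain Python; the helpers are hypothetical and only mimic `add_neuron` and `add_weight`. That `ATNetwork.evolve_layer` pairs the two calls this way is an assumption, although it is consistent with the 0% output change measured after 500 `evolve_layer` calls later in this notebook.

```python
import random


def linear(x, W, b):
    # y_i = sum_j W[i][j] * x[j] + b[i]; W is stored as (out_features x in_features)
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def relu(v):
    return [max(0.0, t) for t in v]


def forward(x, W1, b1, W2, b2):
    # Two-layer network: Linear -> ReLU -> Linear
    return linear(relu(linear(x, W1, b1)), W2, b2)


random.seed(0)
W1 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]  # layer 1: 3 -> 2
b1 = [random.uniform(-1, 1) for _ in range(2)]
W2 = [[random.uniform(-1, 1) for _ in range(2)]]                    # layer 2: 2 -> 1
b2 = [0.1]
x = [0.5, -1.0, 2.0]
before = forward(x, W1, b1, W2, b2)

# Like add_neuron(): give layer 1 a new output neuron with random weights
W1.append([random.uniform(-1, 1) for _ in range(3)])
b1.append(random.uniform(-1, 1))
# Like add_weight(): give layer 2 a matching new input, initialized to zero
for row in W2:
    row.append(0.0)

after = forward(x, W1, b1, W2, b2)
print(before == after)  # True: the new neuron's output is multiplied by 0.0
```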

### 5. Removing a Neuron from the `Linear` Layer

Test the removal of an output neuron:

- **Action**: Remove an output neuron from the `Linear` layer.
- **Assertions**:
  - Display the updated weights and biases to verify the removal.

In [6]:
```python
# 5. Removing a Neuron
linear.remove_neuron(0)  # remove the output neuron at index 0
print("After removing an output neuron:")
print(linear.weight)
print(linear.bias)
```

```
After removing an output neuron:
Parameter containing:
tensor([[ 0.0661, -0.2503, -0.4106,  0.0000],
        [-0.2259,  0.4247,  0.0819,  0.0000]], requires_grad=True)
Parameter containing:
tensor([-0.3337, -0.4422], requires_grad=True)
```


### 6. Removing an Input Weight from the `Linear` Layer

Test the removal of an input weight:

- **Action**: Remove the input weight at index 0 from every output neuron, shrinking the layer's input dimension by one.
- **Assertions**:
  - Display the updated weights and biases to verify the removal.

In [7]:
```python
# 6. Removing an Input Weight
linear.remove_weight(0)  # remove input column 0 from every neuron
print("After removing an input weight:")
print(linear.weight)
print(linear.bias)
```

```
After removing an input weight:
Parameter containing:
tensor([[-0.2503, -0.4106,  0.0000],
        [ 0.4247,  0.0819,  0.0000]], requires_grad=True)
Parameter containing:
tensor([-0.3337, -0.4422], requires_grad=True)
```


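Removal works the same way in reverse: deleting output neuron *i* from one layer must be paired with deleting input column *i* from the next layer so that the shapes still match. The network's output is preserved only when the removed neuron contributes nothing downstream; in the sketch below that holds because its outgoing weight is exactly zero. The plain-Python helpers are hypothetical stand-ins for `remove_neuron` and `remove_weight`, not the `atgen` code.

```python
def linear(x, W, b):
    # y_i = sum_j W[i][j] * x[j] + b[i]; W is stored as (out_features x in_features)
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def drop_output_neuron(W, b, i):
    # Hypothetical helper mirroring remove_neuron(i): delete row i of W and entry i of b
    return [row for k, row in enumerate(W) if k != i], [v for k, v in enumerate(b) if k != i]


def drop_input_weight(W, i):
    # Hypothetical helper mirroring remove_weight(i): delete column i from every row
    return [[w for k, w in enumerate(row) if k != i] for row in W]


# Layer 1: 2 -> 3, layer 2: 3 -> 1 (no activation, to keep the arithmetic transparent)
W1, b1 = [[1.0, 0.5], [-0.5, 2.0], [0.25, -1.0]], [0.0, 0.1, -0.2]
W2, b2 = [[0.3, 0.0, -0.7]], [0.05]  # neuron 1's outgoing weight is exactly zero
x = [1.0, -2.0]

before = linear(linear(x, W1, b1), W2, b2)

# Removing neuron 1 from layer 1 must be paired with removing input column 1 from layer 2
W1, b1 = drop_output_neuron(W1, b1, 1)
W2 = drop_input_weight(W2, 1)
after = linear(linear(x, W1, b1), W2, b2)
print(before == after)  # True, because the removed neuron contributed 0.0 downstream
```

If the outgoing weight were not zero, the output would change by exactly that neuron's contribution, which is why removal is usually reserved for neurons whose contribution is negligible.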
### 7. Identity Initialization of the `Linear` Layer

Test the identity initialization of the `Linear` layer:

- **Action**: Initialize a `Linear` layer with identity weights.
- **Assertions**:
  - Display the weights and biases to verify the identity initialization.

In [8]:
```python
# 7. Testing Identity Initialization
identity_layer = Linear.init_identity_layer(size=3)
print("Weights after identity initialization:")
print(identity_layer.weight)
print("Bias after identity initialization:")
print(identity_layer.bias)
```

```
Weights after identity initialization:
Parameter containing:
tensor([[1., 0., 0.],
        [0., 1., 0.],
        [0., 0., 1.]], requires_grad=True)
Bias after identity initialization:
Parameter containing:
tensor([0., 0., 0.], requires_grad=True)
```


## Understanding the Identity Matrix in Neural Networks

### Identity Matrix

In the context of neural networks, the identity matrix is often used to initialize the weights of newly added layers. The identity matrix is a square matrix with ones on the main diagonal and zeros elsewhere; for $n = 5$:

$$
I_5 = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 
\end{bmatrix}
$$

In general, $I_n$ denotes the $n \times n$ identity matrix. It is called the identity matrix because it acts as the multiplicative identity in matrix multiplication: multiplying a matrix by an identity matrix of compatible size leaves that matrix unchanged.

### Matrix Multiplication with the Identity Matrix

When you multiply an $m \times n$ matrix $A$ by an identity matrix of compatible size, the result is $A$ itself:

$$
A I_n = I_m A = A
$$

For example, consider a matrix $A$:

$$
A = \begin{bmatrix}
a & b \\
c & d
\end{bmatrix}
$$

Multiplying $A$ by the $2 \times 2$ identity matrix $I_2$:

$$
A \times I_2 = \begin{bmatrix}
a & b \\
c & d
\end{bmatrix}
\times
\begin{bmatrix}
1 & 0 \\
0 & 1
\end{bmatrix}
=
\begin{bmatrix}
a & b \\
c & d
\end{bmatrix}
$$

The result is $A$, demonstrating that the identity matrix does not change the matrix it multiplies.

### Usage in Neural Networks

In neural networks, a newly added layer whose weights are initialized to the identity matrix and whose bias is zero computes $y = I x + 0 = x$, so it passes its input through unchanged when it is first added. This is useful when experimenting with network architectures or growing a network, because the output stays the same until training adjusts the new weights.

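This can be checked numerically. The sketch below uses plain Python lists instead of the `atgen` `Linear` class; `identity_layer` is a hypothetical helper that builds the same identity weights and zero bias that `Linear.init_identity_layer(size=3)` produced above. Inserting such a layer between two existing layers, with no activation applied after it, leaves the network's output exactly unchanged.

```python
def linear(x, W, b):
    # y_i = sum_j W[i][j] * x[j] + b[i]; W is stored as (out_features x in_features)
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def identity_layer(n):
    # Hypothetical helper: n x n identity weights and a zero bias
    W = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    b = [0.0] * n
    return W, b


# A small 2 -> 3 -> 1 network with fixed, arbitrary weights (no activations)
W1, b1 = [[0.5, -1.0], [2.0, 0.25], [-0.75, 1.5]], [0.1, -0.2, 0.3]
W2, b2 = [[1.0, -0.5, 2.0]], [0.05]
x = [0.4, -1.2]

h = linear(x, W1, b1)
original = linear(h, W2, b2)

# Insert a 3 -> 3 identity layer between the two existing layers
Wid, bid = identity_layer(3)
expanded = linear(linear(h, Wid, bid), W2, b2)
print(original == expanded)  # True: I h + 0 = h, so the last layer sees the same input
```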
## Handling Activation Functions in Deep Networks

As neural networks become deeper, inserting new layers can change the network's output because of the activation functions applied after them. Even if a new layer is initialized with the identity matrix, so that its linear part leaves the input unchanged, the activation that follows is usually non-linear: ReLU, for example, maps a negative input such as $-2$ to $0$, and Tanh maps $2$ to about $0.964$.

### The Problem with Activation Functions

When new layers are added with identity-matrix initialization, their pre-activation output equals their input. However, if the activation function applied to that output is non-linear, it alters the signal and can therefore change the network's overall behavior. This introduces a challenge: keeping the output consistent while still benefiting from the added layers.

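A short plain-Python illustration of the problem (the helper names are hypothetical): the identity layer leaves its input untouched, but the activation applied afterwards does not.

```python
import math


def identity_linear(x):
    # Identity weights and zero bias: I x + 0 = x
    n = len(x)
    return [sum((1.0 if i == j else 0.0) * x[j] for j in range(n)) for i in range(n)]


def relu(v):
    return [max(0.0, t) for t in v]


def tanh(v):
    return [math.tanh(t) for t in v]


h = [1.5, -0.8, 0.0, -2.0]        # activations entering a newly inserted layer
pre_activation = identity_linear(h)
print(pre_activation == h)        # True: the identity weights leave h unchanged
print(relu(pre_activation))       # [1.5, 0.0, 0.0, 0.0]: negative entries are clipped
print(tanh(pre_activation))       # every nonzero entry is squashed into (-1, 1)
```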
### Solution: Smooth Transition with Blending Activation Functions

To address this issue, ATGEN blends a linear pass-through with a non-linear activation function, weighted by a learnable parameter; this is the role of the `ActiSwitch` module. When the weight favors the linear path, a new layer can be added without an abrupt change in output, and as training adjusts the parameter, the layer can gradually become non-linear and benefit from the added capacity.

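A minimal sketch of the blending idea follows, assuming the blend is a convex combination $\alpha f(x) + (1 - \alpha) x$ with a scalar weight $\alpha$. This formulation is our own illustration: `BlendedActivation` is a hypothetical class, and `ActiSwitch` may parameterize the blend differently. The gradient $\partial\, \text{out}_i / \partial \alpha = f(x_i) - x_i$ is computed by hand to show why $\alpha$ can be learned.

```python
class BlendedActivation:
    """Hypothetical sketch (not ATGEN's ActiSwitch):
    out = alpha * f(x) + (1 - alpha) * x, with a trainable scalar alpha."""

    def __init__(self, f, alpha=0.0):
        self.f = f
        self.alpha = alpha  # 0.0 -> pure pass-through, 1.0 -> pure f

    def __call__(self, x):
        return [self.alpha * self.f(t) + (1.0 - self.alpha) * t for t in x]

    def grad_alpha(self, x, upstream):
        # d out_i / d alpha = f(x_i) - x_i, combined with upstream gradients dL/d out_i
        return sum(g * (self.f(t) - t) for t, g in zip(x, upstream))


def relu(t):
    return max(0.0, t)


x = [1.5, -0.8, 0.0, -2.0]

# alpha = 0: pure pass-through, so inserting the layer changes nothing
print(BlendedActivation(relu, alpha=0.0)(x))  # [1.5, -0.8, 0.0, -2.0]

# alpha = 1: plain ReLU
print(BlendedActivation(relu, alpha=1.0)(x))  # [1.5, 0.0, 0.0, 0.0]

# One gradient-descent step on alpha for L = 0.5 * sum((out - target)^2),
# where the (hypothetical) target is ReLU-shaped
switch = BlendedActivation(relu, alpha=0.0)
target = [relu(t) for t in x]
upstream = [o - y for o, y in zip(switch(x), target)]  # dL/d out
switch.alpha -= 0.1 * switch.grad_alpha(x, upstream)
print(switch.alpha)  # moves from 0.0 toward 1.0
```

Starting at $\alpha = 0$ makes a freshly inserted layer a pure pass-through, and the gradient step pushes $\alpha$ toward the non-linear end because the target is ReLU-shaped. The percentage printed next to `ActiSwitch(ReLU, ...)` in the model summaries below may report a comparable blend weight, but that is an inference from the output, not something the notebook states.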
### Key Points

- **Identity Matrix**: Ensures that additional layers do not affect the output directly but does not account for changes introduced by activation functions.
- **Linear and Semi-Linear Activation Functions**: Starting new layers with a linear (or only mildly non-linear) activation keeps the network's output consistent while layers are added, avoiding abrupt performance changes.
- **Blending Mechanism**: Allows a smooth transition between linear and non-linear functions, controlled by a learnable parameter, to ensure the network adapts effectively.

This approach helps in maintaining the performance and output consistency of deep networks while accommodating new layers and activation functions.

In [9]:
```python
# Helper: relative change (in percent) between two output tensors
def loss(x1: torch.Tensor, x2: torch.Tensor):
    # Total absolute difference relative to the total magnitude of x1; unlike a
    # mean of element-wise ratios, this stays stable when entries of x1 are near zero
    val = (x2 - x1).abs().sum() / x1.abs().sum() * 100
    print(f"loss = {val:.10f}%")

# Define the Custom Network
class CustomNetwork(ATNetwork):
    def __init__(self):
        # Bypass ATNetwork.__init__ and initialize nn.Module directly;
        # the layers and activations are set up manually below
        super(ATNetwork, self).__init__()
        self.layers = nn.ModuleList([
            Linear(5, 3),
            Linear(3, 1)
        ])
        self.activation = nn.ModuleList([
            ActiSwitch(),
            Pass()
        ])
        self.backprob_phase = True
        self.default_activation = nn.ReLU()

# Initialize the model and print the summary
model = CustomNetwork()
# summary(model, input_size=(5,))
model.summary()

# Alternatively, build a 5 -> 3 -> 1 network directly from its layer sizes;
# this instance is the one used in the following cells
model = ATNetwork(5, 3, 1)

# Test the network with random input
x = torch.randn(4, 5)
y1: torch.Tensor = model(x)
```

```
Model Summary:
----------------------------------------------------------------------------------------------------
Layer      Type           Output Shape                  Parameters     Activation
----------------------------------------------------------------------------------------------------
Layer 1    Linear         (batch_size, 3)               18             ActiSwitch(ReLU, 100.00%)
Layer 2    Linear         (batch_size, 1)               4              Pass
----------------------------------------------------------------------------------------------------
Total Parameters:        22
```


In [10]:
```python
# Evolve the network by inserting 10 new layers
for _ in range(10):
    model.evolve_network()
# summary(model, input_size=(5,))
model.summary()
```

```
Model Summary:
----------------------------------------------------------------------------------------------------
Layer      Type           Output Shape                  Parameters     Activation
----------------------------------------------------------------------------------------------------
Layer 1    Linear         (batch_size, 5)               30             ActiSwitch(ReLU, 0.00  %)
Layer 2    Linear         (batch_size, 5)               30             ActiSwitch(ReLU, 0.00  %)
Layer 3    Linear         (batch_size, 5)               30             ActiSwitch(ReLU, 0.00  %)
Layer 4    Linear         (batch_size, 3)               18             ActiSwitch(ReLU, 0.00  %)
Layer 5    Linear         (batch_size, 3)               12             ActiSwitch(ReLU, 0.00  %)
Layer 6    Linear         (batch_size, 3)               12             ActiSwitch(ReLU, 0.00  %)
Layer 7    Linear         (batch_size, 3)               12             ActiSwitch(ReLU, 0.0
```

In [11]:
```python
# Evolve layers by adding 500 new neurons to randomly chosen layers
for _ in range(500):
    model.evolve_layer()
# summary(model, input_size=(5,))
model.summary()
```

```
Model Summary:
----------------------------------------------------------------------------------------------------
Layer      Type           Output Shape                  Parameters     Activation
----------------------------------------------------------------------------------------------------
Layer 1    Linear         (batch_size, 42)              252            ActiSwitch(ReLU, 0.00  %)
Layer 2    Linear         (batch_size, 61)              2623           ActiSwitch(ReLU, 0.00  %)
Layer 3    Linear         (batch_size, 62)              3844           ActiSwitch(ReLU, 0.00  %)
Layer 4    Linear         (batch_size, 42)              2646           ActiSwitch(ReLU, 0.00  %)
Layer 5    Linear         (batch_size, 38)              1634           ActiSwitch(ReLU, 0.00  %)
Layer 6    Linear         (batch_size, 51)              1989           ActiSwitch(ReLU, 0.00  %)
Layer 7    Linear         (batch_size, 40)              2080           ActiSwitch(ReLU, 0.0
```

In [12]:
```python
# Compare the outputs before (y1) and after (y2) evolution side by side,
# then report the relative change
y2: torch.Tensor = model(x)
print(torch.cat((y1, y2), dim=1))
loss(y1, y2)
```

```
tensor([[ 0.1514,  0.1514],
        [ 0.0250,  0.0250],
        [ 0.0641,  0.0641],
        [-0.0042, -0.0042]], grad_fn=<CatBackward0>)
loss = 0.0000000000%
```


In [13]:
```python
# Shrink every Linear weight and bias by a factor of 0.03, then prune with a threshold of 0.01.
# The string comparison also matches Linear layers whose class was imported as
# `layers.Linear` (i.e., from a different module path than `atgen.layers`).
with torch.no_grad():
    for layer in model.layers:
        if isinstance(layer, Linear) or str(type(layer)) == "<class 'layers.Linear'>":
            layer.weight.data *= 0.03
            if layer.bias is not None:
                layer.bias.data *= 0.03
    # for layer in model.activation:
    #     if isinstance(layer, ActiSwitch) or str(type(layer)) == "<class 'layers.ActiSwitch'>":
    #         layer.weight.data.copy_(torch.tensor(1.0))

model.prune(0.01)
# summary(model, input_size=(5,))
model.summary()
```

```
Model Summary:
----------------------------------------------------------------------------------------------------
Layer      Type           Output Shape                  Parameters     Activation
----------------------------------------------------------------------------------------------------
Layer 1    Linear         (batch_size, 42)              252            ActiSwitch(ReLU, 0.00  %)
Layer 2    Linear         (batch_size, 61)              2623           ActiSwitch(ReLU, 0.00  %)
Layer 3    Linear         (batch_size, 62)              3844           ActiSwitch(ReLU, 0.00  %)
Layer 4    Linear         (batch_size, 42)              2646           ActiSwitch(ReLU, 0.00  %)
Layer 5    Linear         (batch_size, 38)              1634           ActiSwitch(ReLU, 0.00  %)
Layer 6    Linear         (batch_size, 51)              1989           ActiSwitch(ReLU, 0.00  %)
Layer 7    Linear         (batch_size, 40)              2080           ActiSwitch(ReLU, 0.0
```

### How Crossover Works

In [14]:
```python
# Create parent networks with different architectures
parent1 = ATNetwork(10, 2, 5, 1)
parent2 = ATNetwork(10, 2, 4, 3, 1)

# Initialize the genetic algorithm (ATGEN)
ga = ATGEN(population_size=10, layers=[8, 1, 4])

# Perform crossover
child = ga.crossover(parent1, parent2)

# Note: this prints `model` from the earlier cells (hence the output below);
# call child.summary() to inspect the offspring's architecture instead
# summary(child, (10,))
model.summary()

# ga.run_generation(dummy_fitness_function, 10)
```

```
Model Summary:
----------------------------------------------------------------------------------------------------
Layer      Type           Output Shape                  Parameters     Activation
----------------------------------------------------------------------------------------------------
Layer 1    Linear         (batch_size, 42)              252            ActiSwitch(ReLU, 0.00  %)
Layer 2    Linear         (batch_size, 61)              2623           ActiSwitch(ReLU, 0.00  %)
Layer 3    Linear         (batch_size, 62)              3844           ActiSwitch(ReLU, 0.00  %)
Layer 4    Linear         (batch_size, 42)              2646           ActiSwitch(ReLU, 0.00  %)
Layer 5    Linear         (batch_size, 38)              1634           ActiSwitch(ReLU, 0.00  %)
Layer 6    Linear         (batch_size, 51)              1989           ActiSwitch(ReLU, 0.00  %)
Layer 7    Linear         (batch_size, 40)              2080           ActiSwitch(ReLU, 0.0
```

In [15]:
```python
# A small 1 -> 4 -> 1 network and its optimizer
model = ATNetwork(1, 4, 1)
optim = AdamW(model.parameters(), lr=0.01)
# summary(model, (1,))
model.summary()

# Random regression data: 10 samples with one input and one target each
x = torch.randn(10, 1)
y_real = torch.randn(10, 1)
epochs = 1_000
```

```
Model Summary:
----------------------------------------------------------------------------------------------------
Layer      Type           Output Shape                  Parameters     Activation
----------------------------------------------------------------------------------------------------
Layer 1    Linear         (batch_size, 4)               8              ActiSwitch(ReLU, 0.00  %)
Layer 2    Linear         (batch_size, 1)               5              Pass
----------------------------------------------------------------------------------------------------
Total Parameters:        13
```


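In [16]:
```python
# Train with backpropagation (`train_loss` avoids shadowing the `loss` helper defined earlier)
for i in range(epochs):
    y_pred = model(x)
    train_loss = F.mse_loss(y_pred, y_real)
    optim.zero_grad()
    train_loss.backward()
    optim.step()
print("Loss:", train_loss.item())
```

```
Loss: 0.5993110537528992
```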


In [17]:
```python
# Continue training for another `epochs` iterations
for i in range(epochs):
    y_pred = model(x)
    train_loss = F.mse_loss(y_pred, y_real)
    optim.zero_grad()
    train_loss.backward()
    optim.step()
print("Loss:", train_loss.item())
```

```
Loss: 0.5991665124893188
```


In [18]:
```python
# Grow the network: insert 5 new layers, then add 100 neurons to randomly chosen layers
for _ in range(5):
    model.evolve_network()
for _ in range(100):
    model.evolve_layer()
# Re-create the optimizer so that it covers the parameters created during evolution
optim = AdamW(model.parameters(), lr=0.01)
# summary(model, (1,))
model.summary()
```

```
Model Summary:
----------------------------------------------------------------------------------------------------
Layer      Type           Output Shape                  Parameters     Activation
----------------------------------------------------------------------------------------------------
Layer 1    Linear         (batch_size, 19)              38             ActiSwitch(ReLU, 0.00  %)
Layer 2    Linear         (batch_size, 18)              360            ActiSwitch(ReLU, 0.00  %)
Layer 3    Linear         (batch_size, 14)              266            ActiSwitch(ReLU, 0.00  %)
Layer 4    Linear         (batch_size, 18)              270            ActiSwitch(ReLU, 0.00  %)
Layer 5    Linear         (batch_size, 20)              380            ActiSwitch(ReLU, 50.44 %)
Layer 6    Linear         (batch_size, 23)              483            ActiSwitch(ReLU, 0.00  %)
Layer 7    Linear         (batch_size, 1)               24             Pass
----
```

In [19]:
```python
# Inspect the weights of the sixth layer (index 5); before training they still
# follow the identity-style initialization of an inserted layer
print(model.layers[5].weight)
```

```
Parameter containing:
tensor([[ 1.0000e+00,  0.0000e+00,  0.0000e+00,  0.0000e+00,  0.0000e+00,
          0.0000e+00,  0.0000e+00,  0.0000e+00,  0.0000e+00,  0.0000e+00,
          0.0000e+00,  0.0000e+00,  0.0000e+00,  0.0000e+00,  0.0000e+00,
          0.0000e+00,  0.0000e+00,  0.0000e+00,  0.0000e+00,  0.0000e+00],
        [ 0.0000e+00,  1.0000e+00,  0.0000e+00,  0.0000e+00,  0.0000e+00,
          0.0000e+00,  0.0000e+00,  0.0000e+00,  0.0000e+00,  0.0000e+00,
          0.0000e+00,  0.0000e+00,  0.0000e+00,  0.0000e+00,  0.0000e+00,
          0.0000e+00,  0.0000e+00,  0.0000e+00,  0.0000e+00,  0.0000e+00],
        [ 0.0000e+00,  0.0000e+00,  1.0000e+00,  0.0000e+00,  0.0000e+00,
          0.0000e+00,  0.0000e+00,  0.0000e+00,  0.0000e+00,  0.0000e+00,
          0.0000e+00,  0.0000e+00,  0.0000e+00,  0.0000e+00,  0.0000e+00,
          0.0000e+00,  0.0000e+00,  0.0000e+00,  0.0000e+00,  0.0000e+00],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  1.0000e+00,  0.0000e+00,
          0.0
```

In [20]:
```python
# Train the evolved network
for i in range(epochs):
    y_pred = model(x)
    train_loss = F.mse_loss(y_pred, y_real)
    optim.zero_grad()
    train_loss.backward()
    optim.step()
print("Loss:", train_loss.item())
```

```
Loss: 0.17302699387073517
```


In [21]:
```python
# The same layer after training: the weights have moved away from the identity initialization
print(model.layers[5].weight)
```

```
Parameter containing:
tensor([[ 6.8865e-01,  9.5886e-02, -3.5920e-02, -1.0901e-02, -1.4520e-01,
          3.5648e-02, -4.3937e-02, -1.3446e-01,  4.3127e-03,  6.9630e-03,
          1.0831e-01, -2.3804e-01,  8.8959e-03, -3.9544e-03, -6.7885e-02,
          8.4374e-02,  2.2941e-01,  4.2520e-02, -5.1575e-02, -5.2093e-03],
        [ 1.0196e-01,  7.9829e-01,  7.3159e-02, -2.4440e-02,  1.1557e-02,
         -1.5293e-01, -8.7847e-02,  1.3030e-01, -1.3881e-01, -3.7994e-02,
         -1.7710e-01,  2.5111e-01, -1.2068e-01, -9.6515e-02,  2.0235e-01,
         -1.7153e-01, -2.9129e-01, -1.7729e-01, -1.5014e-02, -2.6385e-02],
        [ 2.4191e-01, -1.0707e-01,  9.0089e-01, -3.5933e-02,  1.1841e-01,
         -6.8941e-02,  3.9683e-02,  1.5791e-01,  8.0370e-03,  9.8358e-03,
         -1.5469e-01,  2.8491e-01, -3.5831e-02, -1.3980e-02,  5.5826e-02,
         -1.3234e-01, -2.3756e-01, -5.8486e-02,  5.8572e-02, -7.8342e-03],
        [-2.0738e-01,  1.3329e-01, -8.5105e-03,  9.3419e-01, -1.3629e-01,
          8.4
```