# Mutation for Activation Functions

This notebook builds a mutation operator for neural-network activation functions in which a mutation is more likely to pick a function similar to the current one.

## Defining the distance

To define a distance between activation functions, we first arrange them in a tree. We consider the following activation functions:
- ReLU
- Leaky ReLU
- ELU
- GELU
- Sigmoid
- Tanh

We consider the following aspects of the activation functions:
- Boundedness: whether the function is bounded or not
- Smoothness: whether the function is smooth or not

With these aspects in mind, we can classify the activation functions as follows:

| Activation Function | Bounded | Smooth |
|---------------------|:-------:|:------:|
| ReLU                | No      | No     |
| Leaky ReLU          | No      | No     |
| ELU                 | No      | Yes    |
| GELU                | No      | Yes    |
| Sigmoid             | Yes     | Yes    |
| Tanh                | Yes     | Yes    |
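
The classification in the table can be spot-checked numerically. The sketch below is a heuristic check, not a proof. It assumes common default parameters (Leaky ReLU slope 0.01, ELU alpha 1.0, the exact erf-based GELU), treats a function as bounded if its values stay within [-1, 1] on a wide interval, and treats it as smooth if its one-sided slopes at the origin agree. The helper names `looks_bounded` and `looks_smooth_at_zero` are illustrative and not part of the notebook.

```python
import math
import numpy as np

# Reference definitions (assumed parameters: Leaky ReLU slope 0.01, ELU alpha 1.0)
ACTIVATIONS = {
    "relu": lambda x: np.maximum(x, 0.0),
    "leakyrelu": lambda x: np.where(x > 0, x, 0.01 * x),
    "elu": lambda x: np.where(x > 0, x, np.expm1(np.minimum(x, 0.0))),
    "gelu": lambda x: 0.5 * x * (1.0 + np.vectorize(math.erf)(x / math.sqrt(2.0))),
    "sigmoid": lambda x: 1.0 / (1.0 + np.exp(-x)),
    "tanh": np.tanh,
}

def looks_bounded(f, limit=50.0, bound=1.0 + 1e-9):
    """Heuristic: |f(x)| stays below `bound` on [-limit, limit]."""
    xs = np.linspace(-limit, limit, 2001)
    return bool(np.max(np.abs(f(xs))) <= bound)

def looks_smooth_at_zero(f, h=1e-6, tol=1e-3):
    """Heuristic: left and right finite-difference slopes at 0 agree."""
    zero = np.array([0.0])
    right = (f(np.array([h])) - f(zero)) / h
    left = (f(zero) - f(np.array([-h]))) / h
    return bool(abs(right[0] - left[0]) < tol)

classification = {
    name: (looks_bounded(f), looks_smooth_at_zero(f))
    for name, f in ACTIVATIONS.items()
}
for name, (bounded, smooth) in classification.items():
    print(f"{name:10s} bounded={bounded!s:5s} smooth={smooth}")
```

Checking only the origin is enough for this particular set, since it is the only point where any of these functions can have a kink.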

In [116]:
import numpy as np

# Step 1: Define the tree structure for the activation functions
functions_tree = {
    "root": ["unbounded", "bounded"],
    "unbounded": ["not-smooth", "smooth"],
    "not-smooth": ["relu", "leakyrelu"],
    "smooth": ["elu","gelu"],
    "bounded": ["tanh", "sigmoid"]
}

In [117]:
# Step 2: Build a mapping from child to parent for easy traversal
parent_map = {}
def build_parent_map(node, parent, tree):
    parent_map[node] = parent
    if node in tree:
        for child in tree[node]:
            build_parent_map(child, node, tree)

build_parent_map("root", None, functions_tree)

In [118]:
# Step 3: Function to calculate the distance (hops) between two activation functions
def distance(node1, node2, parent_map):
    # Find the path to the root for both nodes
    path1, path2 = [], []
    
    while node1 is not None:
        path1.append(node1)
        node1 = parent_map[node1]
    
    while node2 is not None:
        path2.append(node2)
        node2 = parent_map[node2]
    
    # Reverse both paths so they start at the root
    path1.reverse()
    path2.reverse()
    min_len = min(len(path1), len(path2))
    
    # Depth of the lowest common ancestor (LCA): the last index where the paths agree
    lca_depth = 0
    for i in range(min_len):
        if path1[i] == path2[i]:
            lca_depth = i
        else:
            break
    
    # Total hops: steps from each node up to the LCA, added together
    hops = (len(path1) - lca_depth - 1) + (len(path2) - lca_depth - 1)
    return hops


# Step 4: Compute probabilities based on distances
def compute_probabilities_from_tree(current_function, parent_map, all_functions):
    # Hop distance from the current function to every other function
    distances = {func: distance(current_function, func, parent_map) for func in all_functions if func != current_function}
    
    # Raw weights are inversely proportional to the distance, so closer functions are more likely
    raw_probs = {func: 1 / dist for func, dist in distances.items()}
    
    # Normalize the probabilities
    total = sum(raw_probs.values())
    probabilities = {func: prob / total for func, prob in raw_probs.items()}
    
    return probabilities
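
As a sanity check on the hop counts, the distances from ReLU can be recomputed independently. The following is a minimal, self-contained sketch: it hard-codes the same tree as a child-to-parent dictionary (`PARENT`) and uses a hypothetical helper `hop_distance` that walks both nodes up to their first shared ancestor. It is an alternative formulation for cross-checking, not the notebook's `distance` function.

```python
# Child -> parent view of the same tree (root has no parent)
PARENT = {
    "root": None,
    "unbounded": "root", "bounded": "root",
    "not-smooth": "unbounded", "smooth": "unbounded",
    "relu": "not-smooth", "leakyrelu": "not-smooth",
    "elu": "smooth", "gelu": "smooth",
    "tanh": "bounded", "sigmoid": "bounded",
}

def ancestors(node):
    """Return the path [node, parent, ..., 'root']."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path

def hop_distance(a, b):
    """Number of edges on the tree path between a and b."""
    path_a, path_b = ancestors(a), ancestors(b)
    common = set(path_a) & set(path_b)
    # The index of the first shared ancestor equals the hops needed to reach it
    up_a = next(i for i, n in enumerate(path_a) if n in common)
    up_b = next(i for i, n in enumerate(path_b) if n in common)
    return up_a + up_b

others = ["leakyrelu", "elu", "gelu", "tanh", "sigmoid"]
relu_distances = {f: hop_distance("relu", f) for f in others}
print(relu_distances)
# {'leakyrelu': 2, 'elu': 4, 'gelu': 4, 'tanh': 5, 'sigmoid': 5}
```

Siblings are 2 hops apart, functions in the other unbounded group are 4 hops away, and the bounded functions are 5 hops away because their branch is one level shallower.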

In [119]:
# Define all activation functions
all_functions = ["relu", "leakyrelu", "elu", "gelu", "tanh", "sigmoid"]

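# Probability of mutating to a different function; the diagonal keeps the remainder
transition_probability = 0.75

# Build the transition matrix: row i is the distribution over mutation targets for all_functions[i]
probabilities = np.zeros((len(all_functions), len(all_functions)))
for i, func in enumerate(all_functions):
    probs = compute_probabilities_from_tree(func, parent_map, all_functions)
    for j, other_func in enumerate(all_functions):
        if other_func in probs:
            probabilities[i, j] = probs[other_func] * transition_probability
        else:
            # Diagonal entry: the function is left unchanged
            probabilities[i, j] = 1 - transition_probability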

In [120]:
# Pretty print the matrix
print("Transition probabilities:")
print(np.array_str(probabilities, precision=2, suppress_small=True))

Transition probabilities:
[[0.25 0.27 0.13 0.13 0.11 0.11]
 [0.27 0.25 0.13 0.13 0.11 0.11]
 [0.13 0.13 0.25 0.27 0.11 0.11]
 [0.13 0.13 0.27 0.25 0.11 0.11]
 [0.12 0.12 0.12 0.12 0.25 0.29]
 [0.12 0.12 0.12 0.12 0.29 0.25]]
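
The entries of the printed matrix can be reproduced by hand. The notation below ($P_{ij}$ for the matrix entry, $d_{ij}$ for the hop distance, $t$ for `transition_probability`) is introduced here for illustration only; the first row uses the ReLU distances 2, 4, 4, 5, 5.

```latex
P_{ij} =
\begin{cases}
1 - t, & i = j,\\[4pt]
t \cdot \dfrac{1/d_{ij}}{\sum_{k \neq i} 1/d_{ik}}, & i \neq j,
\end{cases}
\qquad t = 0.75

\sum_{k \neq \mathrm{relu}} \frac{1}{d_{\mathrm{relu},k}}
  = \frac{1}{2} + \frac{1}{4} + \frac{1}{4} + \frac{1}{5} + \frac{1}{5} = 1.4

P_{\mathrm{relu},\mathrm{leakyrelu}} = 0.75 \cdot \frac{0.5}{1.4} \approx 0.27, \quad
P_{\mathrm{relu},\mathrm{elu}} = 0.75 \cdot \frac{0.25}{1.4} \approx 0.13, \quad
P_{\mathrm{relu},\mathrm{tanh}} = 0.75 \cdot \frac{0.2}{1.4} \approx 0.11
```

Each row therefore sums to $(1 - t) + t = 1$, so every row is a valid probability distribution over the next activation function.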


Because the set of activation functions is small, it is convenient to compute the transition matrix once, store it on disk, and load it wherever the mutation operator is used.

In [121]:
np.savetxt("transition_probabilities.csv", probabilities, delimiter=",")
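
One way the stored matrix might be consumed is sketched below. To stay self-contained, the sketch rebuilds the matrix from a hard-coded hop-distance table and writes it to a temporary directory rather than reading the notebook's file. The function `mutate_activation` and the fixed seed are assumptions for illustration, not part of the notebook.

```python
import os
import tempfile
import numpy as np

all_functions = ["relu", "leakyrelu", "elu", "gelu", "tanh", "sigmoid"]

# Hop distances between functions, in the order of all_functions (assumed from the tree)
D = np.array([
    [0, 2, 4, 4, 5, 5],
    [2, 0, 4, 4, 5, 5],
    [4, 4, 0, 2, 5, 5],
    [4, 4, 2, 0, 5, 5],
    [5, 5, 5, 5, 0, 2],
    [5, 5, 5, 5, 2, 0],
], dtype=float)

# Off-diagonal weights are 1 / distance, scaled so each row's off-diagonal mass is 0.75
inv = np.zeros_like(D)
mask = D > 0
inv[mask] = 1.0 / D[mask]
matrix = 0.75 * inv / inv.sum(axis=1, keepdims=True)
np.fill_diagonal(matrix, 0.25)

# Round-trip through a CSV file, as done with np.savetxt above
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "transition_probabilities.csv")
    np.savetxt(path, matrix, delimiter=",")
    loaded = np.loadtxt(path, delimiter=",")

def mutate_activation(current, matrix, names, rng):
    """Sample the next activation function from the row of the current one."""
    row = matrix[names.index(current)]
    return names[rng.choice(len(names), p=row)]

rng = np.random.default_rng(0)  # fixed seed only for reproducibility of the sketch
samples = [mutate_activation("relu", loaded, all_functions, rng) for _ in range(5)]
print(samples)
```

Because the diagonal is 0.25, roughly one mutation in four leaves the activation function unchanged; the rest favor functions that are close in the tree.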