
# Shortcut connections

Shortcut connections (also known as skip or residual connections) were introduced to mitigate the vanishing-gradient problem. As gradients propagate backward through many layers, they become progressively smaller, which makes the earlier layers difficult to train.

These connections give the gradient an alternative path that skips one or more layers. This is achieved by adding a layer's input (that is, the output of an earlier layer) to that layer's output, so the block computes `x + f(x)` instead of just `f(x)`.
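To see why this helps, note that the derivative of `x + f(x)` with respect to `x` is `1 + f'(x)`. Even when `f'(x)` is tiny, the identity term keeps the gradient from collapsing. The sketch below is a toy illustration under a stated assumption. Each "layer" is the hypothetical scalar map `f(x) = w * x` with a small weight `w`. Stacking five such layers without shortcuts gives a gradient of `w ** 5`, while with shortcuts it is `(1 + w) ** 5`.

```python
import torch

# Toy assumption: every "layer" is f(h) = w * h with a small weight w
w = 0.1
num_layers = 5


def grad_through_stack(use_shortcut):
    """Return d(output)/d(input) for a stack of toy layers."""
    x = torch.tensor(1.0, requires_grad=True)
    h = x
    for _ in range(num_layers):
        out = w * h
        # With a shortcut the layer computes h + f(h); without it, just f(h)
        h = h + out if use_shortcut else out
    h.backward()
    return x.grad.item()


grad_plain = grad_through_stack(use_shortcut=False)  # analytically w ** 5
grad_short = grad_through_stack(use_shortcut=True)   # analytically (1 + w) ** 5
print(grad_plain, grad_short)
```

The plain stack shrinks the gradient geometrically, while the shortcut stack keeps it on the order of 1.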

Let's build a small deep neural network to see the effect of shortcut connections.

In [None]:
import torch
import torch.nn as nn

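class GELU(nn.Module):
    """GELU activation using the tanh approximation."""
    def __init__(self):
        super().__init__()

    def forward(self, x):
        # 0.5 * x * (1 + tanh(sqrt(2 / pi) * (x + 0.044715 * x^3)))
        result = 0.5 * x * (1 + torch.tanh(
            torch.sqrt(torch.tensor(2.0 / torch.pi)) *
            (x + 0.044715 * torch.pow(x, 3))
        ))
        return result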
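As a quick sanity check, the tanh formula used in the `GELU` class can be compared against PyTorch's built-in `torch.nn.functional.gelu`. This assumes PyTorch 1.12 or later, where `approximate="tanh"` selects the same approximation and the default (`"none"`) uses the exact erf-based definition. The sample points are arbitrary.

```python
import math

import torch
import torch.nn.functional as F

# Arbitrary sample points in [-3, 3]
x = torch.linspace(-3.0, 3.0, steps=7)

# Same tanh approximation as the GELU class above
manual = 0.5 * x * (1 + torch.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

builtin_tanh = F.gelu(x, approximate="tanh")  # PyTorch's tanh approximation
exact = F.gelu(x)                             # exact erf-based GELU

print(torch.allclose(manual, builtin_tanh, atol=1e-5))
print((manual - exact).abs().max().item())    # size of the approximation error
```

The first check should agree up to float32 rounding. The second value shows how close the tanh approximation stays to the exact GELU on these points.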

In [None]:
class ExampleDeepNeuralNetwork(nn.Module):
    def __init__(self, layer_sizes, use_shortcut):
        super().__init__()
        self.use_shortcut = use_shortcut
        # One Linear + GELU block per pair of consecutive layer sizes
        self.layers = nn.ModuleList([
            nn.Sequential(
                nn.Linear(layer_sizes[i], layer_sizes[i + 1]),
                GELU())
            for i in range(len(layer_sizes) - 1)
        ])

    def forward(self, x):
        for layer in self.layers:
            # Compute the output of the current layer
            output = layer(x)
            # Check if the shortcut can be applied (input and output shapes must match)
            if self.use_shortcut and x.shape == output.shape:
                # Apply shortcut: add the layer's input to its output
                x = x + output
            else:
                x = output
        return x
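One way to observe the effect is to compare the gradient reaching the first layer with and without shortcuts. The sketch below is self-contained, so it repeats the network definition. It makes two assumptions. First, it swaps the custom `GELU` class for `nn.GELU(approximate="tanh")`, which computes the same tanh approximation. Second, it uses a hypothetical setup: layer sizes `[3, 3, 3, 3, 3, 1]`, a single input `[1, 0, -1]`, target `0`, and MSE loss. Both models are seeded identically so they start from the same weights.

```python
import torch
import torch.nn as nn


class ExampleDeepNeuralNetwork(nn.Module):
    def __init__(self, layer_sizes, use_shortcut):
        super().__init__()
        self.use_shortcut = use_shortcut
        # Assumption: nn.GELU(approximate="tanh") stands in for the custom GELU
        self.layers = nn.ModuleList([
            nn.Sequential(nn.Linear(layer_sizes[i], layer_sizes[i + 1]),
                          nn.GELU(approximate="tanh"))
            for i in range(len(layer_sizes) - 1)
        ])

    def forward(self, x):
        for layer in self.layers:
            output = layer(x)
            # Add the layer's input only when the shapes match
            if self.use_shortcut and x.shape == output.shape:
                x = x + output
            else:
                x = output
        return x


def first_layer_grad(model, x, target):
    """Mean absolute gradient of the first Linear layer's weight after one backward pass."""
    loss = nn.MSELoss()(model(x), target)
    loss.backward()
    return model.layers[0][0].weight.grad.abs().mean().item()


# Hypothetical setup: four 3 -> 3 layers followed by a 3 -> 1 output layer
layer_sizes = [3, 3, 3, 3, 3, 1]
x = torch.tensor([[1.0, 0.0, -1.0]])
target = torch.tensor([[0.0]])

torch.manual_seed(123)
plain = ExampleDeepNeuralNetwork(layer_sizes, use_shortcut=False)
torch.manual_seed(123)
residual = ExampleDeepNeuralNetwork(layer_sizes, use_shortcut=True)

grad_plain = first_layer_grad(plain, x, target)
grad_residual = first_layer_grad(residual, x, target)
print(f"first-layer mean |grad| without shortcut: {grad_plain:.2e}")
print(f"first-layer mean |grad| with shortcut:    {grad_residual:.2e}")
```

The last layer maps 3 features to 1, so its shapes differ and no shortcut is applied there. Every earlier layer gets an identity path, which is what keeps the first layer's gradient from shrinking.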

# Example