<a href="https://colab.research.google.com/github/Darshanbreddy/LLM/blob/main/Linear__Layer(1%262)%2C_GELU.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook demonstrates, step by step, how input data is transformed by a simple feedforward neural network. It first compares the ReLU and GELU activation functions on the same input, then passes a random batch through a Linear(8→16) layer followed by GELU and a Linear(16→8) layer, printing the data representation at each stage from raw input to final output.

Block 1: Compare ReLU vs GELU on the same input

In [1]:
import torch
import torch.nn.functional as F

# Input tensor
x = torch.linspace(-3, 3, steps=10)

# Apply ReLU and GELU
relu_output = F.relu(x)
gelu_output = F.gelu(x)

print("Input:")
print(x)
print("\nReLU Output:")
print(relu_output)
print("\nGELU Output:")
print(gelu_output)


Input:
tensor([-3.0000, -2.3333, -1.6667, -1.0000, -0.3333,  0.3333,  1.0000,  1.6667,
         2.3333,  3.0000])

ReLU Output:
tensor([0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.3333, 1.0000, 1.6667, 2.3333,
        3.0000])

GELU Output:
tensor([-0.0040, -0.0229, -0.0797, -0.1587, -0.1231,  0.2102,  0.8413,  1.5870,
         2.3104,  2.9960])
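
The outputs above show GELU keeping small negative values where ReLU clamps to zero. A minimal sketch of why, assuming the default `F.gelu` computes the exact form x·Φ(x), where Φ is the standard normal CDF. The helper name `gelu_exact` is hypothetical and not part of the notebook.

```python
import math

import torch
import torch.nn.functional as F


def gelu_exact(x: torch.Tensor) -> torch.Tensor:
    # x * Phi(x), with Phi written via the error function:
    # Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    return 0.5 * x * (1.0 + torch.erf(x / math.sqrt(2.0)))


x = torch.linspace(-3, 3, steps=10)

# The hand-written form should agree with the built-in default GELU.
agrees_with_builtin = torch.allclose(gelu_exact(x), F.gelu(x), atol=1e-6)
print("Matches F.gelu:", agrees_with_builtin)

# For negative inputs ReLU is exactly zero, while GELU is slightly negative.
negative = x < 0
print("ReLU on negatives:", F.relu(x)[negative])
print("GELU on negatives:", F.gelu(x)[negative])
```

Because Φ(x) is never exactly zero in this range, negative inputs are scaled down rather than removed, which is the difference visible in the first five entries of the two printed tensors.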


Block 2: Define custom GELU and Layer 1 (Linear 8 ➝ 16 + GELU)

In [2]:
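import math

# Custom GELU implementation (tanh approximation):
# 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
def gelu(x):
    return 0.5 * x * (1 + torch.tanh(math.sqrt(2 / math.pi) * (x + 0.044715 * x**3)))

# Input (batch_size=3, features=8)
x = torch.randn(3, 8)

# Define Layer 1: Linear(8 ➝ 16) + GELU
# Note: w1 and b1 are re-sampled on every call, so each call uses new random weights.
def layer1(x):
    w1 = torch.randn(8, 16)  # weights, shape (in_features, out_features)
    b1 = torch.randn(16)     # bias, shape (out_features,)
    return gelu(x @ w1 + b1)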

# Apply Layer 1
out1 = layer1(x)
print("\nOutput of Layer 1 (Linear + custom GELU):")
print(out1)



Output of Layer 1 (Linear + custom GELU):
tensor([[ 1.0552e+00,  4.0823e+00, -1.8030e-02, -2.6953e-04, -1.7004e-01,
         -0.0000e+00, -2.2586e-02, -7.3840e-03,  2.4161e+00,  1.5999e+00,
          3.2410e+00,  2.0133e+00, -1.5462e-01,  4.4627e+00, -0.0000e+00,
          1.0894e+01],
        [ 1.6755e+00,  2.9511e+00, -5.5913e-02,  1.4067e+00,  3.2412e-01,
         -2.5748e-04, -1.4914e-07,  5.4777e+00, -1.1242e-02,  6.0841e-02,
          2.2075e-01, -1.6773e-01, -3.8090e-02,  4.8550e+00,  2.5292e-01,
          2.2803e+00],
        [ 1.8880e+00,  3.0105e+00, -8.9843e-02, -1.6121e-01,  2.1669e+00,
          2.5209e+00,  1.2876e+00, -1.6372e-01,  7.9291e-01,  1.3498e+00,
          1.0105e+00,  1.6389e+00, -1.6968e-03, -1.3546e-01,  2.2051e+00,
         -1.4944e-01]])
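
The custom `gelu` in this cell is the tanh approximation of GELU rather than the exact form. As a rough check, the sketch below compares it with PyTorch's built-in tanh variant and with the default exact GELU. It assumes a PyTorch version in which `F.gelu` accepts the `approximate` argument; `gelu_tanh` is an illustrative name.

```python
import math

import torch
import torch.nn.functional as F


def gelu_tanh(x: torch.Tensor) -> torch.Tensor:
    # Same formula as the notebook's custom gelu, using math.pi for the constant.
    return 0.5 * x * (1.0 + torch.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x**3)))


x = torch.linspace(-3, 3, steps=10)

# Compare against the built-in tanh approximation.
matches_builtin_tanh = torch.allclose(
    gelu_tanh(x), F.gelu(x, approximate="tanh"), atol=1e-5
)

# Largest absolute difference from the default (exact, erf-based) GELU.
max_gap_to_exact = (gelu_tanh(x) - F.gelu(x)).abs().max().item()

print("Matches built-in tanh variant:", matches_builtin_tanh)
print("Max gap to exact GELU:", max_gap_to_exact)

# For strongly negative inputs, tanh saturates to -1 in float32.
print("gelu_tanh(-10):", gelu_tanh(torch.tensor([-10.0])))
```

The saturation shown in the last line likely explains the `-0.0000e+00` entries in the Layer 1 output: when the pre-activation is strongly negative, `1 + tanh(...)` evaluates to exactly zero and the product becomes negative zero.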


Block 3: Define Layer 2 (Linear 16 ➝ 8)

In [3]:
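# Define Layer 2: Linear(16 ➝ 8), no activation
# Note: w2 and b2 are re-sampled on every call, so each call uses new random weights.
def layer2(x):
    w2 = torch.randn(16, 8)  # weights, shape (in_features, out_features)
    b2 = torch.randn(8)      # bias, shape (out_features,)
    return x @ w2 + b2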

# Apply Layer 2
out2 = layer2(out1)
print("\nOutput of Layer 2 (Linear):")
print(out2)



Output of Layer 2 (Linear):
tensor([[-10.2599,   2.8703,  -7.9834,   0.7555,  13.7122,  11.7981,  19.3064,
          -4.5272],
        [  8.5440,  -3.1209,   3.3968,   7.5250,  -3.9780,   6.6603,  19.0782,
          11.0782],
        [  8.8261,  -0.3243,  -1.8002,  -2.1974,   4.3282,  -1.4196,  -3.2187,
           0.4414]])
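
For comparison, the same `x @ w + b` computation can be expressed with `torch.nn.Linear`. This is a sketch, not the notebook's method: `nn.Linear` stores its weight with shape (out_features, in_features) and uses its own default initialization rather than `torch.randn`, so the numbers will differ from the output above. The names `linear2` and `h` are illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # fixed seed so repeated runs give the same values

# Module equivalent of Layer 2: Linear(16 -> 8)
linear2 = nn.Linear(16, 8)

# Stand-in for the Layer 1 output: batch of 3 vectors with 16 features each
h = torch.randn(3, 16)

# nn.Linear computes h @ W.T + b, because W has shape (8, 16)
manual = h @ linear2.weight.T + linear2.bias
module_out = linear2(h)

print("Weight shape:", tuple(linear2.weight.shape))
print("Output shape:", tuple(module_out.shape))
print("Module matches manual matmul:", torch.allclose(module_out, manual, atol=1e-5))
```

The transpose is the only difference from the hand-written `layer2`, which stores its weight as (in_features, out_features) and multiplies without transposing.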


Block 4: Show Original Input and Final Output

In [4]:
print("\nOriginal Input:")
print(x)

print("\nFinal Output after Layer 1 ➝ GELU ➝ Layer 2:")
print(out2)



Original Input:
tensor([[-2.3009, -0.0052, -0.1872,  2.6048, -1.2609,  0.6406, -0.9216, -1.5741],
        [-2.3114, -0.4795,  1.6537, -0.7362, -0.0311, -1.1922, -0.6694,  0.2523],
        [ 0.0401, -0.2795, -0.4661,  0.6647, -0.0064, -1.0065,  0.1872, -0.9176]])

Final Output after Layer 1 ➝ GELU ➝ Layer 2:
tensor([[-10.2599,   2.8703,  -7.9834,   0.7555,  13.7122,  11.7981,  19.3064,
          -4.5272],
        [  8.5440,  -3.1209,   3.3968,   7.5250,  -3.9780,   6.6603,  19.0782,
          11.0782],
        [  8.8261,  -0.3243,  -1.8002,  -2.1974,   4.3282,  -1.4196,  -3.2187,
           0.4414]])


Block 1: Compares the ReLU and GELU activation functions on the same input. ReLU maps every negative value to zero, while GELU leaves small negative values and approaches the identity for large positive inputs.

Block 2: Defines a custom GELU function (tanh approximation) and applies it after a Linear(8→16) transformation with random weights, forming the first feedforward step.

Block 3: Applies a second Linear(16→8) layer, with no activation, to the output of Layer 1, mapping it back to 8 features.

Block 4: Displays the original input and the final output after both layers, so the full transformation from a (3, 8) input to a (3, 8) output can be tracked.
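
The four blocks can also be condensed into a single module. The sketch below is one possible arrangement, assuming `nn.Sequential` with `nn.GELU` and a fixed seed; unlike `layer1` and `layer2` in the notebook, which draw new random weights on every call, the weights here are created once, so repeated calls on the same input return the same output. The name `model` is illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # fixed seed so weights and input are reproducible

# Linear(8 -> 16) -> GELU -> Linear(16 -> 8), mirroring Blocks 2 and 3
model = nn.Sequential(
    nn.Linear(8, 16),
    nn.GELU(),
    nn.Linear(16, 8),
)

# Input (batch_size=3, features=8), as in the notebook
x = torch.randn(3, 8)

with torch.no_grad():  # inference only, no gradients needed
    out_a = model(x)
    out_b = model(x)

print("Output shape:", tuple(out_a.shape))
print("Same output on repeated calls:", torch.equal(out_a, out_b))
```

Keeping the parameters inside the module is also what would allow them to be trained later, which is not possible when they are re-sampled inside the function body.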