# Simple LoRA: Low-Rank Adaptation Linear Layer

In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import numpy as np
import random
import math

torch.manual_seed(1234)

<torch._C.Generator at 0x1fa66371350>

## LoRALayer

Create the LoRALayer as discussed in the paper [LoRA: Low-Rank Adaptation of Large Language Models](https://arxiv.org/abs/2106.09685) by Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen, 2021, arXiv. Also see the official LoRA repository at [GitHub:microsoft/LoRA](https://github.com/microsoft/LoRA/tree/main) by Microsoft Corporation, 2023.

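The basic idea is to add the LoRA output to the original Linear layer output, shown below as 'x + x1', where 'x1 = (x @ A @ B) * scaling' and 'scaling = alpha / rank'. In this simplified version, 'x' is the output of the Linear layer; the paper instead applies the low-rank update to the layer's input, in parallel with the frozen weight.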

In [8]:
class LoRALayer(torch.nn.Module):
    def __init__(self, in_dim, out_dim, rank, alpha):
        super().__init__()
        # Initialize A from a normal distribution with standard deviation 1/sqrt(rank)
        stdev = 1.0 / math.sqrt(rank)
        self.A = torch.nn.Parameter(torch.randn(in_dim, rank) * stdev)
        # Initialize B with zeros
        self.B = torch.nn.Parameter(torch.zeros(rank, out_dim))
        self.alpha = alpha
        self.scaling = alpha / rank
        
    def forward(self, x):
        x1 = torch.matmul(x, self.A)    # x @ A
        x1 = torch.matmul(x1, self.B)   # x @ A @ B
        x1 = x1 * self.scaling            # x @ A @ B * scaling
        return x + x1
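
Because B starts at zero, the LoRA term contributes nothing before training, so the layer initially passes its input through unchanged. The following is a minimal sketch of that property using plain tensors rather than the class itself; the sizes (`dim`, `rank`, `alpha`) are hypothetical values chosen only for illustration.

```python
import math
import torch

torch.manual_seed(0)

# Hypothetical sizes chosen only for illustration.
dim, rank, alpha = 8, 4, 0.5
scaling = alpha / rank

# Same initialization scheme as LoRALayer: A is random, B is all zeros.
A = torch.randn(dim, rank) / math.sqrt(rank)
B = torch.zeros(rank, dim)

x = torch.randn(3, dim)
lora_out = x + (x @ A @ B) * scaling

# With B = 0 the product A @ B is zero, so the input passes through unchanged.
is_identity_at_init = torch.equal(lora_out, x)
print(is_identity_at_init)
```

Starting from a no-op means that enabling LoRA does not disturb the frozen layer's behavior until training begins to move B away from zero.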

## LinearLoRA

Next, let's create the LinearLoRA layer, which combines a Linear layer with the LoRALayer above. When using LoRA, the Linear layer
weights and bias are 'frozen' by setting 'requires_grad=False'.

In [14]:
class LinearLoRA(torch.nn.Module):
    def __init__(self, in_dim, out_dim, rank, alpha, enable_lora):
        super().__init__()

        self.out_dim = out_dim
        self.enable_lora = enable_lora
        # F.linear expects a weight of shape (out_dim, in_dim).
        # Freeze the linear weight and bias when LoRA is enabled.
        self.weight = torch.nn.Parameter(torch.randn(out_dim, in_dim), requires_grad=not enable_lora)
        self.bias = torch.nn.Parameter(torch.zeros(out_dim), requires_grad=not enable_lora)
        if enable_lora:
            # LoRA runs on the linear output, so both of its dims are out_dim
            self.lora = LoRALayer(out_dim, out_dim, rank, alpha)

    def forward(self, x):
        original_shape = x.shape
        # flatten input so the linear runs on the last axis only
        x = x.reshape(-1, original_shape[-1])
        # linear layer
        x1 = F.linear(x, self.weight, self.bias)
        # lora layer
        x2 = self.lora(x1) if self.enable_lora else x1
        # reshape output: restore leading axes, last axis is now out_dim
        x = x2.reshape(*original_shape[:-1], self.out_dim)
        return x

In the forward pass, we first flatten the input so that the Linear layer runs on just the last axis of the data. The Linear layer output 'x1'
is passed to the LoRALayer, which runs LoRA on 'x1' and then internally adds the result to 'x1'.
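
The flatten-and-restore pattern can be sketched on its own with plain tensors. The sizes below are hypothetical and deliberately use different input and output widths, in which case the restored shape must end in `out_dim` rather than `in_dim`. The final comparison relies on `F.linear` accepting inputs with extra leading axes.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical sizes: 2 sequences, 3 positions each, 4 features in, 5 out.
batch, seq_len, in_dim, out_dim = 2, 3, 4, 5

x = torch.randn(batch, seq_len, in_dim)
weight = torch.randn(out_dim, in_dim)   # F.linear expects (out_features, in_features)
bias = torch.zeros(out_dim)

# Flatten every leading axis into one so each row is a single feature vector.
flat = x.reshape(-1, x.shape[-1])        # shape (6, 4)
flat_out = F.linear(flat, weight, bias)  # shape (6, 5)

# Restore the leading axes; only the last axis changes from in_dim to out_dim.
out = flat_out.reshape(*x.shape[:-1], out_dim)

# F.linear also handles leading axes directly, so this should match.
direct = F.linear(x, weight, bias)
print(out.shape)
print(torch.allclose(out, direct, atol=1e-5))
```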

## Running with LoRA Disabled

Now let's try it out. First, we define the LinearLoRA with LoRA disabled.

In [19]:
input_dim = 1
output_dim = 1

linear_layer = LinearLoRA(input_dim, output_dim, rank=4, alpha=0.5, enable_lora=False)
loss_fn = nn.MSELoss()
optimizer = optim.SGD(linear_layer.parameters(), lr=0.01)

x_train = torch.tensor([[[1.0], [2.0], [3.0]], [[4.0], [5.0], [6.0]]])
y_train = torch.tensor([[[2.0], [4.0], [6.0]], [[8.0], [10.0], [12.0]]])

print("LoRA Disabled - start linear weights: ", linear_layer.weight)
print("LoRA Disabled - start linear bias: ", linear_layer.bias)

LoRA Disabled - start linear weights:  Parameter containing:
tensor([[-0.0098]], requires_grad=True)
LoRA Disabled - start linear bias:  Parameter containing:
tensor([0.], requires_grad=True)


Note the initial linear weight and bias values. With 'enable_lora=False' they are trainable, so they should change during training.

Let's now train the simple linear layer to output [[[2.0], [4.0], [6.0]], [[8.0], [10.0], [12.0]]] when we input [[[1.0], [2.0], [3.0]], [[4.0], [5.0], [6.0]]].

In [23]:
for epoch in range(1000):
    # Forward pass
    y_pred = linear_layer(x_train)

    # Compute the loss
    loss = loss_fn(y_pred, y_train)

    # Zero the gradients
    optimizer.zero_grad()

    # Backward pass
    loss.backward()

    # Update the weights
    optimizer.step()

# Test the model
x_test = torch.tensor([[[1.0], [2.0], [3.0]], [[4.0], [5.0], [6.0]]])
y_test = linear_layer(x_test)
print("The predicted values are: ", y_test)
print("LoRA Disabled - end linear weights: ", linear_layer.weight)
print("LoRA Disabled - end linear bias: ", linear_layer.bias)

The predicted values are:  tensor([[[ 2.0000],
         [ 4.0000],
         [ 6.0000]],

        [[ 8.0000],
         [10.0000],
         [12.0000]]], grad_fn=<ViewBackward0>)
LoRA Disabled - end linear weights:  Parameter containing:
tensor([[-1.4473]])
LoRA Disabled - end linear bias:  Parameter containing:
tensor([0.])


Note how the linear parameters have been updated by training: they now store the knowledge of how to convert the input to the desired output.

## Running with LoRA Enabled

Next, let's run the same test, but this time with LoRA turned on (i.e., 'enable_lora=True'). When enabled, the Linear weights and bias should not change, as they are 'frozen'.

In [26]:
input_dim = 1
output_dim = 1

linear_layer = LinearLoRA(input_dim, output_dim, rank=4, alpha=0.5, enable_lora=True)
loss_fn = nn.MSELoss()
optimizer = optim.SGD(linear_layer.parameters(), lr=0.01)

x_train = torch.tensor([[[1.0], [2.0], [3.0]], [[4.0], [5.0], [6.0]]])
y_train = torch.tensor([[[2.0], [4.0], [6.0]], [[8.0], [10.0], [12.0]]])

print("LoRA Enabled - start linear weights: ", linear_layer.weight)
print("LoRA Enabled - start linear bias: ", linear_layer.bias)

LoRA Enabled - start linear weights:  Parameter containing:
tensor([[-0.7649]])
LoRA Enabled - start linear bias:  Parameter containing:
tensor([0.])


Again, note the initial linear weight and bias values, which should NOT change when 'enable_lora=True', because they are 'frozen'.
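
Freezing works because the optimizer only updates parameters that received a gradient. The sketch below illustrates this with a stand-in model, which is an assumption for illustration and not the LinearLoRA class: a frozen `nn.Linear` followed by a single trainable `scale` parameter. Both are handed to SGD, and only the trainable one moves after a step.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Stand-in model (hypothetical): a frozen nn.Linear plus one trainable scale.
frozen = nn.Linear(1, 1)
for p in frozen.parameters():
    p.requires_grad = False
scale = nn.Parameter(torch.ones(1))

# Pass every parameter to the optimizer, frozen ones included.
params = list(frozen.parameters()) + [scale]
optimizer = optim.SGD(params, lr=0.01)

weight_before = frozen.weight.clone()
scale_before = scale.detach().clone()

x = torch.tensor([[1.0], [2.0], [3.0]])
y = torch.tensor([[2.0], [4.0], [6.0]])

# One training step.
loss = nn.functional.mse_loss(frozen(x) * scale, y)
optimizer.zero_grad()
loss.backward()
optimizer.step()

trainable = sum(p.numel() for p in params if p.requires_grad)
frozen_unchanged = torch.equal(frozen.weight, weight_before)
scale_changed = not torch.equal(scale.detach(), scale_before)
print(trainable, frozen_unchanged, scale_changed)
```

Frozen parameters never get a `.grad`, so it is safe to pass `linear_layer.parameters()` to the optimizer unfiltered, as the cells in this notebook do.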

Let's now train the simple linear layer again to output [[[2.0], [4.0], [6.0]], [[8.0], [10.0], [12.0]]] when we input [[[1.0], [2.0], [3.0]], [[4.0], [5.0], [6.0]]], but this time with LoRA enabled.

In [28]:
for epoch in range(1000):
    # Forward pass
    y_pred = linear_layer(x_train)

    # Compute the loss
    loss = loss_fn(y_pred, y_train)

    # Zero the gradients
    optimizer.zero_grad()

    # Backward pass
    loss.backward()

    # Update the weights
    optimizer.step()

# Test the model
x_test = torch.tensor([[[1.0], [2.0], [3.0]], [[4.0], [5.0], [6.0]]])
y_test = linear_layer(x_test)
print("The predicted values are: ", y_test)
print("LoRA Enabled - end linear weights: ", linear_layer.weight)
print("LoRA Enabled - end linear bias: ", linear_layer.bias)

The predicted values are:  tensor([[[ 2.0000],
         [ 4.0000],
         [ 6.0000]],

        [[ 8.0000],
         [10.0000],
         [12.0000]]], grad_fn=<ViewBackward0>)
LoRA Enabled - end linear weights:  Parameter containing:
tensor([[-0.7649]])
LoRA Enabled - end linear bias:  Parameter containing:
tensor([0.])


This time, the linear weights and bias have **not changed**, but the model still learned to output the desired results. When using LoRA, the newly learned knowledge is stored in the LoRA A and B weights instead of the Linear weight and bias.
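
Since the learned change lives entirely in A and B, it can be folded into a single equivalent weight and bias once training is done. The sketch below assumes the forward pass used in this notebook (Linear first, then LoRA applied to the Linear output). All sizes are hypothetical, and B is filled with random values to stand in for trained LoRA weights. `W_merged` and `b_merged` are illustrative names; the frozen `W` and `b` are left untouched.

```python
import math
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical sizes; random B stands in for trained LoRA weights.
in_dim, out_dim, rank, alpha = 3, 4, 2, 0.5
scaling = alpha / rank

W = torch.randn(out_dim, in_dim)   # frozen Linear weight, (out_features, in_features)
b = torch.randn(out_dim)           # frozen Linear bias
A = torch.randn(out_dim, rank) / math.sqrt(rank)
B = torch.randn(rank, out_dim)

x = torch.randn(5, in_dim)

# Two-step forward: Linear first, then LoRA on the Linear output.
x1 = F.linear(x, W, b)
two_step = x1 + (x1 @ A @ B) * scaling

# Fold the adapter into M = I + scaling * A @ B, then into the weight and bias.
M = torch.eye(out_dim) + scaling * (A @ B)
W_merged = M.T @ W
b_merged = b @ M
merged = F.linear(x, W_merged, b_merged)

print(torch.allclose(two_step, merged, atol=1e-4))
```

Keeping A and B separate during training is what keeps the number of trained values small; merging them afterwards is an optional step that removes the extra matrix multiplications at inference time.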

For other LoRA resources, see the following:

- [Code LoRA From Scratch](https://lightning.ai/lightning-ai/studios/code-lora-from-scratch) by Sebastian Raschka, 2023, Lightning.AI
- [GitHub:cccntu/minLoRA](https://github.com/cccntu/minLoRA) by Jonathan Chen, 2023, GitHub

For a visual description of how LoRA works, see [Understanding LLM Fine Tuning with Low-Rank Adaptation (LoRA)](https://www.signalpop.com/2024/01/28/understanding-llm-fine-tuning-with-low-rank-adaptation-lora/) by SignalPop, 2024.