# LoRA fine-tuning
> Example of fine-tuning with LoRA

In this notebook we use my custom implementation of LoRA to fine-tune a simple model.

### General Imports

In [1]:
import numpy as np
from tinygrad import Tensor, nn
import copy

# from extra.training import evaluate, train
from utils import *

##### Importing custom LoRA library

In [2]:
from lora_tinygrad import LoRA

### Define a simple model 

In [3]:
class TinyNet:
    def __init__(self):
        self.l1 = nn.Linear(784, 784 * 3, bias=False)
        self.l2 = nn.Linear(784 * 3, 784, bias=False)
        self.l3 = nn.Linear(784, 128, bias=False)
        self.l4 = nn.Linear(128, 10, bias=False)

    def __call__(self, x):
        x = self.l1(x).leakyrelu()
        x = self.l2(x).leakyrelu()
        x = self.l3(x).leakyrelu()
        x = self.l4(x)
        return x

## Model pre-training 

#### Hyperparameters & Fetching Dataset

In [4]:
lr = 1e-3
epochss = 3
BS = 128
n_outputs = 10

X_train, Y_train, X_test, Y_test = fetch_fashion_mnist()
steps = len(X_train) // BS

#### Defining the model and loss function

In [5]:
# Define the model
model = TinyNet()

# Define loss function
lossfn = Tensor.sparse_categorical_crossentropy

#### Training the model

In [6]:
# Pre-training the model
for _ in range(epochss):
    optimizer = nn.optim.Adam(nn.state.get_parameters(model), lr=lr)
    train(model, X_train, Y_train, optimizer, lossfn=lossfn, steps=steps, BS=BS)
    accuracy, Y_test_pred = evaluate(model, X_test, Y_test, return_predict=True)
    lr /= 1.2
    print(f"reducing lr to {lr:.7f}")

loss 0.43 accuracy 0.84: 100%|██████████| 468/468 [00:04<00:00, 100.48it/s]
100%|██████████| 79/79 [00:00<00:00, 205.31it/s]
test set accuracy is 0.841000
reducing lr to 0.0008333

loss 0.29 accuracy 0.88: 100%|██████████| 468/468 [00:04<00:00, 110.80it/s]
100%|██████████| 79/79 [00:00<00:00, 185.02it/s]
test set accuracy is 0.860400
reducing lr to 0.0006944

loss 0.32 accuracy 0.88: 100%|██████████| 468/468 [00:04<00:00, 110.13it/s]
100%|██████████| 79/79 [00:00<00:00, 190.61it/s]
test set accuracy is 0.868700
reducing lr to 0.0005787
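
The learning-rate values printed above follow from dividing the rate by 1.2 after every epoch. As a small check of that arithmetic, here is a sketch with a hypothetical helper, `decayed_lr`, which is not part of the notebook or of `utils`:

```python
# Hypothetical helper (not from the notebook): learning rate after a number
# of completed epochs, assuming it is divided by a constant factor per epoch.
def decayed_lr(initial_lr: float, factor: float, epochs_done: int) -> float:
    return initial_lr / (factor ** epochs_done)

# Same starting rate and factor as the pre-training loop
schedule = [decayed_lr(1e-3, 1.2, k) for k in range(1, 4)]
for k, value in enumerate(schedule, start=1):
    print(f"after epoch {k}: lr = {value:.7f}")
```

Note that `lr` keeps its decayed value after the loop finishes, so any later cell that reads `lr` sees the reduced rate unless it is reassigned.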





#### Get mislabeled predictions

In [7]:
mislabeled_counts = get_mislabeled_counts(Y_test, Y_test_pred, n_output=n_outputs)
pretty_print_mislabeled_counts(mislabeled_counts)

worst_class = max(mislabeled_counts, key=lambda k: mislabeled_counts[k])
print(f"Worst class: {worst_class}")

Class 0: Missing 186
Class 1: Missing 37
Class 2: Missing 230
Class 3: Missing 125
Class 4: Missing 160
Class 5: Missing 85
Class 6: Missing 344
Class 7: Missing 67
Class 8: Missing 45
Class 9: Missing 34
Worst class: 6
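
The helper `get_mislabeled_counts` comes from `utils` and its source is not shown here. A minimal sketch of what such a function could look like, assuming it counts the wrongly predicted test examples per true class (the name `mislabeled_counts_sketch` and the toy labels are made up for illustration):

```python
import numpy as np

def mislabeled_counts_sketch(y_true, y_pred, n_output):
    """Count, per true class, how many examples were predicted wrongly."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    wrong = y_true != y_pred
    return {c: int(np.sum(wrong & (y_true == c))) for c in range(n_output)}

# Tiny made-up example, not the notebook's data
y_true = np.array([0, 0, 1, 1, 2, 2, 2])
y_pred = np.array([0, 1, 1, 1, 0, 1, 2])
counts = mislabeled_counts_sketch(y_true, y_pred, n_output=3)

# Same selection rule as in the cell above: the class with the most misses
worst = max(counts, key=lambda k: counts[k])
print(counts, worst)
```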


## Fine-tuning

Let's start by creating a fine-tuning dataset that focuses on the worst class, to see whether fine-tuning actually improves it.

In [9]:
print(f"Fine-tuning the worst class, {worst_class}..")
lr = 1e-5  # the fine-tuning loops below read `lr`, so the new rate is assigned to it
epochss = 1
BS = 64

# Get a mixture which is mostly filled with the worst class
X_train, Y_train = mix_old_and_new_data(X_train, Y_train, worst_class, ratio = 0.5)
steps = len(X_train) // BS

Fine-tuning the worst class, 6..


ValueError: Sample larger than population or is negative
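
The message above is the one Python's `random.sample` raises when the requested sample size exceeds the population size (or is negative). The source of `mix_old_and_new_data` is not shown, so the following is only a sketch under assumed semantics: it keeps every example of the target class and adds other-class examples until they make up roughly `ratio` of the mix, clamping the sample size to what is available. The name `mix_sketch`, its `seed` parameter, and the toy data are hypothetical.

```python
import numpy as np

def mix_sketch(X, Y, target_class, ratio=0.5, seed=0):
    """Hypothetical stand-in for mix_old_and_new_data (assumed semantics)."""
    if not 0.0 <= ratio < 1.0:
        raise ValueError("ratio must be in [0, 1)")
    rng = np.random.default_rng(seed)
    target_idx = np.flatnonzero(Y == target_class)
    other_idx = np.flatnonzero(Y != target_class)
    # Number of other-class examples needed for them to be `ratio` of the mix
    wanted = int(round(len(target_idx) * ratio / (1.0 - ratio)))
    # Clamp so that sampling never asks for more than the population holds
    n_other = min(wanted, len(other_idx))
    picked = rng.choice(other_idx, size=n_other, replace=False)
    idx = rng.permutation(np.concatenate([target_idx, picked]))
    return X[idx], Y[idx]

# Toy data: 15 examples of class 6 and only 5 examples of other classes
X = np.arange(40, dtype=np.float32).reshape(20, 2)
Y = np.array([6] * 15 + [0, 1, 2, 3, 4])
X_mix, Y_mix = mix_sketch(X, Y, target_class=6, ratio=0.5)
print(X_mix.shape, int((Y_mix == 6).sum()))
```

Without the `min(...)` clamp, this toy call would request 15 other-class examples from a pool of 5, which is the kind of request that sampling without replacement rejects.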

### Fine-tuning without LoRA

Let's first do a full fine-tuning of the model, so that we have a baseline to compare the LoRA results against.

In [None]:
# Creating a copy of the model
model_full_finetuning = copy.deepcopy(model) 

# Finetuning the model
for _ in range(epochss):
    optimizer = nn.optim.Adam(nn.state.get_parameters(model_full_finetuning), lr=lr)
    # Default loss function is sparse_categorical_crossentropy
    train(model_full_finetuning, X_train, Y_train, optimizer, steps=steps, BS=BS)
    accuracy, Y_test_pred = evaluate(model_full_finetuning, X_test, Y_test, return_predict=True)

#### Visualize results

In [None]:
mislabeled_counts = get_mislabeled_counts(Y_test, Y_test_pred, n_output=n_outputs)
pretty_print_mislabeled_counts(mislabeled_counts)

### Fine-tuning with LoRA

Now let's do the LoRA fine-tuning on the same data, with a rank of 16.

In [None]:
# Get the LoRA model from the original model without modifying the original one
lora_model = LoRA.from_module(model, rank=16, inplace=False)

# Fine-tuning the LoRA model
for _ in range(epochss):
    optimizer = nn.optim.Adam(lora_model.parameters(), lr=lr)
    # Default loss function is sparse_categorical_crossentropy
    train(lora_model, X_train, Y_train, optimizer, steps=steps, BS=BS)
    accuracy, Y_test_pred = evaluate(lora_model, X_test, Y_test, return_predict=True)
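
To make the idea behind the rank concrete, here is a NumPy sketch of the standard LoRA formulation for one bias-free linear layer: the pre-trained weight `W` stays frozen, and a low-rank product `B @ A`, scaled by `alpha / rank`, is added to it. Whether `lora_tinygrad` uses exactly this scaling and initialization is an assumption; `lora_forward`, `alpha`, and the random data are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
in_features, out_features, rank, alpha = 784, 128, 16, 16.0

# Frozen pre-trained weight, stored as (out_features, in_features)
W = rng.normal(size=(out_features, in_features)).astype(np.float32)
# Trainable low-rank factors; B starts at zero so the update starts at zero
A = (rng.normal(size=(rank, in_features)) * 0.01).astype(np.float32)
B = np.zeros((out_features, rank), dtype=np.float32)

def lora_forward(x, W, A, B, alpha, rank):
    """Base output plus the scaled low-rank correction (no bias)."""
    base = x @ W.T
    update = (x @ A.T) @ B.T
    return base + (alpha / rank) * update

x = rng.normal(size=(4, in_features)).astype(np.float32)
y_initial = lora_forward(x, W, A, B, alpha, rank)

# Stand-in for training: any non-zero B changes the layer's output
B_trained = (rng.normal(size=B.shape) * 0.1).astype(np.float32)
y_trained = lora_forward(x, W, A, B_trained, alpha, rank)
print(y_initial.shape, np.allclose(y_initial, x @ W.T))
```

Only `A` and `B` would receive gradients in this setup, which is why the optimizer above is given `lora_model.parameters()` rather than all parameters of the model.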

#### Visualize results

In [None]:
mislabeled_counts = get_mislabeled_counts(Y_test, Y_test_pred, n_output=n_outputs)
pretty_print_mislabeled_counts(mislabeled_counts)

#### Show the parameters we trained in the model

In [None]:
original_parameters = sum(p.numel() for p in nn.state.get_parameters(model_full_finetuning))
lora_parameters = sum(p.numel() for p in lora_model.parameters())

print(f"{original_parameters = }")
print(f"{lora_parameters = }")
print(f"Percentage of parameters we update: {(lora_parameters / original_parameters) * 100:.2f}%")
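
Since the cell above has no recorded output, the expected counts can be worked out by hand from the layer shapes in `TinyNet`. This sketch assumes that LoRA is applied to all four `Linear` layers and that each adapted layer gets one `rank x in` factor and one `out x rank` factor; the library may choose differently.

```python
# (in_features, out_features) of the four bias-free Linear layers in TinyNet
layer_shapes = [(784, 784 * 3), (784 * 3, 784), (784, 128), (128, 10)]
rank = 16

# Full fine-tuning updates every weight matrix entry
full_params = sum(i * o for i, o in layer_shapes)
# Assumption: each layer adds rank * (in + out) trainable LoRA parameters
lora_params = sum(rank * (i + o) for i, o in layer_shapes)

print(f"{full_params = }")
print(f"{lora_params = }")
print(f"Percentage: {lora_params / full_params * 100:.2f}%")
```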

## Other functionalities

In this section we test some other features I implemented in the library.

In [None]:
# Getting a random example to test the model
x = Tensor.randn(1, 28, 28).reshape(-1)

# The outputs should differ, which shows that fine-tuning changed the LoRA model
assert not np.allclose(model(x).numpy(), lora_model(x).numpy()), "The outputs are too close!"

# Disable the LoRA parameters
lora_model.disable_lora()

# With LoRA disabled the outputs should match, which shows the original weights are unchanged
assert np.allclose(model(x).numpy(), lora_model(x).numpy()), "The outputs differ!"
print("Everything works as expected")
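
The two assertions rely on the LoRA branch being something that can be switched off without touching the base weights. A toy NumPy sketch of that design follows; `ToyLoRALinear` and its `enabled` flag are hypothetical and are not the library's actual classes.

```python
import numpy as np

class ToyLoRALinear:
    """Toy bias-free linear layer with a switchable low-rank branch."""

    def __init__(self, W, A, B, scale=1.0):
        self.W, self.A, self.B, self.scale = W, A, B, scale
        self.enabled = True

    def __call__(self, x):
        out = x @ self.W.T
        if self.enabled:
            # Low-rank correction is added on top of the frozen base output
            out = out + self.scale * ((x @ self.A.T) @ self.B.T)
        return out

rng = np.random.default_rng(0)
W = rng.normal(size=(10, 784))
A = rng.normal(size=(16, 784))
B = rng.normal(size=(10, 16))
layer = ToyLoRALinear(W, A, B)

x = rng.normal(size=(1, 784))
base = x @ W.T
differs_when_enabled = not np.allclose(layer(x), base)

# Switching the branch off restores the base layer's output exactly
layer.enabled = False
matches_when_disabled = np.allclose(layer(x), base)
print(differs_when_enabled, matches_when_disabled)
```

Because the base weight is never modified, toggling the branch is enough to recover the pre-trained behaviour.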