# Fine-tuning a Large Language Model

In this lecture we will be looking at how to fine-tune an existing pre-trained language model.

## Learning outcomes
* You will learn how to download a pre-trained model and a training dataset from Hugging Face.
* You will learn how to fine-tune the downloaded model with the dataset using the Hugging Face trl library and the supervised fine-tuning (SFT) method.
* You will learn how to use the fine-tuned model to generate text based on user input / prompts.
* You will learn how to upload the fine-tuned model to your own Hugging Face repository so that it can be used later or shared with other users.

## Prerequisites
* You will need the following free accounts: Google, Hugging Face and Weights & Biases. You may use your existing accounts or create new accounts for the purposes of this course.
* We will use the [Hugging Face](https://huggingface.co/) libraries: transformers (for models), datasets (for datasets), trl (for training). We will also store the fine-tuned models in a Hugging Face repository.
* Training is done in [Google Colab](https://colab.research.google.com/), which provides free access to Jupyter notebooks backed by the GPU compute required for fine-tuning.
* For monitoring the training run we will use [Weights & Biases](https://wandb.ai/).
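
Since fine-tuning depends on a GPU runtime, it can help to confirm that one is attached before installing anything. The following is a minimal sketch, not part of the original notebook: the helper name `gpu_visible` is hypothetical, and it assumes an NVIDIA GPU whose driver ships the `nvidia-smi` tool (as on Colab GPU runtimes).

```python
import shutil
import subprocess


def gpu_visible() -> bool:
    """Return True if `nvidia-smi` is installed and lists at least one GPU."""
    if shutil.which("nvidia-smi") is None:
        return False  # no NVIDIA driver tools on this machine
    result = subprocess.run(
        ["nvidia-smi", "-L"],  # -L lists the detected GPUs, one per line
        capture_output=True,
        text=True,
        check=False,
    )
    return result.returncode == 0 and "GPU" in result.stdout


print("GPU runtime detected" if gpu_visible() else "No GPU detected: check the Colab runtime type")
```

If no GPU is reported in Colab, the runtime type can usually be changed from the *Runtime* menu.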


## Fine-tuning

Let's first install the prerequisites using pip, Python's package manager:

In [None]:
!pip install transformers peft accelerate
!pip install -q trl xformers wandb datasets einops sentencepiece bitsandbytes


Then we need to import the required libraries

In [None]:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="openai-community/gpt2")

# Load the tokenizer and model directly
from transformers import GPT2Tokenizer, GPT2Model

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2Model.from_pretrained('gpt2')

# Libraries used in the rest of the notebook
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments, TextStreamer
from peft import LoraConfig, PeftModel, prepare_model_for_kbit_training
import torch, wandb
from datasets import load_dataset
from trl import SFTTrainer
from huggingface_hub import notebook_login

Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.


We will download a pre-trained large language model from Hugging Face and a dataset to train the model with. Below we assign these to variables we will use later. We will also set the name of the repository and model for the fine-tuned model.

In [None]:
# Pre trained model
#model_name = "mistralai/Mistral-7B-v0.3"
model_name = "openai-community/gpt2"

# Dataset name
from datasets import load_dataset

ds = load_dataset("Israhassan/Shakespeare")

dataset_name = "Israhassan/Shakespeare"
#"vicgalle/alpaca-gpt4"
#https://huggingface.co/datasets/BEE-spoke-data/wikipedia-20230901.en-deduped/resolve/main/README.md


# Hugging Face repository for the fine-tuned model (create a new repository on Hugging Face and paste its name here)
new_model = "Litantti/vkneljae"

To access your Hugging Face account, you need to log in. First go to your Hugging Face account, click *Settings* and select *Access Tokens*. Create a new token with write access (we will upload the model later) and copy it. Then run the login command below and paste the access token when asked.

In [None]:
notebook_login()


Let's then download a subset of the dataset we want to use. Below we limit the dataset to a slice of 150 examples (rows 350 to 499 of the training split) in order to save time. In real life you would probably use the full dataset.

In [None]:
dataset = load_dataset(dataset_name, split="train[350:500]")
dataset["text"][0]

"We must not suppose that Othello's account of his courtship in hisfamous speech before the Senate is intended to be exhaustive. He isaccused of having used drugs or charms in order to win Desdemona; andtherefore his purpose in his defence is merely to show that hiswitchcraft was the story of his life. It is no part of his business totrouble the Senators with the details of his courtship, and he socondenses his narrative of it that it almost appears as though there wasno courtship at all, and as though Desdemona never imagined that he wasin love with her until she had practically confessed her love for him. Hence she has been praised by some for her courage, and blamed by othersfor her forwardness. But at III. iii. 70 f. matters are presented in quite a new light. Therewe find the following words of hers:                             What! Michael Cassio,     That came a-wooing with you, and so many a time,     When I have spoke of you dispraisingly,     Hath ta'en your part. It seems, 

Let's then download the model. We first create a config object for quantization of the model using bitsandbytes. Bitsandbytes enables accessible large language models via k-bit quantization for PyTorch.
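
To see why quantization matters on a free GPU, it helps to estimate how much memory the weights alone occupy at different precisions. The sketch below is a back-of-the-envelope calculation, not a measurement: the function name `weight_memory_gb` is hypothetical, the parameter counts are rounded assumptions, and the estimate ignores quantization constants, activations, gradients and optimizer state.

```python
def weight_memory_gb(num_params: int, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Rounded parameter counts, assumed for illustration only.
models = {"gpt2 (about 124M parameters)": 124_000_000, "7B-class model": 7_000_000_000}

for name, n in models.items():
    fp16 = weight_memory_gb(n, 16)
    nf4 = weight_memory_gb(n, 4)
    print(f"{name}: fp16 ~{fp16:.2f} GB, 4-bit ~{nf4:.2f} GB")
```

For a 7B-class model the 16-bit weights alone come close to the 16 GB of a T4, while the 4-bit version leaves room for training. For a model as small as GPT-2 the saving is marginal.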

We also need to download the tokenizer.

In [None]:
# 4-bit NF4 quantization settings used when loading the model
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=False,
)

# Load the PyTorch causal-LM model (with a language-modeling head) in 4-bit on GPU 0.
# A TensorFlow model such as TFGPT2Model cannot be used with peft or SFTTrainer.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map={"": 0},
)
model = prepare_model_for_kbit_training(model)
model.config.use_cache = False  # silence the warnings. Please re-enable for inference!
model.config.pretraining_tp = 1

# Load the matching tokenizer; GPT-2 has no padding token, so reuse the end-of-sequence token
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token



Below we log in to Weights & Biases with an API key. Copy your own key from your account at [https://wandb.ai](https://wandb.ai).

In [None]:
# Log in to Weights & Biases for monitoring; paste your API key when prompted
# (avoid hard-coding the key in a shared notebook)
wandb.login()
run = wandb.init(project='gpt2-shake', job_type="training", anonymous="allow")
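
An API key is a credential, so one option is to keep it out of the notebook and read it from the environment instead. The sketch below assumes the key has been stored in the `WANDB_API_KEY` environment variable, which the wandb client also reads on its own. The helper name `wandb_key_from_env` is hypothetical.

```python
import os


def wandb_key_from_env(env_var: str = "WANDB_API_KEY") -> str:
    """Return the API key stored in the given environment variable."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set {env_var} before starting the run")
    return key


# Hypothetical usage in the notebook (requires the wandb package):
#   wandb.login(key=wandb_key_from_env())
```

This keeps the key out of the saved notebook and out of any repository the notebook is pushed to.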




Then we'll create a configuration for the low-rank adaptation (LoRA) method we will use.

In [None]:
peft_config = LoraConfig(
    lora_alpha=8,
    lora_dropout=0.1,
    r=16,
    bias="none",
    task_type="CAUSAL_LM",
    # GPT-2 layer names; for Mistral/Llama-style models use "q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj"
    target_modules=["c_attn", "c_proj", "c_fc"],
)
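
To get a feel for what `r` and `lora_alpha` control, the following sketch counts the trainable parameters a LoRA adapter adds to one linear layer and computes the standard LoRA scaling factor `lora_alpha / r`. The function names are hypothetical. The 4096 x 4096 layer size is an assumed example, matching the `q_proj` layer shown in the model printout later in this notebook.

```python
def lora_extra_params(in_features: int, out_features: int, r: int) -> int:
    """Trainable parameters added by one adapter: A is (r x in), B is (out x r)."""
    return r * in_features + out_features * r


def lora_scaling(lora_alpha: int, r: int) -> float:
    """Factor applied to the low-rank update B @ A in standard LoRA."""
    return lora_alpha / r


# Assumed example: a 4096 x 4096 projection with the rank used above.
full = 4096 * 4096
added = lora_extra_params(4096, 4096, r=16)
print(f"full layer: {full:,} weights, LoRA adds {added:,} ({added / full:.2%})")
print(f"scaling = {lora_scaling(lora_alpha=8, r=16)}")
```

Only the small `A` and `B` matrices are trained, and the frozen base weights stay untouched, which is what makes fine-tuning feasible on a single small GPU.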

We need to set the training arguments for the training run.

In [None]:
training_arguments = TrainingArguments(
    output_dir="./results",
    num_train_epochs=1,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,
    optim="paged_adamw_8bit",
    save_steps=1000,
    logging_steps=30,
    learning_rate=2e-4,
    weight_decay=0.001,
    fp16=False,
    bf16=False,
    max_grad_norm=0.3,
    max_steps=-1,
    warmup_ratio=0.3,
    group_by_length=True,
    lr_scheduler_type="linear",
    report_to="wandb",
)
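
The batch size, gradient accumulation and warmup settings interact, so it is worth working out what they imply for a given dataset size. The sketch below is an approximation under stated assumptions: a single GPU, a 150-example dataset, and a rounding rule (ceil for batches, floor for accumulation) that may differ between versions of the Trainer. The function names are hypothetical.

```python
import math


def optimizer_steps_per_epoch(num_examples: int, per_device_batch: int, grad_accum: int) -> int:
    """Approximate number of optimizer updates in one epoch on a single device."""
    batches = math.ceil(num_examples / per_device_batch)  # the last batch may be partial
    return max(batches // grad_accum, 1)


def warmup_steps(total_steps: int, warmup_ratio: float) -> int:
    """Number of steps during which the learning rate ramps up."""
    return math.ceil(total_steps * warmup_ratio)


steps = optimizer_steps_per_epoch(num_examples=150, per_device_batch=8, grad_accum=2)
print("effective batch size:", 8 * 2)
print("optimizer steps per epoch:", steps)
print("warmup steps:", warmup_steps(steps, 0.3))
```

With so few optimizer steps, `logging_steps=30` and `save_steps=1000` would never trigger during the run, so smaller values may be preferable when experimenting on a small slice.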

Finally we create the trainer object that uses supervised fine-tuning (SFT) as the training method.

In [None]:
# Setting sft parameters


from trl import SFTTrainer

# Create the trainer object
trainer = SFTTrainer(
    model=model,
    args=training_arguments,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
    max_seq_length=512,  # Adjust as needed
    dataset_text_field="text",  # Field in your dataset containing the text
)



#trainer = SFTTrainer(
#    model=model,
#    train_dataset=dataset,
#    peft_config=peft_config,
#    max_seq_length= None,
#    dataset_text_field="text",
#    tokenizer=tokenizer,
#    args=training_arguments,
#    packing=False,
#)


Deprecated positional argument(s) used in SFTTrainer, please use the SFTConfig to set these arguments instead.


AttributeError: 'TFGPT2Model' object has no attribute 'named_modules'

Then we can execute the training run. With a dataset of 10,000 samples, this takes approximately 8 hours on the T4 GPU available in Colab; a smaller slice finishes correspondingly faster.

In [None]:
# Train model
trainer.train()

NameError: name 'trainer' is not defined

In [None]:
# Save the fine-tuned model
trainer.model.save_pretrained(new_model)
wandb.finish()
model.config.use_cache = True
model.eval()


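Run history:

| Metric | Value |
|---|---|
| train/epoch | ▁ |
| train/global_step | ▁ |

Run summary:

| Metric | Value |
|---|---|
| total_flos | 1308289571880960.0 |
| train/epoch | 0.92308 |
| train/global_step | 6.0 |
| train_loss | 1.33972 |
| train_runtime | 413.3441 |
| train_samples_per_second | 0.242 |
| train_steps_per_second | 0.015 |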


MistralForCausalLM(
  (model): MistralModel(
    (embed_tokens): Embedding(32768, 4096)
    (layers): ModuleList(
      (0-31): 32 x MistralDecoderLayer(
        (self_attn): MistralSdpaAttention(
          (q_proj): lora.Linear4bit(
            (base_layer): Linear4bit(in_features=4096, out_features=4096, bias=False)
            (lora_dropout): ModuleDict(
              (default): Dropout(p=0.1, inplace=False)
            )
            (lora_A): ModuleDict(
              (default): Linear(in_features=4096, out_features=16, bias=False)
            )
            (lora_B): ModuleDict(
              (default): Linear(in_features=16, out_features=4096, bias=False)
            )
            (lora_embedding_A): ParameterDict()
            (lora_embedding_B): ParameterDict()
            (lora_magnitude_vector): ModuleDict()
          )
          (k_proj): lora.Linear4bit(
            (base_layer): Linear4bit(in_features=4096, out_features=1024, bias=False)
            (lora_dropout): ModuleDi

In [None]:
def stream(user_prompt):
    runtimeFlag = "cuda:0"
    system_prompt = 'Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n'
    B_INST, E_INST = "### Instruction:\n", "### Response:\n"
    prompt = f"{system_prompt}{B_INST}{user_prompt.strip()}\n\n{E_INST}"
    inputs = tokenizer([prompt], return_tensors="pt").to(runtimeFlag)
    streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
    _ = model.generate(**inputs, streamer=streamer, max_new_tokens=500)
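
To see exactly what text the model receives, the prompt construction inside `stream` can be pulled out into a function of its own. This sketch reuses the same template strings. The name `build_prompt` is hypothetical, and no model or GPU is needed to run it.

```python
SYSTEM_PROMPT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
)
B_INST, E_INST = "### Instruction:\n", "### Response:\n"


def build_prompt(user_prompt: str) -> str:
    """Wrap the user's text in the instruction/response template used by stream()."""
    return f"{SYSTEM_PROMPT}{B_INST}{user_prompt.strip()}\n\n{E_INST}"


print(build_prompt("  what is newtons 3rd law and its formula  "))
```

The template only helps if the fine-tuning data followed the same instruction/response layout. A model trained on plain text will simply continue the prompt.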

In [None]:
stream("what is newtons 3rd law and its formula")

NameError: name 'stream' is not defined

In [None]:
# Reload the base model in half precision and merge the trained LoRA adapter into it
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map={"": 0},
)
model = PeftModel.from_pretrained(base_model, new_model)
model = model.merge_and_unload()

# Reload the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

In [None]:
model.push_to_hub(new_model)
tokenizer.push_to_hub(new_model)

NameError: name 'model' is not defined