## FINE-TUNING
Here is an example of fine-tuning a pre-trained model to generate prompts. We will ask the model to act like "a chemistry expert" and create prompts.

### 1) SET UP THE ENVIRONMENT

Install required libraries.


In [None]:
!pip install -q peft==0.10.0
!pip install -q datasets==2.19.0

### 2) LOAD THE MODEL

We are using Bloom, one of the smallest model families available that can be trained with the PEFT library.

We can choose between a "big" and a "small" Bloom model. In this case we chose the small one (bigscience/bloomz-560m) instead of the big one (bigscience/bloom-1b1), which would be more computationally expensive. The smaller model takes less time to train and avoids memory problems in Colab.

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloomz-560m"
device = "cpu" #"cuda" for NVIDIA GPUs, or "cpu" for no GPU.

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map=device)

> Passing "device" as device_map loads the model directly on the chosen hardware. It is set to "cpu" in this cell; change it to "cuda" if an NVIDIA GPU is available, which makes processing considerably faster.
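As a minimal sketch, the device could also be chosen automatically instead of being hardcoded. The helper name pick_device is hypothetical (it is not part of the original notebook), and the sketch assumes PyTorch is the backend; it falls back to "cpu" when torch cannot be imported.

```python
def pick_device():
    """Return "cuda" when PyTorch can see an NVIDIA GPU, otherwise "cpu"."""
    try:
        import torch  # assumption: PyTorch is the backend used by transformers
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(device)
```

The resulting string can be passed as device_map and to input_sentences.to(device) exactly like the hardcoded value.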

### 3) INFERENCE WITH THE PRE-TRAINED MODEL

This step is just to test the performance of the pre-trained model without fine-tuning, to see if something changes after the fine-tuning process.

We want the model to generate a prompt acting as a chemistry expert. In this step of the process, we will ask the model to do it. Then, after fine-tuning, we'll ask the same and see the differences between the two generated prompts.


The next function, get_outputs, generates text from a language model given input tokens.

Parameters:

- model: the pre-trained model used for text generation.

- inputs: tokenized input (includes input_ids and attention_mask).

- max_new_tokens: maximum number of new tokens the model can generate.

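- repetition_penalty: penalty applied to tokens that already appear in the sequence, to discourage repetition (values above 1.0 penalize repeated tokens).

- early_stopping: controls when beam search stops. It only has an effect when generating with several beams (num_beams > 1); with the default greedy decoding used here, generation stops at the end-of-sequence token or when max_new_tokens is reached.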

- eos_token_id: end-of-sequence token to stop generation.

The function returns the generated token sequence based on the input and model configuration.

In [None]:
def get_outputs(model, inputs, max_new_tokens=100):
    outputs = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=max_new_tokens,
        repetition_penalty=1.5,
        early_stopping=True,
        eos_token_id=tokenizer.eos_token_id
    )
    return outputs
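To make the effect of repetition_penalty more concrete, here is a plain-Python sketch of the rule applied to the scores (logits) of tokens that were already generated: positive scores are divided by the penalty and negative scores are multiplied by it. The function name and the toy values are illustrative assumptions; the real implementation works on tensors inside model.generate.

```python
def apply_repetition_penalty(logits, previous_token_ids, penalty=1.5):
    """Lower the score of every token id that already appears in the sequence."""
    adjusted = list(logits)
    for token_id in set(previous_token_ids):
        score = adjusted[token_id]
        # Dividing a positive score or multiplying a negative one both make
        # the token less likely to be picked again.
        adjusted[token_id] = score / penalty if score > 0 else score * penalty
    return adjusted


# Toy vocabulary of 4 tokens; tokens 0 and 1 were already generated.
logits = [3.0, -1.0, 0.6, 2.0]
adjusted = apply_repetition_penalty(logits, previous_token_ids=[0, 1])
print(adjusted)  # [2.0, -1.5, 0.6, 2.0]
```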

Now it is time to run inference with the original model:

In [None]:
# Use exactly the same prompt as in step 7, so that the two outputs are comparable
input_sentences = tokenizer("I want you to act as a chemistry expert.", return_tensors="pt")
outputs_sentence = get_outputs(model, input_sentences.to(device), max_new_tokens=50)

print(tokenizer.batch_decode(outputs_sentence, skip_special_tokens=True))

### 4) LOAD AND PREPARE THE DATASET

To prepare the model to act as a chemistry expert, we fine-tune it on a dataset of prompts written for LLMs, so that the model can learn from the example prompts.

The dataset is:

https://huggingface.co/datasets/fka/awesome-chatgpt-prompts

In these cells we:

- load the dataset
- tokenize the data using the map function, which applies the tokenizer to the "prompt" column of the dataset, processing multiple samples at once (batched=True)
- select a subset (in this case the first 50 samples from the "train" split of the dataset)
- remove the 'act' column from the dataset (we just want the prompts)

In [None]:
from datasets import load_dataset
dataset = "fka/awesome-chatgpt-prompts"

data = load_dataset(dataset)
display(data)

In [None]:
data = data.map(lambda samples: tokenizer(samples["prompt"]), batched=True)
train_sample = data["train"].select(range(50))

train_sample = train_sample.remove_columns('act')

display(train_sample)
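The following plain-Python imitation illustrates what a batched map passes to the mapped function: a dictionary of column lists rather than a single row. Both toy_tokenizer and batched_map are hypothetical stand-ins written for this illustration; they are not the datasets or transformers APIs.

```python
def toy_tokenizer(texts):
    """Hypothetical stand-in for the real tokenizer: one id per word (its length)."""
    input_ids = [[len(word) for word in text.split()] for text in texts]
    attention_mask = [[1] * len(ids) for ids in input_ids]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


def batched_map(columns, fn, batch_size=2):
    """Call fn on slices of the columns and add the columns it returns."""
    n_rows = len(next(iter(columns.values())))
    new_columns = {}
    for start in range(0, n_rows, batch_size):
        batch = {name: values[start:start + batch_size]
                 for name, values in columns.items()}
        for name, values in fn(batch).items():
            new_columns.setdefault(name, []).extend(values)
    return {**columns, **new_columns}


toy_data = {
    "act": ["Chemist", "Poet", "Chef"],
    "prompt": ["I want you to act", "Write a poem", "Cook"],
}
tokenized = batched_map(toy_data, lambda samples: toy_tokenizer(samples["prompt"]))
# Keep the first 2 rows and drop the "act" column, mirroring select and remove_columns.
train_columns = {name: values[:2] for name, values in tokenized.items() if name != "act"}
print(train_columns)
```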

### 5) SET UP TRAINING CONFIGURATION FOR THE FINE-TUNING PROCESS

#### 1. Create a LoRA config.

This code block defines a LoRA configuration using the peft library. Here is an explanation of the parameters:

- r: the rank of the adaptation matrices. The larger r is, the more parameters there are to train: increasing r gives more expressive power but also requires more computational resources for training.

- lora_alpha: scaling factor for the LoRA update, which is multiplied by lora_alpha / r before being added to the frozen weights. Larger values make the adapter's contribution stronger. Here it is set to 1.

- target_modules: specifies the layers/modules to be adapted. Each model architecture has its own module names, which are targeted to efficiently modify key parts of the model during the LoRA adaptation process. You can check some of them here: https://github.com/huggingface/peft/blob/39ef2546d5d9b8f5f8a7016ec10657887a867041/src/peft/utils/other.py#L220

- lora_dropout: dropout probability applied to the LoRA layers during training; it helps to avoid overfitting.

- bias="lora_only": controls which bias terms are trained. The options are "none", "all" and "lora_only"; with "lora_only", the bias terms of the layers adapted with LoRA are trained together with the LoRA matrices, while the rest of the model stays frozen.

- task_type="CAUSAL_LM": specifies the type of task for the model.
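The roles of r and lora_alpha can be sketched with tiny matrices in plain Python: the frozen weight W is left untouched and a low-rank product B·A, scaled by lora_alpha / r, is added to it. The matrices and helper names are made up for illustration; in a real LoRA layer B starts at zero, so the adapter initially changes nothing.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(W, A, B, lora_alpha, r):
    """Return W + (lora_alpha / r) * (B @ A), the weight seen at inference time."""
    scaling = lora_alpha / r
    delta = matmul(B, A)  # same shape as W, but built from only r * (d_in + d_out) values
    return [[w + scaling * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]


W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]          # frozen weight, shape (d_out=2, d_in=3)
A = [[1.0, 0.0, 2.0],
     [0.0, 2.0, 0.0]]          # trainable, shape (r=2, d_in=3)
B = [[2.0, 0.0],
     [0.0, 4.0]]               # trainable, shape (d_out=2, r=2)

W_eff = lora_effective_weight(W, A, B, lora_alpha=1, r=2)
print(W_eff)  # [[2.0, 0.0, 2.0], [0.0, 5.0, 0.0]]
```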

In [None]:
import peft
from peft import LoraConfig, get_peft_model, PeftModel

lora_config = LoraConfig(
    r=4,
    lora_alpha=1,
    target_modules=["query_key_value"],
    lora_dropout=0.05,
    bias="lora_only",
    task_type="CAUSAL_LM"
)

#### 2. Create the PEFT model

Build the model to be fine-tuned by applying LoRA with the chosen configuration.

In [None]:
peft_model = get_peft_model(model, lora_config)
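# print_trainable_parameters() prints the counts itself and returns None,
# so wrapping it in print() would only add a stray "None" line.
peft_model.print_trainable_parameters()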

We can see that the number of trainable parameters is really small compared with the total number of parameters in the pre-trained model.
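As a rough cross-check, the number of trainable parameters can be estimated by hand. The sketch below assumes a hidden size of 1024 and 24 transformer layers for bloomz-560m, and that each "query_key_value" module is a linear layer mapping hidden_size to 3 * hidden_size; these values are assumptions that should be verified against model.config.

```python
def lora_param_count(d_in, d_out, r):
    """Parameters in one LoRA pair: A has shape (r, d_in), B has shape (d_out, r)."""
    return r * (d_in + d_out)


hidden_size = 1024   # assumption, check model.config.hidden_size
num_layers = 24      # assumption, check model.config.n_layer
r = 4                # same rank as in lora_config

lora_params = lora_param_count(hidden_size, 3 * hidden_size, r) * num_layers
# bias="lora_only" also trains the bias vector of every adapted module.
bias_params = 3 * hidden_size * num_layers
trainable = lora_params + bias_params

print(f"LoRA matrices: {lora_params:,}")   # 393,216
print(f"Biases:        {bias_params:,}")   # 73,728
print(f"Total:         {trainable:,}")     # 466,944
```

If the assumptions hold, the total should agree with the trainable parameter count printed by the cell above.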

Now we define the directory that will contain the model outputs.

In [None]:
import os
working_dir = './'

output_directory = os.path.join(working_dir, "peft_lab_outputs")

#### 3. Create the TrainingArgs

In this cell we set the configuration for model training, including the hyperparameters, with "TrainingArguments".

- output_dir: specifies where to save the model's outputs (the output directory defined in the previous cell).

- auto_find_batch_size=True: automatically finds a batch size that fits in memory, retrying with a smaller batch size if an out-of-memory error occurs.

- learning_rate=3e-2: determines how much to adjust the model's weights with respect to the loss gradient at each step. A small learning rate makes training slower but more stable; a large learning rate speeds up training but can overshoot and make it unstable. The value used here is higher than is usual for full fine-tuning, since only the small set of LoRA parameters is updated.

- num_train_epochs=2: number of times to "loop over" the training dataset.

- use_cpu=False: specifies whether to force training on the CPU (we set it to False, to use a GPU if available).

In [None]:
import transformers
from transformers import TrainingArguments, Trainer
training_args = TrainingArguments(
    output_dir=output_directory,
    auto_find_batch_size=True,
    learning_rate=3e-2,
    num_train_epochs=2,
    use_cpu=False
)
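To get a feeling for how long training will run, the number of optimizer steps can be estimated from these settings. This sketch assumes the 50 training samples selected earlier, a single device, no gradient accumulation, and a batch size of 8 (the default per-device batch size of TrainingArguments, assuming auto_find_batch_size does not have to reduce it).

```python
import math


def estimate_training_steps(num_samples, batch_size, num_epochs):
    """One step per batch; the last batch of an epoch may be smaller."""
    steps_per_epoch = math.ceil(num_samples / batch_size)
    return steps_per_epoch * num_epochs


total_steps = estimate_training_steps(num_samples=50, batch_size=8, num_epochs=2)
print(total_steps)  # 14
```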

### 6) TRAIN THE MODEL

(The fine-tuning happens here)

For training the model we need:
- the PEFT model
- the training_args
- the dataset
- a data collator (DataCollatorForLanguageModeling), which groups the tokenized samples into padded batches

A wandb API key may be requested before training starts. Weights & Biases (wandb) uses API keys to log training metrics and other details to the wandb platform, which provides visualization and tracking tools that help to monitor and analyze ML projects.

We didn't explicitly add wandb to our code, but some machine learning libraries, like Hugging Face's Transformers, integrate with wandb automatically.

In this case we don't need these tracking features, so we disable wandb. We can also make sure that wandb is not called during training by explicitly removing the WandbCallback before training starts.

In [None]:
import os
import transformers

os.environ["WANDB_MODE"] = "disabled"  # Disable W&B globally

# Remove potential existing WandbCallbacks before training
# trainer.remove_callback(transformers.integrations.WandbCallback)

In [None]:
trainer = Trainer(
    model=peft_model,
    args=training_args,
    train_dataset=train_sample,
    data_collator=transformers.DataCollatorForLanguageModeling(
        tokenizer,
        mlm=False)
)
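With mlm=False, the collator prepares batches for causal language modeling: sequences are padded to a common length and the labels are a copy of the input ids in which padding positions are replaced by -100, so that they are ignored by the loss. The plain-Python sketch below imitates that behavior; the function name is hypothetical, and it pads on the right for simplicity, whereas the real padding side depends on the tokenizer.

```python
def collate_for_causal_lm(sequences, pad_token_id, ignore_index=-100):
    """Pad token id lists to the same length and build labels for the loss."""
    max_len = max(len(seq) for seq in sequences)
    batch = {"input_ids": [], "attention_mask": [], "labels": []}
    for seq in sequences:
        n_pad = max_len - len(seq)
        batch["input_ids"].append(seq + [pad_token_id] * n_pad)
        batch["attention_mask"].append([1] * len(seq) + [0] * n_pad)
        # Labels equal the inputs; the model shifts them internally to predict
        # the next token. Padding is masked out with ignore_index.
        batch["labels"].append(seq + [ignore_index] * n_pad)
    return batch


batch = collate_for_causal_lm([[5, 6, 7], [8]], pad_token_id=0)
print(batch["input_ids"])  # [[5, 6, 7], [8, 0, 0]]
print(batch["labels"])     # [[5, 6, 7], [8, -100, -100]]
```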

In [None]:
trainer.train()

Now we save and then "load again" the model.

In [None]:
peft_model_path = os.path.join(output_directory, "lora_model")

trainer.model.save_pretrained(peft_model_path)

In [None]:
loaded_model = PeftModel.from_pretrained(model,
                                         peft_model_path,
                                         is_trainable=False)
loaded_model.to(device)

### 7) INFERENCE WITH THE FINE-TUNED MODEL

Now we ask the model the same thing: to create a prompt acting as a chemistry expert. This way, we can see the differences between the model's answers before and after fine-tuning.

In [None]:
input_sentences = tokenizer("I want you to act as a chemistry expert.", return_tensors="pt")
finetuned_outputs_sentence = get_outputs(loaded_model,
                                         input_sentences.to(device),
                                         max_new_tokens=50)

print(tokenizer.batch_decode(finetuned_outputs_sentence, skip_special_tokens=True))

### 8) EXAMINING AND COMPARING THE RESULTS

In [None]:
print('COMPARING THE TWO ANSWERS:\n')
print('Pre-trained model: \n')
display(tokenizer.batch_decode(outputs_sentence, skip_special_tokens=True))
print('\n')
print('Fine-tuned model: \n')
display(tokenizer.batch_decode(finetuned_outputs_sentence, skip_special_tokens=True))
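Because the decoded outputs start with the prompt itself, it can be easier to compare only the newly generated text. The helper below is a hypothetical convenience, not part of the original notebook, and the example string is a placeholder rather than real model output.

```python
def strip_prompt(prompt, decoded_texts):
    """Return only the text generated after the prompt for each decoded output."""
    stripped = []
    for text in decoded_texts:
        if text.startswith(prompt):
            text = text[len(prompt):]
        stripped.append(text.strip())
    return stripped


prompt = "I want you to act as a chemistry expert."
example_outputs = [prompt + " PLACEHOLDER CONTINUATION"]  # placeholder, not model output
print(strip_prompt(prompt, example_outputs))  # ['PLACEHOLDER CONTINUATION']
```

In the notebook, the same helper could be applied to the lists returned by tokenizer.batch_decode for both models.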

Even though the model is trained with minimal resources to be computationally efficient, the difference between the responses of the pre-trained model and the fine-tuned model is clear in how they handle the same task.

- The pre-trained model provides a general response without specific adjustments for chemistry.

- The fine-tuned model, having learned from the example prompts, generates a more domain-specific answer that takes the chemistry context into account. The answer can include chemistry-related language.