Let's define a wrapper function that returns the model's completion for a user question.

## Step 1 - Install necessary packages
First, install the dependencies below to get started. The packages are installed offline from local Kaggle datasets (`--no-index --find-links ...`), so no internet access is needed.

In [1]:
!pip install -q -U transformers --no-index --find-links ../input/llm-detect-pip/

In [2]:
!pip install -q -U accelerate --no-index --find-links ../input/llm-detect-pip/
!pip install -q -U bitsandbytes --no-index --find-links ../input/transformers-4-38-2/
!pip install -q -U transformers --no-index --find-links ../input/llm-detect-pip/
!pip install /kaggle/input/peft-whl-latest/peft-0.10.0-py3-none-any.whl

Processing /kaggle/input/peft-whl-latest/peft-0.10.0-py3-none-any.whl
Installing collected packages: peft
Successfully installed peft-0.10.0


In [3]:
!pip install -q -U peft --no-index --find-links ../input/llm-pkg/

## Step 2 - Model loading
We'll load the model with 4-bit quantization (the setup used by QLoRA) to reduce memory usage.


In [4]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig


# 4-bit NF4 quantization config for the base model (Mistral 7B)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=False,
)


model_id = "/kaggle/input/mistral-7b-it-v02"

model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map={"":0})
tokenizer = AutoTokenizer.from_pretrained(model_id, add_eos_token=True)


In [5]:
from peft import PeftConfig, PeftModel
adapter_model_name = "/kaggle/input/fine-tuned-mistral"
model = PeftModel.from_pretrained(model, adapter_model_name)

In the cells above, we specified the model ID, loaded the base model with the previously defined quantization configuration, and attached the fine-tuned adapter.
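To get a feel for why 4-bit loading matters, here is a rough back-of-the-envelope sketch of the memory needed for the weights alone. The parameter count (about 7.24 billion for Mistral 7B) and the 0.5 bits per parameter of overhead for quantization constants without double quantization are assumptions, and activations, the KV cache, and optimizer state are ignored.

```python
def weight_memory_gib(n_params: float, bits_per_param: float) -> float:
    """Approximate memory (GiB) needed to hold the weights only."""
    return n_params * bits_per_param / 8 / 2**30

# Assumption: roughly 7.24e9 parameters for Mistral 7B.
N_PARAMS = 7.24e9

bf16_gib = weight_memory_gib(N_PARAMS, 16)
# Assumption: 4-bit weights plus ~0.5 bits/param for per-block constants.
nf4_gib = weight_memory_gib(N_PARAMS, 4.5)

print(f"bf16: {bf16_gib:.1f} GiB, 4-bit: {nf4_gib:.1f} GiB")
```

The exact footprint on a real GPU will differ, since some layers may stay in higher precision.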

Run an inference on the base model. The model does not seem to understand our instruction and gives us a list of questions related to our query.

## Step 3 - Load dataset for finetuning

In [6]:
import pandas as pd
df = pd.read_csv('/kaggle/input/all-in-one-dataset-with-embedding/df_with_emb_20240405.csv')

def create_prompt(original, rewritten):
    return f"Given are 2 essays, the Rewritten essay was created from the Original essay using the google Gemma model.\nYou are trying to understand how the original essay was transformed into a new version.\nAnalyzing the changes in style, theme, etc., please come up with a prompt that must have been used to guide the transformation from the original to the rewritten essay.\nOnly give me the PROMPT. Start directly with the prompt, that's all I need.\nOutput should be only line ONLY\nOriginal Essay: {original}\nRewritten Essay: {rewritten}"

# Apply the function to each row to create the prompt column
df['prompt'] = df.apply(lambda row: create_prompt(row['original_text'], row['rewritten_text']), axis=1)

In [7]:
df.head(10)

Unnamed: 0,original_text,rewrite_prompt,rewritten_text,dataset_id,original_text_emb_0,original_text_emb_1,original_text_emb_2,original_text_emb_3,original_text_emb_4,original_text_emb_5,...,rewritten_text_emb_759,rewritten_text_emb_760,rewritten_text_emb_761,rewritten_text_emb_762,rewritten_text_emb_763,rewritten_text_emb_764,rewritten_text_emb_765,rewritten_text_emb_766,rewritten_text_emb_767,prompt
0,"Dear Randy,\n\nI hope this letter finds you we...",Rephrase this letter to infuse it with an elfi...,"Dear Randy,\n\nMay this enchanted message find...",host,-0.028678,-0.07226,-0.003482,0.050595,-0.0106,-0.023786,...,0.011079,-0.026521,-0.035838,-0.011306,0.03569,-0.00581,0.004506,-0.037956,0.001139,"Given are 2 essays, the Rewritten essay was cr..."
1,"This quilt, that my mother made, \n \n Still m...",Regency Romance: Model the text on a Regency r...,"The softest brown and brightest blue quilt, cr...",nbroad_1,-0.007521,-0.082697,0.041606,0.04802,0.00787,-0.018765,...,-0.019648,-0.044074,-0.049267,0.000438,0.047913,0.042219,0.012958,-0.048047,0.006332,"Given are 2 essays, the Rewritten essay was cr..."
2,It's the job of our agency to keep track of th...,Write like Ernest Hemingway: Focus on Hemingwa...,The agency's responsibility is to track and co...,nbroad_1,-0.012435,-0.057448,0.04129,0.021997,-0.005588,-0.011431,...,-0.024727,-0.005645,-0.049009,-0.007487,0.029255,0.023511,0.006769,-0.041317,-0.012102,"Given are 2 essays, the Rewritten essay was cr..."
3,"The first punch gets me right in the ribs, kno...",Grimm's Fairy Tales: Adapt the text to mimic t...,"In the sweltering sun, the stench of sweat and...",nbroad_1,0.023922,-0.051777,0.037086,0.057018,0.006423,0.006742,...,0.00621,-0.01069,-0.05732,0.015437,0.041056,0.012486,-0.00654,-0.030836,-0.00044,"Given are 2 essays, the Rewritten essay was cr..."
4,Some nights I lay awake staring at the ceiling...,High Fantasy Epic: Transform the essay into a ...,In the tapestry of the ethereal realm of Eldri...,nbroad_1,-0.002841,-0.056669,0.049056,0.072066,-0.00643,-0.020034,...,0.014793,-0.024928,-0.052689,-0.005655,0.044724,0.016817,-0.001,-0.026472,0.008088,"Given are 2 essays, the Rewritten essay was cr..."
5,"I can hardly read the letter, because the hand...",Fairy Tale Villain: Use the menacing and craft...,"My hand quivered as I clutched the letter, the...",nbroad_1,-0.037955,-0.077311,0.039333,0.026843,-0.003027,-0.016087,...,-0.021211,0.00188,-0.032622,0.013712,0.018871,0.021326,0.011081,-0.025262,-0.015825,"Given are 2 essays, the Rewritten essay was cr..."
6,`` They do n't believe we're interesting?'' on...,"Beat Generation: Channel the spontaneous, free...",The mermaids' council deliberated on the dwind...,nbroad_1,-0.024788,-0.07208,0.046875,0.012351,-0.012402,-0.016356,...,0.013847,-0.006507,-0.027821,-0.012744,0.035653,0.029916,-0.011108,-0.007995,-0.009567,"Given are 2 essays, the Rewritten essay was cr..."
7,Not a single person in the crowd of Nora Janic...,"Fantasy Dwarf: Write with the gruff, hearty st...",The crowd at Nora Janice's funeral was silent ...,nbroad_1,-0.019732,-0.06809,0.044369,0.022035,-0.013128,-0.012057,...,-0.002342,-0.000681,-0.049935,-0.018382,0.035788,0.023763,0.022539,-0.046245,-0.007326,"Given are 2 essays, the Rewritten essay was cr..."
8,`` Brigands and cutpurses have nothing on me. ...,"Drunkard: Infuse the essay with the rambling, ...","""Swerry brutes and cutthroat cutpurse, they ai...",nbroad_1,-0.013881,-0.065521,0.003461,0.035074,0.014662,-0.009928,...,-0.027956,-0.042242,-0.058649,0.016741,0.045051,0.022986,-0.016166,-0.027737,-0.035562,"Given are 2 essays, the Rewritten essay was cr..."
9,Sergeant Clark lifted his wrist to look at the...,High Fantasy Epic: Transform the essay into a ...,The scent of ash and molten earth hung heavy i...,nbroad_1,-0.014956,-0.083447,0.047591,0.024411,0.021683,0.004255,...,0.006138,-0.025443,-0.060491,0.008632,0.040641,0.024933,0.000982,-0.009748,0.011877,"Given are 2 essays, the Rewritten essay was cr..."


Instruction fine-tuning: prepare the dataset in a "prompt" format so the model can better understand it:
1. use the function `generate_prompt` to take the instruction and output and generate a prompt
2. shuffle the dataset
3. tokenize the dataset

### Formatting the Dataset

Now, let's format the dataset in the required [Mistral-7B-Instruct-v0.1 format](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1).

> Many tutorials and blogs skip over this part, but I feel this is a really important step.

We'll put each instruction and input pair between `[INST]` and `[/INST]`, with the output after that, like this:

```
<s>[INST] What is your favorite condiment? [/INST]
Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavor to whatever I'm cooking up in the kitchen!</s>
```

You can use the following code to wrap each prompt in the dataset in this format:

In [8]:
def generate_prompt(row):
    """Wrap a row's prompt text in Mistral instruction tags.

    :param row: Series: Data point (row of a pandas DataFrame)
    :return: str: generated prompt text
    """
    # Only the 'prompt' column is wrapped here; the target stored in
    # 'rewrite_prompt' is not appended to the text.
    return f"<s>[INST]{row['prompt']}[/INST]</s>"

# Apply the function to each row, overwriting the 'prompt' column
df['prompt'] = df.apply(generate_prompt, axis=1)
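For comparison, here is a minimal sketch of the format shown in the condiment example, where the answer follows `[/INST]` for training and the prompt is left open for inference. The function names `format_training_example` and `format_inference_prompt` are hypothetical and not part of this notebook. Whether to write `<s>` literally also depends on whether the tokenizer already adds a BOS token, which this sketch does not check.

```python
def format_training_example(instruction: str, answer: str) -> str:
    """Instruction between [INST] tags, answer after, closed with </s>."""
    return f"<s>[INST] {instruction} [/INST] {answer}</s>"

def format_inference_prompt(instruction: str) -> str:
    """Same prefix, but left open so the model generates the answer."""
    return f"<s>[INST] {instruction} [/INST]"

example = format_training_example(
    "What is your favorite condiment?",
    "Fresh lemon juice.",
)
prompt_only = format_inference_prompt("What is your favorite condiment?")
print(example)
print(prompt_only)
```

With this layout, the training text always starts with the corresponding inference prompt, so the model sees the same prefix in both settings.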

In [9]:
from datasets import Dataset
df = df[['prompt', 'rewrite_prompt']]

# Convert the pandas DataFrame to a Hugging Face Dataset
dataset = Dataset.from_pandas(df)
dataset = dataset.select(range(2000))
# Shuffle the dataset with a seed for reproducibility
dataset = dataset.shuffle(seed=1234)

# Tokenize the prompts in the dataset.
# This example assumes that the model requires only 'input_ids'.
# Adjust as necessary for your model, e.g., adding 'attention_mask'.
dataset = dataset.map(lambda samples: tokenizer(samples["prompt"]), batched=True)

# Split the dataset into training and testing sets (90% training, 10% testing)
train_test_split = dataset.train_test_split(test_size=0.1)
train_data = train_test_split['train']
test_data = train_test_split['test']


The cell above tokenizes the data so the model can consume it, then splits the dataset into 90% for training and 10% for testing.
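The shuffle-then-split logic can be illustrated without any libraries. This is only a sketch of the idea, not how `datasets` implements it; the helper `shuffle_and_split` is hypothetical, and its shuffling order will not match `Dataset.shuffle(seed=1234)`.

```python
import random

def shuffle_and_split(rows, test_size=0.1, seed=1234):
    """Shuffle a copy of rows with a fixed seed, then split off a test set."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]

# 2000 rows, as selected with dataset.select(range(2000))
train_rows, test_rows = shuffle_and_split(range(2000))
print(len(train_rows), len(test_rows))
```

The resulting sizes line up with the 200-row test set printed further down.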

### After formatting, we should get something like this

```json
{
"text": "<s>[INST] Create a function to calculate the sum of a sequence of integers. here are the inputs [1, 2, 3, 4, 5] [/INST]\n# Python code def sum_sequence(sequence): sum = 0 for num in sequence: sum += num return sum</s>",
"instruction": "Create a function to calculate the sum of a sequence of integers",
"input": "[1, 2, 3, 4, 5]",
"output": "# Python code def sum_sequence(sequence): sum = 0 for num in sequence: sum += num return sum",
"prompt": "<s>[INST] Create a function to calculate the sum of a sequence of integers. here are the inputs [1, 2, 3, 4, 5] [/INST]\n# Python code def sum_sequence(sequence): sum = 0 for num in sequence: sum += num return sum</s>"
}
```

When using SFT (**[Supervised Fine-tuning Trainer](https://huggingface.co/docs/trl/main/en/sft_trainer)**) for fine-tuning, we pass in only a single text column of the dataset: the "text" column in the example above, or the `prompt` column in this notebook.

In [10]:
print(test_data)

Dataset({
    features: ['prompt', 'rewrite_prompt', 'input_ids', 'attention_mask'],
    num_rows: 200
})


## Step 4 - Apply LoRA
Here comes the magic with PEFT! We prepare the quantized model for training with the `prepare_model_for_kbit_training` method, and then attach low-rank adapters (LoRA) using the `get_peft_model` utility function.

In [11]:
from peft import LoraConfig, PeftModel, prepare_model_for_kbit_training, get_peft_model
model.gradient_checkpointing_enable()
model = prepare_model_for_kbit_training(model)

In [12]:
print(model)

PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): MistralForCausalLM(
      (model): MistralModel(
        (embed_tokens): Embedding(32000, 4096)
        (layers): ModuleList(
          (0-31): 32 x MistralDecoderLayer(
            (self_attn): MistralSdpaAttention(
              (q_proj): lora.Linear4bit(
                (base_layer): Linear4bit(in_features=4096, out_features=4096, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.1, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=64, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=64, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
              (k_proj): lora.Linear4bit(
                (base_layer): 

Use the following function to find the linear layers to target for fine-tuning.
From the QLoRA paper: "We find that the most critical LoRA hyperparameter is how many LoRA adapters are used in total and that LoRA on all linear transformer block layers is required to match full finetuning performance."

In [13]:
import bitsandbytes as bnb
def find_all_linear_names(model):
    """Return the leaf names of all 4-bit linear modules in the model."""
    cls = bnb.nn.Linear4bit  # if args.bits == 4 else (bnb.nn.Linear8bitLt if args.bits == 8 else torch.nn.Linear)
    lora_module_names = set()
    for name, module in model.named_modules():
        if isinstance(module, cls):
            # Keep only the last component of the dotted module path
            names = name.split('.')
            lora_module_names.add(names[0] if len(names) == 1 else names[-1])
    # The output head is excluded (needed for 16-bit)
    lora_module_names.discard('lm_head')
    return list(lora_module_names)

In [14]:
modules = find_all_linear_names(model)
print(modules)

['down_proj', 'base_layer', 'up_proj']
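The `base_layer` entry is worth a second look. A plausible explanation, given the printed model structure above, is that the model already carries a LoRA adapter, so the 4-bit linear layers of the wrapped projections sit one level deeper under the name `base_layer`. The sketch below reproduces only the name-extraction step on a few hypothetical module paths; it does not inspect a real model.

```python
def leaf_names(module_names):
    """Collect the last component of each dotted module path."""
    return sorted({name.split('.')[-1] for name in module_names})

# Hypothetical paths of modules that would match bnb.nn.Linear4bit
matching_paths = [
    "model.layers.0.self_attn.q_proj.base_layer",  # wrapped by an existing adapter
    "model.layers.0.self_attn.k_proj.base_layer",
    "model.layers.0.mlp.up_proj",                  # not wrapped
    "model.layers.0.mlp.down_proj",
]

targets = leaf_names(matching_paths)
print(targets)
```

If that explanation holds, running the function on the base model before loading the adapter would list the projection names themselves instead of `base_layer`.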


In [15]:
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    target_modules=modules,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

model = get_peft_model(model, lora_config)

In [16]:
trainable, total = model.get_nb_trainable_parameters()
print(f"Trainable: {trainable} | total: {total} | Percentage: {trainable/total*100:.4f}%")


Trainable: 20971520 | total: 7354978304 | Percentage: 0.2851%
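As a quick sanity check on numbers like these, each LoRA adapter adds two small matrices to a linear layer: `A` with shape `r x d_in` and `B` with shape `d_out x r`. The sketch below computes the size of one adapter using the shapes from the printed `q_proj` layer above, and recomputes the percentage from the reported counts. The helper name `lora_param_count` is hypothetical.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Parameters added by one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * d_in + d_out * r

# Shapes from the printed q_proj adapter: 4096 -> 64 -> 4096
q_proj_params = lora_param_count(d_in=4096, d_out=4096, r=64)

# Counts copied from the output above
trainable, total = 20_971_520, 7_354_978_304
percentage = trainable / total * 100

print(q_proj_params)
print(f"{percentage:.4f}%")
```

Because the adapter size grows linearly with `r`, halving the rank halves the number of trainable parameters per adapted layer.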


## Step 5 - Run the training!

Setting the training arguments:
* For demo purposes, we run only a few steps (`max_steps=2` with the `Trainer` below, 100 in the `SFTTrainer` variant further down) to showcase how to use this integration with existing tools in the HF ecosystem.

In [17]:
# from datasets import load_dataset
# data = load_dataset("TokenBender/code_instructions_122k_alpaca_style", split='train')
# data = data.train_test_split(test_size=0.1)
# train_data = data["train"]
# test_data = data["test"]

In [18]:
import transformers

tokenizer.pad_token = tokenizer.eos_token


trainer = transformers.Trainer(
    model=model,
    train_dataset=train_data,
    eval_dataset=test_data,
    args=transformers.TrainingArguments(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,
        warmup_ratio=0.03,
        max_steps=2,
        learning_rate=2e-4,
        fp16=True,
        logging_steps=1,
        output_dir="outputs_mistral_b_finance_finetuned_test",
        optim="paged_adamw_8bit",
        save_strategy="epoch",
        logging_dir='./logs',  # directory for storing logs
        report_to="none",
    ),
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
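Two quantities implied by these arguments are easy to work out by hand: the effective batch size per optimizer step and the number of warmup steps derived from `warmup_ratio`. The sketch below assumes a single GPU and assumes the warmup step count is rounded up, which matches my understanding of how `transformers` handles `warmup_ratio`, but is not verified here. Both helper names are hypothetical.

```python
import math

def effective_batch_size(per_device: int, grad_accum: int, num_devices: int = 1) -> int:
    """Samples contributing to each optimizer step."""
    return per_device * grad_accum * num_devices

def warmup_steps_from_ratio(max_steps: int, warmup_ratio: float) -> int:
    """Warmup steps implied by a ratio, rounded up (assumed behavior)."""
    return math.ceil(max_steps * warmup_ratio)

batch = effective_batch_size(per_device=1, grad_accum=4)
warmup_demo = warmup_steps_from_ratio(max_steps=2, warmup_ratio=0.03)
warmup_longer = warmup_steps_from_ratio(max_steps=100, warmup_ratio=0.03)

print(batch, warmup_demo, warmup_longer)
```

With only two steps in total, a single warmup step already covers half of the run, so the learning-rate schedule has little meaning in the demo configuration.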




### Fine-Tuning with QLoRA and Supervised Fine-Tuning

We're ready to fine-tune our model using QLoRA. As an alternative to the plain `Trainer` above, we can use the `SFTTrainer` from the `trl` library for supervised fine-tuning. This requires the `trl` library to be installed.

# New code using SFTTrainer
import transformers

from trl import SFTTrainer

tokenizer.pad_token = tokenizer.eos_token
torch.cuda.empty_cache()

trainer = SFTTrainer(
    model=model,
    train_dataset=train_data,
    eval_dataset=test_data,
    dataset_text_field="prompt",
    peft_config=lora_config,
    args=transformers.TrainingArguments(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,
        warmup_ratio=0.03,  # a fraction of total steps; warmup_steps expects an integer count
        max_steps=100,
        learning_rate=2e-4,
        logging_steps=1,
        output_dir="outputs",
        optim="paged_adamw_8bit",
        save_strategy="epoch",
    ),
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)

### Let's start the training process

In [19]:
import warnings

# Suppress all warnings
warnings.filterwarnings('ignore')

model.config.use_cache = False  # silence the warnings. Please re-enable for inference!
trainer.train()


Step,Training Loss
1,2.0669
2,2.1081


TrainOutput(global_step=2, training_loss=2.0875030755996704, metrics={'train_runtime': 80.6965, 'train_samples_per_second': 0.099, 'train_steps_per_second': 0.025, 'total_flos': 417744053747712.0, 'train_loss': 2.0875030755996704, 'epoch': 0.0})

In [20]:
import re

def remove_special_characters(text):
    # Replace "Transform" and "Reimagine" with "improve" and "rewrite"
    text = text.replace("Transform", "improve")
    text = text.replace("Reimagine", "rewrite")
    # Remove every character that is not a letter, number, or whitespace
    pattern = r'[^a-zA-Z0-9\s]'
    clean_text = re.sub(pattern, '', text)
    return clean_text

def get_completion(prompt) -> str:
    device = "cuda:0"
    print('prompt')
    print(prompt)
    encodeds = tokenizer(prompt, return_tensors="pt", add_special_tokens=True)
    model_inputs = encodeds.to(device)

    generated_ids = model.generate(**model_inputs, max_new_tokens=200, do_sample=True, pad_token_id=tokenizer.eos_token_id)
    print('gen')
    print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0])

    # Keep only the newly generated tokens. Slice by the length of the ids
    # actually passed to generate(), which includes any special tokens.
    prompt_length = model_inputs["input_ids"].shape[1]
    generated_tokens = generated_ids[0][prompt_length:]
    generated_text = tokenizer.decode(generated_tokens, skip_special_tokens=True).strip()
    print('output')
    print(generated_text)
    return generated_text
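One detail worth checking in a helper like `get_completion` is how the prompt is cut off from the generated sequence. For a decoder-only model, the output of `generate` starts with the input ids, so the slice offset should be the length of the ids that were actually fed in, special tokens included. The toy sketch below uses plain lists and a hypothetical `encode` stand-in for the tokenizer to show the off-by-one that appears when the offset is computed without special tokens.

```python
BOS_ID = 1  # hypothetical id of the <s> token

def encode(token_ids, add_special_tokens=True):
    """Toy stand-in for a tokenizer call: optionally prepend BOS."""
    return ([BOS_ID] if add_special_tokens else []) + list(token_ids)

def strip_prompt(generated_ids, input_ids):
    """Drop the echoed prompt, keeping only newly generated tokens."""
    return generated_ids[len(input_ids):]

prompt = [101, 102, 103]
completion = [201, 202]

input_ids = encode(prompt)          # what the model receives
generated = input_ids + completion  # what generate() would return

correct = strip_prompt(generated, input_ids)
# Offset computed without special tokens leaks the last prompt token
off_by_one = generated[len(encode(prompt, add_special_tokens=False)):]

print(correct, off_by_one)
```

The same reasoning applies if the tokenizer also appends an EOS token to the input, in which case the mismatch grows by one more token.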

In [21]:
test_df = pd.read_csv("/kaggle/input/llm-prompt-recovery/test.csv")
test_df['prompt'] = test_df.apply(lambda row: create_prompt(row['original_text'], row['rewritten_text']), axis=1)
test_df['prompt'] = test_df.apply(generate_prompt, axis=1)
test_df['rewrite_prompt'] = test_df.apply(lambda row: get_completion(row['prompt']), axis=1)
test_df = test_df[['id', 'rewrite_prompt']]
test_df

A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...


prompt
<s>[INST]Given are 2 essays, the Rewritten essay was created from the Original essay using the google Gemma model.
You are trying to understand how the original essay was transformed into a new version.
Analyzing the changes in style, theme, etc., please come up with a prompt that must have been used to guide the transformation from the original to the rewritten essay.
Only give me the PROMPT. Start directly with the prompt, that's all I need.
Output should be only line ONLY
Original Essay: The competition dataset comprises text passages that have been rewritten by the Gemma LLM according to some rewrite_prompt instruction. The goal of the competition is to determine what prompt was used to rewrite each original text.  Please note that this is a Code Competition. When your submission is scored, this example test data will be replaced with the full test set. Expect roughly 2,000 original texts in the test set.
Rewritten Essay: Here is your shanty: (Verse 1) The text is rewritten,

Unnamed: 0,id,rewrite_prompt
0,-1,"[h3]Here is your shanty, me hearties,[/:eacute..."


In [22]:
test_df.to_csv('submission.csv', index=False)