For the LLM-based method (using models such as Llama or Qwen), we started from the notebook template provided by Unsloth (https://huggingface.co/unsloth), a resource-efficient fine-tuning implementation well suited to our task.





To install Unsloth on your own machine, follow the installation instructions on the Unsloth GitHub page [here](https://github.com/unslothai/unsloth#installation-instructions---conda).

The notebook covers data preparation, [training](#Train), [inference](#Inference), and [saving the model](#Save).

In [1]:
import torch
major_version, minor_version = torch.cuda.get_device_capability()

# %%
import os
import pandas as pd
import numpy as np
# os.environ["WANDB_DISABLED"] = "true"

# %%
from unsloth import FastLanguageModel
import torch
# model_name = "Qwen/Qwen2-1.5B"; load_in_4bit = False
max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+




# As our base, we use the Llama-3 8B model (pre-quantized to 4 bits)
model_name = "unsloth/llama-3-8b-bnb-4bit"
load_in_4bit = True

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = model_name,
    load_in_4bit = load_in_4bit,
    max_seq_length = max_seq_length,
    dtype = dtype,
)



🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
==((====))==  Unsloth: Fast Llama patching release 2024.6
   \\   /|    GPU: NVIDIA RTX 4000 Ada Generation. Max memory: 19.681 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.3.0+cu121. CUDA = 8.9. CUDA Toolkit = 12.1.
\        /    Bfloat16 = TRUE. Xformers = 0.0.26.post1+cu118. FA = False.
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


* We support Llama, Mistral, CodeLlama, TinyLlama, Vicuna, Open Hermes etc
* And Yi, Qwen ([llamafied](https://huggingface.co/models?sort=trending&search=qwen+llama)), Deepseek, all Llama, Mistral derived archs.
* We support 16bit LoRA or 4bit QLoRA. Both 2x faster.
* `max_seq_length` can be set to anything, since we do automatic RoPE Scaling via [kaiokendev's](https://kaiokendev.github.io/til) method.
* [**NEW**] With [PR 26037](https://github.com/huggingface/transformers/pull/26037), we support downloading 4bit models **4x faster**! [Our repo](https://huggingface.co/unsloth) has Llama, Mistral 4bit models.

In [5]:
# specify the yes and no token ids
yes_token_id = tokenizer.encode("Yes", add_special_tokens=False)[0]
no_token_id = tokenizer.encode("No", add_special_tokens=False)[0]

In [6]:
# positive_token_id = tokenizer.encode("Positive", add_special_tokens=False)[0]
# negative_token_id = tokenizer.encode("Negative", add_special_tokens=False)[0]
# print(yes_token_id, no_token_id, positive_token_id, negative_token_id)

In [8]:
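# Keep only the Yes and No rows of lm_head, so the model outputs two logits per position.
# Row 0 is initialised from the "Yes" token and row 1 from the "No" token. The collator
# defined later uses label 1 for Yes and 0 for No, so the two rows start out swapped
# relative to the training labels and the head relearns the mapping during fine-tuning.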
par = torch.nn.Parameter(torch.vstack([model.lm_head.weight[yes_token_id, :], model.lm_head.weight[no_token_id, :]]))
print(par.shape)
print(model.lm_head.weight.shape)
model.lm_head.weight = par

torch.Size([2, 4096])
torch.Size([128256, 4096])
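
Keeping only two rows of `lm_head` turns the next-token distribution into a two-way softmax. The following is a minimal sketch in plain Python, using toy logits and hypothetical token positions rather than the real Llama vocabulary. It illustrates that a softmax over the two kept logits equals the full softmax renormalised over those two tokens, so nothing is lost for a Yes/No decision.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy "vocabulary" of five logits; positions 1 and 3 stand in for Yes / No.
full_logits = [0.3, 2.0, -1.2, 0.5, 1.1]
yes_idx, no_idx = 1, 3  # hypothetical positions, for illustration only

# Full softmax, then renormalise over the two tokens of interest.
full_probs = softmax(full_logits)
renormalised_yes = full_probs[yes_idx] / (full_probs[yes_idx] + full_probs[no_idx])

# Softmax over only the two kept logits, as a trimmed head would produce.
two_way_yes = softmax([full_logits[yes_idx], full_logits[no_idx]])[0]

print(round(renormalised_yes, 6), round(two_way_yes, 6))
```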


In [10]:
# Adding LoRA adapters to the model

from peft import LoftQConfig

model = FastLanguageModel.get_peft_model(
    model,
    r = 32, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = [
        "lm_head",
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
    use_rslora = True,  # We support rank stabilized LoRA
    # init_lora_weights = 'loftq',
    # loftq_config = LoftQConfig(loftq_bits = 4, loftq_iter = 1), # And LoftQ
)
print("trainable parameters:", sum(p.numel() for p in model.parameters() if p.requires_grad))

Unsloth: Already have LoRA adapters! We shall skip this step.


Unsloth: Casting lm_head to float32
trainable parameters: 83894272
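
The reported number of trainable parameters can be cross-checked by hand. The sketch below assumes the public Llama-3 8B dimensions (hidden size 4096, MLP size 14336, key/value projection width 1024, 32 layers), which are not stated in this notebook, and assumes that the trimmed 2 x 4096 `lm_head` is trained in full rather than through a low-rank adapter. Under those assumptions the arithmetic reproduces the printed count.

```python
# Assumed Llama-3 8B dimensions (taken from the public model config, not from this notebook).
hidden, intermediate, kv_dim, n_layers = 4096, 14336, 1024, 32
r = 32  # LoRA rank used above

def lora_params(in_features, out_features, rank):
    """LoRA adds A (rank x in_features) and B (out_features x rank) per wrapped layer."""
    return rank * (in_features + out_features)

per_layer = (
    lora_params(hidden, hidden, r)          # q_proj
    + lora_params(hidden, kv_dim, r)        # k_proj
    + lora_params(hidden, kv_dim, r)        # v_proj
    + lora_params(hidden, hidden, r)        # o_proj
    + lora_params(hidden, intermediate, r)  # gate_proj
    + lora_params(hidden, intermediate, r)  # up_proj
    + lora_params(intermediate, hidden, r)  # down_proj
)
adapter_total = per_layer * n_layers

# Assumption: the two-row lm_head is fully trainable (2 x 4096 weights).
head_total = 2 * hidden

print(adapter_total + head_total)  # 83894272
```

With `use_rslora = True`, PEFT scales the low-rank update by `lora_alpha / sqrt(r)` (about 2.83 here) instead of `lora_alpha / r` (0.5).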


In [11]:
from datasets import load_dataset
from trl import SFTTrainer

In [12]:
# Load our tweet dataset

import pandas as pd
import numpy as np


import os
cwd = os.getcwd()
kaggle = cwd == "/kaggle/working"
input_dir = "/kaggle/input/tweets/twitter-datasets/twitter-datasets/" if kaggle else "data/twitter-datasets/twitter-datasets/"
output_dir = "/kaggle/working/" if kaggle else "data/"

In [13]:
# prepare training and validation datasets

data = pd.read_csv(output_dir + "data_cleaned.csv")

from sklearn.model_selection import train_test_split

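# keep a random subset of 255,000 tweets (250,000 for training + 5,000 for validation)
#data_sample = data.sample(n=10000, random_state=42)  # smaller subset for quick tests
data_sample = data.sample(n=255000)

# choose the test fraction so that the validation set has exactly 5,000 rows
train_df, val_df = train_test_split(data_sample, test_size=5000/len(data_sample), random_state=42)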
# save to output_dir
train_df.to_csv(output_dir + "train.csv", index=False)
print(len(train_df))

250000
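
The split sizes follow from simple arithmetic. A small check, assuming the fractional test size is rounded up to a whole number of rows (the count printed above is consistent with that):

```python
import math

n_samples = 255_000
test_fraction = 5000 / n_samples  # same expression as in the cell above

# Assumed rounding rule: the validation size is the fraction times n, rounded up.
n_val = math.ceil(test_fraction * n_samples)
n_train = n_samples - n_val

print(n_train, n_val)  # 250000 5000
```

Passing the integer `test_size=5000` to `train_test_split` requests the same absolute number of validation rows directly.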


In [15]:
dataset = load_dataset(output_dir,data_files="train.csv", split="train")
dataset

Repo card metadata block was not found. Setting CardData to empty.


Generating train split: 0 examples [00:00, ? examples/s]

Dataset({
    features: ['text', 'label'],
    num_rows: 250000
})

In [16]:
# Define the prompt for the task which will be used for determining whether the tweet is positive or negative

prompt = """Here is a tweet:
{}

Does this tweet have a positive sentiment?
The correct answer is:{}"""

positivelabel = "Yes"
negativelabel = "No"


def formatting_prompts_func(dataset_):
    # Called with a single example ('text' is a str, not a list), e.g. by a sanity check:
    # return a model-specific placeholder instead of formatted prompts.
    if isinstance(dataset_['text'], str):
        if "qwen" in model_name.lower():
            return [""]
        elif "llama" in model_name.lower():
            return [""]*100
        else:
            return ""

    # Batched call: build one prompt per tweet, ending in the gold answer (Yes / No).
    texts = []
    for t, y in zip(dataset_['text'], dataset_['label']):
        label = positivelabel if y == 1 else negativelabel
        texts.append(prompt.format(t, label))
    return texts
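
To make the prompt layout concrete, here is a small standalone sketch that fills the same template for one made-up tweet (the tweet text is illustrative, not taken from the dataset). The training text ends in the answer token, while the inference text leaves the answer slot empty so the model has to predict it.

```python
prompt = """Here is a tweet:
{}

Does this tweet have a positive sentiment?
The correct answer is:{}"""

tweet = "what a lovely morning"  # illustrative example, not from the dataset

train_text = prompt.format(tweet, "Yes")  # training: gold answer appended
infer_text = prompt.format(tweet, "")     # inference: answer left blank

print(train_text)
```

Because there is no space after the colon, the answer token directly follows `is:`, which is why the token ids for `"Yes"` and `"No"` are looked up without a leading space.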




In [17]:
import warnings
import numpy as np
from typing import Any, Dict, List, Union
from transformers import DataCollatorForLanguageModeling

class DataCollatorForLastTokenLM(DataCollatorForLanguageModeling):
    """
    Data collator used for completion tasks. It ensures that all the tokens of the labels are set to an 'ignore_index'
    except for the last token. This ensures that the loss is only calculated on the last token of the sequence.

    Args:
        mlm (`bool`, *optional*, defaults to `False`): Whether or not to use masked language modeling in the underlying
            `DataCollatorForLanguageModeling` class. Note that this option currently has no effect but is present
             for flexibility and backwards-compatibility.
        ignore_index (`int`, *optional*, defaults to `-100`):
            The index to use to ignore all tokens except the last one.
    """

    def __init__(
        self,
        *args,
        mlm: bool = False,
        ignore_index: int = -100,
        **kwargs,
    ):
        super().__init__(*args, mlm=mlm, **kwargs)
        self.ignore_index = ignore_index

    def torch_call(self, examples: List[Union[List[int], Any, Dict[str, Any]]]) -> Dict[str, Any]:
        batch = super().torch_call(examples)

        for i in range(len(examples)):
            # Find the last non-padding token
            last_token_idx = (batch["labels"][i] != self.ignore_index).nonzero()[-1].item()
            # Set all labels to ignore_index except for the last token
            batch["labels"][i, :last_token_idx] = self.ignore_index
            # The old labels for the Yes and No tokens need to be mapped to 1 and 0
            batch["labels"][i, last_token_idx] = 1 if batch["labels"][i, last_token_idx] == yes_token_id else 0


        return batch

In [18]:
collator = DataCollatorForLastTokenLM(tokenizer=tokenizer)
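
The label handling of the collator above can be sketched on plain Python lists, without torch. The token ids below are hypothetical placeholders; the sketch only mirrors the masking logic (keep the last real token, map it to 1 for Yes and 0 otherwise) and is not the collator itself.

```python
IGNORE_INDEX = -100
YES_ID, NO_ID = 9642, 2822  # hypothetical token ids, for illustration only

def mask_all_but_last(labels, yes_id=YES_ID, ignore_index=IGNORE_INDEX):
    """Return new labels that keep only the final real token, mapped to 1 (Yes) or 0."""
    real_positions = [i for i, tok in enumerate(labels) if tok != ignore_index]
    last = real_positions[-1]
    out = [ignore_index] * len(labels)
    out[last] = 1 if labels[last] == yes_id else 0
    return out

# Two sequences; padding positions are already marked with IGNORE_INDEX.
batch = [
    [101, 57, 88, YES_ID, IGNORE_INDEX, IGNORE_INDEX],
    [101, 42, 13, 77, 5, NO_ID],
]
masked = [mask_all_but_last(row) for row in batch]
print(masked)
# [[-100, -100, -100, 1, -100, -100], [-100, -100, -100, -100, -100, 0]]
```

Only one position per sequence contributes to the loss, so training behaves like binary classification on the answer token.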

<a name="Train"></a>
### Train the model
Now let's use Hugging Face TRL's `SFTTrainer` (see the [TRL SFT docs](https://huggingface.co/docs/trl/sft_trainer)). We train for one full epoch (`num_train_epochs = 1`); for a quick test run, uncomment `max_steps = 60` instead.

In [20]:
from trl import SFTTrainer
from transformers import TrainingArguments, Trainer
from typing import Tuple

In [21]:
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    # dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False, # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 32,
        gradient_accumulation_steps = 1,
        warmup_steps = 10,
        # max_steps = 60,
        learning_rate = 4e-5,
        fp16 = not torch.cuda.is_bf16_supported(),
        bf16 = torch.cuda.is_bf16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        num_train_epochs = 1,
        report_to = "wandb",
        group_by_length = True,
    ),
    formatting_func=formatting_prompts_func,
    data_collator=collator,
)

Map (num_proc=2):   0%|          | 0/250000 [00:00<?, ? examples/s]

In [22]:
#@title Show current memory stats
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

GPU = NVIDIA RTX 4000 Ada Generation. Max memory = 19.681 GB.
5.449 GB of memory reserved.


In [23]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 250,000 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 32 | Gradient Accumulation steps = 1
\        /    Total batch size = 32 | Total steps = 7,813
 "-____-"     Number of trainable parameters = 83,894,272
wandb: Currently logged in as: cs06 (ethz1). Use `wandb login --relogin` to force relogin


Step,Training Loss
1,0.6402
2,0.6836
3,0.6626
4,0.6392
5,0.7473
6,1.0814
7,0.7777
8,0.4823
9,0.4691
10,0.5174
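
Two quick sanity checks on the log above, as a sketch. The step count should equal the number of examples divided by the effective batch size, rounded up, and the first losses should sit near ln 2, which is the cross-entropy of a two-class predictor that assigns probability 0.5 to each class.

```python
import math

num_examples, batch_size, grad_accum = 250_000, 32, 1

# One optimizer step per effective batch; the last partial batch still counts.
steps_per_epoch = math.ceil(num_examples / (batch_size * grad_accum))
print(steps_per_epoch)  # 7813

# Cross-entropy of a binary classifier at chance level.
chance_loss = -math.log(0.5)
print(round(chance_loss, 4))  # 0.6931
```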


In [24]:
#@title Show final memory and time stats
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory         /max_memory*100, 3)
lora_percentage = round(used_memory_for_lora/max_memory*100, 3)
print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training.")
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")

11868.9523 seconds used for training.
197.82 minutes used for training.
Peak reserved memory = 6.061 GB.
Peak reserved memory for training = 0.612 GB.
Peak reserved memory % of max memory = 30.796 %.
Peak reserved memory for training % of max memory = 3.11 %.
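
The reported figures can be turned into throughput numbers with a little arithmetic. This sketch only reuses values printed above (runtime, step count, memory readings) and derives nothing that was not measured.

```python
train_runtime_s = 11868.9523  # reported training time in seconds
num_examples = 250_000
total_steps = 7_813
peak_gb, start_gb, max_gb = 6.061, 5.449, 19.681

examples_per_s = num_examples / train_runtime_s
seconds_per_step = train_runtime_s / total_steps

# Same formulas as the memory-stats cell, applied to the printed values.
training_overhead_gb = round(peak_gb - start_gb, 3)
peak_share = round(peak_gb / max_gb * 100, 3)

print(f"{examples_per_s:.1f} examples/s, {seconds_per_step:.2f} s/step")
print(training_overhead_gb, peak_share)
```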


<a name="Inference"></a>
### Inference
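Let's run the model on the test set! Each test prompt leaves the answer slot empty, and the two output logits at the last position give the probabilities of a negative (index 0) and a positive (index 1) tweet.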

In [25]:
FastLanguageModel.for_inference(model) # Enable native 2x faster inference

In [26]:
model.save_pretrained(f"lora_model_{model_name.replace('/','_')}")

In [29]:
# load test_data.txt and turn it into a df (each line has the form "id,text")
with open(input_dir + 'test_data.txt') as f:
    test_lines = f.read().splitlines()
test = np.array(test_lines)
test_df = pd.DataFrame(test, columns=['text'])
test_df.index.name = 'Id'
# test_df = test_df.iloc[:100]
test_df # Id (index), text (still carries the "id," prefix of each line)

Id,text
0,"1,sea doo pro sea scooter ( sports with the po..."
1,"2,<user> shucks well i work all week so now i ..."
2,"3,i cant stay away from bug thats my baby"
3,"4,<user> no ma'am ! ! ! lol im perfectly fine ..."
4,"5,whenever i fall asleep watching the tv , i a..."
...,...
9995,"9996,had a nice time w / my friend lastnite"
9996,"9997,<user> no it's not ! please stop !"
9997,"9998,not without my daughter ( dvd two-time os..."
9998,"9999,<user> have fun in class sweetcheeks"


In [31]:
from collections import defaultdict
import torch.nn.functional as F

# Tokenize the test inputs and sort them by their tokenized length
test_tokenized_inputs = []
for i in range(len(test_df['text'])):
    text = test_df['text'].iloc[i]
    test_str = prompt.format(text, "")
    tokenized_input = tokenizer(test_str, return_tensors="pt", add_special_tokens=False)
    test_tokenized_inputs.append((tokenized_input, test_str, i))

# Sort by tokenized length
test_tokenized_inputs.sort(key=lambda x: x[0]['input_ids'].shape[1])

# Group the test inputs by their tokenized length
test_grouped_inputs = defaultdict(list)
for tokenized_input, test_str, idx in test_tokenized_inputs:
    length = tokenized_input['input_ids'].shape[1]
    test_grouped_inputs[length].append((tokenized_input, test_str, idx))

# Process each test group in batches of 64
test_batch_size = 64
test_probabilities = []
test_indices = []

for length, group in test_grouped_inputs.items():
    for i in range(0, len(group), test_batch_size):
        batch = group[i:i+test_batch_size]
        batch_inputs = [item[0] for item in batch]
        batch_strings = [item[1] for item in batch]
        batch_indices = [item[2] for item in batch]

        # Concatenate the batch inputs
        input_ids = torch.cat([item['input_ids'] for item in batch_inputs], dim=0).to("cuda")
        attention_mask = torch.cat([item['attention_mask'] for item in batch_inputs], dim=0).to("cuda")

        # Forward pass
        with torch.no_grad():
            outputs = model(input_ids=input_ids, attention_mask=attention_mask)
        
        # Logits at the last position predict the answer token.
        # The trimmed head has two outputs: index 0 = No / negative, index 1 = Yes / positive.
        logits = outputs.logits[:, -1, :2]
        
        # Apply softmax
        probabilities = F.softmax(logits, dim=-1)
        
        test_probabilities.extend(probabilities[:, 1].cpu().numpy())  # Probability of positive class
        test_indices.extend(batch_indices)

# Ensure the probabilities are sorted by their original index
sorted_probabilities = [x for _, x in sorted(zip(test_indices, test_probabilities))]

# Create binary predictions using threshold 0.5
binary_predictions = [1 if prob >= 0.5 else 0 for prob in sorted_probabilities]

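# Create the submission DataFrame with binary predictions
# (submission_df is not defined in the cells shown here; it is assumed to come from an earlier cell)
submission_df['Prediction'] = binary_predictions
submission_df.to_csv('submission.csv', index=False)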

# Create the test_probs DataFrame with probabilities
test_probs_df = pd.DataFrame({'Probability': sorted_probabilities})
test_probs_df.to_csv(f'test_probs_{model_name[:4]}.csv', index=False)

print("Submission files created: submission.csv and test_probs.csv")

Submission files created: submission.csv and test_probs.csv
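
The inference loop groups prompts by token length so that each batch can be stacked without padding, then restores the original order at the end. A minimal standalone sketch of that pattern follows, with toy lengths and a stand-in for the model call (`bucket_batches` is a hypothetical helper name, not part of the notebook).

```python
from collections import defaultdict

def bucket_batches(lengths, batch_size):
    """Yield lists of original indices whose items all share the same length."""
    groups = defaultdict(list)
    for idx, n in enumerate(lengths):
        groups[n].append(idx)
    for n in sorted(groups):
        members = groups[n]
        for start in range(0, len(members), batch_size):
            yield members[start:start + batch_size]

lengths = [5, 3, 5, 3, 3, 7]  # toy token counts per prompt

scores, order = [], []
for batch in bucket_batches(lengths, batch_size=2):
    # Stand-in for the model call: each item's "score" is ten times its index.
    scores.extend(idx * 10 for idx in batch)
    order.extend(batch)

# Restore the original order before writing predictions.
restored = [score for _, score in sorted(zip(order, scores))]
print(restored)  # [0, 10, 20, 30, 40, 50]
```

Since every batch holds sequences of one length, the last position of each row is always the final prompt token, which is where the notebook reads the two logits.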


<a name="Save"></a>
### Saving, loading finetuned models
To save the final model as LoRA adapters, use either Hugging Face's `push_to_hub` for an online save or `save_pretrained` for a local save.

**[NOTE]** This ONLY saves the LoRA adapters, not the full model.

In [35]:
model.save_pretrained("lora_model") # Local saving
# model.push_to_hub("your_name/lora_model", token = "...") # Online saving