**Efficient Fine-Tuning of LLaMA 3.1 8B-Instruct with 4-bit Quantization and LoRA**

This project demonstrates memory-efficient loading, fine-tuning, and inference with the LLaMA 3.1 8B-Instruct model. It combines 4-bit quantization (via bitsandbytes) with Low-Rank Adaptation (LoRA). The aim is to make large transformer-based models accessible and tunable on modest hardware, such as a single GPU.

Objective:
To load a 4-bit quantized LLaMA 3.1 (8B) model and fine-tune it with LoRA adapters. This reduces GPU memory requirements enough to train on limited hardware while preserving as much task performance as possible.

Key Technologies:
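BitsAndBytes 4-bit Quantization: Stores the frozen base-model weights in 4 bits, cutting their memory footprint to roughly a quarter of float16. Weights are dequantized on the fly, and matrix multiplications run in 16-bit precision (float16, set via bnb_4bit_compute_dtype). The main benefit is memory; any change in speed depends on the hardware and batch size.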

LoRA (Low-Rank Adaptation): Keeps the base model frozen and trains small pairs of low-rank matrices added alongside selected weight matrices (here the attention projections q_proj and v_proj). This sharply reduces the number of trainable parameters.

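Transformers, Accelerate and PEFT Libraries: Transformers handles model and tokenizer loading and the training loop (Trainer). Accelerate places the model on the available device(s) via device_map="auto". PEFT injects the LoRA adapters.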
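To make the LoRA idea concrete, here is a minimal NumPy sketch of a single LoRA-adapted linear layer. It is an illustration of the technique, not PEFT's implementation. The class name LoRALinear, the toy dimensions, and the random initialization of A are assumptions made for this example. The frozen weight W is never modified; only A and B would be trained, and their product is scaled by lora_alpha / r, which is the role those two LoraConfig parameters play.

```python
import numpy as np


class LoRALinear:
    """Hypothetical illustration of y = x W^T + (lora_alpha / r) * x A^T B^T."""

    def __init__(self, weight, r=16, lora_alpha=32, seed=0):
        rng = np.random.default_rng(seed)
        out_features, in_features = weight.shape
        self.weight = weight  # frozen base weight, shape (out, in)
        # In this sketch A gets a small random init and B starts at zero,
        # so the adapter initially adds nothing to the base layer's output.
        self.A = rng.normal(0.0, 0.01, size=(r, in_features))
        self.B = np.zeros((out_features, r))
        self.scaling = lora_alpha / r

    def __call__(self, x):
        base = x @ self.weight.T              # frozen path
        update = (x @ self.A.T) @ self.B.T    # low-rank trainable path
        return base + self.scaling * update

    def trainable_parameters(self):
        return self.A.size + self.B.size


rng = np.random.default_rng(42)
W = rng.normal(size=(64, 128))   # toy layer mapping 128 -> 64 features
layer = LoRALinear(W, r=8, lora_alpha=16)
x = rng.normal(size=(4, 128))    # batch of 4 inputs

print(np.allclose(layer(x), x @ W.T))  # True, because B is zero at initialization
print(layer.trainable_parameters(), "trainable vs", W.size, "frozen")
```

The low-rank path costs r * (in + out) parameters instead of in * out. In this toy layer that is 1,536 trainable values against 8,192 frozen ones.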

Model & Tokenizer:
Base Model: LLaMA 3.1 - 8B Instruct

The LLaMA 3.1 tokenizer does not set a padding token by default, so pad_token is set to eos_token. This lets the data collator pad batches for causal language modeling.

Quantization is configured with load_in_4bit=True and bnb_4bit_compute_dtype='float16'.
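The sketch below illustrates the storage idea behind 4-bit weight quantization using plain NumPy. Weights are split into blocks, each block is scaled by its absolute maximum, and each value is rounded to a 4-bit signed integer. The values are then dequantized back to float16 before use. This is a simplified symmetric integer scheme assumed for illustration; bitsandbytes instead maps values onto a 4-bit FP4 or NF4 code table and packs two codes per byte, which this sketch does not do. The block size of 64 and the function names are also assumptions.

```python
import numpy as np


def quantize_4bit(weights, block_size=64):
    """Blockwise absmax quantization to integers in [-7, 7] (a toy 4-bit scheme)."""
    flat = weights.astype(np.float32).reshape(-1, block_size)
    scales = np.abs(flat).max(axis=1, keepdims=True) / 7.0
    scales[scales == 0] = 1.0  # avoid division by zero for all-zero blocks
    codes = np.clip(np.round(flat / scales), -7, 7).astype(np.int8)  # stored unpacked here
    return codes, scales


def dequantize_4bit(codes, scales, shape):
    """Recover approximate weights in float16, the compute dtype used in this notebook."""
    return (codes.astype(np.float32) * scales).astype(np.float16).reshape(shape)


rng = np.random.default_rng(0)
W = rng.normal(0.0, 0.02, size=(256, 256)).astype(np.float16)
codes, scales = quantize_4bit(W)
W_hat = dequantize_4bit(codes, scales, W.shape)

print(codes.min(), codes.max())  # every code fits in 4 bits
print(np.abs(W.astype(np.float32) - W_hat.astype(np.float32)).max())  # worst-case error
```

Each reconstructed weight is off by at most half of its block's scale. This is why per-block scaling keeps the error small even though only 16 levels are available.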

Training Strategy:
The model is trained using Hugging Face's Trainer API.

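LoRA adapters are injected into the base model, which is then fine-tuned on a multilingual sentiment-classification dataset while the original weights stay frozen. The dataset contains sentences in 13 Indian languages labeled Positive or Negative, and each example is phrased as a multiple-choice prompt.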

Only the adapter weights, well under 1% of the model's parameters, are updated. This greatly reduces the memory needed for gradients and optimizer state.
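As a worked example of how small that fraction is, the following arithmetic counts the LoRA parameters added by the configuration used later in this notebook (r=16 on q_proj and v_proj). The model dimensions are assumed from the published LLaMA 3.1 8B configuration: hidden size 4096, 32 layers, and 8 key/value heads of dimension 128. They should be checked against the model's config.json. The total of about 8.03 billion parameters is likewise an assumption.

```python
def lora_params(in_features, out_features, r):
    """A LoRA pair adds A (r x in) and B (out x r): r * (in + out) parameters."""
    return r * (in_features + out_features)


# Assumed LLaMA 3.1 8B dimensions (verify against the model's config.json)
hidden_size = 4096
num_layers = 32
num_kv_heads = 8
head_dim = 128
r = 16

q_proj = lora_params(hidden_size, hidden_size, r)              # 4096 -> 4096
v_proj = lora_params(hidden_size, num_kv_heads * head_dim, r)  # 4096 -> 1024 (grouped-query attention)
trainable = num_layers * (q_proj + v_proj)

total = 8.03e9  # approximate parameter count of the base model (assumption)
print(f"{trainable:,} trainable LoRA parameters")  # 6,815,744
print(f"{100 * trainable / total:.3f}% of the base model")
```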

Benefits:
High Efficiency: Combining LoRA with 4-bit quantization shrinks the memory taken by the frozen weights as well as the memory for gradients and optimizer state. This gives a strong trade-off between task performance and memory usage.

Scalability: Makes it practical to experiment with multi-billion-parameter models on single-GPU or few-GPU machines.

Customization: The same setup can be reused for domain adaptation, dialogue fine-tuning, or specialized instruction following with minimal resources.
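A back-of-the-envelope calculation shows why the quantized weights fit on a single 16 GB GPU while the float16 weights alone would not. The sketch rests on several assumptions:

- the model has roughly 8.03 billion parameters;
- the vocabulary has 128,256 tokens and the hidden size is 4096;
- the input embedding and the untied output head stay in float16, while all other weights are stored in 4 bits.

It ignores the small per-block quantization constants, activations, and the LoRA optimizer state.

```python
# Rough weight-memory estimate for LLaMA 3.1 8B (all figures are assumptions)
TOTAL_PARAMS = 8.03e9          # approximate parameter count
VOCAB, HIDDEN = 128_256, 4096  # vocabulary size and hidden size

# Assume the input embedding and the untied output head are not quantized.
unquantized = 2 * VOCAB * HIDDEN
quantized = TOTAL_PARAMS - unquantized

fp16_gb = TOTAL_PARAMS * 2 / 1e9                          # 2 bytes per float16 weight
four_bit_gb = (quantized * 0.5 + unquantized * 2) / 1e9   # 0.5 bytes per 4-bit weight

print(f"float16 weights: ~{fp16_gb:.1f} GB")
print(f"4-bit weights:   ~{four_bit_gb:.1f} GB (excluding quantization constants)")
```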

This project illustrates practical techniques for reducing the cost of using and training large language models, broadening access to powerful AI tools for research, prototyping, and deployment.

In [None]:

import pandas as pd  # data processing; used at the end to write the submission CSV


In [None]:
# Installing bitsandbytes for quantization
!pip uninstall -y bitsandbytes
!pip install -U bitsandbytes


In [None]:
#Important libraries
import importlib
import torch
import bitsandbytes as bnb
importlib.reload(bnb)
print("bitsandbytes version:", bnb.__version__)


In [None]:
torch.cuda.init()

In [None]:
!pip install --upgrade transformers

import transformers

# Select PyTorch's scaled-dot-product-attention backends:
# enable the memory-efficient kernel and disable the FlashAttention kernel
torch.backends.cuda.enable_mem_efficient_sdp(True)
torch.backends.cuda.enable_flash_sdp(False)

model_path = "/kaggle/input/llama-3.1/transformers/8b-instruct/2"

In [None]:
!pip install datasets

In [None]:
# wandb is not imported: logging is disabled later with report_to='none'
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers import DataCollatorForLanguageModeling
from transformers import TrainingArguments, Trainer

In [None]:
#Loading the training dataset using datasets
train_dataset = load_dataset("csv", data_files='/kaggle/input/multi-lingual-sentiment-analysis/train.csv')

In [None]:
#Loading test data
test_dataset=load_dataset("csv", data_files="/kaggle/input/multi-lingual-sentiment-analysis/test.csv")

In [None]:
# Map dataset language codes to the language names used in the prompt
lang = {
    'as': 'Assamese', 'bd': 'Bodo', 'bn': 'Bengali', 'gu': 'Gujarati',
    'hi': 'Hindi', 'kn': 'Kannada', 'ml': 'Malayalam', 'mr': 'Marathi',
    'or': 'Odia', 'pa': 'Punjabi', 'ta': 'Tamil', 'te': 'Telugu', 'ur': 'Urdu',
}

In [None]:
# bitsandbytes and transformers were already installed above; accelerate is needed for device_map="auto"
!pip install -U accelerate

from transformers import BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,                 # store weights in 4 bits
    bnb_4bit_compute_dtype="float16",  # dequantize and compute in float16
)

In [None]:
# Load the tokenizer; it has no pad token by default, so reuse EOS for padding
tokenizer = AutoTokenizer.from_pretrained(model_path)
tokenizer.pad_token = tokenizer.eos_token

# Load the model with 4-bit quantized weights, placed automatically on the available GPU(s)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    pad_token_id=tokenizer.eos_token_id,
    quantization_config=quantization_config,
    device_map="auto",
)

In [None]:
# Map each label to its option letter in the prompt
option = {"Positive": "A", "Negative": "B"}

In [None]:
# Add a "modified_text" column holding the full training prompt, including the answer.
# The text must match the inference prompt used on the test set.
def modify_data(example):
    language = lang[example["language"]]
    prompt = (
        "Question: Which sentiment does the sentence " + example["sentence"]
        + " in the Indian language " + language
        + " have? Option A) Positive, Option B) Negative.The answer is Option "
        + option[example["label"]] + ") " + example["label"]
    )
    example["modified_text"] = prompt
    return example

In [None]:
#Modifying the dataset by adding a prompt column
train_dataset=train_dataset["train"].map(modify_data,remove_columns=['ID', 'sentence', 'label', 'language'])

In [None]:
#Tokenizing function
def tokenize(example):
    return tokenizer(example["modified_text"])

In [None]:
train_dataset= train_dataset.map(tokenize,batched=True,num_proc=4, remove_columns=['modified_text'])

In [None]:
#Data collator
data_collator = DataCollatorForLanguageModeling(tokenizer,mlm=False)
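With mlm=False, DataCollatorForLanguageModeling pads each batch and builds the labels. It copies input_ids and replaces every position whose id equals the tokenizer's pad_token_id with -100, the index ignored by the cross-entropy loss. The one-position shift for next-token prediction happens inside the model. The pure-Python sketch below mimics that labeling rule for illustration only. The function name build_clm_labels and the token ids are hypothetical, and right-side padding is assumed.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss


def build_clm_labels(batch_ids, pad_id):
    """Right-pad token-id lists and mask pad positions in the labels (illustrative only)."""
    max_len = max(len(ids) for ids in batch_ids)
    input_ids, attention_mask, labels = [], [], []
    for ids in batch_ids:
        padded = ids + [pad_id] * (max_len - len(ids))
        input_ids.append(padded)
        attention_mask.append([1] * len(ids) + [0] * (max_len - len(ids)))
        # Any token equal to pad_id is masked, including a genuine EOS when pad == eos
        labels.append([IGNORE_INDEX if t == pad_id else t for t in padded])
    return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}


# Hypothetical token ids; 9 stands in for the shared pad/eos id.
batch = build_clm_labels([[5, 6, 7], [5, 8]], pad_id=9)
print(batch["labels"])  # [[5, 6, 7], [5, 8, -100]]
```

Because pad_token was set to eos_token, this rule would also mask any real EOS token in a training sequence. With per_device_train_batch_size=1, each batch holds a single sequence, so no padding is added at all.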

In [None]:
from peft import LoraConfig, TaskType

#Lora configuration
lora_config = LoraConfig(
    r=16,
    target_modules=["q_proj", "v_proj"],
    task_type=TaskType.CAUSAL_LM,
    inference_mode=False,
    lora_alpha=32,
    lora_dropout=0.05
)

In [None]:
from peft import get_peft_model

lora_model = get_peft_model(model, lora_config)
lora_model.print_trainable_parameters()  # report how many weights LoRA will train

In [None]:
# The training arguments
training_args = TrainingArguments(
    output_dir='lora_llama_8b_ct',
    num_train_epochs=5,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=10,  # effective batch size of 10 per device
    bf16=False,
    fp16=False,
    tf32=False,
    adam_beta1=0.9,
    adam_beta2=0.999,
    learning_rate=2e-5,
    weight_decay=0.01,
    logging_dir='logs',
    report_to='none',  # disable experiment trackers such as wandb
)
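With gradient accumulation, gradients from several forward/backward passes are accumulated before each optimizer step. The effective batch size is therefore per_device_train_batch_size × gradient_accumulation_steps × number of devices. The hypothetical helper below estimates the resulting optimizer-step count. It assumes a single device and rounds a final partial accumulation window up to a full step; Trainer's exact handling of that last window has varied across versions. The dataset size passed in is a placeholder, not the real size of train.csv.

```python
import math


def training_schedule(num_examples, per_device_batch=1, grad_accum=10,
                      num_devices=1, epochs=5):
    """Approximate (effective batch, steps per epoch, total steps) for the arguments above."""
    effective_batch = per_device_batch * grad_accum * num_devices
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return effective_batch, steps_per_epoch, steps_per_epoch * epochs


# 1,000 is a placeholder dataset size, not the size of train.csv.
print(training_schedule(1000))  # (10, 100, 500)
```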


In [None]:
# Create the trainer
trainer = Trainer(
    model=lora_model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=None,
    data_collator=data_collator,
)

In [None]:
#Training
results = trainer.train()

In [None]:
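# Lists for the predictions and their row IDs
predictions = []
ids = []
next_id = 1  # running row ID (avoids shadowing the built-in id)

In [None]:
lora_model.eval()  # disable LoRA dropout for inference

for data in test_dataset["train"]:
    language = lang[data["language"]]

    # Build the prompt up to the answer. It ends at "Option" with no trailing space,
    # so the model generates " A) Positive" with the same tokenization as in training.
    prompt = "Question: Which sentiment does the sentence " + data["sentence"] + " in the Indian language " + language + " have? Option A) Positive, Option B) Negative.The answer is Option"
    # Tokenize and move the inputs to the model's device
    inputs = tokenizer(prompt, return_tensors='pt').to(lora_model.device)
    # Greedy decoding of a short continuation
    outputs = lora_model.generate(**inputs, max_new_tokens=10, do_sample=False)
    # Decode only the newly generated tokens; slicing by token count is more
    # reliable than slicing the decoded text by len(prompt) characters
    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    stringout = tokenizer.decode(new_tokens, skip_special_tokens=True).lower()
    # Check whether the generated answer mentions "positive"; otherwise predict Negative
    if "positive" in stringout[:15]:
        predictions.append("Positive")
    else:
        predictions.append("Negative")
    ids.append(next_id)
    next_id += 1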
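The keyword check above only looks for "positive" near the start of the generation. It treats everything else, including malformed output, as Negative. A slightly stricter parser is sketched below. parse_sentiment is a hypothetical helper, and the fallback label for unparseable output is an assumption you may want to change (for example, to None, so such cases can be counted). It first looks for the label word and then for a leading option letter, matching the "A) Positive" / "B) Negative" completions used in the training prompts.

```python
import re


def parse_sentiment(generated, default="Negative"):
    """Map a generated continuation such as ' A) Positive' to a sentiment label."""
    text = generated.strip().lower()
    # Prefer an explicit label word near the start of the answer
    if "positive" in text[:20]:
        return "Positive"
    if "negative" in text[:20]:
        return "Negative"
    # Otherwise accept a leading option letter, e.g. "A" or "(b"
    match = re.match(r"\(?([ab])\b", text)
    if match:
        return "Positive" if match.group(1) == "a" else "Negative"
    return default  # unparseable output (assumed fallback)


for sample in [" A) Positive", "B) Negative.", " a", "???"]:
    print(repr(sample), "->", parse_sentiment(sample))
```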



In [None]:
# Write the predictions to a CSV submission file
submission = pd.DataFrame({
    'ID': ids,             # row IDs
    'label': predictions,  # predicted sentiment labels
})

submission.to_csv('submission.csv', index=False)