# There are 4 steps in the implementation of fine-tuning an LLM with company policy documents


1. Installing necessary libraries for fine-tuning the HR Assistant model with Company Policy Documents
2. Preprocessing and Chunking Company Policy Documents for Fine-Tuning Model using QLoRA
3. Fine-Tuning Language Model using QLoRA with Company Policy Dataset for HR Assistant & Save Fine-Tuned LLM to Google Drive
4. Clearing up the used space

# 1. Installing necessary libraries for fine-tuning the HR Assistant model with Company Policy Documents



```python
%%capture
# --- RAG / application libraries ---
!pip install langchain  # Framework for building language model applications
!pip install -U langchain-community  # Community integrations (Hugging Face embeddings, PDF loaders, FAISS)
!pip install -U langchain_google_genai  # Google GenAI support for LangChain
!pip install sentence_transformers==2.2.2  # Embedding model for sentence-level representations
!pip install faiss-cpu  # Vector database for efficient similarity search
!pip install pypdf  # For handling PDF documents
!pip install -U gradio  # To build a demo site for chatbots
# --- Fine-tuning libraries ---
!pip install torch --index-url https://download.pytorch.org/whl/cu118  # PyTorch with CUDA support for GPU acceleration
!pip install datasets  # For loading and splitting datasets
!pip install huggingface_hub  # For accessing models on the Hugging Face Hub
!pip install -U transformers accelerate peft bitsandbytes  # Model loading, device placement, LoRA adapters, 4-bit quantization
# !pip install wandb  # Weights & Biases for experiment tracking (not used)
```
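Several of the APIs used later changed names between releases (for example, `TrainingArguments` renamed `evaluation_strategy` to `eval_strategy`), so it can help to record which versions were actually installed. A minimal sketch using only the standard library; the package list is an assumption and should be adjusted to what you rely on.

```python
from importlib.metadata import PackageNotFoundError, version

# PyPI distribution names to report (assumption: adjust to your environment)
PACKAGES = ["transformers", "peft", "bitsandbytes", "accelerate", "datasets", "langchain", "torch"]

def installed_versions(packages):
    """Return {package: installed version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

if __name__ == "__main__":
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name:>13}: {ver or 'NOT INSTALLED'}")
```

Printing this next to the training results makes it easier to reproduce a run later.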


# 2. Preprocessing and Chunking Company Policy Documents for Fine-Tuning Model using QLoRA


**Documentation:** Preprocessing and Chunking Company Policy Documents for Fine-Tuning Model using QLoRA for HR AI Assistant

**Usage Context:**
This preprocessing pipeline is designed for an HR AI Assistant that helps employees retrieve company policies using an LLM fine-tuned with QLoRA. The chunked text serves as the fine-tuning dataset and can also be indexed for Retrieval-Augmented Generation (RAG), enabling efficient question answering on HR policies. Splitting policy documents into smaller, meaningful sections helps the assistant return more accurate and contextually relevant responses.

1. Suppress Warnings & Import Libraries:
   - Suppresses unnecessary warnings to improve readability.
   - Imports libraries for text processing, embeddings, vector storage, document loading, and LLM-based chat assistance.

2. Setup & Authentication:
   - Mounts Google Drive to access the stored policy documents.
   - Downloads the NLTK data packages used for text processing.
   - Logs into the Hugging Face Hub, which is required to download the gated LLaMA-2 model later on.

3. Load Company Policy Documents:
   - Collects all PDF files from a specified folder in Google Drive.
   - Extracts text from each PDF with PyPDFLoader, which produces one document per page.

4. Text Splitting & Preprocessing:
   - Uses RecursiveCharacterTextSplitter to divide the extracted text into chunks of at most 500 characters.
   - Consecutive chunks overlap by up to 20 characters to preserve some context across chunk boundaries.

5. Dataset Creation for Fine-Tuning:
   - Saves the text chunks to a JSON Lines file (company_policy_dataset.jsonl), one {"text": ...} object per line.
   - This dataset is later used to fine-tune the HR AI Assistant with QLoRA to improve document-based query responses.
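To make `chunk_size` and `chunk_overlap` concrete, here is a simplified sliding-window splitter. It is not LangChain's algorithm: `RecursiveCharacterTextSplitter` first splits on separators such as paragraph breaks, newlines and spaces, then merges the pieces up to `chunk_size`. This sketch just cuts fixed-size character windows, so it only illustrates how consecutive chunks share `chunk_overlap` characters.

```python
def sliding_window_chunks(text: str, chunk_size: int = 500, chunk_overlap: int = 20) -> list:
    """Cut text into windows of at most chunk_size characters;
    consecutive windows share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # last window reached the end of the text
            break
    return chunks

sample = "".join(chr(ord("a") + i % 26) for i in range(1200))  # 1,200 synthetic characters
chunks = sliding_window_chunks(sample)
print(len(chunks), [len(c) for c in chunks])  # 3 [500, 500, 240]
```

Each window advances by `chunk_size - chunk_overlap` = 480 characters, so the last 20 characters of one chunk reappear at the start of the next.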


```python
import warnings
warnings.filterwarnings("ignore")  # Suppress unnecessary warnings

# Import necessary libraries for processing and fine-tuning
from langchain.text_splitter import RecursiveCharacterTextSplitter  # For chunking text
from langchain_community.embeddings import HuggingFaceEmbeddings  # For embedding models
from langchain_community.vectorstores import FAISS  # For vector database support
from langchain_community.document_loaders import PyPDFLoader  # For loading PDF documents
from langchain_google_genai import GoogleGenerativeAI  # For using Google LLM for chat assistance
from langchain.prompts import PromptTemplate  # For prompt templating
from langchain.chains import RetrievalQA  # For Question-Answering tasks over documents
from google.colab import drive  # For mounting Google Drive in Colab
from huggingface_hub import notebook_login  # For logging into Hugging Face hub

import os
import json
import torch
import nltk  # For NLP tasks

# Mount Google Drive and set up directories
drive.mount('/content/drive')
folder_path = '/content/drive/MyDrive/AI Project/CompanyPolicyDocuments'  # Folder containing PDFs

# Download necessary NLTK data
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

# Log in to the Hugging Face Hub (needed to download the gated Llama-2 weights).
# notebook_login() stores the token locally, so no token is hard-coded in the notebook.
notebook_login()

# Collect all PDF files in the folder (sorted for a reproducible order)
pdf_files = sorted(f for f in os.listdir(folder_path) if f.lower().endswith('.pdf'))

# Load all PDFs; PyPDFLoader returns one Document per page
documents = []
for pdf_file in pdf_files:
    pdf_path = os.path.join(folder_path, pdf_file)
    loader = PyPDFLoader(pdf_path)  # PDF loader for extracting text
    documents.extend(loader.load())

print(f"Loaded {len(documents)} documents from {folder_path}")

# Split pages into chunks of up to 500 characters with up to 20 characters of overlap
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=20)
text_chunks = text_splitter.split_documents(documents)

print(f"Number of text chunks created: {len(text_chunks)}")

# Save text chunks as JSON Lines (one {"text": ...} object per line) for training
dataset_path = "company_policy_dataset.jsonl"
with open(dataset_path, "w", encoding="utf-8") as f:
    for chunk in text_chunks:
        json.dump({"text": chunk.page_content}, f)
        f.write("\n")

print(f"Dataset saved to {dataset_path}")
```


Loaded 44 documents from /content/drive/MyDrive/AI Project/CompanyPolicyDocuments
212
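Before training, it is worth checking that `company_policy_dataset.jsonl` is valid JSON Lines: one JSON object per line with a non-empty `text` field. A minimal standard-library sketch; the helper name `validate_jsonl` and the "non-empty text" rule are assumptions, not part of the original notebook.

```python
import json

def validate_jsonl(path, field="text"):
    """Return (number of valid records, list of problem descriptions)."""
    valid, problems = 0, []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                problems.append(f"line {lineno}: blank line")
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append(f"line {lineno}: invalid JSON ({exc.msg})")
                continue
            value = record.get(field) if isinstance(record, dict) else None
            if not isinstance(value, str) or not value.strip():
                problems.append(f"line {lineno}: missing or empty '{field}'")
                continue
            valid += 1
    return valid, problems

# Usage in the notebook (assumes the file written by the preprocessing cell):
# count, issues = validate_jsonl("company_policy_dataset.jsonl")
# print(count, "valid records;", len(issues), "problems")
```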


# 3. Fine-Tuning Language Model using QLoRA with Company Policy Dataset for HR Assistant & Save Fine-Tuned LLM to Google Drive

**Documentation: Fine-Tuning LLaMA-2 with QLoRA for HR AI Assistant**

**Usage Context:**
This pipeline fine-tunes LLaMA-2-7B with QLoRA to enhance an HR AI Assistant that processes and understands company policy documents. Loading the base model in 4-bit precision and training only small LoRA adapters makes it feasible to adapt a 7B-parameter model on a single GPU. The adapted model is then used for retrieval-augmented question answering (RAG), so employees can query HR policies more effectively. The fine-tuned model is stored for deployment on AWS EKS as part of the HR chatbot system.
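A back-of-the-envelope estimate shows why 4-bit loading matters for a 7B model. The sketch counts weight storage only; it ignores activations, gradients and optimizer state for the LoRA adapters, quantization constants, and layers that are typically kept in higher precision, so real memory use is higher. The 7e9 parameter count is the nominal model size, used here as an approximation.

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate storage for model weights only, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

N_PARAMS = 7e9  # assumption: nominal "7B" size; the exact count differs slightly

for label, bits in [("fp32", 32), ("fp16", 16), ("4-bit", 4)]:
    print(f"{label:>5}: ~{weight_memory_gb(N_PARAMS, bits):.1f} GB")
```

This gives roughly 28, 14 and 3.5 GB. The 4x reduction from fp16 to 4-bit is what leaves room on a single mid-range GPU for activations and the adapters' optimizer state.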

1. Import Necessary Libraries
   - Loads the libraries needed for fine-tuning: Hugging Face's transformers, datasets, peft (for LoRA), and BitsAndBytesConfig (for quantization).
2. Load and Prepare the Dataset
   - Loads the processed company_policy_dataset.jsonl file.
   - Splits the data into training (60%) and validation (40%) sets.
   - Saves both splits for future reuse.
3. Model Setup for Fine-Tuning
   - Loads LLaMA-2-7B from Hugging Face with 4-bit quantization to reduce memory use.
   - Ensures the tokenizer has a padding token so sequences in a batch can be padded to the same length.
   - Configures LoRA (Low-Rank Adaptation) so that only small low-rank adapter matrices on selected attention projections are trained, while the quantized base weights stay frozen.
4. Tokenization and Data Preparation
   - Defines a tokenization function that converts text into token-ID sequences of at most 512 tokens.
   - Maps this function over both the training and validation datasets.
5. Training Configuration & Execution
   - Defines training arguments for a fast run: batch size, learning rate, a single epoch, and no intermediate evaluation or checkpointing.
   - Initializes a Trainer to handle the training process.
   - Trains the model and saves the LoRA adapter weights and tokenizer to Google Drive for future use.
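LoRA represents the update to a frozen weight matrix W (d_out × d_in) as (lora_alpha / r) · B·A, where A is r × d_in and B is d_out × r, so each adapted matrix adds r·(d_in + d_out) trainable parameters. The sketch below counts them for the LoraConfig used in this notebook's fine-tuning cell (r=16, lora_alpha=32). It assumes LLaMA-2-7B's published shape (32 decoder layers, hidden size 4096) and that the adapters land on `q_proj` and `v_proj`, which is what PEFT selects for LLaMA when `target_modules` is not given.

```python
def lora_trainable_params(r, d_in, d_out, modules_per_layer, n_layers):
    """Trainable parameters added by LoRA: A is (r x d_in), B is (d_out x r)."""
    per_matrix = r * d_in + d_out * r
    return per_matrix * modules_per_layer * n_layers

HIDDEN = 4096      # LLaMA-2-7B hidden size (q_proj and v_proj are 4096 x 4096)
N_LAYERS = 32      # LLaMA-2-7B decoder layers
R, ALPHA = 16, 32  # values from the notebook's LoraConfig

# assumption: two adapted matrices per layer (q_proj, v_proj)
trainable = lora_trainable_params(R, HIDDEN, HIDDEN, modules_per_layer=2, n_layers=N_LAYERS)
print(f"trainable LoRA params: {trainable:,}")  # 8,388,608
print(f"scaling factor alpha/r: {ALPHA / R}")   # 2.0
print(f"share of ~6.74B base params: {trainable / 6.74e9:.4%}")
```

If the assumptions hold, the same count should appear in PEFT's `model.print_trainable_parameters()` output, about 0.12% of the model.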


```python
# Importing libraries for fine-tuning model
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from peft import LoraConfig, TaskType, get_peft_model, prepare_model_for_kbit_training
from datasets import load_dataset

# Load the dataset from the saved JSONL file
dataset = load_dataset("json", data_files="company_policy_dataset.jsonl")

# Split data into training (60%) and validation (40%) sets
dataset = dataset["train"].train_test_split(test_size=0.4, seed=42)
train_data = dataset["train"]
val_data = dataset["test"]

# Optionally save the train and validation splits for future use.
# The defaults (orient="records", lines=True) write JSON Lines that load_dataset("json") can read back.
train_data.to_json("train_dataset.json")
val_data.to_json("val_dataset.json")
print("Datasets saved!")

# Model setup for fine-tuning (gated model: requires an accepted Llama-2 license and Hub login)
model_name = "meta-llama/Llama-2-7b-hf"

# QLoRA-style 4-bit quantization: NF4 weights, double quantization, fp16 compute
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

# Load the tokenizer and the quantized base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
base_model = AutoModelForCausalLM.from_pretrained(
    model_name, device_map="auto", quantization_config=bnb_config
)

# LLaMA-2 has no padding token; reuse EOS instead of adding a new token,
# so the quantized embedding matrix does not need resizing
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Prepare the quantized model for training: freezes base weights,
# upcasts non-quantized layers to fp32 and enables gradient checkpointing
base_model = prepare_model_for_kbit_training(base_model)

# LoRA (Low-Rank Adaptation) configuration for fine-tuning
lora_config = LoraConfig(
    r=16,  # Rank of the low-rank update matrices
    lora_alpha=32,  # Updates are scaled by lora_alpha / r
    lora_dropout=0.1,
    target_modules=["q_proj", "v_proj"],  # PEFT's default for LLaMA, made explicit
    task_type=TaskType.CAUSAL_LM,
)

# Apply LoRA adapters to the model; only the adapter weights are trainable
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()

# Tokenize text into sequences of at most 512 tokens; padding and labels are added by the collator
def tokenize_function(examples):
    return tokenizer(examples["text"], truncation=True, max_length=512)

# Apply tokenization to both datasets and drop the raw text column
tokenized_train = train_data.map(tokenize_function, batched=True, remove_columns=["text"])
tokenized_val = val_data.map(tokenize_function, batched=True, remove_columns=["text"])

# Training arguments setup (optimized for faster training)
training_args = TrainingArguments(
    output_dir="./results",
    eval_strategy="no",  # Skip evaluation to save time (called evaluation_strategy in older releases)
    save_strategy="no",  # No intermediate checkpoints
    logging_dir="./logs",
    learning_rate=5e-5,
    per_device_train_batch_size=8,  # Larger batch size for efficient training
    per_device_eval_batch_size=8,
    num_train_epochs=1,  # Reduce epochs to 1 for faster training
    weight_decay=0.01,
    logging_steps=100,  # Less frequent logging
    save_total_limit=1,
    report_to="none",  # Disable Weights & Biases and other trackers (avoids the wandb login prompt)
    push_to_hub=False,  # Set to True if pushing to Hugging Face Hub
)

# Causal-LM collator: pads each batch dynamically and sets labels to a copy of
# input_ids, with pad-token positions set to -100 so the loss ignores them
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

# Initialize Trainer for model training
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_val,
    data_collator=data_collator,
)

# Start model training
trainer.train()

# Save the LoRA adapter weights and tokenizer to Google Drive (Colab mounts it at /content/drive/MyDrive)
save_path = "/content/drive/MyDrive/trained_llama_model_gen_AI"
model.save_pretrained(save_path)
tokenizer.save_pretrained(save_path)

print("Model trained and saved successfully!")
```

Datasets created!




| Epoch | Training Loss | Validation Loss |
|------:|--------------:|----------------:|
| 1 | 0.5249 | 0.517938 |
| 2 | 0.4448 | 0.484946 |
| 3 | 0.4630 | 0.476997 |


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
Model trained and saved successfully!
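For causal language modelling, the labels are the input token IDs themselves (the model shifts them internally to predict the next token). When sequences are padded to a common length, padded positions should be set to -100, the value PyTorch's cross-entropy loss ignores by default (`ignore_index=-100`), so the model is not trained to predict padding. A framework-free sketch of that masking step; the token IDs are made up, and `PAD_ID = 2` (LLaMA-2's EOS ID) is an assumption that mirrors reusing EOS as the padding token.

```python
PAD_ID = 2           # assumption: padding reuses the EOS id (2 in the LLaMA-2 tokenizer)
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss

def pad_and_label(batch_ids, pad_id=PAD_ID):
    """Right-pad token-ID lists to equal length; return input_ids, attention_mask, labels."""
    max_len = max(len(ids) for ids in batch_ids)
    input_ids, attention_mask, labels = [], [], []
    for ids in batch_ids:
        n_pad = max_len - len(ids)
        input_ids.append(ids + [pad_id] * n_pad)
        attention_mask.append([1] * len(ids) + [0] * n_pad)
        # Real tokens are their own labels; padded positions are ignored by the loss
        labels.append(ids + [IGNORE_INDEX] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}

batch = pad_and_label([[1, 450, 5344, 2], [1, 3600]])
print(batch["labels"])  # [[1, 450, 5344, 2], [1, 3600, -100, -100]]
```

Masking by position (from the padding length) rather than by comparing against `pad_id` keeps a genuine EOS at the end of a sequence as a training target, as in the first row above.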


# 4. Clearing up the used space
**Documentation:** Clearing Model and Freeing GPU Memory
1. Delete Model and Related Objects
   - Removes the fine-tuned model (model), tokenizer (tokenizer), trainer (trainer), and data collator (data_collator) from memory.
   - Also deletes the base pre-trained model (base_model). The LoRA-wrapped model is built on top of it, so as long as this name exists the base weights stay in GPU memory.
2. Free Up GPU Memory
   - Calls torch.cuda.empty_cache() to release GPU memory that PyTorch's caching allocator still holds but no longer uses, so later allocations and other processes can use it. It can only release memory that no live tensor references, which is why the objects are deleted first.

**Usage Context:**
This cleanup step matters when working with large language models (LLMs) like LLaMA-2-7B, especially in resource-constrained environments such as Google Colab or cloud-based AWS GPU instances. Freeing VRAM allows new models to be loaded or further experiments to be run without hitting CUDA out-of-memory (OOM) errors.
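Why must both `model` and `base_model` be deleted? Memory is only reclaimed once no Python reference to an object remains, and a wrapper that holds its base model keeps it alive. The plain-Python sketch below illustrates this with hypothetical stand-in classes (`BaseModel` and `AdapterWrapper` are not PEFT classes) and a `weakref` that observes when the object is actually freed. The `demo_` names avoid clobbering the notebook's real variables.

```python
import gc
import weakref

class BaseModel:
    """Hypothetical stand-in for a large pre-trained model."""
    def __init__(self):
        self.weights = bytearray(10_000_000)  # ~10 MB placeholder buffer

class AdapterWrapper:
    """Hypothetical stand-in for a wrapper (like a PEFT model) that holds its base model."""
    def __init__(self, base):
        self.base = base

demo_base = BaseModel()
demo_model = AdapterWrapper(demo_base)
probe = weakref.ref(demo_base)  # observes the object without keeping it alive

del demo_model
gc.collect()
alive_after_first_del = probe() is not None
print("after del demo_model:", alive_after_first_del)   # True: demo_base still refers to it

del demo_base
gc.collect()
alive_after_second_del = probe() is not None
print("after del demo_base:", alive_after_second_del)   # False: no references remain
```

The same reasoning applies to GPU tensors: `torch.cuda.empty_cache()` has nothing to release while any name still points at the model.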

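```python
import gc
import torch

# Clearing space: drop every reference to the model objects first
del model
del tokenizer
del trainer
del data_collator
del base_model

gc.collect()  # Collect any remaining reference cycles so the tensors are actually freed
torch.cuda.empty_cache()  # Return cached, now-unused GPU memory to the driver
```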