
## The development process was guided by the following repositories and documentation:

1. [Basics of Fine-Tuning - Shreyash Singh](https://github.com/ShreyashSingh1/Fine-Tuning-models)

2. [Advanced Fine-Tuning - Shreyash Singh](https://github.com/ShreyashSingh1/Adavence-Fine-Tunning)


3. [Hugging Face Transformers Documentation](https://huggingface.co/docs/transformers/en/main_classes/processors)

These references have been instrumental in understanding fine-tuning techniques and implementing efficient model training workflows.



# QLoRA Fine-Tuning Documentation

## All details of the fine-tuning process are documented here:  
[QLoRA Fine-Tuning Documentation - Shreyash Singh](https://charmed-amount-e80.notion.site/QLoRA-Fine-Tuning-Documentation-Shreyash-Singh-19f0d537ad5080ec8c62c7ae408911ec)  


In [None]:
!pip install pandas torch transformers peft datasets huggingface_hub bitsandbytes accelerate



In [None]:
from huggingface_hub import login
login()


In [None]:
import os
os.environ["WANDB_DISABLED"] = "true"

In [None]:
import torch
import pandas as pd
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    TrainingArguments,
    Trainer,
    BitsAndBytesConfig
)
from peft import LoraConfig, get_peft_model
from datasets import Dataset
from huggingface_hub import HfApi, login

# Check for GPU availability
device = "cuda" if torch.cuda.is_available() else "cpu"

# Define fine-tuning class
class QLoRAFineTuner:
    """
    A class to fine-tune a language model using Quantized LoRA (QLoRA).
    """
    def __init__(self, model_id, lora_r, lora_alpha, lora_dropout, learning_rate, epochs, batch_size):
        """
        Initializes the QLoRA fine-tuner.

        Args:
            model_id (str): The Hugging Face model ID to fine-tune.
            lora_r (int): Rank of the LoRA adapter.
            lora_alpha (int): Scaling factor for LoRA.
            lora_dropout (float): Dropout rate for LoRA.
            learning_rate (float): Learning rate for fine-tuning.
            epochs (int): Number of training epochs.
            batch_size (int): Training batch size.
        """
        self.model_id = model_id

        quantization_config = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_compute_dtype=torch.float16,
            bnb_4bit_quant_type="nf4",
            bnb_4bit_use_double_quant=True,
        )

        # Load model with quantization
        self.model = AutoModelForCausalLM.from_pretrained(
            model_id,
            quantization_config=quantization_config,
            device_map="auto",
        )

        # Load tokenizer and set padding token if missing
        self.tokenizer = AutoTokenizer.from_pretrained(model_id)
        if self.tokenizer.pad_token is None:
            self.tokenizer.pad_token = self.tokenizer.eos_token

        # Detect LoRA target modules dynamically
        possible_target_modules = ["q_proj", "v_proj", "k_proj", "o_proj", "gate_proj", "down_proj", "up_proj"]
        model_modules = dict(self.model.named_modules()).keys()
        target_modules = [layer for layer in possible_target_modules if any(layer in name for name in model_modules)]

        if not target_modules:
            raise ValueError("No valid LoRA target modules found in the model. Check model architecture.")

        # Apply LoRA
        self.lora_config = LoraConfig(
            r=lora_r,
            lora_alpha=lora_alpha,
            lora_dropout=lora_dropout,
            target_modules=target_modules,
            task_type="CAUSAL_LM",
        )
        self.model = get_peft_model(self.model, self.lora_config)

        self.learning_rate = learning_rate
        self.epochs = epochs
        self.batch_size = batch_size

    def tokenize_function(self, examples):
        """
        Tokenizes input text and target pairs for causal language model training.

        Args:
            examples (dict): A dictionary containing "text" and "target" keys.

        Returns:
            dict: Tokenized inputs with input IDs and labels.
        """
        inputs = [f"{text} {target}" for text, target in zip(examples["text"], examples["target"])]

        tokenized = self.tokenizer(
            inputs, truncation=True, padding="max_length", max_length=512
        )

        tokenized["labels"] = tokenized["input_ids"].copy()
        return tokenized

    def fine_tune(self, dataset):
        """
        Fine-tunes the model on the provided dataset.

        Args:
            dataset (pd.DataFrame): A Pandas DataFrame containing "text" and "target" columns.

        Returns:
            str: Message indicating completion of fine-tuning.
        """
        dataset = Dataset.from_pandas(dataset)
        tokenized_datasets = dataset.map(self.tokenize_function, batched=True)

        training_args = TrainingArguments(
            output_dir="./fine_tuned_model",
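            # No eval dataset is passed, so the default evaluation strategy ("no") applies.
            # The older `evaluation_strategy` keyword is deprecated in recent transformers releases.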
            learning_rate=self.learning_rate,
            per_device_train_batch_size=self.batch_size,
            num_train_epochs=self.epochs,
            weight_decay=0.01,
            fp16=True,
            save_total_limit=1,
            save_strategy="epoch",
            logging_dir="./logs",
            logging_steps=10,
        )

        trainer = Trainer(
            model=self.model,
            args=training_args,
            train_dataset=tokenized_datasets,
        )

        trainer.train()
        self.model.save_pretrained("./fine_tuned_model")
        return "Fine-tuning complete!"

    def push_to_huggingface(self, hf_token, repo_id):
        """
        Pushes the fine-tuned model to Hugging Face Hub.

        Args:
            hf_token (str): The user's Hugging Face API token.
            repo_id (str): The repository ID on Hugging Face.

        Returns:
            str: Confirmation message after successful upload.
        """
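        login(hf_token)
        api = HfApi()
        # Create the target repo if it does not exist yet; upload_folder expects it to exist.
        api.create_repo(repo_id=repo_id, repo_type="model", exist_ok=True)
        api.upload_folder(folder_path="./fine_tuned_model", repo_id=repo_id, repo_type="model")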
        return f"Model pushed successfully to {repo_id}"


# === USER INPUTS ===
model_id = input("Enter Hugging Face model ID (e.g., 'meta-llama/Llama-3.2-1B'): ")

# Upload dataset
print("Upload your dataset (CSV or JSON)")
from google.colab import files
uploaded = files.upload()
file_name = list(uploaded.keys())[0]

# Load dataset
if file_name.endswith(".csv"):
    df = pd.read_csv(file_name)
elif file_name.endswith(".json"):
    df = pd.read_json(file_name)
else:
    raise ValueError("Only CSV and JSON files are supported!")

print("Dataset preview:")
print(df.head())

# Hyperparameters
lora_r = int(input("Enter LoRA Rank (r) (default: 8): ") or 8)
lora_alpha = int(input("Enter LoRA Alpha (default: 32): ") or 32)
lora_dropout = float(input("Enter LoRA Dropout (default: 0.1): ") or 0.1)
epochs = int(input("Enter number of epochs (default: 3): ") or 3)
batch_size = int(input("Enter batch size (default: 4): ") or 4)
learning_rate = float(input("Enter learning rate (default: 5e-5): ") or 5e-5)

# Initialize fine-tuner
print("Initializing fine-tuning...")
tuner = QLoRAFineTuner(model_id, lora_r, lora_alpha, lora_dropout, learning_rate, epochs, batch_size)
print("Fine-tuning in progress...")
result = tuner.fine_tune(df)
print(result)

# Push to Hugging Face
push_model = input("Do you want to push the model to Hugging Face? (yes/no): ").lower()
if push_model == "yes":
    hf_token = input("Enter your Hugging Face API Token: ")
    repo_id = input("Enter your Hugging Face repo ID (e.g., 'username/qlora-model'): ")
    print("Uploading model to Hugging Face...")
    push_result = tuner.push_to_huggingface(hf_token, repo_id)
    print(push_result)

print("✅ Script execution completed!")

Enter Hugging Face model ID (e.g., 'meta-llama/Llama-3.2-1B'): meta-llama/Llama-3.2-1B
Upload your dataset (CSV or JSON)


Saving LLMdata.csv to LLMdata (1).csv
Dataset preview:
                                               text  \
0           ### Human: What is a service blueprint?   
1           ### Human: What is a service blueprint?   
2                ### Human: What is customer churn?   
3                ### Human: What is customer churn?   
4  ### Human: How does continuous improvement work?   

                                              target  
0  ### target: Customer churn refers to the loss ...  
1  ### target: DMAIC is a data-driven quality str...  
2  ### target: A service blueprint is a detailed ...  
3  ### target: Yes! Process optimization involves...  
4  ### target: Continuous improvement is an ongoi...  
Enter LoRA Rank (r) (default: 8): 
Enter LoRA Alpha (default: 32): 
Enter LoRA Dropout (default: 0.1): 
Enter number of epochs (default: 3): 1
Enter batch size (default: 4): 
Enter learning rate (default: 5e-5): 
Initializing fine-tuning...
Fine-tuning in progress...


Map:   0%|          | 0/200 [00:00<?, ? examples/s]

Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).


| Step | Training Loss |
|------|---------------|
| 10   | 5.0022        |
| 20   | 0.2285        |
| 30   | 0.164         |
| 40   | 0.1215        |
| 50   | 0.1056        |


Fine-tuning complete!
Do you want to push the model to Hugging Face? (yes/no): no
✅ Script execution completed!


# Trained the meta-llama/Llama-3.2-1B model on sample data to verify the workflow functionality.
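
One detail of the run above is worth noting: `tokenize_function` copies `input_ids` into `labels` unchanged, so padded positions also count toward the loss. A common refinement is to set the label at padded positions to `-100`, the index that PyTorch's cross-entropy loss ignores by default. The helper below is a minimal sketch of that idea and is not part of the original notebook. `mask_padding_labels` is a hypothetical name, and the toy token IDs are made up for illustration.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's CrossEntropyLoss by default


def mask_padding_labels(input_ids, attention_mask):
    """Return labels equal to input_ids, with padded positions set to IGNORE_INDEX."""
    return [
        [tok if keep == 1 else IGNORE_INDEX for tok, keep in zip(ids, mask)]
        for ids, mask in zip(input_ids, attention_mask)
    ]


# Toy batch: token id 0 stands in for padding here (an assumption for illustration).
batch = {
    "input_ids": [[101, 7, 8, 0, 0], [101, 9, 0, 0, 0]],
    "attention_mask": [[1, 1, 1, 0, 0], [1, 1, 0, 0, 0]],
}
labels = mask_padding_labels(batch["input_ids"], batch["attention_mask"])
print(labels)
```

Inside `tokenize_function`, this would take the place of the plain `.copy()`, for example `tokenized["labels"] = mask_padding_labels(tokenized["input_ids"], tokenized["attention_mask"])`. Masking by attention mask rather than by token ID keeps any genuine end-of-sequence tokens in the loss even when the pad token is set to the EOS token.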



# Fine-Tuning API Documentation

All details of the fine-tuning API are documented here:


[Fine-Tuning API Documentation - Shreyash Singh](https://www.notion.so/Fine-Tuning-API-Documentation-Shreyash-Singh-19f0d537ad5080868794c923ff2ed538)  


In [None]:
!pip install fastapi uvicorn torch transformers peft datasets bitsandbytes accelerate pandas huggingface_hub python-multipart

# Developed a sample API workflow for fine-tuning services.

In [None]:
from fastapi import FastAPI, UploadFile, File, BackgroundTasks, HTTPException, Form
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments, Trainer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model
from datasets import Dataset
import shutil
import os
import pandas as pd
import uuid

from huggingface_hub import login

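# Read the token from the environment instead of hard-coding it in the source.
# HF_TOKEN must be set before starting the server.
login(token=os.environ["HF_TOKEN"])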

app = FastAPI()

# Directory to store datasets and models
DATASET_DIR = "datasets"
MODEL_DIR = "fine_tuned_models"
os.makedirs(DATASET_DIR, exist_ok=True)
os.makedirs(MODEL_DIR, exist_ok=True)

def train_model(model_id, dataset_path, job_id):
    """Fine-tuning process"""
    df = pd.read_csv(dataset_path)  # Load CSV

    if "text" not in df.columns:
        raise ValueError("CSV file must contain a 'text' column for fine-tuning")

    # Load model with quantization
    quantization_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.float16,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_use_double_quant=True,
    )

    model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=quantization_config, device_map="auto")
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token

    # Apply LoRA
    target_modules = ["q_proj", "v_proj", "k_proj", "o_proj", "gate_proj", "down_proj", "up_proj"]
    lora_config = LoraConfig(r=8, lora_alpha=32, lora_dropout=0.1, target_modules=target_modules, task_type="CAUSAL_LM")
    model = get_peft_model(model, lora_config)

    # Tokenization
    def tokenize_function(examples):
        inputs = examples["text"]
        tokenized = tokenizer(inputs, truncation=True, padding="max_length", max_length=512)
        tokenized["labels"] = tokenized["input_ids"].copy()
        return tokenized

    dataset = Dataset.from_pandas(df)
    tokenized_datasets = dataset.map(tokenize_function, batched=True)

    training_args = TrainingArguments(
        output_dir=f"{MODEL_DIR}/{job_id}",
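        # No eval dataset is passed, so the default evaluation strategy ("no") applies.
        # The older `evaluation_strategy` keyword is deprecated in recent transformers releases.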
        learning_rate=5e-5,
        per_device_train_batch_size=4,
        num_train_epochs=3,
        weight_decay=0.01,
        fp16=True,
        save_total_limit=1,
        save_strategy="epoch",
        logging_dir=f"{MODEL_DIR}/{job_id}/logs",
    )

    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=tokenized_datasets,
    )

    trainer.train()
    model.save_pretrained(f"{MODEL_DIR}/{job_id}")

@app.post("/upload")
async def upload_dataset(
    background_tasks: BackgroundTasks,  # Injected by FastAPI from the type annotation
    file: UploadFile = File(...),
    model_id: str = Form(...),  # User must provide the model ID
):
    """Upload dataset and start fine-tuning"""
    if not file.filename.endswith(".csv"):
        raise HTTPException(status_code=400, detail="Only CSV files are supported")

    if not model_id:
        raise HTTPException(status_code=400, detail="Model ID is required")

    job_id = str(uuid.uuid4())
    dataset_path = f"{DATASET_DIR}/{job_id}.csv"

    with open(dataset_path, "wb") as buffer:
        shutil.copyfileobj(file.file, buffer)

    background_tasks.add_task(train_model, model_id, dataset_path, job_id)

    return {"message": "Dataset uploaded successfully. Fine-tuning in progress", "job_id": job_id}

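@app.get("/status/{job_id}")
def get_status(job_id: str):
    """Check if model training is completed"""
    model_path = f"{MODEL_DIR}/{job_id}"
    # Unknown job IDs have no uploaded dataset on disk.
    if not os.path.exists(f"{DATASET_DIR}/{job_id}.csv"):
        raise HTTPException(status_code=404, detail="Unknown job ID")
    # The Trainer may create the output directory while training is still running,
    # so check for the adapter config that save_pretrained writes at the end.
    if os.path.exists(f"{model_path}/adapter_config.json"):
        return {"status": "completed", "model_path": model_path}
    return {"status": "in progress"}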

# Sample upload request via Postman
![](https://i.imgur.com/kUMzl7t.png)

# Streamlit Web Application for Fine-Tuning

# Landing Page
![Landing Page](https://i.imgur.com/OYJTvUb.png)

# Backend training Model
![Backend training Image](https://i.imgur.com/78RbOuD.png)


# Streamlit Code

In [None]:
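import pandas as pd
import streamlit as st

# QLoRAFineTuner is the class defined in the QLoRA fine-tuning cell above;
# it must be defined or imported in the same script before this code runs.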

st.title("Agentic AI QLoRA Fine-Tuning & Deployment")
st.write("Fine-tune Hugging Face LLM models with QLoRA and deploy them.")

# Step 1: Model ID
model_id = st.text_input("Enter Hugging Face model ID:", "meta-llama/Llama-3-8B")

# Step 2: Upload Dataset
uploaded_file = st.file_uploader("Upload dataset (CSV or JSON):", type=["csv", "json"])
file_format = st.selectbox("Dataset format:", ["csv", "json"])
if uploaded_file:
    df = pd.read_csv(uploaded_file) if file_format == "csv" else pd.read_json(uploaded_file)
    st.dataframe(df.head())

# Step 3: Define hyperparameters
lora_r = st.number_input("LoRA Rank (r):", min_value=1, value=8)
lora_alpha = st.number_input("LoRA Alpha:", min_value=1, value=32)
lora_dropout = st.number_input("LoRA Dropout:", min_value=0.0, max_value=1.0, value=0.1)
epochs = st.number_input("Epochs:", min_value=1, value=3)
batch_size = st.number_input("Batch Size:", min_value=1, value=4)
learning_rate = st.text_input("Learning Rate:", "5e-5")

# Step 4: Fine-tuning
if st.button("Start Fine-Tuning") and uploaded_file:
    with st.spinner("Initializing fine-tuning with QLoRA... Please wait!"):
        try:
            tuner = QLoRAFineTuner(model_id, lora_r, lora_alpha, lora_dropout, float(learning_rate), epochs, batch_size)
            result = tuner.fine_tune(df)
            # Streamlit reruns the script on every interaction, so keep the tuner
            # in session state for the push step below.
            st.session_state["tuner"] = tuner
            st.success(result)
        except Exception as e:
            st.error(f"Error during fine-tuning: {e}")

# Step 5: Push to Hugging Face
hf_token = st.text_input("Enter Hugging Face API Token:", type="password")
repo_id = st.text_input("Enter your Hugging Face repo ID:", "your-hf-username/qlora-model")

if st.button("Push Model to Hugging Face"):
    # Retrieve the tuner saved by the fine-tuning step, if any.
    tuner = st.session_state.get("tuner")
    if tuner is None:
        st.error("Fine-tune a model first; no trained model is available in this session.")
    elif hf_token and repo_id:
        with st.spinner("Uploading model to Hugging Face..."):
            try:
                push_result = tuner.push_to_huggingface(hf_token, repo_id)
                st.success(push_result)
            except Exception as e:
                st.error(f"Error during upload: {e}")
    else:
        st.error("Please enter your Hugging Face API token and repo ID.")

# Security, Deployment & Scalability Considerations

## 1. Securing Model & Dataset Uploads
- **Use authentication** (API keys, OAuth, JWT) to ensure only authorized users can upload.  
- **Encrypt files**  
  - *In transit:* Use HTTPS.  
  - *At rest:* Use AES encryption.  
- **Limit file size and type** to prevent malicious uploads.  
- **Scan uploaded files** for malware or unauthorized content.  
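
As a rough illustration of the "limit file size and type" point, the sketch below checks an upload before it is written to disk or handed to training. It uses only the standard library. The 10 MB cap, the `validate_upload` name, and the constants are assumptions chosen for the example; the required `text` column mirrors what the sample API's training function expects. Authentication and malware scanning are outside its scope.

```python
import csv
import io
import os

MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # assumed 10 MB cap; tune for your datasets
ALLOWED_EXTENSIONS = {".csv"}        # the sample API accepts CSV only
REQUIRED_COLUMNS = {"text"}          # column the training code expects


def validate_upload(filename: str, payload: bytes) -> None:
    """Raise ValueError if the upload fails a type, size, or schema check."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"Unsupported file type: {ext or 'none'}")
    if len(payload) > MAX_UPLOAD_BYTES:
        raise ValueError("File exceeds the upload size limit")
    try:
        text = payload.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError("File is not valid UTF-8 text") from exc
    # Only the header row is inspected; the body is left for pandas to parse.
    header = next(csv.reader(io.StringIO(text)), [])
    missing = REQUIRED_COLUMNS - set(header)
    if missing:
        raise ValueError(f"Missing required columns: {sorted(missing)}")


validate_upload("data.csv", b"text,target\nhello,world\n")  # passes silently
```

In an API handler, a `ValueError` from this check would typically be translated into an HTTP 400 response before any training job is scheduled.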

## 2. Scaling the Fine-Tuning Process
- **Use GPU-enabled cloud instances** to handle multiple users efficiently.  
- **Implement job queues** (e.g., Celery, Kafka) to manage training requests without overloading servers.  
- **Enable auto-scaling** to add or remove compute resources based on demand.  
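
The job-queue idea can be sketched in-process with the standard library before reaching for Celery or a similar system. In the example below, a single worker thread pulls jobs one at a time, so concurrent requests never compete for the same GPU. All names (`submit`, `worker`, `fake_train`, `job_status`) are hypothetical, and `fake_train` is a placeholder for the real training call.

```python
import queue
import threading

job_queue = queue.Queue()
job_status = {}  # job_id -> "queued" | "running" | "completed" | "failed"


def fake_train(job_id: str) -> None:
    """Placeholder for the real training call; fails for one ID to show error handling."""
    if job_id == "bad-job":
        raise RuntimeError("simulated training failure")


def worker() -> None:
    """Run queued jobs one at a time so a single GPU is never oversubscribed."""
    while True:
        job_id = job_queue.get()
        if job_id is None:  # sentinel: shut the worker down
            job_queue.task_done()
            break
        job_status[job_id] = "running"
        try:
            fake_train(job_id)
            job_status[job_id] = "completed"
        except Exception:
            job_status[job_id] = "failed"
        finally:
            job_queue.task_done()


def submit(job_id: str) -> None:
    """Record the job as queued, then hand it to the worker."""
    job_status[job_id] = "queued"
    job_queue.put(job_id)


thread = threading.Thread(target=worker, daemon=True)
thread.start()
for jid in ("job-1", "bad-job", "job-2"):
    submit(jid)
job_queue.put(None)
job_queue.join()  # block until every queued item has been processed
print(job_status)
```

A status endpoint could read from a store like `job_status` instead of inspecting the filesystem, which also makes failed jobs visible. A production setup would keep that state in a shared store rather than in process memory.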

## 3. Deployment Options
- **AWS (EC2, S3, EKS):** For full control over infrastructure.  
- **Serverless (AWS Lambda, Google Cloud Functions):** For handling API requests without managing servers.  
- **Kubernetes (EKS, GKE):** For managing multiple workloads in a scalable way.  

## 4. Best Practices for API & Service Reliability
- **Load balancing** to distribute requests across multiple servers.  
- **Caching** frequently used responses to reduce computation time.  
- **Monitoring & logging** (Prometheus, Grafana) to track performance and errors.  
- **Rate limiting** to prevent abuse and ensure fair usage among users.  
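
Rate limiting is often implemented as a token bucket: each client has a bucket that refills at a fixed rate, and a request is served only if a token is available. The following is a minimal single-process sketch of that technique. The class, the per-client dictionary, and the rate and capacity values are illustrative assumptions rather than part of the project's code.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


buckets = {}  # one bucket per client key (for example, an API key)


def is_allowed(client_key: str) -> bool:
    """Look up (or create) the client's bucket and try to take one token."""
    if client_key not in buckets:
        buckets[client_key] = TokenBucket(rate=1.0, capacity=3)
    return buckets[client_key].allow()


print([is_allowed("client-a") for _ in range(5)])
```

With several API servers behind a load balancer, the bucket state would need to live in a shared store so that limits apply across instances.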

---