# Fine-Tuning ERNIE for Accurate Indian Tax Regime Question Answering

**Team Name:** Butterfly Effect  
**Team Members:** Madhava Sriram, Dhruv Meena, Hardik Gohil

---
## Introduction

Large language models trained on broad web data often struggle with **statutory and regulatory domains**, where rules are precise, context-dependent, and frequently updated. In the case of Indian income tax law, models commonly produce **outdated or incorrect answers**, especially when distinguishing between the **Old Tax Regime** and the **New Tax Regime**.

This project demonstrates how **ERNIE-4.5** can be adapted to handle such domain-specific factual distinctions using **parameter-efficient fine-tuning**. We fine-tune ERNIE with **LoRA adapters via Unsloth** to conditionally override generic web-trained priors when authoritative, present-day tax rules apply.

The goal is not to retrain the model from scratch, but to **surgically correct regime-sensitive facts**—for example, clarifying that deductions under **Section 80C are not available under the New Tax Regime**, while remaining valid under the Old Tax Regime.

---

## Key Contributions

- Fine-tuning **ERNIE-4.5-0.3B** using **Unsloth** for efficient LoRA-based adaptation
- Designing a **curated, factually consistent dataset** covering Indian income tax rules (AY 2024–25)
- Applying **answer-only supervision** to focus learning on factual correctness
- Using **selective oversampling** to strengthen regime-conflicting statutory rules
- Showing a corrected answer on a regime-sensitive query where the base ERNIE model fails

---

## Why ERNIE and Unsloth?

- **ERNIE-4.5-0.3B** offers strong language understanding in a model small enough (about 0.36B parameters) to fine-tune on a single T4 GPU
- **Unsloth** enables fast, memory-efficient LoRA fine-tuning on limited hardware
- Together, they make it cheap to target specific factual errors, with the aim of leaving general language ability intact

---

## Scope of the Notebook

This notebook:
- Loads the ERNIE-4.5 base model
- Applies LoRA adapters to attention and feed-forward layers
- Constructs a balanced tax QA dataset
- Fine-tunes the model with answer-only loss
- Compares base and fine-tuned model outputs on regime-sensitive queries

The resulting model is intended for **educational and research purposes**, illustrating how modern LLMs can be adapted for high-precision, domain-specific applications.

---


### Installing the required libraries, including Unsloth

Versions are not pinned; the run recorded below used Unsloth 2025.12.10, Transformers 4.57.3, and Torch 2.9.1.

In [None]:
!pip install -q unsloth
!pip install -q transformers datasets accelerate bitsandbytes


### Importing the installed packages

In [2]:
import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # synchronous CUDA calls give clearer error traces (slower; remove once stable)

import torch
from unsloth import FastLanguageModel
from datasets import Dataset
from transformers import TrainingArguments, Trainer


🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!


## Model Loading and Environment Setup

We begin by loading the **ERNIE-4.5-0.3B** pretrained language model using the **Unsloth** framework. This model serves as the base upon which we apply parameter-efficient fine-tuning.

Key design choices in this step:

- **4-bit quantized loading** is enabled to reduce GPU memory usage and allow fine-tuning on commodity hardware (e.g., a single T4 GPU).
- The data type (`dtype`) is left unspecified so that Unsloth can automatically select the most stable and efficient precision.
- The model cache is disabled (`use_cache = False`) to ensure compatibility with gradient checkpointing during training.

This setup lets us fine-tune ERNIE within T4 memory: the quantized base weights stay frozen, and only the LoRA adapters added in the next step are trained.
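As a rough illustration of what 4-bit loading saves, the arithmetic below estimates weight storage alone. The base parameter count is derived from the training log later in the notebook (363,770,880 total minus 3,022,848 trainable LoRA parameters). It ignores quantization constants, any layers kept in higher precision, activations, and optimizer state, so the results are lower bounds, not measured memory use. For a model this small the absolute savings are modest: roughly 721 MB of weights at 16 bits versus roughly 180 MB at 4 bits.

```python
def weight_memory_mb(num_params: int, bits_per_param: int) -> float:
    """Storage for the weights alone, in megabytes (1 MB = 10**6 bytes)."""
    return num_params * bits_per_param / 8 / 1e6


# Parameter counts reported in the training log further down this notebook
TOTAL_PARAMS = 363_770_880      # includes the LoRA adapters
TRAINABLE_PARAMS = 3_022_848    # LoRA adapters only
BASE_PARAMS = TOTAL_PARAMS - TRAINABLE_PARAMS

for label, bits in [("16-bit (fp16/bf16)", 16), ("4-bit", 4)]:
    print(f"{label:>18}: ~{weight_memory_mb(BASE_PARAMS, bits):,.1f} MB")
```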


In [3]:
import torch
from unsloth import FastLanguageModel

model_name = "baidu/ERNIE-4.5-0.3B-PT"

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=2048,
    dtype=None,              # let Unsloth decide
    load_in_4bit=True,       # 4-bit weights to reduce GPU memory use
)
model.config.use_cache = False



==((====))==  Unsloth 2025.12.10: Fast Ernie4_5 patching. Transformers: 4.57.3.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.9.1+cu128. CUDA: 7.5. CUDA Toolkit: 12.8. Triton: 3.5.1
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.33.post2. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!



## LoRA Adapter Configuration

To adapt ERNIE to the tax-regime question answering task without modifying all model parameters, we apply **Low-Rank Adaptation (LoRA)** using Unsloth.

Key aspects of this configuration:

- LoRA adapters are injected into both **attention projections** (`q_proj`, `k_proj`, `v_proj`, `o_proj`) and **feed-forward layers** (`gate_proj`, `up_proj`, `down_proj`), so that both the attention and MLP blocks of every layer can be adapted.
- A low-rank dimension of **r = 8** keeps the number of trainable parameters small while leaving the adapters enough capacity for a narrow factual correction.
- **LoRA scaling (`lora_alpha = 16`)** multiplies the adapter update by `lora_alpha / r = 2`; the base model weights stay frozen.
- **No adapter dropout** (`lora_dropout = 0.0`) and **no bias parameters** are used, so only the LoRA weight matrices are trained.
- **Gradient checkpointing** is enabled to reduce activation memory during training, at the cost of some recomputation.

The aim is to correct regime-sensitive factual behavior while leaving the pretrained ERNIE weights themselves unchanged.
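The scaling and parameter-count statements above can be made concrete. The sketch below, in dependency-free Python, follows the standard LoRA formulation: a frozen weight `W` is augmented with a low-rank update `B @ A` scaled by `alpha / r`, and `B` starts at zero so the adapted layer initially matches the base layer (the default initialization in PEFT, on which Unsloth builds). The matrix sizes and the Gaussian initialization of `A` are illustrative assumptions, not ERNIE's real dimensions or PEFT's exact initializer.

```python
import random


def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Parameters LoRA adds to one d_in -> d_out linear layer (A: r x d_in, B: d_out x r)."""
    return r * (d_in + d_out)


def lora_scale(alpha: float, r: int) -> float:
    """Factor applied to the low-rank update B @ A in standard LoRA."""
    return alpha / r


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(W, A, B, x, alpha, r):
    """y = W x + (alpha / r) * B (A x); W stays frozen, only A and B would be trained."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    s = lora_scale(alpha, r)
    return [b + s * u for b, u in zip(base, update)]


# Toy sizes for illustration (not ERNIE's real dimensions)
d_in, d_out, r, alpha = 6, 4, 2, 4
rng = random.Random(0)
W = [[rng.uniform(-1, 1) for _ in range(d_in)] for _ in range(d_out)]  # frozen base weight
A = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(r)]      # random init (Gaussian here)
B = [[0.0] * r for _ in range(d_out)]                                 # zero init
x = [rng.uniform(-1, 1) for _ in range(d_in)]

# With B = 0 the adapted layer reproduces the base layer exactly
print(lora_forward(W, A, B, x, alpha, r) == matvec(W, x))  # True
print(lora_scale(16, 8))                                   # 2.0, the setting used in this notebook
print(lora_param_count(1024, 1024, 8))                     # 16384 for a hypothetical 1024 -> 1024 layer
```

Because `B` is zero at the start, training begins from the base model's behavior and the adapters only move it as far as the loss requires.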


In [4]:
model = FastLanguageModel.get_peft_model(
    model,
    r=8,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj"
    ],
    lora_alpha=16,
    lora_dropout=0.0,
    bias="none",
    use_gradient_checkpointing=True,
)


Unsloth: Making `model.base_model.model.model` require gradients


## Baseline Evaluation (Pre-Fine-Tuning)

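Before evaluating the fine-tuned model, we first examine the behavior of the **base ERNIE model** without any task-specific adaptation. Although the LoRA adapters are already attached, their `B` matrices are initialized to zero, so at this point the wrapped model produces the same outputs as the base model. This gives a reference point for how pretrained, web-scale knowledge shapes responses to regime-sensitive tax queries.

The prompt below asks whether **Section 80C deductions are available under the New Tax Regime**, a question that models trained on outdated or mixed online sources can easily get wrong.

We use a slightly higher temperature (0.2, versus 0.01 after fine-tuning) with the intention of exposing inconsistencies that arise from the model's pretrained priors. Note that in Hugging Face `generate`, temperature only takes effect when sampling is enabled (`do_sample=True`, passed explicitly or set in the model's generation config); under greedy decoding it is ignored.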

This baseline output is later compared against the fine-tuned model to demonstrate the effectiveness of LoRA-based adaptation.
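Temperature rescales the logits before the softmax, p_i ∝ exp(z_i / T), so lower values concentrate probability on the highest-scoring token. The sketch below shows this on three made-up logits (the values and token labels are purely illustrative, not taken from ERNIE).

```python
import math


def softmax_with_temperature(logits, temperature):
    """Return softmax(logits / temperature) as a list of probabilities."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical next-token logits for three candidate first words: "Yes", "No", "Section"
logits = [2.0, 1.5, 0.5]

for t in (1.0, 0.2, 0.01):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t:<5} " + "  ".join(f"{p:.3f}" for p in probs))
```

At T = 0.01 almost all probability sits on the top token, which is why sampling at that temperature behaves nearly like greedy decoding.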


In [5]:
prompt = """### Entity:
Tax Section: 80C

### Query:
Is Section 80C deduction available under the new tax regime?

### Answer:
"""

inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

outputs = model.generate(
    **inputs,
    max_new_tokens=80,
    temperature=0.2,   # only applied when sampling is enabled (do_sample=True)
)

print("===== BASE MODEL OUTPUT =====")
print(tokenizer.decode(outputs[0], skip_special_tokens=True))


===== BASE MODEL OUTPUT =====
### Entity:
Tax Section: 80C

### Query:
Is Section 80C deduction available under the new tax regime?

### Answer:
Yes, Section 80C deduction is available under the new tax regime.


## Dataset Design (ERNIE-Style Structured Prompting)

The fine-tuning dataset is built with a **structured prompt format** that makes the queried entity and its context explicit before the question, with the aim of improving factual grounding.

Each training example follows a consistent schema:

- **Entity**: Identifies the tax section, regime, or concept being queried
- **Attributes**: Provides structured contextual metadata (e.g., applicable regime, purpose, income limits)
- **Query**: Poses a natural-language question
- **Answer**: Supplies a concise, factually correct response

This format encourages ERNIE to:
- Leverage structured context before answering
- Learn conditional reasoning (e.g., Old vs New Tax Regime)
- Avoid mixing conflicting statutory rules across regimes

The dataset deliberately includes **paired contrasts** (Old vs New Tax Regime) for the same tax sections, encouraging the model to learn regime-dependent distinctions rather than memorize isolated facts.

All examples are curated to reflect **current Indian income tax rules (Assessment Year 2024–25)** and are written to minimize ambiguity and factual drift.
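The schema above can also be generated programmatically, which keeps paired contrasts consistent as the dataset grows. The helpers below are a hypothetical sketch (the names `build_example` and `regime_pair` are ours, not part of the notebook); `build_example` reproduces the exact text layout of the records in the next cell.

```python
def build_example(entity: str, attributes: dict, query: str, answer: str) -> dict:
    """Render one training record in the Entity / Attributes / Query / Answer schema."""
    attr_lines = "\n".join(f"{key}: {value}" for key, value in attributes.items())
    text = (
        f"### Entity:\n{entity}\n\n"
        f"### Attributes:\n{attr_lines}\n\n"
        f"### Query:\n{query}\n\n"
        f"### Answer:\n{answer}\n"
    )
    return {"text": text}


def regime_pair(entity: str, regime_answers: dict, query_template: str) -> list:
    """Emit one example per regime so a section gets a matched Old/New contrast."""
    return [
        build_example(
            entity,
            {"Applicable Regime": regime},
            query_template.format(regime=regime.lower()),
            answer,
        )
        for regime, answer in regime_answers.items()
    ]


pair = regime_pair(
    "Tax Section: 80C",
    {
        "New Tax Regime": "No. Section 80C deductions are not available under the new tax regime.",
        "Old Tax Regime": "Yes. Section 80C deductions are available under the old tax regime.",
    },
    "Is Section 80C deduction available under the {regime}?",
)
print(pair[0]["text"])
```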


In [6]:
data = [

# =====================
# SECTION 80C
# =====================
{
"text": """### Entity:
Tax Section: 80C

### Attributes:
Applicable Regime: New Tax Regime

### Query:
Is Section 80C deduction available under the new tax regime?

### Answer:
No. Section 80C deductions are not available under the new tax regime.
"""
},
{
"text": """### Entity:
Tax Section: 80C

### Attributes:
Applicable Regime: Old Tax Regime

### Query:
Is Section 80C deduction available under the old tax regime?

### Answer:
Yes. Section 80C deductions are available under the old tax regime.
"""
},
{
"text": """### Entity:
Tax Section: 80C

### Attributes:
Instrument: Public Provident Fund
Applicable Regime: Old Tax Regime

### Query:
Are PPF contributions eligible for deduction under Section 80C?

### Answer:
Yes. Contributions to Public Provident Fund are eligible under Section 80C in the old tax regime.
"""
},
{
"text": """### Entity:
Tax Section: 80C

### Attributes:
Instrument: ELSS
Applicable Regime: New Tax Regime

### Query:
Are ELSS investments deductible under the new tax regime?

### Answer:
No. ELSS investments are not deductible under the new tax regime.
"""
},

# =====================
# CHAPTER VI-A (GENERAL)
# =====================
{
"text": """### Entity:
Tax Chapter: Chapter VI-A

### Attributes:
Applicable Regime: New Tax Regime

### Query:
Are Chapter VI-A deductions allowed under the new tax regime?

### Answer:
No. Most Chapter VI-A deductions are not allowed under the new tax regime.
"""
},
{
"text": """### Entity:
Tax Chapter: Chapter VI-A

### Attributes:
Applicable Regime: Old Tax Regime

### Query:
Does the old tax regime allow Chapter VI-A deductions?

### Answer:
Yes. The old tax regime allows Chapter VI-A deductions.
"""
},

# =====================
# SECTION 80D
# =====================
{
"text": """### Entity:
Tax Section: 80D

### Attributes:
Purpose: Health Insurance
Applicable Regime: Old Tax Regime

### Query:
Is health insurance premium deductible under Section 80D?

### Answer:
Yes. Health insurance premiums are deductible under Section 80D in the old tax regime.
"""
},
{
"text": """### Entity:
Tax Section: 80D

### Attributes:
Purpose: Health Insurance
Applicable Regime: New Tax Regime

### Query:
Is health insurance premium deductible under the new tax regime?

### Answer:
No. Section 80D deductions are not allowed under the new tax regime.
"""
},

# =====================
# SECTION 87A
# =====================
{
"text": """### Entity:
Tax Section: 87A

### Attributes:
Applicable Regime: New Tax Regime
Income Limit: ₹7,00,000

### Query:
Who is eligible for rebate under Section 87A?

### Answer:
Under the new tax regime, resident individuals with total income up to ₹7,00,000 are eligible for rebate under Section 87A.
"""
},
{
"text": """### Entity:
Tax Section: 87A

### Attributes:
Residency Requirement

### Query:
Is Section 87A rebate available to non-residents?

### Answer:
No. Section 87A rebate is available only to resident individuals.
"""
},

# =====================
# STANDARD DEDUCTION
# =====================
{
"text": """### Entity:
Tax Deduction: Standard Deduction

### Attributes:
Applicable Regime: New Tax Regime

### Query:
Is standard deduction allowed under the new tax regime?

### Answer:
Yes. Standard deduction is allowed under the new tax regime.
"""
},
{
"text": """### Entity:
Tax Deduction: Standard Deduction

### Attributes:
Applicable Regime: Old Tax Regime

### Query:
Is standard deduction allowed under the old tax regime?

### Answer:
Yes. Standard deduction is allowed under the old tax regime.
"""
},

# =====================
# HRA
# =====================
{
"text": """### Entity:
Tax Exemption: House Rent Allowance

### Attributes:
Applicable Regime: New Tax Regime

### Query:
Is HRA exemption allowed under the new tax regime?

### Answer:
No. HRA exemption is not allowed under the new tax regime.
"""
},
{
"text": """### Entity:
Tax Exemption: House Rent Allowance

### Attributes:
Applicable Regime: Old Tax Regime

### Query:
Is HRA exemption allowed under the old tax regime?

### Answer:
Yes. HRA exemption is allowed under the old tax regime subject to conditions.
"""
},

# =====================
# NPS
# =====================
{
"text": """### Entity:
Tax Section: 80CCD(1B)

### Attributes:
Purpose: Additional NPS Contribution
Applicable Regime: Old Tax Regime

### Query:
Is additional NPS contribution deductible under Section 80CCD(1B)?

### Answer:
Yes. An additional deduction of up to ₹50,000 is allowed under Section 80CCD(1B) in the old tax regime.
"""
},
{
"text": """### Entity:
Tax Section: 80CCD(1B)

### Attributes:
Applicable Regime: New Tax Regime

### Query:
Is Section 80CCD(1B) deduction available under the new tax regime?

### Answer:
No. Section 80CCD(1B) deduction is not available under the new tax regime.
"""
},

# =====================
# DONATIONS
# =====================
{
"text": """### Entity:
Tax Section: 80G

### Attributes:
Purpose: Donations
Applicable Regime: Old Tax Regime

### Query:
Are donations deductible under Section 80G?

### Answer:
Yes. Donations are deductible under Section 80G in the old tax regime.
"""
},
{
"text": """### Entity:
Tax Section: 80G

### Attributes:
Applicable Regime: New Tax Regime

### Query:
Are donations deductible under the new tax regime?

### Answer:
No. Donations under Section 80G are not deductible under the new tax regime.
"""
},

# =====================
# DEFAULT REGIME
# =====================
{
"text": """### Entity:
Tax Regime

### Attributes:
Default Status

### Query:
Which tax regime is the default regime?

### Answer:
The new tax regime is the default tax regime.
"""
},

# =====================
# COMPARATIVE
# =====================
{
"text": """### Entity:
Tax Section: 80C

### Attributes:
Comparison: Old vs New Tax Regime

### Query:
Compare Section 80C availability under old and new tax regimes.

### Answer:
Section 80C deductions are available under the old tax regime but not under the new tax regime.
"""
},
{
"text": """### Entity:
Tax Regime Comparison

### Attributes:
Scope of Deductions

### Query:
Which tax regime allows more deductions?

### Answer:
The old tax regime allows more deductions compared to the new tax regime.
"""
},

]
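The key contributions list selective oversampling of regime-conflicting rules, but the list above contains each example once (21 records, matching the example count in the training log). One minimal way to add it is sketched below, assuming "regime-conflicting" means an example whose Attributes name the New Tax Regime and whose answer begins with "No." (an illustrative heuristic, not necessarily the authors' criterion). The function names are hypothetical.

```python
import random


def is_regime_conflict(example: dict) -> bool:
    """Heuristic: a New Tax Regime example whose answer denies the benefit."""
    text = example["text"]
    answer = text.split("### Answer:", 1)[1].strip()
    return "Applicable Regime: New Tax Regime" in text and answer.startswith("No.")


def oversample_conflicts(records: list, factor: int = 3, seed: int = 0) -> list:
    """Include conflicting examples `factor` times in total, others once, then shuffle."""
    out = []
    for ex in records:
        copies = factor if is_regime_conflict(ex) else 1
        out.extend(dict(ex) for _ in range(copies))
    random.Random(seed).shuffle(out)  # fixed seed for reproducibility
    return out
```

Applied as `Dataset.from_list(oversample_conflicts(data))`, this heuristic should select the seven New Tax Regime examples that answer "No.", raising their share of each epoch.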


## Tokenization and Answer-Only Supervision Strategy

To ensure that the model learns to generate **factually correct answers** without overfitting to prompt structure, we apply an **answer-only supervision** strategy during tokenization.

### Tokenization Process

- Each example is tokenized with truncation and padded to a fixed length of 1,024 tokens, so all examples have the same length and can be stacked into batches by the Trainer's default collator.
- Special tokens (such as the beginning-of-sequence token) are added with `add_special_tokens=True`, matching how prompts are tokenized at inference time.

### Label Construction

- Tokens corresponding to the **prompt portion** (`Entity`, `Attributes`, and `Query`) are set to `-100`, the value PyTorch's cross-entropy loss ignores by default, so they do not contribute to the training loss.
- Only tokens appearing **after the `### Answer:` marker** are used as training labels.
- Padding positions are also set to `-100` and excluded from the loss.

This approach focuses learning on **answer correctness**, rather than memorization of prompt templates, and helps the model override incorrect pretrained priors only when generating answers.

### Motivation

Answer-only supervision is particularly effective for:
- Domain-specific factual correction
- Regime-dependent rule enforcement
- Reducing the risk of degrading general language understanding

The resulting tokenized dataset is then used for LoRA-based fine-tuning.
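The masking rule can be seen in isolation with a toy whitespace tokenizer standing in for ERNIE's (an assumption for illustration only; real subword boundaries differ, and labels here hold tokens rather than ids). Every position up to and including `### Answer:`, and every padding position, receives `-100`.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss
MARKER = "### Answer:"


def toy_tokenize(text: str) -> list:
    """Hypothetical stand-in tokenizer: one token per whitespace-separated word."""
    return text.split()


def answer_only_labels(text: str, max_length: int, pad_token: str = "<pad>"):
    """Return (tokens, labels) with the prompt and the right padding masked out."""
    tokens = toy_tokenize(text)[:max_length]
    prefix_len = len(toy_tokenize(text.split(MARKER)[0] + MARKER))
    n_real = len(tokens)
    tokens = tokens + [pad_token] * (max_length - n_real)  # right padding
    labels = [
        tok if prefix_len <= i < n_real else IGNORE_INDEX
        for i, tok in enumerate(tokens)
    ]
    return tokens, labels


tokens, labels = answer_only_labels(
    "### Query:\nIs 80C allowed?\n\n### Answer:\nNo. Not allowed.\n", max_length=12
)
for tok, lab in zip(tokens, labels):
    print(f"{tok:>10} -> {lab}")
```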


In [7]:
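def tokenize(batch):
    texts = batch["text"]

    # Pad every example to the same fixed length so the default collator can stack them
    enc = tokenizer(
        texts,
        truncation=True,
        max_length=1024,
        padding="max_length",
        add_special_tokens=True,
    )

    labels = []

    for text, input_ids, attn in zip(texts, enc["input_ids"], enc["attention_mask"]):
        # Start fully masked; -100 is ignored by the cross-entropy loss
        label = [-100] * len(input_ids)

        # Token ids for everything up to and including the "### Answer:" marker
        prefix = text.split("### Answer:")[0] + "### Answer:"
        prefix_ids = tokenizer(
            prefix,
            truncation=True,
            max_length=1024,
            add_special_tokens=True,
        )["input_ids"]

        # With left padding, the real tokens start after the pad positions
        offset = len(input_ids) - sum(attn) if tokenizer.padding_side == "left" else 0

        # If the tokenizer appends an EOS to prefix_ids, this is the first answer token;
        # otherwise the last token of the marker is also kept in the loss (harmless)
        start = offset + len(prefix_ids) - 1

        for i in range(start, len(input_ids)):
            # Use the attention mask rather than pad_token_id, which may equal eos_token_id
            if attn[i] == 1:
                label[i] = input_ids[i]

        labels.append(label)

    enc["labels"] = labels
    return enc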


dataset = Dataset.from_list(data)


tokenized_ds = dataset.map(
    tokenize,
    batched=True,
    remove_columns=dataset.column_names,
)



## Sanity Check: Label and Vocabulary Consistency

Before proceeding with training, we perform a basic sanity check to ensure that the constructed labels are valid and compatible with the model configuration.

Specifically, we verify that:
- All non-masked label token IDs fall within the model’s vocabulary size
- No out-of-range token IDs are introduced during tokenization or label construction

This check helps detect common issues such as tokenizer–model mismatches or incorrect masking logic, which can silently corrupt training.


In [8]:
sample = tokenized_ds[0]
print("Max label id:", max([x for x in sample["labels"] if x != -100]))
print("Vocab size:", model.config.vocab_size)


Max label id: 94009
Vocab size: 103424
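The cell above inspects only the first example. A broader check, sketched below as a plain-Python helper (the name `check_label_ids` is ours), validates every example and also flags examples whose labels are entirely masked, which would happen if truncation cut off the `### Answer:` section. With the notebook's objects it would be called as `check_label_ids(tokenized_ds["labels"], model.config.vocab_size)`.

```python
def check_label_ids(label_rows, vocab_size: int, ignore_index: int = -100):
    """Validate all unmasked label ids across examples; return (min_id, max_id)."""
    label_rows = [list(row) for row in label_rows]
    ids = [x for row in label_rows for x in row if x != ignore_index]
    if not ids:
        raise ValueError("every label is masked; no tokens would contribute to the loss")
    lo, hi = min(ids), max(ids)
    if lo < 0 or hi >= vocab_size:
        raise ValueError(f"label id out of range: min={lo}, max={hi}, vocab_size={vocab_size}")
    empty = [i for i, row in enumerate(label_rows) if all(x == ignore_index for x in row)]
    if empty:
        raise ValueError(f"examples with no supervised tokens: {empty}")
    return lo, hi


print(check_label_ids([[-100, 5, 7], [-100, -100, 3]], vocab_size=10))  # (3, 7)
```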


## Training Configuration

The fine-tuning process is configured using lightweight, stability-focused training parameters suitable for parameter-efficient adaptation.

### Key Training Choices

- **Small per-device batch size (1)** combined with **gradient accumulation** allows effective training under limited GPU memory.
- **Warmup steps** are included to stabilize early optimization and prevent sudden loss spikes.
- **A modest learning rate (1e-4)** is used for the LoRA adapters; the base weights are frozen regardless, so this only controls how quickly the adapters move.
- **Mixed-precision training** is automatically selected based on hardware support to improve efficiency.
- **8-bit AdamW optimizer** further reduces memory footprint while maintaining optimization quality.
- **Checkpoint saving is disabled** to keep the workflow simple and focused on demonstration.

These settings are intentionally conservative, prioritizing training stability and reproducibility over aggressive optimization.
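Because training is capped by `max_steps` rather than by epochs, the number of passes over the 21 examples follows from the batch settings. The arithmetic below is our own reconstruction, assuming the Trainer performs an optimizer update every `gradient_accumulation_steps` batches and also at the end of each epoch for a partial window. Under that assumption it gives an effective batch size of 4, six updates per epoch, and nine epochs (the last one partial), which appears consistent with the figures printed during training.

```python
import math


def schedule_summary(num_examples, per_device_batch, grad_accum, max_steps, num_devices=1):
    """Effective batch size and epoch count implied by a max_steps-based schedule."""
    batches_per_epoch = math.ceil(num_examples / (per_device_batch * num_devices))
    # One optimizer update per grad_accum batches; a partial window at the end
    # of an epoch still triggers an update.
    updates_per_epoch = math.ceil(batches_per_epoch / grad_accum)
    effective_batch = per_device_batch * grad_accum * num_devices
    num_epochs = math.ceil(max_steps / updates_per_epoch)
    full_epochs, leftover_updates = divmod(max_steps, updates_per_epoch)
    # Assumes the run stops before the final partial window of the last epoch
    final_epoch = full_epochs + leftover_updates * grad_accum / batches_per_epoch
    return {
        "effective_batch": effective_batch,
        "updates_per_epoch": updates_per_epoch,
        "num_epochs": num_epochs,
        "final_epoch": final_epoch,
    }


print(schedule_summary(num_examples=21, per_device_batch=1, grad_accum=4, max_steps=50))
```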


In [9]:
training_args = TrainingArguments(
    output_dir="./outputs",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    warmup_steps=10,
    max_steps=50,
    learning_rate=1e-4,
    bf16=torch.cuda.is_bf16_supported(),
    fp16=not torch.cuda.is_bf16_supported(),
    logging_steps=10,
    optim="adamw_8bit",
    save_strategy="no",
    report_to="none",
)


## Fine-Tuning Execution

With the model, dataset, and training configuration in place, we proceed to fine-tune ERNIE using the Hugging Face `Trainer` API.

During this step:
- Only the **LoRA adapter parameters** are updated
- The base ERNIE model remains frozen
- Training runs for 50 optimizer steps (about 8.4 epochs over the 21 examples, per the log below) to demonstrate targeted factual correction

This controlled fine-tuning process lets the model learn regime-specific tax rules efficiently while leaving the base weights unchanged.
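The claim that only the LoRA adapters are updated can be checked directly. The helper below (a hypothetical name) summarizes `(name, numel, requires_grad)` triples and lists any trainable parameter whose name does not contain `lora_` (PEFT names adapter weights `lora_A` / `lora_B`). With the real model it would be called as `trainable_summary((n, p.numel(), p.requires_grad) for n, p in model.named_parameters())`; PEFT's `model.print_trainable_parameters()` reports similar totals. On a 4-bit model, `numel()` of quantized weights counts packed storage, so the total can differ from Unsloth's figure, while the trainable LoRA count is unaffected.

```python
def trainable_summary(named_params):
    """Summarize (name, numel, requires_grad) triples; flag trainable non-LoRA params."""
    total = trainable = 0
    unexpected = []
    for name, numel, requires_grad in named_params:
        total += numel
        if requires_grad:
            trainable += numel
            if "lora_" not in name:
                unexpected.append(name)
    return {
        "trainable": trainable,
        "total": total,
        "percent": 100.0 * trainable / total,
        "non_lora_trainable": unexpected,
    }


# Figures from the training log: 3,022,848 trainable of 363,770,880 total
print(f"{100 * 3_022_848 / 363_770_880:.2f}% trainable")  # 0.83% trainable
```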


In [10]:
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_ds,
)

trainer.train()


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 21 | Num Epochs = 9 | Total steps = 50
O^O/ \_/ \    Batch size per device = 1 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (1 x 4 x 1) = 4
 "-____-"     Trainable parameters = 3,022,848 of 363,770,880 (0.83% trained)


| Step | Training Loss |
|------|---------------|
| 10 | 2.5493 |
| 20 | 1.653 |
| 30 | 1.1385 |
| 40 | 0.9656 |
| 50 | 0.8579 |


TrainOutput(global_step=50, training_loss=1.4328543663024902, metrics={'train_runtime': 115.4218, 'train_samples_per_second': 1.733, 'train_steps_per_second': 0.433, 'total_flos': 278840450482176.0, 'train_loss': 1.4328543663024902, 'epoch': 8.380952380952381})

## Inference After Fine-Tuning (Post-Training Evaluation)

After completing LoRA-based fine-tuning, we evaluate the adapted ERNIE model on the same regime-sensitive query used in the baseline test.

Before inference, the model is switched to **inference mode** to disable training-specific behaviors and optimize generation.

### Evaluation Setup

- The prompt structure is kept **identical** to the baseline evaluation to ensure a fair comparison.
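- A **very low temperature (0.01)** is set to make responses near-deterministic; as in the baseline, temperature only takes effect when sampling is enabled, and greedy decoding is deterministic anyway.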
- The output length is limited to prevent unnecessary elaboration and reduce hallucination risk.

This step demonstrates whether fine-tuning successfully corrected the model’s behavior when answering questions that depend on precise statutory distinctions between the Old and New Tax Regimes.


In [11]:
FastLanguageModel.for_inference(model)

prompt = """### Entity:
Tax Section: 80C

### Query:
Is Section 80C deduction available under the new tax regime?

### Answer:
"""

inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

outputs = model.generate(
    **inputs,
    max_new_tokens=68,
    temperature=0.01,
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))


### Entity:
Tax Section: 80C

### Query:
Is Section 80C deduction available under the new tax regime?

### Answer:
No. Section 80C deduction is not available under the new tax regime.

### Explanation:
Section 80C deduction is a deduction available under the old tax regime. The new tax regime is the current tax regime. Section 80C deduction is not available under the new tax regime
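The fine-tuned model answers correctly but then continues with an `### Explanation:` section it was never trained on, until `max_new_tokens` runs out. A light post-processing step, sketched below as a hypothetical helper (our name, not part of Unsloth or Transformers), keeps only the text between `### Answer:` and the next `###` header. Recent Transformers releases also accept `stop_strings=["###"]` together with `tokenizer=tokenizer` in `generate`, which stops generation at the marker instead of trimming afterwards.

```python
def extract_answer(decoded: str, marker: str = "### Answer:") -> str:
    """Return the text after the answer marker, cut at the next '###' section header."""
    _, sep, rest = decoded.partition(marker)
    if not sep:
        return decoded.strip()  # no marker found; return the text unchanged
    # Discard anything the model invents after the answer (e.g. '### Explanation:')
    body = rest.split("\n###", 1)[0]
    return body.strip()


generated = (
    "### Entity:\nTax Section: 80C\n\n"
    "### Query:\nIs Section 80C deduction available under the new tax regime?\n\n"
    "### Answer:\nNo. Section 80C deduction is not available under the new tax regime.\n\n"
    "### Explanation:\nSection 80C deduction is a deduction available under the old tax regime."
)
print(extract_answer(generated))
```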


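We save the LoRA adapter and the tokenizer. Calling `save_pretrained` on the PEFT-wrapped model writes only the adapter weights and configuration, not the full base model; the tuple printed below is the return value of `tokenizer.save_pretrained`.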

In [12]:
model.save_pretrained("ernie_4_5_unsloth_lora", safe_serialization=True)
tokenizer.save_pretrained("ernie_4_5_unsloth_lora")


('ernie_4_5_unsloth_lora/tokenizer_config.json',
 'ernie_4_5_unsloth_lora/special_tokens_map.json',
 'ernie_4_5_unsloth_lora/chat_template.jinja',
 'ernie_4_5_unsloth_lora/tokenizer.model',
 'ernie_4_5_unsloth_lora/added_tokens.json',
 'ernie_4_5_unsloth_lora/tokenizer.json')

## Example-Based Comparison: Base vs Fine-Tuned Model

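To evaluate the effectiveness of fine-tuning, we examine a concrete, regime-sensitive example that mirrors a training example (the evaluation prompt omits its `Attributes` block). Because the question is in the training set, this checks that the correction was learned, not that it generalizes to new phrasings.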

### Test Query

**Question:**  
Is Section 80C deduction available under the New Tax Regime?

**Ground Truth (AY 2024–25):**  
Section 80C deductions are **not available** under the New Tax Regime. They are available only under the Old Tax Regime.

---

### Model Outputs

| Model Version | Observed Output |
|--------------|----------------|
| **Base ERNIE (Pre-Fine-Tuning)** | Incorrectly states that Section 80C deductions are available under the New Tax Regime, likely reflecting outdated or mixed web-trained information. |
| **Fine-Tuned ERNIE (LoRA + Unsloth)** | Correctly states that Section 80C deductions are **not available** under the New Tax Regime, aligning with current statutory rules. |

This example shows the base model failing to condition its answer on the applicable tax regime, and the fine-tuned model answering the same query correctly.
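A single query that also appears in the training data is thin evidence. A small yes/no harness would let both models be compared on paraphrased or unseen questions. The sketch below assumes each held-out question can be scored by the first word of the answer; `answer_fn` is a placeholder for a function that wraps `model.generate` and decoding, and all names are hypothetical.

```python
from typing import Callable, Iterable, Optional, Tuple


def first_verdict(answer: str) -> Optional[str]:
    """Map an answer to 'yes' or 'no' from its first word, or None if it is neither."""
    words = answer.strip().split()
    if not words:
        return None
    word = words[0].strip(".,!:;").lower()
    return word if word in ("yes", "no") else None


def verdict_accuracy(cases: Iterable[Tuple[str, str]], answer_fn: Callable[[str], str]) -> float:
    """Fraction of (prompt, expected 'yes'/'no') cases whose generated verdict matches."""
    cases = list(cases)
    correct = sum(first_verdict(answer_fn(prompt)) == expected for prompt, expected in cases)
    return correct / len(cases)


# Stub model that always denies, for demonstration only
cases = [("Can I claim 80C in the new regime?", "no"), ("Is HRA exempt in the old regime?", "yes")]
print(verdict_accuracy(cases, lambda prompt: "No. Not available."))  # 0.5
```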

---

## Conclusion

This project demonstrates how **parameter-efficient fine-tuning** can be used to correct regime-dependent factual errors in large language models without retraining the entire model.

By applying **LoRA adapters via Unsloth** to ERNIE-4.5 and training on a **structured, ERNIE-style dataset**, the model learns to correct the targeted regime-dependent answers while its pretrained weights remain frozen.

### Key Outcomes

- Correct Old vs New Tax Regime distinction on the evaluated Section 80C query
- Improved factual reliability on the targeted statutory questions
- Efficient adaptation using less than 1% trainable parameters (0.83%)
- Clear behavioral improvement over the base ERNIE model on the tested query

### Limitations and Future Work

- The dataset covers a limited subset of income tax provisions
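- The model may continue past the answer with unsupervised sections (for example, `### Explanation:`), since the training data contains no explanations
- Evaluation covers a single query that also appears in the training data; a held-out set is needed to measure generalization
- Future work could expand coverage across additional sections, assessment years, or integrate automated validation against official tax sources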

Overall, this work illustrates a practical and scalable approach to adapting large language models for **high-precision regulatory domains**, where correctness and context-awareness are critical.
