# 1. Problem Definition & Objective
## a. Problem Statement
Preserving the cultural narratives of 16th-century Punjab (Sapt-Sindhu) is challenging because digitized, accessible resources are scarce. General-purpose LLMs often hallucinate or give generic answers when asked about specific historical moral frameworks such as "Waahadat-ul-Wajood" or specific figures such as "Bhai Mardana". This project combines AI and history to build a Sapt-Sindhu-based moral storyteller.

## b. Real-world Relevance
This project aims to bridge the gap between cultural heritage and AI by creating a storytelling assistant. It uses a fine-tuned small language model (Phi-3) augmented with retrieval-augmented generation (RAG) to generate culturally accurate stories rooted in specific moral canons, serving as an educational tool for preserving intangible heritage.

In [None]:
# Install dependencies if running in Colab
# !pip install torch transformers peft langchain langchain_community faiss-cpu sentence-transformers

import torch
import os
import textwrap
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# RAG Dependencies
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

# --- Configuration ---
# Using Microsoft Phi-3 Mini as the base model
BASE_MODEL_ID = "microsoft/Phi-3-mini-4k-instruct"
# Path to your fine-tuned adapter (Ensure this folder exists in your submission)
ADAPTER_PATH = "./my_finetuned_model" 
PDF_PATH = "rag.pdf"
DB_PATH = "faiss_index"

# Check for Hardware Acceleration
if torch.backends.mps.is_available():
    device = "mps"
elif torch.cuda.is_available():
    device = "cuda"
else:
    device = "cpu"
print(f"✅ Runtime Device: {device}")


# 2. Data Understanding & Preparation
## a. Dataset Source
We use a synthetic/collected dataset and a PDF document (rag.pdf) containing the moral canon of the Sapt-Sindhu region.

## b. Preprocessing & Indexing
The PDF is loaded with PyPDFLoader, split into chunks of up to 600 characters with a 100-character overlap so that several retrieved chunks fit in the context window, and embedded with all-MiniLM-L6-v2 for efficient retrieval.
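
To make the chunking parameters concrete, the sketch below implements a plain fixed-width sliding window with the same sizes used in this notebook (600 characters, 100 overlap). It is a simplification: `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraph and line breaks, so its chunks are usually shorter than the limit. The helper name `sliding_chunks` is hypothetical and not part of LangChain.

```python
def sliding_chunks(text, chunk_size=600, chunk_overlap=100):
    """Split text into fixed-width windows that share `chunk_overlap` characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # distance between consecutive window starts
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks

# Stand-in for extracted PDF text (not real data)
sample = "".join(chr(97 + i % 26) for i in range(1500))
chunks = sliding_chunks(sample)
print([len(c) for c in chunks])
```

With the default sizes, each chunk repeats the last 100 characters of the previous one, which reduces the chance that a sentence relevant to a query is cut in half at a chunk boundary.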

In [None]:
def setup_rag():
    print("📚 Checking Knowledge Base...")
    embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
    
    # Check if vector store exists, otherwise create it
    if os.path.exists(DB_PATH):
        db = FAISS.load_local(DB_PATH, embeddings, allow_dangerous_deserialization=True)
        print("✅ Existing Knowledge Base Loaded!")
    else:
        print("⚙️ Indexing PDF... (This happens once)")
        if not os.path.exists(PDF_PATH):
            print(f"❌ Error: PDF not found at {PDF_PATH}. Please upload the file.")
            return None
            
        loader = PyPDFLoader(PDF_PATH)
        documents = loader.load()
        # Splitting text for RAG Context
        text_splitter = RecursiveCharacterTextSplitter(chunk_size=600, chunk_overlap=100)
        texts = text_splitter.split_documents(documents)
        
        db = FAISS.from_documents(texts, embeddings)
        db.save_local(DB_PATH)
        print(f"✅ Created Knowledge Base with {len(texts)} chunks.")
        
    return db

# Initialize the vector database
db = setup_rag()

# 3. Model & System Design
## a. Architecture
We utilize a Hybrid RAG + Fine-Tuning approach.

Model: Phi-3 Mini (4k Instruct) was chosen for its high reasoning capabilities relative to its small size, allowing it to run on consumer hardware.

Fine-Tuning (PEFT): We used LoRA (Low-Rank Adaptation) to adapt the model to the storytelling style.

RAG: Vector retrieval ensures the model adheres to specific historical facts found in rag.pdf rather than relying solely on pre-trained weights.
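
As a rough illustration of why LoRA keeps fine-tuning feasible on consumer hardware, the sketch below counts trainable parameters for a single weight matrix. The hidden size of 3072 is an assumption based on the published Phi-3 Mini configuration, and the rank of 16 is purely illustrative: the notebook does not state which rank or target modules were used. `lora_param_counts` is a hypothetical helper, not a PEFT function.

```python
def lora_param_counts(d_out, d_in, rank):
    """Return (full, lora) trainable-parameter counts for one weight matrix.

    LoRA freezes W (d_out x d_in) and trains B (d_out x rank) and A (rank x d_in),
    so the adapted weight is W + B @ A.
    """
    full = d_out * d_in
    lora = rank * (d_out + d_in)
    return full, lora

# Assumptions: hidden size 3072 (Phi-3 Mini config), rank 16 (illustrative only)
full, lora = lora_param_counts(3072, 3072, rank=16)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.2%}")
```

Under these assumptions, the adapter trains roughly one percent of the parameters of each adapted matrix, which is why only the small adapter folder needs to be shipped alongside the base model.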

In [None]:
def load_model():
    print("⏳ Loading AI Model... (This may take a minute)")
    try:
        # Load Base Model
        base_model = AutoModelForCausalLM.from_pretrained(
            BASE_MODEL_ID,
            device_map=device,  
            torch_dtype=torch.float16 if device != "cpu" else torch.float32, 
            trust_remote_code=True,
            attn_implementation="eager" 
        )
        
        # Load Adapter (If available, otherwise fallback to base)
        if os.path.exists(ADAPTER_PATH):
            model = PeftModel.from_pretrained(base_model, ADAPTER_PATH)
            tokenizer = AutoTokenizer.from_pretrained(ADAPTER_PATH, trust_remote_code=True)
            print(f"✅ Fine-Tuned Adapter Loaded on {device.upper()}!")
        else:
            print("⚠️ Adapter not found. Loading Base Model only.")
            model = base_model
            tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID, trust_remote_code=True)

        return model, tokenizer
    except Exception as e:
        print(f"Error loading model: {e}")
        return None, None

model, tokenizer = load_model()

# 4. Core Implementation
## a. Inference Logic & Prompt Engineering
Here we define the generation function. We use a structured system prompt that forces the model to separate the "STORY" from the "MEANING." The pipeline performs the following:

1. Accepts a user query.

2. Retrieves relevant context from the FAISS database.

3. Constructs a strict prompt with the retrieved canon.

4. Generates the response.
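
The four steps above can be sketched without any model or vector store. The example below swaps in a bag-of-words cosine similarity for the dense all-MiniLM-L6-v2 embeddings and FAISS index that the notebook actually uses, so it only illustrates the control flow. All function names and the toy chunks are hypothetical placeholders, not content from rag.pdf.

```python
import math
from collections import Counter

def bow(text):
    """Lower-cased bag-of-words vector as a Counter."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two Counters (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, chunks, k=2):
    """Step 2: rank chunks by similarity to the query and keep the top k."""
    q = bow(query)
    return sorted(chunks, key=lambda c: cosine(q, bow(c)), reverse=True)[:k]

def build_prompt(query, retrieved):
    """Step 3: place the retrieved canon ahead of the user request."""
    canon = "\n".join(f"[Source Chunk]: {c}" for c in retrieved)
    return f"Retrieved Canon:\n{canon}\n\nUser Request: {query}"

# Placeholder chunks for illustration only
toy_chunks = [
    "sharing food with others is every person's duty",
    "courage means standing firm for the weak",
    "music can carry teachings across villages",
]
query = "tell a story about courage"  # Step 1: user query
prompt = build_prompt(query, retrieve(query, toy_chunks, k=1))
print(prompt)  # Step 4 would pass this prompt to the language model
```

Keeping retrieval and prompt construction as separate functions makes it easy to inspect which chunks were selected before any tokens are generated.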

In [None]:
def generate_story(user_request, theme="Any", figure="Any"):
    # Guard against failed initialization of any component
    if model is None or tokenizer is None or db is None:
        return "System not initialized."

    # 1. Construct constraints
    constraints = []
    if theme != "Any":
        constraints.append(f"Focus on theme: {theme}")
    if figure != "Any":
        constraints.append(f"Include figure: {figure}")
    constraint_str = ". ".join(constraints)
    # Only append the constraint hint to the retrieval query when constraints exist
    augmented_prompt = f"{user_request}. (Context instructions: {constraint_str})" if constraint_str else user_request
    
    print(f"\n📝 Processing Request: {user_request}")
    print(f"🔎 Retrieving context for: {augmented_prompt[:50]}...")

    # 2. Retrieve Context
    retrieved_docs = db.similarity_search(augmented_prompt, k=4)
    context_str = "\n".join([f"[Source Chunk]: {d.page_content}" for d in retrieved_docs])
    
    # 3. System Prompt Engineering
    system_prompt = (
        "You are a Sapt-Sindhu civilizational storytelling model.\n"
        "You do not invent morals. You only reason using the retrieved moral–historical canon provided below.\n"
        "Do not use historical figures directly as characters.\n"
        "Strictly follow this structure:\n"
        "STORY\n[Write the story here]\n"
        "MEANING\n[Explain the morals here]\n"
    )
    
    user_prompt = f"""Use the following Retrieved Canon to answer the User Request.

    Retrieved Canon:
    {context_str}

    User Request: {user_request}
    Additional Constraints: {constraint_str or 'None'}
    """
    
    chat_struct = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt}
    ]
    
    input_text = tokenizer.apply_chat_template(chat_struct, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(input_text, return_tensors="pt").to(device)
    
    # 4. Generate
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=800,
            do_sample=True,
            temperature=0.6,
            use_cache=True
        )
    
    generated_text = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
    return generated_text

# 5. Evaluation & Analysis
## a. Sample Outputs
Below, we demonstrate the system's performance by generating stories with specific constraints.

In [None]:
# Test Case 1: General Story
print("--- TEST CASE 1: Courage ---")
output_1 = generate_story("Tell me a story about courage", theme="Saint-Warrior Ideal")
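# Wrap each line separately so the STORY / MEANING line breaks survive
print("\n".join(textwrap.fill(line, width=80) for line in output_1.splitlines()))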

# Test Case 2: Specific Figure Context
print("\n--- TEST CASE 2: Forgiveness ---")
output_2 = generate_story("A story about forgiveness", figure="Bhai Mardana", theme="Wand Chako")
print("\n".join(textwrap.fill(line, width=80) for line in output_2.splitlines()))

## b. Performance Analysis
Context Adherence: Generated stories reuse vocabulary from the retrieved chunks, which we verified by checking the output for terms found in those chunks.

Instruction Following: The model adheres to the "STORY" and "MEANING" split defined in the system prompt.

Latency: Generation takes up to 3 minutes on a MacBook Air (M2, 16 GB).
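
The three checks above can be made repeatable with a few small helpers. This is a minimal sketch using only the standard library; the function names, the 6-letter threshold for "distinctive" terms, and the sample strings are assumptions chosen for illustration, not the procedure actually used to produce the observations above.

```python
import re
import time

def has_story_meaning_structure(output):
    """True if a STORY line appears before a MEANING line, each on its own line."""
    story = re.search(r"^[ \t]*STORY[ \t]*$", output, flags=re.MULTILINE)
    meaning = re.search(r"^[ \t]*MEANING[ \t]*$", output, flags=re.MULTILINE)
    return bool(story and meaning and story.start() < meaning.start())

def canon_term_coverage(output, chunks, min_len=6):
    """Fraction of longer canon words (>= min_len characters) that reappear in the output."""
    canon_terms = {
        w for c in chunks for w in re.findall(r"[a-z\-]+", c.lower()) if len(w) >= min_len
    }
    output_terms = set(re.findall(r"[a-z\-]+", output.lower()))
    return len(canon_terms & output_terms) / len(canon_terms) if canon_terms else 0.0

def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds) measured with a monotonic clock."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Placeholder strings, not real model output or canon text
sample_output = "STORY\nA weaver shared her harvest with travellers.\nMEANING\nSharing sustains the village."
sample_chunks = ["Sharing the harvest with travellers is praised."]
print(has_story_meaning_structure(sample_output))
print(canon_term_coverage(sample_output, sample_chunks))
```

In the notebook, these checks would be applied to the raw string returned by `generate_story`, for example `timed(generate_story, "Tell me a story about courage")`, and before any call to `textwrap.fill`, which collapses the line breaks the structure check relies on.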

# 6. Ethical Considerations & Responsible AI
Bias Mitigation: Synthetic data can often contain biases. The "Retrieved Canon" approach minimizes hallucination by grounding the model in approved text, rather than letting it invent historical facts.

Representation: We ensure that the moral themes (e.g., Anekantavada - many-sided truth) promote inclusivity.

Limitations: The model relies heavily on the quality of the PDF. If the PDF is sparse, the story quality degrades.

# 7. Conclusion & Future Scope
## a. Summary
We successfully deployed a local, privacy-focused storytelling AI that preserves Sapt-Sindhu narratives.

## b. Future Improvements
Voice Integration: Adding Text-to-Speech (TTS) for oral storytelling.

Multilingual Support: Extending the tokenizer to support Gurmukhi script inputs.