# FinRAG Reference Implementation: Local Test

This notebook demonstrates a **Minimum Viable Financial RAG (FinRAG)** system running entirely locally using:
- **ChromaDB** (Local Vector Database)
- **LangChain** (Orchestration)
- **Shiva-k22/gemma-FinAI** (Fine-tuned Finance LLM)

**Goal**: Demonstrate the full pipeline: Ingestion -> Chunking -> Vector Storage -> Retrieval -> Answer.

In [1]:
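# Cell 1: Environment Setup & Auto-Install
# Installs chromadb into the CURRENT Jupyter kernel if it is missing.
# The other packages imported below (torch, langchain-community, langchain-text-splitters,
# langchain-huggingface, transformers, pypdf) must already be installed.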

import sys
import subprocess

print("Checking dependencies...")
try:
    import chromadb
    print("chromadb is already installed.")
except ImportError:
    print("Installing chromadb...")
    subprocess.check_call([sys.executable, "-m", "pip", "install", "chromadb"])
    import chromadb
    print("chromadb installed successfully.")

import os
import torch
from langchain_community.vectorstores import Chroma
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings, HuggingFacePipeline
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

import warnings
warnings.filterwarnings('ignore')

print("Environment ready. FinRAG initialized.")

Checking dependencies...
chromadb is already installed.
Environment ready. FinRAG initialized.


In [2]:
# Cell 2: Load Finetuned Finance Model

MODEL_NAME = "Shiva-k22/gemma-FinAI"
print(f"Loading {MODEL_NAME}...")
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
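# Use half precision on GPU (CUDA) or Apple Silicon (MPS); fall back to float32 on CPU.
use_half = torch.cuda.is_available() or torch.backends.mps.is_available()
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    # Recent transformers releases warn that `torch_dtype` is deprecated in favor of `dtype`.
    torch_dtype=torch.float16 if use_half else torch.float32,
    device_map="auto"
)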

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.1, 
    return_full_text=False
)

llm = HuggingFacePipeline(pipeline=pipe)
print("Model loaded successfully.")

Loading Shiva-k22/gemma-FinAI...


`torch_dtype` is deprecated! Use `dtype` instead!
Device set to use mps


Model loaded successfully.


In [4]:
# Cell 3: Define Ingestion Pipeline

def ingest_document(file_path):
    print(f"Ingesting {file_path}...")
    if not os.path.exists(file_path):
        raise FileNotFoundError(f"File not found: {file_path}")
    
    # 1. Load
    loader = PyPDFLoader(file_path)
    pages = loader.load()
    print(f" -> Loaded {len(pages)} pages.")
    
    # 2. Split into overlapping chunks (generic recursive character splitter, not finance-specific)
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=100
    )
    chunks = text_splitter.split_documents(pages)
    
    # 3. Add Metadata
    for i, chunk in enumerate(chunks):
        chunk.metadata["chunk_id"] = i
        chunk.metadata["source"] = os.path.basename(file_path)
        if "page" not in chunk.metadata:
            chunk.metadata["page"] = "Unknown"
            
    print(f" -> Created {len(chunks)} chunks with metadata.")
    return chunks

print("Ingestion function defined.")

Ingestion function defined.
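
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-size chunking with overlap in plain Python. It is a simplification: `RecursiveCharacterTextSplitter` also tries to break on separators such as paragraph and line breaks, so its real chunk boundaries will differ. The helper name `sliding_chunks` and the sample string are hypothetical and exist only for illustration.

```python
def sliding_chunks(text, chunk_size=1000, chunk_overlap=100):
    """Split text into fixed-size windows that share `chunk_overlap` characters.

    Simplified illustration only; not the algorithm LangChain uses.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
demo_chunks = sliding_chunks(sample, chunk_size=10, chunk_overlap=3)
print(demo_chunks)
# ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each chunk repeats the last three characters of the previous one, which is the role the 100-character overlap plays in `ingest_document`: a sentence cut at a chunk boundary still appears whole in at least one chunk, provided it is shorter than the overlap.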


In [5]:
# Cell 4: Initialize Embeddings & Vector Store

EMBEDDING_MODEL = "sentence-transformers/all-MiniLM-L6-v2"
embedding_model = HuggingFaceEmbeddings(model_name=EMBEDDING_MODEL)

# Local Vector Store (ChromaDB)
vectorstore = None 
print("Embeddings ready. Vector Store waiting for data.")

Embeddings ready. Vector Store waiting for data.
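
The embedding model maps each chunk to a vector, and retrieval returns the chunks whose vectors lie closest to the query vector. The sketch below ranks toy three-dimensional vectors by cosine similarity to show the idea. The vectors and chunk names are made up, `cosine_similarity` and `top_k` are hypothetical helpers, real `all-MiniLM-L6-v2` embeddings have 384 dimensions, and the distance metric Chroma actually applies depends on how the collection is configured.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=3):
    """Return the ids of the k vectors most similar to the query, best first."""
    scored = [(cosine_similarity(query_vec, vec), doc_id) for doc_id, vec in doc_vecs.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Toy vectors standing in for chunk embeddings (illustrative values only).
toy_vectors = {
    "npa_chunk": [0.9, 0.1, 0.0],
    "dividend_chunk": [0.1, 0.9, 0.0],
    "risk_chunk": [0.0, 0.2, 0.9],
}
toy_query = [1.0, 0.2, 0.0]
ranked = top_k(toy_query, toy_vectors, k=2)
print(ranked)
# ['npa_chunk', 'dividend_chunk']
```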


In [7]:
# Cell 5: Perform Ingestion (User Action)
# Specify your file here.

FILE_TO_INGEST = "/Users/burra/FinSmartAI/Annual-Report-2024-25.pdf"

try:
    # 1. Ingest chunks
    chunks = ingest_document(FILE_TO_INGEST)
    
    # 2. Store in Chroma
    if vectorstore is None:
        vectorstore = Chroma.from_documents(
            documents=chunks, 
            embedding=embedding_model,
            collection_name="finrag_test"
        )
    else:
        vectorstore.add_documents(chunks)
        
    print(f"SUCCESS: Indexed {len(chunks)} chunks from {FILE_TO_INGEST} into Vector Store.")
    
except Exception as e:
    print(f"Ingestion skipped/failed: {e}")
    if "not found" in str(e).lower():
        print("Please ensure the PDF exists at the path set in FILE_TO_INGEST.")

Ingesting /Users/burra/FinSmartAI/Annual-Report-2024-25.pdf...
 -> Loaded 66 pages.
 -> Created 233 chunks with metadata.
SUCCESS: Indexed 233 chunks from /Users/burra/FinSmartAI/Annual-Report-2024-25.pdf into Vector Store.
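
Re-running Cell 5 in the same session takes the `else` branch and adds the same chunks a second time, so duplicates can accumulate in the store. One possible mitigation is to derive a deterministic ID for each chunk from its source, page, and text. The sketch below only builds such IDs, using plain dicts with placeholder text as stand-ins for LangChain `Document` objects. `make_chunk_id` is a hypothetical helper, and handing the IDs to the vector store (for example through an `ids` argument) is an assumption to verify against the installed LangChain and Chroma versions.

```python
import hashlib


def make_chunk_id(source, page, text):
    """Build a short, deterministic id from a chunk's source, page and content."""
    digest = hashlib.sha256(f"{source}|{page}|{text}".encode("utf-8")).hexdigest()
    return digest[:16]  # truncated for readability; keep more characters to lower collision risk


# Placeholder chunks standing in for Document objects (not real report content).
toy_chunks = [
    {"source": "report.pdf", "page": 1, "text": "placeholder text A"},
    {"source": "report.pdf", "page": 2, "text": "placeholder text B"},
]

first_run = [make_chunk_id(c["source"], c["page"], c["text"]) for c in toy_chunks]
second_run = [make_chunk_id(c["source"], c["page"], c["text"]) for c in toy_chunks]
print(first_run == second_run)
# True
```

Because the same chunk always hashes to the same ID, a store that upserts by ID would overwrite an existing entry rather than append a copy.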


In [8]:
# Cell 6: Retrieval Logic (FinRAG)

def detect_intent(question):
    question = question.lower()
    if any(x in question for x in ["summarize", "overview", "brief", "report"]):
        return "SUMMARY"
    return "SPECIFIC"

def get_context(question):
    if vectorstore is None:
        return ""
        
    intent = detect_intent(question)
    k = 5 if intent == "SUMMARY" else 3
    
    docs = vectorstore.similarity_search(question, k=k)
    
    context_str = ""
    for d in docs:
        source = d.metadata.get('source', 'Unknown')
        page = d.metadata.get('page', 'N/A')
        context_str += f"[Source: {source} | Page: {page}]\n{d.page_content}\n\n"
    return context_str

print("Retrieval Logic Ready.")

Retrieval Logic Ready.
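
`detect_intent` uses substring matching, so the keyword "report" also matches inside "reported" and a specific question such as "What was the reported net profit?" is routed as a summary. A minimal sketch of a stricter variant that matches whole words only is shown below. The name `detect_intent_strict` is hypothetical, and the keyword list is copied from the cell above.

```python
import re

# Whole-word match on the same keywords used by detect_intent.
SUMMARY_PATTERN = re.compile(r"\b(summarize|overview|brief|report)\b", re.IGNORECASE)


def detect_intent_strict(question):
    """Return "SUMMARY" only when a keyword appears as a complete word."""
    return "SUMMARY" if SUMMARY_PATTERN.search(question) else "SPECIFIC"


print(detect_intent_strict("Summarize the key financial highlights"))  # SUMMARY
print(detect_intent_strict("What was the reported net profit?"))       # SPECIFIC
```

The trade-off is that inflected forms such as "briefly" or "reports" no longer match, so the keyword list may need those variants added explicitly.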


In [9]:
# Cell 7: FinRAG System Prompt

from langchain_core.prompts import PromptTemplate

FINRAG_PROMPT = PromptTemplate(
    template="""You are a senior financial analyst. Your role is to provide strict, factual answers based ONLY on the provided context.

RULES:
1. NO FLUFF: Do not use phrases like "strong performance" unless you cite numbers.
2. GROUNDING: Every claim must use data from the context.
3. CITATION: Mention [Source: Doc | Page: X] for every fact.
4. IF UNCERTAIN: Say "Information not available in the documents."

CONTEXT:
{context}

QUESTION: 
{question}

ANSWER:""",
    input_variables=["context", "question"]
)

print("Prompt Defined.")

Prompt Defined.
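
Rule 3 asks the model to tag every fact with `[Source: ... | Page: ...]`, but nothing in the notebook checks that the answer complies. The sketch below is one way such a check could look: it scans an answer for bullet lines that carry no citation tag. `uncited_lines` is a hypothetical helper, the regular expression assumes the tag format that `get_context` writes into the context, and the sample answer is placeholder text.

```python
import re

# Matches tags shaped like "[Source: <name> | Page: <page>]".
CITATION_PATTERN = re.compile(r"\[Source:\s*[^|\]]+\|\s*Page:\s*[^\]]+\]")


def uncited_lines(answer):
    """Return the bullet lines of an answer that contain no citation tag."""
    missing = []
    for line in answer.splitlines():
        stripped = line.strip()
        if stripped.startswith(("*", "-")) and not CITATION_PATTERN.search(stripped):
            missing.append(stripped)
    return missing


toy_answer = (
    "* First placeholder claim [Source: report.pdf | Page: 3]\n"
    "* Second placeholder claim without a tag\n"
)
print(uncited_lines(toy_answer))
# ['* Second placeholder claim without a tag']
```

A non-empty result could be logged or used to flag the answer for review. The check only confirms that a tag is present, not that the cited page supports the claim.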


In [10]:
# Cell 8: Generation Function

def answer_question(question):
    context = get_context(question)
    if not context.strip():
        return "[System] No documents ingested or no context found."
         
    prompt_text = FINRAG_PROMPT.format(context=context, question=question)
    
    print(f"Processing: {question}...")
    response = llm.invoke(prompt_text)
    return response

print("Q&A Function ready.")

Q&A Function ready.


In [11]:
# Cell 9: Run Tests

q1 = "Summarize the key financial highlights"
ans1 = answer_question(q1)
print("\n--- Answer 1 ---")
print(ans1)

q2 = "What are the main risks mentioned?"
ans2 = answer_question(q2)
print("\n--- Answer 2 ---")
print(ans2)

Processing: Summarize the key financial highlights...

--- Answer 1 ---

Key financial highlights:
* Consolidated net profit of Rs. 7,796 crore in FY24, up 18% YoY.
* Gross NPAs declined to 6.1% from 7.3% in FY23.
* Net NPAs declined to 4.9% from 6.1% in FY23.
* Return on Equity (RoE) improved to 14.7% from 13.5% in FY23.
* Return on Assets (RoA) improved to 2.8% from 2.5% in FY23.
* Net interest margin (NIM) improved to 3.5% from 3.2% in FY23.
* Capital adequacy ratio improved to 15.2% from 14.8% in FY23.
* Dividend payout ratio increased to 40% from 30% in FY23.
* Total assets increased to Rs. 19,042 crore from Rs. 16,598 crore in FY23.
* Total deposits increased to Rs. 6,59,815 crore from Rs. 5,62,538 crore in FY23.
* Total loans and advances increased to Rs. 4,70,109 crore from Rs. 3,62,838 crore in FY23.
* Number of RRBs earning profit increased to 40 from 37 in FY23.
* Amount of profit earned by RRBs increased to Rs. 7,796 crore from Rs. 6,178 crore in FY23.
* Number of RRBs incu