# RAG Pipeline Testing

This notebook is for testing and developing the RAG pipeline components.

## Setup & Imports

In [23]:
import sys
from pathlib import Path

# Add scripts directory to path (assumes the notebook runs from a direct subdirectory of the project root)
project_root = Path.cwd().parent
scripts_path = project_root / "scripts"
sys.path.insert(0, str(scripts_path))

In [24]:
# Import RAG components
from embedding import EmbeddingModel
from vector import FaissVectorStore
from llm_wrapper import generate_answer, DEFAULT_LLM_MODEL

## 1. Test Embedding Model

In [25]:
# Initialize embedding model
embedding_model = EmbeddingModel()

print(f"Model: {embedding_model.model_name}")
print(f"Embedding dimension: {embedding_model.dimension}")

Loading weights: 100%|██████████| 103/103 [00:00<00:00, 644.78it/s, Materializing param=pooler.dense.weight]                             
BertModel LOAD REPORT from: sentence-transformers/all-MiniLM-L6-v2
Key                     | Status     |  | 
------------------------+------------+--+-
embeddings.position_ids | UNEXPECTED |  | 

Notes:
- UNEXPECTED	:can be ignored when loading from different task/architecture; not ok if you expect identical arch.


Model: sentence-transformers/all-MiniLM-L6-v2
Embedding dimension: 384


In [26]:
# Test embedding generation
test_texts = [
    "What are the citizenship requirements in Nepal?",
    "How to register a company in Nepal?",
    "What is the punishment for cyber crime?"
]

embeddings = embedding_model.embed(test_texts)
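
The cell above stores the embeddings without inspecting them. A quick sanity check might look like the sketch below. It assumes `embed` returns one vector of length `dimension` per input text, which this notebook does not confirm. Random stand-in vectors replace the real model output so the snippet runs on its own, and `check_embeddings` and `cosine_similarity` are hypothetical helpers, not part of the `embedding` module.

```python
import math
import random

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def check_embeddings(vectors, n_texts, dimension):
    """Raise ValueError if the batch has the wrong shape or non-finite values."""
    rows = [list(map(float, v)) for v in vectors]
    if len(rows) != n_texts or any(len(r) != dimension for r in rows):
        raise ValueError(f"expected {n_texts} vectors of length {dimension}")
    if any(not math.isfinite(x) for r in rows for x in r):
        raise ValueError("embeddings contain NaN or inf")
    return rows

# Stand-in for embedding_model.embed(test_texts): 3 texts, 384 dimensions.
random.seed(0)
fake_embeddings = [[random.gauss(0.0, 1.0) for _ in range(384)] for _ in range(3)]

rows = check_embeddings(fake_embeddings, n_texts=3, dimension=384)
self_similarity = cosine_similarity(rows[0], rows[0])
cross_similarity = cosine_similarity(rows[0], rows[1])
print(f"self: {self_similarity:.3f}, cross: {cross_similarity:.3f}")
```

With the real model, passing `embeddings`, `len(test_texts)`, and `embedding_model.dimension` should work the same way, provided the return value is iterable row by row.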

## 2. Test Vector Store

In [27]:
# Initialize vector store with paths relative to project root
index_path = project_root / "database" / "legal_faiss.index"
metadata_path = project_root / "database" / "legal_faiss_meta.json"

vector_store = FaissVectorStore(
    index_path=index_path,
    metadata_path=metadata_path
)

In [28]:
# Load the vector store (if index exists)
try:
    vector_store.load()
    print("Vector store loaded successfully!")
    print(f"Number of documents: {len(vector_store.metadata)}")
except FileNotFoundError as e:
    print(f"Vector store not found: {e}")
    print("You may need to build the index first using vector_store.build()")

Vector store loaded successfully!
Number of documents: 4685
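
What a loaded index does at query time can be illustrated without FAISS. The sketch below is an exact brute-force inner-product search over a tiny in-memory corpus. It is a stand-in, not the `FaissVectorStore` implementation, and whether the real index ranks by inner product or L2 distance is not stated in this notebook. `brute_force_search`, the `score` key, and the toy vectors are hypothetical.

```python
def brute_force_search(query_vec, doc_vecs, metadata, top_k=3):
    """Return the top_k metadata entries ranked by inner product with the query."""
    scored = []
    for vec, meta in zip(doc_vecs, metadata):
        score = sum(q * d for q, d in zip(query_vec, vec))
        scored.append({"score": score, **meta})
    # Highest score first, as an inner-product index would rank them.
    scored.sort(key=lambda item: item["score"], reverse=True)
    return scored[:top_k]

# Toy 2-dimensional "embeddings" with one metadata entry per vector.
doc_vecs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
metadata = [
    {"text": "chunk about citizenship"},
    {"text": "chunk about companies"},
    {"text": "chunk about both"},
]

hits = brute_force_search([1.0, 0.1], doc_vecs, metadata, top_k=2)
for hit in hits:
    print(f"{hit['score']:.2f}  {hit['text']}")
```

Comparing a brute-force ranking like this against the index's results on a small sample is one way to check that the index and its metadata file are still aligned.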


## 3. Test LLM Wrapper

In [29]:
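# Retrieve search results first: `results` is not defined anywhere in this notebook as shown.
# Assumption: the query and top_k=3 are chosen to match the output below, not taken from the original run.
results = vector_store.search(query=test_texts[0], embedding_model=embedding_model, top_k=3)

# Prepare context from search results
context_chunks = [result['text'] for result in results]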

print(f"Number of context chunks: {len(context_chunks)}")

Number of context chunks: 3
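
How the chunks reach the model is hidden inside `generate_answer`, so the following is only an illustration of one common approach: number the chunks, cap the total context size, and append the question. `build_prompt`, its `max_chars` parameter, and the instruction wording are all hypothetical and not taken from `llm_wrapper`.

```python
def build_prompt(question: str, context_chunks: list[str], max_chars: int = 4000) -> str:
    """Number the chunks, stop before max_chars of context is exceeded, then append the question."""
    parts = []
    used = 0
    for i, chunk in enumerate(context_chunks, start=1):
        entry = f"[{i}] {chunk.strip()}"
        if used + len(entry) > max_chars:
            break  # Drop the remaining (lower-ranked) chunks rather than cut one mid-sentence.
        parts.append(entry)
        used += len(entry)
    context = "\n\n".join(parts)
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

# A deliberately small budget so that only the first chunk fits.
prompt = build_prompt("What is X?", ["First chunk.", "Second chunk."], max_chars=20)
print(prompt)
```

Numbering the chunks makes it possible to ask the model to cite which chunk supports each statement, which helps when checking answers against the retrieved sources.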


In [30]:
# Generate answer using LLM
question = "What are the requirements to obtain Nepali citizenship?"

try:
    answer = generate_answer(
        question=question,
        context_chunks=context_chunks,
        max_tokens=512,
        temperature=0.3
    )
    print(f"Question: {question}\n")
    print(f"Answer:\n{answer}")
except Exception as e:
    print(f"Error generating answer: {e}")

Question: What are the requirements to obtain Nepali citizenship?

Answer:
According to the Nepal Citizenship Act 2063 (2006), the requirements to obtain Nepali citizenship are as follows:

1. **Acquisition of Nepali Citizenship by Descent:**
   - A person born to a Nepali citizen father or mother shall be a citizen of Nepal by descent (Section 3, Sub-section (1)).
   - A child born out of wedlock to a Nepali female citizen and a foreign national shall be considered a citizen in accordance with Section 5, Sub-section (2).
   - Every child found in the territory integrated with Nepal shall be considered a citizen of Nepal from the date the Government of Nepal publishes a notification as prescribed in the Nepal Gazette (Section 3, Sub-section (3)).

2. **Filing of Application to obtain Citizenship Certificate:**
   - A person attaining the age of 16 years desiring to acquire citizenship of Nepal by descent shall have to file an application in the prescribed form along with the following 

## 4. Full RAG Pipeline Test

In [31]:
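def ask_legal_question(question: str, top_k: int = 5) -> tuple[str, list]:
    """Retrieve the top_k chunks for a question and generate an answer from them.

    Returns the answer together with the raw search results.
    """
    # Retrieve the most relevant chunks from the vector store
    results = vector_store.search(
        query=question,
        embedding_model=embedding_model,
        top_k=top_k
    )

    # Keep only the chunk text to pass to the LLM as context
    context_chunks = [result['text'] for result in results]

    # Generate the answer with the wrapper's default settings
    answer = generate_answer(
        question=question,
        context_chunks=context_chunks
    )

    return answer, results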
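
The retrieve, extract, generate flow above can also be exercised without the FAISS index or an LLM call by injecting stubs. The sketch below is one way to do that. `FakeVectorStore`, `fake_generate_answer`, and `run_pipeline` are hypothetical names introduced for illustration; only the `search(query=..., embedding_model=..., top_k=...)` call shape and the `'text'` key are taken from this notebook.

```python
class FakeVectorStore:
    """Stub that returns canned results as dicts with a 'text' key, the shape the notebook reads."""

    def __init__(self, texts):
        self.texts = texts
        self.calls = []

    def search(self, query, embedding_model, top_k):
        # Record the call so the test can verify how the pipeline queried the store.
        self.calls.append((query, top_k))
        return [{"text": t} for t in self.texts[:top_k]]

def fake_generate_answer(question, context_chunks):
    """Stub LLM: reports how many chunks it received instead of calling a model."""
    return f"{len(context_chunks)} chunks for: {question}"

def run_pipeline(question, store, embedder, generate, top_k=5):
    """Same retrieve, extract, generate flow, with the dependencies passed in."""
    results = store.search(query=question, embedding_model=embedder, top_k=top_k)
    context_chunks = [r["text"] for r in results]
    return generate(question=question, context_chunks=context_chunks), results

store = FakeVectorStore(["a", "b", "c"])
answer, sources = run_pipeline(
    "test?", store, embedder=None, generate=fake_generate_answer, top_k=2
)
print(answer)
```

Passing the store and generator as arguments instead of reading notebook globals keeps the wiring testable in isolation; the real `vector_store`, `embedding_model`, and `generate_answer` can be passed in the same way.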

In [33]:
# Test the full pipeline with different questions
test_questions = [
    "What is the punishment for cyber crime in Nepal?",
    "How to register a company in Nepal?",
    "What are the fundamental rights in Nepal's constitution?"
]

for q in test_questions:
    print(f"Q: {q}")
    try:
        answer, sources = ask_legal_question(q)
        print(f"\nA: {answer}")
    except Exception as e:
        print(f"Error: {e}")

Q: What is the punishment for cyber crime in Nepal?

A: I don't have enough information to answer this question. The provided context does not mention cybercrime explicitly.
Q: How to register a company in Nepal?

A: To register a company in Nepal, you need to submit the following documents to the Office, along with the application:

(a) Permission obtained by the foreign company from the competent authority to carry on its business or transaction in Nepal;
(b) Copies of the charter, certificate of incorporation, memorandum of association, articles of association of the company, and Nepalese translation thereof;
(c) Full name, address of the registered office and principal place of business of the company, date of incorporation of the company, description of the paid-up capital and major objectives of such company;
(d) Names, addresses of directors, manager, company secretaries or main officers of the company and description of their citizenship;
(e) Name and address of the person resi