# PuchoNyaySe : RAG-Based Legal Advisor

In [3]:
!pip install langchain openai faiss-cpu tiktoken PyMuPDF streamlit



* langchain : Framework for chaining LLMs and retrievers
* openai : OpenAI API client for GPT-3.5/GPT-4 (the QA pipeline below runs on a local model instead)
* faiss-cpu : Vector store for efficient similarity search
* tiktoken : Tokenizer for OpenAI models
* PyMuPDF : For extracting text from PDF files
* streamlit : Front-end framework (the UI at the end of this notebook uses Gradio instead)

# Extracting text from PDFs

In [6]:
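import os

import fitz  # provided by PyMuPDF


def extract_text_from_pdf_file(folder_path):
    """Return one plain-text string per PDF in folder_path (e.g. "Legal ADVISIOR/")."""
    texts = []
    # Sort the listing so the documents are always read in the same order
    for filename in sorted(os.listdir(folder_path)):
        if filename.lower().endswith('.pdf'):
            # Open the PDF, join the text of all pages, and close the file afterwards
            with fitz.open(os.path.join(folder_path, filename)) as doc:
                text = "".join(page.get_text() for page in doc)
            texts.append(text)
    return texts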

* This function reads every PDF in the given folder and returns one text string per file
* fitz (PyMuPDF) extracts the plain text of each page through the page's get_text method

# Embedding

In [9]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

# loading textual data
texts=extract_text_from_pdf_file("/Users/parthverma/Desktop/Programming/Legal ADVISIOR")

# making chunks of textual data
splitter=RecursiveCharacterTextSplitter(chunk_size=800,chunk_overlap=100)

chunks=[]
for doc in texts:
    chunks.extend(splitter.split_text(doc))
print(f"total number of Chunks Created : {len(chunks)} ")

total number of Chunks Created : 3159 


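* LangChain’s RecursiveCharacterTextSplitter splits long texts into overlapping chunks. Smaller chunks keep each retrieved passage within the LLM's context limit, and the overlap preserves continuity across chunk boundaries.

* chunk_size=800 : maximum chunk length, measured in characters by the splitter's default length function (not tokens)

* chunk_overlap=100 : up to 100 characters shared between neighbouring chunks for context continuity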
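
The effect of the two parameters can be illustrated with a plain sliding window over characters. This is only a simplified sketch: RecursiveCharacterTextSplitter additionally tries to break on paragraph and sentence separators, which this version ignores. The helper name `split_with_overlap` is hypothetical and not part of LangChain.

```python
def split_with_overlap(text, chunk_size=800, chunk_overlap=100):
    """Cut text into fixed-size character windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    pieces = []
    for start in range(0, len(text), step):
        pieces.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return pieces


# Toy input: 2000 characters made of repeating digits
sample = "".join(str(i % 10) for i in range(2000))
pieces = split_with_overlap(sample)
print([len(p) for p in pieces])           # [800, 800, 600]
print(pieces[0][-100:] == pieces[1][:100])  # True: neighbours share 100 characters
```

With a step of 700 characters, each new chunk repeats the last 100 characters of the previous one, which is what keeps a sentence that straddles a boundary visible in both chunks.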


# Embed chunks and store in FAISS

In [12]:
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS
from tqdm import tqdm

# Load huggingface embeddings
embedding_model = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

# Create the FAISS vector store
db = FAISS.from_texts(list(tqdm(chunks)), embedding_model)

# Save the vector store to disk
db.save_local("legal_faiss_index")

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3159/3159 [00:00<00:00, 2133280.69it/s]
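
The progress bar above finishes at over two million items per second because `list(tqdm(chunks))` only times the copy of the list; the embedding itself happens afterwards inside `FAISS.from_texts`. One way to get a meaningful bar is to add the chunks in batches. The sketch below shows only the batching helper (`batched` is a hypothetical name), with the FAISS calls left as comments because they assume the `chunks` and `embedding_model` objects from the cell above and an arbitrary batch size of 256.

```python
def batched(items, batch_size):
    """Yield consecutive slices of items with at most batch_size elements each."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


demo_chunks = [f"chunk {i}" for i in range(10)]
sizes = [len(batch) for batch in batched(demo_chunks, 4)]
print(sizes)  # [4, 4, 2]

# Possible use in this notebook (untested sketch):
# batches = batched(chunks, 256)
# db = FAISS.from_texts(next(batches), embedding_model)  # first batch creates the index
# for batch in tqdm(batches):                            # one tick per embedded batch
#     db.add_texts(batch)
```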


In [13]:
pip install tqdm

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Note: you may need to restart the kernel to use updated packages.


# Loading Vector Store and Building RAG Pipeline

In [15]:
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS

# Reloading embedding model
embedding_model = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

# Load the saved FAISS index
vectordb = FAISS.load_local("legal_faiss_index", embedding_model,allow_dangerous_deserialization=True)


# Build a Retrieval-Based QA Chain (Without OpenAI)

In [17]:
from transformers import pipeline

# Loading local model for text generation
qa_pipeline = pipeline("text2text-generation", model="google/flan-t5-base", tokenizer="google/flan-t5-base")

# Creating retriever from FAISS
retriever = vectordb.as_retriever(search_kwargs={"k": 5})

# Custom RAG function
def rag_local_qa(query, retriever, qa_pipeline):
    # Step 1: Retrieve top relevant documents
    docs = retriever.get_relevant_documents(query)

    # Step 2: Combine contents
    context = "\n".join([doc.page_content for doc in docs])

    # Step 3: Prepare prompt
    prompt = f"Answer the question based on the following context:\n{context}\n\nQuestion: {query}"

    # Step 4: Generate answer
    result = qa_pipeline(prompt, max_length=256, do_sample=False)[0]['generated_text']

    return result, docs  # answer + source docs


Device set to use mps:0
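
Conceptually, the retriever with `k: 5` embeds the query and returns the five chunks whose embeddings lie closest to it. The brute-force sketch below illustrates that idea on toy two-dimensional vectors. It assumes ranking by Euclidean distance, which is an assumption about the index configuration rather than something this notebook verifies, and `top_k` is a hypothetical helper, not a FAISS or LangChain function.

```python
import math


def top_k(query_vec, vectors, k=5):
    """Return the indices of the k vectors nearest to query_vec (Euclidean distance)."""
    scored = [(math.dist(query_vec, vec), idx) for idx, vec in enumerate(vectors)]
    scored.sort()  # smallest distance first
    return [idx for _, idx in scored[:k]]


# Toy "embeddings"; real all-MiniLM-L6-v2 vectors have 384 dimensions
toy_vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
print(top_k([0.9, 0.1], toy_vectors, k=2))  # [1, 0]
```

FAISS performs the same kind of search with optimized index structures, so it stays fast even with thousands of chunks.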


# Asking Questions

In [19]:
query = "What is the punishment for theft under IPC?"
query1="What is the punishment for murder under IPC?"
answer, sources = rag_local_qa(query1, retriever, qa_pipeline)

# Print the answer
print("Answer:", answer)

# Print sources
for i, doc in enumerate(sources):
    print(f"\n--- Source {i+1} ---")
    print(doc.page_content[:300])

Token indices sequence length is longer than the specified maximum sequence length for this model (1002 > 512). Running this sequence through the model will result in indexing errors
Both `max_new_tokens` (=256) and `max_length`(=256) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


Answer: 1[imprisonment for life], and shall also be liable to fine

--- Source 1 ---
intended.—If a person, by doing anything which he intends or knows to be likely to cause death, 
commits culpable homicide by causing the death of any person, whose death he neither intends nor knows 
himself to be likely to cause, the culpable homicide committed by the offender is of the descriptio

--- Source 2 ---
years, and shall also be liable to fine. 
307. Attempt to murder.—Whoever does any act with such intention or knowledge, and under such 
circumstances that, if he by that act caused death, he would be guilty of murder, shall be punished with 
imprisonment of either description for a term which may e

--- Source 3 ---
by such act, shall be punished with imprisonment of either description for a term which may extend to 
seven years, or with fine, or with both. 
Illustration 
A, on grave and sudden provocation, fires a pistol at Z, under such circumstances that if he thereby caused death he w

In [20]:
import gradio as gr
from langchain.chains import RetrievalQA
from langchain.vectorstores import FAISS
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.llms import HuggingFacePipeline
from transformers import pipeline, AutoTokenizer, AutoModelForSeq2SeqLM

# Loads vector DB and retriever
embedding_model = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
vectordb = FAISS.load_local("legal_faiss_index", embedding_model, allow_dangerous_deserialization=True)
retriever = vectordb.as_retriever(search_kwargs={"k": 5})

# Loads the LLM
model_name = "google/flan-t5-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
pipe = pipeline("text2text-generation", model=model, tokenizer=tokenizer, max_length=512)
llm = HuggingFacePipeline(pipeline=pipe)

# Builds QA chain
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    return_source_documents=True
)

Device set to use mps:0


In [21]:
# Defines chatbot logic
def legal_qa_bot(message):
    try:
        result = qa_chain(message)
        return result["result"]
    except Exception as e:
        return f"❌ Error: {str(e)}"

# Gradio UI
iface = gr.Interface(
    fn=legal_qa_bot,
    inputs=gr.Textbox(lines=2, placeholder="Ask your legal question here..."),
    outputs="text",
    title="PuchoNyaySe : Legal Advisor 👨🏻‍⚖️",
    description="""Get Instant Legal Answers from Indian Law 📚
Ask clear, simple legal questions — no lawyer required!"""
)

iface.launch(share=True)

* Running on local URL:  http://127.0.0.1:7860


huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


* Running on public URL: https://64c7f0743013d75c4b.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
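
Because the chain was built with `return_source_documents=True`, its result also carries the retrieved passages under the `source_documents` key, which the chatbot currently discards. A minimal sketch of how the answer and short source previews could be combined into one string follows. `format_answer` and `FakeDoc` are hypothetical names; `FakeDoc` only stands in for LangChain's document objects so the example runs on its own.

```python
from dataclasses import dataclass


@dataclass
class FakeDoc:
    """Stand-in for a LangChain document; only page_content is used here."""
    page_content: str


def format_answer(result, preview_chars=300):
    """Combine the answer with a short preview of each source passage."""
    lines = [result["result"].strip()]
    for i, doc in enumerate(result.get("source_documents", []), start=1):
        # Collapse line breaks and repeated spaces before truncating the preview
        preview = " ".join(doc.page_content.split())[:preview_chars]
        lines.append(f"\n--- Source {i} ---\n{preview}")
    return "\n".join(lines)


demo_result = {
    "result": " imprisonment for life ",
    "source_documents": [FakeDoc("307. Attempt to murder.\nWhoever does any act")],
}
formatted = format_answer(demo_result)
print(formatted)
```

Inside `legal_qa_bot`, returning `format_answer(result)` instead of `result["result"]` would show users which passages the answer was drawn from.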


