<a href="https://colab.research.google.com/github/Mihawk1891/PR-LLM-and-RAG-based-Invoice-Data-Retrieve-Based-on-User/blob/main/RAG.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Project Title: LLM and RAG-based Invoice Data Retrieval Based on User Prompts

This project implements a Python-based solution for extracting and analyzing data from PDF invoices using a Large Language Model (LLM) and Retrieval-Augmented Generation (RAG).

Features

PDF text extraction and preprocessing
LLM integration for intelligent data extraction
RAG implementation for enhanced accuracy
User-prompt based information retrieval

Objectives

Process PDF invoices and extract text content
Utilize LLM to interpret and extract key information
Implement RAG to improve extraction accuracy
Provide a user-friendly interface for data retrieval based on prompts

Implementation

PDF Processing: Extract and preprocess text from invoices
LLM Integration: Connect to an LLM API for data interpretation (this notebook uses Google's Gemini 1.5 Pro through LangChain)
RAG Implementation: Build FAISS vector stores over the invoices and over a reference document on invoice structure and terms, then retrieve from both for context-aware answers
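
The retrieval step can be illustrated without any external service. The sketch below is a simplified stand-in, not the notebook's method: it ranks pages by cosine similarity over word counts, whereas the notebook uses Gemini embeddings and a FAISS index. The page texts and invoice numbers are invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, pages, k=1):
    """Return the k pages most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(pages, key=lambda p: cosine(q, Counter(tokenize(p))), reverse=True)
    return ranked[:k]


# Hypothetical page texts, invented for illustration only.
pages = [
    "Invoice 1001: wired earphones, quantity 1, total 20.00",
    "Invoice 1002: office chair, quantity 2, shipping 5.00",
]
context = retrieve("what is the price of the earphones", pages)
print(context)
```

The retrieved pages are what gets placed into the prompt as context, so the LLM answers from the invoice text rather than from its general knowledge.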

In [None]:
import os
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_google_genai import GoogleGenerativeAIEmbeddings, ChatGoogleGenerativeAI
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate
import google.generativeai as genai
from langchain_community.vectorstores import FAISS





In [None]:
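# Read the Google API key from the environment or prompt for it; never hard-code it.
from getpass import getpass
if "GOOGLE_API_KEY" not in os.environ:
    os.environ["GOOGLE_API_KEY"] = getpass("Google API key: ")
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])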



In [None]:
# Load documents using PyPDFLoader
files = {"invoice": ["invoice (1).pdf", "invoice_Aaron Bergman_36258.pdf"]}  # invoices to index

# Create the embedding model once and reuse it for every vector store
embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")

## Vector store: build one FAISS index per invoice and merge them into db1
db1 = None
for invoice in files["invoice"]:
    print(invoice)
    pages = PyPDFLoader(invoice).load()  # change document here
    db = FAISS.from_documents(pages, embeddings)
    if db1 is None:
        db1 = db
    else:
        db1.merge_from(db)

relevant_invoices = db1.as_retriever()

# Separate store for background knowledge on invoice structure and terms
loader = PyPDFLoader("invoice_structure_and_terms.pdf")  # change document here
pages = loader.load()
db = FAISS.from_documents(pages, embeddings)
relevant_knowledge = db.as_retriever()





invoice (1).pdf
invoice_Aaron Bergman_36258.pdf




In [None]:
prompti = """
User Query: {user_input}

Relevant Invoice Data:
{relevant_invoices}

Relevant Invoice Knowledge:
{relevant_knowledge}

You are a document analysis assistant. Based on the user query, the relevant invoice data, and the knowledge about invoice structures and terms, provide a detailed and accurate response. If you need any clarification or additional information, ask for it.
Structure the answer as points and subpoints. Use paragraphs only when necessary.

Answer using the document content only. Do not reference or use any external knowledge beyond what is explicitly stated in the documents. If the user query cannot be answered from the documents, reply 'Please ask questions on the invoice'.
Do not repeat what the user asked. If the query is vague, provide the most relevant answers and suggest follow-up questions.


You have the chat history below:
"""

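To make the role of the three placeholders concrete, here is a minimal sketch that fills a shortened stand-in for the template with plain `str.format`. The values are hypothetical; in the notebook, `ChatPromptTemplate` performs the substitution and the two context values come from the retrievers.

```python
# A shortened stand-in for the template above, with the same three placeholders.
template = (
    "User Query: {user_input}\n\n"
    "Relevant Invoice Data:\n{relevant_invoices}\n\n"
    "Relevant Invoice Knowledge:\n{relevant_knowledge}\n"
)

# Hypothetical values; in the notebook the last two come from the retrievers.
filled = template.format(
    user_input="What is the shipping cost?",
    relevant_invoices="Invoice 1001: shipping 5.00",
    relevant_knowledge="Shipping is the delivery charge added to the subtotal.",
)
print(filled)
```

Because curly braces mark placeholders, any literal brace in the template text would need to be doubled (`{{` and `}}`) to survive formatting.
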
In [None]:
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

In [None]:
# Build the prompt and chain, then answer one user query

def llm_ans(chat_input, history):
    # Pass the history as a template variable instead of interpolating it into the
    # template, so braces in earlier messages are not parsed as placeholders.
    prompt = ChatPromptTemplate.from_template(prompti + "\n{chat_history}").partial(chat_history=history)

    llm = ChatGoogleGenerativeAI(model="gemini-1.5-pro", temperature=0)
    chain = (
        {"relevant_invoices": relevant_invoices, "relevant_knowledge": relevant_knowledge, "user_input": RunnablePassthrough()}
        | prompt
        | llm
        | StrOutputParser()
    )

    return chain.invoke(chat_input)
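
In the chain above, each retriever returns a list of document objects, and that list is placed into the prompt through its string representation, metadata included. A common pattern is to join the page texts into plain text first. The sketch below shows the idea with a stand-in `Page` class and a hypothetical `format_docs` helper; neither is part of the notebook, and the file names are invented.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """Stand-in for a retrieved document; only page_content and metadata are modeled."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs):
    """Join retrieved pages into one plain-text block, labelled by source file."""
    return "\n\n".join(
        f"[{d.metadata.get('source', 'unknown')}]\n{d.page_content}" for d in docs
    )


# Hypothetical retrieved pages, invented for illustration only.
docs = [
    Page("Total: 20.00", {"source": "a.pdf"}),
    Page("Total: 35.50", {"source": "b.pdf"}),
]
text = format_docs(docs)
print(text)
```

Assuming LangChain's usual composition rules, a helper like this could be piped after a retriever (for example `relevant_invoices | format_docs`) so the model sees clean text labelled by source file.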



In [None]:
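chathistory = ""

while True:
    user = input()
    if user.strip().lower() in {"", "exit", "quit"}:  # stop on empty input or exit/quit
        break
    ans = llm_ans(user, chathistory)
    chathistory += f"HumanMessage:{user}\n"
    chathistory += f"AIMessage:{ans}\n\n"

    print("input:", user)
    print("Output:", ans, "\n\n")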
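
The history string in the loop above grows with every turn, so a long session eventually sends a very large prompt. One possible refinement, sketched here with a hypothetical `ChatHistory` class that is not part of the notebook, keeps only the most recent turns while producing the same `HumanMessage:`/`AIMessage:` layout.

```python
from collections import deque


class ChatHistory:
    """Keep the last max_turns question/answer pairs (illustrative helper)."""

    def __init__(self, max_turns=5):
        # A bounded deque silently drops the oldest turn once it is full.
        self.turns = deque(maxlen=max_turns)

    def add(self, user, answer):
        self.turns.append((user, answer))

    def render(self):
        """Format the stored turns in the same layout the loop builds."""
        return "".join(f"HumanMessage:{u}\nAIMessage:{a}\n\n" for u, a in self.turns)


history = ChatHistory(max_turns=2)
history.add("q1", "a1")
history.add("q2", "a2")
history.add("q3", "a3")
rendered = history.render()
print(rendered)
```

With `max_turns=2`, the first exchange is dropped once the third is added; the value of `max_turns` is a tuning choice that trades context for prompt size.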



In [None]:
print(chathistory)

HumanMessage:what is product total price
AIMessage:- **Invoice 36258:** The product total price is **$48.71**.  Here's the breakdown:
    - This is the price before the discount and shipping are applied.
    - The final total price for the invoice is **$50.10**.
- **Invoice AMD2-3878067:** The product total price is **₹599.00**. Here's why:
    - This price includes the 18% IGST tax.
    - The total price for the invoice, including shipping, is also **₹599.00**.

Do you need the price before tax or any other specific details? 


HumanMessage:give me earphones
AIMessage:- **Invoice AMD2-3878067:**
    - The user is requesting earphones.
    - The invoice lists "realme Buds 2 Wired in Ear Earphones with Mic (Blue)".
    - The total price for the earphones, including tax, is **₹599.00**.
- **Possible Questions:**
    -  Are you asking about the earphones on Invoice AMD2-3878067?
    - Do you want to know the price of the earphones before tax? 


HumanMessage:tax
AIMessage:- **Invoice AMD2