

# 🔹 Interview-Style Q&A – Invoice Summarization and Chatbot Assistant

**Q1. Can you explain the Invoice Summarization and Chatbot Assistant project?**
**A:**
“At Globant, I built a Generative AI-powered chatbot that summarized invoices and answered finance-related queries in natural language. The system used LangChain with OpenAI LLMs, with Chroma DB storing embeddings of past invoices for retrieval. It also generated automated finance reports, and it reduced summarization time by nearly 40% compared to manual processes.”

---

**Q2. What was the architecture of the system?**
**A:**
“The pipeline was:

1. Invoices (PDF/Excel/structured) were ingested and parsed into text.
2. The text was split into chunks, embedded using OpenAI embeddings, and stored in Chroma DB.
3. When a user asked a question — for example, *‘What’s the total tax across invoices this month?’* — the system retrieved relevant chunks from Chroma.
4. LangChain passed the retrieved context into GPT, which produced a concise answer or summary.
5. The backend, built on FastAPI, served endpoints for `/ingest_invoice`, `/summarize_invoice`, and `/query` for finance teams.”
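
Steps 2 to 4 can be illustrated without any external services. The sketch below is a toy stand-in, not the project's code: word-count vectors replace OpenAI embeddings, a plain list replaces Chroma, and `build_prompt` only assembles the text that would be sent to the model. All function names and the sample invoices are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase token counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Return the k chunks most similar to the query (step 3)."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Assemble the context-grounded prompt handed to the LLM (step 4)."""
    joined = "\n---\n".join(context)
    return f"Answer using only the invoice text below.\n\n{joined}\n\nQuestion: {query}"


chunks = [
    "Invoice INV-001 vendor Acme total 1200.00 tax 96.00 due 2024-05-01",
    "Invoice INV-002 vendor Globex total 450.00 tax 36.00 due 2024-05-15",
    "Shipping policy: goods are dispatched within five business days",
]
question = "What is the tax on invoice INV-001?"
top = retrieve(question, chunks, k=1)
prompt = build_prompt(question, top)
```

In the real pipeline the ranking comes from embedding similarity inside Chroma, so it captures meaning rather than literal word overlap; the control flow is the same.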

---

**Q3. How did you optimize invoice summarization for speed and cost?**
**A:**
“I applied text preprocessing to strip redundant details before sending invoices to the LLM, which cut token usage. I also cached embeddings for previously processed invoices in Chroma DB so they were never re-embedded, and cached answers to frequent queries in Redis. Together, these optimizations reduced summarization time by ~40% and cut inference costs.”
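
A minimal sketch of the two ideas, under stated assumptions: `preprocess` shows one possible notion of "redundant details" (blank lines, repeated lines, extra whitespace), and `EmbeddingCache` uses an in-memory dictionary keyed by content hash as a stand-in for the Chroma and Redis caches described above. Both names, and the fake embedder, are hypothetical.

```python
import hashlib
import re


def preprocess(text: str) -> str:
    """Drop blank and exactly repeated lines, and collapse runs of whitespace."""
    seen = set()
    kept = []
    for line in text.splitlines():
        line = re.sub(r"\s+", " ", line).strip()
        if line and line not in seen:
            seen.add(line)
            kept.append(line)
    return "\n".join(kept)


class EmbeddingCache:
    """Skips re-embedding text that was already processed, keyed by content hash."""

    def __init__(self, embed_fn):
        self._embed_fn = embed_fn  # hypothetical: any callable str -> vector
        self._store = {}           # in-memory stand-in for a persistent store
        self.misses = 0            # number of times the embedder was actually called

    def get(self, text: str):
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._embed_fn(text)
        return self._store[key]


raw = "ACME Corp\n\nACME Corp\nTotal:    1200.00\nTax:  96.00\n"
clean = preprocess(raw)

cache = EmbeddingCache(embed_fn=lambda t: [float(len(t))])  # fake embedder for the demo
first = cache.get(clean)
second = cache.get(clean)  # served from the cache, embedder not called again
```

Hashing the cleaned text rather than the raw file means two uploads that differ only in whitespace or repeated headers share one cache entry.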

---

**Q4. How did you ensure accuracy in financial summaries?**
**A:**
“I used Retrieval-Augmented Generation (RAG) with strict prompting, so GPT answered only from the retrieved invoice text. I also enforced structured outputs (JSON with fields like `invoice_id`, `amount`, `tax`, and `due_date`). Finally, the chatbot returned citations pointing to the invoice sections it used, so finance users could verify each answer.”
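
One way to enforce the structured output is to validate the model's JSON before it reaches users. The sketch below is an assumption about how that could look, using only the standard library; the field names come from the answer above, while `InvoiceSummary`, `parse_summary`, and the sample output are hypothetical.

```python
import json
from dataclasses import dataclass
from datetime import date
from decimal import Decimal

REQUIRED_FIELDS = ("invoice_id", "amount", "tax", "due_date")


@dataclass(frozen=True)
class InvoiceSummary:
    invoice_id: str
    amount: Decimal
    tax: Decimal
    due_date: date


def parse_summary(raw_json: str) -> InvoiceSummary:
    """Parse model output into typed fields.

    Raises ValueError when a required field is missing; malformed values
    raise from the underlying converters (Decimal, date.fromisoformat).
    """
    data = json.loads(raw_json)
    missing = [f for f in REQUIRED_FIELDS if f not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return InvoiceSummary(
        invoice_id=str(data["invoice_id"]),
        amount=Decimal(str(data["amount"])),  # Decimal avoids float rounding on money
        tax=Decimal(str(data["tax"])),
        due_date=date.fromisoformat(data["due_date"]),
    )


# Hypothetical model output for illustration.
model_output = '{"invoice_id": "INV123", "amount": 1200.0, "tax": 96.0, "due_date": "2024-05-01"}'
summary = parse_summary(model_output)
```

A response that fails validation can be retried or surfaced as an error instead of being shown as a confident but incomplete summary.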

---

**Q5. What business impact did this project deliver?**
**A:**
“The assistant automated invoice summarization and query handling, reducing manual effort for the finance team. It accelerated report preparation by ~40% and improved accuracy in expense tracking. This freed finance staff to focus on higher-value analysis rather than repetitive manual work.”

---

**Q6. How would you extend this solution in the future?**
**A:**
“Future improvements could include multimodal support to directly process scanned invoice images using OCR, integrating with ERP systems for real-time data, and adding anomaly detection to flag unusual amounts or missing tax details. This would make the assistant not just reactive but proactive in financial oversight.”
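
The anomaly-detection idea could start as simply as the sketch below. It is an illustrative assumption, not part of the delivered system: it flags invoices with no tax value and uses a median-based modified z-score to flag amounts far from the rest. The function name, the record layout, and the threshold of 3.5 are hypothetical choices.

```python
from statistics import median


def flag_anomalies(invoices: list[dict], threshold: float = 3.5) -> list[tuple[str, str]]:
    """Return (invoice_id, reason) pairs for missing tax or outlying amounts."""
    flags = []
    amounts = [inv["amount"] for inv in invoices]
    med = median(amounts)
    mad = median([abs(a - med) for a in amounts])  # median absolute deviation
    for inv in invoices:
        if inv.get("tax") is None:
            flags.append((inv["invoice_id"], "missing tax"))
        # Modified z-score; skipped when MAD is 0 (at least half the amounts equal the median).
        if mad and 0.6745 * abs(inv["amount"] - med) / mad > threshold:
            flags.append((inv["invoice_id"], "unusual amount"))
    return flags


# Hypothetical records for illustration.
invoices = [
    {"invoice_id": "INV-1", "amount": 100.0, "tax": 8.0},
    {"invoice_id": "INV-2", "amount": 110.0, "tax": 8.8},
    {"invoice_id": "INV-3", "amount": 95.0, "tax": None},
    {"invoice_id": "INV-4", "amount": 105.0, "tax": 8.4},
    {"invoice_id": "INV-5", "amount": 5000.0, "tax": 400.0},
]
flags = flag_anomalies(invoices)
```

The median-based score is used here because a single very large invoice would inflate a mean and standard deviation enough to hide itself.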

---

# 🔹 Demo Code – `invoice_assistant.py`

Here’s a **minimal working demo** (you can expand later). It uses **LangChain + OpenAI + Chroma + FastAPI**.

```python
# invoice_assistant.py

from fastapi import FastAPI, UploadFile
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain.chains import RetrievalQA
from langchain_community.vectorstores import Chroma  # langchain.vectorstores is a deprecated import path
from langchain.prompts import PromptTemplate
import os

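# Read the OpenAI API key from the environment; never hard-code secrets in source.
if not os.environ.get("OPENAI_API_KEY"):
    raise RuntimeError("Set the OPENAI_API_KEY environment variable before starting the server.")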

app = FastAPI(title="Invoice Summarization & Query Assistant")

# Embeddings + Vector DB setup
embedding_model = OpenAIEmbeddings()
persist_dir = "invoice_db"
vectorstore = Chroma(persist_directory=persist_dir, embedding_function=embedding_model)

# LLM setup
llm = ChatOpenAI(model="gpt-4", temperature=0)
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

# Prompt template for invoice summarization
SUMMARY_PROMPT = PromptTemplate(
    input_variables=["context"],
    template="Summarize the following invoice text:\n\n{context}\n\nProvide key details like invoice id, vendor, total amount, tax, and due date."
)

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    chain_type="stuff"
)

@app.post("/ingest_invoice")
async def ingest_invoice(file: UploadFile):
    """Ingest invoice text into Chroma DB"""
    text = await file.read()
    text = text.decode("utf-8")  # Assuming text/CSV for demo. Use PyPDF2/Docx for real invoices.

    # Split and embed
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    docs = splitter.create_documents([text])
    vectorstore.add_documents(docs)
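    # Chroma 0.4+ writes to persist_directory automatically; older versions need an explicit call.
    if hasattr(vectorstore, "persist"):
        vectorstore.persist()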
    return {"status": "Invoice ingested successfully", "chunks": len(docs)}

@app.get("/summarize_invoice")
async def summarize_invoice(invoice_id: str):
    """Summarize invoice from DB"""
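    # Semantic nearest-match on the ID string, not an exact lookup: it returns the closest chunk.
    results = vectorstore.similarity_search(invoice_id, k=1)
    if not results:
        return {"error": "Invoice not found"}
    prompt = SUMMARY_PROMPT.format(context=results[0].page_content)
    summary = llm.invoke(prompt).content  # invoke() replaces the deprecated predict()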
    return {"invoice_id": invoice_id, "summary": summary}

@app.get("/query")
async def query_invoice(query: str):
    """Ask questions over invoices"""
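    # invoke() replaces the deprecated run(); RetrievalQA returns its answer under "result".
    answer = qa_chain.invoke({"query": query})["result"]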
    return {"query": query, "answer": answer}
```

---

### 🔹 How to Run

1. Save as `invoice_assistant.py`.
2. Install dependencies:

   ```bash
   pip install fastapi uvicorn python-multipart langchain langchain-community langchain-openai chromadb
   ```
3. Run server:

   ```bash
   uvicorn invoice_assistant:app --reload
   ```
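4. Test the endpoints from the interactive docs at `http://127.0.0.1:8000/docs` (a browser address bar can only issue the `GET` requests):

   * `POST /ingest_invoice` → Upload a plain-text invoice file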
   * `GET /summarize_invoice?invoice_id=INV123`
   * `GET /query?query=What is the total tax this month?`

---

