This notebook builds a Retrieval-Augmented Generation (RAG) chatbot that retrieves relevant information from PDF documents and generates responses with a large language model (LLM). The chatbot loads the PDFs, extracts and chunks their text, embeds the chunks in a vector store, and runs semantic search over them to ground its answers.

## 1- Import Libraries 

In [46]:
# Standard library
import os
import re
import warnings
from datetime import datetime, timedelta

# Third-party
import numpy as np
import pandas as pd
from dateparser import parse
from dateparser.search import search_dates
from langchain.chains import ConversationalRetrievalChain, RetrievalQA
from langchain.document_loaders import PyPDFLoader
from langchain.memory import ConversationBufferMemory
from langchain.prompts import PromptTemplate
from langchain.retrievers import EnsembleRetriever
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS, Chroma
from langchain_fireworks import ChatFireworks, Fireworks, FireworksEmbeddings

## 2- Set API key 

In [2]:
# Read the API key from the environment instead of hardcoding it in the notebook
api_key = os.environ["FIREWORKS_API_KEY"]

llm = Fireworks(api_key=api_key, model="accounts/fireworks/models/deepseek-v3")
response = llm.invoke("Hello, how are you?")
print(response)

 Today I’ll post a tutorial where I’ll explain step by step how


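This snippet sets up authentication, initializes a language model (DeepSeek-V3 served by Fireworks AI), sends a text prompt, and prints the model's response. The invoke method generates text from the input prompt. Because Fireworks is a text-completion interface rather than a chat interface, the output above continues the prompt instead of answering it as a chat turn.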
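
One way to supply the key without typing it into a cell is to read it from the environment and fail early when it is missing. This is a minimal sketch; the helper name `require_env` and the demo variable are hypothetical and not part of LangChain or Fireworks.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.environ.get(name, "")
    if not value:
        raise RuntimeError(f"Set the {name} environment variable before running this notebook")
    return value


# Demo with a placeholder variable (hypothetical name and value)
os.environ["DEMO_API_KEY"] = "placeholder-value"
demo_key = require_env("DEMO_API_KEY")
```

In the notebook, the same helper could be called with "FIREWORKS_API_KEY" and its result passed as api_key.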

## 3- Initialize embeddings

In [3]:
embeddings = FireworksEmbeddings(api_key=os.environ["FIREWORKS_API_KEY"])

## 4- Reading PDFs

In [4]:
pdf_files = [
    r"How-to-Manage-your-Finances.pdf",
    r"pdf_50_20_30.pdf",
    r"Personal-Finance-Management-Handbook.pdf",
    r"reach-my-financial-goals.pdf",
    r"tips-to-manage-your-money.pdf",
    r"beginners-guide-to-saving-2024.pdf",
    r"40MoneyManagementTips.pdf",
]

## 5- Splitting documents into smaller meaningful chunks

In [5]:
# Load every PDF and collect its pages as documents
documents = []
for pdf in pdf_files:
    loader = PyPDFLoader(pdf)
    documents.extend(loader.load())

# Split the pages into chunks of up to 1,000 characters with a 200-character overlap
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = text_splitter.split_documents(documents)

# Generate embeddings batch by batch
def batch_items(items, batch_size=256):
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

batch_size = 256
chunk_batches = list(batch_items(chunks, batch_size))

all_embeddings = []
for batch in chunk_batches:
    # Use a distinct variable name so the batching helper is not shadowed
    texts = [chunk.page_content for chunk in batch]
    batch_embeddings = embeddings.embed_documents(texts)
    all_embeddings.extend(batch_embeddings)
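
To make chunk_size, chunk_overlap, and batch_size concrete, here is a simplified sketch in plain Python. It uses fixed-size character windows, which is an assumption for illustration only: RecursiveCharacterTextSplitter also tries to break on separators such as paragraphs and sentences, so its chunks will differ. The names `sliding_chunks` and `batched` are hypothetical.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Cut text into fixed-size windows that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    # Stop once the remaining tail is already covered by the previous window
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


def batched(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]


# Toy input: 2,600 characters of repeating digits
text = "".join(str(i % 10) for i in range(2600))
chunks = sliding_chunks(text, chunk_size=1000, chunk_overlap=200)
batch_sizes = [len(b) for b in batched(list(range(600)), 256)]
```

With these toy inputs, neighboring chunks share their last and first 200 characters, and 600 items split into batches of 256, 256, and 88.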

## 6- Store chunks in a FAISS vector store

In [6]:
#  Store in FAISS
vector_store = FAISS.from_embeddings(
    text_embeddings=list(zip([chunk.page_content for chunk in chunks], all_embeddings)),
    embedding=embeddings
)

retriever = vector_store.as_retriever(search_kwargs={"k": 5})  # Retrieve top 5 relevant chunks
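
Conceptually, the retriever embeds the query and returns the k stored chunks whose vectors lie closest to it. The sketch below does this by brute force with Euclidean distance on toy 2-D vectors. FAISS uses optimized index structures, and the distance metric depends on how the index is configured, so treat this as an illustration of the idea only; `top_k` is a hypothetical helper.

```python
import math


def top_k(query_vector, vectors, k):
    """Return the indices of the k vectors nearest to query_vector (Euclidean distance)."""
    distances = [(math.dist(query_vector, v), i) for i, v in enumerate(vectors)]
    return [i for _, i in sorted(distances)[:k]]


# Toy "embeddings" for four chunks
vectors = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [0.9, 0.1]]
nearest = top_k([1.0, 0.0], vectors, k=2)
```

Here `nearest` holds the indices of the exact match and its closest neighbor.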

## 7- Create memory

In [7]:
#  Initialize memory
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)



## 8- Define a prompt template

In [8]:
# Define the prompt template for financial advice
finance_template = PromptTemplate(
    input_variables=["context", "question", "chat_history"],
    template="""
You are an expert financial advisor. Use the chat history and retrieved context to answer the question in a conversational manner.

Chat History:
{chat_history}

Context:
{context}

Question:
{question}

Answer:
"""
)
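
The three input variables are plain placeholders that get filled in before the text is sent to the model. A minimal sketch of that substitution with Python's built-in str.format, assuming simple placeholders like the ones above; the sample values are made up for illustration.

```python
template = """
You are an expert financial advisor. Use the chat history and retrieved context to answer the question in a conversational manner.

Chat History:
{chat_history}

Context:
{context}

Question:
{question}

Answer:
"""

# Hypothetical values standing in for memory contents and retrieved chunks
prompt = template.format(
    chat_history="Human: Hi\nAI: Hello! How can I help?",
    context="Pay yourself first.",
    question="How do I start saving?",
)
print(prompt)
```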


## 9- Initialize the LLM (DeepSeek)

In [26]:
#  Initialize Fireworks LLM
llm = Fireworks(
    api_key=os.environ["FIREWORKS_API_KEY"],
    model="accounts/fireworks/models/deepseek-v3",
    temperature=1.0,
    max_tokens=1024
)


## 10- Create conversational RAG pipeline

In [27]:
# Create the conversational RAG pipeline
conversational_rag = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=retriever,
    chain_type="stuff",
    memory=memory,
    combine_docs_chain_kwargs={"prompt": finance_template}
)
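
The flow inside such a chain can be sketched in plain Python: rewrite a follow-up into a standalone question when there is history, retrieve chunks for it, "stuff" all chunks into one context string, generate an answer, and record the turn. This is a simplified illustration with stub callables standing in for the LLM and the retriever, not the library's implementation; every function name here is hypothetical.

```python
def answer_with_history(question, chat_history, condense, retrieve, generate):
    """Run one conversational RAG turn and append it to chat_history."""
    # 1. Rewrite a follow-up into a standalone question only when history exists
    standalone = condense(question, chat_history) if chat_history else question
    # 2. Fetch the chunks most relevant to the standalone question
    docs = retrieve(standalone)
    # 3. "Stuff" strategy: concatenate every retrieved chunk into a single context
    context = "\n\n".join(docs)
    # 4. Generate the answer from context, question, and history
    answer = generate(context, standalone, chat_history)
    # 5. Store the original question and the answer as memory
    chat_history.append((question, answer))
    return answer


# Stubs: a real pipeline would call an LLM and a vector store here
def condense(question, history):
    return f"{question} [follow-up to: {history[-1][0]}]"


def retrieve(query):
    return ["Pay yourself first.", "Track spending with a budget."]


def generate(context, question, history):
    return f"Answer to: {question}"


history = []
first = answer_with_history("What are the best strategies for saving money?",
                            history, condense, retrieve, generate)
second = answer_with_history("Just choose the best two", history,
                             condense, retrieve, generate)
```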

## Example follow-up questions

In [28]:
query_1 = "What are the best strategies for saving money?"
response_1 = conversational_rag.invoke({"question": query_1})
print(response_1["answer"])



 From our discussion, the best two strategies for saving money are: 

1. **Pay Yourself First**: Automatically transfer a portion of your income into savings as soon as you receive your paycheck. This ensures you save consistently and prioritizes your financial future. 

2. **Budgeting**: Create a budget to track your income and expenses. It helps you identify areas to cut back and allocate funds toward your savings goals. 

These two strategies work together—budgeting determines how much you can save, and paying yourself first ensures it happens consistently. Start small, stay disciplined, and build your savings over time! Let me know if you’d like help implementing these strategies.


In [12]:
query_2 = "Just choose the best two strategies from the previous question"
response_2 = conversational_rag.invoke({"question": query_2})
print(response_2["answer"])

Based on our discussion, the two most effective strategies for saving money are:

1. **Pay Yourself First**: This is a mindset shift where you treat savings as a non-negotiable expense, just like rent or utilities. By automatically transferring a portion of your income into a savings account before you spend on anything else, you ensure consistent saving. This method works because it removes the temptation to spend what you could save, and it builds the habit of saving over time. For example, even small amounts, like $10 a month, can grow significantly with compound interest.

2. **Budgeting**: Creating and sticking to a budget is essential for identifying where your money goes and finding opportunities to save. By tracking your income and expenses, you can pinpoint areas where you can cut back, like reducing dining out or entertainment costs. Budgeting gives you control over your finances, allowing you to allocate funds toward your savings goals effectively.

These two strategies work

## 11- Load CSV dataset

In [13]:
df = pd.read_csv(r"cleaned_finance_data.csv")

In [29]:
df.head()

| Unnamed: 0 | Date | Description | Debit | Credit | Amount | sub-category | Category | Category Type |
|---|---|---|---|---|---|---|---|---|
| 0 | 2022-01-02 | salary | 0 | 20900 | 20900 | salary | salary | income |
| 1 | 2022-01-14 | rent received | 0 | 3112 | 3112 | house rent | salary | income |
| 2 | 2022-01-15 | agriculture | 0 | 18000 | 18000 | cultivator | salary | income |
| 3 | 2022-01-16 | e nagarpalika | 4736 | 0 | -4736 | taxes | living expenses | expenses |
| 4 | 2022-01-23 | other | 250 | 0 | -250 | other | other expenses | expenses |


## 12- Convert dataset into documents

In [30]:
documents = df.apply(lambda row: f"Date: {row['Date']}, Description: {row['Description']}, "
                                 f"Debit: {row['Debit']}, Credit: {row['Credit']}, Amount: {row['Amount']}, "
                                 f"Sub-category: {row['sub-category']}, Category: {row['Category']}, "
                                 f"Category Type: {row['Category Type']}", axis=1).tolist()
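
Each row becomes one sentence-like string so that it can be embedded like any other text chunk. The same formatting applied to a single row, written as a plain dict so it runs without pandas; the helper name `row_to_text` is hypothetical, and the sample values are the first row of the table shown earlier.

```python
def row_to_text(row):
    """Render one transaction record as a single line of labeled fields."""
    return (
        f"Date: {row['Date']}, Description: {row['Description']}, "
        f"Debit: {row['Debit']}, Credit: {row['Credit']}, Amount: {row['Amount']}, "
        f"Sub-category: {row['sub-category']}, Category: {row['Category']}, "
        f"Category Type: {row['Category Type']}"
    )


sample_row = {
    "Date": "2022-01-02", "Description": "salary", "Debit": 0, "Credit": 20900,
    "Amount": 20900, "sub-category": "salary", "Category": "salary",
    "Category Type": "income",
}
sample_text = row_to_text(sample_row)
print(sample_text)
```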


## 13- Generate embeddings 

In [31]:
csv_embeddings = embeddings.embed_documents(documents)

## 14- Create vectorstore 

In [32]:
# Create FAISS vector database
csv_vector_store = FAISS.from_embeddings(
    text_embeddings=list(zip(documents, csv_embeddings)),
    embedding=embeddings
)

# Create a retriever for searching
csv_retriever = csv_vector_store.as_retriever(search_kwargs={"k": 5})

## 15- Merge CSV and PDF retrievers

In [33]:
# Combine both retrievers (PDF and CSV)
combined_retriever = EnsembleRetriever(retrievers=[retriever, csv_retriever], weights=[0.5, 0.5])
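
An ensemble of retrievers has to merge several ranked lists into one. A common way to do this is weighted Reciprocal Rank Fusion, where each document earns weight / (rank + c) from every list it appears in. The sketch below shows that scoring on toy document IDs; the constant c=60 and the helper name `weighted_rrf` are assumptions for illustration, not a copy of the library's code.

```python
from collections import defaultdict


def weighted_rrf(rankings, weights, c=60):
    """Fuse ranked lists of document IDs with weighted Reciprocal Rank Fusion."""
    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            # Higher-ranked documents (smaller rank) contribute a larger share
            scores[doc_id] += weight / (rank + c)
    return sorted(scores, key=scores.get, reverse=True)


pdf_ranking = ["a", "b", "c"]   # toy results from the PDF retriever
csv_ranking = ["c", "a", "d"]   # toy results from the CSV retriever
fused = weighted_rrf([pdf_ranking, csv_ranking], weights=[0.5, 0.5])
```

Documents that appear near the top of both lists ("a" and "c" here) rise above documents found by only one retriever.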


## 16- Update the pipeline

In [34]:
#  RAG Pipeline
conversational_rag = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=combined_retriever,
    chain_type="stuff",
    memory=memory,
    combine_docs_chain_kwargs={"prompt": finance_template}
)

In [42]:
query = "How much i spend last month?"
response = conversational_rag.invoke({"question": query})
print(response["answer"])


In [43]:
# handle_query and query1 are not defined in any cell shown in this notebook
handle_query(query1)

" In October 2022, your total spending was **£1,247**. This amount was categorized under fashion, specifically for clothes. Based on your past spending, clothing seems to be a recurring expense for you. Let me know if you'd like help managing this category or creating a budget to optimize your expenses!"

In [47]:

def extract_transaction_data(text: str):
    """Pull the first number and the first date expression out of free text."""
    # Extract amount: the first integer or decimal with up to two decimal places
    amount_match = re.search(r"\b\d+(\.\d{1,2})?\b", text)
    amount = float(amount_match.group()) if amount_match else None

    # Extract date: resolve relative expressions against the current time
    parsed_date = search_dates(text, settings={'RELATIVE_BASE': datetime.now()})
    date = parsed_date[0][1] if parsed_date else None

    return {
        "description": text.strip(),  # use the original input as the description
        "amount": amount,
        "date": date
    }
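
The amount pattern can be tried on its own with the standard library. The sketch below isolates it in a hypothetical helper, `extract_amount`, and includes one input that shows a limitation: the first number in the text wins, even when it belongs to a date.

```python
import re

AMOUNT_RE = re.compile(r"\b\d+(\.\d{1,2})?\b")


def extract_amount(text):
    """Return the first integer or decimal in text as a float, or None."""
    match = AMOUNT_RE.search(text)
    return float(match.group()) if match else None


examples = {
    "I bought milk for 150 last Monday": extract_amount("I bought milk for 150 last Monday"),
    "coffee for 12.50 today": extract_amount("coffee for 12.50 today"),
    "no numbers here": extract_amount("no numbers here"),
    # The day of the month is matched before the actual amount
    "On May 5 I spent 100": extract_amount("On May 5 I spent 100"),
}
```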


In [48]:
def process_user_input(text):
    extracted = extract_transaction_data(text)
    description = extracted['description']
    amount = extracted['amount']
    date = extracted['date']

    # Classify the description with the zero-shot model defined in classify_transaction
    category = classify_transaction(description)

    return {
        "description": description,
        "amount": amount,
        "date": date.strftime('%Y-%m-%d') if date else None,
        "category": category
    }


In [50]:
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

def classify_transaction(description):
    candidate_labels = ["Food", "Health", "Transportation", "Entertainment", "Fashion", "Lifestyle", "Education"]
    result = classifier(description, candidate_labels)
    return result['labels'][0]


Device set to use cpu


In [51]:
process_user_input("I bought milk for 150 last Monday")

{'description': 'I bought milk for 150 last Monday',
 'amount': 150.0,
 'date': '2025-04-14',
 'category': 'Food'}
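
A phrase like "last Monday" only has a value relative to a reference date. As a plain-datetime illustration, the sketch below returns the most recent given weekday strictly before a reference date. The reference date 2025-04-16 is a hypothetical choice, and dateparser's own resolution rules may differ; `last_weekday` is not part of that library.

```python
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def last_weekday(name, today):
    """Return the most recent date strictly before today that falls on the named weekday."""
    target = WEEKDAYS.index(name.lower())
    days_back = (today.weekday() - target) % 7
    if days_back == 0:
        # "Last Monday" said on a Monday refers to the previous week
        days_back = 7
    return today - timedelta(days=days_back)


reference = date(2025, 4, 16)  # hypothetical reference date (a Wednesday)
monday = last_weekday("Monday", reference)
friday = last_weekday("Friday", reference)
```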

In [52]:
def handle_user_input(user_input):
    # Heuristic: treat the input as a transaction if it contains a time word or any number
    if any(word in user_input.lower() for word in ["today", "yesterday", "last", "ago"]) or re.search(r"\b\d+(\.\d{1,2})?\b", user_input):
        try:
            structured_data = process_user_input(user_input)
            return f"✅ Added {structured_data['amount']} to {structured_data['category']} on {structured_data['date']}."
        except Exception as e:
            return f"❌ Failed to parse transaction: {str(e)}"
    else:
        # fallback to chatbot
        response = conversational_rag.invoke({"question": user_input})
        return response["answer"]
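
The check above sends any input that contains a time word or a number to the transaction parser, so a question such as "How much i spend last month?" would be parsed as a transaction. A stricter variant, sketched here as an assumption about the intended behavior, requires both an amount and a time word. The helper name `looks_like_transaction` is hypothetical.

```python
import re

TIME_WORDS = ("today", "yesterday", "last", "ago")
AMOUNT_RE = re.compile(r"\b\d+(\.\d{1,2})?\b")


def looks_like_transaction(text):
    """Return True only when text has both a numeric amount and a time word."""
    lowered = text.lower()
    # Match whole words so that e.g. "lasting" does not count as "last"
    has_time_word = any(re.search(rf"\b{word}\b", lowered) for word in TIME_WORDS)
    has_amount = AMOUNT_RE.search(text) is not None
    return has_time_word and has_amount


checks = [
    looks_like_transaction("I bought a burger for 75 last Friday"),
    looks_like_transaction("How much i spend last month?"),
    looks_like_transaction("How can I start saving money for a house?"),
]
```

The trade-off is that a transaction stated without a time word, such as "I bought milk for 150", would now fall through to the chatbot.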


In [53]:
print(handle_user_input("I bought a burger for 75 last Friday"))

print(handle_user_input("How can I start saving money for a house?"))


✅ Added 75.0 to Food on 2025-04-11.
 Saving for a house is a big financial goal, but with the right strategies, it’s absolutely achievable. Here’s a step-by-step guide to help you get started:

### 1. **Set a Clear Savings Goal**
   - Start by determining how much you need for a down payment. Typically, this is around **10-20%** of the home’s price. For example, if you’re looking to buy a £250,000 home, aim to save between **£25,000 and £50,000**.
   - Include additional costs like closing fees, moving expenses, and potential home repairs in your target amount.

### 2. **Create a Realistic Timeline**
   - Decide when you want to buy the house and calculate how much you need to save each month to reach your goal. For instance, if you want to buy a home in 5 years and need to save £25,000, you’ll need to save around **£417 per month**.

### 3. **Open a Dedicated Savings Account**
   - Open a **high-yield savings account** or a **tax-free savings account (TFSA)** specifically for your hou

In [55]:
print(handle_user_input("how i can manage my finance"))

Managing your finances effectively begins with understanding your current financial situation and setting clear goals. Financial planning involves making a budget, tracking your spending, and making sure your expenses do not exceed your income. It also involves saving for the future, investing wisely, and managing your debt.

To help you manage your finances effectively, begin by creating a budget that outlines all your sources of income as well as your monthly expenses. Then, track your spending regularly to see where your money is going and adjust your budget accordingly to meet your financial goals.

If you have debt, it is important to prioritize paying off high-interest debt first, such as credit card debt. Additionally, you should create an emergency fund to cover unexpected expenses, as well as start saving for long-term goals such as retirement or buying a house. It is also a good idea to consult with a financial advisor if you need help with investment decisions or other compl