<a href="https://colab.research.google.com/github/Yashh0/novel_chatbot/blob/main/chatbot.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!pip install langchain langchain-community langchain-text-splitters
!pip install chromadb
!pip install sentence-transformers
!pip install pypdf



# https://www.planetebook.com/free-ebooks/the-adventures-of-huckleberry-finn.pdf

Link to the source PDF. The next cell expects it at `/content/the-adventures-of-huckleberry-finn.pdf`.

In [11]:
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
import os

# 1. Load the PDF
pdf_path = "/content/the-adventures-of-huckleberry-finn.pdf"  # Change if needed
loader = PyPDFLoader(pdf_path)
pages = loader.load()

# 2. Chunking: prefer paragraph, line, and sentence boundaries before falling back to spaces
text_splitter = RecursiveCharacterTextSplitter(
    separators=["\n\n", "\n", ".", " "],
    chunk_size=800,   # maximum characters per chunk
    chunk_overlap=150,  # characters shared between neighboring chunks
)

documents = text_splitter.split_documents(pages)

# 3. Use a compact sentence-embedding model (runs locally via sentence-transformers)
embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

# 4. Save to vector store
vectordb = Chroma.from_documents(
    documents,
    embedding=embedding_model,
    persist_directory="vectorstore"  # folder where vector DB will be saved
)

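# Chroma 0.4+ writes to persist_directory automatically; persist() is deprecated there
# and is kept only for older versions.
vectordb.persist()
print("✅ Vector store created and saved to 'vectorstore/' folder.")


✅ Vector store created and saved to 'vectorstore/' folder.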
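
To make the effect of `chunk_size=800` and `chunk_overlap=150` concrete, here is a minimal sketch of fixed-window chunking with overlap. It is a simplification, not how `RecursiveCharacterTextSplitter` works internally: the real splitter first breaks on the listed separators and then merges pieces, so its chunks vary in length. The function name `chunk_text` and the synthetic `sample` string are hypothetical and used only for illustration.

```python
def chunk_text(text: str, chunk_size: int = 800, chunk_overlap: int = 150) -> list[str]:
    """Split text into character windows where neighbors share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# Synthetic 2000-character input (not taken from the PDF).
sample = "".join(str(i % 10) for i in range(2000))
chunks = chunk_text(sample)
print(len(chunks), [len(c) for c in chunks])  # 3 [800, 800, 700]
print(chunks[0][-150:] == chunks[1][:150])    # True
```

The tail of each chunk reappears at the head of the next, so a sentence cut at a chunk boundary is likely to be intact in at least one of the two chunks.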


In [16]:
!pip install langchain langchain-community langchain-text-splitters
!pip install chromadb sentence-transformers pypdf
!pip install groq langchain-groq python-dotenv
!pip install --upgrade langchain-community




In [12]:
import os
os.environ["GROQ_API_KEY"] = "Please use your GROQ API key"  # placeholder: replace with your own Groq API key
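
Because the cell above ships with a placeholder instead of a real key, it can help to fail early with a clear message rather than let the first API call fail. The sketch below is one possible check, not part of the notebook: the helper name `require_api_key` and the `DEMO_KEY` variable are hypothetical, and the rule that a real key contains no spaces is an assumption made for illustration.

```python
import os


def require_api_key(name: str = "GROQ_API_KEY") -> str:
    """Return the key stored in the environment, or raise if it looks unset."""
    value = os.environ.get(name, "").strip()
    # Assumption: real API keys contain no spaces, so a sentence-like
    # placeholder is treated as "not set".
    if not value or " " in value:
        raise RuntimeError(f"{name} is not set to a real key; set it before building the chain.")
    return value


# Demonstration with a throwaway variable so the real key is never touched.
os.environ["DEMO_KEY"] = "Please use your GROQ API"
try:
    require_api_key("DEMO_KEY")
except RuntimeError as err:
    print(err)
```

Calling such a check once before constructing the LLM client would surface a forgotten key immediately.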


In [None]:
import os
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain.prompts import PromptTemplate
from langchain.chains import RetrievalQA
from langchain_groq import ChatGroq
from dotenv import load_dotenv

# Load API key: read a .env file if one exists, otherwise use the environment variable set earlier
load_dotenv()
groq_api_key = os.getenv("GROQ_API_KEY")

# Load existing vector store (the directory must match persist_directory from the indexing cell)
embedding = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vector_store = Chroma(persist_directory="vectorstore", embedding_function=embedding)

# Retriever
retriever = vector_store.as_retriever(search_kwargs={'k': 5})

# Prompt template
prompt_template = """
You are a helpful AI assistant. Use the context below to answer the question as accurately as possible.
If the answer cannot be found in the context, say "I cannot answer based on the document."

Context:
{context}

Question: {question}
Answer:
"""
prompt = PromptTemplate(template=prompt_template, input_variables=["context", "question"])

# Groq LLM
llm = ChatGroq(groq_api_key=groq_api_key, model_name="meta-llama/llama-4-scout-17b-16e-instruct")

# QA Chain
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    return_source_documents=False,
    chain_type_kwargs={"prompt": prompt}
)

# Chat loop
print("🧠 RAG Chatbot is ready. Ask anything from the document. Type 'exit' to quit.\n")
while True:
    query = input("You: ")
    if query.lower() == "exit":
        break
    print("🤖 Bot: Thinking...\n")
    response = qa_chain.invoke({"query": query})
    print("🧠 Answer:")
    print(response['result'], "\n")


🧠 RAG Chatbot is ready. Ask anything from the document. Type 'exit' to quit.

You: what is the name of the book
🤖 Bot: Thinking...

🧠 Answer:
The Adventures of Huckleberry Finn. 

You: who is the main character in the book
🤖 Bot: Thinking...

🧠 Answer:
The main character in the book is not explicitly stated, but based on the context provided, it appears to be Huckleberry Finn, often referred to as Huck. The narrative is told from his perspective, and he mentions "Tom and me" indicating that he is likely the main character. 

You: tell me about one of the story
🤖 Bot: Thinking...

🧠 Answer:
The story appears to be an excerpt from "The Adventures of Huckleberry Finn" by Mark Twain. 

One part of the story is about the narrator (Huck Finn) who gets separated and goes to a house where he meets an old lady, Rachel, and her son Buck. They take Huck in, give him dry clothes to wear, and provide him with food. Buck and Huck later escape together, fleeing down a river on a 
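
The retrieve-then-answer flow used by the chatbot cell (take the `k` most similar chunks, paste them into the prompt as `{context}`, then ask the model) can be sketched without any external libraries. This is an illustration under stated assumptions, not the notebook's implementation: word counts stand in for the sentence-transformer embeddings, cosine similarity stands in for whatever metric the Chroma index uses, and the three sample chunks are hand-written rather than extracted from the PDF. The names `embed`, `cosine`, `retrieve`, and `build_prompt` are hypothetical.

```python
import math
from collections import Counter

PROMPT = (
    "Use the context below to answer the question.\n\n"
    "Context:\n{context}\n\nQuestion: {question}\nAnswer:"
)


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 5) -> list[str]:
    """Return the k chunks most similar to the query, best first."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(query: str, chunks: list[str], k: int = 5) -> str:
    """Concatenate the retrieved chunks into the prompt's context slot."""
    context = "\n\n".join(retrieve(query, chunks, k))
    return PROMPT.format(context=context, question=query)


# Hand-written sample chunks for illustration only.
chunks = [
    "Huck and Jim float down the river on a raft",
    "The widow Douglas tries to civilize Huck",
    "Tom Sawyer forms a gang of robbers",
]
top = retrieve("who travels on the raft", chunks, k=2)
print(top[0])  # Huck and Jim float down the river on a raft
print(build_prompt("who travels on the raft", chunks, k=1))
```

A small `k` keeps the prompt short but risks missing the relevant passage; a larger `k` adds context at the cost of more tokens and more unrelated text for the model to sift through.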
