# ***INTELLIGENT CHATBOT SYSTEM 🧠 BASED ON RAG FOR KNOWLEDGE EXTRACTION FROM PDF DOCUMENTS***

## **STEP 1:** ENVIRONMENT SETUP

In this section, we install all the necessary dependencies for our RAG implementation:
 - **langchain**: Framework for developing applications with Large Language Models
 - **faiss-cpu**: Facebook AI Similarity Search for efficient vector similarity search
 - **openai**: For interacting with OpenAI models (or compatible APIs)
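 - **langchain_openai, langchain_community, langchain_huggingface**: Integration packages that connect LangChain to model providers and third-party tools
 - **sentence-transformers**: For generating embeddings from text
 - **huggingface_hub**: For authenticating with the Hugging Face Hub
 - **pypdf**: For processing and extracting content from PDF documents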

In [1]:
!pip install langchain faiss-cpu openai langchain_openai langchain_community langchain_huggingface sentence-transformers huggingface_hub pypdf


In [2]:
from google.colab import userdata

### Authentication with Hugging Face

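Connect to the Hugging Face Hub to access embedding models and other AI resources. The cell below reads the token from a Colab secret named HF_TOKEN, so add your token under that name in Colab's Secrets panel instead of pasting it into the notebook.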

In [4]:
# Only needed if you use Hugging Face models;
# otherwise, skip or comment out this cell
from huggingface_hub import login

HF_TOKEN = userdata.get('HF_TOKEN')
login(token=HF_TOKEN)
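
The cells in this notebook read secrets through `google.colab.userdata`, which is only importable inside Colab. A minimal sketch of a fallback for running the same code elsewhere is shown here; `get_secret` is a hypothetical helper (not part of any library) that tries Colab first and then an environment variable of the same name.

```python
import os


def get_secret(name: str):
    """Return a secret from Colab's userdata if available, else from the environment.

    Hypothetical helper for illustration; returns None when the secret is not set.
    """
    try:
        from google.colab import userdata  # only importable inside Colab
    except ImportError:
        return os.environ.get(name)
    try:
        return userdata.get(name)
    except Exception:
        # The secret is missing or the notebook has no access to it
        return os.environ.get(name)


# Example with a made-up secret name
os.environ["DEMO_SECRET"] = "demo-value"
print(get_secret("DEMO_SECRET"))
```

With this in place, `userdata.get('HF_TOKEN')` could be swapped for `get_secret('HF_TOKEN')` without changing the rest of the notebook.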

## **STEP 2:** DATA INGESTION AND PREPROCESSING

This is a critical step in any RAG system where we:
1. Load the source documents (PDFs in this case)
2. Split the document into manageable chunks

The chunk size (10000 characters) and overlap (200 characters) are important parameters:
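- **Larger chunks** preserve more context, but each embedding then covers more topics, which can make retrieval less precise. A PDF page often holds fewer than 10000 characters, so with this setting many pages end up as a single chunk
- **Chunk overlap** repeats the end of one chunk at the start of the next, so sentences that straddle a boundary are not cut off from their context
- Both parameters should be tuned to the document type and the kind of questions you expect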
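
To make the interaction between chunk size and overlap concrete, here is a simplified sketch of fixed-size windowing in plain Python. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter also tries to break on separators such as paragraphs and sentences); `sliding_window_chunks` is a hypothetical function used only for illustration.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character windows that overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    result = []
    for start in range(0, len(text), step):
        result.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return result


demo_chunks = sliding_window_chunks("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=3)
for chunk in demo_chunks:
    print(chunk)
# abcdefghij
# hijklmnopq
# opqrstuvwx
# vwxyz
```

Each window starts `chunk_size - chunk_overlap` characters after the previous one, so the last three characters of one chunk reappear at the start of the next.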

In [6]:
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load the PDF; PyPDFLoader returns one Document per page
loader = PyPDFLoader("B-CNA-500-my_torch.pdf")
pages = loader.load()

# Split the pages into overlapping chunks (sizes are in characters)
splitter = RecursiveCharacterTextSplitter(chunk_size=10000, chunk_overlap=200)
chunks = splitter.split_documents(pages)

## **STEP 3:** VECTOR STORE CREATION

This section transforms document chunks into vector embeddings and stores them for efficient similarity search. Key components:

1. **Embedding Model**: The code below uses OpenAI's text-embedding-3-small, a hosted model that requires an API key
2. **FAISS Vector Database**: Stores the embeddings in memory and enables fast similarity search at scale

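Alternative open-source embedding models (from sentence-transformers) to consider based on performance needs:
  - **Small models**: faster but less accurate (e.g., all-MiniLM-L6-v2, which truncates inputs longer than 256 word pieces and therefore needs much smaller chunks than the 10000 characters used above)
  - **Large models**: more accurate but slower (e.g., all-mpnet-base-v2)
  - **Domain-specific models**: tuned for specific content types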
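
Conceptually, similarity search means ranking stored vectors by their distance to the query vector. The sketch below does this by brute force with Euclidean (L2) distance on made-up two-dimensional vectors. We assume this mirrors the default behavior of LangChain's FAISS wrapper (an exact flat L2 index); verify that against your installed version. `l2_distance` and `top_k` are hypothetical helper names.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query_vec, doc_vecs, k=2):
    """Return the indices of the k vectors closest to the query."""
    order = sorted(range(len(doc_vecs)), key=lambda i: l2_distance(query_vec, doc_vecs[i]))
    return order[:k]


# Toy "embeddings": the first two point in almost the same direction
toy_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
toy_query = [1.0, 0.05]

nearest = top_k(toy_query, toy_vectors, k=2)
print(nearest)  # [0, 1]
```

Real embeddings have hundreds or thousands of dimensions, which is where an optimized library such as FAISS becomes necessary.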

In [10]:
from langchain_community.vectorstores import FAISS
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

embeddings_model_name = "text-embedding-3-small"

# Embedding client; the API key is read from Colab secrets
embeddings = OpenAIEmbeddings(
    model=embeddings_model_name,
    openai_api_key=userdata.get('OPENAI_API_KEY'),
)

# Embed every chunk and build an in-memory FAISS index
vectorstore = FAISS.from_documents(chunks, embeddings)

## **STEP 4:** LLM SELECTION AND RAG CHAIN CONFIGURATION

In this step we:
1. Configure the LLM that will generate responses (OpenAI in this case)
2. Set up the RAG retrieval chain that connects:
    - The vector store (retrieval component)
    - The LLM (generation component)

### Important LLM Parameters:
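- **temperature=1.3**: A high setting (the OpenAI API accepts values from 0 to 2) that makes sampling more random and the wording more varied. For document Q&A, where answers should stay close to the retrieved text, lower values such as 0 to 0.3 give more deterministic, repeatable output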
- **max_tokens=500**: Limits the length of the response
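
The effect of temperature can be illustrated with the standard formulation, in which the model's scores (logits) are divided by the temperature before the softmax. This is a sketch of that textbook formula on made-up logits, not a description of any provider's internal sampling code.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after scaling them by 1/temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


toy_logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
for t in (0.3, 1.0, 1.3):
    probs = softmax_with_temperature(toy_logits, t)
    print(t, [round(p, 3) for p in probs])
```

At 0.3 almost all of the probability mass sits on the top-scoring token. At 1.3 the distribution is flatter, so less likely tokens are sampled more often.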

### Consider experimenting with:
- Different retrieval strategies (e.g., MMR, SelfQueryRetriever)
- Chain types (stuff, map_reduce, refine, map_rerank)
- Search parameters (k, fetch_k, lambda_mult)
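
As an illustration of what MMR (maximal marginal relevance) and `lambda_mult` do, here is a small plain-Python sketch on made-up vectors. It greedily picks the candidate that is most relevant to the query while penalizing similarity to already selected results. `cosine` and `mmr_select` are hypothetical helpers, and real implementations may differ in detail.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(query_vec, doc_vecs, k=2, lambda_mult=0.5):
    """Greedy MMR: trade off relevance to the query against redundancy."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Vectors 0 and 1 are near-duplicates; vector 2 covers a different direction
mmr_vectors = [[1.0, 0.0], [0.99, 0.05], [0.6, 0.8]]
mmr_query = [1.0, 0.1]

print(mmr_select(mmr_query, mmr_vectors, k=2, lambda_mult=1.0))  # [1, 0]
print(mmr_select(mmr_query, mmr_vectors, k=2, lambda_mult=0.5))  # [1, 2]
```

With `lambda_mult=1.0` only relevance counts, so both near-duplicates are returned. With `lambda_mult=0.5` the second pick is the more diverse vector. In LangChain, MMR is requested through the retriever, for example `vectorstore.as_retriever(search_type="mmr", search_kwargs={"k": 4, "fetch_k": 20, "lambda_mult": 0.5})`, where the specific values are only illustrative.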

In [11]:
from langchain.chains import RetrievalQA

model_name = "gpt-3.5-turbo"
api_key = userdata.get('OPENAI_API_KEY')
api_base = userdata.get('OPENAI_API_BASE')
temperature = 1.3
max_tokens = 500

def get_openai_llm():
    return ChatOpenAI(
        model=model_name,
        openai_api_key=api_key,
        openai_api_base=api_base,
        temperature=temperature,
        max_tokens=max_tokens,
    )

llm = get_openai_llm()

qa_chain = RetrievalQA.from_chain_type(llm=llm, retriever=vectorstore.as_retriever())

## **STEP 5:** QUERY EXECUTION AND RESPONSE GENERATION

In this final step, we:
1. Take a user query
2. Retrieve relevant context from our vector store
3. Generate a response using the LLM enriched with the retrieved context

This is the core of RAG: the LLM's response is grounded in the specific content of your document rather than only in its general pre-training.
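
The "stuff" approach to step 3 can be sketched as simple string assembly: the retrieved chunks are concatenated into the prompt ahead of the question. The wording of the instructions below is made up for illustration and is not LangChain's exact default prompt; `build_stuff_prompt` is a hypothetical helper.

```python
def build_stuff_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Concatenate retrieved chunks and the question into one prompt string."""
    context = "\n\n".join(retrieved_chunks)
    return (
        "Use the following context to answer the question. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Made-up chunks standing in for retriever output
demo_prompt = build_stuff_prompt(
    "What are the goals of the project?",
    ["First retrieved chunk.", "Second retrieved chunk."],
)
print(demo_prompt)
```

Because every retrieved chunk is placed in a single prompt, the number and size of chunks must fit within the model's context window.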

### Production Considerations:
- Add evaluation metrics to measure relevance and accuracy
- Implement user feedback loops to improve retrieval quality
- Add caching for frequent queries
- Monitor token usage and response times
- Implement streaming responses for better user experience
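
A minimal sketch of two of these points, caching and response-time monitoring, is shown below. It uses an exact-match in-memory cache from the standard library and a stand-in function instead of a real chain; `make_cached_answerer` and `fake_answer` are hypothetical names.

```python
import time
from functools import lru_cache


def make_cached_answerer(answer_fn, maxsize=128):
    """Wrap a query -> answer function with an exact-match in-memory cache."""
    @lru_cache(maxsize=maxsize)
    def cached(query: str) -> str:
        return answer_fn(query)
    return cached


call_count = {"n": 0}


def fake_answer(query: str) -> str:
    """Stand-in for an expensive LLM call; counts how often it runs."""
    call_count["n"] += 1
    return f"answer to: {query}"


cached_answer = make_cached_answerer(fake_answer)

start = time.perf_counter()
cached_answer("What are the goals of the project?")
cached_answer("What are the goals of the project?")  # served from the cache
elapsed = time.perf_counter() - start

print(call_count["n"], cached_answer.cache_info().hits)  # 1 1
print(f"two lookups took {elapsed:.6f} s")
```

In the notebook, `answer_fn` could be something like `lambda q: qa_chain.invoke(q)["result"]`. Note that this only helps for identical query strings, and that with a high temperature the uncached answers would differ between calls anyway.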

In [13]:
query = "What are the goals of the project?"
response = qa_chain.invoke(query)
print(response["result"])

The goals of the project involve developing two specific binaries: a neural network generator and a chessboard analyzer. The neural network generator must create a new neural network based on a configuration file, while the chessboard analyzer can operate in training or evaluation modes. These binaries must utilize a machine-learning-based solution trained through supervised learning. Additionally, thorough documentation, including benchmarks, should be provided. The project also involves optimization strategies to enhance the learning process, such as avoiding overfitting and optimizing hyperparameters. Special emphasis is placed on maintaining professional documentation, providing the necessary files for replicating the network training, and delivering a pre-trained neural network named starting with "my_torch_network."
