# ***INTELLIGENT CHATBOT SYSTEM 🧠 BASED ON RAG FOR KNOWLEDGE EXTRACTION FROM PDF DOCUMENTS***

## **STEP 1:** ENVIRONMENT SETUP

In this section, we install all the necessary dependencies for our RAG implementation:
 - **langchain**: Framework for developing applications with Large Language Models
 - **faiss-cpu**: Facebook AI Similarity Search for efficient vector similarity search
 - **openai**: For interacting with OpenAI models (or compatible APIs)
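 - **langchain_openai**, **langchain_community**, **langchain_huggingface**: LangChain integration packages for model providers, document loaders, and vector stores
 - **huggingface_hub**: For authenticating with and downloading models from the Hugging Face Hub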
 - **sentence-transformers**: For generating embeddings from text
 - **pypdf**: For processing and extracting content from PDF documents

```python
!pip install langchain faiss-cpu openai langchain_openai langchain_community langchain_huggingface sentence-transformers huggingface_hub pypdf
```



### Authentication with Hugging Face

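Log in to the Hugging Face Hub to access embedding models and other resources. Replace the placeholder with your own HF token. Public models such as all-MiniLM-L6-v2 can also be downloaded without logging in, but unauthenticated requests are subject to stricter rate limits.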

```python
from huggingface_hub import login

# Replace the placeholder with your own token; never commit a real token to version control
HF_TOKEN = "****************************"
login(token=HF_TOKEN)
```
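Hard-coding tokens in a notebook makes them easy to leak when the notebook is shared. A minimal sketch of reading a credential from an environment variable instead; `require_secret` is a hypothetical helper, not part of `huggingface_hub` (which also reads an `HF_TOKEN` environment variable on its own).

```python
import os


def require_secret(name: str) -> str:
    """Read a credential from the environment instead of hard-coding it (hypothetical helper)."""
    value = os.environ.get(name, "").strip()
    if not value:
        # Fail early with a clear message rather than sending an empty token to the API
        raise RuntimeError(f"Set the {name} environment variable before running this cell.")
    return value
```

In the cell above, this would become `login(token=require_secret("HF_TOKEN"))`, and the same helper could supply `DEEPSEEK_API_KEY` in Step 4.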

## **STEP 2:** DATA INGESTION AND PREPROCESSING

This is a critical step in any RAG system where we:
1. Load the source documents (PDFs in this case)
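2. Split the pages into overlapping chunks that can be embedded and retrieved individually

The chunk size (10000 characters) and overlap (200 characters) are important parameters:
- **Larger chunks** preserve more context, but each chunk's single embedding has to summarize more text, which can make retrieval less precise
- **Chunk overlap** helps maintain continuity: up to the last 200 characters of one chunk are repeated at the start of the next, so text near a boundary is less likely to be cut off from its context
- **Embedding model limits** also matter: all-MiniLM-L6-v2 truncates inputs longer than 256 word pieces, so with 10000-character chunks only the beginning of each chunk contributes to its embedding
- These parameters should be adjusted based on document type and query patterns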
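To make `chunk_size` and `chunk_overlap` concrete, here is a simplified fixed-window splitter. It is an illustration only, and `chunk_text` is a hypothetical name: unlike `RecursiveCharacterTextSplitter`, it cuts at exact character offsets instead of first trying paragraph, line, and word separators, so it may split words in half.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character windows; consecutive windows share chunk_overlap chars."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    for chunk in chunk_text("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=3):
        print(chunk)  # each chunk starts with the last 3 characters of the previous one
```

With `chunk_size=10` and `chunk_overlap=3`, each window advances 7 characters, so the alphabet becomes four chunks, the last one shorter than the rest.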

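```python
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load the PDF: one Document per page, with page metadata attached
loader = PyPDFLoader("B-CNA-500-my_torch.pdf")
pages = loader.load()

# Split pages into chunks of up to 10000 characters with 200 characters of overlap
splitter = RecursiveCharacterTextSplitter(chunk_size=10000, chunk_overlap=200)
chunks = splitter.split_documents(pages)
```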

## **STEP 3:** VECTOR STORE CREATION

This section transforms document chunks into vector embeddings and stores them for efficient similarity search. Key components:

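1. **Embedding Model**: all-MiniLM-L6-v2, which maps text to 384-dimensional vectors, is chosen for its balance between retrieval quality and speed
2. **FAISS Vector Store**: An in-memory index (a library, not a standalone database server) that enables fast similarity search over dense vectors, even for large collections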

Alternative embedding models to consider based on performance needs:
  - **Small models**: faster but less accurate (e.g., all-MiniLM-L6-v2)
  - **Large models**: more accurate but slower (e.g., all-mpnet-base-v2)
  - **Domain-specific models**: tuned for specific content types

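```python
from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_openai import ChatOpenAI  # used in Step 4

# Embed each chunk with a small sentence-transformers model on CPU;
# normalized (unit-length) vectors make L2 distance and cosine similarity rank results identically
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    model_kwargs={"device": "cpu"},
    encode_kwargs={"normalize_embeddings": True},
)
# Build an in-memory FAISS index over all chunk embeddings
vectorstore = FAISS.from_documents(chunks, embeddings)
```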
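With `normalize_embeddings=True`, every vector has unit length, so for two embeddings `a` and `b`, `||a - b||^2 = 2 - 2 (a · b)`: ranking by smallest Euclidean distance gives the same order as ranking by largest cosine similarity. This matters because LangChain's FAISS wrapper, to our understanding, defaults to an exact L2 index. The pure-Python sketch below uses hypothetical helper names and toy 3-dimensional vectors; it performs the same kind of brute-force search, without FAISS's optimized kernels.

```python
import math


def normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length, as normalize_embeddings=True does."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def squared_l2(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k_l2(query: list[float], vectors: list[list[float]], k: int) -> list[int]:
    """Exact (brute-force) search: indices of the k vectors nearest to the query in L2."""
    return sorted(range(len(vectors)), key=lambda i: squared_l2(query, vectors[i]))[:k]


def top_k_cosine(query: list[float], vectors: list[list[float]], k: int) -> list[int]:
    """Indices of the k vectors with the highest inner product (cosine, if normalized)."""
    return sorted(range(len(vectors)), key=lambda i: dot(query, vectors[i]), reverse=True)[:k]


if __name__ == "__main__":
    # Toy 3-dimensional "embeddings"; all-MiniLM-L6-v2 produces 384 dimensions
    docs = [normalize(v) for v in ([1.0, 0.0, 0.0], [0.8, 0.6, 0.0], [0.0, 0.0, 1.0])]
    query = normalize([0.9, 0.1, 0.0])
    print(top_k_l2(query, docs, k=2))      # [0, 1]
    print(top_k_cosine(query, docs, k=2))  # [0, 1]
```

Without normalization, the two rankings can disagree, because vector length then influences L2 distance.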

## **STEP 4:** LLM SELECTION AND RAG CHAIN CONFIGURATION

In this step we:
1. Configure the LLM that will generate responses (DeepSeek in this case, accessed through its OpenAI-compatible API)
2. Set up the RAG retrieval chain that connects:
    - The vector store (retrieval component)
    - The LLM (generation component)

### Important LLM Parameters:
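- **temperature=1.3**: A relatively high setting that makes sampling more varied; lower values such as 0.3 give more deterministic answers, which usually suits factual question answering over a document
- **max_tokens=500**: Caps the number of tokens the model generates; longer answers are cut off, which is likely why the sample output in Step 5 ends mid-sentence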
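A sketch of how temperature reshapes the next-token distribution, using the textbook softmax-with-temperature formula `p_i = exp(z_i / T) / sum_j exp(z_j / T)`. The logits are made-up numbers, and providers such as DeepSeek may apply their own internal scaling of the `temperature` parameter, so treat this as an illustration of the trend rather than of DeepSeek's exact behavior.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn raw token scores into probabilities; T > 1 flattens, T < 1 sharpens."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
    for t in (0.3, 1.0, 1.3):
        probs = softmax_with_temperature(logits, t)
        print(t, [round(p, 3) for p in probs])
```

At `T=0.3` almost all probability goes to the top-scoring token, while at `T=1.3` lower-ranked tokens are sampled noticeably more often.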

### Consider experimenting with:
- Different retrieval strategies (e.g., MMR, SelfQueryRetriever)
- Chain types (stuff, map_reduce, refine, map_rerank)
- Search parameters (k, fetch_k, lambda_mult)
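Of the strategies above, MMR (maximal marginal relevance) is the one that `k`, `fetch_k`, and `lambda_mult` control: it first fetches the `fetch_k` most similar chunks, then greedily picks `k` of them, trading relevance against redundancy with what has already been picked. A minimal pure-Python sketch, assuming unit-normalized vectors; `mmr_select` is a hypothetical helper, and LangChain's own implementation may differ in details such as tie-breaking.

```python
def dot(a: list[float], b: list[float]) -> float:
    """Inner product; equals cosine similarity for unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mmr_select(
    query: list[float],
    vectors: list[list[float]],
    k: int,
    fetch_k: int = 20,
    lambda_mult: float = 0.5,
) -> list[int]:
    """Greedy MMR; lambda_mult=1.0 is plain similarity ranking, values near 0.0 favor diversity."""
    # 1. Pre-select the fetch_k candidates most similar to the query
    candidates = sorted(
        range(len(vectors)), key=lambda i: dot(query, vectors[i]), reverse=True
    )[:fetch_k]
    selected: list[int] = []

    def marginal_score(i: int) -> float:
        relevance = dot(query, vectors[i])
        # Redundancy = similarity to the closest chunk already selected
        redundancy = max((dot(vectors[i], vectors[j]) for j in selected), default=0.0)
        return lambda_mult * relevance - (1.0 - lambda_mult) * redundancy

    # 2. Repeatedly add the candidate with the best relevance/redundancy trade-off
    while candidates and len(selected) < k:
        best = max(candidates, key=marginal_score)
        selected.append(best)
        candidates.remove(best)
    return selected


if __name__ == "__main__":
    query = [0.8, 0.6, 0.0]
    chunk_vectors = [
        [1.0, 0.0, 0.0],  # chunk 0: most similar to the query
        [1.0, 0.0, 0.0],  # chunk 1: exact duplicate of chunk 0
        [0.0, 1.0, 0.0],  # chunk 2: less similar, but covers a different direction
    ]
    print(mmr_select(query, chunk_vectors, k=2, fetch_k=3))                   # [0, 2]
    print(mmr_select(query, chunk_vectors, k=2, fetch_k=3, lambda_mult=1.0))  # [0, 1]
```

MMR skips the duplicate chunk 1 in favor of chunk 2, which adds new information even though it is less similar to the query. In the notebook, the built-in equivalent would be `vectorstore.as_retriever(search_type="mmr", search_kwargs={"k": 4, "fetch_k": 20, "lambda_mult": 0.5})`.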

```python
from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI

# LLM configuration (DeepSeek in this case, via its OpenAI-compatible endpoint)
# Avoid hard-coding real keys; load them from an environment variable or a secrets manager
DEEPSEEK_API_KEY = "sk-*************************"
DEEPSEEK_API_BASE = "https://api.deepseek.com/v1"

def get_deepseek_llm():
    return ChatOpenAI(
        model="deepseek-chat",
        openai_api_key=DEEPSEEK_API_KEY,
        openai_api_base=DEEPSEEK_API_BASE,
        temperature=1.3,
        max_tokens=500,
    )

llm = get_deepseek_llm()

# Create the RAG pipeline using the retrieval question-answering chain
# (default chain_type="stuff"; the retriever returns the top 4 chunks by default)
qa_chain = RetrievalQA.from_chain_type(llm=llm, retriever=vectorstore.as_retriever())
```
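`RetrievalQA.from_chain_type` uses the `"stuff"` chain type by default, which places every retrieved chunk into a single prompt. A minimal sketch of that prompt assembly; `build_stuff_prompt` is a hypothetical helper and the template wording is a stand-in, not LangChain's exact default prompt.

```python
# Hypothetical template wording, not LangChain's exact default prompt
PROMPT_TEMPLATE = (
    "Use the following context to answer the question. "
    "If the answer is not in the context, say you don't know.\n\n"
    "{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def build_stuff_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """'Stuff' strategy: concatenate every retrieved chunk into one prompt."""
    context = "\n\n".join(retrieved_chunks)
    return PROMPT_TEMPLATE.format(context=context, question=question)


if __name__ == "__main__":
    toy_chunks = ["The project delivers two binaries.", "Errors exit with code 84."]
    print(build_stuff_prompt("What must the project deliver?", toy_chunks))
```

With the default of 4 retrieved chunks and `chunk_size=10000`, the stuffed context can reach roughly 40000 characters per question, which drives up token usage and latency; that is one reason to revisit the chunk size from Step 2.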

## **STEP 5:** QUERY EXECUTION AND RESPONSE GENERATION

In this final step, we:
1. Take a user query
2. Retrieve relevant context from our vector store
3. Generate a response using the LLM enriched with the retrieved context

This is where RAG pays off: the LLM's answer is grounded in the passages retrieved from your document rather than only in its general pre-training data.

### Production Considerations:
- Add evaluation metrics to measure relevance and accuracy
- Implement user feedback loops to improve retrieval quality
- Add caching for frequent queries
- Monitor token usage and response times
- Implement streaming responses for better user experience
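Caching frequent queries can be as simple as an LRU map in front of the chain. A minimal sketch; `LRUQueryCache` is a hypothetical class, and the lambda backend stands in for a call such as `qa_chain.invoke({"query": q})["result"]`.

```python
from collections import OrderedDict
from typing import Callable


class LRUQueryCache:
    """Tiny least-recently-used cache in front of an expensive answer function."""

    def __init__(self, backend: Callable[[str], str], max_entries: int = 256) -> None:
        self._backend = backend
        self._max_entries = max_entries
        self._entries: "OrderedDict[str, str]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(query: str) -> str:
        # Case- and whitespace-insensitive key; the original query is still sent to the backend
        return " ".join(query.lower().split())

    def answer(self, query: str) -> str:
        key = self._key(query)
        if key in self._entries:
            self.hits += 1
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        self.misses += 1
        result = self._backend(query)
        self._entries[key] = result
        if len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return result


if __name__ == "__main__":
    cache = LRUQueryCache(backend=lambda q: f"answer to {q!r}", max_entries=2)
    cache.answer("What are the goals of the project?")
    cache.answer("  what are the GOALS of the project? ")
    print(cache.hits, cache.misses)  # 1 1
```

Two caveats: cached answers go stale when the indexed documents change, and with `temperature=1.3` the cache freezes a single sampled answer per normalized question.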

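```python
query = "What are the goals of the project?"
# .run() is deprecated in recent LangChain releases; invoke() returns a dict with the answer under "result"
response = qa_chain.invoke({"query": query})
print(response["result"])
```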

The goals of the project are to deliver two binaries:

1. **Neural Network Generator**:  
   - Generates a new neural network from a configuration file.  
   - Must be implemented from scratch (libraries like PyTorch or TensorFlow are **not** allowed).  

2. **Chessboard Analyzer**:  
   - Can be launched in **training mode** (to train the neural network) or **evaluation mode** (to analyze chessboards).  
   - Must use **supervised learning** for training.  
   - Requires a pre-trained neural network (named `my_torch_network*`).  

### Additional Requirements:  
- Provide **documentation** (README, benchmarks, justification of design choices).  
- Keep all **scripts and training datasets** used for reproducibility.  
- Error messages must be written to **stderr**, and the program should exit with code **84** on errors (**0** if successful).  

### Bonus Options (Optional Enhancements):  
- Optimize training speed using **parallel computing** (multithreading, GPGPU, etc.).  
- Display *