# Project: Building a RAG chatbot using Python, LangChain and a pre-defined knowledge base

# 1. Retrieval-Augmented Generation (RAG)

## What is RAG?
One of the most powerful applications enabled by large language models (LLMs) is the sophisticated question-answering (Q&A) chatbot: an application that can answer questions about specific source information. These applications use a technique known as Retrieval-Augmented Generation (RAG).

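**This tutorial shows how to build a simple Q&A application over a text data source.**

### Overview
A typical RAG application has two main components: indexing, which prepares the data ahead of time, and retrieval and generation, which answers questions at run time.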

### Indexing

1. **Load**:
   - First, we need to load our data. This is done using **Document Loaders**.
   - Document loaders handle various file formats (e.g., text files, PDFs, web pages) and convert them into a standardized format for processing.

2. **Split**:
   - Text splitters break large documents into smaller chunks.
   - This is useful for both indexing data and passing it into a model, as:
     - Large chunks are harder to search over.
     - They may not fit in a model's finite context window.

3. **Store**:
   - We need a place to store and index our splits so they can be searched over later.
   - This is often done using a **VectorStore** and an **Embeddings model**.
   - The embeddings model converts text into numerical vectors, and the vector store allows for efficient similarity search.
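
The three indexing steps can be illustrated with a toy, standard-library-only sketch. It is not LangChain's API: `split_text`, `embed`, and `build_index` are hypothetical names chosen for this illustration, the "embedding" is just a bag-of-words count, and a literal string stands in for a document loader. A real pipeline would use a proper text splitter, an embeddings model, and a vector store instead.

```python
import re
from collections import Counter

# Load: a real loader would read files or web pages; a literal string stands in here.
DOCUMENT = (
    "LangChain is a framework for building applications with language models. "
    "A retriever returns the chunks most relevant to a question. "
    "A vector store keeps one embedding per chunk so similar text can be found."
)


def split_text(text: str, chunk_size: int = 80, overlap: int = 20) -> list[str]:
    """Split: cut text into character windows that overlap by `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks


def embed(text: str) -> Counter:
    """Toy embedding (assumption): word counts instead of a learned vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(document: str, chunk_size: int = 80, overlap: int = 20) -> list[tuple[str, Counter]]:
    """Store: keep each chunk next to its vector so it can be searched later."""
    return [(chunk, embed(chunk)) for chunk in split_text(document, chunk_size, overlap)]


index = build_index(DOCUMENT)
print(f"{len(index)} chunks indexed")
```

The overlap between neighbouring chunks is a common design choice: it reduces the chance that a sentence cut at a chunk boundary becomes unfindable.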

### Retrieval and generation

1. **Retrieve**:
   - Given a user input, relevant splits are retrieved from storage using a **Retriever**.

2. **Generate**:
   - A **ChatModel** / **LLM** produces an answer using a prompt that includes both the question and the retrieved data.
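
The retrieve and generate steps can be sketched in the same toy style. Again, this is an illustration under stated assumptions rather than LangChain's retriever or prompt API: `embed`, `cosine_similarity`, `retrieve`, and `build_prompt` are hypothetical helpers, similarity is computed over word counts, and the final call to a chat model is left out, so the sketch stops at the prompt that would be sent.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding (assumption): word counts instead of a learned vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Retrieve: return the k chunks most similar to the question."""
    query_vector = embed(question)
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query_vector, embed(chunk)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Generate (first half): combine the question and retrieved text into one prompt."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


chunks = [
    "Paris is the capital of France.",
    "Bananas are rich in potassium.",
    "The Seine flows through Paris.",
]
question = "What is the capital of France?"
print(build_prompt(question, retrieve(question, chunks, k=1)))
```

In a real application the prompt string would be passed to the chat model, and the model's reply would be returned to the user.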

# 2. Building the Bot

### 1. Installing required packages

For this tutorial we need the following Python packages:


1. **langchain-text-splitters**: A library for splitting text into chunks (e.g., for RAG pipelines).
2. **langchain-community**: A collection of community-contributed tools and integrations for LangChain.
3. **langchain[mistralai]**: The main `langchain` package installed with its `mistralai` extra, which connects our application to an **open-source** and **free** **ChatModel**.
4. **langchain-mistralai**: A library for integrating Mistral AI models with LangChain.
5. **langchain-core**: Provides core functionalities for LangChain, including an in-memory vector store.

In [1]:
# Install or upgrade the packages listed above with pip.
# The leading "!" runs each line as a shell command from the notebook cell.

# --quiet / -q: suppresses most of pip's output during installation.
# --upgrade / -U: upgrades packages that are already installed to the latest available version.

!pip install --quiet --upgrade langchain-text-splitters langchain-community 
!pip install -qU "langchain[mistralai]"
!pip install -qU langchain-mistralai
!pip install -qU langchain-core
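
Because the install commands run quietly, it can be useful to confirm afterwards that the packages are actually present. The following is a small optional check using only the standard library's `importlib.metadata`; the helper name `installed_versions` and the `REQUIRED` list are introduced here for illustration and are not part of the original notebook.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Distribution names taken from the pip commands above.
REQUIRED = [
    "langchain-text-splitters",
    "langchain-community",
    "langchain",
    "langchain-mistralai",
    "langchain-core",
]


def installed_versions(packages: list[str]) -> dict[str, Optional[str]]:
    """Map each package name to its installed version, or None if it is missing."""
    found: dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(REQUIRED).items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```

If any line reports `NOT INSTALLED`, rerunning the install cell without `--quiet` should show pip's error output.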