# Retrieval-Augmented Generation (RAG) Application using LangChain, FAISS, and OpenAI

This notebook implements a RAG pipeline using LangChain, FAISS for vector search, and OpenAI's GPT models. The goal is to allow large language models to augment their generation process by retrieving relevant context from a custom document store.

---

## 🔧 Architecture Overview

The architecture follows a three-phase pipeline:

### 1. Ingestion
- **Objective**: Process and convert raw documents into vector embeddings.
- **Tools**: LangChain document loaders, `TextSplitter`, `OpenAIEmbeddings`.
- **Steps**:
  - Load data (PDF, text, web, etc.)
  - Split documents into chunks small enough to embed individually and to fit several into one prompt
  - Embed chunks using OpenAI embeddings
  - Store embeddings in a FAISS vector store
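The ingestion steps above can be sketched without any external service. This is a minimal, library-free illustration under stated assumptions, not the notebook's LangChain code: `split_text`, `embed` and `ingest` are hypothetical helpers, a hashed bag-of-words vector stands in for OpenAI embeddings, and a plain Python list stands in for the FAISS index. In the LangChain version these roles would be played by a text splitter such as `RecursiveCharacterTextSplitter`, by `OpenAIEmbeddings`, and by `FAISS.from_documents`.

```python
import hashlib
import math
import re


def split_text(text: str, chunk_size: int = 200, chunk_overlap: int = 50) -> list[str]:
    """Split text into character windows of chunk_size that overlap by chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reaches the end of the text
            break
    return chunks


def embed(text: str, dim: int = 256) -> list[float]:
    """Toy stand-in for an embedding model (assumption): hashed bag-of-words, L2-normalised."""
    vec = [0.0] * dim
    for token in re.findall(r"\w+", text.lower()):
        # md5 gives a bucket that is stable across runs (built-in hash() is salted per process).
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero for empty text
    return [v / norm for v in vec]


def ingest(documents: list[str]) -> list[tuple[str, list[float]]]:
    """Split every document and keep (chunk, embedding) pairs in an in-memory 'index'."""
    index = []
    for doc in documents:
        for chunk in split_text(doc):
            index.append((chunk, embed(chunk)))
    return index


index = ingest(["FAISS is a library for efficient similarity search. " * 10])
print(len(index), "chunks, each a", len(index[0][1]), "dimensional vector")
```

The overlap means text near a chunk boundary appears in both neighbouring chunks, so a fact cut by the boundary is less likely to be lost, at the cost of storing and embedding some text twice.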

### 2. Retrieval
- **Objective**: Fetch the document chunks most relevant to the user's query.
- **Tools**: FAISS similarity search
- **Steps**:
  - Embed the query with the same model used during ingestion, so it lands in the same vector space as the chunks
  - Perform a nearest-neighbor search over the FAISS index
  - Return the top-k most similar chunks
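Retrieval is a nearest-neighbour search over the stored vectors. The sketch below is the exact, brute-force version: it scores the query against every chunk and keeps the k best. For unit-length vectors this ranking matches what a flat FAISS index (`IndexFlatL2` or `IndexFlatIP`) returns, although FAISS computes it far faster and also offers approximate indexes for large collections. The block is self-contained, so it repeats a hypothetical hashing `embed` helper in place of OpenAI embeddings; `top_k` is likewise a hypothetical name, and cosine similarity is an assumed choice of metric.

```python
import hashlib
import heapq
import math
import re


def embed(text: str, dim: int = 256) -> list[float]:
    """Toy stand-in for an embedding model (assumption): hashed bag-of-words, L2-normalised."""
    vec = [0.0] * dim
    for token in re.findall(r"\w+", text.lower()):
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def top_k(query: str, index: list[tuple[str, list[float]]], k: int = 4) -> list[tuple[float, str]]:
    """Exact nearest-neighbour search: score every stored chunk, return the k best."""
    q = embed(query)  # the query goes through the same embedding function as the chunks
    # Both vectors are unit-length, so their dot product is the cosine similarity.
    scored = ((sum(a * b for a, b in zip(q, vec)), chunk) for chunk, vec in index)
    return heapq.nlargest(k, scored)  # sorted from most to least similar


chunks = [
    "FAISS performs fast vector similarity search.",
    "Bananas are rich in potassium.",
    "LangChain composes LLM calls into chains.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]
for score, chunk in top_k("How does vector similarity search work?", index, k=2):
    print(f"{score:.3f}  {chunk}")
```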

### 3. Generation
- **Objective**: Generate answers grounded in the retrieved context.
- **Tools**: LangChain’s `RetrievalQA`, OpenAI LLM
- **Steps**:
  - Pass the retrieved chunks together with the query to the OpenAI model
  - Produce the final answer with the `stuff` or `map_reduce` chain type
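The two chain types differ in how the retrieved chunks reach the model; in LangChain's `RetrievalQA` they are selected with `chain_type="stuff"` or `chain_type="map_reduce"`. The sketch below imitates the two strategies with plain functions and treats the model as any `str -> str` callable. The function names and prompt wording are hypothetical and are not LangChain's built-in prompts.

```python
from typing import Callable

LLM = Callable[[str], str]  # any function that maps a prompt to a completion


def stuff(query: str, docs: list[str], llm: LLM) -> str:
    """'stuff': put every retrieved chunk into one prompt and call the model once."""
    context = "\n\n".join(docs)
    return llm(f"Answer using only this context:\n{context}\n\nQuestion: {query}")


def map_reduce(query: str, docs: list[str], llm: LLM) -> str:
    """'map_reduce': one call per chunk (map), then one call that merges the partial answers (reduce)."""
    partials = [llm(f"Context:\n{doc}\n\nNote anything relevant to: {query}") for doc in docs]
    notes = "\n\n".join(partials)
    return llm(f"Combine these notes into one answer:\n{notes}\n\nQuestion: {query}")


# A fake model that records each prompt, to count how many calls each strategy makes.
prompts: list[str] = []


def fake_llm(prompt: str) -> str:
    prompts.append(prompt)
    return f"answer #{len(prompts)}"


docs = ["chunk one", "chunk two", "chunk three"]
stuff("What is RAG?", docs, fake_llm)
print("stuff calls:", len(prompts))  # prints "stuff calls: 1"
prompts.clear()
map_reduce("What is RAG?", docs, fake_llm)
print("map_reduce calls:", len(prompts))  # prints "map_reduce calls: 4"
```

The trade-off: `stuff` makes a single call, but the whole context must fit in the model's context window; `map_reduce` copes with more or longer chunks at the cost of `len(docs) + 1` calls.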

---

📌 The notebook is built from modular components, so the vector database, the LLM, and the prompt strategy can each be swapped without rewriting the rest of the pipeline.
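That modularity rests on each phase depending only on a narrow interface. As a hedged illustration (the `Retriever` protocol, `KeywordRetriever`, `FixedRetriever` and `answer` are hypothetical names, not LangChain APIs), the generation step below accepts any object with a `retrieve` method and any prompt-to-text callable, so a FAISS-backed retriever or a different LLM could be dropped in without touching `answer`. LangChain gets the same effect through shared base classes such as `BaseRetriever` and `VectorStore`.

```python
from typing import Callable, Protocol


class Retriever(Protocol):
    """Anything that can return the k most relevant chunks for a query."""

    def retrieve(self, query: str, k: int) -> list[str]: ...


class KeywordRetriever:
    """Ranks chunks by how many lowercase words they share with the query."""

    def __init__(self, chunks: list[str]) -> None:
        self.chunks = chunks

    def retrieve(self, query: str, k: int) -> list[str]:
        words = set(query.lower().split())
        # sorted() is stable, so chunks with equal overlap keep their original order.
        ranked = sorted(self.chunks, key=lambda c: len(words & set(c.lower().split())), reverse=True)
        return ranked[:k]


class FixedRetriever:
    """Always returns the same chunks; handy for testing the generation step in isolation."""

    def __init__(self, chunks: list[str]) -> None:
        self.chunks = chunks

    def retrieve(self, query: str, k: int) -> list[str]:
        return self.chunks[:k]


def answer(query: str, retriever: Retriever, llm: Callable[[str], str], k: int = 2) -> str:
    """Generation step that depends only on the Retriever interface and a prompt -> text callable."""
    context = "\n".join(retriever.retrieve(query, k))
    return llm(f"Context:\n{context}\n\nQuestion: {query}")


def echo(prompt: str) -> str:
    """Stand-in LLM that returns its prompt, to show what the model would receive."""
    return prompt


chunks = ["faiss stores vectors", "bananas are yellow", "openai generates text"]
print(answer("how does faiss store vectors", KeywordRetriever(chunks), echo, k=1))
print(answer("how does faiss store vectors", FixedRetriever(["bananas are yellow"]), echo, k=1))
```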


**Installing the necessary libraries**

In [1]:
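# faiss-cpu, langchain-community and langchain-openai supply the FAISS store and the OpenAI integrations
!pip install langchain langchain-community langchain-openai openai tiktoken faiss-cpu rapidocr-onnxruntime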

In [None]:
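import getpass, os
os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API key: ")  # prompt for the key instead of hard-coding it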