<a href="https://colab.research.google.com/github/Pavun-KumarCH/AI-Enhanced-RAG-System-for-Automated-University-Course-Content-Generation/blob/main/Multi_Search_Agentic_RAG_System.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Multi-Search Agentic-RAG System
### A system leveraging multiple knowledge retrieval tools for enhanced query processing.

### Framework: LangChain  
### LLM: OpenAI Models

---

## 1. Tools Used

### 1.1. Wiki
**Medium:** Wikipedia API or LangChain Wiki API integration  
Wiki is used to retrieve factual and broad knowledge across a variety of domains. It is especially effective for general knowledge queries or historical data, where a detailed explanation or background information is necessary.

### 1.2. arXiv
**Medium:** arXiv API for research paper retrieval  
arXiv provides access to scientific papers and preprints, which makes it useful for academic queries, technical topics, and research-related requests. This lets the system ground its answers in recent publications.

### 1.3. Retriever
**Medium:** LangChain's Retriever Components  
The retriever fetches the documents whose embeddings are most similar to the embedding of the query. In this notebook it is backed by a FAISS vector store built from the LangSmith documentation page, and it is exposed to the agent as a third tool alongside Wiki and arxiv so that the LLM receives relevant context for LangSmith questions.
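
To make embedding-based retrieval concrete, here is a minimal sketch that ranks documents by cosine similarity to a query vector. The three-dimensional vectors and document names are made up for illustration. Cosine similarity is an assumption chosen for readability; the metric actually used depends on how the vector store is configured.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec, doc_vecs, k=2):
    """Return the names of the k documents most similar to the query."""
    ranked = sorted(
        doc_vecs.items(),
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Hypothetical toy embeddings; real embeddings have hundreds of dimensions
doc_vecs = {
    "tracing": [0.9, 0.1, 0.0],
    "pricing": [0.1, 0.9, 0.0],
    "evaluation": [0.7, 0.2, 0.1],
}
best = top_k([1.0, 0.0, 0.0], doc_vecs, k=2)
```

A real retriever does the same ranking over vectors produced by an embedding model, usually with an index that avoids comparing the query against every document.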

---

## 2. LLM and LangChain Integration

### 2.1. LLM: OpenAI Models
**Medium:** OpenAI chat models via LangChain (this notebook uses `gpt-3.5-turbo-0125`)  
The LLM generates responses based on the output returned by the tools. It can summarize, interpret, and produce detailed answers that draw on the specific knowledge retrieved from the sources (Wiki, arxiv, and the retriever).

### 2.2. LangChain Framework
**Medium:** Chains and Agents for multi-tool data flow  
LangChain orchestrates the system’s flow, handling how user queries are routed to the different tools. It manages which tool (Wiki, arxiv, or the retriever) is used based on the nature of the question and ensures that the retrieved data is passed back to the LLM for processing.

- **Agentic Behavior:**  
  Dynamic agents let the system act autonomously: the LLM selects the most suitable tool for each query based on context and can chain several tool calls when a request needs more than one source.
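
The routing idea can be sketched without any LLM. In the toy loop below, a keyword rule stands in for the model's tool-calling decision, and the three tools are stubs that only echo the query. Every name here (`choose_tool`, `run_agent`, the stub functions) is hypothetical; the real agent lets the LLM pick a tool from the tool descriptions rather than from hard-coded keywords.

```python
def wiki_stub(query):
    return f"[wikipedia] summary for: {query}"


def arxiv_stub(query):
    return f"[arxiv] abstract for: {query}"


def langsmith_stub(query):
    return f"[langsmith_search] docs for: {query}"


# Registry mapping tool names to callables
TOOLS = {
    "wikipedia": wiki_stub,
    "arxiv": arxiv_stub,
    "langsmith_search": langsmith_stub,
}


def choose_tool(query):
    """Keyword router standing in for the LLM's tool choice."""
    lowered = query.lower()
    if "langsmith" in lowered:
        return "langsmith_search"
    if "paper" in lowered or "arxiv" in lowered:
        return "arxiv"
    return "wikipedia"


def run_agent(query):
    """Pick one tool, call it, and return the observation as the output."""
    tool_name = choose_tool(query)
    observation = TOOLS[tool_name](query)
    return {"input": query, "tool": tool_name, "output": observation}


result = run_agent("What's the paper 1605.08386 about?")
```

The dictionary shape loosely mirrors what an executor returns (`input` and `output`); the extra `tool` key is added here only to make the routing visible.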

---

## 3. Future Enhancements

### 3.1. Integration of Additional Data Sources
**Medium:** APIs or Web Scraping via LangChain  
Potential additions include sources like Google Scholar, GitHub, or other specialized databases that can enhance the system’s versatility across domains.

### 3.2. Fine-tuning for Specific Tasks
**Medium:** Custom fine-tuning of the LLM  
Fine-tuned models could be introduced to optimize the system for niche tasks, such as legal research or medical information retrieval.

# Requirements

In [None]:
!pip install -q arxiv wikipedia langchain langchain_community langchain_openai faiss-cpu beautifulsoup4

In [None]:
#@title Load Dependencies
import os
from langchain import hub
from IPython.display import display, Markdown

from langchain_community.tools import WikipediaQueryRun, ArxivQueryRun
from langchain_community.utilities import WikipediaAPIWrapper, ArxivAPIWrapper
from langchain_openai import OpenAIEmbeddings, ChatOpenAI

from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS

from langchain.tools.retriever import create_retriever_tool

from langchain.agents import create_openai_tools_agent, AgentExecutor

# Load Environment Variable
from google.colab import userdata
os.environ['OPENAI_API_KEY'] = userdata.get("OPENAI_API_KEY")
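
`google.colab.userdata` is only available inside Colab. Outside Colab, one possible fallback is to read the key from the environment and prompt for it only when it is missing. The sketch below assumes that approach; `resolve_api_key` and its parameters are hypothetical names, not part of any library.

```python
import os
from getpass import getpass


def resolve_api_key(name="OPENAI_API_KEY", env=None, ask=getpass):
    """Return the API key from `env`, prompting via `ask` if it is missing."""
    env = os.environ if env is None else env
    key = env.get(name) or ask(f"Enter {name}: ")
    if not key:
        raise ValueError(f"{name} was not provided")
    return key
```

In a local notebook this could be used as `os.environ["OPENAI_API_KEY"] = resolve_api_key()`. Passing `env` and `ask` explicitly keeps the function easy to exercise without touching real credentials.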

# Tools

In [None]:
#@title Wikipedia Tool
# Return only the top Wikipedia hit and cap its text at 200 characters to keep tool output short
api_wrapper = WikipediaAPIWrapper(top_k_results = 1, doc_content_chars_max = 200)

wiki_tool = WikipediaQueryRun(api_wrapper=api_wrapper)

wiki_tool.name

In [None]:
#@title Arxiv Tool
# arXiv is the website where research papers (preprints) are uploaded

arxiv_api_wrapper = ArxivAPIWrapper()
arxiv_tool = ArxivQueryRun(api_wrapper = arxiv_api_wrapper)

arxiv_tool.name

In [None]:
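#@title Retriever Vectorstore
# Load the LangSmith docs page, split it into overlapping chunks, and index the chunk embeddings in FAISS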

loader = WebBaseLoader("https://docs.smith.langchain.com/")
docs = loader.load()
documents = RecursiveCharacterTextSplitter(chunk_size = 1000, chunk_overlap = 200).split_documents(docs)
vector_db = FAISS.from_documents(documents, OpenAIEmbeddings())

retriever = vector_db.as_retriever()

retriever
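
To show what `chunk_size = 1000` and `chunk_overlap = 200` mean, here is a simplified sliding-window chunker. It is a sketch of the overlap idea only: `RecursiveCharacterTextSplitter` also tries to break on separators such as paragraph and line boundaries, which this version ignores. `sliding_chunks` is a hypothetical helper name.

```python
def sliding_chunks(text, chunk_size=1000, chunk_overlap=200):
    """Split text into fixed-size windows that share `chunk_overlap` characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap  # how far each window advances
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


# Synthetic 2600-character text so the overlap is easy to verify
sample = "".join(str(i % 10) for i in range(2600))
chunks = sliding_chunks(sample)
```

With these settings each window starts 800 characters after the previous one, so the last 200 characters of one chunk are repeated at the start of the next. The overlap reduces the chance that a sentence relevant to a query is cut in half at a chunk boundary.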

In [None]:
#@title Retriever Tool
retrival_tool = create_retriever_tool(
    retriever,
    "langsmith_search",
    "Search for information about LangSmith. For any questions about LangSmith, you must use this tool!",
)

retrival_tool.name

In [None]:
tools =  [wiki_tool, arxiv_tool, retrival_tool]
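
The agent chooses among tools using each tool's `name` and `description`, so it can help to list them and check for name clashes before building the agent. The helper below is a hypothetical sketch; it uses stand-in objects with made-up descriptions so that it runs without LangChain installed.

```python
from types import SimpleNamespace


def summarize_tools(tools):
    """Return 'name: description' lines, raising if two tools share a name."""
    seen = set()
    lines = []
    for tool in tools:
        if tool.name in seen:
            raise ValueError(f"duplicate tool name: {tool.name}")
        seen.add(tool.name)
        lines.append(f"{tool.name}: {tool.description}")
    return lines


# Stand-in objects that only mimic the two attributes the helper reads
demo_tools = [
    SimpleNamespace(name="wikipedia", description="General knowledge lookups."),
    SimpleNamespace(name="langsmith_search", description="Search for information about LangSmith."),
]
summary = summarize_tools(demo_tools)
```

In the notebook, the same function could be called on the `tools` list defined above, since it only needs objects that expose `name` and `description`.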

# Agents

In [None]:
#@title Agent

# Initialize the LLM Model
llm = ChatOpenAI(
    model = "gpt-3.5-turbo-0125",
    temperature = 0.3,
    top_p = 0.8,
    max_tokens = 400,
)

# Get the prompt to use - you can modify this!
prompt = hub.pull("hwchase17/openai-functions-agent")

# Initialize the agent
Agent = create_openai_tools_agent(llm = llm,
                                  tools = tools,
                                  prompt = prompt)

## Agent Executor

In [None]:
# Executor
agent_executor = AgentExecutor(agent = Agent, tools = tools, verbose = True)

# Responses

In [None]:
#@title Retriever Tool Invoke
question = {"input" : "Tell me About LangSmith and LangSmith Graph Agents"}
response = agent_executor.invoke(question)

# Render using Markdown
Markdown(response['output'])

In [None]:
#@title Wikipedia Tool Invoke
question = {"input" : "Tell me About GAN"}
response = agent_executor.invoke(question)

# Render using Markdown
Markdown(response['output'])

In [None]:
#@title Arxiv Tool Invoke
question = {"input" : "What's the paper 1605.08386 about?"}
response = agent_executor.invoke(question)

# Render using Markdown
Markdown(response['output'])
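
The three invocation cells repeat the same pattern: wrap the question in a dict, call `invoke`, and read the `output` key. A small helper could factor that out. This is a sketch only; `ask` and `EchoExecutor` are hypothetical names, and the stand-in class exists just so the example runs without an API key.

```python
def ask(executor, text):
    """Send one question to an executor-like object and return its answer text."""
    response = executor.invoke({"input": text})
    return response.get("output", "")


class EchoExecutor:
    """Stand-in for `agent_executor`, used only to exercise `ask`."""

    def invoke(self, payload):
        return {"input": payload["input"], "output": f"echo: {payload['input']}"}


answer = ask(EchoExecutor(), "Tell me about GAN")
```

In the notebook, the equivalent call would be `Markdown(ask(agent_executor, "Tell me About GAN"))`.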