# Multi-Source Information Retrieval Project
This notebook demonstrates how to build a multi-source information retrieval system that answers user queries by drawing on several document sources: Wikipedia, Arxiv, and a web page indexed for vector-based similarity search.

### Step 1: Setting Up the Environment
We'll start by loading the necessary libraries and modules for document retrieval and processing. Additionally, we'll load environment variables for API keys.

In [1]:
from dotenv import load_dotenv
import os
from langchain_openai import ChatOpenAI
from langchain_community.utilities import WikipediaAPIWrapper, ArxivAPIWrapper
from langchain_community.tools import WikipediaQueryRun, ArxivQueryRun
from langchain.tools.retriever import create_retriever_tool
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain import hub

# Load environment variables
load_dotenv()
openai_api_key = os.getenv("OPENAI_API_KEY")
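
Because `os.getenv` returns `None` when a variable is missing, a forgotten key only surfaces later as an authentication error. As a minimal sketch, we could fail early instead. The helper name `require_env` and the `DEMO_API_KEY` variable are hypothetical and used only for illustration.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or shell environment"
        )
    return value


# Demo with a placeholder variable so the sketch runs without a real key.
os.environ["DEMO_API_KEY"] = "placeholder-value"
demo_key = require_env("DEMO_API_KEY")
```

In the notebook, the same helper could be called with `"OPENAI_API_KEY"` right after `load_dotenv()`.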

### Step 2: Setting Up Retrieval Tools for Wikipedia and Arxiv
We will now set up tools to retrieve information from Wikipedia and Arxiv. Each tool returns a short excerpt for a user query: `top_k_results` limits how many results are fetched, and `doc_content_chars_max` caps the length of the returned text.

In [2]:
# Wikipedia tool
wiki_api_wrapper = WikipediaAPIWrapper(top_k_results=1, doc_content_chars_max=200)
wiki_tool = WikipediaQueryRun(api_wrapper=wiki_api_wrapper)

# Arxiv tool
arxiv_api_wrapper = ArxivAPIWrapper(top_k_results=1, doc_content_chars_max=100)
arxiv_tool = ArxivQueryRun(api_wrapper=arxiv_api_wrapper)
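
To get a feel for what the two wrapper parameters do, here is a rough sketch in plain Python. It is not the library's code: the function name `select_and_truncate` and the sample strings are made up, and we only assume that the wrappers keep the top results, join them, and cut the text at the character limit.

```python
def select_and_truncate(summaries, top_k_results, doc_content_chars_max):
    """Keep the first top_k_results summaries, join them, and cap the total length."""
    kept = summaries[:top_k_results]
    joined = "\n\n".join(kept)
    return joined[:doc_content_chars_max]


# Two fake summaries of 300 characters each (placeholder data).
fake_summaries = ["A" * 300, "B" * 300]
excerpt = select_and_truncate(fake_summaries, top_k_results=1, doc_content_chars_max=200)
print(len(excerpt))
```

With the values used above (200 characters for Wikipedia, 100 for Arxiv), the agent only ever sees a very short excerpt, so raising these limits is worth considering if answers look truncated.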

### Step 3: Loading Documents from a Website and Setting Up a Searchable Index
We will load documents from a website and split them into smaller sections for processing. Using vector-based similarity searches, we will then create an index that allows us to retrieve relevant documents based on a user's query.
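
Before running the real splitter, it may help to see what chunk size and overlap mean. The sketch below is a simplified fixed-window splitter, not how `RecursiveCharacterTextSplitter` works internally (that class prefers to split on separators such as paragraphs and lines). The function name `chunk_text` is hypothetical.

```python
def chunk_text(text, chunk_size, chunk_overlap):
    """Split text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_text(sample, chunk_size=10, chunk_overlap=3)
print(chunks)
```

Each chunk repeats the last few characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk.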

In [3]:
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

# Load documents from the web (the LangSmith docs are used here, but any other web page works)
loader = WebBaseLoader("https://docs.smith.langchain.com/")
docs = loader.load()

# Split documents into chunks for better processing
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
documents = splitter.split_documents(docs)

# Create a vector-based searchable index using FAISS
vectordb = FAISS.from_documents(documents, OpenAIEmbeddings())
retriever = vectordb.as_retriever()
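
Conceptually, the retriever embeds the query and returns the chunks whose vectors are closest to it. The sketch below shows that idea with hand-written three-dimensional vectors and cosine similarity. Everything in it is illustrative: the vectors are made up rather than produced by `OpenAIEmbeddings`, and the real FAISS index may rank by a different metric, such as L2 distance.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed, k):
    """Return the k texts whose vectors are most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in indexed]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy index: (chunk text, made-up embedding vector).
toy_index = [
    ("tracing runs", [1.0, 0.0, 0.0]),
    ("evaluating datasets", [0.0, 1.0, 0.0]),
    ("prompt management", [0.0, 0.0, 1.0]),
]
results = top_k([0.9, 0.1, 0.0], toy_index, k=2)
print(results)
```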

### Step 4: Setting Up Tools for LangSmith Search
We'll now wrap the retriever in a custom search tool so that the agent can look up information about LangSmith in the documentation we just indexed.

In [4]:
# Create a custom search tool for LangSmith
retriever_tool = create_retriever_tool(retriever, "langsmith_search", "Search for information about LangSmith")
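
From the agent's point of view, a tool is roughly a name, a description, and a function that turns a query string into text. The sketch below mimics that shape in plain Python. `SimpleTool`, `make_retriever_tool`, and `fake_retrieve` are hypothetical stand-ins, not LangChain classes, and joining the retrieved chunks with blank lines is an assumption about the output format.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SimpleTool:
    name: str          # identifier the model uses to call the tool
    description: str   # text the model reads to decide when to use it
    func: Callable[[str], str]


def make_retriever_tool(
    retrieve: Callable[[str], List[str]], name: str, description: str
) -> SimpleTool:
    """Wrap a retrieval function so it returns one string of joined chunks."""

    def run(query: str) -> str:
        return "\n\n".join(retrieve(query))

    return SimpleTool(name=name, description=description, func=run)


def fake_retrieve(query: str) -> List[str]:
    # Placeholder for a real vector-store lookup.
    return [f"chunk one about {query}", f"chunk two about {query}"]


tool = make_retriever_tool(
    fake_retrieve, "langsmith_search", "Search for information about LangSmith"
)
tool_output = tool.func("tracing")
print(tool_output)
```

The description matters more than it looks: it is the main signal the model has for choosing this tool over the Wikipedia or Arxiv tools.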

### Step 5: Setting Up the Main System and Agent
We will combine all the tools into an agent that can retrieve and process information. The agent is capable of querying multiple sources and delivering relevant results.

In [5]:
# Combine all tools into an agent
tools = [wiki_tool, retriever_tool, arxiv_tool]

# Load a pre-built prompt from the hub
prompt = hub.pull("hwchase17/openai-functions-agent")

# Create the chat model and the agent
llm = ChatOpenAI(api_key=openai_api_key)
agent = create_openai_tools_agent(llm, tools, prompt)

### Step 6: Querying the System
We can now use the agent to query the system. It will retrieve relevant information based on the user's input, drawing from the various sources we configured.

In [6]:
from langchain.agents import AgentExecutor

# Execute queries with the agent
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
response = agent_executor.invoke({"input": "Tell me about LangSmith"})
response
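
The value returned by `invoke` is a dictionary that carries the original input alongside the agent's final answer under the `"output"` key. As a small sketch, we could pull out just the answer text. The helper name `extract_answer` and the sample dictionary are made up for illustration, and the placeholder string is not real model output.

```python
def extract_answer(response: dict) -> str:
    """Return the agent's final answer from an invoke-style response dictionary."""
    answer = response.get("output")
    if answer is None:
        raise KeyError("response has no 'output' key")
    return answer


# Stand-in for the dictionary returned by agent_executor.invoke(...).
sample_response = {
    "input": "Tell me about LangSmith",
    "output": "(model answer would appear here)",
}
answer = extract_answer(sample_response)
print(answer)
```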

### Step 7: Additional Queries
We can now ask additional questions to the system, allowing it to retrieve and display information from various sources based on the queries.

In [7]:
# Query the system with other topics
response_2 = agent_executor.invoke({"input": "Tell me about the latest research on data science."})
response_2

### Conclusion
This project demonstrates how to build an information retrieval system that draws on multiple sources to answer user queries. Combining document chunking, vector-based similarity search, and an agent that chooses among tools helps ground the responses in relevant context.