# Retrievers
Now that we understand the indexing pipeline, we can use our vector database to retrieve the documents most relevant to a given query.

LangChain provides a uniform interface for interacting with different types of retrieval systems. The LangChain retriever interface is straightforward:

- Input: A query (string)
- Output: A list of documents (standardized LangChain Document objects)

https://python.langchain.com/docs/concepts/retrievers/
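To make the query-in, documents-out contract concrete, here is a minimal plain-Python sketch. It does not use LangChain: the `Document` dataclass and the `KeywordRetriever` class are hypothetical stand-ins invented for illustration, and the word-overlap scoring is only a placeholder for whatever search a real retriever performs.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical stand-in for a LangChain Document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class KeywordRetriever:
    """Toy retriever (not a LangChain class): ranks documents by shared words."""

    def __init__(self, documents, k=2):
        self.documents = documents
        self.k = k

    def invoke(self, query: str) -> list[Document]:
        # Input: a query string. Output: a list of Document objects.
        words = set(query.lower().split())
        scored = []
        for doc in self.documents:
            overlap = len(words & set(doc.page_content.lower().split()))
            if overlap:
                scored.append((overlap, doc))
        # Highest overlap first; the sort is stable, so ties keep corpus order.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [doc for _, doc in scored[: self.k]]


corpus = [
    Document("Paris is the capital of France", {"source": "geo"}),
    Document("Senators serve six year terms", {"source": "civics"}),
    Document("France borders Spain and Italy", {"source": "geo"}),
]
toy_retriever = KeywordRetriever(corpus, k=2)
results = toy_retriever.invoke("capital of France")
for doc in results:
    print(doc.page_content, doc.metadata)
```

Because every retriever exposes the same `invoke(query)` call, the calling code stays the same whether the documents come from Wikipedia, a vector store, or a toy class like this one.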

In [1]:
%pip install -qU langchain langchain_community wikipedia

Note: you may need to restart the kernel to use updated packages.


ERROR: Could not find a version that satisfies the requirement wikipedia (from versions: none)
ERROR: No matching distribution found for wikipedia


In [2]:
# Wikipedia Retriever
from langchain_community.retrievers import WikipediaRetriever

retriever = WikipediaRetriever()
docs = retriever.invoke("France")

docs[0].page_content[:200]

ImportError: Could not import wikipedia python package. Please install it with `pip install wikipedia`.

In [None]:
%pip install -qU langchain-huggingface sentence-transformers beautifulsoup4

from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_core.vectorstores import InMemoryVectorStore

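# Load the web page that contains the text of the Constitution
loader = WebBaseLoader("https://www.govinfo.gov/content/pkg/CDOC-110hdoc50/html/CDOC-110hdoc50.htm")
documents = loader.load()

# Split into chunks of up to 1000 characters, with 200 characters of overlap between neighbors
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200, separators=["\n\n", "\n", " ", ""])
chunks = text_splitter.split_documents(documents)

# Embed each chunk and index the vectors in an in-memory vector store
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vectorstore = InMemoryVectorStore.from_documents(chunks, embeddings)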

Note: you may need to restart the kernel to use updated packages.
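Conceptually, a vector store answers a query by embedding it and ranking the stored chunk vectors by similarity to the query vector, keeping the top `k`. The sketch below illustrates that idea in plain Python, assuming cosine similarity as the metric. The helper names `cosine_similarity` and `top_k`, the chunk labels, and the 3-dimensional vectors are all made up for illustration; it is not the library's implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector, stored, k=2):
    """Return the k (text, score) pairs whose vectors are most similar to query_vector."""
    scored = [(text, cosine_similarity(query_vector, vec)) for text, vec in stored]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up 3-dimensional "embeddings"; all-MiniLM-L6-v2 produces 384-dimensional vectors.
stored_chunks = [
    ("chunk about Senate terms", [0.9, 0.1, 0.0]),
    ("chunk about the judiciary", [0.0, 0.2, 0.9]),
    ("chunk about House elections", [0.6, 0.7, 0.1]),
]
query_vector = [1.0, 0.2, 0.0]  # pretend this is the embedded query

ranked = top_k(query_vector, stored_chunks, k=2)
for text, score in ranked:
    print(f"{score:.3f}  {text}")
```

The `k` here plays the same role as the `k` passed through `search_kwargs` when a vector store is wrapped as a retriever: it caps how many of the highest-scoring chunks are returned.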


In [None]:
# Use the vector store as a retriever
retriever = vectorstore.as_retriever(search_kwargs={"k": 2})  # Return the top 2 results
docs = retriever.invoke("How long can senators serve?")  # Ask a question about the Constitution

def print_docs(docs):
    for doc in docs:
        print(doc.page_content[:500])
        print("-"*100+"\n")

print_docs(docs)


Qualification for office. No religious              6    .......       3
 test shall ever be required as a.........
Qualifications of electors of members of            1          2       1
 the House of Representatives shall be the
 same as electors for the most numerous
 branch of the State legislature..........
Qualifications of electors of Senators             17          1  ......
 shall be the same as electors of the most
 numerous branch of the State legislature.
 [Amendments].............
----------------------------------------------------------------------------------------------------

Article [XVII.]

    The Senate of the United States shall be composed of two 
Senators from each State, elected by the people thereof, for 
six years; and each Senator shall have one vote. The electors 
in each State shall have the qualifications requisite for 
electors of the most numerous branch of the State legislatures.
    When vacancies happen in the representation of any State in 
the S