In [None]:
"""
This notebook retrieves relevant Wikipedia content for a query using open-source tools.
It uses the WikipediaRetriever from LangChain's community package, which queries the Wikipedia API
through the `wikipedia` Python package.
No API key is required, but the code does need network access, since each query is sent to Wikipedia's public API.
It first installs the required packages: langchain, langchain-community, and wikipedia.
It then initializes a WikipediaRetriever that fetches the top 2 matching English-language articles.
Next, it defines a query about the geopolitical history of India and Pakistan from a Chinese perspective.
The retriever runs the query and returns a list of LangChain Document objects.
Finally, it prints each document's content, truncated to 1000 characters for readability.
This retrieval step is a useful building block for question-answering or summarization apps built on public data.
For simple retrieval tasks, it is a lightweight alternative to calling commercial LLM APIs.
"""

In [1]:
!pip install langchain langchain-community wikipedia

In [4]:
from langchain_community.retrievers import WikipediaRetriever

# Initialize the Wikipedia retriever to return the top 2 English-language articles per query
retriever = WikipediaRetriever(top_k_results=2, lang="en")
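To make the retriever less of a black box, here is a minimal sketch of the search-then-fetch flow it appears to follow: search Wikipedia for up to `top_k_results` titles, fetch each page, skip pages that fail to load, and truncate the text. The names `retrieve`, `Doc`, `search`, and `fetch` are hypothetical, and the default caps (300 characters for the query, 4000 characters per document) are an assumption based on our reading of LangChain's `WikipediaAPIWrapper`, not a guaranteed match. Search and fetch are passed in as callables so the sketch runs offline.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for LangChain's Document: page text plus metadata."""
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


def retrieve(
    query: str,
    search: Callable[[str, int], List[str]],
    fetch: Callable[[str], Optional[str]],
    top_k_results: int = 2,
    max_query_chars: int = 300,         # assumed query-length cap
    doc_content_chars_max: int = 4000,  # assumed per-document text cap
) -> List[Doc]:
    """Search for page titles, then fetch and truncate each page's text."""
    titles = search(query[:max_query_chars], top_k_results)
    docs: List[Doc] = []
    for title in titles[:top_k_results]:
        text = fetch(title)
        if text is None:  # e.g. a missing or disambiguation page
            continue
        docs.append(Doc(page_content=text[:doc_content_chars_max], metadata={"title": title}))
    return docs
```

To try it against the live API, one option is to call `wikipedia.set_lang("en")`, pass `lambda q, k: wikipedia.search(q, results=k)` as `search`, and pass a function returning `wikipedia.page(title, auto_suggest=False).content` as `fetch`, returning `None` when it raises `wikipedia.exceptions.PageError` or `wikipedia.exceptions.DisambiguationError`.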

In [5]:
# Define your query
query = "the geopolitical history of india and pakistan from the perspective of a chinese"

In [6]:
# Retrieve documents from Wikipedia
docs = retriever.invoke(query)
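`retriever.invoke` returns a list of LangChain `Document` objects, each with a `page_content` string and a `metadata` dict. For the question-answering or summarization use case mentioned at the top, a common next step is to merge the documents into one context string. The sketch below does that under a hard character cap; `build_context` and `max_chars` are hypothetical names, and the `title` metadata key is read with a fallback in case it is absent. `SimpleNamespace` objects stand in for real documents so the example runs offline.

```python
from types import SimpleNamespace
from typing import Any, Iterable


def build_context(docs: Iterable[Any], max_chars: int = 3000) -> str:
    """Join documents into one numbered, prompt-ready string, capped at max_chars."""
    blocks = []
    for i, doc in enumerate(docs, start=1):
        # Fall back to a generic label if the document has no "title" metadata.
        title = doc.metadata.get("title", f"Document {i}")
        blocks.append(f"[{i}] {title}\n{doc.page_content.strip()}")
    # Hard cap so the context does not overrun a small model's input window.
    return "\n\n".join(blocks)[:max_chars]


# Offline stand-ins that mimic the Document shape (page_content + metadata).
sample_docs = [
    SimpleNamespace(page_content=" alpha ", metadata={"title": "A"}),
    SimpleNamespace(page_content="beta", metadata={}),
]
print(build_context(sample_docs))
```

With real results, the call would simply be `context = build_context(docs)`.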

In [7]:
# Display the results, showing only the first 1000 characters of each document
for i, doc in enumerate(docs, start=1):
    print(f"\n--- Result {i} ---")
    print(f"Content:\n{doc.page_content[:1000]}...")
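The display loop appends `...` to every result, even one shorter than 1000 characters, and can cut a word in half. A small helper can add the ellipsis only when the text is actually truncated and back up to the last space. This is a sketch; `preview` and `limit` are hypothetical names.

```python
def preview(text: str, limit: int = 1000) -> str:
    """Return at most `limit` characters, adding '...' only if the text was cut."""
    if len(text) <= limit:
        return text
    cut = text[:limit]
    # Prefer to stop at the last space so a word is not split in half.
    space = cut.rfind(" ")
    if space > 0:
        cut = cut[:space]
    return cut.rstrip() + "..."
```

Inside the loop, the print line would become `print(f"Content:\n{preview(doc.page_content)}")`.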


--- Result 1 ---
Content:
This article is about territorial disputes of the People's Republic of China (PRC). A territorial dispute is a disagreement over the possession or control of land between two or more political entities. Many of China's territorial disputes result from the historical consequences of colonialism in Asia and the lack of clear historical boundary demarcations. Many of these disputes are almost identical to those that the Republic of China (ROC) based in Taipei, also known as Taiwan, has with other countries. Therefore, many of the subsequent resolved disputes made by the PRC after 1949 with other governments may not be recognized by the ROC.


== Current disputes ==
Many of China's territorial disputes result from the historical consequences of colonialism in Asia and the lack of clear historical boundary demarcations.: 251 
China's claims to disputed maritime territories date from prior to the founding of the People's Republic of China.: 197 


=== Bhutan ===

B
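The printed content shows two features of the plain text returned for each page: wiki-style section headings such as `== Current disputes ==` and `=== Bhutan ===`, and leftover page-number citations such as `.: 251` and `.: 197`. Before passing the text to a summarizer, one option is to strip the citation markers and split the page into titled sections. The sketch below does both with regular expressions; the patterns are heuristics based only on the output shown here, and `clean_citations` and `split_sections` are hypothetical names.

```python
import re
from typing import List, Tuple

# Heuristic: colon plus page number(s) right after sentence-ending punctuation, e.g. ".: 251".
CITATION = re.compile(r"(?<=[.!?]):\s?\d+(?:[–-]\d+)?")
# Wiki-style headings on their own line, e.g. "== Current disputes ==" or "=== Bhutan ===".
HEADING = re.compile(r"^(=+)[ \t]*(.+?)[ \t]*\1[ \t]*$", re.MULTILINE)


def clean_citations(text: str) -> str:
    """Remove page-number citation residue left after sentences."""
    return CITATION.sub("", text)


def split_sections(text: str) -> List[Tuple[str, str]]:
    """Split page text into (title, body) pairs; text before the first heading is 'Introduction'."""
    matches = list(HEADING.finditer(text))
    first = matches[0].start() if matches else len(text)
    sections = [("Introduction", text[:first].strip())]
    for idx, m in enumerate(matches):
        end = matches[idx + 1].start() if idx + 1 < len(matches) else len(text)
        sections.append((m.group(2), text[m.end():end].strip()))
    return sections
```

Returning a list of pairs rather than a dict keeps the original section order and avoids silently overwriting sections that share a title.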