## Autonomous Research Assistant (Agentic AI)
An AI agent that ingests research PDFs, retrieves relevant information, and generates structured summaries autonomously using LangChain and LangGraph.

In [1]:
import os
from dotenv import load_dotenv

# -------------------------------
# Load Environment Variables
# -------------------------------
load_dotenv()

env_vars = ["OPENAI_API_KEY", "TAVILY_API_KEY", "LANGSMITH_API_KEY", "LANGCHAIN_PROJECT"]

for var in env_vars:
    value = os.getenv(var)
    if value is None:
        print(f"⚠️ Warning: {var} not found in .env")
    else:
        os.environ[var] = value
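
The loop above only prints a warning, so a missing key surfaces later as an API error. A minimal sketch of a stricter check, assuming a hypothetical helper `missing_env_vars` (not part of the notebook) and a placeholder environment dictionary:

```python
import os

def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]

# Placeholder values for illustration only; real code would read os.environ.
required = ["OPENAI_API_KEY", "TAVILY_API_KEY"]
fake_env = {"OPENAI_API_KEY": "sk-placeholder", "TAVILY_API_KEY": ""}

missing = missing_env_vars(required, fake_env)
if missing:
    print(f"Missing required settings: {', '.join(missing)}")
```

Collecting every missing name before reporting lets all gaps be fixed in one pass. Whether to raise an error instead of printing is a design choice left open here.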

In [2]:
# -------------------------------
# Load PDFs using PyMuPDFLoader
# -------------------------------

import os
from langchain_community.document_loaders import PyMuPDFLoader

# Path to your PDF folder
PDF_FOLDER = "docs/"

# List to store all loaded PDF pages
docs = []

# Loop through all PDF files in the folder
for file_name in os.listdir(PDF_FOLDER):
    if file_name.endswith(".pdf"):
        file_path = os.path.join(PDF_FOLDER, file_name)
        try:
            # Load PDF using PyMuPDFLoader
            loader = PyMuPDFLoader(file_path)
            loaded_docs = loader.load()  # each page is a separate Document
            docs.extend(loaded_docs)
            print(f"✅ Loaded '{file_name}' successfully with {len(loaded_docs)} pages")
        except Exception as e:
            # Handle any error while loading a PDF
            print(f"❌ Error loading '{file_name}': {e}")

print(f"📄 Total pages loaded from PDFs: {len(docs)}")



✅ Loaded 'Agentic AI Frameworks Architectures Protocols and Design Challenges.pdf' successfully with 8 pages
✅ Loaded 'Agentic Web  Weaving the Next Web with AI Agents.pdf' successfully with 76 pages
✅ Loaded 'AI Agents vs. Agentic AI A Conceptual taxonomy, applications and challenges.pdf' successfully with 30 pages
✅ Loaded 'AI in data science education experiences from the classroom.pdf' successfully with 6 pages
✅ Loaded 'From AI for Science to Agentic Science.pdf' successfully with 74 pages
✅ Loaded 'Small Language Models are the Future of Agentic AI.pdf' successfully with 17 pages
📄 Total pages loaded from PDFs: 211
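
The loading loop matches files with `endswith(".pdf")`, which skips names such as `REPORT.PDF`, and `os.listdir` returns entries in arbitrary order. A minimal sketch of a case-insensitive, sorted listing, assuming a hypothetical helper `list_pdfs` that is not part of the notebook:

```python
from pathlib import Path

def list_pdfs(folder):
    """Return PDF paths in `folder`, sorted, matching the extension case-insensitively."""
    return sorted(
        path
        for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() == ".pdf"
    )
```

Each returned path could then be passed to the loader as `str(path)`. Sorting makes the load order, and therefore the chunk order, reproducible between runs.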


In [3]:
# -------------------------------
# Split Text into Chunks
# -------------------------------
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Split documents into smaller chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    separators=["\n\n", "\n", " ", ""]
)

split_docs = text_splitter.split_documents(docs)
print(f"📄 Total chunks after splitting: {len(split_docs)}")

📄 Total chunks after splitting: 1113
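
To make the effect of `chunk_size` and `chunk_overlap` concrete, here is a simplified sliding-window chunker. It is only an illustration under stated assumptions: `RecursiveCharacterTextSplitter` also tries to break on the listed separators, which this sketch ignores, and `sliding_window_chunks` is a hypothetical name.

```python
def sliding_window_chunks(text, chunk_size=1000, chunk_overlap=200):
    """Cut `text` into fixed-size windows that share `chunk_overlap` characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks

sample = "".join(str(i % 10) for i in range(2600))
pieces = sliding_window_chunks(sample)
print([len(p) for p in pieces])
```

With these settings every window advances by 800 characters, so the last 200 characters of one chunk reappear at the start of the next. That overlap is what keeps a sentence that straddles a boundary retrievable from at least one chunk.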


In [4]:
# -------------------------------
# Create Embeddings + Vector Store (Chroma)
# -------------------------------
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma


embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
persist_directory = "./RAG_Pipeline_Test"

# Initialize ChromaDB With OpenAI Embeddings
vector_store = Chroma.from_documents(
    collection_name="research_docs",
    documents=split_docs,
    embedding=embeddings,
    persist_directory=persist_directory,
)

print(f"Vector store created with {vector_store._collection.count()} vectors")
print(f"Persisted to: {persist_directory}")



Vector store created with 1113 vectors
Persisted to: ./RAG_Pipeline_Test


## Hybrid Retrievers
Combine two retrievers: similarity search and MMR (maximal marginal relevance).

In [5]:
# 1. Semantic Search Retriever
semantic_retriever = vector_store.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 5}  # Retrieve top 5 similar documents
)
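
Conceptually, a `similarity` search ranks every stored vector by closeness to the query vector and keeps the best `k`. A minimal sketch with toy 2-D vectors, assuming cosine similarity for illustration (the metric the store actually uses depends on its configuration) and hypothetical helper names:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, doc_vecs, k=5):
    """Return indices of the k vectors most similar to the query."""
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k]

toy_docs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
best = top_k([1.0, 0.1], toy_docs, k=2)
print(best)
```

A real vector store uses an approximate index rather than this exhaustive scan, but the ranking idea is the same.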


In [6]:
query = "Agentic AI Can Dangerous?"

# Retrieve relevant documents
relevant_docs = semantic_retriever.invoke(query)
print(f"Retrieved {len(relevant_docs)} relevant documents for the query: '{query}'")
print(relevant_docs)

Retrieved 5 relevant documents for the query: 'Agentic AI Can Dangerous?'
[Document(metadata={'author': 'Yingxuan Yang; Mulei Ma; Yuxuan Huang; Huacan Chai; Chenyu Gong; Haoran Geng; Yuanjian Zhou; Ying Wen; Meng Fang; Muhao Chen; Shangding Gu; Ming Jin; Costas Spanos; Yang Yang; Pieter Abbeel; Dawn Song; Weinan Zhang; Jun Wang', 'page': 68, 'moddate': '', 'trapped': '', 'total_pages': 76, 'creationdate': '', 'source': 'docs/Agentic Web  Weaving the Next Web with AI Agents.pdf', 'format': 'PDF 1.5', 'modDate': '', 'producer': 'pikepdf 8.15.1', 'creator': 'arXiv GenPDF (tex2pdf:)', 'keywords': '', 'title': 'Agentic Web: Weaving the Next Web with AI Agents', 'subject': '', 'creationDate': '', 'file_path': 'docs/Agentic Web  Weaving the Next Web with AI Agents.pdf'}, page_content='Agentic Web\nOWASP GenAI Security Project. Agentic AI Threats and Mitigations. https://genai.owasp.org/\nresource/agentic-ai-threats-and-mitigations/, April 2025. Accessed: 2025-07-03.\nAbby O’Neill, Abdul Reh

In [7]:
# 2. Create MMR Retriever
mmr_retriever = vector_store.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 5, "fetch_k": 20}  # Retrieve top 5 diverse documents from top 20
)
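
MMR first fetches the `fetch_k` candidates closest to the query, then greedily picks `k` of them, trading relevance against redundancy with what has already been picked. The sketch below is a plain-Python illustration of that idea, not LangChain's implementation. The `lambda_mult` weighting and its 0.5 default are assumptions modeled on the usual MMR formulation, and the toy vectors are made up.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def mmr_select(query_vec, doc_vecs, k=5, fetch_k=20, lambda_mult=0.5):
    """Greedy maximal marginal relevance; returns indices into doc_vecs."""
    sims = [cosine(query_vec, d) for d in doc_vecs]
    # Step 1: keep only the fetch_k candidates closest to the query.
    candidates = sorted(range(len(doc_vecs)), key=lambda i: sims[i], reverse=True)[:fetch_k]
    selected = []
    # Step 2: repeatedly take the candidate with the best relevance/novelty trade-off.
    while candidates and len(selected) < k:
        def score(i):
            redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
            return lambda_mult * sims[i] - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Index 1 is a near-duplicate of index 0; index 2 is less similar but different.
toy_docs = [[0.9, 0.436], [0.89, 0.45], [0.8, -0.6]]
diverse = mmr_select([1.0, 0.0], toy_docs, k=2, fetch_k=3, lambda_mult=0.5)
greedy = mmr_select([1.0, 0.0], toy_docs, k=2, fetch_k=3, lambda_mult=1.0)
print(diverse, greedy)
```

With `lambda_mult=1.0` the redundancy term vanishes and the result matches plain similarity ranking. Lower values push the selection toward chunks that differ from those already chosen.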

In [8]:
query = "Agentic AI Can Dangerous?"

# Retrieve relevant documents
relevant_docs = mmr_retriever.invoke(query)
print(f"Retrieved {len(relevant_docs)} relevant documents for the query: '{query}'")
print(relevant_docs)

Retrieved 5 relevant documents for the query: 'Agentic AI Can Dangerous?'
[Document(metadata={'total_pages': 76, 'creationDate': '', 'moddate': '', 'keywords': '', 'file_path': 'docs/Agentic Web  Weaving the Next Web with AI Agents.pdf', 'producer': 'pikepdf 8.15.1', 'trapped': '', 'creator': 'arXiv GenPDF (tex2pdf:)', 'modDate': '', 'subject': '', 'source': 'docs/Agentic Web  Weaving the Next Web with AI Agents.pdf', 'creationdate': '', 'format': 'PDF 1.5', 'page': 68, 'author': 'Yingxuan Yang; Mulei Ma; Yuxuan Huang; Huacan Chai; Chenyu Gong; Haoran Geng; Yuanjian Zhou; Ying Wen; Meng Fang; Muhao Chen; Shangding Gu; Ming Jin; Costas Spanos; Yang Yang; Pieter Abbeel; Dawn Song; Weinan Zhang; Jun Wang', 'title': 'Agentic Web: Weaving the Next Web with AI Agents'}, page_content='Agentic Web\nOWASP GenAI Security Project. Agentic AI Threats and Mitigations. https://genai.owasp.org/\nresource/agentic-ai-threats-and-mitigations/, April 2025. Accessed: 2025-07-03.\nAbby O’Neill, Abdul Reh

In [9]:
# Combine two retrievers (Semantic + MMR) into a single hybrid retriever

from langchain_classic.retrievers.ensemble import EnsembleRetriever

hybrid_retriever = EnsembleRetriever(
    retrievers=[semantic_retriever, mmr_retriever],
    weights=[0.7, 0.3]
)

hybrid_retriever

EnsembleRetriever(retrievers=[VectorStoreRetriever(tags=['Chroma', 'OpenAIEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x000001CB47850C20>, search_kwargs={'k': 5}), VectorStoreRetriever(tags=['Chroma', 'OpenAIEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x000001CB47850C20>, search_type='mmr', search_kwargs={'k': 5, 'fetch_k': 20})], weights=[0.7, 0.3])
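
An ensemble of retrievers has to merge several ranked lists into one. A common way to do this is weighted reciprocal rank fusion, sketched below. The constant `c=60` and the exact scoring formula are assumptions for illustration and may differ from what `EnsembleRetriever` does internally. The document IDs are made up.

```python
from collections import defaultdict

def weighted_rrf(ranked_lists, weights, c=60):
    """Fuse ranked lists: each list adds weight / (rank + c) to a document's score."""
    scores = defaultdict(float)
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (rank + c)
    return sorted(scores, key=scores.get, reverse=True)

semantic_hits = ["a", "b", "c"]  # hypothetical result order from the similarity retriever
mmr_hits = ["a", "d", "b"]       # hypothetical result order from the MMR retriever
fused = weighted_rrf([semantic_hits, mmr_hits], weights=[0.7, 0.3])
print(fused)
```

Documents returned by both retrievers accumulate score from each list, so they rise to the top. The weights decide how much a high rank in one list counts relative to the other.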

In [10]:
query = "Agentic AI research"

result = hybrid_retriever.invoke(query)
for doc in result[:5]:
    print(doc.page_content[:500])  # first 500 chars

Agentic ai:
Autonomous intelligence for complex goals—a comprehensive survey.
IEEE Access, 13:18912–18936, 2025.
doi:10.1109/ACCESS.2025.3532853.
Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman,
Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al.
Gpt-4 technical
report. arXiv preprint arXiv:2303.08774, 2023.
Adobe. Our vision for accelerating creativity and productivity with agentic ai. Adobe Blog, 2025.
URL https://blog.adobe.com
both AI agents and agentic AI paradigms. Application domains enabled by AI Agents such as customer 
support, scheduling, and data summarization are then contrasted with Agentic AI deployments in research 
automation, robotic coordination, and medical decision support. We further examine unique challenges in 
each paradigm including hallucination, brittleness, emergent behavior, and coordination failure, and propose 
targeted solutions such as ReAct loops, retrieval-augmented generation (RAG

## Query Enhancement

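### Advantages
1. Improved retrieval accuracy: expanded queries and hypothetical passages share more vocabulary with the indexed chunks
2. Better handling of complex queries: decomposition splits a broad question into focused sub-queries
3. Scalable and modular: each enhancement step is an independent chain that can be swapped or removed

### Disadvantages / Limitations
1. Increased latency: every enhancement step adds an LLM call before retrieval
2. Higher cost: each additional LLM call is billed
3. Maintenance complexity: more prompts and chains to test and tune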

In [11]:
# LLM Model for Query Enhancement
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4", temperature=0.5)
llm


ChatOpenAI(client=<openai.resources.chat.completions.completions.Completions object at 0x000001CB4BE182F0>, async_client=<openai.resources.chat.completions.completions.AsyncCompletions object at 0x000001CB4BE18830>, root_client=<openai.OpenAI object at 0x000001CB48EF9450>, root_async_client=<openai.AsyncOpenAI object at 0x000001CB48EF8910>, model_name='gpt-4', temperature=0.5, model_kwargs={}, openai_api_key=SecretStr('**********'), stream_usage=True)

In [12]:
# **********
# Query expansion
# **********
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser

# Define prompt template for query expansion
query_expansion_prompt = PromptTemplate.from_template(
"""
Expand the following query with relevant synonyms, technical terms, and related context for better research retrieval.

Original query: "{query}"

Expanded query:
"""
)

query_expansion_chain = query_expansion_prompt | llm | StrOutputParser()
query_expansion_chain

PromptTemplate(input_variables=['query'], input_types={}, partial_variables={}, template='\nExpand the following query with relevant synonyms, technical terms, and related context for better research retrieval.\n\nOriginal query: "{query}"\n\nExpanded query:\n')
| ChatOpenAI(client=<openai.resources.chat.completions.completions.Completions object at 0x000001CB4BE182F0>, async_client=<openai.resources.chat.completions.completions.AsyncCompletions object at 0x000001CB4BE18830>, root_client=<openai.OpenAI object at 0x000001CB48EF9450>, root_async_client=<openai.AsyncOpenAI object at 0x000001CB48EF8910>, model_name='gpt-4', temperature=0.5, model_kwargs={}, openai_api_key=SecretStr('**********'), stream_usage=True)
| StrOutputParser()

In [13]:
# Example Test query expansion
query = {"query": "Agentic AI research"}

expamnsion_result = query_expansion_chain.invoke(query)
expamnsion_result

'"Research on agentic artificial intelligence, studies on autonomous AI, investigation on self-governing artificial intelligence systems, exploration of independent AI technology, research papers on AI agency, scholarly articles on decision-making artificial intelligence, technical reports on autonomous machine learning, research on AI with decision-making capabilities, exploration of AI autonomy, studies on AI self-regulation, investigation on AI self-management, research on AI self-control, AI agency research, autonomous systems in artificial intelligence research, self-directed AI studies"'

In [14]:
# **********
# Query Decomposition
# **********

from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser

# Define query decomposition prompt
query_decomposition_prompt = PromptTemplate.from_template(
"""
Decompose the following complex research query into simpler, specific sub-queries.
Each sub-query should focus on one key aspect or question that helps answer the main query.

Main query: "{query}"

Decomposed sub-queries:
1.
"""
)

# Create chain
query_decomposition_chain = query_decomposition_prompt | llm | StrOutputParser()
query_decomposition_chain

PromptTemplate(input_variables=['query'], input_types={}, partial_variables={}, template='\nDecompose the following complex research query into simpler, specific sub-queries.\nEach sub-query should focus on one key aspect or question that helps answer the main query.\n\nMain query: "{query}"\n\nDecomposed sub-queries:\n1.\n')
| ChatOpenAI(client=<openai.resources.chat.completions.completions.Completions object at 0x000001CB4BE182F0>, async_client=<openai.resources.chat.completions.completions.AsyncCompletions object at 0x000001CB4BE18830>, root_client=<openai.OpenAI object at 0x000001CB48EF9450>, root_async_client=<openai.AsyncOpenAI object at 0x000001CB48EF8910>, model_name='gpt-4', temperature=0.5, model_kwargs={}, openai_api_key=SecretStr('**********'), stream_usage=True)
| StrOutputParser()

In [15]:
query = {"query": expamnsion_result}

decomposition_result = query_decomposition_chain.invoke(query)
decomposition_result


'"What is agentic artificial intelligence?"\n2. "What are the key studies on autonomous AI?"\n3. "What is meant by self-governing artificial intelligence systems?"\n4. "What is independent AI technology and how does it work?"\n5. "What are the most influential research papers on AI agency?"\n6. "What are the key findings from scholarly articles on decision-making artificial intelligence?"\n7. "What do technical reports say about autonomous machine learning?"\n8. "What research has been conducted on AI with decision-making capabilities?"\n9. "What does exploration of AI autonomy entail?"\n10. "What does self-regulation mean in the context of AI?"\n11. "How does AI self-management work?"\n12. "What research has been done on AI self-control?"\n13. "What are the main topics covered in AI agency research?"\n14. "What is the role of autonomous systems in artificial intelligence research?"\n15. "What are the key findings from studies on self-directed AI?"'
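
The decomposition output is one string. Because the prompt already ends with `1.`, the first sub-query comes back without a number. A minimal sketch of turning that string into a clean list, assuming a hypothetical helper `parse_subqueries` and a shortened sample of the output above:

```python
import re

def parse_subqueries(raw):
    """Split a numbered-list string into sub-queries without numbers or wrapping quotes."""
    items = []
    for line in raw.splitlines():
        # Drop an optional leading "2. " style marker, then surrounding quotes.
        cleaned = re.sub(r"^\s*\d+\.\s*", "", line).strip().strip('"').strip()
        if cleaned:
            items.append(cleaned)
    return items

sample_output = (
    '"What is agentic artificial intelligence?"\n'
    '2. "What are the key studies on autonomous AI?"\n'
    '3. "How does AI self-management work?"'
)
subqueries = parse_subqueries(sample_output)
print(subqueries)
```

A list of this shape makes it possible to retrieve for each sub-query separately, which is one option for using the decomposition step. The notebook itself passes the whole string on to the next chain.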

In [16]:
# **********
# HyDE (Hypothetical Document Embeddings)
# **********

from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser

# HyDE prompt: generate a hypothetical research-style paragraph
hyde_prompt = PromptTemplate.from_template(
"""
Write a short, factual research-style passage that could answer the following query.
Avoid general opinions—focus on informative, academic-style content.

Query: "{query}"

Hypothetical document:
"""
)

# Create HyDE chain
hyde_chain = hyde_prompt | llm | StrOutputParser()
hyde_chain

PromptTemplate(input_variables=['query'], input_types={}, partial_variables={}, template='\nWrite a short, factual research-style passage that could answer the following query.\nAvoid general opinions—focus on informative, academic-style content.\n\nQuery: "{query}"\n\nHypothetical document:\n')
| ChatOpenAI(client=<openai.resources.chat.completions.completions.Completions object at 0x000001CB4BE182F0>, async_client=<openai.resources.chat.completions.completions.AsyncCompletions object at 0x000001CB4BE18830>, root_client=<openai.OpenAI object at 0x000001CB48EF9450>, root_async_client=<openai.AsyncOpenAI object at 0x000001CB48EF8910>, model_name='gpt-4', temperature=0.5, model_kwargs={}, openai_api_key=SecretStr('**********'), stream_usage=True)
| StrOutputParser()
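
The idea behind HyDE is that a generated passage tends to sit closer to relevant chunks in embedding space than the short question does, so the passage is embedded and used as the search key. The sketch below illustrates this with a toy bag-of-words "embedding" in place of a real embedding model. The corpus, the passage, and all helper names are made up for the example.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: word counts. A real pipeline would call an embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def search(text, corpus, k=1):
    """Rank corpus passages by similarity to `text` and return the top k."""
    query_vec = embed(text)
    return sorted(corpus, key=lambda doc: cosine(query_vec, embed(doc)), reverse=True)[:k]

corpus = [
    "agentic systems plan and act autonomously toward goals",
    "chroma stores embedding vectors on disk",
]
short_query = "agentic ai"
hypothetical_doc = "agentic ai systems act autonomously and plan toward complex goals"

short_score = cosine(embed(short_query), embed(corpus[0]))
hyde_score = cosine(embed(hypothetical_doc), embed(corpus[0]))
print(round(short_score, 3), round(hyde_score, 3))
print(search(hypothetical_doc, corpus))
```

The hypothetical passage may contain invented facts, which is acceptable for retrieval because only its embedding is used. The final answer should still be grounded in the retrieved chunks, not in the generated passage.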

In [17]:
query = {"query": decomposition_result}

hyde_result = hyde_chain.invoke(query)
hyde_result

'"Agentic Artificial Intelligence (AI) refers to the technology that is capable of making decisions and taking actions independently, based on the information it has been programmed to process. This type of AI can be self-governing, meaning it operates without continuous human intervention. It is also referred to as autonomous or independent AI.\n\nKey studies on autonomous AI include \'Artificial Intelligence — The Revolution Hasn’t Happened Yet\' by Michael Jordan, which discusses the limitations and potential of autonomous AI, and \'Artificial Intelligence as Structural Estimation: Economic Interpretations of Deep Blue, Bonanza, and AlphaGo\' by Mullainathan and Spiess, which explores the economic implications of autonomous AI systems.\n\nSelf-governing artificial intelligence systems are those that can operate and make decisions independently of human control. This is achieved through complex algorithms and machine learning techniques that allow the AI to learn and adapt to new

In [18]:
# *********
# RAG (Retrieval Augmented Generation) chain
# *********

from langchain_classic.chains.combine_documents import create_stuff_documents_chain

# RAG answering prompt
answer_prompt = PromptTemplate.from_template(
"""
You are an expert research assistant specializing in advanced AI systems.
Write a detailed, well-structured, and factual research-style answer to the user's query,
using only the context provided below. Cite evidence and include critical reasoning.

--- Context Start ---
{context}
--- Context End ---

Question: {input}

Your response must follow this structure:

**Title:** A concise headline summarizing the topic  
**Abstract:** A 2–3 sentence overview of your findings  
**Background:** Explain key concepts relevant to the query  
**Analysis:** Provide detailed reasoning and synthesis using context evidence  
**Implications / Risks:** Highlight potential challenges, dangers, or applications  
**Conclusion:** Summarize insights concisely  
**References:** Mention any context sources, paper titles, or sections used (if available)

Use a formal, academic tone. Avoid speculative opinions not supported by context.
Ensure clarity, accuracy, and depth suitable for a research report.
"""
)

document_chain = create_stuff_documents_chain(
    llm=llm,
    prompt=answer_prompt
)

document_chain

RunnableBinding(bound=RunnableBinding(bound=RunnableAssign(mapper={
  context: RunnableLambda(format_docs)
}), kwargs={}, config={'run_name': 'format_inputs'}, config_factories=[])
| PromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, template="\nYou are an expert research assistant specializing in advanced AI systems.\nWrite a detailed, well-structured, and factual research-style answer to the user's query,\nusing only the context provided below. Cite evidence and include critical reasoning.\n\n--- Context Start ---\n{context}\n--- Context End ---\n\nQuestion: {input}\n\nYour response must follow this structure:\n\n**Title:** A concise headline summarizing the topic  \n**Abstract:** A 2–3 sentence overview of your findings  \n**Background:** Explain key concepts relevant to the query  \n**Analysis:** Provide detailed reasoning and synthesis using context evidence  \n**Implications / Risks:** Highlight potential challenges, dangers, or applicati
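
A "stuff" chain places all retrieved chunks into the `{context}` slot of the prompt. The sketch below shows that step in plain Python, under the assumption that chunks are joined with a blank line and that only `page_content` is included by default. The `Doc` class stands in for LangChain's `Document`, and the second helper, which prefixes each chunk with its source, is a hypothetical variation rather than library behavior.

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    """Stand-in for a retrieved document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def stuff_context(docs, separator="\n\n"):
    """Join chunk texts into one context string."""
    return separator.join(doc.page_content for doc in docs)

def stuff_context_with_sources(docs, separator="\n\n"):
    """Variation: label each chunk with title and page so the model can cite it."""
    parts = []
    for doc in docs:
        title = doc.metadata.get("title", "unknown title")
        page = doc.metadata.get("page", "?")
        parts.append(f"[{title}, page {page}]\n{doc.page_content}")
    return separator.join(parts)

docs_example = [
    Doc("Agents plan and act.", {"title": "Paper A", "page": 3}),
    Doc("Protocols connect agents.", {"title": "Paper B", "page": 12}),
]
print(stuff_context_with_sources(docs_example))
```

The answer prompt asks for a **References** section, so exposing titles and page numbers in the context gives the model something concrete to cite.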

In [19]:
# **********
# Combine the enhancement steps: expansion -> decomposition -> HyDE
# **********

from langchain_core.runnables import RunnableLambda, RunnablePassthrough

enhance_chain = (
    RunnablePassthrough()
    | RunnableLambda(lambda x: {"query": x})
    | query_expansion_chain
    | RunnableLambda(lambda x: {"query": x})
    | query_decomposition_chain
    | RunnableLambda(lambda x: {"query": x})
    | hyde_chain
)

enhance_chain

RunnablePassthrough()
| RunnableLambda(...)
| PromptTemplate(input_variables=['query'], input_types={}, partial_variables={}, template='\nExpand the following query with relevant synonyms, technical terms, and related context for better research retrieval.\n\nOriginal query: "{query}"\n\nExpanded query:\n')
| ChatOpenAI(client=<openai.resources.chat.completions.completions.Completions object at 0x000001CB4BE182F0>, async_client=<openai.resources.chat.completions.completions.AsyncCompletions object at 0x000001CB4BE18830>, root_client=<openai.OpenAI object at 0x000001CB48EF9450>, root_async_client=<openai.AsyncOpenAI object at 0x000001CB48EF8910>, model_name='gpt-4', temperature=0.5, model_kwargs={}, openai_api_key=SecretStr('**********'), stream_usage=True)
| StrOutputParser()
| RunnableLambda(...)
| PromptTemplate(input_variables=['query'], input_types={}, partial_variables={}, template='\nDecompose the following complex research query into simpler, specific sub-queries.\nEach sub-quer
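
The enhancement chain runs three prompt-plus-LLM stages in sequence, each consuming the previous stage's text. A minimal plain-Python sketch of that data flow, using a fake LLM that records its prompts so the number of model calls per query is visible (all names and templates here are illustrative, not the notebook's):

```python
def make_stage(template, llm):
    """Build one stage: fill the template with the incoming text, then call the LLM."""
    def stage(text):
        return llm(template.format(query=text))
    return stage

def compose(*stages):
    """Run stages left to right, feeding each output into the next stage."""
    def run(text):
        for stage in stages:
            text = stage(text)
        return text
    return run

calls = []

def fake_llm(prompt):
    """Stand-in for a chat model: records the prompt and returns a numbered marker."""
    calls.append(prompt)
    return f"<out {len(calls)}>"

enhance = compose(
    make_stage("Expand: {query}", fake_llm),
    make_stage("Decompose: {query}", fake_llm),
    make_stage("HyDE: {query}", fake_llm),
)
result = enhance("Agentic AI research")
print(result, len(calls))
```

The three calls run strictly one after another, which is where the added latency and cost of query enhancement come from. Only the last stage's output reaches the retriever.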

In [20]:
# *********
# RAG Pipeline combining query enhancement with document retrieval and answer generation
# *********

from langchain_core.runnables import RunnableParallel


rag_pipeline = (
    RunnableParallel(
        enhanced_query=enhance_chain,
        original_query=RunnablePassthrough()
    )
    | RunnableLambda(lambda x: {
        "context": hybrid_retriever.invoke(x["enhanced_query"]),
        "input": x["original_query"]
    })
    | document_chain
)

rag_pipeline

{
  enhanced_query: RunnablePassthrough()
                  | RunnableLambda(...)
                  | PromptTemplate(input_variables=['query'], input_types={}, partial_variables={}, template='\nExpand the following query with relevant synonyms, technical terms, and related context for better research retrieval.\n\nOriginal query: "{query}"\n\nExpanded query:\n')
                  | ChatOpenAI(client=<openai.resources.chat.completions.completions.Completions object at 0x000001CB4BE182F0>, async_client=<openai.resources.chat.completions.completions.AsyncCompletions object at 0x000001CB4BE18830>, root_client=<openai.OpenAI object at 0x000001CB48EF9450>, root_async_client=<openai.AsyncOpenAI object at 0x000001CB48EF8910>, model_name='gpt-4', temperature=0.5, model_kwargs={}, openai_api_key=SecretStr('**********'), stream_usage=True)
                  | StrOutputParser()
                  | RunnableLambda(...)
                  | PromptTemplate(input_variables=['query'], input_types={}, par
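
The pipeline keeps two versions of the question: the enhanced text drives retrieval, while the original wording is what the answer prompt sees. A minimal sketch of that split with stub components (every function below is a placeholder, not the notebook's chain):

```python
def run_rag(question, enhance, retrieve, answer):
    """Retrieve with the enhanced query, but answer the original question."""
    enhanced = enhance(question)
    docs = retrieve(enhanced)
    return answer(docs, question)

seen = {}

def fake_enhance(question):
    return question + " (expanded)"

def fake_retrieve(query):
    seen["retrieval_query"] = query  # record what the retriever was asked
    return ["chunk A", "chunk B"]

def fake_answer(docs, question):
    return f"{question} -> answered from {len(docs)} chunks"

output = run_rag("What is the future of AI?", fake_enhance, fake_retrieve, fake_answer)
print(output)
```

Keeping the original question in the answer prompt prevents the model from answering the expanded or hypothetical text instead of what the user asked.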

In [21]:
# *********
# Example RAG query
# *********
query = "What is the future of AI?"
answer = rag_pipeline.invoke(query)

print("\n🧾 Final Answer:\n", answer)


🧾 Final Answer:
 **Title:** The Future of AI: The Emergence of Agentic AI and its Implications

**Abstract:** The future of AI is anticipated to be dominated by the emergence of Agentic AI, autonomous systems capable of proactive decision-making and execution. While offering potential for massive productivity and economic growth, this transition also carries significant disruption risks to labor markets and could exacerbate economic inequality.

**Background:** The future of AI is projected to evolve from generative AI, which responds to human prompts, to agentic AI, characterized by proactive, independent decision-making and execution (Acharya et al., 2025). Emerging protocols such as Protocol AI support agent-blockchain integration for decentralized tokenization of alternative assets and autonomous operations in decentralized finance (Protocol AI, 2025; Borjigin et al., 2025; Ante, 2024).

**Analysis:** The evolution toward agentic AI is expected to bring about a fundamental para

In [22]:
import pprint

pprint.pprint(answer)

('**Title:** The Future of AI: The Emergence of Agentic AI and its '
 'Implications\n'
 '\n'
 '**Abstract:** The future of AI is anticipated to be dominated by the '
 'emergence of Agentic AI, autonomous systems capable of proactive '
 'decision-making and execution. While offering potential for massive '
 'productivity and economic growth, this transition also carries significant '
 'disruption risks to labor markets and could exacerbate economic inequality.\n'
 '\n'
 '**Background:** The future of AI is projected to evolve from generative AI, '
 'which responds to human prompts, to agentic AI, characterized by proactive, '
 'independent decision-making and execution (Acharya et al., 2025). Emerging '
 'protocols such as Protocol AI support agent-blockchain integration for '
 'decentralized tokenization of alternative assets and autonomous operations '
 'in decentralized finance (Protocol AI, 2025; Borjigin et al., 2025; Ante, '
 '2024).\n'
 '\n'
 '**Analysis:** The evolution toward a