# 🔬 Multi-Agent RAG Research Assistant

An interactive and modular Retrieval-Augmented Generation (RAG) pipeline that combines PDF/text ingestion, FAISS vector retrieval, and LLM-based summarization using Together.ai’s LLaMA-4. Built with LangGraph agents and deployed via Gradio for real-time, user-customizable research summarization.


## ⚙️ Tech Stack

- **LangGraph** – Multi-agent state machine orchestration
- **FAISS** – Fast Approximate Nearest Neighbor vector store
- **Hugging Face Sentence Transformers** – Text embeddings (MiniLM)
- **Together.ai** – Hosted LLaMA-4 summarization
- **SerpAPI** – Google Search snippets
- **Gradio** – User-friendly interface


In [7]:
# Basic NLP and retrieval tools
!pip install -q sentence-transformers langchain faiss-cpu serpapi

In [5]:
# LLaMA model loader (this is the slow one that may need C++ compilation)
!pip install -q llama-cpp-python


  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
  Building wheel for llama-cpp-python (pyproject.toml) ... done


In [6]:
!pip install -U langchain-community


Collecting langchain-community
  Downloading langchain_community-0.3.21-py3-none-any.whl.metadata (2.4 kB)
Collecting langchain-core<1.0.0,>=0.3.51 (from langchain-community)
  Downloading langchain_core-0.3.51-py3-none-any.whl.metadata (5.9 kB)
Collecting langchain<1.0.0,>=0.3.23 (from langchain-community)
  Downloading langchain-0.3.23-py3-none-any.whl.metadata (7.8 kB)
Collecting dataclasses-json<0.7,>=0.5.7 (from langchain-community)
  Downloading dataclasses_json-0.6.7-py3-none-any.whl.metadata (25 kB)
Collecting pydantic-settings<3.0.0,>=2.4.0 (from langchain-community)
  Downloading pydantic_settings-2.8.1-py3-none-any.whl.metadata (3.5 kB)
Collecting httpx-sse<1.0.0,>=0.4.0 (from langchain-community)
  Downloading httpx_sse-0.4.0-py3-none-any.whl.metadata (9.0 kB)
Collecting marshmallow<4.0.0,>=3.18.0 (from dataclasses-json<0.7,>=0.5.7->langchain-community)
  Downloading marshmallow-3.26.1-py3-none-any.whl.metadata (7.3 kB)
Collecting typing-inspect<1,>=0.4.0 (from dataclasses-

In [8]:
!pip install llama-cpp-python[server]


Collecting uvicorn>=0.22.0 (from llama-cpp-python[server])
  Downloading uvicorn-0.34.0-py3-none-any.whl.metadata (6.5 kB)
Collecting fastapi>=0.100.0 (from llama-cpp-python[server])
  Downloading fastapi-0.115.12-py3-none-any.whl.metadata (27 kB)
Collecting sse-starlette>=1.6.1 (from llama-cpp-python[server])
  Downloading sse_starlette-2.2.1-py3-none-any.whl.metadata (7.8 kB)
Collecting starlette-context<0.4,>=0.3.6 (from llama-cpp-python[server])
  Downloading starlette_context-0.3.6-py3-none-any.whl.metadata (4.3 kB)
Collecting starlette<0.47.0,>=0.40.0 (from fastapi>=0.100.0->llama-cpp-python[server])
  Downloading starlette-0.46.1-py3-none-any.whl.metadata (6.2 kB)
Downloading fastapi-0.115.12-py3-none-any.whl (95 kB)
Downloading sse_starlette-2.2.1-py3-none-any.whl (10 kB)
Downloading starlette_context-0.3.6-py3-none-any.whl (12 kB)
Downloading uvicorn-0.34

In [9]:
!pip install together


Collecting together
  Downloading together-1.5.5-py3-none-any.whl.metadata (14 kB)
Collecting eval-type-backport<0.3.0,>=0.1.3 (from together)
  Downloading eval_type_backport-0.2.2-py3-none-any.whl.metadata (2.2 kB)
Downloading together-1.5.5-py3-none-any.whl (87 kB)
Downloading eval_type_backport-0.2.2-py3-none-any.whl (5.8 kB)
Installing collected packages: eval-type-backport, together
Successfully installed eval-type-backport-0.2.2 together-1.5.5


In [10]:
!pip install langgraph


Collecting langgraph
  Downloading langgraph-0.3.25-py3-none-any.whl.metadata (7.7 kB)
Collecting langgraph-checkpoint<3.0.0,>=2.0.10 (from langgraph)
  Downloading langgraph_checkpoint-2.0.24-py3-none-any.whl.metadata (4.6 kB)
Collecting langgraph-prebuilt<0.2,>=0.1.1 (from langgraph)
  Downloading langgraph_prebuilt-0.1.8-py3-none-any.whl.metadata (5.0 kB)
Collecting langgraph-sdk<0.2.0,>=0.1.42 (from langgraph)
  Downloading langgraph_sdk-0.1.61-py3-none-any.whl.metadata (1.8 kB)
Collecting xxhash<4.0.0,>=3.5.0 (from langgraph)
  Downloading xxhash-3.5.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)
Collecting ormsgpack<2.0.0,>=1.8.0 (from langgraph-checkpoint<3.0.0,>=2.0.10->langgraph)
  Downloading ormsgpack-1.9.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (43 kB)
Downloading langgraph-0.3.25-py3-none-any.whl

In [11]:
!pip install grandalf


Collecting grandalf
  Downloading grandalf-0.8-py3-none-any.whl.metadata (1.7 kB)
Downloading grandalf-0.8-py3-none-any.whl (41 kB)
Installing collected packages: grandalf
Successfully installed grandalf-0.8


In [12]:
!pip install gradio
!python app.py


Collecting gradio
  Downloading gradio-5.23.3-py3-none-any.whl.metadata (16 kB)
Collecting aiofiles<24.0,>=22.0 (from gradio)
  Downloading aiofiles-23.2.1-py3-none-any.whl.metadata (9.7 kB)
Collecting ffmpy (from gradio)
  Downloading ffmpy-0.5.0-py3-none-any.whl.metadata (3.0 kB)
Collecting gradio-client==1.8.0 (from gradio)
  Downloading gradio_client-1.8.0-py3-none-any.whl.metadata (7.1 kB)
Collecting groovy~=0.1 (from gradio)
  Downloading groovy-0.1.2-py3-none-any.whl.metadata (6.1 kB)
Collecting pydub (from gradio)
  Downloading pydub-0.25.1-py2.py3-none-any.whl.metadata (1.4 kB)
Collecting python-multipart>=0.0.18 (from gradio)
  Downloading python_multipart-0.0.20-py3-none-any.whl.metadata (1.8 kB)
Collecting ruff>=0.9.3 (from gradio)
  Downloading ruff-0.11.4-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (25 kB)
Collecting safehttpx<0.2.0,>=0.1.6 (from gradio)
  Downloading safehttpx-0.1.6-py3-none-any.whl.metadata (4.2 kB)
Collecting semantic-version~=2.0 

In [13]:
import faiss
import numpy as np
import requests
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores.faiss import FAISS
from langchain.docstore.document import Document

## 🧪 Pipeline Flow

1. 🔍 **SearchAgent** – Uses SerpAPI to fetch top Google results
2. 📄 **PDFLoaderAgent** – Loads and splits uploaded PDFs into text
3. 🧠 **EmbedAgent** – Embeds all documents using MiniLM
4. 🔎 **RetrieveAgent** – Uses FAISS to find top-k relevant chunks
5. 📝 **SummarizeAgent** – Calls Together.ai LLaMA-4 with user prompt

The entire workflow is managed via LangGraph's stateful graph execution.
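
As a rough illustration of the flow above, the sketch below passes one shared state dictionary through five stub agents in order. It is plain Python rather than LangGraph: the `ResearchState` fields, the stub bodies, and `run_pipeline` are all hypothetical stand-ins. In the notebook, each step would instead call SerpAPI, a PDF loader, MiniLM, FAISS, and Together.ai respectively.

```python
from typing import Callable, List, TypedDict


class ResearchState(TypedDict, total=False):
    # Hypothetical state shape; the real graph state may differ.
    query: str
    documents: List[str]
    vectors: List[List[float]]
    retrieved: List[str]
    summary: str


def search_agent(state: ResearchState) -> ResearchState:
    # Stub: a real agent would fetch snippets from SerpAPI.
    return {"documents": [f"snippet about {state['query']}"]}


def pdf_loader_agent(state: ResearchState) -> ResearchState:
    # Stub: a real agent would append text chunks from an uploaded PDF.
    return {"documents": state["documents"] + ["chunk from uploaded PDF"]}


def embed_agent(state: ResearchState) -> ResearchState:
    # Stub: one fake 1-d "embedding" per document (its character count).
    return {"vectors": [[float(len(doc))] for doc in state["documents"]]}


def retrieve_agent(state: ResearchState) -> ResearchState:
    # Stub: keep the first two documents instead of running a FAISS search.
    return {"retrieved": state["documents"][:2]}


def summarize_agent(state: ResearchState) -> ResearchState:
    # Stub: join the retrieved text instead of calling an LLM.
    return {"summary": " | ".join(state["retrieved"])}


AGENTS: List[Callable[[ResearchState], ResearchState]] = [
    search_agent,
    pdf_loader_agent,
    embed_agent,
    retrieve_agent,
    summarize_agent,
]


def run_pipeline(query: str) -> ResearchState:
    state: ResearchState = {"query": query}
    for agent in AGENTS:
        # Each agent returns only the fields it changed; merge them into the state.
        state.update(agent(state))
    return state


print(run_pipeline("NAS")["summary"])
```

Having each agent return a partial update, rather than mutate the state in place, keeps the steps independent and easy to test one at a time.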


## 📥 How to Use

1. Enter your **research query** (e.g., "AutoML in 2024 (500-word summary)")
2. Optionally upload a **PDF file**
3. Enter your **SerpAPI** and **Together.ai API keys**
4. Click **Submit** and wait for the generated summary


In [14]:
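def search_serpapi(query, api_key):
    """Return the snippet text of the organic Google results for a query."""
    params = {
        "engine": "google",
        "q": query,
        "api_key": api_key,
        "num": 20,  # Ask for more results than the default page size
    }
    # A timeout keeps the notebook from hanging on a stalled request
    response = requests.get("https://serpapi.com/search", params=params, timeout=30)
    response.raise_for_status()  # Surface HTTP errors instead of parsing an error body
    data = response.json()
    # Results without a snippet yield "", which is filtered out downstream
    return [item.get("snippet", "") for item in data.get("organic_results", [])]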

# -----------------------------
# 2. Query SerpAPI for NAS content
# -----------------------------
import os

# Read the key from the environment instead of hardcoding it in the notebook
SERPAPI_KEY = os.environ["SERPAPI_KEY"]
query = "NAS (Neural Architecture Search) Techniques in 2024"
snippets = search_serpapi(query, SERPAPI_KEY)

print("🔎 Retrieved snippets:")
for i, s in enumerate(snippets, 1):
    print(f"{i}. {s}")
print(f"\n✅ Total snippets retrieved: {len(snippets)}")


🔎 Retrieved snippets:
1. One famous implementation is Google's AutoML, where RL agents design neural networks for tasks like image classification.
2. This paper delves into the multifaceted aspects of NAS, elaborating on its recent advances, applications, tools, benchmarks and prospective research directions.
3. This work presents a novel approach to neural architecture search (NAS) that aims to increase carbon efficiency for the model design process ...
4. Neural architecture search (NAS) is an automated machine learning method that aims to find optimal model structures by searching the neural network architecture ...
5. Neural Architecture Search (NAS) is a technique in machine learning that automates the process of designing neural network architectures.
6. NAS with Reinforcement Learning. Reinforcement learning-based NAS methods frame the architecture search as a sequential decision-making process.
7. We propose a novel and effective method called Evolutionary Gradient-Based Neural
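
The response handling in `search_serpapi` can be separated from the HTTP call so it is easy to test offline. The sketch below works on an already-parsed response dictionary. `extract_snippets` is a hypothetical helper, and the top-level `error` field is an assumption about how the search API reports failures such as an invalid key.

```python
from typing import Any, Dict, List


def extract_snippets(data: Dict[str, Any]) -> List[str]:
    """Pull non-empty, de-duplicated snippets out of a parsed search response."""
    # Assumption: failures are reported through a top-level "error" field.
    if "error" in data:
        raise RuntimeError(f"Search failed: {data['error']}")
    seen = set()
    snippets: List[str] = []
    for item in data.get("organic_results", []):
        snippet = (item.get("snippet") or "").strip()
        # Skip results with no snippet and exact repeats of earlier snippets
        if snippet and snippet not in seen:
            seen.add(snippet)
            snippets.append(snippet)
    return snippets


sample = {
    "organic_results": [
        {"snippet": "NAS automates architecture design."},
        {"title": "Result without a snippet"},
        {"snippet": "NAS automates architecture design."},
    ]
}
print(extract_snippets(sample))
```

Filtering here means later cells no longer have to strip empty strings before building `Document` objects.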

In [15]:
docs = [Document(page_content=s) for s in snippets if s.strip() != ""]

# Ensure we have enough to train IVF index (e.g., at least nlist)
if len(docs) < 5:
    raise ValueError("Not enough snippets to train IndexIVFFlat. Please retrieve more results.")


## 🔐 API Keys Required

- `SERPAPI_KEY` from [serpapi.com](https://serpapi.com/)
- `TOGETHER_API_KEY` from [platform.together.xyz](https://platform.together.xyz)

Keys can be entered in the Gradio UI or, when running locally, loaded from a `.env` file so they never appear in the notebook itself.
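
For local runs, one way to keep keys out of the notebook is to read them from the environment. The sketch below is a minimal stand-in for a library such as `python-dotenv`: `load_env_file` and `require_key` are hypothetical helpers, and the parser only handles simple `NAME=value` lines.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> None:
    """Copy simple NAME=value lines from a file into os.environ."""
    env_path = Path(path)
    if not env_path.exists():
        return  # Nothing to load; rely on variables that are already set
    for line in env_path.read_text().splitlines():
        line = line.strip()
        # Ignore blank lines, comments, and anything that is not an assignment
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        # Variables already present in the environment take precedence
        os.environ.setdefault(name.strip(), value.strip().strip('"').strip("'"))


def require_key(name: str) -> str:
    """Return the named key, or raise a clear error if it is missing."""
    value = os.environ.get(name, "")
    if not value:
        raise RuntimeError(f"{name} is not set; add it to .env or enter it in the UI")
    return value
```

A cell would then call `load_env_file()` once and fetch each key with `require_key("SERPAPI_KEY")` or `require_key("TOGETHER_API_KEY")`. The `.env` file itself should stay out of version control.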


In [16]:
from langchain.docstore import InMemoryDocstore
from langchain_community.vectorstores.faiss import FAISS
from langchain.docstore.document import Document
from langchain_community.embeddings import HuggingFaceEmbeddings
import faiss
import numpy as np

# 1. Documents & embeddings
docs = [Document(page_content=s) for s in snippets if s.strip() != ""]
embedding_model = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
vectors = embedding_model.embed_documents([doc.page_content for doc in docs])
vectors_np = np.array(vectors).astype("float32")

# 2. Build IVF index
d = vectors_np.shape[1]
nlist = 5
quantizer = faiss.IndexFlatL2(d)
index = faiss.IndexIVFFlat(quantizer, d, nlist)
index.train(vectors_np)
index.add(vectors_np)

# 3. LangChain expects this mapping:
index_to_docstore_id = {i: str(i) for i in range(len(docs))}
docstore = InMemoryDocstore({str(i): doc for i, doc in enumerate(docs)})

# 4. Now construct FAISS object manually
vectorstore = FAISS(
    embedding_function=embedding_model,  # pass the Embeddings object; a bare callable triggers a warning
    index=index,
    docstore=docstore,
    index_to_docstore_id=index_to_docstore_id
)

# 5. Run similarity search
query_text = "recent advancements in neural architecture search"
results = vectorstore.similarity_search(query_text, k=3)

print("\n📄 Top relevant documents:")
for i, doc in enumerate(results, 1):
    print(f"\n--- Retrieved {i} ---\n{doc.page_content}")


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



📄 Top relevant documents:

--- Retrieved 1 ---
We propose a novel and effective method called Evolutionary Gradient-Based Neural Architecture Search (EG-NAS).

--- Retrieved 2 ---
Neural Architecture Search (NAS) methods seek effective optimization toward performance metrics regarding model accuracy and generalization while facing ...

--- Retrieved 3 ---
Neural Architecture Search (NAS) aims to automate deep neural network design across various applications, while a good search space design is core to NAS ...
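
`IndexIVFFlat` partitions the vectors into `nlist` cells and, at query time, scans only the `nprobe` cells whose centroids are closest to the query (FAISS defaults to `nprobe = 1`). With a corpus this small, that can return fewer or worse matches than an exhaustive search. The pure-Python sketch below illustrates the idea on toy 2-D points. The centroids are fixed by hand rather than learned with k-means, and `build_cells` and `ivf_search` are hypothetical names, not FAISS APIs.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def l2_sq(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_cells(vectors: List[Vector], centroids: List[Vector]) -> Dict[int, List[int]]:
    """Assign each vector id to the cell of its nearest centroid."""
    cells: Dict[int, List[int]] = {i: [] for i in range(len(centroids))}
    for vec_id, vec in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: l2_sq(vec, centroids[c]))
        cells[nearest].append(vec_id)
    return cells


def ivf_search(
    query: Vector,
    vectors: List[Vector],
    centroids: List[Vector],
    cells: Dict[int, List[int]],
    k: int,
    nprobe: int,
) -> List[Tuple[int, float]]:
    """Scan only the nprobe closest cells; return up to k (id, distance) pairs."""
    probe = sorted(range(len(centroids)), key=lambda c: l2_sq(query, centroids[c]))[:nprobe]
    candidates = [vec_id for c in probe for vec_id in cells[c]]
    scored = sorted((l2_sq(query, vectors[i]), i) for i in candidates)
    return [(i, dist) for dist, i in scored[:k]]


# Toy data: two hand-picked centroids stand in for trained ones.
centroids = [(0.0, 0.0), (10.0, 0.0)]
vectors = [(1.0, 0.0), (4.9, 0.0), (5.2, 0.0), (9.0, 0.0)]
cells = build_cells(vectors, centroids)
query = (5.1, 0.0)

# With one probed cell, the close neighbour just across the cell boundary (id 1) is missed.
print([i for i, _ in ivf_search(query, vectors, centroids, cells, k=2, nprobe=1)])  # [2, 3]
# Probing both cells recovers the true two nearest neighbours.
print([i for i, _ in ivf_search(query, vectors, centroids, cells, k=2, nprobe=2)])  # [2, 1]
```

On the real index, setting `index.nprobe` to a larger value before searching trades speed for recall. For a few dozen snippets, an exhaustive `IndexFlatL2` is likely simpler and just as fast.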


In [17]:
import os
from together import Together

# Read the key from the environment instead of hardcoding it in the notebook
client = Together(api_key=os.environ["TOGETHER_API_KEY"])

retrieved_texts = [doc.page_content for doc in results]
prompt = "Summarize the following research snippets:\n\n" + "\n\n".join(retrieved_texts)

response = client.chat.completions.create(
    model="meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8",
    messages=[
        {"role": "system", "content": "You are a helpful research assistant."},
        {"role": "user", "content": prompt}
    ],
    max_tokens=512,
    temperature=0.7,
)

print("\n✅ Summary:\n", response.choices[0].message.content)



✅ Summary:
 Here is a summary of the research snippets:

The snippets discuss Neural Architecture Search (NAS), a method that aims to automate the design of deep neural networks. The researchers propose a new NAS method called Evolutionary Gradient-Based Neural Architecture Search (EG-NAS), suggesting that existing NAS methods face challenges in optimizing model accuracy and generalization. A key aspect of NAS is designing a good search space, which is crucial for its effectiveness.


