🧠 AI Research Intelligence Laboratory — Multi-Agent Collaboration + RAG Reasoning
🎯 Purpose
Design an end-to-end AI reasoning workflow combining collaborative multi-agent research (CrewAI) and retrieval-augmented summarization (LangChain + vector store).
The laboratory is divided into two independent tasks, each focused on distinct reasoning pipelines:
Task 1: Multi-Agent Research Team (CrewAI + Hugging Face Inference API)
Task 2: Wikipedia-based RAG Summarizer (LangChain + ChromaDB)
Each task can be completed independently.
No paid LLM APIs are required: Task 1 calls open-source models through the free Hugging Face Inference API, and Task 2 runs an open-source model locally (e.g., Mistral 7B, Llama 3 8B, Phi-3-mini).
📘 Repository Names
Task 1: multiagent_research-lab
Task 2: rag_wikipedia-lab
🗂️ Create two separate GitHub repositories — one per task. Each repo should contain a notebooks/, src/, and data/ directory.
🧠 EXERCISE 1 — “AI Research Team” (CrewAI + LangChain + Hugging Face Inference API)
🎯 Purpose
Simulate a multi-agent research collaboration where autonomous AI agents gather, analyze, and synthesize information about an AI-related topic using open-source frameworks and the Hugging Face Inference API.
Each agent acts as part of a “virtual research lab” working to produce a coherent research summary.
📘 Repository Name
multiagent_research-lab
🧩 Structure
Objective
Create a three-agent workflow that simulates collaborative research around a chosen AI topic (e.g., “Impact of Synthetic Data in Healthcare” or “Bias in LLMs”).
Agents will communicate using CrewAI (or LangChain Agents) and rely on Hugging Face Inference API models for reasoning and summarization.
🧠 Agents and Roles
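The three roles, summarized from the agent definitions below:

| Agent | Goal | Tools / Model |
| --- | --- | --- |
| Researcher | Find reliable web sources on the chosen topic | DuckDuckGo search tool |
| Writer | Draft a coherent 500-word research summary from retrieved snippets | Hugging Face Inference API model (e.g., `HuggingFaceH4/zephyr-7b-beta`) |
| Reviewer | Evaluate and correct factual inconsistencies and coherence issues | Hugging Face Inference API model |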
⚙️ Environment
Python 3.10+
Frameworks: CrewAI, LangChain, Hugging Face Hub
Editor: VSCode or Google Colab
No local LLMs (inference handled via Hugging Face Inference API)
🧰 Tasks
0️⃣ Setup
Install required libraries:
```bash
pip install crewai langchain huggingface_hub duckduckgo-search chromadb pandas
```
Configure Hugging Face token:
```python
from huggingface_hub import login
login("YOUR_HF_TOKEN")
```
1️⃣ Define the Agents
Create three agents within CrewAI or LangChain, defining:
Role / Goal / (Tools / APIs) / Memory (if applicable)
Example:
```python
from crewai import Agent

# Note: recent CrewAI versions use `role` (not `name`) and expect a `backstory`;
# `llm` takes a LiteLLM-style model id, prefixed with the provider.
researcher = Agent(
    role="Researcher",
    goal="Find reliable web sources about the impact of synthetic data in healthcare.",
    backstory="A meticulous assistant who collects and cites open-access sources.",
    tools=[search_tool],  # a search tool instance, e.g., DuckDuckGo (see Tools below)
)
writer = Agent(
    role="Writer",
    goal="Write a coherent 500-word research summary using retrieved sources.",
    backstory="A science communicator who turns raw findings into clear prose.",
    llm="huggingface/HuggingFaceH4/zephyr-7b-beta",  # via Hugging Face Inference API
)
reviewer = Agent(
    role="Reviewer",
    goal="Evaluate and correct factual inconsistencies and coherence issues.",
    backstory="A critical editor focused on accuracy and coherence.",
    # DeBERTa is an encoder-only model and cannot generate text; use an instruct model.
    llm="huggingface/HuggingFaceH4/zephyr-7b-beta",
)
```
2️⃣ Workflow
Define communication cycles:
Researcher → performs search → returns snippets.
Writer → generates first draft using those snippets.
Reviewer → critiques and refines the text.
Writer → finalizes Markdown report.
Each agent should send messages back and forth using CrewAI’s coordination logic or LangChain’s agent loop.
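A minimal CrewAI wiring sketch for this cycle (task descriptions and expected outputs are illustrative; assumes the three agents defined above):

```python
from crewai import Crew, Process, Task

research = Task(
    description="Search the web and collect 5-8 reliable snippets on the topic.",
    expected_output="A bullet list of sources with one-sentence summaries.",
    agent=researcher,
)
draft = Task(
    description="Write a 500-word research summary in Markdown from the snippets.",
    expected_output="A Markdown draft with the required section structure.",
    agent=writer,
)
review = Task(
    description="Critique the draft, fix factual and coherence issues, and finalize it.",
    expected_output="The final content of research_summary.md.",
    agent=reviewer,
)

crew = Crew(
    agents=[researcher, writer, reviewer],
    tasks=[research, draft, review],
    process=Process.sequential,  # Researcher -> Writer -> Reviewer
)
result = crew.kickoff()
```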
3️⃣ Tools
Use the DuckDuckGo Search Tool (or Tavily if available) for gathering open-access content:
```python
from langchain_community.tools import DuckDuckGoSearchRun

search_tool = DuckDuckGoSearchRun()
results = search_tool.run(
    "Impact of synthetic data in healthcare site:medium.com OR site:researchgate.net"
)
```
No BeautifulSoup is needed; extract titles and summaries from search results directly.
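If structured results are preferred, the companion `DuckDuckGoSearchResults` tool returns title/snippet/link entries as one formatted string (a small sketch; the query is illustrative):

```python
from langchain_community.tools import DuckDuckGoSearchResults

# Each entry contains a snippet, title, and link, so titles and summaries
# can be pulled from the string directly without any HTML scraping.
results_tool = DuckDuckGoSearchResults()
snippets = results_tool.run("impact of synthetic data in healthcare")
print(snippets)
```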
4️⃣ Final Output
The Writer Agent generates:
- `research_summary.md` (500 words)
- Structure:
  - Introduction
  - Key Findings
  - Ethical & Technical Challenges
  - Conclusion
- Reviewer edits should be reflected in the final version.
5️⃣ Evaluation (Rubric)
Total: 20 pts
🧮 Deliverables
- `/src/agents.py`: all agent definitions
- `/notebooks/workflow_demo.ipynb`: end-to-end execution
- `research_summary.md`: final report
- `requirements.txt`: reproducible environment
🛠️ Technical Requirements
- Python 3.10+
- Libraries: `crewai langchain huggingface_hub duckduckgo-search chromadb pandas`
- No local LLMs: only Hugging Face Inference API calls.
⏱️ Duration
8 hours total:
- 2h setup
- 5h implementation
- 1h presentation & discussion
🔁 Recommended Workflow
1. Define agents and roles.
2. Implement the search → writing → reviewing loop.
3. Generate and store the Markdown report.
4. Present outputs and discuss team collaboration performance.
🧮 Task 2 — Wikipedia-based RAG Summarizer (LangChain + ChromaDB)
🎯 Objective
Build a retrieval-augmented summarization system using open-source Wikipedia data.
Use LangChain + ChromaDB + SentenceTransformers to query, embed, and summarize factual content from Wikipedia without multi-agent coordination.
⚙️ Steps
0️⃣ Environment Setup
Install:
```bash
pip install wikipedia-api sentence-transformers chromadb langchain transformers torch pandas
```
1️⃣ Dataset Creation
Fetch Wikipedia content:
```python
import wikipediaapi

# Recent wikipedia-api versions require an explicit user agent.
wiki = wikipediaapi.Wikipedia(user_agent="rag-wikipedia-lab/1.0", language="en")
page = wiki.page("Federated_learning")
```
- Extract the main text and chunk it into ~300-word segments.
- Save to `/data/wiki_corpus.csv` with columns: `id`, `title`, `text`.
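A minimal chunking sketch (the helper name and fixed 300-word window are illustrative; assumes the `page` object from above):

```python
import pandas as pd

def chunk_words(text: str, size: int = 300) -> list[str]:
    """Split text into consecutive ~`size`-word segments."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

chunks = chunk_words(page.text)
df = pd.DataFrame({
    "id": range(len(chunks)),
    "title": [page.title] * len(chunks),
    "text": chunks,
})
df.to_csv("data/wiki_corpus.csv", index=False)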
2️⃣ Embedding + Vector Store
Embed using:

```python
from sentence_transformers import SentenceTransformer
import chromadb

model = SentenceTransformer("all-MiniLM-L6-v2")
client = chromadb.Client()
collection = client.create_collection("wiki_ai")
```
Upsert all chunks into ChromaDB with metadata.
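One way to do the upsert (a sketch; assumes the `df` of chunks from step 1 and the `model`/`collection` objects above):

```python
# Embed every chunk and add it to the collection together with its metadata.
embeddings = model.encode(df["text"].tolist()).tolist()
collection.add(
    ids=[str(i) for i in df["id"]],  # ChromaDB ids must be strings
    embeddings=embeddings,
    documents=df["text"].tolist(),
    metadatas=[{"title": t} for t in df["title"]],
)
```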
3️⃣ Query Pipeline
Implement a LangChain retrieval chain:

```python
from langchain.chains import RetrievalQA
from langchain_community.llms import Ollama
from langchain_community.vectorstores import Chroma

qa = RetrievalQA.from_chain_type(
    llm=Ollama(model="mistral"),
    chain_type="stuff",
    retriever=Chroma(...).as_retriever(),
)
```
Query example:

```python
qa.run("Explain federated learning challenges in healthcare.")
```
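One way to fill in the `Chroma(...)` placeholder (a hedged sketch; assumes the same in-process `client` and `wiki_ai` collection from step 2, and a local Ollama server with the `mistral` model pulled):

```python
from langchain.chains import RetrievalQA
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.llms import Ollama
from langchain_community.vectorstores import Chroma

# Query embeddings must use the same model that indexed the chunks.
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
vectorstore = Chroma(
    client=client,                # the chromadb.Client() created in step 2
    collection_name="wiki_ai",
    embedding_function=embeddings,
)
qa = RetrievalQA.from_chain_type(
    llm=Ollama(model="mistral"),
    chain_type="stuff",
    retriever=vectorstore.as_retriever(search_kwargs={"k": 4}),
)
print(qa.run("Explain federated learning challenges in healthcare."))
```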
4️⃣ Generate and Save Summary
Combine top retrieved results into one coherent summary (400–500 words) and save as rag_summary.md.
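A minimal sketch of this step (the prompt wording and output path are illustrative; assumes the `qa` chain above):

```python
summary = qa.run(
    "Summarize federated learning in 400-500 words, covering key concepts, "
    "healthcare applications, and open challenges, based on the retrieved passages."
)
with open("outputs/rag_summary.md", "w", encoding="utf-8") as f:
    f.write("# RAG Summary: Federated Learning\n\n" + summary)
```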
🧮 Deliverables (Task 2)
- `/notebooks/rag_wikipedia.ipynb`
- `/data/wiki_corpus.csv`
- `/outputs/rag_summary.md`
- `/outputs/retrieval_examples.json`
📏 Rubric (20 pts)
🧩 Bonus Task — Conceptual Comparison (+3 pts)
Write a short Markdown reflection comparing both approaches:
- How did the multi-agent workflow handle ambiguity and contradictions?
- How did the RAG approach handle factuality and retrieval coverage?
- Which approach is better suited for open-ended vs. factual questions?
Save as /outputs/reflection.md.
🛠️ Technical Requirements
Python 3.10+
Dependencies:
`crewai langchain sentence-transformers chromadb wikipedia-api transformers torch pandas numpy markdown`
Execution:
All notebooks must run fully in Google Colab or VSCode; Task 1 calls the Hugging Face Inference API, while Task 2 uses a local LLM (Ollama or LM Studio).
🔁 Recommended Workflow
Task 1 — Multi-Agent Research
Define topic → Configure CrewAI agents → Run Researcher → Writer → Reviewer cycle → Save Markdown report
Task 2 — RAG Summarization
Load Wikipedia pages → Chunk text → Embed → Store in ChromaDB → Query with LangChain → Generate summary
Bonus
Write reflection on Multi-Agent vs RAG reasoning strengths
📤 Submission
Submit both repositories:
1️⃣ multiagent_research-lab
2️⃣ rag_wikipedia-lab
Provide GitHub URLs in the submission sheet:
👉 [Submission Excel – Repository & Dashboard Links]
Deadline: November 15