Homework_5 #159

@jeanpool1415

Description

🧠 AI Research Intelligence Laboratory — Multi-Agent Collaboration + RAG Reasoning
🎯 Purpose

Design an end-to-end AI reasoning workflow combining collaborative multi-agent research (CrewAI) and retrieval-augmented summarization (LangChain + vector store).

The laboratory is divided into two independent tasks, each focused on distinct reasoning pipelines:

Task 1: Multi-Agent Research Team (CrewAI + Hugging Face Inference API)

Task 2: Wikipedia-based RAG Summarizer (LangChain + ChromaDB)

Each task can be completed independently.
No paid LLM APIs are required — Task 1 uses the free Hugging Face Inference API, and Task 2 runs open-source models locally (e.g., Mistral 7B, Llama 3 8B, Phi-3-mini).

📘 Repository Names

Task 1: multiagent_research-lab

Task 2: rag_wikipedia-lab

🗂️ Create two separate GitHub repositories — one per task. Each repo should contain a notebooks/, src/, and data/ directory.

🧠 EXERCISE 1 — “AI Research Team” (CrewAI + LangChain + Hugging Face Inference API)

🎯 Purpose

Simulate a multi-agent research collaboration where autonomous AI agents gather, analyze, and synthesize information about an AI-related topic using open-source frameworks and the Hugging Face Inference API.
Each agent acts as part of a “virtual research lab” working to produce a coherent research summary.


📘 Repository Name

multiagent_research-lab


🧩 Structure

Objective

Create a three-agent workflow that simulates collaborative research around a chosen AI topic (e.g., “Impact of Synthetic Data in Healthcare” or “Bias in LLMs”).

Agents will communicate using CrewAI (or LangChain Agents) and rely on Hugging Face Inference API models for reasoning and summarization.


🧠 Agents and Roles

| Agent | Responsibility | Tools / Functions |
| --- | --- | --- |
| Researcher Agent | Conducts online information search and retrieves relevant text sources. | Web search tool (e.g., DuckDuckGo Search API or Tavily), text retrieval, document parsing |
| Writer Agent | Synthesizes retrieved knowledge into a 500-word structured summary (Markdown format). | Hugging Face Inference API for summarization |
| Reviewer Agent | Evaluates coherence, factuality, and structure of the final summary, suggesting corrections. | Text analysis with a Hugging Face sentiment/classification model |

⚙️ Environment

Python 3.10+

Frameworks: CrewAI, LangChain, Hugging Face Hub

Editor: VSCode or Google Colab

No local LLMs (inference handled via Hugging Face Inference API)

🧰 Tasks
0️⃣ Setup

Install required libraries:

```bash
pip install crewai langchain huggingface_hub duckduckgo-search chromadb pandas
```

Configure Hugging Face token:

```python
from huggingface_hub import login

login("YOUR_HF_TOKEN")
```

1️⃣ Define the Agents

Create three agents in CrewAI or LangChain, defining each one's role, goal, tools/APIs, and memory (if applicable).

Example:

```python
from crewai import Agent

# CrewAI's Agent takes role/goal/backstory (all required); `name` is not a parameter.
researcher = Agent(
    role="Researcher",
    goal="Find reliable web sources about the impact of synthetic data in healthcare.",
    backstory="A diligent assistant who gathers and verifies sources.",
    tools=[search_tool],  # a search tool instance (see the Tools section)
)

writer = Agent(
    role="Writer",
    goal="Write a coherent 500-word research summary using retrieved sources.",
    backstory="A technical writer who produces structured Markdown.",
    llm="HuggingFaceH4/zephyr-7b-beta",  # via Hugging Face API
)

reviewer = Agent(
    role="Reviewer",
    goal="Evaluate and correct factual inconsistencies and coherence issues.",
    backstory="A meticulous editor focused on factuality.",
    llm="HuggingFaceH4/zephyr-7b-beta",  # must be generative; deberta-v3 is a classifier, not an LLM
)
```

2️⃣ Workflow

Define communication cycles:

Researcher → performs search → returns snippets.

Writer → generates first draft using those snippets.

Reviewer → critiques and refines the text.

Writer → finalizes Markdown report.

Each agent should send messages back and forth using CrewAI’s coordination logic or LangChain’s agent loop.
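The cycle above can be sketched framework-agnostically as a single coordination loop. The callables below are stand-ins for the real CrewAI/LangChain agents (the function names are illustrative, not part of either framework's API):

```python
from typing import Callable, List

def research_cycle(
    search: Callable[[str], List[str]],   # Researcher: topic -> snippets
    write: Callable[[List[str]], str],    # Writer: snippets -> first draft
    review: Callable[[str], str],         # Reviewer: draft -> feedback
    revise: Callable[[str, str], str],    # Writer: draft + feedback -> final text
    topic: str,
) -> str:
    """One Researcher -> Writer -> Reviewer -> Writer pass over a topic."""
    snippets = search(topic)
    draft = write(snippets)
    feedback = review(draft)
    return revise(draft, feedback)

# Stub agents to demonstrate the message flow:
final = research_cycle(
    search=lambda t: [f"snippet about {t}"],
    write=lambda s: "Draft: " + "; ".join(s),
    review=lambda d: "fix tone",
    revise=lambda d, f: d + f" [revised: {f}]",
    topic="synthetic data",
)
```

In the real workflow each callable is replaced by an agent invocation, and CrewAI's `Crew`/`Task` orchestration (or LangChain's agent loop) takes over the sequencing.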

3️⃣ Tools

Use the DuckDuckGo Search Tool (or Tavily if available) for gathering open-access content:

```python
from langchain_community.tools import DuckDuckGoSearchRun

search_tool = DuckDuckGoSearchRun()
results = search_tool.run(
    "Impact of synthetic data in healthcare site:medium.com OR site:researchgate.net"
)
```

No BeautifulSoup is needed; extract titles and summaries from search results directly.


4️⃣ Final Output

The Writer Agent generates:

  • research_summary.md (500 words)

  • Structure:

    • Introduction

    • Key Findings

    • Ethical & Technical Challenges

    • Conclusion

Reviewer edits should be reflected in the final version.
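Assembling the final report from the four required sections can be done with a small helper; this is a sketch (the function name and the top-level heading are my choices, not part of the assignment):

```python
def build_report(sections: dict[str, str]) -> str:
    """Assemble research_summary.md from the four required sections, in order."""
    required = ["Introduction", "Key Findings",
                "Ethical & Technical Challenges", "Conclusion"]
    missing = [name for name in required if name not in sections]
    if missing:
        raise ValueError(f"Missing sections: {missing}")
    parts = ["# Research Summary\n"]
    for name in required:
        parts.append(f"## {name}\n\n{sections[name].strip()}\n")
    return "\n".join(parts)
```

Validating the section set before writing the file makes it easy to check that Reviewer edits did not drop a required heading.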


5️⃣ Evaluation (Rubric)

| Criterion | Points |
| --- | --- |
| Correct setup and configuration (CrewAI + Hugging Face) | 4 pts |
| Functional multi-agent collaboration (communication cycles working) | 6 pts |
| Researcher retrieves meaningful text data | 3 pts |
| Writer generates coherent, structured text via Hugging Face API | 3 pts |
| Reviewer produces factuality & coherence feedback | 2 pts |
| Markdown summary well-structured and readable | 2 pts |

Total: 20 pts


🧮 Deliverables

  • /src/agents.py — all agent definitions

  • /notebooks/workflow_demo.ipynb — end-to-end execution

  • research_summary.md — final report

  • requirements.txt — reproducible environment


🛠️ Technical Requirements

  • Python 3.10+

  • Libraries:

    crewai langchain huggingface_hub duckduckgo-search chromadb pandas
  • No local LLMs — only Hugging Face Inference API calls.


⏱️ Duration

8 hours total

  • 2h setup

  • 5h implementation

  • 1h presentation & discussion


🔁 Recommended Workflow

  1. Define agents and roles.

  2. Implement search → writing → reviewing loop.

  3. Generate and store Markdown report.

  4. Present outputs and discuss team collaboration performance.

🧮 Task 2 — Wikipedia-based RAG Summarizer (LangChain + ChromaDB)

🎯 Objective

Build a retrieval-augmented summarization system using open-source Wikipedia data.
Use LangChain + ChromaDB + SentenceTransformers to query, embed, and summarize factual content from Wikipedia without multi-agent coordination.


⚙️ Steps

0️⃣ Environment Setup

Install:

```bash
pip install wikipedia-api sentence-transformers chromadb langchain transformers torch pandas
```

1️⃣ Dataset Creation

Fetch Wikipedia content:

```python
import wikipediaapi

# Recent versions of wikipedia-api require an explicit user_agent.
wiki = wikipediaapi.Wikipedia(user_agent="rag-wikipedia-lab", language="en")
page = wiki.page("Federated_learning")
```
  • Extract main text and chunk into ~300-word segments.

  • Save to /data/wiki_corpus.csv with columns: id, title, text.
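The chunking and CSV steps above can be sketched with the standard library alone (function names and the `title_index` id scheme are my choices):

```python
import csv

def chunk_text(text: str, words_per_chunk: int = 300) -> list[str]:
    """Split page text into ~300-word segments on word boundaries."""
    words = text.split()
    return [" ".join(words[i:i + words_per_chunk])
            for i in range(0, len(words), words_per_chunk)]

def save_corpus(chunks: list[str], title: str, path: str) -> None:
    """Write chunks to wiki_corpus.csv with id, title, text columns."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "title", "text"])
        for i, chunk in enumerate(chunks):
            writer.writerow([f"{title}_{i}", title, chunk])
```

Splitting on whitespace keeps chunks at or under the word budget without cutting words in half; a sentence-aware splitter would be a reasonable refinement.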

2️⃣ Embedding + Vector Store

Embed using:

```python
import chromadb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
client = chromadb.Client()
collection = client.create_collection("wiki_ai")
```

Upsert all chunks into ChromaDB with metadata.
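Preparing the upsert payload is plain Python; the `collection.add` call itself (real ChromaDB API) is shown as a comment since it needs the live client and model from above:

```python
def build_chroma_payload(rows: list[dict]) -> tuple[list, list, list]:
    """Turn corpus rows ({'id', 'title', 'text'}) into the parallel lists
    of ids, documents, and metadata dicts that ChromaDB expects."""
    ids = [row["id"] for row in rows]
    documents = [row["text"] for row in rows]
    metadatas = [{"title": row["title"]} for row in rows]
    return ids, documents, metadatas

# With the collection and embedding model created above:
# ids, docs, metas = build_chroma_payload(rows)
# collection.add(
#     ids=ids,
#     documents=docs,
#     metadatas=metas,
#     embeddings=model.encode(docs).tolist(),
# )
```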

3️⃣ Query Pipeline

Implement LangChain retrieval chain:

```python
from langchain.chains import RetrievalQA
from langchain.llms import Ollama
from langchain.vectorstores import Chroma

qa = RetrievalQA.from_chain_type(
    llm=Ollama(model="mistral"),
    chain_type="stuff",
    retriever=Chroma(...).as_retriever(),
)
```

Query example:

```python
qa.run("Explain federated learning challenges in healthcare.")
```

4️⃣ Generate and Save Summary

Combine top retrieved results into one coherent summary (400–500 words) and save as rag_summary.md.
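One way to merge the top retrieved chunks into a single context block is sketched below; the helper prepares the input that the local LLM then rewrites into the coherent 400–500-word summary (the function name and deduplication strategy are my choices):

```python
def combine_results(chunks: list[str], max_words: int = 500) -> str:
    """Concatenate top retrieved chunks, skipping exact duplicates,
    capped at a total word budget."""
    seen: set[str] = set()
    parts: list[str] = []
    total = 0
    for chunk in chunks:
        if chunk in seen or total >= max_words:
            continue
        seen.add(chunk)
        words = chunk.split()
        take = min(len(words), max_words - total)
        parts.append(" ".join(words[:take]))
        total += take
    return "\n\n".join(parts)
```

Deduplicating before the word cap matters because overlapping Wikipedia chunks are often retrieved more than once for related queries.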


🧮 Deliverables (Task 2)

  • /notebooks/rag_wikipedia.ipynb

  • /data/wiki_corpus.csv

  • /outputs/rag_summary.md

  • /outputs/retrieval_examples.json


📏 Rubric (20 pts)

| Category | Description | Points |
| --- | --- | --- |
| Wikipedia Data | Correct extraction and chunking | 4 |
| Embedding + Storage | Proper embedding using SentenceTransformers and ChromaDB | 6 |
| LangChain Pipeline | Functional retrieval and generation pipeline | 6 |
| Final Summary | Coherence, accuracy, and factual completeness | 4 |

🧩 Bonus Task — Conceptual Comparison (+3 pts)

Write a short Markdown reflection comparing both approaches:

  • How did the multi-agent workflow handle ambiguity and contradictions?

  • How did the RAG approach handle factuality and retrieval coverage?

  • Which approach is better suited for open-ended vs. factual questions?

Save as /outputs/reflection.md.


🛠️ Technical Requirements

Python 3.10+
Dependencies:

crewai langchain sentence-transformers chromadb wikipedia-api transformers torch pandas numpy markdown

Execution:
All notebooks must run fully in Google Colab or VSCode — Task 1 via the Hugging Face Inference API, Task 2 with a local LLM (Ollama or LM Studio).


🔁 Recommended Workflow

Task 1 — Multi-Agent Research

Define topic → Configure CrewAI agents → Run Researcher → Writer → Reviewer cycle → Save Markdown report

Task 2 — RAG Summarization

Load Wikipedia pages → Chunk text → Embed → Store in ChromaDB → Query with LangChain → Generate summary

Bonus

Write reflection on Multi-Agent vs RAG reasoning strengths

📤 Submission

Submit both repositories:

1️⃣ multiagent_research-lab
2️⃣ rag_wikipedia-lab

Provide GitHub URLs in the submission sheet:
👉 [Submission Excel – Repository & Dashboard Links]

Deadline: November 15
