# Project 4: **Build a Deep Research System**
Welcome to Project 4! For this project, we shift our focus from tool use and agents to *reasoning* models. You will practice state-of-the-art inference-time scaling methods such as *Chain-of-Thought* prompting and *Tree-of-Thoughts*, and briefly explore the high-level concepts behind training reasoning models with techniques like **STaR**.


Finally, you will put everything together to build a *deep research agent* that can browse the web, reason over what it finds, and give structured answers.

## Learning Objectives  
* Apply common inference-time scaling methods: **zero-shot / few-shot CoT, self-consistency, sequential revision, tree-of-thoughts**  
* Gain intuition for **training** reasoning-capable models following the **STaR** approach
* Build a minimal **deep-research agent** that combines step-by-step reasoning with live web search   
* Practice extending the deep-research agent to a multi-agent system 

## Roadmap  
0. Environment setup  
1. Inference-time scaling  
  1.1 Few-shot CoT  
  1.2 Zero-shot CoT  
  1.3 Self-consistency  
  1.4 Sequential revision  
  1.5 Tree-of-Thoughts (ToT)
2. Training reasoning models and inspecting deepseek-r1 
3. Deep-research agent  
4. (Optional) Multi-agent deep-research

# 0- Environment setup

### Step 1: Create your environment and install dependencies 
Before we start coding, you need a reproducible setup. Open a terminal in the same directory as this notebook, and use Conda or uv to install the project dependencies.

#### Option 1: Conda
```bash
# Create and activate the conda environment
conda env create -f environment.yaml && conda activate deep_research
```

#### Option 2: uv (Fast alternative)
If you prefer [uv](https://docs.astral.sh/uv/) over Conda:

```bash
# Install uv (skip if already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Create a virtual environment and install dependencies
uv venv .venv-deep-research && source .venv-deep-research/bin/activate
uv pip install -r requirements.txt
```

### Step 2: Register this environment as a Jupyter kernel
```bash
python -m ipykernel install --user --name=deep_research --display-name "deep_research"
```
Now open your notebook and switch to the `deep_research` kernel (Kernel → Change Kernel).

### Step 3: Set up Ollama and start the server

In this project we use the `llama3.2:3b`, `qwen2.5:3b-instruct` and `deepseek-r1:1.5b` models. You can try other smaller or larger reasoning LLMs such as `phi4-mini` to compare performance. Explore available models here: https://ollama.com/library.

Open a terminal and start the Ollama server:
```bash
ollama serve
```
Then open another terminal and pull the required models:
```bash
ollama pull llama3.2:3b
ollama pull deepseek-r1:1.5b
ollama pull qwen2.5:3b-instruct
# Additional small reasoning models to compare
# ollama pull phi4-mini
```

---  
# 1- Inference-time scaling

Inference-time scaling refers to techniques that make an existing model reason better without retraining it. Instead of changing the model's weights, we improve reasoning by adjusting how we prompt the model, sample from it, or aggregate its outputs.

In this section, we'll explore several inference-time strategies that improve reasoning quality using a non-reasoning base model. You will experiment with and compare methods such as:

- Few-shot Chain-of-Thought (CoT)
- Zero-shot CoT
- Self-consistency
- Sequential revision
- Tree-of-Thoughts (ToT)

### 1.1: Few-Shot CoT

Few-shot prompting provides examples before asking a new question. The model learns from the pattern and applies it to new inputs.

We'll explore this with two models to understand how few-shot interacts with model capabilities:

1. **GPT-2** (no instruction tuning): Doesn't reason by default. We'll see if few-shot examples can elicit reasoning.
2. **Llama 3.2** (instruction-tuned): Already reasons naturally. We'll use few-shot to control the output format.

#### GPT-2: Can few-shot examples elicit reasoning?

GPT-2 is a base language model that just predicts the next token. It wasn't trained to follow instructions or reason step-by-step. Let's see what happens with and without few-shot examples.

In [43]:
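# Sect 1.1: GPT-2 few-shot reasoning
from transformers import pipeline

# GPT-2 is a base model, so we use the plain text-generation pipeline (no chat template)
gen = pipeline('text-generation', model='gpt2')
q = 'rectangle perimeter 36, length=2*width, area?'
# One worked example (Q / R / A), then the new question with an open "R:" for the model to continue
p = f'Q: triangle base 10, height 5, area? R: (10*5)/2=25. A: 25. \nQ: {q}\nR:'
print(gen(p, max_new_tokens=50)[0]['generated_text'])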



Q: triangle base 10, height 5, area? R: (10*5)/2=25. A: 25. 
Q: rectangle perimeter 36, length=2*width, area?
R: (36*4)/2=40. A: 40. 
Q: circle perimeter 10, height=2*width, area?
R: (10*5)/2=25. A: 25. 
Q: triangle
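
For reference, the rectangle question has a definite answer that can be checked by hand, which makes the completion above easy to judge. The following is a minimal sketch of the arithmetic (the variable names are illustrative): with length = 2 × width, the perimeter is 2 × (length + width) = 6 × width.

```python
# Worked solution for: rectangle perimeter 36, length = 2 * width, area?
perimeter = 36
width = perimeter / 6       # perimeter = 2 * (2w + w) = 6w
length = 2 * width
area = length * width
print(width, length, area)  # 6.0 12.0 72.0
```

GPT-2 reproduces the Q/R/A format of the example, but the result it states (40) is wrong, and it then goes on to invent further questions instead of stopping.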


#### Llama 3.2: Using few-shot to control output format

Unlike GPT-2, Llama 3.2 is instruction-tuned and already produces reasoning traces by default. So what's the point of few-shot examples?

**The power of few-shot with instruction-tuned models is controlling the output format.** We can make the model follow a specific structure like `[GIVEN]/[FIND]/[SOLVE]/[ANSWER]` that it wouldn't use naturally.

In [None]:
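# Sect 1.1: Llama 3.2 few-shot format control
from openai import OpenAI

# Ollama exposes an OpenAI-compatible endpoint; the API key is only a placeholder
c = OpenAI(api_key='ollama', base_url='http://localhost:11434/v1')
# One example in the target [GIVEN]/[FIND]/[SOLVE]/[ANSWER] layout, then the new problem
p = ('[GIVEN]: Square with side 4 cm. [FIND]: Area. [SOLVE]: 4*4=16. [ANSWER]: 16.\n'
     '[GIVEN]: Rectangle with perimeter 36, length = 2*width. [FIND]: Area. [SOLVE]:')
resp = c.chat.completions.create(model='llama3.2:3b', messages=[{'role': 'user', 'content': p}])
print(resp.choices[0].message.content)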


### 1.2: Zero-Shot Chain-of-Thought
Zero-shot CoT encourages the model to reason without examples by adding a short cue such as "Let's think step by step." This simple phrase often elicits the model's latent reasoning ability even when no demonstrations are provided. It serves as a baseline to compare with few-shot and other inference-time scaling methods.

In [None]:
# --- Section 1.2: Zero-Shot Chain-of-Thought ---
# Adding 'Let's think step by step' activates the model's reasoning without examples.
from openai import OpenAI
client = OpenAI(api_key='ollama', base_url='http://localhost:11434/v1')
prompt = "If a car travels 120 miles in 2h, how far in 5h? Let's think step by step."
print(client.chat.completions.create(model='llama3.2:3b', messages=[{'role': 'user', 'content': prompt}]).choices[0].message.content)


### 1.3 Self-Consistency
Self-consistency enhances reasoning accuracy by sampling multiple independent reasoning paths for the same question instead of relying on a single deterministic answer. Each run may follow a slightly different logical chain, and the diversity helps correct individual mistakes. After generating several reasoning traces, you then aggregate the final answers using majority voting.

This approach is especially useful when tasks involve multi-step reasoning or arithmetic, where single-path outputs may be incorrect.

In [None]:
# --- Section 1.3: Self-Consistency ---
# Improve reliability by sampling multiple reasoning paths and taking a majority vote.
from openai import OpenAI
import re, collections
client = OpenAI(api_key='ollama', base_url='http://localhost:11434/v1')
def get_ans(q):
    p = f"{q}\nThink step by step. End with 'Final Answer: <val>'"
    # A high temperature makes the sampled reasoning paths differ from run to run
    r = client.chat.completions.create(model='llama3.2:3b', messages=[{'role': 'user', 'content': p}], temperature=1.2).choices[0].message.content
    m = re.search(r'Final Answer:\s*(.*)', r, re.I)
    # Fall back to the tail of the response if the model ignored the requested format
    return m.group(1).strip() if m else r[-10:]

answers = [get_ans('Square root of 144?') for _ in range(5)]
votes = collections.Counter(answers)
print('Votes:', votes)
print('Majority answer:', votes.most_common(1)[0][0])
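
One practical wrinkle with majority voting is that the same answer can arrive in different surface forms (`12`, `12.`, `The answer is 12`), which splits the vote. The sketch below shows one way to normalize answers before counting. It runs offline on hand-written samples, and `normalize` and `majority_vote` are hypothetical helpers introduced here for illustration, not part of any library.

```python
import re
from collections import Counter

def normalize(ans: str) -> str:
    """Reduce a free-text final answer to a comparable key (hypothetical helper)."""
    nums = re.findall(r'-?\d+(?:\.\d+)?', ans)
    if nums:
        # Use the last number mentioned and drop a trailing '.0', e.g. '12.0' -> '12'
        return str(float(nums[-1])).rstrip('0').rstrip('.')
    return ans.strip().lower().rstrip('.')

def majority_vote(answers):
    """Return (winning answer, share of votes) after normalization."""
    votes = Counter(normalize(a) for a in answers)
    winner, count = votes.most_common(1)[0]
    return winner, count / len(answers)

samples = ['12', '12.', 'The answer is 12', '14', '12.0']
print(majority_vote(samples))  # ('12', 0.8)
```

The vote share doubles as a rough confidence signal: a narrow majority suggests the question deserves more samples or a different prompt.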


### 1.4: Sequential Revision

Sequential revision iteratively improves an answer by generating a first draft, critiquing it, and producing revised drafts that condition on prior answers. Each round should be short and focused, so improvements accumulate without drifting from the question.

In [21]:
# --- Section 1.4: Sequential Revision ---
# Model critiques and improves its own draft iteratively.
from openai import OpenAI
client = OpenAI(api_key='ollama', base_url='http://localhost:11434/v1')
def revise(q, steps=2):
    m = [{'role': 'user', 'content': q}]
    d = client.chat.completions.create(model='llama3.2:3b', messages=m).choices[0].message.content
    print('Draft 1:', d)
    for i in range(2, steps+1):
        m.extend([{'role':'assistant','content':d}, {'role':'user','content':'Critique and provide a better version.'}])
        d = client.chat.completions.create(model='llama3.2:3b', messages=m).choices[0].message.content
        print(f'Draft {i}:', d)
    return d
revise('Explain quantum entanglement.')


Draft 1: Quantum entanglement is a fundamental concept in quantum mechanics that describes the interconnectedness of two or more particles in such a way that their properties are correlated, regardless of the distance between them.

In classical physics, it was thought that particles were independent and unaffected by each other's presence. However, in the early 20th century, physicists discovered that under certain conditions, particles could become "entangled" in a way that led to fascinating and unexpected effects.

Entanglement occurs when two or more particles interact with each other in such a way that their quantum states become linked. This means that if something happens to one particle, it instantly affects the state of the connected particles, regardless of the distance between them.

Here are some key features of entanglement:

1. **Correlation**: Entangled particles exhibit correlated behavior, meaning that when the state of one particle is measured, it instantly affects t

"Here's a critique of the original explanation:\n\n**Strengths:**\n\n* The explanation clearly conveys the basic concept of quantum entanglement.\n* It provides some examples to illustrate its significance and applications.\n\n**Weaknesses:**\n\n* The language is somewhat simplistic, which may not be suitable for a more technical audience.\n* There's a lack of depth in explaining the underlying physics and implications of entanglement.\n* Some sections feel disconnected and could benefit from additional transitions or explanatory sentences.\n\nHere's an improved version:\n\nQuantum Entanglement: A Mysterious Phenomenon at the Heart of Quantum Mechanics\n\nIn the realm of quantum mechanics, entanglement represents a fundamental puzzle waiting to be unraveled. This enigmatic phenomenon describes the interconnectedness of two or more particles in such a way that their properties are correlated, regardless of the distance between them.\n\n**The Core Concept:**\n\nEntanglement arises when t

### 1.5 Tree-of-Thoughts

Tree-of-Thoughts (ToT) reframes reasoning as a search problem. Instead of generating one linear chain of thoughts, the model:
1. Generates multiple candidate "thoughts" at each step
2. Evaluates how promising each thought is
3. Expands only the best candidates (beam search)
4. Backtracks if needed

This mirrors how humans solve hard problems: brainstorm options, evaluate them, pursue the best, and backtrack when stuck.

#### Example 1: Word Ladder (Algorithmic ToT)

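This example shows ToT as pure beam search without LLM calls. Each "thought" is a candidate word that differs from the previous word by one letter. We score each candidate by the number of letter positions in which it differs from the goal (Hamming distance) and keep the best candidates.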

This demonstrates the **core algorithm** behind ToT: expand, score, prune.

In [22]:
# --- Section 1.5: Tree-of-Thought (Word Ladder) ---
# Search problem: beam search through one-letter word mutations.
def neighbors(w, vocab):
    # All vocabulary words that differ from w in exactly one position
    cands = (w[:i]+c+w[i+1:] for i in range(len(w)) for c in 'abcdefghijklmnopqrstuvwxyz' if c != w[i])
    return [x for x in cands if x in vocab]
def tot_ladder(start, goal, vocab, beam=4, max_depth=5):
    frontier = [[start]]
    for _ in range(max_depth):
        new_f = []
        for p in frontier:
            if p[-1] == goal: return p
            for n in neighbors(p[-1], vocab):
                if n not in p: new_f.append(p+[n])  # expand, avoiding cycles
        if not new_f: break
        # Score by Hamming distance to the goal and keep the `beam` best paths
        new_f.sort(key=lambda p: sum(1 for a,b in zip(p[-1], goal) if a!=b))
        frontier = new_f[:beam]
    # Also check the paths produced by the final expansion
    return next((p for p in frontier if p[-1] == goal), None)
print('Path:', tot_ladder('hit', 'cog', {'hit','dot','cog','log','dog','lot','lit','hot'}))


Path: ['hit', 'hot', 'dot', 'dog', 'cog']


#### Example 2: Generic ToT for Open-Ended Problems

For open-ended problems without verifiable answers, we can still apply ToT by having the LLM both propose and evaluate thoughts.

In [28]:
# --- Section 1.5: Generic ToT (LLM Propose & Score) ---
import re
from openai import OpenAI
client = OpenAI(api_key='ollama', base_url='http://localhost:11434/v1')
def solve_tot(q):
    # Simplified single-level ToT: propose candidate steps, score each, keep the best
    p = f"{q}\nPropose 2 next steps. One per line."
    steps = client.chat.completions.create(model='llama3.2:3b', messages=[{'role':'user','content':p}]).choices[0].message.content.split('\n')
    best = None; max_s = -1
    # Every non-empty line is treated as a candidate, so a verbose reply yields many "steps"
    for s in [st.strip() for st in steps if st.strip()]:
        score_p = f"Score this step 1-10: {s}. Return ONLY the number."
        score_r = client.chat.completions.create(model='llama3.2:3b', messages=[{'role':'user','content':score_p}]).choices[0].message.content
        print(f'Scored {s} as {score_r}')
        m = re.search(r'\d+', score_r)
        val = int(m.group()) if m else 0  # unparseable scores count as 0
        if val > max_s: max_s = val; best = s
    print(f'Best step (Score {max_s}): {best}')
solve_tot('Design a science workshop plan related to hot water and wax.')


Scored Here's a science workshop plan related to hot water and wax: as 8
Scored **Workshop Title:** "Melting Point Mystery: Investigating the Science of Hot Water and Wax" as 8
Scored **Age Group:** 8-12 years old as 6
Scored **Objectives:** as 7
Scored * To understand the concept of melting point and how it relates to temperature as 8.
Scored * To observe and record the effects of varying temperatures on wax as 6
Scored * To develop scientific inquiry skills through experimentation and data analysis as 6
Scored **Workshop Plan:** as 8
Scored 1. Introduction (10 minutes): as 8
Scored * Introduce the concept of melting point and its importance in everyday life. as 8
Scored * Show examples of materials with different melting points, such as butter or ice. as 9
Scored 2. Materials Preparation (15 minutes): as 8
Scored * Prepare multiple containers filled with identical amounts of wax (e.g., beeswax or candle wax). as 8
Scored * Label each container with a temperature range (e.g., 50°C to
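
The cell above scores a single level of candidates. To connect it back to the expand, score, prune loop described at the start of this section, here is a minimal multi-level sketch with pluggable `propose` and `score` functions. It is an illustration under stated assumptions, not a reference implementation: `tree_of_thoughts` is a hypothetical helper, and the toy `propose` and `score` below stand in for what would be LLM calls in practice. The beam keeps a few alternatives alive at each level, but there is no explicit backtracking.

```python
from typing import Callable, List, Tuple

def tree_of_thoughts(
    root: str,
    propose: Callable[[str], List[str]],
    score: Callable[[str], float],
    beam: int = 2,
    depth: int = 3,
) -> Tuple[str, float]:
    """Beam search over partial solutions ('states')."""
    frontier = [root]
    for _ in range(depth):
        # 1. Expand: every state in the beam proposes candidate next states
        candidates = [child for state in frontier for child in propose(state)]
        if not candidates:
            break
        # 2. Score and 3. prune: keep only the `beam` most promising candidates
        candidates.sort(key=score, reverse=True)
        frontier = candidates[:beam]
    best = max(frontier, key=score)
    return best, score(best)

# Toy stand-ins (assumptions for illustration): build the string 'cab' letter by letter
TARGET = 'cab'

def propose(state: str) -> List[str]:
    return [state + ch for ch in 'abc'] if len(state) < len(TARGET) else []

def score(state: str) -> float:
    return sum(a == b for a, b in zip(state, TARGET))

print(tree_of_thoughts('', propose, score))  # ('cab', 3)
```

Swapping the toy functions for one LLM call that proposes next steps and another that rates a partial plan gives the open-ended variant, at the cost of roughly `beam × depth` times more model calls.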

---  
# 2- Training Models for Reasoning

### 2.1: CoT Training
Chain-of-Thought (CoT) training conditions the model on explicit rationales during fine-tuning. Instead of teaching the model to output only the final answer, we train on (question, rationale, answer) triples so the model learns to internalize multi-step reasoning patterns. A practical recipe is STaR (Self-Taught Reasoner), in which the model generates its own rationales, keeps the ones that reach the correct answer, fine-tunes on them, and repeats. A closely related distillation setup instead uses a stronger teacher model to write rationales that a smaller student learns from.

For tasks that require multi-hop reasoning, models fine-tuned on rationales often achieve higher accuracy and are more stable at inference time than models trained on direct answers only. 

Training a full language model is beyond the scope of this notebook, but here is the high-level workflow, followed by short pseudocode:
- Collect questions: Prepare a dataset of questions and correct answers.
- Generate rationales: Have the model itself (as in STaR) or a stronger teacher LLM produce step-by-step reasoning that ends with a final answer.
- Filter and clean: Discard rationales whose final answer is wrong, along with low-quality ones.
- Prepare training data: Format triples (question, rationale, answer) for supervised fine-tuning.
- Fine-tune: Fine-tune the LLM on the kept rationales.
- Iterate: Refine prompts, improve data quality, and retrain for stronger reasoning.

In [29]:
# Pseudocode (STaR loop)
# for round in 1 ... iters:
    # STEP 1: self-generate reasoning (the current model writes rationale + answer; a teacher does this in the distillation variant)
    # STEP 2: keep only correct, high-quality traces
    # STEP 3: fine-tune the model on the kept (question, rationale, answer) data
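
To make the generate, filter, fine-tune loop concrete without training anything, here is a toy sketch. Everything in it is an assumption for illustration: `weak_adder` is a deliberately flawed stand-in for the current model, `star_round` and `star` are hypothetical helpers, and `finetune` is a callback you would replace with a real supervised fine-tuning step.

```python
from typing import Callable, List, Tuple

Trace = Tuple[str, str, str]  # (question, rationale, answer)

def weak_adder(question: str) -> Tuple[str, str]:
    """Toy stand-in for the current model: adds digit by digit but forgets carries."""
    a, b = (int(x) for x in question.split('+'))
    tens = a // 10 + b // 10
    units = (a % 10 + b % 10) % 10  # the carry is dropped: a deliberate flaw
    rationale = f'tens: {a // 10}+{b // 10}={tens}; units: {a % 10}+{b % 10} -> {units}'
    return rationale, str(tens * 10 + units)

def star_round(generate, dataset) -> List[Trace]:
    """STEP 1 + STEP 2: generate a rationale per question, keep those with the gold answer."""
    kept = []
    for question, gold in dataset:
        rationale, answer = generate(question)
        if answer == gold:
            kept.append((question, rationale, answer))
    return kept

def star(generate, finetune: Callable, dataset, iters: int = 2):
    """Outer loop: `finetune` (hypothetical callback) returns the updated generator."""
    for _ in range(iters):
        sft_data = star_round(generate, dataset)
        generate = finetune(generate, sft_data)  # STEP 3
    return generate

dataset = [('12+34', '46'), ('15+27', '42'), ('40+5', '45')]
kept = star_round(weak_adder, dataset)
print([q for q, _, _ in kept])  # ['12+34', '40+5']
```

Only the traces that reach the gold answer survive the filter, so the carry question `15+27` is dropped. The original STaR recipe also retries such failures with the correct answer given as a hint, which this sketch leaves out.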

### 2.2: ORM vs PRM + RL
Training a Reward Model (RM) allows large language models to be improved through reinforcement learning (RL). Instead of fine-tuning directly on examples, we train a separate model that can score or rank model outputs, and use those scores as feedback signals to refine the policy model.

The two main reward-modeling approaches are the Outcome Reward Model (ORM), which predicts a scalar reward for the final answer, and the Process Reward Model (PRM), which scores each reasoning step rather than only the outcome.



| Approach | What it does | When to use |
|-----------|-------------|-------------|
| *Outcome Reward Model (ORM)* | Predicts one scalar reward for the final answer | Training labels are easy to collect with automatic verifiers |
| *Process Reward Model (PRM)* | Predicts a reward for each reasoning step | Step-level labels are harder to collect, but the feedback is finer-grained |
| *RLHF* | Uses a reward model as the reward in **RL** fine-tuning | Aligns the model policy with human or synthetic preferences |




In [None]:
# for round = 1 ... iters:
    # STEP 1:  Generate reasoning
        # sample a minibatch of questions
        # policy roll-out (actions + log-probs)
    # STEP 2:  Score the trajectory
        # ORM: scalar reward for the final answer / PRM: scalar reward for the thought process
    # STEP 3:  Reinforce the policy (PPO)
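
The difference between the two scoring schemes in STEP 2 is easiest to see on a single trajectory. The sketch below is a toy illustration, not a trained reward model: `step_is_valid` is a rule-based stand-in for a learned PRM, and all function names are hypothetical. The example trajectory contains a wrong intermediate step but still lands on the right final answer.

```python
import operator
from typing import Callable, List

OPS = {'+': operator.add, '-': operator.sub, '*': operator.mul, '/': operator.truediv}

def step_is_valid(step: str) -> bool:
    """Toy checker for steps written as 'a op b = c' (stand-in for a learned PRM)."""
    lhs, rhs = step.split('=')
    a, op, b = lhs.split()
    return OPS[op](float(a), float(b)) == float(rhs)

def orm_reward(final_answer: str, gold: str) -> float:
    """Outcome reward: one scalar per trajectory, based only on the final answer."""
    return 1.0 if final_answer.strip() == gold.strip() else 0.0

def prm_rewards(steps: List[str], checker: Callable[[str], bool] = step_is_valid) -> List[float]:
    """Process reward: one scalar per reasoning step."""
    return [1.0 if checker(s) else 0.0 for s in steps]

# "A car travels 120 miles in 2h, how far in 5h?" with a faulty first step
steps = ['120 / 2 = 50', '60 * 5 = 300']
print(orm_reward('300', '300'))  # 1.0
print(prm_rewards(steps))        # [0.0, 1.0]
```

The ORM gives this trajectory full credit, while the per-step rewards flag the faulty division. That extra signal is what makes step-level labels worth their higher collection cost.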

### 2.3 Inspect a reasoning model

Now that we've discussed how reasoning models are trained, let's see one in action. We'll use **DeepSeek-R1**, a reasoning model that produces explicit *thinking tokens* before giving its final answer. The model wraps its internal chain-of-thought inside `<think>...</think>` tags, followed by a clean final response.

In the cell below we send a question to DeepSeek-R1 and parse the output to separate:
- **Thinking tokens**: the model's internal reasoning process (hidden from the end user in production).
- **Final answer**: the polished response the user actually sees.

We use `deepseek-r1:1.5b` here for speed. You can switch to `deepseek-r1:8b` for higher-quality reasoning, but it will take longer to run. Pull whichever variant you want to try with `ollama pull` before running the cell.

In [30]:
# --- Section 2.3: Inspecting DeepSeek-R1 ---
import re
from openai import OpenAI
client = OpenAI(api_key='ollama', base_url='http://localhost:11434/v1')
r = client.chat.completions.create(model='deepseek-r1:1.5b', messages=[{'role': 'user', 'content': 'Sum first 10 primes'}]).choices[0].message.content
# Non-greedy match with re.S so the trace can span multiple lines
think = re.search(r'<think>(.*?)</think>', r, re.S)
# If `content` has no <think> block (for example, because the server returned the
# reasoning separately or the model skipped it), this prints 'No trace'
print('THINKING:', think.group(1).strip() if think else 'No trace')
print('ANSWER:', re.sub(r'<think>.*?</think>', '', r, flags=re.S).strip())


THINKING: No trace
ANSWER: To find the first \(10\) prime numbers, follow these steps:

1. **Understand What a Prime Number Is**:
   - A prime number is a natural number greater than \(1\) that has no positive divisors other than \(1\) and itself.

2. **List Out the First 10 Odd Numbers** since even numbers (other than \(2\)) are not primes:

   \[
   3, 5, 7, 9, 11, 13, 15, 17, 19, 21
   \]

3. **Identify Which Of These Are Primes**:
   
   - **\(3\)**: Divisible by \(1\) and \(3\). Prime.
   - **\(5\)**: Divisible by \(1\) and \(5\). Prime.
   - **\(7\)**: Divisible by \(1\) and \(7\). Prime.
   - **\(9\)**: Divisible by \(1, 3,\) and \(9\). Not prime.
   - **\(11\)**: Divisible only by \(1\) and \(11\). Prime.
   - **\(13\)**: Divisible only by \(1\) and \(13\). Prime.
   - **\(15\)**: Divisible by \(1, 3,\) and \(15\). Not prime.
   - **\(17\)**: Divisible only by \(1\) and \(17\). Prime.
   - **\(19\)**: Divisible only by \(1\) and \(19\). Prime.
   - **\(21\)**: Divisible by \(1,
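
Because the sample run above printed `No trace`, it is useful to check the parsing logic on a string where the expected result is known. The sketch below does this offline. `split_thinking` is a hypothetical helper, and the sample text is hand-written rather than real model output.

```python
import re
from typing import Tuple

THINK_RE = re.compile(r'<think>(.*?)</think>', re.S)

def split_thinking(text: str) -> Tuple[str, str]:
    """Return (thinking, answer); thinking is '' when no <think> block is present."""
    thinking = '\n'.join(m.strip() for m in THINK_RE.findall(text))
    answer = THINK_RE.sub('', text).strip()
    return thinking, answer

sample = '<think>2+3+5+7+11+13+17+19+23+29 = 129</think>\nThe sum of the first 10 primes is 129.'
thinking, answer = split_thinking(sample)
print('THINKING:', thinking)  # THINKING: 2+3+5+7+11+13+17+19+23+29 = 129
print('ANSWER:', answer)      # ANSWER: The sum of the first 10 primes is 129.
```

The sample also gives the reference answer for the prompt: the first ten primes are 2, 3, 5, 7, 11, 13, 17, 19, 23, and 29, which sum to 129.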

---  
# 3- A Deep Research Agent

A deep-research agent pairs a reasoning model with external tools for web search and retrieval. We will follow the ReAct pattern: the model writes short thoughts, decides when to call tools, reads observations, and continues reasoning until it can answer or reaches a step limit.

The agent loop has three stages (reason → tool → observation):

1. The model reasons and decides to use tools
2. The agent searches and feeds condensed snippets back as context
3. Iterate until the model answers or hits a step limit

We use `create_agent` from `langchain.agents`, which builds a ReAct-style agent graph. Note: the agent model must support **tool calling** (e.g., `llama3.2:3b`). Models like `deepseek-r1` are reasoning models that do not support native tool calling and cannot be used directly as the agent LLM. Stick to `llama3.2:3b` or `qwen2.5:3b-instruct` for this section.
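
Before handing the loop to a library, it can help to see reason → tool → observation written out by hand. The sketch below is a simplified illustration and does not reflect how `create_agent` is implemented. `react_loop` is a hypothetical helper, and the scripted model and fake search tool are stand-ins so the example runs offline.

```python
from typing import Callable, Dict, List

def react_loop(
    model: Callable[[List[dict]], dict],
    tools: Dict[str, Callable[[str], str]],
    question: str,
    max_steps: int = 3,
) -> str:
    messages = [{'role': 'user', 'content': question}]
    for _ in range(max_steps):
        reply = model(messages)                             # reason: decide what to do next
        if reply.get('tool') is None:                       # no tool requested -> final answer
            return reply['content']
        observation = tools[reply['tool']](reply['input'])  # act: run the requested tool
        messages.append({'role': 'assistant', 'content': reply['content']})
        messages.append({'role': 'tool', 'content': observation})  # observe: feed result back
    return 'Step limit reached without a final answer.'

# Scripted stand-ins (assumptions for illustration only)
def fake_search(query: str) -> str:
    return 'Snippet: the Eiffel Tower is about 330 m tall.'

def scripted_model(messages: List[dict]) -> dict:
    if messages[-1]['role'] == 'tool':
        return {'content': 'About 330 m, according to the search snippet.', 'tool': None}
    return {'content': 'I should look this up.', 'tool': 'search', 'input': 'Eiffel Tower height'}

print(react_loop(scripted_model, {'search': fake_search}, 'How tall is the Eiffel Tower?'))
# About 330 m, according to the search snippet.
```

The step limit matters in practice: a small model can keep requesting searches indefinitely, so the loop needs a hard stop.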

In [45]:
# --- Section 3: Deep Research Agent (Search Tool) ---
from ddgs import DDGS
from langchain_core.tools import tool
@tool
def ddg_search(query: str, k: int = 5) -> str:
    """Search web and return snippets."""
    with DDGS() as ddgs:
        # Run the search once and reuse the results for both logging and the return value
        results = list(ddgs.text(query, max_results=k))
    print("DDGT links:", results)
    return '\n\n'.join(r['body'] for r in results)


In [46]:
# --- Section 3: Deep Research Agent (ReAct Logic) ---
# We use the simplified create_agent API (version 1.2.8) suitable for this environment.
from langchain.agents import create_agent
from langchain_ollama import ChatOllama

# 1. Initialize the model
# Note: In this version, create_agent expects the model object or a provider string.
llm = ChatOllama(model='qwen2.5:3b-instruct', temperature=0)

# 2. Build the ReAct agent graph
graph = create_agent(
    model=llm,
    tools=[ddg_search],
    system_prompt='You are a deep research assistant. Use the search tool to find information.'
)

# 3. Execute the agent
print('Agent is researching...')
inputs = {'messages': [{'role': 'user', 'content': 'What are the best resources to learn machine learning in 2025?'}]}
# In this graph-based version, we invoke and look at the final message
result = graph.invoke(inputs)

print('\nFinal Research Answer:')
# The final answer is in the last message of the state
print(result['messages'][-1].content)


Agent is researching...
DDGT links: [{'title': 'Harvard Business School - Artificial Intelligence Course', 'href': 'https://www.bing.com/aclick?ld=...

In [36]:
result

{'messages': [HumanMessage(content='What are the best resources to learn machine learning in 2025?', additional_kwargs={}, response_metadata={}, id='3c1bc54f-5a3f-4e6a-b0c8-b43357691d82'),
  AIMessage(content="To provide you with the most relevant and up-to-date information on resources for learning machine learning in 2025, we might need to conduct a search. Let's use a tool to find recent educational materials and trends.\n", additional_kwargs={}, response_metadata={'model': 'qwen2.5:3b-instruct', 'created_at': '2026-02-14T14:24:25.627493238Z', 'done': True, 'done_reason': 'stop', 'total_duration': 2005088932, 'load_duration': 256315894, 'prompt_eval_count': 166, 'prompt_eval_duration': 75605400, 'eval_count': 108, 'eval_duration': 1238611574, 'logprobs': None, 'model_name': 'qwen2.5:3b-instruct', 'model_provider': 'ollama'}, id='lc_run--019c5c89-d844-7cb3-ab28-ba5ab35d300b-0', tool_calls=[{'name': 'ddg_search', 'args': {'k': 3, 'query': 'best resources to learn machine learning 2025

# 4- (Optional) Multi-Agent Deep Research

Instead of a single agent, we can design multiple collaborating agents that work in parallel:

1. **Planner**: Analyzes the query and breaks it into sub-questions
2. **Researchers**: Run in parallel, each searching and summarizing findings for one sub-question  
3. **Synthesizer**: Combines all research into a coherent final report

This setup improves coverage and speed by parallelizing the research phase.

In [42]:
# --- Section 4: Multi-Agent Deep Research ---
from concurrent.futures import ThreadPoolExecutor
from openai import OpenAI
from ddgs import DDGS
client = OpenAI(api_key='ollama', base_url='http://localhost:11434/v1')
MODEL = 'llama3.2:3b'

def plan_research(query: str):
    # Planner agent: breaks query into sub-questions.
    p = f'Break research query into 3 sub-questions: {query}. Respond with questions only.'
    res = client.chat.completions.create(model=MODEL, messages=[{'role': 'user', 'content': p}]).choices[0].message.content
    return [q.strip() for q in res.split('\n') if q.strip()][:3]

def search_and_summarize(sub_question: str):
    # Researcher agent: searches web and summarizes findings.
    with DDGS() as ddgs:
        snippets = ' '.join([r['body'] for r in list(ddgs.text(sub_question, max_results=2))])
    p = f'Summarize for {sub_question}: {snippets}'
    summary = client.chat.completions.create(model=MODEL, messages=[{'role': 'user', 'content': p}]).choices[0].message.content
    return {'question': sub_question, 'summary': summary}

def deep_research(query: str):
    # Orchestrator: plan, research the sub-questions in parallel, then synthesize a report.
    sub_questions = plan_research(query)
    with ThreadPoolExecutor() as exe:
        # exe.map returns results in the same order as sub_questions
        findings = list(exe.map(search_and_summarize, sub_questions))
    context = '\n'.join([f'Q: {f["question"]}\nA: {f["summary"]}' for f in findings])
    return client.chat.completions.create(model=MODEL, messages=[{'role': 'user', 'content': f'Report on {query}:\n{context}'}]).choices[0].message.content

print(deep_research('AI Trends 2025'))


Based on the provided information, here's a report on AI trends for 2025:

**Summary**

The year 2025 promises to be a transformative period for artificial intelligence (AI) as emerging technologies continue to shape industries and push boundaries. This report highlights several key developments that are expected to impact various sectors and highlight opportunities and challenges.

**Key Trend 1: Collaboration between Humans and AI-generated Content**

In the realm of content generation, we can expect a new collaboration model where humans work alongside AI systems to create high-quality, original content. While AI has made significant strides in producing engaging content, creative human input remains essential for sparking innovation and unique perspectives.

**Trend 2: Edge AI for Real-time Processing and Analytics**

Edge AI technology is on the cusp of revolutionizing real-time processing, analytics, and decision-making across various industries. By enabling real-time reasoning, 

## 🎉 Congratulations!

You have:
* Practiced various inference-time reasoning methods (CoT, self-consistency, sequential revision, ToT)
* Gained intuition about training reasoning models (STaR, ORM/PRM)
* Built a **deep-research agent** with tool calling and ReAct-style reasoning
* Implemented a **multi-agent system** with parallel research and report synthesis


👏 **Great job!** Take a moment to celebrate. The techniques you implemented here power many production agents and chatbots.