# Project 4: **Build a Deep Research System**
Welcome to project 4! For this project, we shift our focus from tool use and agents to *reasoning* models. You will practice state‚Äëof‚Äëthe‚Äëart inference‚Äëtime scaling methods such as *Chain‚Äëof‚ÄëThought* prompting and *Tree‚Äëof‚ÄëThoughts*, and briefly explore high-levels of training reasoning models using techniques like **STaR**.


Finally, you will put everything together to build a *deep research agent* that can browse the web, reason over what it finds, and give structured answers.

## Learning Objectives  
* Apply common inference‚Äëtime scaling methods: **zero‚Äëshot / few‚Äëshot CoT, self‚Äëconsistency, sequential decoding, tree‚Äëof‚Äëthoughts**  
* Gain intuition for **training** reasoning‚Äëcapable models following **STaR** approach 
* Build a minimal **deep‚Äëresearch agent** that combines step‚Äëby‚Äëstep reasoning with live web search   
* Practice extending deep-search to a multi-agent system 

## Roadmap  
1. Environment setup  
2. Inference‚Äëtime scaling  
   2.1 Few‚Äëshot & zero‚Äëshot‚ÄØCoT  
   2.2 Self‚Äëconsistency
   2.3 Sequential revisions  
   2.4 Tree‚Äëof‚ÄëThought
3. STaR for training models for reasoning  
4. Deep-research agent  
5. (Optional) Multi-agent deep-research

# 1‚Äë Environment setup

## 1.1- Conda environment

Before we start coding, you need a reproducible setup. Open a terminal in the same directory as this notebook and run:

```bash
# Create and activate the conda environment
conda env create -f environment.yaml && conda activate deep_research

# Register this environment as a Jupyter kernel
python -m ipykernel install --user --name=deep_research --display-name "deep_research"
```
Once this is done, you can select "deep_research" from the Kernel ‚Üí Change Kernel menu in Jupyter or VS Code.

## 1.2 Ollama setup

In this project we use the `llama3.2:3b` and `deepseek-r1:8b` models. You can try other smaller or larger reasoning LLMs such as `qwen2.5:3b-instruct` or `phi4-mini` to compare performance. Explore available models here: https://ollama.com/library.

```bash
ollama pull llama3.2:3b
ollama pull deepseek-r1:8b
# Additional small reasoning models to compare
# ollama pull qwen2.5:3b-instruct
# ollama pull phi4-mini

```

`ollama pull` downloads the model so you can run it locally without API calls.

---  
# 2‚Äë Inference‚Äëtime scaling

Inference-time scaling refers to techniques that make an existing model reason better without retraining it. Instead of changing the model‚Äôs weights, we achieve reasoning capability by adjusting how we prompt, sample, or aggregate LLM's outputs.

In this section, we‚Äôll explore several inference-time strategies that improve reasoning quality using a non-reasoning base model. You will experiment with and compare methods such as:

- Few-shot Chain-of-Thought (CoT)
- Zero-shot CoT
- Self-consistency
- Sequential revision
- Tree-of-Thoughts (ToT)

### 2.1: Few‚ÄëShot CoT
Few-shot prompting helps a model reason by showing one or multiple examples before asking a new question. By observing the pattern of reasoning and final answers, the model learns how to structure its own reasoning process on the new input.

In this exercise, you will create a prompt that includes a few example Q&A pairs demonstrating step-by-step reasoning. Then, you will feed a new question and see the model‚Äôs output.

In [1]:
# Step 1: Write a few examples showing reasoning steps
# Step 2: Write your new question
# Step 3: Concatenate examples + new question into a single prompt
# Step 4: Call your Ollama or OpenAI client to get a response from llama3.2:3b # e.g., client.chat.completions.create(...)
# Step 5: Print the final answer

from openai import OpenAI

examples = [
    "Q: Why is the sky blue?\nA: Because molecules scatter shorter wavelengths more.",
    "Q: What is 2+2?\nA: 4",
]
new_question = "Explain overfitting vs underfitting briefly."
prompt = "\n\n".join(examples + [f"Q: {new_question}\nA:"])
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
resp = client.chat.completions.create(model="llama3.2:3b", messages=[{"role":"user","content":prompt}])
print(resp.choices[0].message.content.strip())

Overfitting and underfitting are common issues in machine learning modeling:

**Underfitting**: When a model is too simple to capture the underlying patterns in the data, resulting in poor performance. The model fails to learn the essential features of the data, leading to high error rates.

**Overfitting**: Conversely, when a model is too complex and memorizes the training data, fitting it extremely closely but failing to generalize well to new, unseen data. This results in good performance on the training set but poor performance on the test set, due to capturing random noise instead of underlying patterns.

Both underfitting and overfitting are undesirable outcomes; the optimal scenario is to find a model with sufficient complexity to capture important features while avoiding excessive memorization of noise.


### (Optional) Few-shot CoT on GPT2
GPT-2 is a pre-trained language model without instruction tuning. It continues text rather than answering questions. In this section, you'll try the exact same CoT pattern on GPT-2 and observe what happens. The goal is to test whether few-shot CoT alone can elicit structured reasoning from a non-chat LLM.

In [2]:
import os
import torch
from transformers import pipeline

# Step 1: Load GPT-2 text-generation from huggingface (https://huggingface.co/docs/transformers/en/model_doc/gpt2)
# Step 2: Write 1‚Äì2 few-shot reasoning examples (short, explicit steps + final answer in your own unique format)
# Step 3: Append a new test question after the examples to form one prompt string
# Step 4: Generate 1‚Äì3 completions with different decoding settings (e.g., greedy vs. top-k)
# Step 5: Print raw outputs; check if steps are followed and if the final answer is correct

gen = pipeline("text-generation", model="gpt2")
examples = [
    "Question: What is 12+7?\nSteps: 12+7=19\nFinal: 19",
    "Question: A car goes 60 km in 2 h. Speed?\nSteps: speed=60/2=30 km/h\nFinal: 30 km/h",
]
question = "If x=5 and y=3, what is x^2 + y?"
prompt = "\n\n".join(examples + [f"Question: {question}\nSteps:", "Final:"])

outs = []
outs.append(gen(prompt, max_new_tokens=80, do_sample=False)[0]["generated_text"])  # greedy
outs.append(gen(prompt, max_new_tokens=80, do_sample=True, top_k=50, temperature=0.9)[0]["generated_text"])  # top-k
outs.append(gen(prompt, max_new_tokens=80, do_sample=True, top_p=0.9, temperature=0.8)[0]["generated_text"])  # nucleus

for i, o in enumerate(outs, 1):
    print(f"=== Output {i} ===\n{o}\n")

Device set to use mps:0
The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


=== Output 1 ===
Question: What is 12+7?
Steps: 12+7=19
Final: 19

Question: A car goes 60 km in 2 h. Speed?
Steps: speed=60/2=30 km/h
Final: 30 km/h

Question: If x=5 and y=3, what is x^2 + y?
Steps:

Final: x=5/2=5

Question: If x=5 and y=3, what is x^2 + y?

Steps:

Final: x=5/2=5

Question: If x=5 and y=3, what is x^2 + y?

Steps:

Final: x=5/2=

=== Output 2 ===
Question: What is 12+7?
Steps: 12+7=19
Final: 19

Question: A car goes 60 km in 2 h. Speed?
Steps: speed=60/2=30 km/h
Final: 30 km/h

Question: If x=5 and y=3, what is x^2 + y?
Steps:

Final:

Question: When you are at the top of a hill like a mountain, this causes you to walk up it, at the bottom it causes you to walk down it, and, at the top the two things start to cross each other. How does that happen?

=== Output 3 ===
Question: What is 12+7?
Steps: 12+7=19
Final: 19

Question: A car goes 60 km in 2 h. Speed?
Steps: speed=60/2=30 km/h
Final: 30 km/h

Question: If x=5 and y=3, what is x^2 + y?
Steps:

Final: x=5/3 = 3.

### 2.2: Zero‚ÄëShot Chain‚Äëof‚ÄëThought
Zero-shot CoT encourages the model to reason without examples by adding a short cue such as ‚ÄúLet‚Äôs think step by step.‚Äù This simple phrase often activates the model‚Äôs latent reasoning ability even when no demonstrations are provided. It serves as a baseline to compare with few-shot and other inference-time scaling methods.

In [3]:
from openai import OpenAI

# Step 1: Write the question and a zero-shot CoT cue (e.g., "Let's think step by step.")
# Step 2: Build a single prompt string that includes brief role guidance plus the question
# Step 3: Call your Ollama or OpenAI client to get a response from llama3.2:3b  # e.g., client.chat.completions.create(...)
# Step 4: Print the chain and the final answer

question = "If a store has 25 apples and sells 7, then gets 15 more, how many apples are there now?"
role = "You are a helpful assistant that solves problems using clear reasoning."
cot_cue = "Let's think step by step."
prompt = f"{role}\n\n{cot_cue}\n\n{question}"

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
resp = client.chat.completions.create(model="llama3.2:3b", messages=[{"role":"user","content":prompt}])
print(resp.choices[0].message.content)

To find out the total number of apples in the store, let's break it down:

1. The store initially had 25 apples.
2. They sold 7 apples.
3. To account for the sale, we subtract 7 from 25: 25 - 7 = 18
4. Then, they received 15 more apples.

To find the new total, we simply add the remaining apples (after the sale) to the newly arrived apples:

18 + 15 = 33

So, now the store has a total of 33 apples.


### 2.3 Self‚ÄëConsistency
Self-consistency enhances reasoning accuracy by sampling multiple independent reasoning paths for the same question instead of relying on a single deterministic answer. Each run may follow a slightly different logical chain, and the diversity helps correct individual mistakes. After generating several reasoning traces, you then aggregate the final answers using majority voting.

This approach is especially useful when tasks involve multi-step reasoning or arithmetic, where single-path outputs may be incorrect.

In [5]:
from openai import OpenAI
import re, collections

client = OpenAI(api_key = "ollama", base_url = "http://localhost:11434/v1")
MODEL = "llama3.2:3b"

def cot_answer(question, temperature=1.0):
    # Generate a step-by-step reasoning chain for the given question and extract the final answer.
    prompt = f"Let's think step by step.\n\n{question}"
    resp = client.chat.completions.create(model=MODEL, messages=[{"role":"user","content":prompt}], temperature=temperature)
    text = resp.choices[0].message.content
    # Look for the last number in the reasoning chain, prioritizing numbers near the end
    matches = re.findall(r'(?:answer|result|equals?|is|so|therefore)\s*[:=]?\s*([0-9.]+)', text, re.I)
    return text, matches[-1] if matches else None

def self_consistent(question, n=10):
    # Run multiple reasoning chains and select the most frequent final answer by majority voting.
    answers = []
    for i in range(n):
        _, ans = cot_answer(question, temperature=0.8)
        if ans:
            answers.append(ans)
    counter = collections.Counter(answers)
    return counter.most_common(1)[0] if counter else (None, counter)


question = "What is the square root of 144?"
winner, counter = self_consistent(question)
print("Votes:", counter)
print("Chosen answer:", winner)

Votes: 6
Chosen answer: 12.


### 2.4: Sequential Revision

Sequential revision iteratively improves an answer by generating a first draft, critiquing it, and producing revised drafts that condition on prior answers. Each round should be short and focused, so improvements accumulate without drifting from the question.

In [6]:
MODEL = "llama3.2:3b"

def sequential_revision(question: str, max_steps: int = 3) -> str:
    # Generate an initial draft answer, then iteratively refine it by conditioning each revision on the previous one.
    client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
    # Step 1: Ask the model to produce the first draft for the given question
    draft = client.chat.completions.create(
        model=MODEL,
        messages=[{"role":"user","content":f"{question}"}]
    ).choices[0].message.content
    print(f"[Draft 1]\n{draft}\n")

    # Step 2: Loop for max_steps-1 times, each time feeding the last draft back to the model with a request to revise
    for i in range(2, max_steps + 1):
        revised = client.chat.completions.create(
            model=MODEL,
            messages=[
                {"role":"user","content":f"{question}"},
                {"role":"assistant","content":draft},
                {"role":"user","content":"Please revise and improve this answer. Make it clearer and more thorough."}
            ]
        ).choices[0].message.content
        print(f"[Draft {i}]\n{revised}\n")
        draft = revised

    return draft
    # Step 3: Print each draft to observe how the answer evolves
    # Step 4: Return the final improved draft


# Step 1: Define a question that benefits from multi-step reasoning
question = "Explain why regularization helps prevent overfitting in machine learning."
# Step 2: Call sequential_revision(question, max_steps)
final_answer = sequential_revision(question, max_steps=3)
# Step 3: Print the final output
print("=== Final Answer ===")
print(final_answer)

[Draft 1]
Regularization, also known as L1 or L2 regularization, is a technique used to prevent overfitting in machine learning models. Overfitting occurs when a model becomes too complex and performs well on the training data but fails to generalize well to new, unseen data.

Here's why regularization helps:

1.  **Reduces capacity**: Regularization adds constraints to the model that reduce its capacity. This means less flexibility is available for the model to create complex patterns in the training data.
2.  **Enforces sparsity or low dimensionality**: L1-regularized models (also known as Lasso) enforce sparsity by setting coefficients to zero unless they belong to a small subset of variables. In contrast, L2-regularized models (Ridge regression) do not set any variable to zero but penalize larger weights.
3.  **Regularizes coefficients**: Coefficient vectors in the parameters are often treated as vectors and regularized together with other model terms to keep all of them at a certa

### 2.5 Tree‚Äëof‚ÄëThoughts
Tree-of-Thoughts reframes reasoning as a search process rather than a single forward chain.
Instead of producing one linear sequence of thoughts, the model generates multiple candidate thoughts at each step, evaluates their promise, and then expands only the best few. This allows exploration of different reasoning paths before committing to a final answer, similar to how humans brainstorm, prune, and refine ideas.


In this section, you‚Äôll experiment with two simplified versions of ToT:
1. Word Ladder puzzle solver: a small example where each ‚Äúthought‚Äù is a candidate word transition.
2. Generic ToT search (depth 2, width 2): a minimal logic to expand, evaluate, and select reasoning branches

In [None]:
###### Word Ladder Puzzle ##########

def neighbors(word, vocabulary):
    # Generate all valid one-letter mutations of 'word' that exist in 'vocabulary' and return them.
    result = []
    for i in range(len(word)):
        for char in 'abcdefghijklmnopqrstuvwxyz':
            candidate = word[:i] + char + word[i+1:]
            if candidate in vocabulary and candidate != word:
                result.append(candidate)
    return result


def tree_of_thought(start, goal, vocab, max_depth=5, beam_width=4):
    # Search over partial thoughts (paths) using a small beam.
    # Step 1: Initialize the frontier with a single path [start]
    # Step 2: For each depth, expand each path by one neighbor from 'neighbors'
    # Step 3: Score paths by edit distance between last word and 'goal' (smaller is better)
    # Step 4: Keep the top 'beam_width' paths and stop early if any reaches 'goal'
    # Step 5: Return the best goal-reaching path or None
    from difflib import SequenceMatcher

    frontier = [[start]]

    for depth in range(max_depth):
        candidates = []
        for path in frontier:
            if path[-1] == goal:
                return path
            for neighbor in neighbors(path[-1], vocab):
                if neighbor not in path:
                    candidates.append(path + [neighbor])

        if not candidates:
            return None

        scores = [(path, len(path[-1]) - sum(a==b for a,b in zip(path[-1], goal)))
                  for path in candidates]
        scores.sort(key=lambda x: x[1])
        frontier = [path for path, _ in scores[:beam_width]]

    return None


vocab = {"hit","dot","cog","log","dog","lot","lit","hot"}
print(tree_of_thought("hit", "cog", vocab)) # one candidate solution: ['hit', 'hot', 'dot', 'dog', 'cog']


['hit', 'hot', 'dot', 'dog', 'cog']


In [None]:
###### Generic ToT Search ##########

import re

MODEL = "llama3.2:3b"

def propose_thoughts(question, state, k=2):
    # Propose up to k next ‚Äúthoughts‚Äù that extend the current partial solution/state.
    # Steps: build a short prompt with problem + current state; call your client with n=k. Then return a list of stripped strings (‚â§ k).
    from openai import OpenAI
    client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

    prompt = f"Question: {question}\nCurrent thinking: {state}\nPropose {k} next thoughts to continue solving this."
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role":"user", "content":prompt}],
        n=k,
        temperature=0.9
    )
    return [choice.message.content.strip() for choice in resp.choices]


def score_state(question, state):
    # Score how promising a partial solution is on a 1‚Äì10 scale (higher is better).
    # Steps: build a rating prompt; call the model; parse the first integer 1‚Äì10;
    from openai import OpenAI
    client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

    prompt = f"Question: {question}\nProposed solution: {state}\nRate how promising this is from 1-10:"
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role":"user", "content":prompt}]
    )
    match = re.search(r'\b([1-9]|10)\b', resp.choices[0].message.content)
    return int(match.group(1)) if match else 5


def tree_of_thoughts(question, depth=2, width=2):
    # Run a tiny ToT search: expand states with propose_thoughts, score with score_state, keep top-k at each depth.
    # Steps: initialize frontier=[("", 0)]; for each depth, expand each state with k=width thoughts; score each; sort by score desc; keep top 'width'; return best state and score.
    frontier = [("", 0)]

    for d in range(depth):
        candidates = []
        for state, _ in frontier:
            thoughts = propose_thoughts(question, state, k=width)
            for thought in thoughts:
                new_state = state + " " + thought if state else thought
                candidates.append((new_state, score_state(question, new_state)))

        candidates.sort(key=lambda x: x[1], reverse=True)
        frontier = candidates[:width]

    return frontier[0] if frontier else ("", 0)


question = "Design a plan for a weekend science workshop for 12-year-olds."
solution, score = tree_of_thoughts(question)

print(f"Best solution (score {score}):\n{solution}")

Best solution (score 9):
**Weekend Science Workshop Plan for 12-Year-Olds**

**Workshop Title:** "Curious Minds: Exploring the World of Science"

**Objective:** To provide an engaging and interactive science experience for 12-year-olds, fostering curiosity, critical thinking, and problem-solving skills.

**Duration:** Saturday and Sunday (6 hours each day)

**Age Group:** 12-year-olds

**Number of Participants:** 10-15 students

**Workshop Schedule:**

**Day 1 (Saturday):**

9:00 am - 9:30 am: Welcome and Introduction
   - Icebreaker games to get participants comfortable with each other.
   - Brief overview of the workshop's objectives and agenda.

9:30 am - 10:30 am: **Physics in Action**
   - Hands-on activities exploring motion, forces, and energy (e.g., marble runs, ball toss).
   - Introduce basic physics concepts through experiments and discussions.

10:30 am - 10:50 am: Break
   - Snack time and opportunity for participants to ask questions or share experiences.

10:50 am - 12:0

---  
# 3‚Äë Training Models for Reasoning

### 3.1: CoT Training
Chain-of-Thought (CoT) training conditions the model on explicit rationales during fine-tuning. Instead of teaching the model to output only the final answer, we train on (question, rationale, answer) so the model learns to internalize multi-step reasoning patterns. A practical recipe is STaR (Self-Taught Reasoner), which uses a stronger teacher model to bootstrap rationales that a smaller student can learn from.

For tasks that require multi-hop reasoning, models fine-tuned on rationales often achieve higher accuracy and are more stable at inference time than models trained on direct answers only. 

Training a full language model is beyond the scope of this notebook, but here is the high-level workflow followed by a short pseudocode:
- Collect questions: Prepare a dataset of questions and correct answers.
- Generate rationales: Use a strong LLM to produce step-by-step reasoning ending with the correct answer.
- Filter and clean: Discard incorrect or low-quality rationales.
- Prepare training data: Format triples (question, rationale, answer) for supervised fine-tuning.
- Fine-tune: Fine-tune the LLM on rationales.
- Iterate: Refine prompts, improve data quality, and retrain for stronger reasoning.

In [None]:
# Pseudocode (STaR loop)
# for round in 1 ... iters:
    # STEP 1: self-generate reasoning (teacher creates rationale + answer)
    # STEP 2: keep only correct, high-quality traces
    # STEP 3: fine-tune student on (question, rationale, answer) data

### 3.2: ORM¬†vs¬†PRM¬†+ RL
Training a Reward Model (RM) allows large language models to be improved through reinforcement learning (RL). Instead of fine-tuning directly on examples, we train a separate model that can score or rank model outputs, and use those scores as feedback signals to refine the policy model.

Two main reward modeling approaches are ORM (predicts a scalar reward for the final answer) and PRM (evaluates the reasoning steps instead of just the outcome)



| Approach | Typical loss | When to use |
|-----------|-------------|-------------|
|*Outcome Reward Model* | Predict scalar reward | Easy to collect training data using verifiers |
|*Process Reward Model* | Predict rewards per step | Difficult to collect training data but more accurate |
| *RLHF* | Use RM as reward in **RL** fine‚Äëtuning | Aligns policy with human signals | Aligns model policy with human or synthetic preferences




In [None]:
# for round = 1 ... iters:
    # STEP 1:  Generate reasoning
        # sample a minibatch of questions
        # policy roll‚Äëout (actions + log‚Äëprobs)
    # STEP 2:  Score the trajectory
        # ORM: scalar reward for the final answer / PRM: scalar reward for the thought process
    # STEP 3:  Reinforce the policy (PPO)

---  
# 4‚Äë A Deep Research Agent

A deep-research agent pairs a reasoning model (e.g., deepseek-r1) with external tools for web search and retrieval. We will follow the ReAct pattern: the model writes short thoughts, decides when to call tools, reads observations, and continues reasoning until it can answer or reaches a step limit.

We now combine a **search tool** with a reasoning model (e.g., `deepseek-r1`) in a multi-step setup. We follow the *ReAct* pattern (reason ‚Üí tool ‚Üí observation):

1. The model reasoins and decides to use tools
2. The agent searches and feed condensed snippets back as context
3. Iterate until the model answers or hits a step limit

We use `AgentType.OPENAI_FUNCTIONS`, which hides the loop inside the LangChain agent.

In [11]:
from ddgs import DDGS
from langchain_community.tools import Tool

def ddg_search(query: str, k: int = 5) -> str:
    # Use DDGS to run a simple web search and return joined snippets.
    results = []
    with DDGS() as ddgs:
        for r in ddgs.text(query, max_results=k):
            results.append(r.get("body", ""))
    return "\n\n".join(results)

search_tool = Tool(
    name="DuckDuckGo Search",
    func=ddg_search,
    description="Search the public web. Input: a plain English query. Returns: concatenated snippets."
)


In [17]:
try:
    from langchain_ollama import ChatOllama
except ImportError:
    from langchain_community.chat_models import ChatOllama

try:
    from langchain.agents import initialize_agent, AgentType
    USE_AGENT = True
except ImportError:
    USE_AGENT = False
    print("Using fallback agent implementation...")

MODEL = "deepseek-r1:8b"
question = "What are the best resources to learn machine learning in 2025?"

# Step 1: Initialize the reasoning model via ChatOllama
llm = ChatOllama(model=MODEL, base_url="http://localhost:11434")

# Step 2: Build the agent with tool access (DuckDuckGo Search) and function-calling interface
if USE_AGENT:
    agent = initialize_agent([search_tool], llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True)
    result = agent.invoke({"input": question})
    print(result["output"])
else:
    # Fallback: direct call with search
    from openai import OpenAI
    client = OpenAI(api_key="ollama", base_url="http://localhost:11434/v1")

    # First search
    search_results = ddg_search("best machine learning resources 2025", k=5)

    # Ask model with context
    prompt = f"Based on the following search results, answer: {question}\n\nSearch results:\n{search_results}"
    resp = client.chat.completions.create(model=MODEL, messages=[{"role":"user","content":prompt}])
    print(resp.choices[0].message.content)

Using fallback agent implementation...
Based on the search results, the best resources to learn machine learning in 2025 include:

1. **Andrew Ng‚Äôs Machine Learning Course**: A foundational course that introduces key concepts, though it uses Octave. It's recommended to learn Python alongside or after completing this course, as Python is crucial for most ML tasks.

2. **Dataquest (Free Python Lessons)**: Offers interactive, browser-based Python lessons that are ideal for hands-on practice and building a strong programming foundation for machine learning.

3. **Keras**: Particularly suited for quickly prototyping neural networks. It's easy to use and is a go-to library for modern deep learning projects.

4. **PyTorch and TensorFlow Courses**:
   - **PyTorch Full Course**: A comprehensive series on YouTube, designed for deep learning and machine learning.
   - **TensorFlow Tutorial Series**: Covers installation, environment setup, and advanced concepts like Neural Networks and the Tenso

# Optional (Multi-agent Deep Research)
Instead of a single multi-step agent, you can design multiple collaborating agents such as a Planner, Searcher, Summarizer, and Verifier that pass information and refine each other‚Äôs outputs. This setup improves robustness, diversity of reasoning, and division of labor.

Try building a simple setup with 2‚Äì3 agents that share goals and messages, for example Planner ‚Üí Researcher ‚Üí Writer.

In [None]:
def parallel_research(query, n=3):
    # Run n independent research runs in parallel and return their answers.
    # Steps: use ThreadPoolExecutor; submit n calls to your agent/search pipeline; gather results in order.
    """
    YOUR CODE HERE
    """

answers = parallel_research("What are the best resources to learn ML in 2025?")
for i,a in enumerate(answers,1):
    print(f"[Run {i}] {a[:200]}‚Ä¶")

## üéâ Congratulations!

* Practised various inference‚Äëtime reasoning methods
* Gained intuition about training reasoning models
* You have built a **deep-research agent**: reasoning model like deep-seek r1 + ReAct-style agent + tool use (web search)
* Try adding more tools, and extending the deep-research to a multi-agent system: many agents researching web in parallel.


üëè **Great job!** Take a moment to celebrate. The techniques you implemented here power many production agents and chatbots.