# Scaling Test-Time Compute for Longer Thinking in LLMs

_Authored by: [Sergio Paniego](https://github.com/sergiopaniego)_

🚨 **WARNING**: This notebook is **resource-intensive** and requires substantial computational power. If you’re running this in **Colab**, it will utilize an **A100 GPU**.

---

## 🧠 Extending Inference Time for Instruct LLM Systems

In this recipe, we'll guide you through extending the inference time for an **Instruct LLM system** using **test-time compute**, which enhances its ability to solve more challenging problems, such as **complex math problems**. This approach is inspired by the [**OpenAI o1-o3 models**](https://openai.com/index/learning-to-reason-with-llms/), which demonstrate how **longer reasoning time** during inference can improve model performance.

### 🔍 Concept Behind the Approach

This technique is based on insights from [this **blog post**](https://huggingface.co/spaces/HuggingFaceH4/blogpost-scaling-test-time-compute), which presents experiments validating the strategy with open models. The results show that the smaller **1B** and **3B Llama Instruct models** outperform their much larger **8B** and **70B** counterparts on the **challenging MATH-500 benchmark**, provided they are given enough **"time to think"**.

![Instruct LLM Methodology](https://huggingface.co/datasets/HuggingFaceH4/blogpost-images/resolve/main/methods-thumbnail.png)

### 🚀 New Repository and Experimentation

The blog introduces a [**new repository**](https://github.com/huggingface/search-and-learn) designed for running these experiments. In this recipe, however, we will focus on building a **small chatbot** that is capable of engaging in **longer reasoning** to tackle **harder problems** using small open models.

---

## 📚 Explore Further

As stated in the blog: `Although we don’t know how o1 was trained, recent research from DeepMind shows that test-time compute can be scaled optimally through strategies like iterative self-refinement or using a reward model to perform search over the space of solutions`. Learn more from the paper: [**Scaling LLM Test-Time Compute Optimally Can Be More Effective Than Scaling Model Parameters**](https://arxiv.org/abs/2408.03314).


# 1. Install Dependencies

Let’s start by installing the [search-and-learn](https://github.com/huggingface/search-and-learn) repository! 🚀  
This repo is designed to replicate the experimental results and is not published as a pip package. However, we can still use it to build our system by installing it from source with the following steps:

In [None]:
!git clone https://github.com/huggingface/search-and-learn

In [None]:
%cd search-and-learn
!pip install -e '.[dev]'

Log in to Hugging Face to access [meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct), as it is a gated model! 🗝️  
If you haven't previously requested access, you'll need to submit a request before proceeding.


In [None]:
from huggingface_hub import notebook_login

notebook_login()

# 2. Set Up the Large Language Model (LLM) and the Process Reward Model (PRM) 💬

As illustrated in the diagram below, the system consists of an LLM that generates intermediate answers based on user input, a [Process Reward Model (PRM)](https://huggingface.co/papers/2211.14275) that evaluates and scores these answers, and a search strategy that uses the PRM feedback to guide the subsequent steps until reaching the final answer.

Let’s begin by initializing each model. For the LLM, we’ll use the [meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct) model, and for the PRM, we’ll use the [RLHFlow/Llama3.1-8B-PRM-Deepseek-Data](https://huggingface.co/RLHFlow/Llama3.1-8B-PRM-Deepseek-Data) model.




![system](https://huggingface.co/datasets/HuggingFaceH4/blogpost-images/resolve/main/system.png)
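
A PRM scores each reasoning step rather than only the final answer, so the per-step scores have to be collapsed into a single value before solutions can be compared. The sketch below shows three common reductions (last step, minimum, product). The function name `aggregate_step_scores` and the example scores are hypothetical and are not taken from the search-and-learn code.

```python
import math


def aggregate_step_scores(step_scores, strategy="last"):
    """Collapse per-step PRM scores into one score for the whole solution."""
    if not step_scores:
        raise ValueError("step_scores must not be empty")
    if strategy == "last":
        # Trust the score of the final step only
        return step_scores[-1]
    if strategy == "min":
        # A solution is only as good as its weakest step
        return min(step_scores)
    if strategy == "prod":
        # Treat step scores as independent probabilities of being correct
        return math.prod(step_scores)
    raise ValueError(f"Unknown strategy: {strategy!r}")


# Hypothetical PRM scores for a three-step solution
scores = [0.9, 0.6, 0.8]
for name in ("last", "min", "prod"):
    print(name, round(aggregate_step_scores(scores, name), 3))
```

Which reduction works best depends on the PRM and the task, so it is worth treating the choice as a tunable setting.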

In [None]:
import torch
from vllm import LLM
from sal.models.reward_models import RLHFFlow

model_path="meta-llama/Llama-3.2-1B-Instruct"
prm_path="RLHFlow/Llama3.1-8B-PRM-Deepseek-Data"

llm = LLM(
    model=model_path,
    gpu_memory_utilization=0.5,  # Utilize 50% of GPU memory
    enable_prefix_caching=True,  # Optimize repeated prefix computations
    seed=42,                     # Set seed for reproducibility
)

prm = RLHFFlow(prm_path)

## 2.1 Instantiate the Question, Search Strategy, and Call the Pipeline

Now that we've set up the LLM and PRM, let's define the question, select a search strategy to explore candidate solutions, and call the pipeline to process the question through both models.

1. **Instantiate the Question**: In this step, we define the input question that the system will answer.

2. **Search Strategy**: The system currently supports the following search strategies: `best_of_n`, `beam_search`, and `dvts` (see diagram). For this example, we'll use `best_of_n`, but you can easily switch to either of the others. Each strategy is controlled by a set of configuration parameters; the full list is available [here](https://github.com/huggingface/search-and-learn/blob/main/src/sal/config.py).

3. **Call the Pipeline**: With the question and search strategy in place, we’ll call the inference pipeline, processing the inputs through both the LLM and PRM to generate the final answer.

![](https://huggingface.co/datasets/HuggingFaceH4/blogpost-images/resolve/main/search-strategies.png)
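
To make the difference between the strategies concrete, here is a toy sketch that contrasts best-of-N with beam search. It uses no real models: `propose_step` stands in for the LLM, `toy_prm` stands in for the PRM, and `TARGET` is an invented "correct" sequence of steps. All of these names are hypothetical, and the code only illustrates the control flow, not the search-and-learn implementation.

```python
import random

TARGET = [3, 1, 4]  # hypothetical "correct" reasoning steps


def toy_prm(steps):
    """Stand-in for a PRM: fraction of TARGET matched by the steps so far."""
    return sum(s == t for s, t in zip(steps, TARGET)) / len(TARGET)


def propose_step(rng):
    """Stand-in for the LLM: propose one random step."""
    return rng.randint(0, 9)


def best_of_n_toy(n, rng):
    # Sample n complete solutions independently, then keep the best-scored one
    candidates = [[propose_step(rng) for _ in TARGET] for _ in range(n)]
    return max(candidates, key=toy_prm)


def beam_search_toy(n, beam_width, rng):
    # Keep n // beam_width partial solutions; at every step expand each of
    # them beam_width times and prune back using the verifier score
    keep = max(1, n // beam_width)
    beams = [[] for _ in range(keep)]
    for _ in TARGET:
        expanded = [b + [propose_step(rng)] for b in beams for _ in range(beam_width)]
        beams = sorted(expanded, key=toy_prm, reverse=True)[:keep]
    return beams[0]


rng = random.Random(0)
print("best_of_n  :", best_of_n_toy(8, rng))
print("beam_search:", beam_search_toy(8, 4, rng))
```

The key design difference is where the verifier is consulted: best-of-N scores finished solutions once, whereas beam search scores partial solutions after every step, which costs more verifier calls but can discard weak paths early.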

The first step is to clearly define the question that the system will answer. This ensures that we have a precise task for the model to tackle.

In [3]:
# Raw string: without the r prefix, Python reads the "\t" in "\theta" as a tab character
question_text = r'Convert the point $(0,3)$ in rectangular coordinates to polar coordinates.  Enter your answer in the form $(r,\theta),$ where $r > 0$ and $0 \le \theta < 2 \pi.$'
input_batch = {"problem": [question_text]}

Next, we define the configuration, including parameters like the number of candidate answers `(N)`, and choose the search strategy that will be used. The search strategy dictates how we explore the potential answers. In this case, we'll use `best_of_n`.

With the question and configuration in place, we use the selected search strategy to generate multiple candidate answers, each of which is scored by the PRM.

In [4]:
from sal.config import Config
from sal.search import beam_search, best_of_n, dvts

config = Config()
config.n=32 # Number of answers to generate during the search

search_result = best_of_n(x=input_batch, config=config, llm=llm, prm=prm)

Finally, we evaluate the generated answers to select the best one. This scoring process helps us determine which answer best addresses the question.

In [6]:
from sal.utils.score import score
from datasets import Dataset


search_dataset = Dataset.from_dict(search_result)
final_result = score(search_dataset, config)


## 2.2 Display the Final Result

Once the pipeline has processed the question through the LLM and PRM, we can display the final result. This result will be the model's output after considering the intermediate answers and scoring them using the PRM.
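
Given several scored candidates, there is more than one way to pick a final answer. The sketch below compares three common rules: naive (highest single score), majority vote, and weighted vote (sum of scores per distinct answer). The function `select_answers`, the answers, and the scores are all hypothetical and chosen so that the three rules disagree.

```python
from collections import Counter, defaultdict


def select_answers(answers, scores):
    """Return the answer chosen by naive, majority, and weighted selection."""
    # Naive: the single candidate with the highest verifier score
    best_index = max(range(len(answers)), key=lambda i: scores[i])
    naive = answers[best_index]

    # Majority: the most frequent answer, ignoring scores
    majority = Counter(answers).most_common(1)[0][0]

    # Weighted: sum the scores of identical answers, then take the largest total
    totals = defaultdict(float)
    for answer, score in zip(answers, scores):
        totals[answer] += score
    weighted = max(totals, key=totals.get)

    return {"naive": naive, "majority": majority, "weighted": weighted}


# Hypothetical extracted answers and verifier scores for six candidates
answers = ["(3, pi/2)", "(3, 0)", "(3, pi/2)", "(3, pi)", "(3, 0)", "(3, 0)"]
scores = [0.80, 0.30, 0.75, 0.95, 0.20, 0.25]
print(select_answers(answers, scores))
```

In this made-up example, one overconfident outlier wins under the naive rule and a frequent but low-scored answer wins the majority vote, while the weighted rule favors the answer that is both repeated and well scored.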

Here's how to display the final answer:

In [11]:
final_result['pred'][0]

'<|start_header_id|>assistant<|end_header_id|>\n\n## Step 1: The relationship between rectangular and polar coordinates is given by:\n\\[ r = \\sqrt{x^2 + y^2} \\] and \\[ \\theta = \\arctan{\\left(\\frac{y}{x}\\right)}. \\]\n\n## Step 2: Given the point $(0, 3)$, we have $x = 0$ and $y = 3$. Plugging these into the formula for $r$, we get:\n\\[ r = \\sqrt{0^2 + 3^2} = \\sqrt{9} = 3. \\]\n\n## Step 3: Plugging the values of $x = 0$ and $y = 3$ into the formula for $\\theta$, we get:\n\\[ \\theta = \\arctan{\\left(\\frac{3}{0}\\right)} = \\arctan{\\left( \\infty \\right)}. \\]\n\n## Step 4: Since $\\arctan{\\left( \\infty \\right)}$ is equal to $\\frac{\\pi}{2}$, we have:\n\\[ \\theta = \\frac{\\pi}{2}. \\]\n\n## Step 5: Therefore, the polar coordinates of the point $(0, 3)$ are $\\left( 3, \\frac{\\pi}{2} \\right).$\n\nThe final answer is: $\\boxed{\\left( 3, \\frac{\\pi}{2} \\right)}$'

The model’s output might include special tokens, such as `<|start_header_id|>` or `<|end_header_id|>`. To make the answer more readable, we can safely remove them before displaying it to the end user.

In [8]:
formatted_output = final_result['pred'][0].replace("<|start_header_id|>assistant<|end_header_id|>\n\n", "").strip()
formatted_output

'## Step 1: The relationship between rectangular and polar coordinates is given by:\n\\[ r = \\sqrt{x^2 + y^2} \\] and \\[ \\theta = \\arctan{\\left(\\frac{y}{x}\\right)}. \\]\n\n## Step 2: Given the point $(0, 3)$, we have $x = 0$ and $y = 3$. Plugging these into the formula for $r$, we get:\n\\[ r = \\sqrt{0^2 + 3^2} = \\sqrt{9} = 3. \\]\n\n## Step 3: Plugging the values of $x = 0$ and $y = 3$ into the formula for $\\theta$, we get:\n\\[ \\theta = \\arctan{\\left(\\frac{3}{0}\\right)} = \\arctan{\\left( \\infty \\right)}. \\]\n\n## Step 4: Since $\\arctan{\\left( \\infty \\right)}$ is equal to $\\frac{\\pi}{2}$, we have:\n\\[ \\theta = \\frac{\\pi}{2}. \\]\n\n## Step 5: Therefore, the polar coordinates of the point $(0, 3)$ are $\\left( 3, \\frac{\\pi}{2} \\right).$\n\nThe final answer is: $\\boxed{\\left( 3, \\frac{\\pi}{2} \\right)}$'
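
The `replace` call above only removes one exact header string. A slightly more general cleanup can be sketched with regular expressions, assuming Llama 3 style markers of the form `<|...|>`. The helper name `strip_special_tokens` and the sample string are hypothetical.

```python
import re

# Llama 3 style role header, e.g. "<|start_header_id|>assistant<|end_header_id|>"
HEADER_RE = re.compile(r"<\|start_header_id\|>.*?<\|end_header_id\|>", re.DOTALL)
# Any remaining marker of the form "<|...|>", e.g. "<|eot_id|>"
SPECIAL_TOKEN_RE = re.compile(r"<\|[^|<>]+\|>")


def strip_special_tokens(text):
    """Remove role headers and leftover special tokens, then trim whitespace."""
    text = HEADER_RE.sub("", text)
    text = SPECIAL_TOKEN_RE.sub("", text)
    return text.strip()


raw = "<|start_header_id|>assistant<|end_header_id|>\n\nThe answer is 3.<|eot_id|>"
print(strip_special_tokens(raw))
```

Note that this pattern would also delete any legitimate text that happens to look like `<|...|>`, so it is best applied only to raw model output.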

After removing any special tokens, we can display the final answer to the user. Since the answer is formatted as Markdown, we can render it directly as Markdown.

In [9]:
from IPython.display import display, Markdown

display(Markdown(formatted_output))

## Step 1: The relationship between rectangular and polar coordinates is given by:
\[ r = \sqrt{x^2 + y^2} \] and \[ \theta = \arctan{\left(\frac{y}{x}\right)}. \]

## Step 2: Given the point $(0, 3)$, we have $x = 0$ and $y = 3$. Plugging these into the formula for $r$, we get:
\[ r = \sqrt{0^2 + 3^2} = \sqrt{9} = 3. \]

## Step 3: Plugging the values of $x = 0$ and $y = 3$ into the formula for $\theta$, we get:
\[ \theta = \arctan{\left(\frac{3}{0}\right)} = \arctan{\left( \infty \right)}. \]

## Step 4: Since $\arctan{\left( \infty \right)}$ is equal to $\frac{\pi}{2}$, we have:
\[ \theta = \frac{\pi}{2}. \]

## Step 5: Therefore, the polar coordinates of the point $(0, 3)$ are $\left( 3, \frac{\pi}{2} \right).$

The final answer is: $\boxed{\left( 3, \frac{\pi}{2} \right)}$

# 3. Assembling It All! 🧑‍🏭️

Now, let's create a method that encapsulates the entire pipeline. This will allow us to easily reuse the process in future applications, making it efficient and modular.

By combining the LLM, PRM, search strategy, and result display, we simplify the workflow and make it reusable for other tasks or questions. Additionally, we’ll track the time spent on each method so that we can **understand the practical implications** of using each strategy and configuration.

Here’s how we can structure the method:

In [12]:
import time

def generate_with_search_and_learn(question, config, llm, prm, method='best_of_n'):
    """
    Generate an answer for a given question using the search-and-learn pipeline.

    Args:
    - question (str): The input question to generate an answer for.
    - config (Config): Configuration object containing parameters for search strategy.
    - llm (LLM): Pretrained large language model used for generating answers.
    - prm (RLHFFlow): Process reward model used for evaluating answers.
    - method (str): Search strategy to use. Options are 'best_of_n', 'beam_search', 'dvts'. Default is 'best_of_n'.

    Returns:
    - str: The formatted output after processing the question.
    """
    batch = {"problem": [question]}

    start_time = time.time()
    if method == 'best_of_n':
        result = best_of_n(x=batch, config=config, llm=llm, prm=prm)
    elif method == 'beam_search':
        result = beam_search(examples=batch, config=config, llm=llm, prm=prm)
    elif method == 'dvts':
        result = dvts(examples=batch, config=config, llm=llm, prm=prm)
    else:
        # Fail early instead of hitting a NameError on `result` further down
        raise ValueError(f"Unknown method: {method!r}. Use 'best_of_n', 'beam_search', or 'dvts'.")

    elapsed_time = time.time() - start_time
    print(f"\nFinished in {elapsed_time:.2f} seconds\n")

    result = Dataset.from_dict(result)
    result = score(result, config)
    formatted_output = result['pred'][0].replace("<|start_header_id|>assistant<|end_header_id|>\n\n", "").strip()
    return formatted_output

## ⏳  3.1 Comparing Thinking Time for Each Strategy

Let’s now run each method and track the **thinking time** for a given question.
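
If you would rather collect the timings in a dictionary than read them from the printed output, a small context manager is one option. This is a sketch built on the standard library only. The name `thinking_timer` and the toy workload are hypothetical, and you would wrap the real `generate_with_search_and_learn` call in place of the `sum(...)` line.

```python
import time
from contextlib import contextmanager


@contextmanager
def thinking_timer(label, results):
    """Store the wall-clock duration of the wrapped block in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even if the wrapped block raises
        results[label] = time.perf_counter() - start


timings = {}
with thinking_timer("toy_task", timings):
    # Placeholder workload standing in for a search strategy call
    total = sum(i * i for i in range(100_000))

print(f"toy_task took {timings['toy_task']:.4f} seconds")
```

`time.perf_counter` is used here because it is a monotonic, high-resolution clock intended for measuring short durations.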

### 1. **Best of n**

We’ll begin by using the `best_of_n` strategy. Here’s how to track the thinking time for this method:

In [23]:
# Raw string: without the r prefix, Python reads the "\t" in "\theta" as a tab character
question = r'Convert the point $(0,3)$ in rectangular coordinates to polar coordinates.  Enter your answer in the form $(r,\theta),$ where $r > 0$ and $0 \le \theta < 2 \pi.$'

config.n=8

formatted_output = generate_with_search_and_learn(question=question, config=config, llm=llm, prm=prm, method='best_of_n')


Finished in 2.45 seconds




In [24]:
display(Markdown(formatted_output))

## Step 1: Recall the conversion formulas between rectangular and polar coordinates.
The conversion formulas are given as $r = \sqrt{x^2 + y^2}$ for the radial coordinate and $\theta = \tan^{-1}\left(\frac{y}{x}\right)$ for the angular coordinate.

## Step 2: Apply the conversion formulas to the given point (0,3).
Given the point $(0,3)$, we can substitute its rectangular coordinates into the formulas:
- $r = \sqrt{0^2 + 3^2} = \sqrt{0 + 9} = \sqrt{9} = 3$
- $\theta = \tan^{-1}\left(\frac{3}{0}\right)$

## Step 3: Handle the division by zero explicitly.
Since $\tan^{-1}\left(\frac{3}{0}\right)$ is undefined, we must recognize that the point (0,3) lies on the y-axis. Thus, its polar coordinates should be in the form $(r, \frac{\pi}{2})$.

## Step 4: Conclude the polar coordinates of the point (0,3).
Therefore, the polar coordinates of the point (0,3) are $\left(3, \frac{\pi}{2}\right)$.

The final answer is: $\boxed{\left(3, \frac{\pi}{2}\right)}$

### 2. **Beam Search**

Now, let's try using the `beam_search` strategy.

In [25]:
config.n=8
# beam search specific
config.sort_completed=True
config.filter_duplicates=True

formatted_output = generate_with_search_and_learn(question=question, config=config, llm=llm, prm=prm, method='beam_search')

Beam search iterations:  22%|██▎       | 9/40 [00:10<00:35,  1.13s/it]


Finished in 10.19 seconds







In [26]:
display(Markdown(formatted_output))

To convert the point $(0,3)$ from rectangular coordinates to polar coordinates, we need to find the radius $r$ and the angle $\theta$.

The formula to find $r$ is:

\[ r = \sqrt{x^2 + y^2} \]

In this case, $x = 0$ and $y = 3$.

\[ r = \sqrt{0^2 + 3^2} = \sqrt{9} = 3 \]

Now, we need to find the angle $\theta$.

Since the point $(0,3)$ lies on the positive $y$-axis, the angle $\theta$ is $\frac{\pi}{2}$ (or $90^\circ$).

So, the polar coordinates are:

\[(r, \theta) = \left(3, \frac{\pi}{2}\right)\]

Therefore, the polar coordinates of the point $(0,3)$ are $\left(3, \frac{\pi}{2}\right)$.
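
Judging by its name, the `filter_duplicates` option used above drops beams whose partial text is identical, so that the search budget is not spent expanding the same path twice. The sketch below shows that idea on plain strings. It is an assumption about the intent of the flag, not a copy of the library's code, and `filter_duplicate_beams` is a hypothetical helper.

```python
def filter_duplicate_beams(beam_texts):
    """Keep the first occurrence of each partial solution, preserving order."""
    # dict.fromkeys keeps insertion order, so earlier beams win
    return list(dict.fromkeys(beam_texts))


# Hypothetical partial solutions produced at one search step
beams = [
    "r = sqrt(0^2 + 3^2) = 3",
    "The point lies on the y-axis",
    "r = sqrt(0^2 + 3^2) = 3",
]
print(filter_duplicate_beams(beams))
```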

### 3. **Diverse Verifier Tree Search (DVTS)**

Finally, let's try the `dvts` strategy.

In [27]:
config.n=8
# dvts specific
config.n_beams = config.n // config.beam_width

formatted_output = generate_with_search_and_learn(question=question, config=config, llm=llm, prm=prm, method='dvts')

Beam search iterations:  25%|██▌       | 10/40 [00:11<00:34,  1.16s/it]


Finished in 11.56 seconds







In [28]:
display(Markdown(formatted_output))

## Step 1: To convert the point $(0,3)$ from rectangular coordinates to polar coordinates, we need to find the radius $r$ and the angle $\theta$.

The formula to convert rectangular coordinates $(x, y)$ to polar coordinates $(r, \theta)$ is given by:

$r = \sqrt{x^2 + y^2}$

$\theta = \tan^{-1}\left(\frac{y}{x}\right)$

## Step 2: Substitute the values of $x = 0$ and $y = 3$ into the formulas to find the radius $r$ and the angle $\theta$.

$r = \sqrt{0^2 + 3^2} = \sqrt{9} = 3$

$\theta = \tan^{-1}\left(\frac{3}{0}\right) = \tan^{-1}(\infty)$

## Step 3: Since $\tan^{-1}(\infty)$ is equal to $\frac{\pi}{2}$, we can conclude that the polar coordinates are $(3, \frac{\pi}{2})$.

Therefore, the polar coordinates of the point $(0,3)$ are $\left(3, \frac{\pi}{2}\right)$.

The final answer is: $\boxed{\left(3, \frac{\pi}{2}\right)}$
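
The `n_beams = n // beam_width` setting used for DVTS splits the search into independent subtrees. For example, if `beam_width` were 4, then `n=8` would give two subtrees. The toy sketch below illustrates that structure without real models: each subtree greedily keeps its best-scored step out of `beam_width` proposals. `TARGET`, `toy_prm`, and `dvts_toy` are hypothetical stand-ins, not the search-and-learn implementation.

```python
import random

TARGET = [3, 1, 4]  # hypothetical "correct" reasoning steps


def toy_prm(steps):
    """Stand-in for a PRM: fraction of TARGET matched by the steps so far."""
    return sum(s == t for s, t in zip(steps, TARGET)) / len(TARGET)


def dvts_toy(n, beam_width, rng):
    n_beams = n // beam_width  # number of independent subtrees
    finished = []
    for _ in range(n_beams):
        path = []
        for _ in TARGET:
            # Propose beam_width next steps and keep only the best one,
            # without ever comparing against the other subtrees
            options = [path + [rng.randint(0, 9)] for _ in range(beam_width)]
            path = max(options, key=toy_prm)
        finished.append(path)
    return finished


rng = random.Random(0)
for path in dvts_toy(n=8, beam_width=4, rng=rng):
    print(path, round(toy_prm(path), 2))
```

Because the subtrees never compete with each other, the final candidates tend to be more diverse than those of a single shared beam.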

The table below compares the three methods (`best_of_n`, `beam_search`, and `dvts`) using the same number of answers during the search (`n=8`) and reports the thinking time in seconds.

In this single run, `best_of_n` needs the least thinking time, while `beam_search` and `dvts` take roughly four to five times longer, with `dvts` the slowest.

| **Method**      | **Number of Answers During Search** | **Thinking Time (Seconds)** |
|-----------------|-------------------------------------|-----------------------------|
| **best_of_n**   | 8                                   | 2.45                        |
| **beam_search** | 8                                   | 10.19                       |
| **dvts**        | 8                                   | 11.56                       |

This comparison helps us understand the trade-offs between the different search strategies, especially when it comes to balancing time and solution quality.
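
To put the gap in relative terms, the short snippet below divides each measured time from the table by the `best_of_n` time. The numbers come from the single run shown above, so treat the ratios as indicative rather than as a benchmark.

```python
# Thinking times in seconds, copied from the table above
timings = {"best_of_n": 2.45, "beam_search": 10.19, "dvts": 11.56}

baseline = timings["best_of_n"]
slowdowns = {method: seconds / baseline for method, seconds in timings.items()}

for method, ratio in slowdowns.items():
    print(f"{method:<12} {timings[method]:>6.2f}s  {ratio:.1f}x")
```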

## 🙋 3.2 Testing the System with a Simple Question

In this final example, we’ll test the system using a straightforward question to observe how it performs in simpler cases. This allows us to verify that the system works as expected even for basic queries.

Let's try the following question:

In [29]:
question = 'What\'s the capital of Spain?'

config.n=32

formatted_output = generate_with_search_and_learn(question=question, config=config, llm=llm, prm=prm, method='best_of_n')


Finished in 1.21 seconds




In [30]:
display(Markdown(formatted_output))

The capital of Spain is Madrid.

Even though we raised the number of candidate answers (`N`) from 8 to 32, the thinking time stays low (1.21 seconds), most likely because each candidate is a short, single-sentence answer. Easy questions therefore remain cheap, while the extra compute pays off on more complex ones.

🏆 **We now have a fully operational pipeline** that leverages test-time compute, enabling the system to "think longer" for more complicated queries, while also maintaining fast response times for straightforward questions.

This approach ensures the system can scale its thinking time based on the task's complexity, offering an efficient and responsive solution for both simple and challenging problems.


# 4. Continuing the Journey 🧑‍🎓️

If you're eager to continue exploring, be sure to check out the original experimental [blog](https://huggingface.co/spaces/HuggingFaceH4/blogpost-scaling-test-time-compute) and all the references mentioned within it. These resources will deepen your understanding of test-time compute, its benefits, and its applications in LLMs.

Happy learning and experimenting! 🚀