# Scaling Test-Time Compute for Longer Thinking in LLMs

This notebook is based on a Hugging Face [cookbook](https://github.com/huggingface/cookbook/blob/main/notebooks/en/search_and_learn.ipynb).

❗️This notebook only runs inference; it does not benchmark a search strategy.

This notebook requires a T4 GPU and has only been tested with `config.n=4`; `config.n=8` causes an out-of-memory (OOM) error. \
Approximate execution time per query with these settings:
- best_of_n: 1 min
- beam_search: 5 min
- dvts: 2 min

❗️Every new runtime needs a long time for `pip install` and for downloading the models. If any error occurs, restart the session.

## 1. Install Dependencies

Colab comes with many pre-installed packages, which leads to version conflicts that are hard to resolve. We therefore installed the dependencies in a local virtual environment and froze the resulting versions here.

In [None]:
%%bash
echo "
accelerate==1.5.2
aiohappyeyeballs==2.6.1
aiohttp==3.11.14
aiosignal==1.3.2
annotated-types==0.7.0
antlr4-python3-runtime==4.7.2
anyio==4.9.0
attrs==25.3.0
certifi==2025.1.31
charset-normalizer==3.4.1
click==8.1.8
cloudpickle==3.1.1
datasets==3.5.0
dill==0.3.8
diskcache==5.6.3
distro==1.9.0
einops==0.8.1
fastapi==0.115.12
filelock==3.18.0
frozenlist==1.5.0
fsspec==2024.12.0
gguf==0.10.0
h11==0.14.0
hf_transfer==0.1.9
httpcore==1.0.7
httptools==0.6.4
httpx==0.28.1
huggingface-hub==0.29.3
idna==3.10
importlib_metadata==8.6.1
iniconfig==2.1.0
interegular==0.3.3
isort==6.0.1
Jinja2==3.1.6
jiter==0.9.0
jsonschema==4.23.0
jsonschema-specifications==2024.10.1
lark==1.2.2
latex2sympy2==1.9.1
llvmlite==0.44.0
lm-format-enforcer==0.10.6
MarkupSafe==3.0.2
mistral_common==1.5.4
mpmath==1.3.0
msgpack==1.1.0
msgspec==0.19.0
multidict==6.2.0
multiprocess==0.70.16
nest-asyncio==1.6.0
networkx==3.4.2
numba==0.61.0
numpy==1.26.4
nvidia-ml-py==12.570.86
openai==1.69.0
opencv-python-headless==4.11.0.86
outlines==0.0.46
packaging==24.2
pandas==2.2.3
partial-json-parser==0.2.1.1.post5
Pebble==5.1.1
pillow==11.1.0
pluggy==1.5.0
prometheus-fastapi-instrumentator==7.1.0
prometheus_client==0.21.1
propcache==0.3.1
protobuf==6.30.2
psutil==7.0.0
py-cpuinfo==9.0.0
pyairports==2.1.1
pyarrow==19.0.1
pycountry==24.6.1
pydantic==2.11.1
pydantic_core==2.33.0
pytest==8.3.5
python-dateutil==2.9.0.post0
python-dotenv==1.1.0
pytz==2025.2
PyYAML==6.0.2
pyzmq==26.3.0
ray==2.44.1
referencing==0.36.2
regex==2024.11.6
requests==2.32.3
rpds-py==0.24.0
ruff==0.11.2
safetensors==0.5.3
sentencepiece==0.2.0
six==1.17.0
sniffio==1.3.1
starlette==0.46.1
sympy==1.13.3
tiktoken==0.9.0
tokenizers==0.21.1
torch==2.4.0
torchvision==0.19.0
tqdm==4.67.1
transformers==4.50.3
typing-inspection==0.4.0
typing_extensions==4.13.0
tzdata==2025.2
urllib3==2.3.0
uvicorn==0.34.0
uvloop==0.21.0
vllm==0.6.3
watchfiles==1.0.4
websockets==15.0.1
word2number==1.1
xxhash==3.5.0
yarl==1.18.3
zipp==3.21.0
" > requirements.txt

❗️The installation ends with several errors. They can be ignored because the affected packages are not used in this notebook.

In [None]:
!pip install -r requirements.txt

Collecting aiohttp==3.11.14 (from -r requirements.txt (line 4))
  Downloading aiohttp-3.11.14-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (7.7 kB)
Collecting antlr4-python3-runtime==4.7.2 (from -r requirements.txt (line 7))
  Downloading antlr4-python3-runtime-4.7.2.tar.gz (112 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 112.3/112.3 kB 3.8 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Collecting datasets==3.5.0 (from -r requirements.txt (line 14))
  Downloading datasets-3.5.0-py3-none-any.whl.metadata (19 kB)
Collecting dill==0.3.8 (from -r requirements.txt (line 15))
  Downloading dill-0.3.8-py3-none-any.whl.metadata (10 kB)
Collecting diskcache==5.6.3 (from -r requirements.txt (line 16))
  Downloading diskcache-5.6.3-py3-none-any.whl.metadata (20 kB)
Collecting fastapi==0.115.12 (from -r requirements.txt (line 19))
  Downloading fastapi-0.115.12-py3-none-any.whl.metadata (27 kB)
Collect
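Because the install log can end with errors, it may help to confirm that the packages this notebook actually relies on match their pins. The following is a minimal sketch using only the standard library; the helper names `parse_pins` and `find_mismatches` are hypothetical, and the short `sample` string stands in for the full `requirements.txt`.

```python
from importlib import metadata

def parse_pins(text):
    """Parse 'name==version' lines into a dict, skipping blanks and comments."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins

def find_mismatches(pins, names):
    """Return {name: (pinned, installed_or_None)} for packages that differ from their pin."""
    mismatches = {}
    for name in names:
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != pins.get(name):
            mismatches[name] = (pins.get(name), installed)
    return mismatches

# Stand-in for the contents of requirements.txt (three pins copied from the list above).
sample = "torch==2.4.0\nvllm==0.6.3\ntransformers==4.50.3\n"
pins = parse_pins(sample)
print(find_mismatches(pins, ["torch", "vllm", "transformers"]))
```

An empty dictionary means the checked packages match their pins; any entry shows the pinned version next to the installed one.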

In [None]:
!git clone https://github.com/choyerhuang/CSCI544-Project

fatal: destination path 'CSCI544-Project' already exists and is not an empty directory.


❗️If you see `ImportError: No module named sal`, restart the session and start again from here.

In [None]:
%cd /content/CSCI544-Project
!pip install -e '.[dev]'

/content/CSCI544-Project
Obtaining file:///content/CSCI544-Project
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Preparing editable metadata (pyproject.toml) ... done
Building wheels for collected packages: search-and-learn
  Building editable for search-and-learn (pyproject.toml) ... done
  Created wheel for search-and-learn: filename=search_and_learn-0.1.0-0.editable-py3-none-any.whl size=8678 sha256=18dc71826c382efbda2764dbf2f77893f40c62ac6d45d3f83e3306819297dbc0
  Stored in directory: /tmp/pip-ephem-wheel-cache-w9g2yd8r/wheels/5f/73/83/9b97bb726bbe8f29bc7dde852991239cf76c52b50fd1f32b62
Successfully built search-and-learn
Installing collected packages: search-and-learn
  Attempting uninstall: search-and-learn
    Found existing installation: search-and-learn 0.1.0
    Uninstalling search-and-learn-0.1.0:
      Success

Log in to Hugging Face to access [meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct), as it is a gated model! 🗝️  
If you haven't previously requested access, you'll need to submit a request before proceeding.

⚠️ Use your USC email to register an account. When requesting access, enter "University of Southern California" as your affiliation and select "Research Graduate"; otherwise, your request will be rejected.

In [None]:
from huggingface_hub import notebook_login

notebook_login()


## 2. Set Up the Large Language Model (LLM) and the Process Reward Model (PRM) 💬

As illustrated in the diagram below, the system consists of an LLM that generates intermediate answers based on user input, a [PRM](https://huggingface.co/papers/2211.14275) that evaluates and scores these answers, and a search strategy that uses the PRM feedback to guide the subsequent steps in the search process until reaching the final answer.

Let’s begin by initializing each model. For the LLM, we’ll use the [meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct) model, and for the PRM, we’ll use the [RLHFlow/Llama3.1-8B-PRM-Deepseek-Data](https://huggingface.co/RLHFlow/Llama3.1-8B-PRM-Deepseek-Data) model.




![system](https://huggingface.co/datasets/HuggingFaceH4/blogpost-images/resolve/main/system.png)

⬇️ Start again from here after **Restart session**.

In [None]:
import torch
from vllm import LLM
from sal.models.reward_models import RLHFFlow

model_path="meta-llama/Llama-3.2-1B-Instruct"
prm_path="RLHFlow/Llama3.1-8B-PRM-Deepseek-Data"

llm = LLM(
    model=model_path,
    gpu_memory_utilization=0.5,  # Utilize 50% of GPU memory
    enable_prefix_caching=True,  # Optimize repeated prefix computations
    seed=42,                     # Set seed for reproducibility
    dtype='half',
    max_model_len=8192,
)

prm = RLHFFlow(prm_path)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


INFO 04-18 05:57:31 llm_engine.py:237] Initializing an LLM engine (vdev) with config: model='meta-llama/Llama-3.2-1B-Instruct', speculative_config=None, tokenizer='meta-llama/Llama-3.2-1B-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=8192, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=42, served_model_name=meta-llama/Llama-3.2-1B-Instruct, use_v2_block_manager=True, num_scheduler_steps=1, chunked_prefill_enabled=Fa

  @torch.library.impl_abstract("xformers_flash::flash_fwd")
  @torch.library.impl_abstract("xformers_flash::flash_bwd")


INFO 04-18 05:57:33 model_runner.py:1060] Starting to load model meta-llama/Llama-3.2-1B-Instruct...
INFO 04-18 05:57:33 selector.py:224] Cannot use FlashAttention-2 backend for Volta and Turing GPUs.
INFO 04-18 05:57:33 selector.py:115] Using XFormers backend.
INFO 04-18 05:57:36 weight_utils.py:243] Using model weights format ['*.safetensors']
INFO 04-18 05:57:36 weight_utils.py:288] No model.safetensors.index.json found in remote.


Loading safetensors checkpoint shards:   0% Completed | 0/1 [00:00<?, ?it/s]


INFO 04-18 05:57:50 model_runner.py:1071] Loading model weights took 2.3185 GB
INFO 04-18 05:57:52 gpu_executor.py:122] # GPU blocks: 6055, # CPU blocks: 8192
INFO 04-18 05:57:52 gpu_executor.py:126] Maximum concurrency for 8192 tokens per request: 11.83x
INFO 04-18 05:57:55 model_runner.py:1402] Capturing the model for CUDA graphs. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI.
INFO 04-18 05:57:55 model_runner.py:1406] CUDA graphs can take additional 1~3 GiB memory per GPU. If you are running out of memory, consider decreasing `gpu_memory_utilization` or enforcing eager mode. You can also reduce the `max_num_seqs` as needed to decrease memory usage.
INFO 04-18 05:58:22 model_runner.py:1530] Graph capturing finished in 28 secs.


Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]



### 2.1 Instantiate the Question, Search Strategy, and Call the Pipeline

Now that we've set up the LLM and PRM, let's define the question, select a search strategy, and call the pipeline to process the question through both models.

1. **Instantiate the Question**: In this step, we define the input question that the system will answer.

2. **Search Strategy**: The system currently supports the following search strategies: `best_of_n`, `beam_search`, and `dvts` (see diagram). For this example, we'll use `best_of_n`, but you can easily switch to any of the other strategies based on your needs. The search strategy is controlled by a few configuration parameters. You can check the full list [here](https://github.com/huggingface/search-and-learn/blob/main/src/sal/config.py).

3. **Call the Pipeline**: With the question and search strategy in place, we’ll call the inference pipeline, processing the inputs through both the LLM and PRM to generate the final answer.

![](https://huggingface.co/datasets/HuggingFaceH4/blogpost-images/resolve/main/search-strategies.png)

The first step is to clearly define the question that the system will answer. This ensures that we have a precise task for the model to tackle.

In [None]:
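# Raw string: in a normal string literal, "\t" in "\theta" is read as a tab and the prompt becomes "heta".
question_text = r'Convert the point $(0,3)$ in rectangular coordinates to polar coordinates.  Enter your answer in the form $(r,\theta),$ where $r > 0$ and $0 \le \theta < 2 \pi.$'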
input_batch = {"problem": [question_text]}

Next, we define the configuration, including parameters such as the number of candidate answers (`config.n`), and choose the search strategy that will be used. The search strategy dictates how we explore the potential answers. In this case, we'll use `best_of_n`.

With the question and configuration in place, the selected search strategy generates multiple candidate answers. The PRM scores these candidates, and the final answer is selected from them.


In [None]:
import os

from sal.config import Config

# Work from the repository root before importing the search functions.
os.chdir('/content/CSCI544-Project')
from sal.search import (
    beam_search, beam_search_ev, best_of_n, dvts,
    greedy_backtrack_search, run_dynamic_beam_search,
)

config = Config()
config.n = 4  # Number of answers to generate during the search
config.prm_batch_size = 1
config.search_batch_size = 1

search_result = best_of_n(x=input_batch, config=config, llm=llm, prm=prm)
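To make the idea behind `best_of_n` concrete, here is a minimal, self-contained sketch of the general technique: generate several candidates, score each one, and keep the highest-scoring candidate. The functions `toy_generate` and `toy_prm_score` are hypothetical stand-ins for the LLM and the PRM, and the repository's `best_of_n` may aggregate scores differently.

```python
def toy_generate(question, n):
    """Hypothetical stand-in for the LLM: return n candidate answers."""
    pool = ["(3, pi/2)", "(3, 0)", "(3, pi/2)", "(0, 3)"]
    return [pool[i % len(pool)] for i in range(n)]

def toy_prm_score(question, answer):
    """Hypothetical stand-in for the PRM: map an answer to a score in [0, 1]."""
    return {"(3, pi/2)": 0.9, "(3, 0)": 0.4, "(0, 3)": 0.1}[answer]

def best_of_n_sketch(question, n):
    """Generate n candidates, score each, and return the best one with all scores."""
    candidates = toy_generate(question, n)
    scores = [toy_prm_score(question, c) for c in candidates]
    best_index = max(range(n), key=scores.__getitem__)  # first index with the top score
    return candidates[best_index], scores

answer, scores = best_of_n_sketch("Convert (0, 3) to polar coordinates.", n=4)
print(answer, scores)
```

Raising `n` gives the scorer more candidates to choose from, which is the sense in which this strategy trades extra test-time compute for answer quality.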

### 2.2 Display the Final Result

Once the pipeline has processed the question through the LLM and PRM, we can display the final result. This result will be the model's output after considering the intermediate answers and scoring them using the PRM.

Here's how to display the final answer:

In [None]:
search_result['pred'][0]

'## Step 1: Recall the conversion formulas\nTo convert from rectangular coordinates $(x, y)$ to polar coordinates $(r, heta)$, we use the following formulas: $r = \\sqrt{x^2 + y^2}$ and $heta = \\tan^{-1}\\left(\\frac{y}{x}\\right)$.\n\n## Step 2: Plug in the values\nGiven the point $(0, 3)$, we can substitute $x = 0$ and $y = 3$ into the formulas. This gives us $r = \\sqrt{0^2 + 3^2} = \\sqrt{9} = 3$ and $heta = \\tan^{-1}\\left(\\frac{3}{0}\\right)$.\n\n## Step 3: Handle the division by zero\nSince $\\tan^{-1}\\left(\\frac{3}{0}\\right)$ is undefined, we must recognize that the point $(0, 3)$ is on the positive y-axis, which means it is at a distance of $3$ units from the origin but points in the positive y-direction, where $\\theta = \\frac{\\pi}{2}$.\n\n## Step 4: Conclude\nTherefore, the polar coordinates of the point $(0, 3)$ are $\\left(3, \\frac{\\pi}{2}\\right)$.\n\nThe final answer is: $\\boxed{\\left(3, \\frac{\\pi}{2}\\right)}$'

The model’s output might include special tokens, such as `<|start_header_id|>` or `<|end_header_id|>`. To make the answer more readable, we can safely remove them before displaying it to the end user.

In [None]:
formatted_output = search_result['pred'][0].replace("<|start_header_id|>assistant<|end_header_id|>\n\n", "").strip()
formatted_output

'## Step 1: Recall the conversion formulas\nTo convert from rectangular coordinates $(x, y)$ to polar coordinates $(r, heta)$, we use the following formulas: $r = \\sqrt{x^2 + y^2}$ and $heta = \\tan^{-1}\\left(\\frac{y}{x}\\right)$.\n\n## Step 2: Plug in the values\nGiven the point $(0, 3)$, we can substitute $x = 0$ and $y = 3$ into the formulas. This gives us $r = \\sqrt{0^2 + 3^2} = \\sqrt{9} = 3$ and $heta = \\tan^{-1}\\left(\\frac{3}{0}\\right)$.\n\n## Step 3: Handle the division by zero\nSince $\\tan^{-1}\\left(\\frac{3}{0}\\right)$ is undefined, we must recognize that the point $(0, 3)$ is on the positive y-axis, which means it is at a distance of $3$ units from the origin but points in the positive y-direction, where $\\theta = \\frac{\\pi}{2}$.\n\n## Step 4: Conclude\nTherefore, the polar coordinates of the point $(0, 3)$ are $\\left(3, \\frac{\\pi}{2}\\right)$.\n\nThe final answer is: $\\boxed{\\left(3, \\frac{\\pi}{2}\\right)}$'
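The single `replace` call above only removes one exact assistant header. As a slightly more general sketch, a regular expression can drop any token of the form `<|...|>`. The helper name `strip_chat_markup` is hypothetical, and the pattern assumes Llama-3-style special tokens; a role name that follows a non-assistant header would be left in the text.

```python
import re

# Matches special tokens written as <|name|>, for example <|start_header_id|> or <|eot_id|>.
SPECIAL_TOKEN = re.compile(r"<\|[^|>]+\|>")

def strip_chat_markup(text):
    """Remove a leading assistant header (if present), then any remaining special tokens."""
    text = re.sub(r"^<\|start_header_id\|>assistant<\|end_header_id\|>\s*", "", text)
    return SPECIAL_TOKEN.sub("", text).strip()

raw = "<|start_header_id|>assistant<|end_header_id|>\n\nThe capital of Spain is Madrid.<|eot_id|>"
print(strip_chat_markup(raw))
```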

After removing the special tokens, we can display the final answer to the user. Since the answer is written in Markdown, we render it as Markdown.

In [None]:
from IPython.display import display, Markdown

display(Markdown(formatted_output))

## Step 1: Recall the conversion formulas
To convert from rectangular coordinates $(x, y)$ to polar coordinates $(r, heta)$, we use the following formulas: $r = \sqrt{x^2 + y^2}$ and $heta = \tan^{-1}\left(\frac{y}{x}\right)$.

## Step 2: Plug in the values
Given the point $(0, 3)$, we can substitute $x = 0$ and $y = 3$ into the formulas. This gives us $r = \sqrt{0^2 + 3^2} = \sqrt{9} = 3$ and $heta = \tan^{-1}\left(\frac{3}{0}\right)$.

## Step 3: Handle the division by zero
Since $\tan^{-1}\left(\frac{3}{0}\right)$ is undefined, we must recognize that the point $(0, 3)$ is on the positive y-axis, which means it is at a distance of $3$ units from the origin but points in the positive y-direction, where $\theta = \frac{\pi}{2}$.

## Step 4: Conclude
Therefore, the polar coordinates of the point $(0, 3)$ are $\left(3, \frac{\pi}{2}\right)$.

The final answer is: $\boxed{\left(3, \frac{\pi}{2}\right)}$

## 3. Assembling It All! 🧑‍🏭️

Now, let's create a function that encapsulates the entire pipeline, so that we can easily reuse it for other tasks or questions.

The function combines the LLM, the PRM, the search strategy, and the output formatting. It also tracks the time spent and the number of generated tokens for each call, so that we can **understand the practical implications** of using each strategy and configuration.

Here’s how we can structure the method:

In [None]:
import time

def generate_with_search_and_learn(question, config, llm, prm, method='best_of_n'):
    """
    Generate an answer for a given question using the search-and-learn pipeline.

    Args:
    - question (str): The input question to generate an answer for.
    - config (Config): Configuration object containing parameters for search strategy.
    - llm (LLM): Pretrained large language model used for generating answers.
    - prm (RLHFFlow): Process reward model used for evaluating answers.
    - method (str): Search strategy to use. Options are 'best_of_n', 'beam_search', 'dvts',
      'dynamic_beam', 'beam_search_ev', 'greedy_backtrack'. Default is 'best_of_n'.

    Returns:
    - str: The formatted output after processing the question.

    Raises:
    - ValueError: If `method` is not one of the supported strategies.
    """
    batch = {"problem": [question]}

    start_time = time.time()
    if method == 'best_of_n':
        result = best_of_n(x=batch, config=config, llm=llm, prm=prm)
    elif method == 'beam_search':
        result = beam_search(examples=batch, config=config, llm=llm, prm=prm)
    elif method == 'dvts':
        result = dvts(examples=batch, config=config, llm=llm, prm=prm)
    elif method == 'dynamic_beam':
        result = run_dynamic_beam_search(example_batch=batch, config=config, llm=llm, prm=prm)
    elif method == 'beam_search_ev':
        result = beam_search_ev(examples=batch, config=config, llm=llm, prm=prm)
    elif method == 'greedy_backtrack':
        result = greedy_backtrack_search(examples=batch, config=config, llm=llm, prm=prm)
        print("Result keys:", result.keys())
    else:
        # Without this branch an unknown name would fail later with a NameError on `result`.
        raise ValueError(f"Unknown search method: {method!r}")

    elapsed_time = time.time() - start_time
    print(f"\nFinished in {elapsed_time:.2f} seconds\n")

    # Count the tokens of every completion, not only of the selected answer.
    tokenizer = llm.get_tokenizer()
    total_tokens = 0
    for completion in result['completions']:
        for comp in completion:
            output_tokens = tokenizer.encode(comp)
            total_tokens += len(output_tokens)

    print(f"Total tokens in all completions: {total_tokens}")

    # Strip the assistant header so that only the answer text remains.
    formatted_output = result['pred'][0].replace("<|start_header_id|>assistant<|end_header_id|>\n\n", "").strip()
    return formatted_output
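As the number of strategies grows, the chain of `if`/`elif` branches can be replaced with a lookup table. The sketch below illustrates that design under stated assumptions: `fake_best_of_n` and `fake_beam_search` are hypothetical stand-ins for the real search functions, and `run_strategy` is an illustrative helper, not part of the repository. Each table entry records the keyword under which the function expects the batch, since the real functions use different names (`x`, `examples`, `example_batch`).

```python
import time

# Hypothetical stand-ins for the real search functions; each takes the batch
# under a different keyword, mirroring the calls in the function above.
def fake_best_of_n(x, config):
    return {"pred": ["best_of_n: " + x["problem"][0]]}

def fake_beam_search(examples, config):
    return {"pred": ["beam_search: " + examples["problem"][0]]}

# Map each method name to (function, name of its batch keyword).
STRATEGIES = {
    "best_of_n": (fake_best_of_n, "x"),
    "beam_search": (fake_beam_search, "examples"),
}

def run_strategy(method, batch, **shared):
    """Dispatch to the chosen strategy and return (result, elapsed seconds)."""
    if method not in STRATEGIES:
        raise ValueError(f"Unknown method {method!r}; choose from {sorted(STRATEGIES)}")
    fn, batch_kw = STRATEGIES[method]
    start = time.perf_counter()
    result = fn(**{batch_kw: batch}, **shared)
    return result, time.perf_counter() - start

result, elapsed = run_strategy("beam_search", {"problem": ["2 + 2?"]}, config=None)
print(result["pred"][0], f"({elapsed:.4f} s)")
```

Adding a strategy then means adding one table entry, and a misspelled method name fails immediately with a clear message.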

### ⏳  3.1 Comparing Thinking Time for Each Strategy

Let’s compare the **thinking time** of three methods: `best_of_n`, `beam_search`, and `dvts`. Each method is evaluated using the same number of answers during the search process, measuring the time spent thinking in seconds and the number of generated tokens.

The table below lists reference results for 8 answers per search. In these results, `best_of_n` shows the least thinking time, while `beam_search` takes the most time, with `dvts` in between. However, `best_of_n` generates the most tokens due to its simpler search strategy. The runs recorded in the following cells use `config.n=4` on a T4 GPU, so their timings and token counts differ from the table.

| **Method**      | **Number of Answers During Search** | **Thinking Time (Seconds)** | **Generated Tokens** |
|------------------|-------------------------------------|-----------------------------|-----------------------|
| **best_of_n**    | 8                                   | 3.54                        | 3087                  |
| **beam_search**  | 8                                   | 10.06                       | 2049                  |
| **dvts**         | 8                                   | 8.46                        | 2544                  |

This comparison illustrates the trade-offs between the strategies, balancing time spent thinking and the complexity of the search process.
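One way to read the table is as generated tokens per second of thinking time. The short sketch below computes that ratio from the three rows above. Note that wall-clock time also covers PRM scoring, so the ratio is a rough indicator rather than a pure decoding speed.

```python
# (thinking time in seconds, generated tokens) copied from the table above
rows = {
    "best_of_n": (3.54, 3087),
    "beam_search": (10.06, 2049),
    "dvts": (8.46, 2544),
}

# Tokens generated per second of wall-clock time for each method.
throughput = {name: tokens / seconds for name, (seconds, tokens) in rows.items()}

for name, tps in sorted(throughput.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:12s} {tps:7.1f} tokens/s")
```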


#### 1. **Best of n**

We’ll begin by using the `best_of_n` strategy. Here’s how to track the thinking time for this method:

In [None]:
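# Raw string: in a normal string literal, "\t" in "\theta" is read as a tab and the prompt becomes "heta".
question = r'Convert the point $(0,3)$ in rectangular coordinates to polar coordinates.  Enter your answer in the form $(r,\theta),$ where $r > 0$ and $0 \le \theta < 2 \pi.$'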

config.n=4

formatted_output = generate_with_search_and_learn(question=question, config=config, llm=llm, prm=prm, method='best_of_n')


Finished in 81.89 seconds

Total tokens in all completions: 1901


In [None]:
display(Markdown(formatted_output))

## Step 1: Recall the relationship between rectangular and polar coordinates.
The relationship between rectangular coordinates $(x, y)$ and polar coordinates $(r, \theta)$ is given by:
$x = r \cos \theta$
$y = r \sin \theta$
We need to find $r$ and $\theta$ for the point $(0, 3)$.

## Step 2: Calculate the radius $r$.
Using the first formula, we can find the value of $r$ by setting $x = 0$ and $y = 3$.
Since $x = r \cos \theta$ and $y = r \sin \theta$, we have:
$0 = r \cos \theta$
$3 = r \sin \theta$

## Step 3: Solve for $r$ using the equation $3 = r \sin \theta$.
We know that $\sin^2 \theta + \cos^2 \theta = 1$. Since $\sin \theta = 3/r$, we can substitute into this identity:
$(3/r)^2 + \cos^2 \theta = 1$

## Step 4: Expand the equation $(3/r)^2 + \cos^2 \theta = 1$.
$(3/r)^2 + \cos^2 \theta = 1 \Rightarrow 9/r^2 + \cos^2 \theta = 1$

## Step 5: Multiply both sides by $r^2$ to clear the fraction.
$(9/r^2) + r^2 \cos^2 \theta = r^2$

## Step 6: Since we want $r$ in terms of $x$ and $y$, we need to use the fact that $r^2 = x^2 + y^2$. We also know that $r = \sqrt{x^2 + y^2}$.
We can substitute $x = 0$ and $y = 3$ into $r^2 = x^2 + y^2$ to get:
$r^2 = 0^2 + 3^2 = 9$
Now we can substitute $r^2 = 9$ into the equation $(9/r^2) + r^2 \cos^2 \theta = r^2$:
$(9/r^2) + 9 \cos^2 \theta = 9$

## Step 7: Simplify the equation $(9/r^2) + 9 \cos^2 \theta = 9$.
First, multiply both sides by $r^2$ to clear the fraction:
$9 + 9r^2 \cos^2 \theta = 9r^2$
Now subtract $9$ from both sides:
$9r^2 \cos^2 \theta = 0$

## Step 8: Solve for $\cos^2 \theta$.
Divide both sides by $9r^2$:
$\cos^2 \theta = 0$

## Step 9: Find $\theta$.
Since $\cos^2 \theta = 0$, we know that $\cos \theta = 0$. Therefore, $\theta = \pi/2$.

## Step 10: Substitute $\theta = \pi/2$ into the equation $r = \sqrt{x^2 + y^2}$.
Since $r^2 = x^2 + y^2$, we have:
$r^2 = 0^2 + 3^2 = 9$
Now substitute $r^2 = 9$ into the equation $r = \sqrt{x^2 + y^2}$:
$r = \sqrt{9} = 3$

## Step 11: Substitute $r = 3$ and $\theta = \pi/2$ into the equation $(r, \theta)$.
The polar coordinates are $(r, \theta) = (3, \pi/2)$.

The final answer is: $\boxed{\left( 3, \  \frac{\pi}{2}\right)}$

#### 2. **Beam Search**

Now, let's try using the `beam_search` strategy.

In [None]:
config.n=4
# beam search specific
config.sort_completed=True
config.filter_duplicates=True

formatted_output = generate_with_search_and_learn(question=question, config=config, llm=llm, prm=prm, method='beam_search')

Beam search iterations:  10%|█         | 4/40 [03:52<34:55, 58.20s/it]


Finished in 232.83 seconds

Total tokens in all completions: 1480





In [None]:
display(Markdown(formatted_output))

## Step 1: Recall the conversion formulas between rectangular and polar coordinates
The conversion formulas between rectangular coordinates $(x, y)$ and polar coordinates $(r, \theta)$ are given by $r = \sqrt{x^2 + y^2}$ and $\theta = \arctan\left(\frac{y}{x}\right)$.

## Step 2: Apply the formula to convert the given point
Given the point $(0, 3)$, we can substitute $x = 0$ and $y = 3$ into the formula for $r$ to get $r = \sqrt{0^2 + 3^2} = \sqrt{9} = 3$.

## Step 3: Calculate the angle $\theta$ using the formula
We can substitute $x = 0$ and $y = 3$ into the formula for $\theta$ to get $\theta = \arctan\left(\frac{3}{0}\right)$. However, since $\arctan(0)$ is undefined, we need to handle this situation. In the context of polar coordinates, when $x = 0$, the point lies on the positive $y$-axis, and $\arctan(0)$ does not have a unique value; it's considered $\frac{\pi}{2}$ because it's the angle where the curve crosses the y-axis. Therefore, we have $\theta = \frac{\pi}{2}$.

## Step 4: Write the polar coordinates as an ordered pair
Using the calculated values of $r$ and $\theta$, we can write the polar coordinates as $(3, \frac{\pi}{2})$.

The final answer is: $\boxed{(3, \frac{\pi}{2})}$
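The step-by-step nature of beam search can be sketched on a toy problem. This is a simplified illustration of the general technique, not the repository's `beam_search`: `toy_propose` and `toy_score` are hypothetical stand-ins for the LLM and the PRM, and options such as `sort_completed` and `filter_duplicates` are not modelled.

```python
TARGET = [2, 0, 1]  # toy "correct" sequence of reasoning steps

def toy_propose(steps, k):
    """Hypothetical LLM stand-in: propose k candidate next steps."""
    return list(range(k))

def toy_score(steps):
    """Hypothetical PRM stand-in: count the steps that agree with TARGET."""
    return sum(a == b for a, b in zip(steps, TARGET))

def beam_search_sketch(keep, expand, max_steps):
    """Keep the `keep` best partial solutions and expand each into `expand` continuations."""
    beams = [[]]  # start from a single empty partial solution
    for _ in range(max_steps):
        candidates = [steps + [nxt] for steps in beams for nxt in toy_propose(steps, expand)]
        candidates.sort(key=toy_score, reverse=True)  # best partial solutions first
        beams = candidates[:keep]
    return beams[0]

print(beam_search_sketch(keep=2, expand=3, max_steps=3))
```

Unlike best-of-n, the scorer is consulted after every step, so weak partial solutions are pruned before they are completed. That is consistent with beam search generating fewer tokens but taking longer in the runs above.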

#### 3. **Beam Search with Ensemble Voting**

We additionally implemented beam search with an ensemble voting strategy as an independent function, `beam_search_ev`.

In [None]:
config.n=4
# beam search specific
config.approach = 'beam_search_ev'
config.sort_completed=True
config.filter_duplicates=True

formatted_output = generate_with_search_and_learn(question=question, config=config, llm=llm, prm=prm, method='beam_search_ev')

Beam search iterations:  10%|█         | 4/40 [03:47<34:04, 56.80s/it]


Finished in 227.19 seconds

Total tokens in all completions: 1180





In [None]:
display(Markdown(formatted_output))

## Step 1: Recall the formulas for converting rectangular coordinates to polar coordinates
The conversion from rectangular coordinates $(x, y)$ to polar coordinates $(r, \theta)$ can be done using the following formulas: $r = \sqrt{x^2 + y^2}$ for the radial coordinate and $\theta = \tan^{-1}\left(\frac{y}{x}\right)$ for the angular coordinate.

## Step 2: Calculate the radial coordinate $r$
Substitute $x = 0$ and $y = 3$ into the formula for $r$: $r = \sqrt{0^2 + 3^2} = \sqrt{9} = 3.$

## Step 3: Calculate the angular coordinate $\theta$
Substitute $x = 0$ and $y = 3$ into the formula for $\theta$: $\theta = \tan^{-1}\left(\frac{3}{0}\right)$. However, because the point $(0, 3)$ lies on the positive $y$-axis, the angle $\theta$ is $\frac{\pi}{2}$.

## Step 4: Write the polar coordinates
Therefore, the polar coordinates of the point $(0, 3)$ are $\left(3, \frac{\pi}{2}\right)$.

The final answer is: $\boxed{\left(3, \frac{\pi}{2}\right)}$
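The notebook does not show how `beam_search_ev` combines its beams. One common form of ensemble voting, sketched here purely as an assumption, sums the scores of all candidates that reach the same final answer and returns the answer with the largest total. The function name `weighted_vote` and the example scores are hypothetical.

```python
from collections import defaultdict

def weighted_vote(answers, scores):
    """Sum the scores per distinct answer and return the answer with the largest total."""
    totals = defaultdict(float)
    for answer, score in zip(answers, scores):
        totals[answer] += score
    return max(totals, key=totals.get)

# Hypothetical final answers of four beams with their scores.
answers = ["(3, pi/2)", "(3, 0)", "(3, pi/2)", "(0, 3)"]
scores = [0.55, 0.80, 0.50, 0.10]

single_best = answers[scores.index(max(scores))]
print("highest single score:", single_best)
print("weighted vote:", weighted_vote(answers, scores))
```

In this example the single highest-scoring beam and the vote disagree: two moderately scored beams that agree on an answer outweigh one confident outlier.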

#### 4. **Diverse Verifier Tree Search (DVTS)**

Let's try the `dvts` strategy.

In [None]:
config.n=4
# dvts specific
config.n_beams = config.n // config.beam_width

formatted_output = generate_with_search_and_learn(question=question, config=config, llm=llm, prm=prm, method='dvts')

Beam search iterations:   5%|▌         | 2/40 [01:51<35:24, 55.91s/it]


Finished in 111.81 seconds

Total tokens in all completions: 988





In [None]:
display(Markdown(formatted_output))

## Step 1:  To convert the point $(0,3)$ from rectangular coordinates to polar coordinates, we need to find the radius $r$ and the angle $heta$.
## Step 2:  The radius $r$ can be found using the formula $r = \sqrt{x^2 + y^2}$. In this case, $x = 0$ and $y = 3$, so $r = \sqrt{0^2 + 3^2} = \sqrt{9} = 3$.
## Step 3:  Next, we need to find the angle $heta$. The angle $heta$ can be found using the formula $heta = \arctan\left(\frac{y}{x}\right)$. However, since $x = 0$, the value of $heta$ will be $\frac{\pi}{2}$.
## Step 4:  So, we have found the polar coordinates $(r, heta)$ to be $(3, \frac{\pi}{2})$.

The final answer is: $\boxed{\left(3, \frac{\pi}{2}\right)}$
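The line `config.n_beams = config.n // config.beam_width` in the cell above splits the budget of `n` answers into independent subtrees. The toy sketch below illustrates that idea: each subtree is expanded greedily, keeping only its best continuation at every step, and the best finished subtree wins. All helpers are hypothetical, `beam_width=2` is an illustrative value rather than the notebook's setting, and `tree_id` only serves to make the toy proposals differ between subtrees (a real LLM gets that diversity from sampling).

```python
TARGET = [2, 0, 1]  # toy "correct" sequence of reasoning steps

def toy_propose(steps, k, tree_id):
    """Hypothetical LLM stand-in: k candidate next steps that differ per subtree."""
    return [(tree_id + len(steps) + j) % 3 for j in range(k)]

def toy_score(steps):
    """Hypothetical PRM stand-in: count the steps that agree with TARGET."""
    return sum(a == b for a, b in zip(steps, TARGET))

def dvts_sketch(n, beam_width, max_steps):
    n_beams = n // beam_width  # number of independent subtrees
    finished = []
    for tree_id in range(n_beams):
        steps = []
        for _ in range(max_steps):
            options = [steps + [s] for s in toy_propose(steps, beam_width, tree_id)]
            steps = max(options, key=toy_score)  # greedy: keep the best continuation only
        finished.append(steps)
    return max(finished, key=toy_score), n_beams

best, n_beams = dvts_sketch(n=4, beam_width=2, max_steps=3)
print(best, "from", n_beams, "subtrees")
```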

#### 5. **Dynamic Beam Search Method**

Let's try our new `Dynamic Beam search` strategy.

In [None]:
# setting basic parameter
config.n = 4

# dynamic beam search parameters
config.approach = "dynamic_beam"
config.sort_completed = True
config.filter_duplicates = True
config.num_iterations = 10
config.dynamic_beam_delta = 0.3   # Beam score margin
config.min_beams = 2
config.max_beams = 4

formatted_output = generate_with_search_and_learn(
    question=question,
    config=config,
    llm=llm,
    prm=prm,
    method="dynamic_beam"
)

Dynamic Beam Search Steps:  70%|███████   | 7/10 [07:29<03:12, 64.28s/it]


Finished in 449.99 seconds

Total tokens in all completions: 1751





In [None]:
display(Markdown(formatted_output))

## Step 1:  To convert the point $(0,3)$ from rectangular coordinates to polar coordinates, we'll use the formulas $r = \sqrt{x^2 + y^2}$ for the radial coordinate and $\theta = \tan^{-1}\left(\frac{y}{x}\right)$ for the angular coordinate.
## Step 2:  Given the point $(0,3)$, we can calculate the radial coordinate $r$ by substituting $x = 0$ and $y = 3$ into the formula $r = \sqrt{x^2 + y^2}$. This gives us $r = \sqrt{0^2 + 3^2} = \sqrt{9} = 3$.
## Step 3:  Next, we'll find the angular coordinate $\theta$ using the formula $\theta = \tan^{-1}\left(\frac{y}{x}\right)$. However, we notice that the point $(0,3)$ is on the positive y-axis, which means $\theta = \frac{\pi}{2}$.
## Step 4:  Therefore, after applying the formulas, we can express the rectangular coordinates $(0,3)$ in polar coordinates as $\left(3, \frac{\pi}{2}\right)$.

The final answer is: $\boxed{\left(3, \frac{\pi}{2}\right)}$
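The cell above sets `dynamic_beam_delta`, `min_beams`, and `max_beams`, but the notebook does not show how `run_dynamic_beam_search` uses them. One plausible reading, sketched here as an assumption only, is that the search keeps every beam whose score lies within `delta` of the best score and clamps the number of kept beams to the range from `min_beams` to `max_beams`. The function name `select_dynamic_beams` is hypothetical.

```python
def select_dynamic_beams(scores, delta, min_beams, max_beams):
    """Return the indices of the beams to keep, best first.

    Assumed rule: count the beams within `delta` of the top score, then clamp
    that count to the range [min_beams, max_beams].
    """
    if not scores:
        return []
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    best = scores[order[0]]
    within_margin = sum(1 for i in order if best - scores[i] <= delta)
    keep = max(min_beams, min(max_beams, within_margin))
    return order[:keep]

# Two beams are close to the best score, so the beam set shrinks to two.
print(select_dynamic_beams([0.9, 0.2, 0.85, 0.5], delta=0.3, min_beams=2, max_beams=4))
# Five beams are close to the best score, but max_beams caps the set at four.
print(select_dynamic_beams([0.9, 0.88, 0.87, 0.86, 0.85], delta=0.3, min_beams=2, max_beams=4))
```

Under this reading the beam set narrows when the scorer is confident about a few candidates and widens when many candidates look similar.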

#### 6. **Greedy Backtrack Search Method**

Let's try our new `Greedy Backtrack Search` strategy.

In [None]:
# setting basic parameter
config.n = 4  # NOTE: greedy sampling requires n = 1; with n = 4 this cell fails with the ValueError shown below

# greedy backtrack search parameters
config.approach = "greedy_backtrack"
config.sort_completed = True
config.filter_duplicates = True
config.num_iterations = 10
config.max_backtrack_depth = 2         # (NEW) Optional: maximum levels to look back
config.early_stop_when_x_finished = 1

formatted_output = generate_with_search_and_learn(
    question=question,
    config=config,
    llm=llm,
    prm=prm,
    method="greedy_backtrack"
)

New!!!!!!


Greedy Backtracking Search:   0%|          | 0/10 [00:30<?, ?it/s]


ValueError: n must be 1 when using greedy sampling, got 4.

In [None]:
display(Markdown(formatted_output))
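Because the search cell raised before `formatted_output` was reassigned, the `display` call above would show whatever the variable held from the previous run (if the cells were run in order, the Dynamic Beam Search answer). A small guard avoids displaying a stale result. The sketch below is illustrative: `run_or_none` is a hypothetical helper, and `failing_search` is a stand-in that fails with the same message as the recorded run.

```python
def run_or_none(search_fn, **kwargs):
    """Run a search call; on ValueError report the problem and return None."""
    try:
        return search_fn(**kwargs)
    except ValueError as err:
        print(f"Search failed: {err}")
        return None

# Hypothetical stand-in that fails the same way as the run recorded above.
def failing_search(n):
    if n != 1:
        raise ValueError(f"n must be 1 when using greedy sampling, got {n}.")
    return "answer"

formatted_output = run_or_none(failing_search, n=4)
if formatted_output is not None:
    print(formatted_output)  # only shown when the search succeeded
```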

### 🙋 3.2 Testing the System with a Simple Question

In this final example, we’ll test the system using a straightforward question to observe how it performs in simpler cases. This allows us to verify that the system works as expected even for basic queries.

Let's try the following question:

In [None]:
question = "What's the capital of Spain?"

config.n=4

formatted_output = generate_with_search_and_learn(question=question, config=config, llm=llm, prm=prm, method='best_of_n')


Finished in 45.39 seconds

Total tokens in all completions: 32


In [None]:
display(Markdown(formatted_output))

The capital of Spain is Madrid.

Even though we generate several candidate answers (`config.n=4`), the time spent thinking is smaller than for the math question: 45.39 seconds and 32 generated tokens, compared with 81.89 seconds and 1901 tokens for the `best_of_n` run above. This shows that the system spends less compute on easier problems while leveraging its enhanced capabilities for more complex questions.

🏆 **We now have a fully operational pipeline** that leverages test-time compute, enabling the system to "think longer" for more complicated queries, while also maintaining fast response times for straightforward questions.

This approach ensures the system can scale its thinking time based on the task's complexity, offering an efficient and responsive solution for both simple and challenging problems.


## 4. Continuing the Journey and Resources 🧑‍🎓️

If you're eager to continue exploring, be sure to check out the original experimental [blog](https://huggingface.co/spaces/HuggingFaceH4/blogpost-scaling-test-time-compute) and all the references mentioned within it. These resources will deepen your understanding of test-time compute, its benefits, and its applications in LLMs.


Happy learning and experimenting! 🚀