### basics

- prompt vs. response
    - prompt: `resp.prompt`, `resp.prompt_token_ids`
    - response: `resp.outputs[0].text`, `resp.outputs[0].token_ids`
- `make_prefix` (TinyZero)
    - https://github.com/Jiayi-Pan/TinyZero/blob/main/examples/data_preprocess/countdown.py#L57-L66
    ```
    prompt = tokenizer.apply_chat_template(basic_messages, tokenize=False)
    # '<｜begin▁of▁sentence｜><｜User｜>3.11 and 3.9 which is bigger? Please reason step by step, and put your final answer within \\boxed{}.'
    
    prompt = tokenizer.apply_chat_template(basic_messages, tokenize=False, add_generation_prompt=True)
    # '<｜begin▁of▁sentence｜><｜User｜>3.11 and 3.9 which is bigger? Please reason step by step, and put your final answer within \\boxed{}.<｜Assistant｜><think>\n'
    
    # custom: drop the trailing '<think>\n' that the template appends
    prompt = tokenizer.apply_chat_template(basic_messages, tokenize=False, add_generation_prompt=True).removesuffix('<think>\n')
    # '<｜begin▁of▁sentence｜><｜User｜>3.11 and 3.9 which is bigger? Please reason step by step, and put your final answer within \\boxed{}.<｜Assistant｜>'
    
    # custom no think: pre-fill an empty reasoning block
    prompt = tokenizer.apply_chat_template(basic_messages, tokenize=False, add_generation_prompt=True) + '</think>'
    # '<｜begin▁of▁sentence｜><｜User｜>3.11 and 3.9 which is bigger? Please reason step by step, and put your final answer within \\boxed{}.<｜Assistant｜><think>\n</think>'
    ```
- load the parquet dataset
    - https://github.com/Jiayi-Pan/TinyZero/blob/main/verl/utils/dataset/rl_dataset.py#L128
    - default
        - https://github.com/volcengine/verl/blob/main/verl/utils/dataset/rl_dataset.py#L169
        - `prompt_with_chat_template = self.tokenizer.apply_chat_template(chat, add_generation_prompt=True, tokenize=False)`
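The three prompt variants above differ only in what follows `<｜Assistant｜>`. The sketch below derives all three from the default template output with plain string operations, so it runs without downloading the model. It assumes the default output always ends in `'<think>\n'`, as it does for this tokenizer. `make_prompt_variants` is a hypothetical helper, not part of TinyZero or verl.

```python
# Suffix the DeepSeek-R1-Distill chat template appends when add_generation_prompt=True
THINK_OPEN = "<think>\n"


def make_prompt_variants(default_prompt: str) -> dict:
    """Derive the 'custom' and 'custom no think' prompts from the default one.

    default_prompt is assumed to be the output of
        tokenizer.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True)
    """
    if not default_prompt.endswith(THINK_OPEN):
        raise ValueError("expected the template output to end with '<think>\\n'")
    base = default_prompt[: -len(THINK_OPEN)]  # now ends at '<｜Assistant｜>'
    return {
        "default": default_prompt,                # model continues inside an open <think> block
        "custom": base,                           # model may emit <think> itself
        "no_think": default_prompt + "</think>",  # empty reasoning block pre-filled
    }


default = (
    "<｜begin▁of▁sentence｜><｜User｜>3.11 and 3.9 which is bigger? "
    "Please reason step by step, and put your final answer within \\boxed{}."
    "<｜Assistant｜><think>\n"
)
variants = make_prompt_variants(default)
for name, p in variants.items():
    print(name, repr(p[-30:]))
```

Working from the template output rather than hard-coding the whole string keeps the user turn in one place if `basic_messages` changes.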

In [49]:
from transformers import AutoTokenizer
import re
import torch

In [2]:
model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"

In [3]:
tokenizer = AutoTokenizer.from_pretrained(model_id)

In [4]:
basic_messages = [
    {"role": "user", "content": "3.11 and 3.9 which is bigger? Please reason step by step, and put your final answer within \\boxed{}."}
]

In [5]:
tokenizer.apply_chat_template(basic_messages, tokenize=False)

'<｜begin▁of▁sentence｜><｜User｜>3.11 and 3.9 which is bigger? Please reason step by step, and put your final answer within \\boxed{}.'

In [6]:
tokenizer.apply_chat_template(basic_messages, tokenize=False, add_generation_prompt=True)

'<｜begin▁of▁sentence｜><｜User｜>3.11 and 3.9 which is bigger? Please reason step by step, and put your final answer within \\boxed{}.<｜Assistant｜><think>\n'

### vllm inference

In [7]:
from vllm import LLM, SamplingParams



In [8]:
sampling_params = SamplingParams(
    temperature=0.6, 
    max_tokens=32768
)

In [9]:
llm = LLM(model=model_id, max_model_len=32768)

INFO 04-05 09:22:48 llm_engine.py:237] Initializing an LLM engine (vdev) with config: model='deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B', speculative_config=None, tokenizer='deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B, use_v2_block_manager=True, num_scheduler_steps=1

Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00,  1.55it/s]



INFO 04-05 09:22:52 model_runner.py:1071] Loading model weights took 3.3460 GB
INFO 04-05 09:22:54 gpu_executor.py:122] # GPU blocks: 36442, # CPU blocks: 9362
INFO 04-05 09:22:54 gpu_executor.py:126] Maximum concurrency for 32768 tokens per request: 17.79x
INFO 04-05 09:22:58 model_runner.py:1402] Capturing the model for CUDA graphs. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI.
INFO 04-05 09:22:58 model_runner.py:1406] CUDA graphs can take additional 1~3 GiB memory per GPU. If you are running out of memory, consider decreasing `gpu_memory_utilization` or enforcing eager mode. You can also reduce the `max_num_seqs` as needed to decrease memory usage.
INFO 04-05 09:23:07 model_runner.py:1530] Graph capturing finished in 9 secs.
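The `Maximum concurrency` figure in the log can be reproduced from the GPU block count. This is a worked check, assuming vLLM's default KV-cache block size of 16 tokens (the log does not print the block size):

```python
# Values copied from the vLLM log above
num_gpu_blocks = 36442
max_model_len = 32768
# Assumption: vLLM's default KV-cache block size (not shown in the log)
block_size = 16

kv_cache_tokens = num_gpu_blocks * block_size     # total tokens the KV cache can hold
max_concurrency = kv_cache_tokens / max_model_len  # full-length requests that fit at once

print(f"KV cache capacity: {kv_cache_tokens} tokens")                       # 583072 tokens
print(f"Max concurrency at {max_model_len} tokens: {max_concurrency:.2f}x")  # 17.79x
```

The match with the logged 17.79x suggests the block-size assumption holds for this run.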


In [10]:
prompt = tokenizer.apply_chat_template(basic_messages, tokenize=False, add_generation_prompt=True)
prompt

'<｜begin▁of▁sentence｜><｜User｜>3.11 and 3.9 which is bigger? Please reason step by step, and put your final answer within \\boxed{}.<｜Assistant｜><think>\n'

In [11]:
resp = llm.generate(prompt, sampling_params=sampling_params)[0]

Processed prompts: 100%|██████████| 1/1 [00:05<00:00,  5.96s/it, est. speed input: 5.70 toks/s, output: 193.24 toks/s]


In [12]:
print(resp.prompt)
print(resp.prompt_token_ids)

<｜begin▁of▁sentence｜><｜User｜>3.11 and 3.9 which is bigger? Please reason step by step, and put your final answer within \boxed{}.<｜Assistant｜><think>

[151646, 151646, 151644, 18, 13, 16, 16, 323, 220, 18, 13, 24, 892, 374, 11243, 30, 5209, 2874, 3019, 553, 3019, 11, 323, 2182, 697, 1590, 4226, 2878, 1124, 79075, 46391, 151645, 151648, 198]


In [13]:
assert tokenizer.encode(resp.prompt) == resp.prompt_token_ids

In [14]:
tokenizer.decode(151646), tokenizer.decode(7810)

('<｜begin▁of▁sentence｜>', '}.')
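The `prompt_token_ids` printed in In [12] start with 151646 twice, and In [14] shows that 151646 is `<｜begin▁of▁sentence｜>`. The templated string already contains the BOS text, and tokenizing it with special tokens enabled adds a second BOS (the assert in In [13] shows `tokenizer.encode` does the same). With transformers, `tokenizer.encode(prompt, add_special_tokens=False)` avoids adding the extra one. The dependency-free sketch below only detects and collapses a repeated leading BOS. `collapse_leading_bos` is a hypothetical helper:

```python
from typing import List

BOS_ID = 151646  # <｜begin▁of▁sentence｜> for DeepSeek-R1-Distill-Qwen-1.5B (see In [14])


def collapse_leading_bos(ids: List[int], bos_id: int = BOS_ID) -> List[int]:
    """Drop repeated BOS ids at the start of a sequence, keeping exactly one."""
    i = 0
    while i + 1 < len(ids) and ids[i] == bos_id and ids[i + 1] == bos_id:
        i += 1
    return ids[i:]


prompt_token_ids = [151646, 151646, 151644, 18, 13, 16, 16, 323]  # first ids from In [12]
fixed = collapse_leading_bos(prompt_token_ids)
print(prompt_token_ids[:3], "->", fixed[:3])
```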

In [15]:
len(resp.outputs[0].token_ids), len(tokenizer.encode(resp.outputs[0].text))

(1152, 1152)
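The speeds in the In [11] progress bar appear to be token counts divided by wall-clock time: 34 prompt ids (In [12], both BOS ids included) and 1152 generated ids (In [15]) over about 5.96 s. A quick arithmetic check, assuming that interpretation:

```python
# Numbers copied from In [11], In [12] and In [15]
elapsed_s = 5.96      # wall time shown by the progress bar (rounded)
prompt_tokens = 34    # len(resp.prompt_token_ids)
output_tokens = 1152  # len(resp.outputs[0].token_ids)

input_speed = prompt_tokens / elapsed_s
output_speed = output_tokens / elapsed_s

# Reported: input 5.70 toks/s, output 193.24 toks/s; small gaps come from rounding elapsed_s
print(f"input:  {input_speed:.2f} toks/s")
print(f"output: {output_speed:.2f} toks/s")
```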

In [16]:
# resp.outputs[0].text

### custom chat template

In [33]:
prompt = tokenizer.apply_chat_template(basic_messages, tokenize=False)
prompt

'<｜begin▁of▁sentence｜><｜User｜>3.11 and 3.9 which is bigger? Please reason step by step, and put your final answer within \\boxed{}.'

In [34]:
resp = llm.generate(prompt, sampling_params=sampling_params)[0]

Processed prompts: 100%|██████████| 1/1 [00:01<00:00,  1.21s/it, est. speed input: 25.74 toks/s, output: 166.06 toks/s]


In [35]:
print(resp.outputs[0].text)

**
</think>

To determine which number is bigger between **3.11** and **3.9**, follow these steps:

1. **Compare the whole number part** of both numbers. Both have **3** as the whole number.
2. **Compare the decimal parts**:
   - **0.11** (from 3.11)
   - **0.9** (from 3.9, which can be written as 3.90)
3. **Convert 3.9 to two decimal places**: 3.90
4. **Compare 0.11 and 0.90**:
   - **0.11** is less than **0.90**
5. **Conclusion**: Since 0.11 is less than 0.90, **3.90** is larger than **3.11**.

**Final Answer**: \boxed{3.9}
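Without the generation prompt (In [33]), the model still closes a reasoning block: it emits `**`, then `</think>`, then the answer. When the prompt ends at `<｜Assistant｜>` (In [46]), it opens `<think>` itself. The sketch below separates reasoning from the final answer in either case. `split_reasoning` is a hypothetical helper, and it assumes at most one `</think>` in the text.

```python
from typing import Optional, Tuple

THINK_OPEN, THINK_CLOSE = "<think>", "</think>"


def split_reasoning(text: str) -> Tuple[str, Optional[str]]:
    """Split generated text into (reasoning, answer).

    Handles outputs that start with an explicit '<think>' (prompt ended at
    '<｜Assistant｜>') and outputs that only contain '</think>' (prompt already
    ended with '<think>\\n'). answer is None if '</think>' never appears,
    e.g. because generation stopped mid-reasoning.
    """
    body = text.lstrip()
    if body.startswith(THINK_OPEN):
        body = body[len(THINK_OPEN):]
    reasoning, sep, answer = body.partition(THINK_CLOSE)
    if not sep:
        return reasoning.strip(), None
    return reasoning.strip(), answer.strip()


out = "**\n</think>\n\nTo determine which number is bigger ...\n\n**Final Answer**: \\boxed{3.9}\n"
reasoning, answer = split_reasoning(out)
print(repr(reasoning))
print(answer.splitlines()[-1])
```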


In [41]:
prompt = tokenizer.apply_chat_template(basic_messages, tokenize=False, add_generation_prompt=True)
prompt

'<｜begin▁of▁sentence｜><｜User｜>3.11 and 3.9 which is bigger? Please reason step by step, and put your final answer within \\boxed{}.<｜Assistant｜><think>\n'

In [42]:
resp = llm.generate(prompt, sampling_params=sampling_params)[0]

Processed prompts: 100%|██████████| 1/1 [00:11<00:00, 11.73s/it, est. speed input: 2.90 toks/s, output: 182.41 toks/s]


In [45]:
# print(resp.outputs[0].text)

In [46]:
prompt = '<｜begin▁of▁sentence｜><｜User｜>3.11 and 3.9 which is bigger? Please reason step by step, and put your final answer within \\boxed{}.<｜Assistant｜>'
resp = llm.generate(prompt, sampling_params=sampling_params)[0]

Processed prompts: 100%|██████████| 1/1 [00:09<00:00,  9.43s/it, est. speed input: 3.39 toks/s, output: 175.87 toks/s]


In [48]:
print(resp.outputs[0].text)

<think>
Alright, so I've got this problem here: 3.11 and 3.9, and I need to figure out which one is bigger. Hmm, okay. Let me think about how to approach this. I'm pretty sure that when comparing decimals, you start from the left and compare each digit one by one. So, first, I should look at the whole number part of both numbers. 

Both 3.11 and 3.9 have the same whole number part, which is 3. That means the whole numbers are equal, so I can't say one is bigger just yet. I need to look at the decimal parts. 

The first decimal place after the decimal point is the tenths place. In 3.11, the tenths place is 1, and in 3.9, the tenths place is 9. Since 9 is greater than 1, that means 3.9 is larger than 3.11. Wait, let me make sure I'm doing this right. 

So, if I write both numbers aligned by their decimal points:

3.11
3.9

I can think of 3.9 as 3.90 to make the comparison easier. Now, comparing 3.11 and 3.90. The first digit after the decimal is 1 vs. 9. Since 9 is bigger, 3.90 is bigger

#### no think

In [37]:
prompt = '<｜begin▁of▁sentence｜><｜User｜>3.11 and 3.9 which is bigger? Please reason step by step, and put your final answer within \\boxed{}.<｜Assistant｜><think>\n</think>'
prompt

'<｜begin▁of▁sentence｜><｜User｜>3.11 and 3.9 which is bigger? Please reason step by step, and put your final answer within \\boxed{}.<｜Assistant｜><think>\n</think>'

In [38]:
resp = llm.generate(prompt, sampling_params=sampling_params)[0]

Processed prompts: 100%|██████████| 1/1 [00:01<00:00,  1.30s/it, est. speed input: 26.98 toks/s, output: 166.53 toks/s]


In [40]:
print(resp.outputs[0].text)



To determine which number is larger between **3.11** and **3.9**, follow these steps:

1. **Compare the whole number parts**: Both numbers have the same whole number part, which is **3**.

2. **Compare the decimal parts**:
   - **0.11** (from 3.11)
   - **0.90** (from 3.9, which can be written as **0.90** to have the same number of decimal places)

3. **Compare the tenths place**:
   - **1** (from 3.11)
   - **9** (from 3.9)

Since **9** is greater than **1**, the tenths place of **3.9** is larger than that of **3.11**.

4. **Conclusion**: Because the tenths place of **3.9** is larger, **3.9** is the larger number.

**Final Answer**: \boxed{3.9}
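Both completed runs end with `\boxed{3.9}`, as the prompt requested. The sketch below pulls that answer out for scoring. `extract_last_boxed` is a hypothetical helper, not taken from TinyZero or verl, whose reward code may differ. It counts braces rather than using a regex, so nested braces such as `\boxed{\frac{1}{2}}` are handled.

```python
from typing import Optional

BOXED = "\\boxed{"


def extract_last_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None."""
    start = text.rfind(BOXED)
    if start == -1:
        return None
    i = start + len(BOXED)
    depth = 1  # we are inside the opening brace of \boxed{
    for j in range(i, len(text)):
        if text[j] == "{":
            depth += 1
        elif text[j] == "}":
            depth -= 1
            if depth == 0:
                return text[i:j]
    return None  # unbalanced braces, e.g. output cut off by max_tokens


answer = "**Final Answer**: \\boxed{3.9}"
print(extract_last_boxed(answer))  # 3.9
```

Taking the last `\boxed{}` avoids picking up an earlier one the model might write while reasoning.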


### TinyZero