In [1]:
import os

# Pin the visible GPU before importing vllm so the setting is in place before CUDA initializes.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

from transformers import AutoTokenizer
from vllm import LLM, SamplingParams
from vllm.steer_vectors.request import SteerVectorRequest

MODEL_PATH = "/data/zju-46/shenyl/hf/model/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B/"

# Load the model with steer-vector support enabled.
llm = LLM(
    model=MODEL_PATH,
    enable_steer_vector=True,
    enforce_eager=True,
    tensor_parallel_size=1,
    enable_chunked_prefill=False,
)

# Initialize tokenizer
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)

INFO 11-04 01:56:13 [utils.py:253] non-default args: {'disable_log_stats': True, 'enforce_eager': True, 'enable_steer_vector': True, 'enable_chunked_prefill': False, 'model': '/data/zju-46/shenyl/hf/model/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B/'}
INFO 11-04 01:56:13 [model.py:657] Resolved architecture: Qwen2ForCausalLM
INFO 11-04 01:56:13 [model.py:1746] Using max model len 131072
INFO 11-04 01:56:15 [scheduler.py:211] Chunked prefill is enabled with max_num_batched_tokens=8192.
INFO 11-04 01:56:15 [vllm.py:414] Cudagraph is disabled under eager mode
(EngineCore_DP0 pid=2765190) INFO 11-04 01:56:15 [core.py:94] Initializing a V1 LLM engine (v0.1.dev10888+g9d4fd0da4.d20251031) with config: model='/data/zju-46/shenyl/hf/model/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B/', speculative_config=None, tokenizer='/data/zju-46/shenyl/hf/model/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B/', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_
(EngineCore_DP0 pid=2765190) INFO 11-04 01:56:19 [default_loader.py:314] Loading weights took 1.19 seconds
(EngineCore_DP0 pid=2765190) INFO 11-04 01:56:19 [steer_vector_model_runner_mixin.py:36] Initialized SteerVector worker manager
(EngineCore_DP0 pid=2765190) INFO 11-04 01:56:19 [steer_vector_model_runner_mixin.py:50] Wrapping model with steer vector support
(EngineCore_DP0 pid=2765190) INFO 11-04 01:56:19 [hidden_states_model_runner_mixin.py:90] Wrapped 28 decoder layers for hidden states capture
(EngineCore_DP0 pid=2765190) INFO 11-04 01:56:20 [gpu_model_runner.py:2971] Model loading took 3.3466 GiB and 1.342046 seconds
(EngineCore_DP0 pid=2765190) INFO 11-04 01:56:21 [gpu_worker.py:343] Available KV cache memory: 37.90 GiB
(EngineCore_DP0 pid=2765190) INFO 11-04 01:56:22 [kv_cache_utils.py:1247] GPU KV cache size: 1,419,360 tokens
(EngineCore_DP0 pid=2765190) INFO 11-04 01:56:

In [2]:
prompt_template = "<|User|>Return your final response within \\boxed{}.\n%s<|Assistant|><think>\n"
prompt = "Find the constant term in the expansion of $$\\left(10x^3-\\frac{1}{2x^2}\\right)^{5}$$"
text = prompt_template % prompt

output = llm.generate(
    text,
    SamplingParams(
        temperature=0,
        max_tokens=4096,
        skip_special_tokens=False,
    ),
)
print(output[0].outputs[0].text)
# Count the token IDs vLLM actually generated; re-tokenizing the decoded text can drift from this count.
print("Baseline tokens:", len(output[0].outputs[0].token_ids))

Okay, so I have this problem here: I need to find the constant term in the expansion of \(\left(10x^3 - \frac{1}{2x^2}\right)^5\). Hmm, let me think about how to approach this.

First, I remember that when you have an expression raised to a power, like \((a + b)^n\), you can use the binomial theorem to expand it. The binomial theorem says that each term in the expansion is of the form \(\binom{n}{k} a^{n - k} b^k\), where \(k\) ranges from 0 to \(n\). So, in this case, \(a\) is \(10x^3\), \(b\) is \(-\frac{1}{2x^2}\), and \(n\) is 5.

Alright, so the general term in the expansion will be \(\binom{5}{k} (10x^3)^{5 - k} \left(-\frac{1}{2x^2}\right)^k\). I need to find the term where the exponent of \(x\) is zero because that's the constant term. So, let me write that down.

First, let's compute the exponent of \(x\) in each term. The exponent from \((10x^3)^{5 - k}\) is \(3(5 - k)\), and the exponent from \(\left(-\frac{1}{2x^2}\right)^k\) is \(-2k\). So, the total exponent of \(x\) in e
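
The expected answer can be cross-checked without the model. The sketch below is an addition, not part of the original notebook: it applies the binomial theorem with exact fractions and keeps only the terms whose exponent of x is zero. The helper name `constant_term` and its parameters are hypothetical, chosen for illustration. For this problem the exponent is 3(5 - k) - 2k = 15 - 5k, which vanishes at k = 3.

```python
from fractions import Fraction
from math import comb


def constant_term(a_coef, a_exp, b_coef, b_exp, n):
    """Constant term of (a_coef * x**a_exp + b_coef * x**b_exp)**n, or 0 if there is none."""
    total = Fraction(0)
    for k in range(n + 1):
        # Exponent of x in the k-th binomial term.
        exponent = a_exp * (n - k) + b_exp * k
        if exponent == 0:
            total += comb(n, k) * Fraction(a_coef) ** (n - k) * Fraction(b_coef) ** k
    return total


# (10x^3 - 1/(2x^2))^5: only k = 3 contributes, giving C(5,3) * 10^2 * (-1/2)^3.
result = constant_term(10, 3, Fraction(-1, 2), -2, 5)
print(result)  # prints -125
```

The value -125 is the reference that both the baseline and the steered generations should place inside `\boxed{}`.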

In [3]:
# Steer-vector request applied on top of the same prompt and sampling settings as the baseline.
steer_vector_request_pos = SteerVectorRequest(
    steer_vector_name="fast",
    steer_vector_int_id=1,
    steer_vector_local_path="MATH500.gguf",
    scale=8.0,
    target_layers=list(range(19, 28)),  # layers 19 through 27
    prefill_trigger_tokens=[-1],
    generate_trigger_tokens=[-1],
    debug=False,
    algorithm="direct",
)
output = llm.generate(
    text,
    SamplingParams(
        temperature=0,
        max_tokens=4096,
        skip_special_tokens=False,
    ),
    steer_vector_request=steer_vector_request_pos,
)
print(output[0].outputs[0].text)
# Count the token IDs vLLM actually generated; re-tokenizing the decoded text can drift from this count.
print("Fast tokens:", len(output[0].outputs[0].token_ids))

Okay, so I have this problem here: Find the constant term in the expansion of \(\left(10x^3 - \frac{1}{2x^2}\right)^5\). Hmm, alright, let me try to figure this out step by step.

First, I remember that when you have an expression raised to a power, like \((a + b)^n\), you can expand it using the binomial theorem. The binomial theorem says that each term in the expansion is of the form \(\binom{n}{k} a^{n - k} b^{k}\), where \(k\) goes from 0 to \(n\). So, in this case, \(a\) is \(10x^3\), \(b\) is \(-\frac{1}{2x^2}\), and \(n\) is 5.

So, the general term in the expansion would be \(\binom{5}{k} (10x^3)^{5 - k} \left(-\frac{1}{2x^2}\right)^k\). I need to find the term where the exponent of \(x\) is zero because that's the constant term. So, my goal is to find the value of \(k\) such that the exponent of \(x\) in that term is zero.

Let me write that down. The exponent of \(x\) in each term is given by the exponent in \(a^{5 - k}\) plus the exponent in \(b^k\). So, for \(a^{5 - k}\), t
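
The two cells print a "Baseline tokens" count and a "Fast tokens" count, so the natural follow-up is the relative change between them. The sketch below is an addition, not part of the original notebook: `token_savings` is a hypothetical helper, and the numbers in the usage line are made-up placeholders rather than measured results from these runs.

```python
def token_savings(baseline_tokens: int, steered_tokens: int) -> float:
    """Fraction of tokens saved by the steered run relative to the baseline.

    Positive values mean the steered generation was shorter.
    """
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return (baseline_tokens - steered_tokens) / baseline_tokens


# Placeholder counts for illustration only; substitute the two counts printed above.
example = token_savings(1000, 750)
print(f"Relative reduction: {example:.1%}")
```

Because the notebook reuses the name `output` for both runs, the baseline count has to be stored in its own variable before the steered cell runs.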