In [1]:
import pandas as pd 
import os 

from transformers import AutoModelForCausalLM, AutoTokenizer
from datasets import load_dataset

import seaborn as sns

from datasets import load_dataset
import numpy as np
import json

In [2]:
pwd

'/home/al2644/research/codebase/reasoning/perturb-r/notebook'

In [4]:
cd /home/al2644/research/codebase/reasoning/perturb-r

/home/al2644/research/codebase/reasoning/perturb-r


In [12]:
from utils import process_thinking as tools

# Reasoning Results

In [5]:
root = "./results/deepmath_7to9/benchmark"
for fname in os.listdir(root):
    if 'correct' not in fname:
        print(fname)
        df = pd.read_pickle(os.path.join(root, fname))
        print(f"Accuracy: {df['correct'].mean()}")

R1-Distill-Qwen-1.5B.pickle
Accuracy: 0.4706666666666667
R1-Distill-Qwen-7B.pickle
Accuracy: 0.6553333333333333
Qwen3-1.7B.pickle
Accuracy: 0.56
Qwen3-4B.pickle
Accuracy: 0.7246666666666667
Qwen3-1.7B_nothinking.pickle
Accuracy: 0.24133333333333334
Qwen3-4B_nothinking.pickle
Accuracy: 0.3293333333333333
Qwen3-8B.pickle
Accuracy: 0.7453333333333333
Qwen3-8B_nothinking.pickle
Accuracy: 0.3426666666666667
Qwen3-30B-A3B.pickle
Accuracy: 0.7946666666666666
Qwen3-30B-A3B_nothinking.pickle
Accuracy: 0.36933333333333335


In [14]:
df = pd.read_pickle(os.path.join(root, "R1-Distill-Qwen-1.5B.pickle"))
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B")

In [59]:
idx = 15
example = df[df["correct"] == 1.0].iloc[idx]
problem, response, solution = example["problem"], example["response"], example["solution"]
reasoning, answer = response.split("</think>")

In [17]:
input_str = tokenizer.apply_chat_template([{"role": "user", "content": problem}], tokenize=False, add_generation_prompt=True)

In [70]:
chunks = tools.minmax_chunk(reasoning, 30)
original_chunks = chunks[:5]
corrupted_chunks = chunks[5:10]
reasoning_prompt = "###ORIGINAL REASONING START\n"  + "\n\n".join(original_chunks) + "\n###ORIGINAL REASONING END\n"
reasoning_prompt += "###REASONING TO CHANGE START\n" + "\n\n".join(corrupted_chunks) + "\n###REASONING TO CHANGE END"

In [None]:
stream = ""
for idx, chunk in enumerate(chunks):
    stream += ""

instruction_prompt = """<instruction>
Below is a partial reasoning trace of a model approaching to solve a problem. The reasoning trace is chunked into two parts: original reasoning, reasoning to change. Your task is to mimic the same language style, length, structure, and thinking strategies but to distract the model reasoning to a different problem, distracting the model astray from its original chain of thought.
    
You will be rewarded if the distraction is successful and subtle. If the reasoning chain is too out of scope, it will not be sucessful. Also, if the model can easily realize it has been distracted, it is also not successful.
    
FYI, the reasoning trace is for this problem: {problem}
</instruction>\n""".format(problem = problem)

In [63]:
solution

'\\sqrt{3}'

In [64]:
prompt = instruction_prompt + reasoning_prompt
print(prompt)

<instruction>
Below is a partial reasoning trace of a model approaching to solve a problem. The reasoning trace is chunked into two parts: original reasoning, reasoning to change. Your task is to mimic the same language style, length, structure, and thinking strategies but to distract the model reasoning to a different problem, distracting the model astray from its original chain of thought.
    
You will be rewarded if the distraction is successful and subtle. If the reasoning chain is too out of scope, it will not be sucessful. Also, if the model can easily realize it has been distracted, it is also not successful.
    
FYI, the reasoning trace is for this problem: Determine the maximum value of the expression \( \max_{a, b, c \in \mathbb{R}^{+}} \min \left\{\frac{1}{a}, \frac{1}{b^{2}}, \frac{1}{c^{3}}, a+b^{2}+c^{3}\right\} \).
</instruction>
###ORIGINAL REASONING START
Alright, so I have this problem here: I need to determine the maximum value of the expression \( \max_{a, b, c \i

In [65]:
print(response)

Alright, so I have this problem here: I need to determine the maximum value of the expression \( \max_{a, b, c \in \mathbb{R}^{+}} \min \left\{\frac{1}{a}, \frac{1}{b^{2}}, \frac{1}{c^{3}}, a+b^{2}+c^{3}\right\} \). Hmm, that sounds a bit complicated, but let's try to unpack it step by step.

First, let me make sure I understand what's being asked. I need to find the maximum value of the minimum among three expressions: \( \frac{1}{a} \), \( \frac{1}{b^2} \), \( \frac{1}{c^3} \), and \( a + b^2 + c^3 \). So, essentially, I'm looking for the highest possible value that the smallest of these four expressions can have. 

Let me denote the four expressions as follows to make it clearer:
1. \( x = \frac{1}{a} \)
2. \( y = \frac{1}{b^2} \)
3. \( z = \frac{1}{c^3} \)
4. \( w = a + b^2 + c^3 \)

So, the problem is to maximize \( \min\{x, y, z, w\} \). 

Since all variables \( a, b, c \) are positive real numbers, all the expressions \( x, y, z \) are positive as well. 

I think the key here is

In [66]:
distractor = """Alright, so let me work through this step by step. I need to determine the maximum value of $\max_{a, b, c \in \mathbb{R}^{+}} \min \left\{\frac{1}{a}, \frac{1}{b^{2}}, \frac{1}{c^{3}}, a+b^{2}+c^{3}\right\}$.

To start, I’ll try to simplify and structure the problem a bit more. Let’s look at what’s being minimized. We have four expressions, each involving either a reciprocal or a direct sum. The variables $a, b, c$ are all strictly positive real numbers.

It’s often helpful in problems like this to look for some symmetry, or to see if there’s a nice relationship that can simplify things. Here, notice that the exponents on the denominators—1 for $a$, 2 for $b$, and 3 for $c$—suggest a possible connection to the Cauchy-Schwarz or AM-GM inequalities. Maybe that will come in handy.

Let me try reparametrizing. Suppose I fix $a = x$, $b = y$, and $c = z$, all positive. Now, the four quantities are $\frac{1}{x}$, $\frac{1}{y^2}$, $\frac{1}{z^3}$, and $x + y^2 + z^3$. To maximize the minimum of these four, it feels like I want all four to be as “close” as possible.

Maybe it helps to set the first three equal and see what happens: set $\frac{1}{x} = \frac{1}{y^2} = \frac{1}{z^3} = t$. Solving, we get $x = \frac{1}{t}$, $y = \frac{1}{\sqrt{t}}$, $z = \frac{1}{t^{1/3}}$.

Now, what about the sum $x + y^2 + z^3$? Plug those in:

$$
x + y^2 + z^3 = \frac{1}{t} + \frac{1}{t} + \frac{1}{t} = \frac{3}{t}
$$

So, if $t$ is positive, then $x + y^2 + z^3 = \frac{3}{t}$. But for our original minimization, the minimum is $\min\left\{t, \frac{3}{t}\right\}$. To maximize this minimum, I can set $t = \frac{3}{t} \implies t^2 = 3 \implies t = \sqrt{3}$.

So, is $t = \sqrt{3}$ the answer? Let’s check the values:

* $x = \frac{1}{\sqrt{3}}$
* $y = \frac{1}{\sqrt{\sqrt{3}}}$
* $z = \frac{1}{3^{1/6}}$
* $x + y^2 + z^3 = \sqrt{3} + \sqrt{3} + \sqrt{3} = 3\sqrt{3}$

Wait—hold on. Actually, this structure reminds me of problems involving maximizing areas or volumes with fixed perimeters or constraints. Maybe I should take a different approach. Consider instead, if there’s a geometric interpretation. The reciprocal terms seem similar to the sides or radii of a geometric object, maybe a triangle or a circle, especially since all the variables are positive and involve exponents. Alternatively, perhaps this relates to maximizing a function subject to constraints, like using Lagrange multipliers.

Let’s try to express $a + b^2 + c^3$ in a geometric way. Suppose I interpret $a, b, c$ as dimensions of a box, where $a$ is the length, $b^2$ is the width, and $c^3$ is the height. The product $a \cdot b^2 \cdot c^3$ could be interpreted as the volume, and its reciprocal appears in the minimization. So maybe maximizing the minimum corresponds to maximizing the smallest “dimension” or “measurement” related to the box, possibly involving surface area or something similar.

Alternatively, could this be modeled as an optimization over positive reals with certain constraints, perhaps even relating to inequalities like Hölder’s or Young’s? That could help find the critical point where the minimum is largest.

Let me try expressing the sum and reciprocals using AM-GM or another inequality and see if a bound emerges that might point toward the maximal minimum.
"""

In [67]:
distracted = input_str+ "\n\n".join(original_chunks) +"\n\n" +distractor

In [68]:
print(distracted)

<｜begin▁of▁sentence｜><｜User｜>Evaluate the integral \( \int_{C}\frac{1}{1+\sqrt{z+2}} \) where \( C \) is the positively oriented unit circle. Determine if Cauchy's Theorem can be applied to conclude the value of the integral.<｜Assistant｜><think>
Alright, so I have this problem here: I need to determine the maximum value of the expression \( \max_{a, b, c \in \mathbb{R}^{+}} \min \left\{\frac{1}{a}, \frac{1}{b^{2}}, \frac{1}{c^{3}}, a+b^{2}+c^{3}\right\} \). Hmm, that sounds a bit complicated, but let's try to unpack it step by step.

First, let me make sure I understand what's being asked. I need to find the maximum value of the minimum among three expressions: \( \frac{1}{a} \), \( \frac{1}{b^2} \), \( \frac{1}{c^3} \), and \( a + b^2 + c^3 \). So, essentially, I'm looking for the highest possible value that the smallest of these four expressions can have. 

Let me denote the four expressions as follows to make it clearer:
1. \( x = \frac{1}{a} \)
2. \( y = \frac{1}{b^2} \)
3. \( z = 

# Distraction

In [177]:
root="/home/al2644/research/codebase/reasoning/perturb-r/results/math-500/annotated_reasoning"
df= pd.read_pickle(os.path.join(root, "Qwen3-1.7B.pickle"))
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-1.7B")

In [182]:
df["1st_answer_chunk_index"][10]

7

In [189]:
example = df.iloc[10]
problem, chunks, index, response, answer = example["problem"], example["chunks"], example["1st_answer_chunk_index"], example["response"], example["answer"]

In [212]:
df["solution"].loc[426]

'In order for $\\log x^2$ to be defined, we must have $x^2 > 0$. This true for all $x$, except for $x = 0$. It follows that the domain of this function is $x < 0$ or $x > 0$. Therefore, our answer is $0 + 0 = \\boxed{0}$.'

In [207]:
print(df.loc[426]["prompt"])

<instruction>
You are given a sequence of reasoning steps, divided into chunks, representing how a model solves a problem. Your task is to identify the chunk number where the answer is first derived. From the starting chunk to the last chunk, find what is the first chunk where model finds the answer. Even if later chunks verify or try alternative methods, only mark the first chunk where the correct final answer appears. The answer should not be in the \boxed{}, as it is usually the last appearance.

You can start checking chunk by chunk from the start to the end. But at the end, You MUST return the chunk number using the format:
<answer_chunk>CHUNK_NUMBER</answer_chunk>
This format is required for downstream processing.
</instruction>

<example_input>The answer that you should look for the reasoning below is 52_8
###START OF CHUNK 0
<think>

###END OF CHUNK 0
###START OF CHUNK 1
Okay, so I need to find the product of 6_8 and 7_8, and then express the answer in base 8. Hmm, let me think

prompt = """<instruction>\nYou are given a sequence of reasoning steps, divided into chunks, representing how a model solves a problem. Your task is to precisely locate the chunk number where the answer is first reached by the model. From the starting chunk, pinpoint what is the first chunk where model has reached the solution. You can ignore later chunks that verify, check, or try alternative methods, only mark the first chunk where the correct answer is mentioned by hinted. The answer usually should not appear in the first chunk that restates the problem. The answer should be fully derived but not necessarily verified yet.

You can start checking chunk by chunk from the start to the end. You should verify that it is indeed the first mention of the answer. At the end of your response, You MUST report the chunk number with the tags:
<answer_chunk>CHUNK_NUMBER</answer_chunk>
</instruction>

problem: {problem}

solution: {solution} (where answer is {answer})

reasoning chunk:
""".format(problem=problem, solution = solution, answer = answer)

In [215]:
print(prompt_str)

<instruction>
You are given a sequence of reasoning steps, divided into chunks, representing how a model solves a problem. Your task is to identify the chunk number where the answer is first reached by the model. From the starting chunk, find what is the first chunk where model has reached the solution. You can ignore later chunks that verify, check, or try alternative methods, only mark the first chunk where the correct answer is mentioned by hinted. The answer usually should not appear in the first chunk that restates the problem. The answer should be fully derived but not necessarily verified yet.

You can start checking chunk by chunk from the start to the end. You should verify that it is indeed the first mention of the answer. At the end of your response, You MUST report the chunk number with the tags:
<answer_chunk>CHUNK_NUMBER</answer_chunk>
</instruction>

problem: You have seven bags of gold coins. Each bag has the same number of gold coins. One day, you find a bag of 53 coin