# MATH-500 Dataset Utilities Testing

This notebook tests the MATH-500 dataset utilities from `pipeline/utils/data_utilsMATH500.py`.

Based on: https://github.com/rasbt/reasoning-from-scratch/blob/main/ch03/01_main-chapter-code/ch03_main.ipynb

In [4]:
import sys
sys.path.append('..')

from pipeline.utils.data_utilsMATH500 import (
    load_math500_data,
    render_prompt,
    get_last_boxed,
    extract_final_candidate,
    split_into_parts,
    normalize_answer,
    format_problem,
    batch_problems,
    prepare_prompts,
    parse_answer
)

## 1. Load MATH-500 Dataset

In [5]:
# URL for MATH-500 dataset
MATH500_URL = "https://raw.githubusercontent.com/rasbt/reasoning-from-scratch/main/ch03/01_main-chapter-code/math500_test.json"

# Load the dataset
math_data = load_math500_data(
    local_path='../data/math500_test.json',
    url=MATH500_URL
)

print(f"Loaded {len(math_data)} problems")
print(f"\nFirst problem keys: {list(math_data[0].keys())}")

Loaded 500 problems

First problem keys: ['problem', 'solution', 'answer', 'subject', 'level', 'unique_id']
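
The call above takes both a local path and a URL, which suggests a cache-then-download pattern. The implementation of `load_math500_data` is not shown in this notebook, so the following is only a minimal sketch of that pattern under that assumption. The name `load_json_with_fallback` and its `timeout` parameter are hypothetical.

```python
import json
import urllib.request
from pathlib import Path


def load_json_with_fallback(local_path, url, timeout=30):
    """Hypothetical helper: read a cached JSON file, else download and cache it."""
    path = Path(local_path)
    if path.exists():
        # Cached copy found: no network access needed.
        with path.open("r", encoding="utf-8") as f:
            return json.load(f)

    # No cached copy: fetch from the URL and save it for the next run.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        data = json.loads(response.read().decode("utf-8"))
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        json.dump(data, f, indent=2)
    return data
```

With this shape, a second run reads `../data/math500_test.json` directly and works offline.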


## 2. Inspect Sample Problems

In [6]:
# Display first problem
sample_problem = math_data[0]
print("Problem:")
print(sample_problem.get('problem', ''))
print("\nAnswer:")
print(sample_problem.get('answer', ''))
print("\nSolution (first 200 chars):")
print(sample_problem.get('solution', '')[:200] + "...")

Problem:
Convert the point $(0,3)$ in rectangular coordinates to polar coordinates.  Enter your answer in the form $(r,\theta),$ where $r > 0$ and $0 \le \theta < 2 \pi.$

Answer:
\left( 3, \frac{\pi}{2} \right)

Solution (first 200 chars):
We have that $r = \sqrt{0^2 + 3^2} = 3.$  Also, if we draw the line connecting the origin and $(0,3),$ this line makes an angle of $\frac{\pi}{2}$ with the positive $x$-axis.

[asy]
unitsize(0.8 cm);
...


## 3. Test Prompt Rendering

In [7]:
# Render a prompt for the first problem
problem_text = math_data[0]['problem']
prompt = render_prompt(problem_text)

print("Rendered Prompt:")
print("=" * 60)
print(prompt)
print("=" * 60)

Rendered Prompt:
You are a helpful math assistant.
Answer the question and write the final result on a new line as:
\boxed{ANSWER}

Question:
Convert the point $(0,3)$ in rectangular coordinates to polar coordinates.  Enter your answer in the form $(r,\theta),$ where $r > 0$ and $0 \le \theta < 2 \pi.$

Answer:
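
The rendered prompt follows a fixed template with the problem text substituted in. A minimal sketch of such a template, rebuilt only from the printed output above, could look like the following. `render_prompt_sketch` is a hypothetical stand-in, and the real `render_prompt` may differ, for example in trailing whitespace.

```python
def render_prompt_sketch(problem_text):
    """Hypothetical stand-in for render_prompt, rebuilt from the printed output."""
    return (
        "You are a helpful math assistant.\n"
        "Answer the question and write the final result on a new line as:\n"
        "\\boxed{ANSWER}\n\n"
        f"Question:\n{problem_text}\n\n"
        "Answer:"
    )
```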


## 4. Test Answer Extraction

In [8]:
# Test get_last_boxed function
test_texts = [
    "The answer is \\boxed{42}",
    "First we get \\boxed{10}, then \\boxed{20}, and finally \\boxed{30}",
    "No boxed answer here",
    "The final result is \\boxed{x^2 + 2x + 1}"
]

print("Testing get_last_boxed:")
for text in test_texts:
    result = get_last_boxed(text)
    print(f"Text: {text[:50]}...")
    print(f"Extracted: {result}")
    print()

Testing get_last_boxed:
Text: The answer is \boxed{42}...
Extracted: 42

Text: First we get \boxed{10}, then \boxed{20}, and fina...
Extracted: 30

Text: No boxed answer here...
Extracted: None

Text: The final result is \boxed{x^2 + 2x + 1}...
Extracted: x^2 + 2x + 1



In [9]:
# Test extract_final_candidate function
test_cases = [
    "The answer is \\boxed{42}",
    "After calculation, we get 3.14159",
    "The result is -7.5",
    "No numbers here"
]

print("Testing extract_final_candidate:")
for text in test_cases:
    result = extract_final_candidate(text, fallback='number_then_full')
    print(f"Text: {text}")
    print(f"Extracted: '{result}'")
    print()

Testing extract_final_candidate:
Text: The answer is \boxed{42}
Extracted: '42'

Text: After calculation, we get 3.14159
Extracted: '3.14159'

Text: The result is -7.5
Extracted: '-7.5'

Text: No numbers here
Extracted: 'No numbers here'
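
The four results above are consistent with a fallback chain: take the last boxed value if one exists, otherwise the last number, otherwise the full text. The sketch below illustrates that chain under this assumption and is not the actual `extract_final_candidate`. The function name, the `number_only` mode, and both regular expressions are hypothetical, and the boxed pattern deliberately handles only content without nested braces.

```python
import re

# Flat pattern: matches \boxed{...} only when the content has no nested braces.
BOXED_FLAT = re.compile(r"\\boxed\{([^{}]*)\}")
# Signed integers or decimals, e.g. 42, -7.5, 3.14159.
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def extract_candidate_sketch(text, fallback="number_then_full"):
    """Hypothetical chain: last boxed value, then last number, then full text."""
    boxed = BOXED_FLAT.findall(text)
    if boxed:
        return boxed[-1].strip()
    if fallback in ("number_only", "number_then_full"):
        numbers = NUMBER.findall(text)
        if numbers:
            return numbers[-1]
    if fallback == "number_then_full":
        return text.strip()
    return ""
```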



## 5. Test Answer Normalization

In [10]:
# Test normalize_answer function
test_answers = [
    "$42$",
    "  \\text{hello}  ",
    "1\\,000\\,000",
    "$ x^2 + 2x + 1 $"
]

print("Testing normalize_answer:")
for answer in test_answers:
    normalized = normalize_answer(answer)
    print(f"Original: '{answer}'")
    print(f"Normalized: '{normalized}'")
    print()

Testing normalize_answer:
Original: '$42$'
Normalized: '42'

Original: '  \text{hello}  '
Normalized: 'hello'

Original: '1\,000\,000'
Normalized: '1000000'

Original: '$ x^2 + 2x + 1 $'
Normalized: 'x^2 + 2x + 1'
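
The four cases above involve three transformations: dropping `$` delimiters, unwrapping `\text{...}`, and removing `\,` thin spaces. A simplified sketch that covers only these cases is shown below. `normalize_answer_sketch` is a hypothetical name, and the real `normalize_answer` likely applies more rules than this.

```python
import re


def normalize_answer_sketch(text):
    """Hypothetical simplified normalizer covering only the four cases above."""
    s = text.strip()
    # Drop math-mode dollar delimiters.
    s = s.replace("$", "")
    # Unwrap \text{...} while keeping the inner text.
    s = re.sub(r"\\text\{([^{}]*)\}", r"\1", s)
    # Remove LaTeX thin spaces used as thousands separators (1\,000 -> 1000).
    s = s.replace("\\,", "")
    return s.strip()
```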



## 6. Test Split Into Parts

In [11]:
# Test split_into_parts function
test_texts = [
    "(1, 2, 3)",
    "[a, b, c]",
    "single value",
    "(x, y)"
]

print("Testing split_into_parts:")
for text in test_texts:
    parts = split_into_parts(text)
    print(f"Text: {text}")
    print(f"Parts: {parts}")
    print()

Testing split_into_parts:
Text: (1, 2, 3)
Parts: ['1', '2', '3']

Text: [a, b, c]
Parts: ['a', 'b', 'c']

Text: single value
Parts: ['single value']

Text: (x, y)
Parts: ['x', 'y']
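
One way to get this behavior is to strip a single pair of outer brackets and then split on commas that are not nested inside further brackets. The sketch below assumes that approach. `split_into_parts_sketch` is a hypothetical name and may not match how `split_into_parts` is actually implemented.

```python
def split_into_parts_sketch(text):
    """Hypothetical splitter: strip outer brackets, split on top-level commas."""
    s = text.strip()
    if len(s) >= 2 and (s[0], s[-1]) in {("(", ")"), ("[", "]")}:
        s = s[1:-1]

    parts, current, depth = [], [], 0
    for ch in s:
        if ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth -= 1
        if ch == "," and depth == 0:
            # Top-level comma: close the current part and start a new one.
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts
```

Tracking bracket depth keeps an entry such as `f(1, 2)` in one piece instead of splitting it at its inner comma.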



## 7. Format Problems for Model Input

In [12]:
# Format a problem
formatted = format_problem(math_data[0])

print("Formatted Problem:")
print(f"Keys: {list(formatted.keys())}")
print(f"\nPrompt (first 200 chars):\n{formatted['prompt'][:200]}...")
print(f"\nOriginal problem:\n{formatted['problem'][:100]}...")

Formatted Problem:
Keys: ['prompt', 'problem', 'solution', 'answer']

Prompt (first 200 chars):
You are a helpful math assistant.
Answer the question and write the final result on a new line as:
\boxed{ANSWER}

Question:
Convert the point $(0,3)$ in rectangular coordinates to polar coordinates. ...

Original problem:
Convert the point $(0,3)$ in rectangular coordinates to polar coordinates.  Enter your answer in the...


## 8. Test Batch Processing

In [13]:
# Create batches
batch_size = 10
batches = batch_problems(math_data, batch_size)

print(f"Total problems: {len(math_data)}")
print(f"Batch size: {batch_size}")
print(f"Number of batches: {len(batches)}")
print(f"\nBatch sizes: {[len(b) for b in batches]}")

Total problems: 500
Batch size: 10
Number of batches: 50

Batch sizes: [10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10]
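
Batching like this can be done with plain list slicing. The following sketch uses a hypothetical `batch_items` helper, not the actual `batch_problems`. It also shows what happens when the total is not a multiple of the batch size: the last batch is simply shorter.

```python
def batch_items(items, batch_size):
    """Hypothetical helper: consecutive slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


# 23 items in batches of 10 leave a final batch of 3.
print([len(b) for b in batch_items(list(range(23)), 10)])
```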


## 9. Test prepare_prompts with MATH-500

In [14]:
# Prepare prompts for first 5 problems
sample_problems = math_data[:5]
prompts = prepare_prompts(sample_problems, dataset_type='math500', include_cot_prompt=False)

print(f"Generated {len(prompts)} prompts")
print(f"\nFirst prompt:\n{prompts[0]}")

Generated 5 prompts

First prompt:
You are a helpful math assistant.
Answer the question and write the final result on a new line as:
\boxed{ANSWER}

Question:
Convert the point $(0,3)$ in rectangular coordinates to polar coordinates.  Enter your answer in the form $(r,\theta),$ where $r > 0$ and $0 \le \theta < 2 \pi.$

Answer:


## 10. Test parse_answer with MATH-500 Format

In [15]:
# Test parsing answers from model outputs
test_outputs = [
    "Let's solve this step by step.\n\nStep 1: ...\n\nTherefore, the answer is \\boxed{42}",
    "After simplification, we get \\boxed{x^2 + 1}",
    "The final answer is \\boxed{-3.5}"
]

print("Testing parse_answer with MATH-500 format:")
for output in test_outputs:
    answer = parse_answer(output, dataset_type='math500')
    print(f"Output: {output[:60]}...")
    print(f"Parsed answer: {answer}")
    print()

Testing parse_answer with MATH-500 format:
Output: Let's solve this step by step.

Step 1: ...

Therefore, the ...
Parsed answer: 42

Output: After simplification, we get \boxed{x^2 + 1}...
Parsed answer: x^2 + 1

Output: The final answer is \boxed{-3.5}...
Parsed answer: -3.5



## 11. End-to-End Workflow Example

In [16]:
# Simulate a complete workflow
print("=" * 70)
print("COMPLETE WORKFLOW EXAMPLE")
print("=" * 70)

# 1. Select a problem
problem = math_data[0]
print(f"\n1. Original Problem:\n{problem['problem'][:150]}...")

# 2. Format it
formatted = format_problem(problem)
print(f"\n2. Formatted Prompt (first 200 chars):\n{formatted['prompt'][:200]}...")

# 3. Simulate model output (using the actual solution)
simulated_output = f"Let's solve this problem.\n\nSolution:\n{problem.get('solution', '')[:200]}...\n\nThe answer is \\boxed{{{problem.get('answer', '')}}}"
print(f"\n3. Simulated Model Output (first 200 chars):\n{simulated_output[:200]}...")

# 4. Extract answer
extracted_answer = extract_final_candidate(simulated_output)
print(f"\n4. Extracted Answer: {extracted_answer}")

# 5. Normalize
normalized = normalize_answer(extracted_answer)
print(f"\n5. Normalized Answer: {normalized}")

# 6. Compare with ground truth
ground_truth = normalize_answer(problem.get('answer', ''))
print(f"\n6. Ground Truth: {ground_truth}")
print(f"\n7. Match: {normalized == ground_truth}")

COMPLETE WORKFLOW EXAMPLE

1. Original Problem:
Convert the point $(0,3)$ in rectangular coordinates to polar coordinates.  Enter your answer in the form $(r,\theta),$ where $r > 0$ and $0 \le \thet...

2. Formatted Prompt (first 200 chars):
You are a helpful math assistant.
Answer the question and write the final result on a new line as:
\boxed{ANSWER}

Question:
Convert the point $(0,3)$ in rectangular coordinates to polar coordinates. ...

3. Simulated Model Output (first 200 chars):
Let's solve this problem.

Solution:
We have that $r = \sqrt{0^2 + 3^2} = 3.$  Also, if we draw the line connecting the origin and $(0,3),$ this line makes an angle of $\frac{\pi}{2}$ with the positiv...

4. Extracted Answer: \left( 3, \frac{\pi

5. Normalized Answer: \left( 3, \frac{\pi

6. Ground Truth: \left( 3, \frac{\pi{2 \right)

7. Match: False
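
The mismatch in step 7 is consistent with the extraction stopping at the first closing brace: the ground truth `\left( 3, \frac{\pi}{2} \right)` contains nested braces, and the extracted answer is cut off right after `\pi`. The normalized ground truth also loses its closing braces, so `normalize_answer` may need similar care. Neither implementation is shown in this notebook, so this is an inference from the printed output. A brace-balanced extractor could be sketched as follows. `last_boxed_balanced` is a hypothetical name.

```python
def last_boxed_balanced(text):
    """Hypothetical extractor: content of the last \\boxed{...}, with nesting."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None

    depth = 1
    begin = start + len(marker)
    for i in range(begin, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                # Matching close for the \boxed{ opener found.
                return text[begin:i]
    return None  # Unbalanced braces: no complete boxed expression.


output = "The answer is \\boxed{\\left( 3, \\frac{\\pi}{2} \\right)}"
print(last_boxed_balanced(output))
```

Counting brace depth, rather than matching up to the first `}`, returns the full expression for answers that contain `\frac{...}{...}` or similar constructs.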


## 12. Statistics on Dataset

In [17]:
# Compute some statistics
import numpy as np

problem_lengths = [len(p['problem']) for p in math_data]
solution_lengths = [len(p.get('solution', '')) for p in math_data]

print("Dataset Statistics:")
print("=" * 50)
print(f"Total problems: {len(math_data)}")
print("\nProblem text length:")
print(f"  Mean: {np.mean(problem_lengths):.1f} chars")
print(f"  Median: {np.median(problem_lengths):.1f} chars")
print(f"  Min: {np.min(problem_lengths)} chars")
print(f"  Max: {np.max(problem_lengths)} chars")
print("\nSolution text length:")
print(f"  Mean: {np.mean(solution_lengths):.1f} chars")
print(f"  Median: {np.median(solution_lengths):.1f} chars")
print(f"  Min: {np.min(solution_lengths)} chars")
print(f"  Max: {np.max(solution_lengths)} chars")

Dataset Statistics:
Total problems: 500

Problem text length:
  Mean: 195.9 chars
  Median: 148.0 chars
  Min: 20 chars
  Max: 1733 chars

Solution text length:
  Mean: 531.3 chars
  Median: 421.5 chars
  Min: 45 chars
  Max: 3356 chars


## 13. Test Multiple Answer Formats

In [18]:
# Sample some answers to see variety
print("Sample Answers from Dataset:")
print("=" * 50)
for i in range(min(10, len(math_data))):
    answer = math_data[i].get('answer', 'N/A')
    print(f"Problem {i}: {answer}")

Sample Answers from Dataset:
Problem 0: \left( 3, \frac{\pi}{2} \right)
Problem 1: p - q
Problem 2: \frac{14}{3}
Problem 3: 9
Problem 4: \text{Evelyn}
Problem 5: 42
Problem 6: 27
Problem 7: 90^\circ
Problem 8: 3\sqrt{13}
Problem 9: 4
