In [1]:
import datasets

ds_easy = datasets.load_dataset('Asap7772/d1shs0ap-easy-hintgen-qwen3-4b-lr1e6_respgen', split='train')
ds_medium = datasets.load_dataset('Asap7772/d1shs0ap-medium_2500-hintgen-qwen3-4b-lr1e6_respgen', split='train')

ds_cat = datasets.concatenate_datasets([ds_easy, ds_medium])
ds_cat

Dataset({
    features: ['problem', 'answer', 'solution', 'reward', 'length', 'correct_length', 'incorrect_length', 'all_hints', 'no_hint_completions', 'hint_completions'],
    num_rows: 14213
})

In [None]:
print(ds_cat['problem'][0])

Given a rectangular pan of brownies measuring 15 inches by 24 inches, cut into triangular pieces with a base of 3 inches and a height of 4 inches, determine the number of triangular pieces that can be cut from the pan.


In [3]:
print(ds_cat['all_hints'][0][0])

<notes>
  <note>
    <description>To count how many congruent shapes of area A fit into a larger region of area B, first compute A by appropriate formulas (e.g. for triangles, A = ½·base·height). Then use the integer part of B/A to get the maximum count. This leverages area as a measure of space occupancy.</description>
    <example>Suppose you have a rectangle of area 120 and small triangles each of area 5. The largest integer n satisfying 5·n ≤ 120 is n = ⌊120/5⌋ = 24. Hence you can fit at most 24 triangles.</example>
  </note>
  <note>
    <description>When a shape’s dimensions must align with the container’s dimensions, partition the container into the same unit measures to avoid wasted space. This ensures every piece is placed along grid lines (in the case of integer partitioning) or aligned with the shape’s bases and heights.</description>
    <example>For a rectangle of width 10 and height 6, you can partition it into 2×3 unit squares. If you need to place small right triangles 

In [4]:
combined_prompt = """
You are an expert problem-solving assistant tasked with analyzing and solving a given math problem. Your response must include two main components:

---

# 1. HINT GENERATION (REASONING SUPPORT)

Given the following math problem, generate a cheatsheet of insightful hints that help guide a student toward solving the problem. Each hint should be wrapped in a <note> block with the following structure:

<note>
<description>[Brief explanation of a key idea or technique relevant to the problem]</description>
<example>[Concrete illustrative example that demonstrates the idea in action]</example>
</note>
Combine all hint blocks inside a <notes> element. Your goal is to help the student reason through the problem step-by-step by surfacing useful strategies, intermediate goals, or simplifications.

---

# 2. GENERATOR (PROBLEM SOLVER)

Instruction: You are an expert problem-solving assistant tasked with analyzing and solving various questions using a combination of your expertise and provided reference materials. Each task will include:
1. A specific question or problem to solve
2. A cheatsheet containing relevant strategies, patterns, and examples from similar problems

---

## a. ANALYSIS & STRATEGY

- Carefully analyze both the question and cheatsheet before starting
- Search for and identify any applicable patterns, strategies, or examples within the cheatsheet
- Create a structured approach to solving the problem at hand
- Review and document any limitations in the provided reference materials

## b. SOLUTION DEVELOPMENT

- Present your solution using clear, logical steps that others can follow and review
- Explain your reasoning and methodology before presenting final conclusions
- Provide detailed explanations for each step of the process
- Check and verify all assumptions and intermediate calculations

## c. FINAL ANSWER FORMAT

ALWAYS present your final answer in the following format:

\\boxed{<answer>}

Example:
Q: What is the meaning of life?
A: [...] My final answer is \\boxed{42}.

Question:
""".strip()

print(combined_prompt + "Find the sum of all integer bases $b>9$ for which $17_{b}$ is a divisor of $97_{b}$.")

You are an expert problem-solving assistant tasked with analyzing and solving a given math problem. Your response must include two main components:

---

# 1. HINT GENERATION (REASONING SUPPORT)

Given the following math problem, generate a cheatsheet of insightful hints that help guide a student toward solving the problem. Each hint should be wrapped in a <note> block with the following structure:

<note>
<description>[Brief explanation of a key idea or technique relevant to the problem]</description>
<example>[Concrete illustrative example that demonstrates the idea in action]</example>
</note>
Combine all hint blocks inside a <notes> element. Your goal is to help the student reason through the problem step-by-step by surfacing useful strategies, intermediate goals, or simplifications.

---

# 2. GENERATOR (PROBLEM SOLVER)

Instruction: You are an expert problem-solving assistant tasked with analyzing and solving various questions using a combination of your expertise and provided re
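
Because `combined_prompt` is stripped and ends with `Question:`, plain concatenation places the question text directly after the colon, with no space or newline. The sketch below makes that join explicit on toy strings. `build_query` and its `sep` parameter are hypothetical names introduced here for illustration; the default `sep=""` is meant to mirror the notebook's concatenation, while `sep="\n"` shows one possible alternative.

```python
def build_query(prompt: str, question: str, sep: str = "") -> str:
    """Join a prompt template and a question (hypothetical helper).

    sep="" mirrors plain `prompt + question` on a stripped prompt;
    any other separator is an assumption, not what the notebook does.
    """
    return prompt.strip() + sep + question


# Toy stand-ins for the real prompt and problem text.
toy_prompt = "Solve the problem.\n\nQuestion:\n"
toy_question = "What is 2 + 2?"

joined_plain = build_query(toy_prompt, toy_question)
joined_newline = build_query(toy_prompt, toy_question, sep="\n")

print(repr(joined_plain))
print(repr(joined_newline))
```

Whichever separator is used, the same one would need to be applied at inference time so that queries match the training format.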

In [5]:
sample_output_format = """
# HINT GENERATION (REASONING SUPPORT)

{hints}

---

# GENERATOR (PROBLEM SOLVER)

{solution}
"""

print(sample_output_format.format(hints=ds_cat['all_hints'][0][0], solution=ds_cat['hint_completions'][0][0][0]))


# HINT GENERATION (REASONING SUPPORT)

<notes>
  <note>
    <description>To count how many congruent shapes of area A fit into a larger region of area B, first compute A by appropriate formulas (e.g. for triangles, A = ½·base·height). Then use the integer part of B/A to get the maximum count. This leverages area as a measure of space occupancy.</description>
    <example>Suppose you have a rectangle of area 120 and small triangles each of area 5. The largest integer n satisfying 5·n ≤ 120 is n = ⌊120/5⌋ = 24. Hence you can fit at most 24 triangles.</example>
  </note>
  <note>
    <description>When a shape’s dimensions must align with the container’s dimensions, partition the container into the same unit measures to avoid wasted space. This ensures every piece is placed along grid lines (in the case of integer partitioning) or aligned with the shape’s bases and heights.</description>
    <example>For a rectangle of width 10 and height 6, you can partition it into 2×3 unit squares. If 
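
To sanity-check that a hint string follows the `<notes>`/`<note>` layout requested by the prompt, one option is to pull out the description/example pairs with a regular expression. This is a minimal sketch on a toy string; `parse_notes` and `NOTE_RE` are hypothetical names, and a regex is used here (rather than an XML parser) on the assumption that hints may contain characters such as `<` or `&` in math that would not be valid XML.

```python
import re

# Matches one <note> block and captures its description and example text.
NOTE_RE = re.compile(
    r"<note>\s*<description>(.*?)</description>\s*"
    r"<example>(.*?)</example>\s*</note>",
    re.DOTALL,
)


def parse_notes(text):
    """Return a list of {'description', 'example'} dicts found in text."""
    return [
        {"description": desc.strip(), "example": ex.strip()}
        for desc, ex in NOTE_RE.findall(text)
    ]


# Toy hint string in the same layout as the hints printed above.
toy_hints = """<notes>
  <note>
    <description>Compare areas.</description>
    <example>120 / 5 = 24.</example>
  </note>
  <note>
    <description>Align to a grid.</description>
    <example>Split a 10 by 6 rectangle into unit squares.</example>
  </note>
</notes>"""

parsed = parse_notes(toy_hints)
print(len(parsed), parsed[0]["description"])
```

A hint that yields an empty list would be a candidate for closer inspection before it goes into the SFT data.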

In [6]:
import os


def map_fn(example):
    # With batched=True, `example` maps each column name to a list of values.
    return_dict = {
        'query': [],
        'completion': [],
        'problem': []
    }
    for i, question in enumerate(example['problem']):
        # Pair each hint with the entry at the same position in hint_completions[i][0].
        for hint, completion in zip(example['all_hints'][i], example['hint_completions'][i][0]):
            query = combined_prompt + question
            output = sample_output_format.format(hints=hint, solution=completion)
            return_dict['query'].append(query)
            return_dict['completion'].append(output)
            return_dict['problem'].append(question)
    return return_dict


# Emits one output row per (hint, completion) pair; the original columns are dropped.
ds_cat = ds_cat.map(map_fn, batched=True, num_proc=os.cpu_count(), remove_columns=ds_cat.column_names)
ds_cat


Dataset({
    features: ['problem', 'query', 'completion'],
    num_rows: 113704
})
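
The row count grows from 14213 to 113704 (14213 × 8) because a batched map function may return more rows than it receives. The sketch below reproduces the same flattening logic in plain Python on a toy batch, without the `datasets` library. `flatten_batch`, the toy template, and the nested layout of the toy `hint_completions` column are assumptions made for illustration; the real column layout should be checked against the dataset.

```python
def flatten_batch(batch, prompt, template):
    """Expand each problem into one row per (hint, completion) pair."""
    out = {"query": [], "completion": [], "problem": []}
    for i, question in enumerate(batch["problem"]):
        # zip pairs hints and completions by position and stops at the shorter list.
        pairs = zip(batch["all_hints"][i], batch["hint_completions"][i][0])
        for hint, completion in pairs:
            out["query"].append(prompt + question)
            out["completion"].append(template.format(hints=hint, solution=completion))
            out["problem"].append(question)
    return out


# Toy batch: two problems with two and three hints respectively (assumed layout).
toy_batch = {
    "problem": ["P1", "P2"],
    "all_hints": [["h1a", "h1b"], ["h2a", "h2b", "h2c"]],
    "hint_completions": [[["c1a", "c1b"]], [["c2a", "c2b", "c2c"]]],
}
toy_template = "HINTS: {hints}\nSOLUTION: {solution}"

flat = flatten_batch(toy_batch, "Question:", toy_template)
print(len(flat["query"]), flat["problem"])
```

All three output lists have the same length, which is what a batched map function must return for the result to form a valid table.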

In [7]:
ds_cat[0]

{'problem': 'Given a rectangular pan of brownies measuring 15 inches by 24 inches, cut into triangular pieces with a base of 3 inches and a height of 4 inches, determine the number of triangular pieces that can be cut from the pan.',
 'query': 'You are an expert problem-solving assistant tasked with analyzing and solving a given math problem. Your response must include two main components:\n\n---\n\n# 1. HINT GENERATION (REASONING SUPPORT)\n\nGiven the following math problem, generate a cheatsheet of insightful hints that help guide a student toward solving the problem. Each hint should be wrapped in a <note> block with the following structure:\n\n<note>\n<description>[Brief explanation of a key idea or technique relevant to the problem]</description>\n<example>[Concrete illustrative example that demonstrates the idea in action]</example>\n</note>\nCombine all hint blocks inside a <notes> element. Your goal is to help the student reason through the problem step-by-step by surfacing use

In [8]:
print(ds_cat[0]['query'])
print(ds_cat[0]['completion'])

You are an expert problem-solving assistant tasked with analyzing and solving a given math problem. Your response must include two main components:

---

# 1. HINT GENERATION (REASONING SUPPORT)

Given the following math problem, generate a cheatsheet of insightful hints that help guide a student toward solving the problem. Each hint should be wrapped in a <note> block with the following structure:

<note>
<description>[Brief explanation of a key idea or technique relevant to the problem]</description>
<example>[Concrete illustrative example that demonstrates the idea in action]</example>
</note>
Combine all hint blocks inside a <notes> element. Your goal is to help the student reason through the problem step-by-step by surfacing useful strategies, intermediate goals, or simplifications.

---

# 2. GENERATOR (PROBLEM SOLVER)

Instruction: You are an expert problem-solving assistant tasked with analyzing and solving various questions using a combination of your expertise and provided re

In [9]:
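# Split by problem so that all rows of a given problem land in the same split.
# Note: iteration order of a set of strings can differ between interpreter runs,
# so this split is not reproducible unless PYTHONHASHSEED is fixed.
unique_problems = list(set(ds_cat['problem']))
test_size = int(len(unique_problems) * 0.1)
# Sets give constant-time membership tests inside the filters below.
test_problems = set(unique_problems[:test_size])
train_problems = set(unique_problems[test_size:])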
train_ds = ds_cat.filter(lambda x: x['problem'] in train_problems, num_proc=os.cpu_count())
test_ds = ds_cat.filter(lambda x: x['problem'] in test_problems, num_proc=os.cpu_count())

ds_cat = datasets.DatasetDict({
    'train': train_ds,
    'test': test_ds
})
ds_cat

DatasetDict({
    train: Dataset({
        features: ['problem', 'query', 'completion'],
        num_rows: 102312
    })
    test: Dataset({
        features: ['problem', 'query', 'completion'],
        num_rows: 11392
    })
})
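
The split above takes the first 10% of `list(set(...))`, whose order depends on string hashing and can vary from run to run. If a reproducible split is needed, one option is to sort the unique problems and shuffle them with a seeded generator. This is a sketch on toy data, not the procedure used to build the published dataset; `split_problems`, `test_fraction`, and `seed` are hypothetical names.

```python
import random


def split_problems(problems, test_fraction=0.1, seed=0):
    """Return (train_set, test_set) of unique problems, reproducibly."""
    unique = sorted(set(problems))          # fixed order regardless of hashing
    random.Random(seed).shuffle(unique)     # seeded, so repeatable across runs
    test_size = int(len(unique) * test_fraction)
    return set(unique[test_size:]), set(unique[:test_size])


# Toy data: 20 distinct problems, each appearing three times.
toy_problems = [f"problem {i}" for i in range(20) for _ in range(3)]
train_set, test_set = split_problems(toy_problems)

# No problem may appear in both splits.
assert train_set.isdisjoint(test_set)
print(len(train_set), len(test_set))
```

The returned sets can be used directly in the `filter` lambdas, and the disjointness check guards against leakage between train and test.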

In [10]:
ds_cat.push_to_hub('Asap7772/d1shs0ap-twostagejoint-sft')

CommitInfo(commit_url='https://huggingface.co/datasets/Asap7772/d1shs0ap-twostagejoint-sft/commit/ef61898605df4ecbcbf3650153a9eaa88790280d', commit_message='Upload dataset', commit_description='', oid='ef61898605df4ecbcbf3650153a9eaa88790280d', pr_url=None, repo_url=RepoUrl('https://huggingface.co/datasets/Asap7772/d1shs0ap-twostagejoint-sft', endpoint='https://huggingface.co', repo_type='dataset', repo_id='Asap7772/d1shs0ap-twostagejoint-sft'), pr_revision=None, pr_num=None)

In [11]:
output_dir = '/home/anikait.singh/rl_behaviors_verl_stable/d1shs0ap-twostagejoint-sft'
os.makedirs(output_dir, exist_ok=True)
ds_cat['train'].to_parquet(os.path.join(output_dir, 'train.parquet'))
ds_cat['test'].to_parquet(os.path.join(output_dir, 'test.parquet'))

298060250