## Cell 1: Imports

In [6]:
from datasets import load_dataset
import pprint
import re
from typing import List

## Cell 2: Load the Dataset

In [7]:
# Load the Intel/orca_dpo_pairs preference dataset (each row has a question plus chosen and rejected responses)
dataset = load_dataset("Intel/orca_dpo_pairs", split="train")

## Cell 3: Preview a Few Samples

In [8]:
# Show 2 examples to see the structure
for i in range(2):
    sample = dataset[i]
    print(f"\n--- Example {i+1} ---")
    print(f"Prompt: {sample['question']}")
    print("\nChosen Response (first 300 chars):")
    print(sample['chosen'][:300])
    print("\nRejected Response (first 300 chars):")
    print(sample['rejected'][:300])


--- Example 1 ---
Prompt: You will be given a definition of a task first, then some input of the task.
This task is about using the specified sentence and converting the sentence to Resource Description Framework (RDF) triplets of the form (subject, predicate object). The RDF triplets generated must be such that the triplets accurately capture the structure and semantics of the input sentence. The input is a sentence and the output is a list of triplets of the form [subject, predicate, object] that capture the relationships present in the sentence. When a sentence has more than 1 RDF triplet possible, the output must contain all of them.

AFC Ajax (amateurs)'s ground is Sportpark De Toekomst where Ajax Youth Academy also play.
Output:

Chosen Response (first 300 chars):
[
  ["AFC Ajax (amateurs)", "has ground", "Sportpark De Toekomst"],
  ["Ajax Youth Academy", "plays at", "Sportpark De Toekomst"]
]

Rejected Response (first 300 chars):
 Sure, I'd be happy to help! Here are the RDF tr

## Cell 4: Split Answer into Steps

In [9]:
def split_into_steps(text: str) -> List[str]:
    """
    Naive step splitting: split at sentence boundaries, i.e. after a period
    followed by whitespace and a capital letter. Newlines alone do not split.
    You can refine this later using LLMs or custom logic.
    """
    steps = re.split(r'(?<=\.)\s+(?=[A-Z])', text.strip())
    return [s.strip() for s in steps if s.strip()]

# Try it on one chosen response
chosen_steps = split_into_steps(dataset[0]['chosen'])
rejected_steps = split_into_steps(dataset[0]['rejected'])

print("Chosen Reasoning Steps:")
pprint.pprint(chosen_steps[:5])

print("\nRejected Reasoning Steps:")
pprint.pprint(rejected_steps[:5])
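
The regex in `split_into_steps` only breaks after a period followed by whitespace and a capital letter, so a multi-line answer with bullets or blank lines can stay in one large chunk. Below is a minimal sketch of one possible refinement, assuming that line breaks and leading bullet or number markers are also reasonable step boundaries. `split_into_steps_by_line`, `_BULLET`, and `_SENTENCE_BOUNDARY` are hypothetical names introduced here for illustration and are not part of the original notebook.

```python
import re
from typing import List

# Hypothetical helper (not in the original notebook).
# Assumption: every non-empty line is a candidate step, and a leading
# bullet ("*", "-", "•") or number marker ("1.", "2)") is not part of the step.
_BULLET = re.compile(r'^\s*(?:[*\-•]|\d+[.)])\s+')
# Same sentence-boundary rule as the original split_into_steps.
_SENTENCE_BOUNDARY = re.compile(r'(?<=\.)\s+(?=[A-Z])')


def split_into_steps_by_line(text: str) -> List[str]:
    steps: List[str] = []
    for line in text.splitlines():
        # Drop the list marker, then surrounding whitespace.
        line = _BULLET.sub('', line).strip()
        if not line:
            continue  # skip blank lines
        # Within a line, fall back to sentence-level splitting.
        steps.extend(s.strip() for s in _SENTENCE_BOUNDARY.split(line) if s.strip())
    return steps


example = "Explanation:\n\n* First point. Second point.\n* Third point."
print(split_into_steps_by_line(example))
# ['Explanation:', 'First point.', 'Second point.', 'Third point.']
```

For JSON-like answers this sketch yields one step per line, which may be finer than wanted. The output that follows comes from the original `split_into_steps` calls in this cell, not from this sketch.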


Chosen Reasoning Steps:
['[\n'
 '  ["AFC Ajax (amateurs)", "has ground", "Sportpark De Toekomst"],\n'
 '  ["Ajax Youth Academy", "plays at", "Sportpark De Toekomst"]\n'
 ']']

Rejected Reasoning Steps:
["Sure, I'd be happy to help! Here are the RDF triplets for the input "
 'sentence:\n'
 '\n'
 '[AFC Ajax (amateurs), hasGround, Sportpark De Toekomst]\n'
 '[Ajax Youth Academy, playsAt, Sportpark De Toekomst]\n'
 '\n'
 'Explanation:\n'
 '\n'
 '* AFC Ajax (amateurs) is the subject of the first triplet, and hasGround is '
 'the predicate that describes the relationship between AFC Ajax (amateurs) '
 'and Sportpark De Toekomst.\n'
 '* Ajax Youth Academy is the subject of the second triplet, and playsAt is '
 'the predicate that describes the relationship between Ajax Youth Academy and '
 'Sportpark De Toekomst.',
 'Note that there may be other possible RDF triplets that could be derived '
 'from the input sentence, but the above triplets capture the main '
 'relationships present in the sen