Task 1: Build pos-neg pairs.

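In this task, we create a dataset that will teach our model to answer math problems correctly. The base NanoGPT model only knows question-answer (QA) style responses, so each example contrasts a correct answer with a short explanation (positive) against an evasive or wrong answer (negative) to the same question. These pos-neg pairs will later be used for DPO (Direct Preference Optimization) fine-tuning.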
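For context on how each pair is consumed later, here is a minimal sketch of the standard DPO loss for a single pair. It assumes we already have the summed log-probabilities of the positive and negative responses under the model being trained (the policy) and under a frozen reference copy. The function name `dpo_loss` and the default `beta=0.1` are illustrative, not taken from the lab code.

```python
import math


def dpo_loss(policy_pos_logp, policy_neg_logp, ref_pos_logp, ref_neg_logp, beta=0.1):
    """Standard DPO loss for one (positive, negative) pair.

    Each argument is the summed log-probability of a full response given the
    question, under either the policy or the frozen reference model.
    """
    # How much more the policy favors the positive over the negative,
    # relative to how much the reference model already does.
    margin = (policy_pos_logp - ref_pos_logp) - (policy_neg_logp - ref_neg_logp)
    # -log(sigmoid(beta * margin)) == log(1 + exp(-beta * margin))
    return math.log1p(math.exp(-beta * margin))


# Policy identical to the reference: margin is 0, so the loss is log(2)
print(dpo_loss(-10.0, -10.0, -10.0, -10.0))
# Policy shifted toward the positive response: the loss drops below log(2)
print(dpo_loss(-5.0, -15.0, -10.0, -10.0))
```

For very negative margins `math.exp` can overflow; real training code usually calls a numerically stable log-sigmoid (for example `torch.nn.functional.logsigmoid`) instead.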

In [1]:
# import libraries needed for task 1
import json
import random
import os
import pandas as pd
from pathlib import Path

In [2]:
random.seed(42)

In [3]:
# we will save the json file here
saveloc = "C:/Users/syedr/OneDrive/Documents/NTU Semester Resources/NTU Y3S1/SC3000 - Artificial Intelligence/Lab 1/NanoGPT-Math-main/dpo/pos_neg_pairs.json"
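The absolute path above only works on one machine. A more portable sketch, assuming the notebook is started from the `NanoGPT-Math-main` repo root, builds the same file location with `pathlib.Path` (already imported in the first cell). `open()` and `os.path.exists()` both accept `Path` objects, so the later cells would not need to change.

```python
import os
from pathlib import Path

# Assumption: the working directory is the NanoGPT-Math-main repo root.
repo_root = Path.cwd()
saveloc = repo_root / "dpo" / "pos_neg_pairs.json"

# Path objects work anywhere a filename string is expected.
print(saveloc, os.path.exists(saveloc))
```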

In [4]:
# Load existing data if the file is there
existingdoc = []
if os.path.exists(saveloc):
    with open(saveloc, "r", encoding="utf-8") as file:
        existingdoc = json.load(file)
    print(f"Loaded {len(existingdoc)} items from '{saveloc}'")
else:
    print("No existing data found, starting fresh.")

No existing data found, starting fresh.
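If the file already exists (for example, from an earlier run), it may be worth checking that every loaded item has the expected shape before new pairs are appended. A minimal sketch; `is_valid_pair` is a hypothetical helper, not part of the lab code:

```python
def is_valid_pair(item):
    """True if item is a dict with distinct string 'positive' and 'negative' fields."""
    return (
        isinstance(item, dict)
        and isinstance(item.get("positive"), str)
        and isinstance(item.get("negative"), str)
        and item["positive"] != item["negative"]
    )


loaded = [
    {"negative": "1+1=? Hmm, I'm not sure!", "positive": "1+1=? The answer is 2 because 1+1 equals 2."},
    {"positive": "2+2=? The answer is 4 because 2+2 equals 4."},  # missing "negative"
    "not a dict",
]
existing_clean = [item for item in loaded if is_valid_pair(item)]
print(f"Kept {len(existing_clean)} of {len(loaded)} loaded items")
```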


In [5]:
# we need to create some helper functions for the math problems
# addition
def create_addition_problem(num1, num2):
    question = f"{num1}+{num2}=?"
    answer = num1 + num2
    correct_response = f"{question} The answer is {answer} because {num1}+{num2} equals {answer}."
    wrong_response = f"{question} Hmm, I'm not sure!"
    return {"negative": wrong_response, "positive": correct_response}

# subtraction
def create_subtraction_problem(num1, num2):
    question = f"{num1}-{num2}=?"
    result = num1 - num2
    correct_response = f"{question} The answer is {result} because {num1}-{num2} equals {result}."
    wrong_response = f"{question} Sorry, I don't know the answer!"
    return {"negative": wrong_response, "positive": correct_response}

# multiplication
def create_multiplication_problem(a, b):
    question = f"{a}*{b}=?"
    answer = a * b
    # use the same ASCII "*" as the question, so the explanation matches the prompt's notation
    correct_response = f"{question} The answer is {answer} because {a}*{b} equals {answer}."
    wrong_response = f"{question} Sorry, I don't know!"
    return {"negative": wrong_response, "positive": correct_response}

# division
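def create_division_problem(dividend, divisor):
    if divisor == 0:
        divisor = 1  # avoid division by zero; just make it 1 to keep things simple

    # round the dividend down to a multiple of the divisor so the answer is an exact integer
    # (e.g. 125/8 becomes 120/8 = 15)
    dividend = dividend - (dividend % divisor)
    question = f"{dividend}/{divisor}=?"
    answer = dividend // divisor

    correct_response = f"{question} The answer is {answer} because {dividend}/{divisor} equals {answer}."
    # for the negative, use a random single-digit guess that is never the correct answer
    # (otherwise e.g. 63/9 could get the "negative" guess 7, which is actually right);
    # shifting instead of redrawing keeps the number of random draws unchanged
    random_guess = random.randint(0, 9)
    if random_guess == answer:
        random_guess = (random_guess + 1) % 10
    wrong_response = f"{question} I think it's {random_guess}, but not sure."

    return {"negative": wrong_response, "positive": correct_response}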

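# linear equations
def create_linear_equation(coefficient, constant):
    # build a*x + b = c around a known integer solution x, so x = (c - b)/a is exact
    if coefficient == 0:
        coefficient = 1  # a = 0 would leave no x term to solve for

    x_value = random.randint(-20, 50)
    c_value = coefficient * x_value + constant

    # write the constant with its own sign: "3*x-5" rather than "3*x+-5"
    sign = "+" if constant >= 0 else "-"
    question = f"{coefficient}*x{sign}{abs(constant)}={c_value}, x=?"
    solution = x_value

    # parenthesize a negative constant: "(4-(-5))" rather than "(4--5)"
    b_text = str(constant) if constant >= 0 else f"({constant})"
    correct_response = f"{question} The answer is {solution} because ({c_value}-{b_text})/{coefficient} = {solution}."
    wrong_response = f"{question} Sorry, I'm not sure how to solve that."

    return {"negative": wrong_response, "positive": correct_response}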
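Because every positive response states its answer in the same "The answer is N" form, the helpers can be spot-checked by recomputing the answer from the question text. A sketch with a hypothetical `positive_is_correct` function; the regular expressions assume exactly the question formats produced above (`a+b=?`, `a-b=?`, `a*b=?`, `a/b=?`, and `a*x+b=c, x=?`, where a negative `b` may appear as `+-5` or `-5`). For a linear equation it substitutes the stated x back into a*x + b = c, e.g. 7*26 + 77 = 259.

```python
import re

ANSWER = re.compile(r"The answer is (-?\d+)")
LINEAR = re.compile(r"^(-?\d+)\*x\+?(-?\d+)=(-?\d+), x=\?")
BINARY = re.compile(r"^(-?\d+)([+\-*/])(-?\d+)=\?")


def positive_is_correct(pair):
    """Recompute the answer from the question and compare it with the stated one."""
    text = pair["positive"]
    found = ANSWER.search(text)
    if found is None:
        return False
    stated = int(found.group(1))

    m = LINEAR.match(text)
    if m:
        a, b, c = map(int, m.groups())
        return a * stated + b == c  # substitute x back into a*x + b = c

    m = BINARY.match(text)
    if m is None:
        return False
    a, op, b = int(m.group(1)), m.group(2), int(m.group(3))
    if op == "+":
        return stated == a + b
    if op == "-":
        return stated == a - b
    if op == "*":
        return stated == a * b
    # division pairs are built to divide exactly, so also require no remainder
    return b != 0 and a % b == 0 and stated == a // b


samples = [
    {"positive": "120/8=? The answer is 15 because 120/8 equals 15."},
    {"positive": "359-781=? The answer is -422 because 359-781 equals -422."},
    {"positive": "7*x+77=259, x=? The answer is 26 because (259-77)/7 = 26."},
    {"positive": "81+276=? The answer is 356 because 81+276 equals 356."},  # deliberately wrong
]
print([positive_is_correct(p) for p in samples])  # -> [True, True, True, False]
```

After generation, `all(positive_is_correct(p) for p in pairs)` would flag any helper that produces a wrong positive.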

In [6]:
# generate the dataset for task 1
target_total = 50000
need = max(0, target_total - len(existingdoc))

pairs = []
ops = ["add", "sub", "mul", "div", "linear"]
weights = [0.2, 0.2, 0.2, 0.2, 0.2]

for _ in range(need):
    op = random.choices(ops, weights=weights)[0]
    if op == "add":
        pairs.append(create_addition_problem(random.randint(0, 999), random.randint(0, 999)))
    elif op == "sub":
        pairs.append(create_subtraction_problem(random.randint(0, 999), random.randint(0, 999)))
    elif op == "mul":
        pairs.append(create_multiplication_problem(random.randint(0, 99), random.randint(0, 20)))
    elif op == "div":
        pairs.append(create_division_problem(random.randint(1, 999), random.randint(1, 20)))
    elif op == "linear":
        pairs.append(create_linear_equation(random.randint(-9, 9) or 1, random.randint(-50, 100)))
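With all five weights at 0.2, each operation should account for roughly a fifth of the generated pairs (equal weights make this equivalent to `random.choice(ops)`). One way to sanity-check the mix is sketched below; it uses a private `random.Random` instance so it does not consume draws from the global RNG seeded earlier, and `k=` lets `choices` draw all labels in one call.

```python
import random
from collections import Counter

ops = ["add", "sub", "mul", "div", "linear"]
weights = [0.2, 0.2, 0.2, 0.2, 0.2]

# A private RNG so this check does not disturb the global `random` state
rng = random.Random(0)
# choices(..., k=n) draws all n labels in a single call
draws = rng.choices(ops, weights=weights, k=50_000)
counts = Counter(draws)
for op in ops:
    print(f"{op:>6}: {counts[op] / len(draws):.3f}")
```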

In [7]:
# combine with existing and shuffle
combined = existingdoc + pairs
random.shuffle(combined)
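Random generation can produce the same question more than once (for example, the same two addends drawn twice), and appending to an existing file can add more repeats. The notebook as written keeps duplicates. If they are unwanted, a minimal order-preserving de-duplication keyed on the positive response (which begins with, and so determines, the question) could run before the shuffle; `dedupe_pairs` is a hypothetical helper.

```python
def dedupe_pairs(pairs):
    """Keep the first pair for each distinct positive response, preserving order."""
    seen = set()
    unique = []
    for pair in pairs:
        key = pair["positive"]
        if key not in seen:
            seen.add(key)
            unique.append(pair)
    return unique


example = [
    {"negative": "2+3=? Hmm, I'm not sure!", "positive": "2+3=? The answer is 5 because 2+3 equals 5."},
    {"negative": "6/3=? I think it's 4, but not sure.", "positive": "6/3=? The answer is 2 because 6/3 equals 2."},
    {"negative": "2+3=? Hmm, I'm not sure!", "positive": "2+3=? The answer is 5 because 2+3 equals 5."},
]
print(len(dedupe_pairs(example)))
```

De-duplicating would leave fewer than `target_total` pairs unless generation continues until enough unique ones exist.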

In [8]:
# save the combined data to a file
with open(saveloc, "w", encoding="utf-8") as file:
    json.dump(combined, file, indent=2, ensure_ascii=False)

# let the user know the save was successful
num_examples = len(combined)
print(f"Successfully saved {num_examples} examples to '{saveloc}'.")

Successfully saved 50000 examples to 'C:/Users/syedr/OneDrive/Documents/NTU Semester Resources/NTU Y3S1/SC3000 - Artificial Intelligence/Lab 1/NanoGPT-Math-main/dpo/pos_neg_pairs.json'.
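A quick way to confirm the saved file is usable is to read it back and compare it with what was written. The sketch below does this in a temporary directory so it does not touch `saveloc`. It also shows that `ensure_ascii=False` stores non-ASCII characters such as `×` as-is instead of as `\u00d7` escapes.

```python
import json
import os
import tempfile

pairs = [
    {"negative": "12*5=? Sorry, I don't know!", "positive": "12*5=? The answer is 60 because 12×5 equals 60."},
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "pos_neg_pairs.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(pairs, f, indent=2, ensure_ascii=False)
    with open(path, "r", encoding="utf-8") as f:
        raw = f.read()
    reloaded = json.loads(raw)

print(reloaded == pairs, "×" in raw)
```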


In [10]:
# preview a few examples
df_preview = pd.DataFrame(combined[:10])
display(df_preview)

   negative                                           positive
0  120/8=? I think it's 6, but not sure.              120/8=? The answer is 15 because 120/8 equals 15.
1  603/9=? I think it's 3, but not sure.              603/9=? The answer is 67 because 603/9 equals 67.
2  359-781=? Sorry, I don't know the answer!          359-781=? The answer is -422 because 359-781 e...
3  7*x+77=259, x=? Sorry, I'm not sure how to sol...  7*x+77=259, x=? The answer is 26 because (259-...
4  290-341=? Sorry, I don't know the answer!          290-341=? The answer is -51 because 290-341 eq...
5  81+276=? Hmm, I'm not sure!                        81+276=? The answer is 357 because 81+276 equa...
6  357+717=? Hmm, I'm not sure!                       357+717=? The answer is 1074 because 357+717 e...
7  596-197=? Sorry, I don't know the answer!          596-197=? The answer is 399 because 596-197 eq...
8  969/19=? I think it's 7, but not sure.             969/19=? The answer is 51 because 969/19 equal...
9  416+250=? Hmm, I'm not sure!                       416+250=? The answer is 666 because 416+250 eq...


With task 1 complete, we have generated 50,000 pos-neg pairs (addition, subtraction, multiplication, division, and linear equations), saved to pos_neg_pairs.json for DPO training in the later tasks.
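This notebook stores each pair as full `positive`/`negative` strings that both begin with the question. Some DPO implementations instead expect separate prompt, chosen, and rejected fields (Hugging Face TRL's `DPOTrainer`, for example, uses `prompt`/`chosen`/`rejected`); whether the later tasks need that layout is an assumption. A minimal conversion sketch, splitting at the first `?`, with a hypothetical `split_pair` helper:

```python
def split_pair(pair):
    """Split a pos-neg record into prompt / chosen / rejected strings.

    Assumes both responses start with the same question ending at the first "?".
    """
    pos, neg = pair["positive"], pair["negative"]
    cut = pos.index("?") + 1
    prompt = pos[:cut]
    if not neg.startswith(prompt):
        raise ValueError(f"responses do not share a question: {prompt!r}")
    return {
        "prompt": prompt,
        "chosen": pos[cut:].lstrip(),
        "rejected": neg[cut:].lstrip(),
    }


row = {
    "negative": "120/8=? I think it's 6, but not sure.",
    "positive": "120/8=? The answer is 15 because 120/8 equals 15.",
}
print(split_pair(row))
```

Whether the separating space belongs to the prompt or to the responses depends on how the trainer concatenates them.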