Decided to improve the prompt before fine-tuning, by providing more few-shot examples and definitions. As you can see, it improved stability much.

In [1]:
import json
import ollama

In [2]:
SYSTEM_PROMPT = ""
with open("../../global_system_prompt.txt", "r", encoding="utf-8") as f:
    SYSTEM_PROMPT = f.read()

In [3]:
from collections import Counter

def get_structure(json_data):
    try:
        data = json.loads(json_data)
        return {
            "atoms_count": len(data.get("atoms", [])),
            "rules_count": len(data.get("rules", [])),
            "strict_rules": len([r for r in data.get("rules", []) if r.get("type") == "strict"]),
            "defeasible_rules": len([r for r in data.get("rules", []) if r.get("type") == "defeasible"]),
            "attacks_count": len(data.get("attacks", [])),
            "attack_types": Counter([a.get("type") for a in data.get("attacks", [])])
        }
    except:
        return "Invalid JSON"

In [4]:
prompts=[]
with open("data/variance/prompts.txt", "r") as f:
    prompts = f.read()
for i in prompts.split("@"):
    outputs=[]
    print("Prompt:", i)
    for j in range(5):
        response = ollama.chat(
            model="qwen2.5:14b",
            format="json",
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": i}
            ],
            options={"temperature": 0}
        )
        outputs.append(response["message"]["content"].strip())
        print("Output", j+1, "processed.")
    for k in outputs:
        print(get_structure(k))
    

Prompt: John is guilty of trespassing because he entered the private property without permission. However, John claims he was invited by the owner, which means he is not guilty of trespassing.

Output 1 processed.
Output 2 processed.
Output 3 processed.
Output 4 processed.
Output 5 processed.
{'atoms_count': 4, 'rules_count': 2, 'strict_rules': 1, 'defeasible_rules': 1, 'attacks_count': 1, 'attack_types': Counter({'rebut': 1})}
{'atoms_count': 4, 'rules_count': 2, 'strict_rules': 1, 'defeasible_rules': 1, 'attacks_count': 1, 'attack_types': Counter({'rebut': 1})}
{'atoms_count': 4, 'rules_count': 2, 'strict_rules': 1, 'defeasible_rules': 1, 'attacks_count': 1, 'attack_types': Counter({'rebut': 1})}
{'atoms_count': 4, 'rules_count': 2, 'strict_rules': 1, 'defeasible_rules': 1, 'attacks_count': 1, 'attack_types': Counter({'rebut': 1})}
{'atoms_count': 4, 'rules_count': 2, 'strict_rules': 1, 'defeasible_rules': 1, 'attacks_count': 1, 'attack_types': Counter({'rebut': 1})}
Prompt: 
The sec

In [5]:
prompts=[]
with open("data/stability/prompts.txt", "r") as f:
    prompts = f.read()
ct=1
for i in prompts.split(";"):
    outputs=[]
    print("Prompt set ", ct, "\n")
    ct2=1
    for j in i.split("@"):
        response = ollama.chat(
            model="qwen2.5:14b",
            format="json",
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": j}
            ],
            options={"temperature": 0}
        )
        outputs.append(response["message"]["content"].strip())
        print("Output", (ct-1)*5+ct2, "processed.")
        ct2+=1
    ct+=1
    for k in outputs:
        print(get_structure(k))
    

Prompt set  1 

Output 1 processed.
Output 2 processed.
Output 3 processed.
Output 4 processed.
Output 5 processed.
{'atoms_count': 4, 'rules_count': 2, 'strict_rules': 1, 'defeasible_rules': 1, 'attacks_count': 1, 'attack_types': Counter({'rebut': 1})}
{'atoms_count': 4, 'rules_count': 2, 'strict_rules': 1, 'defeasible_rules': 1, 'attacks_count': 1, 'attack_types': Counter({'rebut': 1})}
{'atoms_count': 4, 'rules_count': 2, 'strict_rules': 1, 'defeasible_rules': 1, 'attacks_count': 1, 'attack_types': Counter({'rebut': 1})}
{'atoms_count': 4, 'rules_count': 2, 'strict_rules': 1, 'defeasible_rules': 1, 'attacks_count': 1, 'attack_types': Counter({'rebut': 1})}
{'atoms_count': 4, 'rules_count': 2, 'strict_rules': 1, 'defeasible_rules': 1, 'attacks_count': 1, 'attack_types': Counter({'rebut': 1})}
Prompt set  2 

Output 6 processed.
Output 7 processed.
Output 8 processed.
Output 9 processed.
Output 10 processed.
{'atoms_count': 4, 'rules_count': 2, 'strict_rules': 1, 'defeasible_rules': 1

In [None]:
prompts=[]
valid_outputs=[]
with open("data/correctness/prompts.txt", "r") as f:
    prompts = f.read()
with open("data/correctness/outputs.txt", "r") as f:
    valid_outputs = f.read()
ct=0
for i in prompts.split("@"):
    print("Prompt:", i)
    response = ollama.chat(
        model="qwen2.5:14b",
        format="json",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": i}
        ],
        options={"temperature": 0}
    )
    print("Computed output: ",get_structure(response["message"]["content"].strip()))
    print("Valid output: ", get_structure(valid_outputs.split("@")[ct]),"\n")

Prompt: John is guilty of trespassing because he entered the private property without permission. However, John claims he was invited by the owner, which means he is not guilty of trespassing.

Computed output:  {'atoms_count': 4, 'rules_count': 2, 'strict_rules': 1, 'defeasible_rules': 1, 'attacks_count': 1, 'attack_types': Counter({'rebut': 1})}
Valid output:  {'atoms_count': 4, 'rules_count': 2, 'strict_rules': 0, 'defeasible_rules': 2, 'attacks_count': 1, 'attack_types': Counter({'rebut': 1})} 

Prompt: 
The security footage shows a person in a red hat stealing the bike, so the thief wore a red hat. But the camera was low-resolution and the colors are distorted, so the footage cannot reliably identify the color of the hat.

Computed output:  {'atoms_count': 3, 'rules_count': 1, 'strict_rules': 0, 'defeasible_rules': 1, 'attacks_count': 1, 'attack_types': Counter({'undercut': 1})}
Valid output:  {'atoms_count': 4, 'rules_count': 2, 'strict_rules': 0, 'defeasible_rules': 2, 'attacks_

: 

All of the data and results can be found in the corresponding directory in the data folder.