# 5. Demo

**Purpose:**  
Run our Question Parsing and CoT Parsing pipeline end to end on a single logic puzzle.  
Pick an example (by index into the test set, or by supplying your own question and CoT), then inspect:
- The original question and CoT  
- The extracted `question_parsing` constraints  
- The structured `cot_parsing` steps  
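
As a rough illustration of the two output shapes listed above, here is a minimal sketch using values taken from the demonstrations later in this notebook. The helper `is_valid_step` is hypothetical and only there to make the expected fields explicit; it is not part of the pipeline.

```python
import json

# Shape of `question_parsing`: a flat JSON list of strings,
# one per setup fact, constraint, or given condition.
question_parsing = [
    "If A works on Alpha, then B works on Beta",
    "A works on Beta",
]

# Shape of `cot_parsing`: a list of objects with three string fields.
cot_parsing = [
    {
        "statement": "Condition (1) is not applicable",
        "evidence": "Condition (1): If A works on Alpha, then B works on Beta. | A is working on Beta",
        "Verification": "false",
    },
]

def is_valid_step(step) -> bool:
    """Hypothetical helper: check that a step has the three expected string fields."""
    return (
        isinstance(step, dict)
        and all(isinstance(step.get(k), str) for k in ("statement", "evidence", "Verification"))
        and step["Verification"] in ("true", "false")
    )

print(json.dumps(cot_parsing, indent=2))
print(all(is_valid_step(s) for s in cot_parsing))
```

Note that `"Verification"` holds the strings `"true"` and `"false"`, not JSON booleans, which matches the demonstrations in the prompt templates.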

## Imports and File Paths

In [2]:
%%capture
# Install Unsloth for efficient LLM fine-tuning
# (%%capture is a cell magic, so it has to be the first line of the cell)
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps "xformers<0.0.27" "trl<0.9.0" peft accelerate bitsandbytes

In [1]:
import unsloth
import torch, json, re, ast, html
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

# JSON fallback
try:
    import json5
    USE_JSON5 = True
except ImportError:
    USE_JSON5 = False

Unsloth: Patching Xformers to fix some performance issues.
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.


    PyTorch 2.3.0+cu121 with CUDA 1201 (you have 2.6.0+cu124)
    Python  3.11.9 (you have 3.11.12)
  Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
  Memory-efficient attention, SwiGLU, sparse and more won't be available.
  Set XFORMERS_MORE_DETAILS=1 for more details


🦥 Unsloth Zoo will now patch everything to make training faster!


In [2]:
input_path  = "/content/drive/MyDrive/llm-sr-project/testingData-blank.json"
output_path = "/content/drive/MyDrive/llm-sr-project/testingDataresultsfor700-2.json"
log_path    = "/content/drive/MyDrive/llm-sr-project/raw_outputs_log.jsonl"
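
The file at `log_path` is appended to by the inference functions further down, one JSON object per line (JSON Lines), with the fields `type`, `question`, `raw`, and, for CoT records, `cot`. A minimal sketch of reading such a log back for inspection follows. The helper `read_jsonl` and the temporary file standing in for the Drive path are assumptions made so the example runs anywhere.

```python
import json
import os
import tempfile

def read_jsonl(path):
    """Return one dict per non-empty line of a JSON Lines file."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records

# Stand-in for log_path so the sketch runs without Google Drive.
demo_log = os.path.join(tempfile.mkdtemp(), "raw_outputs_log.jsonl")
with open(demo_log, "a", encoding="utf-8") as f:
    f.write(json.dumps({"type": "QP", "question": "q1", "raw": "[\"a\"]"}) + "\n")
    f.write(json.dumps({"type": "CP", "question": "q1", "cot": "c1", "raw": "[]"}) + "\n")

records = read_jsonl(demo_log)
qp_records = [r for r in records if r["type"] == "QP"]
print(len(records), len(qp_records))
```

Because the log is opened in append mode, records from earlier runs accumulate; filtering on `type` or `question` is one way to find the entries for a given example.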

## Prompt Templates

In [3]:
# ICL (In-Context Learning) Prompt Templates and Demonstrations (same as in training)

QP_DEMON = '''The question is:

There are 6 volunteers: A, B, C, D, E and F. They will be assigned to either Project Alpha or Project Beta. Each person works on exactly one project. This assignment must satisfy:
(1) If A works on Alpha, then B works on Beta.
(2) If C works on Alpha, then D and E work on Beta.
(3) F works on a different project than E.
(4) D must work on a different project than A.
(5) If F works on Alpha, then B works on Alpha.

If A works on Beta, which of the following must be true?
A. B works on Alpha
B. C works on Beta
C. D works on Alpha
D. F works on Beta

The parsing result is:

[
  "There are 6 volunteers: A, B, C, D, E and F. They will be assigned to either Project Alpha or Project Beta. Each person works on exactly one project.",
  "If A works on Alpha, then B works on Beta",
  "If C works on Alpha, then D and E work on Beta",
  "F works on a different project than E",
  "D must work on a different project than A",
  "If F works on Alpha, then B works on Alpha",
  "A works on Beta"
]
'''


QP_TEMPLATE = '''Given a question, extract all relevant information from the question that would help to solve it.

This includes:
- General setup information (e.g., number of people, projects involved)
- Explicit facts given in the question
- All logical constraints or conditions

Output only a JSON list and nothing else. Follow the format shown in the example.

Example:

{demon}

Now, the question is:

{question}

Your output:
'''


CP_DEMON = '''The question is:

There are 6 volunteers: A, B, C, D, E and F. They will be assigned to either Project Alpha or Project Beta. Each person works on exactly one project.

Conditions:
(1) If A works on Alpha, then B works on Beta.
(2) If C works on Alpha, then D and E work on Beta.
(3) F works on a different project than E.
(4) D must work on a different project than A.
(5) If F works on Alpha, then B works on Alpha.

Question:
If A works on Beta, which of the following must be true?

CoT:
Since A works on Beta, Condition (1) is not triggered. Condition (2) is not triggered since C’s assignment is unknown. Condition (3) doesn’t give anything because E’s assignment is unspecified. Condition (4) says D must work on a different project than A, so D must work on Alpha. Condition (5) depends on F, which is unknown.

Parsing result:

[
  {
    "statement": "Condition (1) is not applicable",
    "evidence": "Condition (1): If A works on Alpha, then B works on Beta. | A is working on Beta",
    "Verification": "false"
  },
  {
    "statement": "Condition (2) is not applicable",
    "evidence": "Condition (2): If C works on Alpha, then D and E work on Beta. | C’s assignment is unknown",
    "Verification": "false"
  },
  {
    "statement": "Condition (3) does not provide any info",
    "evidence": "Condition (3): F works on a different project than E. | E’s assignment is unknown",
    "Verification": "false"
  },
  {
    "statement": "D must work on Alpha",
    "evidence": "Condition (4): D must work on a different project than A, and A is working on Beta",
    "Verification": "true"
  },
  {
    "statement": "Condition (5) is not applicable",
    "evidence": "Condition (5): If F works on Alpha, then B works on Alpha. | F’s assignment is unknown",
    "Verification": "false"
  }
]
'''

CP_TEMPLATE = '''You are a reasoning assistant. Based on the question, conditions, and chain-of-thought (CoT), extract every inference or non-inference step as a JSON object.

For each CoT sentence that either:
  1. Refers to a condition (e.g. “Condition (2) …”)
  2. Starts with an inference cue (“Since”, “Therefore”, “This means”, “We can deduce”, etc.)

Produce one object with:
  • "statement": the new claim you read in that CoT sentence (don’t quote the entire sentence—just the core inference).
  • "evidence":
      – if the claim restates a constraint, use the exact line from the **Conditions** block,
      – otherwise, use the CoT fragment that you extracted it from.
  • "Verification":
      – `"false"` if the sentence rejects or blocks a condition (contains “not applicable”, “does not provide”, etc.),
      – otherwise `"true"`.

Keep the objects in the same order as they appear in the CoT.

Example:

{demon}

Now, given:

Question:
{question}

Conditions:
{conditions}

Chain-of-Thought:
{cot}

Your output:
'''
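
Both templates are filled with `str.format`, and the demonstrations themselves contain JSON braces. The sketch below, with hypothetical miniature templates, shows why that works: `str.format` only interprets braces in the template string, not in the values substituted into it. Braces written directly into a template would need to be doubled.

```python
# Hypothetical miniature versions of the templates above.
MINI_DEMON = '[\n  {"statement": "D must work on Alpha", "Verification": "true"}\n]'
MINI_TEMPLATE = "Example:\n\n{demon}\n\nQuestion:\n{question}\n\nYour output:\n"

# Braces inside the substituted values are copied through untouched.
prompt = MINI_TEMPLATE.format(demon=MINI_DEMON, question="Who works on {Alpha}?")
print(prompt)

# Braces written directly in the template must be doubled to stay literal.
LITERAL_BRACES = 'Return an object like {{"statement": ...}} for: {question}'
print(LITERAL_BRACES.format(question="example"))
```

This is why the JSON examples live in `QP_DEMON` and `CP_DEMON` rather than inline in the templates: keeping them as separate values avoids escaping every brace.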

## Utility Functions

In [4]:
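def extract_first_json_array(raw: str):
    """Return the first bracket-balanced [...] block in `raw`, parsed, or None."""
    raw = raw.strip()
    start = raw.find('[')
    if start == -1:
        return None
    # json5 is optional; only try it when the import succeeded.
    parsers = [json.loads, ast.literal_eval]
    if USE_JSON5:
        parsers.append(json5.loads)
    depth = 0
    # Note: brackets inside string values are counted too.
    for i, ch in enumerate(raw[start:], start):
        if ch == '[':
            depth += 1
        elif ch == ']':
            depth -= 1
        if depth == 0:
            block = raw[start:i + 1]
            for parser in parsers:
                try:
                    return parser(block)
                except Exception:
                    pass  # fall through to the next, more lenient parser
            return None
    return None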

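def clean_quotes(text):
    # Map typographic quotes to their ASCII equivalents.
    return text.replace('“','"').replace('”','"').replace("‘","'").replace("’","'")

def normalize_question_text(text):
    text = clean_quotes(text)
    # "? " before a capital letter becomes ", ".
    text = re.sub(r'\?\s(?=[A-Z])', ', ', text)
    # Insert the missing space in "word.Next" (letter, period, capital letter).
    text = re.sub(r'(?<=[a-zA-Z])\.(?=[A-Z])', '. ', text)
    # Replace literal backslash-n sequences (two characters, not real newlines)
    # with a space, subject to the A-D option-letter lookarounds.
    text = re.sub(r'(?<![A-Da-d])\\n(?!\s?[A-Da-d]\\.)', ' ', text)
    # Decode HTML entities such as &amp; and trim surrounding whitespace.
    return html.unescape(text).strip()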
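
The bracket counter in `extract_first_json_array` treats every `[` and `]` alike, so a `]` inside a string value ends the block early. One possible alternative, sketched here under the assumption that strict JSON is acceptable (no single-quoted Python literals, no json5 leniency), is to let the standard library's `json.JSONDecoder.raw_decode` find the end of the array. The function name `extract_array_with_raw_decode` is hypothetical.

```python
import json

def extract_array_with_raw_decode(raw: str):
    """Return the first JSON array in `raw`, or None if none parses."""
    decoder = json.JSONDecoder()
    pos = raw.find("[")
    while pos != -1:
        try:
            # raw_decode parses one JSON value starting at `pos`
            # and ignores whatever text follows it.
            value, _end = decoder.raw_decode(raw, pos)
        except json.JSONDecodeError:
            value = None
        if isinstance(value, list):
            return value
        # This "[" did not start a valid array; try the next one.
        pos = raw.find("[", pos + 1)
    return None

noisy = 'Sure, here it is: ["option ] A", "B"] Hope that helps.'
print(extract_array_with_raw_decode(noisy))  # → ['option ] A', 'B']
```

The trade-off is leniency: the original helper falls back to `ast.literal_eval` and optionally `json5`, which accept output that strict JSON rejects, so this variant would suit a model that reliably emits valid JSON.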

## Load Models

In [5]:
# Question Parser
question_model_path = "/content/drive/MyDrive/llm-sr-project/finetuned_llama3_question_parsing"
question_tokenizer  = AutoTokenizer.from_pretrained(question_model_path)
question_tokenizer.model_max_length = 1024
# 4-bit loading; recent transformers versions deprecate `load_in_4bit` in favor of
# quantization_config=BitsAndBytesConfig(load_in_4bit=True).
question_model      = AutoModelForCausalLM.from_pretrained(question_model_path, load_in_4bit=True)
# Deterministic beam search (5 beams, no sampling); return only the generated continuation.
question_pipe       = pipeline("text-generation", model=question_model, tokenizer=question_tokenizer,
                               return_full_text=False, num_beams=5, early_stopping=True, do_sample=False)

# CoT Parser (same setup, with a longer context limit for question + CoT)
cot_model_path = "/content/drive/MyDrive/llm-sr-project/finetuned_llama3_cot_parsing"
cot_tokenizer  = AutoTokenizer.from_pretrained(cot_model_path)
cot_tokenizer.model_max_length = 2048
cot_model      = AutoModelForCausalLM.from_pretrained(cot_model_path, load_in_4bit=True)
cot_pipe       = pipeline("text-generation", model=cot_model, tokenizer=cot_tokenizer,
                          return_full_text=False, num_beams=5, early_stopping=True, do_sample=False)

print("✅ Models loaded.")

The `load_in_4bit` and `load_in_8bit` arguments are deprecated and will be removed in the future versions. Please, pass a `BitsAndBytesConfig` object in `quantization_config` argument instead.


Device set to use cuda:0
The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


✅ Models loaded.


## Inference Functions

In [6]:
def generate_question_parsing(question: str):
    q = normalize_question_text(question)
    prompt = QP_TEMPLATE.format(demon=QP_DEMON, question=q)
    resp = question_pipe(prompt, max_new_tokens=512)[0]["generated_text"]
    with open(log_path, "a") as f:
        f.write(json.dumps({"type":"QP","question":question,"raw":resp})+"\n")
    return extract_first_json_array(resp) or []

def generate_cot_parsing(question: str, cot: str, constraints):
    q = normalize_question_text(question)
    c = normalize_question_text(cot)
    prompt = CP_TEMPLATE.format(demon=CP_DEMON, question=q,
                                conditions=json.dumps(constraints, ensure_ascii=False),
                                cot=c)
    resp_list = cot_pipe(prompt, max_new_tokens=1024)
    if not resp_list or "generated_text" not in resp_list[0]:
        print("Malformed response")
        return []
    resp = resp_list[0]["generated_text"]
    with open(log_path, "a") as f:
        f.write(json.dumps({"type":"CP","question":question,"cot":cot,"raw":resp})+"\n")
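    steps = extract_first_json_array(resp)
    if not steps:
        return []
    # Normalize each step, dropping near-empty statements and exact duplicates.
    clean, seen = [], set()
    for st in steps:
        if not isinstance(st, dict):
            continue  # the model may emit bare strings instead of objects
        s = str(st.get("statement", "")).strip()
        e = str(st.get("evidence", "")).strip() or "logical deduction"
        v = str(st.get("Verification", "true")).lower()
        if len(s) < 5 or (s, e) in seen:
            continue
        seen.add((s, e))
        clean.append({"statement": s, "evidence": e, "Verification": v})
    return clean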
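
To make the clean-up rules in `generate_cot_parsing` concrete, here is a standalone sketch of that loop run on hand-written input, with no model involved. The function name `clean_steps` and the sample data are hypothetical; the sketch also includes an `isinstance` guard for items that are not JSON objects.

```python
def clean_steps(steps):
    """Hypothetical standalone version of the clean-up loop in generate_cot_parsing."""
    clean, seen = [], set()
    for st in steps:
        if not isinstance(st, dict):
            continue  # skip items that are not JSON objects
        s = str(st.get("statement", "")).strip()
        e = str(st.get("evidence", "")).strip() or "logical deduction"
        v = str(st.get("Verification", "true")).lower()
        # Drop statements shorter than 5 characters and repeated (statement, evidence) pairs.
        if len(s) < 5 or (s, e) in seen:
            continue
        seen.add((s, e))
        clean.append({"statement": s, "evidence": e, "Verification": v})
    return clean

raw_steps = [
    {"statement": "D must work on Alpha", "evidence": "Condition (4)", "Verification": True},
    {"statement": "D must work on Alpha", "evidence": "Condition (4)", "Verification": "true"},
    {"statement": "ok", "evidence": "too short to keep"},
    {"statement": "F's project is unknown", "evidence": ""},
    "not an object",
]
for step in clean_steps(raw_steps):
    print(step)
```

Of the five inputs, two survive: the first `D must work on Alpha` entry (its boolean `True` becomes the string `"true"`), and the `F's project is unknown` entry, whose empty evidence is replaced by `"logical deduction"` and whose missing `Verification` defaults to `"true"`.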

## Demo

In [7]:
# Load test set
with open("/content/drive/MyDrive/llm-sr-project/testingDataresultsfor700-2.json") as f:
    examples = json.load(f)

idx = 0
ex = examples[idx]

print(" Question:\n", ex["question"])
print("\n Chain of Thought:\n", ex["cot"])

# Generate
qp = generate_question_parsing(ex["question"])
cp = generate_cot_parsing(ex["question"], ex["cot"], qp)

print(f"\n Extracted {len(qp)} constraints:")
for c in qp: print(" •", c)

print(f"\n Parsed {len(cp)} steps:")
for step in cp:
    print(" •", step)



 Question:
 There are 7 outstanding students G, H, L, M, U, W and Z in a school.During the summer vacation, the school will send them to the United Kingdom and the United States for inspection.The school has only 7 students participating in this activity, and each person happens to go to one of these two countries.Considering the specialty of each student, this activity must meet the following conditions? (1) If G goes to the UK, then H To the United States.(2) If L goes to the UK, both M and U go to the US.(3) The country W went to was different from the country Z went to.(4) The country where U goes is different from the country where G goes.(5) If Z goes to the UK, then H also goes to the UK.
If G goes to the United States, which of the following must be true?
A.H go to the UK
B.L go to America
C.M go to the UK
D.W go to America

 Chain of Thought:
 Since G goes to the United States, we need to analyze the conditions that follow. Condition (1) is not applicable since G is going to t



 Extracted 7 constraints:
 • There are 7 outstanding students G, H, L, M, U, W and Z in a school. During the summer vacation, the school will send them to the United Kingdom and the United States for inspection. The school has only 7 students participating in this activity, and each person happens to go to one of these two countries.
 • If G goes to the UK, then H To the United States.
 • If L goes to the UK, both M and U go to the US.
 • The country W went to was different from the country Z went to.
 • The country where U goes is different from the country where G goes.
 • If Z goes to the UK, then H also goes to the UK.
 • G goes to the United States

 Parsed 5 steps:
 • {'statement': 'Condition (1) is not applicable', 'evidence': 'Condition (1): If G goes to the UK, then H To the United States. | G goes to the United States', 'Verification': 'false'}
 • {'statement': 'Condition (2) is not applicable', 'evidence': "Condition (2): If L goes to the UK, both M and U go to the US. | L'

In [8]:
with open("/content/drive/MyDrive/llm-sr-project/testingDataresultsfor700-2.json") as f:
    examples = json.load(f)

ex = {
    "question": "Imagine you come across an ancient door with a numerical lock that consists of a sequence of numbers followed by a question mark. The sequence is as follows: 3, 6, 9, 12, ?, what number should you enter to unlock the door?\nA. 14\nB. 13\nC. 18\nD. 15",
    "cot": "The sequence follows a simple arithmetic pattern where each number is derived by adding a constant increment to the previous one. Starting with 3, each subsequent number increases by 3: 3 becomes 6, 6 becomes 9, and 9 becomes 12. This indicates a consistent increment of 3. Applying the same rule, the next number after 12 is 12 + 3, which results in 15. Hence, 15 is the number that completes the sequence."
}

print(" Question:\n", ex["question"])
print("\n Chain of Thought:\n", ex["cot"])

qp = generate_question_parsing(ex["question"])
cp = generate_cot_parsing(ex["question"], ex["cot"], qp)

print(f"\n Extracted {len(qp)} constraints:")
for c in qp: print(" •", c)

print(f"\n Parsed {len(cp)} steps:")
for step in cp:
    print(" •", step)



 Question:
 Imagine you come across an ancient door with a numerical lock that consists of a sequence of numbers followed by a question mark. The sequence is as follows: 3, 6, 9, 12, ?, what number should you enter to unlock the door?
A. 14
B. 13
C. 18
D. 15

 Chain of Thought:
 The sequence follows a simple arithmetic pattern where each number is derived by adding a constant increment to the previous one. Starting with 3, each subsequent number increases by 3: 3 becomes 6, 6 becomes 9, and 9 becomes 12. This indicates a consistent increment of 3. Applying the same rule, the next number after 12 is 12 + 3, which results in 15. Hence, 15 is the number that completes the sequence.




 Extracted 3 constraints:
 • Imagine you come across an ancient door with a numerical lock that consists of a sequence of numbers followed by a question mark.
 • The sequence is as follows: 3, 6, 9, 12,?
 • what number should you enter to unlock the door

 Parsed 5 steps:
 • {'statement': 'The sequence follows a simple arithmetic pattern', 'evidence': 'The sequence is as follows: 3, 6, 9, 12,?', 'Verification': 'true'}
 • {'statement': 'Each number is derived by adding a constant increment to the previous one', 'evidence': 'The sequence follows a simple arithmetic pattern where each number is derived by adding a constant increment to the previous one.', 'Verification': 'true'}
 • {'statement': 'Starting with 3, each subsequent number increases by 3', 'evidence': 'Starting with 3, each subsequent number increases by 3: 3 becomes 6, 6 becomes 9, and 9 becomes 12.', 'Verification': 'true'}
 • {'statement': 'The next number after 12 is 12 + 3', 'evidence': 'This indicates a consistent inc