# Prompt Engineering

In this notebook, we will demonstrate the fundamentals of using LangChain for prompt engineering. Specifically, we will do the following:

- create a prompt from a template
- create an LLM
- create a chain
- look at some specialized chains for few-shot prompting

For this exercise, we are going to focus on a classification task: classifying the "stance" of a reply toward the comment it responds to. The base comment is given below:

```python
comment = "The new Dune movie does not really capture the vision laid out by Frank Herbert. It feels like they tried to import too many visual effects that take away from the philosophy of the work."
```

We will classify the stance of each of the following replies toward the comment as "agree", "disagree", or "neutral":

```python
replies = [
    "The newer ones fail to live up to the sophistry of the older movies from the 70's.",
    "Frank Herbert wrote a lot of books.",
    "I think the new Dune movie better captures the spirit, if not the content, of Frank Herbert's philosophy.",
    "The quick red fox jumped over the lazy brown dog.",
    "Yeah, this new movie is a real masterpiece, lol!!"
]
```

## Configure the environment

I need your help with classifying the stance of replies to comments about a topic using LangChain and langchain-huggingface for running local models. First, I need the code to install the necessary packages from a notebook environment.

In [2]:
# Install necessary packages for LangChain stance classification with local models

%pip install langchain
%pip install transformers
%pip install langchain-huggingface
%pip install torch
%pip install accelerate
%pip install sentencepiece

Note: you may need to restart the kernel to use updated packages.
Collecting langchain-huggingface
  Downloading langchain_huggingface-0.3.1-py3-none-any.whl.metadata (996 bytes)
Downloading langchain_huggingface-0.3.1-py3-none-any.whl (27 kB)
Installing collected packages: langchain-huggingface
Successfully installed langchain-huggingface-0.3.1
Note: you may need to restart the kernel to use updated packages.
Collecting accelerate
  Downloading accelerate-1.10.1-py3-none-any.whl.metadata (19 kB)
Downloading accelerate-1.10.1-py3-none-any.whl (374 kB)
Installing collected packages: accelerate
Successfully installed accelerate-1.10.1
Note: you may need to restart the kernel to use updated packages.
Collecting sentencepiece
  Downloading sentencepiece-0.2.1-cp313-cp313-macosx_11_0_arm64.whl.metadata (10 kB)
Downloading sentencepiece-0.2.1-cp313-cp313-macosx_
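Because the install log above is cut off, it can help to confirm which versions actually ended up in the environment. The sketch below uses only the standard library; `installed_versions` is a helper name introduced here for illustration and is not part of any of the installed packages.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the %pip install lines above
PACKAGES = ["langchain", "transformers", "langchain-huggingface",
            "torch", "accelerate", "sentencepiece"]

def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

for name, ver in installed_versions(PACKAGES).items():
    print(f"{name}: {ver or 'not installed'}")
```

If a package shows up as not installed, restarting the kernel after the `%pip install` cells is usually the first thing to try.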

## Create a prompt object
I need your help with classifying the stance of comments using LangChain. First, I need you to give me the code to create a prompt object, called "stance_prompt" from LangChain around the following template: '''Please classify the stance, or opinion, of the following reply to the comment. Note that we want the stance of the reply to the comment, and not the stance of the reply to topic of the comment. Only give the stance as "agree", "disagree", or "neutral" and output no other words after outputing the label. comment: {comment} reply: {reply} stance:'''

In [1]:
from langchain_core.prompts import PromptTemplate

# Define the template for stance classification
template = '''Please classify the stance, or opinion, of the following reply to the comment. Note that we want the stance of the reply to the comment, and not the stance of the reply to topic of the comment. Only give the stance as "agree", "disagree", or "neutral" and output no other words after outputing the label.
comment: {comment}
reply: {reply}
stance:'''

# Create the prompt object
stance_prompt = PromptTemplate(
    input_variables=["comment", "reply"],
    template=template
)

# Example usage - Test the prompt formatting
comment = "I think this policy is not effective."
reply = "I agree, it doesn't address the core issues."

# Format the prompt with actual values
formatted_prompt = stance_prompt.format(comment=comment, reply=reply)
print("Formatted prompt:")
print(formatted_prompt)


Formatted prompt:
Please classify the stance, or opinion, of the following reply to the comment. Note that we want the stance of the reply to the comment, and not the stance of the reply to topic of the comment. Only give the stance as "agree", "disagree", or "neutral" and output no other words after outputing the label.
comment: I think this policy is not effective.
reply: I agree, it doesn't address the core issues.
stance:
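For a template this simple, formatting amounts to filling the `{comment}` and `{reply}` placeholders. As a rough illustration only (this is not how LangChain is implemented, and `fill_template` and `short_template` are hypothetical names), a similar result can be produced with Python's built-in `str.format`:

```python
# Shortened stand-in for the stance template, used only for this sketch
short_template = '''Only give the stance as "agree", "disagree", or "neutral".
comment: {comment}
reply: {reply}
stance:'''

def fill_template(template_text, **values):
    """Substitute named placeholders, failing loudly if one is missing."""
    try:
        return template_text.format(**values)
    except KeyError as missing:
        raise ValueError(f"missing template variable: {missing}") from None

filled = fill_template(
    short_template,
    comment="I think this policy is not effective.",
    reply="I agree, it doesn't address the core issues.",
)
print(filled)
```

Ending the template with `stance:` nudges the model to continue with the label itself rather than a full sentence.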


## Create an LLM object

### Option 1: Using a small encoder-decoder model
Now, I need you to create an LLM object using LangChain. In particular, I would like to use the text2text-generation model of "declare-lab/flan-alpaca-gpt4-xl" from HuggingFace and use the CPU. Make sure to import the langchain HuggingFace pipeline as "from langchain_huggingface import HuggingFacePipeline". Also, make sure when creating the pipeline to specify "max_new_tokens = 500", and make sure the pipeline only outputs the generated text and not text from the prompt.

```python
from langchain_huggingface import HuggingFacePipeline
from transformers import pipeline

# Load the model using a Hugging Face pipeline
# (text2text-generation pipelines return only the generated text)
hf_pipeline = pipeline(
    "text2text-generation",
    model="declare-lab/flan-alpaca-gpt4-xl",
    device=-1,  # -1 runs on the CPU; use 0 for the first GPU
    max_new_tokens=500,
)

# Create the LangChain LLM using the Hugging Face pipeline
llm = HuggingFacePipeline(pipeline=hf_pipeline)

# Example usage with a hand-written prompt
prompt = '''Please classify the stance, or opinion, of the following reply to the comment. Only give the stance as "agree", "disagree", or "neutral" and output no other words.
comment: I think the new policy will help improve efficiency.
reply: I disagree, the policy doesn't address the real issues.
stance:'''

# Get the model's response (invoke is the current call style for LangChain LLMs)
response = llm.invoke(prompt)
print(response)
```

### Option 2: Using a small decoder-only model
Now, I need you to create an LLM object using LangChain. In particular, I would like to use the text-generation model of "tiiuae/Falcon3-3B-Instruct" from HuggingFace and use the CPU. Make sure to import the langchain HuggingFace pipeline as "from langchain_huggingface import HuggingFacePipeline". Also, make sure when creating the pipeline to specify "max_new_tokens = 500", and make sure the pipeline only outputs the generated text and not the prompt.

```python
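from transformers import pipeline

hf_pipeline = pipeline(
    "text-generation",
    model="tiiuae/Falcon3-3B-Instruct",
    device=-1,  # -1 runs on the CPU; use 0 for the first GPU
    max_new_tokens=500,
    return_full_text=False,  # return only the generated text, not the prompt
)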
```

In [2]:
from langchain_huggingface import HuggingFacePipeline
from transformers import pipeline

import torch

# Check available devices on your Mac
print("Available devices:")
print(f"MPS (Apple Silicon GPU): {torch.backends.mps.is_available()}")
print("CPU cores: Available")

# Use MPS if available, fallback to CPU
device = 0 if torch.backends.mps.is_available() else -1
print(f"Using device: {'MPS (Apple Silicon GPU)' if device == 0 else 'CPU'}")

# Create the Hugging Face pipeline
hf_pipeline = pipeline(
    "text-generation",
    model="tiiuae/Falcon3-3B-Instruct",
    device=device,  # Will use MPS (Apple Silicon GPU) if available
    max_new_tokens=500,
    return_full_text=False  # Only return generated text, not the prompt
)

# Create the LangChain LLM using the HuggingFace pipeline
llm = HuggingFacePipeline(pipeline=hf_pipeline)

# Test the LLM with a simple example
test_prompt = '''Please classify the stance, or opinion, of the following reply to the comment. Note that we want the stance of the reply to the comment, and not the stance of the reply to topic of the comment. Only give the stance as "agree", "disagree", or "neutral" and output no other words after outputing the label.
comment: I think the new policy will help improve efficiency.
reply: I disagree, the policy doesn't address the real issues.
stance:'''

# Get the model's response
print("Testing the LLM...")
print("Input prompt:")
print(test_prompt)
print("\nModel response:")
response = llm.invoke(test_prompt)
print(response)


Available devices:
MPS (Apple Silicon GPU): True
CPU cores: Available
Using device: MPS (Apple Silicon GPU)



Device set to use mps:0


Testing the LLM...
Input prompt:
Please classify the stance, or opinion, of the following reply to the comment. Note that we want the stance of the reply to the comment, and not the stance of the reply to topic of the comment. Only give the stance as "agree", "disagree", or "neutral" and output no other words after outputing the label.
comment: I think the new policy will help improve efficiency.
reply: I disagree, the policy doesn't address the real issues.
stance:

Model response:
 disagree

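The raw response above begins with a space, and small models do not always stop at a single word. A light post-processing step can map the text onto the three allowed labels. The helper below is a sketch under that assumption; `normalize_stance` and the `"unknown"` fallback are names introduced here, not part of LangChain.

```python
import re

VALID_STANCES = ("agree", "disagree", "neutral")

# Whole-word pattern so that "disagree" is never read as "agree"
STANCE_PATTERN = re.compile(r"\b(" + "|".join(VALID_STANCES) + r")\b")

def normalize_stance(raw_output):
    """Return the first valid stance label found in the model output."""
    match = STANCE_PATTERN.search(raw_output.lower())
    return match.group(1) if match else "unknown"

print(normalize_stance(" disagree"))
print(normalize_stance("Stance: Neutral."))
print(normalize_stance("I cannot tell"))
```

Returning an explicit fallback instead of raising keeps a batch run going when one response is malformed, while still making the failure visible in the results.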

## Create a Chain

Now, I would like the Python code to create a LangChain Chain from the prompt template "stance_prompt" and the LLM "llm". Make sure to use the "|" syntax for defining the chain, and call the chain with the "invoke" method. Please name the chain "stance_chain".

In [3]:
# Create the chain using the "|" (pipe) syntax
stance_chain = stance_prompt | llm

print("Chain created successfully!")
print("Chain components:", stance_chain)
print()

# Example usage: Test the complete chain
comment = "The new delivery route optimization will save us fuel costs."
reply = "I disagree, the system is too complicated and won't really help."

# Use invoke method with input as a dictionary
print("Testing the complete chain...")
print("=" * 50)
print(f"Comment: {comment}")
print(f"Reply: {reply}")
print("=" * 50)

# Invoke the chain with the input dictionary
result = stance_chain.invoke({
    "comment": comment, 
    "reply": reply
})

print("Chain result:")
print(f"Stance: {result}")
print("=" * 50)


Chain created successfully!
Chain components: first=PromptTemplate(input_variables=['comment', 'reply'], input_types={}, partial_variables={}, template='Please classify the stance, or opinion, of the following reply to the comment. Note that we want the stance of the reply to the comment, and not the stance of the reply to topic of the comment. Only give the stance as "agree", "disagree", or "neutral" and output no other words after outputing the label.\ncomment: {comment}\nreply: {reply}\nstance:') middle=[] last=HuggingFacePipeline(pipeline=<transformers.pipelines.text_generation.TextGenerationPipeline object at 0x12fa881a0>, model_id='tiiuae/Falcon3-3B-Instruct')

Testing the complete chain...
Comment: The new delivery route optimization will save us fuel costs.
Reply: I disagree, the system is too complicated and won't really help.
Chain result:
Stance:  disagree

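The `|` syntax works because LangChain's runnables overload the operator to mean "feed the output of the left step into the right step". The toy class below mimics that idea in plain Python so the data flow is visible. `Step`, `toy_prompt`, and `fake_llm` are made up for this sketch and are far simpler than the real `Runnable` interface.

```python
class Step:
    """Wrap a function so steps can be composed with `|` and run with invoke()."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # a | b builds a new step that runs a first, then b on a's output
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)

# Stand-ins for the prompt template and the model
toy_prompt = Step(lambda d: f"comment: {d['comment']}\nreply: {d['reply']}\nstance:")
fake_llm = Step(lambda text: " disagree" if "disagree" in text.lower() else " neutral")

toy_chain = toy_prompt | fake_llm
print(toy_chain.invoke({
    "comment": "Route optimization will save fuel.",
    "reply": "I disagree, it won't really help.",
}))
```

As in the real chain, the input is a dictionary keyed by the template variables and the output is whatever the last step returns.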

Great. Now, I would like the code to run the previously defined "stance_chain" on a comment called "test_comment" across each entry in a list called "test_replies".

In [7]:
# Define the test comment and multiple replies
test_comment = "The new Dune movie does not really capture the vision laid out by Frank Herbert. It feels like they tried to import too many visual effects that take away from the philosophy of the work."

test_replies = [
    "The newer ones fail to live up to the sophistry of the older movies from the 70's.",
    "Frank Herbert wrote a lot of books.",
    "I think the new Dune movie better captures the spirit, if not the content, of Frank Herbert's philosophy.",
    "The quick red fox jumped over the lazy brown dog.",
    "Yeah, this new movie is a real masterpiece, lol!!"
]

print("Testing stance classification:")
print("=" * 70)
print(f"Original Comment: {test_comment}")
print("=" * 70)
print()

# Run stance classification for each reply
results = []

for i, reply in enumerate(test_replies, 1):
    print(f"Reply {i}: {reply}")
    
    # Use the stance_chain to get classification
    result = stance_chain.invoke({
        "comment": test_comment,
        "reply": reply
    })
    
    # Store result and display
    results.append(result.strip())  # Remove any extra whitespace
    print(f"Stance: {result.strip()}")
    print("-" * 50)

# Summary of results
print("\n" + "=" * 70)
print("SUMMARY OF RESULTS:")
print("=" * 70)

for i, (reply, stance) in enumerate(zip(test_replies, results), 1):
    # Truncate long replies for summary
    short_reply = reply[:40] + "..." if len(reply) > 40 else reply
    print(f"{i}. [{stance.upper()}] {short_reply}")

print("=" * 70)

Testing stance classification:
Original Comment: The new Dune movie does not really capture the vision laid out by Frank Herbert. It feels like they tried to import too many visual effects that take away from the philosophy of the work.

Reply 1: The newer ones fail to live up to the sophistry of the older movies from the 70's.
Stance: agree
--------------------------------------------------
Reply 2: Frank Herbert wrote a lot of books.
Stance: neutral
--------------------------------------------------
Reply 3: I think the new Dune movie better captures the spirit, if not the content, of Frank Herbert's philosophy.
Stance: disagree
--------------------------------------------------
Reply 4: The quick red fox jumped over the lazy brown dog.
Stance: neutral
--------------------------------------------------
Reply 5: Yeah, this new movie is a real masterpiece, lol!!
Stance: disagree
--------------------------------------------------

SUMMARY OF RESULTS:
1. [AGREE] The newer ones fail to li
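With more than a handful of replies, a tally of the predicted labels is often easier to scan than the full log. The sketch below counts the five stances printed in the run above using `collections.Counter`. In the notebook, the `results` list built in the loop could be passed in directly; `recorded_results` is a name introduced here.

```python
from collections import Counter

# Stances copied from the run above, in reply order
recorded_results = ["agree", "neutral", "disagree", "neutral", "disagree"]

stance_counts = Counter(recorded_results)
for stance in ("agree", "disagree", "neutral"):
    print(f"{stance}: {stance_counts[stance]}")
```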

## Few-shot Prompt

Now, please create a LangChain FewShotPromptTemplate for classifying the stance of a reply to a comment. Use the following template for each example and create an example prompt using example_prompt for the examples in the few-shot prompt template:

comment: [comment]
reply: [reply]
stance: [stance]

Then, use the following structure for the few-shot prompt:
prefix = '''Stance classification is the task of determining the expressed or implied opinion, or stance, of a reply toward a comment. The following replies express opinions about the associated comment. Each reply can either be "agree", "disagree", or "neutral" toward the comment.'''

suffix = '''Analyze the following reply to the provided comment and determine its stance. Respond with a single word: "agree", "disagree", or "neutral". Only return the stance as a single word, and no other text.
comment: {comment}
reply: {reply}
stance:'''

Create five few-shot examples with different comments and replies, including at least one for each possible stance: "agree", "disagree", and "neutral". Provide the code that constructs the FewShotPromptTemplate using the examples and the given prefix and suffix.

In [3]:
from langchain_core.prompts import FewShotPromptTemplate, PromptTemplate

# Define the template for each individual example
example_template = '''comment: {comment}
reply: {reply}
stance: {stance}'''

# Create the example prompt template
example_prompt = PromptTemplate(
    input_variables=["comment", "reply", "stance"],
    template=example_template
)

# Create five diverse few-shot examples covering all stances
examples = [
    {
        'comment': "I think the new policy will help improve efficiency.",
        'reply': "I agree, it will definitely streamline our operations.",
        'stance': 'agree'
    },
    {
        'comment': "The new education reform seems promising.", 
        'reply': "I disagree, it doesn't address the underlying issues.",
        'stance': 'disagree'
    },
    {
        'comment': "The park renovation project is a good idea.",
        'reply': "I'm not sure. The location might be an issue.",
        'stance': 'neutral'
    },
    {
        'comment': "Artificial intelligence will revolutionize healthcare.",
        'reply': "Absolutely, it has the potential to save many lives.",
        'stance': 'agree'
    },
    {
        'comment': "The economy is showing signs of recovery after the pandemic.",
        'reply': "I disagree, the recovery seems slow and uneven across sectors.",
        'stance': 'disagree'
    }
]

# Define the prefix and suffix as provided
prefix = '''Stance classification is the task of determining the expressed or implied opinion, or stance, of a reply toward a comment. The following replies express opinions about the associated comment. Each reply can either be "agree", "disagree", or "neutral" toward the comment.'''

suffix = '''Analyze the following reply to the provided comment and determine its stance. Respond with a single word: "agree", "disagree", or "neutral". Only return the stance as a single word, and no other text.
comment: {comment}
reply: {reply}
stance:'''

# Create the FewShotPromptTemplate
few_shot_stance_prompt = FewShotPromptTemplate(
    examples=examples,
    example_prompt=example_prompt,
    prefix=prefix,
    suffix=suffix,
    input_variables=["comment", "reply"],
    example_separator="\n\n"
)

# Test the few-shot prompt
print("Few-Shot Prompt Template Created!")
print("=" * 70)

# Example usage - format the template with new data
test_comment = "The new delivery route optimization will save us fuel costs."
test_reply = "I'm not convinced it will make much difference."

formatted_few_shot_prompt = few_shot_stance_prompt.format(
    comment=test_comment,
    reply=test_reply
)

print("Formatted Few-Shot Prompt:")
print("=" * 70)
print(formatted_few_shot_prompt)
print("=" * 70)

Few-Shot Prompt Template Created!
Formatted Few-Shot Prompt:
Stance classification is the task of determining the expressed or implied opinion, or stance, of a reply toward a comment. The following replies express opinions about the associated comment. Each reply can either be "agree", "disagree", or "neutral" toward the comment.

comment: I think the new policy will help improve efficiency.
reply: I agree, it will definitely streamline our operations.
stance: agree

comment: The new education reform seems promising.
reply: I disagree, it doesn't address the underlying issues.
stance: disagree

comment: The park renovation project is a good idea.
reply: I'm not sure. The location might be an issue.
stance: neutral

comment: Artificial intelligence will revolutionize healthcare.
reply: Absolutely, it has the potential to save many lives.
stance: agree

comment: The economy is showing signs of recovery after the pandemic.
reply: I disagree, the recovery seems slow and uneven across secto
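The formatted output shows how the pieces fit together: the prefix, then each rendered example, then the suffix, all joined by the `example_separator`. A plain-Python approximation of that assembly, using shortened stand-in text and a hypothetical `build_few_shot_prompt` helper, might look like this. It mirrors the observed output rather than LangChain's internals.

```python
def build_few_shot_prompt(prefix, examples, suffix, separator="\n\n"):
    """Join prefix, rendered examples, and suffix with the separator."""
    rendered = [
        f"comment: {ex['comment']}\nreply: {ex['reply']}\nstance: {ex['stance']}"
        for ex in examples
    ]
    return separator.join([prefix, *rendered, suffix])

# One shortened example; the notebook's few-shot prompt uses five
demo_examples = [
    {
        "comment": "The park renovation project is a good idea.",
        "reply": "I'm not sure. The location might be an issue.",
        "stance": "neutral",
    },
]

demo_prompt = build_few_shot_prompt(
    prefix='Each reply can be "agree", "disagree", or "neutral" toward the comment.',
    examples=demo_examples,
    suffix="comment: Route optimization will save fuel.\nreply: I'm not convinced.\nstance:",
)
print(demo_prompt)
```

Because the examples and the final query share the same `comment / reply / stance` layout, the model sees the open `stance:` line as one more instance of the pattern to complete.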

In [8]:
from langchain_core.prompts import FewShotPromptTemplate, PromptTemplate

# Define the template for each individual example
example_template = '''comment: {comment}
reply: {reply}
stance: {stance}'''

# Create the example prompt template
example_prompt = PromptTemplate(
    input_variables=["comment", "reply", "stance"],
    template=example_template
)

# Create five diverse few-shot examples covering all stances
examples = [
    {
        'comment': "I think the new policy will help improve efficiency.",
        'reply': "I agree, it will definitely streamline our operations.",
        'stance': 'agree'
    },
    {
        'comment': "The new education reform seems promising.", 
        'reply': "I disagree, it doesn't address the underlying issues.",
        'stance': 'disagree'
    },
    {
        'comment': "The park renovation project is a good idea.",
        'reply': "I'm not sure. The location might be an issue.",
        'stance': 'neutral'
    },
    {
        'comment': "Artificial intelligence will revolutionize healthcare.",
        'reply': "Absolutely, it has the potential to save many lives.",
        'stance': 'agree'
    },
    {
        'comment': "The economy is showing signs of recovery after the pandemic.",
        'reply': "I disagree, the recovery seems slow and uneven across sectors.",
        'stance': 'disagree'
    }
]

# Define the prefix and suffix as provided
prefix = '''Stance classification is the task of determining the expressed or implied opinion, or stance, of a reply toward a comment. The following replies express opinions about the associated comment. Each reply can either be "agree", "disagree", or "neutral" toward the comment.'''

suffix = '''Analyze the following reply to the provided comment and determine its stance. Respond with a single word: "agree", "disagree", or "neutral". Only return the stance as a single word, and no other text.
comment: {comment}
reply: {reply}
stance:'''

# Create the FewShotPromptTemplate
few_shot_stance_prompt = FewShotPromptTemplate(
    examples=examples,
    example_prompt=example_prompt,
    prefix=prefix,
    suffix=suffix,
    input_variables=["comment", "reply"],
    example_separator="\n\n"
)

# Test the few-shot prompt
print("Few-Shot Prompt Template Created!")
print("=" * 70)

# Define a mock LLM for demonstration (replace with a real LLM as needed).
# It gets its own name so the real `llm` created earlier is not overwritten.
from langchain_core.language_models import FakeListLLM

# This mock LLM always returns "neutral", so the comparison below only shows
# that both chains run; it says nothing about the quality of either prompt.
mock_llm = FakeListLLM(responses=["neutral"])

# Create a new chain using the few-shot prompt template
few_shot_stance_chain = few_shot_stance_prompt | mock_llm

print("Few-Shot Chain Created!")
print("Testing few-shot vs regular prompt...")
print("=" * 70)

# Compare regular vs few-shot on the same example
test_comment = "The new delivery route optimization will save us fuel costs."
test_reply = "I'm not convinced it will make much difference."

print("COMPARISON TEST:")
print(f"Comment: {test_comment}")
print(f"Reply: {test_reply}")
print()

# Create a regular prompt template for comparison
regular_prompt = PromptTemplate(
    input_variables=["comment", "reply"],
    template='''Analyze the following reply to the provided comment and determine its stance. Respond with a single word: "agree", "disagree", or "neutral". Only return the stance as a single word, and no other text.
comment: {comment}
reply: {reply}
stance:'''
)

# Create a regular stance chain using the regular prompt and the same mock LLM
regular_stance_chain = regular_prompt | mock_llm

# Test regular stance_chain
print("Regular prompt result:")
regular_result = regular_stance_chain.invoke({
    "comment": test_comment,
    "reply": test_reply
})
print(f"Stance: {regular_result.strip()}")
print()

# Test few-shot chain
print("Few-shot prompt result:")
few_shot_result = few_shot_stance_chain.invoke({
    "comment": test_comment,
    "reply": test_reply  
})
print(f"Stance: {few_shot_result.strip()}")
print("=" * 70)

# Example usage - format the template with new data
test_comment = "The new delivery route optimization will save us fuel costs."
test_reply = "I'm not convinced it will make much difference."

formatted_few_shot_prompt = few_shot_stance_prompt.format(
    comment=test_comment,
    reply=test_reply
)

print("Formatted Few-Shot Prompt:")
print("=" * 70)
print(formatted_few_shot_prompt)
print("=" * 70)

Few-Shot Prompt Template Created!
Few-Shot Chain Created!
Testing few-shot vs regular prompt...
COMPARISON TEST:
Comment: The new delivery route optimization will save us fuel costs.
Reply: I'm not convinced it will make much difference.

Regular prompt result:
Stance: neutral

Few-shot prompt result:
Stance: neutral
Formatted Few-Shot Prompt:
Stance classification is the task of determining the expressed or implied opinion, or stance, of a reply toward a comment. The following replies express opinions about the associated comment. Each reply can either be "agree", "disagree", or "neutral" toward the comment.

comment: I think the new policy will help improve efficiency.
reply: I agree, it will definitely streamline our operations.
stance: agree

comment: The new education reform seems promising.
reply: I disagree, it doesn't address the underlying issues.
stance: disagree

comment: The park renovation project is a good idea.
reply: I'm not sure. The location might be an issue.
stance:

## Chain-of-Thought Prompt

Please generate code that uses chain-of-thought (CoT) prompting to classify the stance of a reply to a comment. The process should consist of two stages:

1. First Stage (Explanatory Step):
- Generate an explanation for the stance (agree, disagree, neutral) of the reply toward the comment.
- Use a prompt template for this step with the variables "comment" and "reply".
- Output: "stance_reason".

2. Second Stage (Final Classification Step):
- Based on the explanation from the first stage ("stance_reason"), determine the final stance of the reply.
- Use a second prompt template for this step with the variables "comment", "reply", and "stance_reason".
- Output: The final stance as "agree", "disagree", or "neutral".

Use the "RunnablePassthrough" to pass the "comment" and "reply" variables from the first chain to the second chain, and chain the steps together using the "|" operator.

In [4]:
# Chain-of-Thought Prompt Implementation
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

print("🧠 Implementing Chain-of-Thought (CoT) Stance Classification")
print("=" * 70)

# Stage 1: Explanatory Step - Generate reasoning for the stance
reasoning_template = """Analyze the relationship between the following comment and reply. 
Provide a detailed explanation of why the reply would be classified as "agree", "disagree", or "neutral" toward the comment.
Focus on the key indicators in the reply that show the stance.

Comment: {comment}
Reply: {reply}

Explanation of stance reasoning:"""

reasoning_prompt = PromptTemplate(
    input_variables=["comment", "reply"],
    template=reasoning_template
)

print("✅ Stage 1 Prompt Template Created (Explanatory Step)")

# Stage 2: Final Classification Step - Use reasoning to determine final stance
classification_template = """Based on the explanation provided, classify the stance of the reply toward the comment.

Comment: {comment}
Reply: {reply}
Reasoning: {stance_reason}

Based on this reasoning, the stance of the reply toward the comment is (respond with only one word - "agree", "disagree", or "neutral"):"""

classification_prompt = PromptTemplate(
    input_variables=["comment", "reply", "stance_reason"],
    template=classification_template
)

print("✅ Stage 2 Prompt Template Created (Final Classification)")

# Create the Chain-of-Thought chain using RunnablePassthrough
# This creates a two-stage pipeline where:
# 1. First stage generates reasoning
# 2. Second stage uses that reasoning to make final classification
#
# RunnablePassthrough.assign keeps the incoming "comment" and "reply" keys
# and adds "stance_reason". Mapping each key to a bare RunnablePassthrough()
# would instead bind the whole input dict to both "comment" and "reply".

cot_stance_chain = (
    RunnablePassthrough.assign(
        stance_reason=reasoning_prompt | llm | StrOutputParser()
    )
    | classification_prompt
    | llm
    | StrOutputParser()
)

print("✅ Chain-of-Thought Stance Classification Chain Created!")
print("🔗 Chain Structure: Input → Reasoning → Classification → Output")
print("=" * 70)

🧠 Implementing Chain-of-Thought (CoT) Stance Classification
✅ Stage 1 Prompt Template Created (Explanatory Step)
✅ Stage 2 Prompt Template Created (Final Classification)
✅ Chain-of-Thought Stance Classification Chain Created!
🔗 Chain Structure: Input → Reasoning → Classification → Output


In [5]:
# Test the Chain-of-Thought Stance Classification
print("🧪 Testing Chain-of-Thought Stance Classification")
print("=" * 70)

# Test with the original Dune movie example
test_comment = "The new Dune movie does not really capture the vision laid out by Frank Herbert. It feels like they tried to import too many visual effects that take away from the philosophy of the work."
test_reply = "I think the new Dune movie better captures the spirit, if not the content, of Frank Herbert's philosophy."

print("📝 Test Case:")
print(f"Comment: {test_comment}")
print(f"Reply: {test_reply}")
print()
print("🔍 Chain-of-Thought Processing...")
print("-" * 50)

# Invoke the CoT chain
result = cot_stance_chain.invoke({
    "comment": test_comment,
    "reply": test_reply
})

print(f"🎯 Final Classification: {result.strip()}")
print("=" * 70)

# Test with multiple examples to compare CoT vs regular classification
print("\n🔄 Comparison: CoT vs Regular Classification")
print("=" * 70)

test_cases = [
    {
        "comment": "The new policy will improve workplace efficiency.",
        "reply": "I agree completely, it addresses our main concerns.",
        "expected": "agree"
    },
    {
        "comment": "Remote work is the future of employment.",
        "reply": "I disagree, in-person collaboration is still essential.",
        "expected": "disagree"
    },
    {
        "comment": "The weather has been quite unpredictable this month.",
        "reply": "That's an interesting observation about climate patterns.",
        "expected": "neutral"
    }
]

for i, case in enumerate(test_cases, 1):
    print(f"\n📊 Test Case {i}:")
    print(f"Comment: {case['comment']}")
    print(f"Reply: {case['reply']}")
    
    # CoT classification
    cot_result = cot_stance_chain.invoke({
        "comment": case["comment"],
        "reply": case["reply"]
    })
    
    # Regular classification for comparison
    regular_result = stance_chain.invoke({
        "comment": case["comment"],
        "reply": case["reply"]
    })
    
    print(f"🧠 CoT Result: {cot_result.strip()}")
    print(f"⚡ Regular Result: {regular_result.strip()}")
    print(f"🎯 Expected: {case['expected']}")
    print("-" * 40)

print("\n✅ Chain-of-Thought testing completed!")

🧪 Testing Chain-of-Thought Stance Classification
📝 Test Case:
Comment: The new Dune movie does not really capture the vision laid out by Frank Herbert. It feels like they tried to import too many visual effects that take away from the philosophy of the work.
Reply: I think the new Dune movie better captures the spirit, if not the content, of Frank Herbert's philosophy.

🔍 Chain-of-Thought Processing...
--------------------------------------------------
🎯 Final Classification: agree

🔄 Comparison: CoT vs Regular Classification

📊 Test Case 1:
Comment: The new policy will improve workplace efficiency.
Reply: I agree completely, it addresses our main concerns.
🧠 CoT Result: agree
⚡ Regular Result: agree
🎯 Expected: agree
----------------------------------------

📊 Test Case 2:
Comment: Remote work is the future of employment.
Reply: I disagree, in-person collaboration is still essential.
🧠 CoT Result: disagree
⚡ Regular Result: disagree
🎯 Expected: disagree
-------------------------------

## Understanding Chain-of-Thought (CoT) Prompting

### 🧠 How CoT Works

Chain-of-Thought prompting is a technique that breaks down complex reasoning tasks into explicit intermediate steps. Instead of asking the model to directly classify the stance, we:

1. **Stage 1 (Reasoning)**: Ask the model to explain WHY a reply has a particular stance toward a comment
2. **Stage 2 (Classification)**: Use that reasoning to make the final classification

### 🎯 Benefits of CoT Prompting

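- **Potentially improved accuracy**: Explicit reasoning can help on harder cases, but it is not guaranteed. In the run above, the CoT chain labeled the Dune reply "agree" even though the reply disagrees with the comment, so results should be checked against expected labels
- **Transparency**: We can see the model's reasoning process
- **Consistency**: The two-stage process may reduce arbitrary classification errors, at the cost of a second model call
- **Debugging**: If classification is wrong, we can examine the reasoning step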

### 🔗 Technical Implementation

Our CoT chain uses `RunnablePassthrough` to:
- Pass the original `comment` and `reply` through both stages
- Generate `stance_reason` in stage 1
- Use all three variables (`comment`, `reply`, `stance_reason`) in stage 2
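The data flow described above can be mimicked without any model. In the sketch below, `assign_key` is a hypothetical stand-in for what `RunnablePassthrough.assign` does conceptually: it keeps the incoming dictionary and adds one computed key. The two stage functions are placeholders for the prompt-plus-LLM calls, not real classifiers.

```python
def assign_key(inputs, key, func):
    """Return a copy of inputs with one extra key computed from the inputs."""
    return {**inputs, key: func(inputs)}

def explain_stance(inputs):
    # Placeholder for stage 1 (reasoning prompt + LLM)
    return f"The reply responds to: {inputs['comment']!r}"

def classify_stance(inputs):
    # Placeholder for stage 2; it can read all three variables
    assert {"comment", "reply", "stance_reason"} <= inputs.keys()
    return "disagree" if "disagree" in inputs["reply"].lower() else "neutral"

stage_one = assign_key(
    {
        "comment": "Remote work is the future of employment.",
        "reply": "I disagree, in-person collaboration is still essential.",
    },
    "stance_reason",
    explain_stance,
)
print(sorted(stage_one))
print(classify_stance(stage_one))
```

The key point is that stage 2 receives the original `comment` and `reply` unchanged alongside the new `stance_reason`, so the second prompt can reference all three.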

### 📊 Expected Improvements

CoT prompting is most likely to help with the following, though the effect depends on the model and should be measured:
- **Complex cases** where the stance isn't immediately obvious
- **Ambiguous replies** that could be interpreted multiple ways
- **Consistency** across similar examples
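Whether CoT actually helps on a given task is an empirical question, so it is worth scoring both chains against labeled cases. The sketch below computes accuracy for two lists of predictions. The predictions are the first two test cases printed in the comparison run above, and `accuracy` is a helper introduced here.

```python
def accuracy(predicted, expected):
    """Fraction of predictions that match the expected labels after cleanup."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        return 0.0
    hits = sum(p.strip().lower() == e for p, e in zip(predicted, expected))
    return hits / len(expected)

# Labels and outputs for the first two test cases in the run above
expected_labels = ["agree", "disagree"]
cot_predictions = ["agree", "disagree"]
regular_predictions = ["agree", "disagree"]

print(f"CoT accuracy: {accuracy(cot_predictions, expected_labels):.2f}")
print(f"Regular accuracy: {accuracy(regular_predictions, expected_labels):.2f}")
```

Two easy cases cannot separate the approaches; a larger labeled set that includes ambiguous and sarcastic replies would give a more meaningful comparison.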

In [6]:
# Demonstrate the reasoning step explicitly
print("🔍 Examining the Chain-of-Thought Reasoning Process")
print("=" * 70)

# Let's manually run each stage to see the reasoning
test_comment = "The new Dune movie does not really capture the vision laid out by Frank Herbert. It feels like they tried to import too many visual effects that take away from the philosophy of the work."
test_reply = "I think the new Dune movie better captures the spirit, if not the content, of Frank Herbert's philosophy."

print("📝 Test Input:")
print(f"Comment: {test_comment}")
print(f"Reply: {test_reply}")
print()

# Stage 1: Generate reasoning
print("🧠 Stage 1: Generating Reasoning...")
print("-" * 50)
reasoning_result = reasoning_prompt.format(comment=test_comment, reply=test_reply)
stance_reason = llm.invoke(reasoning_result)

print("💭 Model's Reasoning:")
print(stance_reason.strip())
print()

# Stage 2: Use reasoning for classification
print("⚖️ Stage 2: Final Classification...")
print("-" * 50)
classification_input = classification_prompt.format(
    comment=test_comment,
    reply=test_reply,
    stance_reason=stance_reason.strip(),
)
final_classification = llm.invoke(classification_input)

print("🎯 Final Classification:")
print(final_classification.strip())
print()

print("✅ This demonstrates how CoT provides:")
print("  • Transparent reasoning process")
print("  • Explicit justification for decisions")
print("  • Ability to debug classification errors")
print("  • More consistent and reliable results")
print("=" * 70)

🔍 Examining the Chain-of-Thought Reasoning Process
📝 Test Input:
Comment: The new Dune movie does not really capture the vision laid out by Frank Herbert. It feels like they tried to import too many visual effects that take away from the philosophy of the work.
Reply: I think the new Dune movie better captures the spirit, if not the content, of Frank Herbert's philosophy.

🧠 Stage 1: Generating Reasoning...
--------------------------------------------------
💭 Model's Reasoning:
The reply agrees with the comment by stating that while the movie may not perfectly capture the content, it still manages to convey the philosophy of Frank Herbert's work. The phrase "if not the content" suggests that the movie captures the spirit of the original material.

⚖️ Stage 2: Final Classification...
--------------------------------------------------
🎯 Final Classification:
<|assistant|>
agree

✅ This demonstrates how CoT provides:
  • Transparent reasoning process
  • Explicit justification for decisio
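
The Stage 2 output above includes a literal `<|assistant|>` marker before the label, which `.strip()` does not remove because it only trims whitespace. A minimal sketch of one way to normalize such output before comparing it with expected labels follows. `extract_label` is a hypothetical helper, not part of LangChain or of this notebook's chains, and the assumption that markers take the form `<|...|>` is based only on the output shown here.

```python
import re


def extract_label(raw_output: str) -> str:
    """Drop <|...|> markers and return the first stance label found."""
    # Remove any <|...|> special-token markers the model echoed back.
    text = re.sub(r"<\|[^|>]*\|>", " ", raw_output).strip().lower()
    # Match whole words only, so "agree" is not found inside "disagree".
    match = re.search(r"\b(disagree|agree|neutral)\b", text)
    return match.group(1) if match else "unknown"


print(extract_label("<|assistant|>\nagree"))  # agree
print(extract_label("disagree"))              # disagree
```

Returning `"unknown"` instead of raising keeps a batch evaluation running when the model produces free text with no recognizable label; whether that is the right choice depends on how strictly the results need to be validated.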

In [7]:
# 🔍 Detailed Analysis of Test Case 2 - "I disagree" Classification
print("🔍 Analyzing Test Case 2 Classification Issue")
print("=" * 70)

# Let's examine Test Case 2 specifically
case2_comment = "Remote work is the future of employment."
case2_reply = "I disagree, in-person collaboration is still essential."

print("📝 Test Case 2 Details:")
print(f"Comment: {case2_comment}")
print(f"Reply: {case2_reply}")
print()

print("🧠 Running Chain-of-Thought Analysis...")
print("-" * 50)

# Stage 1: Get the reasoning
reasoning_input = reasoning_prompt.format(comment=case2_comment, reply=case2_reply)
reasoning_output = llm.invoke(reasoning_input)

print("💭 Stage 1 - Model's Reasoning:")
print(reasoning_output.strip())
print()

# Stage 2: Get final classification
classification_input = classification_prompt.format(
    comment=case2_comment,
    reply=case2_reply,
    stance_reason=reasoning_output.strip(),
)
final_classification = llm.invoke(classification_input)

print("⚖️ Stage 2 - Final Classification:")
print(f"Result: {final_classification.strip()}")
print()

# Also test the regular chain for comparison
regular_result = stance_chain.invoke({
    "comment": case2_comment,
    "reply": case2_reply
})

print("🔄 Comparison:")
print(f"CoT Result: {final_classification.strip()}")
print(f"Regular Result: {regular_result.strip()}")
print()

print("📊 Analysis:")
print("• The reply explicitly states 'I disagree'")
print("• This is a clear linguistic signal for disagreement")
print("• Both methods should classify this as 'disagree'")
print("• If you saw 'neutral', it might be due to:")
print("  - Model variability (different runs can give different results)")
print("  - Previous test with different examples")
print("  - Looking at a different test case")
print("=" * 70)

🔍 Analyzing Test Case 2 Classification Issue
📝 Test Case 2 Details:
Comment: Remote work is the future of employment.
Reply: I disagree, in-person collaboration is still essential.

🧠 Running Chain-of-Thought Analysis...
--------------------------------------------------
💭 Stage 1 - Model's Reasoning:
<|assistant|>
The reply is classified as "disagree" because the statement directly contradicts the comment, which asserts that remote work is the future of employment. The word "disagree" explicitly indicates a negative stance, suggesting that the speaker believes in-person collaboration remains crucial. The use of the word "essential" further emphasizes this disagreement by highlighting the importance of in-person collaboration, which is seen as a necessary component of employment in this context.

⚖️ Stage 2 - Final Classification:
Result: <|assistant|>
disagree

🔄 Comparison:
CoT Result: <|assistant|>
disagree
Regular Result: disagree

📊 Analysis:
• The reply explicitly states 'I disag