# Prompt Engineering

In this notebook, we will demonstrate the fundamentals of using LangChain for prompt engineering. Specifically, we will do the following:

- create a prompt from a template
- create an LLM
- create a chain
- look at some specialized chains for few-shot prompting

For this exercise, we are going to focus on a classification task: classifying the "stance" of a reply toward a comment. The base comment is given below:

```python
comment = "The new Dune movie does not really capture the vision laid out by Frank Herbert. It feels like they tried to import too many visual effects that take away from the philosophy of the work."
```

We will classify the stance of each of the following replies toward the comment as "agree", "disagree", or "neutral":

```python
replies = [
    "The newer ones fail to live up to the sophistry of the older movies from the 70's.",
    "Frank Herbert wrote a lot of books.",
    "I think the new Dune movie better captures the spirit, if not the content, of Frank Herbert's philosophy.",
    "The quick red fox jumped over the lazy brown dog.",
    "Yeah, this new movie is a real masterpiece, lol!!"
]
```

## Configure the environment

I need your help with classifying the stance of replies to comments about a topic using LangChain and LangChain-HuggingFace for running local models. First, I need the code to install the necessary packages from a notebook environment.

In [9]:
# Install necessary packages for LangChain stance classification with local models

%pip install langchain
%pip install transformers
%pip install langchain-huggingface
%pip install torch
%pip install accelerate
%pip install sentencepiece

Note: you may need to restart the kernel to use updated packages.
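
Because LangChain import paths and package boundaries have shifted between releases, it can help to record which versions the installs above actually produced. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is an illustrative assumption, not part of any of these packages.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as passed to pip in the install cell above
PACKAGES = [
    "langchain",
    "transformers",
    "langchain-huggingface",
    "torch",
    "accelerate",
    "sentencepiece",
]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(PACKAGES).items():
    print(f"{name}: {ver if ver is not None else 'not installed'}")
```

Running this after a kernel restart shows whether the restarted kernel sees the same packages as the one that ran the install cell.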


## Create a prompt object
I need your help with classifying the stance of comments using LangChain. First, I need you to give me the code to create a prompt object, called "stance_prompt" from LangChain around the following template: '''Please classify the stance, or opinion, of the following reply to the comment. Note that we want the stance of the reply to the comment, and not the stance of the reply to topic of the comment. Only give the stance as "agree", "disagree", or "neutral" and output no other words after outputing the label. comment: {comment} reply: {reply} stance:'''

In [1]:
from langchain.prompts import PromptTemplate

# Define the template for stance classification
template = '''Please classify the stance, or opinion, of the following reply to the comment. Note that we want the stance of the reply to the comment, and not the stance of the reply to topic of the comment. Only give the stance as "agree", "disagree", or "neutral" and output no other words after outputing the label.
comment: {comment}
reply: {reply}
stance:'''

# Create the prompt object
stance_prompt = PromptTemplate(
    input_variables=["comment", "reply"],
    template=template
)

# Example usage - Test the prompt formatting
comment = "I think this policy is not effective."
reply = "I agree, it doesn't address the core issues."

# Format the prompt with actual values
formatted_prompt = stance_prompt.format(comment=comment, reply=reply)
print("Formatted prompt:")
print(formatted_prompt)


Formatted prompt:
Please classify the stance, or opinion, of the following reply to the comment. Note that we want the stance of the reply to the comment, and not the stance of the reply to topic of the comment. Only give the stance as "agree", "disagree", or "neutral" and output no other words after outputing the label.
comment: I think this policy is not effective.
reply: I agree, it doesn't address the core issues.
stance:
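
Conceptually, formatting this template is placeholder substitution. As a plain-Python sketch (it uses the standard library's `string.Formatter` and `str.format`, not LangChain itself), the snippet below lists the placeholders a template expects, which is the same set named in `input_variables`, and then fills them. The helper name `template_variables` and the shortened `demo_template` are made up for illustration.

```python
from string import Formatter

# Shortened version of the stance template, for illustration only
demo_template = '''Only give the stance as "agree", "disagree", or "neutral".
comment: {comment}
reply: {reply}
stance:'''


def template_variables(text):
    """Return the placeholder names in `text`, in order of first appearance."""
    names = []
    for _, field_name, _, _ in Formatter().parse(text):
        if field_name and field_name not in names:
            names.append(field_name)
    return names


variables = template_variables(demo_template)
print(variables)

# Filling the placeholders yields the final prompt string
filled = demo_template.format(
    comment="I think this policy is not effective.",
    reply="I agree, it doesn't address the core issues.",
)
print(filled)
```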


## Create an LLM object

### Option 1: Using a small encoder-decoder model
Now, I need you to create an LLM object using LangChain. In particular, I would like to use the text2text-generation model of "declare-lab/flan-alpaca-gpt4-xl" from HuggingFace and use the CPU. Make sure to import the langchain HuggingFace pipeline as "from langchain_huggingface import HuggingFacePipeline". Also, make sure when creating the pipeline to specify "max_new_tokens = 500", and make sure the pipeline only outputs the generated text and not text from the prompt.

```python
from langchain_huggingface import HuggingFacePipeline
from transformers import pipeline

# Load the model using Hugging Face pipeline
hf_pipeline = pipeline(
    "text2text-generation",
    model="declare-lab/flan-alpaca-gpt4-xl",
    device=-1,  # Use the CPU, as requested (0 selects the first GPU)
    max_new_tokens=500,
)

# Create the LangChain LLM using the HuggingFace pipeline
llm = HuggingFacePipeline(pipeline=hf_pipeline)

# Example usage with a hand-written prompt
prompt = '''Please classify the stance, or opinion, of the following reply to the comment. Only give the stance as "agree", "disagree", or "neutral" and output no other words.
comment: I think the new policy will help improve efficiency.
reply: I disagree, the policy doesn't address the real issues.
stance:'''

# Get the model's response
response = llm.invoke(prompt)
print(response)
```

### Option 2: Using a small decoder-only model
Now, I need you to create an LLM object using LangChain. In particular, I would like to use the text-generation model of "tiiuae/Falcon3-3B-Instruct" from HuggingFace and run it on the GPU when one is available (device 0), or on the CPU otherwise (device -1). Make sure to import the langchain HuggingFace pipeline as "from langchain_huggingface import HuggingFacePipeline". Also, make sure when creating the pipeline to specify "max_new_tokens = 500", and make sure the pipeline only outputs the generated text and not the prompt.

```python
hf_pipeline = pipeline(
    "text-generation",
    model="tiiuae/Falcon3-3B-Instruct",
    device=0,  # Use GPU (-1 for CPU)
    max_new_tokens = 500,
    return_full_text=False
)
```

In [3]:
from langchain_huggingface import HuggingFacePipeline
from transformers import pipeline

import torch

# Check available devices on your Mac
print("Available devices:")
print(f"MPS (Apple Silicon GPU): {torch.backends.mps.is_available()}")
print(f"CPU cores: Available")

# Use MPS if available, fallback to CPU
device = 0 if torch.backends.mps.is_available() else -1
print(f"Using device: {'MPS (Apple Silicon GPU)' if device == 0 else 'CPU'}")

# Create the Hugging Face pipeline
hf_pipeline = pipeline(
    "text-generation",
    model="tiiuae/Falcon3-3B-Instruct",
    device=device,  # Will use MPS (Apple Silicon GPU) if available
    max_new_tokens=500,
    return_full_text=False  # Only return generated text, not the prompt
)

# Create the LangChain LLM using the HuggingFace pipeline
llm = HuggingFacePipeline(pipeline=hf_pipeline)

# Test the LLM with a simple example
test_prompt = '''Please classify the stance, or opinion, of the following reply to the comment. Note that we want the stance of the reply to the comment, and not the stance of the reply to topic of the comment. Only give the stance as "agree", "disagree", or "neutral" and output no other words after outputing the label.
comment: I think the new policy will help improve efficiency.
reply: I disagree, the policy doesn't address the real issues.
stance:'''

# Get the model's response
print("Testing the LLM...")
print("Input prompt:")
print(test_prompt)
print("\nModel response:")
response = llm.invoke(test_prompt)
print(response)


Available devices:
MPS (Apple Silicon GPU): True
CPU cores: Available
Using device: MPS (Apple Silicon GPU)


Device set to use mps:0


Testing the LLM...
Input prompt:
Please classify the stance, or opinion, of the following reply to the comment. Note that we want the stance of the reply to the comment, and not the stance of the reply to topic of the comment. Only give the stance as "agree", "disagree", or "neutral" and output no other words after outputing the label.
comment: I think the new policy will help improve efficiency.
reply: I disagree, the policy doesn't address the real issues.
stance:

Model response:
 disagree


## Create a Chain

Now, I would like the Python code to create a LangChain Chain from the prompt template "stance_prompt" and the LLM "llm". Make sure to use the "|" syntax for defining the chain, and call the chain with the "invoke" method. Please name the chain "stance_chain".

In [10]:
# Create the basic stance prompt template
stance_prompt = PromptTemplate(
    input_variables=["comment", "reply"],
    template='''Please classify the stance, or opinion, of the following reply to the comment. Note that we want the stance of the reply to the comment, and not the stance of the reply to topic of the comment. Only give the stance as "agree", "disagree", or "neutral" and output no other words after outputting the label.

comment: {comment}
reply: {reply}
stance:'''
)

print("✅ Basic stance prompt created successfully!")
print("Template:", stance_prompt.template)

✅ Basic stance prompt created successfully!
Template: Please classify the stance, or opinion, of the following reply to the comment. Note that we want the stance of the reply to the comment, and not the stance of the reply to topic of the comment. Only give the stance as "agree", "disagree", or "neutral" and output no other words after outputting the label.

comment: {comment}
reply: {reply}
stance:


In [11]:
# Create the chain using the "|" (pipe) syntax
stance_chain = stance_prompt | llm

print("Chain created successfully!")
print("Chain components:", stance_chain)
print()

# Example usage: Test the complete chain
comment = "The new delivery route optimization will save us fuel costs."
reply = "I disagree, the system is too complicated and won't really help."

# Use invoke method with input as a dictionary
print("Testing the complete chain...")
print("=" * 50)
print(f"Comment: {comment}")
print(f"Reply: {reply}")
print("=" * 50)

# Invoke the chain with the input dictionary
result = stance_chain.invoke({
    "comment": comment, 
    "reply": reply
})

print("Chain result:")
print(f"Stance: {result}")
print("=" * 50)


Chain created successfully!
Chain components: first=PromptTemplate(input_variables=['comment', 'reply'], input_types={}, partial_variables={}, template='Please classify the stance, or opinion, of the following reply to the comment. Note that we want the stance of the reply to the comment, and not the stance of the reply to topic of the comment. Only give the stance as "agree", "disagree", or "neutral" and output no other words after outputting the label.\n\ncomment: {comment}\nreply: {reply}\nstance:') middle=[] last=HuggingFacePipeline(pipeline=<transformers.pipelines.text_generation.TextGenerationPipeline object at 0x12c148ec0>, model_id='tiiuae/Falcon3-3B-Instruct')

Testing the complete chain...
Comment: The new delivery route optimization will save us fuel costs.
Reply: I disagree, the system is too complicated and won't really help.
Chain result:
Stance:  disagree
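
The raw chain output is a free-form string: here it carries a leading space, and small instruction-tuned models sometimes append extra tokens after the label. Below is a minimal post-processing sketch, assuming the first label word in the output is the intended answer. The helper name `normalize_stance` is hypothetical.

```python
import re

LABELS = ("agree", "disagree", "neutral")
# \b keeps "agree" from matching inside "disagree"
_LABEL_RE = re.compile(r"\b(" + "|".join(LABELS) + r")\b", re.IGNORECASE)


def normalize_stance(raw_output):
    """Return the first stance label found in the model output, or None."""
    match = _LABEL_RE.search(raw_output)
    return match.group(1).lower() if match else None


print(normalize_stance(" disagree"))                               # disagree
print(normalize_stance("Agree.\nThe reply supports the comment."))  # agree
print(normalize_stance("I cannot tell."))                          # None
```

Returning `None` for unparseable output keeps such cases visible instead of silently counting them as one of the three labels.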


Great. Now, I would like the code to run the previously defined "stance_chain" on a comment called "test_comment" across each entry in a list called "test_replies".

In [8]:
# Define the test comment and multiple replies
test_comment = "The new Dune movie does not really capture the vision laid out by Frank Herbert. It feels like they tried to import too many visual effects that take away from the philosophy of the work."

test_replies = [
    "The newer ones fail to live up to the sophistry of the older movies from the 70's.",
    "Frank Herbert wrote a lot of books.",
    "I think the new Dune movie better captures the spirit, if not the content, of Frank Herbert's philosophy.",
    "The quick red fox jumped over the lazy brown dog.",
    "Yeah, this new movie is a real masterpiece, lol!!"
]

print("Testing stance classification:")
print("=" * 70)
print(f"Original Comment: {test_comment}")
print("=" * 70)
print()

# Run stance classification for each reply
results = []

for i, reply in enumerate(test_replies, 1):
    print(f"Reply {i}: {reply}")
    
    # Use the stance_chain to get classification
    result = stance_chain.invoke({
        "comment": test_comment,
        "reply": reply
    })
    
    # Store result and display
    results.append(result.strip())  # Remove any extra whitespace
    print(f"Stance: {result.strip()}")
    print("-" * 50)

# Summary of results
print("\n" + "=" * 70)
print("SUMMARY OF RESULTS:")
print("=" * 70)

for i, (reply, stance) in enumerate(zip(test_replies, results), 1):
    # Truncate long replies for summary
    short_reply = reply[:40] + "..." if len(reply) > 40 else reply
    print(f"{i}. [{stance.upper()}] {short_reply}")

print("=" * 70)

Testing stance classification:
Original Comment: The new Dune movie does not really capture the vision laid out by Frank Herbert. It feels like they tried to import too many visual effects that take away from the philosophy of the work.

Reply 1: The newer ones fail to live up to the sophistry of the older movies from the 70's.


NameError: name 'stance_chain' is not defined
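
The `NameError` above indicates that the cell ran in a session where `stance_chain` had not yet been created. One way to make that dependency explicit is to pass the chain into a helper as an argument. The sketch below assumes only that the chain object exposes an `invoke` method taking a dict and returning a string, as `stance_chain` does. `classify_replies` and `StubChain` are hypothetical names, and `StubChain` is a stand-in so the example runs without a model.

```python
def classify_replies(chain, comment, replies):
    """Invoke `chain` once per reply and return (reply, stance) pairs."""
    results = []
    for reply in replies:
        raw = chain.invoke({"comment": comment, "reply": reply})
        results.append((reply, raw.strip()))
    return results


class StubChain:
    """Stand-in for a real chain: always answers ' neutral' (not a model)."""

    def invoke(self, inputs):
        # A real chain would format the prompt and call the LLM here
        assert {"comment", "reply"} <= set(inputs)
        return " neutral"


pairs = classify_replies(
    StubChain(),
    "The new Dune movie does not really capture the vision laid out by Frank Herbert.",
    [
        "Frank Herbert wrote a lot of books.",
        "The quick red fox jumped over the lazy brown dog.",
    ],
)
for reply, stance in pairs:
    print(f"[{stance}] {reply}")
```

With the real chain defined, the equivalent call would be `classify_replies(stance_chain, test_comment, test_replies)`.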

# Few-shot Prompt

Now, please create a LangChain FewShotPromptTemplate for classifying the stance of a reply to a comment. Use the following template for each example, and create an example prompt called example_prompt for the examples in the few-shot prompt template:

comment: [comment]
reply: [reply]
stance: [stance]

Then, use the following structure for the few-shot prompt:
prefix = '''Stance classification is the task of determining the expressed or implied opinion, or stance, of a reply toward a comment. The following replies express opinions about the associated comment. Each reply can either be "agree", "disagree", or "neutral" toward the comment.'''

suffix = '''Analyze the following reply to the provided comment and determine its stance. Respond with a single word: "agree", "disagree", or "neutral". Only return the stance as a single word, and no other text.
comment: {comment}
reply: {reply}
stance:'''

Create five few-shot examples with different comments and replies, including at least one for each possible stance: "agree", "disagree", and "neutral". Provide the code that constructs the FewShotPromptTemplate using the examples and the given prefix and suffix.

In [3]:
from langchain.prompts import FewShotPromptTemplate, PromptTemplate

# Define the template for each individual example
example_template = '''comment: {comment}
reply: {reply}
stance: {stance}'''

# Create the example prompt template
example_prompt = PromptTemplate(
    input_variables=["comment", "reply", "stance"],
    template=example_template
)

# Create five diverse few-shot examples covering all stances
examples = [
    {
        'comment': "I think the new policy will help improve efficiency.",
        'reply': "I agree, it will definitely streamline our operations.",
        'stance': 'agree'
    },
    {
        'comment': "The new education reform seems promising.", 
        'reply': "I disagree, it doesn't address the underlying issues.",
        'stance': 'disagree'
    },
    {
        'comment': "The park renovation project is a good idea.",
        'reply': "I'm not sure. The location might be an issue.",
        'stance': 'neutral'
    },
    {
        'comment': "Artificial intelligence will revolutionize healthcare.",
        'reply': "Absolutely, it has the potential to save many lives.",
        'stance': 'agree'
    },
    {
        'comment': "The economy is showing signs of recovery after the pandemic.",
        'reply': "I disagree, the recovery seems slow and uneven across sectors.",
        'stance': 'disagree'
    }
]

# Define the prefix and suffix as provided
prefix = '''Stance classification is the task of determining the expressed or implied opinion, or stance, of a reply toward a comment. The following replies express opinions about the associated comment. Each reply can either be "agree", "disagree", or "neutral" toward the comment.'''

suffix = '''Analyze the following reply to the provided comment and determine its stance. Respond with a single word: "agree", "disagree", or "neutral". Only return the stance as a single word, and no other text.
comment: {comment}
reply: {reply}
stance:'''

# Create the FewShotPromptTemplate
few_shot_stance_prompt = FewShotPromptTemplate(
    examples=examples,
    example_prompt=example_prompt,
    prefix=prefix,
    suffix=suffix,
    input_variables=["comment", "reply"],
    example_separator="\n\n"
)

# Test the few-shot prompt
print("Few-Shot Prompt Template Created!")
print("=" * 70)

# Example usage - format the template with new data
test_comment = "The new delivery route optimization will save us fuel costs."
test_reply = "I'm not convinced it will make much difference."

formatted_few_shot_prompt = few_shot_stance_prompt.format(
    comment=test_comment,
    reply=test_reply
)

print("Formatted Few-Shot Prompt:")
print("=" * 70)
print(formatted_few_shot_prompt)
print("=" * 70)

Few-Shot Prompt Template Created!
Formatted Few-Shot Prompt:
Stance classification is the task of determining the expressed or implied opinion, or stance, of a reply toward a comment. The following replies express opinions about the associated comment. Each reply can either be "agree", "disagree", or "neutral" toward the comment.

comment: I think the new policy will help improve efficiency.
reply: I agree, it will definitely streamline our operations.
stance: agree

comment: The new education reform seems promising.
reply: I disagree, it doesn't address the underlying issues.
stance: disagree

comment: The park renovation project is a good idea.
reply: I'm not sure. The location might be an issue.
stance: neutral

comment: Artificial intelligence will revolutionize healthcare.
reply: Absolutely, it has the potential to save many lives.
stance: agree

comment: The economy is showing signs of recovery after the pandemic.
reply: I disagree, the recovery seems slow and uneven across sectors.
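
The structure of this output (the prefix, then each formatted example, then the suffix, all joined by `example_separator`) can be reproduced in a few lines of plain Python. This is a sketch of the layout only, not LangChain's implementation. `build_few_shot_prompt` is a hypothetical helper, the prefix and suffix are shortened, and the two examples are copied from the list above.

```python
def build_few_shot_prompt(prefix, examples, suffix, example_template,
                          separator="\n\n", **inputs):
    """Join prefix, formatted examples, and the filled-in suffix with `separator`."""
    example_blocks = [example_template.format(**example) for example in examples]
    pieces = [prefix, *example_blocks, suffix.format(**inputs)]
    return separator.join(pieces)


demo_examples = [
    {
        "comment": "The new education reform seems promising.",
        "reply": "I disagree, it doesn't address the underlying issues.",
        "stance": "disagree",
    },
    {
        "comment": "The park renovation project is a good idea.",
        "reply": "I'm not sure. The location might be an issue.",
        "stance": "neutral",
    },
]

demo_prompt = build_few_shot_prompt(
    prefix='Each reply can either be "agree", "disagree", or "neutral" toward the comment.',
    examples=demo_examples,
    suffix="comment: {comment}\nreply: {reply}\nstance:",
    example_template="comment: {comment}\nreply: {reply}\nstance: {stance}",
    comment="The new delivery route optimization will save us fuel costs.",
    reply="I'm not convinced it will make much difference.",
)
print(demo_prompt)
```

The prompt ends with an open `stance:` line, so the model's continuation lines up with the pattern set by the examples.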

In [8]:
from langchain.prompts import FewShotPromptTemplate, PromptTemplate

# Define the template for each individual example
example_template = '''comment: {comment}
reply: {reply}
stance: {stance}'''

# Create the example prompt template
example_prompt = PromptTemplate(
    input_variables=["comment", "reply", "stance"],
    template=example_template
)

# Create five diverse few-shot examples covering all stances
examples = [
    {
        'comment': "I think the new policy will help improve efficiency.",
        'reply': "I agree, it will definitely streamline our operations.",
        'stance': 'agree'
    },
    {
        'comment': "The new education reform seems promising.", 
        'reply': "I disagree, it doesn't address the underlying issues.",
        'stance': 'disagree'
    },
    {
        'comment': "The park renovation project is a good idea.",
        'reply': "I'm not sure. The location might be an issue.",
        'stance': 'neutral'
    },
    {
        'comment': "Artificial intelligence will revolutionize healthcare.",
        'reply': "Absolutely, it has the potential to save many lives.",
        'stance': 'agree'
    },
    {
        'comment': "The economy is showing signs of recovery after the pandemic.",
        'reply': "I disagree, the recovery seems slow and uneven across sectors.",
        'stance': 'disagree'
    }
]

# Define the prefix and suffix as provided
prefix = '''Stance classification is the task of determining the expressed or implied opinion, or stance, of a reply toward a comment. The following replies express opinions about the associated comment. Each reply can either be "agree", "disagree", or "neutral" toward the comment.'''

suffix = '''Analyze the following reply to the provided comment and determine its stance. Respond with a single word: "agree", "disagree", or "neutral". Only return the stance as a single word, and no other text.
comment: {comment}
reply: {reply}
stance:'''

# Create the FewShotPromptTemplate
few_shot_stance_prompt = FewShotPromptTemplate(
    examples=examples,
    example_prompt=example_prompt,
    prefix=prefix,
    suffix=suffix,
    input_variables=["comment", "reply"],
    example_separator="\n\n"
)

# Test the few-shot prompt
print("Few-Shot Prompt Template Created!")
print("=" * 70)

# Define a mock LLM for demonstration (replace with a real LLM as needed).
# It gets its own name so the real `llm` created earlier is not overwritten.
from langchain_core.language_models import FakeListLLM

# This mock LLM always returns "neutral", so the comparison below only
# exercises the chain plumbing and says nothing about prompt quality
fake_llm = FakeListLLM(responses=["neutral"])

# Create a new chain using the few-shot prompt template
few_shot_stance_chain = few_shot_stance_prompt | fake_llm

print("Few-Shot Chain Created!")
print("Testing few-shot vs regular prompt...")
print("=" * 70)

# Compare regular vs few-shot on the same example
test_comment = "The new delivery route optimization will save us fuel costs."
test_reply = "I'm not convinced it will make much difference."

print("COMPARISON TEST:")
print(f"Comment: {test_comment}")
print(f"Reply: {test_reply}")
print()

# Create a regular prompt template for comparison
regular_prompt = PromptTemplate(
    input_variables=["comment", "reply"],
    template='''Analyze the following reply to the provided comment and determine its stance. Respond with a single word: "agree", "disagree", or "neutral". Only return the stance as a single word, and no other text.
comment: {comment}
reply: {reply}
stance:'''
)

# Create a regular stance chain using the regular prompt and the mock LLM
regular_stance_chain = regular_prompt | fake_llm

# Test regular stance_chain
print("Regular prompt result:")
regular_result = regular_stance_chain.invoke({
    "comment": test_comment,
    "reply": test_reply
})
print(f"Stance: {regular_result.strip()}")
print()

# Test few-shot chain
print("Few-shot prompt result:")
few_shot_result = few_shot_stance_chain.invoke({
    "comment": test_comment,
    "reply": test_reply  
})
print(f"Stance: {few_shot_result.strip()}")
print("=" * 70)

# Example usage - format the template with new data
test_comment = "The new delivery route optimization will save us fuel costs."
test_reply = "I'm not convinced it will make much difference."

formatted_few_shot_prompt = few_shot_stance_prompt.format(
    comment=test_comment,
    reply=test_reply
)

print("Formatted Few-Shot Prompt:")
print("=" * 70)
print(formatted_few_shot_prompt)
print("=" * 70)

Few-Shot Prompt Template Created!
Few-Shot Chain Created!
Testing few-shot vs regular prompt...
COMPARISON TEST:
Comment: The new delivery route optimization will save us fuel costs.
Reply: I'm not convinced it will make much difference.

Regular prompt result:
Stance: neutral

Few-shot prompt result:
Stance: neutral
Formatted Few-Shot Prompt:
Stance classification is the task of determining the expressed or implied opinion, or stance, of a reply toward a comment. The following replies express opinions about the associated comment. Each reply can either be "agree", "disagree", or "neutral" toward the comment.

comment: I think the new policy will help improve efficiency.
reply: I agree, it will definitely streamline our operations.
stance: agree

comment: The new education reform seems promising.
reply: I disagree, it doesn't address the underlying issues.
stance: disagree

comment: The park renovation project is a good idea.
reply: I'm not sure. The location might be an issue.
stance:

# Chain-of-Thought Prompt

Please generate code that uses chain-of-thought (CoT) prompting to classify the stance of a reply to a comment. The process should consist of two stages:

1. First Stage (Explanatory Step):
- Generate an explanation for the stance (agree, disagree, neutral) of the reply toward the comment.
- Use a prompt template for this step with the variables "comment" and "reply".
- Output: "stance_reason".

2. Second Stage (Final Classification Step):
- Based on the explanation from the first stage ("stance_reason"), determine the final stance of the reply.
- Use a second prompt template for this step with the variables "comment", "reply", and "stance_reason".
- Output: The final stance as "agree", "disagree", or "neutral".

Use the "RunnablePassthrough" to pass the "comment" and "reply" variables from the first chain to the second chain, and chain the steps together using the "|" operator.

In [12]:
# Chain-of-Thought Prompt Implementation
from langchain.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

print("🧠 Implementing Chain-of-Thought (CoT) Stance Classification")
print("=" * 70)

# Stage 1: Explanatory Step - Generate reasoning for the stance
reasoning_template = """Analyze the relationship between the following comment and reply. 
Provide a detailed explanation of why the reply would be classified as "agree", "disagree", or "neutral" toward the comment.
Focus on the key indicators in the reply that show the stance.

Comment: {comment}
Reply: {reply}

Explanation of stance reasoning:"""

reasoning_prompt = PromptTemplate(
    input_variables=["comment", "reply"],
    template=reasoning_template
)

print("✅ Stage 1 Prompt Template Created (Explanatory Step)")

# Stage 2: Final Classification Step - Use reasoning to determine final stance
classification_template = """Based on the explanation provided, classify the stance of the reply toward the comment.

Comment: {comment}
Reply: {reply}
Reasoning: {stance_reason}

Based on this reasoning, the stance of the reply toward the comment is (respond with only one word - "agree", "disagree", or "neutral"):"""

classification_prompt = PromptTemplate(
    input_variables=["comment", "reply", "stance_reason"],
    template=classification_template
)

print("✅ Stage 2 Prompt Template Created (Final Classification)")

# Create the Chain-of-Thought chain using RunnablePassthrough
# This creates a two-stage pipeline where:
# 1. First stage generates reasoning
# 2. Second stage uses that reasoning to make final classification

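# RunnablePassthrough.assign forwards the input dict ("comment", "reply")
# unchanged and adds "stance_reason" from the stage 1 sub-chain.
# Wrapping each key in its own RunnablePassthrough() would instead hand the
# whole input dict to both "comment" and "reply".
cot_stance_chain = (
    RunnablePassthrough.assign(
        stance_reason=reasoning_prompt | llm | StrOutputParser()
    )
    | classification_prompt
    | llm
    | StrOutputParser()
)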

print("✅ Chain-of-Thought Stance Classification Chain Created!")
print("🔗 Chain Structure: Input → Reasoning → Classification → Output")
print("=" * 70)

🧠 Implementing Chain-of-Thought (CoT) Stance Classification
✅ Stage 1 Prompt Template Created (Explanatory Step)
✅ Stage 2 Prompt Template Created (Final Classification)
✅ Chain-of-Thought Stance Classification Chain Created!
🔗 Chain Structure: Input → Reasoning → Classification → Output


In [8]:
# Test the Chain-of-Thought Stance Classification
print("🧪 Testing Chain-of-Thought Stance Classification")
print("=" * 70)

# Test with the original Dune movie example
test_comment = "The new Dune movie does not really capture the vision laid out by Frank Herbert. It feels like they tried to import too many visual effects that take away from the philosophy of the work."
test_reply = "I think the new Dune movie better captures the spirit, if not the content, of Frank Herbert's philosophy."

print("📝 Test Case:")
print(f"Comment: {test_comment}")
print(f"Reply: {test_reply}")
print()
print("🔍 Chain-of-Thought Processing...")
print("-" * 50)

# Invoke the CoT chain
result = cot_stance_chain.invoke({
    "comment": test_comment,
    "reply": test_reply
})

print(f"🎯 Final Classification: {result.strip()}")
print("=" * 70)

# Test with multiple examples to compare CoT vs regular classification
print("\n🔄 Comparison: CoT vs Regular Classification")
print("=" * 70)

test_cases = [
    {
        "comment": "The new policy will improve workplace efficiency.",
        "reply": "I agree completely, it addresses our main concerns.",
        "expected": "agree"
    },
    {
        "comment": "Remote work is the future of employment.",
        "reply": "I disagree, in-person collaboration is still essential.",
        "expected": "disagree"
    },
    {
        "comment": "The weather has been quite unpredictable this month.",
        "reply": "That's an interesting observation about climate patterns.",
        "expected": "neutral"
    }
]

for i, case in enumerate(test_cases, 1):
    print(f"\n📊 Test Case {i}:")
    print(f"Comment: {case['comment']}")
    print(f"Reply: {case['reply']}")
    
    # CoT classification
    cot_result = cot_stance_chain.invoke({
        "comment": case["comment"],
        "reply": case["reply"]
    })
    
    # Regular classification for comparison
    regular_result = stance_chain.invoke({
        "comment": case["comment"],
        "reply": case["reply"]
    })
    
    print(f"🧠 CoT Result: {cot_result.strip()}")
    print(f"⚡ Regular Result: {regular_result.strip()}")
    print(f"🎯 Expected: {case['expected']}")
    print("-" * 40)

print("\n✅ Chain-of-Thought testing completed!")

🧪 Testing Chain-of-Thought Stance Classification
📝 Test Case:
Comment: The new Dune movie does not really capture the vision laid out by Frank Herbert. It feels like they tried to import too many visual effects that take away from the philosophy of the work.
Reply: I think the new Dune movie better captures the spirit, if not the content, of Frank Herbert's philosophy.

🔍 Chain-of-Thought Processing...
--------------------------------------------------
🎯 Final Classification: agree

🔄 Comparison: CoT vs Regular Classification

📊 Test Case 1:
Comment: The new policy will improve workplace efficiency.
Reply: I agree completely, it addresses our main concerns.
🧠 CoT Result: agree
⚡ Regular Result: agree
<|assistant|>
agree
🎯 Expected: agree
----------------------------------------

📊
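
To go beyond eyeballing the per-case printout, predictions can be tallied against the `expected` field of each test case. A minimal sketch follows. `accuracy` is a hypothetical helper, and the prediction lists are made-up placeholders chosen to exercise the function, not results from the models in this notebook.

```python
def accuracy(predictions, expected):
    """Fraction of predictions equal to the expected label (case-insensitive)."""
    if len(predictions) != len(expected):
        raise ValueError("predictions and expected must have the same length")
    if not expected:
        return 0.0
    hits = sum(
        p.strip().lower() == e.strip().lower()
        for p, e in zip(predictions, expected)
    )
    return hits / len(expected)


expected_labels = ["agree", "disagree", "neutral"]

# Placeholder predictions for illustration only (not real model output)
placeholder_a = [" agree", "disagree", "agree"]
placeholder_b = ["agree", "Disagree", "neutral"]

print(f"Placeholder A accuracy: {accuracy(placeholder_a, expected_labels):.2f}")
print(f"Placeholder B accuracy: {accuracy(placeholder_b, expected_labels):.2f}")
```

In the loop above, the real inputs would be the stripped `cot_result` and `regular_result` strings collected per case, compared against `case["expected"]`.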

## Understanding Chain-of-Thought (CoT) Prompting

### 🧠 How CoT Works

Chain-of-Thought prompting is a technique that breaks down complex reasoning tasks into explicit intermediate steps. Instead of asking the model to directly classify the stance, we:

1. **Stage 1 (Reasoning)**: Ask the model to explain WHY a reply has a particular stance toward a comment
2. **Stage 2 (Classification)**: Use that reasoning to make the final classification

### 🎯 Benefits of CoT Prompting

- **Improved Accuracy**: Writing out the reasoning first can help the model reach a better decision, especially when the stance is implicit
- **Transparency**: We can read the model's reasoning, not just its label
- **Consistency**: Conditioning the label on an explicit explanation can reduce arbitrary classification errors
- **Debugging**: If a classification is wrong, we can inspect the reasoning step to see where it went astray

### 🔗 Technical Implementation

Our CoT chain uses `RunnablePassthrough` to:
- Pass the original `comment` and `reply` through both stages
- Generate `stance_reason` in stage 1
- Use all three variables (`comment`, `reply`, `stance_reason`) in stage 2
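
The data flow in that list can be mimicked in plain Python to make the shape of the intermediate dictionary concrete. This is a sketch, not LangChain code: `assign` is a hypothetical stand-in for the role `RunnablePassthrough.assign` plays, and the two stage functions are stubs in place of prompt-plus-LLM calls.

```python
def assign(**steps):
    """Return a function that copies its input dict and adds one key per step."""
    def run(inputs):
        outputs = dict(inputs)  # original keys pass through unchanged
        for key, step in steps.items():
            outputs[key] = step(inputs)
        return outputs
    return run


def stub_reasoning(inputs):
    # Stand-in for: reasoning_prompt | llm | StrOutputParser()
    return f"(explanation of how the reply relates to: {inputs['comment']})"


def stub_classification(inputs):
    # Stand-in for: classification_prompt | llm | StrOutputParser()
    # It only checks that all three variables arrived; it does not classify.
    missing = {"comment", "reply", "stance_reason"} - set(inputs)
    return "neutral" if not missing else f"missing: {sorted(missing)}"


stage_one = assign(stance_reason=stub_reasoning)
intermediate = stage_one(
    {"comment": "Remote work is the future.", "reply": "Time will tell."}
)
print(sorted(intermediate))
print(stub_classification(intermediate))
```

The point of the sketch is that stage 2 receives a single dictionary holding the two original inputs plus the stage 1 output, which is what the classification template's three placeholders require.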

### 📊 Expected Improvements

CoT prompting typically shows improvements in:
- **Complex cases** where the stance isn't immediately obvious
- **Ambiguous replies** that could be interpreted multiple ways
- **Consistency** across similar examples

In [6]:
# Demonstrate the reasoning step explicitly
print("🔍 Examining the Chain-of-Thought Reasoning Process")
print("=" * 70)

# Let's manually run each stage to see the reasoning
test_comment = "The new Dune movie does not really capture the vision laid out by Frank Herbert. It feels like they tried to import too many visual effects that take away from the philosophy of the work."
test_reply = "I think the new Dune movie better captures the spirit, if not the content, of Frank Herbert's philosophy."

print("📝 Test Input:")
print(f"Comment: {test_comment}")
print(f"Reply: {test_reply}")
print()

# Stage 1: Generate reasoning
print("🧠 Stage 1: Generating Reasoning...")
print("-" * 50)
reasoning_result = reasoning_prompt.format(comment=test_comment, reply=test_reply)
stance_reason = llm.invoke(reasoning_result)

print("💭 Model's Reasoning:")
print(stance_reason.strip())
print()

# Stage 2: Use reasoning for classification
print("⚖️ Stage 2: Final Classification...")
print("-" * 50)
classification_input = classification_prompt.format(
    comment=test_comment, 
    reply=test_reply, 
    stance_reason=stance_reason.strip()
)
final_classification = llm.invoke(classification_input)

print("🎯 Final Classification:")
print(final_classification.strip())
print()

print("✅ This demonstrates how CoT provides:")
print("  • Transparent reasoning process")
print("  • Explicit justification for decisions")
print("  • Ability to debug classification errors")
print("  • More consistent and reliable results")
print("=" * 70)

🔍 Examining the Chain-of-Thought Reasoning Process
📝 Test Input:
Comment: The new Dune movie does not really capture the vision laid out by Frank Herbert. It feels like they tried to import too many visual effects that take away from the philosophy of the work.
Reply: I think the new Dune movie better captures the spirit, if not the content, of Frank Herbert's philosophy.

🧠 Stage 1: Generating Reasoning...
--------------------------------------------------
💭 Model's Reasoning:
The reply agrees with the comment by stating that while the movie may not perfectly capture the content, it still manages to convey the philosophy of Frank Herbert's work. The phrase "if not the content" suggests that the movie captures the spirit of the original material.

⚖️ Stage 2: Final Classification...
--------------------------------------------------
🎯 Final Classification:
<|assistant|>
agree

✅ This demonstrates how CoT provides:
  • Transparent reasoning process
  • Explicit justification for decisions
  • Ability to debug classification errors
  • More consistent and reliable results

In [7]:
# 🔍 Detailed Analysis of Test Case 2 - "I disagree" Classification
print("🔍 Analyzing Test Case 2 Classification Issue")
print("=" * 70)

# Let's examine Test Case 2 specifically
case2_comment = "Remote work is the future of employment."
case2_reply = "I disagree, in-person collaboration is still essential."

print("📝 Test Case 2 Details:")
print(f"Comment: {case2_comment}")
print(f"Reply: {case2_reply}")
print()

print("🧠 Running Chain-of-Thought Analysis...")
print("-" * 50)

# Stage 1: Get the reasoning
reasoning_input = reasoning_prompt.format(comment=case2_comment, reply=case2_reply)
reasoning_output = llm.invoke(reasoning_input)

print("💭 Stage 1 - Model's Reasoning:")
print(reasoning_output.strip())
print()

# Stage 2: Get final classification
classification_input = classification_prompt.format(
    comment=case2_comment, 
    reply=case2_reply, 
    stance_reason=reasoning_output.strip()
)
final_classification = llm.invoke(classification_input)

print("⚖️ Stage 2 - Final Classification:")
print(f"Result: {final_classification.strip()}")
print()

# Also test the regular chain for comparison
regular_result = stance_chain.invoke({
    "comment": case2_comment,
    "reply": case2_reply
})

print("🔄 Comparison:")
print(f"CoT Result: {final_classification.strip()}")
print(f"Regular Result: {regular_result.strip()}")
print()

print("📊 Analysis:")
print("• The reply explicitly states 'I disagree'")
print("• This is a clear linguistic signal for disagreement")
print("• Both methods should classify this as 'disagree'")
print("• If you saw 'neutral', it might be due to:")
print("  - Model variability (different runs can give different results)")
print("  - Previous test with different examples")
print("  - Looking at a different test case")
print("=" * 70)

🔍 Analyzing Test Case 2 Classification Issue
📝 Test Case 2 Details:
Comment: Remote work is the future of employment.
Reply: I disagree, in-person collaboration is still essential.

🧠 Running Chain-of-Thought Analysis...
--------------------------------------------------
💭 Stage 1 - Model's Reasoning:
<|assistant|>
The reply is classified as "disagree" because the statement directly contradicts the comment, which asserts that remote work is the future of employment. The word "disagree" explicitly indicates a negative stance, suggesting that the speaker believes in-person collaboration remains crucial. The use of the word "essential" further emphasizes this disagreement by highlighting the importance of in-person collaboration, which is seen as a necessary component of employment in this context.

⚖️ Stage 2 - Final Classification:
Result: <|assistant|>
disagree

🔄 Comparison:
CoT Result: <|assistant|>
disagree
Regular Result: disagree

📊 Analysis:
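• The reply explicitly states 'I disagree'
• This is a clear linguistic signal for disagreement
• Both methods should classify this as 'disagree'
• If you saw 'neutral', it might be due to:
  - Model variability (different runs can give different results)
  - Previous test with different examples
  - Looking at a different test case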

# Tree-of-Thought Prompt

Please generate code that uses Tree-of-Thought (ToT) prompting to explore multiple reasoning paths and iteratively evaluate potential stances (agree, disagree, neutral) before making a final classification. Maintain the context of the comment and reply through all stages of reasoning and evaluation.

## Steps:

**Step 1 - Generate Hypotheses:**
- Propose multiple possible stances (agree, disagree, neutral) based on different interpretations of the reply.
- Use a prompt template to generate each hypothesis with explanations.
- Output: hypotheses as a list.

**Step 2 - Evaluate Hypotheses:**
- Assess the validity of each hypothesis by critically analyzing its reasoning.
- Use an evaluation prompt template to rank or score each hypothesis based on coherence and relevance. Assign a score (1–5) for logical consistency and coherence.
- Output: evaluations as scores or rankings for each hypothesis.

**Step 3 - Final Decision:**
- Select the hypothesis with the highest score or best reasoning and output the final stance as "agree", "disagree", or "neutral" based on the reasoning.

In [1]:
# Tree-of-Thought Prompt Implementation
import json
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

print("🌳 Implementing Tree-of-Thought (ToT) Stance Classification")
print("=" * 70)

# Step 1: Generate Hypotheses - Explore multiple reasoning paths
hypothesis_template = """Given the following comment and reply, generate three different hypotheses about the stance of the reply toward the comment. For each hypothesis, provide the stance (agree, disagree, or neutral) and a detailed explanation of the reasoning.

Comment: {comment}
Reply: {reply}

Generate exactly 3 hypotheses in the following format:

Hypothesis 1: [stance]
Reasoning: [detailed explanation]

Hypothesis 2: [stance] 
Reasoning: [detailed explanation]

Hypothesis 3: [stance]
Reasoning: [detailed explanation]

Focus on different possible interpretations of the reply's meaning and intent."""

hypothesis_prompt = PromptTemplate(
    input_variables=["comment", "reply"],
    template=hypothesis_template
)

print("✅ Step 1 - Hypothesis Generation Template Created")

# Step 2: Evaluate Hypotheses - Score each hypothesis for logical consistency
evaluation_template = """Evaluate the following hypotheses about the stance classification. For each hypothesis, assign a score from 1-5 based on logical consistency and coherence with the given comment and reply.

Comment: {comment}
Reply: {reply}

Hypotheses to evaluate:
{hypotheses}

For each hypothesis, provide:
- Score (1-5, where 5 is most logical and coherent)
- Brief justification

Format your response as:
Hypothesis 1 Score: [1-5]
Justification: [brief explanation]

Hypothesis 2 Score: [1-5] 
Justification: [brief explanation]

Hypothesis 3 Score: [1-5]
Justification: [brief explanation]"""

evaluation_prompt = PromptTemplate(
    input_variables=["comment", "reply", "hypotheses"],
    template=evaluation_template
)

print("✅ Step 2 - Hypothesis Evaluation Template Created")

# Step 3: Final Decision - Select best hypothesis and output final stance
decision_template = """Based on the evaluation scores and justifications, select the best hypothesis and provide the final stance classification.

Comment: {comment}
Reply: {reply}

Hypotheses and Evaluations:
{evaluations}

Select the hypothesis with the highest score and provide the final stance as a single word: "agree", "disagree", or "neutral".

Final Decision: """

decision_prompt = PromptTemplate(
    input_variables=["comment", "reply", "evaluations"],
    template=decision_template
)

print("✅ Step 3 - Final Decision Template Created")
print("🔗 Tree-of-Thought Structure: Input → Hypotheses → Evaluation → Decision")
print("=" * 70)

🌳 Implementing Tree-of-Thought (ToT) Stance Classification
✅ Step 1 - Hypothesis Generation Template Created
✅ Step 2 - Hypothesis Evaluation Template Created
✅ Step 3 - Final Decision Template Created
🔗 Tree-of-Thought Structure: Input → Hypotheses → Evaluation → Decision


In [4]:
# Create the Tree-of-Thought Chain
print("🔧 Building Tree-of-Thought Processing Chain...")
print("-" * 50)

# Helper function for evaluation step
def create_evaluation_input(x):
    return evaluation_prompt.format(
        comment=x["comment"],
        reply=x["reply"], 
        hypotheses=x["hypotheses"]
    )

# Helper function for decision step  
def create_decision_input(x):
    return decision_prompt.format(
        comment=x["comment"],
        reply=x["reply"],
        evaluations=x["evaluations"]
    )

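# Tree-of-Thought chain implementation using RunnablePassthrough.
# The chain is invoked with a dict {"comment": ..., "reply": ...}; assign()
# keeps those keys and adds each stage's output under a new key. Mapping both
# keys to a bare RunnablePassthrough() would copy the whole dict into each one.
tot_stance_chain = (
    RunnablePassthrough.assign(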
        # Step 1: Generate hypotheses
        hypotheses=hypothesis_prompt | llm | StrOutputParser()
    )
    | RunnablePassthrough.assign(
        # Step 2: Evaluate hypotheses  
        evaluations=create_evaluation_input | llm | StrOutputParser()
    )
    | create_decision_input
    | llm
    | StrOutputParser()
)

print("✅ Tree-of-Thought Chain Created Successfully!")
print("🎯 Processing Flow:")
print("   1. Generate 3 different hypotheses with reasoning")
print("   2. Evaluate each hypothesis (score 1-5)")
print("   3. Select best hypothesis for final classification")
print("=" * 70)

🔧 Building Tree-of-Thought Processing Chain...
--------------------------------------------------
✅ Tree-of-Thought Chain Created Successfully!
🎯 Processing Flow:
   1. Generate 3 different hypotheses with reasoning
   2. Evaluate each hypothesis (score 1-5)
   3. Select best hypothesis for final classification


In [None]:
# Test Tree-of-Thought Stance Classification
print("🧪 Testing Tree-of-Thought Stance Classification")
print("=" * 70)

# Test with a complex, ambiguous case
test_comment = "The new Dune movie does not really capture the vision laid out by Frank Herbert. It feels like they tried to import too many visual effects that take away from the philosophy of the work."
test_reply = "I think the new Dune movie better captures the spirit, if not the content, of Frank Herbert's philosophy."

print("📝 Test Case (Ambiguous - good for ToT analysis):")
print(f"Comment: {test_comment}")
print(f"Reply: {test_reply}")
print()

print("🌳 Tree-of-Thought Processing...")
print("This will generate multiple hypotheses, evaluate them, and select the best one.")
print("-" * 50)

# Run the ToT chain
tot_result = tot_stance_chain.invoke({
    "comment": test_comment,
    "reply": test_reply
})

print(f"🎯 Final ToT Classification: {tot_result.strip()}")
print("=" * 70)

# Compare all three methods: Regular, CoT, and ToT
print("\n🔄 Method Comparison: Regular vs CoT vs ToT")
print("=" * 70)

comparison_cases = [
    {
        "comment": "Remote work is destroying team collaboration and productivity.",
        "reply": "I partially agree - while productivity can suffer, remote work also offers flexibility benefits.",
        "description": "Complex case with partial agreement"
    },
    {
        "comment": "The new AI regulation will stifle innovation in the tech industry.",
        "reply": "That's a valid concern, though some oversight might be necessary for safety.",
        "description": "Nuanced response with acknowledgment but neutral stance"
    },
    {
        "comment": "Climate change is the most urgent global challenge we face today.",
        "reply": "Absolutely, we need immediate action on carbon emissions and renewable energy.",
        "description": "Clear agreement case"
    }
]

for i, case in enumerate(comparison_cases, 1):
    print(f"\n📊 Comparison Case {i}: {case['description']}")
    print(f"Comment: {case['comment']}")
    print(f"Reply: {case['reply']}")
    
    # Regular classification
    regular_result = stance_chain.invoke({
        "comment": case["comment"],
        "reply": case["reply"]
    })
    
    # CoT classification
    cot_result = cot_stance_chain.invoke({
        "comment": case["comment"],
        "reply": case["reply"]
    })
    
    # ToT classification
    tot_result = tot_stance_chain.invoke({
        "comment": case["comment"],
        "reply": case["reply"]
    })
    
    print(f"⚡ Regular: {regular_result.strip()}")
    print(f"🧠 CoT: {cot_result.strip()}")
    print(f"🌳 ToT: {tot_result.strip()}")
    print("-" * 40)

print("\n✅ Tree-of-Thought testing completed!")
print("🎯 ToT provides the most thorough analysis by:")
print("   • Exploring multiple interpretation paths")
print("   • Scoring hypotheses for logical consistency") 
print("   • Selecting the most coherent reasoning")
print("=" * 70)

In [5]:
# Detailed Tree-of-Thought Process Demonstration
print("🔍 Step-by-Step Tree-of-Thought Process Demonstration")
print("=" * 70)

# Use a challenging example for demonstration
demo_comment = "The new government policy on renewable energy is too aggressive and will hurt the economy."
demo_reply = "While the timeline is ambitious, investing in clean energy could create new economic opportunities."

print("📝 Demonstration Case:")
print(f"Comment: {demo_comment}")
print(f"Reply: {demo_reply}")
print()

# Step 1: Generate Hypotheses
print("🌱 STEP 1: Generate Multiple Hypotheses")
print("-" * 50)
hypotheses_input = hypothesis_prompt.format(comment=demo_comment, reply=demo_reply)
hypotheses_output = llm.invoke(hypotheses_input)

print("💭 Generated Hypotheses:")
print(hypotheses_output.strip())
print()

# Step 2: Evaluate Hypotheses
print("⚖️ STEP 2: Evaluate Each Hypothesis")
print("-" * 50)
evaluation_input = evaluation_prompt.format(
    comment=demo_comment,
    reply=demo_reply,
    hypotheses=hypotheses_output.strip()
)
evaluation_output = llm.invoke(evaluation_input)

print("📊 Hypothesis Evaluations:")
print(evaluation_output.strip())
print()

# Step 3: Final Decision
print("🎯 STEP 3: Final Decision Based on Evaluations")
print("-" * 50)
decision_input = decision_prompt.format(
    comment=demo_comment,
    reply=demo_reply,
    evaluations=evaluation_output.strip()
)
final_decision = llm.invoke(decision_input)

print("🏆 Final Classification:")
print(final_decision.strip())
print()

print("✅ Tree-of-Thought Process Benefits:")
print("  • Multiple perspectives considered simultaneously")
print("  • Explicit evaluation of reasoning quality")
print("  • Selection based on logical consistency")
print("  • Reduced bias from single interpretation path")
print("  • More robust classification for ambiguous cases")
print("=" * 70)

🔍 Step-by-Step Tree-of-Thought Process Demonstration
📝 Demonstration Case:
Comment: The new government policy on renewable energy is too aggressive and will hurt the economy.
Reply: While the timeline is ambitious, investing in clean energy could create new economic opportunities.

🌱 STEP 1: Generate Multiple Hypotheses
--------------------------------------------------
💭 Generated Hypotheses:
<|assistant|>
Hypothesis 1: Neutral
Reasoning: The reply acknowledges the criticism about the aggressive timeline but shifts the focus to potential economic benefits of the policy. This suggests a balanced view rather than a clear stance.

Hypothesis 2: Agree
Reasoning: The reply supports the idea that investing in renewable energy could create new economic opportunities, even if it challenges the timeline. This indicates a positive outlook on the economic potential of the policy.

Hypothesis 3: Disagree 
Reasoning: The reply highlights the negative impact of the policy on the economy, contradict

## Understanding Tree-of-Thought (ToT) Prompting

### 🌳 Tree-of-Thought Methodology

Tree-of-Thought prompting represents an advanced reasoning technique that explores multiple solution paths simultaneously, evaluates them systematically, and selects the most coherent approach. Unlike linear reasoning methods, ToT creates a "tree" of possible interpretations.

### 🔄 Three-Stage Process

1. **Hypothesis Generation**: Create multiple possible interpretations of the same input
   - Generate 3 different stance hypotheses (agree, disagree, neutral)
   - Each hypothesis includes detailed reasoning
   - Explores different aspects and interpretations

2. **Systematic Evaluation**: Score each hypothesis objectively
   - Assign numerical scores (1-5) for logical consistency
   - Provide justifications for each score
   - Compare reasoning quality across hypotheses

3. **Best Path Selection**: Choose the highest-scoring interpretation
   - Select hypothesis with strongest logical foundation
   - Output final classification based on best reasoning
   - Reduces single-path bias and errors
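
In the notebook, the selection in stage 3 is delegated to the model. If the model follows the formats requested by the hypothesis and evaluation templates, the same selection could be done deterministically. The sketch below assumes those formats (`Hypothesis N: <stance>` and `Hypothesis N Score: <1-5>`); the function names and the sample strings are hypothetical and not part of the notebook.

```python
import re
from typing import Dict, Optional

VALID_STANCES = {"agree", "disagree", "neutral"}


def parse_stances(hypotheses_text: str) -> Dict[int, str]:
    """Map hypothesis number -> stance from lines like 'Hypothesis 2: Agree'."""
    stances: Dict[int, str] = {}
    pattern = r'Hypothesis\s+(\d+):\s*\[?"?(\w+)'
    for num, stance in re.findall(pattern, hypotheses_text):
        if stance.lower() in VALID_STANCES:
            stances[int(num)] = stance.lower()
    return stances


def parse_scores(evaluation_text: str) -> Dict[int, int]:
    """Map hypothesis number -> score from lines like 'Hypothesis 2 Score: 4'."""
    pattern = r"Hypothesis\s+(\d+)\s+Score:\s*\[?([1-5])"
    return {int(num): int(score) for num, score in re.findall(pattern, evaluation_text)}


def select_best(hypotheses_text: str, evaluation_text: str) -> Optional[str]:
    """Return the stance of the highest-scoring hypothesis, or None if parsing fails.

    Ties are broken in favor of the lower hypothesis number.
    """
    stances = parse_stances(hypotheses_text)
    scores = parse_scores(evaluation_text)
    candidates = [n for n in stances if n in scores]
    if not candidates:
        return None
    best = max(candidates, key=lambda n: (scores[n], -n))
    return stances[best]


# Made-up sample text in the format the templates ask for
sample_hypotheses = (
    "Hypothesis 1: Neutral\nReasoning: The reply balances both sides.\n\n"
    "Hypothesis 2: Agree\nReasoning: The reply concedes the timeline is ambitious.\n\n"
    "Hypothesis 3: Disagree\nReasoning: The reply argues the policy helps the economy."
)
sample_evaluation = (
    "Hypothesis 1 Score: 3\nJustification: Plausible but vague.\n\n"
    "Hypothesis 2 Score: 2\nJustification: Overlooks the counterpoint.\n\n"
    "Hypothesis 3 Score: 4\nJustification: Matches the main claim of the reply."
)

print(select_best(sample_hypotheses, sample_evaluation))
```

Returning `None` when nothing parses makes format failures visible instead of silently producing a label, which matters with small local models that do not always follow the template.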

### 🎯 Advantages Over Other Methods

| Method | Reasoning Paths | Evaluation | Best For |
|--------|----------------|------------|----------|
| **Regular** | Single, direct | None | Clear, obvious cases |
| **Chain-of-Thought** | Single, detailed | Implicit | Transparent reasoning |
| **Tree-of-Thought** | Multiple, evaluated | Explicit scoring | Complex, ambiguous cases |

### 🔬 When to Use ToT

- **Ambiguous inputs** where multiple interpretations are valid
- **High-stakes decisions** requiring thorough analysis
- **Complex reasoning** tasks with subtle nuances
- **Quality assurance** for critical classifications

### 📊 Expected Outcomes

When the model follows the requested output format, ToT can provide:
- **Higher accuracy** on difficult cases
- **Better handling** of ambiguous language
- **More consistent** results across similar examples
- **Traceable reasoning** with explicit evaluation criteria

# Self-Consistency Prompt

Given the final stance labels from multiple reasoning approaches—Tree-of-Thought (ToT), Chain-of-Thought (CoT), Few-Shot prompting, and Task-only approach—determine the final stance label ("agree", "disagree", or "neutral") by synthesizing the outputs.

## Inputs:
- **Comment and Reply**: The context for determining the stance.
- **Stance Labels from the four approaches**:
    - `tot_output`: Stance label from the Tree-of-Thought (ToT) approach.
    - `cot_output`: Stance label from the Chain-of-Thought (CoT) approach.
    - `few_shot_output`: Stance label from the Few-Shot prompting approach.
    - `task_output`: Stance label from the Task-only approach.

## Steps:
1. Compare the stance labels from the four approaches.
2. Identify patterns of agreement or disagreement:
   - If there is a majority consensus, select that stance label.
   - If there is no clear majority, resolve the inconsistencies by choosing the most consistent or compelling label based on the distribution of outputs.
3. **Output**: A final stance label of "agree", "disagree", or "neutral", based on the most consistent or majority label. Only output the label, with no additional explanation.
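
The majority rule in step 2 can also be computed directly, without a model call. The following is a minimal sketch; `majority_label` is a hypothetical helper, and it assumes the four labels have already been reduced to clean single words.

```python
from collections import Counter
from typing import List, Optional


def majority_label(labels: List[str]) -> Optional[str]:
    """Return the most frequent label, or None if the top two counts are tied."""
    counts = Counter(labels).most_common()
    if not counts:
        return None
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # no clear winner; the caller decides how to resolve it
    return counts[0][0]


print(majority_label(["neutral", "disagree", "neutral", "agree"]))
print(majority_label(["agree", "agree", "disagree", "disagree"]))
```

Reading the labels in Self-Consistency Test Case 1 further down as neutral, disagree, neutral, and agree, this rule selects `neutral`, whereas the model-based synthesis in that run returned `agree`. A deterministic vote is therefore a useful cross-check on the LLM-based synthesis step.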

In [13]:
# Self-Consistency Implementation
from langchain_core.prompts import FewShotPromptTemplate, PromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

print("🎯 Implementing Self-Consistency Stance Classification")
print("=" * 70)

# First, let's set up the Few-Shot chain that we need for self-consistency
print("📚 Setting up Few-Shot Prompting Chain...")

# Define the template for each individual example
example_template = '''comment: {comment}
reply: {reply}
stance: {stance}'''

# Create the example prompt template
example_prompt = PromptTemplate(
    input_variables=["comment", "reply", "stance"],
    template=example_template
)

# Create diverse few-shot examples
examples = [
    {
        'comment': "I think the new policy will help improve efficiency.",
        'reply': "I agree, it will definitely streamline our operations.",
        'stance': 'agree'
    },
    {
        'comment': "The new education reform seems promising.", 
        'reply': "I disagree, it doesn't address the underlying issues.",
        'stance': 'disagree'
    },
    {
        'comment': "The park renovation project is a good idea.",
        'reply': "I'm not sure. The location might be an issue.",
        'stance': 'neutral'
    },
    {
        'comment': "Artificial intelligence will revolutionize healthcare.",
        'reply': "Absolutely, it has the potential to save many lives.",
        'stance': 'agree'
    },
    {
        'comment': "The economy is showing signs of recovery after the pandemic.",
        'reply': "I disagree, the recovery seems slow and uneven across sectors.",
        'stance': 'disagree'
    }
]

# Define the prefix and suffix for few-shot prompting
prefix = '''Stance classification is the task of determining the expressed or implied opinion, or stance, of a reply toward a comment. The following replies express opinions about the associated comment. Each reply can either be "agree", "disagree", or "neutral" toward the comment.'''

suffix = '''Analyze the following reply to the provided comment and determine its stance. Respond with a single word: "agree", "disagree", or "neutral". Only return the stance as a single word, and no other text.
comment: {comment}
reply: {reply}
stance:'''

# Create the FewShotPromptTemplate
few_shot_stance_prompt = FewShotPromptTemplate(
    examples=examples,
    example_prompt=example_prompt,
    prefix=prefix,
    suffix=suffix,
    input_variables=["comment", "reply"],
    example_separator="\n\n"
)

# Create few-shot chain
few_shot_stance_chain = few_shot_stance_prompt | llm | StrOutputParser()

print("✅ Few-Shot Chain Created")

# Now implement the Self-Consistency template
consistency_template = '''Consider the following comment and reply:
Comment: {comment}
Reply: {reply}

You have been provided with stance outputs generated by four different approaches:
1. Tree-of-Thought (ToT) approach: {tot_output}
2. Chain-of-Thought (CoT) approach: {cot_output}
3. Few-Shot approach: {few_shot_output}
4. Task-only approach: {task_output}

Compare these outputs and determine the most likely stance label. If there is a majority consensus among the approaches, select that stance. If there is no clear majority, choose the most consistent or compelling label based on the distribution of outputs.

Output the final stance label as "agree", "disagree", or "neutral" based on the most consistency across the responses. Only output the stance label as a single word and do not generate any other text after the label.

Final stance:'''

# Create the Self-Consistency prompt template
consistency_prompt = PromptTemplate(
    input_variables=["comment", "reply", "tot_output", "cot_output", "few_shot_output", "task_output"],
    template=consistency_template
)

print("✅ Self-Consistency Template Created")
print("🔗 Integration: Task-only → CoT → Few-Shot → ToT → Self-Consistency")
print("=" * 70)

🎯 Implementing Self-Consistency Stance Classification
📚 Setting up Few-Shot Prompting Chain...
✅ Few-Shot Chain Created
✅ Self-Consistency Template Created
🔗 Integration: Task-only → CoT → Few-Shot → ToT → Self-Consistency


In [14]:
# Create the Self-Consistency Chain
print("🔧 Building Self-Consistency Processing Chain...")
print("-" * 50)

# Self-Consistency chain that runs all four approaches and synthesizes results.
# The input dict {"comment": ..., "reply": ...} is passed unchanged to every
# sub-chain, and each result is added under its own key.
self_consistency_chain = (
    RunnablePassthrough.assign(
        # Run all four approaches in parallel
        task_output=stance_chain,
        cot_output=cot_stance_chain,
        few_shot_output=few_shot_stance_chain,
        tot_output=tot_stance_chain
    )
    | consistency_prompt
    | llm
    | StrOutputParser()
)

print("✅ Self-Consistency Chain Created Successfully!")
print("🎯 Processing Flow:")
print("   1. Run Task-only approach (Regular prompt)")
print("   2. Run Chain-of-Thought approach")  
print("   3. Run Few-Shot approach")
print("   4. Run Tree-of-Thought approach")
print("   5. Synthesize all outputs for final decision")
print("=" * 70)

🔧 Building Self-Consistency Processing Chain...
--------------------------------------------------
✅ Self-Consistency Chain Created Successfully!
🎯 Processing Flow:
   1. Run Task-only approach (Regular prompt)
   2. Run Chain-of-Thought approach
   3. Run Few-Shot approach
   4. Run Tree-of-Thought approach
   5. Synthesize all outputs for final decision


In [15]:
# Test Self-Consistency Stance Classification
print("🧪 Testing Self-Consistency Stance Classification")
print("=" * 70)

# Test with challenging cases that might produce different results across methods
test_cases = [
    {
        "comment": "The new AI regulation will stifle innovation in the tech industry.",
        "reply": "That's a valid concern, though some oversight might be necessary for safety.",
        "description": "Nuanced response - acknowledges concern but suggests need for regulation"
    },
    {
        "comment": "Remote work is destroying team collaboration and productivity.",
        "reply": "I partially agree - while productivity can suffer, remote work also offers flexibility benefits.",
        "description": "Complex case with partial agreement and counterpoint"
    },
    {
        "comment": "The new Dune movie does not really capture the vision laid out by Frank Herbert.",
        "reply": "I think the new Dune movie better captures the spirit, if not the content, of Frank Herbert's philosophy.",
        "description": "Ambiguous case - seems to disagree but offers alternative perspective"
    }
]

for i, case in enumerate(test_cases, 1):
    print(f"\n📊 Self-Consistency Test Case {i}: {case['description']}")
    print(f"Comment: {case['comment']}")
    print(f"Reply: {case['reply']}")
    print()
    
    # Run self-consistency analysis
    print("🔄 Running All Four Approaches...")
    
    # Get individual results first for comparison
    task_result = stance_chain.invoke({
        "comment": case["comment"],
        "reply": case["reply"]
    })
    
    cot_result = cot_stance_chain.invoke({
        "comment": case["comment"],
        "reply": case["reply"]
    })
    
    few_shot_result = few_shot_stance_chain.invoke({
        "comment": case["comment"],
        "reply": case["reply"]
    })
    
    tot_result = tot_stance_chain.invoke({
        "comment": case["comment"],
        "reply": case["reply"]
    })
    
    print(f"⚡ Task-only: {task_result.strip()}")
    print(f"🧠 Chain-of-Thought: {cot_result.strip()}")
    print(f"📚 Few-Shot: {few_shot_result.strip()}")
    print(f"🌳 Tree-of-Thought: {tot_result.strip()}")
    
    # Run self-consistency
    consistency_result = self_consistency_chain.invoke({
        "comment": case["comment"],
        "reply": case["reply"]
    })
    
    print(f"🎯 Self-Consistency Final: {consistency_result.strip()}")
    print("-" * 50)

print("\n✅ Self-Consistency testing completed!")
print("🎯 Self-Consistency provides the most robust classification by:")
print("   • Running multiple reasoning approaches")
print("   • Identifying consensus across methods")
print("   • Resolving inconsistencies intelligently")
print("   • Reducing single-method bias")
print("=" * 70)

🧪 Testing Self-Consistency Stance Classification

📊 Self-Consistency Test Case 1: Nuanced response - acknowledges concern but suggests need for regulation
Comment: The new AI regulation will stifle innovation in the tech industry.
Reply: That's a valid concern, though some oversight might be necessary for safety.

🔄 Running All Four Approaches...
⚡ Task-only: neutral
🧠 Chain-of-Thought: <|assistant|>
disagree
📚 Few-Shot: neutral
<|assistant|>
neutral
🌳 Tree-of-Thought: "agree"
🎯 Self-Consistency Final: <|assistant|>
agree
--------------------------------------------------

📊 Self-Consistency Test Case 2: Complex case with partial agreement and counterpoint
Comment: Remote work is destroying team collaboration and productivity.
Reply: I partially agree - while productivity can suffer, remote work also offers flexibility benefits.

🔄 Running All Four Approaches...
⚡ Task-only: neutral
🧠 Chain-of-Thought: <|assistant|>
neutral
📚 Few-Shot: neutral
<|assistant|>
neutral
🌳 Tree-of-Thought: <

In [16]:
# Detailed Self-Consistency Process Demonstration
print("🔍 Step-by-Step Self-Consistency Process Demonstration")
print("=" * 70)

# Use a complex example for demonstration
demo_comment = "Climate change is the most urgent global challenge we face today."
demo_reply = "While it's certainly important, economic recovery should be our immediate priority right now."

print("📝 Demonstration Case:")
print(f"Comment: {demo_comment}")
print(f"Reply: {demo_reply}")
print()

print("🔄 STEP 1-4: Running Individual Approaches")
print("-" * 50)

# Step 1: Task-only (Regular)
print("⚡ Task-only Approach:")
task_result = stance_chain.invoke({
    "comment": demo_comment,
    "reply": demo_reply
})
print(f"Result: {task_result.strip()}")
print()

# Step 2: Chain-of-Thought
print("🧠 Chain-of-Thought Approach:")
cot_result = cot_stance_chain.invoke({
    "comment": demo_comment,
    "reply": demo_reply
})
print(f"Result: {cot_result.strip()}")
print()

# Step 3: Few-Shot
print("📚 Few-Shot Approach:")
few_shot_result = few_shot_stance_chain.invoke({
    "comment": demo_comment,
    "reply": demo_reply
})
print(f"Result: {few_shot_result.strip()}")
print()

# Step 4: Tree-of-Thought
print("🌳 Tree-of-Thought Approach:")
tot_result = tot_stance_chain.invoke({
    "comment": demo_comment,
    "reply": demo_reply
})
print(f"Result: {tot_result.strip()}")
print()

print("🎯 STEP 5: Self-Consistency Synthesis")
print("-" * 50)

# Create the consistency input manually to show the process
consistency_input = consistency_prompt.format(
    comment=demo_comment,
    reply=demo_reply,
    task_output=task_result.strip(),
    cot_output=cot_result.strip(),
    few_shot_output=few_shot_result.strip(),
    tot_output=tot_result.strip()
)

print("📋 Input to Self-Consistency Model:")
print(consistency_input)
print()

# Get final result
final_result = llm.invoke(consistency_input)
print("🏆 Final Self-Consistency Decision:")
print(f"Result: {final_result.strip()}")
print()

print("📊 Analysis Summary:")
results = [task_result.strip(), cot_result.strip(), few_shot_result.strip(), tot_result.strip()]
unique_results = list(set(results))
print(f"Individual Results: {results}")
print(f"Unique Stances: {unique_results}")
from collections import Counter  # local import keeps this cell self-contained
top_label, top_count = Counter(results).most_common(1)[0]
print(f"Most Common: {top_label if top_count > 1 else 'No majority'}")
print(f"Self-Consistency Choice: {final_result.strip()}")
print()

print("✅ Self-Consistency Process Benefits:")
print("  • Ensemble approach reduces individual method errors")
print("  • Intelligent synthesis handles disagreements")
print("  • More robust than any single reasoning method")
print("  • Provides confidence through consensus")
print("=" * 70)

🔍 Step-by-Step Self-Consistency Process Demonstration
📝 Demonstration Case:
Comment: Climate change is the most urgent global challenge we face today.
Reply: While it's certainly important, economic recovery should be our immediate priority right now.

🔄 STEP 1-4: Running Individual Approaches
--------------------------------------------------
⚡ Task-only Approach:
Result: disagree

🧠 Chain-of-Thought Approach:
Result: <|assistant|>
disagree

📚 Few-Shot Approach:
Result: disagree

<|assistant|>
disagree

🌳 Tree-of-Thought Approach:
Result: Based on the evaluation scores and justifications, the best hypothesis is Hypothesis 2 with a score of 4. This hypothesis best captures the neutral stance of the reply, where neither the comment nor the reply strongly supports or opposes each other, but instead maintains a balanced view.

Final Stance: "neutral"

🎯 STEP 5: Self-Consistency Synthesis
--------------------------------------------------
📋 Input to Self-Consistency Model:
Consider the fol

## Understanding Self-Consistency Prompting

### 🎯 Self-Consistency Methodology

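Self-Consistency prompting aims for more robust classifications by combining several independent reasoning approaches and synthesizing their outputs. Rather than relying on a single reasoning path, it applies the "wisdom of crowds" principle to different prompting strategies: agreement among independent methods is treated as evidence that an answer is reliable, and disagreement is treated as a signal that the input is ambiguous.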

### 🔄 Four-Approach Ensemble

Our Self-Consistency implementation combines:

1. **Task-only (Regular)**: Direct, straightforward classification
2. **Chain-of-Thought (CoT)**: Linear reasoning with explicit steps
3. **Few-Shot**: Learning from examples and patterns
4. **Tree-of-Thought (ToT)**: Multiple hypotheses with evaluation

### 📊 Synthesis Process

| Step | Process | Purpose |
|------|---------|---------|
| **Parallel Execution** | Run all four approaches independently on the same comment and reply | Generate diverse perspectives |
| **Consensus Detection** | Identify majority agreement | Find confident classifications |
| **Conflict Resolution** | Analyze disagreements intelligently | Handle ambiguous cases |
| **Final Decision** | Synthesize into single output | Provide most robust answer |
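
The first step of the table can be sketched in plain Python. This is a minimal sketch, not the notebook's implementation: it assumes each approach is a callable that takes the input dictionary and returns a string, and `run_approaches` and the stub lambdas are hypothetical names introduced only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def run_approaches(approaches: Dict[str, Callable[[dict], str]], inputs: dict) -> Dict[str, str]:
    """Call every approach with the same inputs and return the stripped outputs by name."""
    with ThreadPoolExecutor(max_workers=len(approaches)) as pool:
        # Submit all calls first so they can overlap, then wait for each result.
        futures = {name: pool.submit(fn, inputs) for name, fn in approaches.items()}
        return {name: future.result().strip() for name, future in futures.items()}


# Hypothetical stand-ins; in the notebook these would be bound methods such as
# cot_stance_chain.invoke or few_shot_stance_chain.invoke.
stub_approaches = {
    "cot": lambda inputs: " disagree\n",
    "few_shot": lambda inputs: "disagree",
    "tot": lambda inputs: 'Final Stance: "neutral"',
}

outputs = run_approaches(stub_approaches, {"comment": "...", "reply": "..."})
print(outputs)
```

With a single local model the calls will likely still run one after another, so invoking the chains sequentially, as the demonstration cell does, is an equally valid way to carry out this step. LangChain's `RunnableParallel` may be a more idiomatic choice when the approaches are already chains.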

### 🎯 Consensus Rules

- **Majority Consensus**: If 3+ methods agree → Select majority stance
- **Split Decision**: If 2-2 split → Analyze reasoning quality and context
- **No Consensus**: If all different → Weight by method reliability and reasoning strength
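
The majority rule above can also be checked in code before handing the outputs to the synthesis prompt. The sketch below is an illustration under assumptions: `extract_stance` and `consensus` are hypothetical helpers, and treating the last label mentioned in an output as that method's answer is a heuristic, not something the notebook defines. The sample strings mirror the shapes of the raw outputs in the demonstration (a bare label, a label preceded by a `<|assistant|>` token, and a `Final Stance:` line).

```python
import re
from collections import Counter
from typing import List, Optional


def extract_stance(raw_output: str) -> Optional[str]:
    """Return the last stance label mentioned in a raw model output, or None."""
    text = raw_output.replace("<|assistant|>", " ").lower()
    # \b keeps "agree" from matching inside "disagree".
    matches = re.findall(r"\b(agree|disagree|neutral)\b", text)
    return matches[-1] if matches else None


def consensus(stances: List[Optional[str]]) -> Optional[str]:
    """Return the stance chosen by 3 or more methods, or None when there is no such majority."""
    votes = Counter(s for s in stances if s is not None)
    if not votes:
        return None
    label, count = votes.most_common(1)[0]
    return label if count >= 3 else None


raw_outputs = [
    "disagree",
    "<|assistant|>\ndisagree",
    "disagree\n\n<|assistant|>\ndisagree",
    'Final Stance: "neutral"',
]
stances = [extract_stance(o) for o in raw_outputs]
print(stances)             # ['disagree', 'disagree', 'disagree', 'neutral']
print(consensus(stances))  # disagree
```

A `None` result corresponds to the split-decision and no-consensus cases, which are left to the synthesis step because they depend on judging reasoning quality rather than counting votes.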

### 🔬 When Self-Consistency Excels

- **High-stakes decisions** requiring maximum confidence
- **Ambiguous inputs** where single methods might fail
- **Quality assurance** for critical classification tasks
- **Research applications** where accuracy is paramount

### 📈 Expected Benefits

Self-Consistency typically provides:
- **Higher accuracy** through majority voting across methods
- **Reduced bias** from any single reasoning approach
- **Increased confidence** through consensus measurement
- **Error detection** when methods disagree significantly
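
One simple way to make "consensus measurement" concrete is to report the share of methods that back the most common stance. The following is a minimal sketch under assumptions: the `agreement` helper and the 0.75 review threshold are hypothetical choices for illustration, not values taken from the notebook.

```python
from collections import Counter
from typing import List, Tuple


def agreement(stances: List[str]) -> Tuple[str, float]:
    """Return the most common stance and the fraction of methods that chose it."""
    label, count = Counter(stances).most_common(1)[0]
    return label, count / len(stances)


# Stances as they might look after cleaning the four methods' outputs.
label, score = agreement(["disagree", "disagree", "disagree", "neutral"])

# Hypothetical threshold: flag the case for manual review when fewer than 3 of 4 agree.
needs_review = score < 0.75
print(label, score, needs_review)
```

A low score does not say which method is wrong; it only marks the input as one where the methods diverge and a closer look may be worthwhile.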

### 🏆 Complete Reasoning Hierarchy

```
Simple → Complex → Robust
Task-only → CoT → Few-Shot → ToT → Self-Consistency
```

The approaches are ordered by increasing complexity, with Self-Consistency combining the outputs of the other four into the most comprehensive approach for stance classification.