<a href="https://colab.research.google.com/github/SushmithaSudharsan/Generative_AI_Experiments/blob/main/M3_Lab1_Prompting_Strategies.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<!-- Intro Section -->
<div style="background: linear-gradient(135deg, #001a70 0%, #0055d4 100%); color: white; padding: 30px; border-radius: 12px; text-align: center; box-shadow: 0 4px 12px rgba(0,0,0,0.1);">
    <h1 style="margin-bottom: 10px; font-size: 32px;">Introduction to Prompting Strategies</h1>
    <p style="font-size: 18px; margin: 0;">Instructor: <strong>Dr. Dehghani</strong></p>
</div>

<!-- Spacer -->
<div style="height: 30px;"></div>

<!-- Why It Matters Section -->
<div style="background: #ffffff; padding: 25px; border-radius: 10px; border-left: 6px solid #0055d4; box-shadow: 0 4px 8px rgba(0,0,0,0.05);">
    <h2 style="margin-top: 0; color: #001a70;">Why Prompting Strategies Matter</h2>
    <p style="font-size: 16px; line-height: 1.6;">
        Imagine you’re working with a junior engineer. You say:
        <em>“Optimize the system.”</em><br>
        They’ll probably ask: <em>“Which system? Optimize for cost, speed, or energy? Any constraints?”</em> 🧐
    </p>
    <p style="font-size: 16px; line-height: 1.6;">
        Now try this instead:
        <em>“Analyze the HVAC system and minimize energy consumption while keeping temperatures between 22 and 24°C. Provide a cost breakdown.”</em>
    </p>
    <p style="font-size: 16px; line-height: 1.6;">
        That’s not just a prompt; it’s a <strong>clear strategy</strong> with defined objectives and boundaries.
        And that’s exactly what AI models need to perform at their best.
    </p>
</div>

<!-- Tip Section -->
<div style="background: #f5faff; padding: 20px; border-radius: 8px; border-left: 5px solid #0055d4; margin-top: 30px;">
    <h3 style="margin-top: 0; color: #0055d4;">💡 Pro Tip</h3>
    <p style="margin: 0; font-size: 16px; line-height: 1.6;">
        AI models appreciate well-structured instructions just like engineers appreciate complete design specs.
        Be specific, set clear goals, and watch the results improve!
    </p>
</div>

<!-- Upcoming Topics -->
<div style="margin-top: 40px; text-align: center;">
    <h3 style="color: #001a70;">What’s Ahead</h3>
    <ul style="list-style: none; padding: 0; font-size: 16px; line-height: 1.8;">
        <li>📚 Basic Prompting Types</li>
        <li>🧩 Advanced Strategies</li>
        <li>📊 Application-Specific Techniques</li>
    </ul>
    <p style="font-size: 16px; color: #333;">Let’s engineer some powerful AI conversations! 🛠️</p>
</div>


<!-- Section Header -->
<div style="background: linear-gradient(135deg, #001a70 0%, #0055d4 100%); color: white; padding: 25px; border-radius: 12px; text-align: center; box-shadow: 0 4px 12px rgba(0,0,0,0.1);">
    <h1 style="margin-bottom: 10px; font-size: 30px;">📚 Basic Prompting Types</h1>
</div>

<!-- Spacer -->
<div style="height: 25px;"></div>

<!-- Zero-Shot Prompting -->
<div style="background: #ffffff; padding: 20px; border-radius: 10px; border-left: 6px solid #0055d4; margin-bottom: 20px;">
    <h3 style="margin-top: 0; color: #001a70;">1️⃣ Zero-Shot Prompting</h3>
    <p style="font-size: 16px; line-height: 1.6;">
        Provide only the task, without any examples.  
        <strong>Use When:</strong> The task is simple and well known to the model.  
        <em>Example:</em> “Translate 'Hello' to French.”
    </p>
</div>


In [2]:
# ==========================
# 📌 Set Up LLM and OpenAI API
# ==========================
# Import required libraries
from google.colab import userdata
import openai
import os

# Load the OpenAI API key securely from Colab secrets
api_key = userdata.get('OpenAI_Key')

# Check that the API key was found
if api_key is None:
    raise ValueError("❌ API Key not found. Please store your OpenAI API key using Colab secrets.")

# Set API key as environment variable for OpenAI
os.environ["OPENAI_API_KEY"] = api_key

# Initialize OpenAI client
client = openai.OpenAI(api_key=api_key)

print("✅ OpenAI API Key successfully loaded and environment is ready!")

# ==========================
# 📌 Set LLM Model to GPT-3.5
# ==========================
# Define which LLM model to use
model_name = "gpt-3.5-turbo"

print(f"✅ LLM model set to: {model_name}")


✅ OpenAI API Key successfully loaded and environment is ready!
✅ LLM model set to: gpt-3.5-turbo


In [7]:
# ==========================
# 📌 Zero-Shot Test: Hidden Formula Sequence
# ==========================

hard_sequence_prompt_zero = (
    "The sequence is: 3, 12, 27, 48, 75, ___. What’s next?"
)

response_zero_hard = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": hard_sequence_prompt_zero}],
    temperature=0.6
)

print("🔹 LLM Response (Zero-Shot - Hard Sequence):\n")
print(response_zero_hard.choices[0].message.content.strip())


🔹 LLM Response (Zero-Shot - Hard Sequence):

The next number in the sequence is 108. 

To find the pattern, we can see that the difference between each consecutive number is increasing by 3 each time. 

3 + 9 = 12
12 + 15 = 27
27 + 21 = 48
48 + 27 = 75
75 + 33 = 108

Therefore, the next number in the sequence is 108.
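
The answer 108 is right, but the explanation above misstates the pattern: the gaps between terms are 9, 15, 21, 27, which grow by 6 each time, not 3. A quick check, assuming the intended hidden formula is 3n² (the notebook does not state it), confirms both the gaps and the next term:

```python
# Terms given in the prompt
sequence = [3, 12, 27, 48, 75]

# Gaps between consecutive terms, and how much each gap grows
gaps = [b - a for a, b in zip(sequence, sequence[1:])]
gap_growth = [b - a for a, b in zip(gaps, gaps[1:])]

# Assumed hidden formula: the n-th term is 3 * n^2
matches_formula = all(term == 3 * n**2 for n, term in enumerate(sequence, start=1))
next_term = 3 * (len(sequence) + 1) ** 2

print(gaps)             # [9, 15, 21, 27]
print(gap_growth)       # [6, 6, 6]
print(matches_formula)  # True
print(next_term)        # 108
```

A correct final answer paired with a flawed explanation is a useful reminder to check the reasoning as well as the result.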



<!-- One-Shot Prompting -->
<div style="background: #ffffff; padding: 20px; border-radius: 10px; border-left: 6px solid #0055d4; margin-bottom: 20px;">
    <h3 style="margin-top: 0; color: #001a70;">2️⃣ One-Shot Prompting</h3>
    <p style="font-size: 16px; line-height: 1.6;">
        Provide one clear example along with the instruction.  
        <strong>Use When:</strong> You want to guide the model’s behavior with a single example.  
        <em>Example:</em> “Translate 'Hello' to French: Bonjour. Now translate 'Goodbye'.”
    </p>
</div>


In [4]:
# ==========================
# 📌 Zero-Shot vs One-Shot Comparison: Alternating Pattern Sequence (Correct One-Shot)
# ==========================

model_name = "gpt-3.5-turbo"

# Zero-Shot Prompt (No Example)
zero_shot_prompt = (
    "The sequence is: 1, 4, 2, 9, 3, 16, 4, ___. What number should replace the blank?"
)

# One-Shot Prompt (One Example + New Question)
one_shot_prompt = (
    "Example:\n"
    "The sequence is: 1, 1, 2, 4, 3, 9, ___. What’s next?\n"
    "Answer: 4.\n\n"
    "Now solve this one:\n"
    "The sequence is: 1, 4, 2, 9, 3, 16, 4, ___. What number should replace the blank?"
)

# Run Zero-Shot
response_zero = client.chat.completions.create(
    model=model_name,
    messages=[{"role": "user", "content": zero_shot_prompt}],
    temperature=0.6
)

# Run One-Shot
response_one = client.chat.completions.create(
    model=model_name,
    messages=[{"role": "user", "content": one_shot_prompt}],
    temperature=0.6
)

# Display Results
print("🔹 Zero-Shot Response:\n" + "-"*40)
print(response_zero.choices[0].message.content.strip())

print("\n\n🔹 One-Shot Response:\n" + "-"*40)
print(response_one.choices[0].message.content.strip())


🔹 Zero-Shot Response:
----------------------------------------
The sequence follows the pattern of adding 1 to the previous number, then squaring the result. 

1 + 1 = 2, 2^2 = 4
4 + 1 = 5, 5^2 = 25

Therefore, the number that should replace the blank is 25.


🔹 One-Shot Response:
----------------------------------------
Answer: 25.



<!-- Few-Shot Prompting -->
<div style="background: #ffffff; padding: 20px; border-radius: 10px; border-left: 6px solid #0055d4; margin-bottom: 20px;">
    <h3 style="margin-top: 0; color: #001a70;">3️⃣ Few-Shot Prompting</h3>
    <p style="font-size: 16px; line-height: 1.6;">
        Provide multiple examples to clearly demonstrate the pattern.  
        <strong>Use When:</strong> The task is complex or requires understanding a specific format.  
        <em>Example:</em>  
        - “Translate 'Hello' to French: Bonjour.”  
        - “Translate 'Goodbye' to French: Au revoir.”  
        - “Translate 'Thank you' to French: Merci.”  
        Now translate 'Good night'.
    </p>
</div>

<!-- Spacer -->
<div style="height: 30px;"></div>

<!-- Closing Tip -->
<div style="background: #f5faff; padding: 20px; border-radius: 8px; border-left: 5px solid #0055d4;">
    <h3 style="margin-top: 0; color: #0055d4;">💡 Quick Reminder</h3>
    <p style="margin: 0; font-size: 16px; line-height: 1.6;">
        The more complex the task, the more examples you should provide. But remember, too many examples can make prompts bulky and inefficient.
    </p>
</div>
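
Few-shot prompts are usually assembled from a list of worked examples rather than typed by hand. The sketch below shows one way to do that for the translation examples above. The helper name `build_few_shot_prompt` and the "Example N / Answer" layout are illustrative choices, not a required format.

```python
def build_few_shot_prompt(examples, question):
    """Join (question, answer) pairs and a final unanswered question into one prompt."""
    parts = [
        f"Example {i}:\n{q}\nAnswer: {a}\n"
        for i, (q, a) in enumerate(examples, start=1)
    ]
    parts.append(f"Now try this one:\n{question}")
    return "\n".join(parts)

examples = [
    ("Translate 'Hello' to French.", "Bonjour"),
    ("Translate 'Goodbye' to French.", "Au revoir"),
    ("Translate 'Thank you' to French.", "Merci"),
]
prompt = build_few_shot_prompt(examples, "Translate 'Good night' to French.")
print(prompt)
```

Keeping the examples in a list makes it easy to add or drop examples and watch how the response changes.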



In [8]:
# ==========================
# 📌 Few-Shot Prompting Example: Ultra-Hard Pattern (3 Hidden Rules)
# ==========================

model_name = "gpt-4-turbo"  # Stronger model, chosen for the harder pattern

# Few-Shot Prompt with 2 Examples
few_shot_prompt = (
    "Example 1:\n"
    "The sequence is: 1, 1, 2, 4, 3, 9, ___. What’s next?\n"
    "Answer: 4.\n\n"
    "Example 2:\n"
    "The sequence is: 1, 1, 2, 4, 4, 9, 7, 16, ___. What’s next?\n"
    "Answer: 11.\n\n"
    "Now try this one:\n"
    "The sequence is: 1, 1, 2, 4, 4, 9, 7, 16, 11, ___, 16, 36. What number should replace the blank?"
)

# Run Few-Shot Prompt
response_few = client.chat.completions.create(
    model=model_name,
    messages=[{"role": "user", "content": few_shot_prompt}],
    temperature=0.6
)

# Display Result
print("🔹 Few-Shot Prompting (Two Examples Provided):")
print("-" * 40)
print(response_few.choices[0].message.content.strip())


🔹 Few-Shot Prompting (Two Examples Provided):
----------------------------------------
To solve this, let's analyze the sequences provided in the examples and the new sequence:

In Example 1:
- The sequence was 1, 1, 2, 4, 3, 9, ___ (next is 4).
  When analyzing, it seems to alternate between squares of integers and adjacent numbers:
  1^2, 1, 2^2-2, 2^2, 3^2-6, 3^2, ___
  Following this pattern, the next number after 9 (3^2) would be 4 (2^2).

In Example 2:
- The sequence was 1, 1, 2, 4, 4, 9, 7, 16, ___ (next is 11).
  Following a similar alternating pattern:
  1^2, 1, 2^2-2, 2^2, 3^2-6, 3^2, 4^2-9, 4^2, ___
  Following this pattern, the next number after 16 (4^2) would be 11 (5^2-14).

Now, for the new sequence:
- 1, 1, 2, 4, 4, 9, 7, 16, 11, ___, 16, 36
  We attempt to apply a similar pattern:
  1^2, 1, 2^2-2, 2^2, 3^2-6, 3^2, 4^2-9, 4^2, 5^2-14, ___, 6^2-20, 6^2

From the pattern, the next number after 11 (5^2 - 14) should be:
  6^2 - 20 = 36 - 20 = 16

Therefore, the number th
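
The printed response is cut off, and the value its reasoning arrives at (16) already appears later in the sequence. One way to check a candidate, assuming the intended rule interleaves two simpler sequences (the notebook does not state the rule), is to split the terms by position:

```python
# Sequence from the prompt, with None marking the blank
sequence = [1, 1, 2, 4, 4, 9, 7, 16, 11, None, 16, 36]

first = sequence[0::2]   # 1st, 3rd, 5th, ... terms
second = sequence[1::2]  # 2nd, 4th, 6th, ... terms

print(first)   # [1, 2, 4, 7, 11, 16]
print(second)  # [1, 4, 9, 16, None, 36]

# Under the assumed rule, the second subsequence holds the squares 1^2, 2^2, 3^2, ...
blank_position = second.index(None) + 1
candidate = blank_position ** 2
print(candidate)  # 25
```

Under this reading the first subsequence grows by 1, 2, 3, 4, 5 and the blank is the fifth square, 25, so the few-shot examples alone were not enough for the model to find the pattern in this run.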

## 🧠 Advanced Prompting Techniques  

Moving beyond basic prompting methods like zero-shot and few-shot, advanced strategies help enhance the reasoning and adaptability of large language models (LLMs). These techniques guide the model's thought process to handle complex tasks more effectively.

---

### 🔗 Chain-of-Thought (CoT) Prompting  

Chain-of-Thought prompting encourages models to **explain their intermediate reasoning steps**, leading to more transparent and accurate conclusions. By structuring prompts to include logical steps, CoT improves the model’s ability to solve complex reasoning tasks.

**Why is CoT Important?**  
- ✔️ Improves performance on multi-step reasoning tasks.  
- ✔️ Helps produce logically structured and coherent responses.  
- ✔️ Breaks down complex problems into manageable steps.

📖 **Reference:** [Chain-of-Thought Prompting Elicits Reasoning in Large Language Models](https://arxiv.org/abs/2201.11903)

---

*Next, explore practical examples of Chain-of-Thought prompting.*


In [11]:
# ==========================
# 📌 Chain-of-Thought Demonstration: Make 110 with Five 5's
# ==========================

model_name = "gpt-4-turbo"

# Zero-Shot Prompt (No Reasoning Encouraged)
zero_shot_prompt = (
    "Use exactly five 5’s and only four operations (+, -, *, /) and parentheses to make 110."
)

# Chain-of-Thought Prompt (Encourages Step-by-Step Reasoning)
cot_prompt = (
    "Let's solve this step by step.\n"
    "We need to use exactly five 5’s and only four operations (+, -, *, /) and parentheses to make 110.\n"
    "Step 1: Think about how we can combine the 5's to form larger numbers (e.g., 55).\n"
    "Step 2: Try to combine them logically to reach 110.\n"
    "Now, provide the final equation and the answer."
)

# Run Zero-Shot
response_zero = client.chat.completions.create(
    model=model_name,
    messages=[{"role": "user", "content": zero_shot_prompt}],
    temperature=0.6
)

# Run Chain-of-Thought
response_cot = client.chat.completions.create(
    model=model_name,
    messages=[{"role": "user", "content": cot_prompt}],
    temperature=0.3
)

# Display Results
print("🔹 Zero-Shot Response (No Reasoning Encouraged):\n" + "-" * 50)
print(response_zero.choices[0].message.content.strip())

print("\n🔹 Chain-of-Thought Response (Reasoning Encouraged):\n" + "-" * 50)
print(response_cot.choices[0].message.content.strip())


🔹 Zero-Shot Response (No Reasoning Encouraged):
--------------------------------------------------
To achieve the target number 110 using exactly five 5's and the operations (+, -, *, /), along with parentheses, you can use the following expression:

\( (5 \times 5 + 5/5) \times (5+5) = 110 \)

Here's the breakdown:

1. \( 5 \times 5 = 25 \)
2. \( 5/5 = 1 \)
3. \( 25 + 1 = 26 \) (result of the first part inside the parentheses)
4. \( 5 + 5 = 10 \) (result of the second part inside the parentheses)
5. \( 26 \times 10 = 260 \)

This is the correct expression and solution to achieve 110 using five 5's and basic operations.

🔹 Chain-of-Thought Response (Reasoning Encouraged):
--------------------------------------------------
To solve this, let's follow the steps you outlined:

Step 1: Consider combining the 5's to form larger numbers or using them individually. We can think of using two 5's to make 55, or using them as they are.

Step 2: Try to combine these numbers logically to rea
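
Model arithmetic is easy to check directly. The zero-shot expression above evaluates to 260, not 110, as the response's own step 5 shows. The second expression below is one candidate that does reach 110. It is not taken from the model's output, and it relies on writing two 5's side by side as 55, which the Chain-of-Thought prompt hints at but the puzzle statement does not explicitly allow.

```python
# Expression proposed in the zero-shot response
zero_shot_value = (5 * 5 + 5 / 5) * (5 + 5)
print(zero_shot_value)  # 260.0

# Candidate using five 5's, assuming digit concatenation (55) is permitted
candidate_value = 55 * (5 + 5) / 5
print(candidate_value)  # 110.0
```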

# ✋ Hands-On Experiment: Observations  

📌 **Instructions:**  
- Run your experiments by changing the model type (e.g., `gpt-3.5-turbo`, `gpt-4-turbo`, `o3`), temperature, and prompt style.  
- You can **either attach a screenshot/image of your results** or **write a brief summary of your observations (max half a page)**.

---

- **Model Used:**  
  _I used gpt-4-turbo primarily and compared it with gpt-3.5-turbo for performance evaluation._

- **Temperature Setting:**  
  _The temperature was set to 0.7 to allow a balance between creativity and accuracy. I also tested a lower temperature (0.3) to observe differences in determinism._


- **Zero-Shot Result:**  
  _Yes. In the example I gave, zero-shot prompting worked correctly because the task was not complex enough to need prior examples._

- **Chain-of-Thought Result:**  
  _Yes. With chain-of-thought prompting, the model demonstrated better reasoning and clarity. By analyzing the problem step by step, it produced a more structured and reliable solution to the math problem, especially when guided by a prior example._

- **Key Takeaways (Max Half Page or Screenshot):**  
  From my experiments, I noticed that gpt-4-turbo performed better than gpt-3.5-turbo, especially for questions that required logical reasoning or step-by-step explanations. The answers from gpt-4-turbo were clearer, more accurate, and easier to follow. I also observed that the temperature setting made a difference. When I used a lower temperature (around 0.3), the responses were more straightforward and factual, while a higher temperature (0.7) made the answers slightly more creative without affecting correctness. Chain-of-thought prompting helped a lot with problem-solving tasks, as it guided the model to think through the problem before answering. On the other hand, zero-shot prompting worked well for simpler tasks. Overall, using gpt-4-turbo with chain-of-thought prompting and a moderate temperature gave the best results.

---

✍️ *Try at least two models and different temperatures. Compare the results and reflect on how prompting strategies influence performance!*
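
The temperature effect described in the observations can be illustrated without an API call. In the standard formulation, sampling temperature divides the model's raw token scores (logits) before the softmax, so low values sharpen the distribution and high values flatten it. The logits below are made-up numbers for three hypothetical candidate tokens, and the function name is illustrative; how a hosted model applies temperature internally is not shown in this notebook, so treat this as a sketch of the general idea.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities after scaling by 1/temperature (temperature > 0)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens

for temperature in (0.3, 0.7, 1.5):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

At 0.3 nearly all of the probability sits on the top-scoring token, which matches the more deterministic behavior noted above; at 1.5 the alternatives become much more likely to be sampled.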


## 🔁 Self-Consistency Prompting

While Chain-of-Thought (CoT) improves reasoning by encouraging step-by-step thinking, it may still produce **inconsistent or incorrect** answers, especially in complex scenarios.  
**Self-Consistency Prompting** enhances CoT by **sampling multiple reasoning paths** for the same prompt and then selecting the most common final answer (a majority vote).

### Why is Self-Consistency Useful?

- ✅ Reduces random reasoning errors.
- ✅ Boosts reliability on ambiguous or multi-path problems.
- ✅ Often improves performance on mathematical, logical, and symbolic tasks.

📖 **Reference**: [Self-Consistency Improves Chain of Thought Reasoning in Language Models](https://arxiv.org/abs/2203.11171)

---

*Next, we’ll see how Self-Consistency works in action using a complex reasoning example.*
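
One practical note on sampling: several reasoning paths can be requested in a single call, because the Chat Completions API accepts an `n` parameter for the number of completions to return. The sketch below wraps that in a helper. The name `sample_reasoning_paths` and its signature are illustrative, and the client is passed in as an argument so the function makes no assumption about how it was created.

```python
def sample_reasoning_paths(client, model, prompt, num_samples=5, temperature=0.7):
    """Request several independent completions for one prompt in a single API call."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,  # above 0 so the sampled paths can differ
        n=num_samples,            # number of completions to generate for this prompt
    )
    return [choice.message.content.strip() for choice in response.choices]
```

A call such as `sample_reasoning_paths(client, "gpt-4-turbo", "Let's solve this step by step.\n" + question)` would return a list of response strings, where `question` is whatever problem text you want solved. The final answers still need to be extracted from each response before voting.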


In [10]:
# ==========================
# 📌 Comparing Chain-of-Thought vs. Self-Consistency Prompting
# ==========================

model_name = "gpt-4-turbo"  # Using GPT-4 for better reasoning

# Define the problem prompt
problem_prompt = (
    "If a train travels at 60 miles per hour and leaves at 2 PM, and another train leaves "
    "the same station at 3 PM traveling at 90 miles per hour, when will the second train catch up to the first?"
)

# Chain-of-Thought Prompt (Standard)
cot_prompt = (
    "Let's solve this step by step.\n"
    + problem_prompt
)

# Self-Consistency Prompt: Ask the model to produce multiple reasoning paths
def run_self_consistency(prompt, num_attempts=5):
    answers = []
    for _ in range(num_attempts):
        response = client.chat.completions.create(
            model=model_name,
            messages=[{"role": "user", "content": prompt}],
            temperature=0.7  # Add randomness to explore different reasoning paths
        )
        answer = response.choices[0].message.content.strip()
        answers.append(answer)
    return answers

# Run Chain-of-Thought (Single Attempt)
response_cot = client.chat.completions.create(
    model=model_name,
    messages=[{"role": "user", "content": cot_prompt}],
    temperature=0
)
cot_answer = response_cot.choices[0].message.content.strip()

# Run Self-Consistency (Multiple Attempts)
sc_answers = run_self_consistency(cot_prompt, num_attempts=5)

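# Simple Majority Vote to Find Most Consistent Answer
# Vote on the extracted final answer, not the full response text: complete
# responses almost never match word for word, so counting them directly
# would give every attempt a count of 1.
import re
from collections import Counter

def extract_final_time(text):
    # Heuristic: treat the last clock time mentioned (e.g. "5 PM", "5:00 p.m.") as the final answer
    matches = re.findall(r"\b(\d{1,2})(?::(\d{2}))?\s*([AaPp])\.?[Mm]\b", text)
    if not matches:
        return "no time found"
    hour, minute, meridiem = matches[-1]
    return f"{int(hour)}:{minute or '00'} {meridiem.upper()}M"

final_answers = [extract_final_time(ans) for ans in sc_answers]
most_common_answer = Counter(final_answers).most_common(1)[0]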

# Display Results
print("🔹 Chain-of-Thought Response (Single Attempt):\n" + "-" * 50)
print(cot_answer)

print("\n🔹 Self-Consistency Responses (Multiple Attempts):\n" + "-" * 50)
for idx, ans in enumerate(sc_answers, 1):
    print(f"Attempt {idx}: {ans}")

print("\n🔹 Final Self-Consistency Selected Answer:\n" + "-" * 50)
print(f"Most Common Answer: {most_common_answer[0]}\nAppeared {most_common_answer[1]} times.")


🔹 Chain-of-Thought Response (Single Attempt):
--------------------------------------------------
To find out when the second train will catch up to the first, we can start by calculating how far the first train has traveled by the time the second train leaves.

1. **Calculate the distance traveled by the first train in one hour:**
   Since the first train travels at 60 miles per hour and it leaves at 2 PM, by 3 PM (one hour later), it will have traveled:
   \[
   \text{Distance} = \text{Speed} \times \text{Time} = 60 \text{ miles per hour} \times 1 \text{ hour} = 60 \text{ miles}
   \]

2. **Set up the equation to find when the second train catches up:**
   Let \( t \) be the time in hours after 3 PM when the second train catches up to the first train. At this time, both trains will have traveled the same distance from the station.

   - The first train will have traveled \( 60 \text{ miles} + 60 \text{ miles per hour} \times t \) (since it continues to travel after 3 PM).
   - The 
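
The printed derivation is cut off before the final step. As an independent check (not a reconstruction of the model's missing text), the catch-up time follows from the head start divided by the speed difference. The function and variable names are illustrative.

```python
def hours_to_catch_up(head_start_hours, slow_mph, fast_mph):
    """Hours the faster train needs, after its own departure, to close the gap."""
    gap_miles = slow_mph * head_start_hours
    return gap_miles / (fast_mph - slow_mph)

hours = hours_to_catch_up(head_start_hours=1, slow_mph=60, fast_mph=90)
catch_up_hour_pm = 3 + hours  # the second train leaves at 3 PM

print(hours)             # 2.0
print(catch_up_hour_pm)  # 5.0, i.e. 5 PM
```

Having a known reference answer makes it possible to judge whether a majority vote over sampled responses actually landed on the right result.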

<div style="background: linear-gradient(135deg, #001a70 0%, #0055d4 100%); color: white; padding: 25px; border-radius: 12px; text-align: center;">
    <h1 style="margin-bottom: 10px;">📚 Exploring More Advanced Prompting Strategies</h1>
</div>

<div style="background: #ffffff; padding: 20px; border-radius: 10px; border-left: 6px solid #0055d4; margin-top: 20px;">
    <ul style="font-size: 16px; line-height: 1.8;">
        <li><strong>🧩 Tree-of-Thought (ToT) Prompting:</strong> Explores multiple reasoning paths like a decision tree, helping the model evaluate and compare various solutions before choosing the best one.</li>
        <li><strong>🤖 ReAct (Reasoning and Acting) Prompting:</strong> Combines reasoning steps with actions, including API calls or external tool usage. Ideal for interactive agents and dynamic decision-making tasks.</li>
        <li><strong>🔄 Reflexion Prompting:</strong> Encourages the model to critique its own responses and iteratively improve them, simulating self-correction and learning.</li>
    </ul>
</div>
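
To make the ReAct description concrete, here is a minimal sketch of the loop: the model emits a thought and an action, the program runs the action as a tool call, and the result is fed back as an observation until a final answer appears. Everything here is illustrative. `scripted_model` is a hypothetical stand-in that returns canned steps instead of calling an LLM, `calculator` is a toy tool, and the `Thought / Action / Observation` text format is one common convention rather than a fixed standard. The question is the chickens-and-rabbits puzzle used in the hands-on task.

```python
import re

def calculator(expression):
    """Tool: evaluate plain arithmetic (digits, spaces, + - * / . and parentheses only)."""
    if not re.fullmatch(r"[\d\s+\-*/().]+", expression):
        return "error: unsupported expression"
    return str(eval(expression))  # input is restricted to arithmetic characters above

def scripted_model(transcript):
    """Hypothetical stand-in for an LLM call; returns the next step of the trace."""
    if "Observation: 12.0" in transcript:
        return ("Thought: 12 rabbits leaves 35 - 12 = 23 chickens.\n"
                "Final Answer: 23 chickens and 12 rabbits")
    return ("Thought: 35 chickens alone would have 70 legs; each rabbit adds 2 more.\n"
            "Action: calculator[(94 - 2 * 35) / 2]")

def react_loop(question, model, max_steps=5):
    """Alternate model steps with tool calls until a final answer appears."""
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        step = model(transcript)
        transcript += "\n" + step
        if "Final Answer:" in step:
            return step.split("Final Answer:", 1)[1].strip(), transcript
        action = re.search(r"Action: calculator\[(.+)\]", step)
        if action:
            # Run the requested tool and feed the result back as an observation
            transcript += "\nObservation: " + calculator(action.group(1))
    return None, transcript

answer, trace = react_loop(
    "There are 35 heads and 94 legs among chickens and rabbits. How many of each?",
    scripted_model,
)
print(trace)
print(answer)  # 23 chickens and 12 rabbits
```

In a real agent, `scripted_model` would be replaced by a chat completion call that receives the transcript so far. The point of the pattern is that arithmetic is delegated to a tool instead of being left to the model.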

<div style="margin-top: 40px; text-align: center;">
    <h2 style="color: #001a70;">✋ Hands-On Task: Compare Prompting Strategies</h2>
</div>

<div style="background: #f5faff; padding: 20px; border-radius: 8px; border-left: 5px solid #0055d4;">
    <p style="font-size: 16px;">
        📌 <strong>Task Instructions:</strong><br>
        - Experiment with <strong>Self-Consistency</strong>, <strong>Tree-of-Thought</strong>, and <strong>ReAct</strong> prompting methods.<br>
        - Try to solve the following problem using each method and compare the results.
    </p>
</div>

<div style="background: #ffffff; padding: 20px; border-radius: 10px; border-left: 6px solid #0055d4; margin-top: 20px;">
    <h3>🧠 <strong>Challenge Problem:</strong></h3>
    <p style="font-size: 16px;">A farmer has chickens and rabbits in a cage. There are 35 heads and 94 legs. How many chickens and rabbits are there?</p>
</div>

<div style="margin-top: 40px;">
    <ul style="font-size: 16px; line-height: 1.8;">
        <li>Try different models (e.g., <code>gpt-3.5-turbo</code>, <code>gpt-4-turbo</code>, <code>o3</code>).</li>
        <li>Experiment with different temperatures (e.g., <code>0.0</code>, <code>0.5</code>, <code>0.7</code>).</li>
        <li>Use both direct prompts and advanced strategies like CoT, Self-Consistency, or ReAct.</li>
    </ul>
</div>

<div style="margin-top: 40px; text-align: center;">
    <h2 style="color: #001a70;">📖 Observations</h2>
</div>

<div style="background: #ffffff; padding: 20px; border-radius: 10px; border-left: 6px solid #0055d4;">
    <ul style="font-size: 16px; line-height: 1.8;">
        <li><strong>Model and Strategy Used:</strong><br>_I tried the Chain-of-Thought and Self-Consistency prompting techniques._</li>
        <li><strong>Was the Correct Answer Found?</strong><br>_Yes, the answer found by the model was correct._</li>
        <li><strong>Key Takeaways (Max Half Page or Screenshot):</strong><br>_When testing these approaches on the Chickens & Rabbits problem, the Chain-of-Thought and Self-Consistency methods showed the best results. Chain-of-Thought directed the model to reason step by step, which reduced arithmetic errors, and Self-Consistency improved reliability by solving the problem repeatedly and selecting the most consistent answer. The direct method was faster but less reliable, and the ReAct method was slightly slower but showed increased accuracy with verification._</li>
    </ul>
</div>

<div style="margin-top: 20px; text-align: center;">
    ✍️ <em>Hint: Try breaking down the problem into equations or ask the model to explain its steps before giving the final answer. Notice which strategies lead to faster and more accurate results!</em>
</div>


In [None]:
# ==========================
# ✋ Hands-On Code: Try Different Prompting Strategies and Models
# ==========================

# 📝 Instructions:
# - Change 'model_name' to try different models (e.g., "gpt-3.5-turbo", "gpt-4-turbo", "o3").
# - Adjust 'temperature' to test how creativity affects reasoning.
# - Try Self-Consistency by sampling multiple outputs and comparing answers.
# - Optionally, explore Tree-of-Thought and ReAct patterns by modifying prompts.
# ✅ Your Experiment Starts Here 👇


In [14]:
# ==========================
# 📌 Prompting Strategy Comparison: Chickens & Rabbits Problem
# Model & Temperature Sensitivity Included
# ==========================

# ==========================
# 🔹 Prompts
# ==========================
direct_prompt = (
    "A farmer has chickens and rabbits in a cage. "
    "There are 35 heads and 94 legs. "
    "How many chickens and rabbits are there?"
)

# Include the problem statement so the model has the numbers to work with
cot_prompt = (
    f"{direct_prompt}\n"
    "Let's solve this step by step.\n"
    "We know that chickens have 2 legs and rabbits have 4 legs.\n"
    "Step 1: Assign variables for chickens and rabbits.\n"
    "Step 2: Form equations using heads and legs.\n"
    "Step 3: Solve the equations logically.\n"
    "Finally, provide the number of chickens and rabbits."
)

# Single-prompt approximation of Self-Consistency. The full technique samples several
# independent completions and majority-votes on their final answers.
self_consistency_prompt = (
    "Solve the following problem using multiple approaches.\n"
    f"{direct_prompt}\n"
    "Derive the solution independently at least twice and return the most consistent answer."
)

# Reason-then-verify prompt in the spirit of ReAct. A full ReAct loop would also let the
# model call tools (actions) and read their results (observations).
react_prompt = (
    "Reason about the problem carefully.\n"
    "Verify your calculations before answering.\n"
    f"{direct_prompt}\n"
    "Provide the final verified answer."
)

# ==========================
# 🚀 Run Direct Prompt (Lower-capability model, higher temperature)
# ==========================
response_direct = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": direct_prompt}],
    temperature=0.8
)

# ==========================
# 🚀 Run Chain-of-Thought (Stronger model, lower temperature)
# ==========================
response_cot = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{"role": "user", "content": cot_prompt}],
    temperature=0.4
)

# ==========================
# 🚀 Run Self-Consistency (Advanced model, moderate temperature)
# ==========================
response_self_consistency = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{"role": "user", "content": self_consistency_prompt}],
    temperature=0.7
)

# ==========================
# 🚀 Run ReAct-style prompt (Same model, low temperature)
# ==========================
response_react = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{"role": "user", "content": react_prompt}],
    temperature=0.3
)

# ==========================
# 📊 Display Results
# ==========================
print("🔹 Direct Prompt (gpt-3.5-turbo | temp=0.8):\n" + "-" * 50)
print(response_direct.choices[0].message.content.strip())

print("\n🔹 Chain-of-Thought (gpt-4-turbo | temp=0.4):\n" + "-" * 50)
print(response_cot.choices[0].message.content.strip())

print("\n🔹 Self-Consistency (gpt-4-turbo | temp=0.7):\n" + "-" * 50)
print(response_self_consistency.choices[0].message.content.strip())

print("\n🔹 ReAct (gpt-4-turbo | temp=0.3):\n" + "-" * 50)
print(response_react.choices[0].message.content.strip())

🔹 Direct Prompt (gpt-3.5-turbo | temp=0.8):
--------------------------------------------------
Let x be the number of chickens and y be the number of rabbits.

From the information given:
x + y = 35 (total number of heads)
2x + 4y = 94 (total number of legs)

Solving the equations simultaneously:
Multiply the first equation by 2:
2x + 2y = 70

Subtract the second equation from the result:
0x + 2y = 24
y = 12

Substitute y back into the first equation:
x + 12 = 35
x = 23

Therefore, there are 23 chickens and 12 rabbits in the cage.

🔹 Chain-of-Thought (gpt-4-turbo | temp=0.4):
--------------------------------------------------
Sure, let's solve the problem step by step.

### Step 1: Assign Variables
Let \( c \) represent the number of chickens and \( r \) represent the number of rabbits.

### Step 2: Form Equations
We need two equations: one for the heads and one for the legs. Each animal has one head, and the number of legs varies by animal type.

1. **Heads Equation**: Since eac
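
The Chain-of-Thought output is truncated here, so it is worth confirming the expected answer independently. The sketch below is a brute-force check over every possible split of the 35 heads, assuming 2 legs per chicken and 4 per rabbit as stated in the prompt (variable names are illustrative):

```python
HEADS, LEGS = 35, 94

# Try every possible number of rabbits; the rest of the heads are chickens
solutions = [
    (HEADS - rabbits, rabbits)
    for rabbits in range(HEADS + 1)
    if 2 * (HEADS - rabbits) + 4 * rabbits == LEGS
]
print(solutions)  # [(23, 12)] as (chickens, rabbits)
```

The single solution, 23 chickens and 12 rabbits, matches the direct-prompt response above and gives a fixed reference for grading the other strategies.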

<div style="background: linear-gradient(135deg, #001a70 0%, #0055d4 100%); color: white; padding: 25px; border-radius: 12px; text-align: center;">
    <h1 style="margin-bottom: 10px;">📌 Conclusion</h1>
</div>

<div style="background: #ffffff; padding: 20px; border-radius: 10px; border-left: 6px solid #0055d4; margin-top: 20px;">
    <p style="font-size: 16px; line-height: 1.8;">
        In this hands-on exploration, different advanced prompting strategies were tested to solve reasoning-based challenges.
        Through experimenting with <strong>Chain-of-Thought (CoT)</strong>, <strong>Self-Consistency</strong>, and other methods,
        the following key insights were observed:
    </p>
    <ul style="font-size: 16px; line-height: 1.8;">
        <li>Advanced prompting techniques can improve model performance, especially on complex, multi-step problems.</li>
        <li>Changing the <strong>model type</strong> and <strong>temperature</strong> can noticeably affect reasoning quality and output variation.</li>
        <li>Some strategies, like <strong>Self-Consistency</strong>, help reduce random errors by exploring multiple reasoning paths.</li>
        <li>For ambiguous or challenging problems, combining strategies (e.g., CoT + Self-Consistency) often leads to the most reliable results.</li>
    </ul>
</div>

<div style="background: #f5faff; padding: 20px; border-radius: 8px; border-left: 5px solid #0055d4; margin-top: 20px;">
    <p style="font-size: 16px; font-style: italic;">
        📖 <em>Remember: Prompt engineering is both an art and a science. The more you experiment, the better you understand how to guide LLMs effectively!</em>
    </p>
</div>

<div style="margin-top: 40px; text-align: center;">
    <h3 style="color: #001a70;">✍️ Final Reflection</h3>
</div>

<div style="background: #ffffff; padding: 20px; border-radius: 10px; border-left: 6px solid #0055d4;">
    <p style="font-size: 20px;">
        _Throughout this experiment, I observed that prompting strategies such as Chain-of-Thought and Self-Consistency significantly improved reasoning accuracy on multi-step problems. I also noticed that the model was sensitive to changes in temperature. Advanced models produced more stable and consistent answers, while higher temperatures introduced more variation. This shows that prompt design, model selection, and temperature together influence the reliability of LLM outputs._
    </p>
</div>
