
In [4]:
!pip install --upgrade openai



In [17]:
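import os

from openai import OpenAI

# Initialize the OpenAI client. The key is read from the OPENAI_API_KEY
# environment variable when set; otherwise replace the placeholder.
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY", "YOUR_API_KEY"))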

# Generate response

In [18]:
def generate_response(prompt, use_cot=False):
    """Generate a response using GPT-3.5-turbo, optionally with Chain of Thought reasoning."""
    if use_cot:
        system_message = """Solve the problem step by step. Show your work clearly:
        1. State what you're calculating in each step.
        2. Show the calculation.
        3. Briefly explain the result.
        4. Provide a final answer."""
    else:
        system_message = "Solve the problem concisely but show key steps."

    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": system_message},
            {"role": "user", "content": f"Problem: {prompt}"}
        ],
        max_tokens=300,  # long step-by-step answers can be cut off at this limit
        temperature=0.5,
    )

    return response.choices[0].message.content.strip()
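
Because `max_tokens` caps the reply, a long step-by-step answer may stop mid-calculation. The sketch below shows one way to detect that, assuming the `finish_reason` field of the Chat Completions response, which is `"length"` when the token limit was hit. The helper name `is_truncated` and the stand-in objects are hypothetical and only serve to make the example run without an API call.

```python
from types import SimpleNamespace


def is_truncated(response):
    """Return True if the first choice stopped because it hit the token limit."""
    return response.choices[0].finish_reason == "length"


# Stand-in objects shaped like a chat completion, so the sketch runs offline.
cut_off = SimpleNamespace(choices=[SimpleNamespace(finish_reason="length")])
finished = SimpleNamespace(choices=[SimpleNamespace(finish_reason="stop")])

print(is_truncated(cut_off))
print(is_truncated(finished))
```

In `generate_response`, the same check could be applied to `response` before returning the text, for example to warn that the answer is incomplete.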

# Analyze response

In [19]:
def analyze_response(response, is_cot=False, expected_answer="$86.40"):
    """Print a response and score it with rough keyword heuristics."""
    print(f"\n{'Chain of Thought' if is_cot else 'Standard'} Response:")
    print(response)

    # Treat every non-empty line as one "step".
    steps = [step for step in response.split('\n') if step.strip()]
    print("\nAnalysis:")
    print(f"Number of steps: {len(steps)}")

    # Coherence: every line must contain at least one calculation marker.
    markers = ["step", "calculate", "=", "$"]
    is_coherent = len(steps) > 1 and all(
        any(marker in step.lower() for marker in markers) for step in steps
    )
    coherence = "High" if is_coherent else "Low"

    # Completeness: a concluding keyword plus the expected final answer.
    closers = ["therefore", "total", "final"]
    is_complete = (
        any(word in response.lower() for word in closers)
        and expected_answer in response
    )
    completeness = "Complete" if is_complete else "Incomplete"

    print(f"Coherence: {coherence}")
    print(f"Completeness: {completeness}")

    return {"coherence": coherence, "completeness": completeness, "num_steps": len(steps)}
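
The coherence check is strict: a single line without any marker is enough to score "Low". The following minimal sketch applies the same keyword test to the first three lines of a numbered answer, to show which lines fail it. The names `MARKERS`, `sample`, and `unmatched` are illustrative and not part of the notebook.

```python
# Same keywords as the coherence check in analyze_response.
MARKERS = ["step", "calculate", "=", "$"]

sample = """To calculate the total cost for the customer:
1. The customer will pay for 4 items (buy 2, get 1 free for each set of 3 items).
2. The total cost for these 4 items is $20 * 4 = $80."""

lines = [line for line in sample.split("\n") if line.strip()]

# Lines that contain none of the markers are what pull coherence down to "Low".
unmatched = [
    line for line in lines
    if not any(marker in line.lower() for marker in MARKERS)
]

print(f"{len(unmatched)} of {len(lines)} lines have no marker:")
for line in unmatched:
    print(line)
```

Here only the line describing the number of paid items lacks a marker, which is already sufficient for a "Low" coherence score under this heuristic.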


# Example problem

In [20]:
prompt = """
A store is having a sale: buy 2 items, get 1 free. Each item normally costs $20.
If a customer buys 6 items during this sale, how much will they spend in total, including 8% sales tax?
"""

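The completeness check looks for `$86.40`, so it helps to confirm that figure independently. This is a small sketch of the arithmetic, assuming the 6 items include the free ones (every third item is free), which is the reading the recorded responses use. The variable names are illustrative.

```python
from decimal import Decimal

PRICE = Decimal("20")       # regular price per item
TAX_RATE = Decimal("0.08")  # 8% sales tax
items_taken = 6

# Assumption: in each group of 3 items, 2 are paid and 1 is free.
free_items = items_taken // 3
paid_items = items_taken - free_items

subtotal = PRICE * paid_items
total = subtotal * (1 + TAX_RATE)

print(f"Paid items: {paid_items}")
print(f"Subtotal: ${subtotal:.2f}")
print(f"Total with tax: ${total:.2f}")
```

`Decimal` is used instead of floats so that the currency amounts are exact.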
# Responses

In [21]:
# analyze_response prints its own header, so no extra print is needed here.
standard_response = generate_response(prompt, use_cot=False)
standard_analysis = analyze_response(standard_response)

cot_response = generate_response(prompt, use_cot=True)
cot_analysis = analyze_response(cot_response, is_cot=True)


Standard Response:
To calculate the total cost for the customer:
1. The customer will pay for 4 items (buy 2, get 1 free for each set of 3 items).
2. The total cost for these 4 items is $20 * 4 = $80.
3. Adding 8% sales tax to $80 gives $80 * 0.08 = $6.40.
4. The total amount the customer will spend is $80 + $6.40 = $86.40.

Analysis:
Number of steps: 5
Coherence: Low
Completeness: Complete

Chain of Thought Response:
Step 1: Calculate the total number of items the customer will pay for.
The customer buys 6 items, and for every 2 items, they get 1 free. So, for 6 items, the customer will pay for 4 items.

4 items = 6 items - 2 free items

Step 2: Calculate the total cost of the items before the sales tax.
Each item costs $20, and the customer is buying 4 items.
Total cost before sales tax = $20/item * 4 items

Total cost before sales tax = $80

Step 3: Calculate the sales tax amount.
The sales tax rate is 8%.
Sales tax amount = 8% * $80



# Comparison

In [16]:
print("\nComparison:")
print(f"Standard response length: {len(standard_response.split())}")
print(f"CoT response length: {len(cot_response.split())}")
print(f"Standard coherence: {standard_analysis['coherence']}, completeness: {standard_analysis['completeness']}")
print(f"CoT coherence: {cot_analysis['coherence']}, completeness: {cot_analysis['completeness']}")


Comparison:
Standard response length: 116
CoT response length: 185
Standard coherence: High, completeness: Incomplete
CoT coherence: Low, completeness: Complete
