https://vijaykumarkartha.medium.com/self-reflecting-ai-agents-using-langchain-d3a93684da92

https://arxiv.org/abs/2405.06682

## Lab: Answer and Reflection Module with OpenAI Chat API

### Objective

By the end of this lab, you will:

1. Implement a basic Q&A function that prompts the OpenAI Chat API using a step-by-step chain-of-thought system message.
2. Extend the module with a reflection step that analyzes the initial answer and improves the original question.
3. Compare the original and improved questions and observe how refinement affects answer quality.

---

### Prerequisites

* Python 3.8+
* OpenAI Python client library, version 1.0 or later (`pip install openai`); the `OpenAI` client class used below is not available in earlier releases
* API key set in your environment (`export OPENAI_API_KEY="your_key"`)
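
Before running the setup code, it can help to confirm that the key is actually visible to Python. The following is a minimal sketch; the helper name `require_api_key` is hypothetical and not part of the OpenAI library.

```python
import os

def require_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error."""
    key = os.getenv(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; export it before starting Python."
        )
    return key
```

Calling `require_api_key()` once at the top of a script fails fast with a readable message instead of an authentication error on the first API request.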

### 1. Setup

```python
import os
from openai import OpenAI

# Initialize client
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# Basic answer function using chain-of-thought prompt
def get_answer(question: str, model: str = "gpt-3.5-turbo") -> str:
    messages = [
        {"role": "system", "content": "You are a helpful assistant that thinks step by step."},
        {"role": "user", "content": f"{question}\nLet's think step by step."}
    ]
    response = client.chat.completions.create(
        model=model,
        messages=messages
    )
    return response.choices[0].message.content
```

### 2. Test the Answer Module

```python
question = "What is the capital of France and why is it significant in European history?"
answer = get_answer(question)
print("Original Answer:\n", answer)
```

> **Observation:** Note the length, depth, and clarity of the answer.

---

### 3. Reflection Module

Now, implement a reflection step to refine the question based on the answer.

```python

def reflect_and_improve(question: str, answer: str, model: str = "gpt-3.5-turbo") -> str:
    reflect_prompt = (
        f"I asked: \"{question}\" and got this answer: \"{answer}\". "
        "Suggest a clearer or more specific version of my question to get an even better answer."
    )
    messages = [
        {"role": "system", "content": "You are a helpful assistant that improves user questions."},
        {"role": "user", "content": reflect_prompt}
    ]
    response = client.chat.completions.create(
        model=model,
        messages=messages
    )
    return response.choices[0].message.content
```

### 4. Test the Reflection Module

```python
improved_q = reflect_and_improve(question, answer)
print("Improved Question:\n", improved_q)
```

---

### 5. Compare and Re-run

1. Run `get_answer(improved_q)` and compare the new answer to the original.
2. Discuss:

   * How did refining the question change the response?
   * What elements of question design led to a more informative answer?
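
To make the comparison a little more concrete, the sketch below summarizes surface-level differences between two answers. The helper `compare_answers` is hypothetical, and word counts and text similarity are only crude proxies: they show how much the answer changed, not whether it got better.

```python
from difflib import SequenceMatcher

def compare_answers(original: str, improved: str) -> dict:
    """Summarize surface-level differences between two answers."""
    return {
        "original_words": len(original.split()),
        "improved_words": len(improved.split()),
        # 1.0 means identical text; lower values mean more change.
        "similarity": round(SequenceMatcher(None, original, improved).ratio(), 2),
    }

# Usage in the lab (needs get_answer, answer, improved_q, and an API key):
# new_answer = get_answer(improved_q)
# print(compare_answers(answer, new_answer))
```

Reading both answers side by side is still needed to judge depth and accuracy.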

---

### 6. Extension Ideas

* Automate multiple reflection rounds until answer quality stops improving.
* Add metrics to evaluate answer completeness or relevance.
* Integrate into a web app or chat interface for live question debugging.
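
The first two ideas can be combined in a small loop, sketched below under stated assumptions. The function `refine_until_stable` and its `score_fn` parameter are hypothetical: `score_fn` stands in for whatever quality metric you choose, and the lab itself does not define one.

```python
from typing import Callable, List, Tuple

def refine_until_stable(
    question: str,
    answer_fn: Callable[[str], str],
    reflect_fn: Callable[[str, str], str],
    score_fn: Callable[[str], float],
    max_rounds: int = 3,
) -> Tuple[str, str, List[float]]:
    """Repeat answer -> reflect until the score stops improving.

    Returns the best question, its answer, and the score history.
    """
    best_q = question
    best_a = answer_fn(best_q)
    best_score = score_fn(best_a)
    history = [best_score]

    for _ in range(max_rounds):
        candidate_q = reflect_fn(best_q, best_a)
        candidate_a = answer_fn(candidate_q)
        candidate_score = score_fn(candidate_a)
        history.append(candidate_score)
        if candidate_score <= best_score:
            break  # no improvement: keep the previous best
        best_q, best_a, best_score = candidate_q, candidate_a, candidate_score

    return best_q, best_a, history

# Usage in the lab (word count is only a placeholder metric):
# best_q, best_a, scores = refine_until_stable(
#     question, get_answer, reflect_and_improve,
#     score_fn=lambda a: len(a.split()),
# )
```

Passing the functions in as arguments keeps the loop independent of any particular model or metric, so it can be tried with stub functions before spending API calls.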

---

## 7. Demo: Self‑Reflecting AI Agent

In this section, you'll build an AI agent that not only answers questions but also reflects on its own responses to iteratively improve.

### 7.1 Agent Design

1. **Answer Step**: Agent generates an answer with chain-of-thought.
2. **Reflection Step**: Agent critiques its own answer for clarity and completeness.
3. **Revision Step**: Agent refines its answer based on the critique.

### 7.2 Implementation

```python
import os
from openai import OpenAI

# Initialize client
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

def self_reflecting_answer(question: str, model: str = "gpt-3.5-turbo") -> str:
    # Step 1: Generate initial answer
    ans_msgs = [
        {"role": "system", "content": "You are a thoughtful assistant."},
        {"role": "user", "content": f"{question}\nLet's think step by step."}
    ]
    initial = client.chat.completions.create(model=model, messages=ans_msgs).choices[0].message.content

    # Step 2: Reflect on the answer
    ref_msgs = [
        {"role": "system", "content": "You critique your own answer for clarity and completeness."},
        {"role": "user", "content": f"I answered: \"{initial}\". Provide a critique and suggest improvements."}
    ]
    reflection = client.chat.completions.create(model=model, messages=ref_msgs).choices[0].message.content

    # Step 3: Revise the answer
    rev_msgs = [
        {"role": "system", "content": "You are an assistant that refines answers based on feedback."},
        {"role": "user", "content": (
            f"Original answer: \"{initial}\"\n"
            f"Critique: \"{reflection}\"\n"
            "Please provide a revised, improved answer."
        )}
    ]
    revised = client.chat.completions.create(model=model, messages=rev_msgs).choices[0].message.content
    return revised

# Demo usage
question = "Explain the importance of data normalization in machine learning."
final_answer = self_reflecting_answer(question)
print("Final Revised Answer:\n", final_answer)
```

### 7.3 Observations

* **Compare**: Initial vs. final answers.
* **Discuss**: How reflection improved accuracy, depth, or clarity.
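
Comparing initial and final answers requires keeping the intermediate texts, which `self_reflecting_answer` discards. The sketch below is one possible variant that returns all three. The name `reflect_with_trace` and the `ask(system, user)` callable are hypothetical; `ask` is assumed to send one system message and one user message to a chat model and return the reply text.

```python
from typing import Callable, Dict

def reflect_with_trace(question: str, ask: Callable[[str, str], str]) -> Dict[str, str]:
    """Run answer -> critique -> revision and keep every intermediate text."""
    initial = ask(
        "You are a thoughtful assistant.",
        f"{question}\nLet's think step by step.",
    )
    critique = ask(
        "You critique your own answer for clarity and completeness.",
        f"I answered: \"{initial}\". Provide a critique and suggest improvements.",
    )
    revised = ask(
        "You are an assistant that refines answers based on feedback.",
        f"Original answer: \"{initial}\"\nCritique: \"{critique}\"\n"
        "Please provide a revised, improved answer.",
    )
    return {"initial": initial, "critique": critique, "revised": revised}

# One way to build `ask` from the client in 7.2 (an assumption, not lab code):
# def ask(system, user, model="gpt-3.5-turbo"):
#     msgs = [{"role": "system", "content": system},
#             {"role": "user", "content": user}]
#     return client.chat.completions.create(
#         model=model, messages=msgs).choices[0].message.content
```

Printing `trace["initial"]` next to `trace["revised"]` makes it easier to see which points in the critique actually changed the answer.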

---

**End of Lab**


In [3]:
import os
from openai import OpenAI
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

True

In [4]:
import os
from openai import OpenAI

client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
model_name = "gpt-4.1"    

# openai_client = OpenAI(api_key=os.environ.get("OPENTYPHOON_API_KEY"),base_url="https://api.opentyphoon.ai/v1")
# model_name = "typhoon-instruct"

In [5]:
import os
from openai import OpenAI

# Initialize client
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# Basic answer function using chain-of-thought prompt
def get_answer(question: str, model: str = "gpt-3.5-turbo") -> str:
    messages = [
        {"role": "system", "content": "You are a helpful assistant that thinks step by step."},
        {"role": "user", "content": f"{question}\nLet's think step by step."}
    ]
    response = client.chat.completions.create(
        model=model,
        messages=messages
    )
    return response.choices[0].message.content

In [6]:
def reflect_and_improve(question: str, answer: str, model: str = "gpt-3.5-turbo") -> str:
    reflect_prompt = (
        f"I asked: \"{question}\" and got this answer: \"{answer}\". "
        "Suggest a clearer or more specific version of my question to get an even better answer."
    )
    messages = [
        {"role": "system", "content": "You are a helpful assistant that improves user questions."},
        {"role": "user", "content": reflect_prompt}
    ]
    response = client.chat.completions.create(
        model=model,
        messages=messages
    )
    return response.choices[0].message.content

In [10]:
questions = [
    "How would you design a testing framework for an AI-driven medical diagnosis system intended for rare diseases?",
    "What metrics would you propose to evaluate the fairness of a hiring algorithm trained on historical HR data?",
    "How can you test the robustness of a self-driving car’s perception module under extreme weather conditions?",
    "Describe an approach to validate the security of an AI-powered financial trading bot against adversarial inputs.",
    "How would you simulate and test task allocation in a swarm robotics system with dynamic network topologies?",
    "Propose a method to measure the interpretability of a black-box deep learning model used for legal decision support.",
    "What strategy would you use to continuously monitor and test an AI recommendation engine for drift in user preferences over time?",
    "How can you design experiments to test real-time latency guarantees in an AI-based emergency response coordination platform?",
    "Suggest a testing plan for an adaptive educational tutor AI that personalizes content based on student engagement metrics?",
    "How would you verify the reliability of a distributed sensor network AI system deployed for environmental monitoring in remote regions?",
]

In [11]:
question = questions[2]  # Example question
answer = get_answer(question)
print("Original Answer:\n", answer)

Original Answer:
 Of course! Here's a step-by-step approach to testing the robustness of a self-driving car's perception module under extreme weather conditions:

1. **Define Extreme Weather Conditions**: First, clearly define what extreme weather conditions you want to test - this could include heavy rain, snow, fog, strong winds, etc.

2. **Create Test Scenarios**: Develop a set of test scenarios that mimic these extreme weather conditions. For example, simulate heavy rain by using water sprays or simulate fog with smoke machines.

3. **Test Data Collection**: Collect real-world data of these extreme weather conditions to use during testing. This can include images, videos, Lidar scans, and radar data.

4. **Test in Controlled Environment**: Initially, conduct tests in a controlled environment such as a closed test track or a simulation environment where you can safely replicate extreme weather conditions.

5. **Observe System Performance**: Monitor how the perception module of the s

In [12]:
improved_q = reflect_and_improve(question, answer)
print("Improved Question:\n", improved_q)

Improved Question:
 How can one effectively assess the robustness of a self-driving car's perception module in extreme weather conditions through testing methods and scenarios?
