In [None]:
import openai
import base64

# Set the OPENAI_API_KEY environment variable before running this cell;
# the client reads the key from it when api_key is omitted
client = openai.OpenAI()

# Initialize conversation history
messages = [
    {"role": "system", "content": "You are an intelligent assistant capable of engaging in natural multi-turn conversations."},
]

def chat_with_gpt(user_input):
    """Handles text-only chat requests with GPT-4o."""
    messages.append({"role": "user", "content": user_input})

    # Send request to OpenAI GPT-4o
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        temperature=0.7
    )

    # Extract the reply text from the first choice
    reply = response.choices[0].message.content

    # Append AI response to conversation history
    messages.append({"role": "assistant", "content": reply})

    return reply

def encode_image(image_path):
    """Reads an image file and returns its contents as a base64-encoded string."""
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

def chat_with_image(image_path, user_input):
    """Handles image + text requests with GPT-4o."""
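    base64_image = encode_image(image_path)
    # Label PNG files as image/png; everything else is treated as JPEG
    mime_type = "image/png" if image_path.lower().endswith(".png") else "image/jpeg"

    # Attach the text and the image in a single user message
    # (appending the text as a separate message as well would send the prompt twice)
    messages.append({
        "role": "user",
        "content": [
            {"type": "text", "text": user_input},  # Text input
            {"type": "image_url", "image_url": {"url": f"data:{mime_type};base64,{base64_image}"}}
        ]
    })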

    # Send request to OpenAI GPT-4o
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        temperature=0.7
    )

    # Extract the reply text from the first choice
    reply = response.choices[0].message.content

    # Append AI response to conversation history
    messages.append({"role": "assistant", "content": reply})

    return reply
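
Because both helpers append to the shared `messages` list, every base64-encoded image stays in the history and is re-sent with each later request. The sketch below shows one possible way to keep requests small: it replaces image parts in all but the most recent image-bearing message with a short text placeholder. The helper name `strip_old_images` and the placeholder text are hypothetical and not part of the original notebook, and whether earlier frames can safely be dropped depends on the task.

```python
def strip_old_images(history, placeholder="[earlier image omitted]"):
    """Return a copy of history in which only the last image-bearing message keeps its image.

    Hypothetical helper: assumes each message's content is either a string
    or a list of typed parts, as in the helpers above.
    """
    # Find the index of the most recent message that carries an image part
    last_image_idx = None
    for i, msg in enumerate(history):
        content = msg["content"]
        if isinstance(content, list) and any(p.get("type") == "image_url" for p in content):
            last_image_idx = i

    trimmed = []
    for i, msg in enumerate(history):
        content = msg["content"]
        if isinstance(content, list) and i != last_image_idx:
            # Swap each image part for a short text part; keep text parts unchanged
            content = [
                {"type": "text", "text": placeholder} if p.get("type") == "image_url" else p
                for p in content
            ]
        trimmed.append({**msg, "content": content})
    return trimmed
```

It could be passed as `messages=strip_old_images(messages)` in the `client.chat.completions.create` call, which leaves the stored history itself untouched.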

In [3]:

user_input = """You are controlling a drone inside a school office building.  
Your objective: Guide the drone to the entrance of Room 407.

Important requirements:  
1. The drone must actively search for room signs or any other identifiers (e.g., corridor signs, office nameplates).  
2. If the sign or room number is partially or completely obscured, instruct the drone to adjust its position or yaw to obtain a clearer view.  
3. Use small, incremental movements (Observation-Based Control) to avoid collisions, especially near glass walls or narrow corridors.  
4. After each movement, carefully observe the updated camera feed for new indicators or obstacles.  
5. Once you think you see the correct room (407), instruct the drone to perform a final check by hovering close enough to read the sign. If the sign is still not visible or the number is unclear, continue searching or try an alternative viewpoint.  
6. If recognition remains ambiguous, try capturing a photo to confirm with an operator or look for adjacent rooms to cross-verify location.  

Your control format (body frame, incremental moves):  
Observation-Based Control (delta_x, delta_y, delta_z, delta_yaw)  
- delta_x, delta_y, delta_z ∈ [-3, 3] (in meters)  
- delta_yaw ∈ [-3.14, 3.14] (in radians)  
- Right-hand rule for both body and world coordinate systems.  

Now, look at the current camera feed:  
Image is attached.  
Based on the image, provide the next movement command to approach or identify Room 407.  
If you cannot see any sign or the sign is unclear, consider rotating or shifting the drone for a better view.
"""

# Round 1: text-only request. No image is sent here, even though the prompt says one is attached.
response = chat_with_gpt(user_input)
print("1st round GPT-4o:", response)


image_response = chat_with_image("assets/pic1.PNG", "What do you see in this image?")
print("\n\n2nd round GPT-4o (Image Response):", image_response)


1st round GPT-4o: Based on the image, let's proceed with the next movement:

1. **Objective**: Approach the door and sign to check for the room number.

2. **Movement Command**:
   - **delta_x**: 2 (move forward to get closer to the door and sign)
   - **delta_y**: 0 (no lateral movement)
   - **delta_z**: 0 (maintain altitude)
   - **delta_yaw**: 0 (no rotation)

Execute this movement to get a clearer view of the sign. After moving, check if the room number "407" is visible or if further adjustments are necessary.
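
A reply in this bullet format still has to be turned into numbers before it can drive anything. The sketch below is one possible approach and is not part of the original notebook: a hypothetical `parse_move` helper pulls the four `delta_*` values out of the reply text with a regular expression and clamps them to the ranges stated in the prompt ([-3, 3] m for translation, [-3.14, 3.14] rad for yaw). It assumes the model keeps the `**delta_x**: value` layout seen here, which is not guaranteed.

```python
import re

# Limits taken from the prompt's control format (meters and radians)
LIMITS = {"delta_x": 3.0, "delta_y": 3.0, "delta_z": 3.0, "delta_yaw": 3.14}

# Matches e.g. "**delta_x**: 2" or "delta_yaw: -0.5"
PATTERN = re.compile(r"\*{0,2}(delta_(?:x|y|z|yaw))\*{0,2}\s*:\s*(-?\d+(?:\.\d+)?)")

def parse_move(reply):
    """Return a dict of the four deltas, or None if any of them is missing."""
    found = {name: float(value) for name, value in PATTERN.findall(reply)}
    if set(found) != set(LIMITS):
        return None
    # Clamp each value to its allowed range
    return {k: max(-LIMITS[k], min(LIMITS[k], found[k])) for k in LIMITS}

reply = """- **delta_x**: 2 (move forward to get closer to the door and sign)
- **delta_y**: 0 (no lateral movement)
- **delta_z**: 0 (maintain altitude)
- **delta_yaw**: 0 (no rotation)"""
print(parse_move(reply))
```

Returning `None` when a value is missing lets the caller hold position and ask the model again instead of acting on a partial command.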


2nd round GPT-4o (Image Response): The image shows a hallway with brick walls and a patterned floor. There’s a wooden door at the end of the hallway, with a dark rectangular sign or plaque on the wall next to it. 

To proceed, move the drone closer to the sign for better visibility:

- **delta_x**: 2 (move forward to approach the door and sign)
- **delta_y**: 0 (no lateral movement)
- **delta_z**: 0 (maintain altitude)
- **delta_yaw**: 0 (no rotation)

This should help in 