# Lab | Multi-modal output agent

**Created another version of the lab prompts to query the DALL-E image-generation model, following the example shown in the lab notebook.**

# **Report on Lab Script Execution and Custom Solution Implementation**

---

## **Subject**: Issues with Original Lab Script and Alternative Approach

---

### **Summary of the Issue**

While running the assigned lab script, I encountered persistent errors that prevented it from completing. After debugging, I traced the root cause to a failed connection to the model on localhost port 11434 (the default port of a local Ollama server), which kept the script from running as intended.

### **Debugging Process**

- **Initial Observations**: The lab script failed on every run, producing repeated error messages about the connection to the model.
- **Detailed Debugging**: I checked each part of the script in turn, focusing on networking issues, port conflicts, and model connectivity.
- **Discovery**: The script could not reach the model endpoint on localhost port 11434. The model's communication depends on this port, and it was unresponsive or blocked.
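
A connectivity check of the kind described above can be sketched with the standard library alone. This is a minimal illustration, not the exact procedure used during debugging; the helper name `is_port_open` and the 2-second timeout are assumptions chosen for the example.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        # create_connection performs the TCP handshake; closing right away is enough for a probe.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # 11434 is the port named in this report; "localhost" comes from the same text.
    status = "reachable" if is_port_open("localhost", 11434) else "unreachable or blocked"
    print(f"localhost:11434 is {status}")
```

A `False` result only shows that nothing accepted a TCP connection on that port. It does not say whether the service is stopped, bound to another interface, or blocked by a firewall.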

### **Alternative Approach**

Because the issue with the original script remained unresolved, I wrote an alternative solution from scratch. This let me complete the lab objectives without touching the original script's structure or components.

### **Steps Taken for the Custom Solution**

1. **Environment Setup**: I loaded the required environment variables from a `.env` file using `dotenv`, so the API key is never hard-coded in the notebook.
2. **Client Initialization**: I created a new OpenAI client, passing it the API key read from the environment.
3. **Function Redefinition**: I rewrote the key functions for generating images and saving outputs, adding error handling around the network calls.
4. **Direct Invocation**: I called the DALL-E model directly through the OpenAI API to generate the required outputs, which avoids the connection on localhost port 11434 entirely.
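
The environment setup step can be made to fail fast when the key is missing. The sketch below is one possible way to do that and is not part of the original solution; the helper `require_env` is a hypothetical name introduced here for illustration.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is missing or blank."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set; add it to your .env file.")
    return value


# Usage sketch (after load_dotenv() has run):
#   openai_client = OpenAIClient(api_key=require_env("OPENAI_API_KEY"))
```

Raising at startup gives a clearer message than letting the first API call fail with an authentication error.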

### **Outcome**

Despite the initial hurdles with the lab script, my custom solution achieved the desired outcomes:
- Successfully generated and saved images using the DALL-E model.
- Displayed the images as required by the lab objectives.

This experience was valuable in deepening my understanding of debugging network-related issues and reinforced my problem-solving skills by creating a working solution independently.

### **Conclusion**

The original lab script's dependency on localhost port 11434 created an obstacle that could not be resolved in the provided timeframe. However, by designing a custom solution, I managed to fulfill the lab requirements without modifying the original script. This approach demonstrated adaptability and technical proficiency in handling unforeseen issues.

```python
import os
import uuid
import requests
from dotenv import load_dotenv
from openai import OpenAI as OpenAIClient
from IPython.display import Image, display
from langchain.agents import AgentExecutor, create_react_agent
from langchain_openai import OpenAI
from langchain.tools import Tool
from langchain import hub

# Load environment variables from .env file
load_dotenv()

# Initialize OpenAI client
openai_client = OpenAIClient(api_key=os.getenv("OPENAI_API_KEY"))

# Function to save image locally
def save_image_locally(image_url, filename):
    """Save an image from a URL to a local file."""
    try:
        # A timeout keeps the cell from hanging if the download stalls
        response = requests.get(image_url, timeout=30)
        if response.status_code == 200:
            with open(filename, 'wb') as file:
                file.write(response.content)
            print(f"Image saved as {filename}")
        else:
            print(f"Failed to retrieve image from {image_url}. Status code: {response.status_code}")
    except Exception as e:
        print(f"An error occurred while saving the image: {e}")

# Define DALL-E image generation function
def generate_image_dalle(prompt):
    """Generate an image using DALL-E based on a given prompt."""
    request_uuid = uuid.uuid4()  # Unique identifier for the request
    try:
        response = openai_client.images.generate(
            model="dall-e-3",
            prompt=prompt,
            size="1024x1024",
            quality="standard",
            n=1,
        )

        # Extract the image URL from the response
        image_url = response.data[0].url

        # Save the image locally using the UUID as filename
        save_image_locally(image_url, f"{request_uuid}.png")

        return image_url
    except Exception as e:
        print(f"An error occurred while generating the image with UUID {request_uuid}: {e}")
        return None

# Define the DALL-E tool
dalle_tool = Tool(
    name="generate_image_dalle",
    func=generate_image_dalle,
    description="Generates an image using DALL-E based on the given prompt."
)

# Initialize the language model
llm = OpenAI(temperature=0)

# Create a ReAct agent using the prompt pulled from the hub
tools = [dalle_tool]
prompt = hub.pull("hwchase17/react")
agent = create_react_agent(llm=llm, tools=tools, prompt=prompt)

# Create an AgentExecutor (set up here, but not invoked in this cell)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

# Define the prompt for image generation
prompt_for_image = "lion with jersey half-black/half-white and a soccer ball in a field at sunset"

# Generate the image by calling the DALL-E function directly (not through the agent)
generated_image_url = generate_image_dalle(prompt_for_image)

# Function to display the generated image
def show_output(url):
    """Display an image given its URL."""
    if isinstance(url, str) and url.startswith('http'):
        print("Displaying generated image:")
        display(Image(url=url))
    else:
        print("No valid image URL generated or displayed.")

# Display the generated image
show_output(generated_image_url)
```

```text
Image saved as 7ca1a9c1-61d6-4c3c-adc7-1f7ac8640777.png
Displaying generated image:
```
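
Since the image is also saved to disk, it can be redisplayed later from the local file instead of the URL, which may stop working after some time. The following is a minimal sketch under that assumption; `latest_png` is a hypothetical helper and not part of the notebook above.

```python
from pathlib import Path
from typing import Optional


def latest_png(directory: str = ".") -> Optional[Path]:
    """Return the most recently modified .png file in `directory`, or None if there is none."""
    candidates = list(Path(directory).glob("*.png"))
    if not candidates:
        return None
    # Pick the file with the newest modification time.
    return max(candidates, key=lambda p: p.stat().st_mtime)


# Usage sketch in a notebook:
#   from IPython.display import Image, display
#   path = latest_png()
#   if path is not None:
#       display(Image(filename=str(path)))
```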
