# Image to Text with LCEL
### (with GPT-4o and maybe others)

Inspired by: https://tykimos.github.io/2024/05/15/image_descriptions_with_gpt_4o_and_lcel/

In [1]:
import base64

from devtools import debug
from dotenv import load_dotenv
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_core.messages.base import BaseMessage
from langchain_core.output_parsers import StrOutputParser

from python.ai_core.llm import get_llm

load_dotenv(verbose=True)

!export PYTHONPATH=":./python"

### Chain to query an image

In [4]:
def gen_prompt(param_dict: dict) -> list[BaseMessage]:
    """Build the system and human messages from a dict with 'question' and 'image_url' keys."""
    system_message = "You are a helpful assistant that kindly explains images and answers questions provided by the user."
    human_messages = [
        {
            "type": "text",
            "text": f"{param_dict['question']}",
        },
        {
            "type": "image_url",
            "image_url": {
                "url": f"{param_dict['image_url']}",
            },
        },
    ]
    return [SystemMessage(content=system_message), HumanMessage(content=human_messages)]


llm = get_llm(llm_id="gpt_4o_openai")
# llm = get_llm(llm_id="gpt_4o_azure")
# Does not work:
# llm = get_llm(llm_id="gpt_4o_edenai")

chain = gen_prompt | llm | StrOutputParser()

2024-07-17 10:31:37.608 | INFO     | python.ai_core.llm:get_llm:418 - get LLM:'gpt_4o_openai' -configurable: False - streaming: False

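The chain above starts with a plain Python function, `gen_prompt`, on the left of `|`. Functions have no `|` operator of their own, so Python falls back to the right-hand operand's `__ror__` method, and LangChain runnables use that hook to wrap the function as a runnable. The toy sketch below mirrors that mechanism using only the standard library. `Step`, `toy_prompt` and `fake_llm` are hypothetical names made up for illustration; this is not LangChain's implementation.

```python
from typing import Any, Callable


class Step:
    """Toy stand-in for a runnable: wraps a function and supports `|`."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def invoke(self, value: Any) -> Any:
        return self.func(value)

    def __or__(self, other: "Step | Callable[[Any], Any]") -> "Step":
        # self | other: run self first, then feed its result to other
        right = other if isinstance(other, Step) else Step(other)
        return Step(lambda value: right.invoke(self.invoke(value)))

    def __ror__(self, other: Callable[[Any], Any]) -> "Step":
        # other | self, where `other` is a plain function without `|` support:
        # Python falls back to this method, and the function is wrapped on the fly
        return Step(other) | self


def toy_prompt(params: dict) -> str:
    # Hypothetical stand-in for gen_prompt: builds a string instead of messages
    return f"Q: {params['question']} IMG: {params['image_url']}"


# Hypothetical stand-in for a chat model: it only upper-cases its input
fake_llm = Step(lambda prompt: prompt.upper())

# Same shape as `gen_prompt | llm | StrOutputParser()`
toy_chain = toy_prompt | fake_llm | str.strip

print(toy_chain.invoke({"question": "hi", "image_url": "u"}))
```

The design point is that the first element of a chain does not need to be a special object: as long as the element to its right knows how to wrap a callable, an ordinary function is enough.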

### Send a URL to analyse

In [5]:
# Invoke the chain with the provided question and image URL
response = chain.invoke(
    {
        "question": "Please describe this person.",
        "image_url": "http://tyritarot.github.io/warehouse/2024/2024-4-7-shining_in_the_cherry_blossoms_and_just_me_title.jpg",
    }
)

print(response)

This image depicts an animated character with a cheerful and vibrant appearance. The character has long, flowing blonde hair and is wearing a pair of stylish sunglasses on her head. She is dressed in a light, floral-patterned dress with lace details, which gives off a springtime vibe. The background features a scenic outdoor setting with blooming cherry blossoms and a soft, pastel color palette, enhancing the overall cheerful and serene atmosphere. The character is smiling warmly, adding to the pleasant and inviting feel of the image.


### Embed the image in the message

In [6]:
IMAGE_PATH = "use_case_data/railway/network rail.png"


def encode_image(image_path):
    # Open the image file and encode it as a base64 string
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


base64_image = encode_image(IMAGE_PATH)

response = chain.invoke(
    {
        "question": "Please describe this image.",
        "image_url": f"data:image/png;base64,{base64_image}",  # the file is a PNG
    }
)
print(response)

This image is a schematic diagram of the Doncaster Junction railway signaling layout. It shows the arrangement of tracks, signals, and other railway infrastructure elements. Here are some key features:

1. **Tracks**: The diagram shows multiple railway tracks, labeled as "UP MAIN" and "DOWN MAIN," indicating the direction of travel.
2. **Signals**: Various signals are marked with numbers (e.g., 1, 2, 3, 5, 6, 12, 13, 23, 24, 25, 26, 27, 28, 31, 34, 35, 36). These signals control train movements and ensure safe operation.
3. **Distances**: Distances from the signal box are indicated in yards (e.g., 1203 YDS. FROM BOX, 300 YDS. FROM BOX, 171 YDS. FROM BOX).
4. **Fixed Signals**: Some signals are marked as "FIXED," indicating they are permanently set and do not change.
5. **Electric Release**: There is a note about an electric release to a 2-lever ground frame.
6. **Spare Levers**: The diagram lists spare levers (1, 4, 8, 14, 15, 16, 17, 18, 19, 20, 21, 22, 29, 32, 33) that are not curren
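
The cell above writes the MIME type into the data URL by hand, which is easy to get out of step with the actual file (here a `.png`). A minimal sketch of one way to avoid that, assuming the file extension is trustworthy, is to derive the type with the standard `mimetypes` module. `to_data_url` and its `default_mime` parameter are hypothetical names introduced for this example.

```python
import base64
import mimetypes
from pathlib import Path


def to_data_url(image_path: str, default_mime: str = "application/octet-stream") -> str:
    """Encode a local file as a data URL, guessing the MIME type from the extension."""
    # guess_type returns (type, encoding); type is None for unknown extensions
    mime_type, _ = mimetypes.guess_type(image_path)
    payload = base64.b64encode(Path(image_path).read_bytes()).decode("utf-8")
    return f"data:{mime_type or default_mime};base64,{payload}"
```

With this helper, the `image_url` entry passed to `chain.invoke` could be written as `to_data_url(IMAGE_PATH)` instead of a hand-built f-string.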

### Call the Eden AI multimodal chat API directly

In [8]:
import base64
import os

import requests

# The API token is read from the environment instead of being hardcoded;
# EDENAI_API_KEY is an assumed variable name, e.g. set in the .env file loaded above.
headers = {"Authorization": f"Bearer {os.environ['EDENAI_API_KEY']}"}
url = "https://api.edenai.run/v2/multimodal/chat"


# Read the image file and encode it as a base64 string
with open(IMAGE_PATH, "rb") as image_file:
    base64_image = base64.b64encode(image_file.read()).decode("utf-8")
payload = {
    "providers": "openai, google",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "content": {"text": "Describe this image please!"},
                },
                {
                    "type": "media_base64",
                    "content": {
                        "media_base64": base64_image,
                        "media_type": "image/png",
                    },
                },
            ],
        }
    ],
    "chatbot_global_action": "",
}

response = requests.post(url, json=payload, headers=headers)
response.raise_for_status()  # fail early on HTTP errors
result = response.json()
print(result["openai"]["generated_text"])


This image is a schematic diagram of Doncaster Junction, a railway junction. The diagram includes various tracks, signals, and points (switches) that are part of the railway infrastructure. Here are some key elements:

1. **Tracks**: The diagram shows multiple railway tracks, labeled as "Down Main" and "Up Main," indicating the direction of travel.
2. **Signals**: There are several signals marked with red and white symbols, each with a number next to them. These signals control train movements and ensure safe operation.
3. **Points (Switches)**: The diagram includes points (switches) that allow trains to move from one track to another. These are indicated by the curved lines connecting different tracks.
4. **Distances**: Distances from a reference point (likely a signal box) are marked in yards, such as "1203 YDS. FROM BOX" and "790 YDS. FROM BOX."
5. **Fixed Signals**: Some signals are marked as "FIXED," indicating they are permanently set and do not change.
6. **Electric Release**: T
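
The payload above requests two providers (`"openai, google"`), but the cell only prints the `openai` answer. The sketch below collects the text for every requested provider that returned one. It assumes each provider appears as a top-level key holding a `generated_text` field, as `result["openai"]["generated_text"]` does above. `collect_generated_text` is a hypothetical helper, and `sample_result` is hand-written illustration data (including its `status` field), not real API output.

```python
def collect_generated_text(result: dict, providers: list[str]) -> dict[str, str]:
    """Return {provider: generated_text} for every provider that produced text."""
    texts = {}
    for provider in providers:
        entry = result.get(provider)
        # Skip providers that are missing or that returned no text (e.g. on failure)
        if isinstance(entry, dict) and "generated_text" in entry:
            texts[provider] = entry["generated_text"]
    return texts


# Made-up data shaped like the response used above; not real API output
sample_result = {
    "openai": {"generated_text": "A schematic diagram."},
    "google": {"status": "fail"},
}

for provider, text in collect_generated_text(sample_result, ["openai", "google"]).items():
    print(f"{provider}: {text}")
```

Looping over the requested providers rather than indexing one key directly avoids a `KeyError` when a provider is absent from the response.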