# Image to Text with LCEL
### (with GPT-4o and maybe others)

Inspired by: https://tykimos.github.io/2024/05/15/image_descriptions_with_gpt_4o_and_lcel/

In [1]:
import base64

from devtools import debug
from dotenv import load_dotenv
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_core.messages.base import BaseMessage
from langchain_core.output_parsers import StrOutputParser

from python.ai_core.llm import get_llm

load_dotenv(verbose=True)

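# Note: `!export` runs in a temporary subshell, so it does not change this kernel's environment.
# Set PYTHONPATH before launching Jupyter if the `python` package cannot be imported.
!export PYTHONPATH=":./python"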

### Chain to query an image

In [2]:
def gen_prompt(param_dict: dict) -> list[BaseMessage]:
    """Build the system and human messages for a question about an image."""
    system_message = "You are a helpful assistant that kindly explains images and answers questions provided by the user."
    human_messages = [
        {
            "type": "text",
            "text": f"{param_dict['question']}",
        },
        {
            "type": "image_url",
            "image_url": {
                "url": f"{param_dict['image_url']}",
            },
        },
    ]
    return [SystemMessage(content=system_message), HumanMessage(content=human_messages)]


llm = get_llm(llm_id="gpt_4o_openai")
# These alternatives do not work:
# llm = get_llm(llm_id="gpt_4o_edenai")
# llm = get_llm(llm_id="gpt_4_azure")

chain = gen_prompt | llm | StrOutputParser()

2024-09-30 22:22:17.476 | INFO     | python.config:yaml_file_config:43 - load /home/tcl/prj/genai-blueprint/app_conf.yaml
2024-09-30 22:22:17.491 | INFO     | python.ai_core.llm:get_llm:319 - get LLM:'gpt_4o_openai' -configurable: False - streaming: False
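The expression `gen_prompt | llm | StrOutputParser()` works even though `gen_prompt` is a plain function, because the object on the right-hand side of `|` handles the operator and wraps the function. The toy sketch below illustrates that mechanism in pure Python. `Step`, `toy_prompt`, `toy_llm` and `toy_parser` are hypothetical names invented for this illustration; this is a simplified stand-in, not LangChain's implementation.

```python
from typing import Any, Callable


class Step:
    """Toy stand-in for a runnable: wraps a callable and supports `|`."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def invoke(self, value: Any) -> Any:
        return self.func(value)

    def __or__(self, other: "Step | Callable[[Any], Any]") -> "Step":
        # `self | other`: run self first, then feed its result to other.
        nxt = other if isinstance(other, Step) else Step(other)
        return Step(lambda value: nxt.invoke(self.invoke(value)))

    def __ror__(self, other: Callable[[Any], Any]) -> "Step":
        # `other | self` where `other` is a plain function: wrap it, then compose.
        return Step(other) | self


def toy_prompt(params: dict) -> str:
    # Hypothetical prompt builder, playing the role of gen_prompt.
    return f"Q: {params['question']}"


toy_llm = Step(lambda prompt: prompt.upper())  # pretend model
toy_parser = Step(lambda text: text.strip())   # pretend output parser

toy_chain = toy_prompt | toy_llm | toy_parser
print(toy_chain.invoke({"question": "what is this?"}))
```

A stand-in like this can also be handy for checking the prompt-building step without calling a paid API.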


### Send a URL to analyse

In [3]:
# Invoke the chain with the provided question and image URL
response = chain.invoke(
    {
        "question": "Please describe this person.",
        "image_url": "http://tyritarot.github.io/warehouse/2024/2024-4-7-shining_in_the_cherry_blossoms_and_just_me_title.jpg",
    }
)

print(response)

The image depicts an animated character with a cheerful and vibrant appearance. The character has long, flowing blonde hair and is wearing sunglasses on top of her head. She has large, expressive eyes and is smiling warmly. Her outfit is a light, floral dress with a white lace collar and green and pink flower patterns. She is also wearing earrings and a necklace, adding to her stylish look. The background features a scenic outdoor setting with blooming cherry blossoms and a group of people in the distance, suggesting a pleasant, springtime atmosphere.


### Embed the image in the message

In [4]:
IMAGE_PATH = "use_case_data/railway/network rail.png"


def encode_image(image_path: str) -> str:
    """Read an image file and return its contents as a base64 string."""
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


base64_image = encode_image(IMAGE_PATH)

response = chain.invoke(
    {
        "question": "Please describe this junction.",
        "image_url": f"data:image/png;base64,{base64_image}",
    }
)
print(response)

The diagram represents the layout of Doncaster Junction, a railway junction. Here is a detailed description of the junction:

1. **Tracks and Directions**:
   - The diagram shows multiple tracks converging and diverging at the junction.
   - The main tracks are labeled as "Down Main" and "Up Main," indicating the direction of travel.
   - There are additional tracks branching off from the main lines, including connections to Braithwell Junction and Bullcroft Junction.

2. **Signals**:
   - Various signals are marked with red rectangles and numbers, such as 1, 2, 3, 5, 6, 12, 13, 23, 24, 25, 26, 27, 28, 31, 34, 35, and 36.
   - These signals control the movement of trains through the junction, ensuring safe passage and preventing collisions.

3. **Points (Switches)**:
   - Points (or switches) are indicated by the curved lines connecting different tracks. These allow trains to move from one track to another.
   - Points are controlled by levers, which are numbered and listed at the bott
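The MIME type in a `data:` URL is meant to describe the encoded bytes, so a `.png` file would normally be sent as `image/png`. A minimal sketch that derives the type from the file name instead of hard-coding it is shown here. `to_data_url` and its `default_mime` parameter are hypothetical names for this example; the sketch only looks at the file extension and does not inspect the bytes.

```python
import base64
import mimetypes


def to_data_url(
    image_bytes: bytes,
    filename: str,
    default_mime: str = "application/octet-stream",
) -> str:
    """Build a base64 data URL whose MIME type is guessed from the file name."""
    # guess_type returns (type, encoding); type is None for unknown extensions.
    mime, _ = mimetypes.guess_type(filename)
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{mime or default_mime};base64,{encoded}"


# Tiny demonstration with placeholder bytes rather than a real image.
print(to_data_url(b"abc", "network rail.png"))
```

With a real file, the bytes would come from `open(IMAGE_PATH, "rb").read()` and the result could be passed as the `image_url` value of the chain input.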

In [5]:
import base64
import os

import requests

headers = {"Authorization": f"Bearer {os.environ['EDENAI_API_KEY']}"}
url = "https://api.edenai.run/v2/multimodal/chat"


# Read the image file and encode it as base64
with open(IMAGE_PATH, "rb") as image_file:
    base64_image = base64.b64encode(image_file.read()).decode("utf-8")
payload = {
    "providers": "openai, google",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "content": {"text": "Describe this image please!"},
                },
                {
                    "type": "media_base64",
                    "content": {
                        "media_base64": base64_image,
                        "media_type": "image/png",
                    },
                },
            ],
        }
    ],
    "chatbot_global_action": "",
}

response = requests.post(url, json=payload, headers=headers)
response.raise_for_status()  # fail early on an HTTP error instead of parsing an error body
result = response.json()
print(result["openai"]["generated_text"])

This image is a schematic diagram of the Doncaster Junction railway signaling layout. It shows the arrangement of tracks, signals, and points (switches) at the junction. Here are some key features:

1. **Title**: The top of the diagram is labeled "DONCASTER JUNCTION."
2. **Tracks**: The diagram includes multiple parallel tracks labeled "DOWN MAIN" and "UP MAIN," indicating the direction of travel.
3. **Signals**: Various signals are marked with numbers and symbols. For example, signals labeled "1," "2," "3," etc., are shown along the tracks.
4. **Points/Switches**: Points or switches are indicated by numbers and symbols, such as "12," "13," "23," etc., showing where trains can switch tracks.
5. **Distances**: Distances from the signal box are marked in yards, such as "1203 YDS. FROM BOX," "300 YDS. FROM BOX," etc.
6. **Fixed Signals**: Some signals are marked as "FIXED," indicating they are permanently set.
7. **Electric Release**: There is a note about "7 ELECTRIC RELEASE TO 2 LEVER G

In [6]:
print(result["google"]["generated_text"])

The image is a diagram of a railway junction called Doncaster Junction. It shows the layout of the tracks and the locations of various signals and switches. The diagram is labeled with numbers and distances from a signal box. There are also labels for the direction of the tracks, such as "Down Main" and "Up Main". The diagram also shows the location of a "piping lane crossing" and a "lever ground frame". The diagram is likely used by railway workers to understand the layout of the junction and to operate the signals and switches.
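Indexing `result["openai"]` and `result["google"]` directly raises a `KeyError` if a provider is missing from the response. A minimal, more defensive sketch is given below. It assumes only the response shape seen above (one top-level key per provider, each holding a `generated_text` field); what a failed provider actually returns is not shown in this notebook, so such entries are simply skipped. `collect_generated_text` and `sample_result` are hypothetical names.

```python
def collect_generated_text(result: dict) -> dict[str, str]:
    """Return {provider: generated_text} for every provider that has one."""
    texts: dict[str, str] = {}
    for provider, body in result.items():
        # Skip entries that are not dicts or lack the expected field.
        if isinstance(body, dict) and "generated_text" in body:
            texts[provider] = body["generated_text"]
    return texts


# Made-up sample data mimicking the shape used above (not a real API response).
sample_result = {
    "openai": {"generated_text": "A schematic diagram."},
    "google": {"generated_text": "A railway junction."},
    "other": {},  # assumption: a provider entry without generated text
}

for provider, text in collect_generated_text(sample_result).items():
    print(f"{provider}: {text}")
```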
