# Image to Text with LCEL
### (with GPT-4o and possibly other multimodal models)

Inspired by: https://tykimos.github.io/2024/05/15/image_descriptions_with_gpt_4o_and_lcel/

In [1]:
import base64
from pathlib import Path

from dotenv import load_dotenv
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_core.messages.base import BaseMessage
from langchain_core.output_parsers import StrOutputParser

from src.ai_core.llm import get_llm

load_dotenv(verbose=True)

[32m2025-04-02 10:56:44.243[0m | [1mINFO    [0m | [36msrc.utils.config_mngr[0m:[36msingleton[0m:[36m99[0m - [1mselected config=training_edenai[0m
[32m2025-04-02 10:56:44.248[0m | [1mINFO    [0m | [36msrc.ai_core.cache[0m:[36mset_method[0m:[36m89[0m - [1mLLM cache : InMemoryCache[0m


True
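Both the LangChain path and the direct Eden AI call further down read API keys from the environment that `load_dotenv` populates. Below is a minimal sketch of failing fast when a key is missing. `require_env` is a hypothetical helper, and the variable names are assumptions about your `.env`: `OPENAI_API_KEY` is the name the OpenAI SDK reads by default, and `EDENAI_API_KEY` is the one this notebook uses later.

```python
import os


def require_env(*names: str) -> dict[str, str]:
    """Return the requested environment variables, or raise listing every missing one.

    Hypothetical helper: call it right after load_dotenv() so that a missing key
    fails here rather than deep inside an API call.
    """
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in names}


# Usage (assumed variable names, adjust to your .env):
# keys = require_env("OPENAI_API_KEY", "EDENAI_API_KEY")
```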

### Chain to query an image

In [3]:
def gen_prompt(param_dict: dict) -> list[BaseMessage]:
    """Build a system message plus a multimodal human message from a question and an image URL."""
    system_message = (
        "You are a helpful assistant that kindly explains images and answers questions provided by the user."
    )
    # The human message combines a text part and an image part (OpenAI-style content blocks)
    human_content = [
        {"type": "text", "text": param_dict["question"]},
        {"type": "image_url", "image_url": {"url": param_dict["image_url"]}},
    ]
    return [SystemMessage(content=system_message), HumanMessage(content=human_content)]


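llm = get_llm(llm_id="gpt_4o_openai")
# These alternative LLM IDs did not work for this image query:
# llm = get_llm(llm_id="gpt_4o_edenai")
# llm = get_llm(llm_id="qwen2_vl72_openrouter")
# llm = get_llm(llm_id="llava_16_ollama")
# The plain function is coerced into a runnable, then piped into the LLM and the string parser
chain = gen_prompt | llm | StrOutputParser()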

[32m2025-04-02 10:58:21.616[0m | [1mINFO    [0m | [36msrc.ai_core.llm[0m:[36mget_llm[0m:[36m497[0m - [1mget LLM:'gpt_4o_openai'[0m


### Embed the image in the message

In [4]:
# IMAGE_PATH = "use_case_data/railway/network rail.png"


# Adjust REPO to the folder that contains the image on your machine
REPO = Path("/mnt/c/Users/a184094/OneDrive - Eviden/_ongoing/training GenAI/")
IMAGE_PATH = REPO / "network rail.png"


def encode_image(image_path: Path) -> str:
    # Open the image file and encode it as a base64 string
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


base64_image = encode_image(IMAGE_PATH)
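The chain expects the image as a URL. For a local file, that means a `data:` URL that embeds the base64 payload together with its MIME type. Below is a minimal sketch, using a hypothetical helper `to_data_url`, that derives the MIME type from the file extension with the standard-library `mimetypes` module. This keeps a `.png` file from being labelled `image/jpeg`.

```python
import base64
import mimetypes
from pathlib import Path


def to_data_url(image_path: Path) -> str:
    """Encode a local image file as a data URL, e.g. 'data:image/png;base64,...'."""
    mime_type, _ = mimetypes.guess_type(image_path.name)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot infer an image MIME type for {image_path}")
    encoded = base64.b64encode(image_path.read_bytes()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


# Usage with the chain above (assumes IMAGE_PATH points to an existing image):
# chain.invoke({"question": "Please describe this junction.", "image_url": to_data_url(IMAGE_PATH)})
```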

In [6]:
base64_image

'iVBORw0KGgoAAAANSUhEUgAAB0kAAAMPCAYAAACwoe2zAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsMAAA7DAcdvqGQAAP+lSURBVHhe7N0HuG1Fef/xbTRIEQtgb4gIKkFiR8TYiBVjSxR7SWKMxm40Gk2MLWoUNdYosWsSe2yxYC9ETVAxFkRUJBo7KAiKmPO/n5Hf+Y8ra++79zn7XG55v88zz1pr1sw777xT9uyZNWud53//939XJkVRFEVRFEVRFEVRFEVRFEVRFEVRFDsIv3HOsSiKoiiKoiiKoiiKoiiKoiiKoiiKYoegFkmLoiiKoiiKoiiKoiiKoiiKoiiKotihqEXSoiiKoiiKoiiKoiiKoiiKoiiKoih2KGqRtCiKoiiKoiiKoiiKoiiKoiiKoiiKHYpaJC2KoiiKoiiKoiiKoiiKoiiKoiiKYoeiFkmLoiiKoiiKoiiKoiiKoiiKoiiKotihqEXSoiiKoiiKoiiKoiiKoiiKoiiKoih2KGqRtCiKoiiKoiiKoiiKoiiKoiiKoiiKHYpaJC2KoiiKoiiKoiiKoiiKoiiKoiiKYoeiFkmLoiiKoiiKoiiKoiiKoiiKoiiKotihqEXSoiiKoiiKoiiKoiiKoiiKoiiKoih2KGqRtCiKoiiKoiiKoiiKoiiKoiiKoiiKHYpaJC2KoiiKoiiKoiiKoiiKoiiKoiiKYoeiFkmLoiiKoiiKoiiKoiiKoiiKoiiKotihqEXSoiiKoiiKoiiKoiiKoiiKoiiKoih2KGqRtCiKoiiKoiiKoiiKoiiKoiiKoiiKHYpaJC2KoiiKoiiKoiiKoiiKoiiKoiiKYoeiFkmLoiiKoiiKoiiKoiiKoiiKoiiKotihqEXSoiiKoiiKoiiKoii2S1ZWVs45m0zOc57znHNWFEVRFEVRFEVRi6RFURRFURRFURRFUWzHWCi1QPq///u/k1/+8pfn+BZFURRFURR

In [5]:
response = chain.invoke(
    {
        "question": "Please describe this junction.",
        "image_url": f"data:image/jpeg;base64,{base64_image}",
    }
)
print(response)

The diagram represents Doncaster Junction, a railway junction layout. Here's a description of its features:

1. **Tracks**: There are multiple tracks labeled as "Down Main" and "Up Main," indicating the direction of train travel.

2. **Signals**: Various signals are marked with numbers (e.g., 1, 2, 3, etc.) and are positioned at different distances from the signal box. These control train movements and ensure safety.

3. **Points/Switches**: The diagram includes points (switches) that allow trains to move from one track to another. These are indicated by numbers like 12, 13, 23, etc.

4. **Distances**: Distances from the signal box to various points and signals are noted (e.g., 1203 yards, 300 yards).

5. **Fixed Signals**: Some signals are marked as "FIXED," indicating they are permanently set and do not change.

6. **Electric Release**: There is an electric release mechanism for a 2-lever ground frame, which is used to control certain points or signals.

7. **Spare Levers**: The diag

In [7]:
import os

import requests

headers = {"Authorization": f"Bearer {os.environ['EDENAI_API_KEY']}"}
url = "https://api.edenai.run/v2/multimodal/chat"


# Function to read the image file and convert it to base64
with open(IMAGE_PATH, "rb") as image_file:
    base64_image = base64.b64encode(image_file.read()).decode("utf-8")
payload = {
    "providers": "openai, google",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "content": {"text": "Describe this image please!"},
                },
                {
                    "type": "media_base64",
                    "content": {
                        "media_base64": base64_image,
                        "media_type": "image/png",
                    },
                },
            ],
        }
    ],
    "chatbot_global_action": "",
}

response = requests.post(url, json=payload, headers=headers)
result = response.json()
print(result["openai"]["generated_text"])

The image is a schematic diagram of the Doncaster Junction railway layout. It shows various tracks, signals, and junctions. Key features include:

- Multiple tracks labeled as "Down Main" and "Up Main."
- Signals are marked with numbers and some are labeled with distances from a reference point, such as "300 YDS. FROM BOX."
- There are fixed signals and electric releases indicated.
- The diagram includes junctions leading to other locations, such as "FROM BRAITHWELL JCN." and "TO BULLCROFT JCN."
- Spare levers are listed at the bottom, numbered 1 to 33 with some numbers missing.
- The layout includes sidings and crossings, such as "PIPERING LANE CROSSING."

Overall, it provides a detailed view of the railway infrastructure at Doncaster Junction.


In [8]:
print(result["google"]["generated_text"])

The image is a diagram of Doncaster Junction, showing the layout of the railway tracks and signaling system.  The diagram is highly technical, using numbered points to represent switches and signals, along with distances from a central "box" (likely a signal box).  The diagram includes:

* **Track Layout:** Multiple tracks are shown, labeled as "Up Main" and "Down Main," along with various connecting lines and sidings.  The tracks lead to and from different junctions (Braithwell Jcn., Bullcroft Jcn.).
* **Signal Numbers:**  Numbers (1, 2, 3, etc.) represent individual signals or points (switches) in the system.  Red rectangles indicate the location of these signals/points on the tracks.
* **Distances:** Distances in yards from a central signal box are indicated for various points along the tracks.
* **Lever Numbers:** A key indicates which lever numbers in the signal box control specific points and signals.
* **Special Features:**  The diagram notes an "electric release" to a ground fr