In [1]:
from PIL import Image
import base64
import io

In [2]:
def image_to_base64(image_path):
    """Load an image, re-encode it as PNG, and return it as a base64 string."""
    # Open the image file
    with Image.open(image_path) as img:
        # Create a BytesIO object to hold the image data
        buffered = io.BytesIO()
        # Save the image to the BytesIO object in a specific format (e.g., PNG)
        img.save(buffered, format="PNG")
        # Get the byte data from the BytesIO object
        img_bytes = buffered.getvalue()
        # Encode the byte data to base64
        img_base64 = base64.b64encode(img_bytes).decode('utf-8')
        return img_base64

# Example usage
image_path = 'diagram1.png'  # Replace with your image path
base64_image = image_to_base64(image_path)
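
Note that `image_to_base64` re-encodes every image as PNG through PIL. When the file is already in a format the model accepts, a lighter variant is to base64-encode the file's bytes directly. The sketch below assumes that is acceptable for your input; `file_to_base64` is a hypothetical helper name, not part of the notebook above.

```python
import base64
from pathlib import Path


def file_to_base64(path):
    """Hypothetical helper: base64-encode a file's raw bytes without re-encoding."""
    # Read the file exactly as stored on disk
    raw = Path(path).read_bytes()
    # Encode to base64 and return it as a UTF-8 string
    return base64.b64encode(raw).decode("utf-8")
```

Decoding the result with `base64.b64decode` gives back the original bytes, which is a quick way to check the encoding step on its own.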

In [4]:
import ollama

# Use Ollama to analyze the image with Llama 3.2-Vision
response = ollama.chat(
    model="x/llama3.2-vision:11b",
    messages=[{
      "role": "user",
      "content": "Describe this image.",
      "images": [base64_image]
    }],
)

# Extract the model's response about the image
cleaned_text = response['message']['content'].strip()
print(f"Model Response: {cleaned_text}")

Model Response: The image presents a flowchart illustrating the process of generating text based on user input. The flowchart is divided into three sections, each representing a different stage in the text generation process.

*   **User Input**
    *   The first section of the flowchart begins with a box labeled "Query" that contains an example question: "Which mammal catches mice, enjoys eating fish and has a tail?"
    *   Below this box is another labeled "RAG" (Reinforcement Learning from Augmented Data), which indicates that the user's input will be processed using a reinforcement learning algorithm.
*   **Text Generation**
    *   The second section of the flowchart shows how the RAG model processes the user's input and generates text based on it.
    *   The box labeled "RAG" is connected to two other boxes: one labeled "Answer" and another labeled "Similar Photos".
    *   The "Answer" box contains a response generated by the RAG model, which in this case is "Cat, a mammal wit
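
The same request is repeated in the cells of this notebook with only the image and prompt changing. One possible way to avoid the repetition is a small wrapper, sketched here. `describe_image` and its `chat_fn` parameter are hypothetical names introduced for illustration; passing the chat function in as an argument is an assumption made so the helper can be tried with a stub when no Ollama server is running.

```python
from typing import Any, Callable, Mapping


def describe_image(
    image_b64: str,
    prompt: str,
    chat_fn: Callable[..., Mapping[str, Any]],
    model: str = "x/llama3.2-vision:11b",
) -> str:
    """Send one image and one prompt, return the stripped text of the reply."""
    # Build the same single-message request used in the cells above
    response = chat_fn(
        model=model,
        messages=[{"role": "user", "content": prompt, "images": [image_b64]}],
    )
    # The reply text lives under message -> content, as in the cells above
    return response["message"]["content"].strip()
```

In this notebook it would be called as `describe_image(base64_image, "Describe this image.", ollama.chat)`.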

In [5]:
image_path = 'diagram2.png'  # Replace with your image path
base64_image = image_to_base64(image_path)

response = ollama.chat(
    model="x/llama3.2-vision:11b",
    messages=[{
      "role": "user",
      "content": "Describe this image.",
      "images": [base64_image]
    }],
)

# Extract the model's response about the image
cleaned_text = response['message']['content'].strip()
print(f"Model Response: {cleaned_text}")

Model Response: The image presents a graph with two line charts, each displaying the accuracy of protein and enzyme datasets in relation to their respective k values. The purpose of the graph is to compare the performance of these two types of datasets.

* Two line charts are shown:
	+ One chart shows the accuracy of proteins
	+ The other chart shows the accuracy of enzymes
* The x-axis represents the k value, ranging from 0 to 5.
* The y-axis represents the accuracy percentage, ranging from 40% to 80%.
* Each chart has a distinct color scheme:
	+ Proteins are represented by blue lines
	+ Enzymes are represented by red lines

The main finding from this graph is that both proteins and enzymes have varying levels of accuracy across different k values. The accuracy of proteins generally increases as the k value increases, while the accuracy of enzymes decreases as the k value increases. This suggests that there may be differences in how these two types of datasets perform under different 

In [None]:
response = ollama.chat(
    model="x/llama3.2-vision:11b",
    messages=[{
      "role": "user",
      "content": "Describe the image in JSON format. The JSON should have a title, a description, and the detected data attributes.",
      "images": [base64_image]
    }],
)

# Extract the model's response about the image
cleaned_text = response['message']['content'].strip()
print(f"Model Response: {cleaned_text}")

Model Response: The image presents two line graphs that illustrate the accuracy of hyper-parameter studies on node classification with proteins and enzymes datasets. The graphs are titled "Figure 3: Hyper-parameter study with hopsk (Left) from 1 to 5 and topK (Right) on node classification with PROTEINS, and ENZYMES datasets with the setting in Table 1."

**Graph 1: Hopsk (Left)**

*   The x-axis represents the number of hops (from 1 to 5).
*   The y-axis represents accuracy (%).
*   Two lines are plotted:
    *   One line represents proteins, which starts at around 55% and increases to approximately 75%.
    *   Another line represents enzymes, which begins at about 65% and decreases to roughly 60%.

**Graph 2: TopK (Right)**

*   The x-axis represents the top K value (from 1 to 5).
*   The y-axis represents accuracy (%).
*   Two lines are plotted:
    *   One line represents proteins, which starts at around 50% and increases to approximately 70%.
    *   Another line represents enzym
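
The prompt asked for JSON, but the reply above is Markdown prose, so code that consumes the response should not assume it parses. A minimal sketch of a defensive parse follows. `extract_json` is a hypothetical helper, and the brace-matching fallback is a simple heuristic that assumes at most one JSON object in the text.

```python
import json
from typing import Any, Optional


def extract_json(text: str) -> Optional[Any]:
    """Return JSON parsed from text, or None if no parseable JSON is found."""
    # First try: the whole reply is valid JSON
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    # Fallback: take the span from the first "{" to the last "}"
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        return json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None
```

Calling `extract_json(cleaned_text)` on a prose reply like the one above would be expected to return `None`, which signals that the prompt needs tightening or the request should be retried.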