# Python SDK to access the GLM Vision API

This cookbook shows how to use the Python SDK to call the GLM-V API for simple visual understanding and analysis tasks.

Set up the API key:

In [1]:
import os
from zhipuai import ZhipuAI

os.environ["ZHIPUAI_API_KEY"] = "your api key"
client = ZhipuAI()


Next, convert the image to a base64-encoded data URL that can be sent with the request. Here we use the PIL library to do this.

In [2]:
import base64
import io
from PIL import Image

def image_to_base64(image_path):
    """
    Convert an image to a base64-encoded JPEG data URL.
    """
    with Image.open(image_path) as image:
        buffered = io.BytesIO()
        # JPEG has no alpha channel, so convert to RGB before saving.
        # To send PNG instead, change both the format here and the MIME type below.
        image.convert("RGB").save(buffered, format="JPEG")
        img_str = base64.b64encode(buffered.getvalue()).decode()
    return f"data:image/jpeg;base64,{img_str}"


base64_image = image_to_base64("data/zR.jpg")
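
The function above re-encodes the image as JPEG. As a hedged alternative, the sketch below reads the file's raw bytes and guesses the MIME type from the file name with the standard-library `mimetypes` module, so the image is not re-compressed. The name `file_to_data_url` is hypothetical, and the sketch assumes the API accepts this data URL in the same way as the JPEG one built above.

```python
import base64
import mimetypes
from pathlib import Path


def file_to_data_url(image_path):
    """Encode a file's raw bytes as a base64 data URL without re-encoding the image."""
    path = Path(image_path)
    # Guess the MIME type from the file extension; fall back to a generic type.
    mime_type, _ = mimetypes.guess_type(path.name)
    if mime_type is None:
        mime_type = "application/octet-stream"
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

Calling `file_to_data_url("data/zR.jpg")` would return a string starting with `data:image/jpeg;base64,`.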

In [3]:
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What does this image describe?"
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": base64_image
                }
            }
        ]
    }
]
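
If you build messages like this often, a small helper keeps the nesting in one place. The sketch below is illustrative only: `build_vision_message` is a hypothetical name, not part of the SDK, and it simply returns the same dictionary shape used in the cell above.

```python
def build_vision_message(text, image_url, role="user"):
    """Return one chat message holding a text part and an image part."""
    return {
        "role": role,
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }


# Placeholder data URL for illustration only; pass a real encoded image in practice.
example = build_vision_message(
    "What does this image describe?", "data:image/jpeg;base64,AAAA"
)
```

With the helper, the list above could be written as `messages = [build_vision_message("...", base64_image)]`.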

With the text and image assembled, follow the [official documentation](https://open.bigmodel.cn/dev/api#glm-4v) to pass the corresponding parameters and get the model's answer.

In [4]:
response = client.chat.completions.create(
    model="glm-4.5v",
    messages=messages,
    temperature=0.7,
    top_p=0.9,
    max_tokens=8192
)


The response contains the model's description of the image.

In [5]:
response.choices[0].message.content

'\nThe image is a minimalist, stylized illustration of a woman viewed from the side. She has long, flowing light pink hair that appears to be blowing in the wind. She wears a long, dark (likely navy or black) dress with a sleek silhouette. In her hand, she holds an object resembling a violin bow. The background is a solid dark color, which emphasizes the figure and gives the artwork a dramatic, elegant tone. The overall style is characterized by clean lines, flat colors, and simplified forms, creating a modern and sophisticated aesthetic.'

You can also ask follow-up questions about the image by keeping earlier questions and answers in the message history. Here we append a previous exchange and a new question to the conversation.

In [6]:
messages += [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "what is the color of the hair?"
            }
        ]
    },
    {
        "role": "assistant",
        "content": "It is pink"
    },
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What color are the hair and dress of this woman?"
            }
        ]
    },
]
messages[-1]

{'role': 'user',
 'content': [{'type': 'text',
   'text': 'What color are the hair and dress of this woman?'}]}

Now we send the request again and look at the model's response.

In [7]:
response = client.chat.completions.create(
    model="glm-4.5v",
    messages=messages,
    temperature=0.7,
    top_p=0.9
)
response.choices[0].message.content

'\nThe woman’s hair is pink, and her dress is dark (such as dark blue or black).'
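
The walkthrough hard-codes the assistant turn ("It is pink"). In a real conversation you would usually append the model's actual reply before asking the next question. The sketch below shows one way to do that; `extend_history` is a hypothetical helper, not part of the SDK, and it assumes the reply is the plain string found in `response.choices[0].message.content`.

```python
def extend_history(messages, assistant_reply, next_question):
    """Return a new message list with the assistant's reply and a follow-up question appended."""
    return messages + [
        # Replies in this walkthrough start with a newline, so strip whitespace.
        {"role": "assistant", "content": assistant_reply.strip()},
        {"role": "user", "content": [{"type": "text", "text": next_question}]},
    ]


# Stand-in strings for illustration; in practice pass the real reply text.
history = [
    {"role": "user", "content": [{"type": "text", "text": "What color is the hair?"}]}
]
history = extend_history(history, "\nIt is pink.", "What color is the dress?")
```

After a real request, the call would look like `messages = extend_history(messages, response.choices[0].message.content, "your next question")`.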