# Python SDK to access the ZhipuAI GLM-4 Vision API

This cookbook shows how to use the Python SDK to call the GLM-4V API for simple visual understanding and analysis tasks.

In [1]:
import os
from zhipuai import ZhipuAI
os.environ["ZHIPUAI_API_KEY"] = "your api key"  # replace with your own API key
client = ZhipuAI()
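
The placeholder string above has to be replaced before any request will succeed. As a minimal sketch, the check below fails early when the key is missing or still the placeholder. The helper name `require_api_key` is hypothetical and is not part of the SDK.

```python
import os


def require_api_key(env=None, name="ZHIPUAI_API_KEY"):
    """Return the API key from the environment, or raise if it is unusable.

    Hypothetical helper for illustration; not part of the zhipuai SDK.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    # Reject an unset key and the tutorial's placeholder value.
    if not key or key == "your api key":
        raise RuntimeError(f"Set the {name} environment variable to a real API key.")
    return key
```

Calling `require_api_key()` before `ZhipuAI()` turns a confusing authentication failure into an immediate, readable error.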


First, we need to convert the image to base64 so that it can be uploaded. Here we use the PIL library to do this.

In [2]:
import base64
import io
from PIL import Image

def image_to_base64(image_path):
    """
    Convert an image to a base64-encoded JPEG data URL.
    """
    with Image.open(image_path) as image:
        buffered = io.BytesIO()
        # JPEG has no alpha channel, so convert RGBA/palette images to RGB first.
        image.convert("RGB").save(buffered, format="JPEG")
        img_str = base64.b64encode(buffered.getvalue()).decode()
    return f"data:image/jpeg;base64,{img_str}"


base64_image = image_to_base64("data/zR.jpg")
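
The helper above always re-encodes the image as JPEG. As an alternative sketch, the file's raw bytes can be encoded directly, with the MIME type guessed from the file extension. This assumes the API accepts data URLs for formats other than JPEG, which the source does not confirm. The name `file_to_data_url` is hypothetical.

```python
import base64
import mimetypes
from pathlib import Path


def file_to_data_url(image_path):
    """Encode a file's raw bytes as a base64 data URL (hypothetical helper)."""
    path = Path(image_path)
    # Guess the MIME type from the extension, e.g. ".png" -> "image/png".
    mime_type, _ = mimetypes.guess_type(path.name)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Not a recognized image file: {image_path}")
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

This avoids a lossy re-encode and does not need PIL, at the cost of trusting the file extension.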

In [3]:
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What does this image depict?"
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": base64_image
                }
            }
        ]
    }
]
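
The message structure above can be wrapped in a small builder so that text-only and text-plus-image turns share one code path. This is a sketch that mirrors the dictionary layout shown in the cell. The function name `build_user_message` is hypothetical.

```python
def build_user_message(text, image_url=None):
    """Build a user message in the layout used above (hypothetical helper)."""
    content = [{"type": "text", "text": text}]
    if image_url is not None:
        # The image part takes a URL, here a base64 data URL.
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    return {"role": "user", "content": content}
```

For example, `messages = [build_user_message("What does this image depict?", base64_image)]` produces the same shape as the hand-written list.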

We have assembled the text prompt and the image. Now, following the [official documentation](https://open.bigmodel.cn/dev/api#glm-4v), we pass in the corresponding parameters and get the model's answer.

In [4]:
response = client.chat.completions.create(
    model="glm-4v",
    messages=messages,
    temperature=0.7,
    top_p=0.9
)

temperature:0.7
top_p:0.9


With this request, we get the model's description of the image.

In [5]:
response

Completion(model='glm-4v', created=1714304990, choices=[CompletionChoice(index=0, finish_reason='stop', message=CompletionMessage(content="The image depicts a stylized illustration of a figure, viewed from the side, set against a dark background. The figure is wearing a long, flowing dress that drapes elegantly to the floor, with the hem and sleeves showing gradients of color transitioning from a light tone at the top to a darker shade towards the bottom and edges. One hand of the figure is slightly raised, holding what appears to be a violin by its neck, indicating that the figure may be a musician or a representation of one. The violin is also depicted in a simplified form, aligning with the overall abstract and artistic nature of the illustration. The movement suggested by the figure's posture and the flow of the dress gives a sense of grace and poise. The use of negative space and simplicity in design creates an elegant and sophisticated mood.", role='assistant', tool_calls=None))]
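
The printed object is verbose. Going by the fields visible in the output above (`choices`, `message`, `content`), the answer text can be pulled out as sketched below. The stand-in object is built with `SimpleNamespace` only so that the snippet runs without an API call. `reply_text` is a hypothetical name.

```python
from types import SimpleNamespace


def reply_text(response):
    """Return the text of the first choice (hypothetical helper)."""
    return response.choices[0].message.content


# Stand-in with the same attribute layout as the printed Completion object.
fake_response = SimpleNamespace(
    choices=[
        SimpleNamespace(
            index=0,
            finish_reason="stop",
            message=SimpleNamespace(content="A figure holding a violin.", role="assistant"),
        )
    ]
)
print(reply_text(fake_response))
```

With a real response, `reply_text(response)` returns just the description string.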

You can also ask further questions about this image and keep the earlier questions and answers as conversation history. Now, let's append some new turns to this conversation.

In [6]:
messages += [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "what is the color of the hair?"
            }
        ]
    },
    {
        "role": "assistant",
        "content": "It is pink"
    },
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What are the colors of this woman's hair and dress?"
            }
        ]
    },
]
messages[-1]

{'role': 'user',
 'content': [{'type': 'text',
   'text': "What are the colors of this woman's hair and dress?"}]}
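
The assistant turn in the cell above is written by hand. As a sketch of the other option, the model's actual reply can be appended to the history before the next question. The helper name `extend_history` is hypothetical, and the stand-in response exists only so that the snippet runs offline.

```python
from types import SimpleNamespace


def extend_history(messages, response, next_question):
    """Append the model's reply and a follow-up question (hypothetical helper)."""
    reply = response.choices[0].message.content
    # Assistant turns use a plain string, as in the cell above.
    messages.append({"role": "assistant", "content": reply})
    messages.append(
        {"role": "user", "content": [{"type": "text", "text": next_question}]}
    )
    return messages


# Stand-in for a real response, with the same attribute layout.
previous = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content="It is pink"))]
)
history = [
    {"role": "user", "content": [{"type": "text", "text": "what is the color of the hair?"}]}
]
extend_history(history, previous, "What color is the dress?")
```

The list is modified in place, so the same `messages` object can be passed straight to the next request.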

Now, we send the request again and look at the result returned by the model.

In [7]:
response = client.chat.completions.create(
    model="glm-4v",
    messages=messages,
    temperature=0.7,
    top_p=0.9
)
response

temperature:0.7
top_p:0.9


Completion(model='glm-4v', created=1714304992, choices=[CompletionChoice(index=0, finish_reason='stop', message=CompletionMessage(content='The woman in the image has long, flowing hair that appears to be a pale pink or blush color. Her dress is dark, likely black or a very deep navy blue.', role='assistant', tool_calls=None))], request_id='8611192515567253881', id='8611192515567253881', usage=CompletionUsage(prompt_tokens=1706, completion_tokens=37, total_tokens=1743))
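
The output above includes a `usage` field with `prompt_tokens`, `completion_tokens`, and `total_tokens`. Since the full history, including the image, is sent with every request, it can be useful to track the totals. The sketch below sums them over several responses. `total_tokens_used` is a hypothetical name, and the stand-in objects reuse the numbers printed above.

```python
from types import SimpleNamespace


def total_tokens_used(responses):
    """Sum total_tokens over responses that carry usage data (hypothetical helper)."""
    total = 0
    for response in responses:
        usage = getattr(response, "usage", None)
        if usage is not None:
            total += usage.total_tokens
    return total


# Stand-ins mirroring the usage fields shown in the output above.
fake_responses = [
    SimpleNamespace(
        usage=SimpleNamespace(prompt_tokens=1706, completion_tokens=37, total_tokens=1743)
    ),
    SimpleNamespace(),  # a response without usage data is skipped
]
print(total_tokens_used(fake_responses))  # prints 1743
```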