# Python SDK to access the ZhipuAI GLM-4 Vision API

This cookbook shows how to use the Python SDK to call the GLM-4V API for simple visual understanding and analysis tasks.

In [1]:
import os

os.environ["ZHIPUAI_API_KEY"] = "your api key"
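
The cell above supplies the key through the `ZHIPUAI_API_KEY` environment variable. A hardcoded key is easy to leak when a notebook is shared, so one option is to export the key in your shell and only check for it here. The following is a minimal sketch; `require_api_key` is a hypothetical helper written for this example, not part of the SDK.

```python
import os


def require_api_key(env=None, name="ZHIPUAI_API_KEY"):
    """Return the API key found in the environment, or raise if it is missing.

    Hypothetical helper for illustration only; it is not part of the zhipuai SDK.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    # Treat the tutorial's placeholder value as "not set".
    if not key or key == "your api key":
        raise RuntimeError(f"Set the {name} environment variable first.")
    return key
```

Calling `require_api_key()` before creating the client fails early with a readable message instead of an authentication error later on.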

First, we need to convert the image to base64 so that it can be uploaded. We use the PIL library for this.

In [2]:
import base64
import io
from zhipuai import ZhipuAI
from PIL import Image

client = ZhipuAI()


def image_to_base64(image_path):
    """
    Convert an image to a base64-encoded string.
    """
    with Image.open(image_path) as image:
        buffered = io.BytesIO()
        # JPEG cannot store alpha or palette modes, so convert to RGB first.
        rgb_image = image.convert("RGB")
        rgb_image.save(buffered, format="JPEG")  # or format="PNG", depending on your image.
        img_str = base64.b64encode(buffered.getvalue()).decode()
    return img_str


base64_image = image_to_base64("data/zR.jpg")
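
If the file on disk is already in a suitable format, re-encoding it through PIL is optional. Here is a minimal alternative sketch that encodes the file bytes as they are. It assumes the API accepts the base64 of the original file, which you should confirm in the official documentation. `bytes_to_base64` and `file_to_base64` are hypothetical names used only in this example.

```python
import base64


def bytes_to_base64(data: bytes) -> str:
    """Base64-encode raw bytes and return the result as an ASCII string."""
    return base64.b64encode(data).decode("ascii")


def file_to_base64(path: str) -> str:
    """Base64-encode a file's bytes as-is, without decoding the image."""
    with open(path, "rb") as f:
        return bytes_to_base64(f.read())
```

This avoids a lossy JPEG round trip, at the cost of not normalizing the image format before upload.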

In [3]:
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "what does this image depict?"
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": base64_image
                }
            }
        ]
    }
]
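
The message above combines one text part and one image part in a single user turn. When you send several images or questions, it can help to build that structure in one place. The following is a minimal sketch; `build_vision_message` is a hypothetical helper that only reproduces the dictionary layout shown above.

```python
def build_vision_message(text, image_b64):
    """Build one user message holding a text part and an image part.

    Hypothetical helper: it mirrors the message layout used in this
    cookbook and does not call the API.
    """
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_b64}},
        ],
    }


# Example with a short placeholder string instead of real image data.
example_message = build_vision_message("what does this image depict?", "QUJD")
```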

We have assembled the text and the image. Now we follow the [official documentation](https://open.bigmodel.cn/dev/api#glm-4v), pass in the corresponding parameters, and get the model's answer.

In [4]:
response = client.chat.completions.create(
    model="glm-4v",
    messages=messages,
)

This call returns the model's description of the image.

In [5]:
response

Completion(model='glm-4v', created=1705986870, choices=[CompletionChoice(index=0, finish_reason='stop', message=CompletionMessage(content='This is a black and white painting of a woman, with long hair flowing behind her. She wears a black dress that reaches the ground. The background is all dark.', role='assistant', tool_calls=None))], request_id='8313808876867429596', id='8313808876867429596', usage=CompletionUsage(prompt_tokens=1040, completion_tokens=35, total_tokens=1075))
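
The printed object nests the answer inside `choices[0].message.content` and reports token counts under `usage`. Below is a minimal sketch of reading those fields. To stay runnable without an API call it uses a stand-in object built with `SimpleNamespace` that copies the field names from the output above. `reply_text` and `fake_response` are hypothetical names, and the stand-in's content string is made up.

```python
from types import SimpleNamespace


def reply_text(response):
    """Return the text of the first choice in a chat completion response."""
    return response.choices[0].message.content


# Stand-in that mirrors the field names of the printed Completion above.
# It is not a real SDK object; the content string is illustrative.
fake_response = SimpleNamespace(
    choices=[
        SimpleNamespace(
            index=0,
            finish_reason="stop",
            message=SimpleNamespace(content="A painting of a woman.", role="assistant"),
        )
    ],
    usage=SimpleNamespace(prompt_tokens=1040, completion_tokens=35, total_tokens=1075),
)

print(reply_text(fake_response))
print(fake_response.usage.total_tokens)
```

With the real `response` from the cell above, `reply_text(response)` should give just the description string rather than the full object.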

You can also ask further questions about this image and keep the earlier questions and answers as conversation history. Next, we add more history to this conversation.

In [6]:
messages += [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "what is the color of the hair?"
            }
        ]
    },
    {
        "role": "assistant",
        "content": "It is pink"
    },
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "what are the colors of the hair and dress of this woman?"
            }
        ]
    },
]
messages[-1]

{'role': 'user',
 'content': [{'type': 'text',
   'text': 'what are the colors of the hair and dress of this woman?'}]}
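
The history above is written by hand, including the assistant turn. In a live conversation you would more likely append the model's actual reply before asking the next question. Here is a minimal sketch of that pattern; `add_turn` and `demo_history` are hypothetical names, and the reply string stands in for `response.choices[0].message.content`.

```python
def add_turn(history, assistant_reply, next_question):
    """Append the assistant's reply and a follow-up user question, in place."""
    history.append({"role": "assistant", "content": assistant_reply})
    history.append(
        {"role": "user", "content": [{"type": "text", "text": next_question}]}
    )
    return history


# Separate demo list so the notebook's own `messages` is left untouched.
demo_history = [
    {"role": "user", "content": [{"type": "text", "text": "what does this image depict?"}]}
]
add_turn(demo_history, "A painting of a woman.", "what is the color of the hair?")
```

Keeping user and assistant turns alternating this way gives the model the same context it would see in a normal multi-turn chat.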

Now we send the request again and look at the result the model returns.

In [7]:
response = client.chat.completions.create(
    model="glm-4v",
    messages=messages,
)
response

Completion(model='glm-4v', created=1705986872, choices=[CompletionChoice(index=0, finish_reason='stop', message=CompletionMessage(content='The woman has pink hair and a black dress.', role='assistant', tool_calls=None))], request_id='8313807158880493247', id='8313807158880493247', usage=CompletionUsage(prompt_tokens=1078, completion_tokens=11, total_tokens=1089))