# Using ZhipuAI Embedding API to get the embedding of a text

This cookbook uses the `embedding-3` model provided by ZhipuAI to embed a piece of text. It covers the following:
1. Calling ZhipuAI's Embedding API through the SDK
2. Retrieving text with a vector database and embeddings

Before running this code, you need to install some additional dependencies:

In [1]:
!pip install faiss-cpu scikit-learn scipy

Next, set your API key and you're ready to go.

In [2]:
import os
from zhipuai import ZhipuAI

os.environ["ZHIPUAI_API_KEY"] = "your api key"
client = ZhipuAI()
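
Hard-coding the key in a notebook makes it easy to leak when the notebook is shared. As a minimal sketch, the helper below reads the key from the `ZHIPUAI_API_KEY` environment variable used above and fails early with a clear message if it is missing. `require_api_key` is a hypothetical name for illustration and is not part of the ZhipuAI SDK.

```python
import os


def require_api_key(env_var: str = "ZHIPUAI_API_KEY") -> str:
    """Return the API key stored in `env_var`, or raise if it is unset or empty.

    Hypothetical helper for illustration only; not part of the ZhipuAI SDK.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Environment variable {env_var} is not set")
    return key


# Example usage (assumes the key was exported before starting Jupyter):
# client = ZhipuAI(api_key=require_api_key())
```

Failing at setup time tends to be easier to debug than an authentication error on the first API call.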

## Simple Embedding SDK Usage
We first define a piece of text and convert it into a vector. This is the most basic way to call the API: the entire text is embedded in a single request.

In [3]:
embedding_text = "hello world"
response = client.embeddings.create(
    model="embedding-3",
    input=embedding_text,
)

The response contains two pieces of information:
1. The number of tokens used, in `response.usage.total_tokens`.
2. The embedding vector, in `response.data[0].embedding`. It has `2048` dimensions by default; 256, 512, or 1024 dimensions can also be selected.


In [4]:
response.usage.total_tokens

5

In [5]:
response.data[0].embedding

[0.0032958984,
 -0.005367279,
 -0.03390503,
 0.021347046,
 -0.029006958,
 0.011680603,
 0.0037059784,
 0.00084733963,
 -0.0043754578,
 0.02381897,
 0.031280518,
 0.009757996,
 0.012886047,
 0.006706238,
 0.0014133453,
 -0.0026474,
 0.0149002075,
 0.011657715,
 -0.023590088,
 -0.009277344,
 0.0054016113,
 0.0061683655,
 -0.0032138824,
 0.033477783,
 -0.016235352,
 -0.017349243,
 0.019012451,
 -0.046142578,
 -0.008666992,
 0.00894928,
 0.012924194,
 0.05355835,
 0.011459351,
 0.01474762,
 -0.0007472038,
 0.028686523,
 0.0055389404,
 -0.0014095306,
 0.03012085,
 0.013252258,
 -0.01197052,
 0.018737793,
 0.0067863464,
 -0.031555176,
 0.028305054,
 -0.0036087036,
 0.019821167,
 -0.024841309,
 0.008041382,
 0.013023376,
 -0.012321472,
 -0.0058135986,
 -0.002231598,
 0.0060272217,
 -0.0018377304,
 0.0014848709,
 0.0027179718,
 0.0124435425,
 -0.004611969,
 -0.031280518,
 -0.00415802,
 -0.027496338,
 0.0118255615,
 0.013374329,
 -0.020614624,
 0.028182983,
 -0.0024375916,
 -0.021514893,
 ...]
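
The vector size can be chosen per request. The sketch below wraps the call in a hypothetical helper, `embed_text`, and assumes that the SDK's `embeddings.create` accepts a `dimensions` keyword for `embedding-3`; please check this against the SDK version you have installed. The helper returns the vector together with the token count.

```python
from typing import List, Optional, Tuple


def embed_text(client, text: str, dimensions: Optional[int] = None) -> Tuple[List[float], int]:
    """Embed `text` and return (vector, total_tokens).

    Assumption: `client.embeddings.create` accepts a `dimensions` keyword.
    When `dimensions` is None, the keyword is omitted and the model default is used.
    """
    kwargs = {"model": "embedding-3", "input": text}
    if dimensions is not None:
        kwargs["dimensions"] = dimensions
    response = client.embeddings.create(**kwargs)
    return response.data[0].embedding, response.usage.total_tokens


# Example usage with the `client` created earlier:
# vector, tokens = embed_text(client, "hello world", dimensions=512)
# len(vector) should then be 512 if the assumption above holds
```

A smaller dimension reduces storage and search cost in the vector database, possibly at some cost in retrieval quality, so it is worth evaluating on your own data.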

## Use vector database and search

We can now try something more advanced, such as splitting an article into chunks and storing them in a vector database. We use several paragraphs from [AGENT AI: SURVEYING THE HORIZONS OF MULTIMODAL INTERACTION](https://arxiv.org/abs/2401.03568) and embed them.
Because the text is too long to embed as a single piece, we first split it. Here we use a naive sequential splitter with no optimization, which cuts the text into chunks of 150 characters each.


In [6]:
embedding_text = """
Multimodal Agent AI systems have many applications. In addition to interactive AI, grounded multimodal models could help drive content generation for bots and AI agents, and assist in productivity applications, helping to re-play, paraphrase, action prediction or synthesize 3D or 2D scenario. Fundamental advances in agent AI help contribute towards these goals and many would benefit from a greater understanding of how to model embodied and empathetic in a simulate reality or a real world. Arguably many of these applications could have positive benefits.

However, this technology could also be used by bad actors. Agent AI systems that generate content can be used to manipulate or deceive people. Therefore, it is very important that this technology is developed in accordance with responsible AI guidelines. For example, explicitly communicating to users that content is generated by an AI system and providing the user with controls in order to customize such a system. It is possible the Agent AI could be used to develop new methods to detect manipulative content - partly because it is rich with hallucination performance of large foundation model - and thus help address another real world problem.

For examples, 1) in health topic, ethical deployment of LLM and VLM agents, especially in sensitive domains like healthcare, is paramount. AI agents trained on biased data could potentially worsen health disparities by providing inaccurate diagnoses for underrepresented groups. Moreover, the handling of sensitive patient data by AI agents raises significant privacy and confidentiality concerns. 2) In the gaming industry, AI agents could transform the role of developers, shifting their focus from scripting non-player characters to refining agent learning processes. Similarly, adaptive robotic systems could redefine manufacturing roles, necessitating new skill sets rather than replacing human workers. Navigating these transitions responsibly is vital to minimize potential socio-economic disruptions.

Furthermore, the agent AI focuses on learning collaboration policy in simulation and there is some risk if directly applying the policy to the real world due to the distribution shift. Robust testing and continual safety monitoring mechanisms should be put in place to minimize risks of unpredictable behaviors in real-world scenarios. Our “VideoAnalytica" dataset is collected from the Internet and considering which is not a fully representative source, so we already go through-ed the ethical review and legal process from both Microsoft and University Washington. Be that as it may, we also need to understand biases that might exist in this corpus. Data distributions can be characterized in many ways. In this workshop, we have captured how the agent level distribution in our dataset is different from other existing datasets. However, there is much more than could be included in a single dataset or workshop. We would argue that there is a need for more approaches or discussion linked to real tasks or topics and that by making these data or system available.

We will dedicate a segment of our project to discussing these ethical issues, exploring potential mitigation strategies, and deploying a responsible multi-modal AI agent. We hope to help more researchers answer these questions together via this paper.

"""

chunk_size = 150
chunks = [embedding_text[i:i + chunk_size] for i in range(0, len(embedding_text), chunk_size)]
chunks

['\nMultimodal Agent AI systems have many applications. In addition to interactive AI, grounded multimodal models could help drive content generation for',
 ' bots and AI agents, and assist in productivity applications, helping to re-play, paraphrase, action prediction or synthesize 3D or 2D scenario. Funda',
 'mental advances in agent AI help contribute towards these goals and many would benefit from a greater understanding of how to model embodied and empat',
 'hetic in a simulate reality or a real world. Arguably many of these applications could have positive benefits.\n\nHowever, this technology could also be',
 ' used by bad actors. Agent AI systems that generate content can be used to manipulate or deceive people. Therefore, it is very important that this tec',
 'hnology is developed in accordance with responsible AI guidelines. For example, explicitly communicating to users that content is generated by an AI s',
 'ystem and providing the user with controls in order to customize ...',
 ...]
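
The fixed-size split above cuts words and sentences in half (for example `Funda` / `mental`), so a phrase that straddles a boundary may not be well represented in either chunk. One common mitigation is to let consecutive chunks overlap. The following is a minimal sketch of that idea; `chunk_with_overlap` and the overlap of 30 characters are illustrative choices, not part of the original notebook.

```python
from typing import List


def chunk_with_overlap(text: str, chunk_size: int = 150, overlap: int = 30) -> List[str]:
    """Split `text` into chunks of `chunk_size` characters.

    Each chunk starts `chunk_size - overlap` characters after the previous one,
    so neighbouring chunks share `overlap` characters.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk reaches the end of the text,
        # so no trailing chunk is fully contained in the previous one.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


print(chunk_with_overlap("abcdefghij", chunk_size=4, overlap=1))
# ['abcd', 'defg', 'ghij']
```

Overlap increases the number of chunks, and therefore the number of tokens embedded, so the value is a trade-off between cost and recall.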

Next, we embed each chunk, which yields one 2048-dimensional vector per chunk. We then store these vectors in a vector database (here, a FAISS index) so they can be searched later.


In [7]:
from sklearn.preprocessing import normalize
import numpy as np
import faiss

# Embed all chunks in a single request
response = client.embeddings.create(
    model="embedding-3",
    input=chunks,
)

embeddings = [item.embedding for item in response.data]

# L2-normalize the vectors so that inner product equals cosine similarity
normalized_embeddings = normalize(np.array(embeddings).astype('float32'))
d = normalized_embeddings.shape[1]  # embedding dimension (2048 by default)
index = faiss.IndexFlatIP(d)
index.add(normalized_embeddings)

n_vectors = index.ntotal

n_vectors

23
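
The index above is an inner-product index (`IndexFlatIP`) filled with L2-normalized vectors. For unit-length vectors the inner product is the same as cosine similarity, which is why normalization comes first. The sketch below checks this on two toy vectors in plain Python; the function names are illustrative and the vectors are made up for the example.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale `vec` to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between `a` and `b` (both must be non-zero)."""
    return inner_product(a, b) / (
        math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    )


a = [3.0, 4.0]
b = [4.0, 3.0]

# Both expressions give the same value, up to floating-point rounding
print(cosine_similarity(a, b))
print(inner_product(l2_normalize(a), l2_normalize(b)))
```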

## Search

We can now use the vector database for retrieval. The following code implements a function called `match_text`, which finds the text chunks most similar to a given input text. `k` is the number of chunks to return.

In [8]:
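from sklearn.preprocessing import normalize


def match_text(input_text, index, chunks, k=2):
    """Print and return the k chunks most similar to input_text."""
    k = min(k, len(chunks))

    # Embed the query with the same model used for the chunks
    response = client.embeddings.create(
        model="embedding-3",
        input=input_text,
    )
    input_embedding = response.data[0].embedding
    input_embedding = normalize(np.array([input_embedding]).astype('float32'))

    # With normalized vectors, the inner-product scores are cosine similarities
    scores, indices = index.search(input_embedding, k)

    matches = []
    for score, idx in zip(scores[0], indices[0]):
        print(f"similarity: {score:.4f}\nmatching text: \n{chunks[idx]}\n")
        matches.append((float(score), chunks[idx]))
    return matches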


We can use this function to retrieve text. For example, we can retrieve the chunks most similar to "VideoAnalytica dataset".

In [9]:
input_text = "VideoAnalytica dataset"

matched_texts = match_text(input_text=input_text, index=index, chunks=chunks, k=2)

similarity: 0.5168
matching text: 
oring mechanisms should be put in place to minimize risks of unpredictable behaviors in real-world scenarios. Our “VideoAnalytica" dataset is collecte

similarity: 0.4217
matching text: 
 be characterized in many ways. In this workshop, we have captured how the agent level distribution in our dataset is different from other existing da
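
Because the index stores normalized vectors, the scores above are cosine similarities, so a caller may want to keep only matches above some cutoff. The sketch below is a hypothetical post-processing step, not part of the original notebook. The name `filter_matches` and the cutoff of 0.45 are illustrative, and a suitable threshold depends on the data. It also skips the index `-1`, which FAISS uses to pad results when fewer than `k` neighbours are found.

```python
from typing import List, Sequence, Tuple


def filter_matches(
    scores: Sequence[float],
    indices: Sequence[int],
    chunks: Sequence[str],
    min_score: float,
) -> List[Tuple[float, str]]:
    """Return (score, chunk) pairs whose score is at least `min_score`."""
    matches = []
    for score, idx in zip(scores, indices):
        if idx < 0:  # FAISS pads missing neighbours with -1
            continue
        if score >= min_score:
            matches.append((float(score), chunks[idx]))
    return matches


# Toy data standing in for one row of the `index.search` output
toy_chunks = ["chunk a", "chunk b", "chunk c"]
print(filter_matches([0.5168, 0.4217, 0.1], [2, 0, -1], toy_chunks, min_score=0.45))
# [(0.5168, 'chunk c')]
```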

