# Using the ZhipuAI Embedding API to embed text

This cookbook uses ZhipuAI's embedding-3 model to embed text. It covers:
1. Calling ZhipuAI's Embedding API with the SDK
2. Retrieving text with embeddings and a vector database

Before running this code, you need to install some additional dependencies:

In [1]:
!pip install zhipuai faiss-cpu scikit-learn scipy


## Set the API key

In [2]:
import os
from zhipuai import ZhipuAI

# Replace with your own key, or set ZHIPUAI_API_KEY in your shell and remove this line.
os.environ["ZHIPUAI_API_KEY"] = "your api key"
client = ZhipuAI()

## Simple Embedding SDK Usage
We first define a piece of text to embed. Here we show the most basic call, which embeds the entire text in a single request.

In [3]:
embedding_text = "hello world"
response = client.embeddings.create(
    model="embedding-3",
    input=embedding_text,
)

The response contains the following information:
1. The number of tokens used, located in `response.usage.total_tokens`.
2. The embedding vector, located in `response.data[0].embedding`. It has 2048 dimensions by default; 256, 512, or 1024 dimensions can optionally be requested instead.


In [4]:
response.usage.total_tokens

5

In [5]:
response.data[0].embedding

[0.0033012412,
 -0.0054249326,
 -0.03390876,
 0.021289637,
 -0.029081387,
 ...]
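Because the embedding is just a list of floats, two texts can be compared directly once both are embedded. Below is a minimal numpy sketch of cosine similarity; the helper name `cosine_similarity` is ours and is not part of the ZhipuAI SDK, and `other_embedding` in the usage comment stands for a hypothetical second vector obtained the same way.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (lists or arrays of floats)."""
    a = np.asarray(a, dtype=np.float32)
    b = np.asarray(b, dtype=np.float32)
    # Dot product divided by the product of the two L2 norms.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Usage, assuming a second embedding has been requested the same way:
# cosine_similarity(response.data[0].embedding, other_embedding)
```

A score close to 1 means the two vectors point in nearly the same direction; the retrieval section below relies on the same quantity.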

## Use vector database and search
Next, we try a more advanced workflow: splitting an article into chunks and storing their embeddings in a vector database. We take an excerpt from [AGENT AI: SURVEYING THE HORIZONS OF MULTIMODAL INTERACTION](https://arxiv.org/abs/2401.03568) and embed its paragraphs.
Because the excerpt is too long to treat as a single unit, we first split it into chunks. Here we use a naive fixed-size splitter, with no optimization, that cuts the text into consecutive 150-character chunks.


In [6]:
embedding_text = """
Multimodal Agent AI systems have many applications. In addition to interactive AI, grounded multimodal models could help drive content generation for bots and AI agents, and assist in productivity applications, helping to re-play, paraphrase, action prediction or synthesize 3D or 2D scenario. Fundamental advances in agent AI help contribute towards these goals and many would benefit from a greater understanding of how to model embodied and empathetic in a simulate reality or a real world. Arguably many of these applications could have positive benefits.

However, this technology could also be used by bad actors. Agent AI systems that generate content can be used to manipulate or deceive people. Therefore, it is very important that this technology is developed in accordance with responsible AI guidelines. For example, explicitly communicating to users that content is generated by an AI system and providing the user with controls in order to customize such a system. It is possible the Agent AI could be used to develop new methods to detect manipulative content - partly because it is rich with hallucination performance of large foundation model - and thus help address another real world problem.

For examples, 1) in health topic, ethical deployment of LLM and VLM agents, especially in sensitive domains like healthcare, is paramount. AI agents trained on biased data could potentially worsen health disparities by providing inaccurate diagnoses for underrepresented groups. Moreover, the handling of sensitive patient data by AI agents raises significant privacy and confidentiality concerns. 2) In the gaming industry, AI agents could transform the role of developers, shifting their focus from scripting non-player characters to refining agent learning processes. Similarly, adaptive robotic systems could redefine manufacturing roles, necessitating new skill sets rather than replacing human workers. Navigating these transitions responsibly is vital to minimize potential socio-economic disruptions.

Furthermore, the agent AI focuses on learning collaboration policy in simulation and there is some risk if directly applying the policy to the real world due to the distribution shift. Robust testing and continual safety monitoring mechanisms should be put in place to minimize risks of unpredictable behaviors in real-world scenarios. Our “VideoAnalytica" dataset is collected from the Internet and considering which is not a fully representative source, so we already go through-ed the ethical review and legal process from both Microsoft and University Washington. Be that as it may, we also need to understand biases that might exist in this corpus. Data distributions can be characterized in many ways. In this workshop, we have captured how the agent level distribution in our dataset is different from other existing datasets. However, there is much more than could be included in a single dataset or workshop. We would argue that there is a need for more approaches or discussion linked to real tasks or topics and that by making these data or system available.

We will dedicate a segment of our project to discussing these ethical issues, exploring potential mitigation strategies, and deploying a responsible multi-modal AI agent. We hope to help more researchers answer these questions together via this paper.

"""

chunk_size = 150
chunks = [embedding_text[i:i + chunk_size] for i in range(0, len(embedding_text), chunk_size)]
chunks

['\nMultimodal Agent AI systems have many applications. In addition to interactive AI, grounded multimodal models could help drive content generation for',
 ' bots and AI agents, and assist in productivity applications, helping to re-play, paraphrase, action prediction or synthesize 3D or 2D scenario. Funda',
 'mental advances in agent AI help contribute towards these goals and many would benefit from a greater understanding of how to model embodied and empat',
 'hetic in a simulate reality or a real world. Arguably many of these applications could have positive benefits.\n\nHowever, this technology could also be',
 ' used by bad actors. Agent AI systems that generate content can be used to manipulate or deceive people. Therefore, it is very important that this tec',
 'hnology is developed in accordance with responsible AI guidelines. For example, explicitly communicating to users that content is generated by an AI s',
 'ystem and providing the user with controls in order to customize 
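The slicing above can be wrapped in a reusable function. This is a sketch: the `chunk_text` name and its `overlap` parameter are our additions, not part of the original cookbook. Overlapping chunks are a common way to reduce the chance that a relevant phrase is cut in half at a chunk boundary; with `overlap=0` the function returns the same chunks as the list comprehension above.

```python
def chunk_text(text, chunk_size=150, overlap=0):
    """Split text into fixed-size character chunks.

    With overlap > 0, each chunk starts `overlap` characters before the
    previous chunk ended, so text near a boundary appears in both chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop before a start position whose chunk would lie entirely inside the
    # previous one; the last chunk still reaches the end of the text.
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]
```

For example, `chunk_text(embedding_text, chunk_size=150, overlap=30)` would produce chunks that share 30 characters with their neighbours.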

Next, we embed these chunks to obtain one 2048-dimensional vector per chunk, then store the vectors in a vector database (a FAISS index) for subsequent retrieval.


In [7]:
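from sklearn.preprocessing import normalize
import numpy as np
import faiss

# Embed all chunks in one request; response.data holds one item per input chunk.
response = client.embeddings.create(
    model="embedding-3",
    input=chunks,
)
embeddings = [item.embedding for item in response.data]

# L2-normalize each vector so that the inner product equals cosine similarity.
normalized_embeddings = normalize(np.array(embeddings, dtype="float32"))

# Exact (brute-force) inner-product index; the dimension (2048 by default) is read from the data.
d = normalized_embeddings.shape[1]
index = faiss.IndexFlatIP(d)
index.add(normalized_embeddings)

n_vectors = index.ntotal

n_vectors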

23
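The cell above embeds all chunks in a single request. For a longer document, the API may limit how many inputs one request accepts; we have not verified the exact limit, so `batch_size=16` below is an assumption. A minimal sketch that reuses the same `client.embeddings.create` call, split into batches; `embed_in_batches` is a hypothetical helper, not part of the SDK.

```python
def embed_in_batches(client, texts, model="embedding-3", batch_size=16):
    """Embed a list of texts, sending at most `batch_size` inputs per request.

    batch_size=16 is an assumed, conservative value; check ZhipuAI's
    documentation for the actual per-request input limit.
    """
    vectors = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        response = client.embeddings.create(model=model, input=batch)
        # Assumes response.data comes back in input order, as the cell above also assumes.
        vectors.extend(item.embedding for item in response.data)
    return vectors
```

Calling `embed_in_batches(client, chunks)` returns one vector per chunk, which can then be normalized and added to the index exactly as above.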

## Search
We can now use the vector database for retrieval. The following code implements a function called `match_text`, which finds the chunks in a collection that are most similar to a given input text.
The parameter `k` sets how many similar chunks to return.

In [8]:
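from sklearn.preprocessing import normalize
import numpy as np


def match_text(input_text, index, chunks, k=2):
    """Print and return the k chunks most similar to input_text."""
    k = min(k, len(chunks))

    # Embed the query with the same model used for the chunks.
    response = client.embeddings.create(
        model="embedding-3",
        input=input_text,
    )
    input_embedding = normalize(np.array([response.data[0].embedding], dtype="float32"))

    # With normalized vectors, the inner-product scores are cosine similarities.
    similarities, indices = index.search(input_embedding, k)

    matches = []
    for score, idx in zip(similarities[0], indices[0]):
        print(f"similarity: {score:.4f}\nmatching text: \n{chunks[idx]}\n")
        matches.append((float(score), chunks[idx]))
    return matches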


We can use this function to retrieve text. For example, we can retrieve the two chunks most similar to "VideoAnalytica dataset".

In [9]:
input_text = "VideoAnalytica dataset"

matched_texts = match_text(input_text=input_text, index=index, chunks=chunks, k=2)

similarity: 0.5165
matching text: 
oring mechanisms should be put in place to minimize risks of unpredictable behaviors in real-world scenarios. Our “VideoAnalytica" dataset is collecte

similarity: 0.4220
matching text: 
 be characterized in many ways. In this workshop, we have captured how the agent level distribution in our dataset is different from other existing da
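To make the search step concrete: `IndexFlatIP` performs an exact, brute-force inner-product search, and because the vectors here are normalized, each returned similarity is the cosine similarity between the query and a chunk. The numpy sketch below computes the same ranking by hand; `top_k_inner_product` is our own illustrative helper, not a FAISS function, and ties may be ordered differently than in FAISS.

```python
import numpy as np


def top_k_inner_product(query, matrix, k=2):
    """Brute-force top-k search by inner product.

    query:  1-D array of shape (d,)
    matrix: 2-D array of shape (n, d), one row per stored vector
    Returns (scores, indices), sorted by descending score.
    """
    scores = matrix @ query               # inner product with every stored vector
    k = min(k, len(scores))
    top = np.argsort(-scores)[:k]         # indices of the k largest scores
    return scores[top], top
```

With a normalized query vector `q` of shape `(2048,)` and `normalized_embeddings` from above, `top_k_inner_product(q, normalized_embeddings, k=2)` should select the same chunks as `index.search`. For a few dozen chunks this brute-force scan is sufficient; FAISS additionally offers optimized and approximate indexes for much larger collections.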

