 [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/LotsoTeddy/ark-samples/tutorial.ipynb)



<hr/>
<img src="https://portal.volccdn.com/obj/volcfe/logo/appbar_logo_dark.2.svg?sanitize=true" align=center>
<hr/>

# Introduction

This is a **novice-friendly tutorial** for the Volcengine ARK SDK and API. It helps you build your own intelligent applications with agents, knowledge bases, and other features.

[Volcengine ARK](https://www.volcengine.com/product/ark) is a development platform for large model services. It offers feature-rich, secure, and competitively priced model calling services, as well as end-to-end functions such as model data management, fine-tuning, inference, and evaluation, to support the development and deployment of your AI applications.

## Overview

### Why ARK?

ARK is a platform for running many kinds of models. It has the following advantages:

- **Security and Mutual Trust**: The large model security and trust program strictly protects the models and data of both model providers and model users (see the white paper on security and mutual trust).
- **Selected Models**: ARK supports models from multiple industries for various business scenarios, and provides rich platform applications and tools to help you build your own innovative scenarios.
- **Strong Computing Power**: Backed by Volcengine's 10,000-GPU-scale resource pool, ARK provides ample high-performance GPU resources for end-to-end model services, including fine-tuning, evaluation, and inference.
- **Enterprise-level Services**: ARK provides a professional service system, including product operation, sales, and delivery services, to meet the needs of building and delivering enterprise applications.

### Products

- Large models (e.g., Doubao-*, Deepseek-*, etc.)
- Knowledge base
- ...

## Setup

### Installation

Install the Volcengine ARK SDK and the ARK Agent SDK (Arkitect) via `pip`, along with `chromadb`, which is used later to build the knowledge base.

The source code of the ARK SDK is available [here](https://github.com/volcengine/volcengine-python-sdk).
The source code of the ARK Agent SDK is available [here](https://github.com/volcengine/ai-app-lab/tree/main/arkitect).

In [None]:
%pip install 'volcengine-python-sdk[ark]' -q
%pip install arkitect -q
%pip install chromadb -q

### Authentication

Before running this tutorial, generate your ARK API key (see [here](https://www.volcengine.com/docs/82379/1541594)).

#### Notebook

In this tutorial, the API key is stored both in the `ARK_API_KEY` environment variable and in a global variable of the same name. Replace `YOUR_ARK_API_KEY` with your own key:

In [None]:
import os

os.environ["ARK_API_KEY"] = "YOUR_ARK_API_KEY"
ARK_API_KEY = os.environ["ARK_API_KEY"]

#### Google Colab

If you run this tutorial in Google Colab, you can store your ARK API key as a Colab secret named `ARK_API_KEY` (see [here]()). Then run this code:

In [None]:
import os

from google.colab import userdata

os.environ["ARK_API_KEY"] = userdata.get("ARK_API_KEY")
ARK_API_KEY = os.environ["ARK_API_KEY"]
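
A common mistake is to run the rest of the notebook while the key is missing or still set to the placeholder. The following is a minimal sketch of a guard against that; the helper name `require_api_key` and the `PLACEHOLDER` constant are hypothetical and not part of the ARK SDK.

```python
import os

# The placeholder string used in this tutorial (assumption: you have not renamed it).
PLACEHOLDER = "YOUR_ARK_API_KEY"


def require_api_key(env=os.environ, name="ARK_API_KEY"):
    """Return the API key stored in `env`, or raise if it is missing or a placeholder."""
    key = env.get(name, "").strip()
    if not key or key == PLACEHOLDER:
        raise RuntimeError(f"{name} is not set; please provide a real API key.")
    return key
```

Calling `require_api_key()` right after the authentication cell fails early with a clear message instead of failing later inside an API request.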

## Quickstart

In this tutorial, we define default models for different tasks. You can change the referenced models here.

In [None]:
# for text processing
DEFAULT_LLM = "doubao-1.5-pro-32k-250115"

# for image understanding
DEFAULT_VLM = "doubao-1.5-vision-pro-32k-250115"

# for video generation
VIDEO_GENERATION_LM = "doubao-seedance-1-0-lite-i2v-250428"

# for text embedding, when building RAG
EMBEDDING_MODEL = "doubao-embedding-text-240715"

The simplest way to chat with a model is through the chat completion interface:

In [None]:
from volcenginesdkarkruntime import Ark

client = Ark(api_key=ARK_API_KEY)

response = client.chat.completions.create(
    model=DEFAULT_LLM,
    messages=[{"role": "user", "content": "Slogan of Bytedance?"}],
)

print(response.choices[0].message.content)

Furthermore, you can send a *system prompt* by setting the role to *system*, which helps you control the behavior of the model. For example, you can use the system prompt to tell the model to translate the input:

In [None]:
response = client.chat.completions.create(
    model=DEFAULT_LLM,
    messages=[
        {
            "role": "system",
            "content": "Translate the input text from English to Chinese, French, and Japanese.",
        },
        {"role": "user", "content": "Inspire Creativity, Enrich Life!"},
    ],
)

print(response.choices[0].message.content)
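
Since every request uses the same list-of-dicts message format, a small builder can keep the calls tidy. This is only a sketch; `build_messages` is a hypothetical helper, not an SDK function.

```python
def build_messages(user_text, system_prompt=None):
    """Build a chat `messages` list with an optional system prompt in front."""
    messages = []
    if system_prompt:
        # The system message goes first so it can steer the whole conversation.
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_text})
    return messages
```

The result can be passed directly as the `messages` argument of `client.chat.completions.create`.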

# Basic Usage

This section introduces the basic usage and features of ARK SDK.

## Overview

ARK's model family includes a wide range of models. Here we list some primary models and their capabilities:

| Model ID                            | Image Understanding | Video Generation | Function Calling |
|-------------------------------------|---------------------|------------------|------------------|
| doubao-1-5-pro-256k-250115          |                     |                  | ✅               |
| doubao-1-5-thinking-pro-250415      |                     |                  | ✅               |
| doubao-1-5-thinking-pro-m-250415    | ✅                  |                  | ✅               |
| doubao-1.5-vision-pro-250328        | ✅                  |                  |                  |
| doubao-seedance-1-0-lite-i2v-250428 |                     | ✅               |                  |
| deepseek-r1-250120                  |                     |                  | ✅               |
| deepseek-v3-250324                  |                     |                  | ✅               |
| doubao-1-5-pro-32k-250115           |                     |                  | ✅               |
| doubao-1-5-lite-32k-250115          |                     |                  | ✅               |
The full API reference can be found [here](https://www.volcengine.com/docs/82379/).

## Text Capabilities

### Single-turn Chat

Single-turn chat is the simplest form of interaction with a large language model. A single-turn request generally carries no context from earlier exchanges. For example:

In [None]:
response = client.chat.completions.create(
    model=DEFAULT_LLM, messages=[{"role": "user", "content": "Who are you?"}]
)

print(response.choices[0].message.content)

### Multi-turn Chat

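Multi-turn chat generally carries context, i.e., the user's previous messages and the model's previous responses. The model does not remember earlier turns by itself, so the history must be sent again with each request. For example: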

In [None]:
# The first turn chat
response = client.chat.completions.create(
    model=DEFAULT_LLM,
    messages=[{"role": "user", "content": "Your name is Bytedancer."}],
)
content = response.choices[0].message.content

print("The first turn response:")
print(response.choices[0].message.content)

# The second turn chat
# In this turn, we carry the model response (`content`) from the last turn
response = client.chat.completions.create(
    model=DEFAULT_LLM,
    messages=[
        {"role": "user", "content": "Your name is Bytedancer."},
        {"role": "assistant", "content": content},
        {"role": "user", "content": "Do you remember your name?"},
    ],
)

print("\nThe second turn response:")
print(response.choices[0].message.content)
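
Carrying the history by hand gets tedious after a few turns. The sketch below wraps the bookkeeping in a small class. `Conversation` and `chat_fn` are hypothetical names; `chat_fn` is any callable that takes a message list and returns the reply text, so the class itself does not depend on the SDK.

```python
class Conversation:
    """Keep the chat history and resend it on every turn."""

    def __init__(self, chat_fn, system_prompt=None):
        self.chat_fn = chat_fn
        self.messages = []
        if system_prompt:
            self.messages.append({"role": "system", "content": system_prompt})

    def send(self, user_text):
        """Append the user message, call the model, and record its reply."""
        self.messages.append({"role": "user", "content": user_text})
        reply = self.chat_fn(list(self.messages))  # pass a copy of the history
        self.messages.append({"role": "assistant", "content": reply})
        return reply


# Possible wiring with the client from this tutorial (not executed here):
# chat = Conversation(
#     lambda msgs: client.chat.completions.create(
#         model=DEFAULT_LLM, messages=msgs
#     ).choices[0].message.content
# )
# chat.send("Your name is Bytedancer.")
# chat.send("Do you remember your name?")
```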

### Stream Chat

Stream chat (i.e., streaming the model's response) reduces the user's waiting time when the model's output is long. You can enable it by setting `stream` to `True`; the output is then printed gradually as it is generated:

In [None]:
stream = client.chat.completions.create(
    model=DEFAULT_LLM,
    messages=[
        {"role": "system", "content": "You are a model assistant"},
        {
            "role": "user",
            "content": "Please help me to write an introduction of Bytedance with nearly 300 words.",
        },
    ],
    stream=True,  # streaming output
)

for chunk in stream:
    if not chunk.choices:
        continue
    # `delta.content` may be None for some chunks, so fall back to an empty string
    print(chunk.choices[0].delta.content or "", end="")

## Vision Capabilities

ARK provides multimedia capabilities, such as vision and audio. Here we introduce the vision-related demos, which are divided into image understanding and video generation:
- Image understanding: read information from one or several images and return the content to the user.
- Video generation: generate a video from text and images.

### Image Understanding

We use the default vision model to understand the following image:

![demo_image](https://ark-tutorial.tos-cn-beijing.volces.com/assets/images/cat.png)

In [None]:
IMAGE_PATH = "https://ark-tutorial.tos-cn-beijing.volces.com/assets/images/cat.png"

response = client.chat.completions.create(
    model=DEFAULT_VLM,
    messages=[
        {
            "role": "user",
            "content": [
                {"text": "Please describe this image with details.", "type": "text"},
                {"image_url": {"url": IMAGE_PATH}, "type": "image_url"},
            ],
        }
    ],
)

print(response.choices[0].message.content)
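
The example above passes a public image URL. For a local file, one option is to embed the image as a base64 data URL. The sketch below builds such a URL with the standard library only. Whether your chosen model accepts data URLs in the `image_url` field is an assumption that should be checked against the API reference; `image_to_data_url` is a hypothetical helper.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path):
    """Encode a local image file as a `data:<mime>;base64,...` URL."""
    path = Path(path)
    mime, _ = mimetypes.guess_type(path.name)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"Not a recognized image file: {path}")
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

The returned string would take the place of `IMAGE_PATH` in the request.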

### Video Generation

The following demo generates a video from a text prompt. An image can optionally be provided as well.

Video generation is asynchronous, so it goes through two stages:
1. Send generation request
   - Input: prompt, image (optional), and other parameters
   - Output: generation task ID
2. Check the status of the generation

The entire process is shown in the following code snippet:

In [None]:
import time

print("1. Send generation request")
response = client.content_generation.tasks.create(
    model="doubao-seaweed-241128",
    content=[
        {
            "text": "Please generate a video with a cat running. --ratio 16:9",
            "type": "text",
        }
    ],
)
tid = response.id
print(f"Video generation task {tid} submitted.")

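print("\n2. Check the status of the generation")
MAX_RETRIES = 100
for _ in range(MAX_RETRIES):
    response = client.content_generation.tasks.get(task_id=tid)
    status = response.status

    if status == "succeeded":
        print(
            f"Success! Your video can be downloaded from {response.content.video_url}"
        )
        break
    elif status == "failed":
        # assuming a `failed` status marks a terminal error: stop polling
        print("Video generation failed.")
        break
    else:
        print(f"Current status: {status}")

    time.sleep(10)  # check every 10 seconds
else:
    # the loop ended without reaching a terminal status
    print("Timed out while waiting for the video generation task.")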
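
The submit-then-poll pattern can be factored into a reusable helper. This is a sketch under stated assumptions: `poll_until_done` is a hypothetical function, and the terminal status names `succeeded` and `failed` are assumed defaults that you may need to adjust. The `sleep` parameter is injectable so the helper can be exercised without real waiting.

```python
import time


def poll_until_done(
    get_status,
    done=("succeeded", "failed"),
    interval=10.0,
    max_retries=100,
    sleep=time.sleep,
):
    """Call `get_status()` repeatedly until it returns a terminal status."""
    for attempt in range(max_retries):
        status = get_status()
        if status in done:
            return status
        if attempt < max_retries - 1:
            sleep(interval)  # wait before the next check
    raise TimeoutError(f"No terminal status after {max_retries} checks.")


# Possible wiring with the task from this tutorial (not executed here):
# final = poll_until_done(
#     lambda: client.content_generation.tasks.get(task_id=tid).status
# )
```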

For more models that support video generation, see [here](https://www.volcengine.com/docs/82379/1366799#%E6%94%AF%E6%8C%81%E6%A8%A1%E5%9E%8B).

If you want to make the video more vivid, you may need to refine your prompt, for example with [prompt refine](https://www.promptrefine.com/prompt/new).

# [WIP] Agent

Here we introduce the architecture and key concepts of Arkitect.

- `Context`: Maintain the conversation state and coordinate the LLM call and tool execution logic.

## Minimal Agent

You can build a minimal agent through the following code:

In [None]:
from arkitect.core.component.context.context import Context

# initialize context
ctx = Context(model=DEFAULT_LLM)
await ctx.init()

agent_name = "Meeting assistant"
message = "who are you?"
completion = await ctx.completions.create(
    [
        {"role": "system", "content": f"your name is {agent_name}"},
        {"role": "user", "content": message},
    ],
)
print(completion.choices[0].message.content)


## Tool

An agent uses tools through function calling to complete a task.

### Function Tool

We can implement a tool as a plain Python function:

In [None]:
from arkitect.core.component.context.context import Context
from arkitect.core.component.context.model import ToolChunk


def get_weather(city: str, next_n_days: int) -> str:
    """get the weather of a city

    Args:
        city (str): city name
        next_n_days (int): next n days. Need to be a positive integer.

    Returns:
        str: description of next_n_days' weather
    """
    print("[Tool] Invoke get_weather tool.")
    return "Weather at {} is sunny".format(city)


async def context_chat_with_tools(message: str):
    ctx = Context(
        model="doubao-1.5-pro-32k-250115",
        tools=[get_weather],  # function call here.
    )
    await ctx.init()

    # stream=True is needed to iterate over the response chunk by chunk
    completion = await ctx.completions.create(
        [{"role": "user", "content": message}], stream=True
    )

    async for chunk in completion:
        if not isinstance(chunk, ToolChunk):
            print(chunk.choices[0].delta.content or "", end="")


await context_chat_with_tools("What's the weather like in Beijing tomorrow?")
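
For function calling, the model needs a machine-readable description of each tool: its name, purpose, and parameters. The sketch below shows one way such a description could be derived from a function's signature and docstring. It illustrates the idea only and is not Arkitect's actual implementation; `describe_function` and `demo_tool` are hypothetical names.

```python
import inspect

# Map Python annotations to JSON Schema type names (unknown types fall back to "string").
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def describe_function(fn):
    """Build a function-calling style schema from a function's signature."""
    properties = {}
    required = []
    for name, param in inspect.signature(fn).parameters.items():
        properties[name] = {"type": _JSON_TYPES.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # parameters without defaults are mandatory
    doc = inspect.getdoc(fn) or ""
    return {
        "type": "function",
        "function": {
            "name": fn.__name__,
            "description": doc.split("\n")[0],  # first docstring line
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }


def demo_tool(city: str, next_n_days: int = 1) -> str:
    """Get the weather of a city."""
    return f"Weather at {city} is sunny"


schema = describe_function(demo_tool)
```

This is why clear type hints and docstrings matter for tools: they are the information the model has when deciding which function to call and with which arguments.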

### Built-in Tool

We provide some built-in tools to finish some common tasks.

#### Link Reader

[TODO]()

#### Calculator

[TODO]()

### MCP Tool

We can connect to an MCP server to use its tools. Here we list some tools provided by XXX MCP server.

**MCP Client**

In [None]:
from arkitect.core.component.context.context import Context
from arkitect.core.component.tool.mcp_client import MCPClient
from arkitect.core.component.context.model import ToolChunk


async def context_chat_with_tools_with_mcp_clients():
    mcp_client = MCPClient(
        name="TimeTools",
        command="python",
        arguments=["-m", "mcp_server_time", "--local-timezone", "Asia/Shanghai"],
    )

    first_round_message = "What time is it in Beijing time now?"
    ctx = Context(
        model="doubao-1.5-pro-32k-250115",
        tools=[mcp_client],
    )
    await ctx.init()

    completion = await ctx.completions.create(
        [{"role": "user", "content": first_round_message}], stream=True
    )
    async for chunk in completion:
        if isinstance(chunk, ToolChunk):
            continue
        else:
            print(chunk.choices[0].delta.content, end="")
    await mcp_client.cleanup()  # Pay attention to cleanup!!!


await context_chat_with_tools_with_mcp_clients()
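
In the example above, `cleanup()` is skipped if an exception is raised before it is reached. A `try`/`finally` block guarantees that the cleanup always runs. The sketch below shows the pattern with a stand-in object; `run_with_cleanup` and `FakeClient` are hypothetical, and the only assumption about the real client is that it exposes an awaitable `cleanup()` as used above.

```python
import asyncio


async def run_with_cleanup(resource, work):
    """Await `work(resource)` and always call `resource.cleanup()` afterwards."""
    try:
        return await work(resource)
    finally:
        await resource.cleanup()


class FakeClient:
    """Stand-in for a client with an async `cleanup()` method (demonstration only)."""

    def __init__(self):
        self.cleaned = False

    async def cleanup(self):
        self.cleaned = True
```

In a notebook you would write `await run_with_cleanup(mcp_client, my_chat_coroutine_fn)`, where `my_chat_coroutine_fn` is your own async function taking the client.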

## RAG

### Build a Knowledge Base

We use `chromadb` to implement the vector database.

Building a simple knowledge base takes the following steps:

1. Initialize the chromadb vector database
2. Prepare your data
3. Embed your data, converting it from a raw/human-friendly format to vectors
4. Index the data vectors
5. Create a function for searching the data
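
The steps above can be sketched end to end without any external service. In this toy version, a bag-of-words counter stands in for the embedding model and a plain list stands in for the vector database. All names here (`embed`, `cosine`, `build_index`, `search`) are hypothetical, and real embeddings capture meaning far better than word counts do.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: word counts (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(docs):
    """Steps 3 and 4: embed every document and keep (document, vector) pairs."""
    return [(doc, embed(doc)) for doc in docs]


def search(index, query, top_n=2):
    """Step 5: rank documents by similarity to the embedded query."""
    query_vec = embed(query)
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:top_n]]
```

The rest of this section follows the same shape, with the ARK embedding model replacing `embed` and chromadb replacing the list-based index.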

**Initialization**

In [None]:
import chromadb

# create a `chromadb` client
chroma_client = chromadb.Client()
# create a collection (i.e., table in traditional database) in client
collection = chroma_client.create_collection("sample")

**Data Preparation**

Here we prepare a list of events from the history of computing:

In [None]:
data_list = [
    "In 1936, Alan Turing proposed the Turing machine model, laying the theoretical foundation for modern computers.",
    "In 1949, Maurice Wilkes completed EDSAC, the first practical electronic computer to implement the stored-program concept.",
    "In 1957, John Backus and his team developed FORTRAN, the first widely used high-level programming language.",
    "In 1965, Gordon Moore proposed Moore's Law, predicting that the number of transistors in integrated circuits would double approximately every year (revised in 1975 to every two years).",
    "In 1969, Ken Thompson and Dennis Ritchie began developing the Unix operating system at Bell Labs, which was rewritten in the C programming language in 1973.",
    "In 1989, Richard Stallman released the first version of the GNU General Public License (GPL), driving the free software movement.",
    "In 1991, Linus Torvalds created the Linux kernel, which was released under the GPL license.",
    "In 2000, Fabrice Bellard developed FFmpeg, an open-source multimedia framework supporting audio/video codecs and streaming processing.",
    "In 2012, Geoffrey Hinton's team used the deep convolutional network AlexNet in the ImageNet competition, sparking the resurgence of deep learning.",
    "In 2017, Ashish Vaswani and colleagues published the paper *Attention Is All You Need*, introducing the Transformer architecture that revolutionized natural language processing.",
]

**Embedding**

Then we embed the texts into vectors using the *embedding model*.

In [None]:
# embed all documents in a single request
response = client.embeddings.create(model=EMBEDDING_MODEL, input=data_list)
# one embedding vector per input text, in the same order as `data_list`
embedding_list = [item.embedding for item in response.data]

**Indexing**

The embedded texts are then added to the collection, together with their original documents.

In [None]:
import uuid

collection.add(
    ids=[str(uuid.uuid4()) for i in range(len(data_list))],
    documents=data_list,
    embeddings=embedding_list,
)
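
Random UUIDs work, but they change on every run, so re-running the cell stores the same texts again under new IDs. One alternative is to derive each ID from the document's content. This is only a sketch; `stable_id` and `make_ids` are hypothetical helpers, and the 16-character truncation is an arbitrary choice for readability.

```python
import hashlib


def stable_id(text):
    """Derive a deterministic ID from the document text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]


def make_ids(docs):
    """Return one stable ID per document, rejecting duplicate documents."""
    ids = [stable_id(doc) for doc in docs]
    if len(set(ids)) != len(ids):
        raise ValueError("Duplicate documents would produce duplicate IDs.")
    return ids
```

The list returned by `make_ids(data_list)` could be passed as the `ids` argument in place of the UUIDs.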

**Search function**

Build a search function that retrieves the documents most similar to a query from the vector database.

In [None]:
def search_vb(query: str) -> list[str]:
    """Retrieve documents similar to the query text in the vector database.

    Args:
        query (str): The query text to be retrieved (e.g., "Who proposed the Turing machine model?")

    Returns:
        list[str]: A list of the top 2 most similar document contents retrieved (sorted by vector similarity)
    """
    # We retrieve the top 2 most similar documents from the vector database
    TOP_N = 2

    # We need to embed the input string to realize vector similarity search
    response = client.embeddings.create(model=EMBEDDING_MODEL, input=[query])

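    # `query_embeddings` takes a list of embeddings, one per query
    result = collection.query(
        query_embeddings=[response.data[0].embedding], n_results=TOP_N
    )
    # `documents` holds one result list per query; return the list for our single query
    return result["documents"][0]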

### Equip the Agent

To enable RAG, pass the search function to the agent as a tool.

In [None]:
from arkitect.core.component.context.context import Context
from arkitect.core.component.context.model import ToolChunk


async def context_chat_with_vb(message: str):
    ctx = Context(
        model="doubao-1.5-pro-32k-250115",
        tools=[search_vb],  # function call here.
    )
    await ctx.init()

    # stream=True is needed to iterate over the response chunk by chunk
    completion = await ctx.completions.create(
        [{"role": "user", "content": message}], stream=True
    )

    async for chunk in completion:
        if not isinstance(chunk, ToolChunk):
            print(chunk.choices[0].delta.content or "", end="")


await context_chat_with_vb("What did Hinton and his team do in 2012?")
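
Tool calling lets the model decide when to search. A simpler alternative is to always retrieve first and place the results directly in the prompt. The sketch below builds such a prompt; `build_rag_messages` is a hypothetical helper, and the instruction wording is just one possible choice.

```python
def build_rag_messages(question, documents):
    """Build chat messages that ground the answer in the retrieved documents."""
    # Number the documents so the model can refer to them.
    context = "\n".join(f"[{i}] {doc}" for i, doc in enumerate(documents, start=1))
    system = (
        "Answer the question using only the numbered context below. "
        "If the context is insufficient, say so.\n\n" + context
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]
```

The resulting list can be sent through the chat completion interface, with `documents` coming from the search function.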

## Workflow

### Sequential

In [None]:
# code

### Parallel

In [None]:
# code

### ...

In [None]:
# code

### Advanced

In [None]:
# code

## Context Management

In [None]:
# code

## Callback

### Function Calling Callback

In [None]:
# code

### LLM Callback

In [None]:
# code

### ...

## Human-in-the-loop

In [None]:
# code

# [WIP] Samples

## Custom service

This demo shows how to create a custom service.

### Definition

**Task**: Receive a message from a user and send a response according to preset question/answer pairs.

**Input**: A message from a user.

**Output**: A response to the user's message.

### Workflow

1. Receive user's message
2. Retrieve relevant documents from the knowledge base (i.e., the vector database)
3. Generate a response using Doubao LLM

### Components

**Knowledge base**: A collection of question/answer pairs.

**Tools**: `xxx`, `xxx`, and `xxx` tools for xxx.

### Steps

**Build knowledge base**

We build a knowledge base from the documents.

In [None]:
# build something

## Information summarizer

In [None]:
# code

## Recommendation engine

In [None]:
# code

## Platform monitor

In [None]:
# code

# [WIP] Compatibility

## OpenAI API

See the reference [here](https://www.volcengine.com/docs/82379/1330626).
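
As a rough sketch of what OpenAI compatibility means in practice, an OpenAI-style client is pointed at the ARK endpoint by overriding its base URL. The base URL below is an assumption and should be verified against the reference above; `openai_client_kwargs` is a hypothetical helper that only assembles the constructor arguments.

```python
# Assumed ARK endpoint; verify it against the official reference before use.
ARK_BASE_URL = "https://ark.cn-beijing.volces.com/api/v3"


def openai_client_kwargs(api_key, base_url=ARK_BASE_URL):
    """Assemble the keyword arguments for an OpenAI-style client."""
    if not api_key:
        raise ValueError("An API key is required.")
    return {"api_key": api_key, "base_url": base_url}


# Possible usage (requires `pip install openai`; not executed here):
# from openai import OpenAI
# openai_client = OpenAI(**openai_client_kwargs(ARK_API_KEY))
# response = openai_client.chat.completions.create(
#     model=DEFAULT_LLM,
#     messages=[{"role": "user", "content": "Who are you?"}],
# )
```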