# Quick start with OceanBase vector search

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Jackieqwj/notebooks/blob/main/ob-doc-ipynb/3200.ob-vector-search-quick-start.ipynb)

This notebook demonstrates how to implement vector search using OceanBase Database. You will learn how to create vector tables, insert vector data, and perform similarity searches using SQL.

OceanBase Database supports efficient vector search directly in SQL. Its vector search is designed for multi-modal integration and offers unified queries, scalability, high performance, high availability, low cost, multi-tenancy, and data security. For more details, see [Overview of vector search](https://en.oceanbase.com/docs/common-oceanbase-database-10000000001976351).

> **Note:** This tutorial uses MySQL-compatible mode as an example.

> **💡 Tip:** The database connection is pre-configured, so you can run all cells without any modifications.

## Prerequisites

Before you begin, ensure you have:

- Python 3.7 or higher installed
- Network access to the OceanBase Database instance

## Install Python requirements

Install the required Python dependencies.

In [None]:
%pip install pymysql

## Configure database connection

Configure the database connection parameters below. To use your own database, modify these values.

In [None]:
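import os

import pymysql

# Connection parameters. Each value can be overridden with an environment
# variable; otherwise the pre-configured defaults below are used.
host = os.getenv("OB_HOST", "obmt7bftsnwuc9z4-mi.aliyun-cn-hangzhou-internet.oceanbase.cloud")
port = int(os.getenv("OB_PORT", "3306"))
user = os.getenv("OB_USER", "jackietest3")
password = os.getenv("OB_PASSWORD", "Nl]03?yN")
database = os.getenv("OB_DATABASE", "jackic-test-3")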

print("✅ Configuration loaded")
print(f"   Host: {host}")
print(f"   Port: {port}")
print(f"   User: {user}")
print(f"   Database: {database}")

## Connect to the database

Run the cell below to connect to the database and verify the connection.

In [None]:
conn = pymysql.connect(
    host=host,
    port=port,
    user=user,
    password=password,
    database=database,
    charset="utf8mb4"
)
cursor = conn.cursor()

# Verify connection
cursor.execute("SELECT VERSION()")
version = cursor.fetchone()[0]

print("✅ Database connection successful!")
print(f"   Host: {host}")
print(f"   Port: {port}")
print(f"   User: {user}")
print(f"   Database: {database}")
print(f"   Version: {version}")

## Create a vector table

Create a table with a vector column and a vector index. Use the `VECTOR(dim)` data type to declare a vector column and specify its dimension. Then create a vector index on this column, specifying at least the `type` and `distance` parameters.

This example creates a vector column called `embedding` with dimension `3`, and adds an HNSW index using `L2` distance.
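Because the dimension and index parameters are simply part of the DDL text, it can help to generate the statement from variables when you experiment with other embedding sizes. The sketch below uses a hypothetical helper, `build_vector_table_ddl` (not part of OceanBase or PyMySQL), that reproduces the statement used in this notebook. It only formats a string, so you would still pass the result to `cursor.execute`. Check the vector index documentation for the `distance` and `type` values your OceanBase version supports.

```python
def build_vector_table_ddl(
    table_name: str, dim: int, distance: str = "L2", index_type: str = "hnsw"
) -> str:
    """Return a CREATE TABLE statement with a VECTOR column and a vector index.

    Hypothetical helper: it formats the same DDL shape used in this notebook
    and does not validate distance or index_type against the server.
    """
    if dim <= 0:
        raise ValueError("dim must be a positive integer")
    return (
        f"CREATE TABLE {table_name}(\n"
        f"    id INT PRIMARY KEY,\n"
        f"    doc VARCHAR(200),\n"
        f"    embedding VECTOR({dim}),\n"
        f"    VECTOR INDEX idx1(embedding) WITH (distance={distance}, type={index_type})\n"
        f")"
    )


# Same table as in this notebook: 3-dimensional embeddings, HNSW index, L2 distance
print(build_vector_table_ddl("t1", 3))
```

Generating the DDL this way keeps the column dimension in one place, which matters because every inserted vector must match it.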

In [None]:
TABLE_NAME = "t1"

cursor.execute(f"USE `{database}`")
cursor.execute(f"DROP TABLE IF EXISTS {TABLE_NAME}")
conn.commit()

CREATE_TABLE_SQL = f"""
CREATE TABLE {TABLE_NAME}( 
    id INT PRIMARY KEY, 
    doc VARCHAR(200), 
    embedding VECTOR(3), 
    VECTOR INDEX idx1(embedding) WITH (distance=L2, type=hnsw) 
)
"""

cursor.execute(CREATE_TABLE_SQL)
conn.commit()

print(f"✅ Created table {TABLE_NAME}")
print("   Table schema:")
print("   - id: INT PRIMARY KEY")
print("   - doc: VARCHAR(200)")
print("   - embedding: VECTOR(3)")
print("   - Vector index: idx1 (L2 distance, HNSW type)")


## Insert vector data

Insert sample vector data into the table. Each row includes an ID, a description, and its corresponding vector embedding.
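Each embedding in this example is written as a string literal such as `'[1.2,0.7,1.1]'`. When embeddings come from a model as Python lists, a small conversion step is needed. The sketch below uses two hypothetical helpers, `to_vector_literal` and `from_vector_literal` (not part of PyMySQL or OceanBase), to format a list as that bracketed string, check it against the column dimension, and parse it back. The parser assumes values read back from the table arrive as text in the same bracketed format; check what your driver actually returns.

```python
from typing import List, Sequence


def to_vector_literal(values: Sequence[float], dim: int) -> str:
    """Format numbers as a '[x,y,z]' string, checking the expected dimension."""
    if len(values) != dim:
        raise ValueError(f"expected {dim} values, got {len(values)}")
    # repr() gives the shortest string that round-trips each float
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


def from_vector_literal(text: str) -> List[float]:
    """Parse a '[x,y,z]' string back into a list of floats."""
    inner = text.strip().strip("[]").strip()
    if not inner:
        return []
    return [float(part) for part in inner.split(",")]


# Build rows in the same (id, doc, embedding) shape as sample_data
rows = [
    (1, "Apple", to_vector_literal([1.2, 0.7, 1.1], dim=3)),
    (2, "Banana", to_vector_literal([0.6, 1.2, 0.8], dim=3)),
]
print(rows[0])  # (1, 'Apple', '[1.2,0.7,1.1]')
print(from_vector_literal("[0.9, 1.0, 0.9]"))  # [0.9, 1.0, 0.9]
```

Rows built this way have the same shape as `sample_data`, so they could be passed to `cursor.executemany(insert_sql, rows)` in the same way.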

In [None]:
sample_data = [
    (1, 'Apple', '[1.2,0.7,1.1]'),
    (2, 'Banana', '[0.6,1.2,0.8]'),
    (3, 'Orange', '[1.1,1.1,0.9]'),
    (4, 'Carrot', '[5.3,4.8,5.4]'),
    (5, 'Spinach', '[4.9,5.3,4.8]'),
    (6, 'Tomato', '[5.2,4.9,5.1]')
]

cursor.execute(f"USE `{database}`")
insert_sql = f"INSERT INTO {TABLE_NAME} VALUES (%s, %s, %s)"
cursor.executemany(insert_sql, sample_data)
conn.commit()

# Display inserted data
cursor.execute(f"SELECT * FROM {TABLE_NAME}")
results = cursor.fetchall()

print(f"✅ Inserted {len(results)} rows")
print("\nInserted data:")
print("+" + "-" * 4 + "+" + "-" * 12 + "+" + "-" * 20 + "+")
print(f"| {'id':<4} | {'doc':<12} | {'embedding':<20} |")
print("+" + "-" * 4 + "+" + "-" * 12 + "+" + "-" * 20 + "+")
for row in results:
    embedding_str = str(row[2])[:18] + "..." if len(str(row[2])) > 18 else str(row[2])
    print(f"| {row[0]:<4} | {row[1]:<12} | {embedding_str:<20} |")
print("+" + "-" * 4 + "+" + "-" * 12 + "+" + "-" * 20 + "+")

## Perform a vector search

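Search for the vectors nearest to a query vector by ordering the results on the `l2_distance` function. The `APPROXIMATE` keyword requests an approximate nearest neighbor (ANN) search, which lets OceanBase use the vector index instead of comparing the query against every row.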
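Because `APPROXIMATE` trades exactness for speed, it can be useful to know what an exact answer looks like. The sketch below computes Euclidean (L2) distance, the measure `l2_distance` is named for, in plain Python and ranks the sample rows by brute force. It also includes a simple recall@k helper for comparing an approximate result with that exact ranking. This is an illustration for checking results, not how OceanBase evaluates the query, and the helper names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Same rows as sample_data, with embeddings as Python lists instead of strings
SAMPLE_ROWS: List[Tuple[int, str, List[float]]] = [
    (1, "Apple", [1.2, 0.7, 1.1]),
    (2, "Banana", [0.6, 1.2, 0.8]),
    (3, "Orange", [1.1, 1.1, 0.9]),
    (4, "Carrot", [5.3, 4.8, 5.4]),
    (5, "Spinach", [4.9, 5.3, 4.8]),
    (6, "Tomato", [5.2, 4.9, 5.1]),
]


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_top_k(rows: Sequence[Tuple[int, str, Sequence[float]]],
                query: Sequence[float], k: int) -> List[int]:
    """Brute-force k nearest neighbors: score every row, return the k closest ids."""
    ranked = sorted(rows, key=lambda row: l2_distance(row[2], query))
    return [row[0] for row in ranked[:k]]


def recall_at_k(approx_ids: Sequence[int], exact_ids: Sequence[int]) -> float:
    """Fraction of the exact top-k ids that also appear in the approximate result."""
    if not exact_ids:
        return 1.0
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


query = [0.9, 1.0, 0.9]
exact_ids = exact_top_k(SAMPLE_ROWS, query, k=3)
print(exact_ids)  # [3, 2, 1]
```

After running the search cell, `recall_at_k([row[0] for row in results], exact_ids)` compares the ids returned by OceanBase with this exact ranking. On a six-row table you would expect the two to agree; recall becomes a more informative check on larger datasets.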

In [None]:
QUERY_VECTOR = '[0.9, 1.0, 0.9]'
LIMIT = 3

cursor.execute(f"USE `{database}`")
search_sql = f"""
SELECT id, doc
FROM {TABLE_NAME}
ORDER BY l2_distance(embedding, %s) APPROXIMATE LIMIT %s
"""

# Pass the query vector and limit as parameters; PyMySQL escapes and substitutes them
cursor.execute(search_sql, (QUERY_VECTOR, LIMIT))
results = cursor.fetchall()

print(f"Query vector: {QUERY_VECTOR}")
print(f"Searching for top {LIMIT} similar items...\n")
print("Search results:")
print("+" + "-" * 4 + "+" + "-" * 12 + "+")
print(f"| {'id':<4} | {'doc':<12} |")
print("+" + "-" * 4 + "+" + "-" * 12 + "+")
for row in results:
    print(f"| {row[0]:<4} | {row[1]:<12} |")
print("+" + "-" * 4 + "+" + "-" * 12 + "+")
print(f"{len(results)} rows in set")

## Summary

You have successfully completed all steps:
- ✅ Connected to OceanBase Database
- ✅ Created a vector table with HNSW index
- ✅ Inserted vector data
- ✅ Performed vector similarity search

You are now ready to use OceanBase vector search with SQL. For more advanced scenarios, refer to the [official documentation](https://en.oceanbase.com/docs/common-oceanbase-database-10000000001976351).

## Clean up resources (optional)

After completing the experiment, you can clean up the test data and close the database connection.

In [None]:
# Clean up resources (optional)
cursor.execute(f"USE `{database}`")
cursor.execute(f"DROP TABLE IF EXISTS {TABLE_NAME}")
conn.commit()

cursor.close()
conn.close()

print("✅ Cleanup completed")