# SQL Server

## Introduction

In this notebook, we will demonstrate the use of the SQL server workflow.
This workflow runs a server that provides access to an ApertureDB instance via queries in the SQL language, specifically the PostgreSQL dialect.

## Setup

To run this notebook, the SQL server workflow must already be running.
You can start it conveniently from the Cloud UI.

You will also need to have some data stored in your ApertureDB, perhaps from running the "Website Chatbot" or "Dataset Ingestion" workflows.

You will also need to know the hostname and password for the SQL server.
You can find them in the "Connection Helper" dialog in the Cloud UI.

In [15]:
import psycopg
from getpass import getpass
import os
import json
import pandas as pd

## Enter Password

In [3]:
password = getpass("password")
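
For unattended runs it can be convenient to read the password from the environment and only prompt when it is missing. The sketch below is one possible way to do that; the helper name `get_password` and the environment variable `APERTUREDB_SQL_PASSWORD` are hypothetical and are not defined by the workflow.

```python
import os
from getpass import getpass


def get_password(env_var="APERTUREDB_SQL_PASSWORD", prompt="password"):
    """Return the password from the environment if set, otherwise prompt for it.

    The environment variable name is an assumption made for this example.
    """
    value = os.environ.get(env_var)
    if value:
        return value
    # Fall back to an interactive prompt that does not echo the input.
    return getpass(prompt)
```

With this helper, the cell above could read `password = get_password()`.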

## Set up the client connection

We're using the `psycopg` PostgreSQL database adapter.

In [35]:
host = "<DB-HOST>"  # replace with the hostname from the "Connection Helper" dialog
database = "aperturedb"
user = "aperturedb"

def run_query(query):
    """Run a query and return the results as a pandas DataFrame."""
    # Keyword arguments avoid quoting problems if the password contains
    # spaces or other special characters.
    with psycopg.connect(dbname=database, user=user, password=password, host=host) as conn:
        with conn.cursor() as cur:
            cur.execute(query)
            columns = [desc[0] for desc in cur.description]
            return pd.DataFrame(cur.fetchall(), columns=columns)

## Test the connection

If everything is set up correctly, this will print:
```
   ping
0	1
```

In [36]:

run_query("SELECT 1 AS ping")

| | ping |
|---|---|
| 0 | 1 |


## List tables

Let's find out what tables the SQL server has.

Our tables are divided up into four schemata:
* `system`: This schema contains a table for every system object type, including one for `Entity` and one for `Connection`. These correspond to the `Find...` commands in the ApertureDB Query Language.
* `entity`: This schema contains a table for every user-defined entity class. These correspond to the values you can use with `with_class` in `FindEntity` commands.
* `connection`: This schema contains a table for every connection class (whether user-defined or system-defined). These correspond to the values you can use with `with_class` in `FindConnection` commands or the `connection_class` field in the `is_connected_to` parameter.
* `descriptor`: This schema contains a table for every descriptor set, effectively the values you can use with the `set` parameter in `FindDescriptor`.

Note that you do not need to give the schema explicitly when referring to a table, unless the table name is ambiguous, because all four schemata are on the search path.
You will generally have to enclose table names in double quotes because they are mixed case and can contain special characters, whereas PostgreSQL folds unquoted identifiers to lowercase and restricts them to letters, digits, underscores, and dollar signs.
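
The quoting rule above can be captured in a small helper. This is a minimal sketch using standard SQL identifier quoting (wrap in double quotes, double any embedded double quote); the function names `quote_ident` and `qualified` are made up for this example. The `psycopg.sql.Identifier` class offers similar functionality if you prefer to compose queries with `psycopg.sql`.

```python
def quote_ident(name):
    """Quote a SQL identifier: wrap it in double quotes and double embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def qualified(schema, table):
    """Build a schema-qualified, quoted table reference."""
    return f"{quote_ident(schema)}.{quote_ident(table)}"


# A mixed-case name and a name containing hyphens both need quoting.
print(qualified("entity", "CrawlDocument"))    # "entity"."CrawlDocument"
print(qualified("descriptor", "crawl-to-rag"))  # "descriptor"."crawl-to-rag"
```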

In [37]:
def list_tables(conn, schema):
    """List the tables in a schema.

    `conn` is unused because run_query opens its own connection; pass None.
    """
    return run_query(f"""
        SELECT table_name
        FROM information_schema.tables
        WHERE table_schema = {schema!r}
        ORDER BY table_name;
        """)

for schema in ['system', 'entity', 'connection', 'descriptor']:
    tables = list_tables(None, schema)
    print(f"Tables in schema '{schema}' ({len(tables)} total):")
    display(tables.head())


Tables in schema 'system' (5 total):


| | table_name |
|---|---|
| 0 | Blob |
| 1 | Connection |
| 2 | Descriptor |
| 3 | DescriptorSet |
| 4 | Entity |


Tables in schema 'entity' (11 total):


| | table_name |
|---|---|
| 0 | CrawlDocument |
| 1 | CrawlRun |
| 2 | CrawlSpec |
| 3 | EmbeddingsRun |
| 4 | EmbeddingsSpec |


Tables in schema 'connection' (19 total):


| | table_name |
|---|---|
| 0 | WorkflowCreated |
| 1 | _DescriptorConnection |
| 2 | _DescriptorSetToDescriptor |
| 3 | crawlDocumentHasBlob |
| 4 | crawlRunHasDocument |


Tables in schema 'descriptor' (2 total):


| | table_name |
|---|---|
| 0 | crawl-to-rag |
| 1 | wf_embeddings_clip_text |
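
The listing query above interpolates the schema name into the SQL text. An alternative is to pass it as a bound parameter using the `%s` placeholder that `psycopg` understands. The sketch below only builds the query and its parameters; it assumes the SQL server accepts bound parameters, which this notebook does not confirm, and the helper name `list_tables_args` is hypothetical.

```python
LIST_TABLES_SQL = """
    SELECT table_name
    FROM information_schema.tables
    WHERE table_schema = %s
    ORDER BY table_name;
"""


def list_tables_args(schema):
    """Return (sql, params) suitable for cursor.execute(sql, params)."""
    return LIST_TABLES_SQL, (schema,)


sql, params = list_tables_args("entity")
# With an open psycopg cursor `cur`, this would be run as:
#     cur.execute(sql, params)
```

Keeping values out of the SQL text avoids having to worry about how a value is quoted.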


## Find some entities

There are two ways to find entities: we can use the `system."Entity"` table to look at all entities, or an `entity.…` table to look at a specific entity class.

In [38]:
display(run_query('SELECT * FROM system."Entity" LIMIT 5;'))

# list_tables ignores its first argument, so None is passed for it.
entity_class = list_tables(None, 'entity').table_name[0]
print(f"Using entity class: {entity_class}")
display(run_query(f'SELECT * FROM entity."{entity_class}" LIMIT 5;'))

| | _uniqueid | cache_control | cache_control_max_age | content_type | crawl_time | domain | etag | expires | id | last_modified | ... | total_tokens | type | uniqueid | _dimensions | _name | embeddings | embeddings_fingerprint | embeddings_model | embeddings_pretrained | embeddings_provider |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7.0.540 | | | | NaT | | | NaT | | NaT | ... | | | | | | | | | | |
| 1 | 8.0.560 | | | | NaT | | | NaT | | NaT | ... | | | | | | | | | | |
| 2 | 10.0.16880 | | | | NaT | | | NaT | crawl-to-rag | NaT | ... | | | | | | | | | | |
| 3 | 11.0.16920 | max-age=600 | 600.0 | text/html; charset=utf-8 | 2025-08-04 15:05:35.429201+00:00 | docs.aperturedata.io | W/"688abc9f-53b3" | 2025-08-04 14:10:44+00:00 | c98928c3-4c53-482a-9bba-feddcc2ebc58 | 2025-07-31 00:45:19+00:00 | ... | | | | | | | | | | |
| 4 | 11.1.16940 | max-age=600 | 600.0 | text/html; charset=utf-8 | 2025-08-04 15:05:37.209353+00:00 | docs.aperturedata.io | W/"688abc9f-87f7" | 2025-08-04 15:15:35+00:00 | 7ed07ef2-a2e5-417e-ba86-968e635fa8c2 | 2025-07-31 00:45:19+00:00 | ... | | | | | | | | | | |


Using entity class: CrawlDocument


| | cache_control | cache_control_max_age | content_type | crawl_time | domain | etag | expires | id | last_modified | run_id | simple_content_type | spec_id | url | _uniqueid |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | max-age=600 | 600.0 | text/html; charset=utf-8 | 2025-08-04 15:05:35.429201+00:00 | docs.aperturedata.io | W/"688abc9f-53b3" | 2025-08-04 14:10:44+00:00 | c98928c3-4c53-482a-9bba-feddcc2ebc58 | 2025-07-31 00:45:19+00:00 | 93f1c421-c58e-4409-8d0f-136c968326af | text/html | crawl-to-rag | https://docs.aperturedata.io/ | 11.0.16920 |
| 1 | max-age=600 | 600.0 | text/html; charset=utf-8 | 2025-08-04 15:05:37.209353+00:00 | docs.aperturedata.io | W/"688abc9f-87f7" | 2025-08-04 15:15:35+00:00 | 7ed07ef2-a2e5-417e-ba86-968e635fa8c2 | 2025-07-31 00:45:19+00:00 | 93f1c421-c58e-4409-8d0f-136c968326af | text/html | crawl-to-rag | https://docs.aperturedata.io/administration/faq | 11.1.16940 |
| 2 | max-age=600 | 600.0 | text/html; charset=utf-8 | 2025-08-04 15:05:37.637315+00:00 | docs.aperturedata.io | W/"688abc9f-16a0c" | 2025-08-04 15:15:35+00:00 | 515b4e79-5eea-4247-bccf-451e823e5455 | 2025-07-31 00:45:19+00:00 | 93f1c421-c58e-4409-8d0f-136c968326af | text/html | crawl-to-rag | https://docs.aperturedata.io/Setup/QuickStart | 11.2.16960 |
| 3 | max-age=600 | 600.0 | text/html; charset=utf-8 | 2025-08-04 15:05:37.663759+00:00 | docs.aperturedata.io | W/"688abc9f-5b5b" | 2025-08-04 15:15:35+00:00 | f3b2a374-16ae-410c-8e47-e9f275ad14ba | 2025-07-31 00:45:19+00:00 | 93f1c421-c58e-4409-8d0f-136c968326af | text/html | crawl-to-rag | https://docs.aperturedata.io/category/command-... | 11.3.16980 |
| 4 | max-age=600 | 600.0 | text/html; charset=utf-8 | 2025-08-04 15:05:37.664009+00:00 | docs.aperturedata.io | W/"688abc9f-561b" | 2025-08-04 15:15:35+00:00 | fe4de2a6-303d-4042-8000-38ac54d85749 | 2025-07-31 00:45:19+00:00 | 93f1c421-c58e-4409-8d0f-136c968326af | text/html | crawl-to-rag | https://docs.aperturedata.io/category/database... | 11.4.17000 |


## Find some connections

We can do the same two types of query to find connections.
In the language of relational databases, you might like to think of ApertureDB connections as "join tables" or ["associative entities"](https://en.wikipedia.org/wiki/Associative_entity).

In [39]:
display(run_query('SELECT * FROM system."Connection" LIMIT 5;'))

# list_tables ignores its first argument, so None is passed for it.
connection_class = list_tables(None, 'connection').table_name[0]
print(f"Using connection class: {connection_class}")
display(run_query(f'SELECT * FROM connection."{connection_class}" LIMIT 5;'))

| | _uniqueid | _src | _dst |
|---|---|---|---|
| 0 | 3.0.561 | 7.0.540 | 8.0.560 |
| 1 | 6.0.16921 | 10.0.16880 | 11.0.16920 |
| 2 | 6.1.16941 | 10.0.16880 | 11.1.16940 |
| 3 | 6.2.16961 | 10.0.16880 | 11.2.16960 |
| 4 | 6.3.16981 | 10.0.16880 | 11.3.16980 |


Using connection class: WorkflowCreated


| | _uniqueid | _src | _dst |
|---|---|---|---|
| 0 | 3.0.561 | 7.0.540 | 8.0.560 |
| 1 | 6.0.16921 | 10.0.16880 | 11.0.16920 |
| 2 | 6.1.16941 | 10.0.16880 | 11.1.16940 |
| 3 | 6.2.16961 | 10.0.16880 | 11.2.16960 |
| 4 | 6.3.16981 | 10.0.16880 | 11.3.16980 |
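
Following the join-table analogy, a connection table can in principle be used to join two entity tables through its `_src` and `_dst` columns, which hold `_uniqueid` values. The sketch below only builds such a query as a string. It rests on several assumptions: that the SQL server supports joins across these tables, that `crawlRunHasDocument` runs from `CrawlRun` to `CrawlDocument`, and the helper name `connection_join_sql` is made up for this example.

```python
def quote_ident(name):
    """Quote a SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def connection_join_sql(src_class, connection_class, dst_class, limit=5):
    """Build a query joining two entity tables via a connection table."""
    src = quote_ident(src_class)
    conn = quote_ident(connection_class)
    dst = quote_ident(dst_class)
    return (
        'SELECT s."_uniqueid" AS src_id, c."_uniqueid" AS connection_id, '
        'd."_uniqueid" AS dst_id\n'
        f'FROM entity.{src} AS s\n'
        f'JOIN connection.{conn} AS c ON c."_src" = s."_uniqueid"\n'
        f'JOIN entity.{dst} AS d ON c."_dst" = d."_uniqueid"\n'
        f'LIMIT {int(limit)};'
    )


# Class names are taken from the table listings; the direction is assumed.
sql = connection_join_sql("CrawlRun", "crawlRunHasDocument", "CrawlDocument")
print(sql)
```

The resulting string could then be passed to `run_query(sql)`.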


## Find some images

For objects that have an associated blob, such as images, we can fetch those blobs as part of the SQL query.
The blobs end up in a special field, here `_image` of type `BYTEA`.

It can be expensive to fetch blob data, and so ApertureDB never does so by default, only when [the `blobs` parameter](https://docs.aperturedata.io/query_language/Reference/shared_command_parameters/blobs) is set.
To ensure that blobs are not returned casually when the user asks for `SELECT *`, a special column `_blobs` must be set to ask for blobs.

Another special column here is `_operations`. In combination with SQL functions we have defined, we can generate a pipeline of [operations](https://docs.aperturedata.io/query_language/Reference/shared_command_parameters/operations) that mutate the blob.

In [40]:
image_results = run_query("""
SELECT * FROM system."Image" 
WHERE _blobs
AND _operations = OPERATIONS(
	THRESHOLD(128), 
	CROP(x:=10, y:=10, width:=100, height:=100),
	FLIP(+1),
	ROTATE(angle:= 90),
	RESIZE(width:=50))
LIMIT 5
""")
display(image_results)

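UndefinedTable: relation "system.Image" does not exist
LINE 2: SELECT * FROM system."Image" 
                      ^

This error is expected on the instance used here: the `system` schema listed above contains only `Blob`, `Connection`, `Descriptor`, `DescriptorSet`, and `Entity`, so there is no `Image` table to query, presumably because no images have been ingested.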

## Run a tool: Count images

How many images are there in the database? We can see the `matched` field here.

In [None]:
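# `client` is not created anywhere in this notebook; this cell assumes an
# already-configured asynchronous tool client that exposes `call_tool`.
print(f"Looking for class {entity_class}")
async with client:
    description = await client.call_tool('describe_entity_class',
                                         dict(class_name=entity_class))
    print(json.dumps(json.loads(
        description.content[0].text), indent=2, ensure_ascii=False))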