Update snowflake-example.md #1588

Merged
merged 2 commits into from
Dec 22, 2023
50 changes: 14 additions & 36 deletions docs/hr/content/use_cases/sql_examples/snowflake-example.md
@@ -1,18 +1,14 @@
# Snowflake vector-search

## How to implement vector-search using `superduperdb` on Snowflake

In this use-case we describe how to implement vector-search using `superduperdb` on Snowflake.

### Configure `superduperdb` to work with Snowflake
## Connect to Snowflake

The first step in doing this is
to connect to your snowflake account. When you log in it should look something like this:
The first step in doing this is to connect to your snowflake account. When you log in, it should look something like this:

![](/img/snowflake-login.png)

The important thing you need to get from this login is the **organization-id** and **user-id** from the menu in the bottom right (annotated on the image). You will set these values in the cell below.

The important thing to get from this login page is the **< organization-id >** and **< user-id >** from the menu in the bottom right (annotated on the image). You will set these values in the cell below.


```python
@@ -22,16 +18,20 @@ import os
os.environ['SUPERDUPERDB_BYTES_ENCODING'] = 'Str'

from superduperdb import superduper, CFG
from superduperdb.backends.ibis.query import RawSQL

user = "<USERNAME>"
password = "<PASSWORD>"
account = "WSWZPKW-LN66790" # ORGANIZATIONID-USERID
account = "WSWZPKW-LN66790" # <ORGANIZATIONID>-<USERID>
database= "FREE_COMPANY_DATASET/PUBLIC"

def make_uri(database):
    return f"snowflake://{user}:{password}@{account}/{database}"
db = superduper(
    f"snowflake://{user}:{password}@{account}/{database}",
    metadata_store='sqlite:///sqlite.db'
)
```
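
The connection string above is assembled by plain string interpolation, which can break if the username or password contains characters such as `@` or `/`. The following is a minimal sketch of one way to build the same URI shape more defensively. The helper name `make_snowflake_uri`, its parameters, and the placeholder values are hypothetical and not part of `superduperdb`; it also assumes the URI parser on the receiving end accepts percent-encoded credentials.

```python
from urllib.parse import quote


def make_snowflake_uri(user, password, organization_id, user_id, database, schema="PUBLIC"):
    """Hypothetical helper: build a snowflake:// URI of the shape used in this example."""
    # The account identifier is the organization id and user id joined by a hyphen.
    account = f"{organization_id}-{user_id}"
    # Percent-encode the credentials so '@', '/', ':' and similar characters
    # cannot be mistaken for URI delimiters.
    safe_user = quote(user, safe="")
    safe_password = quote(password, safe="")
    return f"snowflake://{safe_user}:{safe_password}@{account}/{database}/{schema}"


# Placeholder values for illustration only.
uri = make_snowflake_uri("alice", "p@ss/word", "MYORG", "MYUSER", "FREE_COMPANY_DATASET")
print(uri)
```

The resulting string could then be passed to `superduper(...)` in place of the inline f-string, assuming the rest of the call stays the same.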

## Set up sample data to test vector-search
## Load Dataset

We're going to use some of the Snowflake sample data in this example, namely the `FREE_COMPANY_DATASET`. You
can find the `FREE_COMPANY_DATASET` on [this link](https://app.snowflake.com/marketplace/listing/GZSTZRRVYL2/people-data-labs-free-company-dataset).
@@ -40,37 +40,18 @@ Since the database where this data is hosted is read-only, we copy a sample of t


```python
from superduperdb.backends.ibis.query import RawSQL

db = superduper(
    make_uri("FREE_COMPANY_DATASET/PUBLIC"),
    metadata_store='sqlite:///.testdb.db'
)

sample = db.execute(RawSQL('SELECT * FROM FREECOMPANYDATASET SAMPLE (5000 ROWS);')).as_pandas()
```

### Connect to your dedicated vector-search database
### Create a table for vector-search index

We use the connection we created to get the snapshot, to also create the databset we are going to work with:
We use the connection we created to get the snapshot, to also create the dataset we are going to work with:


```python
db.databackend.conn.create_database('SUPERDUPERDB_EXAMPLE', force=True)
```

Now we are ready to connect to this database with `superduperdb`:


```python
from superduperdb.backends.ibis.query import RawSQL

db = superduper(
    make_uri("SUPERDUPERDB_EXAMPLE/PUBLIC"),
    metadata_store='sqlite:///.testdb.db'
)
```

Since `superduperdb` implements extra features on top of your classical database/ datalake, it's necessary
to add the tables you wish to work with to the system. You'll notice we are creating a schema as well; that allows
us to implement "interesting" data-types on top of Snowflake, such as images or audio.
@@ -180,10 +161,7 @@ db.add(

This step will take a few moments (unless you have a GPU to hand).

:::important
**Once this step is finished you can
search Snowflake with vector-search!**
:::
> **important** Once this step is finished you can search Snowflake with vector-search!

### Execute a vector-search query with `.like`
