# Use Cases: Starter Use Case

Welcome to the "Starter Use Case" Notebook, your step-by-step guide to becoming familiar with the basics of the Vantage SDK.

In this notebook, we'll cover the most important functionalities of our SDK, such as creating a collection, preparing the data, uploading the data, and finally querying the collection containing that data.

You'll encounter some intermediate steps, and by the end of this example, you'll be prepared to start your journey with Vantage!

Let's start!


### ✅ Installation

The first step involves installing the package. Before that, let's make sure we have all necessary dependencies installed as well.

In [1]:
pip install pydantic==2.6.1 urllib3==2.0.7



Execute the command below to install [Vantage](https://test.pypi.org/project/vantage-sdk/):

> ❗ *Currently, we are using Test PyPI, but we plan to transition to the official PyPI index soon.*

In [2]:
pip install -i https://test.pypi.org/simple/ vantage-sdk==0.0.5

Looking in indexes: https://test.pypi.org/simple/
Collecting vantage-sdk==0.0.5
  Downloading https://test-files.pythonhosted.org/packages/ee/b7/316a716a9e0a6bf466fbec05d4ddb7fd82b4da29153d20e62518abce7a76/vantage_sdk-0.0.5-py3-none-any.whl (100 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m100.5/100.5 kB[0m [31m1.7 MB/s[0m eta [36m0:00:00[0m
Installing collected packages: vantage-sdk
Successfully installed vantage-sdk-0.0.5


As usual, let's import the necessary libraries for this example:

In [3]:
import os

### ✅ Initialization

In this example, we will authenticate using a JWT token.
For additional details on initializing the Vantage client, see the [notebook](../initializing_the_client.ipynb) that covers this topic.

Please update the following two cells with the appropriate values.

In [4]:
ACCOUNT_ID = "YOUR_ACCOUNT_ID"
API_HOST = "https://api.dev-a.dev.vantagediscovery.com"

In [None]:
%env VANTAGE_JWT_TOKEN=YOUR_VANTAGE_JWT_TOKEN

In [6]:
from vantage import Vantage

vantage_instance = Vantage.using_jwt_token(
    vantage_api_jwt_token=os.environ["VANTAGE_JWT_TOKEN"],
    account_id=ACCOUNT_ID,
    api_host=API_HOST,
)
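If `VANTAGE_JWT_TOKEN` is not set, `os.environ[...]` fails with a bare `KeyError`. Below is a minimal sketch of a friendlier check. The helper name `require_env` is hypothetical and is not part of the Vantage SDK:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message.

    Hypothetical helper for this notebook; it is not part of the Vantage SDK.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set. "
            "Set it before initializing the client."
        )
    return value


# Example with a throwaway variable name, so no real secret is involved.
os.environ["EXAMPLE_TOKEN"] = "dummy-value"
print(require_env("EXAMPLE_TOKEN"))
```

In the cell above, `os.environ["VANTAGE_JWT_TOKEN"]` could then be swapped for `require_env("VANTAGE_JWT_TOKEN")`.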

### ✅ Creating External API Key

Let's create an External API Key. We will need it later to create our Vantage Managed Embeddings (VME) collection.

> For more details on external keys or different collection types, check our [documentation](https://docs.vantagediscovery.com/docs/collections) or notebooks from our [Getting Started with Vantage](https://github.com/VantageDiscovery/vantage-sdk-python/blob/develop/examples/notebooks/getting_started/) series.

We'll use OpenAI as the LLM provider in this example. Please update the following cell with your LLM secret key.

In [None]:
LLM_SECRET = "YOUR_LLM_SECRET"

external_api_key = vantage_instance.create_external_api_key(
    llm_provider = "OpenAI",
    llm_secret = LLM_SECRET,
    url = None,
)

external_api_key

Let's get the external API key ID, which we will use in the next step.

In [8]:
external_api_key.external_key_id

'2edb92de-d26c-4127-b0f0-0ea693886e1b'

### ✅ Creating Collection

In this example, we're going to set up a Vantage Managed Embeddings (VME) collection, as we mentioned above.

We'll use the External API Key we created earlier for the OpenAI LLM provider. Our model will be `text-embedding-ada-002`, which produces embeddings of dimension `1536`, so that is the value we'll specify.

As standard practice, we will also provide the `collection_id` and `collection_name`.

Update the following cell with your external key ID.

In [21]:
COLLECTION_ID = "furniture-collection"
EXTERNAL_KEY_ID = "2edb92de-d26c-4127-b0f0-0ea693886e1b"

Now, let's create our collection.

In [22]:
collection = vantage_instance.create_collection(
    collection_id = COLLECTION_ID,
    collection_name = "Furniture Collection",
    embeddings_dimension = 1536,
    llm = "text-embedding-ada-002",
    external_key_id = EXTERNAL_KEY_ID
)
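The `embeddings_dimension` value has to match what the chosen model actually returns. Here is a small sketch of a check you could run before creating the collection. The helper `check_embedding_dimension` and the lookup table are illustrative and not part of the Vantage SDK. The listed values are the default output dimensions of these OpenAI models:

```python
# Default output dimensions of a few OpenAI embedding models.
KNOWN_EMBEDDING_DIMENSIONS = {
    "text-embedding-ada-002": 1536,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}


def check_embedding_dimension(model: str, dimension: int) -> None:
    """Raise ValueError if a known model is paired with the wrong dimension.

    Hypothetical helper; models missing from the table are not checked.
    """
    expected = KNOWN_EMBEDDING_DIMENSIONS.get(model)
    if expected is None:
        return  # unknown model: nothing to compare against
    if dimension != expected:
        raise ValueError(
            f"{model} produces {expected}-dimensional embeddings, got {dimension}"
        )


# Same values as in the create_collection call above.
check_embedding_dimension("text-embedding-ada-002", 1536)
print("dimension matches")
```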

### ✅ Preparing Data



Next, we will need some data for our new collection. To upload it to Vantage, we need to prepare it in the correct format. For this, we'll use the *pandas* library. Ensure it is installed before moving forward.

In [23]:
pip install pandas==1.5.3



In [24]:
import pandas as pd

[In progress] Downloading the data.

In [25]:
furniture_data = pd.read_parquet("vantage_furniture_tutorial.parquet")
furniture_data.sample()

| | id | text | meta_category | meta_rating_bucket | meta_numratings_bucket | noop_url | noop_rating | noop_numratings | noop_image_url | noop_description | noop_title |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 5077 | 6067e3a06df79045d50077c83878424f | The Euclid / Record Console /Customizable Reco... | Console Tables & Cabinets | 5 stars | hundreds | https://www.etsy.com/listing/1231579806/the-eu... | 5.0 | 721.0 | https://i.etsystatic.com/25453291/r/il/3830b3/... | The Euclid / Record Console /Customizable Reco... | The Euclid Record Console Customizable |


[In progress] Before uploading the data, we'll drop the columns we don't need and keep only the ones relevant to this example.

In [26]:
columns_to_keep = ["id", "text", "meta_category", "meta_rating_bucket", "meta_numratings_bucket"]

furniture_prepared = furniture_data[columns_to_keep]
furniture_prepared.head()

| | id | text | meta_category | meta_rating_bucket | meta_numratings_bucket |
| --- | --- | --- | --- | --- | --- |
| 0 | c76532c4c9f16dfd0d5f4ff630a18e20 | Console table made of old solid wood beams joi... | Console Tables & Cabinets | 5 stars | dozens |
| 1 | 545110c7c31fd107f9092c74d44e2aa1 | Narrow Console Table, 9.8" Deep Entry Table ♥ ... | Console Tables & Cabinets | 5 stars | dozens |
| 2 | 2c4a4b1d9c0738907cd4a94c3738bff7 | glass coffee table Do not settle for less when... | | | |
| 3 | 9e11e1bc4cc09ae548e870b3c67882d0 | Linen fabric Floor seating sofa,Off white Beig... | Couches & Loveseats | 5 stars | hundreds |
| 4 | cd20ee1e96cec7b1c4781538bc7ef625 | Coffee Table - South American Walnut, Live Edg... | Coffee & End Tables | 5 stars | |
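The kept columns are `id`, `text`, and everything starting with `meta_`. Instead of listing them by hand, the same selection can be sketched as a rule over the column names. The helper `select_upload_columns` is hypothetical, and the prefix rule is only inferred from the column names in the sample above:

```python
def select_upload_columns(columns, required=("id", "text"), prefix="meta_"):
    """Keep the required columns plus every column starting with `prefix`.

    Hypothetical helper; the prefix rule is an assumption based on the
    column names shown in this notebook.
    """
    return [c for c in columns if c in required or c.startswith(prefix)]


# Column names copied from the sample output above.
all_columns = [
    "id", "text", "meta_category", "meta_rating_bucket",
    "meta_numratings_bucket", "noop_url", "noop_rating",
    "noop_numratings", "noop_image_url", "noop_description", "noop_title",
]

columns_to_keep = select_upload_columns(all_columns)
print(columns_to_keep)
```

In the notebook, `all_columns` would be `list(furniture_data.columns)`, and the result can be used as `furniture_data[columns_to_keep]`.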


Let's convert our furniture DataFrame into the JSONL format expected by the upload method and store the result in the `documents` variable.

In [27]:
documents = furniture_prepared.to_json(path_or_buf=None, orient='records', lines=True)
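With `orient='records'` and `lines=True`, `to_json` returns one JSON object per line, and missing values are written as `null`. Before uploading, it can be useful to sanity-check that string. The sketch below does this with a hypothetical `validate_jsonl` helper. Treating `id` and `text` as required fields is an assumption made for illustration, not a documented rule of the SDK:

```python
import json


def validate_jsonl(documents: str, required_fields=("id", "text")) -> int:
    """Parse every line of a JSONL string and return the number of records.

    Hypothetical helper; the required fields are an assumption.
    """
    count = 0
    for line_number, line in enumerate(documents.split("\n"), start=1):
        if not line.strip():
            continue  # ignore blank lines such as a trailing newline
        record = json.loads(line)  # raises json.JSONDecodeError on malformed JSON
        if not isinstance(record, dict):
            raise ValueError(f"line {line_number}: expected a JSON object")
        missing = [f for f in required_fields if record.get(f) in (None, "")]
        if missing:
            raise ValueError(f"line {line_number}: missing fields {missing}")
        count += 1
    return count


# Two hand-written records in the same shape as the prepared data.
sample = (
    '{"id":"a1","text":"glass coffee table","meta_category":null}\n'
    '{"id":"b2","text":"oak console table","meta_category":"Console Tables & Cabinets"}\n'
)
print(validate_jsonl(sample))
```

In the notebook, the call would be `validate_jsonl(documents)`.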

### ✅ Uploading Data

Now we're ready to upload our prepared data using the `upload_documents_from_jsonl` method. We just need to specify our `collection_id` and pass our `documents`.

> You can also upload the data from a file path. Check our [Documents Upload notebook](https://github.com/VantageDiscovery/vantage-sdk-python/blob/develop/examples/notebooks/getting_started/management_api/documents_upload.ipynb) for more details.

In [28]:
vantage_instance.upload_documents_from_jsonl(
    collection_id = COLLECTION_ID,
    documents = documents
)
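If you prefer to send a large dataset in smaller requests, the JSONL string can be split into batches first. This is only a sketch: the helper `iter_jsonl_batches` is hypothetical, and this notebook does not say whether the SDK imposes any size limit on a single upload.

```python
from typing import Iterator


def iter_jsonl_batches(documents: str, batch_size: int) -> Iterator[str]:
    """Yield JSONL strings containing at most `batch_size` records each.

    Hypothetical helper; it is not part of the Vantage SDK.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    lines = [line for line in documents.split("\n") if line.strip()]
    for start in range(0, len(lines), batch_size):
        yield "\n".join(lines[start:start + batch_size])


# Five tiny records, split into batches of two.
sample = "\n".join(f'{{"id": "{i}", "text": "item {i}"}}' for i in range(5))

for batch in iter_jsonl_batches(sample, batch_size=2):
    # In the notebook, each batch could be passed as `documents` to
    # vantage_instance.upload_documents_from_jsonl, as in the cell above.
    print(batch.count("\n") + 1)
```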

### ✅ Querying Collection

In this example, we'll show how to use the `semantic_search` feature to query our collection.

We'll enter our query in the `text` field, specify the collection we want to search by providing its `collection_id`, and set up the Vantage API key (`vantage_api_key`).

The Vantage API key can be found in the Vantage Console UI or retrieved programmatically through the SDK's `get_vantage_api_keys` method. We'll use the second option in this example.

In [None]:
vantage_api_keys = vantage_instance.get_vantage_api_keys()
vantage_api_keys

In [30]:
VANTAGE_API_KEY = "YOUR_VANTAGE_API_KEY"

In [34]:
# result = vantage_instance.semantic_search(
#     text = "glass coffee table",
#     collection_id = COLLECTION_ID,
#     vantage_api_key = VANTAGE_API_KEY
# )

[In progress]