# Getting Started with Vantage: Collection Management

Welcome to the Collection Management part of our [Getting Started with Vantage](https://github.com/VantageDiscovery/vantage-tutorials/tree/main/examples/sdk/python/notebooks/getting_started) series.

A Collection is the fundamental object in the Vantage platform. It lets you organize, manage, and search your data sets.

Your data records, called documents, are ingested into a collection. Your search queries run against a collection. We currently support text data in collections, but we will soon support other types of data as well.

When creating a collection, you give it an ID, a name, and specify some parameters for the AI model that will be used to embed your collection data.

You can create many collections in your account to keep the data sets you want to search separate from one another.

Let's see how we can leverage these functionalities using our SDK!

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/VantageDiscovery/vantage-tutorials/blob/main/examples/sdk/python/notebooks/getting_started/management_api/collection_management.ipynb)

### ✅ Installation

The first step involves installing the [Vantage](https://pypi.org/project/vantage-sdk/) package.

In [None]:
! pip install vantage-sdk -qU

As usual, let's import the necessary libraries.

In this example we will need just the `os` library to load our environment variables:

In [None]:
import os

### ✅ Initialization

In this example, we will authenticate using a Vantage API Key.
For additional details on initializing the Vantage client, refer first to the [notebook](../initializing_the_client.ipynb) that covers this topic.

Please update the following two cells with the appropriate values.

In [None]:
ACCOUNT_ID = "YOUR_ACCOUNT_ID"
API_HOST = "https://api.dev-a.dev.vantagediscovery.com"

In [None]:
%env VANTAGE_API_KEY=YOUR_VANTAGE_API_KEY

In [None]:
from vantage_sdk import VantageClient

vantage_instance = VantageClient.using_vantage_api_key(
    vantage_api_key=os.environ["VANTAGE_API_KEY"],
    account_id=ACCOUNT_ID,
    api_host=API_HOST,
)

### ✅ Collection Management

The Collection Management API lets you list your collections, create new ones, update existing ones, and delete them.

In the following cells, you will find more details on each of these functionalities.

#### Get All Collections

- Easily access all your collections by calling the `list_collections` method.

In [7]:
collections = vantage_instance.list_collections()
collections

[]
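Since a collection ID must be unique within your account (see Create Collection below), it can be useful to check the listing before creating a new collection. The following is a minimal sketch, not part of the SDK: `collection_exists` is a hypothetical helper, and it assumes each listed item exposes a `collection_id` attribute, as the `Collection` objects printed in this notebook do. Stand-in objects are used so the snippet runs without an account.

```python
from types import SimpleNamespace


def collection_exists(collections, collection_id):
    """Return True if any listed collection has the given ID (hypothetical helper)."""
    return any(
        getattr(collection, "collection_id", None) == collection_id
        for collection in collections
    )


# Stand-in objects for illustration only; with the SDK you would pass the
# result of vantage_instance.list_collections() instead.
sample_collections = [SimpleNamespace(collection_id="vantage-managed")]

print(collection_exists(sample_collections, "vantage-managed"))  # True
print(collection_exists(sample_collections, "user-provided"))    # False
```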

#### Create Collection

In Vantage, there are two types of collections you can create:
- Vantage Managed Embeddings (VME), where embeddings are managed by Vantage, and
- User-Provided Embeddings (UPE), where the user supplies the embeddings.

For both types, you must supply parameters such as `collection_id`, `collection_name`, and `embeddings_dimension`. Further details are provided below.

- Collection ID
     - A `collection_id` is used when you upload and search your data through our API or Console. The collection ID tells the Vantage platform which one of your collections you want to search. The ID must be unique within your account. A few rules apply when choosing a collection ID:

          - Characters: the collection ID can only contain lowercase letters [a-z], digits [0-9], and hyphens [-]
          - Length: the maximum length for a collection ID is 36 characters
          - Immutable: the collection ID cannot be changed after the collection is created

- Collection Name
     - A `collection_name` is your easy and descriptive way to identify your different collections in the Console. There are a few rules when naming a collection:

          - Length: the maximum length for a collection name is 256 characters
          - Mutable: the collection name can be renamed after the collection is created

- Embedding Dimension
     - The dimension of the embeddings used during the search. It must align with the embedding dimension specified for the chosen `llm` in the context of VME collections. In the case of UPE collections, it should match the dimensions of the embeddings provided by the user.
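The naming rules above can be checked locally before calling the API. The sketch below is an illustration only: the function names are hypothetical and not part of the SDK, and the requirement that IDs and names be non-empty is an assumption that the rules above do not state.

```python
import re

# Rules as listed above: lowercase letters, digits, and hyphens; at most 36 characters.
# Requiring at least one character is an assumption.
_COLLECTION_ID_PATTERN = re.compile(r"[a-z0-9-]{1,36}")
MAX_COLLECTION_NAME_LENGTH = 256


def is_valid_collection_id(collection_id: str) -> bool:
    """Check a candidate collection ID against the character and length rules."""
    return _COLLECTION_ID_PATTERN.fullmatch(collection_id) is not None


def is_valid_collection_name(collection_name: str) -> bool:
    """Check a candidate collection name against the length rule."""
    return 0 < len(collection_name) <= MAX_COLLECTION_NAME_LENGTH


print(is_valid_collection_id("vantage-managed"))  # True
print(is_valid_collection_id("Vantage_Managed"))  # False: uppercase and underscore
```

Running a check like this first gives a clear local error instead of a failed API request.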

**Vantage Managed Embeddings (VME)**

- By far the most common case is to have the Vantage platform manage the translation of your data to AI embeddings. This means that during ingestion and search, the Vantage platform will automate the translation of your data and search queries into embeddings to support semantic search. We call this Vantage Managed Embeddings (VME).
- To use this option, set the `user_provided_embeddings` field to `False`, which is also the default.

- For VME collections you will be required to enter:

     - LLM model (`llm`): Select or enter the name of the model that you'll use from your LLM provider.
     - LLM API key (`external_key_id`): Your LLM provider API key. The Vantage platform securely stores and uses this key on your behalf to embed your data and your search queries.

First, let's get the `external_key_id`. For more details, check the [External API Keys notebook](https://github.com/VantageDiscovery/vantage-tutorials/blob/main/examples/sdk/python/notebooks/getting_started/management_api/external_api_keys.ipynb).

In [8]:
external_api_keys = vantage_instance.get_external_api_keys()
external_api_keys

[ExternalAPIKey(external_key_id='8bc8369f-14b5-47c7-9aa2-88150b42e6b5', account_id='jelena1', external_key_created_date='2024-02-13T11:18:54', url=None, llm_provider='OpenAI', llm_secret='sk-YE**********************************************')]
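Rather than copying the ID by hand from the output, you could select it by provider. This is a sketch under assumptions: `find_external_key_id` is a hypothetical helper, and it relies only on the `external_key_id` and `llm_provider` attributes visible in the output above. The `StandInExternalKey` class and its ID value are placeholders so the snippet runs without an account.

```python
from dataclasses import dataclass


@dataclass
class StandInExternalKey:
    """Placeholder mirroring two attributes of the ExternalAPIKey output above."""
    external_key_id: str
    llm_provider: str


def find_external_key_id(keys, llm_provider):
    """Return the ID of the first key registered for the given provider."""
    for key in keys:
        if key.llm_provider == llm_provider:
            return key.external_key_id
    raise LookupError(f"No external API key found for provider {llm_provider!r}")


# With the SDK you would pass vantage_instance.get_external_api_keys() instead.
sample_keys = [StandInExternalKey("example-key-id", "OpenAI")]
print(find_external_key_id(sample_keys, "OpenAI"))  # example-key-id
```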

Now, let's use it in our `create_collection` method.

In [9]:
vme_collection_id = "vantage-managed"

vme_collection = vantage_instance.create_collection(
    collection_id=vme_collection_id,
    collection_name="New Collection",
    embeddings_dimension=1536,
    user_provided_embeddings=False,
    llm="text-embedding-ada-002",
    external_key_id="8bc8369f-14b5-47c7-9aa2-88150b42e6b5",
)

vme_collection

Collection(collection_created_time='2024-02-13T11:28:31', collection_status='Pending', collection_state='Active', collection_id='vantage-managed', user_provided_embeddings=False, llm='text-embedding-ada-002', embeddings_dimension=1536, external_key_id=None, collection_name='New Collection', collection_preview_url_pattern=None)

**User-Provided Embeddings (UPE)**

- A less common, but supported, option is for you to upload embeddings from the LLM of your choice into your collection. We call this User-Provided Embeddings (UPE). When creating a collection with UPE, no additional LLM configuration is necessary.
- To use this option, set the `user_provided_embeddings` field to `True`.

- In this mode, instead of uploading text data, you embed your data yourself (text, images, and so on) and send the embeddings to the Vantage platform. You must also provide the embedding for every search query sent to the Vantage platform. The platform supports embedding dimensions up to 2048. If you need higher dimensions, please contact support.

> ❗ *The semantic search by text query endpoint will not be available for UPE collections.*
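Because you supply every vector yourself in this mode, a local sanity check before uploading or searching can catch dimension mismatches early. The sketch below is illustrative only: `validate_embedding` is a hypothetical helper, not an SDK function. The 2048 limit is the one stated above.

```python
MAX_EMBEDDING_DIMENSION = 2048  # platform limit stated above


def validate_embedding(embedding, embeddings_dimension):
    """Raise ValueError if the vector cannot belong to a collection of this dimension."""
    if embeddings_dimension > MAX_EMBEDDING_DIMENSION:
        raise ValueError(
            f"Dimension {embeddings_dimension} exceeds the supported "
            f"maximum of {MAX_EMBEDDING_DIMENSION}"
        )
    if len(embedding) != embeddings_dimension:
        raise ValueError(
            f"Embedding has {len(embedding)} values, expected {embeddings_dimension}"
        )


# A 1536-value vector matches a collection created with embeddings_dimension=1536.
validate_embedding([0.0] * 1536, 1536)  # passes silently
```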

In [10]:
upe_collection_id = "user-provided"

upe_collection = vantage_instance.create_collection(
    collection_id=upe_collection_id,
    collection_name="New Collection",
    embeddings_dimension=1536,
    user_provided_embeddings=True,
)

upe_collection

Collection(collection_created_time='2024-02-13T11:28:32', collection_status='Pending', collection_state='Active', collection_id='user-provided', user_provided_embeddings=True, llm=None, embeddings_dimension=1536, external_key_id=None, collection_name='New Collection', collection_preview_url_pattern=None)

#### Get One Collection

- Easily access your collection by providing its `collection_id`.

In [11]:
collection = vantage_instance.get_collection(
    collection_id=vme_collection_id,
)

collection

Collection(collection_created_time='2024-02-13T11:28:31', collection_status='Pending', collection_state='Active', collection_id='vantage-managed', user_provided_embeddings=False, llm='text-embedding-ada-002', embeddings_dimension=1536, external_key_id=None, collection_name='New Collection', collection_preview_url_pattern=None)

#### Update Collection

- Easily update your collection by providing its `collection_id` along with the specific fields you wish to update.
  - Currently, it is possible to change the `collection_name`.

In [12]:
updated_collection = vantage_instance.update_collection(
    collection_id=vme_collection_id,
    collection_name="Updated Collection Name",
)

updated_collection

Collection(collection_created_time='2024-02-13T11:28:31', collection_status='Pending', collection_state='Active', collection_id='vantage-managed', user_provided_embeddings=False, llm='text-embedding-ada-002', embeddings_dimension=1536, external_key_id=None, collection_name='Updated Collection Name', collection_preview_url_pattern=None)

#### Delete Collection

- Easily delete your collection by providing its `collection_id`.

In [14]:
vantage_instance.delete_collection(
    collection_id=vme_collection_id,
)
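The cell above removes only the VME collection, so the UPE collection created earlier still exists. One way to clean up several collections is sketched below. `delete_collections` is a hypothetical helper that assumes only that the client offers the `delete_collection(collection_id=...)` call used in this notebook. A recording stand-in client is used so the snippet runs without an account.

```python
def delete_collections(client, collection_ids):
    """Delete each collection in turn and return the IDs that were requested."""
    deleted = []
    for collection_id in collection_ids:
        client.delete_collection(collection_id=collection_id)
        deleted.append(collection_id)
    return deleted


class RecordingClient:
    """Stand-in for the Vantage client; it only records the requested deletions."""

    def __init__(self):
        self.calls = []

    def delete_collection(self, collection_id):
        self.calls.append(collection_id)


# With the SDK you would pass vantage_instance instead of the stand-in.
stand_in_client = RecordingClient()
print(delete_collections(stand_in_client, ["vantage-managed", "user-provided"]))
```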

## 📌 Next Steps

You are now familiar with the Collection Management endpoints! 

You can take a look at other notebooks from our [Getting Started with Vantage](https://github.com/VantageDiscovery/vantage-tutorials/tree/main/examples/sdk/python/notebooks/getting_started) series or continue using Vantage on your own.

If you need some ideas, check our [Tutorials](https://docs.vantagediscovery.com/docs/tutorials), where you can find inspiration and best practices for using Vantage.

Happy discovering! 🔎
