# Getting Started with Vantage: Semantic Search

Welcome to the Semantic Search part of our [Getting Started with Vantage](https://github.com/VantageDiscovery/vantage-tutorials/tree/main/examples/sdk/python/notebooks/getting_started) series.

This notebook demonstrates the semantic search capabilities of the Vantage SDK and shows you how to use them effectively.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/VantageDiscovery/vantage-tutorials/blob/main/examples/sdk/python/notebooks/getting_started/search_api/semantic_search.ipynb)

### ✅ Installation

The first step involves installing the [Vantage](https://pypi.org/project/vantage-sdk/) package.

In [None]:
! pip install vantage-sdk -qU

As usual, let's import the necessary libraries.

In this example, we only need the `os` module to read our environment variables:

In [1]:
import os

### ✅ Initialization

In this example, we will authenticate using a Vantage API key.
For more details on initializing the Vantage client, see the [notebook](../initializing_the_client.ipynb) dedicated to that topic.

Please update the following two cells with the appropriate values.

In [2]:
ACCOUNT_ID = "YOUR_ACCOUNT_ID"
API_HOST = "https://api.dev-a.dev.vantagediscovery.com"

In [None]:
%env VANTAGE_API_KEY=VANTAGE_API_KEY

In [4]:
from vantage_sdk import VantageClient

vantage_instance = VantageClient.using_vantage_api_key(
    vantage_api_key=os.environ["VANTAGE_API_KEY"],
    account_id=ACCOUNT_ID,
    api_host=API_HOST,
)
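The cell above reads the key with `os.environ["VANTAGE_API_KEY"]`, which fails with a bare `KeyError` if the variable is missing, and silently accepts the unchanged placeholder from the `%env` cell. A minimal sketch of a friendlier check, using a hypothetical helper `require_env` (not part of the Vantage SDK), could look like this:

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    # An unchanged placeholder such as `%env VANTAGE_API_KEY=VANTAGE_API_KEY`
    # leaves the value equal to the variable name, so treat that as unset too.
    if not value or value == name:
        raise RuntimeError(
            f"Environment variable {name} is not set. "
            f"Set it (for example with `%env {name}=<your key>`) before creating the client."
        )
    return value
```

You would then pass `vantage_api_key=require_env("VANTAGE_API_KEY")` to `VantageClient.using_vantage_api_key` instead of indexing `os.environ` directly.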

## ✅ Semantic Search

Before we can search, we need a collection with some data in it. We will create a sample collection and upload a few sample documents to it.

Semantic search requires a collection with Vantage-managed embeddings; collections with user-provided embeddings do not support it.

In [18]:
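COLLECTION_ID = "semantic-search-vme-collection"
# text-embedding-ada-002 produces 1536-dimensional embeddings
EMBEDDINGS_DIMENSION = 1536

collection = vantage_instance.create_collection(
    collection_id=COLLECTION_ID,
    embeddings_dimension=EMBEDDINGS_DIMENSION,
    # False: Vantage generates the embeddings for this collection
    user_provided_embeddings=False,
    llm="text-embedding-ada-002",
    # Replace with the ID of the external API key registered in your Vantage account
    external_key_id="YOUR_EXTERNAL_KEY_ID",
)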
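`EMBEDDINGS_DIMENSION` has to match the output size of the model named in `llm`, and `text-embedding-ada-002` produces 1536-dimensional vectors. A small pre-flight check can catch a mismatch before the collection is created. The sketch below is an assumption-laden illustration: the lookup table and the `check_embedding_dimension` helper are hypothetical, and the table only lists the model used in this notebook.

```python
# Hypothetical lookup of known output dimensions; extend it for the models you use.
KNOWN_EMBEDDING_DIMENSIONS = {
    "text-embedding-ada-002": 1536,
}


def check_embedding_dimension(llm: str, dimension: int) -> None:
    """Raise ValueError if `dimension` disagrees with the known output size of `llm`."""
    expected = KNOWN_EMBEDDING_DIMENSIONS.get(llm)
    if expected is None:
        # Unknown model: nothing to compare against, so accept the value as given.
        return
    if dimension != expected:
        raise ValueError(
            f"{llm} produces {expected}-dimensional embeddings, "
            f"but the collection is configured with {dimension}."
        )


# Passes silently for the configuration used in this notebook.
check_embedding_dimension("text-embedding-ada-002", 1536)
```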

In [19]:
sample_documents = [
    {"id": "first_doc", "text": "Water boils at 100 degrees Celsius under standard atmospheric conditions, turning from a liquid to a gas."},
    {"id": "second_doc", "text": "Eating a diet rich in fruits and vegetables is linked to a reduced risk of many lifestyle-related health conditions."},
    {"id": "third_doc", "text": "Polar bears primarily live in the Arctic Circle, surrounded by sea ice from which they hunt seals."},
    {"id": "fourth_doc", "text": "The Great Wall of China, built between the 5th century BC and the 16th century, is over 13,000 miles long."},
]

In [20]:
import json

# JSONL: one JSON-serialized document per line
DOCUMENTS_JSONL = "\n".join(json.dumps(doc) for doc in sample_documents)
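JSONL (JSON Lines) holds one JSON object per line, which is why the cell joins the serialized documents with a newline. The sketch below also checks that each document has the `id` and `text` fields used in this notebook and confirms that the result round-trips. `to_jsonl` is a hypothetical helper, and the required fields are taken from the sample data here rather than from a documented Vantage schema.

```python
import json


def to_jsonl(documents: list, required: tuple = ("id", "text")) -> str:
    """Serialize documents to a JSONL string: one JSON object per line."""
    for index, doc in enumerate(documents):
        missing = [field for field in required if field not in doc]
        if missing:
            raise ValueError(f"Document {index} is missing required fields: {missing}")
    # json.dumps escapes newlines inside strings, so every document stays on one line.
    return "\n".join(json.dumps(doc) for doc in documents)


docs = [
    {"id": "a", "text": "Polar bears primarily live in the Arctic Circle."},
    {"id": "b", "text": "A text field\nwith a line break."},
]
jsonl = to_jsonl(docs)

# Round trip: parsing each line gives back the original documents.
print([json.loads(line) for line in jsonl.splitlines()] == docs)
```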

In [21]:
vantage_instance.upload_documents_from_jsonl(
    collection_id=COLLECTION_ID,
    documents=DOCUMENTS_JSONL,
)

With the collection created and populated, we can now search it. Our `QUERY_TEXT` asks a question whose answer appears in the fourth document, so we expect that document to be the top match.

In [23]:
QUERY_TEXT = "How long is the Great Wall of China?"

response = vantage_instance.semantic_search(
    collection_id=COLLECTION_ID,
    text=QUERY_TEXT,
)
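Conceptually, semantic search embeds the query with the same model used for the documents and ranks documents by how similar their vectors are. The sketch below illustrates that idea with cosine similarity over tiny, made-up 3-dimensional vectors. It is not Vantage's implementation: the vectors are invented for illustration, and the exact scoring function Vantage uses is an assumption here, not something this notebook specifies.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between vectors a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Made-up toy "embeddings"; real ones would come from the embedding model.
doc_vectors = {
    "first_doc": [0.9, 0.1, 0.0],
    "third_doc": [0.1, 0.9, 0.1],
    "fourth_doc": [0.1, 0.2, 0.95],
}
query_vector = [0.0, 0.1, 1.0]

# Rank documents from most to least similar to the query.
ranked = sorted(
    ((doc_id, cosine_similarity(query_vector, vec)) for doc_id, vec in doc_vectors.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
for doc_id, score in ranked:
    print(doc_id, round(score, 3))
```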

In [24]:
for res in response.results:
    print(res)

id='fourth_doc' score=0.955161452293396
id='first_doc' score=0.8515287041664124
id='third_doc' score=0.8513298034667969


The results show that the fourth document, which describes the Great Wall of China, has the highest score (about 0.955) and is therefore the closest match to our question, as expected.
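In practice you often want just the best match, or only the results that score close to it. The sketch below works on the scores printed above. `SearchResult` is a hypothetical stand-in for the SDK's result objects that mirrors only the `id` and `score` attributes visible in the output, and the `0.05` margin is an arbitrary illustrative value, not a Vantage recommendation. With a real response you would pass `response.results` instead.

```python
from dataclasses import dataclass


@dataclass
class SearchResult:
    """Hypothetical stand-in exposing the `id` and `score` fields seen in the printed results."""
    id: str
    score: float


def results_near_top(results, margin: float = 0.05):
    """Return results whose score is within `margin` of the best score, best first."""
    ranked = sorted(results, key=lambda r: r.score, reverse=True)
    if not ranked:
        return []
    best = ranked[0].score
    return [r for r in ranked if best - r.score <= margin]


# Scores copied from the search output above.
results = [
    SearchResult("fourth_doc", 0.955161452293396),
    SearchResult("first_doc", 0.8515287041664124),
    SearchResult("third_doc", 0.8513298034667969),
]
print([r.id for r in results_near_top(results)])
```

A cut-off relative to the top score is used here because, in the output above, even the unrelated documents score around 0.85, which makes a fixed absolute threshold hard to choose.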

Semantic search becomes more useful as your collection grows and your documents carry more context. It applies to many domains, such as e-commerce, customer support, legal and healthcare document retrieval, and recruitment.

## 📌 Next Steps

You are now familiar with semantic search in Vantage!

Take a look at the other notebooks in our [Getting Started with Vantage](https://github.com/VantageDiscovery/vantage-tutorials/tree/main/examples/sdk/python/notebooks/getting_started) series, or continue using Vantage on your own.

If you need some ideas, check out our [Tutorials](https://docs.vantagediscovery.com/docs/tutorials) for inspiration and best practices for using Vantage.

Happy discovering! 🔎
