# Getting Started with Vantage: Documents Upload

Welcome to the Documents Upload part of our [Getting Started with Vantage](https://github.com/VantageDiscovery/vantage-tutorials/tree/main/examples/sdk/python/notebooks/getting_started) series.

An important part of the Vantage platform is data ingestion, the process of populating collections with documents. You can upload your data through the Console UI in [Parquet format](https://docs.vantagediscovery.com/docs/vantage-parquet-format), or through the SDK in [JSONL format](https://docs.vantagediscovery.com/docs/vantage-jsonl-format). This notebook covers the SDK route.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/VantageDiscovery/vantage-tutorials/blob/main/examples/sdk/python/notebooks/getting_started/management_api/documents_upload.ipynb)

### ✅ Installation

The first step involves installing the [Vantage](https://pypi.org/project/vantage-sdk/) package.

In [None]:
! pip install vantage-sdk -qU

As usual, let's import the necessary libraries.

In this example we will need just the `os` library to load our environment variables:

In [None]:
import os

### ✅ Initialization

In this example, we will authenticate using a Vantage API Key.
For more details on initializing the Vantage client, see the dedicated [notebook](../initializing_the_client.ipynb) on that topic.

Please update the following two cells with the appropriate values.

In [None]:
ACCOUNT_ID = "YOUR_ACCOUNT_ID"
API_HOST = "https://api.dev-a.dev.vantagediscovery.com"

In [None]:
%env VANTAGE_API_KEY=VANTAGE_API_KEY
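
The `%env` line above sets the variable to the literal placeholder `VANTAGE_API_KEY` until you edit it, so it can help to check the value before creating the client. The following is a minimal sketch of such a check; `read_api_key` and the `DEMO_API_KEY` variable are hypothetical names used only for illustration and are not part of the Vantage SDK.

```python
import os


def read_api_key(variable_name="VANTAGE_API_KEY"):
    """Return the key stored in the environment.

    Raises RuntimeError if the variable is missing, empty, or still equal
    to its own name (the unedited placeholder).
    """
    value = os.environ.get(variable_name, "").strip()
    if not value or value == variable_name:
        raise RuntimeError(
            f"Environment variable {variable_name} is not set to a real API key."
        )
    return value


# Demonstration with a hypothetical variable, so the real key is untouched.
os.environ["DEMO_API_KEY"] = "example-key-123"
print(read_api_key("DEMO_API_KEY"))
```

With this helper in place, you could pass `read_api_key()` as the API key instead of reading `os.environ` directly, and get a clear error if the placeholder was never replaced.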

In [None]:
from vantage_sdk import VantageClient

vantage_instance = VantageClient.using_vantage_api_key(
    vantage_api_key=os.environ["VANTAGE_API_KEY"],
    account_id=ACCOUNT_ID,
    api_host=API_HOST,
)

### ✅ Documents Upload

To upload successfully, your documents must follow the [Vantage Ingestion format](https://docs.vantagediscovery.com/docs/vantage-ingest-format), which is described in our documentation.

In this example, we will upload JSONL documents to a user-provided embeddings (UPE) collection, which we create in the following cell.

In [None]:
EMBEDDINGS_DIMENSION = 3
COLLECTION_ID = "documents-upload-notebook"

collection = vantage_instance.create_collection(
    collection_id=COLLECTION_ID,
    embeddings_dimension=EMBEDDINGS_DIMENSION,
    user_provided_embeddings=True,
)

#### Upload Documents from JSONL

- As mentioned, documents must adhere to a specific format, and the following cell provides an example. We create two documents with embeddings of size `EMBEDDINGS_DIMENSION` to match the collection created above.

In [None]:
documents_list = [
    {"id": "1", "text": "Example text", "meta_color": "green", "embeddings": [1,2,3]},
    {"id": "2", "text": "Sample text", "meta_color": "blue", "embeddings": [4,5,6]},
]
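
Because the collection is created with a fixed embeddings dimension, it can be useful to confirm locally that every document matches it before uploading. The sketch below is one possible check in plain Python; `find_dimension_mismatches` and the sample data are hypothetical and independent of the Vantage SDK, and the second sample document is deliberately one value short.

```python
EXPECTED_DIMENSION = 3  # mirrors EMBEDDINGS_DIMENSION used for the collection

# Sample data for illustration only; document "2" has only two values.
sample_documents = [
    {"id": "1", "text": "Example text", "meta_color": "green", "embeddings": [1, 2, 3]},
    {"id": "2", "text": "Sample text", "meta_color": "blue", "embeddings": [4, 5]},
]


def find_dimension_mismatches(documents, expected_dimension):
    """Return the ids of documents whose embeddings length differs from expected_dimension."""
    return [
        doc.get("id")
        for doc in documents
        if len(doc.get("embeddings", [])) != expected_dimension
    ]


mismatched_ids = find_dimension_mismatches(sample_documents, EXPECTED_DIMENSION)
print(mismatched_ids)  # ['2']
```

An empty list means every document has the expected embeddings length.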

In [None]:
import json

DOCUMENTS_JSONL = "\n".join(json.dumps(doc) for doc in documents_list)
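
JSONL means one JSON object per line, so a quick way to sanity-check a string like the one built above is to parse it back line by line and compare it with the source list. The following sketch assumes nothing beyond the standard `json` module; `parse_jsonl` and the sample data are hypothetical names for illustration.

```python
import json

# Sample data for illustration only.
sample_documents = [
    {"id": "1", "text": "Example text", "meta_color": "green", "embeddings": [1, 2, 3]},
    {"id": "2", "text": "Sample text", "meta_color": "blue", "embeddings": [4, 5, 6]},
]

# Serialize each document on its own line, as in the cell above.
sample_jsonl = "\n".join(json.dumps(doc) for doc in sample_documents)


def parse_jsonl(jsonl_text):
    """Parse a JSONL string into a list of objects, skipping blank lines."""
    return [json.loads(line) for line in jsonl_text.split("\n") if line.strip()]


round_trip = parse_jsonl(sample_jsonl)
print(round_trip == sample_documents)  # True
```

`json.dumps` escapes newlines inside string values, so each document stays on a single line even if its text contains line breaks.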

In [None]:
vantage_instance.upload_documents_from_jsonl(
    collection_id=COLLECTION_ID,
    documents=DOCUMENTS_JSONL,
)

#### Upload Documents from path

- Another scenario involves uploading your data using a file. In this case, you must ensure your data is properly formatted in the file before uploading it. We will demonstrate this by writing our `documents_list` to a JSONL file and using the Python SDK function to upload it.

In [None]:
import json

DOCUMENTS_FILE_PATH = "vantage_documents.jsonl"

with open(DOCUMENTS_FILE_PATH, "w") as documents_file:
    for doc in documents_list:
        json.dump(doc, documents_file)
        documents_file.write('\n')
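
Before uploading a file, you may want to confirm that every non-empty line in it is valid JSON and that the record count is what you expect. The sketch below writes sample data to a temporary file and reads it back; `count_jsonl_records`, the sample data, and the file name are hypothetical and not part of the Vantage SDK.

```python
import json
import os
import tempfile


def count_jsonl_records(file_path):
    """Count the JSON records in a JSONL file, raising ValueError on an invalid line."""
    count = 0
    with open(file_path, "r", encoding="utf-8") as jsonl_file:
        for line_number, line in enumerate(jsonl_file, start=1):
            if not line.strip():
                continue  # ignore blank lines
            try:
                json.loads(line)
            except json.JSONDecodeError as error:
                raise ValueError(f"Line {line_number} is not valid JSON: {error}") from error
            count += 1
    return count


# Sample data for illustration only.
sample_documents = [
    {"id": "1", "text": "Example text", "meta_color": "green", "embeddings": [1, 2, 3]},
    {"id": "2", "text": "Sample text", "meta_color": "blue", "embeddings": [4, 5, 6]},
]

# Write to a temporary directory so the demonstration leaves no files behind.
with tempfile.TemporaryDirectory() as temp_dir:
    sample_path = os.path.join(temp_dir, "sample_documents.jsonl")
    with open(sample_path, "w", encoding="utf-8") as sample_file:
        for doc in sample_documents:
            sample_file.write(json.dumps(doc) + "\n")
    record_count = count_jsonl_records(sample_path)

print(record_count)  # 2
```

The same function could be pointed at `DOCUMENTS_FILE_PATH` to check the file written above.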

In [None]:
vantage_instance.upload_documents_from_path(
    collection_id=COLLECTION_ID,
    file_path=DOCUMENTS_FILE_PATH,
)

After running the cells above, you can check your collection, identified by `COLLECTION_ID`, in the Console UI to confirm that it is in the `Indexing` status. After a few minutes, you will be able to query it and try our search capabilities. To do so, see our [Search API notebooks](https://github.com/VantageDiscovery/vantage-tutorials/tree/main/examples/sdk/python/notebooks/getting_started/search_api), or visit our [documentation](https://docs.vantagediscovery.com/docs/search) for more information.

## 📌 Next Steps

You are now familiar with the Document Upload endpoints! 

You can take a look at other notebooks from our [Getting Started with Vantage](https://github.com/VantageDiscovery/vantage-tutorials/tree/main/examples/sdk/python/notebooks/getting_started) series or continue using Vantage on your own.

If you need some ideas, check our [Tutorials](https://docs.vantagediscovery.com/docs/tutorials), where you can find inspiration and best practices for using Vantage.

Happy discovering! 🔎

