# Integrate LangChain with Astra DB Serverless

For more information, visit the DataStax [Astra DB docs page](https://docs.datastax.com/en/astra-db-serverless/integrations/langchain.html).

## Prerequisites

- An active Astra account.
- An active [Serverless (Vector) database](https://docs.datastax.com/en/astra-db-serverless/get-started/quickstart.html#create-a-database-and-store-your-credentials).
- An OpenAI account and an [OpenAI API key](https://platform.openai.com/).

_This guide uses OpenAI to generate embeddings. You can get embeddings directly from OpenAI, or you can use Astra DB’s built-in OpenAI embedding provider integration (also known as a "vectorize integration")._

_If you want to use the built-in OpenAI integration, you must [configure the OpenAI embedding provider integration](https://docs.datastax.com/en/astra-db-serverless/integrations/embedding-providers/openai.html) before you begin. In the integration settings, note the **API key name**, and make sure that your database is in the key’s scope._

- The following Python dependencies:

In [None]:
!pip install --quiet \
    "langchain>=0.3,<0.4" \
    "langchain-astradb>=0.6,<0.7" \
    "langchain-openai>=0.3,<0.4"

## Connect to the Serverless (Vector) database

### Import dependencies

In [None]:
import os
import requests
from getpass import getpass

from astrapy.info import VectorServiceOptions
from langchain_astradb import AstraDBVectorStore

from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings

### Set secrets and connection parameters

Get an application token and Data API endpoint for your database:

- In the [Astra Portal](https://astra.datastax.com/) navigation menu, click Databases, and then click the name of your Serverless (Vector) database.
- On the Overview tab, find the Database Details section.
- In API Endpoint, click Copy to get your database’s Data API endpoint in the form of `https://ASTRA_DB_ID-ASTRA_DB_REGION.apps.astra.datastax.com`.
- Click Generate Token to create an [application token](https://docs.datastax.com/en/astra-db-serverless/administration/manage-application-tokens.html) scoped to your database.
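
As a quick sanity check before connecting, the following sketch tests whether a pasted value has the endpoint form shown in the steps above. The helper name `looks_like_data_api_endpoint` and the regular expression are assumptions made for illustration and are not part of `astrapy` or `langchain-astradb`. The pattern assumes a UUID-style database ID followed by a lowercase region name.

```python
import re

# Assumption: the database ID is a UUID and the region is a lowercase
# name such as "us-east-2". This pattern is illustrative only.
_ENDPOINT_RE = re.compile(
    r"^https://"
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
    r"-[a-z0-9-]+"
    r"\.apps\.astra\.datastax\.com/?$"
)


def looks_like_data_api_endpoint(value: str) -> bool:
    """Return True if `value` has the expected Data API endpoint shape."""
    return _ENDPOINT_RE.match(value.strip()) is not None


# A made-up endpoint with the right shape, and a value that lacks it.
print(looks_like_data_api_endpoint(
    "https://01234567-89ab-cdef-0123-456789abcdef-us-east-2.apps.astra.datastax.com"
))
print(looks_like_data_api_endpoint("astra.datastax.com"))
```

A check like this only catches copy-and-paste mistakes. It does not confirm that the database exists or that the token is valid.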

#### Astra DB parameters

In [None]:
os.environ["ASTRA_DB_API_ENDPOINT"] = input("ASTRA_DB_API_ENDPOINT =")
os.environ["ASTRA_DB_APPLICATION_TOKEN"] = getpass("ASTRA_DB_APPLICATION_TOKEN =")

if _keyspace := input("ASTRA_DB_KEYSPACE (optional) ="):
    os.environ["ASTRA_DB_KEYSPACE"] = _keyspace

os.environ["ASTRA_DB_API_KEY_NAME"] = input("ASTRA_DB_API_KEY_NAME (required for 'vectorize') =")

#### OpenAI parameter (Optional)

In [None]:
os.environ["OPENAI_API_KEY"] = getpass("OPENAI_API_KEY (required for explicit embeddings) =")

##### _Additional step for Azure OpenAI_

If you use Microsoft Azure OpenAI, uncomment the following cell and edit as needed to set additional environment variables:

_(Remember that the `OPENAI_API_KEY` provided earlier must be valid for Azure.)_

In [None]:
# os.environ["OPENAI_API_TYPE"] = "azure"
# os.environ["OPENAI_API_VERSION"] = "2023-05-15"
# os.environ["OPENAI_API_BASE"] = input("OPENAI_API_BASE (e.g. 'https://RESOURCE_NAME.openai.azure.com') =")

### Load environment variables

In [None]:
ASTRA_DB_APPLICATION_TOKEN = os.environ["ASTRA_DB_APPLICATION_TOKEN"]
ASTRA_DB_API_ENDPOINT = os.environ["ASTRA_DB_API_ENDPOINT"]
ASTRA_DB_KEYSPACE = os.environ.get("ASTRA_DB_KEYSPACE")
ASTRA_DB_API_KEY_NAME = os.environ.get("ASTRA_DB_API_KEY_NAME") or None

OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY") or None

## Create embeddings from text

### Create a vector store

> **Choose** between [server-side embedding computation](https://docs.datastax.com/en/astra-db-serverless/databases/embedding-generation.html) ("vectorize") or explicit embeddings by editing the following as desired. Then run the cell. _(If using "vectorize", you must have configured an embedding provider in your Astra DB. Conversely, if opting for explicit embeddings, the OpenAI API Key must have been set in the notebook.)_

In [None]:
# Edit if necessary, then run the cell

USE_VECTORIZE = True  # server-side embeddings
# USE_VECTORIZE = False  # explicit embeddings

Depending on the choice of embedding computation, the parameters are slightly different.

When creating the LangChain vector store, you specify the database and a collection name. The collection is created automatically if it does not exist.

In [None]:
if USE_VECTORIZE:
    vectorize_options = VectorServiceOptions(
        provider="openai",  # Change these if using another embedding provider/model
        model_name="text-embedding-3-small",
        authentication={"providerKey": ASTRA_DB_API_KEY_NAME},
    )
    vector_store = AstraDBVectorStore(
        collection_name="langchain_integration_demo_vectorize",
        token=ASTRA_DB_APPLICATION_TOKEN,
        api_endpoint=ASTRA_DB_API_ENDPOINT,
        namespace=ASTRA_DB_KEYSPACE,
        collection_vector_service_options=vectorize_options,
    )

if not USE_VECTORIZE:
    embedding = OpenAIEmbeddings()
    vector_store = AstraDBVectorStore(
        collection_name="langchain_integration_demo",
        embedding=embedding,
        token=ASTRA_DB_APPLICATION_TOKEN,
        api_endpoint=ASTRA_DB_API_ENDPOINT,
        namespace=ASTRA_DB_KEYSPACE,
    )


## If you already have a populated vector collection, try this instead
## (and then skip the load+process+insert phases if you are so inclined):

# vector_store = AstraDBVectorStore(
#     collection_name="INSERT_YOUR_COLLECTION_NAME",
#     embedding=EMBEDDING,  # omit for vectorize; else, must be the same used for the data on DB
#     token=ASTRA_DB_APPLICATION_TOKEN,
#     api_endpoint=ASTRA_DB_API_ENDPOINT,
#     namespace=ASTRA_DB_KEYSPACE,
#     autodetect_collection=True,
# )

### Load data

Load a small dataset of philosophical quotes from this repository.

In [None]:
philo_dataset = requests.get(
    "https://raw.githubusercontent.com/"
    "datastaxdevs/mini-demo-astradb-langchain/"
    "refs/heads/main/data/philosopher-quotes.json"
).json()

print("An example entry:")
print(philo_dataset[16])

### Process dataset

Transform the dataset into ready-to-insert LangChain `Document` objects.

In [None]:
documents_to_insert = []

for entry in philo_dataset:
    metadata = {
        "author": entry["author"],
        **entry["metadata"],
    }
    # Construct the Document, with the quote and metadata tags
    new_document = Document(
        id=entry["_id"],
        page_content=entry["quote"],
        metadata=metadata,
    )
    documents_to_insert.append(new_document)

print(f"Ready to insert {len(documents_to_insert)} documents.")
print(f"Example document: {documents_to_insert[16]}")
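
To make the metadata merge in the loop above easier to follow, here is a minimal sketch that uses plain dictionaries only. The sample entry and the helper name `build_metadata` are hypothetical. Only the field names (`_id`, `author`, `quote`, `metadata`) come from the code above.

```python
# Hypothetical entry, shaped like the fields the processing loop reads.
entry = {
    "_id": "example_001",
    "author": "example_author",
    "quote": "An example quote.",
    "metadata": {"topic_a": "y", "topic_b": "y"},
}


def build_metadata(entry: dict) -> dict:
    """Merge the top-level author with the entry's own metadata tags."""
    # Keys unpacked later win, so a tag named "author" inside
    # entry["metadata"] would override the top-level author.
    return {"author": entry["author"], **entry["metadata"]}


metadata = build_metadata(entry)
print(metadata)
```

Because the author ends up as an ordinary metadata key, it can be used later in metadata filters in the same way as any other tag.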

### Insert documents

This step computes the vector embeddings and saves all entries in the vector store.

In [None]:
inserted_ids = vector_store.add_documents(documents_to_insert)

print(f"\nInserted {len(inserted_ids)} documents: {', '.join(inserted_ids[:3])} ...")

## Verify the integration

Find quotes semantically similar to a given input query.

In [None]:
results = vector_store.similarity_search("Our life is what we make of it", k=3)

for res in results:
    print(f"* {res.page_content} [{res.metadata}]")
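
Conceptually, a similarity search ranks stored vectors by how close they are to the query vector. The following sketch illustrates the idea with cosine similarity over tiny hand-made vectors. The vectors, IDs, and the `top_k` helper are made up for illustration, and cosine similarity is assumed as the metric. Real embeddings have many more dimensions, and the database ranks results server-side rather than with a brute-force loop like this one.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, stored, k):
    """Return the IDs of the k stored vectors most similar to the query."""
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in stored.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]


# Toy "embeddings": doc_a and doc_b point roughly the same way as the query.
stored = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.8, 0.6, 0.0],
    "doc_c": [0.0, 0.0, 1.0],
}
query = [1.0, 0.1, 0.0]

print(top_k(query, stored, k=2))
```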

## Further usage patterns

### Use `add_texts`

You can store documents through `add_texts` and supply three parallel lists for the texts, the metadata and the IDs.

In [None]:
texts = [
    "I think, therefore I am.",
    "To the things themselves!",
]
metadatas = [
    {"author": "descartes", "knowledge": "y"},
    {"author": "husserl", "knowledge": "y"},
]
ids = [
    "desc_999",
    "huss_888",
]
inserted_ids_2 = vector_store.add_texts(texts=texts, metadatas=metadatas, ids=ids)
print(f"\nInserted {len(inserted_ids_2)} documents.")
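
Because the three lists are matched by position, a length mismatch silently misaligns texts, metadata, and IDs. The sketch below shows one way to check the alignment before inserting. The helper name `pair_up` and the sample values are hypothetical, and the check is independent of `add_texts` itself.

```python
def pair_up(texts, metadatas, ids):
    """Zip three parallel lists into per-document records.

    Raises ValueError if the lists do not have the same length.
    """
    if not (len(texts) == len(metadatas) == len(ids)):
        raise ValueError("texts, metadatas and ids must have the same length")
    return [
        {"id": doc_id, "text": text, "metadata": metadata}
        for text, metadata, doc_id in zip(texts, metadatas, ids)
    ]


# Made-up sample values, one entry per document in each list.
sample_texts = ["first text", "second text"]
sample_metadatas = [{"tag": "a"}, {"tag": "b"}]
sample_ids = ["id_1", "id_2"]

records = pair_up(sample_texts, sample_metadatas, sample_ids)
print(records[0])
```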

### Return similarity scores from a search

In [None]:
results = vector_store.similarity_search_with_score("Our life is what we make of it", k=3)
for res, score in results:
    print(f"* [{score:.3f}] {res.page_content} [{res.metadata}]")

### Similarity search with metadata filtering

In [None]:
results = vector_store.similarity_search(
    "Our life is what we make of it",
    k=3,
    filter={"author": "aristotle"},
)
for res in results:
    print(f"* {res.page_content} [{res.metadata}]")

### MMR (maximal marginal relevance) similarity search

In [None]:
results = vector_store.max_marginal_relevance_search(
    "Our life is what we make of it",
    k=3,
    filter={"author": "aristotle"},
)
for res in results:
    print(f"* {res.page_content} [{res.metadata}]")
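
MMR trades relevance to the query against redundancy among the results already picked. The following is a minimal sketch of the general greedy technique and not the library's implementation. The toy vectors and the `mmr_select` helper are made up, and the `lambda_mult` weight is named after the corresponding LangChain search parameter (1.0 means pure relevance, lower values favor diversity).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr_select(query, candidates, k, lambda_mult=0.5):
    """Greedily pick k candidate IDs, balancing relevance and diversity."""
    remaining = dict(candidates)
    selected = []
    while remaining and len(selected) < k:

        def mmr_score(doc_id):
            relevance = cosine_similarity(query, remaining[doc_id])
            # Redundancy: similarity to the closest already-selected item.
            redundancy = max(
                (cosine_similarity(remaining[doc_id], candidates[s]) for s in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        del remaining[best]
    return selected


# doc_a and doc_b are near-duplicates; doc_c is less relevant but different.
candidates = {
    "doc_a": [1.0, 0.0],
    "doc_b": [0.99, 0.14],
    "doc_c": [0.6, 0.8],
}
query = [1.0, 0.2]

print(mmr_select(query, candidates, k=2))                   # balanced
print(mmr_select(query, candidates, k=2, lambda_mult=1.0))  # relevance only
```

With the balanced weight, the second pick is the less similar but more distinct `doc_c`. With `lambda_mult=1.0`, the near-duplicate `doc_a` is picked instead.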

### Delete documents from the store

#### Delete by document ID

In [None]:
delete_1 = vector_store.delete(inserted_ids[:3])
print(f"delete result = {delete_1}")

In [None]:
delete_2 = vector_store.delete(inserted_ids[2:5])
print(f"delete result = {delete_2}")

#### Retrieve and then delete

Sometimes you do not have the IDs, but you might want to run a search and then delete the results:

In [None]:
ids_to_delete = []
for res_doc, res_score, res_id in vector_store.similarity_search_with_score_id(
    "Philosophy has no goals",
    k=2,
):
    print(f"* [SIM={res_score:.3f}] {res_doc.page_content} [{res_doc.metadata}]")
    ids_to_delete.append(res_id)

print(f"\nDeleting IDs = {ids_to_delete} ...")
success = vector_store.delete(ids_to_delete)
print(f"Deletion succeeded = {success}")

Now run the same search again:

In [None]:
for res_doc, res_score, res_id in vector_store.similarity_search_with_score_id(
    "Philosophy has no goals",
    k=2,
):
    print(f"* [SIM={res_score:.3f}] {res_doc.page_content} [{res_doc.metadata}]")

#### Delete **all** stored data

> _Warning: use with caution. Data loss!_

In [None]:
vector_store.clear()

## Cleanup

Completely delete the collection, thereby freeing the associated resources on Astra DB:

> _Warning: use with caution. Data loss!_

In [None]:
vector_store.delete_collection()

## Next steps

- [This quickstart in the DataStax documentation](https://docs.datastax.com/en/astra-db-serverless/integrations/langchain.html)
- [`AstraDBVectorStore` in LangChain docs](https://python.langchain.com/docs/integrations/providers/astradb/#vector-store)
- [`AstraDBVectorStore`, API Reference](https://python.langchain.com/api_reference/astradb/vectorstores/langchain_astradb.vectorstores.AstraDBVectorStore.html#langchain_astradb.vectorstores.AstraDBVectorStore)