### Astra Vector Store

Astra DB is a serverless, AI-ready database built on Apache Cassandra® and exposed through a JSON API. The `langchain-astradb` package lets an Astra DB collection serve as a LangChain vector store.

Ref: https://python.langchain.com/docs/integrations/vectorstores/astradb/

In [31]:
import os
from dotenv import load_dotenv
load_dotenv()

True

In [33]:
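# Connection settings are read from the environment (populated by load_dotenv() above).
ASTRA_DB_API_ENDPOINT = os.getenv("ASTRA_DB_API_ENDPOINT")
ASTRA_DB_APPLICATION_TOKEN = os.getenv("ASTRA_DB_APPLICATION_TOKEN")
ASTRA_DB_KEYSPACE = "samplekeyspace"  # keyspace that will hold the collection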
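
`os.getenv` returns `None` when a variable is missing, so a typo in the `.env` file only surfaces later, inside the database client. A minimal sketch of a fail-fast check follows; the helper name `require_env` is hypothetical and not part of any library.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


# Possible usage, assuming the variables are defined in the loaded .env file:
# ASTRA_DB_API_ENDPOINT = require_env("ASTRA_DB_API_ENDPOINT")
# ASTRA_DB_APPLICATION_TOKEN = require_env("ASTRA_DB_APPLICATION_TOKEN")
```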

In [10]:
from langchain_google_genai import GoogleGenerativeAIEmbeddings

# load_dotenv() has already placed GOOGLE_API_KEY in os.environ, where the client looks for it.
embeddings_google = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
# Embed a test string and check the dimension of the returned vector.
len(embeddings_google.embed_query("Hello AI"))

768

In [26]:
from langchain_text_splitters import CharacterTextSplitter
from langchain_community.document_loaders.text import TextLoader

loader = TextLoader("../../../data/sample.txt")
text_document = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=100, chunk_overlap=20)
texts = text_splitter.split_documents(text_document)
print(texts[0].page_content)
texts

Created a chunk of size 167, which is longer than the specified 100


Langchain is specially used for LLM powered applications.


[Document(metadata={'source': '../../../data/sample.txt'}, page_content='Langchain is specially used for LLM powered applications.'),
 Document(metadata={'source': '../../../data/sample.txt'}, page_content='Langchain  -> Chaining  -> Focus on Sequential Execution of process\n* In Langchain, everything happens in sequential order So we call it as Directed Acyclic Graph(DAG)'),
 Document(metadata={'source': '../../../data/sample.txt'}, page_content='LangGraph: We can create stateful Multi AI Agents Applications where Agents will be communicating with themselves to solve complex workflow.\n* They follow the graph structure and it is also going to maintain information which will be shared within the Nodes that is why it is called stategraph.')]
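
The "Created a chunk of size 167" warning appears because `CharacterTextSplitter` splits only on its separator (a blank line, `"\n\n"`, by default) and then merges pieces up to `chunk_size`; a single piece longer than 100 characters is kept whole. The following is a simplified pure-Python sketch of that behavior, not LangChain's actual implementation, and it ignores `chunk_overlap`. The function name `split_on_separator` and the sample text are hypothetical.

```python
def split_on_separator(text: str, chunk_size: int, separator: str = "\n\n") -> list[str]:
    """Split on `separator`, then greedily merge neighbours while they fit in `chunk_size`."""
    pieces = [p.strip() for p in text.split(separator) if p.strip()]
    chunks: list[str] = []
    current = ""
    for piece in pieces:
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            if current:
                chunks.append(current)
            # A piece longer than chunk_size cannot be split further here, so it is kept whole.
            current = piece
    if current:
        chunks.append(current)
    return chunks


# Made-up sample: one 150-character paragraph between two short ones.
sample = "Short paragraph.\n\n" + "x" * 150 + "\n\nAnother short one."
chunks = split_on_separator(sample, chunk_size=100)
oversized = [c for c in chunks if len(c) > 100]
print([len(c) for c in chunks])  # [16, 150, 18]
```

If smaller chunks are required, a splitter that falls back to finer separators, such as `RecursiveCharacterTextSplitter` from `langchain_text_splitters`, is one option.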

In [34]:
from langchain_astradb import AstraDBVectorStore

vector_store_explicit_embeddings = AstraDBVectorStore(
    collection_name="astra_vector_langchain",
    embedding=embeddings_google,
    api_endpoint=ASTRA_DB_API_ENDPOINT,
    token=ASTRA_DB_APPLICATION_TOKEN,
    namespace=ASTRA_DB_KEYSPACE,
)

In [27]:
vector_store_explicit_embeddings.add_documents(documents=texts)

['76873a60af474f12a0f053d75849ad35',
 '9171fab0e7f5417c91c091c922b65313',
 'c58b9afb63d043c9996e41aa0fe83d85']

In [28]:
results = vector_store_explicit_embeddings.similarity_search(
    "LangChain provides abstractions to make working with LLMs easy",
    k=3)

In [29]:
results

[Document(id='76873a60af474f12a0f053d75849ad35', metadata={'source': '../../../data/sample.txt'}, page_content='Langchain is specially used for LLM powered applications.'),
 Document(id='c58b9afb63d043c9996e41aa0fe83d85', metadata={'source': '../../../data/sample.txt'}, page_content='LangGraph: We can create stateful Multi AI Agents Applications where Agents will be communicating with themselves to solve complex workflow.\n* They follow the graph structure and it is also going to maintain information which will be shared within the Nodes that is why it is called stategraph.'),
 Document(id='9171fab0e7f5417c91c091c922b65313', metadata={'source': '../../../data/sample.txt'}, page_content='Langchain  -> Chaining  -> Focus on Sequential Execution of process\n* In Langchain, everything happens in sequential order So we call it as Directed Acyclic Graph(DAG)')]
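
Conceptually, `similarity_search` embeds the query and returns the `k` stored documents whose vectors are closest to it. The toy sketch below ranks hand-written 3-dimensional vectors by cosine similarity. The vectors and the `top_k` helper are made up for illustration; the real search runs inside Astra DB on the 768-dimensional embeddings, and the metric it uses depends on how the collection is configured.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: list[float], store: dict[str, list[float]], k: int) -> list[tuple[str, float]]:
    """Return the k (doc_id, score) pairs with the highest similarity to the query."""
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in store.items()]
    scored.sort(reverse=True)
    return [(doc_id, score) for score, doc_id in scored[:k]]


# Hypothetical stored vectors standing in for document embeddings.
store = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [1.0, 1.0, 0.0],
}
query = [1.0, 0.1, 0.0]
ranking = top_k(query, store, k=2)
print([doc_id for doc_id, _ in ranking])  # ['doc_a', 'doc_c']
```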

In [30]:
retriever = vector_store_explicit_embeddings.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={"k": 1, "score_threshold": 0.5},
)
# retriever.invoke("Stealing from the bank is a crime", filter={"source": "news"})
retriever.invoke("LangChain provides abstractions to make working with LLMs easy")

[Document(id='76873a60af474f12a0f053d75849ad35', metadata={'source': '../../../data/sample.txt'}, page_content='Langchain is specially used for LLM powered applications.')]
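
With `search_type="similarity_score_threshold"`, the retriever keeps at most `k` results and drops any whose relevance score falls below `score_threshold`. A minimal sketch of that filtering step is shown here with made-up scores, since the actual scores come from the vector store and are not part of the output above; `filter_by_threshold` is a hypothetical helper.

```python
def filter_by_threshold(
    scored_docs: list[tuple[str, float]], k: int, score_threshold: float
) -> list[tuple[str, float]]:
    """Take the k best-scoring documents, then keep those at or above the threshold."""
    ranked = sorted(scored_docs, key=lambda pair: pair[1], reverse=True)
    return [(doc, score) for doc, score in ranked[:k] if score >= score_threshold]


# Hypothetical (document, relevance score) pairs.
scored_docs = [("chunk_1", 0.82), ("chunk_2", 0.61), ("chunk_3", 0.34)]

kept = filter_by_threshold(scored_docs, k=1, score_threshold=0.5)
kept_all = filter_by_threshold(scored_docs, k=3, score_threshold=0.5)
print(kept)      # [('chunk_1', 0.82)]
print(kept_all)  # [('chunk_1', 0.82), ('chunk_2', 0.61)]
```

Raising `k` widens the candidate pool, while raising `score_threshold` tightens the cut-off; a query unrelated to the stored text may therefore return an empty list.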