# How to use a graph vector store to retrieve data

Graph vector stores are a special type of vector store that stores the links between documents together with the documents themselves.
Their retrieval methods can return documents based on both their similarity to a query and the links between documents.

## Get started

For this guide, the following packages are needed:

In [ ]:
%pip install langchain-community langchain-openai python-dotenv ragstack-ai-knowledge-store keybert

We'll use the Cassandra graph vector store implementation with a DataStax Astra DB database.
For this, you'll need to create a free Astra DB account and get your Database ID and token.

The store also needs an embedding model.
We'll use an OpenAI embedding model, so you need an OpenAI account and an OpenAI API key.

In [None]:
import getpass
import os
from dotenv import load_dotenv

load_dotenv()

if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter OpenAI API Key: ")

if "ASTRA_DB_DATABASE_ID" not in os.environ:
    os.environ["ASTRA_DB_DATABASE_ID"] = input("Enter Astra DB Database ID: ")

if "ASTRA_DB_APPLICATION_TOKEN" not in os.environ:
    os.environ["ASTRA_DB_APPLICATION_TOKEN"] = getpass.getpass(
        "Enter Astra DB Application Token: "
    )

if "ASTRA_DB_KEYSPACE" not in os.environ:
    keyspace = input("Enter Astra DB Keyspace (Empty for default): ")
    if keyspace:
        os.environ["ASTRA_DB_KEYSPACE"] = keyspace

We load the State of the Union text and split it into chunks, each of which becomes a document.

In [11]:
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import CharacterTextSplitter

raw_documents = TextLoader('state_of_the_union.txt').load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
documents = text_splitter.split_documents(raw_documents)
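As a rough mental model of the splitting step, the sketch below packs paragraph-sized pieces into chunks of at most `chunk_size` characters with no overlap. It is a simplified illustration, not the `CharacterTextSplitter` implementation: the function name `split_on_separator` and the sample text are hypothetical, and the real splitter handles more cases (overlap, custom length functions, warnings for oversized pieces).

```python
def split_on_separator(text: str, chunk_size: int, separator: str = "\n\n") -> list[str]:
    """Greedily pack separator-delimited pieces into chunks of at most chunk_size characters."""
    chunks: list[str] = []
    current = ""
    for piece in text.split(separator):
        if not piece:
            continue  # skip empty pieces produced by repeated separators
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            # Either the piece fits, or it is a single oversized piece that
            # cannot be split further on this separator, so it is kept whole.
            current = candidate
        else:
            # The piece does not fit: close the current chunk and start a new one.
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


# Hypothetical sample text with four short paragraphs.
sample = "aaaa\n\nbbbb\n\ncccc\n\ndddd"
print(split_on_separator(sample, chunk_size=10))
```

With `chunk_overlap=0`, as in the cell above, no text is shared between neighbouring chunks, so any relationship between chunks has to come from the links added in the next step.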

Links can be added to documents manually, but it's easier to use a link extractor.
Several common link extractors are available, and you can also build your own.
For this guide, we'll use the `KeybertLinkExtractor`, which uses the KeyBERT model to tag documents with keywords and creates links between documents that share a keyword.

In [12]:
from langchain_community.graph_vectorstores.extractors import KeybertLinkExtractor
from langchain_core.graph_vectorstores.links import add_links

extractor = KeybertLinkExtractor()

for doc in documents:
    add_links(doc, extractor.extract_one(doc))

documents[:10]

[Document(metadata={'source': 'state_of_the_union.txt', 'links': [Link(kind='kw', direction='bidir', tag='russia'), Link(kind='kw', direction='bidir', tag='putin'), Link(kind='kw', direction='bidir', tag='ukraine'), Link(kind='kw', direction='bidir', tag='ukrainian'), Link(kind='kw', direction='bidir', tag='vladimir')]}, page_content='Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans.  \n\nLast year COVID-19 kept us apart. This year we are finally together again. \n\nTonight, we meet as Democrats Republicans and Independents. But most importantly as Americans. \n\nWith a duty to one another to the American people to the Constitution. \n\nAnd with an unwavering resolve that freedom will always triumph over tyranny. \n\nSix days ago, Russia’s Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways. But he badly miscal
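To make the linking rule concrete, here is a minimal pure-Python sketch of how shared keyword tags imply connections between documents. It does not use KeyBERT or the LangChain `Link` class: the chunk IDs, the keyword sets, and the function `edges_from_shared_keywords` are all hypothetical and only illustrate the idea that two documents carrying the same bidirectional tag are neighbours in the graph.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical keyword tags per chunk, standing in for keyword extraction output.
doc_keywords = {
    "chunk-0": {"russia", "putin", "ukraine"},
    "chunk-1": {"ukraine", "nato"},
    "chunk-2": {"senate", "judge"},
}


def edges_from_shared_keywords(keywords_by_doc: dict[str, set[str]]) -> set[tuple[str, str]]:
    """Return undirected edges between documents that share at least one keyword."""
    # Invert the mapping: keyword -> documents tagged with it.
    docs_by_keyword: dict[str, set[str]] = defaultdict(set)
    for doc_id, keywords in keywords_by_doc.items():
        for keyword in keywords:
            docs_by_keyword[keyword].add(doc_id)

    # Every pair of documents under the same keyword is connected.
    edges: set[tuple[str, str]] = set()
    for doc_ids in docs_by_keyword.values():
        for first, second in combinations(sorted(doc_ids), 2):
            edges.add((first, second))
    return edges


print(edges_from_shared_keywords(doc_keywords))
```

In this toy data only `chunk-0` and `chunk-1` share a keyword (`ukraine`), so they are the only connected pair; `chunk-2` stays isolated until another chunk is tagged with `senate` or `judge`.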

We use `cassio` to configure the database connection globally, based on the environment variables set earlier.
We then create a `CassandraGraphVectorStore` from the documents and the `OpenAIEmbeddings` model.

In [None]:
from langchain_openai import OpenAIEmbeddings
from langchain_community.graph_vectorstores import CassandraGraphVectorStore
import cassio

cassio.init(auto=True)
store = CassandraGraphVectorStore.from_documents(
    embedding=OpenAIEmbeddings(),
    documents=documents,
)

## Similarity search

If we don't traverse the graph, a graph vector store behaves like a regular vector store.
So all methods available in a vector store are also available in a graph vector store.
The similarity search method returns documents similar to a query without considering the links between documents.

In [6]:
store.similarity_search("What did the president say about Ketanji Brown Jackson?")

[Document(id='c298b649cb4009c0', metadata={'source': 'state_of_the_union.txt', 'links': [Link(kind='kw', direction='bidir', tag='breyer'), Link(kind='kw', direction='bidir', tag='appeals'), Link(kind='kw', direction='bidir', tag='senate'), Link(kind='kw', direction='bidir', tag='judge'), Link(kind='kw', direction='bidir', tag='honor')]}, page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n\nAnd I did that 4 days ago, when I nominated 

## Traversal search

The traversal search method returns documents similar to a query while also taking the links between documents into account.
It first does a similarity search and then traverses the graph to find linked documents.

In [7]:
list(store.traversal_search("What did the president say about Ketanji Brown Jackson?"))

[Document(id='c298b649cb4009c0', metadata={'source': 'state_of_the_union.txt', 'links': [Link(kind='kw', direction='bidir', tag='breyer'), Link(kind='kw', direction='bidir', tag='appeals'), Link(kind='kw', direction='bidir', tag='senate'), Link(kind='kw', direction='bidir', tag='judge'), Link(kind='kw', direction='bidir', tag='honor')]}, page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n\nAnd I did that 4 days ago, when I nominated 
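The two phases described above (similarity search first, then graph traversal) can be sketched in plain Python. This is an illustration under stated assumptions, not the store's implementation: the two-dimensional vectors, the `neighbors` adjacency map, and the function `traversal_search_sketch` are hypothetical, and the real method may order, deduplicate, or limit results differently.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def traversal_search_sketch(query_vec, vectors, neighbors, k=1, depth=1):
    """Pick the k most similar documents, then follow links up to `depth` hops."""
    # Phase 1: plain similarity search over all stored vectors.
    ranked = sorted(vectors, key=lambda d: cosine(query_vec, vectors[d]), reverse=True)
    frontier = ranked[:k]
    visited = list(frontier)

    # Phase 2: breadth-first expansion along the links.
    for _ in range(depth):
        next_frontier = []
        for doc_id in frontier:
            for neighbor in neighbors.get(doc_id, []):
                if neighbor not in visited:
                    visited.append(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return visited


# Hypothetical toy data: "c" is dissimilar to the query but reachable via "b".
vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [-1.0, 0.0]}
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
query = [1.0, 0.1]

print(traversal_search_sketch(query, vectors, neighbors, k=1, depth=0))
print(traversal_search_sketch(query, vectors, neighbors, k=1, depth=1))
print(traversal_search_sketch(query, vectors, neighbors, k=1, depth=2))
```

With `depth=0` the sketch reduces to a similarity search, which mirrors the point that a graph vector store behaves like a regular vector store when the graph is not traversed. Increasing the depth pulls in documents that are linked to the initial hits even when their embeddings are far from the query.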

## Async methods

The graph vector store also provides async versions of these methods, with names prefixed by `a` (for example, `atraversal_search`).

In [8]:
[doc async for doc in store.atraversal_search("What did the president say about Ketanji Brown Jackson?")]

[Document(id='c298b649cb4009c0', metadata={'source': 'state_of_the_union.txt', 'links': [Link(kind='kw', direction='bidir', tag='breyer'), Link(kind='kw', direction='bidir', tag='appeals'), Link(kind='kw', direction='bidir', tag='senate'), Link(kind='kw', direction='bidir', tag='judge'), Link(kind='kw', direction='bidir', tag='honor')]}, page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n\nAnd I did that 4 days ago, when I nominated 
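The `async for` comprehension in the cell above works at the top level because notebooks allow top-level `await`. In a plain Python script the same expression is a syntax error outside an `async` function, so it needs to be wrapped in a coroutine and run with `asyncio.run`. The sketch below shows that pattern; `fake_atraversal_search` is a hypothetical stand-in for `store.atraversal_search` that yields strings instead of documents, so the example runs without a database.

```python
import asyncio
from typing import AsyncIterator


async def fake_atraversal_search(query: str) -> AsyncIterator[str]:
    """Hypothetical stand-in for store.atraversal_search: yields results lazily."""
    for text in ("first result", "second result"):
        await asyncio.sleep(0)  # hand control back to the event loop, as real I/O would
        yield f"{text} for {query!r}"


async def collect(query: str) -> list[str]:
    """Consume the async iterator inside a coroutine, where `async for` is allowed."""
    return [doc async for doc in fake_atraversal_search(query)]


if __name__ == "__main__":
    # asyncio.run creates an event loop, runs the coroutine, and closes the loop.
    print(asyncio.run(collect("Ketanji Brown Jackson")))
```

In real code, `collect` would iterate over `store.atraversal_search(query)` instead of the stand-in generator.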

## Graph vector store retriever

The graph vector store can be converted to a retriever. It is similar to the vector store retriever, but it also supports graph-aware search types such as `traversal` and `mmr_traversal`.

In [9]:
retriever = store.as_retriever(search_type="traversal")
retriever.invoke("What did the president say about Ketanji Brown Jackson?")

[Document(id='c298b649cb4009c0', metadata={'source': 'state_of_the_union.txt', 'links': [Link(kind='kw', direction='bidir', tag='breyer'), Link(kind='kw', direction='bidir', tag='appeals'), Link(kind='kw', direction='bidir', tag='senate'), Link(kind='kw', direction='bidir', tag='judge'), Link(kind='kw', direction='bidir', tag='honor')]}, page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n\nAnd I did that 4 days ago, when I nominated 