# Auto-Retrieval from a Vector Database

### Using Metadata for Better Retrieval

- Many popular vector databases support `metadata filters` alongside a query string for semantic search. Given a natural language query, we first use the LLM to infer both a set of metadata filters and the query string to pass to the vector database (either may be blank). This query bundle is then executed against the vector database.

- This allows for more dynamic, expressive forms of retrieval beyond top-k semantic search. The relevant context for a given query may only require filtering on a metadata tag, or require a joint combination of filtering + semantic search within the filtered set, or just raw semantic search.

- We demonstrate an example with Pinecone, but auto-retrieval is also implemented for many other vector databases (e.g. Milvus, Weaviate, and more).
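To make the three retrieval modes above concrete, here is a minimal in-memory sketch. It does not use Pinecone or LlamaIndex: `ToyQuerySpec`, `RECORDS`, `matches`, and `run` are hypothetical names introduced only for illustration, and a crude keyword overlap stands in for embedding similarity.

```python
import operator
from dataclasses import dataclass, field

# Hypothetical in-memory stand-ins for records in a vector store.
RECORDS = [
    {"text": "Inception", "metadata": {"theme": "Fiction", "year": 2010}},
    {"text": "The Godfather", "metadata": {"theme": "Mafia", "year": 1972}},
    {"text": "1984", "metadata": {"theme": "Totalitarianism", "year": 1949}},
]

OPS = {"==": operator.eq, ">": operator.gt, "<": operator.lt}


@dataclass
class ToyQuerySpec:
    """What the LLM is asked to infer: a query string plus metadata filters."""
    query_str: str = ""                          # may be blank
    filters: list = field(default_factory=list)  # (key, op, value) triples, may be empty


def matches(record, filters):
    """A record passes only if every filter holds on its metadata."""
    meta = record["metadata"]
    return all(key in meta and OPS[op](meta[key], value) for key, op, value in filters)


def run(spec, records=RECORDS):
    """Filter first, then rank the survivors if a query string was given."""
    hits = [r for r in records if matches(r, spec.filters)]
    if spec.query_str:
        # Keyword overlap is an assumption standing in for vector similarity.
        words = set(spec.query_str.lower().split())
        hits.sort(key=lambda r: -len(words & set(r["text"].lower().split())))
    return [r["text"] for r in hits]


filter_only = run(ToyQuerySpec(filters=[("year", ">", 2000)]))
joint = run(ToyQuerySpec(query_str="godfather", filters=[("year", "<", 2000)]))
semantic_only = run(ToyQuerySpec(query_str="inception"))
```

In this toy setup, `filter_only` keeps only records after 2000, `joint` ranks within the pre-2000 subset, and `semantic_only` ranks every record.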

In [16]:
import yaml, os
from pinecone import Pinecone
from pinecone import ServerlessSpec
from llama_index.schema import TextNode
from llama_index.llms import AzureOpenAI, OpenAI
from llama_index.llm_predictor import LLMPredictor
from llama_index import set_global_service_context
from llama_index.embeddings import HuggingFaceEmbedding
from llama_index.text_splitter import TokenTextSplitter
from llama_index.vector_stores.types import MetadataInfo, VectorStoreInfo
from llama_index.vector_stores import PineconeVectorStore, MetadataFilters
from llama_index.indices.vector_store.retrievers import VectorIndexAutoRetriever
from llama_index import ServiceContext, load_index_from_storage, StorageContext, VectorStoreIndex

# Configure LLMs and VDB

In [2]:
with open('credentials.yaml') as f:
    # safe_load is sufficient for a plain key/value credentials file
    credentials = yaml.safe_load(f)
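The cells that follow read several keys from this file. As a hedged sketch, a small check like the one below could report missing keys early. The key names are the ones this notebook accesses; `REQUIRED_KEYS` and `missing_keys` are hypothetical helpers, not part of any library.

```python
# Keys this notebook reads, grouped by the llm_flag value used later on.
REQUIRED_KEYS = {
    "DIRECT": ["PINECONE_API_KEY", "DEMO_OPENAI_API_KEY"],
    "AZURE": [
        "PINECONE_API_KEY",
        "AZURE_ENGINE",
        "AZURE_OPENAI_API_KEY",
        "AZURE_DEPLOYMENT_ID",
        "AZURE_OPENAI_API_VERSION",
        "AZURE_OPENAI_API_BASE",
    ],
}


def missing_keys(credentials, llm_flag="DIRECT"):
    """Return the required keys that are absent or empty in the loaded mapping."""
    return [k for k in REQUIRED_KEYS[llm_flag] if not credentials.get(k)]


# Placeholder values only; real secrets belong in the YAML file.
example = {"PINECONE_API_KEY": "placeholder", "DEMO_OPENAI_API_KEY": ""}
missing = missing_keys(example)
print(missing)
```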

In [3]:
api_key = credentials["PINECONE_API_KEY"]
pc = Pinecone(api_key=api_key)

In [4]:
llm_flag = 'DIRECT'

embedding_llm = HuggingFaceEmbedding(
                                    model_name="BAAI/bge-small-en-v1.5",
                                    device='mps'
                                    )

if llm_flag == 'AZURE':
    llm=AzureOpenAI(
                    model=credentials['AZURE_ENGINE'],
                    api_key=credentials['AZURE_OPENAI_API_KEY'],
                    deployment_name=credentials['AZURE_DEPLOYMENT_ID'],
                    api_version=credentials['AZURE_OPENAI_API_VERSION'],
                    azure_endpoint=credentials['AZURE_OPENAI_API_BASE'],
                    temperature=0.3
                    )
    
    chat_llm = LLMPredictor(llm)
else:
    chat_llm = OpenAI(
                    api_key=credentials['DEMO_OPENAI_API_KEY'],
                    temperature=0.3
                    )

text_splitter = TokenTextSplitter(
                                separator=" ",
                                chunk_size=1024,
                                chunk_overlap=20,
                                backup_separators=["\n"]
                                )

if llm_flag == 'AZURE':
    service_context = ServiceContext.from_defaults(
                                                    text_splitter=text_splitter,
                                                    # prompt_helper=prompt_helper,
                                                    embed_model=embedding_llm,
                                                    llm_predictor=chat_llm
                                                    )
else:
    service_context = ServiceContext.from_defaults(
                                                    text_splitter=text_splitter,
                                                    # prompt_helper=prompt_helper,
                                                    embed_model=embedding_llm,
                                                    llm=chat_llm
                                                    )

set_global_service_context(service_context)

In [5]:
# Dimension 384 matches BAAI/bge-small-en-v1.5 (text-embedding-ada-002 would need 1536)
try:
    pc.create_index(
                    name="advanced-rag-experiments",
                    dimension=384,                  # Replace with your model dimensions
                    metric="euclidean",             # Replace with your model metric
                    spec=ServerlessSpec(
                                        cloud="aws",
                                        region="us-west-2"
                                        ) 
                    )
except Exception as e:
    # A 409 Conflict here means the index already exists and can be reused
    print(e)

(409)
Reason: Conflict
HTTP response headers: HTTPHeaderDict({'content-type': 'text/plain; charset=utf-8', 'access-control-allow-origin': '*', 'vary': 'origin,access-control-request-method,access-control-request-headers', 'access-control-expose-headers': '*', 'X-Cloud-Trace-Context': '9ea56095fac1a503258c39cb8b43b398', 'Date': 'Thu, 18 Jan 2024 15:09:57 GMT', 'Server': 'Google Frontend', 'Content-Length': '85', 'Via': '1.1 google', 'Alt-Svc': 'h3=":443"; ma=2592000,h3-29=":443"; ma=2592000'})
HTTP response body: {"error":{"code":"ALREADY_EXISTS","message":"Resource  already exists"},"status":409}
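The 409 response above means the index already existed. One way to avoid relying on the exception is to check before creating. The sketch below assumes the client exposes `list_indexes().names()`, as recent Pinecone Python clients do; `ensure_index` itself is a hypothetical helper, and `pc` is passed in so the function makes no other assumptions about the client.

```python
def ensure_index(pc, name, **create_kwargs):
    """Create the index only if the client does not already list it.

    Returns True when an index was created, False when it already existed.
    Assumes pc.list_indexes().names() yields the existing index names.
    """
    if name in pc.list_indexes().names():
        return False
    pc.create_index(name=name, **create_kwargs)
    return True


# Intended usage with the objects defined in this notebook (not executed here):
# ensure_index(
#     pc,
#     "advanced-rag-experiments",
#     dimension=384,
#     metric="euclidean",
#     spec=ServerlessSpec(cloud="aws", region="us-west-2"),
# )
```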



In [6]:
pinecone_index = pc.Index("advanced-rag-experiments")

# Prepare Dummy Data

In [7]:
nodes = [
        TextNode(
                text="The Shawshank Redemption",
                metadata={
                    "author": "Stephen King",
                    "theme": "Friendship",
                    "year": 1994,
                },
            ),
        TextNode(
                text="The Godfather",
                metadata={
                    "director": "Francis Ford Coppola",
                    "theme": "Mafia",
                    "year": 1972,
                },
            ),
        TextNode(
                text="Inception",
                metadata={
                    "director": "Christopher Nolan",
                    "theme": "Fiction",
                    "year": 2010,
                },
            ),
        TextNode(
                text="To Kill a Mockingbird",
                metadata={
                    "author": "Harper Lee",
                    "theme": "Fiction",
                    "year": 1960,
                },
            ),
        TextNode(
                text="1984",
                metadata={
                    "author": "George Orwell",
                    "theme": "Totalitarianism",
                    "year": 1949,
                },
            ),
        TextNode(
                text="The Great Gatsby",
                metadata={
                    "author": "F. Scott Fitzgerald",
                    "theme": "The American Dream",
                    "year": 1925,
                },
            ),
        TextNode(
                text="Harry Potter and the Sorcerer's Stone",
                metadata={
                    "author": "J.K. Rowling",
                    "theme": "Fiction",
                    "year": 1997,
                },
            ),
        ]

# Build Vector Index

In [8]:
vector_store = PineconeVectorStore(
                                pinecone_index=pinecone_index,
                                namespace="test",
                                )
storage_context = StorageContext.from_defaults(vector_store=vector_store)

In [9]:
index = VectorStoreIndex(
                        nodes, 
                        storage_context=storage_context
                        )

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Upserted vectors:   0%|          | 0/7 [00:00<?, ?it/s]

# Retriever

In [10]:
vector_store_info = VectorStoreInfo(
                                content_info="famous books and movies",
                                metadata_info=[
                                                MetadataInfo(
                                                    name="director",
                                                    type="str",
                                                    description=("Name of the director"),
                                                ),
                                                MetadataInfo(
                                                    name="theme",
                                                    type="str",
                                                    description=("Theme of the book/movie"),
                                                ),
                                                MetadataInfo(
                                                    name="year",
                                                    type="int",
                                                    description=("Year of the book/movie"),
                                                ),
                                            ],
                                            )

In [11]:
retriever = VectorIndexAutoRetriever(
                                    index,
                                    empty_query_top_k=10,
                                    vector_store_info=vector_store_info,
                                    default_empty_query_vector=[0] * 384, # this is a hack to allow for blank queries in pinecone
                                    verbose=True,
                                    )

# Querying

In [12]:
nodes = retriever.retrieve(
    "Tell me about some books/movies after the year 2000"
)
for node in nodes:
    print(node.text)
    print(node.metadata)

Using query str: 
Using filters: [('year', '>', 2000)]
Inception
{'director': 'Christopher Nolan', 'theme': 'Fiction', 'year': 2010}
Inception
{'director': 'Christopher Nolan', 'theme': 'Fiction', 'year': 2010}
Inception
{'director': 'Christopher Nolan', 'theme': 'Fiction', 'year': 2010}
Inception
{'director': 'Christopher Nolan', 'theme': 'Fiction', 'year': 2010}
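The repeated `Inception` rows are consistent with the same nodes having been upserted into the namespace more than once, for example by re-running the indexing cell, since each new `TextNode` gets a fresh random ID by default. A possible mitigation, sketched here under that assumption, is to derive a deterministic ID from the node content so that a repeated upsert overwrites the earlier vector instead of adding a copy. `stable_node_id` is a hypothetical helper.

```python
import uuid


def stable_node_id(text, metadata):
    """Derive a deterministic ID from a node's text and metadata.

    Metadata keys are sorted so that dict ordering does not change the ID.
    """
    key = text + "|" + "|".join(f"{k}={metadata[k]}" for k in sorted(metadata))
    return str(uuid.uuid5(uuid.NAMESPACE_URL, key))


id_a = stable_node_id("Inception", {"director": "Christopher Nolan", "year": 2010})
id_b = stable_node_id("Inception", {"year": 2010, "director": "Christopher Nolan"})
id_c = stable_node_id("The Godfather", {"director": "Francis Ford Coppola", "year": 1972})
print(id_a == id_b, id_a == id_c)
```

The ID could then be supplied when building each node, e.g. `TextNode(text=..., metadata=..., id_=stable_node_id(...))`. This assumes the vector store uses the node ID as the vector ID on upsert.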


In [14]:
nodes = retriever.retrieve(
    "Tell me about some books that are Fiction"
)
for node in nodes:
    print(node.text)
    print(node.metadata)

Using query str: Fiction
Using filters: [('theme', '==', 'Fiction')]
To Kill a Mockingbird
{'author': 'Harper Lee', 'theme': 'Fiction', 'year': 1960}
To Kill a Mockingbird
{'author': 'Harper Lee', 'theme': 'Fiction', 'year': 1960}


# Pass in Additional Metadata Filters

In [17]:
filter_dicts = [{
                "key": "year", 
                "operator": "==", 
                "value": 1997
                }]
filters = MetadataFilters.from_dicts(filter_dicts)
retriever2 = VectorIndexAutoRetriever(
                                    index,
                                    empty_query_top_k=10,
                                    vector_store_info=vector_store_info,
                                    default_empty_query_vector=[0] * 384,
                                    extra_filters=filters,
                                    )

In [18]:
nodes = retriever2.retrieve("Tell me about some books that are Fiction")
for node in nodes:
    print(node.text)
    print(node.metadata)

Harry Potter and the Sorcerer's Stone
{'author': 'J.K. Rowling', 'theme': 'Fiction', 'year': 1997}
Harry Potter and the Sorcerer's Stone
{'author': 'J.K. Rowling', 'theme': 'Fiction', 'year': 1997}


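### Example of a Failing Query

The query below returns no nodes. The inferred filter uses lowercase `mafia`, whereas the stored value is `Mafia`, and the LLM also emits spurious `'null'` filters on `year` and `director` that no node satisfies.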

In [20]:
nodes = retriever.retrieve("Tell me about some books that are mafia-themed")
for node in nodes:
    print(node.text)
    print(node.metadata)

Using query str: books
Using filters: [('theme', '==', 'mafia'), ('year', '==', 'null'), ('director', '==', 'null')]
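One way to make such queries more robust is to post-process the inferred filters before they reach the vector store. The following is a minimal sketch, not part of `VectorIndexAutoRetriever`: `clean_filters` is a hypothetical helper that works on plain `(key, operator, value)` tuples like those printed in the verbose output, drops null-like values, and maps string values onto a known vocabulary case-insensitively.

```python
NULL_STRINGS = {"", "null", "none"}


def clean_filters(filters, known_values):
    """Drop null-like filters and normalize string values to known spellings.

    filters: list of (key, operator, value) tuples.
    known_values: dict mapping a metadata key to its canonical string values.
    """
    cleaned = []
    for key, op, value in filters:
        # Skip filters whose value is a placeholder rather than a real constraint.
        if value is None or (isinstance(value, str) and value.strip().lower() in NULL_STRINGS):
            continue
        # Map e.g. 'mafia' onto the stored spelling 'Mafia' when the key is known.
        if isinstance(value, str) and key in known_values:
            canonical = {v.lower(): v for v in known_values[key]}
            value = canonical.get(value.lower(), value)
        cleaned.append((key, op, value))
    return cleaned


inferred = [("theme", "==", "mafia"), ("year", "==", "null"), ("director", "==", "null")]
known = {"theme": ["Friendship", "Mafia", "Fiction", "Totalitarianism", "The American Dream"]}
cleaned = clean_filters(inferred, known)
print(cleaned)
```

With the dummy data in this notebook, the cleaned filter list would match the stored `theme` value of The Godfather. Listing the allowed theme values in the `MetadataInfo` description is another option that may help the LLM infer the right casing in the first place.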
