This notebook was inspired by this [LlamaIndex example](https://docs.llamaindex.ai/en/stable/examples/cookbooks/mixedbread_reranker/).

The reranker comes from the [mixedbread.ai reranker models](https://www.mixedbread.ai/blog/mxbai-rerank-v1).

I made some changes to it, solely to try out ideas and learn.

Note that I am assuming you have the relevant API keys set as environment variables.
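
A quick check like the one below can catch a missing variable before a later cell fails. It is only a minimal sketch: `DATA_DIR` and `PERSIST_DIR` are the variables this notebook reads, `OPENAI_API_KEY` is my assumption (LlamaIndex uses OpenAI models by default unless configured otherwise), and `missing_env_vars` is a hypothetical helper, not part of any library.

```python
import os

# DATA_DIR and PERSIST_DIR are read later in this notebook.
# OPENAI_API_KEY is an assumption: adjust it for the provider you actually use.
REQUIRED_VARS = ["DATA_DIR", "PERSIST_DIR", "OPENAI_API_KEY"]


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

Passing a plain dict as `env` makes the helper easy to try without touching the real environment.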

In [None]:
!pip install sentence-transformers

In [1]:
import os
import subprocess
from llama_index.core import (
    SimpleDirectoryReader,
    VectorStoreIndex,
    StorageContext,
    load_index_from_storage,
)
from llama_index.core.postprocessor import SentenceTransformerRerank


## Defining Global Variables

In [2]:
DATA_DIR = os.path.join(os.environ["DATA_DIR"], "paul_graham")
PERSIST_DIR = os.path.join(os.environ["PERSIST_DIR"], "paul_graham")

## Download data

In [3]:
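# Download the essay only if the file itself is not already on disk
DATA_FILE = os.path.join(DATA_DIR, "paul_graham.txt")
if not os.path.exists(DATA_FILE):
    os.makedirs(DATA_DIR, exist_ok=True)  # also creates missing parent directories
    url = "https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt"
    subprocess.run(["wget", url, "-O", DATA_FILE], check=True)  # raise if the download fails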

## Define postprocessor for mxbai-rerank-xsmall-v1 reranker

In [4]:
postprocessor = SentenceTransformerRerank(
    # full Hugging Face Hub ID, including the mixedbread-ai organization prefix
    model="mixedbread-ai/mxbai-rerank-xsmall-v1", top_n=2
)
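
To make the role of `top_n` concrete, here is a minimal sketch of what a reranking step does: score every (query, passage) pair, sort by score, and keep the best `top_n`. It is not how `SentenceTransformerRerank` is implemented. The real postprocessor scores pairs with a cross-encoder model, while `overlap_score` below is a toy word-overlap stand-in that I made up, and the candidate sentences are invented for illustration.

```python
def overlap_score(query, passage):
    """Toy relevance score: fraction of query words that appear in the passage."""
    query_words = set(query.lower().split())
    passage_words = set(passage.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & passage_words) / len(query_words)


def rerank(query, passages, top_n=2, score_fn=overlap_score):
    """Score every (query, passage) pair and keep the top_n highest-scoring passages."""
    scored = [(score_fn(query, p), p) for p in passages]
    # Sort by score only; Python's sort is stable, so ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored[:top_n]]


# Invented candidates standing in for the nodes returned by the retriever.
candidates = [
    "Paul Graham wrote essays about startups",
    "The weather was cold in Boston",
    "Sam Altman became president of Y Combinator",
    "Lisp is a programming language",
    "Y Combinator funds startups in batches",
]
top = rerank("who became president of Y Combinator", candidates, top_n=2)
print(top)
```

Swapping `score_fn` for a function backed by a real cross-encoder would give the same two-stage shape: a cheap retriever proposes candidates, and a more expensive scorer picks the few that reach the LLM.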

## Create the index and query engine

In [5]:
if not os.path.exists(PERSIST_DIR):
    print("Creating Index")
    # load the documents and create the index
    documents = SimpleDirectoryReader(DATA_DIR).load_data()
    index = VectorStoreIndex.from_documents(documents)
    # store it for later
    index.storage_context.persist(persist_dir=PERSIST_DIR)
else:
    print("Loading Index")
    # load the existing index
    storage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIR)
    index = load_index_from_storage(storage_context)

# Retrieve the 5 most similar nodes first, then let the postprocessor keep the top 2.
query_engine = index.as_query_engine(
    similarity_top_k=5, node_postprocessors=[postprocessor]
)

Creating Index


In [7]:
response = query_engine.query("Why did Sam Altman decline the offer of becoming president of Y Combinator?")
print(response)

Initially, Sam Altman declined the offer of becoming president of Y Combinator because he wanted to start a startup to make nuclear reactors.
