<a href="https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/examples/node_postprocessor/CohereRerank.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Colbert Rerank

If you're opening this Notebook on colab, you will probably need to install LlamaIndex 🦙.

In [None]:
!pip install llama-index
!pip install llama-index-core


In [None]:
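from llama_index.core import (
    VectorStoreIndex,
    SimpleDirectoryReader,
)

# pprint_response is used later to display responses and their source nodes
from llama_index.core.response.pprint_utils import pprint_response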

Download Data

In [None]:
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'

--2024-02-20 17:15:44--  https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 2606:50c0:8003::154, 2606:50c0:8000::154, 2606:50c0:8001::154, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|2606:50c0:8003::154|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 75042 (73K) [text/plain]
Saving to: ‘data/paul_graham/paul_graham_essay.txt’


2024-02-20 17:15:44 (2.79 MB/s) - ‘data/paul_graham/paul_graham_essay.txt’ saved [75042/75042]



In [None]:
import os

# Replace the placeholder with your own key; never commit a real key.
os.environ["OPENAI_API_KEY"] = "sk-..."

In [None]:
# load documents
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()

# build index
index = VectorStoreIndex.from_documents(documents=documents)

#### Retrieve top 5 most relevant nodes, then rerank with ColbertRerank
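
The two-stage pattern named in this heading can be sketched in plain Python before looking at the real postprocessor. This is only an illustration under stated assumptions: `retrieve_then_rerank`, the sample texts, and all scores are hypothetical and stand in for the vector retriever and the reranker.

```python
from typing import Callable, List, Tuple


def retrieve_then_rerank(
    candidates: List[Tuple[str, float]],
    rerank_fn: Callable[[str], float],
    similarity_top_k: int,
    top_n: int,
) -> List[Tuple[str, float]]:
    """Hypothetical helper: first-stage top-k selection, then reranking."""
    # Stage 1: keep the similarity_top_k candidates with the best first-stage score.
    retrieved = sorted(candidates, key=lambda c: c[1], reverse=True)[
        :similarity_top_k
    ]
    # Stage 2: rescore only the survivors and keep the best top_n of them.
    rescored = [(text, rerank_fn(text)) for text, _ in retrieved]
    return sorted(rescored, key=lambda c: c[1], reverse=True)[:top_n]


# Made-up (text, first-stage score) pairs and made-up reranker scores.
candidates = [("alpha", 0.9), ("beta", 0.8), ("gamma", 0.7), ("delta", 0.1)]
rerank_scores = {"alpha": 0.2, "beta": 0.9, "gamma": 0.5, "delta": 1.0}

result = retrieve_then_rerank(
    candidates,
    rerank_fn=lambda text: rerank_scores[text],
    similarity_top_k=3,
    top_n=2,
)
print(result)
```

Note that "delta" never reaches the reranker even though it has the highest rerank score: the reranker can only reorder what the first stage retrieved, so `similarity_top_k` bounds what `top_n` can return.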

In [None]:
from typing import Any, List, Optional

from llama_index.core.bridge.pydantic import Field, PrivateAttr
from llama_index.core.callbacks import CBEventType, EventPayload
from llama_index.core.postprocessor.types import BaseNodePostprocessor
from llama_index.core.schema import MetadataMode, NodeWithScore, QueryBundle
from llama_index.core.utils import infer_torch_device

# torch is referenced at module scope inside ColbertRerank._calculate_sim
import torch

DEFAULT_COLBERT_MAX_LENGTH = 512


class ColbertRerank(BaseNodePostprocessor):
    model: str = Field(description="Colbert model name.")
    top_n: int = Field(
        description="Number of nodes to return sorted by score."
    )
    device: str = Field(
        default="cpu",
        description="Device to use for sentence transformer.",
    )
    keep_retrieval_score: bool = Field(
        default=False,
        description="Whether to keep the retrieval score in metadata.",
    )
    _model: Any = PrivateAttr()
    _tokenizer: Any = PrivateAttr()

    def __init__(
        self,
        top_n: int = 5,
        model: str = "colbert-ir/colbertv2.0",
        tokenizer: str = "colbert-ir/colbertv2.0",
        device: Optional[str] = None,
        keep_retrieval_score: Optional[bool] = False,
    ):
        try:
            from transformers import AutoTokenizer, AutoModel
            import torch
        except ImportError:
            raise ImportError(
                "Cannot import transformers or torch package, "
                "please `pip install torch transformers`"
            )
        device = infer_torch_device() if device is None else device
        super().__init__(
            top_n=top_n,
            model=model,
            tokenizer=tokenizer,
            device=device,
            keep_retrieval_score=keep_retrieval_score,
        )
        # Set private attributes only after the pydantic fields are initialized
        self._tokenizer = AutoTokenizer.from_pretrained(tokenizer)
        self._model = AutoModel.from_pretrained(model)

    @classmethod
    def class_name(cls) -> str:
        return "ColbertRerank"

    def _calculate_sim(
        self, query: str, nodes: List[NodeWithScore]
    ) -> List[float]:
        # Expand dimensions for broadcasting
        # Query: [batch_size, query_length, embedding_size] -> [batch_size, query_length, 1, embedding_size]
        # Document: [batch_size, doc_length, embedding_size] -> [batch_size, 1, doc_length, embedding_size]
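        query_encoding = self._tokenizer(query, return_tensors="pt")
        # Keep one embedding per query token (no pooling), so the broadcast
        # below yields a [batch_size, query_length, doc_length] matrix
        query_embedding = self._model(**query_encoding).last_hidden_state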
        rerank_score_list = []

        for node in nodes:
            document_text = node.node.get_content(
                metadata_mode=MetadataMode.EMBED
            )
            document_encoding = self._tokenizer(
                document_text,
                return_tensors="pt",
                truncation=True,
                max_length=DEFAULT_COLBERT_MAX_LENGTH,
            )
            document_embedding = self._model(
                **document_encoding
            ).last_hidden_state

            # Compute cosine similarity across the embedding dimension
            sim_matrix = torch.nn.functional.cosine_similarity(
                query_embedding.unsqueeze(2),
                document_embedding.unsqueeze(1),
                dim=-1,
            )

            # Take the maximum similarity for each query token (across all document tokens)
            # sim_matrix shape: [batch_size, query_length, doc_length]
            max_sim_scores, _ = torch.max(sim_matrix, dim=2)

            # Average these maximum scores across all query tokens
            avg_max_sim = torch.mean(max_sim_scores, dim=1)
            rerank_score_list.append(avg_max_sim)
        return rerank_score_list

    def _postprocess_nodes(
        self,
        nodes: List[NodeWithScore],
        query_bundle: Optional[QueryBundle] = None,
    ) -> List[NodeWithScore]:
        if query_bundle is None:
            raise ValueError("Missing query bundle in extra info.")
        if len(nodes) == 0:
            return []

        with self.callback_manager.event(
            CBEventType.RERANKING,
            payload={
                EventPayload.NODES: nodes,
                EventPayload.MODEL_NAME: self.model,
                EventPayload.QUERY_STR: query_bundle.query_str,
                EventPayload.TOP_K: self.top_n,
            },
        ) as event:
            scores = self._calculate_sim(query_bundle.query_str, nodes)

            assert len(scores) == len(nodes)

            for node, score in zip(nodes, scores):
                if self.keep_retrieval_score:
                    # keep the retrieval score in metadata
                    node.node.metadata["retrieval_score"] = node.score
                node.score = float(score)

            reranked_nodes = sorted(
                nodes, key=lambda x: -x.score if x.score else 0
            )[: self.top_n]
            event.on_end(payload={EventPayload.NODES: reranked_nodes})

        return reranked_nodes
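
The scoring rule described in the comments of `_calculate_sim` (best-matching document token per query token, then an average over query tokens) can be checked by hand on toy data. The sketch below uses plain Python lists instead of tensors; `cosine`, `max_sim_score`, and the 2-d "token embeddings" are made up for illustration and are not part of the class above.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def max_sim_score(query_tokens: List[Vector], doc_tokens: List[Vector]) -> float:
    """Average, over query tokens, of the best cosine match in the document."""
    best_per_query_token = [
        max(cosine(q, d) for d in doc_tokens) for q in query_tokens
    ]
    return sum(best_per_query_token) / len(best_per_query_token)


# Toy token embeddings (hypothetical values).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 2.0]]  # has a good match for both query tokens
doc_b = [[1.0, 0.0], [-1.0, 0.0]]  # has a good match for the first token only

score_a = max_sim_score(query, doc_a)
score_b = max_sim_score(query, doc_b)
print(score_a, score_b)
```

Because every query token needs its own good match, `doc_a` scores higher than `doc_b`, which only covers one of the two query tokens.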

In [None]:
colbert_reranker = ColbertRerank(
    top_n=5, model="colbert-ir/colbertv2.0", tokenizer="colbert-ir/colbertv2.0"
)

query_engine = index.as_query_engine(
    similarity_top_k=5,
    node_postprocessors=[colbert_reranker],
)
response = query_engine.query(
    "What did Sam Altman do in this essay?",
)

In [None]:
pprint_response(response)

### Directly retrieve top 2 most similar nodes

In [None]:
query_engine = index.as_query_engine(
    similarity_top_k=2,
)
response = query_engine.query(
    "What did Sam Altman do in this essay?",
)

The retrieved context is irrelevant, and the response is hallucinated.
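
One rough way to sanity-check a claim like this is to test whether the retrieved text mentions the entity in the query at all. The sketch below assumes the source node texts are already available as plain strings (in LlamaIndex they could be collected from `response.source_nodes`); `fraction_mentioning` is a hypothetical helper, and a substring match is only a crude proxy for relevance.

```python
from typing import List


def fraction_mentioning(source_texts: List[str], term: str) -> float:
    """Share of source texts that contain `term`, ignoring case."""
    if not source_texts:
        return 0.0
    hits = sum(term.lower() in text.lower() for text in source_texts)
    return hits / len(source_texts)


# Snippets standing in for the text of the retrieved source nodes.
sources = [
    "We needed to get experience as investors.",
    "Why not organize a summer program where they'd start startups instead?",
]

coverage = fraction_mentioning(sources, "Sam Altman")
print(f"Sources mentioning the query entity: {coverage:.0%}")
```

A coverage of zero does not prove the answer is wrong, but it is a cheap signal that the response may not be grounded in the retrieved context.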

In [None]:
pprint_response(response)

Final Response: Sam Altman was one of the founders of Y Combinator, a
startup accelerator. He was part of the first batch of startups funded
by Y Combinator, which included Reddit, Justin Kan and Emmett Shear's
Twitch, and Aaron Swartz. He was also involved in the Summer Founders
Program, which was a summer program where undergrads could start their
own startups instead of taking a summer job at a tech company. He also
helped to develop a new version of Arc, a programming language, and
wrote a book on Lisp.
______________________________________________________________________
Source Node 1/2
Document ID: abc0f1aa-464a-4ae1-9a7b-2d47a9dc967e
Similarity: 0.7940524933077708
Text: due to our ignorance about investing. We needed to get
experience as investors. What better way, we thought, than to fund a
whole bunch of startups at once? We knew undergrads got temporary jobs
at tech companies during the summer. Why not organize a summer program
where they'd start startups instead? We wouldn'