# WS24 - Intelligente Informationssysteme

## Block 3: Retrieval Augmented Generation

**Part 8: Advanced Retrieval - Retrieval**

![Retrieval](./media/LangChain_Retrieval.png "Retrieval")

1. Re-Ranking

In [1]:
## FIRST: Initialize the VectorDB and LLM
from langchain_ollama import OllamaEmbeddings
from langchain_ollama import ChatOllama
from langchain_chroma import Chroma
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

embeddings = OllamaEmbeddings(model="nomic-embed-text")
vectorstore = Chroma(persist_directory="vector_store", collection_name="lils_blogs", embedding_function=embeddings)
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 20})

## LLM with function calling ability
llm = ChatOllama(model="llama3.2:latest", temperature=0)

## Re-Ranking

![Re-Ranking](./media/LangChain_Re-Ranking.png "Re-Ranking")


See:
- https://github.com/langchain-ai/rag-from-scratch/blob/main/rag_from_scratch_10_and_11.ipynb
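The cells below first retrieve a deliberately wide candidate set (`k=20`) and then re-order it with a second scoring function. The retrieve-then-re-rank pattern can be sketched in plain Python as follows. This is a minimal sketch that assumes a hypothetical `word_overlap` scorer as a stand-in for the cross-encoder or BM25 used later, and a hypothetical `top_n` cut-off (the notebook itself keeps all 20 candidates):

```python
from typing import Callable


def rerank(
    query: str,
    candidates: list[str],
    score_fn: Callable[[str, str], float],
    top_n: int = 3,
) -> list[tuple[str, float]]:
    """Score every (query, candidate) pair, sort by score descending, keep top_n."""
    scored = [(doc, score_fn(query, doc)) for doc in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


def word_overlap(query: str, doc: str) -> float:
    # Hypothetical stand-in scorer: fraction of query words that appear in the doc.
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q)


candidates = [
    "LLM agents use planning and memory.",
    "Task decomposition breaks a task into subgoals.",
    "Approaches to task decomposition include chain of thought.",
]
top = rerank("approaches to task decomposition", candidates, word_overlap, top_n=2)
for doc, score in top:
    print(f"{score:.2f}  {doc}")
```

Swapping `word_overlap` for a cross-encoder or a BM25 scorer changes only `score_fn`; the sort-and-cut step stays the same.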


In [2]:
question = "What are the approaches to task decomposition?"

retrieved_docs = retriever.invoke(question)

initial_ranked = [{"text": doc.page_content, "cross-encoder_score":0, "id": i} for i, doc in enumerate(retrieved_docs)]

len(initial_ranked)

20

### Re-Ranking with Cross-Encoders

In [None]:
#!pip install sentence_transformers

In [3]:
from sentence_transformers import CrossEncoder

# Adapted from https://medium.com/@rossashman/the-art-of-rag-part-3-reranking-with-cross-encoders-688a16b64669
def reranker(query, hits):
    """Score each hit against the query with a cross-encoder.

    The score is written in place into hit["cross-encoder_score"];
    sorting is done in a separate cell.
    """
    # A cross-encoder receives the query and a retrieved passage together as one
    # input pair and outputs a single score. This STS-B model was trained for
    # semantic textual similarity and outputs a score in 0...1.
    # (Models trained on MS MARCO, e.g. "cross-encoder/ms-marco-MiniLM-L-6-v2",
    # target query-passage relevance directly.)
    # Note: the model is loaded on every call; load it once outside for repeated use.
    cross_encoder_model = CrossEncoder("cross-encoder/stsb-roberta-base")

    # Build (query, passage) pairs and score them in one batch
    sentence_pairs = [[query, hit["text"]] for hit in hits]
    similarity_scores = cross_encoder_model.predict(sentence_pairs)

    # Store each score on its hit (converted from numpy.float32 to float)
    for hit, score in zip(hits, similarity_scores):
        hit["cross-encoder_score"] = float(score)

In [4]:
reranker(question, initial_ranked)

In [5]:
re_ranked = sorted(initial_ranked, key=lambda x:x["cross-encoder_score"], reverse=True)

In [6]:
print("cross-encoder_score", "    id")
for hit in re_ranked:
    print("\t{:.3f}\t\t{}".format(hit["cross-encoder_score"], hit["id"]))

cross-encoder_score     id
	0.607		18
	0.543		19
	0.507		6
	0.499		7
	0.498		2
	0.471		0
	0.470		9
	0.454		16
	0.441		17
	0.438		11
	0.437		10
	0.424		12
	0.415		13
	0.400		1
	0.388		14
	0.387		5
	0.387		4
	0.262		15
	0.213		3
	0.197		8


### Re-Ranking with BM25

In [None]:
#!pip install llama-index-retrievers-bm25

In [None]:
# see: https://docs.llamaindex.ai/en/stable/examples/retrievers/bm25_retriever/
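BM25 scores a document by how often the query terms occur in it, weighted by how rare each term is across the collection and normalized by document length. In the common Okapi form:

$$\text{score}(D,Q)=\sum_{q\in Q}\mathrm{IDF}(q)\,\frac{f(q,D)\,(k_1+1)}{f(q,D)+k_1\left(1-b+b\,\frac{|D|}{\text{avgdl}}\right)},\qquad \mathrm{IDF}(q)=\ln\!\left(1+\frac{N-n(q)+0.5}{n(q)+0.5}\right)$$

where $f(q,D)$ is the frequency of term $q$ in $D$, $N$ the number of documents and $n(q)$ the number of documents containing $q$. A minimal from-scratch sketch, assuming the usual defaults `k1=1.5` and `b=0.75` and a simple lower-case tokenizer without stemming or stopword removal; the exact variant, tokenization and defaults inside `BM25Retriever` may differ:

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lower-case and split on non-alphanumeric characters.
    # No stemming or stopword removal (unlike the retriever in this notebook).
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        # Each distinct query term contributes once (simplification).
        for q in set(tokenize(query)):
            if q not in tf:
                continue
            idf = math.log(1 + (n_docs - df[q] + 0.5) / (df[q] + 0.5))
            norm = tf[q] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[q] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "Task decomposition splits a complex task into smaller steps.",
    "Chain of thought prompting asks the model to think step by step.",
    "The weather today is sunny.",
]
scores = bm25_scores("approaches to task decomposition", docs)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)
```

The third document shares no token with the query and scores exactly 0; the first wins because it matches two rare query terms, one of them twice.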

In [7]:
from llama_index.core import Document
from llama_index.retrievers.bm25 import BM25Retriever
import Stemmer  # provided by the PyStemmer package

# Wrap the 20 vector-store hits as LlamaIndex Documents; keep the original index as "id"
documents = [Document(text=doc.page_content, metadata={"id": i}) for i, doc in enumerate(retrieved_docs)]

# Optional: split the documents into smaller nodes first
#from llama_index.core.node_parser import SentenceSplitter
#parser = SentenceSplitter()
#nodes = parser.get_nodes_from_documents(documents)

# We can pass in the index, docstore, or list of nodes to create the retriever.
# Documents are nodes in LlamaIndex, so they can be passed directly.
# similarity_top_k=len(documents) returns every document, i.e. a full BM25 re-ranking.
bm25_retriever = BM25Retriever.from_defaults(
    nodes=documents,
    similarity_top_k=len(documents),
    # Optional: pass a stemmer and set the language for stopwords.
    # Both are applied to the query and the text (stemming + stopword removal).
    # The default is English for both.
    stemmer=Stemmer.Stemmer("english"),
    language="english",
)

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
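The warning above comes from the Hugging Face `tokenizers` library (used by `sentence_transformers`). As the message itself suggests, it can be silenced by setting `TOKENIZERS_PARALLELISM` explicitly. A minimal sketch, assuming the line is placed in the first cell, before any model or tokenizer is loaded:

```python
import os

# Must run before `tokenizers` is first used (e.g. before loading the CrossEncoder).
os.environ["TOKENIZERS_PARALLELISM"] = "false"
```

`"false"` matches what the library already does automatically after the fork, so only the warning goes away.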


In [8]:
bm25_retriever

<llama_index.retrievers.bm25.base.BM25Retriever at 0x336cfa0d0>

In [9]:
retrieved_nodes = bm25_retriever.retrieve(question)
for node in retrieved_nodes:
    print(node.metadata)

{'id': 0}
{'id': 2}
{'id': 8}
{'id': 10}
{'id': 16}
{'id': 18}
{'id': 5}
{'id': 14}
{'id': 4}
{'id': 1}
{'id': 11}
{'id': 7}
{'id': 12}
{'id': 6}
{'id': 15}
{'id': 17}
{'id': 19}
{'id': 9}
{'id': 13}
{'id': 3}
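The two re-rankers disagree considerably: BM25 rewards exact term matches, while the STS cross-encoder scores semantic similarity. A small sketch that quantifies this as the overlap of the top-k ids, using the two orderings copied from the outputs above (they will change if the vector store, the query or the models change):

```python
# Document ids, best first, copied from the two outputs above.
cross_encoder_order = [18, 19, 6, 7, 2, 0, 9, 16, 17, 11, 10, 12, 13, 1, 14, 5, 4, 15, 3, 8]
bm25_order = [0, 2, 8, 10, 16, 18, 5, 14, 4, 1, 11, 7, 12, 6, 15, 17, 19, 9, 13, 3]


def top_k_overlap(a: list[int], b: list[int], k: int) -> set[int]:
    """Ids that appear in the first k positions of both rankings."""
    return set(a[:k]) & set(b[:k])


for k in (5, 10):
    shared = top_k_overlap(cross_encoder_order, bm25_order, k)
    print(k, sorted(shared))
```

For these outputs only id 2 is in both top-5 lists, and ids 0, 2, 16 and 18 are in both top-10 lists; id 8 is ranked third by BM25 but last by the cross-encoder.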
