## MediPal -- Multi-Vector Retriever with rerank

### In this section, I build a multi-vector retriever with a re-rank mechanism
![](../screenshots/rerank_retriever.PNG "")

Earlier, I used a medical-domain LLM to generate questions covering different aspects of each piece of content.

As planned, I will use Multi-Vector retrieval as the foundation of the RAG.

That means I am going to:
* Embed those questions into the vectorstore and put the original documents into the docstore.
* Use the doc_id generated at the chunking stage as the link between the vectorstore and the docstore.
* Use a cross-encoder to rerank the retrieved docs.
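
To make the doc_id link concrete, here is a minimal sketch in plain Python. The dictionaries are toy stand-ins for the vectorstore and docstore, and the ids, questions, and the helper name `resolve_parents` are hypothetical; the real pipeline uses Chroma and InMemoryStore.

```python
# Toy stand-in for the docstore: doc_id -> original document text.
docstore = {
    "doc-a": "phenylephrine comes as a tablet, a liquid, or a dissolving strip to take by mouth.",
    "doc-b": "keep phenylephrine in the container it came in, tightly closed, and out of reach of children.",
}

# Toy stand-in for the vectorstore: each indexed question carries the doc_id
# of the document it was generated from (questions here are hypothetical).
question_index = [
    {"doc_id": "doc-a", "question": "In what forms is phenylephrine available?"},
    {"doc_id": "doc-a", "question": "How often is phenylephrine usually taken?"},
    {"doc_id": "doc-b", "question": "How should phenylephrine be stored?"},
]


def resolve_parents(matched_questions, docstore):
    """Map matched questions back to unique parent documents, keeping match order."""
    seen = []
    for hit in matched_questions:
        if hit["doc_id"] not in seen:
            seen.append(hit["doc_id"])
    return [docstore[doc_id] for doc_id in seen if doc_id in docstore]


# Pretend the similarity search matched the first two questions (both point to doc-a).
parents = resolve_parents(question_index[:2], docstore)
print(parents)
```

Several matching questions can collapse into one parent document, so the number of documents returned can be smaller than the number of vector hits.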

In [None]:
# Multi-Vector implementation
from mytools import timed, login_huggingface
import os, json, copy
from langchain.retrievers.multi_vector import MultiVectorRetriever
from langchain.storage import InMemoryStore
from langchain_chroma import Chroma
from langchain_core.documents import Document
from langchain_huggingface import HuggingFaceEmbeddings
from sentence_transformers import CrossEncoder

W0925 21:44:02.014000 41564 Lib\site-packages\torch\distributed\elastic\multiprocessing\redirects.py:29] NOTE: Redirects are currently not supported in Windows or MacOs.
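
The `timed` decorator imported from `mytools` is not shown in this notebook. Judging only from the log lines printed later in this section (a "starts" message followed by a "took N.NNNNs" message), it might look roughly like the sketch below. This is a hypothetical reconstruction, not the actual `mytools` code.

```python
import functools
import time


def timed(func):
    """Print when a call starts and how long it took (hypothetical stand-in for mytools.timed)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        print(f"{func.__name__} starts running!")
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs even if func raises, so the duration is always reported.
            print(f"{func.__name__} took {time.perf_counter() - start:.4f}s")
    return wrapper


@timed
def add(a, b):
    return a + b


result = add(2, 3)
```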


##### I chose sentence-transformers/embeddinggemma-300m-medical, a sentence-transformers model fine-tuned from google/embeddinggemma-300m on the miriad/miriad-4.4M dataset. It maps sentences and documents to a 768-dimensional dense vector space and is designed for medical information retrieval, specifically for searching passages (up to 1k tokens) of scientific medical papers using detailed medical questions.

##### Citation:

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

@misc{gao2021scaling,
    title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
    author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
    year={2021},
    eprint={2101.06983},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
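
The embedding model above maps each text to a dense vector, and the retriever class later in this section loads it with `normalize_embeddings=True`. A small sketch of why normalizing is convenient: once vectors have unit length, a plain dot product equals cosine similarity. The 3-dimensional vectors are toy values standing in for 768-dimensional embeddings.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity computed from unnormalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Toy vectors; real embeddings from the model would have 768 dimensions.
query_vec = [1.0, 2.0, 2.0]
passage_vec = [2.0, 1.0, 2.0]

q, p = l2_normalize(query_vec), l2_normalize(passage_vec)
print(dot(q, p), cosine(query_vec, passage_vec))
```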

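### I use a cross-encoder (ncbi/MedCPT-Cross-Encoder) to rerank the retrieved documents and return the top_k results.
##### This BERT-based cross-encoder was trained on PubMed search logs (see the citation below). The figure 30522 appears to be the size of its tokenizer vocabulary, not the amount of training data.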
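
The reranking step can be sketched without loading any model: score every (query, passage) pair, sort by score, keep the best top_k. The word-overlap scorer below is a hypothetical placeholder for the cross-encoder and says nothing about how MedCPT scores pairs; the passages are shortened from the outputs shown later in this section.

```python
def rerank(query, passages, score_fn, top_k=5):
    """Score each (query, passage) pair and return the top_k (passage, score) tuples."""
    scores = [score_fn(query, passage) for passage in passages]
    ranked = sorted(zip(passages, scores), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


def toy_overlap_score(query, passage):
    """Hypothetical scorer: fraction of query words that also appear in the passage."""
    q_words = set(query.lower().split())
    p_words = set(passage.lower().split())
    return len(q_words & p_words) / len(q_words)


passages = [
    "store it at room temperature and away from excess heat and moisture",
    "phenylephrine comes as a tablet, a liquid, or a dissolving strip",
    "it is usually taken every 4 hours as needed",
]
top = rerank("phenylephrine tablet", passages, toy_overlap_score, top_k=2)
for passage, score in top:
    print(f"{score:.2f}  {passage}")
```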

##### Citation:

@article{jin2023medcpt,
  title={MedCPT: Contrastive Pre-trained Transformers with large-scale PubMed search logs for zero-shot biomedical information retrieval},
  author={Jin, Qiao and Kim, Won and Chen, Qingyu and Comeau, Donald C and Yeganova, Lana and Wilbur, W John and Lu, Zhiyong},
  journal={Bioinformatics},
  volume={39},
  number={11},
  pages={btad651},
  year={2023},
  publisher={Oxford University Press}
}


In [None]:
class Rerank_Retriever():
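    """
    Multi-vector retriever with cross-encoder reranking.

    Attributes:
        workspace_base_path: The current working directory.
        dataset_path: Path to the JSON file of generated questions and their original documents.
        chunked_dataset_path: Path to the JSON file of chunked documents.
        vector_persist_directory: Directory where the Chroma vectorstore is persisted.
        embedding_model_id: Name of the embedding model.
        cross_encoder_model_id: Name of the cross-encoder model used for reranking.
        vectorstore: The Chroma vectorstore holding the embedded questions and chunks.
        embedding_model: The loaded embedding model.
        retriever: The MultiVectorRetriever that runs the similarity search for a query.
        cross_encoder: The loaded cross-encoder model.

    Methods:
        load_embedding_model: Load the embedding model.
        load_crossencoder: Load the cross-encoder model.
        load_questions_data: Load the questions JSON file.
        load_chunked_data: Load the chunked-documents JSON file.
        build_medicine_retriever: Embed the generated questions and chunks into the vectorstore and store the original documents in the docstore.
        load_existing_retriever: Reopen the persisted vectorstore and refill the in-memory docstore.
        setup_retriever: Log in to Hugging Face, load both models, then build or load the retriever.
        retrieve: Retrieve candidates, rerank them, and return the top_k documents.
    """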
    def __init__(self) -> None:

        self.workspace_base_path = os.getcwd()
        self.dataset_path = os.path.join(self.workspace_base_path, "datasets", "medicine_data_questions.json")  
        self.chunked_dataset_path = os.path.join(self.workspace_base_path, "datasets", "chunked_medicine_data.json")  
        self.vector_persist_directory = os.path.join(self.workspace_base_path, "datasets", "vectordb")
        self.embedding_model_id = "sentence-transformers/embeddinggemma-300m-medical"
        self.cross_encoder_model_id = "ncbi/MedCPT-Cross-Encoder" 
        self.vectorstore = None
        self.embedding_model = None
        self.retriever = None
        self.cross_encoder = None

    @timed
    def load_embedding_model(self):        
        self.embedding_model = HuggingFaceEmbeddings(
            model_name=self.embedding_model_id,
            model_kwargs = {'device': 'cpu'},            
            # Normalizing helps cosine similarity behave better across models
            encode_kwargs={"normalize_embeddings": True},
        )      
    
    @timed
    def load_crossencoder(self):
        self.cross_encoder = CrossEncoder(self.cross_encoder_model_id)

    def load_questions_data(self):    
        with open(self.dataset_path, mode = "r", encoding="utf-8") as f:
            return json.load(f)
        
    def load_chunked_data(self):    
        with open(self.chunked_dataset_path, mode = "r", encoding="utf-8") as f:
            return json.load(f)      
        
    def build_medicine_retriever(self):        
        questions_data = self.load_questions_data()  
        chunked_data = self.load_chunked_data()          
        docstore = InMemoryStore()
        id_key = "doc_id"

        # The vectorstore to use to index the questions
        self.vectorstore = Chroma(
            collection_name = "medicine_data", 
            embedding_function = self.embedding_model,
            persist_directory=self.vector_persist_directory
        )
        # The Multi-Vector retriever
        self.retriever = MultiVectorRetriever(
            vectorstore=self.vectorstore,
            docstore=docstore,
            id_key=id_key,
        )

        doc_ids = list()
        questions = list()
        docs = list()
        for d in questions_data:
            doc_id = d["doc_id"]
            doc_ids.append(doc_id)
            docs.append(Document(metadata={"doc_id": doc_id}, page_content=d["original_doc"]))
            for q in d["questions"]:
                questions.append(Document(metadata={"doc_id": doc_id}, page_content=q))

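        # Also embed the raw chunks, keyed by the same doc_id, so a query can match
        # either a generated question or the chunk text itself.
        for d in chunked_data:
            doc_id = d["doc_id"]
            for chunk in d["docs"]:
                questions.append(Document(metadata={"doc_id": doc_id}, page_content=chunk))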

        self.retriever.vectorstore.add_documents(questions)
        self.retriever.docstore.mset(list(zip(doc_ids,docs)))  
        
    def load_existing_retriever(self):
        questions_data = self.load_questions_data()
        docstore = InMemoryStore()
        id_key = "doc_id"
        # The vectorstore to use to index the questions
        self.vectorstore = Chroma(
            collection_name = "medicine_data", 
            embedding_function = self.embedding_model,
            persist_directory=self.vector_persist_directory
        )
        # The Multi-Vector retriever
        self.retriever = MultiVectorRetriever(
            vectorstore=self.vectorstore,
            docstore=docstore,
            id_key=id_key,
        )

        doc_ids = list()
        docs = list()
        for d in questions_data:
            doc_id = d["doc_id"]
            doc_ids.append(doc_id)
            docs.append(Document(metadata={"doc_id": doc_id}, page_content=d["original_doc"]))

        # InMemoryStore is not persisted to disk, so the docstore is refilled on every start.
        self.retriever.docstore.mset(list(zip(doc_ids,docs)))

    @timed       
    def setup_retriever(self):
        login_huggingface()      
        self.load_embedding_model()
        self.load_crossencoder()

        if os.path.isdir(self.vector_persist_directory) and os.listdir(self.vector_persist_directory):
            self.load_existing_retriever()
        else:
            self.build_medicine_retriever()

    @timed
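    def retrieve(self, query: str, top_k: int=5):
        # MultiVectorRetriever reads k from its search_kwargs; a kwargs={"k": 10}
        # argument passed to invoke() is not forwarded to the vectorstore search.
        self.retriever.search_kwargs = {"k": 10}
        retrieved_docs = self.retriever.invoke(query)
        retrieved_docs = copy.deepcopy(retrieved_docs) # Keep reranking from mutating the stored documents
        if not retrieved_docs:
            return []
        # Rerank: score each (query, document) pair with the cross-encoder
        pairs = [[query, d.page_content] for d in retrieved_docs]
        scores = self.cross_encoder.predict(pairs, batch_size=32)
        for r_d, score in zip(retrieved_docs, scores):
            r_d.metadata["rerank_score"] = float(score)
        # Highest score first, then keep the top_k documents
        retrieved_docs.sort(key=lambda d: d.metadata["rerank_score"], reverse=True)
        return retrieved_docs[:top_k]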
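
The class above reads two JSON files. Judging only from the keys it accesses (`doc_id`, `original_doc`, `questions`, and `docs`), the records might look like the sketch below. The sample values and the helper name `flatten_for_index` are hypothetical; the helper only mirrors how `build_medicine_retriever` gathers the texts it embeds.

```python
import json
import os
import tempfile

# Hypothetical sample records; only the key names come from the class above.
questions_data = [
    {
        "doc_id": "doc-a",
        "original_doc": "phenylephrine comes as a tablet, a liquid, or a dissolving strip to take by mouth.",
        "questions": ["In what forms is phenylephrine available?"],
    }
]
chunked_data = [
    {"doc_id": "doc-a", "docs": ["phenylephrine comes as a tablet", "a liquid, or a dissolving strip"]}
]


def flatten_for_index(questions_data, chunked_data):
    """Return (doc_id, text) pairs for every text that would be embedded."""
    entries = []
    for record in questions_data:
        entries.extend((record["doc_id"], q) for q in record["questions"])
    for record in chunked_data:
        entries.extend((record["doc_id"], chunk) for chunk in record["docs"])
    return entries


# Round-trip the questions through a JSON file, as load_questions_data does.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "medicine_data_questions.json")
    with open(path, mode="w", encoding="utf-8") as f:
        json.dump(questions_data, f)
    with open(path, mode="r", encoding="utf-8") as f:
        loaded = json.load(f)

entries = flatten_for_index(loaded, chunked_data)
print(entries)
```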

In [None]:
rag = Rerank_Retriever()

In [4]:
rag.setup_retriever()

setup_retriever starts runing!
Login HuggingFace!
load_embedding_model starts runing!


You are trying to use a model that was created with Sentence Transformers version 5.2.0.dev0, but you're currently using version 5.1.0. This might cause unexpected behavior or errors. In that case, try to update to the latest version.


load_embedding_model took 4.7082s
load_crossencoder starts runing!
load_crossencoder took 1.4767s
setup_retriever took 8.3227s


In [5]:
rag.retrieve("what is Phenylephrine?",top_k=4)

retrieve starts runing!
retrieve took 7.5131s


[Document(metadata={'doc_id': 'f7ad6ffd-7176-4c1d-a7ee-bec020443c2c', 'rerank_score': 0.9999997615814209}, page_content="phenylephrine comes as a tablet, a liquid, or a dissolving strip to take by mouth. it is usually taken every 4 hours as needed. follow the directions on your prescription label or the package label carefully, and ask your doctor or pharmacist to explain any part you do not understand. take phenylephrine exactly as directed. do not take more or less of it or take it more often than prescribed by your doctor or directed on the label.phenylephrine comes alone and in combination with other medications. ask your doctor or pharmacist for advice on which product is best for your symptoms. check nonprescription cough and cold product labels carefully before using two or more products at the same time. these products may contain the same active ingredient(s) and taking them together could cause you to receive an overdose. this is especially important if you will be giving cou

In [19]:
rag.retrieve("How can I avoid it?",top_k=4)

retrieve starts runing!
retrieve took 5.5456s


[Document(metadata={'doc_id': '955b6fca-bc0e-433a-8187-34d439546cdb', 'rerank_score': 0.5537765026092529}, page_content="keep Phenylephrine in the container it came in, tightly closed, and out of reach of children. store it at room temperature and away from excess heat and moisture (not in the bathroom).keep all medication out of sight and reach of children as many containers are not child-resistant. always lock safety caps. place the medication in a safe location â\x80\x93 one that is up and away and out of their sight and reach.https://www.upandaway.orgdispose of unneeded medications in a way so that pets, children, and other people cannot take them. do not flush Phenylephrine down the toilet. use a medicine take-back program. talk to your pharmacist about take-back programs in your community. visit the fda's safe disposal of medicines websitehttps://goo.gl/c4rm4pfor more information."),
 Document(metadata={'doc_id': 'f7ad6ffd-7176-4c1d-a7ee-bec020443c2c', 'rerank_score': 0.116048440

In [20]:
rag.retrieve("What wrong with you?",top_k=4)

retrieve starts runing!
retrieve took 4.7198s


[Document(metadata={'doc_id': 'f7ad6ffd-7176-4c1d-a7ee-bec020443c2c', 'rerank_score': 0.04489823803305626}, page_content="phenylephrine comes as a tablet, a liquid, or a dissolving strip to take by mouth. it is usually taken every 4 hours as needed. follow the directions on your prescription label or the package label carefully, and ask your doctor or pharmacist to explain any part you do not understand. take phenylephrine exactly as directed. do not take more or less of it or take it more often than prescribed by your doctor or directed on the label.phenylephrine comes alone and in combination with other medications. ask your doctor or pharmacist for advice on which product is best for your symptoms. check nonprescription cough and cold product labels carefully before using two or more products at the same time. these products may contain the same active ingredient(s) and taking them together could cause you to receive an overdose. this is especially important if you will be giving co
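
The top rerank_score drops from about 1.0 for a clear question to about 0.04 for an off-topic one in the three runs above. One possible follow-up, which is my assumption and not part of the retriever as written, is to drop results below a cutoff. The sketch uses plain dicts as stand-ins for LangChain Document objects, and the 0.5 cutoff is arbitrary.

```python
def filter_by_score(docs, min_score):
    """Keep only results whose rerank_score reaches min_score."""
    return [d for d in docs if d["metadata"]["rerank_score"] >= min_score]


# Top-ranked scores copied from the three runs above; dicts stand in for Documents.
results = [
    {"query": "what is Phenylephrine?", "metadata": {"rerank_score": 0.9999997615814209}},
    {"query": "How can I avoid it?", "metadata": {"rerank_score": 0.5537765026092529}},
    {"query": "What wrong with you?", "metadata": {"rerank_score": 0.04489823803305626}},
]

kept = filter_by_score(results, min_score=0.5)  # 0.5 is a hypothetical cutoff
print([r["query"] for r in kept])
```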