RAG (Retrieval-Augmented Generation) is text generation by LLMs augmented with additional context retrieved from a knowledge base. In this project, a question-answering system is developed to answer questions about the company Bristol Myers Squibb, drawing on the information available on their website.
The inference pipeline is available for use, and a live demo can be found at: https://ansidd.github.io/rag_bms.html
Code for the front end is available here: https://github.com/ansidd/ansidd.github.io/blob/main/rag_bms.md
The RAG system is accessed through API calls to a Python backend server hosted on an Azure Virtual Machine.
A 13B-parameter Llama 2 model is used for text generation inference. The model is accessed through an API provided as a managed service by Amazon Bedrock.
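A Bedrock invocation of this kind can be sketched as follows. The model ID and request fields below follow Bedrock's Llama 2 text API as I understand it (`prompt`, `max_gen_len`, `temperature`, and a `generation` field in the response); treat them as assumptions and verify against the Bedrock documentation for your region.

```python
import json

# Assumed Bedrock model identifier for Llama 2 13B Chat.
MODEL_ID = "meta.llama2-13b-chat-v1"

def build_request(question: str, context: str, max_gen_len: int = 512) -> str:
    """Assemble the JSON request body: retrieved context plus the user question."""
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
    return json.dumps({
        "prompt": prompt,
        "max_gen_len": max_gen_len,
        "temperature": 0.2,
    })

def answer(question: str, context: str) -> str:
    """Call Bedrock; requires AWS credentials with Bedrock access configured."""
    import boto3  # AWS SDK for Python
    client = boto3.client("bedrock-runtime")
    response = client.invoke_model(
        modelId=MODEL_ID,
        body=build_request(question, context),
    )
    return json.loads(response["body"].read())["generation"]
```

Keeping the prompt assembly separate from the network call makes the prompt format easy to test without AWS credentials.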
The knowledge base for this use case is compiled by scraping the pages of the Bristol Myers Squibb website that permit crawling.
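A stdlib-only sketch of that compilation step, under the assumption that "allowed sites" means pages permitted by the site's robots.txt: fetch a page if crawling is allowed, strip the HTML down to visible text, and split the text into overlapping chunks for the knowledge base. The chunk size and overlap values are illustrative, not the project's actual settings.

```python
from html.parser import HTMLParser
from urllib import request, robotparser

class TextExtractor(HTMLParser):
    """Collect visible text from an HTML page, skipping scripts and styles."""
    def __init__(self):
        super().__init__()
        self._skip = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip = True

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = False

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())

def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list:
    """Split page text into overlapping chunks so context survives boundaries."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

def scrape_if_allowed(url: str, robots_url: str) -> list:
    """Fetch and chunk a page only if robots.txt allows crawling it."""
    rp = robotparser.RobotFileParser(robots_url)
    rp.read()
    if not rp.can_fetch("*", url):
        return []
    html = request.urlopen(url).read().decode("utf-8", errors="ignore")
    extractor = TextExtractor()
    extractor.feed(html)
    return chunk_text(" ".join(extractor.parts))
```

In practice a dedicated library such as BeautifulSoup or Scrapy would handle malformed HTML more robustly than this sketch.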
To compile the context that is fed into the LLM's input, relevant documents must first be identified. Relevant documents are filtered from the many others in the knowledge base using cosine similarity: computing the cosine similarity between vector embeddings of the query and of each document surfaces the documents most closely related to the query. The vector embeddings of text are generated by a pretrained model available in the sentence-transformers library.
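The retrieval step described above can be sketched as follows. In the project the vectors would come from a sentence-transformers model (the specific model is not named here; `model.encode(texts)` on something like `SentenceTransformer("all-MiniLM-L6-v2")` is a typical choice), so the sketch below works on plain NumPy arrays standing in for those embeddings.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = identical direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def top_k(query_vec: np.ndarray, doc_vecs: list, k: int = 3) -> list:
    """Return indices of the k documents whose embeddings are most similar to the query."""
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
```

The `top_k` indices select the documents whose text is concatenated into the context passed to the LLM.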