# Retrieval-Augmented Generation

## Components of RAG

- **Retrieval Engine** - Searches and ranks data based on a query.
    - **Input Query Processor** - Interprets and refines user input.

    - **Search Engine** - Searches the indexed data, for example by embedding text with a model such as Sentence Transformers, and then ranks the candidates, for example with Elasticsearch kNN search.

- **Augmentation Engine** - Takes the top-ranked data from the retrieval engine and adds it to the **prompt** that will be fed to the LLM.

- **Generation Engine** - An advanced LLM. Creates a response by combining its language skills with the newly retrieved external data, which is added to its prompt.
    - This allows it to generate a response that is coherent, up-to-date, and relevant.

## [Step-by-Step of a RAG Pipeline](https://www.linkedin.com/pulse/how-rag-works-detailed-explanation-its-components-steps-pradeep-menon-ws7sc/)

### **Data Indexing**

- The data to be ingested (documents, images, etc.) is processed and split into chunks. The chunks are then indexed using an indexing strategy.
- Data indexing is repeated periodically, whenever new data needs to be available for response generation.

- ***Search Indexing*** - Data is indexed by exact matches of words or phrases.
    - Fast and precise, but can miss relevant data that does not match exactly.

- ***Vector Indexing*** - Data is indexed by numerical vectors that represent the meaning of words or phrases.
    - Slower and less precise, but can find relevant data that is not an exact match.

- ***Hybrid Indexing*** - Data is indexed by both exact matches and numerical vectors. Hybrid indexing can improve the accuracy and diversity of data retrieval.
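
The chunk-then-index step can be sketched as follows. This is a minimal illustration, assuming plain text input and word-based chunks; the function names, chunk size, and overlap are assumptions made for the example, not taken from the source. It builds only the exact-match (search) index. A vector index would instead store one embedding per chunk.

```python
import re
from collections import defaultdict


def chunk_text(text, chunk_size=8, overlap=2):
    """Split text into chunks of `chunk_size` words that overlap by `overlap` words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current chunk already reaches the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


def tokenize(text):
    """Lowercase the text and keep only alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(chunks):
    """Search index: map each term to the set of chunk ids that contain it."""
    index = defaultdict(set)
    for chunk_id, chunk in enumerate(chunks):
        for term in tokenize(chunk):
            index[term].add(chunk_id)
    return index


document = (
    "RAG retrieves relevant chunks from an index and adds them "
    "to the prompt before the LLM generates a response"
)
chunks = chunk_text(document)
index = build_inverted_index(chunks)
print(len(chunks), sorted(index["prompt"]))
```

Overlapping chunks are a common way to avoid cutting a relevant sentence in half at a chunk boundary; the right sizes depend on the data and the model's context window.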

### **Input Query Processing**

- The question is refined to improve its compatibility with the indexed data. It is simplified and optimized for effective search.

- *Search indexing*: The query undergoes simple text processing, such as removing stop words, or is used as is.

- *Vector indexing*: Input query processing (IQP) is more involved:
    - The input query is transformed into a **vector** by encoding it with a neural network.

    - The resulting vector captures the query's meaning, so it can be compared with document vectors by **semantic similarity** (e.g. using SBERT).

- *Hybrid indexing*: Input query processing can combine the search and vector approaches.
    - The query can be used as is, or some stop words can be removed to make it more concise.

    - A neural network can then encode the query into a vector that captures its meaning.
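
The hybrid query-processing flow described above can be sketched as follows. The stop-word list and all function names are assumptions for illustration. Note in particular that `embed` is a toy hashed bag-of-words stand-in, used only so the example runs without a model: it does not capture semantic meaning the way a neural encoder such as SBERT would.

```python
import hashlib
import math
import re

# Small illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "what", "how", "does", "do"}


def clean_query(query):
    """Lowercase, tokenize, and drop stop words; keep all tokens if none survive."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    kept = [t for t in tokens if t not in STOP_WORDS]
    return kept or tokens


def embed(tokens, dim=64):
    """Toy stand-in for a neural encoder: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * dim
    for tok in tokens:
        # hashlib gives a hash that is stable across runs, unlike built-in hash().
        bucket = int(hashlib.md5(tok.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def process_query(query, dim=64):
    """Return both representations needed for a hybrid search."""
    tokens = clean_query(query)
    return {"keywords": tokens, "vector": embed(tokens, dim)}


processed = process_query("What is the role of the retrieval engine?")
print(processed["keywords"])
```

In a real pipeline the `embed` function would be replaced by the same embedding model that was used to index the documents, since query and document vectors must live in the same space.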

### **Search and Ranking**

- The query, which can be a word, a phrase, or a vector, is used to search the indexed data, which can hold exact-match terms or numerical vectors.
- The search returns a list of data items that are relevant to the query.
- RAG then uses the search results to generate a response that is relevant and useful to the user.
- RAG can use different search algorithms depending on the type of indexing and the type of query.

- *Search Indexing*
    - **TF-IDF** (Term Frequency-Inverse Document Frequency): Ranks documents by how often the query terms appear in each document, weighted by how rare those terms are across the whole collection.
    - **BM25** - A refinement of TF-IDF. It considers how often a term appears and how long the document is, which gives a better ordering of search results.

- *Vector Indexing*
    - **Word Embeddings** (Word2Vec, GloVe) - Convert words into dense vectors that capture meaning, used for understanding word context and relationships.
    - **Cosine Similarity** - Measures the cosine of the angle between two vectors, used to determine how similar the query vector is to each document vector.
    - **SBERT** (Sentence-BERT) - Adapts the standard BERT model to produce sentence embeddings that can be compared directly with cosine similarity.

- *Hybrid Indexing* combines algorithms from both groups.
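
To make the two ranking families concrete, here is a minimal sketch of BM25 scoring and cosine similarity in plain Python. The function names are illustrative, and the defaults `k1=1.5` and `b=0.75` are commonly used values chosen here as an assumption. The IDF term uses the non-negative `log(1 + ...)` variant, one of several BM25 formulations in use.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep only alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Return one BM25 score per document for the given query."""
    docs = [tokenize(d) for d in documents]
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Number of documents that contain each term at least once.
    doc_freq = Counter(term for d in docs for term in set(d))
    query_terms = tokenize(query)
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term frequency saturates with k1 and is normalized by document length via b.
            length_norm = 1 - b + b * len(d) / avg_len
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


docs = [
    "the cat sat on the mat",
    "dogs chase cats in the park",
    "retrieval augmented generation uses a retrieval engine",
]
scores = bm25_scores("retrieval engine", docs)
best = max(range(len(docs)), key=lambda i: scores[i])
print(docs[best])
```

A hybrid setup would compute both a keyword score and a vector similarity for each candidate and then merge the two rankings; how they are merged is a design choice that the source does not specify.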

### **Prompt Augmentation**

- The top-ranked pieces of retrieved information are added to the original question to enhance the prompt.
- This step ensures that the LLM's response is not solely reliant on its pre-existing knowledge. The response is also tailored with up-to-date and specific information.

```python
prompt_template = """
Generate me 5 questions for ESAT (Knowledge).

Additional information:
<information>
"""
```
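
One way to fill the `<information>` placeholder of the template above is a plain string replacement. This is a sketch that assumes the retrieved chunks arrive as a list of strings ranked best first; `augment_prompt` and `top_k` are hypothetical names introduced for the example.

```python
PROMPT_TEMPLATE = """
Generate me 5 questions for ESAT (Knowledge).

Additional information:
<information>
"""


def augment_prompt(template, retrieved_chunks, top_k=3):
    """Replace the <information> placeholder with the top_k retrieved chunks."""
    context = "\n".join(f"- {chunk}" for chunk in retrieved_chunks[:top_k])
    return template.replace("<information>", context)


# Placeholder chunks standing in for real search results.
chunks = ["First retrieved passage", "Second retrieved passage"]
print(augment_prompt(PROMPT_TEMPLATE, chunks))
```

Limiting the context to `top_k` chunks keeps the prompt within the model's context window and avoids diluting it with weakly relevant text.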

### **Response Generation**

- The LLM uses the augmented prompt to create a response.
- The answer is ***grounded*** in the specific, current data retrieved earlier.
- The LLM combines its own knowledge with external data to create precise and relevant responses.
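
Putting the steps together, the overall flow can be sketched as a single function that takes each stage as a callable. Everything here is a stand-in under stated assumptions: `rag_answer` and its parameters are hypothetical names, the retriever is a naive word-overlap search over a tiny in-memory list, and `placeholder_llm` only imitates a model call so the example runs without any external service.

```python
def rag_answer(question, retrieve, build_prompt, llm, top_k=3):
    """Run retrieval, prompt augmentation, and generation in order."""
    chunks = retrieve(question)[:top_k]
    prompt = build_prompt(question, chunks)
    return llm(prompt)


# Tiny in-memory knowledge base (illustrative data only).
KNOWLEDGE = [
    "BM25 ranks documents by term frequency and document length",
    "Cosine similarity compares two vectors",
    "Chunking splits documents into smaller pieces",
]


def keyword_retrieve(question):
    """Naive retriever: rank documents by the number of words shared with the question."""
    terms = set(question.lower().split())
    scored = [(len(terms & set(doc.lower().split())), doc) for doc in KNOWLEDGE]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in ranked if score > 0]


def simple_prompt(question, chunks):
    """Combine the retrieved chunks and the question into one prompt string."""
    context = "\n".join(f"- {c}" for c in chunks)
    return (
        "Answer using only the information below.\n\n"
        f"Information:\n{context}\n\n"
        f"Question: {question}"
    )


def placeholder_llm(prompt):
    """Stand-in for a real LLM call; it only reports the prompt size."""
    return f"[model output for a prompt of {len(prompt)} characters]"


print(rag_answer("what is bm25", keyword_retrieve, simple_prompt, placeholder_llm))
```

Passing the stages in as callables keeps each one replaceable, so the naive retriever or the placeholder model can be swapped for real components without changing the orchestration.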

## RAG Architectures