# Workflow of a Reliable RAG

https://github.com/NirDiamant/RAG_TECHNIQUES/blob/main/all_rag_techniques/reliable_rag.ipynb




**Start** → query → **Vectorstore** → retrieved docs + query → **Check Document Relevancy** → relevant docs + query → **Generate Answer** → relevant docs + answer → **Check Hallucination** → query + relevant docs + answer → **Highlight Document Snippet** → **End**

| Step | Input                          | Process                        | Output                         |
| ---- | ------------------------------ | ------------------------------ | ------------------------------ |
| 1    | –                              | **Start**                      | query                          |
| 2    | query                          | **Vectorstore**                | retrieved docs + query         |
| 3    | retrieved docs + query         | **Check Document Relevancy**   | relevant docs + query          |
| 4    | relevant docs + query          | **Generate Answer**            | relevant docs + answer         |
| 5    | relevant docs + answer         | **Check Hallucination**        | query + relevant docs + answer |
| 6    | query + relevant docs + answer | **Highlight Document Snippet** | snippet                        |
| 7    | snippet                        | **End**                        | –                              |
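
The table can be read as a chain of function calls. The sketch below is a minimal illustration rather than the notebook's implementation: `reliable_rag` and its callable parameters are hypothetical names, the toy lambdas stand in for the vector store and the LLM chains, and raising an error when the grounding check fails is an assumption, since the notes do not say what happens in that case.

```python
from typing import Callable, List, Tuple


def reliable_rag(
    query: str,
    retrieve: Callable[[str], List[str]],
    is_relevant: Callable[[str, str], bool],
    generate: Callable[[str, List[str]], str],
    is_grounded: Callable[[List[str], str], bool],
    highlight: Callable[[str, List[str], str], List[str]],
) -> Tuple[str, List[str]]:
    """Run the steps in table order and return (answer, snippets)."""
    retrieved = retrieve(query)                                  # step 2: Vectorstore
    relevant = [d for d in retrieved if is_relevant(query, d)]   # step 3: Check Document Relevancy
    answer = generate(query, relevant)                           # step 4: Generate Answer
    if not is_grounded(relevant, answer):                        # step 5: Check Hallucination
        # Assumption: fail loudly; the notes do not define this branch.
        raise ValueError("answer is not grounded in the relevant docs")
    return answer, highlight(query, relevant, answer)            # step 6: Highlight Document Snippet


# Toy stand-ins so the sketch runs without an LLM or a vector store.
docs = ["Paris is the capital of France.", "Bananas are yellow."]
answer, snippets = reliable_rag(
    "What is the capital of France?",
    retrieve=lambda q: docs,
    is_relevant=lambda q, d: "capital" in d,
    generate=lambda q, ds: ds[0],
    is_grounded=lambda ds, a: a in ds,
    highlight=lambda q, ds, a: [d for d in ds if d == a],
)
print(answer)
print(snippets)
```

Passing each step in as a callable keeps the control flow visible and lets the real graders be swapped in later.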



### **1. Check document relevancy**

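```python
from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field


class GradeDocuments(BaseModel):
    """Binary score for relevance check on retrieved documents."""

    binary_score: str = Field(
        description="Documents are relevant to the question, 'yes' or 'no'"
    )


# `llm` is the chat model and `system` is the grader's system prompt, both defined earlier
structured_llm_grader = llm.with_structured_output(GradeDocuments)

grade_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system),
        ("human", "Retrieved document: \n\n {document} \n\n User question: {question}"),
    ]
)

retrieval_grader = grade_prompt | structured_llm_grader
```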

### **2. Filter out the non-relevant docs**

  
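  ```python
  # Grade each retrieved document and keep only the relevant ones
  docs_to_use = []
  for doc in docs:
      res = retrieval_grader.invoke({"question": question, "document": doc.page_content})
      if res.binary_score == "yes":
          docs_to_use.append(doc)
  ```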

### **3. Generate Result vs Baseline Result**

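  ```python
  def format_docs(docs):
      """Wrap each document in numbered <docN> tags with its title, source and content."""
      return "\n".join(
          f"<doc{i+1}>:\nTitle:{doc.metadata['title']}\nSource:{doc.metadata['source']}\n"
          f"Content:{doc.page_content}\n</doc{i+1}>\n"
          for i, doc in enumerate(docs)
      )

  # `rag_chain` is the generation chain, assumed to be built earlier
  generation = rag_chain.invoke({"documents": format_docs(docs_to_use), "question": question})
  ```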
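
To see what the formatter hands to the LLM, the sketch below runs `format_docs` on a single toy document. The `Doc` dataclass is a hypothetical stand-in for a LangChain `Document` (only `page_content` and `metadata` are used), and the sample title, source and content are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs):
    """Wrap each document in numbered <docN> tags with its title, source and content."""
    return "\n".join(
        f"<doc{i+1}>:\nTitle:{doc.metadata['title']}\nSource:{doc.metadata['source']}\n"
        f"Content:{doc.page_content}\n</doc{i+1}>\n"
        for i, doc in enumerate(docs)
    )


docs_to_use = [
    Doc("RAG combines retrieval with generation.", {"title": "Intro", "source": "notes.md"})
]
formatted = format_docs(docs_to_use)
print(formatted)
```

The numbered tags give later steps (such as snippet highlighting) a stable id to refer back to.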

### **4. Check for Hallucinations**

  ```python
  class GradeHallucinations(BaseModel):
      """Binary score for hallucination present in 'generation' answer."""

      binary_score: str = Field(
          ...,
          description="Answer is grounded in the facts, 'yes' or 'no'"
      )


  # `system` is the hallucination grader's system prompt, defined earlier
  structured_llm_grader = llm.with_structured_output(GradeHallucinations)

  hallucination_prompt = ChatPromptTemplate.from_messages(
      [
          ("system", system),
          ("human", "Set of facts: \n\n <facts>{documents}</facts> \n\n LLM generation: <generation>{generation}</generation>"),
      ]
  )

  hallucination_grader = hallucination_prompt | structured_llm_grader

  hallucination_grader.invoke({"documents": format_docs(docs_to_use), "generation": generation})
  ```

### **5. Highlight used docs**

  ```python
  from typing import List

  from langchain_core.output_parsers import PydanticOutputParser
  from langchain_core.prompts import PromptTemplate


  class HighlightDocuments(BaseModel):
      """Return the specific part of a document used for answering the question."""

      id: List[str] = Field(
          ...,
          description="List of ids of the docs used to answer the question"
      )

      title: List[str] = Field(
          ...,
          description="List of titles used to answer the question"
      )

      source: List[str] = Field(
          ...,
          description="List of sources used to answer the question"
      )

      segment: List[str] = Field(
          ...,
          description="List of direct segments from the used documents that answer the question"
      )


  parser = PydanticOutputParser(pydantic_object=HighlightDocuments)

  # `system` is the highlighting prompt text, defined earlier
  prompt = PromptTemplate(
      template=system,
      input_variables=["documents", "question", "generation"],
      partial_variables={"format_instructions": parser.get_format_instructions()},
  )

  # Assumed composition: prompt -> LLM -> parser
  doc_lookup = prompt | llm | parser

  doc_lookup.invoke({"documents": format_docs(docs_to_use), "question": question, "generation": generation})
  ```
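
Because `HighlightDocuments` returns four parallel lists, the result is easiest to consume by zipping them back into one record per snippet. The sketch below assumes the lists are aligned by position; `Highlight`, `iter_snippets` and the sample values are hypothetical and only mirror the field names above.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List


@dataclass
class Highlight:
    """Hypothetical stand-in with the same four fields as HighlightDocuments."""
    id: List[str]
    title: List[str]
    source: List[str]
    segment: List[str]


def iter_snippets(result: Highlight) -> Iterator[Dict[str, str]]:
    """Yield one record per highlighted segment, pairing the lists by position."""
    for doc_id, title, source, segment in zip(
        result.id, result.title, result.source, result.segment
    ):
        yield {"id": doc_id, "title": title, "source": source, "segment": segment}


lookup_response = Highlight(
    id=["doc1"],
    title=["Intro"],
    source=["notes.md"],
    segment=["RAG combines retrieval with generation."],
)
snippets = list(iter_snippets(lookup_response))
for snippet in snippets:
    print(f"[{snippet['id']}] {snippet['title']} ({snippet['source']}): {snippet['segment']}")
```

Note that `zip` stops at the shortest list, so a response with mismatched list lengths silently drops the extra entries.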



# Optimizing Chunk Sizes

https://github.com/NirDiamant/RAG_TECHNIQUES/blob/main/all_rag_techniques/choose_chunk_size.ipynb




### **1. Create evaluation questions and pick k out of them**

  ```python
  import random

  num_eval_questions = 25

  eval_documents = documents[0:20]  # List[llama_index.core.schema.Document] : size = 20
  data_generator = DatasetGenerator.from_documents(eval_documents)
  eval_questions = data_generator.generate_questions_from_nodes()  # List[str] : size = 938
  k_eval_questions = random.sample(eval_questions, num_eval_questions)  # List[str] : size = 25
  ```
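
`random.sample` draws a different subset on every run, which makes chunk-size comparisons hard to repeat. A minimal sketch of a reproducible variant follows; the `pick_eval_questions` helper and the seed value are assumptions rather than part of the notebook, and the placeholder strings only stand in for the 938 generated questions.

```python
import random
from typing import List


def pick_eval_questions(questions: List[str], k: int, seed: int = 42) -> List[str]:
    """Sample k distinct questions with a private, seeded generator."""
    rng = random.Random(seed)  # does not touch the global random state
    return rng.sample(questions, min(k, len(questions)))


# Placeholder questions standing in for the generated ones.
eval_questions = [f"question {i}" for i in range(938)]
k_eval_questions = pick_eval_questions(eval_questions, 25)
print(len(k_eval_questions))
```

Capping `k` at the list length avoids the `ValueError` that `sample` raises when asked for more items than exist.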

### **2. Define metric evaluators and modify the llama_index faithfulness evaluator prompt to rely on the context**

```python
from llama_index.core import PromptTemplate, Settings
from llama_index.core.evaluation import (
    DatasetGenerator,
    FaithfulnessEvaluator,
    RelevancyEvaluator
)

# Set appropriate settings for the LLM (`llm` is defined earlier)
Settings.llm = llm

# Define the evaluators
faithfulness_eval = FaithfulnessEvaluator()
relevancy_eval = RelevancyEvaluator()  # keeps its default prompt

# Replacement faithfulness prompt
faithfulness_new_prompt_template = PromptTemplate(
    """...... (few-shot prompting examples)......
    Information: {query_str}
    Context: {context_str}
    Answer: """
)

# "some_prompt_key" is a placeholder; faithfulness_eval.get_prompts() lists the real keys
faithfulness_eval.update_prompts({"some_prompt_key": faithfulness_new_prompt_template})
```

### **3. Function to evaluate metrics for each chunk size**

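```python
import time

from llama_index.core import Settings, VectorStoreIndex


def evaluate_response_time_and_accuracy(chunk_size, eval_questions):
    """Evaluate the average response time, faithfulness, and relevancy of responses generated by LLM for a given chunk size."""
    # 1. Set llm, chunk_size and chunk_overlap using Settings
    Settings.llm = llm
    Settings.chunk_size = chunk_size
    Settings.chunk_overlap = chunk_size // 5  # assumed ratio; these notes do not give the value

    # 2. Build the index and 3. the query engine
    vector_index = VectorStoreIndex.from_documents(eval_documents)
    query_engine = vector_index.as_query_engine(similarity_top_k=5)

    # 4. Number of questions actually evaluated (25 when called with k_eval_questions)
    num_questions = len(eval_questions)

    # 5. Query, time and grade each question
    total_time = total_faithfulness = total_relevancy = 0
    for question in eval_questions:
        start = time.time()
        response_vector = query_engine.query(question)
        total_time += time.time() - start
        total_faithfulness += faithfulness_eval.evaluate_response(response=response_vector).passing
        total_relevancy += relevancy_eval.evaluate_response(query=question, response=response_vector).passing

    return total_time / num_questions, total_faithfulness / num_questions, total_relevancy / num_questions
```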
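
The averaging logic can be checked without llama_index by passing in stand-in callables. The sketch below is an illustration under that assumption: `average_metrics` and its parameters are hypothetical names, and the lambdas replace the query engine and the two evaluators.

```python
import time
from typing import Callable, Iterable, Tuple


def average_metrics(
    questions: Iterable[str],
    answer: Callable[[str], str],
    is_faithful: Callable[[str], bool],
    is_relevant: Callable[[str, str], bool],
) -> Tuple[float, float, float]:
    """Return (average seconds per query, faithfulness pass rate, relevancy pass rate)."""
    questions = list(questions)
    if not questions:
        raise ValueError("need at least one question")
    total_time = 0.0
    faithful = 0
    relevant = 0
    for question in questions:
        start = time.perf_counter()
        response = answer(question)
        total_time += time.perf_counter() - start   # time only the query itself
        faithful += bool(is_faithful(response))
        relevant += bool(is_relevant(question, response))
    n = len(questions)
    return total_time / n, faithful / n, relevant / n


# Stand-ins: the "engine" upper-cases the question, one answer fails the faithfulness check.
avg_time, avg_faithfulness, avg_relevancy = average_metrics(
    ["q1", "q2", "q3", "q4"],
    answer=lambda q: q.upper(),
    is_faithful=lambda r: r != "Q4",
    is_relevant=lambda q, r: True,
)
print(avg_faithfulness, avg_relevancy)
```

Dividing by the number of questions actually evaluated matters here: the function is called with the sampled subset, not the full question list.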

### **4. Test different chunk sizes**
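  ```python
  chunk_sizes = [128, 256]

  # Evaluate every candidate chunk size on the same sampled questions
  for chunk_size in chunk_sizes:
      result = evaluate_response_time_and_accuracy(chunk_size, k_eval_questions)
      print(f"Chunk size {chunk_size}: {result}")
  ```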


# Propositions Chunking


https://github.com/NirDiamant/RAG_TECHNIQUES/blob/main/all_rag_techniques/proposition_chunking.ipynb



### Key Components

1. **Document Chunking:** Splitting a document into manageable pieces for analysis.
2. **Proposition Generation:** Using LLMs to break down document chunks into factual, self-contained propositions.
3. **Proposition Quality Check:** Evaluating generated propositions based on accuracy, clarity, completeness, and conciseness.
4. **Embedding and Vector Store:** Embedding both propositions and larger chunks of the document into a vector store for efficient retrieval.
5. **Retrieval and Comparison:** Testing the retrieval system with different query sizes and comparing results from the proposition-based model with the larger chunk-based model.




### **1. Basic Chunking**

  ```
  Build index -> set embedding model -> create docs_list -> split docs_list -> add 'chunk_id' to metadata
  ```

### **2. Generate Propositions**

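```python
from typing import List

from langchain_core.documents import Document
from langchain_core.prompts import ChatPromptTemplate, FewShotChatMessagePromptTemplate
from pydantic import BaseModel, Field


class GeneratePropositions(BaseModel):
    """List of all the propositions in a given document"""

    propositions: List[str] = Field(
        description="List of propositions (factual, self-contained, and concise information)"
    )


# 1. LLM with function calling (`llm` is the chat model defined earlier)
structured_llm = llm.with_structured_output(GeneratePropositions)

# 2. Few-shot prompting examples
proposition_examples = [
    {"document": "...", "propositions": "['...', '...', '...',...]"},
]

# 3. Template for a single example
example_proposition_prompt = ChatPromptTemplate.from_messages(
    [
        ("human", "{document}"),
        ("ai", "{propositions}"),
    ]
)

# 4. Few-shot block built from the examples
few_shot_prompt = FewShotChatMessagePromptTemplate(
    example_prompt=example_proposition_prompt,
    examples=proposition_examples,
)

# 5. Full prompt (`system` is the system prompt string defined earlier)
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system),
        few_shot_prompt,
        ("human", "{document}"),
    ]
)

# 6. Chain
proposition_generator = prompt | structured_llm

# 7. Generate propositions for each chunk and keep the 1-based id of the source chunk
propositions = []
for i, doc in enumerate(doc_splits):
    response = proposition_generator.invoke({"document": doc.page_content})
    for proposition in response.propositions:
        propositions.append(
            Document(page_content=proposition, metadata={"Title": "...", "chunk_id": i + 1})
        )
```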
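
The `chunk_id` bookkeeping is what later lets each proposition be graded against its source chunk. The sketch below shows only that bookkeeping, with a naive sentence splitter standing in for the LLM; `build_proposition_records`, `split_sentences` and the sample chunks are hypothetical.

```python
from typing import Callable, Dict, List


def build_proposition_records(
    chunks: List[str], generate: Callable[[str], List[str]]
) -> List[Dict[str, object]]:
    """Tag every proposition with the 1-based id of the chunk it came from."""
    records = []
    for i, chunk in enumerate(chunks):
        for proposition in generate(chunk):
            records.append({"page_content": proposition, "chunk_id": i + 1})
    return records


def split_sentences(chunk: str) -> List[str]:
    """Stand-in generator: one 'proposition' per sentence (an LLM would also resolve pronouns)."""
    return [part.strip() + "." for part in chunk.split(".") if part.strip()]


chunks = [
    "Neil Armstrong was an astronaut. He walked on the Moon.",
    "Apollo 11 launched in 1969.",
]
records = build_proposition_records(chunks, split_sentences)

# Look up the source chunk of the last proposition, as the quality check does.
source_chunk = chunks[records[-1]["chunk_id"] - 1]
print(records[-1]["page_content"], "<-", source_chunk)
```

Because the ids are 1-based, every lookup into the chunk list has to subtract one.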

### **3. Quality Check**

```python
from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field


class GradePropositions(BaseModel):
    """Grade a given proposition on accuracy, clarity, completeness, and conciseness"""

    accuracy: int = Field(
        description="Rate from 1-10 based on how well the proposition reflects the original text."
    )

    clarity: int = Field(
        description="Rate from 1-10 based on how easy it is to understand the proposition without additional context."
    )

    completeness: int = Field(
        description="Rate from 1-10 based on whether the proposition includes necessary details (e.g., dates, qualifiers)."
    )

    conciseness: int = Field(
        description="Rate from 1-10 based on whether the proposition is concise without losing important information."
    )


# 1. LLM with function calling (`llm` is the chat model defined earlier)
structured_llm = llm.with_structured_output(GradePropositions)

# 2. Evaluation prompt
evaluation_prompt_template = """
Please evaluate the following proposition based on the criteria below:
- **Accuracy**: Rate from 1-10 ...
- **Clarity**: Rate from 1-10 ...
- **Completeness**: ...
- **Conciseness**: ...

Example:
Docs: ...

Proposition_1: Neil Armstrong was an astronaut.
Evaluation_1: "accuracy": 10, "clarity": 10, "completeness": 10, "conciseness": 10

## similarly more examples

Format:
Proposition: "{proposition}"
Original Text: "{original_text}"
"""

# 3. Chat prompt
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", evaluation_prompt_template),
        ("human", "{proposition}, {original_text}"),
    ]
)

# 4. Create the proposition evaluator
proposition_evaluator = prompt | structured_llm

# 5. Define evaluation categories and thresholds
evaluation_categories = ["accuracy", "clarity", "completeness", "conciseness"]
thresholds = {"accuracy": 7, "clarity": 7, "completeness": 7, "conciseness": 7}


# 6. Score one proposition against its source text
def evaluate_proposition(proposition, original_text):
    response = proposition_evaluator.invoke(
        {"proposition": proposition, "original_text": original_text}
    )
    return {category: getattr(response, category) for category in evaluation_categories}


# 7. A proposition passes only if every score is above its threshold
def passes_quality_check(scores):
    return all(scores[category] > thresholds[category] for category in evaluation_categories)


# 8. Grade each proposition against the chunk it came from and keep the ones that pass
evaluated_propositions = []
for proposition in propositions:
    scores = evaluate_proposition(
        proposition.page_content,
        doc_splits[proposition.metadata["chunk_id"] - 1].page_content,
    )
    if passes_quality_check(scores):
        evaluated_propositions.append(proposition)
```
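
The threshold check is simple enough to run on its own. The sketch below follows the strict comparison stated in these notes (`score > threshold`), so a score equal to the threshold fails; if the threshold is meant to be inclusive, `>=` would be the change to make. The sample scores are made up for illustration.

```python
from typing import Dict

thresholds = {"accuracy": 7, "clarity": 7, "completeness": 7, "conciseness": 7}


def passes_quality_check(scores: Dict[str, int], limits: Dict[str, int] = thresholds) -> bool:
    """Pass only if every category scores strictly above its threshold."""
    return all(scores[category] > limit for category, limit in limits.items())


good = {"accuracy": 10, "clarity": 10, "completeness": 10, "conciseness": 10}
borderline = {"accuracy": 9, "clarity": 7, "completeness": 9, "conciseness": 9}

print(passes_quality_check(good))
print(passes_quality_check(borderline))
```

A single weak category is enough to reject a proposition, which keeps vague or incomplete statements out of the vector store.
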
### **4. Index into vector store & compare retrieval**

```python
# 1. create vectorstore_propositions & vectorstore_larger
# 2. create a retriever for each of them
# 3. run both retrievers -> get doc.page_content, doc.metadata
# 4. compare the two result sets; the differences are summarized in the table below
```
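
The difference between the two stores can be illustrated without embeddings. The sketch below swaps in word-overlap (Jaccard) scoring as a stand-in for vector similarity, so it is not the notebook's retriever; `tokens`, `retrieve` and the sample texts are hypothetical.

```python
import re
from typing import List, Set


def tokens(text: str) -> Set[str]:
    """Lower-cased word set used for the overlap score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, store: List[str], k: int = 1) -> List[str]:
    """Return the k texts with the highest Jaccard overlap with the query."""
    query_tokens = tokens(query)

    def score(text: str) -> float:
        text_tokens = tokens(text)
        return len(query_tokens & text_tokens) / len(query_tokens | text_tokens)

    return sorted(store, key=score, reverse=True)[:k]


propositions = [
    "Neil Armstrong was an astronaut.",
    "Neil Armstrong walked on the Moon in 1969.",
]
larger_chunks = [
    "Neil Armstrong was an astronaut. He walked on the Moon in 1969 during Apollo 11.",
    "The Saturn V rocket was developed for the Apollo program.",
]

query = "Who walked on the Moon in 1969?"
top_proposition = retrieve(query, propositions)[0]
top_chunk = retrieve(query, larger_chunks)[0]
print("proposition:", top_proposition)
print("chunk:", top_chunk)
```

Both stores surface the right fact, but the proposition store returns just the single statement while the chunk store returns it with surrounding context.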

### Comparison

| **Aspect**                | **Proposition-Based Retrieval**                                         | **Simple Chunk Retrieval**                                              |
|---------------------------|--------------------------------------------------------------------------|--------------------------------------------------------------------------|
| **Precision in Response**  | High: Delivers focused and direct answers.                              | Medium: Provides more context but may include irrelevant information.    |
| **Clarity and Brevity**    | High: Clear and concise, avoids unnecessary details.                    | Medium: More comprehensive but can be overwhelming.                      |
| **Contextual Richness**    | Low: May lack context, focusing on specific propositions.               | High: Provides additional context and details.                           |
| **Comprehensiveness**      | Low: May omit broader context or supplementary details.                 | High: Offers a more complete view with extensive information.            |
| **Narrative Flow**         | Medium: Can be fragmented or disjointed.                                | High: Preserves the logical flow and coherence of the original document. |
| **Information Overload**   | Low: Less likely to overwhelm with excess information.                  | High: Risk of overwhelming the user with too much information.           |
| **Use Case Suitability**   | Best for quick, factual queries.                                        | Best for complex queries requiring in-depth understanding.               |
| **Efficiency**             | High: Provides quick, targeted responses.                               | Medium: May require more effort to sift through additional content.      |
| **Specificity**            | High: Precise and targeted responses.                                   | Medium: Answers may be less targeted due to inclusion of broader context.|
