# Documentation for backend/inference.py

#### Description:

#### Imports

- `os`: Standard library module for operating system interactions.

- `create_retrieval_chain`: LangChain function that combines a retriever with a document-combining chain into a retrieval chain.

- `create_stuff_documents_chain`: LangChain function that builds a chain which "stuffs" a list of documents into the prompt context and passes the result to the LLM.

- `ChatMistralAI`: Class for interacting with Mistral AI chat models.

- Document loading functions (from the project's `document_loading` module):

    - `load_documents_from_directory`: Loads PDF documents from a directory and splits them into chunks.
    - `load_or_create_faiss_vector_store`: Loads an existing FAISS vector store or creates a new one.
    - `get_hybrid_retriever`: Creates a hybrid retriever combining BM25 and vector search.

- `prompt`: The prompt template (from the project's `prompts` module) used by the question-answering chain.

- `get_answer_with_source`: Helper from the project's `citations` module that returns answers with their source references.

- `load_dotenv`: Loads environment variables from a `.env` file.
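To make the "stuff" strategy concrete, here is a minimal pure-Python sketch of what a stuff-documents step does conceptually: join the text of all retrieved chunks into one context string and substitute it, together with the question, into a prompt template. The `Doc` class, the `stuff_documents` function, and the template wording are hypothetical stand-ins; LangChain's real chain works on `Document` objects and a prompt template object and is not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved chunk: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


# Hypothetical template: {context} receives the stuffed documents, {input} the question.
TEMPLATE = (
    "Answer the question using only the context below.\n\n"
    "Context:\n{context}\n\n"
    "Question: {input}"
)


def stuff_documents(docs, question, template=TEMPLATE, separator="\n\n"):
    """Concatenate every document into one context string and fill the template."""
    context = separator.join(doc.page_content for doc in docs)
    return template.format(context=context, input=question)


if __name__ == "__main__":
    docs = [
        Doc("RAG grounds answers in retrieved text.", {"source": "a.pdf"}),
        Doc("Retrieved chunks are inserted into the prompt.", {"source": "b.pdf"}),
    ]
    print(stuff_documents(docs, "What does RAG do?"))
```

Because every retrieved chunk goes into a single prompt, the number of chunks the retriever returns directly affects prompt length.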

```python
import os
from langchain.chains.retrieval import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_mistralai import ChatMistralAI
from dotenv import load_dotenv

# Load environment variables and derive the corpus and FAISS index paths
def initialize_environment():
    load_dotenv(override=True)
    document_path = os.getenv("CORPUS_SOURCE")
    # Fail early with a clear message instead of a TypeError from os.path.join(None, ...)
    if not document_path:
        raise ValueError("CORPUS_SOURCE not found in environment variables.")
    persist_directory = os.path.join(document_path, "faiss_indexes")
    print("Environment initialized with document path and persist directory.")
    return document_path, persist_directory

document_path, persist_directory = initialize_environment()
```

Environment initialized with document path and persist directory.
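`initialize_environment` derives `persist_directory` (`<CORPUS_SOURCE>/faiss_indexes`), the folder that is later passed to `load_or_create_faiss_vector_store`. That function's implementation is not shown in this file, so the sketch below illustrates only the general load-or-create caching pattern, with a JSON file and a toy word-count "index" as hypothetical stand-ins for FAISS and its embeddings. `build_index` and `load_or_create_index` are illustrative names, not the project's API.

```python
import json
import os
import tempfile


def build_index(documents):
    """Hypothetical stand-in for embedding: map each document's position to its word count."""
    return {str(i): len(doc.split()) for i, doc in enumerate(documents)}


def load_or_create_index(documents, persist_directory, name="collection"):
    """Load a persisted index if one exists; otherwise build it and save it to disk."""
    path = os.path.join(persist_directory, f"{name}.json")
    if os.path.exists(path):
        print(f"Loading existing index from {path}...")
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    print(f"Creating new index at {path}...")
    index = build_index(documents)
    os.makedirs(persist_directory, exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(index, f)
    return index


if __name__ == "__main__":
    docs = ["software requirements", "software design principles"]
    with tempfile.TemporaryDirectory() as tmp:
        persist_directory = os.path.join(tmp, "faiss_indexes")
        first = load_or_create_index(docs, persist_directory)   # builds and saves
        second = load_or_create_index(docs, persist_directory)  # loads from disk
        print(first == second)  # True
```

In this sketch the cached index is reused as-is, so changes to the documents are ignored until the persisted file is deleted; whether the project's function detects such changes is not documented here. With LangChain's FAISS integration, persistence is typically handled by `FAISS.save_local` and `FAISS.load_local`.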


#### Description:

- Load environment variables: `load_dotenv(override=True)` loads the variables from the `.env` file, letting them override any values already set in the environment.

- Read the document path: `load_embeddings()` reads the corpus location from the `CORPUS_SOURCE` environment variable with `document_path = os.getenv("CORPUS_SOURCE")`. Its docstring states that it loads the documents and the embeddings from the FAISS vector store and returns the hybrid retriever built from them.

- Validate the document path: if `CORPUS_SOURCE` is not set, the function raises `ValueError("CORPUS_SOURCE not found in environment variables.")`.

- Define the FAISS index path: `persist_directory = os.path.join(document_path, "faiss_indexes")`.

- Set the number of documents to retrieve: `top_k = 15` is the number of relevant documents to be returned.

- Load documents: `documents = load_documents_from_directory(document_path)` loads the documents from the specified directory.

- Print the number of loaded documents for confirmation: `print(f"Loaded {len(documents)} documents from {document_path}.")`

- Validate the documents: if no documents were loaded, the function raises `ValueError("No documents loaded. Please check the document path.")`.

- Create or load the FAISS vector store: `faiss_store = load_or_create_faiss_vector_store(documents, persist_directory)` creates a new FAISS vector store from the loaded documents or loads an existing one.

- Build the hybrid retriever: `retriever = get_hybrid_retriever(documents, faiss_store, top_k)` combines the documents and the FAISS store into a hybrid retriever.

- Print a confirmation that the embeddings and retriever are ready: `print("Embeddings and retriever loaded.")`

- Return the hybrid retriever for further use: `return retriever`.
   
- Read the Mistral API key: `get_api_key()` retrieves the key from the `MISTRAL_API_KEY` environment variable with `api_key = os.getenv("MISTRAL_API_KEY")`.

- Validate the API key: if the key is not set, the function raises `ValueError("MISTRAL_API_KEY not found in environment variables.")`.

- Return the retrieved Mistral API key: `return api_key`.

- Example usage: the `if __name__ == "__main__":` block is the entry point when the code is run directly.

    - Inside a `try` block, it calls `retriever = load_embeddings()` to load the embeddings and create the retriever, then `api_key = get_api_key()` to read the API key.
    - It prints a confirmation that the API key was retrieved.
    - An `except ValueError as e:` clause prints any error raised during loading (for example, a missing environment variable) as `Error: {e}`.
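`load_embeddings` and `get_api_key` repeat the same read-and-validate pattern, and echoing an API key in full exposes the secret to anyone who can read the output. A minimal sketch addressing both follows; `require_env` and `mask_secret` are hypothetical helpers, not part of the original module, and `require_env` reads `os.environ` directly, so it assumes `load_dotenv` has already run.

```python
import os


def require_env(name: str) -> str:
    """Return an environment variable's value, or raise ValueError if it is unset or empty.

    Hypothetical helper mirroring the `if not value: raise ValueError(...)` checks
    used for CORPUS_SOURCE and MISTRAL_API_KEY.
    """
    value = os.environ.get(name)
    if not value:
        raise ValueError(f"{name} not found in environment variables.")
    return value


def mask_secret(secret: str, visible: int = 4) -> str:
    """Show only the first `visible` characters of a secret; mask short secrets entirely."""
    if len(secret) <= visible * 2:
        return "*" * len(secret)
    return secret[:visible] + "*" * (len(secret) - visible)


if __name__ == "__main__":
    os.environ["DEMO_API_KEY"] = "demo-key-not-real-123"  # demo variable and value only
    try:
        api_key = require_env("DEMO_API_KEY")
        print(f"Successfully retrieved API Key: {mask_secret(api_key)}")
    except ValueError as e:
        print(f"Error: {e}")
```

A masked prefix is usually enough to tell which key was loaded without making the output sensitive.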

```python
import os
from langchain.chains.retrieval import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_mistralai import ChatMistralAI
from document_loading import (
    load_documents_from_directory,
    load_or_create_faiss_vector_store,
    get_hybrid_retriever,
)
from prompts import prompt
from citations import get_answer_with_source
from dotenv import load_dotenv

# Load environment variables
load_dotenv(override=True)

def load_embeddings():
    """
    Load documents and the embeddings from the FAISS vector store.
    Returns:
        retriever: The hybrid retriever created from the documents and FAISS store.
    """
    document_path = os.getenv("CORPUS_SOURCE")
    if not document_path:
        raise ValueError("CORPUS_SOURCE not found in environment variables.")
    
    persist_directory = os.path.join(document_path, "faiss_indexes")
    top_k = 15  # number of relevant documents to be returned
    
    # Load documents
    documents = load_documents_from_directory(document_path)
    print(f"Loaded {len(documents)} documents from {document_path}.")
    
    if not documents:
        raise ValueError("No documents loaded. Please check the document path.")
    
    # Create or load FAISS vector store
    faiss_store = load_or_create_faiss_vector_store(documents, persist_directory)
    
    # Get the hybrid retriever
    retriever = get_hybrid_retriever(documents, faiss_store, top_k)
    
    print("Embeddings and retriever loaded.")
    return retriever

def get_api_key():
    """
    Get Mistral API Key from the environment variables.
    Returns:
        str: The Mistral API key.
    """
    api_key = os.getenv("MISTRAL_API_KEY")
    if not api_key:
        raise ValueError("MISTRAL_API_KEY not found in environment variables.")
    
    return api_key

# Example usage
if __name__ == "__main__":
    try:
        retriever = load_embeddings()  # Load documents, the FAISS store, and the hybrid retriever
        api_key = get_api_key()        # Read the Mistral API key
        # Confirm retrieval without printing the secret itself
        print("Successfully retrieved API Key.")
    except ValueError as e:
        print(f"Error: {e}")
```

Loading documents from /app/data/swebok...
Loaded 470 documents from /app/data/swebok.
Loading existing FAISS vector store from /app/data/swebok/faiss_indexes/collection...

Embeddings and retriever loaded.
Successfully retrieved API Key.
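The retriever loaded above comes from `get_hybrid_retriever`, whose implementation lives in the project's `document_loading` module and is not shown. One common way to merge a BM25 keyword ranking with a vector-search ranking is reciprocal rank fusion (RRF); LangChain's `EnsembleRetriever` uses a weighted variant of it. The sketch below assumes RRF with the conventional constant `k = 60`; the function name and document IDs are hypothetical, and this is not a description of the project's actual merging logic.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, weights=None, k=60, top_k=15):
    """Merge several ranked lists of document IDs into one ranking.

    Each document scores sum(weight / (k + rank)) over the lists it appears in,
    with ranks starting at 1. Higher scores rank first.
    """
    if weights is None:
        weights = [1.0] * len(rankings)
    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (k + rank)
    # sorted() is stable, so documents with equal scores keep first-seen order
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


if __name__ == "__main__":
    bm25_ranking = ["d1", "d2", "d3"]    # hypothetical keyword-search results
    vector_ranking = ["d3", "d1", "d4"]  # hypothetical FAISS results
    print(reciprocal_rank_fusion([bm25_ranking, vector_ranking], top_k=3))
    # ['d1', 'd3', 'd2']
```

Documents found by both retrievers (here `d1` and `d3`) rise to the top, which is the main motivation for combining keyword and vector search.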


#### Description:

- Function definition: `def load_llm_api(model_name):` defines `load_llm_api`, which takes a string argument `model_name`.

- Docstring: describes the function's purpose (load and configure the Mistral AI LLM), its argument (`model_name (str)`: the name of the model to load), and its return type (`ChatMistralAI`: a configured LLM instance).

- Retrieve the API key: `api_key = os.getenv("MISTRAL_API_KEY")` reads the Mistral API key from the environment variables.

- Validate the API key: if no key was retrieved, the function raises `ValueError("MISTRAL_API_KEY not found in environment variables.")`.

- Return a configured LLM instance: the function returns `ChatMistralAI(model=model_name, mistral_api_key=api_key, temperature=0.2, max_tokens=256, top_p=0.4)`. The model name and API key select the model and authenticate requests; the remaining parameters control generation. A low `temperature` (0.2) makes the output more deterministic, `max_tokens` caps each response at 256 tokens, and `top_p` (0.4) restricts sampling to the smallest set of most probable tokens whose cumulative probability reaches 0.4.

- Define the model name: `MODEL_NAME = "open-mistral-7b"` specifies the Mistral AI model to load.

- Load the model: inside a `try` block, `llm = load_llm_api(MODEL_NAME)` creates the configured instance.

- Success message: `print("Successfully loaded the Mistral LLM.")` confirms that the model was loaded.

- Print the model configuration: the cell prints `llm.model`, `llm.temperature`, `llm.max_tokens`, and `llm.top_p`, that is, the model name, temperature, maximum number of tokens, and top-p value.

- Error handling: `except ValueError as e: print(f"Error: {e}")` catches any `ValueError` raised during loading (for example, a missing API key) and reports it.
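The effect of `temperature` and `top_p` can be illustrated on a toy next-token distribution. The sketch below applies temperature scaling to logits, converts them to probabilities with a softmax, and then keeps the smallest set of most probable tokens whose cumulative probability reaches `top_p` (nucleus sampling). The vocabulary and logits are made up, and the sketch shows the general technique only; it makes no claim about how Mistral's servers implement sampling.

```python
import math
import random


def apply_temperature(logits, temperature):
    """Scale logits by 1/temperature and return softmax probabilities."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(tokens, probs, top_p):
    """Keep the most probable tokens until their cumulative probability reaches top_p,
    then renormalize the kept probabilities."""
    ranked = sorted(zip(tokens, probs), key=lambda tp: tp[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    mass = sum(p for _, p in kept)
    return [(token, p / mass) for token, p in kept]


def sample(tokens, logits, temperature=0.2, top_p=0.4, rng=random):
    """Draw one token using temperature scaling followed by nucleus (top-p) filtering."""
    probs = apply_temperature(logits, temperature)
    candidates = top_p_filter(tokens, probs, top_p)
    words, weights = zip(*candidates)
    return rng.choices(words, weights=weights, k=1)[0]


if __name__ == "__main__":
    tokens = ["retrieval", "generation", "banana"]  # toy vocabulary
    logits = [2.0, 1.5, -1.0]                        # toy model scores
    print(apply_temperature(logits, 1.0))   # fairly spread out
    print(apply_temperature(logits, 0.2))   # sharply peaked on "retrieval"
    print(top_p_filter(tokens, apply_temperature(logits, 0.2), 0.4))
    print(sample(tokens, logits))
```

With these toy numbers, a temperature of 0.2 puts about 92% of the probability on the top token, so `top_p=0.4` keeps only that token and sampling becomes effectively greedy.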

```python
import os
from langchain_mistralai import ChatMistralAI

def load_llm_api(model_name):
    """
    Load and configure the Mistral AI LLM.
    
    Args:
        model_name (str): The name of the model to load.
    
    Returns:
        ChatMistralAI: Configured LLM instance.
    """
    api_key = os.getenv("MISTRAL_API_KEY")
    if not api_key:
        raise ValueError("MISTRAL_API_KEY not found in environment variables.")
    
    return ChatMistralAI(
        model=model_name,
        mistral_api_key=api_key,
        temperature=0.2,
        max_tokens=256,
        top_p=0.4,
    )

# Set the model name
MODEL_NAME = "open-mistral-7b"

# Load the model and print its configuration
try:
    llm = load_llm_api(MODEL_NAME)
    print("Successfully loaded the Mistral LLM.")
    print(f"Model Name: {llm.model}")
    print(f"Temperature: {llm.temperature}")
    print(f"Max Tokens: {llm.max_tokens}")
    print(f"Top P: {llm.top_p}")
except ValueError as e:
    print(f"Error: {e}")
```

Successfully loaded the Mistral LLM.
Model Name: open-mistral-7b
Temperature: 0.2
Max Tokens: 256
Top P: 0.4


#### Description:

- Function declaration: `def chat_completion_as_dict(question):` defines `chat_completion_as_dict`, which takes a string parameter `question`.

- Docstring: describes the function's purpose (generate a response to the question using the RAG chain), its argument (`question (str)`: the user question to be answered), and its return value (a dictionary).

- Print statement: `print(f"Running prompt: {question}")` shows which question is being processed.

- Question-answer chain creation: `question_answer_chain = create_stuff_documents_chain(llm, prompt)` builds a chain from the language model (`llm`) and the predefined `prompt`.

- Retrieval chain creation: `rag_chain = create_retrieval_chain(retriever, question_answer_chain)` combines the retriever and `question_answer_chain` into a Retrieval-Augmented Generation (RAG) chain.

- Response initialization: `full_response = {"answer": "", "context": []}` holds the generated answer and the associated context.

- Streaming loop: `for chunk in rag_chain.stream({"input": question}):` iterates over the chunks streamed from `rag_chain` for the input question.

    - Answer check: if a chunk contains `"answer"`, its text is appended with `full_response["answer"] += chunk["answer"]`.
    - Context check: if a chunk contains `"context"`, its documents are added with `full_response["context"].extend(chunk["context"])`.

- Optional sourcing: with sourcing enabled, `final_answer = get_answer_with_source(full_response)` processes the full response to obtain the final answer with citations, and any part of `final_answer` beyond the already-streamed text (`final_answer[len(full_response["answer"]):]`) is appended to `full_response["answer"]`. This slicing assumes `final_answer` is a string that begins with the streamed answer. In the cell below, the call is commented out, so the streamed answer is returned without source references.

- Return statement: the function returns a dictionary with the complete answer (`"complete_answer"`) and the model name (`"model": MODEL_NAME`). The retrieved context is collected but not returned.

- Main check: `if __name__ == "__main__":` ensures the following code runs only when the script is executed directly, not when it is imported as a module.

- Sample question: `question = "What are the benefits of Retrieval-Augmented Generation?"` defines a question to test the function.

- Function call: `response = chat_completion_as_dict(question)` runs the function with the sample question.

- Print the final response: `print(f"Response: {response['complete_answer']}\nModel: {response['model']}")` prints the complete answer and the model name.
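The aggregation loop can be exercised without an LLM or a retriever. In the sketch below, `fake_stream` is a hypothetical generator that yields dictionaries shaped like the chunks handled above (the input, one `"context"` chunk, then `"answer"` fragments), and `aggregate_stream` merges them the same way the loop in `chat_completion_as_dict` does. The optional `on_token` callback is an illustrative addition, not part of the original code.

```python
from typing import Callable, Dict, Iterable, Iterator, Optional


def fake_stream(question: str) -> Iterator[Dict]:
    """Hypothetical stand-in for rag_chain.stream({"input": question})."""
    yield {"input": question}
    yield {"context": ["chunk-1", "chunk-2"]}  # retrieved documents, plain strings here
    for fragment in ["RAG ", "grounds ", "answers."]:
        yield {"answer": fragment}


def aggregate_stream(
    chunks: Iterable[Dict], on_token: Optional[Callable[[str], None]] = None
) -> Dict:
    """Concatenate "answer" fragments and collect "context" items from streamed chunks.

    on_token, if given, receives each answer fragment as soon as it arrives.
    """
    full_response = {"answer": "", "context": []}
    for chunk in chunks:
        if "answer" in chunk:
            full_response["answer"] += chunk["answer"]
            if on_token is not None:
                on_token(chunk["answer"])
        if "context" in chunk:
            full_response["context"].extend(chunk["context"])
    return full_response


if __name__ == "__main__":
    result = aggregate_stream(
        fake_stream("What is RAG?"),
        on_token=lambda text: print(text, end="", flush=True),  # show text as it streams
    )
    print()
    print(result)
```

In `chat_completion_as_dict` the fragments are only accumulated and returned at the end, so the caller sees no incremental output; a callback such as `on_token` is one way to surface tokens as they arrive.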

```python
def chat_completion_as_dict(question):
    """
    Generate a response to a given question using the RAG chain,
    returning only the answer in a dictionary.

    Args:
        question (str): The user question to be answered.

    Returns:
        dict: A dictionary containing the answer and model name.
    """
    print(f"Running prompt: {question}")  # Print the question being processed
    question_answer_chain = create_stuff_documents_chain(llm, prompt)  # Stuff retrieved docs into the prompt
    rag_chain = create_retrieval_chain(retriever, question_answer_chain)  # Retrieve, then answer

    full_response = {"answer": "", "context": []}  # Accumulates streamed answer text and retrieved documents

    for chunk in rag_chain.stream({"input": question}):  # Stream the response from the RAG chain
        if "answer" in chunk:  # Answer fragments arrive incrementally
            full_response["answer"] += chunk["answer"]

        if "context" in chunk:  # Retrieved documents (not returned below; can be removed)
            full_response["context"].extend(chunk["context"])

    # Sourcing is disabled: get_answer_with_source is not called, so no citations are added
    # final_answer = get_answer_with_source(full_response)
    answer = full_response["answer"]  # Streamed answer without source references

    # Return the answer and model name (no sources or context)
    return {
        "complete_answer": answer,
        "model": MODEL_NAME,
    }

# Example usage
if __name__ == "__main__":
    question = "What are the benefits of Retrieval-Augmented Generation?"  # Define a sample question
    response = chat_completion_as_dict(question)  # Call the function with the question
    print(f"Response: {response['complete_answer']}\nModel: {response['model']}")  # Print the results
```

Running prompt: What are the benefits of Retrieval-Augmented Generation?
Response: Retrieval-Augmented Generation (RAG) is a technique that combines human expertise with machine learning to generate responses to natural language queries. The benefits of RAG include:

1. Improved efficiency: RAG can generate responses more quickly than a human alone, as it can process and analyze large amounts of data in a fraction of the time.
2. Increased accuracy: By leveraging machine learning algorithms, RAG can reduce the likelihood of errors and improve the overall accuracy of responses.
3. Enhanced consistency: RAG can ensure that responses are consistent across different queries, as it can learn from previous interactions and use that knowledge to generate future responses.
4. Scalability: RAG can handle a large volume of queries simultaneously, making it an ideal solution for applications with high traffic or complex query structures.
5. Cost savings: By automating the response generation proc