# Create a Document Retrieval Agent

In this tutorial, we will use `council` to build a document retrieval agent that can answer questions about Microsoft's 2022 10-K report. A 10-K is a comprehensive report that publicly traded companies file annually about their financial performance.

In many real-world scenarios, an LLM does not have enough contextual information to answer certain types of queries. For instance, a query about very recent or rapidly changing data (like today's weather or the latest stock prices) falls outside the scope of a pre-trained LLM, which has been trained on a fixed, static corpus of text and does not inherently know about events that occurred after its training data was collected. This applies to OpenAI's gpt-4 and gpt-3.5 models, which have a knowledge cutoff date of September 2021 and therefore cannot answer questions using information found in Microsoft's 2022 documents.

This is where augmenting an LLM with external data can be beneficial. By adding up-to-date information fetched from external data sources to the LLM prompt as context, we can make the responses from the LLM more relevant and accurate.

When we augment an LLM with external data, it can be beneficial to represent this external data in a way that allows for efficient search and retrieval. One common approach is to convert the data into a series of vectors, and store these in a vector index. A vector index is a data structure used to optimize the lookup of vectors in high-dimensional spaces. The vector representation allows us to perform similarity searches: given a query vector, we can find the most similar vectors in the index, which correspond to the most relevant pieces of data. This can significantly speed up data retrieval and make the augmentation process more efficient, especially when dealing with large volumes of data.
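
As a minimal sketch of the similarity search described above, the snippet below scores a query vector against a handful of stored vectors with cosine similarity and returns the best matches. The three-dimensional vectors, the `toy_index` dictionary and the `top_k` helper are hypothetical illustrations; a real vector index stores model-generated embeddings with hundreds of dimensions and typically avoids this brute-force scan.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: List[float], index: Dict[str, List[float]], k: int) -> List[Tuple[str, float]]:
    """Brute-force search: score every stored vector and keep the k best."""
    scored = [(text, cosine_similarity(query, vec)) for text, vec in index.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


# Hypothetical 3-dimensional "embeddings", made up for illustration only
toy_index = {
    "Revenue increased year over year": [0.9, 0.1, 0.0],
    "The board approved a dividend": [0.2, 0.8, 0.1],
    "Office locations and leases": [0.0, 0.2, 0.9],
}

# Stands in for the embedding of a question about revenue
query_vector = [0.8, 0.3, 0.0]

results = top_k(query_vector, toy_index, k=2)
for text, score in results:
    print(f"{score:.3f}  {text}")
```

The text whose vector points in nearly the same direction as the query vector ranks first, which is the behaviour we rely on later when retrieving chunks of the 10-K.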

We will use `LlamaIndex`, a framework for augmenting LLMs with external data, to create the vector index.
- LlamaIndex [GitHub](https://github.com/jerryjliu/llama_index)
- LlamaIndex [documentation](https://gpt-index.readthedocs.io/en/latest/index.html)

The process will include the following steps:
1) Extracting the text from Microsoft's 10-K saved as a pdf
2) Splitting the text into chunks of a certain size
3) Creating embeddings for each text chunk
4) Storing the text chunks and their corresponding embeddings in a vector index
5) Retrieving text chunks most similar to a user query based on semantic similarity

Code from this notebook will be used to build the Financial Analyst Agent in `4_financial_analyst_agent`, but this notebook can also be executed to create a standalone search agent.


## Install required libraries
Install `council` and the other dependencies.

In [None]:
!pip install -r requirements.txt

## Import the required modules
Import the required modules from the Council framework and from supporting frameworks such as LlamaIndex, then load the environment variables.

In [None]:
import os
from typing import List
from string import Template

import tiktoken
from transformers import AutoTokenizer
from tiktoken import Encoding

from llama_index import VectorStoreIndex, SimpleDirectoryReader, ServiceContext
from llama_index.langchain_helpers.text_splitter import TokenTextSplitter
from llama_index.node_parser import SimpleNodeParser
from llama_index.indices.vector_store import VectorIndexRetriever
from llama_index.schema import NodeWithScore

from council.agents import Agent
from council.skills import SkillBase, LLMSkill, PromptToMessages
from council.contexts import SkillContext, ChatMessage
from council.llm import OpenAILLM, LLMMessage
from council.chains import Chain
from council.controllers import BasicController
from council.evaluators import BasicEvaluator
from council.filters import BasicFilter
from council.contexts import AgentContext, ChatHistory
from council.prompt import PromptBuilder

import dotenv
dotenv.load_dotenv()

print(os.getenv("OPENAI_API_KEY", None) is not None)

## Specifying constants used in the notebook
These parameters dictate the behaviour of the document indexing and retrieval system.

In [None]:
COMPANY_NAME = "Microsoft"
COMPANY_TICKER = "MSFT"

PDF_FILE_NAME = "msft-10K-2022.pdf"
EMBEDDING_MODEL_NAME = "sentence-transformers/all-MiniLM-L6-v2"
ENCODING_NAME = "cl100k_base"
MAX_CHUNK_SIZE = 256
CHUNK_OVERLAP = 20
CONTEXT_TOKEN_LIMIT = 3000
NUM_RETRIEVED_DOCUMENTS = 50

## Instantiating tokenizers

A tokenizer is a component that breaks down text into smaller units called tokens. Tokenization is a crucial first step in text data preprocessing because machine learning models don't inherently understand text in its raw form. Instead, they require numerical input. Therefore, before feeding text into a model that generates vector representations / embeddings, the text must be tokenized.

The tokenizer used for text chunking is the same tokenizer used by the selected embedding model from `sentence-transformers`. We use it to split the document into chunks small enough for the embedding model. The `ChunkingTokenizer` class wraps the tokenizer we load from `transformers` and exposes the `__call__` method that the `LlamaIndex` text splitter `TokenTextSplitter` needs in order to use it directly.

The tokenizer for the OpenAI LLM will be used to count the number of tokens from the retrieved document chunks that we are adding to the model input to ensure we do not go over the model's token limit.

In [None]:
class ChunkingTokenizer:
    """Tokenizer for chunking document data for creation of embeddings"""

    def __init__(self, model_name: str):
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)

    def __call__(self, text: str) -> List[int]:
        return self.tokenizer.encode(text)

# Instantiate tokenizer for chunking
chunking_tokenizer = ChunkingTokenizer(EMBEDDING_MODEL_NAME)

# Instantiate tokenizer for OpenAI LLM
llm_tokenizer = tiktoken.get_encoding(ENCODING_NAME)
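
To make the text-to-numbers step concrete, here is a minimal sketch of a callable tokenizer with the same shape as `ChunkingTokenizer`: it takes a string and returns a list of integer IDs. `ToyTokenizer` is a hypothetical word-level stand-in written for illustration. The real tokenizers above work on subword units with a fixed, pre-trained vocabulary, so their IDs and token counts will differ.

```python
from typing import Dict, List


class ToyTokenizer:
    """Hypothetical word-level tokenizer: each new word gets the next free ID."""

    def __init__(self) -> None:
        self.vocab: Dict[str, int] = {}

    def __call__(self, text: str) -> List[int]:
        ids = []
        for word in text.lower().split():
            # Grow the vocabulary on the fly (real tokenizers use a fixed one)
            if word not in self.vocab:
                self.vocab[word] = len(self.vocab)
            ids.append(self.vocab[word])
        return ids


toy_tokenizer = ToyTokenizer()
token_ids = toy_tokenizer("Revenue grew and profit grew")

print(token_ids)       # repeated words map to the same ID
print(len(token_ids))  # the token count is what chunking and budgeting rely on
```

Both uses of a tokenizer in this notebook come down to the length of that list: the chunking tokenizer bounds the size of each chunk, and the LLM tokenizer bounds the size of the prompt context.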

## Split the text
We set up the `TokenTextSplitter` with the chunking tokenizer and the constants defined earlier, such as the maximum chunk size. A maximum chunk size of 256 is used because that is the maximum number of tokens we pass to the `all-MiniLM-L6-v2` embedding model. The text is split first on paragraph breaks, then on line breaks and spaces.

LlamaIndex represents each chunk as a `Node` object, which contains the text of the document chunk along with some additional data. The `SimpleNodeParser` carries out this process.

In [None]:
# Instantiate text splitter
text_splitter = TokenTextSplitter(
    chunk_size=MAX_CHUNK_SIZE,
    chunk_overlap=CHUNK_OVERLAP,
    tokenizer=chunking_tokenizer,
    separator="\n\n",
    backup_separators=["\n", " "])

# Instantiate node parser
node_parser = SimpleNodeParser(text_splitter=text_splitter)
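
The effect of `chunk_size` and `chunk_overlap` can be sketched with a plain sliding window over a token list. This is a simplified illustration, not the `TokenTextSplitter` implementation: the real splitter also respects the separators configured above, and the `chunk_tokens` helper and the toy token list are hypothetical.

```python
from typing import List


def chunk_tokens(tokens: List[str], chunk_size: int, overlap: int) -> List[List[str]]:
    """Split tokens into windows of at most chunk_size tokens, where each
    window starts with the last `overlap` tokens of the previous window."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")

    step = chunk_size - overlap
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window has reached the end of the token list
        if start + chunk_size >= len(tokens):
            break
    return chunks


toy_tokens = [f"t{i}" for i in range(10)]
chunks = chunk_tokens(toy_tokens, chunk_size=4, overlap=1)
for chunk in chunks:
    print(chunk)
```

The overlap means a sentence that straddles a chunk boundary still appears with some of its surrounding text in at least one chunk, at the cost of storing a few tokens twice.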

## Create the Vector Index
The code below builds the LlamaIndex vector index end to end: it extracts the text from the PDF document, splits it into nodes, calculates an embedding for each node, and stores the nodes and their embeddings in a vector index. Finally, we turn the index into a retriever, which we can query for nodes by semantic similarity, and specify how many of the most similar nodes it should return.

**Note:** LlamaIndex requires a local model name to begin with the *local:* prefix

In [None]:
# Specify the embedding model and node parser
service_context = ServiceContext.from_defaults(
    embed_model=f"local:{EMBEDDING_MODEL_NAME}", node_parser=node_parser)

# Extract the text from the pdf document
documents = SimpleDirectoryReader(input_files=[PDF_FILE_NAME]).load_data()

# Create the index by splitting text into nodes and calculating text embeddings
index = VectorStoreIndex.from_documents(documents, service_context=service_context)

# Initialize index as retriever for top K most similar nodes
index_retriever = index.as_retriever(similarity_top_k=NUM_RETRIEVED_DOCUMENTS)

## Create the Document Retrieval Skill
We first create a `Retriever` class that interacts with LlamaIndex to retrieve the most similar documents (nodes) and process them into paragraphs of text that can be added to an LLM prompt.

When a query comes in, it is converted into a high-dimensional vector using the same `all-MiniLM-L6-v2` embedding model that was used for the document chunks. The system then calculates the cosine similarity between this query vector and all the vectors in the index. The top `NUM_RETRIEVED_DOCUMENTS` results are chosen based on these similarity scores: the documents whose vectors are most similar to the query vector are considered the most relevant and are returned.

We then create our `DocRetrievalSkill`, which inherits from council's `SkillBase`. It queries the `Retriever` with the last user message and returns the formatted document text chunks.

In [None]:
# Define utility class for document retrieval with LlamaIndex
class Retriever:
    def __init__(self, llm_tokenizer: Encoding, retriever: VectorIndexRetriever):
        """Class to retrieve text chunks from Llama Index and create context for LLM"""
        self.llm_tokenizer = llm_tokenizer
        self.retriever = retriever

    def retrieve_docs(self, query: str) -> str:
        """End-to-end function to retrieve most similar nodes and build the context"""
        nodes = self.retriever.retrieve(query)
        docs = self._extract_text(nodes)
        context = self._build_context(docs)

        return context

    @staticmethod
    def _extract_text(nodes: List[NodeWithScore]) -> List[str]:
        """Function to extract the text from the retrieved nodes"""
        return [node.node.text for node in nodes]

    def _build_context(self, docs: List[str]) -> str:
        """Function to build context for LLM by separating text chunks into paragraphs"""
        context = ""
        num_tokens = 0
        for doc in docs:
            doc += "\n\n"
            num_tokens += len(self.llm_tokenizer.encode(doc))
            if num_tokens <= CONTEXT_TOKEN_LIMIT:
                context += doc
            else:
                break

        return context


# Define document retrieval skill
class DocRetrievalSkill(SkillBase):
    """Skill to retrieve documents and build context"""

    def __init__(self, retriever: Retriever):
        super().__init__(name="document_retrieval")
        self.retriever = retriever

    def execute(self, context: SkillContext) -> ChatMessage:
        query = context.last_user_message.message
        # Use a separate name so the SkillContext parameter is not shadowed
        retrieved_context = self.retriever.retrieve_docs(query)

        return self.build_success_message(retrieved_context)


# Instantiate document retrieval skill
doc_retrieval_skill = DocRetrievalSkill(Retriever(llm_tokenizer, index_retriever))
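
To see how the token budget in `_build_context` behaves, here is a standalone sketch of the same loop. The `build_context` function, the whitespace-based `whitespace_count` counter and the sample chunks are hypothetical stand-ins. In the notebook the count comes from `len(self.llm_tokenizer.encode(doc))` and the limit is `CONTEXT_TOKEN_LIMIT`.

```python
from typing import Callable, List


def build_context(docs: List[str], count_tokens: Callable[[str], int], token_limit: int) -> str:
    """Append chunks in ranked order until the next one would exceed token_limit."""
    context = ""
    num_tokens = 0
    for doc in docs:
        doc += "\n\n"
        num_tokens += count_tokens(doc)
        if num_tokens > token_limit:
            break
        context += doc
    return context


def whitespace_count(text: str) -> int:
    # Hypothetical stand-in for len(llm_tokenizer.encode(text))
    return len(text.split())


# Chunks are assumed to arrive sorted from most to least similar
ranked_chunks = ["alpha beta gamma", "delta epsilon", "zeta eta theta iota", "kappa"]
context = build_context(ranked_chunks, whitespace_count, token_limit=6)
print(repr(context))
```

Note that the loop stops at the first chunk that does not fit, so a shorter, lower-ranked chunk such as `"kappa"` is not added even though it would fit in the remaining budget. This keeps the context in strict relevance order.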

## Create the LLMSkill
The `LLMSkill` is a skill provided by council that calls an LLM and returns its response.
We will inject the context that the `DocRetrievalSkill` creates from the retrieved documents into the `LLMSkill`.

In [None]:
# Load OpenAILLM
llm = OpenAILLM.from_env(model='gpt-3.5-turbo')

# OpenAI LLM prompts
SYSTEM_MESSAGE = "You are a financial analyst whose job is to answer user questions about $company with the provided context."

PROMPT = """Use the following pieces of context to answer the query.
If the answer is not provided in the context, do not make up an answer. Instead, respond that you do not know.

CONTEXT:
{{chain_history.last_message}}
END CONTEXT.

QUERY:
{{chat_history.user.last_message}}
END QUERY.

YOUR ANSWER:
"""

# Function used by the LLMSkill to add the document context and user message into LLM prompt
def build_context_messages(context: SkillContext) -> List[LLMMessage]:
    """Context messages function for LLMSkill"""
    context_message_prompt = PromptToMessages(prompt_builder=PromptBuilder(PROMPT))
    return context_message_prompt.to_user_message(context)

# Instantiate LLMSkill
llm_skill = LLMSkill(
    llm=llm,
    system_prompt=Template(SYSTEM_MESSAGE).substitute(company=COMPANY_NAME),
    context_messages=build_context_messages,
)
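
The system prompt above is filled in with Python's standard `string.Template`, which replaces `$company` before the skill is created. The short sketch below isolates that step. The `$ticker` placeholder in the second example is hypothetical and only there to show what happens when a value is missing.

```python
from string import Template

SYSTEM_MESSAGE = (
    "You are a financial analyst whose job is to answer user questions "
    "about $company with the provided context."
)

# substitute() replaces every $name placeholder with the given keyword value
system_prompt = Template(SYSTEM_MESSAGE).substitute(company="Microsoft")
print(system_prompt)

# substitute() raises KeyError when a placeholder has no value, whereas
# safe_substitute() leaves the unresolved placeholder in place
partial = Template("$company ($ticker)").safe_substitute(company="Microsoft")
print(partial)
```

The `{{...}}` placeholders in `PROMPT` are a separate mechanism: they are not passed through `Template`, and are resolved when the `PromptBuilder` builds the message from the skill context.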

## Create the chain
The chain will use the `DocRetrievalSkill` followed by the `LLMSkill` to answer the user query.

In [None]:
doc_retrieval_chain = Chain(
    name="doc_retrieval_chain",
    description=f"Information from {COMPANY_NAME} ({COMPANY_TICKER}) 10-K from their 2022 fiscal year, a document that contains important updates for investors about company performance and operations",
    runners=[doc_retrieval_skill, llm_skill],
)

## Create the document retrieval agent
The agent will use the document retrieval chain we created and the `BasicController`, `BasicEvaluator` and `BasicFilter` to select our single chain and return the response to the user.

Notebooks `3_controller` and `4_financial_analyst_agent` will demonstrate more complicated uses of the controller and filter.

In [None]:
agent = Agent(controller=BasicController(chains=[doc_retrieval_chain]), evaluator=BasicEvaluator(), filter=BasicFilter())

## Interact with the agent

In [None]:
from council.contexts import Budget

run_context = AgentContext.from_user_message("What is the financial performance of Microsoft?", budget=Budget(600))

result = agent.execute(run_context)

print(result.best_message.message)

In [None]:
print(f"\nexecution log:\n{run_context._execution_context._executionLog.to_json()}")

In the [next part](./2_google_search.ipynb), we will learn how to leverage Google search. 