# File Knowledge Retrieval Agent

## 0. Import Libraries

Import and initialize the necessary libraries.

In [1]:
from openai import OpenAI
from instill.clients import init_pipeline_client
import os

pipeline = init_pipeline_client(api_token=os.environ["INSTILL_API_TOKEN"])
client = OpenAI()

## 1. Initialize Variables

Here we set the outputs from the executing agent, as well as the parameters defined by Agent-BE, user interaction and other add-on pipelines.

In [2]:
# Catalog and namespace for retrieval
catalog_name = "benchmark-s1-wework"
namespace = "george_strong"

# File summary from indexing-generate-summary pipeline
file_summary = """
This document contains the S-1 Registration Statement for WeWork Companies Inc., filed with the Securities and Exchange Commission on August 14, 2019. It provides a comprehensive overview of the company's business model, which focuses on offering flexible workspace solutions through a "space-as-a-service" membership model, catering to a diverse clientele that includes freelancers, startups, and large enterprises. The document highlights WeWork's rapid growth, showcasing a committed revenue backlog of $4.0 billion as of June 30, 2019, alongside significant increases in memberships and revenue.

Key sections include an analysis of risk factors, financial performance, and strategic growth plans, emphasizing market expansion and product enhancement. It details the company's capital structure, including various stock classes and their voting rights, as well as significant relationships with major investors like SoftBank. The financial statements reflect substantial increases in revenue and operating expenses, alongside notable net losses, while also addressing lease-related liabilities and assets in accordance with accounting standards.

Additionally, the document discusses WeWork's acquisitions, stock-based compensation, and related party transactions, providing insights into the company's operational strategies and financial health. Overall, this registration statement serves as a detailed prospectus for potential investors, outlining both the opportunities and risks associated with investing in WeWork.
"""

# Instruction from executing agent
instruction = "identify who Adam Neumann is in relation to WeWork"

# State context from executing agent
state_context = " "

# Recommended actions from executing agent
recommend_actions = " "

# User's follow-up query
user_query = "Who is Adam Neumann?"

# User's chat history
chat_history = " "

# Default relevance threshold for RAG mode
relevance_threshold = 0.1

In [3]:
# Replace double quotes with single quotes in the text inputs, since double quotes can cause issues with JSON parsing
file_summary = file_summary.replace('"', "'")
chat_history = chat_history.replace('"', "'")
user_query = user_query.replace('"', "'")
instruction = instruction.replace('"', "'")
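The sketch below illustrates why embedded double quotes can be a problem. It assumes the text is spliced into a JSON template as a raw string somewhere downstream, which this notebook does not confirm. It also shows that a JSON serializer escapes quotes on its own, which avoids altering the text. The payload key `file-summary` mirrors the pipeline input used later; the sample text is illustrative.

```python
import json

text = 'a "space-as-a-service" model'

# Naive string templating: the embedded quotes end the JSON string early.
raw_payload = '{"file-summary": "' + text + '"}'
try:
    json.loads(raw_payload)
    raw_ok = True
except json.JSONDecodeError:
    raw_ok = False

# Workaround used in this notebook: swap double quotes for single quotes.
sanitized_payload = '{"file-summary": "' + text.replace('"', "'") + '"}'
sanitized = json.loads(sanitized_payload)["file-summary"]

# Alternative: let a JSON serializer escape the quotes, preserving the text.
escaped = json.loads(json.dumps({"file-summary": text}))["file-summary"]

print(raw_ok)
print(sanitized)
print(escaped)
```

The replacement approach changes the text slightly, while serializer-based escaping round-trips it unchanged.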

## 2. Define Prompt and System Message

Here we define the prompt and system message templates for the agent.

In [4]:
SYSTEM_MESSAGE = """
You are an AI assistant tasked with collecting and aggregating relevant information based on the supervisor's context and the user's query. You should decide whether to use **Retrieval-Augmented Generation (RAG)** (retrieving-rag) or **deep extraction** (retrieving-extract) to gather and synthesize information.

Consider the following to help determine which tool to use:
1. **Instruction Type**: 
   - If the instruction requires **specific, factual answers** (e.g., who, what, when, where), use **retrieving-rag** as it excels at retrieving relevant information directly.
   - If the instruction involves **general analysis** (e.g., risks, sentiment, patterns), **retrieving-extract** may be more appropriate, as it involves a comprehensive analysis of the content.
   
2. **Complexity and Risk**: 
   - If the instruction requires a **nuanced exploration**, where missing key details could compromise the quality of the response, opt for **retrieving-extract** to ensure that a comprehensive analysis is performed.
   - For simpler, fact-based queries, **retrieving-rag** is more efficient.

3. **Follow-Up Queries**:
   - If the user's query **builds upon prior responses** or requires more refined details, a deep extraction approach may be needed to integrate prior knowledge and ensure completeness.

4. **Chat History Context**:
   - If relevant context exists in the previous conversation, consider how it might refine the query and influence the tool choice.

Remember: If neither **retrieving-rag** nor **retrieving-extract** can help address the instruction, refuse the task.

You will still return properly formatted bracket citations (e.g., [1][2]) for each referenced source, ensuring correct citation order. The citations should point to the specific file sources that support each piece of information.

User Past Conversation History:
{chat_history}

User Follow-up Query:
{user_query}
"""

PROMPT = """
Below is the background/context from your supervisor:
{state_context}

The supervisor's task description for you is as follows:
{instruction}

Recommended actions or parameters you may use:
{recommend_actions}

Please read the information above carefully and decide whether you should call **retrieving-rag** or **retrieving-extract** to gather relevant data.

To make the decision, consider the following:

1. **Instruction Type**: Does the instruction ask for specific, factual answers (e.g., who, what, when, where)? If so, prefer **retrieving-rag**. If the instruction is more general or asks for broader analysis (e.g., identifying patterns or risks), lean toward **retrieving-extract**.
2. **Complexity**: Does the instruction require a nuanced understanding, where missing key details could compromise the result? In this case, **retrieving-extract** is preferable.
3. **Follow-Up Query**: If the user's query builds on prior conversation or needs further refinement, consider using **retrieving-extract** to ensure all necessary context is captured.
4. **Chat History Context**: Does the chat history provide relevant context that can help refine the decision?

Once you have determined which tool to use, gather the necessary data and incorporate it into a comprehensive, well-structured response, embedding bracket citations (e.g., [3][5]) within the text to show which file/chunk sources support each piece of information.

# Steps

1. Review the given context, user's past conversation history, and follow-up query.
2. Decide whether to call **retrieving-rag** (for specific, fact-based queries) or **retrieving-extract** (for general analysis or when deeper exploration is required).
3. If necessary, formulate a retrieval query to extract the relevant data and include the appropriate citations in your response.
4. After receiving the tool response, decide if further retrievals are needed for completeness.
5. Compile a final, comprehensive answer that incorporates the retrieved data and maintains proper citation order.

# Output Format

• Provide your final answer in clear, well-structured English.
• Ensure each reference to an external file source is enclosed in bracket citations (e.g., [1], [2]), corresponding to the relevant file chunks.
• Maintain citation order — do not mix them up or omit any.
"""
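When templates like these are filled with `str.format`, placeholders need the `{name}` form. The `${name}` form belongs to `string.Template`, and mixing the two leaves a literal `$` in the rendered prompt. A minimal sketch of the difference, using an illustrative one-line template rather than the prompts above:

```python
from string import Template

template_text = "Task: ${instruction}"

# str.format only consumes the {instruction} part, so the "$" stays in the output.
formatted = template_text.format(instruction="find the CEO")

# string.Template treats ${instruction} as the whole placeholder.
substituted = Template(template_text).substitute(instruction="find the CEO")

print(repr(formatted))
print(repr(substituted))
```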

## 3. Define Tools

Here we define the two tools available to the agent, `retrieving-rag` and `retrieving-extract`. The `retrieving-extract` tool is used when the user has selected deep file analysis. The `retrieving-rag` tool is used when the user has not selected deep file analysis, or when the document has 15 or fewer chunks. Neither tool takes arguments, so the model only chooses which one to call. The same cell builds the messages and asks the model to make that choice.

In [5]:
tools = [
{
    "type": "function",
    "function": {
        "name": "retrieving-rag",
        "description": "Retrieves chunks from the document that are semantically relevant to the instruction, user query, and chat history.",
        "parameters": {
            "type": "object",
            "properties": {},
            "required": [],
            "additionalProperties": False
        },
        "strict": True
    }
},
{
    "type": "function",
    "function": {
        "name": "retrieving-extract",
        "description": "Performs a deep analysis to extract information relevant to the instruction, user query, and chat history from an entire file with high recall.",
        "parameters": {
            "type": "object",
            "properties": {},
            "required": [],
            "additionalProperties": False
        },
        "strict": True
    }
}]

# Set up the messages with the system message and prompt
messages = [
    {
        "role": "system",
        "content": SYSTEM_MESSAGE.format(
            chat_history=chat_history,
            user_query=user_query
        )
    },
    {
        "role": "user",
        "content": PROMPT.format(
            state_context=state_context,
            instruction=instruction,
            recommend_actions=recommend_actions
        )
    }
]

completion = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools,
    temperature=0.0,
    top_p=0.95
)

tool_call = completion.choices[0].message.tool_calls[0]
print(tool_call.function.name)

2025-02-19 21:27:34,507.507 INFO     HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


retrieving-rag
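The cell above indexes `tool_calls[0]` directly. Because the system message allows the model to refuse the task, a response may carry no tool call, in which case `tool_calls` is `None` and the indexing fails. A minimal sketch of a guard follows; the helper name `first_tool_name` is hypothetical, and `SimpleNamespace` objects stand in for the API's message objects.

```python
from types import SimpleNamespace
from typing import Optional


def first_tool_name(message) -> Optional[str]:
    """Return the name of the first requested tool, or None if no tool was called."""
    tool_calls = getattr(message, "tool_calls", None)
    if not tool_calls:
        return None
    return tool_calls[0].function.name


# Stand-ins for API message objects (illustrative only).
with_call = SimpleNamespace(
    content=None,
    tool_calls=[SimpleNamespace(function=SimpleNamespace(name="retrieving-rag"))],
)
refusal = SimpleNamespace(content="I cannot help with that.", tool_calls=None)

print(first_tool_name(with_call))
print(first_tool_name(refusal))
```

A caller could return the refusal text to the executing agent when the helper yields `None`, rather than continuing to the retrieval step.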


## 4. Handle Deep File Analysis

If the model selects `retrieving-extract` but the catalog contains 15 or fewer chunks, we fall back to `retrieving-rag` and set `relevance_threshold` to `0`. With so few chunks, RAG retrieves the whole document, and a threshold of `0` ensures that no chunks are filtered out of the agent's response.

In [6]:
# Default to RAG mode if num_chunks is less than or equal to 15 as the whole document is retrieved
if tool_call.function.name == "retrieving-extract":
    num_chunks = pipeline.trigger(
        namespace_id=namespace,
        pipeline_id="get-num-chunks",
        data=[{
            "catalog-name": catalog_name,
            "namespace": namespace
        }]
    )['outputs'][0]['num-chunks']

    if num_chunks <= 15:
        tool_call.function.name = "retrieving-rag"
        relevance_threshold = 0  # The whole document is retrieved, so disable relevance filtering
print(tool_call.function.name)

retrieving-rag
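The fallback rule can also be written as a small pure function, which makes it easy to test without calling the `get-num-chunks` pipeline. This is a sketch only; the function name `resolve_tool` and the `max_chunks` parameter are hypothetical, with the default of 15 taken from the cell above.

```python
from typing import Tuple


def resolve_tool(
    tool_name: str,
    num_chunks: int,
    relevance_threshold: float,
    max_chunks: int = 15,
) -> Tuple[str, float]:
    """Fall back to RAG with no filtering when a small document was sent to deep extraction."""
    if tool_name == "retrieving-extract" and num_chunks <= max_chunks:
        return "retrieving-rag", 0.0
    return tool_name, relevance_threshold


print(resolve_tool("retrieving-extract", 12, 0.1))
print(resolve_tool("retrieving-extract", 40, 0.1))
print(resolve_tool("retrieving-rag", 12, 0.1))
```

As in the cell above, a call that was already routed to `retrieving-rag` keeps its original threshold.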


## 5. Execute the Tool Call

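We execute the selected tool by triggering the pipeline of the same name. Neither tool takes arguments, so the inputs come from the variables defined earlier: `retrieving-rag` additionally receives `relevance_threshold`, while `retrieving-extract` receives `file_summary`.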


In [7]:
if tool_call.function.name == "retrieving-rag":
    tool_result = pipeline.trigger(
        namespace_id=namespace,
        pipeline_id="retrieving-rag",
        data=[{
            "catalog-name": catalog_name,
            "namespace": namespace,
            "instruction": instruction,
            "chat-history": chat_history,
            "user-query": user_query,
            "relevance-threshold": relevance_threshold
        }]
    )['outputs'][0]
    result = tool_result['chunks']
elif tool_call.function.name == "retrieving-extract":
    tool_result = pipeline.trigger(
        namespace_id=namespace,
        pipeline_id="retrieving-extract",
        data=[{
            "catalog-name": catalog_name,
            "namespace": namespace,
            "instruction": instruction,
            "chat-history": chat_history,
            "user-query": user_query,
            "file-summary": file_summary
        }]
    )['outputs'][0]
    result = tool_result
tool_result

{'citations': ['Source: S-1 Wework.md. Chunk UID: 54c25ddc-2bfb-40ae-85b4-312553e1c131.',
  'Source: S-1 Wework.md. Chunk UID: 5c612658-baea-4f92-a931-f4fab0024826.',
  'Source: S-1 Wework.md. Chunk UID: e234498a-1562-43a6-ab6c-7a99c378f9e1.',
  'Source: S-1 Wework.md. Chunk UID: 27a67185-615e-4f13-8780-7b97304e8c6c.',
  'Source: S-1 Wework.md. Chunk UID: 12425ed4-c7de-4dca-8a7e-9ad4b4584cd1.',
  'Source: S-1 Wework.md. Chunk UID: 9c93e746-0b1c-45ae-866e-ea799505a101.',
  'Source: S-1 Wework.md. Chunk UID: 0f761585-f122-4a2b-abbd-4f17b58173e1.',
  'Source: S-1 Wework.md. Chunk UID: f626db37-7885-499d-ba75-8fe5960f170c.',
  'Source: S-1 Wework.md. Chunk UID: 60de9d6d-4e46-4d65-9315-8451fc2f0e37.',
  'Source: S-1 Wework.md. Chunk UID: f77a7a3a-89c8-4b8a-b754-55ea0fa9847f.'],
 'scores': [0.9914887,
  0.97923523,
  0.87831426,
  0.55326396,
  0.53518957,
  0.5179062,
  0.40386072,
  0.25870037,
  0.24926445,
  0.21304953],
 'chunks': ['[1] # CERTAIN RELATIONSHIPS AND RELATED PARTY TRANSACT
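To see how a relevance threshold interacts with scores like these, the sketch below filters score and citation pairs locally. The filtering itself happens inside the `retrieving-rag` pipeline, so the comparison used here (`>=`) and the helper name `filter_by_score` are assumptions. The scores are copied from the output above; the citation labels are shortened placeholders.

```python
from typing import List, Tuple


def filter_by_score(
    citations: List[str], scores: List[float], threshold: float
) -> List[Tuple[str, float]]:
    """Keep (citation, score) pairs whose score is at least the threshold."""
    return [(c, s) for c, s in zip(citations, scores) if s >= threshold]


scores = [0.9914887, 0.97923523, 0.87831426, 0.55326396, 0.53518957,
          0.5179062, 0.40386072, 0.25870037, 0.24926445, 0.21304953]
citations = [f"chunk-{i}" for i in range(1, len(scores) + 1)]  # placeholder labels

kept_default = filter_by_score(citations, scores, 0.1)  # default threshold in this notebook
kept_strict = filter_by_score(citations, scores, 0.5)   # a stricter, illustrative threshold

print(len(kept_default), len(kept_strict))
```

Under these assumptions, all ten chunks pass the default threshold of `0.1`, while a threshold of `0.5` would keep only the first six.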

## 6. Formulate Agent Response

Here we append the model's tool call message and the tool result to `messages`, then create a second chat completion that produces the final answer for the executing agent.
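The message sequence for a tool result can be sketched with plain dictionaries: an assistant message that carries the tool call, followed by a `tool` message whose `tool_call_id` matches it. The helper name `build_tool_messages` and the ID `call_123` are hypothetical. The sketch serializes the result with `json.dumps`, which is one alternative to `str()`.

```python
import json
from typing import Any, Dict, List


def build_tool_messages(tool_call_id: str, tool_name: str, result: Any) -> List[Dict[str, Any]]:
    """Build the assistant tool-call message and the matching tool-result message."""
    assistant_msg = {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": tool_call_id,
            "type": "function",
            # The tools in this notebook take no arguments, hence the empty JSON object.
            "function": {"name": tool_name, "arguments": "{}"},
        }],
    }
    tool_msg = {
        "role": "tool",
        "tool_call_id": tool_call_id,  # must match the ID of the call being answered
        "content": json.dumps(result),
    }
    return [assistant_msg, tool_msg]


new_messages = build_tool_messages("call_123", "retrieving-rag", ["[1] first chunk text"])
print(new_messages[1]["content"])
```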

In [8]:
messages.append(completion.choices[0].message)  # append model's function call message
messages.append({                               # append result message
    "role": "tool",
    "tool_call_id": tool_call.id,
    "content": str(result)
})

response_completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    tools=tools,
    temperature=0.0,
    top_p=0.95
)

print(response_completion.choices[0].message.content)

2025-02-19 21:28:07,986.986 INFO     HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


Adam Neumann is a co-founder of WeWork and has served as the Chief Executive Officer (CEO) and Chairman of the company's board of directors since its inception. He is recognized for his role in shaping the vision, strategic direction, and operational priorities of WeWork, which has become a significant player in the coworking space industry. Neumann is noted for his unique leadership style, which combines visionary thinking with operational execution and community building [1][2].

Neumann's influence extends beyond his executive role; he controls a majority of the company's voting power due to his ownership of high-vote stock, which allows him to significantly influence corporate decisions, including the election of directors and major corporate transactions [3][4]. His leadership has been characterized by a commitment to growth and innovation, although it has also faced scrutiny and challenges, particularly regarding governance and financial practices [5][6].

In addition to his corp