# Observability and human feedback collection for an Agent application using Agents for Amazon Bedrock

### Context:
In the following example, we take an agent that you have already created with `Agents for Amazon Bedrock` and send each request and response to an `Amazon Kinesis Data Firehose` delivery stream. Firehose then applies a transformation that flattens the nested JSON and creates logical partitions in the data using the `call_type` variable, which makes the data easier to query later in a database.

The data transformation is performed by a `Transformation Lambda function` associated with the Amazon Kinesis Data Firehose delivery stream. Because the transformation runs inside the delivery stream rather than in your application's request path, it adds no latency to your application. You can optionally disable the data flattening in the AWS Lambda function.
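To make the flattening and partitioning step concrete, here is a minimal sketch of what a Firehose transformation Lambda of this kind might look like. It is not the function shipped in the CloudFormation template, which is not shown here. The `flatten` helper, the `_` key separator, and the sample field names are assumptions for illustration, and the `metadata.partitionKeys` entry only has an effect when dynamic partitioning is enabled on the delivery stream.

```python
import base64
import json


def flatten(obj, parent_key="", sep="_"):
    """Flatten nested dicts into one level, joining keys with `sep` (assumed separator)."""
    flat = {}
    for key, value in obj.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, new_key, sep))
        else:
            flat[new_key] = value
    return flat


def lambda_handler(event, context):
    """Decode, flatten, and re-encode each Firehose record."""
    output = []
    for record in event["records"]:
        payload = json.loads(base64.b64decode(record["data"]))
        flat = flatten(payload)
        data = (json.dumps(flat) + "\n").encode("utf-8")
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(data).decode("utf-8"),
            # Used only if dynamic partitioning is enabled (assumption)
            "metadata": {
                "partitionKeys": {"call_type": str(flat.get("call_type", "unknown"))}
            },
        })
    return {"records": output}
```

Each returned record keeps the `recordId` it arrived with, which is how Firehose matches transformed records back to the originals.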

### Prerequisite
The provided `CloudFormation template` sets up the backend resources that gather data on user requests, your custom metadata (such as latency, time to first token, tags, model responses, and citations), and any other custom identifiers you would like to add (e.g., user_id/customer_id). After deploying it successfully, you can test whether your observability architecture works as expected and determine the latency introduced by adding this component to your application.

#### `Important Note`: 

##### 1. Please use your AWS configuration to fill in the `config.py` file before running the code.

##### 2. Make sure you have upgraded boto3 to version `1.34.126` or later.
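One way to verify the boto3 requirement before running the notebook is sketched below. The helper names `version_tuple` and `meets_minimum` are illustrative and not part of the observability module. The comparison is done on integer tuples because comparing version strings directly would rank `1.34.99` above `1.34.126`.

```python
def version_tuple(version):
    """Convert a version string such as '1.34.126' into (1, 34, 126)."""
    return tuple(int(part) for part in version.split(".")[:3])


def meets_minimum(installed, minimum="1.34.126"):
    """Return True when the installed version is at least the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


try:
    import boto3
    status = "OK" if meets_minimum(boto3.__version__) else "needs upgrade"
    print("boto3", boto3.__version__, status)
except ImportError:
    print("boto3 is not installed")
```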

#### Section 1:

In the section below, we go through the code that interacts with Agents for Amazon Bedrock, which can take custom actions such as RAG retrieval, SQL execution, and other API calls. The code imports the necessary libraries and modules, including the AWS SDK (boto3) and the custom `observability` module, which contains the `BedrockLogs` class for logging, evaluation, and observability purposes. To use Agent features, you must specify `feature_name='Agent'` when initializing `BedrockLogs`.

The `invoke_agent` function is responsible for interacting with the agent API. It takes a question and other arguments as input and calls the `bedrock_agent_runtime_client.invoke_agent` API to generate a response using Agents for Amazon Bedrock, based on the provided question and configuration parameters. The function processes the response stream to extract the agent's answer and trace data, re-raises any exception it encounters, and returns the relevant information.

The `invoke_agent` function is decorated with `@bedrock_logs.watch`, which logs and tracks the function call for observability purposes. `@bedrock_logs.watch` tracks only the first input argument, so you can pass a JSON object as that argument and include any metadata your use case requires; the observability solution will track it. Similarly, the observability solution tracks all returned values. You can therefore compute any custom metric inside the decorated function, such as time to first token or time to last token, and add it to the return statement so that it gets logged.
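The behavior described above (log the first argument, log everything returned) can be illustrated with a simplified stand-in decorator. This is a hypothetical sketch, not the implementation of `BedrockLogs.watch`: `watch_sketch`, `captured_records`, and the record fields are invented for illustration, records go to an in-memory list instead of Firehose, and the identifiers that the real decorator hands back (such as `run_id` and `observation_id`) are omitted.

```python
import functools
import time
import uuid

captured_records = []  # stand-in for the Firehose delivery stream


def watch_sketch(call_type):
    """Hypothetical, simplified stand-in for `@bedrock_logs.watch`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.time()
            result = func(*args, **kwargs)
            captured_records.append({
                "call_type": call_type,
                "observation_id": str(uuid.uuid4()),
                "input": args[0] if args else None,  # first positional argument only
                "output": result,                    # everything the function returned
                "duration_s": time.time() - start,
            })
            return result
        return wrapper
    return decorator


@watch_sketch(call_type="demo")
def answer(request, api_key):
    # `api_key` is the second argument, so it never reaches the log
    return {"answer": request["question"].upper()}, 0.12  # (response, custom metric)


answer({"question": "hello", "user_id": "User-1"}, "secret-key")
```

Passing a dictionary as the first argument is what lets metadata such as `user_id` travel with the logged input, while anything sensitive goes in later arguments.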

In [None]:
import boto3
import time
import json

# Custom Module:
from observability import BedrockLogs

# Import your configuration values
from config import (
    REGION, FIREHOSE_NAME, CRAWLER_NAME, MODEL_ARN, AGENT_ID, AGENT_ALIAS_ID, 
    SESSION_ID, ENABLE_TRACE, END_SESSION, AGENT_CONFIG, EXPERIMENT_ID, 
    CUSTOM_TAG, GUARDRAIL_ID, GUARDRAIL_VERSION, MAX_TOKENS, TEMPERATURE, TOP_P
)

bedrock_agent_client = boto3.client('bedrock-agent')
bedrock_agent_runtime_client = boto3.client('bedrock-agent-runtime')

# Initialize BedrockLogs for observability using Amazon Firehose:
# bedrock_logs = BedrockLogs(delivery_stream_name=FIREHOSE_NAME, 
#                            feedback_variables=True, 
#                            experiment_id=EXPERIMENT_ID, # this can be your project name
#                            feature_name='Agent')

# Initialize BedrockLogs for observability using local mode for troubleshooting:
bedrock_logs = BedrockLogs(delivery_stream_name='local', 
                           feedback_variables=True, 
                           feature_name='Agent',
                           experiment_id=EXPERIMENT_ID # this can be your project name
                           )

# In the function below, only the first input argument, `query_to_agent`, is logged,
# so sensitive data can be passed safely in the remaining arguments.
# Invoke the agent API:
@bedrock_logs.watch(call_type='agent-in-prod')
def invoke_agent(query_to_agent, agent_id, agent_alias_id,
                 session_id, enableTrace, endSession, agent_config=None):
    try:
        start_time = time.time()
        time_to_first_token = None
        time_at_first_token = None
        
        agentResponse = bedrock_agent_runtime_client.invoke_agent(
            inputText=query_to_agent,
            agentId=agent_id,
            agentAliasId=agent_alias_id,
            sessionId=session_id,
            enableTrace=enableTrace,
            endSession=endSession
        )

        event_stream = agentResponse['completion']
        agent_answer = None
        end_event_received = False
        trace_data = []

        for event in event_stream:
            if 'chunk' in event:
                if time_to_first_token is None:
                    time_at_first_token = time.time()
                    time_to_first_token = time_at_first_token - start_time
                data = event['chunk']['bytes']
                # Append each chunk so that a multi-chunk answer is not overwritten
                agent_answer = (agent_answer or '') + data.decode('utf8')
                end_event_received = True
            elif 'trace' in event:
                trace = event['trace']
                trace['start_trace_time'] = time.time()
                trace_data.append(trace)
            else:
                raise Exception("Unexpected event.", event)

        if not end_event_received:
            raise Exception("End event not received.")
            
        agentResponse['ResponseMetadata']['time_to_first_token'] = time_to_first_token
        # Note: measured from the arrival of the first token, not from the start of the request
        agentResponse['ResponseMetadata']['time_to_last_token'] = time.time() - time_at_first_token

    except Exception as e:
        raise e
    
    # the following will be returned as a tuple datatype:
    return agentResponse['ResponseMetadata'], agent_answer, trace_data

# Test the observability by calling the decorated Agent function

Here we pass a question to the custom `invoke_agent` function and expect only the `QUESTION` argument, the `model response`, and the `traces` to be logged to the configured Amazon S3 bucket.

Only the `QUESTION` argument is logged because the observability package logs only the first argument passed to the decorated function.

The example pattern below illustrates this:
```python
@bedrock_logs.watch(capture_input=True, capture_output=True, call_type='LLM')
def your_function(arg1, arg2): # only arg1 will be tracked to give you an option to not log sensitive information
    # Your function code here
    
    # Your code to calculate any other custom metric, like time to first/last token 
    
    return None # or output, custom_metric, response or any other output variable
```

In [None]:
QUESTION = "<enter-your-question-here>"

results, log, run_id, observation_id = invoke_agent(
    QUESTION, 
    AGENT_ID, 
    AGENT_ALIAS_ID, 
    SESSION_ID, 
    ENABLE_TRACE, 
    END_SESSION
)

#### Section 2: Collecting feedback for your agent application responses.

In this section, you use the `run_id` and `observation_id` generated by the `invoke_agent` function call to collect feedback on the responses from your end users or QA team. The code defines two functions, `observation_level_feedback` and `session_level_feedback`, both decorated with `@bedrock_logs.watch` to track the feedback collection process.

The `call_type` variable in the decorator is used to create logical partitions in the collected data. This allows you to separate the feedback data based on whether it is collected at the observation level or the session level, making it easier to analyze and process the feedback data later.

The `observation_level_feedback` function is designed to collect feedback at the observation level, which means feedback is associated with a specific `observation_id`. This function takes a dictionary as input, containing the `user_id`, `f_run_id` (the run_id associated with the feedback), `f_observation_id` (the observation_id associated with the feedback), and `actual_feedback` (the feedback itself, which can be a simple "Thumbs-up" or more detailed text).

The `session_level_feedback` function is designed to collect feedback at the session level, which means feedback is associated with a specific `run_id`. It takes a dictionary containing the `user_id`, `f_run_id`, and `actual_feedback`; no `f_observation_id` is needed because the feedback applies to the whole run.

When using the feedback mechanism, it is crucial to always pass the `run_id` (and, for observation-level feedback, the `observation_id`) for which the feedback is being collected, as we do with `f_run_id` and `f_observation_id`. These identifiers act as keys for joining the various logically partitioned datasets, allowing you to associate the feedback with the specific response generated by your GenAI application.

The code demonstrates how the `observation_level_feedback` function can be called with a dictionary containing the necessary information, including a dummy `user_feedback` value of "Thumbs-up".

By collecting feedback at the observation or session level and using the `call_type` variable to create logical partitions, you can effectively organize and analyze the feedback data, enabling you to evaluate the performance and quality of the responses, identify areas for improvement, and refine the knowledge base or model accordingly.
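The role of `run_id` and `observation_id` as join keys can be sketched in plain Python. The records below are made-up sample data, and the column names `run_id`, `observation_id`, and `answer` on the observation side are assumptions; in practice the same join would more likely be written as a SQL query over the partitioned tables.

```python
# Made-up sample rows standing in for two call_type partitions
observations = [
    {"run_id": "run-1", "observation_id": "obs-1", "answer": "Paris"},
    {"run_id": "run-1", "observation_id": "obs-2", "answer": "Berlin"},
]
feedback = [
    {"f_run_id": "run-1", "f_observation_id": "obs-2", "actual_feedback": "Thumbs-up"},
]


def join_feedback(observations, feedback):
    """Left-join observations with feedback on (run_id, observation_id)."""
    by_key = {
        (f["f_run_id"], f["f_observation_id"]): f["actual_feedback"]
        for f in feedback
    }
    return [
        {**obs, "actual_feedback": by_key.get((obs["run_id"], obs["observation_id"]))}
        for obs in observations
    ]


joined = join_feedback(observations, feedback)
for row in joined:
    print(row["observation_id"], row["answer"], row["actual_feedback"])
```

Observations without feedback keep `None` in the `actual_feedback` field, so unrated responses remain visible in the joined result.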

In [None]:
@bedrock_logs.watch(call_type='observation-feedback')
def observation_level_feedback(feedback):
    pass

@bedrock_logs.watch(call_type='session-feedback')
def session_level_feedback(feedback):
    pass


# defining a dummy user_feedback:
user_feedback = 'Thumbs-up'

observation_feedback_from_front_end = {
    'user_id': 'User-1',
    'f_run_id': run_id,
    'f_observation_id': observation_id,
    'actual_feedback': user_feedback
}

# log observation-feedback as a separate dataset based on call_type
observation_level_feedback(observation_feedback_from_front_end)

In [None]:
user_feedback = 'Amazing - this is fast and an awesome way to help the customers!'
session_feedback_from_front_end = {
    'user_id': 'User-1',
    'f_run_id': run_id,
    'actual_feedback': user_feedback
}

# log session-feedback as a separate dataset based on call_type
session_level_feedback(session_feedback_from_front_end)

### Next Steps:

1. Now that your data is available in Amazon S3, you can optionally trigger the `Glue Crawler` to create Amazon `Athena tables` for you. These Athena tables can be used to build dashboards for analyzing and visualizing the collected data.

2. Using Athena and Amazon S3, you can perform detailed analysis for troubleshooting your application, response evaluation, or build analytical dashboards. The provided screenshots demonstrate how you can not only track metrics for your application but also incorporate any information logged via `@bedrock_logs.watch`, including custom data or metrics like latency, token metrics, cost-related metrics, and more.
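The optional crawler step in item 1 could be scripted roughly as follows. This is a sketch under the assumption that the crawler created by the CloudFormation template already exists; `trigger_crawler` is an illustrative helper, not part of the observability module. It relies only on the Glue client's `get_crawler` and `start_crawler` calls and takes the client as a parameter so it can be exercised without AWS credentials.

```python
def trigger_crawler(glue_client, crawler_name):
    """Start the crawler if it is idle; return True when a run was started."""
    state = glue_client.get_crawler(Name=crawler_name)["Crawler"]["State"]
    if state != "READY":
        # RUNNING or STOPPING: a run is already in progress, so do nothing
        return False
    glue_client.start_crawler(Name=crawler_name)
    return True
```

With real credentials, this could be called as `trigger_crawler(boto3.client('glue', region_name=REGION), CRAWLER_NAME)`, using the `REGION` and `CRAWLER_NAME` values imported from `config.py`.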

# END