Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
SPDX-License-Identifier: MIT-0

## Dynamic Filtering for Amazon Bedrock Knowledge Bases with LangChain

## Introduction

This notebook demonstrates how to implement dynamic filtering for [Amazon Bedrock Knowledge Bases](https://aws.amazon.com/bedrock/knowledge-bases/) using [LangChain](https://www.langchain.com/). The technique presented here aims to enhance the relevance and efficiency of information retrieval in AI-powered applications.

[Amazon Bedrock](https://aws.amazon.com/bedrock/) is a fully managed service that provides access to foundation models via a unified API. One of its key features, the Knowledge Base, allows organizations to integrate their proprietary data, enabling AI applications to reason over and utilize this information effectively.

LangChain is an open-source framework designed to simplify the development of applications using large language models. It offers a suite of tools and abstractions that facilitate the integration of various components in an AI system, including document processing, embeddings, and vector stores.

In this tutorial, we will explore how to combine these technologies to implement dynamic filtering. This approach adjusts the retrieval criteria at query time based on user-specific attributes or contextual information, which can improve the relevance and personalization of the retrieved documents.

The notebook will cover the following topics:

1. Configuration of a Bedrock Knowledge Base using sample travel review documents
2. Implementation of dynamic filtering logic utilizing LangChain and Boto3
3. Exploration of different retrieval chain architectures
4. Testing the implementation with various scenarios and result analysis
5. Discussion of best practices and considerations for production deployment

Let's proceed with creating sample documents, setting up our environment, and examining the core concepts that will drive our implementation.

## Documents
### Generation process
The data required for this notebook is included in this repository. We used Bedrock to generate sample documents and metadata, then saved them to an S3 bucket to create a Knowledge Base. The data was generated as follows:

1. Using an LLM we generate 5 titles for each travel destination using the following prompt: "Give me 5 article titles for travel review guides about {replace with destination}. The titles should reflect different articles which have recommendations from audiences of various ages and travel companions"
2. Next we use an LLM to generate a document for each title using the following prompt: "Draft an article for the following title, {replace with title from above}, that has a 500 word count organized into 3 paragraphs about {replace with destination}. The article should be a travel review that highlights recommended attractions, restaurants, and sights to see, tips, and name specific top recommendations. Include an exact age of the reviewer in one of the paragraphs in the article."
3. Lastly, we use an LLM to generate metadata for each document. The metadata attributes we included were age, desired_destination, stay_duration, preferred_month, activities_interest, companion, travelling_with_children, and travelling_with_pets. Each metadata file is saved as JSON in the same S3 bucket, with a name that matches its corresponding document.
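
The last step can be sketched as a small helper that pairs a document with its metadata file. This is a minimal sketch, not the code used to build the sample data: the function name and the document file name are hypothetical, and the `<document file name>.metadata.json` naming rule is an assumption based on the S3 data source documentation, so check it against the current Bedrock docs before relying on it.

```python
import json
from typing import Any, Dict, Tuple


def metadata_sidecar(document_name: str, attributes: Dict[str, Any]) -> Tuple[str, str]:
    """Return the metadata file name and JSON body for one document."""
    # Assumed naming rule: the document's full file name plus ".metadata.json"
    sidecar_name = f"{document_name}.metadata.json"
    # Attributes are nested under "metadataAttributes", as in the example in this notebook
    body = json.dumps({"metadataAttributes": attributes}, indent=4)
    return sidecar_name, body


# Hypothetical document name, used only for illustration
name, body = metadata_sidecar(
    "amsterdam_seniors_canals.txt",
    {"desired_destination": "Amsterdam, Netherlands", "companion": "couples"},
)
print(name)
print(body)
```

Both files would then be uploaded to the same S3 prefix so the Knowledge Base can associate them during ingestion.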

### Example Travel Review Document

**Navigating Amsterdam's Canals: A Senior's Guide to the Venice of the North**

Amsterdam, the capital of the Netherlands, is a city that has long been celebrated for its picturesque canals, rich history, and vibrant culture. Often referred to as the "Venice of the North," this charming destination offers a unique blend of old-world charm and modern amenities, making it an ideal destination for seniors seeking a memorable travel experience. In this guide, we'll explore the best ways to navigate Amsterdam's iconic canals and discover the city's hidden gems.

For seniors seeking a leisurely and immersive way to explore Amsterdam's waterways, a canal cruise is an absolute must. These guided tours offer a unique perspective on the city's architecture, history, and daily life. Many operators offer specialized tours catering to seniors, with comfortable seating, audio commentary, and even refreshments on board. One of the most popular options is the hop-on, hop-off canal boat tour, which allows you to explore at your own pace, disembarking at various points of interest along the way.

Beyond the canals, Amsterdam boasts a wealth of cultural attractions that are sure to captivate seniors with diverse interests. The Rijksmuseum, one of the world's most renowned art museums, houses an impressive collection of Dutch Golden Age masterpieces, including works by Rembrandt and Vermeer. For those with a passion for history, the Anne Frank House offers a poignant glimpse into the life of the young diarist during World War II. Additionally, the Van Gogh Museum showcases an extensive collection of the iconic artist's works, providing a fascinating insight into his life and creative genius.

When it comes to accommodations, Amsterdam offers a range of options to suit every preference and budget. For seniors seeking luxury and comfort, the city boasts several high-end hotels nestled along the picturesque canals, offering stunning views and exceptional service. Alternatively, charming boutique hotels and cozy bed-and-breakfasts provide a more intimate and authentic experience, often housed in historic buildings with modern amenities.

Beyond the traditional tourist attractions, Amsterdam offers a wealth of activities and experiences tailored to seniors. Foodies will delight in the city's culinary scene, with a diverse array of restaurants serving up traditional Dutch fare as well as international cuisines. For those seeking a more relaxed pace, the city's numerous parks and gardens provide tranquil oases for leisurely strolls or picnics. Additionally, Amsterdam's vibrant cultural calendar features a variety of festivals, concerts, and exhibitions throughout the year, ensuring there's always something new to discover.

In conclusion, Amsterdam's canals, rich history, and diverse attractions make it a truly captivating destination for seniors seeking a memorable travel experience. With its accessible transportation options, comfortable accommodations, and a wealth of activities catering to various interests, the "Venice of the North" offers a perfect blend of relaxation and exploration. Whether you're a history buff, art enthusiast, or simply seeking a charming and picturesque escape, Amsterdam's canals and cobblestone streets are sure to leave a lasting impression.


### Metadata JSON for Document above

{
    "metadataAttributes": {
        "age": [
            "53",
            "54",
            "55",
            "56",
            "57",
            "58",
            "59",
            "60",
            "61",
            "62",
            "63"
        ],
        "desired_destination": "Amsterdam, Netherlands",
        "stay_duration": "unknown",
        "preferred_month": [
            "unknown"
        ],
        "activities_interest": [
            "museum",
            "park",
            "restaurant",
            "bar",
            "tour",
            "exhibit",
            "festival",
            "show",
            "attractions",
            "cultural attractions",
            "picnic",
            "canal cruise",
            "gardens"
        ],
        "companion": "couples",
        "travelling_with_children": "no",
        "travelling_with_pets": "no"
    }
}

# Configure Bedrock Knowledge Base

You will need a Bedrock Knowledge Base populated with documents and metadata. If you are not sure how to create one, please refer to [this documentation](https://docs.aws.amazon.com/bedrock/latest/userguide/s3-data-source-connector.html). The repo includes example documents and metadata that you can [upload](https://docs.aws.amazon.com/bedrock/latest/userguide/s3-data-source-connector.html) in the console.

# Libraries

In [1]:
!pip install langchain_aws==0.1.17 pandas==2.2.2


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.2[0m[39;49m -> [0m[32;49m24.3.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


In [1]:
import json
import copy
import logging
from operator import itemgetter
from pprint import pprint
from typing import Any, Dict, List

import boto3
from langchain_aws import AmazonKnowledgeBasesRetriever
from langchain_core.documents import Document
from langchain_core.runnables import Runnable, RunnableConfig, RunnableLambda
from langchain_core.prompts import PromptTemplate
import pandas as pd

logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

# Environment Variables

In [52]:
# Set either the first two variables (profile and region) or the last three (access keys and session token)
# %env AWS_PROFILE=
# %env AWS_DEFAULT_REGION=


# %env AWS_ACCESS_KEY_ID=
# %env AWS_SECRET_ACCESS_KEY=
# %env AWS_SESSION_TOKEN=
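
The rule in the comment above (either the first two variables or the last three) can be checked before any AWS call is made. The helper below is a hypothetical sketch that mirrors only that rule; boto3 can also resolve credentials from other sources, such as an instance role, which this check does not cover.

```python
import os
from typing import Mapping


def credentials_configured(env: Mapping[str, str]) -> bool:
    """Return True if either the profile pair or the key triple is set."""
    profile_ok = bool(env.get("AWS_PROFILE")) and bool(env.get("AWS_DEFAULT_REGION"))
    keys_ok = all(
        env.get(name)
        for name in ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_SESSION_TOKEN")
    )
    return profile_ok or keys_ok


# Check the current process environment
print(credentials_configured(os.environ))
```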

In [None]:
# Test AWS connection
# Create a session using your AWS credentials
session = boto3.Session()

# Create an STS client
sts_client = session.client('sts')

# Get the caller identity
response = sts_client.get_caller_identity()

# Print the response
pprint(response)

knowledge_base_id = 'XXXXXXXX'

retrieval_config = {
    "vectorSearchConfiguration": {
        "numberOfResults": 4,
        "overrideSearchType": "HYBRID"
    }
}

In [None]:
# Test bedrock knowledge bases connection
client = boto3.client('bedrock-agent-runtime')
# print(client.list_agent_knowledge_bases())

response = client.retrieve(
    knowledgeBaseId=knowledge_base_id,
    retrievalConfiguration=retrieval_config,
    retrievalQuery={"text": "Hello world"}
)

pprint(response)

# Other Components of Chain

In [43]:
def create_query_for_retrieval() -> Runnable:
    """
    Builds the chain step that turns the user's data into the retriever query.

    The returned runnable expects a dict with a "user_data" key and
    returns that entry serialized as a JSON string.

    Returns:
        Runnable: A RunnableLambda that produces the query string.
    """
    def create_base_prompt(inputs):
        logger.info("Creating base prompt")
        user_data = inputs["user_data"]
        return json.dumps(user_data)

    return RunnableLambda(create_base_prompt)

# Dynamic Filter

## Dynamically Replace Values

In [44]:
def setup_retrieval_config(inputs):

    # need to make a copy because the filter is updated dynamically based on the user_data, so need to start from the original every time
    local_retrieval_config = copy.deepcopy(retrieval_config)

    updated_vector_search_config = replace_values(local_retrieval_config["vectorSearchConfiguration"], inputs["user_data"])
    local_retrieval_config["vectorSearchConfiguration"] = updated_vector_search_config
    print(local_retrieval_config)

    return local_retrieval_config

def replace_values(vector_search_config: Dict, user_data: Dict) -> Dict:
    # Replace the "value" fields in the filter with the matching entries from user_data.
    # Walks the filter tree recursively (depth first) to find every value field.

    # "filter" is not a required key; if it is present but empty, remove it so that no filter is applied
    if "filter" in vector_search_config and not vector_search_config["filter"]:
        del vector_search_config["filter"]

    # recursively traverse from the root
    elif 'filter' in vector_search_config:
        vector_search_config['filter'] = replace_values(vector_search_config['filter'], user_data)

    # at a node that is not the root
    else:
        for key, value in vector_search_config.items():
            if isinstance(value, dict):

                # at a leaf e.g. {"key": "age", "value": ""}}
                if 'key' in value and 'value' in value:

                    # only overwrite value['value'] that are not unknown
                    if value['key'] in user_data and not (value["value"] == "unknown" or value["value"] == ["unknown"]):

                        # primitive data type
                        if type(value["value"]) in [str, int, float, bool]:
                            value['value'] = user_data[value['key']]

                        # list data type
                        elif isinstance(value["value"], list):
                            value['value'] = [user_data[value['key']]]
                        else:
                            raise ValueError(f"Unsupported value['value'] type {type(value['value'])}")
                else:
                    vector_search_config[key] = replace_values(value, user_data)

            # recurse on each item in the list
            elif isinstance(value, list):
                vector_search_config[key] = [replace_values(item, user_data) for item in value]
            else:
                raise ValueError(f"Unsupported value type {type(value)}")

    return vector_search_config
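
To make the traversal easier to follow, here is a simplified, self-contained variant of the same idea. It is a sketch and not the notebook's `replace_values`: the name `fill_filter` is hypothetical, it returns a new filter instead of updating one in place, and it leaves out the list-value and `"unknown"` handling.

```python
from typing import Any, Dict

# Filter template with placeholder values, in the same shape used later in this notebook
FILTER_TEMPLATE = {
    "andAll": [
        {"equals": {"key": "desired_destination", "value": "<UNKNOWN>"}},
        {"equals": {"key": "travelling_with_children", "value": "<UNKNOWN>"}},
    ]
}


def fill_filter(node: Any, user_data: Dict[str, Any]) -> Any:
    """Return a copy of the filter with each leaf value taken from user_data."""
    if isinstance(node, list):
        return [fill_filter(item, user_data) for item in node]
    if isinstance(node, dict):
        if "key" in node and "value" in node:
            # Leaf: use the user's value when the attribute is known, else keep the placeholder
            return {"key": node["key"], "value": user_data.get(node["key"], node["value"])}
        return {name: fill_filter(child, user_data) for name, child in node.items()}
    return node


user = {"desired_destination": "Amsterdam, Netherlands", "travelling_with_children": "no"}
filled = fill_filter(FILTER_TEMPLATE, user)
print(filled)
```

Because the function builds new dicts and lists at every level, the template stays untouched and no `copy.deepcopy` is needed before each call.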

## Option 1: Instantiate Retriever Again with Each Invoke

In [45]:
from langchain_core.runnables import RunnableParallel

def create_retrieval_chain() -> Runnable:
    """
    Creates a retrieval chain for the retriever.

    Returns:
        Runnable: The retrieval chain.
    """
    query = create_query_for_retrieval()

    def create_retriever(inputs):
        # This wrapper is necessary because if you return a callable object, LangChain will call it immediately, which is not the desired behavior.
        # Instead, we want to call the retriever in the next step of the chain.
        retriever_wrapper = {"retriever": AmazonKnowledgeBasesRetriever(knowledge_base_id=knowledge_base_id, retrieval_config=inputs["retrieval_config"])}
        print(inputs["retrieval_config"])
        return retriever_wrapper

    # The retrieval chain has three steps: (1) create the filter from the user data, (2) create the retriever, and (3) invoke the retriever
    retrieval_chain = (
        RunnableParallel({
            "user_data": itemgetter("user_data"),
            "retrieval_config": lambda inputs: setup_retrieval_config(inputs)
        }) |
        {
            "query": query,
            "retriever": lambda inputs: create_retriever(inputs)
        } |
        RunnableLambda(lambda inputs: inputs["retriever"]["retriever"].invoke(inputs["query"]))
    )
    return retrieval_chain

retrieval_chain_1 = create_retrieval_chain()

## Option 2: Instantiate Retriever Once

In [46]:
retriever = AmazonKnowledgeBasesRetriever(
    knowledge_base_id=knowledge_base_id,
    retrieval_config=retrieval_config
)

def create_retrieval_chain() -> Runnable:
    """
    Creates a retrieval chain for the retriever.

    Returns:
        Runnable: The retrieval chain.
    """
    query = create_query_for_retrieval()

    def retrieve_and_format(inputs):
        # Call the retrieve API through the retriever's client so the per-request config is used
        results = retriever.client.retrieve(
            retrievalQuery={"text": inputs["query"]},
            knowledgeBaseId=knowledge_base_id,
            retrievalConfiguration=inputs["retrieval_config"]
        )

        # Convert each retrieval result into a LangChain Document
        documents = []
        for result in results["retrievalResults"]:
            metadata = {
                "location": result["location"],
                "source_metadata": result["metadata"],
                "score": result["score"],
            }

            document = Document(
                page_content=result["content"]["text"],
                metadata=metadata
            )
            documents.append(document)

        return documents

    # The retrieval chain has two steps: (1) build the query and the filter from the user data, and (2) retrieve and format the documents
    retrieval_chain = (
        {
            "query": query,
            "retrieval_config": lambda inputs: setup_retrieval_config(inputs)
        } |
        RunnableLambda(lambda inputs: retrieve_and_format(inputs))
    )
    return retrieval_chain

retrieval_chain_2 = create_retrieval_chain()

# Test Code

## Load User Data

In [47]:
with open("data/user_data.json", "r") as file:
    user_data = json.load(file)

print(json.dumps(user_data[:2], indent=2))

[
  {
    "trip_id": 1,
    "desired_destination": "Amsterdam, Netherlands",
    "stay_duration": 7,
    "age": 35,
    "gender": "male",
    "companion": "solo",
    "travelling_with_children": "no",
    "travelling_with_pets": "no"
  },
  {
    "trip_id": 2,
    "desired_destination": "Paris, France",
    "stay_duration": 5,
    "age": 28,
    "gender": "female",
    "companion": "solo",
    "travelling_with_children": "no",
    "travelling_with_pets": "yes"
  }
]


## Run Filters

In [48]:
filters_to_test: List = [
    {
        "andAll": [
            {
                "equals": {
                    "key": "desired_destination",
                    "value": "<UNKNOWN>"  # This will be overwritten with appropriate values at runtime
                }
            },
            {
                "equals": {
                    "key": "travelling_with_children",
                    "value": "<UNKNOWN>"  # This will be overwritten with appropriate values at runtime
                }
            }
        ]
    },
    None
]
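
As an alternative to filling in a template, a filter can be built directly from the user data. The sketch below is hypothetical and not part of the notebook's chains. It assumes the `listContains` operator is appropriate for the `age` attribute, because the sample metadata stores ages as a list of strings, and it assumes `andAll` needs at least two conditions; verify both against the Bedrock Retrieve API reference.

```python
from typing import Any, Dict, List, Optional


def build_filter(user: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Build a metadata filter from the attributes present in the user data."""
    conditions: List[Dict[str, Any]] = []
    if "desired_destination" in user:
        conditions.append(
            {"equals": {"key": "desired_destination", "value": user["desired_destination"]}}
        )
    if "age" in user:
        # User data holds the age as a number; the sample metadata holds strings, so cast it
        conditions.append({"listContains": {"key": "age", "value": str(user["age"])}})
    if not conditions:
        return None  # no filter at all
    if len(conditions) == 1:
        return conditions[0]  # assumed: a single condition is passed without andAll
    return {"andAll": conditions}


print(build_filter({"desired_destination": "Paris, France", "age": 28}))
```

Building the filter this way avoids leaving placeholder values in the request when an attribute is missing from the user data.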


In [49]:
retrieval_chains = [retrieval_chain_1, retrieval_chain_2]

results = []

for retrieval_chain_id, retrieval_chain in enumerate(retrieval_chains):
    logger.info(retrieval_chain_id)
    # Loop through each filter option
    for metadata_filter in filters_to_test:
        # Both chains read the module-level retrieval_config, so set the filter template there
        retrieval_config["vectorSearchConfiguration"]["filter"] = metadata_filter
        # Loop through each user data entry
        for user_entry in user_data:
            inputs = {
                "user_data": user_entry,
                "retrieval_config": retrieval_config
            }

            # Run the retrieval chain with the current user entry
            try:
                result = retrieval_chain.invoke(inputs)
                # print(f"Result for user entry {user_entry['trip_id']}: {result}")
                results.append({'retrieval_chain_id': retrieval_chain_id, 'filter': bool(metadata_filter), 'user': user_entry, 'documents': result})

            except Exception as e:
                print(f"Error during retrieval for user entry {user_entry['trip_id']}: {e}")



2024-11-07 15:40:46,911 - INFO - 0
2024-11-07 15:40:46,926 - INFO - Creating base prompt


{'vectorSearchConfiguration': {'numberOfResults': 4, 'overrideSearchType': 'HYBRID', 'filter': {'andAll': [{'equals': {'key': 'desired_destination', 'value': 'Amsterdam, Netherlands'}}, {'equals': {'key': 'travelling_with_children', 'value': 'no'}}]}}}
{'vectorSearchConfiguration': {'numberOfResults': 4, 'overrideSearchType': 'HYBRID', 'filter': {'andAll': [{'equals': {'key': 'desired_destination', 'value': 'Amsterdam, Netherlands'}}, {'equals': {'key': 'travelling_with_children', 'value': 'no'}}]}}}


2024-11-07 15:40:47,692 - INFO - Creating base prompt


{'vectorSearchConfiguration': {'numberOfResults': 4, 'overrideSearchType': 'HYBRID', 'filter': {'andAll': [{'equals': {'key': 'desired_destination', 'value': 'Paris, France'}}, {'equals': {'key': 'travelling_with_children', 'value': 'no'}}]}}}
{'vectorSearchConfiguration': {'numberOfResults': 4, 'overrideSearchType': 'HYBRID', 'filter': {'andAll': [{'equals': {'key': 'desired_destination', 'value': 'Paris, France'}}, {'equals': {'key': 'travelling_with_children', 'value': 'no'}}]}}}


2024-11-07 15:40:48,676 - INFO - Creating base prompt


{'vectorSearchConfiguration': {'numberOfResults': 4, 'overrideSearchType': 'HYBRID', 'filter': {'andAll': [{'equals': {'key': 'desired_destination', 'value': 'Prague, Czech Republic'}}, {'equals': {'key': 'travelling_with_children', 'value': 'no'}}]}}}
{'vectorSearchConfiguration': {'numberOfResults': 4, 'overrideSearchType': 'HYBRID', 'filter': {'andAll': [{'equals': {'key': 'desired_destination', 'value': 'Prague, Czech Republic'}}, {'equals': {'key': 'travelling_with_children', 'value': 'no'}}]}}}


2024-11-07 15:40:49,394 - INFO - Creating base prompt


{'vectorSearchConfiguration': {'numberOfResults': 4, 'overrideSearchType': 'HYBRID', 'filter': {'andAll': [{'equals': {'key': 'desired_destination', 'value': 'Vienna, Austria'}}, {'equals': {'key': 'travelling_with_children', 'value': 'no'}}]}}}
{'vectorSearchConfiguration': {'numberOfResults': 4, 'overrideSearchType': 'HYBRID', 'filter': {'andAll': [{'equals': {'key': 'desired_destination', 'value': 'Vienna, Austria'}}, {'equals': {'key': 'travelling_with_children', 'value': 'no'}}]}}}


2024-11-07 15:40:50,134 - INFO - Creating base prompt


{'vectorSearchConfiguration': {'numberOfResults': 4, 'overrideSearchType': 'HYBRID', 'filter': {'andAll': [{'equals': {'key': 'desired_destination', 'value': 'Tokyo, Japan'}}, {'equals': {'key': 'travelling_with_children', 'value': 'yes'}}]}}}
{'vectorSearchConfiguration': {'numberOfResults': 4, 'overrideSearchType': 'HYBRID', 'filter': {'andAll': [{'equals': {'key': 'desired_destination', 'value': 'Tokyo, Japan'}}, {'equals': {'key': 'travelling_with_children', 'value': 'yes'}}]}}}


2024-11-07 15:40:50,869 - INFO - Creating base prompt


{'vectorSearchConfiguration': {'numberOfResults': 4, 'overrideSearchType': 'HYBRID', 'filter': {'andAll': [{'equals': {'key': 'desired_destination', 'value': 'Buenos Aires, Argentina'}}, {'equals': {'key': 'travelling_with_children', 'value': 'no'}}]}}}
{'vectorSearchConfiguration': {'numberOfResults': 4, 'overrideSearchType': 'HYBRID', 'filter': {'andAll': [{'equals': {'key': 'desired_destination', 'value': 'Buenos Aires, Argentina'}}, {'equals': {'key': 'travelling_with_children', 'value': 'no'}}]}}}


2024-11-07 15:40:51,625 - INFO - Creating base prompt


{'vectorSearchConfiguration': {'numberOfResults': 4, 'overrideSearchType': 'HYBRID', 'filter': {'andAll': [{'equals': {'key': 'desired_destination', 'value': 'Lisbon, Portugal'}}, {'equals': {'key': 'travelling_with_children', 'value': 'yes'}}]}}}
{'vectorSearchConfiguration': {'numberOfResults': 4, 'overrideSearchType': 'HYBRID', 'filter': {'andAll': [{'equals': {'key': 'desired_destination', 'value': 'Lisbon, Portugal'}}, {'equals': {'key': 'travelling_with_children', 'value': 'yes'}}]}}}


2024-11-07 15:40:52,359 - INFO - Creating base prompt


{'vectorSearchConfiguration': {'numberOfResults': 4, 'overrideSearchType': 'HYBRID', 'filter': {'andAll': [{'equals': {'key': 'desired_destination', 'value': 'Istanbul, Turkey'}}, {'equals': {'key': 'travelling_with_children', 'value': 'no'}}]}}}
{'vectorSearchConfiguration': {'numberOfResults': 4, 'overrideSearchType': 'HYBRID', 'filter': {'andAll': [{'equals': {'key': 'desired_destination', 'value': 'Istanbul, Turkey'}}, {'equals': {'key': 'travelling_with_children', 'value': 'no'}}]}}}


2024-11-07 15:40:53,115 - INFO - Creating base prompt


{'vectorSearchConfiguration': {'numberOfResults': 4, 'overrideSearchType': 'HYBRID', 'filter': {'andAll': [{'equals': {'key': 'desired_destination', 'value': 'Barcelona, Spain'}}, {'equals': {'key': 'travelling_with_children', 'value': 'no'}}]}}}
{'vectorSearchConfiguration': {'numberOfResults': 4, 'overrideSearchType': 'HYBRID', 'filter': {'andAll': [{'equals': {'key': 'desired_destination', 'value': 'Barcelona, Spain'}}, {'equals': {'key': 'travelling_with_children', 'value': 'no'}}]}}}


2024-11-07 15:40:53,848 - INFO - Creating base prompt


{'vectorSearchConfiguration': {'numberOfResults': 4, 'overrideSearchType': 'HYBRID', 'filter': {'andAll': [{'equals': {'key': 'desired_destination', 'value': 'Rome, Italy'}}, {'equals': {'key': 'travelling_with_children', 'value': 'yes'}}]}}}
{'vectorSearchConfiguration': {'numberOfResults': 4, 'overrideSearchType': 'HYBRID', 'filter': {'andAll': [{'equals': {'key': 'desired_destination', 'value': 'Rome, Italy'}}, {'equals': {'key': 'travelling_with_children', 'value': 'yes'}}]}}}


2024-11-07 15:40:54,600 - INFO - Creating base prompt


{'vectorSearchConfiguration': {'numberOfResults': 4, 'overrideSearchType': 'HYBRID'}}
{'vectorSearchConfiguration': {'numberOfResults': 4, 'overrideSearchType': 'HYBRID'}}


2024-11-07 15:40:55,336 - INFO - Creating base prompt


{'vectorSearchConfiguration': {'numberOfResults': 4, 'overrideSearchType': 'HYBRID'}}
{'vectorSearchConfiguration': {'numberOfResults': 4, 'overrideSearchType': 'HYBRID'}}


2024-11-07 15:40:56,097 - INFO - Creating base prompt


{'vectorSearchConfiguration': {'numberOfResults': 4, 'overrideSearchType': 'HYBRID'}}
{'vectorSearchConfiguration': {'numberOfResults': 4, 'overrideSearchType': 'HYBRID'}}


2024-11-07 15:40:56,807 - INFO - Creating base prompt


{'vectorSearchConfiguration': {'numberOfResults': 4, 'overrideSearchType': 'HYBRID'}}
{'vectorSearchConfiguration': {'numberOfResults': 4, 'overrideSearchType': 'HYBRID'}}


2024-11-07 15:40:57,555 - INFO - Creating base prompt


{'vectorSearchConfiguration': {'numberOfResults': 4, 'overrideSearchType': 'HYBRID'}}
{'vectorSearchConfiguration': {'numberOfResults': 4, 'overrideSearchType': 'HYBRID'}}


2024-11-07 15:40:58,294 - INFO - Creating base prompt


{'vectorSearchConfiguration': {'numberOfResults': 4, 'overrideSearchType': 'HYBRID'}}
{'vectorSearchConfiguration': {'numberOfResults': 4, 'overrideSearchType': 'HYBRID'}}


2024-11-07 15:40:59,028 - INFO - Creating base prompt


{'vectorSearchConfiguration': {'numberOfResults': 4, 'overrideSearchType': 'HYBRID'}}
{'vectorSearchConfiguration': {'numberOfResults': 4, 'overrideSearchType': 'HYBRID'}}


2024-11-07 15:40:59,747 - INFO - Creating base prompt


{'vectorSearchConfiguration': {'numberOfResults': 4, 'overrideSearchType': 'HYBRID'}}
{'vectorSearchConfiguration': {'numberOfResults': 4, 'overrideSearchType': 'HYBRID'}}


2024-11-07 15:41:00,462 - INFO - Creating base prompt


{'vectorSearchConfiguration': {'numberOfResults': 4, 'overrideSearchType': 'HYBRID'}}
{'vectorSearchConfiguration': {'numberOfResults': 4, 'overrideSearchType': 'HYBRID'}}


2024-11-07 15:41:01,199 - INFO - Creating base prompt


{'vectorSearchConfiguration': {'numberOfResults': 4, 'overrideSearchType': 'HYBRID'}}
{'vectorSearchConfiguration': {'numberOfResults': 4, 'overrideSearchType': 'HYBRID'}}


2024-11-07 15:41:01,982 - INFO - 1
2024-11-07 15:41:01,989 - INFO - Creating base prompt


{'vectorSearchConfiguration': {'numberOfResults': 4, 'overrideSearchType': 'HYBRID', 'filter': {'andAll': [{'equals': {'key': 'desired_destination', 'value': 'Amsterdam, Netherlands'}}, {'equals': {'key': 'travelling_with_children', 'value': 'no'}}]}}}


2024-11-07 15:41:02,338 - INFO - Creating base prompt


{'vectorSearchConfiguration': {'numberOfResults': 4, 'overrideSearchType': 'HYBRID', 'filter': {'andAll': [{'equals': {'key': 'desired_destination', 'value': 'Paris, France'}}, {'equals': {'key': 'travelling_with_children', 'value': 'no'}}]}}}


2024-11-07 15:41:02,586 - INFO - Creating base prompt


{'vectorSearchConfiguration': {'numberOfResults': 4, 'overrideSearchType': 'HYBRID', 'filter': {'andAll': [{'equals': {'key': 'desired_destination', 'value': 'Prague, Czech Republic'}}, {'equals': {'key': 'travelling_with_children', 'value': 'no'}}]}}}


2024-11-07 15:41:02,789 - INFO - Creating base prompt


{'vectorSearchConfiguration': {'numberOfResults': 4, 'overrideSearchType': 'HYBRID', 'filter': {'andAll': [{'equals': {'key': 'desired_destination', 'value': 'Vienna, Austria'}}, {'equals': {'key': 'travelling_with_children', 'value': 'no'}}]}}}


2024-11-07 15:41:03,030 - INFO - Creating base prompt


{'vectorSearchConfiguration': {'numberOfResults': 4, 'overrideSearchType': 'HYBRID', 'filter': {'andAll': [{'equals': {'key': 'desired_destination', 'value': 'Tokyo, Japan'}}, {'equals': {'key': 'travelling_with_children', 'value': 'yes'}}]}}}


2024-11-07 15:41:03,280 - INFO - Creating base prompt


{'vectorSearchConfiguration': {'numberOfResults': 4, 'overrideSearchType': 'HYBRID', 'filter': {'andAll': [{'equals': {'key': 'desired_destination', 'value': 'Buenos Aires, Argentina'}}, {'equals': {'key': 'travelling_with_children', 'value': 'no'}}]}}}


2024-11-07 15:41:03,537 - INFO - Creating base prompt


{'vectorSearchConfiguration': {'numberOfResults': 4, 'overrideSearchType': 'HYBRID', 'filter': {'andAll': [{'equals': {'key': 'desired_destination', 'value': 'Lisbon, Portugal'}}, {'equals': {'key': 'travelling_with_children', 'value': 'yes'}}]}}}


2024-11-07 15:41:03,770 - INFO - Creating base prompt


{'vectorSearchConfiguration': {'numberOfResults': 4, 'overrideSearchType': 'HYBRID', 'filter': {'andAll': [{'equals': {'key': 'desired_destination', 'value': 'Istanbul, Turkey'}}, {'equals': {'key': 'travelling_with_children', 'value': 'no'}}]}}}


2024-11-07 15:41:04,059 - INFO - Creating base prompt


{'vectorSearchConfiguration': {'numberOfResults': 4, 'overrideSearchType': 'HYBRID', 'filter': {'andAll': [{'equals': {'key': 'desired_destination', 'value': 'Barcelona, Spain'}}, {'equals': {'key': 'travelling_with_children', 'value': 'no'}}]}}}


2024-11-07 15:41:04,310 - INFO - Creating base prompt


{'vectorSearchConfiguration': {'numberOfResults': 4, 'overrideSearchType': 'HYBRID', 'filter': {'andAll': [{'equals': {'key': 'desired_destination', 'value': 'Rome, Italy'}}, {'equals': {'key': 'travelling_with_children', 'value': 'yes'}}]}}}


2024-11-07 15:41:04,552 - INFO - Creating base prompt


{'vectorSearchConfiguration': {'numberOfResults': 4, 'overrideSearchType': 'HYBRID'}}


2024-11-07 15:41:04,816 - INFO - Creating base prompt


{'vectorSearchConfiguration': {'numberOfResults': 4, 'overrideSearchType': 'HYBRID'}}


2024-11-07 15:41:05,102 - INFO - Creating base prompt


{'vectorSearchConfiguration': {'numberOfResults': 4, 'overrideSearchType': 'HYBRID'}}


2024-11-07 15:41:05,357 - INFO - Creating base prompt


{'vectorSearchConfiguration': {'numberOfResults': 4, 'overrideSearchType': 'HYBRID'}}


2024-11-07 15:41:05,584 - INFO - Creating base prompt


{'vectorSearchConfiguration': {'numberOfResults': 4, 'overrideSearchType': 'HYBRID'}}


2024-11-07 15:41:05,831 - INFO - Creating base prompt


{'vectorSearchConfiguration': {'numberOfResults': 4, 'overrideSearchType': 'HYBRID'}}


2024-11-07 15:41:06,080 - INFO - Creating base prompt


{'vectorSearchConfiguration': {'numberOfResults': 4, 'overrideSearchType': 'HYBRID'}}


2024-11-07 15:41:06,322 - INFO - Creating base prompt


{'vectorSearchConfiguration': {'numberOfResults': 4, 'overrideSearchType': 'HYBRID'}}


2024-11-07 15:41:06,565 - INFO - Creating base prompt


{'vectorSearchConfiguration': {'numberOfResults': 4, 'overrideSearchType': 'HYBRID'}}


2024-11-07 15:41:06,832 - INFO - Creating base prompt


{'vectorSearchConfiguration': {'numberOfResults': 4, 'overrideSearchType': 'HYBRID'}}


# Summary of Results

In [50]:
# Initialize lists to store formatted data
retrieval_approaches = []
filters = []
trip_ids = []
destinations = []
page_contents = []
metadata = []

# Process each entry
for result in results:
    retrieval_approach = f'Option_{result["retrieval_chain_id"]}'
    filter_applied = result["filter"]
    trip_id = result["user"]["trip_id"]
    destination = result["user"]["desired_destination"]

    # Add one row per retrieved document
    for document in result['documents']:
        retrieval_approaches.append(retrieval_approach)
        filters.append(filter_applied)
        trip_ids.append(trip_id)
        destinations.append(destination)
        page_contents.append(document.page_content)
        metadata.append(document.metadata)


# Create a DataFrame
df = pd.DataFrame({
    'Retrieval Approach': retrieval_approaches,
    'Filter': filters,
    'Trip ID': trip_ids,
    'Destination': destinations,
    'Page Content': page_contents,
    'Metadata': metadata
})

# Save the results to a CSV file
df.to_csv('data/output.csv', index=False)
output_df = pd.read_csv("data/output.csv")
print(output_df.shape)
output_df.head()

(112, 6)


| | Retrieval Approach | Filter | Trip ID | Destination | Page Content | Metadata |
|---|---|---|---|---|---|---|
| 0 | Option_0 | True | 2 | Paris, France | As a 70-year-old retiree, I recently had the p... | {'location': {'s3Location': {'uri': 's3://know... |
| 1 | Option_0 | True | 2 | Paris, France | I am 45 years old and I am a pet owner. I rece... | {'location': {'s3Location': {'uri': 's3://know... |
| 2 | Option_0 | True | 2 | Paris, France | As a 35-year-old traveling with my two dogs, I... | {'location': {'s3Location': {'uri': 's3://know... |
| 3 | Option_0 | True | 2 | Paris, France | If you are looking for something a little more... | {'location': {'s3Location': {'uri': 's3://know... |
| 4 | Option_0 | True | 5 | Tokyo, Japan | As a 35-year-old parent of two young children,... | {'location': {'s3Location': {'uri': 's3://know... |


In [51]:
# In the results, the documents retrieved by the first chain (Option_0) are identical to those retrieved by the second chain (Option_1).
# In addition, when metadata filters are not used, the retrieved documents are occasionally for the wrong location.
# For example, Trip ID 9 is to Barcelona, but without a filter the retriever also pulls documents about Lisbon.

output_df.query('`Trip ID` == 9')

| | Retrieval Approach | Filter | Trip ID | Destination | Page Content | Metadata |
|---|---|---|---|---|---|---|
| 8 | Option_0 | True | 9 | Barcelona, Spain | In conclusion, Barcelona is a city that has so... | {'location': {'s3Location': {'uri': 's3://know... |
| 9 | Option_0 | True | 9 | Barcelona, Spain | As a 65-year-old retiree, I recently visited B... | {'location': {'s3Location': {'uri': 's3://know... |
| 10 | Option_0 | True | 9 | Barcelona, Spain | As a 30-year-old pet owner, I recently visited... | {'location': {'s3Location': {'uri': 's3://know... |
| 11 | Option_0 | True | 9 | Barcelona, Spain | As a couple in our early thirties, we recently... | {'location': {'s3Location': {'uri': 's3://know... |
| 48 | Option_0 | False | 9 | Barcelona, Spain | { "metadataAttributes": { "age": [ ... | {'location': {'s3Location': {'uri': 's3://know... |
| 49 | Option_0 | False | 9 | Barcelona, Spain | As a 35-year-old mother of two young children,... | {'location': {'s3Location': {'uri': 's3://know... |
| 50 | Option_0 | False | 9 | Barcelona, Spain | As a 65-year-old retiree, I recently visited B... | {'location': {'s3Location': {'uri': 's3://know... |
| 51 | Option_0 | False | 9 | Barcelona, Spain | As a 25-year-old solo traveler, I recently vis... | {'location': {'s3Location': {'uri': 's3://know... |
| 64 | Option_1 | True | 9 | Barcelona, Spain | In conclusion, Barcelona is a city that has so... | {'location': {'s3Location': {'uri': 's3://know... |
| 65 | Option_1 | True | 9 | Barcelona, Spain | As a 65-year-old retiree, I recently visited B... | {'location': {'s3Location': {'uri': 's3://know... |
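
Wrong-location retrievals can also be flagged programmatically instead of by reading the table. The sketch below is hypothetical: it models documents as plain dicts rather than LangChain `Document` objects, assumes each document's `source_metadata` carries the `desired_destination` attribute from its metadata file, and runs on made-up records rather than output from this notebook.

```python
from typing import Any, Dict, List


def off_destination(results: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """List retrieved documents whose destination differs from the user's."""
    mismatches = []
    for result in results:
        wanted = result["user"]["desired_destination"]
        for document in result["documents"]:
            # Assumed location of the attribute; documents without it are skipped
            found = document["metadata"].get("source_metadata", {}).get("desired_destination")
            if found is not None and found != wanted:
                mismatches.append({
                    "trip_id": result["user"]["trip_id"],
                    "filter": result["filter"],
                    "wanted": wanted,
                    "found": found,
                })
    return mismatches


# Made-up records for illustration only
sample = [{
    "retrieval_chain_id": 0,
    "filter": False,
    "user": {"trip_id": 9, "desired_destination": "Barcelona, Spain"},
    "documents": [
        {"metadata": {"source_metadata": {"desired_destination": "Barcelona, Spain"}}},
        {"metadata": {"source_metadata": {"desired_destination": "Lisbon, Portugal"}}},
    ],
}]
print(off_destination(sample))
```

To apply the same check to the `results` list built earlier, read `document.metadata` instead of `document["metadata"]`.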


# Cleanup

To avoid incurring additional charges, be sure to delete your [Bedrock Knowledge Base](https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-manage.html#kb-delete) and the underlying [S3 bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/delete-bucket.html).