## Dynamic Metadata Filtering for Knowledge Bases

> Source: the complete [aws-samples notebook](https://github.com/aws-samples/rag-workshop-amazon-bedrock-knowledge-bases/blob/main/03-advanced-concepts/dynamic-metadata-filtering/dynamic-metadata-filtering-KB.ipynb)

```python
import json
import boto3
```

```python
# Initialize the session and the Bedrock clients (both in the session's region)
session = boto3.session.Session()
region = session.region_name
bedrock = session.client("bedrock-runtime", region_name=region)
bedrock_agent_runtime = session.client("bedrock-agent-runtime", region_name=region)

MODEL_ID = "anthropic.claude-3-haiku-20240307-v1:0"
```

## Implement Entity Extraction using Tool Use
We'll define an entity-extraction tool with minimal instructions and call it through the Amazon Bedrock Converse API. First, the list of official product subcategories the model may choose from:

```python
subcategory_filters = [
    "3-D Puzzles", "Accessories", "Action Man", "Activity Centres", "Alternative Medicine",
    "Art & Craft Supplies", "Art Sand", "BRIO", "Banners, Stickers & Confetti", "Barbie",
    "Baskets & Bins", "Beach Toys", "Bikes, Trikes & Ride-ons", "Blackboards", "Board Games",
    "Bob the Builder", "Boxes & Organisers", "Braces, Splints & Slings", "Brain Teasers", "Card Games",
    "Casino Equipment", "Charms", "Chess", "Children's Bedding", "Children's Chalk",
    "Children's Craft Kits", "Chocolate", "Climbing Frames", "Clothing & Accessories", "Collectible Figures & Memorabilia",
    "Colouring Pencils", "Colouring Pens & Markers", "Costumes", "Cowboys & Indians", "Crayola",
    "Cup & Ball Games", "DVD Games", "Darts & Accessories", "Decorations", "Decorative Accessories",
    "Desk Accessories & Storage Products", "Dice & Dice Games", "Digital Cameras", "Dinosaurs", "Disney",
    "Doll Making", "Dolls' House Dolls & Accessories", "Dominoes & Tile Games", "Drawing & Painting Supplies", "Drinking Games",
    "Early Learning Centre", "Educational Computers & Accessories", "Educational Games", "Emergency Services", "Erasers & Correction Supplies",
    "Erotic Clothing", "Farm & Animals", "Fashion Dolls & Accessories", "Felt Kits", "Finger Puppets",
    "Football", "Frame Jigsaws", "Garden Tools", "Greenhouses & Plant Germination Equipment", "Guitars & Strings",
    "Hand Puppets", "Hand Tools", "Harry Potter", "Hasbro", "Hornby",
    "Instruments", "Invitations", "Jigsaw Accessories", "Jigsaws", "Kid Venture",
    "Kids Remote & App Controlled Toys", "Kids'", "Kitchen Tools & Gadgets", "Kites & Flight Toys", "Knights & Castles",
    "Lab Instruments & Equipment", "Labels, Index Dividers & Stamps", "LeapFrog", "Learning & Activity Toys", "Literacy & Spelling",
    "Markers & Highlighters", "Marvin's Magic", "Mathematics", "Military", "Model Building Kits",
    "Model Trains & Railway Sets", "Mystery Games", "Novelty", "Pain & Fever", "Painting By Numbers",
    "Paper & Stickers", "Party Bags", "Party Favours", "Party Tableware", "Pencils",
    "Pens & Refills", "Pianos & Keyboards", "Pirates", "Play Tools", "Playsets",
    "Pushchair Toys", "Racket Games", "Rattles", "Ravensburger", "Remote Controlled Devices",
    "Robots", "Rockers & Ride-ons", "Rocking Horses", "Sandwich Spreads, Pates & Pastes", "Schoolbags & Backpacks",
    "Science Fiction & Fantasy", "Seasonal Décor", "Shops & Accessories", "Sleeping Gear", "Slot Cars, Race Tracks & Accessories",
    "Soft Dolls", "Sorting, Stacking & Plugging Toys", "Sound Toys", "Specialty & Decorative Lighting", "Spinning Tops",
    "Sport", "Star Wars", "Strategy Games", "Tabletop & Miniature Gaming", "Target Games",
    "Teaching Clocks", "Thomas & Friends", "Thunderbirds", "Tomy", "Tops & T-Shirts",
    "Toy Story", "Toy Trains & Accessories", "Toy Vehicle Playsets", "Toy Vehicles & Accessories", "Trading Cards & Accessories",
    "Transportation & Traffic", "Travel & Pocket Games", "Trivia & Quiz Games", "Upstarts", "VTech",
    "WWE", "Wind & Brass", "Winnie-the-Pooh", "others"
]
```

## Tool Definition

```python
# In production, load the categories from the product database (e.g., refreshed once a day)
product_list = ", ".join(subcategory_filters)

# Tool spec inputs
tool_name = "get_category"
tool_description = (
    "Extract the single main product category from the text, using only the following "
    f"official categories: {product_list}. "
    "If you cannot find a matching category, use 'others'. "
    "If the question is not related to an ecommerce store, use 'unknown'."
)

# What we need to extract
tool_get_category = ["Product_Category"]   # required keys of each entity object
tool_extract_property = ["entities"]       # required top-level key of the tool input
tool_entity_description = {
    "Product_Category": {
        "type": "string",
        "description": "The product category of the product, taken from the official categories."
    }
}

# All tool properties
tool_properties = {
    "tool_name": tool_name,
    "tool_description": tool_description,
    "tool_get_category": tool_get_category,
    "tool_extract_property": tool_extract_property,
    "tool_entity_description": tool_entity_description,
}
```

```python
# Converse API tool specification: the model returns {"entities": [{"Product_Category": ...}]}
toolSpec = [{
    "toolSpec": {
        "name": tool_properties["tool_name"],
        "description": tool_properties["tool_description"],
        "inputSchema": {
            "json": {
                "type": "object",
                "properties": {
                    "entities": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": tool_properties["tool_entity_description"],
                            "required": tool_properties["tool_get_category"],
                        },
                    }
                },
                "required": tool_properties["tool_extract_property"],
            }
        },
    }
}]
```
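The schema above types `Product_Category` as a free-form string, so nothing in the schema itself stops the model from returning a value outside the official list. One option, not used in the original notebook, is to add a JSON Schema `enum` listing the allowed values. The helper `build_category_tool_spec` below is hypothetical, and whether a given model strictly honors `enum` is model-dependent, so treat it as guidance rather than a guarantee and keep validating the value downstream:

```python
def build_category_tool_spec(categories, tool_name="get_category"):
    """Hypothetical helper: a Converse tool spec whose Product_Category is restricted by an enum."""
    # 'unknown' is also allowed by the tool description; dict.fromkeys drops duplicates, keeps order
    allowed = list(dict.fromkeys([*categories, "unknown"]))
    return [{
        "toolSpec": {
            "name": tool_name,
            "description": "Extract the single main product category from the text.",
            "inputSchema": {
                "json": {
                    "type": "object",
                    "properties": {
                        "entities": {
                            "type": "array",
                            "items": {
                                "type": "object",
                                "properties": {
                                    "Product_Category": {
                                        "type": "string",
                                        "enum": allowed,  # the only values the schema accepts
                                        "description": "One of the official categories, or 'unknown'.",
                                    }
                                },
                                "required": ["Product_Category"],
                            },
                        }
                    },
                    "required": ["entities"],
                }
            },
        }
    }]


spec = build_category_tool_spec(["Chess", "Jigsaws", "others"])
items = spec[0]["toolSpec"]["inputSchema"]["json"]["properties"]["entities"]["items"]
print(items["properties"]["Product_Category"]["enum"])  # ['Chess', 'Jigsaws', 'others', 'unknown']
```

In the notebook you would pass `build_category_tool_spec(subcategory_filters)` where `toolSpec` is used today.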

```python
def get_category(text, tools):
    """
    Extract the product category from the given text using the get_category tool.

    Args:
        text (str): The input text to be processed.
        tools (list): The Converse API tool specifications (for example, toolSpec).

    Returns:
        dict or None: The tool input produced by the model, e.g.
        {"entities": [{"Product_Category": "..."}]}, or None if the tool was not called.
    """

    # Call the Bedrock model through the Converse API, with the tool available
    response = bedrock.converse(
        modelId=MODEL_ID,
        inferenceConfig={
            "temperature": 0,
            "maxTokens": 4096
        },
        toolConfig={"tools": tools},
        messages=[{"role": "user", "content": [{"text": text}]}]
    )

    # Keep the input of the get_category tool call, if the model made one
    tool_input = None
    for content in response["output"]["message"]["content"]:
        if "toolUse" in content and content["toolUse"]["name"] == "get_category":
            tool_input = content["toolUse"]["input"]
            break

    # Return the tool input if found; otherwise print a message and return None
    if tool_input:
        return tool_input
    print("No entities found in the response.")
    return None
```
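`get_category` leaves the decision to call the tool to the model (its `toolConfig` has no `toolChoice`), so the model may answer in plain text and the function then returns `None`. The Converse API accepts a `toolChoice` field, and `{"tool": {"name": ...}}` forces a call to that specific tool; Bedrock documents this option as supported only by some models (Anthropic Claude 3 among them), so check support for your model. The sketch below is a hypothetical variant that takes the client as a parameter so it can be exercised offline with a hypothetical stub; in the notebook you would pass `bedrock` and `MODEL_ID`:

```python
def get_category_forced(client, model_id, text, tools, tool_name="get_category"):
    """Hypothetical variant of get_category that forces the tool call via toolChoice."""
    response = client.converse(
        modelId=model_id,
        inferenceConfig={"temperature": 0, "maxTokens": 4096},
        # Force a call to this tool instead of letting the model reply in plain text
        toolConfig={"tools": tools, "toolChoice": {"tool": {"name": tool_name}}},
        messages=[{"role": "user", "content": [{"text": text}]}],
    )
    # Return the input of the first matching tool call, if any
    for block in response["output"]["message"]["content"]:
        tool_use = block.get("toolUse")
        if tool_use and tool_use["name"] == tool_name:
            return tool_use["input"]
    return None


class StubBedrockClient:
    """Hypothetical stand-in for the bedrock-runtime client, returning a canned tool call."""

    def __init__(self, tool_input):
        self.tool_input = tool_input
        self.last_request = None

    def converse(self, **kwargs):
        self.last_request = kwargs  # remember the request for inspection
        return {
            "output": {"message": {"role": "assistant", "content": [
                {"toolUse": {"toolUseId": "example-id", "name": "get_category",
                             "input": self.tool_input}}
            ]}},
            "stopReason": "tool_use",
        }


stub = StubBedrockClient({"entities": [{"Product_Category": "Jigsaws"}]})
result = get_category_forced(stub, "model-id", "I need a 1000-piece puzzle", tools=[])
print(result)  # {'entities': [{'Product_Category': 'Jigsaws'}]}
```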

## Construct Metadata Filter
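Now, let's create a function that builds the metadata filter from the extracted entities. `get_category` returns the tool input, e.g. `{"entities": [{"Product_Category": "..."}]}`, so the function first pulls out the category string: the filter `value` must be the category itself, not the whole dictionary. A single condition is returned on its own, because the `andAll` operator requires at least two conditions:

```python
def construct_metadata_filter(extracted_entities):
    """Build a Knowledge Base retrieval filter from the get_category tool output (or None)."""
    # The tool input looks like {"entities": [{"Product_Category": "..."}]}; take the first category
    entities = (extracted_entities or {}).get("entities") or []
    product_category = entities[0].get("Product_Category") if entities else None

    conditions = []
    if product_category and product_category != "unknown":
        conditions.append({
            "equals": {
                "key": "subcategory_1",
                "value": product_category
            }
        })
    else:
        print("Product category is unknown. Skipping metadata filter.")

    if not conditions:
        return None
    # andAll needs at least two conditions, so a single condition is returned on its own
    return conditions[0] if len(conditions) == 1 else {"andAll": conditions}
```

## Example

```python
# Sample questions; uncomment the one you want to try
# user_question = "I'm looking for sand to do DIY activities with my son"
# user_question = "Barbie for Ken with dreamy house"
user_question = "Tienes el DVD de Duro de Matar?"  # Spanish: "Do you have the Die Hard DVD?"
# user_question = "Tienes tijeras para cortar el césped?"  # Spanish: "Do you have shears to cut the grass?"
```

```python
extracted_entities = get_category(user_question, toolSpec)
metadata_filter = construct_metadata_filter(extracted_entities)
print("Here is the prepared metadata filter:")
print(json.dumps(metadata_filter, indent=4))
```

```
Here is the prepared metadata filter:
{
    "equals": {
        "key": "subcategory_1",
        "value": "DVD"
    }
}
```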
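The model can return a value outside the official list: in the run above it produced "DVD", while the list only contains "DVD Games". A filter on such a value is unlikely to match any document. A minimal guard, our own addition rather than part of the original notebook, is to check the extracted value against the official list before it goes into the filter and to skip filtering when it does not match. The helper `validate_category` is hypothetical:

```python
def validate_category(category, official_categories):
    """Return the official spelling of `category`, or None if it is not an official category."""
    if not category:
        return None
    # Case-insensitive lookup that maps back to the official spelling
    lookup = {c.casefold(): c for c in official_categories}
    return lookup.get(category.strip().casefold())


official = ["DVD Games", "Board Games", "others"]
print(validate_category("DVD", official))          # None
print(validate_category("board games", official))  # Board Games
```

Returning `None` lets the caller fall back to an unfiltered search, the same way the 'unknown' case is handled. In the notebook you would pass `subcategory_filters` as the official list.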


---
### Finally, we can call the RetrieveAndGenerate API with the new filter:
> e.g.
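```python
def process_query(text, tools, kb_id, model_arn="eu.claude..."):
    # Extract the product category and build the metadata filter for this query
    extracted_entities = get_category(text, tools)
    metadata_filter = construct_metadata_filter(extracted_entities)

    # Knowledge Base settings; attach the filter only when one was built
    kb_config = {
        "knowledgeBaseId": kb_id,
        "modelArn": model_arn,  # ARN elided in the original; use your model or inference profile ARN
    }
    if metadata_filter:
        kb_config["retrievalConfiguration"] = {
            "vectorSearchConfiguration": {"filter": metadata_filter}
        }

    # Call Bedrock KB with the metadata filter
    response = bedrock_agent_runtime.retrieve_and_generate(
        input={"text": text},
        retrieveAndGenerateConfiguration={
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": kb_config,
        },
    )
    return response

# Example (kb_id is the ID of your Knowledge Base):
# response = process_query("Tienes tijeras para cortar el césped?", toolSpec, kb_id)
# print(response["output"]["text"])
```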
This follows the request structure required by the [Bedrock API](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/bedrock-agent-runtime/client/retrieve_and_generate.html):

![](./images/bedrock_retrieve_api.png)
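The RetrieveAndGenerate response carries the generated answer under `output.text` and the supporting chunks under `citations[].retrievedReferences[]`, each with the source document's `metadata`, which is where a field such as `subcategory_1` would appear if it was indexed. The small helper below is hypothetical, not part of the original notebook, and makes it easy to check that the retrieved chunks respect the filter. The sample response is made up and trimmed to the fields the helper reads:

```python
def summarize_rag_response(response):
    """Return the generated answer and the metadata of every retrieved reference."""
    answer = response.get("output", {}).get("text", "")
    sources = [
        ref.get("metadata", {})
        for citation in response.get("citations", [])
        for ref in citation.get("retrievedReferences", [])
    ]
    return answer, sources


# Made-up response, trimmed to the fields used above (not real API output)
sample_response = {
    "output": {"text": "Yes, we have garden shears in stock."},
    "citations": [
        {"retrievedReferences": [
            {"content": {"text": "Garden shears ..."}, "metadata": {"subcategory_1": "Garden Tools"}}
        ]}
    ],
}
answer, sources = summarize_rag_response(sample_response)
print(answer)   # Yes, we have garden shears in stock.
print(sources)  # [{'subcategory_1': 'Garden Tools'}]
```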