# AI Filtering Nodes from User Prompt

This workbook is a proof of concept for how an AI can be used to modify Neo4j queries.

The goal is to more accurately return the correct subset of nodes based on the user's query.

If the user specifies a constraint, such as searching within a single Act or within a timeframe, only the nodes that satisfy it are included in the similarity comparison.

The flow of information happens in this order:
1. The user's prompt is analyzed by the AI, which extracts key information into a JSON object.
2. That object is then used to build WHERE clauses for the Neo4j query.
3. The query is used to obtain the nodes that most closely resemble the user's query.

At this point, the regular workflow where the selected nodes are passed to Bedrock to form the response can continue, but it is not covered in this workbook.

In [None]:
%pip install langchain_community
%pip install neo4j
%pip install sentence-transformers
%pip install transformers
%pip install boto3
%pip install botocore

In [None]:
import os
import json
from langchain_community.graphs import Neo4jGraph
from langchain_community.embeddings import HuggingFaceEmbeddings

# Needed when Bedrock is used for the initial prompt analysis
import boto3
from botocore.config import Config


In [None]:
# Standard Bedrock session setup
AWS_ACCESS_KEY_ID = os.getenv("AWS_ACCESS_KEY_ID")
AWS_SECRET_ACCESS_KEY = os.getenv("AWS_SECRET_ACCESS_KEY")

session = boto3.Session(
    aws_access_key_id=AWS_ACCESS_KEY_ID,
    aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
)

# Use the standard retry mode explicitly; boto3 otherwise defaults to legacy
config = Config(
    retries={
        "max_attempts": 3,
        "mode": "standard",
    }
)
bedrock_runtime = session.client("bedrock-runtime", region_name="us-east-1", config=config)

In [None]:
# Build the invoke_model arguments for the Bedrock model.
# In this case, Claude 3.5 Sonnet.
def get_claude_kwargs(prompt):
    kwargs = {
        "modelId": "anthropic.claude-3-5-sonnet-20240620-v1:0",
        "contentType": "application/json",
        "accept": "application/json",
        "body": json.dumps(
            {
                "anthropic_version": "bedrock-2023-05-31",
                "max_tokens": 5000,
                "messages": [
                    {"role": "user", "content": [{"type": "text", "text": prompt}]}
                ],
            }
        ),
    }
    return kwargs

In [None]:
# Wrapper to get response from AWS and return only the text content
def get_agent_response(prompt):
    kwargs = get_claude_kwargs(prompt)
    response = bedrock_runtime.invoke_model(**kwargs)
    response_body = json.loads(response.get("body").read())
    return response_body["content"][0]["text"]
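
The wrapper above reads only the first content block. A slightly more defensive sketch, assuming the response body keeps the shape used above (a `content` list of blocks, each with a `type` and `text`), joins every text block instead. `extract_text` is a hypothetical helper name introduced here for illustration, and the sample body is hand-written rather than a real Bedrock response.

```python
def extract_text(response_body):
    """Concatenate the text of every 'text' block in a Messages-style response body."""
    blocks = response_body.get("content", [])
    return "".join(
        block.get("text", "") for block in blocks if block.get("type") == "text"
    )


# Hand-written body in the assumed shape, for illustration only
sample_body = {
    "content": [{"type": "text", "text": '{ "act_ids": ["Motor Vehicle Act"] }'}]
}
print(extract_text(sample_body))
```

An empty or unexpected body yields an empty string here, which the later JSON parsing step would then reject.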

In [None]:
# Open a connection to Neo4j using credentials from the environment
def neo4j():
    NEO4J_URI = "bolt://citz-imb-ai-neo4j-svc:7687"
    NEO4J_USERNAME = os.getenv("NEO4J_USER")
    NEO4J_PASSWORD = os.getenv("NEO4J_PASSWORD")
    # Fall back to the default database name if NEO4J_DB is not set
    NEO4J_DB = os.getenv("NEO4J_DB") or "neo4j"
    conn = Neo4jGraph(
        url=NEO4J_URI,
        username=NEO4J_USERNAME,
        password=NEO4J_PASSWORD,
        database=NEO4J_DB,
    )
    return conn

In [None]:
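# A function that builds portions of a WHERE clause based on the filter.
# To start, only the act_ids metadata is used, but this could be expanded for many parameters.
def build_metadata_filters(metadata_filter):
    filter_list = []
    for key, value in metadata_filter.items():
        # No `break` inside the cases: in Python it would exit the enclosing
        # for loop and skip any remaining keys.
        match key:
            case "act_ids":
                # The Python list repr is also a valid Cypher list literal
                filter_list.append(f"n.ActId IN {str(value)}")
            case _:
                # Ignore keys that have no filter defined yet
                pass
    return filter_list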
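
Because the act IDs come from model output, it may be worth checking them against the known IDs and passing them as a query parameter instead of writing them into the query text. The following is a minimal sketch under those assumptions, not the notebook's method: `build_act_filter` and the `$act_ids` parameter name are hypothetical, and the resulting `params` dictionary would need to be handed to the query call alongside the Cypher text.

```python
def build_act_filter(metadata_filter, valid_act_ids):
    """Return (where_fragments, params), keeping only act IDs known to the database."""
    requested = metadata_filter.get("act_ids", [])
    known = set(valid_act_ids)
    # Drop anything the model produced that is not a real Act ID
    act_ids = [act_id for act_id in requested if act_id in known]
    if not act_ids:
        return [], {}
    return ["n.ActId IN $act_ids"], {"act_ids": act_ids}


fragments, params = build_act_filter(
    {"act_ids": ["Motor Vehicle Act", "Class Act"]},
    ["Motor Vehicle Act", "Wildlife Act"],
)
print(fragments, params)
```

In this example "Class Act" is discarded because it is not in the list of valid IDs, so only "Motor Vehicle Act" reaches the filter.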

In [None]:
# Use Neo4j to obtain all possible Acts. This prevents the AI from creating acts from thin air.
db = neo4j()

# Confirm the Graph Data Science library is available
print(db.query("RETURN gds.version()"))

all_acts_query_result = db.query("""
 MATCH (node:UpdatedChunk) 
 RETURN DISTINCT node.ActId
""")

# Only collect the Act IDs from the returned objects.
valid_act_ids = [obj["node.ActId"] for obj in all_acts_query_result]

print(valid_act_ids)

In [None]:
# Sample Question
question = "What does the Motor Vehicle Act say about seatbelts?"

# Prompt is designed to produce a usable JSON object.
# Three examples are given to help with accuracy. 
# If additional parameters are needed, this prompt could be altered to include them.
prompt = f"""
You receive user prompts on BC Laws and help identify filtering criteria.
If the user's prompt mentions an Act, identify it within a JSON object in a list called act_ids.
Only return the JSON object in the output.

Examples: 
Question: What does the Motor Vehicle Act say about seatbelts? Output: {{ "act_ids": ["Motor Vehicle Act"] }}
Question: Does the Class Act have any info about chairs? Output: {{ "act_ids": ["Class Act"] }}
Question: I want to know how many M&Ms I can fit in my mouth. Check the Health and Safety Act and the Motor Vehicle Act. Output: {{ "act_ids": ["Health and Safety Act", "Motor Vehicle Act"] }}

These are the valid act IDs: {valid_act_ids}
This is the user's question: {question}
"""
# Get the AI's response, which should just be a JSON object.
ai_response = get_agent_response(prompt)
print(ai_response)

In [None]:
# Try/except here in case the ai_response isn't valid JSON or isn't a JSON object.
# In either case, fall back to an unfiltered query.
try:
    metadata_filter = json.loads(ai_response)
    filter_list = build_metadata_filters(metadata_filter)
    joined_filter = ("WHERE " + " AND ".join(filter_list)) if filter_list else ""
except (json.JSONDecodeError, AttributeError, TypeError):
    joined_filter = ""

print(joined_filter)
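
Models sometimes wrap the JSON object in extra wording even when asked not to, in which case `json.loads` on the whole response fails and the filter is silently dropped. One possible way to tolerate that, sketched here with a hypothetical helper named `parse_json_object`, is to scan for the first substring that decodes as a JSON object using the standard library's `JSONDecoder.raw_decode`.

```python
import json


def parse_json_object(text):
    """Return the first JSON object found in text, or None if nothing parses."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and ignores what follows
            obj, _ = decoder.raw_decode(text, start)
            if isinstance(obj, dict):
                return obj
        except json.JSONDecodeError:
            pass
        start = text.find("{", start + 1)
    return None


wrapped = 'Here is the filter: { "act_ids": ["Motor Vehicle Act"] } Hope that helps.'
print(parse_json_object(wrapped))
```

A `None` result can be treated the same way as the `except` branch above, by leaving the query unfiltered.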

In [None]:
# Create embedding of question for similarity search
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
question_embedding = embeddings.embed_query(question)

# Query to get nodes based on similarity. Only the nodes matched by the WHERE clause are considered.
# The embedding is passed as a query parameter rather than interpolated into the query text.
get_nodes = f"""
MATCH (n:UpdatedChunk)
{joined_filter}
WITH n, gds.similarity.cosine(n.textEmbedding, $question_embedding) AS similarity
RETURN n, similarity
ORDER BY similarity DESC
LIMIT 10
"""

result = db.query(get_nodes, params={"question_embedding": question_embedding})
print(result)
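
For reference, the ranking above relies on cosine similarity between the question embedding and each node's stored embedding. The plain-Python sketch below shows the calculation itself. It is only an illustration of the formula, not the GDS implementation, and returning 0.0 for a zero-length vector is an assumption made for this sketch.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumption for this sketch: treat a zero vector as having no similarity
        return 0.0
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # identical direction
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal vectors
```

Vectors pointing the same way score 1.0 and orthogonal vectors score 0.0, which is why the query orders by similarity in descending order.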