# Reranker Evaluation with Atomic Node Indexing and Arize

Before running this notebook, run the Airflow DAG `atomic_nodes_create_and_join`. It creates the edges between the UpdatedChunk nodes and the new granular atomic index.

You'll also need Arize installed to monitor calls to Bedrock.

This notebook adds a cross-encoder reranker and compares answers to a set of questions with and without it.

In [None]:
!pip install langchain_huggingface
!pip install boto3
!pip install arize-phoenix-otel
!pip install openinference-instrumentation-bedrock opentelemetry-exporter-otlp
!pip install pandas

In [1]:
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.graphs import Neo4jGraph
import os
import json
import boto3
from neo4j_functions import neo4j
from sentence_transformers import CrossEncoder
import numpy as np


In [2]:
import sys
# Use os.getcwd() since __file__ is not available in interactive environments
current_dir = os.getcwd()

# If your structure is such that the package is in the parent directory, compute the parent directory:
parent_dir = os.path.abspath(os.path.join(current_dir, '..'))

# Add the parent directory to sys.path if it's not already there
if parent_dir not in sys.path:
    sys.path.insert(0, parent_dir)

## Connect to Arize

In [3]:
from opentelemetry.trace import get_tracer_provider
from phoenix.otel import register

In [4]:
tracer_provider = register(
  project_name="testing Atomic Search", # Default is 'default'
  endpoint="http://phoenix:6006/v1/traces",
)

🔭 OpenTelemetry Tracing Details 🔭
|  Phoenix Project: testing Atomic Search
|  Span Processor: SimpleSpanProcessor
|  Collector Endpoint: http://phoenix:6006/v1/traces
|  Transport: HTTP + protobuf
|  Transport Headers: {}
|  
|  Using a default SpanProcessor. `add_span_processor` will overwrite this default.
|  
|  `register` has set this TracerProvider as the global OpenTelemetry default.
|  To disable this behavior, call `register` with `set_global_tracer_provider=False`.



In [5]:
from openinference.instrumentation.bedrock import BedrockInstrumentor
BedrockInstrumentor().instrument(tracer_provider=tracer_provider,
                                capture_response_body=True  # Enable response capture
                                )

## AWS Session

In [6]:
# Replace this with updated keys if necessary
AWS_ACCESS_KEY_ID = os.getenv("AWS_ACCESS_KEY_ID")
AWS_SECRET_ACCESS_KEY = os.getenv("AWS_SECRET_ACCESS_KEY")

session = boto3.Session(
    aws_access_key_id=AWS_ACCESS_KEY_ID,
    aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
)
bedrock_runtime = session.client("bedrock-runtime", region_name="us-east-1")

In [7]:
def get_mixtral_kwargs(prompt):
    kwargs = {
        "modelId": "mistral.mixtral-8x7b-instruct-v0:1",
        "contentType": "application/json",
        "accept": "*/*",
        "body": json.dumps(
            {
                "prompt": prompt,
                "max_tokens": 4096,
                "temperature": 0.5,
                "top_p": 0.9,
                "top_k": 50,
            }
        ),
    }
    return kwargs

In [8]:
def get_response(prompt):
    kwargs = get_mixtral_kwargs(prompt)
    response = bedrock_runtime.invoke_model(**kwargs)
    response_body = json.loads(response.get("body").read())
    return response_body["outputs"][0]["text"]
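
The `get_response` cell above indexes straight into `response_body["outputs"][0]["text"]`, which raises if the model returns no outputs. A minimal sketch of a more defensive parse follows, assuming the same response shape as above (an `outputs` list whose items carry a `text` field). `extract_text` is a hypothetical helper, not part of boto3, and the sample body is hand-written rather than a real Bedrock response.

```python
import json


def extract_text(raw_body: bytes) -> str:
    """Return the first generated text from a Mixtral-style response body.

    Assumes the shape used in get_response: {"outputs": [{"text": ...}]}.
    Returns an empty string when no output is present.
    """
    body = json.loads(raw_body)
    outputs = body.get("outputs") or []
    if not outputs:
        return ""
    return outputs[0].get("text", "")


# Hand-written sample body for illustration only
sample = json.dumps({"outputs": [{"text": " Hello"}]}).encode("utf-8")
print(extract_text(sample).strip())
```

Inside `get_response`, this could stand in for the last two lines by passing `response.get("body").read()` to the helper.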

In [33]:
def get_response_with_tracking(prompt, rerank=False):
    """Call get_response inside a span so the call is visible in Phoenix."""
    # Name the span after the search type so the two runs are easy to tell apart
    search_type = "Rerank" if rerank else "NoRerank"
    tracer = get_tracer_provider().get_tracer(__name__)
    with tracer.start_as_current_span(f"mixtral_{search_type}") as span:
        # Span attributes are flat, dot-separated keys
        span.set_attribute("openinference.span.kind", "LLM")
        span.set_attribute("task.rerank", rerank)
        span.set_attribute("input.value", prompt)
        try:
            # Get the response using the untracked function
            output = get_response(prompt)
            span.set_attribute("output.value", output if output else "None")
            return output
        except Exception as e:
            # Record the error as flat strings, then re-raise
            span.set_attribute("error.message", str(e))
            span.set_attribute("error.type", e.__class__.__name__)
            raise

## Load Embeddings

In [34]:
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

In [35]:
#### Adjust your question here ####
# question = "What is the meaning of a yellow curb?"
# question = "Do I need to wear a seatbelt in BC?"
question = "testing"

## Load Cross Encoder

In [36]:
cross_encoder = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')

## Create Prompt

In [37]:
def create_prompt(query: str, context_str: str) -> str:
    """
    Generate a response using the given context.
    """
    messages = f"""
            You are a helpful and knowledgeable assistant. Use the following information to answer the user's question accurately and concisely. Do not provide information that is not supported by the given context or chat history.

            - Use the context to form your answer.
            - Laws and Acts can be used interchangeably.
            - If the answer is not found in the context, state that you don't know.
            - Do not attempt to fabricate an answer.

            Context: 
            {context_str}

            Question: 
            {query}

            Provide the most accurate and helpful answer based on the information above. If no answer is found, state that you don't know.
            In your responses, include references to where this piece of information came from. A reference will look like (Document Title, Section, Subsection, Paragraph, Subparagraph)
            Not all references will have data for all these fields. The order should always be Document Title, Section, Subsection, Paragraph, Subparagraph.
            Include nothing else in the reference.
            If you are not confident about what the reference should be, don't include it.
        """
    return messages

In [38]:
query_embeddings = embeddings.embed_query(question)

# This vector query grabs a lot of connected nodes.
# It first does the semantic vector search on the index for UpdatedChunk nodes.
# It then finds the corresponding atomic section node, then pulls it and all its children
vector_search_query = """
        CALL db.index.vector.queryNodes($index_name, $top_k, $question) 
        YIELD node, score
        OPTIONAL MATCH (node)-[:IS]-(atomicSection)
        OPTIONAL MATCH (atomicSection)-[:CONTAINS*]->(containedNode)
        OPTIONAL MATCH (containedNode)-[:NEXT*]->(nextNode)
        OPTIONAL MATCH (containedNode)-[:REFERENCE]->(refNode)
        RETURN 
            node.ActId,  node.RegId as Regulations, node.sectionId, node.sectionName, node.url,  node.text AS text,
            atomicSection,
            score, 
            collect(DISTINCT {containedProperties: properties(containedNode)}) AS containedNodes,
            collect(DISTINCT {referenceProperties: properties(refNode)}) AS referencedNodes,
            collect(DISTINCT {nextProperties: properties(nextNode)}) AS nextNodes
        ORDER BY score DESC
        """

NEO4J_VECTOR_INDEX = "Acts_Updatedchunks"

similar = neo4j.query(
    vector_search_query,
    params={
        "question": query_embeddings,
        "index_name": NEO4J_VECTOR_INDEX,
        "top_k": 10,
    },
)
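
`create_prompt` declares `context_str` as a string, but the cells in this notebook pass it the list of result rows, so the f-string renders the context as a Python list repr. A minimal sketch of turning rows into a readable context string is shown here. `format_context` is a hypothetical helper, and the keys `"text"` and `"node.sectionName"` are assumptions based on the `RETURN` clause of the query above; check them against an actual row before relying on them.

```python
def format_context(rows, max_chars=2000):
    """Join retrieved rows into one context string for create_prompt.

    Assumes each row is a dict with a "text" key and, optionally,
    "node.sectionName" (key names are assumptions, see the lead-in).
    """
    parts = []
    for i, row in enumerate(rows, start=1):
        title = row.get("node.sectionName") or "Untitled section"
        # Truncate very long chunks so one row cannot dominate the prompt
        text = (row.get("text") or "")[:max_chars]
        parts.append(f"[{i}] {title}\n{text}")
    return "\n\n".join(parts)


# Hand-written rows for illustration only
rows = [
    {"node.sectionName": "Definitions", "text": "In this Act..."},
    {"text": "A person must not..."},
]
print(format_context(rows))
```

The result could then be passed as the second argument of `create_prompt` in place of the raw list.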

## Run with different questions

In [39]:
#question = "what are Coordination agreements under the Emergency and Disaster Management Act"
#question = "What is the fine for excessive speeding?"
question = "please explain all the possible actions a complainant under part 11 of the police act can perform"
#question = "is there any legislation that creates navigator or liaison roles to help people submit complaints or get assistance through a complaint or hearing process?"
#question = "are there regulations that require companies to provide their notice of articles with shareholder information"
#question = "Can I put my mattress in a landfill?"
#question = "What projects does the Environmental Assessment Act regulate?"
#question = "What exactly is a limitation act? if no jdugement is passed but there is a court case can it be there indeifintely?"
#question = "What permits do pulp mills need?"

In [40]:
query_embeddings = embeddings.embed_query(question)

In [41]:
similar = neo4j.query(
    vector_search_query,
    params={
        "question": query_embeddings,
        "index_name": NEO4J_VECTOR_INDEX,
        "top_k": 10,
    },
)
#similar

In [47]:
search_results = similar
query = question
pairs = [[query, doc['text']] for doc in search_results]
scores = cross_encoder.predict(pairs)

In [49]:
reranked_similar = []
for o in np.argsort(scores)[::-1]:
    reranked_similar.append(search_results[o])
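
The reranking step above sorts the retrieved rows by descending cross-encoder score. As a sketch, the same idea can be wrapped in a small function; `rerank` is a hypothetical helper, and it uses the built-in `sorted` rather than `np.argsort`, so rows with tied scores may come out in a different order than in the cell above.

```python
def rerank(results, scores, top_n=5):
    """Return the top_n results ordered by descending score."""
    if len(results) != len(scores):
        raise ValueError("results and scores must have the same length")
    # Indices of scores from highest to lowest
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [results[i] for i in order[:top_n]]


docs = ["a", "b", "c"]
print(rerank(docs, [0.1, 2.5, -1.0], top_n=2))  # ['b', 'a']
```

With the notebook's variables this would be called as `rerank(search_results, scores, top_n=5)`.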

In [50]:
#reranked_similar
#len(reranked_similar)
#similar

### With Rerank

In [51]:
prompt_reranked = create_prompt(question, reranked_similar[0:5])
#bedrock_response = get_response(prompt_reranked)
bedrock_response = get_response_with_tracking(prompt_reranked, rerank=True)
print(bedrock_response.strip())

As a complainant under Part 11 of the Police Act, there are several actions you can perform:

            1. Make a complaint: You can make a complaint about the conduct of a member or former member of a municipal police department (Police Act, Part 11, Division 3, Section 78).

            2. Receive acknowledgement: Upon making a complaint, the member or designated individual receiving the complaint must provide you with written acknowledgement of its receipt (Police Act, Part 11, Division 3, Section 80).

            3. Request assistance: The member or designated individual receiving the complaint must provide you with any assistance that you require in making the complaint (Police Act, Part 11, Division 3, Section 80).

            4. Receive information or advice: The member or designated individual receiving the complaint must provide you with any information or advice that may be required under the guidelines prepared by the police complaint commissioner (Police Act, Part 11, D

### Without Rerank

In [52]:
# Baseline: all 10 results in vector-score order (the reranked prompt uses only the top 5)
prompt = create_prompt(question, similar)
#bedrock_response = get_response(prompt)
bedrock_response = get_response_with_tracking(prompt)
print(bedrock_response.strip())

As a complainant under Part 11 of the Police Act, you can perform the following actions:

            1. Make a complaint about the conduct of a member or former member of a municipal police department. (Police Act, Section 78, Paragraph 1)
            2. If the complaint is made to a member or designated individual under section 78 (2) (b), you can request assistance in making the complaint, receive information or advice required under the guidelines, provide any necessary information, and receive a copy of the police complaint commissioner's list of support groups and neutral dispute resolution service providers and agencies. (Police Act, Section 80)
            3. If the complaint is made to the police complaint commissioner directly or received a copy or record of a complaint from a member or designated individual, the police complaint commissioner will determine whether the complaint is admissible or inadmissible under this division. (Police Act, Section 82)
            4. If the 

## Testing and storing the performance in a DataFrame

In [54]:
questions = [
    "What are Coordination Agreements under the Emergency and Disaster Management Act?",
    "What is the fine for excessive speeding?",
    "Please explain all the possible actions a complainant under Part 11 of the Police Act can perform.",
    "Is there any legislation that creates navigator or liaison roles to help people submit complaints or get assistance through a complaint or hearing process?",
    "Are there regulations that require companies to provide their notice of articles with shareholder information?",
    "Can I put my mattress in a landfill?",
    "What projects does the Environmental Assessment Act regulate?",
    "What exactly is a Limitation Act? If no judgment is passed but there is a court case, can it be there indefinitely?",
    "What permits do pulp mills need?"
]

In [55]:
# Initialize results list
results = []

In [58]:
import pandas as pd
import numpy as np

for question in questions:
    # Embed and get similar results
    query_embeddings = embeddings.embed_query(question)
    similar = neo4j.query(
        vector_search_query,
        params={
            "question": query_embeddings,
            "index_name": NEO4J_VECTOR_INDEX,
            "top_k": 10,
        },
    )
    search_results = similar

    # ----- With Reranker -----
    pairs = [[question, doc['text']] for doc in search_results]
    scores = cross_encoder.predict(pairs)
    reranked_similar = [search_results[i] for i in np.argsort(scores)[::-1]]

    prompt_reranked = create_prompt(question, reranked_similar[0:5])
    response_rerank = get_response_with_tracking(prompt_reranked, rerank=True).strip()

    # ----- Without Reranker (all 10 results; the reranked prompt uses only the top 5) -----
    prompt = create_prompt(question, search_results)
    response_no_rerank = get_response_with_tracking(prompt, rerank=False).strip()

    # Save to results
    results.append({
        "question": question,
        "response_with_rerank": response_rerank,
        "response_without_rerank": response_no_rerank
    })

# Convert to DataFrame
df = pd.DataFrame(results)


In [83]:
# Show full column content in pandas
pd.set_option('display.max_colwidth', 100)
pd.set_option('display.max_rows', 100)
df.head(20)

| | question | response_with_rerank | response_without_rerank |
|---|---|---|---|
| 0 | What are Coordination Agreements under the Emergency and Disaster Management Act? | Answer:\n I don't have specific information about Coordination Agreements under the E... | The Emergency and Disaster Management Act does not explicitly define Coordination Agreements. Ho... |
| 1 | What is the fine for excessive speeding? | A person who drives a motor vehicle on a highway at a speed greater than 40 km/h over the applic... | Answer: \n A person who drives a motor vehicle on a highway at a speed greater than 4... |
| 2 | Please explain all the possible actions a complainant under Part 11 of the Police Act can perform. | Answer: \n A complainant under Part 11 of the Police Act has the following possible a... | A complainant under Part 11 of the Police Act can perform the following actions:\n\n ... |
| 3 | Is there any legislation that creates navigator or liaison roles to help people submit complaint... | Answer:\n Yes, there is legislation that creates navigator or liaison roles to help p... | Answer: \n Yes, there are legislations that create navigator or liaison roles to help... |
| 4 | Are there regulations that require companies to provide their notice of articles with shareholde... | Based on the provided context, there is no information about regulations that require companies ... | Answer: \n Yes, there are regulations that require companies to provide their notice ... |
| 5 | Can I put my mattress in a landfill? | Answer: \n Based on the information provided, I don't know if you can put a mattress ... | Answer: \n Based on the information provided, it is not explicitly stated whether mat... |
| 6 | What projects does the Environmental Assessment Act regulate? | Answer:\n The Environmental Assessment Act regulates new projects and modifications t... | Answer: \n The Environmental Assessment Act regulates projects that meet the criteria... |
| 7 | What exactly is a Limitation Act? If no judgment is passed but there is a court case, can it be ... | Answer: \n A Limitation Act is a law that sets time limits on when legal proceedings ... | A Limitation Act is a law that sets time limits on when legal actions can be taken. These time l... |
| 8 | What permits do pulp mills need? | Answer:\n Pulp mills need a removal permit as per the Forest Act, Section 132 (Docume... | Answer: \n Pulp mills need a licence for a pulp mill industrial purpose. This is stat... |
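
Several responses in the output above begin with an `Answer:` label followed by a newline and indentation, while others do not, which makes side-by-side reading harder. A minimal sketch of a normalizing step is given here; `clean_response` is a hypothetical helper and only one possible cleanup, not something the notebook currently does.

```python
def clean_response(text: str) -> str:
    """Drop a leading "Answer:" label and collapse runs of whitespace."""
    text = text.strip()
    if text.lower().startswith("answer:"):
        text = text[len("answer:"):]
    # split() with no arguments also removes newlines and repeated spaces
    return " ".join(text.split())


print(clean_response("Answer: \n      A person who drives..."))
```

It could be applied to both response columns with `df["response_with_rerank"].map(clean_response)` and the equivalent call for `response_without_rerank`.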
