In [1]:
import os
from typing import Optional

# os.environ['OPEN_AI_KEY'] = 'sk-...'

from utils import *

## 2.1 Sentence Embeddings

We can visualize how a sentence's embedding moves as words are appended one at a time: each partial sentence is embedded in three dimensions and plotted as a vector.

In [2]:
sentence = "Actuaries fly to CAS conferences to learn about new techniques"

graph = create_new_graph()
color_scale = [
    "#E6E6FA",  # Lavender
    "#D8BFD8",  # Thistle
    "#DDA0DD",  # Plum
    "#DA70D6",  # Orchid
    "#BA55D3",  # Medium Orchid
    "#9932CC",  # Dark Orchid
    "#9400D3",  # Dark Violet
    "#8A2BE2",  # Blue Violet
    "#800080",  # Purple
    "#4B0082",  # Indigo
]  # each one is progressively darker

words = sentence.split()

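# Start ix at 1 so words[:ix] is never empty and the lightest color is skipped.
# ix runs 1..9, so the final word ("techniques") is never included in a partial sentence.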
for ix, word in enumerate(words[1:], 1):
    partial_sentence = " ".join(words[:ix])
    print(f"Partial sentence: {partial_sentence}")

    vector = get_embedding(partial_sentence, dimensions=3)

    add_vector_to_graph(graph, vector, name=partial_sentence, color=color_scale[ix])

Partial sentence: Actuaries
Partial sentence: Actuaries fly
Partial sentence: Actuaries fly to
Partial sentence: Actuaries fly to CAS
Partial sentence: Actuaries fly to CAS conferences
Partial sentence: Actuaries fly to CAS conferences to
Partial sentence: Actuaries fly to CAS conferences to learn
Partial sentence: Actuaries fly to CAS conferences to learn about
Partial sentence: Actuaries fly to CAS conferences to learn about new


In [3]:
# Notice how adding 'CAS' drastically shifts the embedding.
# Try adding unexpected words to the sentence and see how the embedding changes.

graph
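
The plot shows the shift visually; it can also be put into numbers by measuring the cosine distance between each partial-sentence embedding and the next. The sketch below is only illustrative: `toy_vectors` is made-up data standing in for the output of `get_embedding(..., dimensions=3)`, and `cosine_similarity` and `consecutive_shifts` are hypothetical helpers, not part of `utils`.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def consecutive_shifts(vectors):
    """Cosine distance (1 - similarity) between each vector and the one after it."""
    return [1.0 - cosine_similarity(u, v) for u, v in zip(vectors, vectors[1:])]


# Made-up 3-D vectors (assumption): replace with the real partial-sentence embeddings.
toy_vectors = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
]

shifts = consecutive_shifts(toy_vectors)
# Index of the step where the embedding moved the most.
largest = max(range(len(shifts)), key=shifts.__getitem__)
print(f"Largest shift occurs when adding word {largest + 2}")
```

With real embeddings, collecting each `vector` from the loop above into a list and passing it to `consecutive_shifts` would show which added word moves the embedding most.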


## 4.1 Full Document Context

In [4]:
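import openparse  # may already be in scope via `from utils import *`

doc_path = "./docs/portland-site-assessment-phase-1.pdf"
pdf = openparse.Pdf(doc_path)  # not used in this cell
parser = openparse.DocumentParser()
parsed_doc = parser.parse(doc_path)  # splits the PDF into text nodes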

# Concatenate every node's text, prefixing each node with "<br>"
full_doc_str = "".join(f"<br>{node.text}" for node in parsed_doc.nodes)

# Write the text to a file so you can open it and inspect what the parser extracted
with open("full_doc.md", "w", encoding="utf-8") as f:
    f.write(full_doc_str)
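
The next cell fills in a `prompt_template` that is defined in `utils` and not shown here. As a rough idea of what such a template might look like, here is a minimal sketch; the wording and the name `example_prompt_template` are assumptions, and the real template may differ.

```python
# Hypothetical template (assumption): the real `prompt_template` lives in utils.
example_prompt_template = """\
Answer the question using only the context below.
If the context does not contain the answer, say so.

Context:
{context}

Question: {question}
"""

# str.format substitutes the named placeholders; braces inside the
# substituted values are left untouched.
example_prompt = example_prompt_template.format(
    question="Is there lead contamination into the groundwater?",
    context="<br>Example node text from the parsed document.",
)
print(example_prompt)
```

Placing the context before the question keeps the question next to where the model begins its answer, which is one common layout for this kind of prompt.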

In [5]:
question = "Is there lead contamination into the groundwater?"

prompt = prompt_template.format(question=question, context=full_doc_str)

completion = get_completion(prompt)

print("Original Question: ", question)
completion

Completion used 18792 tokens costing $0.08
Original Question:  Is there lead contamination into the groundwater?


Yes, the document indicates that there is lead contamination in the groundwater. It states that groundwater concentrations in excess of the DEQ residential, urban residential, and occupational risk-based concentrations (RBCs) for lead, among other contaminants, have been observed on portions of the property at 400 South 1st Street, St. Helens, Oregon. Specifically, lead was detected in three of four locations assessed in the Lathe Area, though it was not detected above RBCs in samples from the Riverside area.
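
Sending the whole document used 18,792 tokens for a single question, so it can be worth estimating the size of a prompt before sending it. The sketch below is a back-of-the-envelope estimate under two stated assumptions: roughly four characters per token for English text (a heuristic, not a tokenizer), and a price rate derived from the logged figures above, which is only approximate because the $0.08 is rounded. `estimate_tokens` and `estimate_cost` are hypothetical helpers, not part of `utils`.

```python
import math


def estimate_tokens(text, chars_per_token=4.0):
    """Rough token count; chars_per_token is a heuristic, not a real tokenizer."""
    return math.ceil(len(text) / chars_per_token)


def estimate_cost(n_tokens, usd_per_1k_tokens):
    """Cost in USD for n_tokens at the given price per 1,000 tokens."""
    return n_tokens / 1000 * usd_per_1k_tokens


# Rate implied by the logged run: about $0.08 for 18,792 tokens (approximate).
implied_rate = 0.08 / 18792 * 1000
print(f"Implied rate: ${implied_rate:.5f} per 1K tokens")

# Example with placeholder text (assumption): in the notebook, pass `prompt` instead.
sample_text = "lead was detected in groundwater samples " * 100
n_tokens = estimate_tokens(sample_text)
print(f"~{n_tokens} tokens, ~${estimate_cost(n_tokens, implied_rate):.4f}")
```

For an exact count, use the tokenizer that matches the model being called rather than the character heuristic.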