In [2]:
from qazure import get_embedder, get_llm

In [11]:
from llama_index.vector_stores.neo4jvector import Neo4jVectorStore
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.settings import Settings

In [7]:
from llama_index.core import StorageContext

In [4]:
llm = get_llm()
embedder = get_embedder()
# Embed a sample string once so the vector dimension can be read off below
text_embedding = embedder.get_text_embedding("Hi")

In [12]:
Settings.llm = llm
Settings.embed_model = embedder

In [5]:
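# Connection settings for a local Neo4j instance (development defaults)
username = "neo4j"
password = "password"
url = "bolt://localhost:7687"
# Vector dimension, inferred from the sample embedding
embed_dim = len(text_embedding)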
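
The connection settings above are hard-coded for a local instance. A minimal sketch of reading them from the environment instead, falling back to the same defaults. The helper `load_neo4j_config` and the variable names `NEO4J_USERNAME`, `NEO4J_PASSWORD`, and `NEO4J_URL` are assumptions made for illustration; nothing in this notebook requires them.

```python
import os
from typing import Mapping, NamedTuple


class Neo4jConfig(NamedTuple):
    username: str
    password: str
    url: str


def load_neo4j_config(env: Mapping[str, str] = os.environ) -> Neo4jConfig:
    """Read connection settings from `env`, falling back to the local defaults.

    The environment variable names are hypothetical; rename them as needed.
    """
    return Neo4jConfig(
        username=env.get("NEO4J_USERNAME", "neo4j"),
        password=env.get("NEO4J_PASSWORD", "password"),
        url=env.get("NEO4J_URL", "bolt://localhost:7687"),
    )
```

Because the result is a tuple, it can be unpacked directly: `username, password, url = load_neo4j_config()`.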

In [6]:
neo4j_vector = Neo4jVectorStore(username, password, url, embed_dim)



In [8]:
# load documents
documents = SimpleDirectoryReader(input_files=["paul_graham_essay.txt"]).load_data()

In [14]:
storage_context = StorageContext.from_defaults(vector_store=neo4j_vector)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)

query_engine = index.as_query_engine()
response = query_engine.query("What happened at interleaf?")
print(response)



At Interleaf, the company implemented a bold move by adding a scripting language inspired by Emacs, which was a dialect of Lisp. The author worked there but admitted to being a poor employee, as they didn't know C (the primary language used) and were uninterested in learning it. They also found the conventional office hours unnatural and spent much of their time secretly working on their book, *On Lisp*. Despite these challenges, the job paid well, allowing the author to save money, return to RISD, and pay off college loans.

During their time at Interleaf, the author learned several lessons, such as the importance of having product people run technology companies, the drawbacks of excessive code editing by multiple people, the negative impact of depressing office spaces, the superiority of informal conversations over planned meetings, and the risks of relying on bureaucratic customers. They also realized that being the "entry-level" option in a market is advantageous, as the low end o

In [15]:
neo4j_vector_hybrid = Neo4jVectorStore(
    username, password, url, embed_dim, hybrid_search=True
)

storage_context = StorageContext.from_defaults(
    vector_store=neo4j_vector_hybrid
)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)
query_engine = index.as_query_engine()
response = query_engine.query("What happened at interleaf?")
print("Response:", response)



Response: At Interleaf, the company had implemented a bold move by adding a scripting language inspired by Emacs, which was a dialect of Lisp. The author worked there as a Lisp programmer but struggled with the job due to a lack of interest in learning C, irresponsibility, and dissatisfaction with conventional office hours. Despite these challenges, the job paid very well, allowing the author to save money, pay off college loans, and return to RISD. During the time at Interleaf, the author learned several lessons about technology companies, such as the importance of being run by product people rather than salespeople, the drawbacks of excessive code editing by multiple people, and the inefficiency of conventional office hours for programming. The most significant lesson learned was the idea that the low end of the market often overtakes the high end, making it advantageous to be the entry-level option. After leaving Interleaf, the author continued doing freelance work for the company a

In [23]:
# Property Graph Index
from llama_index.core import PropertyGraphIndex
from llama_index.core.indices.property_graph import SchemaLLMPathExtractor

from llama_index.graph_stores.neo4j import Neo4jPropertyGraphStore
# Reuse the connection settings defined earlier instead of repeating the literals
graph_store = Neo4jPropertyGraphStore(
    username=username,
    password=password,
    url=url,
)

In [27]:
import nest_asyncio

# Jupyter already runs an event loop; patch it so the async extraction can run inside it
nest_asyncio.apply()

# Extract graph from documents
pg_index = PropertyGraphIndex.from_documents(
    documents,
    embed_model=embedder,
    kg_extractors=[
        SchemaLLMPathExtractor(
            llm=llm
        )
    ],
    property_graph_store=graph_store,
    show_progress=True,
)

Parsing nodes: 100%|██████████| 1/1 [00:00<00:00, 24.19it/s]
Extracting paths from text with schema: 100%|██████████| 21/21 [00:27<00:00,  1.29s/it]
Generating embeddings: 100%|██████████| 3/3 [00:02<00:00,  1.40it/s]
Generating embeddings: 100%|██████████| 15/15 [00:04<00:00,  3.16it/s]


In [29]:
# Define retriever
retriever = pg_index.as_retriever(
    include_text=False,  # return only the extracted triples; the default (True) also returns source text
)
results = retriever.retrieve("What happened at Interleaf and Viaweb?")
for record in results:
    print(record.text)

Trevor Blackwell -> WORKED_ON -> Viaweb
Viaweb -> HAS -> shopping cart
Viaweb -> HAS -> Lisp
Paul Graham -> WORKED_ON -> Viaweb
Robert Morris -> WORKED_ON -> Viaweb
Viaweb -> HAS -> WYSIWYG site builder
Viaweb -> HAS -> web infrastructure
Julian -> WORKED_ON -> Viaweb
Robert -> WORKED_ON -> Viaweb
Viaweb -> HAS -> manager
Interleaf software -> HAS -> scripting language
Robert Morris -> WORKED_ON -> Internet Worm of 1988
Robert Morris -> WORKED_ON -> internet worm of 1988
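
The last two triples in the output above differ only in letter case. A minimal sketch of collapsing such near-duplicates when displaying results; `dedupe_triples` is a hypothetical helper, and it only filters the printed strings, it does not merge nodes in the graph itself.

```python
def dedupe_triples(lines):
    """Drop triples that differ only by case or surrounding whitespace, keeping the first seen."""
    seen = set()
    unique = []
    for line in lines:
        # Normalize each part of "subject -> relation -> object" for comparison only
        key = tuple(part.strip().casefold() for part in line.split("->"))
        if key not in seen:
            seen.add(key)
            unique.append(line)
    return unique


triples = [
    "Robert Morris -> WORKED_ON -> Internet Worm of 1988",
    "Robert Morris -> WORKED_ON -> internet worm of 1988",
    "Viaweb -> HAS -> Lisp",
]
for line in dedupe_triples(triples):
    print(line)
```

In the retrieval loop this could be applied as `dedupe_triples(record.text for record in results)`.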


In [30]:
# Question answering
query_engine = pg_index.as_query_engine(include_text=True)
response = query_engine.query("What happened at Interleaf and Viaweb?")
print(str(response))

At Interleaf, the company struggled as the rapid advancements in commodity processors made high-end, specialized hardware and software companies like Interleaf increasingly irrelevant. This was a widespread issue during the 1990s, as Moore's Law drove the obsolescence of such businesses.

At Viaweb, the company was established to develop software for creating online stores. While the initial concept was desktop software, it transitioned into a web application where the software operated on the server, and users interacted with it through their browsers. Viaweb introduced features such as a WYSIWYG site builder, a shopping cart, and tools for managing orders and tracking statistics. Its emphasis on simplicity and affordability helped it gain traction in the early days of ecommerce. The founders, including Paul Graham, Robert Morris, Trevor Blackwell, and Julian, contributed to various aspects of the software, and the company achieved success by responding to user needs and market trends

In [31]:
from typing import Literal
# Best practice: use upper-case labels for entity and relation types
entities = Literal["PERSON", "PLACE", "ORGANIZATION"]
relations = Literal["HAS", "PART_OF", "WORKED_ON", "WORKED_WITH", "WORKED_AT"]

# define which entities can have which relations
validation_schema = {
    "PERSON": ["HAS", "PART_OF", "WORKED_ON", "WORKED_WITH", "WORKED_AT"],
    "PLACE": ["HAS", "PART_OF", "WORKED_AT"],
    "ORGANIZATION": ["HAS", "PART_OF", "WORKED_WITH"],
}

kg_extractor = SchemaLLMPathExtractor(
    llm=llm,
    possible_entities=entities,
    possible_relations=relations,
    kg_validation_schema=validation_schema,
    # If False, values outside the schema are allowed,
    # which treats the schema as a suggestion rather than a constraint
    strict=True,
)
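
One way to read the `validation_schema` mapping above: a relation is permitted only when it is listed under the subject's entity type. The sketch below illustrates that reading in plain Python. `is_allowed` is a hypothetical helper, and the check is an assumption about the intent of the schema; it may not match how `SchemaLLMPathExtractor` validates triples internally.

```python
# Same mapping as in the cell above, repeated so this sketch runs on its own
validation_schema = {
    "PERSON": ["HAS", "PART_OF", "WORKED_ON", "WORKED_WITH", "WORKED_AT"],
    "PLACE": ["HAS", "PART_OF", "WORKED_AT"],
    "ORGANIZATION": ["HAS", "PART_OF", "WORKED_WITH"],
}


def is_allowed(subject_type, relation, schema=validation_schema):
    """Return True if `relation` is listed for `subject_type` (assumed reading of the schema)."""
    return relation in schema.get(subject_type, [])


print(is_allowed("PERSON", "WORKED_AT"))        # relation listed under PERSON
print(is_allowed("ORGANIZATION", "WORKED_ON"))  # relation not listed under ORGANIZATION
print(is_allowed("EVENT", "HAS"))               # entity type not in the schema at all
```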

In [32]:
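# Note: this writes into the same graph_store as pg_index, so triples from the
# earlier schema-free extraction likely remain in the database and can show up in results
op_index = PropertyGraphIndex.from_documents(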
    documents,
    kg_extractors=[kg_extractor],
    embed_model=embedder,
    property_graph_store=graph_store,
    show_progress=True,
)

Parsing nodes: 100%|██████████| 1/1 [00:00<00:00, 13.88it/s]
Extracting paths from text with schema: 100%|██████████| 21/21 [00:28<00:00,  1.38s/it]
Generating embeddings: 100%|██████████| 3/3 [00:02<00:00,  1.44it/s]
Generating embeddings: 100%|██████████| 12/12 [00:02<00:00,  4.95it/s]


In [33]:
# Define retriever
retriever = op_index.as_retriever(
    include_text=False,  # return only the extracted triples; the default (True) also returns source text
)
results = retriever.retrieve("What happened at Interleaf and Viaweb?")
for record in results:
    print(record.text)

# Question answering
query_engine = op_index.as_query_engine(include_text=True)
response = query_engine.query("What happened at Interleaf and Viaweb?")
print(str(response))

Trevor Blackwell -> WORKED_ON -> Viaweb
Viaweb -> HAS -> shopping cart
Viaweb -> HAS -> Lisp
Paul Graham -> WORKED_ON -> Viaweb
Robert Morris -> WORKED_ON -> Viaweb
Viaweb -> HAS -> WYSIWYG site builder
Viaweb -> HAS -> web infrastructure
Julian -> WORKED_ON -> Viaweb
Robert -> WORKED_ON -> Viaweb
Viaweb -> HAS -> manager
Viaweb -> HAS -> California
Viaweb -> HAS -> Santa Clara
Y Combinator -> HAS -> Viaweb
Viaweb -> HAS -> New York
Viaweb -> HAS -> Paul Graham
Viaweb -> HAS -> Julian
Viaweb -> PART_OF -> Yahoo
Viaweb -> HAS -> Cambridge
Viaweb -> PART_OF -> ecommerce software startups
Interleaf -> WORKED_WITH -> Intel
Interleaf -> HAS -> Lisp Scripting Language
Interleaf -> PART_OF -> Software for Creating Documents
Paul Graham -> WORKED_ON -> Interleaf
Interleaf -> HAS -> Scripting Language
Paul Graham -> WORKED_AT -> Interleaf
Interleaf software -> HAS -> scripting language
Paul Graham -> WORKED_ON -> Interleaf Lisp Projects
At Interleaf, the company faced challenges as advancements

In [34]:
# Define retriever with the default include_text=True, so source text is returned with the triples
retriever = op_index.as_retriever()
results = retriever.retrieve("What happened at Interleaf and Viaweb?")
for record in results:
    print(record.text)

Here are some facts extracted from the provided text:

Trevor Blackwell -> WORKED_ON -> Viaweb

In return for that and doing the initial legal work and giving us business advice, we gave him 10% of the company. Ten years later this deal became the model for Y Combinator's. We knew founders needed something like this, because we'd needed it ourselves.

At this stage I had a negative net worth, because the thousand dollars or so I had in the bank was more than counterbalanced by what I owed the government in taxes. (Had I diligently set aside the proper proportion of the money I'd made consulting for Interleaf? No, I had not.) So although Robert had his graduate student stipend, I needed that seed funding to live on.

We originally hoped to launch in September, but we got more ambitious about the software as we worked on it. Eventually we managed to build a WYSIWYG site builder, in the sense that as you were creating pages, they looked exactly like the static ones that would be generated
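
With `include_text=True`, each returned node mixes triple lines with source passages, as in the output above. A minimal sketch of separating the two, assuming triples always appear on their own line in the `subject -> relation -> object` form shown; `split_triples` is a hypothetical helper, not part of llama-index.

```python
import re

# Matches one "subject -> relation -> object" line, as printed in the output above
TRIPLE_RE = re.compile(r"^(.+?) -> (.+?) -> (.+)$")


def split_triples(text):
    """Split node text into (subject, relation, object) tuples and the remaining lines."""
    triples, passages = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank separator lines
        match = TRIPLE_RE.match(line)
        if match:
            triples.append(match.groups())
        else:
            # Everything else is kept as-is, including the "Here are some facts" header
            passages.append(line)
    return triples, passages


sample = (
    "Here are some facts extracted from the provided text:\n\n"
    "Trevor Blackwell -> WORKED_ON -> Viaweb\n\n"
    "In return for that and doing the initial legal work and giving us "
    "business advice, we gave him 10% of the company."
)
triples, passages = split_triples(sample)
print(triples)
print(len(passages))
```

In the retrieval loop this could be called as `split_triples(record.text)` for each record.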