In [15]:
from dotenv import load_dotenv

load_dotenv()


True

In [2]:
!mkdir -p './paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O './paul_graham/paul_graham_essay.txt'

--2025-04-15 22:05:26--  https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 2606:50c0:8002::154, 2606:50c0:8003::154, 2606:50c0:8001::154, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|2606:50c0:8002::154|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 75042 (73K) [text/plain]
Saving to: ‘./paul_graham/paul_graham_essay.txt’


2025-04-15 22:05:26 (4.72 MB/s) - ‘./paul_graham/paul_graham_essay.txt’ saved [75042/75042]



In [3]:
from llama_index.core import SimpleDirectoryReader


# load documents
documents = SimpleDirectoryReader('./paul_graham/').load_data()
len(documents)

1

In [4]:
from llama_index.core.node_parser import SentenceSplitter

# Initialize the sentence splitter with desired parameters
node_parser = SentenceSplitter(chunk_size=512, chunk_overlap=64)

# Assuming 'documents' is a list of Document objects
nodes = node_parser.get_nodes_from_documents(documents)


In [6]:
len(nodes[0].text.split())


355
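
The first node holds 355 whitespace-separated words, well under `chunk_size=512`. The most likely reason is that `SentenceSplitter` measures `chunk_size` in tokens rather than words, and that it prefers to break at sentence boundaries. The sketch below is not the library's algorithm; it is a simplified sliding window over words, with a hypothetical helper name `chunk_words`, meant only to show how a size limit and an overlap interact.

```python
def chunk_words(words, chunk_size, chunk_overlap):
    """Split a list of words into windows of at most chunk_size items.

    Consecutive windows share chunk_overlap items. This is a toy stand-in,
    not how SentenceSplitter works internally.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + chunk_size])
        # Stop once the current window has reached the end of the input.
        if start + chunk_size >= len(words):
            break
    return chunks


words = [f"w{i}" for i in range(10)]
chunks = chunk_words(words, chunk_size=4, chunk_overlap=1)
for chunk in chunks:
    print(chunk)
```

Each window starts `chunk_size - chunk_overlap` items after the previous one, so the last item of one chunk reappears as the first item of the next.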

In [7]:
from llama_index.vector_stores.deeplake import DeepLakeVectorStore

my_activeloop_id = "charanvardhan"
my_activeloop_dataset = "LlamaIndex_paulgraham_essay"
dataset_path = f"hub://{my_activeloop_id}/{my_activeloop_dataset}"

# Create a DeepLake vector store
# (requires the deeplake package: pip install deeplake[enterprise])
vector_store = DeepLakeVectorStore(
    dataset_path=dataset_path,
    overwrite=False,
)

In [8]:
from llama_index.core import StorageContext

storage_context = StorageContext.from_defaults(vector_store=vector_store)
storage_context.docstore.add_documents(nodes)

In [42]:
all_node_ids = list(storage_context.docstore.docs.keys())
print(f"Total nodes: {len(all_node_ids)}")
print("First 5 node IDs:", all_node_ids[:5])


Total nodes: 42
First 5 node IDs: ['ce233fec-43f7-4485-a6c2-11c4ebb3711b', '73dbc5e2-1125-4abc-bcb6-156b2eadef87', '273cc0b9-bc0c-4faf-9da6-939b79ca0cac', 'b1e4ada7-9b66-47f0-a531-679d34f6a674', '5fa72a97-3b4d-4730-95d8-b6b8721d3170']


In [45]:
all_nodes = [storage_context.docstore.get_node(node_id) for node_id in all_node_ids]
all_nodes[0].text


'What I Worked On\n\nFebruary 2021\n\nBefore college the two main things I worked on, outside of school, were writing and programming. I didn\'t write essays. I wrote what beginning writers were supposed to write then, and probably still are: short stories. My stories were awful. They had hardly any plot, just characters with strong feelings, which I imagined made them deep.\n\nThe first programs I tried writing were on the IBM 1401 that our school district used for what was then called "data processing." This was in 9th grade, so I was 13 or 14. The school district\'s 1401 happened to be in the basement of our junior high school, and my friend Rich Draves and I got permission to use it. It was like a mini Bond villain\'s lair down there, with all these alien-looking machines — CPU, disk drives, printer, card reader — sitting up on a raised floor under bright fluorescent lights.\n\nThe language we used was an early version of Fortran. You had to type programs on punch cards, then stack

In [9]:
from llama_index.core import VectorStoreIndex

vector_index = VectorStoreIndex(nodes, storage_context=storage_context)

In [10]:
query_engine = vector_index.as_query_engine(streaming=True, similarity_top_k=10)

In [11]:
streaming_response = query_engine.query(
    "What does Paul Graham do?",
)
streaming_response.print_response_stream()

Paul Graham is involved in funding startups through a program called the Summer Founders Program, where he invests in and supports young entrepreneurs. He also organizes talks with experts on startups, provides funding to selected groups of founders, and offers guidance and resources to help them succeed. Additionally, he is involved in creating a community of startup founders through his program, which fosters collaboration and support among the participants.
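
`similarity_top_k=10` asks the retriever for the ten nodes whose embeddings are closest to the query embedding. The following is a minimal sketch of that idea with toy two-dimensional vectors. Cosine similarity is an assumption here (the metric actually used depends on the vector store configuration), and the names `cosine_similarity`, `top_k`, and `toy_embeddings` are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, node_vecs, k):
    """Return the k (node_id, score) pairs with the highest similarity."""
    scored = [
        (node_id, cosine_similarity(query_vec, vec))
        for node_id, vec in node_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings standing in for real node vectors (illustrative values).
toy_embeddings = {
    "node_a": [1.0, 0.0],
    "node_b": [0.7, 0.7],
    "node_c": [0.0, 1.0],
}
results = top_k([1.0, 0.1], toy_embeddings, k=2)
print([node_id for node_id, _ in results])
```

A larger `k` gives the LLM more context to synthesize from, at the cost of a longer prompt and a higher chance of including loosely related chunks.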

## SubQuestion Query Engine

In [12]:
query_engine = vector_index.as_query_engine(similarity_top_k=10)

In [13]:
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.query_engine import SubQuestionQueryEngine

# Wrap the base query engine as a tool the sub-question engine can call
query_engine_tools = [
    QueryEngineTool(
        query_engine=query_engine,
        metadata=ToolMetadata(
            name="pg_essay",
            description="Paul Graham's essay 'What I Worked On'",
        ),
    ),
]

query_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=query_engine_tools,
    use_async=True,
)

In [57]:
query_engine

<llama_index.core.query_engine.sub_question_query_engine.SubQuestionQueryEngine at 0x11b1d9c90>

In [16]:
import nest_asyncio
nest_asyncio.apply()
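
`nest_asyncio.apply()` is presumably needed because `use_async=True` makes the engine drive an asyncio event loop, while Jupyter already has one running. The sketch below uses only the standard library to show the restriction that `nest_asyncio` patches around. The `outer` coroutine is a hypothetical stand-in for a notebook cell, not part of the notebook.

```python
import asyncio


async def inner():
    return "done"


async def outer():
    # We are already inside a running event loop here, like a notebook cell.
    coro = inner()
    try:
        asyncio.run(coro)  # nested use is rejected by plain asyncio
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return str(exc)
    return "no error"


message = asyncio.run(outer())
print(message)
```

Without the patch, the nested call fails with a `RuntimeError`. With `nest_asyncio` applied, the same kind of nested call is allowed to run.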


In [17]:
response = query_engine.query(
    "How was Paul Graham's life different before, during, and after YC?"
)

Generated 3 sub questions.
[pg_essay] Q: What did Paul Graham work on before Y Combinator?
[pg_essay] Q: What did Paul Graham work on during Y Combinator?
[pg_essay] Q: What did Paul Graham work on after Y Combinator?
[pg_essay] A: Before Y Combinator, Paul Graham worked on a new version of Arc with Robert in the summer of 2006. They created a faster version of Arc by compiling it into Scheme. As a test for this new Arc, Paul Graham wrote Hacker News, originally intended as a news aggregator for startup founders but later changed to cater to future startup founders and cover topics that engaged intellectual curiosity.
[pg_essay] A: During Y Combinator, Paul Graham worked on various projects such as organizing a Summer Founders Program, investing in startups, and creating a news aggregator called Hacker News.
[pg_essay] A: After Y Combin

In [18]:
print("The final response :\n", response)

The final response :
 Paul Graham's life involved working on a new version of Arc with Robert before Y Combinator, during which they created a faster version of Arc by compiling it into Scheme and developed Hacker News. During Y Combinator, he focused on organizing a Summer Founders Program, investing in startups, and creating Hacker News. After Y Combinator, he continued working on a new version of Arc with Robert, compiling it into Scheme, and using it to create Hacker News, which evolved to cater to future startup founders and topics of intellectual curiosity.
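
The log above follows a decompose, answer, synthesize pattern: the question is split into sub-questions, each is sent to a tool, and the answers are combined. The following is a rough sketch of the first two steps under stated assumptions. `decompose` and `answer_with_tool` are hypothetical stand-ins for LLM and query-engine calls, not LlamaIndex functions, and the real engine's internals may differ.

```python
import asyncio


def decompose(question):
    # Hypothetical stand-in for the LLM step that generates sub-questions.
    return [
        f"What did Paul Graham work on {phase} Y Combinator?"
        for phase in ("before", "during", "after")
    ]


async def answer_with_tool(tool_name, sub_question):
    # Hypothetical stand-in for querying the tool's underlying query engine.
    await asyncio.sleep(0)
    return f"[{tool_name}] answer to: {sub_question}"


async def run_sub_questions(question, tool_name):
    sub_questions = decompose(question)
    # Run all sub-questions concurrently; gather preserves input order.
    answers = await asyncio.gather(
        *(answer_with_tool(tool_name, q) for q in sub_questions)
    )
    return list(zip(sub_questions, answers))


pairs = asyncio.run(
    run_sub_questions(
        "How was Paul Graham's life different before, during, and after YC?",
        "pg_essay",
    )
)
for sub_question, answer in pairs:
    print(sub_question, "->", answer)
```

In the real engine, a final LLM call would turn the collected question and answer pairs into a single response. That step is omitted here.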
