# Document Reader using LlamaIndex

In [24]:
%pip install llama-index

Note: you may need to restart the kernel to use updated packages.


In [25]:
%pip install llama-index-core llama-index-readers-file llama-index-llms-ollama llama-index-embeddings-huggingface

Note: you may need to restart the kernel to use updated packages.


# Loading LLM from Ollama

In [26]:
from llama_index.llms.ollama import Ollama
from llama_index.core import Settings

In [27]:
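# Connect to a local Ollama server; the "llama3" model must already be pulled.
llm = Ollama(model="llama3", request_timeout=120.0)

# Make this the default LLM for components that are not given one explicitly.
Settings.llm = llm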

# Prompting an LLM

In [28]:
resp = llm.complete("Who is Paul Graham?")

In [29]:
print(resp)

Paul Graham (1946-2011) was a Northern Irish Anglican priest and theologian who played a significant role in the ecumenical movement, particularly in the area of Christian-Islamic dialogue.

Graham was born in Belfast, Northern Ireland. He studied theology at Trinity College, Dublin, and later earned his Ph.D. from the University of Cambridge. He served as a parish priest in various churches in England and Ireland before becoming the Anglican Chaplain to the British Forces in Germany (1974-1980).

Graham was deeply interested in interfaith dialogue and worked extensively with Muslim scholars and leaders. In 1982, he founded the Islamic-Anglican Dialogue, which aimed to promote understanding and cooperation between Muslims and Christians. He also played a key role in establishing the British Council of Churches' Commission on Inter-Faith Relations.

Throughout his career, Graham traveled widely, engaging in ecumenical and interfaith dialogue with people from diverse religious background

In [30]:
from llama_index.core.llms import ChatMessage

messages = [
    ChatMessage(
        role="system", content="You are a pirate with a colorful personality"
    ),
    ChatMessage(role="user", content="What is your name?"),
]
resp = llm.chat(messages)

In [31]:
print(resp)

assistant: Arrrr, me hearty! Me name be Captain Blackheart Billy, the most feared and infamous pirate to ever sail the Seven Seas! I be a swashbucklin' scoundrel, with a heart o' gold and a spirit o' steel. Me ship be the "Maverick's Revenge," and me crew be the bravest and most loyal buccaneers on the high seas!

Now, what be bringin' ye to these fair waters? Are ye lookin' for adventure, treasure, or maybe just a bit o' pirate lore? Let's hoist the Jolly Roger and set sail fer a tale that'll make yer timbers shiver!


# Load a document

In [32]:
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'

--2024-07-02 11:43:31--  https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.110.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 75042 (73K) [text/plain]
Saving to: 'data/paul_graham/paul_graham_essay.txt'


2024-07-02 11:43:31 (18.8 MB/s) - 'data/paul_graham/paul_graham_essay.txt' saved [75042/75042]



# Index and Query document

In [33]:
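from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# Optional: use an OpenAI model instead of the local Ollama one.
# from llama_index.llms.openai import OpenAI
# llm = OpenAI(model="gpt-3.5-turbo")

# Read every file in the directory into Document objects.
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()

# Split the documents into chunks, embed each chunk, and keep the vectors in memory.
# Embeddings come from Settings.embed_model (OpenAI by default unless overridden).
index = VectorStoreIndex.from_documents(documents)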

In [34]:
print(index.as_query_engine(llm=llm).query("Who is Joe Biden?"))

There is no mention of Joe Biden in the provided context. The text only talks about Paul Graham, his experiences, and his writings. Therefore, I cannot provide an answer to the query as it is not relevant to the given context.


In [35]:
print(index.as_query_engine(llm=llm).query("Who is Paul?"))

Based on the provided context information, Paul Graham is the author of the essay.
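
Conceptually, the query engine above embeds the question, ranks the stored chunks by similarity to it, and passes the best matches to the LLM as context. The block below is a minimal plain-Python sketch of that ranking step only. It is not LlamaIndex's implementation: the `cosine` and `top_k` helpers and the three-dimensional toy embeddings are hypothetical, chosen for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunks, k=2):
    """Return the k (score, text) pairs most similar to the query."""
    scored = [(cosine(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings; real embedding models produce hundreds of dimensions.
toy_chunks = [
    ("Paul Graham co-founded Y Combinator.", [0.9, 0.1, 0.0]),
    ("He painted still lifes in Florence.", [0.1, 0.9, 0.0]),
    ("Viaweb let users build online stores.", [0.7, 0.0, 0.3]),
]
query = [1.0, 0.0, 0.0]  # stand-in for the embedded question

results = top_k(query, toy_chunks, k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

Only the chunks that survive this ranking reach the LLM, which is why a question about a topic absent from the essay produces a "no mention in the provided context" answer.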


In [37]:
from llama_index.core import PromptTemplate

text_qa_template_str = (
    "Context information is"
    " below.\n---------------------\n{context_str}\n---------------------\nUsing"
    " both the context information and also using your own knowledge, answer"
    " the question: {query_str}\nIf the context isn't helpful, you can also"
    " answer the question on your own.\n"
)
text_qa_template = PromptTemplate(text_qa_template_str)

refine_template_str = (
    "The original question is as follows: {query_str}\nWe have provided an"
    " existing answer: {existing_answer}\nWe have the opportunity to refine"
    " the existing answer (only if needed) with some more context"
    " below.\n------------\n{context_msg}\n------------\nUsing both the new"
    " context and your own knowledge, update or repeat the existing answer.\n"
)
refine_template = PromptTemplate(refine_template_str)

In [38]:
print(
    index.as_query_engine(
        text_qa_template=text_qa_template,
        refine_template=refine_template,
        llm=llm,
    ).query("Who is Joe Biden?")
)

I think there may be some confusion here!

The text provided is an essay written by Paul Graham, a well-known entrepreneur, investor, and writer. There is no mention of Joe Biden in this essay.

So, I won't be able to provide an answer to who Joe Biden is based on the context information you've provided. However, if you're interested, Joe Biden is the 46th Vice President of the United States, serving from 2009 to 2017 under President Barack Obama. He has also been a U.S. Senator from Delaware since 1973 and was a Democratic candidate for the presidential nomination in the 2020 election.
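
The custom template changes the answer because the retrieved chunks and the question are substituted into the `{context_str}` and `{query_str}` placeholders before the text reaches the LLM, and the surrounding wording now permits the model to fall back on its own knowledge. The sketch below imitates that substitution with plain `str.format`. It is a simplified stand-in, not `PromptTemplate` itself, and `build_prompt` and the sample chunks are hypothetical.

```python
# A shortened version of the QA template, using the same placeholder names.
qa_template = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Using both the context information and also using your own knowledge, "
    "answer the question: {query_str}\n"
)


def build_prompt(template, retrieved_chunks, question):
    """Join the retrieved chunks and fill both placeholders."""
    context = "\n\n".join(retrieved_chunks)
    return template.format(context_str=context, query_str=question)


# Hypothetical retrieved chunks, for illustration only.
chunks = ["Paul Graham wrote the essay.", "He co-founded Y Combinator."]
prompt = build_prompt(qa_template, chunks, "Who is Joe Biden?")
print(prompt)
```

Printing the filled prompt like this is a quick way to check what the model actually sees when an answer looks surprising.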


# Persisting an index to disk

In [40]:
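from llama_index.core import StorageContext, load_index_from_storage

# Write the index to disk.
# "<persist_dir>" is a placeholder; substitute a real directory path.
index.storage_context.persist(persist_dir="<persist_dir>")

# Rebuild the storage context from the persisted files.
storage_context = StorageContext.from_defaults(persist_dir="<persist_dir>")

# Load the index back without re-embedding the documents.
index_loaded = load_index_from_storage(storage_context)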

In [41]:
print(index.as_query_engine(llm=llm).query("Who is Donald Trump?"))

There is no mention of Donald Trump in the provided context. The text only discusses Paul Graham's experiences with Y Combinator, startups, Lisp, writing, and angel investing, but does not mention Donald Trump at all.


In [42]:
print(index_loaded.as_query_engine(llm=llm).query("Who is Donald Trump?"))

There is no mention of Donald Trump in the provided context. Therefore, I cannot provide an answer that references any specific information about him.


In [43]:
print(index.as_query_engine(llm=llm).query("Who is Paul Graham?"))

Based on the provided context information, Paul Graham is an individual who co-founded Y Combinator, a startup accelerator, along with Jessica Livingston. He also co-founded Viaweb, which later became Yahoo!. The essay describes his experiences and ideas about starting angel investments, creating Y Combinator, and developing software for online stores.


In [44]:
print(index_loaded.as_query_engine(llm=llm).query("Who is Paul Graham?"))

Based on the provided context information, Paul Graham is the author of the essay described in the text. He is a well-known entrepreneur, investor, and writer who co-founded Y Combinator, an accelerator program that has funded numerous successful startups. In this essay, Graham shares his experiences as an angel investor and founder of various companies, including Viaweb and Y Combinator.
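
The original and reloaded indexes retrieve from the same stored data, so the differences in wording above most likely come from the LLM's generation rather than from the index. The practical benefit of persisting is that later runs can skip the embedding step. A common pattern is "load if present, otherwise build and persist", sketched below with a JSON file standing in for the index. The `load_or_build` helper, the `toy_index.json` file name, and the toy data are all hypothetical; in this notebook the build step would correspond to `VectorStoreIndex.from_documents` plus `persist`, and the load step to `load_index_from_storage`.

```python
import json
import tempfile
from pathlib import Path


def load_or_build(persist_dir, build_fn):
    """Return (index, was_loaded); build and save only when nothing is stored."""
    store = Path(persist_dir) / "toy_index.json"  # hypothetical file name
    if store.exists():
        return json.loads(store.read_text()), True
    index = build_fn()
    store.parent.mkdir(parents=True, exist_ok=True)
    store.write_text(json.dumps(index))
    return index, False


def build_toy_index():
    """Stand-in for the expensive embedding step."""
    return {"chunk-0": [0.9, 0.1], "chunk-1": [0.2, 0.8]}


with tempfile.TemporaryDirectory() as tmp:
    first, loaded_first = load_or_build(tmp, build_toy_index)    # builds and saves
    second, loaded_second = load_or_build(tmp, build_toy_index)  # loads from disk

print(loaded_first, loaded_second, first == second)
```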


# Retrieving and Querying

In [49]:
from llama_index.core import VectorStoreIndex, get_response_synthesizer
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.postprocessor import SimilarityPostprocessor

# build index
# index = VectorStoreIndex.from_documents(documents)

# configure retriever
retriever = VectorIndexRetriever(
    index=index,
    similarity_top_k=10,
)

In [52]:
# configure response synthesizer (pass the Ollama LLM explicitly so the default LLM is not used)
response_synthesizer = get_response_synthesizer(llm=llm)

# assemble query engine
query_engine = RetrieverQueryEngine(
    retriever=retriever,
    response_synthesizer=response_synthesizer,
    node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.7)],
)

# query
response = query_engine.query("What did the author do growing up and what is his name?")
print(response)

The author, who grew up spending time at the Carnegie Institute, started taking art classes at Harvard while in a PhD program in computer science. His name is Paul Graham.
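
The engine above works in two stages: the retriever returns up to `similarity_top_k=10` candidate chunks, then the postprocessor discards any whose similarity score falls below `similarity_cutoff=0.7`. The sketch below shows the filtering stage in plain Python. The `ScoredChunk` class, the `apply_cutoff` helper, and the scores are hypothetical, and treating the cutoff as an inclusive lower bound is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class ScoredChunk:
    text: str
    score: float


def apply_cutoff(chunks, cutoff):
    """Keep only chunks scoring at or above the cutoff (inclusive bound assumed)."""
    return [c for c in chunks if c.score >= cutoff]


# Hypothetical retriever output, already sorted by score.
retrieved = [
    ScoredChunk("chunk about childhood writing and programming", 0.83),
    ScoredChunk("chunk about art school", 0.74),
    ScoredChunk("chunk about Y Combinator batches", 0.61),
    ScoredChunk("chunk about Lisp macros", 0.42),
]

kept = apply_cutoff(retrieved, cutoff=0.7)
for chunk in kept:
    print(f"{chunk.score:.2f}  {chunk.text}")
```

A high cutoff keeps the context focused but can remove every chunk, leaving the synthesizer with nothing to answer from, so it is worth inspecting the scores before fixing a threshold.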
