In [1]:
from langchain.document_loaders import PyPDFLoader

loader = PyPDFLoader("literature/book.pdf")
pages = loader.load_and_split()

In [2]:
sources = pages

In [6]:
print(pages[10])

page_content='thing in the world that these people did not just live a thousand years ago; they had lived \nfor a thousand years, taking in everything that had passed before their eyes until their gaze \nheld the quintessence of time —the heavy slow suspension of a millennium, clots and \ncrystals of time squeezed together like tightly pressed atomic nuclei. And I was before them, \na mortal, barely nineteen. I wasn’t even a woman yet to tell the truth (it was soon after that \nI became one), so perhaps mine was the kind of revelation classified in theological literature \nas a maiden’s vision or some such. At any rate, never again, in any of the ancient holy \nplaces —not at the Athenian Parthenon, nor on the bare site of the Jerusalem Temple, nor in \nthe Garden of Gethsemane —would those, visible or invisible, who persist in a place for \nages, welcome me as one of their own. All I ever felt again —even when I managed to be \nalone with them —was their purposeful wariness, not menac

In [3]:
from dotenv import load_dotenv
load_dotenv()

True

In [5]:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.text_splitter import CharacterTextSplitter
from langchain.docstore.document import Document
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate

In [6]:
embeddings = OpenAIEmbeddings()

In [7]:
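# Re-split every page into space-delimited chunks of at most ~2024 characters,
# keeping the page metadata (source file, page number) on each chunk.
source_chunks = []
splitter = CharacterTextSplitter(separator=" ", chunk_size=2024, chunk_overlap=0)
for source in sources:
    for chunk in splitter.split_text(source.page_content):
        source_chunks.append(Document(page_content=chunk, metadata=source.metadata))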
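
To make the chunking step easier to reason about, here is a minimal pure-Python sketch of greedy, space-delimited splitting. It is not LangChain's implementation: the helper name `split_on_spaces` and the sample string are assumptions made for illustration, and the real `CharacterTextSplitter` may differ in details such as whitespace stripping and overlap handling.

```python
from __future__ import annotations


def split_on_spaces(text: str, chunk_size: int) -> list[str]:
    """Greedily pack space-separated words into chunks of at most chunk_size characters.

    A single word longer than chunk_size is kept whole in its own chunk.
    """
    chunks: list[str] = []
    current = ""
    for word in text.split(" "):
        candidate = word if not current else current + " " + word
        if len(candidate) <= chunk_size or not current:
            # The word still fits (or the chunk is empty): keep growing.
            current = candidate
        else:
            # The word would overflow: close the chunk and start a new one.
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


# Hypothetical sample text and a tiny chunk size, so the effect is visible.
sample = "the heavy slow suspension of a millennium"
chunks = split_on_spaces(sample, chunk_size=16)
print(chunks)
```

With no overlap, joining the chunks with a single space gives back the original text, so nothing is lost or duplicated between neighbouring chunks.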

In [8]:
search_index = Chroma.from_documents(source_chunks, embeddings)
retriever = search_index.as_retriever()

# Similarity with pages

In [18]:
docs = search_index.similarity_search("Description of Nusya?", k=20)
for doc in docs:
    print(str(doc.metadata["page"]) + ":", doc.page_content[:500])

71: The fact that she now approaches —in the same gliding walk instilled by Lviv’s most 
glorious tuteurs (only the legs, my lady, we’re moving only our legs!) beside Nusya, his 
regular courier, and carries toward him that serene, unattainable smile of hers like a 
discrete source of light in the November cityscape —is equivalent to the heavens collapsing 
in pieces onto the earth below —he wouldn’t blink an eye if they did collapse. Snow falls and 
carries the smell of her hair, the dizzyingly ten
127: Communist Party built for its own spawn right before the ol’ USSR’s demise, because their 
old place on Rylski Street, also a not -too-shabby Secession villa with lions, which Gucci 
Nastya and I happen to be passing right now, was getting uncomfortably crowded for the 
Party’s lush cabbage patch.  
The villa is now a bank and the lions have to sit encircled by pink granite, as if in the 
middle of a skating rink. I ask the future Ukrainian journalist if she knows what this 
building h
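
The search above ranks chunks by how close their embedding vectors are to the embedding of the query. The sketch below illustrates that idea with plain Python and cosine similarity. The three-dimensional vectors and page labels are made up for the example, and cosine similarity is an assumption: the metric Chroma actually applies depends on how the collection is configured.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed, k):
    """Return the labels of the k vectors most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), label) for label, vec in indexed]
    scored.sort(reverse=True)  # highest similarity first
    return [label for _, label in scored[:k]]


# Toy "embeddings" (hypothetical values, not real OpenAI embeddings).
toy_index = [
    ("page 71", [0.9, 0.1, 0.0]),
    ("page 127", [0.2, 0.8, 0.1]),
    ("page 10", [0.0, 0.1, 0.9]),
]
toy_query = [1.0, 0.2, 0.0]

nearest = top_k(toy_query, toy_index, k=2)
print(nearest)
```

Asking for a large `k`, as in the cell above, returns progressively less similar chunks, which is why later hits can look only loosely related to the query.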

# Chat with AI about book

In [14]:
question = "How does the author describe the condition and appearance of the photos in the first chapter 'Room'?"

In [20]:
template = """Use the following pieces of context to answer the question at the end. 
You are Oksana Zabuzhko, the author of this book.
{context}
Question: {question}
Helpful Answer:"""
QA_CHAIN_PROMPT = PromptTemplate.from_template(template)

llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=retriever,
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT}
)
result = qa_chain({"query": question})
result["result"]

'Nusya is a character in the book who is described as a "little button of a girl" and a "nugget of a girl." She is a courier and is seen walking beside the protagonist, Adrian Ortynsky, with a serene smile. She is also described as being nervous and speaking with awkward Polish syntax.'
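
For intuition about what the chain sends to the model, the following sketch fills a prompt template the way a "stuff"-style chain does: the retrieved chunks are concatenated into `{context}` and the user's query goes into `{question}`. The helper `build_stuffed_prompt`, the shortened template, the placeholder chunk strings, and the blank-line separator are assumptions for illustration; this is not the library's internal code.

```python
# Shortened copy of a QA prompt with the two variables the chain fills in.
template = """Use the following pieces of context to answer the question at the end.
{context}
Question: {question}
Helpful Answer:"""


def build_stuffed_prompt(template, chunks, question):
    """Concatenate all retrieved chunks and substitute them into the template."""
    context = "\n\n".join(chunks)  # assumed separator between chunks
    return template.format(context=context, question=question)


# Placeholders standing in for retrieved chunk text (no real book content).
placeholder_chunks = ["<text of retrieved chunk 1>", "<text of retrieved chunk 2>"]
prompt = build_stuffed_prompt(template, placeholder_chunks, "Description of Nusya?")
print(prompt)
```

Because every retrieved chunk is pasted into a single prompt, the number of chunks multiplied by the chunk size has to fit within the model's context window.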

In [15]:
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
qa_chain = RetrievalQA.from_chain_type(llm, chain_type="stuff", retriever=retriever)
qa_chain({"query": question})

{'query': 'How are the main characters of the book?',
 'result': 'Based on the given context, it is difficult to determine the specific main characters of the book. However, there are several individuals mentioned, such as Adrian, the professor, the captain, and the two girls. Without more information, it is unclear who the main characters are and what their characteristics are.'}