# About
Here, I build a question-answering (Q&A) LLM chain that reads H. G. Wells's novella "The Time Machine" and answers questions about it.

The LLM I used is the `mistral 7B` model (Ollama ID `61e88e884507`).

# Settings

### Packages

In [1]:
# Langchain related 
from langchain import PromptTemplate
from langchain.document_loaders import GutenbergLoader

from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter


from langchain.chains import RetrievalQA
from langchain.vectorstores import Chroma

from langchain.embeddings import HuggingFaceEmbeddings
from langchain.llms import HuggingFacePipeline
from langchain.llms import HuggingFaceHub


from langchain.llms import Ollama
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler


import os
import sys

sys.path.append('../')

# put the Hugging Face API token in a .py file (../config/token_access.py) and load it
from config import token_access

# set HUGGINGFACEHUB_API_TOKEN to my token_access
os.environ["HUGGINGFACEHUB_API_TOKEN"] = token_access.token_access


### Functions

In [2]:
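class SuppressStdout:
    """Context manager that silences stdout and stderr inside the `with` block."""

    def __enter__(self):
        self._original_stdout = sys.stdout
        self._original_stderr = sys.stderr
        # send both streams to the null device
        sys.stdout = open(os.devnull, 'w')
        sys.stderr = open(os.devnull, 'w')
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        # close both devnull handles before restoring the real streams
        sys.stdout.close()
        sys.stderr.close()
        sys.stdout = self._original_stdout
        sys.stderr = self._original_stderr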


### Variables that require changes

In [3]:
#----------------#
# data related
#----------------#
# label used to build the author-specific data path
author="HGWells"

# book txt file link
book_link ="https://www.gutenberg.org/cache/epub/35/pg35.txt"

#----------------#
# model related
#----------------#
# embeddings label (used below only to name the embeddings directory)
embeddings_model_id = "huggingface_embeddings"

# llm model used for reading the book
llm_model_id ="mistral"

#----------------#
# your project root path 
#----------------#
main_Dir = "../book-reader"


### Directories

In [4]:
#----------------#
# data dir
#----------------#
data_Dir = os.path.join(main_Dir,"data")

# embeddings dir name (path-safe version of the embeddings label)
embedding_string = embeddings_model_id.replace("/", "_").replace("-", "_")

# directory for the embeddings cache and the persisted Chroma index
embedding_Dir = os.path.join(data_Dir, f"{author}_{embedding_string}")

#----------------#
# model dir
#----------------#
model_Dir= os.path.join(main_Dir,"model")
cache_Dir = os.path.join(model_Dir,"cache")


#----------------#
# make dirs
#----------------#
for f in [data_Dir, embedding_Dir , model_Dir, cache_Dir]:
    os.makedirs(f, exist_ok=True)


# Read a book 
### "The Time Machine" by H. G. Wells, from Project Gutenberg

In [5]:
# load the book 
book_loader = GutenbergLoader(book_link)  

document = book_loader.load()

## Creating a QA LLM Chain
This chain will be used to do QA on the document. We will need
 * A vector database that can perform document retrieval
 * An LLM to do the language interpretation
 * A prompt template that specifies how the LLM should use the retrieved text

### A vector database that can perform document retrieval
 * split the book into chunks
 * embed the chunks with a Hugging Face embedding model and store them in the vector store
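
To make the chunking step concrete, here is a minimal sketch of fixed-size character chunking. It is not what `RecursiveCharacterTextSplitter` does internally (that splitter tries to break on separators such as paragraphs and sentences first); the function name `split_into_chunks` is a hypothetical helper, and only the `chunk_size` and `chunk_overlap` parameter names are borrowed from the cell below.

```python
def split_into_chunks(text: str, chunk_size: int = 500, chunk_overlap: int = 0) -> list[str]:
    """Cut text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    Hypothetical helper for illustration only.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


# opening sentence of the novella, used here as sample input
sample = (
    "The Time Traveller (for so it will be convenient to speak of him) "
    "was expounding a recondite matter to us."
)
chunks = split_into_chunks(sample, chunk_size=40, chunk_overlap=0)
```

With `chunk_overlap=0`, as in this notebook, the chunks do not share any text, so a sentence cut at a chunk boundary is split between two chunks. A small overlap is one way to soften that, at the cost of storing some text twice.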

In [6]:
# split the book into chunks of at most 500 characters, with no overlap
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
texts = text_splitter.split_documents(document)

# create embeddings
embeddings = HuggingFaceEmbeddings(
  cache_folder=embedding_Dir
)

# Build the index with Chroma and the embedding model.
with SuppressStdout():   
    chromadb_index = Chroma.from_documents(documents=texts, 
                                           embedding=embeddings, 
                                           persist_directory= embedding_Dir)
    




### An LLM which does interpretation

In [7]:
# using Ollama 
hf_llm = Ollama(model=llm_model_id, 
                callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]))

### Template

In [8]:
# Prompt
template = """Use the following pieces of context to answer the question at the end. 
If you don't know the answer, just say that you don't know, don't try to make up an answer. 
Use three sentences maximum and keep the answer as concise as possible. 
{context}
Question: {question}
Helpful Answer:"""

QA_CHAIN_PROMPT = PromptTemplate(
    input_variables=["context", "question"],
    template=template,
)


In [9]:
# Convert the index to a retriever: a wrapper around the vector database
# that searches the vector store for chunks similar to a query and returns them
retriever = chromadb_index.as_retriever()

# QA chain: combines the retriever, the LLM, and the prompt
qa_chain = RetrievalQA.from_chain_type(
    hf_llm,
    retriever=retriever,
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT},
)
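
What the chain does with a question can be sketched roughly as "retrieve the most similar chunks, paste them into the prompt, send the prompt to the LLM". The sketch below imitates the first two steps in plain Python. It is an illustration under loud assumptions: word-count vectors stand in for the real Hugging Face embeddings, the toy chunks are made-up sentences rather than text retrieved from the book, the template is shortened, and `bag_of_words`, `cosine`, `retrieve`, and `build_prompt` are hypothetical helpers, not LangChain functions.

```python
import math
import re
from collections import Counter

# shortened version of the notebook's prompt template
TEMPLATE = """Use the following pieces of context to answer the question at the end.
{context}
Question: {question}
Helpful Answer:"""


def bag_of_words(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = bag_of_words(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, bag_of_words(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, chunks: list[str], k: int = 2) -> str:
    """Paste the retrieved chunks into the template as the context."""
    context = "\n\n".join(retrieve(question, chunks, k))
    return TEMPLATE.format(context=context, question=question)


# made-up toy chunks, for illustration only
toy_chunks = [
    "Filby was an argumentative person with red hair.",
    "The Morlocks lived underground and feared the light.",
    "The Eloi ate only fruit.",
]
prompt = build_prompt("Who is Filby?", toy_chunks, k=1)
print(prompt)
```

The point of the sketch is that the LLM only ever sees the retrieved chunks, not the whole book, so the answer can be no better than what the retriever finds for the exact wording of the question.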


# Q&A

### Did OK

In [10]:
question = "What is the name of the Time Traveller in the novella?"
answer = qa_chain({"query": question})

 The Time Traveller is the name given to the main character in H.G. Wells' novella "The Time Machine."

In [11]:
question = "Who is the author of Time Machine?"
answer = qa_chain({"query": question})
display(answer)

 H. G. Wells wrote "The Time Machine."

{'query': 'Who is the author of Time Machine?',
 'result': ' H. G. Wells wrote "The Time Machine."'}

In [12]:
question = "How do Eloi look like?"
answer = qa_chain({"query": question})

 The Eloi are described as mere fatted cattle, indicating they have a stocky and possibly unintelligent appearance. They are contrasted with the ant-like Morlocks who prey upon them.

In [13]:
question = "How do Morlock look like?"
answer = qa_chain({"query": question})

 The text describes the Morlocks as pale-bodied beings, evoking a sense of unease and shrinking in the narrator. However, it does not provide a detailed description of their appearance.

In [14]:
question = "In this novella, what is the fourth dimension?"
answer = qa_chain({"query": question})

 The fourth dimension, as some people misunderstand it, is not a distinct spatial dimension but an alternative way of perceiving time.

In [15]:
question = "Who is Filby?"
answer = qa_chain({"query": question})

 Filby is a character in the given text who attempted to share an anecdote about a conjuror he had seen at Burslem, but was interrupted before finishing by the Time Traveller's return.

In [16]:
question = "What scientific principle does the Time Traveler's visit to the decaying world thirty millions in the future illustrate?"
answer = qa_chain({"query": question})

 The Time Traveler's visit to the decaying world thirty millions in the future illustrates the scientific principle of time and its effect on matter.

In [17]:
question = "Where is the Time Machine held?"
answer = qa_chain({"query": question})

 The Time Machine is inside a pedestal, which has no handles or keyholes for exterior access. Its location and how it got there remain a mystery.

In [18]:
question = "Who was the inventor of time machine in the novella?"
answer = qa_chain({"query": question})

 The inventor of the Time Machine in the novella is referred to as "I" by the speaker. However, the identity of the speaker is not explicitly stated in the context provided.

### Did not do well (questions were not phrased properly)

In [19]:
question = "Can you please summarize the novella?"
answer = qa_chain({"query": question})

 The narrator rushes through a dining hall under moonlight, knocking down one of the people and causing terror. They do not recall everything they did during this incident but believe their unexpected loss maddened them and left them feeling isolated from their kind.

In [20]:
question = "Who wrote this book?"
answer = qa_chain({"query": question})

 The text does not provide information about who wrote the book.

In [21]:
question = "What temporarily blinds the Morlocks?"
answer = qa_chain({"query": question})

 The text does not provide information on what temporarily blinds the Morlocks.

In [22]:
question = "What do the Eloi eat?"
answer = qa_chain({"query": question})

 The Eloi are likely used for breeding and then consumed by the ant-like Morlocks. They are described as "mere fatted cattle."

In [23]:
question = "Who invented the Time Machine?"
answer = qa_chain({"query": question})

 H. G. Wells wrote and invented the concept of the Time Machine in his science fiction novel published in 1895.
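
One way to check why a question did poorly is to look at the chunks the retriever returned for it. The sketch below assumes the retriever exposes `get_relevant_documents(query)` and that each returned document has a `page_content` attribute, which I believe matches the LangChain version imported in this notebook (newer releases may prefer `invoke`). `preview_retrieved` is a hypothetical helper, and `FakeDoc` and `FakeRetriever` are stand-ins so the block runs without the index.

```python
from dataclasses import dataclass


def preview_retrieved(retriever, question: str, max_chars: int = 200) -> list[str]:
    """Return the start of each chunk the retriever finds for the question."""
    docs = retriever.get_relevant_documents(question)
    # keep each preview on one line so the chunks are easy to scan
    return [doc.page_content[:max_chars].replace("\n", " ") for doc in docs]


@dataclass
class FakeDoc:
    """Stand-in for a retrieved document (illustration only)."""
    page_content: str


class FakeRetriever:
    """Stand-in retriever that always returns the same made-up chunk."""

    def get_relevant_documents(self, query: str) -> list[FakeDoc]:
        return [FakeDoc("A chunk of text\nfrom the book.")]


previews = preview_retrieved(FakeRetriever(), "Who wrote this book?", max_chars=20)
for snippet in previews:
    print(snippet)
```

In the notebook itself, the call would be `preview_retrieved(retriever, "Who wrote this book?")`. If none of the returned chunks mention the author, the answer "The text does not provide information" follows directly from the prompt's instruction not to make things up.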