### SIMPLE RAG PIPELINE

##### Data Ingestion/Extraction
Famous books: summary texts and details. Ref: https://www.sayebrand.com/blogs/stories/25famousbooks

In [1]:
# Data loading: load a single text file
# The explicit encoding avoids garbled apostrophes on platforms whose default encoding is not UTF-8
from langchain_community.document_loaders import TextLoader
loader = TextLoader("./books/THE BIBLE.txt", encoding="utf-8")  # TODO: implement a loop for the texts
text = loader.load()
text

[Document(metadata={'source': './books/THE BIBLE.txt'}, page_content='GENRE\n\nReligious Text.\n\nREAD IF\n\nYou enjoy philosophy and symbolism texts.\n\nFUN FACT\n\nOver 100 million copies of the Bible are sold each year.\n\nYes, The Bible is a book and is one of the most successful of all time and also one of the go-to’s on deep morality. The Bible is a collection of religious texts or scriptures that have become sacred to Christians, Jews, Samaritans, Rastafari and other religious groups.\n\nThe influence the Bible has had on Western culture is immeasurable. For thousands of years this book has inspired the greatest writers, artists, musicians, religious leaders, painters of our time. Love it or hate it, the bible has been one of the most pivotal books of all time in the western world.')]
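The loop mentioned in the comment could look roughly like the following. This is a minimal standard-library sketch, not LangChain's implementation: `load_text_files` and the record layout are hypothetical names chosen to mirror the `page_content` / `metadata` fields shown in the output. It also illustrates why passing an explicit encoding matters: UTF-8 bytes decoded as cp1252 turn a curly apostrophe into debris.

```python
import tempfile
from pathlib import Path


def load_text_files(directory, encoding="utf-8"):
    """Read every .txt file under `directory` into a list of simple records."""
    records = []
    for path in sorted(Path(directory).rglob("*.txt")):
        records.append({
            "page_content": path.read_text(encoding=encoding),
            "metadata": {"source": str(path)},
        })
    return records


# Demo on a throwaway directory, so the sketch does not depend on ./books existing
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "sample.txt").write_text("It’s a test.", encoding="utf-8")
    records = load_text_files(tmp)

# What goes wrong when UTF-8 bytes are decoded with the wrong codec
garbled = "go-to’s".encode("utf-8").decode("cp1252")
print(records[0]["page_content"])
print(garbled)
```

Reading with the same encoding the files were saved in keeps apostrophes and quotes intact in the loaded documents.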

In [3]:
# Loading data from a PDF document
from langchain_community.document_loaders import PyPDFLoader
loader = PyPDFLoader("path_to_PDF")
pdfs = loader.load()
pdfs

In [None]:
# Load data from the HTML pages of a website using the web-based loader
from langchain_community.document_loaders import WebBaseLoader
import bs4

loader = WebBaseLoader(
    web_paths=("url_here",),  # must be a sequence of URLs; the trailing comma makes this a tuple
    bs_kwargs=dict(parse_only=bs4.SoupStrainer(
        class_=("post-title", "post-content", "post-header")  # CSS classes of the HTML elements to extract
    )),
)

html_text = loader.load()
html_text

In [2]:
# Load every text file in the books directory.
# Note: set loader_cls explicitly; otherwise the default loader is used, which pulls in more dependencies.
from langchain_community.document_loaders import DirectoryLoader

loader = DirectoryLoader('./books/.', glob="**/*.txt", loader_cls=TextLoader,
                         loader_kwargs={"encoding": "utf-8"})  # read the files as UTF-8
docs = loader.load()
docs

[Document(metadata={'source': 'books\\1984.txt'}, page_content='GENRE\n\nDystopian social science fiction.\n\nREAD IF\n\nYou love reading about dystopian worlds, possible futures and if you’re a fan of the series Black Mirror.\n\nFUN FACT\n\nOrwell modeled the character of Julia on his second wife, Sonia Brownell.\n\n1984 is a dystopian social science fiction novel and is written by English novelist George Orwell. The story takes place in an imagined future (the year 1984) when much of the world has fallen victim to totalitarianism, mass surveillance, manipulation of the past and propaganda.\n\nThe superstate is called Oceania and is ruled by the Party who employ Thought Police, whose job it is to persecute individuality and independent thinking. Winston Smith is the protagonist of the novel and although he is a responsible and reliable worker in the system, he dreams of rebellion. When his colleague, Julia, and him begin a forbidden relationship, he begins to remember what life was 

##### Data Transformation

In [3]:
# Split the loaded documents into chunks, which are later embedded into the vector DB and passed to the LLM as context

# Using a text splitter
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=300, chunk_overlap=50)
documents = text_splitter.split_documents(docs)
print(documents[0].page_content)

GENRE

Dystopian social science fiction.

READ IF

You love reading about dystopian worlds, possible futures and if you’re a fan of the series Black Mirror.

FUN FACT

Orwell modeled the character of Julia on his second wife, Sonia Brownell.
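To make the `chunk_size` and `chunk_overlap` parameters concrete, here is a simplified sliding-window sketch in plain Python. It is an assumption-laden illustration, not how `RecursiveCharacterTextSplitter` works internally: the real splitter prefers to break on separators such as paragraph and line breaks, whereas the hypothetical `chunk_text` below cuts purely by character count.

```python
def chunk_text(text, chunk_size=300, chunk_overlap=50):
    """Cut `text` into windows of `chunk_size` characters that share `chunk_overlap` characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


# A 700-character sample with a repeating alphabet, so overlaps are easy to check
sample = "".join(chr(65 + i % 26) for i in range(700))
chunks = chunk_text(sample)
print([len(c) for c in chunks])
```

The overlap means the tail of one chunk is repeated at the head of the next, which reduces the chance that a sentence needed for an answer is split across two chunks with no shared context.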


##### Data Loading

In [4]:
# Creating vector embeddings and the vector DB (vector store)
# Various embeddings are available, paid and open-source; use as needed!

from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma
oembed = OllamaEmbeddings(base_url="http://localhost:11434", model="llama2")
vectorstore = Chroma.from_documents(documents, embedding=oembed)

In [5]:
# Performing a similarity search on the Chroma vector store created above
question="Who is Harry Potter?"
docs = vectorstore.similarity_search(question)
len(docs)

4
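The search returns four documents because `similarity_search` defaults to `k=4`. Conceptually, the store embeds the question and returns the chunks whose vectors are closest to it. The sketch below shows that idea with toy two-dimensional vectors and cosine similarity; both are assumptions for illustration, since real embeddings have many more dimensions and the store's actual distance metric may differ. `cosine_similarity` and `top_k` are hypothetical helper names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=4):
    """Return the indices of the k document vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]


# Toy "embeddings": five document vectors and one query vector
doc_vecs = [[1, 0], [0, 1], [1, 1], [-1, 0], [0.9, 0.1]]
nearest = top_k([1, 0], doc_vecs)
print(nearest)
```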

In [10]:
print(docs)

[Document(metadata={'source': 'books\\HARRY POTTER SERIES.txt'}, page_content='Harry Potter is a series by British author, J. K. Rowling, and is made up of seven different novels. With the first book being published in 1997, the lives of little people around the world were changed forever. The Harry Potter world had well and truly began.'), Document(metadata={'source': 'books\\ON THE ORIGIN OF SPECIES.txt'}, page_content='GENRE\n\nScientific Literature.\n\nREAD IF\n\nYou are interested in how animals adapt to their environment to survive.\n\nFUN FACT\n\nIt took over 20 years for Darwin to research, develop and finally publish his book.'), Document(metadata={'source': 'books\\THE GREAT GATSBY.txt'}, page_content="GENRE\n\nNovel (however recent times would place it in the ‘historical fiction’ category).\n\nREAD IF\n\nYou are interested in the 1920’s Jazz Era in the United States.\n\nFUN FACT\n\nThe book sold no more than 25,000 copies in Fitzgerald's lifetime, and then 25 million c

In [6]:
# Connect to a local LLM, here Llama 2 served by Ollama
from langchain_community.llms import Ollama
llm = Ollama(
    base_url='http://localhost:11434',
    model="llama2"
)
print(llm.invoke("why is the sky blue"))


The sky appears blue to our eyes due to a phenomenon called Rayleigh scattering. This occurs when sunlight enters Earth's atmosphere and encounters tiny molecules of gases such as nitrogen and oxygen. These molecules scatter the light in all directions, but they scatter shorter (blue) wavelengths more than longer (red) wavelengths.

As a result, when sunlight reaches our eyes, it has been scattered in all directions and appears blue. The blue color of the sky is especially visible during sunrise and sunset, when the Sun is low on the horizon and the atmosphere has to scatter the light through more of the Earth's atmosphere, giving it a more intense blue color.

Other factors can also affect the appearance of the sky, such as dust and pollution in the atmosphere, which can scatter light in different ways and give the sky a yellowish or brownish tint. However, under normal conditions, the blue color of the sky is due to Rayleigh scattering.


In [7]:
# Design a chat prompt template that combines the LLM instructions, the retrieved context and the user's question
from langchain_core.prompts import ChatPromptTemplate
prompt = ChatPromptTemplate.from_template("""
Answer the question based on the provided context.
Keep the response short: a brief summary is enough, no need for many details.
Make sure the response is helpful and accurate. Be respectful to the user.
<context>
{context}
</context>
Question: {input}""")
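The placeholders `{context}` and `{input}` are filled in when the chain runs. A rough plain-Python picture of that substitution, using `str.format` as a stand-in, is sketched below. `TEMPLATE` and `render_prompt` are hypothetical names, and the real `ChatPromptTemplate` produces chat message objects rather than a bare string.

```python
# Shortened copy of the template text, used only for this illustration
TEMPLATE = """
Answer the question based on the provided context.
<context>
{context}
</context>
Question: {input}"""


def render_prompt(context, question):
    """Substitute the retrieved context and the user's question into the template."""
    return TEMPLATE.format(context=context, input=question)


rendered = render_prompt(
    context="Harry Potter is a series of seven novels.",
    question="Who is Harry Potter?",
)
print(rendered)
```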

In [8]:
# Create a document chain that passes the documents to the LLM as context
from langchain.chains.combine_documents import create_stuff_documents_chain

document_chain = create_stuff_documents_chain(llm, prompt)



In [9]:
# Create a retriever: a wrapper around the vector store (its backbone) that gives chains a standard interface for fetching relevant documents
retriever = vectorstore.as_retriever()

# Combine the retriever and the document chain into a retrieval chain, so the LLM answers the query using the retrieved documents
from langchain.chains import create_retrieval_chain

retrieval_chain = create_retrieval_chain(retriever, document_chain)
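The overall flow of a retrieval chain can be sketched in plain Python: fetch the relevant chunks, "stuff" them into the prompt as context, call the model, and return the question, the retrieved context and the answer together. Everything below is a toy stand-in under stated assumptions: `retrieve` ranks by shared words instead of embeddings, `fake_llm` replaces the Ollama call, and `run_retrieval_chain` is a hypothetical name. Only the result keys `input`, `context` and `answer` mirror what `create_retrieval_chain` returns.

```python
TEMPLATE = "Answer using the context.\n<context>\n{context}\n</context>\nQuestion: {input}"


def retrieve(question, store, k=4):
    """Toy retriever: rank chunks by how many lowercase words they share with the question."""
    question_words = set(question.lower().split())
    ranked = sorted(
        store,
        key=lambda chunk: len(question_words & set(chunk.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def fake_llm(prompt):
    """Stand-in for the model call; it only reports the prompt length."""
    return f"(model reply to a {len(prompt)}-character prompt)"


def run_retrieval_chain(question, store, llm, k=4):
    """Retrieve chunks, join them into one context string, and ask the model."""
    context_docs = retrieve(question, store, k)
    prompt = TEMPLATE.format(context="\n\n".join(context_docs), input=question)
    return {"input": question, "context": context_docs, "answer": llm(prompt)}


store = [
    "Harry Potter is a series of seven novels.",
    "The Bible is a collection of religious texts.",
    "1984 is a dystopian novel.",
]
result = run_retrieval_chain("Who is Harry Potter?", store, fake_llm, k=1)
print(result["context"])
print(result["answer"])
```

Keeping the retrieved chunks in the result makes it possible to show the user which sources an answer was based on.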

In [10]:
response = retrieval_chain.invoke({"input":"What are some famous books which have a series of books?"})

In [11]:
print(response["answer"])

Thank you for providing the context! Based on the information provided, here are some famous book series:

1. Harry Potter by J.K. Rowling - This is a classic example of a popular book series that has captured the hearts of millions of readers worldwide. The series consists of seven novels that follow the adventures of Harry Potter and his friends at Hogwarts School of Witchcraft and Wizardry.
2. Lord of the Rings by J.R.R. Tolkien - This epic fantasy novel series is a classic in the genre, consisting of three volumes: The Fellowship of the Ring, The Two Towers, and The Return of the King. The series follows the quest of a hobbit named Frodo Baggins to destroy the One Ring and save Middle-earth from the darkness of Sauron.
3. Chronicles of Narnia by C.S. Lewis - This series consists of seven fantasy novels that follow the adventures of four siblings who find a magical land called Narnia, where they encounter talking animals, magic, and epic battles between good and evil.
4. A Series of

In [None]:
# Alternative using the older RetrievalQA chain (kept for reference)
# from langchain.chains import RetrievalQA
# qachain = RetrievalQA.from_chain_type(llm, retriever=vectorstore.as_retriever())
# res = qachain.invoke({"query": question})
# print(res['result'])