# Loading The Knowledge

In [1]:
# !pip install -q unstructured langchain
# !pip install "unstructured[all-docs]"

In [2]:
from langchain_community.document_loaders import UnstructuredPDFLoader, OnlinePDFLoader

In [3]:
local_path = "/media/axel/SATA/cobaPython/NLP_RAG/data/billiards_tutorial.pdf"

if local_path:
    loader = UnstructuredPDFLoader(file_path=local_path)
    data = loader.load()
else:
    print("No local path provided")



In [4]:
data[0].page_content

'About the Tutorial\n\nBilliards is a game that can be compared to the game of carom. In this game cue is used as a striker and balls are to be put into the pockets. The tutorial lets you know about the game, its rules, and the method of playing.\n\nAudience\n\nAnyone who wants to learn about billiards can go through this tutorial, as this tutorial deals with various aspects of the game and will give a lot of information regarding the game.\n\nPrerequisite\n\nBefore proceeding with this tutorial, you are required to have a passion for this game and an eagerness to acquire knowledge on the same.\n\nCopyright & Disclaimer\n\n\uf0e3 Copyright 2016 by Tutorials Point (I) Pvt. Ltd.\n\nAll the content and graphics published in this e-book are the property of Tutorials Point (I) Pvt. Ltd. The user of this e-book is prohibited to reuse, retain, copy, distribute, or republish any contents or a part of contents of this e-book in any manner without written consent of the publisher.\n\nWe strive t
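The `if local_path:` guard in the loading cell only checks that the string is non-empty, so a mistyped or missing file still reaches `UnstructuredPDFLoader`. A minimal sketch of a stricter check using only the standard library. The helper name `resolve_pdf_path` is hypothetical and not part of LangChain.

```python
from pathlib import Path
from typing import Optional


def resolve_pdf_path(path_str: str) -> Optional[Path]:
    """Return a Path if path_str names an existing .pdf file, else None.

    Hypothetical helper for illustration; it only validates the path and
    does not load anything.
    """
    if not path_str:
        return None  # empty string: nothing to load
    path = Path(path_str).expanduser()
    if path.suffix.lower() != ".pdf" or not path.is_file():
        return None  # wrong extension, or the file does not exist
    return path
```

With such a helper, the loading cell would call `UnstructuredPDFLoader(file_path=str(pdf))` only when `pdf = resolve_pdf_path(local_path)` is not `None`.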

# Vector Embeddings

In [5]:
# !ollama pull nomic-embed-text

In [6]:
# !ollama list

In [7]:
# !pip3 install chromadb
# !pip3 install langchain-text-splitters

In [8]:
from langchain_community.embeddings import OllamaEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma

In [9]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = text_splitter.split_documents(data)
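To show what `chunk_size=1000` and `chunk_overlap=100` mean, here is a deliberately simplified sliding-window splitter. It is *not* LangChain's algorithm: `RecursiveCharacterTextSplitter` first tries to split on separators such as paragraph breaks and newlines and then merges the pieces up to `chunk_size`, whereas this sketch cuts at fixed character offsets. The function name `split_fixed` is hypothetical.

```python
def split_fixed(text: str, chunk_size: int = 1000, chunk_overlap: int = 100) -> list[str]:
    """Cut text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters, so a sentence that
    straddles a boundary appears (at least partly) in both chunks.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

Applied to a 2,500-character string, this sketch yields three chunks of 1000, 1000 and 700 characters, each sharing 100 characters with its neighbour. The real splitter will usually pick different boundaries because it prefers to break at `\n\n` and `\n`.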

In [10]:
# Embed every chunk with nomic-embed-text and store the vectors in a Chroma collection
vector_db = Chroma.from_documents(
    documents=chunks,
    embedding=OllamaEmbeddings(model="nomic-embed-text", show_progress=True),
    collection_name="local_rag",
)

OllamaEmbeddings: 100%|██████████| 30/30 [00:11<00:00,  2.59it/s]
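The 30/30 progress bar suggests the splitter produced 30 chunks, each embedded once. Conceptually, the vector store keeps those vectors and, at query time, returns the chunks closest to the embedded question. A toy in-memory sketch of that lookup, with hand-made vectors standing in for `nomic-embed-text` embeddings (`ToyVectorStore` and `cosine` are hypothetical names). It ranks by cosine similarity for illustration; Chroma's configured distance metric may differ. The default `k=4` mirrors LangChain's default for `similarity_search`.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Keeps (text, vector) pairs in memory and returns the k closest texts."""

    def __init__(self) -> None:
        self.items: list[tuple[str, list[float]]] = []

    def add(self, text: str, vector: list[float]) -> None:
        self.items.append((text, vector))

    def search(self, query_vector: list[float], k: int = 4) -> list[str]:
        # Highest similarity first, then keep only the top k texts.
        ranked = sorted(
            self.items,
            key=lambda item: cosine(query_vector, item[1]),
            reverse=True,
        )
        return [text for text, _ in ranked[:k]]
```

A real embedding model produces vectors with hundreds of dimensions, but the ranking step is the same idea: a query vector that points in nearly the same direction as a stored chunk's vector ranks that chunk first.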


# Retrieval

In [11]:
from langchain.prompts import ChatPromptTemplate, PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_community.chat_models import ChatOllama
from langchain_core.runnables import RunnablePassthrough
from langchain.retrievers.multi_query import MultiQueryRetriever

In [12]:
# !ollama pull mistral

In [13]:
# LLM from Ollama
local_model = "mistral"
llm = ChatOllama(model=local_model)

In [14]:
QUERY_PROMPT = PromptTemplate(
    input_variables=["question"],
    template="""You are an AI language model assistant. Your task is to generate five
    different versions of the given user question to retrieve relevant documents from
    a vector database. By generating multiple perspectives on the user question, your
    goal is to help the user overcome some of the limitations of the distance-based
    similarity search. Provide these alternative questions separated by newlines.
    Original question: {question}""",
)
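The prompt asks the model for alternative questions "separated by newlines", so the generated text has to be turned into a list of queries before any search runs. A minimal sketch of such a parser; the name `parse_query_lines` is hypothetical, and LangChain's own `MultiQueryRetriever` uses its internal line-list parser, whose cleanup rules may differ. Stripping list numbering such as `1.` is an extra step assumed here, because chat models often number their answers.

```python
import re


def parse_query_lines(llm_output: str) -> list[str]:
    """Split newline-separated LLM output into clean query strings."""
    queries = []
    for line in llm_output.strip().split("\n"):
        # Drop leading list markers such as "1.", "2)", "-" or "*"
        # (an assumption about the model's output style).
        cleaned = re.sub(r"^\s*(?:\d+[.)]|[-*])\s*", "", line).strip()
        if cleaned:  # skip blank lines between questions
            queries.append(cleaned)
    return queries
```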

In [15]:
retriever = MultiQueryRetriever.from_llm(
    vector_db.as_retriever(),
    llm,
    prompt=QUERY_PROMPT
)

# RAG prompt
template = """Answer the question based ONLY on the following context:
{context}
Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)
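The `MultiQueryRetriever` built in this cell asks the LLM for rephrasings via `QUERY_PROMPT`, runs one similarity search per generated question, and returns the unique union of the retrieved chunks. The five single-item `OllamaEmbeddings` progress bars printed under each `chain.invoke` call below are consistent with one query embedding per generated question. A sketch of the fan-out-and-merge step with a stand-in search function; `multi_query_retrieve` is a hypothetical name, and plain strings stand in for LangChain `Document` objects.

```python
from typing import Callable


def multi_query_retrieve(
    queries: list[str],
    search: Callable[[str], list[str]],
) -> list[str]:
    """Run search() for every query variant and merge the hits.

    Duplicates are dropped while keeping first-seen order, so a chunk
    found by several variants is passed to the LLM only once.
    """
    seen: set[str] = set()
    merged: list[str] = []
    for query in queries:
        for doc in search(query):
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged
```

The merge is why rephrasing helps: each variant can surface chunks that a single distance-based search would miss, while overlapping hits cost nothing extra in the final context.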

In [16]:
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
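In this chain the retriever's output, a list of `Document` objects, is substituted into `{context}` as-is, so the prompt receives the list's string representation, metadata included. A common LangChain pattern is to join only the `page_content` fields first, e.g. `{"context": retriever | format_docs, "question": RunnablePassthrough()}`. A minimal sketch of such a helper, using a small stand-in class so it runs without LangChain (`FakeDoc` and `format_docs` are illustrative names, not library API):

```python
from dataclasses import dataclass, field


@dataclass
class FakeDoc:
    """Stand-in for a LangChain Document: text plus a metadata dict."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs: list[FakeDoc]) -> str:
    """Join retrieved chunks into one plain-text context block."""
    return "\n\n".join(doc.page_content for doc in docs)


# Same wording as the RAG template above, filled here with plain str.format.
TEMPLATE = """Answer the question based ONLY on the following context:
{context}
Question: {question}
"""
```

Keeping metadata out of the prompt leaves more of the model's context window for the actual tutorial text.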


In [17]:
chain.invoke("what is it about?")

OllamaEmbeddings: 100%|██████████| 1/1 [00:02<00:00,  2.23s/it]
OllamaEmbeddings: 100%|██████████| 1/1 [00:00<00:00,  6.94it/s]
OllamaEmbeddings: 100%|██████████| 1/1 [00:00<00:00, 10.29it/s]
OllamaEmbeddings: 100%|██████████| 1/1 [00:00<00:00,  5.62it/s]
OllamaEmbeddings: 100%|██████████| 1/1 [00:00<00:00, 10.19it/s]


' The given text appears to be a tutorial about the game of Billiards, specifically focusing on its rules, equipment, and terminologies used in the game.'

In [18]:
chain.invoke("tell me about the billiards board design")

OllamaEmbeddings: 100%|██████████| 1/1 [00:02<00:00,  2.34s/it]
OllamaEmbeddings: 100%|██████████| 1/1 [00:00<00:00,  7.06it/s]
OllamaEmbeddings: 100%|██████████| 1/1 [00:00<00:00, 10.96it/s]
OllamaEmbeddings: 100%|██████████| 1/1 [00:00<00:00,  5.69it/s]
OllamaEmbeddings: 100%|██████████| 1/1 [00:00<00:00, 10.50it/s]


' The Billiards board is designed to be riveted to a table. Unlike other board games, this is the only game where the board is part of the table. The Billiards board is larger than any other boards, with a playing surface that measures 11ft 8 ½ in x 5ft 10in x 2ft 10 ½ in. At a distance of 29in from the bottom cushion, a parallel line is drawn called baulk line. Another mandatory marking on the board is ‘D’, which is drawn with the mid-point of the baulk line as its center and has a radius of 1 ½ in. The playing surface of the board is top quality cloth material that helps the balls to roll easily around the board and pocket them. The Billiards board also has four pockets in the corner and two on the side bars, similar to Carom boards.'

In [19]:
chain.invoke("tell me about the terms in billiards")

OllamaEmbeddings: 100%|██████████| 1/1 [00:02<00:00,  2.26s/it]
OllamaEmbeddings: 100%|██████████| 1/1 [00:00<00:00,  7.55it/s]
OllamaEmbeddings: 100%|██████████| 1/1 [00:00<00:00, 11.31it/s]
OllamaEmbeddings: 100%|██████████| 1/1 [00:00<00:00,  5.73it/s]
OllamaEmbeddings: 100%|██████████| 1/1 [00:00<00:00, 11.00it/s]


'1. Game - It refers to the period of play from when the striker breaks the formation of balls and either finishes the game in concession or the total time of the game has elapsed.'

In [20]:
chain.invoke("how to start the game of billiards?")

OllamaEmbeddings: 100%|██████████| 1/1 [00:02<00:00,  2.39s/it]
OllamaEmbeddings: 100%|██████████| 1/1 [00:00<00:00,  7.14it/s]
OllamaEmbeddings: 100%|██████████| 1/1 [00:00<00:00, 10.59it/s]
OllamaEmbeddings: 100%|██████████| 1/1 [00:00<00:00,  5.60it/s]
OllamaEmbeddings: 100%|██████████| 1/1 [00:00<00:00, 10.86it/s]


' The game of Billiards starts with a method called stringing, which is similar to toss in any other match. Both players play the cue ball towards the opposite cushion and ensure it comes back to baulk cushion. Whosoever manages to keep the ball closer to the baulk cushion shall give options to opponents. The striker breaks the formation of balls and attempts to pocket as many balls as possible during their turn.'