## Trying to use FAISS to see if it makes a difference in performance

In [1]:
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.llms import Ollama
from langchain_community.document_loaders import DirectoryLoader
from langchain_community.document_loaders.pdf import PyMuPDFLoader
from langchain.prompts import PromptTemplate
from langchain_community.vectorstores import FAISS
from langchain_community.vectorstores import DocArrayInMemorySearch
from langchain_text_splitters import CharacterTextSplitter
#from langchain_text_splitters import TokenTextSplitter
from operator import itemgetter

In [2]:
Model = "llama3"  # Takes a long time to compute on our laptops
llmmodel = Ollama(model=Model)

template = """
Answer the question based on the context below. If you can't 
answer the question, reply "Oof that's a tough one, i don't really know this"

Context : {context}

Question : {question}

"""

promt = PromptTemplate.from_template(template)

In [3]:
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

In [7]:
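# Using FAISS. allow_dangerous_deserialization is required because load_local
# unpickles the saved docstore; only enable it for index files you created yourself.
db1 = FAISS.load_local("../VectorStores/webPDF", OllamaEmbeddings(model=Model), allow_dangerous_deserialization=True)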
retriever1 = db1.as_retriever()

In [6]:
chain1 = (
    {"context" : itemgetter("question") | retriever1 | format_docs , "question" : itemgetter("question")}
    | promt
    | llmmodel 
)

In [7]:
chain1.invoke({"question" : "What is the browser's high level structure?"})

"Based on the context, the browser's high-level structure can be summarized as follows:\n\n1. **UI Backend**: This layer is responsible for drawing basic widgets like combo boxes and windows using operating system user interface methods.\n2. **JavaScript Engine**: This interpreter parses and executes JavaScript code.\n3. **DOM Tree** (Document Object Model): This tree represents the HTML document and its elements, serving as an interface to the outside world (e.g., JavaScript).\n4. **Render Tree**: This tree is a visual representation of the document, constructed while building the DOM tree, used for painting the contents in their correct order.\n\nThese layers work together to enable the rendering of web pages and interactive web applications."

In [8]:
retriever1.invoke("browser high level structure")

[Document(page_content='Layers are specified by the z-index CSS property. It represents the third dimension of the box: its position along the "z axis".\nThe boxes are divided into stacks (called stacking contexts). In each stack the back elements will be painted first and the\nforward elements on top, closer to the user. In case of overlap the foremost element will hide the former element. The stacks are\nordered according to the z-index property. Boxes with "z-index" property form a local stack.\nTrivia\nThe birth of the web\nTim Berners-Lee, a British scientist at CERN, invented the World Wide Web (WWW) in 1989. The web was originally conceived and\ndeveloped to meet the demand for automatic information-sharing between scientists in universities and institutes around the\nworld.\nThe first website at CERN - and in the world - was dedicated to the World Wide Web project itself and was hosted on Berners-\nLee\'s NeXT computer. The website described the basic features of the web; how t

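No difference seen in the answer. The first chunk returned by the retriever above covers z-index stacking and web trivia rather than the browser's structure, so the splitting and embedding steps probably make a larger difference than the choice of vector store.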
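To compare retrievers without reading full `Document` dumps, it can help to print a one-line preview of each retrieved chunk. The sketch below is a hypothetical helper, not part of the original notebook: `preview_docs` and `FakeDoc` are illustrative names, and the sample chunks are made up. It only assumes that each retrieved object has a `page_content` attribute, as LangChain documents do.

```python
from dataclasses import dataclass


@dataclass
class FakeDoc:
    """Stand-in for a LangChain Document; only page_content is used here."""
    page_content: str


def preview_docs(docs, width=80):
    """Return one short, single-line preview per retrieved chunk."""
    lines = []
    for i, doc in enumerate(docs):
        # Collapse newlines and runs of spaces so each chunk fits on one line.
        text = " ".join(doc.page_content.split())
        if len(text) > width:
            text = text[:width].rstrip() + "..."
        lines.append(f"[{i}] {text}")
    return lines


# Made-up chunks for illustration only.
sample = [
    FakeDoc("Layers are specified by the z-index CSS property.\nIt represents the third dimension."),
    FakeDoc("The browser's main components are the user interface and the browser engine."),
]
for line in preview_docs(sample, width=40):
    print(line)
```

In the notebook, the same helper could be called as `preview_docs(retriever1.invoke("browser high level structure"))` to see at a glance whether the retrieved chunks are on topic.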

## Using a Markdown file instead of a PDF so the text is easier to split into chunks

In [8]:
# Using FAISS again, this time with the index built from the Markdown version of the document
db2 = FAISS.load_local("../VectorStores/webMD", OllamaEmbeddings(model=Model), allow_dangerous_deserialization=True)
retriever2 = db2.as_retriever()
# faiss_index.similarity_search("")

In [None]:
chain2 = (
    {"context" : itemgetter("question") | retriever2 | format_docs , "question" : itemgetter("question")}
    | promt
    | llmmodel 
)

In [None]:
print(chain2.invoke({"question" : "What is the browser's high level structure?"}))

"According to the context, the browser's high-level structure consists of:\n\n1. **User Interface:** Includes the address bar, back/forward button, bookmarking menu, etc.\n2. **Browser Engine:** Marshals actions between the UI and the rendering engine.\n3. **Rendering Engine:** Responsible for displaying requested content (e.g., parsing HTML and CSS, and displaying the parsed content on the screen).\n4. **Networking:** For network calls such as HTTP requests, using different implementations for different platforms (behind a platform-independent interface).\n5. **UI Backend:** Used for drawing basic widgets like combo boxes and windows.\n6. **JavaScript Engine:** Interpreter used to parse and execute JavaScript code.\n7. **Data Storage:** This is a persistence layer, responsible for storing data locally, such as cookies, and supporting storage mechanisms like localStorage, IndexedDB, and FileSystem.\n\nThese components work together to process the browser's functionality."

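Splitting the Markdown source gives a much better answer: it lists seven components (user interface, browser engine, rendering engine, networking, UI backend, JavaScript engine, data storage), compared with four from the PDF-based index.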
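The notebook does not show how the `webMD` index was built, so the following is only a sketch of one way heading-based splitting can work, using the standard library alone. `split_markdown_by_heading` is a hypothetical helper and the sample text is made up. `langchain_text_splitters` also provides `MarkdownHeaderTextSplitter` for the same purpose.

```python
import re


def split_markdown_by_heading(text):
    """Split Markdown into chunks, each starting at a heading line."""
    chunks = []
    current = []
    for line in text.splitlines():
        # A heading is 1-6 '#' characters followed by a space.
        if re.match(r"#{1,6} ", line) and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    # Drop chunks that are empty after stripping.
    return [c for c in chunks if c]


# Made-up sample text for illustration only.
sample_md = """# Intro
Browsers fetch and display web resources.

## The browser's high level structure
The main components include the user interface and the browser engine.
"""

for chunk in split_markdown_by_heading(sample_md):
    print(repr(chunk))
```

Each chunk keeps its heading together with the text beneath it, which is harder to guarantee with text extracted from a PDF. One limitation of this sketch: a line starting with `#` inside a fenced code block would also be treated as a heading.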

## Using a dedicated embedding model to embed the chunks

Embedding model: mxbai-embed-large <br> LLM: llama3 8B

In [9]:
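# Using FAISS with a dedicated embedding model.
# Note: the index at this path must have been built with mxbai-embed-large; querying an index
# built with different embeddings either fails on a dimension mismatch or returns meaningless matches.
db3 = FAISS.load_local("../VectorStores/webMD", OllamaEmbeddings(model="mxbai-embed-large"), allow_dangerous_deserialization=True)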
retriever3 = db3.as_retriever()
# faiss_index.similarity_search("")

In [None]:
chain3 = (
    {"context" : itemgetter("question") | retriever3 | format_docs , "question" : itemgetter("question")}
    | promt
    | llmmodel 
)

In [None]:
print(chain3.invoke({"question" : "What is the browser's high level structure?"}))

'According to the context, the browser\'s high-level structure is described in the section "#The browser\'s high level structure" which is not provided in the given text. Therefore, I can\'t answer this question accurately.\n\nOof that\'s a tough one, i don\'t really know this.'