## Trying FAISS to see whether it makes a difference in performance

In [10]:
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.llms import Ollama
from langchain_community.document_loaders import DirectoryLoader
from langchain_community.document_loaders.pdf import PyMuPDFLoader
from langchain.prompts import PromptTemplate
from langchain_community.vectorstores import FAISS
from langchain_community.vectorstores import DocArrayInMemorySearch
from langchain_text_splitters import CharacterTextSplitter
#from langchain_text_splitters import TokenTextSplitter
from operator import itemgetter

In [11]:
Model = "llama3"  # Takes too much time to compute *sigh*, we need better laptops
llmmodel = Ollama(model=Model)

template = """
Answer the question based on the context below. If you can't
answer the question, reply "Oof that's a tough one, I don't really know this"

Context : {context}

Question : {question}

"""

promt = PromptTemplate.from_template(template)

## Using a Markdown file instead of a PDF so it is easier to split into chunks

In [12]:
from langchain_community.document_loaders import TextLoader
loader = DirectoryLoader(path="testData",glob="*.txt")
documents = loader.load()

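# Split on the "##" heading markers. chunk_size is only a target here:
# a section longer than 1000 characters is kept whole and triggers a warning.
text_splitter = CharacterTextSplitter(separator="##", chunk_size=1000, chunk_overlap=0)
docs2 = text_splitter.split_documents(documents)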

Created a chunk of size 1161, which is longer than the specified 1000
Created a chunk of size 2033, which is longer than the specified 1000
Created a chunk of size 2603, which is longer than the specified 1000
Created a chunk of size 1782, which is longer than the specified 1000
Created a chunk of size 5099, which is longer than the specified 1000
Created a chunk of size 2110, which is longer than the specified 1000
Created a chunk of size 1398, which is longer than the specified 1000
Created a chunk of size 3672, which is longer than the specified 1000
Created a chunk of size 1463, which is longer than the specified 1000
Created a chunk of size 1654, which is longer than the specified 1000
Created a chunk of size 2628, which is longer than the specified 1000
Created a chunk of size 1706, which is longer than the specified 1000
Created a chunk of size 1773, which is longer than the specified 1000
Created a chunk of size 2232, which is longer than the specified 1000
Created a chunk of s
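
The warnings above show up because the splitter only cuts at the separator, so any section between two `##` markers that is longer than `chunk_size` stays in one piece. Here is a rough pure-Python sketch of that behaviour. It is a simplification, not LangChain's actual code: `split_on_separator` is a made-up helper, it ignores overlap, and the sample text is invented.

```python
def split_on_separator(text, separator="##", chunk_size=1000):
    """Split on `separator`, then greedily merge neighbouring pieces
    while the merged chunk still fits in `chunk_size` characters."""
    pieces = [p.strip() for p in text.split(separator) if p.strip()]
    chunks = []
    current = ""
    for piece in pieces:
        candidate = f"{current}{separator}{piece}" if current else piece
        if len(candidate) <= chunk_size:
            current = candidate  # still fits, keep merging
            continue
        if current:
            chunks.append(current)
        current = piece  # may itself exceed chunk_size; it is never cut further
    if current:
        chunks.append(current)
    oversized = [len(c) for c in chunks if len(c) > chunk_size]
    return chunks, oversized


# Invented sample: one short section, one very long section, one short section
sample = (
    "## Intro\n" + "a" * 50
    + "\n## Long section\n" + "b" * 1500
    + "\n## End\n" + "c" * 40
)
chunks, oversized = split_on_separator(sample)
for size in oversized:
    print(f"Created a chunk of size {size}, which is longer than the specified 1000")
```

If the oversized chunks become a problem, `RecursiveCharacterTextSplitter` from `langchain_text_splitters` is one option to try, since it falls back to smaller separators when a piece is still too long.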

In [13]:
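# Using FAISS: embed every chunk with the same Ollama model and index the vectors
db2 = FAISS.from_documents(docs2, OllamaEmbeddings(model=Model))
retriever2 = db2.as_retriever()
# db2.similarity_search("")  # direct search on the index, without the chain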
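
Conceptually, the retriever embeds the question and returns the chunks whose vectors are closest to it. The sketch below shows that idea as a brute-force search in plain Python. It assumes L2 distance and four results per query, which I believe are the LangChain defaults for FAISS but are not set anywhere in this notebook. `l2_distance`, `top_k` and the toy vectors are made up for illustration; real embeddings have far more dimensions.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query_vec, doc_vecs, k=4):
    """Return the indices of the k stored vectors closest to the query."""
    scored = sorted(
        (l2_distance(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)
    )
    return [idx for _, idx in scored[:k]]


# Toy 2-D "embeddings" standing in for the chunk vectors
toy_vectors = [[0.0, 0.0], [1.0, 1.0], [0.9, 1.1], [5.0, 5.0]]
nearest = top_k([1.0, 1.0], toy_vectors, k=2)
print(nearest)
```

A real FAISS index does the same comparison in optimized native code, which is where any speed-up over a pure in-memory store would come from.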

In [14]:
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

In [15]:
chain2 = (
    {
        # Retrieve the chunks relevant to the question and join them into one string
        "context": itemgetter("question") | retriever2 | format_docs,
        "question": itemgetter("question"),
    }
    | promt
    | llmmodel
)
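
To make the data flow of the chain easier to follow, here is roughly the same pipeline written as ordinary function calls. This is only a sketch with stand-ins: `FakeDoc`, `fake_retrieve`, `fake_llm`, `join_docs`, `run_chain` and `toy_template` are all invented here, and the fake LLM just echoes the prompt it receives instead of calling Ollama.

```python
from dataclasses import dataclass


@dataclass
class FakeDoc:
    """Stand-in for a LangChain document: only page_content is needed."""
    page_content: str


def join_docs(docs):
    # Same idea as format_docs: join the chunk texts with blank lines
    return "\n\n".join(doc.page_content for doc in docs)


def fake_retrieve(question):
    # A real retriever would embed the question and search the index
    return [
        FakeDoc("Browsers have a rendering engine."),
        FakeDoc("DNS maps names to IPs."),
    ]


def fake_llm(prompt_text):
    # Echo the prompt so the filled-in template is visible
    return prompt_text


def run_chain(inputs, retrieve, format_fn, template_text, llm):
    question = inputs["question"]
    context = format_fn(retrieve(question))  # the "context" branch
    filled = template_text.format(context=context, question=question)
    return llm(filled)


toy_template = "Context : {context}\n\nQuestion : {question}\n"
result = run_chain(
    {"question": "What is DNS?"}, fake_retrieve, join_docs, toy_template, fake_llm
)
print(result)
```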

In [16]:
print(chain2.invoke({"question" : "What is the browser's high level structure?"}))

The browser's high-level structure consists of:

1. **User Interface**: Includes the address bar, back/forward button, bookmarking menu, etc.
2. **Browser Engine**: Marshals actions between the UI and the rendering engine.
3. **Rendering Engine**: Responsible for displaying requested content. For example, parses HTML and CSS, and displays the parsed content on the screen.
4. **Networking**: Handles network calls such as HTTP requests, using different implementations for different platforms (behind a platform-independent interface).
5. **UI Backend**: Used for drawing basic widgets like combo boxes and windows. This backend exposes a generic interface that is not platform specific. Underneath it uses operating system user interface methods.
6. **JavaScript Engine**: Interpreter used to parse and execute JavaScript code.
7. **Data Storage**: Persistence layer, responsible for storing data locally, such as cookies, and supporting storage mechanisms like localStorage, IndexedDB, and FileSy

Splitting on the `##` headings gives much better answers.

In [17]:
print(chain2.invoke({"question" : "Tell me about DNS Lookup"}))

DNS lookup!

In this context, the DNS (Domain Name System) lookup process happens when the browser tries to figure out the IP address for the entered domain. Here's a step-by-step breakdown:

1. **Browser cache**: The browser checks its own cache of DNS records. If it has the desired record, it retrieves the associated IP address and proceeds.
2. **OS cache**: If the browser cache doesn't have the record, the browser makes a system call to the OS (e.g., `gethostbyname` in Windows). The OS has its own cache of DNS records.
3. **Router cache**: If the OS cache doesn't have the record, the request continues on to your router, which typically has its own DNS cache.
4. **ISP DNS cache**: If the router's cache is empty, the request moves on to your ISP's DNS server, which has its own cache of DNS records.
5. **Recursive search**: If the ISP's DNS cache is empty, it begins a recursive search from the root nameserver, through the `.com` top-level nameserver, to Google's nameserver.

A recursiv

In [19]:
print(chain2.invoke({"question" : "Do you have information regarding rendering engines in browsers?"}))

Based on the context, I can answer your question:

Yes, I have information regarding rendering engines in browsers. In fact, the text mentions that when you touch the DOM in any way, you set a dirty bit on the whole tree that tells the browser it needs to figure out where everything goes again, which involves the rendering engine's layout algorithm (or more technically, its CSS recalc algorithm, then layout, then repaint, then re-compositing).


In [20]:
print(chain2.invoke({"question" : "Why did katappa kill bahubali?"}))

Oof that's a tough one, I don't really know this! The context seems to be about how the web works and the history of the World Wide Web. There's no mention of Katappa or Bahubali, which appears to be a reference to an Indian movie. I'm not familiar with that topic, so I won't be able to answer your question.


In [21]:
print(chain2.invoke({"question" : "What is a redering engine?"}))

A rendering engine is responsible for displaying requested content. It parses HTML and CSS, and displays the parsed content on the screen.
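
The heading asks whether FAISS makes a difference in performance, but nothing above actually measures it. A small timing helper like the one below could be used for that. It is only a sketch: `time_call` and `slow_stub` are made-up names, and the stub sleeps for a moment in place of a real chain call.

```python
import time


def time_call(fn, *args, repeats=3, **kwargs):
    """Call fn several times and return (last result, best wall-clock time in seconds)."""
    timings = []
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        timings.append(time.perf_counter() - start)
    return result, min(timings)


def slow_stub(question):
    # Placeholder for a chain invocation
    time.sleep(0.01)
    return f"answer to: {question}"


answer, best = time_call(slow_stub, "What is a rendering engine?")
print(f"best of 3: {best:.3f}s")
```

In the notebook this would be called as `time_call(chain2.invoke, {"question": "..."})`, and again with a chain built on the other vector store, so the two timings can be compared. Most of the time is likely spent in the LLM call itself, so timing `retriever2.invoke(...)` separately may show the index difference more clearly.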
