In [1]:
import os
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import SentenceTransformersTokenTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_ollama import OllamaLLM
from langchain.chains import RetrievalQA
from utils import clean
from dotenv import load_dotenv

In [2]:
load_dotenv()

True

In [3]:
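docs_dir = "../../documents"  # renamed from `dir`, which shadows the Python builtin
# Collect only PDF files; sorted() keeps the load order reproducible.
documents_path = [
    os.path.join(docs_dir, file)
    for file in sorted(os.listdir(docs_dir))
    if file.lower().endswith(".pdf")
]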

In [4]:
documents = []
for file in documents_path:
    loader = PyPDFLoader(file)
    loaded_docs = loader.load()
    
    for doc in loaded_docs:
        doc.page_content = clean(doc.page_content)
    
    documents.extend(loaded_docs)

In [None]:
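# Note: the splitter counts tokens with its own default tokenizer, while all-MiniLM-L6-v2
# (used for embeddings below) truncates input beyond 256 word pieces.
text_splitter = SentenceTransformersTokenTextSplitter(tokens_per_chunk=300, chunk_overlap=30)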
chunks = text_splitter.split_documents(documents)
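
To make the effect of `tokens_per_chunk=300` and `chunk_overlap=30` concrete, here is a minimal sketch of overlapping-window chunking over a plain list of tokens. The function name `chunk_tokens` is hypothetical, and the integer "tokens" stand in for the word pieces the real splitter produces; this illustrates the idea and is not the library's implementation.

```python
def chunk_tokens(tokens, tokens_per_chunk=300, chunk_overlap=30):
    """Split a token list into overlapping windows (illustrative only)."""
    if chunk_overlap >= tokens_per_chunk:
        raise ValueError("chunk_overlap must be smaller than tokens_per_chunk")
    step = tokens_per_chunk - chunk_overlap  # how far each window advances
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + tokens_per_chunk])
        if start + tokens_per_chunk >= len(tokens):
            break  # the last window already reached the end
        start += step
    return chunks


# Hypothetical input: 700 integer "tokens" instead of real word pieces.
tokens = list(range(700))
windows = chunk_tokens(tokens)
print([(w[0], w[-1], len(w)) for w in windows])
```

Each window starts 270 tokens after the previous one, so consecutive chunks share 30 tokens. That overlap is meant to keep a sentence that straddles a boundary retrievable from at least one chunk.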

In [6]:
embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
# embeddings = embedding_model.embed_documents([chunk.page_content for chunk in chunks])

In [7]:
vectorstore = FAISS.from_documents(documents=chunks, embedding=embedding_model)
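
Conceptually, the vector store keeps one embedding per chunk and answers a query by returning the chunks whose embeddings lie closest to the query embedding. The sketch below shows that idea with a brute-force search in pure Python. It assumes Euclidean (L2) distance and `k=4`, which I believe are the LangChain FAISS defaults but have not verified here; the 2-D vectors, the chunk texts, and the helper names `l2_distance` and `top_k` are hypothetical.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query_vec, indexed, k=4):
    """Return the k (distance, text) pairs closest to query_vec."""
    scored = [(l2_distance(query_vec, vec), text) for text, vec in indexed]
    return sorted(scored)[:k]


# Hypothetical 2-D "embeddings"; real all-MiniLM-L6-v2 vectors have 384 dimensions.
toy_index = [
    ("chunk about RNNs", [0.9, 0.1]),
    ("chunk about naive Bayes", [0.1, 0.9]),
    ("chunk about LSTMs", [0.8, 0.2]),
]
results = top_k([1.0, 0.0], toy_index, k=2)
print([text for _, text in results])
```

Only the retrieved chunks reach the LLM, so a query whose embedding lands near the wrong chunks can produce an "I don't know" answer even when the book covers the topic.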

In [8]:
retriever = vectorstore.as_retriever()
llm = OllamaLLM(model="llama3.2:1b")

rag_pipeline = RetrievalQA.from_chain_type(llm=llm, retriever=retriever, return_source_documents=True)

In [10]:
query = "how use nural network in nlp"
response = rag_pipeline.invoke(query)

print("Answer:")
print(response["result"])

Answer:
I don't know. 

This text primarily discusses the application of neural networks (specifically recurrent neural networks) in Natural Language Processing (NLP), but it does not specifically explain how neural networks are used in NLP or provide a detailed explanation of their role in language modeling, sentiment analysis, and other tasks. The text jumps directly into discussing the use of neural networks in these applications without providing a step-by-step explanation of how they work in NLP.
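
Because the chain was built with `return_source_documents=True`, the response should also carry the retrieved chunks under `response["source_documents"]`. Looking at them is a quick way to tell whether an "I don't know" answer comes from retrieval or from the model. The helper below is a minimal sketch: `summarize_sources` is a hypothetical name, it assumes each document exposes `page_content` and a `metadata` dict with `source` and `page` keys (as PDF loaders typically provide), and the `SimpleNamespace` object is a stand-in so the example runs without the pipeline.

```python
from types import SimpleNamespace


def summarize_sources(source_documents, max_chars=80):
    """Return one line per retrieved chunk: index, file, page, text snippet."""
    lines = []
    for i, doc in enumerate(source_documents, start=1):
        meta = getattr(doc, "metadata", None) or {}
        source = meta.get("source", "?")
        page = meta.get("page", "?")
        # Collapse whitespace so the snippet fits on one line.
        snippet = " ".join(doc.page_content.split())[:max_chars]
        lines.append(f"[{i}] {source} (page {page}): {snippet}")
    return lines


# Stand-in for a retrieved document; the file name and text are made up.
fake_docs = [
    SimpleNamespace(
        page_content="Recurrent neural  networks\nprocess sequences.",
        metadata={"source": "book.pdf", "page": 3},
    )
]
for line in summarize_sources(fake_docs):
    print(line)
```

In the notebook, the same helper could be called as `summarize_sources(response["source_documents"])` right after `rag_pipeline.invoke(query)`.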


In [11]:
query = "who author this book"
response = rag_pipeline.invoke(query)

print("Answer:")
print(response["result"])

Answer:
I don't know, as I couldn't find any information about the author of this text.


In [12]:
query = "this book explain the lstm and rnn"
response = rag_pipeline.invoke(query)

print("Answer:")
print(response["result"])

Answer:
I don't know whether the question refers to "LSTM" (Long Short-Term Memory) or "RNN" (Recurrent Neural Network), as they are often used interchangeably in popular contexts. However, based on the provided text, it appears that the book focuses primarily on RNN architectures and their applications.


In [13]:
query = "who best naive bays or transformer"
response = rag_pipeline.invoke(query)

print("Answer:")
print(response["result"])

Answer:
I don't know, I'm not familiar with the specific context of this question and don't want to provide an incorrect answer. Can you please provide more information or clarify which part of the text you are referring to? I'll do my best to help.


In [14]:
query = "In this book what the chapter number for  vector semantic"
response = rag_pipeline.invoke(query)

print("Answer:")
print(response["result"])

Answer:
I don't know. The question is about the concept of "document matrix" and how it relates to vector semantics in the context of NLP, but I'm not provided with enough information to answer the specific question about the chapter number for "vector semantic".


In [15]:
query = "what the transformer use case"
response = rag_pipeline.invoke(query)

print("Answer:")
print(response["result"])

Answer:
I don't know. The text does not explicitly state a particular use case for transformers, but it does mention that they can be used to build language models and that they have a wide context window, which could be used for various applications such as machine translation or text summarization. However, without more information or context, it is difficult to determine a specific use case.


In [16]:
query = "what the naive bays use case"
response = rag_pipeline.invoke(query)

print("Answer:")
print(response["result"])

Answer:
I don't know, since this question requires specific knowledge of the Multinomial Naive Bayes classiﬁer and its application to text classification tasks.


In [17]:
query = "what the self attention"
response = rag_pipeline.invoke(query)

print("Answer:")
print(response["result"])

Answer:
The final set of equations for computing self-attention in a single self-attention output vector ai from a single input vector xi is as follows:

1. score(xi, xj) = qi · kj√dk (9. 11)
2. αi j = softmax(score(xi, xj)) [UNK] ≤ i (9. 12)
3. headi = [UNK] j≤i αi jvj (9. 13)

These equations calculate the attention score for each element ai in the output vector by summing the products of the query and key values and taking the exponential of half of these sums, which is equivalent to applying softmax to the weighted dot product.

And the range over the entire input, as shown in fig. 11.1b:

a) A causal self-attention layer
b) A bidirectional self-attention layer


In [19]:
query = "what the mask in transformer"
response = rag_pipeline.invoke(query)

print("Answer:")
print(response["result"])

Answer:
I don't know. The text does not explicitly state what the mask in a transformer block is or how it is calculated. It only mentions that "the masked language model training" includes selecting three input tokens and replacing one with an unrelated word, but it does not provide information about the mask itself.
