In [1]:
import os
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import SentenceTransformersTokenTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_mistralai import ChatMistralAI
from langchain.chains import RetrievalQA
from utils import clean
from dotenv import load_dotenv
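
The `clean` helper imported from `utils` is not shown in this notebook, so its real behavior is unknown. As a purely hypothetical stand-in (the name `clean_sketch` and both rules are assumptions, not the author's code), a PDF text cleaner might re-join words hyphenated across line breaks and collapse whitespace:

```python
import re


def clean_sketch(text):
    """Hypothetical stand-in for utils.clean; the real helper may differ."""
    # Re-join words that PDF extraction split with a hyphen at a line break.
    text = re.sub(r"-\n(?=\w)", "", text)
    # Collapse runs of spaces, tabs, and newlines into single spaces.
    text = re.sub(r"\s+", " ", text)
    return text.strip()


print(clean_sketch("neural net-\nworks   are\n\tuseful "))
```

Any function with the same signature (string in, string out) could be swapped in for `clean` in the loading cell.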

In [2]:
load_dotenv()

True

In [3]:
docs_dir = "../../documents"  # renamed from dir to avoid shadowing the built-in dir()
documents_path = [os.path.join(docs_dir, file) for file in os.listdir(docs_dir)]
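
`os.listdir` returns every entry in the folder, so a stray non-PDF file (a `.DS_Store`, a text note) would be handed to `PyPDFLoader` in the next cell. A minimal sketch of filtering the listing first; `select_pdf_paths` is a hypothetical helper introduced here for illustration:

```python
import os


def select_pdf_paths(directory, filenames):
    """Keep only names ending in .pdf (case-insensitive) and return sorted full paths."""
    return sorted(
        os.path.join(directory, name)
        for name in filenames
        if name.lower().endswith(".pdf")
    )


# Example with a made-up listing; in the notebook the second argument
# would be os.listdir("../../documents").
print(select_pdf_paths("docs", ["b.pdf", "notes.txt", "a.PDF", ".DS_Store"]))
```

Sorting also makes the load order deterministic, since `os.listdir` returns entries in arbitrary order.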

In [4]:
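documents = []
for file in documents_path:
    # Load one PDF; PyPDFLoader returns one Document per page
    loader = PyPDFLoader(file)
    loaded_docs = loader.load()

    # Clean each page's text with the project's clean() helper
    for doc in loaded_docs:
        doc.page_content = clean(doc.page_content)

    documents.extend(loaded_docs)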

In [5]:
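# Token counts come from the splitter's own tokenizer (its default model unless model_name is passed),
# which is not necessarily the tokenizer of the embedding model used below.
text_splitter = SentenceTransformersTokenTextSplitter(tokens_per_chunk=100, chunk_overlap=10)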
chunks = text_splitter.split_documents(documents)
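
The splitter above cuts each document into windows of at most 100 tokens, with 10 tokens shared between neighbouring windows. A minimal sketch of that sliding-window idea on a plain list of token ids; `chunk_tokens` is a hypothetical helper written for illustration, not the library's implementation:

```python
def chunk_tokens(tokens, tokens_per_chunk=100, chunk_overlap=10):
    """Split a token list into overlapping windows (illustrative sketch)."""
    if chunk_overlap >= tokens_per_chunk:
        raise ValueError("chunk_overlap must be smaller than tokens_per_chunk")
    step = tokens_per_chunk - chunk_overlap  # how far each window advances
    chunks = []
    start = 0
    while start < len(tokens):
        end = min(start + tokens_per_chunk, len(tokens))
        chunks.append(tokens[start:end])
        if end == len(tokens):
            break  # the last window reached the end of the input
        start += step
    return chunks


windows = chunk_tokens(list(range(250)))
print([len(w) for w in windows])
```

With 250 tokens the windows start at positions 0, 90, and 180, so the last 10 tokens of each window reappear at the start of the next one. The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk.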

In [6]:
embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
# embeddings = embedding_model.embed_documents([chunk.page_content for chunk in chunks])

In [7]:
vectorstore = FAISS.from_documents(documents=chunks, embedding=embedding_model)
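
`FAISS.from_documents` embeds every chunk and stores the vectors in an index; at query time the retriever embeds the question and looks up its nearest neighbours. A brute-force sketch of that lookup, assuming Euclidean (L2) distance, which appears to be the wrapper's default. The helper `nearest_chunks` and the toy 2-D vectors are illustrative only:

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_chunks(query_vec, chunk_vecs, k=4):
    """Return indices of the k chunk vectors closest to the query (smallest distance first)."""
    ranked = sorted(range(len(chunk_vecs)), key=lambda i: l2_distance(query_vec, chunk_vecs[i]))
    return ranked[:k]


toy_vectors = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
print(nearest_chunks([0.9, 0.9], toy_vectors, k=2))
```

A real index avoids scanning every vector for large collections, but the ranking it approximates is the one sketched here.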

In [13]:
retriever = vectorstore.as_retriever()
llm = ChatMistralAI(model="mistral-medium-latest", temperature=0.8, max_retries=2)

rag_pipeline = RetrievalQA.from_chain_type(llm=llm, retriever=retriever, return_source_documents=True)
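
Because the chain is built with `return_source_documents=True`, each response should carry the retrieved chunks under a `"source_documents"` key alongside `"result"`. The cells below print only the answer, so here is a small sketch for listing where an answer came from. `format_sources` is a hypothetical helper, and the `source` and `page` metadata keys are assumed to be the ones `PyPDFLoader` sets:

```python
from types import SimpleNamespace


def format_sources(response, max_chars=80):
    """Return one 'file (page N): snippet' line per retrieved chunk."""
    lines = []
    for doc in response.get("source_documents", []):
        source = doc.metadata.get("source", "unknown")
        page = doc.metadata.get("page", "?")
        # Flatten whitespace so each snippet fits on one line.
        snippet = " ".join(doc.page_content.split())[:max_chars]
        lines.append(f"{source} (page {page}): {snippet}")
    return lines


# Stand-in response for demonstration; a real one comes from rag_pipeline.invoke(query).
demo_response = {
    "result": "...",
    "source_documents": [
        SimpleNamespace(
            page_content="Feedforward  networks\nare",
            metadata={"source": "book.pdf", "page": 3},
        )
    ],
}
print("\n".join(format_sources(demo_response)))
```

Checking the sources is a quick way to judge whether an answer is grounded in the right passage; page numbers from the loader are typically zero-based.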

In [14]:
query = "How are neural networks used in NLP?"
response = rag_pipeline.invoke(query)

print("Answer:")
print(response["result"])

Answer:
The provided context discusses the application of feedforward networks and recurrent neural networks (RNNs) in natural language processing (NLP) tasks, particularly for sentiment analysis and language modeling. Here are some key points on how neural networks are used in NLP based on the context:

1. **Feedforward Networks for Classification**:
   - **Sentiment Analysis**: Feedforward networks can be used for classification tasks like sentiment analysis. This involves using traditional hand-built features of the input text, such as lexicon words and word counts, to determine the sentiment of a given text.

2. **Neural Language Modeling**:
   - **Fixed Context Input**: Feedforward neural language models use a fixed context input to a weight matrix. This means that the network takes a specific number of previous tokens (words or characters) as input to predict the next token.
   - **RNNs for Sequential Data**: RNNs are particularly useful for processing sequential data. They can h

In [15]:
query = "Who is the author of this book?"
response = rag_pipeline.invoke(query)

print("Answer:")
print(response["result"])

Answer:
The book "Selected Papers of J. R. Firth" was authored by J. R. Firth. It was published by Longman in Harlow in 1968.


In [16]:
query = "Does this book explain LSTMs and RNNs?"
response = rag_pipeline.invoke(query)

print("Answer:")
print(response["result"])

Answer:
Yes, the provided context explains both Long Short-Term Memory (LSTM) networks and Recurrent Neural Networks (RNNs). Here are some key points covered:

1. **RNNs (Recurrent Neural Networks):**
   - Basic neural units used in feedforward and simple recurrent networks.
   - Bidirectional RNNs are mentioned, which process sequences in both forward and backward directions to capture more context.

2. **LSTMs (Long Short-Term Memory):**
   - LSTMs are an advanced type of RNN unit.
   - They have become the standard for modern systems using recurrent networks due to their effectiveness in capturing long-term dependencies.
   - The context mentions that LSTMs are trained using backpropagation and are more commonly used than basic RNNs in practice.

The text also discusses the application of these architectures in various tasks and summarizes common RNN and LSTM architectures used in Natural Language Processing (NLP).


In [17]:
query = "Which is better, naive Bayes or a transformer?"
response = rag_pipeline.invoke(query)

print("Answer:")
print(response["result"])

Answer:
The choice between Naive Bayes and Transformers depends on the specific task and context:

1. **Naive Bayes**:
   - **Pros**: Simple to implement, fast to train, and can perform well on small datasets or short documents. It is also efficient for larger documents or datasets and often makes correct classification decisions despite less accurate probabilities.
   - **Cons**: Makes strong independence assumptions between features, which may not hold true in many real-world scenarios. Generally less accurate than more complex models like logistic regression or Transformers.

2. **Transformers**:
   - **Pros**: Highly accurate and effective for a wide range of tasks, especially in natural language processing (NLP). They can capture complex patterns and dependencies in data due to their multi-head attention mechanisms.
   - **Cons**: More computationally intensive and require significant resources for training and fine-tuning. They are also more complex to implement and understand co

In [18]:
query = "In this book, what is the chapter number for vector semantics?"
response = rag_pipeline.invoke(query)

print("Answer:")
print(response["result"])

Answer:
The provided context does not specify the chapter number for vector semantics. It only mentions that vector semantics is discussed in the text, but the exact chapter number is not given. Therefore, I cannot provide the chapter number based on the information available.


In [19]:
query = "What are the use cases for transformers?"
response = rag_pipeline.invoke(query)

print("Answer:")
print(response["result"])

Answer:
The context provided describes the architecture and components of a transformer model, particularly focusing on its use in language modeling. Here are the key use cases for transformers based on the given context:

1. **Language Modeling**: Transformers are used to predict the next token in a sequence. This involves encoding input tokens, passing them through stacked transformer blocks, and using a language model head to generate logits for predicting the next token.

2. **Self-Attention Mechanisms**: Transformers utilize multi-head attention, a form of self-attention, to process input vectors. This mechanism allows the model to weigh the relevance of prior tokens when processing the current token, making it highly effective for tasks requiring contextual understanding.

3. **Sequence Processing**: Transformers are designed to handle sequences of data, making them suitable for tasks involving sequential input, such as text generation, translation, and other natural language pro

In [20]:
query = "What are the use cases for naive Bayes?"
response = rag_pipeline.invoke(query)

print("Answer:")
print(response["result"])

Answer:
Naive Bayes has several use cases, particularly in text classification and scenarios involving smaller datasets or documents. Here are some key use cases based on the provided context:

1. **Text Classification**: Naive Bayes is commonly used for classifying text documents. It can efficiently handle tasks like spam detection, sentiment analysis, and topic classification.

2. **Small Datasets**: Naive Bayes performs well on very small datasets, sometimes even better than more complex models like logistic regression.

3. **Short Documents**: It is effective for classifying short documents, making it suitable for applications like email filtering or social media post categorization.

4. **Speed and Simplicity**: Naive Bayes is easy to implement and very fast to train because it lacks an optimization step, making it a good choice for applications where training speed and simplicity are important.

5. **Baseline Model**: Due to its simplicity and speed, Naive Bayes is often used as 