In [1]:
import os
base_project_path = os.getcwd().split(os.environ['PROJECT_NAME'])[0] + os.environ['PROJECT_NAME'] 

from langchain.schema import AIMessage, HumanMessage, SystemMessage
from langchain.chat_models import ChatOpenAI
from langchain.llms import OpenAI
from langchain import FAISS

from langchain.prompts import PromptTemplate

## Chat Conversation

In [2]:
# Chat Model
chat = ChatOpenAI(openai_api_key=os.environ["OPENAI_API_KEY"], 
                  model_name="gpt-3.5-turbo", 
                  temperature=0.3)

# Messages
messages = [
    SystemMessage(content="You are an expert in data science and machine learning. Answer the following questions with a detailed technical answer:"),
    HumanMessage(content="What are exploding/vanishing gradients?")
]

# Chat
response=chat(messages)
print(response.content,end='\n')

Exploding and vanishing gradients are common issues that can occur during the training of deep neural networks, particularly in recurrent neural networks (RNNs). These issues arise due to the nature of backpropagation, where gradients are propagated backward through the network to update the weights.

Exploding gradients refer to the situation when the gradients grow exponentially as they are propagated backward through the network layers. This can lead to large updates to the weights, causing instability in the learning process. In extreme cases, the weights can become so large that they overflow, resulting in NaN (Not a Number) values and rendering the model useless.

On the other hand, vanishing gradients occur when the gradients diminish exponentially as they are propagated backward through the network layers. This means that the updates to the weights become very small, and the network fails to learn effectively. The problem becomes more pronounced in deep networks with many layer

## Chat Conversational Memory

- ConversationBufferMemory    
- ConversationSummaryMemory
- ConversationBufferWindowMemory
- ConversationSummaryBufferMemory
- ConversationKGMemory (knowledge graph memory)
- ConversationEntityMemory
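
To make the difference between `ConversationBufferMemory` and `ConversationBufferWindowMemory` concrete, here is a minimal plain-Python sketch. It is not LangChain's implementation: the class names `ToyBufferMemory` and `ToyWindowMemory` are hypothetical, and the only behavior assumed is that a buffer memory replays the whole transcript into the prompt while a window memory keeps just the last `k` exchanges.

```python
from collections import deque


class ToyBufferMemory:
    """Hypothetical stand-in for a buffer memory: keeps every exchange."""

    def __init__(self):
        self.turns = []

    def save(self, human, ai):
        # Store one (human message, AI reply) exchange.
        self.turns.append((human, ai))

    def history(self):
        # Flatten the stored exchanges into the text that would be sent with the next prompt.
        return "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.turns)


class ToyWindowMemory(ToyBufferMemory):
    """Hypothetical stand-in for a window memory: keeps only the last k exchanges."""

    def __init__(self, k):
        # A bounded deque silently drops the oldest exchange once k is exceeded.
        self.turns = deque(maxlen=k)


buffer_mem = ToyBufferMemory()
window_mem = ToyWindowMemory(k=1)
for human, ai in [("My name is John!", "Hello John!"),
                  ("What is 2 + 2?", "4")]:
    buffer_mem.save(human, ai)
    window_mem.save(human, ai)

print(buffer_mem.history())
print("---")
print(window_mem.history())
```

With `k=1` the window memory has already forgotten the name, which is the trade-off: fewer tokens per request in exchange for a shorter recall horizon.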



In [14]:
from langchain.chains import ConversationChain
from langchain.chains.conversation.memory import ConversationBufferMemory

model = ChatOpenAI(openai_api_key=os.environ["OPENAI_API_KEY"], model_name="gpt-3.5-turbo", temperature=0.3)

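# Chain without an explicit memory argument. ConversationChain falls back to
# ConversationBufferMemory by default, so the history is still kept (see the output).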
chain = ConversationChain(llm=model)
print(chain("My name is John!"))
print(chain("What is my name?"))

{'input': 'My name is John!', 'history': '', 'response': "Hello John! It's nice to meet you. How can I assist you today?"}
{'input': 'What is my name?', 'history': "Human: My name is John!\nAI: Hello John! It's nice to meet you. How can I assist you today?", 'response': 'Your name is John.'}


In [9]:
# Chain with buffer memory
chain_buff = ConversationChain(
    llm=model,
    memory=ConversationBufferMemory()
)
print(chain_buff("Good morning AI, my name is John!"))
print(chain_buff("What is my name?"))

{'input': 'Good morning AI, my name is John!', 'history': '', 'response': "Good morning John! It's nice to meet you. How can I assist you today?"}
{'input': 'What is my name?', 'history': "Human: Good morning AI, my name is John!\nAI: Good morning John! It's nice to meet you. How can I assist you today?", 'response': 'Your name is John.'}


In [15]:
from langchain.callbacks import get_openai_callback

def count_tokens(chain, query):
    with get_openai_callback() as cb:
        result = chain.run(query)
        print(f'Spent a total of {cb.total_tokens} tokens')

    return result

count_tokens(chain_buff, "My name is John!")

Spent a total of 137 tokens


'Yes, I know. Your name is John.'

## RAG Model

### load_qa_chain

In [3]:
import PyPDF2
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain import FAISS

# Read the text of every page with PyPDF2
def read_pdf(filename):
    with open(filename, 'rb') as f:
        pdf_reader = PyPDF2.PdfReader(f)
        text = ''
        for page in pdf_reader.pages:
            text += page.extract_text()
    return text

# Build the path with os.path.join so it also works outside Windows
pdf_path = os.path.join(base_project_path, 'data', 'raw', 'ml_books', 'NIPS-2017-attention-is-all-you-need-Paper.pdf')
text = read_pdf(pdf_path)

# Split in chunks using CharacterTextSplitter
text_splitter = CharacterTextSplitter(
    separator="\n",
    chunk_size=1000,
    chunk_overlap=200,
    length_function=len
)
all_splits = text_splitter.split_text(text)

# Convert the chunks of text into embeddings to form a knowledge base
embeddings = OpenAIEmbeddings()
knowledgeBase = FAISS.from_texts(all_splits, embeddings)
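
The effect of `chunk_size` and `chunk_overlap` can be illustrated with a small plain-Python sketch. This is a simplification and an assumption on my part, not how `CharacterTextSplitter` works internally: the real splitter first splits on the `separator` and then merges pieces up to the size limit, whereas the hypothetical `chunk_with_overlap` below just slides a fixed character window.

```python
def chunk_with_overlap(text, chunk_size, chunk_overlap):
    """Slice text into character windows; consecutive windows share `chunk_overlap` characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    # Each new window starts this many characters after the previous one.
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


sample = "abcdefghij"
chunks = chunk_with_overlap(sample, chunk_size=4, chunk_overlap=2)
print(chunks)
```

The overlap means the end of one chunk is repeated at the start of the next, so a sentence cut at a chunk boundary still appears whole in at least one chunk.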

In [4]:
# Create the model
from langchain.chains.question_answering import load_qa_chain

# Use OpenAI as the LLM
model = ChatOpenAI(
    openai_api_key=os.environ["OPENAI_API_KEY"],
    model_name="gpt-3.5-turbo",
    temperature=0.3,
)

chain = load_qa_chain(llm=model, chain_type="stuff")

question = "Explain optimizer for the text"
docs = knowledgeBase.similarity_search(question)

chain.run(input_documents=docs, question=question)

"The optimizer used in the described model is Adam optimizer. Adam is a popular optimization algorithm that is commonly used in deep learning. It is known for its efficiency and ability to handle large datasets.\n\nThe specific hyperparameters used for Adam in this model are:\n- β1 = 0.9\n- β2 = 0.98\n- ɛ = 10^-9\n\nThe learning rate is varied during training according to a specific formula:\nlrate = d * (step_num^(-0.5)) * min(step_num^(-0.5), step_num^(-1.5) * warmup_steps^(-0.5))\n\nIn this formula, d is a constant, step_num represents the current training step, and warmup_steps is a predefined value of 4000.\n\nThis learning rate schedule increases the learning rate linearly for the first warmup_steps training steps and then decreases it proportionally to the inverse square root of the step number.\n\nOverall, the Adam optimizer with this learning rate schedule helps in efficiently optimizing the model's parameters during training."
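
The learning-rate formula as written in the model's answer does not quite match equation (3) of the paper, which is lrate = d_model^(-0.5) * min(step_num^(-0.5), step_num * warmup_steps^(-1.5)). A short sketch of that schedule follows; the function name `transformer_lrate` is hypothetical, and `d_model=512` is the base model's dimension from the paper.

```python
def transformer_lrate(step_num, d_model=512, warmup_steps=4000):
    """Learning rate from equation (3) of 'Attention Is All You Need' (step_num >= 1)."""
    return d_model ** -0.5 * min(step_num ** -0.5, step_num * warmup_steps ** -1.5)


# Linear warm-up until step 4000, then inverse-square-root decay.
for step in (1, 1000, 4000, 16000):
    print(step, transformer_lrate(step))
```

The two arguments of `min` are equal at `step_num == warmup_steps`, which is where the rate peaks.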

### RetrievalQA

In [13]:
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Load the PDF with PyPDFLoader (os.path.join keeps the path portable)
pdf_path = os.path.join(base_project_path, 'data', 'raw', 'ml_books', 'NIPS-2017-attention-is-all-you-need-Paper.pdf')
loader = PyPDFLoader(pdf_path)
text = loader.load()

# Split into chunks. CharacterTextSplitter is used below; the commented-out
# RecursiveCharacterTextSplitter is an alternative.
# text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=20)
# all_splits = text_splitter.split_documents(text)

text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=20)
all_splits = text_splitter.split_documents(text)

In [14]:
from langchain.vectorstores import Chroma
# Convert the chunks of text into embeddings to form a knowledge base
embeddings = OpenAIEmbeddings()

# knowledgeBase = FAISS.from_texts(all_splits, embeddings)

persist_directory = 'docs/chroma/'
knowledgeBase = Chroma.from_documents(documents=all_splits,
                                      embedding=embeddings,
                                      persist_directory=persist_directory)

In [16]:
# Use OpenAI as the LLM
model = ChatOpenAI(
    openai_api_key=os.environ["OPENAI_API_KEY"],
    model_name="gpt-3.5-turbo",
    temperature=0.3,
)

from langchain.chains import RetrievalQA

qa_chain = RetrievalQA.from_chain_type(model,
                                       retriever=knowledgeBase.as_retriever())

question = "Explain optimizer for the text"
result = qa_chain({"query": question})

result

{'query': 'Explain optimizer for the text',
 'result': "The optimizer used in the text is called Adam. Adam is a popular optimization algorithm commonly used in deep learning. It is an extension of stochastic gradient descent (SGD) that combines the advantages of both adaptive learning rate methods and momentum methods.\n\nIn the text, the specific hyperparameters used for Adam are β1= 0.9, β2= 0.98, and ϵ= 10−9. These values determine the exponential decay rates for the first and second moments of the gradients, as well as a small constant added to the denominator to prevent division by zero.\n\nThe learning rate is varied over the course of training using a formula described in equation (3) in the text. The learning rate (lrate) is determined by the value of d, the step number (step_num), and the warmup_steps. The warmup_steps is set to 4000.\n\nThe formula for the learning rate increases it linearly for the first warmup_steps training steps and then decreases it proportionally to th
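
What the retriever contributes can be sketched in plain Python: embed the question, rank the stored chunks by similarity, and hand the top `k` to the LLM. Everything below is illustrative only. The three-dimensional vectors are made up, the helper names are hypothetical, and real vector stores such as Chroma or FAISS may use a different distance metric and an optimized index.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed_chunks, k=2):
    """Return the texts of the k chunks most similar to the query vector."""
    ranked = sorted(indexed_chunks,
                    key=lambda item: cosine_similarity(query_vec, item[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy (text, embedding) pairs; the embeddings are invented for illustration.
toy_index = [
    ("We used the Adam optimizer", [0.9, 0.1, 0.0]),
    ("The encoder has six layers", [0.1, 0.9, 0.0]),
    ("We varied the learning rate", [0.8, 0.2, 0.1]),
]
query_vec = [1.0, 0.0, 0.0]
retrieved = top_k(query_vec, toy_index, k=2)
print(retrieved)
```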

### VectorstoreIndexCreator

In [17]:
from langchain.indexes import VectorstoreIndexCreator
# Load the PDF with PyPDFLoader (os.path.join keeps the path portable)
pdf_path = os.path.join(base_project_path, 'data', 'raw', 'ml_books', 'NIPS-2017-attention-is-all-you-need-Paper.pdf')
loader = PyPDFLoader(pdf_path)

# Use VectorstoreIndexCreator to generate the chain
index = VectorstoreIndexCreator(

    text_splitter=CharacterTextSplitter(chunk_size=1000, chunk_overlap=0),

    embedding=OpenAIEmbeddings(),

    vectorstore_cls=Chroma
    
).from_loaders([loader])

query = "Explain optimizer for the text"

index.query(llm=model, question=query, chain_type="stuff")

'The optimizer used in the Transformer model is called Adam. Adam stands for Adaptive Moment Estimation and it is a popular optimization algorithm for training neural networks.\n\nAdam combines the ideas of two other optimization algorithms: AdaGrad and RMSprop. It maintains a running average of the past gradients and squared gradients, and uses these averages to update the parameters of the model.\n\nThe Adam optimizer uses two main hyperparameters: beta1 and beta2. Beta1 controls the decay rate of the running average of the gradients, while beta2 controls the decay rate of the running average of the squared gradients.\n\nDuring training, the learning rate is varied according to a schedule. In the case of the Transformer model, the learning rate is increased linearly for the first warmup_steps training steps, and then decreased proportionally to the inverse square root of the step number.\n\nOverall, the Adam optimizer helps to efficiently update the parameters of the Transformer mode

### ConversationalRetrievalChain

In [19]:
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Load the PDF with PyPDFLoader (os.path.join keeps the path portable)
pdf_path = os.path.join(base_project_path, 'data', 'raw', 'ml_books', 'NIPS-2017-attention-is-all-you-need-Paper.pdf')
loader = PyPDFLoader(pdf_path)
text = loader.load()

# Split into chunks. CharacterTextSplitter is used below; the commented-out
# RecursiveCharacterTextSplitter is an alternative.
# text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=20)
# all_splits = text_splitter.split_documents(text)

text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=20)
all_splits = text_splitter.split_documents(text)

In [20]:
from langchain.vectorstores import Chroma
# Convert the chunks of text into embeddings to form a knowledge base
embeddings = OpenAIEmbeddings()

# knowledgeBase = FAISS.from_texts(all_splits, embeddings)

persist_directory = 'docs/chroma/'
knowledgeBase = Chroma.from_documents(documents=all_splits,
                                      embedding=embeddings,
                                      persist_directory=persist_directory)

In [21]:
# Use OpenAI as the LLM
model = ChatOpenAI(
    openai_api_key=os.environ["OPENAI_API_KEY"],
    model_name="gpt-3.5-turbo",
    temperature=0.3,
)

from langchain.chains import ConversationalRetrievalChain

qa = ConversationalRetrievalChain.from_llm(model, knowledgeBase.as_retriever())

In [22]:
chat_history = []
query = "Explain the transformer model archtiecture"
result = qa({"question": query, "chat_history": chat_history})
result["answer"]

'The Transformer model architecture consists of two main components: the encoder and the decoder.\n\nThe encoder is composed of a stack of identical layers. Each layer has two sub-layers: a multi-head self-attention mechanism and a position-wise fully connected feed-forward network. The self-attention mechanism allows the encoder to weigh the importance of different words in the input sequence when encoding information. The feed-forward network helps to capture complex relationships between words. Each sub-layer is followed by a residual connection and layer normalization. The output of each sub-layer is then added to the input, and the result is normalized.\n\nThe decoder also consists of a stack of identical layers. In addition to the two sub-layers present in each encoder layer, the decoder inserts a third sub-layer. This additional sub-layer performs multi-head attention over the output of the encoder stack. This allows the decoder to focus on different parts of the input sequence 

In [23]:
chat_history = [(query, result["answer"])]
query = "Explain optimizer for the text"
result = qa({"question": query, "chat_history": chat_history})
result['answer']

"The optimizer plays a crucial role in text processing by adjusting the model's parameters during training to minimize the loss function. It determines how the model learns and updates its weights based on the gradients calculated during backpropagation. In the given context, the Adam optimizer with specific hyperparameters (β1, β2, and ϵ) was used to train the models. The learning rate was also varied over the course of training using a formula that increased it linearly for the first warmup_steps training steps and then decreased it proportionally to the inverse square root of the step number. This optimization technique helps the model converge to better performance and improve its ability to generate accurate and meaningful text."

In [24]:
chat_history

[('Explain the transformer model archtiecture',
  'The Transformer model architecture consists of two main components: the encoder and the decoder.\n\nThe encoder is composed of a stack of identical layers. Each layer has two sub-layers: a multi-head self-attention mechanism and a position-wise fully connected feed-forward network. The self-attention mechanism allows the encoder to weigh the importance of different words in the input sequence when encoding information. The feed-forward network helps to capture complex relationships between words. Each sub-layer is followed by a residual connection and layer normalization. The output of each sub-layer is then added to the input, and the result is normalized.\n\nThe decoder also consists of a stack of identical layers. In addition to the two sub-layers present in each encoder layer, the decoder inserts a third sub-layer. This additional sub-layer performs multi-head attention over the output of the encoder stack. This allows the decoder 

In [25]:
# Append the (question, answer) tuple itself, not a list wrapping it
chat_history.append((query, result["answer"]))

In [27]:
chat_history

[('Explain the transformer model archtiecture',
  'The Transformer model architecture consists of two main components: the encoder and the decoder.\n\nThe encoder is composed of a stack of identical layers. Each layer has two sub-layers: a multi-head self-attention mechanism and a position-wise fully connected feed-forward network. The self-attention mechanism allows the encoder to weigh the importance of different words in the input sequence when encoding information. The feed-forward network helps to capture complex relationships between words. Each sub-layer is followed by a residual connection and layer normalization. The output of each sub-layer is then added to the input, and the result is normalized.\n\nThe decoder also consists of a stack of identical layers. In addition to the two sub-layers present in each encoder layer, the decoder inserts a third sub-layer. This additional sub-layer performs multi-head attention over the output of the encoder stack. This allows the decoder 
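
The history passed to the chain is a flat list of `(question, answer)` tuples, as in `chat_history = [(query, result["answer"])]` above. As a rough sketch of why that shape matters, the hypothetical helper below flattens such a list into a transcript, which is roughly the kind of text a conversational chain could place in front of a follow-up question. The exact prompt format LangChain uses is not shown here, and the answers are shortened placeholders.

```python
def format_chat_history(chat_history):
    """Flatten (question, answer) tuples into a plain-text transcript."""
    lines = []
    for question, answer in chat_history:  # each entry must be a 2-tuple
        lines.append(f"Human: {question}")
        lines.append(f"Assistant: {answer}")
    return "\n".join(lines)


chat_history = []
# Placeholder answers, shortened for illustration.
chat_history.append(("Explain the transformer model architecture",
                     "It has an encoder and a decoder."))
chat_history.append(("Explain optimizer for the text",
                     "Adam was used."))
transcript = format_chat_history(chat_history)
print(transcript)
```

An entry that is not a two-element pair, for example a list wrapping the tuple, would fail at the unpacking step in the loop.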