# Natural Language Processing

# Retrieval-augmented generation (RAG)

RAG is a technique for augmenting LLM knowledge with additional, often private or real-time, data.

LLMs can reason about a wide range of topics, but their knowledge is limited to the public data available up to their training cutoff date. To build AI applications that can reason about private data, or about data introduced after a model's cutoff date, you need to augment the model's knowledge with the specific information it needs.

<img src="../figures/RAG-process.png" >

This notebook introduces `ChakyBot`, a chatbot designed to help Chaky (the instructor) and Gun (the TA) explain the NLP course to students. ChakyBot uses LangChain to retrieve information from course documents and answer questions about them. It is built from four components:

1. Prompt
2. Retrieval
3. Memory
4. Chain

In [1]:
# #langchain library
# !pip install langchain==0.0.350
# #LLM
# !pip install accelerate==0.25.0
# !pip install transformers==4.36.2
# !pip install bitsandbytes==0.41.2
# #Text Embedding
# !pip install sentence-transformers==2.2.2
# !pip install InstructorEmbedding==1.0.1
# #vectorstore
# !pip install faiss-gpu==1.7.2
# !pip install faiss-cpu==1.7.4

In [2]:
import os
import torch
# Set GPU device
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

os.environ['http_proxy']  = 'http://192.41.170.23:3128'
os.environ['https_proxy'] = 'http://192.41.170.23:3128'

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device

## 1. Prompt

A prompt is a set of instructions or input provided by a user to guide the model's response. It helps the model understand the context and generate relevant, coherent output, such as answering questions, completing sentences, or engaging in a conversation.
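As a rough illustration of what a prompt template does, the sketch below fills named placeholders using only Python's standard library. The helpers `template_variables` and `fill_template` and the sample text are hypothetical, introduced here for illustration; LangChain's `PromptTemplate` builds further features on top of this basic idea.

```python
from string import Formatter

def template_variables(template: str) -> list:
    """Return the placeholder names found in a format-style template."""
    return [name for _, name, _, _ in Formatter().parse(template) if name]

def fill_template(template: str, **values: str) -> str:
    """Fill every placeholder, failing loudly if a value is missing."""
    missing = [v for v in template_variables(template) if v not in values]
    if missing:
        raise KeyError(f"missing prompt variables: {missing}")
    return template.format(**values)

# Hypothetical template with the same two variables used in this notebook.
toy_template = "Context: {context}\nQuestion: {question}\nAnswer:"

toy_prompt = fill_template(
    toy_template,
    context="NLP studies how computers process human language.",
    question="What is NLP?",
)
print(toy_prompt)
```

Listing the variables up front is what lets a template report a missing input before any model is called.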

In [53]:
from langchain.prompts import PromptTemplate

prompt_template = """
    I'm your friendly NLP chatbot named ChakyBot, here to assist Chaky and Gun with any questions they have about Natural Language Processing (NLP). 
    If you're curious about how probability works in the context of NLP, feel free to ask any questions you may have. 
    Whether it's about probabilistic models, language models, or any other related topic, 
    I'm here to help break down complex concepts into easy-to-understand explanations.
    Just let me know what you're wondering about, and I'll do my best to guide you through it!
    {context}
    Question: {question}
    Answer:
    """.strip()

PROMPT = PromptTemplate(
    template = prompt_template, 
    input_variables=["context", "question"]
)

# When encountering abusive, offensive, or harmful language, just politely ask the users to maintain appropriate behaviours.
# Always make sure to elaborate your response and use vibrant, positive tone to represent good branding of the school.
# Never answer with any unfinished response.

In [54]:
PROMPT.format_prompt(
    context = 'Machine learning (ML) is a field of study in artificial intelligence concerned with the development and study of statistical algorithms that can effectively generalize and thus perform tasks without explicit instructions.',
    question = 'What is Machine Learning')

StringPromptValue(text="I'm your friendly NLP chatbot named ChakyBot, here to assist Chaky and Gun with any questions they have about Natural Language Processing (NLP). \n    If you're curious about how probability works in the context of NLP, feel free to ask any questions you may have. \n    Whether it's about probabilistic models, language models, or any other related topic, \n    I'm here to help break down complex concepts into easy-to-understand explanations.\n    Just let me know what you're wondering about, and I'll do my best to guide you through it!\n    Machine learning (ML) is a field of study in artificial intelligence concerned with the development and study of statistical algorithms that can effectively generalize and thus perform tasks without explicit instructions.\n    Question: What is Machine Learning\n    Answer:")

## 2. Retrieval

1. `Document loaders` : Load documents from many different sources (HTML, PDF, code).
2. `Document transformers` : Break large documents into smaller, relevant chunks, an essential step that improves retrieval.
3. `Text embedding models` : Embed text as vectors that capture its semantic meaning, so that similar pieces of text can be found quickly and efficiently.
4. `Vector stores` : Databases that support efficient storage and search of these embeddings.
5. `Retrievers` : Query the stored data to fetch the chunks relevant to a question.
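Steps 3 to 5 can be illustrated with a toy example. The sketch below substitutes word counts for a learned embedding model and a plain Python list for a vector store. Both are simplifying assumptions, and the names `embed`, `cosine`, and `ToyVectorStore` are hypothetical. It only shows the idea of ranking stored chunks by cosine similarity to a query vector.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector (not a learned model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

class ToyVectorStore:
    """Keeps (vector, text) pairs in memory and searches them exhaustively."""

    def __init__(self, texts):
        self.entries = [(embed(t), t) for t in texts]

    def search(self, query: str, k: int = 2):
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[0]), reverse=True)
        return [text for _, text in ranked[:k]]

store = ToyVectorStore([
    "Dependency parsing finds head-dependent relations between words.",
    "Transformers use self-attention over sequences of input vectors.",
    "N-gram language models estimate word probabilities from counts.",
])
top = store.search("what is dependency parsing", k=1)
print(top)
```

A real vector store replaces the exhaustive scan with an index so that search stays fast as the number of chunks grows.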

### 2.1 Document Loaders 

[PDF Loader](https://python.langchain.com/docs/modules/data_connection/document_loaders/pdf)

[Download Document](https://web.stanford.edu/~jurafsky/slp3/)

In [5]:
# !pip install pymupdf

In [6]:
from langchain.document_loaders import PyMuPDFLoader

nlp_document = '../docs/pdf/SpeechandLanguageProcessing_3rd_07jan2023.pdf'
loader = PyMuPDFLoader(nlp_document)
documents = loader.load()

In [7]:
len(documents)

636

In [8]:
documents[1]

Document(page_content='Summary of Contents\nI\nFundamental Algorithms for NLP\n1\n1\nIntroduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .\n3\n2\nRegular Expressions, Text Normalization, Edit Distance. . . . . . . . .\n4\n3\nN-gram Language Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .\n31\n4\nNaive Bayes, Text Classiﬁcation, and Sentiment . . . . . . . . . . . . . . . . . 58\n5\nLogistic Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79\n6\nVector Semantics and Embeddings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103\n7\nNeural Networks and Neural Language Models . . . . . . . . . . . . . . . . . 134\n8\nSequence Labeling for Parts of Speech and Named Entities . . . . . . 160\n9\nRNNs and LSTMs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185\n10 Transformers and Pretrained Langua

### 2.2 Document Transformers

In [9]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 700,
    chunk_overlap = 100
)

docs = text_splitter.split_documents(documents) 

In [10]:
docs[1].page_content

'Summary of Contents\nI\nFundamental Algorithms for NLP\n1\n1\nIntroduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .\n3\n2\nRegular Expressions, Text Normalization, Edit Distance. . . . . . . . .\n4\n3\nN-gram Language Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .\n31\n4\nNaive Bayes, Text Classiﬁcation, and Sentiment . . . . . . . . . . . . . . . . . 58\n5\nLogistic Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79\n6\nVector Semantics and Embeddings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103\n7'

In [11]:
len(docs)

3421
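To make the `chunk_size` and `chunk_overlap` settings above more concrete, here is a simplified fixed-window splitter. It is only a sketch: `RecursiveCharacterTextSplitter` prefers to split on separators such as paragraph and line breaks, which the hypothetical `split_with_overlap` helper below does not attempt.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list:
    """Cut text into windows of chunk_size characters sharing chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
print(chunks)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.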

### 2.3 Text Embedding Models

In [12]:
import torch
from langchain.embeddings import HuggingFaceInstructEmbeddings

model_name = 'hkunlp/instructor-base'

embedding_model = HuggingFaceInstructEmbeddings(
        model_name = model_name,              
        model_kwargs = {
            'device': torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        },
    )

load INSTRUCTOR_Transformer
max_seq_length  512


### 2.4 Vector Stores

In [13]:
#locate vectorstore
vector_path = '../vector-store/'
if not os.path.exists(vector_path):
    os.makedirs(vector_path)

In [14]:
from langchain.vectorstores import FAISS

db_file_name = 'nlp_standford'
vectordb = FAISS.from_documents(
        documents = docs, 
        embedding = embedding_model
)

#save vector locally
vectordb.save_local(
    folder_path = os.path.join(vector_path, db_file_name),
    index_name = 'nlp'
) 

### 2.5 Retrievers

In [15]:
#calling vector from local
vectordb = FAISS.load_local(
    folder_path = os.path.join(vector_path, db_file_name),
    embeddings  = embedding_model,
    index_name = 'nlp'
) 

#ready to use
retriever = vectordb.as_retriever()

In [16]:
retriever.get_relevant_documents('What is Dependency Parsing')

[Document(page_content='for projective dependency parsing.\nProceedings of the 8th International\nWorkshop on Parsing Technologies\n(IWPT).\nNivre, J. 2006. Inductive Dependency\nParsing. Springer.\nNivre, J. 2009.\nNon-projective de-\npendency parsing in expected linear\ntime. ACL IJCNLP.', metadata={'source': '../docs/pdf/SpeechandLanguageProcessing_3rd_07jan2023.pdf', 'file_path': '../docs/pdf/SpeechandLanguageProcessing_3rd_07jan2023.pdf', 'page': 614, 'total_pages': 636, 'format': 'PDF 1.5', 'title': '', 'author': '', 'subject': '', 'keywords': '', 'creator': 'LaTeX with hyperref', 'producer': 'pdfTeX-1.40.21', 'creationDate': "D:20230107092057-08'00'", 'modDate': "D:20230107092057-08'00'", 'trapped': ''}),
 Document(page_content='dependency parse; the internal structure of the dependency parse consists solely of\ndirected relations between words. These head-dependent relationships directly en-\ncode important information that is often buried in the more complex phrase-structure\n

In [17]:
retriever.get_relevant_documents('What is Transformers')

[Document(page_content='formers are not based on recurrent connections (which can be hard to parallelize),\nwhich means that transformers can be more efﬁcient to implement at scale.\nTransformers map sequences of input vectors (x1,...,xn) to sequences of output\nvectors (y1,...,yn) of the same length. Transformers are made up of stacks of trans-\nformer blocks, each of which is a multilayer network made by combining simple\nlinear layers, feedforward networks, and self-attention layers, the key innovation of\nself-attention\ntransformers. Self-attention allows a network to directly extract and use information\nfrom arbitrarily large contexts without the need to pass it through intermediate re-', metadata={'source': '../docs/pdf/SpeechandLanguageProcessing_3rd_07jan2023.pdf', 'file_path': '../docs/pdf/SpeechandLanguageProcessing_3rd_07jan2023.pdf', 'page': 219, 'total_pages': 636, 'format': 'PDF 1.5', 'title': '', 'author': '', 'subject': '', 'keywords': '', 'creator': 'LaTeX with hyper

## 3. Memory

One of the core utility classes underpinning most (if not all) memory modules is the `ChatMessageHistory` class. It is a lightweight wrapper that provides convenience methods for saving `HumanMessage` and `AIMessage` objects and then fetching them all.

You may want to use this class directly if you are managing memory outside of a chain.


In [18]:
from langchain.memory import ChatMessageHistory

history = ChatMessageHistory()
history.add_user_message("hi!")
history.add_ai_message("whats up?")
history.add_user_message("How are you?")
history.add_ai_message("I'm good. How's about you?")
history

ChatMessageHistory(messages=[HumanMessage(content='hi!'), AIMessage(content='whats up?'), HumanMessage(content='How are you?'), AIMessage(content="I'm good. How's about you?")])

In [19]:
history.messages

[HumanMessage(content='hi!'),
 AIMessage(content='whats up?'),
 HumanMessage(content='How are you?'),
 AIMessage(content="I'm good. How's about you?")]

### 3.1 Memory types

There are many different types of memory. Each has its own parameters and return types, and each is useful in different scenarios. This notebook covers two:
- Conversation Buffer
- Conversation Buffer Window

#### Conversation Buffer
This memory stores messages and then returns them in a variable.

In [35]:
from langchain.memory import ConversationBufferMemory
memory = ConversationBufferMemory()
memory.save_context({"input": "hi"}, {"output": "whats up"})
memory.save_context({"input": "not much you"}, {"output": "not much"})
memory

ConversationBufferMemory(chat_memory=ChatMessageHistory(messages=[HumanMessage(content='hi'), AIMessage(content='whats up'), HumanMessage(content='not much you'), AIMessage(content='not much')]))

In [38]:
memory = ConversationBufferMemory(return_messages=True)
memory.save_context({"input": "hi"}, {"output": "whats up"})
memory.save_context({"input": "not much you"}, {"output": "not much"})
memory.load_memory_variables({})

{'history': [HumanMessage(content='hi'),
  AIMessage(content='whats up'),
  HumanMessage(content='not much you'),
  AIMessage(content='not much')]}

#### Conversation Buffer Window
- it keeps a list of the interactions of the conversation over time. 
- it only uses the last K interactions. 
- it can be useful for keeping a sliding window of the most recent interactions, so the buffer does not get too large.
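The sliding-window behaviour can be sketched with `collections.deque`. This illustrates the idea only and is not LangChain's implementation. The `WindowMemory` class is hypothetical, and unlike `ConversationBufferWindowMemory` it discards old turns instead of keeping the full history and returning only the last `k` interactions.

```python
from collections import deque

class WindowMemory:
    """Keeps only the k most recent (human, ai) exchanges."""

    def __init__(self, k: int):
        self.turns = deque(maxlen=k)  # older turns fall off the left end

    def save_context(self, human: str, ai: str) -> None:
        self.turns.append((human, ai))

    def load(self) -> str:
        return "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.turns)

window = WindowMemory(k=1)
window.save_context("hi", "whats up")
window.save_context("not much you", "not much")
print(window.load())
```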

In [42]:
from langchain.memory import ConversationBufferWindowMemory
memory = ConversationBufferWindowMemory(k=1)
memory.save_context({"input": "hi"}, {"output": "whats up"})
memory.save_context({"input": "not much you"}, {"output": "not much"})
memory

ConversationBufferWindowMemory(chat_memory=ChatMessageHistory(messages=[HumanMessage(content='hi'), AIMessage(content='whats up'), HumanMessage(content='not much you'), AIMessage(content='not much')]), k=1)

#### What variables get returned from memory

Before going into the chain, various variables are read from memory. These have specific names, which need to align with the variables the chain expects. You can see what these variables are by calling `memory.load_memory_variables({})`. Note that the empty dictionary passed in is just a placeholder for real variables. If the memory type you are using depends on the input variables, you may need to pass some in.

In [43]:
memory.load_memory_variables({})

{'history': 'Human: not much you\nAI: not much'}

In this case, `load_memory_variables` returns a single key, `history`. This means that your chain (and likely your prompt) should expect an input named `history`. You can usually control this variable through parameters on the memory class. For example, if you want the memory variables to be returned under the key `chat_history`, you can do:

In [46]:
memory = ConversationBufferMemory(memory_key="chat_history")
memory.chat_memory.add_user_message("hi!")
memory.chat_memory.add_ai_message("what's up?")
memory.load_memory_variables({})

{'chat_history': "Human: hi!\nAI: what's up?"}
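The requirement that the memory key match a prompt variable can be shown without LangChain. In the sketch below, the `load_memory` function and the template are hypothetical stand-ins: the memory returns its contents under a configurable key, and formatting fails when the template expects a different name.

```python
def load_memory(messages, memory_key="history"):
    """Return the stored turns as one string under the configured key."""
    lines = [f"{role}: {text}" for role, text in messages]
    return {memory_key: "\n".join(lines)}

messages = [("Human", "hi!"), ("AI", "what's up?")]
template = "{chat_history}\nHuman: {question}\nAI:"

# Matching key: the template finds the variable it expects.
variables = load_memory(messages, memory_key="chat_history")
aligned = template.format(question="How are you?", **variables)
print(aligned)

# Mismatched key: the default "history" does not satisfy {chat_history}.
try:
    template.format(question="How are you?", **load_memory(messages))
    mismatch_detected = False
except KeyError:
    mismatch_detected = True
print(mismatch_detected)
```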

## 4. Chain

Using an LLM in isolation is fine for simple applications, but more complex applications require chaining LLMs - either with each other or with other components.

An `LLMChain` is a simple chain that adds some functionality around language models.
- it consists of a `PromptTemplate` and a language model (either an LLM or a chat model).
- it formats the prompt template using the input key values provided (and also memory key values, if available).
- it passes the formatted string to the LLM and returns the LLM output.
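The behaviour described above can be sketched as a small function. `fake_llm` is a stand-in that simply echoes the last line of its prompt so the example runs without a model, and `run_chain` is a hypothetical name. A real `LLMChain` does considerably more than this.

```python
from typing import Callable, Dict, Optional

def run_chain(
    template: str,
    llm: Callable[[str], str],
    inputs: Dict[str, str],
    memory: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Format the template from inputs plus memory, call the LLM, return the result."""
    values = {**(memory or {}), **inputs}   # inputs take precedence over memory
    prompt = template.format(**values)      # fill the prompt template
    text = llm(prompt)                      # pass the formatted string to the model
    return {**inputs, "text": text}         # return the inputs with the model output

def fake_llm(prompt: str) -> str:
    """Placeholder model: reports the final line of the prompt it received."""
    return "echo: " + prompt.splitlines()[-1]

result = run_chain(
    template="History: {history}\nQuestion: {question}",
    llm=fake_llm,
    inputs={"question": "what is NLP?"},
    memory={"history": "Human: hi\nAI: whats up"},
)
print(result)
```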

Note : [Download Fastchat Model Here](https://huggingface.co/lmsys/fastchat-t5-3b-v1.0)

In [47]:
# %cd ./models
# !git clone https://huggingface.co/lmsys/fastchat-t5-3b-v1.0

In [48]:
import torch
from tqdm.auto import tqdm

from transformers import AutoTokenizer, pipeline, AutoModelForSeq2SeqLM
from transformers import BitsAndBytesConfig
from langchain.llms import HuggingFacePipeline

model_id = "../models/fastchat-t5-3b-v1.0"

tokenizer = AutoTokenizer.from_pretrained(
    model_id,
)
tokenizer.pad_token_id = tokenizer.eos_token_id

bitsandbyte_config = BitsAndBytesConfig(
    load_in_4bit = True,
    bnb_4bit_quant_type = "nf4",
    bnb_4bit_compute_dtype = torch.float16,
    bnb_4bit_use_double_quant = True
)

model = AutoModelForSeq2SeqLM.from_pretrained(
    model_id,
    quantization_config = bitsandbyte_config,  # 4-bit NF4 settings defined above
    device_map = 'auto',
    # load_in_8bit is not passed here: it conflicts with the 4-bit quantization_config
)

pipe = pipeline(
    task="text2text-generation", 
    model=model, 
    tokenizer=tokenizer, 
    max_new_tokens=256, 
    model_kwargs = { 
        "temperature":0, 
        "repetition_penalty": 1.5
    }, 
)

llm = HuggingFacePipeline(pipeline = pipe)

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [26]:
from langchain.chains import LLMChain
from langchain.chains.conversational_retrieval.prompts import CONDENSE_QUESTION_PROMPT
from langchain.memory import ConversationBufferWindowMemory
from langchain.chains.question_answering import load_qa_chain
from langchain.chains import ConversationalRetrievalChain

### [Class ConversationalRetrievalChain](https://api.python.langchain.com/en/latest/_modules/langchain/chains/conversational_retrieval/base.html#ConversationalRetrievalChain)

- `retriever` : Retriever to use to fetch documents.

- `combine_docs_chain` : The chain used to combine any retrieved documents.

- `question_generator`: The chain used to generate a new question for the sake of retrieval. This chain will take in the current question (with variable question) and any chat history (with variable chat_history) and will produce a new standalone question to be used later on.

- `return_source_documents` : Return the retrieved source documents as part of the final result.

- `get_chat_history` : An optional function to get a string of the chat history. If None is provided, will use a default.

- `return_generated_question` : Return the generated question as part of the final result.

- `response_if_no_docs_found` : If specified, the chain will return a fixed response if no docs are found for the question.
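Read together, these parameters describe a three-stage flow: condense the follow-up question using the chat history, retrieve documents for the condensed question, then combine the documents into an answer. The sketch below wires that flow together with stub callables. All names are hypothetical and the logic is a simplification. It assumes the question generator is skipped when there is no history, which is consistent with the first chatbot call later in this notebook, where only the document chain runs.

```python
from typing import Callable, List, Optional

def conversational_retrieval(
    question: str,
    chat_history: List[str],
    question_generator: Callable[[str, List[str]], str],
    retriever: Callable[[str], List[str]],
    combine_docs: Callable[[List[str], str], str],
    response_if_no_docs_found: Optional[str] = None,
) -> dict:
    """Condense the question, fetch documents, then answer from them."""
    # 1. Only rewrite the question when there is history to resolve against.
    standalone = question_generator(question, chat_history) if chat_history else question
    # 2. Retrieve documents for the standalone question.
    docs = retriever(standalone)
    # 3. Return a fixed response if nothing was found, else combine the docs.
    if not docs and response_if_no_docs_found is not None:
        answer = response_if_no_docs_found
    else:
        answer = combine_docs(docs, standalone)
    return {"question": question, "generated_question": standalone,
            "answer": answer, "source_documents": docs}

# Stub components so the sketch runs without a model or vector store.
corpus = ["Transformers are built from self-attention layers."]
flow_result = conversational_retrieval(
    question="What are they built from?",
    chat_history=["Human: What is a transformer?", "AI: A neural architecture."],
    question_generator=lambda q, h: "What are transformers built from?",
    retriever=lambda q: [d for d in corpus if "transformers" in q.lower()],
    combine_docs=lambda docs, q: f"Based on {len(docs)} document(s): {docs[0]}",
)
print(flow_result["answer"])
```

Rewriting the follow-up into a standalone question matters because the retriever sees only that one string: a pronoun such as "they" carries no signal for similarity search.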


`question_generator`

In [59]:
CONDENSE_QUESTION_PROMPT

PromptTemplate(input_variables=['chat_history', 'question'], template='Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.\n\nChat History:\n{chat_history}\nFollow Up Input: {question}\nStandalone question:')

In [62]:
question_generator = LLMChain(
    llm=llm, 
    prompt=CONDENSE_QUESTION_PROMPT,
    verbose=True,
)

In [63]:
query = "what is NLP?"
question_generator(
    {'chat_history' : 'My name is ChakyBot', 'question' : query}, 
    # return_only_outputs=True
)



> Entering new LLMChain chain...
Prompt after formatting:
Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:
My name is ChakyBot
Follow Up Input: what is NLP?
Standalone question:

> Finished chain.


{'chat_history': 'My name is ChakyBot',
 'question': 'what is NLP?',
 'text': '<pad> What  is  Natural  Language  Processing?\n'}

`combine_docs_chain`

In [64]:
doc_chain = load_qa_chain(
    llm, 
    chain_type="stuff", 
    prompt = PROMPT,
    verbose=True,
)

In [65]:
#query
query = "what is NLP?"
#docs
input_documents = retriever.get_relevant_documents(query)
#chain
doc_chain(
    {"input_documents": input_documents, "question": query}, 
    # return_only_outputs=True
)



> Entering new StuffDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
I'm your friendly NLP chatbot named ChakyBot, here to assist Chaky and Gun with any questions they have about Natural Language Processing (NLP). 
    If you're curious about how probability works in the context of NLP, feel free to ask any questions you may have. 
    Whether it's about probabilistic models, language models, or any other related topic, 
    I'm here to help break down complex concepts into easy-to-understand explanations.
    Just let me know what you're wondering about, and I'll do my best to guide you through it!
    information
cepts in NLP. It is a measure of how often two events x and y occur, compared with
what we would expect if they were independent:
I(x,y) = log2
P(x,y)
P(x)P(y)
(6.16)
The pointwise mutual information between a target word w and a context word
c (Church and Hanks 1989, Church and Hanks 1990) is then deﬁned as:


{'input_documents': [Document(page_content='information\ncepts in NLP. It is a measure of how often two events x and y occur, compared with\nwhat we would expect if they were independent:\nI(x,y) = log2\nP(x,y)\nP(x)P(y)\n(6.16)\nThe pointwise mutual information between a target word w and a context word\nc (Church and Hanks 1989, Church and Hanks 1990) is then deﬁned as:\nPMI(w,c) = log2\nP(w,c)\nP(w)P(c)\n(6.17)\nThe numerator tells us how often we observed the two words together (assuming\nwe compute probability by using the MLE). The denominator tells us how often\nwe would expect the two words to co-occur assuming they each occurred indepen-\ndently; recall that the probability of two independent events both occurring is just', metadata={'source': '../docs/pdf/SpeechandLanguageProcessing_3rd_07jan2023.pdf', 'file_path': '../docs/pdf/SpeechandLanguageProcessing_3rd_07jan2023.pdf', 'page': 123, 'total_pages': 636, 'format': 'PDF 1.5', 'title': '', 'author': '', 'subject': '', 'keywo

In [66]:
memory = ConversationBufferWindowMemory(
    k = 3,  
    memory_key = "chat_history", 
    return_messages = True,  
    output_key = 'answer'
)
    
chain = ConversationalRetrievalChain(
    retriever = retriever,
    question_generator = question_generator,
    combine_docs_chain = doc_chain,
    return_source_documents = True,
    memory = memory,
    get_chat_history = lambda h: h,  # pass the stored history through unchanged
    verbose = True
)

## 5. Chatbot

In [67]:
prompt_question = 'Who are you by the way?'
answer = chain({"question": prompt_question})
answer



> Entering new ConversationalRetrievalChain chain...


> Entering new StuffDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
I'm your friendly NLP chatbot named ChakyBot, here to assist Chaky and Gun with any questions they have about Natural Language Processing (NLP). 
    If you're curious about how probability works in the context of NLP, feel free to ask any questions you may have. 
    Whether it's about probabilistic models, language models, or any other related topic, 
    I'm here to help break down complex concepts into easy-to-understand explanations.
    Just let me know what you're wondering about, and I'll do my best to guide you through it!
    in this way are also called complementizers.
complementizer
Pronouns act as a shorthand for referring to an entity or event. Personal pro-
pronoun
nouns refer to persons or entities (you, she, I, it, me, etc.). Possessive pronouns are
forms of personal pronoun

{'question': 'Who are you by the way?',
 'chat_history': [],
 'answer': '<pad>  I  am  Chaky.\n Using  a  bigram  language  model  with  add-one  smoothing,  what  is  P(Sam  |  am)?  Include`< s>` and` am` in  your  counts  just  like  any  other  token.\n 3.6\n Suppose  we  didn’t  use  the  end-symbol  .  Train  an  unsmoothed  bigram  grammar  on  the  following  training  corpus  without  using  the  end-symbol:\n< s>\n I  do  not  like  green  eggs  and  Sam\n Using  a  bigram  language  model  with  add-one  smoothing,  what  is  P(Sam  |  am)?  Include{< s>` and` am` in  your  counts  just  like  any  other  token.\n 3.7\n Suppose  we  didn’t  use  the  end-symbol  .  Train ',
 'source_documents': [Document(page_content='in this way are also called complementizers.\ncomplementizer\nPronouns act as a shorthand for referring to an entity or event. Personal pro-\npronoun\nnouns refer to persons or entities (you, she, I, it, me, etc.). Possessive pronouns are\nforms of personal pro

In [68]:
prompt_question = 'What is transformers?'
answer = chain({"question": prompt_question})
answer



> Entering new ConversationalRetrievalChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:
[HumanMessage(content='Who are you by the way?'), AIMessage(content='<pad>  I  am  Chaky.\n Using  a  bigram  language  model  with  add-one  smoothing,  what  is  P(Sam  |  am)?  Include`< s>` and` am` in  your  counts  just  like  any  other  token.\n 3.6\n Suppose  we  didn’t  use  the  end-symbol  .  Train  an  unsmoothed  bigram  grammar  on  the  following  training  corpus  without  using  the  end-symbol:\n< s>\n I  do  not  like  green  eggs  and  Sam\n Using  a  bigram  language  model  with  add-one  smoothing,  what  is  P(Sam  |  am)?  Include{< s>` and` am` in  your  counts  just  like  any  other  token.\n 3.7\n Suppose  we  didn’t  use  the  end-symbol  .  Train ')]
Foll

{'question': 'What is transformers?',
 'chat_history': [HumanMessage(content='Who are you by the way?'),
  AIMessage(content='<pad>  I  am  Chaky.\n Using  a  bigram  language  model  with  add-one  smoothing,  what  is  P(Sam  |  am)?  Include`< s>` and` am` in  your  counts  just  like  any  other  token.\n 3.6\n Suppose  we  didn’t  use  the  end-symbol  .  Train  an  unsmoothed  bigram  grammar  on  the  following  training  corpus  without  using  the  end-symbol:\n< s>\n I  do  not  like  green  eggs  and  Sam\n Using  a  bigram  language  model  with  add-one  smoothing,  what  is  P(Sam  |  am)?  Include{< s>` and` am` in  your  counts  just  like  any  other  token.\n 3.7\n Suppose  we  didn’t  use  the  end-symbol  .  Train ')],
 'answer': '<pad>  transformers  in  the  bigram  language  model  are  used  to  represent  the  relationships  between  words  in  a  sequence  of  words.  They  are  used  to  encode  the  relationships  between  words  in  a  sequence  of  words  