**Reference Link:** [RAG Systems Essentials (Analytics Vidhya)](https://courses.analyticsvidhya.com/courses/take/rag-systems-essentials/lessons/60148017-hands-on-deep-dive-into-rag-evaluation-metrics-generator-metrics-i)

In [1]:
!pip install langchain==0.3.10
!pip install langchain-openai==0.2.12
!pip install langchain-community==0.3.11
!pip install langchain-huggingface==0.1.2
!pip install jq==1.8.0
!pip install pymupdf==1.25.1

In [2]:
!pip install langchain-chroma==0.1.4

In [3]:
from getpass import getpass

OPENAI_KEY = getpass('Enter Open AI API Key: ')

In [5]:
import os

os.environ['OPENAI_API_KEY'] = OPENAI_KEY

In [6]:
from langchain_openai import OpenAIEmbeddings

# details here: https://openai.com/blog/new-embedding-models-and-api-updates
openai_embed_model = OpenAIEmbeddings(model='text-embedding-3-small')

In [7]:
from langchain_openai import ChatOpenAI

chatgpt = ChatOpenAI(model_name="gpt-4o-mini", temperature=0)

In [8]:
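# create a chat prompt
# (imports use langchain_core directly; the langchain.prompts / langchain.schema paths are legacy aliases)
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser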


def generate_chunk_context(document, chunk):
    """Ask the LLM for a short context that situates `chunk` within `document`."""

    chunk_process_prompt = """You are an AI assistant specializing in research paper analysis.
                            Your task is to provide brief, relevant context for a chunk of text
                            based on the following research paper.

                            Here is the research paper:
                            <paper>
                            {paper}
                            </paper>

                            Here is the chunk we want to situate within the whole document:
                            <chunk>
                            {chunk}
                            </chunk>

                            Provide a concise context (3-4 sentences max) for this chunk,
                            considering the following guidelines:

                            - Give a short succinct context to situate this chunk within the overall document
                            for the purposes of improving search retrieval of the chunk.
                            - Answer only with the succinct context and nothing else.
                            - Context should be mentioned like 'Focuses on ....'
                            do not mention 'this chunk or section focuses on...'

                            Context:
                        """

    prompt_template = ChatPromptTemplate.from_template(chunk_process_prompt)

    agentic_chunk_chain = (prompt_template
                                |
                            chatgpt
                                |
                            StrOutputParser())

    context = agentic_chunk_chain.invoke({'paper': document, 'chunk': chunk})

    return context

In [9]:
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.documents import Document


def create_contextual_chunks(file_path):
    """Load a PDF, split it into chunks, and prepend an LLM-generated context to each chunk."""

    print('Loading pages:', file_path)
    loader = PyMuPDFLoader(file_path)
    doc_pages = loader.load()

    print('Chunking pages:', file_path)
    splitter = RecursiveCharacterTextSplitter(chunk_size=3500,
                                              chunk_overlap=0)
    doc_chunks = splitter.split_documents(doc_pages)

    print('Generating contextual chunks:', file_path)
    original_doc = '\n'.join([doc.page_content for doc in doc_chunks])
    contextual_chunks = []
    for chunk in doc_chunks:
        context = generate_chunk_context(original_doc, chunk.page_content)
        contextual_chunks.append(Document(page_content=context+'\n'+chunk.page_content,
                                          metadata=chunk.metadata))
    print('Finished processing:', file_path)
    print()
    return contextual_chunks
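
The contextual-chunking step boils down to prepending a short, document-aware context line to each chunk before it is embedded, while keeping the chunk's metadata. A minimal sketch of that composition follows. It does not call an LLM: `fake_context` is a hypothetical stub standing in for `generate_chunk_context`, and `SimpleDoc` is a hypothetical stand-in for LangChain's `Document`.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleDoc:
    """Hypothetical stand-in for a LangChain Document (text plus metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def fake_context(document: str, chunk: str) -> str:
    """Stub for the LLM call: builds a fixed-format context from the chunk start."""
    return f"Focuses on: {chunk[:20].strip()}"


def add_context(chunks, context_fn=fake_context):
    """Prepend a generated context line to every chunk, keeping its metadata."""
    full_text = "\n".join(c.page_content for c in chunks)
    return [
        SimpleDoc(
            page_content=context_fn(full_text, c.page_content) + "\n" + c.page_content,
            metadata=c.metadata,
        )
        for c in chunks
    ]


chunks = [
    SimpleDoc("Attention is all you need.", {"page": 0}),
    SimpleDoc("Positional encodings are added.", {"page": 1}),
]
contextual = add_context(chunks)
print(contextual[0].page_content)
```

Because the context is part of `page_content`, it is embedded together with the chunk text, which is what lets it influence retrieval.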

In [12]:
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

def create_simple_chunks(file_path, chunk_size=3500, chunk_overlap=0):
    """Load a PDF and split it into chunks without adding any generated context."""

    print('Loading pages:', file_path)
    loader = PyMuPDFLoader(file_path)
    doc_pages = loader.load()

    print('Chunking pages:', file_path)
    splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size,
                                              chunk_overlap=chunk_overlap)
    doc_chunks = splitter.split_documents(doc_pages)
    print('Finished processing:', file_path)
    print()
    return doc_chunks
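
To see what `chunk_size` and `chunk_overlap` control, here is a simplified fixed-window character splitter. It is only a sketch under a strong assumption: it cuts at fixed character offsets, whereas `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraphs and lines, so the real chunk boundaries will differ. The function name `split_with_overlap` is made up for this illustration.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Fixed-window character splitter (simplified; ignores separators)."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


text = "abcdefghijklmnopqrstuvwxyz"
no_overlap = split_with_overlap(text, chunk_size=10, chunk_overlap=0)
with_overlap = split_with_overlap(text, chunk_size=10, chunk_overlap=3)
print(no_overlap)
print(with_overlap)
```

With an overlap of 3, the last three characters of each chunk reappear at the start of the next one, so a sentence that straddles a boundary is less likely to be cut off from its context.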

In [10]:
from glob import glob

pdf_files = glob('./data/*.pdf')
pdf_files

['./data/2005.11401v4.pdf',
 './data/2005.14165v4.pdf',
 './data/1706.03762v7.pdf']

In [13]:
paper_docs = []
for fp in pdf_files:
    paper_docs.extend(create_simple_chunks(file_path=fp,
                                           chunk_size=3500,
                                           chunk_overlap=200))

Loading pages: ./data/2005.11401v4.pdf
Chunking pages: ./data/2005.11401v4.pdf
Finished processing: ./data/2005.11401v4.pdf

Loading pages: ./data/2005.14165v4.pdf
Chunking pages: ./data/2005.14165v4.pdf
Finished processing: ./data/2005.14165v4.pdf

Loading pages: ./data/1706.03762v7.pdf
Chunking pages: ./data/1706.03762v7.pdf
Finished processing: ./data/1706.03762v7.pdf



In [14]:
len(paper_docs)

158

In [15]:
# paper_docs = []
# for fp in pdf_files:
#     paper_docs.extend(create_contextual_chunks(fp))

In [17]:
from langchain_chroma import Chroma

# create vector DB of docs and embeddings - takes < 30s on Colab
chroma_db = Chroma.from_documents(documents=paper_docs,
                                  collection_name='final_project',
                                  embedding=openai_embed_model,
                                  # set the distance function to cosine explicitly; otherwise Chroma
                                  # falls back to its default L2 (Euclidean) space
                                  # check https://docs.trychroma.com/guides#changing-the-distance-function
                                  collection_metadata={"hnsw:space": "cosine"},
                                  persist_directory="./final_project")
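
The choice of distance function matters because cosine distance depends only on the angle between two vectors, while an L2-style distance also reacts to their lengths. The sketch below uses made-up 2-d vectors (not real embeddings) and plain-Python helper functions written for this illustration; it is not how Chroma computes distances internally.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; depends only on the angle between a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def squared_l2_distance(a, b):
    """Squared Euclidean distance; also sensitive to vector length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


query = [1.0, 0.0]
same_direction = [3.0, 0.0]       # same direction as the query, larger magnitude
different_direction = [1.0, 1.0]  # 45 degrees away from the query

print(cosine_distance(query, same_direction))
print(squared_l2_distance(query, same_direction))
print(cosine_distance(query, different_direction))
print(squared_l2_distance(query, different_direction))
```

Under cosine distance `same_direction` is the nearer vector, while under squared L2 `different_direction` is nearer, so the two metrics can rank the same candidates differently.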

In [18]:
# load from disk
chroma_db = Chroma(persist_directory="./final_project",
                   collection_name='final_project',
                   embedding_function=openai_embed_model)

In [19]:
similarity_retriever = chroma_db.as_retriever(search_type="similarity",
                                              search_kwargs={"k": 5})
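
Conceptually, a similarity retriever with `k` set scores every stored vector against the query vector and keeps the `k` best. The brute-force sketch below shows that idea with toy 2-d vectors; the names `top_k` and `toy_index` are hypothetical, and a real vector store such as Chroma uses an approximate index (HNSW) rather than scanning every document.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query_vec, doc_vecs, k=5):
    """Return the ids of the k documents most similar to the query, best first."""
    scored = sorted(
        doc_vecs.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in scored[:k]]


# Toy 2-d "embeddings", made up for illustration only.
toy_index = {
    "doc_a": [1.0, 0.0],
    "doc_b": [0.9, 0.1],
    "doc_c": [0.0, 1.0],
    "doc_d": [0.5, 0.5],
}
print(top_k([1.0, 0.0], toy_index, k=2))
```

If `k` is larger than the number of stored documents, the sketch simply returns all of them in ranked order.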

In [20]:
from IPython.display import display, Markdown

def display_docs(docs):
    for doc in docs:
        print('Metadata:', doc.metadata)
        print('Content Brief:')
        display(Markdown(doc.page_content[:1000]))
        print()

In [21]:
query = "what is machine learning?"
top_docs = similarity_retriever.invoke(query)
display_docs(top_docs)

Metadata: {'author': '', 'creationDate': 'D:20200724000408Z', 'creator': 'LaTeX with hyperref package', 'file_path': './data/2005.14165v4.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20200724000408Z', 'page': 3, 'producer': 'pdfTeX-1.40.17', 'source': './data/2005.14165v4.pdf', 'subject': '', 'title': '', 'total_pages': 75, 'trapped': ''}
Content Brief:


loop of meta-learning. We further specialize the description to “zero-shot”, “one-shot”, or “few-shot” depending on how many
demonstrations are provided at inference time. These terms are intended to remain agnostic on the question of whether the model
learns new tasks from scratch at inference time or simply recognizes patterns seen during training – this is an important issue which
we discuss later in the paper, but “meta-learning” is intended to encompass both possibilities, and simply describes the inner-outer
loop structure.
4


Metadata: {'author': '', 'creationDate': 'D:20200724000408Z', 'creator': 'LaTeX with hyperref package', 'file_path': './data/2005.14165v4.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20200724000408Z', 'page': 3, 'producer': 'pdfTeX-1.40.17', 'source': './data/2005.14165v4.pdf', 'subject': '', 'title': '', 'total_pages': 75, 'trapped': ''}
Content Brief:


Figure 1.2: Larger models make increasingly efﬁcient use of in-context information. We show in-context learning
performance on a simple task requiring the model to remove random symbols from a word, both with and without a
natural language task description (see Sec. 3.9.2). The steeper “in-context learning curves” for large models demonstrate
improved ability to learn a task from contextual information. We see qualitatively similar behavior across a wide range
of tasks.
sufﬁcient to enable a human to perform a new task to at least a reasonable degree of competence. Aside from pointing
to a conceptual limitation in our current NLP techniques, this adaptability has practical advantages – it allows humans
to seamlessly mix together or switch between many tasks and skills, for example performing addition during a lengthy
dialogue. To be broadly useful, we would someday like our NLP systems to have this same ﬂuidity and generality.
One potential route towards addressing these issues is meta


Metadata: {'author': '', 'creationDate': 'D:20200724000408Z', 'creator': 'LaTeX with hyperref package', 'file_path': './data/2005.14165v4.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20200724000408Z', 'page': 2, 'producer': 'pdfTeX-1.40.17', 'source': './data/2005.14165v4.pdf', 'subject': '', 'title': '', 'total_pages': 75, 'trapped': ''}
Content Brief:


1
Introduction
Recent years have featured a trend towards pre-trained language representations in NLP systems, applied in increasingly
ﬂexible and task-agnostic ways for downstream transfer. First, single-layer representations were learned using word
vectors [MCCD13, PSM14] and fed to task-speciﬁc architectures, then RNNs with multiple layers of representations
and contextual state were used to form stronger representations [DL15, MBXS17, PNZtY18] (though still applied to
task-speciﬁc architectures), and more recently pre-trained recurrent or transformer language models [VSP+17] have
been directly ﬁne-tuned, entirely removing the need for task-speciﬁc architectures [RNSS18, DCLT18, HR18].
This last paradigm has led to substantial progress on many challenging NLP tasks such as reading comprehension,
question answering, textual entailment, and many others, and has continued to advance based on new architectures
and algorithms [RSR+19, LOG+19, YDY+19, LCG+19]. However, a major limitation 


Metadata: {'author': '', 'creationDate': 'D:20200724000408Z', 'creator': 'LaTeX with hyperref package', 'file_path': './data/2005.14165v4.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20200724000408Z', 'page': 2, 'producer': 'pdfTeX-1.40.17', 'source': './data/2005.14165v4.pdf', 'subject': '', 'title': '', 'total_pages': 75, 'trapped': ''}
Content Brief:


the desired task. We use the term “in-context learning” to describe the inner loop of this process, which occurs within
the forward-pass upon each sequence. The sequences in this diagram are not intended to be representative of the data a
model would see during pre-training, but are intended to show that there are sometimes repeated sub-tasks embedded
within a single sequence.
3


Metadata: {'author': '', 'creationDate': 'D:20200724000408Z', 'creator': 'LaTeX with hyperref package', 'file_path': './data/2005.14165v4.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20200724000408Z', 'page': 39, 'producer': 'pdfTeX-1.40.17', 'source': './data/2005.14165v4.pdf', 'subject': '', 'title': '', 'total_pages': 75, 'trapped': ''}
Content Brief:


task-speciﬁc [SDCW19, JYS+19, KR16] approaches to distillation of language models. These architectures and
techniques are potentially complementary to our work, and could be applied to decrease latency and memory footprint
of giant models.
As ﬁne-tuned language models have neared human performance on many standard benchmark tasks, considerable
effort has been devoted to constructing more difﬁcult or open-ended tasks, including question answering [KPR+19,
IBGC+14, CCE+18, MCKS18], reading comprehension [CHI+18, RCM19], and adversarially constructed datasets
designed to be difﬁcult for existing language models [SBBC19, NWD+19]. In this work we test our models on many
of these datasets.
Many previous efforts have focused speciﬁcally on question-answering, which constitutes a signiﬁcant fraction of the
tasks we tested on. Recent efforts include [RSR+19, RRS20], which ﬁne-tuned an 11 billion parameter language model,
and [GLT+20], which focused on attending over a large corpus of data at te




In [22]:
from langchain_openai import ChatOpenAI

chatgpt = ChatOpenAI(model_name="gpt-4o-mini", temperature=0)

In [24]:
from langchain.retrievers.multi_query import MultiQueryRetriever
# Set logging for the queries
import logging

similarity_retriever = chroma_db.as_retriever(search_type="similarity",
                                              search_kwargs={"k": 5})

mq_retriever = MultiQueryRetriever.from_llm(
    retriever=similarity_retriever, llm=chatgpt
)

logging.basicConfig()
# so we can see what queries are generated by the LLM
logging.getLogger("langchain.retrievers.multi_query").setLevel(logging.INFO)
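
The multi-query retriever asks the LLM for several rephrasings of the question, runs the base retriever once per rephrasing, and returns the unique union of the results. That merge step can be sketched in plain Python as follows. The function `multi_query_retrieve` and the `fake_results` lookup are hypothetical stand-ins for the LLM-generated queries and the Chroma retriever.

```python
def multi_query_retrieve(query_variants, retrieve_fn):
    """Run every query variant and return the unique union in first-seen order."""
    seen = set()
    merged = []
    for variant in query_variants:
        for doc in retrieve_fn(variant):
            if doc not in seen:  # skip chunks already returned by an earlier variant
                seen.add(doc)
                merged.append(doc)
    return merged


# Hypothetical per-query results; a real retriever would return Document objects.
fake_results = {
    "q1": ["chunk_1", "chunk_2", "chunk_3"],
    "q2": ["chunk_2", "chunk_4"],
    "q3": ["chunk_3", "chunk_5", "chunk_1"],
}
merged = multi_query_retrieve(["q1", "q2", "q3"], lambda q: fake_results[q])
print(merged)
```

Because results from all variants are pooled, one call can return more documents than the `k` configured on the base retriever, and fewer than `k` times the number of variants when the result sets overlap.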

In [25]:
query = "what is a cnn?"
top_docs = mq_retriever.invoke(query)
display_docs(top_docs)

INFO:langchain.retrievers.multi_query:Generated queries: ['What does CNN stand for and what are its main functions?  ', 'Can you explain the concept and applications of convolutional neural networks?  ', 'What are the key features and uses of CNNs in machine learning?']


Metadata: {'author': '', 'creationDate': 'D:20240410211143Z', 'creator': 'LaTeX with hyperref', 'file_path': './data/1706.03762v7.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20240410211143Z', 'page': 5, 'producer': 'pdfTeX-1.40.25', 'source': './data/1706.03762v7.pdf', 'subject': '', 'title': '', 'total_pages': 15, 'trapped': ''}
Content Brief:


Table 1: Maximum path lengths, per-layer complexity and minimum number of sequential operations
for different layer types. n is the sequence length, d is the representation dimension, k is the kernel
size of convolutions and r the size of the neighborhood in restricted self-attention.
Layer Type
Complexity per Layer
Sequential
Maximum Path Length
Operations
Self-Attention
O(n2 · d)
O(1)
O(1)
Recurrent
O(n · d2)
O(n)
O(n)
Convolutional
O(k · n · d2)
O(1)
O(logk(n))
Self-Attention (restricted)
O(r · n · d)
O(1)
O(n/r)
3.5
Positional Encoding
Since our model contains no recurrence and no convolution, in order for the model to make use of the
order of the sequence, we must inject some information about the relative or absolute position of the
tokens in the sequence. To this end, we add "positional encodings" to the input embeddings at the
bottoms of the encoder and decoder stacks. The positional encodings have the same dimension dmodel
as the embeddings, so that the two can be summed. Ther


Metadata: {'author': '', 'creationDate': 'D:20240410211143Z', 'creator': 'LaTeX with hyperref', 'file_path': './data/1706.03762v7.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20240410211143Z', 'page': 2, 'producer': 'pdfTeX-1.40.25', 'source': './data/1706.03762v7.pdf', 'subject': '', 'title': '', 'total_pages': 15, 'trapped': ''}
Content Brief:


Figure 1: The Transformer - model architecture.
The Transformer follows this overall architecture using stacked self-attention and point-wise, fully
connected layers for both the encoder and decoder, shown in the left and right halves of Figure 1,
respectively.
3.1
Encoder and Decoder Stacks
Encoder:
The encoder is composed of a stack of N = 6 identical layers. Each layer has two
sub-layers. The first is a multi-head self-attention mechanism, and the second is a simple, position-
wise fully connected feed-forward network. We employ a residual connection [11] around each of
the two sub-layers, followed by layer normalization [1]. That is, the output of each sub-layer is
LayerNorm(x + Sublayer(x)), where Sublayer(x) is the function implemented by the sub-layer
itself. To facilitate these residual connections, all sub-layers in the model, as well as the embedding
layers, produce outputs of dimension dmodel = 512.
Decoder:
The decoder is also composed of a stack of N = 6 identical layers. 


Metadata: {'author': '', 'creationDate': 'D:20240410211143Z', 'creator': 'LaTeX with hyperref', 'file_path': './data/1706.03762v7.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20240410211143Z', 'page': 1, 'producer': 'pdfTeX-1.40.25', 'source': './data/1706.03762v7.pdf', 'subject': '', 'title': '', 'total_pages': 15, 'trapped': ''}
Content Brief:


1
Introduction
Recurrent neural networks, long short-term memory [13] and gated recurrent [7] neural networks
in particular, have been firmly established as state of the art approaches in sequence modeling and
transduction problems such as language modeling and machine translation [35, 2, 5]. Numerous
efforts have since continued to push the boundaries of recurrent language models and encoder-decoder
architectures [38, 24, 15].
Recurrent models typically factor computation along the symbol positions of the input and output
sequences. Aligning the positions to steps in computation time, they generate a sequence of hidden
states ht, as a function of the previous hidden state ht−1 and the input for position t. This inherently
sequential nature precludes parallelization within training examples, which becomes critical at longer
sequence lengths, as memory constraints limit batching across examples. Recent work has achieved
significant improvements in computational efficiency through factor


Metadata: {'author': '', 'creationDate': 'D:20240410211143Z', 'creator': 'LaTeX with hyperref', 'file_path': './data/1706.03762v7.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20240410211143Z', 'page': 4, 'producer': 'pdfTeX-1.40.25', 'source': './data/1706.03762v7.pdf', 'subject': '', 'title': '', 'total_pages': 15, 'trapped': ''}
Content Brief:


output values. These are concatenated and once again projected, resulting in the final values, as
depicted in Figure 2.
Multi-head attention allows the model to jointly attend to information from different representation
subspaces at different positions. With a single attention head, averaging inhibits this.
MultiHead(Q, K, V ) = Concat(head1, ..., headh)W O
where headi = Attention(QW Q
i , KW K
i , V W V
i )
Where the projections are parameter matrices W Q
i
∈Rdmodel×dk, W K
i
∈Rdmodel×dk, W V
i
∈Rdmodel×dv
and W O ∈Rhdv×dmodel.
In this work we employ h = 8 parallel attention layers, or heads. For each of these we use
dk = dv = dmodel/h = 64. Due to the reduced dimension of each head, the total computational cost
is similar to that of single-head attention with full dimensionality.
3.2.3
Applications of Attention in our Model
The Transformer uses multi-head attention in three different ways:
• In "encoder-decoder attention" layers, the queries come from the previous decoder layer,
and


Metadata: {'author': '', 'creationDate': 'D:20240410211143Z', 'creator': 'LaTeX with hyperref', 'file_path': './data/1706.03762v7.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20240410211143Z', 'page': 1, 'producer': 'pdfTeX-1.40.25', 'source': './data/1706.03762v7.pdf', 'subject': '', 'title': '', 'total_pages': 15, 'trapped': ''}
Content Brief:


language modeling tasks [34].
To the best of our knowledge, however, the Transformer is the first transduction model relying
entirely on self-attention to compute representations of its input and output without using sequence-
aligned RNNs or convolution. In the following sections, we will describe the Transformer, motivate
self-attention and discuss its advantages over models such as [17, 18] and [9].
3
Model Architecture
Most competitive neural sequence transduction models have an encoder-decoder structure [5, 2, 35].
Here, the encoder maps an input sequence of symbol representations (x1, ..., xn) to a sequence
of continuous representations z = (z1, ..., zn). Given z, the decoder then generates an output
sequence (y1, ..., ym) of symbols one element at a time. At each step the model is auto-regressive
[10], consuming the previously generated symbols as additional input when generating the next.
2


Metadata: {'author': '', 'creationDate': 'D:20200724000408Z', 'creator': 'LaTeX with hyperref package', 'file_path': './data/2005.14165v4.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20200724000408Z', 'page': 3, 'producer': 'pdfTeX-1.40.17', 'source': './data/2005.14165v4.pdf', 'subject': '', 'title': '', 'total_pages': 75, 'trapped': ''}
Content Brief:


Figure 1.2: Larger models make increasingly efﬁcient use of in-context information. We show in-context learning
performance on a simple task requiring the model to remove random symbols from a word, both with and without a
natural language task description (see Sec. 3.9.2). The steeper “in-context learning curves” for large models demonstrate
improved ability to learn a task from contextual information. We see qualitatively similar behavior across a wide range
of tasks.
sufﬁcient to enable a human to perform a new task to at least a reasonable degree of competence. Aside from pointing
to a conceptual limitation in our current NLP techniques, this adaptability has practical advantages – it allows humans
to seamlessly mix together or switch between many tasks and skills, for example performing addition during a lengthy
dialogue. To be broadly useful, we would someday like our NLP systems to have this same ﬂuidity and generality.
One potential route towards addressing these issues is meta


Metadata: {'author': '', 'creationDate': 'D:20200724000408Z', 'creator': 'LaTeX with hyperref package', 'file_path': './data/2005.14165v4.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20200724000408Z', 'page': 2, 'producer': 'pdfTeX-1.40.17', 'source': './data/2005.14165v4.pdf', 'subject': '', 'title': '', 'total_pages': 75, 'trapped': ''}
Content Brief:


1
Introduction
Recent years have featured a trend towards pre-trained language representations in NLP systems, applied in increasingly
ﬂexible and task-agnostic ways for downstream transfer. First, single-layer representations were learned using word
vectors [MCCD13, PSM14] and fed to task-speciﬁc architectures, then RNNs with multiple layers of representations
and contextual state were used to form stronger representations [DL15, MBXS17, PNZtY18] (though still applied to
task-speciﬁc architectures), and more recently pre-trained recurrent or transformer language models [VSP+17] have
been directly ﬁne-tuned, entirely removing the need for task-speciﬁc architectures [RNSS18, DCLT18, HR18].
This last paradigm has led to substantial progress on many challenging NLP tasks such as reading comprehension,
question answering, textual entailment, and many others, and has continued to advance based on new architectures
and algorithms [RSR+19, LOG+19, YDY+19, LCG+19]. However, a major limitation 




In [26]:
query = "what is nlp?"
top_docs = mq_retriever.invoke(query)
display_docs(top_docs)

INFO:langchain.retrievers.multi_query:Generated queries: ['What does NLP stand for and what are its main applications?  ', 'Can you explain the concept of natural language processing and its significance?  ', 'What are the key components and techniques involved in NLP?']


Metadata: {'author': '', 'creationDate': 'D:20200724000408Z', 'creator': 'LaTeX with hyperref package', 'file_path': './data/2005.14165v4.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20200724000408Z', 'page': 2, 'producer': 'pdfTeX-1.40.17', 'source': './data/2005.14165v4.pdf', 'subject': '', 'title': '', 'total_pages': 75, 'trapped': ''}
Content Brief:


1
Introduction
Recent years have featured a trend towards pre-trained language representations in NLP systems, applied in increasingly
ﬂexible and task-agnostic ways for downstream transfer. First, single-layer representations were learned using word
vectors [MCCD13, PSM14] and fed to task-speciﬁc architectures, then RNNs with multiple layers of representations
and contextual state were used to form stronger representations [DL15, MBXS17, PNZtY18] (though still applied to
task-speciﬁc architectures), and more recently pre-trained recurrent or transformer language models [VSP+17] have
been directly ﬁne-tuned, entirely removing the need for task-speciﬁc architectures [RNSS18, DCLT18, HR18].
This last paradigm has led to substantial progress on many challenging NLP tasks such as reading comprehension,
question answering, textual entailment, and many others, and has continued to advance based on new architectures
and algorithms [RSR+19, LOG+19, YDY+19, LCG+19]. However, a major limitation 


Metadata: {'author': '', 'creationDate': 'D:20200724000408Z', 'creator': 'LaTeX with hyperref package', 'file_path': './data/2005.14165v4.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20200724000408Z', 'page': 70, 'producer': 'pdfTeX-1.40.17', 'source': './data/2005.14165v4.pdf', 'subject': '', 'title': '', 'total_pages': 75, 'trapped': ''}
Content Brief:


[LCG+19] Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Sori-
cut. ALBERT: A lite BERT for self-supervised learning of language representations. arXiv preprint
arXiv:1909.11942, 2019.
[LCH+20] Xiaodong Liu, Hao Cheng, Pengcheng He, Weizhu Chen, Yu Wang, Hoifung Poon, and Jianfeng Gao.
Adversarial training for large neural language models. arXiv preprint arXiv:2004.08994, 2020.
[LDL19] Zhongyang Li, Xiao Ding, and Ting Liu. Story ending prediction by transferable bert. arXiv preprint
arXiv:1905.07504, 2019.
[LDM12] Hector Levesque, Ernest Davis, and Leora Morgenstern. The Winograd schema challenge. In Thirteenth
International Conference on the Principles of Knowledge Representation and Reasoning, 2012.
[LGG+20] Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, and
Luke Zettlemoyer. Multilingual denoising pre-training for neural machine translation. arXiv preprint
arXiv:2001.08210, 2020.
[LGH+15] Xiaodong L


Metadata: {'author': '', 'creationDate': 'D:20200724000408Z', 'creator': 'LaTeX with hyperref package', 'file_path': './data/2005.14165v4.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20200724000408Z', 'page': 3, 'producer': 'pdfTeX-1.40.17', 'source': './data/2005.14165v4.pdf', 'subject': '', 'title': '', 'total_pages': 75, 'trapped': ''}
Content Brief:


Figure 1.2: Larger models make increasingly efﬁcient use of in-context information. We show in-context learning
performance on a simple task requiring the model to remove random symbols from a word, both with and without a
natural language task description (see Sec. 3.9.2). The steeper “in-context learning curves” for large models demonstrate
improved ability to learn a task from contextual information. We see qualitatively similar behavior across a wide range
of tasks.
sufﬁcient to enable a human to perform a new task to at least a reasonable degree of competence. Aside from pointing
to a conceptual limitation in our current NLP techniques, this adaptability has practical advantages – it allows humans
to seamlessly mix together or switch between many tasks and skills, for example performing addition during a lengthy
dialogue. To be broadly useful, we would someday like our NLP systems to have this same ﬂuidity and generality.
One potential route towards addressing these issues is meta


Metadata: {'author': '', 'creationDate': 'D:20200724000408Z', 'creator': 'LaTeX with hyperref package', 'file_path': './data/2005.14165v4.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20200724000408Z', 'page': 69, 'producer': 'pdfTeX-1.40.17', 'source': './data/2005.14165v4.pdf', 'subject': '', 'title': '', 'total_pages': 75, 'trapped': ''}
Content Brief:


Llion Jones, Ming-Wei Chang, Andrew Dai, Jakob Uszkoreit, Quoc Le, and Slav Petrov. Natural ques-
tions: a benchmark for question answering research. Transactions of the Association of Computational
Linguistics, 2019.
[KR16] Yoon Kim and Alexander M. Rush. Sequence-level knowledge distillation. Arxiv, 2016.
[LB02] Edward Loper and Steven Bird. Nltk: The natural language toolkit, 2002.
[LC19] Guillaume Lample and Alexis Conneau. Cross-lingual language model pretraining. arXiv preprint
arXiv:1901.07291, 2019.
70


Metadata: {'author': '', 'creationDate': 'D:20240410211143Z', 'creator': 'LaTeX with hyperref', 'file_path': './data/1706.03762v7.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20240410211143Z', 'page': 11, 'producer': 'pdfTeX-1.40.25', 'source': './data/1706.03762v7.pdf', 'subject': '', 'title': '', 'total_pages': 15, 'trapped': ''}
Content Brief:


[25] Mitchell P Marcus, Mary Ann Marcinkiewicz, and Beatrice Santorini. Building a large annotated
corpus of english: The penn treebank. Computational linguistics, 19(2):313–330, 1993.
[26] David McClosky, Eugene Charniak, and Mark Johnson. Effective self-training for parsing. In
Proceedings of the Human Language Technology Conference of the NAACL, Main Conference,
pages 152–159. ACL, June 2006.
[27] Ankur Parikh, Oscar Täckström, Dipanjan Das, and Jakob Uszkoreit. A decomposable attention
model. In Empirical Methods in Natural Language Processing, 2016.
[28] Romain Paulus, Caiming Xiong, and Richard Socher. A deep reinforced model for abstractive
summarization. arXiv preprint arXiv:1705.04304, 2017.
[29] Slav Petrov, Leon Barrett, Romain Thibaux, and Dan Klein. Learning accurate, compact,
and interpretable tree annotation. In Proceedings of the 21st International Conference on
Computational Linguistics and 44th Annual Meeting of the ACL, pages 433–440. ACL, July
2006.
[30] Ofir Press 


Metadata: {'author': '', 'creationDate': 'D:20210413004838Z', 'creator': 'LaTeX with hyperref', 'file_path': './data/2005.11401v4.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20210413004838Z', 'page': 13, 'producer': 'pdfTeX-1.40.21', 'source': './data/2005.11401v4.pdf', 'subject': '', 'title': '', 'total_pages': 19, 'trapped': ''}
Content Brief:


approaches 2016 co-located with the 30th Annual Conference on Neural Information Processing
Systems (NIPS 2016), Barcelona, Spain, December 9, 2016, volume 1773 of CEUR Workshop
Proceedings. CEUR-WS.org, 2016.
URL http://ceur-ws.org/Vol-1773/CoCoNIPS_
2016_paper9.pdf.
[44] Rodrigo Nogueira and Kyunghyun Cho. Passage re-ranking with BERT. arXiv preprint
arXiv:1901.04085, 2019. URL https://arxiv.org/abs/1901.04085.
[45] Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier,
and Michael Auli. fairseq: A fast, extensible toolkit for sequence modeling. In Proceedings
of the 2019 Conference of the North American Chapter of the Association for Computational
Linguistics (Demonstrations), pages 48–53, Minneapolis, Minnesota, June 2019. Association
for Computational Linguistics. doi: 10.18653/v1/N19-4009. URL https://www.aclweb.
org/anthology/N19-4009.
[46] Ethan Perez, Siddharth Karamcheti, Rob Fergus, Jason Weston, Douwe Kiela, and Kyunghyun
Cho. Finding gen


Metadata: {'author': '', 'creationDate': 'D:20210413004838Z', 'creator': 'LaTeX with hyperref', 'file_path': './data/2005.11401v4.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20210413004838Z', 'page': 12, 'producer': 'pdfTeX-1.40.21', 'source': './data/2005.11401v4.pdf', 'subject': '', 'title': '', 'total_pages': 19, 'trapped': ''}
Content Brief:


for Computational Linguistics, pages 6086–6096, Florence, Italy, July 2019. Association for
Computational Linguistics. doi: 10.18653/v1/P19-1612. URL https://www.aclweb.org/
anthology/P19-1612.
[32] Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed,
Omer Levy, Veselin Stoyanov, and Luke Zettlemoyer. BART: Denoising sequence-to-sequence
pre-training for natural language generation, translation, and comprehension. arXiv preprint
arXiv:1910.13461, 2019. URL https://arxiv.org/abs/1910.13461.
[33] Jiwei Li, Michel Galley, Chris Brockett, Jianfeng Gao, and Bill Dolan. A diversity-promoting
objective function for neural conversation models. In Proceedings of the 2016 Conference of the
North American Chapter of the Association for Computational Linguistics: Human Language
Technologies, pages 110–119, San Diego, California, June 2016. Association for Computational
Linguistics. doi: 10.18653/v1/N16-1014. URL https://www.aclweb.org/anthology/
N16-1014.
[34] Margaret L


Metadata: {'author': '', 'creationDate': 'D:20210413004838Z', 'creator': 'LaTeX with hyperref', 'file_path': './data/2005.11401v4.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20210413004838Z', 'page': 10, 'producer': 'pdfTeX-1.40.21', 'source': './data/2005.11401v4.pdf', 'subject': '', 'title': '', 'total_pages': 19, 'trapped': ''}
Content Brief:


[7] Christopher Clark and Matt Gardner. Simple and Effective Multi-Paragraph Reading Compre-
hension. arXiv:1710.10723 [cs], October 2017. URL http://arxiv.org/abs/1710.10723.
arXiv: 1710.10723.
[8] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of
Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Con-
ference of the North American Chapter of the Association for Computational Linguistics: Human
Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis,
Minnesota, June 2019. Association for Computational Linguistics. doi: 10.18653/v1/N19-1423.
URL https://www.aclweb.org/anthology/N19-1423.
[9] Emily Dinan, Stephen Roller, Kurt Shuster, Angela Fan, Michael Auli, and Jason Weston. Wiz-
ard of wikipedia: Knowledge-powered conversational agents. In International Conference on
Learning Representations, 2019. URL https://openreview.net/forum?id=r1l73iRqKm.
[10] Matthew Dunn, Levent Sagun, Mi




In [27]:
query = "what is nlp?"
top_docs = mq_retriever.invoke(query)
display_docs(top_docs)

INFO:langchain.retrievers.multi_query:Generated queries: ['What does NLP stand for and what are its main applications?  ', 'Can you explain the concept of natural language processing and its significance?  ', 'What are the key components and techniques involved in NLP?']


Metadata: {'author': '', 'creationDate': 'D:20200724000408Z', 'creator': 'LaTeX with hyperref package', 'file_path': './data/2005.14165v4.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20200724000408Z', 'page': 2, 'producer': 'pdfTeX-1.40.17', 'source': './data/2005.14165v4.pdf', 'subject': '', 'title': '', 'total_pages': 75, 'trapped': ''}
Content Brief:


1
Introduction
Recent years have featured a trend towards pre-trained language representations in NLP systems, applied in increasingly
flexible and task-agnostic ways for downstream transfer. First, single-layer representations were learned using word
vectors [MCCD13, PSM14] and fed to task-specific architectures, then RNNs with multiple layers of representations
and contextual state were used to form stronger representations [DL15, MBXS17, PNZtY18] (though still applied to
task-specific architectures), and more recently pre-trained recurrent or transformer language models [VSP+17] have
been directly fine-tuned, entirely removing the need for task-specific architectures [RNSS18, DCLT18, HR18].
This last paradigm has led to substantial progress on many challenging NLP tasks such as reading comprehension,
question answering, textual entailment, and many others, and has continued to advance based on new architectures
and algorithms [RSR+19, LOG+19, YDY+19, LCG+19]. However, a major limitation 


Metadata: {'author': '', 'creationDate': 'D:20200724000408Z', 'creator': 'LaTeX with hyperref package', 'file_path': './data/2005.14165v4.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20200724000408Z', 'page': 70, 'producer': 'pdfTeX-1.40.17', 'source': './data/2005.14165v4.pdf', 'subject': '', 'title': '', 'total_pages': 75, 'trapped': ''}
Content Brief:


[LCG+19] Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Sori-
cut. ALBERT: A lite BERT for self-supervised learning of language representations. arXiv preprint
arXiv:1909.11942, 2019.
[LCH+20] Xiaodong Liu, Hao Cheng, Pengcheng He, Weizhu Chen, Yu Wang, Hoifung Poon, and Jianfeng Gao.
Adversarial training for large neural language models. arXiv preprint arXiv:2004.08994, 2020.
[LDL19] Zhongyang Li, Xiao Ding, and Ting Liu. Story ending prediction by transferable bert. arXiv preprint
arXiv:1905.07504, 2019.
[LDM12] Hector Levesque, Ernest Davis, and Leora Morgenstern. The Winograd schema challenge. In Thirteenth
International Conference on the Principles of Knowledge Representation and Reasoning, 2012.
[LGG+20] Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, and
Luke Zettlemoyer. Multilingual denoising pre-training for neural machine translation. arXiv preprint
arXiv:2001.08210, 2020.
[LGH+15] Xiaodong L


Metadata: {'author': '', 'creationDate': 'D:20200724000408Z', 'creator': 'LaTeX with hyperref package', 'file_path': './data/2005.14165v4.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20200724000408Z', 'page': 3, 'producer': 'pdfTeX-1.40.17', 'source': './data/2005.14165v4.pdf', 'subject': '', 'title': '', 'total_pages': 75, 'trapped': ''}
Content Brief:


Figure 1.2: Larger models make increasingly efficient use of in-context information. We show in-context learning
performance on a simple task requiring the model to remove random symbols from a word, both with and without a
natural language task description (see Sec. 3.9.2). The steeper “in-context learning curves” for large models demonstrate
improved ability to learn a task from contextual information. We see qualitatively similar behavior across a wide range
of tasks.
sufficient to enable a human to perform a new task to at least a reasonable degree of competence. Aside from pointing
to a conceptual limitation in our current NLP techniques, this adaptability has practical advantages – it allows humans
to seamlessly mix together or switch between many tasks and skills, for example performing addition during a lengthy
dialogue. To be broadly useful, we would someday like our NLP systems to have this same fluidity and generality.
One potential route towards addressing these issues is meta


Metadata: {'author': '', 'creationDate': 'D:20200724000408Z', 'creator': 'LaTeX with hyperref package', 'file_path': './data/2005.14165v4.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20200724000408Z', 'page': 69, 'producer': 'pdfTeX-1.40.17', 'source': './data/2005.14165v4.pdf', 'subject': '', 'title': '', 'total_pages': 75, 'trapped': ''}
Content Brief:


Llion Jones, Ming-Wei Chang, Andrew Dai, Jakob Uszkoreit, Quoc Le, and Slav Petrov. Natural ques-
tions: a benchmark for question answering research. Transactions of the Association of Computational
Linguistics, 2019.
[KR16] Yoon Kim and Alexander M. Rush. Sequence-level knowledge distillation. Arxiv, 2016.
[LB02] Edward Loper and Steven Bird. Nltk: The natural language toolkit, 2002.
[LC19] Guillaume Lample and Alexis Conneau. Cross-lingual language model pretraining. arXiv preprint
arXiv:1901.07291, 2019.
70


Metadata: {'author': '', 'creationDate': 'D:20240410211143Z', 'creator': 'LaTeX with hyperref', 'file_path': './data/1706.03762v7.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20240410211143Z', 'page': 11, 'producer': 'pdfTeX-1.40.25', 'source': './data/1706.03762v7.pdf', 'subject': '', 'title': '', 'total_pages': 15, 'trapped': ''}
Content Brief:


[25] Mitchell P Marcus, Mary Ann Marcinkiewicz, and Beatrice Santorini. Building a large annotated
corpus of english: The penn treebank. Computational linguistics, 19(2):313–330, 1993.
[26] David McClosky, Eugene Charniak, and Mark Johnson. Effective self-training for parsing. In
Proceedings of the Human Language Technology Conference of the NAACL, Main Conference,
pages 152–159. ACL, June 2006.
[27] Ankur Parikh, Oscar Täckström, Dipanjan Das, and Jakob Uszkoreit. A decomposable attention
model. In Empirical Methods in Natural Language Processing, 2016.
[28] Romain Paulus, Caiming Xiong, and Richard Socher. A deep reinforced model for abstractive
summarization. arXiv preprint arXiv:1705.04304, 2017.
[29] Slav Petrov, Leon Barrett, Romain Thibaux, and Dan Klein. Learning accurate, compact,
and interpretable tree annotation. In Proceedings of the 21st International Conference on
Computational Linguistics and 44th Annual Meeting of the ACL, pages 433–440. ACL, July
2006.
[30] Ofir Press 


Metadata: {'author': '', 'creationDate': 'D:20210413004838Z', 'creator': 'LaTeX with hyperref', 'file_path': './data/2005.11401v4.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20210413004838Z', 'page': 13, 'producer': 'pdfTeX-1.40.21', 'source': './data/2005.11401v4.pdf', 'subject': '', 'title': '', 'total_pages': 19, 'trapped': ''}
Content Brief:


approaches 2016 co-located with the 30th Annual Conference on Neural Information Processing
Systems (NIPS 2016), Barcelona, Spain, December 9, 2016, volume 1773 of CEUR Workshop
Proceedings. CEUR-WS.org, 2016.
URL http://ceur-ws.org/Vol-1773/CoCoNIPS_
2016_paper9.pdf.
[44] Rodrigo Nogueira and Kyunghyun Cho. Passage re-ranking with BERT. arXiv preprint
arXiv:1901.04085, 2019. URL https://arxiv.org/abs/1901.04085.
[45] Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier,
and Michael Auli. fairseq: A fast, extensible toolkit for sequence modeling. In Proceedings
of the 2019 Conference of the North American Chapter of the Association for Computational
Linguistics (Demonstrations), pages 48–53, Minneapolis, Minnesota, June 2019. Association
for Computational Linguistics. doi: 10.18653/v1/N19-4009. URL https://www.aclweb.
org/anthology/N19-4009.
[46] Ethan Perez, Siddharth Karamcheti, Rob Fergus, Jason Weston, Douwe Kiela, and Kyunghyun
Cho. Finding gen


Metadata: {'author': '', 'creationDate': 'D:20210413004838Z', 'creator': 'LaTeX with hyperref', 'file_path': './data/2005.11401v4.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20210413004838Z', 'page': 12, 'producer': 'pdfTeX-1.40.21', 'source': './data/2005.11401v4.pdf', 'subject': '', 'title': '', 'total_pages': 19, 'trapped': ''}
Content Brief:


for Computational Linguistics, pages 6086–6096, Florence, Italy, July 2019. Association for
Computational Linguistics. doi: 10.18653/v1/P19-1612. URL https://www.aclweb.org/
anthology/P19-1612.
[32] Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed,
Omer Levy, Veselin Stoyanov, and Luke Zettlemoyer. BART: Denoising sequence-to-sequence
pre-training for natural language generation, translation, and comprehension. arXiv preprint
arXiv:1910.13461, 2019. URL https://arxiv.org/abs/1910.13461.
[33] Jiwei Li, Michel Galley, Chris Brockett, Jianfeng Gao, and Bill Dolan. A diversity-promoting
objective function for neural conversation models. In Proceedings of the 2016 Conference of the
North American Chapter of the Association for Computational Linguistics: Human Language
Technologies, pages 110–119, San Diego, California, June 2016. Association for Computational
Linguistics. doi: 10.18653/v1/N16-1014. URL https://www.aclweb.org/anthology/
N16-1014.
[34] Margaret L


Metadata: {'author': '', 'creationDate': 'D:20210413004838Z', 'creator': 'LaTeX with hyperref', 'file_path': './data/2005.11401v4.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20210413004838Z', 'page': 10, 'producer': 'pdfTeX-1.40.21', 'source': './data/2005.11401v4.pdf', 'subject': '', 'title': '', 'total_pages': 19, 'trapped': ''}
Content Brief:


[7] Christopher Clark and Matt Gardner. Simple and Effective Multi-Paragraph Reading Compre-
hension. arXiv:1710.10723 [cs], October 2017. URL http://arxiv.org/abs/1710.10723.
arXiv: 1710.10723.
[8] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of
Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Con-
ference of the North American Chapter of the Association for Computational Linguistics: Human
Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis,
Minnesota, June 2019. Association for Computational Linguistics. doi: 10.18653/v1/N19-1423.
URL https://www.aclweb.org/anthology/N19-1423.
[9] Emily Dinan, Stephen Roller, Kurt Shuster, Angela Fan, Michael Auli, and Jason Weston. Wiz-
ard of wikipedia: Knowledge-powered conversational agents. In International Conference on
Learning Representations, 2019. URL https://openreview.net/forum?id=r1l73iRqKm.
[10] Matthew Dunn, Levent Sagun, Mi
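
The `Generated queries` log at the top of this output shows the multi-query retriever rewriting the question into three variants. It then retrieves documents for each variant and keeps the unique union of the hits, which is why chunks from several papers appear in one result list. A minimal sketch of that merge step, assuming hypothetical `(source, page)` tuples stand in for documents and a hypothetical `fake_search` function stands in for the vector store lookup:

```python
from typing import Callable, Iterable


def unique_union(queries: Iterable[str], search: Callable[[str], list]) -> list:
    """Run `search` for every query and keep each hit once, in first-seen order."""
    seen = set()
    merged = []
    for q in queries:
        for hit in search(q):
            if hit not in seen:
                seen.add(hit)
                merged.append(hit)
    return merged


# Hypothetical index: query variant -> list of (source, page) hits.
FAKE_INDEX = {
    "what does nlp stand for?": [("2005.14165v4.pdf", 2), ("2005.14165v4.pdf", 70)],
    "explain natural language processing": [("2005.14165v4.pdf", 2), ("2005.14165v4.pdf", 3)],
    "key techniques in nlp": [("2005.14165v4.pdf", 3), ("1706.03762v7.pdf", 11)],
}


def fake_search(query: str) -> list:
    # Hypothetical stand-in for a similarity search against the vector store.
    return FAKE_INDEX.get(query, [])


merged_hits = unique_union(FAKE_INDEX.keys(), fake_search)
print(merged_hits)
```

Overlapping hits across the query variants are collapsed, so the merged list can be shorter than the sum of the per-query results.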




In [28]:
from langchain_core.prompts import ChatPromptTemplate

rag_prompt = """You are an assistant who is an expert in question-answering tasks.
                Answer the following question using only the following pieces of retrieved context.
                If the answer is not in the context, do not make up answers, just say that you don't know.
                Keep the answer detailed and well formatted based on the information from the context.

                Question:
                {question}

                Context:
                {context}

                Answer:
            """

rag_prompt_template = ChatPromptTemplate.from_template(rag_prompt)

In [29]:
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI

chatgpt = ChatOpenAI(model_name="gpt-4o-mini", temperature=0)

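def format_docs(docs):
    # Join the retrieved chunks into one context string, separated by blank lines.
    return "\n\n".join(doc.page_content for doc in docs)

# The dict fans the input question out to two branches: the retriever output
# is flattened by format_docs, and the question itself passes through unchanged.
qa_rag_chain = (
    {
        "context": mq_retriever | format_docs,
        "question": RunnablePassthrough(),
    }
    | rag_prompt_template
    | chatgpt
)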
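
To make the data flow of this chain concrete, here is a plain-Python sketch of what happens before the prompt reaches the model. `FakeDoc`, `fake_retrieve`, `build_prompt`, and the shortened `PROMPT` are hypothetical stand-ins for the LangChain document type, `mq_retriever`, the dict step, and `rag_prompt`; only `format_docs` mirrors the notebook's helper.

```python
from dataclasses import dataclass


@dataclass
class FakeDoc:
    """Hypothetical stand-in for a LangChain Document (only page_content is used)."""
    page_content: str


def format_docs(docs):
    # Same logic as the notebook helper: join chunk texts with blank lines.
    return "\n\n".join(doc.page_content for doc in docs)


def fake_retrieve(question):
    # Hypothetical retriever: returns fixed chunks regardless of the question.
    return [FakeDoc("Chunk A about retrieval."), FakeDoc("Chunk B about generation.")]


# Shortened, assumed version of the notebook's prompt, for illustration only.
PROMPT = "Question:\n{question}\n\nContext:\n{context}\n\nAnswer:\n"


def build_prompt(question):
    # Mirrors the dict step of the chain: one input feeds both branches.
    inputs = {
        "context": format_docs(fake_retrieve(question)),
        "question": question,  # passed through unchanged
    }
    return PROMPT.format(**inputs)


prompt_text = build_prompt("What is RAG?")
print(prompt_text)
```

The real chain does the same two things in parallel, then hands the filled prompt to the chat model.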

In [30]:
from IPython.display import display, Markdown

query = "What is machine learning?"
result = qa_rag_chain.invoke(query)
display(Markdown(result.content))

INFO:langchain.retrievers.multi_query:Generated queries: ['What are the key concepts and definitions of machine learning?  ', 'Can you explain the fundamentals and applications of machine learning?  ', 'How does machine learning work and what are its main types?']


Machine learning is not explicitly defined in the provided context. However, the context discusses concepts related to machine learning, particularly in the realm of natural language processing (NLP) and meta-learning. 

From the context, we can infer that machine learning involves the development of models that can learn from data and adapt to new tasks. Specifically, it mentions "meta-learning," which refers to a process where a model develops a broad set of skills and pattern recognition abilities during training and then uses these abilities at inference time to rapidly adapt to or recognize the desired task. This process can involve different learning paradigms such as "zero-shot," "one-shot," or "few-shot" learning, which describe how many demonstrations are provided to the model at inference time.

The context also highlights the importance of pre-trained language models and their ability to perform various NLP tasks without needing extensive task-specific datasets, which is a significant aspect of modern machine learning approaches. 

In summary, while the term "machine learning" itself is not defined in the context, it is closely related to the concepts of model training, adaptation, and the use of learned skills to perform tasks, as discussed in the context of meta-learning and language models. 

If you need a more specific definition or details about machine learning, I don't know.

In [31]:
from IPython.display import display, Markdown

query = "What are the main components of a RAG model and how do they interact?"
result = qa_rag_chain.invoke(query)
display(Markdown(result.content))

INFO:langchain.retrievers.multi_query:Generated queries: ['What are the key elements of a RAG model and what is their interaction process?', 'Can you explain the primary parts of a RAG model and how they work together?', 'What are the essential features of a RAG model and how do they connect with each other?']


The main components of a RAG (Retrieval-Augmented Generation) model are:

1. **Retriever**: This component, denoted as \( p_{\eta}(z|x) \), is responsible for retrieving relevant text documents based on a given input query \( x \). It returns a distribution over the top-K truncated text passages that are deemed relevant to the query. The retriever is initialized using the Dense Passage Retrieval (DPR) method, which employs retrieval supervision on datasets like Natural Questions and TriviaQA.

2. **Generator**: This component, represented as \( p_{\theta}(y_i|x, z, y_{1:i-1}) \), generates the target sequence \( y \) using the input query \( x \) and the retrieved documents \( z \) as additional context. The generator is designed to produce free-form, abstractive text responses, allowing it to generate answers that may not be verbatim from the retrieved documents.

### Interaction Between Components:
- The interaction between the retriever and the generator is crucial for the RAG model's performance. The retriever first processes the input query to fetch relevant documents, which are then used by the generator to create a response. This allows the model to leverage both the generative capabilities of the generator and the contextual information provided by the retrieved documents.
- The model demonstrates that it can achieve state-of-the-art results in open-domain question answering without relying on traditional extractive methods, as it can generate answers based on clues from the documents even if the exact answer is not present verbatim.

Overall, the RAG model effectively combines retrieval and generation, allowing for more flexible and accurate responses in various knowledge-intensive NLP tasks.
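
The answer above names the retriever \( p_{\eta}(z|x) \) and the generator \( p_{\theta}(y_i|x, z, y_{1:i-1}) \). In the RAG-Sequence formulation of the RAG paper, the two combine by marginalizing over the top-K retrieved passages: \( p(y|x) \approx \sum_{z} p_{\eta}(z|x) \prod_{i} p_{\theta}(y_i|x, z, y_{1:i-1}) \). Below is a toy numeric sketch of that sum. The function name `rag_sequence_prob` is hypothetical, and all probabilities are made-up illustrative values rather than model outputs.

```python
import math


def rag_sequence_prob(retriever_probs, token_probs_per_doc):
    """Marginalize the generator's sequence probability over retrieved docs.

    retriever_probs: p_eta(z|x) for each of the top-K documents.
    token_probs_per_doc: for each document z, the per-token probabilities
        p_theta(y_i | x, z, y_{1:i-1}) of the same target sequence y.
    """
    total = 0.0
    for p_z, token_probs in zip(retriever_probs, token_probs_per_doc):
        # Sequence probability given z is the product of its token probabilities.
        total += p_z * math.prod(token_probs)
    return total


# Made-up values: two retrieved documents, a three-token answer.
retriever_probs = [0.7, 0.3]
token_probs_per_doc = [
    [0.9, 0.8, 0.5],  # the answer is likely given document 1
    [0.2, 0.1, 0.5],  # and unlikely given document 2
]

p_y = rag_sequence_prob(retriever_probs, token_probs_per_doc)
print(p_y)
```

A document with a high retrieval score that also supports the answer dominates the sum, which is the interaction the answer describes in prose.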