# Build a Simple RAG System


This notebook demonstrates how to build a simple Retrieval-Augmented Generation (RAG) system using LangChain and OpenAI models. The workflow covers the complete process of loading and processing both JSON and PDF documents, chunking them for efficient retrieval, generating embeddings with OpenAI's embedding models, and storing/retrieving document vectors using Chroma as a vector database. It then shows how to retrieve relevant context based on semantic similarity and construct a RAG pipeline that combines retrieval and generation to answer user queries using the retrieved context.

## Content

1. **Setup and API Keys**: Configure environment variables and set up the OpenAI API key.
2. **Embedding Models**: Initialize OpenAI embedding models for document vectorization.
3. **Data Loading and Processing**: Load and process both JSON and PDF documents, including chunking strategies for efficient retrieval.
4. **Vector Database**: Store document embeddings in a Chroma vector database and demonstrate how to persist and reload the database.
5. **Semantic Retrieval**: Retrieve relevant document chunks using semantic similarity search.
6. **RAG Pipeline Construction**: Build a RAG pipeline that combines retrieval and generation using LangChain's prompt templates and OpenAI's language models.
7. **Query Examples**: Run sample queries through the RAG pipeline and display the generated answers.
8. **Conclusion**: Summarize the workflow and suggest possible extensions for further experimentation.

This notebook provides a practical, end-to-end example of building a RAG system for question answering over custom document collections.

![RAG](../../images/rag.svg)

## Set Up the OpenAI API Key

In [3]:
import os
from dotenv import load_dotenv
load_dotenv()

# Read the OpenAI API key from the environment
OPENAI_API_KEY = os.environ['OPENAI_API_KEY']
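
Indexing `os.environ` directly raises a bare `KeyError` when the key is missing, which can be confusing on a first run. A minimal sketch of a friendlier check follows; `require_env` is a hypothetical helper name and is not part of the original notebook.

```python
import os


def require_env(name: str) -> str:
    """Return an environment variable's value, or fail with a clear message.

    Hypothetical helper for illustration only.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add it to your .env file or export it in your shell."
        )
    return value


# Usage (assumes load_dotenv() has already run):
# OPENAI_API_KEY = require_env("OPENAI_API_KEY")
```

Failing early here avoids a less obvious authentication error later, when the first embedding request is sent.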

### OpenAI Embedding Models

LangChain provides access to OpenAI's embedding models, including these two:

- A smaller and highly efficient `text-embedding-3-small` model 

- A larger and more powerful `text-embedding-3-large` model.

In [5]:
from langchain_openai import OpenAIEmbeddings

# Find more here: https://openai.com/blog/new-embedding-models-and-api-updates
openai_embed_model = OpenAIEmbeddings(model='text-embedding-3-small')

## Load and Process JSON Data

In [17]:
import json
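# Note: the older `langchain.docstore`, `langchain.document_loaders`, and
# `langchain.text_splitter` import paths are deprecated in recent LangChain releases.
from langchain_core.documents import Document
from langchain_community.document_loaders import JSONLoader
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter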

In [12]:

# Load JSON Documents
loader = JSONLoader(file_path='../../rag_docs/wikidata_rag_demo.jsonl',
                    jq_schema='.',
                    text_content=False,
                    json_lines=True)
wiki_docs = loader.load()

In [9]:
# Display number of documents loaded
len(wiki_docs)

1801

In [11]:
# Display a sample document
print(wiki_docs[3])

page_content='{"id": "71548", "title": "Chi-square distribution", "paragraphs": ["In probability theory and statistics, the chi-square distribution (also chi-squared or formula_1\u00a0 distribution) is one of the most widely used theoretical probability distributions. Chi-square distribution with formula_2 degrees of freedom is written as formula_3. It is a special case of gamma distribution.", "Chi-square distribution is primarily used in statistical significance tests and confidence intervals. It is useful, because it is relatively easy to show that certain probability distributions come close to it, under certain conditions. One of these conditions is that the null hypothesis must be true. Another one is that the different random variables (or observations) must be independent of each other."]}' metadata={'source': '/Users/saurabhbhardwaj/Desktop/Retrieval_Augmented_Generation_from_Basic_to_Advance/rag_docs/wikidata_rag_demo.jsonl', 'seq_num': 4}


In [None]:
# Process Documents
wiki_docs_processed = []

for doc in wiki_docs:
    doc = json.loads(doc.page_content)
    metadata = {
        "title": doc['title'],
        "id": doc['id'],
        "source": "Wikipedia"
    }
    data = ' '.join(doc['paragraphs'])
    wiki_docs_processed.append(Document(page_content=data, metadata=metadata))
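
To make the transformation concrete, here is a minimal sketch of the same flattening step applied to a single JSONL record, using only the standard library. The record is a shortened, made-up example that mirrors the fields seen in the sample document (`id`, `title`, `paragraphs`), and `flatten_record` is a hypothetical helper name.

```python
import json


def flatten_record(raw_line: str) -> dict:
    """Parse one JSONL line and join its paragraphs into a single text field."""
    record = json.loads(raw_line)
    return {
        "page_content": " ".join(record["paragraphs"]),
        "metadata": {
            "title": record["title"],
            "id": record["id"],
            "source": "Wikipedia",
        },
    }


# Made-up record with the same shape as the Wikipedia entries
sample_line = json.dumps({
    "id": "0",
    "title": "Example article",
    "paragraphs": ["First paragraph.", "Second paragraph."],
})

flat = flatten_record(sample_line)
print(flat["page_content"])
print(flat["metadata"])
```

Keeping `title` and `id` in the metadata rather than in the text means they are not embedded, but they remain available for display and filtering after retrieval.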

In [15]:
# Display a sample processed document
print(wiki_docs_processed[3])

page_content='In probability theory and statistics, the chi-square distribution (also chi-squared or formula_1  distribution) is one of the most widely used theoretical probability distributions. Chi-square distribution with formula_2 degrees of freedom is written as formula_3. It is a special case of gamma distribution. Chi-square distribution is primarily used in statistical significance tests and confidence intervals. It is useful, because it is relatively easy to show that certain probability distributions come close to it, under certain conditions. One of these conditions is that the null hypothesis must be true. Another one is that the different random variables (or observations) must be independent of each other.' metadata={'title': 'Chi-square distribution', 'id': '71548', 'source': 'Wikipedia'}


### Load and Process PDF documents

#### Create Simple Document Chunks for Standard Retrieval

**Chunking** is the process of dividing large documents or texts into smaller, manageable pieces called "chunks."  

In RAG and information retrieval, chunking helps to:

- Improve search and retrieval accuracy by working with smaller, focused text segments.
- Ensure that each chunk fits within the input size limits of language models or embedding models.
- Enable more efficient and relevant context retrieval for question answering and summarization tasks.

Chunking strategies can vary, such as splitting by fixed character length, sentences, paragraphs, or semantic boundaries.

![chunk](../../images/chunks.png)
**source**: myscale.com

In the example below, we use simple fixed-size chunking: each chunk is at most 3,500 characters, with a 200-character overlap between consecutive chunks so that text near a chunk boundary keeps some of its surrounding context.

Experiment with different chunk sizes and overlaps to see what works best for your documents.
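
How chunk size and overlap interact can be sketched with a plain sliding window over characters. This is a simplification under stated assumptions: `RecursiveCharacterTextSplitter` prefers to split on separators such as paragraph breaks and newlines, so its chunks are not cut at exact character offsets. `chunk_text` is a hypothetical helper name.

```python
from typing import List


def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into windows of at most chunk_size characters.

    Each window starts (chunk_size - chunk_overlap) characters after the
    previous one, so neighbours share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= chunk_overlap < chunk_size")

    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


chunks = chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)  # ['abcd', 'defg', 'ghij']
```

Larger overlaps reduce the chance that a sentence is split away from its context, at the cost of more chunks to embed and store.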

In [18]:

# Load and Process PDF Documents
def create_simple_chunks(file_path, chunk_size=3500, chunk_overlap=0):

    print('Loading pages:', file_path)
    loader = PyMuPDFLoader(file_path)
    doc_pages = loader.load()

    print('Chunking pages:', file_path)
    splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size,
                                              chunk_overlap=chunk_overlap)
    doc_chunks = splitter.split_documents(doc_pages)
    print('Finished processing:', file_path)
    print()
    return doc_chunks

In [21]:
# Find all PDF files in the specified directory
from glob import glob

pdf_files = glob('../../rag_docs/*.pdf')
print(pdf_files)

['../../rag_docs/cnn_paper.pdf', '../../rag_docs/vision_transformer.pdf', '../../rag_docs/resnet_paper.pdf', '../../rag_docs/attention_paper.pdf']


In [23]:
# Process all PDF files and create chunks
pdf_docs = []
for file in pdf_files:
    pdf_docs.extend(
        create_simple_chunks(
            file_path=file,
            chunk_size=3500,
            chunk_overlap=200))

Loading pages: ../../rag_docs/cnn_paper.pdf
Chunking pages: ../../rag_docs/cnn_paper.pdf
Finished processing: ../../rag_docs/cnn_paper.pdf

Loading pages: ../../rag_docs/vision_transformer.pdf
Chunking pages: ../../rag_docs/vision_transformer.pdf
Finished processing: ../../rag_docs/vision_transformer.pdf

Loading pages: ../../rag_docs/resnet_paper.pdf
Chunking pages: ../../rag_docs/resnet_paper.pdf
Finished processing: ../../rag_docs/resnet_paper.pdf

Loading pages: ../../rag_docs/attention_paper.pdf
Chunking pages: ../../rag_docs/attention_paper.pdf
Finished processing: ../../rag_docs/attention_paper.pdf



In [27]:
# Display total number of PDF document chunks and a sample chunk
print(f'Total PDF Document Chunks: {len(pdf_docs)}')
print(pdf_docs[0])

Total PDF Document Chunks: 79
page_content='An Introduction to Convolutional Neural Networks
Keiron O’Shea1 and Ryan Nash2
1 Department of Computer Science, Aberystwyth University, Ceredigion, SY23 3DB
keo7@aber.ac.uk
2 School of Computing and Communications, Lancaster University, Lancashire, LA1
4YW
nashrd@live.lancs.ac.uk
Abstract. The ﬁeld of machine learning has taken a dramatic twist in re-
cent times, with the rise of the Artiﬁcial Neural Network (ANN). These
biologically inspired computational models are able to far exceed the per-
formance of previous forms of artiﬁcial intelligence in common machine
learning tasks. One of the most impressive forms of ANN architecture is
that of the Convolutional Neural Network (CNN). CNNs are primarily
used to solve difﬁcult image-driven pattern recognition tasks and with
their precise yet simple architecture, offers a simpliﬁed method of getting
started with ANNs.
This document provides a brief introduction to CNNs, discussing recently
publis

### Combine all document chunks in one list

In [28]:
# Combine all documents
documents = wiki_docs_processed + pdf_docs
print(len(documents))

1880


## Index Document Chunks and Embeddings in Vector DB

- Initialize a Chroma vector DB and index the document chunks together with their embeddings.
- Persist the index to disk.

In [31]:
from langchain_chroma import Chroma

# create vector DB of docs and embeddings
chroma_db = Chroma.from_documents(documents=documents,
                                  collection_name='docs_db',
                                  embedding=openai_embed_model,
                                  # check https://docs.trychroma.com/guides#changing-the-distance-function
                                  collection_metadata={"hnsw:space": "cosine"}, # choose cosine similarity
                                  persist_directory="./docs_db") # save to disk

### Load Vector DB from disk

If needed, we can reconnect to the saved `docs_db` collection on disk instead of rebuilding it.

In [32]:
# # load from disk
# chroma_db = Chroma(persist_directory="./docs_db",
#                    collection_name='docs_db',
#                    embedding_function=openai_embed_model)
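
Because building the index sends every chunk to the embedding API, it can be convenient to rebuild only when no saved index exists. The sketch below shows one way to express that choice; `load_or_build` is a hypothetical helper, and checking for a non-empty directory is an assumption that does not verify the saved index is valid.

```python
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def load_or_build(persist_dir: str, build: Callable[[], T], load: Callable[[], T]) -> T:
    """Call load() if persist_dir exists and is non-empty, otherwise call build()."""
    path = Path(persist_dir)
    if path.is_dir() and any(path.iterdir()):
        return load()
    return build()


# Usage with the objects defined in this notebook (not executed here):
# chroma_db = load_or_build(
#     "./docs_db",
#     build=lambda: Chroma.from_documents(documents=documents,
#                                         collection_name='docs_db',
#                                         embedding=openai_embed_model,
#                                         collection_metadata={"hnsw:space": "cosine"},
#                                         persist_directory="./docs_db"),
#     load=lambda: Chroma(persist_directory="./docs_db",
#                         collection_name='docs_db',
#                         embedding_function=openai_embed_model),
# )
```

Passing the two steps as callables keeps the expensive embedding call from running unless it is actually needed.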

### Semantic Similarity based Retrieval

- Use cosine similarity
- Retrieve the top 5 similar documents based on the user input query
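
The two bullets above can be sketched in plain Python: score every document vector against the query vector with cosine similarity, then keep the `k` best. This is an exhaustive toy version with made-up 2-D vectors; a vector database such as Chroma uses an approximate index (HNSW) rather than comparing against every vector. The names `cosine_similarity` and `top_k` are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float],
          doc_vecs: Sequence[Sequence[float]],
          k: int = 5) -> List[Tuple[int, float]]:
    """Return (index, score) pairs for the k most similar document vectors."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-D vectors standing in for real embeddings
doc_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query_vec = [1.0, 0.1]

ranked = top_k(query_vec, doc_vecs, k=2)
print([index for index, _ in ranked])  # [0, 2]
```

Cosine similarity depends only on the direction of the vectors, not their length, which is why it is a common choice for comparing text embeddings.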

In [34]:
# Semantic Similarity based Retrieval using Cosine Similarity
retriever = chroma_db.as_retriever(search_type="similarity",
                                   search_kwargs={"k": 5})  # retrieve the top 5 most similar documents for the query

In [35]:
from IPython.display import display, Markdown

# Function to display retrieved documents
def display_docs(docs):
    for doc in docs:
        print('Metadata:', doc.metadata)
        print('Content Brief:')
        display(Markdown(doc.page_content[:1000]))
        print()

In [37]:
# let's try with a sample query
query = "what is machine learning?"
retrieved_docs = retriever.invoke(query)
display_docs(retrieved_docs)

Metadata: {'source': 'Wikipedia', 'id': '564928', 'title': 'Machine learning'}
Content Brief:


Machine learning gives computers the ability to learn without being explicitly programmed (Arthur Samuel, 1959). It is a subfield of computer science. The idea came from work in artificial intelligence. Machine learning explores the study and construction of algorithms which can learn and make predictions on data. Such algorithms follow programmed instructions, but can also make predictions or decisions based on data. They build a model from sample inputs. Machine learning is done where designing and programming explicit algorithms cannot be done. Examples include spam filtering, detection of network intruders or malicious insiders working towards a data breach, optical character recognition (OCR), search engines and computer vision.


Metadata: {'title': 'Supervised learning', 'source': 'Wikipedia', 'id': '359370'}
Content Brief:


In machine learning, supervised learning is the task of inferring a function from labelled training data. The results of the training are known beforehand, the system simply learns how to get to these results correctly. Usually, such systems work with vectors. They get the training data and the result of the training as two vectors and produce a "classifier". Usually, the system uses inductive reasoning to generalize the training data.


Metadata: {'source': 'Wikipedia', 'title': 'Deep learning', 'id': '663523'}
Content Brief:


Deep learning (also called deep structured learning or hierarchical learning) is a kind of machine learning, which is mostly used with certain kinds of neural networks. As with other kinds of machine-learning, learning sessions can be unsupervised, semi-supervised, or supervised. In many cases, structures are organised so that there is at least one intermediate layer (or hidden layer), between the input layer and the output layer. Certain tasks, such as as recognizing and understanding speech, images or handwriting, is easy to do for humans. However, for a computer, these tasks are very difficult to do. In a multi-layer neural network (having more than two layers), the information processed will become more abstract with each added layer. Deep learning models are inspired by information processing and communication patterns in biological nervous systems; they are different from the structural and functional properties of biological brains (especially the human brain) in many ways, whic


Metadata: {'title': 'Artificial intelligence', 'id': '6360', 'source': 'Wikipedia'}
Content Brief:


Artificial intelligence (AI) is the ability of a computer program or a machine to think and learn. It is also a field of study which tries to make computers "smart". They work on their own without being encoded with commands. John McCarthy came up with the name "Artificial Intelligence" in 1955. In general use, the term "artificial intelligence" means a programme which mimics human cognition. At least some of the things we associate with other minds, such as learning and problem solving can be done by computers, though not in the same way as we do. Andreas Kaplan and Michael Haenlein define AI as a system’s ability to correctly interpret external data, to learn from such data, and to use those learnings to achieve specific goals and tasks through flexible adaptation. An ideal (perfect) intelligent machine is a flexible agent which perceives its environment and takes actions to maximize its chance of success at some goal or objective. As machines become increasingly capable, mental facu


Metadata: {'title': 'Artificial neural network', 'id': '44742', 'source': 'Wikipedia'}
Content Brief:


A neural network (also called an ANN or an artificial neural network) is a sort of computer software, inspired by biological neurons. Biological brains are capable of solving difficult problems, but each neuron is only responsible for solving a very small part of the problem. Similarly, a neural network is made up of cells that work together to produce a desired result, although each individual cell is only responsible for solving a small part of the problem. This is one method for creating artificially intelligent programs. Neural networks are an example of machine learning, where a program can change as it learns to solve a problem. A neural network can be trained and improved with each example, but the larger the neural network, the more examples it needs to perform well—often needing millions or billions of examples in the case of deep learning. There are two ways to think of a neural network. First is like a human brain. Second is like a mathematical equation.




In [38]:
query = "what is the difference between transformers and vision transformers?"
retrieved_docs = retriever.invoke(query)
display_docs(retrieved_docs)

Metadata: {'format': 'PDF 1.5', 'creator': 'LaTeX with hyperref', 'creationdate': '2021-06-04T00:19:58+00:00', 'keywords': '', 'trapped': '', 'page': 7, 'subject': '', 'modDate': 'D:20210604001958Z', 'author': '', 'total_pages': 22, 'title': '', 'source': '../../rag_docs/vision_transformer.pdf', 'creationDate': 'D:20210604001958Z', 'moddate': '2021-06-04T00:19:58+00:00', 'producer': 'pdfTeX-1.40.21', 'file_path': '../../rag_docs/vision_transformer.pdf'}
Content Brief:


Published as a conference paper at ICLR 2021
4.4
SCALING STUDY
We perform a controlled scaling study of different models by evaluating transfer performance from
JFT-300M. In this setting data size does not bottleneck the models’ performances, and we assess
performance versus pre-training cost of each model. The model set includes: 7 ResNets, R50x1,
R50x2 R101x1, R152x1, R152x2, pre-trained for 7 epochs, plus R152x2 and R200x3 pre-trained
for 14 epochs; 6 Vision Transformers, ViT-B/32, B/16, L/32, L/16, pre-trained for 7 epochs, plus
L/16 and H/14 pre-trained for 14 epochs; and 5 hybrids, R50+ViT-B/32, B/16, L/32, L/16 pre-
trained for 7 epochs, plus R50+ViT-L/16 pre-trained for 14 epochs (for hybrids, the number at the
end of the model name stands not for the patch size, but for the total dowsampling ratio in the ResNet
backbone).
Figure 5 contains the transfer performance versus total pre-training compute (see Appendix D.5
for details on computational costs). Detailed results per mode


Metadata: {'title': '', 'subject': '', 'creationDate': 'D:20210604001958Z', 'total_pages': 22, 'creator': 'LaTeX with hyperref', 'trapped': '', 'creationdate': '2021-06-04T00:19:58+00:00', 'moddate': '2021-06-04T00:19:58+00:00', 'format': 'PDF 1.5', 'source': '../../rag_docs/vision_transformer.pdf', 'producer': 'pdfTeX-1.40.21', 'keywords': '', 'page': 0, 'modDate': 'D:20210604001958Z', 'file_path': '../../rag_docs/vision_transformer.pdf', 'author': ''}
Content Brief:


Published as a conference paper at ICLR 2021
AN IMAGE IS WORTH 16X16 WORDS:
TRANSFORMERS FOR IMAGE RECOGNITION AT SCALE
Alexey Dosovitskiy∗,†, Lucas Beyer∗, Alexander Kolesnikov∗, Dirk Weissenborn∗,
Xiaohua Zhai∗, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer,
Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby∗,†
∗equal technical contribution, †equal advising
Google Research, Brain Team
{adosovitskiy, neilhoulsby}@google.com
ABSTRACT
While the Transformer architecture has become the de-facto standard for natural
language processing tasks, its applications to computer vision remain limited. In
vision, attention is either applied in conjunction with convolutional networks, or
used to replace certain components of convolutional networks while keeping their
overall structure in place. We show that this reliance on CNNs is not necessary
and a pure transformer applied directly to sequences of image patches can perform
very well on image classiﬁcation tasks. When pre-traine


Metadata: {'page': 1, 'producer': 'pdfTeX-1.40.21', 'source': '../../rag_docs/vision_transformer.pdf', 'trapped': '', 'subject': '', 'title': '', 'keywords': '', 'format': 'PDF 1.5', 'moddate': '2021-06-04T00:19:58+00:00', 'creationdate': '2021-06-04T00:19:58+00:00', 'file_path': '../../rag_docs/vision_transformer.pdf', 'creator': 'LaTeX with hyperref', 'creationDate': 'D:20210604001958Z', 'total_pages': 22, 'author': '', 'modDate': 'D:20210604001958Z'}
Content Brief:


Published as a conference paper at ICLR 2021
inherent to CNNs, such as translation equivariance and locality, and therefore do not generalize well
when trained on insufﬁcient amounts of data.
However, the picture changes if the models are trained on larger datasets (14M-300M images). We
ﬁnd that large scale training trumps inductive bias. Our Vision Transformer (ViT) attains excellent
results when pre-trained at sufﬁcient scale and transferred to tasks with fewer datapoints. When
pre-trained on the public ImageNet-21k dataset or the in-house JFT-300M dataset, ViT approaches
or beats state of the art on multiple image recognition benchmarks. In particular, the best model
reaches the accuracy of 88.55% on ImageNet, 90.72% on ImageNet-ReaL, 94.55% on CIFAR-100,
and 77.63% on the VTAB suite of 19 tasks.
2
RELATED WORK
Transformers were proposed by Vaswani et al. (2017) for machine translation, and have since be-
come the state of the art method in many NLP tasks. Large Transformer-based mo


Metadata: {'author': '', 'trapped': '', 'creator': 'LaTeX with hyperref', 'keywords': '', 'producer': 'pdfTeX-1.40.21', 'total_pages': 22, 'page': 4, 'creationdate': '2021-06-04T00:19:58+00:00', 'source': '../../rag_docs/vision_transformer.pdf', 'modDate': 'D:20210604001958Z', 'title': '', 'file_path': '../../rag_docs/vision_transformer.pdf', 'format': 'PDF 1.5', 'moddate': '2021-06-04T00:19:58+00:00', 'subject': '', 'creationDate': 'D:20210604001958Z'}
Content Brief:


Published as a conference paper at ICLR 2021
Model
Layers
Hidden size D
MLP size
Heads
Params
ViT-Base
12
768
3072
12
86M
ViT-Large
24
1024
4096
16
307M
ViT-Huge
32
1280
5120
16
632M
Table 1: Details of Vision Transformer model variants.
We also evaluate on the 19-task VTAB classiﬁcation suite (Zhai et al., 2019b). VTAB evaluates
low-data transfer to diverse tasks, using 1 000 training examples per task. The tasks are divided into
three groups: Natural – tasks like the above, Pets, CIFAR, etc. Specialized – medical and satellite
imagery, and Structured – tasks that require geometric understanding like localization.
Model Variants. We base ViT conﬁgurations on those used for BERT (Devlin et al., 2019), as
summarized in Table 1. The “Base” and “Large” models are directly adopted from BERT and we
add the larger “Huge” model. In what follows we use brief notation to indicate the model size and
the input patch size: for instance, ViT-L/16 means the “Large” variant with 16×16 input patch siz


Metadata: {'total_pages': 22, 'producer': 'pdfTeX-1.40.21', 'creationDate': 'D:20210604001958Z', 'author': '', 'creator': 'LaTeX with hyperref', 'trapped': '', 'subject': '', 'moddate': '2021-06-04T00:19:58+00:00', 'creationdate': '2021-06-04T00:19:58+00:00', 'keywords': '', 'file_path': '../../rag_docs/vision_transformer.pdf', 'title': '', 'modDate': 'D:20210604001958Z', 'format': 'PDF 1.5', 'page': 3, 'source': '../../rag_docs/vision_transformer.pdf'}
Content Brief:


Published as a conference paper at ICLR 2021
The MLP contains two layers with a GELU non-linearity.
z0 = [xclass; x1
pE; x2
pE; · · · ; xN
p E] + Epos,
E ∈R(P 2·C)×D, Epos ∈R(N+1)×D
(1)
z′
ℓ= MSA(LN(zℓ−1)) + zℓ−1,
ℓ= 1 . . . L
(2)
zℓ= MLP(LN(z′
ℓ)) + z′
ℓ,
ℓ= 1 . . . L
(3)
y = LN(z0
L)
(4)
Inductive bias.
We note that Vision Transformer has much less image-speciﬁc inductive bias than
CNNs. In CNNs, locality, two-dimensional neighborhood structure, and translation equivariance are
baked into each layer throughout the whole model. In ViT, only MLP layers are local and transla-
tionally equivariant, while the self-attention layers are global. The two-dimensional neighborhood
structure is used very sparingly: in the beginning of the model by cutting the image into patches and
at ﬁne-tuning time for adjusting the position embeddings for images of different resolution (as de-
scribed below). Other than that, the position embeddings at initialization time carry no information
about the 2D pos




## Build the RAG Pipeline

**Prompt** refers to the input text or instructions provided to a language model (such as GPT) to guide its response. 

In Retrieval-Augmented Generation (RAG) systems, a prompt typically includes the user's question and relevant context retrieved from documents. The prompt is carefully designed to instruct the model on how to answer, what information to use, and how to format the output.

For example, in this notebook, the `rag_prompt` variable below defines a template that combines the user's question and retrieved context to generate a well-structured answer.
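
As a rough illustration of what "combining the question and retrieved context" means, the sketch below fills a template with plain `str.format`. It is a simplified stand-in, not the notebook's template: the wording of `PROMPT_TEMPLATE` and the helper name `build_prompt` are assumptions made for this example.

```python
from typing import List

# Simplified template for illustration; placeholders are filled by str.format
PROMPT_TEMPLATE = (
    "Answer the question using only the context below.\n"
    "If the answer is not in the context, say that you don't know.\n\n"
    "Question:\n{question}\n\n"
    "Context:\n{context}\n\n"
    "Answer:\n"
)


def build_prompt(question: str, context_chunks: List[str]) -> str:
    """Join retrieved chunks with blank lines and insert them into the template."""
    context = "\n\n".join(context_chunks)
    return PROMPT_TEMPLATE.format(question=question, context=context)


prompt = build_prompt(
    "What is machine learning?",
    [
        "Machine learning is a subfield of computer science.",
        "Deep learning is a kind of machine learning.",
    ],
)
print(prompt)
```

Printing the filled prompt is a quick way to check exactly what text the model receives before any API call is made.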

In [39]:
from langchain_core.prompts import ChatPromptTemplate

# RAG Prompt Template
rag_prompt = """You are an assistant who is an expert in question-answering tasks.
                Answer the following question using only the following pieces of retrieved context.
                If the answer is not in the context, do not make up answers, just say that you don't know.
                Keep the answer detailed and well formatted based on the information from the context.

                Question:
                {question}

                Context:
                {context}

                Answer:
            """

# Create Chat Prompt Template provided by langchain ChatPromptTemplate
rag_prompt_template = ChatPromptTemplate.from_template(rag_prompt)

In [40]:
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI

# Initialize ChatOpenAI model
chatgpt = ChatOpenAI(model_name="gpt-4o-mini", temperature=0)

# Function to format documents into a single string
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# Build the RAG Chain

# The RAG chain first retrieves relevant documents based on the query,
# formats them, and then uses the prompt template to generate an answer using the ChatGPT model
# You can learn more about Langchain Runnables here: https://python.langchain.com/en/latest/modules/core/runnables.html
q_and_a_rag_chain = (
    {
        "context": (retriever
                      |
                    format_docs),
        "question": RunnablePassthrough() # pass the question as is
    }
      |
    rag_prompt_template # Use the RAG prompt template
      |
    chatgpt # Generate answer using ChatGPT model
)
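
The data flow of the chain above (retrieve, format, fill the prompt, generate) can be sketched as an ordinary function. This is an illustration of the flow only, not how LangChain runnables are implemented; `run_rag`, `fake_retrieve`, and `echo_llm` are hypothetical names, and the stand-in model simply echoes its prompt so the input the LLM would see can be inspected offline.

```python
from typing import Callable, List


def run_rag(question: str,
            retrieve: Callable[[str], List[str]],
            generate: Callable[[str], str]) -> str:
    """Run the four RAG steps in order and return the generated answer."""
    chunks = retrieve(question)                    # 1. fetch relevant chunks
    context = "\n\n".join(chunks)                  # 2. format them into one string
    prompt = (                                     # 3. fill the prompt
        f"Question:\n{question}\n\nContext:\n{context}\n\nAnswer:"
    )
    return generate(prompt)                        # 4. call the model


def fake_retrieve(question: str) -> List[str]:
    """Stand-in retriever that always returns the same made-up chunk."""
    return ["CNNs are primarily used for image-driven pattern recognition tasks."]


def echo_llm(prompt: str) -> str:
    """Stand-in model that returns the prompt it was given."""
    return prompt


answer = run_rag("What is a CNN?", fake_retrieve, echo_llm)
print(answer)
```

In the LangChain version, the dictionary at the start of the chain plays the role of steps 1 and 2, with `RunnablePassthrough()` forwarding the original question unchanged.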

In [41]:
from IPython.display import display, Markdown

query = "What is machine learning?"
result = q_and_a_rag_chain.invoke(query)
display(Markdown(result.content))

Machine learning is a subfield of computer science that provides computers with the ability to learn from data without being explicitly programmed. The concept was introduced by Arthur Samuel in 1959 and is rooted in artificial intelligence (AI). Machine learning focuses on the study and construction of algorithms that can learn from and make predictions based on data. These algorithms follow programmed instructions but can also make decisions or predictions based on the data they process.

In machine learning, algorithms build models from sample inputs, which allows them to perform tasks where traditional programming methods are insufficient. Common applications of machine learning include spam filtering, detecting network intrusions, optical character recognition (OCR), search engines, and computer vision.

There are different types of machine learning, including:

1. **Supervised Learning**: This involves inferring a function from labeled training data, where the results are known beforehand. The system learns to produce correct outputs based on the input data, typically using vectors to represent both the training data and the results.

2. **Deep Learning**: A subset of machine learning that utilizes neural networks with multiple layers (known as multi-layer neural networks). Deep learning is particularly effective for complex tasks such as speech recognition, image understanding, and handwriting recognition. It processes information in a hierarchical manner, with each layer extracting increasingly abstract features from the data.

Overall, machine learning enables computers to adapt and improve their performance on tasks through experience, making it a powerful tool in the realm of artificial intelligence.

In [42]:
query = "What is a CNN?"
result = q_and_a_rag_chain.invoke(query)
display(Markdown(result.content))

A CNN, or Convolutional Neural Network, is a type of artificial neural network primarily used for image-driven pattern recognition tasks. CNNs are designed to process data with a grid-like topology, such as images, and they consist of three main types of layers: convolutional layers, pooling layers, and fully-connected layers.

### Key Features of CNNs:

1. **Architecture**: 
   - CNNs are structured to handle the spatial dimensionality of input data, which includes height, width, and depth (for color channels in images).
   - The architecture typically involves stacking multiple convolutional layers followed by pooling layers, which helps in reducing the dimensionality of the data while retaining important features.

2. **Convolutional Layers**:
   - These layers apply a convolution operation to the input, which involves sliding a filter (or kernel) over the input data to produce feature maps. Each neuron in a convolutional layer is connected to a small region of the input, allowing the network to learn spatial hierarchies of features.

3. **Pooling Layers**:
   - Pooling layers are used to down-sample the feature maps, reducing their dimensionality and helping to make the representation more manageable. This also aids in making the model invariant to small translations in the input.

4. **Fully-Connected Layers**:
   - After several convolutional and pooling layers, the high-level reasoning in the neural network is done through fully-connected layers, where every neuron is connected to every neuron in the previous layer.

5. **Activation Functions**:
   - CNNs often use activation functions like the Rectified Linear Unit (ReLU) to introduce non-linearity into the model, which helps in learning complex patterns.

6. **Efficiency**:
   - CNNs are designed to be computationally efficient, especially when dealing with large images, by reducing the number of parameters through shared weights in convolutional layers.

7. **Applications**:
   - CNNs are widely used in various applications, including image classification, object detection, and image segmentation, due to their ability to automatically learn and extract features from images.

In summary, CNNs are a powerful class of neural networks that excel in tasks involving image data, leveraging their unique architecture to efficiently learn from and process visual information.

In [44]:
query = "What is NLP and its relation to linguistics?"
result = q_and_a_rag_chain.invoke(query)
display(Markdown(result.content))

Natural Language Processing (NLP) is a field within Artificial Intelligence that focuses on enabling computers to automatically understand and generate human languages. The term "Natural Language" specifically refers to human languages, distinguishing them from programming languages. The overarching goal of NLP is to facilitate seamless interaction between computers and humans through language, allowing for tasks such as automatic translation, sentiment analysis, and conversational agents.

NLP is closely related to linguistics, which is the scientific study of language and its structure. Linguistics provides the foundational theories and frameworks that inform NLP techniques, as understanding the nuances of human language—such as syntax, semantics, and pragmatics—is essential for developing effective NLP applications. By leveraging insights from linguistics, NLP aims to create systems that can comprehend and produce language in a way that is meaningful and contextually appropriate.

In [47]:
# Try a query whose answer is not present in the documents
query = "What is LangChain?"
result = q_and_a_rag_chain.invoke(query)
display(Markdown(result.content))

I don't know.

## Conclusion

In this notebook, we built a simple Retrieval-Augmented Generation (RAG) system using LangChain and OpenAI models. We demonstrated how to:

- Load and process both JSON and PDF documents.
- Chunk documents for efficient retrieval.
- Generate embeddings using OpenAI's embedding models.
- Store and retrieve document vectors using Chroma as a vector database.
- Retrieve relevant context based on semantic similarity.
- Construct a RAG pipeline that combines retrieval and generation to answer user queries using retrieved context.

This workflow enables question answering over custom document collections, with responses grounded in the provided data. You can extend this system by experimenting with different chunking strategies, embedding models, or additional data sources.