# Build a Simple RAG System

## Install OpenAI and LangChain dependencies

In [1]:
!pip install langchain==0.3.10
!pip install langchain-openai==0.2.12
!pip install langchain-community==0.3.11
!pip install jq==1.8.0
!pip install pymupdf==1.25.1


## Install Chroma Vector DB and LangChain wrapper

In [2]:
!pip install langchain-chroma==0.1.4


## Enter OpenAI API Key

In [3]:
from getpass import getpass

OPENAI_KEY = getpass('Enter Open AI API Key: ')

Enter Open AI API Key: ··········


## Set Up Environment Variables

In [4]:
import os

os.environ['OPENAI_API_KEY'] = OPENAI_KEY
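`langchain_openai` classes such as `OpenAIEmbeddings` and `ChatOpenAI` read the key from the `OPENAI_API_KEY` environment variable, which is why it is set above. If you rerun the notebook in a session where the variable is already set, a small helper can skip the prompt. A minimal sketch; `ensure_api_key` is a hypothetical name, not part of any library:

```python
import os
from getpass import getpass

def ensure_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the key from the environment, prompting only if it is not already set."""
    key = os.environ.get(env_var)
    if not key:
        # getpass hides the typed key so it does not end up in the notebook output
        key = getpass(f"Enter {env_var}: ")
        os.environ[env_var] = key
    return key
```

Calling `ensure_api_key()` at the top of the notebook prompts on the first run and silently reuses the stored value afterwards.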

### OpenAI Embedding Models

LangChain gives us access to OpenAI's embedding models, including the newer `text-embedding-3-small` model (smaller and highly efficient) and `text-embedding-3-large` model (larger and more powerful).

In [5]:
from langchain_openai import OpenAIEmbeddings

# details here: https://openai.com/blog/new-embedding-models-and-api-updates
openai_embed_model = OpenAIEmbeddings(model='text-embedding-3-small')
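An embedding model maps a piece of text to a fixed-length vector of floats; `text-embedding-3-small` returns 1536-dimensional vectors by default (`text-embedding-3-large` returns 3072). With the real model you would call `openai_embed_model.embed_query(text)`. To show the shape of the idea without an API call, here is a hypothetical toy `toy_embed` that hashes words into buckets and scales the result to unit length. It is only a stand-in: it captures word overlap, not meaning, and is not how OpenAI's models work.

```python
import hashlib
import math

def toy_embed(text: str, dim: int = 16) -> list[float]:
    """Hypothetical stand-in for an embedding model: hashed bag of words, L2-normalized."""
    vec = [0.0] * dim
    for token in text.lower().split():
        # md5 is stable across Python processes (unlike the built-in hash())
        bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    # scale to unit length; an empty text stays the all-zero vector
    return [v / norm for v in vec] if norm else vec

vector = toy_embed("what is machine learning?")
print(len(vector))  # one float per dimension
```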

## Loading and Processing the Data

### Get the dataset

In [None]:
# If the download below fails, download the file manually from
# https://drive.google.com/file/d/1aZxZejfteVuofISodUrY2CDoyuPLYDGZ
# and upload it to Colab.
!gdown 1aZxZejfteVuofISodUrY2CDoyuPLYDGZ

Downloading...
From: https://drive.google.com/uc?id=1aZxZejfteVuofISodUrY2CDoyuPLYDGZ
To: /content/rag_docs.zip
100% 5.92M/5.92M [00:00<00:00, 24.4MB/s]


In [None]:
!unzip rag_docs.zip

Archive:  rag_docs.zip
   creating: rag_docs/
  inflating: rag_docs/attention_paper.pdf  
  inflating: rag_docs/cnn_paper.pdf  
  inflating: rag_docs/resnet_paper.pdf  
  inflating: rag_docs/vision_transformer.pdf  
  inflating: rag_docs/wikidata_rag_demo.jsonl  
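`!unzip` relies on the shell tool that Colab provides. Outside Colab, or on a system without `unzip`, the standard-library `zipfile` module does the same job. A minimal sketch with a hypothetical helper name:

```python
import zipfile

def extract_zip(zip_path: str, dest_dir: str = ".") -> list[str]:
    """Extract every member of zip_path into dest_dir and return the member names."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_dir)  # creates any missing sub-directories
        return zf.namelist()

# usage, assuming rag_docs.zip is in the current working directory:
# extract_zip("rag_docs.zip")
```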


### Load and Process JSON Documents

In [None]:
from langchain_community.document_loaders import JSONLoader

loader = JSONLoader(file_path='./rag_docs/wikidata_rag_demo.jsonl',
                    jq_schema='.',
                    text_content=False,
                    json_lines=True)
wiki_docs = loader.load()
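Judging by the `wiki_docs[3]` output below, `JSONLoader` with `jq_schema='.'`, `text_content=False` and `json_lines=True` turns each line of the JSONL file into one document whose `page_content` is the whole JSON record serialized back to a string, with a 1-based `seq_num` in the metadata. A rough standard-library approximation makes that behaviour explicit. It is a hypothetical helper that returns plain dicts instead of `Document` objects, and unlike the real loader (which stores the absolute path as `source`) it keeps the path as given:

```python
import json

def load_jsonl_records(path: str) -> list[dict]:
    """Rough approximation of JSONLoader(jq_schema='.', text_content=False, json_lines=True)."""
    docs = []
    seq_num = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            seq_num += 1
            record = json.loads(line)
            docs.append({
                # the whole record is re-serialized as a JSON string
                "page_content": json.dumps(record),
                "metadata": {"source": path, "seq_num": seq_num},
            })
    return docs
```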

In [None]:
len(wiki_docs)

1801

In [None]:
wiki_docs[3]

Document(metadata={'source': '/content/rag_docs/wikidata_rag_demo.jsonl', 'seq_num': 4}, page_content='{"id": "71548", "title": "Chi-square distribution", "paragraphs": ["In probability theory and statistics, the chi-square distribution (also chi-squared or formula_1\\u00a0 distribution) is one of the most widely used theoretical probability distributions. Chi-square distribution with formula_2 degrees of freedom is written as formula_3. It is a special case of gamma distribution.", "Chi-square distribution is primarily used in statistical significance tests and confidence intervals. It is useful, because it is relatively easy to show that certain probability distributions come close to it, under certain conditions. One of these conditions is that the null hypothesis must be true. Another one is that the different random variables (or observations) must be independent of each other."]}')

In [None]:
import json
from langchain_core.documents import Document

wiki_docs_processed = []

for doc in wiki_docs:
    # each page_content holds the raw JSON string of one JSONL record
    record = json.loads(doc.page_content)
    metadata = {
        "title": record['title'],
        "id": record['id'],
        "source": "Wikipedia"
    }
    # merge all paragraphs of the article into a single text
    data = ' '.join(record['paragraphs'])
    wiki_docs_processed.append(Document(page_content=data, metadata=metadata))

In [None]:
wiki_docs_processed[3]

Document(metadata={'title': 'Chi-square distribution', 'id': '71548', 'source': 'Wikipedia'}, page_content='In probability theory and statistics, the chi-square distribution (also chi-squared or formula_1\xa0 distribution) is one of the most widely used theoretical probability distributions. Chi-square distribution with formula_2 degrees of freedom is written as formula_3. It is a special case of gamma distribution. Chi-square distribution is primarily used in statistical significance tests and confidence intervals. It is useful, because it is relatively easy to show that certain probability distributions come close to it, under certain conditions. One of these conditions is that the null hypothesis must be true. Another one is that the different random variables (or observations) must be independent of each other.')

### Load and Process PDF documents

#### Create Simple Document Chunks for Standard Retrieval

Here we use simple chunking: each chunk holds at most 3500 characters, and consecutive chunks overlap by 200 characters so that text cut at a chunk boundary still appears with some of its surrounding context (you can and should experiment with different chunk sizes and overlaps).
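To make the two parameters concrete, here is a simplified fixed-window splitter (a hypothetical helper, not LangChain's algorithm). `RecursiveCharacterTextSplitter` instead tries to break on separators such as blank lines, newlines and spaces first and then merges the pieces up to `chunk_size`, so its chunk boundaries will differ, but `chunk_size` and `chunk_overlap` play the same role:

```python
def chunk_text(text: str, chunk_size: int = 3500, chunk_overlap: int = 200) -> list[str]:
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters, so text near a boundary
    appears in both neighbouring chunks.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks

print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
```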

In [None]:
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

def create_simple_chunks(file_path, chunk_size=3500, chunk_overlap=0):

    print('Loading pages:', file_path)
    loader = PyMuPDFLoader(file_path)
    doc_pages = loader.load()

    print('Chunking pages:', file_path)
    splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size,
                                              chunk_overlap=chunk_overlap)
    doc_chunks = splitter.split_documents(doc_pages)
    print('Finished processing:', file_path)
    print()
    return doc_chunks

In [None]:
from glob import glob

pdf_files = glob('./rag_docs/*.pdf')
pdf_files

['./rag_docs/resnet_paper.pdf',
 './rag_docs/vision_transformer.pdf',
 './rag_docs/cnn_paper.pdf',
 './rag_docs/attention_paper.pdf']

In [None]:
paper_docs = []
for fp in pdf_files:
    paper_docs.extend(create_simple_chunks(file_path=fp,
                                           chunk_size=3500,
                                           chunk_overlap=200))

Loading pages: ./rag_docs/resnet_paper.pdf
Chunking pages: ./rag_docs/resnet_paper.pdf
Finished processing: ./rag_docs/resnet_paper.pdf

Loading pages: ./rag_docs/vision_transformer.pdf
Chunking pages: ./rag_docs/vision_transformer.pdf
Finished processing: ./rag_docs/vision_transformer.pdf

Loading pages: ./rag_docs/cnn_paper.pdf
Chunking pages: ./rag_docs/cnn_paper.pdf
Finished processing: ./rag_docs/cnn_paper.pdf

Loading pages: ./rag_docs/attention_paper.pdf
Chunking pages: ./rag_docs/attention_paper.pdf
Finished processing: ./rag_docs/attention_paper.pdf



In [None]:
len(paper_docs)

80

In [None]:
paper_docs[0]

Document(metadata={'source': './rag_docs/resnet_paper.pdf', 'file_path': './rag_docs/resnet_paper.pdf', 'page': 0, 'total_pages': 12, 'format': 'PDF 1.5', 'title': '', 'author': '', 'subject': '', 'keywords': '', 'creator': 'LaTeX with hyperref package', 'producer': 'pdfTeX-1.40.12', 'creationDate': 'D:20151211011345Z', 'modDate': 'D:20151211011345Z', 'trapped': ''}, page_content='Deep Residual Learning for Image Recognition\nKaiming He\nXiangyu Zhang\nShaoqing Ren\nJian Sun\nMicrosoft Research\n{kahe, v-xiangz, v-shren, jiansun}@microsoft.com\nAbstract\nDeeper neural networks are more difﬁcult to train. We\npresent a residual learning framework to ease the training\nof networks that are substantially deeper than those used\npreviously. We explicitly reformulate the layers as learn-\ning residual functions with reference to the layer inputs, in-\nstead of learning unreferenced functions. We provide com-\nprehensive empirical evidence showing that these residual\nnetworks are easier to 

### Combine all document chunks in one list

In [None]:
len(wiki_docs_processed)

1801

In [None]:
total_docs = wiki_docs_processed + paper_docs
len(total_docs)

1881

## Index Document Chunks and Embeddings in Vector DB

Here we embed all the document chunks and index them in a Chroma vector DB. Since we also want to save the database to disk, we pass `persist_directory`, the directory where Chroma should store its data.

In [None]:
from langchain_chroma import Chroma

# create vector DB of docs and embeddings - takes < 30s on Colab
chroma_db = Chroma.from_documents(documents=total_docs,
                                  collection_name='my_db',
                                  embedding=openai_embed_model,
                                  # set the distance function to cosine; otherwise Chroma defaults to squared L2 (Euclidean) distance
                                  # see https://docs.trychroma.com/guides#changing-the-distance-function
                                  collection_metadata={"hnsw:space": "cosine"},
                                  persist_directory="./my_db")
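Why the distance choice matters less than it might seem: Chroma's default `l2` space is squared Euclidean distance, while the `cosine` space reports the distance `1 - cos(a, b)`. For unit-length vectors, `||a - b||^2 = 2 - 2 * cos(a, b)`, so exact distances in either space rank documents identically. OpenAI states that its embeddings are normalized to length 1, but setting `cosine` explicitly keeps the scores easy to interpret. A quick numeric check on two toy vectors (assumed values, not real embeddings):

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

# two toy vectors scaled to unit length (stand-ins for real embeddings)
a = normalize([1.0, 2.0, 3.0])
b = normalize([2.0, 0.5, 1.0])

cos_sim = cosine_similarity(a, b)
print(squared_l2(a, b), 2 - 2 * cos_sim)  # equal up to floating-point rounding
print(1 - cos_sim)  # the cosine distance reported in the "cosine" space
```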

### Load Vector DB from disk

Once the vector database has been persisted to disk, you can load it and reconnect to it at any time.

In [None]:
# load from disk
chroma_db = Chroma(persist_directory="./my_db",
                   collection_name='my_db',
                   embedding_function=openai_embed_model)

In [None]:
chroma_db

<langchain_chroma.vectorstores.Chroma at 0x78f688238100>

### Semantic Similarity-Based Retrieval

We use cosine similarity (the distance function configured for the collection above) to retrieve the 5 documents most similar to the user's query.
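Conceptually, similarity retrieval scores every stored vector against the query vector and keeps the best `k`. Chroma does this with an approximate HNSW index rather than a full scan, but an exhaustive version in plain Python (a hypothetical helper on toy 2-D vectors) shows the logic:

```python
import math

def top_k_by_cosine(query_vec, doc_vecs, k=5):
    """Return (index, similarity) pairs for the k stored vectors most similar to the query."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0  # treat zero vectors as dissimilar

    scored = [(i, cos(query_vec, v)) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest similarity first
    return scored[:k]

doc_vectors = [[1.0, 0.0], [0.7, 0.7], [0.0, 1.0]]
print(top_k_by_cosine([1.0, 0.1], doc_vectors, k=2))
```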

In [None]:
similarity_retriever = chroma_db.as_retriever(search_type="similarity",
                                              search_kwargs={"k": 5})

In [None]:
from IPython.display import display, Markdown

def display_docs(docs):
    for doc in docs:
        print('Metadata:', doc.metadata)
        print('Content Brief:')
        display(Markdown(doc.page_content[:1000]))
        print()

In [None]:
query = "what is machine learning?"
top_docs = similarity_retriever.invoke(query)
display_docs(top_docs)

Metadata: {'id': '564928', 'source': 'Wikipedia', 'title': 'Machine learning'}
Content Brief:


Machine learning gives computers the ability to learn without being explicitly programmed (Arthur Samuel, 1959). It is a subfield of computer science. The idea came from work in artificial intelligence. Machine learning explores the study and construction of algorithms which can learn and make predictions on data. Such algorithms follow programmed instructions, but can also make predictions or decisions based on data. They build a model from sample inputs. Machine learning is done where designing and programming explicit algorithms cannot be done. Examples include spam filtering, detection of network intruders or malicious insiders working towards a data breach, optical character recognition (OCR), search engines and computer vision.


Metadata: {'id': '359370', 'source': 'Wikipedia', 'title': 'Supervised learning'}
Content Brief:


In machine learning, supervised learning is the task of inferring a function from labelled training data. The results of the training are known beforehand, the system simply learns how to get to these results correctly. Usually, such systems work with vectors. They get the training data and the result of the training as two vectors and produce a "classifier". Usually, the system uses inductive reasoning to generalize the training data.


Metadata: {'id': '663523', 'source': 'Wikipedia', 'title': 'Deep learning'}
Content Brief:


Deep learning (also called deep structured learning or hierarchical learning) is a kind of machine learning, which is mostly used with certain kinds of neural networks. As with other kinds of machine-learning, learning sessions can be unsupervised, semi-supervised, or supervised. In many cases, structures are organised so that there is at least one intermediate layer (or hidden layer), between the input layer and the output layer. Certain tasks, such as as recognizing and understanding speech, images or handwriting, is easy to do for humans. However, for a computer, these tasks are very difficult to do. In a multi-layer neural network (having more than two layers), the information processed will become more abstract with each added layer. Deep learning models are inspired by information processing and communication patterns in biological nervous systems; they are different from the structural and functional properties of biological brains (especially the human brain) in many ways, whic


Metadata: {'id': '6360', 'source': 'Wikipedia', 'title': 'Artificial intelligence'}
Content Brief:


Artificial intelligence (AI) is the ability of a computer program or a machine to think and learn. It is also a field of study which tries to make computers "smart". They work on their own without being encoded with commands. John McCarthy came up with the name "Artificial Intelligence" in 1955. In general use, the term "artificial intelligence" means a programme which mimics human cognition. At least some of the things we associate with other minds, such as learning and problem solving can be done by computers, though not in the same way as we do. Andreas Kaplan and Michael Haenlein define AI as a system’s ability to correctly interpret external data, to learn from such data, and to use those learnings to achieve specific goals and tasks through flexible adaptation. An ideal (perfect) intelligent machine is a flexible agent which perceives its environment and takes actions to maximize its chance of success at some goal or objective. As machines become increasingly capable, mental facu


Metadata: {'id': '44742', 'source': 'Wikipedia', 'title': 'Artificial neural network'}
Content Brief:


A neural network (also called an ANN or an artificial neural network) is a sort of computer software, inspired by biological neurons. Biological brains are capable of solving difficult problems, but each neuron is only responsible for solving a very small part of the problem. Similarly, a neural network is made up of cells that work together to produce a desired result, although each individual cell is only responsible for solving a small part of the problem. This is one method for creating artificially intelligent programs. Neural networks are an example of machine learning, where a program can change as it learns to solve a problem. A neural network can be trained and improved with each example, but the larger the neural network, the more examples it needs to perform well—often needing millions or billions of examples in the case of deep learning. There are two ways to think of a neural network. First is like a human brain. Second is like a mathematical equation.




In [None]:
query = "what is the difference between transformers and vision transformers?"
top_docs = similarity_retriever.invoke(query)
display_docs(top_docs)

Metadata: {'author': '', 'creationDate': 'D:20210604001958Z', 'creator': 'LaTeX with hyperref', 'file_path': './rag_docs/vision_transformer.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20210604001958Z', 'page': 7, 'producer': 'pdfTeX-1.40.21', 'source': './rag_docs/vision_transformer.pdf', 'subject': '', 'title': '', 'total_pages': 22, 'trapped': ''}
Content Brief:


Published as a conference paper at ICLR 2021
4.4
SCALING STUDY
We perform a controlled scaling study of different models by evaluating transfer performance from
JFT-300M. In this setting data size does not bottleneck the models’ performances, and we assess
performance versus pre-training cost of each model. The model set includes: 7 ResNets, R50x1,
R50x2 R101x1, R152x1, R152x2, pre-trained for 7 epochs, plus R152x2 and R200x3 pre-trained
for 14 epochs; 6 Vision Transformers, ViT-B/32, B/16, L/32, L/16, pre-trained for 7 epochs, plus
L/16 and H/14 pre-trained for 14 epochs; and 5 hybrids, R50+ViT-B/32, B/16, L/32, L/16 pre-
trained for 7 epochs, plus R50+ViT-L/16 pre-trained for 14 epochs (for hybrids, the number at the
end of the model name stands not for the patch size, but for the total dowsampling ratio in the ResNet
backbone).
Figure 5 contains the transfer performance versus total pre-training compute (see Appendix D.5
for details on computational costs). Detailed results per mode


Metadata: {'author': '', 'creationDate': 'D:20210604001958Z', 'creator': 'LaTeX with hyperref', 'file_path': './rag_docs/vision_transformer.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20210604001958Z', 'page': 0, 'producer': 'pdfTeX-1.40.21', 'source': './rag_docs/vision_transformer.pdf', 'subject': '', 'title': '', 'total_pages': 22, 'trapped': ''}
Content Brief:


Published as a conference paper at ICLR 2021
AN IMAGE IS WORTH 16X16 WORDS:
TRANSFORMERS FOR IMAGE RECOGNITION AT SCALE
Alexey Dosovitskiy∗,†, Lucas Beyer∗, Alexander Kolesnikov∗, Dirk Weissenborn∗,
Xiaohua Zhai∗, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer,
Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby∗,†
∗equal technical contribution, †equal advising
Google Research, Brain Team
{adosovitskiy, neilhoulsby}@google.com
ABSTRACT
While the Transformer architecture has become the de-facto standard for natural
language processing tasks, its applications to computer vision remain limited. In
vision, attention is either applied in conjunction with convolutional networks, or
used to replace certain components of convolutional networks while keeping their
overall structure in place. We show that this reliance on CNNs is not necessary
and a pure transformer applied directly to sequences of image patches can perform
very well on image classiﬁcation tasks. When pre-traine


Metadata: {'author': '', 'creationDate': 'D:20210604001958Z', 'creator': 'LaTeX with hyperref', 'file_path': './rag_docs/vision_transformer.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20210604001958Z', 'page': 0, 'producer': 'pdfTeX-1.40.21', 'source': './rag_docs/vision_transformer.pdf', 'subject': '', 'title': '', 'total_pages': 22, 'trapped': ''}
Content Brief:


1 Fine-tuning code and pre-trained models are available at https://github.com/google-research/vision_transformer
1
arXiv:2010.11929v2  [cs.CV]  3 Jun 2021


Metadata: {'author': '', 'creationDate': 'D:20210604001958Z', 'creator': 'LaTeX with hyperref', 'file_path': './rag_docs/vision_transformer.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20210604001958Z', 'page': 1, 'producer': 'pdfTeX-1.40.21', 'source': './rag_docs/vision_transformer.pdf', 'subject': '', 'title': '', 'total_pages': 22, 'trapped': ''}
Content Brief:


Published as a conference paper at ICLR 2021
inherent to CNNs, such as translation equivariance and locality, and therefore do not generalize well
when trained on insufﬁcient amounts of data.
However, the picture changes if the models are trained on larger datasets (14M-300M images). We
ﬁnd that large scale training trumps inductive bias. Our Vision Transformer (ViT) attains excellent
results when pre-trained at sufﬁcient scale and transferred to tasks with fewer datapoints. When
pre-trained on the public ImageNet-21k dataset or the in-house JFT-300M dataset, ViT approaches
or beats state of the art on multiple image recognition benchmarks. In particular, the best model
reaches the accuracy of 88.55% on ImageNet, 90.72% on ImageNet-ReaL, 94.55% on CIFAR-100,
and 77.63% on the VTAB suite of 19 tasks.
2
RELATED WORK
Transformers were proposed by Vaswani et al. (2017) for machine translation, and have since be-
come the state of the art method in many NLP tasks. Large Transformer-based mo


Metadata: {'author': '', 'creationDate': 'D:20210604001958Z', 'creator': 'LaTeX with hyperref', 'file_path': './rag_docs/vision_transformer.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20210604001958Z', 'page': 4, 'producer': 'pdfTeX-1.40.21', 'source': './rag_docs/vision_transformer.pdf', 'subject': '', 'title': '', 'total_pages': 22, 'trapped': ''}
Content Brief:


Published as a conference paper at ICLR 2021
| Model | Layers | Hidden size D | MLP size | Heads | Params |
|---|---|---|---|---|---|
| ViT-Base | 12 | 768 | 3072 | 12 | 86M |
| ViT-Large | 24 | 1024 | 4096 | 16 | 307M |
| ViT-Huge | 32 | 1280 | 5120 | 16 | 632M |

Table 1: Details of Vision Transformer model variants.
We also evaluate on the 19-task VTAB classiﬁcation suite (Zhai et al., 2019b). VTAB evaluates
low-data transfer to diverse tasks, using 1 000 training examples per task. The tasks are divided into
three groups: Natural – tasks like the above, Pets, CIFAR, etc. Specialized – medical and satellite
imagery, and Structured – tasks that require geometric understanding like localization.
Model Variants. We base ViT conﬁgurations on those used for BERT (Devlin et al., 2019), as
summarized in Table 1. The “Base” and “Large” models are directly adopted from BERT and we
add the larger “Huge” model. In what follows we use brief notation to indicate the model size and
the input patch size: for instance, ViT-L/16 means the “Large” variant with 16×16 input patch siz




## Build the RAG Pipeline

In [None]:
from langchain_core.prompts import ChatPromptTemplate

rag_prompt = """You are an assistant who is an expert in question-answering tasks.
Answer the following question using only the following pieces of retrieved context.
If the answer is not in the context, do not make up answers, just say that you don't know.
Keep the answer detailed and well formatted based on the information from the context.

Question:
{question}

Context:
{context}

Answer:
"""

rag_prompt_template = ChatPromptTemplate.from_template(rag_prompt)

In [None]:
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI

chatgpt = ChatOpenAI(model_name="gpt-4o-mini", temperature=0)

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

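qa_rag_chain = (
    {
        # retrieve the top-k chunks for the question and join them into one string
        "context": similarity_retriever | format_docs,
        # pass the user's question through unchanged
        "question": RunnablePassthrough()
    }
    | rag_prompt_template
    | chatgpt
)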
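The dict at the start of the chain runs both branches on the same input: the question goes through the retriever and `format_docs` to become `context`, and `RunnablePassthrough` forwards it unchanged as `question`; the filled prompt then goes to the chat model. A plain-Python sketch of that data flow, with hypothetical stub functions in place of the retriever and `ChatOpenAI`, and plain strings instead of `Document` objects:

```python
def join_chunks(chunks):
    """Join retrieved chunk texts with blank lines, mirroring format_docs."""
    return "\n\n".join(chunks)

def run_rag(question, retrieve, template, llm):
    """Sketch of the chain: retrieve -> format -> fill prompt -> call the LLM."""
    context = join_chunks(retrieve(question))  # the "context" branch
    prompt = template.format(question=question, context=context)  # question passed through as-is
    return llm(prompt)

# hypothetical stubs standing in for the retriever and the chat model
template = "Question:\n{question}\n\nContext:\n{context}\n\nAnswer:"
fake_retrieve = lambda q: ["ML lets computers learn from data.", "DL is a subset of ML."]
echo_llm = lambda prompt: prompt  # returns the prompt unchanged so it can be inspected

print(run_rag("What is ML?", fake_retrieve, template, echo_llm))
```

Echoing the prompt back like this is a cheap way to inspect exactly what the model would receive before spending API calls.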

In [None]:
from IPython.display import display, Markdown

query = "What is machine learning?"
result = qa_rag_chain.invoke(query)
display(Markdown(result.content))

Machine learning is a subfield of computer science that provides computers with the ability to learn without being explicitly programmed. The concept was introduced by Arthur Samuel in 1959 and is rooted in artificial intelligence. Machine learning focuses on the study and construction of algorithms that can learn from data and make predictions or decisions based on that data. These algorithms follow programmed instructions but can also adapt and improve their performance by building models from sample inputs.

Machine learning is particularly useful in scenarios where designing and programming explicit algorithms is impractical. Common applications include spam filtering, detecting network intruders or malicious insiders, optical character recognition (OCR), search engines, and computer vision.

Within machine learning, there are different types of learning approaches, such as supervised learning, where a function is inferred from labeled training data. In this case, the system learns to produce correct results based on known outcomes, typically using vectors for training data and results to create a "classifier." Inductive reasoning is often employed to generalize from the training data.

Additionally, deep learning is a specialized area of machine learning that utilizes neural networks, particularly those with multiple layers (known as multi-layer neural networks). Deep learning is effective for complex tasks like speech recognition, image understanding, and handwriting recognition, which are challenging for computers but relatively easy for humans. These models are inspired by the information processing patterns of biological nervous systems, although they differ significantly from the structural and functional properties of human brains.

In summary, machine learning enables computers to learn from data, adapt their behavior, and make predictions, playing a crucial role in the development of intelligent systems.

In [None]:
query = "What is a CNN?"
result = qa_rag_chain.invoke(query)
display(Markdown(result.content))

A CNN, or Convolutional Neural Network, is a type of artificial neural network primarily used for image-driven pattern recognition tasks. CNNs are designed to process data with a grid-like topology, such as images, and they consist of three main types of layers: convolutional layers, pooling layers, and fully-connected layers.

### Key Features of CNNs:

1. **Architecture**:
   - CNNs are structured to handle the spatial dimensionality of input data, which includes height, width, and depth (for color channels in images).
   - The architecture typically involves stacking multiple convolutional layers followed by pooling layers, which helps in reducing the dimensionality of the data while retaining important features.

2. **Convolutional Layers**:
   - These layers apply a convolution operation to the input, which involves sliding a filter (or kernel) over the input image to produce feature maps. Each neuron in a convolutional layer is connected to a small region of the input, allowing the network to learn spatial hierarchies of features.

3. **Pooling Layers**:
   - Pooling layers are used to down-sample the feature maps, reducing their dimensionality and helping to make the representation more manageable. This also aids in making the model invariant to small translations in the input.

4. **Fully-Connected Layers**:
   - After several convolutional and pooling layers, the high-level reasoning in the neural network is done through fully-connected layers, where every neuron is connected to every neuron in the previous layer.

5. **Activation Functions**:
   - CNNs commonly use activation functions like the Rectified Linear Unit (ReLU) to introduce non-linearity into the model, which helps in learning complex patterns.

6. **Efficiency**:
   - CNNs are particularly efficient for image processing tasks because they reduce the number of parameters compared to traditional fully connected networks, making them less prone to overfitting and more computationally efficient.

7. **Applications**:
   - CNNs are widely used in various applications, including image classification, object detection, and image segmentation, due to their ability to automatically learn and extract features from images.

In summary, CNNs are a powerful class of neural networks specifically designed for processing and analyzing visual data, leveraging their unique architecture to effectively learn from images.

In [None]:
query = "How is a resnet better than a CNN?"
result = qa_rag_chain.invoke(query)
display(Markdown(result.content))

A ResNet (Residual Network) is considered better than a traditional CNN (Convolutional Neural Network) for several reasons, primarily related to its architecture and training efficiency:

1. **Overcoming Optimization Difficulties**: Traditional deep CNNs often face optimization challenges as their depth increases, leading to higher training errors. In contrast, ResNets utilize shortcut connections that allow gradients to flow more easily during backpropagation. This helps mitigate issues like vanishing gradients, enabling deeper networks to be trained effectively.

2. **Improved Accuracy with Depth**: ResNets can achieve better performance as they increase in depth. For instance, a 34-layer ResNet outperforms an 18-layer ResNet by 2.8% in top-1 error on the ImageNet validation set. This is significant because traditional networks often experience degradation in performance as they become deeper, but ResNets do not exhibit this degradation problem.

3. **Lower Training Error**: ResNets demonstrate considerably lower training errors compared to their plain counterparts. For example, the 34-layer ResNet shows a marked improvement in training error over a 34-layer plain network, indicating that ResNets are more effective at learning from the training data.

4. **Parameter Efficiency**: Despite being deeper, ResNets do not require additional parameters compared to traditional networks. This efficiency allows them to maintain lower complexity while achieving higher accuracy. For instance, a 152-layer ResNet has fewer parameters than VGG-16/19 networks, yet it achieves better performance.

5. **Generalization**: ResNets have shown to generalize better across various tasks, including object detection and localization. The architecture's ability to learn robust features contributes to its success in competitions, such as winning first place in multiple tracks of the ILSVRC & COCO 2015 competitions.

In summary, ResNets improve upon traditional CNNs by effectively addressing optimization challenges, allowing for deeper architectures without performance degradation, achieving lower training errors, maintaining parameter efficiency, and demonstrating superior generalization capabilities.

In [None]:
query = "What is NLP and its relation to linguistics?"
result = qa_rag_chain.invoke(query)
display(Markdown(result.content))

Natural Language Processing (NLP) is a field within Artificial Intelligence that focuses on enabling computers to automatically understand and generate human languages. The term "Natural Language" specifically refers to human languages, distinguishing them from programming languages. The overarching goal of NLP is to facilitate seamless interaction between humans and computers through language.

NLP is closely related to linguistics, which is the scientific study of language and its structure. Linguistics provides the foundational theories and frameworks that inform the development of NLP technologies. By leveraging insights from linguistics, NLP aims to decode the complexities of human language, including syntax, semantics, and pragmatics, to improve the accuracy and effectiveness of language processing tasks.

In summary, NLP is a crucial intersection of Artificial Intelligence and linguistics, aimed at programming computers to understand and communicate in human languages.

In [None]:
query = "What is the difference between AI, ML and DL?"
result = qa_rag_chain.invoke(query)
display(Markdown(result.content))

The difference between AI, ML, and DL can be summarized as follows:

### Artificial Intelligence (AI)
- **Definition**: AI refers to the ability of a computer program or machine to think and learn, mimicking human cognition. It encompasses a broad range of technologies and applications that enable machines to perform tasks that typically require human intelligence.
- **Scope**: AI is a field of study aimed at creating systems that can interpret external data, learn from it, and adapt to achieve specific goals. It includes various subfields, one of which is machine learning.
- **Examples**: AI applications can range from simple rule-based systems to complex algorithms that can learn and adapt over time.

### Machine Learning (ML)
- **Definition**: ML is a subfield of AI that focuses on the study and construction of algorithms that allow computers to learn from and make predictions based on data without being explicitly programmed.
- **Functionality**: ML algorithms build models from sample inputs and can make predictions or decisions based on new data. It is particularly useful in scenarios where traditional programming is impractical.
- **Examples**: Applications of ML include spam filtering, network intrusion detection, optical character recognition (OCR), and computer vision.

### Deep Learning (DL)
- **Definition**: DL is a specialized subset of machine learning that primarily uses neural networks with multiple layers (deep neural networks) to process data.
- **Characteristics**: In deep learning, the information processed becomes more abstract with each added layer, allowing for complex tasks such as speech and image recognition. It is inspired by the information processing patterns of biological nervous systems.
- **Examples**: DL is often used in applications that require high levels of abstraction, such as recognizing speech, understanding images, and handwriting recognition.

In summary, AI is the overarching field that includes both ML and DL, with ML being a method of achieving AI through data-driven learning, and DL being a more advanced technique within ML that utilizes deep neural networks.
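The ML definition above says models are built from sample inputs rather than from hand-written rules. A classic perceptron shows this idea in a few lines. The sketch below learns the logical AND function from four labelled examples. It is not taken from the notebook's indexed documents: the function names, learning rate, and toy dataset are illustrative assumptions.

```python
def predict(w, b, x):
    """Step activation: 1 if w·x + b > 0, else 0."""
    activation = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if activation > 0 else 0


def train_perceptron(samples, labels, epochs=20, lr=1.0):
    """Learn weights and bias from labelled examples with the perceptron rule."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = y - predict(w, b, x)  # 0 if correct, +1 or -1 if wrong
            # Nudge the weights toward the correct label; nothing changes when error == 0
            w = [wi + lr * error * xi for wi, xi in zip(w, x)]
            b += lr * error
    return w, b


# Toy training data: the logical AND function
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 0, 0, 1]

w, b = train_perceptron(X, y)
print([predict(w, b, x) for x in X])  # [0, 0, 0, 1]
```

Nothing in the code states the AND rule explicitly. The decision boundary comes entirely from the examples, and the perceptron is guaranteed to converge here because AND is linearly separable.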

In [None]:
query = "What is the difference between transformers and vision transformers?"
result = qa_rag_chain.invoke(query)
display(Markdown(result.content))

The difference between transformers and vision transformers primarily lies in their application and input processing methods.

1. **Transformers**: Originally designed for natural language processing (NLP), transformers operate on sequences of tokens (words or subwords) and use self-attention to capture relationships between these tokens. Because each token attends to every other token, the cost grows quadratically with sequence length. This is manageable for typical text sequences but becomes prohibitive for high-dimensional data like images if every pixel is treated as a token.

2. **Vision Transformers (ViT)**: Vision transformers adapt the transformer architecture to image data by treating image patches as sequences of tokens. Instead of processing individual pixels, the model divides an image into smaller patches (e.g., 16x16 pixels), which are then flattened and linearly projected to a fixed embedding dimension. These embeddings are treated like tokens in NLP, allowing the model to apply self-attention across the entire image. This approach enables vision transformers to integrate information globally, even in the early layers of the model.
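The patch-tokenization step described above can be sketched in plain Python, assuming an image stored as nested lists of shape H x W x C. The helpers `patchify` and `linear_embed` are hypothetical names. The projection weights in the small demo are made-up values; in a real ViT they are learned parameters.

```python
def patchify(image, patch_size):
    """Split an H x W x C image (nested lists) into flattened, non-overlapping patches.

    Patches are returned in row-major order; each one is a flat list of
    patch_size * patch_size * C values.
    """
    height, width = len(image), len(image[0])
    assert height % patch_size == 0 and width % patch_size == 0, "image must tile evenly"
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            flat = []
            for row in range(top, top + patch_size):
                for col in range(left, left + patch_size):
                    flat.extend(image[row][col])  # append every channel of this pixel
            patches.append(flat)
    return patches


def linear_embed(patches, weight):
    """Project each flattened patch with a weight matrix of shape (patch_dim, d_model)."""
    d_model = len(weight[0])
    return [[sum(p[i] * weight[i][j] for i in range(len(p))) for j in range(d_model)]
            for p in patches]


# A 224 x 224 RGB image cut into 16 x 16 patches yields a sequence of 196 "tokens"
image = [[[0.0, 0.0, 0.0] for _ in range(224)] for _ in range(224)]
patches = patchify(image, 16)
print(len(patches), len(patches[0]))  # 196 768

# Tiny single-channel demo: 4 x 4 image, 2 x 2 patches, hypothetical 4 x 1 projection
tiny = [[[float(r * 4 + c)] for c in range(4)] for r in range(4)]
tiny_patches = patchify(tiny, 2)
print(linear_embed(tiny_patches, [[0.25], [0.25], [0.25], [0.25]]))  # [[2.5], [4.5], [10.5], [12.5]]
```

Each 16x16x3 patch flattens to 768 values before projection. The real ViT also prepends a learnable class token and adds position embeddings, and both are omitted here.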

Key distinctions include:
- **Input Representation**: Transformers work with sequences of words, while vision transformers work with sequences of image patches.
- **Attention Mechanism**: The self-attention mechanism is the same token-to-token attention in both. In a vision transformer the tokens are image patches, so every patch can attend to every other patch and relationships across the entire image are captured from the first layer onward.
- **Inductive Bias**: Traditional CNNs (Convolutional Neural Networks) have built-in inductive biases such as locality and translation equivariance, which are not inherently present in transformers. Vision transformers, however, can achieve competitive performance on image classification tasks when trained on large datasets, demonstrating that large-scale training can overcome the lack of these biases.
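The "translation equivariance" bias mentioned above means that shifting a CNN's input shifts its output by the same amount. The sketch below checks this for a 1D convolution, assuming a "valid" cross-correlation (the operation CNN layers compute) and a made-up edge-detector kernel. The helper names are illustrative.

```python
def conv1d(signal, kernel):
    """'Valid' 1D cross-correlation: slide the same kernel over every position."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def shift_right(seq, n):
    """Shift a sequence right by n positions, padding with zeros on the left."""
    return [0] * n + seq[:len(seq) - n]


signal = [0, 0, 1, 2, 3, 0, 0, 0, 0]
kernel = [1, -1]  # simple edge detector (illustrative values)

out = conv1d(signal, kernel)
out_of_shifted = conv1d(shift_right(signal, 2), kernel)

print(out)                          # [0, -1, -1, -1, 3, 0, 0, 0]
print(shift_right(out, 2) == out_of_shifted)  # True: shifting commutes with convolution
```

The equality holds because the kernel weights are shared across positions. This property is built into the architecture, whereas a vision transformer has to learn comparable behaviour from data.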

In summary, while both architectures leverage self-attention, vision transformers are specifically tailored for image data by processing image patches as sequences, allowing them to effectively handle visual tasks.

In [None]:
query = "How is self-attention important in transformers?"
result = qa_rag_chain.invoke(query)
display(Markdown(result.content))

Self-attention is crucial in transformers for several reasons:

1. **Global Information Integration**: Self-attention allows the model to integrate information across the entire input, such as an image, even in the lowest layers. This capability enables the model to attend to relevant regions of the input that are semantically important for tasks like classification. For instance, some attention heads can focus on most of the image early in the processing, demonstrating the model's ability to capture global context.

2. **Attention Mechanism**: The attention function in transformers maps a query and a set of key-value pairs to an output, where the output is computed as a weighted sum of the values based on the attention scores derived from the queries and keys. This mechanism allows the model to dynamically focus on different parts of the input based on their relevance to the task at hand.

3. **Layer-wise Attention Dynamics**: The attention distance, which refers to how far the model looks across the input to integrate information, tends to increase with the depth of the network. This means that as the data passes through more layers, the model can consider a broader context, enhancing its understanding and representation of the input.

4. **Comparison with CNNs**: Unlike convolutional neural networks (CNNs), which build in local receptive fields and translation equivariance, transformers using self-attention do not have these inherent biases. As a result, they tend to trail CNNs when trained on small or mid-sized datasets. With large-scale training, however, they learn to attend to relevant features regardless of their spatial arrangement and can match or outperform CNNs.

5. **Scalability and Efficiency**: Self-attention is quadratic in the number of tokens, so applying it naively to every pixel of an image would be prohibitively expensive. Vision transformers avoid this cost by attending over a much shorter sequence of image patches, which makes self-attention practical for high-dimensional data like images.
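Point 2 describes attention as a weighted sum of values, with weights derived from query-key scores. In the standard transformer this is scaled dot-product attention, softmax(QKᵀ / √d_k)V. Below is a minimal plain-Python sketch. As a simplifying assumption, Q, K, and V are all the raw input vectors; a real transformer computes them with separate learned projections and uses multiple heads.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """softmax(Q K^T / sqrt(d_k)) V, computed row by row on lists of vectors."""
    d_k = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Compatibility of this query with every key, scaled by sqrt(d_k)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Output is the attention-weighted sum of the value vectors
        outputs.append([sum(wj * v[c] for wj, v in zip(w, values))
                        for c in range(len(values[0]))])
    return outputs, weights


# Self-attention over a three-token sequence (illustrative 2-d embeddings)
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, attn = scaled_dot_product_attention(x, x, x)
for row in attn:
    print([round(a, 3) for a in row])
```

Each row of `attn` sums to 1, and every token attends to every other token. This all-pairs comparison is what gives self-attention its global view, and it is also the source of its quadratic cost.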

In summary, self-attention is a foundational component of transformers. It enables them to process complex inputs by integrating information globally and dynamically focusing on relevant features, and it lets them benefit from large-scale training.
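The quadratic cost behind point 5 is easy to quantify. The arithmetic below assumes a 224x224 image and 16x16 patches, as in the ViT example above. It counts one attention score per token pair for a single head and ignores ViT's extra class token.

```python
def attention_matrix_entries(num_tokens):
    """Self-attention scores every token against every token: n * n entries per head."""
    return num_tokens * num_tokens


height = width = 224
patch = 16

pixel_tokens = height * width                        # one token per pixel
patch_tokens = (height // patch) * (width // patch)  # one token per 16 x 16 patch

print(pixel_tokens, attention_matrix_entries(pixel_tokens))  # 50176 2517630976
print(patch_tokens, attention_matrix_entries(patch_tokens))  # 196 38416
```

Grouping pixels into patches shortens the sequence 256-fold and shrinks the attention matrix 65,536-fold (256²). This is why patch tokenization, rather than any change to self-attention itself, makes ViT tractable.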

In [None]:
query = "How does a resnet work?"
result = qa_rag_chain.invoke(query)
display(Markdown(result.content))

A ResNet, or Residual Network, operates on the principle of residual learning, which addresses the degradation problem often encountered in deep neural networks. Here’s a detailed explanation of how a ResNet works based on the provided context:

### Key Concepts of ResNet

1. **Residual Learning Framework**:
   - Instead of directly learning the desired underlying mapping \( H(x) \), ResNets learn a residual mapping \( F(x) = H(x) - x \). This reformulation allows the network to focus on learning the difference between the desired output and the input, which is often easier than learning the output directly.

2. **Shortcut Connections**:
   - ResNets incorporate shortcut connections that skip one or more layers. These connections perform identity mapping, meaning they pass the input \( x \) directly to the output of the stacked layers. The output of the stacked layers is then added to this input, resulting in the equation \( F(x) + x \).
   - Identity shortcuts add neither extra parameters nor computational complexity (beyond a negligible element-wise addition), allowing the network to remain efficient while increasing depth.

3. **Optimization Benefits**:
   - The architecture of ResNets allows for easier optimization of very deep networks. Traditional deep networks (plain networks) tend to suffer from increased training error as depth increases, but ResNets can achieve lower training errors even with greater depth.
   - This is evidenced on ImageNet: a plain 34-layer network has higher training error than a plain 18-layer one, whereas a 34-layer ResNet outperforms an 18-layer ResNet, and deeper ResNets (50, 101, and 152 layers) achieve further accuracy gains.

4. **Architecture**:
   - ResNets are built using blocks of convolutional layers, typically with 3x3 filters, and include batch normalization and ReLU activation functions. The architecture can vary in depth, with configurations such as 18, 34, 50, 101, and 152 layers.
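   - The deeper variants (50, 101, and 152 layers) use bottleneck blocks of 1x1, 3x3, and 1x1 convolutions. The 1x1 layers first reduce and then restore the channel dimension, so the 3x3 layer operates on fewer channels, keeping parameters and computation manageable.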

5. **Training and Performance**:
   - ResNets are trained using standard techniques such as stochastic gradient descent (SGD) with momentum and weight decay. They have been shown to converge well, even with a large number of layers, and outperform traditional networks in terms of accuracy and generalization.
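The bottleneck design in point 4 can be checked with simple weight counting. The sketch below counts convolution weights only (no biases or batch-norm parameters). It compares a basic block of two 3x3 convolutions with a bottleneck block that reduces 256 channels to 64, applies a 3x3 convolution, and expands back to 256. The helper names are illustrative.

```python
def conv_weights(k, c_in, c_out):
    """Weights in a k x k convolution (bias and batch-norm parameters ignored)."""
    return k * k * c_in * c_out


def basic_block_weights(channels):
    """Two 3x3 convolutions at the same width."""
    return 2 * conv_weights(3, channels, channels)


def bottleneck_block_weights(channels, reduced):
    """1x1 reduce -> 3x3 at reduced width -> 1x1 expand."""
    return (conv_weights(1, channels, reduced)
            + conv_weights(3, reduced, reduced)
            + conv_weights(1, reduced, channels))


print(basic_block_weights(64))            # 73728
print(bottleneck_block_weights(256, 64))  # 69632
print(basic_block_weights(256))           # 1179648
```

A bottleneck block that processes 256-channel features costs about as many weights as a basic block at 64 channels, and roughly 17 times fewer than two 3x3 convolutions at 256 channels. This makes 50- to 152-layer networks affordable.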

### Conclusion
In summary, ResNets leverage the concept of residual learning through the use of shortcut connections, allowing for the effective training of very deep networks. This architecture not only mitigates the degradation problem but also enhances the model's ability to learn complex mappings, leading to superior performance on various image recognition tasks.
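The residual formulation y = F(x) + x can be sketched with fully connected layers in place of convolutions. This is a simplifying assumption: batch normalization is also omitted, and the shortcut is an identity mapping. The structure follows the two-layer building block F(x) = W2·ReLU(W1·x), with a ReLU applied after the addition. All function names are illustrative.

```python
def relu(v):
    return [max(0.0, a) for a in v]


def linear(weight, v):
    """Matrix-vector product; weight has shape (out_dim, in_dim)."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in weight]


def residual_block(x, w1, w2):
    """y = ReLU(F(x) + x) with F(x) = W2 · ReLU(W1 · x) and an identity shortcut."""
    f = linear(w2, relu(linear(w1, x)))             # residual branch F(x)
    return relu([fi + xi for fi, xi in zip(f, x)])  # shortcut: add the input back


x = [1.0, 2.0, 3.0]
zeros = [[0.0] * 3 for _ in range(3)]
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]

# With F's weights at zero the block passes non-negative inputs through unchanged
print(residual_block(x, zeros, zeros))        # [1.0, 2.0, 3.0]
print(residual_block(x, identity, identity))  # [2.0, 4.0, 6.0]
```

The first call illustrates why depth does not have to hurt. A block only needs to drive F(x) toward zero to behave as an identity (for non-negative inputs, such as post-ReLU activations), which is easier than learning an identity mapping through a stack of plain layers.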

In [None]:
query = "What is LangGraph?"
result = qa_rag_chain.invoke(query)
display(Markdown(result.content))

I don't know.

In [None]:
query = "What is an Agentic AI System?"
result = qa_rag_chain.invoke(query)
display(Markdown(result.content))

The context provided does not contain specific information about an "Agentic AI System." Therefore, I don't know what an Agentic AI System is.

In [None]:
query = "What is LangChain?"
result = qa_rag_chain.invoke(query)
display(Markdown(result.content))

I don't know.