# First Ragging Session

## Introduction

This notebook walks through the main stages of a Retrieval-Augmented Generation (RAG) pipeline and the fundamental concepts behind each one.

What does this notebook cover?

1. Selecting a passage to query from
2. Performing some basic pre-processing on the passage

## Import

The goal of this section is to import all required libraries for this notebook.

In [25]:
import numpy as np
from nltk.tokenize import sent_tokenize

List of current environment packages and versions:

In [16]:
!pip list --format=freeze

aiobotocore==2.7.0
aiofiles==22.1.0
aiohttp==3.9.3
aioitertools==0.7.1
aiosignal==1.2.0
aiosqlite==0.18.0
alabaster==0.7.12
anaconda-catalogs==0.2.0
anaconda-cloud-auth==0.1.4
anyio==4.2.0
appdirs==1.4.4
argon2-cffi==21.3.0
argon2-cffi-bindings==21.2.0
arrow==1.2.3
astroid==2.14.2
astropy==5.3.4
asttokens==2.0.5
async-lru==2.0.4
async-timeout==4.0.3
atomicwrites==1.4.0
attrs==23.1.0
Automat==20.2.0
autopep8==1.6.0
Babel==2.11.0
backcall==0.2.0
backports.functools-lru-cache==1.6.4
backports.tempfile==1.0
backports.weakref==1.0.post1
bcrypt==3.2.0
beautifulsoup4==4.12.2
binaryornot==0.4.4
black==23.11.0
bleach==4.1.0
bokeh==3.3.4
boltons==23.0.0
botocore==1.31.64
Bottleneck==1.3.7
Brotli==1.0.9
brotlipy==0.7.0
certifi==2024.2.2
cffi==1.16.0
chardet==4.0.0
charset-normalizer==2.0.4
click==8.1.7
cloudpickle==2.2.1
clyent==1.2.2
colorama==0.4.6
colorcet==3.0.1
comm==0.1.2
conda-content-trust==0.2.0
conda-pack==0.6.0
conda-package-handling==2.2.0
conda_package_streaming==0.9.0
conda-verify==
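
The full `pip list` output is long, and only two libraries are imported in this notebook. A shorter alternative is to report just those versions. The following is a minimal sketch using the standard-library `importlib.metadata` module; the helper name `installed_version` is an assumption for illustration and is not part of the original notebook.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report only the libraries imported at the top of this notebook.
for name in ("numpy", "nltk"):
    found = installed_version(name)
    print(f"{name}=={found}" if found else f"{name} is not installed")
```

This keeps the record of the environment focused on the packages that affect the results.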

## The Context

The goal of this section is to select the passage that will serve as the context for retrieval.


### Loading the Context

In [27]:
# The passage is obtained from: https://www.superteacherworksheets.com/reading-comp/science-atoms_WBRRM.pdf

context = """Have you ever walked through a cloud of gnats on a 
hot summer day, only to have them follow you? No 
matter how you swat at them, or even if you run, they 
won’t leave you alone. If so, then you have something
 in common with an atom. 
Atoms are the building blocks of molecules, which 
when combined, make up everything. From the 
smallest one-celled amoeba, to every person who has
 ever lived, to the largest and brightest stars in the sky, 
atoms are everywhere.
 Even way back in the time of ancient Greece, they 
wondered about atoms. That’s where the word 
comes from, ancient Greece. The word A'tomos, 
when translated into English, means: something that 
cannot be divided any further. So what’s an atom 
look like? Up until very recently no one could say one 
way or another.
 Technically we can’t see individual atoms, since there 
are no microscopes powerful enough.  Since 
technology improves all the time, it may not be long 
before we can actually see a whole atom through a 
special microscope.  Even though scientists cannot 
see atoms with microscopes, they have developed 
ways to detect them and learn about them.
 Atoms are made up of three basic parts; protons, 
neutrons, and electrons. There is a core, or nucleus, 
and an electron cloud. The nucleus is made up of 
positively charged protons and neutral neutrons. The 
nucleus is held closely together by electromagnetic 
force.
 Protons and neutrons make up the nucleus  of the atom. 
A cloud of electrons orbits the nucleus.
 The negatively charged electrons are bound to the 
nucleus, and zap around it in a cloud. Do you 
remember the cloud of gnats? The gnats would be 
the electrons zipping around you, the nucleus.
 There are different ways atoms are classified. They can
 be classified into elements, like oxygen, carbon, or 
hydrogen.  All of the elements known to man so far 
can be found on the periodic table.  The number of 
protons an atom has decides the chemical element. 
The number of electrons defines the atom's chemical 
properties, like its melting temperature and boiling 
point. 
The study of atoms and tiny particles that are even 
smaller is called quantum mechanics.  Scientists still 
have much to learn about atoms.   Maybe you will 
enter the study of quantum mechanics and find a 
brand new element.  Maybe they’ll even name it after
you!"""

In [28]:
context

"Have you ever walked through a cloud of gnats on a \nhot summer day, only to have them follow you? No \nmatter how you swat at them, or even if you run, they \nwon’t leave you alone. If so, then you have something\n in common with an atom. \nAtoms are the building blocks of molecules, which \nwhen combined, make up everything. From the \nsmallest one-celled amoeba, to every person who has\n ever lived, to the largest and brightest stars in the sky, \natoms are everywhere.\n Even way back in the time of ancient Greece, they \nwondered about atoms. That’s where the word \ncomes from, ancient Greece. The word A'tomos, \nwhen translated into English, means: something that \ncannot be divided any further. So what’s an atom \nlook like? Up until very recently no one could say one \nway or another.\n Technically we can’t see individual atoms, since there \nare no microscopes powerful enough.  Since \ntechnology improves all the time, it may not be long \nbefore we can actually see a whole at

### Tokenizing Context into Sentences

In [30]:
# Tokenize the context into sentences
# Also, replace each newline character "\n" with a space and collapse double spaces

context = [token.replace('\n',' ').replace('  ',' ') for token in sent_tokenize(context)]
context

['Have you ever walked through a cloud of gnats on a hot summer day, only to have them follow you?',
 'No matter how you swat at them, or even if you run, they won’t leave you alone.',
 'If so, then you have something in common with an atom.',
 'Atoms are the building blocks of molecules, which when combined, make up everything.',
 'From the smallest one-celled amoeba, to every person who has ever lived, to the largest and brightest stars in the sky, atoms are everywhere.',
 'Even way back in the time of ancient Greece, they wondered about atoms.',
 'That’s where the word comes from, ancient Greece.',
 "The word A'tomos, when translated into English, means: something that cannot be divided any further.",
 'So what’s an atom look like?',
 'Up until very recently no one could say one way or another.',
 'Technically we can’t see individual atoms, since there are no microscopes powerful enough.',
 'Since technology improves all the time, it may not be long before we can actually see a whol
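
The chained `replace` calls in the cell above handle this passage, where a newline followed by a space becomes two spaces and is then collapsed to one. A single `.replace('  ', ' ')` pass does not fully collapse longer runs of spaces, though. The following is a minimal sketch of a more general cleanup; the helper name `normalize_whitespace` is an assumption for illustration and is not part of the original notebook.

```python
def normalize_whitespace(text: str) -> str:
    """Collapse every run of whitespace (spaces, tabs, newlines) into a single space."""
    # str.split() with no arguments splits on any whitespace run and drops empty pieces.
    return " ".join(text.split())


# A fragment of the passage as it appears before cleaning.
sample = "If so, then you have something\n in common with an atom."
cleaned = normalize_whitespace(sample)
print(cleaned)

# Three spaces survive one replace pass as two spaces, but not the split/join approach.
one_pass = "a   b".replace("  ", " ")
print(repr(one_pass), repr(normalize_whitespace("a   b")))
```

Applied per sentence, for example `[normalize_whitespace(s) for s in sent_tokenize(text)]`, this would also trim leading and trailing whitespace.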

## Creating a Vector DB

### Chunking the Sentences

In [112]:
# Helper function that groups sentences into overlapping chunks

def chunk(context: list[str], window: int) -> list[str]:
    """
    Split the context into overlapping chunks using a sliding window.

    Each chunk holds `window` consecutive sentences, and the window moves
    forward one sentence at a time.

    Args:
        context (list[str]): The context as a list of sentences
        window (int): Number of sentences to include in each chunk

    Returns:
        list[str]: A list of chunks

    Raises:
        ValueError: If `window` is less than 1 or exceeds the number of sentences
    """
    if window < 1:
        raise ValueError("The number of sentences in each chunk must be at least 1.")
    if len(context) < window:
        raise ValueError("The number of sentences in the context must be at least the number of sentences in each chunk.")

    chunks = []
    for i in range(window, len(context) + 1):
        # Join sentences i-window .. i-1 into one chunk
        chunks.append(" ".join(context[i - window:i]).strip())

    return chunks
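
To make the sliding-window behaviour easier to see, here is a minimal sketch on a toy input. The function `sliding_chunks` and its `stride` parameter are assumptions added for illustration and are not part of the original notebook. With `stride=1` it produces the same kind of overlapping chunks as the helper above.

```python
def sliding_chunks(sentences: list[str], window: int, stride: int = 1) -> list[str]:
    """Join `window` consecutive sentences per chunk, moving `stride` sentences each step."""
    if window < 1 or stride < 1:
        raise ValueError("window and stride must be positive integers.")
    if len(sentences) < window:
        raise ValueError("The context must contain at least `window` sentences.")
    return [
        " ".join(sentences[start:start + window])
        for start in range(0, len(sentences) - window + 1, stride)
    ]


sentences = ["S1.", "S2.", "S3.", "S4."]

# stride=1: neighbouring chunks share window - 1 sentences.
overlapping = sliding_chunks(sentences, window=2)
print(overlapping)

# stride=window: chunks do not overlap.
disjoint = sliding_chunks(sentences, window=2, stride=2)
print(disjoint)
```

Overlap means a fact that spans a sentence boundary appears intact in at least one chunk, at the cost of storing more text. In this sketch, a stride larger than 1 can leave trailing sentences out when they do not fill a whole window.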

In [115]:
# Create overlapping chunks of 5 sentences each

chunks = chunk(context, 5)
chunks

['Have you ever walked through a cloud of gnats on a hot summer day, only to have them follow you? No matter how you swat at them, or even if you run, they won’t leave you alone. If so, then you have something in common with an atom. Atoms are the building blocks of molecules, which when combined, make up everything. From the smallest one-celled amoeba, to every person who has ever lived, to the largest and brightest stars in the sky, atoms are everywhere.',
 'No matter how you swat at them, or even if you run, they won’t leave you alone. If so, then you have something in common with an atom. Atoms are the building blocks of molecules, which when combined, make up everything. From the smallest one-celled amoeba, to every person who has ever lived, to the largest and brightest stars in the sky, atoms are everywhere. Even way back in the time of ancient Greece, they wondered about atoms.',
 'If so, then you have something in common with an atom. Atoms are the building blocks of molecules