- **ipykernel** -> It provides an IPython kernel for Jupyter notebooks.
- **python-dotenv** -> It reads key-value pairs from a .env file and can set them as environment variables.
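- **langchain-community** -> It provides the community-maintained LangChain integrations, including the document loaders used below.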
- **pypdf** -> It must be installed when using PyPDFLoader.

Reference: https://python.langchain.com/docs/tutorials/rag/
### Retrieval Augmented Generation (RAG)
RAG is a technique for augmenting an LLM's knowledge with additional data: relevant information is retrieved and inserted into the model prompt. It works with both **unstructured and structured** data (for example, SQL data).

A typical RAG application has two main components:
- **Indexing:** a pipeline for ingesting data from a source and indexing it. This usually happens offline.
    - **Load** means Data Ingestion
    - **Split** means Data Transformation: dividing the data into chunks
    - **Store** means VectorStore DB: embed the chunks, then store them

    ![image.png](attachment:image.png)
- **Retrieval and Generation:** the actual RAG chain, which takes the user query at run time and retrieves the relevant data from the index, then passes that to the model.
    - **Retrieve** means the retrieval chain, an interface responsible for querying the vector store DB and getting the context info
    - **Generate** means passing the prompt and the context info to the model to get the final response

    ![image-2.png](attachment:image-2.png)
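
The two components above can be sketched end to end in plain Python. This is only a toy illustration, not LangChain's API: the word-count "embedding", the in-memory list used as a vector store, and the names `embed`, `cosine`, `retrieve`, and `build_prompt` are all assumptions made for this example. The final model call is left out, so the Generate step stops at the assembled prompt.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding (assumption): lowercase word counts instead of a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# --- Indexing (usually done offline) ---
# Load: the source data, here a hard-coded string.
source_text = (
    "RAG augments a model prompt with retrieved data. "
    "Indexing loads, splits, and stores documents. "
    "Retrieval finds the chunks most similar to the user query."
)
# Split: one chunk per sentence.
chunks = [s.strip() for s in re.split(r"(?<=\.)\s+", source_text) if s.strip()]
# Store: embed each chunk and keep (chunk, vector) pairs in a list.
index = [(chunk, embed(chunk)) for chunk in chunks]


# --- Retrieval and Generation (at run time) ---
def retrieve(query, k=1):
    """Retrieve: return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(query, k=1):
    """Generate (first half): combine the retrieved context with the question."""
    context = "\n".join(retrieve(query, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


prompt = build_prompt("What does indexing do with documents?")
print(prompt)  # this prompt would then be sent to the model
```

In a real application, the embedding function would be an embedding model and the list would be a vector store, but the order of the steps stays the same.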

Reference: https://python.langchain.com/docs/integrations/document_loaders/

### Data Ingestion - DocumentLoaders
DocumentLoaders load data into the standard LangChain Document format. Each DocumentLoader has its **own specific parameters**, but they can all be invoked in the same way with the **.load()** method.
- **TextLoader:** to load any **.txt** files
- **PyPDFLoader:** to load any **.pdf** files
- **WebBaseLoader:** to load any **web page** from its URL
- **ArxivLoader:** to load any **research paper completely** using that paper's arXiv identifier
- **WikipediaLoader:** to load any **Wikipedia content completely** based on any topic
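
The idea of "own parameters, shared `.load()`" can be illustrated with a small stand-alone sketch. The classes below (`Document`, `SimpleTextLoader`, `SimpleLineLoader`) are hypothetical stand-ins written for this example, not the LangChain implementations; they only mimic the shape of the interface: loader-specific arguments go into the constructor, and `.load()` returns a list of documents with `page_content` and `metadata`.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Document:
    """Hypothetical stand-in for a document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class SimpleTextLoader:
    """Hypothetical loader: returns the whole file as a single Document."""

    def __init__(self, file_path, encoding="utf-8"):
        # Loader-specific parameters live in the constructor.
        self.file_path = str(file_path)
        self.encoding = encoding

    def load(self):
        text = Path(self.file_path).read_text(encoding=self.encoding)
        return [Document(page_content=text, metadata={"source": self.file_path})]


class SimpleLineLoader(SimpleTextLoader):
    """Hypothetical loader: returns one Document per non-empty line."""

    def load(self):
        lines = Path(self.file_path).read_text(encoding=self.encoding).splitlines()
        return [
            Document(page_content=line, metadata={"source": self.file_path, "line": i})
            for i, line in enumerate(lines)
            if line.strip()
        ]


results = {}
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "speech.txt"
    path.write_text("first line\nsecond line\n", encoding="utf-8")
    # Different loaders, same call: .load()
    for loader in (SimpleTextLoader(path), SimpleLineLoader(path)):
        docs = loader.load()
        results[type(loader).__name__] = docs
        print(type(loader).__name__, "->", len(docs), "document(s)")
```

Because every loader returns the same kind of list, the later Split and Store steps do not need to know which loader produced the documents.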

In [3]:
# TextLoader - loading a speech.txt file
from langchain_community.document_loaders import TextLoader

# providing the filepath in the loader
loader=TextLoader('files/speech.txt')

# invoking the loader by calling .load() method
txt_doc=loader.load()
txt_doc

[Document(metadata={'source': 'files/speech.txt'}, page_content='Generative artificial intelligence (generative AI, GenAI, or GAI) is a subset of artificial intelligence that uses generative models to produce text, images, videos, or other forms of data. These models often generate output in response to specific prompts. Generative AI systems learn the underlying patterns and structures of their training data, enabling them to create new data.\n\nImprovements in transformer-based deep neural networks, particularly large language models (LLMs), enabled an AI boom of generative AI systems in the early 2020s. These include chatbots such as ChatGPT, Copilot, Gemini and LLaMA, text-to-image artificial intelligence image generation systems such as Stable Diffusion, Midjourney and DALL-E, and text-to-video AI generators such as Sora.Companies such as OpenAI, Anthropic, Microsoft, Google, and Baidu as well as numerous smaller firms have developed generative AI models.')]

In [5]:
# PyPDFLoader - loading a summary.pdf file
from langchain_community.document_loaders import PyPDFLoader

# providing the filepath in the loader
loader=PyPDFLoader('files/summary.pdf')

# invoking the loader by calling .load() method
pdf_doc=loader.load()
pdf_doc

[Document(metadata={'source': 'files/summary.pdf', 'page': 0}, page_content='1 | P a g e  \n \nPrinted Pages: Sub Code:  \nPaper Id:                                         Roll No.           \n \nNAME OF PROGRAM  \n(SEM ….) THEORY EXAMINATION 2019-20 \nSUBJECT NAME \nTime: 3 Hours        Total Marks: 100 \nNote:  1. Attempt all Sections. If require any missing data; then choose suitably. \n \nSECTION A \n1. Attempt all questions in brief.      2 x 10 = 20 \nQ no. Question Marks CO \na.  Explain two different usages of def in Python with examples. \n \n2 4 \nb.  Write a recursive Python program to print numbers from N to 1 (N >= \n1) in a single line, separated by space. \n \n2 5 \nc.  Describe the behavior of “ while” construct in Python with an \nExample. \n \n2 2 \nd.  Explain the difference between “break” and “continue” constructs \nin Python. \n \n2 2 \ne.  Write Python statement to create tuple containing a single eleme nt \n1054. \n \n2 1 \nf.  When evaluating a complicated exp