# Retrieval Augmented Generation (RAG)
RAG is a technique that combines retrieval-based and generation-based approaches. A retriever first finds the passages most relevant to a query, and a generator (a language model) then uses those passages as context to produce the answer.
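
The retrieve-then-generate flow can be sketched in plain Python. This is a toy illustration, not LangChain's API: the function names (`tokenize`, `retrieve`, `build_prompt`), the word-overlap scoring, and the three example passages are all assumptions made for the example, and the generation step is represented only by the prompt that would be sent to a language model.

```python
import re

def tokenize(text):
    """Lowercase a string and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, passages, k=1):
    """Return the k passages sharing the most word tokens with the query."""
    query_tokens = tokenize(query)
    ranked = sorted(
        passages,
        key=lambda p: len(query_tokens & tokenize(p)),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(query, context_passages):
    """Combine retrieved passages and the question into one generator prompt."""
    context = "\n".join(f"- {p}" for p in context_passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# Made-up passages standing in for a real document collection
passages = [
    "PyPDFLoader reads a PDF file page by page.",
    "TextLoader reads a plain text file.",
    "WebBaseLoader fetches the text of a web page.",
]
query = "Which loader reads a PDF file?"

top = retrieve(query, passages, k=1)   # retrieval step
prompt = build_prompt(query, top)      # input for the generation step
print(prompt)
```

A real pipeline would typically replace the word-overlap score with embedding similarity and pass the prompt to a language model.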


## Document Loading
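The first step is to load the documents that will be used for retrieval.
Loaders are LangChain components that ingest data from various sources (PDFs, text files, Word documents, web pages) and return it as `Document` objects. Each `Document` holds the extracted text in `page_content` and details about its origin in `metadata`.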
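
To make the loader idea concrete, here is a minimal sketch of the pattern using only the standard library. `SimpleDocument` and `load_text_file` are hypothetical stand-ins written for this example; they are not LangChain classes, and the real loaders do considerably more.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path

@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a loaded document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def load_text_file(path, encoding="utf-8"):
    """Read one text file and wrap it in a single-element list of documents."""
    text = Path(path).read_text(encoding=encoding)
    return [SimpleDocument(page_content=text, metadata={"source": str(path)})]

# Demo on a temporary file so the sketch runs anywhere
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    sample.write_text("Install Anaconda or Miniconda", encoding="utf-8")
    docs = load_text_file(sample)

print(len(docs))
print(docs[0].page_content)
```

The shape mirrors what the loaders in this notebook return: a list of objects, each carrying the text and a `source` entry in its metadata.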

### PDF Loader

In [1]:
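# Import from langchain_community, matching the other loaders in this notebook
from langchain_community.document_loaders import PyPDFLoader

# Each PDF page becomes one Document
loader = PyPDFLoader("AI_Engineer_Roadmap.pdf")
pages = loader.load()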

In [2]:
len(pages)

13

In [3]:
page = pages[0]
page

Document(metadata={'source': 'AI_Engineer_Roadmap.pdf', 'page': 0}, page_content=' \n   \ncodebasics.io  \n \n1 \nAI Engineer Roadmap for Beginners  \nFollowing is the roadmap  to learning  AI Engineer  (also known as ML Engineer ) skills for a total \nbeginner. It includes FREE learning resources for technical skills (or tool skills) and soft (or core) skills  \n          \nPrerequisites : You must have skills or interests  to build skills in Coding and Math. Without these two \nyou cannot become an AI engineer.  \nTotal Duration: 8 Months  (4 hours  of study Every Day ) \nAlso, AI Engineer = Data Scientist + Software Engineer  \n \n \nWeek 0: Do Proper Research and protect yourself from SCAMS.  \n \n Unfortunately, a lot of systematic scams are happening in ed tech, especially in the \ndata field where aspirants are provided with false promises like a 100% job guarantee or \ntrapped into “Masterclasses” which are nothing but sales pitches to upsell their l ow-grade \ncourses at exorb

In [9]:
print(page.page_content[0:500])

 
   
codebasics.io  
 
1 
AI Engineer Roadmap for Beginners  
Following is the roadmap  to learning  AI Engineer  (also known as ML Engineer ) skills for a total 
beginner. It includes FREE learning resources for technical skills (or tool skills) and soft (or core) skills  
          
Prerequisites : You must have skills or interests  to build skills in Coding and Math. Without these two 
you cannot become an AI engineer.  
Total Duration: 8 Months  (4 hours  of study Every Day ) 
Also, AI Engi
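
Printing a whole page is noisy, so a short per-page summary can help when checking what was loaded. The helper below is a sketch: `preview_pages` is a hypothetical name, and the demo uses `SimpleNamespace` stand-ins with made-up text rather than the real PDF. It assumes only that each item has `metadata` and `page_content` attributes, as the output above shows.

```python
from types import SimpleNamespace

def preview_pages(pages, width=40):
    """Return one summary line per page: 1-based number, length, snippet."""
    lines = []
    for doc in pages:
        # The 'page' value in the metadata is zero-based; add 1 for display.
        number = doc.metadata.get("page", 0) + 1
        # Collapse all whitespace runs so the snippet fits on one line.
        snippet = " ".join(doc.page_content.split())[:width]
        lines.append(f"page {number} ({len(doc.page_content)} chars): {snippet}")
    return lines

# Stand-in objects with made-up text, shaped like the loader output
demo_pages = [
    SimpleNamespace(
        metadata={"page": 0},
        page_content=" \ncodebasics.io  \n1 \nAI Engineer Roadmap",
    ),
    SimpleNamespace(metadata={"page": 1}, page_content="Week 1:  Basics"),
]

summary = preview_pages(demo_pages)
for line in summary:
    print(line)
```

With the real data, `preview_pages(pages)` would take the list returned by `loader.load()`.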


### Text Loader

In [12]:
from langchain_community.document_loaders import TextLoader
loader = TextLoader('sample.txt')
text = loader.load()

In [13]:
len(text)

1

In [14]:
text

[Document(metadata={'source': 'sample.txt'}, page_content='Install Anaconda or Miniconda\nMake sure you have Conda installed on your system. You can download and install it from the Anaconda or Miniconda website.\n\n2. Install the Python Extension in VS Code\nOpen VS Code.\nGo to the Extensions view by clicking on the Extensions icon in the Activity Bar on the side or pressing Ctrl+Shift+X.\nSearch for Python and install the extension from Microsoft.')]

### DOCX Loader

In [16]:
from langchain_community.document_loaders import Docx2txtLoader
loader = Docx2txtLoader('project dld.docx')
text = loader.load()

In [17]:
text



In [18]:
text[0].metadata

{'source': 'project dld.docx'}

### URL Loader

In [20]:
# Import from langchain_community, matching the other loaders in this notebook
from langchain_community.document_loaders import WebBaseLoader
loader = WebBaseLoader('https://www.python.org/')

In [21]:
docs = loader.load()

In [22]:
docs

[Document(metadata={'source': 'https://www.python.org/', 'title': 'Welcome to Python.org', 'description': 'The official home of the Python Programming Language', 'language': 'en'}, page_content='\n\n\n\n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nWelcome to Python.org\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nNotice: While JavaScript is not essential for this website, your interaction with the content will be limited. Please turn JavaScript on for the full experience. \n\n\n\n\n\n\nSkip to content\n\n\n▼ Close\n                \n\n\nPython\n\n\nPSF\n\n\nDocs\n\n\nPyPI\n\n\nJobs\n\n\nCommunity\n\n\n\n▲ The Python Network\n                \n\n\n\n\n\n\n\n\n\nDonate\n\n≡ Menu\n\n\nSearch This Site\n\n\n                                    GO\n                                \n\n\n\n\n\nA A\n\nSmaller\nLarger\nReset\n\n\n\n\n\n\nSocialize\n\nLinkedIn\nMastodon\nChat on IRC\nTwitter\n\n\n\n\n\n\n\n\n\n\nAbout\n\nApplications\nQuotes\nGetting Started\nHelp\nPyt
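
The page text above is dominated by long runs of newlines left over from the HTML layout. One simple way to tidy it before further processing is to strip each line and drop the empty ones. `compact_text` is a hypothetical helper written for this example, and the `raw` string is an abbreviated sample shaped like the output above, not the full page.

```python
def compact_text(raw):
    """Strip each line and drop the ones that are empty or whitespace-only."""
    stripped = [line.strip() for line in raw.splitlines()]
    return "\n".join(line for line in stripped if line)

# Abbreviated sample resembling the loaded page content
raw = "\n\n\n \n\nWelcome to Python.org\n\n\n\nSkip to content\n\n\nPython\n\n\nPSF\n"

cleaned = compact_text(raw)
print(cleaned)
```

Applied to the loaded page, this would be `compact_text(docs[0].page_content)`. Note that this removes paragraph breaks as well, so it suits navigation-heavy pages better than long-form text.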