![alt text](https://miro.medium.com/v2/resize:fit:999/1*q1CkGPwS7g4-f9rNbPrkig.png)

# 📌 How Does RAG Work?

**Retrieval-Augmented Generation (RAG)** enhances Large Language Models (LLMs) by **retrieving relevant documents from a vector database** before generating responses. Grounding the prompt in retrieved text helps the model give more **accurate, relevant, and fact-based answers**.

---

## **🔹 How RAG Works When Documents Are Stored in a Vector Database**
### **1️⃣ Document Ingestion (Preprocessing)**
✅ Convert documents (PDFs, articles, research papers) into **vector embeddings**.  
✅ Store these embeddings in a **vector database** (e.g., **ChromaDB, FAISS, Pinecone, Weaviate**).  

🔹 **Example:**
- A legal AI system **stores thousands of case law documents** in a vector database.
- Each document is **converted into a vector embedding** using an embedding model like `text-embedding-ada-002`.

---

### **2️⃣ Retrieval Step (Fetching Relevant Documents)**
✅ User **asks a question** related to stored documents.  
✅ The query is **converted into an embedding** and compared against stored vectors.  
✅ The **most relevant documents** are retrieved from the **vector database** using **similarity search**.

🔹 **Example:**
- User: *"What are the latest data privacy regulations?"*
- The system **retrieves the top 3 relevant legal documents** from the vector database.
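The ingestion and retrieval steps can be sketched with the standard library alone. This is a toy illustration, not LangChain code: `embed` is a hypothetical word-count stand-in for a real embedding model, the in-memory `index` list stands in for a vector database, and the three-sentence corpus is invented for the demo.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, index: list, k: int = 3) -> list:
    """Return the k stored texts most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Hypothetical mini-corpus standing in for the stored legal documents.
corpus = [
    "Data privacy regulations govern how personal data is collected and stored.",
    "Contract law covers the formation and enforcement of agreements.",
    "Tax law sets the rules for income reporting.",
]

# Ingestion: embed each document once and keep (text, vector) pairs.
index = [(text, embed(text)) for text in corpus]

# Retrieval: embed the query and rank stored vectors by similarity.
results = top_k("What are the latest data privacy regulations?", index, k=1)
print(results[0])
```

A real system swaps `embed` for a learned embedding model and the list for an approximate nearest-neighbour index, but the flow (embed once at ingestion, embed the query, rank by similarity) is the same.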

---

### **3️⃣ Generation Step (LLM Uses Retrieved Context)**
✅ The retrieved documents are **added to the prompt** as context.  
✅ The **LLM generates a response** using both the user query and the retrieved information.  

🔹 **Example:**
- Instead of relying only on what it memorized during training, the LLM **draws on the retrieved legal documents** to give a **better-grounded** answer.
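A minimal sketch of how retrieved passages might be placed into the prompt. The function name `build_prompt`, the prompt wording, and the sample passages are all assumptions made for illustration; LangChain chains build their prompts with their own templates.

```python
def build_prompt(question: str, retrieved: list) -> str:
    """Put the retrieved passages ahead of the question so the model can use them."""
    context = "\n\n".join(
        f"[{i}] {text}" for i, text in enumerate(retrieved, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


prompt = build_prompt(
    "What are the latest data privacy regulations?",
    # Hypothetical passages standing in for documents returned by the retriever.
    ["Passage about data privacy rules.", "Passage about enforcement."],
)
print(prompt)
```

The resulting string is what gets sent to the LLM in the generation step.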

---

# 📌 Components of RAG in LangChain

| **Component**         | **Purpose** | **Examples in LangChain** |
|----------------------|------------|-------------------------|
| **Document Loader**  | Loads raw data from files, web pages, or databases. | `PyPDFLoader`, `WebBaseLoader`, `WikipediaLoader`, `CSVLoader` |
| **Text Splitter**    | Splits large documents into smaller chunks for better retrieval. | `RecursiveCharacterTextSplitter`, `CharacterTextSplitter.from_tiktoken_encoder()` |
| **Embedding Model**  | Converts text chunks into vector embeddings for similarity search. | `OpenAIEmbeddings`, `Sentence Transformers`, `CohereEmbeddings` |
| **Vector Database**  | Stores embeddings and enables fast semantic search. | `ChromaDB`, `FAISS`, `Pinecone`, `Weaviate`, `Milvus` |
| **Retriever**        | Fetches relevant documents based on query similarity. | `vectorstore.as_retriever()`, `MultiQueryRetriever` |
| **LLM (Generator)**  | Generates responses using retrieved documents as context. | `OpenAI (GPT-4)`, `Mistral`, `Anthropic Claude`, `Llama2` |
| **Memory (Optional)** | Stores past interactions to maintain conversation context. | `ConversationBufferMemory`, `VectorStoreRetrieverMemory` |
| **Chain (Orchestration)** | Combines all components into a workflow for retrieval & generation. | `RetrievalQA`, `ConversationalRetrievalChain` |


---

# **1. Document Loaders**

**Document loaders** in LangChain are tools that **automatically extract text** from different types of files, webpages, and APIs so that AI models can understand and process them.
- Each loader returns `Document` objects, which consist of text (`page_content`) and associated `metadata`.
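As a rough picture of that text-plus-metadata shape, here is a standard-library sketch. `SimpleDocument` and `load_text` are hypothetical names that only mirror the two fields; they are not LangChain's `Document` or `TextLoader`.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class SimpleDocument:
    """Hypothetical stand-in mirroring the two fields of a loaded document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_text(path: str) -> list:
    """Read a whole text file into one document and record where it came from."""
    text = Path(path).read_text(encoding="utf-8")
    return [SimpleDocument(page_content=text, metadata={"source": path})]


# Demo on a temporary file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    sample.write_text("We the People of the United States", encoding="utf-8")
    docs = load_text(str(sample))

print(len(docs), list(docs[0].metadata))
```

Keeping the source in `metadata` is what later lets a RAG system cite where a retrieved chunk came from.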


## 📌 Why Do We Need Document Loaders in LangChain?

Document loaders are essential for **efficiently extracting, processing, and preparing** text data for use in **LLMs, search systems, and AI applications**. Here’s why:

---

## **1️⃣ Handle Different File Formats Efficiently**
✅ **Documents come in various formats** (PDF, CSV, DOCX, JSON, Webpages, etc.).  
✅ Manually extracting text from these formats is **time-consuming and error-prone**.  
✅ Document loaders **automate this process**, making it seamless.

🔹 **Example Use Case:**  
- Loading **legal contracts (PDFs)**, **financial reports (Excel/CSV)**, or **emails (Outlook MSG)** into an AI-powered retrieval system.

---

## **2️⃣ Standardized Data Extraction**
✅ Raw text data may contain **headers, footers, and unwanted content**.  
✅ Document loaders **extract relevant text only**, ensuring **clean and structured data**.  
✅ Helps in **preprocessing** for downstream AI applications.

🔹 **Example Use Case:**  
- Extracting **text from invoices** without including unnecessary metadata.

---

## **3️⃣ Enables Scalable Processing of Large Datasets**
✅ Manually opening and processing thousands of files is **not scalable**.  
✅ Document loaders **batch-process** large datasets efficiently.  
✅ Crucial for **handling enterprise-level document retrieval and search systems**.

🔹 **Example Use Case:**  
- A **law firm** needs to process **thousands of case files** efficiently for AI-powered legal research.

---

## **4️⃣ Seamless Integration with AI & Vector Databases**
✅ Document loaders **prepare text for LLM-based search, chatbots, and embeddings**.  
✅ Easily integrates with **LangChain, ChromaDB, FAISS, Pinecone, and other vector databases**.  
✅ Supports **chunking, embedding generation, and indexing**.

🔹 **Example Use Case:**  
- Uploading **customer support emails** into a **retrieval-augmented chatbot**.

---

## **5️⃣ Supports Web & API-Based Content Extraction**
✅ Extracts data from **Wikipedia, Google Search, YouTube transcripts, Reddit discussions, and more**.  
✅ Useful for **dynamic knowledge retrieval** from online sources.  
✅ Helps **keep AI models updated** with the latest real-world information.

🔹 **Example Use Case:**  
- Fetching **real-time financial news** from the web and feeding it into a **market trend prediction model**.

---

## **🚀 Why Use Document Loaders?**
| **Problem** | **Solution with Document Loaders** |
|------------|----------------------------------|
| Different file formats (PDF, CSV, Web, etc.) | **Automates text extraction** across multiple formats |
| Messy text data with extra content | **Extracts and cleans** relevant text only |
| Processing large datasets manually is slow | **Batch-processes thousands of documents efficiently** |
| Preparing data for AI models | **Integrates seamlessly with vector databases and LLMs** |
| Fetching real-time content from the web | **Extracts content from APIs, webpages, and databases** |



## 📌 Common Document Loaders in LangChain

https://python.langchain.com/docs/concepts/document_loaders/

| **Loader Name**           | **Use Case**                                         |
|---------------------------|------------------------------------------------------|
| **TextLoader**            | Load plain text files (`.txt`).                      |
| **CSVLoader**             | Load structured data from CSV files.                 |
| **UnstructuredPDFLoader** | Extract text from PDFs, including multi-page documents. |
| **PyMuPDFLoader**         | Efficiently load structured PDFs with metadata.      |
| **Docx2txtLoader**        | Extract text from Microsoft Word documents (`.docx`). |
| **PyPDFLoader**           | Load PDFs **page by page** for granular processing.  |
| **WebBaseLoader**         | Load text content from a **website**.                |
| **GitLoader**             | Load code and documentation from **Git repositories**. |
| **YoutubeLoader**         | Extract and process transcripts from **YouTube videos**. |
| **JSONLoader**            | Load structured data from **JSON files**.            |
| **WikipediaLoader**       | Retrieve Wikipedia article text via the API (**clean text, no page navigation**). |

---

## **1. Text**
- Loads the entire file as a single `Document`.

In [12]:
from langchain_community.document_loaders import TextLoader

loader = TextLoader("some_data/US_Constitution.txt")
text = loader.load()

print(text)
len(text)

[Document(metadata={'source': 'some_data/US_Constitution.txt'}, page_content='We the People of the United States, in Order to form a more perfect Union, establish Justice, insure domestic Tranquility, provide for the common defence, promote the general Welfare, and secure the Blessings of Liberty to ourselves and our Posterity, do ordain and establish this Constitution for the United States of America.\n\nThe Constitutional Convention\nArticle I\nSection 1: Congress\nAll legislative Powers herein granted shall be vested in a Congress of the United States, which shall consist of a Senate and House of Representatives.\n\nSection 2: The House of Representatives\nThe House of Representatives shall be composed of Members chosen every second Year by the People of the several States, and the Electors in each State shall have the Qualifications requisite for Electors of the most numerous Branch of the State Legislature.\n\nNo Person shall be a Representative who shall not have attained to the 

1

In [13]:
print(text[0].page_content)

We the People of the United States, in Order to form a more perfect Union, establish Justice, insure domestic Tranquility, provide for the common defence, promote the general Welfare, and secure the Blessings of Liberty to ourselves and our Posterity, do ordain and establish this Constitution for the United States of America.

The Constitutional Convention
Article I
Section 1: Congress
All legislative Powers herein granted shall be vested in a Congress of the United States, which shall consist of a Senate and House of Representatives.

Section 2: The House of Representatives
The House of Representatives shall be composed of Members chosen every second Year by the People of the several States, and the Electors in each State shall have the Qualifications requisite for Electors of the most numerous Branch of the State Legislature.

No Person shall be a Representative who shall not have attained to the Age of twenty five Years, and been seven Years a Citizen of the United States, and who s

## **2. PDF**
- https://python.langchain.com/docs/integrations/document_loaders/pypdfloader/

`!pip install -qU pypdf`

In [14]:
from langchain_community.document_loaders import PyPDFLoader


loader = PyPDFLoader('./some_data/marvel_superheroes.pdf')

pages = loader.load()

print(pages)
len(pages)

[Document(metadata={'source': './some_data/marvel_superheroes.pdf', 'page': 0}, page_content='Spider-Man\nSpider-Man, also known as Peter Parker, is one of Marvel\'s most iconic superheroes. Created by\nwriter Stan Lee and artist Steve Ditko, he first appeared in Amazing Fantasy #15 in 1962. As a\nteenager, Peter Parker was bitten by a radioactive spider, which granted him extraordinary powers,\nincluding superhuman strength, agility, the ability to cling to walls, and a "spider-sense" that warns\nhim of impending danger.\nDespite his incredible abilities, Peter\'s life is fraught with hardship. After the tragic murder of his\nUncle Ben, Peter learns a valuable lesson: "With great power comes great responsibility." This\nphilosophy shapes his journey as he fights crime and protects the citizens of New York City from\nnotorious villains like the Green Goblin, Doctor Octopus, and Venom. \nAs Spider-Man, Peter has been a key member of teams like the Avengers and the Fantastic Four.\nHe ha

2

### `pages` is a list of `Document` objects, each holding one page's text and its metadata

In [15]:
print(pages[0].page_content)

Spider-Man
Spider-Man, also known as Peter Parker, is one of Marvel's most iconic superheroes. Created by
writer Stan Lee and artist Steve Ditko, he first appeared in Amazing Fantasy #15 in 1962. As a
teenager, Peter Parker was bitten by a radioactive spider, which granted him extraordinary powers,
including superhuman strength, agility, the ability to cling to walls, and a "spider-sense" that warns
him of impending danger.
Despite his incredible abilities, Peter's life is fraught with hardship. After the tragic murder of his
Uncle Ben, Peter learns a valuable lesson: "With great power comes great responsibility." This
philosophy shapes his journey as he fights crime and protects the citizens of New York City from
notorious villains like the Green Goblin, Doctor Octopus, and Venom. 
As Spider-Man, Peter has been a key member of teams like the Avengers and the Fantastic Four.
He has worn various suits, from the classic red and blue costume to advanced versions like the Iron
Spider suit 

## 📌 Difference Between `load()` and `lazy_load()` in `PyPDFLoader`

| **Method**      | **How It Works**                  | **Best Use Case**               |
|----------------|----------------------------------|--------------------------------|
| **`load()`**     | Loads **all pages at once**      | Small to medium-sized PDFs     |
| **`lazy_load()`** | Loads **pages one at a time**   | Large PDFs / Streaming         |

🔹 **Use `load()` when** you need all pages upfront and the PDF is small.  
🔹 **Use `lazy_load()` when** working with large PDFs to save memory and process pages incrementally.  
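The difference is essentially that between a generator and a list. The sketch below uses plain Python with hypothetical page strings, not `PyPDFLoader` itself, to show that the lazy variant only touches a page when the caller asks for it.

```python
from typing import Iterator

# Hypothetical page texts standing in for a parsed PDF.
PAGES = ["Spider-Man\n(page text)", "Iron Man\n(page text)"]
parsed = []  # records which page numbers have been "parsed" so far


def lazy_load() -> Iterator[str]:
    """Yield one page at a time; each page is processed only on demand."""
    for number, text in enumerate(PAGES):
        parsed.append(number)
        yield text


def load() -> list:
    """Eager variant: materialise every page in a list before returning."""
    return list(lazy_load())


pages_iter = lazy_load()
first_title = next(pages_iter).split("\n")[0]
parsed_after_first = list(parsed)  # snapshot: only page 0 has been touched

all_pages = load()
print(first_title, parsed_after_first, len(all_pages))
```

With a large PDF, the lazy form lets you stop early or process pages in a stream without holding every page in memory.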


In [16]:
from langchain_community.document_loaders import PyPDFLoader


super_hero_names  = []

for doc in loader.lazy_load():
    super_hero_names.append(doc.page_content.split('\n')[0])
    # print(doc)


super_hero_names

['Spider-Man', 'Iron Man']

In [None]:
loader = PyPDFLoader('./some_data/file-sample_150kB.pdf')

pages = loader.load()

print(pages)
len(pages)

[Document(metadata={'source': './some_data/file-sample_150kB.pdf', 'page': 0}, page_content='Lorem ipsum \nLorem ipsum dolor sit amet, consectetur adipiscing \nelit. Nunc ac faucibus odio. \nVestibulum neque massa, scelerisque sit amet ligula eu, congue molestie mi. Praesent ut\nvarius sem. Nullam at porttitor arcu, nec lacinia nisi. Ut ac dolor vitae odio interdum\ncondimentum.  Vivamus  dapibus  sodales  ex,  vitae  malesuada  ipsum  cursus\nconvallis. Maecenas sed egestas nulla, ac condimentum orci.  Mauris diam felis,\nvulputate ac suscipit et, iaculis non est. Curabitur semper arcu ac ligula semper, nec luctus\nnisl blandit. Integer lacinia ante ac libero lobortis imperdiet. Nullam mollis convallis ipsum,\nac accumsan nunc vehicula vitae. Nulla eget justo in felis tristique fringilla. Morbi sit amet\ntortor quis risus auctor condimentum. Morbi in ullamcorper elit. Nulla iaculis tellus sit amet\nmauris tempus fringilla.\nMaecenas mauris lectus, lobortis et purus mattis, blandit dic

4

In [18]:
print(pages[1].page_content)

In non mauris justo. Duis vehicula mi vel mi pretium, a viverra erat efficitur. Cras aliquam
est ac eros varius, id iaculis dui auctor. Duis pretium neque ligula, et pulvinar mi placerat
et. Nulla nec nunc sit amet nunc posuere vestibulum. Ut id neque eget tortor mattis
tristique. Donec ante est, blandit sit amet tristique vel, lacinia pulvinar arcu. Pellentesque
scelerisque fermentum erat, id posuere justo pulvinar ut. Cras id eros sed enim aliquam
lobortis. Sed lobortis nisl ut eros efficitur tincidunt. Cras justo mi, porttitor quis mattis vel,
ultricies ut purus. Ut facilisis et lacus eu cursus.
In eleifend velit vitae libero sollicitudin euismod. Fusce vitae vestibulum velit. Pellentesque
vulputate lectus quis pellentesque commodo. Aliquam erat volutpat. Vestibulum in egestas
velit. Pellentesque fermentum nisl vitae fringilla venenatis. Etiam id mauris vitae orci
maximus ultricies. 
Cras fringilla ipsum magna, in fringilla dui commodo 
a.
Lorem ipsum Lorem ipsum Lorem ipsum
1 In el

## **3. CSV**
- https://python.langchain.com/docs/how_to/document_loader_csv/

In [19]:
from langchain_community.document_loaders.csv_loader import CSVLoader

In [20]:
import pandas as pd
data_csv = pd.read_csv('some_data/penguins.csv')

print(data_csv.shape)
data_csv.head()

(344, 7)


Unnamed: 0,species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex
0,Adelie,Torgersen,39.1,18.7,181.0,3750.0,MALE
1,Adelie,Torgersen,39.5,17.4,186.0,3800.0,FEMALE
2,Adelie,Torgersen,40.3,18.0,195.0,3250.0,FEMALE
3,Adelie,Torgersen,,,,,
4,Adelie,Torgersen,36.7,19.3,193.0,3450.0,FEMALE


In [21]:
loader = CSVLoader('some_data/penguins.csv')
loader

<langchain_community.document_loaders.csv_loader.CSVLoader at 0x172f561ca30>

In [22]:
data = loader.load()
print(data)

[Document(metadata={'source': 'some_data/penguins.csv', 'row': 0}, page_content='species: Adelie\nisland: Torgersen\nbill_length_mm: 39.1\nbill_depth_mm: 18.7\nflipper_length_mm: 181\nbody_mass_g: 3750\nsex: MALE'), Document(metadata={'source': 'some_data/penguins.csv', 'row': 1}, page_content='species: Adelie\nisland: Torgersen\nbill_length_mm: 39.5\nbill_depth_mm: 17.4\nflipper_length_mm: 186\nbody_mass_g: 3800\nsex: FEMALE'), Document(metadata={'source': 'some_data/penguins.csv', 'row': 2}, page_content='species: Adelie\nisland: Torgersen\nbill_length_mm: 40.3\nbill_depth_mm: 18\nflipper_length_mm: 195\nbody_mass_g: 3250\nsex: FEMALE'), Document(metadata={'source': 'some_data/penguins.csv', 'row': 3}, page_content='species: Adelie\nisland: Torgersen\nbill_length_mm: \nbill_depth_mm: \nflipper_length_mm: \nbody_mass_g: \nsex: '), Document(metadata={'source': 'some_data/penguins.csv', 'row': 4}, page_content='species: Adelie\nisland: Torgersen\nbill_length_mm: 36.7\nbill_depth_mm: 19.

In [23]:
print(data[0].metadata)

{'source': 'some_data/penguins.csv', 'row': 0}


In [24]:
print(data[0].page_content)

species: Adelie
island: Torgersen
bill_length_mm: 39.1
bill_depth_mm: 18.7
flipper_length_mm: 181
body_mass_g: 3750
sex: MALE
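The row-to-document mapping seen above (one document per row, `column: value` lines, and a `row` index in the metadata) can be approximated with the `csv` module. This is a sketch of the idea, with a hypothetical helper `rows_to_documents`, and not `CSVLoader`'s actual code.

```python
import csv
import io

# Two rows taken from the penguins output above, inlined so the sketch is self-contained.
CSV_TEXT = """species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex
Adelie,Torgersen,39.1,18.7,181,3750,MALE
Adelie,Torgersen,39.5,17.4,186,3800,FEMALE
"""


def rows_to_documents(csv_text: str, source: str) -> list:
    """Turn each CSV row into a dict with 'page_content' and 'metadata' keys."""
    documents = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader):
        content = "\n".join(f"{column}: {value}" for column, value in row.items())
        documents.append(
            {"page_content": content,
             "metadata": {"source": source, "row": row_number}}
        )
    return documents


documents = rows_to_documents(CSV_TEXT, source="some_data/penguins.csv")
print(documents[0]["page_content"])
print(documents[1]["metadata"])
```

Because each row becomes its own document, a retriever can later return individual records instead of the whole table.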


## **4. HTML**
`WebBaseLoader` loads HTML from web URLs and parses it to text.
- https://python.langchain.com/docs/how_to/document_loader_html/
- https://python.langchain.com/docs/integrations/document_loaders/web_base/

`!pip install beautifulsoup4`

`!pip install --user unstructured`

In [25]:
from langchain_community.document_loaders import WebBaseLoader


loader = WebBaseLoader("https://en.wikipedia.org/wiki/Thanos")

# Disabling certificate verification works around SSL errors but is insecure; use it only in trusted test setups:
loader.requests_kwargs = {'verify': False}


data = loader.load()
print(data[0].page_content)







Thanos - Wikipedia

Jump to content

Main menu

Main menu
move to sidebar
hide

Navigation

Main pageContentsCurrent eventsRandom articleAbout WikipediaContact us

Contribute

HelpLearn to editCommunity portalRecent changesUpload fileSpecial pages

Search

Search

Appearance

Donate

Create account

Log in

Personal tools

Donate Create account Log in

Pages for logged out editors learn more

ContributionsTalk

Contents
move to sidebar
hide

(Top)

1
Creation

2
Publication history

3
Fictional character biography

4
Powers and abilities

5
Cultural impact and legacy

Toggle Cultural impact and legacy subsection

5.1
Critical response

5.2
Impact

6
Other versions

Toggle Other versions subsection

6.1
Amalgam Comics

6.2
Earth X

6.3
Her

## **5. Wikipedia**

In [26]:
from langchain_community.document_loaders import WikipediaLoader

docs = WikipediaLoader(query="Thanos", load_max_docs=2).load()
docs

[Document(metadata={'title': 'Thanos', 'summary': "Thanos () is a supervillain appearing in American comic books published by Marvel Comics. Created by writer-artist Jim Starlin, the character first appeared in The Invincible Iron Man #55 (cover date February 1973). An Eternal–Deviant warlord from the moon Titan, Thanos is regarded as one of the most powerful beings in the Marvel Universe. He has clashed with many heroes including the Avengers, the Guardians of the Galaxy, the Fantastic Four, and the X-Men.\nIn creating Thanos, Starlin drew inspiration from Jack Kirby's New Gods series for DC Comics, particularly the character of Darkseid. Thanos is usually portrayed as a villain, although many stories depict him as believing his actions to be justified. Perhaps the character's best-known storyline is The Infinity Gauntlet (1991), the culmination of several story arcs that see him gather the six Infinity Gems and use them to kill half of the universe's population, including many of its

In [27]:
docs[0].metadata  # metadata holds the title and a summary, which repeats the start of the page content

{'title': 'Thanos',
 'summary': "Thanos () is a supervillain appearing in American comic books published by Marvel Comics. Created by writer-artist Jim Starlin, the character first appeared in The Invincible Iron Man #55 (cover date February 1973). An Eternal–Deviant warlord from the moon Titan, Thanos is regarded as one of the most powerful beings in the Marvel Universe. He has clashed with many heroes including the Avengers, the Guardians of the Galaxy, the Fantastic Four, and the X-Men.\nIn creating Thanos, Starlin drew inspiration from Jack Kirby's New Gods series for DC Comics, particularly the character of Darkseid. Thanos is usually portrayed as a villain, although many stories depict him as believing his actions to be justified. Perhaps the character's best-known storyline is The Infinity Gauntlet (1991), the culmination of several story arcs that see him gather the six Infinity Gems and use them to kill half of the universe's population, including many of its heroes, to woo Mi

In [28]:
print(docs[0].metadata['summary'])

Thanos () is a supervillain appearing in American comic books published by Marvel Comics. Created by writer-artist Jim Starlin, the character first appeared in The Invincible Iron Man #55 (cover date February 1973). An Eternal–Deviant warlord from the moon Titan, Thanos is regarded as one of the most powerful beings in the Marvel Universe. He has clashed with many heroes including the Avengers, the Guardians of the Galaxy, the Fantastic Four, and the X-Men.
In creating Thanos, Starlin drew inspiration from Jack Kirby's New Gods series for DC Comics, particularly the character of Darkseid. Thanos is usually portrayed as a villain, although many stories depict him as believing his actions to be justified. Perhaps the character's best-known storyline is The Infinity Gauntlet (1991), the culmination of several story arcs that see him gather the six Infinity Gems and use them to kill half of the universe's population, including many of its heroes, to woo Mistress Death, the living embodimen

In [29]:
print(docs[0].page_content)

Thanos () is a supervillain appearing in American comic books published by Marvel Comics. Created by writer-artist Jim Starlin, the character first appeared in The Invincible Iron Man #55 (cover date February 1973). An Eternal–Deviant warlord from the moon Titan, Thanos is regarded as one of the most powerful beings in the Marvel Universe. He has clashed with many heroes including the Avengers, the Guardians of the Galaxy, the Fantastic Four, and the X-Men.
In creating Thanos, Starlin drew inspiration from Jack Kirby's New Gods series for DC Comics, particularly the character of Darkseid. Thanos is usually portrayed as a villain, although many stories depict him as believing his actions to be justified. Perhaps the character's best-known storyline is The Infinity Gauntlet (1991), the culmination of several story arcs that see him gather the six Infinity Gems and use them to kill half of the universe's population, including many of its heroes, to woo Mistress Death, the living embodimen

## 📌 Which One Should You Use?

| **Feature**           | **WikipediaLoader**         | **WebBaseLoader (Wikipedia URL)** |
|----------------------|--------------------------|----------------------------------|
| **Speed**           | ⚡ Fast (API-based)       | 🐢 Slower (Scraping)            |
| **Data Quality**    | ✅ Clean text only       | ❌ Includes page boilerplate (navigation menus, sidebars, link text) |
| **Works for Any URL?** | ❌ No, only Wikipedia    | ✅ Yes, any webpage             |
| **API or Scraping?** | 🌐 API-based (structured data) | 🔍 Scrapes full webpage        |
| **Best Use Case**   | Extracting Wikipedia summaries or full articles | Extracting generic webpage content |

🔹 **Use `WikipediaLoader` when** you need **clean and structured Wikipedia content** quickly.  
🔹 **Use `WebBaseLoader` when** you want to **scrape any webpage, including Wikipedia**, and can tolerate or filter out **navigation and other page boilerplate** in the text.  
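One way to cut the navigation text that a full-page scrape picks up is to keep only paragraph text. The sketch below uses Python's standard `html.parser` as a stand-in; it is not how `WebBaseLoader` works internally, and both the `ParagraphText` class and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser


class ParagraphText(HTMLParser):
    """Collect text that appears inside <p> tags and ignore everything else."""

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0    # number of <p> tags currently open
        self.chunks = []  # text fragments found inside paragraphs

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())


# Hypothetical page: a navigation bar followed by one article paragraph.
html = "<nav>Main menu Donate Log in</nav><p>Thanos is a supervillain.</p>"
parser = ParagraphText()
parser.feed(html)
article_text = " ".join(parser.chunks)
print(article_text)
```

Real pages need more care (nested markup, tables, captions), so treat this as a starting point for post-filtering scraped text.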


## **6. DirectoryLoader**

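The **`DirectoryLoader`** in LangChain loads every file in a directory that matches a **`glob` pattern**, parsing each one with the loader class passed as **`loader_cls`**. To load several file types from the same directory, create one `DirectoryLoader` per type.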

- https://python.langchain.com/docs/how_to/document_loader_directory/
---

## **🔹 Loading Different File Types with `DirectoryLoader`**
| **File Type** | **Glob Pattern** | **Additional Argument Needed?** |
|--------------|------------------|--------------------------------|
| **Text Files** (`.txt`) | `"**/*.txt"` | None by default (the default loader relies on the `unstructured` package); or pass `loader_cls=TextLoader` |
| **PDF Files** (`.pdf`) | `"**/*.pdf"` | `loader_cls=PyPDFLoader` |
| **Python Files** (`.py`) | `"**/*.py"` | `loader_cls=TextLoader` |
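To see which files a recursive pattern such as `**/*.txt` selects, here is a standard-library sketch run on a temporary directory. `load_directory` and the file names are hypothetical, and this is not `DirectoryLoader`'s implementation; it only mirrors the match-then-read idea.

```python
import tempfile
from pathlib import Path


def load_directory(root: str, pattern: str) -> list:
    """Read every file under root that matches the glob pattern as UTF-8 text."""
    documents = []
    for file_path in sorted(Path(root).glob(pattern)):
        if file_path.is_file():
            documents.append(
                {"page_content": file_path.read_text(encoding="utf-8"),
                 "metadata": {"source": str(file_path)}}
            )
    return documents


# Demo on a throwaway directory with hypothetical files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "nested").mkdir()
    (root / "notes.txt").write_text("top-level note", encoding="utf-8")
    (root / "nested" / "deep.txt").write_text("nested note", encoding="utf-8")
    (root / "script.py").write_text("print('hi')", encoding="utf-8")
    txt_docs = load_directory(tmp, "**/*.txt")
    py_docs = load_directory(tmp, "**/*.py")

print(len(txt_docs), len(py_docs))
```

The `**` segment matches the root directory and any depth of subdirectories, which is why both the top-level and the nested `.txt` file are picked up.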

---

## **🔹 Example: Load PDFs, Text, and Python Files**
```python
from langchain_community.document_loaders import DirectoryLoader, PyPDFLoader, TextLoader

# Root directory to scan (assumed here; point it at your own data folder)
path = "./some_data"

# Load text files
text_loader = DirectoryLoader(path, glob="**/*.txt", show_progress=True, use_multithreading=True, loader_cls=TextLoader)

# Load PDF files using PyPDFLoader
pdf_loader = DirectoryLoader(path, glob="**/*.pdf", show_progress=True, use_multithreading=True, loader_cls=PyPDFLoader)

# Load Python files as plain text
python_loader = DirectoryLoader(path, glob="**/*.py", show_progress=True, use_multithreading=True, loader_cls=TextLoader)

# Load all documents
text_docs = text_loader.load()
pdf_docs = pdf_loader.load()
python_docs = python_loader.load()
```

In [30]:
path = "./some_data/Text_Files"

In [31]:
from langchain_community.document_loaders import DirectoryLoader

loader = DirectoryLoader(path, glob="**/*.txt", show_progress=True, use_multithreading=True)
loaded_docs = loader.load()

100%|██████████| 4/4 [00:00<00:00, 28.67it/s]


In [32]:
len(loaded_docs)

4

In [33]:
loaded_docs

[Document(metadata={'source': 'some_data\\Text_Files\\Credentials.txt'}, page_content='Elasticsearch endpoint:\n\n[REDACTED]\n\nCloud ID:\n\n[REDACTED]\n\nAPI Key:\n\n[REDACTED]'),
 Document(metadata={'source': 'some_data\\Text_Files\\Instructions for Elastic .txt'}, page_content='1. open terminal 2. cd C:\\Users\\Seyed Barabadi\\Downloads\\Elastic\\Company_names_elastic_search 3. Create virtual env: python -m venv .venv 4. activate it: .venv\\Scripts\\activate 5. test flask: flask run 6. Ctrl +c to quit flask and pip install elasticsearch 7. pip freeze > requirements.txt 8. create a .env  file in which can be safely stored. ELASTIC_CLOUD_ID= ELASTIC_API_KEY= 9. check the connection by runing these command type: python then: from search im