### RAG - Document Loaders

##### Boilerplate code

In [None]:
import os

import langchain
from dotenv import load_dotenv
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_openai import ChatOpenAI

# Read GOOGLE_API_KEY and OPENAI_API_KEY from a local .env file, if present.
load_dotenv()

google_api_key = os.getenv("GOOGLE_API_KEY")
openai_api_key = os.getenv("OPENAI_API_KEY")

# temperature=0 keeps responses as deterministic as the providers allow.
google_llm = ChatGoogleGenerativeAI(
    temperature=0,
    model="gemini-2.0-flash",
    api_key=google_api_key,
    max_tokens=200,
)

openai_llm = ChatOpenAI(
    temperature=0,
    model="gpt-4",
    api_key=openai_api_key,
)
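
`os.getenv` returns `None` when a variable is missing, so a misconfigured `.env` file may only surface later as an authentication error. A minimal sketch of failing fast instead, assuming a hypothetical helper named `require_env` (not part of LangChain or python-dotenv):

```python
import os


def require_env(name: str) -> str:
    """Return an environment variable's value, or raise a clear error.

    Hypothetical helper for illustration; not a LangChain or dotenv API.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set; check your .env file")
    return value
```

It could be called right after `load_dotenv()`, for example `google_api_key = require_env("GOOGLE_API_KEY")`.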

##### TextLoader

In [19]:
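from langchain_community.document_loaders import TextLoader

# The file contains non-ASCII characters (curly quotes), so set the encoding
# explicitly instead of relying on the platform default.
loader = TextLoader('./docs_for_rag/coolie_large.txt', encoding='utf-8')

# TextLoader returns the whole file as a single Document.
documents = loader.load()

for document in documents:
    print(document)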

page_content='Devaraj “Deva” runs a boarding house where he takes care of his late friend Rajasekhar’s daughter Preethi and although Rajasekhar’s sudden death is officially blamed on a heart attack Deva immediately suspects foul play and begins to investigate uncovering a dangerous criminal syndicate led by Simon and his men Dayalan (Dayal) and Kalyani who not only smuggle gold and luxury goods but also secretly kill people using a special cremation-chair device to dispose of bodies with Dayal even murdering an undercover policeman disguised as a coolie while Preethi who knows how the device works becomes a direct target of their sinister operations forcing Deva to step in to protect her gradually revealing shocking truths about the gang including Kalyani’s hidden identity Preethi’s actual relation as Deva’s daughter and Simon’s past connections to Deva’s own history recalling that years ago Deva was a union leader in Mandwa leading coolies against exploitation and that Simon is the so
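
The output above is one document whose `page_content` holds the full file text. A rough standard-library sketch of that behaviour, assuming a hypothetical `SimpleDocument` class as a stand-in for LangChain's `Document` and a hypothetical `load_text_file` function; it illustrates the idea and is not TextLoader's actual implementation:

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_text_file(path: str, encoding: str = "utf-8") -> list[SimpleDocument]:
    """Read a whole file into one document and record where it came from."""
    text = Path(path).read_text(encoding=encoding)
    return [SimpleDocument(page_content=text, metadata={"source": path})]
```

Because the whole file becomes one document, a large text file usually needs to be split into smaller chunks before it is embedded for retrieval.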

##### CSVLoader

In [20]:
from langchain_community.document_loaders.csv_loader import CSVLoader

loader = CSVLoader('./docs_for_rag/cars.csv')

# CSVLoader produces one Document per CSV row.
data = loader.load()

# Each row is rendered as "column: value" lines.
for document in data:
    print(document.page_content, "\n")

Brand: Maruti Suzuki
Model: Swift
Year: 2023
Engine_CC: 1197
Fuel_Type: Petrol
Transmission: Manual
Price_Lakh: 8.5
Mileage_kmpl: 22.3 

Brand: Hyundai
Model: Creta
Year: 2024
Engine_CC: 1497
Fuel_Type: Diesel
Transmission: Automatic
Price_Lakh: 17.8
Mileage_kmpl: 19.1 

Brand: Tata
Model: Nexon EV
Year: 2023
Engine_CC: 0
Fuel_Type: Electric
Transmission: Automatic
Price_Lakh: 15.9
Mileage_kmpl: 320 km/charge 

Brand: Mahindra
Model: Scorpio N
Year: 2024
Engine_CC: 2198
Fuel_Type: Diesel
Transmission: Manual
Price_Lakh: 20.3
Mileage_kmpl: 16.2 

Brand: Honda
Model: Amaze
Year: 2022
Engine_CC: 1199
Fuel_Type: Petrol
Transmission: CVT
Price_Lakh: 9.7
Mileage_kmpl: 18.6 

Brand: Kia
Model: Seltos
Year: 2024
Engine_CC: 1493
Fuel_Type: Diesel
Transmission: Automatic
Price_Lakh: 18.9
Mileage_kmpl: 19.8 

Brand: Toyota
Model: Innova Hycross
Year: 2023
Engine_CC: 1987
Fuel_Type: Hybrid
Transmission: Automatic
Price_Lakh: 28.5
Mileage_kmpl: 21.1 

Brand: Skoda
Model: Slavia
Year: 2024
Engine_CC
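
Each row in the output above appears as one block of `column: value` lines. A minimal sketch of that row-to-text formatting using only the standard library; `rows_to_text` is a hypothetical name, and the sample header is inferred from the keys printed above rather than read from the real `cars.csv`:

```python
import csv
import io


def rows_to_text(csv_text: str) -> list[str]:
    """Turn each CSV row into 'column: value' lines, one string per row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return ["\n".join(f"{key}: {value}" for key, value in row.items()) for row in reader]


# Assumed sample: header inferred from the printed keys, first row from the output.
sample = (
    "Brand,Model,Year,Engine_CC,Fuel_Type,Transmission,Price_Lakh,Mileage_kmpl\n"
    "Maruti Suzuki,Swift,2023,1197,Petrol,Manual,8.5,22.3\n"
)

print(rows_to_text(sample)[0])
```

This is an approximation of the visible format, not CSVLoader's own code.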

##### WebBaseLoader

In [22]:
from langchain_community.document_loaders import WebBaseLoader

# Single-page variant:
# loader = WebBaseLoader("https://www.orkut.com/")
# docs = loader.load()

# Pass a list of URLs to load several pages with one loader.
loader_multiple_pages = WebBaseLoader(
    ["https://www.orkut.com/", "https://google.com", "https://facebook.com", "https://linkedin.com", "https://x.com"]
)

# lazy_load() returns an iterator, so each page is fetched as the loop reaches it.
docs = loader_multiple_pages.lazy_load()

for doc in docs:
    print(doc, "\n")

page_content='








orkut










languages:

English


Português





Hi there,
I’m Orkut. Seventeen years ago I started a little social network while I was an engineer at Google. In just a few years, that social network - orkut.com - grew to a community of over 300 million people.
I believe that orkut.com found a community because it brought so many diverse voices from around the world together in one place. We worked hard to make orkut.com a community where hate and disinformation were not tolerated. We worked hard to make orkut.com a community where you could go meet real people who shared your interests, not just people who liked and commented on your photos.
The world needs kindness now more than ever. There is so much hate online these days, and our options for finding and building real connections are few and far between. I’ve always believed that a friendship is more than a friend request, and I have dedicated my life to helping millions of you build authentic connections
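
Since `lazy_load()` returns an iterator rather than a list, `docs` in the cell above can be traversed only once. A small sketch of that behaviour with a plain generator; `fake_lazy_load` is a hypothetical stand-in that does no network access and is not WebBaseLoader's code:

```python
from typing import Iterator


def fake_lazy_load(urls: list[str]) -> Iterator[str]:
    """Hypothetical stand-in for lazy_load(): yields one item per URL."""
    for url in urls:
        # A real loader would fetch and parse the page here.
        yield f"document for {url}"


docs = fake_lazy_load(["https://www.orkut.com/", "https://google.com"])
first_pass = list(docs)   # consumes the iterator
second_pass = list(docs)  # the iterator is exhausted, so this is empty
```

If the documents are needed more than once, calling `load()` or wrapping the iterator in `list(...)` keeps them in memory.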

##### UnstructuredLoader - Loading Images and PDFs

In [23]:
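from langchain_unstructured import UnstructuredLoader

# One loader can take several files of different types.
file_paths = [
    './docs_for_rag/images.jpeg',
    './docs_for_rag/nexon_brochure.pdf'
]

try:
    loader = UnstructuredLoader(file_paths)
    docs = loader.load()
    for doc in docs:
        if doc.page_content:
            print(doc.page_content, "\n")
        else:
            # Reached for any extracted element without text, such as an
            # image with nothing readable in it.
            print("No text content found in the image")
except Exception as e:
    # Parsing depends on optional system packages, so report failures clearly.
    print(f"Error: {e}")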

INFO: Reading image file: ./docs_for_rag/images.jpeg ...


No text content found in the image
ex Tata New 

Tata Neon Al Variants New Prices 

INTRODUCING 2025 

More Style. More Safety. More Tech. 

THE NEXON PHILOSOPHY 

People are its true inspiration. 

People who think far ahead. 

Who go the extra mile. And stay ahead of the curve. 

The Nexon is simply the icon of this attitude. 

It's beyond just a car, 

It’s an aspiration. It’s an ideology. 

A belief in blazing new trails. 

And moving ahead. 

PERSONAS 

Find the Nexon that matches you 

FEARLESS 

CREATIVE 

No is never your answer. 

Your inner child is creative. 

To this adventure called life. 

Wide-eyed and inquisitive. 

Be it a long drive. 

The world is yours to explore. 

Or a cross-country drive. 

You drive your passion. 

The answer is always yes. 

PURE 

You live in the moment. 

Enjoying the smallest of joys. 

You pride in being yourself. 

And lead a life of ultimate sophistication. 

SMART 

Pragmatism is your thing. 

You believe in results. 

Smart work over ha

### And much more: refer to the LangChain document loaders documentation