# Loading data using LangChain
### langchain_community.document_loaders
#### Loading text

In [1]:
from langchain_community.document_loaders import TextLoader

loader = TextLoader("doc.txt")
loader

<langchain_community.document_loaders.text.TextLoader at 0x186056e58b0>

In [8]:
doc = loader.load()
doc, doc[0], type(doc), type(doc[0])

([Document(metadata={'source': 'doc.txt'}, page_content='Hey, welcome to this document.')],
 Document(metadata={'source': 'doc.txt'}, page_content='Hey, welcome to this document.'),
 list,
 langchain_core.documents.base.Document)
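
The output above shows that `load()` returns a list holding one `Document`, which pairs the file's text (`page_content`) with a `metadata` dict recording the source path. The following is a minimal standard-library sketch of that shape only. `SimpleDocument` and `load_text` are hypothetical stand-ins invented for illustration, not LangChain classes or functions, and the real loader may do more (for example, encoding detection).

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a loaded document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_text(path: str, encoding: str = "utf-8") -> list:
    """Read one text file and wrap it in a single-element list of documents."""
    text = Path(path).read_text(encoding=encoding)
    # Record where the text came from, mirroring the 'source' key seen above.
    return [SimpleDocument(page_content=text, metadata={"source": path})]
```

Calling `load_text("doc.txt")[0].page_content` would then give the raw text, the same attribute access that works on the `doc[0]` object above.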

#### Loading PDF

In [11]:
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader(file_path="./Essay.pdf")
loader

<langchain_community.document_loaders.pdf.PyPDFLoader at 0x186058f1f70>

In [15]:
pdf = loader.load()
pdf[0]

Document(metadata={'source': './Essay.pdf', 'page': 0}, page_content="Why one should write?  \n \nWhen we look into the lives of some of the most successful leaders, ancient philosophers, innovators, \nphilanthropists from ancient times to modern times, there is one quality that distinguishes them from \nthe rest: Clarity of thought!  It is this clarity of thought that helps them build great organizations, \nlead people, bring new innova tions and inspire generations.  \nSo the ultimate question revolves around how they became so clear with their thoughts? Steve Jobs \nwas very clear about what he wanted his produ cts to be and what kind of people he wanted in his \norganization. Chanakya was very clear with his vision of a united India and its leadership. Lee Kuan \nYew, a brilliant statesman, was clear with his vision of a modern and economic powerhouse island \nstate, Singapore. Modi is clear with his vision of India -2047 in 2022. Elon Musk was clear with his \nvision of reusable r
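
The `'page': 0` entry in the metadata indicates that the PDF is split into one `Document` per page. If the whole essay is needed as a single string, the pages can be stitched back together. The sketch below assumes only that each item exposes `page_content` and `metadata`, as the output above does. `join_pages` is a hypothetical helper, and the two sample pages are made-up stand-ins, not content from `Essay.pdf`.

```python
from types import SimpleNamespace


def join_pages(pages, separator="\n\n"):
    """Concatenate page texts in ascending order of metadata['page']."""
    ordered = sorted(pages, key=lambda d: d.metadata.get("page", 0))
    return separator.join(d.page_content.strip() for d in ordered)


# Stand-in objects with the same two attributes as the loader's documents.
pages = [
    SimpleNamespace(page_content="Second page. ", metadata={"source": "./Essay.pdf", "page": 1}),
    SimpleNamespace(page_content="First page.\n", metadata={"source": "./Essay.pdf", "page": 0}),
]
full_text = join_pages(pages)
```

Because the helper relies on attribute access only, it should also accept the `pdf` list returned by `loader.load()`.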

#### Web-based loader

In [29]:
from langchain_community.document_loaders import WebBaseLoader
import bs4

loader = WebBaseLoader(
    web_path="https://ai.google/",
    show_progress=True,
    # Parse only elements with these CSS classes instead of the whole page.
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(class_=("home-banner", "glue-grid"))
    ),
)
loader

<langchain_community.document_loaders.web_base.WebBaseLoader at 0x1861e7ebf80>

In [30]:
web_content = loader.load()
web_content[0]



#### arXiv papers

In [33]:
from langchain_community.document_loaders import ArxivLoader

loader = ArxivLoader(query="2408.13634")
loader

<langchain_community.document_loaders.arxiv.ArxivLoader at 0x1861e87a9f0>

In [34]:
doc = loader.load()
doc

[Document(metadata={'Published': '2024-08-24', 'Title': 'Enhanced Astronomical Source Classification with Integration of Attention Mechanisms and Vision Transformers', 'Authors': 'Srinadh Reddy Bhavanam, Sumohana S. Channappayya, P. K. Srijith, Shantanu Desai', 'Summary': "Accurate classification of celestial objects is essential for advancing our\nunderstanding of the universe. MargNet is a recently developed deep\nlearning-based classifier applied to SDSS DR16 dataset to segregate stars,\nquasars, and compact galaxies using photometric data. MargNet utilizes a\nstacked architecture, combining a Convolutional Neural Network (CNN) for image\nmodelling and an Artificial Neural Network (ANN) for modelling photometric\nparameters. In this study, we propose enhancing MargNet's performance by\nincorporating attention mechanisms and Vision Transformer (ViT)-based models\nfor processing image data. The attention mechanism allows the model to focus on\nrelevant features and capture intricate p

#### Wikipedia Loader

In [37]:
from langchain_community.document_loaders import WikipediaLoader

loader = WikipediaLoader(query="India", lang="en", doc_content_chars_max=25)
loader

<langchain_community.document_loaders.wikipedia.WikipediaLoader at 0x1861dddda00>

In [38]:
doc = loader.load()
doc

[Document(metadata={'title': 'India', 'summary': "India, officially the Republic of India (ISO: Bhārat Gaṇarājya), is a country in South Asia.  It is the seventh-largest country by area; the most populous country from June 2023 and from the time of its independence in 1947, the world's most populous democracy. Bounded by the Indian Ocean on the south, the Arabian Sea on the southwest, and the Bay of Bengal on the southeast, it shares land borders with Pakistan to the west; China, Nepal, and Bhutan to the north; and Bangladesh and Myanmar to the east. In the Indian Ocean, India is in the vicinity of Sri Lanka and the Maldives; its Andaman and Nicobar Islands share a maritime border with Thailand, Myanmar, and Indonesia.\nModern humans arrived on the Indian subcontinent from Africa no later than 55,000 years ago.\nTheir long occupation, initially in varying forms of isolation as hunter-gatherers, has made the region highly diverse, second only to Africa in human genetic diversity. Settle

#### YouTube loader

In [43]:
from langchain_community.document_loaders import YoutubeLoader

loader = YoutubeLoader(video_id="EkdrxF5YL24")
loader

<langchain_community.document_loaders.youtube.YoutubeLoader at 0x1861e925400>

In [44]:
transcript = loader.load()
transcript

[Document(metadata={'source': 'EkdrxF5YL24'}, page_content='this brings us to the magic address feature which can help you reduce RTO by improving the accuracy of addresses which help ensure higher successful delivery rates all you need to do is simply enter the address and click on fill magic address will automatically fill all the address Fields click on next')]

In [47]:
import os
from dotenv import load_dotenv
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

load_dotenv()
GOOGLE_API_KEY = os.getenv("GOOGLEAI_API_KEY")

llm = ChatGoogleGenerativeAI(
    model="gemini-1.5-flash",
    google_api_key=GOOGLE_API_KEY,
    temperature=1
)

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You summaries the given document in 2 or 3 points."),
        ("human", "{input}")
    ]
)

parser = StrOutputParser()

chain = prompt|llm|parser

# Pass the transcript text rather than the whole Document object.
print(chain.invoke({"input": transcript[0].page_content}))



Here are the key points of the document:

* **Magic Address feature reduces RTO (Return to Origin):** This feature helps improve delivery rates by automatically verifying and correcting addresses. 
* **Simple to use:**  Users simply enter the address, click "Fill Magic Address", and the tool will populate the address fields correctly. 
* **Next step:** After the address is filled, users can click "Next" to proceed. 
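
The `prompt|llm|parser` expression in the cell above chains three stages so that each stage's output becomes the next stage's input. As a rough illustration of that idea only, here is a toy sketch built on Python's `__or__` operator. `Step` and the three `fake_*` stages are hypothetical stand-ins; this is not how LangChain implements its runnables, and no model is called.

```python
class Step:
    """Toy runnable: wraps a function and supports `|` composition."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # a | b returns a new Step that runs a, then feeds its result to b.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)


# Hypothetical stand-ins for the prompt, model, and parser stages.
fake_prompt = Step(lambda inputs: f"Summarize in 2 or 3 points: {inputs['input']}")
fake_llm = Step(lambda text: {"content": text.upper()})
fake_parser = Step(lambda message: message["content"])

toy_chain = fake_prompt | fake_llm | fake_parser
result = toy_chain.invoke({"input": "magic address"})
```

Here the fake model just upper-cases its input, which makes it easy to see the data passing through all three stages in order.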

