## Document Loaders
https://python.langchain.com/docs/integrations/document_loaders/

### Text Loader

In [None]:
from langchain_community.document_loaders import TextLoader
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from dotenv import load_dotenv

load_dotenv()
model = ChatGoogleGenerativeAI(temperature=0, model="gemini-2.5-flash")

prompt = PromptTemplate(
    input_variables=["text"],
    template="You are a helpful assistant. Write a summary of the following text: {text}",
)

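parser = StrOutputParser()

# Load the whole file into a list containing a single Document.
loader = TextLoader("data/be-good.txt")
docs = loader.load()
# print(docs[0].page_content)  # uncomment to inspect the raw text
print(docs[0].metadata)

# Pipe prompt -> model -> parser, then summarize the loaded text.
chain = prompt | model | parser

chain.invoke({"text": docs[0].page_content})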

'In "Be Good," Paul Graham argues that highly successful startups often operate like charities in their early stages, focusing intensely on "making something people want" rather than immediately prioritizing revenue. He cites examples like Craigslist and early Google, which were indistinguishable from nonprofits in their initial phases, yet achieved immense success.\n\nGraham contends that this "benevolent" approach offers three key advantages:\n1.  **Morale:** It boosts founder morale, providing a sense of mission that helps them persist through difficult periods, much like caring for a "tamagotchi" (their users).\n2.  **Help:** It attracts external support from investors, customers, other companies, and especially top talent, as people are naturally inclined to help good causes.\n3.  **Compass:** It provides a clear, "stateless" decision-making framework: always doing what\'s best for the users, which simplifies complex choices.\n\nGraham suggests that companies like early Microsoft,
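
As a rough mental model of what the loader hands to the chain, the sketch below reads a text file and wraps it in one document object with `page_content` and a `source` entry in `metadata`. `SimpleDocument` and `load_text` are hypothetical stand-ins written for illustration; this is not LangChain's implementation.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_text(path: str, encoding: str = "utf-8") -> list[SimpleDocument]:
    """Read one text file and return it as a single-element list of documents."""
    text = Path(path).read_text(encoding=encoding)
    return [SimpleDocument(page_content=text, metadata={"source": str(path)})]


# Usage: write a small temporary file and load it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = Path(tmp_dir) / "sample.txt"
    sample_path.write_text("hello loader", encoding="utf-8")
    docs = load_text(str(sample_path))

print(docs[0].page_content)
```

The chain only ever sees `docs[0].page_content`, so any object exposing that attribute would work the same way in the `invoke` call.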

### CSV Loader

In [5]:
from langchain_community.document_loaders import CSVLoader
loader = CSVLoader("data/Street_Tree_List.csv")
data = loader.load()
data

[Document(metadata={'source': 'data/Street_Tree_List.csv', 'row': 0}, page_content="TreeID: 168225\nqLegalStatus: DPW Maintained\nqSpecies: Arbutus 'Marina' :: Hybrid Strawberry Tree\nqAddress: 2547 Vallejo St\nSiteOrder: 1\nqSiteInfo: Sidewalk: Curb side : Cutout\nPlantType: Tree\nqCaretaker: Private\nqCareAssistant: \nPlantDate: \nDBH: 0\nPlotSize: Width 4ft\nPermitNotes: \nXCoord: 6001190.70767\nYCoord: 2117587.71154\nLatitude: 37.794558498932695\nLongitude: -122.43986908930563\nLocation: (37.794558498932695, -122.43986908930563)\nFire Prevention Districts: 13\nPolice Districts: 9\nSupervisor Districts: 1\nZip Codes: 57\nNeighborhoods (old): 27\nAnalysis Neighborhoods: 30"),
 Document(metadata={'source': 'data/Street_Tree_List.csv', 'row': 1}, page_content="TreeID: 168228\nqLegalStatus: DPW Maintained\nqSpecies: Arbutus 'Marina' :: Hybrid Strawberry Tree\nqAddress: 2547 Vallejo St\nSiteOrder: 4\nqSiteInfo: Sidewalk: Curb side : Cutout\nPlantType: Tree\nqCaretaker: Private\nqCareAssi
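
The output shows one Document per CSV row, with each field rendered as a `column: value` line and the row index stored in the metadata. The sketch below reproduces that shape with the standard `csv` module. `rows_to_documents` and the three-column sample are hypothetical and only approximate what `CSVLoader` does.

```python
import csv
import io


def rows_to_documents(csv_text: str, source: str) -> list[dict]:
    """Turn each CSV row into a dict with 'page_content' and 'metadata' keys."""
    documents = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_index, row in enumerate(reader):
        # One "column: value" line per field, joined with newlines.
        content = "\n".join(f"{column}: {value}" for column, value in row.items())
        documents.append(
            {"page_content": content, "metadata": {"source": source, "row": row_index}}
        )
    return documents


# Hypothetical sample with a subset of the columns seen in the output above.
sample = "TreeID,PlantType,DBH\n168225,Tree,0\n168228,Tree,\n"
sample_docs = rows_to_documents(sample, source="sample.csv")
print(sample_docs[0]["page_content"])
```

Empty cells stay in the text as a bare `column: ` line, which matches fields such as `PlantDate: ` in the real output.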

### PDF Loader
PDF loaders extract content from .pdf files. LangChain offers several PDF loader classes, and the right choice depends on the kind of content in the file.

- Simple, clean PDFs: Use PyPDFLoader
- PDFs with tables/columns: Use PDFPlumberLoader
- Scanned/image PDFs: Use UnstructuredPDFLoader or AmazonTextractPDFLoader
- Need layout and image data: Use PyMuPDFLoader
- Want best structure extraction: Use UnstructuredPDFLoader
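
The list above can be written down as a small lookup table. This is only a sketch: the category keys (`"simple"`, `"tables"`, and so on) and both helper functions are hypothetical names, and each loader class needs its own extra package installed before it can be used.

```python
import importlib

# Mapping taken from the list above; the keys are hypothetical labels.
PDF_LOADER_BY_KIND = {
    "simple": "PyPDFLoader",
    "tables": "PDFPlumberLoader",
    "scanned": "UnstructuredPDFLoader",
    "layout": "PyMuPDFLoader",
    "structure": "UnstructuredPDFLoader",
}


def pick_pdf_loader_name(kind: str) -> str:
    """Return the loader class name suggested for a kind of PDF."""
    try:
        return PDF_LOADER_BY_KIND[kind]
    except KeyError:
        valid = ", ".join(sorted(PDF_LOADER_BY_KIND))
        raise ValueError(f"Unknown PDF kind {kind!r}; expected one of: {valid}") from None


def build_pdf_loader(kind: str, path: str):
    """Instantiate the suggested loader (requires langchain_community)."""
    # Imported lazily so the lookup table works without LangChain installed.
    module = importlib.import_module("langchain_community.document_loaders")
    loader_class = getattr(module, pick_pdf_loader_name(kind))
    return loader_class(path)


print(pick_pdf_loader_name("tables"))
```

Calling `build_pdf_loader("layout", "some.pdf").load()` would then behave like the `PyMuPDFLoader` cell that follows.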

In [4]:
# PyMuPDFLoader returns one Document per page, with the PDF metadata attached.
from langchain_community.document_loaders import PyMuPDFLoader
loader = PyMuPDFLoader("Attenstion all you Need.pdf")
data = loader.load()
data

[Document(metadata={'producer': 'pdfTeX-1.40.25', 'creator': 'LaTeX with hyperref', 'creationdate': '2024-04-10T21:11:43+00:00', 'source': 'Attenstion all you Need.pdf', 'file_path': 'Attenstion all you Need.pdf', 'total_pages': 15, 'format': 'PDF 1.5', 'title': '', 'author': '', 'subject': '', 'keywords': '', 'moddate': '2024-04-10T21:11:43+00:00', 'trapped': '', 'modDate': 'D:20240410211143Z', 'creationDate': 'D:20240410211143Z', 'page': 0}, page_content='Provided proper attribution is provided, Google hereby grants permission to\nreproduce the tables and figures in this paper solely for use in journalistic or\nscholarly works.\nAttention Is All You Need\nAshish Vaswani∗\nGoogle Brain\navaswani@google.com\nNoam Shazeer∗\nGoogle Brain\nnoam@google.com\nNiki Parmar∗\nGoogle Research\nnikip@google.com\nJakob Uszkoreit∗\nGoogle Research\nusz@google.com\nLlion Jones∗\nGoogle Research\nllion@google.com\nAidan N. Gomez∗†\nUniversity of Toronto\naidan@cs.toronto.edu\nŁukasz Kaiser∗\nGoogle B

### Web Loader

In [1]:
# WebBaseLoader fetches the URL and returns the page text as a Document.
from langchain_community.document_loaders import WebBaseLoader
loader = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/")
data = loader.load()
data

USER_AGENT environment variable not set, consider setting it to identify your requests.


[Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'title': "LLM Powered Autonomous Agents | Lil'Log", 'description': 'Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\nAgent System Overview\nIn a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:\n\nPlanning\n\nSubgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.\nReflection and refinement: The agent can do self-criticism and self-reflection over past actions, learn from mistakes and refine them for future steps, thereby improving the quality of final resu
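
The `USER_AGENT` warning in the output above can usually be silenced by setting the environment variable before the loader is imported. The sketch below assumes the warning is emitted at import time, which appears to be the case; the identifier string is a hypothetical placeholder.

```python
import os

# Hypothetical identifier; replace it with something that describes your application.
# setdefault keeps any value that is already present in the environment.
os.environ.setdefault("USER_AGENT", "document-loaders-notebook/0.1")

# Import the loader only after the variable is set:
# from langchain_community.document_loaders import WebBaseLoader

print(os.environ["USER_AGENT"])
```

Putting `USER_AGENT=...` in the `.env` file and calling `load_dotenv()` first should have the same effect.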

### WikipediaLoader

In [2]:
from langchain_community.document_loaders import WikipediaLoader

loader = WikipediaLoader(query="Tesla", load_max_docs=1)
data = loader.load()

In [3]:
data

[Document(metadata={'title': 'Nikola Tesla', 'summary': "Nikola Tesla (10 July 1856 – 7 January 1943) was a Serbian-American engineer, futurist, and inventor. He is known for his contributions to the design of the modern alternating current (AC) electricity supply system.\nBorn and raised in the Austrian Empire, Tesla first studied engineering and physics in the 1870s without receiving a degree. He then gained practical experience in the early 1880s working in telephony and at Continental Edison in the new electric power industry. In 1884, he immigrated to the United States, where he became a naturalized citizen. He worked for a short time at the Edison Machine Works in New York City before he struck out on his own. With the help of partners to finance and market his ideas, Tesla set up laboratories and companies in New York to develop a range of electrical and mechanical devices. His AC induction motor and related polyphase AC patents, licensed by Westinghouse Electric in 1888, earned

In [6]:
from langchain_core.prompts import ChatPromptTemplate

chat_template = ChatPromptTemplate.from_messages(
    [
        ("human", "Answer this {question}, here is some extra {context}"),
    ]
)

# The template only uses {question} and {context}, so no other variables are passed.
messages = chat_template.format_messages(
    question="Tell me about Tesla",
    context=data,
)

In [5]:
messages

[HumanMessage(content='Answer this Tell me about Tesla, here is some extra [Document(metadata={\'title\': \'Nikola Tesla\', \'summary\': "Nikola Tesla (10 July 1856 – 7 January 1943) was a Serbian-American engineer, futurist, and inventor. He is known for his contributions to the design of the modern alternating current (AC) electricity supply system.\\nBorn and raised in the Austrian Empire, Tesla first studied engineering and physics in the 1870s without receiving a degree. He then gained practical experience in the early 1880s working in telephony and at Continental Edison in the new electric power industry. In 1884, he immigrated to the United States, where he became a naturalized citizen. He worked for a short time at the Edison Machine Works in New York City before he struck out on his own. With the help of partners to finance and market his ideas, Tesla set up laboratories and companies in New York to develop a range of electrical and mechanical devices. His AC induction motor 
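
As the message above shows, passing the list of documents directly puts the Python `repr` of each Document (metadata, escaped quotes and all) into the prompt. One possible alternative is to join the page contents into plain text first. `format_context`, `SimpleDocument`, and the `max_chars` limit are hypothetical; the helper should work with any objects that expose `page_content` and `metadata`.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_context(docs, max_chars: int = 2000) -> str:
    """Join documents into readable text, truncating each one to max_chars."""
    parts = []
    for doc in docs:
        title = doc.metadata.get("title", "untitled")
        parts.append(f"Title: {title}\n{doc.page_content[:max_chars]}")
    return "\n\n".join(parts)


sample_docs = [
    SimpleDocument("Nikola Tesla was an engineer and inventor.", {"title": "Nikola Tesla"}),
    SimpleDocument("Alternating current reverses direction periodically.", {}),
]
context_text = format_context(sample_docs, max_chars=20)
print(context_text)
```

The result could then be passed as `context=format_context(data)` in `format_messages`.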

In [8]:
# LLM is assumed to be a chat model created in an earlier cell that is not shown here.
response = LLM.invoke(messages)

In [None]:
response.content

'<think>\nOkay, I need to answer the question "Tell me about Tesla" using the provided document. Let me start by reading through the document carefully to gather the key points.\n\nFirst, the document gives a detailed summary of Nikola Tesla\'s life. He was a Serbian-American engineer, futurist, and inventor known for his work on the AC electricity supply system. Born in 1856 in the Austrian Empire, he studied engineering and physics but didn\'t get a degree. He worked in telephony and at Continental Edison in the 1880s. Then he moved to the US in 1884 and became a naturalized citizen. He worked with Edison but eventually went on his own. His AC induction motor and polyphase system were licensed by Westinghouse in 1888. \n\nHe did experiments with mechanical oscillators, electrical discharge tubes, and early X-rays. He also made a wireless-controlled boat. He was known for his public demonstrations and showmanship. In the 1890s, he worked on wireless lighting and power distribution, le
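
The response above begins with a `<think>` block holding the model's reasoning. If only the final answer is wanted, one option is to strip that block with a regular expression. This sketch assumes the reasoning is closed by a matching `</think>` tag, which the truncated output does not show; `strip_reasoning` is a hypothetical helper.

```python
import re

# Non-greedy match so only the reasoning block and the whitespace after it are removed.
THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)


def strip_reasoning(text: str) -> str:
    """Remove <think>...</think> blocks and return the remaining answer text."""
    return THINK_BLOCK.sub("", text).strip()


raw = "<think>\nPlan the answer.\n</think>\n\nTesla was an engineer and inventor."
answer = strip_reasoning(raw)
print(answer)
```

Text without a complete `<think>...</think>` pair is returned unchanged apart from surrounding whitespace.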
