## Document loaders
DocumentLoaders load data into the standard LangChain Document format.

Each DocumentLoader has its own parameters, but all of them are invoked the same way: the `.load()` method returns a list of `Document` objects, each with a `page_content` string and a `metadata` dict. For example:

In [None]:

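from langchain_community.document_loaders import TextLoader

# TextLoader takes a single file path, so create one loader per file
data = []
for path in ["speech.txt", "speech_1.txt"]:
    loader = TextLoader(path, encoding="UTF-8")
    data.extend(loader.load())  # .load() returns a list of Document objects
print(data)
print(data[0].page_content)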


[Document(metadata={'source': 'speech.txt'}, page_content='The world must be made safe for democracy. Its peace must be planted upon the tested foundations of political liberty. We have no selfish ends to serve. We desire no conquest, no dominion. We seek no indemnities for ourselves, no material compensation for the sacrifices we shall freely make. We are but one of the champions of the rights of mankind. We shall be satisfied when those rights have been made as secure as the faith and the freedom of nations can make them.\n\nJust because we fight without rancor and without selfish object, seeking nothing for ourselves but what we shall wish to share with all free peoples, we shall, I feel confident, conduct our operations as belligerents without passion and ourselves observe with proud punctilio the principles of right and of fair play we profess to be fighting for.\n\nIt will be all the easier for us to conduct ourselves as belligerents in a high spirit of right and fairness because

### Load a PDF using PyPDFLoader

In [None]:
from langchain_community.document_loaders import PyPDFLoader  # requires the pypdf package
import pprint

loader = PyPDFLoader("attention.pdf")
data = loader.load()  # one Document per PDF page
print(data)
print(len(data), "pages")
print(data[0].metadata)
pprint.pp(data[1].metadata)  # pprint.pp is available in Python 3.8+

[Document(metadata={'producer': 'pdfTeX-1.40.25', 'creator': 'LaTeX with hyperref', 'creationdate': '2023-08-03T00:07:29+00:00', 'author': '', 'keywords': '', 'moddate': '2023-08-03T00:07:29+00:00', 'ptex.fullbanner': 'This is pdfTeX, Version 3.141592653-2.6-1.40.25 (TeX Live 2023) kpathsea version 6.3.5', 'subject': '', 'title': '', 'trapped': '/False', 'source': 'attention.pdf', 'total_pages': 15, 'page': 0, 'page_label': '1'}, page_content='Provided proper attribution is provided, Google hereby grants permission to\nreproduce the tables and figures in this paper solely for use in journalistic or\nscholarly works.\nAttention Is All You Need\nAshish Vaswani∗\nGoogle Brain\navaswani@google.com\nNoam Shazeer∗\nGoogle Brain\nnoam@google.com\nNiki Parmar∗\nGoogle Research\nnikip@google.com\nJakob Uszkoreit∗\nGoogle Research\nusz@google.com\nLlion Jones∗\nGoogle Research\nllion@google.com\nAidan N. Gomez∗ †\nUniversity of Toronto\naidan@cs.toronto.edu\nŁukasz Kaiser∗\nGoogle Brain\nlukaszk

## WebBaseLoader: load all text from HTML web pages into Documents

In [None]:
from langchain_community.document_loaders import WebBaseLoader

# Fetches each URL and extracts the page text with BeautifulSoup (bs4)
loader = WebBaseLoader(["https://www.example.com/"])
data = loader.load()  # one Document per URL
print(data[0].metadata)
print(data)

{'source': 'https://www.example.com/', 'title': 'Example Domain', 'language': 'No language found.'}
[Document(metadata={'source': 'https://www.example.com/', 'title': 'Example Domain', 'language': 'No language found.'}, page_content='\n\n\nExample Domain\n\n\n\n\n\n\n\nExample Domain\nThis domain is for use in illustrative examples in documents. You may use this\n    domain in literature without prior coordination or asking for permission.\nMore information...\n\n\n\n')]


## ArxivLoader
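arXiv is an open-access archive of 2 million scholarly articles in the fields of physics, mathematics, computer science, quantitative biology, quantitative finance, statistics, electrical engineering and systems science, and economics. ArxivLoader runs a query against arXiv (below, the arXiv ID 1706.03762, the "Attention Is All You Need" paper) and loads the matching papers as Documents.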

In [None]:
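from langchain_community.document_loaders import ArxivLoader

loader = ArxivLoader(
    query="1706.03762",  # an arXiv ID or a free-text search query
    load_max_docs=2,  # upper bound on the number of papers loaded
    # load_all_available_meta=False  # set True to keep every metadata field arXiv returns
)
docs = loader.load()
print(len(docs))
print(docs[0].metadata)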


## Wikipedia Loader

In [82]:
from langchain_community.document_loaders import WikipediaLoader

In [95]:
# Load at most 2 articles; each page_content is cut off after 7000 characters
docs = WikipediaLoader(query="HUNTER X HUNTER", load_max_docs=2, doc_content_chars_max=7000).load()
len(docs)

2

In [96]:
docs

[Document(metadata={'title': 'Hunter × Hunter', 'summary': 'Hunter × Hunter (pronounced "hunter hunter") is a Japanese manga series written and illustrated by Yoshihiro Togashi. It has been serialized in Shueisha\'s shōnen manga magazine Weekly Shōnen Jump since March 1998, although the manga has frequently gone on extended hiatuses since 2006. Its chapters have been collected in 38 tankōbon volumes as of September 2024. The story focuses on a young boy named Gon Freecss who discovers that his father, who left him at a young age, is actually a world-renowned Hunter, a licensed professional who specializes in fantastical pursuits such as locating rare or unidentified animal species, treasure hunting, surveying unexplored enclaves, or hunting down lawless individuals. Gon departs on a journey to become a Hunter and eventually find his father. Along the way, Gon meets various other Hunters and encounters the paranormal.\nHunter × Hunter was adapted into a 62-episode anime television serie

In [97]:
docs[0].metadata

{'title': 'Hunter × Hunter',
 'summary': 'Hunter × Hunter (pronounced "hunter hunter") is a Japanese manga series written and illustrated by Yoshihiro Togashi. It has been serialized in Shueisha\'s shōnen manga magazine Weekly Shōnen Jump since March 1998, although the manga has frequently gone on extended hiatuses since 2006. Its chapters have been collected in 38 tankōbon volumes as of September 2024. The story focuses on a young boy named Gon Freecss who discovers that his father, who left him at a young age, is actually a world-renowned Hunter, a licensed professional who specializes in fantastical pursuits such as locating rare or unidentified animal species, treasure hunting, surveying unexplored enclaves, or hunting down lawless individuals. Gon departs on a journey to become a Hunter and eventually find his father. Along the way, Gon meets various other Hunters and encounters the paranormal.\nHunter × Hunter was adapted into a 62-episode anime television series by Nippon Animat

In [98]:
docs[0]

Document(metadata={'title': 'Hunter × Hunter', 'summary': 'Hunter × Hunter (pronounced "hunter hunter") is a Japanese manga series written and illustrated by Yoshihiro Togashi. It has been serialized in Shueisha\'s shōnen manga magazine Weekly Shōnen Jump since March 1998, although the manga has frequently gone on extended hiatuses since 2006. Its chapters have been collected in 38 tankōbon volumes as of September 2024. The story focuses on a young boy named Gon Freecss who discovers that his father, who left him at a young age, is actually a world-renowned Hunter, a licensed professional who specializes in fantastical pursuits such as locating rare or unidentified animal species, treasure hunting, surveying unexplored enclaves, or hunting down lawless individuals. Gon departs on a journey to become a Hunter and eventually find his father. Along the way, Gon meets various other Hunters and encounters the paranormal.\nHunter × Hunter was adapted into a 62-episode anime television series

In [99]:
pprint.pp(docs[0].page_content)

('Hunter × Hunter (pronounced "hunter hunter") is a Japanese manga series '
 'written and illustrated by Yoshihiro Togashi. It has been serialized in '
 "Shueisha's shōnen manga magazine Weekly Shōnen Jump since March 1998, "
 'although the manga has frequently gone on extended hiatuses since 2006. Its '
 'chapters have been collected in 38 tankōbon volumes as of September 2024. '
 'The story focuses on a young boy named Gon Freecss who discovers that his '
 'father, who left him at a young age, is actually a world-renowned Hunter, a '
 'licensed professional who specializes in fantastical pursuits such as '
 'locating rare or unidentified animal species, treasure hunting, surveying '
 'unexplored enclaves, or hunting down lawless individuals. Gon departs on a '
 'journey to become a Hunter and eventually find his father. Along the way, '
 'Gon meets various other Hunters and encounters the paranormal.\n'
 'Hunter × Hunter was adapted into a 62-episode anime television series by '
 'Ni