# LangChain Document Loaders

Reference: https://python.langchain.com/docs/integrations/document_loaders/

##### TextLoader

In [1]:
# TextLoader reads the whole file into a single Document
from langchain_community.document_loaders import TextLoader
loader = TextLoader("data/charan.txt")
loader

<langchain_community.document_loaders.text.TextLoader at 0x26adda34bc0>

In [2]:
text_documents = loader.load()
text_documents

[Document(metadata={'source': 'data/charan.txt'}, page_content="Welcome to my GitHub!\n\nLet me briefly introduce myself.\nI am Vummethala Venkata Sri Datta Charan. I completed my Bachelor's degree in Computer Science and Engineering at PDPM IIITDM Jabalpur, graduating in 2024. During my time at the institute, I gained a solid understanding of databases and worked with both front-end and back-end technologies such as React, Next.js, JavaScript, and Node.js. I have built some amazing applications using these technologies.\n\nAfter graduation, I joined Infosys as a Specialist Programmer. There, I had the opportunity to learn and work on Generative AI and Responsible AI principles, including safety, security, privacy, explainability, bias, and transparency. I feel privileged to work in such a high-demand and future-focused domain.\n\nRegarding my JEE journey, I secured a 98.97 percentile in JEE Main with an All India Rank of 11,627. I also qualified JEE Advanced with a score of 37% and ac
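The raw list printed above is hard to scan. A minimal sketch of a summary helper follows, assuming only that each item has `page_content` and `metadata` attributes as in the output above. The name `preview_documents` and the `SimpleNamespace` stand-in are hypothetical and not part of LangChain.

```python
from types import SimpleNamespace


def preview_documents(docs, width=80):
    """Return one summary line per document: index, source, length, snippet."""
    lines = []
    for i, doc in enumerate(docs):
        source = doc.metadata.get("source", "<unknown>")
        # Collapse all whitespace so the snippet fits on a single line.
        snippet = " ".join(doc.page_content.split())[:width]
        lines.append(f"[{i}] {source} ({len(doc.page_content)} chars): {snippet}")
    return lines


# Stand-in with the same two attributes as a LangChain Document (assumption).
sample = SimpleNamespace(
    page_content="Welcome to my GitHub!\n\nLet me briefly introduce myself.",
    metadata={"source": "data/charan.txt"},
)

for line in preview_documents([sample]):
    print(line)
```

In the notebook, the same call should work on `text_documents` directly, since it relies only on those two attributes.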

##### PyPDFLoader

In [3]:
# PyPDFLoader needs the pypdf package; by default each PDF page becomes one Document
from langchain_community.document_loaders import PyPDFLoader
loader = PyPDFLoader("data/charan.pdf")

In [4]:
documents = loader.load()
documents

[Document(metadata={'producer': 'Microsoft® Word 2021', 'creator': 'Microsoft® Word 2021', 'creationdate': '2025-05-30T15:52:28+05:30', 'author': 'VUMMETHALA VENKATA SRI DATTA CHARAN', 'moddate': '2025-05-30T15:52:28+05:30', 'source': 'data/charan.pdf', 'total_pages': 1, 'page': 0, 'page_label': '1'}, page_content="Welcome to my GitHub! \n \nLet me briefly introduce myself. \nI am Vummethala Venkata Sri Datta Charan. I completed my Bachelor's degree in Computer Science \nand Engineering at PDPM IIITDM Jabalpur, graduating in 2024. During my time at the institute, I \ngained a solid understanding of databases and worked with both front-end and back-end \ntechnologies such as React, Next.js, JavaScript, and Node.js. I have built some amazing applications \nusing these technologies. \n \nAfter graduation, I joined Infosys as a Specialist Programmer. There, I had the opportunity to learn \nand work on Generative AI and Responsible AI principles, including safety, security, privacy, \nexpla
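Compared with the text file, the PDF text above carries hard line wraps (`" \n"`) inside paragraphs and `" \n \n"` between them. One possible cleanup is sketched below, assuming a whitespace-only line marks a paragraph break. The helper name `normalize_pdf_text` is hypothetical and not part of LangChain or pypdf.

```python
import re


def normalize_pdf_text(text: str) -> str:
    """Join hard-wrapped lines within a paragraph; keep paragraph breaks."""
    # A newline, optional spaces/tabs, then another newline ends a paragraph.
    paragraphs = re.split(r"\n[ \t]*\n", text)
    # Within a paragraph, collapse every run of whitespace to one space.
    cleaned = [" ".join(p.split()) for p in paragraphs]
    return "\n\n".join(p for p in cleaned if p)


raw = (
    "Welcome to my GitHub! \n \n"
    "Let me briefly introduce myself. \n"
    "I am Vummethala Venkata Sri Datta Charan."
)
print(normalize_pdf_text(raw))
```

This is a heuristic: it would also merge lines that were meant to stay separate, such as list items or table rows, so it suits flowing prose like the page shown here.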

##### WebBaseLoader

In [5]:
# WebBaseLoader fetches each URL and extracts the page text (needs beautifulsoup4)
from langchain_community.document_loaders import WebBaseLoader
loader = WebBaseLoader(web_paths=("https://medium.com/@vipra_singh/ai-agents-introduction-part-1-fbec7edb857d",))

USER_AGENT environment variable not set, consider setting it to identify your requests.
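The warning above can be avoided by setting `USER_AGENT` in the environment. The variable appears to be read when the loader module is first imported, so the sketch below sets it before that import. The value is a placeholder chosen for illustration, not something the library requires.

```python
import os

# Placeholder identifier (assumption); use something that describes your client.
# setdefault keeps any value that is already present in the environment.
os.environ.setdefault("USER_AGENT", "document-loaders-notebook/0.1")

# Import the loader only after the variable is set:
# from langchain_community.document_loaders import WebBaseLoader

print(os.environ["USER_AGENT"])
```

In a notebook where the import has already run, the kernel likely needs a restart for the new value to take effect.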


In [6]:
web_documents = loader.load()
web_documents

[Document(metadata={'source': 'https://medium.com/@vipra_singh/ai-agents-introduction-part-1-fbec7edb857d', 'title': 'AI Agents: Introduction (Part-1). Discover AI agents, their design, and… | by Vipra Singh | Medium', 'description': '· 1. From LLMs to AI agents\n ∘ 1.1 Traditional chatbots to LLM-powered chatbots\n ∘ 1.2 Introduction of LLM-Powered Chatbots\n ∘ 1.3 From LLM-Powered Chatbots to RAG Chatbots and AI Agents\n· 2. What…', 'language': 'en'}, page_content='AI Agents: Introduction (Part-1). Discover AI agents, their design, and… | by Vipra Singh | MediumSitemapOpen in appSign upSign inMedium LogoWriteSign upSign inMember-only storyAI Agents: Introduction (Part-1)Vipra SinghFollow13 min read·Feb 2, 2025--52ShareDiscover AI agents, their design, and real-world applications.Posts in this SeriesIntroduction (This Post)Build an Agent from ScratchAI Agent FrameworksTypes of AI AgentsWorkflow vs AgentAgent ArchitecturesMulti-Agent ArchitecturesBuilding Multi-Agent SystemShort-Term a

##### ArxivLoader

In [7]:
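# ArxivLoader needs the arxiv and pymupdf packages
# Here the query is an arXiv ID; load_max_docs caps the number of papers returned
from langchain_community.document_loaders import ArxivLoader
loader = ArxivLoader(query="2408.12935", load_max_docs=2)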

In [8]:
docs = loader.load()
docs

[Document(metadata={'Published': '2025-01-15', 'Title': 'Trustworthy, Responsible, and Safe AI: A Comprehensive Architectural Framework for AI Safety with Challenges and Mitigations', 'Authors': 'Chen Chen, Xueluan Gong, Ziyao Liu, Weifeng Jiang, Si Qi Goh, Kwok-Yan Lam', 'Summary': "AI Safety is an emerging area of critical importance to the safe adoption and\ndeployment of AI systems. With the rapid proliferation of AI and especially\nwith the recent advancement of Generative AI (or GAI), the technology ecosystem\nbehind the design, development, adoption, and deployment of AI systems has\ndrastically changed, broadening the scope of AI Safety to address impacts on\npublic safety and national security. In this paper, we propose a novel\narchitectural framework for understanding and analyzing AI Safety; defining its\ncharacteristics from three perspectives: Trustworthy AI, Responsible AI, and\nSafe AI. We provide an extensive review of current research and advancements in\nAI safety fr

##### WikipediaLoader

In [9]:
# WikipediaLoader needs the wikipedia package; load_max_docs caps the number of pages loaded
from langchain_community.document_loaders import WikipediaLoader
loader = WikipediaLoader(query="Generative AI", load_max_docs=2)

In [10]:
docs = loader.load()
docs

[Document(metadata={'title': 'Generative artificial intelligence', 'summary': 'Generative artificial intelligence (Generative AI, GenAI, or GAI) is a subfield of artificial intelligence that uses generative models to produce text, images, videos, or other forms of data. These models learn the underlying patterns and structures of their training data and use them to produce new data based on the input, which often comes in the form of natural language prompts.\nGenerative AI tools have become more common since an "AI boom" in the 2020s. This boom was made possible by improvements in transformer-based deep neural networks, particularly large language models (LLMs). Major tools include chatbots such as ChatGPT, DeepSeek, Copilot, Gemini, Llama, and Grok; text-to-image artificial intelligence image generation systems such as Stable Diffusion, Midjourney, and DALL-E; and text-to-video AI generators such as Sora. Technology companies developing generative AI include OpenAI, Anthropic, Micros
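The outputs in this notebook show that metadata keys differ by loader: the text and PDF loaders record `source`, the arXiv loader uses capitalized keys such as `Title`, and the web and Wikipedia loaders use lowercase `title`. A small sketch of one way to read a title uniformly follows. The helper `document_title` is hypothetical, and the sample dictionaries copy only keys visible in the outputs above.

```python
def document_title(metadata: dict) -> str:
    """Return a display title from loader metadata, whatever key casing is used."""
    for key in ("title", "Title"):
        if metadata.get(key):
            return metadata[key]
    # Text and PDF loaders in this notebook only record a source path.
    return metadata.get("source", "<untitled>")


examples = [
    {"source": "data/charan.txt"},
    {
        "Published": "2025-01-15",
        "Title": "Trustworthy, Responsible, and Safe AI: A Comprehensive "
        "Architectural Framework for AI Safety with Challenges and Mitigations",
    },
    {"title": "Generative artificial intelligence"},
]

for meta in examples:
    print(document_title(meta))
```

With real results the call would be `document_title(doc.metadata)` for each loaded document.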