Document loaders provide a **standard interface** for reading data from different sources into LangChain’s Document format.

In [None]:
!pip install -q "langchain==0.3.27" "langchain-community==0.3.31" pypdf pymupdf unstructured libmagic python-magic

### Text loader

In [2]:
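# Loaders live in langchain_community; the langchain.document_loaders path is deprecated.
from langchain_community.document_loaders import TextLoader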

In [None]:
loader = TextLoader("/content/gold.txt")
data = loader.load()
print(data)

[Document(metadata={'source': '/content/gold.txt'}, page_content='The sharp rally comes as markets digest signs that the Federal Reserve may soon shift its policy stance. With inflation concerns easing but job market risks rising, traders are betting heavily on a 25-basis-point rate cut in both October and December. Those expectations have weakened the dollar, boosted commodities, and made gold more appealing to investors seeking stability over volatility.\n\nAt the same time, the ongoing U.S. government shutdown, now stretching into its 13th day, has raised fears about political gridlock and fiscal disruption. The combination of economic uncertainty and political strain has amplified investors’ preference for hard assets like gold and silver. Safe-haven buying has surged across global markets, from New York to Singapore.\n\nThe precious metals rally isn’t just about policy expectations. Renewed tensions between Washington and Beijing over trade, technology, and market access have resu
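The output above shows the shape every loader returns: a list of documents, each with `page_content` and a `metadata` dict containing `source`. The cell below is a minimal sketch of that interface in plain Python. `SimpleDocument` and `SimpleTextLoader` are hypothetical stand-ins written for illustration, not LangChain's actual classes, and the sample file is a throwaway created on the fly.

```python
from dataclasses import dataclass, field
from pathlib import Path
import tempfile


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a loaded document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class SimpleTextLoader:
    """Hypothetical loader that mimics the load() -> list-of-documents shape."""

    def __init__(self, file_path, encoding="utf-8"):
        self.file_path = str(file_path)
        self.encoding = encoding

    def load(self):
        text = Path(self.file_path).read_text(encoding=self.encoding)
        # One file becomes one document, tagged with where it came from.
        return [SimpleDocument(page_content=text, metadata={"source": self.file_path})]


# Use a temporary file so the sketch runs without /content/gold.txt.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "sample.txt"
    sample_path.write_text("Gold rallied on rate-cut bets.", encoding="utf-8")
    sample_docs = SimpleTextLoader(sample_path).load()

print(len(sample_docs))
print(sample_docs[0].page_content)
```

Because every loader returns the same shape, downstream steps such as splitting can stay the same regardless of where the text came from.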

### Load PDF

In [None]:
from langchain_community.document_loaders import PyPDFLoader

file_path = "/content/Transformer archtecture.pdf"
loader = PyPDFLoader(file_path)

docs = loader.load()

print(len(docs))

14


In [None]:
from langchain_community.document_loaders import PyMuPDFLoader

file_path = "/content/Transformer archtecture.pdf"
loader = PyMuPDFLoader(file_path)

docs = loader.load()
docs[0].metadata

{'producer': 'Skia/PDF m139 Google Docs Renderer',
 'creator': '',
 'creationdate': '',
 'source': '/content/Transformer archtecture.pdf',
 'file_path': '/content/Transformer archtecture.pdf',
 'total_pages': 14,
 'format': 'PDF 1.4',
 'title': 'Transformer architecture Details',
 'author': '',
 'subject': '',
 'keywords': '',
 'moddate': '',
 'trapped': '',
 'modDate': '',
 'creationDate': '',
 'page': 0}
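Both PDF loaders return one document per page (14 here), and the metadata above carries a zero-based `page` index. The sketch below shows one way to use that index to keep only a page range. `select_pages` is a hypothetical helper, and `fake_docs` are stand-in objects; in the notebook you would pass the `docs` list returned by `loader.load()` instead.

```python
from types import SimpleNamespace


def select_pages(docs, first_page, last_page):
    """Keep documents whose zero-based metadata['page'] lies in [first_page, last_page]."""
    return [d for d in docs if first_page <= d.metadata.get("page", -1) <= last_page]


# Hypothetical stand-ins for page-level documents; real ones come from loader.load().
fake_docs = [
    SimpleNamespace(page_content=f"text of page {i}", metadata={"page": i, "total_pages": 14})
    for i in range(14)
]

first_three = select_pages(fake_docs, 0, 2)
print([d.metadata["page"] for d in first_three])  # [0, 1, 2]
```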

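1.   PyPDFLoader - can also extract images embedded in the PDF (`extract_images=True`)
2.   PyMuPDFLoader - can also extract images (`extract_images=True`) and tables (`extract_tables`)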

### UnstructuredURLLoader

In [None]:
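# Same package as the PDF loaders above; the langchain.document_loaders path is deprecated.
from langchain_community.document_loaders import UnstructuredURLLoader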

loader = UnstructuredURLLoader(urls = [
    "https://huggingface.co/blog/swift-transformers",
    "https://huggingface.co/blog/openvino-vlm",
])

In [None]:
docs = loader.load()

In [None]:
docs[0].page_content

'Back to Articles\n\nSwift Transformers Reaches 1.0 – and Looks to the Future\n\nPublished September 26, 2025\n\nUpdate on GitHub\n\nUpvote\n\n30\n\n\n\n\n\n\n\n\n\n\n\n\n\nPedro Cuenca\'s avatar\n\nPedro Cuenca\n\npcuenq\n\nChristopher Fleetwood\'s avatar\n\nChristopher Fleetwood\n\nFL33TW00D-HF\n\nMattt\'s avatar\n\nMattt\n\nmattt\n\nVaibhav Srivastav\'s avatar\n\nVaibhav Srivastav\n\nreach-vb\n\nWe released swift-transformers two years ago (!) with the goal to support Apple developers and help them integrate local LLMs in their apps. A lot has changed since then (MLX and chat templates did not exist!), and we’ve learned how the community is actually using the library.\n\nWe want to double down on the use cases that provide most benefits to the community, and lay out the foundations for the future. Spoiler alert: after this release, we’ll focus a lot on MLX and agentic use cases 🚀\n\nWhat is swift-transformers\n\nswift-transformers is a Swift library that aims to reduce the friction 
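The scraped page keeps navigation text and long runs of blank lines (for example after the upvote count). A light cleanup pass before splitting can help. The sketch below uses only the standard library; `collapse_blank_lines` is a hypothetical helper, and `raw` is a short excerpt in the same shape as the output above rather than the full page.

```python
import re


def collapse_blank_lines(text: str) -> str:
    """Collapse runs of three or more newlines into one paragraph break and trim the ends."""
    return re.sub(r"\n{3,}", "\n\n", text).strip()


raw = "Upvote\n\n30\n\n\n\n\n\n\n\nPedro Cuenca's avatar\n\nPedro Cuenca"
cleaned = collapse_blank_lines(raw)
print(repr(cleaned))
```

In the notebook, the same function could be applied to `docs[0].page_content`. It leaves the navigation text in place and only normalizes whitespace.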

### Text splitter

In [None]:
text = """To showcase the real-world potential, we deployed our optimized setup with the smolagents library. With this integration, developers can plug in Qwen3-8B (paired with our pruned draft) to build agents that call APIs and external tools, write and execute code, handle long-context reasoning and run efficiently on Intel® Core™ Ultra. The benefits aren’t limited to Hugging Face, this model pairing can also be used seamlessly with frameworks like AutoGen or QwenAgent, further strengthening the agentic ecosystem.
In our demo, we assigned the accelerated Qwen3-based agent a task: Summarize the key features of the Qwen3 model series and present them in a slide deck.

Here’s how it worked: 1. The agent used a web search tool to gather up-to-date information. 2. It then switched to the Python interpreter to generate slides with the python-pptx library. This simple workflow highlights just a fraction of the possibilities unlocked when accelerated Qwen3 models meet frameworks like smolagents, bringing practical, efficient AI agents to life on AI PC. Try it here.
"""

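CharacterTextSplitter splits only on the given separator, so a chunk can exceed `chunk_size` when the separator does not occur often enough.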

In [None]:
from langchain.text_splitter import CharacterTextSplitter

splitter = CharacterTextSplitter(
    separator = "\n\n",
    chunk_size = 200,
    chunk_overlap = 0
)

chunks = splitter.split_text(text)
len(chunks)



2

In [None]:
for chunk in chunks:
  print(len(chunk))

666
398
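Both chunks are longer than `chunk_size = 200` because the text contains only one `"\n\n"`, and a single-separator splitter has nothing finer to fall back on. The sketch below reproduces that behavior in plain Python. `split_on_separator` is a hypothetical, simplified function, not LangChain's implementation, and `sample` is made-up text.

```python
def split_on_separator(text, separator, chunk_size):
    """Split on one separator, then greedily merge neighbours up to chunk_size."""
    pieces = [p.strip() for p in text.split(separator) if p.strip()]
    merged, current = [], ""
    for piece in pieces:
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            if current:
                merged.append(current)
            # A piece longer than chunk_size is kept whole: there is no
            # smaller separator to fall back on.
            current = piece
    if current:
        merged.append(current)
    return merged


sample = ("a" * 300) + "\n\n" + ("b" * 50) + "\n\n" + ("c" * 60)
demo_chunks = split_on_separator(sample, "\n\n", chunk_size=200)
print([len(c) for c in demo_chunks])  # [300, 112]
```

The 300-character piece stays oversized, while the two short pieces are merged into one chunk of 112 characters (50 + 2 + 60).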


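RecursiveCharacterTextSplitter tries each separator in order, falling back to finer ones until the chunks fit within `chunk_size`.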

In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

r_splitter = RecursiveCharacterTextSplitter(
    separators = ["\n\n", "\n", " "],
    chunk_size = 200,
    chunk_overlap = 0
)

chunks = r_splitter.split_text(text)
len(chunks)

6

In [None]:
for chunk in chunks:
  print(len(chunk))

200
199
111
153
198
199
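This time every chunk is at most 200 characters, because oversized pieces are split again with the next separator in the list. The sketch below illustrates that idea in plain Python. `recursive_split` is a hypothetical, simplified function: it ignores overlap and does not reproduce LangChain's exact merging, so its chunk boundaries may differ from the output above.

```python
def recursive_split(text, separators, chunk_size):
    """Split with the first separator; recurse with finer ones on oversized pieces."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separator left: fall back to a hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    separator, finer = separators[0], separators[1:]
    result, current = [], ""
    for piece in text.split(separator):
        if not piece:
            continue
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate
            continue
        if current:
            result.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # Still too long: retry this piece with the finer separators.
            result.extend(recursive_split(piece, finer, chunk_size))
    if current:
        result.append(current)
    return result


sample_text = (
    "First paragraph with several words in it.\n\n"
    "Second one.\nThird line here that is a little longer."
)
demo = recursive_split(sample_text, ["\n\n", "\n", " "], chunk_size=40)
for part in demo:
    print(len(part), repr(part))
```

The first paragraph is 41 characters, so it falls through `"\n\n"` and `"\n"` and is finally split on spaces, while the shorter lines are kept intact.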
