#### 1. Load PDF Files:

In [15]:
import os
from langchain_community.document_loaders import PyPDFLoader
from langchain_google_genai import ChatGoogleGenerativeAI

##### Use the Gemini API
- We authenticate with a Google Gemini API key, which is stored in a `.env` file and exposed through the `GOOGLE_API_KEY` environment variable.
- You can create this API key in Google AI Studio.

In [16]:
# "GEMINI_API_KEY" is a placeholder; substitute your own key and keep it out of version control
os.environ["GOOGLE_API_KEY"] = "GEMINI_API_KEY"
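
Hardcoding a key in a notebook cell makes it easy to leak. Since the note above mentions a `.env` file, here is a minimal sketch of reading one with the standard library only. `parse_env_text` is a hypothetical helper written for this illustration (the `python-dotenv` package is a common choice in practice), and the key in `sample_env` is a dummy value.

```python
import os


def parse_env_text(text):
    """Parse KEY=VALUE lines into a dict (hypothetical helper, not a LangChain API)."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value
        values[key.strip()] = value.strip().strip("\"'")
    return values


# Dummy content standing in for a real .env file
sample_env = '# local secrets\nGOOGLE_API_KEY="dummy-key-for-illustration"\n'
env_values = parse_env_text(sample_env)

# Only set the variable if the shell has not already provided one
os.environ.setdefault("GOOGLE_API_KEY", env_values["GOOGLE_API_KEY"])
```

Using `setdefault` means a key exported in the shell takes priority over the file.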

##### Use a Gemini Model
- You can pass the name of any available Google Gemini model here.

In [17]:
llm = ChatGoogleGenerativeAI(model="gemini-pro")

##### Load PDF Files
- With LangChain set up, we are ready to extract text from PDF files.
- `PyPDFLoader` returns one `Document` per page, each with its text and metadata; loading itself does not call the Gemini model.


In [19]:
loader = PyPDFLoader("day1-langchain.pdf")
documents = loader.load()
print(documents)

[Document(metadata={'producer': 'Skia/PDF m137', 'creator': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/137.0.0.0 Safari/537.36', 'creationdate': '2025-09-28T16:19:18+00:00', 'title': 'Day 1 - LangChain Fundamentals and Data Loader', 'moddate': '2025-09-28T16:19:18+00:00', 'source': 'day1-langchain.pdf', 'total_pages': 7, 'page': 0, 'page_label': '1'}, page_content="Day 1 - LangChain \nFundamentals and Data Loader\nIntroduction to LangChain for Beginners\nWelcome to this beginner-friendly guide to LangChain! We'll explore the basics, \nsetup, and how to work with different data loaders.\nWhat is LangChain?\nLangChain is a framework for building applications with large language models \n\ue081LLMs). Think of it as a toolkit that helps you connect AI models to different \ndata sources and applications.\nLangChain is like a bridge between AI models and your data. It helps you:\nTalk to AI models\nFeed your own data to these models\nBuild useful a

The next cell prints only the first 200 characters of the first page:

In [20]:
print(documents[0].page_content[:200])

Day 1 - LangChain 
Fundamentals and Data Loader
Introduction to LangChain for Beginners
Welcome to this beginner-friendly guide to LangChain! We'll explore the basics, 
setup, and how to work with dif
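
Slicing with `[:200]` cuts the preview in the middle of a word, as the output above shows. A possible alternative, sketched here with the standard library's `textwrap.shorten`, collapses whitespace and truncates at a word boundary. The sample string is copied from the first lines of the page above; the width of 40 and the `" ..."` placeholder are arbitrary choices for illustration.

```python
import textwrap

# First lines of the page text, as printed above
page_text = (
    "Day 1 - LangChain \nFundamentals and Data Loader\n"
    "Introduction to LangChain for Beginners"
)

# shorten() collapses whitespace runs and drops whole words until the
# result, including the placeholder, fits within the requested width
preview = textwrap.shorten(page_text, width=40, placeholder=" ...")

print(preview)
```

In the notebook, the same call could be applied to `documents[0].page_content` with `width=200`.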


#### 2. Load Webpages:

In [21]:
from langchain_community.document_loaders import WebBaseLoader

USER_AGENT environment variable not set, consider setting it to identify your requests.
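
The warning above is harmless, but it can be avoided. Because it appears right after the import cell, it seems to be emitted when the loader module is imported, so the sketch below sets `USER_AGENT` beforehand. The value `langchain-series-notebook/0.1` is a made-up identifier for illustration only.

```python
import os

# Hypothetical identifier; pick a string that describes your own project
os.environ.setdefault("USER_AGENT", "langchain-series-notebook/0.1")

# Import the loader only after the variable is set, for example:
# from langchain_community.document_loaders import WebBaseLoader
```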


To extract data from a webpage, pass its URL to the loader.
- Here I use the python.org "Python For Beginners" page.

In [30]:
loader = WebBaseLoader("https://www.python.org/about/gettingstarted/")
documents = loader.load()

Next, we collapse the extra whitespace left over from the page's HTML layout:

In [31]:
# strip() trims the ends; split() + join() collapses every whitespace run into one space
text = documents[0].page_content.strip()
text = " ".join(text.split())

print(text[:1000]) 

Python For Beginners | Python.org Notice: While JavaScript is not essential for this website, your interaction with the content will be limited. Please turn JavaScript on for the full experience. Skip to content ▼ Close Python PSF Docs PyPI Jobs Community ▲ The Python Network Donate ≡ Menu Search This Site GO A A Smaller Larger Reset Socialize LinkedIn Mastodon Chat on IRC Twitter About Applications Quotes Getting Started Help Python Brochure Downloads All releases Source code Windows macOS Android Other Platforms License Alternative Implementations Documentation Docs Audio/Visual Talks Beginner's Guide Developer's Guide FAQ Non-English Docs PEP Index Python Books Python Essays Community Diversity Mailing Lists IRC Forums PSF Annual Impact Report Python Conferences Special Interest Groups Python Logo Python Wiki Code of Conduct Community Awards Get Involved Shared Stories Success Stories Arts Business Education Engineering Government Scientific Software Development News Python News PSF
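
To see what the whitespace cleanup does on its own, here is a small worked example on a made-up string (the `raw` value is an assumption, not actual loader output). Note that `split()` with no arguments already ignores leading and trailing whitespace, so the separate `strip()` call is optional.

```python
raw = "  Python For Beginners \n\n\t Python.org  \n Skip   to content "

# split() with no arguments splits on any whitespace run and ignores the ends,
# so joining with a single space collapses newlines, tabs, and repeated spaces
cleaned = " ".join(raw.split())

print(cleaned)  # Python For Beginners Python.org Skip to content
```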

#### 3. Load HTML Data:

In [43]:
from langchain_community.document_loaders import UnstructuredHTMLLoader

In [46]:
loader = UnstructuredHTMLLoader("index.html")
documents = loader.load()

print(documents[0].page_content[:200])

This HTML file is dummy to understand document loader

Linkedin LangChain Series

Linkedin LangChain Series

Linkedin LangChain Series

Linkedin LangChain Series

Linkedin LangChain Series


#### 4. Load Markdown Data:

In [47]:
from langchain_community.document_loaders import UnstructuredMarkdownLoader

In [50]:
loader = UnstructuredMarkdownLoader("README.md")
documents = loader.load()

print(documents[0].page_content)

LangChain

Installation Guide

conda create -n environment-name python=3.11 conda activate environment-name

pip install --upgrade "langchain>=0.3,<0.4" pip install --upgrade "langchain-community>=0.3,<0.4" pip install --upgrade "langchain-text-splitters>=0.3,<0.4" pip install --upgrade "langchain-core>=0.3,<0.4" pip install google-generativeai
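
The loaders so far differ only in which class handles which file type. One way to keep that choice in one place is a lookup by file extension, sketched below. `LOADER_BY_SUFFIX` and `pick_loader_name` are hypothetical names for this illustration; the sketch maps to class names as strings so it runs without LangChain installed.

```python
from pathlib import Path

# Loader class names taken from the sections above, keyed by file extension
LOADER_BY_SUFFIX = {
    ".pdf": "PyPDFLoader",
    ".html": "UnstructuredHTMLLoader",
    ".md": "UnstructuredMarkdownLoader",
}


def pick_loader_name(path):
    """Return the loader class name for a file path (hypothetical helper)."""
    suffix = Path(path).suffix.lower()
    if suffix not in LOADER_BY_SUFFIX:
        raise ValueError(f"No loader registered for {suffix!r}")
    return LOADER_BY_SUFFIX[suffix]


for name in ["day1-langchain.pdf", "index.html", "README.md"]:
    print(name, "->", pick_loader_name(name))
```

In a real project the dictionary values could be the loader classes themselves, so the result can be instantiated directly with the file path.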


#### 5. Write a Custom Document Loader:

In [40]:
from langchain_community.document_loaders.base import BaseLoader
from langchain_core.documents import Document

In [41]:
class MyCustomLoader(BaseLoader):
    def load(self):
        data = "I am creating my custom loader. This is my first lecture of LangChain Series."
        return [Document(page_content=data)]

In [42]:
loader = MyCustomLoader()
documents = loader.load()

print(documents[0].page_content)

I am creating my custom loader. This is my first lecture of LangChain Series.
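
The loader above returns a single hardcoded `Document`. A slightly more realistic variant, sketched here, yields one `Document` per non-empty line and attaches the line number as metadata. It implements `lazy_load()` rather than `load()`, on the understanding that `BaseLoader.load()` in `langchain_core` collects whatever `lazy_load()` yields. `LineLoader` is a hypothetical class, and the `except ImportError` branch defines simplified stand-ins (an assumption, not the real classes) so the sketch also runs without LangChain installed.

```python
try:
    from langchain_core.document_loaders import BaseLoader
    from langchain_core.documents import Document
except ImportError:
    # Simplified stand-ins (assumption) so the sketch runs without LangChain
    class Document:
        def __init__(self, page_content, metadata=None):
            self.page_content = page_content
            self.metadata = metadata or {}

    class BaseLoader:
        def load(self):
            return list(self.lazy_load())


class LineLoader(BaseLoader):
    """Hypothetical loader: one Document per non-empty line of a string."""

    def __init__(self, text):
        self.text = text

    def lazy_load(self):
        for number, line in enumerate(self.text.splitlines(), start=1):
            if line.strip():
                # Record which line each Document came from
                yield Document(page_content=line.strip(), metadata={"line": number})


loader = LineLoader(
    "I am creating my custom loader.\n\nThis is my first lecture of LangChain Series."
)
documents = loader.load()

for doc in documents:
    print(doc.metadata["line"], doc.page_content)
```

Yielding documents one at a time keeps memory use low for large inputs, while `load()` still gives the familiar list.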
