### Document Loaders and Integrations

In [1]:
from dotenv import find_dotenv, load_dotenv
import os
from langchain_openai import AzureChatOpenAI

load_dotenv(find_dotenv())

model = AzureChatOpenAI(
    openai_api_type="azure",
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    azure_deployment=os.getenv("AZURE_OPENAI_DEPLOYMENT_NAME"),
    api_version=os.getenv("AZURE_OPENAI_API_VERSION"),
    openai_api_key=os.getenv("AZURE_OPENAI_API_KEY")
)
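
`os.getenv` returns `None` for any variable that is not set, so a missing entry in `.env` may only surface later as a less obvious error. A minimal sketch of an upfront check, assuming the four variable names used in the cell above; `missing_env_vars` is a hypothetical helper, not part of LangChain or python-dotenv:

```python
import os

# Variable names taken from the AzureChatOpenAI setup cell above.
REQUIRED_VARS = (
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_DEPLOYMENT_NAME",
    "AZURE_OPENAI_API_VERSION",
    "AZURE_OPENAI_API_KEY",
)


def missing_env_vars(names=REQUIRED_VARS, env=None):
    """Return the names that are unset or empty in the given mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

Running this right after `load_dotenv(...)` lists every missing name at once instead of failing on the first one.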

1. CSV Loader

In [3]:
from langchain_community.document_loaders.csv_loader import CSVLoader

loader = CSVLoader(file_path="../docs/sf_aliens.csv")
data = loader.load()
print(data[0])

page_content='Book_Title: Obsidian
Original_Book_Title: Obsidian
Author_Name: Jennifer L. Armentrout
Edition_Language: English
Rating_score: 4.17
Rating_votes: 236780
Review_number: 18161
Book_Description: Starting over sucks.When we moved to West Virginia right before my senior year, I’d pretty much resigned myself to thick accents, dodgy internet access, and a whole lot of boring… until I spotted my hot neighbor, with his looming height and eerie green eyes. Things were looking up.And then he opened his mouth.Daemon is infuriating. Arrogant. Stab-worthy. We do not get along. At all. But when a stranger attacks me and Daemon literally freezes time with a wave of his hand, well, something… unexpected happens. The hot alien living next door marks me.You heard me. Alien. Turns out Daemon and his sister have a galaxy of enemies wanting to steal their abilities, and Daemon’s touch has me lit up like the Vegas Strip. The only way I’m getting out of this alive is by sticking close to Daemon 

In [4]:
print(data[1])

page_content='Book_Title: Onyx
Original_Book_Title: Onyx
Author_Name: Jennifer L. Armentrout
Edition_Language: English
Rating_score: 4.27
Rating_votes: 153429
Review_number: 10497
Book_Description: BEING CONNECTED TO DAEMON BLACK SUCKS… Thanks to his alien mojo, Daemon’s determined to prove what he feels for me is more than a product of our bizarro connection. So I’ve sworn him off, even though he’s running more hot than cold these days. But we’ve got bigger problems.SOMETHING WORSE THAN ARUM HAS COME TO TOWN The Department of Defense is here. If they ever find out what Daemon can do and that we’re linked, I’m a goner. So is he. And there’s this new boy in school who’s got a secret of his own. He knows what’s happened to me and he can help, but to do so, I have to lie to Daemon and stay away from him. Like that’s possible. Against all common sense, I’m falling for Daemon. Hard.BUT THEN, EVERYTHING CHANGES I’ve seen someone who shouldn’t be alive. And I have to tell Daemon, even though 

In [5]:
print(data[0].page_content)

Book_Title: Obsidian
Original_Book_Title: Obsidian
Author_Name: Jennifer L. Armentrout
Edition_Language: English
Rating_score: 4.17
Rating_votes: 236780
Review_number: 18161
Book_Description: Starting over sucks.When we moved to West Virginia right before my senior year, I’d pretty much resigned myself to thick accents, dodgy internet access, and a whole lot of boring… until I spotted my hot neighbor, with his looming height and eerie green eyes. Things were looking up.And then he opened his mouth.Daemon is infuriating. Arrogant. Stab-worthy. We do not get along. At all. But when a stranger attacks me and Daemon literally freezes time with a wave of his hand, well, something… unexpected happens. The hot alien living next door marks me.You heard me. Alien. Turns out Daemon and his sister have a galaxy of enemies wanting to steal their abilities, and Daemon’s touch has me lit up like the Vegas Strip. The only way I’m getting out of this alive is by sticking close to Daemon until my alien
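
The outputs above show that each CSV row becomes one document whose `page_content` is a block of `column: value` lines. The following is a rough standard-library sketch of that row-to-text mapping, not the actual `CSVLoader` implementation; `rows_to_documents`, the plain-dict document shape, and the three-column sample are assumptions made for illustration:

```python
import csv
import io
from typing import Dict, List


def rows_to_documents(csv_text: str, source: str) -> List[Dict]:
    """Turn each CSV row into a dict with 'page_content' and 'metadata'."""
    reader = csv.DictReader(io.StringIO(csv_text))
    documents = []
    for row_index, row in enumerate(reader):
        # One "column: value" line per field, in header order.
        content = "\n".join(
            f"{key.strip()}: {value.strip()}" for key, value in row.items()
        )
        documents.append(
            {
                "page_content": content,
                "metadata": {"source": source, "row": row_index},
            }
        )
    return documents


# A three-column excerpt of the data shown above (hypothetical file name).
sample_csv = (
    "Book_Title,Author_Name,Rating_score\n"
    "Obsidian,Jennifer L. Armentrout,4.17\n"
    "Onyx,Jennifer L. Armentrout,4.27\n"
)

docs = rows_to_documents(sample_csv, source="sample.csv")
print(docs[0]["page_content"])
```

Keeping one document per row means each book can later be split, embedded, or retrieved independently.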

2. HTML Loader

In [8]:
from langchain_community.document_loaders import UnstructuredHTMLLoader
# This loader requires the Python "unstructured" package (pip install unstructured)

loader = UnstructuredHTMLLoader(file_path="../docs/simple.html")
data = loader.load()
print(data[0].page_content)

My First Heading

My first paragraph.
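
The loader strips the markup and keeps only the visible text. As a rough illustration of that idea, here is a sketch using the standard library's `html.parser` rather than `unstructured`; the `TextExtractor` class, the `html_to_text` helper, and the sample markup are assumptions, since the contents of `simple.html` are not shown here:

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text nodes, skipping <script> and <style> content."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def html_to_text(html: str) -> str:
    """Return the text nodes of an HTML string, separated by blank lines."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.chunks)


# Assumed markup, chosen to match the loader output shown above.
sample_html = """<!DOCTYPE html>
<html><body>
<h1>My First Heading</h1>
<p>My first paragraph.</p>
</body></html>"""

print(html_to_text(sample_html))
```

This only handles simple pages; `unstructured` does considerably more, such as classifying elements into titles and narrative text.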


3. PDF Loader

In [10]:
# pip install pypdf
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader(file_path="../docs/sample.pdf")
data = loader.load()
print(data[0].page_content)

Sample PDFThis is a simple PDF ﬁle. Fun fun fun.
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Phasellus facilisis odio sed mi. 
Curabitur suscipit. Nullam vel nisi. Etiam semper ipsum ut lectus. Proin aliquam, erat eget 
pharetra commodo, eros mi condimentum quam, sed commodo justo quam ut velit. 
Integer a erat. Cras laoreet ligula cursus enim. Aenean scelerisque velit et tellus. 
Vestibulum dictum aliquet sem. Nulla facilisi. Vestibulum accumsan ante vitae elit. Nulla 
erat dolor, blandit in, rutrum quis, semper pulvinar, enim. Nullam varius congue risus. 
Vivamus sollicitudin, metus ut interdum eleifend, nisi tellus pellentesque elit, tristique 
accumsan eros quam et risus. Suspendisse libero odio, mattis sit amet, aliquet eget, 
hendrerit vel, nulla. Sed vitae augue. Aliquam erat volutpat. Aliquam feugiat vulputate nisl. 
Suspendisse quis nulla pretium ante pretium mollis. Proin velit ligula, sagittis at, egestas a, 
pulvinar quis, nisl.
Pellentesque sit amet lectus. P

Integrations are third-party interfaces that can be called from within LangChain. See more at: https://python.langchain.com/docs/integrations/document_loaders/youtube_transcript

4. YouTube Loader

As of 11/11/2024, pytube has an issue fetching YouTube video info. A quick local workaround is to add `use_oauth` and `allow_oauth_cache` to the loader source in your local Python environment, for example `/workspaces/LangChain/.venv/lib/python3.12/site-packages/langchain_community/document_loaders/youtube.py`.

![image](../docs/youtube_py_local_rewrite.jpg)

In [2]:
# pip install pytube
# pip install youtube-transcript-api
from langchain_community.document_loaders import YoutubeLoader

loader = YoutubeLoader.from_youtube_url(
    youtube_url="https://www.youtube.com/watch?v=1LR6NPpFxw4",
    add_video_info=True,
    # use_oauth=True,
    # allow_oauth_cache=True
)

data = loader.load()
print(data[0].page_content)

Please open https://www.google.com/device and input code QGV-VQD-HLQ
sixty minutes rewind captain hopper is a whiz at mathematics some would say a genius one of that small band of brothers and sisters who ushered in the computer revolution in World War two at age 37 she left her professor's job at Vassar to serve as a lieutenant in the Navy Reserve she was sent to Harvard to help program the very first computer it had the unglamorous name of mark 1 and as far as we've progressed since this vacuum tube monster captain hopper says we ain't seen nothing yet you talk a lot about the computer revolution I thought we're in it and it's over no we're only at the beginning we've been through the preliminaries well what's it gonna be well we've got the Model T just the Model T that's where we are now she's up many mornings before 5:00 for the ride to Washington's National Airport 200 days a year she lectures two computer scientists at military bases she's also in demand on college campuses and a
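
The loader is built from a full watch URL, but the transcript lookup is keyed on the video ID inside it. A minimal sketch of pulling that ID out with the standard library, assuming only the `watch?v=` and `youtu.be/` URL shapes; `extract_video_id` is a hypothetical helper and not the parsing logic `YoutubeLoader` itself uses:

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from a YouTube URL, or None if none is found."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the ID as the path: https://youtu.be/<id>
        return parsed.path.lstrip("/") or None
    if host == "youtube.com" or host.endswith(".youtube.com"):
        # Watch links carry the ID in the "v" query parameter.
        values = parse_qs(parsed.query).get("v")
        return values[0] if values else None
    return None


print(extract_video_id("https://www.youtube.com/watch?v=1LR6NPpFxw4"))
```

Checking the ID before calling the loader makes it easier to tell a malformed URL apart from a pytube or transcript failure.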