# Document Loading

### Data Connection
Many LLM applications require user-specific data that is not part of the model's training set. LangChain gives you the building blocks to load, transform, store and query your data via:

- Document loaders: Load documents from many different sources
- Document transformers: Split documents, drop redundant documents, and more
- Text embedding models: Take unstructured text and turn it into a list of floating point numbers
- Vector stores: Store and search over embedded data
- Retrievers: Query your data

![Data Connection](https://python.langchain.com/assets/images/data_connection-c42d68c3d092b85f50d08d4cc171fc25.jpg)
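
To make the five stages concrete, here is a minimal, library-free sketch of the same flow. It does not use LangChain: the function names (`load_documents`, `split_text`, `embed`, `retrieve`), the sample texts, and the word-count "embedding" are all assumptions made for illustration. Real embedding models return dense lists of floats rather than word counts.

```python
import math
from collections import Counter

def load_documents():
    # Stand-in for a document loader: raw text plus a source label.
    return [
        {"text": "Neural networks learn from data. Keras builds neural networks.",
         "source": "notes.txt"},
        {"text": "CSV files store rows of tabular data.", "source": "tables.txt"},
    ]

def split_text(doc):
    # Stand-in for a document transformer: one chunk per sentence.
    sentences = [s.strip() for s in doc["text"].split(".") if s.strip()]
    return [{"text": s, "source": doc["source"]} for s in sentences]

def embed(text):
    # Toy "embedding": a sparse word-count vector (hypothetical, not a real model).
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(store, query, k=1):
    # Stand-in for a retriever: rank stored chunks by similarity to the query.
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item["vector"]), reverse=True)
    return [item["chunk"] for item in ranked[:k]]

# Load -> transform -> embed -> store -> query.
chunks = [c for d in load_documents() for c in split_text(d)]
store = [{"chunk": c, "vector": embed(c["text"])} for c in chunks]
top = retrieve(store, "tabular csv rows")
print(top[0]["source"])
```

The list-of-dicts `store` plays the role of a vector store here; a real one would index the vectors so that search does not have to scan every chunk.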

### Install LangChain

In [50]:
! pip install langchain



### Set Environment Variables

In [51]:
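import os
import sys

import openai
from dotenv import load_dotenv, find_dotenv

sys.path.append('../..')

# Read OPENAI_API_KEY from a local .env file, if one exists.
_ = load_dotenv(find_dotenv())

def set_env_var(name, value):
    """Set an environment variable for the current process."""
    os.environ[name] = value

# Fallback when no .env file is available: uncomment and insert your own key.
# Setting a placeholder before load_dotenv() would mask the value in .env,
# because load_dotenv() does not override existing variables by default.
# set_env_var("OPENAI_API_KEY", "<your-api-key>")

openai.api_key = os.environ['OPENAI_API_KEY']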
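
A missing key otherwise surfaces as a bare `KeyError`, so a small guard can make the failure easier to read. The sketch below is one possible approach; `require_env` and `DEMO_API_KEY` are hypothetical names introduced here, not part of LangChain or the OpenAI client.

```python
import os

def require_env(name):
    """Return the value of an environment variable or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or shell environment"
        )
    return value

# Demo with a hypothetical variable name so that no real key is touched.
os.environ["DEMO_API_KEY"] = "sk-demo"
print(require_env("DEMO_API_KEY"))
```

In the notebook, the same helper could be called as `require_env("OPENAI_API_KEY")` once the `.env` file has been loaded.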

# PDF

In [5]:
! pip install pypdf



In [57]:
from langchain.document_loaders import PyPDFLoader

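# Path to a local PDF; replace it with your own file.
loader = PyPDFLoader("Book -- Deep Learning with TensorFlow and Keras.pdf")
# Load the PDF and split its text into Document chunks, each tagged with its source page.
pages = loader.load_and_split()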

In [58]:
len(pages)

688

In [59]:
page = pages[0]

In [60]:
print(page.page_content[0:500])

Deep Learning with TensorFlow 
and Keras
Third Edition
Build and deploy supervised, unsupervised, deep, and reinforcement 
learning models
Amita Kapoor
Antonio Gulli
Sujit Pal
BIRMINGHAM—MUMBAI


In [61]:
page.metadata

{'source': 'Book -- Deep Learning with TensorFlow and Keras.pdf', 'page': 1}
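
Each loaded item carries its text in `page_content` and its origin in `metadata`. Because `load_and_split()` may return more than one chunk for a long page (an assumption about the splitter defaults that is worth checking for your version), counting chunks per page number is a quick sanity check. The sketch uses plain dictionaries with made-up values as stand-ins; in the notebook the records would come from `[p.metadata for p in pages]`.

```python
from collections import Counter

# Stand-in metadata records with made-up values, for illustration only.
sample_metadata = [
    {"source": "book.pdf", "page": 1},
    {"source": "book.pdf", "page": 2},
    {"source": "book.pdf", "page": 2},
    {"source": "book.pdf", "page": 3},
]

def chunks_per_page(metadata_records):
    """Count how many chunks were produced for each page number."""
    return Counter(m["page"] for m in metadata_records)

counts = chunks_per_page(sample_metadata)
# Pages that were split into more than one chunk.
split_pages = sorted(page for page, n in counts.items() if n > 1)
print(split_pages)
```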

# Excel (CSV export)

In [62]:
from langchain.document_loaders.csv_loader import CSVLoader


loader = CSVLoader(file_path='file_example_XLSX_5000.csv')
data = loader.load()

In [63]:
print(data)

[Document(page_content='\ufeff: 1\nFirst Name: Dulce\nLast Name: Abril\nGender: Female\nCountry: United States\nAge: 32\nDate: 15/10/2017\nId: 1562', metadata={'source': 'file_example_XLSX_5000.csv', 'row': 0}), Document(page_content='\ufeff: 2\nFirst Name: Mara\nLast Name: Hashimoto\nGender: Female\nCountry: Great Britain\nAge: 25\nDate: 16/08/2016\nId: 1582', metadata={'source': 'file_example_XLSX_5000.csv', 'row': 1}), Document(page_content='\ufeff: 3\nFirst Name: Philip\nLast Name: Gent\nGender: Male\nCountry: France\nAge: 36\nDate: 21/05/2015\nId: 2587', metadata={'source': 'file_example_XLSX_5000.csv', 'row': 2}), Document(page_content='\ufeff: 4\nFirst Name: Kathleen\nLast Name: Hanner\nGender: Female\nCountry: United States\nAge: 25\nDate: 15/10/2017\nId: 3549', metadata={'source': 'file_example_XLSX_5000.csv', 'row': 3}), Document(page_content='\ufeff: 5\nFirst Name: Nereida\nLast Name: Magwood\nGender: Female\nCountry: United States\nAge: 58\nDate: 16/08/2016\nId: 2468', meta
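
The `\ufeff` in the output is a byte order mark (BOM) that some tools, Excel among them, write at the start of a UTF-8 file; it ends up fused to the first column name. The sketch below reproduces the effect with Python's standard `csv` module and shows that decoding with `utf-8-sig` removes it. The sample bytes and the `read_header` helper are assumptions made for illustration. `CSVLoader` appears to accept an `encoding` argument as well, so `encoding="utf-8-sig"` may have the same effect there, but verify this against your installed version.

```python
import csv
import io

# A tiny CSV as bytes, starting with a UTF-8 byte order mark (made-up sample).
raw = "\ufeffFirst Name,Last Name\nDulce,Abril\n".encode("utf-8")

def read_header(data, encoding):
    """Decode CSV bytes with the given encoding and return the header row."""
    text = io.StringIO(data.decode(encoding), newline="")
    return next(csv.reader(text))

header_plain = read_header(raw, "utf-8")      # BOM stays attached to the first column
header_sig = read_header(raw, "utf-8-sig")    # BOM is stripped during decoding

print(repr(header_plain[0]))
print(repr(header_sig[0]))
```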

# YouTube

In [64]:
from langchain.document_loaders import YoutubeLoader

In [44]:
!pip install youtube-transcript-api
!pip install pytube

Collecting pytube
  Downloading pytube-15.0.0-py3-none-any.whl (57 kB)
Installing collected packages: pytube
Successfully installed pytube-15.0.0


In [65]:
loader = YoutubeLoader.from_youtube_url(
    "https://www.youtube.com/watch?v=9Mjcs1R0tfA", add_video_info=True
)

In [66]:
loader.load()

[Document(page_content="foreign and welcome back to my channel in this tutorial we'll be diving into the exciting world of deep learning and exploring the power of neural network we are going to build an Activision neural network using tensorflow and python with multiple neuron and fully connected layers the data sets that we are going to use belong to a bank that is studying its customer to see whether they are going to leave or stay and it contains around 10 000 observations you will see that in this tutorial we are going to use the letter P processing template that we created before on this channel and we are going to use different set of tools as well I would highly recommend for you guys to watch my previous tutorial about the artificial neural network on this channel I'm going to use Google collab but you can choose any of your favorite IDE on this tutorial to begin in order to build an artificial neural network we have to complete fog faces data preprocessing phase building an a

In [67]:
docs = loader.load()

In [68]:
print(docs[0].page_content[:500])

foreign and welcome back to my channel in this tutorial we'll be diving into the exciting world of deep learning and exploring the power of neural network we are going to build an Activision neural network using tensorflow and python with multiple neuron and fully connected layers the data sets that we are going to use belong to a bank that is studying its customer to see whether they are going to leave or stay and it contains around 10 000 observations you will see that in this tutorial we are 
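
The loader is built from a full watch URL, but the transcript is tied to the video ID in the `v` query parameter. As a rough illustration of that step, and not the library's actual implementation, the hypothetical helper `extract_video_id` below pulls the ID out with the standard library.

```python
from urllib.parse import urlparse, parse_qs

def extract_video_id(url):
    """Return the video ID from a watch URL or short URL, or None if absent."""
    parsed = urlparse(url)
    # Short links carry the ID in the path: https://youtu.be/<id>
    if parsed.hostname in ("youtu.be", "www.youtu.be"):
        return parsed.path.lstrip("/") or None
    # Watch links carry the ID in the "v" query parameter.
    values = parse_qs(parsed.query).get("v")
    return values[0] if values else None

print(extract_video_id("https://www.youtube.com/watch?v=9Mjcs1R0tfA"))
```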


# URL

In [69]:
from langchain.document_loaders import WebBaseLoader

loader = WebBaseLoader("https://en.wikipedia.org/wiki/Machine_learning")

In [70]:
docs = loader.load()

In [71]:
print(docs[0].page_content[:500])





Machine learning - Wikipedia

Jump to content

Main menu

Main menu
move to sidebar
hide

Navigation

Main pageContentsCurrent eventsRandom articleAbout WikipediaContact usDonate

Contribute

HelpLearn to editCommunity portalRecent changesUpload file

Languages

Language links are at the top of the page across from the title.

Search

Search

Create accountLog in

Personal tools




 Create 
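
Text extracted from a web page typically includes navigation labels and runs of blank lines left over from the HTML layout. A light cleanup pass before splitting or embedding can help. The sketch below is one possible approach using only the standard library; `collapse_whitespace` and the sample string are hypothetical and not part of `WebBaseLoader`.

```python
import re

def collapse_whitespace(text):
    """Trim each line and squeeze runs of blank lines down to a single one."""
    lines = [line.strip() for line in text.splitlines()]
    cleaned = "\n".join(lines)
    return re.sub(r"\n{3,}", "\n\n", cleaned).strip()

# Made-up sample that mimics the shape of scraped page text.
sample = "\n\n\nMachine learning - Wikipedia\n\n\n\n\nJump to content\n\t\tNavigation\n\t\n\n"
print(collapse_whitespace(sample))
```

In the notebook this could be applied as `collapse_whitespace(docs[0].page_content)`. Removing the navigation text itself would need HTML-aware filtering, which this sketch does not attempt.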
