## How to load CSVs
LangChain offers a CSV loader that reads CSV files as documents. Each row of a CSV file becomes an individual document whose content lists every column as a key-value pair, one per line. This makes CSV data easy to use in a retrieval-augmented generation (RAG) pipeline or similar applications.

To load a CSV file in LangChain, use CSVLoader from the langchain_community.document_loaders module. CSVLoader reads each row of the CSV file and returns it as a document. Here’s an example of how to use it:

https://python.langchain.com/

In [20]:
from langchain_community.document_loaders import CSVLoader

loader = CSVLoader(file_path='./data/customers-100.csv')
data = loader.load()
print(data[0].page_content)


Index: 1
Customer Id: DD37Cf93aecA6Dc
First Name: Sheryl
Last Name: Baxter
Company: Rasmussen Group
City: East Leonard
Country: Chile
Phone 1: 229.077.5154
Phone 2: 397.884.0519x718
Email: zunigavanessa@smith.info
Subscription Date: 2020-08-24
Website: http://www.stephenson.com/
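
The row-to-document mapping described above can be sketched with the standard library alone. This is only an illustration, not CSVLoader's implementation: `rows_to_documents` and the plain dictionaries are hypothetical stand-ins for the loader and its Document objects, and the second sample row is invented.

```python
import csv
import io


def rows_to_documents(csv_text, source="example.csv"):
    """Turn each CSV row into a dict with `page_content` and `metadata`."""
    documents = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader):
        # One "Column: value" line per column, like the output shown above
        page_content = "\n".join(f"{column}: {value}" for column, value in row.items())
        documents.append({
            "page_content": page_content,
            "metadata": {"source": source, "row": row_number},
        })
    return documents


# Tiny in-memory CSV (assumption: the second row is made up for illustration)
sample = "Index,First Name,Country\n1,Sheryl,Chile\n2,Alex,Peru\n"
docs = rows_to_documents(sample)
print(docs[0]["page_content"])
print(docs[1]["metadata"])
```

Keeping the row number in the metadata makes it possible to trace a retrieved document back to its line in the original file.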


## How to load PDFs

LangChain provides several ways to load and process PDF files, notably PyPDFLoader in the langchain_community.document_loaders module. This loader extracts text from PDF documents, and the result can be split into chunks for further processing. Each page of the PDF is loaded as an individual document with associated metadata (e.g., page number, file source).\
https://python.langchain.com/docs/how_to/document_loader_pdf/


## Steps for Using PyPDFLoader:
1- Install Dependencies:\
Install the required libraries, langchain_community and pypdf:

In [9]:
pip install -qU langchain_community pypdf


Note: you may need to restart the kernel to use updated packages.


ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
langchain-experimental 0.0.65 requires langchain-community<0.3.0,>=0.2.16, but you have langchain-community 0.3.0 which is incompatible.
langchain-experimental 0.0.65 requires langchain-core<0.3.0,>=0.2.38, but you have langchain-core 0.3.4 which is incompatible.
llama-index-readers-file 0.2.1 requires pypdf<5.0.0,>=4.0.1, but you have pypdf 5.0.0 which is incompatible.


2- Load and Split PDF:\
Use PyPDFLoader to load the PDF document and split it into chunks for analysis. Here’s a sample code snippet:

In [10]:
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader('./data/Understanding_Climate_Change.pdf')
docs = loader.load_and_split()
print(docs[0].page_content)  # Display the first chunk (load_and_split() returns chunks, not whole pages)


Understanding Climate Change  
Chapter 1: Introduction to Climate Change  
Climate change refers to significant, long -term changes in the global climate. The term 
"global climate" encompasses the planet's overall weather patterns, including temperature, 
precipitation, and wind patterns, over an extended period. Over the past cent ury, human 
activities, particularly the burning of fossil fuels and deforestation, have significantly 
contributed to climate change.  
Historical Context  
The Earth's climate has changed throughout history. Over the past 650,000 years, there have 
been seven cycles of glacial advance and retreat, with the abrupt end of the last ice age about 
11,700 years ago marking the beginning of the modern climate era and  human civilization. 
Most of these climate changes are attributed to very small variations in Earth's orbit that 
change the amount of solar energy our planet receives. During the Holocene epoch, which 
began at the end of the last ice age, human 
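
The output above stops mid-sentence because `load_and_split()` returns chunks rather than whole pages. The idea of chunking while keeping page metadata can be sketched in plain Python. This is a simplified fixed-size splitter written for illustration, not the splitter LangChain uses (by default `load_and_split()` relies on a recursive character splitter that prefers to break on separators); `split_pages` and its parameters are hypothetical names.

```python
def split_pages(pages, chunk_size=200, overlap=20):
    """Split page texts into overlapping fixed-size chunks, keeping page numbers."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for page_number, text in enumerate(pages):
        for start in range(0, len(text), step):
            piece = text[start:start + chunk_size]
            chunks.append({"page_content": piece, "metadata": {"page": page_number}})
            # Stop once this chunk reaches the end of the page
            if start + chunk_size >= len(text):
                break
    return chunks


# Two fake "pages" of 450 and 100 characters (placeholder text, not real PDF content)
pages = ["a" * 450, "b" * 100]
chunks = split_pages(pages)
for chunk in chunks:
    print(chunk["metadata"]["page"], len(chunk["page_content"]))
```

The overlap repeats the tail of one chunk at the start of the next, so a sentence cut at a chunk boundary still appears intact in at least one chunk.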

## How to load HTML
LangChain provides several loaders for HTML documents, including UnstructuredHTMLLoader and BSHTMLLoader, which transform HTML content into usable text data. This section demonstrates UnstructuredHTMLLoader.

1- UnstructuredHTMLLoader: This loader extracts the text content from web pages or other HTML files (it requires the unstructured package). Each file is converted into a LangChain document whose page content holds the extracted text and whose metadata records the source, here the file path. For example:\
https://python.langchain.com/v0.1/docs/modules/data_connection/document_loaders/html

In [13]:
from langchain_community.document_loaders import UnstructuredHTMLLoader
loader = UnstructuredHTMLLoader('./data/azure openai.html')
data = loader.load()


In [14]:
print(data)

[Document(metadata={'source': './data/azure openai.html'}, page_content="Accessibility Links\n\nSkip to main contentAccessibility help\n\nAccessibility feedback\n\nSearch Results\n\nHow to generate embeddings with Azure OpenAI Service\n\nMicrosoft Learn\n\nhttps://learn.microsoft.com › en-us › azure › ai-services\n\nMicrosoft Learn\n\nhttps://learn.microsoft.com › en-us › azure › ai-services\n\nAug 29, 2024 — An embedding is a special format of data representation that can be easily utilized by machine learning models and algorithms.\n\nAzure OpenAI Service embeddings tutorial\n\nMicrosoft Learn\n\nhttps://learn.microsoft.com › Learn › Azure › AI Services\n\nMicrosoft Learn\n\nhttps://learn.microsoft.com › Learn › Azure › AI Services\n\nAug 30, 2024 — Learn how to use Azure OpenAI's embeddings API for document search with the BillSum dataset.\n\nPrerequisites\n\nSet up\n\nAzure OpenAI - embeddings and cosine similarity\n\nMicrosoft Learn\n\nhttps://learn.microsoft.com › Learn › Azure ›
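
What an HTML loader does, reducing markup to text plus metadata, can be approximated with the standard library. The sketch below is an assumption-laden illustration, not how UnstructuredHTMLLoader or BSHTMLLoader work internally: `TextExtractor` and `html_to_document` are hypothetical names, and the sample HTML is made up.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text and the <title>, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.title = ""
        self._skip_depth = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        text = data.strip()
        if not text or self._skip_depth:
            return  # ignore whitespace and script/style bodies
        if self._in_title:
            self.title = text
        else:
            self.parts.append(text)


def html_to_document(html, source):
    """Return a dict shaped like a document: text content plus metadata."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return {
        "page_content": "\n\n".join(parser.parts),
        "metadata": {"source": source, "title": parser.title},
    }


sample_html = """<html><head><title>Embeddings</title><style>p {color: red}</style></head>
<body><h1>How to generate embeddings</h1><p>An embedding is a data representation.</p>
<script>console.log("ignored")</script></body></html>"""
doc = html_to_document(sample_html, "sample.html")
print(doc)
```

Dropping script and style bodies matters for retrieval: otherwise JavaScript and CSS would be embedded alongside the readable text.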

## Loading TXT Files 

1. **`TextLoader`**:
   - The `TextLoader` class from `langchain_community.document_loaders` is used to load text (.txt) files.
   - It wraps the content of the file in a `Document` object, which can be used in LangChain for text processing or document retrieval tasks.
   - The `.load()` method reads the file and returns a list containing that single `Document`.

2. **Demonstration**:
   - A sample TXT file is loaded with `TextLoader`, and the first 500 characters of its content are printed.


In [22]:
from langchain_community.document_loaders import TextLoader

# Function to load TXT files using langchain_community loader
def load_txt_file_with_langchain(file_path):
    """
    Load a TXT file and return its content using the langchain_community TextLoader.
    
    Args:
        file_path (str): Path to the TXT file.
    
    Returns:
        list[Document]: A list with one Document holding the content of the TXT file.
    """
    loader = TextLoader(file_path)
    documents = loader.load()
    return documents

# Demonstrating loading a TXT file
txt_file_path = './data/nike_2023_annual_report.txt'  # Replace with your TXT file path
try:
    txt_documents = load_txt_file_with_langchain(txt_file_path)
    print("TXT File Content Using LangChain Community Loader:")
    for doc in txt_documents:
        print(doc.page_content[:500])  # Printing the first 500 characters of each document
except Exception as e:
    print(f"An error occurred while loading the TXT file: {e}")


TXT File Content Using LangChain Community Loader:
FORM 10-K FORM 10-KUNITED STATES
SECURITIES AND EXCHANGE COMMISSION
Washington, D.C. 20549
FORM 10-K 
(Mark One)
☑ ANNUAL REPORT PURSUANT TO SECTION 13 OR 15(D) OF THE SECURITIES EXCHANGE ACT OF 1934
FOR THE FISCAL YEAR ENDED MAY 31, 2023 
OR
☐TRANSITION REPORT PURSUANT TO SECTION 13 OR 15(D) OF THE SECURITIES EXCHANGE ACT OF 1934
FOR THE TRANSITION PERIOD FROM TO .
Commission File No. 1-10635 
NIKE, Inc. 
(Exact name of Registrant as specified in its charter)
Oregon 93-0584541
(State or other j
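
Text files are not always UTF-8, and a wrong encoding is a common cause of load errors. `TextLoader` accepts an `encoding` argument for this; the standard-library sketch below only illustrates the idea of trying several encodings in turn. `read_text_document`, the fallback order, and the extra `encoding` metadata key are assumptions made for this example.

```python
import os
import tempfile


def read_text_document(path, encodings=("utf-8", "latin-1")):
    """Read a text file, trying each encoding until one decodes cleanly."""
    last_error = None
    for encoding in encodings:
        try:
            with open(path, encoding=encoding) as handle:
                text = handle.read()
        except UnicodeDecodeError as error:
            last_error = error
            continue  # try the next candidate encoding
        return {
            "page_content": text,
            "metadata": {"source": path, "encoding": encoding},
        }
    raise ValueError(f"Could not decode {path} with {encodings}") from last_error


# Write a small Latin-1 file that is not valid UTF-8, then load it
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write("café".encode("latin-1"))
    path = tmp.name

doc = read_text_document(path)
os.remove(path)
print(doc["metadata"]["encoding"], doc["page_content"])
```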


## Loading JSON Files

1. **`JSONLoader`**:
   - The `JSONLoader` class loads JSON files and parses them into `Document` objects that can be used in LangChain workflows for retrieval and analysis.
   - It selects the content to load with a jq expression passed as `jq_schema`, so the `jq` Python package must be installed.

## Handling Non-Text Content in JSON Files

1. **`text_content=False`**:
   - By default (`text_content=True`), `JSONLoader` expects the content selected by `jq_schema` to be a string and raises an error otherwise, but JSON content can often be a list or a dictionary.
   - Setting `text_content=False` allows the loader to accept non-string content such as lists or dictionaries.

2. **Converting Content to String**:
   - `page_content` must be a string for `Document` objects in LangChain, so the code checks whether the content is a list or a dictionary.
   - Any such non-text content is converted to a string using `str()` before it is displayed or processed. Recent versions of the loader appear to serialize non-string content themselves, in which case this check acts only as a safeguard.

3. **Demonstration**:
   - The sample JSON file is loaded with the updated loader.
   - If the content is a list or dictionary, it is converted to a string for display, and the first 500 characters of each document are printed.



In [26]:
from langchain_community.document_loaders import JSONLoader

# Function to load JSON files using langchain_community loader
def load_json_file_with_langchain(file_path, jq_schema):
    """
    Load a JSON file and return its content using the langchain_community JSONLoader.
    
    Args:
        file_path (str): Path to the JSON file.
        jq_schema (str): JQ schema to extract specific fields from the JSON file.
    
    Returns:
        list[Document]: One Document per item selected by the JQ schema.
    """
    # Set `text_content=False` to handle non-string content
    loader = JSONLoader(file_path, jq_schema=jq_schema, text_content=False)
    documents = loader.load()
    
    # Converting list or dict content into string if necessary
    for doc in documents:
        if isinstance(doc.page_content, (list, dict)):
            doc.page_content = str(doc.page_content)  # Convert list/dict to string for display purposes
    return documents

# Demonstrating loading a JSON file
json_file_path = './data/q_a.json'  # Replace with your JSON file path
jq_schema = '.'  # Replace with JQ schema ('.' extracts the entire content of the file)

try:
    json_documents = load_json_file_with_langchain(json_file_path, jq_schema)
    print("JSON File Content Using LangChain Community Loader:")
    for doc in json_documents:
        print(doc.page_content[:500])  # Printing the first 500 characters of each document
except Exception as e:
    print(f"An error occurred while loading the JSON file: {e}")


JSON File Content Using LangChain Community Loader:
[{'question': 'What does climate change refer to?', 'answer': 'Climate change refers to significant, long-term changes in the global climate.'}, {'question': "What encompasses the planet's overall weather patterns?", 'answer': "The term 'global climate' encompasses the planet's overall weather patterns, including temperature, precipitation, and wind patterns, over an extended period."}, {'question': 'What activities have significantly contributed to climate change over the past century?', 'answe
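
With `jq_schema = '.'`, the whole array ends up in a single document, as the output above shows. For retrieval it is often more useful to get one document per record, which a jq expression such as `.[]` would select. The sketch below emulates that per-record split in plain Python rather than calling `JSONLoader`; `records_to_documents` is a hypothetical helper, and the two records are taken from the output above.

```python
import json


def records_to_documents(json_text, source="q_a.json"):
    """Emit one document per element of a top-level JSON array."""
    records = json.loads(json_text)
    documents = []
    for seq_num, record in enumerate(records, start=1):
        # Strings are kept as-is; dicts and lists are serialized to JSON text
        content = record if isinstance(record, str) else json.dumps(record)
        documents.append({
            "page_content": content,
            "metadata": {"source": source, "seq_num": seq_num},
        })
    return documents


sample_json = json.dumps([
    {"question": "What does climate change refer to?",
     "answer": "Climate change refers to significant, long-term changes in the global climate."},
    {"question": "What encompasses the planet's overall weather patterns?",
     "answer": "The term 'global climate' encompasses the planet's overall weather patterns, "
               "including temperature, precipitation, and wind patterns, over an extended period."},
])
docs = records_to_documents(sample_json)
for doc in docs:
    print(doc["metadata"]["seq_num"], doc["page_content"][:60])
```

Serializing with `json.dumps` instead of `str()` keeps the content valid JSON, so a later step can parse it back if needed.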
