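# Arxiv Loader

## Overview

[```arXiv```](https://arxiv.org/) is an open-access archive of scholarly articles covering physics, mathematics, computer science, quantitative biology, quantitative finance, statistics, electrical engineering and systems science, and economics. It hosts more than 2 million papers.

👉 [API Reference](https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.arxiv.ArxivLoader.html#langchain_community.document_loaders.arxiv.ArxivLoader)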

To use the Arxiv document loader, you need to install the following packages:

- ```arxiv```
- ```PyMuPDF```
- ```langchain-community``` integration package

```PyMuPDF``` converts the PDF papers downloaded from arxiv.org into text.
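Before running the loader, it can help to confirm that the three packages are importable. The sketch below is a minimal check using only the standard library. The helper name `check_packages` is hypothetical, and the import names are assumptions about a typical installation (in particular, `PyMuPDF` is commonly imported as `fitz`).

```python
from importlib.util import find_spec

# Map pip package names to the module names they are imported under.
# Assumption: PyMuPDF is importable as `fitz`.
REQUIRED_MODULES = {
    "arxiv": "arxiv",
    "PyMuPDF": "fitz",
    "langchain-community": "langchain_community",
}


def check_packages(modules=REQUIRED_MODULES):
    """Return {pip_name: bool}, True when the module can be found."""
    return {
        pip_name: find_spec(module_name) is not None
        for pip_name, module_name in modules.items()
    }


status = check_packages()
for pip_name, installed in status.items():
    print(f"{pip_name}: {'installed' if installed else 'missing'}")
```

Any package reported as missing can be installed with the setup cells in the Environment Setup section.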

---

### Table of Contents

- [Overview](#overview)
- [Environment Setup](#environment-setup)
- [Arxiv Loader Instantiate](#arxiv-loader-instantiate)
- [Load](#load)
- [Lazy Load](#lazy-load)
- [Asynchronous Load](#asynchronous-load)
- [Use Summaries of Articles as Docs](#use-summaries-of-articles-as-docs)

### References

- [ArxivLoader API Reference](https://python.langchain.com/api_reference/community/document_loaders/langchain_community.document_loaders.arxiv.ArxivLoader.html#langchain_community.document_loaders.arxiv.ArxivLoader)
- [arXiv API Access](https://info.arxiv.org/help/api/index.html)

---

## Environment Setup

Set up the environment. You may refer to [Environment Setup](https://wikidocs.net/257836) for more details.

**[Note]**
- ```langchain-opentutorial``` is a package that provides easy-to-use environment setup helpers, functions, and utilities for the tutorials.
- See [```langchain-opentutorial```](https://github.com/LangChain-OpenTutorial/langchain-opentutorial-pypi) for more details.

In [1]:
%%capture --no-stderr
%pip install langchain-opentutorial

In [2]:
# Install required packages
from langchain_opentutorial import package

package.install(
    [
        "langchain-community",
        "arxiv",
        "pymupdf",
    ],
    verbose=False,
    upgrade=False,
)


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.3.2[0m[39;49m -> [0m[32;49m24.3.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


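## Arxiv Loader Instantiate

You can create an Arxiv loader instance to load documents from arxiv.org.

Initialize the loader with a search query to find and load matching papers from arxiv.org.

The loader accepts all arguments of ```ArxivAPIWrapper```, so you can customize the search behavior and the scope of the results.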

In [3]:
from langchain_community.document_loaders import ArxivLoader

# Enter the research topic you want to search for in the query parameter
loader = ArxivLoader(
    query="Chain of thought",
    load_max_docs=2,  # max number of documents
    load_all_available_meta=True,  # load all available metadata
)

### Load

Use the ```load``` method of the ```ArxivLoader``` instance to load documents from arxiv.org.

In [4]:
# Print the first document's content and metadata
docs = loader.load()
print(docs[0].page_content[:100])
print(docs[0].metadata)

Contrastive Chain-of-Thought Prompting
Yew Ken Chia∗1,
Guizhen Chen∗1, 2
Luu Anh Tuan2
Soujanya Pori
{'Published': '2023-11-15', 'Title': 'Contrastive Chain-of-Thought Prompting', 'Authors': 'Yew Ken Chia, Guizhen Chen, Luu Anh Tuan, Soujanya Poria, Lidong Bing', 'Summary': 'Despite the success of chain of thought in enhancing language model\nreasoning, the underlying process remains less well understood. Although\nlogically sound reasoning appears inherently crucial for chain of thought,\nprior studies surprisingly reveal minimal impact when using invalid\ndemonstrations instead. Furthermore, the conventional chain of thought does not\ninform language models on what mistakes to avoid, which potentially leads to\nmore errors. Hence, inspired by how humans can learn from both positive and\nnegative examples, we propose contrastive chain of thought to enhance language\nmodel reasoning. Compared to the conventional chain of thought, our approach\nprovides both valid and invalid reasoning 

- If ```load_all_available_meta``` is ```False```, only a subset of the metadata is returned rather than the full set.

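### Lazy Load

When loading a large number of documents, if the downstream task only needs a subset of them, you can use the ```lazy_load``` method to load documents one at a time and reduce memory usage.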

In [5]:
docs = []
docs_lazy = loader.lazy_load()

# Append each document to the docs list as it is yielded
# Async variant: async for doc in loader.alazy_load(): ...

for doc in docs_lazy:
    docs.append(doc)

print(docs[0].page_content[:100])
print(docs[0].metadata)

Contrastive Chain-of-Thought Prompting
Yew Ken Chia∗1,
Guizhen Chen∗1, 2
Luu Anh Tuan2
Soujanya Pori
{'Published': '2023-11-15', 'Title': 'Contrastive Chain-of-Thought Prompting', 'Authors': 'Yew Ken Chia, Guizhen Chen, Luu Anh Tuan, Soujanya Poria, Lidong Bing', 'Summary': 'Despite the success of chain of thought in enhancing language model\nreasoning, the underlying process remains less well understood. Although\nlogically sound reasoning appears inherently crucial for chain of thought,\nprior studies surprisingly reveal minimal impact when using invalid\ndemonstrations instead. Furthermore, the conventional chain of thought does not\ninform language models on what mistakes to avoid, which potentially leads to\nmore errors. Hence, inspired by how humans can learn from both positive and\nnegative examples, we propose contrastive chain of thought to enhance language\nmodel reasoning. Compared to the conventional chain of thought, our approach\nprovides both valid and invalid reasoning 

In [6]:
len(docs)

3
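Because a lazy loader yields documents one at a time, a consumer can stop early and skip the remaining work. The sketch below illustrates this with `itertools.islice`. The generator `fake_lazy_load` is a hypothetical stand-in for `loader.lazy_load()`, so the example runs without network access.

```python
from itertools import islice
from typing import Iterator


def fake_lazy_load(total: int) -> Iterator[str]:
    """Hypothetical stand-in for `loader.lazy_load()`: yields items on demand."""
    for i in range(total):
        print(f"fetching document {i}")
        yield f"document {i}"


# Take only the first two items; the remaining 98 are never produced.
first_two = list(islice(fake_lazy_load(100), 2))
print(first_two)
```

With a real loader, the same pattern would be `list(islice(loader.lazy_load(), 2))`, assuming the loader produces documents incrementally.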

### Asynchronous Load

Use the ```aload``` method to load documents from arxiv.org asynchronously.

In [7]:
docs = await loader.aload()
print(docs[0].page_content[:100])
print(docs[0].metadata)

Contrastive Chain-of-Thought Prompting
Yew Ken Chia∗1,
Guizhen Chen∗1, 2
Luu Anh Tuan2
Soujanya Pori
{'Published': '2023-11-15', 'Title': 'Contrastive Chain-of-Thought Prompting', 'Authors': 'Yew Ken Chia, Guizhen Chen, Luu Anh Tuan, Soujanya Poria, Lidong Bing', 'Summary': 'Despite the success of chain of thought in enhancing language model\nreasoning, the underlying process remains less well understood. Although\nlogically sound reasoning appears inherently crucial for chain of thought,\nprior studies surprisingly reveal minimal impact when using invalid\ndemonstrations instead. Furthermore, the conventional chain of thought does not\ninform language models on what mistakes to avoid, which potentially leads to\nmore errors. Hence, inspired by how humans can learn from both positive and\nnegative examples, we propose contrastive chain of thought to enhance language\nmodel reasoning. Compared to the conventional chain of thought, our approach\nprovides both valid and invalid reasoning 
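The top-level `await` in the cell above works in a notebook, which already runs an event loop. In a plain Python script, the call has to be wrapped in a coroutine and started with `asyncio.run`. The sketch below shows that pattern. `fake_aload` is a hypothetical stand-in for `loader.aload()` and returns placeholder strings, so it runs without network access.

```python
import asyncio


async def fake_aload(query: str) -> list:
    """Hypothetical stand-in for `loader.aload()`; returns placeholder text."""
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    return [f"placeholder document for {query!r}"]


async def main() -> list:
    # In real code this line would be: docs = await loader.aload()
    docs = await fake_aload("Chain of thought")
    return docs


# asyncio.run creates an event loop, runs main() to completion, and closes the loop.
docs = asyncio.run(main())
print(docs)
```

Do not call `asyncio.run` inside a notebook cell, because a loop is already running there. Use `await` directly, as in the cell above.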

## Use Summaries of Articles as Docs

Use the ```get_summaries_as_docs``` method to retrieve article summaries as documents.

In [8]:
from langchain_community.document_loaders import ArxivLoader

loader = ArxivLoader(
    query="reasoning"
)

docs = loader.get_summaries_as_docs()
print(docs[0].page_content[:100])
print(docs[0].metadata)

Large language models (LLMs) have demonstrated impressive reasoning
abilities, but they still strugg
{'Entry ID': 'http://arxiv.org/abs/2410.13080v1', 'Published': datetime.date(2024, 10, 16), 'Title': 'Graph-constrained Reasoning: Faithful Reasoning on Knowledge Graphs with Large Language Models', 'Authors': 'Linhao Luo, Zicheng Zhao, Chen Gong, Gholamreza Haffari, Shirui Pan'}
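The metadata of a summary document can be turned into a short reference line. The sketch below uses the metadata printed above as sample input. The helper `format_reference` is hypothetical, and it assumes a new-style arXiv identifier (no slash inside the ID) and a comma-separated `Authors` string, as in this output.

```python
import datetime

# Sample metadata copied from the output above
metadata = {
    "Entry ID": "http://arxiv.org/abs/2410.13080v1",
    "Published": datetime.date(2024, 10, 16),
    "Title": "Graph-constrained Reasoning: Faithful Reasoning on Knowledge Graphs with Large Language Models",
    "Authors": "Linhao Luo, Zicheng Zhao, Chen Gong, Gholamreza Haffari, Shirui Pan",
}


def format_reference(meta: dict) -> str:
    """Build a one-line reference from summary-document metadata."""
    # The identifier is the last path segment of the entry URL.
    arxiv_id = meta["Entry ID"].rsplit("/", 1)[-1]
    first_author = meta["Authors"].split(", ")[0]
    year = meta["Published"].year
    return f"{first_author} et al. ({year}). {meta['Title']}. arXiv:{arxiv_id}"


reference = format_reference(metadata)
print(reference)
```

With real results, the same helper could be applied as `format_reference(doc.metadata)` for each document returned by `get_summaries_as_docs`.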
