# Arxiv Loader

[`arXiv`](https://arxiv.org/) is an open-access archive of 2 million scholarly articles in the fields of physics, mathematics, computer science, quantitative biology, quantitative finance, statistics, electrical engineering and systems science, and economics.

[API Reference](https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.arxiv.ArxivLoader.html#langchain_community.document_loaders.arxiv.ArxivLoader)

To use the Arxiv document loader, you need to install the `arxiv`, `PyMuPDF`, and `langchain-community` integration packages.

`PyMuPDF` converts the PDF files downloaded from arxiv.org into text.

```bash
pip install arxiv pymupdf langchain-community
```


## Instantiate the Arxiv Loader

You can create an `ArxivLoader` instance to load documents from arxiv.org.

Initialize it with a search query to find documents on arxiv.org.
It supports all arguments of `ArxivAPIWrapper`.


In [2]:
from langchain_community.document_loaders import ArxivLoader

# Enter the research topic you want to search for in the query parameter
loader = ArxivLoader(
    query="Chain of thought",
    load_max_docs=2,  # max number of documents
    load_all_available_meta=True,  # load all available metadata
)

### Load

Use the `load` method to load documents from arxiv.org with an `ArxivLoader` instance.

In [3]:
# Print the first document's content and metadata
docs = loader.load()
print(docs[0].page_content[:100])
print(docs[0].metadata)

Contrastive Chain-of-Thought Prompting
Yew Ken Chia∗1,
Guizhen Chen∗1, 2
Luu Anh Tuan2
Soujanya Pori
{'Published': '2023-11-15', 'Title': 'Contrastive Chain-of-Thought Prompting', 'Authors': 'Yew Ken Chia, Guizhen Chen, Luu Anh Tuan, Soujanya Poria, Lidong Bing', 'Summary': 'Despite the success of chain of thought in enhancing language model\nreasoning, the underlying process remains less well understood. Although\nlogically sound reasoning appears inherently crucial for chain of thought,\nprior studies surprisingly reveal minimal impact when using invalid\ndemonstrations instead. Furthermore, the conventional chain of thought does not\ninform language models on what mistakes to avoid, which potentially leads to\nmore errors. Hence, inspired by how humans can learn from both positive and\nnegative examples, we propose contrastive chain of thought to enhance language\nmodel reasoning. Compared to the conventional chain of thought, our approach\nprovides both valid and invalid reasoning 

- If `load_all_available_meta` is `False`, only a subset of the metadata is returned.

### Lazy Load

When loading a large number of documents, if your downstream tasks can run on a subset of them, you can use `lazy_load` to load documents one at a time and minimize memory usage.

In [4]:
docs = []
docs_lazy = loader.lazy_load()

# append each document to the docs list
# async variant: async for doc in loader.alazy_load(): ...

for doc in docs_lazy:
    docs.append(doc)

print(docs[0].page_content[:100])
print(docs[0].metadata)

Contrastive Chain-of-Thought Prompting
Yew Ken Chia∗1,
Guizhen Chen∗1, 2
Luu Anh Tuan2
Soujanya Pori
{'Published': '2023-11-15', 'Title': 'Contrastive Chain-of-Thought Prompting', 'Authors': 'Yew Ken Chia, Guizhen Chen, Luu Anh Tuan, Soujanya Poria, Lidong Bing', 'Summary': 'Despite the success of chain of thought in enhancing language model\nreasoning, the underlying process remains less well understood. Although\nlogically sound reasoning appears inherently crucial for chain of thought,\nprior studies surprisingly reveal minimal impact when using invalid\ndemonstrations instead. Furthermore, the conventional chain of thought does not\ninform language models on what mistakes to avoid, which potentially leads to\nmore errors. Hence, inspired by how humans can learn from both positive and\nnegative examples, we propose contrastive chain of thought to enhance language\nmodel reasoning. Compared to the conventional chain of thought, our approach\nprovides both valid and invalid reasoning 

In [5]:
len(docs)

3
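
Because `lazy_load` yields documents one at a time, a downstream step can stop early instead of building the full list first. The sketch below illustrates that pattern offline. `Doc` and `fake_lazy_load` are hypothetical stand-ins (not part of LangChain) that mimic the `page_content` / `metadata` shape shown above; with a real loader you would pass `loader.lazy_load()` to `first_titles` instead.

```python
from dataclasses import dataclass, field
from itertools import islice
from typing import Iterable, Iterator


@dataclass
class Doc:
    """Hypothetical stand-in with the same attributes as a loaded document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def fake_lazy_load(n: int) -> Iterator[Doc]:
    """Hypothetical stand-in for loader.lazy_load(): yields one document at a time."""
    for i in range(n):
        yield Doc(page_content=f"Full text of paper {i}", metadata={"Title": f"Paper {i}"})


def first_titles(docs: Iterable[Doc], limit: int) -> list[str]:
    """Consume at most `limit` documents and keep only their titles."""
    return [doc.metadata["Title"] for doc in islice(docs, limit)]


# Only two documents are ever produced, even though 1000 are available.
titles = first_titles(fake_lazy_load(1000), limit=2)
print(titles)
```

Keeping only the fields you need (here, the title) is what keeps memory usage low; appending every document to a list, as in the cell above, gives up that benefit.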

### Asynchronous Load

Use the `aload` method to load documents from arxiv.org asynchronously.

In [6]:
docs = await loader.aload()
print(docs[0].page_content[:100])
print(docs[0].metadata)

Contrastive Chain-of-Thought Prompting
Yew Ken Chia∗1,
Guizhen Chen∗1, 2
Luu Anh Tuan2
Soujanya Pori
{'Published': '2023-11-15', 'Title': 'Contrastive Chain-of-Thought Prompting', 'Authors': 'Yew Ken Chia, Guizhen Chen, Luu Anh Tuan, Soujanya Poria, Lidong Bing', 'Summary': 'Despite the success of chain of thought in enhancing language model\nreasoning, the underlying process remains less well understood. Although\nlogically sound reasoning appears inherently crucial for chain of thought,\nprior studies surprisingly reveal minimal impact when using invalid\ndemonstrations instead. Furthermore, the conventional chain of thought does not\ninform language models on what mistakes to avoid, which potentially leads to\nmore errors. Hence, inspired by how humans can learn from both positive and\nnegative examples, we propose contrastive chain of thought to enhance language\nmodel reasoning. Compared to the conventional chain of thought, our approach\nprovides both valid and invalid reasoning 
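
A top-level `await`, as in the cell above, works in a notebook because an event loop is already running. In a plain Python script you would typically wrap the call in a coroutine and start it with `asyncio.run`. The following is a minimal sketch: `StubLoader` is a hypothetical stand-in with an `aload` coroutine so the example runs offline; pass an `ArxivLoader` instance to `main` to query arxiv.org instead.

```python
import asyncio


class StubLoader:
    """Hypothetical stand-in exposing an `aload` coroutine like a document loader."""

    async def aload(self) -> list[str]:
        await asyncio.sleep(0)  # simulate waiting on I/O
        return ["doc-1", "doc-2"]


async def main(loader) -> list[str]:
    # `await` is only valid inside a coroutine when running as a script.
    return await loader.aload()


docs = asyncio.run(main(StubLoader()))
print(len(docs))
```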

## Use Summaries of Articles as Docs

Use the `get_summaries_as_docs` method to get the summaries of articles as documents.

In [7]:
from langchain_community.document_loaders import ArxivLoader

loader = ArxivLoader(
    query="reasoning"
)

docs = loader.get_summaries_as_docs()
print(docs[0].page_content[:100])
print(docs[0].metadata)

Large language models (LLMs) have demonstrated impressive reasoning
abilities, but they still strugg
{'Entry ID': 'http://arxiv.org/abs/2410.13080v1', 'Published': datetime.date(2024, 10, 16), 'Title': 'Graph-constrained Reasoning: Faithful Reasoning on Knowledge Graphs with Large Language Models', 'Authors': 'Linhao Luo, Zicheng Zhao, Chen Gong, Gholamreza Haffari, Shirui Pan'}
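
Note that `Published` in the output above is a `datetime.date`, whereas the `load` output earlier shows it as a string such as `'2023-11-15'`. If downstream code mixes documents from both methods, normalizing the field may help. A minimal sketch, where `published_iso` is a hypothetical helper and not part of LangChain:

```python
from datetime import date
from typing import Union


def published_iso(value: Union[str, date]) -> str:
    """Return a publication date as an ISO 'YYYY-MM-DD' string."""
    if isinstance(value, date):
        return value.isoformat()
    # Validate strings by round-tripping them through date.fromisoformat.
    return date.fromisoformat(value).isoformat()


print(published_iso(date(2024, 10, 16)))  # 2024-10-16
print(published_iso("2023-11-15"))        # 2023-11-15
```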
