# Paper Creeper
## Introduction
Paper Creeper is a simple literature-search assistant that uses an LLM to search arXiv for scientific papers.

## Installation

### Get an API key
OpenAI, DeepSeek, and third-party proxy providers (for example, ChatAnyWhere) are supported.

### Install dependencies

In [1]:
! pip install arxiv openai python-dotenv


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.3.1[0m[39;49m -> [0m[32;49m25.0[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


Or:

In [2]:
! pip install -r requirements.txt


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.3.1[0m[39;49m -> [0m[32;49m25.0[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


## Running

### External functions

In [8]:
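# Explicit imports for names used by the main function below
import logging
from dotenv import load_dotenv
from llm_util import *
from arxiv_util import *
from ResearchAgent import *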

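### Main function
Parameters:
- `model_name`: name of the LLM model
- `user_input`: the literature-search query, written in natural language
- `num_each_query`: number of papers to retrieve for each generated search query (default 5)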

In [9]:
from typing import Optional

def search_and_summary(model_name: str = "gpt-4o", user_input: Optional[str] = None, num_each_query: int = 5):
    """Search arXiv for `user_input` and return the agent's summary, or None on failure."""
    load_dotenv()  # read API keys and base URLs from the .env file
    llm = LLM_client(model_name)
    agent = ResearchAgent(llm)
    logging.info("Starting main function, user input: %s", user_input)
    try:
        return agent.search_and_summarize(user_input, num_each_query)
    except Exception as e:
        logging.error("An error occurred: %s", e)
        return None

### Set the model name
Model names supported by default:
- deepseek-chat: requires `DEEPSEEK_API_KEY` and `DEEPSEEK_BASE_URL` to be set in the `.env` file
- GPT*: requires `OPENAI_API_KEY` and `OPENAI_BASE_URL` to be set in the `.env` file
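As a quick sanity check before running the notebook, the sketch below reports which of the variables listed above are missing for a given model name. The helpers `required_env_vars` and `missing_env_vars` are hypothetical and not part of Paper Creeper, and the prefix matching (names starting with `deepseek` or `gpt`) is only an assumption about how the `GPT*` pattern is meant to be read.

```python
import os

# Variable names come from the list above; the prefix-based mapping is an assumption.
ENV_VARS_BY_PREFIX = {
    "deepseek": ["DEEPSEEK_API_KEY", "DEEPSEEK_BASE_URL"],
    "gpt": ["OPENAI_API_KEY", "OPENAI_BASE_URL"],
}


def required_env_vars(model_name: str) -> list[str]:
    """Return the environment variables a model name is expected to need."""
    lowered = model_name.lower()
    for prefix, names in ENV_VARS_BY_PREFIX.items():
        if lowered.startswith(prefix):
            return list(names)
    return []  # unknown model: nothing to check in this sketch


def missing_env_vars(model_name: str, env=None) -> list[str]:
    """Return the required variables that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required_env_vars(model_name) if not env.get(name)]


# With an empty environment, both OpenAI variables are reported as missing.
print(missing_env_vars("gpt-4o", env={}))
```

In the notebook, call `load_dotenv()` first and omit the `env` argument so that the check runs against the real environment.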

In [10]:
# Settings
model_name = "gpt-4o"

In [11]:
# display util
from IPython.display import Markdown, display

def display_response_as_markdown(response):
    display(Markdown(response))


### Search for papers

In [12]:
# Search and display the summarized results
response = search_and_summary(model_name,
                              user_input="""
Find a paper about the application of non-negative matrix decomposition to text clustering or image analysis.""",
                              num_each_query=3)
display_response_as_markdown(response)


2025-02-09 21:03:17,585 - INFO - Starting main function, user input: 
Find a paper about the application of non-negative matrix decomposition to text clustering or image analysis.
2025-02-09 21:03:17,599 - INFO - Starting search and summarization.
2025-02-09 21:03:17,605 - INFO - Analyzing user query.
2025-02-09 21:03:17,612 - INFO - Extracting keywords from user input.
2025-02-09 21:03:18,995 - INFO - HTTP Request: POST https://openrouter.ai/api/v1/chat/completions "HTTP/1.1 200 OK"
2025-02-09 21:03:20,928 - INFO - Extracting dates from user input.
2025-02-09 21:03:21,256 - INFO - HTTP Request: POST https://openrouter.ai/api/v1/chat/completions "HTTP/1.1 200 OK"
2025-02-09 21:03:22,083 - INFO - Identifying research field from user input.
2025-02-09 21:03:22,381 - INFO - HTTP Request: POST https://openrouter.ai/api/v1/chat/completions "HTTP/1.1 200 OK"
2025-02-09 21:03:23,269 - INFO - Extracted query keywords: ['non-negative matrix decomposition text clustering', 'non-negative matrix factorization image analysis', 'applications of non-negative matrix decomposition in clustering', 'non-negative matrix factorization for image clustering', '

2025-02-09 21:03:27,052 - INFO - Got first page: 100 of 374141 total results
2025-02-09 21:03:27,064 - INFO - Requesting page (first: True, try: 0): https://export.arxiv.org/api/query?search_query=non-negative+matrix+factorization+image+analysis&id_list=&sortBy=relevance&sortOrder=descending&start=0&max_results=100


2025-02-09 21:03:29,615 - INFO - Got first page: 100 of 838857 total results
2025-02-09 21:03:29,619 - INFO - Requesting page (first: True, try: 0): https://export.arxiv.org/api/query?search_query=applications+of+non-negative+matrix+decomposition+in+clustering&id_list=&sortBy=relevance&sortOrder=descending&start=0&max_results=100


2025-02-09 21:03:32,458 - INFO - Got first page: 100 of 2655713 total results
2025-02-09 21:03:32,463 - INFO - Requesting page (first: True, try: 0): https://export.arxiv.org/api/query?search_query=non-negative+matrix+factorization+for+image+clustering&id_list=&sortBy=relevance&sortOrder=descending&start=0&max_results=100


2025-02-09 21:03:35,240 - INFO - Got first page: 100 of 2194061 total results
2025-02-09 21:03:35,264 - INFO - Requesting page (first: True, try: 0): https://export.arxiv.org/api/query?search_query=text+clustering+using+non-negative+matrix+decomposition&id_list=&sortBy=relevance&sortOrder=descending&start=0&max_results=100


2025-02-09 21:03:38,293 - INFO - Got first page: 100 of 1403851 total results
2025-02-09 21:03:38,295 - INFO - Requesting page (first: True, try: 0): https://export.arxiv.org/api/query?search_query=image+analysis+with+non-negative+matrix+factorization&id_list=&sortBy=relevance&sortOrder=descending&start=0&max_results=100


2025-02-09 21:03:41,946 - INFO - Got first page: 100 of 2087920 total results
2025-02-09 21:03:41,953 - INFO - Removing early search results.
2025-02-09 21:03:41,957 - INFO - Re-ranking search results.
2025-02-09 21:03:42,637 - INFO - HTTP Request: POST https://openrouter.ai/api/v1/chat/completions "HTTP/1.1 200 OK"
2025-02-09 21:03:43,853 - INFO - HTTP Request: POST https://openrouter.ai/api/v1/chat/completions "HTTP/1.1 200 OK"
2025-02-09 21:03:44,965 - INFO - HTTP Request: POST https://openrouter.ai/api/v1/chat/completions "HTTP/1.1 200 OK"
2025-02-09 21:03:46,075 - INFO - HTTP Request: POST https://openrouter.ai/api/v1/chat/completions "HTTP/1.1 200 OK"
2025-02-09 21:03:47,202 - INFO - HTTP Request: POST https://openrouter.ai/api/v1/chat/completions "HTTP/1.1 200 OK"
2025-02-09 21:03:48,279 - INFO - HTTP Request: POST https://openrouter.ai/api/v1/chat/completions "HTTP/1.1 200 OK"
2025-02-09 21:03:49,544 - INFO - HTTP Request: POST https://openrouter.ai/api/v1/chat/completions "HTTP/1.1 200 OK"
2025-02-09 21:03:50,444 

**Title:** Image Analysis Based on Nonnegative/Binary Matrix Factorization
**Authors:** Hinako Asaoka, Kazue Kudo
**Published:** 2020
**Summary:** Using nonnegative/binary matrix factorization (NBMF), a matrix can be decomposed into a nonnegative matrix and a binary matrix. Our analysis of facial images, based on NBMF and using the Fujitsu Digital Annealer, leads to successful image reconstruction and image classification. The NBMF algorithm converges in fewer iterations than those required for the convergence of nonnegative matrix factorization (NMF), although both techniques perform comparably in image classification.
**URL:** http://arxiv.org/abs/2007.00889v1
**Title:** Matrix Factorization-Based Clustering Of Image Features For Bandwidth-Constrained Information Retrieval
**Authors:** Jacob Chakareski, Immanuel Manohar, Shantanu Rane
**Published:** 2016
**Summary:** We consider the problem of accurately and efficiently querying a remote server to retrieve information about images ca

2025-02-09 21:04:01,800 - INFO - HTTP Request: POST https://openrouter.ai/api/v1/chat/completions "HTTP/1.1 200 OK"
2025-02-09 21:04:22,411 - INFO - Search and summary completed.
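
The request URLs in the log above follow the public arXiv API query format. As a rough illustration only (Paper Creeper itself appears to go through the `arxiv` package installed earlier), the sketch below rebuilds one of those URLs with the standard library. The function name `build_arxiv_query_url` is hypothetical.

```python
from urllib.parse import urlencode

ARXIV_API_ENDPOINT = "https://export.arxiv.org/api/query"


def build_arxiv_query_url(query: str, start: int = 0, max_results: int = 100) -> str:
    """Build an arXiv API URL sorted by relevance, mirroring the logged requests."""
    params = {
        "search_query": query,
        "id_list": "",
        "sortBy": "relevance",
        "sortOrder": "descending",
        "start": start,
        "max_results": max_results,
    }
    # urlencode uses quote_plus, so spaces become "+" as in the log.
    return f"{ARXIV_API_ENDPOINT}?{urlencode(params)}"


url = build_arxiv_query_url("non-negative matrix factorization image analysis")
print(url)
```

The very large "total results" counts in the log are consistent with free-text queries like this one matching on any of the words, which is one reason a re-ranking step is useful afterwards.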


# Summary

The selected papers primarily explore various matrix factorization techniques applied to image analysis and clustering tasks, with some extending to text data dimensionality reduction. Below is a detailed analysis of each paper’s research direction, innovations, interconnections, and AI’s potential role in further advancing these areas.

1. **Image Analysis Based on Nonnegative/Binary Matrix Factorization**

   **Main Research Direction and Contributions:** This paper investigates the use of Nonnegative/Binary Matrix Factorization (NBMF) for image analysis, specifically focusing on facial image reconstruction and classification. The authors leverage the Fujitsu Digital Annealer to enhance the performance of the NBMF algorithm.
   
   **Innovations and Key Findings:** The NBMF algorithm shows quicker convergence compared to traditional Nonnegative Matrix Factorization (NMF) while achieving similar classification results. This efficiency in convergence is a notable advantage.
   
   **Connections and Relationships:** This paper shares a thematic link with other works focused on matrix factorization for image analysis, particularly those assessing computational efficiency and accuracy.

   **Relevance to the Broader Field:** The research contributes to the broader field of computer vision by potentially improving the speed and efficiency of image processing tasks, which are crucial for real-time applications.
   
   **AI Insights:** Future AI advancements could explore integrating NBMF with more sophisticated deep learning models to enhance facial recognition systems.
   
   **Score:** 9.0/10  
   **Download:** [arXiv](http://arxiv.org/abs/2007.00889v1)

2. **Matrix Factorization-Based Clustering Of Image Features For Bandwidth-Constrained Information Retrieval**

   **Main Research Direction and Contributions:** This paper addresses the challenge of efficient image retrieval under bandwidth constraints by employing matrix factorization for clustering image features. The focus is on reducing the computational and transmission overhead in querying remote servers.
   
   **Innovations and Key Findings:** By combining Principal Component Analysis (PCA) with Non-negative Matrix Factorization (NMF), the authors achieve more accurate image retrieval than using either method alone. The proposed method significantly reduces computational complexity by clustering features rather than processing them individually.
   
   **Connections and Relationships:** The use of NMF aligns this paper with others in the collection that address image analysis through matrix factorization, though it uniquely emphasizes bandwidth and computational efficiency.
   
   **Relevance to the Broader Field:** This work is highly relevant for applications in mobile and remote sensing technologies where bandwidth and computational resources are limited.
   
   **AI Insights:** AI techniques such as neural network-based feature extraction could further enhance the clustering and retrieval accuracy while maintaining low computational demands.
   
   **Score:** 9.0/10  
   **Download:** [arXiv](http://arxiv.org/abs/1605.02140v1)

3. **Deep Approximately Orthogonal Nonnegative Matrix Factorization for Clustering**

   **Main Research Direction and Contributions:** This paper proposes a deep variant of NMF that incorporates orthogonality constraints to improve clustering performance. The method leverages hierarchical data abstractions typical of deep learning models.
   
   **Innovations and Key Findings:** The integration of orthogonality into NMF allows for better separation of features, enhancing clustering accuracy over traditional NMF methods.
   
   **Connections and Relationships:** This work connects with the broader theme of improving matrix factorization techniques, particularly by incorporating deep learning architectures to tackle clustering tasks.
   
   **Relevance to the Broader Field:** Enhancing clustering accuracy is crucial in fields like image processing and bioinformatics, where precise grouping of data is essential.
   
   **AI Insights:** Further exploration could involve integrating these techniques with deep neural networks to harness their hierarchical feature extraction capabilities.
   
   **Score:** 8.0/10  
   **Download:** [arXiv](http://arxiv.org/abs/1711.07437v1)

4. **Robust Non-Linear Matrix Factorization for Dictionary Learning, Denoising, and Clustering**

   **Main Research Direction and Contributions:** The paper introduces a robust nonlinear matrix factorization method (RNLMF) for handling sparse noise and outliers in data, applicable to tasks like denoising and clustering.
   
   **Innovations and Key Findings:** RNLMF's ability to decompose data into sparse noise and clean components is a significant advancement, particularly for noisy datasets.
   
   **Connections and Relationships:** While focusing on robustness against noise, this paper complements others by addressing the limitations of linear factorization methods in handling real-world data imperfections.
   
   **Relevance to the Broader Field:** The method is highly applicable in scenarios where data is prone to noise, such as medical imaging and sensor data analysis.
   
   **AI Insights:** Machine learning techniques could improve RNLMF by learning adaptive thresholds for noise detection and removal.
   
   **Score:** 6.0/10  
   **Download:** [arXiv](http://arxiv.org/abs/2005.01317v2)

5. **Application of Fuzzy Clustering for Text Data Dimensionality Reduction**

   **Main Research Direction and Contributions:** This paper explores fuzzy clustering as a strategy for dimensionality reduction in text data, addressing issues of sparsity and high dimensionality in document-term matrices.
   
   **Innovations and Key Findings:** The application of fuzzy clustering demonstrates superior performance over traditional methods like PCA and SVD in reducing text data dimensionality.
   
   **Connections and Relationships:** Although primarily focused on text data, the paper is indirectly related to others through its goal of efficient data representation, a common theme in matrix factorization research.
   
   **Relevance to the Broader Field:** This approach is significant for text mining and natural language processing, where efficient representation of high-dimensional data is crucial.
   
   **AI Insights:** AI-driven semantic analysis could be integrated with fuzzy clustering to improve the interpretability and relevance of reduced dimensions.
   
   **Score:** 3.0/10  
   **Download:** [arXiv](http://arxiv.org/abs/1909.10881v1)

6. **Separable Quaternion Matrix Factorization for Polarization Images**

   **Main Research Direction and Contributions:** The focus here is on quaternion matrix factorization for analyzing polarization images, which are used in various imaging applications to reveal information about light sources.
   
   **Innovations and Key Findings:** The proposed separable quaternion matrix factorization method effectively handles the unique characteristics of polarization data, providing a robust framework for signal analysis.
   
   **Connections and Relationships:** While this work is specialized towards polarization imaging, it shares a mathematical foundation with other matrix factorization approaches discussed in the collection.
   
   **Relevance to the Broader Field:** The method has implications for enhancing imaging technologies, particularly in fields requiring precise analysis of light properties.
   
   **AI Insights:** Neural networks could be trained to automate the quaternion factorization process, improving efficiency and scalability.
   
   **Score:** 3.0/10  
   **Download:** [arXiv](http://arxiv.org/abs/2207.14039v1)

7. **On Matrix Factorizations in Subspace Clustering**

   **Main Research Direction and Contributions:** This paper explores the use of CUR decompositions for subspace clustering, analyzing the impact of various hyperparameters on performance.
   
   **Innovations and Key Findings:** The study provides practical guidelines for hyperparameter selection, enhancing the applicability of subspace clustering techniques in real-world datasets.
   
   **Connections and Relationships:** The work ties into the broader exploration of matrix factorizations for clustering, contributing to methodological advancements in this area.
   
   **Relevance to the Broader Field:** Accurate subspace clustering is essential in many data-driven fields, such as computer vision and bioinformatics, where data dimensionality is a challenge.
   
   **AI Insights:** Machine learning optimization algorithms could further refine hyperparameter tuning processes for subspace clustering applications.
   
   **Score:** 3.0/10  
   **Download:** [arXiv](http://arxiv.org/abs/2106.12016v1)

# Conclusion

The collection of papers presented here highlights significant advancements and diverse applications of matrix factorization techniques across image analysis, clustering, and dimensionality reduction tasks. The consistent use of matrix factorization underscores its versatility and effectiveness in handling complex data challenges. Future AI developments could leverage these foundational techniques to further enhance data processing capabilities, particularly through the integration of deep learning and adaptive algorithms.

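## Advantages over the search function on the official arXiv website

1. PaperCreeper accepts queries written in natural language, automatically generates several search queries from them, and searches over multiple iterations to refine the results.
2. PaperCreeper scores and re-ranks the retrieved results to improve retrieval quality.
3. PaperCreeper has an LLM read and summarize the retrieved results and generate its own insights, so users can preview each paper's content and quickly focus on the useful ones.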
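
To make points 1 and 2 concrete, here is a minimal sketch of merging the results of several generated queries, dropping duplicates, and re-ranking by score. It is not PaperCreeper's implementation: the `Paper` dataclass, `merge_and_rerank`, and the keyword-overlap scorer are hypothetical stand-ins, and the toy scorer merely takes the place of whatever scoring PaperCreeper performs (the run log suggests it calls the LLM). The URLs and titles are made-up placeholders.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Paper:
    """Hypothetical minimal record for one search hit."""
    url: str
    title: str


def merge_and_rerank(
    result_lists: Iterable[Iterable[Paper]],
    score: Callable[[Paper], float],
) -> list[tuple[float, Paper]]:
    """Merge hits from several queries, dedupe by URL, sort by score (highest first)."""
    unique: dict[str, Paper] = {}
    for results in result_lists:
        for paper in results:
            unique.setdefault(paper.url, paper)  # keep the first occurrence
    scored = [(score(paper), paper) for paper in unique.values()]
    # sorted() is stable, so ties keep their original retrieval order.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def keyword_overlap(query: str) -> Callable[[Paper], float]:
    """Toy scorer: fraction of query words found in the title."""
    words = set(query.lower().split())

    def score(paper: Paper) -> float:
        title_words = set(paper.title.lower().split())
        return len(words & title_words) / len(words) if words else 0.0

    return score


hits_a = [Paper("id-1", "Nonnegative matrix factorization for clustering"),
          Paper("id-2", "Quaternion factorization for polarization images")]
hits_b = [Paper("id-1", "Nonnegative matrix factorization for clustering"),
          Paper("id-3", "Matrix factorization for image clustering")]
ranked = merge_and_rerank([hits_a, hits_b], keyword_overlap("matrix factorization clustering"))
for value, paper in ranked:
    print(f"{value:.2f}  {paper.title}")
```

Deduplicating before scoring keeps the number of scoring calls down, which matters when each score costs an LLM request.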


### Related tools
1. [Semantic Scholar](https://www.semanticscholar.org/)
2. [OpenScholar](https://openscholar.allen.ai/)
3. [Google Scholar](https://scholar.google.com/)