ResearchQuantize is a CLI tool for aggregating and searching research papers from multiple academic sources: ArXiv, PubMed, and Semantic Scholar.
Features:
- Parallel processing: concurrent API calls to all sources
- GUI: interactive, Textual-based interface
- Database storage: SQLite database with full paper metadata

Prerequisites:
- Python 3.8 or higher
- pip (Python package manager)

Installation:
- Clone the repository:
git clone https://github.com/desenyon/ResearchQuantize.git
cd ResearchQuantize
- Create and activate a virtual environment:
python3 -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
- Verify the installation:
python src/cli.py version

Environment variables (create a .env file if needed):
# Optional API keys (not required for basic usage)
PUBMED_API_KEY=your_key_here # Optional; NCBI raises the limit from 3 to 10 requests/second with a key
SEMANTIC_SCHOLAR_API_KEY=your_key_here # Optional; a key gives higher rate limits
DATABASE_PATH=papers.db # Custom database location

Aggregate papers from all sources:
python src/cli.py aggregate --query "machine learning" --limit 10

Search with source filtering:
python src/cli.py search --query "neural networks" --source arxiv --year 2024

Export to JSON:
python src/cli.py --format json --output results.json aggregate --query "AI" --limit 20

Launch GUI:
python src/cli.py gui

Multiple output formats:
# Table format (default)
python src/cli.py aggregate --query "quantum computing"
# JSON export
python src/cli.py --format json --output papers.json search --query "ML" --source semantic_scholar
# CSV export
python src/cli.py --format csv --output data.csv aggregate --query "deep learning" --limit 50

Source-specific searches:
# ArXiv only
python src/cli.py search --query "computer vision" --source arxiv
# PubMed only
python src/cli.py search --query "cancer research" --source pubmed
# Semantic Scholar only
python src/cli.py search --query "NLP" --source semantic_scholar

Year filtering:
python src/cli.py search --query "AI ethics" --year 2024 --limit 15

Launch the interactive GUI:
python src/cli.py gui

GUI features:
- Interactive search interface
- Real-time filtering options
- Formatted results display
- Export capabilities
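
The source filter (`--source`), year filter (`--year`), and JSON/CSV export (`--format`) shown earlier come down to filtering paper records and serializing them. The sketch below is illustrative only and is not ResearchQuantize's code: the `Paper` dataclass and the `filter_papers`, `to_json`, and `to_csv` helpers are hypothetical, and the fields are a subset of those in the Database Schema table.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass, fields
from typing import List, Optional


@dataclass
class Paper:
    """Hypothetical record type; fields are a subset of the database schema."""
    title: str
    authors: str          # comma-separated author list
    published_date: str   # ISO date, e.g. "2024-03-01"
    source: str           # "arxiv", "pubmed", or "semantic_scholar"


def filter_papers(papers: List[Paper], source: Optional[str] = None,
                  year: Optional[int] = None) -> List[Paper]:
    """Keep papers that match the optional source and publication year."""
    kept = []
    for paper in papers:
        if source is not None and paper.source != source:
            continue
        # In an ISO date the year is the first four characters
        if year is not None and paper.published_date[:4] != str(year):
            continue
        kept.append(paper)
    return kept


def to_json(papers: List[Paper]) -> str:
    return json.dumps([asdict(p) for p in papers], indent=2)


def to_csv(papers: List[Paper]) -> str:
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(Paper)])
    writer.writeheader()
    for paper in papers:
        writer.writerow(asdict(paper))  # values containing commas are quoted
    return buffer.getvalue()
```

For example, `to_csv(filter_papers(papers, source="arxiv", year=2024))` mirrors combining `--source arxiv --year 2024` with `--format csv`.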

ResearchQuantize automatically saves all papers to an SQLite database (papers.db by default; set DATABASE_PATH to change the location).

View saved papers:
sqlite3 papers.db "SELECT title, authors, source FROM papers LIMIT 5;"

Count papers by source:
sqlite3 papers.db "SELECT source, COUNT(*) FROM papers GROUP BY source;"

Export the database to CSV:
sqlite3 -header -csv papers.db "SELECT * FROM papers;" > all_papers.csv

Supported sources:
- ArXiv: physics, mathematics, and computer science preprints
- PubMed: medical and life science literature
- Semantic Scholar: academic papers with rich metadata and citations
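
Each source exposes a public HTTP search API. As an illustration (not necessarily how ResearchQuantize issues its requests), this sketch builds search URLs for the ArXiv API, the NCBI E-utilities `esearch` endpoint used for PubMed, and the Semantic Scholar Graph API with only the standard library. The helper names are hypothetical, only a minimal subset of each API's parameters is shown, and no network calls are made.

```python
from typing import Optional
from urllib.parse import urlencode

# Public search endpoints of the three providers; the helper names below are hypothetical
ARXIV_API = "http://export.arxiv.org/api/query"
PUBMED_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
S2_SEARCH = "https://api.semanticscholar.org/graph/v1/paper/search"


def arxiv_url(query: str, limit: int) -> str:
    # ArXiv takes a field-prefixed query ("all:" searches all fields) and returns Atom XML
    params = {"search_query": f"all:{query}", "start": 0, "max_results": limit}
    return f"{ARXIV_API}?{urlencode(params)}"


def pubmed_url(query: str, limit: int, api_key: Optional[str] = None) -> str:
    # esearch returns matching PubMed IDs; retmode=json requests JSON instead of XML
    params = {"db": "pubmed", "term": query, "retmax": limit, "retmode": "json"}
    if api_key:
        params["api_key"] = api_key  # e.g. the value of PUBMED_API_KEY
    return f"{PUBMED_ESEARCH}?{urlencode(params)}"


def semantic_scholar_url(query: str, limit: int) -> str:
    # fields selects which paper attributes the Graph API includes in the response
    params = {"query": query, "limit": limit, "fields": "title,authors,year,externalIds"}
    return f"{S2_SEARCH}?{urlencode(params)}"


print(arxiv_url("machine learning", 10))
# http://export.arxiv.org/api/query?search_query=all%3Amachine+learning&start=0&max_results=10
```

Fetching is then a matter of passing a URL to an HTTP client such as `urllib.request.urlopen`. Semantic Scholar expects an API key in the `x-api-key` request header rather than in the URL.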
Customize behavior via src/config/preferences.json:
{
"default_query_limit": 10,
"default_source": "arxiv",
"default_year_filter": null,
"theme": "dark",
"show_advanced_options": false
}

Optional .env file configuration:
# API Keys (optional)
PUBMED_API_KEY=your_key_here
SEMANTIC_SCHOLAR_API_KEY=your_key_here
# Database
DATABASE_PATH=papers.db
# Logging
LOG_LEVEL=INFO

Run the test suite:
# All tests
python -m pytest src/tests/ -v
# Quick test
python -m pytest src/tests/ -q
# With coverage (requires the pytest-cov plugin)
python -m pytest src/tests/ --cov=src --cov-report=html

Benchmarks:
- Parallel aggregation: ~16 papers in under 1 second
- ArXiv: ~3-second delay between requests (rate limiting)
- Semantic Scholar: ~0.1-second delay between requests (rate limiting)
- Deduplication: similarity-based matching with configurable thresholds
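
These notes imply three mechanisms: concurrent requests across sources, a minimum delay per source, and similarity-based deduplication. A minimal sketch under stated assumptions: each source is modeled as a hypothetical callable that returns paper titles, the per-source intervals would mirror the ~3 s (ArXiv) and ~0.1 s (Semantic Scholar) figures above, and title similarity uses `difflib.SequenceMatcher` with a threshold, a stand-in for whatever matching ResearchQuantize actually performs.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from difflib import SequenceMatcher
from typing import Callable, Dict, List

Fetcher = Callable[[str], List[str]]  # hypothetical: query -> list of paper titles


class RateLimiter:
    """Enforce a minimum interval between successive calls to one source."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self._lock = threading.Lock()
        self._last_call = float("-inf")

    def wait(self) -> None:
        # Holding the lock while sleeping serializes callers of the same source
        with self._lock:
            delay = self._last_call + self.min_interval - time.monotonic()
            if delay > 0:
                time.sleep(delay)
            self._last_call = time.monotonic()


def is_duplicate(a: str, b: str, threshold: float = 0.9) -> bool:
    """Treat two titles as the same paper if their similarity ratio meets the threshold."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio() >= threshold


def aggregate(fetchers: Dict[str, Fetcher], limiters: Dict[str, RateLimiter],
              query: str, threshold: float = 0.9) -> List[str]:
    def fetch(name: str) -> List[str]:
        limiters[name].wait()  # respect this source's rate limit
        return fetchers[name](query)

    # One worker per source so the sources are queried concurrently
    with ThreadPoolExecutor(max_workers=max(1, len(fetchers))) as pool:
        per_source = list(pool.map(fetch, fetchers))

    merged: List[str] = []
    for titles in per_source:
        for title in titles:
            # Pairwise comparison is O(n^2), acceptable for small result sets
            if not any(is_duplicate(title, kept, threshold) for kept in merged):
                merged.append(title)
    return merged
```

The limiters are meant to be created once (for example `{"arxiv": RateLimiter(3.0), "semantic_scholar": RateLimiter(0.1)}`) and reused across queries; each source has its own limiter, so a slow source never delays the others.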
Database Schema
The SQLite database (papers.db) stores comprehensive paper metadata:
| Field | Type | Description |
|---|---|---|
| id | INTEGER | Primary key |
| title | TEXT | Paper title |
| authors | TEXT | Comma-separated author list |
| published_date | TEXT | Publication date (ISO format) |
| source | TEXT | Data source (arxiv, pubmed, semantic_scholar) |
| abstract | TEXT | Paper abstract |
| url | TEXT | Paper URL |
| doi | TEXT | Digital Object Identifier |
| keywords | TEXT | Comma-separated keywords |
| citations | INTEGER | Citation count |
| journal | TEXT | Journal name |
| volume | TEXT | Volume number |
| issue | TEXT | Issue number |
| pages | TEXT | Page range |
| pdf_url | TEXT | PDF download URL |
| arxiv_id | TEXT | ArXiv identifier |
| pubmed_id | TEXT | PubMed identifier |
| semantic_scholar_id | TEXT | Semantic Scholar identifier |
| created_at | TEXT | Record creation timestamp |
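
The schema table maps directly onto a `CREATE TABLE` statement. This sketch recreates it in an in-memory database with Python's built-in `sqlite3` module, then runs the count-by-source query from the sqlite3 examples (sorted for stable output) and a year filter on the ISO date. Column names and types come from the table; the exact DDL ResearchQuantize uses may differ, and the sample rows and helper names (`count_by_source`, `papers_from_year`) are made up for illustration. To query a real database, connect to `papers.db` and skip the sample inserts.

```python
import sqlite3
from typing import Dict, List

# Columns and types from the schema table; the exact DDL ResearchQuantize uses may differ
SCHEMA = """
CREATE TABLE IF NOT EXISTS papers (
    id INTEGER PRIMARY KEY,
    title TEXT, authors TEXT, published_date TEXT, source TEXT,
    abstract TEXT, url TEXT, doi TEXT, keywords TEXT, citations INTEGER,
    journal TEXT, volume TEXT, issue TEXT, pages TEXT, pdf_url TEXT,
    arxiv_id TEXT, pubmed_id TEXT, semantic_scholar_id TEXT, created_at TEXT
)
"""


def count_by_source(conn: sqlite3.Connection) -> Dict[str, int]:
    """The 'count papers by source' query, sorted by source name."""
    rows = conn.execute("SELECT source, COUNT(*) FROM papers GROUP BY source ORDER BY source")
    return dict(rows.fetchall())


def papers_from_year(conn: sqlite3.Connection, year: int) -> List[str]:
    """Titles published in a given year; published_date is ISO text (YYYY-MM-DD...)."""
    rows = conn.execute(
        "SELECT title FROM papers WHERE substr(published_date, 1, 4) = ? ORDER BY title",
        (str(year),),
    )
    return [title for (title,) in rows]


conn = sqlite3.connect(":memory:")  # hypothetical demo; use "papers.db" for real data
conn.executescript(SCHEMA)
# Made-up sample rows for illustration only
conn.executemany(
    "INSERT INTO papers (title, authors, published_date, source) VALUES (?, ?, ?, ?)",
    [
        ("Example paper 1", "A. Author", "2024-01-15", "arxiv"),
        ("Example paper 2", "B. Author", "2023-06-01", "arxiv"),
        ("Example paper 3", "C. Author", "2024-03-10", "semantic_scholar"),
    ],
)
print(count_by_source(conn))         # {'arxiv': 2, 'semantic_scholar': 1}
print(papers_from_year(conn, 2024))  # ['Example paper 1', 'Example paper 3']
```

Because `published_date` is stored as ISO text, `substr` on the first four characters is enough for a year filter; no date parsing is needed.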
This project is licensed under the MIT License. See the LICENSE file for details.
Example session:

$ python src/cli.py aggregate --query "machine learning" --limit 5
╭─────────────────────────────────── Welcome ───────────────────────────────────╮
│ │
│ ResearchQuantize v1.1.0 - Aggregate and Search Research Papers │
│ │
╰───────────────────────────────────────────────────────────────────────────────╯
Aggregating papers for query: machine learning
⠋ Fetching papers from multiple sources...
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Title ┃ Authors ┃ Publish… ┃ Source ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Machine Learning Potential Repository │ Atsuto Seko │ 2020-07… │ ArXiv │
│ Physics-informed machine learning │ G. Karniadakis, I. │ 2021-05… │ Semantic │
│ │ Kevrekidis, Lu Lu │ │ Scholar │
│ TensorFlow: Large-Scale Machine Learning │ Martín Abadi, Ashish │ 2016-03… │ Semantic │
│ │ Agarwal, P. Barham │ │ Scholar │
└────────────────────────────────────────────────┴────────────────────────────┴──────────┴────────────┘
Total papers found: 10
Paper Sources Statistics
┏━━━━━━━━━━━━━━━━━━┳━━━━━━━┓
┃ Source ┃ Count ┃
┡━━━━━━━━━━━━━━━━━━╇━━━━━━━┩
│ ArXiv │ 5 │
│ Semantic Scholar │ 5 │
└──────────────────┴───────┘