We deploy CiteScan on ModelScope: https://www.modelscope.cn/studios/aivolcano/CiteScanning/summary
CiteScan is a free, open-source tool designed to detect hallucinated references in academic writing. As AI coding assistants and writing tools become more prevalent, they sometimes generate plausible-sounding citations that do not actually exist. CiteScan addresses this issue by validating every bibliography entry against multiple authoritative academic databases (arXiv, CrossRef, DBLP, Semantic Scholar, OpenAlex, and Google Scholar) to confirm that it is real.
Going beyond simple verification, CiteScan uses rule-based algorithms to analyze whether the cited papers genuinely support the claims made in your text. Because the academic databases it relies on for CS and AI research are freely accessible, the system costs $0 to maintain once developed.
# Install dependencies
pip install -r requirements.txt
# Run Gradio interface
python app.py
Access the interface at http://localhost:7860
# Install dependencies
pip install -r requirements.txt
# Run API service
python main.py
Access the API at http://localhost:8000
API Documentation at http://localhost:8000/docs
# Run both services with Docker Compose
docker-compose up -d
# Gradio: http://localhost:7860
# API: http://localhost:8000
- API Documentation - Complete API reference and examples
- Deployment Guide - Production deployment instructions
- 🚫 NO Hallucinations: Flags citations that do not exist or whose year, authors, or title do not match the published record.
- 📋 Ground Truth Reference: For each flagged entry, provides a link to the real publication. Click the Open paper or DOI button to view the authoritative metadata, then copy the BibTeX from the publisher's website.
- 🏠 Top-tier Research Organizations: In collaboration with the National University of Singapore (NUS) and Shanghai Jiao Tong University (SJTU).
- 🔌 RESTful API: Production-ready API for integration with other tools and services.
- Multi-Source Verification: Validates metadata against arXiv, CrossRef, DBLP, Semantic Scholar, OpenAlex, and Google Scholar.
- Convert citations from preprint to official versions: Click the blue Open paper or DOI button to open the official publication page, then click its Cite button to copy the official BibTeX.
- Parse BibTeX: Extract entries and metadata
- Priority-based Search: Query databases in priority order
- Metadata Comparison: Compare title, authors, year, venue
- Duplicate Detection: Identify duplicate entries
- Result Generation: Provide detailed verification report
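The priority-based search and duplicate-detection steps above can be sketched roughly as follows. This is a minimal illustration, not CiteScan's actual code: `lookup_arxiv` and `lookup_crossref` are hypothetical stand-ins that return canned data instead of calling the real databases, and both the priority order and the title normalization rule are assumptions.

```python
import re
from typing import Callable, Optional

Record = dict  # assumed shape: {"title", "authors", "year", "source"}

def normalize_title(title: str) -> str:
    """Lowercase, drop BibTeX braces and punctuation, collapse whitespace."""
    title = re.sub(r"[{}]", "", title.lower())
    title = re.sub(r"[^a-z0-9 ]+", " ", title)
    return " ".join(title.split())

# Hypothetical lookups; real ones would query each database's API.
def lookup_arxiv(title: str) -> Optional[Record]:
    return None  # pretend arXiv has no match

def lookup_crossref(title: str) -> Optional[Record]:
    if normalize_title(title) == "attention is all you need":
        return {"title": "Attention Is All You Need",
                "authors": ["Vaswani", "Shazeer"], "year": 2017,
                "source": "crossref"}
    return None

# Assumed priority order: the first source that returns a record wins.
SOURCES: list[Callable[[str], Optional[Record]]] = [lookup_arxiv, lookup_crossref]

def priority_search(title: str) -> Optional[Record]:
    """Query sources in priority order and stop at the first hit."""
    for lookup in SOURCES:
        record = lookup(title)
        if record is not None:
            return record
    return None

def find_duplicates(entries: dict[str, str]) -> list[tuple[str, str]]:
    """Return (first_key, duplicate_key) pairs whose normalized titles match."""
    seen: dict[str, str] = {}
    pairs = []
    for key, title in entries.items():
        norm = normalize_title(title)
        if norm in seen:
            pairs.append((seen[norm], key))
        else:
            seen[norm] = key
    return pairs

if __name__ == "__main__":
    print(priority_search("{Attention} is all you need")["source"])  # crossref
    print(find_duplicates({"a": "Attention is all you need",
                           "b": "Attention Is All You Need.",
                           "c": "BERT"}))  # [('a', 'b')]
```

Stopping at the first hit keeps the number of external requests low; a fuller implementation might instead collect records from several sources and compare them.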
import requests
# Verify a BibTeX snippet through the local API service
url = "http://localhost:8000/api/v1/verify"
bibtex = """
@article{vaswani2017attention,
title={Attention is all you need},
author={Vaswani, Ashish and Shazeer, Noam},
year={2017}
}
"""
response = requests.post(url, json={"bibtex_content": bibtex}, timeout=60)
response.raise_for_status()  # raise an exception on HTTP error responses
result = response.json()
print(f"Verified: {result['verified_count']}/{result['total_count']}")
The same request with curl:
curl -X POST "http://localhost:8000/api/v1/verify" \
-H "Content-Type: application/json" \
-d '{"bibtex_content": "@article{example,title={Test},year={2023}}"}'
See API_DOCS.md for the complete API documentation.
Create a .env file from the template:
cp .env.example .env
Key configuration options:
# Server ports
API_PORT=8000
GRADIO_PORT=7860
# Performance
MAX_WORKERS=10
CACHE_ENABLED=true
CACHE_TTL=3600
# Logging
LOG_LEVEL=INFO
LOG_FORMAT=json
See DEPLOYMENT.md for the complete configuration guide.
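As a rough sketch of how these variables could be read in Python, the snippet below loads them with the defaults listed above. The `Settings` class and `load_settings` function are hypothetical, and treating `CACHE_TTL` as seconds is an assumption; CiteScan's own configuration code may differ.

```python
import os
from dataclasses import dataclass

def _env_bool(name: str, default: bool) -> bool:
    """Interpret common truthy strings ("true", "1", "yes", "on")."""
    value = os.getenv(name)
    if value is None:
        return default
    return value.strip().lower() in {"true", "1", "yes", "on"}

@dataclass(frozen=True)
class Settings:  # hypothetical container for the options above
    api_port: int
    gradio_port: int
    max_workers: int
    cache_enabled: bool
    cache_ttl: int  # assumed to be seconds
    log_level: str
    log_format: str

def load_settings() -> Settings:
    """Read settings from the environment, falling back to the listed defaults."""
    return Settings(
        api_port=int(os.getenv("API_PORT", "8000")),
        gradio_port=int(os.getenv("GRADIO_PORT", "7860")),
        max_workers=int(os.getenv("MAX_WORKERS", "10")),
        cache_enabled=_env_bool("CACHE_ENABLED", True),
        cache_ttl=int(os.getenv("CACHE_TTL", "3600")),
        log_level=os.getenv("LOG_LEVEL", "INFO").upper(),
        log_format=os.getenv("LOG_FORMAT", "json"),
    )
```

Note that `os.getenv` only sees variables already in the process environment; a `.env` file has to be loaded first, for example with `load_dotenv()` from the python-dotenv package (whether CiteScan uses that package is not stated here).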
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Copy environment template
cp .env.example .env
# Run in development mode
ENVIRONMENT=development python main.py
# Check service health
curl http://localhost:8000/api/v1/health
# View service statistics
curl http://localhost:8000/api/v1/stats
Logs are stored in logs/citescan.log in JSON format:
tail -f logs/citescan.log | jq '.'
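To get a quick overview instead of following the log live, a small script can count records per level. This sketch assumes one JSON object per line and a field named `level`; both are assumptions about the log format, so adjust `level_key` to match the actual records.

```python
import json
from collections import Counter
from pathlib import Path

def summarize_log(path: str, level_key: str = "level") -> Counter:
    """Count records per log level in a JSON-lines log file.

    level_key is an assumed field name; change it to match the real records.
    """
    counts: Counter = Counter()
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                counts["<unparseable>"] += 1
                continue
            if isinstance(record, dict):
                counts[str(record.get(level_key, "<missing>"))] += 1
            else:
                counts["<unparseable>"] += 1  # valid JSON but not an object
    return counts
```

For example, `summarize_log("logs/citescan.log").most_common()` lists levels from most to least frequent.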
- Authors Mismatch:
  - Reason: Databases handle long author lists differently, for example by truncating them.
  - Action: Verify that the main authors match.
- Venues Mismatch:
  - Reason: Abbreviations vs. full names, such as "ICLR" vs. "International Conference on Learning Representations".
  - Action: Both are correct.
- Year Gap (±1 Year):
  - Reason: Delay between the preprint (arXiv) and the final published version.
  - Action: Verify which version you intend to cite. We recommend citing the version from the official publisher's website; a lower proportion of arXiv BibTeX entries makes your paper more convincing.
- Non-academic Sources:
  - Reason: Blogs and APIs are not indexed in academic databases.
  - Action: Verify the URL, year, and title manually.
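The tolerance rules above could be expressed as simple comparison functions. This is an illustrative sketch, not CiteScan's rule set: the one-entry `VENUE_ALIASES` table, the choice to compare only the first three author surnames, and the status labels are all hypothetical.

```python
import re

# Hypothetical abbreviation table; a real one would be much larger.
VENUE_ALIASES = {
    "iclr": "international conference on learning representations",
}

def _norm(text: str) -> str:
    """Lowercase and strip punctuation for loose string comparison."""
    return " ".join(re.sub(r"[^a-z0-9 ]+", " ", text.lower()).split())

def authors_match(cited: list[str], found: list[str], k: int = 3) -> bool:
    """Compare only the first k surnames, since databases may truncate long lists."""
    n = min(k, len(cited), len(found))
    return n > 0 and [_norm(a) for a in cited[:n]] == [_norm(a) for a in found[:n]]

def venues_match(cited: str, found: str) -> bool:
    """Treat an abbreviation and its full name as the same venue."""
    a, b = _norm(cited), _norm(found)
    return VENUE_ALIASES.get(a, a) == VENUE_ALIASES.get(b, b)

def year_status(cited: int, found: int) -> str:
    """Exact match, a ±1 gap (likely preprint vs. final version), or a mismatch."""
    gap = abs(cited - found)
    if gap == 0:
        return "match"
    return "check_version" if gap == 1 else "mismatch"
```

Returning `"check_version"` rather than a hard failure for a one-year gap mirrors the advice above: the entry may be genuine, but you should confirm which version you mean to cite.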
CiteScan uses multiple data sources:
- arXiv API
- CrossRef API
- Semantic Scholar API
- DBLP API
- OpenAlex API
- Google Scholar (web scraping)
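As an example of what one of these lookups involves, the sketch below queries the public CrossRef REST API (`/works` with the `query.bibliographic` parameter) using only the standard library and extracts title, author surnames, year, and DOI. It is an illustration, not CiteScan's client code; the User-Agent string and its mailto address are placeholders you should replace, and `search_crossref` needs network access.

```python
import json
import urllib.parse
import urllib.request

CROSSREF_WORKS = "https://api.crossref.org/works"

def crossref_query_url(title: str, rows: int = 1) -> str:
    """Build a CrossRef search URL for a free-text bibliographic query."""
    params = {"query.bibliographic": title, "rows": str(rows)}
    return f"{CROSSREF_WORKS}?{urllib.parse.urlencode(params)}"

def parse_crossref_item(item: dict) -> dict:
    """Extract title, author surnames, year, and DOI from one CrossRef work item."""
    titles = item.get("title") or [""]
    authors = [a.get("family", "") for a in item.get("author", [])]
    date_parts = item.get("issued", {}).get("date-parts", [[None]])
    year = date_parts[0][0] if date_parts and date_parts[0] else None
    return {"title": titles[0], "authors": authors, "year": year, "doi": item.get("DOI")}

def search_crossref(title: str, rows: int = 1) -> list[dict]:
    """Query CrossRef (requires network access) and parse the returned items."""
    request = urllib.request.Request(
        crossref_query_url(title, rows),
        # Placeholder contact address; CrossRef asks clients to identify themselves.
        headers={"User-Agent": "citescan-example/0.1 (mailto:you@example.org)"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        payload = json.load(response)
    return [parse_crossref_item(it) for it in payload["message"]["items"]]
```

The parsed dictionaries can then be compared field by field against the BibTeX entry, applying the tolerance rules for authors, venues, and years described earlier.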
Contributions are welcome! Please feel free to submit a Pull Request.
For questions and support:
- Email: e1143641@u.nus.edu
- GitHub Issues: [Repository URL]
To deploy to ModelScope Studios (创空间):
# Add ModelScope remote
git remote add modelscope "http://oauth2:YOUR_TOKEN@www.modelscope.cn/studios/YOUR_USERNAME/CiteScan.git"
# Push to ModelScope
git push modelscope main
# Or force push if needed
git push modelscope main --force
After pushing, visit your ModelScope studio and click "上线空间展示" (publish the studio) or "立即发布" (publish now) to deploy the Gradio application.
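To push the code to Hugging Face Spaces:
- Install the Hugging Face CLI and log in (if not already installed):
pip install huggingface_hub
huggingface-cli login
- Add the Hugging Face remote:
git remote add hf https://huggingface.co/spaces/yancan/CiteScan
- Push to Spaces. HF does not accept binary files through a plain git push, so push the image-free branch hf-main instead:
  - Important: HF shows the code that has been committed to main. If you have uncommitted local changes (for example in main.py or src/), commit them to main first, then update and push hf-main.
  - One-step script: ./scripts/push_to_hf.sh (prompts you to commit any uncommitted changes first, then rebuilds hf-main and pushes it).
  - Or manually: first run git add -A && git commit -m "description", then run the script or follow its steps to rebuild hf-main and run git push hf hf-main:main --force.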
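- Once the push completes, wait for the build to finish on the Space page; the Gradio application is then accessible.
Note: The YAML configuration at the top of the README (title, sdk, app_file, etc.) is required by Spaces; do not delete it.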

