Academic project websites can more effectively disseminate research when they clearly present core content and enable intuitive navigation and interaction. However, current approaches such as direct generation, templates, or direct HTML conversion struggle to produce layout-aware, interactive sites, and a comprehensive evaluation suite for this task has been lacking.
PAPER2WEB is an autonomous pipeline that converts scientific papers into explorable academic homepages. The agent iteratively refines both content and layout to create engaging, interactive websites that bring academic papers to life.
- [2025-10-21] We are thrilled to hear that EvoPresent will be integrated into our pipeline in the future. EvoPresent brings advanced aesthetic agents for academic presentations with self-improvement capabilities. Stay tuned for this exciting collaboration!
- [2025-10-21] The Paper2Web dataset and benchmark have been uploaded. You can use the benchmark to evaluate and improve performance, and use the dataset for structural analysis, preference analysis, or to survey prior work across tens of thousands of carefully categorized papers.
- [2025-10-18] Paper2ALL released! Thanks to Paper2Video, Paper2Poster, and AutoPR, we have established Paper2ALL, a comprehensive pipeline for generating promotional materials for academic papers.
- Overview
- News & Updates
- Table of Contents
- Installation
- Configuration
- Quick Start
- Data for Paper2Web
- Benchmark for Paper2Web
- Contributing
- Acknowledgments
- Citation
- Python 3.11 or higher
- Conda (recommended)
- LibreOffice
- Poppler-utils
```bash
conda create -n p2w python=3.11
conda activate p2w
pip install -r requirements.txt
```
LibreOffice:
```bash
sudo apt install libreoffice
```
Alternative (without sudo): download LibreOffice from https://www.libreoffice.org/download/download-libreoffice/ and add it to your PATH.
Poppler:
```bash
conda install -c conda-forge poppler
```
Before running the code, configure your LLM API credentials.
Create a .env file in the project root:
```bash
# OpenAI API Configuration
OPENAI_API_KEY=your_openai_api_key_here
OPENAI_API_BASE=https://api.openai.com/v1

# Alternative: OpenRouter (recommended)
OPENAI_API_BASE=https://openrouter.ai/api/v1
OPENAI_API_KEY=sk-or-your-openrouter-key-here
```
For the AutoPR module, copy the example configuration and fill in your credentials:
```bash
cp AutoPR/.env.example AutoPR/.env
# Edit AutoPR/.env with your API credentials
```
```bash
GOOGLE_SEARCH_API_KEY=your_google_search_api_key
GOOGLE_SEARCH_ENGINE_ID=your_search_engine_id
```
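Before launching a long run, you may want to verify that the LLM credentials are picked up. The following optional sanity check is not part of the pipeline; it simply exports the variables from `.env` into your shell and queries the models endpoint of whichever API base you kept active (it assumes an OpenAI-compatible endpoint, i.e. OpenAI or OpenRouter as above):

```bash
# Optional sanity check (not part of the pipeline): export .env and list available models
set -a; source .env; set +a
curl -s "$OPENAI_API_BASE/models" \
  -H "Authorization: Bearer $OPENAI_API_KEY" | head -c 300; echo
```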
The pipeline automatically detects the target platform based on folder names:
```
papers/
├── 12345/              # Numeric → Twitter (English)
│   └── paper.pdf
└── research_project/   # Alphanumeric → Xiaohongshu (Chinese)
    └── paper.pdf
```
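The rule is purely name-based: numeric folder names are routed to Twitter, anything else to Xiaohongshu. A minimal shell sketch of that convention, for illustration only (this is not the pipeline's actual detection code):

```bash
# Illustrates the naming convention only; the real detection happens inside the pipeline
for d in papers/*/; do
  name=$(basename "$d")
  if [[ "$name" =~ ^[0-9]+$ ]]; then
    echo "$name -> Twitter (English)"
  else
    echo "$name -> Xiaohongshu (Chinese)"
  fi
done
```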
Run all modules (automatic PDF detection):
```bash
python pipeline_all.py --input-dir "path/to/papers" --output-dir "path/to/output"
```
Run all modules with a specific PDF:
```bash
python pipeline_all.py --input-dir "path/to/papers" --output-dir "path/to/output" --pdf-path "path/to/paper.pdf"
```
Website generation only:
```bash
python pipeline_all.py --input-dir "path/to/papers" --output-dir "path/to/output" --model-choice 1
```
Poster generation only (default 48x36 inches):
```bash
python pipeline_all.py --input-dir "path/to/papers" --output-dir "path/to/output" --model-choice 2
```
Poster generation with custom size:
```bash
python pipeline_all.py --input-dir "path/to/papers" --output-dir "path/to/output" --model-choice 2 --poster-width-inches 60 --poster-height-inches 40
```
PR material generation only:
```bash
python pipeline_all.py --input-dir "path/to/papers" --output-dir "path/to/output" --model-choice 3
```
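Putting it together, an end-to-end run over a single example paper might look like the following (folder and file names are placeholders; the numeric folder name routes the PR material to Twitter, per the convention above):

```bash
# Placeholder example: one paper in a numeric folder -> Twitter-style PR material
mkdir -p papers/12345
cp /path/to/your/paper.pdf papers/12345/paper.pdf
python pipeline_all.py --input-dir "papers" --output-dir "output"
```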
For video generation with Paper2Video, prepare the environment:
```bash
cd paper2all/Paper2Video/src
conda create -n p2v python=3.10
conda activate p2v
pip install -r requirements.txt
conda install -c conda-forge tectonic ffmpeg poppler
```
[Optional] Skip this part if you do not need a human presenter.
To avoid potential package conflicts, you need to prepare a separate environment for talking-head generation; please refer to Hallo2. After installing, use `which python` to get the path of that Python environment.
```bash
cd hallo2
conda create -n hallo python=3.10
conda activate hallo
pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
huggingface-cli download fudan-generative-ai/hallo2 --local-dir ../pretrained_models
```
Once hallo2 is installed, the `--talking_head_env` argument should point to the Python environment where hallo2 is installed. You can find the path to this environment by running:
```bash
which python
```
This gives you the path to the Python executable used by hallo2; use this path for the `--talking_head_env` argument in the pipeline.
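For convenience, you can capture this path in a shell variable and reuse it when launching the full pipeline below; a minimal sketch, assuming the environment is named `hallo` as in the installation step above:

```bash
# Capture the hallo2 interpreter path and reuse it for --talking_head_env
conda activate hallo
TALKING_HEAD_ENV=$(which python)
conda deactivate
echo "$TALKING_HEAD_ENV"   # pass this value to --talking_head_env
```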
The script pipeline.py provides an automated pipeline for generating academic presentation videos. It takes LaTeX paper sources together with a reference image and audio as input, and goes through multiple sub-modules (Slides → Subtitles → Speech → Cursor → Talking Head) to produce a complete presentation video. The minimum recommended GPU for running this pipeline is an NVIDIA A6000 with 48 GB of memory.
Run the following command to launch a fast generation (without talking-head generation):
```bash
python pipeline_light.py \
    --model_name_t gpt-4.1 \
    --model_name_v gpt-4.1 \
    --result_dir /path/to/output \
    --paper_latex_root /path/to/latex_proj \
    --ref_img /path/to/ref_img.png \
    --ref_audio /path/to/ref_audio.wav \
    --gpu_list [0,1,2,3,4,5,6,7]
```
Run the following command to launch a full generation (with talking-head generation):
```bash
python pipeline.py \
    --model_name_t gpt-4.1 \
    --model_name_v gpt-4.1 \
    --model_name_talking hallo2 \
    --result_dir /path/to/output \
    --paper_latex_root /path/to/latex_proj \
    --ref_img /path/to/ref_img.png \
    --ref_audio /path/to/ref_audio.wav \
    --talking_head_env /path/to/hallo2_env \
    --gpu_list [0,1,2,3,4,5,6,7]
```
See here to view our curated dataset, which contains metadata, categories, and citation counts for papers with and without project websites. The dataset can be used to analyze website preferences and trends, and to survey current hot topics and related research materials.
Please note that, in our pipeline, papers without project websites are defined as those that have neither a dedicated project homepage nor a GitHub repository with a homepage link.
We categorize these papers into 13 categories:
- 3D Vision and Computational Graphics
- Multimodal Learning
- Generation Model
- Speech and Audio Processing
- AI for Science
- ML Systems and Infrastructure
- Deep Learning Architecture
- Probabilistic Inference
- Natural Language Understanding
- Information Retrieval and Recommender Systems
- Reinforcement Learning
- Trustworthy
- ML Theory and Optimization
See here to get the Paper2Web benchmark, including selected original website source code URLs, paper metadata, and partial results from PWAgent.
Below are some comparison examples showing the differences between original websites and PWAgent-generated versions.
We welcome contributions! Please see our Contributing Guide for details.
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
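The same steps, expressed as commands (the branch name is a placeholder):

```bash
# After forking and cloning the repository
git checkout -b feature/my-improvement
# ...make your changes and add tests if applicable...
git add -A
git commit -m "Describe your change"
git push origin feature/my-improvement   # then open a pull request on GitHub
```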
- Thanks to Dongping Chen and Jianuo Huang for their guidance and assistance.
- Thanks to the open-source community for the amazing tools and libraries
- Special thanks to contributors and users of the Paper2AI ecosystem
- Grateful to the Paper2Video, Paper2Poster, AutoPR, and EvoPresent teams for their excellent work in academic presentation generation and PR material creation
Please kindly cite our paper if you find this project helpful.
```bibtex
@misc{chen2025paper2webletsmakepaper,
      title={Paper2Web: Let's Make Your Paper Alive!},
      author={Yuhang Chen and Tianpeng Lv and Siyi Zhang and Yixiang Yin and Yao Wan and Philip S. Yu and Dongping Chen},
      year={2025},
      eprint={2510.15842},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2510.15842},
}
```
If you find this project helpful, please give it a star!




