Academic project websites can more effectively disseminate research when they clearly present core content and enable intuitive navigation and interaction. However, current approaches such as direct generation, templates, or direct HTML conversion struggle to produce layout-aware, interactive sites, and a comprehensive evaluation suite for this task has been lacking.
PAPER2WEB is an autonomous pipeline that converts scientific papers into explorable academic homepages. The agent iteratively refines both content and layout to create engaging, interactive websites that bring academic papers to life.
- [2025-10-21] We are thrilled to hear that EvoPresent will be integrated into our pipeline in the future. EvoPresent brings advanced aesthetic agents for academic presentations with self-improvement capabilities. Stay tuned for this exciting collaboration!
- [2025-10-21] The Paper2Web dataset and benchmark have been uploaded. You can use the benchmark to evaluate and improve performance, and use the dataset for structural analysis, preference analysis, or to survey prior work across tens of thousands of carefully categorized papers.
- [2025-10-18] Paper2ALL released! Thanks to Paper2Video, Paper2Poster, and AutoPR, we have established Paper2ALL, a comprehensive pipeline for generating promotional materials for academic papers.
- Overview
- News & Updates
- Table of Contents
- Installation
- Configuration
- Quick Start
- Data for Paper2Web
- Benchmark for Paper2Web
- Contributing
- Acknowledgments
- Citation
- Python 3.11 or higher
- Conda (recommended)
- LibreOffice
- Poppler-utils
```bash
conda create -n p2w python=3.11
conda activate p2w
pip install -r requirements.txt
```
LibreOffice:
```bash
sudo apt install libreoffice
```
Alternative (without sudo): download LibreOffice from https://www.libreoffice.org/download/download-libreoffice/ and add it to your PATH.
Poppler:
```bash
conda install -c conda-forge poppler
```
Before running the code, configure your LLM API credentials.
Create a .env file in the project root:
```bash
# OpenAI API Configuration
OPENAI_API_KEY=your_openai_api_key_here
OPENAI_API_BASE=https://api.openai.com/v1

# Alternative: OpenRouter (recommended)
OPENAI_API_BASE=https://openrouter.ai/api/v1
OPENAI_API_KEY=sk-or-your-openrouter-key-here
```
For the AutoPR module, copy the example configuration and fill in your credentials:
```bash
cp AutoPR/.env.example AutoPR/.env
# Edit AutoPR/.env with your API credentials
```
```bash
GOOGLE_SEARCH_API_KEY=your_google_search_api_key
GOOGLE_SEARCH_ENGINE_ID=your_search_engine_id
```
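Before launching a long run, you may want to verify that the LLM credentials are picked up. The following optional sanity check is not part of the pipeline; it simply exports the variables from `.env` into your shell and queries the models endpoint of whichever API base you kept active (it assumes an OpenAI-compatible endpoint, i.e. OpenAI or OpenRouter as above):

```bash
# Optional sanity check (not part of the pipeline): export .env and list available models
set -a; source .env; set +a
curl -s "$OPENAI_API_BASE/models" \
  -H "Authorization: Bearer $OPENAI_API_KEY" | head -c 300; echo
```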
The pipeline automatically detects the target platform based on folder names:
```
papers/
├── 12345/              # Numeric → Twitter (English)
│   └── paper.pdf
└── research_project/   # Alphanumeric → Xiaohongshu (Chinese)
    └── paper.pdf
```
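The rule is purely name-based: numeric folder names are routed to Twitter, anything else to Xiaohongshu. A minimal shell sketch of that convention, for illustration only (this is not the pipeline's actual detection code):

```bash
# Illustrates the naming convention only; the real detection happens inside the pipeline
for d in papers/*/; do
  name=$(basename "$d")
  if [[ "$name" =~ ^[0-9]+$ ]]; then
    echo "$name -> Twitter (English)"
  else
    echo "$name -> Xiaohongshu (Chinese)"
  fi
done
```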
Run all modules (automatic PDF detection):
```bash
python pipeline_all.py --input-dir "path/to/papers" --output-dir "path/to/output"
```
Run all modules with a specific PDF:
```bash
python pipeline_all.py --input-dir "path/to/papers" --output-dir "path/to/output" --pdf-path "path/to/paper.pdf"
```
Website generation only:
```bash
python pipeline_all.py --input-dir "path/to/papers" --output-dir "path/to/output" --model-choice 1
```
Poster generation only (default 48x36 inches):
```bash
python pipeline_all.py --input-dir "path/to/papers" --output-dir "path/to/output" --model-choice 2
```
Poster generation with custom size:
```bash
python pipeline_all.py --input-dir "path/to/papers" --output-dir "path/to/output" --model-choice 2 --poster-width-inches 60 --poster-height-inches 40
```
PR material generation only:
```bash
python pipeline_all.py --input-dir "path/to/papers" --output-dir "path/to/output" --model-choice 3
```
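Putting it together, an end-to-end run over a single example paper might look like the following (folder and file names are placeholders; the numeric folder name routes the PR material to Twitter, per the convention above):

```bash
# Placeholder example: one paper in a numeric folder -> Twitter-style PR material
mkdir -p papers/12345
cp /path/to/your/paper.pdf papers/12345/paper.pdf
python pipeline_all.py --input-dir "papers" --output-dir "output"
```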
For video generation with Paper2Video, prepare the environment:
```bash
cd paper2all/Paper2Video/src
conda create -n p2v python=3.10
conda activate p2v
pip install -r requirements.txt
conda install -c conda-forge tectonic ffmpeg poppler
```
[Optional] Skip this part if you do not need a human presenter.
To avoid potential package conflicts, you need to prepare a separate environment for talking-head generation; please refer to Hallo2. After installing, use `which python` to get the path of that Python environment.
```bash
cd hallo2
conda create -n hallo python=3.10
conda activate hallo
pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
huggingface-cli download fudan-generative-ai/hallo2 --local-dir ../pretrained_models
```
Once hallo2 is installed, the `--talking_head_env` argument should point to the Python environment where hallo2 is installed. You can find the path to this environment by running:
```bash
which python
```
This gives you the path to the Python executable used by hallo2; use this path for the `--talking_head_env` argument in the pipeline.
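For convenience, you can capture this path in a shell variable and reuse it when launching the full pipeline below; a minimal sketch, assuming the environment is named `hallo` as in the installation step above:

```bash
# Capture the hallo2 interpreter path and reuse it for --talking_head_env
conda activate hallo
TALKING_HEAD_ENV=$(which python)
conda deactivate
echo "$TALKING_HEAD_ENV"   # pass this value to --talking_head_env
```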
The script pipeline.py provides an automated pipeline for generating academic presentation videos. It takes LaTeX paper sources together with a reference image and audio as input, and goes through multiple sub-modules (Slides → Subtitles → Speech → Cursor → Talking Head) to produce a complete presentation video. The minimum recommended GPU for running this pipeline is an NVIDIA A6000 with 48 GB of memory.
Run the following command to launch a fast generation (without talking-head generation):
```bash
python pipeline_light.py \
    --model_name_t gpt-4.1 \
    --model_name_v gpt-4.1 \
    --result_dir /path/to/output \
    --paper_latex_root /path/to/latex_proj \
    --ref_img /path/to/ref_img.png \
    --ref_audio /path/to/ref_audio.wav \
    --gpu_list [0,1,2,3,4,5,6,7]
```
Run the following command to launch a full generation (with talking-head generation):
```bash
python pipeline.py \
    --model_name_t gpt-4.1 \
    --model_name_v gpt-4.1 \
    --model_name_talking hallo2 \
    --result_dir /path/to/output \
    --paper_latex_root /path/to/latex_proj \
    --ref_img /path/to/ref_img.png \
    --ref_audio /path/to/ref_audio.wav \
    --talking_head_env /path/to/hallo2_env \
    --gpu_list [0,1,2,3,4,5,6,7]
```
See here to view our curated dataset, which contains metadata, categories, and citation counts for papers with and without project websites. The dataset can be used to analyze website preferences and trends, and to survey current hot topics and related research materials.
Please note that, in our pipeline, papers without project websites are defined as those that have neither a dedicated project homepage nor a GitHub repository with a homepage link.
We categorize these papers into 13 categories:
- 3D Vision and Computational Graphics
- Multimodal Learning
- Generation Model
- Speech and Audio Processing
- AI for Science
- ML Systems and Infrastructure
- Deep Learning Architecture
- Probabilistic Inference
- Natural Language Understanding
- Information Retrieval and Recommender Systems
- Reinforcement Learning
- Trustworthy
- ML Theory and Optimization
See here to get the Paper2Web benchmark, including selected original website source code URLs, paper metadata, and partial results from PWAgent.
Below are some comparison examples showing the differences between original websites and PWAgent-generated versions.
We welcome contributions! Please see our Contributing Guide for details.
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
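The same steps, expressed as commands (the branch name is a placeholder):

```bash
# After forking and cloning the repository
git checkout -b feature/my-improvement
# ...make your changes and add tests if applicable...
git add -A
git commit -m "Describe your change"
git push origin feature/my-improvement   # then open a pull request on GitHub
```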
- Thanks to Dongping Chen and Jianuo Huang for their guidance and assistance.
- Thanks to the open-source community for the amazing tools and libraries
- Special thanks to contributors and users of the Paper2AI ecosystem
- Grateful to the Paper2Video, Paper2Poster, AutoPR, and EvoPresent teams for their excellent work in academic presentation generation and PR material creation
Please kindly cite our paper if you find this project helpful.
```bibtex
@misc{chen2025paper2webletsmakepaper,
      title={Paper2Web: Let's Make Your Paper Alive!},
      author={Yuhang Chen and Tianpeng Lv and Siyi Zhang and Yixiang Yin and Yao Wan and Philip S. Yu and Dongping Chen},
      year={2025},
      eprint={2510.15842},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2510.15842},
}
```
If you find this project helpful, please give it a star!




