
🎉 PAPER2WEB: LET'S MAKE YOUR PAPER ALIVE!

📄🌐 Overview

Academic project websites can more effectively disseminate research when they clearly present core content and enable intuitive navigation and interaction. However, current approaches such as direct generation, templates, or direct HTML conversion struggle to produce layout-aware, interactive sites, and a comprehensive evaluation suite for this task has been lacking.

PAPER2WEB is an autonomous pipeline that converts scientific papers into explorable academic homepages. The agent iteratively refines both content and layout to create engaging, interactive websites that bring academic papers to life.

Academic Presentation Pipeline

🔥 News & Updates

  • [2025-10-21] 🔥🔥 We are thrilled to announce that EvoPresent will be integrated into our pipeline in the future. EvoPresent brings advanced aesthetic agents for academic presentations with self-improvement capabilities. Stay tuned for this exciting collaboration!

  • [2025-10-21] 📊 The Paper2Web dataset and benchmark are now available. You can use the benchmark to improve performance, and the dataset, with its tens of thousands of carefully categorized entries, for structural analysis, preference analysis, or surveying past work.

  • [2025-10-18] 🔥🔥 Paper2ALL released! Thanks to Paper2Video, Paper2Poster, and AutoPR, we have established Paper2ALL, a comprehensive pipeline for generating promotional materials.

📋 Table of Contents

  • 🔥 News & Updates
  • 🚀 Installation
  • ⚙️ Configuration
  • 🏃‍♂️ Quick Start
  • Paper2Video
  • Data for Paper2Web
  • Benchmark for Paper2Web
  • 🤝 Contributing
  • 🙏 Acknowledgments
  • Citation

🚀 Installation

Prerequisites

  • Python 3.11 or higher
  • Conda (recommended)
  • LibreOffice
  • Poppler-utils

Step 1: Create Conda Environment

conda create -n p2w python=3.11
conda activate p2w

Step 2: Install Dependencies

pip install -r requirements.txt

Step 3: Install System Dependencies

LibreOffice:

sudo apt install libreoffice

Alternative (without sudo): Download LibreOffice from https://www.libreoffice.org/download/download-libreoffice/ and add to PATH.

Poppler:

conda install -c conda-forge poppler
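
Optionally, verify that the system dependencies are visible on your PATH (a quick sanity check; on some installs the LibreOffice binary is named soffice):

libreoffice --version   # or: soffice --version
pdftoppm -v             # provided by poppler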

βš™οΈ Configuration

Before running the code, configure your LLM API credentials.

For All Components

Create a .env file in the project root:

# OpenAI API Configuration
OPENAI_API_KEY=your_openai_api_key_here
OPENAI_API_BASE=https://api.openai.com/v1

# Alternative: OpenRouter (recommended)
OPENAI_API_BASE=https://openrouter.ai/api/v1
OPENAI_API_KEY=sk-or-your-openrouter-key-here
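
As a quick credential check (not part of the pipeline), you can list the models served by the configured endpoint; any OpenAI-compatible API, including OpenRouter, exposes a /models route:

set -a; source .env; set +a   # export the variables defined in .env
curl -s "$OPENAI_API_BASE/models" -H "Authorization: Bearer $OPENAI_API_KEY"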

For AutoPR Component

cp AutoPR/.env.example AutoPR/.env
# Edit AutoPR/.env with your API credentials

Optional: Google Search API (for logo search)

GOOGLE_SEARCH_API_KEY=your_google_search_api_key
GOOGLE_SEARCH_ENGINE_ID=your_search_engine_id
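
To confirm the optional Google credentials work, you can query the Custom Search JSON API directly; the search term below is only an example:

curl -s "https://www.googleapis.com/customsearch/v1?key=$GOOGLE_SEARCH_API_KEY&cx=$GOOGLE_SEARCH_ENGINE_ID&q=university+logo&searchType=image"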

πŸƒβ€β™‚οΈ Quick Start

Input Directory Structure

The pipeline automatically detects the target platform based on folder names:

papers/
├── 12345/                    # Numeric → Twitter (English)
│   └── paper.pdf
└── research_project/         # Alphanumeric → Xiaohongshu (Chinese)
    └── paper.pdf
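
For example, the layout above can be scaffolded as follows (all paths and folder names are placeholders):

mkdir -p papers/12345 papers/research_project
cp /path/to/english_paper.pdf papers/12345/paper.pdf             # numeric folder name → Twitter
cp /path/to/chinese_paper.pdf papers/research_project/paper.pdf  # alphanumeric folder name → Xiaohongshu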

Basic Usage

Run all modules (automatic PDF detection):

python pipeline_all.py --input-dir "path/to/papers" --output-dir "path/to/output"

Run all modules with specific PDF:

python pipeline_all.py --input-dir "path/to/papers" --output-dir "path/to/output" --pdf-path "path/to/paper.pdf"

Website generation only:

python pipeline_all.py --input-dir "path/to/papers" --output-dir "path/to/output" --model-choice 1

Poster generation only (default 48x36 inches):

python pipeline_all.py --input-dir "path/to/papers" --output-dir "path/to/output" --model-choice 2

Poster generation with custom size:

python pipeline_all.py --input-dir "path/to/papers" --output-dir "path/to/output" --model-choice 2 --poster-width-inches 60 --poster-height-inches 40

PR material generation only:

python pipeline_all.py --input-dir "path/to/papers" --output-dir "path/to/output" --model-choice 3

Paper2Video

1. Requirements

Prepare the environment:

cd paper2all/Paper2Video/src
conda create -n p2v python=3.10
conda activate p2v
pip install -r requirements.txt
conda install -c conda-forge tectonic ffmpeg poppler
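
Optionally, confirm that the toolchain is available inside the p2v environment:

tectonic --version
ffmpeg -version | head -n 1
pdftoppm -v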

[Optional] Skip this part if you do not need a human presenter.

You need to prepare a separate environment for talking-head generation to avoid potential package conflicts; please refer to Hallo2. After installing, use which python to get the path of that Python environment.

cd hallo2
conda create -n hallo python=3.10
conda activate hallo
pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
huggingface-cli download fudan-generative-ai/hallo2 --local-dir ../pretrained_models

Once you have installed hallo2, the --talking_head_env argument should point to the Python environment where hallo2 is installed. You can find this path by running the following command with the hallo environment activated:

which python

This prints the path of the Python executable used by hallo2; pass it to the --talking_head_env argument of the pipeline.
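
Equivalently, assuming the environment is named hallo as above, you can resolve the interpreter path without activating it:

conda run -n hallo which python   # e.g. /home/<user>/miniconda3/envs/hallo/bin/python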

2. Inference

The script pipeline.py provides an automated pipeline for generating academic presentation videos. It takes LaTeX paper sources together with a reference image and reference audio as input, and runs through multiple sub-modules (Slides → Subtitles → Speech → Cursor → Talking Head) to produce a complete presentation video. ⚡ The minimum recommended GPU for running this pipeline is an NVIDIA A6000 with 48 GB of memory.

Example Usage

Run the following command to launch a fast generation (without talking-head generation):

python pipeline_light.py \
    --model_name_t gpt-4.1 \
    --model_name_v gpt-4.1 \
    --result_dir /path/to/output \
    --paper_latex_root /path/to/latex_proj \
    --ref_img /path/to/ref_img.png \
    --ref_audio /path/to/ref_audio.wav \
    --gpu_list [0,1,2,3,4,5,6,7]

Run the following command to launch a full generation (with talking-head generation):

python pipeline.py \
    --model_name_t gpt-4.1 \
    --model_name_v gpt-4.1 \
    --model_name_talking hallo2 \
    --result_dir /path/to/output \
    --paper_latex_root /path/to/latex_proj \
    --ref_img /path/to/ref_img.png \
    --ref_audio /path/to/ref_audio.wav \
    --talking_head_env /path/to/hallo2_env \
    --gpu_list [0,1,2,3,4,5,6,7]

Data for Paper2Web

See here to view our curated dataset, which contains metadata and categories for papers with and without project websites, along with citation counts. The dataset can be used to analyze website preferences and trends, to survey current hot topics, and as a source of paper research materials!

Please note that, as shown in our pipeline, we define papers without project websites as those that have neither a dedicated project homepage nor a GitHub repository with a homepage link. We categorize these papers into the following 13 categories (a small query sketch follows the list):

  • 3D Vision and Computational Graphics
  • Multimodal Learning
  • Generative Models
  • Speech and Audio Processing
  • AI for Science
  • ML Systems and Infrastructure
  • Deep Learning Architectures
  • Probabilistic Inference
  • Natural Language Understanding
  • Information Retrieval and Recommender Systems
  • Reinforcement Learning
  • Trustworthy AI
  • ML Theory and Optimization

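As a rough illustration, once the dataset is downloaded you can tally per-category counts from the metadata. This is only a sketch: the file name (paper2web_dataset.jsonl) and field names (category, citations, title) are assumptions rather than the dataset's documented schema.

# count papers per category (file and field names are assumed, see above)
jq -r '.category' paper2web_dataset.jsonl | sort | uniq -c | sort -rn
# list the most-cited papers
jq -r '[.citations, .title] | @tsv' paper2web_dataset.jsonl | sort -rn | head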

Benchmark for Paper2Web

See here to get the Paper2Web benchmark, including selected original website source code URLs, paper metadata, and partial results from PWAgent.


Below are some comparison examples showing the differences between the original websites and the PWAgent-generated versions.


🤝 Contributing

We welcome contributions! Please see our Contributing Guide for details.

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

🙏 Acknowledgments

  • Thanks to Dongping Chen and Jianuo Huang for their guidance and assistance.
  • Thanks to the open-source community for the amazing tools and libraries.
  • Special thanks to the contributors and users of the Paper2AI ecosystem.
  • Grateful to the Paper2Video, Paper2Poster, AutoPR, and EvoPresent teams for their excellent work in academic presentation generation and PR material creation.

Citation

Please cite our paper if you find this project helpful.

@misc{chen2025paper2webletsmakepaper,
      title={Paper2Web: Let's Make Your Paper Alive!}, 
      author={Yuhang Chen and Tianpeng Lv and Siyi Zhang and Yixiang Yin and Yao Wan and Philip S. Yu and Dongping Chen},
      year={2025},
      eprint={2510.15842},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2510.15842}, 
}

⭐ If you find this project helpful, please give it a star!
