Relation-Rich Visual Document Generator for Visual Information Extraction (RIDGE)


This is the official implementation of our CVPR 2025 paper:
"Relation-Rich Visual Document Generator for Visual Information Extraction."

RIDGE Pipeline Overview

📦 Installation

We tested with Python 3.8.
You may need to install the PyTorch version that corresponds to your CUDA version.

conda create -n RIDGE python=3.8
conda activate RIDGE
git clone https://github.com/AI-Application-and-Integration-Lab/RIDGE.git
cd ./RIDGE
pip install -r requirements.txt

✍️ Content Generation

Set OpenAI API key

Set your OpenAI API key in global_config.py.

Generating document titles

Run the script for your target language to generate document titles in English, Traditional Chinese, or Simplified Chinese.
Search for [CUSTOM] in the code to find the available theme options. You can also add your own.
The generated files will appear in the headers/ directory.

--num_form: number of form titles you want to generate
--theme: theme of the generated titles
--file_name: output file name

English

python header_EN.py --num_form 10 --theme business --file_name example.txt

Traditional Chinese

python header_TC.py --num_form 10 --theme government --file_name example_TC.txt

Simplified Chinese

python header_SC.py --num_form 10 --theme government --file_name example_SC.txt
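If you want to script title generation for several languages or themes, the three commands above can be assembled programmatically. The following is a minimal sketch that only builds the command lines from the documented flags; the helper name build_header_command and the HEADER_SCRIPTS mapping are illustrative and not part of the repository.

```python
import shlex

# Script names as documented above; the mapping itself is a convenience
# introduced for this sketch, not something the repository provides.
HEADER_SCRIPTS = {
    "EN": "header_EN.py",
    "TC": "header_TC.py",
    "SC": "header_SC.py",
}


def build_header_command(language, num_form, theme, file_name):
    """Return the argv list for one title-generation run."""
    script = HEADER_SCRIPTS[language]  # raises KeyError for unknown languages
    return [
        "python", script,
        "--num_form", str(num_form),
        "--theme", theme,
        "--file_name", file_name,
    ]


# Print the commands instead of running them, so they can be reviewed first.
jobs = [
    ("EN", 10, "business", "example.txt"),
    ("TC", 10, "government", "example_TC.txt"),
    ("SC", 10, "government", "example_SC.txt"),
]
for job in jobs:
    print(shlex.join(build_header_command(*job)))
```

Each returned list can be passed directly to subprocess.run from the repository root once the API key is configured.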

Generating contents and parsing annotations

--content_dir is the name of the header file from the previous step.
--language can be one of EN (English), TC (Traditional Chinese), or SC (Simplified Chinese).

python content_gen.py --content_dir example --language EN

After running, the generated contents in Hierarchical Structure Text (HST) format and their annotations will appear in the contents/[content_dir] directory.
The formatted input for the layout generation model will also be saved in the input_files/ directory.

🪄 Content-driven Layout Generation

Datasets and Checkpoint

Datasets
Our training data come from RVL-CDIP, FUNSD, XFUND, and ICDAR23 HUST-CELL.
We preprocessed the raw data into our training format.
👉 You can download the processed training data here.

If you'd like to use your own training data, please structure it as follows (each line of the .jsonl file represents one document):

{"input": "{\"width\": xxx, \"height\": xxx, \"entities\": [{\"text\": \"xxxxx\", \"box\": [\"<FILL_0>\"]}, {\"text\": \"xxxxx\", \"box\": [\"<FILL_1>\"]}, ...]}", 
"output": "{\"<FILL_0>\": \"x1,y1,x2,y2\", \"<FILL_1>\": \"x1,y1,x2,y2\", ...}"}
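Note that both "input" and "output" are JSON strings nested inside the outer JSON object, which is easy to get wrong by hand. The sketch below builds one record in that shape from a list of entities. It assumes one box per entity and integer coordinates; the function names and sample values are hypothetical, and the coordinate units are whatever your data uses.

```python
import json


def make_record(width, height, entities):
    """Build one training record.

    `entities` is a list of (text, (x1, y1, x2, y2)) pairs. Assumes a single
    box per entity, matching the one-placeholder lists in the format above.
    """
    input_entities = []
    boxes = {}
    for i, (text, (x1, y1, x2, y2)) in enumerate(entities):
        placeholder = f"<FILL_{i}>"
        input_entities.append({"text": text, "box": [placeholder]})
        boxes[placeholder] = f"{x1},{y1},{x2},{y2}"
    document = {"width": width, "height": height, "entities": input_entities}
    # Both fields are serialized to strings, so the outer record holds
    # JSON-in-JSON exactly as in the format description.
    return {
        "input": json.dumps(document, ensure_ascii=False),
        "output": json.dumps(boxes, ensure_ascii=False),
    }


def write_jsonl(records, path):
    """Write one record per line."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


# Hypothetical sample document with two entities.
record = make_record(1000, 1400, [
    ("Name:", (50, 60, 120, 80)),
    ("Alice", (130, 60, 200, 80)),
])
print(json.dumps(record, ensure_ascii=False))
```

Passing ensure_ascii=False keeps Chinese text readable in the file instead of escaping it to \uXXXX sequences.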

Checkpoint
We fine-tune LLaMA-3.1-8B with LoRA, so please download LLaMA-3.1-8B first.
👉 We provide our layout generation model checkpoint here. Put it in the models/ directory.
You can download it with the following commands:

git lfs install
git clone https://huggingface.co/jiangzh/RIDGE ./models/RIDGE

Training

python train.py --dataset_path datasets.jsonl

Inference

You can adjust generation configs such as --temperature, --top_p, and --do_sample to produce different layouts.

python inference.py --adapter_dir ./models/RIDGE --input_file input_files/example.jsonl --output_file output_files/example.jsonl
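To give an intuition for why these settings change the generated layouts, here is a minimal pure-Python sketch of temperature scaling and top-p (nucleus) sampling over a toy list of logits. It assumes the flags carry their usual meaning in text generation; it is not the code used by inference.py, and sample_token is a hypothetical name.

```python
import math
import random


def sample_token(logits, temperature=1.0, top_p=1.0, do_sample=True, rng=random):
    """Pick a token index from raw logits. `temperature` must be > 0."""
    if not do_sample:
        # Greedy decoding: always take the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])

    # Temperature < 1 sharpens the distribution, > 1 flattens it.
    scaled = [value / temperature for value in logits]
    peak = max(scaled)
    exps = [math.exp(value - peak) for value in scaled]  # stable softmax
    total = sum(exps)
    probs = [value / total for value in exps]

    # Top-p: keep the smallest set of most likely tokens whose cumulative
    # probability reaches top_p, then sample only among those.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    weights = [probs[i] for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]


toy_logits = [0.1, 2.0, -1.0]
print(sample_token(toy_logits, do_sample=False))
print(sample_token(toy_logits, temperature=1.5, top_p=0.9, rng=random.Random(0)))
```

With sampling disabled the same input always yields the same choice, while a higher temperature or a larger top_p lets less likely tokens through, which is what produces varied layouts for the same content.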

Rendering

  1. Prepare fonts and put the .ttf files in the fonts/ directory.

  2. Render
    Choose the rendering script that matches the document language.
    --input_file: the input file to the layout generation model
    --output_file: the output file from the layout generation model
    --font_paths: list of font paths to choose from. If you provide multiple paths, one is randomly selected for each rendered document.

    English

    python render_EN.py --input_file example.jsonl --output_file example.jsonl --font_paths fonts/CourierPrime-Bold.ttf fonts/Roboto-Regular.ttf fonts/Neuton-Regular.ttf
    

    Chinese

    python render_ZH.py --input_file example_TC.jsonl --output_file example_TC.jsonl --font_paths fonts/GuanKiapTsingKhai.ttf
    

    After running, the generated images, final annotations, and visualized annotation images will appear in the outputs/[output file name] directory.
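The rendering scripts take both files, presumably because the input carries the text while the model output carries the boxes. The sketch below shows how the two could be joined before drawing. It assumes the model output follows the "output" format described under Datasets; the actual schema written by inference.py is not documented here, and parse_box and fill_boxes are hypothetical helpers, not functions from the repository.

```python
import json


def parse_box(value):
    """Parse an "x1,y1,x2,y2" string into a tuple of four floats."""
    parts = [float(part) for part in value.split(",")]
    if len(parts) != 4:
        raise ValueError(f"expected 4 coordinates, got {len(parts)}: {value!r}")
    x1, y1, x2, y2 = parts
    if x2 < x1 or y2 < y1:
        raise ValueError(f"box corners are out of order: {value!r}")
    return (x1, y1, x2, y2)


def fill_boxes(input_str, output_str):
    """Replace each <FILL_i> placeholder with its predicted box."""
    document = json.loads(input_str)
    boxes = json.loads(output_str)
    merged = []
    for entity in document["entities"]:
        # A placeholder missing from the model output raises KeyError,
        # which flags an incomplete generation early.
        resolved = [parse_box(boxes[name]) for name in entity["box"]]
        merged.append({"text": entity["text"], "box": resolved})
    return {
        "width": document["width"],
        "height": document["height"],
        "entities": merged,
    }


# Hypothetical one-entity document.
sample_input = json.dumps(
    {"width": 1000, "height": 1400,
     "entities": [{"text": "Name:", "box": ["<FILL_0>"]}]}
)
sample_output = json.dumps({"<FILL_0>": "50,60,120,80"})
print(fill_boxes(sample_input, sample_output))
```

Validating the boxes at this point is useful because generated text can be malformed, and it is cheaper to skip a bad document here than to fail midway through rendering.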

📖 Citation

If you find this work useful, please cite our paper:

@article{jiang2025ridge,
  title={Relation-Rich Visual Document Generator for Visual Information Extraction},
  author={Jiang, Zi-Han and Lin, Chien-Wei and Li, Wei-Hua and Liu, Hsuan-Tung and Yeh, Yi-Ren and Chen, Chu-Song},
  journal={arXiv preprint arXiv:2504.10659},
  year={2025}
}
