This is the official implementation of our CVPR 2025 paper:
"Relation-Rich Visual Document Generator for Visual Information Extraction."
We tested with Python 3.8.
You may need to install the PyTorch version that corresponds to your CUDA version.
conda create -n RIDGE python=3.8
conda activate RIDGE
git clone https://github.com/AI-Application-and-Integration-Lab/RIDGE.git
cd ./RIDGE
pip install -r requirements.txt
Set your key in global_config.py.
Run the corresponding script to generate document titles in English, Traditional Chinese, or Simplified Chinese.
Search for [CUSTOM] in the code to find available theme options. You can also add your own.
The generated files will appear in the headers/ directory.
--num_form: number of form titles you want to generate
--theme: theme of the generated titles
--file_name: output file name
English
python header_EN.py --num_form 10 --theme business --file_name example.txt
Traditional Chinese
python header_TC.py --num_form 10 --theme government --file_name example_TC.txt
Simplified Chinese
python header_SC.py --num_form 10 --theme government --file_name example_SC.txt
--content_dir is the name of the header file from the previous step.
--language can be chosen from EN (English), TC (Traditional Chinese), and SC (Simplified Chinese).
python content_gen.py --content_dir example --language EN
After running, the generated contents in Hierarchical Structure Text (HST) format and their annotations will appear in the contents/[content_dir] directory.
The formatted input for the layout generation model will also be saved in the input_files/ directory.
Datasets
Our training data comes from RVL-CDIP, FUNSD, XFUND, and ICDAR23 HUST-CELL.
We preprocessed the raw data into our training format.
👉 You can download the processed training data here.
If you'd like to use your own training data, format it as follows (each line of the .jsonl file represents one document):
{"input": "{\"width\": xxx, \"height\": xxx, \"entities\": [{\"text\": \"xxxxx\", \"box\": [\"<FILL_0>\"]}, {\"text\": \"xxxxx\", \"box\": [\"<FILL_1>\"]}, ...]}",
"output": "{\"<FILL_0>\": \"x1,y1,x2,y2\", \"<FILL_1>\": \"x1,y1,x2,y2\", ...}"}
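As a sketch of this format, the following standalone snippet (not part of the repo; the entity texts and boxes are illustrative) builds one training line with `<FILL_k>` placeholders in the input and their coordinate strings in the output:

```python
import json

# Illustrative example only: entity texts and boxes are made up.
entities = [
    {"text": "Invoice Number:", "box": [10, 20, 150, 40]},
    {"text": "INV-001", "box": [160, 20, 260, 40]},
]

# Input: entity texts with <FILL_k> placeholders standing in for boxes.
input_obj = {
    "width": 800,
    "height": 600,
    "entities": [
        {"text": e["text"], "box": [f"<FILL_{i}>"]}
        for i, e in enumerate(entities)
    ],
}

# Output: each placeholder mapped to its "x1,y1,x2,y2" coordinate string.
output_obj = {
    f"<FILL_{i}>": ",".join(map(str, e["box"]))
    for i, e in enumerate(entities)
}

# One document per line in the .jsonl file; both fields are JSON strings.
line = json.dumps({"input": json.dumps(input_obj), "output": json.dumps(output_obj)})
print(line)
```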
Checkpoint
We fine-tune LLaMA-3.1-8B with LoRA, so please download LLaMA-3.1-8B first.
👉 We provide our layout generation model checkpoint here. Put it in the models/ directory.
You can download it with:
git lfs install
git clone https://huggingface.co/jiangzh/RIDGE ./models/RIDGE
python train.py --dataset_path datasets.jsonl
You can adjust generation configs such as --temperature, --top_p, and --do_sample to produce different layouts.
python inference.py --adapter_dir ./models/RIDGE --input_file input_files/example.jsonl --output_file output_files/example.jsonl
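To illustrate how flags like these typically feed a sampling-based `generate()` call, here is a minimal, hypothetical sketch; the real inference.py may parse and forward them differently:

```python
import argparse

def build_generation_kwargs(argv):
    """Map CLI flags to keyword arguments for a model.generate() call.

    Hypothetical sketch; the repo's inference.py may handle these differently.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--temperature", type=float, default=1.0)
    parser.add_argument("--top_p", type=float, default=1.0)
    parser.add_argument("--do_sample", action="store_true")
    args = parser.parse_args(argv)
    return {
        "temperature": args.temperature,
        "top_p": args.top_p,
        "do_sample": args.do_sample,
    }

# Example: enable nucleus sampling for more varied layouts.
kwargs = build_generation_kwargs(["--temperature", "0.8", "--top_p", "0.9", "--do_sample"])
print(kwargs)
```

With `--do_sample` off, temperature and top_p have no effect, since decoding is greedy.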
Prepare fonts and put the .ttf files in the fonts/ directory:
- DejaVu Sans (required)
- CourierPrime-Bold
- Roboto-Regular
- Neuton-Regular
- GuanKiapTsingKhai
- or others you like
Render
Choose the corresponding rendering script based on language.
--input_file: the input file to the layout generation model
--output_file: the output file from the layout generation model
--font_paths: list of font paths to choose from. If you provide multiple paths, one will be randomly selected for each document rendering.
English
python render_EN.py --input_file example.jsonl --output_file example.jsonl --font_paths fonts/CourierPrime-Bold.ttf fonts/Roboto-Regular.ttf fonts/Neuton-Regular.ttf
Chinese
python render_ZH.py --input_file example_TC.jsonl --output_file example_TC.jsonl --font_paths fonts/GuanKiapTsingKhai.ttf
After running, the generated images, final annotations, and visualized annotation images will appear in the outputs/[output file name] directory.
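The per-document random font selection described for --font_paths can be sketched as follows (hypothetical helper, not the repo's code):

```python
import random

def pick_font(font_paths, seed=None):
    """Randomly select one font path for a single document rendering.

    Hypothetical illustration of the --font_paths behavior described above.
    """
    rng = random.Random(seed)
    return rng.choice(font_paths)

fonts = [
    "fonts/CourierPrime-Bold.ttf",
    "fonts/Roboto-Regular.ttf",
    "fonts/Neuton-Regular.ttf",
]
# Each rendered document gets one independently chosen font.
chosen = [pick_font(fonts) for _ in range(3)]
print(chosen)
```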
If you find this work useful, please cite our paper:
@article{jiang2025ridge,
title={Relation-Rich Visual Document Generator for Visual Information Extraction},
author={Jiang, Zi-Han and Lin, Chien-Wei and Li, Wei-Hua and Liu, Hsuan-Tung and Yeh, Yi-Ren and Chen, Chu-Song},
journal={arXiv preprint arXiv:2504.10659},
year={2025}
}
