Pengfei-Hu/MTD

Multimodal Tree Decoder for Table of Contents Extraction in Document Images

This repository contains the source code for the paper "Multimodal Tree Decoder for Table of Contents Extraction in Document Images" (ICPR 2022).

Requirements

Before running this code, prepare the following:

  • BERT model
  • Pretrained ResNet-34 weights
  • The proposed HierDoc dataset

The BERT model is available here. We recommend pretraining ResNet-34 on scientific papers with a text detection task.

Training

python runner/train_valid.py --cfg default --visual_pretrain_weights path_to_pretrained_resnet34_weights

Testing

python runner/infer.py --cfg default
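
The training and testing commands above can also be assembled from a small script, which is convenient when the weights path changes between machines. The following is a minimal sketch, not part of this repository: the helper names (`build_train_command`, `build_infer_command`, `missing_prerequisites`) and the example weights path are hypothetical, and only the script paths and the `--cfg` and `--visual_pretrain_weights` flags are taken from the commands above.

```python
import shlex
import sys
from pathlib import Path


def build_train_command(weights_path, cfg="default", python=sys.executable):
    """Return the argument list for the training command shown above."""
    return [
        python, "runner/train_valid.py",
        "--cfg", cfg,
        "--visual_pretrain_weights", str(weights_path),
    ]


def build_infer_command(cfg="default", python=sys.executable):
    """Return the argument list for the testing command shown above."""
    return [python, "runner/infer.py", "--cfg", cfg]


def missing_prerequisites(paths):
    """Return the given paths (as strings) that do not exist on disk."""
    return [str(p) for p in map(Path, paths) if not p.exists()]


if __name__ == "__main__":
    # Hypothetical location; replace with wherever your weights are stored.
    weights = Path("weights/resnet34_pretrained.pth")

    # Warn early instead of failing partway through training start-up.
    missing = missing_prerequisites([weights])
    if missing:
        print("Missing prerequisites:", ", ".join(missing))

    # Print the commands; pass the lists to subprocess.run(..., check=True)
    # from the repository root to actually execute them.
    print(shlex.join(build_train_command(weights)))
    print(shlex.join(build_infer_command()))
```

The commands are built as argument lists rather than a single string so that paths containing spaces are passed through unchanged when handed to `subprocess.run`.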

Citation

If you find our paper useful in your research, please consider citing:

@INPROCEEDINGS{9956301,
  author={Hu, Pengfei and Zhang, Zhenrong and Zhang, Jianshu and Du, Jun and Wu, Jiajia},
  booktitle={2022 26th International Conference on Pattern Recognition (ICPR)},
  title={Multimodal Tree Decoder for Table of Contents Extraction in Document Images},
  year={2022},
  pages={1756-1762},
  doi={10.1109/ICPR56361.2022.9956301}}

About

Official PyTorch implementation of our paper "Multimodal Tree Decoder for Table of Contents Extraction in Document Images"
