Generic framework for historical document processing



dhSegment is a tool for historical document processing. Its generic approach makes it possible to segment regions and extract content from different types of documents. See some examples here.

The complete description of the system can be found in the corresponding paper.

It was created by Benoit Seguin and Sofia Ares Oliveira at DHLAB, EPFL.

Installation and usage

The installation procedure and examples of usage can be found in the documentation (see section below).


Try the demo to train (optionally) and apply dhSegment to page extraction using the script.


Under construction

The documentation is available on readthedocs.

If you use this code in your research, you can cite the corresponding paper as:

  title={dhSegment: A generic deep-learning approach for document segmentation},
  author={Ares Oliveira, Sofia and Seguin, Benoit and Kaplan, Frederic},
  booktitle={Frontiers in Handwriting Recognition (ICFHR), 2018 16th International Conference on},